1. 15 Jul, 2022 22 commits
  2. 14 Jul, 2022 18 commits
    • Nathan Chancellor's avatar
      x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current · db886979
      Nathan Chancellor authored
      Clang warns:
      
        arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
        DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
                            ^
        arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
        extern u64 x86_spec_ctrl_current;
                   ^
        1 error generated.
      
      The declaration should be using DECLARE_PER_CPU instead so all
      attributes stay in sync.
      
      Cc: stable@vger.kernel.org
      Fixes: fc02735b ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarNathan Chancellor <nathan@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      db886979
    • Linus Torvalds's avatar
      Merge tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9bd572ec
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, bpf and wireless.
      
        Still no major regressions, the release continues to be calm. An
        uptick of fixes this time around due to trivial data race fixes and
        patches flowing down from subtrees.
      
        There has been a few driver fixes (particularly a few fixes for false
        positives due to 66e4c8d9 which went into -next in May!) that make
        me worry the wide testing is not exactly fully through.
      
        So "calm" but not "let's just cut the final ASAP" vibes over here.
      
        Current release - regressions:
      
         - wifi: rtw88: fix write to const table of channel parameters
      
        Current release - new code bugs:
      
         - mac80211: add gfp_t arg to ieeee80211_obss_color_collision_notify
      
         - mlx5:
            - TC, allow offload from uplink to other PF's VF
            - Lag, decouple FDB selection and shared FDB
            - Lag, correct get the port select mode str
      
         - bnxt_en: fix and simplify XDP transmit path
      
         - r8152: fix accessing unset transport header
      
        Previous releases - regressions:
      
         - conntrack: fix crash due to confirmed bit load reordering (after
           atomic -> refcount conversion)
      
         - stmmac: dwc-qos: disable split header for Tegra194
      
        Previous releases - always broken:
      
         - mlx5e: ring the TX doorbell on DMA errors
      
         - bpf: make sure mac_header was set before using it
      
         - mac80211: do not wake queues on a vif that is being stopped
      
         - mac80211: fix queue selection for mesh/OCB interfaces
      
         - ip: fix dflt addr selection for connected nexthop
      
         - seg6: fix skb checksums for SRH encapsulation/insertion
      
         - xdp: fix spurious packet loss in generic XDP TX path
      
         - bunch of sysctl data race fixes
      
         - nf_log: incorrect offset to network header
      
        Misc:
      
         - bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs"
      
      * tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
        nfp: flower: configure tunnel neighbour on cmsg rx
        net/tls: Check for errors in tls_device_init
        MAINTAINERS: Add an additional maintainer to the AMD XGBE driver
        xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
        selftests/net: test nexthop without gw
        ip: fix dflt addr selection for connected nexthop
        net: atlantic: remove aq_nic_deinit() when resume
        net: atlantic: remove deep parameter on suspend/resume functions
        sfc: fix kernel panic when creating VF
        seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
        seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
        seg6: fix skb checksum evaluation in SRH encapsulation/insertion
        sfc: fix use after free when disabling sriov
        net: sunhme: output link status with a single print.
        r8152: fix accessing unset transport header
        net: stmmac: fix leaks in probe
        net: ftgmac100: Hold reference returned by of_get_child_by_name()
        nexthop: Fix data-races around nexthop_compat_mode.
        ipv4: Fix data-races around sysctl_ip_dynaddr.
        tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
        ...
      9bd572ec
    • Linus Torvalds's avatar
      Merge tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · f41d5df5
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three smb3 client fixes:
      
         - two multichannel fixes: fix a potential deadlock freeing a channel,
           and fix a race condition on failed creation of a new channel
      
         - mount failure fix: work around a server bug in some common older
           Samba servers by avoiding padding at the end of the negotiate
           protocol request"
      
      * tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: workaround negprot bug in some Samba servers
        cifs: remove unnecessary locking of chan_lock while freeing session
        cifs: fix race condition with delayed threads
      f41d5df5
    • Linus Torvalds's avatar
      Merge tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · a24a6c05
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
       "Notable regression fixes:
      
         - Enable SETATTR(time_create) to fix regression with Mac OS clients
      
         - Fix a lockd crasher and broken NLM UNLCK behavior"
      
      * tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        lockd: fix nlm_close_files
        lockd: set fl_owner when unlocking files
        NFSD: Decode NFSv4 birth time attribute
      a24a6c05
    • Linus Torvalds's avatar
      Merge tag 'integrity-v5.19-fix' of... · 4adfa865
      Linus Torvalds authored
      Merge tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
      
      Pull integrity fixes from Mimi Zohar:
       "Here are a number of fixes for recently found bugs.
      
        Only 'ima: fix violation measurement list record' was introduced in
        the current release. The rest address existing bugs"
      
      * tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
        ima: Fix potential memory leak in ima_init_crypto()
        ima: force signature verification when CONFIG_KEXEC_SIG is configured
        ima: Fix a potential integer overflow in ima_appraise_measurement
        ima: fix violation measurement list record
        Revert "evm: Fix memleak in init_desc"
      4adfa865
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 2eb5866c
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
      
       - quieten the spectre-bhb prints
      
       - mark flattened device tree sections as shareable
      
       - remove some obsolete CPU domain code and help text
      
       - fix thumb unaligned access abort emulation
      
       - fix amba_device_add() refcount underflow
      
       - fix literal placement
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 9208/1: entry: add .ltorg directive to keep literals in range
        ARM: 9207/1: amba: fix refcount underflow if amba_device_add() fails
        ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
        ARM: 9213/1: Print message about disabled Spectre workarounds only once
        ARM: 9212/1: domain: Modify Kconfig help text
        ARM: 9211/1: domain: drop modify_domain()
        ARM: 9210/1: Mark the FDT_FIXED sections as shareable
        ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle
      2eb5866c
    • Guenter Roeck's avatar
      um: Replace to_phys() and to_virt() with less generic function names · 097da1a4
      Guenter Roeck authored
      The UML function names to_virt() and to_phys() are exposed by UML
      headers, and are very generic and may be defined by drivers.  As it
      turns out, commit 9409c9b6 ("pmem: refactor pmem_clear_poison()")
      did exactly that.
      
      This results in build errors such as the following when trying to build
      um:allmodconfig:
      
        drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
        ./arch/um/include/asm/page.h:105:20: error: too few arguments to function ‘to_phys’
          105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
              |                    ^~~~~~~
      
      Use less generic function names for the um specific to_phys() and
      to_virt() functions to fix the problem and to avoid similar problems in
      the future.
      
      Fixes: 9409c9b6 ("pmem: refactor pmem_clear_poison()")
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      097da1a4
    • Linus Torvalds's avatar
      Merge tag 'sound-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c4634a3c
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Hopefully the last one for 5.19. This became bigger than wished, but
        all changes are pretty device-specific small fixes, which look less
        worrisome.
      
        The majority of changes are about various ASoC fixes, while the usual
        HD-audio quirks are included as well"
      
      * tag 'sound-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
        ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
        ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
        ALSA: hda/realtek: fix mute/micmute LEDs for HP machines
        ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
        ALSA: hda - Add fixup for Dell Latitidue E5430
        ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
        ALSA: hda/realtek: Fix headset mic for Acer SF313-51
        ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array
        ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks()
        ASoC: rt5640: Fix the wrong state of JD1 and JD2
        ASoC: Intel: sof_rt5682: fix out-of-bounds array access
        ASoC: qdsp6: fix potential memory leak in q6apm_get_audioreach_graph()
        ASoC: tas2764: Fix amp gain register offset & default
        ASoC: tas2764: Correct playback volume range
        ASoC: tas2764: Fix and extend FSYNC polarity handling
        ASoC: tas2764: Add post reset delays
        ASoC: dt-bindings: Fix description for msm8916
        ASoC: doc: Capitalize RESET line name
        ASoC: arizona: Update arizona_aif_cfg_changed to use RX_BCLK_RATE
        ASoC: cs47l92: Fix event generation for OUT1 demux
        ...
      c4634a3c
    • Tianyu Yuan's avatar
      nfp: flower: configure tunnel neighbour on cmsg rx · 656bd03a
      Tianyu Yuan authored
      nfp_tun_write_neigh() function will configure a tunnel neighbour when
      calling nfp_tun_neigh_event_handler() or nfp_flower_cmsg_process_one_rx()
      (with no tunnel neighbour type) from firmware.
      
      When configuring IP on physical port as a tunnel endpoint, no operation
      will be performed after receiving the cmsg mentioned above.
      
      Therefore, add a progress to configure tunnel neighbour in this case.
      
      v2: Correct format of fixes tag.
      
      Fixes: f1df7956 ("nfp: flower: rework tunnel neighbour configuration")
      Signed-off-by: default avatarTianyu Yuan <tianyu.yuan@corigine.com>
      Reviewed-by: default avatarLouis Peens <louis.peens@corigine.com>
      Reviewed-by: default avatarBaowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220714081915.148378-1-simon.horman@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      656bd03a
    • Tariq Toukan's avatar
      net/tls: Check for errors in tls_device_init · 3d8c51b2
      Tariq Toukan authored
      Add missing error checks in tls_device_init.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Reported-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3d8c51b2
    • Tom Lendacky's avatar
      MAINTAINERS: Add an additional maintainer to the AMD XGBE driver · 51f1c31f
      Tom Lendacky authored
      Add Shyam Sundar S K as an additional maintainer to support the AMD XGBE
      network device driver.
      
      Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Link: https://lore.kernel.org/r/db367f24089c2bbbcd1cec8e21af49922017a110.1657751501.git.thomas.lendacky@amd.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      51f1c31f
    • Juergen Gross's avatar
      xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue · 94e81006
      Juergen Gross authored
      xenvif_rx_next_skb() is expecting the rx queue not being empty, but
      in case the loop in xenvif_rx_action() is doing multiple iterations,
      the availability of another skb in the rx queue is not being checked.
      
      This can lead to crashes:
      
      [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
      [40072.537534] PGD 0 P4D 0
      [40072.537644] Oops: 0000 [#1] SMP NOPTI
      [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
      [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
      [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
      [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
      [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
      [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
      [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
      [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
      [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
      [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
      [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
      [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
      [40072.539211] Call Trace:
      [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
      [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]
      
      Fix that by stopping the loop in case the rx queue becomes empty.
      
      Cc: stable@vger.kernel.org
      Fixes: 98f6d57c ("xen-netback: process guest rx packets in batches")
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarPaul Durrant <paul@xen.org>
      Link: https://lore.kernel.org/r/20220713135322.19616-1-jgross@suse.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      94e81006
    • Linus Torvalds's avatar
      amdgpu: disable powerpc support for the newer display engine · d11219ad
      Linus Torvalds authored
      The DRM_AMD_DC_DCN display engine support (Raven, Navi, and newer) has
      not been building cleanly on powerpc and causes link errors due to
      mixing hard- and soft-float object files:
      
        powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o uses soft float
        powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_resource.o
        [..]
      
      and while patches are floating around, it's not exactly obvious what is
      going on.
      
      The problem bisects to commit 41b7a347 ("powerpc: Book3S 64-bit
      outline-only KASAN support") but that is probably more about changing
      config variables than the fundamental cause.
      
      Despite the bisection result, a more directly related commit seems to be
      26f4712a ("drm/amd/display: move FPU related code from dcn31 to
      dml/dcn31 folder").  It's probably a combination of the two.
      
      This has been going on since the merge window, without any final word.
      So instead of blindly applying patches that may or may not be the right
      thing, let's disable this for now.
      
      As Michael Ellerman says:
       "IIUIC this code was never enabled on ppc before, so disabling it seems
        like a reasonable fix to get the build clean"
      
      and once we have more actual feedback (and find any potential users) we
      can always re-enable it with the patch that fixes the issues and
      back-port as necessary.
      
      Fixes: 41b7a347 ("powerpc: Book3S 64-bit outline-only KASAN support")
      Fixes: 26f4712a ("drm/amd/display: move FPU related code from dcn31 to dml/dcn31 folder")
      Reported-and-tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/all/20220606153910.GA1773067@roeck-us.net/
      Link: https://lore.kernel.org/all/20220618232737.2036722-1-linux@roeck-us.net/
      Link: https://lore.kernel.org/all/20220713050724.GA2471738@roeck-us.net/Acked-by: default avatarMichael Ellerman <michael@ellerman.id.au>
      Acked-by: default avatarAlex Deucher <alexdeucher@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d11219ad
    • Lennert Buytenhek's avatar
      igc: Reinstate IGC_REMOVED logic and implement it properly · 7c1ddcee
      Lennert Buytenhek authored
      The initially merged version of the igc driver code (via commit
      146740f9, "igc: Add support for PF") contained the following
      IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors:
      
      	u32 igc_rd32(struct igc_hw *hw, u32 reg)
      	{
      		u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr);
      		u32 value = 0;
      
      		if (IGC_REMOVED(hw_addr))
      			return ~value;
      
      		value = readl(&hw_addr[reg]);
      
      		/* reads should not return all F's */
      		if (!(~value) && (!reg || !(~readl(hw_addr))))
      			hw->hw_addr = NULL;
      
      		return value;
      	}
      
      And:
      
      	#define wr32(reg, val) \
      	do { \
      		u8 __iomem *hw_addr = READ_ONCE((hw)->hw_addr); \
      		if (!IGC_REMOVED(hw_addr)) \
      			writel((val), &hw_addr[(reg)]); \
      	} while (0)
      
      E.g. igb has similar checks in its MMIO accessors, and has a similar
      macro E1000_REMOVED, which is implemented as follows:
      
      	#define E1000_REMOVED(h) unlikely(!(h))
      
      These checks serve to detect and take note of an 0xffffffff MMIO read
      return from the device, which can be caused by a PCIe link flap or some
      other kind of PCI bus error, and to avoid performing MMIO reads and
      writes from that point onwards.
      
      However, the IGC_REMOVED macro was not originally implemented:
      
      	#ifndef IGC_REMOVED
      	#define IGC_REMOVED(a) (0)
      	#endif /* IGC_REMOVED */
      
      This led to the IGC_REMOVED logic to be removed entirely in a
      subsequent commit (commit 3c215fb1, "igc: remove IGC_REMOVED
      function"), with the rationale that such checks matter only for
      virtualization and that igc does not support virtualization -- but a
      PCIe device can become detached even without virtualization being in
      use, and without proper checks, a PCIe bus error affecting an igc
      adapter will lead to various NULL pointer dereferences, as the first
      access after the error will set hw->hw_addr to NULL, and subsequent
      accesses will blindly dereference this now-NULL pointer.
      
      This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and
      implements IGC_REMOVED the way it is done for igb, by checking for the
      unlikely() case of hw_addr being NULL.  This change prevents the oopses
      seen when a PCIe link flap occurs on an igc adapter.
      
      Fixes: 146740f9 ("igc: Add support for PF")
      Signed-off-by: default avatarLennert Buytenhek <buytenh@arista.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Acked-by: default avatarSasha Neftin <sasha.neftin@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      7c1ddcee
    • Sasha Neftin's avatar
      Revert "e1000e: Fix possible HW unit hang after an s0ix exit" · 6cfa4536
      Sasha Neftin authored
      This reverts commit 1866aa0d.
      
      Commit 1866aa0d ("e1000e: Fix possible HW unit hang after an s0ix
      exit") was a workaround for CSME problem to handle messages comes via H2ME
      mailbox. This problem has been fixed by patch "e1000e: Enable the GPT
      clock before sending message to the CSME".
      
      Fixes: 3e55d231 ("e1000e: Add handshake with the CSME to support S0ix")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821Signed-off-by: default avatarSasha Neftin <sasha.neftin@intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      6cfa4536
    • Sasha Neftin's avatar
      e1000e: Enable GPT clock before sending message to CSME · b49feacb
      Sasha Neftin authored
      On corporate (CSME) ADL systems, the Ethernet Controller may stop working
      ("HW unit hang") after exiting from the s0ix state. The reason is that
      CSME misses the message sent by the host. Enabling the dynamic GPT clock
      solves this problem. This clock is cleared upon HW initialization.
      
      Fixes: 3e55d231 ("e1000e: Add handshake with the CSME to support S0ix")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821Reviewed-by: default avatarDima Ruinskiy <dima.ruinskiy@intel.com>
      Signed-off-by: default avatarSasha Neftin <sasha.neftin@intel.com>
      Tested-by: default avatarChia-Lin Kao (AceLan) <acelan.kao@canonical.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      b49feacb
    • Nicolas Dichtel's avatar
      selftests/net: test nexthop without gw · cd72e61b
      Nicolas Dichtel authored
      This test implement the scenario described in the commit
      "ip: fix dflt addr selection for connected nexthop".
      The test configures a nexthop object with an output device only (no gateway
      address) and a route that uses this nexthop. The goal is to check if the
      kernel selects a valid source address.
      
      Link: https://lore.kernel.org/netdev/20220712095545.10947-1-nicolas.dichtel@6wind.com/Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20220713114853.29406-2-nicolas.dichtel@6wind.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cd72e61b
    • Nicolas Dichtel's avatar
      ip: fix dflt addr selection for connected nexthop · 747c1430
      Nicolas Dichtel authored
      When a nexthop is added, without a gw address, the default scope was set
      to 'host'. Thus, when a source address is selected, 127.0.0.1 may be chosen
      but rejected when the route is used.
      
      When using a route without a nexthop id, the scope can be configured in the
      route, thus the problem doesn't exist.
      
      To explain more deeply: when a user creates a nexthop, it cannot specify
      the scope. To create it, the function nh_create_ipv4() calls fib_check_nh()
      with scope set to 0. fib_check_nh() calls fib_check_nh_nongw() wich was
      setting scope to 'host'. Then, nh_create_ipv4() calls
      fib_info_update_nhc_saddr() with scope set to 'host'. The src addr is
      chosen before the route is inserted.
      
      When a 'standard' route (ie without a reference to a nexthop) is added,
      fib_create_info() calls fib_info_update_nhc_saddr() with the scope set by
      the user. iproute2 set the scope to 'link' by default.
      
      Here is a way to reproduce the problem:
      ip netns add foo
      ip -n foo link set lo up
      ip netns add bar
      ip -n bar link set lo up
      sleep 1
      
      ip -n foo link add name eth0 type dummy
      ip -n foo link set eth0 up
      ip -n foo address add 192.168.0.1/24 dev eth0
      
      ip -n foo link add name veth0 type veth peer name veth1 netns bar
      ip -n foo link set veth0 up
      ip -n bar link set veth1 up
      
      ip -n bar address add 192.168.1.1/32 dev veth1
      ip -n bar route add default dev veth1
      
      ip -n foo nexthop add id 1 dev veth0
      ip -n foo route add 192.168.1.1 nhid 1
      
      Try to get/use the route:
      > $ ip -n foo route get 192.168.1.1
      > RTNETLINK answers: Invalid argument
      > $ ip netns exec foo ping -c1 192.168.1.1
      > ping: connect: Invalid argument
      
      Try without nexthop group (iproute2 sets scope to 'link' by dflt):
      ip -n foo route del 192.168.1.1
      ip -n foo route add 192.168.1.1 dev veth0
      
      Try to get/use the route:
      > $ ip -n foo route get 192.168.1.1
      > 192.168.1.1 dev veth0 src 192.168.0.1 uid 0
      >     cache
      > $ ip netns exec foo ping -c1 192.168.1.1
      > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
      > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.039 ms
      >
      > --- 192.168.1.1 ping statistics ---
      > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
      > rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms
      
      CC: stable@vger.kernel.org
      Fixes: 597cfe4f ("nexthop: Add support for IPv4 nexthops")
      Reported-by: default avatarEdwin Brossette <edwin.brossette@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20220713114853.29406-1-nicolas.dichtel@6wind.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      747c1430