1. 05 Jun, 2024 11 commits
    • rtnetlink: make the "split" NLM_DONE handling generic · 5b4b62a1
      Jakub Kicinski authored
      Jaroslav reports Dell's OMSA Systems Management Data Engine
      expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo()
      and inet_dump_ifaddr(). We already added a similar fix previously in
      commit 460b0d33 ("inet: bring NLM_DONE out to a separate recv() again").
      
      Instead of modifying all the dump handlers, and making them look
      different than modern for_each_netdev_dump()-based dump handlers -
      put the workaround in rtnetlink code. This will also help us move
      the custom rtnl-locking from af_netlink in the future (in net-next).
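
A toy model of the workaround, not the kernel code: the sketch below assumes a dump handler that returns 0 once it has written its last message (the modern style) and a generic wrapper that, when the split behaviour is requested, reports the buffer length instead, so that NLM_DONE lands in a recvmsg() of its own. The names `make_handler`, `dump` and `split_done` are hypothetical and exist only for this illustration.

```python
from typing import Callable, List

DONE = "NLM_DONE"


def make_handler(items: List[str], per_call: int) -> Callable[[List[str]], int]:
    """Toy dump handler that writes up to per_call (>= 1) items per call.

    Returns 0 once the last item has been written, otherwise the number
    of messages in the buffer (meaning "call me again").
    """
    pos = 0

    def handler(buf: List[str]) -> int:
        nonlocal pos
        chunk = items[pos:pos + per_call]
        buf.extend(chunk)
        pos += len(chunk)
        return 0 if pos >= len(items) else len(buf)

    return handler


def dump(handler: Callable[[List[str]], int], split_done: bool) -> List[List[str]]:
    """Return the buffers userspace would see, one list per recvmsg()."""
    recvs: List[List[str]] = []
    finished = False  # the handler has already reported completion
    while True:
        buf: List[str] = []
        # Do not call the handler again after it reached the end.
        ret = 0 if finished else handler(buf)
        if split_done:
            if ret == 0:
                finished = True
            # Pretend there is more whenever data was written, so that
            # NLM_DONE ends up alone in the next buffer.
            ret = len(buf)
        if ret == 0:
            buf.append(DONE)
            recvs.append(buf)
            return recvs
        recvs.append(buf)
```

With three items and two per call, `dump(make_handler(["a", "b", "c"], 2), split_done=False)` puts NLM_DONE in the same buffer as `"c"`, while `split_done=True` delivers it in a third, separate buffer, which is what the legacy parser described above expects.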
      
      Note that this change is not touching rtnl_dump_all(). rtnl_dump_all()
      is a different kettle of fish and a potential problem. We now mix families
      in a single recvmsg(), but NLM_DONE is not coalesced.
      
      Tested:
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \
                 --dump getaddr --json '{"ifa-family": 2}'
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
                 --dump getroute --json '{"rtm-family": 2}'
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \
                 --dump getlink
      
      Fixes: 3e41af90 ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()")
      Fixes: cdb2f80f ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
      Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
      Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-mptcp-close-wait' · e137596e
      David S. Miller authored
      Jason Xing says:
      
      ====================
      tcp/mptcp: count CLOSE-WAIT for CurrEstab
      
      Taking CLOSE-WAIT sockets into CurrEstab counters is in accordance with RFC
      1213, as suggested by Eric and Neal.
      
      v5
      Link: https://lore.kernel.org/all/20240531091753.75930-1-kerneljasonxing@gmail.com/
      1. add more detailed comment (Matthieu)
      
      v4
      Link: https://lore.kernel.org/all/20240530131308.59737-1-kerneljasonxing@gmail.com/
      1. correct the Fixes: tag in patch [2/2]. (Eric)
      
      Previous discussion
      Link: https://lore.kernel.org/all/20240529033104.33882-1-kerneljasonxing@gmail.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB · 9633e937
      Jason Xing authored
      As the previous patch does for TCP, we need to adhere to RFC 1213:
      
        "tcpCurrEstab OBJECT-TYPE
         ...
         The number of TCP connections for which the current state
         is either ESTABLISHED or CLOSE- WAIT."
      
      So let's consider CLOSE-WAIT sockets.
      
      The counting logic:
      When do we increment the counter?
      a) Only if we change the state to ESTABLISHED.
      
      When do we decrement the counter?
      a) If the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
      say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
      b) If the socket leaves CLOSE-WAIT, say, on the server side, changing
      from CLOSE-WAIT to LAST-ACK.
      
      Fixes: d9cd27b8 ("mptcp: add CurrEstab MIB counter support")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB · a46d0ea5
      Jason Xing authored
      According to RFC 1213, we should also take CLOSE-WAIT sockets into
      consideration:
      
        "tcpCurrEstab OBJECT-TYPE
         ...
         The number of TCP connections for which the current state
         is either ESTABLISHED or CLOSE- WAIT."
      
      After this, CurrEstab counter will display the total number of
      ESTABLISHED and CLOSE-WAIT sockets.
      
      The counting logic:
      When do we increment the counter?
      a) If we change the state to ESTABLISHED.
      b) If we change the state from SYN-RECEIVED to CLOSE-WAIT.
      
      When do we decrement the counter?
      a) If the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
      say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
      b) If the socket leaves CLOSE-WAIT, say, on the server side, changing
      from CLOSE-WAIT to LAST-ACK.
      
      Please note: there are two chances that old state of socket can be changed
      to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED.
      So we have to take care of the former case.
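
The counting rules above can be sketched as a small state-transition function. This is an illustrative model only, not the kernel's state-change code; `curr_estab_delta` and `replay` are hypothetical names, and states are plain strings.

```python
from typing import List


def curr_estab_delta(old: str, new: str) -> int:
    """Change to apply to the CurrEstab counter for a transition old -> new."""
    if new == "ESTABLISHED":
        # Increment rule a): entering ESTABLISHED.
        return 1 if old != "ESTABLISHED" else 0
    if new == "CLOSE_WAIT":
        # Increment rule b): SYN-RECEIVED -> CLOSE-WAIT was never counted
        # as ESTABLISHED, so it is counted here. ESTABLISHED -> CLOSE-WAIT
        # is already counted and stays counted.
        return 1 if old == "SYN_RECV" else 0
    # Decrement rules a) and b): leaving ESTABLISHED or CLOSE-WAIT for
    # any state that is not counted.
    return -1 if old in ("ESTABLISHED", "CLOSE_WAIT") else 0


def replay(states: List[str]) -> List[int]:
    """Counter value after each transition along a sequence of states."""
    counter = 0
    history = []
    for old, new in zip(states, states[1:]):
        counter += curr_estab_delta(old, new)
        history.append(counter)
    return history
```

For a server-side close, `replay(["SYN_RECV", "ESTABLISHED", "CLOSE_WAIT", "LAST_ACK", "CLOSE"])` keeps the socket counted through CLOSE-WAIT and drops it on LAST-ACK. The SYN-RECV to CLOSE-WAIT shortcut is counted once on entry and released the same way.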
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: hsr: add missing config for CONFIG_BRIDGE · 712115a2
      Hangbin Liu authored
      The hsr_redbox.sh test needs to create a bridge for testing. Add the
      missing CONFIG_BRIDGE option to the config file.
      
      Fixes: eafbf057 ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Tested-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Fix regression when dropping packets due to invalid src addresses · 1cd4bc98
      Daniel Borkmann authored
      Commit f58f45c1 ("vxlan: drop packets from invalid src-address")
      has recently been added to vxlan mainly in the context of source
      address snooping/learning so that when it is enabled, an entry in the
      FDB is not being created for an invalid address for the corresponding
      tunnel endpoint.
      
      Before commit f58f45c1 vxlan behaved similarly to geneve in
      that it passed through whichever MACs were set in the L2 header. It
      turns out that this change in behavior breaks setups. For example,
      Cilium with netkit in L3 mode for Pods, as well as tunnel mode, was
      passing before the change in f58f45c1 for both vxlan and geneve.
      After the mentioned change it only passes for geneve: with vxlan,
      packets are dropped because vxlan_set_mac() returns false when the
      source and destination MACs are zero, which for E/W traffic via tunnel
      is totally fine.
      
      Fix it by only opting into the is_valid_ether_addr() check in
      vxlan_set_mac() when in fact source address snooping/learning is
      actually enabled in vxlan. This is done by moving the check into
      vxlan_snoop(). With this change, the Cilium connectivity test suite
      passes again for both tunnel flavors.
      
      Fixes: f58f45c1 ("vxlan: drop packets from invalid src-address")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Bauer <mail@david-bauer.net>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Cc: Nikolay Aleksandrov <razor@blackwall.org>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: David Bauer <mail@david-bauer.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: sch_multiq: fix possible OOB write in multiq_tune() · affc18fd
      Hangyu Hua authored
      q->bands will be assigned to qopt->bands to execute subsequent code logic
      after kmalloc. So the old q->bands should not be used in kmalloc.
      Otherwise, an out-of-bounds write will occur.
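
A minimal sketch of why sizing the array from the old band count can overflow. It rests on assumptions that the message above does not spell out: that the scratch array holds `max_bands - bands` slots, and that the later loop collects every active child from the new band count up to `max_bands`. All names here are hypothetical.

```python
from typing import List


def slots_allocated(max_bands: int, bands_used_for_sizing: int) -> int:
    """Slots in the scratch array, under the assumed sizing formula."""
    return max_bands - bands_used_for_sizing


def children_to_remove(max_bands: int, new_bands: int, active: List[bool]) -> int:
    """Children the loop would collect once q->bands holds the new value."""
    return sum(1 for i in range(new_bands, max_bands) if active[i])


# Example: 8 possible bands, 6 currently active, user shrinks to 2.
MAX_BANDS, OLD_BANDS, NEW_BANDS = 8, 6, 2
ACTIVE = [i < OLD_BANDS for i in range(MAX_BANDS)]

needed = children_to_remove(MAX_BANDS, NEW_BANDS, ACTIVE)  # bands 2..5
buggy = slots_allocated(MAX_BANDS, OLD_BANDS)              # sized from old value
fixed = slots_allocated(MAX_BANDS, NEW_BANDS)              # sized from new value
```

In this example four children have to be stored, but sizing from the old value provides only two slots, so the third store would land past the end of the array. Sizing from the new value provides six.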
      
      Fixes: c2999f7f ("net: sched: multiq: don't call qdisc_put() while holding tree lock")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Acked-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: fix kernel panic in XDP_TX action · 491aee89
      Taehee Yoo authored
      In the XDP_TX path, ionic driver sends a packet to the TX path with rx
      page and corresponding dma address.
      After tx is done, ionic_tx_clean() frees that page.
      But RX ring buffer isn't reset to NULL.
      So, it uses a freed page, which causes kernel panic.
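
The ownership problem described above can be modelled in a few lines. This is a hedged sketch with hypothetical names (`Page`, `RxRing`, `xdp_tx`, `tx_clean`, `rx_refill`), and it assumes a refill policy that reuses whatever page is still recorded in the RX slot; the driver's real data structures are not shown here.

```python
from typing import List, Optional


class Page:
    def __init__(self) -> None:
        self.freed = False


class RxRing:
    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Page]] = [Page() for _ in range(size)]


def xdp_tx(ring: RxRing, idx: int, clear_slot: bool) -> Page:
    """Hand the RX page at idx to the TX path and return it."""
    page = ring.slots[idx]
    assert page is not None
    if clear_slot:
        # The fix: RX drops its reference once TX owns the page.
        ring.slots[idx] = None
    return page


def tx_clean(page: Page) -> None:
    """TX completion frees the page it was given."""
    page.freed = True


def rx_refill(ring: RxRing, idx: int) -> Page:
    """Assumed policy: allocate only when the slot is empty."""
    if ring.slots[idx] is None:
        ring.slots[idx] = Page()
    page = ring.slots[idx]
    assert page is not None
    return page


def page_seen_by_rx(clear_slot: bool) -> Page:
    ring = RxRing(4)
    tx_clean(xdp_tx(ring, 0, clear_slot))
    return rx_refill(ring, 0)
```

`page_seen_by_rx(False).freed` is true, which corresponds to the use of a freed page; with `clear_slot=True` the refill allocates a fresh page instead.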
      
      BUG: unable to handle page fault for address: ffff8881576c110c
      PGD 773801067 P4D 773801067 PUD 87f086067 PMD 87efca067 PTE 800ffffea893e060
      Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
      CPU: 1 PID: 25 Comm: ksoftirqd/1 Not tainted 6.9.0+ #11
      Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
      RIP: 0010:bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
      Code: 00 53 41 55 41 56 41 57 b8 01 00 00 00 48 8b 5f 08 4c 8b 77 00 4c 89 f7 48 83 c7 0e 48 39 d8
      RSP: 0018:ffff888104e6fa28 EFLAGS: 00010283
      RAX: 0000000000000002 RBX: ffff8881576c1140 RCX: 0000000000000002
      RDX: ffffffffc0051f64 RSI: ffffc90002d33048 RDI: ffff8881576c110e
      RBP: ffff888104e6fa88 R08: 0000000000000000 R09: ffffed1027a04a23
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881b03a21a8
      R13: ffff8881589f800f R14: ffff8881576c1100 R15: 00000001576c1100
      FS: 0000000000000000(0000) GS:ffff88881ae00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8881576c110c CR3: 0000000767a90000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
      <TASK>
      ? __die+0x20/0x70
      ? page_fault_oops+0x254/0x790
      ? __pfx_page_fault_oops+0x10/0x10
      ? __pfx_is_prefetch.constprop.0+0x10/0x10
      ? search_bpf_extables+0x165/0x260
      ? fixup_exception+0x4a/0x970
      ? exc_page_fault+0xcb/0xe0
      ? asm_exc_page_fault+0x22/0x30
      ? 0xffffffffc0051f64
      ? bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
      ? do_raw_spin_unlock+0x54/0x220
      ionic_rx_service+0x11ab/0x3010 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? ionic_tx_clean+0x29b/0xc60 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_tx_clean+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? ionic_tx_cq_service+0x25d/0xa00 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ionic_cq_service+0x69/0x150 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ionic_txrx_napi+0x11a/0x540 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      __napi_poll.constprop.0+0xa0/0x440
      net_rx_action+0x7e7/0xc30
      ? __pfx_net_rx_action+0x10/0x10
      
      Fixes: 8eeed837 ("ionic: Add XDP_TX support")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Micrel KSZ8061: fix errata solution not taking effect problem · 0a8d3f2e
      Tristram Ha authored
      KSZ8061 needs to write to an MMD register at driver initialization to
      work around an erratum.  This worked in the 5.0 kernel but not in newer
      kernels.  The issue is that the main phylib code no longer resets the PHY
      at the very beginning.  Calling the PHY resume code later will reset the
      chip if it is already powered down at the beginning.  This wipes out the
      MMD register write.  The solution is to implement a PHY resume function
      for KSZ8061 to take care of this problem.
      
      Fixes: 232ba3a5 ("net: phy: Micrel KSZ8061: link failure after cable connect")
      Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: avoid overwriting when adjusting sock bufsizes · fb0aa078
      Wen Gu authored
      When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf
      to sysctl_tcp_wmem[1], since this may overwrite the value set by
      tcp_sndbuf_expand() in TCP connection establishment.
      
      The other assignments of sk_{snd|rcv}buf to sysctl values in
      smc_adjust_sock_bufsizes() can also be omitted, since the initialization
      of the smc sock and clcsock has already set sk_{snd|rcv}buf to
      smc.sysctl_{w|r}mem or ipv4_sysctl_tcp_{w|r}mem[1].
      
      Fixes: 30c3c4a4 ("net/smc: Use correct buffer sizes when switching between TCP and SMC")
      Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>, too.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Always allocate PF entries from low prioriy zone · 8b0f7410
      Subbaraya Sundeep authored
      PF mcam entries have to be at low priority always so that VFs
      can install longest prefix match rules at higher priority.
      This is already taken care of, but when priority allocation
      with respect to a reference entry is requested, entries are allocated
      from the mid-zone instead of the low priority zone. Fix this and
      always allocate entries from the low priority zone for PFs.
      
      Fixes: 7df5b4b2 ("octeontx2-af: Allocate low priority entries for PF")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Jun, 2024 10 commits
    • net: tls: fix marking packets as decrypted · a535d594
      Jakub Kicinski authored
      For TLS offload we mark packets with skb->decrypted to make sure
      they don't escape the host without getting encrypted first.
      The crypto state lives in the socket, so it may get detached
      by a call to skb_orphan(). As a safety check - the egress path
      drops all packets with skb->decrypted and no "crypto-safe" socket.
      
      The skb marking was added to sendpage only (and not sendmsg),
      because tls_device injected data into the TCP stack using sendpage.
      This special case was missed when sendpage got folded into sendmsg.
      
      Fixes: c5c37af6 ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · d6301802
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.10-rc3
      
      The first fixes for v6.10. And we have a big one, I suspect the
      biggest wireless pull request we ever had. There are fixes all over,
      both in stack and drivers. Likely the most important here are mt76 not
      working on mt7615 devices, ath11k not being able to connect to 6 GHz
      networks and rtlwifi suffering from packet loss. But of course there's
      much more.
      
      * tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (37 commits)
        wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
        wifi: mt76: mt7615: add missing chanctx ops
        wifi: wilc1000: document SRCU usage instead of SRCU
        Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section"
        Revert "wifi: wilc1000: convert list management to RCU"
        wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
        wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
        wifi: mac80211: fix Spatial Reuse element size check
        wifi: iwlwifi: mvm: don't read past the mfuart notifcation
        wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill
        wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
        wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option
        wifi: iwlwifi: mvm: handle BA session teardown in RF-kill
        wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd
        wifi: iwlwifi: mvm: remove stale STA link data during restart
        wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
        wifi: iwlwifi: mvm: set properly mac header
        wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
        wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup
        wifi: iwlwifi: mvm: fix a crash on 7265
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20240603115129.9494CC2BD10@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lib/test_rhashtable: add missing MODULE_DESCRIPTION() macro · c6cab01d
      Jeff Johnson authored
      make allmodconfig && make W=1 C=1 reports:
      WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_rhashtable.o
      
      Add the missing invocation of the MODULE_DESCRIPTION() macro.
      
      Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Link: https://lore.kernel.org/r/20240531-md-lib-test_rhashtable-v1-1-cd6d4138f1b6@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'dst_cache-fix-possible-races' · d730a42c
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      dst_cache: fix possible races
      
      This series is inspired by various undisclosed syzbot
      reports hinting at corruptions in dst_cache structures.
      
      It seems at least four users of dst_cache are racy against
      BH reentrancy.
      
      Last patch is adding a DEBUG_NET check to catch future misuses.
      ====================
      
      Link: https://lore.kernel.org/r/20240531132636.2637995-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dst_cache: add two DEBUG_NET warnings · 2fe6fb36
      Eric Dumazet authored
      After fixing four different bugs involving dst_cache
      users, it might be worth adding a check about BH being
      blocked by dst_cache callers.
      
      DEBUG_NET_WARN_ON_ONCE(!in_softirq());
      
      It is not fatal; if we missed a valid case where no
      BH deadlock is to be feared, we might change this.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-6-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ila: block BH in ila_output() · cf28ff8e
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      ila_output() is called from lwtunnel_output()
      possibly from process context, and under rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter ila_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable().
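
The re-entrancy problem can be illustrated with a toy, single-threaded model. It is a sketch under stated assumptions, not kernel code: the two-field `DstCache` is a stand-in for the real per-CPU cache entry, `Cpu` only imitates the deferral effect of `local_bh_disable()` / `local_bh_enable()`, and the `interrupt` hook marks one point where a softirq might fire in the middle of an update.

```python
from typing import Callable, List, Optional, Tuple


class Cpu:
    """Imitates BH blocking: softirq work is deferred while BH is disabled."""

    def __init__(self) -> None:
        self.bh_disabled = 0
        self.pending: List[Callable[[], None]] = []

    def raise_softirq(self, work: Callable[[], None]) -> None:
        if self.bh_disabled:
            self.pending.append(work)
        else:
            work()  # runs immediately, in the middle of the caller

    def local_bh_disable(self) -> None:
        self.bh_disabled += 1

    def local_bh_enable(self) -> None:
        self.bh_disabled -= 1
        if self.bh_disabled == 0:
            pending, self.pending = self.pending, []
            for work in pending:
                work()


class DstCache:
    def __init__(self) -> None:
        self.dst: Optional[str] = None
        self.cookie: Optional[int] = None


def cache_set(cache: DstCache, dst: str, cookie: int,
              interrupt: Callable[[], None] = lambda: None) -> None:
    """Two-step update; interrupt() marks where a softirq could cut in."""
    cache.dst = dst
    interrupt()
    cache.cookie = cookie


def output(cpu: Cpu, cache: DstCache, block_bh: bool) -> Tuple[Optional[str], Optional[int]]:
    """Process-context update that a softirq re-enters mid-way."""
    softirq = lambda: cache_set(cache, "dst-B", 2)
    if block_bh:
        cpu.local_bh_disable()
    cache_set(cache, "dst-A", 1, interrupt=lambda: cpu.raise_softirq(softirq))
    if block_bh:
        cpu.local_bh_enable()
    return (cache.dst, cache.cookie)
```

Without BH blocking the cache ends up with the softirq's `dst` paired with the process context's `cookie`, a mixed entry. With BH blocked, the softirq runs only after the first update is complete and the entry stays consistent.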
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: sr: block BH in seg6_output_core() and seg6_input_core() · c0b98ac1
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in seg6_output_core() is not good enough,
      because seg6_output_core() is called from process context,
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter seg6_output_core()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Apply a similar change in seg6_input_core().
      
      Fixes: fa79581e ("ipv6: sr: fix several BUGs when preemption is enabled")
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Lebrun <dlebrun@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input() · db0090c6
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in rpl_output() is not good enough,
      because rpl_output() is called from process context,
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter rpl_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Apply a similar change in rpl_input().
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Aring <aahringo@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: ioam: block BH from ioam6_output() · 2fe40483
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in ioam6_output() is not good enough,
      because ioam6_output() is called from process context,
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter ioam6_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Fixes: 8cb3bf8b ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Justin Iurman <justin.iurman@uliege.be>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vmxnet3: disable rx data ring on dma allocation failure · ffbe335b
      Matthias Stocker authored
      When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base,
      the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset
      rq->data_ring.desc_size for the data ring that failed, which presumably
      causes the hypervisor to reference it on packet reception.
      
      To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell
      the hypervisor to disable this feature.
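
A small model of the failure path, offered as a sketch only. It assumes, as the message above suggests, that the device decides whether to use the data ring from the descriptor size alone; `RxQueue`, `create_data_ring` and `device_uses_data_ring` are hypothetical names rather than driver functions.

```python
from typing import Callable, Optional


class RxQueue:
    def __init__(self, desc_size: int) -> None:
        self.data_ring_base: Optional[bytearray] = None
        self.data_ring_desc_size = desc_size


def create_data_ring(rq: RxQueue,
                     alloc: Callable[[], Optional[bytearray]],
                     reset_on_failure: bool) -> None:
    """Try to allocate the data ring; optionally disable it on failure."""
    rq.data_ring_base = alloc()
    if rq.data_ring_base is None and reset_on_failure:
        # The fix: a zero descriptor size tells the device the feature is off.
        rq.data_ring_desc_size = 0


def device_uses_data_ring(rq: RxQueue) -> bool:
    """Assumed device-side rule: a non-zero descriptor size enables the ring."""
    return rq.data_ring_desc_size != 0


def failing_alloc() -> Optional[bytearray]:
    return None  # simulates the DMA allocation failure


buggy = RxQueue(desc_size=128)
create_data_ring(buggy, failing_alloc, reset_on_failure=False)

fixed = RxQueue(desc_size=128)
create_data_ring(fixed, failing_alloc, reset_on_failure=True)
```

In the `buggy` queue the device would still believe the data ring is enabled although no memory backs it; in the `fixed` queue the feature is switched off.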
      
      [   95.436876] kernel BUG at net/core/skbuff.c:207!
      [   95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [   95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1
      [   95.441558] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
      [   95.443481] RIP: 0010:skb_panic+0x4d/0x4f
      [   95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50 ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9 ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24
      [   95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246
      [   95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f
      [   95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
      [   95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60
      [   95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000
      [   95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0
      [   95.455682] FS:  0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000
      [   95.457178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0
      [   95.459791] Call Trace:
      [   95.460515]  <IRQ>
      [   95.461180]  ? __die_body.cold+0x19/0x27
      [   95.462150]  ? die+0x2e/0x50
      [   95.462976]  ? do_trap+0xca/0x110
      [   95.463973]  ? do_error_trap+0x6a/0x90
      [   95.464966]  ? skb_panic+0x4d/0x4f
      [   95.465901]  ? exc_invalid_op+0x50/0x70
      [   95.466849]  ? skb_panic+0x4d/0x4f
      [   95.467718]  ? asm_exc_invalid_op+0x1a/0x20
      [   95.468758]  ? skb_panic+0x4d/0x4f
      [   95.469655]  skb_put.cold+0x10/0x10
      [   95.470573]  vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3]
      [   95.471853]  vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3]
      [   95.473185]  __napi_poll+0x2b/0x160
      [   95.474145]  net_rx_action+0x2c6/0x3b0
      [   95.475115]  handle_softirqs+0xe7/0x2a0
      [   95.476122]  __irq_exit_rcu+0x97/0xb0
      [   95.477109]  common_interrupt+0x85/0xa0
      [   95.478102]  </IRQ>
      [   95.478846]  <TASK>
      [   95.479603]  asm_common_interrupt+0x26/0x40
      [   95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20
      [   95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
      [   95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246
      [   95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000
      [   95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001
      [   95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3
      [   95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260
      [   95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000
      [   95.495035]  acpi_safe_halt+0x14/0x20
      [   95.496127]  acpi_idle_do_entry+0x2f/0x50
      [   95.497221]  acpi_idle_enter+0x7f/0xd0
      [   95.498272]  cpuidle_enter_state+0x81/0x420
      [   95.499375]  cpuidle_enter+0x2d/0x40
      [   95.500400]  do_idle+0x1e5/0x240
      [   95.501385]  cpu_startup_entry+0x29/0x30
      [   95.502422]  start_secondary+0x11c/0x140
      [   95.503454]  common_startup_64+0x13e/0x141
      [   95.504466]  </TASK>
      [   95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
      [   95.516536] ---[ end trace 0000000000000000 ]---
      
      Fixes: 6f483338 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()")
      Signed-off-by: Matthias Stocker <mstocker@barracuda.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com>
      Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 03 Jun, 2024 1 commit
  4. 01 Jun, 2024 17 commits
  5. 30 May, 2024 1 commit
    • Merge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d8ec1985
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - gro: initialize network_offset in network layer
      
         - tcp: reduce accepted window in NEW_SYN_RECV state
      
        Current release - new code bugs:
      
         - eth: mlx5e: do not use ptp structure for tx ts stats when not
           initialized
      
         - eth: ice: check for unregistering correct number of devlink params
      
        Previous releases - regressions:
      
         - bpf: Allow delete from sockmap/sockhash only if update is allowed
      
         - sched: taprio: extend minimum interval restriction to entire cycle
           too
      
         - netfilter: ipset: add list flush to cancel_gc
      
         - ipv4: fix address dump when IPv4 is disabled on an interface
      
         - sock_map: avoid race between sock_map_close and sk_psock_put
      
         - eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
           status rules
      
        Previous releases - always broken:
      
         - core: fix __dst_negative_advice() race
      
         - bpf:
             - fix multi-uprobe PID filtering logic
             - fix pkt_type override upon netkit pass verdict
      
         - netfilter: tproxy: bail out if IP has been disabled on the device
      
         - af_unix: annotate data-race around unix_sk(sk)->addr
      
         - eth: mlx5e: fix UDP GSO for encapsulated packets
      
         - eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
           buffers
      
         - eth: i40e: fully suspend and resume IO operations in EEH case
      
         - eth: octeontx2-pf: free send queue buffers incase of leaf to inner
      
         - eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"
      
      * tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
        netdev: add qstat for csum complete
        ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
        net: ena: Fix redundant device NUMA node override
        ice: check for unregistering correct number of devlink params
        ice: fix 200G PHY types to link speed mapping
        i40e: Fully suspend and resume IO operations in EEH case
        i40e: factoring out i40e_suspend/i40e_resume
        e1000e: move force SMBUS near the end of enable_ulp function
        net: dsa: microchip: fix RGMII error in KSZ DSA driver
        ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
        net: fix __dst_negative_advice() race
        nfc/nci: Add the inconsistency check between the input data length and count
        MAINTAINERS: dwmac: starfive: update Maintainer
        net/sched: taprio: extend minimum interval restriction to entire cycle too
        net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
        netfilter: nft_fib: allow from forward/input without iif selector
        netfilter: tproxy: bail out if IP has been disabled on the device
        netfilter: nft_payload: skbuff vlan metadata mangle support
        net: ti: icssg-prueth: Fix start counter for ft1 filter
        sock_map: avoid race between sock_map_close and sk_psock_put
        ...