1. 14 Sep, 2023 1 commit
  2. 13 Sep, 2023 9 commits
    • igb: clean up in all error paths when enabling SR-IOV · bc6ed2fa
      Corinna Vinschen authored
      After commit 50f30349 ("igb: Enable SR-IOV after reinit"), removing
      the igb module could hang or crash (depending on the machine) when the
      module has been loaded with the max_vfs parameter set to some value != 0.
      
      In case of one test machine with a dual port 82580, this hang occurred:
      
      [  232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
      [  233.093257] igb 0000:41:00.1: IOV Disabled
      [  233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
      [  233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
      [  233.352248] igb 0000:41:00.0:   device [8086:1516] error status/mask=00100000
      [  233.361088] igb 0000:41:00.0:    [20] UnsupReq               (First)
      [  233.368183] igb 0000:41:00.0: AER:   TLP Header: 40000001 0000040f cdbfc00c c
      [  233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
      [  233.388779] igb 0000:41:00.1:   device [8086:1516] error status/mask=00100000
      [  233.397629] igb 0000:41:00.1:    [20] UnsupReq               (First)
      [  233.404736] igb 0000:41:00.1: AER:   TLP Header: 40000001 0000040f cdbfc00c c
      [  233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
      [  233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
      [  233.546197] pcieport 0000:40:01.0: AER: device recovery failed
      [  234.157244] igb 0000:41:00.0: IOV Disabled
      [  371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
      [  371.627489]       Not tainted 6.4.0-dirty #2
      [  371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
      [  371.641000] task:irq/35-aerdrv   state:D stack:0     pid:257   ppid:2      f0
      [  371.650330] Call Trace:
      [  371.653061]  <TASK>
      [  371.655407]  __schedule+0x20e/0x660
      [  371.659313]  schedule+0x5a/0xd0
      [  371.662824]  schedule_preempt_disabled+0x11/0x20
      [  371.667983]  __mutex_lock.constprop.0+0x372/0x6c0
      [  371.673237]  ? __pfx_aer_root_reset+0x10/0x10
      [  371.678105]  report_error_detected+0x25/0x1c0
      [  371.682974]  ? __pfx_report_normal_detected+0x10/0x10
      [  371.688618]  pci_walk_bus+0x72/0x90
      [  371.692519]  pcie_do_recovery+0xb2/0x330
      [  371.696899]  aer_process_err_devices+0x117/0x170
      [  371.702055]  aer_isr+0x1c0/0x1e0
      [  371.705661]  ? __set_cpus_allowed_ptr+0x54/0xa0
      [  371.710723]  ? __pfx_irq_thread_fn+0x10/0x10
      [  371.715496]  irq_thread_fn+0x20/0x60
      [  371.719491]  irq_thread+0xe6/0x1b0
      [  371.723291]  ? __pfx_irq_thread_dtor+0x10/0x10
      [  371.728255]  ? __pfx_irq_thread+0x10/0x10
      [  371.732731]  kthread+0xe2/0x110
      [  371.736243]  ? __pfx_kthread+0x10/0x10
      [  371.740430]  ret_from_fork+0x2c/0x50
      [  371.744428]  </TASK>
      
      The reproducer was a simple script:
      
        #!/bin/sh
        for i in `seq 1 5`; do
          modprobe -rv igb
          modprobe -v igb max_vfs=1
          sleep 1
          modprobe -rv igb
        done
      
      It turned out that this could only be reproduced on 82580 (quad and
      dual-port), but not on 82576, i350 and i210.  Further debugging showed
      that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
      dev->is_physfn is 0 on 82580.
      
      Prior to commit 50f30349 ("igb: Enable SR-IOV after reinit"),
      igb_enable_sriov() jumped into the "err_out" cleanup branch.  After this
      commit it only returned the error code.
      
      So the cleanup didn't take place, and the incorrect VF setup in the
      igb_adapter structure fooled the igb driver into assuming that VFs have
      been set up where no VF actually existed.
      
      Fix this problem by cleaning up again if pci_enable_sriov() fails.
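The cleanup rule described above can be sketched as a small Python model. This is only an illustration, not the driver code: `AdapterModel`, `enable_sriov` and the `pci_enable_sriov` callback are hypothetical stand-ins, and the two attributes merely mirror the idea of per-adapter VF bookkeeping.

```python
class AdapterModel:
    """Hypothetical stand-in for the VF bookkeeping kept per adapter."""

    def __init__(self):
        self.vfs_allocated_count = 0
        self.vf_data = None


def enable_sriov(adapter, num_vfs, pci_enable_sriov):
    """Set up VF state, then undo it again if enabling SR-IOV fails.

    `pci_enable_sriov` is a callable returning 0 on success or a
    negative errno value, standing in for the PCI core call.
    """
    adapter.vf_data = [{} for _ in range(num_vfs)]
    adapter.vfs_allocated_count = num_vfs

    err = pci_enable_sriov(num_vfs)
    if err:
        # Error path: drop the bookkeeping so that later code does not
        # assume VFs exist when none were actually created.
        adapter.vf_data = None
        adapter.vfs_allocated_count = 0
    return err
```

Without the error branch, the model would keep reporting one allocated VF after a failed call, which is the stale state the commit message describes.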
      
      Fixes: 50f30349 ("igb: Enable SR-IOV after reinit")
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: fix timestamp configuration code · 3c44191d
      Vadim Fedorenko authored
      The commit in fixes introduced flags to control the status of hardware
      configuration while processing packets. At the same time another structure
      is used to provide configuration of timestamper to user-space applications.
      The way it was coded makes these structures go out of sync easily. The
      repro is easy for 82599 chips:
      
      [root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
      current settings:
      tx_type 0
      rx_filter 0
      new settings:
      tx_type 1
      rx_filter 12
      
      The eth0 device is properly configured to timestamp any PTPv2 events.
      
      [root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
      current settings:
      tx_type 1
      rx_filter 12
      SIOCSHWTSTAMP failed: Numerical result out of range
      The requested time stamping mode is not supported by the hardware.
      
      The error is properly returned because HW doesn't support all packets
      timestamping. But the adapter->flags is cleared of timestamp flags
      even though no HW configuration was done. From that point no RX timestamps
      are received by user-space application. But configuration shows good
      values:
      
      [root@hostname ~]# hwstamp_ctl -i eth0
      current settings:
      tx_type 1
      rx_filter 12
      
      Fix the issue by applying new flags only when the HW was actually
      configured.
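A minimal Python model of "apply new flags only when the HW was actually configured" might look like the following. The class, the flag bit and the table of supported filters are assumptions made for illustration; only the rx_filter values 12 (accepted) and 1 (rejected with "Numerical result out of range") come from the transcript above.

```python
import errno

# Hypothetical flag bit, named for illustration only.
FLAG_RX_HWTSTAMP_ENABLED = 0x1

# rx_filter values the modelled hardware accepts, mapped to the flags
# they imply. 12 is accepted and 1 is rejected, as in the transcript.
SUPPORTED_RX_FILTERS = {0: 0, 12: FLAG_RX_HWTSTAMP_ENABLED}


class Adapter:
    def __init__(self):
        self.flags = 0
        self.tstamp_config = {"tx_type": 0, "rx_filter": 0}

    def set_hwtstamp(self, tx_type, rx_filter):
        # Compute the new flags in a local variable first.
        new_flags = self.flags & ~FLAG_RX_HWTSTAMP_ENABLED
        if rx_filter not in SUPPORTED_RX_FILTERS:
            # Rejected request: neither the flags nor the reported
            # configuration are touched.
            return -errno.ERANGE
        new_flags |= SUPPORTED_RX_FILTERS[rx_filter]

        # The "hardware" accepted the request: commit both views together.
        self.flags = new_flags
        self.tstamp_config = {"tx_type": tx_type, "rx_filter": rx_filter}
        return 0
```

Because the flags are only committed next to the reported configuration, a failed request cannot leave the two out of sync.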
      
      Fixes: a9763f3c ("ixgbe: Update PTP to support X550EM_x devices")
      Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-bind-fixes' · ab6c4ec8
      David S. Miller authored
      Kuniyuki Iwashima says:
      
      ====================
      tcp: Fix bind() regression for v4-mapped-v6 address
      
      Since bhash2 was introduced, bind() is broken in two cases related
      to v4-mapped-v6 address.
      
      This series fixes the regression and adds tests to cover the cases.
      
      Changes:
        v2:
          * Added patch 1 to factorise duplicated comparison (Eric Dumazet)
      
        v1: https://lore.kernel.org/netdev/20230911165106.39384-1-kuniyu@amazon.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c. · 8637d8e8
      Kuniyuki Iwashima authored
      We add these 8 test cases in bind_wildcard.c to check bind() conflicts.
      
        1st bind()          2nd bind()
        ---------           ---------
        0.0.0.0             ::FFFF:0.0.0.0
        ::FFFF:0.0.0.0      0.0.0.0
        0.0.0.0             ::FFFF:127.0.0.1
        ::FFFF:127.0.0.1    0.0.0.0
        127.0.0.1           ::FFFF:0.0.0.0
        ::FFFF:0.0.0.0      127.0.0.1
        127.0.0.1           ::FFFF:127.0.0.1
        ::FFFF:127.0.0.1    127.0.0.1
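The same eight pairs can be tried from user space with a short Python sketch. It is not the selftest itself: `PAIRS`, `family_of` and `second_bind_errno` are names invented here, and the result depends on the running kernel and on IPv6 being available with IPV6_V6ONLY off.

```python
import socket

# (1st bind, 2nd bind) address pairs, copied from the table above.
PAIRS = [
    ("0.0.0.0", "::FFFF:0.0.0.0"),
    ("::FFFF:0.0.0.0", "0.0.0.0"),
    ("0.0.0.0", "::FFFF:127.0.0.1"),
    ("::FFFF:127.0.0.1", "0.0.0.0"),
    ("127.0.0.1", "::FFFF:0.0.0.0"),
    ("::FFFF:0.0.0.0", "127.0.0.1"),
    ("127.0.0.1", "::FFFF:127.0.0.1"),
    ("::FFFF:127.0.0.1", "127.0.0.1"),
]


def family_of(addr):
    """Pick the socket family from the textual form of the address."""
    return socket.AF_INET6 if ":" in addr else socket.AF_INET


def second_bind_errno(first, second):
    """Bind `first` to an ephemeral port, then `second` to the same port.

    Returns the errno raised by the 2nd bind(), or 0 if it succeeded.
    """
    with socket.socket(family_of(first), socket.SOCK_STREAM) as s1, \
         socket.socket(family_of(second), socket.SOCK_STREAM) as s2:
        s1.bind((first, 0))
        port = s1.getsockname()[1]
        try:
            s2.bind((second, port))
        except OSError as exc:
            return exc.errno
        return 0
```

Calling `second_bind_errno(first, second)` for each entry of `PAIRS` and comparing the result with `errno.EADDRINUSE` gives a rough user-space check. Going by the series description, every pair should conflict on a fixed kernel.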
      
      All tests pass both before bhash2 was introduced and with bhash2 plus this series.
      
       Before bhash2:
        $ uname -r
        6.0.0-rc1-00393-g0bf73255
        $ ./bind_wildcard
        ...
        # PASSED: 16 / 16 tests passed.
      
       Just after bhash2:
        $ uname -r
        6.0.0-rc1-00394-g28044fc1
        $ ./bind_wildcard
        ...
        ok 15 bind_wildcard.v4_local_v6_v4mapped_local.v4_v6
        not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4
        # FAILED: 15 / 16 tests passed.
      
       On net.git:
        $ ./bind_wildcard
        ...
        not ok 14 bind_wildcard.v4_local_v6_v4mapped_any.v6_v4
        not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4
        # FAILED: 13 / 16 tests passed.
      
       With this series:
        $ ./bind_wildcard
        ...
        # PASSED: 16 / 16 tests passed.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: tcp: Move expected_errno into each test case in bind_wildcard.c. · 2895d879
      Kuniyuki Iwashima authored
      This is a preparation patch for the following patch.
      
      Let's define expected_errno in each test case so that we can add other test
      cases easily.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: tcp: Fix address length in bind_wildcard.c. · 0071d155
      Kuniyuki Iwashima authored
      The selftest passes the IPv6 address length for an IPv4 address.
      We should pass the correct length.
      
      Note inet_bind_sk() does not check if the size is larger than
      sizeof(struct sockaddr_in), so there is no real bug in this
      selftest.
      
      Fixes: 13715acf ("selftest: Add test for bind() conflicts.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address. · c48ef9c4
      Kuniyuki Iwashima authored
      Since bhash2 was introduced, the example below does not work as expected.
      These two bind() should conflict, but the 2nd bind() now succeeds.
      
        from socket import *
      
        s1 = socket(AF_INET6, SOCK_STREAM)
        s1.bind(('::ffff:127.0.0.1', 0))
      
        s2 = socket(AF_INET, SOCK_STREAM)
        s2.bind(('127.0.0.1', s1.getsockname()[1]))
      
      During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find()
      fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates
      a new tb2 for the 2nd socket.  Then, we call inet_csk_bind_conflict() that
      checks conflicts in the new tb2 by inet_bhash2_conflict().  However, the
      new tb2 does not include the 1st socket, thus the bind() finally succeeds.
      
      In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has
      the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find()
      returns the 1st socket's tb2.
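The address comparison this paragraph calls for can be modelled in user space with Python's `ipaddress` module. This is a sketch of the idea only, not the kernel's inet_bind2_bucket_match(); the function name `addrs_conflict` is made up here.

```python
import ipaddress


def addrs_conflict(bucket_addr, sk_addr):
    """Return True if both strings name the same address, treating the
    v4-mapped-v6 form ::ffff:a.b.c.d as equal to a.b.c.d."""

    def normalize(addr):
        ip = ipaddress.ip_address(addr)
        # ipv4_mapped is None unless the address is ::ffff:a.b.c.d.
        if ip.version == 6 and ip.ipv4_mapped is not None:
            return ip.ipv4_mapped
        return ip

    return normalize(bucket_addr) == normalize(sk_addr)
```

With this normalization, `addrs_conflict("::ffff:127.0.0.1", "127.0.0.1")` is true, so a lookup built on it would find the first socket's bucket instead of creating a fresh one.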
      
      Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1,
      the 2nd bind() fails properly for the same reason mentioned in the previous
      commit.
      
      Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Andrei Vagin <avagin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Fix bind() regression for v4-mapped-v6 wildcard address. · aa99e5f8
      Kuniyuki Iwashima authored
      Andrei Vagin reported bind() regression with strace logs.
      
      If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
      socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.
      
        from socket import *
      
        s1 = socket(AF_INET6, SOCK_STREAM)
        s1.bind(('::ffff:0.0.0.0', 0))
      
        s2 = socket(AF_INET, SOCK_STREAM)
        s2.bind(('127.0.0.1', s1.getsockname()[1]))
      
      During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
      AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
      if tb has the v4-mapped-v6 wildcard address.
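The extra wildcard check can be illustrated in user space as follows. The helper `is_v4_wildcard` is hypothetical and only models which addresses count as the IPv4 wildcard; it does not reproduce inet_bind2_bucket_match_addr_any().

```python
import ipaddress


def is_v4_wildcard(addr):
    """True for 0.0.0.0 and for its v4-mapped-v6 form ::ffff:0.0.0.0."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # ipv4_mapped is None unless addr has the form ::ffff:a.b.c.d.
        ip = ip.ipv4_mapped
    return ip == ipaddress.IPv4Address("0.0.0.0")
```

In this model an AF_INET socket bound to 127.0.0.1 would conflict with a bucket whose address satisfies `is_v4_wildcard`, whether it was written as 0.0.0.0 or as ::ffff:0.0.0.0.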
      
      The example above does not work after commit 5456262d ("net: Fix
      incorrect address comparison when searching for a bind2 bucket"), but
      the blamed change is not the commit.
      
      Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
      as 0.0.0.0, and the sequence above worked by chance.  Technically, this
      case has been broken since bhash2 was introduced.
      
      Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
      the 2nd bind() fails properly because we fall back to using bhash to
      detect conflicts for the v4-mapped-v6 address.
      
      Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
      Reported-by: Andrei Vagin <avagin@google.com>
      Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any). · c6d27706
      Kuniyuki Iwashima authored
      This is a prep patch to make the following patches cleaner that touch
      inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any().
      
      Both functions have duplicated comparison for netns, port, and l3mdev.
      Let's factorise them.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 12 Sep, 2023 4 commits
    • ipv6: fix ip6_sock_set_addr_preferences() typo · 8cdd9f1a
      Eric Dumazet authored
      ip6_sock_set_addr_preferences() second argument should be an integer.
      
      SUNRPC attempts to set IPV6_PREFER_SRC_PUBLIC were
      translated to IPV6_PREFER_SRC_TMP.
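The translation can be reproduced with a small Python model. The commit message does not state the old parameter type; the sketch assumes it was a C `bool`, which collapses any non-zero value to 1. The two constants are the values from the Linux UAPI header linux/in6.h, and the helper names are invented here.

```python
# Values from the Linux UAPI header linux/in6.h.
IPV6_PREFER_SRC_TMP = 0x0001
IPV6_PREFER_SRC_PUBLIC = 0x0002


def as_bool_param(val):
    """Model passing `val` through a C `bool` parameter (assumed old
    signature): every non-zero value becomes 1."""
    return int(bool(val))


def as_int_param(val):
    """Model an `int` parameter: the value arrives unchanged."""
    return val
```

Here `as_bool_param(IPV6_PREFER_SRC_PUBLIC)` yields 1, which is IPV6_PREFER_SRC_TMP, while `as_int_param` keeps the requested preference.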
      
      Fixes: 18d5ad62 ("ipv6: add ip6_sock_set_addr_preferences")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230911154213.713941-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • veth: Update XDP feature set when bringing up device · 7a6102aa
      Toke Høiland-Jørgensen authored
      There's an early return in veth_set_features() if the device is in a down
      state, which leads to the XDP feature flags not being updated when enabling
      GRO while the device is down. This in turn leads to XDP_REDIRECT not
      working, because the redirect code now checks the flags.
      
      Fix this by updating the feature flags after bringing the device up.
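The interaction between the early return and the fix can be modelled with a toy class. Everything here is an assumption made for illustration: `VethModel` is not the driver, and the rule that the NDO_XMIT flag simply follows the GRO setting is a simplification of whatever the real feature computation does.

```python
class VethModel:
    """Toy model of a device whose XDP flags depend on the GRO setting."""

    def __init__(self):
        self.up = False
        self.gro = False
        self.xdp_ndo_xmit = False

    def _update_xdp_features(self):
        # Simplified rule for this sketch: NDO_XMIT follows GRO.
        self.xdp_ndo_xmit = self.gro

    def set_features(self, gro):
        self.gro = gro
        if not self.up:
            return  # early return: flags are not refreshed while down
        self._update_xdp_features()

    def open(self, refresh_on_open):
        """Bring the device up; `refresh_on_open` models the fix."""
        self.up = True
        if refresh_on_open:
            self._update_xdp_features()
```

Enabling GRO while the model is down and then calling `open(False)` leaves `xdp_ndo_xmit` stale, while `open(True)` picks up the new setting.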
      
      Before this patch:
      
      NETDEV_XDP_ACT_BASIC:		yes
      NETDEV_XDP_ACT_REDIRECT:	yes
      NETDEV_XDP_ACT_NDO_XMIT:	no
      NETDEV_XDP_ACT_XSK_ZEROCOPY:	no
      NETDEV_XDP_ACT_HW_OFFLOAD:	no
      NETDEV_XDP_ACT_RX_SG:		yes
      NETDEV_XDP_ACT_NDO_XMIT_SG:	no
      
      After this patch:
      
      NETDEV_XDP_ACT_BASIC:		yes
      NETDEV_XDP_ACT_REDIRECT:	yes
      NETDEV_XDP_ACT_NDO_XMIT:	yes
      NETDEV_XDP_ACT_XSK_ZEROCOPY:	no
      NETDEV_XDP_ACT_HW_OFFLOAD:	no
      NETDEV_XDP_ACT_RX_SG:		yes
      NETDEV_XDP_ACT_NDO_XMIT_SG:	yes
      
      Fixes: fccca038 ("veth: take into account device reconfiguration for xdp_features flag")
      Fixes: 66c0e13a ("drivers: net: turn on XDP features")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20230911135826.722295-1-toke@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: macb: fix sleep inside spinlock · 403f0e77
      Sascha Hauer authored
      macb_set_tx_clk() is called under a spinlock but itself calls clk_set_rate()
      which can sleep. This results in:
      
      | BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
      | pps pps1: new PPS source ptp1
      | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 40, name: kworker/u4:3
      | preempt_count: 1, expected: 0
      | RCU nest depth: 0, expected: 0
      | 4 locks held by kworker/u4:3/40:
      |  #0: ffff000003409148
      | macb ff0c0000.ethernet: gem-ptp-timer ptp clock registered.
      |  ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c
      |  #1: ffff8000833cbdd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c
      |  #2: ffff000004f01578 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x44/0x4e8
      |  #3: ffff000004f06f50 (&bp->lock){....}-{3:3}, at: macb_mac_link_up+0x40/0x2ac
      | irq event stamp: 113998
      | hardirqs last  enabled at (113997): [<ffff800080e8503c>] _raw_spin_unlock_irq+0x30/0x64
      | hardirqs last disabled at (113998): [<ffff800080e84478>] _raw_spin_lock_irqsave+0xac/0xc8
      | softirqs last  enabled at (113608): [<ffff800080010630>] __do_softirq+0x430/0x4e4
      | softirqs last disabled at (113597): [<ffff80008001614c>] ____do_softirq+0x10/0x1c
      | CPU: 0 PID: 40 Comm: kworker/u4:3 Not tainted 6.5.0-11717-g9355ce8b2f50-dirty #368
      | Hardware name: ... ZynqMP ... (DT)
      | Workqueue: events_power_efficient phylink_resolve
      | Call trace:
      |  dump_backtrace+0x98/0xf0
      |  show_stack+0x18/0x24
      |  dump_stack_lvl+0x60/0xac
      |  dump_stack+0x18/0x24
      |  __might_resched+0x144/0x24c
      |  __might_sleep+0x48/0x98
      |  __mutex_lock+0x58/0x7b0
      |  mutex_lock_nested+0x24/0x30
      |  clk_prepare_lock+0x4c/0xa8
      |  clk_set_rate+0x24/0x8c
      |  macb_mac_link_up+0x25c/0x2ac
      |  phylink_resolve+0x178/0x4e8
      |  process_one_work+0x1ec/0x51c
      |  worker_thread+0x1ec/0x3e4
      |  kthread+0x120/0x124
      |  ret_from_fork+0x10/0x20
      
      The obvious fix is to move the call to macb_set_tx_clk() out of the
      protected area. This seems safe as rx and tx are both disabled anyway at
      this point.
      It is however not entirely clear what the spinlock shall protect. It
      could be the read-modify-write access to the NCFGR register, but this
      is accessed in macb_set_rx_mode() and macb_set_rxcsum_feature() as well
      without holding the spinlock. It could also be the register accesses
      done in mog_init_rings() or macb_init_buffers(), but again these
      functions are called without holding the spinlock in macb_hresp_error_task().
      The locking seems fishy in this driver and it might deserve another look
      before this patch is applied.
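The shape of the "obvious fix" can be shown with a user-space analogy. This is a loose model only: `threading.Lock` stands in for the spinlock, `set_tx_clk` for a call that may sleep, and `MacModel` with its fields is hypothetical.

```python
import threading


class MacModel:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the spinlock
        self.ncfgr = 0                # stands in for a config register
        self.clk_called_with_lock_held = None

    def set_tx_clk(self, speed):
        # Stands in for a call that may sleep. Record whether the lock
        # was held at the time of the call.
        self.clk_called_with_lock_held = self.lock.locked()

    def link_up(self, speed):
        with self.lock:
            # The read-modify-write of the register stays inside the lock.
            self.ncfgr |= 0x1
        # The potentially sleeping call happens after the lock is dropped.
        self.set_tx_clk(speed)
```

After `link_up()` runs, `clk_called_with_lock_held` is false, which is the property the patch aims for in the driver.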
      
      Fixes: 633e98a7 ("net: macb: use resolved link config in mac_link_up()")
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Link: https://lore.kernel.org/r/20230908112913.1701766-1-s.hauer@pengutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() · cfaa80c9
      Liu Jian authored
      I got the warning below while fuzz testing:
      BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
      Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9
      
      CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G           OE
      Hardware name: linux,dummy-virt (DT)
      Workqueue: pencrypt_parallel padata_parallel_worker
      Call trace:
       dump_backtrace+0x0/0x420
       show_stack+0x34/0x44
       dump_stack+0x1d0/0x248
       __kasan_report+0x138/0x140
       kasan_report+0x44/0x6c
       __asan_load4+0x94/0xd0
       scatterwalk_copychunks+0x320/0x470
       skcipher_next_slow+0x14c/0x290
       skcipher_walk_next+0x2fc/0x480
       skcipher_walk_first+0x9c/0x110
       skcipher_walk_aead_common+0x380/0x440
       skcipher_walk_aead_encrypt+0x54/0x70
       ccm_encrypt+0x13c/0x4d0
       crypto_aead_encrypt+0x7c/0xfc
       pcrypt_aead_enc+0x28/0x84
       padata_parallel_worker+0xd0/0x2dc
       process_one_work+0x49c/0xbdc
       worker_thread+0x124/0x880
       kthread+0x210/0x260
       ret_from_fork+0x10/0x18
      
      This is because the value of rec_seq of tls_crypto_info configured by the
      user program is too large, for example, 0xffffffffffffff. In addition, TLS
      is asynchronously accelerated. When tls_do_encryption() returns
      -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
      skmsg is released before the asynchronous encryption process ends. As a
      result, the UAF problem occurs during the asynchronous processing of the
      encryption module.
      
      If the operation is asynchronous and the encryption module returns
      EINPROGRESS, do not free the record information.
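The rule in the last paragraph reduces to a small predicate. The sketch below is a model of that decision only. `should_free_record` is a hypothetical name, and the premise that a record is freed on any other negative return value is taken from the title of the commit cited in Fixes.

```python
import errno


def should_free_record(err):
    """Decide whether the caller may free the record after an
    encryption attempt that returned `err` (0 or a negative errno).

    -EINPROGRESS means the asynchronous crypto operation is still
    using the record, so it must not be freed yet.
    """
    return err < 0 and err != -errno.EINPROGRESS
```

A real error such as -EBADMSG returned directly still frees the record. Success and -EINPROGRESS both leave it alone.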
      
      Fixes: 635d9398 ("net/tls: free record only on encryption error")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 11 Sep, 2023 17 commits
    • net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988 · 5a124b1f
      Lorenzo Bianconi authored
      MT7988 SoC supports 3 NICs. Fix pse_port configuration in
      mtk_flow_set_output_device routine if the traffic is offloaded to eth2.
      Rely on mtk_pse_port definitions.
      
      Fixes: 88efedf5 ("net: ethernet: mtk_eth_soc: enable nft hw flowtable_offload for MT7988 SoC")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mtk_eth_soc: fix uninitialized variable · e10a35ab
      Daniel Golle authored
      Variable dma_addr in function mtk_poll_rx can be uninitialized on
      some of the error paths. In practise this doesn't matter, even random
      data present in uninitialized stack memory can safely be used in the
      way it happens in the error path.
      
      However, in order to make Smatch happy make sure the variable is
      always initialized.
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: Fix memory leak in error path of kcm_sendmsg() · c821a88b
      Shigeru Yoshida authored
      syzbot reported a memory leak like below:
      
      BUG: memory leak
      unreferenced object 0xffff88810b088c00 (size 240):
        comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s)
        hex dump (first 32 bytes):
          00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634
          [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline]
          [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815
          [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline]
          [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748
          [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494
          [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548
          [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577
          [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append
      newly allocated skbs to 'head'. If some bytes have been copied when an error
      occurs and execution jumps to the out_error label, 'last_skb' is left unmodified. A later
      kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the
      'head' frag_list and causing the leak.
      
      This patch fixes this issue by properly updating the last allocated skb in
      'last_skb'.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb
      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: check budget for r8152_poll() · a7b8d60b
      Hayes Wang authored
      According to the NAPI documentation, no rx processing may happen when the
      budget is 0. Therefore, r8152_poll() has to return 0 directly when the
      budget is equal to 0.
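The budget rule can be sketched with a toy poll function. This is a model only, not r8152_poll(): the queue is a plain `deque`, and "processing" a packet just means removing it.

```python
from collections import deque


def poll(rx_queue, budget):
    """NAPI-style poll sketch: handle at most `budget` packets and
    return how many were handled. A budget of 0 means no rx work."""
    if budget == 0:
        # Return before touching the rx path at all.
        return 0

    work_done = 0
    while rx_queue and work_done < budget:
        rx_queue.popleft()  # stand-in for processing one packet
        work_done += 1
    return work_done
```

The explicit early return matters in a real driver because the poll routine usually does more than run this loop. In the toy version it mainly documents the contract.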
      
      Fixes: d2187f8e ("r8152: divide the tx and rx bottom functions")
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sha1105-regressions' · 904de985
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fixes for SJA1105 DSA FDB regressions
      
      A report by Yanan Yang has prompted an investigation into the sja1105
      driver's behavior w.r.t. multicast. The report states that when adding
      multicast L2 addresses with "bridge mdb add", only the most recently
      added address works - the others seem to be overwritten. This is solved
      by patch 3/5 (with patch 2/5 as a dependency for it).
      
      Patches 4/5 and 5/5 fix a series of race conditions introduced during
      the same patch set as the bug above, namely this one:
      https://patchwork.kernel.org/project/netdevbpf/cover/20211024171757.3753288-1-vladimir.oltean@nxp.com/
      
      Finally, patch 1/5 fixes an issue found ever since the introduction of
      multicast forwarding offload in sja1105, which is that the multicast
      addresses are visible (with the "self" flag) in "bridge fdb show".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset · 86899e9e
      Vladimir Oltean authored
      Currently, when we add the first sja1105 port to a bridge with
      vlan_filtering 1, then we sometimes see this output:
      
      sja1105 spi2.2: port 4 failed to read back entry for be:79:b4:9e:9e:96 vid 3088: -ENOENT
      sja1105 spi2.2: Reset switch and programmed static config. Reason: VLAN filtering
      sja1105 spi2.2: port 0 failed to add be:79:b4:9e:9e:96 vid 0 to fdb: -2
      
      It is because sja1105_fdb_add() runs from the dsa_owq which is no longer
      serialized with switch resets since it dropped the rtnl_lock() in the
      blamed commit.
      
      Either performing the FDB accesses before the reset, or after the reset,
      is equally fine, because sja1105_static_fdb_change() backs up those
      changes in the static config, but FDB access during reset isn't ok.
      
      Make sja1105_static_config_reload() take the fdb_lock to fix that.
      
      Fixes: 0faf890f ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accesses · ea32690d
      Vladimir Oltean authored
      sja1105_fdb_add() runs from the dsa_owq, and sja1105_port_mcast_flood()
      runs from switchdev_deferred_process_work(). Prior to the blamed commit,
      they used to be indirectly serialized through the rtnl_lock(), which
      no longer holds true because dsa_owq dropped that.
      
      So, it is now possible that we traverse the static config BLK_IDX_L2_LOOKUP
      elements concurrently compared to when we change them, in
      sja1105_static_fdb_change(). That is not ideal, since it might result in
      data corruption.
      
      Introduce a mutex which serializes accesses to the hardware FDB and to
      the static config elements for the L2 Address Lookup table.
      
      I can't find a good reason to add locking around sja1105_fdb_dump().
      I'll add it later if needed.
      
      Fixes: 0faf890f ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix multicast forwarding working only for last added mdb entry · 7cef293b
      Vladimir Oltean authored
      The commit cited in Fixes: did 2 things: it refactored the read-back
      polling from sja1105_dynamic_config_read() into a new function,
      sja1105_dynamic_config_wait_complete(), and it called that from
      sja1105_dynamic_config_write() too.
      
      What is problematic is the refactoring.
      
      The refactored code from sja1105_dynamic_config_poll_valid() works like
      the previous one, but the problem is that it uses another packed_buf[]
      SPI buffer, and there was code at the end of sja1105_dynamic_config_read()
      which was relying on the read-back packed_buf[]:
      
      	/* Don't dereference possibly NULL pointer - maybe caller
      	 * only wanted to see whether the entry existed or not.
      	 */
      	if (entry)
      		ops->entry_packing(packed_buf, entry, UNPACK);
      
      After the change, the packed_buf[] that this code sees is no longer the
      entry read back from hardware, but the original entry that the caller
      passed to the sja1105_dynamic_config_read(), packed into this buffer.
      
      This difference is the most notable with the SJA1105_SEARCH uses from
      sja1105pqrs_fdb_add() - used for both fdb and mdb. There, we have logic
      added by commit 728db843 ("net: dsa: sja1105: ignore the FDB entry
      for unknown multicast when adding a new address") to figure out whether
      the address we're trying to add matches on any existing hardware entry,
      with the exception of the catch-all multicast address.
      
      That logic was broken, because with sja1105_dynamic_config_read() not
      working properly, it doesn't return us the entry read back from
      hardware, but the entry that we passed to it. And, since for multicast,
      a match will always exist, it will tell us that any mdb entry already
      exists at index=0 L2 Address Lookup table. It is index=0 because the
      caller doesn't know the index - it wants to find it out, and
      sja1105_dynamic_config_read() does:
      
      	if (index < 0) { // SJA1105_SEARCH
      		/* Avoid copying a signed negative number to an u64 */
      		cmd.index = 0; // <- this
      		cmd.search = true;
      	} else {
      		cmd.index = index;
      		cmd.search = false;
      	}
      
      So, to the caller of sja1105_dynamic_config_read(), the returned info
      looks entirely legit, and it will add all mdb entries to FDB index 0.
      There, they will always overwrite each other (not to mention,
      potentially they can also overwrite a pre-existing bridge fdb entry),
      and the user-visible impact will be that only the last mdb entry will be
      forwarded as it should. The others won't (will be flooded or dropped,
      depending on the egress flood settings).
      
      Fixing is a bit more complicated, and involves either passing the same
      packed_buf[] to sja1105_dynamic_config_wait_complete(), or moving all
      the extra processing on the packed_buf[] to
      sja1105_dynamic_config_wait_complete(). I've opted for the latter,
      because it makes sja1105_dynamic_config_wait_complete() a bit more
      self-contained.
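
The difference between the broken and the fixed shape can be sketched in plain userspace C. This is only a toy model under stated assumptions: struct toy_hw and every toy_* function are hypothetical names invented for illustration, and memcpy() stands in for both the SPI read-back and ops->entry_packing().

```c
#include <assert.h>
#include <string.h>

#define TOY_BUF_LEN 4

/* Hypothetical stand-in for the switch: the entry the hardware would return. */
struct toy_hw {
	unsigned char entry[TOY_BUF_LEN];
};

/* Buggy shape: the poll helper reads back into its own private buffer. */
static int toy_wait_complete_buggy(const struct toy_hw *hw)
{
	unsigned char packed_buf[TOY_BUF_LEN];

	/* The read-back data is lost when this function returns. */
	memcpy(packed_buf, hw->entry, TOY_BUF_LEN);
	return 0;
}

static int toy_read_buggy(const struct toy_hw *hw, unsigned char *entry)
{
	unsigned char packed_buf[TOY_BUF_LEN];
	int rc;

	assert(hw != NULL && entry != NULL);
	/* Pack the caller's entry into the command buffer. */
	memcpy(packed_buf, entry, TOY_BUF_LEN);
	rc = toy_wait_complete_buggy(hw);
	if (rc)
		return rc;
	/* "Unpack": this buffer still holds the caller's own data. */
	memcpy(entry, packed_buf, TOY_BUF_LEN);
	return 0;
}

/* Fixed shape: the helper that owns the read-back buffer also unpacks it. */
static int toy_wait_complete_fixed(const struct toy_hw *hw, unsigned char *entry)
{
	unsigned char packed_buf[TOY_BUF_LEN];

	memcpy(packed_buf, hw->entry, TOY_BUF_LEN);
	/* The caller may only want to know whether the entry exists. */
	if (entry)
		memcpy(entry, packed_buf, TOY_BUF_LEN);
	return 0;
}

static int toy_read_fixed(const struct toy_hw *hw, unsigned char *entry)
{
	assert(hw != NULL);
	return toy_wait_complete_fixed(hw, entry);
}
```

In this model, toy_read_buggy() hands the caller back exactly what it passed in, while toy_read_fixed() returns the contents of hw->entry.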
      
      Fixes: df405910 ("net: dsa: sja1105: wait for dynamic config command completion on writes too")
      Reported-by: Yanan Yang <yanan.yang@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7cef293b
    • Vladimir Oltean's avatar
      net: dsa: sja1105: propagate exact error code from sja1105_dynamic_config_poll_valid() · c9567980
      Vladimir Oltean authored
      Currently, sja1105_dynamic_config_wait_complete() returns either 0 or
      -ETIMEDOUT, because it just looks at the read_poll_timeout() return code.
      
      There will be future changes which move some more checks to
      sja1105_dynamic_config_poll_valid(). It is important that we propagate
      their exact return code (-ENOENT, -EINVAL), because callers of
      sja1105_dynamic_config_read() depend on them.
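
A minimal userspace sketch of that idea, assuming a poll callback that returns -EAGAIN while the command is still pending. The toy_* names are hypothetical, and the retry loop only stands in for read_poll_timeout(); it is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical poll callback: 0 = done, -EAGAIN = still pending,
 * any other negative value = hard error to be reported as-is.
 */
typedef int (*toy_poll_fn)(void *ctx);

static int toy_wait_complete(toy_poll_fn poll, void *ctx, int max_tries)
{
	int rc;

	assert(poll != NULL);
	for (int i = 0; i < max_tries; i++) {
		rc = poll(ctx);
		if (rc != -EAGAIN)
			return rc; /* success, or the exact error (-ENOENT, -EINVAL) */
	}
	return -ETIMEDOUT; /* only reported when polling never settled */
}

/* Sample callback: pending for *ctx more calls, then done. */
static int toy_poll_countdown(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return -EAGAIN;
	}
	return 0;
}

/* Sample callback: the entry does not exist. */
static int toy_poll_missing(void *ctx)
{
	(void)ctx;
	return -ENOENT;
}
```

With this shape, a caller can tell "entry not found" apart from "hardware did not answer in time", which a plain 0 / -ETIMEDOUT result cannot express.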

      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9567980
    • Vladimir Oltean's avatar
      net: dsa: sja1105: hide all multicast addresses from "bridge fdb show" · 02c652f5
      Vladimir Oltean authored
      Commit 4d942354 ("net: dsa: sja1105: offload bridge port flags to
      device") has partially hidden some multicast entries from showing up in
      the "bridge fdb show" output, but it wasn't enough. Addresses which are
      added through "bridge mdb add" still show up. Hide them all.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02c652f5
    • Ciprian Regus's avatar
      net:ethernet:adi:adin1110: Fix forwarding offload · 32530dba
      Ciprian Regus authored
      Currently, when a new fdb entry is added (with both ports of the
      ADIN2111 bridged), the driver configures the MAC filters for the wrong
      port, which results in the forwarding being done by the host, and not
      actually hardware offloaded.
      
      The ADIN2111 offloads the forwarding by setting filters on the
      destination MAC address of incoming frames. Based on these, they may be
      routed to the other port. Thus, if a frame has to be forwarded from port
      1 to port 2, the required configuration for the ADDR_FILT_UPRn register
      should set the APPLY2PORT1 bit (instead of APPLY2PORT2, as it's
      currently the case).
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Ciprian Regus <ciprian.regus@analog.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      32530dba
    • Ziyang Xuan's avatar
      hsr: Fix uninit-value access in fill_frame_info() · 484b4833
      Ziyang Xuan authored
      Syzbot reports the following uninit-value access problem.
      
      =====================================================
      BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline]
      BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
       fill_frame_info net/hsr/hsr_forward.c:601 [inline]
       hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
       hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223
       __netdev_start_xmit include/linux/netdevice.h:4889 [inline]
       netdev_start_xmit include/linux/netdevice.h:4903 [inline]
       xmit_one net/core/dev.c:3544 [inline]
       dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560
       __dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340
       dev_queue_xmit include/linux/netdevice.h:3082 [inline]
       packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
       packet_snd net/packet/af_packet.c:3087 [inline]
       packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       sock_sendmsg net/socket.c:753 [inline]
       __sys_sendto+0x781/0xa30 net/socket.c:2176
       __do_sys_sendto net/socket.c:2188 [inline]
       __se_sys_sendto net/socket.c:2184 [inline]
       __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
       do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      
      Uninit was created at:
       slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
       slab_alloc_node mm/slub.c:3478 [inline]
       kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
       kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559
       __alloc_skb+0x318/0x740 net/core/skbuff.c:644
       alloc_skb include/linux/skbuff.h:1286 [inline]
       alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299
       sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794
       packet_alloc_skb net/packet/af_packet.c:2936 [inline]
       packet_snd net/packet/af_packet.c:3030 [inline]
       packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       sock_sendmsg net/socket.c:753 [inline]
       __sys_sendto+0x781/0xa30 net/socket.c:2176
       __do_sys_sendto net/socket.c:2188 [inline]
       __se_sys_sendto net/socket.c:2184 [inline]
       __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
       do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      
      This happens because VLAN is not yet supported in the hsr driver.
      Return an error from fill_frame_info() when the protocol is
      ETH_P_8021Q to fix it.
      
      Fixes: 451d8123 ("net: prp: add packet handling support")
      Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      484b4833
    • David S. Miller's avatar
      Merge branch 'rule_buf-OOB' · 0b9c3914
      David S. Miller authored
      Hangyu Hua says:
      
      ====================
      Fix possible OOB write when using rule_buf
      
      Add bounds checks in bcmasp_netfilt_get_all_active(),
      mvpp2_ethtool_get_rxnfc() and mtk_hwlro_get_fdir_all() when
      using rule_buf from ethtool_get_rxnfc().
      
      v2:
      [PATCH v2 1/3]: use -EMSGSIZE instead of truncating the list silently.
      [PATCH v2 3/3]: drop the brackets.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b9c3914
    • Hangyu Hua's avatar
      net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_hwlro_get_fdir_all() · e4c79810
      Hangyu Hua authored
      rule_locs is allocated in ethtool_get_rxnfc() and its size is determined
      by rule_cnt from user space. So rule_cnt needs to be checked before using
      rule_locs to avoid a NULL pointer dereference.
      
      Fixes: 7aab747e ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4c79810
    • Hangyu Hua's avatar
      net: ethernet: mvpp2_main: fix possible OOB write in mvpp2_ethtool_get_rxnfc() · 51fe0a47
      Hangyu Hua authored
      rules is allocated in ethtool_get_rxnfc() and its size is determined by
      rule_cnt from user space. So rule_cnt needs to be checked before using
      rules to avoid an OOB write or a NULL pointer dereference.
      
      Fixes: 90b509b3 ("net: mvpp2: cls: Add Classification offload support")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Reviewed-by: Marcin Wojtas <mw@semihalf.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51fe0a47
    • Hangyu Hua's avatar
      net: ethernet: bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active() · 9b90aca9
      Hangyu Hua authored
      rule_locs is allocated in ethtool_get_rxnfc() and its size is determined
      by rule_cnt from user space. So rule_cnt needs to be checked before using
      rule_locs to avoid an OOB write or a NULL pointer dereference.
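
The check can be sketched in userspace C as follows. This is a simplified illustration, not the driver code: struct toy_filter and toy_get_all_active() are hypothetical, and returning -EMSGSIZE follows the v2 note in the series cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter table entry. */
struct toy_filter {
	int active;
	uint32_t location;
};

/* Copy the locations of all active filters into rule_locs[], which the
 * caller sized for rule_cnt entries (rule_cnt comes from user space).
 */
static int toy_get_all_active(const struct toy_filter *tbl, size_t n,
			      uint32_t *rule_locs, uint32_t rule_cnt,
			      uint32_t *found)
{
	uint32_t j = 0;

	assert(tbl != NULL && found != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].active)
			continue;
		/* Check before writing: covers both a too-small buffer and
		 * rule_cnt == 0, where rule_locs may be NULL.
		 */
		if (j == rule_cnt)
			return -EMSGSIZE;
		rule_locs[j++] = tbl[i].location;
	}
	*found = j;
	return 0;
}
```

Because the bound is tested before every store, a request with rule_cnt == 0 never touches rule_locs at all.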
      
      Fixes: c5d511c4 ("net: bcmasp: Add support for wake on net filters")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b90aca9
    • Vincent Whitchurch's avatar
      net: stmmac: fix handling of zero coalescing tx-usecs · fa60b816
      Vincent Whitchurch authored
      Setting ethtool -C eth0 tx-usecs 0 is supposed to disable the use of the
      coalescing timer but currently it gets programmed with zero delay
      instead.
      
      Disable the use of the coalescing timer if tx-usecs is zero by
      preventing it from being restarted.  Note that to keep things simple we
      don't start/stop the timer when the coalescing settings are changed, but
      just let that happen on the next transmit or timer expiry.
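
As a rough model of this behaviour, assuming a per-queue timer that is re-armed on transmit and on expiry: struct toy_txq and the toy_* helpers below are hypothetical and only imitate the arming decision, not the driver's hrtimer handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state. */
struct toy_txq {
	unsigned int tx_coal_timer_us; /* value of "ethtool -C <dev> tx-usecs" */
	bool timer_armed;
	unsigned int expires_us;
};

/* Called on transmit: (re)start the timer unless coalescing is disabled. */
static void toy_tx_timer_arm(struct toy_txq *q, unsigned int now_us)
{
	assert(q != NULL);
	if (!q->tx_coal_timer_us)
		return; /* tx-usecs 0: never restart the timer */
	q->timer_armed = true;
	q->expires_us = now_us + q->tx_coal_timer_us;
}

/* Called on expiry: the timer has fired, so it is only armed again
 * if the current setting still allows it.
 */
static void toy_tx_timer_expired(struct toy_txq *q, unsigned int now_us)
{
	assert(q != NULL);
	q->timer_armed = false;
	toy_tx_timer_arm(q, now_us);
}
```

In this model, a timer that is already running when tx-usecs is set to zero is left alone; it fires once more and is then never restarted.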
      
      Fixes: 8fce3331 ("net: stmmac: Rework coalesce timer and fix multi-queue races")
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa60b816
  5. 10 Sep, 2023 7 commits
    • David S. Miller's avatar
      Merge branch 'smc-r-fixes' · 6eadb0b3
      David S. Miller authored
      Guangguan Wang says:
      
      ====================
      Two fixes for SMC-R
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
      6eadb0b3
    • Guangguan Wang's avatar
      net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add · f5146e3e
      Guangguan Wang authored
      While smcr_port_add() runs, a link group may be added to or deleted
      from smc_lgr_list.list at the same time, which may result in a kernel
      crash. So, use smc_lgr_list.lock to protect the iteration over
      smc_lgr_list.list in smcr_port_add().
      
      The crash call trace is shown below:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP NOPTI
      CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G
      Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
      Workqueue: events smc_ib_port_event_work [smc]
      RIP: 0010:smcr_port_add+0xa6/0xf0 [smc]
      RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297
      RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000
      RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918
      R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4
      R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08
      FS:  0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0
      PKRU: 55555554
      Call Trace:
       smc_ib_port_event_work+0x18f/0x380 [smc]
       process_one_work+0x19b/0x340
       worker_thread+0x30/0x370
       ? process_one_work+0x340/0x340
       kthread+0x114/0x130
       ? __kthread_cancel_work+0x50/0x50
       ret_from_fork+0x1f/0x30
      
      Fixes: 1f90a05d ("net/smc: add smcr_port_add() and smcr_link_up() processing")
      Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f5146e3e
    • Guangguan Wang's avatar
      net/smc: bugfix for smcr v2 server connect success statistic · 6912e724
      Guangguan Wang authored
      In the macro SMC_STAT_SERV_SUCC_INC, the smcd_version is used
      to determine whether to increase the v1 statistic or the v2
      statistic. That is correct for SMCD, but for SMCR, smcr_version
      should be used.

      Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6912e724
    • Ratheesh Kannoth's avatar
      octeontx2-pf: Fix page pool cache index corruption. · 88e69af0
      Ratheesh Kannoth authored
      The accesses to the page pool `cache' array and the `count' variable
      are not locked. Page pool cache access is fine as long as there
      is only one consumer per pool.
      
      The octeontx2 driver fills in rx buffers from the page pool in NAPI
      context. If the system is stressed and cannot allocate buffers, the
      refilling work is delegated to a delayed workqueue. This means that
      there are two consumers of the page pool cache.
      
      Either the workqueue or IRQ/NAPI can run on another CPU. This leads
      to lockless access, hence corruption of the page pool cache indexes.
      
      To fix this issue, NAPI is rescheduled from workqueue context to refill
      rx buffers.
      
      Fixes: b2e3406a ("octeontx2-pf: Add support for page pool")
      Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      88e69af0
    • Jinjie Ruan's avatar
      net: microchip: vcap api: Fix possible memory leak for vcap_dup_rule() · 281f65d2
      Jinjie Ruan authored
      When fault injection is used with CONFIG_VCAP_KUNIT_TEST selected, the
      memory leak below occurs. If kzalloc() for duprule succeeds, but a
      subsequent kmemdup() fails, the duprule, ckf and caf memory is leaked.
      So kfree them in the error path.
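
The error-path pattern can be sketched in userspace C. Everything here is a simplified, hypothetical model: struct toy_rule, toy_alloc() and toy_free() are illustration-only names, and the allocator wrapper exists just to inject a failure and count live allocations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fault-injecting allocator. */
static int toy_fail_after = -1; /* -1: never fail; N: fail after N successes */
static int toy_live_allocs;

static void *toy_alloc(size_t n)
{
	void *p;

	if (toy_fail_after == 0)
		return NULL;
	if (toy_fail_after > 0)
		toy_fail_after--;
	p = malloc(n ? n : 1);
	if (p)
		toy_live_allocs++;
	return p;
}

static void toy_free(void *p)
{
	if (!p)
		return;
	toy_live_allocs--;
	free(p);
}

/* Hypothetical rule with two separately allocated parts. */
struct toy_rule {
	int *keys;
	int *actions;
	size_t nkeys, nactions;
};

static void toy_free_rule(struct toy_rule *r)
{
	if (!r)
		return;
	toy_free(r->actions);
	toy_free(r->keys);
	toy_free(r);
}

static struct toy_rule *toy_dup_rule(const struct toy_rule *src)
{
	struct toy_rule *dup;

	assert(src != NULL);
	dup = toy_alloc(sizeof(*dup));
	if (!dup)
		return NULL;
	memset(dup, 0, sizeof(*dup));
	dup->nkeys = src->nkeys;
	dup->nactions = src->nactions;

	dup->keys = toy_alloc(src->nkeys * sizeof(*dup->keys));
	if (!dup->keys)
		goto err;
	memcpy(dup->keys, src->keys, src->nkeys * sizeof(*dup->keys));

	dup->actions = toy_alloc(src->nactions * sizeof(*dup->actions));
	if (!dup->actions)
		goto err;
	memcpy(dup->actions, src->actions, src->nactions * sizeof(*dup->actions));
	return dup;

err:
	/* Release everything allocated so far, including dup itself. */
	toy_free_rule(dup);
	return NULL;
}
```

Setting toy_fail_after = 2 makes the third allocation fail; toy_live_allocs returning to 0 afterwards shows that the partially built copy was released.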
      
      unreferenced object 0xffff122744c50600 (size 192):
        comm "kunit_try_catch", pid 346, jiffies 4294896122 (age 911.812s)
        hex dump (first 32 bytes):
          10 27 00 00 04 00 00 00 1e 00 00 00 2c 01 00 00  .'..........,...
          00 00 00 00 00 00 00 00 18 06 c5 44 27 12 ff ff  ...........D'...
        backtrace:
          [<00000000394b0db8>] __kmem_cache_alloc_node+0x274/0x2f8
          [<0000000001bedc67>] kmalloc_trace+0x38/0x88
          [<00000000b0612f98>] vcap_dup_rule+0x50/0x460
          [<000000005d2d3aca>] vcap_add_rule+0x8cc/0x1038
          [<00000000eef9d0f8>] test_vcap_xn_rule_creator.constprop.0.isra.0+0x238/0x494
          [<00000000cbda607b>] vcap_api_rule_remove_in_front_test+0x1ac/0x698
          [<00000000c8766299>] kunit_try_run_case+0xe0/0x20c
          [<00000000c4fe9186>] kunit_generic_run_threadfn_adapter+0x50/0x94
          [<00000000f6864acf>] kthread+0x2e8/0x374
          [<0000000022e639b3>] ret_from_fork+0x10/0x20
      
      Fixes: 814e7693 ("net: microchip: vcap api: Add a storage state to a VCAP rule")
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      281f65d2
    • Julia Lawall's avatar
      net: bcmasp: add missing of_node_put · e73d1ab6
      Julia Lawall authored
      for_each_available_child_of_node performs an of_node_get
      on each iteration, so a break out of the loop requires an
      of_node_put.
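
The reference-counting rule can be modelled in userspace C. This is a toy sketch with hypothetical names (struct toy_node, toy_next_child(), toy_has_child()); it imitates an iterator that holds a reference on the current child, and is not the device tree API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference-counted node. */
struct toy_node {
	int refcount;
	const char *name;
};

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/* Iterator step: drops the reference on prev, takes one on the next child. */
static struct toy_node *toy_next_child(struct toy_node *kids, size_t n,
				       struct toy_node *prev)
{
	size_t idx = prev ? (size_t)(prev - kids) + 1 : 0;

	toy_put(prev);
	return idx < n ? toy_get(&kids[idx]) : NULL;
}

/* Returns 1 if a child with the given name exists, 0 otherwise. */
static int toy_has_child(struct toy_node *kids, size_t n, const char *name)
{
	struct toy_node *c;
	int found = 0;

	assert(kids != NULL && name != NULL);
	for (c = toy_next_child(kids, n, NULL); c; c = toy_next_child(kids, n, c)) {
		if (strcmp(c->name, name) == 0) {
			found = 1;
			/* Leaving early skips the iterator's own put,
			 * so drop the reference by hand.
			 */
			toy_put(c);
			break;
		}
	}
	return found;
}
```

Without the toy_put() before the break, the matching child would keep a refcount of 1 after the search returns.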
      
      This was done using the Coccinelle semantic patch
      iterators/for_each_child.cocci

      Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e73d1ab6
    • Juntong Deng's avatar
      selftests/net: Improve bind_bhash.sh to accommodate predictable network interface names · ced33ca0
      Juntong Deng authored
      Starting with v197, systemd uses predictable network interface names.
      The traditional interface naming scheme (eth0) is deprecated, so it
      cannot be assumed that the eth0 interface exists on the host.
      
      This modification makes the bind_bhash test program run in a separate
      network namespace, so that it no longer needs to consider the name of
      the network interface on the host.

      Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ced33ca0
  6. 08 Sep, 2023 2 commits
    • Liu Jian's avatar
      net: ipv4: fix one memleak in __inet_del_ifa() · ac28b1ec
      Liu Jian authored
      I got the warning below when doing fuzz testing:
      unregister_netdevice: waiting for bond0 to become free. Usage count = 2
      
      It can be reproduced via:
      
      ip link add bond0 type bond
      sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
      ip addr add 4.117.174.103/0 scope 0x40 dev bond0
      ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
      ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
      ip addr del 4.117.174.103/0 scope 0x40 dev bond0
      ip link delete bond0 type bond
      
      In this reproduction test case, an incorrect 'last_prim' is found in
      __inet_del_ifa(). As a result, the secondary address (0.0.0.4/0 scope 0x40)
      is lost. The memory of the secondary address is leaked, and so are the
      references to in_device and net_device.
      
      Fix this problem by looking for 'last_prim' starting at the location of
      the deleted IP and inserting the promoted IP at the location of 'last_prim'.
      
      Fixes: 0ff60a45 ("[IPV4]: Fix secondary IP addresses after promotion")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac28b1ec
    • Linus Torvalds's avatar
      Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 73be7fb1
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "Including fixes from netfilter and bpf.
      
        Current release - regressions:
      
         - eth: stmmac: fix failure to probe without MAC interface specified
      
        Current release - new code bugs:
      
         - docs: netlink: fix missing classic_netlink doc reference
      
        Previous releases - regressions:
      
         - deal with integer overflows in kmalloc_reserve()
      
         - use sk_forward_alloc_get() in sk_get_meminfo()
      
         - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
      
         - fib: avoid warn splat in flow dissector after packet mangling
      
         - skb_segment: call zero copy functions before using skbuff frags
      
         - eth: sfc: check for zero length in EF10 RX prefix
      
        Previous releases - always broken:
      
         - af_unix: fix msg_controllen test in scm_pidfd_recv() for
           MSG_CMSG_COMPAT
      
         - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
      
         - netfilter:
            - nft_exthdr: fix non-linear header modification
            - xt_u32, xt_sctp: validate user space input
            - nftables: exthdr: fix 4-byte stack OOB write
            - nfnetlink_osf: avoid OOB read
            - one more fix for the garbage collection work from last release
      
         - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
      
         - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
      
         - handshake: fix null-deref in handshake_nl_done_doit()
      
         - ip: ignore dst hint for multipath routes to ensure packets are
           hashed across the nexthops
      
         - phy: micrel:
            - correct bit assignments for cable test errata
            - disable EEE according to the KSZ9477 errata
      
        Misc:
      
         - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
      
         - Revert "net: macsec: preserve ingress frame ordering", it appears
           to have been developed against an older kernel, problem doesn't
           exist upstream"
      
      * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
        net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
        Revert "net: team: do not use dynamic lockdep key"
        net: hns3: remove GSO partial feature bit
        net: hns3: fix the port information display when sfp is absent
        net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
        net: hns3: fix debugfs concurrency issue between kfree buffer and read
        net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
        net: hns3: Support query tx timeout threshold by debugfs
        net: hns3: fix tx timeout issue
        net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
        netfilter: nf_tables: Unbreak audit log reset
        netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
        netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
        netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
        netfilter: nfnetlink_osf: avoid OOB read
        netfilter: nftables: exthdr: fix 4-byte stack OOB write
        selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
        bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
        bpf: bpf_sk_storage: Fix invalid wait context lockdep report
        s390/bpf: Pass through tail call counter in trampolines
        ...
      73be7fb1