1. 22 Nov, 2022 8 commits
  2. 21 Nov, 2022 9 commits
  3. 19 Nov, 2022 11 commits
  4. 18 Nov, 2022 12 commits
    • iavf: Fix race condition between iavf_shutdown and iavf_remove · a8417330
      Slawomir Laba authored
      Fix a deadlock introduced by commit
      97457801 ("iavf: Add waiting so the port is initialized in remove")
      due to a race condition between iavf_shutdown and iavf_remove, where
      iavf_remove gets stuck forever in its while loop because iavf_shutdown
      has already set the __IAVF_REMOVE adapter state.
      
      Fix this by checking if the __IAVF_IN_REMOVE_TASK has already been
      set and return if so.
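
The check described above can be illustrated with a small userspace sketch. It is not the driver's code: `struct demo_adapter`, `demo_teardown()` and the use of a C11 `atomic_flag` to model `__IAVF_IN_REMOVE_TASK` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter; not the driver's real struct. */
struct demo_adapter {
    atomic_flag in_remove_task; /* models __IAVF_IN_REMOVE_TASK */
    int teardown_runs;          /* how many times teardown really ran */
};

/*
 * Models the shared entry of the shutdown/remove paths: whoever sets the
 * flag first does the work, any later caller returns at once instead of
 * waiting in a loop for a state that will never come.
 * Returns true if this caller performed the teardown.
 */
static bool demo_teardown(struct demo_adapter *a)
{
    if (atomic_flag_test_and_set(&a->in_remove_task))
        return false; /* already claimed by the other path */

    a->teardown_runs++;
    return true;
}
```

The point of the atomic test-and-set is that checking the flag and claiming it happen in one step, so the two paths cannot both proceed.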
      
      Fixes: 97457801 ("iavf: Add waiting so the port is initialized in remove")
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Marek Szlosek <marek.szlosek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: remove INITIAL_MAC_SET to allow gARP to work properly · bb861c14
      Stefan Assmann authored
      IAVF_FLAG_INITIAL_MAC_SET prevents waiting on iavf_is_mac_set_handled()
      the first time the MAC is set. This breaks gratuitous ARP because the
      MAC address has not been updated yet when the gARP packet is sent out.
      
      Current behaviour:
      $ echo 1 > /sys/class/net/ens4f0/device/sriov_numvfs
      iavf 0000:88:02.0: MAC address: ee:04:19:14:ec:ea
      $ ip addr add 192.168.1.1/24 dev ens4f0v0
      $ ip link set dev ens4f0v0 up
      $ echo 1 > /proc/sys/net/ipv4/conf/ens4f0v0/arp_notify
      $ ip link set ens4f0v0 addr 00:11:22:33:44:55
      07:23:41.676611 ee:04:19:14:ec:ea > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.1, length 28
      
      With IAVF_FLAG_INITIAL_MAC_SET removed:
      $ echo 1 > /sys/class/net/ens4f0/device/sriov_numvfs
      iavf 0000:88:02.0: MAC address: 3e:8a:16:a2:37:6d
      $ ip addr add 192.168.1.1/24 dev ens4f0v0
      $ ip link set dev ens4f0v0 up
      $ echo 1 > /proc/sys/net/ipv4/conf/ens4f0v0/arp_notify
      $ ip link set ens4f0v0 addr 00:11:22:33:44:55
      07:28:01.836608 00:11:22:33:44:55 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.1, length 28
      
      Fixes: 35a2443d ("iavf: Add waiting for response from PF in set mac")
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Do not restart Tx queues after reset task failure · 08f1c147
      Ivan Vecera authored
      After commit aa626da9 ("iavf: Detach device during reset task")
      the device is detached during reset task and re-attached at its end.
      The problem occurs when reset task fails because Tx queues are
      restarted during device re-attach and this leads later to a crash.
      
      To resolve this issue, properly close the net device in case of a
      failure in the reset task to avoid restarting the Tx queues at the end.
      Also replace the hacky manipulation of the IFF_UP flag with a device
      close that properly clears both the IFF_UP and __LINK_STATE_START flags.
      In this case iavf_close() does not do anything because the adapter
      state is already __IAVF_DOWN.
      
      Reproducer:
      1) Run some Tx traffic (e.g. iperf3) over iavf interface
      2) Set VF trusted / untrusted in loop
      
      [root@host ~]# cat repro.sh
      
      PF=enp65s0f0
      IF=${PF}v0
      
      ip link set up $IF
      ip addr add 192.168.0.2/24 dev $IF
      sleep 1
      
      iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
      sleep 2
      
      while :; do
              ip link set $PF vf 0 trust on
              ip link set $PF vf 0 trust off
      done
      [root@host ~]# ./repro.sh
      
      Result:
      [ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53
      [ 2006.675662] ice 0000:41:00.0: VF 0 is now trusted
      [ 2006.689997] iavf 0000:41:01.0: Reset task did not complete, VF disabled
      [ 2006.696611] iavf 0000:41:01.0: failed to allocate resources during reinit
      [ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted
      [ 2006.737011] ice 0000:41:00.0: VF 0 is now trusted
      [ 2006.764536] ice 0000:41:00.0: VF 0 is now untrusted
      [ 2006.768919] BUG: kernel NULL pointer dereference, address: 0000000000000b4a
      [ 2006.776358] #PF: supervisor read access in kernel mode
      [ 2006.781488] #PF: error_code(0x0000) - not-present page
      [ 2006.786620] PGD 0 P4D 0
      [ 2006.789152] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted
      [ 2006.793501] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2
      [ 2006.805668] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
      [ 2006.815915] RIP: 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf]
      [ 2006.821028] ice 0000:41:00.0: VF 0 is now untrusted
      [ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00
      [ 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293
      [ 2006.850397] RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00
      [ 2006.856327] ice 0000:41:00.0: VF 0 is now trusted
      [ 2006.857523] RDX: 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0
      [ 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09: 0000000000000001
      [ 2006.857526] R10: ffff9d154de45940 R11: 0000000000000000 R12: 0000000000000002
      [ 2006.883600] R13: ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380
      [ 2006.885840] ice 0000:41:00.0: VF 0 is now untrusted
      [ 2006.890725] FS:  0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
      [ 2006.890727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4: 0000000000770ee0
      [ 2006.916543] PKRU: 55555554
      [ 2006.918254] ice 0000:41:00.0: VF 0 is now trusted
      [ 2006.919248] Call Trace:
      [ 2006.919250]  <IRQ>
      [ 2006.919252]  dev_hard_start_xmit+0x9e/0x1f0
      [ 2006.932587]  sch_direct_xmit+0xa0/0x370
      [ 2006.936424]  __dev_queue_xmit+0x7af/0xd00
      [ 2006.940429]  ip_finish_output2+0x26c/0x540
      [ 2006.944519]  ip_output+0x71/0x110
      [ 2006.947831]  ? __ip_finish_output+0x2b0/0x2b0
      [ 2006.952180]  __ip_queue_xmit+0x16d/0x400
      [ 2006.952721] ice 0000:41:00.0: VF 0 is now untrusted
      [ 2006.956098]  __tcp_transmit_skb+0xa96/0xbf0
      [ 2006.965148]  __tcp_retransmit_skb+0x174/0x860
      [ 2006.969499]  ? cubictcp_cwnd_event+0x40/0x40
      [ 2006.973769]  tcp_retransmit_skb+0x14/0xb0
      ...
      
      Fixes: aa626da9 ("iavf: Detach device during reset task")
      Cc: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
      Cc: SlawomirX Laba <slawomirx.laba@intel.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Fix a crash during reset task · c678669d
      Ivan Vecera authored
      Recent commit aa626da9 ("iavf: Detach device during reset task")
      removed netif_tx_stop_all_queues() with an assumption that Tx queues
      are already stopped by netif_device_detach() in the beginning of
      reset task. This assumption is incorrect because during reset
      task a potential link event can start Tx queues again.
      Revert this change to fix this issue.
      
      Reproducer:
      1. Run some Tx traffic (e.g. iperf3) over iavf interface
      2. Switch MTU of this interface in a loop
      
      [root@host ~]# cat repro.sh
      
      IF=enp2s0f0v0
      
      iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
      sleep 2
      
      while :; do
              for i in 1280 1500 2000 900 ; do
                      ip link set $IF mtu $i
                      sleep 2
              done
      done
      [root@host ~]# ./repro.sh
      
      Result:
      [  306.199917] iavf 0000:02:02.0 enp2s0f0v0: NIC Link is Up Speed is 40 Gbps Full Duplex
      [  308.205944] iavf 0000:02:02.0 enp2s0f0v0: NIC Link is Up Speed is 40 Gbps Full Duplex
      [  310.103223] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [  310.110179] #PF: supervisor write access in kernel mode
      [  310.115396] #PF: error_code(0x0002) - not-present page
      [  310.120526] PGD 0 P4D 0
      [  310.123057] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [  310.127408] CPU: 24 PID: 183 Comm: kworker/u64:9 Kdump: loaded Not tainted 6.1.0-rc3+ #2
      [  310.135485] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
      [  310.145728] Workqueue: iavf iavf_reset_task [iavf]
      [  310.150520] RIP: 0010:iavf_xmit_frame_ring+0xd1/0xf70 [iavf]
      [  310.156180] Code: d0 0f 86 da 00 00 00 83 e8 01 0f b7 fa 29 f8 01 c8 39 c6 0f 8f a0 08 00 00 48 8b 45 20 48 8d 14 92 bf 01 00 00 00 4c 8d 3c d0 <49> 89 5f 08 8b 43 70 66 41 89 7f 14 41 89 47 10 f6 83 82 00 00 00
      [  310.174918] RSP: 0018:ffffbb5f0082caa0 EFLAGS: 00010293
      [  310.180137] RAX: 0000000000000000 RBX: ffff92345471a6e8 RCX: 0000000000000200
      [  310.187259] RDX: 0000000000000000 RSI: 000000000000000d RDI: 0000000000000001
      [  310.194385] RBP: ffff92341d249000 R08: ffff92434987fcac R09: 0000000000000001
      [  310.201509] R10: 0000000011f683b9 R11: 0000000011f50641 R12: 0000000000000008
      [  310.208631] R13: ffff923447500000 R14: 0000000000000000 R15: 0000000000000000
      [  310.215756] FS:  0000000000000000(0000) GS:ffff92434ee00000(0000) knlGS:0000000000000000
      [  310.223835] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  310.229572] CR2: 0000000000000008 CR3: 0000000fbc210004 CR4: 0000000000770ee0
      [  310.236696] PKRU: 55555554
      [  310.239399] Call Trace:
      [  310.241844]  <IRQ>
      [  310.243855]  ? dst_alloc+0x5b/0xb0
      [  310.247260]  dev_hard_start_xmit+0x9e/0x1f0
      [  310.251439]  sch_direct_xmit+0xa0/0x370
      [  310.255276]  __qdisc_run+0x13e/0x580
      [  310.258848]  __dev_queue_xmit+0x431/0xd00
      [  310.262851]  ? selinux_ip_postroute+0x147/0x3f0
      [  310.267377]  ip_finish_output2+0x26c/0x540
      
      Fixes: aa626da9 ("iavf: Detach device during reset task")
      Cc: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
      Cc: SlawomirX Laba <slawomirx.laba@intel.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • netfilter: nf_tables: do not set up extensions for end interval · 33c7aba0
      Pablo Neira Ayuso authored
      Elements with the end interval flag set do not store extensions. The
      global set definition currently sets up the timeout and stateful
      expression for end interval elements anyway.

      This leads to end interval elements being skipped in the
      set->ops->walk() path, because the expired check wrongly reports true.

      Moreover, do not set up stateful expressions for elements with the end
      interval flag set, since they are never used.
      
      Fixes: 65038428 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
      Fixes: 8d8540c4 ("netfilter: nft_set_rbtree: add timeout support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: Fix data-races around ct mark · 52d1aa8b
      Daniel Xu authored
      nf_conn:mark can be read from and written to in parallel. Use
      READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted
      compiler optimizations.
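
As a rough illustration of the pattern, the sketch below uses simplified userspace approximations of the kernel macros. `DEMO_READ_ONCE`, `DEMO_WRITE_ONCE`, `struct demo_conn` and the accessor functions are hypothetical names, the macros assume scalar types and a GCC/Clang compiler (`__typeof__`), and the real kernel definitions are more elaborate.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified approximations of the kernel macros (scalar types only). */
#define DEMO_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for a conntrack entry with a mark field. */
struct demo_conn {
    uint32_t mark;
};

/* Single, untorn load: the compiler may not re-read or cache the value. */
static uint32_t demo_get_mark(struct demo_conn *ct)
{
    return DEMO_READ_ONCE(ct->mark);
}

/* Single store, issued only when the value actually changes. */
static void demo_set_mark(struct demo_conn *ct, uint32_t mark)
{
    if (DEMO_READ_ONCE(ct->mark) != mark)
        DEMO_WRITE_ONCE(ct->mark, mark);
}
```

The volatile access only constrains the compiler; it does not add ordering between different variables, which matches the commit's stated goal of preventing unwanted compiler optimizations.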
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: pch_gbe: fix potential memleak in pch_gbe_tx_queue() · 2360f9b8
      Wang Hai authored
      In pch_gbe_xmit_frame(), NETDEV_TX_OK will be returned whether
      pch_gbe_tx_queue() sends data successfully or not, so pch_gbe_tx_queue()
      needs to free skb before returning. But pch_gbe_tx_queue() returns without
      freeing skb when dma_map_single() fails. Add dev_kfree_skb_any()
      to fix it.
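
The ownership rule behind this fix can be sketched in plain C. All names here (`demo_buf`, `demo_tx_queue()`, `demo_xmit_frame()`, the `map_ok` parameter that simulates the outcome of the DMA mapping) are hypothetical; this is a model of the rule, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_buf {
    void *data;
};

static int demo_live_bufs; /* allocated but not yet freed buffers */

static struct demo_buf *demo_alloc_buf(void)
{
    struct demo_buf *b = malloc(sizeof(*b));

    if (b) {
        b->data = NULL;
        demo_live_bufs++;
    }
    return b;
}

static void demo_free_buf(struct demo_buf *b)
{
    free(b);
    demo_live_bufs--;
}

/* Models the queueing helper: it owns buf on every path. */
static void demo_tx_queue(struct demo_buf *buf, bool map_ok)
{
    if (!map_ok) {
        /* Mapping failed. The caller still reports success, so nobody
         * else will free the buffer: release it here. */
        demo_free_buf(buf);
        return;
    }
    /* Success path: real hardware would hold the buffer until the send
     * completes; the demo simply releases it right away. */
    demo_free_buf(buf);
}

/* Models the xmit entry point, which returns "OK" (0) unconditionally. */
static int demo_xmit_frame(struct demo_buf *buf, bool map_ok)
{
    demo_tx_queue(buf, map_ok);
    return 0;
}
```

Because the caller's return value tells the stack that the buffer was consumed, every exit from the helper has to dispose of it, including the error exit.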
      
      Fixes: 77555ee7 ("net: Add Gigabit Ethernet driver of Topcliff PCH")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc/nci: fix race with opening and closing · 0ad6bded
      Lin Ma authored
      Previously we leveraged NCI_UNREG and the lock inside nci_close_device to
      prevent the race condition between opening a device and closing a
      device. However, a problem remains: a failed open command erases the
      NCI_UNREG flag and allows another open command to bypass the status
      check.
      
      This fix corrects that by making sure the NCI_UNREG is held.
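
A minimal sketch of the idea, assuming the flags live in a single bitmask word: on a failed open, clear everything except the unregister bit. The `DEMO_*` bit positions and the helper names are illustrative and not taken from the NCI code.

```c
#include <assert.h>

/* Illustrative bit positions; the real NCI flag values may differ. */
#define DEMO_NCI_UP    (1UL << 0)
#define DEMO_NCI_INIT  (1UL << 1)
#define DEMO_NCI_UNREG (1UL << 2)

/* Failure path before the fix: wipes every flag, including UNREG. */
static unsigned long demo_fail_open_buggy(unsigned long flags)
{
    (void)flags;
    return 0;
}

/* Failure path after the fix: drop all state bits but keep UNREG. */
static unsigned long demo_fail_open_fixed(unsigned long flags)
{
    return flags & DEMO_NCI_UNREG;
}

/* Status check done by an open attempt: refuse once UNREG is set. */
static int demo_may_open(unsigned long flags)
{
    return !(flags & DEMO_NCI_UNREG);
}
```

With the buggy variant a second open passes the status check after a failed first attempt, while the fixed variant keeps rejecting it.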
      
      Reported-by: syzbot+43475bf3cfbd6e41f5b7@syzkaller.appspotmail.com
      Fixes: 48b71a9e ("NFC: add NCI_UNREG flag to eliminate the race")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus · 24deec6b
      Vladimir Oltean authored
      You'd think people know that the internal 100BASE-TX PHY on the SJA1110
      responds only to clause 22 MDIO transactions, but they don't :)
      
      When a clause 45 transaction is attempted, sja1105_base_tx_mdio_read()
      and sja1105_base_tx_mdio_write() don't expect "reg" to contain bit 30
      set (MII_ADDR_C45) and pack this value into the SPI transaction buffer.
      
      But the field in the SPI buffer has a width smaller than 30 bits, so we
      see this confusing message from the packing() API rather than a proper
      rejection of C45 transactions:
      
      Call trace:
       dump_stack+0x1c/0x38
       sja1105_pack+0xbc/0xc0 [sja1105]
       sja1105_xfer+0x114/0x2b0 [sja1105]
       sja1105_xfer_u32+0x44/0xf4 [sja1105]
       sja1105_base_tx_mdio_read+0x44/0x7c [sja1105]
       mdiobus_read+0x44/0x80
       get_phy_c45_ids+0x70/0x234
       get_phy_device+0x68/0x15c
       fwnode_mdiobus_register_phy+0x74/0x240
       of_mdiobus_register+0x13c/0x380
       sja1105_mdiobus_register+0x368/0x490 [sja1105]
       sja1105_setup+0x94/0x119c [sja1105]
      Cannot store 401d2405 inside bits 24-4 (would truncate)
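
A "proper rejection" of such a transaction might look like the following sketch. `DEMO_MII_ADDR_C45` mirrors the bit 30 flag named in the message; the function name, the `-EOPNOTSUPP` error code and the placeholder register access are assumptions for illustration, not the driver's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit 30 marks a clause 45 access, as described in the commit message. */
#define DEMO_MII_ADDR_C45 (UINT32_C(1) << 30)

/*
 * Hypothetical read accessor for a clause 22 only bus: refuse clause 45
 * requests up front instead of packing an oversized value into a narrow
 * field of the transfer buffer.
 */
static int demo_base_tx_mdio_read(int phy, uint32_t reg)
{
    (void)phy;

    if (reg & DEMO_MII_ADDR_C45)
        return -EOPNOTSUPP;

    /* Placeholder for the real register access: echo the 5-bit
     * clause 22 register number. */
    return (int)(reg & 0x1f);
}
```

The value 401d2405 from the message above has bit 30 set, so this check would turn it away before any packing is attempted.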
      
      Fixes: 5a8f0974 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975] · 3bcd6c7e
      David Howells authored
      After rxrpc_unbundle_conn() has removed a connection from a bundle, it
      checks to see if there are any conns with available channels and, if not,
      removes and attempts to destroy the bundle.
      
      Whilst it does check after grabbing client_bundles_lock that there are no
      connections attached, this races with rxrpc_look_up_bundle() retrieving the
      bundle without yet attaching a connection, since the connection is
      attached later.
      
      There is therefore a window in which the bundle can get destroyed before we
      manage to attach a new connection to it.
      
      Fix this by adding an "active" counter to struct rxrpc_bundle:
      
       (1) rxrpc_connect_call() obtains an active count by prepping/looking up a
           bundle and ditches it before returning.
      
       (2) If, during rxrpc_connect_call(), a connection is added to the bundle,
           this obtains an active count, which is held until the connection is
           discarded.
      
       (3) rxrpc_deactivate_bundle() is created to drop an active count on a
           bundle and destroy it when the active count reaches 0.  The active
           count is checked inside client_bundles_lock() to prevent a race with
           rxrpc_look_up_bundle().
      
       (4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle().
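
The four steps above can be modelled in userspace as follows. This is a simplified sketch, not the rxrpc code: `struct demo_bundle`, the helper names, the `destroyed` flag (standing in for removal from the lookup tree) and the pthread mutex (standing in for client_bundles_lock) are all assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical bundle: "active" counts lookups in flight plus attached
 * connections; "destroyed" stands in for removal from the lookup tree. */
struct demo_bundle {
    int active;
    bool destroyed;
};

static pthread_mutex_t demo_bundles_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup side: take an active count under the lock, so the bundle cannot
 * be torn down between being found and a connection being attached. */
static bool demo_bundle_get(struct demo_bundle *b)
{
    bool ok;

    pthread_mutex_lock(&demo_bundles_lock);
    ok = !b->destroyed;
    if (ok)
        b->active++;
    pthread_mutex_unlock(&demo_bundles_lock);
    return ok;
}

/* Release side: drop a count and destroy only when it reaches zero. The
 * check happens inside the same lock that the lookup takes. */
static void demo_bundle_deactivate(struct demo_bundle *b)
{
    pthread_mutex_lock(&demo_bundles_lock);
    if (--b->active == 0)
        b->destroyed = true;
    pthread_mutex_unlock(&demo_bundles_lock);
}
```

Since the increment and the zero check are serialized by one lock, a lookup either sees a live bundle and pins it, or sees that it is already gone.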
      
      Fixes: 245500d8 ("rxrpc: Rewrite the client connection manager")
      Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: zdi-disclosures@trendmicro.com
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: fix missing xdp_dummy · 302e57f8
      Wang Yufen authored
      After commit afef88e6 ("selftests/bpf: Store BPF object files with
      .bpf.o extension"), we should use xdp_dummy.bpf.o instead of xdp_dummy.o.
      
      In addition, use the BPF_FILE variable to save the BPF object file name,
      which can be better identified and modified.
      
      Fixes: afef88e6 ("selftests/bpf: Store BPF object files with .bpf.o extension")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Cc: Daniel Müller <deso@posteo.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: hold lower dev to avoid possible use-after-free · 40b9d1ab
      Mahesh Bandewar authored
      Recently syzkaller discovered an issue where the lower device
      disappears (NETDEV_UNREGISTER) while a virtual device (like
      macvlan) still has it as its lower device. It is just a matter
      of time before a similar discovery is made for the IPvlan
      device setup, so fix it preemptively. Also, while at it,
      add a refcount tracker.
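
The general idea of holding the lower device can be sketched with a plain reference count. The types and helpers below (`demo_netdev`, `demo_dev_hold()`, `demo_dev_put()`, `demo_ipvlan_*`) are hypothetical, use a non-atomic counter for brevity, and do not model the kernel's refcount tracker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lower device with a simple, non-atomic reference count. */
struct demo_netdev {
    int refcnt;
    int freed; /* set once the last reference is dropped */
};

static void demo_dev_hold(struct demo_netdev *d)
{
    d->refcnt++;
}

static void demo_dev_put(struct demo_netdev *d)
{
    if (--d->refcnt == 0)
        d->freed = 1;
}

/* Hypothetical virtual device that points at a lower device. */
struct demo_ipvlan {
    struct demo_netdev *lower;
};

/* Take a reference for as long as the pointer is stored. */
static void demo_ipvlan_init(struct demo_ipvlan *v, struct demo_netdev *lower)
{
    v->lower = lower;
    demo_dev_hold(lower);
}

/* Drop the reference only when the virtual device lets go of the pointer. */
static void demo_ipvlan_uninit(struct demo_ipvlan *v)
{
    demo_dev_put(v->lower);
    v->lower = NULL;
}
```

As long as the virtual device holds its reference, an unregister of the lower device cannot free it while the stored pointer is still in use.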
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>