1. 24 Aug, 2023 7 commits
    • Merge tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b5cc3833
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wifi, can and netfilter.
      
        Fixes to fixes:
      
         - nf_tables:
             - GC transaction race with abort path
             - defer gc run if previous batch is still pending
      
        Previous releases - regressions:
      
         - ipv4: fix data-races around inet->inet_id
      
         - phy: fix deadlocking in phy_error() invocation
      
         - mdio: fix C45 read/write protocol
      
         - ipvlan: fix a reference count leak warning in ipvlan_ns_exit()
      
         - ice: fix NULL pointer deref during VF reset
      
         - i40e: fix potential NULL pointer dereferencing of pf->vf in
           i40e_sync_vsi_filters()
      
         - tg3: use slab_build_skb() when needed
      
         - mtk_eth_soc: fix NULL pointer on hw reset
      
        Previous releases - always broken:
      
         - core: validate veth and vxcan peer ifindexes
      
         - sched: fix a qdisc modification with ambiguous command request
      
         - devlink: add missing unregister linecard notification
      
         - wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
      
         - batman:
            - do not get eth header before batadv_check_management_packet
            - fix batadv_v_ogm_aggr_send memory leak
      
         - bonding: fix macvlan over alb bond support
      
         - mlxsw: set time stamp fields also when its type is MIRROR_UTC"
      
      * tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
        selftests: bonding: add macvlan over bond testing
        selftest: bond: add new topo bond_topo_2d1c.sh
        bonding: fix macvlan over alb bond support
        rtnetlink: Reject negative ifindexes in RTM_NEWLINK
        netfilter: nf_tables: defer gc run if previous batch is still pending
        netfilter: nf_tables: fix out of memory error handling
        netfilter: nf_tables: use correct lock to protect gc_list
        netfilter: nf_tables: GC transaction race with abort path
        netfilter: nf_tables: flush pending destroy work before netlink notifier
        netfilter: nf_tables: validate all pending tables
        ibmveth: Use dcbf rather than dcbfl
        i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters()
        net/sched: fix a qdisc modification with ambiguous command request
        igc: Fix the typo in the PTM Control macro
        batman-adv: Hold rtnl lock during MTU update via netlink
        igb: Avoid starting unnecessary workqueues
        can: raw: add missing refcount for memory leak fix
        can: isotp: fix support for transmission of SF without flow control
        bnx2x: new flag for track HW resource allocation
        sfc: allocate a big enough SKB for loopback selftest packet
        ...
    • Merge tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 8938fc0c
      Paolo Abeni authored
      Florian Westphal says:
      
      ====================
      netfilter updates for net
      
      This PR contains nf_tables updates for your *net* tree.
      
      First patch fixes table validation, I broke this in 6.4 when tracking
      validation state per table, reported by Pablo, fixup from myself.
      
      Second patch makes sure objects waiting for memory release have been
      released, this was broken in 6.1, patch from Pablo Neira Ayuso.
      
      Patch three is a fix-for-fix from previous PR: In case a transaction
      gets aborted, gc sequence counter needs to be incremented so pending
      gc requests are invalidated, from Pablo.
      
      Same for patch 4: gc list needs to use gc list lock, not destroy lock,
      also from Pablo.
      
      Patch 5 fixes a UaF in a set backend, but this should only occur when
      failslab is enabled for GFP_KERNEL allocations, broken since feature
      was added in 5.6, from myself.
      
      Patch 6 fixes a double-free bug that was also added via previous PR:
      We must not schedule gc work if the previous batch is still queued.
      
      netfilter pull request 2023-08-23
      
      * tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: defer gc run if previous batch is still pending
        netfilter: nf_tables: fix out of memory error handling
        netfilter: nf_tables: use correct lock to protect gc_list
        netfilter: nf_tables: GC transaction race with abort path
        netfilter: nf_tables: flush pending destroy work before netlink notifier
        netfilter: nf_tables: validate all pending tables
      ====================
      
      Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'fix-macvlan-over-alb-bond-support' · b251610c
      Paolo Abeni authored
      Hangbin Liu says:
      
      ====================
      fix macvlan over alb bond support
      
      Currently, the macvlan over alb bond is broken after commit
      14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds").
      Fix this and add related tests.
      ====================
      
      Link: https://lore.kernel.org/r/20230823071907.3027782-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: bonding: add macvlan over bond testing · 246af950
      Hangbin Liu authored
      Add a macvlan over bonding test with mode active-backup, balance-tlb
      and balance-alb.
      
      ]# ./bond_macvlan.sh
      TEST: active-backup: IPv4: client->server                           [ OK ]
      TEST: active-backup: IPv6: client->server                           [ OK ]
      TEST: active-backup: IPv4: client->macvlan_1                        [ OK ]
      TEST: active-backup: IPv6: client->macvlan_1                        [ OK ]
      TEST: active-backup: IPv4: client->macvlan_2                        [ OK ]
      TEST: active-backup: IPv6: client->macvlan_2                        [ OK ]
      TEST: active-backup: IPv4: macvlan_1->macvlan_2                     [ OK ]
      TEST: active-backup: IPv6: macvlan_1->macvlan_2                     [ OK ]
      TEST: active-backup: IPv4: server->client                           [ OK ]
      TEST: active-backup: IPv6: server->client                           [ OK ]
      TEST: active-backup: IPv4: macvlan_1->client                        [ OK ]
      TEST: active-backup: IPv6: macvlan_1->client                        [ OK ]
      TEST: active-backup: IPv4: macvlan_2->client                        [ OK ]
      TEST: active-backup: IPv6: macvlan_2->client                        [ OK ]
      TEST: active-backup: IPv4: macvlan_2->macvlan_2                     [ OK ]
      TEST: active-backup: IPv6: macvlan_2->macvlan_2                     [ OK ]
      [...]
      TEST: balance-alb: IPv4: client->server                             [ OK ]
      TEST: balance-alb: IPv6: client->server                             [ OK ]
      TEST: balance-alb: IPv4: client->macvlan_1                          [ OK ]
      TEST: balance-alb: IPv6: client->macvlan_1                          [ OK ]
      TEST: balance-alb: IPv4: client->macvlan_2                          [ OK ]
      TEST: balance-alb: IPv6: client->macvlan_2                          [ OK ]
      TEST: balance-alb: IPv4: macvlan_1->macvlan_2                       [ OK ]
      TEST: balance-alb: IPv6: macvlan_1->macvlan_2                       [ OK ]
      TEST: balance-alb: IPv4: server->client                             [ OK ]
      TEST: balance-alb: IPv6: server->client                             [ OK ]
      TEST: balance-alb: IPv4: macvlan_1->client                          [ OK ]
      TEST: balance-alb: IPv6: macvlan_1->client                          [ OK ]
      TEST: balance-alb: IPv4: macvlan_2->client                          [ OK ]
      TEST: balance-alb: IPv6: macvlan_2->client                          [ OK ]
      TEST: balance-alb: IPv4: macvlan_2->macvlan_2                       [ OK ]
      TEST: balance-alb: IPv6: macvlan_2->macvlan_2                       [ OK ]
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftest: bond: add new topo bond_topo_2d1c.sh · 27aa43f8
      Hangbin Liu authored
      Add a new testing topo bond_topo_2d1c.sh which is used more commonly.
      Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the
      extra link.
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • bonding: fix macvlan over alb bond support · e74216b8
      Hangbin Liu authored
      Commit 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode
      bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
      the current rlb bond mode only handles ARP packets to update remote neighbor
      entries. This causes an issue when a macvlan is on top of the bond, and
      remote devices send packets to the macvlan using the bond's MAC address
      as the destination. After the packets are delivered to the macvlan, the
      macvlan rejects them because the MAC address is incorrect. Consequently,
      that commit makes macvlan over bond non-functional.
      
      To address this problem, one potential solution is to check for the presence
      of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
      and return NULL in the rlb_arp_xmit() function. However, this approach
      doesn't fully resolve the situation when a VLAN exists between the bond and
      macvlan.
      
      So let's just do a partial revert of commit 14af9963 in rlb_arp_xmit().
      As the comment says: don't modify or load balance ARPs that do not originate
      locally.
      
      Fixes: 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds")
      Reported-by: susan.zheng@veritas.com
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rtnetlink: Reject negative ifindexes in RTM_NEWLINK · 30188bd7
      Ido Schimmel authored
      Negative ifindexes are illegal, but the kernel does not validate the
      ifindex in the ancillary header of RTM_NEWLINK messages, resulting in
      the kernel generating a warning [1] when such an ifindex is specified.
      
      Fix by rejecting negative ifindexes.
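
The check described above can be sketched in plain C. This is a minimal userspace illustration, not the kernel patch itself: `struct link_request` and `validate_link_request()` are hypothetical names standing in for the ifindex field of the RTM_NEWLINK ancillary header and the validation step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ifindex carried in an RTM_NEWLINK request. */
struct link_request {
    int32_t ifindex; /* assumption: 0 or positive values are acceptable */
};

/* Returns 0 if the request may proceed, or -EINVAL for a negative ifindex. */
int validate_link_request(const struct link_request *req)
{
    assert(req != NULL);
    if (req->ifindex < 0)
        return -EINVAL;
    return 0;
}
```

Rejecting the value up front means the later index-reservation step never sees a negative number, so the warning path is not reached.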
      
      [1]
      WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593
      [...]
      Call Trace:
       <TASK>
       register_netdevice+0x69a/0x1490 net/core/dev.c:10081
       br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552
       rtnl_newlink_create net/core/rtnetlink.c:3471 [inline]
       __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688
       rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701
       rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427
       netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545
       netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
       netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368
       netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910
       sock_sendmsg_nosec net/socket.c:728 [inline]
       sock_sendmsg+0xd9/0x180 net/socket.c:751
       ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538
       ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592
       __sys_sendmsg+0x117/0x1e0 net/socket.c:2621
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 38f7b870 ("[RTNETLINK]: Link creation API")
      Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 23 Aug, 2023 20 commits
  3. 22 Aug, 2023 9 commits
  4. 21 Aug, 2023 4 commits
    • of/platform: increase refcount of fwnode · 7882541c
      Peng Fan authored
      Commit 0f8e5651
      ("of/platform: Propagate firmware node by calling device_set_node()")
      uses of_fwnode_handle() in place of of_node_get(), which has the side
      effect that the refcount is no longer increased. As a result, the
      out-of-tree jailhouse hypervisor enable/disable test triggers a kernel
      dump in of_overlay_remove() with the following sequence:
      "
         of_changeset_revert(&overlay_changeset);
         of_changeset_destroy(&overlay_changeset);
         of_overlay_remove(&overlay_id);
      "
      
      So increase the refcount to avoid issues.
      
      This patch also releases the refcount when releasing the amba device to
      avoid a refcount leak.
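
The difference between borrowing a node pointer and taking a reference on it can be shown with a toy model. This is only a sketch under stated assumptions: `struct toy_node` and the `toy_node_*()` helpers are hypothetical and do not reproduce the kernel's device tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node with a reference count; a stand-in for a device tree node. */
struct toy_node {
    int refcount;
};

/* Borrow: hands back the same pointer without touching the count. */
struct toy_node *toy_node_borrow(struct toy_node *n)
{
    return n;
}

/* Get: takes a reference that the holder must later drop. */
struct toy_node *toy_node_get(struct toy_node *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Put: drops one reference; returns 1 when the count reaches zero. */
int toy_node_put(struct toy_node *n)
{
    assert(n != NULL && n->refcount > 0);
    return --n->refcount == 0;
}
```

A holder that only borrows the pointer and later calls the put helper drops a reference it never took, which is the kind of imbalance the commit message describes.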
      
      Fixes: 0f8e5651 ("of/platform: Propagate firmware node by calling device_set_node()")
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Peng Fan <peng.fan@nxp.com>
      Link: https://lore.kernel.org/r/20230821023928.3324283-2-peng.fan@oss.nxp.com
      Signed-off-by: Rob Herring <robh@kernel.org>
    • ice: Fix NULL pointer deref during VF reset · 67f6317d
      Petr Oros authored
      During a stress test that attached and detached a VF from KVM while
      simultaneously changing the VF's spoofcheck and trust settings, there was
      a NULL pointer dereference in ice_reset_vf() because the VF's VSI was NULL.
      
      More than one instance of ice_reset_vf() can be running at a given
      time. When we rebuild the VSI in ice_reset_vf(), another reset can be
      triggered from ice_service_task. In this case we can access the currently
      uninitialized VSI and cause a panic. The window for this race condition
      has been around for a long time, but it is much worse after commit
      227bf450 ("ice: move VSI delete outside deconfig") because
      the reset runs faster. ice_reset_vf() uses vf->cfg_lock, and taking
      this lock before accessing the VF VSI fixes the BUG in all cases.
      
      Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
      in ice_vsi_stop_all_rx_rings()
      
      With our reproducer, we can hit BUG:
      ~8h before commit 227bf450 ("ice: move VSI delete outside deconfig").
      ~20m after commit 227bf450 ("ice: move VSI delete outside deconfig").
      After this fix we are not able to reproduce it even after ~48h.
      
      There was commit cf90b743 ("ice: Fix call trace with null VSI during
      VF reset") which also tried to fix this issue, but it was only
      partially resolved and the bug still exists.
      
      [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 6420.665382] #PF: supervisor read access in kernel mode
      [ 6420.670521] #PF: error_code(0x0000) - not-present page
      [ 6420.675659] PGD 0
      [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
      [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
      [ 6420.698729] Workqueue: ice ice_service_task [ice]
      [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
      [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
      [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
      [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
      [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
      [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
      [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
      [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
      [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
      [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
      [ 6420.783742] FS:  0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
      [ 6420.791829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
      [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
      [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6420.811841] PKRU: 55555554
      [ 6420.811842] Call Trace:
      [ 6420.811843]  <TASK>
      [ 6420.811844]  ice_reset_vf+0x9a/0x450 [ice]
      [ 6420.811876]  ice_process_vflr_event+0x8f/0xc0 [ice]
      [ 6420.841343]  ice_service_task+0x23b/0x600 [ice]
      [ 6420.845884]  ? __schedule+0x212/0x550
      [ 6420.849550]  process_one_work+0x1e2/0x3b0
      [ 6420.853563]  ? rescuer_thread+0x390/0x390
      [ 6420.857577]  worker_thread+0x50/0x3a0
      [ 6420.861242]  ? rescuer_thread+0x390/0x390
      [ 6420.865253]  kthread+0xdd/0x100
      [ 6420.868400]  ? kthread_complete_and_exit+0x20/0x20
      [ 6420.873194]  ret_from_fork+0x1f/0x30
      [ 6420.876774]  </TASK>
      [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
       r
      
      Fixes: efe41860 ("ice: Fix memory corruption in VF driver")
      Fixes: f23df522 ("ice: Fix spurious interrupt during removal of trusted VF")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Revert "ice: Fix ice VF reset during iavf initialization" · 0ecff05e
      Petr Oros authored
      This reverts commit 7255355a.
      
      After this commit we are not able to attach VF to VM:
      virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
      error: Failed to attach interface
      error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
      
      ice_check_vf_ready_for_cfg() already waits for reset.
      The new condition in ice_check_vf_ready_for_reset() only causes problems.
      
      Fixes: 7255355a ("ice: Fix ice VF reset during iavf initialization")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix receive buffer size miscalculation · 10083aef
      Jesse Brandeburg authored
      The driver is misconfiguring the hardware for some values of MTU such that
      it could use multiple descriptors to receive a packet when it could have
      simply used one.
      
      Change the driver to use a round-up instead of the result of a shift, as
      the shift can truncate the lower bits of the size, and result in the
      problem noted above. It also aligns this driver with similar code in i40e.
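
The arithmetic behind the change can be illustrated with a small sketch. The 128-byte unit size and the names `UNIT_SHIFT`, `units_by_shift()` and `units_round_up()` are assumptions made for this example, not values or identifiers taken from the driver.

```c
#include <stdint.h>

/* Assumed for illustration: the buffer-size field counts 128-byte units. */
#define UNIT_SHIFT 7u
#define UNIT_BYTES (1u << UNIT_SHIFT)

/* Shift: drops the low bits, so the programmed size can fall short of len. */
uint32_t units_by_shift(uint32_t len)
{
    return len >> UNIT_SHIFT;
}

/*
 * Round-up: the programmed size always covers len.
 * Assumes len is small enough that the addition cannot overflow.
 */
uint32_t units_round_up(uint32_t len)
{
    return (len + UNIT_BYTES - 1) / UNIT_BYTES;
}
```

With these assumed units, a 1600-byte length gives 12 units (1536 bytes) by shift but 13 units (1664 bytes) by round-up, so only the rounded value lets a frame of that size fit in a single buffer.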
      
      The insidiousness of this problem is that everything works with the wrong
      size, it's just not working as well as it could, as some MTU sizes end up
      using two or more descriptors, and there is no way to tell that is
      happening without looking at ice_trace or a bus analyzer.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>