1. 04 Nov, 2022 22 commits
  2. 03 Nov, 2022 18 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · fbeb229a
      Jakub Kicinski authored
      No conflicts.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9521c9d6
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth and netfilter.
      
        Current release - regressions:
      
         - net: several zerocopy flags fixes
      
         - netfilter: fix possible memory leak in nf_nat_init()
      
         - openvswitch: add missing .resv_start_op
      
        Previous releases - regressions:
      
         - neigh: fix null-ptr-deref in neigh_table_clear()
      
         - sched: fix use after free in red_enqueue()
      
         - dsa: fall back to default tagger if we can't load the one from DT
      
         - bluetooth: fix use-after-free in l2cap_conn_del()
      
        Previous releases - always broken:
      
         - netfilter: netlink notifier might race to release objects
      
         - nfc: fix potential memory leak of skb
      
         - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu
      
         - bluetooth: use skb_put to set length
      
         - eth: tun: fix bugs for oversize packet when napi frags enabled
      
         - eth: lan966x: fixes for when MTU is changed
      
         - eth: dwmac-loongson: fix invalid mdio_node"
      
      * tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
        vsock: fix possible infinite sleep in vsock_connectible_wait_data()
        vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
        ipv6: fix WARNING in ip6_route_net_exit_late()
        bridge: Fix flushing of dynamic FDB entries
        net, neigh: Fix null-ptr-deref in neigh_table_clear()
        net/smc: Fix possible leaked pernet namespace in smc_init()
        stmmac: dwmac-loongson: fix invalid mdio_node
        ibmvnic: Free rwi on reset success
        net: mdio: fix undefined behavior in bit shift for __mdiobus_register
        Bluetooth: L2CAP: Fix attempting to access uninitialized memory
        Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
        Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
        Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
        Bluetooth: L2CAP: Fix memory leak in vhci_write
        Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
        Bluetooth: virtio_bt: Use skb_put to set length
        Bluetooth: hci_conn: Fix CIS connection dst_type handling
        Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
        netfilter: ipset: enforce documented limit to prevent allocating huge memory
        isdn: mISDN: netjet: fix wrong check of device registration
        ...
    • Merge tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 4d740391
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix an endian thinko in the asm-generic compat_arg_u64() which led to
         syscall arguments being swapped for some compat syscalls.
      
       - Fix syscall wrapper handling of syscalls with 64-bit arguments on
         32-bit kernels, which led to syscall arguments being misplaced.
      
       - A build fix for amdgpu on Book3E with AltiVec disabled.
      
      Thanks to Andreas Schwab, Christian Zigotzky, and Arnd Bergmann.
      
      * tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/32: Select ARCH_SPLIT_ARG64
        powerpc/32: fix syscall wrappers with 64-bit arguments
        asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
        powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec
    • Merge branch 'add-new-pcp-and-apptrust-attributes-to-dcbnl' · d9095f92
      Paolo Abeni authored
      Daniel Machon says:
      
      ====================
      Add new PCP and APPTRUST attributes to dcbnl
      
      This patch series adds new extension attributes to dcbnl, to support PCP
      prioritization (and thereby hw offloadable pcp-based queue
      classification) and per-selector trust and trust order. Additionally,
      the microchip sparx5 driver has been dcb-enabled to make use of the new
      attributes to offload PCP, DSCP and Default prio to the switch, and
      implement trust order of selectors.
      
      For pre-RFC discussion see:
      https://lore.kernel.org/netdev/Yv9VO1DYAxNduw6A@DEN-LT-70577/
      
      For RFC series see:
      https://lore.kernel.org/netdev/20220915095757.2861822-1-daniel.machon@microchip.com/
      
      In summary: there currently exists no convenient way to offload per-port
      PCP-based queue classification to hardware. The DCB subsystem offers
      different ways to prioritize through its APP table, but lacks an option
      for PCP. Similarly, there is no way to indicate the notion of trust for
      APP table selectors. This patch series addresses both topics.
      
      PCP based queue classification:
        - 8021Q standardizes the Priority Code Point table (see 6.9.3 of IEEE
          Std 802.1Q-2018).  This patch series makes it possible to offload
          the PCP classification to said table.  The new PCP selector is not a
          standard part of the APP managed object; therefore it is
          encapsulated in a new non-std extension attribute.
      
      Selector trust:
        - ASICs often have the notion of trust DSCP and trust PCP. The new
          attribute makes it possible to specify a trust order of app
          selectors, which drivers can then react to.
      
      DCB-enable sparx5 driver:
       - Now supports offloading of DSCP, PCP and default priority. Only one
         mapping of protocol:priority is allowed. Consecutive mappings of the
         same protocol to a new priority overwrite the previous one. This
         keeps the app table and the hardware consistent.
       - Now supports dscp and pcp trust, by use of the introduced
         dcbnl_set/getapptrust ops. Sparx5 supports trust orders: [], [dscp],
         [pcp] and [dscp, pcp]. For now, only DSCP and PCP selectors are
         supported by the driver, everything else is bounced.
      
      Patch #1 introduces a new PCP selector to the APP object, which makes it
      possible to encode PCP and DEI in the app triplet and offload it to the
      PCP table of the ASIC.
      
      Patch #2 introduces the new extension attributes
      DCB_ATTR_DCB_APP_TRUST_TABLE and DCB_ATTR_DCB_APP_TRUST. Trusted
      selectors are passed in the nested DCB_ATTR_DCB_APP_TRUST_TABLE
      attribute, and assembled into an array of selectors:
      
        u8 selectors[256];
      
      where lower indexes have higher precedence.  In the array, selectors are
      stored consecutively, starting from index zero. With a maximum number of
      256 unique selectors, the list has the same maximum size.
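
The precedence rule can be sketched as follows. This is a minimal illustration, not code from the series: `struct trust_list`, `pick_trusted_selector()` and the `SEL_DSCP` value are hypothetical names and values introduced here; only the 255 used for PCP comes from patch #1.

```c
#include <stddef.h>
#include <stdint.h>

#define SEL_DSCP 5   /* illustrative value, an assumption of this sketch */
#define SEL_PCP  255 /* value given to the PCP selector in patch #1 */

/* Hypothetical container for a per-port trust order. */
struct trust_list {
    uint8_t selectors[256]; /* index 0 has the highest precedence */
    size_t  nselectors;     /* number of valid entries, stored consecutively */
};

/*
 * Return the highest-precedence trusted selector for which the frame
 * carries usable information, or -1 if no trusted selector applies.
 */
static int pick_trusted_selector(const struct trust_list *t,
                                 int has_dscp, int has_pcp)
{
    for (size_t i = 0; i < t->nselectors; i++) {
        uint8_t sel = t->selectors[i];

        if (sel == SEL_DSCP && has_dscp)
            return sel;
        if (sel == SEL_PCP && has_pcp)
            return sel;
    }
    return -1;
}
```

With a trust order of [dscp, pcp], a frame carrying both fields would be classified by DSCP, and an empty list trusts nothing.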
      
      Patch #3 sets up the dcbnl ops hook and adds support for offloading pcp
      app entries to the PCP table of the switch.
      
      Patch #4 makes use of the dcbnl_set/getapptrust ops to set a per-port
      trust order.
      
      Patch #5 adds support for offloading dscp app entries to the DSCP table
      of the switch.
      
      Patch #6 adds support for offloading default prio app entries to the
      switch.
      
      ====================
      
      Link: https://lore.kernel.org/r/20221101094834.2726202-1-daniel.machon@microchip.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: microchip: sparx5: add support for offloading default prio · c58ff3ed
      Daniel Machon authored
      Add support for offloading default prio {ETHERTYPE, 0, prio}.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: microchip: sparx5: add support for offloading dscp table · 8dcf69a6
      Daniel Machon authored
      Add support for offloading dscp app entries. Dscp values are global for
      all ports on the sparx5 switch. Therefore, we replicate each dscp app
      entry per-port.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: microchip: sparx5: add support for apptrust · 23f8382c
      Daniel Machon authored
      Make use of set/getapptrust() to implement per-selector trust and trust
      order.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: microchip: sparx5: add support for offloading pcp table · 92ef3d01
      Daniel Machon authored
      Add new registers and functions to support offload of pcp app entries.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dcb: add new apptrust attribute · 6182d587
      Daniel Machon authored
      Add new apptrust extension attributes to the 8021Qaz APP managed object.
      
      Two new attributes, DCB_ATTR_DCB_APP_TRUST_TABLE and
      DCB_ATTR_DCB_APP_TRUST, have been added. Trusted selectors are passed in
      the nested attribute DCB_ATTR_DCB_APP_TRUST, in order of precedence.
      
      The new attributes are meant to allow drivers, whose hw supports the
      notion of trust, to be able to set whether a particular app selector is
      trusted - and in which order.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dcb: add new pcp selector to app object · ec32c0c4
      Daniel Machon authored
      Add new PCP selector for the 8021Qaz APP managed object.
      
      As the PCP selector is not part of the 8021Qaz standard, a new non-std
      extension attribute DCB_ATTR_DCB_APP has been introduced. Two helper
      functions to translate between selector and app attribute type have
      also been added. The new selector has been given a value of 255, to
      minimize the risk of future overlap of std- and non-std attributes.
      
      The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
      app table. This means that the dcb_app struct can now contain both std-
      and non-std app attributes. Currently there is no overlap between the
      selector values of the two attributes.
      
      The purpose of adding the PCP selector is to be able to offload
      PCP-based queue classification to the 8021Q Priority Code Point table,
      see 6.9.3 of IEEE Std 802.1Q-2018.
      
      PCP and DEI are encoded in the protocol field as 8*dei+pcp, so that a
      mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.
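
The encoding can be checked with a small sketch. `struct app_entry` and the helper names below are hypothetical, laid out in the {selector, protocol, priority} order used in the text; they are not the kernel's own definitions.

```c
#include <stdint.h>

#define APP_SEL_PCP 255 /* selector value stated in the commit message */

/* Hypothetical mirror of an app triplet: {selector, protocol, priority}. */
struct app_entry {
    uint8_t  selector;
    uint16_t protocol;
    uint8_t  priority;
};

/* Encode PCP (0..7) and DEI (0..1) into the protocol field as 8*dei+pcp. */
static struct app_entry pcp_app_encode(unsigned pcp, unsigned dei, uint8_t prio)
{
    struct app_entry e = {
        .selector = APP_SEL_PCP,
        .protocol = (uint16_t)(8 * (dei & 1) + (pcp & 7)),
        .priority = prio,
    };
    return e;
}

/* Recover the PCP value from the low three bits of the protocol field. */
static unsigned pcp_app_pcp(const struct app_entry *e)
{
    return e->protocol & 7;
}

/* Recover the DEI bit, which sits just above the PCP bits. */
static unsigned pcp_app_dei(const struct app_entry *e)
{
    return (e->protocol >> 3) & 1;
}
```

Calling `pcp_app_encode(2, 1, 3)` yields the triplet {255, 10, 3} from the example above, since 8*1 + 2 = 10.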
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mana: Assign interrupts to CPUs based on NUMA nodes · 71fa6887
      Saurabh Sengar authored
      In large VMs with multiple NUMA nodes, network performance is usually
      best if network interrupts are all assigned to the same virtual NUMA
      node. This patch assigns online CPUs according to a NUMA-aware policy:
      local CPUs are returned first, followed by non-local ones, and then the
      selection wraps around.
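
The ordering described above can be sketched in plain C. This is an illustration under stated assumptions, not the driver's code: `numa_spread_cpu()` and the `cpu_node` array are hypothetical stand-ins for whatever the kernel uses to enumerate online CPUs and their nodes.

```c
#include <stddef.h>

/*
 * Return the i-th CPU in NUMA-aware order: CPUs on `node` first, then
 * the remaining CPUs, wrapping around when i >= ncpus.
 * cpu_node[c] is the NUMA node of CPU c. Returns -1 if ncpus is 0.
 */
static int numa_spread_cpu(const int *cpu_node, size_t ncpus,
                           size_t i, int node)
{
    if (ncpus == 0)
        return -1;
    i %= ncpus; /* wrap around once every CPU has been handed out */

    /* Pass 0 walks local CPUs, pass 1 walks non-local ones. */
    for (int pass = 0; pass < 2; pass++) {
        for (size_t c = 0; c < ncpus; c++) {
            int local = (cpu_node[c] == node);

            if (local != (pass == 0))
                continue;
            if (i == 0)
                return (int)c;
            i--;
        }
    }
    return -1; /* not reached, because i < ncpus */
}
```

For four CPUs on nodes {0, 1, 0, 1} and a device on node 1, successive indexes would map to CPUs 1, 3, 0, 2 and then back to 1.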
      Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Link: https://lore.kernel.org/r/1667282761-11547-1-git-send-email-ssengar@linux.microsoft.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'vsock-remove-an-unused-variable-and-fix-infinite-sleep' · 715aee0f
      Paolo Abeni authored
      Dexuan Cui says:
      
      ====================
      vsock: remove an unused variable and fix infinite sleep
      
      Patch 1 removes the unused 'wait' variable.
      Patch 2 fixes an infinite sleep issue reported by a hv_sock user.
      ====================
      
      Link: https://lore.kernel.org/r/20221101021706.26152-1-decui@microsoft.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • vsock: fix possible infinite sleep in vsock_connectible_wait_data() · 466a8533
      Dexuan Cui authored
      Currently vsock_connectible_wait_data() may miss a wakeup that arrives
      between the vsock_connectible_has_data() == 0 check and the
      prepare_to_wait() call.
      
      Fix the race by adding the process to the wait queue before checking
      vsock_connectible_has_data().
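
The lost wakeup can be reproduced in a single-threaded model. This is a deliberately simplified sketch and not the vsock code: `struct wait_model`, `producer_deliver()` and `waiter_blocks_forever()` are hypothetical, and the model fixes one interleaving in which data arrives between the waiter's two steps.

```c
#include <stdbool.h>

/* Hypothetical model of one socket and one waiter. */
struct wait_model {
    int  bytes;    /* data available on the socket */
    bool on_queue; /* waiter has been added to the wait queue */
    bool woken;    /* a wakeup reached the waiter */
};

/* Data arrives: only waiters already on the queue are woken. */
static void producer_deliver(struct wait_model *m)
{
    m->bytes += 1;
    if (m->on_queue)
        m->woken = true;
}

/*
 * Run the waiter with the producer firing between its two steps.
 * Returns true if the waiter would go to sleep and never be woken.
 */
static bool waiter_blocks_forever(bool register_first)
{
    struct wait_model m = { 0 };
    bool saw_data;

    if (register_first) {
        m.on_queue = true;       /* queue first ... */
        producer_deliver(&m);
        saw_data = m.bytes > 0;  /* ... then check for data */
    } else {
        saw_data = m.bytes > 0;  /* check first ... */
        producer_deliver(&m);    /* wakeup finds an empty queue */
        m.on_queue = true;       /* ... then queue, too late */
    }

    if (saw_data)
        return false;            /* data seen, no need to sleep */
    return !m.woken;             /* sleeping without a pending wakeup */
}
```

In this model `waiter_blocks_forever(false)` is true and `waiter_blocks_forever(true)` is false, which is the ordering change the fix describes.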
      
      Fixes: b3f7fd54 ("af_vsock: separate wait data loop")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Reported-by: Frédéric Dalleau <frederic.dalleau@docker.com>
      Tested-by: Frédéric Dalleau <frederic.dalleau@docker.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • vsock: remove the unused 'wait' in vsock_connectible_recvmsg() · cf6ff0df
      Dexuan Cui authored
      Remove the unused variable introduced by 19c1b90e.
      
      Fixes: 19c1b90e ("af_vsock: separate receive data loop")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: fec: add initial XDP support · 6d6b39f1
      Shenwei Wang authored
      This patch adds initial XDP support to the Freescale driver. It supports
      the XDP_PASS, XDP_DROP and XDP_REDIRECT actions. Upcoming patches will add
      support for XDP_TX and Zero Copy features.
      
      As the patch is rather large, the code that collects the statistics
      has been split out and will be submitted as a dedicated patch.
      
      Tested with the xdpsock application.
        -- Native mode means running "xdpsock -i eth0"
        -- SKB-Mode means running "xdpsock -S -i eth0"
      
      The following are the test results for each XDP mode:
      
      root@imx8qxpc0mek:~/bpf# ./xdpsock -i eth0
       sock0@eth0:0 rxdrop xdp-drv
                         pps            pkts           1.00
      rx                 371347         2717794
      tx                 0              0
      
      root@imx8qxpc0mek:~/bpf# ./xdpsock -S -i eth0
       sock0@eth0:0 rxdrop xdp-skb
                         pps            pkts           1.00
      rx                 202229         404528
      tx                 0              0
      
      root@imx8qxpc0mek:~/bpf# ./xdp2 eth0
      proto 0:     496708 pkt/s
      proto 0:     505469 pkt/s
      proto 0:     505283 pkt/s
      proto 0:     505443 pkt/s
      proto 0:     505465 pkt/s
      
      root@imx8qxpc0mek:~/bpf# ./xdp2 -S eth0
      proto 0:          0 pkt/s
      proto 17:     118778 pkt/s
      proto 17:     118989 pkt/s
      proto 0:          1 pkt/s
      proto 17:     118987 pkt/s
      proto 0:          0 pkt/s
      proto 17:     118943 pkt/s
      proto 17:     118976 pkt/s
      proto 0:          1 pkt/s
      proto 17:     119006 pkt/s
      proto 0:          0 pkt/s
      proto 17:     119071 pkt/s
      proto 17:     119092 pkt/s
      Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/r/20221031185350.2045675-1-shenwei.wang@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: tun: bump the link speed from 10Mbps to 10Gbps · 598d2982
      Ilya Maximets authored
      The 10Mbps link speed was set in 2004 when the ethtool interface was
      initially added to the tun driver.  It might have been a good
      assumption 18 years ago, but CPUs and the network stack have come a
      long way since then.
      
      Other virtual ports typically report much higher speeds.  For example,
      veth has reported 10Gbps since its introduction in 2007.
      
      Some userspace applications rely on the current link speed in
      certain situations.  For example, Open vSwitch uses the link speed
      as an upper bound for QoS configuration if the user didn't specify the
      maximum rate.  The advertised 10Mbps doesn't match modern reality,
      so users always have to manually override the value with
      something more sensible to avoid configuration issues, e.g. limiting
      the traffic too much.  This also creates additional confusion among
      users.
      
      Bump the advertised speed to at least match veth.
      
      An alternative might be to explicitly report UNKNOWN and let users
      decide on the right value for them.  That is indeed "the right way"
      of fixing the problem.  However, it may cause issues with bonding
      or with some userspace applications that rely on a speed value
      being reported (even though they should not).  Just changing the speed
      value should be a safer option.
      
      Users can still override the speed with ethtool, if necessary.
      
      RFC discussion is linked below.
      
      Link: https://lore.kernel.org/lkml/20221021114921.3705550-1-i.maximets@ovn.org/
      Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051958.html
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20221031173953.614577-1-i.maximets@ovn.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: fix WARNING in ip6_route_net_exit_late() · 768b3c74
      Zhengchao Shao authored
      In ip6_route_net_init_late(), if creating the ipv6_route or rt6_stats
      file fails, the initialization still reports success. As a result,
      ip6_route_net_exit_late() cannot find the ipv6_route or rt6_stats file
      when it tries to remove it, which triggers a WARNING.
      
      The following is the stack information:
      name 'rt6_stats'
      WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460
      Modules linked in:
      Workqueue: netns cleanup_net
      RIP: 0010:remove_proc_entry+0x389/0x460
      PKRU: 55555554
      Call Trace:
      <TASK>
      ops_exit_list+0xb0/0x170
      cleanup_net+0x4ea/0xb00
      process_one_work+0x9bf/0x1710
      worker_thread+0x665/0x1080
      kthread+0x2e4/0x3a0
      ret_from_fork+0x1f/0x30
      </TASK>
      
      Fixes: cdb18761 ("[NETNS][IPV6] route6 - create route6 proc files for the namespace")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: Fix flushing of dynamic FDB entries · 628ac04a
      Ido Schimmel authored
      The following commands should result in all the dynamic FDB entries
      being flushed, but instead all the non-local (non-permanent) entries are
      flushed:
      
       # bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
       # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
       # ip link set dev br0 type bridge fdb_flush
       # bridge fdb show brport dummy1
       00:00:00:00:00:01 master br0 permanent
       33:33:00:00:00:01 self permanent
       01:00:5e:00:00:01 self permanent
      
      This is because br_fdb_flush() works with FDB flags and not the
      corresponding enumerator values. Fix by passing the FDB flag instead.
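
The difference between an enumerator value and the flag built from it can be shown with a small model. The names and bit positions below are hypothetical, chosen so that the model reproduces the symptom described above; they are not copied from the bridge code.

```c
#include <stddef.h>

#define BIT(n) (1UL << (n))

/* Hypothetical flag bit numbers (an assumption of this sketch). */
enum fdb_flag_bit {
    FDB_LOCAL_BIT  = 0, /* set on permanent (local) entries */
    FDB_STATIC_BIT = 1, /* set on static and permanent entries */
};

/* An entry matches the flush when all masked flag bits are clear. */
static int fdb_matches(unsigned long entry_flags, unsigned long flags_mask)
{
    return (entry_flags & flags_mask) == 0;
}

/* Count how many entries a flush with the given mask would delete. */
static size_t fdb_count_flushed(const unsigned long *flags, size_t n,
                                unsigned long flags_mask)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        hits += fdb_matches(flags[i], flags_mask);
    return hits;
}
```

Given one permanent, one static and one dynamic entry, passing the enumerator `FDB_STATIC_BIT` (numerically 1, which equals `BIT(FDB_LOCAL_BIT)`) as the mask matches both non-local entries, while `BIT(FDB_STATIC_BIT)` matches only the dynamic one.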
      
      After the fix:
      
       # bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
       # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
       # ip link set dev br0 type bridge fdb_flush
       # bridge fdb show brport dummy1
       00:aa:bb:cc:dd:ee master br0 static
       00:00:00:00:00:01 master br0 permanent
       33:33:00:00:00:01 self permanent
       01:00:5e:00:00:01 self permanent
      
      Fixes: 1f78ee14 ("net: bridge: fdb: add support for fine-grained flushing")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20221101185753.2120691-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>