1. 05 Nov, 2021 1 commit
  2. 04 Nov, 2021 6 commits
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · a5bda908
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-11-03
      
      Brett fixes issues with promiscuous mode settings not being properly
      enabled and removes setting of VF antispoof along with promiscuous
      mode. He also ensures that VF Tx queues are always disabled and resolves
      a race between virtchnl handling and VF related ndo ops.
      
      Sylwester fixes an issue where a VF MAC could not be set to its primary
      MAC if the address is already present.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Fix race conditions between virtchnl handling and VF ndo ops
        ice: Fix not stopping Tx queues for VFs
        ice: Fix replacing VF hardware MAC to existing MAC filter
        ice: Remove toggling of antispoof for VF trusted promiscuous mode
        ice: Fix VF true promiscuous mode
      ====================
      
      Link: https://lore.kernel.org/r/20211103161935.2997369-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a5bda908
    • net: phy: fix duplex out of sync problem while changing settings · a4db9055
      Heiner Kallweit authored
      As reported by Zhang there's a small issue if in forced mode the duplex
      mode changes with the link staying up [0]. In this case the MAC isn't
      notified about the change.
      
      The proposed patch relies on the phylib state machine and ignores the
      fact that there are drivers that use phylib but not the phylib state
      machine. So let's not change the behavior for such drivers and fix
      it w/o re-adding state PHY_FORCING for the case that phylib state
      machine is used.
      
      [0] https://lore.kernel.org/netdev/a5c26ffd-4ee4-a5e6-4103-873208ce0dc5@huawei.com/T/
      
      Fixes: 2bd229df ("net: phy: remove state PHY_FORCING")
      Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/7b8b9456-a93f-abbc-1dc5-a2c2542f932c@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a4db9055
    • devlink: fix flexible_array.cocci warning · 96d0c9be
      Guo Zhengkui authored
      Fix the following coccicheck warning:
      ./net/core/devlink.c:69:6-10: WARNING use flexible-array member instead
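
For readers unfamiliar with the warning, here is a minimal userspace sketch of a C99 flexible-array member. The struct and helper names (`demo_msg`, `demo_msg_new`) are hypothetical and are not taken from net/core/devlink.c; the sketch only illustrates the general idiom that the coccicheck rule asks for in place of a zero-length or one-element trailing array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example struct, not the devlink one. */
struct demo_msg {
	size_t len;
	unsigned char payload[];	/* flexible-array member: must be last */
};

/* Allocate the header plus len trailing payload bytes in one block. */
static struct demo_msg *demo_msg_new(const void *src, size_t len)
{
	struct demo_msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->len = len;
	memcpy(m->payload, src, len);
	return m;
}
```

The trailing array contributes no storage of its own to `sizeof(struct demo_msg)`, so the allocation size is spelled out as header plus payload.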
      Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
      Link: https://lore.kernel.org/r/20211103121607.27490-1-guozhengkui@vivo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      96d0c9be
    • net: fix possible NULL deref in sock_reserve_memory · d00c8ee3
      Eric Dumazet authored
      Sanity check in sock_reserve_memory() was not enough to prevent a
      malicious user from triggering a NULL deref.

      In this case, the issue is that sk_prot->memory_allocated is NULL.
      
      Use standard sk_has_account() helper to deal with this.
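
The guard described above can be sketched in userspace as follows. All names here (`demo_proto`, `demo_sock`, `demo_has_account`, `demo_reserve_memory`) are hypothetical stand-ins, the error codes are an assumption, and the counter is a plain `long` rather than the kernel's atomic; the point is only that the accounting pointer is checked before it is dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the protocol and socket structures. */
struct demo_proto {
	long *memory_allocated;		/* NULL when the protocol does no accounting */
};

struct demo_sock {
	const struct demo_proto *prot;
	long reserved;
};

/* Mirrors the idea of an "has accounting" helper: is there a counter at all? */
static int demo_has_account(const struct demo_sock *sk)
{
	return sk->prot->memory_allocated != NULL;
}

static int demo_reserve_memory(struct demo_sock *sk, long pages)
{
	if (pages <= 0)
		return -EINVAL;
	if (!demo_has_account(sk))
		return -EOPNOTSUPP;	/* refuse instead of writing through NULL */

	*sk->prot->memory_allocated += pages;
	sk->reserved += pages;
	return 0;
}
```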
      
      BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
      BUG: KASAN: null-ptr-deref in atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline]
      BUG: KASAN: null-ptr-deref in sk_memory_allocated_add include/net/sock.h:1371 [inline]
      BUG: KASAN: null-ptr-deref in sock_reserve_memory net/core/sock.c:994 [inline]
      BUG: KASAN: null-ptr-deref in sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443
      Write of size 8 at addr 0000000000000000 by task syz-executor.0/11270
      
      CPU: 1 PID: 11270 Comm: syz-executor.0 Not tainted 5.15.0-syzkaller #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       __kasan_report mm/kasan/report.c:446 [inline]
       kasan_report.cold+0x66/0xdf mm/kasan/report.c:459
       check_region_inline mm/kasan/generic.c:183 [inline]
       kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
       instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
       atomic_long_add_return include/linux/atomic/atomic-instrumented.h:1218 [inline]
       sk_memory_allocated_add include/net/sock.h:1371 [inline]
       sock_reserve_memory net/core/sock.c:994 [inline]
       sock_setsockopt+0x22ab/0x2b30 net/core/sock.c:1443
       __sys_setsockopt+0x4f8/0x610 net/socket.c:2172
       __do_sys_setsockopt net/socket.c:2187 [inline]
       __se_sys_setsockopt net/socket.c:2184 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2184
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f56076d5ae9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f5604c4b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007f56077e8f60 RCX: 00007f56076d5ae9
      RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000003
      RBP: 00007f560772ff25 R08: 000000000000fec7 R09: 0000000000000000
      R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffb61a100f R14: 00007f5604c4b300 R15: 0000000000022000
       </TASK>
      
      Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d00c8ee3
    • tcp: Use BIT() for OPTION_* constants · 3b65abb8
      Leonard Crestez authored
      Extending these flags using the existing (1 << x) pattern triggers
      complaints from checkpatch. Instead of ignoring checkpatch modify the
      existing values to use BIT(x) style in a separate commit.
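
As an illustration of the two styles, the sketch below defines the same flag values both ways. `DEMO_BIT()` follows the shape of the kernel's `BIT()` macro (an unsigned long 1 shifted left); the `DEMO_OPT_*` names and bit positions are hypothetical and are not the real OPTION_* definitions.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define DEMO_BIT(nr)		(1UL << (nr))

/* Old style: open-coded shifts (what checkpatch complains about). */
#define DEMO_OPT_OLD_SACK	(1 << 0)
#define DEMO_OPT_OLD_TS		(1 << 1)

/* New style: same values expressed with the macro. */
#define DEMO_OPT_SACK		DEMO_BIT(0)
#define DEMO_OPT_TS		DEMO_BIT(1)
#define DEMO_OPT_WSCALE		DEMO_BIT(3)

/* Test whether a flag is set in an option mask. */
static int demo_has_option(unsigned long options, unsigned long flag)
{
	return (options & flag) != 0;
}
```

The numeric values are unchanged by the conversion; what changes is the type of the constant (unsigned long rather than int).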
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b65abb8
    • selftests: net: properly support IPv6 in GSO GRE test · a985442f
      Andrea Righi authored
      Explicitly pass -6 to netcat when the test is using IPv6 to prevent
      failures.
      
      Also make sure to pass "-N" to netcat to close the socket after EOF on
      the client side, otherwise we would always hit the timeout and the test
      would fail.
      
      Without this fix applied:
      
       TEST: GREv6/v4 - copy file w/ TSO                                   [FAIL]
       TEST: GREv6/v4 - copy file w/ GSO                                   [FAIL]
       TEST: GREv6/v6 - copy file w/ TSO                                   [FAIL]
       TEST: GREv6/v6 - copy file w/ GSO                                   [FAIL]
      
      With this fix applied:
      
       TEST: GREv6/v4 - copy file w/ TSO                                   [ OK ]
       TEST: GREv6/v4 - copy file w/ GSO                                   [ OK ]
       TEST: GREv6/v6 - copy file w/ TSO                                   [ OK ]
       TEST: GREv6/v6 - copy file w/ GSO                                   [ OK ]
      
      Fixes: 025efa0a ("selftests: add simple GSO GRE test")
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a985442f
  3. 03 Nov, 2021 33 commits
    • ice: Fix race conditions between virtchnl handling and VF ndo ops · e6ba5273
      Brett Creeley authored
      The VF can be configured via the PF's ndo ops at the same time the PF is
      receiving/handling virtchnl messages. This has many issues, with
      one of them being the ndo op could be actively resetting a VF (i.e.
      resetting it to the default state and deleting/re-adding the VF's VSI)
      while a virtchnl message is being handled. The following error was seen
      because a VF ndo op was used to change a VF's trust setting while the
      VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
      
      [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
      [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
      [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
      
      Fix this by making sure the virtchnl handling and VF ndo ops that
      trigger VF resets cannot run concurrently. This is done by adding a
      struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
      will be locked around the critical operations and VFR. Since the ndo ops
      will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
      is done because if any other thread (i.e. VF ndo op) has the mutex, then
      that means the current VF message being handled is no longer valid, so
      just ignore it.
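
The locking scheme described above can be approximated in userspace as a rough analogy. In this sketch a C11 `atomic_flag` stands in for the per-VF mutex, and every name (`demo_vf`, `demo_cfg_lock`, `demo_handle_vf_msg`, and so on) is hypothetical; it is not the driver's code, only the shape of "configuration path blocks, message path tries once and drops the message on contention".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VF state; the flag plays the role of cfg_lock. */
struct demo_vf {
	atomic_flag cfg_lock;
	int handled;
};

/* Configuration path: waits until the lock is available. */
static void demo_cfg_lock(struct demo_vf *vf)
{
	while (atomic_flag_test_and_set(&vf->cfg_lock))
		;	/* spin; a real mutex would sleep instead */
}

/* Message path: one attempt, never waits. */
static bool demo_cfg_trylock(struct demo_vf *vf)
{
	return !atomic_flag_test_and_set(&vf->cfg_lock);
}

static void demo_cfg_unlock(struct demo_vf *vf)
{
	atomic_flag_clear(&vf->cfg_lock);
}

/* Returns false when the message is dropped because a reset is in flight. */
static bool demo_handle_vf_msg(struct demo_vf *vf)
{
	if (!demo_cfg_trylock(vf))
		return false;	/* stale message: the VF is being reconfigured */

	vf->handled++;
	demo_cfg_unlock(vf);
	return true;
}
```

The asymmetry is the design choice: the configuration side may wait, while the message side treats contention as evidence that its message is no longer valid.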
      
      This issue can be seen using the following commands:
      
      for i in {0..50}; do
              rmmod ice
              modprobe ice
      
              sleep 1
      
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      
              sleep 2
      
              echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
              sleep 1
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      done
      
      Fixes: 7c710869 ("ice: Add handlers for VF netdevice operations")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e6ba5273
    • ice: Fix not stopping Tx queues for VFs · b385cca4
      Brett Creeley authored
      When a VF is removed and/or reset its Tx queues need to be
      stopped from the PF. This is done by calling the ice_dis_vf_qs()
      function, which calls ice_vsi_stop_lan_tx_rings(). Currently
      ice_dis_vf_qs() is protected by the VF state bit ICE_VF_STATE_QS_ENA.
      Unfortunately, this is causing the Tx queues to not be disabled in some
      cases and when the VF tries to re-enable/reconfigure its Tx queues over
      virtchnl the op is failing. This is because a VF can be reset and/or
      removed before the ICE_VF_STATE_QS_ENA bit is set, but the Tx queues
      were already configured via ice_vsi_cfg_single_txq() in the
      VIRTCHNL_OP_CONFIG_VSI_QUEUES op. However, the ICE_VF_STATE_QS_ENA bit
      is set on a successful VIRTCHNL_OP_ENABLE_QUEUES, which will always
      happen after the VIRTCHNL_OP_CONFIG_VSI_QUEUES op.
      
      This was causing the following error message when loading the ice
      driver, creating VFs, and modifying VF trust in an endless loop:
      
      [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
      [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
      [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
      
      Fix this by always calling ice_dis_vf_qs() and silencing the error
      message in ice_vsi_stop_tx_ring() since the calling code ignores the
      return anyway. Also, all other places that call ice_vsi_stop_tx_ring()
      catch the error, so this doesn't affect those flows since there was no
      change to the values the function returns.
      
      Other solutions were considered (i.e. tracking which VF queues had been
      "started/configured" in VIRTCHNL_OP_CONFIG_VSI_QUEUES), but it seemed
      more complicated than it was worth. This solution also brings in the
      chance for other unexpected conditions due to invalid state bit checks.
      So, the proposed solution seemed like the best option since there is no
      harm in failing to stop Tx queues that were never started.
      
      This issue can be seen using the following commands:
      
      for i in {0..50}; do
              rmmod ice
              modprobe ice
      
              sleep 1
      
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      
              sleep 2
      
              echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
              sleep 1
              echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
              echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
      
              ip link set ens785f1 vf 0 trust on
              ip link set ens785f0 vf 0 trust on
      done
      
      Fixes: 77ca27c4 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      b385cca4
    • ice: Fix replacing VF hardware MAC to existing MAC filter · ce572a5b
      Sylwester Dziedziuch authored
      VF was not able to change its hardware MAC address in case
      the new address was already present in the MAC filter list.
      Change the handling of VF add mac request to not return
      if requested MAC address is already present on the list
      and check if its hardware MAC needs to be updated in this case.
      
      Fixes: ed4c068d ("ice: Enable ip link show on the PF to display VF unicast MAC(s)")
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ce572a5b
    • ice: Remove toggling of antispoof for VF trusted promiscuous mode · 0299faea
      Brett Creeley authored
      Currently when a trusted VF enables promiscuous mode spoofchk will be
      disabled. This is wrong and should only be modified from the
      ndo_set_vf_spoofchk callback. Fix this by removing the call to toggle
      spoofchk for trusted VFs.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      0299faea
    • ice: Fix VF true promiscuous mode · 1a8c7778
      Brett Creeley authored
      When a VF requests promiscuous mode and it's trusted and true promiscuous
      mode is enabled the PF driver attempts to enable unicast and/or
      multicast promiscuous mode filters based on the request. This is fine,
      but there are a couple issues with the current code.
      
      [1] The define to configure the unicast promiscuous mode mask also
          includes bits to configure the multicast promiscuous mode mask, which
          causes multicast to be set/cleared unintentionally.
      [2] All 4 cases for enable/disable unicast/multicast mode are not
          handled in the promiscuous mode message handler, which causes
          unexpected results regarding the current promiscuous mode settings.
      
      To fix [1] make sure any promiscuous mask defines include the correct
      bits for each of the promiscuous modes.
      
      To fix [2] make sure that all 4 cases are handled since there are 2 bits
      (FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be
      either set or cleared. Also, since either unicast and/or multicast
      promiscuous configuration can fail, introduce two separate error values
      to handle each of these cases.
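
The two fixes can be sketched together in a small userspace model. The flag names, state struct, and setter functions below are hypothetical (the real driver uses FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC and programs hardware filters); the sketch assumes only what the text states: the two masks must not share bits, each mode is set or cleared independently, and each has its own error value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request flags; the masks deliberately share no bits. */
#define DEMO_FLAG_UCAST_PROMISC	(1u << 0)
#define DEMO_FLAG_MCAST_PROMISC	(1u << 1)

/* Hypothetical record of what is currently programmed. */
struct demo_vf_state {
	bool ucast_on;
	bool mcast_on;
};

/* Stand-ins for the filter programming steps; either could fail. */
static int demo_set_ucast(struct demo_vf_state *s, bool on)
{
	s->ucast_on = on;
	return 0;
}

static int demo_set_mcast(struct demo_vf_state *s, bool on)
{
	s->mcast_on = on;
	return 0;
}

/* Covers all four combinations of the two request bits. */
static int demo_cfg_promisc(struct demo_vf_state *s, unsigned int flags,
			    int *ucast_err, int *mcast_err)
{
	bool want_uc = (flags & DEMO_FLAG_UCAST_PROMISC) != 0;
	bool want_mc = (flags & DEMO_FLAG_MCAST_PROMISC) != 0;

	*ucast_err = 0;
	*mcast_err = 0;

	if (want_uc != s->ucast_on)
		*ucast_err = demo_set_ucast(s, want_uc);
	if (want_mc != s->mcast_on)
		*mcast_err = demo_set_mcast(s, want_mc);

	return (*ucast_err || *mcast_err) ? -1 : 0;
}
```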
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      1a8c7778
    • net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge · 92f62485
      Vladimir Oltean authored
      Normally it is expected that the dsa_device_ops :: rcv() method finishes
      parsing the DSA tag and consumes it, then never looks at it again.
      
      But commit c0bcf537 ("net: dsa: ocelot: add hardware timestamping
      support for Felix") added support for RX timestamping in a very
      unconventional way. On this switch, a partial timestamp is available in
      the DSA header, but the driver got away with not parsing that timestamp
      right away, but instead delayed that parsing for a little longer:
      
      dsa_switch_rcv():
      	nskb = cpu_dp->rcv(skb, dev); <------------- not here
      	-> ocelot_rcv()
      	...
      
      	skb = nskb;
      	skb_push(skb, ETH_HLEN);
      	skb->pkt_type = PACKET_HOST;
      	skb->protocol = eth_type_trans(skb, skb->dev);
      
      	...
      
      	if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here
      	-> felix_rxtstamp()
      		return 0;
      
      When in felix_rxtstamp(), this driver accounted for the fact that
      eth_type_trans() happened in the meanwhile, so it got a hold of the
      extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes
      from the current skb->data.
      
      This worked for quite some time but was quite fragile from the very
      beginning. Not to mention that having DSA tag parsing split in two
      different files, under different folders (net/dsa/tag_ocelot.c vs
      drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to
      come that they might break this.
      
      Finally, the blamed commit does the following: at the end of
      ocelot_rcv(), it checks whether the skb payload contains a VLAN header.
      If it does, and this port is under a VLAN-aware bridge, that VLAN ID
      might not be correct in the sense that the packet might have suffered
      VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID
      from the skb payload using __skb_vlan_pop(), and take the classified
      VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the
      classified VLAN, and the skb payload is VLAN-untagged.
      
      The big problem is that __skb_vlan_pop() does:
      
      	memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
      	__skb_pull(skb, VLAN_HLEN);
      
      aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes
      from the skb headroom (effectively also moving skb->data, by definition).
      So for felix_rxtstamp()'s fragile logic, all bets are off now.
      Instead of having the "extraction" pointer point to the DSA header,
      it actually points to 4 bytes _inside_ the extraction header.
      Corollary, the last 4 bytes of the "extraction" header are in fact 4
      stale bytes of the destination MAC address from the Ethernet header,
      from prior to the __skb_vlan_pop() movement.
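
The pointer arithmetic can be replayed on a plain byte buffer. This is a simplified userspace model, not kernel code: `demo_skb` is a hypothetical stand-in for the skb, the 16-byte tag length is an assumption made for illustration, and the fill bytes (0xEE for the tag, 0xDA for the destination MAC, 0x5A for the source MAC) are arbitrary markers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN	6
#define DEMO_ETH_HLEN	14
#define DEMO_VLAN_HLEN	4
#define DEMO_TAG_LEN	16	/* assumed tag length, for illustration only */

/* Hypothetical stand-in for an skb: a buffer plus the MAC header pointer. */
struct demo_skb {
	uint8_t buf[64];
	uint8_t *mac;
};

/* Layout: [tag][dst MAC][src MAC][TPID][TCI][ethertype]... */
static void demo_skb_init(struct demo_skb *skb)
{
	memset(skb->buf, 0, sizeof(skb->buf));
	memset(skb->buf, 0xEE, DEMO_TAG_LEN);
	skb->mac = skb->buf + DEMO_TAG_LEN;
	memset(skb->mac, 0xDA, DEMO_ETH_ALEN);
	memset(skb->mac + DEMO_ETH_ALEN, 0x5A, DEMO_ETH_ALEN);
	skb->mac[12] = 0x81;	/* VLAN TPID 0x8100 */
	skb->mac[13] = 0x00;
	skb->mac[14] = 0x00;	/* TCI, VLAN ID 100 */
	skb->mac[15] = 0x64;
	skb->mac[16] = 0x88;	/* inner ethertype */
	skb->mac[17] = 0xF7;
}

/* Same two steps as quoted above: shift both MACs right, then pull. */
static void demo_vlan_pop(struct demo_skb *skb)
{
	memmove(skb->mac + DEMO_VLAN_HLEN, skb->mac, 2 * DEMO_ETH_ALEN);
	skb->mac += DEMO_VLAN_HLEN;
}

/* The fragile lookup: walk back a fixed distance from the payload. */
static const uint8_t *demo_fragile_tag_lookup(const struct demo_skb *skb)
{
	const uint8_t *data = skb->mac + DEMO_ETH_HLEN;

	return data - (DEMO_ETH_HLEN + DEMO_TAG_LEN);
}
```

Before the pop the lookup lands on the first tag byte; after the pop it lands 4 bytes into the tag, and the final 4 bytes it covers are the stale destination MAC bytes that memmove() left behind.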
      
      So of course, RX timestamps are completely bogus when the system is
      configured in this way.
      
      The fix is actually very simple: just don't structure the code like that.
      For better or worse, the DSA PTP timestamping API does not offer a
      straightforward way for drivers to present their RX timestamps, but
      other drivers (sja1105) have established a simple mechanism to carry
      their RX timestamp from dsa_device_ops :: rcv() all the way to
      dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to
      simply save the partial timestamp to the skb->cb, and complete it later.
      
      Question: why don't we simply populate the skb's struct
      skb_shared_hwtstamps from ocelot_rcv(), and bother with this
      complication of propagating the timestamp to felix_rxtstamp()?
      
      Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether
      PTP packets need sleepable context to retrieve the full RX timestamp.
      Currently felix_rxtstamp() answers "no, thanks" to that question, and
      calls ocelot_ptp_gettime64() from softirq atomic context. This is
      understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so
      hardware access does not require sleeping. But the felix driver is
      preparing for the introduction of other switches where hardware access
      is over a slow bus like SPI or MDIO:
      https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/
      
      So I would like to keep this code structure, so the rework needed when
      that driver needs PTP support will be minimal (answer "yes, I need
      deferred context for this skb's RX timestamp", then the partial
      timestamp will still be found in the skb->cb).
      
      Fixes: ea440cd2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available")
      Reported-by: Po Liu <po.liu@nxp.com>
      Cc: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92f62485
    • net: dsa: qca8k: make sure PAD0 MAC06 exchange is disabled · 5f15d392
      Ansuel Smith authored
      Some devices set MAC06 exchange in the bootloader. This causes
      problems, as we don't support this strange mode and we just set port6
      as the primary CPU port. With MAC06 exchange, the PAD0 register configures
      port6 instead of port0. Add an extra check and explicitly disable MAC06
      exchange to correctly configure the port PAD config.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Fixes: 3fcf734a ("net: dsa: qca8k: add support for cpu port 6")
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5f15d392
    • net: vlan: fix a UAF in vlan_dev_real_dev() · 563bcbae
      Ziyang Xuan authored
      The real_dev of a vlan net_device may be freed after
      unregister_vlan_dev(). Continuing to access the real_dev through
      vlan_dev_real_dev() then triggers a UAF on the real_dev, as in
      the following:
      
      ==================================================================
      BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
      Call Trace:
       kasan_report.cold+0x83/0xdf
       vlan_dev_real_dev+0xf9/0x120
       is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
       is_eth_port_of_netdev_filter+0x28/0x40
       ib_enum_roce_netdev+0x1a3/0x300
       ib_enum_all_roce_netdevs+0xc7/0x140
       netdevice_event_work_handler+0x9d/0x210
      ...
      
      Freed by task 9288:
       kasan_save_stack+0x1b/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x20/0x30
       __kasan_slab_free+0xfc/0x130
       slab_free_freelist_hook+0xdd/0x240
       kfree+0xe4/0x690
       kvfree+0x42/0x50
       device_release+0x9f/0x240
       kobject_put+0x1c8/0x530
       put_device+0x1b/0x30
       free_netdev+0x370/0x540
       ppp_destroy_interface+0x313/0x3d0
      ...
      
      Move the put_device(real_dev) to vlan_dev_free(). This ensures that
      real_dev is not freed before the vlan_dev is unregistered.
      
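The lifetime rule behind this fix can be modelled with a toy reference count. Everything in the sketch is hypothetical (`demo_real_dev`, `demo_vlan_dev`, and the helper names); it does not reproduce the kernel's device refcounting, only the ordering: the upper device keeps its reference on the lower device until its own free step, not merely until unregistration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lower device with a toy reference count. */
struct demo_real_dev {
	int refcnt;
	bool freed;
};

/* Hypothetical upper (vlan-like) device that points at the lower one. */
struct demo_vlan_dev {
	struct demo_real_dev *real_dev;
	bool registered;
};

static void demo_hold(struct demo_real_dev *d)
{
	d->refcnt++;
}

static void demo_put(struct demo_real_dev *d)
{
	if (--d->refcnt == 0)
		d->freed = true;	/* stands in for the actual free */
}

static void demo_vlan_attach(struct demo_vlan_dev *v, struct demo_real_dev *r)
{
	demo_hold(r);
	v->real_dev = r;
	v->registered = true;
}

/* Unregistering no longer drops the reference on the lower device. */
static void demo_vlan_unregister(struct demo_vlan_dev *v)
{
	v->registered = false;
}

/* The reference is dropped only when the upper device itself is freed. */
static void demo_vlan_free(struct demo_vlan_dev *v)
{
	demo_put(v->real_dev);
	v->real_dev = NULL;
}
```

With this ordering, code that still reaches the lower device through the upper one between unregister and free sees live memory.
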
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      563bcbae
    • net: udp6: replace __UDP_INC_STATS() with __UDP6_INC_STATS() · 250962e4
      Menglong Dong authored
      __UDP_INC_STATS() is used in udpv6_queue_rcv_one_skb() when encap_rcv()
      fails. __UDP6_INC_STATS() should be used here, so replace it with
      __UDP6_INC_STATS().
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      250962e4
    • ethtool: fix ethtool msg len calculation for pause stats · 1aabe578
      Jakub Kicinski authored
      ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id,
      so we need to subtract non-stats and add one to
      get a count (IOW -2+1 == -1).
      
      Otherwise we'll see:
      
        ethnl cmd 21: calculated reply length 40, but consumed 52
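
The "-2+1" arithmetic can be checked with a small enum that follows the usual netlink attribute layout. The `DEMO_*` names are hypothetical, and the assumption that exactly two ids (an unspecified id and a padding id) are not statistics is taken from the "-2" in the text above.

```c
#include <assert.h>

/* Hypothetical attribute ids laid out in the usual netlink style. */
enum demo_pause_stat {
	DEMO_PAUSE_STAT_UNSPEC,		/* 0: not a statistic */
	DEMO_PAUSE_STAT_PAD,		/* 1: not a statistic */
	DEMO_PAUSE_STAT_TX_FRAMES,	/* 2 */
	DEMO_PAUSE_STAT_RX_FRAMES,	/* 3 */

	DEMO_PAUSE_STAT_CNT,		/* number of ids, including non-stats */
	DEMO_PAUSE_STAT_MAX = DEMO_PAUSE_STAT_CNT - 1	/* highest valid id */
};

#define DEMO_NON_STAT_ATTRS	2	/* UNSPEC and PAD */

/* MAX is an id, not a count: subtract the non-stats, add one. */
static int demo_pause_stat_count(void)
{
	return DEMO_PAUSE_STAT_MAX - DEMO_NON_STAT_ATTRS + 1;
}
```

Using MAX - 2 instead would undercount by one statistic, which is the kind of mismatch between calculated and consumed reply length that the warning reports.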
      
      Fixes: 9a27a330 ("ethtool: add standard pause stats")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1aabe578
    • net: avoid double accounting for pure zerocopy skbs · 9b65b17d
      Talal Ahmad authored
      Track skbs containing only zerocopy data and avoid charging them to
      kernel memory to correctly account the memory utilization for
      msg_zerocopy. All of the data in such skbs is held in user pages which
      are already accounted to user. Before this change, they are charged
      again in kernel in __zerocopy_sg_from_iter. The charging in kernel is
      excessive because data is not being copied into skb frags. This
      excessive charging can lead to kernel going into memory pressure
      state which impacts all sockets in the system adversely. Mark pure
      zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
      charge/uncharge for data in such skbs.
      
      Initially, an skb is marked pure zerocopy when it is empty and in
      zerocopy path. skb can then change from a pure zerocopy skb to mixed
      data skb (zerocopy and copy data) if it is at tail of write queue and
      there is room available in it and non-zerocopy data is being sent in
      the next sendmsg call. At this time sk_mem_charge is done for the pure
      zerocopied data and the pure zerocopy flag is unmarked. We found that
      this happens very rarely on workloads that pass MSG_ZEROCOPY.
      
      A pure zerocopy skb can later be coalesced into normal skb if they are
      next to each other in queue but this patch prevents coalescing from
      happening. This avoids complexity of charging when skb downgrades from
      pure zerocopy to mixed. This is also rare.
      
      In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
      for SKB_TRUESIZE(skb_end_offset(skb)) is done for sk_mem_charge in
      tcp_skb_entail for an skb without data.
      
      Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
      with zerocopy showed that before this patch the 'sock' variable in
      memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
      sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
      change it is 0. This is due to no charge to sk_forward_alloc for
      zerocopy data and shows memory utilization for kernel is lowered.
      
      With this commit we don't see the warning we saw in previous commit
      which resulted in commit 84882cf7.
      Signed-off-by: Talal Ahmad <talalahmad@google.com>
      Acked-by: Arjun Roy <arjunroy@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b65b17d
    • net:ipv6:Remove unneeded semicolon · acaea0d5
      Zhang Mingyu authored
      Eliminate the following coccinelle check warning:
      net/ipv6/seg6.c:381:2-3
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      acaea0d5
    • NFC: add necessary privilege flags in netlink layer · aedddb4e
      Lin Ma authored
      The CAP_NET_ADMIN checks are needed to prevent attackers from faking a
      device under NCIUARTSETDRIVER and exploiting privileged commands.

      This patch adds GENL_ADMIN_PERM flags in genl_ops to fulfill the check,
      except for commands like NFC_CMD_GET_DEVICE, NFC_CMD_GET_TARGET,
      NFC_CMD_LLC_GET_PARAMS, and NFC_CMD_GET_SE, which are mainly
      information-read operations.
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aedddb4e
    • Merge branch 'sctp-=security-hook-fixes' · 2bd080b0
      David S. Miller authored
      Xin Long says:
      
      ====================
      security: fixups for the security hooks in sctp
      
      There are a couple of problems in the current security hooks in sctp:
      
      1. The hooks incorrectly treat sctp_endpoint in SCTP as request_sock in
         TCP, while it's in fact no more than an extension of the sock, and
         represents the local host. It is created when sock is created, not
         when a conn request comes. sctp_association is actually the correct
         one to represent the connection, and created when a conn request
         arrives.
      
      2. security_sctp_assoc_request() hook should also be called in processing
         COOKIE ECHO, as that's the place where the real assoc is created and
         used in the future.
      
      The problems above may cause the accept sk, peeloff sk or client sk to
      have incorrect security labels.
      
      So this patchset is to change some hooks and pass asoc into them and save
      these secids into asoc, as well as add the missing sctp_assoc_request
      hook into the COOKIE ECHO processing.
      
      v1->v2:
        - See each patch, and thanks for the help from Ondrej, Paul and Richard.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2bd080b0
    • security: implement sctp_assoc_established hook in selinux · e7310c94
      Xin Long authored
      Unlike selinux_inet_conn_established(),
      selinux_sctp_assoc_established() also saves the secid to
      asoc->peer_secid, as one UDP-type socket may have more than one asoc.

      Note that peer_secid in asoc will save the peer secid for this
      asoc connection, and peer_sid in sksec will just keep the peer
      secid for the latest connection. So the right approach is to do a
      peeloff for a UDP-type socket if there will be multiple asocs in
      one socket, so that the peeloff socket has the right label for
      its asoc.
      
      v1->v2:
        - call selinux_inet_conn_established() to reduce some code
          duplication in selinux_sctp_assoc_established(), as Ondrej
          suggested.
        - when doing peeloff, it calls sock_create() where it actually
          gets secid for socket from socket_sockcreate_sid(). So reuse
          SECSID_WILD to ensure the peeloff socket keeps using that
          secid after calling selinux_sctp_sk_clone() for client side.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7310c94
    • Xin Long's avatar
      security: add sctp_assoc_established hook · 7c2ef024
      Xin Long authored
      security_sctp_assoc_established() is added to replace the
      security_inet_conn_established() call in sctp_sf_do_5_1E_ca(), so
      that the security subsystem can access asoc and save the peer secid
      to asoc->peer_secid.
      
      v1->v2:
        - fix the return value of security_sctp_assoc_established() in
          security.h, found by kernel test robot and Ondrej.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c2ef024
    • Xin Long's avatar
      security: call security_sctp_assoc_request in sctp_sf_do_5_1D_ce · e215dab1
      Xin Long authored
      The asoc created when the INIT chunk is received is a temporary one; it
      is deleted after the INIT_ACK chunk is sent in reply. So
      security_sctp_assoc_request() should also be called for the real asoc
      created in sctp_sf_do_5_1D_ce() when the COOKIE_ECHO chunk is received.
      
      v1->v2:
        - fix some typo and grammar errors, noticed by Ondrej.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e215dab1
    • Xin Long's avatar
      security: pass asoc to sctp_assoc_request and sctp_sk_clone · c081d53f
      Xin Long authored
      This patch moves secid and peer_secid from the endpoint to the
      association, and passes asoc instead of ep to sctp_assoc_request and
      sctp_sk_clone. ep is the local endpoint while asoc represents a
      connection, and in SCTP one sk/ep can have multiple asocs/connections,
      so saving secid/peer_secid in ep for a new asoc overwrites the values
      of the old asoc.
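
      The overwrite problem can be illustrated with a minimal user-space
      sketch. The demo_* types and function below are hypothetical and much
      simpler than the kernel's sctp_endpoint and sctp_association; they only
      model where the peer secid is stored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified types; the real structures live in the kernel. */
struct demo_endpoint {
	uint32_t peer_secid;	/* old placement: one slot shared by every connection */
};

struct demo_assoc {
	struct demo_endpoint *ep;
	uint32_t peer_secid;	/* new placement: one slot per connection */
};

/* Record the peer's secid for a newly requested association. */
void demo_assoc_request(struct demo_assoc *asoc, uint32_t peer_secid)
{
	asoc->ep->peer_secid = peer_secid;	/* old scheme: last writer wins */
	asoc->peer_secid = peer_secid;		/* new scheme: kept per association */
}
```

      With two associations on one endpoint, the endpoint slot only holds the
      secid of the most recent request, while each association keeps its own.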
      
      Note that since asoc can be passed as NULL, security_sctp_assoc_request()
      is moved to the place right after the new_asoc is created in
      sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init().
      
      v1->v2:
        - fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed.
        - fix the annotation in selinux_sctp_assoc_request(), as Richard noticed.
      
      Fixes: 72e89f50 ("security: Add support for SCTP security hooks")
      Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
      Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
      Tested-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c081d53f
    • David S. Miller's avatar
      Merge branch 'kselftests-net-missing' · 843c3cbb
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      kselftests/net: add missed tests to Makefile
      
      When installing the selftests to another folder, some tests are missing
      because they are not listed in the Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
            TARGETS="net" INSTALL_PATH=/tmp/kselftests
      
      This patchset adds them in separate patches so that each patch carries
      few Fixes tags. That should also make stable tree or downstream
      backports easier.

      If you think there is no need to add Fixes tags for this minor issue,
      I can repost a new patch that merges all the fixes together.
      
      Thanks
      
      v3: no update, just rebase to latest net tree.
      v2: move toeplitz.sh/toeplitz_client.sh under TEST_PROGS_EXTENDED.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      843c3cbb
    • Hangbin Liu's avatar
      kselftests/net: add missed toeplitz.sh/toeplitz_client.sh to Makefile · 17b67370
      Hangbin Liu authored
      When generating the selftests to another folder, the toeplitz.sh
      and toeplitz_client.sh are missing as they are not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
            TARGETS="net" INSTALL_PATH=/tmp/kselftests
      
      Put them under TEST_PROGS_EXTENDED, as they test NIC hardware features
      and are not intended to be run from kselftests.
      
      Fixes: 5ebfb4cc ("selftests/net: toeplitz test")
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17b67370
    • Hangbin Liu's avatar
      kselftests/net: add missed vrf_strict_mode_test.sh test to Makefile · 8883deb5
      Hangbin Liu authored
      When generating the selftests to another folder, the
      vrf_strict_mode_test.sh test is missing as it is not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
            TARGETS="net" INSTALL_PATH=/tmp/kselftests
      
      Fixes: 8735e6ea ("selftests: add selftest for the VRF strict mode")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8883deb5
    • Hangbin Liu's avatar
      kselftests/net: add missed SRv6 tests · 653e7f19
      Hangbin Liu authored
      When generating the selftests to another folder, the SRv6 tests are
      missing as they are not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
            TARGETS="net" INSTALL_PATH=/tmp/kselftests
      
      Fixes: 03a0b567 ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
      Fixes: 2195444e ("selftests: add selftest for the SRv6 End.DT4 behavior")
      Fixes: 2bc03553 ("selftests: add selftest for the SRv6 End.DT6 (VRF) behavior")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      653e7f19
    • Hangbin Liu's avatar
      kselftests/net: add missed setup_loopback.sh/setup_veth.sh to Makefile · b99ac184
      Hangbin Liu authored
      When generating the selftests to another folder, the include files
      setup_loopback.sh/setup_veth.sh for gro.sh/gre_gro.sh are missing as
      they are not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
            TARGETS="net" INSTALL_PATH=/tmp/kselftests
      
      Fixes: 7d157501 ("selftests/net: GRO coalesce test")
      Fixes: 9af771d2 ("selftests/net: allow GRO coalesce test on veth")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b99ac184
    • Hangbin Liu's avatar
      kselftests/net: add missed icmp.sh test to Makefile · ca3676f9
      Hangbin Liu authored
      When generating the selftests to another folder, the icmp.sh test is
      missing as it is not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
            TARGETS="net" INSTALL_PATH=/tmp/kselftests
      
      Fixes: 7e9838b7 ("selftests/net: Add icmp.sh for testing ICMP dummy address responses")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca3676f9
    • Jiapeng Chong's avatar
      amt: Remove duplicate include · a4414341
      Jiapeng Chong authored
      Clean up the following includecheck warning:
      
      ./drivers/net/amt.c: net/protocol.h is included more than once.

      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Reviewed-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4414341
    • Yang Yingliang's avatar
      amt: fix error return code in amt_init() · db243434
      Yang Yingliang authored
      Return error code when alloc_workqueue()
      fails in amt_init().
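
      The pattern behind this kind of fix can be sketched in user space. The
      demo_* names below are hypothetical and the allocator is a stand-in for
      alloc_workqueue(); this is not the amt driver's code, only an
      illustration of setting the error code before jumping to the unwind
      path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static void *demo_wq;

/* Stand-in for an allocator that can fail; 'fail' forces a NULL return. */
static void *demo_alloc_workqueue(int fail)
{
	return fail ? NULL : malloc(1);
}

int demo_init(int fail_alloc)
{
	int err = 0;	/* an earlier setup step succeeded and left err == 0 */

	demo_wq = demo_alloc_workqueue(fail_alloc);
	if (!demo_wq) {
		err = -ENOMEM;	/* without this, the failure path would return 0 */
		goto out;
	}

	return 0;

out:
	/* earlier setup steps would be unwound here */
	return err;
}

void demo_exit(void)
{
	free(demo_wq);
	demo_wq = NULL;
}
```

      Calling demo_init(1) reports -ENOMEM to the caller, whereas leaving out
      the assignment would make the failed initialization look successful.
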

      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Taehee Yoo <ap420073@gmail.com>
      Link: https://lore.kernel.org/r/20211102130353.1666999-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      db243434
    • Shay Agroskin's avatar
      MAINTAINERS: Update ENA maintainers information · 18635d52
      Shay Agroskin authored
      The ENA driver is no longer maintained by Netanel and Guy

      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Link: https://lore.kernel.org/r/20211102110358.193920-1-shayagr@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      18635d52
    • Eric Dumazet's avatar
      net: add and use skb_unclone_keeptruesize() helper · c4777efa
      Eric Dumazet authored
      While commit 097b9146 ("net: fix up truesize of cloned
      skb in skb_prepare_for_shift()") fixed immediate issues found
      when KFENCE was enabled/tested, there are still similar issues,
      when tcp_trim_head() hits KFENCE while the master skb
      is cloned.
      
      This happens under heavy networking TX workloads,
      when the TX completion might be delayed after incoming ACK.
      
      This patch fixes the WARNING in sk_stream_kill_queues
      when sk->sk_mem_queued/sk->sk_forward_alloc are not zero.
      
      Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Link: https://lore.kernel.org/r/20211102004555.1359210-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c4777efa
    • Geert Uytterhoeven's avatar
      net: marvell: prestera: Add explicit padding · 236f57fe
      Geert Uytterhoeven authored
      On m68k:
      
          In function ‘prestera_hw_build_tests’,
      	inlined from ‘prestera_hw_switch_init’ at drivers/net/ethernet/marvell/prestera/prestera_hw.c:788:2:
          ././include/linux/compiler_types.h:335:38: error: call to ‘__compiletime_assert_345’ declared with attribute error: BUILD_BUG_ON failed: sizeof(struct prestera_msg_switch_attr_req) != 16
          ...
      
      The driver assumes structure members are naturally aligned, but does not
      add explicit padding, thus breaking architectures where integral values
      are not always naturally aligned (e.g. on m68k, __alignof(int) is 2, not
      4).
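
      A minimal sketch of the idea, assuming a hypothetical 16-byte message
      layout (demo_msg_req is not the real prestera structure): explicit pad
      bytes make the layout independent of the ABI's alignment rules, and
      compile-time assertions check it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout, used only to illustrate explicit padding. */
struct demo_msg_req {
	uint32_t cmd;
	uint8_t  attr;
	uint8_t  pad[3];	/* explicit padding: 'value' sits at offset 8 on every ABI */
	uint32_t value;
	uint32_t reserved;
};

/* Fail the build if the layout differs from what the peer expects. */
_Static_assert(sizeof(struct demo_msg_req) == 16,
	       "demo_msg_req must be 16 bytes");
_Static_assert(offsetof(struct demo_msg_req, value) == 8,
	       "value must start at offset 8");

size_t demo_msg_req_size(void)
{
	return sizeof(struct demo_msg_req);
}
```

      Without the pad member, an ABI that aligns 32-bit integers to 2 bytes
      would place value at offset 6 and shrink the structure to 14 bytes.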
      
      Fixes: bb5dbf2c ("net: marvell: prestera: add firmware v4.0 support")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20211102082433.3820514-1-geert@linux-m68k.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      236f57fe
    • Wan Jiabing's avatar
      bnxt_en: avoid newline at end of message in NL_SET_ERR_MSG_MOD · 6ab9f57a
      Wan Jiabing authored
      Fix the following coccicheck warning:
      ./drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:446:8-56: WARNING
      avoid newline at end of message in NL_SET_ERR_MSG_MOD.

      Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
      Link: https://lore.kernel.org/r/20211102020312.16567-1-wanjiabing@vivo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6ab9f57a
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 71229d04
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      1) Fix mac address UAF reported by KASAN in nfnetlink_queue,
         from Florian Westphal.
      
      2) Autoload genetlink IPVS on demand, from Thomas Weissschuh.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        ipvs: autoload ipvs on genl access
        netfilter: nfnetlink_queue: fix OOB when mac header was cleared
      ====================
      
      Link: https://lore.kernel.org/r/20211101221528.236114-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      71229d04
    • Maxim Kiselev's avatar
      net: davinci_emac: Fix interrupt pacing disable · d52bcb47
      Maxim Kiselev authored
      This patch allows 0 to be used for the `coal->rx_coalesce_usecs`
      parameter to disable rx irq coalescing.

      Previously we could enable rx irq coalescing via ethtool
      (e.g. `ethtool -C eth0 rx-usecs 2000`) but we couldn't disable
      it because this check rejects a value of 0:
      
             if (!coal->rx_coalesce_usecs)
                     return -EINVAL;
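
      One way to treat zero as "disable" rather than as an error is sketched
      below. The demo_* types and the pacing_enabled flag are hypothetical
      stand-ins for the ethtool request and the driver state; the sketch does
      not reproduce the driver's register writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethtool request and driver state. */
struct demo_coalesce {
	uint32_t rx_coalesce_usecs;
};

struct demo_priv {
	uint32_t coal_intvl;	/* requested interval in microseconds */
	int pacing_enabled;	/* models the hardware pacing enable bit */
};

/* Accept 0 as "disable pacing" instead of rejecting it with -EINVAL. */
int demo_set_coalesce(struct demo_priv *priv, const struct demo_coalesce *coal)
{
	if (!coal->rx_coalesce_usecs) {
		priv->coal_intvl = 0;
		priv->pacing_enabled = 0;
		return 0;
	}

	priv->coal_intvl = coal->rx_coalesce_usecs;
	priv->pacing_enabled = 1;
	return 0;
}
```

      A request with rx_coalesce_usecs set to 2000 enables pacing in this
      model, and a later request with 0 turns it off again.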
      
      Fixes: 84da2658 ("TI DaVinci EMAC : Implement interrupt pacing functionality.")
      Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Link: https://lore.kernel.org/r/20211101152343.4193233-1-bigunclemax@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d52bcb47
    • Yuiko Oshino's avatar
      net: phy: microchip_t1: add lan87xx_config_rgmii_delay for lan87xx phy · 26499499
      Yuiko Oshino authored
      Add a function to initialize phy rgmii delay according to phydev->interface.

      Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20211101162119.29275-1-yuiko.oshino@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      26499499