1. 20 Oct, 2021 9 commits
    • net: hns3: disable sriov before unload hclge layer · 0dd8a25f
      Peng Li authored
      HNS3 driver includes hns3.ko, hnae3.ko and hclge.ko.
      hns3.ko includes network stack and pci_driver, hclge.ko includes
      HW device action, algo_ops and timer task, hnae3.ko includes some
      register function.
      
      When SRIOV is enabled and hclge.ko is removed, the HW device is unloaded
      but the VFs still exist. The PF will no longer reply to VF mbx messages,
      which causes errors.
      
      This patch fixes it by disabling SRIOV before removing hclge.ko.
      
      Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix vf reset workqueue cannot exit · 1385cc81
      Yufeng Mo authored
      The task of VF reset is performed through the workqueue. It checks the
      value of hdev->reset_pending to determine whether to exit the loop.
      However, the value of hdev->reset_pending may also be assigned by
      the interrupt function hclgevf_misc_irq_handle(), which may cause the
      loop to fail to exit and keep occupying the workqueue. This loop is not
      necessary, so remove it; the workqueue will be rescheduled if the
      reset needs to be retried or a new reset occurs.
      
      Fixes: 1cc9bc6e ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: schedule the polling again when allocation fails · 68752b24
      Yunsheng Lin authored
      Currently, when an rx page allocation fails, polling may be
      stopped if there are no more packets to be received, which may
      cause a queue stall under memory pressure.
      
      This patch makes sure polling is scheduled again when there is
      any rx page allocation failure, and polling will try to allocate
      receive buffers until it succeeds.
      
      Now that the allocation retry is added, it is unnecessary to do the rx
      page allocation at the end of rx cleaning, so remove it. And reset
      the unused_count to zero after calling hns3_nic_alloc_rx_buffers()
      to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under
      memory pressure.
      
      Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
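A minimal user-space sketch of the rescheduling idea described in this commit message, not the driver's code: the poll routine reports the full budget whenever a buffer allocation failed, so a NAPI-style poller keeps running until the refill succeeds. The names `rx_ring_sim`, `refill_sim` and `poll_sim`, and the simulated page counter, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an rx ring; not the hns3 data structure. */
struct rx_ring_sim {
	int pending_pkts; /* received packets waiting to be cleaned */
	int need_buf;     /* descriptors that still need a buffer */
	int free_pages;   /* simulated page allocator capacity */
};

/* Attach buffers to waiting descriptors; returns true if allocation failed. */
bool refill_sim(struct rx_ring_sim *ring)
{
	while (ring->need_buf > 0) {
		if (ring->free_pages == 0)
			return true;
		ring->free_pages--;
		ring->need_buf--;
	}
	return false;
}

/* Returns the work done, or the full budget to request another poll. */
int poll_sim(struct rx_ring_sim *ring, int budget)
{
	bool failure;
	int done = 0;

	assert(budget > 0);

	/* Retry the refill first, so earlier failures are not forgotten. */
	failure = refill_sim(ring);

	while (done < budget && ring->pending_pkts > 0) {
		ring->pending_pkts--;
		ring->need_buf++; /* the cleaned descriptor needs a new buffer */
		done++;
	}

	/* Reporting the full budget keeps a NAPI-style poller scheduled. */
	return failure ? budget : done;
}
```

With no packets pending and no pages available, `poll_sim()` still returns the full budget, which is what prevents the stall. Once pages become available again, the next poll refills the ring and reports the real amount of work.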
    • net: hns3: fix for miscalculation of rx unused desc · 9f9f0f19
      Yunsheng Lin authored
      An rx unused desc is a desc that needs a new buffer attached
      before it is refilled to hw to receive a new packet. The number of
      descs that need a new buffer is calculated using next_to_use
      and next_to_clean. When next_to_use == next_to_clean, the hns3
      driver currently assumes that all the descs have a buffer attached,
      but 'next_to_use == next_to_clean' can also mean that all the descs
      need a new buffer, if hw has consumed all the descs and the
      driver has not attached any buffer to them yet.
      
      This patch adds 'refill' in desc_cb to indicate whether a new
      buffer has been refilled to a desc.
      
      Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
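The ambiguity described in this commit message can be illustrated with a small sketch. It assumes a fixed-size ring and a per-descriptor `refill` flag as named in the message. The types `ring_sim` and `desc_cb_sim`, the function `desc_unused_sim` and the ring size are hypothetical, and this is not necessarily how the driver implements it.

```c
#include <assert.h>
#include <stdbool.h>

#define DESC_NUM 8 /* illustrative ring size */

struct desc_cb_sim {
	bool refill; /* true once a new buffer has been attached */
};

struct ring_sim {
	int next_to_use;   /* next desc the driver will attach a buffer to */
	int next_to_clean; /* next desc the driver will clean after hw use */
	struct desc_cb_sim desc_cb[DESC_NUM];
};

/* Number of descs that still need a new buffer. */
int desc_unused_sim(const struct ring_sim *ring)
{
	int ntc = ring->next_to_clean;
	int ntu = ring->next_to_use;

	assert(ntc >= 0 && ntc < DESC_NUM && ntu >= 0 && ntu < DESC_NUM);

	/* Equal indices with no buffer attached: the whole ring is unused. */
	if (ntc == ntu && !ring->desc_cb[ntc].refill)
		return DESC_NUM;

	/* Otherwise count the descs from next_to_use up to next_to_clean. */
	return ((ntc >= ntu) ? 0 : DESC_NUM) + ntc - ntu;
}
```

Without the `refill` check, the case where both indices are equal would always report zero unused descs, even when hw has consumed every desc and none has been given a buffer.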
    • net: hns3: fix the max tx size according to user manual · adfb7b49
      Yunsheng Lin authored
      Currently the max tx size supported by the hw is calculated by
      using the max BD num supported by the hw. According to the hw
      user manual, the max tx size is a fixed value for both non-TSO and
      TSO skb.
      
      This patch updates the max tx size according to the manual.
      
      Fixes: 8ae10cfb ("net: hns3: support tx-scatter-gather-fraglist feature")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add limit ets dwrr bandwidth cannot be 0 · 731797fd
      Guangbin Huang authored
      If the ets dwrr bandwidth of a tc is set to 0, the hardware will switch to SP
      mode. In this case, this tc may occupy all the tx bandwidth if it has
      huge traffic, which violates the purpose of the user setting.
      
      To fix this problem, require the ets dwrr bandwidth to be greater than 0.
      
      Fixes: cacde272 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
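A hedged sketch of the kind of check this commit message describes: reject an ETS configuration in which a TC that uses DWRR scheduling has a bandwidth of 0. The enum `tsa_sim`, the function `ets_validate_sim` and the constant `MAX_TC_SIM` are hypothetical names chosen for illustration, not the driver's API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_TC_SIM 8 /* illustrative upper bound on traffic classes */

enum tsa_sim {
	TSA_STRICT_SIM, /* strict priority (SP) */
	TSA_ETS_SIM     /* ETS, scheduled by DWRR weight */
};

/* Returns 0 if the configuration is acceptable, -EINVAL otherwise. */
int ets_validate_sim(const enum tsa_sim *tsa, const uint8_t *bw,
		     unsigned int num_tc)
{
	unsigned int i;

	assert(num_tc <= MAX_TC_SIM);

	for (i = 0; i < num_tc; i++) {
		/* A zero weight would silently turn this TC into SP mode. */
		if (tsa[i] == TSA_ETS_SIM && bw[i] == 0)
			return -EINVAL;
	}

	return 0;
}
```

Strict-priority TCs are skipped here because their bandwidth field carries no DWRR weight. Only ETS TCs are required to have a non-zero value.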
    • net: hns3: reset DWRR of unused tc to zero · b63fcaab
      Guangbin Huang authored
      Currently, the DWRR of a tc is initialized to a fixed value when the tc
      is enabled, but it is not reset to 0 when the tc is disabled. As a
      result, the DWRR of an unused tc is not 0 after using the tc tool
      to add and delete multi-tc parameters.
      
      For example, after enabling 4 TCs and restoring to 1 TC with the following
      tc commands:
      
      $ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
        8@0 8@8 8@16 8@24 hw 1 mode channel
      $ tc qdisc del dev eth0 root
      
      Now only one TC is enabled for eth0, but the tc info queried through
      debugfs is shown as follows:
      
      $ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
      enabled tc number: 1
      weight_offset: 14
      TC    MODE  WEIGHT
      0     dwrr    100
      1     dwrr    100
      2     dwrr    100
      3     dwrr    100
      4     dwrr      0
      5     dwrr      0
      6     dwrr      0
      7     dwrr      0
      
      This patch fixes it by resetting DWRR of tc to 0 when tc is disabled.
      
      Fixes: 84844054 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
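The intended behaviour can be sketched as follows, assuming every enabled TC gets the same fixed weight (100, as in the debugfs dump in this commit message) and every disabled TC is reset to 0. The function `tc_dwrr_update_sim` and both constants are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TC_SIM 8         /* illustrative number of TC slots */
#define DEFAULT_DWRR_SIM 100 /* weight shown for enabled TCs in the dump */

/* Recompute the DWRR weights after the number of enabled TCs changes. */
void tc_dwrr_update_sim(uint8_t dwrr[MAX_TC_SIM], unsigned int num_tc)
{
	unsigned int i;

	assert(num_tc <= MAX_TC_SIM);

	for (i = 0; i < MAX_TC_SIM; i++) {
		/* Enabled TCs get the fixed weight; unused ones go back to 0. */
		dwrr[i] = (i < num_tc) ? DEFAULT_DWRR_SIM : 0;
	}
}
```

Replaying the scenario from the message (4 TCs enabled, then back to 1) leaves only TC 0 with a weight of 100 in this model, instead of the stale weights on TCs 1 to 3.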
    • net: hns3: Add configuration of TM QCN error event · 60484103
      Jiaran Zhang authored
      Add configuration of the interrupt type and the fifo interrupt enable of the
      TM QCN error event when it is enabled; otherwise this event will not be
      reported when an error occurs.
      
      Fixes: d914971d ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
      Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: Revert "Reset skb conntrack connection..." · 55161e67
      Eugene Crosser authored
      This reverts commit 09e856d5.
      
      When an interface is enslaved in a VRF, prerouting conntrack hook is
      called twice: once in the context of the original input interface, and
      once in the context of the VRF interface. If no special precautions are
      taken, this leads to creation of two conntrack entries instead of one,
      and breaks SNAT.
      
      The commit above was intended to avoid creation of extra conntrack entries
      when input interface is enslaved in a VRF. It did so by resetting
      conntrack related data associated with the skb when it enters VRF context.
      
      However, it breaks netfilter operation. Imagine a use case where the conntrack
      zone must be assigned based on the original input interface, rather than
      VRF interface (that would make original interfaces indistinguishable). One
      could create netfilter rules similar to these:
      
              chain rawprerouting {
                      type filter hook prerouting priority raw;
                      iif realiface1 ct zone set 1 return
                      iif realiface2 ct zone set 2 return
              }
      
      This works before the mentioned commit, but not after: zone assignment
      is "forgotten", and any subsequent NAT or filtering that is dependent
      on the conntrack zone does not work.
      
      Here is a reproducer script that demonstrates the difference in behaviour.
      
      ==========
      #!/bin/sh
      
      # This script demonstrates unexpected change of nftables behaviour
      # caused by commit 09e856d5 "vrf: Reset skb conntrack
      # connection on VRF rcv"
      # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
      #
      # Before the commit, it was possible to assign conntrack zone to a
      # packet (or mark it for `notracking`) in the prerouting chain, raw
      # priority, based on the `iif` (interface from which the packet
      # arrived).
      # After the change, if the interface is enslaved in a VRF, such
      # assignment is lost. Instead, assignment based on the `iif` matching
      # the VRF master interface is honored. Thus it is impossible to
      # distinguish packets based on the original interface.
      #
      # This script demonstrates this change of behaviour: conntrack zone 1
      # or 2 is assigned depending on the match with the original interface
      # or the vrf master interface. It can be observed that conntrack entry
      # appears in different zone in the kernel versions before and after
      # the commit.
      
      IPIN=172.30.30.1
      IPOUT=172.30.30.2
      PFXL=30
      
      ip li sh vein >/dev/null 2>&1 && ip li del vein
      ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
      nft list table testct >/dev/null 2>&1 && nft delete table testct
      
      ip li add vein type veth peer veout
      ip li add tvrf type vrf table 9876
      ip li set veout master tvrf
      ip li set vein up
      ip li set veout up
      ip li set tvrf up
      /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
      /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
      ip addr add $IPIN/$PFXL dev vein
      ip addr add $IPOUT/$PFXL dev veout
      
      nft -f - <<__END__
      table testct {
      	chain rawpre {
      		type filter hook prerouting priority raw;
      		iif { veout, tvrf } meta nftrace set 1
      		iif veout ct zone set 1 return
      		iif tvrf ct zone set 2 return
      		notrack
      	}
      	chain rawout {
      		type filter hook output priority raw;
      		notrack
      	}
      }
      __END__
      
      uname -rv
      conntrack -F
      ping -W 1 -c 1 -I vein $IPOUT
      conntrack -L
      
      Signed-off-by: Eugene Crosser <crosser@average.org>
      Acked-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Oct, 2021 5 commits
  3. 18 Oct, 2021 12 commits
  4. 17 Oct, 2021 11 commits
  5. 16 Oct, 2021 3 commits
    • net: bridge: mcast: use multicast_membership_interval for IGMPv3 · fac3cb82
      Nikolay Aleksandrov authored
      When I added IGMPv3 support I decided to follow the RFC for computing
      the GMI dynamically:
      " 8.4. Group Membership Interval
      
         The Group Membership Interval is the amount of time that must pass
         before a multicast router decides there are no more members of a
         group or a particular source on a network.
      
         This value MUST be ((the Robustness Variable) times (the Query
         Interval)) plus (one Query Response Interval)."
      
      But that is inconsistent with how the bridge computes it for IGMPv2,
      where the GMI is a user-configurable value that has a correct default
      and is left to user space to maintain. Using multicast_membership_interval
      for IGMPv3 as well makes it consistent with the other timer values, which
      are also kept correct by the user instead of being dynamically computed.
      It also restores the GMI behaviour users previously expected for IGMPv3
      queries, which were supported before IGMPv3 support was added. Note that
      computing it dynamically in a proper way would require support for the
      "Robustness Variable", which is currently missing.
      
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: 0436862e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
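For reference, the RFC formula quoted in this commit message can be written as a small helper. This only restates the quoted definition. The function name `igmpv3_gmi_sim` is hypothetical and is not part of the bridge code, which after this change uses the user-configured multicast_membership_interval instead.

```c
#include <assert.h>

/*
 * Group Membership Interval as defined in the quoted RFC text:
 * (Robustness Variable * Query Interval) + Query Response Interval.
 * Both intervals must use the same time unit; the result uses that unit.
 */
unsigned int igmpv3_gmi_sim(unsigned int robustness,
			    unsigned int query_interval,
			    unsigned int query_response_interval)
{
	/* RFC 3376 requires a non-zero Robustness Variable. */
	assert(robustness > 0);

	return robustness * query_interval + query_response_interval;
}
```

With the RFC 3376 default values (Robustness Variable 2, Query Interval 125 seconds, Query Response Interval 10 seconds), `igmpv3_gmi_sim(2, 125, 10)` evaluates to 260 seconds.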
    • vsock_diag_test: remove free_sock_stat() call in test_no_sockets · ba95a622
      Stefano Garzarella authored
      In `test_no_sockets` we don't expect any sockets. Indeed,
      check_no_sockets() prints an error and exits if the `sockets` list is
      not empty, so the free_sock_stat() call is unnecessary: it would
      only be reached when the `sockets` list is empty.
      
      This was discovered by a strange warning printed by gcc v11.2.1:
        In file included from ../../include/linux/list.h:7,
                         from vsock_diag_test.c:18:
        vsock_diag_test.c: In function ‘test_no_sockets’:
        ../../include/linux/kernel.h:35:45: error: array subscript ‘struct vsock_stat[0]’ is partly outside array bounds of ‘struct list_head[1]’ [-Werror=array-bounds]
           35 |         const typeof(((type *)0)->member) * __mptr = (ptr);     \
              |                                             ^~~~~~
        ../../include/linux/list.h:352:9: note: in expansion of macro ‘container_of’
          352 |         container_of(ptr, type, member)
              |         ^~~~~~~~~~~~
        ../../include/linux/list.h:393:9: note: in expansion of macro ‘list_entry’
          393 |         list_entry((pos)->member.next, typeof(*(pos)), member)
              |         ^~~~~~~~~~
        ../../include/linux/list.h:522:21: note: in expansion of macro ‘list_next_entry’
          522 |                 n = list_next_entry(pos, member);                       \
              |                     ^~~~~~~~~~~~~~~
        vsock_diag_test.c:325:9: note: in expansion of macro ‘list_for_each_entry_safe’
          325 |         list_for_each_entry_safe(st, next, sockets, list) {
              |         ^~~~~~~~~~~~~~~~~~~~~~~~
        In file included from vsock_diag_test.c:18:
        vsock_diag_test.c:333:19: note: while referencing ‘sockets’
          333 |         LIST_HEAD(sockets);
              |                   ^~~~~~~
        ../../include/linux/list.h:23:26: note: in definition of macro ‘LIST_HEAD’
           23 |         struct list_head name = LIST_HEAD_INIT(name)
      
      It seems related to some compiler optimization and assumption
      about the empty `sockets` list, since this warning is printed
      only with -O2 or -O3. Also, removing `exit(1)` from
      check_no_sockets() makes the warning disappear, since in that
      case free_sock_stat() can also be reached when the list is
      not empty.
      
      Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20211014152045.173872-1-sgarzare@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 2151135a
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-10-14
      
      Brett ensures RDMA nodes are removed during release and rebuild. He also
      corrects fw.mgmt.api to include the patch number for proper
      identification.
      
      Dave stops ida_free() being called when an IDA has not been allocated.
      
      Michal corrects the order of parameters being provided and the number of
      entries skipped for UDP tunnels.
      ====================
      
      Link: https://lore.kernel.org/r/20211014181953.3538330-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>