1. 20 Apr, 2022 30 commits
    • David S. Miller's avatar
      Merge branch 'net-sched-flower-num-vlan-tags' · c1f6f1e6
      David S. Miller authored
      Boris Sukholitko says:
      
      ====================
      net/sched: flower: match on the number of vlan tags
      
      Our customers in the fiber telecom world have network configurations
      where they would like to control their traffic according to the number
      of tags appearing in the packet.
      
      For example, TR247 GPON conformance test suite specification mostly
      talks about untagged, single, double tagged packets and gives lax
      guidelines on the vlan protocol vs. number of vlan tags.
      
      This is different from the common IT networks where 802.1Q and 802.1ad
      protocols are usually describe single and double tagged packet. GPON
      configurations that we work with have arbitrary mix the above protocols
      and number of vlan tags in the packet.
      
      The following patch series implement number of vlans flower filter. They
      add num_of_vlans flower filter as an alternative to vlan ethtype protocol
      matching. The end result is that the following command becomes possible:
      
      tc filter add dev eth1 ingress flower \
        num_of_vlans 1 vlan_prio 5 action drop
      
      Also, from our logs, we have redirect rules such that:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N \
           action mirred egress redirect dev $DEV
      
      where N can range from 0 to 3 and $DEV is the function of $N.
      
      Also there are rules setting skb mark based on the number of vlans:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
          $P action skbedit mark $M
      
      More about the patch series:
        - patches 1-2 remove duplicate code by introducing is_key_vlan
          helper.
        - patch 3, 4 implement num_of_vlans in the dissector and in the
          flower.
        - patch 5 uses the num_of_vlans filter to allow further matching on
          vlan attributes.
      
      Complementary iproute2 patches are being sent separately.
      
      Thanks,
      Boris.
      
      - v4: rebased to the latest net-next
      - v3:
          - more example commands in patch 3 description (request by Jamal)
          - patch 5 description made clearer (thanks to Jiri)
      - v2:
          - add suitable subject prefixes
          - more evolved patch 5 description
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1f6f1e6
    • Boris Sukholitko's avatar
      net/sched: flower: Consider the number of tags for vlan filters · 99fdb22b
      Boris Sukholitko authored
      Before this patch the existence of vlan filters was conditional on the vlan
      protocol being matched in the tc rule. For example, the following rule:
      
      tc filter add dev eth1 ingress flower vlan_prio 5
      
      was illegal because vlan protocol (e.g. 802.1q) does not appear in the rule.
      
      Remove the above restriction by looking at the num_of_vlans filter to
      allow further matching on vlan attributes. The following rule becomes
      legal as a result of this commit:
      
      tc filter add dev eth1 ingress flower num_of_vlans 1 vlan_prio 5
      
      because having num_of_vlans==1 implies that the packet is single tagged.
      
      Change is_vlan_key helper to look at the number of vlans in addition to
      the vlan ethertype. The outcome of this change is that outer (e.g. vlan_prio)
      and inner (e.g. cvlan_prio) tag vlan filters require the number of vlan
      tags to be greater then 0 and 1 accordingly.
      
      As a result of is_vlan_key change, the ethertype may be set to 0 when
      matching on the number of vlans. Update fl_set_key_vlan to avoid setting
      key, mask vlan_tpid for the 0 ethertype.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      99fdb22b
    • Boris Sukholitko's avatar
      net/sched: flower: Add number of vlan tags filter · b4000312
      Boris Sukholitko authored
      These are bookkeeping parts of the new num_of_vlans filter.
      Defines, dump, load and set are being done here.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4000312
    • Boris Sukholitko's avatar
      flow_dissector: Add number of vlan tags dissector · 34951fcf
      Boris Sukholitko authored
      Our customers in the fiber telecom world have network configurations
      where they would like to control their traffic according to the number
      of tags appearing in the packet.
      
      For example, TR247 GPON conformance test suite specification mostly
      talks about untagged, single, double tagged packets and gives lax
      guidelines on the vlan protocol vs. number of vlan tags.
      
      This is different from the common IT networks where 802.1Q and 802.1ad
      protocols are usually describe single and double tagged packet. GPON
      configurations that we work with have arbitrary mix the above protocols
      and number of vlan tags in the packet.
      
      The goal is to make the following TC commands possible:
      
      tc filter add dev eth1 ingress flower \
        num_of_vlans 1 vlan_prio 5 action drop
      
      From our logs, we have redirect rules such that:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N \
           action mirred egress redirect dev $DEV
      
      where N can range from 0 to 3 and $DEV is the function of $N.
      
      Also there are rules setting skb mark based on the number of vlans:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
          $P action skbedit mark $M
      
      This new dissector allows extracting the number of vlan tags existing in
      the packet.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34951fcf
    • Boris Sukholitko's avatar
    • Boris Sukholitko's avatar
      net/sched: flower: Helper function for vlan ethtype checks · 285ba06b
      Boris Sukholitko authored
      There are somewhat repetitive ethertype checks in fl_set_key. Refactor
      them into is_vlan_key helper function.
      
      To make the changes clearer, avoid touching identation levels. This is
      the job for the next patch in the series.
      Signed-off-by: default avatarBoris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      285ba06b
    • Haowen Bai's avatar
      ar5523: Use kzalloc instead of kmalloc/memset · e63dd412
      Haowen Bai authored
      Use kzalloc rather than duplicating its implementation, which
      makes code simple and easy to understand.
      Signed-off-by: default avatarHaowen Bai <baihaowen@meizu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e63dd412
    • Luiz Angelo Daros de Luca's avatar
      net: dsa: realtek: remove realtek,rtl8367s string · fcd30c96
      Luiz Angelo Daros de Luca authored
      There is no need to add new compatible strings for each new supported
      chip version. The compatible string is used only to select the subdriver
      (rtl8365mb.c or rtl8366rb.c). Once in the subdriver, it will detect the
      chip model by itself, ignoring which compatible string was used.
      
      Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/Signed-off-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: default avatarAlvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fcd30c96
    • Luiz Angelo Daros de Luca's avatar
      dt-bindings: net: dsa: realtek: cleanup compatible strings · 6f2d04cc
      Luiz Angelo Daros de Luca authored
      Compatible strings are used to help the driver find the chip ID/version
      register for each chip family. After that, the driver can setup the
      switch accordingly. Keep only the first supported model for each family
      as a compatible string and reference other chip models in the
      description.
      
      The removed compatible strings have never been used in a released kernel.
      
      CC: devicetree@vger.kernel.org
      Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/Signed-off-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: default avatarAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f2d04cc
    • David S. Miller's avatar
      Merge branch 'hns3-next' · e92453b9
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: updates for -next
      
      This series includes some updates for the HNS3 ethernet driver.
      
      Change logs:
      V1 -> V2:
       - Fix failed to apply to net-next problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e92453b9
    • Hao Chen's avatar
      net: hns3: remove unnecessary line wrap for hns3_set_tunable · 29c17cb6
      Hao Chen authored
      Remove unnecessary line wrap for hns3_set_tunable to improve
      function readability.
      Signed-off-by: default avatarHao Chen <chenhao288@hisilicon.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29c17cb6
    • Peng Li's avatar
      net: hns3: replace magic value by HCLGE_RING_REG_OFFSET · 350cb440
      Peng Li authored
      Magic values are not recommended.
      
      Signed-off-by: Peng Li<lipeng321@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      350cb440
    • Peng Li's avatar
      net: hns3: fix the wrong words in comments · 9c657cbc
      Peng Li authored
      This patch fixes wrong words in comments.
      
      Signed-off-by: Peng Li<lipeng321@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9c657cbc
    • Peng Li's avatar
      net: hns3: update the comment of function hclgevf_get_mbx_resp · 2e0f5388
      Peng Li authored
      The param of function hclgevf_get_mbx_resp has been changed but the
      comments not upodated. This patch updates it.
      
      Signed-off-by: Peng Li<lipeng321@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e0f5388
    • Hao Chen's avatar
      net: hns3: add log for setting tx spare buf size · 2373b35c
      Hao Chen authored
      For the active tx spare buffer size maybe changed according
      to the page size, so add log to notice it.
      Signed-off-by: default avatarHao Chen <chenhao288@hisilicon.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2373b35c
    • Jie Wang's avatar
      net: hns3: add failure logs in hclge_set_vport_mtu · bcc7a98f
      Jie Wang authored
      Currently, There is a low probability that pf mtu configuration fails, but
      the information in logs is insufficient for problem locating when the VF
      mtu value is illegally modified.
      
      So record the vf index and vf mtu value at the failure scenario.
      Signed-off-by: default avatarJie Wang <wangjie125@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bcc7a98f
    • Jian Shen's avatar
      net: hns3: refine the definition for struct hclge_pf_to_vf_msg · 6fde96df
      Jian Shen authored
      The struct hclge_pf_to_vf_msg is used for mailbox message from
      PF to VF, including both response and request. But its definition
      can only indicate respone, which makes the message data copy in
      function hclge_send_mbx_msg() unreadable. So refine it by edding
      a general message definition into it.
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6fde96df
    • Hao Chen's avatar
      net: hns3: refactor hns3_set_ringparam() · 07fdc163
      Hao Chen authored
      Use struct hns3_ring_param to replace variable new/old_xxx and
      add hns3_is_ringparam_changed() to judge them if is changed to
      improve code readability.
      Signed-off-by: default avatarHao Chen <chenhao288@hisilicon.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07fdc163
    • Yufeng Mo's avatar
      net: hns3: add ethtool parameter check for CQE/EQE mode · 286c61e7
      Yufeng Mo authored
      For DEVICE_VERSION_V2, the hardware does not support the CQE mode.
      So add capability bit for coalesce CQE mode and add parameter check
      for it in ethtool.
      Signed-off-by: default avatarYufeng Mo <moyufeng@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      286c61e7
    • David S. Miller's avatar
      Merge branch 'atlantic-xdp-multi-buffer' · e97e917b
      David S. Miller authored
      [PATCH net-next v5 0/3] net: atlantic: Add XDP support
      @ 2022-04-17 10:12 Taehee Yoo
        2022-04-17 10:12 ` [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane Taehee Yoo
                         ` (2 more replies)
        0 siblings, 3 replies; 4+ messages in thread
      From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw)
        To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk,
      	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
        Cc: ap420073
      
      This patchset is to make atlantic to support multi-buffer XDP.
      
      The first patch implement control plane of xdp.
      The aq_xdp(), callback of .xdp_bpf is added.
      
      The second patch implements data plane of xdp.
      XDP_TX, XDP_DROP, and XDP_PASS is supported.
      __aq_ring_xdp_clean() is added to receive and execute xdp program.
      aq_nic_xmit_xdpf() is added to send packet by XDP.
      
      The third patch implements callback of .ndo_xdp_xmit.
      aq_xdp_xmit() is added to send redirected packets and it internally
      calls aq_nic_xmit_xdpf().
      
      Memory model is MEM_TYPE_PAGE_SHARED.
      
      Order-2 page allocation is used when XDP is enabled.
      
      LRO will be disabled if XDP program doesn't supports multi buffer.
      
      AQC chip supports 32 multi-queues and 8 vectors(irq).
      There are two options.
      1. under 8 cores and maximum 4 tx queues per core.
      2. under 4 cores and maximum 8 tx queues per core.
      
      Like other drivers, these tx queues can be used only for XDP_TX,
      XDP_REDIRECT queue. If so, no tx_lock is needed.
      But this patchset doesn't use this strategy because getting hardware tx
      queue index cost is too high.
      So, tx_lock is used in the aq_nic_xmit_xdpf().
      
      single-core, single queue, 80% cpu utilization.
      
        32.30%  [kernel]                  [k] aq_get_rxpages_xdp
        10.44%  [kernel]                  [k] aq_hw_read_reg <---------- here
         9.86%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
         5.51%  [kernel]                  [k] aq_ring_rx_clean
      
      single-core, 8 queues, 100% cpu utilization, half PPS.
      
        52.03%  [kernel]                  [k] aq_hw_read_reg <---------- here
        18.24%  [kernel]                  [k] aq_get_rxpages_xdp
         4.30%  [kernel]                  [k] hw_atl_b0_hw_ring_rx_receive
         4.24%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
         2.79%  [kernel]                  [k] aq_ring_rx_clean
      
      Performance result(64 Byte)
      1. XDP_TX
        a. xdp_geieric, single core
          - 2.5Mpps, 100% cpu
        b. xdp_driver, single core
          - 4.5Mpps, 80% cpu
        c. xdp_generic, 8 core(hyper thread)
          - 6.3Mpps, 40% cpu
        d. xdp_driver, 8 core(hyper thread)
          - 6.3Mpps, 30% cpu
      
      2. XDP_REDIRECT
        a. xdp_generic, single core
          - 2.3Mpps
        b. xdp_driver, single core
          - 4.5Mpps
      
      v5:
       - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
       - Use 2K frame size instead of 3K
       - Use order-2 page allocation instead of order-0
       - Rename aq_get_rxpage() to aq_alloc_rxpages()
       - Add missing PageFree stats for ethtool
       - Remove aq_unset_rxpage_xdp(), introduced by v2 patch due to
         change of memory model
       - Fix wrong last parameter value of xdp_prepare_buff()
       - Add aq_get_rxpages_xdp() to increase page reference count
      
      v4:
       - Fix compile warning
      
      v3:
       - Change wrong PPS performance result 40% -> 80% in single
         core(Intel i3-12100)
       - Separate aq_nic_map_xdp() from aq_nic_map_skb()
       - Drop multi buffer packets if single buffer XDP is attached
       - Disable LRO when single buffer XDP is attached
       - Use xdp_get_{frame/buff}_len()
      
      v2:
       - Do not use inline in C file
      
      Taehee Yoo (3):
        net: atlantic: Implement xdp control plane
        net: atlantic: Implement xdp data plane
        net: atlantic: Implement .ndo_xdp_xmit handler
      
       .../net/ethernet/aquantia/atlantic/aq_cfg.h   |   1 +
       .../ethernet/aquantia/atlantic/aq_ethtool.c   |   9 +
       .../net/ethernet/aquantia/atlantic/aq_main.c  |  87 ++++
       .../net/ethernet/aquantia/atlantic/aq_main.h  |   2 +
       .../net/ethernet/aquantia/atlantic/aq_nic.c   | 136 ++++++
       .../net/ethernet/aquantia/atlantic/aq_nic.h   |   5 +
       .../net/ethernet/aquantia/atlantic/aq_ring.c  | 409 ++++++++++++++++--
       .../net/ethernet/aquantia/atlantic/aq_ring.h  |  21 +-
       .../net/ethernet/aquantia/atlantic/aq_vec.c   |  23 +-
       .../net/ethernet/aquantia/atlantic/aq_vec.h   |   6 +
       .../aquantia/atlantic/hw_atl/hw_atl_a0.c      |   6 +-
       .../aquantia/atlantic/hw_atl/hw_atl_b0.c      |  10 +-
       12 files changed, 670 insertions(+), 45 deletions(-)
      
      --
      2.17.1
      
      ^ permalink raw reply	[flat|nested] 4+ messages in thread
      * [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane
        2022-04-17 10:12 [PATCH net-next v5 0/3] net: atlantic: Add XDP support Taehee Yoo
      @ 2022-04-17 10:12 ` Taehee Yoo
        2022-04-17 10:12 ` [PATCH net-next v5 2/3] net: atlantic: Implement xdp data plane Taehee Yoo
        2022-04-17 10:12 ` [PATCH net-next v5 3/3] net: atlantic: Implement .ndo_xdp_xmit handler Taehee Yoo
        2 siblings, 0 replies; 4+ messages in thread
      From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw)
        To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk,
      	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
        Cc: ap420073
      
      aq_xdp() is a xdp setup callback function for Atlantic driver.
      When XDP is attached or detached, the device will be restarted because
      it uses different headroom, tailroom, and page order value.
      
      If XDP enabled, it switches default page order value from 0 to 2.
      Because the default maximum frame size is still 2K and it needs
      additional area for headroom and tailroom.
      The total size(headroom + frame size + tailroom) is 2624.
      So, 1472Bytes will be always wasted for every frame.
      But when order-2 is used, these pages can be used 6 times
      with flip strategy.
      It means only about 106Bytes per frame will be wasted.
      
      Also, It supports xdp fragment feature.
      MTU can be 16K if xdp prog supports xdp fragment.
      If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS.
      
      And a static key is added and It will be used to call the xdp_clean
      handler in ->poll(). data plane implementation will be contained
      the followed patch.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      ---
      
      v5:
       - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
       - Use 2K frame size instead of 3K
       - Use order-2 page allocation instead of order-0
       - Rename aq_get_rxpage() to aq_alloc_rxpages()
      
      v4:
       - No changed
      
      v3:
       - Disable LRO when single buffer XDP is attached
      
      v2:
       - No changed
      e97e917b
    • Taehee Yoo's avatar
      net: atlantic: Implement .ndo_xdp_xmit handler · 45638f01
      Taehee Yoo authored
      aq_xdp_xmit() is the callback function of .ndo_xdp_xmit.
      It internally calls aq_nic_xmit_xdpf() to send packet.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45638f01
    • Taehee Yoo's avatar
      net: atlantic: Implement xdp data plane · 26efaef7
      Taehee Yoo authored
      It supports XDP_PASS, XDP_DROP and multi buffer.
      
      The new function aq_nic_xmit_xdpf() is used to send packet with
      xdp_frame and internally it calls aq_nic_map_xdp().
      
      AQC chip supports 32 multi-queues and 8 vectors(irq).
      there are two option
      1. under 8 cores and 4 tx queues per core.
      2. under 4 cores and 8 tx queues per core.
      
      Like ixgbe, these tx queues can be used only for XDP_TX, XDP_REDIRECT
      queue. If so, no tx_lock is needed.
      But this patchset doesn't use this strategy because getting hardware tx
      queue index cost is too high.
      So, tx_lock is used in the aq_nic_xmit_xdpf().
      
      single-core, single queue, 80% cpu utilization.
      
        30.75%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
        10.35%  [kernel]                  [k] aq_hw_read_reg <---------- here
         4.38%  [kernel]                  [k] get_page_from_freelist
      
      single-core, 8 queues, 100% cpu utilization, half PPS.
      
        45.56%  [kernel]                  [k] aq_hw_read_reg <---------- here
        17.58%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
         4.72%  [kernel]                  [k] hw_atl_b0_hw_ring_rx_receive
      
      The new function __aq_ring_xdp_clean() is a xdp rx handler and this is
      called only when XDP is attached.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26efaef7
    • Taehee Yoo's avatar
      net: atlantic: Implement xdp control plane · 0d14657f
      Taehee Yoo authored
      aq_xdp() is a xdp setup callback function for Atlantic driver.
      When XDP is attached or detached, the device will be restarted because
      it uses different headroom, tailroom, and page order value.
      
      If XDP enabled, it switches default page order value from 0 to 2.
      Because the default maximum frame size is still 2K and it needs
      additional area for headroom and tailroom.
      The total size(headroom + frame size + tailroom) is 2624.
      So, 1472Bytes will be always wasted for every frame.
      But when order-2 is used, these pages can be used 6 times
      with flip strategy.
      It means only about 106Bytes per frame will be wasted.
      
      Also, It supports xdp fragment feature.
      MTU can be 16K if xdp prog supports xdp fragment.
      If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS.
      
      And a static key is added and It will be used to call the xdp_clean
      handler in ->poll(). data plane implementation will be contained
      the followed patch.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d14657f
    • David S. Miller's avatar
      Merge branch 'dsa-cross-chip-notifier-cleanup' · 8ab38ed7
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      DSA cross-chip notifier cleanups
      
      This patch set makes the following improvements:
      
      - Cross-chip notifiers pass a switch index, port index, sometimes tree
        index, all as integers. Sometimes we need to recover the struct
        dsa_port based on those integers. That recovery involves traversing a
        list. By passing directly a pointer to the struct dsa_port we can
        avoid that, and the indices passed previously can still be obtained
        from the passed struct dsa_port.
      
      - Resetting VLAN filtering on a switch has explicit code to make it run
        on a single switch, so it has no place to stay in the cross-chip
        notifier code. Move it out.
      
      - Changing the MTU on a user port affects only that single port, yet the
        code passes through the cross-chip notifier layer where all switches
        are notified. Avoid that.
      
      - Other related cosmetic changes in the MTU changing procedure.
      
      Apart from the slight improvement in performance given by
      (a) doing less work in cross-chip notifiers
      (b) emitting less cross-chip notifiers
      we also end up with about 100 less lines of code.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ab38ed7
    • Vladimir Oltean's avatar
      net: dsa: don't emit targeted cross-chip notifiers for MTU change · be6ff966
      Vladimir Oltean authored
      A cross-chip notifier with "targeted_match=true" is one that matches
      only the local port of the switch that emitted it. In other words,
      passing through the cross-chip notifier layer serves no purpose.
      
      Eliminate this concept by calling directly ds->ops->port_change_mtu
      instead of emitting a targeted cross-chip notifier. This leaves the
      DSA_NOTIFIER_MTU event being emitted only for MTU updates on the CPU
      port, which need to be reflected also across all DSA links.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be6ff966
    • Vladimir Oltean's avatar
      net: dsa: drop dsa_slave_priv from dsa_slave_change_mtu · 4715029f
      Vladimir Oltean authored
      We can get a hold of the "ds" pointer directly from "dp", no need for
      the dsa_slave_priv.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4715029f
    • Vladimir Oltean's avatar
      net: dsa: avoid one dsa_to_port() in dsa_slave_change_mtu · cf1c39d3
      Vladimir Oltean authored
      We could retrieve the cpu_dp pointer directly from the "dp" we already
      have, no need to resort to dsa_to_port(ds, port).
      
      This change also removes the need for an "int port", so that is also
      deleted.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf1c39d3
    • Vladimir Oltean's avatar
      net: dsa: use dsa_tree_for_each_user_port in dsa_slave_change_mtu · b2033a05
      Vladimir Oltean authored
      Use the more conventional iterator over user ports instead of explicitly
      ignoring them, and use the more conventional name "other_dp" instead of
      "dp_iter", for readability.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b2033a05
    • Vladimir Oltean's avatar
      net: dsa: make cross-chip notifiers more efficient for host events · 726816a1
      Vladimir Oltean authored
      To determine whether a given port should react to the port targeted by
      the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match()
      look at the positioning of the switch port currently executing the
      notifier relative to the switch port for which the notifier was emitted.
      
      To maintain stylistic compatibility with the other match functions from
      switch.c, the host address and host VLAN match functions take the
      notifier information about targeted port, switch and tree indices as
      argument. However, these functions only use that information to retrieve
      the struct dsa_port *targeted_dp, which is an invariant for the outer
      loop that calls them. So it makes more sense to calculate the targeted
      dp only once, and pass it to them as argument.
      
      But furthermore, the targeted dp is actually known at the time the call
      to dsa_port_notify() is made. It is just that we decide to only save the
      indices of the port, switch and tree in the notifier structure, just to
      retrace our steps and find the dp again using dsa_switch_find() and
      dsa_to_port().
      
      But both the above functions are relatively expensive, since they need
      to iterate through lists. It appears more straightforward to make all
      notifiers just pass the targeted dp inside their info structure, and
      have the code that needs the indices to look at info->dp->index instead
      of info->port, or info->dp->ds->index instead of info->sw_index, or
      info->dp->ds->dst->index instead of info->tree_index.
      
      For the sake of consistency, all cross-chip notifiers are converted to
      pass the "dp" directly.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      726816a1
    • Vladimir Oltean's avatar
      net: dsa: move reset of VLAN filtering to dsa_port_switchdev_unsync_attrs · 8e9e678e
      Vladimir Oltean authored
      In dsa_port_switchdev_unsync_attrs() there is a comment that resetting
      the VLAN filtering isn't done where it is expected. And since commit
      108dc874 ("net: dsa: Avoid cross-chip syncing of VLAN filtering"),
      there is no reason to handle this in switch.c either.
      
      Therefore, move the logic to port.c, and adapt it slightly to the data
      structures and naming conventions from there.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8e9e678e
  2. 19 Apr, 2022 8 commits
    • Paolo Abeni's avatar
      Merge branch 'rtnetlink-improve-alt_ifname-config-and-fix-dangerous-group-usage' · cc4bdef2
      Paolo Abeni authored
      Florent Fourcot says:
      
      ====================
      rtnetlink: improve ALT_IFNAME config and fix dangerous GROUP usage
      
      First commit forbids dangerous calls when both IFNAME and GROUP are
      given, since it can introduce unexpected behaviour when IFNAME does not
      match any interface.
      
      Second patch achieves primary goal of this patchset to fix/improve
      IFLA_ALT_IFNAME attribute, since previous code was never working for
      newlink/setlink. ip-link command is probably getting interface index
      before, and was not using this feature.
      
      Last two patches are improving error code on corner cases.
      
      Changes in v2:
        * Remove ifname argument in rtnl_dev_get/do_setlink
          functions (simplify code)
        * Use a boolean to avoid condition duplication in __rtnl_newlink
      
      Changes in v3:
        * Simplify rtnl_dev_get signature
      
      Changes in v4:
        * Rename link_lookup to link_specified
      
      Changes in v5:
        * Re-order patches
      ====================
      
      Link: https://lore.kernel.org/r/20220415165330.10497-1-florent.fourcot@wifirst.frSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cc4bdef2
    • Florent Fourcot's avatar
      rtnetlink: return EINVAL when request cannot succeed · b6177d32
      Florent Fourcot authored
      A request without interface name/interface index/interface group cannot
      work. We should return EINVAL
      Signed-off-by: default avatarFlorent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: default avatarBrian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      b6177d32
    • Florent Fourcot's avatar
      rtnetlink: return ENODEV when IFLA_ALT_IFNAME is used in dellink · dee04163
      Florent Fourcot authored
      If IFLA_ALT_IFNAME is set and given interface is not found,
      we should return ENODEV and be consistent with IFLA_IFNAME
      behaviour
      This commit extends feature of commit 76c9ac0e,
      "net: rtnetlink: add possibility to use alternative names as message handle"
      
      CC: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarFlorent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: default avatarBrian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      dee04163
    • Florent Fourcot's avatar
      rtnetlink: enable alt_ifname for setlink/newlink · 5ea08b52
      Florent Fourcot authored
      buffer called "ifname" given in function rtnl_dev_get
      is always valid when called by setlink/newlink,
      but contains only empty string when IFLA_IFNAME is not given. So
      IFLA_ALT_IFNAME is always ignored
      
      This patch fixes rtnl_dev_get function with a remove of ifname argument,
      and move ifname copy in do_setlink when required.
      
      It extends feature of commit 76c9ac0e,
      "net: rtnetlink: add possibility to use alternative names as message
      handle""
      
      CC: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarFlorent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: default avatarBrian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      5ea08b52
    • Florent Fourcot's avatar
      rtnetlink: return ENODEV when ifname does not exist and group is given · ef2a7c90
      Florent Fourcot authored
      When the interface does not exist, and a group is given, the given
      parameters are being set to all interfaces of the given group. The given
      IFNAME/ALT_IF_NAME are being ignored in that case.
      
      That can be dangerous since a typo (or a deleted interface) can produce
      weird side effects for caller:
      
      Case 1:
      
       IFLA_IFNAME=valid_interface
       IFLA_GROUP=1
       MTU=1234
      
      Case 1 will update MTU and group of the given interface "valid_interface".
      
      Case 2:
      
       IFLA_IFNAME=doesnotexist
       IFLA_GROUP=1
       MTU=1234
      
      Case 2 will update MTU of all interfaces in group 1. IFLA_IFNAME is
      ignored in this case
      
      This behaviour is not consistent and dangerous. In order to fix this issue,
      we now return ENODEV when the given IFNAME does not exist.
      Signed-off-by: default avatarFlorent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: default avatarBrian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      ef2a7c90
    • Paolo Abeni's avatar
      Merge branch 'net-sched-allow-user-to-select-txqueue' · 8b11c35d
      Paolo Abeni authored
      Tonghao Zhang says:
      
      ====================
      net: sched: allow user to select txqueue
      
      From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      
      Patch 1 allow user to select txqueue in clsact hook.
      Patch 2 support skbhash to select txqueue.
      ====================
      
      Link: https://lore.kernel.org/r/20220415164046.26636-1-xiangxia.m.yue@gmail.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      8b11c35d
    • Tonghao Zhang's avatar
      net: sched: support hash selecting tx queue · 38a6f086
      Tonghao Zhang authored
      This patch allows users to pick queue_mapping, range
      from A to B. Then we can load balance packets from A
      to B tx queue. The range is an unsigned 16bit value
      in decimal format.
      
      $ tc filter ... action skbedit queue_mapping skbhash A B
      
      "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit")
      is enhanced with flags: SKBEDIT_F_TXQ_SKBHASH
      
        +----+      +----+      +----+
        | P1 |      | P2 |      | Pn |
        +----+      +----+      +----+
          |           |           |
          +-----------+-----------+
                      |
                      | clsact/skbedit
                      |      MQ
                      v
          +-----------+-----------+
          | q0        | qn        | qm
          v           v           v
        HTB/FQ       FIFO   ...  FIFO
      
      For example:
      If P1 sends out packets to different Pods on other host, and
      we want distribute flows from qn - qm. Then we can use skb->hash
      as hash.
      
      setup commands:
      $ NETDEV=eth0
      $ ip netns add n1
      $ ip link add ipv1 link $NETDEV type ipvlan mode l2
      $ ip link set ipv1 netns n1
      $ ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up
      
      $ tc qdisc add dev $NETDEV clsact
      $ tc filter add dev $NETDEV egress protocol ip prio 1 \
              flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping skbhash 2 6
      $ tc qdisc add dev $NETDEV handle 1: root mq
      $ tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
      $ tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
      $ tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit
      $ tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
      $ tc qdisc add dev $NETDEV parent 1:3 pfifo
      $ tc qdisc add dev $NETDEV parent 1:4 pfifo
      $ tc qdisc add dev $NETDEV parent 1:5 pfifo
      $ tc qdisc add dev $NETDEV parent 1:6 pfifo
      $ tc qdisc add dev $NETDEV parent 1:7 pfifo
      
      $ ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 10 -P 10
      
      pick txqueue from 2 - 6:
      $ ethtool -S $NETDEV | grep -i tx_queue_[0-9]_bytes
           tx_queue_0_bytes: 42
           tx_queue_1_bytes: 0
           tx_queue_2_bytes: 11442586444
           tx_queue_3_bytes: 7383615334
           tx_queue_4_bytes: 3981365579
           tx_queue_5_bytes: 3983235051
           tx_queue_6_bytes: 6706236461
           tx_queue_7_bytes: 42
           tx_queue_8_bytes: 0
           tx_queue_9_bytes: 0
      
      txqueues 2 - 6 are mapped to classid 1:3 - 1:7
      $ tc -s class show dev $NETDEV
      ...
      class mq 1:3 root leaf 8002:
       Sent 11949133672 bytes 7929798 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:4 root leaf 8003:
       Sent 7710449050 bytes 5117279 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:5 root leaf 8004:
       Sent 4157648675 bytes 2758990 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:6 root leaf 8005:
       Sent 4159632195 bytes 2759990 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:7 root leaf 8006:
       Sent 7003169603 bytes 4646912 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      ...
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Lobakin <alobakin@pm.me>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Talal Ahmad <talalahmad@google.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Cc: Antoine Tenart <atenart@kernel.org>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      38a6f086
    • Tonghao Zhang's avatar
      net: sched: use queue_mapping to pick tx queue · 2f1e85b1
      Tonghao Zhang authored
      This patch fixes issue:
      * If we install tc filters with act_skbedit in clsact hook.
        It doesn't work, because netdev_core_pick_tx() overwrites
        queue_mapping.
      
        $ tc filter ... action skbedit queue_mapping 1
      
      And this patch is useful:
      * We can use FQ + EDT to implement efficient policies. Tx queues
        are picked by xps, ndo_select_queue of netdev driver, or skb hash
        in netdev_core_pick_tx(). In fact, the netdev driver, and skb
        hash are _not_ under control. xps uses the CPUs map to select Tx
        queues, but we can't figure out which task_struct of pod/containter
        running on this cpu in most case. We can use clsact filters to classify
        one pod/container traffic to one Tx queue. Why ?
      
        In containter networking environment, there are two kinds of pod/
        containter/net-namespace. One kind (e.g. P1, P2), the high throughput
        is key in these applications. But avoid running out of network resource,
        the outbound traffic of these pods is limited, using or sharing one
        dedicated Tx queues assigned HTB/TBF/FQ Qdisc. Other kind of pods
        (e.g. Pn), the low latency of data access is key. And the traffic is not
        limited. Pods use or share other dedicated Tx queues assigned FIFO Qdisc.
        This choice provides two benefits. First, contention on the HTB/FQ Qdisc
        lock is significantly reduced since fewer CPUs contend for the same queue.
        More importantly, Qdisc contention can be eliminated completely if each
        CPU has its own FIFO Qdisc for the second kind of pods.
      
        There must be a mechanism in place to support classifying traffic based on
        pods/container to different Tx queues. Note that clsact is outside of Qdisc
        while Qdisc can run a classifier to select a sub-queue under the lock.
      
        In general recording the decision in the skb seems a little heavy handed.
        This patch introduces a per-CPU variable, suggested by Eric.
      
        The xmit.skip_txqueue flag is firstly cleared in __dev_queue_xmit().
        - Tx Qdisc may install that skbedit actions, then xmit.skip_txqueue flag
          is set in qdisc->enqueue() though tx queue has been selected in
          netdev_tx_queue_mapping() or netdev_core_pick_tx(). That flag is cleared
          firstly in __dev_queue_xmit(), is useful:
        - Avoid picking Tx queue with netdev_tx_queue_mapping() in next netdev
          in such case: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy:
          For example, eth0, macvlan in pod, which root Qdisc install skbedit
          queue_mapping, send packets to eth0.3, vlan in host. In __dev_queue_xmit() of
          eth0.3, clear the flag, does not select tx queue according to skb->queue_mapping
          because there is no filters in clsact or tx Qdisc of this netdev.
          Same action taked in eth0, ixgbe in Host.
        - Avoid picking Tx queue for next packet. If we set xmit.skip_txqueue
          in tx Qdisc (qdisc->enqueue()), the proper way to clear it is clearing it
          in __dev_queue_xmit when processing next packets.
      
        For performance reasons, use the static key. If user does not config the NET_EGRESS,
        the patch will not be compiled.
      
        +----+      +----+      +----+
        | P1 |      | P2 |      | Pn |
        +----+      +----+      +----+
          |           |           |
          +-----------+-----------+
                      |
                      | clsact/skbedit
                      |      MQ
                      v
          +-----------+-----------+
          | q0        | q1        | qn
          v           v           v
        HTB/FQ      HTB/FQ  ...  FIFO
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Lobakin <alobakin@pm.me>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Talal Ahmad <talalahmad@google.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Cc: Antoine Tenart <atenart@kernel.org>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2f1e85b1
  3. 18 Apr, 2022 2 commits
    • Luiz Angelo Daros de Luca's avatar
      docs: net: dsa: describe issues with checksum offload · a997157e
      Luiz Angelo Daros de Luca authored
      DSA tags before IP header (categories 1 and 2) or after the payload (3)
      might introduce offload checksum issues.
      Signed-off-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a997157e
    • David S. Miller's avatar
      Merge branch 'mlxsw-line-card' · 2a38de06
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Introduce line card support for modular switch
      
      Jiri says:
      
      This patchset introduces support for modular switch systems and also
      introduces mlxsw support for NVIDIA Mellanox SN4800 modular switch.
      It contains 8 slots to accommodate line cards - replaceable PHY modules
      which may contain gearboxes.
      Currently supported line card:
      16X 100GbE (QSFP28)
      Other line cards that are going to be supported:
      8X 200GbE (QSFP56)
      4X 400GbE (QSFP-DD)
      There may be other types of line cards added in the future.
      
      To be consistent with the port split configuration (splitter cabels),
      the line card entities are treated in the similar way. The nature of
      a line card is not "a pluggable device", but "a pluggable PHY module".
      
      A concept of "provisioning" is introduced. The user may "provision"
      certain slot with a line card type. Driver then creates all instances
      (devlink ports, netdevices, etc) related to this line card type. It does
      not matter if the line card is plugged-in at the time. User is able to
      configure netdevices, devlink ports, setup port splitters, etc. From the
      perspective of the switch ASIC, all is present and can be configured.
      
      The carrier of netdevices stays down if the line card is not plugged-in.
      Once the line card is inserted and activated, the carrier of
      the related netdevices is then reflecting the physical line state,
      same as for an ordinary fixed port.
      
      Once user does not want to use the line card related instances
      anymore, he can "unprovision" the slot. Driver then removes the
      instances.
      
      Patches 1-4 are extending devlink driver API and UAPI in order to
      register, show, dump, provision and activate the line card.
      Patches 5-17 are implementing the introduced API in mlxsw.
      The last patch adds a selftest for mlxsw line cards.
      
      Example:
      $ devlink port # No ports are listed
      $ devlink lc
      pci/0000:01:00.0:
        lc 1 state unprovisioned
          supported_types:
             16x100G
        lc 2 state unprovisioned
          supported_types:
             16x100G
        lc 3 state unprovisioned
          supported_types:
             16x100G
        lc 4 state unprovisioned
          supported_types:
             16x100G
        lc 5 state unprovisioned
          supported_types:
             16x100G
        lc 6 state unprovisioned
          supported_types:
             16x100G
        lc 7 state unprovisioned
          supported_types:
             16x100G
        lc 8 state unprovisioned
          supported_types:
             16x100G
      
      Note that driver exposes list supported line card types. Currently
      there is only one: "16x100G".
      
      To provision the slot #8:
      
      $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
      $ devlink lc show pci/0000:01:00.0 lc 8
      pci/0000:01:00.0:
        lc 8 state active type 16x100G
          supported_types:
             16x100G
      $ devlink port
      pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
      pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
      pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
      pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
      pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
      pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
      pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
      pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
      pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
      pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
      pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
      pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
      pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
      pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
      pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
      pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
      pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4
      
      To uprovision the slot #8:
      
      $ devlink lc set pci/0000:01:00.0 lc 8 notype
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a38de06