1. 21 Apr, 2022 1 commit
  2. 20 Apr, 2022 38 commits
    • David S. Miller's avatar
      Merge branch 'mlxsw-line-card-status-tracking' · 365014f5
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Line cards status tracking
      
      When a line card is provisioned, netdevs corresponding to the ports
      found on the line card are registered. User space can then perform
      various logical configurations (e.g., splitting, setting MTU) on these
      netdevs.
      
      However, since the line card is not present / powered on (i.e., it is
      not in 'active' state), user space cannot access the various components
      found on the line card. For example, user space cannot read the
      temperature of gearboxes or transceiver modules found on the line card
      via hwmon / thermal. Similarly, it cannot dump the EEPROM contents of
      these transceiver modules. The above is only possible when the line card
      becomes active.
      
      This patchset solves the problem by tracking the status of each line
      card and invoking callbacks from interested parties when a line card
      becomes active / inactive.
      
      Patchset overview:
      
      Patch #1 adds the infrastructure in the line cards core that allows
      users to register a set of callbacks that are invoked when a line card
      becomes active / inactive. To avoid races, if a line card is already
      active during registration, the got_active() callback is invoked.
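
      The registration behaviour described for patch #1 can be sketched in
      plain user-space C as follows. This is only an illustration: all
      type and function names below (lc_event_ops, lc_event_ops_register(),
      lc_set_active(), counting_ops) are hypothetical and are not the
      driver's API, and the locking that a real implementation needs is
      omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 4

/* Hypothetical callback set; names are illustrative, not the mlxsw API. */
struct lc_event_ops {
	void (*got_active)(int slot, void *priv);
	void (*got_inactive)(int slot, void *priv);
};

struct lc_listener {
	const struct lc_event_ops *ops;
	void *priv;
};

struct linecard {
	int slot;
	bool active;
	struct lc_listener listeners[MAX_LISTENERS];
	size_t n_listeners;
};

/* Register a listener. If the card is already active, report that
 * immediately so the listener does not miss the transition. */
int lc_event_ops_register(struct linecard *lc,
			  const struct lc_event_ops *ops, void *priv)
{
	if (lc->n_listeners == MAX_LISTENERS)
		return -1;
	lc->listeners[lc->n_listeners].ops = ops;
	lc->listeners[lc->n_listeners].priv = priv;
	lc->n_listeners++;
	if (lc->active && ops->got_active)
		ops->got_active(lc->slot, priv);
	return 0;
}

/* Change the card state and notify every listener of the transition. */
void lc_set_active(struct linecard *lc, bool active)
{
	size_t i;

	if (lc->active == active)
		return;
	lc->active = active;
	for (i = 0; i < lc->n_listeners; i++) {
		const struct lc_event_ops *ops = lc->listeners[i].ops;

		if (active && ops->got_active)
			ops->got_active(lc->slot, lc->listeners[i].priv);
		else if (!active && ops->got_inactive)
			ops->got_inactive(lc->slot, lc->listeners[i].priv);
	}
}

/* Example listener that only counts the transitions it is told about. */
struct lc_counters {
	int activated;
	int deactivated;
};

void count_active(int slot, void *priv)
{
	(void)slot;
	((struct lc_counters *)priv)->activated++;
}

void count_inactive(int slot, void *priv)
{
	(void)slot;
	((struct lc_counters *)priv)->deactivated++;
}

const struct lc_event_ops counting_ops = {
	.got_active = count_active,
	.got_inactive = count_inactive,
};
```

      Registering counting_ops on a card that is already active bumps the
      'activated' counter once at registration time, which is the race
      avoidance described above.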
      
      Patches #2-#3 are preparations.
      
      Patch #4 changes the port module core to register a set of callbacks
      with the line cards core. See detailed description with examples in the
      commit message.
      
      Patches #5-#6 do the same with regards to thermal / hwmon support, so
      that user space will be able to monitor the temperature of various
      components on the line card when it becomes active.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      365014f5
    • Vadim Pasternak's avatar
      mlxsw: core_hwmon: Add interfaces for line card initialization and de-initialization · 99a03b31
      Vadim Pasternak authored
      Add callback functions for line card 'hwmon' initialization and
      de-initialization. Each line card is associated with the relevant
      'hwmon' device, which may contain thermal attributes for the cages
      and gearboxes found on this line card.
      
      The line card 'hwmon' initialization / de-initialization APIs are to be
      called when line card is set to active / inactive state by
      got_active() / got_inactive() callbacks from line card state machine.
      
      For example cage temperature for module #9 located at line card #7 will
      be exposed by utility 'sensors' like:
      linecard#07
      front panel 009:	+32.0C  (crit = +70.0C, emerg = +80.0C)
      And temperature for gearbox #3 located at line card #5 will be exposed
      like:
      linecard#05
      gearbox 003:		+41.0C  (highest = +41.0C)
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99a03b31
    • Vadim Pasternak's avatar
      mlxsw: core_thermal: Add interfaces for line card initialization and de-initialization · f11a323d
      Vadim Pasternak authored
      Add callback functions for line card thermal area initialization and
      de-initialization. Each line card is associated with the relevant
      thermal area, which may contain thermal zones for cages and gearboxes
      found on this line card.
      
      The line card thermal initialization / de-initialization APIs are to be
      called when line card is set to active / inactive state by
      got_active() / got_inactive() callbacks from line card state machine.
      
      For example thermal zone for module #9 located at line card #7 will
      have type:
      mlxsw-lc7-module9.
      And thermal zone for gearbox #2 located at line card #5 will have type:
      mlxsw-lc5-gearbox2.
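
      The naming scheme in the two examples above can be reproduced with
      a small formatting helper. This is a minimal sketch, assuming the
      type string is simply built from the slot and component indexes;
      the helper names and the buffer handling are hypothetical and not
      taken from the driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "mlxsw-lc<slot>-module<n>". Returns 0, or -1 if buf is too small. */
int lc_module_zone_type(char *buf, size_t len, unsigned int slot,
			unsigned int module)
{
	int n = snprintf(buf, len, "mlxsw-lc%u-module%u", slot, module);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build "mlxsw-lc<slot>-gearbox<n>". Returns 0, or -1 if buf is too small. */
int lc_gearbox_zone_type(char *buf, size_t len, unsigned int slot,
			 unsigned int gearbox)
{
	int n = snprintf(buf, len, "mlxsw-lc%u-gearbox%u", slot, gearbox);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```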
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f11a323d
    • Vadim Pasternak's avatar
      mlxsw: core_env: Add interfaces for line card initialization and de-initialization · 06a0fc43
      Vadim Pasternak authored
      Netdevs for ports found on line cards are registered upon provisioning.
      However, user space is not allowed to access the transceiver modules
      found on a line card until the line card becomes active.
      
      Therefore, register event operations with the line card core to get
      notifications whenever a line card becomes active or inactive.
      
      When user space tries to dump the EEPROM of a transceiver module or reset
      it and the corresponding line card is inactive, emit an error
      message:
      ethtool -m enp1s0nl7p9
      netlink error: mlxsw_core: Cannot read EEPROM of module on an inactive line card
      netlink error: Input/output error
      
      When user space tries to set the power mode policy of such a transceiver,
      cache the configuration and apply it when the line card becomes active. This
      is consistent with other port configuration (e.g., MTU setting) that user space
      is able to perform while the line card is provisioned, but inactive.
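
      The split between "fail now" (EEPROM access) and "cache and apply
      later" (power mode policy) can be sketched as below. This is a
      simplified user-space model, not the driver code: struct module,
      the policy enum and every function name are hypothetical, and the
      device write is simulated by a counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical types; the driver's real structures differ. */
enum power_policy { POLICY_HIGH, POLICY_AUTO };

struct module {
	bool lc_active;            /* is the hosting line card active? */
	enum power_policy cached;  /* last policy requested by user space */
	enum power_policy applied; /* policy currently programmed to the device */
	int hw_writes;             /* number of simulated device writes */
};

/* Stand-in for the function that programs the device. */
void module_apply_policy(struct module *m)
{
	m->applied = m->cached;
	m->hw_writes++;
}

/* Always remember the request; touch the device only if it is reachable. */
int module_set_policy(struct module *m, enum power_policy policy)
{
	m->cached = policy;
	if (m->lc_active)
		module_apply_policy(m);
	return 0;
}

/* Invoked when the line card becomes active: replay the cached policy. */
void module_got_active(struct module *m)
{
	m->lc_active = true;
	module_apply_policy(m);
}

/* Invoked when the line card becomes inactive. */
void module_got_inactive(struct module *m)
{
	m->lc_active = false;
}

/* EEPROM access cannot be deferred, so it fails on an inactive card. */
int module_read_eeprom(const struct module *m)
{
	return m->lc_active ? 0 : -EIO;
}
```

      The -EIO return value mirrors the "Input/output error" shown in the
      ethtool output above.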
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06a0fc43
    • Vadim Pasternak's avatar
      mlxsw: core_env: Split module power mode setting to a separate function · a11e1ec1
      Vadim Pasternak authored
      Move the code that applies the module power mode to the device to a
      separate function. This function will be invoked by the next patch to
      set the power mode on transceiver modules found on a line card when the
      line card becomes active.
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a11e1ec1
    • Vadim Pasternak's avatar
      mlxsw: core: Add bus argument to environment init API · 7b261af9
      Vadim Pasternak authored
      Pass the bus argument to mlxsw_env_init(). The purpose is to get access
      to the device handle, which is used in the error message emitted when
      line card activation fails.
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b261af9
    • Jiri Pirko's avatar
      mlxsw: core_linecards: Introduce ops for linecards status change tracking · de28976d
      Jiri Pirko authored
      Introduce an infrastructure allowing users to register a set
      of operations which are to be called whenever a line card gets
      active/inactive.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de28976d
    • David S. Miller's avatar
      Merge tag 'linux-can-next-for-5.19-20220419' of... · 85ef87ba
      David S. Miller authored
      Merge tag 'linux-can-next-for-5.19-20220419' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2022-04-19
      
      this is a pull request of 17 patches for net-next/master.
      
      The first 2 patches are by me and target the CAN driver
      infrastructure. One patch renames a function in the rx_offload helper,
      the other one updates the CAN bitrate calculation to prefer small bit
      rate pre-scalers over larger ones, which is encouraged by the CAN in
      Automation.
      
      Kris Bahnsen contributes a patch to fix the links to Technologic
      Systems web resources in the sja1000 driver.
      
      Christophe Leroy's patch prepares the mpc5xxx_can driver for upcoming
      powerpc header cleanup.
      
      Minghao Chi's patch converts the flexcan driver to use
      pm_runtime_resume_and_get().
      
      The next 2 patches target the Xilinx CAN driver. Lukas Bulwahn's patch
      fixes an entry in the MAINTAINERS file. A patch by me marks the bit
      timing constants as const.
      
      Wolfram Sang's patch documents r8a77961 support on the
      renesas,rcar-canfd bindings document.
      
      The next 2 patches are by me and add support for the mcp251863 chip to
      the mcp251xfd driver.
      
      The last 7 patches are by Pavel Pisa, Martin Jerabek et al. and add
      the ctucanfd driver for the CTU CAN FD IP Core.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85ef87ba
    • David S. Miller's avatar
      Merge branch 'net-sched-flower-num-vlan-tags' · c1f6f1e6
      David S. Miller authored
      Boris Sukholitko says:
      
      ====================
      net/sched: flower: match on the number of vlan tags
      
      Our customers in the fiber telecom world have network configurations
      where they would like to control their traffic according to the number
      of tags appearing in the packet.
      
      For example, TR247 GPON conformance test suite specification mostly
      talks about untagged, single, double tagged packets and gives lax
      guidelines on the vlan protocol vs. number of vlan tags.
      
      This is different from the common IT networks where 802.1Q and 802.1ad
      protocols usually describe single and double tagged packets. GPON
      configurations that we work with have an arbitrary mix of the above protocols
      and number of vlan tags in the packet.
      
      The following patch series implement number of vlans flower filter. They
      add num_of_vlans flower filter as an alternative to vlan ethtype protocol
      matching. The end result is that the following command becomes possible:
      
      tc filter add dev eth1 ingress flower \
        num_of_vlans 1 vlan_prio 5 action drop
      
      Also, from our logs, we have redirect rules such that:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N \
           action mirred egress redirect dev $DEV
      
      where N can range from 0 to 3 and $DEV is the function of $N.
      
      Also there are rules setting skb mark based on the number of vlans:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
          $P action skbedit mark $M
      
      More about the patch series:
        - patches 1-2 remove duplicate code by introducing is_vlan_key
          helper.
        - patch 3, 4 implement num_of_vlans in the dissector and in the
          flower.
        - patch 5 uses the num_of_vlans filter to allow further matching on
          vlan attributes.
      
      Complementary iproute2 patches are being sent separately.
      
      Thanks,
      Boris.
      
      - v4: rebased to the latest net-next
      - v3:
          - more example commands in patch 3 description (request by Jamal)
          - patch 5 description made clearer (thanks to Jiri)
      - v2:
          - add suitable subject prefixes
          - more evolved patch 5 description
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1f6f1e6
    • Boris Sukholitko's avatar
      net/sched: flower: Consider the number of tags for vlan filters · 99fdb22b
      Boris Sukholitko authored
      Before this patch the existence of vlan filters was conditional on the vlan
      protocol being matched in the tc rule. For example, the following rule:
      
      tc filter add dev eth1 ingress flower vlan_prio 5
      
      was illegal because vlan protocol (e.g. 802.1q) does not appear in the rule.
      
      Remove the above restriction by looking at the num_of_vlans filter to
      allow further matching on vlan attributes. The following rule becomes
      legal as a result of this commit:
      
      tc filter add dev eth1 ingress flower num_of_vlans 1 vlan_prio 5
      
      because having num_of_vlans==1 implies that the packet is single tagged.
      
      Change is_vlan_key helper to look at the number of vlans in addition to
      the vlan ethertype. The outcome of this change is that outer (e.g. vlan_prio)
      and inner (e.g. cvlan_prio) tag vlan filters require the number of vlan
      tags to be greater than 0 and 1, respectively.
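
      The resulting rule can be summarized by the following simplified
      sketch. It is not the kernel's is_vlan_key helper: the function
      name, its parameters and the local TPID constants are assumptions
      made for illustration, and the netlink attribute handling is left
      out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local constants for the two VLAN ethertypes (802.1Q and 802.1ad). */
#define VLAN_TPID_8021Q  0x8100
#define VLAN_TPID_8021AD 0x88A8

bool is_vlan_tpid(uint16_t ethertype)
{
	return ethertype == VLAN_TPID_8021Q || ethertype == VLAN_TPID_8021AD;
}

/*
 * Decide whether vlan attribute filters may be used in a rule.
 * threshold is 0 for outer-tag filters (e.g. vlan_prio) and 1 for
 * inner-tag filters (e.g. cvlan_prio). num_of_vlans is 0 when the
 * rule does not match on the number of vlan tags.
 */
bool vlan_filters_allowed(bool has_ethertype, uint16_t ethertype,
			  unsigned int num_of_vlans, unsigned int threshold)
{
	if (num_of_vlans > threshold)
		return true;
	return has_ethertype && is_vlan_tpid(ethertype);
}
```

      With this rule, "num_of_vlans 1 vlan_prio 5" is accepted without a
      vlan ethertype, while "vlan_prio 5" alone is still rejected.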
      
      As a result of is_vlan_key change, the ethertype may be set to 0 when
      matching on the number of vlans. Update fl_set_key_vlan to avoid setting
      key, mask vlan_tpid for the 0 ethertype.
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99fdb22b
    • Boris Sukholitko's avatar
      net/sched: flower: Add number of vlan tags filter · b4000312
      Boris Sukholitko authored
      These are bookkeeping parts of the new num_of_vlans filter.
      Defines, dump, load and set are being done here.
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4000312
    • Boris Sukholitko's avatar
      flow_dissector: Add number of vlan tags dissector · 34951fcf
      Boris Sukholitko authored
      Our customers in the fiber telecom world have network configurations
      where they would like to control their traffic according to the number
      of tags appearing in the packet.
      
      For example, TR247 GPON conformance test suite specification mostly
      talks about untagged, single, double tagged packets and gives lax
      guidelines on the vlan protocol vs. number of vlan tags.
      
      This is different from the common IT networks where 802.1Q and 802.1ad
      protocols usually describe single and double tagged packets. GPON
      configurations that we work with have an arbitrary mix of the above protocols
      and number of vlan tags in the packet.
      
      The goal is to make the following TC commands possible:
      
      tc filter add dev eth1 ingress flower \
        num_of_vlans 1 vlan_prio 5 action drop
      
      From our logs, we have redirect rules such that:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N \
           action mirred egress redirect dev $DEV
      
      where N can range from 0 to 3 and $DEV is the function of $N.
      
      Also there are rules setting skb mark based on the number of vlans:
      
      tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
          $P action skbedit mark $M
      
      This new dissector allows extracting the number of vlan tags existing in
      the packet.
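
      What "number of vlan tags" means for a raw Ethernet frame can be
      illustrated with the user-space sketch below. It is not the flow
      dissector code: count_vlan_tags() is a hypothetical helper that
      walks the frame and counts consecutive 802.1Q / 802.1ad tags,
      assuming a standard Ethernet header with no other encapsulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Count consecutive VLAN tags that follow the two MAC addresses. */
unsigned int count_vlan_tags(const uint8_t *frame, size_t len)
{
	size_t off = 12; /* skip destination and source MAC addresses */
	unsigned int n = 0;

	while (off + 2 <= len) {
		uint16_t proto = (uint16_t)(frame[off] << 8 | frame[off + 1]);

		if (proto != 0x8100 && proto != 0x88A8)
			break;
		/* A tag is 4 bytes (TPID + TCI); ignore a truncated one. */
		if (off + 4 > len)
			break;
		n++;
		off += 4;
	}
	return n;
}
```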
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34951fcf
    • Boris Sukholitko's avatar
      net/sched: flower: Helper function for vlan ethtype checks · 285ba06b
      Boris Sukholitko authored
      There are somewhat repetitive ethertype checks in fl_set_key. Refactor
      them into is_vlan_key helper function.
      
      To make the changes clearer, avoid touching indentation levels. This is
      the job for the next patch in the series.
      Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      285ba06b
    • Haowen Bai's avatar
      ar5523: Use kzalloc instead of kmalloc/memset · e63dd412
      Haowen Bai authored
      Use kzalloc rather than duplicating its implementation, which
      makes code simple and easy to understand.
      Signed-off-by: Haowen Bai <baihaowen@meizu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e63dd412
    • Luiz Angelo Daros de Luca's avatar
      net: dsa: realtek: remove realtek,rtl8367s string · fcd30c96
      Luiz Angelo Daros de Luca authored
      There is no need to add new compatible strings for each new supported
      chip version. The compatible string is used only to select the subdriver
      (rtl8365mb.c or rtl8366rb.c). Once in the subdriver, it will detect the
      chip model by itself, ignoring which compatible string was used.
      
      Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/
      Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcd30c96
    • Luiz Angelo Daros de Luca's avatar
      dt-bindings: net: dsa: realtek: cleanup compatible strings · 6f2d04cc
      Luiz Angelo Daros de Luca authored
      Compatible strings are used to help the driver find the chip ID/version
      register for each chip family. After that, the driver can setup the
      switch accordingly. Keep only the first supported model for each family
      as a compatible string and reference other chip models in the
      description.
      
      The removed compatible strings have never been used in a released kernel.
      
      CC: devicetree@vger.kernel.org
      Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/
      Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f2d04cc
    • David S. Miller's avatar
      Merge branch 'hns3-next' · e92453b9
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: updates for -next
      
      This series includes some updates for the HNS3 ethernet driver.
      
      Change logs:
      V1 -> V2:
       - Fix failed to apply to net-next problem.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e92453b9
    • Hao Chen's avatar
      net: hns3: remove unnecessary line wrap for hns3_set_tunable · 29c17cb6
      Hao Chen authored
      Remove unnecessary line wrap for hns3_set_tunable to improve
      function readability.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29c17cb6
    • Peng Li's avatar
      net: hns3: replace magic value by HCLGE_RING_REG_OFFSET · 350cb440
      Peng Li authored
      Magic values are not recommended.
      
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      350cb440
    • Peng Li's avatar
      net: hns3: fix the wrong words in comments · 9c657cbc
      Peng Li authored
      This patch fixes wrong words in comments.
      
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c657cbc
    • Peng Li's avatar
      net: hns3: update the comment of function hclgevf_get_mbx_resp · 2e0f5388
      Peng Li authored
      The parameters of function hclgevf_get_mbx_resp have been changed, but
      the comment was not updated. This patch updates it.

      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e0f5388
    • Hao Chen's avatar
      net: hns3: add log for setting tx spare buf size · 2373b35c
      Hao Chen authored
      The active tx spare buffer size may be changed according to the
      page size, so add a log to report it.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2373b35c
    • Jie Wang's avatar
      net: hns3: add failure logs in hclge_set_vport_mtu · bcc7a98f
      Jie Wang authored
      Currently, there is a low probability that PF MTU configuration fails, but
      the information in the logs is insufficient for locating the problem when
      the VF MTU value is illegally modified.

      So record the VF index and VF MTU value in the failure scenario.
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bcc7a98f
    • Jian Shen's avatar
      net: hns3: refine the definition for struct hclge_pf_to_vf_msg · 6fde96df
      Jian Shen authored
      The struct hclge_pf_to_vf_msg is used for mailbox messages from
      PF to VF, including both responses and requests. But its definition
      can only indicate a response, which makes the message data copy in
      function hclge_send_mbx_msg() unreadable. So refine it by adding
      a general message definition into it.
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fde96df
    • Hao Chen's avatar
      net: hns3: refactor hns3_set_ringparam() · 07fdc163
      Hao Chen authored
      Use struct hns3_ring_param to replace the new/old_xxx variables and
      add hns3_is_ringparam_changed() to check whether the ring parameters
      have changed, to improve code readability.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07fdc163
    • Yufeng Mo's avatar
      net: hns3: add ethtool parameter check for CQE/EQE mode · 286c61e7
      Yufeng Mo authored
      For DEVICE_VERSION_V2, the hardware does not support the CQE mode.
      So add capability bit for coalesce CQE mode and add parameter check
      for it in ethtool.
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      286c61e7
    • David S. Miller's avatar
      Merge branch 'atlantic-xdp-multi-buffer' · e97e917b
      David S. Miller authored
      [PATCH net-next v5 0/3] net: atlantic: Add XDP support
      @ 2022-04-17 10:12 Taehee Yoo
        2022-04-17 10:12 ` [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane Taehee Yoo
                         ` (2 more replies)
        0 siblings, 3 replies; 4+ messages in thread
      From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw)
        To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk,
      	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
        Cc: ap420073
      
      This patchset adds multi-buffer XDP support to the atlantic driver.

      The first patch implements the control plane of XDP.
      aq_xdp(), the callback of .ndo_bpf, is added.

      The second patch implements the data plane of XDP.
      XDP_TX, XDP_DROP, and XDP_PASS are supported.
      __aq_ring_xdp_clean() is added to receive packets and execute the XDP program.
      aq_nic_xmit_xdpf() is added to send packets by XDP.
      
      The third patch implements callback of .ndo_xdp_xmit.
      aq_xdp_xmit() is added to send redirected packets and it internally
      calls aq_nic_xmit_xdpf().
      
      Memory model is MEM_TYPE_PAGE_SHARED.
      
      Order-2 page allocation is used when XDP is enabled.
      
      LRO will be disabled if the XDP program doesn't support multi buffer.
      
      AQC chip supports 32 multi-queues and 8 vectors(irq).
      There are two options.
      1. under 8 cores and maximum 4 tx queues per core.
      2. under 4 cores and maximum 8 tx queues per core.
      
      Like other drivers, these tx queues can be used only for XDP_TX,
      XDP_REDIRECT queue. If so, no tx_lock is needed.
      But this patchset doesn't use this strategy because getting hardware tx
      queue index cost is too high.
      So, tx_lock is used in the aq_nic_xmit_xdpf().
      
      single-core, single queue, 80% cpu utilization.
      
        32.30%  [kernel]                  [k] aq_get_rxpages_xdp
        10.44%  [kernel]                  [k] aq_hw_read_reg <---------- here
         9.86%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
         5.51%  [kernel]                  [k] aq_ring_rx_clean
      
      single-core, 8 queues, 100% cpu utilization, half PPS.
      
        52.03%  [kernel]                  [k] aq_hw_read_reg <---------- here
        18.24%  [kernel]                  [k] aq_get_rxpages_xdp
         4.30%  [kernel]                  [k] hw_atl_b0_hw_ring_rx_receive
         4.24%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
         2.79%  [kernel]                  [k] aq_ring_rx_clean
      
      Performance result(64 Byte)
      1. XDP_TX
        a. xdp_generic, single core
          - 2.5Mpps, 100% cpu
        b. xdp_driver, single core
          - 4.5Mpps, 80% cpu
        c. xdp_generic, 8 core(hyper thread)
          - 6.3Mpps, 40% cpu
        d. xdp_driver, 8 core(hyper thread)
          - 6.3Mpps, 30% cpu
      
      2. XDP_REDIRECT
        a. xdp_generic, single core
          - 2.3Mpps
        b. xdp_driver, single core
          - 4.5Mpps
      
      v5:
       - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
       - Use 2K frame size instead of 3K
       - Use order-2 page allocation instead of order-0
       - Rename aq_get_rxpage() to aq_alloc_rxpages()
       - Add missing PageFree stats for ethtool
       - Remove aq_unset_rxpage_xdp(), introduced by v2 patch due to
         change of memory model
       - Fix wrong last parameter value of xdp_prepare_buff()
       - Add aq_get_rxpages_xdp() to increase page reference count
      
      v4:
       - Fix compile warning
      
      v3:
       - Change wrong PPS performance result 40% -> 80% in single
         core(Intel i3-12100)
       - Separate aq_nic_map_xdp() from aq_nic_map_skb()
       - Drop multi buffer packets if single buffer XDP is attached
       - Disable LRO when single buffer XDP is attached
       - Use xdp_get_{frame/buff}_len()
      
      v2:
       - Do not use inline in C file
      
      Taehee Yoo (3):
        net: atlantic: Implement xdp control plane
        net: atlantic: Implement xdp data plane
        net: atlantic: Implement .ndo_xdp_xmit handler
      
       .../net/ethernet/aquantia/atlantic/aq_cfg.h   |   1 +
       .../ethernet/aquantia/atlantic/aq_ethtool.c   |   9 +
       .../net/ethernet/aquantia/atlantic/aq_main.c  |  87 ++++
       .../net/ethernet/aquantia/atlantic/aq_main.h  |   2 +
       .../net/ethernet/aquantia/atlantic/aq_nic.c   | 136 ++++++
       .../net/ethernet/aquantia/atlantic/aq_nic.h   |   5 +
       .../net/ethernet/aquantia/atlantic/aq_ring.c  | 409 ++++++++++++++++--
       .../net/ethernet/aquantia/atlantic/aq_ring.h  |  21 +-
       .../net/ethernet/aquantia/atlantic/aq_vec.c   |  23 +-
       .../net/ethernet/aquantia/atlantic/aq_vec.h   |   6 +
       .../aquantia/atlantic/hw_atl/hw_atl_a0.c      |   6 +-
       .../aquantia/atlantic/hw_atl/hw_atl_b0.c      |  10 +-
       12 files changed, 670 insertions(+), 45 deletions(-)
      
      --
      2.17.1
      
      ^ permalink raw reply	[flat|nested] 4+ messages in thread
      * [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane
        2022-04-17 10:12 [PATCH net-next v5 0/3] net: atlantic: Add XDP support Taehee Yoo
      @ 2022-04-17 10:12 ` Taehee Yoo
        2022-04-17 10:12 ` [PATCH net-next v5 2/3] net: atlantic: Implement xdp data plane Taehee Yoo
        2022-04-17 10:12 ` [PATCH net-next v5 3/3] net: atlantic: Implement .ndo_xdp_xmit handler Taehee Yoo
        2 siblings, 0 replies; 4+ messages in thread
      From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw)
        To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk,
      	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
        Cc: ap420073
      
      aq_xdp() is a xdp setup callback function for Atlantic driver.
      When XDP is attached or detached, the device will be restarted because
      it uses different headroom, tailroom, and page order value.
      
      If XDP is enabled, it switches the default page order value from 0 to 2,
      because the default maximum frame size is still 2K and it needs
      additional area for headroom and tailroom.
      The total size (headroom + frame size + tailroom) is 2624 bytes.
      So, with an order-0 page (4096 bytes), 1472 bytes are always wasted
      for every frame.
      But when order-2 is used, these pages can be used 6 times
      with the flip strategy.
      It means only about 106 bytes per frame are wasted.
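
      The numbers above can be checked with a few lines of C. This is a
      worked example only, assuming a 4096-byte base page (which is what
      2624 + 1472 implies); rx_page_usage() and its struct are
      hypothetical helpers, not driver code.

```c
struct rx_page_usage {
	unsigned long frames_per_alloc; /* frames that fit in one allocation */
	unsigned long wasted_total;     /* unused bytes per allocation */
	unsigned long wasted_per_frame; /* unused bytes amortized per frame */
};

/*
 * page_size: base page size in bytes (assumed 4096 here).
 * order: allocation order, i.e. page_size << order bytes are allocated.
 * footprint: headroom + frame size + tailroom, in bytes.
 */
struct rx_page_usage rx_page_usage(unsigned long page_size,
				   unsigned int order,
				   unsigned long footprint)
{
	struct rx_page_usage u = { 0, 0, 0 };
	unsigned long bytes = page_size << order;

	u.frames_per_alloc = bytes / footprint;
	u.wasted_total = bytes - u.frames_per_alloc * footprint;
	if (u.frames_per_alloc)
		u.wasted_per_frame = u.wasted_total / u.frames_per_alloc;
	return u;
}
```

      For a footprint of 2624 bytes, order 0 gives 1 frame and 1472
      wasted bytes, while order 2 (16384 bytes) gives 6 frames and 640
      wasted bytes in total, i.e. about 106 bytes per frame.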
      
      Also, it supports the XDP fragment feature.
      The MTU can be 16K if the XDP program supports XDP fragments.
      If not, the MTU cannot exceed 2K - ETH_HLEN - ETH_FCS.

      A static key is also added; it is used to call the xdp_clean
      handler in ->poll(). The data plane implementation is contained
      in the following patch.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      ---
      
      v5:
       - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
       - Use 2K frame size instead of 3K
       - Use order-2 page allocation instead of order-0
       - Rename aq_get_rxpage() to aq_alloc_rxpages()
      
      v4:
       - No changed
      
      v3:
       - Disable LRO when single buffer XDP is attached
      
      v2:
       - No changed
      e97e917b
    • Taehee Yoo's avatar
      net: atlantic: Implement .ndo_xdp_xmit handler · 45638f01
      Taehee Yoo authored
      aq_xdp_xmit() is the callback function of .ndo_xdp_xmit.
      It internally calls aq_nic_xmit_xdpf() to send packet.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      45638f01
    • Taehee Yoo's avatar
      net: atlantic: Implement xdp data plane · 26efaef7
      Taehee Yoo authored
      It supports XDP_PASS, XDP_DROP and multi buffer.
      
      The new function aq_nic_xmit_xdpf() is used to send packet with
      xdp_frame and internally it calls aq_nic_map_xdp().
      
      AQC chip supports 32 multi-queues and 8 vectors (irq).
      There are two options:
      1. under 8 cores and 4 tx queues per core.
      2. under 4 cores and 8 tx queues per core.
      
      As in ixgbe, these tx queues could be dedicated to XDP_TX and
      XDP_REDIRECT, in which case no tx_lock would be needed.
      This patchset does not use that strategy because the cost of getting
      the hardware tx queue index is too high.
      So, tx_lock is used in aq_nic_xmit_xdpf().
      
      single-core, single queue, 80% cpu utilization.
      
        30.75%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
        10.35%  [kernel]                  [k] aq_hw_read_reg <---------- here
         4.38%  [kernel]                  [k] get_page_from_freelist
      
      single-core, 8 queues, 100% cpu utilization, half PPS.
      
        45.56%  [kernel]                  [k] aq_hw_read_reg <---------- here
        17.58%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
         4.72%  [kernel]                  [k] hw_atl_b0_hw_ring_rx_receive
      
      The new function __aq_ring_xdp_clean() is an XDP rx handler that is
      called only when XDP is attached.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26efaef7
    • Taehee Yoo's avatar
      net: atlantic: Implement xdp control plane · 0d14657f
      Taehee Yoo authored
      aq_xdp() is the XDP setup callback function for the Atlantic driver.
      When XDP is attached or detached, the device is restarted because
      it uses different headroom, tailroom, and page order values.
      
      If XDP is enabled, the default page order switches from 0 to 2.
      This is because the default maximum frame size is still 2K and
      additional space is needed for headroom and tailroom.
      The total size (headroom + frame size + tailroom) is 2624 bytes,
      so 1472 bytes of every order-0 page would always be wasted.
      With order-2 pages, each allocation can be reused 6 times
      with the page-flip strategy,
      so only about 106 bytes per frame are wasted.
      
      It also supports the XDP fragment feature.
      The MTU can be up to 16K if the XDP program supports XDP fragments.
      If not, the MTU cannot exceed 2K - ETH_HLEN - ETH_FCS.
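
The arithmetic in the paragraphs above can be checked with a short sketch. This is illustrative Python, not driver code. The 4096-byte base page size is an assumption (the message does not state it), `waste_per_frame` and `max_mtu_without_frags` are hypothetical helper names, and the values 14 and 4 are the standard Ethernet header and FCS lengths, assumed here to be what the message means by ETH_HLEN and ETH_FCS.

```python
from typing import Tuple

PAGE_SIZE = 4096        # assumed base page size in bytes (not stated in the message)
FRAME_FOOTPRINT = 2624  # headroom + 2K frame + tailroom, as given in the message
FRAME_MAX = 2048        # the "2K" maximum frame size from the message
ETH_HLEN = 14           # standard Ethernet header length in bytes
ETH_FCS_LEN = 4         # standard frame check sequence length in bytes


def waste_per_frame(order: int) -> Tuple[int, float]:
    """Return (frames per allocation, unused bytes per frame) for a page order."""
    alloc = PAGE_SIZE << order              # order-N allocation is 2**N pages
    frames = alloc // FRAME_FOOTPRINT       # whole frames that fit
    unused = (alloc - frames * FRAME_FOOTPRINT) / frames
    return frames, unused


def max_mtu_without_frags(frame_max: int = FRAME_MAX) -> int:
    """MTU ceiling when the XDP program does not support fragments."""
    return frame_max - ETH_HLEN - ETH_FCS_LEN


for order in (0, 2):
    frames, unused = waste_per_frame(order)
    print(f"order-{order}: {frames} frame(s), {unused:.1f} bytes unused per frame")
print(f"max MTU without fragments: {max_mtu_without_frags()}")
```

Under these assumptions an order-0 page holds one frame with 1472 bytes unused, an order-2 allocation holds 6 frames with about 106.7 bytes unused per frame, and the non-fragment MTU limit works out to 2030 bytes.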
      
      A static key is also added; it is used to call the xdp_clean
      handler in ->poll(). The data plane implementation is contained
      in the following patch.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d14657f
    • David S. Miller's avatar
      Merge branch 'dsa-cross-chip-notifier-cleanup' · 8ab38ed7
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      DSA cross-chip notifier cleanups
      
      This patch set makes the following improvements:
      
      - Cross-chip notifiers pass a switch index, port index, sometimes tree
        index, all as integers. Sometimes we need to recover the struct
        dsa_port based on those integers. That recovery involves traversing a
        list. By passing directly a pointer to the struct dsa_port we can
        avoid that, and the indices passed previously can still be obtained
        from the passed struct dsa_port.
      
      - Resetting VLAN filtering on a switch has explicit code to make it run
        on a single switch, so it has no place to stay in the cross-chip
        notifier code. Move it out.
      
      - Changing the MTU on a user port affects only that single port, yet the
        code passes through the cross-chip notifier layer where all switches
        are notified. Avoid that.
      
      - Other related cosmetic changes in the MTU changing procedure.
      
      Apart from the slight improvement in performance given by
      (a) doing less work in cross-chip notifiers
      (b) emitting fewer cross-chip notifiers
      we also end up with about 100 fewer lines of code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ab38ed7
    • Vladimir Oltean's avatar
      net: dsa: don't emit targeted cross-chip notifiers for MTU change · be6ff966
      Vladimir Oltean authored
      A cross-chip notifier with "targeted_match=true" is one that matches
      only the local port of the switch that emitted it. In other words,
      passing through the cross-chip notifier layer serves no purpose.
      
      Eliminate this concept by calling directly ds->ops->port_change_mtu
      instead of emitting a targeted cross-chip notifier. This leaves the
      DSA_NOTIFIER_MTU event being emitted only for MTU updates on the CPU
      port, which need to be reflected also across all DSA links.
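
The difference between a targeted notifier and a direct call can be sketched as follows. This is a toy Python model, not the DSA code: `FakeSwitch`, `notify_targeted`, and `change_mtu_direct` are hypothetical names, and the `visited` counter exists only to make the extra work visible.

```python
class FakeSwitch:
    """Hypothetical stand-in for one switch in a multi-switch tree."""

    def __init__(self, index):
        self.index = index
        self.mtu = {}     # port index -> configured MTU
        self.visited = 0  # how many notifier events this switch had to inspect

    def port_change_mtu(self, port, mtu):
        self.mtu[port] = mtu


def notify_targeted(switches, sw_index, port, mtu):
    """Old path: every switch inspects the event, only the emitter acts on it."""
    for ds in switches:
        ds.visited += 1
        if ds.index == sw_index:
            ds.port_change_mtu(port, mtu)


def change_mtu_direct(ds, port, mtu):
    """New path: call the operation on the one switch that owns the port."""
    ds.port_change_mtu(port, mtu)


switches = [FakeSwitch(i) for i in range(3)]
notify_targeted(switches, sw_index=1, port=2, mtu=1500)
change_mtu_direct(switches[2], port=0, mtu=9000)
print([ds.visited for ds in switches], switches[1].mtu, switches[2].mtu)
```

In this model both paths end with the same per-port state, but only the targeted notifier makes the uninvolved switches look at the event.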
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be6ff966
    • Vladimir Oltean's avatar
      net: dsa: drop dsa_slave_priv from dsa_slave_change_mtu · 4715029f
      Vladimir Oltean authored
      We can get a hold of the "ds" pointer directly from "dp", no need for
      the dsa_slave_priv.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4715029f
    • Vladimir Oltean's avatar
      net: dsa: avoid one dsa_to_port() in dsa_slave_change_mtu · cf1c39d3
      Vladimir Oltean authored
      We could retrieve the cpu_dp pointer directly from the "dp" we already
      have, no need to resort to dsa_to_port(ds, port).
      
      This change also removes the need for an "int port", so that is also
      deleted.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf1c39d3
    • Vladimir Oltean's avatar
      net: dsa: use dsa_tree_for_each_user_port in dsa_slave_change_mtu · b2033a05
      Vladimir Oltean authored
      Use the more conventional iterator over user ports instead of explicitly
      ignoring them, and use the more conventional name "other_dp" instead of
      "dp_iter", for readability.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b2033a05
    • Vladimir Oltean's avatar
      net: dsa: make cross-chip notifiers more efficient for host events · 726816a1
      Vladimir Oltean authored
      To determine whether a given port should react to the port targeted by
      the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match()
      look at the positioning of the switch port currently executing the
      notifier relative to the switch port for which the notifier was emitted.
      
      To maintain stylistic compatibility with the other match functions from
      switch.c, the host address and host VLAN match functions take the
      notifier information about targeted port, switch and tree indices as
      argument. However, these functions only use that information to retrieve
      the struct dsa_port *targeted_dp, which is an invariant for the outer
      loop that calls them. So it makes more sense to calculate the targeted
      dp only once, and pass it to them as argument.
      
      But furthermore, the targeted dp is actually known at the time the call
      to dsa_port_notify() is made. It is just that we decide to only save the
      indices of the port, switch and tree in the notifier structure, just to
      retrace our steps and find the dp again using dsa_switch_find() and
      dsa_to_port().
      
      But both the above functions are relatively expensive, since they need
      to iterate through lists. It appears more straightforward to make all
      notifiers just pass the targeted dp inside their info structure, and
      have the code that needs the indices look at info->dp->index instead
      of info->port, or info->dp->ds->index instead of info->sw_index, or
      info->dp->ds->dst->index instead of info->tree_index.
      
      For the sake of consistency, all cross-chip notifiers are converted to
      pass the "dp" directly.
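
The trade-off described above can be sketched with a small model. This is illustrative Python, not the kernel data structures: the `Tree`, `Switch`, and `Port` classes and both helper functions are hypothetical, and only the field names `index`, `ds`, and `dst` mirror the identifiers quoted in the message.

```python
class Tree:
    def __init__(self, index):
        self.index = index
        self.switches = []


class Switch:
    def __init__(self, index, dst):
        self.index = index
        self.dst = dst          # back-pointer to the owning tree
        self.ports = []
        dst.switches.append(self)


class Port:
    def __init__(self, index, ds):
        self.index = index
        self.ds = ds            # back-pointer to the owning switch
        ds.ports.append(self)


def find_port_by_indices(trees, tree_index, sw_index, port):
    """Old style: walk the lists to recover the port from three integers."""
    for dst in trees:
        if dst.index != tree_index:
            continue
        for ds in dst.switches:
            if ds.index != sw_index:
                continue
            for dp in ds.ports:
                if dp.index == port:
                    return dp
    return None


def indices_from_port(dp):
    """New style: the notifier carries dp, and the indices are derived from it."""
    return dp.ds.dst.index, dp.ds.index, dp.index


tree = Tree(0)
sw0, sw1 = Switch(0, tree), Switch(1, tree)
for i in range(4):
    Port(i, sw0)
target = Port(3, sw1)

print(find_port_by_indices([tree], 0, 1, 3) is target)
print(indices_from_port(target))
```

Going from the port to the indices is a fixed number of pointer dereferences, while going from the indices back to the port needs the nested list walk, which is the cost the message wants to avoid.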
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      726816a1
    • Vladimir Oltean's avatar
      net: dsa: move reset of VLAN filtering to dsa_port_switchdev_unsync_attrs · 8e9e678e
      Vladimir Oltean authored
      In dsa_port_switchdev_unsync_attrs() there is a comment that resetting
      the VLAN filtering isn't done where it is expected. And since commit
      108dc874 ("net: dsa: Avoid cross-chip syncing of VLAN filtering"),
      there is no reason to handle this in switch.c either.
      
      Therefore, move the logic to port.c, and adapt it slightly to the data
      structures and naming conventions from there.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e9e678e
  3. 19 Apr, 2022 1 commit