1. 24 Oct, 2022 40 commits
    • net: skb: move skb_pp_recycle() to skbuff.c · 4727bab4
      Yunsheng Lin authored
      skb_pp_recycle() is only used by skb_free_head() in
      skbuff.c, so move it to skbuff.c.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmveth: Always stop tx queues during close · 127b7218
      Nick Child authored
      netif_stop_all_queues must be called before calling H_FREE_LOGICAL_LAN.
      As a result, we can remove the pool_config field from the ibmveth
      adapter structure.
      
      Some device configuration changes call ibmveth_close in order to free
      the current resources held by the device. These functions then make
      their changes and call ibmveth_open to reallocate and reserve resources
      for the device.
      
      Prior to this commit, the flag pool_config was used to tell ibmveth_close
      that it should not halt the transmit queue. pool_config was introduced in
      commit 860f242e ("[PATCH] ibmveth change buffer pools dynamically")
      to avoid interrupting the tx flow when making rx config changes. Since
      then, other commits adopted this approach, even if making tx config
      changes.
      
      The issue with this approach was that the hypervisor freed all of
      the device's control structures after the hcall H_FREE_LOGICAL_LAN
      was performed but the transmit queues were never stopped. So the higher
      layers in the network stack would continue transmission but any
      H_SEND_LOGICAL_LAN hcall would fail with H_PARAMETER until the
      hypervisor's structures for the device were allocated with the
      H_REGISTER_LOGICAL_LAN hcall in ibmveth_open. This resulted in
      no real networking harm but did cause several of these error
      messages to be logged: "h_send_logical_lan failed with rc=-4"
      
      So, instead of trying to keep the transmit queues alive during network
      configuration changes, just stop the queues, make necessary changes then
      restart the queues.
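
A minimal sketch of the ordering described above, using a toy device model rather than the ibmveth driver itself. All names (toy_dev, toy_close, toy_reconfigure and so on) are hypothetical; the stop_queue argument only exists to contrast the old behaviour with the new one.

```c
#include <assert.h>

/* Toy device state; none of these fields are taken from ibmveth. */
struct toy_dev {
	int registered;     /* hypervisor-side structures exist */
	int queue_running;  /* the stack is allowed to transmit */
	int failed_sends;   /* sends attempted while unregistered */
	int pool_size;      /* stand-in for "some configuration" */
};

/* A transmit attempt from the stack. */
static void toy_xmit(struct toy_dev *d)
{
	if (!d->queue_running)
		return;               /* queue stopped: nothing reaches the device */
	if (!d->registered)
		d->failed_sends++;    /* the case that used to log an error */
}

static void toy_close(struct toy_dev *d, int stop_queue)
{
	if (stop_queue)
		d->queue_running = 0; /* stop the queue before freeing resources */
	d->registered = 0;
}

static void toy_open(struct toy_dev *d)
{
	d->registered = 1;
	d->queue_running = 1;
}

/* Close, change the configuration, reopen; xmits arrive in the middle. */
static void toy_reconfigure(struct toy_dev *d, int new_pool_size,
			    int stop_queue, int xmits_during_change)
{
	assert(new_pool_size > 0);
	toy_close(d, stop_queue);
	for (int i = 0; i < xmits_during_change; i++)
		toy_xmit(d);
	d->pool_size = new_pool_size;
	toy_open(d);
}
```

With stop_queue set, transmit attempts during the change never reach the unregistered device; with it cleared, each attempt is counted as a failed send.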
      Signed-off-by: Nick Child <nnac123@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove useless parameter of __sock_cmsg_send · 233baf9a
      xu xin authored
      The parameter 'msg' has never been used by __sock_cmsg_send, so we can remove it
      safely.
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: Add support for periodic output signal of PPS · 350749b9
      Wei Fang authored
      This patch adds the support for configuring periodic output
      signal of PPS. So the PPS can be output at a specified time
      and period.
      Developers and testers can use the command
      "echo <channel> <start.sec> <start.nsec> <period.sec> <period.nsec> > /sys/class/ptp/ptp0/period"
      to specify the start time and period of the PPS output signal.
      Note that the channel can only be set to 0. In addition, the start
      time must be later than the current PTP clock time, so users can
      run "phc_ctl /dev/ptp0 -- get" beforehand to read the current PTP
      clock time.
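
As a rough illustration of the two constraints above (channel 0 only, start time after the current PTP time), the request string could be built and checked in user space like this. The helper and struct names are hypothetical; only the field order is taken from the command shown in this commit message.

```c
#include <assert.h>
#include <stdio.h>

/* A timestamp split into seconds and nanoseconds (hypothetical type). */
struct toy_ts {
	long long sec;
	long nsec;
};

/* Returns 1 if a is strictly later than b. */
static int toy_ts_after(struct toy_ts a, struct toy_ts b)
{
	return a.sec > b.sec || (a.sec == b.sec && a.nsec > b.nsec);
}

/*
 * Formats "<channel> <start.sec> <start.nsec> <period.sec> <period.nsec>"
 * into buf. Returns the string length, or -1 if the request breaks one of
 * the constraints or does not fit in the buffer.
 */
static int toy_pps_request(char *buf, size_t len, int channel,
			   struct toy_ts now, struct toy_ts start,
			   struct toy_ts period)
{
	int n;

	assert(buf != NULL);
	if (channel != 0 || !toy_ts_after(start, now))
		return -1;
	n = snprintf(buf, len, "%d %lld %ld %lld %ld", channel,
		     start.sec, start.nsec, period.sec, period.nsec);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string is what would be written to the period file; "now" would come from reading the PTP clock first, as the message suggests.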
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a refcount tracker for kernel sockets · 0cafd77d
      Eric Dumazet authored
      Commit ffa84b5f ("net: add netns refcount tracker to struct sock")
      added a tracker to sockets, but did not track kernel sockets.
      
      We still have syzbot reports hinting about netns being destroyed
      while some kernel TCP sockets had not been dismantled.
      
      This patch tracks kernel sockets, and adds a ref_tracker_dir_print()
      call to net_free() right before the netns is freed.
      
      Normally, each layer is responsible for properly releasing its
      kernel sockets before last call to net_free().
      
      This debugging facility is enabled with CONFIG_NET_NS_REFCNT_TRACKER=y
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udp-false-sharing' · b29e0dec
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      udp: avoid false sharing on receive
      
      Under high UDP load, the BH processing and the user-space receiver can
      run on different cores.
      
      The UDP implementation goes to great lengths to avoid false sharing in
      the receive path, but recent changes to the struct sock layout moved
      the sk_forward_alloc and the sk_rcvbuf fields on the same cacheline:
      
              /* --- cacheline 4 boundary (256 bytes) --- */
                      struct sk_buff *   tail;
              } sk_backlog;
              int                        sk_forward_alloc;
              unsigned int               sk_reserved_mem;
              unsigned int               sk_ll_usec;
              unsigned int               sk_napi_id;
              int                        sk_rcvbuf;
      
      sk_forward_alloc is updated by the BH, while sk_rcvbuf is accessed by
      udp_recvmsg(), causing false sharing.
      
      A possible solution would be to re-order the struct sock fields to avoid
      the false sharing. Such change is subject to being invalidated by future
      changes and could have negative side effects on other workloads.
      
      Instead this series uses a different approach, touching only the UDP
      socket layout.
      
      The first patch generalizes the custom setsockopt infrastructure, to
      allow UDP tracking the buffer size, and the second patch addresses the
      issue, copying the relevant buffer information into an already hot
      cacheline.
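
To make the false-sharing concern concrete, here is a small stand-alone sketch. It is not the layout used by this series: the structs, the 64-byte cacheline size and the alignment-based split are all assumptions, chosen only to show how two fields can be checked for sharing a line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHELINE 64	/* assumed cacheline size */

/* Problem case: the writer's and the reader's fields sit side by side. */
struct toy_sock_shared {
	int forward_alloc;	/* written by the producer (the BH in the text) */
	int rcvbuf;		/* read by the consumer (the receiver) */
};

/* One possible fix: the consumer reads a copy kept on a separate line. */
struct toy_sock_split {
	alignas(TOY_CACHELINE) int forward_alloc;	/* producer-written line */
	alignas(TOY_CACHELINE) int rcvbuf_copy;		/* consumer-only line */
};

/* Cacheline index of a field, relative to the start of the struct. */
static size_t toy_line_of(size_t offset)
{
	return offset / TOY_CACHELINE;
}

/* Returns 1 if the two field offsets fall on the same cacheline. */
static int toy_same_line(size_t a, size_t b)
{
	return toy_line_of(a) == toy_line_of(b);
}
```

Passing offsetof() values for the two fields of each struct to toy_same_line() reports sharing for the first layout and not for the second. The series itself avoids re-ordering struct sock and instead keeps a copy of the needed value in a cacheline the receiver already touches.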
      
      Overall the above gives a 10% peak throughput increase under UDP flood.
      
      v1 -> v2:
       - introduce and use a common helper to initialize the UDP v4/v6 sockets
         (Kuniyuki)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: track the forward memory release threshold in an hot cacheline · 8a3854c7
      Paolo Abeni authored
      When the receiver process and the BH run on different cores,
      udp_rmem_release() experiences a cache miss while accessing sk_rcvbuf,
      as the latter shares the same cacheline with sk_forward_alloc, written
      by the BH.
      
      With this patch, UDP tracks the rcvbuf value and its update via custom
      SOL_SOCKET socket options, and copies the forward memory threshold value
      used by udp_rmem_release() in a different cacheline, already accessed by
      the above function and uncontended.
      
      Since the UDP socket init operation has grown a bit, factor out the common
      code between v4 and v6 in a shared helper.
      
      Overall the above gives a 10% peak throughput increase under UDP flood.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce and use custom sockopt socket flag · a5ef058d
      Paolo Abeni authored
      We will soon introduce custom setsockopt for UDP sockets, too.
      Instead of doing even more complex arbitrary checks inside
      sock_use_custom_sol_socket(), add a new socket flag and set it
      for the relevant socket types (currently only MPTCP).
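
The idea of replacing per-protocol checks with a single flag can be sketched as follows. The type, the flag name and the helpers are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical flag bit; the name is illustrative only. */
enum { TOY_CUSTOM_SOCKOPT = 1 << 0 };

struct toy_socket {
	unsigned int flags;
	int protocol;
};

/* Called once at init time by protocols that handle SOL_SOCKET themselves. */
static void toy_enable_custom_sockopt(struct toy_socket *s)
{
	assert(s != 0);
	s->flags |= TOY_CUSTOM_SOCKOPT;
}

/* One flag test instead of a growing chain of protocol comparisons. */
static int toy_use_custom_sol_socket(const struct toy_socket *s)
{
	return (s->flags & TOY_CUSTOM_SOCKOPT) != 0;
}
```

A new protocol then opts in by setting the flag in its own init path, and the generic check does not need to change.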
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-800Gbps-support' · ea5ed0f0
      David S. Miller authored
      Petr Machata says:
      
      ====================
      net: Add support for 800Gbps speed
      
      Amit Cohen <amcohen@nvidia.com> writes:
      
      The next Nvidia Spectrum ASIC will support 800Gbps speed.
      The IEEE 802 LAN/MAN Standards Committee already published standards for
      800Gbps, see the last update [1] and the list of approved changes [2].
      
      As a first phase, add support for 800Gbps over 8 lanes (100Gbps/lane).
      800Gbps over 4 lanes can also be supported in the future.
      
      Extend ethtool to support the relevant PMDs and extend mlxsw and bonding
      drivers to support 800Gbps.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: 3ad: Add support for 800G speed · 41305d37
      Amit Cohen authored
      Add support for 800Gbps speed to allow using 3ad mode with 800G devices.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add support for 800Gbps link modes · cceef209
      Amit Cohen authored
      Add support for 800Gbps speed, link modes of 100Gbps per lane.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Add support for 800Gbps link modes · 404c7678
      Amit Cohen authored
      Add support for 800Gbps speed, link modes of 100Gbps per lane.
      As mentioned in slide 21 in IEEE documentation [1], all adopted 802.3df
      copper and optical PMDs baselines using 100G/lane will be supported.
      
      Add the relevant PMDs which are mentioned in slide 5 in IEEE
      documentation [1] and were approved on 10-2022 [2]:
      BP - KR8
      Cu Cable - CR8
      MMF 50m - VR8
      MMF 100m - SR8
      SMF 500m - DR8
      SMF 2km - DR8-2
      
      [1]: https://www.ieee802.org/3/df/public/22_10/22_1004/shrikhande_3df_01a_221004.pdf
      [2]: https://ieee802.org/3/df/KeyMotions_3df_221005.pdf
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sparx5-IS2-VCAP' · c1aa0a90
      David S. Miller authored
      Steen Hegelund says:
      
      ====================
      Add support for Sparx5 IS2 VCAP
      
      This provides initial support for the Sparx5 VCAP functionality via the
      'tc' traffic control userspace tool and its flower filter.
      
      Overview:
      =========
      
      The supported flower filter keys and actions are:
      
      - source and destination MAC address keys
      - trap action
      - pass action
      
      The supported Sparx5 VCAPs are: IS2 (see below for more info)
      
      The VCAP (Versatile Content-Aware Processor) feature is essentially a TCAM
      with rules consisting of:
      
      - Programmable key fields
      - Programmable action fields
      - A counter (which may be only one bit wide)
      
      Besides this each VCAP has:
      
      - A number of independent lookups
      - A keyset configuration typically per port per lookup
      
      VCAPs are used in many of the TSN features such as PSFP, PTP, FRER as well
      as the general shaping, policing and access control, so it is an important
      building block for these advanced features.
      
      Functionality:
      ==============
      
      When a frame is passed to a VCAP the VCAP will generate a set of keys
      (keyset) based on the traffic type.  If there is a rule created with this
      keyset in the VCAP and the values of the keys matches the values in the
      keyset of the frame, the rule is said to match and the actions in the rule
      will be executed and the rule counter will be incremented.  No more rules
      will be examined in this VCAP lookup.
      
      If there is no match in the current lookup the frame will be matched
      against the next lookup (some VCAPs do the processing of the lookups in
      parallel).
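
The matching behaviour described above (first matching rule wins, its counter is incremented, and the next lookup is tried only when nothing matches) can be sketched with a toy model. The rule layout, the key/mask comparison and all names are assumptions made for illustration; they are not the VCAP API or the hardware format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy TCAM rule: 1 bits in mask are compared, 0 bits are "don't care". */
struct toy_rule {
	int keyset;		/* keyset the rule was created with */
	uint64_t key;
	uint64_t mask;
	int action;
	unsigned int counter;
};

struct toy_lookup {
	struct toy_rule *rules;
	size_t n;
};

/* Returns the first matching rule in one lookup, or NULL. */
static struct toy_rule *toy_match(struct toy_lookup *lk, int keyset,
				  uint64_t frame_key)
{
	for (size_t i = 0; i < lk->n; i++) {
		struct toy_rule *r = &lk->rules[i];

		if (r->keyset == keyset &&
		    ((frame_key ^ r->key) & r->mask) == 0) {
			r->counter++;	/* rule counter is incremented on a hit */
			return r;	/* no more rules are examined here */
		}
	}
	return NULL;
}

/* Tries the lookups in order; returns the action or no_match_action. */
static int toy_classify(struct toy_lookup *lks, size_t nlookups, int keyset,
			uint64_t frame_key, int no_match_action)
{
	assert(lks != NULL);
	for (size_t i = 0; i < nlookups; i++) {
		struct toy_rule *r = toy_match(&lks[i], keyset, frame_key);

		if (r)
			return r->action;
	}
	return no_match_action;
}
```

This models the sequential case only; as noted above, some VCAPs process their lookups in parallel.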
      
      The Sparx5 SoC has 6 different VCAP types:
      
      - IS0: Ingress Stage 0 (AKA CLM) mostly handles classification
      - IS2: Ingress Stage 2 mostly handles access control
      - IP6PFX: IPv6 prefix: Provides tables for IPV6 address management
      - LPM: Longest Prefix Match for IP guarding and routing
      - ES0: Egress Stage 0 is mostly used for CPU copying and multicast handling
      - ES2: Egress Stage 2 is known as the rewriter and mostly updates tags
      
      Design:
      =======
      
      The VCAP implementation provides switchcore independent handling of rules
      and supports:
      
      - Creating and deleting rules
      - Updating and getting rules
      
      The platform specific API implementation as well as the platform specific
      model of the VCAP instances are attached to the VCAP API and a client can
      then access rules via the API in a platform independent way, with the
      limitations that each VCAP has in terms of its supported keys and actions.
      
      The VCAP model is generated from information delivered by the designers of
      the VCAP hardware.
      
      Here is an illustration of this:
      
        +------------------+     +------------------+
        | TC flower filter |     | PTP client       |
        | for Sparx5       |     | for Sparx5       |
        +-------------\----+     +---------/--------+
                       \                  /
                        \                /
                         \              /
                          \            /
                           \          /
                       +----v--------v----+
                       |     VCAP API     |
                       +---------|--------+
                                 |
                                 |
                                 |
                                 |
                       +---------v--------+
                       |   VCAP control   |
                       |   instance       |
                       +----/--------|----+
                           /         |
                          /          |
                         /           |
                        /            |
        +--------------v---+    +----v-------------+
        |   Sparx5 VCAP    |    | Sparx5 VCAP API  |
        |   model          |    | Implementation   |
        +------------------+    +---------|--------+
                                          |
                                          |
                                          |
                                          |
                                +---------v--------+
                                | Sparx5 VCAP HW   |
                                +------------------+
      
      Delivery:
      =========
      
      For now only the IS2 is supported but later the IS0, ES0 and ES2 will be
      added. There are currently no plans to support the IP6PFX and the LPM
      VCAPs.
      
      The IS2 VCAP has 4 lookups and they are accessible with a TC chain id:
      
      - chain 8000000: IS2 Lookup 0
      - chain 8100000: IS2 Lookup 1
      - chain 8200000: IS2 Lookup 2
      - chain 8300000: IS2 Lookup 3
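
The chain numbering above follows a simple base-plus-stride pattern, which could be decoded along these lines. The macro and function names are hypothetical, and accepting only the four exact chain ids listed is an assumption of this sketch; the driver may treat other ids differently.

```c
#include <assert.h>

#define TOY_IS2_CHAIN_BASE	8000000
#define TOY_IS2_CHAIN_STRIDE	100000
#define TOY_IS2_LOOKUPS		4

/* Returns the IS2 lookup index (0..3) for a chain id, or -1 if none. */
static int toy_is2_chain_to_lookup(int chain)
{
	int off = chain - TOY_IS2_CHAIN_BASE;

	if (off < 0 || off % TOY_IS2_CHAIN_STRIDE != 0)
		return -1;
	off /= TOY_IS2_CHAIN_STRIDE;
	return off < TOY_IS2_LOOKUPS ? off : -1;
}
```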
      
      These lookups are executed in parallel by the IS2 VCAP but the actions are
      executed in series (the datasheet explains what happens if actions
      overlap).
      
      The functionality of TC flower as well as TC matchall filters will be
      expanded in later submissions as well as the number of VCAPs supported.
      
      This is the current plan:
      
      - add support for more TC flower filter keys and extend the Sparx5 port
        keyset configuration
      - support for TC protocol all
      - debugfs support for inspecting rules
      - TC flower filter statistics
      - Sparx5 IS0 VCAP support and more TC keys and actions to support this
      - add TC policer and drop action support (depends on the Sparx5 QoS support
        upstreamed separately)
      - Sparx5 ES0 VCAP support and more TC actions to support this
      - TC flower template support
      - TC matchall filter support for mirroring and policing ports
      - TC flower filter mirror action support
      - Sparx5 ES2 VCAP support
      
      The LAN966x switchcore will also be updated to use the VCAP API as well as
      future Microchip switches.
      The LAN966x has 3 VCAPS (IS1, IS2 and ES0) and a slightly different keyset
      and actionset portfolio than Sparx5.
      
      Version History:
      ================
      v3      Moved the sparx5_tc_flower_set_exterr function to the VCAP API and
              renamed it.
              Moved the sparx5_netbytes_copy function to the VCAP_API and renamed
              it (thanks Horatiu Vultur).
              Fixed indentation in the vcap_write_rule function.
              Added a comment mentioning the typegroup table terminator in the
              vcap_iter_skip_tg function.
      
      v2      Made the KUNIT test model a superset of the real model to fix a
              kernel robot build error.
      
      v1      Initial version
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding KUNIT test for the VCAP API · 67d63751
      Steen Hegelund authored
      This provides a KUNIT test suite for the VCAP API's encoding functionality.
      
      The test can be run by adding these settings in a .kunitconfig file:
      
      CONFIG_KUNIT=y
      CONFIG_NET=y
      CONFIG_VCAP_KUNIT_TEST=y
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding KUNIT test VCAP model · 5d7e5b04
      Steen Hegelund authored
      This provides a test VCAP model for use in a KUNIT test.  The model
      provides 3 different VCAP types for better test coverage.
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Writing rules to the IS2 VCAP · 683e05c0
      Steen Hegelund authored
      This adds rule encoding functionality to the VCAP API.
      
      A rule consists of keys and actions in separate cache sections.
      
      The maximum size of the keyset or actionset determines the size of the
      rule.
      
      The VCAP hardware needs to be able to distinguish different rule sizes from
      each other, and for that purpose some extra typegroup bits are added to the
      rule when it is encoded.
      
      The API provides a bit stream iterator that allows high-level encoding
      functionality to add key and action value bits independent of typegroup
      bits.
      
      This is handled by letting the concrete VCAP model provide the typegroup
      table for the different rule sizes.
      After the key and action values have been added to the encoding bit streams
      the typegroup bits are set to their correct values just before the rule is
      written to the VCAP hardware.
      
      The key and action offsets provided in the VCAP model are the offsets before
      adding the typegroup bits.
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Tested-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Casper Andersson <casper.casan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding basic rule management in VCAP API · 8e10490b
      Steen Hegelund authored
      This provides most of the rule handling needed to add a new rule to a VCAP.
      To add a rule a client must follow these steps:
      
      1) Allocate a new rule (provide an id or get one automatically assigned)
      2) Add keys to the rule
      3) Add actions to the rule
      4) Optionally set a keyset on the rule
      5) Optionally set an actionset on the rule
      6) Validate the rule (this will add keyset and actionset if not specified
         in the previous steps)
      7) Add the rule (if the validation was successful)
      8) Free the rule instance (a copy has been added to the VCAP)
      
      The validation step will fail if there are no keysets with the requested
      keys, or there are no actionsets with the requested actions.
      The validation will also fail if the keyset is not configured for the port
      for the requested protocol.
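
The eight steps above can be walked through with a toy model. This is only a sketch: the key and action ids, the set tables and every function name are hypothetical and do not reproduce the VCAP API signatures, and the per-port, per-protocol keyset check is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical key and action ids, used as bit flags. */
enum { TOY_KEY_DMAC = 1, TOY_KEY_SMAC = 2, TOY_KEY_SIP = 4 };
enum { TOY_ACT_TRAP = 1, TOY_ACT_PASS = 2 };

struct toy_rule {
	unsigned int keys, actions;	/* steps 2 and 3 */
	int keyset, actionset;		/* steps 4 and 5; -1 means unset */
};

/* What each keyset/actionset can carry; the index is the set id. */
static const unsigned int toy_keysets[] = {
	TOY_KEY_DMAC | TOY_KEY_SMAC, TOY_KEY_SIP
};
static const unsigned int toy_actionsets[] = { TOY_ACT_TRAP | TOY_ACT_PASS };

#define TOY_MAX_RULES 8
static struct toy_rule toy_vcap[TOY_MAX_RULES];	/* rules "in hardware" */
static int toy_vcap_count;

/* Step 1: allocate a rule with nothing chosen yet. */
static struct toy_rule *toy_alloc_rule(void)
{
	struct toy_rule *r = calloc(1, sizeof(*r));

	if (r)
		r->keyset = r->actionset = -1;
	return r;
}

/* Returns the first set containing everything in wanted, or -1. */
static int toy_pick(const unsigned int *sets, int n, unsigned int wanted)
{
	for (int i = 0; i < n; i++)
		if ((sets[i] & wanted) == wanted)
			return i;
	return -1;
}

static int toy_fits(const unsigned int *sets, int n, int id,
		    unsigned int wanted)
{
	return id >= 0 && id < n && (sets[id] & wanted) == wanted;
}

/* Step 6: fill in a missing keyset/actionset, then check that both fit. */
static int toy_validate_rule(struct toy_rule *r)
{
	if (r->keyset < 0)
		r->keyset = toy_pick(toy_keysets, 2, r->keys);
	if (r->actionset < 0)
		r->actionset = toy_pick(toy_actionsets, 1, r->actions);
	return toy_fits(toy_keysets, 2, r->keyset, r->keys) &&
	       toy_fits(toy_actionsets, 1, r->actionset, r->actions) ? 0 : -1;
}

/* Step 7: store a copy, so the caller can free its instance (step 8). */
static int toy_add_rule(const struct toy_rule *r)
{
	if (toy_vcap_count == TOY_MAX_RULES)
		return -1;
	toy_vcap[toy_vcap_count++] = *r;
	return 0;
}
```

A caller would allocate a rule, set its keys and actions, validate it, add it only if validation returned 0, and then free() its own instance.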
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Tested-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Casper Andersson <casper.casan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding port keyset config and callback interface · 46be056e
      Steen Hegelund authored
      This provides a default port keyset configuration for the Sparx5 IS2 VCAP
      where all ports and all lookups in IS2 use the same keyset (MAC_ETYPE) for
      all types of traffic.
      
      This means that no matter what frame type is received on any front port it
      will generate the MAC_ETYPE keyset in the IS2 VCAP and any rule in the IS2
      VCAP that uses this keyset will be matched against the keys in the
      MAC_ETYPE keyset.
      
      The callback interface used by the VCAP API is populated with Sparx5
      specific handler functions that take care of the actual reading and
      writing of data to the Sparx5 IS2 VCAP instance.
      
      A few functions are also added to the VCAP API to support addition of rule
      fields such as the ingress port mask and the lookup bit.
      
      The IS2 VCAP in Sparx5 is really divided in two instances with lookup 0
      and 1 in the first instance and lookup 2 and 3 in the second instance.
      The lookup bit selects lookup 0 or 3 in the respective instance when it is
      set.
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Tested-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Casper Andersson <casper.casan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding initial tc flower support for VCAP API · c9da1ac1
      Steen Hegelund authored
      This adds initial TC flower filter support to Sparx5 for the IS2 VCAP.
      
      The support consists of the source and destination MAC addresses,
      and the trap and pass actions.
      
      This is how you can create a rule that tests the functionality:
      
      tc qdisc add dev eth0 clsact
      tc filter add dev eth0 ingress chain 8000000 prio 10 handle 10 \
            protocol all flower skip_sw \
            dst_mac 0a:0b:0c:0d:0e:0f \
            src_mac 2:0:0:0:0:1 \
            action trap
      
      The IS2 chains in Sparx5 are assigned like this:
      
      - chain 8000000: IS2 Lookup 0
      - chain 8100000: IS2 Lookup 1
      - chain 8200000: IS2 Lookup 2
      - chain 8300000: IS2 Lookup 3
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Tested-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Casper Andersson <casper.casan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding IS2 VCAP register interface · 45c00ad0
      Steen Hegelund authored
      This adds the register interface needed to access the Sparx5 Ingress Stage
      2 VCAP (IS2).
      
      The Sparx5 Chip Register Model can be browsed at this location:
      https://github.com/microchip-ung/sparx-5_reginfo
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding IS2 VCAP model to VCAP API · e8145e06
      Steen Hegelund authored
      This provides the Sparx5 Ingress Stage 2 (IS2) model and adds it to the
      VCAP control instance that will be provided to the VCAP API.
      
      The Sparx5 IS2 C code model is generated from the Sparx5 RTL design model.
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: Adding initial VCAP API support · 8beef08f
      Steen Hegelund authored
      This provides the initial VCAP API framework and Sparx5 specific VCAP
      implementation.
      
      When the Sparx5 Switchdev driver is initialized it will also initialize its
      VCAP module, and this hooks up the concrete Sparx5 VCAP model to the VCAP
      API, so that the VCAP API knows what VCAP instances are available.
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: tunnel neigh support bond offload · abc21095
      Yanguo Li authored
      Support hardware offload when the tunnel neighbor's out port is a bond.
      This feature works together with the nfp firmware. If the firmware
      supports the NFP_FL_FEATS_TUNNEL_NEIGH_LAG feature, the nfp driver
      writes the bond information to the firmware neighbor table; otherwise
      it does nothing for the bond. When the neighbor MAC changes, the nfp
      driver needs to update the neighbor information too.
      Signed-off-by: Yanguo Li <yanguo.li@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'inet6_destroy_sock-calls-remove' · 04d63e62
      David S. Miller authored
      Kuniyuki Iwashima says:
      
      ====================
      inet6: Remove inet6_destroy_sock() calls.
      
      This is a follow-up series for commit d38afeec ("tcp/udp: Call
      inet6_destroy_sock() in IPv6 sk->sk_destruct().").
      
      This series cleans up unnecessary inet6_destroy_sock() calls in
      sk->sk_prot->destroy() and calls it from sk->sk_destruct() to make
      sure we do not leak memory related to IPv6-specific resources.
      
      Changes:
        v2:
          * patch 1
            * Fix build failure for CONFIG_MPTCP_IPV6=y
      
        v1: https://lore.kernel.org/netdev/20221018190956.1308-1-kuniyu@amazon.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet6: Clean up failure path in do_ipv6_setsockopt(). · b45a337f
      Kuniyuki Iwashima authored
      We can reuse the unlock label above and need not repeat the same code.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet6: Remove inet6_destroy_sock(). · 1f8c4eeb
      Kuniyuki Iwashima authored
      The last user of inet6_destroy_sock() is its wrapper inet6_cleanup_sock().
      Let's rename inet6_destroy_sock() to inet6_cleanup_sock().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Call inet6_destroy_sock() via sk->sk_destruct(). · 6431b0f6
      Kuniyuki Iwashima authored
      After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
      in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
      sk->sk_destruct() by setting inet6_sock_destruct() to it to make
      sure we do not leak inet6-specific resources.
      
      SCTP sets its own sk->sk_destruct() in the sctp_init_sock(), and
      SCTPv6 socket reuses it as the init function.
      
      To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we
      set sctp_v6_destruct_sock() in a new init function.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: Call inet6_destroy_sock() via sk->sk_destruct(). · 1651951e
      Kuniyuki Iwashima authored
      After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
      in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
      sk->sk_destruct() by setting inet6_sock_destruct() to it to make
      sure we do not leak inet6-specific resources.
      
      DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and
      DCCPv6 socket shares it by calling the same init function via
      dccp_v6_init_sock().
      
      To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we
      export it and set dccp_v6_sk_destruct() in the init function.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). · b5fc2923
      Kuniyuki Iwashima authored
      After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
      in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
      sk->sk_destruct() by setting inet6_sock_destruct() to it to make
      sure we do not leak inet6-specific resources.
      
      Now we can remove unnecessary inet6_destroy_sock() calls in
      sk->sk_prot->destroy().
      
      DCCP and SCTP have their own sk->sk_destruct() function, so we
      change them separately in the following patches.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b5fc2923
    • David S. Miller's avatar
      Merge branch 'dpaa2-eth-AF_XDP-zc' · 225480f0
      David S. Miller authored
      Ioana Ciornei says:
      
      ====================
      net: dpaa2-eth: AF_XDP zero-copy support
      
      This patch set adds support for AF_XDP zero-copy in the dpaa2-eth
      driver. The support is available on the LX2160A SoC and its variants and
      only on interfaces (DPNIs) with a maximum of 8 queues (HW limitations
      are the root cause).
      
      We are first implementing the .get_channels() callback since this is a
      dependency for further work.
      
      Patches 2-3 are working on making the necessary changes for multiple
      buffer pools on a single interface. By default, without an AF_XDP socket
      attached, only a single buffer pool will be used and shared between all
      the queues. The changes in the functions are made in this patch, but the
      actual allocation and setup of a new BP is done in patch#10.
      
      Patches 4-5 are improving the information exposed in debugfs. We are
      exposing a new file to show which buffer pool is used by what channels
      and how many buffers it currently has.
      
      The 6th patch updates the dpni_set_pools() firmware API so that we are
      capable of setting up a different buffer pool per queue in later patches.
      
      In the 7th patch the generic dev_open/close APIs are used instead of the
      dpaa2-eth internal ones.
      
      Patches 8-9 are rearranging the existing code in dpaa2-eth.c in order to
      create new functions which will be used in the XSK implementation in
      dpaa2-xsk.c.
      
      Finally, the last 3 patches are adding the actual support for both the
      Rx and Tx path of AF_XDP zero-copy and some associated tracepoints.
      Details on the implementation can be found in the actual patch.
      
      Changes in v2:
       - 3/12:  Export dpaa2_eth_allocate_dpbp/dpaa2_eth_free_dpbp in this
         patch to avoid a build warning. The functions will be used in next
         patches.
       - 6/12:  Use __le16 instead of u16 for the dpbp_id field.
       - 12/12: Use xdp_buff->data_hard_start when tracing the BP seeding.
      
      Changes in v3:
       - 3/12: fix leaking of bp on error path
      ====================
      Acked-by: Björn Töpel <bjorn@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      225480f0
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: add trace points on XSK events · 3817b2ac
      Robert-Ionut Alexa authored
      Define the dpaa2_tx_xsk_fd and dpaa2_rx_xsk_fd trace events for the XSK
      zero-copy Rx and Tx path.  Also, define the dpaa2_eth_buf as an event
      class so that both dpaa2_eth_buf_seed and dpaa2_xsk_buf_seed traces can
      derive from the same class.
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3817b2ac
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: AF_XDP TX zero copy support · 4a7f6c5a
      Robert-Ionut Alexa authored
      Add support in dpaa2-eth for packet processing on the Tx path using
      AF_XDP zero copy mode.
      
      The newly added dpaa2_xsk_tx() function will handle enqueuing AF_XDP Tx
      packets into the appropriate queue and update any necessary statistics.
      
      On a more detailed note, the dpaa2_xsk_tx_build_fd() function handles
      creating a Scatter-Gather frame descriptor with only one data buffer.
      This is needed because otherwise we would need to impose a headroom in
      the Tx buffer to store our software annotation structures.
      This tactic is already used on the normal data path of the dpaa2-eth
      driver, thus we are reusing the dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle
      functions in order to allocate and recycle the Scatter-Gather table
      buffers.
      
      In case we have reached the maximum number of Tx XSK packets to be sent
      in a NAPI cycle, we'll exit the dpaa2_eth_poll() and hope to be
      rescheduled again.
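
The per-cycle Tx limit can be modelled as below. This is a sketch under stated assumptions, not the driver's code: `mock_xsk_tx()`, `struct mock_tx_result` and the idea of passing the limit as a plain `budget` argument are all hypothetical. It shows only the rule that at most `budget` frames go out per cycle and that leftover work is reported so the caller can ask to be polled again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of one Tx pass within a single NAPI cycle. */
struct mock_tx_result {
	size_t sent;		/* frames enqueued in this cycle */
	bool more_pending;	/* true if frames were left for a later cycle */
};

/*
 * Model of a budgeted Tx pass: send at most 'budget' of the 'pending'
 * frames and report whether any were left behind.
 */
static struct mock_tx_result mock_xsk_tx(size_t pending, size_t budget)
{
	struct mock_tx_result res;

	res.sent = pending < budget ? pending : budget;
	res.more_pending = pending > budget;
	return res;
}
```

A caller in this model would treat `more_pending` as the signal to request another poll instead of completing the cycle.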
      
      On the XSK Tx confirmation path, we just unmap the SGT buffer and
      recycle it for further use.
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a7f6c5a
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: AF_XDP RX zero copy support · 48276c08
      Robert-Ionut Alexa authored
      This patch adds the support for receiving packets via the AF_XDP
      zero-copy mechanism in the dpaa2-eth driver. The support is available
      only on the LX2160A SoC and variants because we are relying on the HW
      capability to associate a buffer pool to a specific queue (QDBIN), only
      available on newer WRIOP versions.
      
      On the control path, the dpaa2_xsk_enable_pool() function is responsible
      for allocating a buffer pool (BP), setting up this new BP to be used only
      on the requested queue and changing the consume function to point to the
      XSK ZC one.
      We are forced to call dev_close() in order to change the queue to buffer
      pool association (dpaa2_xsk_set_bp_per_qdbin). This also works in our
      favor since at dev_close() the buffer pools will be drained and at the
      later dev_open() call they will be again seeded, this time with buffers
      allocated from the XSK pool if needed.
      
      On the data path, a new software annotation type is defined to be used
      only for the XSK scenarios. This will enable us to keep the necessary
      information about a packet buffer between the moment in which it was
      seeded and when it's received by the driver. In the XSK case, we are
      keeping the associated xdp_buff.
      Depending on the action returned by the BPF program, we will do the
      following:
       - XDP_PASS: copy the contents of the packet into a brand new skb,
         recycle the initial buffer.
       - XDP_TX: just enqueue the same frame descriptor back into the Tx path,
         the buffer will get automatically released into the initial BP.
       - XDP_REDIRECT: call xdp_do_redirect() and exit.
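
The three cases above can be summarised as a dispatch on the program's verdict. The sketch below is a user-space model only: the `mock_*` enums and `mock_xsk_handle_verdict()` are hypothetical names, and the handling of any other verdict (recycling the buffer) is an assumption, since the text above describes only these three actions.

```c
/* Hypothetical verdicts, mirroring only the three actions listed above. */
enum mock_xdp_verdict {
	MOCK_XDP_PASS,
	MOCK_XDP_TX,
	MOCK_XDP_REDIRECT,
	MOCK_XDP_OTHER,	/* anything else; its handling is an assumption */
};

/* What happens to the original XSK buffer after the verdict. */
enum mock_buf_fate {
	MOCK_BUF_COPIED_THEN_RECYCLED,	/* data copied to a new skb */
	MOCK_BUF_RELEASED_BY_HW,	/* returned to the BP after Tx */
	MOCK_BUF_OWNED_BY_REDIRECT,	/* handed to the redirect path */
	MOCK_BUF_RECYCLED,		/* assumed default: give buffer back */
};

static enum mock_buf_fate mock_xsk_handle_verdict(enum mock_xdp_verdict v)
{
	switch (v) {
	case MOCK_XDP_PASS:
		/* Copy the packet into a fresh skb, recycle the buffer. */
		return MOCK_BUF_COPIED_THEN_RECYCLED;
	case MOCK_XDP_TX:
		/* Re-enqueue the same frame; HW releases it into the BP. */
		return MOCK_BUF_RELEASED_BY_HW;
	case MOCK_XDP_REDIRECT:
		/* The redirect path takes over the buffer. */
		return MOCK_BUF_OWNED_BY_REDIRECT;
	default:
		/* Assumption: other verdicts simply recycle the buffer. */
		return MOCK_BUF_RECYCLED;
	}
}
```

Only the XDP_PASS case involves a copy in this model; the other two listed actions keep the frame in its original buffer.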
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48276c08
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: create and export the dpaa2_eth_receive_skb() function · ee2a3bde
      Robert-Ionut Alexa authored
      Carve out code from the dpaa2_eth_rx() function in order to create and
      export the dpaa2_eth_receive_skb() function. Do this in order to reuse
      this code also from the XSK path which will be introduced in a later
      patch.
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee2a3bde
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: create and export the dpaa2_eth_alloc_skb function · 129902a3
      Robert-Ionut Alexa authored
      The dpaa2_eth_alloc_skb() function is added by moving code out of the
      previously defined dpaa2_eth_copybreak() function. The new API
      allocates a new skb, copies the frame data from the passed FD to the
      new skb and then returns the skb.
      Export this new function since we'll also need this functionality
      from the XSK code path.
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      129902a3
    • Ioana Ciornei's avatar
      net: dpaa2-eth: use dev_close/open instead of the internal functions · e3caeb2d
      Ioana Ciornei authored
      Instead of calling the internal functions which implement .ndo_stop and
      .ndo_open, we can simply call dev_close and dev_open, so that we keep
      the code cleaner.
      
      Also, in the next patches we'll use the same APIs from other files
      without needing to export the internal functions.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3caeb2d
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: update the dpni_set_pools() API to support per QDBIN pools · 801c76dd
      Robert-Ionut Alexa authored
      Update the dpni_set_pools() firmware API so that in the next patches we
      can configure per Rx queue (per QDBIN) buffer pools.
      This is a hard requirement of AF_XDP, thus we need the newer API
      version.
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      801c76dd
    • Ioana Ciornei's avatar
      net: dpaa2-eth: export buffer pool info into a new debugfs file · b1dd9bf6
      Ioana Ciornei authored
      Export the allocated buffer pools, the number of buffers that they have
      currently and which channels are using which BP.
      
      The output looks like below:
      
      Buffer pool info for eth2:
      IDX        BPID      Buf count      CH#0      CH#1      CH#2      CH#3
      BP#0         1           5124         x         x         x         x
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1dd9bf6
    • Ioana Ciornei's avatar
      net: dpaa2-eth: export the CH#<index> in the 'ch_stats' debug file · 96b44697
      Ioana Ciornei authored
      Just give out an index for each channel that we export into the debug
      file in the form of CH#<index>. This is purely to help correlate each
      channel's information from one debugfs file to another.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96b44697
    • Robert-Ionut Alexa's avatar
      net: dpaa2-eth: add support for multiple buffer pools per DPNI · 095174da
      Robert-Ionut Alexa authored
      This patch allows the configuration of multiple buffer pools associated
      with a single DPNI object, where each distinct DPBP object is not
      necessarily shared among all queues.
      The user can interrogate both the number of buffer pools and the buffer
      count in each buffer pool by using the .get_ethtool_stats() callback.
      Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      095174da