1. 02 Apr, 2021 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · c2bcb4cf
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2021-04-01
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      We've added 68 non-merge commits during the last 7 day(s) which contain
      a total of 70 files changed, 2944 insertions(+), 1139 deletions(-).
      
      The main changes are:
      
      1) UDP support for sockmap, from Cong.
      
      2) Verifier merge conflict resolution fix, from Daniel.
      
      3) xsk selftests enhancements, from Maciej.
      
      4) Unstable helpers aka kernel func calling, from Martin.
      
      5) Batched ops for LPM map, from Pedro.
      
      6) Fix race in bpf_get_local_storage, from Yonghong.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 01 Apr, 2021 30 commits
  3. 31 Mar, 2021 9 commits
    • net: mediatek: add flow offload for mt7623 · 917e2e6c
      Frank Wunderlich authored
      mt7623 uses offload version 2 too
      
      tested on Bananapi-R2
      Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: enable MTL ECC Error Address Status Over-ride by default · b494ba5a
      Voon Weifeng authored
      Turn on the MEEAO field of MTL_ECC_Control_Register by default.
      
      As the MTL ECC Error Address Status Over-ride (MEEAO) is set by default,
      the following error address fields will hold the last valid address
      where the error is detected.
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
      Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nxp-enetc-xdp' · 77890db1
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      XDP for NXP ENETC
      
      This series adds support to the enetc driver for the basic XDP primitives.
      The ENETC is a network controller found inside the NXP LS1028A SoC,
      which is a dual-core Cortex A72 device for industrial networking,
      with the CPUs clocked at up to 1.3 GHz. On this platform, there are 4
      ENETC ports and a 6-port embedded DSA switch, in a topology that looks
      like this:
      
        +-------------------------------------------------------------------------+
        |                    +--------+ 1 Gbps (typically disabled)               |
        | ENETC PCI          |  ENETC |--------------------------+                |
        | Root Complex       | port 3 |-----------------------+  |                |
        | Integrated         +--------+                       |  |                |
        | Endpoint                                            |  |                |
        |                    +--------+ 2.5 Gbps              |  |                |
        |                    |  ENETC |--------------+        |  |                |
        |                    | port 2 |-----------+  |        |  |                |
        |                    +--------+           |  |        |  |                |
        |                                         |  |        |  |                |
        |                        +------------------------------------------------+
        |                        |             |  Felix |  |  Felix |             |
        |                        | Switch      | port 4 |  | port 5 |             |
        |                        |             +--------+  +--------+             |
        |                        |                                                |
        | +--------+  +--------+ | +--------+  +--------+  +--------+  +--------+ |
        | |  ENETC |  |  ENETC | | |  Felix |  |  Felix |  |  Felix |  |  Felix | |
        | | port 0 |  | port 1 | | | port 0 |  | port 1 |  | port 2 |  | port 3 | |
        +-------------------------------------------------------------------------+
               |          |             |           |            |          |
               v          v             v           v            v          v
             Up to      Up to                      Up to 4x 2.5Gbps
            2.5Gbps     1Gbps
      
      The ENETC ports 2 and 3 can act as DSA masters for the embedded switch.
      Because 4 out of the 6 externally-facing ports of the SoC are switch
      ports, the most interesting use case for XDP on this device is in fact
      XDP_TX on the 2.5Gbps DSA master.
      
      Nonetheless, the results presented below are for IPv4 forwarding between
      ENETC port 0 (eno0) and port 1 (eno1) both configured for 1Gbps.
      There are two streams of IPv4/UDP datagrams with a frame length of 64
      octets delivered at 100% port load to eno0 and to eno1. eno0 has a flow
      steering rule to process the traffic on RX ring 0 (CPU 0), and eno1 has
      a flow steering rule towards RX ring 1 (CPU 1).
      
      For the IPFWD test, standard IP routing was enabled in the netns.
      For the XDP_DROP test, the samples/bpf/xdp1 program was attached to both
      eno0 and to eno1.
      For the XDP_TX test, the samples/bpf/xdp2 program was attached to both
      eno0 and to eno1.
      For the XDP_REDIRECT test, the samples/bpf/xdp_redirect program was
      attached once to the input of eno0/output of eno1, and twice to the
      input of eno1/output of eno0.
      
      Finally, the preliminary results are as follows:
      
              | IPFWD | XDP_TX | XDP_REDIRECT | XDP_DROP
      --------+-------+--------+--------------+---------
      fps     | 761   | 2535   | 1735         | 2783
      Gbps    | 0.51  | 1.71   | 1.17         | n/a
      
      There is a strange phenomenon in my testing system where it appears that
      one CPU is processing more than the other. I have not investigated this
      too much. Also, the code might not be very well optimized (for example
      dma_sync_for_device is called with the full ENETC_RXB_DMA_SIZE_XDP).
      
      Design wise, the ENETC is a PCI device with BD rings, so it uses the
      MEM_TYPE_PAGE_SHARED memory model, as can typically be seen in Intel
      devices. The strategy was to build upon the existing model that the
      driver uses, and not change it too much. So you will see things like a
      separate NAPI poll function for XDP.
      
      I have only tested with PAGE_SIZE=4096, and since we split pages in
      half, it means that MTU-sized frames are scatter/gather (the XDP
      headroom + skb_shared_info only leaves us 1476 bytes of data per
      buffer). This is sub-optimal, but I would rather keep it this way and
      help speed up Lorenzo's series for S/G support through testing, rather
      than change the enetc driver to use some other memory model like page_pool.
      My code is already structured for S/G, and that works fine for XDP_DROP
      and XDP_TX, just not for XDP_REDIRECT, even between two enetc ports.
      So the S/G XDP_REDIRECT is stubbed out (the frames are dropped), but
      obviously I would like to remove that limitation soon.
      
      Please note that I am rather new to this kind of stuff; I am more of a
      control path person, so I would appreciate feedback.
      
      Enough talking, on to the patches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: add support for XDP_REDIRECT · 9d2b68cc
      Vladimir Oltean authored
      The driver implementation of the XDP_REDIRECT action reuses parts from
      XDP_TX, most notably the enetc_xdp_tx function which transmits an array
      of TX software BDs. Only this time, the buffers don't have DMA mappings,
      so we need to create them.
      
      When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can
      employ the same buffer reuse strategy as for the normal processing path
      and for XDP_PASS: we can flip to the other page half and seed that to
      the RX ring.
      
      Note that scatter/gather support is there, but disabled due to lack of
      multi-buffer support in XDP (which is added by this series):
      https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: increase RX ring default size · d6a2829e
      Vladimir Oltean authored
      As explained in the XDP_TX patch, when receiving a burst of frames with
      the XDP_TX verdict, there is a momentary dip in the number of available
      RX buffers. The system will eventually recover as TX completions will
      start kicking in and refilling our RX BD ring again. But until that
      happens, we need to survive with as few out-of-buffer discards as
      possible.
      
      This increases the memory footprint of the driver in order to avoid
      discards at 2.5 Gbps line rate with 64-byte packets, the maximum speed
      available for testing on 1 port on NXP LS1028A.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: add support for XDP_TX · 7ed2bc80
      Vladimir Oltean authored
      For reflecting packets back into the interface they came from, we create
      an array of TX software BDs derived from the RX software BDs. Therefore,
      we need to extend the TX software BD structure to contain most of the
      stuff that's already present in the RX software BD structure, for
      reasons that will become evident in a moment.
      
      For a frame with the XDP_TX verdict, we don't reuse any buffer right
      away as we do for XDP_DROP (the same page half) or XDP_PASS (the other
      page half, same as the skb code path).
      
      Because the buffer transfers ownership from the RX ring to the TX ring,
      reusing any page half right away is very dangerous. What we can do
      instead is recycle the same page half as soon as TX is complete.
      
      The code path is:
      enetc_poll
      -> enetc_clean_rx_ring_xdp
         -> enetc_xdp_tx
         -> enetc_refill_rx_ring
      (time passes, another MSI interrupt is raised)
      enetc_poll
      -> enetc_clean_tx_ring
         -> enetc_recycle_xdp_tx_buff
      
      But that creates a problem, because there is a potentially large time
      window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, during
      which we'll have fewer and fewer RX buffers.
      
      Basically, when the ship starts sinking, the knee-jerk reaction is to
      let enetc_refill_rx_ring do what it does for the standard skb code path
      (refill every 16 consumed buffers), but that turns out to be very
      inefficient. The problem is that we have no rx_swbd->page at our
      disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would
      have to call enetc_new_page for every buffer that we refill (if we
      choose to refill at this early stage). This is very inefficient and only
      makes the problem worse, because page allocation is an expensive
      process, and CPU time is exactly what we're lacking.
      
      Additionally, there is an even bigger problem: if we let
      enetc_refill_rx_ring top up the ring's buffers again from the RX path,
      remember that the buffers sent to transmission haven't disappeared
      anywhere. They will be eventually sent, and processed in
      enetc_clean_tx_ring, and an attempt will be made to recycle them.
      But surprise, the RX ring is already full of new buffers, because we
      were premature in deciding that we should refill. So not only did we
      make the expensive decision of allocating new pages, but now we must
      throw away perfectly good and reusable buffers.
      
      So what we do is we implement an elastic refill mechanism, which keeps
      track of the number of in-flight XDP_TX buffer descriptors. We top up
      the RX ring only up to the total ring capacity minus the number of BDs
      that are in flight (because we know that those BDs will return to us
      eventually).
      
      The enetc driver manages 1 RX ring per CPU, and the default TX ring
      management is the same. So we do XDP_TX towards the TX ring of the same
      index, because it is affined to the same CPU. This will probably not
      produce great results when we have a tc-taprio/tc-mqprio qdisc on the
      interface, because in that case, the number of TX rings might be
      greater, but I didn't add any checks for that yet (mostly because I
      didn't know what checks to add).
      
      It should also be noted that we need to change the DMA mapping direction
      for RX buffers, since they may now be reflected into the TX ring of the
      same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and
      remapping as DMA_TO_DEVICE, because performance is better this way.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: add support for XDP_DROP and XDP_PASS · d1b15102
      Vladimir Oltean authored
      For the RX ring, enetc uses an allocation scheme based on pages split
      into two buffers, which is already very efficient in terms of preventing
      reallocations / maximizing reuse, so I see no reason why I would change
      that.
      
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half B | half B | half B | half B | half B | half B | half B |
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half A | half A | half A | half A | half A | half A | half A | RX ring
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
           ^                                                     ^
           |                                                     |
       next_to_clean                                       next_to_alloc
                                                            next_to_use
      
                         +--------+--------+--------+--------+--------+
                         |        |        |        |        |        |
                         | half B | half B | half B | half B | half B |
                         |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half B | half B | half A | half A | half A | half A | half A | RX ring
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |   ^                                   ^
       | half A | half A |   |                                   |
       |        |        | next_to_clean                   next_to_use
       +--------+--------+
                    ^
                    |
               next_to_alloc
      
      Then, when enetc_refill_rx_ring is called, whose purpose is to advance
      next_to_use, it sees that it can take buffers up to next_to_alloc, and
      it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate
      one!".
      
      The only problem is that for default PAGE_SIZE values of 4096, buffer
      sizes are 2048 bytes. While this is enough for normal skb allocations at
      an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256
      bytes, and including skb_shared_info and alignment, we end up being able
      to make use of only 1472 bytes, which is insufficient for the default
      MTU.
      
      To solve that problem, we implement scatter/gather processing in the
      driver, because we would really like to keep the existing allocation
      scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and
      another one of 28 bytes.
      
      Because the headroom required by XDP is different from (and much larger
      than) the one required by the network stack, whenever a BPF program is
      added to or removed from the port, we drain the existing RX buffers and seed new
      ones with the required headroom. We also keep the required headroom in
      rx_ring->buffer_offset.
      
      The simplest way to implement XDP_PASS, where an skb must be created, is
      to create an xdp_buff based on the next_to_clean RX BDs, but not clear
      those BDs from the RX ring yet, just keep the original index at which
      the BDs for this frame started. Then, if the verdict is XDP_PASS,
      instead of converting the xdp_buff to an skb, we replay a call to
      enetc_build_skb (just as in the normal enetc_clean_rx_ring case),
      starting from the original BD index.
      
      We would also like to be minimally invasive to the regular RX data path,
      and not check whether there is a BPF program attached to the ring on
      every packet. So we create a separate RX ring processing function for
      XDP.
      
      Because we only install/remove the BPF program while the interface is
      down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there
      shouldn't be any circumstance in which we are processing packets and
      there is a potentially freed BPF program attached to the RX ring.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: move up enetc_reuse_page and enetc_page_reusable · 65d0cbb4
      Vladimir Oltean authored
      For XDP_TX, we need to call enetc_reuse_page from enetc_clean_tx_ring,
      so we move these functions up to avoid a forward declaration.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: clean the TX software BD on the TX confirmation path · 1ee8d6f3
      Vladimir Oltean authored
      With the future introduction of some new fields into enetc_tx_swbd such
      as is_xdp_tx, is_xdp_redirect etc, we need not only to set these bits
      to true from the XDP_TX/XDP_REDIRECT code path, but also to false from
      the old code paths.
      
      This is because TX software buffer descriptors are kept in a ring that
      is a shadow of the hardware TX ring, so these structures keep getting
      reused, and there is always the possibility that when a software BD is
      reused (after we ran a full circle through the TX ring), the old user of
      the tx_swbd had set is_xdp_tx = true, and now we are sending a regular
      skb, which would need to set is_xdp_tx = false.
      
      To be minimally invasive to the old code paths, let's just scrub the
      software TX BD in the TX confirmation path (enetc_clean_tx_ring), once
      we know that nobody uses this software TX BD (tx_ring->next_to_clean
      hasn't yet been updated, and the TX paths check enetc_bd_unused which
      tells them if there's any more space in the TX ring for a new enqueue).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>