1. 08 Jun, 2021 29 commits
  2. 07 Jun, 2021 11 commits
    • Merge branch 'page_pool-recycling' · dc8cf755
      David S. Miller authored
      Matteo Croce says:
      
      ====================
      page_pool: recycle buffers
      
      This is a respin of [1]
      
      This patchset shows the plans for allowing page_pool to handle and
      maintain DMA map/unmap of the pages it serves to the driver. For this
      to work a return hook in the network core is introduced.
      
      The overall purpose is to simplify drivers, by providing a page
      allocation API that does recycling, such that each driver doesn't have
      to reinvent its own recycling scheme. Using page_pool in a driver
      does not require implementing XDP support, but it makes it trivially
      easy to do so. Instead of allocating buffers specifically for SKBs,
      we now allocate a generic buffer and either wrap it in an SKB
      (via build_skb) or create an XDP frame.
      The recycling code leverages the XDP recycle APIs.
      
      The Marvell mvpp2 and mvneta drivers are used in this patchset to
      demonstrate how to use the API, and were tested on MacchiatoBIN
      and EspressoBIN boards respectively.
      
      Please let this go in on a future -rc1, to allow enough time
      for wider testing.
      
      v7 -> v8:
      - use page->lru.next instead of page->index for pfmemalloc
      - remove conditional include
      - rework page_pool_return_skb_page() to have fewer conversions
        between page and addresses, and call compound_head() only once
      - move some code from skb_free_head() to a new helper skb_pp_recycle()
      - misc fixes
      
      v6 -> v7:
      - refresh patches against net-next
      - remove a redundant call to virt_to_head_page()
      - update mvneta benchmarks
      
      v5 -> v6:
      - preserve pfmemalloc bit when setting signature
      - fix typo in mvneta
      - rebase on net-next with the new cache
      - don't clear the skb->pp_recycle in pskb_expand_head()
      
      v4 -> v5:
      - move the signature so it doesn't alias with page->mapping
      - use an invalid pointer as magic
      - incorporate Matthew Wilcox's changes for pfmemalloc pages
      - move the __skb_frag_unref() changes to a preliminary patch
      - refactor some cpp directives
      - only attempt recycling if skb->head_frag
      - clear skb->pp_recycle in pskb_expand_head()
      
      v3 -> v4:
      - store a pointer to page_pool instead of xdp_mem_info
      - drop a patch which reduces xdp_mem_info size
      - do the recycling in the page_pool code instead of xdp_return
      - remove some unused header includes
      - remove some useless forward declarations
      
      v2 -> v3:
      - added missing SOBs
      - CCed the MM people
      
      v1 -> v2:
      - fix a commit message
      - avoid setting pp_recycle multiple times on mvneta
      - squash two patches to avoid breaking bisect
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvneta: recycle buffers · e4017570
      Matteo Croce authored
      Use the new recycling API for page_pool.
      In a drop rate test, the packet rate increased by 10%,
      from 296 Kpps to 326 Kpps.
      
      perf top on a stock system shows:
      
      Overhead  Shared Object     Symbol
        23.66%  [kernel]          [k] __pi___inval_dcache_area
        22.85%  [mvneta]          [k] mvneta_rx_swbm
         7.54%  [kernel]          [k] kmem_cache_alloc
         6.49%  [kernel]          [k] eth_type_trans
         3.94%  [kernel]          [k] dev_gro_receive
         3.91%  [kernel]          [k] __netif_receive_skb_core
         3.91%  [kernel]          [k] kmem_cache_free
         3.76%  [kernel]          [k] page_pool_release_page
         3.56%  [kernel]          [k] free_unref_page
         2.40%  [kernel]          [k] build_skb
         1.49%  [kernel]          [k] skb_release_data
         1.45%  [kernel]          [k] __alloc_pages_bulk
         1.30%  [kernel]          [k] page_frag_free
      
      And this is the same output with recycling enabled:
      
      Overhead  Shared Object     Symbol
        26.41%  [kernel]          [k] __pi___inval_dcache_area
        25.00%  [mvneta]          [k] mvneta_rx_swbm
         8.14%  [kernel]          [k] kmem_cache_alloc
         6.84%  [kernel]          [k] eth_type_trans
         4.44%  [kernel]          [k] __netif_receive_skb_core
         4.38%  [kernel]          [k] kmem_cache_free
         4.16%  [kernel]          [k] dev_gro_receive
         3.21%  [kernel]          [k] page_pool_put_page
         2.41%  [kernel]          [k] build_skb
         1.82%  [kernel]          [k] skb_release_data
         1.61%  [kernel]          [k] napi_gro_receive
         1.25%  [kernel]          [k] page_pool_refill_alloc_cache
         1.16%  [kernel]          [k] __netif_receive_skb_list_core
      
      We can see that page_pool_release_page(), free_unref_page() and
      __alloc_pages_bulk() are no longer on top of the list when receiving
      traffic.
      
      The test was done with mausezahn on the TX side with 64 byte raw
      ethernet frames.
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvpp2: recycle buffers · 133637fc
      Matteo Croce authored
      Use the new recycling API for page_pool.
      In a drop rate test, the packet rate is almost doubled,
      from 1110 Kpps to 2128 Kpps.
      
      perf top on a stock system shows:
      
      Overhead  Shared Object     Symbol
        34.88%  [kernel]          [k] page_pool_release_page
         8.06%  [kernel]          [k] free_unref_page
         6.42%  [mvpp2]           [k] mvpp2_rx
         6.07%  [kernel]          [k] eth_type_trans
         5.18%  [kernel]          [k] __netif_receive_skb_core
         4.95%  [kernel]          [k] build_skb
         4.88%  [kernel]          [k] kmem_cache_free
         3.97%  [kernel]          [k] kmem_cache_alloc
         3.45%  [kernel]          [k] dev_gro_receive
         2.73%  [kernel]          [k] page_frag_free
         2.07%  [kernel]          [k] __alloc_pages_bulk
         1.99%  [kernel]          [k] arch_local_irq_save
         1.84%  [kernel]          [k] skb_release_data
         1.20%  [kernel]          [k] netif_receive_skb_list_internal
      
      With packet rate stable at about 1110 Kpps:
      
      tx: 0 bps 0 pps rx: 532.7 Mbps 1110 Kpps
      tx: 0 bps 0 pps rx: 532.6 Mbps 1110 Kpps
      tx: 0 bps 0 pps rx: 532.4 Mbps 1109 Kpps
      tx: 0 bps 0 pps rx: 532.1 Mbps 1109 Kpps
      tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps
      tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps
      
      And this is the same output with recycling enabled:
      
      Overhead  Shared Object     Symbol
        12.91%  [kernel]          [k] eth_type_trans
        12.54%  [mvpp2]           [k] mvpp2_rx
         9.67%  [kernel]          [k] build_skb
         9.63%  [kernel]          [k] __netif_receive_skb_core
         8.44%  [kernel]          [k] page_pool_put_page
         8.07%  [kernel]          [k] kmem_cache_free
         7.79%  [kernel]          [k] kmem_cache_alloc
         6.86%  [kernel]          [k] dev_gro_receive
         3.19%  [kernel]          [k] skb_release_data
         2.41%  [kernel]          [k] netif_receive_skb_list_internal
         2.18%  [kernel]          [k] page_pool_refill_alloc_cache
         1.76%  [kernel]          [k] napi_gro_receive
         1.61%  [kernel]          [k] kfree_skb
         1.20%  [kernel]          [k] dma_sync_single_for_device
         1.16%  [mvpp2]           [k] mvpp2_poll
         1.12%  [mvpp2]           [k] mvpp2_read
      
      With packet rate above 2100 Kpps:
      
      tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
      tx: 0 bps 0 pps rx: 1021 Mbps 2127 Kpps
      tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
      tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
      tx: 0 bps 0 pps rx: 1022 Mbps 2128 Kpps
      tx: 0 bps 0 pps rx: 1022 Mbps 2129 Kpps
      
      The major performance increase is explained by the fact that the most
      CPU-intensive functions (page_pool_release_page, page_frag_free and
      free_unref_page) are no longer called on a per-packet basis.
      
      The test was done by sending 64-byte Ethernet frames with an invalid
      ethertype to the MacchiatoBIN, so the packets are dropped early in the
      RX path.
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Allow drivers to hint on SKB recycling · 6a5bcd84
      Ilias Apalodimas authored
      Up to now, several high-speed NICs have had custom mechanisms for
      recycling the allocated memory they use for their payloads.
      Our page_pool API already has recycling capabilities that are always
      used when we are running in 'XDP mode'. So let's tweak the API and the
      kernel network stack slightly and allow the recycling to happen even
      during the standard operation.
      The API doesn't take into account 'split page' policies used by those
      drivers currently, but can be extended once we have users for that.
      
      The idea is to be able to intercept the packet in skb_release_data().
      If it's a buffer coming from our page_pool API, recycle it back to the
      pool for further use, or else just release the packet entirely.
      
      To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
      a field in struct page (page->pp) to store the page_pool pointer.
      Storing the information in page->pp allows us to recycle both SKBs and
      their fragments.
      We could have skipped the skb bit entirely, since identical information
      can be derived from struct page. However, in an effort to affect the free
      path as little as possible, reading a single bit in the skb, which is
      already in cache, is better than trying to derive identical information
      from the data stored in the page.
      
      The driver or page_pool has to take care of the sync operations on its own
      during the buffer recycling, since the buffer is never unmapped after
      opting in to the recycling.
      
      Since the gain on the drivers depends on the architecture, we are not
      enabling recycling by default if the page_pool API is used on a driver.
      To enable recycling, the driver must call skb_mark_for_recycle(), which
      stores the information we need for recycling in page->pp and enables
      the recycling bit, or page_pool_store_mem_info() for a fragment.
      Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Co-developed-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: add a parameter to __skb_frag_unref · c420c989
      Matteo Croce authored
      This is a prerequisite patch; the next one enables recycling of
      skbs and fragments. Add an extra argument to __skb_frag_unref() to
      handle recycling, and update the current users of the function
      accordingly.
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm: add a signature in struct page · c07aea3e
      Matteo Croce authored
      This is needed by the page_pool to avoid recycling a page not allocated
      via page_pool.
      
      The page->signature field is aliased to page->lru.next and
      page->compound_head, but it can't be set by mistake because the
      signature value is a bad pointer, and can't trigger a false positive
      in PageTail() because the last bit is 0.
      Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: moxa: Use devm_platform_get_and_ioremap_resource() · 35cba15a
      Yang Yingliang authored
      Use devm_platform_get_and_ioremap_resource() to simplify
      code and avoid a null-ptr-deref by checking 'res' in it.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: Fix spelling mistakes · 7f553ff2
      Zheng Yongjun authored
      Fix some spelling mistakes in comments:
      negociated  ==> negotiated
      dont  ==> don't
      Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Fix spelling mistakes · 4fb3ebbf
      Zheng Yongjun authored
      Fix some spelling mistakes in comments:
      constuct  ==> construct
      chanels  ==> channels
      Detination  ==> Destination
      Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Fix spelling mistakes · 974d8f86
      Zheng Yongjun authored
      Fix some spelling mistakes in comments:
      Dont  ==> Don't
      timout  ==> timeout
      incomming  ==> incoming
      necesarry  ==> necessary
      substract  ==> subtract
      Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlabel: Fix spelling mistakes · 84a57ae9
      Zheng Yongjun authored
      Fix some spelling mistakes in comments:
      Interate  ==> Iterate
      sucess  ==> success
      Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>