1. 19 May, 2020 2 commits
  2. 18 May, 2020 3 commits
  3. 17 May, 2020 10 commits
    • rds: convert get_user_pages() --> pin_user_pages() · dbfe7d74
      John Hubbard authored
      This code was using get_user_pages_fast(), in a "Case 2" scenario
      (DMA/RDMA), using the categorization from [1]. That means that it's
      time to convert the get_user_pages_fast() + put_page() calls to
      pin_user_pages_fast() + unpin_user_pages() calls.
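As a rough illustration of why each get/put pair must be converted as a whole, here is a simplified userspace model (not kernel code) of the accounting behind FOLL_PIN. It assumes the scheme used for ordinary (non-compound) pages, where a pin adds GUP_PIN_COUNTING_BIAS (1024) to the page refcount so that page_maybe_dma_pinned() can report a likely DMA pin. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: a pin adds a large bias to the refcount, an
 * ordinary get_page()-style reference adds 1. */
#define GUP_PIN_COUNTING_BIAS (1U << 10)

struct page_model {
	unsigned int refcount;
};

/* get_user_pages()-style reference, released with put_page(). */
static void model_get_page(struct page_model *p)
{
	p->refcount += 1;
}

static void model_put_page(struct page_model *p)
{
	assert(p->refcount >= 1);
	p->refcount -= 1;
}

/* pin_user_pages()-style pin, released with unpin_user_page(). */
static void model_pin_page(struct page_model *p)
{
	p->refcount += GUP_PIN_COUNTING_BIAS;
}

static void model_unpin_page(struct page_model *p)
{
	assert(p->refcount >= GUP_PIN_COUNTING_BIAS);
	p->refcount -= GUP_PIN_COUNTING_BIAS;
}

/* Same idea as page_maybe_dma_pinned(): false positives are possible
 * (1024 ordinary references look like a pin), false negatives are not. */
static bool model_maybe_dma_pinned(const struct page_model *p)
{
	return p->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

In this model, releasing a pinned page with put_page() would subtract only 1 and leave the page looking pinned forever, which is why the conversion swaps both halves of each pair.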
      
      There is some helpful background in [2]: basically, this is a small
      part of fixing a long-standing disconnect between pinning pages and
      file systems' use of those pages.
      
      [1] Documentation/core-api/pin_user_pages.rst
      
      [2] "Explicit pinning of user-space pages":
          https://lwn.net/Articles/807108/
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Cc: rds-devel@oss.oracle.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-do-not-block-on-subflow-socket' · 9740a7ae
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      mptcp: do not block on subflow socket
      
      This series reworks mptcp_sendmsg logic to avoid blocking on the subflow
      socket.
      
      It does so by removing the wait loop from mptcp_sendmsg_frag helper.
      
      In order to do that, it moves prerequisites that are currently
      handled in mptcp_sendmsg_frag (and cause it to wait until they are
      met, e.g. frag cache refill) into the callers.
      
      The worker can just reschedule in case no subflow socket is ready,
      since it can't wait -- doing so would block other work items and
      doesn't make sense anyway because we should not (re)send data
      in case resources are already low.
      
      The sendmsg path can use the existing wait logic until memory
      becomes available.
      
      Because large send requests can result in multiple mptcp_sendmsg_frag
      calls from mptcp_sendmsg, we may need to restart the socket lookup in
      case the subflow can't accept more data or memory is low.
      
      Doing so blocks on the mptcp socket, and existing wait handling
      releases the msk lock while blocking.
      
      Lastly, no need to use GFP_ATOMIC for extension allocation:
      extend __skb_ext_alloc with gfp_t arg instead of hard-coded ATOMIC and
      then relax the allocation constraints for mptcp case: those requests
      occur in process context.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: allow __skb_ext_alloc to sleep · 4930f483
      Florian Westphal authored
      mptcp calls this from the transmit side, from process context.
      Allow a sleeping allocation instead of unconditional GFP_ATOMIC.
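The change amounts to letting the caller pass its allocation context instead of hard-coding GFP_ATOMIC. A minimal userspace model of that idea, assuming a hypothetical memory pool in which only a sleeping allocation may wait for reclaim; enum alloc_ctx and model_ext_alloc() are illustrative stand-ins for gfp_t and __skb_ext_alloc(), not their real definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp_t flags; not the kernel's values. */
enum alloc_ctx { ALLOC_ATOMIC, ALLOC_CAN_SLEEP };

/* Hypothetical memory state: objects free now, objects reclaimable. */
struct mem_model {
	int free_objs;
	int reclaimable_objs;
};

/* The caller states its context. Only a sleeping allocation may block
 * in (simulated) reclaim; an atomic one fails when nothing is free. */
static bool model_ext_alloc(struct mem_model *m, enum alloc_ctx ctx)
{
	assert(m != NULL);
	if (m->free_objs == 0 && ctx == ALLOC_CAN_SLEEP &&
	    m->reclaimable_objs > 0) {
		/* Models waiting for reclaim; forbidden in atomic context. */
		m->reclaimable_objs--;
		m->free_objs++;
	}
	if (m->free_objs == 0)
		return false;
	m->free_objs--;
	return true;
}
```

Under memory pressure, a process-context caller passing the sleeping context succeeds where the hard-coded atomic allocation would have failed.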
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: remove inner wait loop from mptcp_sendmsg_frag · 5c826443
      Florian Westphal authored
      Previous patches made sure we only call into this function
      when its prerequisites are met, so there is no need to wait on the
      subflow socket anymore.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fill skb page frag cache outside of mptcp_sendmsg_frag · 17091708
      Florian Westphal authored
      The mptcp_sendmsg_frag helper contains a loop that will wait on the
      subflow sk.
      
      It seems preferable to only wait in mptcp_sendmsg() when blocking I/O is
      requested.  mptcp_sendmsg already has such a wait loop that is used when
      no subflow socket is available for transmission.
      
      This is another preparation patch that makes sure we call
      mptcp_sendmsg_frag only if the page frag cache has been refilled.
      
      A follow-up patch will remove the wait loop from mptcp_sendmsg_frag().
      
      The retransmit worker doesn't need to do this refill as it won't
      transmit new mptcp-level data.
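The pattern (refill in the caller, where waiting is allowed, so that the helper can assume its prerequisite and never wait) can be sketched as a userspace model. struct frag_cache, model_refill() and model_sendmsg_frag() are hypothetical; the kernel's page fragment machinery is not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-socket page fragment cache. */
struct frag_cache {
	size_t avail;    /* bytes left in the cached fragment */
	size_t page_sz;  /* size of a freshly refilled fragment */
};

/* Caller side: this is where waiting for memory would happen, so the
 * refill lives here. Returns false if memory is not available and
 * leaves the wait/bail-out decision to the caller. */
static bool model_refill(struct frag_cache *c, bool mem_available)
{
	assert(c->page_sz > 0);
	if (c->avail > 0)
		return true;
	if (!mem_available)
		return false;
	c->avail = c->page_sz;
	return true;
}

/* Helper: assumes the prerequisite holds and never waits. Returns the
 * number of bytes copied, or -EAGAIN if called without a refill. */
static long model_sendmsg_frag(struct frag_cache *c, size_t len)
{
	size_t n;

	if (c->avail == 0)
		return -EAGAIN;
	n = len < c->avail ? len : c->avail;
	c->avail -= n;
	return (long)n;
}
```

A caller would loop on model_refill() followed by model_sendmsg_frag(), and only the blocking sendmsg path would wait when the refill fails.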
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fill skb extension cache outside of mptcp_sendmsg_frag · 149f7c71
      Florian Westphal authored
      The mptcp_sendmsg_frag helper contains a loop that will wait on the
      subflow sk.
      
      It seems preferable to only wait in mptcp_sendmsg() when blocking I/O is
      requested.  mptcp_sendmsg already has such a wait loop that is used when
      no subflow socket is available for transmission.
      
      This is a preparation patch that makes sure we call
      mptcp_sendmsg_frag only if a skb extension has been allocated.
      
      Moreover, such allocation currently uses GFP_ATOMIC while it
      could use sleeping allocation instead.
      
      Follow-up patches will remove the wait loop from mptcp_sendmsg_frag()
      and will allow a sleeping allocation for the extension.
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: avoid blocking in tcp_sendpages · 72511aab
      Florian Westphal authored
      The transmit loop continues to xmit new data until an error is returned
      or all data was transmitted.
      
      For the blocking i/o case, this means that tcp_sendpages() may block on
      the subflow until more space becomes available, i.e. we end up sleeping
      with the mptcp socket lock held.
      
      Instead we should check if a different subflow is ready to be used.
      
      This restarts the subflow sk lookup when the tx operation succeeded
      and the tcp subflow can't accept more data or if tcp_sendpages
      indicates -EAGAIN on a blocking mptcp socket.
      
      In that case we also need to set the NOSPACE bit to make sure we get
      notified once memory becomes available.
      
      In case all subflows are busy, the existing logic will wait until a
      subflow is ready, releasing the mptcp socket lock while doing so.
      
      The mptcp worker already sets DONTWAIT, so no need to make changes there.
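The restart logic described above can be modeled in userspace as follows. This is a sketch under simplifying assumptions: each subflow is reduced to a count of free send-buffer bytes, the nospace flag stands in for the SOCK_NOSPACE bit, and pick_subflow() is a hypothetical first-fit choice rather than MPTCP's own subflow selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a subflow's TCP send buffer. */
struct subflow_model {
	size_t space;  /* bytes the subflow can still accept */
	bool nospace;  /* stands in for the SOCK_NOSPACE bit */
};

/* First-fit lookup of a subflow that can take data; NULL if all full. */
static struct subflow_model *pick_subflow(struct subflow_model *sf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (sf[i].space > 0)
			return &sf[i];
	return NULL;
}

/* Queue up to len bytes without ever blocking on one subflow. Returns
 * the bytes queued; *must_wait is set when every subflow is full, which
 * is where the real code waits with the mptcp socket lock released. */
static size_t model_send(struct subflow_model *sf, size_t n, size_t len,
			 bool *must_wait)
{
	size_t sent = 0;

	assert(must_wait != NULL);
	*must_wait = false;
	while (sent < len) {
		struct subflow_model *ssk = pick_subflow(sf, n);

		if (!ssk) {
			*must_wait = true;
			break;
		}
		size_t chunk = len - sent < ssk->space ? len - sent : ssk->space;

		ssk->space -= chunk;
		sent += chunk;
		/* Subflow is now full: ask to be notified when space frees
		 * up; the next iteration restarts the subflow lookup. */
		if (ssk->space == 0)
			ssk->nospace = true;
	}
	return sent;
}
```

With two subflows holding 100 and 50 free bytes, a 120-byte send fills the first subflow, restarts the lookup and puts the remaining 20 bytes on the second one.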
      
      v2:
       * set NOSPACE bit
       * add a comment to clarify that mptcp-sk sndbuf limits need to
         be checked as well.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: break and restart in case mptcp sndbuf is full · fb529e62
      Florian Westphal authored
      It's not enough to check for available tcp send space.
      
      We also hold on to transmitted data for mptcp-level retransmits.
      Right now we will send more and more data if the peer can ack data
      at the tcp level fast enough, since that frees up tcp send buffer space.
      
      But we also need to check that data was acked and reclaimed at the mptcp
      level.
      
      Therefore add the needed check in mptcp_sendmsg, flush tcp data and
      wait until more mptcp snd space becomes available if we are over the
      limit.  Before we wait for more data, also make sure we start the
      retransmit timer if we ran out of sndbuf space.
      
      Otherwise there is a very small chance that we wait forever:
      
       * receiver is waiting for data
       * sender is blocked because mptcp socket buffer is full
       * at tcp level, all data was acked
       * mptcp-level snd_una was not updated, because last ack
         that acknowledged the last data packet carried an older
         MPTCP-ack.
      
      Restarting the retransmit timer avoids this problem: if the TCP
      subflow is idle, data is retransmitted from the RTX queue.
      
      New data will make the peer send a new, updated MPTCP-Ack.
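A minimal model of the extra check, assuming a single counter of bytes still awaiting an MPTCP-level ack. struct msk_model and its helpers are hypothetical and not the kernel's mptcp_sock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of MPTCP-level send accounting: data stays queued
 * for possible mptcp-level retransmission until an MPTCP-level ack
 * covers it, regardless of what TCP acked on the subflow. */
struct msk_model {
	size_t sndbuf;        /* MPTCP-level send buffer limit */
	size_t wmem_queued;   /* bytes not yet acked at the MPTCP level */
	bool rtx_timer_armed;
};

static bool msk_writeable(const struct msk_model *msk)
{
	return msk->wmem_queued < msk->sndbuf;
}

/* Queue data up to the MPTCP-level limit. When the limit is hit, arm
 * the retransmit timer before the caller waits, so a stale MPTCP-level
 * ack cannot leave both peers waiting forever. */
static size_t msk_send(struct msk_model *msk, size_t len)
{
	size_t room, n;

	if (!msk_writeable(msk)) {
		msk->rtx_timer_armed = true;
		return 0;
	}
	room = msk->sndbuf - msk->wmem_queued;
	n = len < room ? len : room;
	msk->wmem_queued += n;
	if (!msk_writeable(msk))
		msk->rtx_timer_armed = true;
	return n;
}

/* Only an MPTCP-level ack releases queued data. */
static void msk_mptcp_ack(struct msk_model *msk, size_t acked)
{
	assert(acked <= msk->wmem_queued);
	msk->wmem_queued -= acked;
}
```

In the model, TCP-level acks never touch wmem_queued, which is why free TCP send space alone says nothing about whether the mptcp socket may send more.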
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: move common nospace-pattern to a helper · a0e17064
      Florian Westphal authored
      Paolo noticed that ssk_check_wmem() has the same pattern, so add and use
      a common helper for both places.
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Drop 'pref medium' in route checks · eb682677
      David Ahern authored
      The 'pref medium' attribute was moved in iproute2 to sit next to the
      prefix, which is where it applies, rather than after the last nexthop.
      The nexthop tests were updated to drop the string from route checking,
      but it crept in again with the compat tests.
      
      Fixes: 4dddb5be ("selftests: net: add new testcases for nexthop API compat mode sysctl")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 May, 2020 19 commits
  5. 15 May, 2020 6 commits
    • Merge tag 'mlx5-updates-2020-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ea6119aa
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2020-05-15
      
      mlx5 core and mlx5e (netdev) updates:
      
      1) Two fixes for the release-all-FW-pages support.
      2) Improvement in calculating the send queue stop room on tx
      3) Flow steering auto-groups creation improvements
      4) TC offload fix for Connection tracking with NAT action
      5) IPoIB support for self loopback, to allow communication between IPoIB
      pkey child interfaces on the same host.
      6) DCBNL cleanup to avoid #ifdef DCBNL all over the main mlx5e code
      7) Small and trivial code cleanup
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethernet: ti: am65-cpts: Add missing inline qualifier to stub functions · 2ea46dc6
      Nathan Chancellor authored
      When building with Clang:
      
      In file included from drivers/net/ethernet/ti/am65-cpsw-ethtool.c:15:
      drivers/net/ethernet/ti/am65-cpts.h:58:12: warning: unused function
      'am65_cpts_ns_gettime' [-Wunused-function]
      static s64 am65_cpts_ns_gettime(struct am65_cpts *cpts)
                 ^
      drivers/net/ethernet/ti/am65-cpts.h:63:12: warning: unused function
      'am65_cpts_estf_enable' [-Wunused-function]
      static int am65_cpts_estf_enable(struct am65_cpts *cpts,
                 ^
      drivers/net/ethernet/ti/am65-cpts.h:69:13: warning: unused function
      'am65_cpts_estf_disable' [-Wunused-function]
      static void am65_cpts_estf_disable(struct am65_cpts *cpts, int idx)
                  ^
      3 warnings generated.
      
      These functions need to be marked as inline, which adds __maybe_unused,
      to avoid these warnings; this is the usual pattern for stub functions.
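A compilable sketch of the stub pattern, using hypothetical example_* names and a hypothetical CONFIG_EXAMPLE_CPTS switch rather than the real am65-cpts prototypes. In plain userspace C, GCC and Clang generally do not warn about unused static inline functions that come from a header; in a default kernel build, inline is additionally defined to include __maybe_unused, which is what the message above refers to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct example_cpts;  /* opaque; only pointers are used */

/* Hypothetical config switch; left undefined so the stubs are used. */
#ifdef CONFIG_EXAMPLE_CPTS
int example_cpts_estf_enable(struct example_cpts *cpts, int idx);
void example_cpts_estf_disable(struct example_cpts *cpts, int idx);
#else
/* static inline stubs: every .c file including the header gets a copy,
 * and unused copies do not trigger -Wunused-function the way plain
 * "static" definitions do. */
static inline int example_cpts_estf_enable(struct example_cpts *cpts, int idx)
{
	(void)cpts;
	(void)idx;
	return -EOPNOTSUPP;
}

static inline void example_cpts_estf_disable(struct example_cpts *cpts, int idx)
{
	(void)cpts;
	(void)idx;
}
#endif
```

Callers can then invoke the functions unconditionally and handle -EOPNOTSUPP, without an #ifdef at each call site.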
      
      Fixes: ec008fa2 ("ethernet: ti: am65-cpts: add routines to support taprio offload")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1026
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Take DCBNL-related definitions into dedicated files · 3f3ab178
      Tariq Toukan authored
      Take DCBNL-related definitions out of the common en.h header and
      use a dedicated header file to expose them.
      Definitions that need not be exposed are used locally in the .c file.
      Use stubs to eliminate use of CONFIG_MLX5_CORE_EN_DCB in the
      generic control flows.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Calculate SQ stop room in a robust way · 5ffb4d85
      Maxim Mikityanskiy authored
      Currently, different formulas are used to estimate the space that may be
      taken by WQEs in the SQ during a single packet transmit. This space is
      called stop room, and it's checked at the end of a packet transmit to
      find out if the next packet could overflow the SQ. If it could, the
      driver tells the kernel to stop sending further packets.
      
      Many factors affect the stop room:
      
      1. Padding with NOPs to avoid WQEs spanning over page boundaries.
      
      2. Enabled and disabled offloads (TLS, upcoming MPWQE).
      
      3. The maximum size of a WQE.
      
      The padding is performed before every WQE if it doesn't fit the current
      page.
      
      The current formula assumes that only one padding will be required per
      packet, and it doesn't take into account that the WQEs posted during the
      transmission of a single packet might exceed the page size in very rare
      circumstances. For example, to hit this condition with 4096-byte pages,
      TLS offload will have to interrupt an almost-full MPWQE session, be in
      the resync flow and try to transmit a near-maximum amount of data.
      
      To avoid SQ overflows in such rare cases after MPWQE is added, this
      patch introduces a more robust formula to estimate the stop room. The
      new formula uses the fact that a WQE of size X will not require more
      than X-1 WQEBBs of padding. More exact estimations are possible, but
      they result in much more complex and error-prone code for little gain.
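The bound can be checked with a little arithmetic. The sketch below assumes 64-byte WQEBBs and 4096-byte pages (64 WQEBBs per page); the function names are hypothetical and this is not the driver's implementation. padding_needed() computes the NOP padding required when fewer WQEBBs than the WQE's size remain in the current page, which is never more than X-1, so reserving 2X-1 WQEBBs per WQE always suffices.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration: 64-byte WQEBBs, 4096-byte pages. */
#define WQEBB_SIZE       64
#define WQEBBS_PER_PAGE  (4096 / WQEBB_SIZE)

/* A WQE of X WQEBBs may need up to X-1 WQEBBs of NOP padding (when
 * 1..X-1 WQEBBs remain in the page), so reserve X + (X-1) = 2X-1. */
static unsigned int stop_room_for_wqe(unsigned int wqe_size)
{
	assert(wqe_size >= 1 && wqe_size <= WQEBBS_PER_PAGE);
	return wqe_size * 2 - 1;
}

/* Stop room for a transmit that may post several WQEs. */
static unsigned int stop_room_for_wqes(const unsigned int *sizes, size_t n)
{
	unsigned int room = 0;

	for (size_t i = 0; i < n; i++)
		room += stop_room_for_wqe(sizes[i]);
	return room;
}

/* Padding actually needed to post a WQE of wqe_size at index pi: pad
 * the rest of the page only if the WQE does not fit in it. */
static unsigned int padding_needed(unsigned int pi, unsigned int wqe_size)
{
	unsigned int left = WQEBBS_PER_PAGE - pi % WQEBBS_PER_PAGE;

	return left < wqe_size ? left : 0;
}
```

For a transmit that may post a 4-WQEBB WQE followed by a 1-WQEBB WQE, the reserved stop room is 7 + 1 = 8 WQEBBs.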
      
      Before this patch, the TLS stop room included space for both INNOVA and
      ConnectX TLS offloads that couldn't run at the same time anyway, so this
      patch accounts only for the active one.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: IPoIB, Drop multicast packets that this interface sent · 8b46d424
      Erez Shitrit authored
      After enabling loopback packets for IPoIB, we need to drop the packets
      that this HCA has replicated and that came back to the same interface
      that sent them.
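A sketch of such a filter, under the assumption that the receive path can see the sender's queue pair number (QPN) and source GID of a looped-back multicast packet. struct rx_info, struct ipoib_if and should_drop_own_packet() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical, simplified view of a received packet. */
struct rx_info {
	bool is_multicast;
	uint32_t src_qpn;       /* sender's queue pair number */
	uint8_t sgid[GID_LEN];  /* sender's GID */
};

/* Hypothetical view of the local IPoIB interface. */
struct ipoib_if {
	uint32_t qpn;
	uint8_t gid[GID_LEN];
};

/* Drop multicast copies the HCA replicated back to their sender: same
 * QPN and same source GID as this interface. Other interfaces sharing
 * the pkey (e.g. clones in other namespaces) have their own QPN, so
 * their packets are still delivered. */
static bool should_drop_own_packet(const struct ipoib_if *ifc,
				   const struct rx_info *rx)
{
	assert(ifc && rx);
	return rx->is_multicast && rx->src_qpn == ifc->qpn &&
	       memcmp(rx->sgid, ifc->gid, GID_LEN) == 0;
}
```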
      
      Fixes: 4c6c615e ("net/mlx5e: IPoIB, Add PKEY child interface nic profile")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: IPoIB, Enable loopback packets for IPoIB interfaces · 80639b19
      Erez Shitrit authored
      Enable loopback of unicast and multicast traffic for IPoIB enhanced
      mode.
      This will allow interfaces with the same pkey to communicate with each
      other, e.g. cloned interfaces located in different namespaces.
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>