1. 17 Nov, 2020 25 commits
  2. 16 Nov, 2020 15 commits
    • Merge branch 'mptcp-improve-multiple-xmit-streams-support' · 72308ecb
      Jakub Kicinski authored
      Paolo Abeni says:
      
      ====================
      mptcp: improve multiple xmit streams support
      
      This series improves MPTCP handling of multiple concurrent
      xmit streams.
      
      The to-be-transmitted data is enqueued to a subflow only when
      the send window is open, keeping the subflows' xmit queues shorter
      and allowing for faster switch-over.
      
      The above requires more accurate msk socket state tracking
      and some additional infrastructure to allow pushing the data
      pending in the msk xmit queue as soon as the MPTCP's send window
      opens (patches 6-10).
      
      As a side effect, the MPTCP socket could enqueue data to subflows
      after close() time, to completely spool the data sitting in the
      msk xmit queue. Dealing with that requires some infrastructure and
      core TCP changes (patches 1-5).
      
      Finally, patches 11-12 introduce a more accurate tracking of the other
      end's receive window.
      
      Overall this refactors the MPTCP xmit path without introducing
      new features; the new code is covered by the existing self-tests.
      
      v2 -> v3:
       - rebased,
       - fixed checkpatch issue in patch 1/13
       - fixed some state tracking issues in patch 8/13
      
      v1 -> v2:
       - this is just a repost, to cope with patchwork issues, no changes
         at all
      ====================
      
      Link: https://lore.kernel.org/r/cover.1605458224.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: send explicit ack on delayed ack_seq incr · 7ed90803
      Paolo Abeni authored
      When the worker moves some bytes from the OoO queue into
      the receive queue, msk->ack_seq is updated, but the MPTCP-level
      ack carrying that value has to wait for the next ingress packet,
      possibly slowing down or hanging the peer. Send an explicit
      MPTCP-level ack in that case.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
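      The condition this patch reacts to can be sketched in user-space C. The sketch assumes a sorted, non-overlapping OoO queue modeled as an array. The worker drains the segments that start exactly at ack_seq and reports whether ack_seq advanced, which is the case where an explicit MPTCP-level ack is worth sending. struct ooo_seg and move_ooo_and_check_ack() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical OoO segment covering [seq, seq + len). */
struct ooo_seg {
	uint64_t seq;
	uint32_t len;
};

/*
 * Hypothetical model of the worker step: move every OoO segment that
 * starts exactly at *ack_seq into the receive queue (modeled here by just
 * advancing *ack_seq). Segments are assumed sorted by seq, non-overlapping
 * and starting at or after *ack_seq. Returns true when *ack_seq moved,
 * i.e. when an explicit MPTCP-level ack should be sent right away instead
 * of waiting for the next ingress packet.
 */
static bool move_ooo_and_check_ack(uint64_t *ack_seq,
				   const struct ooo_seg *segs, size_t n)
{
	uint64_t old_ack;

	assert(ack_seq != NULL && (segs != NULL || n == 0));
	old_ack = *ack_seq;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].seq != *ack_seq)
			break;	/* hole in the sequence space: stop here */
		*ack_seq += segs[i].len;
	}
	return *ack_seq != old_ack;
}
```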
    • mptcp: keep track of advertised windows right edge · 6f8a612a
      Florian Westphal authored
      Before sending 'x' new bytes also check that the new snd_una would
      be within the permitted receive window.
      
      For every ACK that also contains a DSS ack, check whether its tcp-level
      receive window would advance the current mptcp window right edge and
      update it if so.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
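      The right-edge bookkeeping described in this commit can be modeled roughly as follows. The model assumes 64-bit MPTCP sequence numbers compared in a wraparound-safe way and takes the right edge implied by an ACK to be data_ack + rcv_wnd. The send check is applied to the sequence number just past the new data. struct wnd_model, seq64_after(), wnd_allows() and wnd_update() are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" for 64-bit sequence numbers (hypothetical helper). */
static bool seq64_after(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* Hypothetical model of the MPTCP-level send window state. */
struct wnd_model {
	uint64_t snd_una;	/* oldest MPTCP-level unacked sequence */
	uint64_t wnd_end;	/* right edge of the peer's advertised window */
};

/* Before sending len new bytes starting at seq, check they stay within wnd_end. */
static bool wnd_allows(const struct wnd_model *w, uint64_t seq, uint32_t len)
{
	return !seq64_after(seq + len, w->wnd_end);
}

/*
 * On an ACK carrying a DSS ack: advance snd_una if the data ack is newer,
 * and move the right edge only forward, never back.
 */
static void wnd_update(struct wnd_model *w, uint64_t data_ack, uint32_t rcv_wnd)
{
	uint64_t new_end = data_ack + rcv_wnd;

	if (seq64_after(data_ack, w->snd_una))
		w->snd_una = data_ack;
	if (seq64_after(new_end, w->wnd_end))
		w->wnd_end = new_end;
}
```

Only moving wnd_end forward means a smaller window in a later ACK cannot shrink what was already advertised.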
    • mptcp: rework poll+nospace handling · 8edf0864
      Florian Westphal authored
      MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
      least one subflow and the mptcp socket itself are writeable.
      
      mptcp_poll returns EPOLLOUT if the bit is set.
      
      mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when the last write
      has used up all subflows or the mptcp socket wmem.
      
      This reworks nospace handling as follows:
      
      MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, which has the inverted meaning.
      This bit is set when the mptcp socket is not writeable.
      The mptcp-level ack path will then schedule the mptcp worker
      to allow it to free already-acked data (and reduce wmem usage).
      
      This will then wake userspace processes that wait for a POLLOUT event.
      
      sendmsg will set MPTCP_NOSPACE only when it has to wait for more
      wmem (blocking I/O case).
      
      The poll path will set MPTCP_NOSPACE in case the mptcp socket is
      not writeable.
      
      Normal tcp-level notification (SOCK_NOSPACE) is only enabled
      in case the subflow socket has no available wmem.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
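      The NOSPACE bit protocol described in this commit can be sketched in single-threaded user-space C. This is an assumption-laden model. The worker that frees acked data is folded into the ack handler, and the memory barriers and rechecks that real concurrent code needs are omitted. MODEL_NOSPACE and struct msk_model are hypothetical stand-ins for MPTCP_NOSPACE and the msk. POLLOUT is the POSIX constant from <poll.h>.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical flag bit playing the role of MPTCP_NOSPACE. */
#define MODEL_NOSPACE (1u << 0)

/* Hypothetical msk reduced to write-space accounting. */
struct msk_model {
	unsigned int flags;
	int wmem_queued;	/* bytes not yet acked at MPTCP level */
	int sndbuf;		/* write-space limit */
	int wakeups;		/* counts simulated writer wakeups */
};

static bool msk_writeable(const struct msk_model *m)
{
	return m->wmem_queued < m->sndbuf;
}

/* Poll path: report POLLOUT, or record that someone waits for write space. */
static short msk_poll(struct msk_model *m)
{
	if (msk_writeable(m))
		return POLLOUT;
	m->flags |= MODEL_NOSPACE;
	return 0;
}

/*
 * Ack path (with the worker's cleanup folded in): release acked bytes and
 * wake waiters only if a writer had previously found no space.
 */
static void msk_data_acked(struct msk_model *m, int acked)
{
	m->wmem_queued -= acked;
	if ((m->flags & MODEL_NOSPACE) && msk_writeable(m)) {
		m->flags &= ~MODEL_NOSPACE;
		m->wakeups++;	/* stands in for waking POLLOUT waiters */
	}
}
```

Setting the bit only when a writer actually found no space lets the ack path skip wakeups when nobody is waiting.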
    • mptcp: try to push pending data on snd una updates · 813e0a68
      Paolo Abeni authored
      After the previous patch we may end up with unsent data
      in the write buffer. If such buffer is full, the writer
      will block for an unlimited time.
      
      We need to trigger the MPTCP xmit path even from the
      subflow rx path, on MPTCP snd_una updates.
      
      Keep things simple and just schedule the work queue if
      needed.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
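      The decision taken on the subflow rx path can be sketched with the msk state reduced to three sequence numbers. When snd_una advances and some data has been queued but not yet pushed (write_seq != snd_nxt), the worker is scheduled. struct push_model and on_snd_una_update() are hypothetical names. The boolean flag only stands in for scheduling the work queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of the msk xmit state to three sequence numbers. */
struct push_model {
	uint64_t snd_una;	/* oldest MPTCP-level unacked byte */
	uint64_t snd_nxt;	/* next byte to push to a subflow */
	uint64_t write_seq;	/* next byte to be copied from user space */
	bool work_scheduled;	/* stands in for scheduling the worker */
};

/*
 * Called from the (modeled) subflow rx path with the MPTCP-level data ack.
 * Stale or duplicate acks are ignored (wraparound-safe compare). On a real
 * advance, the worker is scheduled if queued-but-unsent data exists, so
 * that data gets pushed without waiting for another sendmsg() call.
 */
static void on_snd_una_update(struct push_model *m, uint64_t new_una)
{
	if ((int64_t)(new_una - m->snd_una) <= 0)
		return;
	m->snd_una = new_una;
	if (m->write_seq != m->snd_nxt)
		m->work_scheduled = true;
}
```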
    • mptcp: move page frag allocation in mptcp_sendmsg() · d9ca1de8
      Paolo Abeni authored
      mptcp_sendmsg() is refactored so that it first copies
      the data provided from user space into the send queue,
      and then tries to spool the send queue via sendmsg_frag.
      
      There is a subtle change in the MPTCP-level collapsing of
      consecutive data fragments: we now allow it only on unsent
      data.
      
      sendmsg_frag no longer needs to deal with msghdr data
      and can be simplified considerably.
      
      snd_nxt and write_seq are now tracked independently.
      
      Overall this allows a significant cleanup and will
      allow sending pending MPTCP data on msk una updates in
      a later patch.
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
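      The two-step flow, with snd_nxt and write_seq tracked independently, can be modeled with plain counters. This sketch assumes a single byte limit in place of wmem accounting and a caller-supplied amount of room on the chosen subflow. struct sendq_model and its helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical send-queue accounting: snd_una <= snd_nxt <= write_seq. */
struct sendq_model {
	uint64_t snd_una;	/* acked at MPTCP level up to here */
	uint64_t snd_nxt;	/* pushed to subflows up to here */
	uint64_t write_seq;	/* copied from user space up to here */
	uint64_t limit;		/* max queued-but-unacked bytes (wmem stand-in) */
};

/* Bytes copied into the queue but not yet handed to any subflow. */
static uint64_t sendq_pending(const struct sendq_model *q)
{
	return q->write_seq - q->snd_nxt;
}

/* Step 1: copy user data into the send queue; only write_seq moves. */
static uint64_t sendq_copy(struct sendq_model *q, uint64_t len)
{
	uint64_t used = q->write_seq - q->snd_una;
	uint64_t room = q->limit > used ? q->limit - used : 0;
	uint64_t n = len < room ? len : room;

	q->write_seq += n;
	return n;
}

/* Step 2: spool pending data to a subflow; only snd_nxt moves. */
static uint64_t sendq_push(struct sendq_model *q, uint64_t subflow_room)
{
	uint64_t pending = sendq_pending(q);
	uint64_t n = pending < subflow_room ? pending : subflow_room;

	q->snd_nxt += n;
	return n;
}
```

Because the copy and the push are decoupled, data can sit in the queue (write_seq ahead of snd_nxt) until a subflow has room. That gap is what a later snd_una update can trigger pushing.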
    • mptcp: refactor shutdown and close · e16163b6
      Paolo Abeni authored
      We must not close the subflows before all the MPTCP-level
      data, including the DATA_FIN, has been acked at the MPTCP
      level, otherwise we could be unable to retransmit as needed.
      
      __mptcp_wr_shutdown() is responsible for checking the
      correct status and closing all subflows. It is called by the output
      path after spooling any data and at shutdown/close time.
      
      Similarly, __mptcp_destroy_sock() is responsible for cleaning up
      the MPTCP-level status, and is called when the msk transitions
      to TCP_CLOSE.
      
      The protocol-level close() no longer forces the TCP_CLOSE
      status, but orphans the msk socket and all the subflows.
      Orphaned msk sockets are forcibly closed after a timeout or
      when all MPTCP-level data is acked.
      
      There is a caveat about keeping the orphaned subflows around:
      the TCP stack can asynchronously call tcp_cleanup_ulp() on them via
      tcp_close(). To prevent accessing freed memory on later MPTCP-level
      operations, the msk acquires a reference to each subflow
      socket and prevents subflow_ulp_release() from releasing the
      subflow context before __mptcp_destroy_sock().
      
      The additional subflow references are released by __mptcp_done(),
      and the async ULP release is detected by checking the ULP ops. If that
      field has already been cleared by the ULP release path, the
      dangling context is freed directly by __mptcp_done().
      Co-developed-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
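      The reference and ULP-ops handling in the last two paragraphs of this commit can be sketched as follows. The sketch assumes a single subflow, a boolean in place of the socket refcount, and a pointer that is non-NULL while the ULP is attached. struct subflow_model, model_ulp_release() and msk_done_subflow() are hypothetical. The point is only that exactly one path frees the context, depending on which release runs last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subflow context; its contents do not matter here. */
struct subflow_ctx_model {
	int placeholder;
};

/* Hypothetical subflow socket, reduced to what the close path looks at. */
struct subflow_model {
	bool msk_ref;			/* msk holds its extra reference */
	const void *ulp_ops;		/* non-NULL while the ULP is attached */
	struct subflow_ctx_model *ctx;	/* context shared by ULP and msk */
};

/* Any unique address works as the "ULP attached" marker. */
static const int model_ulp_ops = 1;

/* close(): orphan the subflow but keep it reachable from the msk. */
static void msk_hold_subflow(struct subflow_model *sf)
{
	sf->msk_ref = true;
}

/*
 * Async ULP release (as tcp_close() -> tcp_cleanup_ulp() could trigger):
 * clear the ops, and free the context only if the msk no longer needs it.
 */
static void model_ulp_release(struct subflow_model *sf)
{
	sf->ulp_ops = NULL;
	if (!sf->msk_ref) {
		free(sf->ctx);
		sf->ctx = NULL;
	}
}

/*
 * Final MPTCP-level cleanup: drop the msk reference and, if the ULP
 * release already ran (ops cleared), free the dangling context here.
 * Returns true when this path freed the context.
 */
static bool msk_done_subflow(struct subflow_model *sf)
{
	bool freed = false;

	sf->msk_ref = false;
	if (!sf->ulp_ops && sf->ctx) {
		free(sf->ctx);
		sf->ctx = NULL;
		freed = true;
	}
	return freed;
}
```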
    • mptcp: introduce MPTCP snd_nxt · eaa2ffab
      Paolo Abeni authored
      Track the next MPTCP sequence number used on xmit,
      currently always equal to write_seq.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: add accounting for pending data · f0e6a4cf
      Paolo Abeni authored
      Preparation patch to track the data pending in the msk
      write queue. No functional change is introduced here.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: reduce the arguments of mptcp_sendmsg_frag · caf971df
      Paolo Abeni authored
      The current argument list is pretty long and quite unreadable;
      move many of the arguments into a dedicated struct. Later patches
      will add more fields to that struct.
      
      Additionally drop the 'timeo' argument, now unused.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: introduce mptcp_schedule_work · ba8f48f7
      Paolo Abeni authored
      Remove some code duplication and allow preventing
      rescheduling on close.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
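      The benefit of a single scheduling helper can be sketched in user-space C. The sketch assumes the helper refuses to queue work once the socket is closed and takes a reference on behalf of the queued work when it does queue it. model_schedule_work() and struct work_model are hypothetical names. The pending flag stands in for the work queue's own "already queued" check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of socket states relevant to the helper. */
enum model_state { MODEL_ESTABLISHED, MODEL_CLOSE };

/* Hypothetical msk reduced to state, work-pending flag and refcount. */
struct work_model {
	enum model_state state;
	bool work_pending;
	int refcnt;
};

/*
 * One helper instead of open-coded "schedule + hold" sequences.
 * Returns true only when new work was actually queued.
 */
static bool model_schedule_work(struct work_model *m)
{
	if (m->state == MODEL_CLOSE)
		return false;	/* never reschedule once closed */
	if (m->work_pending)
		return false;	/* already queued, nothing to do */
	m->work_pending = true;
	m->refcnt++;		/* reference owned by the queued work */
	return true;
}
```

Funneling every caller through one function gives the close path a single place to stop rescheduling.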
    • tcp: factor out __tcp_close() helper · 77c3c956
      Paolo Abeni authored
      Unlocked version of the protocol-level close; it will be used by
      MPTCP to decouple orphaning and subflow-level close.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: use tcp_build_frag() · e2223995
      Paolo Abeni authored
      mptcp_push_pending() is called even on an orphaned
      msk (and orphaned subflows), if there is outstanding
      data at close() time.
      
      To cope with the above, MPTCP needs to explicitly handle
      allocation failures on xmit. The newly introduced
      tcp_build_frag() allows that; just plug it in.
      
      We can additionally drop a couple of sanity checks
      that duplicate checks in the TCP code.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: factor out tcp_build_frag() · b796d04b
      Paolo Abeni authored
      Will be needed by the next patch, as MPTCP needs to directly
      handle the error/memory-allocation-needed path.
      
      No functional changes intended.
      
      Additionally, let MPTCP code access the tcp_remove_empty_skb()
      helper.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'fix-inefficiences-and-rename-nla_strlcpy' · c0a645a7
      Jakub Kicinski authored
      Francis Laniel says:
      
      ====================
      Fix inefficiencies and rename nla_strlcpy
      
      This patch set addresses the first three issues listed in:
      https://github.com/KSPP/linux/issues/110
      
      To sum up, the patch contributions are the following:
      1. The first patch fixes an inefficiency where some bytes in dst were written
      twice, once with 0 and once with the src content.
      2. The second one modifies nla_strlcpy to return the same value as strscpy,
      i.e. the number of bytes written, or -E2BIG if src was truncated.
      It also modifies code that calls nla_strlcpy and checks its return value.
      3. The third renames nla_strlcpy to nla_strscpy.
      
      Unfortunately, I did not find out how to create struct nlattr objects, so I tested
      my modifications on simple char* buffers and with GDB, using tc to get to
      tcf_proto_check_kind.
      ====================
      
      Link: https://lore.kernel.org/r/20201115170806.3578-1-laniel_francis@privacyrequired.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
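      The return-value contract described in the cover letter above can be illustrated on plain char buffers, which is also how the author tested. This is a sketch under assumptions, not the kernel's nla_strscpy(). The attribute payload is passed as (src, src_len), a single trailing NUL in the payload is ignored, and model_strscpy() is a hypothetical name. Copying first and then zeroing only the remaining tail means no byte of dst is written twice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical strscpy-style copy of an attribute payload (src, src_len)
 * into dst[dstsize]. Returns the number of bytes copied (excluding the
 * terminating NUL), or -E2BIG if the payload had to be truncated.
 * dst is always NUL-terminated when dstsize > 0.
 */
static ssize_t model_strscpy(char *dst, const char *src, size_t src_len,
			     size_t dstsize)
{
	size_t len;
	ssize_t ret;

	if (dstsize == 0)
		return -E2BIG;

	/* A payload may carry its own trailing NUL; do not count it. */
	if (src_len > 0 && src[src_len - 1] == '\0')
		src_len--;

	if (src_len >= dstsize) {
		len = dstsize - 1;	/* leave room for the terminator */
		ret = -E2BIG;
	} else {
		len = src_len;
		ret = (ssize_t)len;
	}

	memcpy(dst, src, len);
	/* Zero only the tail, so no byte of dst is written twice. */
	memset(dst + len, 0, dstsize - len);
	return ret;
}
```

A caller can then treat a negative return as "truncated" and a non-negative one as the copied length, the same way strscpy() results are checked.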