1. 08 Nov, 2022 21 commits
    • rxrpc: Don't use a ring buffer for call Tx queue · a4ea4c47
      David Howells authored
      Change the way the Tx queueing works to make the following ends easier to
      achieve:
      
       (1) The filling of packets, the encryption of packets and the transmission
           of packets can be handled in parallel by separate threads, rather than
           rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
           packet before moving on to the next one.
      
       (2) Get rid of the fixed-size ring which sets a hard limit on the number
           of packets that can be retained in the ring.  This allows the number
           of packets to increase without having to allocate a very large ring or
           having variable-sized rings.
      
           [Note: the downside of this is that it's then less efficient to locate
           a packet for retransmission as we then have to step through a list and
           examine each buffer in the list.]
      
       (3) Allow the filler/encrypter to run ahead of the transmission window.
      
       (4) Make it easier to do zero copy UDP from the packet buffers.
      
       (5) Make it easier to do zero copy from userspace to the packet buffers -
           and thence to UDP (only for unauthenticated connections).
      
      To that end, the following changes are made:
      
       (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
           to be transmitted in.  This allows them to be placed on multiple
           queues simultaneously.  An sk_buff isn't really necessary as it's
           never passed on to lower-level networking code.
      
       (2) Keep the transmissible packets in a linked list on the call struct
           rather than in a ring.  As a consequence, the annotation buffer isn't
           used either; rather a flag is set on the packet to indicate ackedness.
      
       (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
           transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
           that all packets up to and including the last got hard acked.
      
       (4) Wire headers are now stored in the txbuf rather than being concocted
           on the stack and they're stored immediately before the data, thereby
           allowing zerocopy of a single span.
      
       (5) Don't bother with instant-resend on transmission failure; rather,
           leave it for a timer or an ACK packet to trigger.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
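      The list-based Tx queue described above can be sketched in userspace C.
      This is a minimal illustration, not the kernel code: demo_txbuf,
      demo_call and the demo_*() helpers are hypothetical names, the list is
      singly linked for brevity, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Tx buffer; not the kernel's rxrpc_txbuf. */
struct demo_txbuf {
    struct demo_txbuf *next;  /* link in the call's Tx list */
    uint32_t seq;             /* DATA sequence number */
    bool acked;               /* per-packet flag instead of an annotation buffer */
};

struct demo_call {
    struct demo_txbuf *head;  /* oldest packet not yet hard-ACK'd */
    struct demo_txbuf *tail;  /* most recently queued packet */
    bool tx_last;             /* the last packet has been queued */
    bool tx_all_acked;        /* everything up to the last was hard-ACK'd */
};

/* Append a packet; the list has no fixed capacity. Returns 0, or -1 on OOM. */
static int demo_queue(struct demo_call *call, uint32_t seq, bool last)
{
    struct demo_txbuf *txb = calloc(1, sizeof(*txb));

    assert(call != NULL);
    if (!txb)
        return -1;
    txb->seq = seq;
    if (call->tail)
        call->tail->next = txb;
    else
        call->head = txb;
    call->tail = txb;
    if (last)
        call->tx_last = true;
    return 0;
}

/* Locate a packet for retransmission: a linear walk, the noted downside. */
static struct demo_txbuf *demo_find(struct demo_call *call, uint32_t seq)
{
    struct demo_txbuf *txb;

    for (txb = call->head; txb; txb = txb->next)
        if (txb->seq == seq)
            return txb;
    return NULL;
}

/* Hard-ACK: release every packet up to and including hard_ack. */
static void demo_hard_ack(struct demo_call *call, uint32_t hard_ack)
{
    while (call->head && call->head->seq <= hard_ack) {
        struct demo_txbuf *txb = call->head;

        call->head = txb->next;
        free(txb);
    }
    if (!call->head) {
        call->tail = NULL;
        if (call->tx_last)
            call->tx_all_acked = true;
    }
}
```

      Because nothing bounds the list, a filler could keep calling
      demo_queue() ahead of the transmission window, at the price of the
      O(n) walk in demo_find().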
    • rxrpc: Get rid of the Rx ring · 5d7edbc9
      David Howells authored
      Get rid of the Rx ring and replace it with a pair of queues instead.  One
      queue gets the packets that are in-sequence and are ready for processing by
      recvmsg(); the other queue gets the out-of-sequence packets for addition to
      the first queue as the holes get filled.
      
      The annotation ring is removed and replaced with a SACK table.  The SACK
      table has the bits set that correspond exactly to the sequence number of
      the packet being acked.  The SACK table is copied when an ACK packet is
      being assembled and rotated so that the first ACK is in byte 0.
      
      Flow control handling is altered so that packets that are moved to the
      in-sequence queue are hard-ACK'd even before they're consumed - and then
      the Rx window size in the ACK packet (rsize) is shrunk down to compensate
      (even going to 0 if the window is full).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
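      The copy-and-rotate step for the SACK table might look roughly like the
      following. This is a sketch under assumptions: one byte per sequence
      number, a table of 256 entries (DEMO_SACK_SIZE) and the demo_* names are
      all chosen for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SACK_SIZE 256  /* assumed size: one byte per sequence number */

struct demo_rx {
    uint8_t sack_table[DEMO_SACK_SIZE];  /* indexed by seq % DEMO_SACK_SIZE */
    uint32_t window_base;                /* first sequence number not yet hard-ACK'd */
};

/* Record an out-of-sequence packet in the slot matching its sequence number. */
static void demo_sack_set(struct demo_rx *rx, uint32_t seq)
{
    assert(rx != NULL);
    rx->sack_table[seq % DEMO_SACK_SIZE] = 1;
}

/*
 * Copy the table into an ACK body, rotated so that the entry for
 * window_base ends up in out[0].
 */
static void demo_sack_fill(const struct demo_rx *rx, uint8_t out[DEMO_SACK_SIZE])
{
    unsigned int ix = rx->window_base % DEMO_SACK_SIZE;
    unsigned int first = DEMO_SACK_SIZE - ix;

    memcpy(out, rx->sack_table + ix, first);  /* window_base .. end of table */
    memcpy(out + first, rx->sack_table, ix);  /* wrapped-around remainder */
}
```

      Indexing by sequence number means a slot never has to be moved when the
      window advances; only the copy placed in the ACK is rotated.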
    • rxrpc: Clone received jumbo subpackets and queue separately · d4d02d8b
      David Howells authored
      Split up received jumbo packets into separate skbuffs by cloning the
      original skbuff for each subpacket and setting the offset and length of the
      data in that subpacket in the skbuff's private data.  The subpackets are
      then placed on the recvmsg queue separately.  The security class then gets
      to revise the offset and length to remove its metadata.
      
      If we fail to clone a packet, we just drop it and let the peer resend it.
      The original packet gets used for the final subpacket.
      
      This should make it easier to handle parallel decryption of the subpackets.
      It also simplifies the handling of lost or misordered packets in the
      queuing/buffering loop as the possibility of overlapping jumbo packets no
      longer needs to be considered.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
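      To illustrate the offset/length bookkeeping, the sketch below walks a
      jumbo payload and records where each subpacket's data lies. The sizes
      (1412 data bytes followed by a 4-byte jumbo header) and the 0x20 flag
      value are assumptions made for this example, and demo_split_jumbo() is a
      hypothetical helper; the real code clones an skbuff per subpacket rather
      than filling an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_JUMBO_DATALEN 1412  /* assumed data size of a non-final subpacket */
#define DEMO_JUMBO_HDRLEN  4     /* assumed size of the header that follows it */
#define DEMO_JUMBO_FLAG    0x20  /* assumed "another subpacket follows" flag */

struct demo_subpkt {
    size_t offset;  /* where the subpacket's data starts in the payload */
    size_t len;     /* how much data it carries */
};

/*
 * Describe each subpacket in a payload of 'len' bytes.  'flags' is the flags
 * byte applying to the first subpacket; each jumbo header supplies the flags
 * for the next.  Returns the number of entries written, or -1 if the payload
 * is truncated or 'out' is too small.
 */
static int demo_split_jumbo(const uint8_t *data, size_t len, uint8_t flags,
                            struct demo_subpkt *out, int max)
{
    size_t offset = 0;
    int n = 0;

    assert(data != NULL && out != NULL);
    while (flags & DEMO_JUMBO_FLAG) {
        if (len - offset < DEMO_JUMBO_DATALEN + DEMO_JUMBO_HDRLEN || n == max)
            return -1;
        out[n].offset = offset;
        out[n].len = DEMO_JUMBO_DATALEN;
        n++;
        flags = data[offset + DEMO_JUMBO_DATALEN];  /* first byte of jumbo header */
        offset += DEMO_JUMBO_DATALEN + DEMO_JUMBO_HDRLEN;
    }
    if (n == max)
        return -1;
    out[n].offset = offset;  /* the final subpacket takes whatever remains */
    out[n].len = len - offset;
    return n + 1;
}
```

      Once every subpacket has its own descriptor, later stages can treat it
      like an ordinary DATA packet, which is what removes the overlapping-jumbo
      case from the queuing loop.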
    • rxrpc: Split the rxrpc_recvmsg tracepoint · faf92e8d
      David Howells authored
      Split the rxrpc_recvmsg tracepoint so that the tracepoints that are about
      data packet processing (and which have extra pieces of information) are
      separate from the tracepoint that shows the general flow of recvmsg().
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Clean up ACK handling · 530403d9
      David Howells authored
      Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
      is split out, the function is only really needed for deferred DELAY ACKs.
      All other ACKs, bar the terminal IDLE ACK, are sent immediately.  The
      deferred IDLE ACK submission can be handled by conversion of a DELAY ACK
      into an IDLE ACK if there's nothing to be SACK'd.
      
      Also, because there's a delay between an ACK being generated and being
      transmitted, it's possible that other ACKs of the same type will be
      generated during that interval.  Apart from the ACK time and the serial
      number responded to, most of the ACK body, including window and SACK
      parameters, is not filled out till the point of transmission - so we can
      avoid generating a new ACK if there's one pending that will cover the SACK
      data we need to convey.
      
      Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
      already pending.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
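      The two rules above (skip a proposal when one is already pending, and
      turn a DELAY ACK into an IDLE ACK when there is nothing to SACK) can be
      modelled in a few lines. The types and function names here are
      hypothetical and the timer is left out; this only shows the decision
      logic, not rxrpc_propose_ACK() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_ack { DEMO_ACK_NONE, DEMO_ACK_DELAY, DEMO_ACK_IDLE };

struct demo_call_acks {
    bool delay_pending;     /* a deferred ACK has already been proposed */
    unsigned int nr_sacks;  /* packets currently held out of sequence */
};

/* Propose a deferred ACK.  Returns true only if a new proposal was recorded. */
static bool demo_propose_delay_ack(struct demo_call_acks *c)
{
    assert(c != NULL);
    if (c->delay_pending)
        return false;  /* the pending ACK is filled in at transmission time */
    c->delay_pending = true;
    return true;
}

/* Called when the deferral expires: decide what actually goes on the wire. */
static enum demo_ack demo_send_deferred_ack(struct demo_call_acks *c)
{
    assert(c != NULL);
    if (!c->delay_pending)
        return DEMO_ACK_NONE;
    c->delay_pending = false;
    /* Nothing to SACK: the DELAY ACK is converted into an IDLE ACK. */
    return c->nr_sacks ? DEMO_ACK_DELAY : DEMO_ACK_IDLE;
}
```

      Since the window and SACK fields are only filled in when the ACK is
      sent, the single pending proposal covers any packets that arrive in the
      meantime.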
    • rxrpc: Allocate ACK records at proposal and queue for transmission · 72f0c6fb
      David Howells authored
      Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
      transmitter thread to dispatch.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Define rxrpc_txbuf struct to carry data to be transmitted · 02a19356
      David Howells authored
      Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a
      socket buffer so that it can be placed onto multiple queues at once.  This
      also allows the data buffer to be in the same allocation as the internal
      data.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
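      A rough picture of such a buffer, assuming a layout in which the wire
      header sits immediately before the data so that both can be sent as one
      span: the struct below is a hypothetical userspace model (demo_txbuf),
      not the actual rxrpc_txbuf definition, and the header fields are an
      assumed 28-byte layout used only to make the example concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed 28-byte wire header layout, for illustration only. */
struct demo_wire_header {
    uint32_t epoch;
    uint32_t cid;
    uint32_t call_number;
    uint32_t seq;
    uint32_t serial;
    uint8_t  type;
    uint8_t  flags;
    uint8_t  user_status;
    uint8_t  security_index;
    uint16_t cksum;
    uint16_t service_id;
};

struct demo_list { struct demo_list *next, *prev; };

struct demo_txbuf {
    struct demo_list call_link;    /* membership of the call's Tx queue */
    struct demo_list tx_link;      /* membership of a second queue at the same time */
    uint32_t len;                  /* amount of data stored */
    struct demo_wire_header wire;  /* header kept directly before the data */
    uint8_t data[];                /* payload, in the same allocation */
};

/* The header and data must be contiguous for a single-span transmit. */
static_assert(sizeof(struct demo_wire_header) == 28, "unexpected header padding");
static_assert(offsetof(struct demo_txbuf, data) ==
              offsetof(struct demo_txbuf, wire) + sizeof(struct demo_wire_header),
              "gap between wire header and data");

/* One allocation holds the bookkeeping, the header and the payload. */
static struct demo_txbuf *demo_txbuf_alloc(const void *payload, uint32_t len)
{
    struct demo_txbuf *txb = calloc(1, sizeof(*txb) + len);

    if (!txb)
        return NULL;
    txb->len = len;
    memcpy(txb->data, payload, len);
    return txb;
}

/* Return the single span (header + data) that would be handed to UDP. */
static const uint8_t *demo_txbuf_span(const struct demo_txbuf *txb, size_t *span_len)
{
    *span_len = sizeof(txb->wire) + txb->len;
    return (const uint8_t *)&txb->wire;
}
```

      Two separate link fields are what let one buffer sit on two queues at
      once, which a single sk_buff list linkage would not allow.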
    • rxrpc: Remove call->tx_phase · a11e6ff9
      David Howells authored
      Remove call->tx_phase as it's only ever set.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove the flags from the rxrpc_skb tracepoint · 27f699cc
      David Howells authored
      Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
      be using this for the transmission buffers and so marking which are
      transmission buffers isn't going to be necessary.
      
      Note that this also removes the rxrpc skb flag that indicates if this is a
      transmission buffer and so the count is not updated for the moment.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Remove unnecessary header inclusions · 23b237f3
      David Howells authored
      Remove a bunch of unnecessary header inclusions.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Call udp_sendmsg() directly · ed472b0c
      David Howells authored
      Call udp_sendmsg() and udpv6_sendmsg() directly rather than calling
      kernel_sendmsg() as the latter assumes we want a kvec-class iterator.
      However, zerocopy explicitly doesn't work with such an iterator.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Use the core ICMP/ICMP6 parsers · b6c66c43
      David Howells authored
      Make rxrpc_encap_rcv_err() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or
      ipv6_icmp_error() as appropriate to do the parsing rather than trying to do
      it in rxrpc.
      
      This pushes an error report onto the UDP socket's error queue and calls
      ->sk_error_report() from which point rxrpc can pick it up.
      
      It would be preferable to steal the packet directly from ip*_icmp_error()
      rather than letting it get queued, but this is probably good enough.
      
      Also note that __udp4_lib_err() calls sk_error_report() twice in some
      cases.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error() · 42fb06b3
      David Howells authored
      Change the udp encap_err_rcv signature to match ip_icmp_error() and
      ipv6_icmp_error() so that those can be used from the called function and
      export them.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      cc: netdev@vger.kernel.org
    • rxrpc: Fix ack.bufferSize to be 0 when generating an ack · 8889a711
      David Howells authored
      ack.bufferSize should be set to 0 when generating an ack.
      
      Fixes: 8d94aa38 ("rxrpc: Calls shouldn't hold socket refs")
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Record stats for why the REQUEST-ACK flag is being set · f7fa5242
      David Howells authored
      Record stats for why the REQUEST-ACK flag is being set.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Record statistics about ACK types · f2a676d1
      David Howells authored
      Record statistics about the different types of ACKs that have been
      transmitted and received and the number of ACKs that have been filled out
      and transmitted or that have been skipped.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Add stats procfile and DATA packet stats · b0154246
      David Howells authored
      Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
      what rxrpc has been doing.  Writing a blank line to the stats file will
      clear the increment-only counters.  Allocated resource counters don't get
      cleared.
      
      Add some counters to count various things about DATA packets, including the
      number created, transmitted and retransmitted and the number received, the
      number of ACK-request markings and the number of jumbo packets received.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
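      The clearing rule can be modelled as below: event counters are zeroed
      when a lone newline is written, while counts of allocated resources are
      left alone. The struct, its field names and demo_stats_write() are
      hypothetical, and accepting only "\n" is an assumption about what counts
      as a blank line.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_stats {
    /* Increment-only event counters: cleared on request. */
    unsigned long data_tx;
    unsigned long data_retx;
    unsigned long data_rx;
    unsigned long jumbo_rx;
    /* Allocated-resource counter: reflects live objects, so never cleared. */
    unsigned long nr_calls;
};

/*
 * Model of a write to the stats file.  Returns the number of bytes consumed,
 * or -EINVAL if the input is anything other than a single newline.
 */
static long demo_stats_write(struct demo_stats *s, const char *buf, size_t size)
{
    assert(s != NULL && buf != NULL);
    if (size != 1 || buf[0] != '\n')
        return -EINVAL;
    s->data_tx = 0;
    s->data_retx = 0;
    s->data_rx = 0;
    s->jumbo_rx = 0;
    return (long)size;
}
```

      Keeping the resource counter out of the reset avoids it going out of
      step with the objects that are still allocated.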
    • rxrpc: Track highest acked serial · 589a0c1e
      David Howells authored
      Keep track of the highest DATA serial number that has been acked by the
      peer for future purposes.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
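      One plausible way to keep such a high-water mark is shown here. The
      wraparound-tolerant comparison is an assumption of this sketch (serial
      numbers are taken to be 32-bit and allowed to wrap), and the names are
      hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if serial 'a' is later than 'b', tolerating 32-bit wraparound. */
static bool demo_serial_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

struct demo_call {
    uint32_t acks_highest_serial;  /* highest DATA serial acked by the peer */
};

/* Note the serial a received ACK refers to, keeping only the highest. */
static void demo_note_acked_serial(struct demo_call *call, uint32_t acked_serial)
{
    assert(call != NULL);
    if (demo_serial_after(acked_serial, call->acks_highest_serial))
        call->acks_highest_serial = acked_serial;
}
```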
    • rxrpc: Split call timer-expiration from call timer-set tracepoint · 334dfbfc
      David Howells authored
      Split the tracepoint for call timer-set to separate out the call
      timer-expiration event.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Trace setting of the request-ack flag · 4d843be5
      David Howells authored
      Add a tracepoint to log why the request-ack flag is set on an outgoing DATA
      packet, to aid debugging.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
    • net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write() · c3d96f69
      David Howells authored
      Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write().
      
      Also provide a fallback for proc_create_net_data_write().
      
      Fixes: 564def71 ("proc: Add a way to make network proc files writable")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      cc: netdev@vger.kernel.org
  2. 28 Oct, 2022 19 commits