    Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux · 9961a785
    Linus Torvalds authored
    Pull io_uring updates from Jens Axboe:
    
     - Greatly improve send zerocopy performance, by enabling coalescing of
       sent buffers.
    
       MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
       io_uring side did not. In local testing, the crossover point for send
       zerocopy being faster is now around 3000 byte packets, and it
       performs better than the sync syscall variants as well.
    
       This feature relies on a shared branch with net-next, which was
       pulled into both branches.
    
     - Unification of how async preparation is done across opcodes.
    
       Previously, opcodes that required extra memory for async retry would
       allocate that as needed, using on-stack state until that was the
       case. If async retry was needed, the on-stack state was adjusted
       appropriately for a retry and then copied to the allocated memory.
    
       This led to some fragile and ugly code, particularly for read/write
       handling, and made storage retries more difficult than they needed to
       be. Allocate the memory upfront, as it's cheap from our pools, and
       use that state consistently both initially and also from the retry
       side.
    
     - Move away from using remap_pfn_range() for mapping the rings.
    
       This is really not the right interface to use and can cause lifetime
       issues or leaks. Additionally, it means the ring sq/cq arrays need to
       be physically contiguous, which can cause problems in production with
       larger rings when services are restarted, as memory can be very
       fragmented at that point.
    
       Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
       the same treatment to mapped ring provided buffers. This also helps
       unify the code we have dealing with allocating and mapping memory.
    
       Hard to see in the diffstat as we're adding a few features as well,
       but this kills about 400 lines of code from the codebase.
    
     - Add support for bundles for send/recv.
    
       When used with provided buffers, bundles support sending or receiving
       more than one buffer at a time, improving the efficiency by only
       needing to call into the networking stack once for multiple sends or
       receives.
    
     - Tweaks for our accept operations, supporting both a DONTWAIT flag for
       skipping poll arm and retry if we can, and a POLLFIRST flag that the
       application can use to skip the initial accept attempt and rely
       purely on poll for triggering the operation. Both of these have
       identical flags on the receive side already.
    
     - Make the task_work ctx locking unconditional.
    
       We had various code paths here that would do a mix of lock/trylock
       and set the task_work state to whether or not it was locked. All of
       that goes away, we lock it unconditionally and get rid of the state
       flag indicating whether it's locked or not.
    
       The state struct still exists as an empty type, and can go away in
       the future.
    
     - Add support for specifying NOP completion values, allowing it to be
       used for error handling testing.
    
     - Use set/test bit for io-wq worker flags. Not strictly needed, but
       also doesn't hurt and helps silence a KCSAN warning.
    
     - Cleanups for io-wq locking and work assignments, closing a tiny race
       where cancelations would not be able to find the work item reliably.
    
     - Misc fixes, cleanups, and improvements
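
The async preparation item above can be pictured with a small userspace sketch. It is not the kernel code: `struct async_state`, `struct state_cache`, and the `req_*` helpers are hypothetical names invented for illustration, and a byte-budgeted `memcpy()` stands in for an operation that may need several attempts. The point it tries to show is that the state is taken from a cache at prep time, so the first attempt and every retry work on the same memory and nothing is copied off the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request async state; not the kernel's structures. */
struct async_state {
    size_t done;               /* bytes transferred so far */
    struct async_state *next;  /* free-list link while the state is cached */
};

/* Hypothetical cache of recycled states, standing in for the kernel's pools. */
struct state_cache {
    struct async_state *free_list;
};

struct request {
    const char *src;
    char *dst;
    size_t len;
    struct async_state *state; /* allocated at prep time, used by every attempt */
};

/* Take a zeroed state from the cache, falling back to malloc() when empty. */
static struct async_state *cache_get(struct state_cache *c)
{
    struct async_state *s = c->free_list;

    if (s)
        c->free_list = s->next;
    else
        s = malloc(sizeof(*s));
    if (s)
        memset(s, 0, sizeof(*s));
    return s;
}

/* Return a state to the cache so the next request can reuse it. */
static void cache_put(struct state_cache *c, struct async_state *s)
{
    s->next = c->free_list;
    c->free_list = s;
}

/* Prep: allocate the state upfront, so there is no on-stack copy to migrate. */
static int req_prep(struct state_cache *c, struct request *r,
                    const char *src, char *dst, size_t len)
{
    r->src = src;
    r->dst = dst;
    r->len = len;
    r->state = cache_get(c);
    return r->state ? 0 : -1;
}

/*
 * Issue: transfer at most `budget` bytes and report whether the request is
 * complete. The same function serves the first attempt and every retry,
 * because progress lives in r->state rather than in the caller's frame.
 */
static bool req_issue(struct request *r, size_t budget)
{
    struct async_state *s = r->state;
    size_t left = r->len - s->done;
    size_t n = left < budget ? left : budget;

    memcpy(r->dst + s->done, r->src + s->done, n);
    s->done += n;
    return s->done == r->len;
}

/* Completion: recycle the state instead of freeing it. */
static void req_complete(struct state_cache *c, struct request *r)
{
    cache_put(c, r->state);
    r->state = NULL;
}
```

A caller would run `req_prep()` once, call `req_issue()` until it returns true, and then call `req_complete()`. Recycling through the cache is what keeps the upfront allocation cheap in this sketch.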
    
    * tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
      io_uring: support to inject result for NOP
      io_uring: fail NOP if non-zero op flags is passed in
      io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
      io_uring/net: add IORING_ACCEPT_DONTWAIT flag
      io_uring/filetable: don't unnecessarily clear/reset bitmap
      io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
      io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
      io_uring: Require zeroed sqe->len on provided-buffers send
      io_uring/notif: disable LAZY_WAKE for linked notifs
      io_uring/net: fix sendzc lazy wake polling
      io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
      io_uring/rw: reinstate thread check for retries
      io_uring/notif: implement notification stacking
      io_uring/notif: simplify io_notif_flush()
      net: add callback for setting a ubuf_info to skb
      net: extend ubuf_info callback to ops structure
      io_uring/net: support bundles for recv
      io_uring/net: support bundles for send
      io_uring/kbuf: add helpers for getting/peeking multiple buffers
      io_uring/net: add provided buffer support for IORING_OP_SEND
      ...
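
For the io-wq worker flags change listed above, here is a rough userspace analogue of set-bit/test-bit style flag handling. The kernel's `set_bit()` and `test_bit()` are kernel-internal helpers. This sketch substitutes C11 atomics, and the `WORKER_F_*` names, `struct worker`, and the `worker_*_bit()` helpers are hypothetical. What it illustrates is addressing flags by bit number and updating them with atomic read-modify-write operations, so concurrent updates to different bits cannot overwrite each other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers (positions, not masks). */
enum {
    WORKER_F_UP = 0,
    WORKER_F_RUNNING = 1,
    WORKER_F_FREE = 2,
};

struct worker {
    atomic_ulong flags;
};

/* Atomically set one bit without disturbing the others. */
static void worker_set_bit(struct worker *w, unsigned int bit)
{
    atomic_fetch_or(&w->flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
static void worker_clear_bit(struct worker *w, unsigned int bit)
{
    atomic_fetch_and(&w->flags, ~(1UL << bit));
}

/* Atomically read the flags word and report whether the bit is set. */
static bool worker_test_bit(struct worker *w, unsigned int bit)
{
    return (atomic_load(&w->flags) >> bit) & 1UL;
}
```

With plain `flags |= mask` and `flags & mask` on a shared word, a race detector has grounds to report concurrent unsynchronized access. Routing every access through atomic helpers like these removes that class of report in the sketch.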