1. 12 Sep, 2019 1 commit
      io_uring: extend async work merging · 6d5d5ac5
      Jens Axboe authored
      We currently merge async work items if we see a strict sequential hit.
      This helps avoid unnecessary workqueue switches when we don't need
      them. We can extend this merging to cover cases where it's not a strict
      sequential hit, but the IO still fits within the same page. If an
      application is doing multiple requests within the same page, we don't
      want separate workers waiting on the same page to complete IO. It's much
      faster to let the first worker bring in the page, then operate on that
      page from the same worker to complete the next request(s).
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
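      To make the rule concrete, here is a hedged sketch of the same-page
      test in plain C. PAGE_SHIFT, the struct, and the helper name are
      illustrative assumptions, not the actual io_uring internals:

          #include <stdbool.h>
          #include <stddef.h>
          #include <sys/types.h>

          #define PAGE_SHIFT 12   /* assume 4K pages for illustration */

          struct async_req {
              off_t  off;   /* file offset of the IO */
              size_t len;   /* length of the IO */
          };

          static bool can_merge(const struct async_req *prev,
                                const struct async_req *next)
          {
              /* Strict sequential hit: next starts exactly where prev ends. */
              if (next->off == prev->off + (off_t)prev->len)
                  return true;
              /* Extended rule: both IOs start within the same page. */
              return (next->off >> PAGE_SHIFT) == (prev->off >> PAGE_SHIFT);
          }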
  2. 10 Sep, 2019 3 commits
      io_uring: limit parallelism of buffered writes · 54a91f3b
      Jens Axboe authored
      All the popular filesystems need to grab the inode lock for buffered
      writes. With io_uring punting buffered writes to async context, we
      observe a lot of contention, with all workers hammering this mutex.
      
      For buffered writes, we generally don't need a lot of parallelism on
      the submission side, as the flushing will take care of that for us.
      Hence we don't need a deep queue on the write side, as long as we
      can safely punt from the original submission context.
      
      Add a workqueue with a limit of 2 that we can use for buffered writes.
      This greatly improves the performance and efficiency of higher queue
      depth buffered async writes with io_uring.
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
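      The shape of the change, roughly: a second workqueue whose max_active
      is capped at 2, used only for buffered writes. A hedged kernel-style
      sketch; the variable and function names here are assumptions:

          #include <linux/workqueue.h>
          #include <linux/errno.h>

          static struct workqueue_struct *io_write_wq;

          static int io_init_write_wq(void)
          {
              /* max_active = 2 bounds the number of concurrent
               * buffered-write workers, so they stop piling up on
               * the inode mutex. */
              io_write_wq = alloc_workqueue("io_ring-write-wq",
                                            WQ_UNBOUND | WQ_FREEZABLE, 2);
              return io_write_wq ? 0 : -ENOMEM;
          }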
      io_uring: add io_queue_async_work() helper · 18d9be1a
      Jens Axboe authored
      Add a helper for queueing a request for async execution, in preparation
      for optimizing it.
      
      No functional change in this patch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
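      The helper is essentially a thin wrapper around queue_work(), giving
      the follow-up patch a single place to change. A sketch, with the
      field names inferred from the io_uring code of that era rather than
      copied from the patch:

          static inline void io_queue_async_work(struct io_ring_ctx *ctx,
                                                 struct io_kiocb *req)
          {
              queue_work(ctx->sqo_wq, &req->work);
          }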
      io_uring: optimize submit_and_wait API · c5766668
      Jens Axboe authored
      For some applications that end up using a submit-and-wait type of
      approach for certain batches of IO, we can make that a bit more
      efficient by allowing the application to block for the last IO
      submission. This prevents an async punt when we don't need one, as the
      application will be blocking for the completion event(s) anyway.
      
      Typical use cases are using the liburing
      io_uring_submit_and_wait() API, or just using io_uring_enter()
      doing both submissions and completions. As a specific example,
      RocksDB doing MultiGet() is sped up quite a bit with this
      change.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
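      From userspace, the pattern this optimizes looks like the following
      liburing sketch (error handling trimmed for brevity):

          #include <liburing.h>

          /* Submit everything queued on the ring and block until at least
           * nr_wait completions arrive, all in one io_uring_enter() call.
           * With this change, the kernel can let the final submission
           * block inline instead of punting it to async context. */
          static int submit_batch_and_wait(struct io_uring *ring,
                                           unsigned nr_wait)
          {
              return io_uring_submit_and_wait(ring, nr_wait);
          }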
  3. 09 Sep, 2019 2 commits
      io_uring: add support for link with drain · 4fe2c963
      Jackie Liu authored
      To support link combined with drain, we need to handle two parts.
      
      Consider the following sequence of sqes:
      
          0     1     2     3     4     5     6
       +-----+-----+-----+-----+-----+-----+-----+
       |  N  |  L  |  L  | L+D |  N  |  N  |  N  |
       +-----+-----+-----+-----+-----+-----+-----+
      
      First, we need to ensure that the IO before the link completes. An
      easy way is to set the drain flag on the head of the link list, so
      that all subsequent IO is inserted into the defer_list.
      
      	+-----+
          (0) |  N  |
      	+-----+
                 |          (2)         (3)         (4)
      	+-----+     +-----+     +-----+     +-----+
          (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
      	+-----+     +-----+     +-----+     +-----+
                 |
      	+-----+
          (5) |  N  |
      	+-----+
                 |
      	+-----+
          (6) |  N  |
      	+-----+
      
      Second, we need to ensure that the IO following the chain is not
      completed first. An easy way is to create a mirror (shadow) of the
      drain IO and insert it into the defer_list; as long as the drain IO
      has not been processed, the subsequent IO in the defer_list will not
      be actively processed.
      
      	+-----+
          (0) |  N  |
      	+-----+
                 |          (2)         (3)         (4)
      	+-----+     +-----+     +-----+     +-----+
          (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
      	+-----+     +-----+     +-----+     +-----+
                 |
      	+-----+
         ('3) |  D  |   <== This is a shadow of (3)
      	+-----+
                 |
      	+-----+
          (5) |  N  |
      	+-----+
                 |
      	+-----+
          (6) |  N  |
      	+-----+
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
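      From the submission side, the chain in the second diagram could be
      built roughly as in this liburing sketch, using nops as stand-ins
      for real operations (NULL checks on io_uring_get_sqe() omitted):

          #include <liburing.h>

          static void queue_linked_drain(struct io_uring *ring)
          {
              struct io_uring_sqe *sqe;

              sqe = io_uring_get_sqe(ring);    /* (1) L+D: chain head */
              io_uring_prep_nop(sqe);
              sqe->flags = IOSQE_IO_LINK | IOSQE_IO_DRAIN;

              sqe = io_uring_get_sqe(ring);    /* (2) L */
              io_uring_prep_nop(sqe);
              sqe->flags = IOSQE_IO_LINK;

              sqe = io_uring_get_sqe(ring);    /* (3) L+D */
              io_uring_prep_nop(sqe);
              sqe->flags = IOSQE_IO_LINK | IOSQE_IO_DRAIN;

              sqe = io_uring_get_sqe(ring);    /* (4) N: ends the chain */
              io_uring_prep_nop(sqe);
          }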
      io_uring: fix wrong sequence setting logic · 8776f3fa
      Jackie Liu authored
      The sqo_thread fetches sqring entries in batches, which causes
      ctx->cached_sq_head to be advanced in batches. If one of these
      sqes has the DRAIN flag set, it will never get a chance to be
      processed, and in the end the sqo_thread will not exit.
      
      Fixes: de0617e4 ("io_uring: add support for marking commands as draining")
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
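      The idea of the fix, in a hedged and simplified sketch: stamp each
      request with the ring head at the moment its sqe is fetched, rather
      than deriving the sequence later from ctx->cached_sq_head, which the
      sqo_thread may already have advanced past a whole batch. Struct and
      field names below are simplified, not the exact kernel code:

          static bool io_get_sqring(struct io_ring_ctx *ctx,
                                    struct sqe_submit *s)
          {
              unsigned head = ctx->cached_sq_head;

              /* nothing new queued by userspace */
              if (head == smp_load_acquire(&ctx->sq_tail))
                  return false;

              s->sqe = &ctx->sq_sqes[head & ctx->sq_mask];
              s->sequence = head;   /* per-sqe snapshot, not batch-wide */
              ctx->cached_sq_head = head + 1;
              return true;
          }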
  4. 06 Sep, 2019 1 commit
      io_uring: expose single mmap capability · ac90f249
      Jens Axboe authored
      After commit 75b28aff, we can get by with just a single mmap to
      map both the sq and cq rings. However, userspace doesn't know that.
      
      Add a features variable to io_uring_params, and notify userspace
      that the kernel has this ability. This can then be used in liburing
      (or in applications directly) to avoid the second mmap.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
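      From userspace the feature bit can be checked like this (hedged
      sketch; io_uring_setup() stands in for a raw syscall(2) wrapper,
      and error handling is trimmed):

          #include <string.h>
          #include <linux/io_uring.h>

          /* assumed thin wrapper over syscall(__NR_io_uring_setup, ...) */
          extern int io_uring_setup(unsigned entries,
                                    struct io_uring_params *p);

          int setup_ring(unsigned entries)
          {
              struct io_uring_params p;
              int fd;

              memset(&p, 0, sizeof(p));
              fd = io_uring_setup(entries, &p);
              if (fd < 0)
                  return fd;

              if (p.features & IORING_FEAT_SINGLE_MMAP) {
                  /* One mmap at IORING_OFF_SQ_RING, sized to cover the
                   * larger of the sq and cq ring regions, maps both. */
              } else {
                  /* Fall back to separate sq and cq ring mmaps. */
              }
              return fd;
          }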
  5. 27 Aug, 2019 2 commits
  6. 25 Aug, 2019 22 commits
  7. 24 Aug, 2019 8 commits
  8. 23 Aug, 2019 1 commit
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 9140d8bd
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "No beating around the bush: this is a monster pull request for an -rc5
        kernel. Intel hit me with a series of fixes for TID processing.
        Mellanox hit me with a series for their UMR memory support.
      
        And we had one fix for siw that fixes the 32bit build warnings, and
        because of the number of casts that had to be changed to properly
        silence the warnings, that one patch alone is a full 40% of the LOC of
        this entire pull request. Given that this is the initial release
        kernel for siw, I'm trying to fix anything in it that we can, so that
        adds to the impetus to take fixes for it like this one.
      
        I had to do a rebase early in the week. Jason had thought he put a
        patch on the rc queue that he needed to be there so he could base some
        work off of it, and it had actually not been placed there. So he asked
        me (on Tuesday) to fix that up before pushing my wip branch to the
        official rc branch. I did, and that's why the early patches look like
        they were all committed at the same time on Tuesday. That bunch had
        been in my queue prior.
      
        The various patches all pass my test for being legitimate fixes and
        not attempts to slide new features or development into a late rc.
        Well, they were all fixes, with the exception of a couple of cleanup
        patches people wrote to make their own fixes better (like a cleanup
        patch that moves UMR checking into a function so that the remaining
        UMR fix patches can reference it), so I left those in place too.
      
        My apologies for the LOC count and the number of patches here, it's
        just how the cards fell this cycle.
      
        Summary:
      
         - Fix siw buffer mapping issue
      
         - Fix siw 32/64 casting issues
      
         - Fix a KASAN access issue in bnxt_re
      
         - Fix several memory leaks (hfi1, mlx4)
      
         - Fix a NULL deref in cma_cleanup
      
         - Fixes for UMR memory support in mlx5 (4 patch series)
      
         - Fix namespace check for restrack
      
         - Fixes for counter support
      
         - Fixes for hfi1 TID processing (5 patch series)
      
         - Fix potential NULL deref in siw
      
         - Fix memory page calculations in mlx5"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (21 commits)
        RDMA/siw: Fix 64/32bit pointer inconsistency
        RDMA/siw: Fix SGL mapping issues
        RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
        infiniband: hfi1: fix memory leaks
        infiniband: hfi1: fix a memory leak bug
        IB/mlx4: Fix memory leaks
        RDMA/cma: fix null-ptr-deref Read in cma_cleanup
        IB/mlx5: Block MR WR if UMR is not possible
        IB/mlx5: Fix MR re-registration flow to use UMR properly
        IB/mlx5: Report and handle ODP support properly
        IB/mlx5: Consolidate use_umr checks into single function
        RDMA/restrack: Rewrite PID namespace check to be reliable
        RDMA/counters: Properly implement PID checks
        IB/core: Fix NULL pointer dereference when bind QP to counter
        IB/hfi1: Drop stale TID RDMA packets that cause TIDErr
        IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet
        IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet
        IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet
        IB/hfi1: Drop stale TID RDMA packets
        RDMA/siw: Fix potential NULL de-ref
        ...