1. 18 Sep, 2019 1 commit
    • io_uring: IORING_OP_TIMEOUT support · 5262f567
      Jens Axboe authored
      There have been a few requests for functionality similar to io_getevents()
      and epoll_wait(), where the user can specify a timeout for waiting on
      events. I deliberately did not add support for this through the system
      call initially to avoid overloading the args, but I can see that the use
      cases for this are valid.
      
      This adds support for IORING_OP_TIMEOUT. If an application wants to be
      woken while waiting for events, it can submit one of these timeout
      commands with its wait call (or before). This ensures that the
      application sleeping on the CQ ring waiting for events will get woken.
      The timeout command is passed in as a pointer to a struct timespec.
      Timeouts are relative. The timeout command also includes a way to
      auto-cancel after N events have passed.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
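The timeout semantics described in this commit (a relative timeout that also completes early once N events have been posted) can be sketched with a small userspace model. This is not the kernel implementation: `timeout_model`, `timeout_arm`, and `timeout_poll` are hypothetical names, time is an abstract nanosecond counter, and treating a count of 0 as 1 is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one armed timeout command; not kernel code. */
struct timeout_model {
    uint64_t deadline_ns; /* absolute deadline derived from the relative timeout */
    uint32_t target_cq;   /* CQ event count at which the timeout auto-completes */
};

enum timeout_state { TIMEOUT_PENDING, TIMEOUT_EXPIRED, TIMEOUT_SATISFIED };

/*
 * Arm a relative timeout at time now_ns. cq_seen is the number of completion
 * events posted so far; count is the "N events" from the commit message.
 * Assumption: count == 0 is treated as 1.
 */
static void timeout_arm(struct timeout_model *t, uint64_t now_ns,
                        uint64_t rel_ns, uint32_t cq_seen, uint32_t count)
{
    if (count == 0)
        count = 1;
    t->deadline_ns = now_ns + rel_ns;
    t->target_cq = cq_seen + count;
}

/* Report whether the timeout is still pending, expired, or satisfied by events. */
static enum timeout_state timeout_poll(const struct timeout_model *t,
                                       uint64_t now_ns, uint32_t cq_seen)
{
    /* Signed difference keeps the comparison correct if the counter wraps. */
    if ((int32_t)(cq_seen - t->target_cq) >= 0)
        return TIMEOUT_SATISFIED;
    if (now_ns >= t->deadline_ns)
        return TIMEOUT_EXPIRED;
    return TIMEOUT_PENDING;
}
```

In this model a waiter is woken in either terminal state, which matches the stated goal: the application sleeping on the CQ ring does not sleep past the timeout.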
  2. 19 Sep, 2019 1 commit
  3. 18 Sep, 2019 4 commits
  4. 14 Sep, 2019 1 commit
  5. 12 Sep, 2019 2 commits
    • io_uring: make sqpoll wakeup possible with getevents · b2a9eada
      Jens Axboe authored
      The way the logic is set up in io_uring_enter() means that you can't wake
      up the SQ poller thread while at the same time waiting (or polling) for
      completions afterwards. There's no reason for that to be the case.
      Reported-by: Lewis Baker <lbaker@fb.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
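As a rough sketch of the behaviour this commit aims for, the model below lets a single call both wake the SQ poller thread and wait for completions. It is a simplified illustration, not the kernel's io_uring_enter() code: `enter_actions` and the `ACT_*` values are hypothetical, while the flag constants are defined locally with the same values as the uapi header.

```c
/* Local copies of uapi flag values, so the sketch is self-contained. */
#define IORING_SETUP_SQPOLL    (1U << 1)
#define IORING_ENTER_GETEVENTS (1U << 0)
#define IORING_ENTER_SQ_WAKEUP (1U << 1)

/* Hypothetical action bits describing what one enter call ends up doing. */
enum {
    ACT_WAKE_POLLER   = 1 << 0, /* wake the SQ poller thread */
    ACT_SUBMIT_INLINE = 1 << 1, /* submit sqes from the calling context */
    ACT_WAIT_CQ       = 1 << 2  /* wait (or poll) for completions */
};

static unsigned enter_actions(unsigned setup_flags, unsigned enter_flags,
                              unsigned to_submit)
{
    unsigned actions = 0;

    if (setup_flags & IORING_SETUP_SQPOLL) {
        if (enter_flags & IORING_ENTER_SQ_WAKEUP)
            actions |= ACT_WAKE_POLLER;
        /* No early return here: fall through to the completion wait. */
    } else if (to_submit) {
        actions |= ACT_SUBMIT_INLINE;
    }

    if (enter_flags & IORING_ENTER_GETEVENTS)
        actions |= ACT_WAIT_CQ;

    return actions;
}
```

The point of the model is the missing early return in the SQPOLL branch: the wakeup and the completion wait are independent decisions.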
    • io_uring: extend async work merging · 6d5d5ac5
      Jens Axboe authored
      We currently merge async work items if we see a strict sequential hit.
      This helps avoid unnecessary workqueue switches when we don't need
      them. We can extend this merging to cover cases where it's not a strict
      sequential hit, but the IO still fits within the same page. If an
      application is doing multiple requests within the same page, we don't
      want separate workers waiting on the same page to complete IO. It's much
      faster to let the first worker bring in the page, then operate on that
      page from the same worker to complete the next request(s).
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
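The extended merge rule can be illustrated with a small predicate: merge on a strict sequential hit, or when the new offset falls within the page range touched by the previous request. This is a sketch under assumptions, not the kernel function: `should_merge` and the `MODEL_*` macros are hypothetical names, and a 4096-byte page is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL                  /* assumed page size */
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1)) /* clears the in-page offset */

/*
 * prev_start/prev_len describe the previous async request on the same file;
 * pos is the file offset of the new request.
 */
static bool should_merge(uint64_t prev_start, uint64_t prev_len, uint64_t pos)
{
    /* Original rule: the new request starts exactly where the last one ended. */
    if (pos == prev_start + prev_len)
        return true;

    /* Extended rule: the new offset lies in a page the last request touched. */
    uint64_t start = prev_start & MODEL_PAGE_MASK;
    uint64_t end = (prev_start + prev_len + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK;

    return pos >= start && pos < end;
}
```

For example, after a 512-byte request at offset 4096, a request at offset 5000 is merged because it lands in the same page, while one at offset 8192 is not.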
  6. 10 Sep, 2019 3 commits
    • io_uring: limit parallelism of buffered writes · 54a91f3b
      Jens Axboe authored
      All the popular filesystems need to grab the inode lock for buffered
      writes. With io_uring punting buffered writes to async context, we
      observe a lot of contention with all workers hammering this mutex.
      
      For buffered writes, we generally don't need a lot of parallelism on
      the submission side, as the flushing will take care of that for us.
      Hence we don't need a deep queue on the write side, as long as we
      can safely punt from the original submission context.
      
      Add a workqueue with a limit of 2 that we can use for buffered writes.
      This greatly improves the performance and efficiency of higher queue
      depth buffered async writes with io_uring.
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
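The idea of routing buffered writes to a separate queue with a concurrency cap of 2 can be modelled as below. This is a toy model rather than the kernel workqueue API: `wq_model`, `pick_queue`, `wq_submit`, and `wq_complete` are hypothetical names, and the model only counts work items instead of running them.

```c
#include <stdbool.h>

/* Hypothetical model of a work queue with a cap on concurrently running items. */
struct wq_model {
    unsigned max_active; /* concurrency limit, e.g. 2 for buffered writes */
    unsigned active;     /* items currently running */
    unsigned queued;     /* items waiting for a free worker */
};

/* Queue index: 1 for buffered writes (the limited queue), 0 for everything else. */
static int pick_queue(bool is_write, bool is_direct)
{
    return (is_write && !is_direct) ? 1 : 0;
}

/* Returns true if the item starts running right away, false if it has to wait. */
static bool wq_submit(struct wq_model *wq)
{
    if (wq->active < wq->max_active) {
        wq->active++;
        return true;
    }
    wq->queued++;
    return false;
}

/* A running item finished; a waiting item (if any) takes over its slot. */
static void wq_complete(struct wq_model *wq)
{
    if (wq->queued)
        wq->queued--;
    else if (wq->active)
        wq->active--;
}
```

With `max_active` set to 2, at most two workers contend for the inode lock at any time, no matter how deep the submission queue is.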
    • io_uring: add io_queue_async_work() helper · 18d9be1a
      Jens Axboe authored
      Add a helper for queueing a request for async execution, in preparation
      for optimizing it.
      
      No functional change in this patch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: optimize submit_and_wait API · c5766668
      Jens Axboe authored
      For some applications that end up using a submit-and-wait type of
      approach for certain batches of IO, we can make that a bit more
      efficient by allowing the application to block for the last IO
      submission. This prevents an async punt when we don't need it, as the
      application will be blocking for the completion event(s) anyway.
      
      Typical use cases are using the liburing
      io_uring_submit_and_wait() API, or just using io_uring_enter()
      doing both submissions and completions. As a specific example,
      RocksDB doing MultiGet() is sped up quite a bit with this
      change.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
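One plausible way to express the "block for the last submission" decision is shown below. The exact condition is an assumption made for illustration and is not taken from the patch; `may_block_inline` is a hypothetical name.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: decide whether submission number
 * `index` (0-based) out of `to_submit` may block in the caller's context.
 *
 * Assumed condition: the caller waits for as many events as it submits,
 * and fewer than min_complete completions are already available.
 */
static bool may_block_inline(unsigned index, unsigned to_submit,
                             unsigned min_complete, unsigned cq_ready)
{
    bool block_for_last = to_submit > 0 &&
                          to_submit == min_complete &&
                          cq_ready < min_complete;

    /* Only the final submission of the batch is allowed to block. */
    return block_for_last && index == to_submit - 1;
}
```

Earlier submissions in the batch still go through the non-blocking path. Only the last one may block, because the application would sleep waiting for its completion anyway.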
  7. 09 Sep, 2019 2 commits
    • io_uring: add support for link with drain · 4fe2c963
      Jackie Liu authored
      To support link with drain, two things need to be done.
      
      Consider the following sqes:
      
          0     1     2     3     4     5     6
       +-----+-----+-----+-----+-----+-----+-----+
       |  N  |  L  |  L  | L+D |  N  |  N  |  N  |
       +-----+-----+-----+-----+-----+-----+-----+
      
      First, we need to ensure that the IO submitted before the link has
      completed. An easy way to do this is to set the drain flag on the head
      of the link list, so that all subsequent IO is inserted into the
      defer_list.
      
              +-----+
          (0) |  N  |
              +-----+
                 |          (2)         (3)         (4)
              +-----+     +-----+     +-----+     +-----+
          (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
              +-----+     +-----+     +-----+     +-----+
                 |
              +-----+
          (5) |  N  |
              +-----+
                 |
              +-----+
          (6) |  N  |
              +-----+
      
      Second, we need to ensure that the IO following the link does not
      complete first. An easy way to do this is to create a mirror (shadow) of
      the drain IO and insert it into the defer_list. That way, as long as the
      drain IO has not been processed, the following IO in the defer_list will
      not be processed either.
      
              +-----+
          (0) |  N  |
              +-----+
                 |          (2)         (3)         (4)
              +-----+     +-----+     +-----+     +-----+
          (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
              +-----+     +-----+     +-----+     +-----+
                 |
              +-----+
         ('3) |  D  |   <== This is a shadow of (3)
              +-----+
                 |
              +-----+
          (5) |  N  |
              +-----+
                 |
              +-----+
          (6) |  N  |
              +-----+
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
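The first step described in this commit (propagating a drain flag found anywhere in a link chain to the head of that chain) can be sketched as a pass over the sqe flags. This is an illustrative model only: `build_chains` is a hypothetical helper, and the flag constants are defined locally with the same values as the uapi header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copies of uapi sqe flag values, so the sketch is self-contained. */
#define IOSQE_IO_DRAIN (1U << 1)
#define IOSQE_IO_LINK  (1U << 2)

/*
 * Walk n sqe flag words and group them into chains. An sqe with
 * IOSQE_IO_LINK ties the next sqe to it, so a chain ends at the first
 * member without the link flag. heads[i] receives the index of the i-th
 * chain head, and head_drain[i] is set if any member of that chain asked
 * for drain. Both output arrays must have room for n entries.
 * Returns the number of chains.
 */
static size_t build_chains(const unsigned *flags, size_t n,
                           size_t *heads, bool *head_drain)
{
    size_t nheads = 0;
    bool in_chain = false;

    for (size_t i = 0; i < n; i++) {
        if (!in_chain) {
            heads[nheads] = i;
            head_drain[nheads] = false;
            nheads++;
        }
        if (flags[i] & IOSQE_IO_DRAIN)
            head_drain[nheads - 1] = true; /* hoist drain to the chain head */
        in_chain = (flags[i] & IOSQE_IO_LINK) != 0;
    }
    return nheads;
}
```

For the seven sqes in the diagram (N, L, L, L+D, N, N, N) this yields four top-level entries with heads 0, 1, 5, and 6, and only the chain starting at sqe 1 is marked as draining.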
    • io_uring: fix wrong sequence setting logic · 8776f3fa
      Jackie Liu authored
      The sqo_thread fetches sqes from the SQ ring in batches, which advances
      ctx->cached_sq_head in batches. If one of these sqes has the DRAIN flag
      set, it will never get a chance to be processed, and as a result
      sqo_thread will never exit.
      
      Fixes: de0617e4 ("io_uring: add support for marking commands as draining")
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
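The problem can be illustrated with a toy model in which each fetched sqe records its own sequence number at fetch time, instead of deriving it later from a head counter that has already advanced past the whole batch. All names here (`sq_model`, `fetched_sqe`, `fetch_batch`, `late_sequence`, `drain_ready`) are hypothetical, and the readiness rule (a drain request may run once the number of completions equals its sequence) is a simplifying assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached SQ head kept by the ring context. */
struct sq_model {
    uint32_t cached_sq_head;
};

/* Hypothetical snapshot of one fetched sqe. */
struct fetched_sqe {
    uint32_t sequence; /* number of sqes submitted before this one */
};

/* Fetch n sqes in one batch, stamping each with the head value at fetch time. */
static void fetch_batch(struct sq_model *sq, struct fetched_sqe *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].sequence = sq->cached_sq_head;
        sq->cached_sq_head++;
    }
}

/* What a request would see if it read the shared head after the batch fetch. */
static uint32_t late_sequence(const struct sq_model *sq)
{
    return sq->cached_sq_head - 1;
}

/* Assumed rule: a drain request may run once all earlier requests completed. */
static bool drain_ready(uint32_t sequence, uint32_t completed)
{
    return sequence == completed;
}
```

If the first sqe of a three-entry batch carries the drain flag, its stamped sequence is 0 and it may run immediately. The late-read value would be 2, so it would wait for completions of requests that are themselves queued behind it.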
  8. 06 Sep, 2019 1 commit
    • io_uring: expose single mmap capability · ac90f249
      Jens Axboe authored
      After commit 75b28aff we can get by with just a single mmap to
      map both the sq and cq ring. However, userspace doesn't know that.
      
      Add a features variable to io_uring_params, and notify userspace
      that the kernel has this ability. This can then be used in liburing
      (or in applications directly) to avoid the second mmap.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
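A userspace library could use the new features field roughly as follows when deciding how many ring mappings to create. This sketch only plans the mappings and does not call mmap(): `ring_map_plan` and `plan_ring_mmaps` are hypothetical names, and `IORING_FEAT_SINGLE_MMAP` is defined locally with the same value as the uapi header. The SQE array still needs its own mapping, which is not counted here.

```c
#include <stddef.h>

/* Local copy of the uapi feature bit, so the sketch is self-contained. */
#define IORING_FEAT_SINGLE_MMAP (1U << 0)

/* Hypothetical description of the SQ/CQ ring mappings an application needs. */
struct ring_map_plan {
    unsigned nr_mmaps; /* mappings needed for the SQ and CQ rings */
    size_t sq_len;     /* length of the SQ ring mapping */
    size_t cq_len;     /* length of the CQ ring mapping */
};

/*
 * features is the value reported by the kernel in io_uring_params;
 * sq_len and cq_len are the ring sizes computed from the returned offsets.
 */
static struct ring_map_plan plan_ring_mmaps(unsigned features,
                                            size_t sq_len, size_t cq_len)
{
    struct ring_map_plan plan = { 2, sq_len, cq_len };

    if (features & IORING_FEAT_SINGLE_MMAP) {
        /* One mapping covers both rings, so size it for the larger one. */
        size_t len = sq_len > cq_len ? sq_len : cq_len;

        plan.nr_mmaps = 1;
        plan.sq_len = len;
        plan.cq_len = len;
    }
    return plan;
}
```

On older kernels the feature bit is not set, and the plan falls back to two separate ring mappings.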
  9. 27 Aug, 2019 2 commits
  10. 25 Aug, 2019 22 commits
  11. 24 Aug, 2019 1 commit