1. 11 Mar, 2020 2 commits
    • io_uring: dual license io_uring.h uapi header · bbbdeb47
      Jens Axboe authored
      This just syncs the header with the liburing version, so there's no
      confusion about the license of the header parts.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled · 32b2244a
      Xiaoguang Wang authored
      When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, applications don't need
      to poll for I/O completion events themselves; they can rely on io_sq_thread
      to do the polling, which reduces CPU usage and uring_lock contention.
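
      Below is a minimal user-space sketch of the decision described above. It
      assumes the GETEVENTS wait path comes down to a choice between sleeping on
      the CQ ring and actively polling the device. enum wait_strategy and
      choose_wait_strategy() are hypothetical names, not kernel functions. Only
      the two flag values mirror the uapi header.

```c
#include <stdbool.h>

/* Flag values as defined in include/uapi/linux/io_uring.h. */
#define IORING_SETUP_IOPOLL (1U << 0)
#define IORING_SETUP_SQPOLL (1U << 1)

/* Hypothetical labels for the two ways a GETEVENTS caller can wait. */
enum wait_strategy {
	WAIT_SLEEP_ON_CQ,  /* sleep until enough CQEs have been posted */
	WAIT_IOPOLL_SPIN,  /* actively poll the device for completions */
};

/*
 * Sketch: the caller only polls for completions itself when IOPOLL is
 * set and no SQ thread is already doing that polling on its behalf.
 */
static enum wait_strategy choose_wait_strategy(unsigned int setup_flags)
{
	bool iopoll = setup_flags & IORING_SETUP_IOPOLL;
	bool sqpoll = setup_flags & IORING_SETUP_SQPOLL;

	if (iopoll && !sqpoll)
		return WAIT_IOPOLL_SPIN;
	return WAIT_SLEEP_ON_CQ;
}
```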
      
      I modified the fio io_uring engine slightly to evaluate the performance:
      static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
                              continue;
                      }
      
      -               if (!o->sqpoll_thread) {
      +               if (o->sqpoll_thread && o->hipri) {
                              r = io_uring_enter(ld, 0, actual_min,
                                                      IORING_ENTER_GETEVENTS);
                              if (r < 0) {
      
      and ran "fio -name=fiotest -filename=/dev/nvme0n1 -iodepth=$depth -thread
      -rw=read -ioengine=io_uring -hipri=1 -sqthread_poll=1 -direct=1 -bs=4k
      -size=10G -numjobs=1 -time_based -runtime=120"
      
      original code
      --------------------------------------------------------------------
      iodepth       |        4 |        8 |       16 |       32 |       64
      bw            | 1133MB/s | 1519MB/s | 2090MB/s | 2710MB/s | 3012MB/s
      fio cpu usage |     100% |     100% |     100% |     100% |     100%
      --------------------------------------------------------------------
      
      with patch
      --------------------------------------------------------------------
      iodepth       |        4 |        8 |       16 |       32 |       64
      bw            | 1196MB/s | 1721MB/s | 2351MB/s | 2977MB/s | 3357MB/s
      fio cpu usage |    63.8% |    74.4% |    81.1% |    83.7% |    82.4%
      --------------------------------------------------------------------
      bw improve    |     5.5% |    13.2% |    12.3% |     9.8% |    11.5%
      --------------------------------------------------------------------
      
      From the above results, bandwidth improves by roughly 5.5% to 13%, and the
      fio process's CPU usage drops from 100% to between 63.8% and 83.7%. Note
      that this won't reduce io_sq_thread's CPU usage: when SETUP_IOPOLL and
      SETUP_SQPOLL are both enabled, io_sq_thread always runs at 100% CPU.
      I think this patch will be friendly to applications that often use
      io_uring_wait_cqe() or similar helpers from liburing.
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 10 Mar, 2020 8 commits
    • io_uring: Fix unused function warnings · 469956e8
      YueHaibing authored
      If CONFIG_NET is not set, gcc warns:
      
      fs/io_uring.c:3110:12: warning: io_setup_async_msg defined but not used [-Wunused-function]
       static int io_setup_async_msg(struct io_kiocb *req,
                  ^~~~~~~~~~~~~~~~~~
      
      Many functions are wrapped by CONFIG_NET; move them together to simplify
      the code, which also fixes this warning.
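
      Below is a minimal sketch of the layout this describes. It assumes that,
      without CONFIG_NET, the networking opcodes fall back to stubs returning
      -EOPNOTSUPP. struct req, setup_async_msg() and do_sendmsg() are
      hypothetical stand-ins for the io_uring request type and its networking
      helpers. Because every helper sits in one guarded block, a helper such as
      setup_async_msg() is never compiled without its only caller.

```c
#include <errno.h>

/* Hypothetical stand-in for struct io_kiocb. */
struct req {
	int opcode;
};

#if defined(CONFIG_NET)
/* Every networking helper lives inside one guarded block... */
static int setup_async_msg(struct req *r)
{
	(void)r;
	return 0;
}

static int do_sendmsg(struct req *r)
{
	/* setup_async_msg() is only ever compiled alongside this caller. */
	return setup_async_msg(r);
}
#else /* !CONFIG_NET */
/* ...and one matching block of stubs keeps the callers compiling. */
static int do_sendmsg(struct req *r)
{
	(void)r;
	return -EOPNOTSUPP;
}
#endif /* CONFIG_NET */
```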
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      
      Minor tweaks.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add end-of-bits marker and build time verify it · 84557871
      Jens Axboe authored
      It's not easy to tell whether we're exceeding the number of bits that fit
      in req->flags, so add an end-of-bits marker and a BUILD_BUG_ON()
      check for it.
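
      Below is a minimal user-space sketch of the pattern, assuming a 32-bit
      flags word. The REQ_F_* names here are hypothetical, and C11
      _Static_assert stands in for the kernel's BUILD_BUG_ON(). Adding a bit
      enumerator before REQ_F_LAST_BIT moves the marker. The build then fails
      as soon as the marker exceeds the width of the flags field.

```c
#include <stdint.h>

/* Hypothetical flag bit positions. */
enum {
	REQ_F_FIXED_FILE_BIT,
	REQ_F_LINK_BIT,
	REQ_F_BUFFER_SELECT_BIT,

	/* Keep last: its value equals the number of bits in use. */
	REQ_F_LAST_BIT,
};

#define REQ_F_FIXED_FILE    (1U << REQ_F_FIXED_FILE_BIT)
#define REQ_F_LINK          (1U << REQ_F_LINK_BIT)
#define REQ_F_BUFFER_SELECT (1U << REQ_F_BUFFER_SELECT_BIT)

struct request {
	uint32_t flags; /* assumed 32-bit flags word */
};

/* Compile-time check standing in for BUILD_BUG_ON(): fails the build
 * once more bits are defined than the flags member can hold. */
_Static_assert(REQ_F_LAST_BIT <= 8 * sizeof(((struct request *)0)->flags),
	       "too many REQ_F_* bits for struct request::flags");
```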
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: provide means of removing buffers · 067524e9
      Jens Axboe authored
      We have IORING_OP_PROVIDE_BUFFERS, but the only way to remove buffers
      is to trigger IO on them. The usual way to shrink a buffer pool is to
      simply not replenish buffers when IO completes, and free them instead.
      But it may be nice to have a way to manually remove a number of buffers
      from a given group, and IORING_OP_REMOVE_BUFFERS provides that
      functionality.
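
      Below is a minimal sketch of the removal semantics. It assumes a group is
      a singly linked list of buffers, and that the operation frees up to the
      requested number and reports how many it actually removed. struct pbuf,
      struct buf_group and remove_buffers() are hypothetical, not the kernel's
      data structures.

```c
#include <stdlib.h>

/* Hypothetical provided buffer, linked into its group's list. */
struct pbuf {
	unsigned short bid;
	struct pbuf *next;
};

/* Hypothetical buffer group: just the head of the list. */
struct buf_group {
	struct pbuf *head;
};

/*
 * Free up to nbufs buffers from the group and return how many were
 * actually removed (fewer if the group runs out first).
 */
static int remove_buffers(struct buf_group *grp, int nbufs)
{
	int removed = 0;

	while (grp->head && removed < nbufs) {
		struct pbuf *b = grp->head;

		grp->head = b->next;
		free(b);
		removed++;
	}
	return removed;
}
```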
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG · 52de1fe1
      Jens Axboe authored
      Like IORING_OP_READV, this is limited to supporting just a single
      segment in the iovec passed in.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • net: abstract out normal and compat msghdr import · 0a384abf
      Jens Axboe authored
      This splits it into two parts, one that imports the message, and one
      that imports the iovec. This allows a caller to only do the first part,
      and import the iovec manually afterwards.
      
      No functional changes in this patch.
      Acked-by: David Miller <davem@davemloft.net>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_READV · 4d954c25
      Jens Axboe authored
      This adds buffer selection support for the vectored read. It is limited
      to a single segment in the iov, and is provided purely as a convenience
      for applications that already use IORING_OP_READV.
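
      Below is a minimal sketch of the single-segment restriction. It assumes
      that a read with buffer selection rejects any iovec with more than one
      element with -EINVAL. The single element is then pointed at the selected
      buffer, with its length capped at the buffer's size. prep_selected_iovec()
      is a hypothetical helper, and its buf/buf_len arguments stand in for the
      buffer picked from the group.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: validate the iovec and redirect its single
 * segment into the buffer chosen from the group.
 */
static int prep_selected_iovec(struct iovec *iov, int nr_segs,
			       void *buf, size_t buf_len)
{
	/* Buffer selection supports exactly one segment. */
	if (nr_segs != 1)
		return -EINVAL;
	/* Never claim more space than the selected buffer provides. */
	if (iov[0].iov_len > buf_len)
		iov[0].iov_len = buf_len;
	/* The caller's iov_base is ignored; data lands in the chosen buffer. */
	iov[0].iov_base = buf;
	return 0;
}
```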
      
      The iov helpers will be used for IORING_OP_RECVMSG as well.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: support buffer selection for OP_READ and OP_RECV · bcda7baa
      Jens Axboe authored
      If a server process has tons of pending socket connections, generally
      it uses epoll to wait for activity. When the socket is ready for reading
      (or writing), the task can select a buffer and issue a recv/send on the
      given fd.
      
      Now that we have fast (non-async thread) support, a task can have tons
      of reads or writes pending. But those need buffers to back the data,
      and if the number of connections is high enough, preallocating buffers
      for all possible connections is infeasible.
      
      With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to
      use for any request. The request then sets IOSQE_BUFFER_SELECT in the
      sqe, and a given group ID in sqe->buf_group. When the fd becomes ready,
      a free buffer from the specified group is selected. If none are
      available, the request is terminated with -ENOBUFS. If successful, the
      CQE on completion will contain the buffer ID chosen in the cqe->flags
      member, encoded as:
      
      	(buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;
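
      Below is a minimal sketch of the encoding above and of how an application
      might decode it. The two constant values mirror
      include/uapi/linux/io_uring.h. cqe_flags_for_buffer() and cqe_buffer_id()
      are hypothetical helper names.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/io_uring.h. */
#define IORING_CQE_F_BUFFER     (1U << 0)
#define IORING_CQE_BUFFER_SHIFT 16

/* Completion side: encode the chosen buffer ID into cqe->flags. */
static uint32_t cqe_flags_for_buffer(uint16_t bid)
{
	return ((uint32_t)bid << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;
}

/* Application side: return the buffer ID, or -1 if none was selected. */
static int cqe_buffer_id(uint32_t cqe_flags)
{
	if (!(cqe_flags & IORING_CQE_F_BUFFER))
		return -1;
	return (int)(cqe_flags >> IORING_CQE_BUFFER_SHIFT);
}
```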
      
      Once a buffer has been consumed by a request, it is no longer available
      and must be registered again with IORING_OP_PROVIDE_BUFFERS.
      
      Requests need to support this feature. For now, IORING_OP_READ and
      IORING_OP_RECV support it. This is checked on SQE submission; a CQE with
      res == -EOPNOTSUPP is posted if it is attempted on an unsupported request.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: add IORING_OP_PROVIDE_BUFFERS · ddf0322d
      Jens Axboe authored
      IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
      support passing in an addr/len that is associated with a buffer ID and
      buffer group ID. The group ID is used to index and look up the buffers,
      while the buffer ID can be used to notify the application which buffer
      in the group was used. The addr passed in is the starting buffer address,
      and length is the length of each buffer. A number of buffers to add can be
      specified, in which case addr is incremented by length for each addition,
      and the buffer ID is incremented for each buffer.
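
      Below is a minimal sketch of how one addr/len/count request expands into
      individual buffers, as described above. struct provided_buf and
      carve_buffers() are hypothetical. Unlike a real implementation, the sketch
      does not reject ID ranges that run past USHRT_MAX; the 16-bit ID simply
      wraps.

```c
#include <stdint.h>

/* Hypothetical record of one provided buffer. */
struct provided_buf {
	uintptr_t addr;
	uint32_t len;
	uint16_t bid;
};

/*
 * Buffer i starts at addr + i * len and gets ID bid + i.
 * Returns the number of entries written to out (at most max).
 */
static int carve_buffers(struct provided_buf *out, int max, uintptr_t addr,
			 uint32_t len, int nbufs, uint16_t bid)
{
	int i;

	for (i = 0; i < nbufs && i < max; i++) {
		out[i].addr = addr;
		out[i].len = len;
		out[i].bid = bid;
		addr += len; /* next buffer follows immediately */
		bid++;       /* no USHRT_MAX range check in this sketch */
	}
	return i;
}
```

      For example, addr = 0x1000, len = 256, nbufs = 3 and a starting ID of 10
      yield buffers at 0x1000, 0x1100 and 0x1200 with IDs 10, 11 and 12.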
      
      No validation is done of the buffer ID. If the application provides
      buffers within the same group with identical buffer IDs, then it'll have
      a hard time telling which buffer ID was used. The only restriction is
      that the buffer ID can be at most 16 bits in size, so USHRT_MAX is the
      maximum ID that can be used.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 04 Mar, 2020 8 commits
  4. 03 Mar, 2020 1 commit
  5. 02 Mar, 2020 21 commits