1. 18 Dec, 2018 7 commits
    • Christoph Hellwig's avatar
      aio: separate out ring reservation from req allocation · 432c7997
      Christoph Hellwig authored
      This is in preparation for certain types of IO not needing a ring
      reserveration.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      432c7997
    • Jens Axboe's avatar
      aio: use assigned completion handler · bc9bff61
      Jens Axboe authored
      We know this is a read/write request, but in preparation for
      having different kinds of those, ensure that we call the assigned
      handler instead of assuming it's aio_complete_rq().
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bc9bff61
    • Jens Axboe's avatar
      Merge branch 'for-4.21/block' into for-4.21/aio · 4b925432
      Jens Axboe authored
      * for-4.21/block: (351 commits)
        blk-mq: enable IO poll if .nr_queues of type poll > 0
        blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
        blk-mq: skip zero-queue maps in blk_mq_map_swqueue
        block: fix blk-iolatency accounting underflow
        blk-mq: fix dispatch from sw queue
        block: mq-deadline: Fix write completion handling
        nvme-pci: don't share queue maps
        blk-mq: only dispatch to non-defauly queue maps if they have queues
        blk-mq: export hctx->type in debugfs instead of sysfs
        blk-mq: fix allocation for queue mapping table
        blk-wbt: export internal state via debugfs
        blk-mq-debugfs: support rq_qos
        block: update sysfs documentation
        block: loop: check error using IS_ERR instead of IS_ERR_OR_NULL in loop_add()
        aoe: add __exit annotation
        block: clear REQ_HIPRI if polling is not supported
        blk-mq: replace and kill blk_mq_request_issue_directly
        blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests
        blk-mq: refactor the code of issue request directly
        block: remove the bio_integrity_advance export
        ...
      4b925432
    • Ming Lei's avatar
      blk-mq: enable IO poll if .nr_queues of type poll > 0 · cd19181b
      Ming Lei authored
      The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
      is bigger than zero, so enhance the constraint by checking .nr_queues of type poll
      before enabling IO poll.
      
      Otherwise IO race & timeout can be observed when running block/007.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      cd19181b
    • Jens Axboe's avatar
      blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c
      Jens Axboe authored
      There's a single user of this function, dm, and dm just wants
      to check if IO is inflight, not that it's just allocated.
      
      This fixes a hang with srp/002 in blktests with dm, where it tries
      to suspend but waits for inflight IO to finish first. As it checks
      for just allocated requests, this fails.
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3c94d83c
    • Mimi Zohar's avatar
      ima: cleanup the match_token policy code · 1a9430db
      Mimi Zohar authored
      Start the policy_tokens and the associated enumeration from zero,
      simplifying the pt macro.
      Signed-off-by: default avatarMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1a9430db
    • Linus Torvalds's avatar
      security: don't use a negative Opt_err token index · 94c13f66
      Linus Torvalds authored
      The code uses a bitmap to check for duplicate tokens during parsing, and
      that doesn't work at all for the negative Opt_err token case.
      
      There is absolutely no reason to make Opt_err be negative, and in fact
      it only confuses things, since some of the affected functions actually
      return a positive Opt_xyz enum _or_ a regular negative error code (eg
      -EINVAL), and using -1 for Opt_err makes no sense.
      
      There are similar problems in ima_policy.c and key encryption, but they
      don't have the immediate bug wrt bitmap handing, and ima_policy.c in
      particular needs a different patch to make the enum values match the
      token array index.  Mimi is sending that separately.
      
      Reported-by: syzbot+a22e0dc07567662c50bc@syzkaller.appspotmail.com
      Reported-by: default avatarEric Biggers <ebiggers@kernel.org>
      Fixes: 5208cc83 ("keys, trusted: fix: *do not* allow duplicate key options")
      Fixes: 00d60fd3 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]")
      Cc: James Morris James Morris <jmorris@namei.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: Peter Huewe <peterhuewe@gmx.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      94c13f66
  2. 17 Dec, 2018 11 commits
    • Ming Lei's avatar
      blk-mq: skip zero-queue maps in blk_mq_map_swqueue · e5edd5f2
      Ming Lei authored
      From 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping
      table won't be initialized actually if map->nr_queues is zero, so
      we can't use blk_mq_map_queue_type() to retrieve hctx any more.
      
      This way still may cause broken mapping, fix it by skipping zero-queues
      maps in blk_mq_map_swqueue().
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e5edd5f2
    • Dennis Zhou's avatar
      block: fix blk-iolatency accounting underflow · 13369816
      Dennis Zhou authored
      The blk-iolatency controller measures the time from rq_qos_throttle() to
      rq_qos_done_bio() and attributes this time to the first bio that needs
      to create the request. This means if a bio is plug-mergeable or
      bio-mergeable, it gets to bypass the blk-iolatency controller.
      
      The recent series [1], to tag all bios w/ blkgs undermined how iolatency
      was determining which bios it was charging and should process in
      rq_qos_done_bio(). Because all bios are being tagged, this caused the
      atomic_t for the struct rq_wait inflight count to underflow and result
      in a stall.
      
      This patch adds a new flag BIO_TRACKED to let controllers know that a
      bio is going through the rq_qos path. blk-iolatency now checks if this
      flag is set to see if it should process the bio in rq_qos_done_bio().
      
      Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing.
      BIO_THROTTLED was another candidate, but the flag is set for all bios
      that have gone through blk-throttle code. Overloading a flag comes with
      the burden of making sure that when either implementation changes, a
      change in setting rules for one doesn't cause a bug in the other. So
      here, we unfortunately opt for adding a new flag.
      
      [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/
      
      Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
      Signed-off-by: default avatarDennis Zhou <dennis@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      13369816
    • Ming Lei's avatar
      blk-mq: fix dispatch from sw queue · c16d6b5a
      Ming Lei authored
      When a request is added to rq list of sw queue(ctx), the rq may be from
      a different type of hctx, especially after multi queue mapping is
      introduced.
      
      So when dispach request from sw queue via blk_mq_flush_busy_ctxs() or
      blk_mq_dequeue_from_ctx(), one request belonging to other queue type of
      hctx can be dispatched to current hctx in case that read queue or poll
      queue is enabled.
      
      This patch fixes this issue by introducing per-queue-type list.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline aligned lists, just
      place them all in the same cacheline where we had just the one list
      and lock before.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c16d6b5a
    • Damien Le Moal's avatar
      block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal authored
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
      __blk_mq_free_request() call of blk_mq_sched_restart() to not run the
      queue when the already dispatched write request completes. The newly
      dispatched request stays stuck in the scheduler queue until eventually
      another request is submitted.
      
      This problem does not affect SCSI disk as the SCSI stack handles queue
      restart on request completion. However, this problem is can be triggered
      the nullblk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart()
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7211aef8
    • Christoph Hellwig's avatar
      nvme-pci: don't share queue maps · 7e849dd9
      Christoph Hellwig authored
      Now that the block layer checks if a queue map has any queues inside
      it there is no more reason to duplicate the maps for the non-default
      types.
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7e849dd9
    • Christoph Hellwig's avatar
      blk-mq: only dispatch to non-defauly queue maps if they have queues · 5aceaeb2
      Christoph Hellwig authored
      We should check if a given queue map actually has queues enabled before
      dispatching to it.  This allows drivers to not initialize optional but
      not used map types, which subsequently will allow fixing problems with
      queue map rebuilds for that case.
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5aceaeb2
    • Ming Lei's avatar
      blk-mq: export hctx->type in debugfs instead of sysfs · 346fc108
      Ming Lei authored
      Now we only export hctx->type via sysfs, and there isn't such info
      in hctx entry under debugfs. We often use debugfs only to diagnose
      queue mapping issue, so add the support in debugfs.
      
      Queue mapping becomes a bit more complicated after multiple queue
      mapping is supported, we may write blktest to verify if queue mapping
      is valid based on blk-mq-debugfs.
      
      Given not necessary to export hctx->type twice, so remove the export
      from sysfs.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      346fc108
    • Ming Lei's avatar
      blk-mq: fix allocation for queue mapping table · 07b35eb5
      Ming Lei authored
      Type of each element in queue mapping table is 'unsigned int,
      intead of 'struct blk_mq_queue_map)', so fix it.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      07b35eb5
    • Ming Lei's avatar
      blk-wbt: export internal state via debugfs · d19afebc
      Ming Lei authored
      This information is helpful to either investigate issues, or understand
      wbt's internal behaviour.
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d19afebc
    • Ming Lei's avatar
      blk-mq-debugfs: support rq_qos · cc56694f
      Ming Lei authored
      blk-mq-debugfs has been proved as very helpful for debug some
      tough issues, such as IO hang.
      
      We have seen blk-wbt related IO hang several times, even inside
      Red Hat BZ, there is such report not sovled yet, so this patch
      adds support debugfs on rq_qos.
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      cc56694f
    • Damien Le Moal's avatar
      block: update sysfs documentation · f9824952
      Damien Le Moal authored
      Add the description of the zoned, nr_zones and chunk_sectors sysfs queue
      attributes to Documentation/block/queue-sysfs.txt. The description of
      the zoned and chunk_sector attributes are mostly copied from
      ABI/testing/sysfs-block (added a typo fix). While at it, also fix a
      typo in the description of the io_poll_delay attribute.
      
      nr_zones description is also added to ABI/testing/sysfs-block and
      contact email address updated for the zoned attribute.
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f9824952
  3. 16 Dec, 2018 9 commits
  4. 14 Dec, 2018 13 commits