1. 08 Apr, 2017 (24 commits)
  2. 07 Apr, 2017 (16 commits)
    • block: sed-opal: Tone down all the pr_* to debugs · 591c59d1
      Scott Bauer authored
      Let's not flood the kernel log with messages unless
      the user requests them.
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      591c59d1
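      For illustration only, the kind of change involved (the message text and
      variable below are made up; the real call sites are in block/sed-opal.c):

          /* before: always printed, flooding the log on every failure */
          pr_err("response status: %d\n", error);

          /* after: only emitted when dynamic debug enables this call site */
          pr_debug("response status: %d\n", error);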
    • blk-mq: Clarify comments in blk_mq_dispatch_rq_list() · 710c785f
      Bart Van Assche authored
      The blk_mq_dispatch_rq_list() implementation has been modified
      several times, but the comments in that function were not updated
      along with it. Since what is going on there is nontrivial, bring
      the comments in blk_mq_dispatch_rq_list() up to date.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      710c785f
    • blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list · 705cda97
      Bart Van Assche authored
      Since the next patch in this series will use RCU to iterate over
      tag_list, make this safe. Add lockdep_assert_held() statements
      in functions that iterate over tag_list to make clear that using
      list_for_each_entry() instead of list_for_each_entry_rcu() is
      fine in these functions.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      705cda97
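      As a rough sketch of the rule being documented (field names as in
      blk-mq; loop bodies elided):

          /* update side: tag_list_lock is held, so a plain list walk is fine */
          lockdep_assert_held(&set->tag_list_lock);
          list_for_each_entry(q, &set->tag_list, tag_set_list) {
                  /* ... modify per-queue state ... */
          }

          /* reader side (what the next patch adds): walk the list under RCU */
          rcu_read_lock();
          list_for_each_entry_rcu(q, &set->tag_list, tag_set_list) {
                  /* ... read-only access ... */
          }
          rcu_read_unlock();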
    • blk-mq: use true instead of 1 for blk_mq_queue_data.last · d945a365
      Omar Sandoval authored
      Trivial cleanup.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      d945a365
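      In other words, the dispatch path now initializes the bool field as a
      bool, along the lines of:

          struct blk_mq_queue_data bd = {
                  .rq   = rq,
                  .last = true,   /* previously written as 1 */
          };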
    • blk-mq: make driver tag failure path easier to follow · 807b1041
      Omar Sandoval authored
      Minor cleanup that makes it easier to figure out what's going on in the
      driver tag allocation failure path of blk_mq_dispatch_rq_list().
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      807b1041
    • blk-mq-sched: provide hooks for initializing hardware queue data · ee056f98
      Omar Sandoval authored
      Schedulers need to be informed when a hardware queue is added or removed
      at runtime so they can allocate/free per-hardware queue data. So,
      replace the blk_mq_sched_init_hctx_data() helper, which only makes sense
      at init time, with .init_hctx() and .exit_hctx() hooks.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      ee056f98
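      A scheduler that needs per-hardware-queue state can now supply hooks
      roughly like the following (struct and function names here are
      illustrative, not taken from an in-tree scheduler):

          struct example_hctx_data {               /* hypothetical per-hctx state */
                  struct list_head rq_list;
          };

          static int example_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
          {
                  struct example_hctx_data *data;

                  data = kzalloc_node(sizeof(*data), GFP_KERNEL, hctx->numa_node);
                  if (!data)
                          return -ENOMEM;
                  INIT_LIST_HEAD(&data->rq_list);
                  hctx->sched_data = data;
                  return 0;
          }

          static void example_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
          {
                  kfree(hctx->sched_data);
                  hctx->sched_data = NULL;
          }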
    • Merge branch 'for-linus' into for-4.12/block · 65f619d2
      Jens Axboe authored
      We've added a considerable number of fixes for stalls and issues
      with the blk-mq scheduling in the 4.11 series since forking
      off the for-4.12/block branch. We need to do improvements on
      top of that for 4.12, so pull in the previous fixes to make
      our lives easier going forward.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      65f619d2
    • blk-mq: Restart a single queue if tag sets are shared · 6d8c6c0f
      Bart Van Assche authored
      To improve scalability, if hardware queues are shared, restart
      a single hardware queue in round-robin fashion. Rename
      blk_mq_sched_restart_queues() to reflect the new semantics.
      Remove blk_mq_sched_mark_restart_queue() because this function
      has no callers. Remove flag QUEUE_FLAG_RESTART because this
      patch removes the code that uses this flag.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      6d8c6c0f
    • dm rq: Avoid that request processing stalls sporadically · 6077c2d7
      Bart Van Assche authored
      While running the srp-test software I noticed that request
      processing stalls sporadically at the beginning of a test, namely
      when mkfs is run against a dm-mpath device. Every time that
      happened, the following command was sufficient to resume request
      processing:
      
          echo run >/sys/kernel/debug/block/dm-0/state
      
      This patch prevents such request processing stalls from occurring. The
      test I ran is as follows:
      
          while srp-test/run_tests -d -r 30 -t 02-mq; do :; done
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: Jens Axboe <axboe@fb.com>
      6077c2d7
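      One plausible shape of such a fix, assuming the busy path of the
      dm-rq ->queue_rq() handler reschedules the hardware queue itself
      instead of waiting for a completion that may never arrive (the delay
      value is illustrative):

          /* transient failure: back off briefly, then rerun this hw queue */
          blk_mq_delay_run_hw_queue(hctx, 100 /* ms */);
          return BLK_MQ_RQ_QUEUE_BUSY;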
    • scsi: Avoid that SCSI queues get stuck · 36e3cf27
      Bart Van Assche authored
      If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
      driver that implements that function is responsible for rerunning the
      hardware queue once requests can be queued again successfully.
      
      commit 52d7f1b5 ("blk-mq: Avoid that requeueing starts stopped
      queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
      for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
      that are intended to rerun a busy queue such that they examine all
      hardware queues instead of only stopped queues.
      
      Since no other functions than scsi_internal_device_block() and
      scsi_internal_device_unblock() should ever stop or restart a SCSI
      queue, change the blk_mq_delay_queue() call into a
      blk_mq_delay_run_hw_queue() call.
      
      Fixes: commit 52d7f1b5 ("blk-mq: Avoid that requeueing starts stopped queues")
      Fixes: commit 7e79dadc ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Long Li <longli@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      36e3cf27
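      A simplified sketch of what that means for the BLK_MQ_RQ_QUEUE_BUSY
      return path in scsi_queue_rq() (the condition and delay shown here are
      an approximation, not the exact hunk):

          case BLK_MQ_RQ_QUEUE_BUSY:
                  /*
                   * No completion may be on its way to rerun the queue, so
                   * ask blk-mq to run this hardware queue again after a delay.
                   */
                  if (atomic_read(&sdev->device_busy) == 0 &&
                      !scsi_device_blocked(sdev))
                          blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
                  break;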
    • blk-mq: Introduce blk_mq_delay_run_hw_queue() · 7587a5ae
      Bart Van Assche authored
      Introduce a function that runs a hardware queue unconditionally
      after a delay. Note: there is already a function that stops and
      restarts a hardware queue after a delay, namely blk_mq_delay_queue().
      
      This function will be used in the next patch in this series.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Long Li <longli@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7587a5ae
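      For reference, the new helper's interface next to the existing one it
      is contrasted with above:

          /* new: run the hardware queue asynchronously after 'msecs' ms,
           * without stopping it first */
          void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);

          /* existing: stops the hardware queue and restarts it after the delay */
          void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);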
    • block: trace completion of all bios. · fbbaf700
      NeilBrown authored
      Currently only dm and md/raid5 bios trigger
      trace_block_bio_complete().  Now that we have bio_chain() and
      bio_inc_remaining(), it is not possible, in general, for a driver to
      know when the bio is really complete.  Only bio_endio() knows that.
      
      So move the trace_block_bio_complete() call to bio_endio().
      
      Now trace_block_bio_complete() pairs with trace_block_bio_queue().
      Any bio for which a 'queue' event is traced, will subsequently
      generate a 'complete' event.
      
      There are a few cases where completion tracing is not wanted.
      1/ If blk_update_request() has already generated a completion
         trace event at the 'request' level, there is no point generating
         one at the bio level too.  In this case the bi_sector and bi_size
         will have changed, so the bio level event would be wrong.
      
      2/ If the bio hasn't actually been queued yet, but is being aborted
         early, then a trace event could be confusing.  Some filesystems
         call bio_endio() but do not want tracing.
      
      3/ The bio_integrity code interposes itself by replacing bi_end_io,
         then restoring it and calling bio_endio() again.  This would produce
         two identical trace events if left like that.
      
      To handle these, we introduce a flag BIO_TRACE_COMPLETION and only
      produce the trace event when this is set.
      We address point 1 above by clearing the flag in blk_update_request().
      We address point 2 above by only setting the flag when
      generic_make_request() is called.
      We address point 3 above by clearing the flag after generating a
      completion event.
      
      When bio_split() is used on a bio, particularly in blk_queue_split(),
      there is an extra complication.  A new bio is split off the front, and
      may be handled directly without going through generic_make_request().
      The old bio, which has been advanced, is passed to
      generic_make_request(), so it will trigger a trace event a second
      time.
      Probably the best result when a split happens is to see a single
      'queue' event for the whole bio, then multiple 'complete' events - one
      for each component.  To achieve this we can:
      - copy the BIO_TRACE_COMPLETION flag to the new bio in bio_split()
      - avoid generating a 'queue' event if BIO_TRACE_COMPLETION is already set.
      This way, the split-off bio won't create a queue event, the original
      won't either even if it is re-submitted to generic_make_request(),
      but both will produce completion events, each for their own range.
      
      So if generic_make_request() is called (which generates a QUEUED
      event), then bio_endio() will create a single COMPLETE event for each
      range that the bio is split into, unless the driver has explicitly
      requested it not to.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      fbbaf700
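      Roughly, the check this adds to bio_endio() has the following shape
      (the trace arguments are simplified here; see the commit itself for the
      exact call):

          if (bio_flagged(bio, BIO_TRACE_COMPLETION)) {
                  trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio,
                                           bio->bi_error);
                  bio_clear_flag(bio, BIO_TRACE_COMPLETION);
          }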
    • block: simple improvements for bio->flags · dbde775c
      NeilBrown authored
      The comment for the 'flags' field of 'bio' mentions
      "command" which is no longer stored there, and doesn't
      mention the bvec pool number, which is.
      
      BIO_RESET_BITS is set in such a way that it would need to be
      updated if new bits were added, which is easy to miss.
      
      BVEC_POOL_BITS is larger than needed.  The BVEC_POOL_IDX()
      ranges from 0 to 6, so 3 bits are sufficient.
      
      This patch makes improvements in each of these areas.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      dbde775c
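      One way to express both improvements, in blk_types.h terms (treat the
      exact definitions as a sketch of the idea rather than the literal diff):

          /*
           * bio->bi_flags: the low bits are BIO_* flags, the top BVEC_POOL_BITS
           * bits hold the bvec pool number.  BVEC_POOL_IDX() is 0..6, so 3 bits
           * are enough.
           */
          #define BVEC_POOL_BITS          (3)
          #define BVEC_POOL_OFFSET        (16 - BVEC_POOL_BITS)

          /*
           * bio_reset() clears everything below the pool number; deriving
           * BIO_RESET_BITS from the offset means it no longer needs a manual
           * update whenever a new flag is added.
           */
          #define BIO_RESET_BITS          BVEC_POOL_OFFSET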
    • blk-mq: remap queues when adding/removing hardware queues · ebe8bddb
      Omar Sandoval authored
      blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
      behavior that drivers expect. However, commit 4e68a011 changed
      blk_mq_queue_reinit() to not remap queues for the case of CPU
      hotplugging, inadvertently making blk_mq_update_nr_hw_queues() not remap
      queues as well. This breaks, for example, NBD's multi-connection mode,
      leaving the added hardware queues unused. Fix it by making
      blk_mq_update_nr_hw_queues() explicitly remap the queues.
      
      Fixes: 4e68a011 ("blk-mq: don't redistribute hardware queues on a CPU hotplug event")
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      ebe8bddb
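      From the driver side (NBD's multi-connection mode being the example
      named above), the expectation is simply that a call like the following
      also rebuilds the CPU-to-queue mapping (the nbd field names are
      illustrative):

          /* grow the number of hardware queues at runtime; with this fix the
           * software queues are remapped so the new hctxs actually get used */
          blk_mq_update_nr_hw_queues(&nbd->tag_set, nbd->num_connections);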
    • blk-mq-sched: fix crash in switch error path · 54d5329d
      Omar Sandoval authored
      In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
      back to the original scheduler. However, at this point, we've already
      torn down the original scheduler's tags, so this causes a crash. Doing
      the fallback like the legacy elevator path is much harder for mq, so fix
      it by just falling back to none, instead.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      54d5329d
    • blk-mq-sched: set up scheduler tags when bringing up new queues · 93252632
      Omar Sandoval authored
      If a new hardware queue is added at runtime, we don't allocate scheduler
      tags for it, leading to a crash. This hooks up the scheduler framework
      to blk_mq_{init,exit}_hctx() to make sure everything gets properly
      initialized/freed.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      93252632