1. 17 Jan, 2018 1 commit
    • aoe: use ktime_t instead of timeval · 85cf955d
      Tina Ruchandani authored
      'struct frame' uses two variables to store the sent timestamp - 'struct
      timeval' and jiffies. jiffies is used to avoid discrepancies caused by
      updates to system time. 'struct timeval' is deprecated because it uses
      32-bit representation for seconds which will overflow in year 2038.
      
      This patch does the following:
      - Replace the use of 'struct timeval' and jiffies with ktime_t, which
        is the recommended type for timestamping
      - ktime_t provides both long range (like jiffies) and high resolution
        (like timeval). Using ktime_get (monotonic time) instead of wall-clock
        time prevents any discrepancies caused by updates to system time.
      
      [updates by Arnd below]
      The original patch from Tina never went anywhere as we discussed how
      to keep the impact on performance minimal. I've started over now but
      arrived at basically the same patch that she had originally, except for
      a slightly improved tsince_hr() function. I've made it more robust
      against overflows, and also optimized it explicitly for the common case
      in which a frame is less than 4.2 seconds old (2^32 ns is roughly
      4.29 s, so the nanosecond delta still fits in 32 bits), using only a
      32-bit division in that case.
      
      This should make the new version more efficient than the old code,
      since we replace the existing two 32-bit divisions in do_gettimeofday()
      plus one multiplication with a single 32-bit division in
      tsince_hr() and drop the double bookkeeping. It's also more efficient
      than the ktime_get_us() API we discussed before, since that would
      also rely on multiple divisions.
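      A minimal userspace sketch of the tsince_hr() idea described above,
      assuming both timestamps are nanosecond values read from a monotonic
      clock. The name elapsed_us() and the clamp to INT_MAX are illustrative
      assumptions, not the aoe driver's actual code.

```c
#include <limits.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000U

/*
 * Hypothetical userspace analogue of tsince_hr(): microseconds elapsed
 * between two nanosecond timestamps taken from a monotonic clock.
 */
static long elapsed_us(int64_t sent_ns, int64_t now_ns)
{
	uint64_t delta = (uint64_t)(now_ns - sent_ns);

	/* Common case: delta < 2^32 ns (~4.29 s), so 32-bit division suffices. */
	if (delta <= UINT32_MAX)
		return (long)((uint32_t)delta / NSEC_PER_USEC);

	/* Clamp rather than overflow an int-sized result (~35.8 minutes). */
	if (delta > (uint64_t)INT_MAX * NSEC_PER_USEC)
		return INT_MAX;

	/* Rare case: fall back to a full 64-bit division. */
	return (long)(delta / NSEC_PER_USEC);
}
```

      Only frames older than about 4.29 seconds pay for the 64-bit division.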
      
      Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html
      Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
      Cc: Ed Cashin <ed.cashin@acm.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 16 Jan, 2018 1 commit
    • blkcg: simplify statistic accumulation code · ddc21231
      Arnd Bergmann authored
      Some older compilers (gcc-4.4 through 4.6 in particular) struggle
      with the way that blkg_rwstat_read() returns a structure, leading
      to excessive stack usage and rather inefficient code:
      
      block/blk-cgroup.c: In function 'blkg_destroy':
      block/blk-cgroup.c:354:1: error: the frame size of 1296 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      block/cfq-iosched.c: In function 'cfqg_stats_add_aux':
      block/cfq-iosched.c:753:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      block/bfq-cgroup.c: In function 'bfqg_stats_add_aux':
      block/bfq-cgroup.c:299:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      I also notice that there is no point in using atomic accesses
      for the local variables, so storing the temporaries in simple 'u64'
      variables not only avoids the stack usage on older compilers but
      also improves the object code on modern versions.
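      A rough sketch of the "plain u64 temporaries" approach, assuming a
      simplified per-CPU statistic. The struct, the CPU count and the helper
      names below are hypothetical; the point is that each counter is summed
      into a u64 instead of copying a whole structure by value.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4  /* hypothetical CPU count */
#define RWSTAT_NR      4  /* hypothetical: read, write, sync, async */

/* Hypothetical stand-in for a per-CPU read/write statistic. */
struct rwstat_sketch {
	uint64_t percpu[NR_CPUS_SKETCH][RWSTAT_NR];
	uint64_t aux_cnt[RWSTAT_NR];
};

/* Sum one counter across CPUs and return a plain u64, not a struct. */
static uint64_t rwstat_read_counter(const struct rwstat_sketch *s, int idx)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += s->percpu[cpu][idx];
	return sum;
}

/* Fold the live and auxiliary counts of 'from' into to->aux_cnt. */
static void rwstat_add_aux(struct rwstat_sketch *to,
			   const struct rwstat_sketch *from)
{
	for (int i = 0; i < RWSTAT_NR; i++) {
		uint64_t v = rwstat_read_counter(from, i); /* local u64 temporary */

		to->aux_cnt[i] += v + from->aux_cnt[i];
	}
}
```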
      
      Fixes: e6269c44 ("blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it")
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 15 Jan, 2018 5 commits
    • dm: fix incomplete request_queue initialization · c100ec49
      Mike Snitzer authored
      DM is no longer prone to having its request_queue be improperly
      initialized.
      
      Summary of changes:
      
      - defer DM's blk_register_queue() from add_disk()-time until
        dm_setup_md_queue() by using add_disk_no_queue_reg() in alloc_dev().
      
      - dm_setup_md_queue() is updated to fully initialize DM's request_queue
        (_after_ all table loads have occurred and the request_queue's type,
        features and limits are known).
      
      A very welcome side-effect of these changes is that DM no longer needs to:
      1) backfill the "mq" sysfs entry (because historically DM didn't
      initialize the request_queue to use blk-mq until _after_
      blk_register_queue() was called via add_disk()).
      2) call elv_register_queue() to expose the "iosched" sysfs entry of
      .request_fn request-based DM devices.
      
      In addition, blk-mq debugfs support is now made available because
      request-based DM's blk-mq request_queue is now properly initialized
      before dm_setup_md_queue() calls blk_register_queue().
      
      These changes also stave off the need to introduce new DM-specific
      workarounds in block core, e.g. this proposal:
      https://patchwork.kernel.org/patch/10067961/
      
      In the end DM devices should be less unicorn in nature (relative to
      initialization and availability of block core infrastructure provided by
      the request_queue).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk_rq_map_user_iov: fix error override · 69e0927b
      Douglas Gilbert authored
      During stress tests by syzkaller on the sg driver, the block layer
      infrequently returns EINVAL. Closer inspection shows the block
      layer was trying to return ENOMEM (which is much more
      understandable) but for some reason overrode that useful error.

      The patch below does not show this (unchanged) line:
         ret = __blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
      That 'ret' was being overridden when that function failed.
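      The bug pattern can be sketched as follows, assuming the unwind path
      returned a fixed error code instead of the value it was handed. All
      function names here are hypothetical; this illustrates the pattern,
      not the actual blk-map.c code.

```c
#include <errno.h>

/* Hypothetical mapping step that fails with -ENOMEM when asked to. */
static int map_step(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Buggy shape: the unwind path returns a fixed code, hiding -ENOMEM. */
static int map_buggy(int fail)
{
	int ret = map_step(fail);

	if (ret)
		goto unmap;
	return 0;
unmap:
	/* ... undo any partial mapping here ... */
	return -EINVAL;
}

/* Fixed shape: the unwind path propagates the original error. */
static int map_fixed(int fail)
{
	int ret = map_step(fail);

	if (ret)
		goto unmap;
	return 0;
unmap:
	/* ... undo any partial mapping here ... */
	return ret;
}
```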
      Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: allow gendisk's request_queue registration to be deferred · fa70d2e2
      Mike Snitzer authored
      For as long as I can remember, DM has forced the block layer to allow
      the allocation and initialization of the request_queue to be distinct
      operations.  The reason for this is that block/genhd.c:add_disk()
      requires the request_queue (and associated bdi) to be tied to the
      gendisk before add_disk() is called -- because add_disk() also deals
      with exposing the request_queue via blk_register_queue().
      
      DM's dynamic creation of arbitrary device types (and associated
      request_queue types) requires the DM device's gendisk be available so
      that DM table loads can establish a master/slave relationship with
      subordinate devices that are referenced by loaded DM tables -- using
      bd_link_disk_holder().  But until these DM tables, and their associated
      subordinate devices, are known, DM cannot know what type of request_queue
      it needs -- nor what its queue_limits should be.
      
      This chicken and egg scenario has created all manner of problems for DM
      and, at times, the block layer.
      
      Summary of changes:
      
      - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variants
        that drivers may use to add a disk without also calling
        blk_register_queue().  The driver must call blk_register_queue() once
        its request_queue is fully initialized.

      - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
        is not set.  It won't be set if the driver used add_disk_no_queue_reg()
        but encountered an error and must call del_gendisk() before calling
        blk_register_queue().
      
      - Export blk_register_queue().
      
      These changes allow DM to use add_disk_no_queue_reg() to anchor its
      gendisk as the "master" for master/slave relationships DM must establish
      with subordinate devices referenced in DM tables that get loaded.  Once
      all "slave" devices for a DM device are known, its request_queue can be
      properly initialized and then advertised via sysfs.  The important
      improvement is that DM devices no longer miss any of the request_queue
      resource initialization performed by blk_register_queue().
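      A small state model of the deferred-registration flow summarized above.
      Everything here is hypothetical: the struct and the model_* functions
      only mirror the described roles of add_disk_no_queue_reg(),
      blk_register_queue(), blk_unregister_queue() and del_gendisk(),
      including the early return when the queue was never registered.

```c
#include <stdbool.h>

/* Hypothetical model of a gendisk and its request_queue. */
struct disk_model {
	bool disk_added;        /* gendisk visible, usable as a holder */
	bool queue_ready;       /* queue type, features and limits known */
	bool queue_registered;  /* stands in for QUEUE_FLAG_REGISTERED */
};

/* Analogue of add_disk_no_queue_reg(): expose the disk only. */
static void model_add_disk_no_queue_reg(struct disk_model *d)
{
	d->disk_added = true;
}

/* Analogue of blk_register_queue(), called once the queue is ready. */
static int model_register_queue(struct disk_model *d)
{
	if (!d->disk_added || !d->queue_ready)
		return -1;
	d->queue_registered = true;
	return 0;
}

/* Analogue of blk_unregister_queue(): return early if never registered. */
static bool model_unregister_queue(struct disk_model *d)
{
	if (!d->queue_registered)
		return false;
	d->queue_registered = false;
	return true;
}

/* Analogue of del_gendisk(), e.g. on a driver error path. */
static bool model_del_gendisk(struct disk_model *d)
{
	bool unregistered = model_unregister_queue(d);

	d->disk_added = false;
	return unregistered;
}
```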
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: properly protect the 'queue' kobj in blk_unregister_queue · 667257e8
      Mike Snitzer authored
      The original commit e9a823fb (block: fix warning when I/O elevator
      is changed as request_queue is being removed) is pretty conflated:
      the resource being protected by q->sysfs_lock isn't the queue_flags,
      it is the 'queue' kobj.

      q->sysfs_lock prevents __elevator_change() (via elv_iosched_store)
      from racing with blk_unregister_queue():
      1) By holding q->sysfs_lock first, __elevator_change() can complete
      before a racing blk_unregister_queue().
      2) Conversely, __elevator_change() tests QUEUE_FLAG_REGISTERED because,
      if elv_iosched_store() loses the race with blk_unregister_queue(),
      it needs a way to know the 'queue' kobj isn't there.
      
      Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
      held until after the 'queue' kobj is removed.
      
      To do so, blk_mq_unregister_dev() must not also take q->sysfs_lock, so
      rename __blk_mq_unregister_dev() to blk_mq_unregister_dev().
      
      Also, blk_unregister_queue() should use q->queue_lock to protect against
      any concurrent writes to q->queue_flags -- even though chances are the
      queue is being cleaned up so no concurrent writes are likely.
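      A userspace sketch of the widened lock scope, assuming a pthread mutex
      in place of q->sysfs_lock. The q_model struct and its functions are
      hypothetical. Unregistration clears the flag and removes the kobj under
      one critical section, so an elevator switch that sees the flag set can
      rely on the kobj still being present.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a request_queue's sysfs state. */
struct q_model {
	pthread_mutex_t sysfs_lock;  /* stands in for q->sysfs_lock */
	bool registered;             /* stands in for QUEUE_FLAG_REGISTERED */
	bool kobj_present;           /* the 'queue' kobj in sysfs */
	int elevator;                /* id of the current I/O scheduler */
};

/* Elevator switch: under the lock, a set flag means the kobj still exists. */
static int q_elevator_change(struct q_model *q, int new_elv)
{
	int ret = -1;

	pthread_mutex_lock(&q->sysfs_lock);
	if (q->registered) {
		q->elevator = new_elv;
		ret = 0;
	}
	pthread_mutex_unlock(&q->sysfs_lock);
	return ret;
}

/* Unregister: hold sysfs_lock until the kobj is gone, not just the flag. */
static void q_unregister(struct q_model *q)
{
	pthread_mutex_lock(&q->sysfs_lock);
	if (q->registered) {
		q->registered = false;
		/* Helpers called here must not take sysfs_lock again. */
		q->kobj_present = false;
	}
	pthread_mutex_unlock(&q->sysfs_lock);
}
```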
      
      Fixes: e9a823fb ("block: fix warning when I/O elevator is changed as request_queue is being removed")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: only bdi_unregister() in del_gendisk() if !GENHD_FL_HIDDEN · bc8d062c
      Mike Snitzer authored
      device_add_disk() will only call bdi_register_owner() if
      !GENHD_FL_HIDDEN, so it follows that del_gendisk() should only call
      bdi_unregister() if !GENHD_FL_HIDDEN.
      
      Found by code inspection.  bdi_unregister() won't do any harm if
      bdi_register_owner() wasn't used, but it is best to avoid the
      unnecessary call to bdi_unregister().
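      The symmetry argument can be sketched as a teardown path that tests
      the same flag as the setup path. The flag value and the gd_model
      helpers are hypothetical stand-ins for GENHD_FL_HIDDEN,
      device_add_disk() and del_gendisk().

```c
#define GENHD_FL_HIDDEN_SKETCH 0x1u  /* hypothetical flag value */

/* Hypothetical model of a gendisk's bdi registration state. */
struct gd_model {
	unsigned int flags;
	int bdi_registrations;  /* > 0 while the bdi is registered */
};

/* Setup: only non-hidden disks register their bdi. */
static void model_device_add_disk(struct gd_model *d)
{
	if (!(d->flags & GENHD_FL_HIDDEN_SKETCH))
		d->bdi_registrations++;  /* bdi_register_owner() analogue */
}

/* Teardown mirrors setup: only undo what was actually done. */
static void model_del_gendisk(struct gd_model *d)
{
	if (!(d->flags & GENHD_FL_HIDDEN_SKETCH))
		d->bdi_registrations--;  /* bdi_unregister() analogue */
}
```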
      
      Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 14 Jan, 2018 1 commit
  5. 12 Jan, 2018 3 commits
  6. 11 Jan, 2018 2 commits
  7. 10 Jan, 2018 17 commits
  8. 09 Jan, 2018 10 commits
    • null_blk: wire up timeouts · 5448aca4
      Jens Axboe authored
      This is needed to ensure that we actually handle timeouts.
      Without it, the queue_mode=1 path will never call blk_add_timer(),
      and the queue_mode=2 path will continually just return
      EH_RESET_TIMER and we never actually complete the offending request.
      
      This was used to test the new timeout code, and the changes around
      killing off REQ_ATOM_COMPLETE.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED · 8abef10b
      Jens Axboe authored
      It's not available if we don't have group I/O scheduling enabled, and
      there's no need to call it.
      
      Fixes: 0d52af59 ("block, bfq: release oom-queue ref to root group on exit")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: closures: move control bits one bit right · 3609c471
      Michael Lyle authored
      Otherwise, architectures that implement atomic_sub as a negated atomic
      add (e.g. s390) fail in closure_set_stopped.
      Signed-off-by: Michael Lyle <mlyle@lyle.org>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Fix kernel-doc warnings reported when building with W=1 · aa98192d
      Bart Van Assche authored
      Commit 3a025e1d ("Add optional check for bad kernel-doc comments")
      makes W=1 builds run the kernel-doc script, which causes several new
      warnings to appear when building the kernel with W=1. Fix the block
      layer kernel-doc headers so that the block layer again builds cleanly
      with W=1.
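      For reference, a kernel-doc header has a /** opener, a "name() - summary"
      line, one @parameter line per argument and an optional Return: section;
      a missing or misnamed @parameter line is the kind of problem these W=1
      checks report. The function below is a hypothetical example chosen only
      to show the layout.

```c
/**
 * count_set_bits() - Count the bits that are set in a word.
 * @word: Value to inspect.
 *
 * Hypothetical helper used only to illustrate the kernel-doc layout.
 *
 * Return: the number of 1 bits in @word.
 */
static unsigned int count_set_bits(unsigned long word)
{
	unsigned int n = 0;

	while (word) {
		word &= word - 1;  /* clear the lowest set bit */
		n++;
	}
	return n;
}
```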
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: Fix spelling in a source code comment · ee3e4de5
      Bart Van Assche authored
      Change "nedeing" into "needing" and "caes" into "cases".
      
      Fixes: commit f906a6a0 ("blk-mq: improve tag waiting setup for non-shared tags")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: silence false positive warnings in hctx_unlock() · 08b5a6e2
      Jens Axboe authored
      Some stupider versions of gcc complain:
      
      block/blk-mq.c: In function ‘blk_mq_complete_request’:
      ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        __srcu_read_unlock(sp, idx);
        ^
      block/blk-mq.c:620:6: note: ‘srcu_idx’ was declared here
        int srcu_idx;
            ^
      
      which is completely bogus, since we only use srcu_idx when
      hctx->flags & BLK_MQ_F_BLOCKING is set, and that's the case where
      hctx_lock() has initialized it.
      
      Just set it to '0' in the normal path in hctx_lock() to silence
      this annoying warning.
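      A userspace model of the lock/unlock pairing described above. The flag
      and the hctx_model struct are hypothetical stand-ins for
      BLK_MQ_F_BLOCKING and the RCU/SRCU calls; the only point illustrated is
      that the non-blocking path writes 0 to an index it never reads, which
      is enough to quiet the false positive.

```c
#define F_BLOCKING 0x1u  /* hypothetical stand-in for BLK_MQ_F_BLOCKING */

/* Hypothetical model of a hardware context's read-side locking. */
struct hctx_model {
	unsigned int flags;
	int next_srcu_idx;    /* what a real srcu_read_lock() might return */
	int rcu_depth;        /* nesting of the plain RCU read section */
	int last_unlock_idx;  /* index handed to the SRCU unlock analogue */
};

static void model_hctx_lock(struct hctx_model *h, int *srcu_idx)
{
	if (!(h->flags & F_BLOCKING)) {
		/* Never read on this path; only here to quiet the warning. */
		*srcu_idx = 0;
		h->rcu_depth++;                /* rcu_read_lock() analogue */
	} else {
		*srcu_idx = h->next_srcu_idx;  /* srcu_read_lock() analogue */
	}
}

static void model_hctx_unlock(struct hctx_model *h, int srcu_idx)
{
	if (!(h->flags & F_BLOCKING))
		h->rcu_depth--;                /* rcu_read_unlock() analogue */
	else
		h->last_unlock_idx = srcu_idx; /* srcu_read_unlock() analogue */
}
```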
      
      Fixes: 04ced159 ("blk-mq: move hctx lock/unlock into a helper")
      Fixes: 5197c05e ("blk-mq: protect completion path with RCU")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu · 05707b64
      Tejun Heo authored
      The RCU protection has been expanded to cover both queueing and
      completion paths making ->queue_rq_srcu a misnomer.  Rename it to
      ->srcu as suggested by Bart.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: remove REQ_ATOM_STARTED · 5a61c363
      Tejun Heo authored
      After the recent updates to use generation number and state based
      synchronization, we can easily replace REQ_ATOM_STARTED usages by
      adding an extra state that distinguishes requests that have completed
      but are not yet freed.
      
      Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
      blk_mq_rq_state() tests.  REQ_ATOM_STARTED no longer has any users
      left and is removed.
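      A sketch of deriving "started" from a single state field rather than a
      separate atomic bit. Only a "complete" state (MQ_RQ_COMPLETE) is named
      in this commit; the other state names, the struct and the helpers below
      are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical request life-cycle states. */
enum rq_state_sketch {
	RQ_IDLE,       /* allocated or freed, not issued */
	RQ_IN_FLIGHT,  /* issued to the driver */
	RQ_COMPLETE,   /* completed but not yet freed */
};

struct rq_sketch {
	enum rq_state_sketch state;
};

static void rq_start(struct rq_sketch *rq)    { rq->state = RQ_IN_FLIGHT; }
static void rq_complete(struct rq_sketch *rq) { rq->state = RQ_COMPLETE; }
static void rq_free(struct rq_sketch *rq)     { rq->state = RQ_IDLE; }

/* What a separate "started" bit used to answer, derived from the state. */
static bool rq_started(const struct rq_sketch *rq)
{
	return rq->state != RQ_IDLE;
}
```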
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq · 634f9e46
      Tejun Heo authored
      After the recent updates to use generation number and state based
      synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
      to avoid firing the same timeout multiple times.
      
      Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
      RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
      times.  This removes atomic bitops from hot paths too.
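      A sketch of using a plain per-request flag so that an expired deadline
      fires the timeout handler only once. The flag bit, the trq struct and
      timeout_scan() are hypothetical, and the plain |= assumes a single
      scanner touches the flag; the real locking rules are not modelled.

```c
#define RQF_TIMEOUT_EXPIRED_SKETCH (1u << 0)  /* hypothetical flag bit */

/* Hypothetical request with a deadline and non-atomic flags. */
struct trq {
	unsigned long deadline;
	unsigned int rq_flags;  /* plain flags, not atomic bitops */
	int timeouts_fired;
};

/* One pass of a timeout scan over 'n' requests at time 'now'. */
static void timeout_scan(struct trq *rqs, int n, unsigned long now)
{
	for (int i = 0; i < n; i++) {
		struct trq *rq = &rqs[i];

		if (now < rq->deadline)
			continue;  /* not expired yet */
		if (rq->rq_flags & RQF_TIMEOUT_EXPIRED_SKETCH)
			continue;  /* already handled: don't fire again */
		rq->rq_flags |= RQF_TIMEOUT_EXPIRED_SKETCH;
		rq->timeouts_fired++;  /* stand-in for calling ->timeout() */
	}
}
```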
      
      v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
      
      v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: make blk_abort_request() trigger timeout path · 358f70da
      Tejun Heo authored
      With issue/complete and timeout paths now using the generation number
      and state based synchronization, blk_abort_request() is the only one
      which depends on REQ_ATOM_COMPLETE for arbitrating completion.
      
      There's no reason for blk_abort_request() to be a completely separate
      path.  This patch makes blk_abort_request() piggyback on the timeout
      path instead of trying to terminate the request directly.
      
      This removes the last dependency on REQ_ATOM_COMPLETE in blk-mq.
      
      Note that this makes blk_abort_request() asynchronous - it initiates
      the abort but the actual termination will happen after a short while,
      even when the caller owns the request.  AFAICS, SCSI and ATA should be
      fine with that, and I think mtip32xx and dasd should be safe, but I'm
      not completely sure.  It'd be great if people who know the drivers
      could take a look.
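      One way to picture "piggybacking on the timeout path", assuming the
      abort simply pulls the request's deadline forward to "now" and
      schedules the timeout scan, which then handles it later. All names
      below are hypothetical; this is a model of the asynchronous behaviour,
      not the blk-mq implementation.

```c
#include <stdbool.h>

/* Hypothetical request and queue models. */
struct arq {
	unsigned long deadline;
	bool timed_out;
};

struct aq {
	struct arq *rqs;
	int nr;
	bool timeout_work_pending;  /* stands in for scheduled timeout work */
};

/* Abort by expiring the deadline; the timeout path does the rest later. */
static void model_abort_request(struct aq *q, struct arq *rq, unsigned long now)
{
	rq->deadline = now;
	q->timeout_work_pending = true;
}

/* Deferred timeout worker: handles every request whose deadline passed. */
static void model_timeout_work(struct aq *q, unsigned long now)
{
	q->timeout_work_pending = false;
	for (int i = 0; i < q->nr; i++)
		if (!q->rqs[i].timed_out && q->rqs[i].deadline <= now)
			q->rqs[i].timed_out = true;
}
```

      Note that the aborted request is not terminated inside
      model_abort_request() itself, which mirrors the asynchrony the commit
      message warns driver owners about.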
      
      v2: - Add comment explaining the lack of synchronization around
            ->deadline update as requested by Bart.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi SP <asamymuthupa@micron.com>
      Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
      Cc: Jan Hoeppner <hoeppner@linux.vnet.ibm.com>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>