1. 16 Jun, 2021 5 commits
  2. 11 Jun, 2021 30 commits
  3. 09 Jun, 2021 1 commit
  4. 08 Jun, 2021 2 commits
    • rq-qos: fix missed wake-ups in rq_qos_throttle try two · 11c7aa0d
      Jan Kara authored
      Commit 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      tried to fix a problem that a process could be sleeping in rq_qos_wait()
      without anyone to wake it up. However the fix is not complete and the
      following can still happen:
      
      CPU1 (waiter1)		CPU2 (waiter2)		CPU3 (waker)
      rq_qos_wait()		rq_qos_wait()
        acquire_inflight_cb() -> fails
      			  acquire_inflight_cb() -> fails
      
      						completes IOs, inflight
      						  decreased
        prepare_to_wait_exclusive()
      			  prepare_to_wait_exclusive()
        has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
      			  has_sleeper = !wq_has_single_sleeper() -> true
        io_schedule()		  io_schedule()
      
      Deadlock, as now there is nobody to wake up the two waiters. The logic of
      automatically blocking when there are already sleepers is really subtle
      and the only way to make it work reliably is that we check whether there
      are some waiters in the queue when adding ourselves there. That way, we
      are guaranteed that at least the first process to enter the wait queue
      will recheck the waiting condition before going to sleep and thus
      guarantee forward progress.
      
      Fixes: 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
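
The race described in this commit can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct model_waitqueue` and the `model_*` functions are hypothetical names introduced here, and the model assumes that in the kernel the emptiness check and the enqueue happen atomically under the wait queue lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait queue; not the kernel's wait_queue_head. */
struct model_waitqueue {
    size_t nr_waiters;
};

/*
 * Old scheme (modelled): look at the queue only after enqueueing, in the
 * spirit of !wq_has_single_sleeper(). If two waiters are both queued
 * before either one looks, both conclude that somebody else is sleeping.
 */
static bool model_old_has_sleeper(const struct model_waitqueue *wq)
{
    return wq->nr_waiters != 1;
}

/*
 * New scheme (modelled): report whether the queue was empty at the moment
 * the caller added itself. Assumption: check and enqueue are atomic, so
 * exactly one of several racing waiters observes an empty queue.
 */
static bool model_enqueue_was_empty(struct model_waitqueue *wq)
{
    bool was_empty = (wq->nr_waiters == 0);

    wq->nr_waiters++;
    return was_empty;
}

/* A waiter that found the queue empty must recheck the wait condition. */
static bool model_must_recheck(struct model_waitqueue *wq)
{
    bool has_sleeper = !model_enqueue_was_empty(wq);

    return !has_sleeper;
}
```

In this model the first waiter to enter the queue always rechecks the condition before sleeping, which is the forward-progress guarantee the commit message asks for; under the old scheme two queued waiters can both decide to sleep without rechecking.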
    • block: return the correct bvec when checking for gaps · c9c9762d
      Long Li authored
      After commit 07173c3e ("block: enable multipage bvecs"), a bvec can
      have multiple pages. But bio_will_gap() still assumes a single-page bvec while
      checking for merging. If the pages in the bvec go across the
      seg_boundary_mask, this check for merging can potentially succeed if only
      the 1st page is tested, and can fail if all the pages are tested.
      
      Later, when SCSI builds the SG list the same check for merging is done in
      __blk_segment_map_sg_merge() with all the pages in the bvec tested. This
      time the check may fail if the pages in bvec go across the
      seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those
      BIOs were merged). If this check fails, we end up with a broken SG list
      for drivers that assume the SG list has no offsets in intermediate pages.
      This results in incorrect pages being written to the disk.
      
      Fix this by returning the multi-page bvec when testing gaps for merging.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Signed-off-by: Long Li <longli@microsoft.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
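
The mismatch between testing only the first page and testing the whole bvec can be shown with a simplified model of a segment-boundary comparison. This is a sketch under assumptions: `struct model_bvec`, `MODEL_PAGE_SIZE`, and the `model_*` helpers are hypothetical, bvecs are represented by physical address and length only, and the comparison is modelled on checking whether two addresses fall in the same window defined by seg_boundary_mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical, simplified bvec: physically contiguous bytes. */
struct model_bvec {
    uint64_t phys_addr;
    uint32_t len;
};

/*
 * True if the previous segment's address and the last byte of `next`
 * lie in the same boundary window, i.e. merging would not cross the
 * segment boundary described by seg_boundary_mask.
 */
static bool model_within_boundary(uint64_t prev_addr,
                                  const struct model_bvec *next,
                                  uint64_t seg_boundary_mask)
{
    uint64_t next_last = next->phys_addr + next->len - 1;

    return (prev_addr | seg_boundary_mask) ==
           (next_last | seg_boundary_mask);
}

/* Truncate a multi-page bvec to its first page (the single-page view). */
static struct model_bvec model_first_page(const struct model_bvec *bv)
{
    struct model_bvec page = *bv;
    uint32_t room = MODEL_PAGE_SIZE -
                    (uint32_t)(bv->phys_addr % MODEL_PAGE_SIZE);

    if (page.len > room)
        page.len = room;
    return page;
}
```

For instance, with a 64 KiB boundary (mask 0xFFFF), a previous segment at 0xD000, and a three-page bvec starting at 0xE000, the first-page view stays inside the window while the full bvec crosses 0x10000. The two checks therefore disagree, which is the inconsistency the commit removes by testing the multi-page bvec in both places.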
  5. 03 Jun, 2021 2 commits
    • block: Update blk_update_request() documentation · 7cc2623d
      Bart Van Assche authored
      Although the original intent was to use blk_update_request() in stacking
      block drivers only, it is used much more widely today. Reflect this in the
      documentation block above this function. See also:
      * commit 32fab448 ("block: add request update interface").
      * commit 2e60e022 ("block: clean up request completion API").
      * commit ed6565e7 ("block: handle partial completions for special
        payload requests").
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20210519175226.8853-1-bvanassche@acm.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Do not pull requests from the scheduler when we cannot dispatch them · 61347154
      Jan Kara authored
      Provided the device driver does not implement dispatch budget accounting
      (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
      requests from the IO scheduler as long as it is willing to give out any.
      That defeats scheduling heuristics inside the scheduler by creating
      a false impression that the device can take more IO when it in fact
      cannot.
      
      For example, with the BFQ IO scheduler on top of a virtio-blk device,
      setting the blkio cgroup weight has barely any impact on observed
      throughput of async IO because __blk_mq_do_dispatch_sched() always sucks
      out all the IO queued in BFQ. BFQ first submits IO from higher weight
      cgroups but when that is all dispatched, it will give out IO of lower
      weight cgroups as well. And then we have to wait for all this IO to be
      dispatched to the disk (which means a lot of it actually has to complete)
      before the IO scheduler is queried again for dispatching more requests.
      This completely destroys any service differentiation.
      
      So grab a request tag for a request pulled out of the IO scheduler already
      in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
      cannot get it because we are unlikely to be able to dispatch it. That
      way only a single request is going to wait in the dispatch list for some
      tag to free.

      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
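
The dispatch behaviour described in this commit can be sketched as a toy loop. This is a user-space model under assumptions, not the code of __blk_mq_do_dispatch_sched(): `struct model_sched`, `struct model_device`, and the `model_*` functions are hypothetical, the scheduler is reduced to a counter of queued requests, and the device to a counter of free tags.

```c
#include <stdbool.h>

/* Hypothetical scheduler: only tracks how many requests it still holds. */
struct model_sched {
    int queued;
};

/* Hypothetical device: only tracks how many request tags are free. */
struct model_device {
    int free_tags;
};

/* Try to take one tag; fails when the device has none left. */
static bool model_get_tag(struct model_device *dev)
{
    if (dev->free_tags == 0)
        return false;
    dev->free_tags--;
    return true;
}

/*
 * Pull requests from the scheduler, taking a tag for each one right away.
 * Stop at the first request that cannot get a tag, so at most one request
 * waits for a tag outside the scheduler.
 * Returns the number of requests pulled; *waiting_for_tag is set to the
 * number of pulled requests that are still without a tag (0 or 1).
 */
static int model_dispatch(struct model_sched *sched,
                          struct model_device *dev,
                          int *waiting_for_tag)
{
    int pulled = 0;

    *waiting_for_tag = 0;
    while (sched->queued > 0) {
        sched->queued--;
        pulled++;
        if (!model_get_tag(dev)) {
            *waiting_for_tag = 1;
            break;
        }
    }
    return pulled;
}
```

With ten queued requests and three free tags, this loop pulls four requests (three tagged, one waiting) and leaves six inside the scheduler, where its weight-based ordering can still apply to them. A loop without the tag check would drain all ten.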