Commits · 56f99b8d06ef1ed1c9730948f9f05ac2b930a20b · Kirill Smelkov / linux

13 Sep, 2022 1 commit

block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait · 56f99b8d

Stefan Roesch authored Sep 12, 2022

Today blk_queue_enter() and __bio_queue_enter() return -EBUSY for the
nowait code path. This is not correct: they should return -EAGAIN
instead.

This problem was detected by fio. The following command exposed the
above problem:

t/io_uring -p0 -d128 -b4096 -s32 -c32 -F1 -B0 -R0 -X1 -n24 -P1 -u1 -O0 /dev/ng0n1

By applying the patch, the retry case is handled correctly in the slow
path.
Signed-off-by: Stefan Roesch <shr@fb.com>
Fixes: bfd343aa ("blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

56f99b8d

09 Sep, 2022 1 commit

block: add missing request flags to debugfs code · 745ed372

Jens Axboe authored Sep 08, 2022

We're missing TIMED_OUT and RESV. Particularly the former is handy
for debugging, let's get them added.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

745ed372

08 Sep, 2022 1 commit

Merge tag 'nvme-6.0-2022-09-08' of git://git.infradead.org/nvme into block-6.0 · 75c523ac

Jens Axboe authored Sep 08, 2022

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.1

 - fix a use after free in nvmet (Bart Van Assche)
 - fix a use after free when detecting digest errors (Sagi Grimberg)
 - fix regression that causes sporadic TCP requests to time out
   (Sagi Grimberg)
 - fix two off by ones errors in the nvmet ZNS support
   (Dennis Maisenbacher)
 - requeue aen after firmware activation (Keith Busch)"

* tag 'nvme-6.0-2022-09-08' of git://git.infradead.org/nvme:
  nvme: requeue aen after firmware activation
  nvmet: fix mar and mor off-by-one errors
  nvme-tcp: fix regression that causes sporadic requests to time out
  nvme-tcp: fix UAF when detecting digest errors
  nvmet: fix a use-after-free

75c523ac

07 Sep, 2022 2 commits

nvme: requeue aen after firmware activation · 371a982c

Keith Busch authored Sep 01, 2022

The driver prevents async event work while handling a processing paused
event, but someone needs to restart it after the controller returns to a
live state.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216400Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

371a982c

nvmet: fix mar and mor off-by-one errors · b7e97872

Dennis Maisenbacher authored Sep 06, 2022

Maximum Active Resources (MAR) and Maximum Open Resources (MOR) are 0's
based vales where a value of 0xffffffff indicates that there is no limit.

Decrement the values that are returned by bdev_max_open_zones and
bdev_max_active_zones as the block layer helpers are not 0's based.
A 0 returned by the block layer helpers indicates no limit, thus convert
it to 0xffffffff (U32_MAX).

Fixes: aaf2e048 ("nvmet: add ZBD over ZNS backend support")
Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

b7e97872

06 Sep, 2022 2 commits

nvme-tcp: fix regression that causes sporadic requests to time out · 3770a42b

Sagi Grimberg authored Sep 05, 2022

When we queue requests, we strive to batch as much as possible and also
signal the network stack that more data is about to be sent over a socket
with MSG_SENDPAGE_NOTLAST. This flag looks at the pending requests queued
as well as queue->more_requests that is derived from the block layer
last-in-batch indication.

We set more_request=true when we flush the request directly from
.queue_rq submission context (in nvme_tcp_send_all), however this is
wrongly assuming that no other requests may be queued during the
execution of nvme_tcp_send_all.

Due to this, a race condition may happen where:

 1. request X is queued as !last-in-batch
 2. request X submission context calls nvme_tcp_send_all directly
 3. nvme_tcp_send_all is preempted and schedules to a different cpu
 4. request Y is queued as last-in-batch
 5. nvme_tcp_send_all context sends request X+Y, however signals for
    both MSG_SENDPAGE_NOTLAST because queue->more_requests=true.

==> none of the requests is pushed down to the wire as the network
stack is waiting for more data, both requests timeout.

To fix this, we eliminate queue->more_requests and only rely on
the queue req_list and send_list to be not-empty.

Fixes: 122e5b9f ("nvme-tcp: optimize network stack with setting msg flags according to batch size")
Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

3770a42b

nvme-tcp: fix UAF when detecting digest errors · 160f3549

Sagi Grimberg authored Sep 05, 2022

We should also bail from the io_work loop when we set rd_enabled to true,
so we don't attempt to read data from the socket when the TCP stream is
already out-of-sync or corrupted.

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

160f3549

05 Sep, 2022 1 commit

nvmet: fix a use-after-free · 6a02a61e

Bart Van Assche authored Aug 12, 2022

Fix the following use-after-free complaint triggered by blktests nvme/004:

BUG: KASAN: user-memory-access in blk_mq_complete_request_remote+0xac/0x350
Read of size 4 at addr 0000607bd1835943 by task kworker/13:1/460
Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop]
Call Trace:
 show_stack+0x52/0x58
 dump_stack_lvl+0x49/0x5e
 print_report.cold+0x36/0x1e2
 kasan_report+0xb9/0xf0
 __asan_load4+0x6b/0x80
 blk_mq_complete_request_remote+0xac/0x350
 nvme_loop_queue_response+0x1df/0x275 [nvme_loop]
 __nvmet_req_complete+0x132/0x4f0 [nvmet]
 nvmet_req_complete+0x15/0x40 [nvmet]
 nvmet_execute_io_connect+0x18a/0x1f0 [nvmet]
 nvme_loop_execute_work+0x20/0x30 [nvme_loop]
 process_one_work+0x56e/0xa70
 worker_thread+0x2d1/0x640
 kthread+0x183/0x1c0
 ret_from_fork+0x1f/0x30

Cc: stable@vger.kernel.org
Fixes: a07b4970 ("nvmet: add a generic NVMe target")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

6a02a61e

03 Sep, 2022 1 commit

block: don't add partitions if GD_SUPPRESS_PART_SCAN is set · 748008e1

Ming Lei authored Aug 23, 2022

Commit b9684a71 ("block, loop: support partitions without scanning")
adds GD_SUPPRESS_PART_SCAN for replacing part function of
GENHD_FL_NO_PART. But looks blk_add_partitions() is missed, since
loop doesn't want to add partitions if GENHD_FL_NO_PART was set.
And it causes regression on libblockdev (as called from udisks) which
operates with the LO_FLAGS_PARTSCAN.

Fixes the issue by not adding partitions if GD_SUPPRESS_PART_SCAN is
set.

Fixes: b9684a71 ("block, loop: support partitions without scanning")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823103819.395776-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

748008e1

02 Sep, 2022 1 commit

Documentation: document ublk · 7a3d2225

Ming Lei authored Sep 02, 2022

Add documentation for ublk subsystem. It was supposed to be documented when
merging the driver, but missing at that time.

Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[axboe: correct MAINTAINERS addition]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7a3d2225

01 Sep, 2022 1 commit

Merge tag 'nvme-6.0-2022-09-01' of git://git.infradead.org/nvme into block-6.0 · 25657798

Jens Axboe authored Sep 01, 2022

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.0

 - error handling fix for the new auth code (Hannes Reinecke)
 - fix unhandled tcp states in nvmet_tcp_state_change (Maurizio Lombardi)
 - add NVME_QUIRK_BOGUS_NID for Lexar NM610 (Shyamin Ayesh)"

* tag 'nvme-6.0-2022-09-01' of git://git.infradead.org/nvme:
  nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change()
  nvmet-auth: add missing goto in nvmet_setup_auth()
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM610

25657798

31 Aug, 2022 3 commits

nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change() · 478814a5

Maurizio Lombardi authored Aug 29, 2022

TCP_FIN_WAIT2 and TCP_LAST_ACK were not handled, the connection is closing
so we can ignore them and avoid printing the "unhandled state"
warning message.

[ 1298.852386] nvmet_tcp: queue 2 unhandled state 5
[ 1298.879112] nvmet_tcp: queue 7 unhandled state 5
[ 1298.884253] nvmet_tcp: queue 8 unhandled state 5
[ 1298.889475] nvmet_tcp: queue 9 unhandled state 5

v2: Do not call nvmet_tcp_schedule_release_queue(), just ignore
the fin_wait2 and last_ack states.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

478814a5

nvmet-auth: add missing goto in nvmet_setup_auth() · da0342a3

Hannes Reinecke authored Aug 24, 2022

There's a goto missing in nvmet_setup_auth(), causing a kernel oops
when nvme_auth_extract_key() fails.
Reported-by: Tal Lossos <tallossos@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

da0342a3

nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM610 · 200dccd0

Shyamin Ayesh authored Aug 26, 2022

Lexar NM610 reports bogus eui64 values that appear to be the same across
all drives. Quirk them out so they are not marked as "non globally unique"
duplicates.
Signed-off-by: Shyamin Ayesh <me@shyamin.com>
[patch formatting]
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

200dccd0

24 Aug, 2022 6 commits

Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.0 · 645b5ed8

Jens Axboe authored Aug 24, 2022

Pull MD fixes from Song:

"1. Fix for clustered raid, by Guoqing Jiang.
 2. req_op fix, by Bart Van Assche.
 3. Fix race condition in raid recreate, by David Sloan."

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: call __md_stop_writes in md_stop
  Revert "md-raid: destroy the bitmap after destroying the thread"
  md: Flush workqueue md_rdev_misc_wq in md_alloc()
  md/raid10: Fix the data type of an r10_sync_page_io() argument

645b5ed8

md: call __md_stop_writes in md_stop · 0dd84b31

Guoqing Jiang authored Aug 17, 2022

From the link [1], we can see raid1d was running even after the path
raid_dtr -> md_stop -> __md_stop.

Let's stop write first in destructor to align with normal md-raid to
fix the KASAN issue.

[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a

Fixes: 48df498d ("md: move bitmap_destroy to the beginning of __md_stop")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>

0dd84b31

Revert "md-raid: destroy the bitmap after destroying the thread" · 1d258758

Guoqing Jiang authored Aug 17, 2022

This reverts commit e151db8e. Because it
obviously breaks clustered raid as noticed by Neil though it fixed KASAN
issue for dm-raid, let's revert it and fix KASAN issue in next commit.

[1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470

Fixes: e151db8e ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>

1d258758

md: Flush workqueue md_rdev_misc_wq in md_alloc() · 5e8daf90

David Sloan authored Aug 11, 2022

A race condition still exists when removing and re-creating md devices
in test cases. However, it is only seen on some setups.

The race condition was tracked down to a reference still being held
to the kobject by the rdev in the md_rdev_misc_wq which will be released
in rdev_delayed_delete().

md_alloc() waits for previous deletions by waiting on the md_misc_wq,
but the md_rdev_misc_wq may still be holding a reference to a recently
removed device.

To fix this, also flush the md_rdev_misc_wq in md_alloc().
Signed-off-by: David Sloan <david.sloan@eideticom.com>
[logang@deltatee.com: rewrote commit message]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>

5e8daf90

md/raid10: Fix the data type of an r10_sync_page_io() argument · 265ad47a

Bart Van Assche authored Aug 10, 2022

Fix the following sparse warning:

drivers/md/raid10.c:2647:60: sparse: sparse: incorrect type in argument 5 (different base types) @@     expected restricted blk_opf_t [usertype] opf @@     got int rw @@

This patch does not change any functionality since REQ_OP_READ = READ = 0
and since REQ_OP_WRITE = WRITE = 1.

Cc: Rong A Chen <rong.a.chen@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: 4ce4c73f ("md/core: Combine two sync_page_io() arguments")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Song Liu <song@kernel.org>

265ad47a

loop: Check for overflow while configuring loop · c490a0b5

Siddh Raman Pant authored Aug 23, 2022

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
	lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
	WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e

Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.meSigned-off-by: Jens Axboe <axboe@kernel.dk>

c490a0b5

19 Aug, 2022 1 commit

blk-mq: fix io hung due to missing commit_rqs · 65fac0d5

Yu Kuai authored Jul 26, 2022

Currently, in virtio_scsi, if 'bd->last' is not set to true while
dispatching request, such io will stay in driver's queue, and driver
will wait for block layer to dispatch more rqs. However, if block
layer failed to dispatch more rq, it should trigger commit_rqs to
inform driver.

There is a problem in blk_mq_try_issue_list_directly() that commit_rqs
won't be called:

// assume that queue_depth is set to 1, list contains two rq
blk_mq_try_issue_list_directly
 blk_mq_request_issue_directly
 // dispatch first rq
 // last is false
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // succeed to get first budget
   __blk_mq_issue_directly
    scsi_queue_rq
     cmd->flags |= SCMD_LAST
      virtscsi_queuecommand
       kick = (sc->flags & SCMD_LAST) != 0
       // kick is false, first rq won't issue to disk
 queued++

 blk_mq_request_issue_directly
 // dispatch second rq
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // failed to get second budget
 ret == BLK_STS_RESOURCE
  blk_mq_request_bypass_insert
 // errors is still 0

 if (!list_empty(list) || errors && ...)
  // won't pass, commit_rqs won't be called

In this situation, first rq relied on second rq to dispatch, while
second rq relied on first rq to complete, thus they will both hung.

Fix the problem by also treat 'BLK_STS_*RESOURCE' as 'errors' since
it means that request is not queued successfully.

Same problem exists in blk_mq_dispatch_rq_list(), 'BLK_STS_*RESOURCE'
can't be treated as 'errors' here, fix the problem by calling
commit_rqs if queue_rq return 'BLK_STS_*RESOURCE'.

Fixes: d666ba98 ("blk-mq: add mq_ops->commit_rqs()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

65fac0d5

18 Aug, 2022 2 commits

blk-mq: run queue no matter whether the request is the last request · d3b38596

Yufen Yu authored Aug 03, 2022

We do test on a virtio scsi device (/dev/sda) and the default mq
scheduler is 'none'. We found a IO hung as following:

blk_finish_plug
  blk_mq_plug_issue_direct
      scsi_mq_get_budget
      //get budget_token fail and sdev->restarts=1

			     	 scsi_end_request
				   scsi_run_queue_async
                                   //sdev->restart=0 and run queue

     blk_mq_request_bypass_insert
        //add request to hctx->dispatch list

  //continue to dispath plug list
  blk_mq_dispatch_plug_list
      blk_mq_try_issue_list_directly
        //success issue all requests from plug list

After .get_budget fail, scsi_mq_get_budget will increase 'restarts'.
Normally, it will run hw queue when io complete and set 'restarts'
as 0. But if we run queue before adding request to the dispatch list
and blk_mq_dispatch_plug_list also success issue all requests, then
on one will run queue, and the request will be stall in the dispatch
list and cannot complete forever.

It is wrong to use last request of plug list to decide if run queue is
needed since all the remained requests in plug list may be from other
hctxs. To fix the bug, pass run_queue as true always to
blk_mq_request_bypass_insert().
Fix-suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Fixes: dc5fc361 ("block: attempt direct issue of plug list")
Link: https://lore.kernel.org/r/20220803023355.3687360-1-yuyufen@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d3b38596

blk-mq: remove unused function blk_mq_queue_stopped() · a8239f03

Yu Kuai authored Aug 18, 2022

blk_mq_queue_stopped() doesn't have any caller, which was found by
code coverage test, thus remove it.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220818063555.3741222-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

a8239f03

16 Aug, 2022 3 commits

ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work · e6190dd0

ZiyangZhang authored Aug 15, 2022

In ublk_queue_rq(), Assume current request is a re-issued request aborted
previously in monitor_work because the ubq_daemon(ioucmd's task) is
PF_EXITING. For this request, we cannot call
io_uring_cmd_complete_in_task() anymore because at that moment io_uring
context may be freed in case that no inflight ioucmd exists. Otherwise,
we may cause null-deref in ctx->fallback_work.

Add a check on UBLK_IO_FLAG_ABORTED to prevent the above situation. This
check is safe and makes sense.

Note: monitor_work sets UBLK_IO_FLAG_ABORTED and ends this request
(releasing the tag). Then the request is restarted(allocating the tag)
and we are here. Since releasing/allocating a tag implies smp_mb(),
finding UBLK_IO_FLAG_ABORTED guarantees that here is a re-issued request
aborted previously.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220815023633.259825-4-ZiyangZhang@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e6190dd0

ublk_drv: update comment for __ublk_fail_req() · bb241747

ZiyangZhang authored Aug 15, 2022

Since __ublk_rq_task_work always fails requests immediately during
exiting, __ublk_fail_req() is only called from abort context during
exiting. So lock is unnecessary.
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220815023633.259825-3-ZiyangZhang@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

bb241747

ublk_drv: check ubq_daemon_is_dying() in __ublk_rq_task_work() · 966120b5

ZiyangZhang authored Aug 15, 2022

Replace direct check on PF_EXITING in __ublk_rq_task_work() by the
existing wrapper. Also inline ubq_daemon_is_dying().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220815023633.259825-2-ZiyangZhang@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

966120b5

13 Aug, 2022 1 commit

ublk_drv: update iod->addr for UBLK_IO_NEED_GET_DATA · 92cb6e2e

ZiyangZhang authored Aug 10, 2022

If ublksrv sends UBLK_IO_NEED_GET_DATA with new allocated io buffer, we
have to update iod->addr in task_work before calling io_uring_cmd_done().
Then usersapce target can handle (write)io request with the new io
buffer reading from updated iod.

Without this change, userspace target may touch a wrong io buffer!
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220810055212.66417-1-ZiyangZhang@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

92cb6e2e

12 Aug, 2022 1 commit

block: Do not call blk_put_queue() if gendisk allocation fails · aa0c680c

Rafael Mendonca authored Aug 11, 2022

Commit 6f8191fd ("block: simplify disk shutdown") removed the call
to blk_get_queue() during gendisk allocation but missed to remove the
corresponding cleanup code blk_put_queue() for it. Thus, if the gendisk
allocation fails, the request_queue refcount gets decremented and
reaches 0, causing blk_mq_release() to be called with a hctx still
alive. That triggers a WARNING report, as found by syzkaller:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 23016 at block/blk-mq.c:3881
blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881
[...] stripped
RIP: 0010:blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881
[...] stripped
Call Trace:
 <TASK>
 blk_release_queue+0x153/0x270 block/blk-sysfs.c:780
 kobject_cleanup lib/kobject.c:673 [inline]
 kobject_release lib/kobject.c:704 [inline]
 kref_put include/linux/kref.h:65 [inline]
 kobject_put+0x1c8/0x540 lib/kobject.c:721
 __alloc_disk_node+0x4f7/0x610 block/genhd.c:1388
 __blk_mq_alloc_disk+0x13b/0x1f0 block/blk-mq.c:3961
 loop_add+0x3e2/0xaf0 drivers/block/loop.c:1978
 loop_control_ioctl+0x133/0x620 drivers/block/loop.c:2150
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...] stripped

Fixes: 6f8191fd ("block: simplify disk shutdown")
Reported-by: syzbot+31c9594f6e43b9289b25@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220811232338.254673-1-rafaelmendsr@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

aa0c680c

11 Aug, 2022 2 commits

Merge tag 'nvme-6.0-2022-08-11' of git://git.infradead.org/nvme into block-6.0 · cd83cd55

Jens Axboe authored Aug 11, 2022

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.0

 - print nvme connect Linux error codes properly (Amit Engel)
 - fix the fc_appid_store return value (Christoph Hellwig)
 - fix a typo in an error message (Christophe JAILLET)
 - add another non-unique identifier quirk (Dennis P. Kliem)
 - check if the queue is allocated before stopping it in nvme-tcp
   (Maurizio Lombardi)
 - restart admin queue if the caller needs to restart queue in nvme-fc
   (Ming Lei)
 - use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang Xiaoxu)"

* tag 'nvme-6.0-2022-08-11' of git://git.infradead.org/nvme:
  nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70
  nvme-tcp: check if the queue is allocated before stopping it
  nvme-fabrics: Fix a typo in an error message
  nvme-fabrics: parse nvme connect Linux error codes
  nvmet-auth: use kmemdup instead of kmalloc + memcpy
  nvme-fc: fix the fc_appid_store return value
  nvme-fc: restart admin queue if the caller needs to restart queue

cd83cd55

nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70 · f37527a0

Dennis P. Kliem authored Aug 11, 2022

ADATA XPG GAMMIX S70 reports bogus eui64 values that appear to be the same
across all drives. Quirk them out so they are not marked as "non globally
unique" duplicates.
Signed-off-by: Dennis P. Kliem <dpkliem@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

f37527a0

10 Aug, 2022 6 commits

nvme-tcp: check if the queue is allocated before stopping it · 2bff487f

Maurizio Lombardi authored Aug 01, 2022

When an error is detected and the host reconnects, the
nvme_tcp_error_recovery_work() function is called and starts
tearing down the io queues and de-allocating them;
If at the same time the "nvme" process deletes the controller via sysfs,
the nvme_tcp_delete_ctrl() gets called and waits until the
nvme_tcp_error_recovery_work() finishes its job; then starts
tearing down the io queues, but at this point they have already
been freed and the mutexes are destroyed.

Calling mutex_lock() against a destroyed mutex triggers a warning:

[ 1299.025575] nvme nvme1: Reconnecting in 10 seconds...
[ 1299.636449] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[ 1299.645262] ------------[ cut here ]------------
[ 1299.649949] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 1299.649971] WARNING: CPU: 4 PID: 104150 at kernel/locking/mutex.c:579 __mutex_lock+0x2d0/0x7dc

[ 1299.717934] CPU: 4 PID: 104150 Comm: nvme
[ 1299.828075] Call trace:
[ 1299.830526]  __mutex_lock+0x2d0/0x7dc
[ 1299.834203]  mutex_lock_nested+0x64/0xd4
[ 1299.838139]  nvme_tcp_stop_queue+0x54/0xe0 [nvme_tcp]
[ 1299.843211]  nvme_tcp_teardown_io_queues.part.0+0x90/0x280 [nvme_tcp]
[ 1299.849672]  nvme_tcp_delete_ctrl+0x6c/0xf0 [nvme_tcp]
[ 1299.854831]  nvme_do_delete_ctrl+0x108/0x120 [nvme_core]
[ 1299.860181]  nvme_sysfs_delete+0xec/0xf0 [nvme_core]
[ 1299.865179]  dev_attr_store+0x40/0x70

Fix the warning by checking if the queues are allocated
in the nvme_tcp_stop_queue(). If they are not, it makes no
sense to try to stop them.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2bff487f

nvme-fabrics: Fix a typo in an error message · c50cd03d

Christophe JAILLET authored Aug 06, 2022

A 'c' is missing.
s/fabris/fabrics/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c50cd03d

nvme-fabrics: parse nvme connect Linux error codes · ec9e96b5

Amit Engel authored Aug 01, 2022

This fixes the assumption that errval is an unsigned nvme error
Signed-off-by: Amit Engel <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ec9e96b5

nvmet-auth: use kmemdup instead of kmalloc + memcpy · 14446f9a

Zhang Xiaoxu authored Jul 26, 2022

For code neat purpose, we can use kmemdup to replace
kmalloc + memcpy.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

14446f9a

nvme-fc: fix the fc_appid_store return value · 9317d001

Christoph Hellwig authored Aug 06, 2022

"nvme-fc: fold t fc_update_appid into fc_appid_store" accidentally
changed the userspace interface for the appid attribute, because the code
that decrements "count" to remove a trailing '\n' in the parsing results
in the decremented value being incorrectly be returned from the sysfs
write.  Fix this by keeping an orig_count variable for the full length
of the write.

Fixes: c814153c ("nvme-fc: fold t fc_update_appid into fc_appid_store")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by: Muneendra Kumar M <muneendra.kumar@broadcom.com>

9317d001

nvme-fc: restart admin queue if the caller needs to restart queue · 6fb271f1

Ming Lei authored Jul 21, 2022

Without restarting admin queue in __nvme_fc_abort_outstanding_ios(),
it leaves controller not capable of handling admin pt request, and
causes io hang.

Fixes it by restarting admin queue if the caller of __nvme_fc_abort_outstanding_ios
requires to restart queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

6fb271f1

05 Aug, 2022 4 commits

Merge tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block · fa9db655

Linus Torvalds authored Aug 04, 2022

Pull block driver updates from Jens Axboe:

 - NVMe pull requests via Christoph:
      - add support for In-Band authentication (Hannes Reinecke)
      - handle the persistent internal error AER (Michael Kelley)
      - use in-capsule data for TCP I/O queue connect (Caleb Sander)
      - remove timeout for getting RDMA-CM established event (Israel
        Rukshin)
      - misc cleanups (Joel Granados, Sagi Grimberg, Chaitanya Kulkarni,
        Guixin Liu, Xiang wangx)
      - use command_id instead of req->tag in trace_nvme_complete_rq()
        (Bean Huo)
      - various fixes for the new authentication code (Lukas Bulwahn,
        Dan Carpenter, Colin Ian King, Chaitanya Kulkarni, Hannes
        Reinecke)
      - small cleanups (Liu Song, Christoph Hellwig)
      - restore compat_ioctl support (Nick Bowler)
      - make a nvmet-tcp workqueue lockdep-safe (Sagi Grimberg)
      - enable generic interface (/dev/ngXnY) for unknown command sets
        (Joel Granados, Christoph Hellwig)
      - don't always build constants.o (Christoph Hellwig)
      - print the command name of aborted commands (Christoph Hellwig)

 - MD pull requests via Song:
      - Improve raid5 lock contention, by Logan Gunthorpe.
      - Misc fixes to raid5, by Logan Gunthorpe.
      - Fix race condition with md_reap_sync_thread(), by Guoqing Jiang.
      - Fix potential deadlock with raid5_quiesce and
        raid5_get_active_stripe, by Logan Gunthorpe.
      - Refactoring md_alloc(), by Christoph"
      - Fix md disk_name lifetime problems, by Christoph Hellwig
      - Convert prepare_to_wait() to wait_woken() api, by Logan
        Gunthorpe;
      - Fix sectors_to_do bitmap issue, by Logan Gunthorpe.

 - Work on unifying the null_blk module parameters and configfs API
   (Vincent)

 - drbd bitmap IO error fix (Lars)

 - Set of rnbd fixes (Guoqing, Md Haris)

 - Remove experimental marker on bcache async device registration (Coly)

 - Series from cleaning up the bio splitting (Christoph)

 - Removal of the sx8 block driver. This hardware never really
   widespread, and it didn't receive a lot of attention after the
   initial merge of it back in 2005 (Christoph)

 - A few fixes for s390 dasd (Eric, Jiang)

 - Followup set of fixes for ublk (Ming)

 - Support for UBLK_IO_NEED_GET_DATA for ublk (ZiyangZhang)

 - Fixes for the dio dma alignment (Keith)

 - Misc fixes and cleanups (Ming, Yu, Dan, Christophe

* tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block: (136 commits)
  s390/dasd: Establish DMA alignment
  s390/dasd: drop unexpected word 'for' in comments
  ublk_drv: add support for UBLK_IO_NEED_GET_DATA
  ublk_cmd.h: add one new ublk command: UBLK_IO_NEED_GET_DATA
  ublk_drv: cleanup ublksrv_ctrl_dev_info
  ublk_drv: add SET_PARAMS/GET_PARAMS control command
  ublk_drv: fix ublk device leak in case that add_disk fails
  ublk_drv: cancel device even though disk isn't up
  block: fix leaking page ref on truncated direct io
  block: ensure bio_iov_add_page can't fail
  block: ensure iov_iter advances for added pages
  drivers:md:fix a potential use-after-free bug
  md/raid5: Ensure batch_last is released before sleeping for quiesce
  md/raid5: Move stripe_request_ctx up
  md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage()
  md/raid5: Make is_inactive_blocked() helper
  md/raid5: Refactor raid5_get_active_stripe()
  block: pass struct queue_limits to the bio splitting helpers
  block: move bio_allowed_max_sectors to blk-merge.c
  block: move the call to get_max_io_size out of blk_bio_segment_split
  ...

fa9db655

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · e4952747

Linus Torvalds authored Aug 04, 2022

Pull rdma updates from Jason Gunthorpe:
 "This cycle we got a new RDMA driver "ERDMA" for the Alibaba cloud
  environment. Otherwise the changes are dominated by rxe fixes.

  There is another RDMA driver on the list that might get merged next
  cycle, 'MANA' for the Azure cloud environment.

  Summary:

   - Bug fixes and small features for irdma, hns, siw, qedr, hfi1, mlx5

   - General spelling/grammer fixes

   - rdma cm can follow changes in neighbours for control packets

   - Significant amounts of rxe fixes and spec compliance changes

   - Use the modern NAPI API

   - Use the bitmap API instead of open coding

   - Performance improvements for rtrs

   - Add the ERDMA driver for Alibaba cloud

   - Fix a use after free bug in SRP"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
  RDMA/ib_srpt: Unify checking rdma_cm_id condition in srpt_cm_req_recv()
  RDMA/rxe: Fix error unwind in rxe_create_qp()
  RDMA/mlx5: Add missing check for return value in get namespace flow
  RDMA/rxe: Split qp state for requester and completer
  RDMA/rxe: Generate error completion for error requester QP state
  RDMA/rxe: Update wqe_index for each wqe error completion
  RDMA/srpt: Fix a use-after-free
  RDMA/srpt: Introduce a reference count in struct srpt_device
  RDMA/srpt: Duplicate port name members
  IB/qib: Fix repeated "in" within comments
  RDMA/erdma: Add driver to kernel build environment
  RDMA/erdma: Add the ABI definitions
  RDMA/erdma: Add the erdma module
  RDMA/erdma: Add connection management (CM) support
  RDMA/erdma: Add verbs implementation
  RDMA/erdma: Add verbs header file
  RDMA/erdma: Add event queue implementation
  RDMA/erdma: Add cmdq implementation
  RDMA/erdma: Add main include file
  RDMA/erdma: Add the hardware related definitions
  ...

e4952747

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 746fc76b

Linus Torvalds authored Aug 04, 2022

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, qla2xx, target, lpfc, smartpqi,
  mpi3mr).

  The main driver change that might cause issues on down the road is the
  conversion of some of our oldest surviving drivers to the DMA API
  (should only affect m68k).

  The only major core change is the rework of async resume; the rest are
  either completely trivial or for updating deprecated APIs"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (195 commits)
  scsi: target: Remove XDWRITEREAD emulated support
  scsi: megaraid: Remove the static variable initialisation
  scsi: ch: Do not initialise statics to 0
  scsi: ufs: core: Fix spelling mistake "Cannnot" -> "Cannot"
  scsi: target: iscsi: Do not require target authentication
  scsi: target: iscsi: Allow AuthMethod=None
  scsi: target: iscsi: Support base64 in CHAP
  scsi: target: iscsi: Add support for extended CDB AHS
  scsi: ufs: dt-bindings: Add SC8280XP binding
  scsi: target: iscsi: Fix clang -Wformat warnings
  scsi: ufs: core: Read device property for ref clock
  scsi: libsas: Resume SAS host for phy reset or enable via sysfs
  scsi: hisi_sas: Modify v3 HW SATA completion error processing
  scsi: hisi_sas: Relocate DMA unmap of SMP task
  scsi: hisi_sas: Remove unnecessary variable to hold DMA map elements
  scsi: hisi_sas: Call hisi_sas_slave_configure() from slave_configure_v3_hw()
  scsi: mpi3mr: Delete a stray tab
  scsi: mpi3mr: Unlock on error path
  scsi: mpi3mr: Reduce VD queue depth on detecting throttling
  scsi: mpi3mr: Resource Based Metering
  ...

746fc76b

Merge tag 'mmc-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 328141e5

Linus Torvalds authored Aug 04, 2022

Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Add support for the asynchronous SDIO wakeup interrupts
   - Skip redundant evaluation of eMMC HS400 caps when no-MMC-cap
   - Add support to store error stats from host drivers
   - Extend debugfs to show error stats from host drivers
   - Add single I/O read support in the recovery path for 4k sector cards

  MMC host:
   - dw_mmc-exynos: Convert corresponding DT bindings to the dtschema
   - dw_mmc-rockchip: Add support for the Rockchip RV1126 variant
   - mmc_spi: Convert corresponding DT bindings to the dtschema
   - mtk-sd: Extend support for interrupts/pinctrls for SDIO low-power mode
   - mtk-sd: Add support for SDIO wake irqs
   - mtk-sd: Add support for the Mediatek MT8188 variant
   - renesas_sdhi: Drop redundant manual tap correction for newer SoCs
   - renesas_sdhi: Add support for the R-Car S4-8 and generic Gen4 variants
   - sdhci/cqhci: Add support to capture stats from host errors
   - sdhci-brcmstb: Add ability to increase max clock rate for SDIO on 72116b0
   - sdhci-msm: Add support for the MSM8998 and SM8450 variant
   - sdhci-of-at91: Fixup UHS-I mode by rewriting of MC1R
   - sdhci-of-dwcmshc: Add support for the Rockchip rk3588 variant
   - sdhci-of-dwcmshc: Enable reset support for the Rockchip variants
   - sdhci-pci-gli: Improve I/O read/write performance for GL9763E
   - sdhci-s3c: Convert corresponding DT bindings to the dtschema
   - tmio: Avoid glitches when resetting

  MEMSTICK core:
   - A couple of minor fixes and cleanups"

* tag 'mmc-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (61 commits)
  mmc: mediatek: add support for SDIO eint wakup IRQ
  mmc: core: Add support for SDIO wakeup interrupt
  dt-bindings: mmc: mtk-sd: extend interrupts and pinctrls properties
  dt-bindings: mmc: rockchip-dw-mshc: Document Rockchip RV1126
  mmc: renesas_sdhi: newer SoCs don't need manual tap correction
  mmc: cavium-thunderx: Add of_node_put() when breaking out of loop
  mmc: cavium-octeon: Add of_node_put() when breaking out of loop
  mmc: core: quirks: Add of_node_put() when breaking out of loop
  mmc: sdhci-brcmstb: use clk_get_rate(base_clk) in PM resume
  dt-bindings: mmc: sdhci-msm: Document the SM8450 compatible
  mmc: sdhci-msm: drop redundant of_device_id entries
  dt-bindings: mmc: sdhci-msm: add MSM8998
  mmc: block: Add single read for 4k sector cards
  mmc: mxcmmc: Use mmc_card_sdio macro
  mmc: core: Use mmc_card_* macro and add a new for the sd_combo type
  dt-bindings: mmc: sdhci-msm: constrain reg-names per variants
  dt-bindings: mmc: sdhci-msm: fix reg-names entries
  dt-bindings: mmc: Add compatible for MediaTek MT8188
  dt-bindings: mmc: sdhci-msm: document resets
  mmc: sdhci-of-at91: fix set_uhs_signaling rewriting of MC1R
  ...

328141e5