Commits · 81fe92849928d65159d707b7b28febffbef94559 · nexedi / linux

13 Mar, 2019 4 commits

nvme-trace: fix cdw10 buffer overrun · 81fe9284

Keith Busch authored Mar 13, 2019

The field is defined as a 24-byte array, so there is no need to multiply
sizeof() of that field by the number of dwords it covers.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

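A minimal user-space sketch of the sizing rule. It assumes the trace entry's cdw10 field is a 24-byte array covering the six 32-bit dwords cdw10 through cdw15; the struct and function names are hypothetical stand-ins, not the nvme tracing code. Because sizeof() already yields 24, multiplying it by six would copy 144 bytes into a 24-byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout: six consecutive 32-bit dwords, cdw10..cdw15. */
struct cmd_model {
	uint32_t cdw[6];
};

/* Hypothetical trace entry: cdw10 is already a 24-byte array. */
struct trace_entry_model {
	uint8_t cdw10[24];
};

/* The trace buffer must be exactly as large as the six dwords it records. */
_Static_assert(sizeof(struct trace_entry_model) == sizeof(struct cmd_model),
	       "cdw10 buffer must cover cdw10..cdw15");

/*
 * Copy cdw10..cdw15 into the trace entry. sizeof(entry->cdw10) is already
 * 24, so it must not be multiplied by the number of dwords (6 * 24 = 144
 * bytes would overrun the destination).
 */
static size_t trace_copy_cdw10(struct trace_entry_model *entry,
			       const struct cmd_model *cmd)
{
	memcpy(entry->cdw10, cmd->cdw, sizeof(entry->cdw10));
	return sizeof(entry->cdw10);
}
```
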
nvme: don't warn on block content change effects · 415df90b

Keith Busch authored Mar 13, 2019

A write or flush IO passthrough command is expected to change the
logical block content, so don't warn on these as no additional handling
is necessary.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

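A sketch of the resulting filter, assuming the bit layout of the NVMe Commands Supported and Effects log (bit 0 CSUPP, bit 1 LBCC, and so on). `io_effects_need_warning()` and the `EFFECTS_*` names are hypothetical; the point is that LBCC joins CSUPP in the set of bits that no longer trigger a warning for I/O passthrough commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command effects bits, following the NVMe Commands Supported and Effects log. */
enum {
	EFFECTS_CSUPP = 1u << 0, /* command supported */
	EFFECTS_LBCC  = 1u << 1, /* logical block content change */
	EFFECTS_NCC   = 1u << 2, /* namespace capability change */
	EFFECTS_NIC   = 1u << 3, /* namespace inventory change */
	EFFECTS_CCC   = 1u << 4, /* controller capability change */
};

/*
 * Hypothetical helper: warn about an I/O passthrough command only if it
 * reports effects beyond "supported" and "changes logical block content".
 * LBCC is expected for writes and flushes and needs no extra handling.
 */
static bool io_effects_need_warning(uint32_t effects)
{
	return (effects & ~(uint32_t)(EFFECTS_CSUPP | EFFECTS_LBCC)) != 0;
}
```
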
nvme: add get-feature to admin cmds tracer · d9d53ed3

Max Gurtovoy authored Mar 13, 2019

This prints the get-feature command in a more informative way. For example,
running "nvme get-feature /dev/nvme0 -n 1 -f 0x9 -c 10" produces this trace:

  nvme-3907  [008] ....  1763.635054: nvme_setup_cmd: nvme0: qid=0, cmdid=6, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_get_features fid=0x9 sel=0x0 cdw11=0xa)
  <idle>-0   [001] d.h.  1763.635112: nvme_sq: nvme0: qid=0, head=27, tail=27
  <idle>-0   [008] ..s.  1763.635121: nvme_complete_rq: nvme0: qid=0, cmdid=6, res=10, retries=0, flags=0x2, status=0
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

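A sketch of how such a decoder might unpack the fields. It assumes the 24-byte cdw10 trace buffer holds CDW10 through CDW15 in little-endian order and that, per the NVMe Get Features definition, FID is CDW10 bits 7:0 and SEL is bits 10:8. `decode_get_features()` and `get_le32()` are hypothetical names. With CDW10 = 0x9 and CDW11 = 10 it reproduces the `fid=0x9 sel=0x0 cdw11=0xa` fragment from the trace above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a little-endian 32-bit value from a possibly unaligned byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical decoder for a Get Features command, given the 24-byte
 * cdw10..cdw15 trace buffer: FID is CDW10 bits 7:0, SEL is CDW10 bits 10:8,
 * and CDW11 starts at byte offset 4.
 */
static int decode_get_features(char *out, size_t len, const uint8_t cdw10[24])
{
	uint8_t fid = cdw10[0];
	uint8_t sel = cdw10[1] & 0x7;
	uint32_t cdw11 = get_le32(cdw10 + 4);

	return snprintf(out, len, "fid=0x%x sel=0x%x cdw11=0x%x",
			(unsigned)fid, (unsigned)sel, (unsigned)cdw11);
}
```
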
Merge branch 'for-5.1/md-post' of https://github.com/liu-song-6/linux into for-5.1/block-post · 7b7395a3

Jens Axboe authored Mar 13, 2019

Pull MD fixes from Song.

* 'for-5.1/md-post' of https://github.com/liu-song-6/linux:
  md: Fix failed allocation of md_register_thread
  It's wrong to add len to sector_nr in raid10 reshape twice
  raid5: set write hint for PPL

12 Mar, 2019 3 commits

md: Fix failed allocation of md_register_thread · e406f12d

Aditya Pakki authored Mar 04, 2019

mddev->sync_thread can be set to NULL on kzalloc failure downstream.
The patch checks for such a scenario and frees allocated resources.

Committer note:

Added similar fix to raid5.c, as suggested by Guoqing.

Cc: stable@vger.kernel.org # v3.16+
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Song Liu <songliubraving@fb.com>

It's wrong to add len to sector_nr in raid10 reshape twice · b761dcf1

Xiao Ni authored Mar 08, 2019

reshape_request() already adds len to sector_nr, so adding len to
sector_nr again after adding pages to the bio is wrong. If there is a bad
block, it can't copy one chunk at a time and needs to goto read_more, at
which point sector_nr is wrong. This can cause data corruption.

Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

raid5: set write hint for PPL · a596d086

Mariusz Dabrowski authored Feb 18, 2019

When the Partial Parity Log is enabled, a circular buffer is used to store
PPL data. Each write to the RAID device overwrites data in this buffer, so
a write_hint can be set on those requests to help drives handle garbage
collection. This patch adds a new sysfs attribute which can be used to
specify which write_hint should be assigned to PPL.
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

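A sketch of how input to such an attribute could be validated. It assumes the hint values follow the `RWH_WRITE_LIFE_*` numbering (0 = not set through 5 = extreme) and that the attribute accepts a decimal number with an optional trailing newline, as written by `echo`. `parse_ppl_write_hint()` is a hypothetical name, not the md sysfs handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Write lifetime hints, numbered like the RWH_WRITE_LIFE_* values. */
enum write_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/*
 * Hypothetical parser for a sysfs-style write-hint attribute: accepts a
 * decimal value in [NOT_SET, EXTREME], optionally followed by a newline.
 * Returns 0 on success or -EINVAL.
 */
static int parse_ppl_write_hint(const char *buf, enum write_hint *hint)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && strcmp(end, "\n") != 0))
		return -EINVAL;
	if (val > WRITE_LIFE_EXTREME)
		return -EINVAL; /* out of range for a write lifetime hint */
	*hint = (enum write_hint)val;
	return 0;
}
```
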
07 Mar, 2019 1 commit

pblk: fix max_io calculation · 9205e449

Javier González authored Mar 07, 2019

When calculating the maximum I/O size allowed into the buffer, consider
the write size (ws_opt) used by the write thread in order to cover the
case in which, due to flushes, the mem and subm pointers are misaligned
by (ws_opt - 1). This case currently translates into a stall when
an I/O of the largest possible size is submitted.

Fixes: f9f9d1ae2c66 ("lightnvm: pblk: prevent stall due to wb threshold")
Signed-off-by: Javier González <javier@javigon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Mar, 2019 2 commits

block: fix segment calculation for passthrough IO · 05b700ba

Ming Lei authored Mar 03, 2019

blk_recount_segments() can be called in bio_add_pc_page() to calculate
how many segments this bio will have after one page is added to it. If
the resulting segment count is beyond the queue limit, the added page
will be removed.

The try-and-fix policy requires blk_recount_segments(__blk_recalc_rq_segments)
not to consider the segment number limit. Unfortunately, bvec_split_segs()
does check this limit, which causes a smaller segment count to be returned
to bio_add_pc_page(), so the page may still be added to the bio even though
the segment number limit is broken.

Fix this issue by not considering the segment number limit when calculating
the bio's segment count.

Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

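A simplified model of the try-and-fix policy, assuming segments are just runs of physically contiguous ranges (the real code also honours maximum segment size and boundary masks). All names are hypothetical. Passing a nonzero `cap` mimics the bug: the recount saturates at the limit, so the check never fires and a non-contiguous page is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 16

/* Hypothetical model of a passthrough bio: a list of physical ranges. */
struct pc_bio {
	unsigned long addr[MAX_PAGES];
	unsigned int len[MAX_PAGES];
	size_t nr;
};

/*
 * Count physical segments: adjacent, physically contiguous ranges merge.
 * A nonzero cap stops counting at cap, modelling a recount that wrongly
 * honours the segment limit.
 */
static unsigned int count_segments(const struct pc_bio *bio, unsigned int cap)
{
	unsigned int segs = 0;

	for (size_t i = 0; i < bio->nr; i++) {
		bool merges = i > 0 &&
			bio->addr[i - 1] + bio->len[i - 1] == bio->addr[i];

		if (!merges) {
			if (cap && segs == cap)
				break;
			segs++;
		}
	}
	return segs;
}

/* Try-and-fix: add the page, recount, and remove it if over the limit. */
static bool add_pc_page(struct pc_bio *bio, unsigned long addr,
			unsigned int len, unsigned int max_segs, unsigned int cap)
{
	if (bio->nr == MAX_PAGES)
		return false;
	bio->addr[bio->nr] = addr;
	bio->len[bio->nr] = len;
	bio->nr++;
	if (count_segments(bio, cap) > max_segs) {
		bio->nr--; /* undo: this page would break the segment limit */
		return false;
	}
	return true;
}
```

With `cap == 0` the recount reports the true segment count and the offending page is removed again, which is the behaviour the fix restores.
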
Merge branch 'stable/for-jens-5.1' of... · e61750c8

Jens Axboe authored Mar 06, 2019

Merge branch 'stable/for-jens-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-5.1/block-post

Pull two xen blkback fixes from Konrad.

* 'stable/for-jens-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront
  xen/blkback: add stack variable 'blkif' in connect_ring()

02 Mar, 2019 1 commit

block: fix updating bio's front segment size · aaeee62c

Ming Lei authored Mar 02, 2019

When the current bvec can be merged into the first segment, the bio's front
segment size has to be updated.

However, dcebd755 doesn't handle that case, so the bio's front
segment size may be incorrect.

This patch fixes this issue.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Omar Sandoval <osandov@fb.com>
Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

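A toy recount that tracks both ends of the bio, assuming two bvecs merge only when they are physically adjacent (real merging also checks size and boundary limits). The names are hypothetical. The key line is the final `if`: while only one segment exists, every merge into it must also grow `front_seg_size`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bvec: a physically contiguous range. */
struct bvec_model {
	unsigned long addr;
	unsigned int len;
};

struct seg_info {
	unsigned int nr_segs;
	unsigned int front_seg_size; /* size of the first physical segment */
	unsigned int back_seg_size;  /* size of the last physical segment */
};

/* Walk the bvecs, merging physically contiguous neighbours. */
static struct seg_info recalc_segments(const struct bvec_model *bv, size_t n)
{
	struct seg_info s = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		bool merge = i > 0 && bv[i - 1].addr + bv[i - 1].len == bv[i].addr;

		if (merge) {
			s.back_seg_size += bv[i].len;
		} else {
			s.nr_segs++;
			s.back_seg_size = bv[i].len;
		}
		/* Still in the first segment: keep the front size in sync. */
		if (s.nr_segs == 1)
			s.front_seg_size = s.back_seg_size;
	}
	return s;
}
```
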
28 Feb, 2019 9 commits

block: Replace function name in string with __func__ · dfc76d11

Keyur Patel authored Feb 17, 2019

Replace the hard-coded function name register_blkdev with __func__ to
improve robustness and to conform to the Linux kernel coding
style. Issue found using checkpatch.
Signed-off-by: Keyur Patel <iamkeyur96@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

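A user-space illustration of the idiom, modelled on an error message of the form "register_blkdev: failed to get major for ..."; the function here is a hypothetical stand-in that formats into a buffer instead of calling pr_err(). `__func__` is a C99 predefined identifier holding the name of the enclosing function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the error path: __func__ expands to
 * "register_blkdev" here, so the message follows the function name even
 * if the function is renamed later.
 */
static int register_blkdev(char *msg, size_t len, const char *name)
{
	return snprintf(msg, len, "%s: failed to get major for %s",
			__func__, name);
}
```
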
nbd: propagate genlmsg_reply return code · cd46eb89

Li RongQing authored Feb 19, 2019

genlmsg_reply() can fail, so propagate its return code.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

floppy: remove set but not used variable 'q' · 6dc8746d

YueHaibing authored Feb 18, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/block/floppy.c: In function 'request_done':
drivers/block/floppy.c:2233:24: warning:
 variable 'q' set but not used [-Wunused-but-set-variable]

It's never used and can be removed.
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

null_blk: fix checking for REQ_FUA · bf7c7a04

Heinz Mauelshagen authored Feb 22, 2019

null_handle_bio() erroneously uses the bio_op() macro, which masks
out the request flag bits, including REQ_FUA, so the check always
fails.

Fix by checking bio->bi_opf directly.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

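A sketch of why the old check could never succeed. It assumes, as in the block layer, that the low bits of bi_opf hold the operation and that flags such as FUA sit above them; the exact bit positions and all names (`OP_BITS`, `FLAG_FUA`, `op_of()`) are illustrative, not copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative encoding: operation in the low bits, flags above them. */
#define OP_BITS  8
#define OP_MASK  ((1u << OP_BITS) - 1)
#define OP_WRITE 1u
#define FLAG_FUA (1u << (OP_BITS + 9))

/* Like bio_op(): keep only the operation, masking out every flag bit. */
static uint32_t op_of(uint32_t opf)
{
	return opf & OP_MASK;
}

/* Old check: FUA was already masked away by op_of(), so always false. */
static bool has_fua_old(uint32_t opf)
{
	return (op_of(opf) & FLAG_FUA) != 0;
}

/* Fixed check: test the flag on the full opf value, like bio->bi_opf. */
static bool has_fua_fixed(uint32_t opf)
{
	return (opf & FLAG_FUA) != 0;
}
```
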
block: fix NULL pointer dereference in register_disk · 4d7c1d3f

zhengbin authored Feb 20, 2019

If __device_add_disk-->bdi_register_owner-->bdi_register-->
bdi_register_va-->device_create_vargs fails, bdi->dev is still
NULL, and __device_add_disk-->register_disk will then access
bdi->dev->kobj. This patch fixes that.
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fs: fix guard_bio_eod to check for real EOD errors · dce30ca9

Carlos Maiolino authored Feb 26, 2019

guard_bio_eod() can truncate a segment in a bio to allow it to do IO on
odd last sectors of a device.

It already checks whether the IO starts past EOD, but it does not consider
the possibility that an IO request starting within device boundaries can
contain more than one segment past EOD.

In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.

Fix this by checking if truncated_bytes is lower than PAGE_SIZE.

This situation has been found on filesystems such as isofs and vfat,
which don't check the device size before mount. If the device is
smaller than the filesystem itself, a readahead on such a filesystem
that spans EOD can trigger this situation, leading to a call to
zero_user() with a wrong size, possibly corrupting memory.

I didn't see any crash, nor did I let the system run long enough to
check whether memory corruption would be hit somewhere, but adding
instrumentation to guard_bio_eod() to check the truncated_bytes size was
enough to see the error.

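The following script can trigger the error.

#!/bin/bash
# Assumes ./DISK.img already exists, is larger than the --sizelimit
# below, and that /dev/loop0 is free after "losetup -D".
MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0

# Create a vfat filesystem on the image and fill it with files.
mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT

# Re-attach the image with a size limit so the device ends before the
# filesystem does.
losetup -D
losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT

# Reading every file issues readahead that can span EOD.
find $MNT -type f -exec cat {} + >/dev/null

Kudos to Eric Sandeen for coming up with the reproducer above.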
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

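A model of the truncation arithmetic, assuming byte offsets derived from 512-byte sectors. It compares the excess against the length of the last segment, which is the value that would underflow; the commit message phrases the bound in terms of PAGE_SIZE. `trim_last_segment()` is a hypothetical name, not guard_bio_eod() itself.

```c
#include <assert.h>

#define SECTOR_SHIFT 9

/*
 * Hypothetical model of trimming a bio that straddles the end of a device.
 * start is the first sector, size the bio length in bytes, maxsector the
 * device size in sectors, last_len the length of the bio's last segment.
 *
 * Returns the number of bytes trimmed from the last segment (0 if the bio
 * ends within the device), or -1 if the bio starts at or past EOD or if
 * the part past EOD is longer than the last segment, in which case
 * shrinking *last_len would underflow.
 */
static long long trim_last_segment(unsigned long long start, unsigned int size,
				   unsigned long long maxsector,
				   unsigned int *last_len)
{
	unsigned long long end = (start << SECTOR_SHIFT) + size;
	unsigned long long limit = maxsector << SECTOR_SHIFT;
	unsigned long long truncated;

	if (start >= maxsector)
		return -1;
	if (end <= limit)
		return 0;
	truncated = end - limit;
	if (truncated > *last_len)
		return -1; /* more than the last segment lies past EOD */
	*last_len -= (unsigned int)truncated;
	return (long long)truncated;
}
```
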
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map · 7d76f856

Dongli Zhang authored Feb 27, 2019

Replace set->map[0] with set->map[HCTX_TYPE_DEFAULT] to avoid hardcoding.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: optimize bvec iteration in bvec_iter_advance · 5b88a17c

Christoph Hellwig authored Feb 28, 2019

There is no need to only iterate in chunks of PAGE_SIZE or less in
bvec_iter_advance, given that the callers pass in the chunk length that
they are operating on - either that already is less than PAGE_SIZE
because they do classic page-based iteration, or it is larger because
the caller operates on multi-page bvecs.

This should help shave a few cycles off the I/O hot path.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

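A simplified iterator showing the idea, assuming each step may consume up to the rest of the current bvec rather than at most one page. The structures are hypothetical reductions of bio_vec and bvec_iter. Advancing 18432 bytes over a 16 KiB bvec followed by a 4 KiB one takes two steps here, where page-sized chunks would take five.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified bio_vec and iterator. */
struct bv {
	unsigned int len;
};

struct bv_iter {
	size_t idx;        /* current bvec */
	unsigned int done; /* bytes consumed within the current bvec */
	unsigned int size; /* bytes left in the whole iteration */
};

static unsigned int min3u(unsigned int a, unsigned int b, unsigned int c)
{
	unsigned int m = a < b ? a : b;

	return m < c ? m : c;
}

/* Advance by 'bytes', taking as much of each bvec as possible per step. */
static unsigned int iter_advance(const struct bv *vec, struct bv_iter *it,
				 unsigned int bytes)
{
	unsigned int steps = 0;

	assert(bytes <= it->size);
	while (bytes) {
		unsigned int len = min3u(bytes, it->size,
					 vec[it->idx].len - it->done);

		bytes -= len;
		it->size -= len;
		it->done += len;
		if (it->done == vec[it->idx].len) { /* bvec fully consumed */
			it->done = 0;
			it->idx++;
		}
		steps++;
	}
	return steps;
}
```
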
block: introduce mp_bvec_for_each_page() for iterating over page · 594b9a89

Ming Lei authored Feb 27, 2019

mp_bvec_for_each_segment() is a bit heavy for this iteration, so introduce
a lightweight helper for iterating over pages, which saves 32 bytes of
stack space.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

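A sketch of the page range such a helper walks, assuming the bvec's offset is measured from its first page and may exceed one page. The names and the 4 KiB page size are assumptions for illustration; the loop visits page indices offset / PAGE_SIZE through (offset + len - 1) / PAGE_SIZE, one integer per step, instead of building a full segment on each iteration.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical multi-page bvec: offset counted from its first page. */
struct mp_bvec {
	unsigned int offset; /* may exceed one page for multi-page bvecs */
	unsigned int len;
};

/*
 * Walk the page indices the bvec touches and return how many there are.
 * *first receives the index of the first page.
 */
static unsigned int bvec_page_range(const struct mp_bvec *bv, unsigned int *first)
{
	unsigned int n = 0;

	assert(bv->len > 0); /* an empty bvec touches no pages */
	for (unsigned int i = bv->offset / MODEL_PAGE_SIZE;
	     i <= (bv->offset + bv->len - 1) / MODEL_PAGE_SIZE; i++) {
		if (n == 0)
			*first = i;
		n++;
	}
	return n;
}
```
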
27 Feb, 2019 3 commits

block: optimize blk_bio_segment_split for single-page bvec · bbcbbd56

Ming Lei authored Feb 27, 2019

Introduce a fast path for single-page bvec IO so that
bvec_split_segs() is not called unnecessarily.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

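A sketch of the fast-path test, assuming a bvec is single-page when its offset plus length does not pass the end of its first page, and that the maximum segment size is at least one page. The slow path below merely divides by `max_seg_size` as a stand-in for bvec_split_segs(); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical bvec: offset into its first page plus length. */
struct bvec_desc {
	unsigned int offset;
	unsigned int len;
};

/* Single-page: the data does not run past the end of the first page. */
static bool bvec_is_single_page(const struct bvec_desc *bv)
{
	return bv->offset + bv->len <= MODEL_PAGE_SIZE;
}

/*
 * Count segments for one bvec. Fast path: a single-page bvec is exactly
 * one segment. Slow path: a stand-in for the general splitter that only
 * splits by max_seg_size.
 */
static unsigned int bvec_segments(const struct bvec_desc *bv,
				  unsigned int max_seg_size)
{
	assert(max_seg_size >= MODEL_PAGE_SIZE);
	if (bvec_is_single_page(bv))
		return 1;
	return (bv->len + max_seg_size - 1) / max_seg_size;
}
```
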
block: optimize __blk_segment_map_sg() for single-page bvec · 48d7727c

Ming Lei authored Feb 27, 2019

Introduce a fast path for single-page bvec IO so that the call to
blk_bvec_map_sg() can be avoided.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: introduce bvec_nth_page() · 4d633062

Ming Lei authored Feb 27, 2019

Single-page bvecs are common in small-block-size workloads, so
introduce bvec_nth_page() to avoid calling nth_page() unnecessarily,
since nth_page() does not look cheap.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

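A sketch of the shortcut, assuming index 0 is the common case for single-page bvecs. In the kernel, nth_page() can require a pfn round-trip on some memory models (for example SPARSEMEM without VMEMMAP); here it is a hypothetical stand-in that only counts how often it is called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor; a real struct page is far larger. */
struct page_model {
	unsigned long pfn;
};

static unsigned int nth_page_calls; /* how often the "slow" helper runs */

/* Stand-in for nth_page(): assumed to be the comparatively costly lookup. */
static struct page_model *nth_page_model(struct page_model *page, size_t n)
{
	nth_page_calls++;
	return page + n;
}

/* Shortcut for the common single-page case: index 0 is the page itself. */
static struct page_model *bvec_nth_page_model(struct page_model *page, size_t idx)
{
	return idx == 0 ? page : nth_page_model(page, idx);
}
```
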
24 Feb, 2019 5 commits

iomap: wire up the iopoll method · 81214bab

Christoph Hellwig authored Dec 04, 2018

Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device.  Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>

Modified to use bio_set_polled().
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add bio_set_polled() helper · 0bbb280d

Jens Axboe authored Dec 21, 2018

For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already
has async polled IO in flight, but can't wait for it to complete,
since polled requests must be actively found and reaped.

Utilize the helper in the blockdev DIRECT_IO code.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

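A sketch of what such a helper might do, assuming it marks the bio for polled completion and, for asynchronous kiocbs, also requests non-blocking allocation so submission fails instead of sleeping. The flag values and all `_model` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; the kernel's REQ_* values differ. */
#define HIPRI_FLAG  (1u << 0)
#define NOWAIT_FLAG (1u << 1)

struct kiocb_model {
	/* NULL for synchronous I/O, set for asynchronous submitters. */
	void (*ki_complete)(struct kiocb_model *iocb, long ret);
};

struct bio_model {
	uint32_t bi_opf;
};

static bool is_sync_model(const struct kiocb_model *k)
{
	return k->ki_complete == NULL;
}

/*
 * Mark a bio for polled completion. Async submitters also ask for NOWAIT,
 * so request allocation fails instead of sleeping while the caller may
 * still be responsible for reaping its earlier polled I/O.
 */
static void bio_set_polled_model(struct bio_model *bio,
				 const struct kiocb_model *kiocb)
{
	bio->bi_opf |= HIPRI_FLAG;
	if (!is_sync_model(kiocb))
		bio->bi_opf |= NOWAIT_FLAG;
}

/* Example completion callback for an asynchronous kiocb. */
void done_model(struct kiocb_model *iocb, long ret)
{
	(void)iocb;
	(void)ret;
}
```
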
block: wire up block device iopoll method · eae83ce1

Christoph Hellwig authored Nov 30, 2018

Just call blk_poll on the iocb cookie; the block device can be derived
trivially from the inode.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fs: add an iopoll method to struct file_operations · fb7e1600

Christoph Hellwig authored Nov 22, 2018

This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is, with a non-NULL ki_complete) which has the IOCB_HIPRI flag set.

The method is assisted by a new ki_cookie field in struct kiocb to store
the polling cookie.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

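A sketch of the calling contract with hypothetical types: `must_iopoll()` encodes the rule stated above (asynchronous, i.e. `ki_complete` set, plus the HIPRI flag), and `poll_until_done()` keeps invoking the `->iopoll` method until the iocb completes. `fake_iopoll()` stands in for a device that finishes after a fixed number of polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HIPRI_IOCB_FLAG (1 << 0) /* illustrative stand-in for IOCB_HIPRI */

struct kiocb_m {
	void (*ki_complete)(struct kiocb_m *iocb, long ret); /* NULL if sync */
	int ki_flags;
	int ki_cookie;  /* polling cookie saved at submission (unused here) */
	bool completed;
	int polls_left; /* fake device: number of polls until completion */
};

struct file_ops_m {
	int (*iopoll)(struct kiocb_m *iocb, bool spin);
};

/* The contract: async (ki_complete set) with the HIPRI flag must be polled. */
static bool must_iopoll(const struct kiocb_m *k)
{
	return k->ki_complete != NULL && (k->ki_flags & HIPRI_IOCB_FLAG);
}

/* Fake ->iopoll: checks the "device" once, returns completions reaped. */
static int fake_iopoll(struct kiocb_m *k, bool spin)
{
	(void)spin;
	if (k->completed || --k->polls_left > 0)
		return 0;
	k->completed = true;
	k->ki_complete(k, 0);
	return 1;
}

/* Keep calling ->iopoll until the iocb completes; returns the poll count. */
static int poll_until_done(const struct file_ops_m *ops, struct kiocb_m *k)
{
	int polls = 0;

	while (!k->completed) {
		ops->iopoll(k, true);
		polls++;
	}
	return polls;
}

/* Example completion callback. */
void on_complete_m(struct kiocb_m *iocb, long ret)
{
	(void)iocb;
	(void)ret;
}
```
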
xen/blkback: rework connect_ring() to avoid inconsistent xenstore... · 4a8c31a1

Dongli Zhang authored Feb 24, 2019

xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront

The xenstore 'ring-page-order' is used globally for each blkback queue and
therefore should be read from xenstore only once. However, it is obtained
in read_per_ring_refs() which might be called multiple times during the
initialization of each blkback queue.

If the blkfront is malicious and sets 'ring-page-order' to a different
value every time before blkback reads it, this may end up hitting the
"WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" in
xen_blkif_disconnect() when the frontend is destroyed.

This patch reworks connect_ring() to read xenstore 'ring-page-order' only
once.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

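A sketch of the read-once pattern, with a hypothetical frontend whose 'ring-page-order' changes on every read. Reading the order once in the connect step and passing the derived page count to each queue keeps all queues consistent, which is the kind of invariant the WARN_ON above checks. All names are hypothetical.

```c
#include <assert.h>

#define NR_QUEUES 4

/* Hypothetical malicious frontend: each read returns a different order. */
static unsigned int store_reads;

static unsigned int read_ring_page_order(void)
{
	return store_reads++ % 3;
}

struct ring_model {
	unsigned int nr_ring_pages;
};

/*
 * Read 'ring-page-order' once, then hand the derived page count to every
 * queue, so the queues agree however the store value changes.
 */
static void connect_rings(struct ring_model *rings, unsigned int nr_queues)
{
	unsigned int nr_ring_pages = 1u << read_ring_page_order();

	for (unsigned int i = 0; i < nr_queues; i++)
		rings[i].nr_ring_pages = nr_ring_pages;
}
```
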
22 Feb, 2019 2 commits

loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() · 758a58d0

Dongli Zhang authored Feb 22, 2019

Commit 0da03cab
("loop: Fix deadlock when calling blkdev_reread_part()") moves
blkdev_reread_part() out of the loop_ctl_mutex. However,
GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
__blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
will not rescan the loop device to delete all partitions.

Below are steps to reproduce the issue:

step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
step2 # losetup -P /dev/loop0 tmp.raw
step3 # parted /dev/loop0 mklabel gpt
step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
step5 # losetup -d /dev/loop0

Step 5 fails to delete /dev/loop0p1 (created in step 4), and the kernel
prints the following warning:

[  464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)

This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().

Fixes: 0da03cab ("loop: Fix deadlock when calling blkdev_reread_part()")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

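A model of the ordering problem, assuming, as the rc=-22 in the warning suggests (-22 is -EINVAL on Linux), that the reread path returns -EINVAL when partition scanning is disabled. The flag bit and all names are hypothetical. With the flag set first the rescan is refused and the stale partition survives; setting it afterwards lets the rescan remove it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NO_PART_SCAN (1u << 0) /* illustrative stand-in for GENHD_FL_NO_PART_SCAN */

struct disk_model {
	unsigned int flags;
	int nr_partitions;
};

/* Like the reread path: refuse (-EINVAL) when partition scanning is off. */
static int reread_part(struct disk_model *d)
{
	if (d->flags & NO_PART_SCAN)
		return -EINVAL;
	d->nr_partitions = 0; /* backing file is gone: drop stale partitions */
	return 0;
}

/* Detach path; flag_first reproduces the ordering before the fix. */
static int clr_fd(struct disk_model *d, bool flag_first)
{
	int rc;

	if (flag_first)
		d->flags |= NO_PART_SCAN;
	rc = reread_part(d);
	d->flags |= NO_PART_SCAN; /* fixed order: disable scans after rereading */
	return rc;
}
```
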
loop: do not print warn message if partition scan is successful · 40853d6f

Dongli Zhang authored Feb 22, 2019

Do not print the warning message when the partition scan returns 0.

Fixes: d57f3374 ("loop: Move special partition reread handling in loop_clr_fd()")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Feb, 2019 3 commits

block: bounce: make sure that bvec table is updated · 8f4e80da

Ming Lei authored Feb 21, 2019

Block bounce needs to allocate a new page for the IO, and the bvec table
has to be updated to point to the new page.

Commit 6dc4f100 switches __blk_queue_bounce() to use the new
bio_for_each_segment_all() interface. Unfortunately, the new
bio_for_each_segment_all() can't be used to update the bvec table.

This patch fixes this issue by retrieving the bvec from the table
directly, so that the newly allocated page can be stored in the bio.
This is safe because the cloned bio uses single-page bvecs.

Fixes: 6dc4f100 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

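A sketch of the pitfall, assuming the multi-page aware bio_for_each_segment_all() hands out a bvec assembled in iterator-local storage, so assigning to it does not reach the bio's table. The types and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bvec table entry: which page backs the segment. */
struct bvec_entry {
	int page;
	unsigned int len;
};

/* Pitfall: the loop variable is a copy, so the table never changes. */
static void set_pages_via_copy(struct bvec_entry *table, size_t n, int page)
{
	for (size_t i = 0; i < n; i++) {
		struct bvec_entry bv = table[i]; /* like an iterator-built bvec */

		bv.page = page; /* updates only the local copy */
		(void)bv;
	}
}

/* Fix: address the table entry directly so the bounce page is recorded. */
static void set_pages_in_table(struct bvec_entry *table, size_t n, int page)
{
	for (size_t i = 0; i < n; i++)
		table[i].page = page;
}
```
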
Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-5.1/block · 037b2625

Jens Axboe authored Feb 21, 2019

Pull NVMe changes for 5.1 from Christoph

* 'nvme-5.1' of git://git.infradead.org/nvme: (22 commits)
  nvme-rdma: use nr_phys_segments when map rq to sgl
  nvmet: convert to SPDX identifiers
  nvmet-rdma: convert to SPDX identifiers
  nvme-loop: convert to SPDX identifiers
  nvmet-fcloop: convert to SPDX identifiers
  nvmet-fc: convert to SPDX identifiers
  nvme: convert to SPDX identifiers
  nvme-pci: convert to SPDX identifiers
  nvme-lightnvm: convert to SPDX identifiers
  nvme-rdma: convert to SPDX identifiers
  nvme-fc: convert to SPDX identifiers
  nvme-fabrics: convert to SPDX identifiers
  nvme-tcp.h: fix SPDX header
  nvme_ioctl.h: remove duplicate GPL boilerplate
  nvme: return error from nvme_alloc_ns()
  nvme: avoid that deleting a controller triggers a circular locking complaint
  nvme: introduce a helper function for controller deletion
  nvme: unexport nvme_delete_ctrl_sync()
  nvme-pci: check kstrtoint() return value in queue_count_set()
  nvme-fabrics: document the poll function argument
  ...

nvme-rdma: use nr_phys_segments when map rq to sgl · 34e08191

Chaitanya Kulkarni authored Feb 20, 2019

Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
if a command contains data to be mapped. This fixes the case where
a struct request covers LBAs but has no payload, such as a
Write Zeroes command.

Fixes: 6e02318e ("nvme: add support for the Write Zeroes command")
Reported-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

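A sketch of the two checks, assuming a Write Zeroes request reports a nonzero byte count (its LBA range) but no physical data segments. The struct and function names are hypothetical summaries of what blk_rq_payload_bytes() and blk_rq_nr_phys_segments() report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request summary. */
struct rq_model {
	unsigned int payload_bytes;  /* as blk_rq_payload_bytes() would report */
	unsigned short nr_phys_segs; /* as blk_rq_nr_phys_segments() would report */
};

/* Old check: a Write Zeroes request covers LBAs, so its byte count is nonzero. */
static bool needs_sgl_old(const struct rq_model *rq)
{
	return rq->payload_bytes != 0;
}

/* New check: map only when there are data segments to put into the SGL. */
static bool needs_sgl_new(const struct rq_model *rq)
{
	return rq->nr_phys_segs != 0;
}
```
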
20 Feb, 2019 7 commits

nvmet: convert to SPDX identifiers · 77141dc6

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvmet-rdma: convert to SPDX identifiers · 3641bd32

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme-loop: convert to SPDX identifiers · d0ad6904

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvmet-fcloop: convert to SPDX identifiers · a4b74fcc

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvmet-fc: convert to SPDX identifiers · 4f80fc77

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme: convert to SPDX identifiers · bc50ad75

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme-pci: convert to SPDX identifiers · 5f37396d

Christoph Hellwig authored Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
