Commits · 412445acb6cad4cef026daae37c4765fb9942c60 · Kirill Smelkov / linux

01 May, 2017 6 commits

dm: introduce a new DM_MAPIO_KILL return value · 412445ac

Christoph Hellwig authored Apr 26, 2017

This untangles the DM_MAPIO_* values returned from ->clone_and_map_rq
from the error codes used by the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

412445ac

dm rq: change ->rq_end_io calling conventions · 7ed8578a

Christoph Hellwig authored Apr 26, 2017

Instead of returning either a DM_ENDIO_* constant or an error code, add
a new DM_ENDIO_DONE value that means keep errno as is.  This allows us
to easily keep the existing error code in case where we can't push back,
and it also preparares for the new block level status codes with strict
type checking.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7ed8578a

dm mpath: merge do_end_io into multipath_end_io · b79f10ee

Christoph Hellwig authored Apr 26, 2017

This simplifies the I/O completion path a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b79f10ee

Merge branch 'dm-4.12' into dm-4.12-post-merge · 7e25a760
Mike Snitzer authored May 01, 2017

7e25a760

dm bufio: check new buffer allocation watermark every 30 seconds · 390020ad

Mikulas Patocka authored Apr 30, 2017

dm-bufio checks a watermark when it allocates a new buffer in
__bufio_new().  However, it doesn't check the watermark when the user
changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.

This may result in a problem - if the watermark is high enough so that
all possible buffers are allocated and if the user lowers the value of
"max_cache_size_bytes", the watermark will never be checked against the
new value because no new buffer would be allocated.

To fix this, change __evict_old_buffers() so that it checks the
watermark.  __evict_old_buffers() is called every 30 seconds, so if the
user reduces "max_cache_size_bytes", dm-bufio will react to this change
within 30 seconds and decrease memory consumption.

Depends-on: 1b0fb5a5 ("dm bufio: avoid a possible ABBA deadlock")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

390020ad

dm bufio: avoid a possible ABBA deadlock · 1b0fb5a5

Mikulas Patocka authored Apr 30, 2017

__get_memory_limit() tests if dm_bufio_cache_size changed and calls
__cache_size_refresh() if it did.  It takes dm_bufio_clients_lock while
it already holds the client lock.  However, lock ordering is violated
because in cleanup_old_buffers() dm_bufio_clients_lock is taken before
the client lock.

This results in a possible deadlock and lockdep engine warning.

Fix this deadlock by changing mutex_lock() to mutex_trylock().  If the
lock can't be taken, it will be re-checked next time when a new buffer
is allocated.

Also add "unlikely" to the if condition, so that the optimizer assumes
that the condition is false.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

1b0fb5a5

28 Apr, 2017 6 commits

block: hide badblocks attribute by default · 9438b3e0

Dan Williams authored Apr 27, 2017

Commit 99e6608c "block: Add badblock management for gendisks"
allowed for drivers like pmem and software-raid to advertise a list of
bad media areas. However, it inadvertently added a 'badblocks' to all
block devices. Lets clean this up by having the 'badblocks' attribute
not be visible when the driver has not populated a 'struct badblocks'
instance in the gendisk.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9438b3e0

blk-mq: unify hctx delay_work and run_work · 21c6e939

Jens Axboe authored Apr 10, 2017

The only difference between ->run_work and ->delay_work, is that
the latter is used to defer running a queue. This is done by
marking the queue stopped, and scheduling ->delay_work to run
sometime in the future. While the queue is stopped, direct runs
or runs through ->run_work will not run the queue.

If we combine the handlers, then we need to handle two things:

1) If a delayed/stopped run is scheduled, then we should not run
   the queue before that has been completed.
2) If a queue is delayed/stopped, the handler needs to restart
   the queue. Normally a run of a queue with the stopped bit set
   would be a no-op.

Case 1 is handled by modifying a currently pending queue run
to the deadline set by the caller of blk_mq_delay_queue().
Subsequent attempts to queue a queue run will find the work
item already pending, and direct runs will see a stopped queue
as before.

Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN,
that tells the work handler that it should clear a stopped
queue and run the handler.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

21c6e939

block: add kblock_mod_delayed_work_on() · 818cd1cb

Jens Axboe authored Apr 10, 2017

This modifies (or adds, if not currently pending) an existing
delayed work item.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

818cd1cb

blk-mq: unify hctx delayed_run_work and run_work · 9f993737

Jens Axboe authored Apr 10, 2017

They serve the exact same purpose. Get rid of the non-delayed
work variant, and just run it without delay for the normal case.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9f993737

nbd: fix use after free on module unload · 60ae36ad

Josef Bacik authored Apr 28, 2017

list_for_each_entry() isn't super safe if we're freeing the objects
while we traverse the list.  Also don't bother taking the extra
reference, the module refcounting stuff will save us from having anybody
messing with the device while we're trying to unload.
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

60ae36ad

MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler · bf290f8f

Ulf Hansson authored Apr 28, 2017

Seems like this was forgotten in the bfq-series from Paolo. Let's do it now
so people don't miss out involving Paolo for any future changes or when
reporting bugs.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

bf290f8f

27 Apr, 2017 17 commits

dm mpath: make it easier to detect unintended I/O request flushes · 86331f39

Bart Van Assche authored Apr 27, 2017

I/O errors triggered by multipathd incorrectly not enabling the no-flush
flag for DM_DEVICE_SUSPEND or DM_DEVICE_RESUME are hard to debug.  Add
more logging to make it easier to debug this.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

86331f39

dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() · 9a8ac3ae

Bart Van Assche authored Apr 27, 2017

No functional change but makes the code easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9a8ac3ae

dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH · ca5beb76

Bart Van Assche authored Apr 27, 2017

Instead of checking MPATHF_QUEUE_IF_NO_PATH,
MPATHF_SAVED_QUEUE_IF_NO_PATH and the no_flush flag to decide whether
or not to push back a request (or bio) if there are no paths available,
only clear MPATHF_QUEUE_IF_NO_PATH in queue_if_no_path() if no_flush has
not been set.  The result is that only a single bit has to be tested in
the hot path to decide whether or not a request must be pushed back and
also that m->lock does not have to be taken in the hot path.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

ca5beb76

dm: introduce enum dm_queue_mode to cleanup related code · 7e0d574f

Bart Van Assche authored Apr 27, 2017

Introduce an enumeration type for the queue mode.  This patch does
not change any functionality but makes the DM code easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7e0d574f

dm mpath: verify __pg_init_all_paths locking assumptions at runtime · b194679f

Bart Van Assche authored Apr 27, 2017

Verify at runtime that __pg_init_all_paths() is called with
multipath.lock held if lockdep is enabled.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b194679f

dm: verify suspend_locking assumptions at runtime · 1ea0654e

Bart Van Assche authored Apr 27, 2017

Ensure that the assumptions about the caller holding suspend_lock
are checked at runtime if lockdep is enabled.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

1ea0654e

dm block manager: remove an unused argument from dm_block_manager_create() · 73cbca6a

Bart Van Assche authored Apr 27, 2017

The 'cache_size' argument of dm_block_manager_create() has never been
used.  Remove it along with the definitions of the constants passed as
the 'cache_size' argument.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

73cbca6a

dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() · 23a60124

Bart Van Assche authored Apr 27, 2017

Otherwise the request-based DM blk-mq request_queue will be put into
service without being properly exported via sysfs.

Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

23a60124

dm mpath: delay requeuing while path initialization is in progress · c1d7ecf7

Bart Van Assche authored Apr 27, 2017

Requeuing a request immediately while path initialization is ongoing
causes high CPU usage, something that is undesired.  Hence delay
requeuing while path initialization is in progress.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

c1d7ecf7

dm mpath: avoid that path removal can trigger an infinite loop · 7083abbb

Bart Van Assche authored Apr 27, 2017

If blk_get_request() fails, check whether the failure is due to a path
being removed.  If that is the case, fail the path by triggering a call
to fail_path().  This avoids that the following scenario can be
encountered while removing paths:
* CPU usage of a kworker thread jumps to 100%.
* Removing the DM device becomes impossible.

Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK,
and the queue is not dying, because in these cases immediate requeuing
is inappropriate.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7083abbb

dm mpath: split and rename activate_path() to prepare for its expanded use · 89bfce76

Bart Van Assche authored Apr 27, 2017

activate_path() is renamed to activate_path_work() which now calls
activate_or_offline_path().  activate_or_offline_path() will be used
by the next commit.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

89bfce76

dm ioctl: prevent stack leak in dm ioctl call · 4617f564

Adrian Salido authored Apr 27, 2017

When calling a dm ioctl that doesn't process any data
(IOCTL_FLAGS_NO_PARAMS), the contents of the data field in struct
dm_ioctl are left initialized.  Current code is incorrectly extending
the size of data copied back to user, causing the contents of kernel
stack to be leaked to user.  Fix by only copying contents before data
and allow the functions processing the ioctl to override.

Cc: stable@vger.kernel.org
Signed-off-by: Adrian Salido <salidoa@google.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

4617f564

dm integrity: use previously calculated log2 of sectors_per_block · 84ff1bcc

Mikulas Patocka authored Apr 26, 2017

The log2 of sectors_per_block was already calculated, so we don't have
to use the ilog2 function.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

84ff1bcc

dm integrity: use hex2bin instead of open-coded variant · 6625d903
Mikulas Patocka authored Apr 27, 2017
```
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
```
6625d903

dm crypt: replace custom implementation of hex2bin() · e944e03e

Andy Shevchenko authored Apr 27, 2017

There is no need to have a duplication of the generic library, i.e. hex2bin().
Replace the open coded variant.
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e944e03e

blk-mq-sched: alloate reserved tags out of normal pool · 33931808

Jens Axboe authored Apr 27, 2017

At least one driver, mtip32xx, has a hard coded dependency on
the value of the reserved tag used for internal commands. While
that should really be fixed up, for now let's ensure that we just
bypass the scheduler tags an allocation marked as reserved. They
are used for house keeping or error handling, so we can safely
ignore them in the scheduler.
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

33931808

mtip32xx: use runtime tag to initialize command header · a4e84aae

Ming Lei authored Apr 27, 2017

mtip32xx supposes that 'request_idx' passed to .init_request()
is tag of the request, and use that as request's tag to initialize
command header.

After MQ IO scheduler is in, request tag assigned isn't same with
the request index anymore, so cause strange hardware failure on
mtip32xx, even whole system panic is triggered.

This patch fixes the issue by initializing command header via
request's real tag.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

a4e84aae

26 Apr, 2017 11 commits

scsi: Implement blk_mq_ops.show_rq() · 0eebd005

Bart Van Assche authored Apr 26, 2017

Show the SCSI CDB for pending SCSI commands in
/sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. An example
of how SCSI commands are displayed by this code:

ffff8801703245c0 {.op=READ, .cmd_flags=META PRIO, .rq_flags=DONTPREP IO_STAT STATS, .tag=14, .internal_tag=-1, .cmd=Read(10) 28 00 2a 81 1b 30 00 00 08 00}
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

0eebd005

blk-mq: Add blk_mq_ops.show_rq() · 2836ee4b

Bart Van Assche authored Apr 26, 2017

This new callback function will be used in the next patch to show
more information about SCSI requests.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

2836ee4b

blk-mq: Show operation, cmd_flags and rq_flags names · 8658dca8

Bart Van Assche authored Apr 26, 2017

Show the operation name, .cmd_flags and .rq_flags as names instead
of numbers.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8658dca8

blk-mq: Make blk_flags_show() callers append a newline character · fd07dc81

Bart Van Assche authored Apr 26, 2017

This patch does not change any functionality but makes it possible
to produce a single line of output with multiple flag-to-name
translations.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

fd07dc81

blk-mq: Move the "state" debugfs attribute one level down · 65ca1ca3

Bart Van Assche authored Apr 26, 2017

Move the "state" attribute from the top level to the "mq" directory
as requested by Omar.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

65ca1ca3

blk-mq: Unregister debugfs attributes earlier · e869b546

Bart Van Assche authored Apr 26, 2017

We currently call blk_mq_free_queue() from blk_cleanup_queue()
before we unregister the debugfs attributes for that queue in
blk_release_queue(). This leaves a window open during which
accessing most of the mq debugfs attributes would cause a
use-after-free. Additionally, the "state" attribute allows
running the queue, which we should not do after the queue has
entered the "dead" state. Fix both cases by unregistering the
debugfs attributes before freeing queue resources starts.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

e869b546

blk-mq: Only unregister hctxs for which registration succeeded · f05d1ba7

Bart Van Assche authored Apr 26, 2017

Hctx unregistration involves calling kobject_del(). kobject_del()
must not be called if kobject_add() has not been called. Hence in
the error path only unregister hctxs for which registration succeeded.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f05d1ba7

blk-mq-debugfs: Rename functions for registering and unregistering the mq directory · 62d6c949

Bart Van Assche authored Apr 26, 2017

Since the blk_mq_debugfs_*register_hctxs() functions register and
unregister all attributes under the "mq" directory, rename these
into blk_mq_debugfs_*register_mq().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

62d6c949

blk-mq: Let blk_mq_debugfs_register() look up the queue name · 4c9e4019

Bart Van Assche authored Apr 26, 2017

A later patch will move the call of blk_mq_debugfs_register() to
a function to which the queue name is not passed as an argument.
To avoid having to add a 'name' argument to multiple callers, let
blk_mq_debugfs_register() look up the queue name.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

4c9e4019

blk-mq: Register <dev>/queue/mq after having registered <dev>/queue · 2d0364c8

Bart Van Assche authored Apr 26, 2017

A later patch in this series will modify blk_mq_debugfs_register()
such that it uses q->kobj.parent to determine the name of a
request queue. Hence make sure that that pointer is initialized
before blk_mq_debugfs_register() is called. To avoid lock inversion,
protect sysfs / debugfs registration with the queue sysfs_lock
instead of the global mutex all_q_mutex.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

2d0364c8

ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset · 1608fd1c

Christoph Hellwig authored Apr 26, 2017

The caller only looks at the scsi_request result field anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

1608fd1c