Commits · c87228f16f0acdf242dee62f139013212208473f · Kirill Smelkov / linux

16 Oct, 2018 13 commits

amiflop: fold headers into C file · c87228f1

Omar Sandoval authored Oct 11, 2018

amifd.h and amifdreg.h are only used from amiflop.c, and they're pretty
small, so move the contents to amiflop.c and get rid of the .h files.
This is preparation for adding a struct blk_mq_tag_set to struct
amiga_floppy_struct.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c87228f1

swim3: convert to blk-mq · 8ccb8cb1

Omar Sandoval authored Oct 15, 2018

Pretty simple conversion. grab_drive() could probably be replaced by
some freeze/quiesce incantation, but I left it alone, and just used
freeze/quiesce for eject. Compile-tested only.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>

Converted to blk_mq_init_sq_queue().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8ccb8cb1

swim3: add real error handling in setup · dbaa54b6

Omar Sandoval authored Oct 11, 2018

The driver doesn't have support for removing a device that has already
been configured, but with more careful ordering we can avoid the need
for that and make sure that we don't leak generic resources.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dbaa54b6

swim: convert to blk-mq · e3896d77

Omar Sandoval authored Oct 15, 2018

The only interesting thing here is that there may be two floppies (i.e.,
request queues) sharing the same controller, so we use the global struct
swim_priv->lock to check whether the controller is busy. Compile-tested
only.
Tested-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>

Converted to blk_mq_init_sq_queue()
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e3896d77

swim: fix cleanup on setup error · 1448a2a5

Omar Sandoval authored Oct 11, 2018

If we fail to allocate the request queue for a disk, we still need to
free that disk, not just the previous ones. Additionally, we need to
cleanup the previous request queues.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1448a2a5

mtd_blkdevs: convert to blk-mq · 891b7c5f

Jens Axboe authored Oct 16, 2018

Straight forward conversion, using an internal list to enable the
driver to pull requests at will.

Dynamically allocate the tag set to avoid having to pull in the
block headers for blktrans.h, since various mtd drivers use
block conflicting names for defines and functions.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

891b7c5f

xsysace: convert to blk-mq · 804186fa

Jens Axboe authored Oct 15, 2018

Straight forward conversion, using an internal list to enable the
driver to pull requests at will.
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

804186fa

paride: convert pf to blk-mq · 77218ddf

Jens Axboe authored Oct 15, 2018

Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

77218ddf

paride: convert pd to blk-mq · 99fe8b02

Jens Axboe authored Oct 15, 2018

Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

99fe8b02

paride: convert pcd to blk-mq · 89c6b165

Jens Axboe authored Oct 15, 2018

Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

89c6b165

ps3disk: convert to blk-mq · fab1adcf

Jens Axboe authored Oct 15, 2018

Convert from the old request_fn style driver to blk-mq.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fab1adcf

blk-mq: provide helper for setting up an SQ queue and tag set · 9316a9ed

Jens Axboe authored Oct 15, 2018

This pattern is repeated throughout all the blk-mq conversions.
Provide a basic helper to get it done.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9316a9ed

null_blk: remove set but not used variable 'q' · de038597

YueHaibing authored Oct 16, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/block/null_blk_main.c: In function 'end_cmd':
drivers/block/null_blk_main.c:609:24: warning:
 variable 'q' set but not used [-Wunused-but-set-variable]

It not used any more after commit
e50b1e32 ("null_blk: remove legacy IO path")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

de038597

14 Oct, 2018 6 commits

cdrom: don't attempt to fiddle with cdo->capability · 8f94004e

Jens Axboe authored Oct 14, 2018

We can't modify cdo->capability as it is defined as a const.
Change the modification hack to just WARN_ON_ONCE() if we hit
any of the invalid combinations.

This fixes a regression for pcd, which doesn't work after the
constify patch.

Fixes: 853fe1bf ("cdrom: Make device operations read-only")
Tested-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8f94004e

block: remove bogus check for queue_lock assignment · 5e27891e

Jens Axboe authored Oct 12, 2018

We just allocated the queue and haven't even set it up yet,
hence we know that checking if ->mq_ops is NULL is always
going to be true.

In fact we do need to assign a lock to ->queue_lock always,
as we need it for the queue flags modifications.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5e27891e

null_blk: remove legacy IO path · e50b1e32

Jens Axboe authored Oct 11, 2018

We're planning on removing this code completely, kill the old
path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e50b1e32

um: Convert ubd driver to blk-mq · 4e6da0fe

Richard Weinberger authored Nov 26, 2017

Convert the driver to the modern blk-mq framework.
As byproduct we get rid of our open coded restart logic and let
blk-mq handle it.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e6da0fe

skd: fixup usage of legacy IO API · 6d1f9dfd

Jens Axboe authored Oct 11, 2018

We need to be using the mq variant of request requeue here.

Fixes: ca33dd92 ("skd: Convert to blk-mq")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6d1f9dfd

aoe: convert aoeblk to blk-mq · 3582dd29

Jens Axboe authored Oct 12, 2018

Straight forward conversion - instead of rewriting the internal buffer
retrieval logic, just replace the previous elevator peeking with an
internal list of requests.
Reviewed-by: "Ed L. Cashin" <ed.cashin@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3582dd29

13 Oct, 2018 5 commits

blk-mq: fallback to previous nr_hw_queues when updating fails · e01ad46d

Jianchao Wang authored Oct 12, 2018

When we try to increate the nr_hw_queues, we may fail due to
shortage of memory or other reason, then blk_mq_realloc_hw_ctxs stops
and some entries in q->queue_hw_ctx are left with NULL. However,
because queue map has been updated with new nr_hw_queues, some cpus
have been mapped to hw queue which just encounters allocation failure,
thus blk_mq_map_queue could return NULL. This will cause panic in
following blk_mq_map_swqueue.

To fix it, when increase nr_hw_queues fails, fallback to previous
nr_hw_queues and post warning. At the same time, driver's .map_queues
usually use completion irq affinity to map hw and cpu, fallback
nr_hw_queues will cause lack of some cpu's map to hw, so use default
blk_mq_map_queues to do that.

Reported-by: syzbot+83e8cbe702263932d9d4@syzkaller.appspotmail.com
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e01ad46d

blk-mq: realloc hctx when hw queue is mapped to another node · 34d11ffa

Jianchao Wang authored Oct 12, 2018

When the hw queues and mq_map are updated, a hctx could be mapped
to a different numa node. At this moment, we need to realloc the
hctx. If fail to do that, go on using previous hctx.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

34d11ffa

blk-mq: change gfp flags to GFP_NOIO in blk_mq_realloc_hw_ctxs · 5b202853

Jianchao Wang authored Oct 12, 2018

blk_mq_realloc_hw_ctxs could be invoked during update hw queues.
At the momemt, IO is blocked. Change the gfp flags from GFP_KERNEL
to GFP_NOIO to avoid forever hang during memory allocation in
blk_mq_realloc_hw_ctxs.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5b202853

blk-mq: adjust debugfs and sysfs register when updating nr_hw_queues · 477e19de

Jianchao Wang authored Oct 12, 2018

blk-mq debugfs and sysfs entries need to be removed before updating
queue map, otherwise, we get get wrong result there. This patch fixes
it and remove the redundant debugfs and sysfs register/unregister
operations during __blk_mq_update_nr_hw_queues.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

477e19de

block, bfq: improve asymmetric scenarios detection · 2d29c9f8

Federico Motta authored Oct 12, 2018

bfq defines as asymmetric a scenario where an active entity, say E
(representing either a single bfq_queue or a group of other entities),
has a higher weight than some other entities.  If the entity E does sync
I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
other entities in the following situation: E is in service but
temporarily has no pending I/O request.  In fact, without this plugging,
all the times that E stops being temporarily idle, it may find the
internal queues of the storage device already filled with an
out-of-control number of extra requests, from other entities. So E may
have to wait for the service of these extra requests, before finally
having its own requests served. This may easily break service
guarantees, with E getting less than its fair share of the device
throughput.  Usually, the end result is that E gets the same fraction of
the throughput as the other entities, instead of getting more, according
to its higher weight.

Yet there are two other more subtle cases where E, even if its weight is
actually equal to or even lower than the weight of any other active
entities, may get less than its fair share of the throughput in case the
above I/O plugging is not performed:
1. other entities issue larger requests than E;
2. other entities contain more active child entities than E (or in
   general tend to have more backlog than E).

In the first case, other entities may get more service than E because
they get larger requests, than those of E, served during the temporary
idle periods of E.  In the second case, other entities get more service
because, by having many child entities, they have many requests ready
for dispatching while E is temporarily idle.

This commit addresses this issue by extending the definition of
asymmetric scenario: a scenario is asymmetric when
- active entities representing bfq_queues have differentiated weights,
  as in the original definition
or (inclusive)
- one or more entities representing groups of entities are active.

This broader definition makes sure that I/O plugging will be performed
in all the above cases, provided that there is at least one active
group.  Of course, this definition is very coarse, so it will trigger
I/O plugging also in cases where it is not needed, such as, e.g.,
multiple active entities with just one child each, and all with the same
I/O-request size.  The reason for this coarse definition is just that a
finer-grained definition would be rather heavy to compute.

On the opposite end, even this new definition does not trigger I/O
plugging in all cases where there is no active group, and all bfq_queues
have the same weight.  So, in these cases some unfairness may occur if
there are asymmetries in I/O-request sizes.  We made this choice because
I/O plugging may lower throughput, and probably a user that has not
created any group cares more about throughput than about perfect
fairness.  At any rate, as for possible applications that may care about
service guarantees, bfq already guarantees a high responsiveness and a
low latency to soft real-time applications automatically.
Signed-off-by: Federico Motta <federico@willer.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2d29c9f8

11 Oct, 2018 2 commits

cfq: clear queue pointers from cfqg after unpinning them in cfq_pd_offline · a2fa8a19

Maciej S. Szmigiero authored Oct 10, 2018

BFQ is already doing a similar thing in its .pd_offline_fn() method
implementation.

While it seems that after commit 4c699480
("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
was reverted leaving these pointers intact no longer causes crashes
clearing them is still a sensible thing to do to make the code more robust.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a2fa8a19

block: describe difference between flags IO_STAT and STATS · 4822e902

Konstantin Khlebnikov authored Oct 11, 2018

This adds reasonable comments, but they definitely needs better names.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4822e902

10 Oct, 2018 2 commits

drivers/block: remove redundant 'default n' from Kconfig-s · 486c6fba

Bartlomiej Zolnierkiewicz authored Oct 09, 2018

'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.

Also since commit f467c564 ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

        config FOO
                bool

        config FOO
                bool
                default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

486c6fba

block: remove redundant 'default n' from Kconfig-s · 1306ad4e

Bartlomiej Zolnierkiewicz authored Oct 09, 2018

'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.

Also since commit f467c564 ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

        config FOO
                bool

        config FOO
                bool
                default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1306ad4e

09 Oct, 2018 12 commits

lightnvm: pblk: guarantee that backpointer is respected on writer stall · 766c8ceb

Javier González authored Oct 09, 2018

pblk's write buffer must guarantee that it respects the device's
constrains for reads (i.e., mw_cunits). This is done by maintaining a
backpointer that updates the L2P table as entries wrap up, making them
point to the media instead of pointing to the write buffer.

This mechanism can race in case that the write thread stalls, as the
write pointer will protect the last written entry, thus disregarding the
read constrains.

This patch adds an extra check on wrap up, making sure that the
threshold is respected at all times, preventing new entries to overwrite
committed data, also in case of write thread stall.
Reported-by: Heiner Litz <hlitz@ucsc.edu>
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Heiner Litz <hlitz@ucsc.edu>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

766c8ceb

lightnvm: pblk: consider max hw sectors supported for max_write_pgs · 8a57fc38

Zhoujie Wu authored Oct 09, 2018

When do GC, the number of read/write sectors are determined
by max_write_pgs(see gc_rq preparation in pblk_gc_line_prepare_ws).

Due to max_write_pgs doesn't consider max hw sectors
supported by nvme controller(128K), which leads to GC
tries to read 64 * 4K in one command, and see below error
caused by pblk_bio_map_addr in function pblk_submit_read_gc.

[ 2923.005376] pblk: could not add page to bio
[ 2923.005377] pblk: could not allocate GC bio (18446744073709551604)
Signed-off-by: Zhoujie Wu <zjwu@marvell.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8a57fc38

lightnvm: pblk: fix error handling of pblk_lines_init() · a70985f8

Wei Yongjun authored Oct 09, 2018

In the too many bad blocks error handling case, we should release all
the allocated resources, otherwise it will cause memory leak.

Fixes: 2deeefc0 ("lightnvm: pblk: fail gracefully on line alloc. failure")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a70985f8

lightnvm: do no update csecs and sos on 1.2 · 6fd05cad

Javier González authored Oct 09, 2018

1.2 devices exposes their data and metadata size through the separate
identify command. Make sure that the NVMe LBA format does not override
these values.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6fd05cad

lightnvm: pblk: guarantee mw_cunits on read buffer · d672d92d

Javier González authored Oct 09, 2018

OCSSD 2.0 defines the amount of data that the host must buffer per chunk
to guarantee reads through the geometry field mw_cunits. This value is
the base that pblk uses to determine the size of its read buffer.
Currently, this size is set to be the closes power-of-2 to mw_cunits
times the number of parallel units available to the pblk instance for
each open line (currently one). When an entry (4KB) is put in the
buffer, the L2P table points to it. As the buffer wraps up, the L2P is
updated to point to addresses on the device, thus guaranteeing mw_cunits
at a chunk level.

However, given that pblk cannot write to the device under ws_min
(normally ws_opt), there might be a window in which the buffer starts
wrapping up and updating L2P entries before the mw_cunits value in a
chunk has been surpassed.

In order not to violate the mw_cunits constrain in this case, account
for ws_opt on the read buffer creation.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d672d92d

lightnvm: pblk: move ring buffer alloc/free rb init · 9bd1f875

Javier González authored Oct 09, 2018

pblk's read/write buffer currently takes a buffer and its size and uses
it to create the metadata around it to use it as a ring buffer. This
puts the responsibility of allocating/freeing ring buffer memory on the
ring buffer user. Instead, move it inside of the ring buffer helpers
(pblk-rb.c). This simplifies creation/destruction routines.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9bd1f875

lightnvm: pblk: encapsulate rb pointer operations · 40b8657d

Javier González authored Oct 09, 2018

pblk's read/write buffer is always a power-of-2, thus wrapping up the
buffer can be done with a bit mask. Since this is an implementation
detail internal to the write buffer, make a helper that hides pointer
increment + wrap, and allows to transparently relax this assumption in
the future.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

40b8657d

lightnvm: pblk: remove unused function · dde4aac2

Javier González authored Oct 09, 2018

Removed unused function in pblk-rb.c
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dde4aac2

lightnvm: pblk: fix race on sysfs line state · 44cdbdc6

Javier González authored Oct 09, 2018

pblk exposes a sysfs interface that represents its internal state. Part
of this state is the map bitmap for the current open line, which should
be protected by the line lock to avoid a race when freeing the line
metadata. Currently, it is not.

This patch makes sure that the line state is consistent and NULL
bitmap pointers are not dereferenced.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

44cdbdc6

lightnvm: pblk: add SPDX license tag · 02a1520d

Javier González authored Oct 09, 2018

Add GLP-2.0 SPDX license tag to all pblk files
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02a1520d

lightnvm: pblk: recover open lines on 2.0 devices · 6ad2f619

Javier González authored Oct 09, 2018

In the OCSSD 2.0 spec, each chunk reports its write pointer. This means
that pblk does not need to scan open lines to find the write pointer,
but instead, it can retrieve it directly (and verify it).

This patch uses the write pointer on open lines to (i) recover the line
up until the last written lba and (ii) reconstruct the map bitmap and
rest of line metadata so that the line can be used for new data.

Since the 1.2 path in lightnvm core has been re-implemented to populate
the chunk structure and thus recover the write pointer on
initialization, this patch removes 1.2 specific recovery, as the 2.0
path can be reused.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6ad2f619

lightnvm: pblk: take write semaphore on metadata · 253babc3

Javier González authored Oct 09, 2018

pblk guarantees write ordering at a chunk level through a per open chunk
semaphore. At this point, since we only have an open I/O stream for both
user and GC data, the semaphore is per parallel unit.

For the metadata I/O that is synchronous, the semaphore is not needed as
ordering is guaranteed. However, if the metadata scheme changes or
multiple streams are open, this guarantee might not be preserved.

This patch makes sure that all writes go through the semaphore, even for
synchronous I/O. This is consistent with pblk's write I/O model. It also
simplifies maintenance since changes in the metadata scheme could cause
ordering issues.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

253babc3