Commits · 785d584c30ffc1224027536fe55bdc15ee509f14 · Kirill Smelkov / linux

27 Oct, 2021 1 commit

nvme: add new discovery log page entry definitions · 785d584c

Hannes Reinecke authored Oct 18, 2021

TP8014 adds a new SUBTYPE value and a new field EFLAGS for the
discovery log page entry.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

785d584c

25 Oct, 2021 1 commit

block: ataflop: more blk-mq refactoring fixes · d28e4dff

Michael Schmitz authored Oct 24, 2021

As it turns out, my earlier patch in commit 86d46fda (block:
ataflop: fix breakage introduced at blk-mq refactoring) was
incomplete. This patch fixes any remaining issues found during
more testing and code review.

Requests exceeding 4 k are handled in 4k segments but
__blk_mq_end_request() is never called on these (still
sectors outstanding on the request). With redo_fd_request()
removed, there is no provision to kick off processing of the
next segment, causing requests exceeding 4k to hang. (By
setting /sys/block/fd0/queue/max_sectors_k <= 4 as workaround,
this behaviour can be avoided).

Instead of reintroducing redo_fd_request(), requeue the remainder
of the request by calling blk_mq_requeue_request() on incomplete
requests (i.e. when blk_update_request() still returns true), and
rely on the block layer to queue the residual as new request.

Both error handling and formatting needs to release the
ST-DMA lock, so call finish_fdc() on these (this was previously
handled by redo_fd_request()). finish_fdc() may be called
legitimately without the ST-DMA lock held - make sure we only
release the lock if we actually held it. In a similar way,
early exit due to errors in ataflop_queue_rq() must release
the lock.

After minor errors, fd_error sets up to recalibrate the drive
but never re-runs the current operation (another task handled by
redo_fd_request() before). Call do_fd_action() to get the next
steps (seek, retry read/write) underway.
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Fixes: 6ec3938c (ataflop: convert to blk-mq)
CC: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d28e4dff

22 Oct, 2021 1 commit

block: remove support for cryptoloop and the xor transfer · 47e96246

Christoph Hellwig authored Oct 19, 2021

Support for cyrptoloop has been officially marked broken and deprecated
in favor of dm-crypt (which supports the same broken algorithms if
needed) in Linux 2.6.4 (released in March 2004), and support for it has
been entirely removed from losetup in util-linux 2.23 (released in April
2013). The XOR transfer has never been more than a toy to demonstrate
the transfer in the bad old times of crypto export restrictions.
Remove them as they have some nasty interactions with loop device life
times due to the iteration over all loop devices in
loop_unregister_transfer.
Suggested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019075639.2333969-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

47e96246

21 Oct, 2021 9 commits

mtd: add add_disk() error handling · 83b863f4

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-10-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

83b863f4

rnbd: add error handling support for add_disk() · 2e9e31be

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-9-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2e9e31be

um/drivers/ubd_kern: add error handling support for add_disk() · 66638f16

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

ubd_disk_register() never returned an error, so just fix
that now and let the caller handle the error condition.
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-8-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

66638f16

m68k/emu/nfblock: add error handling support for add_disk() · 21fd880d

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-7-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

21fd880d

xen-blkfront: add error handling support for add_disk() · 293a7c52

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on device_add_disk() as this function
returned void. Now that this is fixed, use the shiny new error
handling. The function xlvbd_alloc_gendisk() typically does the
unwinding on error on allocating the disk and creating the tag,
but since all that error handling was stuffed inside
xlvbd_alloc_gendisk() we must repeat the tag free'ing as well.

We set the info->rq to NULL to ensure blkif_free() doesn't crash
on blk_mq_stop_hw_queues() on device_add_disk() error as the queue
will be long gone by then.
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-6-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

293a7c52

bcache: add error handling support for add_disk() · 2961c3bb

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

This driver doesn't do any unwinding with blk_cleanup_disk()
even on errors after add_disk() and so we follow that
tradition.
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-5-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2961c3bb

dm: add add_disk() error handling · e7089f65

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

There are two calls to dm_setup_md_queue() which can fail then,
one on dm_early_create() and we can easily see that the error path
there calls dm_destroy in the error path. The other use case is on
the ioctl table_load case. If that fails userspace needs to call
the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other
failure.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

e7089f65

block: aoe: fixup coccinelle warnings · ff06ed7e

Ye Guojin authored Oct 21, 2021

coccicheck complains about the use of snprintf() in sysfs show
functions:
WARNING  use scnprintf or sprintf

Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211021064931.1047687-1-ye.guojin@zte.com.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

ff06ed7e

Merge tag 'nvme-5.16-2021-10-21' of git://git.infradead.org/nvme into for-5.16/drivers · cbab6ae0

Jens Axboe authored Oct 21, 2021

Pull NVMe updates from Christoph:

"nvme updates for Linux 5.16

 - fix a multipath partition scanning deadlock (Hannes Reinecke)
 - generate uevent once a multipath namespace is operational again
   (Hannes Reinecke)
 - support unique discovery controller NQNs (Hannes Reinecke)
 - fix use-after-free when a port is removed (Israel Rukshin)
 - clear shadow doorbell memory on resets (Keith Busch)
 - use struct_size (Len Baker)
 - add error handling support for add_disk (Luis Chamberlain)
 - limit the maximal queue size for RDMA controllers (Max Gurtovoy)
 - use a few more symbolic names (Max Gurtovoy)
 - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy)
 - add support for ->map_queues on FC (Saurav Kashyap)"

* tag 'nvme-5.16-2021-10-21' of git://git.infradead.org/nvme: (23 commits)
  nvmet: use struct_size over open coded arithmetic
  nvme: drop scan_lock and always kick requeue list when removing namespaces
  nvme-pci: clear shadow doorbell memory on resets
  nvme-rdma: fix error code in nvme_rdma_setup_ctrl
  nvme-multipath: add error handling support for add_disk()
  nvmet: use macro definitions for setting cmic value
  nvmet: use macro definition for setting nmic value
  nvme: display correct subsystem NQN
  nvme: Add connect option 'discovery'
  nvme: expose subsystem type in sysfs attribute 'subsystype'
  nvmet: set 'CNTRLTYPE' in the identify controller data
  nvmet: add nvmet_is_disc_subsys() helper
  nvme: add CNTRLTYPE definitions for 'identify controller'
  nvmet: make discovery NQN configurable
  nvmet-rdma: implement get_max_queue_size controller op
  nvmet: add get_max_queue_size op for controllers
  nvme-rdma: limit the maximal queue size for RDMA controllers
  nvmet-tcp: fix use-after-free when a port is removed
  nvmet-rdma: fix use-after-free when a port is removed
  nvmet: fix use-after-free when a port is removed
  ...

cbab6ae0

20 Oct, 2021 28 commits

nvmet: use struct_size over open coded arithmetic · 117d5b6d

Len Baker authored Oct 17, 2021

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

In this case this is not actually dynamic size: all the operands
involved in the calculation are constant values. However it is better to
refactor this anyway, just to keep the open-coded math idiom out of
code.

So, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kmalloc() function.

This code was detected with the help of Coccinelle and audited and fixed
manually.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-argumentsSigned-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

117d5b6d

nvme: drop scan_lock and always kick requeue list when removing namespaces · 2b81a5f0

Hannes Reinecke authored Oct 20, 2021

When reading the partition table on initial scan hits an I/O error the
I/O will hang with the scan_mutex held:

[<0>] do_read_cache_page+0x49b/0x790
[<0>] read_part_sector+0x39/0xe0
[<0>] read_lba+0xf9/0x1d0
[<0>] efi_partition+0xf1/0x7f0
[<0>] bdev_disk_changed+0x1ee/0x550
[<0>] blkdev_get_whole+0x81/0x90
[<0>] blkdev_get_by_dev+0x128/0x2e0
[<0>] device_add_disk+0x377/0x3c0
[<0>] nvme_mpath_set_live+0x130/0x1b0 [nvme_core]
[<0>] nvme_mpath_add_disk+0x150/0x160 [nvme_core]
[<0>] nvme_alloc_ns+0x417/0x950 [nvme_core]
[<0>] nvme_validate_or_alloc_ns+0xe9/0x1e0 [nvme_core]
[<0>] nvme_scan_work+0x168/0x310 [nvme_core]
[<0>] process_one_work+0x231/0x420

and trying to delete the controller will deadlock as it tries to grab
the scan mutex:

[<0>] nvme_mpath_clear_ctrl_paths+0x25/0x80 [nvme_core]
[<0>] nvme_remove_namespaces+0x31/0xf0 [nvme_core]
[<0>] nvme_do_delete_ctrl+0x4b/0x80 [nvme_core]

As we're now properly ordering the namespace list there is no need to
hold the scan_mutex in nvme_mpath_clear_ctrl_paths() anymore.
And we always need to kick the requeue list as the path will be marked
as unusable and I/O will be requeued _without_ a current path.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2b81a5f0

nvme-pci: clear shadow doorbell memory on resets · 58847f12

Keith Busch authored Oct 14, 2021

The host memory doorbell and event buffers need to be initialized on
each reset so the driver doesn't observe stale values from the previous
instantiation.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Tested-by: John Levon <john.levon@nutanix.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

58847f12

nvme-rdma: fix error code in nvme_rdma_setup_ctrl · 09748122

Max Gurtovoy authored Oct 17, 2021

In case that icdoff is not zero or mandatory keyed sgls are not
supported by the NVMe/RDMA target, we'll go to error flow but we'll
return 0 to the caller. Fix it by returning an appropriate error code.

Fixes: c66e2998 ("nvme-rdma: centralize controller setup sequence")
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

09748122

nvme-multipath: add error handling support for add_disk() · 11384580

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Since we now can tell for sure when a disk was added, move
setting the bit NVME_NSHEAD_DISK_LIVE only when we did
add the disk successfully.

Nothing to do here as the cleanup is done elsewhere. We take
care and use test_and_set_bit() because it is protects against
two nvme paths simultaneously calling device_add_disk() on the
same namespace head.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

11384580

nvmet: use macro definitions for setting cmic value · d56ae18f

Max Gurtovoy authored Sep 23, 2021

This makes the code more readable.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d56ae18f

nvmet: use macro definition for setting nmic value · 571b5444

Max Gurtovoy authored Sep 23, 2021

This makes the code more readable.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

571b5444

nvme: display correct subsystem NQN · e5ea42fa

Hannes Reinecke authored Sep 22, 2021

With discovery controllers supporting unique subsystem NQNs the
actual subsystem NQN might be different from that one passed in
via the connect args. So add a helper to display the resulting
subsystem NQN.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e5ea42fa

nvme: Add connect option 'discovery' · 20e8b689

Hannes Reinecke authored Sep 22, 2021

Add a connect option 'discovery' to specify that the connection
should be made to a discovery controller, not a normal I/O controller.
With discovery controllers supporting unique subsystem NQNs we
cannot easily distinguish by the subsystem NQN if this should be
a discovery connection, but we need this information to blank out
options not supported by discovery controllers.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

20e8b689

nvme: expose subsystem type in sysfs attribute 'subsystype' · 954ae166

Hannes Reinecke authored Sep 22, 2021

With unique discovery controller NQNs we cannot distinguish the
subsystem type by the NQN alone, but need to check the subsystem
type, too.
So expose the subsystem type in a new sysfs attribute 'subsystype'.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

954ae166

nvmet: set 'CNTRLTYPE' in the identify controller data · d3aef701

Hannes Reinecke authored Sep 22, 2021

Set the correct 'CNTRLTYPE' field in the identify controller data.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d3aef701

nvmet: add nvmet_is_disc_subsys() helper · a294711e

Hannes Reinecke authored Sep 22, 2021

Add a helper function to determine if a given subsystem is a discovery
subsystem.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a294711e

nvme: add CNTRLTYPE definitions for 'identify controller' · e15a8a97

Hannes Reinecke authored Sep 22, 2021

Update the 'identify controller' structure to define the newly added
CNTRLTYPE field.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e15a8a97

nvmet: make discovery NQN configurable · 626851e9

Hannes Reinecke authored Sep 22, 2021

TPAR8013 allows for unique discovery NQNs, so make the discovery
controller NQN configurable by exposing a subsys attribute
'discovery_nqn'.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

626851e9

nvmet-rdma: implement get_max_queue_size controller op · c7d792f9

Max Gurtovoy authored Sep 23, 2021

Limit the maximal queue size for RDMA controllers. Today, the target
reports a limit of 1024 and this limit isn't valid for some of the RDMA
based controllers. For now, limit RDMA transport to 128 entries (the
max queue depth configured for Linux NVMe/RDMA host).

Future general solution should use RDMA/core API to calculate this size
according to device capabilities and number of WRs needed per NVMe IO
request.
Reported-by: Mark Ruijter <mruijter@primelogic.nl>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c7d792f9

nvmet: add get_max_queue_size op for controllers · 6d1555cc

Max Gurtovoy authored Sep 23, 2021

Some transports, such as RDMA, would like to set the queue size
according to device/port/ctrl characteristics. Add a new nvmet transport
op that is called during ctrl initialization. This will not effect
transports that don't implement this option.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

6d1555cc

nvme-rdma: limit the maximal queue size for RDMA controllers · 44c3c625

Max Gurtovoy authored Sep 23, 2021

Corrent limit of 1024 isn't valid for some of the RDMA based ctrls. In
case the target expose a cap of larger amount of entries (e.g. 1024),
the initiator may fail to create a QP with this size. Thus limit to a
value that works for all RDMA adapters.

Future general solution should use RDMA/core API to calculate this size
according to device capabilities and number of WRs needed per NVMe IO
request.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

44c3c625

nvmet-tcp: fix use-after-free when a port is removed · 2351ead9

Israel Rukshin authored Oct 06, 2021

When removing a port, all its controllers are being removed, but there
are queues on the port that doesn't belong to any controller (during
connection time). This causes a use-after-free bug for any command
that dereferences req->port (like in nvmet_alloc_ctrl). Those queues
should be destroyed before freeing the port via configfs. Destroy
the remaining queues after the accept_work was cancelled guarantees
that no new queue will be created.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2351ead9

nvmet-rdma: fix use-after-free when a port is removed · fcf73a80

Israel Rukshin authored Oct 06, 2021

When removing a port, all its controllers are being removed, but there
are queues on the port that doesn't belong to any controller (during
connection time). This causes a use-after-free bug for any command
that dereferences req->port (like in nvmet_alloc_ctrl). Those queues
should be destroyed before freeing the port via configfs. Destroy the
remaining queues after the RDMA-CM was destroyed guarantees that no
new queue will be created.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

fcf73a80

nvmet: fix use-after-free when a port is removed · e3e19dcc

Israel Rukshin authored Oct 06, 2021

When a port is removed through configfs, any connected controllers
are starting teardown flow asynchronously and can still send commands.
This causes a use-after-free bug for any command that dereferences
req->port (like in nvmet_parse_io_cmd).

To fix this, wait for all the teardown scheduled works to complete
(like release_work at rdma/tcp drivers). This ensures there are no
active controllers when the port is eventually removed.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e3e19dcc

qla2xxx: add ->map_queues support for nvme · 2b2af50a

Saurav Kashyap authored Aug 23, 2021

Implement ->map queues and use the block layer blk_mq_pci_map_queues
helper for mapping queues to CPUs.

With this mapping minimum 10%+ increase in performance is noticed.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2b2af50a

nvme-fc: add support for ->map_queues · 01d83816

Saurav Kashyap authored Aug 23, 2021

NVMe FC don't have support for map queues, unlike the PCI, RDMA and TCP
transports.  Add a ->map_queues callout for the LLDDs to provide such
functionality.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

01d83816

nvme: generate uevent once a multipath namespace is operational again · f6f09c15

Hannes Reinecke authored Oct 06, 2021

When fast_io_fail_tmo is set I/O will be aborted while recovery is
still ongoing. This causes MD to set the namespace to failed, and
no futher I/O will be submitted to that namespace.

However, once the recovery succeeds and the namespace becomes
operational again the NVMe subsystem doesn't send a notification,
so MD cannot automatically reinstate operation and requires
manual interaction.

This patch will send a KOBJ_CHANGE uevent per multipathed namespace
once the underlying controller transitions to LIVE, allowing an automatic
MD reassembly with these udev rules:

/etc/udev/rules.d/65-md-auto-re-add.rules:
SUBSYSTEM!="block", GOTO="md_end"

ACTION!="change", GOTO="md_end"
ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="md_end"
PROGRAM="/sbin/md_raid_auto_readd.sh $devnode"
LABEL="md_end"

/sbin/md_raid_auto_readd.sh:

MDADM=/sbin/mdadm
DEVNAME=$1

export $(${MDADM} --examine --export ${DEVNAME})

if [ -z "${MD_UUID}" ]; then
    exit 1
fi

UUID_LINK=$(readlink /dev/disk/by-id/md-uuid-${MD_UUID})
MD_DEVNAME=${UUID_LINK##*/}
export $(${MDADM} --detail --export /dev/${MD_DEVNAME})
if [ -z "${MD_METADATA}" ] ; then
    exit 1
fi
if [ $(cat /sys/block/${MD_DEVNAME}/md/degraded) != 1 ]; then
    echo "${MD_DEVNAME}: array not degraded, nothing to do"
    exit 0
fi
MD_STATE=$(cat /sys/block/${MD_DEVNAME}/md/array_state)
if [ ${MD_STATE} != "clean" ] ; then
    echo "${MD_DEVNAME}: array state ${MD_STATE}, cannot re-add"
    exit 1
fi
MD_VARNAME="MD_DEVICE_dev_${DEVNAME##*/}_ROLE"
if [ ${!MD_VARNAME} = "spare" ] ; then
    ${MDADM} --manage /dev/${MD_DEVNAME} --re-add ${DEVNAME}
fi

Changes to v2:
- Add udev rules example to description
Changes to v1:
- use disk_uevent() as suggested by hch
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

f6f09c15

bcache: remove bch_crc64_update · 39fa7a95

Christoph Hellwig authored Oct 20, 2021

bch_crc64_update is an entirely pointless wrapper around crc64_be.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-9-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

39fa7a95

bcache: use bvec_kmap_local in bch_data_verify · 00387bd2

Christoph Hellwig authored Oct 20, 2021

Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Also switch from page_address to bvec_kmap_local for cbv to be on the
safe side and to avoid pointlessly poking into bvec internals.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-8-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

00387bd2

bcache: remove the backing_dev_name field from struct cached_dev · 0f5cd781

Christoph Hellwig authored Oct 20, 2021

Just use the %pg format specifier to print the name directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-7-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

0f5cd781

bcache: remove the cache_dev_name field from struct cache · 7e84c215

Christoph Hellwig authored Oct 20, 2021

Just use the %pg format specifier to print the name directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-6-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

7e84c215

bcache: move calc_cached_dev_sectors to proper place on backing device detach · 0259d449

Lin Feng authored Oct 20, 2021

Calculation of cache_set's cached sectors is done by travelling
cached_devs list as shown below:

static void calc_cached_dev_sectors(struct cache_set *c)
{
...
        list_for_each_entry(dc, &c->cached_devs, list)
                sectors += bdev_sectors(dc->bdev);

        c->cached_dev_sectors = sectors;
}

But cached_dev won't be unlinked from c->cached_devs list until we call
following list_move(&dc->list, &uncached_devices),
so previous fix in 'commit 46010141
("bcache: recal cached_dev_sectors on detach")' is wrong, now we move
it to its right place.
Signed-off-by: Lin Feng <linf@wangsu.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-5-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

0259d449