Commits · ceaf2966ab082bbc4d26516f97b3ca8a676e2af8 · Kirill Smelkov / linux

26 Apr, 2021 2 commits

async_xor: increase src_offs when dropping destination page · ceaf2966

Xiao Ni authored Apr 25, 2021

Now we support sharing one page if PAGE_SIZE is not equal stripe size. To
support this, it needs to support calculating xor value with different
offsets for each r5dev. One offset array is used to record those offsets.

In RMW mode, parity page is used as a source page. It sets
ASYNC_TX_XOR_DROP_DST before calculating xor value in ops_run_prexor5.
So it needs to add src_list and src_offs at the same time. Now it only
needs src_list. So the xor value which is calculated is wrong. It can
cause data corruption problem.

I can reproduce this problem 100% on a POWER8 machine. The steps are:

  mdadm -CR /dev/md0 -l5 -n3 /dev/sdb1 /dev/sdc1 /dev/sdd1 --size=3G
  mkfs.xfs /dev/md0
  mount /dev/md0 /mnt/test
  mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.

Fixes: 29bcff78 ("md/raid5: add new xor function to support different page offset")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>

ceaf2966

drivers/block/null_blk/main: Fix a double free in null_init. · 72ce11dd

Lv Yunlong authored Apr 26, 2021

In null_init, null_add_dev(dev) is called.
In null_add_dev, it calls null_free_zoned_dev(dev) to free dev->zones
via kvfree(dev->zones) in out_cleanup_zone branch and returns err.
Then null_init accept the err code and then calls null_free_dev(dev).

But in null_free_dev(dev), dev->zones is freed again by
null_free_zoned_dev().

My patch set dev->zones to NULL in null_free_zoned_dev() after
kvfree(dev->zones) is called, to avoid the double free.

Fixes: 2984c868 ("nullb: factor disk parameters")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Link: https://lore.kernel.org/r/20210426143229.7374-1-lyl2019@mail.ustc.edu.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

72ce11dd

23 Apr, 2021 3 commits

Merge branch 'md-next' of... · b8417f72

Jens Axboe authored Apr 23, 2021

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers

Pull MD fixes from Song.

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid1: properly indicate failure when ending a failed write request
  md-cluster: fix use-after-free issue when removing rdev

b8417f72

md/raid1: properly indicate failure when ending a failed write request · 2417b986

Paul Clements authored Apr 15, 2021

This patch addresses a data corruption bug in raid1 arrays using bitmaps.
Without this fix, the bitmap bits for the failed I/O end up being cleared.

Since we are in the failure leg of raid1_end_write_request, the request
either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).

Fixes: eeba6809 ("md/raid1: end bio when the device faulty")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Paul Clements <paul.clements@us.sios.com>
Signed-off-by: Song Liu <song@kernel.org>

2417b986

md-cluster: fix use-after-free issue when removing rdev · f7c7a2f9

Heming Zhao authored Apr 08, 2021

md_kick_rdev_from_array will remove rdev, so we should
use rdev_for_each_safe to search list.

How to trigger:

env: Two nodes on kvm-qemu x86_64 VMs (2C2G with 2 iscsi luns).

```
node2=192.168.0.3

for i in {1..20}; do
    echo ==== $i `date` ====;

    mdadm -Ss && ssh ${node2} "mdadm -Ss"
    wipefs -a /dev/sda /dev/sdb

    mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l 1 /dev/sda \
       /dev/sdb --assume-clean
    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
    mdadm --wait /dev/md0
    ssh ${node2} "mdadm --wait /dev/md0"

    mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
    sleep 1
done
```

Crash stack:

```
stack segment: 0000 [#1] SMP
... ...
RIP: 0010:md_check_recovery+0x1e8/0x570 [md_mod]
... ...
RSP: 0018:ffffb149807a7d68 EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffff9d494c180800 RCX: ffff9d490fc01e50
RDX: fffff047c0ed8308 RSI: 0000000000000246 RDI: 0000000000000246
RBP: 6b6b6b6b6b6b6b6b R08: ffff9d490fc01e40 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff9d494c180818 R14: ffff9d493399ef38 R15: ffff9d4933a1d800
FS:  0000000000000000(0000) GS:ffff9d494f700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe68cab9010 CR3: 000000004c6be001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 raid1d+0x5c/0xd40 [raid1]
 ? finish_task_switch+0x75/0x2a0
 ? lock_timer_base+0x67/0x80
 ? try_to_del_timer_sync+0x4d/0x80
 ? del_timer_sync+0x41/0x50
 ? schedule_timeout+0x254/0x2d0
 ? md_start_sync+0xe0/0xe0 [md_mod]
 ? md_thread+0x127/0x160 [md_mod]
 md_thread+0x127/0x160 [md_mod]
 ? wait_woken+0x80/0x80
 kthread+0x10d/0x130
 ? kthread_park+0xa0/0xa0
 ret_from_fork+0x1f/0x40
```

Fixes: dbb64f86 ("md-cluster: Fix adding of new disk with new reload code")
Fixes: 659b254f ("md-cluster: remove a disk asynchronously from cluster environment")
Cc: stable@vger.kernel.org
Reviewed-by: Gang He <ghe@suse.com>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>

f7c7a2f9

22 Apr, 2021 2 commits

Merge tag 'nvme-5.13-2021-04-22' of git://git.infradead.org/nvme into for-5.13/drivers · 87d9ad02

Jens Axboe authored Apr 22, 2021

Pull NVMe updates from Christoph:

"- add support for a per-namespace character device (Minwoo Im)
 - various KATO fixes and cleanups (Hou Pu, Hannes Reinecke)
 - APST fix and cleanup"

* tag 'nvme-5.13-2021-04-22' of git://git.infradead.org/nvme:
  nvme: introduce generic per-namespace chardev
  nvme: cleanup nvme_configure_apst
  nvme: do not try to reconfigure APST when the controller is not live
  nvme: add 'kato' sysfs attribute
  nvme: sanitize KATO setting
  nvmet: avoid queuing keep-alive timer if it is disabled

87d9ad02

nvme: introduce generic per-namespace chardev · 2637baed

Minwoo Im authored Apr 21, 2021

Userspace has not been allowed to I/O to device that's failed to
be initialized.  This patch introduces generic per-namespace character
device to allow userspace to I/O regardless the block device is there or
not.

The chardev naming convention will similar to the existing blkdev naming,
using a ng prefix instead of nvme, i.e.

	- /dev/ngXnY

It also supports multipath which means it will not expose chardev for the
hidden namespace blkdevs (e.g., nvmeXcYnZ).  If /dev/ngXnY is created for
a ns_head, then I/O request will be routed to a specific controller
selected by the iopolicy of the subsystem.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2637baed

21 Apr, 2021 8 commits

nvme: cleanup nvme_configure_apst · 60df5de9

Christoph Hellwig authored Apr 09, 2021

Remove a level of indentation from the main code implementating the table
search by using a goto for the APST not supported case.  Also move the
main comment above the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

60df5de9

nvme: do not try to reconfigure APST when the controller is not live · 53fe2a30

Christoph Hellwig authored Apr 09, 2021

Do not call nvme_configure_apst when the controller is not live, given
that nvme_configure_apst will fail due the lack of an admin queue when
the controller is being torn down and nvme_set_latency_tolerance is
called from dev_pm_qos_hide_latency_tolerance.

Fixes: 510a405d("nvme: fix memory leak for power latency tolerance")
Reported-by: Peng Liu <liupeng17@lenovo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>

53fe2a30

nvme: add 'kato' sysfs attribute · 74c22990

Hannes Reinecke authored Apr 16, 2021

Add a 'kato' controller sysfs attribute to display the current
keep-alive timeout value (if any). This allows userspace to identify
persistent discovery controllers, as these will have a non-zero
KATO value.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

74c22990

nvme: sanitize KATO setting · a70b81bd

Hannes Reinecke authored Apr 16, 2021

According to the NVMe base spec the KATO commands should be sent
at half of the KATO interval, to properly account for round-trip
times.
As we now will only ever send one KATO command per connection we
can easily use the recommended values.
This also fixes a potential issue where the request timeout for
the KATO command does not match the value in the connect command,
which might be causing spurious connection drops from the target.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a70b81bd

nvmet: avoid queuing keep-alive timer if it is disabled · 8f864c59

Hou Pu authored Apr 16, 2021

Issue following command:
nvme set-feature -f 0xf -v 0 /dev/nvme1n1 # disable keep-alive timer
nvme admin-passthru -o 0x18 /dev/nvme1n1  # send keep-alive command
will make keep-alive timer fired and thus delete the controller like
below:

[247459.907635] nvmet: ctrl 1 keep-alive timer (0 seconds) expired!
[247459.930294] nvmet: ctrl 1 fatal error occurred!

Avoid this by not queuing delayed keep-alive if it is disabled when
keep-alive command is received from the admin queue.
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

8f864c59

brd: expose number of allocated pages in debugfs · f4be591f

Calvin Owens authored Apr 16, 2021

While the maximum size of each ramdisk is defined either as a module
parameter, or compile time default, it's impossible to know how many pages
have currently been allocated by each ram%d device, since they're
allocated when used and never freed.

This patch creates a new directory at this location:

/sys/kernel/debug/ramdisk_pages/

which will contain a file named "ram%d" for each instantiated ramdisk on
the system. The file is read-only, and read() will output the number of
pages currently held by that ramdisk.

We lose track how much memory a ramdisk is using as pages once used are
simply recycled but never freed.

In instances where we exhaust the size of the ramdisk with a file that
exceeds it, encounter ENOSPC and delete the file for mitigation; df would
show decrease in used and increase in available blocks but the since we
have touched all pages, the memory footprint of the ramdisk does not
reflect the blocks used/available count

...
[root@localhost ~]# mkfs.ext2 /dev/ram15
mke2fs 1.45.6 (20-Mar-2020)
Creating filesystem with 4096 1k blocks and 1024 inodes
[root@localhost ~]# mount /dev/ram15 /mnt/ram15/

[root@localhost ~]# cat
/sys/kernel/debug/ramdisk_pages/ram15
58
[root@kerneltest008.06.prn3 ~]# df /dev/ram15
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/ram15          3963    31      3728   1% /mnt/ram15
[root@kerneltest008.06.prn3 ~]# dd if=/dev/urandom of=/mnt/ram15/test2
bs=1M count=5
dd: error writing '/mnt/ram15/test2': No space left on device
4+0 records in
3+0 records out
4005888 bytes (4.0 MB, 3.8 MiB) copied, 0.0446614 s, 89.7 MB/s
[root@kerneltest008.06.prn3 ~]# df /mnt/ram15/
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/ram15          3963  3960         0 100% /mnt/ram15
[root@kerneltest008.06.prn3 ~]# cat
/sys/kernel/debug/ramdisk_pages/ram15
1024
[root@kerneltest008.06.prn3 ~]# rm /mnt/ram15/test2
rm: remove regular file '/mnt/ram15/test2'? y
[root@kerneltest008.06.prn3 /var]# df /dev/ram15
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/ram15          3963    31      3728   1% /mnt/ram15

# Acutal memory footprint
[root@kerneltest008.06.prn3 /var]# cat
/sys/kernel/debug/ramdisk_pages/ram15
1024
...

This debugfs counter will always reveal the accurate number of
permanently allocated pages to the ramdisk.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
[cleaned up the !CONFIG_DEBUG_FS case and API changes for HEAD]
Signed-off-by: Kyle McMartin <jkkm@fb.com>
[rebased]
Signed-off-by: Saravanan D <saravanand@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f4be591f

ataflop: fix off by one in ataflop_probe() · b777f4c4

Dan Carpenter authored Apr 21, 2021

Smatch complains that the "type > NUM_DISK_MINORS" should be >=
instead of >.  We also need to subtract one from "type" at the start.

Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b777f4c4

ataflop: potential out of bounds in do_format() · 1ffec389

Dan Carpenter authored Apr 21, 2021

The function uses "type" as an array index:

	q = unit[drive].disk[type]->queue;

Unfortunately the bounds check on "type" isn't done until later in the
function.  Fix this by moving the bounds check to the start.

Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1ffec389

20 Apr, 2021 25 commits

drbd: Fix fall-through warnings for Clang · 6327c911

Gustavo A. R. Silva authored Apr 20, 2021

In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding a break statement instead of just
letting the code fall through to the next, and by adding a fallthrough
pseudo-keyword in places whre the code is intended to fall through.

Link: https://github.com/KSPP/linux/issues/115Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6327c911

block/rnbd: Use strscpy instead of strlcpy · 57b93ed4

Dima Stepanov authored Apr 19, 2021

During checkpatch analyzing the following warning message was found:
WARNING:STRLCPY: Prefer strscpy over strlcpy - see:
https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Fix it by using strscpy calls instead of strlcpy.
Signed-off-by: Dima Stepanov <dmitrii.stepanov@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-20-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

57b93ed4

block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name · 3db7cf55

Dima Stepanov authored Apr 19, 2021

cppcheck report the following error:
  rnbd/rnbd-clt-sysfs.c:522:36: error: The variable 'buf' is used both
  as a parameter and as destination in snprintf(). The origin and
  destination buffers overlap. Quote from glibc (C-library)
  documentation
  (http://www.gnu.org/software/libc/manual/html_mono/libc.html#Formatted-Output-Functions):
  "If copying takes place between objects that overlap as a result of a
  call to sprintf() or snprintf(), the results are undefined."
  [sprintfOverlappingData]
Fix it by initializing the buf variable in the first snprintf call.

Fixes: 91f4acb2 ("block/rnbd-clt: support mapping two devices")
Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-19-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

3db7cf55

block/rnbd-clt: Remove max_segment_size · 503438a4

Jack Wang authored Apr 19, 2021

We always map with SZ_4K, so do not need max_segment_size.

Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-18-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

503438a4

block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes · 3ba1c693

Md Haris Iqbal authored Apr 19, 2021

When an RTRS session state changes, the transport layer generates an event
to RNBD. Then RNBD will change the state of the RNBD client device
accordingly.

This commit add kobject_uevent when the RNBD device state changes. With
this udev rules can be configured to react accordingly.
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-17-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

3ba1c693

block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev · c81cba85

Gioh Kim authored Apr 19, 2021

struct rtrs_srv is not used when handling rnbd_srv_rdma_ev messages, so
cleaned up
rdma_ev function pointer in rtrs_srv_ops also is changed.

Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-16-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

c81cba85

Documentation/ABI/rnbd-clt: Add description for nr_poll_queues · 015fcf13

Gioh Kim authored Apr 19, 2021

describe how to set nr_poll_queues and enable the polling
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-15-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

015fcf13

block/rnbd-clt: Support polling mode for IO latency optimization · 2958a995

Gioh Kim authored Apr 19, 2021

RNBD can make double-queues for irq-mode and poll-mode.
For example, on 4-CPU system 8 request-queues are created,
4 for irq-mode and 4 for poll-mode.
If the IO has HIPRI flag, the block-layer will call .poll function
of RNBD. Then IO is sent to the poll-mode queue.
Add optional nr_poll_queues argument for map_devices interface.

To support polling of RNBD, RTRS client creates connections
for both of irq-mode and direct-poll-mode.

For example, on 4-CPU system it could've create 5 connections:
con[0] => user message (softirq cq)
con[1:4] => softirq cq

After this patch, it can create 9 connections:
con[0] => user message (softirq cq)
con[1:4] => softirq cq
con[5:8] => DIRECT-POLL cq

Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-14-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

2958a995

block/rnbd-clt: Fix missing a memory free when unloading the module · 12b06533

Gioh Kim authored Apr 19, 2021

When unloading the rnbd-clt module, it does not free a memory
including the filename of the symbolic link to /sys/block/rnbdX.

It is found by kmemleak as below.

unreferenced object 0xffff9f1a83d3c740 (size 16):
  comm "bash", pid 736, jiffies 4295179665 (age 9841.310s)
  hex dump (first 16 bytes):
    21 64 65 76 21 6e 75 6c 6c 62 30 40 62 6c 61 00  !dev!nullb0@bla.
  backtrace:
    [<0000000039f0c55e>] 0xffffffffc0456c24
    [<000000001aab9513>] kernfs_fop_write+0xcf/0x1c0
    [<00000000db5aa4b3>] vfs_write+0xdb/0x1d0
    [<000000007a2e2207>] ksys_write+0x65/0xe0
    [<00000000055e280a>] do_syscall_64+0x50/0x1b0
    [<00000000c2b51831>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-13-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

12b06533

block/rnbd-clt: Improve find_or_create_sess() return check · ce9d2b4f

Tom Rix authored Apr 19, 2021

clang static analysis reports this problem

rnbd-clt.c:1212:11: warning: Branch condition evaluates to a
  garbage value
        else if (!first)
                 ^~~~~~

This is triggered in the find_and_get_or_create_sess() call
because the variable first is not initialized and the
earlier check is specifically for

	if (sess == ERR_PTR(-ENOMEM))

This is false positive.

But the if-check can be reduced by initializing first to
false and then returning if the call to find_or_creat_sess()
does not set it to true.  When it remains false, either
sess will be valid or not.  The not case is caught by
find_and_get_or_create_sess()'s caller rnbd_clt_map_device()

	sess = find_and_get_or_create_sess(...);
	if (IS_ERR(sess))
		return ERR_CAST(sess);

Since find_and_get_or_create_sess() initializes first to false
setting it in find_or_create_sess() is not needed.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-12-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

ce9d2b4f

block/rnbd-srv: Remove force_close file after holding a lock · c77bfa8f

Gioh Kim authored Apr 19, 2021

We changed the rnbd_srv_sess_dev_force_close to use try-lock
because rnbd_srv_sess_dev_force_close and process_msg_close
can generate a deadlock.

Now rnbd_srv_sess_dev_force_close would do nothing
if it fails to get the lock. So removing the force_close
file should be moved to after the lock. Or the force_close
file is removed but the others are not removed.
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-11-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

c77bfa8f

block/rnbd-srv: Prevent a deadlock generated by accessing sysfs in parallel · b168e1d8

Gioh Kim authored Apr 19, 2021

We got a warning message below.
When server tries to close one session by force, it locks the sysfs
interface and locks the srv_sess lock.
The problem is that client can send a request to close at the same time.
By close request, server locks the srv_sess lock and locks the sysfs
to remove the sysfs interfaces.

The simplest way to prevent that situation could be just use
mutex_trylock.

[  234.153965] ======================================================
[  234.154093] WARNING: possible circular locking dependency detected
[  234.154219] 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Tainted: G           O
[  234.154381] ------------------------------------------------------
[  234.154531] kworker/1:1H/618 is trying to acquire lock:
[  234.154651] ffff8887a09db0a8 (kn->count#132){++++}, at: kernfs_remove_by_name_ns+0x40/0x80
[  234.154819]
               but task is already holding lock:
[  234.154965] ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server]
[  234.155132]
               which lock already depends on the new lock.

[  234.155311]
               the existing dependency chain (in reverse order) is:
[  234.155462]
               -> #1 (&srv_sess->lock){+.+.}:
[  234.155614]        __mutex_lock+0x134/0xcb0
[  234.155761]        rnbd_srv_sess_dev_force_close+0x36/0x50 [rnbd_server]
[  234.155889]        rnbd_srv_dev_session_force_close_store+0x69/0xc0 [rnbd_server]
[  234.156042]        kernfs_fop_write+0x13f/0x240
[  234.156162]        vfs_write+0xf3/0x280
[  234.156278]        ksys_write+0xba/0x150
[  234.156395]        do_syscall_64+0x62/0x270
[  234.156513]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  234.156632]
               -> #0 (kn->count#132){++++}:
[  234.156782]        __lock_acquire+0x129e/0x23a0
[  234.156900]        lock_acquire+0xf3/0x210
[  234.157043]        __kernfs_remove+0x42b/0x4c0
[  234.157161]        kernfs_remove_by_name_ns+0x40/0x80
[  234.157282]        remove_files+0x3f/0xa0
[  234.157399]        sysfs_remove_group+0x4a/0xb0
[  234.157519]        rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server]
[  234.157648]        rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server]
[  234.157775]        process_io_req+0x29a/0x6a0 [rtrs_server]
[  234.157924]        __ib_process_cq+0x8c/0x100 [ib_core]
[  234.158709]        ib_cq_poll_work+0x31/0xb0 [ib_core]
[  234.158834]        process_one_work+0x4e5/0xaa0
[  234.158958]        worker_thread+0x65/0x5c0
[  234.159078]        kthread+0x1e0/0x200
[  234.159194]        ret_from_fork+0x24/0x30
[  234.159309]
               other info that might help us debug this:

[  234.159513]  Possible unsafe locking scenario:

[  234.159658]        CPU0                    CPU1
[  234.159775]        ----                    ----
[  234.159891]   lock(&srv_sess->lock);
[  234.160005]                                lock(kn->count#132);
[  234.160128]                                lock(&srv_sess->lock);
[  234.160250]   lock(kn->count#132);
[  234.160364]
                *** DEADLOCK ***

[  234.160536] 3 locks held by kworker/1:1H/618:
[  234.160677]  #0: ffff8883ca1ed528 ((wq_completion)ib-comp-wq){+.+.}, at: process_one_work+0x40a/0xaa0
[  234.160840]  #1: ffff8883d2d5fe10 ((work_completion)(&cq->work)){+.+.}, at: process_one_work+0x40a/0xaa0
[  234.161003]  #2: ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server]
[  234.161168]
               stack backtrace:
[  234.161312] CPU: 1 PID: 618 Comm: kworker/1:1H Tainted: G           O      5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
[  234.161490] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
[  234.161643] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[  234.161765] Call Trace:
[  234.161910]  dump_stack+0x96/0xe0
[  234.162028]  check_noncircular+0x29e/0x2e0
[  234.162148]  ? print_circular_bug+0x100/0x100
[  234.162267]  ? register_lock_class+0x1ad/0x8a0
[  234.162385]  ? __lock_acquire+0x68e/0x23a0
[  234.162505]  ? trace_event_raw_event_lock+0x190/0x190
[  234.162626]  __lock_acquire+0x129e/0x23a0
[  234.162746]  ? register_lock_class+0x8a0/0x8a0
[  234.162866]  lock_acquire+0xf3/0x210
[  234.162982]  ? kernfs_remove_by_name_ns+0x40/0x80
[  234.163127]  __kernfs_remove+0x42b/0x4c0
[  234.163243]  ? kernfs_remove_by_name_ns+0x40/0x80
[  234.163363]  ? kernfs_fop_readdir+0x3b0/0x3b0
[  234.163482]  ? strlen+0x1f/0x40
[  234.163596]  ? strcmp+0x30/0x50
[  234.163712]  kernfs_remove_by_name_ns+0x40/0x80
[  234.163832]  remove_files+0x3f/0xa0
[  234.163948]  sysfs_remove_group+0x4a/0xb0
[  234.164068]  rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server]
[  234.164196]  rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server]
[  234.164345]  ? _raw_spin_unlock_irqrestore+0x43/0x50
[  234.164466]  ? lockdep_hardirqs_on+0x1a8/0x290
[  234.164597]  ? mlx4_ib_poll_cq+0x927/0x1280 [mlx4_ib]
[  234.164732]  ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server]
[  234.164859]  process_io_req+0x29a/0x6a0 [rtrs_server]
[  234.164982]  ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server]
[  234.165130]  __ib_process_cq+0x8c/0x100 [ib_core]
[  234.165279]  ib_cq_poll_work+0x31/0xb0 [ib_core]
[  234.165404]  process_one_work+0x4e5/0xaa0
[  234.165550]  ? pwq_dec_nr_in_flight+0x160/0x160
[  234.165675]  ? do_raw_spin_lock+0x119/0x1d0
[  234.165796]  worker_thread+0x65/0x5c0
[  234.165914]  ? process_one_work+0xaa0/0xaa0
[  234.166031]  kthread+0x1e0/0x200
[  234.166147]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  234.166268]  ret_from_fork+0x24/0x30
[  234.251591] rnbd_server L243: </dev/loop1@close_device_session>: Device closed
[  234.604221] rnbd_server L264: RTRS Session close_device_session disconnected
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-10-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

b168e1d8

block/rnbd-clt: Replace {NO_WAIT,WAIT} with RTRS_PERMIT_{WAIT,NOWAIT} · 9f455eea

Gioh Kim authored Apr 19, 2021

They are defined with the same value and similar meaning, let's remove
one of them, then we can remove {WAIT,NOWAIT}.

Also change the type of 'wait' from 'int' to 'enum wait_type' to make
it clear.

Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20210419073722.15351-9-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

9f455eea

block/rnbd: Kill destroy_device_cb · d16b5ac8

Guoqing Jiang authored Apr 19, 2021

We can use destroy_device directly since destroy_device_cb is just the
wrapper of destroy_device.
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-8-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d16b5ac8

block/rnbd: Kill rnbd_clt_destroy_default_group · 8e43c90a

Guoqing Jiang authored Apr 19, 2021

No need to have it since we can call sysfs_remove_group in the
rnbd_clt_destroy_sysfs_files.

Then rnbd_clt_destroy_sysfs_files is paired with it's counterpart
rnbd_clt_create_sysfs_files.
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-7-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

8e43c90a

block/rnbd-clt: Move add_disk(dev->gd) to rnbd_clt_setup_gen_disk · d0a70ab1

Guoqing Jiang authored Apr 19, 2021

It makes more sense to add gendisk in rnbd_clt_setup_gen_disk, instead
of do it in rnbd_clt_map_device.
Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-6-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d0a70ab1

block/rnbd-clt: Remove some arguments from rnbd_client_setup_device · 8b7f0511

Guoqing Jiang authored Apr 19, 2021

Remove them since both sess and idx can be dereferenced from dev. And
sess is not used in the function.
Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-5-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

8b7f0511

block/rnbd-clt: Remove some arguments from insert_dev_if_not_exists_devpath · 02ee80f5

Guoqing Jiang authored Apr 19, 2021

Remove 'pathname' and 'sess' since we can dereference it from 'dev'.
Signed-off-by: Guoqing Jiang <guoqing.jiang@gmx.com>
Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210419073722.15351-4-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

02ee80f5

Documentation/sysfs-block-rnbd: Add descriptions for remap_device and resize · e5f221c7

Gioh Kim authored Apr 19, 2021

Two sysfs entries, remap_device and resize, are missing.
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-3-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e5f221c7

MAINTAINERS: Change maintainer for rnbd module · ceeb7218

Danil Kipnis authored Apr 19, 2021

Danil steps down, Haris will take over.
Also update email address to ionos.com, the old
cloud.ionos.com will still work for some time.
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Acked-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Link: https://lore.kernel.org/r/20210419073722.15351-2-gi-oh.kim@ionos.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

ceeb7218

floppy: remove redundant assignment to variable st · b53002e0

Colin Ian King authored Apr 15, 2021

The variable st is being assigned a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Denis Efremov <efremov@linux.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20210415130020.1959951-1-colin.king@canonical.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

b53002e0

floppy: cleanups: remove FLOPPY_SILENT_DCL_CLEAR undef · a720e11f

Denis Efremov authored Apr 16, 2021

FLOPPY_SILENT_DCL_CLEAR is not defined anywhere and comes from pre-git
era. Just drop this undef. There is FD_SILENT_DCL_CLEAR which is really
used.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-6-efremov@linux.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

a720e11f

floppy: cleanups: use memcpy() to copy reply_buffer · fa6b885e

Denis Efremov authored Apr 16, 2021

Use memcpy() in raw_cmd_done() to copy reply_buffer instead
of a for loop.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-5-efremov@linux.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

fa6b885e

floppy: cleanups: use memset() to zero reply_buffer · f6df18f2

Denis Efremov authored Apr 16, 2021

Use memset() to zero reply buffer in raw_cmd_copyin() instead
of a for loop.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-4-efremov@linux.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f6df18f2

floppy: cleanups: use ST0 as reply_buffer index 0 · 67c07161

Denis Efremov authored Apr 16, 2021

Use ST0 as 0 index for reply_buffer array. get_fdc_version() is the only
function that uses index 0 directly instead of the ST0 define.
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20210416083449.72700-3-efremov@linux.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

67c07161