  1. 26 Apr, 2021 2 commits
    • async_xor: increase src_offs when dropping destination page · ceaf2966
      Xiao Ni authored
      One page can now be shared when PAGE_SIZE is not equal to the stripe
      size. To support this, the xor value must be calculated with a different
      offset for each r5dev, and an offset array records those offsets.
      
      In RMW mode, the parity page is used as a source page, and
      ASYNC_TX_XOR_DROP_DST is set before the xor value is calculated in
      ops_run_prexor5. Dropping the destination therefore requires advancing
      src_list and src_offs together. Currently only src_list is advanced, so
      the calculated xor value is wrong, which can cause data corruption.
      
      I can reproduce this problem 100% on a POWER8 machine. The steps are:
      
        mdadm -CR /dev/md0 -l5 -n3 /dev/sdb1 /dev/sdc1 /dev/sdd1 --size=3G
        mkfs.xfs /dev/md0
        mount /dev/md0 /mnt/test
        mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.
      
      Fixes: 29bcff78 ("md/raid5: add new xor function to support different page offset")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
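The lockstep requirement described in this commit can be illustrated with a minimal userspace sketch. It is not the kernel's async_xor(): the function `xor_offs` and the flag `XOR_DROP_DST` are hypothetical stand-ins, and the model assumes that the first source is the destination itself when the flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag, standing in for ASYNC_TX_XOR_DROP_DST. */
#define XOR_DROP_DST 0x1

/*
 * XOR len bytes of every source, each read at its own offset, into
 * dest[dest_off..]. With XOR_DROP_DST, src_list[0] is assumed to be the
 * destination itself and is skipped.
 */
void xor_offs(uint8_t *dest, size_t dest_off, uint8_t **src_list,
              const size_t *src_offs, int src_cnt, size_t len, int flags)
{
    assert(src_cnt > 0);

    if (flags & XOR_DROP_DST) {
        /* Advance both arrays together so they stay index-aligned. */
        src_list++;
        src_offs++;
        src_cnt--;
    }

    for (int i = 0; i < src_cnt; i++)
        for (size_t b = 0; b < len; b++)
            dest[dest_off + b] ^= src_list[i][src_offs[i] + b];
}
```

If only `src_list` were advanced, the remaining sources would be read at the offsets of their predecessors, which matches the wrong-parity symptom described above.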
    • drivers/block/null_blk/main: Fix a double free in null_init. · 72ce11dd
      Lv Yunlong authored
      null_init calls null_add_dev(dev). On failure, null_add_dev calls
      null_free_zoned_dev(dev) in its out_cleanup_zone branch, which frees
      dev->zones via kvfree(dev->zones), and returns err. null_init then
      receives the error code and calls null_free_dev(dev).
      
      But null_free_dev(dev) frees dev->zones again through
      null_free_zoned_dev().
      
      This patch sets dev->zones to NULL in null_free_zoned_dev() after
      kvfree(dev->zones) is called, to avoid the double free.
      
      Fixes: 2984c868 ("nullb: factor disk parameters")
      Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
      Link: https://lore.kernel.org/r/20210426143229.7374-1-lyl2019@mail.ustc.edu.cn
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
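The pattern behind this fix can be sketched in userspace C. The struct and function names below are hypothetical stand-ins for the null_blk ones, and `free()` stands in for `kvfree()`; both accept a NULL pointer as a no-op, which is what makes the second cleanup call harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device structure. */
struct dev {
    int *zones;
};

/* Frees the zone array and clears the pointer so a repeat call is a no-op. */
void free_zoned_dev(struct dev *dev)
{
    assert(dev != NULL);
    free(dev->zones);
    dev->zones = NULL;
}

/* Full teardown: also runs the zone cleanup, as the outer error path does. */
void free_dev(struct dev *dev)
{
    if (!dev)
        return;
    free_zoned_dev(dev);
    free(dev);
}

/* Models a failing add path: allocate, hit an error, clean up, return err. */
int add_dev(struct dev *dev)
{
    dev->zones = calloc(4, sizeof(*dev->zones));
    if (!dev->zones)
        return -1;
    free_zoned_dev(dev); /* error-branch cleanup */
    return -1;
}
```

A caller that receives the error from `add_dev` can then call `free_dev` safely, because the second `free_zoned_dev` sees a NULL pointer.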
  2. 23 Apr, 2021 3 commits
    • Merge branch 'md-next' of... · b8417f72
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers
      
      Pull MD fixes from Song.
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/raid1: properly indicate failure when ending a failed write request
        md-cluster: fix use-after-free issue when removing rdev
    • md/raid1: properly indicate failure when ending a failed write request · 2417b986
      Paul Clements authored
      This patch addresses a data corruption bug in raid1 arrays using bitmaps.
      Without this fix, the bitmap bits for the failed I/O end up being cleared.
      
      Since we are in the failure leg of raid1_end_write_request, the request
      either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).
      
      Fixes: eeba6809 ("md/raid1: end bio when the device faulty")
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: Paul Clements <paul.clements@us.sios.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md-cluster: fix use-after-free issue when removing rdev · f7c7a2f9
      Heming Zhao authored
      md_kick_rdev_from_array removes the rdev from the list, so the list
      should be walked with rdev_for_each_safe.
      
      How to trigger:
      
      env: Two nodes on kvm-qemu x86_64 VMs (2C2G with 2 iscsi luns).
      
      ```
      node2=192.168.0.3
      
      for i in {1..20}; do
          echo ==== $i `date` ====;
      
          mdadm -Ss && ssh ${node2} "mdadm -Ss"
          wipefs -a /dev/sda /dev/sdb
      
          mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l 1 /dev/sda \
             /dev/sdb --assume-clean
          ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
          mdadm --wait /dev/md0
          ssh ${node2} "mdadm --wait /dev/md0"
      
          mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
          sleep 1
      done
      ```
      
      Crash stack:
      
      ```
      stack segment: 0000 [#1] SMP
      ... ...
      RIP: 0010:md_check_recovery+0x1e8/0x570 [md_mod]
      ... ...
      RSP: 0018:ffffb149807a7d68 EFLAGS: 00010207
      RAX: 0000000000000000 RBX: ffff9d494c180800 RCX: ffff9d490fc01e50
      RDX: fffff047c0ed8308 RSI: 0000000000000246 RDI: 0000000000000246
      RBP: 6b6b6b6b6b6b6b6b R08: ffff9d490fc01e40 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      R13: ffff9d494c180818 R14: ffff9d493399ef38 R15: ffff9d4933a1d800
      FS:  0000000000000000(0000) GS:ffff9d494f700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe68cab9010 CR3: 000000004c6be001 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       raid1d+0x5c/0xd40 [raid1]
       ? finish_task_switch+0x75/0x2a0
       ? lock_timer_base+0x67/0x80
       ? try_to_del_timer_sync+0x4d/0x80
       ? del_timer_sync+0x41/0x50
       ? schedule_timeout+0x254/0x2d0
       ? md_start_sync+0xe0/0xe0 [md_mod]
       ? md_thread+0x127/0x160 [md_mod]
       md_thread+0x127/0x160 [md_mod]
       ? wait_woken+0x80/0x80
       kthread+0x10d/0x130
       ? kthread_park+0xa0/0xa0
       ret_from_fork+0x1f/0x40
      ```
      
      Fixes: dbb64f86 ("md-cluster: Fix adding of new disk with new reload code")
      Fixes: 659b254f ("md-cluster: remove a disk asynchronously from cluster environment")
      Cc: stable@vger.kernel.org
      Reviewed-by: Gang He <ghe@suse.com>
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
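The reason a "safe" iterator is needed can be shown with a small userspace list. This is only a sketch of the idea behind rdev_for_each_safe, not the kernel macro: the `node` type and the helpers `push` and `kick_faulty` are hypothetical. The key step is saving the next pointer before the current entry may be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list entry; "faulty" marks entries to be removed. */
struct node {
    int faulty;
    struct node *next;
};

/* Prepends a new entry and returns the new head. */
struct node *push(struct node *head, int faulty)
{
    struct node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->faulty = faulty;
    n->next = head;
    return n;
}

/* Removes and frees every faulty entry; returns the new head. */
struct node *kick_faulty(struct node *head)
{
    struct node **link = &head;
    struct node *cur, *next;

    for (cur = head; cur; cur = next) {
        next = cur->next; /* saved before cur may be freed */
        if (cur->faulty) {
            *link = next; /* unlink cur */
            free(cur);
        } else {
            link = &cur->next;
        }
    }
    return head;
}
```

Writing the loop step as `cur = cur->next` instead would read freed memory after a removal, which is the class of bug the crash stack above points to.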
  3. 22 Apr, 2021 2 commits
  4. 21 Apr, 2021 8 commits
    • nvme: cleanup nvme_configure_apst · 60df5de9
      Christoph Hellwig authored
      Remove a level of indentation from the main code implementing the table
      search by using a goto for the APST not supported case.  Also move the
      main comment above the function.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
    • nvme: do not try to reconfigure APST when the controller is not live · 53fe2a30
      Christoph Hellwig authored
      Do not call nvme_configure_apst when the controller is not live, given
      that nvme_configure_apst will fail due to the lack of an admin queue when
      the controller is being torn down and nvme_set_latency_tolerance is
      called from dev_pm_qos_hide_latency_tolerance.
      
      Fixes: 510a405d ("nvme: fix memory leak for power latency tolerance")
      Reported-by: Peng Liu <liupeng17@lenovo.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
    • nvme: add 'kato' sysfs attribute · 74c22990
      Hannes Reinecke authored
      Add a 'kato' controller sysfs attribute to display the current
      keep-alive timeout value (if any). This allows userspace to identify
      persistent discovery controllers, as these will have a non-zero
      KATO value.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: sanitize KATO setting · a70b81bd
      Hannes Reinecke authored
      According to the NVMe base spec the KATO commands should be sent
      at half of the KATO interval, to properly account for round-trip
      times.
      As we now will only ever send one KATO command per connection we
      can easily use the recommended values.
      This also fixes a potential issue where the request timeout for
      the KATO command does not match the value in the connect command,
      which might be causing spurious connection drops from the target.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
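The half-interval rule mentioned in this commit amounts to simple arithmetic. The helper below is a hypothetical illustration, not the driver's code, and it assumes the KATO value is given in seconds and the delay is wanted in milliseconds.

```c
#include <assert.h>

/*
 * Delay before sending the next keep-alive command, in milliseconds.
 * kato_secs is the keep-alive timeout in seconds (assumed unit); the
 * command is scheduled at half of that interval to leave room for the
 * round trip.
 */
unsigned long keep_alive_delay_ms(unsigned int kato_secs)
{
    return (unsigned long)kato_secs * 1000UL / 2;
}
```

With a KATO of 5 seconds, this schedules the command after 2500 ms, well before the target's timer would expire.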
    • nvmet: avoid queuing keep-alive timer if it is disabled · 8f864c59
      Hou Pu authored
      Issuing the following commands:
      nvme set-feature -f 0xf -v 0 /dev/nvme1n1 # disable keep-alive timer
      nvme admin-passthru -o 0x18 /dev/nvme1n1  # send keep-alive command
      makes the keep-alive timer fire and thus deletes the controller, as
      shown below:
      
      [247459.907635] nvmet: ctrl 1 keep-alive timer (0 seconds) expired!
      [247459.930294] nvmet: ctrl 1 fatal error occurred!
      
      Avoid this by not queuing the delayed keep-alive if it is disabled when
      a keep-alive command is received from the admin queue.
      Signed-off-by: Hou Pu <houpu.main@gmail.com>
      Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • brd: expose number of allocated pages in debugfs · f4be591f
      Calvin Owens authored
      While the maximum size of each ramdisk is defined either as a module
      parameter, or compile time default, it's impossible to know how many pages
      have currently been allocated by each ram%d device, since they're
      allocated when used and never freed.
      
      This patch creates a new directory at this location:
      
      /sys/kernel/debug/ramdisk_pages/
      
      which will contain a file named "ram%d" for each instantiated ramdisk on
      the system. The file is read-only, and read() will output the number of
      pages currently held by that ramdisk.
      
      We lose track of how much memory a ramdisk is using, as pages once used
      are simply recycled but never freed.
      
      In instances where we exhaust the size of the ramdisk with a file that
      exceeds it, encounter ENOSPC, and delete the file for mitigation, df
      shows a decrease in used and an increase in available blocks. But since
      we have touched all pages, the memory footprint of the ramdisk does not
      reflect the blocks used/available count:
      
      ...
      [root@localhost ~]# mkfs.ext2 /dev/ram15
      mke2fs 1.45.6 (20-Mar-2020)
      Creating filesystem with 4096 1k blocks and 1024 inodes
      [root@localhost ~]# mount /dev/ram15 /mnt/ram15/
      
      [root@localhost ~]# cat /sys/kernel/debug/ramdisk_pages/ram15
      58
      [root@kerneltest008.06.prn3 ~]# df /dev/ram15
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963    31      3728   1% /mnt/ram15
      [root@kerneltest008.06.prn3 ~]# dd if=/dev/urandom of=/mnt/ram15/test2 bs=1M count=5
      dd: error writing '/mnt/ram15/test2': No space left on device
      4+0 records in
      3+0 records out
      4005888 bytes (4.0 MB, 3.8 MiB) copied, 0.0446614 s, 89.7 MB/s
      [root@kerneltest008.06.prn3 ~]# df /mnt/ram15/
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963  3960         0 100% /mnt/ram15
      [root@kerneltest008.06.prn3 ~]# cat /sys/kernel/debug/ramdisk_pages/ram15
      1024
      [root@kerneltest008.06.prn3 ~]# rm /mnt/ram15/test2
      rm: remove regular file '/mnt/ram15/test2'? y
      [root@kerneltest008.06.prn3 /var]# df /dev/ram15
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963    31      3728   1% /mnt/ram15
      
      # Actual memory footprint
      [root@kerneltest008.06.prn3 /var]# cat /sys/kernel/debug/ramdisk_pages/ram15
      1024
      ...
      
      This debugfs counter will always reveal the accurate number of pages
      permanently allocated to the ramdisk.
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      [cleaned up the !CONFIG_DEBUG_FS case and API changes for HEAD]
      Signed-off-by: Kyle McMartin <jkkm@fb.com>
      [rebased]
      Signed-off-by: Saravanan D <saravanand@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • ataflop: fix off by one in ataflop_probe() · b777f4c4
      Dan Carpenter authored
      Smatch complains that the "type > NUM_DISK_MINORS" should be >=
      instead of >.  We also need to subtract one from "type" at the start.
      
      Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
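The difference between `>` and `>=` in such a check can be seen in a generic sketch. It does not reproduce ataflop_probe(): the table, its size `NUM_SLOTS`, and the assumption that `type` is 1-based are all hypothetical, chosen only to mirror the two parts of the fix (subtract one, then compare with `>=`).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4 /* hypothetical table size */

static const char *const slot_name[NUM_SLOTS] = {"a", "b", "c", "d"};

/*
 * Looks up a 1-based type (assumption). Valid indices are 0..NUM_SLOTS-1,
 * so the upper check must be ">=": with ">" the value NUM_SLOTS itself
 * would pass and index one element past the end of the table.
 */
const char *lookup_type(int type)
{
    int idx = type - 1; /* convert to a 0-based index first */

    if (idx < 0 || idx >= NUM_SLOTS)
        return NULL;
    return slot_name[idx];
}
```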
    • ataflop: potential out of bounds in do_format() · 1ffec389
      Dan Carpenter authored
      The function uses "type" as an array index:
      
      	q = unit[drive].disk[type]->queue;
      
      Unfortunately the bounds check on "type" isn't done until later in the
      function.  Fix this by moving the bounds check to the start.
      
      Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
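The ordering issue can be sketched as follows: the index is validated before anything dereferences through it. The types and the function `queue_depth_for` are hypothetical and only loosely mirror the `unit[drive].disk[type]->queue` access quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TYPES 3 /* hypothetical number of media formats */

struct disk {
    int queue_depth;
};

struct unit {
    struct disk *disk[NUM_TYPES];
};

/* Returns the queue depth for a type, or -1 if the type is not usable. */
int queue_depth_for(const struct unit *u, int type)
{
    assert(u != NULL);

    /* Validate first: nothing below may run with a bad index. */
    if (type < 0 || type >= NUM_TYPES || !u->disk[type])
        return -1;

    return u->disk[type]->queue_depth;
}
```

Placing the check after the array access would keep the error path but still perform the out-of-bounds read, which is why the fix moves the check to the start of the function.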
  5. 20 Apr, 2021 25 commits