1. 25 Mar, 2019 3 commits
    • Ming Lei's avatar
      sbitmap: order READ/WRITE freed instance and setting clear bit · e6d1fa58
      Ming Lei authored
      Inside sbitmap_queue_clear(), once the clear bit is set, it will be
      visiable to allocation path immediately. Meantime READ/WRITE on old
      associated instance(such as request in case of blk-mq) may be
      out-of-order with the setting clear bit, so race with re-allocation
      may be triggered.
      
      Adds one memory barrier for ordering READ/WRITE of the freed associated
      instance with setting clear bit for avoiding race with re-allocation.
      
      The following kernel oops triggerd by block/006 on aarch64 may be fixed:
      
      [  142.330954] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330
      [  142.338794] Mem abort info:
      [  142.341554]   ESR = 0x96000005
      [  142.344632]   Exception class = DABT (current EL), IL = 32 bits
      [  142.350500]   SET = 0, FnV = 0
      [  142.353544]   EA = 0, S1PTW = 0
      [  142.356678] Data abort info:
      [  142.359528]   ISV = 0, ISS = 0x00000005
      [  142.363343]   CM = 0, WnR = 0
      [  142.366305] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000002a3c51c0
      [  142.372983] [0000000000000330] pgd=0000000000000000, pud=0000000000000000
      [  142.379777] Internal error: Oops: 96000005 [#1] SMP
      [  142.384613] Modules linked in: null_blk ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core sbsa_gwdt crct10dif_ce ghash_ce ipmi_ssif sha2_ce ipmi_devintf sha256_arm64 sg sha1_ce ipmi_msghandler ip_tables xfs libcrc32c mlx5_core sdhci_acpi mlxfw ahci_platform at803x sdhci libahci_platform qcom_emac mmc_core hdma hdma_mgmt i2c_dev [last unloaded: null_blk]
      [  142.429753] CPU: 7 PID: 1983 Comm: fio Not tainted 5.0.0.cki #2
      [  142.449458] pstate: 00400005 (nzcv daif +PAN -UAO)
      [  142.454239] pc : __blk_mq_free_request+0x4c/0xa8
      [  142.458830] lr : blk_mq_free_request+0xec/0x118
      [  142.463344] sp : ffff00003360f6a0
      [  142.466646] x29: ffff00003360f6a0 x28: ffff000010e70000
      [  142.471941] x27: ffff801729a50048 x26: 0000000000010000
      [  142.477232] x25: ffff00003360f954 x24: ffff7bdfff021440
      [  142.482529] x23: 0000000000000000 x22: 00000000ffffffff
      [  142.487830] x21: ffff801729810000 x20: 0000000000000000
      [  142.493123] x19: ffff801729a50000 x18: 0000000000000000
      [  142.498413] x17: 0000000000000000 x16: 0000000000000001
      [  142.503709] x15: 00000000000000ff x14: ffff7fe000000000
      [  142.509003] x13: ffff8017dcde09a0 x12: 0000000000000000
      [  142.514308] x11: 0000000000000001 x10: 0000000000000008
      [  142.519597] x9 : ffff8017dcde09a0 x8 : 0000000000002000
      [  142.524889] x7 : ffff8017dcde0a00 x6 : 000000015388f9be
      [  142.530187] x5 : 0000000000000001 x4 : 0000000000000000
      [  142.535478] x3 : 0000000000000000 x2 : 0000000000000000
      [  142.540777] x1 : 0000000000000001 x0 : ffff00001041b194
      [  142.546071] Process fio (pid: 1983, stack limit = 0x000000006460a0ea)
      [  142.552500] Call trace:
      [  142.554926]  __blk_mq_free_request+0x4c/0xa8
      [  142.559181]  blk_mq_free_request+0xec/0x118
      [  142.563352]  blk_mq_end_request+0xfc/0x120
      [  142.567444]  end_cmd+0x3c/0xa8 [null_blk]
      [  142.571434]  null_complete_rq+0x20/0x30 [null_blk]
      [  142.576194]  blk_mq_complete_request+0x108/0x148
      [  142.580797]  null_handle_cmd+0x1d4/0x718 [null_blk]
      [  142.585662]  null_queue_rq+0x60/0xa8 [null_blk]
      [  142.590171]  blk_mq_try_issue_directly+0x148/0x280
      [  142.594949]  blk_mq_try_issue_list_directly+0x9c/0x108
      [  142.600064]  blk_mq_sched_insert_requests+0xb0/0xd0
      [  142.604926]  blk_mq_flush_plug_list+0x16c/0x2a0
      [  142.609441]  blk_flush_plug_list+0xec/0x118
      [  142.613608]  blk_finish_plug+0x3c/0x4c
      [  142.617348]  blkdev_direct_IO+0x3b4/0x428
      [  142.621336]  generic_file_read_iter+0x84/0x180
      [  142.625761]  blkdev_read_iter+0x50/0x78
      [  142.629579]  aio_read.isra.6+0xf8/0x190
      [  142.633409]  __io_submit_one.isra.8+0x148/0x738
      [  142.637912]  io_submit_one.isra.9+0x88/0xb8
      [  142.642078]  __arm64_sys_io_submit+0xe0/0x238
      [  142.646428]  el0_svc_handler+0xa0/0x128
      [  142.650238]  el0_svc+0x8/0xc
      [  142.653104] Code: b9402a63 f9000a7f 3100047f 540000a0 (f9419a81)
      [  142.659202] ---[ end trace 467586bc175eb09d ]---
      
      Fixes: ea86ea2c ("sbitmap: ammortize cost of clearing bits")
      Reported-and-bisected_and_tested-by: Yi Zhang <yi.zhang@redhat.com>
      Cc: Yi Zhang <yi.zhang@redhat.com>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e6d1fa58
    • Jens Axboe's avatar
      blk-mq: fix sbitmap ws_active for shared tags · e8618575
      Jens Axboe authored
      We now wrap sbitmap waitqueues in an active counter, so we can avoid
      iterating wakeups unless we have waiters there. This works as long as
      everyone that's manipulating the waitqueues use the proper helpers. For
      the tag wait case for shared tags, however, we add ourselves to the
      waitqueue without incrementing/decrementing the ->ws_active count. This
      means that wakeups can take a long time to happen.
      
      Fix this by manually doing the inc/dec as needed for the wait queue
      handling.
      Reported-by: default avatarMichael Leun <kbug@newton.leun.net>
      Tested-by: default avatarMichael Leun <kbug@newton.leun.net>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Fixes: 5d2ee712 ("sbitmap: optimize wakeup check")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e8618575
    • Arnd Bergmann's avatar
      io_uring: fix big-endian compat signal mask handling · 9e75ad5d
      Arnd Bergmann authored
      On big-endian architectures, the signal masks are differnet
      between 32-bit and 64-bit tasks, so we have to use a different
      function for reading them from user space.
      
      io_cqring_wait() initially got this wrong, and always interprets
      this as a native structure. This is ok on x86 and most arm64,
      but not on s390, ppc64be, mips64be, sparc64 and parisc.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9e75ad5d
  2. 24 Mar, 2019 2 commits
  3. 23 Mar, 2019 3 commits
    • Linus Torvalds's avatar
      Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-block · 1bdd3dbf
      Linus Torvalds authored
      Pull io_uring fixes and improvements from Jens Axboe:
       "The first five in this series are heavily inspired by the work Al did
        on the aio side to fix the races there.
      
        The last two re-introduce a feature that was in io_uring before it got
        merged, but which I pulled since we didn't have a good way to have
        BVEC iters that already have a stable reference. These aren't
        necessarily related to block, it's just how io_uring pins fixed
        buffers"
      
      * tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
        block: add BIO_NO_PAGE_REF flag
        iov_iter: add ITER_BVEC_FLAG_NO_REF flag
        io_uring: mark me as the maintainer
        io_uring: retry bulk slab allocs as single allocs
        io_uring: fix poll races
        io_uring: fix fget/fput handling
        io_uring: add prepped flag
        io_uring: make io_read/write return an integer
        io_uring: use regular request ref counts
      1bdd3dbf
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190323' of git://git.kernel.dk/linux-block · 2335cbe6
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A set of fixes/changes that should go into this series. This contains:
      
         - Kernel doc / comment updates (Bart, Shenghui)
      
         - Un-export of core-only used function (Bart)
      
         - Fix race on loop file access (Dongli)
      
         - pf/pcd queue cleanup fixes (me)
      
         - Use appropriate helper for RESTART bit set (Yufen)
      
         - Use named identifier for classic poll (Yufen)"
      
      * tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
        sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
        blkcg: Fix kernel-doc warnings
        blk-iolatency: #include "blk.h"
        block: Unexport blk_mq_add_to_requeue_list()
        block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
        blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
        loop: access lo_backing_file only when the loop device is Lo_bound
        blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
        paride/pcd: cleanup queues when detection fails
        paride/pf: cleanup queues when detection fails
      2335cbe6
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client · 9a1050ad
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A follow up for the new alloc_size logic and a blacklisting fix,
        marked for stable"
      
      * tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client:
        rbd: drop wait_for_latest_osdmap()
        libceph: wait for latest osdmap in ceph_monc_blacklist_add()
        rbd: set io_min, io_opt and discard_granularity to alloc_size
      9a1050ad
  4. 22 Mar, 2019 18 commits
  5. 21 Mar, 2019 14 commits