1. 06 Jun, 2019 2 commits
    • nvme-rdma: use dynamic dma mapping per command · 62f99b62
      Max Gurtovoy authored
      Commit 87fd1253 ("nvme-rdma: remove redundant reference between
      ib_device and tagset") caused a kernel panic when disconnecting from an
      inaccessible controller (disconnect during re-connection).
      
      --
      nvme nvme0: Removing ctrl: NQN "testnqn1"
      nvme_rdma: nvme_rdma_exit_request: hctx 0 queue_idx 1
      BUG: unable to handle kernel paging request at 0000000080000228
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      ...
      Call Trace:
       blk_mq_exit_hctx+0x5c/0xf0
       blk_mq_exit_queue+0xd4/0x100
       blk_cleanup_queue+0x9a/0xc0
       nvme_rdma_destroy_io_queues+0x52/0x60 [nvme_rdma]
       nvme_rdma_shutdown_ctrl+0x3e/0x80 [nvme_rdma]
       nvme_do_delete_ctrl+0x53/0x80 [nvme_core]
       nvme_sysfs_delete+0x45/0x60 [nvme_core]
       kernfs_fop_write+0x105/0x180
       vfs_write+0xad/0x1a0
       ksys_write+0x5a/0xd0
       do_syscall_64+0x55/0x110
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fa215417154
      --
      
      The crash is caused by accessing an already freed ib_device to perform
      dma_unmap during exit_request. The root cause is that during
      re-connection all the queues are destroyed and re-created (and the
      ib_device, which is reference counted by the queues, is freed as well),
      but the tagset stays alive and all the DMA mappings (which we perform in
      init_request) are kept in the request context. The original commit fixed
      a different bug that was seen during bonding (aka nic teaming) tests,
      which in some scenarios change the underlying ib_device and caused a
      memory leak and a possible segmentation fault. This commit is a
      complementary commit that also changes the wrong DMA mappings that were
      saved in the request context and makes the request sqe dma mappings
      dynamic with the command lifetime (i.e. mapped in .queue_rq and unmapped
      in .complete). It also fixes the above crash of accessing a freed
      ib_device during destruction of the tagset.
      
      Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset")
      Reported-by: Jim Harris <james.r.harris@intel.com>
      Suggested-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Jim Harris <james.r.harris@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      62f99b62
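The mapping lifetime described in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `fake_dev`, `fake_req` and the `fake_*` functions are hypothetical stand-ins, not kernel types or the driver's actual code. It shows why tying the mapping to the command (map on issue, unmap on completion) leaves nothing for the request teardown path to unmap once the device is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ib_device; not a kernel type. */
struct fake_dev {
	bool alive;      /* false once the queues dropped the last reference */
	int  live_maps;  /* mappings currently held against this device */
};

/* Hypothetical stand-in for a request's private context. */
struct fake_req {
	struct fake_dev *mapped_on; /* NULL whenever no command is in flight */
};

/* Counterpart of .queue_rq: map when the command is issued. */
int fake_queue_rq(struct fake_req *rq, struct fake_dev *dev)
{
	if (!dev->alive)
		return -1;
	dev->live_maps++;
	rq->mapped_on = dev;
	return 0;
}

/* Counterpart of .complete: unmap as soon as the command finishes. */
void fake_complete(struct fake_req *rq)
{
	assert(rq->mapped_on != NULL && rq->mapped_on->alive);
	rq->mapped_on->live_maps--;
	rq->mapped_on = NULL;
}

/*
 * Counterpart of .exit_request: with per-command mappings there is
 * nothing left to unmap, so the (possibly freed) device is never touched.
 */
bool fake_exit_request(const struct fake_req *rq)
{
	return rq->mapped_on == NULL;
}
```

In this model, a request that completed before the device went away can be torn down without dereferencing the device at all.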
    • nvme: Fix u32 overflow in the number of namespace list calculation · c8e8c77b
      Jaesoo Lee authored
      The Number of Namespaces (nn) field in the identify controller data structure is
      defined as u32 and the maximum allowed value in NVMe specification is
      0xFFFFFFFEUL. This change fixes the possible overflow of the DIV_ROUND_UP()
      operation used in nvme_scan_ns_list() by casting the nn to u64.
      Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      c8e8c77b
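The overflow can be reproduced in userspace. The sketch below is not the kernel code: the macro is re-declared locally, the function names are hypothetical, and the divisor of 1024 entries per list is an assumption made for illustration. It compares the round-up division done in 32-bit arithmetic with the same division after widening nn to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-declaration for illustration: round-up integer division. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* 32-bit arithmetic: nn + 1023 wraps around for nn close to UINT32_MAX. */
uint32_t num_lists_u32(uint32_t nn)
{
	return DIV_ROUND_UP(nn, (uint32_t)1024);
}

/* Widen nn first so the intermediate sum cannot wrap. */
uint32_t num_lists_u64(uint32_t nn)
{
	return (uint32_t)DIV_ROUND_UP((uint64_t)nn, 1024);
}
```

For nn = 0xFFFFFFFE the 32-bit variant computes (0xFFFFFFFE + 1023) modulo 2^32 = 1021, which divides down to 0 lists, while the widened variant yields 4194304.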
  2. 04 Jun, 2019 1 commit
    • nvmet: fix data_len to 0 for bdev-backed write_zeroes · 3562f5d9
      Minwoo Im authored
      The WRITE ZEROES command has no data transfer, so we need to
      initialize (nvmet_req *req)->data_len to 0x0.  While
      (nvmet_req *req)->transfer_len is initialized in nvmet_req_init(),
      data_len is not initialized anywhere, which might randomly cause a
      failure with status code NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR.  That is
      because nvmet_req_execute() checks:
      
      	if (unlikely(req->data_len != req->transfer_len)) {
      		req->error_loc = offsetof(struct nvme_common_command, dptr);
      		nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
      	} else
      		req->execute(req);
      
      This patch stops req->data_len from holding an arbitrary value by
      initializing it to 0x0 when preparing the command in
      nvmet_bdev_parse_io_cmd().
      
      nvmet_file_parse_io_cmd(), which handles file-backed I/O, already
      initializes the data_len field to 0x0.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
      Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      3562f5d9
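A reduced model of the failure mode may help. The struct and functions below are hypothetical stand-ins (the real struct nvmet_req has many more fields), and a stale value left in a reused request stands in for the uninitialized field. The sketch shows how a leftover data_len trips the length comparison quoted in the message, and how setting it to 0 while parsing the command avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down request used only for this illustration. */
struct fake_req {
	size_t data_len;      /* length the command claims to carry */
	size_t transfer_len;  /* length actually set up for transfer */
};

/* Per the message, request init resets transfer_len but not data_len. */
void fake_req_init(struct fake_req *req)
{
	req->transfer_len = 0;
}

/* WRITE ZEROES parsing; `fixed` selects the behaviour with the patch. */
void fake_parse_write_zeroes(struct fake_req *req, int fixed)
{
	if (fixed)
		req->data_len = 0;
}

/* Same comparison as the quoted check: 0 = execute, -1 = rejected. */
int fake_req_execute(const struct fake_req *req)
{
	return req->data_len != req->transfer_len ? -1 : 0;
}
```

Whether the unpatched path fails depends entirely on what the field happened to contain, which matches the "randomly" in the description.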
  3. 30 May, 2019 2 commits
    • nvme-tcp: fix queue mapping when queue count is limited · 64861993
      Sagi Grimberg authored
      When the controller supports fewer queues than requested, we
      should make sure that queue mapping does the right thing and
      not assume that all queues are available. This fixes a crash
      when the controller supports fewer queues than requested.
      
      The rules are:
      1. if no write queues are requested, we assign the available queues
         to the default queue map. The default and read queue maps share the
         existing queues.
      2. if write queues are requested:
        - first make sure that read queue map gets the requested
          nr_io_queues count
        - then grant the default queue map the minimum between the requested
          nr_write_queues and the remaining queues. If there are no available
          queues to dedicate to the default queue map, fall back to (1) and
          share all the queues in the existing queue map.
      
      Also, provide a log indication on how we constructed the different
      queue maps.
      Reported-by: Harris, James R <james.r.harris@intel.com>
      Tested-by: Jim Harris <james.r.harris@intel.com>
      Cc: <stable@vger.kernel.org> # v5.0+
      Suggested-by: Roy Shterman <roys@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      64861993
    • nvme-rdma: fix queue mapping when queue count is limited · 5651cd3c
      Sagi Grimberg authored
      When the controller supports fewer queues than requested, we
      should make sure that queue mapping does the right thing and
      not assume that all queues are available. This fixes a crash
      when the controller supports fewer queues than requested.
      
      The rules are:
      1. if no write/poll queues are requested, we assign the available queues
         to the default queue map. The default and read queue maps share the
         existing queues.
      2. if write queues are requested:
        - first make sure that read queue map gets the requested
          nr_io_queues count
        - then grant the default queue map the minimum between the requested
          nr_write_queues and the remaining queues. If there are no available
          queues to dedicate to the default queue map, fall back to (1) and
          share all the queues in the existing queue map.
      3. if poll queues are requested:
        - map the remaining queues to the poll queue map.
      
      Also, provide a log indication on how we constructed the different
      queue maps.
      Reported-by: Harris, James R <james.r.harris@intel.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Tested-by: Jim Harris <james.r.harris@intel.com>
      Cc: <stable@vger.kernel.org> # v5.0+
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      5651cd3c
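The three rules listed in this message can be sketched as a small allocation function. This is one reading of the rules, not the driver's code: `split_queues`, the `MAP_*` indices and the parameter names are hypothetical, and capping the poll map at the requested poll count is an assumption. A read count of 0 in the result means the read map shares the default map's queues.

```c
#include <assert.h>

/* Hypothetical indices for the three queue maps. */
enum { MAP_DEFAULT, MAP_READ, MAP_POLL, MAP_MAX };

static unsigned min_u(unsigned a, unsigned b)
{
	return a < b ? a : b;
}

/*
 * granted:  I/O queues the controller actually granted
 * nr_io:    requested nr_io_queues (read queues when writes are split out)
 * nr_write: requested nr_write_queues
 * nr_poll:  requested poll queues
 */
void split_queues(unsigned granted, unsigned nr_io, unsigned nr_write,
		  unsigned nr_poll, unsigned out[MAP_MAX])
{
	out[MAP_DEFAULT] = out[MAP_READ] = out[MAP_POLL] = 0;

	if (nr_write && nr_io < granted) {
		/* Rule 2: satisfy the read map first, then the default map. */
		out[MAP_READ] = nr_io;
		granted -= out[MAP_READ];
		out[MAP_DEFAULT] = min_u(nr_write, granted);
		granted -= out[MAP_DEFAULT];
	} else {
		/* Rule 1, and the fallback of rule 2: shared queues. */
		out[MAP_DEFAULT] = min_u(nr_io, granted);
		granted -= out[MAP_DEFAULT];
	}

	/* Rule 3: poll queues only get what is left over. */
	if (nr_poll && granted)
		out[MAP_POLL] = min_u(nr_poll, granted);
}
```

With 8 granted queues and a request for 4 I/O and 2 write queues, the sketch yields 4 read and 2 default queues. With only 4 granted, it falls back to 4 shared queues.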
  4. 22 May, 2019 1 commit
    • nvme-pci: use blk-mq mapping for unmanaged irqs · cb9e0e50
      Keith Busch authored
      If a device is providing a single IRQ vector, the IO queue will share
      that vector with the admin queue. This is an unmanaged vector, so does
      not have a valid PCI IRQ affinity. Avoid trying to extract a managed
      affinity in this case and let blk-mq set up the cpu:queue mapping instead.
      Otherwise we'd hit the following warning when the device is using MSI:
      
       WARNING: CPU: 4 PID: 7 at drivers/pci/msi.c:1272 pci_irq_get_affinity+0x66/0x80
       Modules linked in: nvme nvme_core serio_raw
       CPU: 4 PID: 7 Comm: kworker/u16:0 Tainted: G        W         5.2.0-rc1+ #494
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       Workqueue: nvme-reset-wq nvme_reset_work [nvme]
       RIP: 0010:pci_irq_get_affinity+0x66/0x80
       Code: 0b 31 c0 c3 83 e2 10 48 c7 c0 b0 83 35 91 74 2a 48 8b 87 d8 03 00 00 48 85 c0 74 0e 48 8b 50 30 48 85 d2 74 05 39 70 14 77 05 <0f> 0b 31 c0 c3 48 63 f6 48 8d 04 76 48 8d 04 c2 f3 c3 48 8b 40 30
       RSP: 0000:ffffb5abc01d3cc8 EFLAGS: 00010246
       RAX: ffff9536786a39c0 RBX: 0000000000000000 RCX: 0000000000000080
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9536781ed000
       RBP: ffff95367346a008 R08: ffff95367d43f080 R09: ffff953678c07800
       R10: ffff953678164800 R11: 0000000000000000 R12: 0000000000000000
       R13: ffff9536781ed000 R14: 00000000ffffffff R15: ffff95367346a008
       FS:  0000000000000000(0000) GS:ffff95367d400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fdf814a3ff0 CR3: 000000001a20f000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        blk_mq_pci_map_queues+0x37/0xd0
        nvme_pci_map_queues+0x80/0xb0 [nvme]
        blk_mq_alloc_tag_set+0x133/0x2f0
        nvme_reset_work+0x105d/0x1590 [nvme]
        process_one_work+0x291/0x530
        worker_thread+0x218/0x3d0
        ? process_one_work+0x530/0x530
        kthread+0x111/0x130
        ? kthread_park+0x90/0x90
        ret_from_fork+0x1f/0x30
       ---[ end trace 74587339d93c83c0 ]---
      
      Fixes: 22b55601 ("nvme-pci: Separate IO and admin queue IRQ vectors")
      Reported-by: Iván Chavero <ichavero@chavero.com.mx>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      cb9e0e50
  5. 21 May, 2019 2 commits
  6. 17 May, 2019 18 commits
    • nvme: fix memory leak for power latency tolerance · 510a405d
      Yufen Yu authored
      Unconditionally hide device pm latency tolerance when uninitializing
      the controller to ensure all qos resources are released so that we're
      not leaking this memory. This is safe to call if none were allocated in
      the first place, or were previously freed.
      
      Fixes: c5552fde ("nvme: Enable autonomous power state transitions")
      Suggested-by: Keith Busch <keith.busch@intel.com>
      Tested-by: David Milburn <dmilburn@redhat.com>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      [changelog]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      510a405d
    • nvme: release namespace SRCU protection before performing controller ioctls · 5fb4aac7
      Christoph Hellwig authored
      Holding the SRCU critical section protecting the namespace list can
      cause deadlocks when using the per-namespace admin passthrough ioctl to
      delete a namespace.  Release it earlier when performing per-controller
      ioctls to avoid that.
      Reported-by: Kenneth Heitke <kenneth.heitke@intel.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      5fb4aac7
    • nvme: merge nvme_ns_ioctl into nvme_ioctl · 90ec611a
      Christoph Hellwig authored
      Merge the two functions to make future changes a little easier.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      90ec611a
    • nvme: remove the ifdef around nvme_nvm_ioctl · 3f98bcc5
      Christoph Hellwig authored
      We already have a proper stub if lightnvm is not enabled, so don't bother
      with the ifdef.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      3f98bcc5
    • nvme: fix srcu locking on error return in nvme_get_ns_from_disk · 100c815c
      Christoph Hellwig authored
      If we can't get a namespace don't leak the SRCU lock.  nvme_ioctl was
      working around this, but nvme_pr_command wasn't handling this properly.
      Just do what callers would usually expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      100c815c
    • nvme: Fix known effects · 6fa0321a
      Keith Busch authored
      We're trying to append known effects to the ones reported in the
      controller's log. The original patch accomplished this, but something
      went wrong when the patch was merged, causing the effects log to override
      the known effects.
      
      Link: http://lists.infradead.org/pipermail/linux-nvme/2019-May/023710.html
      Fixes: f4524cc4 ("nvme-pci: add known admin effects to augument admin effects log page")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      6fa0321a
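The difference between appending and overriding comes down to OR-ing versus assigning. The sketch below is a minimal illustration with hypothetical function names and arbitrary bit values, not the driver's code: the "override" variant loses the known effects whenever a log is present, while the "append" variant keeps both.

```c
#include <assert.h>
#include <stdint.h>

/* Behaviour described as broken: the log value replaces the known bits. */
uint32_t effects_override(uint32_t known, uint32_t logged, int have_log)
{
	uint32_t effects = known;

	if (have_log)
		effects = logged;  /* known bits are lost here */
	return effects;
}

/* Intended behaviour: known bits are OR-ed into whatever the log reports. */
uint32_t effects_append(uint32_t known, uint32_t logged, int have_log)
{
	uint32_t effects = have_log ? logged : 0;

	return effects | known;
}
```

With known = 0x1 and a log reporting 0x2, the override variant returns 0x2 and the append variant returns 0x3.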
    • nvme-pci: Sync queues on reset · d6135c3a
      Keith Busch authored
      A controller with multiple namespaces may have multiple request_queues with
      their own timeout work. If a controller fails with IO outstanding to
      different namespaces, each request queue may attempt to handle it, so
      ensure there is no previously scheduled timeout work executing prior to
      starting controller initialization by synchronizing with each queue.
      Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      d6135c3a
    • nvme-pci: Unblock reset_work on IO failure · 2036f726
      Keith Busch authored
      The reset_work waits for queued IO to complete before setting the
      controller to live. If any of these times out and requeues, we won't be
      able to restart the controller because the reset_work is already running.
      
      Flush all entered requests to a failed completion if a timeout occurs
      in the connecting state, and ensure the controller can't transition to
      the live state after we've unblocked it from waiting for completions.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      2036f726
    • nvme-pci: Don't disable on timeout in reset state · 39a9dd81
      Keith Busch authored
      The reset state doesn't dispatch commands that it needs to wait for
      anymore. If a timeout occurs in this state, the reset work is already
      disabling the controller, so just reset the request's timer.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      39a9dd81
    • nvme-pci: Fix controller freeze wait disabling · e43269e6
      Keith Busch authored
      If a controller disabling didn't start a freeze, don't wait for the
      operation to complete.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      e43269e6
    • Merge tag 'for-linus-20190516' of git://git.kernel.dk/linux-block · a6a4b66b
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A small set of fixes for io_uring.
      
        This contains:
      
         - smp_rmb() cleanup for io_cqring_events() (Jackie)
      
         - io_cqring_wait() simplification (Jackie)
      
         - removal of dead 'ev_flags' passing (me)
      
         - SQ poll CPU affinity verification fix (me)
      
         - SQ poll wait fix (Roman)
      
         - SQE command prep cleanup and fix (Stefan)"
      
      * tag 'for-linus-20190516' of git://git.kernel.dk/linux-block:
        io_uring: use wait_event_interruptible for cq_wait conditional wait
        io_uring: adjust smp_rmb inside io_cqring_events
        io_uring: fix infinite wait in khread_park() on io_finish_async()
        io_uring: remove 'ev_flags' argument
        io_uring: fix failure to verify SQ_AFF cpu
        io_uring: fix race condition reading SQE data
      a6a4b66b
    • Merge tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block · 1718de78
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "This is mainly some late lightnvm changes that came in just before the
        merge window, as well as fixes that have been queued up since the
        initial pull request was frozen.
      
        This contains:
      
         - lightnvm changes, fixing race conditions, improving memory
           utilization, and improving pblk compatibility (Chansol, Igor,
           Marcin)
      
         - NVMe pull request with minor fixes all over the map (via Christoph)
      
         - remove redundant error print in sata_rcar (Geert)
      
         - struct_size() cleanup (Jackie)
      
         - dasd CONFIG_LBADF warning fix (Ming)
      
         - brd cond_resched() improvement (Mikulas)"
      
      * tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block: (41 commits)
        block/bio-integrity: use struct_size() in kmalloc()
        nvme: validate cntlid during controller initialisation
        nvme: change locking for the per-subsystem controller list
        nvme: trace all async notice events
        nvme: fix typos in nvme status code values
        nvme-fabrics: remove unused argument
        nvme-multipath: avoid crash on invalid subsystem cntlid enumeration
        nvme-fc: use separate work queue to avoid warning
        nvme-rdma: remove redundant reference between ib_device and tagset
        nvme-pci: mark expected switch fall-through
        nvme-pci: add known admin effects to augument admin effects log page
        nvme-pci: init shadow doorbell after each reset
        brd: add cond_resched to brd_free_pages
        sata_rcar: Remove ata_host_alloc() error printing
        s390/dasd: fix build warning in dasd_eckd_build_cp_raw
        lightnvm: pblk: use nvm_rq_to_ppa_list()
        lightnvm: pblk: simplify partial read path
        lightnvm: do not remove instance under global lock
        lightnvm: track inflight target creations
        lightnvm: pblk: recover only written metadata
        ...
      1718de78
    • Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 815d469d
      Linus Torvalds authored
      Pull more clk framework updates from Stephen Boyd:
       "One more patch to remove io.h from clk-provider.h.
      
        We used to need this include when we had clk_readl() and clk_writel(),
        but those are gone now so this patch pushes the dependency out to the
        users of clk-provider.h"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: Remove io.h from clk-provider.h
      815d469d
    • Merge branch 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 5f3ab27b
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "The cgroup2 freezer pulled in this cycle broke strace. This pull
        request includes a workaround for the problem.
      
        It's not a complete fix in that it may cause spurious frozen state
        flip-flops which is fairly minor. Will push a full fix once it's
        ready"
      
      * 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        signal: unconditionally leave the frozen state in ptrace_stop()
      5f3ab27b
    • Merge tag 'linux-kselftest-5.2-rc1-2' of... · 4c7b63a3
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull more kselftest updates from Shuah Khan:
      
       - kselftest framework bpf build/test workflow regression fix
      
       - Fix to kselftest install to use default install path
      
       - Fix to kselftest KBUILD_OUTPUT builds to not clutter main
         KBUILD_OUTPUT directory with selftest objects
      
       - .gitignore fixes (Kelsey Skunberg)
      
       - rseq selftests updates (Mathieu Desnoyers and Martin Schwidefsky)
      
         They change the per-architecture pre-abort signatures to ensure those
         are valid trap instructions.
      
         The way exit points are presented to debuggers is enhanced, ensuring
         all exit points are present, so debuggers don't have to disassemble
         rseq critical section to properly skip over them.
      
         Discussions with the glibc community are reaching a consensus on
         exposing a __rseq_handled symbol from glibc to coexist with rseq
         early adopters. Update the rseq selftest code to expose and use this
         symbol.
      
         Support for compiling asm goto with clang is added with the
         "-no-integrated-as" compiler switch, similarly to the top level
         kernel Makefile.
      
       - kselftest Makefile test run output refactoring and making test output
         TAP13 compliant from Kees Cook:
      
         This re-factors the selftest Makefiles to extract the test running
         logic to be reused between "run_tests" and "emit_tests", while also
         fixing up the test output to be TAP version 13 compliant:
      	- added "plan" line
      	- fixed result line syntax
      	- moved all test output to be "# "-prefixed as TAP "diagnostic"
      	  lines
      
         The prefixing code includes a fallback mode for limited execution
         environments.
      
         Additionally, the plan lines are fixed for all callers of
         kselftest.h.
      
      * tag 'linux-kselftest-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
        selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects
        selftests: drivers: Create .gitignore to include /dma-buf/udmabuf
        selftests: pidfd: Create .gitignore to include pidfd_test
        selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set
        selftests: fix install target to use default install path
        rseq/selftests: add -no-integrated-as for clang
        rseq/selftests: mips: use break instruction for RSEQ_SIG
        rseq/selftests: powerpc code signature: generate valid instructions
        rseq/selftests: aarch64 code signature: handle big-endian environment
        rseq/selftests: arm: use udf instruction for RSEQ_SIG
        rseq/selftests: s390: use trap4 for RSEQ_SIG
        rseq/selftests: x86: use ud1 instruction as RSEQ_SIG opcode
        rseq/selftests: s390: use jg instruction for jumps outside of the asm
        rseq/selftests: Use __rseq_handled symbol to coexist with glibc
        rseq/selftests: Introduce __rseq_cs_ptr_array, rename __rseq_table to __rseq_cs
        rseq/selftests: Add __rseq_exit_point_array section for debuggers
        rseq/selftests: x86: Work-around bogus gcc-8 optimisation
        selftests: Add test plan API to kselftest.h and adjust callers
        selftests: Remove KSFT_TAP_LEVEL
        selftests: Move test output to diagnostic lines
        ...
      4c7b63a3
    • Merge tag 'devicetree-for-5.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 9cbda1bd
      Linus Torvalds authored
      Pull Devicetree vendor prefix conversion from Rob Herring:
       "Conversion of vendor-prefixes.txt to json-schema"
      
      * tag 'devicetree-for-5.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: Convert vendor prefixes to json-schema
      9cbda1bd
    • Merge tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 0d744719
      Linus Torvalds authored
      Pull AFS callback promise fixes from David Howells:
       "This series fixes a bunch of problems in callback promise handling,
        where a callback promise indicates a promise on the part of the server
        to notify the client in the event of some sort of change to a file or
        volume. In the event of a break, the client has to go and refetch the
        client status from the server and discard any cached permission
        information as the ACL might have changed.
      
        The problem in the current code is that changes made by other clients
        aren't always noticed, primarily because the file status information
        and the callback information aren't updated in the same critical
        section, even if these are carried in the same reply from an RPC
        operation, and so the AFS_VNODE_CB_PROMISED flag is unreliable.
      
        Arranging for them to be done in the same critical section during
        reply decoding is tricky because of the FS.InlineBulkStatus op - which
        has all the statuses in the reply arriving and then all the callbacks,
        so they have to be buffered. It simplifies things a lot to move the
        critical section out of the decode phase and do it after the RPC
        function returns.
      
        Also new inodes (either newly fetched or newly created) aren't
        properly managed against a callback break happening before we get the
        local inode up and running.
      
        Fix this by:
      
         - There's now a combined file status and callback record (struct
           afs_status_cb) to carry both plus some flags.
      
         - Each operation wrapper function allocates sufficient afs_status_cb
           records for all the vnodes it is interested in and passes them into
           RPC operations to be filled in from the reply.
      
         - The FileStatus and CallBack record decoders no longer apply the
           new/revised status and callback information to the inode/vnode at
           the point of decoding and instead store the information into the
           afs_status_cb records allocated by the operation wrapper.
      
         - afs_vnode_commit_status() then revises the file status, detects
           deletion and notes callback information inside of a single critical
           section. It also checks the callback break counters and cancels the
           callback promise if they changed during the operation.
      
           [*] Note that "callback break counters" are counters of server
           events that cancel one or more callback promises that the client
           thinks it has. The client counts the events and compares the
           counters before and after an operation to see if the callback
           promise it thinks it just got evaporated before it got recorded
           under lock.
      
         - Volume and server callback break counters are passed into
           afs_iget() allowing callback breaks concurrent with inode set up to
           be detected and the callback promise thence to be cancelled.
      
         - AFS validation checks are now done under RCU conditions using a
           read lock on cb_lock. This requires vnode->cb_interest to be made
           RCU safe.
      
         - If those validation checks fail, the callback breaker is then called
           under write lock on the cb_lock - but only if the callback break
           counter didn't change from the value read before the checks were
           made.
      
         - Results from FS.InlineBulkStatus that correspond to inodes we
           currently have in memory are now used to update those inodes'
           status and callback information rather than being discarded. This
           requires those inodes to be looked up before the RPC op is made and
           all their callback break values saved.
      
        To aid in this, the following changes have also been made:
      
         - Don't pass the vnode into the reply delivery functions or the
           decoders. The vnode shouldn't be altered anywhere in those paths.
           The only exception, for the moment, is for the call done hook for
           file lock ops that wants access to both the vnode and the call -
           this can be fixed at a later time.
      
         - Get rid of the call->reply[] void* array and replace it with named
           and typed members. This avoids confusion since different ops were
           mapping different reply[] members to different things.
      
         - Fix an order-1 kmalloc allocation in afs_do_lookup() and replace it
           with kvcalloc().
      
         - Always get the reply time. Since callback, lock and fileserver
           record expiry times are calculated for several RPCs, make this
           mandatory.
      
         - Call afs_pages_written_back() from the operation wrapper rather
           than from the delivery function.
      
         - Don't store the version and type from a callback promise in a reply
           as the information in them is of very limited use"
      
      * tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix application of the results of a inline bulk status fetch
        afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
        afs: Fix unlink to handle YFS.RemoveFile2 better
        afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry
        afs: Make vnode->cb_interest RCU safe
        afs: Split afs_validate() so first part can be used under LOOKUP_RCU
        afs: Don't save callback version and type fields
        afs: Fix application of status and callback to be under same lock
        afs: Always get the reply time
        afs: Fix order-1 allocation in afs_do_lookup()
        afs: Get rid of afs_call::reply[]
        afs: Don't pass the vnode pointer through into the inline bulk status op
      0d744719
    • Merge tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 227747fb
      Linus Torvalds authored
      Pull misc AFS fixes from David Howells:
       "This fixes a set of miscellaneous issues in the afs filesystem,
        including:
      
         - leak of keys on file close.
      
         - broken error handling in xattr functions.
      
         - missing locking when updating VL server list.
      
         - volume location server DNS lookup whereby preloaded cells may not
           ever get a lookup and regular DNS lookups to maintain server lists
           consume power unnecessarily.
      
         - incorrect error propagation and handling in the fileserver
           iteration code causes operations to sometimes apparently succeed.
      
         - interruption of server record check/update side op during
           fileserver iteration causes uninterruptible main operations to fail
           unexpectedly.
      
         - callback promise expiry time miscalculation.
      
         - over invalidation of the callback promise on directories.
      
         - double locking on callback break waking up file locking waiters.
      
         - double increment of the vnode callback break counter.
      
        Note that it makes some changes outside of the afs code, including:
      
         - an extra parameter to dns_query() to allow the dns_resolver key
           just accessed to be immediately invalidated. AFS is caching the
           results itself, so the key can be discarded.
      
         - an interruptible version of wait_var_event().
      
         - an rxrpc function to allow the maximum lifespan to be set on a
           call.
      
         - a way for an rxrpc call to be marked as non-interruptible"
      
      * tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix double inc of vnode->cb_break
        afs: Fix lock-wait/callback-break double locking
        afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
        afs: Fix calculation of callback expiry time
        afs: Make dynamic root population wait uninterruptibly for proc_cells_lock
        afs: Make some RPC operations non-interruptible
        rxrpc: Allow the kernel to mark a call as being non-interruptible
        afs: Fix error propagation from server record check/update
        afs: Fix the maximum lifespan of VL and probe calls
        rxrpc: Provide kernel interface to set max lifespan on a call
        afs: Fix "kAFS: AFS vnode with undefined type 0"
        afs: Fix cell DNS lookup
        Add wait_var_event_interruptible()
        dns_resolver: Allow used keys to be invalidated
        afs: Fix afs_cell records to always have a VL server list record
        afs: Fix missing lock when replacing VL server list
        afs: Fix afs_xattr_get_yfs() to not try freeing an error value
        afs: Fix incorrect error handling in afs_xattr_get_acl()
        afs: Fix key leak in afs_release() and afs_evict_inode()
  7. 16 May, 2019 14 commits
    • Merge tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client · 1d9d7cbf
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "On the filesystem side we have:
      
         - a fix to enforce quotas set above the mount point (Luis Henriques)
      
         - support for exporting snapshots through NFS (Zheng Yan)
      
         - proper statx implementation (Jeff Layton). statx flags are mapped
           to MDS caps, with AT_STATX_{DONT,FORCE}_SYNC taken into account.
      
         - some follow-up dentry name handling fixes, in particular
           elimination of our hand-rolled helper and the switch to __getname()
           as suggested by Al (Jeff Layton)
      
         - a set of MDS client cleanups in preparation for async MDS requests
           in the future (Jeff Layton)
      
         - a fix to sync the filesystem before remounting (Jeff Layton)
      
        On the rbd side, work is on-going on object-map and fast-diff image
        features"
      
      * tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client: (29 commits)
        ceph: flush dirty inodes before proceeding with remount
        ceph: fix unaligned access in ceph_send_cap_releases
        libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer
        libceph: fix unaligned accesses in ceph_entity_addr handling
        rbd: don't assert on writes to snapshots
        rbd: client_mutex is never nested
        ceph: print inode number in __caps_issued_mask debugging messages
        ceph: just call get_session in __ceph_lookup_mds_session
        ceph: simplify arguments and return semantics of try_get_cap_refs
        ceph: fix comment over ceph_drop_caps_for_unlink
        ceph: move wait for mds request into helper function
        ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_request
        ceph: after an MDS request, do callback and completions
        ceph: use pathlen values returned by set_request_path_attr
        ceph: use __getname/__putname in ceph_mdsc_build_path
        ceph: use ceph_mdsc_build_path instead of clone_dentry_name
        ceph: fix potential use-after-free in ceph_mdsc_build_path
        ceph: dump granular cap info in "caps" debugfs file
        ceph: make iterate_session_caps a public symbol
        ceph: fix NULL pointer deref when debugging is enabled
        ...
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 2c45e7fb
      Linus Torvalds authored
      Pull thermal management updates from Zhang Rui:
      
       - Remove the 'module' Kconfig option for the thermal subsystem
         framework, because the thermal framework is required to be ready as
         early as possible to avoid overheating at boot time (Daniel Lezcano)

       - Fix a bug where the thermal framework pokes disabled thermal zones
         upon resume (Wei Wang)

       - A couple of cleanups and trivial fixes for the int340x thermal
         drivers (Srinivas Pandruvada, Zhang Rui, Sumeet Pawnikar)
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        drivers: thermal: processor_thermal: Downgrade error message
        mlxsw: Remove obsolete dependency on THERMAL=m
        hwmon/drivers/core: Simplify complex dependency
        thermal/drivers/core: Fix typo in the option name
        thermal/drivers/core: Remove depends on THERMAL in Kconfig
        thermal/drivers/core: Remove module unload code
        thermal/drivers/core: Remove the module Kconfig's option
        thermal: core: skip update disabled thermal zones after suspend
        thermal: make device_register's type argument const
        thermal: intel: int340x: processor_thermal_device: simplify to get driver data
        thermal/int3403_thermal: favor _TMP instead of PTYP
    • Merge tag 'for-5.2/dm-changes-v2' of... · 311f7128
      Linus Torvalds authored
      Merge tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper updates from Mike Snitzer:
      
       - Improve DM snapshot target's scalability by using finer grained
         locking. Requires some list_bl interface improvements.
      
       - Add ability for DM integrity to use a bitmap mode, that tracks
         regions where data and metadata are out of sync, instead of using a
         journal.
      
       - Improve DM thin provisioning target to not write metadata changes to
         disk if the thin-pool and associated thin devices are merely
         activated but not used. This avoids metadata corruption due to
         concurrent activation of thin devices across different OS instances
         (e.g. split brain scenarios, which ultimately would be avoided if
         proper device filters were used -- but not having proper filtering
         has proven a very common configuration mistake)
      
       - Fix missing call to path selector type->end_io in DM multipath. This
         fixes reported performance problems due to inaccurate path selector
         IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
         a particular path due to it seemingly being heavily used).
      
       - Fix bug in DM cache metadata's loading of its discard bitset that
         could lead to all cache blocks being discarded if the very first
         cache block was discarded (thankfully in practice the first cache
         block is generally in use; be it FS superblock, partition table, disk
         label, etc).
      
       - Add testing-only DM dust target which simulates a device that has
         failing sectors and/or read failures.
      
       - Fix a reference-count-related hang in a DM init error path that
         stalled boot if the user supplied malformed input on the kernel
         command line.

       - Fix a couple of issues with DM crypt target's logging being overly
         verbose or lacking context.
      
       - Various other small fixes to DM init, DM multipath, DM zoned, and DM
         crypt.
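
      The bitmap mode mentioned in the list above can be pictured with a small
      userspace sketch. This is only an illustration of the general idea (one
      bit per region, set while data and metadata may disagree) and is not
      dm-integrity's code or on-disk format. The region size, the region count
      and every demo_* name are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REGION_SECTORS 8u   /* hypothetical: sectors covered by one bit */
#define DEMO_NR_REGIONS     64u  /* hypothetical: number of tracked regions */

/* One bit per region; a set bit means "data and metadata may be out of sync". */
struct demo_bitmap {
    unsigned char bits[DEMO_NR_REGIONS / CHAR_BIT];
};

/* Map a sector number to the region that contains it. */
static size_t demo_region_of(uint64_t sector)
{
    return (size_t)(sector / DEMO_REGION_SECTORS);
}

/* Set the region's bit before writing to it. */
static void demo_mark_dirty(struct demo_bitmap *bm, uint64_t sector)
{
    size_t region = demo_region_of(sector);

    assert(region < DEMO_NR_REGIONS);
    bm->bits[region / CHAR_BIT] |= (unsigned char)(1u << (region % CHAR_BIT));
}

/* Clear the region's bit once data and metadata are both written out. */
static void demo_mark_clean(struct demo_bitmap *bm, uint64_t sector)
{
    size_t region = demo_region_of(sector);

    assert(region < DEMO_NR_REGIONS);
    bm->bits[region / CHAR_BIT] &= (unsigned char)~(1u << (region % CHAR_BIT));
}

/* After a crash, only regions whose bit is still set need recalculating. */
static bool demo_needs_recalc(const struct demo_bitmap *bm, size_t region)
{
    assert(region < DEMO_NR_REGIONS);
    return (bm->bits[region / CHAR_BIT] >> (region % CHAR_BIT)) & 1u;
}
```

      A write to sector 17 would mark region 2 (17 / 8) before the I/O is
      issued and clear it afterwards. The trade-off against a journal is that
      a crash leaves whole regions to be rechecked rather than replaying
      individual writes.
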
      
      * tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
        dm: fix a couple brace coding style issues
        dm crypt: print device name in integrity error message
        dm crypt: move detailed message into debug level
        dm ioctl: fix hang in early create error condition
        dm integrity: whitespace, coding style and dead code cleanup
        dm integrity: implement synchronous mode for reboot handling
        dm integrity: handle machine reboot in bitmap mode
        dm integrity: add a bitmap mode
        dm integrity: introduce a function add_new_range_and_wait()
        dm integrity: allow large ranges to be described
        dm ingerity: pass size to dm_integrity_alloc_page_list()
        dm integrity: introduce rw_journal_sectors()
        dm integrity: update documentation
        dm integrity: don't report unused options
        dm integrity: don't check null pointer before kvfree and vfree
        dm integrity: correctly calculate the size of metadata area
        dm dust: Make dm_dust_init and dm_dust_exit static
        dm dust: remove redundant unsigned comparison to less than zero
        dm mpath: always free attached_handler_name in parse_path()
        dm init: fix max devices/targets checks
        ...
    • slab: remove /proc/slab_allocators · 7878c231
      Qian Cai authored
      It turned out that DEBUG_SLAB_LEAK is still broken even after recent
      rescue efforts: when there is a large number of objects such as
      kmemleak_object, which is normal on a debug kernel,
      
        # grep kmemleak /proc/slabinfo
        kmemleak_object   2243606 3436210 ...
      
      reading /proc/slab_allocators could easily loop forever while processing
      the kmemleak_object cache, and any additional freeing or allocating of
      objects will trigger a reprocessing. To make the situation worse,
      soft-lockups could easily happen in this situation, which will call
      printk() to allocate more kmemleak objects and guarantee an infinite
      loop.
      
      Also, since it seems no one noticed when it was totally broken more
      than two years ago - see the commit fcf88917 ("slab: fix a crash
      by reading /proc/slab_allocators") - probably nobody cares about it
      anymore due to the decline of SLAB. Just remove it entirely.
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • afs: Fix application of the results of a inline bulk status fetch · 39db9815
      David Howells authored
      Fix afs_do_lookup() such that when it does an inline bulk status fetch op,
      it will update inodes that are already extant (something that afs_iget()
      doesn't do) and will cache permits for each inode created (thereby avoiding
      a follow-up FS.FetchStatus call to determine this).
      
      Extant inodes need looking up in advance so that their cb_break counters
      before and after the operation can be compared.  To this end, the inode
      pointers are cached so that they don't need looking up again after the op.
      
      Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Pass pre-fetch server and volume break counts into afs_iget5_set() · b8359153
      David Howells authored
      Pass the server and volume break counts from before the status fetch
      operation that queried the attributes of a file into afs_iget5_set() so
      that the new vnode's break counters can be initialised appropriately.
      
      This allows detection of a volume or server break that happened whilst we
      were fetching the status or setting up the vnode.
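
      A minimal userspace sketch of the idea, not the afs code itself: the
      break counters are sampled before the fetch, the new vnode is
      initialised from those samples, and a later comparison against the live
      counters reveals a break that raced with the fetch. The cb_s_break and
      cb_v_break field names follow the commit messages in this log; the
      demo_* types and functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the server and volume records. */
struct demo_server { unsigned int cb_s_break; };
struct demo_volume { unsigned int cb_v_break; };

/* Hypothetical vnode holding the counter values it was set up with. */
struct demo_vnode {
    unsigned int cb_s_break;  /* server break count seen before the fetch */
    unsigned int cb_v_break;  /* volume break count seen before the fetch */
};

/* Initialise a new vnode from counters sampled *before* the status fetch. */
static void demo_vnode_init(struct demo_vnode *vnode,
                            unsigned int pre_s_break,
                            unsigned int pre_v_break)
{
    vnode->cb_s_break = pre_s_break;
    vnode->cb_v_break = pre_v_break;
}

/* True if a server or volume break happened after the counters were sampled. */
static bool demo_break_happened(const struct demo_vnode *vnode,
                                const struct demo_server *server,
                                const struct demo_volume *volume)
{
    return vnode->cb_s_break != server->cb_s_break ||
           vnode->cb_v_break != volume->cb_v_break;
}
```

      Had the vnode been initialised from the counters as they stood after
      the fetch, a break arriving mid-fetch would leave both sides equal and
      go unnoticed.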
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix unlink to handle YFS.RemoveFile2 better · a38a7558
      David Howells authored
      Make use of the status update for the target file that the YFS.RemoveFile2
      RPC op returns to correctly update the vnode as to whether the file was
      actually deleted or just had nlink reduced.
      
      Fixes: 30062bd1 ("afs: Implement YFS support in the fs client")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry · 61c347ba
      David Howells authored
      Fix afs_validate() to clear AFS_VNODE_CB_PROMISED on a vnode if we detect
      any condition that causes the callback promise to be broken implicitly,
      including server break (cb_s_break), volume break (cb_v_break) or callback
      expiry.
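
      The three conditions can be sketched in userspace C as follows. This is
      an illustration under assumptions, not afs_validate() itself: the flag
      value, the structure layout and the demo_* names are hypothetical, and
      time is a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CB_PROMISED 0x1u  /* hypothetical flag bit */

/* Hypothetical record of a callback promise held on a vnode. */
struct demo_promise {
    unsigned int flags;
    unsigned int cb_s_break;  /* server break count when promise was made */
    unsigned int cb_v_break;  /* volume break count when promise was made */
    long long expires_at;     /* time at which the promise lapses */
};

/*
 * Return true if the promise still holds.  If a server break, a volume
 * break or expiry is detected, clear the promised flag so later checks
 * do not trust a stale promise.
 */
static bool demo_promise_still_valid(struct demo_promise *p,
                                     unsigned int server_break,
                                     unsigned int volume_break,
                                     long long now)
{
    if (!(p->flags & DEMO_CB_PROMISED))
        return false;

    if (p->cb_s_break != server_break ||
        p->cb_v_break != volume_break ||
        p->expires_at <= now) {
        p->flags &= ~DEMO_CB_PROMISED;
        return false;
    }
    return true;
}
```

      Clearing the flag at detection time means that the other two conditions
      no longer need to be re-evaluated by every later reader of the flag.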
      
      Fixes: ae3b7361 ("afs: Fix validation/callback interaction")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Make vnode->cb_interest RCU safe · f642404a
      David Howells authored
      Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
      vnode->cb_interest.  Use that change to allow afs_check_validity() to use
      read_seqbegin_or_lock() instead of read_seqlock_excl().
      
      This also requires the caller of afs_check_validity() to hold the RCU read
      lock across the call.
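
      For readers unfamiliar with the pattern, here is a userspace sketch of
      the lockless read-and-retry half of a sequence lock, written with C11
      atomics. It is not the kernel's seqlock implementation: it assumes a
      single writer, protects a single integer, and all demo_* names are
      hypothetical. The kernel's read_seqbegin_or_lock() starts with a
      lockless pass of this kind and takes the lock only if a retry is
      needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A value guarded by a sequence counter (even: stable, odd: being written). */
struct demo_seq_value {
    atomic_uint seq;
    atomic_int value;
};

static void demo_init(struct demo_seq_value *sv, int value)
{
    atomic_init(&sv->seq, 0u);
    atomic_init(&sv->value, value);
}

/* Writer side; a single writer is assumed, so no writer lock is shown. */
static void demo_write(struct demo_seq_value *sv, int new_value)
{
    atomic_fetch_add_explicit(&sv->seq, 1u, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sv->value, new_value, memory_order_relaxed);
    atomic_fetch_add_explicit(&sv->seq, 1u, memory_order_release); /* even */
}

/* Sample the sequence counter at the start of a read section. */
static unsigned int demo_read_begin(struct demo_seq_value *sv)
{
    return atomic_load_explicit(&sv->seq, memory_order_acquire);
}

/* True if the read raced with a writer and must be repeated. */
static bool demo_read_retry(struct demo_seq_value *sv, unsigned int start)
{
    atomic_thread_fence(memory_order_acquire);
    return (start & 1u) ||
           atomic_load_explicit(&sv->seq, memory_order_relaxed) != start;
}

/* Lockless read: repeat until a consistent snapshot is observed. */
static int demo_read(struct demo_seq_value *sv)
{
    unsigned int start;
    int v;

    do {
        start = demo_read_begin(sv);
        v = atomic_load_explicit(&sv->value, memory_order_relaxed);
    } while (demo_read_retry(sv, start));
    return v;
}
```

      Readers never block the writer in this scheme, which is why a
      read-mostly validity check can benefit from it.
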
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Split afs_validate() so first part can be used under LOOKUP_RCU · c925bd0a
      David Howells authored
      Split afs_validate() so that the part that decides if the vnode is still
      valid can be used under LOOKUP_RCU conditions from afs_d_revalidate().
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't save callback version and type fields · 7c712458
      David Howells authored
      Don't save callback version and type fields as the version is about the
      format of the callback information and the type is relative to the
      particular RPC call.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • dt-bindings: Convert vendor prefixes to json-schema · 8122de54
      Rob Herring authored
      Convert the vendor prefix registry to a schema. This will enable checking
      that new vendor prefixes are added (in addition to the less than perfect
      checkpatch.pl check) and will also check against adding other prefixes
      which are not vendors.
      
      Converted vendor-prefixes.txt using the following sed script:
      
      sed -e 's/\([a-zA-Z0-9\-]*\)[[:space:]]*\([a-zA-Z0-9].*\)/  "^\1,\.\*\":\n    description: \2/'
      Signed-off-by: Rob Herring <robh@kernel.org>
    • Merge tag 'media/v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 01be377c
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Some fixes for some platform drivers (rockchip, atmel, omap, daVinci,
        tegra-cec, coda and rcar).
      
        Also includes a fix to one of the V4L2 uAPI docs, clarifying a corner
        case"
      
      * tag 'media/v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: rockchip/vpu: Fix/re-order probe-error/remove path
        media: rockchip/vpu: Initialize mdev->bus_info
        media: rockchip/vpu: Get vdev from the file arg in vidioc_querycap()
        media: rockchip/vpu: Add missing dont_use_autosuspend() calls
        media: rockchip/vpu: Do not request id 0 for our video device
        media: tegra-cec: fix cec_notifier_parse_hdmi_phandle return check
        media: davinci/vpbe: array underflow in vpbe_enum_outputs()
        media: field-order.rst: clarify FIELD_ANY and FIELD_NONE
        media: staging/imx: add media device to capture register
        media: rcar-csi2: Propagate the FLD signal for NTSC and PAL
        media: rcar-csi2: restart CSI-2 link if error is detected
        media: omap_vout: potential buffer overflow in vidioc_dqbuf()
        media: coda: fix unset field and fail on invalid field in buf_prepare
        media: atmel: atmel-isc: fix asd memory allocation
        media: atmel: atmel-isc: fix INIT_WORK misplacement
        media: atmel: atmel-isc: limit incoming pixels per frame
    • Merge tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 11b11773
      Linus Torvalds authored
      Pull EDAC fixes from Borislav Petkov:
      
       - Do not build mpc85xx_edac as a module (Michael Ellerman)
      
       - Correct edac_mc_find()'s return value on error (Robert Richter)
      
      * tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        EDAC/mc: Fix edac_mc_find() in case no device is found
        EDAC/mpc85xx: Prevent building as a module