1. 15 Jan, 2020 28 commits
  2. 14 Jan, 2020 12 commits
    • Merge tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 95e20af9
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "Three NFS over RDMA fixes for bugs Chuck found that can be hit during
        device removal:
      
         - Fix create_qp crash on device unload
      
         - Fix completion wait during device removal
      
         - Fix oops in receive handler after device removal"
      
      * tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        xprtrdma: Fix oops in Receive handler after device removal
        xprtrdma: Fix completion wait during device removal
        xprtrdma: Fix create_qp crash on device unload
    • xprtrdma: Fix oops in Receive handler after device removal · 671c450b
      Chuck Lever authored
      Since v5.4, a device removal occasionally triggered this oops:
      
      Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
      Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
      Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
      Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
      Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
      Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
      Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
      Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
      Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
      Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
      Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
      Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
      Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
      Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
      Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
      Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
      Dec  2 17:13:53 manet kernel: Call Trace:
      Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
      Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
      Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
      Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
      Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
      
      The proximal cause is that this rpcrdma_rep has an rr_rdmabuf that
      is still pointing to the old ib_device, which has been freed. The
      only way that is possible is if this rpcrdma_rep was not destroyed
      by rpcrdma_ia_remove.
      
      Debugging showed that was indeed the case: this rpcrdma_rep was
      still in use by a completing RPC at the time of the device removal,
      and thus wasn't on the rep free list. So, it was not found by
      rpcrdma_reps_destroy().
      
      The fix is to introduce a list of all rpcrdma_reps so that every one
      of them can be found when a device is removed. On removal, that list
      is used only to DMA-unmap each rep's regbuf, replacing the former call
      to rpcrdma_reps_destroy().
      
      Meanwhile, to prevent corruption of this list, I've moved the
      destruction of temporary rpcrdma_rep objects into rpcrdma_post_recvs().
      Because rpcrdma_xprt_drain() ensures that post_recvs (and thus
      rep_destroy) is not invoked while rpcrdma_reps_unmap() is walking
      rb_all_reps, the rb_all_reps list is protected.
      
      Fixes: b0b227f0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix completion wait during device removal · 13cb886c
      Chuck Lever authored
      I've found that on occasion, "rmmod <dev>" will hang if an NFS mount
      is under load.
      
      Ensure that ri_remove_done is initialized only just before the
      transport is woken up to force a close. This avoids the completion
      possibly getting initialized again while the CM event handler is
      waiting for a wake-up.
      
      Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from under an NFS mount")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix create_qp crash on device unload · b32b9ed4
      Chuck Lever authored
      On device re-insertion, the RDMA device driver crashes trying to set
      up a new QP:
      
      Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
      Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
      Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
      Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
      Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
      Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G        W         5.4.0 #852
      Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
      Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
      Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
      Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
      Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
      Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
      Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
      Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
      Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
      Nov 27 16:32:06 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
      Nov 27 16:32:06 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
      Nov 27 16:32:06 manet kernel: Call Trace:
      Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
      Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
      Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
      Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
      Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
      
      The fix is to copy the qp_init_attr struct that was just created by
      rpcrdma_ep_create() instead of using the one from the previous
      connection instance.
      
      Fixes: 98ef77d1 ("xprtrdma: Send Queue size grows after a reconnect")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • Merge branch 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 452424cd
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "A boot crash fix by Mike Rapoport and a printk fix by Krzysztof
        Kozlowski"
      
      * 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: fix map_pages() to actually populate upper directory
        parisc: Use proper printk format for resource_size_t
    • Merge tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground · 67373994
      Linus Torvalds authored
      Pull asm-generic fixes from Arnd Bergmann:
       "Here are two bugfixes from Mike Rapoport, both fixing compile-time
        errors for the nds32 architecture that were recently introduced"
      
      * tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
        nds32: fix build failure caused by page table folding updates
        asm-generic/nds32: don't redefine cacheflush primitives
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c21ed4d9
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two simple fixes in the upper drivers (so both fairly core): one in
        enclosures, which fixes replugging a device into an enclosure slot,
        and one in the disk driver, which fixes revalidating a drive that had
        protection information (PI) after it is reformatted as a non-PI
        drive ... previously we were still remembering the old PI state.

        Both fixed issues are quite rare in the field"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: enclosure: Fix stale device oops with hot replug
        scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI
    • Merge branch 'dhowells' (patches from DavidH) · e033e7d4
      Linus Torvalds authored
      Merge misc fixes from David Howells.
      
      Two afs fixes and a key refcounting fix.
      
      * dhowells:
        afs: Fix afs_lookup() to not clobber the version on a new dentry
        afs: Fix use-after-loss-of-ref
        keys: Fix request_key() cache
    • afs: Fix afs_lookup() to not clobber the version on a new dentry · f52b83b0
      David Howells authored
      Fix afs_lookup() to not clobber the version set on a new dentry by
      afs_do_lookup(), especially as it's using the wrong data version (we
      need to use the one given to us by whatever op the dir contents
      correspond to rather than the one in the afs_vnode).
      
      Fixes: 9dd0b82e ("afs: Fix missing dentry data version updating")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • afs: Fix use-after-loss-of-ref · 40a708bd
      David Howells authored
      afs_lookup() has a tracepoint to indicate the outcome of
      d_splice_alias(), passing it the inode to retrieve the fid from.
      However, the function gave up its ref on that inode when it called
      d_splice_alias(), which may have failed and dropped the inode.
      
      Fix this by caching the fid.
      
      Fixes: 80548b03 ("afs: Add more tracepoints")
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • keys: Fix request_key() cache · 8379bb84
      David Howells authored
      When the key cached by request_key() and co. is cleaned up on exit(),
      the code looks in the wrong task_struct, and so clears the wrong cache.
      This leads to anomalies in key refcounting when doing, say, a kernel
      build on an afs volume, which then trigger KASAN to report a
      use-after-free when the key is viewed in /proc/keys.
      
      Fix this by making exit_creds() look in the passed-in task_struct rather
      than in current (the task_struct cleanup code is deferred by RCU and
      potentially run in another task).
      
      Fixes: 7743c48e ("keys: Cache result of request_key*() temporarily in task_struct")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'akpm' (patches from Andrew) · 3f1f9a9b
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "11 mm fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
        mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
        mm/page-writeback.c: improve arithmetic divisions
        mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
        mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
        mm, debug_pagealloc: don't rely on static keys too early
        mm: memcg/slab: fix percpu slab vmstats flushing
        mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
        mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
        mm/memory_hotplug: don't free usage map when removing a re-added early section
        mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations