1. 15 Dec, 2017 3 commits
    • Scott Mayhew's avatar
      nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests · dc4fd9ab
      Scott Mayhew authored
      If there were no commit requests, then nfs_commit_inode() should not
      wait on the commit or mark the inode dirty, otherwise the following
      BUG_ON can be triggered:
      
      [ 1917.130762] kernel BUG at fs/inode.c:578!
      [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
      [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
      [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
      [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G               ------------ T 3.10.0-768.el7.ppc64 #1
      [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000
      [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80
      [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700   Tainted: G               ------------ T  (3.10.0-768.el7.ppc64)
      [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>  CR: 22002822  XER: 20000000
      [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1
      GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
      GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
      GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
      GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
      GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
      [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350
      [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350
      [ 1917.130873] Call Trace:
      [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable)
      [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270
      [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0
      [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
      [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
      [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180
      [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150
      [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100
      [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
      [ 1917.130919] Instruction dump:
      [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
      [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134
      Signed-off-by: default avatarScott Mayhew <smayhew@redhat.com>
      Cc: stable@vger.kernel.org # 4.5+
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      dc4fd9ab
    • Chuck Lever's avatar
      xprtrdma: Spread reply processing over more CPUs · ccede759
      Chuck Lever authored
      Commit d8f532d2 ("xprtrdma: Invoke rpcrdma_reply_handler
      directly from RECV completion") introduced a performance regression
      for NFS I/O small enough to not need memory registration. In multi-
      threaded benchmarks that generate primarily small I/O requests,
      IOPS throughput is reduced by nearly a third. This patch restores
      the previous level of throughput.
      
      Because workqueues are typically BOUND (in particular ib_comp_wq,
      nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend
      to aggregate on the CPU that is handling Receive completions.
      
      The usual approach to addressing this problem is to create a QP
      and CQ for each CPU, and then schedule transactions on the QP
      for the CPU where you want the transaction to complete. The
      transaction then does not require an extra context switch during
      completion to end up on the same CPU where the transaction was
      started.
      
      This approach doesn't work for the Linux NFS/RDMA client because
      currently the Linux NFS client does not support multiple connections
      per client-server pair, and the RDMA core API does not make it
      straightforward for ULPs to determine which CPU is responsible for
      handling Receive completions for a CQ.
      
      So for the moment, record the CPU number in the rpcrdma_req before
      the transport sends each RPC Call. Then during Receive completion,
      queue the RPC completion on that same CPU.
      
      Additionally, move all RPC completion processing to the deferred
      handler so that even RPCs with simple small replies complete on
      the CPU that sent the corresponding RPC Call.
      
      Fixes: d8f532d2 ("xprtrdma: Invoke rpcrdma_reply_handler ...")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      ccede759
    • Scott Mayhew's avatar
      nfs: fix a deadlock in nfs client initialization · c156618e
      Scott Mayhew authored
      The following deadlock can occur between a process waiting for a client
      to initialize in while walking the client list during nfsv4 server trunking
      detection and another process waiting for the nfs_clid_init_mutex so it
      can initialize that client:
      
      Process 1                               Process 2
      ---------                               ---------
      spin_lock(&nn->nfs_client_lock);
      list_add_tail(&CLIENTA->cl_share_link,
              &nn->nfs_client_list);
      spin_unlock(&nn->nfs_client_lock);
                                              spin_lock(&nn->nfs_client_lock);
                                              list_add_tail(&CLIENTB->cl_share_link,
                                                      &nn->nfs_client_list);
                                              spin_unlock(&nn->nfs_client_lock);
                                              mutex_lock(&nfs_clid_init_mutex);
                                              nfs41_walk_client_list(clp, result, cred);
                                              nfs_wait_client_init_complete(CLIENTA);
      (waiting for nfs_clid_init_mutex)
      
      Make sure nfs_match_client() only evaluates clients that have completed
      initialization in order to prevent that deadlock.
      
      This patch also fixes v4.0 trunking behavior by not marking the client
      NFS_CS_READY until the clientid has been confirmed.
      Signed-off-by: default avatarScott Mayhew <smayhew@redhat.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      c156618e
  2. 30 Nov, 2017 1 commit
  3. 29 Nov, 2017 2 commits
  4. 17 Nov, 2017 34 commits