1. 04 Jan, 2024 11 commits
    • NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT · 037e56a2
      Trond Myklebust authored
      Once the client has processed the CB_LAYOUTRECALL, but has not yet
      successfully returned the layout, the server is supposed to switch to
      returning NFS4ERR_RETURNCONFLICT. This patch ensures that we handle
      that return value correctly.
      
      Fixes: 183d9e7b ("pnfs: rework LAYOUTGET retry handling")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFSv4.1: if referring calls are complete, trust the stateid argument · dce72920
      Trond Myklebust authored
      If the server is recalling a layout, and sends us a list of referring
      calls that we can see are complete, then we should just trust that the
      stateid argument is correct, even if the sequence id doesn't match the
      one we hold.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFSv4: Track the number of referring calls in struct cb_process_state · e3fd54e7
      Trond Myklebust authored
      When the server sends a set of referring calls to tell us that the
      NFSv4.1 callback needs to be ordered with respect to those calls, we
      may want to make that information available to the callback
      operations. In certain cases, the extra knowledge may allow them to
      optimise their behaviour.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: Use parent's objective cred in nfs_access_login_time() · a10a9233
      Scott Mayhew authored
      The subjective cred (task->cred) can potentially be overridden and
      subsequently freed in non-RCU context, which could lead to a panic if we
      try to use it in cred_fscmp().  Use __task_cred(), which returns the
      objective cred (task->real_cred) instead.
      
      Fixes: 0eb43812 ("NFS: Clear the file access cache upon login")
      Fixes: 5e9a7b9c ("NFS: Fix up a sparse warning")
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFSv4: Always ask for type with READDIR · b4d4fd60
      Benjamin Coddington authored
      Again we have claimed regressions for walking a directory tree, this time
      with the "find" utility which always tries to optimize away asking for any
      attributes until it has a complete list of entries.  This behavior makes
      the readdir plus heuristic do the wrong thing, which causes a storm of
      GETATTRs to determine each entry's type in order to continue the walk.
      
      For v4 add the type attribute to each READDIR request to include it no
      matter the heuristic.  This allows a simple `find` command to proceed
      quickly through a directory tree.
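A minimal sketch of what "always ask for type" means at the attribute-bitmap level. It assumes only the NFSv4 attribute numbering from the protocol specification (attribute N is bit N of word 0 for N < 32; type = 1, size = 4, fileid = 20); the helper `readdir_word0()` and the "plain" versus "plus" attribute sets are hypothetical and illustrative, not the kernel's actual READDIR bitmaps.

```c
#include <stdint.h>

/* NFSv4 attribute numbers (RFC 7530); all of these fall in word 0. */
enum {
    NFS4_ATTR_TYPE   = 1,
    NFS4_ATTR_SIZE   = 4,
    NFS4_ATTR_FILEID = 20,
};

#define WORD0_BIT(n) ((uint32_t)1u << (n))

/* Hypothetical mask builder: the attribute sets here are illustrative only. */
static uint32_t readdir_word0(int readdir_plus)
{
    uint32_t mask = WORD0_BIT(NFS4_ATTR_FILEID);   /* minimal per-entry data */

    if (readdir_plus)
        mask |= WORD0_BIT(NFS4_ATTR_SIZE);         /* extra attrs for "plus" */

    /* The change described above: request "type" whichever way the
     * readdirplus heuristic went, so no per-entry GETATTR is needed. */
    mask |= WORD0_BIT(NFS4_ATTR_TYPE);
    return mask;
}
```

With the type bit always present, a tree walker such as `find` can learn each entry's type from the READDIR reply itself.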
      Suggested-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • pnfs/blocklayout: Don't add zero-length pnfs_block_dev · d76c769c
      Benjamin Coddington authored
      We noticed a SCSI device that refused to allow READ CAPACITY when the
      device had a persistent reservation (PR) of type Exclusive Access,
      Registrants Only. The result of this situation is that the blocklayout
      driver adds a pnfs_block_dev of zero length, which always fails the
      offset_in_map tests. Instead of continuously trying to do pNFS for this
      case, just mark the device as unavailable, which will allow the client
      to fall back to the MDS for the duration of PNFS_DEVICE_RETRY_TIMEOUT.
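A small sketch of why a zero-length device can never be used and what "mark it unavailable" amounts to. The struct and both helpers are hypothetical stand-ins (the `unavailable` flag only models NFS_DEVICEID_UNAVAILABLE); they are not the blocklayout driver's real types or functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a pnfs_block_dev. */
struct block_dev_sketch {
    uint64_t start;     /* first byte covered by this device */
    uint64_t len;       /* length in bytes; 0 if READ CAPACITY was refused */
    bool unavailable;   /* models NFS_DEVICEID_UNAVAILABLE */
};

/* With len == 0 the range [start, start + len) is empty, so no offset matches. */
static bool offset_in_map(const struct block_dev_sketch *d, uint64_t off)
{
    return off >= d->start && off < d->start + d->len;
}

/* Returns true if the device may be used for pNFS I/O. Otherwise the
 * caller falls back to the MDS until the retry timeout expires. */
static bool setup_device(struct block_dev_sketch *d)
{
    if (d->len == 0) {
        d->unavailable = true;   /* don't add a device that can never match */
        return false;
    }
    return true;
}
```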
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • blocklayoutdriver: Fix reference leak of pnfs_device_node · 1530827b
      Benjamin Coddington authored
      The error path for blocklayout's device lookup is missing a reference drop
      for the case where a lookup finds the device, but the device is marked with
      NFS_DEVICEID_UNAVAILABLE.
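The bug class here is a lookup that takes a reference and an error path that forgets to drop it. A minimal sketch, assuming a simple integer refcount; `devnode_sketch`, `node_get()`, `node_put()` and `lookup_usable()` are hypothetical names, not the driver's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cached device node with a reference count. */
struct devnode_sketch {
    int refcount;
    bool unavailable;    /* models NFS_DEVICEID_UNAVAILABLE */
};

static struct devnode_sketch *node_get(struct devnode_sketch *n) { n->refcount++; return n; }
static void node_put(struct devnode_sketch *n) { n->refcount--; }

/* The lookup takes a reference. Every exit that does not hand the node to
 * the caller must drop it, including "found but marked unavailable". */
static struct devnode_sketch *lookup_usable(struct devnode_sketch *cached)
{
    struct devnode_sketch *node;

    if (cached == NULL)
        return NULL;
    node = node_get(cached);
    if (node->unavailable) {
        node_put(node);          /* the drop that was missing */
        return NULL;
    }
    return node;                 /* caller now owns one reference */
}
```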
      
      Fixes: b3dce6a2 ("pnfs/blocklayout: handle transient devices")
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • SUNRPC: Fix a suspicious RCU usage warning · 31b62908
      Anna Schumaker authored
      I received the following warning while running cthon against an ONTAP
      server running pNFS:
      
      [   57.202521] =============================
      [   57.202522] WARNING: suspicious RCU usage
      [   57.202523] 6.7.0-rc3-g2cc14f52 #41492 Not tainted
      [   57.202525] -----------------------------
      [   57.202525] net/sunrpc/xprtmultipath.c:349 RCU-list traversed in non-reader section!!
      [   57.202527]
                     other info that might help us debug this:
      
      [   57.202528]
                     rcu_scheduler_active = 2, debug_locks = 1
      [   57.202529] no locks held by test5/3567.
      [   57.202530]
                     stack backtrace:
      [   57.202532] CPU: 0 PID: 3567 Comm: test5 Not tainted 6.7.0-rc3-g2cc14f52 #41492 5b09971b4965c0aceba19f3eea324a4a806e227e
      [   57.202534] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
      [   57.202536] Call Trace:
      [   57.202537]  <TASK>
      [   57.202540]  dump_stack_lvl+0x77/0xb0
      [   57.202551]  lockdep_rcu_suspicious+0x154/0x1a0
      [   57.202556]  rpc_xprt_switch_has_addr+0x17c/0x190 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
      [   57.202596]  rpc_clnt_setup_test_and_add_xprt+0x50/0x180 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
      [   57.202621]  ? rpc_clnt_add_xprt+0x254/0x300 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
      [   57.202646]  rpc_clnt_add_xprt+0x27a/0x300 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
      [   57.202671]  ? __pfx_rpc_clnt_setup_test_and_add_xprt+0x10/0x10 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
      [   57.202696]  nfs4_pnfs_ds_connect+0x345/0x760 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
      [   57.202728]  ? __pfx_nfs4_test_session_trunk+0x10/0x10 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
      [   57.202754]  nfs4_fl_prepare_ds+0x75/0xc0 [nfs_layout_nfsv41_files e3a4187f18ae8a27b630f9feae6831b584a9360a]
      [   57.202760]  filelayout_write_pagelist+0x4a/0x200 [nfs_layout_nfsv41_files e3a4187f18ae8a27b630f9feae6831b584a9360a]
      [   57.202765]  pnfs_generic_pg_writepages+0xbe/0x230 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
      [   57.202788]  __nfs_pageio_add_request+0x3fd/0x520 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202813]  nfs_pageio_add_request+0x18b/0x390 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202831]  nfs_do_writepage+0x116/0x1e0 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202849]  nfs_writepages_callback+0x13/0x30 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202866]  write_cache_pages+0x265/0x450
      [   57.202870]  ? __pfx_nfs_writepages_callback+0x10/0x10 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202891]  nfs_writepages+0x141/0x230 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202913]  do_writepages+0xd2/0x230
      [   57.202917]  ? filemap_fdatawrite_wbc+0x5c/0x80
      [   57.202921]  filemap_fdatawrite_wbc+0x67/0x80
      [   57.202924]  filemap_write_and_wait_range+0xd9/0x170
      [   57.202930]  nfs_wb_all+0x49/0x180 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
      [   57.202947]  nfs4_file_flush+0x72/0xb0 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
      [   57.202969]  __se_sys_close+0x46/0xd0
      [   57.202972]  do_syscall_64+0x68/0x100
      [   57.202975]  ? do_syscall_64+0x77/0x100
      [   57.202976]  ? do_syscall_64+0x77/0x100
      [   57.202979]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [   57.202982] RIP: 0033:0x7fe2b12e4a94
      [   57.202985] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 18 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 c3
      [   57.202987] RSP: 002b:00007ffe857ddb38 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
      [   57.202989] RAX: ffffffffffffffda RBX: 00007ffe857dfd68 RCX: 00007fe2b12e4a94
      [   57.202991] RDX: 0000000000002000 RSI: 00007ffe857ddc40 RDI: 0000000000000003
      [   57.202992] RBP: 00007ffe857dfc50 R08: 7fffffffffffffff R09: 0000000065650f49
      [   57.202993] R10: 00007fe2b11f8300 R11: 0000000000000202 R12: 0000000000000000
      [   57.202994] R13: 00007ffe857dfd80 R14: 00007fe2b1445000 R15: 0000000000000000
      [   57.202999]  </TASK>
      
      The problem seems to be that two out of three callers of
      rpc_xprt_switch_has_addr() don't hold rcu_read_lock() around its
      list_for_each_entry_rcu() traversal. I fix this by having
      rpc_xprt_switch_has_addr() unconditionally take rcu_read_lock(),
      which is okay to do recursively in the case that the lock has already
      been taken by a caller.
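A userspace model of why taking the read lock inside the callee is safe even when a caller already holds it: RCU read-side sections nest, which this sketch represents as a per-thread depth counter. Everything here (`model_rcu_read_lock()`, `has_addr_model()`, the integer "address list") is hypothetical; it models the nesting rule only and provides none of RCU's grace-period semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: a per-thread nesting counter for the read section. */
static _Thread_local int rcu_depth_model;

static void model_rcu_read_lock(void)      { rcu_depth_model++; }
static void model_rcu_read_unlock(void)    { assert(rcu_depth_model > 0); rcu_depth_model--; }
static bool model_rcu_read_lock_held(void) { return rcu_depth_model > 0; }

/* The callee takes the read lock itself, so it is correct whether or not
 * the caller already holds it; nesting just bumps the counter. */
static bool has_addr_model(const int *addrs, int n, int want)
{
    bool found = false;

    model_rcu_read_lock();
    assert(model_rcu_read_lock_held());   /* stands in for the lockdep check */
    for (int i = 0; i < n; i++) {
        if (addrs[i] == want) {
            found = true;
            break;
        }
    }
    model_rcu_read_unlock();
    return found;
}
```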
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch · a902f3de
      Anna Schumaker authored
      This function takes the necessary rcu read lock to dereference the
      client's rpc_xprt_switch and bump the reference count so it doesn't
      disappear underneath us before returning. This does mean that callers
      are responsible for calling xprt_switch_put() on the returned object
      when they are done with it.
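A sketch of the "return with a reference held, caller puts it" convention, using C11 atomics. The types and `clnt_get_switch()`/`xps_put()` are hypothetical stand-ins for rpc_clnt, rpc_xprt_switch and xprt_switch_put(). The acquire load only stands in for rcu_dereference(); a real concurrent version also needs the RCU read lock so the object cannot be freed between the load and the increment, which this single-threaded sketch does not provide.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rpc_xprt_switch and rpc_clnt. */
struct xps_sketch {
    atomic_int refcount;
};

struct clnt_sketch {
    struct xps_sketch *_Atomic xpi_switch;   /* may be replaced concurrently */
};

/* Drop one reference; free the object when the last one goes away. */
static void xps_put(struct xps_sketch *xps)
{
    if (atomic_fetch_sub(&xps->refcount, 1) == 1)
        free(xps);
}

/* Read the pointer and take a reference before returning it.
 * The caller must call xps_put() when done with the result. */
static struct xps_sketch *clnt_get_switch(struct clnt_sketch *clnt)
{
    struct xps_sketch *xps = atomic_load_explicit(&clnt->xpi_switch,
                                                  memory_order_acquire);
    if (xps)
        atomic_fetch_add(&xps->refcount, 1);
    return xps;
}
```

Putting the get inside the helper means no caller can forget it; the cost is that every caller must now pair the call with a put.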
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • SUNRPC: Clean up unused variable in rpc_xprt_probe_trunked() · ec677b58
      Anna Schumaker authored
      We don't use the rpc_xprt_switch anywhere in this function, so let's not
      take an extra reference to it unnecessarily.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 31 Dec, 2023 3 commits
  3. 30 Dec, 2023 5 commits
    • Merge tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 453f5db0
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix readers that are blocked on the ring buffer when buffer_percent
         is 100%. They are supposed to wake up when the buffer is full, but
         because the sub-buffer that the writer is on is never considered
         "dirty" in the calculation, dirty pages will never equal nr_pages.
         Add +1 to the dirty count in order to account for the sub-buffer that
         the writer is on.
      
       - When a reader is blocked on the "snapshot_raw" file, it is to be
         woken up when a snapshot is done and be able to read the snapshot
         buffer. But because the snapshot swaps the buffers (the main one with
         the snapshot one), and the snapshot reader is waiting on the old
         snapshot buffer, it was not woken up (because it is now on the main
         buffer after the swap). Worse yet, when it reads the buffer after a
         snapshot, it's not reading the snapshot buffer, it's reading the live
         active main buffer.
      
         Fix this by forcing a wakeup of all readers on the snapshot buffer
         when a new snapshot happens, and then update the buffer that the
         reader is reading to be back on the snapshot buffer.
      
       - Fix the modification of the direct_function hash. There was a race
         when new functions were added to the direct_function hash: while the
         code moved function entries from the old hash to the new one, a
         direct function trace could be hit and not see its entry.
      
         This is fixed by allocating the new hash, copying all the old entries
         onto it as well as the new entries, and then using
         rcu_assign_pointer() to update the direct_function hash pointer.
      
         This also fixes a memory leak in that code.
      
       - Fix eventfs ownership
      
      * tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ftrace: Fix modification of direct_function hash while in use
        tracing: Fix blocked reader of snapshot buffer
        ring-buffer: Fix wake ups when buffer_percent is set to 100
        eventfs: Fix file and directory uid and gid ownership
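The buffer_percent fix is plain arithmetic. Because the writer's sub-buffer is never counted as dirty, `dirty` is at most `nr_pages - 1`, so with `buffer_percent` = 100 and 8 sub-buffers the test `dirty * 100 >= 100 * 8` compares 700 against 800 and never fires. Adding one makes it 800 >= 800. The function below is a hypothetical sketch of that test, not the kernel's exact expression; the `count_writer_page` flag exists only to show the before/after behaviour.

```c
#include <stdbool.h>

/* Sketch of the wake-up test for a reader waiting on buffer_percent.
 * 'dirty' counts filled sub-buffers; the writer's sub-buffer is never
 * included, so dirty <= nr_pages - 1. */
static bool full_hit_sketch(unsigned long dirty, unsigned long nr_pages,
                            unsigned int percent, bool count_writer_page)
{
    if (nr_pages == 0 || percent == 0)
        return true;                    /* nothing to wait for */
    if (count_writer_page)
        dirty += 1;                     /* the fix: include the writer's sub-buffer */
    return dirty * 100 >= (unsigned long)percent * nr_pages;
}
```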
    • locking/osq_lock: Clarify osq_wait_next() · b106bcf0
      David Laight authored
      Directly return NULL or 'next' instead of breaking out of the loop.
      Signed-off-by: David Laight <david.laight@aculab.com>
      [ Split original patch into two independent parts  - Linus ]
      Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • locking/osq_lock: Clarify osq_wait_next() calling convention · 563adbfc
      David Laight authored
      osq_wait_next() is passed 'prev' from osq_lock() and NULL from
      osq_unlock() but only needs the 'cpu' value to write to lock->tail.
      
      Just pass prev->cpu or OSQ_UNLOCKED_VAL instead.
      
      Should have no effect on the generated code since gcc manages to assume
      that 'prev != NULL' due to an earlier dereference.
      Signed-off-by: David Laight <david.laight@aculab.com>
      [ Changed 'old' to 'old_cpu' by request from Waiman Long  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c · 7c223098
      David Laight authored
      struct optimistic_spin_node is private to the implementation.
      Move it into the C file to ensure nothing is accessing it.
      Signed-off-by: David Laight <david.laight@aculab.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ftrace: Fix modification of direct_function hash while in use · d05cb470
      Steven Rostedt (Google) authored
      Masami Hiramatsu reported a memory leak in register_ftrace_direct() where,
      if the number of new entries added is large enough to cause two
      allocations in the loop:
      
              for (i = 0; i < size; i++) {
                      hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
                              new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
                              if (!new)
                                      goto out_remove;
                              entry->direct = addr;
                      }
              }
      
      Where ftrace_add_rec_direct() has:
      
              if (ftrace_hash_empty(direct_functions) ||
                  direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
                      struct ftrace_hash *new_hash;
                      int size = ftrace_hash_empty(direct_functions) ? 0 :
                              direct_functions->count + 1;
      
                      if (size < 32)
                              size = 32;
      
                      new_hash = dup_hash(direct_functions, size);
                      if (!new_hash)
                              return NULL;
      
                      *free_hash = direct_functions;
                      direct_functions = new_hash;
              }
      
      The "*free_hash = direct_functions;" can happen twice, losing the previous
      allocation of direct_functions.
      
      But this also exposed a more serious bug.
      
      The modification of direct_functions above is not safe. As
      direct_functions can be referenced at any time to find what direct caller
      it should call, the time between:
      
                      new_hash = dup_hash(direct_functions, size);
       and
                      direct_functions = new_hash;
      
      can have a race with another CPU (or even this one if it gets interrupted),
      and the entries being moved to the new hash are not referenced.
      
      That's because "dup_hash()" is misnamed: it is really a "move_hash()".
      It moves the entries from the old hash to the new one.
      
      Now even if that were changed, this code is not proper, as
      direct_functions should not be updated until the end. That is the best
      way to handle function reference changes, and is the way other parts
      of ftrace handle this.
      
      The following is done:
      
       1. Change add_hash_entry() to return the entry it created and inserted
          into the hash, and not just return success or not.
      
       2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
          the former.
      
       3. Allocate a "new_hash" at the start that is made for holding both the
          new hash entries as well as the existing entries in direct_functions.
      
       4. Copy (not move) the direct_function entries over to the new_hash.
      
       5. Copy the entries of the added hash to the new_hash.
      
       6. If everything succeeds, then use rcu_assign_pointer() to update the
          direct_functions with the new_hash.
      
      This simplifies the code and fixes both the memory leak as well as the
      race condition mentioned above.
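The six steps above follow a general copy-then-publish pattern: build a complete new table without touching the live one, then swap the pointer in a single store. The sketch below uses a flat array instead of a real hash, and a C11 release store stands in for rcu_assign_pointer(); `table_sketch`, `live_table` and `add_entries()` are hypothetical names. It assumes writers are serialized by a lock (not shown), and the caller may free the returned old table only once no reader can still hold it (after an RCU grace period in the kernel).

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified "hash": a flat array of addresses. */
struct table_sketch {
    size_t count;
    unsigned long ips[];
};

/* Readers may load this pointer at any time. */
static struct table_sketch *_Atomic live_table = NULL;

/* Returns 0 and stores the previous table in *old_out on success;
 * returns -1 on allocation failure with live_table left unchanged. */
static int add_entries(const unsigned long *new_ips, size_t n,
                       struct table_sketch **old_out)
{
    struct table_sketch *old = atomic_load_explicit(&live_table, memory_order_acquire);
    size_t old_count = old ? old->count : 0;
    struct table_sketch *new_table;

    /* Allocate once, sized for the existing and the new entries. */
    new_table = malloc(sizeof(*new_table) + (old_count + n) * sizeof(unsigned long));
    if (!new_table)
        return -1;                       /* nothing published, nothing leaked */

    /* Copy (not move) the existing entries, then append the new ones. */
    if (old_count)
        memcpy(new_table->ips, old->ips, old_count * sizeof(unsigned long));
    if (n)
        memcpy(new_table->ips + old_count, new_ips, n * sizeof(unsigned long));
    new_table->count = old_count + n;

    /* Publish only once the new table is complete. */
    atomic_store_explicit(&live_table, new_table, memory_order_release);
    *old_out = old;
    return 0;
}
```

Because the old table is only read, a concurrent reader sees either the complete old table or the complete new one, and there is exactly one table to free per update, which closes both the race and the leak.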
      
      Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
      Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Fixes: 763e34e7 ("ftrace: Add register_ftrace_direct()")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  4. 29 Dec, 2023 10 commits
  5. 28 Dec, 2023 5 commits
  6. 27 Dec, 2023 4 commits
  7. 26 Dec, 2023 2 commits
    • keys, dns: Fix missing size check of V1 server-list header · 1997b3cb
      Edward Adam Davis authored
      The dns_resolver_preparse() function has a check on the size of the
      payload for the basic header of the binary-style payload, but is missing
      a check for the size of the V1 server-list payload header after
      determining that's what we've been given.
      
      Fix this by getting rid of the pointer to the basic header and just
      assuming that we have a V1 server-list payload and moving the V1 server
      list pointer inside the if-statement.  Dealing with other types and
      versions can be left for when such have been defined.
      
      This can be tested by doing the following with KASAN enabled:
      
          echo -n -e '\x0\x0\x1\x2' | keyctl padd dns_resolver foo @p
      
      and produces an oops like the following:
      
          BUG: KASAN: slab-out-of-bounds in dns_resolver_preparse+0xc9f/0xd60 net/dns_resolver/dns_key.c:127
          Read of size 1 at addr ffff888028894084 by task syz-executor265/5069
          ...
          Call Trace:
            dns_resolver_preparse+0xc9f/0xd60 net/dns_resolver/dns_key.c:127
            __key_create_or_update+0x453/0xdf0 security/keys/key.c:842
            key_create_or_update+0x42/0x50 security/keys/key.c:1007
            __do_sys_add_key+0x29c/0x450 security/keys/keyctl.c:134
            do_syscall_x64 arch/x86/entry/common.c:52 [inline]
            do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
            entry_SYSCALL_64_after_hwframe+0x62/0x6a
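A sketch of the missing check: verify that the payload is at least as long as the full V1 server-list header before reading any V1 field. The struct layout below (a 3-byte basic header followed by source, status and nr_servers bytes) is an assumption for illustration, and `parse_v1()` is a hypothetical helper, not dns_resolver_preparse(). The header is copied out with memcpy() to avoid alignment and aliasing issues.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only. */
struct payload_header_sketch {
    uint8_t zero;        /* 0 marks a binary payload */
    uint8_t content;     /* payload type */
    uint8_t version;     /* payload version */
};

struct server_list_v1_sketch {
    struct payload_header_sketch hdr;
    uint8_t source;
    uint8_t status;
    uint8_t nr_servers;
};

/* Returns nr_servers, or -1 if the payload is malformed or too short. */
static int parse_v1(const uint8_t *data, size_t datalen)
{
    struct server_list_v1_sketch v1;

    /* The check that was missing: the size of the whole V1 header,
     * not just the basic header, before touching any V1 field. */
    if (datalen < sizeof(v1))
        return -1;
    memcpy(&v1, data, sizeof(v1));
    if (v1.hdr.zero != 0 || v1.hdr.version != 1)
        return -1;
    return v1.nr_servers;
}
```

Fed the 4-byte reproducer payload above, the sketch rejects it at the length check instead of reading past the end.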
      
      This patch was originally by Edward Adam Davis, but was modified by
      Linus.
      
      Fixes: b946001d3bb1 ("keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry")
      Reported-and-tested-by: syzbot+94bbb75204a05da3d89f@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/0000000000009b39bc060c73e209@google.com/
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Edward Adam Davis <eadavis@qq.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: David Howells <dhowells@redhat.com>
      Cc: Edward Adam Davis <eadavis@qq.com>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Jeffrey E Altman <jaltman@auristor.com>
      Cc: Wang Lei <wang840925@gmail.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Steve French <sfrench@us.ibm.com>
      Cc: Marc Dionne <marc.dionne@auristor.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • block: renumber QUEUE_FLAG_HW_WC · 02d374f3
      Christoph Hellwig authored
      For the QUEUE_FLAG_HW_WC to actually work, it needs to have a separate
      number from QUEUE_FLAG_FUA, doh.
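Queue flags are bit *numbers*, so two flags that share a number are the same bit: setting one makes the other test true. A minimal sketch with hypothetical flag values (the real QUEUE_FLAG_* numbers are not reproduced here) and simple non-atomic helpers standing in for the kernel's set_bit()/test_bit().

```c
#include <stdbool.h>

/* Hypothetical flag numbers: each is a bit index into a flags word. */
enum {
    FLAG_FUA_SKETCH  = 5,
    FLAG_HW_WC_BUGGY = 5,   /* same number: aliases FLAG_FUA_SKETCH */
    FLAG_HW_WC_FIXED = 6,   /* distinct number: an independent bit */
};

static void set_flag(unsigned long *flags, int nr) { *flags |= 1UL << nr; }
static bool test_flag(unsigned long flags, int nr) { return flags & (1UL << nr); }
```

With the duplicated number, a device that only advertises FUA would also appear to have a write cache, and vice versa.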
      
      Fixes: 43c9835b ("block: don't allow enabling a cache on devices that don't support it")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20231226081524.180289-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>