1. 07 Nov, 2017 3 commits
    • fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t · a15dfcd5
      Elena Reshetova authored
      
      atomic_t variables are currently used to implement reference
      counters with the following properties:
       - counter is initialized to 1 using atomic_set()
       - a resource is freed upon counter reaching zero
       - once counter reaches zero, its further
         increments aren't allowed
       - counter schema uses basic atomic operations
         (set, inc, inc_not_zero, dec_and_test, etc.)
      
      Such atomic variables should be converted to a newly provided
      refcount_t type and API that prevents accidental counter overflows
      and underflows. This is important since overflows and underflows
      can lead to use-after-free situation and be exploitable.
      
      The variable nfs4_stid.sc_count is used as a pure reference counter.
      Convert it to refcount_t and fix up the operations.
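      The counter properties listed above can be sketched in plain userspace C. This is an illustrative toy, not the kernel's refcount_t (which additionally saturates and warns on overflow/underflow rather than aborting):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reference-counter schema described above.
 * The kernel's refcount_t API provides the same operations
 * (refcount_set, refcount_inc_not_zero, refcount_dec_and_test)
 * with hardening against overflow/underflow. */
struct toy_ref { int count; };

/* counter is initialized to 1 */
static void toy_ref_init(struct toy_ref *r) { r->count = 1; }

/* once the counter reaches zero, further increments are refused */
static bool toy_ref_inc_not_zero(struct toy_ref *r)
{
    if (r->count == 0)
        return false;   /* object already dying: no resurrection */
    r->count++;
    return true;
}

/* returns true when the caller dropped the last reference
 * and must free the resource */
static bool toy_ref_dec_and_test(struct toy_ref *r)
{
    return --r->count == 0;
}
```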
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: David Windsor <dwindsor@gmail.com>
      Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      a15dfcd5
    • nfsd4: catch some false session retries · 53da6a53
      J. Bruce Fields authored
      
      The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
      the client is making a call that matches a previous (slot, seqid) pair
      but that *isn't* actually a replay, because some detail of the call
      doesn't actually match the previous one.
      
      Catching every such case is difficult, but we may as well catch a few
      easy ones.  This also handles the case described in the previous patch,
      in a different way.
      
      The spec does however require us to catch the case where the difference
      is in the rpc credentials.  This prevents somebody from snooping another
      user's replies by fabricating retries.
      
      (But the practical value of the attack is limited by the fact that the
      replies with the most sensitive data are READ replies, which are not
      normally cached.)
      Tested-by: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      53da6a53
    • nfsd4: fix cached replies to solo SEQUENCE compounds · 085def3a
      J. Bruce Fields authored
      
      Currently our handling of 4.1+ requests without "cachethis" set is
      confusing and not quite correct.
      
      Suppose a client sends a compound consisting of only a single SEQUENCE
      op, and it matches the seqid in a session slot (so it's a retry), but
      the previous request with that seqid did not have "cachethis" set.
      
      The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
      but the protocol only allows that to be returned on the op following the
      SEQUENCE, and there is no such op in this case.
      
      The protocol permits us to cache replies even if the client didn't ask
      us to.  And it's easy to do so in the case of solo SEQUENCE compounds.
      
      So, when we get a solo SEQUENCE, we can either return the previously
      cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
      way from the original call.
      
      Currently, we're returning a corrupt reply in the case a solo SEQUENCE
      matches a previous compound with more ops.  This actually matters
      because the Linux client recently started doing this as a way to recover
      from lost replies to idempotent operations in the case the process doing
      the original reply was killed: in that case it's difficult to keep the
      original arguments around to do a real retry, and the client no longer
      cares what the result is anyway, but it would like to make sure that the
      slot's sequence id has been incremented, and the solo SEQUENCE assures
      that: if the server never got the original reply, it will increment the
      sequence id.  If it did get the original reply, it won't increment, and
      nothing else about the reply really matters much.  But we can at
      least attempt to return valid xdr!
      Tested-by: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      085def3a
  2. 17 Feb, 2017 1 commit
  3. 31 Jan, 2017 1 commit
  4. 26 Sep, 2016 3 commits
    • nfsd: add a LRU list for blocked locks · 7919d0a2
      Jeff Layton authored
      
      It's possible for a client to call in on a lock that is blocked for a
      long time, but discontinue polling for it. A malicious client could
      even set a lock on a file, and then spam the server with failing lock
      requests from different lockowners that pile up in a DoS attack.
      
      Add the blocked lock structures to a per-net namespace LRU when hashing
      them, and timestamp them. If the lock request is not revisited after a
      lease period, we'll drop it under the assumption that the client is no
      longer interested.
      
      This also gives us a mechanism to clean up these objects at server
      shutdown time as well.
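      The expiry sweep can be sketched like this; field and function names are illustrative, not the kernel's, and a flat array stands in for the per-net LRU list:

```c
#include <stddef.h>

/* Hypothetical sketch of the timestamped-LRU idea: blocked locks are
 * stamped when hashed, and a periodic sweep drops entries that have not
 * been revisited within one lease period. */
#define LEASE_PERIOD 90   /* seconds; a typical nfsd lease time */

struct blocked_lock {
    long touched_at;   /* last time the client polled for this lock */
    int  active;       /* still hashed/queued? */
};

/* Returns the number of entries expired; the client is assumed to have
 * lost interest in any block it hasn't revisited for a lease period. */
static int expire_blocked_locks(struct blocked_lock *locks, size_t n,
                                long now)
{
    int expired = 0;
    for (size_t i = 0; i < n; i++) {
        if (locks[i].active && now - locks[i].touched_at > LEASE_PERIOD) {
            locks[i].active = 0;   /* client stopped polling: drop it */
            expired++;
        }
    }
    return expired;
}
```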
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      7919d0a2
    • nfsd: have nfsd4_lock use blocking locks for v4.1+ locks · 76d348fa
      Jeff Layton authored
      
      Create a new per-lockowner+per-inode structure that contains a
      file_lock. Have nfsd4_lock add this structure to the lockowner's list
      prior to setting the lock. Then call the vfs and request a blocking lock
      (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
      back, then we dequeue the block structure and free it. When the next
      lock request comes in, we'll look for an existing block for the same
      filehandle and dequeue and reuse it if there is one.
      
      When the lock comes free (via an lm_notify call), we dequeue it
      from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
      inform the client that it should retry the lock request.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      76d348fa
    • nfsd: plumb in a CB_NOTIFY_LOCK operation · a188620e
      Jeff Layton authored
      
      Add the encoding/decoding for CB_NOTIFY_LOCK operations.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      a188620e
  5. 16 Sep, 2016 1 commit
  6. 13 Jul, 2016 1 commit
    • nfsd: implement machine credential support for some operations · ed941643
      Andrew Elble authored
      
      This addresses the conundrum referenced in RFC5661 18.35.3,
      and will allow clients to return state to the server using the
      machine credentials.
      
      The biggest part of the problem is that we need to allow the client
      to send a compound op with integrity/privacy on mounts that don't
      have it enabled.
      
      Add server support for properly decoding and using spo_must_enforce
      and spo_must_allow bits. Add support for machine credentials to be
      used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
      and TEST/FREE STATEID.
      Implement a check so as to not throw WRONGSEC errors when these
      operations are used if integrity/privacy isn't turned on.
      
      Without this, Linux clients with credentials that expired while holding
      delegations were getting stuck in an endless loop.
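      The op whitelist can be sketched as a simple predicate. The enum values below are illustrative stand-ins, not the on-the-wire NFSv4.1 op numbers:

```c
#include <stdbool.h>

/* Illustrative sketch of the spo_must_allow idea: state-returning ops
 * may be performed with the machine credential, so they must not be
 * rejected with WRONGSEC just because integrity/privacy is off. */
enum nfs4_op {
    OP_CLOSE, OP_OPEN_DOWNGRADE, OP_LOCKU, OP_DELEGRETURN,
    OP_TEST_STATEID, OP_FREE_STATEID, OP_READ, OP_WRITE
};

static bool machine_cred_allowed(enum nfs4_op op)
{
    switch (op) {
    case OP_CLOSE:
    case OP_OPEN_DOWNGRADE:
    case OP_LOCKU:
    case OP_DELEGRETURN:
    case OP_TEST_STATEID:
    case OP_FREE_STATEID:
        return true;    /* client may return state with machine creds */
    default:
        return false;   /* everything else uses the normal security policy */
    }
}
```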
      Signed-off-by: Andrew Elble <aweits@rit.edu>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      ed941643
  7. 16 Jun, 2016 1 commit
    • nfsd: Always lock state exclusively. · feb9dad5
      Oleg Drokin authored
      
      It used to be the case that state had an rwlock that was locked for write
      by downgrades, but for read by upgrades (opens).  The problem is that if
      there are two competing opens for the same state, they step on each
      other's toes, potentially leaking file descriptors from the state
      structure, since the access mode is a bitmap that is only set once.
      Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      feb9dad5
  8. 13 May, 2016 1 commit
  9. 08 Dec, 2015 1 commit
  10. 23 Nov, 2015 1 commit
  11. 23 Oct, 2015 2 commits
    • nfsd: ensure that seqid morphing operations are atomic wrt to copies · 9767feb2
      Jeff Layton authored
      
      Bruce points out that the increment of the seqid in stateids is not
      serialized in any way, so it's possible for racing calls to bump it
      twice and end up sending the same stateid. While we don't have any
      reports of this problem it _is_ theoretically possible, and could lead
      to spurious state recovery by the client.
      
      In the current code, update_stateid is always followed by a memcpy of
      that stateid, so we can combine the two operations. For better
      atomicity, we add a spinlock to the nfs4_stid and hold that when bumping
      the seqid and copying the stateid.
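      A userspace sketch of the combined bump-and-copy. Names are illustrative, and a pthread mutex stands in for the kernel spinlock; the point is that the increment and the copy handed back to the caller happen under one lock, so racing bumps can never hand out the same stateid:

```c
#include <pthread.h>
#include <string.h>

/* Toy stand-ins for stateid_t / nfs4_stid */
struct toy_stateid { unsigned seqid; unsigned other; };

struct toy_stid {
    pthread_mutex_t sc_lock;        /* stands in for the new spinlock */
    struct toy_stateid sc_stateid;
};

/* Bump the seqid and copy the resulting stateid out atomically, so two
 * racing callers always observe two distinct seqids. */
static void bump_and_copy(struct toy_stid *s, struct toy_stateid *out)
{
    pthread_mutex_lock(&s->sc_lock);
    s->sc_stateid.seqid++;                      /* morph the stateid... */
    memcpy(out, &s->sc_stateid, sizeof(*out));  /* ...and copy, together */
    pthread_mutex_unlock(&s->sc_lock);
}
```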
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      9767feb2
    • nfsd: serialize layout stateid morphing operations · cc8a5532
      Jeff Layton authored
      
      In order to allow the client to make a sane determination of what
      happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we
      must ensure that the seqids returned accurately represent the order of
      operations. The simplest way to do that is to ensure that operations on
      a single stateid are serialized.
      
      This patch adds a mutex to the layout stateid, and locks it when
      checking the layout stateid's seqid. The mutex is held over the entire
      operation and released after the seqid is bumped.
      
      Note that in the case of CB_LAYOUTRECALL we must move the increment of
      the seqid, and the setting of the new stateid, into a new cb "prepare"
      operation.  The lease infrastructure will call the lm_break callback
      with a spinlock held, so we can't take the mutex in that codepath.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      cc8a5532
  12. 12 Oct, 2015 1 commit
    • nfsd: serialize state seqid morphing operations · 35a92fe8
      Jeff Layton authored
      
      Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
      running in parallel. The server would receive the OPEN_DOWNGRADE first
      and check its seqid, but then an OPEN would race in and bump it. The
      OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
      was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
      it should have been rejected since the seqid changed.
      
      The only recourse we have here I think is to serialize operations that
      bump the seqid in a stateid, particularly when we're given a seqid in
      the call. To address this, we add a new rw_semaphore to the
      nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
      after looking up the stateid to ensure that nothing else is going to
      bump it while we're operating on it.
      
      In the case of OPEN, we do a down_read, as the call doesn't contain a
      seqid. Those can run in parallel -- we just need to serialize them when
      there is a concurrent OPEN_DOWNGRADE or CLOSE.
      
      LOCK and LOCKU however always take the write lock as there is no
      opportunity for parallelizing those.
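      The locking policy above can be summarized as a simple mapping; the names below are illustrative, and only the read/write choice is modeled, not the rw_semaphore itself:

```c
/* Sketch of the seqid-serialization policy: ops that carry a seqid in
 * the call (or, like LOCK/LOCKU, can't be parallelized) take the
 * semaphore for write; OPEN, whose call carries no seqid, takes it for
 * read so concurrent OPENs can proceed in parallel. */
enum stateid_op { OPEN_OP, OPEN_DOWNGRADE_OP, CLOSE_OP, LOCK_OP, LOCKU_OP };
enum sem_mode { MODE_READ, MODE_WRITE };

static enum sem_mode seqid_sem_mode(enum stateid_op op)
{
    return op == OPEN_OP ? MODE_READ : MODE_WRITE;
}
```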
      Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      35a92fe8
  13. 13 Aug, 2015 1 commit
  14. 22 Jun, 2015 1 commit
    • nfsd: take struct file setup fully into nfs4_preprocess_stateid_op · af90f707
      Christoph Hellwig authored
      
      This patch changes nfs4_preprocess_stateid_op so it always returns
      a valid struct file if it has been asked for that.  For that we
      now allocate a temporary struct file for special stateids, and check
      permissions if we got the file structure from the stateid.  This
      ensures that all callers will get their handling of special stateids
      right, and avoids code duplication.
      
      There is a little wart in here because the read code needs to know
      if we allocated a file structure so that it can copy around the
      read-ahead parameters.  In the long run we should probably aim to
      cache full file structures used with special stateids instead.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      af90f707
  15. 04 Jun, 2015 1 commit
  16. 04 May, 2015 3 commits
    • nfsd: fix callback restarts · cba5f62b
      Christoph Hellwig authored
      
      Checking the rpc_client pointer is not a reliable way to detect
      backchannel changes: cl_cb_client is changed only after shutting down
      the rpc client, so the condition cl_cb_client == tk_client will always
      be true.
      
      Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
      the buggy cl_callbacks list and fix the lifetime rules due to double
      calls of the ->prepare callback operations method for this retry case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      cba5f62b
    • nfsd: split transport vs operation errors for callbacks · ef2a1b3e
      Christoph Hellwig authored
      
      We must only increment the sequence id if the client has seen and responded
      to a request.  If we failed to deliver it to the client, we must resend
      with the same sequence id.  So, just like the client, track errors at the
      transport level differently from those returned in the XDR.
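      The rule reduces to a tiny sketch (illustrative names, not nfsd's):

```c
#include <stdbool.h>

/* The backchannel sequence id advances only when the client actually
 * saw and answered the call; a transport-level failure must resend
 * with the same seqid. */
static unsigned next_cb_seqid(unsigned seqid, bool client_responded)
{
    return client_responded ? seqid + 1 : seqid;
}
```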
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      ef2a1b3e
    • nfsd: fix pNFS return on close semantics · 8287f009
      Sachin Bhamare authored
      
      For the sake of forgetful clients, the server should return the layouts
      to the file system on 'last close' of a file (assuming that there are no
      delegations outstanding to that particular client) or on delegreturn
      (assuming that there are no opens on a file from that particular
      client).
      
      In theory the information is all there in current data structures, but
      it's not efficiently available; nfs4_file->fi_ref includes references on
      the file across all clients, but we need a per-(client, file) count.
      Walking through lots of stateid's to calculate this on each close or
      delegreturn would be painful.
      
      This patch introduces infrastructure to maintain per-client opens and
      delegation counters on a per-file basis.
      
      [hch: ported to the mainline pNFS support, merged various fixes from Jeff]
      Signed-off-by: Sachin Bhamare <sachin.bhamare@primarydata.com>
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fiel...
      8287f009
  17. 02 Feb, 2015 5 commits
    • nfsd: implement pNFS layout recalls · c5c707f9
      Christoph Hellwig authored
      
      Add support to issue layout recalls to clients.  For now we only support
      full-file recalls to get a simple and stable implementation.  This allows
      us to embed a nfsd4_callback structure in the layout_state and thus avoid
      any memory allocations under spinlocks during a recall.  For normal
      use cases that do not intend to share a single file between multiple
      clients, this implementation is fully sufficient.
      
      To ensure layouts are recalled on local filesystem access, each layout
      state registers a new FL_LAYOUT lease with the kernel file locking code,
      which filesystems supporting recall-requiring pNFS exports need to
      break on conflicting access patterns.
      
      The XDR code is based on the old pNFS server implementation by
      Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
      Marc Eshel, Mike Sager and Ricardo Labiaga.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      c5c707f9
    • nfsd: implement pNFS operations · 9cf514cc
      Christoph Hellwig authored
      Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
      LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
      outstanding layouts and devices.
      
      Layout management is very straightforward, with a nfs4_layout_stateid
      structure that extends nfs4_stid to manage layout stateids as the
      top-level structure.  It is linked into the nfs4_file and nfs4_client
      structures like the other stateids, and contains a linked list of
      layouts that hang off the stateid.  The actual layout operations are
      implemented in layout drivers that are not part of this commit, but
      will be added later.
      
      The worst part of this commit is the management of the pNFS device IDs,
      which suffers from a specification that is not sanely implementable due
      to the fact that the device IDs are global and not bound to an export,
      have a small enough size that we can't store the fsid portion of
      a file handle, and must never be reused.  As we still need to perform
      all export authentication and valida...
      9cf514cc
  18. 07 Jan, 2015 1 commit
  19. 07 Nov, 2014 1 commit
    • nfsd: convert nfs4_file searches to use RCU · 5b095e99
      Jeff Layton authored
      
      The global state_lock protects the file_hashtbl, and that has the
      potential to be a scalability bottleneck.
      
      Address this by making the file_hashtbl use RCU. Add a rcu_head to the
      nfs4_file and use that when freeing ones that have been hashed. In order
      to conserve space, we union the fi_rcu field with the fi_delegations
      list_head which must be clear by the time the last reference to the file
      is dropped.
      
      Convert find_file_locked to use RCU lookup primitives and not to require
      that the state_lock be held, and convert find_file to do a lockless
      lookup. Convert find_or_add_file to attempt a lockless lookup first, and
      then fall back to doing a locked search and insert if that fails to find
      anything.
      
      Also, minimize the number of times we need to calculate the hash value
      by passing it in as an argument to the search and insert functions, and
      optimize the order of arguments in nfsd4_init_file.
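      The lockless-then-locked find-or-insert pattern can be sketched as follows. This is a single-threaded toy: a mutex stands in for state_lock, a flat array (collisions ignored) for file_hashtbl, and real RCU read-side protection is omitted; note the hash is computed exactly once and passed down:

```c
#include <pthread.h>
#include <stddef.h>

#define TOY_HASH_SIZE 8

struct toy_file { int key; int used; };

struct toy_table {
    pthread_mutex_t lock;                   /* stands in for state_lock */
    struct toy_file slots[TOY_HASH_SIZE];   /* stands in for file_hashtbl */
};

static unsigned toy_hash(int key) { return (unsigned)key % TOY_HASH_SIZE; }

/* Lookup with a precomputed hash (the "pass the hash as an argument"
 * optimization from the commit message). */
static struct toy_file *toy_find(struct toy_table *t, unsigned h, int key)
{
    struct toy_file *f = &t->slots[h];
    return (f->used && f->key == key) ? f : NULL;
}

static struct toy_file *toy_find_or_add(struct toy_table *t, int key)
{
    unsigned h = toy_hash(key);               /* hash computed once */
    struct toy_file *f = toy_find(t, h, key); /* lockless fast path */
    if (f)
        return f;
    pthread_mutex_lock(&t->lock);
    f = toy_find(t, h, key);                  /* re-check under the lock:
                                                 a racer may have inserted */
    if (!f) {
        f = &t->slots[h];
        f->key = key;
        f->used = 1;
    }
    pthread_mutex_unlock(&t->lock);
    return f;
}
```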
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      5b095e99
  20. 23 Oct, 2014 1 commit
  21. 07 Oct, 2014 1 commit
  22. 01 Oct, 2014 1 commit
  23. 26 Sep, 2014 4 commits
  24. 17 Sep, 2014 3 commits