1. 06 Jan, 2012 3 commits
    • NFS: Remove pNFS bloat from the generic write path · e2fecb21
      Trond Myklebust authored
      We have no business doing any of this in the standard write release path.
      Get rid of it, and put it in the pNFS layer.
      
      Also, while we're at it, get rid of the completely bogus unlock/relock
      semantics that were present in nfs_writeback_release_full(). It is
      not only unnecessary, but actually dangerous to release the write lock
      just in order to take it again in nfs_page_async_flush(). Better just
      to open code the pgio operations in a pnfs helper.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfs-obj: Must return layout on IO error · fe0fe835
      Boaz Harrosh authored
      As mandated by the standard, in case of an IO error a pNFS
      objects layout driver must return its layout. This is because
      all device errors are reported to the server as part of the
      layout return buffer.
      
      This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
      is done, through a bit flag on the pnfs_layoutdriver_type->flags
      member. The flag is set by the layout driver that wants a
      layout_return performed at pnfs_ld_{write,read}_done in case
      of an error.
      (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
       because this code is never called outside of pnfs.c and pnfs IO
       paths)
      
      Without this patch 3.[0-2] Kernels leak memory and have an annoying
      WARN_ON after every IO error utilizing the pnfs-obj driver.
      
      [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • pnfs-obj: pNFS errors are communicated on iodata->pnfs_error · 5c0b4129
      Boaz Harrosh authored
      Some time along the way pNFS IO errors were switched to
      communicate with a special iodata->pnfs_error member instead
      of the regular RPC members. But objlayout was not switched
      over.
      
      Fix that!
      Without this fix any IO error causes a hang, because IO is not
      switched to the MDS and pages are never cleared or read.
      
      [Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
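The two pnfs-obj commits above describe a per-driver flag that requests a layout return on IO error, and an error field that decides whether IO falls back to the MDS. A minimal userspace sketch of that control flow follows. The flag names, the struct layouts, and `ld_io_done()` are hypothetical stand-ins for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, modeled on the commit text (not kernel values). */
enum {
    LD_LAYOUTRET_ON_SETATTR = 1u << 0,
    LD_LAYOUTRET_ON_ERROR   = 1u << 1,
};

/* Stand-in for the layout driver type; only the flags member matters here. */
struct layoutdriver_type {
    unsigned int flags;
};

/* Stand-in for the per-IO data carrying the pNFS-specific error. */
struct io_result {
    int  pnfs_error;       /* 0 on success, negative errno on failure */
    bool layout_returned;  /* set when a layout return was requested */
    bool resent_via_mds;   /* set when the IO falls back to the MDS */
};

/* Completion hook: on error, optionally return the layout, then fall back. */
static void ld_io_done(const struct layoutdriver_type *ld, struct io_result *io)
{
    assert(ld != NULL && io != NULL);

    if (io->pnfs_error == 0)
        return;                      /* success: nothing to clean up */

    if (ld->flags & LD_LAYOUTRET_ON_ERROR)
        io->layout_returned = true;  /* driver asked for a return on error */

    io->resent_via_mds = true;       /* error path always retries via MDS */
}
```

In this sketch a driver that leaves the flag clear still falls back to the MDS on error, but keeps its layout.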
  2. 05 Jan, 2012 12 commits
    • NFS: Cache state owners after files are closed · 0aaaf5c4
      Chuck Lever authored
      Servers have a finite amount of memory to store NFSv4 open and lock
      owners.  Moreover, servers may have a difficult time determining when
      they can reap their state owner table, thanks to gray areas in the
      NFSv4 protocol specification.  Thus clients should be careful to reuse
      state owners when possible.
      
      Currently Linux is not too careful.  When a user has closed all her
      files on one mount point, the state owner's reference count goes to
      zero, and it is released.  The next OPEN allocates a new one.  A
      workload that serially opens and closes files can run through a large
      number of open owners this way.
      
      When a state owner's reference count goes to zero, slap it onto a free
      list for that nfs_server, with an expiry time.  Garbage collect before
      looking for a state owner.  This makes state owners for active users
      available for re-use.
      
      Now that there can be unused state owners remaining at umount time,
      purge the state owner free list when a server is destroyed.  Also be
      sure not to reclaim unused state owners during state recovery.
      
      This change has benefits for the client as well.  For some workloads,
      this approach drops the number of OPEN_CONFIRM calls from the same as
      the number of OPEN calls, down to just one.  This reduces wire traffic
      and thus open(2) latency.  Before this patch, untarring a kernel
      source tarball shows the OPEN_CONFIRM call counter steadily increasing
      through the test.  With the patch, the OPEN_CONFIRM count remains at 1
      throughout the entire untar.
      
      As long as the expiry time is kept short, I don't think garbage
      collection should be terribly expensive, although it does bounce the
      clp->cl_lock around a bit.
      
      [ At some point we should rationalize the use of the nfs_server
      ->destroy method. ]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      [Trond: Fixed a garbage collection race and a few efficiency issues]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
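The caching scheme in the commit above (park an owner with an expiry time when its reference count drops to zero, garbage collect before each lookup, purge at teardown) can be sketched in userspace C. This is a simplified illustration under stated assumptions: it keeps active and cached owners on one linked list instead of an RB tree plus a separate free list, it does no locking, and `LEASE`, `struct owner`, `struct server`, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LEASE 10            /* hypothetical expiry window, in arbitrary ticks */

struct owner {
    int  uid;               /* identity the owner is looked up by */
    int  refcount;          /* 0 means "cached, waiting to be reused" */
    long expires;           /* only meaningful while refcount == 0 */
    struct owner *next;
};

struct server {
    struct owner *owners;   /* active and cached owners */
    int allocated;          /* how many owners were ever allocated */
};

/* Free cached owners whose expiry time has passed. */
static void owner_gc(struct server *s, long now)
{
    struct owner **pp = &s->owners;

    while (*pp) {
        struct owner *o = *pp;
        if (o->refcount == 0 && o->expires <= now) {
            *pp = o->next;
            free(o);
        } else {
            pp = &o->next;
        }
    }
}

/* Garbage collect first, then reuse a matching owner or allocate a new one. */
static struct owner *owner_get(struct server *s, int uid, long now)
{
    struct owner *o;

    owner_gc(s, now);
    for (o = s->owners; o; o = o->next) {
        if (o->uid == uid) {
            o->refcount++;
            return o;
        }
    }
    o = calloc(1, sizeof(*o));
    if (!o)
        return NULL;
    o->uid = uid;
    o->refcount = 1;
    o->next = s->owners;
    s->owners = o;
    s->allocated++;
    return o;
}

/* Drop a reference; at zero, keep the owner cached until it expires. */
static void owner_put(struct owner *o, long now)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0)
        o->expires = now + LEASE;
}

/* Teardown: release everything, including still-cached owners. */
static void owner_purge(struct server *s)
{
    while (s->owners) {
        struct owner *o = s->owners;
        s->owners = o->next;
        free(o);
    }
}
```

A serial open/close workload that reopens within the expiry window keeps hitting the same owner, which is the effect the commit message measures through the OPEN_CONFIRM counter.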
    • NFS: Clean up nfs4_find_state_owners_locked() · 414adf14
      Chuck Lever authored
      There's no longer a need to check the so_server field in the state
      owner, because nowadays the RB tree we search for state owners
      contains owners for that server only.
      
      Make nfs4_find_state_owners_locked() use the same tree searching logic
      as nfs4_insert_state_owner_locked().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4: include bitmap in nfsv4 get acl data · bf118a34
      Andy Adamson authored
      The NFSv4 bitmap size is unbounded: a server can return an arbitrarily
      sized bitmap in an FATTR4_WORD0_ACL request.  Instead of using
      nfs4_fattr_bitmap_maxsz as a guess at the maximum bitmask returned by a
      server, include the bitmap (xdr length plus bitmasks) and the acl data
      xdr length in the (cached) acl page data.
      
      This is a general solution to commit e5012d1f "NFSv4.1: update
      nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
      when getting ACLs.
      
      Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
      was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
      
      Cc: stable@kernel.org
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: fix a minor do_div portability issue · 3476f114
      Chris Metcalf authored
      This change modifies filelayout_get_dense_offset() to use the functions
      in math64.h and thus avoid a 32-bit platform compile error trying to
      use do_div() on an s64 type.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
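To illustrate the kind of change the commit above describes, here is a userspace sketch of a striped-offset calculation written against helper functions in the style of math64.h rather than do_div(). The two helpers below are local stand-ins that mirror the shape of the kernel's `div_u64()` and `div_u64_rem()`. `dense_offset()` and its formula are an assumed illustration, not the patched function, and the sketch assumes the offset is already known to be non-negative and that `stripe_unit * stripe_count` fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: 64-by-32 division that also reports the remainder. */
static uint64_t div_u64_rem(uint64_t dividend, uint32_t divisor, uint32_t *rem)
{
    assert(divisor != 0);
    *rem = (uint32_t)(dividend % divisor);
    return dividend / divisor;
}

/* Local stand-in: 64-by-32 division, quotient only. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
    uint32_t rem;

    return div_u64_rem(dividend, divisor, &rem);
}

/*
 * Map a file offset to an offset on one data server when stripes are
 * packed densely: one stripe unit per full stripe, plus the position
 * inside the current unit. Hypothetical helper for illustration.
 */
static uint64_t dense_offset(uint64_t offset, uint32_t stripe_unit,
                             uint32_t stripe_count)
{
    uint32_t rem;
    uint64_t stripe_no = div_u64(offset, stripe_unit * stripe_count);

    div_u64_rem(offset, stripe_unit, &rem);
    return stripe_no * stripe_unit + rem;
}
```

Explicitly typed helpers make the operand widths part of the function signature, so a signed or mis-sized operand is caught at the call site instead of inside a macro expansion.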
    • Andy Adamson's avatar
      0b1c8fc4
    • Andy Adamson's avatar
    • NFSv4.1: cleanup init and reset of session slot tables · aacd5537
      Andy Adamson authored
      We are either initializing or resetting a session. Initialize or reset
      the session slot tables accordingly.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4.1: fix backchannel slotid off-by-one bug · 61f2e510
      Andy Adamson authored
      Cc: stable@kernel.org
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: fix regression in handling of context= option in NFSv4 · 8a0d551a
      Jeff Layton authored
      Setting the security context of an NFSv4 mount via the context= mount
      option is currently broken. The NFSv4 codepath allocates a parsed
      options struct, and then parses the mount options to fill it. It
      eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
      That clobbers the lsm_opts struct that was populated earlier. This bug
      also looks like it causes a small memory leak on each v4 mount where
      context= is used.
      
      Fix this by moving the initialization of the lsm_opts into
      nfs_alloc_parsed_mount_data. Also, add a destructor for
      nfs_parsed_mount_data to make it easier to free all of the allocations
      hanging off of it, and to ensure that the security_free_mnt_opts is
      called whenever security_init_mnt_opts is.
      
      I believe this regression was introduced quite some time ago, probably
      by commit c02d7adf.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS - fix recent breakage to NFS error handling. · 2edb6bc3
      NeilBrown authored
      From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001
      From: NeilBrown <neilb@suse.de>
      Date: Wed, 16 Nov 2011 09:39:05 +1100
      Subject: [PATCH] NFS - fix recent breakage to NFS error handling.
      
      commit 02c24a82 made a small and
      presumably unintended change to write error handling in NFS.
      
      Previously an error from filemap_write_and_wait_range would only be of
      interest if nfs_file_fsync did not return an error.  After this commit,
      an error from filemap_write_and_wait_range would mean that (the rest of)
      nfs_file_fsync would not even be called.
      
      This means that:
       1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC.
       2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are
          synchronous.
      
      This patch restores previous behaviour.
      
      Cc: stable@kernel.org
      Cc: Josef Bacik <josef@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
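The difference between the two behaviours described in the commit above comes down to whether the second step still runs after the flush fails, and whose error wins. A minimal sketch, assuming both steps can be modeled as callbacks returning 0 or a negative errno; the function names and the callback type are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*step_fn)(void);

/* Regressed behaviour: a flush error returns early and skips the fsync. */
static int fsync_short_circuit(step_fn flush, step_fn fsync_rest)
{
    int ret;

    assert(flush != NULL && fsync_rest != NULL);
    ret = flush();
    if (ret)
        return ret;
    return fsync_rest();
}

/*
 * Restored behaviour: always run the fsync step, and report the flush
 * error only when the fsync step itself reported none.
 */
static int fsync_restored(step_fn flush, step_fn fsync_rest)
{
    int ret, status;

    assert(flush != NULL && fsync_rest != NULL);
    ret = flush();
    status = fsync_rest();
    return status ? status : ret;
}

/* Stub steps used to exercise the two variants. */
static int step_ok(void)          { return 0; }
static int step_fail_eio(void)    { return -EIO; }
static int step_fail_enospc(void) { return -ENOSPC; }
```

With a flush that fails with EIO and an fsync step that fails with ENOSPC, the first variant reports EIO and the second reports ENOSPC, which matches point 1/ in the commit message.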
    • NFS: Retry mounting NFSROOT · 43717c7d
      Chuck Lever authored
      Lukas Razik <linux@razik.name> reports that on his SPARC system,
      booting with an NFS root file system stopped working after commit
      56463e50 "NFS: Use super.c for NFSROOT mount option parsing."
      
      We found that the network switch to which Lukas' client was attached
      was delaying access to the LAN after the client's NIC driver reported
      that its link was up.  The delay was longer than the timeouts used in
      the NFS client during mounting.
      
      NFSROOT worked for Lukas before commit 56463e50 because in those
      kernels, the client's first operation was an rpcbind request to
      determine which port the NFS server was listening on.  When that
      request failed after a long timeout, the client simply selected the
      default NFS port (2049).  By that time the switch was allowing access
      to the LAN, and the mount succeeded.
      
      Neither of these client behaviors is desirable, so reverting 56463e50
      is really not a choice.  Instead, introduce a mechanism that retries
      the NFSROOT mount request several times.  This is the same tactic that
      normal user space NFS mounts employ to overcome server and network
      delays.
      Signed-off-by: Lukas Razik <linux@razik.name>
      [ cel: match kernel coding style, add proper patch description ]
      [ cel: add exponential back-off ]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Lukas Razik <linux@razik.name>
      Cc: stable@kernel.org # > 2.6.38
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
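The retry mechanism with exponential back-off mentioned in the commit above can be sketched as a bounded loop that doubles its delay up to a cap. The constants, the callback types, and the `fake_net` test double are all assumptions made for illustration; the retry count and delays used by the actual patch are not stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define RETRY_MAX   5   /* hypothetical: total mount attempts */
#define DELAY_START 5   /* hypothetical: first delay, in seconds */
#define DELAY_MAX   30  /* hypothetical: upper bound on the delay */

typedef int  (*mount_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int seconds, void *ctx);

/* Try to mount, sleeping with a doubling (capped) delay between attempts. */
static int mount_with_retry(mount_fn try_mount, sleep_fn wait, void *ctx)
{
    unsigned int delay = DELAY_START;
    int err = -1;

    assert(try_mount != NULL && wait != NULL);
    for (int attempt = 0; attempt < RETRY_MAX; attempt++) {
        err = try_mount(ctx);
        if (err == 0)
            return 0;
        if (attempt + 1 == RETRY_MAX)
            break;              /* no point sleeping after the last try */
        wait(delay, ctx);
        delay *= 2;
        if (delay > DELAY_MAX)
            delay = DELAY_MAX;
    }
    return err;
}

/* Test double: a network that becomes reachable after some seconds. */
struct fake_net {
    unsigned int ready_after;   /* seconds until the switch forwards traffic */
    unsigned int slept;         /* total seconds slept so far */
    int attempts;               /* mount attempts made */
};

static int fake_mount(void *ctx)
{
    struct fake_net *net = ctx;

    net->attempts++;
    return net->slept >= net->ready_after ? 0 : -1;
}

static void fake_sleep(unsigned int seconds, void *ctx)
{
    struct fake_net *net = ctx;

    net->slept += seconds;
}
```

Passing the sleep function as a callback keeps the loop testable without real delays; in a real caller it would be whatever blocking sleep the environment provides.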
    • SUNRPC: Clean up the RPCSEC_GSS service ticket requests · 68c97153
      Trond Myklebust authored
      Instead of hacking specific service names into gss_encode_v1_msg, we should
      just allow the caller to specify the service name explicitly.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: J. Bruce Fields <bfields@redhat.com>
  3. 04 Jan, 2012 13 commits
  4. 03 Jan, 2012 9 commits
  5. 02 Jan, 2012 2 commits
  6. 31 Dec, 2011 1 commit