Commits · 0cb3284b535bd5eacc287632b55150c8e5d9edc7 · nexedi / linux

01 Mar, 2012 2 commits

NFSv4.1: Get rid of NFS4CLNT_LAYOUTRECALL · 0cb3284b

Trond Myklebust authored Mar 01, 2012

The NFS4CLNT_LAYOUTRECALL bit is a long-term impediment to scalability. It
basically stops all other recalls by a given server once any layout recall
is requested.

If the recall is for a different file, then we don't care.
If the recall applies to the same file, then we're in one of two situations:
Either we are in the case of a replay of an existing request, in which case
the session is supposed to deal with matters, or we are dealing with a
completely different request, in which case we should just try to process
it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

0cb3284b

NFSv4.1: Get rid of redundant NFS4CLNT_LAYOUTRECALL tests · a59c30ac

Trond Myklebust authored Mar 01, 2012

The NFS4CLNT_LAYOUTRECALL tests in pnfs_layout_process and
pnfs_update_layout are redundant.

In the case of a bulk layout recall, we're always testing for
the NFS_LAYOUT_BULK_RECALL flag anyway.
In the case of a file or segment recall, the call to
pnfs_set_layout_stateid() updates the layout_header 'barrier'
sequence id, which triggers the test in pnfs_layoutgets_blocked()
and is less race-prone than NFS4CLNT_LAYOUTRECALL anyway.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a59c30ac
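
The barrier test that a59c30ac relies on can be pictured with a toy model. This is a sketch in Python rather than the kernel's C code: the `LayoutHeader` class, its field names, and the helper `seq_at_or_before()` are hypothetical, and the 32-bit wrap-around comparison is an assumption about how sequence ids are usually compared, not a copy of pnfs_layoutgets_blocked().

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


def seq_at_or_before(seqid: int, barrier: int) -> bool:
    """True if seqid is not newer than barrier (32-bit serial arithmetic)."""
    return ((barrier - seqid) & MASK32) < 0x80000000


@dataclass
class LayoutHeader:
    # Hypothetical stand-ins for the layout header state named in the commit.
    barrier: int = 0
    bulk_recall: bool = False


def set_layout_stateid(lo: LayoutHeader, new_barrier: int) -> None:
    """Model of a file or segment recall advancing the barrier."""
    lo.barrier = new_barrier & MASK32


def layoutgets_blocked(lo: LayoutHeader, stateid_seqid: int) -> bool:
    """A LAYOUTGET is blocked during a bulk recall or at/behind the barrier."""
    if lo.bulk_recall:
        return True
    return seq_at_or_before(stateid_seqid, lo.barrier)
```

In this model a recall only has to move the barrier forward; any layout request carrying an older sequence id is refused without a separate per-client flag.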

27 Feb, 2012 4 commits

SUNRPC: move waitq from RPC pipe to RPC inode · 591ad7fe

Stanislav Kinsbursky authored Feb 27, 2012

Currently, the wait queue used for polling of RPC pipe changes from user-space
is part of the RPC pipe. But the pipe data itself can be released on NFS umount
before the dentry-inode pair connected to it (in case this pair is held open by
some process).
This is not a problem for almost all pipe users, because all PipeFS file
operations check the pipe reference prior to using it.
Except eventfd. It registers itself with the "poll" file operation and thus
holds a reference to the pipe wait queue. This leads to oopses on destroying the
eventfd after NFS umount (as rpc.idmapd does), since no pipe data is left by that
point.
The solution is to move the wait queue from the pipe data to the internal RPC
inode data. This looks more logical, because this wait queue is used only by
user-space processes, which already hold an inode reference.

Note: upcalls have to get pipe->dentry prior to dereferencing the wait queue to
make sure that the mount point won't disappear from underneath us.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

591ad7fe

SUNRPC: check RPC inode's pipe reference before dereferencing · 2c9030ee

Stanislav Kinsbursky authored Feb 27, 2012

There are 2 tightly bound objects: pipe data (created for kernel needs, has a
reference to the dentry, which depends on PipeFS mount/umount) and the PipeFS
dentry/inode pair (created on mount for user-space needs). Each of them may or
may not hold a valid reference to the other, independently.
This means that we have to make sure that the pipe->dentry reference is valid on
upcalls, and that the dentry->pipe reference is valid on downcalls. The latter
check is absent - my fault.
IOW, a PipeFS dentry can be opened by some process (rpc.idmapd for example), but
its pipe data can belong to an NFS mount which was already unmounted, and thus
the pipe data was destroyed.
To fix this, the pipe reference has to be set to NULL on rpc_unlink() and checked
in PipeFS file operations instead of the pipe->dentry check.

Note: the PipeFS "poll" file operation will be updated in the next patch, because
its logic is more complicated.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2c9030ee
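
The fix described in 2c9030ee can be illustrated with a small model. This is a sketch in Python, not the kernel code: `PipeData`, `RpcInode`, and `pipe_read()` are hypothetical names, only rpc_unlink() comes from the commit message, and raising `BrokenPipeError` for a missing pipe is an assumption about the error a file operation would report.

```python
import threading


class PipeData:
    """Kernel-side pipe data; it can go away when the NFS mount goes away."""

    def __init__(self):
        self.messages = []


class RpcInode:
    """User-visible inode; it outlives the pipe while a process keeps it open."""

    def __init__(self, pipe):
        self.lock = threading.Lock()
        self.pipe = pipe


def rpc_unlink(inode):
    """Drop the inode's pipe reference so later file operations can detect it."""
    with inode.lock:
        inode.pipe = None


def pipe_read(inode):
    """Hypothetical file operation: check the pipe reference before using it."""
    with inode.lock:
        pipe = inode.pipe
        if pipe is None:
            # Assumed error choice: the pipe data was destroyed on umount.
            raise BrokenPipeError("pipe data is gone")
        return pipe.messages.pop(0) if pipe.messages else b""
```

The check and the use of the pipe happen under the same lock, so rpc_unlink() cannot clear the reference between them.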

NFS: release per-net clients lock before calling PipeFS dentries creation · e9dbca8d

Stanislav Kinsbursky authored Feb 27, 2012

v3:
1) The client lookup is performed from the beginning of the list on each PipeFS
event handling operation.

Lockdep is sad otherwise, because the inode mutex is taken on PipeFS dentry
creation, which can be called on mount notification, where this per-net client
lock is held during the clients list walk.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e9dbca8d

SUNRPC: release per-net clients lock before calling PipeFS dentries creation · da3b4622

Stanislav Kinsbursky authored Feb 27, 2012

v3:
1) The client lookup is performed from the beginning of the list on each PipeFS
event handling operation.

Lockdep is sad otherwise, because the inode mutex is taken on PipeFS dentry
creation, which can be called on mount notification, where this per-net client
lock is held during the clients list walk.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

da3b4622
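
The locking pattern shared by e9dbca8d and da3b4622 can be sketched as follows. This is a Python toy model, not the kernel code: `Client`, `next_client_needing_dentries()`, `create_dentries()`, and `handle_mount_event()` are hypothetical names, and a `threading.Lock` stands in for the per-net clients lock.

```python
import threading


class Client:
    def __init__(self, name):
        self.name = name
        self.has_dentries = False


def next_client_needing_dentries(clients, lock):
    """Walk the list from its beginning under the lock; return one client."""
    with lock:
        for client in clients:
            if not client.has_dentries:
                return client
    return None


def create_dentries(client, lock):
    """Stand-in for PipeFS dentry creation, which takes the inode mutex."""
    # Valid only in this single-threaded demo: the list lock must not be held.
    assert not lock.locked()
    client.has_dentries = True


def handle_mount_event(clients, lock):
    """Restart the lookup after every creation instead of holding the lock."""
    handled = []
    while True:
        client = next_client_needing_dentries(clients, lock)
        if client is None:
            return handled
        create_dentries(client, lock)
        handled.append(client.name)
```

Restarting from the head of the list costs extra walks, but the list lock is never held while the second lock is taken, which is the ordering problem the commit messages describe.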

26 Feb, 2012 1 commit

NFSv4.1: Don't call nfs4_deviceid_purge_client() unless we're NFSv4.1 · 7df529af

Trond Myklebust authored Feb 26, 2012

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7df529af

19 Feb, 2012 2 commits

NFS: Ensure struct nfs_client holds a reference to the net namespace · abd96698

Trond Myklebust authored Feb 19, 2012

Otherwise we have no guarantee that the net namespace won't just
disappear from underneath us once the task that created it
is destroyed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>

abd96698

NFS: Ensure that the nfs_client 'net' field is always set · 9937347a

Trond Myklebust authored Feb 19, 2012

Currently, the nfs_parsed_mount_data->net field is initialised in
the nfs_parse_mount_options() function, which means that it only
gets set if we're using text based mounts. The legacy binary
mount interface is therefore broken.

Fix is to initialise the ->net field in nfs_alloc_parsed_mount_data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>

9937347a

17 Feb, 2012 2 commits

NFS: include filelayout DS rpc stats in mountstats · 0a702195

Weston Andros Adamson authored Feb 17, 2012

Include RPC statistics from all data servers in /proc/self/mountstats for pNFS
filelayout mounts.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

0a702195

NFSv4.1 set highest_used_slotid to NFS4_NO_SLOT · b6bf6e7d

Andy Adamson authored Feb 17, 2012

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b6bf6e7d

16 Feb, 2012 4 commits

nfs: Clean up debugging in nfs_follow_mountpoint() · d7c32675

Chuck Lever authored Feb 15, 2012

Clean up: Fix a debugging message which had an obsolete function name
in it (nfs_follow_mountpoint).

Introduced by commit 36d43a43 "NFS: Use d_automount() rather than
abusing follow_link()" (January 14, 2011)
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

d7c32675

SUNRPC: Use KERN_DEFAULT for debugging printk's · dbb9c2a2

Chuck Lever authored Feb 15, 2012

Our dprintk() debugging facility doesn't specify any verbosity level
for its printk() calls, but it should.

The default verbosity for printk's is KERN_DEFAULT.  You might argue
that these are debugging printk's and thus the verbosity should be
KERN_DEBUG.  That would mean that to see NFS and SUNRPC debugging
output an admin would also have to boost the syslog verbosity, which
would be insufferably noisy.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

dbb9c2a2

SUNRPC: add sending,pending queue and max slot to xprt stats · 15a45206

Andy Adamson authored Feb 14, 2012

With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queues, which
need to be monitored for debugging.

The max_slot stat shows the maximum number of dynamic RPC slots reached, which is
useful when debugging performance issues.

Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

15a45206
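
Appending the new fields at the end of the xprt stanza (15a45206) keeps positional parsers working. The sketch below is a Python illustration, not the mountstats tool itself: the field names in `LEGACY_TCP_FIELDS` and `NEW_FIELDS` are assumptions about the TCP stanza layout, and any sample line fed to it is made-up data.

```python
# Assumed names for the positional fields of a TCP "xprt:" line.
LEGACY_TCP_FIELDS = (
    "srcport", "bind_count", "connect_count", "connect_time", "idle_time",
    "sends", "recvs", "bad_xids", "req_u", "bklog_u",
)
# Assumed names for the fields this commit appends.
NEW_FIELDS = ("max_slots", "sending_u", "pending_u")


def parse_xprt_line(line):
    """Parse an 'xprt:' line positionally; missing trailing fields are skipped."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "xprt:":
        raise ValueError("not an xprt line")
    values = [int(tok) for tok in tokens[2:]]
    # zip() stops at the shorter sequence, so old and new layouts both parse.
    stats = dict(zip(LEGACY_TCP_FIELDS + NEW_FIELDS, values))
    stats["proto"] = tokens[1]
    return stats
```

A parser that reads only the first ten positions sees the same values before and after the change, which is why the new counters go last.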

SUNRPC: init per-net rpcbind spinlock · 1d96e80f

Stanislav Kinsbursky authored Feb 16, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1d96e80f

15 Feb, 2012 21 commits

nfs41: Verify channel's attributes accordingly to RFC v2 · b4b9a0c1

Vitaliy Gusev authored Feb 15, 2012

ca_maxoperations:

    For the backchannel, the server MUST NOT change the value the
    client offers.  For the fore channel, the server MAY change the
    requested value.

ca_maxrequests:

    For the backchannel, the server MUST NOT change the value the
    client offers.  For the fore channel, the server MAY change the
    requested value.
Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b4b9a0c1
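
The rule quoted in b4b9a0c1 can be expressed as a pair of checks. This is a Python sketch under stated assumptions, not the kernel's verification code: `ChannelAttrs` and both functions are hypothetical names, and the fore channel check (reject a zero value) is an assumption, since the quoted text only says the server MAY change what was requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelAttrs:
    max_ops: int   # models ca_maxoperations
    max_reqs: int  # models ca_maxrequests


def verify_back_channel(sent: ChannelAttrs, rcvd: ChannelAttrs) -> bool:
    """Backchannel: the server MUST NOT change what the client offered."""
    return rcvd.max_ops == sent.max_ops and rcvd.max_reqs == sent.max_reqs


def verify_fore_channel(sent: ChannelAttrs, rcvd: ChannelAttrs) -> bool:
    """Fore channel: the server MAY change the values.

    Assumption for this sketch: a usable channel still needs at least one
    operation per request and at least one request slot.
    """
    return rcvd.max_ops > 0 and rcvd.max_reqs > 0
```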

NFS: dont allow minorversion= opt when vers != 4 · 571b7554

Weston Andros Adamson authored Feb 01, 2012

Don't allow invalid 'vers' and 'minorversion' combinations in mount options,
such as "vers=3,minorversion=1".
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

571b7554
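
The option check from 571b7554 amounts to a small validation step. The following Python sketch is illustrative only: `check_version_options()` is a hypothetical helper, it handles only integer `vers=` values, and the `default_vers=3` fallback is an assumption for the case where no `vers=` option is given.

```python
def parse_options(optstr):
    """Split a comma-separated mount option string into a dict."""
    opts = {}
    for item in optstr.split(","):
        if item:
            key, _, value = item.partition("=")
            opts[key] = value
    return opts


def check_version_options(optstr, default_vers=3):
    """Reject 'minorversion=' unless the major version is 4."""
    opts = parse_options(optstr)
    vers = int(opts.get("vers", default_vers))  # default is an assumption
    if "minorversion" in opts and vers != 4:
        raise ValueError("minorversion= requires vers=4")
    return vers, int(opts.get("minorversion", 0))
```

With this check, "vers=4,minorversion=1" is accepted, while the combination named in the commit message, "vers=3,minorversion=1", is refused.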

SUNRPC: Ensure that we can trace waitqueues when !defined(CONFIG_SYSCTL) · 2f09c242

Trond Myklebust authored Feb 08, 2012

The tracepoint code relies on the queue->name being defined in order to
be able to display the name of the waitqueue on which an RPC task is
sleeping.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>

2f09c242

NFSv4: Further reduce the footprint of the idmapper · 685f50f9

Trond Myklebust authored Feb 08, 2012

Don't allocate the legacy idmapper tables until we actually need
them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

685f50f9

NFSv4: The idmapper now depends on keyring functionality · e3da8706

Trond Myklebust authored Feb 08, 2012

Add the appropriate 'select KEYS' to the NFSv4 Kconfig entry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e3da8706

NFSv4: Reduce the footprint of the idmapper · d073e9b5

Trond Myklebust authored Feb 07, 2012

Instead of pre-allocating the storage for all the strings, we can
significantly reduce the size of that table by doing the allocation
when we do the downcall.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

d073e9b5

NFS: add mount options 'v4.0' and 'v4.1' · 7ced286e

Weston Andros Adamson authored Feb 07, 2012

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7ced286e

NFS: fix nfs4_find_client_sessionid() arguments list · b6d1e83b

Stanislav Kinsbursky authored Feb 07, 2012

It does not compile when CONFIG_NFS_V4_1 is not set.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b6d1e83b

NFS: Initialise the nfs_net->nfs_client_lock · 4c03ae4a

Trond Myklebust authored Feb 07, 2012

Ensure that we initialise the nfs_net->nfs_client_lock spinlock.
Also ensure that nfs_server_remove_lists() doesn't try to
dereference server->nfs_client before that is initialised.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>

4c03ae4a

Lockd: shutdown NLM hosts in network namespace context · 3b64739f

Stanislav Kinsbursky authored Jan 31, 2012

Lockd is now managed in network namespace context. This patch introduces
network-namespace-aware shutdown of NLM hosts when per-net Lockd resources are
released.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3b64739f

LockD: make NSM network namespace aware · 0e1cb5c0

Stanislav Kinsbursky authored Jan 31, 2012

The NLM host is network namespace aware now.
So NSM has to take it into account.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

0e1cb5c0

LockD: make nlm hosts network namespace aware · 66697bfd

Stanislav Kinsbursky authored Jan 31, 2012

This object depends on RPC client, and thus on network namespace.
So let's make its allocation and lookup in network namespace context.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

66697bfd

Lockd: per-net up and down routines introduced · bb2224df

Stanislav Kinsbursky authored Jan 31, 2012

This patch introduces per-net Lockd initialization and destruction routines.
The logic is the same as in the global Lockd up and down routines. The
solution is probably not the best one, but at least it looks clear.
So the per-net "up" routine is called only if lockd is already running. If
per-net resources are not allocated yet, then the service is registered with the
local portmapper and lockd sockets are created.
The per-net "down" routine is called on every lockd_down() call if the global
users counter is not zero.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

bb2224df

Lockd: pernet usage counter introduced · a9c5d73a

Stanislav Kinsbursky authored Jan 31, 2012

Lockd is going to be shared between network namespaces - i.e. going to be able
to handle lock requests from different network namespaces. This means that
network namespace related resources have to be allocated not once (like now),
but for every network namespace context from which the service is requested to
operate.
This patch implements Lockd per-net users accounting. A new per-net counter is
used to determine when per-net resources have to be freed.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a9c5d73a
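
The per-net accounting introduced in a9c5d73a follows the usual reference-count pattern. Below is a minimal Python sketch of that idea, not the Lockd implementation: `LockdNet`, `Lockd`, and the `up()`/`down()` methods are hypothetical names, and a boolean stands in for the sockets and registrations that make up the per-net resources.

```python
class LockdNet:
    """Per-namespace state: a user counter plus the resources it guards."""

    def __init__(self):
        self.users = 0
        self.resources_allocated = False


class Lockd:
    def __init__(self):
        self.per_net = {}  # namespace key -> LockdNet

    def up(self, net):
        ln = self.per_net.setdefault(net, LockdNet())
        if ln.users == 0:
            ln.resources_allocated = True  # first user in this namespace
        ln.users += 1

    def down(self, net):
        ln = self.per_net[net]
        ln.users -= 1
        if ln.users == 0:
            ln.resources_allocated = False  # last user gone, free resources
```

Each namespace is counted separately, so the last user leaving one namespace frees only that namespace's resources.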

Lockd: create permanent lockd sockets in current network namespace · c228fa20

Stanislav Kinsbursky authored Jan 31, 2012

This patch parametrizes Lockd permanent sockets creation routine by network
namespace context.
It also replaces hard-coded init_net with current network namespace context in
Lockd sockets creation routines.
This approach looks safe, because Lockd is created during NFS mount (or NFS
server start) and thus the socket is required exactly in the current network
namespace context. But at the same time it means that Lockd sockets inherit the
network namespace of the first Lockd requester. This issue will be fixed in
further patches of the series.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c228fa20

SUNRPC: service shutdown function in network namespace context introduced · 074d0f67

Stanislav Kinsbursky authored Jan 31, 2012

This function is sufficient for releasing the resources allocated for a network
namespace context when a service is shared between namespaces.
IOW, each service "user" (LockD, NFSd, etc.) that wants to share a service
between network namespaces has to release the related resources with the
function introduced in this patch instead of performing a service shutdown
(provided, of course, that the service is already shared at the moment of release).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

074d0f67

SUNRPC: service destruction in network namespace context · 7b147f1f

Stanislav Kinsbursky authored Jan 31, 2012

v2: Added comment to BUG_ON's in svc_destroy() to make the code look clearer.

This patch introduces a network namespace filter for the service destruction
function.
Nothing special here - just do exactly the same operations, but only for
transports in the passed network namespace context.
BTW, BUG_ON() checks for empty service transports lists were restored in the
svc_destroy() function. This is because of switching from the generic
svc_close_all() to the network-namespace-dependent svc_close_net().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7b147f1f

SUNRPC: clear svc transports lists helper introduced · 3a22bf50

Stanislav Kinsbursky authored Jan 31, 2012

This patch moves service transports deletion from service sockets lists to
a separate function.
This is a precursor patch, which will be useful for service shutdown in
network namespace context, introduced later in the series.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3a22bf50

SUNRPC: clear svc pools lists helper introduced · 6f513365

Stanislav Kinsbursky authored Jan 31, 2012

This patch moves the removal of a service transport from its pools' ready lists
to a separate function. Also, this clearing is now done with the
list_for_each_entry_safe() helper.
This is a precursor patch, which will be useful for service shutdown in
network namespace context, introduced later in the series.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

6f513365

NFSv4.1: Add a module parameter to set the number of session slots · ef159e91

Trond Myklebust authored Feb 06, 2012

Add the module parameter 'max_session_slots' to set the initial number
of slots that the NFSv4.1 client will attempt to negotiate with the
server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ef159e91

NFSv4.1: Convert slotid from u8 to u32 · 45d43c29

Trond Myklebust authored Feb 06, 2012

It is perfectly legal to negotiate up to 2^32-1 slots in the protocol,
and with 10GigE, we are already seeing that 255 slots is far too limiting.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

45d43c29
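
The width problem behind 45d43c29 is easy to see with plain masking. This Python sketch only models truncation to 8 and 32 bits; `as_u8()` and `as_u32()` are hypothetical helpers, not kernel functions.

```python
def as_u8(n: int) -> int:
    """Value of n after being stored in an unsigned 8-bit field."""
    return n & 0xFF


def as_u32(n: int) -> int:
    """Value of n after being stored in an unsigned 32-bit field."""
    return n & 0xFFFFFFFF


# Slot id 255 is the last one a u8 can represent; 256 silently wraps to 0.
wrapped = as_u8(256)
kept = as_u32(256)
```

With an 8-bit field, a slot id of 256 would alias slot 0, whereas a 32-bit field covers the full range the commit message says the protocol allows.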

06 Feb, 2012 4 commits

NFS: build fixed in case of NFS_USE_NEW_IDMAPPER is undefined · 17347d03

Stanislav Kinsbursky authored Jan 26, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

17347d03

NFS: pass transport net to rpc_pton() while parse server name · 33faaa38

Stanislav Kinsbursky authored Jan 26, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

33faaa38

NFS: pass current net to rpc_pton() while parsing mount options · b48e1278

Stanislav Kinsbursky authored Jan 26, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b48e1278

NFS: search for client session id in proper network namespace · c7add9a9

Stanislav Kinsbursky authored Jan 26, 2012

Network namespace is taken from request transport and passed as a part of
cb_process_state structure.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c7add9a9