Commit 61845143 authored by Linus Torvalds

Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "The main change is the pNFS block server support from Christoph, which
  allows an NFS client connected to shared disk to do block IO to the
  shared disk in place of NFS reads and writes.  This also requires xfs
  patches, which should arrive soon through the xfs tree, barring
  unexpected problems.  Support for other filesystems is also possible
  if there's interest.

  Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
  shape"

* 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
  nfsd: default NFSv4.2 to on
  nfsd: pNFS block layout driver
  exportfs: add methods for block layout exports
  nfsd: add trace events
  nfsd: update documentation for pNFS support
  nfsd: implement pNFS layout recalls
  nfsd: implement pNFS operations
  nfsd: make find_any_file available outside nfs4state.c
  nfsd: make find/get/put file available outside nfs4state.c
  nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
  nfsd: add fh_fsid_match helper
  nfsd: move nfsd_fh_match to nfsfh.h
  fs: add FL_LAYOUT lease type
  fs: track fl_owner for leases
  nfs: add LAYOUT_TYPE_MAX enum value
  nfsd: factor out a helper to decode nfstime4 values
  sunrpc/lockd: fix references to the BKL
  nfsd: fix year-2038 nfs4 state problem
  svcrdma: Handle additional inline content
  svcrdma: Move read list XDR round-up logic
  ...
parents a26be149 c23ae601
......@@ -24,11 +24,6 @@ focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
"exactly once" semantics and better control and throttling of the
resources allocated for each client.
Other NFSv4.1 features, Parallel NFS operations in particular,
are still under development out of tree.
See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
for more information.
The table below, taken from the NFSv4.1 document, lists
the operations that are mandatory to implement (REQ), optional
(OPT), and NFSv4.0 operations that are required not to implement (MNI)
......@@ -43,9 +38,7 @@ The OPTIONAL features identified and their abbreviations are as follows:
The following abbreviations indicate the linux server implementation status.
I Implemented NFSv4.1 operations.
NS Not Supported.
NS* unimplemented optional feature.
P pNFS features implemented out of tree.
PNS pNFS features that are not supported yet (out of tree).
NS* Unimplemented optional feature.
Operations
......@@ -70,13 +63,13 @@ I | DESTROY_SESSION | REQ | | Section 18.37 |
I | EXCHANGE_ID | REQ | | Section 18.35 |
I | FREE_STATEID | REQ | | Section 18.38 |
| GETATTR | REQ | | Section 18.7 |
P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 |
P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 |
I | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 |
NS*| GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 |
| GETFH | REQ | | Section 18.8 |
NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 |
P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 |
P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 |
P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 |
I | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 |
I | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 |
I | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 |
| LINK | OPT | | Section 18.9 |
| LOCK | REQ | | Section 18.10 |
| LOCKT | REQ | | Section 18.11 |
......@@ -122,9 +115,9 @@ Callback Operations
| | MNI | or OPT) | |
+-------------------------+-----------+-------------+---------------+
| CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 |
P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 |
I | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 |
NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 |
P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 |
NS*| CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 |
NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 |
NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 |
| CB_RECALL | OPT | FDELG, | Section 20.2 |
......
pNFS block layout server user guide
The Linux NFS server now supports the pNFS block layout extension. In this
case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition
to handling all the metadata access to the NFS export also hands out layouts
to the clients to directly access the underlying block devices that are
shared with the client.
To use pNFS block layouts with the Linux NFS server, the exported file
system needs to support the pNFS block layouts (currently just XFS), and the
file system must sit on shared storage (typically iSCSI) that is accessible
to the clients in addition to the MDS.  As of now the file system needs to
sit directly on the exported volume; striping or concatenation of
volumes on the MDS and clients is not supported yet.
On the server, pNFS block volume support is enabled automatically if the file
system supports it.  On the client, make sure the kernel has the
CONFIG_PNFS_BLOCK option enabled, the blkmapd daemon from nfs-utils is
running, and the file system is mounted using the NFSv4.1 protocol version
(mount -o vers=4.1).
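As a rough sketch (the host name, export path, and export options below are
hypothetical, not taken from this patch set), a minimal setup could look like:

# On the MDS: export an XFS file system that sits on the shared iSCSI LUN,
# e.g. via a line in /etc/exports:
/mnt/shared	*(rw)

# On each client: mount with NFSv4.1 so layouts can be negotiated:
mount -t nfs -o vers=4.1 mds:/mnt/shared /mnt/shared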
If the nfsd server needs to fence a non-responding client it calls
/sbin/nfsd-recall-failed with the first argument set to the IP address of
the client, and the second argument set to the device node without the /dev
prefix for the file system to be fenced. Below is an example file that shows
how to translate the device into a serial number from SCSI EVPD 0x80:
cat > /sbin/nfsd-recall-failed << 'EOF'
#!/bin/sh
CLIENT="$1"
DEV="/dev/$2"
EVPD=`sg_inq --page=0x80 ${DEV} | \
grep "Unit serial number:" | \
awk -F ': ' '{print $2}'`
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log
EOF
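Note that knfsd executes this helper directly, so remember to make the
script executable after creating it:

chmod 755 /sbin/nfsd-recall-failed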
......@@ -57,8 +57,8 @@ static DEFINE_SPINLOCK(nlm_blocked_lock);
static const char *nlmdbg_cookie2a(const struct nlm_cookie *cookie)
{
/*
* We can get away with a static buffer because we're only
* called with BKL held.
* We can get away with a static buffer because this is only called
* from lockd, which is single-threaded.
*/
static char buf[2*NLM_MAXCOOKIELEN+1];
unsigned int i, len = sizeof(buf);
......
......@@ -95,14 +95,6 @@ nlm_decode_fh(__be32 *p, struct nfs_fh *f)
return p + XDR_QUADLEN(NFS2_FHSIZE);
}
static inline __be32 *
nlm_encode_fh(__be32 *p, struct nfs_fh *f)
{
*p++ = htonl(NFS2_FHSIZE);
memcpy(p, f->data, NFS2_FHSIZE);
return p + XDR_QUADLEN(NFS2_FHSIZE);
}
/*
* Encode and decode owner handle
*/
......
......@@ -137,7 +137,7 @@
#define IS_POSIX(fl) (fl->fl_flags & FL_POSIX)
#define IS_FLOCK(fl) (fl->fl_flags & FL_FLOCK)
#define IS_LEASE(fl) (fl->fl_flags & (FL_LEASE|FL_DELEG))
#define IS_LEASE(fl) (fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
#define IS_OFDLCK(fl) (fl->fl_flags & FL_OFDLCK)
static bool lease_breaking(struct file_lock *fl)
......@@ -1371,6 +1371,8 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
{
if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT))
return false;
if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
return false;
return locks_conflict(breaker, lease);
......@@ -1594,11 +1596,14 @@ int fcntl_getlease(struct file *filp)
* conflict with the lease we're trying to set.
*/
static int
check_conflicting_open(const struct dentry *dentry, const long arg)
check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
{
int ret = 0;
struct inode *inode = dentry->d_inode;
if (flags & FL_LAYOUT)
return 0;
if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
return -EAGAIN;
......@@ -1647,7 +1652,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
spin_lock(&ctx->flc_lock);
time_out_leases(inode, &dispose);
error = check_conflicting_open(dentry, arg);
error = check_conflicting_open(dentry, arg, lease->fl_flags);
if (error)
goto out;
......@@ -1661,7 +1666,8 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
*/
error = -EAGAIN;
list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
if (fl->fl_file == filp) {
if (fl->fl_file == filp &&
fl->fl_owner == lease->fl_owner) {
my_fl = fl;
continue;
}
......@@ -1702,7 +1708,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
* precedes these checks.
*/
smp_mb();
error = check_conflicting_open(dentry, arg);
error = check_conflicting_open(dentry, arg, lease->fl_flags);
if (error) {
locks_unlink_lock_ctx(lease, &ctx->flc_lease_cnt);
goto out;
......@@ -1721,7 +1727,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
return error;
}
static int generic_delete_lease(struct file *filp)
static int generic_delete_lease(struct file *filp, void *owner)
{
int error = -EAGAIN;
struct file_lock *fl, *victim = NULL;
......@@ -1737,7 +1743,8 @@ static int generic_delete_lease(struct file *filp)
spin_lock(&ctx->flc_lock);
list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
if (fl->fl_file == filp) {
if (fl->fl_file == filp &&
fl->fl_owner == owner) {
victim = fl;
break;
}
......@@ -1778,13 +1785,14 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
switch (arg) {
case F_UNLCK:
return generic_delete_lease(filp);
return generic_delete_lease(filp, *priv);
case F_RDLCK:
case F_WRLCK:
if (!(*flp)->fl_lmops->lm_break) {
WARN_ON_ONCE(1);
return -ENOLCK;
}
return generic_add_lease(filp, arg, flp, priv);
default:
return -EINVAL;
......@@ -1857,7 +1865,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
{
if (arg == F_UNLCK)
return vfs_setlease(filp, F_UNLCK, NULL, NULL);
return vfs_setlease(filp, F_UNLCK, NULL, (void **)&filp);
return do_fcntl_add_lease(fd, filp, arg);
}
......
......@@ -82,6 +82,16 @@ config NFSD_V4
If unsure, say N.
config NFSD_PNFS
bool "NFSv4.1 server support for Parallel NFS (pNFS)"
depends on NFSD_V4
help
This option enables support for the parallel NFS features of the
minor version 1 of the NFSv4 protocol (RFC5661) in the kernel's NFS
server.
If unsure, say N.
config NFSD_V4_SECURITY_LABEL
bool "Provide Security Label support for NFSv4 server"
depends on NFSD_V4 && SECURITY
......
......@@ -2,9 +2,14 @@
# Makefile for the Linux nfs server
#
ccflags-y += -I$(src) # needed for trace events
obj-$(CONFIG_NFSD) += nfsd.o
nfsd-y := nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
# this one should be compiled first, as the tracing macros can easily blow up
nfsd-y += trace.o
nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
......@@ -12,3 +17,4 @@ nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
nfs4acl.o nfs4callback.o nfs4recover.o
nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o blocklayout.o blocklayoutxdr.o
/*
* Copyright (c) 2014 Christoph Hellwig.
*/
#include <linux/exportfs.h>
#include <linux/genhd.h>
#include <linux/slab.h>
#include <linux/nfsd/debug.h>
#include "blocklayoutxdr.h"
#include "pnfs.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
static int
nfsd4_block_get_device_info_simple(struct super_block *sb,
struct nfsd4_getdeviceinfo *gdp)
{
struct pnfs_block_deviceaddr *dev;
struct pnfs_block_volume *b;
dev = kzalloc(sizeof(struct pnfs_block_deviceaddr) +
sizeof(struct pnfs_block_volume), GFP_KERNEL);
if (!dev)
return -ENOMEM;
gdp->gd_device = dev;
dev->nr_volumes = 1;
b = &dev->volumes[0];
b->type = PNFS_BLOCK_VOLUME_SIMPLE;
b->simple.sig_len = PNFS_BLOCK_UUID_LEN;
return sb->s_export_op->get_uuid(sb, b->simple.sig, &b->simple.sig_len,
&b->simple.offset);
}
static __be32
nfsd4_block_proc_getdeviceinfo(struct super_block *sb,
struct nfsd4_getdeviceinfo *gdp)
{
if (sb->s_bdev != sb->s_bdev->bd_contains)
return nfserr_inval;
return nfserrno(nfsd4_block_get_device_info_simple(sb, gdp));
}
static __be32
nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
struct nfsd4_layoutget *args)
{
struct nfsd4_layout_seg *seg = &args->lg_seg;
struct super_block *sb = inode->i_sb;
u32 block_size = (1 << inode->i_blkbits);
struct pnfs_block_extent *bex;
struct iomap iomap;
u32 device_generation = 0;
int error;
/*
* We do not attempt to support I/O smaller than the fs block size,
* or not aligned to it.
*/
if (args->lg_minlength < block_size) {
dprintk("pnfsd: I/O too small\n");
goto out_layoutunavailable;
}
if (seg->offset & (block_size - 1)) {
dprintk("pnfsd: I/O misaligned\n");
goto out_layoutunavailable;
}
/*
* Some clients barf on non-zero block numbers for NONE or INVALID
* layouts, so make sure to zero the whole structure.
*/
error = -ENOMEM;
bex = kzalloc(sizeof(*bex), GFP_KERNEL);
if (!bex)
goto out_error;
args->lg_content = bex;
error = sb->s_export_op->map_blocks(inode, seg->offset, seg->length,
&iomap, seg->iomode != IOMODE_READ,
&device_generation);
if (error) {
if (error == -ENXIO)
goto out_layoutunavailable;
goto out_error;
}
if (iomap.length < args->lg_minlength) {
dprintk("pnfsd: extent smaller than minlength\n");
goto out_layoutunavailable;
}
switch (iomap.type) {
case IOMAP_MAPPED:
if (seg->iomode == IOMODE_READ)
bex->es = PNFS_BLOCK_READ_DATA;
else
bex->es = PNFS_BLOCK_READWRITE_DATA;
bex->soff = (iomap.blkno << 9);
break;
case IOMAP_UNWRITTEN:
if (seg->iomode & IOMODE_RW) {
/*
* Crack monkey special case from section 2.3.1.
*/
if (args->lg_minlength == 0) {
dprintk("pnfsd: no soup for you!\n");
goto out_layoutunavailable;
}
bex->es = PNFS_BLOCK_INVALID_DATA;
bex->soff = (iomap.blkno << 9);
break;
}
/*FALLTHRU*/
case IOMAP_HOLE:
if (seg->iomode == IOMODE_READ) {
bex->es = PNFS_BLOCK_NONE_DATA;
break;
}
/*FALLTHRU*/
case IOMAP_DELALLOC:
default:
WARN(1, "pnfsd: filesystem returned %d extent\n", iomap.type);
goto out_layoutunavailable;
}
error = nfsd4_set_deviceid(&bex->vol_id, fhp, device_generation);
if (error)
goto out_error;
bex->foff = iomap.offset;
bex->len = iomap.length;
seg->offset = iomap.offset;
seg->length = iomap.length;
dprintk("GET: %lld:%lld %d\n", bex->foff, bex->len, bex->es);
return 0;
out_error:
seg->length = 0;
return nfserrno(error);
out_layoutunavailable:
seg->length = 0;
return nfserr_layoutunavailable;
}
static __be32
nfsd4_block_proc_layoutcommit(struct inode *inode,
struct nfsd4_layoutcommit *lcp)
{
loff_t new_size = lcp->lc_last_wr + 1;
struct iattr iattr = { .ia_valid = 0 };
struct iomap *iomaps;
int nr_iomaps;
int error;
nr_iomaps = nfsd4_block_decode_layoutupdate(lcp->lc_up_layout,
lcp->lc_up_len, &iomaps, 1 << inode->i_blkbits);
if (nr_iomaps < 0)
return nfserrno(nr_iomaps);
if (lcp->lc_mtime.tv_nsec == UTIME_NOW ||
timespec_compare(&lcp->lc_mtime, &inode->i_mtime) < 0)
lcp->lc_mtime = current_fs_time(inode->i_sb);
iattr.ia_valid |= ATTR_ATIME | ATTR_CTIME | ATTR_MTIME;
iattr.ia_atime = iattr.ia_ctime = iattr.ia_mtime = lcp->lc_mtime;
if (new_size > i_size_read(inode)) {
iattr.ia_valid |= ATTR_SIZE;
iattr.ia_size = new_size;
}
error = inode->i_sb->s_export_op->commit_blocks(inode, iomaps,
nr_iomaps, &iattr);
kfree(iomaps);
return nfserrno(error);
}
const struct nfsd4_layout_ops bl_layout_ops = {
.proc_getdeviceinfo = nfsd4_block_proc_getdeviceinfo,
.encode_getdeviceinfo = nfsd4_block_encode_getdeviceinfo,
.proc_layoutget = nfsd4_block_proc_layoutget,
.encode_layoutget = nfsd4_block_encode_layoutget,
.proc_layoutcommit = nfsd4_block_proc_layoutcommit,
};
/*
* Copyright (c) 2014 Christoph Hellwig.
*/
#include <linux/sunrpc/svc.h>
#include <linux/exportfs.h>
#include <linux/nfs4.h>
#include "nfsd.h"
#include "blocklayoutxdr.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
__be32
nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
struct nfsd4_layoutget *lgp)
{
struct pnfs_block_extent *b = lgp->lg_content;
int len = sizeof(__be32) + 5 * sizeof(__be64) + sizeof(__be32);
__be32 *p;
p = xdr_reserve_space(xdr, sizeof(__be32) + len);
if (!p)
return nfserr_toosmall;
*p++ = cpu_to_be32(len);
*p++ = cpu_to_be32(1); /* we always return a single extent */
p = xdr_encode_opaque_fixed(p, &b->vol_id,
sizeof(struct nfsd4_deviceid));
p = xdr_encode_hyper(p, b->foff);
p = xdr_encode_hyper(p, b->len);
p = xdr_encode_hyper(p, b->soff);
*p++ = cpu_to_be32(b->es);
return 0;
}
static int
nfsd4_block_encode_volume(struct xdr_stream *xdr, struct pnfs_block_volume *b)
{
__be32 *p;
int len;
switch (b->type) {
case PNFS_BLOCK_VOLUME_SIMPLE:
len = 4 + 4 + 8 + 4 + b->simple.sig_len;
p = xdr_reserve_space(xdr, len);
if (!p)
return -ETOOSMALL;
*p++ = cpu_to_be32(b->type);
*p++ = cpu_to_be32(1); /* single signature */
p = xdr_encode_hyper(p, b->simple.offset);
p = xdr_encode_opaque(p, b->simple.sig, b->simple.sig_len);
break;
default:
return -ENOTSUPP;
}
return len;
}
__be32
nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
struct nfsd4_getdeviceinfo *gdp)
{
struct pnfs_block_deviceaddr *dev = gdp->gd_device;
int len = sizeof(__be32), ret, i;
__be32 *p;
p = xdr_reserve_space(xdr, len + sizeof(__be32));
if (!p)
return nfserr_resource;
for (i = 0; i < dev->nr_volumes; i++) {
ret = nfsd4_block_encode_volume(xdr, &dev->volumes[i]);
if (ret < 0)
return nfserrno(ret);
len += ret;
}
/*
* Fill in the overall length and number of volumes at the beginning
* of the layout.
*/
*p++ = cpu_to_be32(len);
*p++ = cpu_to_be32(dev->nr_volumes);
return 0;
}
int
nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
u32 block_size)
{
struct iomap *iomaps;
u32 nr_iomaps, expected, i;
if (len < sizeof(u32)) {
dprintk("%s: extent array too small: %u\n", __func__, len);
return -EINVAL;
}
nr_iomaps = be32_to_cpup(p++);
expected = sizeof(__be32) + nr_iomaps * NFS4_BLOCK_EXTENT_SIZE;
if (len != expected) {
dprintk("%s: extent array size mismatch: %u/%u\n",
__func__, len, expected);
return -EINVAL;
}
iomaps = kcalloc(nr_iomaps, sizeof(*iomaps), GFP_KERNEL);
if (!iomaps) {
dprintk("%s: failed to allocate extent array\n", __func__);
return -ENOMEM;
}
for (i = 0; i < nr_iomaps; i++) {
struct pnfs_block_extent bex;
memcpy(&bex.vol_id, p, sizeof(struct nfsd4_deviceid));
p += XDR_QUADLEN(sizeof(struct nfsd4_deviceid));
p = xdr_decode_hyper(p, &bex.foff);
if (bex.foff & (block_size - 1)) {
dprintk("%s: unaligned offset %lld\n",
__func__, bex.foff);
goto fail;
}
p = xdr_decode_hyper(p, &bex.len);
if (bex.len & (block_size - 1)) {
dprintk("%s: unaligned length %lld\n",
__func__, bex.foff);
goto fail;
}
p = xdr_decode_hyper(p, &bex.soff);
if (bex.soff & (block_size - 1)) {
dprintk("%s: unaligned disk offset %lld\n",
__func__, bex.soff);
goto fail;
}
bex.es = be32_to_cpup(p++);
if (bex.es != PNFS_BLOCK_READWRITE_DATA) {
dprintk("%s: incorrect extent state %d\n",
__func__, bex.es);
goto fail;
}
iomaps[i].offset = bex.foff;
iomaps[i].length = bex.len;
}
*iomapp = iomaps;
return nr_iomaps;
fail:
kfree(iomaps);
return -EINVAL;
}
#ifndef _NFSD_BLOCKLAYOUTXDR_H
#define _NFSD_BLOCKLAYOUTXDR_H 1
#include <linux/blkdev.h>
#include "xdr4.h"
struct iomap;
struct xdr_stream;
enum pnfs_block_extent_state {
PNFS_BLOCK_READWRITE_DATA = 0,
PNFS_BLOCK_READ_DATA = 1,
PNFS_BLOCK_INVALID_DATA = 2,
PNFS_BLOCK_NONE_DATA = 3,
};
struct pnfs_block_extent {
struct nfsd4_deviceid vol_id;
u64 foff;
u64 len;
u64 soff;
enum pnfs_block_extent_state es;
};
#define NFS4_BLOCK_EXTENT_SIZE 44
enum pnfs_block_volume_type {
PNFS_BLOCK_VOLUME_SIMPLE = 0,
PNFS_BLOCK_VOLUME_SLICE = 1,
PNFS_BLOCK_VOLUME_CONCAT = 2,
PNFS_BLOCK_VOLUME_STRIPE = 3,
};
/*
* Arbitrary upper cap for the uuid length to avoid unbounded allocation.
* Not actually limited by the protocol.
*/
#define PNFS_BLOCK_UUID_LEN 128
struct pnfs_block_volume {
enum pnfs_block_volume_type type;
union {
struct {
u64 offset;
u32 sig_len;
u8 sig[PNFS_BLOCK_UUID_LEN];
} simple;
};
};
struct pnfs_block_deviceaddr {
u32 nr_volumes;
struct pnfs_block_volume volumes[];
};
__be32 nfsd4_block_encode_getdeviceinfo(struct xdr_stream *xdr,
struct nfsd4_getdeviceinfo *gdp);
__be32 nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
struct nfsd4_layoutget *lgp);
int nfsd4_block_decode_layoutupdate(__be32 *p, u32 len, struct iomap **iomapp,
u32 block_size);
#endif /* _NFSD_BLOCKLAYOUTXDR_H */
......@@ -20,6 +20,7 @@
#include "nfsd.h"
#include "nfsfh.h"
#include "netns.h"
#include "pnfs.h"
#define NFSDDBG_FACILITY NFSDDBG_EXPORT
......@@ -545,6 +546,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
exp.ex_client = dom;
exp.cd = cd;
exp.ex_devid_map = NULL;
/* expiry */
err = -EINVAL;
......@@ -621,6 +623,8 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
if (!gid_valid(exp.ex_anon_gid))
goto out4;
err = 0;
nfsd4_setup_layout_type(&exp);
}
expp = svc_export_lookup(&exp);
......@@ -703,6 +707,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
new->ex_fslocs.locations = NULL;
new->ex_fslocs.locations_count = 0;
new->ex_fslocs.migrated = 0;
new->ex_layout_type = 0;
new->ex_uuid = NULL;
new->cd = item->cd;
}
......@@ -717,6 +722,8 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
new->ex_anon_uid = item->ex_anon_uid;
new->ex_anon_gid = item->ex_anon_gid;
new->ex_fsid = item->ex_fsid;
new->ex_devid_map = item->ex_devid_map;
item->ex_devid_map = NULL;
new->ex_uuid = item->ex_uuid;
item->ex_uuid = NULL;
new->ex_fslocs.locations = item->ex_fslocs.locations;
......@@ -725,6 +732,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
item->ex_fslocs.locations_count = 0;
new->ex_fslocs.migrated = item->ex_fslocs.migrated;
item->ex_fslocs.migrated = 0;
new->ex_layout_type = item->ex_layout_type;
new->ex_nflavors = item->ex_nflavors;
for (i = 0; i < MAX_SECINFO_LIST; i++) {
new->ex_flavors[i] = item->ex_flavors[i];
......
......@@ -56,6 +56,8 @@ struct svc_export {
struct nfsd4_fs_locations ex_fslocs;
uint32_t ex_nflavors;
struct exp_flavor_info ex_flavors[MAX_SECINFO_LIST];
enum pnfs_layouttype ex_layout_type;
struct nfsd4_deviceid_map *ex_devid_map;
struct cache_detail *cd;
};
......
......@@ -546,6 +546,102 @@ static int nfs4_xdr_dec_cb_recall(struct rpc_rqst *rqstp,
return status;
}
#ifdef CONFIG_NFSD_PNFS
/*
* CB_LAYOUTRECALL4args
*
* struct layoutrecall_file4 {
* nfs_fh4 lor_fh;
* offset4 lor_offset;
* length4 lor_length;
* stateid4 lor_stateid;
* };
*
* union layoutrecall4 switch(layoutrecall_type4 lor_recalltype) {
* case LAYOUTRECALL4_FILE:
* layoutrecall_file4 lor_layout;
* case LAYOUTRECALL4_FSID:
* fsid4 lor_fsid;
* case LAYOUTRECALL4_ALL:
* void;
* };
*
* struct CB_LAYOUTRECALL4args {
* layouttype4 clora_type;
* layoutiomode4 clora_iomode;
* bool clora_changed;
* layoutrecall4 clora_recall;
* };
*/
static void encode_cb_layout4args(struct xdr_stream *xdr,
const struct nfs4_layout_stateid *ls,
struct nfs4_cb_compound_hdr *hdr)
{
__be32 *p;
BUG_ON(hdr->minorversion == 0);
p = xdr_reserve_space(xdr, 5 * 4);
*p++ = cpu_to_be32(OP_CB_LAYOUTRECALL);
*p++ = cpu_to_be32(ls->ls_layout_type);
*p++ = cpu_to_be32(IOMODE_ANY);
*p++ = cpu_to_be32(1);
*p = cpu_to_be32(RETURN_FILE);
encode_nfs_fh4(xdr, &ls->ls_stid.sc_file->fi_fhandle);
p = xdr_reserve_space(xdr, 2 * 8);
p = xdr_encode_hyper(p, 0);
xdr_encode_hyper(p, NFS4_MAX_UINT64);
encode_stateid4(xdr, &ls->ls_recall_sid);
hdr->nops++;
}
static void nfs4_xdr_enc_cb_layout(struct rpc_rqst *req,
struct xdr_stream *xdr,
const struct nfsd4_callback *cb)
{
const struct nfs4_layout_stateid *ls =
container_of(cb, struct nfs4_layout_stateid, ls_recall);
struct nfs4_cb_compound_hdr hdr = {
.ident = 0,
.minorversion = cb->cb_minorversion,
};
encode_cb_compound4args(xdr, &hdr);
encode_cb_sequence4args(xdr, cb, &hdr);
encode_cb_layout4args(xdr, ls, &hdr);
encode_cb_nops(&hdr);
}
static int nfs4_xdr_dec_cb_layout(struct rpc_rqst *rqstp,
struct xdr_stream *xdr,
struct nfsd4_callback *cb)
{
struct nfs4_cb_compound_hdr hdr;
enum nfsstat4 nfserr;
int status;
status = decode_cb_compound4res(xdr, &hdr);
if (unlikely(status))
goto out;
if (cb) {
status = decode_cb_sequence4res(xdr, cb);
if (unlikely(status))
goto out;
}
status = decode_cb_op_status(xdr, OP_CB_LAYOUTRECALL, &nfserr);
if (unlikely(status))
goto out;
if (unlikely(nfserr != NFS4_OK))
status = nfs_cb_stat_to_errno(nfserr);
out:
return status;
}
#endif /* CONFIG_NFSD_PNFS */
/*
* RPC procedure tables
*/
......@@ -563,6 +659,9 @@ static int nfs4_xdr_dec_cb_recall(struct rpc_rqst *rqstp,
static struct rpc_procinfo nfs4_cb_procedures[] = {
PROC(CB_NULL, NULL, cb_null, cb_null),
PROC(CB_RECALL, COMPOUND, cb_recall, cb_recall),
#ifdef CONFIG_NFSD_PNFS
PROC(CB_LAYOUT, COMPOUND, cb_layout, cb_layout),
#endif
};
static struct rpc_version nfs_cb_version4 = {
......
......@@ -43,6 +43,8 @@
#include "current_stateid.h"
#include "netns.h"
#include "acl.h"
#include "pnfs.h"
#include "trace.h"
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#include <linux/security.h>
......@@ -1178,6 +1180,259 @@ nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return status == nfserr_same ? nfs_ok : status;
}
#ifdef CONFIG_NFSD_PNFS
static const struct nfsd4_layout_ops *
nfsd4_layout_verify(struct svc_export *exp, unsigned int layout_type)
{
if (!exp->ex_layout_type) {
dprintk("%s: export does not support pNFS\n", __func__);
return NULL;
}
if (exp->ex_layout_type != layout_type) {
dprintk("%s: layout type %d not supported\n",
__func__, layout_type);
return NULL;
}
return nfsd4_layout_ops[layout_type];
}
static __be32
nfsd4_getdeviceinfo(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
struct nfsd4_getdeviceinfo *gdp)
{
const struct nfsd4_layout_ops *ops;
struct nfsd4_deviceid_map *map;
struct svc_export *exp;
__be32 nfserr;
dprintk("%s: layout_type %u dev_id [0x%llx:0x%x] maxcnt %u\n",
__func__,
gdp->gd_layout_type,
gdp->gd_devid.fsid_idx, gdp->gd_devid.generation,
gdp->gd_maxcount);
map = nfsd4_find_devid_map(gdp->gd_devid.fsid_idx);
if (!map) {
dprintk("%s: couldn't find device ID to export mapping!\n",
__func__);
return nfserr_noent;
}
exp = rqst_exp_find(rqstp, map->fsid_type, map->fsid);
if (IS_ERR(exp)) {
dprintk("%s: could not find device id\n", __func__);
return nfserr_noent;
}
nfserr = nfserr_layoutunavailable;
ops = nfsd4_layout_verify(exp, gdp->gd_layout_type);
if (!ops)
goto out;
nfserr = nfs_ok;
if (gdp->gd_maxcount != 0)
nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb, gdp);
gdp->gd_notify_types &= ops->notify_types;
exp_put(exp);
out:
return nfserr;
}
static __be32
nfsd4_layoutget(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
struct nfsd4_layoutget *lgp)
{
struct svc_fh *current_fh = &cstate->current_fh;
const struct nfsd4_layout_ops *ops;
struct nfs4_layout_stateid *ls;
__be32 nfserr;
int accmode;
switch (lgp->lg_seg.iomode) {
case IOMODE_READ:
accmode = NFSD_MAY_READ;
break;
case IOMODE_RW:
accmode = NFSD_MAY_READ | NFSD_MAY_WRITE;
break;
default:
dprintk("%s: invalid iomode %d\n",
__func__, lgp->lg_seg.iomode);
nfserr = nfserr_badiomode;
goto out;
}
nfserr = fh_verify(rqstp, current_fh, 0, accmode);
if (nfserr)
goto out;
nfserr = nfserr_layoutunavailable;
ops = nfsd4_layout_verify(current_fh->fh_export, lgp->lg_layout_type);
if (!ops)
goto out;
/*
* Verify minlength and range as per RFC5661:
* o If loga_length is less than loga_minlength,
* the metadata server MUST return NFS4ERR_INVAL.
* o If the sum of loga_offset and loga_minlength exceeds
* NFS4_UINT64_MAX, and loga_minlength is not
* NFS4_UINT64_MAX, the error NFS4ERR_INVAL MUST result.
* o If the sum of loga_offset and loga_length exceeds
* NFS4_UINT64_MAX, and loga_length is not NFS4_UINT64_MAX,
* the error NFS4ERR_INVAL MUST result.
*/
nfserr = nfserr_inval;
if (lgp->lg_seg.length < lgp->lg_minlength ||
(lgp->lg_minlength != NFS4_MAX_UINT64 &&
lgp->lg_minlength > NFS4_MAX_UINT64 - lgp->lg_seg.offset) ||
(lgp->lg_seg.length != NFS4_MAX_UINT64 &&
lgp->lg_seg.length > NFS4_MAX_UINT64 - lgp->lg_seg.offset))
goto out;
if (lgp->lg_seg.length == 0)
goto out;
nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lgp->lg_sid,
true, lgp->lg_layout_type, &ls);
if (nfserr) {
trace_layout_get_lookup_fail(&lgp->lg_sid);
goto out;
}
nfserr = nfserr_recallconflict;
if (atomic_read(&ls->ls_stid.sc_file->fi_lo_recalls))
goto out_put_stid;
nfserr = ops->proc_layoutget(current_fh->fh_dentry->d_inode,
current_fh, lgp);
if (nfserr)
goto out_put_stid;
nfserr = nfsd4_insert_layout(lgp, ls);
out_put_stid:
nfs4_put_stid(&ls->ls_stid);
out:
return nfserr;
}
static __be32
nfsd4_layoutcommit(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
struct nfsd4_layoutcommit *lcp)
{
const struct nfsd4_layout_seg *seg = &lcp->lc_seg;
struct svc_fh *current_fh = &cstate->current_fh;
const struct nfsd4_layout_ops *ops;
loff_t new_size = lcp->lc_last_wr + 1;
struct inode *inode;
struct nfs4_layout_stateid *ls;
__be32 nfserr;
nfserr = fh_verify(rqstp, current_fh, 0, NFSD_MAY_WRITE);
if (nfserr)
goto out;
nfserr = nfserr_layoutunavailable;
ops = nfsd4_layout_verify(current_fh->fh_export, lcp->lc_layout_type);
if (!ops)
goto out;
inode = current_fh->fh_dentry->d_inode;
nfserr = nfserr_inval;
if (new_size <= seg->offset) {
dprintk("pnfsd: last write before layout segment\n");
goto out;
}
if (new_size > seg->offset + seg->length) {
dprintk("pnfsd: last write beyond layout segment\n");
goto out;
}
if (!lcp->lc_newoffset && new_size > i_size_read(inode)) {
dprintk("pnfsd: layoutcommit beyond EOF\n");
goto out;
}
nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lcp->lc_sid,
false, lcp->lc_layout_type,
&ls);
if (nfserr) {
trace_layout_commit_lookup_fail(&lcp->lc_sid);
/* fixup error code as per RFC5661 */
if (nfserr == nfserr_bad_stateid)
nfserr = nfserr_badlayout;
goto out;
}
nfserr = ops->proc_layoutcommit(inode, lcp);
if (nfserr)
goto out_put_stid;
if (new_size > i_size_read(inode)) {
lcp->lc_size_chg = 1;
lcp->lc_newsize = new_size;
} else {
lcp->lc_size_chg = 0;
}
out_put_stid:
nfs4_put_stid(&ls->ls_stid);
out:
return nfserr;
}
static __be32
nfsd4_layoutreturn(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
struct nfsd4_layoutreturn *lrp)
{
struct svc_fh *current_fh = &cstate->current_fh;
__be32 nfserr;
nfserr = fh_verify(rqstp, current_fh, 0, NFSD_MAY_NOP);
if (nfserr)
goto out;
nfserr = nfserr_layoutunavailable;
if (!nfsd4_layout_verify(current_fh->fh_export, lrp->lr_layout_type))
goto out;
switch (lrp->lr_seg.iomode) {
case IOMODE_READ:
case IOMODE_RW:
case IOMODE_ANY:
break;
default:
dprintk("%s: invalid iomode %d\n", __func__,
lrp->lr_seg.iomode);
nfserr = nfserr_inval;
goto out;
}
switch (lrp->lr_return_type) {
case RETURN_FILE:
nfserr = nfsd4_return_file_layouts(rqstp, cstate, lrp);
break;
case RETURN_FSID:
case RETURN_ALL:
nfserr = nfsd4_return_client_layouts(rqstp, cstate, lrp);
break;
default:
dprintk("%s: invalid return_type %d\n", __func__,
lrp->lr_return_type);
nfserr = nfserr_inval;
break;
}
out:
return nfserr;
}
#endif /* CONFIG_NFSD_PNFS */
/*
* NULL call.
*/
......@@ -1679,6 +1934,36 @@ static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd
op_encode_channel_attrs_maxsz) * sizeof(__be32);
}
#ifdef CONFIG_NFSD_PNFS
/*
* At this stage we don't really know what layout driver will handle the request,
* so we need to define an arbitrary upper bound here.
*/
#define MAX_LAYOUT_SIZE 128
static inline u32 nfsd4_layoutget_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size +
1 /* logr_return_on_close */ +
op_encode_stateid_maxsz +
1 /* nr of layouts */ +
MAX_LAYOUT_SIZE) * sizeof(__be32);
}
static inline u32 nfsd4_layoutcommit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size +
1 /* locr_newsize */ +
2 /* ns_size */) * sizeof(__be32);
}
static inline u32 nfsd4_layoutreturn_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size +
1 /* lrs_stateid */ +
op_encode_stateid_maxsz) * sizeof(__be32);
}
#endif /* CONFIG_NFSD_PNFS */
static struct nfsd4_operation nfsd4_ops[] = {
[OP_ACCESS] = {
.op_func = (nfsd4op_func)nfsd4_access,
......@@ -1966,6 +2251,31 @@ static struct nfsd4_operation nfsd4_ops[] = {
.op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
#ifdef CONFIG_NFSD_PNFS
[OP_GETDEVICEINFO] = {
.op_func = (nfsd4op_func)nfsd4_getdeviceinfo,
.op_flags = ALLOWED_WITHOUT_FH,
.op_name = "OP_GETDEVICEINFO",
},
[OP_LAYOUTGET] = {
.op_func = (nfsd4op_func)nfsd4_layoutget,
.op_flags = OP_MODIFIES_SOMETHING,
.op_name = "OP_LAYOUTGET",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_layoutget_rsize,
},
[OP_LAYOUTCOMMIT] = {
.op_func = (nfsd4op_func)nfsd4_layoutcommit,
.op_flags = OP_MODIFIES_SOMETHING,
.op_name = "OP_LAYOUTCOMMIT",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_layoutcommit_rsize,
},
[OP_LAYOUTRETURN] = {
.op_func = (nfsd4op_func)nfsd4_layoutreturn,
.op_flags = OP_MODIFIES_SOMETHING,
.op_name = "OP_LAYOUTRETURN",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_layoutreturn_rsize,
},
#endif /* CONFIG_NFSD_PNFS */
/* NFSv4.2 operations */
[OP_ALLOCATE] = {
......
......@@ -48,6 +48,7 @@
#include "current_stateid.h"
#include "netns.h"
#include "pnfs.h"
#define NFSDDBG_FACILITY NFSDDBG_PROC
......@@ -150,16 +151,6 @@ renew_client_locked(struct nfs4_client *clp)
clp->cl_time = get_seconds();
}
static inline void
renew_client(struct nfs4_client *clp)
{
struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
spin_lock(&nn->client_lock);
renew_client_locked(clp);
spin_unlock(&nn->client_lock);
}
static void put_client_renew_locked(struct nfs4_client *clp)
{
struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
......@@ -282,7 +273,7 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
kmem_cache_free(file_slab, fp);
}
static inline void
void
put_nfs4_file(struct nfs4_file *fi)
{
might_lock(&state_lock);
......@@ -295,12 +286,6 @@ put_nfs4_file(struct nfs4_file *fi)
}
}
static inline void
get_nfs4_file(struct nfs4_file *fi)
{
atomic_inc(&fi->fi_ref);
}
static struct file *
__nfs4_get_fd(struct nfs4_file *f, int oflag)
{
......@@ -358,7 +343,7 @@ find_readable_file(struct nfs4_file *f)
return ret;
}
static struct file *
struct file *
find_any_file(struct nfs4_file *f)
{
struct file *ret;
......@@ -408,14 +393,6 @@ static unsigned int file_hashval(struct knfsd_fh *fh)
return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1);
}
static bool nfsd_fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
{
return fh1->fh_size == fh2->fh_size &&
!memcmp(fh1->fh_base.fh_pad,
fh2->fh_base.fh_pad,
fh1->fh_size);
}
static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
static void
......@@ -494,7 +471,7 @@ static void nfs4_file_put_access(struct nfs4_file *fp, u32 access)
__nfs4_file_put_access(fp, O_RDONLY);
}
static struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl,
struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl,
struct kmem_cache *slab)
{
struct nfs4_stid *stid;
......@@ -688,17 +665,17 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
struct file *filp = NULL;
spin_lock(&fp->fi_lock);
if (fp->fi_deleg_file && atomic_dec_and_test(&fp->fi_delegees))
if (fp->fi_deleg_file && --fp->fi_delegees == 0)
swap(filp, fp->fi_deleg_file);
spin_unlock(&fp->fi_lock);
if (filp) {
vfs_setlease(filp, F_UNLCK, NULL, NULL);
vfs_setlease(filp, F_UNLCK, NULL, (void **)&fp);
fput(filp);
}
}
static void unhash_stid(struct nfs4_stid *s)
void nfs4_unhash_stid(struct nfs4_stid *s)
{
s->sc_type = 0;
}
......@@ -1006,7 +983,7 @@ static void unhash_lock_stateid(struct nfs4_ol_stateid *stp)
list_del_init(&stp->st_locks);
unhash_ol_stateid(stp);
unhash_stid(&stp->st_stid);
nfs4_unhash_stid(&stp->st_stid);
}
static void release_lock_stateid(struct nfs4_ol_stateid *stp)
......@@ -1518,7 +1495,12 @@ unhash_session(struct nfsd4_session *ses)
static int
STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
{
if (clid->cl_boot == nn->boot_time)
/*
* We're assuming the clid was not given out from a boot
* precisely 2^32 (about 136 years) before this one. That seems
* a safe assumption:
*/
if (clid->cl_boot == (u32)nn->boot_time)
return 0;
dprintk("NFSD stale clientid (%08x/%08x) boot_time %08lx\n",
clid->cl_boot, clid->cl_id, nn->boot_time);
......@@ -1558,6 +1540,9 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
INIT_LIST_HEAD(&clp->cl_lru);
INIT_LIST_HEAD(&clp->cl_callbacks);
INIT_LIST_HEAD(&clp->cl_revoked);
#ifdef CONFIG_NFSD_PNFS
INIT_LIST_HEAD(&clp->cl_lo_states);
#endif
spin_lock_init(&clp->cl_lock);
rpc_init_wait_queue(&clp->cl_cb_waitq, "Backchannel slot table");
return clp;
......@@ -1662,6 +1647,7 @@ __destroy_client(struct nfs4_client *clp)
nfs4_get_stateowner(&oo->oo_owner);
release_openowner(oo);
}
nfsd4_return_all_client_layouts(clp);
nfsd4_shutdown_callback(clp);
if (clp->cl_cb_conn.cb_xprt)
svc_xprt_put(clp->cl_cb_conn.cb_xprt);
......@@ -2145,8 +2131,11 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
static void
nfsd4_set_ex_flags(struct nfs4_client *new, struct nfsd4_exchange_id *clid)
{
/* pNFS is not supported */
#ifdef CONFIG_NFSD_PNFS
new->cl_exchange_flags |= EXCHGID4_FLAG_USE_PNFS_MDS;
#else
new->cl_exchange_flags |= EXCHGID4_FLAG_USE_NON_PNFS;
#endif
/* Referrals are supported, Migration is not. */
new->cl_exchange_flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER;
......@@ -3074,6 +3063,10 @@ static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval,
fp->fi_share_deny = 0;
memset(fp->fi_fds, 0, sizeof(fp->fi_fds));
memset(fp->fi_access, 0, sizeof(fp->fi_access));
#ifdef CONFIG_NFSD_PNFS
INIT_LIST_HEAD(&fp->fi_lo_states);
atomic_set(&fp->fi_lo_recalls, 0);
#endif
hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
}
......@@ -3300,7 +3293,7 @@ find_file_locked(struct knfsd_fh *fh, unsigned int hashval)
struct nfs4_file *fp;
hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash) {
if (nfsd_fh_match(&fp->fi_fhandle, fh)) {
if (fh_match(&fp->fi_fhandle, fh)) {
if (atomic_inc_not_zero(&fp->fi_ref))
return fp;
}
......@@ -3308,7 +3301,7 @@ find_file_locked(struct knfsd_fh *fh, unsigned int hashval)
return NULL;
}
static struct nfs4_file *
struct nfs4_file *
find_file(struct knfsd_fh *fh)
{
struct nfs4_file *fp;
......@@ -3856,12 +3849,12 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
/* Race breaker */
if (fp->fi_deleg_file) {
status = 0;
atomic_inc(&fp->fi_delegees);
++fp->fi_delegees;
hash_delegation_locked(dp, fp);
goto out_unlock;
}
fp->fi_deleg_file = filp;
atomic_set(&fp->fi_delegees, 1);
fp->fi_delegees = 1;
hash_delegation_locked(dp, fp);
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
......@@ -3902,7 +3895,7 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
status = -EAGAIN;
goto out_unlock;
}
atomic_inc(&fp->fi_delegees);
++fp->fi_delegees;
hash_delegation_locked(dp, fp);
status = 0;
out_unlock:
......@@ -4295,7 +4288,7 @@ laundromat_main(struct work_struct *laundry)
static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_ol_stateid *stp)
{
if (!nfsd_fh_match(&fhp->fh_handle, &stp->st_stid.sc_file->fi_fhandle))
if (!fh_match(&fhp->fh_handle, &stp->st_stid.sc_file->fi_fhandle))
return nfserr_bad_stateid;
return nfs_ok;
}
......@@ -4446,7 +4439,7 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
return status;
}
static __be32
__be32
nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
stateid_t *stateid, unsigned char typemask,
struct nfs4_stid **s, struct nfsd_net *nn)
......@@ -4860,6 +4853,9 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
update_stateid(&stp->st_stid.sc_stateid);
memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
nfsd4_return_all_file_layouts(stp->st_stateowner->so_client,
stp->st_stid.sc_file);
nfsd4_close_open_stateid(stp);
/* put reference from nfs4_preprocess_seqid_op */
......
......@@ -21,6 +21,7 @@
#include "cache.h"
#include "state.h"
#include "netns.h"
#include "pnfs.h"
/*
* We have a single directory with several nodes in it.
......@@ -1258,9 +1259,12 @@ static int __init init_nfsd(void)
retval = nfsd4_init_slabs();
if (retval)
goto out_unregister_pernet;
retval = nfsd_fault_inject_init(); /* nfsd fault injection controls */
retval = nfsd4_init_pnfs();
if (retval)
goto out_free_slabs;
retval = nfsd_fault_inject_init(); /* nfsd fault injection controls */
if (retval)
goto out_exit_pnfs;
nfsd_stat_init(); /* Statistics */
retval = nfsd_reply_cache_init();
if (retval)
......@@ -1282,6 +1286,8 @@ static int __init init_nfsd(void)
out_free_stat:
nfsd_stat_shutdown();
nfsd_fault_inject_cleanup();
out_exit_pnfs:
nfsd4_exit_pnfs();
out_free_slabs:
nfsd4_free_slabs();
out_unregister_pernet:
......@@ -1299,6 +1305,7 @@ static void __exit exit_nfsd(void)
nfsd_stat_shutdown();
nfsd_lockd_shutdown();
nfsd4_free_slabs();
nfsd4_exit_pnfs();
nfsd_fault_inject_cleanup();
unregister_filesystem(&nfsd_fs_type);
unregister_pernet_subsys(&nfsd_net_ops);
......
......@@ -325,15 +325,27 @@ void nfsd_lockd_shutdown(void);
#define NFSD4_SUPPORTED_ATTRS_WORD2 0
/* 4.1 */
#ifdef CONFIG_NFSD_PNFS
#define PNFSD_SUPPORTED_ATTRS_WORD1 FATTR4_WORD1_FS_LAYOUT_TYPES
#define PNFSD_SUPPORTED_ATTRS_WORD2 \
(FATTR4_WORD2_LAYOUT_BLKSIZE | FATTR4_WORD2_LAYOUT_TYPES)
#else
#define PNFSD_SUPPORTED_ATTRS_WORD1 0
#define PNFSD_SUPPORTED_ATTRS_WORD2 0
#endif /* CONFIG_NFSD_PNFS */
#define NFSD4_1_SUPPORTED_ATTRS_WORD0 \
NFSD4_SUPPORTED_ATTRS_WORD0
#define NFSD4_1_SUPPORTED_ATTRS_WORD1 \
NFSD4_SUPPORTED_ATTRS_WORD1
(NFSD4_SUPPORTED_ATTRS_WORD1 | PNFSD_SUPPORTED_ATTRS_WORD1)
#define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
(NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT)
(NFSD4_SUPPORTED_ATTRS_WORD2 | PNFSD_SUPPORTED_ATTRS_WORD2 | \
FATTR4_WORD2_SUPPATTR_EXCLCREAT)
/* 4.2 */
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#define NFSD4_2_SECURITY_ATTRS FATTR4_WORD2_SECURITY_LABEL
#else
......
......@@ -187,6 +187,24 @@ fh_init(struct svc_fh *fhp, int maxsize)
return fhp;
}
static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
{
if (fh1->fh_size != fh2->fh_size)
return false;
if (memcmp(fh1->fh_base.fh_pad, fh2->fh_base.fh_pad, fh1->fh_size) != 0)
return false;
return true;
}
static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
{
if (fh1->fh_fsid_type != fh2->fh_fsid_type)
return false;
if (memcmp(fh1->fh_fsid, fh2->fh_fsid, key_len(fh1->fh_fsid_type)) != 0)
return false;
return true;
}
#ifdef CONFIG_NFSD_V3
/*
* The wcc data stored in current_fh should be cleared
......
......@@ -119,6 +119,7 @@ struct svc_program nfsd_program = {
static bool nfsd_supported_minorversions[NFSD_SUPPORTED_MINOR_VERSION + 1] = {
[0] = 1,
[1] = 1,
[2] = 1,
};
int nfsd_vers(int vers, enum vers_op change)
......
#ifndef _FS_NFSD_PNFS_H
#define _FS_NFSD_PNFS_H 1
#include <linux/exportfs.h>
#include <linux/nfsd/export.h>
#include "state.h"
#include "xdr4.h"
struct xdr_stream;
struct nfsd4_deviceid_map {
struct list_head hash;
u64 idx;
int fsid_type;
u32 fsid[];
};
struct nfsd4_layout_ops {
u32 notify_types;
__be32 (*proc_getdeviceinfo)(struct super_block *sb,
struct nfsd4_getdeviceinfo *gdevp);
__be32 (*encode_getdeviceinfo)(struct xdr_stream *xdr,
struct nfsd4_getdeviceinfo *gdevp);
__be32 (*proc_layoutget)(struct inode *, const struct svc_fh *fhp,
struct nfsd4_layoutget *lgp);
__be32 (*encode_layoutget)(struct xdr_stream *,
struct nfsd4_layoutget *lgp);
__be32 (*proc_layoutcommit)(struct inode *inode,
struct nfsd4_layoutcommit *lcp);
};
extern const struct nfsd4_layout_ops *nfsd4_layout_ops[];
extern const struct nfsd4_layout_ops bl_layout_ops;
__be32 nfsd4_preprocess_layout_stateid(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, stateid_t *stateid,
bool create, u32 layout_type, struct nfs4_layout_stateid **lsp);
__be32 nfsd4_insert_layout(struct nfsd4_layoutget *lgp,
struct nfs4_layout_stateid *ls);
__be32 nfsd4_return_file_layouts(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
struct nfsd4_layoutreturn *lrp);
__be32 nfsd4_return_client_layouts(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
struct nfsd4_layoutreturn *lrp);
int nfsd4_set_deviceid(struct nfsd4_deviceid *id, const struct svc_fh *fhp,
u32 device_generation);
struct nfsd4_deviceid_map *nfsd4_find_devid_map(int idx);
#ifdef CONFIG_NFSD_PNFS
void nfsd4_setup_layout_type(struct svc_export *exp);
void nfsd4_return_all_client_layouts(struct nfs4_client *);
void nfsd4_return_all_file_layouts(struct nfs4_client *clp,
struct nfs4_file *fp);
int nfsd4_init_pnfs(void);
void nfsd4_exit_pnfs(void);
#else
static inline void nfsd4_setup_layout_type(struct svc_export *exp)
{
}
static inline void nfsd4_return_all_client_layouts(struct nfs4_client *clp)
{
}
static inline void nfsd4_return_all_file_layouts(struct nfs4_client *clp,
struct nfs4_file *fp)
{
}
static inline void nfsd4_exit_pnfs(void)
{
}
static inline int nfsd4_init_pnfs(void)
{
return 0;
}
#endif /* CONFIG_NFSD_PNFS */
#endif /* _FS_NFSD_PNFS_H */
......@@ -92,6 +92,7 @@ struct nfs4_stid {
/* For a deleg stateid kept around only to process free_stateid's: */
#define NFS4_REVOKED_DELEG_STID 16
#define NFS4_CLOSED_DELEG_STID 32
#define NFS4_LAYOUT_STID 64
unsigned char sc_type;
stateid_t sc_stateid;
struct nfs4_client *sc_client;
......@@ -297,6 +298,9 @@ struct nfs4_client {
struct list_head cl_delegations;
struct list_head cl_revoked; /* unacknowledged, revoked 4.1 state */
struct list_head cl_lru; /* tail queue */
#ifdef CONFIG_NFSD_PNFS
struct list_head cl_lo_states; /* outstanding layout states */
#endif
struct xdr_netobj cl_name; /* id generated by client */
nfs4_verifier cl_verifier; /* generated by client */
time_t cl_time; /* time of last lease renewal */
......@@ -493,9 +497,13 @@ struct nfs4_file {
atomic_t fi_access[2];
u32 fi_share_deny;
struct file *fi_deleg_file;
atomic_t fi_delegees;
int fi_delegees;
struct knfsd_fh fi_fhandle;
bool fi_had_conflict;
#ifdef CONFIG_NFSD_PNFS
struct list_head fi_lo_states;
atomic_t fi_lo_recalls;
#endif
};
/*
......@@ -528,6 +536,24 @@ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
return container_of(s, struct nfs4_ol_stateid, st_stid);
}
struct nfs4_layout_stateid {
struct nfs4_stid ls_stid;
struct list_head ls_perclnt;
struct list_head ls_perfile;
spinlock_t ls_lock;
struct list_head ls_layouts;
u32 ls_layout_type;
struct file *ls_file;
struct nfsd4_callback ls_recall;
stateid_t ls_recall_sid;
bool ls_recalled;
};
static inline struct nfs4_layout_stateid *layoutstateid(struct nfs4_stid *s)
{
return container_of(s, struct nfs4_layout_stateid, ls_stid);
}
/* flags for preprocess_seqid_op() */
#define RD_STATE 0x00000010
#define WR_STATE 0x00000020
......@@ -535,6 +561,7 @@ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
enum nfsd4_cb_op {
NFSPROC4_CLNT_CB_NULL = 0,
NFSPROC4_CLNT_CB_RECALL,
NFSPROC4_CLNT_CB_LAYOUT,
NFSPROC4_CLNT_CB_SEQUENCE,
};
......@@ -545,6 +572,12 @@ struct nfsd_net;
extern __be32 nfs4_preprocess_stateid_op(struct net *net,
struct nfsd4_compound_state *cstate,
stateid_t *stateid, int flags, struct file **filp);
__be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
stateid_t *stateid, unsigned char typemask,
struct nfs4_stid **s, struct nfsd_net *nn);
struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl,
struct kmem_cache *slab);
void nfs4_unhash_stid(struct nfs4_stid *s);
void nfs4_put_stid(struct nfs4_stid *s);
void nfs4_remove_reclaim_record(struct nfs4_client_reclaim *, struct nfsd_net *);
extern void nfs4_release_reclaim(struct nfsd_net *);
......@@ -567,6 +600,14 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
struct nfsd_net *nn);
extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
struct nfs4_file *find_file(struct knfsd_fh *fh);
void put_nfs4_file(struct nfs4_file *fi);
static inline void get_nfs4_file(struct nfs4_file *fi)
{
atomic_inc(&fi->fi_ref);
}
struct file *find_any_file(struct nfs4_file *f);
/* grace period management */
void nfsd4_end_grace(struct nfsd_net *nn);
......
#include "state.h"
#define CREATE_TRACE_POINTS
#include "trace.h"
/*
* Copyright (c) 2014 Christoph Hellwig.
*/
#undef TRACE_SYSTEM
#define TRACE_SYSTEM nfsd
#if !defined(_NFSD_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _NFSD_TRACE_H
#include <linux/tracepoint.h>
DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
TP_ARGS(stp),
TP_STRUCT__entry(
__field(u32, cl_boot)
__field(u32, cl_id)
__field(u32, si_id)
__field(u32, si_generation)
),
TP_fast_assign(
__entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
__entry->cl_id = stp->si_opaque.so_clid.cl_id;
__entry->si_id = stp->si_opaque.so_id;
__entry->si_generation = stp->si_generation;
),
TP_printk("client %08x:%08x stateid %08x:%08x",
__entry->cl_boot,
__entry->cl_id,
__entry->si_id,
__entry->si_generation)
)
#define DEFINE_STATEID_EVENT(name) \
DEFINE_EVENT(nfsd_stateid_class, name, \
TP_PROTO(stateid_t *stp), \
TP_ARGS(stp))
DEFINE_STATEID_EVENT(layoutstate_alloc);
DEFINE_STATEID_EVENT(layoutstate_unhash);
DEFINE_STATEID_EVENT(layoutstate_free);
DEFINE_STATEID_EVENT(layout_get_lookup_fail);
DEFINE_STATEID_EVENT(layout_commit_lookup_fail);
DEFINE_STATEID_EVENT(layout_return_lookup_fail);
DEFINE_STATEID_EVENT(layout_recall);
DEFINE_STATEID_EVENT(layout_recall_done);
DEFINE_STATEID_EVENT(layout_recall_fail);
DEFINE_STATEID_EVENT(layout_recall_release);
#endif /* _NFSD_TRACE_H */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#define TRACE_INCLUDE_FILE trace
#include <trace/define_trace.h>
......@@ -428,6 +428,61 @@ struct nfsd4_reclaim_complete {
u32 rca_one_fs;
};
struct nfsd4_deviceid {
u64 fsid_idx;
u32 generation;
u32 pad;
};
struct nfsd4_layout_seg {
u32 iomode;
u64 offset;
u64 length;
};
struct nfsd4_getdeviceinfo {
struct nfsd4_deviceid gd_devid; /* request */
u32 gd_layout_type; /* request */
u32 gd_maxcount; /* request */
u32 gd_notify_types;/* request - response */
void *gd_device; /* response */
};
struct nfsd4_layoutget {
u64 lg_minlength; /* request */
u32 lg_signal; /* request */
u32 lg_layout_type; /* request */
u32 lg_maxcount; /* request */
stateid_t lg_sid; /* request/response */
struct nfsd4_layout_seg lg_seg; /* request/response */
void *lg_content; /* response */
};
struct nfsd4_layoutcommit {
stateid_t lc_sid; /* request */
struct nfsd4_layout_seg lc_seg; /* request */
u32 lc_reclaim; /* request */
u32 lc_newoffset; /* request */
u64 lc_last_wr; /* request */
struct timespec lc_mtime; /* request */
u32 lc_layout_type; /* request */
u32 lc_up_len; /* layout length */
void *lc_up_layout; /* decoded by callback */
u32 lc_size_chg; /* boolean for response */
u64 lc_newsize; /* response */
};
struct nfsd4_layoutreturn {
u32 lr_return_type; /* request */
u32 lr_layout_type; /* request */
struct nfsd4_layout_seg lr_seg; /* request */
u32 lr_reclaim; /* request */
u32 lrf_body_len; /* request */
void *lrf_body; /* request */
stateid_t lr_sid; /* request/response */
u32 lrs_present; /* response */
};
struct nfsd4_fallocate {
/* request */
stateid_t falloc_stateid;
......@@ -491,6 +546,10 @@ struct nfsd4_op {
struct nfsd4_reclaim_complete reclaim_complete;
struct nfsd4_test_stateid test_stateid;
struct nfsd4_free_stateid free_stateid;
struct nfsd4_getdeviceinfo getdeviceinfo;
struct nfsd4_layoutget layoutget;
struct nfsd4_layoutcommit layoutcommit;
struct nfsd4_layoutreturn layoutreturn;
/* NFSv4.2 */
struct nfsd4_fallocate allocate;
......
......@@ -21,3 +21,10 @@
#define NFS4_dec_cb_recall_sz (cb_compound_dec_hdr_sz + \
cb_sequence_dec_sz + \
op_dec_sz)
#define NFS4_enc_cb_layout_sz (cb_compound_enc_hdr_sz + \
cb_sequence_enc_sz + \
1 + 3 + \
enc_nfs4_fh_sz + 4)
#define NFS4_dec_cb_layout_sz (cb_compound_dec_hdr_sz + \
cb_sequence_dec_sz + \
op_dec_sz)
......@@ -4,6 +4,7 @@
#include <linux/types.h>
struct dentry;
struct iattr;
struct inode;
struct super_block;
struct vfsmount;
......@@ -180,6 +181,21 @@ struct fid {
* get_name is not (which is possibly inconsistent)
*/
/* types of block ranges for multipage write mappings. */
#define IOMAP_HOLE 0x01 /* no blocks allocated, need allocation */
#define IOMAP_DELALLOC 0x02 /* delayed allocation blocks */
#define IOMAP_MAPPED 0x03 /* blocks allocated @blkno */
#define IOMAP_UNWRITTEN 0x04 /* blocks allocated @blkno in unwritten state */
#define IOMAP_NULL_BLOCK -1LL /* blkno is not valid */
struct iomap {
sector_t blkno; /* first sector of mapping */
loff_t offset; /* file offset of mapping, bytes */
u64 length; /* length of mapping, bytes */
int type; /* type of mapping */
};
struct export_operations {
int (*encode_fh)(struct inode *inode, __u32 *fh, int *max_len,
struct inode *parent);
......@@ -191,6 +207,13 @@ struct export_operations {
struct dentry *child);
struct dentry * (*get_parent)(struct dentry *child);
int (*commit_metadata)(struct inode *inode);
int (*get_uuid)(struct super_block *sb, u8 *buf, u32 *len, u64 *offset);
int (*map_blocks)(struct inode *inode, loff_t offset,
u64 len, struct iomap *iomap,
bool write, u32 *device_generation);
int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
int nr_iomaps, struct iattr *iattr);
};
extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
......
......@@ -873,6 +873,7 @@ static inline struct file *get_file(struct file *f)
#define FL_DOWNGRADE_PENDING 256 /* Lease is being downgraded */
#define FL_UNLOCK_PENDING 512 /* Lease is being broken */
#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
#define FL_LAYOUT 2048 /* outstanding pNFS layout */
/*
* Special return value from posix_lock_file() and vfs_lock_file() for
......@@ -2035,6 +2036,16 @@ static inline int break_deleg_wait(struct inode **delegated_inode)
return ret;
}
static inline int break_layout(struct inode *inode, bool wait)
{
smp_mb();
if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
return __break_lease(inode,
wait ? O_WRONLY : O_WRONLY | O_NONBLOCK,
FL_LAYOUT);
return 0;
}
#else /* !CONFIG_FILE_LOCKING */
static inline int locks_mandatory_locked(struct file *file)
{
......@@ -2090,6 +2101,11 @@ static inline int break_deleg_wait(struct inode **delegated_inode)
return 0;
}
static inline int break_layout(struct inode *inode, bool wait)
{
return 0;
}
#endif /* CONFIG_FILE_LOCKING */
/* fs/open.c */
......
......@@ -411,6 +411,7 @@ enum lock_type4 {
#define FATTR4_WORD1_TIME_MODIFY_SET (1UL << 22)
#define FATTR4_WORD1_MOUNTED_ON_FILEID (1UL << 23)
#define FATTR4_WORD1_FS_LAYOUT_TYPES (1UL << 30)
#define FATTR4_WORD2_LAYOUT_TYPES (1UL << 0)
#define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1)
#define FATTR4_WORD2_MDSTHRESHOLD (1UL << 4)
#define FATTR4_WORD2_SECURITY_LABEL (1UL << 16)
......@@ -517,6 +518,7 @@ enum pnfs_layouttype {
LAYOUT_OSD2_OBJECTS = 2,
LAYOUT_BLOCK_VOLUME = 3,
LAYOUT_FLEX_FILES = 4,
LAYOUT_TYPE_MAX
};
/* used for both layout return and recall */
......
......@@ -110,7 +110,7 @@ struct svc_serv {
* We use sv_nrthreads as a reference count. svc_destroy() drops
* this refcount, so we need to bump it up around operations that
* change the number of threads. Horrible, but there it is.
* Should be called with the BKL held.
* Should be called with the "service mutex" held.
*/
static inline void svc_get(struct svc_serv *serv)
{
......
......@@ -77,6 +77,7 @@ struct svc_rdma_op_ctxt {
enum ib_wr_opcode wr_op;
enum ib_wc_status wc_status;
u32 byte_len;
u32 position;
struct svcxprt_rdma *xprt;
unsigned long flags;
enum dma_data_direction direction;
......@@ -148,6 +149,10 @@ struct svcxprt_rdma {
struct ib_cq *sc_rq_cq;
struct ib_cq *sc_sq_cq;
struct ib_mr *sc_phys_mr; /* MR for server memory */
int (*sc_reader)(struct svcxprt_rdma *,
struct svc_rqst *,
struct svc_rdma_op_ctxt *,
int *, u32 *, u32, u32, u64, bool);
u32 sc_dev_caps; /* distilled device caps */
u32 sc_dma_lkey; /* local dma key */
unsigned int sc_frmr_pg_list_len;
......@@ -176,8 +181,6 @@ struct svcxprt_rdma {
#define RPCRDMA_MAX_REQ_SIZE 4096
/* svc_rdma_marshal.c */
extern void svc_rdma_rcl_chunk_counts(struct rpcrdma_read_chunk *,
int *, int *);
extern int svc_rdma_xdr_decode_req(struct rpcrdma_msg **, struct svc_rqst *);
extern int svc_rdma_xdr_decode_deferred_req(struct svc_rqst *);
extern int svc_rdma_xdr_encode_error(struct svcxprt_rdma *,
......@@ -195,6 +198,12 @@ extern int svc_rdma_xdr_get_reply_hdr_len(struct rpcrdma_msg *);
/* svc_rdma_recvfrom.c */
extern int svc_rdma_recvfrom(struct svc_rqst *);
extern int rdma_read_chunk_lcl(struct svcxprt_rdma *, struct svc_rqst *,
struct svc_rdma_op_ctxt *, int *, u32 *,
u32, u32, u64, bool);
extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,
struct svc_rdma_op_ctxt *, int *, u32 *,
u32, u32, u64, bool);
/* svc_rdma_sendto.c */
extern int svc_rdma_sendto(struct svc_rqst *);
......
......@@ -32,6 +32,7 @@
#define NFSDDBG_REPCACHE 0x0080
#define NFSDDBG_XDR 0x0100
#define NFSDDBG_LOCKD 0x0200
#define NFSDDBG_PNFS 0x0400
#define NFSDDBG_ALL 0x7FFF
#define NFSDDBG_NOCHANGE 0xFFFF
......
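NFSDDBG_PNFS gives the new pNFS code its own debug facility bit (0x0400, i.e. 1024 in the nfsd_debug bitmask exposed under /proc/sys/sunrpc/). A pNFS source file opts in the usual way (sketch):

	/* Route this file's dprintk()s through the new facility. */
	#define NFSDDBG_FACILITY	NFSDDBG_PNFS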
......@@ -47,8 +47,10 @@
* exported filesystem.
*/
#define NFSEXP_V4ROOT 0x10000
#define NFSEXP_NOPNFS 0x20000
/* All flags that we claim to support. (Note we don't support NOACL.) */
#define NFSEXP_ALLFLAGS 0x1FE7F
#define NFSEXP_ALLFLAGS 0x3FE7F
/* The flags that may vary depending on security flavor: */
#define NFSEXP_SECINFO_FLAGS (NFSEXP_READONLY | NFSEXP_ROOTSQUASH \
......
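NFSEXP_NOPNFS lets an administrator switch pNFS layouts off per export; inside the server the policy then becomes a single bit test, roughly as below (helper name illustrative, sketch only):

	/* Sketch: should this export hand out pNFS layouts? */
	static bool export_allows_pnfs(struct svc_export *exp)
	{
		return !(exp->ex_flags & NFSEXP_NOPNFS);
	}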
......@@ -768,8 +768,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
EXPORT_SYMBOL_GPL(svc_set_num_threads);
/*
* Called from a server thread as it's exiting. Caller must hold the BKL or
* the "service mutex", whichever is appropriate for the service.
* Called from a server thread as it's exiting. Caller must hold the "service
* mutex" for the service.
*/
void
svc_exit_thread(struct svc_rqst *rqstp)
......
......@@ -42,7 +42,7 @@ static LIST_HEAD(svc_xprt_class_list);
* svc_pool->sp_lock protects most of the fields of that pool.
* svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
* when both need to be taken (rare), svc_serv->sv_lock is first.
* BKL protects svc_serv->sv_nrthread.
* The "service mutex" protects svc_serv->sv_nrthread.
* svc_sock->sk_lock protects the svc_sock->sk_deferred list
* and the ->sk_info_authunix cache.
*
......@@ -67,7 +67,6 @@ static LIST_HEAD(svc_xprt_class_list);
* that no other thread will be using the transport or will
* try to set XPT_DEAD.
*/
int svc_reg_xprt_class(struct svc_xprt_class *xcl)
{
struct svc_xprt_class *cl;
......
......@@ -70,22 +70,6 @@ static u32 *decode_read_list(u32 *va, u32 *vaend)
return (u32 *)&ch->rc_position;
}
/*
* Determine number of chunks and total bytes in chunk list. The chunk
* list has already been verified to fit within the RPCRDMA header.
*/
void svc_rdma_rcl_chunk_counts(struct rpcrdma_read_chunk *ch,
int *ch_count, int *byte_count)
{
/* compute the number of bytes represented by read chunks */
*byte_count = 0;
*ch_count = 0;
for (; ch->rc_discrim != 0; ch++) {
*byte_count = *byte_count + ntohl(ch->rc_target.rs_length);
*ch_count = *ch_count + 1;
}
}
/*
* Decodes a write chunk list. The expected format is as follows:
* descrim : xdr_one
......
......@@ -60,8 +60,11 @@ static int map_xdr(struct svcxprt_rdma *xprt,
u32 page_off;
int page_no;
BUG_ON(xdr->len !=
(xdr->head[0].iov_len + xdr->page_len + xdr->tail[0].iov_len));
if (xdr->len !=
(xdr->head[0].iov_len + xdr->page_len + xdr->tail[0].iov_len)) {
pr_err("svcrdma: map_xdr: XDR buffer length error\n");
return -EIO;
}
/* Skip the first sge, this is for the RPCRDMA header */
sge_no = 1;
......@@ -150,7 +153,11 @@ static int send_write(struct svcxprt_rdma *xprt, struct svc_rqst *rqstp,
int bc;
struct svc_rdma_op_ctxt *ctxt;
BUG_ON(vec->count > RPCSVC_MAXPAGES);
if (vec->count > RPCSVC_MAXPAGES) {
pr_err("svcrdma: Too many pages (%lu)\n", vec->count);
return -EIO;
}
dprintk("svcrdma: RDMA_WRITE rmr=%x, to=%llx, xdr_off=%d, "
"write_len=%d, vec->sge=%p, vec->count=%lu\n",
rmr, (unsigned long long)to, xdr_off,
......@@ -190,7 +197,10 @@ static int send_write(struct svcxprt_rdma *xprt, struct svc_rqst *rqstp,
sge_off = 0;
sge_no++;
xdr_sge_no++;
BUG_ON(xdr_sge_no > vec->count);
if (xdr_sge_no > vec->count) {
pr_err("svcrdma: Too many sges (%d)\n", xdr_sge_no);
goto err;
}
bc -= sge_bytes;
if (sge_no == xprt->sc_max_sge)
break;
......@@ -421,7 +431,10 @@ static int send_reply(struct svcxprt_rdma *rdma,
ctxt->sge[sge_no].lkey = rdma->sc_dma_lkey;
ctxt->sge[sge_no].length = sge_bytes;
}
BUG_ON(byte_count != 0);
if (byte_count != 0) {
pr_err("svcrdma: Could not map %d bytes\n", byte_count);
goto err;
}
/* Save all respages in the ctxt and remove them from the
* respages array. They are our pages until the I/O
......@@ -442,7 +455,10 @@ static int send_reply(struct svcxprt_rdma *rdma,
}
rqstp->rq_next_page = rqstp->rq_respages + 1;
BUG_ON(sge_no > rdma->sc_max_sge);
if (sge_no > rdma->sc_max_sge) {
pr_err("svcrdma: Too many sges (%d)\n", sge_no);
goto err;
}
memset(&send_wr, 0, sizeof send_wr);
ctxt->wr_op = IB_WR_SEND;
send_wr.wr_id = (unsigned long)ctxt;
......@@ -467,18 +483,6 @@ void svc_rdma_prep_reply_hdr(struct svc_rqst *rqstp)
{
}
/*
* Return the start of an xdr buffer.
*/
static void *xdr_start(struct xdr_buf *xdr)
{
return xdr->head[0].iov_base -
(xdr->len -
xdr->page_len -
xdr->tail[0].iov_len -
xdr->head[0].iov_len);
}
int svc_rdma_sendto(struct svc_rqst *rqstp)
{
struct svc_xprt *xprt = rqstp->rq_xprt;
......@@ -496,8 +500,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
dprintk("svcrdma: sending response for rqstp=%p\n", rqstp);
/* Get the RDMA request header. */
rdma_argp = xdr_start(&rqstp->rq_arg);
/* Get the RDMA request header. The receive logic always
* places this at the start of page 0.
*/
rdma_argp = page_address(rqstp->rq_pages[0]);
/* Build an req vec for the XDR */
ctxt = svc_rdma_get_context(rdma);
......
......@@ -139,7 +139,6 @@ void svc_rdma_put_context(struct svc_rdma_op_ctxt *ctxt, int free_pages)
struct svcxprt_rdma *xprt;
int i;
BUG_ON(!ctxt);
xprt = ctxt->xprt;
if (free_pages)
for (i = 0; i < ctxt->count; i++)
......@@ -339,12 +338,14 @@ static void process_context(struct svcxprt_rdma *xprt,
switch (ctxt->wr_op) {
case IB_WR_SEND:
BUG_ON(ctxt->frmr);
if (ctxt->frmr)
pr_err("svcrdma: SEND: ctxt->frmr != NULL\n");
svc_rdma_put_context(ctxt, 1);
break;
case IB_WR_RDMA_WRITE:
BUG_ON(ctxt->frmr);
if (ctxt->frmr)
pr_err("svcrdma: WRITE: ctxt->frmr != NULL\n");
svc_rdma_put_context(ctxt, 0);
break;
......@@ -353,19 +354,21 @@ static void process_context(struct svcxprt_rdma *xprt,
svc_rdma_put_frmr(xprt, ctxt->frmr);
if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
BUG_ON(!read_hdr);
if (read_hdr) {
spin_lock_bh(&xprt->sc_rq_dto_lock);
set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
list_add_tail(&read_hdr->dto_q,
&xprt->sc_read_complete_q);
spin_unlock_bh(&xprt->sc_rq_dto_lock);
} else {
pr_err("svcrdma: ctxt->read_hdr == NULL\n");
}
svc_xprt_enqueue(&xprt->sc_xprt);
}
svc_rdma_put_context(ctxt, 0);
break;
default:
BUG_ON(1);
printk(KERN_ERR "svcrdma: unexpected completion type, "
"opcode=%d\n",
ctxt->wr_op);
......@@ -513,7 +516,10 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
buflen = 0;
ctxt->direction = DMA_FROM_DEVICE;
for (sge_no = 0; buflen < xprt->sc_max_req_size; sge_no++) {
BUG_ON(sge_no >= xprt->sc_max_sge);
if (sge_no >= xprt->sc_max_sge) {
pr_err("svcrdma: Too many sges (%d)\n", sge_no);
goto err_put_ctxt;
}
page = svc_rdma_get_page();
ctxt->pages[sge_no] = page;
pa = ib_dma_map_page(xprt->sc_cm_id->device,
......@@ -687,7 +693,6 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
{
struct rdma_cm_id *listen_id;
struct svcxprt_rdma *cma_xprt;
struct svc_xprt *xprt;
int ret;
dprintk("svcrdma: Creating RDMA socket\n");
......@@ -698,7 +703,6 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
cma_xprt = rdma_create_xprt(serv, 1);
if (!cma_xprt)
return ERR_PTR(-ENOMEM);
xprt = &cma_xprt->sc_xprt;
listen_id = rdma_create_id(rdma_listen_handler, cma_xprt, RDMA_PS_TCP,
IB_QPT_RC);
......@@ -822,7 +826,7 @@ void svc_rdma_put_frmr(struct svcxprt_rdma *rdma,
if (frmr) {
frmr_unmap_dma(rdma, frmr);
spin_lock_bh(&rdma->sc_frmr_q_lock);
BUG_ON(!list_empty(&frmr->frmr_list));
WARN_ON_ONCE(!list_empty(&frmr->frmr_list));
list_add(&frmr->frmr_list, &rdma->sc_frmr_q);
spin_unlock_bh(&rdma->sc_frmr_q_lock);
}
......@@ -970,10 +974,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
* NB: iWARP requires remote write access for the data sink
* of an RDMA_READ. IB does not.
*/
newxprt->sc_reader = rdma_read_chunk_lcl;
if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
newxprt->sc_frmr_pg_list_len =
devattr.max_fast_reg_page_list_len;
newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
newxprt->sc_reader = rdma_read_chunk_frmr;
}
/*
......@@ -1125,7 +1131,9 @@ static void __svc_rdma_free(struct work_struct *work)
dprintk("svcrdma: svc_rdma_free(%p)\n", rdma);
/* We should only be called from kref_put */
BUG_ON(atomic_read(&rdma->sc_xprt.xpt_ref.refcount) != 0);
if (atomic_read(&rdma->sc_xprt.xpt_ref.refcount) != 0)
pr_err("svcrdma: sc_xprt still in use? (%d)\n",
atomic_read(&rdma->sc_xprt.xpt_ref.refcount));
/*
* Destroy queued, but not processed read completions. Note
......@@ -1153,8 +1161,12 @@ static void __svc_rdma_free(struct work_struct *work)
}
/* Warn if we leaked a resource or under-referenced */
WARN_ON(atomic_read(&rdma->sc_ctxt_used) != 0);
WARN_ON(atomic_read(&rdma->sc_dma_used) != 0);
if (atomic_read(&rdma->sc_ctxt_used) != 0)
pr_err("svcrdma: ctxt still in use? (%d)\n",
atomic_read(&rdma->sc_ctxt_used));
if (atomic_read(&rdma->sc_dma_used) != 0)
pr_err("svcrdma: dma still in use? (%d)\n",
atomic_read(&rdma->sc_dma_used));
/* De-allocate fastreg mr */
rdma_dealloc_frmr_q(rdma);
......@@ -1254,7 +1266,6 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
return -ENOTCONN;
BUG_ON(wr->send_flags != IB_SEND_SIGNALED);
wr_count = 1;
for (n_wr = wr->next; n_wr; n_wr = n_wr->next)
wr_count++;
......