Commit c8a6ba01 authored by David Howells, committed by Linus Torvalds

[PATCH] intrinsic automount and mountpoint degradation support

Here's a patch that I worked out with Al Viro that adds support for a
filesystem (such as kAFS) to perform automounting intrinsically without the
need for a userspace daemon.  It also adds support for such mountpoints to be
degraded at the filesystem's behest until they've been untouched long enough
that they'll be removed.

I've a patch (to follow) that removes some #ifdef's from fs/afs/* thus
allowing it to make use of this facility.

There are five pieces to this:

 (1) Any interested filesystem needs to have at least one list to which
     expirable mountpoints can be added.

     Access to this list is governed by the vfsmount_lock.

 (2) When a filesystem wants to create an expirable mount, it calls
     do_kern_mount() to get a handle on the filesystem it wants mounting, and
     then calls do_add_mount() to mount that filesystem on the designated
     mountpoint, supplying the list mentioned in (1) to which the vfsmount
     will be added.

     In kAFS's case, the mountpoint is a directory with a follow_link() method
     defined (fs/afs/mntpt.c). This uses the struct nameidata supplied as an
     argument to determine where the new filesystem should be mounted.

 (3) When something using a vfsmount finishes dealing with it, it calls
     mntput(). This unmarks the vfsmount for immediate expiry.

     There are two criteria for determining if a vfsmount may be expired - it
     mustn't be marked as in use for anything other than being a child of
     another vfsmount, and it must have an expiry mark against it already.

 (4) The filesystem then determines the policy on expiring the mounts created
     in (2). When it feels the need to, it passes the list mentioned in (1) to
     mark_mounts_for_expiry() to request everything on the list be expired.

     This function examines each mount listed. If the vfsmount meets the
     criteria mentioned in (3), then the vfsmount is deleted from the
     namespace and disposed of as for unmounting; otherwise the vfsmount is
     left untouched apart from now bearing an expiration mark if it didn't
     before.

     kAFS's expiration policy is simply to invoke this process at regular
     intervals for all the mounts on its list.

 (5) An expiration facility is also provided to userspace: by calling umount()
     with a MNT_EXPIRE flag, it can make a request to unmount only if the
     mountpoint hasn't been used since the last request and isn't in use now.

     This allows expiration to be driven by userspace instead of by the
     kernel if that is desirable.

     This also means that do_umount() has to use a different version of
     path_release() to everyone else... it can't call mntput() as that clears
     the expiration flag, thus rendering this unachievable; so its version of
     path_release() calls _mntput(), which doesn't do the clear.

My original idea was to give the kernel more knowledge of automounted
things. This avoids a certain problem with stat() on a mountpoint causing it
to mount (for example, do "ls -l /afs" on a machine with kAFS), but Al wanted
it done this way.

> Why is autofs unsuitable?

Because:

 (1) Autofs is flat; AFS requires a tree - mounts on mounts on mounts on
     mounts...

 (2) AFS holds the data as to what the mountpoints are and where they go, and
     these may be cross-links to subtrees beyond your control. It's also not
     trivial to extract a list of mountpoints as is required for autofs.

 (3) Autofs is not namespace safe.

 (4) Ducking back to userspace to get that to do the mount is pretty tricky if
     namespaces are involved.

In fact, autofs may well want to make use of this facility.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 694effe7
Support is available for filesystems that wish to do automounting (such as
kAFS, which can be found in fs/afs/). This facility includes allowing
in-kernel mounts to be performed and mountpoint degradation to be
requested. The latter can also be requested by userspace.
======================
IN-KERNEL AUTOMOUNTING
======================
A filesystem can now mount another filesystem on one of its directories by the
following procedure:
(1) Give the directory a follow_link() operation.
When the directory is accessed, the follow_link op will be called, and
it will be provided with the location of the mountpoint in the nameidata
structure (vfsmount and dentry).
(2) Have the follow_link() op do the following steps:
(a) Call do_kern_mount() to call the appropriate filesystem to set up a
superblock and gain a vfsmount structure representing it.
(b) Copy the nameidata provided as an argument and substitute the dentry
argument into the copy.
(c) Call do_add_mount() to install the new vfsmount into the namespace's
mountpoint tree, thus making it accessible to userspace. Use the
nameidata set up in (b) as the destination.
If the mountpoint will be automatically expired, then do_add_mount()
should also be given the location of an expiration list (see further
down).
(d) Release the path in the nameidata argument and substitute in the new
vfsmount and its root dentry. The ref counts on these will need
incrementing.
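
The order of steps (a) to (d) can be sketched as a userspace model. This is
not kernel code: the structures are cut down to the fields the steps touch,
and stub_kern_mount() and stub_add_mount() are hypothetical stand-ins for
do_kern_mount() and do_add_mount(). The stub for (c) assumes the mount tree
takes its own reference and that the caller's reference is dropped, which is
consistent with the mntput(newmnt) at the end of do_add_mount() in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down userspace stand-ins; the real kernel structures are far larger. */
struct dentry    { int d_count; };
struct vfsmount  { int mnt_count; struct dentry *mnt_root; int on_expiry_list; };
struct nameidata { struct vfsmount *mnt; struct dentry *dentry; };

struct dentry *dget(struct dentry *d)       { d->d_count++; return d; }
void dput(struct dentry *d)                 { d->d_count--; }
struct vfsmount *mntget(struct vfsmount *m) { m->mnt_count++; return m; }
void mntput(struct vfsmount *m)             { m->mnt_count--; }

void path_release(struct nameidata *nd)
{
	dput(nd->dentry);
	mntput(nd->mnt);
}

/* Stub for step (a): hand back a vfsmount holding one reference. */
struct vfsmount *stub_kern_mount(struct vfsmount *m, struct dentry *root)
{
	m->mnt_count = 1;
	m->mnt_root = dget(root);
	m->on_expiry_list = 0;
	return m;
}

/* Stub for step (c): "graft" the mount, note the expiry list, drop caller ref. */
int stub_add_mount(struct vfsmount *newmnt, struct nameidata *nd, int has_list)
{
	(void)nd;			/* destination is not modelled further */
	mntget(newmnt);			/* assumed: reference held by the tree */
	newmnt->on_expiry_list = has_list;
	mntput(newmnt);			/* caller's reference is consumed */
	return 0;
}

/* Model of a follow_link() op on the mountpoint directory mntpt. */
int model_follow_link(struct dentry *mntpt, struct nameidata *nd,
		      struct vfsmount *storage, struct dentry *newroot)
{
	struct vfsmount *newmnt;
	struct nameidata newnd;
	int err;

	assert(mntpt && nd && storage && newroot);
	newmnt = stub_kern_mount(storage, newroot);	/* (a) */
	newnd = *nd;					/* (b) */
	newnd.dentry = mntpt;
	err = stub_add_mount(newmnt, &newnd, 1);	/* (c) */
	if (err)
		return err;
	path_release(nd);				/* (d) */
	nd->mnt = mntget(newmnt);
	nd->dentry = dget(newmnt->mnt_root);
	return 0;
}
```

After the call, the nameidata points at the new vfsmount and its root dentry,
each with one extra reference, and the references on the old path are gone.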
Then from userspace, you can just do something like:
[root@andromeda root]# mount -t afs \#root.afs. /afs
[root@andromeda root]# ls /afs
asd cambridge cambridge.redhat.com grand.central.org
[root@andromeda root]# ls /afs/cambridge
afsdoc
[root@andromeda root]# ls /afs/cambridge/afsdoc/
ChangeLog html LICENSE pdf RELNOTES-1.2.2
And then if you look in the mountpoint catalogue, you'll see something like:
[root@andromeda root]# cat /proc/mounts
...
#root.afs. /afs afs rw 0 0
#root.cell. /afs/cambridge.redhat.com afs rw 0 0
#afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
===========================
AUTOMATIC MOUNTPOINT EXPIRY
===========================
Automatic expiration of mountpoints is easy, provided you've mounted the
mountpoint to be expired in the automounting procedure outlined above.
To do expiration, you need to follow these steps:
(3) Create at least one list off which the vfsmounts to be expired can be
hung. Access to this list will be governed by the vfsmount_lock.
(4) In step (2c) above, the call to do_add_mount() should be provided with a
pointer to this list. It will hang the vfsmount off of it if it succeeds.
(5) When you want mountpoints to be expired, call mark_mounts_for_expiry()
with a pointer to this list. This will process the list, marking every
vfsmount thereon for potential expiry on the next call.
If a vfsmount was already flagged for expiry, and if its usage count is 1
(it's only referenced by its parent vfsmount), then it will be deleted
from the namespace and thrown away (effectively unmounted).
It may prove simplest to call this at regular intervals, using some sort of
timed event to drive it.
The expiration flag is cleared by calls to mntput. This means that expiration
will only happen on the second expiration request after the last time the
mountpoint was accessed.
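
The "second request" behaviour can be seen in a small userspace model. It is
only a sketch: struct model_mount and the model_*() functions are hypothetical
names that mirror mnt_count, mnt_expiry_mark, mntget(), mntput() and one pass
of mark_mounts_for_expiry(), using an array in place of the kernel list and
ignoring all locking.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the vfsmount fields the expiry logic reads. */
struct model_mount {
	int count;		/* models mnt_count; 1 = only the parent's ref */
	int expiry_mark;	/* models mnt_expiry_mark */
	int mounted;		/* 1 while still in the modelled namespace */
};

/* Models mntget(): take a reference. */
void model_mntget(struct model_mount *m)
{
	m->count++;
}

/* Models mntput(): clear the expiry mark, then drop a reference. */
void model_mntput(struct model_mount *m)
{
	m->expiry_mark = 0;
	m->count--;
}

/*
 * Models one mark_mounts_for_expiry() pass.
 * Returns the number of mounts expired on this pass.
 */
size_t model_expiry_pass(struct model_mount *mounts, size_t n)
{
	size_t i, expired = 0;

	assert(mounts != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct model_mount *m = &mounts[i];
		int was_marked = m->expiry_mark;

		if (!m->mounted)
			continue;
		m->expiry_mark = 1;	/* same effect as xchg(&mark, 1) */
		if (was_marked && m->count == 1) {
			m->mounted = 0;	/* "deleted from the namespace" */
			expired++;
		}
	}
	return expired;
}
```

With two idle mounts, the first pass only marks both. If one of them is then
touched (a model_mntget()/model_mntput() pair), the second pass expires the
untouched mount and merely re-marks the other, which goes on a third pass.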
If a mountpoint is moved, it gets removed from the expiration list. If a bind
mount is made on an expirable mount, the new vfsmount will not be on the
expiration list and will not expire.
If a namespace is copied, all mountpoints contained therein will be copied,
and the copies of those that are on an expiration list will be added to the
same expiration list.
=======================
USERSPACE DRIVEN EXPIRY
=======================
As an alternative, it is possible for userspace to request expiry of any
mountpoint (though some will be rejected - the current process's idea of the
rootfs for example). It does this by passing the MNT_EXPIRE flag to
umount(). This flag is considered incompatible with MNT_FORCE and MNT_DETACH.
If the mountpoint in question is referenced by something other than
umount() or its parent mountpoint, an EBUSY error will be returned and the
mountpoint will not be marked for expiration or unmounted.
If the mountpoint was not already marked for expiry at that time, an EAGAIN
error will be given and it won't be unmounted.
Otherwise if it was already marked and it wasn't referenced, unmounting will
take place as usual.
Again, the expiration flag is cleared every time anything other than umount()
looks at a mountpoint.
......@@ -278,6 +278,16 @@ void path_release(struct nameidata *nd)
mntput(nd->mnt);
}
/*
* umount() mustn't call path_release()/mntput() as that would clear
* mnt_expiry_mark
*/
void path_release_on_umount(struct nameidata *nd)
{
dput(nd->dentry);
_mntput(nd->mnt);
}
/*
* Internal lookup() using the new generic dcache.
* SMP-safe
......
......@@ -60,6 +60,7 @@ struct vfsmount *alloc_vfsmnt(const char *name)
INIT_LIST_HEAD(&mnt->mnt_child);
INIT_LIST_HEAD(&mnt->mnt_mounts);
INIT_LIST_HEAD(&mnt->mnt_list);
INIT_LIST_HEAD(&mnt->mnt_fslink);
if (name) {
int size = strlen(name)+1;
char *newname = kmalloc(size, GFP_KERNEL);
......@@ -106,13 +107,9 @@ struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
EXPORT_SYMBOL(lookup_mnt);
static int check_mnt(struct vfsmount *mnt)
static inline int check_mnt(struct vfsmount *mnt)
{
spin_lock(&vfsmount_lock);
while (mnt->mnt_parent != mnt)
mnt = mnt->mnt_parent;
spin_unlock(&vfsmount_lock);
return mnt == current->namespace->root;
return mnt->mnt_namespace == current->namespace;
}
static void detach_mnt(struct vfsmount *mnt, struct nameidata *old_nd)
......@@ -164,6 +161,14 @@ clone_mnt(struct vfsmount *old, struct dentry *root)
mnt->mnt_root = dget(root);
mnt->mnt_mountpoint = mnt->mnt_root;
mnt->mnt_parent = mnt;
mnt->mnt_namespace = old->mnt_namespace;
/* stick the duplicate mount on the same expiry list
* as the original if that was on one */
spin_lock(&vfsmount_lock);
if (!list_empty(&old->mnt_fslink))
list_add(&mnt->mnt_fslink, &old->mnt_fslink);
spin_unlock(&vfsmount_lock);
}
return mnt;
}
......@@ -346,6 +351,7 @@ void umount_tree(struct vfsmount *mnt)
while (!list_empty(&kill)) {
mnt = list_entry(kill.next, struct vfsmount, mnt_list);
list_del_init(&mnt->mnt_list);
list_del_init(&mnt->mnt_fslink);
if (mnt->mnt_parent == mnt) {
spin_unlock(&vfsmount_lock);
} else {
......@@ -368,6 +374,24 @@ static int do_umount(struct vfsmount *mnt, int flags)
if (retval)
return retval;
/*
* Allow userspace to request a mountpoint be expired rather than
* unmounting unconditionally. Unmount only happens if:
* (1) the mark is already set (the mark is cleared by mntput())
* (2) the usage count == 1 [parent vfsmount] + 1 [sys_umount]
*/
if (flags & MNT_EXPIRE) {
if (mnt == current->fs->rootmnt ||
flags & (MNT_FORCE | MNT_DETACH))
return -EINVAL;
if (atomic_read(&mnt->mnt_count) != 2)
return -EBUSY;
if (!xchg(&mnt->mnt_expiry_mark, 1))
return -EAGAIN;
}
/*
* If we may have to abort operations to get out of this
* mount, and they will themselves hold resources we must
......@@ -461,7 +485,7 @@ asmlinkage long sys_umount(char __user * name, int flags)
retval = do_umount(nd.mnt, flags);
dput_and_out:
path_release(&nd);
path_release_on_umount(&nd);
out:
return retval;
}
......@@ -618,6 +642,11 @@ static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
}
if (mnt) {
/* stop bind mounts from expiring */
spin_lock(&vfsmount_lock);
list_del_init(&mnt->mnt_fslink);
spin_unlock(&vfsmount_lock);
err = graft_tree(mnt, nd);
if (err) {
spin_lock(&vfsmount_lock);
......@@ -638,7 +667,8 @@ static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
* on it - tough luck.
*/
static int do_remount(struct nameidata *nd,int flags,int mnt_flags,void *data)
static int do_remount(struct nameidata *nd, int flags, int mnt_flags,
void *data)
{
int err;
struct super_block * sb = nd->mnt->mnt_sb;
......@@ -710,6 +740,10 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
detach_mnt(old_nd.mnt, &parent_nd);
attach_mnt(old_nd.mnt, nd);
/* if the mount is moved, it should no longer be expire
* automatically */
list_del_init(&old_nd.mnt->mnt_fslink);
out2:
spin_unlock(&vfsmount_lock);
out1:
......@@ -722,11 +756,14 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
return err;
}
static int do_add_mount(struct nameidata *nd, char *type, int flags,
/*
* create a new mount for userspace and request it to be added into the
* namespace's tree
*/
static int do_new_mount(struct nameidata *nd, char *type, int flags,
int mnt_flags, char *name, void *data)
{
struct vfsmount *mnt;
int err;
if (!type || !memchr(type, 0, PAGE_SIZE))
return -EINVAL;
......@@ -736,9 +773,20 @@ static int do_add_mount(struct nameidata *nd, char *type, int flags,
return -EPERM;
mnt = do_kern_mount(type, flags, name, data);
err = PTR_ERR(mnt);
if (IS_ERR(mnt))
goto out;
return PTR_ERR(mnt);
return do_add_mount(mnt, nd, mnt_flags, NULL);
}
/*
* add a mount into a namespace's mount tree
* - provide the option of adding the new mount to an expiration list
*/
int do_add_mount(struct vfsmount *newmnt, struct nameidata *nd,
int mnt_flags, struct list_head *fslist)
{
int err;
down_write(&current->namespace->sem);
/* Something was mounted here while we slept */
......@@ -750,22 +798,143 @@ static int do_add_mount(struct nameidata *nd, char *type, int flags,
/* Refuse the same filesystem on the same mount point */
err = -EBUSY;
if (nd->mnt->mnt_sb == mnt->mnt_sb && nd->mnt->mnt_root == nd->dentry)
if (nd->mnt->mnt_sb == newmnt->mnt_sb &&
nd->mnt->mnt_root == nd->dentry)
goto unlock;
err = -EINVAL;
if (S_ISLNK(mnt->mnt_root->d_inode->i_mode))
if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
goto unlock;
mnt->mnt_flags = mnt_flags;
err = graft_tree(mnt, nd);
newmnt->mnt_flags = mnt_flags;
err = graft_tree(newmnt, nd);
if (err == 0 && fslist) {
/* add to the specified expiration list */
spin_lock(&vfsmount_lock);
list_add_tail(&newmnt->mnt_fslink, fslist);
spin_unlock(&vfsmount_lock);
}
unlock:
up_write(&current->namespace->sem);
mntput(mnt);
out:
mntput(newmnt);
return err;
}
EXPORT_SYMBOL_GPL(do_add_mount);
/*
* process a list of expirable mountpoints with the intent of discarding any
* mountpoints that aren't in use and haven't been touched since last we came
* here
*/
void mark_mounts_for_expiry(struct list_head *mounts)
{
struct namespace *namespace;
struct list_head graveyard, *_p, *_n;
struct vfsmount *mnt;
if (list_empty(mounts))
return;
INIT_LIST_HEAD(&graveyard);
spin_lock(&vfsmount_lock);
/* extract from the expiration list every vfsmount that matches the
* following criteria:
* - only referenced by its parent vfsmount
* - still marked for expiry (marked on the last call here; marks are
* cleared by mntput())
*/
list_for_each_safe(_p, _n, mounts) {
mnt = list_entry(_p, struct vfsmount, mnt_fslink);
if (!xchg(&mnt->mnt_expiry_mark, 1) ||
atomic_read(&mnt->mnt_count) != 1)
continue;
mntget(mnt);
list_move(&mnt->mnt_fslink, &graveyard);
}
/*
* go through the vfsmounts we've just consigned to the graveyard to
* - check that they're still dead
* - delete the vfsmount from the appropriate namespace under lock
* - dispose of the corpse
*/
while (!list_empty(&graveyard)) {
mnt = list_entry(graveyard.next, struct vfsmount, mnt_fslink);
list_del_init(&mnt->mnt_fslink);
/* don't do anything if the namespace is dead - all the
* vfsmounts from it are going away anyway */
namespace = mnt->mnt_namespace;
if (!namespace || atomic_read(&namespace->count) <= 0)
continue;
get_namespace(namespace);
spin_unlock(&vfsmount_lock);
down_write(&namespace->sem);
spin_lock(&vfsmount_lock);
/* check that it is still dead: the count should now be 2 - as
* contributed by the vfsmount parent and the mntget above */
if (atomic_read(&mnt->mnt_count) == 2) {
struct vfsmount *xdmnt;
struct dentry *xdentry;
/* delete from the namespace */
list_del_init(&mnt->mnt_list);
list_del_init(&mnt->mnt_child);
list_del_init(&mnt->mnt_hash);
mnt->mnt_mountpoint->d_mounted--;
xdentry = mnt->mnt_mountpoint;
mnt->mnt_mountpoint = mnt->mnt_root;
xdmnt = mnt->mnt_parent;
mnt->mnt_parent = mnt;
spin_unlock(&vfsmount_lock);
mntput(xdmnt);
dput(xdentry);
/* now lay it to rest if this was the last ref on the
* superblock */
if (atomic_read(&mnt->mnt_sb->s_active) == 1) {
/* last instance - try to be smart */
lock_kernel();
DQUOT_OFF(mnt->mnt_sb);
acct_auto_close(mnt->mnt_sb);
unlock_kernel();
}
mntput(mnt);
}
else {
/* someone brought it back to life whilst we didn't
* have any locks held so return it to the expiration
* list */
list_add_tail(&mnt->mnt_fslink, mounts);
spin_unlock(&vfsmount_lock);
}
up_write(&namespace->sem);
mntput(mnt);
put_namespace(namespace);
spin_lock(&vfsmount_lock);
}
spin_unlock(&vfsmount_lock);
}
EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
int copy_mount_options (const void __user *data, unsigned long *where)
{
int i;
......@@ -860,7 +1029,7 @@ long do_mount(char * dev_name, char * dir_name, char *type_page,
else if (flags & MS_MOVE)
retval = do_move_mount(&nd, dev_name);
else
retval = do_add_mount(&nd, type_page, flags, mnt_flags,
retval = do_new_mount(&nd, type_page, flags, mnt_flags,
dev_name, data_page);
dput_out:
path_release(&nd);
......@@ -1185,6 +1354,7 @@ static void __init init_mount_tree(void)
init_rwsem(&namespace->sem);
list_add(&mnt->mnt_list, &namespace->list);
namespace->root = mnt;
mnt->mnt_namespace = namespace;
init_task.namespace = namespace;
read_lock(&tasklist_lock);
......@@ -1252,8 +1422,15 @@ void __init mnt_init(unsigned long mempages)
void __put_namespace(struct namespace *namespace)
{
struct vfsmount *mnt;
down_write(&namespace->sem);
spin_lock(&vfsmount_lock);
list_for_each_entry(mnt, &namespace->list, mnt_list) {
mnt->mnt_namespace = NULL;
}
umount_tree(namespace->root);
spin_unlock(&vfsmount_lock);
up_write(&namespace->sem);
......
......@@ -793,6 +793,7 @@ do_kern_mount(const char *fstype, int flags, const char *name, void *data)
mnt->mnt_root = dget(sb->s_root);
mnt->mnt_mountpoint = sb->s_root;
mnt->mnt_parent = mnt;
mnt->mnt_namespace = current->namespace;
up_write(&sb->s_umount);
put_filesystem(type);
return mnt;
......@@ -809,6 +810,8 @@ do_kern_mount(const char *fstype, int flags, const char *name, void *data)
return (struct vfsmount *)sb;
}
EXPORT_SYMBOL_GPL(do_kern_mount);
struct vfsmount *kern_mount(struct file_system_type *type)
{
return do_kern_mount(type->name, 0, type->name, NULL);
......
......@@ -716,6 +716,7 @@ extern int send_sigurg(struct fown_struct *fown);
#define MNT_FORCE 0x00000001 /* Attempt to forcibily umount */
#define MNT_DETACH 0x00000002 /* Just detach from the tree */
#define MNT_EXPIRE 0x00000004 /* Mark for expiry */
extern struct list_head super_blocks;
extern spinlock_t sb_lock;
......
......@@ -29,8 +29,11 @@ struct vfsmount
struct list_head mnt_child; /* and going through their mnt_child */
atomic_t mnt_count;
int mnt_flags;
int mnt_expiry_mark; /* true if marked for expiry */
char *mnt_devname; /* Name of device e.g. /dev/dsk/hda1 */
struct list_head mnt_list;
struct list_head mnt_fslink; /* link in fs-specific expiry list */
struct namespace *mnt_namespace; /* containing namespace */
};
static inline struct vfsmount *mntget(struct vfsmount *mnt)
......@@ -42,7 +45,7 @@ static inline struct vfsmount *mntget(struct vfsmount *mnt)
extern void __mntput(struct vfsmount *mnt);
static inline void mntput(struct vfsmount *mnt)
static inline void _mntput(struct vfsmount *mnt)
{
if (mnt) {
if (atomic_dec_and_test(&mnt->mnt_count))
......@@ -50,10 +53,26 @@ static inline void mntput(struct vfsmount *mnt)
}
}
static inline void mntput(struct vfsmount *mnt)
{
if (mnt) {
mnt->mnt_expiry_mark = 0;
_mntput(mnt);
}
}
extern void free_vfsmnt(struct vfsmount *mnt);
extern struct vfsmount *alloc_vfsmnt(const char *name);
extern struct vfsmount *do_kern_mount(const char *fstype, int flags,
const char *name, void *data);
struct nameidata;
extern int do_add_mount(struct vfsmount *newmnt, struct nameidata *nd,
int mnt_flags, struct list_head *fslist);
extern void mark_mounts_for_expiry(struct list_head *mounts);
extern spinlock_t vfsmount_lock;
#endif
......
......@@ -61,6 +61,7 @@ extern int FASTCALL(path_lookup(const char *, unsigned, struct nameidata *));
extern int FASTCALL(path_walk(const char *, struct nameidata *));
extern int FASTCALL(link_path_walk(const char *, struct nameidata *));
extern void path_release(struct nameidata *);
extern void path_release_on_umount(struct nameidata *);
extern struct dentry * lookup_one_len(const char *, struct dentry *, int);
extern struct dentry * lookup_hash(struct qstr *, struct dentry *);
......
......@@ -14,7 +14,7 @@ struct namespace {
extern void umount_tree(struct vfsmount *);
extern int copy_namespace(int, struct task_struct *);
void __put_namespace(struct namespace *namespace);
extern void __put_namespace(struct namespace *namespace);
static inline void put_namespace(struct namespace *namespace)
{
......