Commit cc7d1f8f authored by Pekka Enberg, committed by Linus Torvalds

[PATCH] VFS: update overview document

This patch updates the Documentation/filesystems/vfs.txt document.  I
rearranged and rewrote parts of the introduction chapter and added better
headings for each section.  I also added a description for the inode
rename() operation which was missing and added links to some useful
external VFS documentation.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent b8887e6e
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
Original author: Richard Gooch <rgooch@atnf.csiro.au> Original author: Richard Gooch <rgooch@atnf.csiro.au>
Last updated on August 25, 2005 Last updated on October 28, 2005
Copyright (C) 1999 Richard Gooch Copyright (C) 1999 Richard Gooch
Copyright (C) 2005 Pekka Enberg Copyright (C) 2005 Pekka Enberg
...@@ -11,62 +11,61 @@ ...@@ -11,62 +11,61 @@
This file is released under the GPLv2. This file is released under the GPLv2.
What is it? Introduction
=========== ============
The Virtual File System (otherwise known as the Virtual Filesystem The Virtual File System (also known as the Virtual Filesystem Switch)
Switch) is the software layer in the kernel that provides the is the software layer in the kernel that provides the filesystem
filesystem interface to userspace programs. It also provides an interface to userspace programs. It also provides an abstraction
abstraction within the kernel which allows different filesystem within the kernel which allows different filesystem implementations to
implementations to coexist. coexist.
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
on are called from a process context. Filesystem locking is described
in the document Documentation/filesystems/Locking.
A Quick Look At How It Works
============================
In this section I'll briefly describe how things work, before Directory Entry Cache (dcache)
launching into the details. I'll start with describing what happens ------------------------------
when user programs open and manipulate files, and then look from the
other view which is how a filesystem is supported and subsequently
mounted.
Opening a File
--------------
The VFS implements the open(2), stat(2), chmod(2) and similar system
calls. The pathname argument is used by the VFS to search through the
directory entry cache (dentry cache or "dcache"). This provides a very
fast look-up mechanism to translate a pathname (filename) into a
specific dentry.
An individual dentry usually has a pointer to an inode. Inodes are the
things that live on disc drives, and can be regular files (you know:
those things that you write data into), directories, FIFOs and other
beasts. Dentries live in RAM and are never saved to disc: they exist
only for performance. Inodes live on disc and are copied into memory
when required. Later any changes are written back to disc. The inode
that lives in RAM is a VFS inode, and it is this which the dentry
points to. A single inode can be pointed to by multiple dentries
(think about hardlinks).
The dcache is meant to be a view into your entire filespace. Unlike
Linus, most of us losers can't fit enough dentries into RAM to cover
all of our filespace, so the dcache has bits missing. In order to
resolve your pathname into a dentry, the VFS may have to resort to
creating dentries along the way, and then loading the inode. This is
done by looking up the inode.
To look up an inode (usually read from disc) requires that the VFS
calls the lookup() method of the parent directory inode. This method
is installed by the specific filesystem implementation that the inode
lives in. There will be more on this later.
Once the VFS has the required dentry (and hence the inode), we can do The VFS implements the open(2), stat(2), chmod(2), and similar system
all those boring things like open(2) the file, or stat(2) it to peek calls. The pathname argument that is passed to them is used by the VFS
at the inode data. The stat(2) operation is fairly simple: once the to search through the directory entry cache (also known as the dentry
VFS has the dentry, it peeks at the inode data and passes some of it cache or dcache). This provides a very fast look-up mechanism to
back to userspace. translate a pathname (filename) into a specific dentry. Dentries live
in RAM and are never saved to disc: they exist only for performance.
The dentry cache is meant to be a view into your entire filespace. As
most computers cannot fit all dentries in the RAM at the same time,
some bits of the cache are missing. In order to resolve your pathname
into a dentry, the VFS may have to resort to creating dentries along
the way, and then loading the inode. This is done by looking up the
inode.
The Inode Object
----------------
An individual dentry usually has a pointer to an inode. Inodes are
filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems). Inodes that live on the
disc are copied into the memory when required and changes to the inode
are written back to disc. A single inode can be pointed to by multiple
dentries (hard links, for example, do this).
To look up an inode requires that the VFS calls the lookup() method of
the parent directory inode. This method is installed by the specific
filesystem implementation that the inode lives in. Once the VFS has
the required dentry (and hence the inode), we can do all those boring
things like open(2) the file, or stat(2) it to peek at the inode
data. The stat(2) operation is fairly simple: once the VFS has the
dentry, it peeks at the inode data and passes some of it back to
userspace.
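
The statement that several dentries can point to one inode can be observed
from userspace with stat(2). The following is a minimal sketch, not part of
the patch; the helper names same_inode() and hardlink_demo() are hypothetical
and the caller is assumed to pass paths in a writable directory that supports
hard links.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both paths resolve to the same inode
 * on the same device, 0 if they do not, and -1 if either stat(2) fails.
 */
int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Hypothetical helper: creates a file, adds a second name for it with
 * link(2), and reports whether both names lead to one inode. Both names
 * are removed again before returning. Returns -1 on any setup failure.
 */
int hardlink_demo(const char *orig, const char *alias)
{
	int fd, result;

	/* Start from a clean slate; errors here are not interesting. */
	unlink(orig);
	unlink(alias);

	fd = open(orig, O_CREAT | O_WRONLY | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	if (link(orig, alias) != 0) {
		unlink(orig);
		return -1;
	}

	result = same_inode(orig, alias);

	unlink(alias);
	unlink(orig);
	return result;
}
```

Two different pathnames mean two different dentries, yet stat(2) reports the
same st_dev and st_ino for both, which is the userspace view of the shared
inode.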
The File Object
---------------
Opening a file requires another operation: allocation of a file Opening a file requires another operation: allocation of a file
structure (this is the kernel-side implementation of file structure (this is the kernel-side implementation of file
...@@ -74,51 +73,39 @@ descriptors). The freshly allocated file structure is initialized with ...@@ -74,51 +73,39 @@ descriptors). The freshly allocated file structure is initialized with
a pointer to the dentry and a set of file operation member functions. a pointer to the dentry and a set of file operation member functions.
These are taken from the inode data. The open() file method is then These are taken from the inode data. The open() file method is then
called so the specific filesystem implementation can do its work. You called so the specific filesystem implementation can do its work. You
can see that this is another switch performed by the VFS. can see that this is another switch performed by the VFS. The file
structure is placed into the file descriptor table for the process.
The file structure is placed into the file descriptor table for the
process.
Reading, writing and closing files (and other assorted VFS operations) Reading, writing and closing files (and other assorted VFS operations)
is done by using the userspace file descriptor to grab the appropriate is done by using the userspace file descriptor to grab the appropriate
file structure, and then calling the required file structure method file structure, and then calling the required file structure method to
function to do whatever is required. do whatever is required. For as long as the file is open, it keeps the
dentry in use, which in turn means that the VFS inode is still in use.
For as long as the file is open, it keeps the dentry "open" (in use),
which in turn means that the VFS inode is still in use.
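
The point that an open file keeps its dentry and inode in use can be seen
from userspace: a file whose last name has been removed stays readable and
writable through its descriptor. The following is a minimal sketch, not part
of the patch; the function name read_after_unlink() is hypothetical and the
path is assumed to be in a writable directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: opens a file, removes its name with unlink(2), then
 * writes and reads back data through the still-open descriptor.
 * Returns the link count reported by fstat(2), or -1 on any failure.
 */
int read_after_unlink(const char *path)
{
	static const char msg[] = "still here";
	char buf[sizeof(msg)];
	struct stat st;
	int fd, ret = -1;

	assert(path != NULL);

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* The name goes away, but the open file still references the inode. */
	if (unlink(path) != 0)
		goto out;

	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
		goto out;
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;
	if (memcmp(buf, msg, sizeof(msg)) != 0)
		goto out;

	if (fstat(fd, &st) != 0)
		goto out;
	ret = (int)st.st_nlink;
out:
	close(fd);
	return ret;
}
```

On a local filesystem the returned link count is typically 0, yet the data
written after unlink(2) is read back intact. The storage is released only
when the last descriptor is closed.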
All VFS system calls (i.e. open(2), stat(2), read(2), write(2),
chmod(2) and so on) are called from a process context. You should
assume that these calls are made without any kernel locks being
held. This means that the processes may be executing the same piece of
filesystem or driver code at the same time, on different
processors. You should ensure that access to shared resources is
protected by appropriate locks.
Registering and Mounting a Filesystem Registering and Mounting a Filesystem
------------------------------------- =====================================
If you want to support a new kind of filesystem in the kernel, all you To register and unregister a filesystem, use the following API
need to do is call register_filesystem(). You pass a structure functions:
describing the filesystem implementation (struct file_system_type)
which is then added to an internal table of supported filesystems. You
can do:
% cat /proc/filesystems #include <linux/fs.h>
to see what filesystems are currently available on your system. extern int register_filesystem(struct file_system_type *);
extern int unregister_filesystem(struct file_system_type *);
When a request is made to mount a block device onto a directory in The passed struct file_system_type describes your filesystem. When a
your filespace the VFS will call the appropriate method for the request is made to mount a device onto a directory in your filespace,
specific filesystem. The dentry for the mount point will then be the VFS will call the appropriate get_sb() method for the specific
updated to point to the root inode for the new filesystem. filesystem. The dentry for the mount point will then be updated to
point to the root inode for the new filesystem.
It's now time to look at things in more detail. You can see all filesystems that are registered to the kernel in the
file /proc/filesystems.
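
As an illustration of the /proc/filesystems remark above, a userspace program
can check whether a filesystem type is registered. This is a minimal sketch,
not part of the patch; the function name filesystem_registered() is
hypothetical, and it assumes each line of the file holds an optional "nodev"
tag followed by the filesystem name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the named filesystem type is listed in
 * /proc/filesystems, 0 if it is not, and -1 if the file cannot be opened.
 */
int filesystem_registered(const char *name)
{
	char line[256], first[64], second[64];
	FILE *fp;
	int found = 0;

	assert(name != NULL);

	fp = fopen("/proc/filesystems", "r");
	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		/* A line is either "nodev <name>" or just "<name>". */
		int n = sscanf(line, "%63s %63s", first, second);
		const char *fs = (n == 2) ? second : first;

		if (n >= 1 && strcmp(fs, name) == 0) {
			found = 1;
			break;
		}
	}

	fclose(fp);
	return found;
}
```

A type shows up here once its struct file_system_type has been passed to
register_filesystem(), and disappears again after unregister_filesystem().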
struct file_system_type struct file_system_type
======================= -----------------------
This describes the filesystem. As of kernel 2.6.13, the following This describes the filesystem. As of kernel 2.6.13, the following
members are defined: members are defined:
...@@ -197,8 +184,14 @@ A fill_super() method implementation has the following arguments: ...@@ -197,8 +184,14 @@ A fill_super() method implementation has the following arguments:
int silent: whether or not to be silent on error int silent: whether or not to be silent on error
The Superblock Object
=====================
A superblock object represents a mounted filesystem.
struct super_operations struct super_operations
======================= -----------------------
This describes how the VFS can manipulate the superblock of your This describes how the VFS can manipulate the superblock of your
filesystem. As of kernel 2.6.13, the following members are defined: filesystem. As of kernel 2.6.13, the following members are defined:
...@@ -286,9 +279,9 @@ or bottom half). ...@@ -286,9 +279,9 @@ or bottom half).
a superblock. The second parameter indicates whether the method a superblock. The second parameter indicates whether the method
should wait until the write out has been completed. Optional. should wait until the write out has been completed. Optional.
write_super_lockfs: called when VFS is locking a filesystem and forcing write_super_lockfs: called when VFS is locking a filesystem and
it into a consistent state. This function is currently used by the forcing it into a consistent state. This method is currently
Logical Volume Manager (LVM). used by the Logical Volume Manager (LVM).
unlockfs: called when VFS is unlocking a filesystem and making it writable unlockfs: called when VFS is unlocking a filesystem and making it writable
again. again.
...@@ -317,8 +310,14 @@ field. This is a pointer to a "struct inode_operations" which ...@@ -317,8 +310,14 @@ field. This is a pointer to a "struct inode_operations" which
describes the methods that can be performed on individual inodes. describes the methods that can be performed on individual inodes.
The Inode Object
================
An inode object represents an object within the filesystem.
struct inode_operations struct inode_operations
======================= -----------------------
This describes how the VFS can manipulate an inode in your This describes how the VFS can manipulate an inode in your
filesystem. As of kernel 2.6.13, the following members are defined: filesystem. As of kernel 2.6.13, the following members are defined:
...@@ -394,51 +393,62 @@ otherwise noted. ...@@ -394,51 +393,62 @@ otherwise noted.
will probably need to call d_instantiate() just as you would will probably need to call d_instantiate() just as you would
in the create() method in the create() method
rename: called by the rename(2) system call to rename the object to
have the parent and name given by the second inode and dentry.
readlink: called by the readlink(2) system call. Only required if readlink: called by the readlink(2) system call. Only required if
you want to support reading symbolic links you want to support reading symbolic links
follow_link: called by the VFS to follow a symbolic link to the follow_link: called by the VFS to follow a symbolic link to the
inode it points to. Only required if you want to support inode it points to. Only required if you want to support
symbolic links. This function returns a void pointer cookie symbolic links. This method returns a void pointer cookie
that is passed to put_link(). that is passed to put_link().
put_link: called by the VFS to release resources allocated by put_link: called by the VFS to release resources allocated by
follow_link(). The cookie returned by follow_link() is passed to follow_link(). The cookie returned by follow_link() is passed
to this function as the last parameter. It is used by filesystems to to this method as the last parameter. It is used by
such as NFS where page cache is not stable (i.e. page that was filesystems such as NFS where page cache is not stable
installed when the symbolic link walk started might not be in the (i.e. page that was installed when the symbolic link walk
page cache at the end of the walk). started might not be in the page cache at the end of the
walk).
truncate: called by the VFS to change the size of a file. The i_size
field of the inode is set to the desired size by the VFS before truncate: called by the VFS to change the size of a file. The
this function is called. This function is called by the truncate(2) i_size field of the inode is set to the desired size by the
system call and related functionality. VFS before this method is called. This method is called by
the truncate(2) system call and related functionality.
permission: called by the VFS to check for access rights on a POSIX-like permission: called by the VFS to check for access rights on a POSIX-like
filesystem. filesystem.
setattr: called by the VFS to set attributes for a file. This function is setattr: called by the VFS to set attributes for a file. This method
called by chmod(2) and related system calls. is called by chmod(2) and related system calls.
getattr: called by the VFS to get attributes of a file. This function is getattr: called by the VFS to get attributes of a file. This method
called by stat(2) and related system calls. is called by stat(2) and related system calls.
setxattr: called by the VFS to set an extended attribute for a file. setxattr: called by the VFS to set an extended attribute for a file.
Extended attribute is a name:value pair associated with an inode. This Extended attribute is a name:value pair associated with an
function is called by setxattr(2) system call. inode. This method is called by setxattr(2) system call.
getxattr: called by the VFS to retrieve the value of an extended
attribute name. This method is called by getxattr(2) function
call.
listxattr: called by the VFS to list all extended attributes for a
given file. This method is called by listxattr(2) system call.
getxattr: called by the VFS to retrieve the value of an extended attribute removexattr: called by the VFS to remove an extended attribute from
name. This function is called by getxattr(2) function call. a file. This method is called by removexattr(2) system call.
listxattr: called by the VFS to list all extended attributes for a given
file. This function is called by listxattr(2) system call.
removexattr: called by the VFS to remove an extended attribute from a file. The Address Space Object
This function is called by removexattr(2) system call. ========================
The address space object is used to identify pages in the page cache.
struct address_space_operations struct address_space_operations
=============================== -------------------------------
This describes how the VFS can manipulate mapping of a file to page cache in This describes how the VFS can manipulate mapping of a file to page cache in
your filesystem. As of kernel 2.6.13, the following members are defined: your filesystem. As of kernel 2.6.13, the following members are defined:
...@@ -502,8 +512,14 @@ struct address_space_operations { ...@@ -502,8 +512,14 @@ struct address_space_operations {
it. An example implementation can be found in fs/ext2/xip.c. it. An example implementation can be found in fs/ext2/xip.c.
The File Object
===============
A file object represents a file opened by a process.
struct file_operations struct file_operations
====================== ----------------------
This describes how the VFS can manipulate an open file. As of kernel This describes how the VFS can manipulate an open file. As of kernel
2.6.13, the following members are defined: 2.6.13, the following members are defined:
...@@ -661,7 +677,7 @@ of child dentries. Child dentries are basically like files in a ...@@ -661,7 +677,7 @@ of child dentries. Child dentries are basically like files in a
directory. directory.
Directory Entry Cache APIs Directory Entry Cache API
-------------------------- --------------------------
There are a number of functions defined which permit a filesystem to There are a number of functions defined which permit a filesystem to
...@@ -880,3 +896,22 @@ Papers and other documentation on dcache locking ...@@ -880,3 +896,22 @@ Papers and other documentation on dcache locking
1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124). 1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124).
2. http://lse.sourceforge.net/locking/dcache/dcache.html 2. http://lse.sourceforge.net/locking/dcache/dcache.html
Resources
=========
(Note some of these resources are not up-to-date with the latest kernel
version.)
Creating Linux virtual filesystems. 2002
<http://lwn.net/Articles/13325/>
The Linux Virtual File-system Layer by Neil Brown. 1999
<http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>
A tour of the Linux VFS by Michael K. Johnson. 1996
<http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html>
A small trail through the Linux kernel by Andries Brouwer. 2001
<http://www.win.tue.nl/~aeb/linux/vfs/trail.html>