Commit 90caa781 authored by Tobin C. Harding, committed by Jonathan Corbet

docs: filesystems: vfs: Use 72 character column width

In preparation for conversion to RST format use the kernel's favoured
documentation column width.  If we are going to do this we might as well
do it thoroughly.  Just do the paragraphs (not the indented stuff), the
rest will be done during indentation fix up patch.

This patch is whitespace only, no textual changes.

Use 72 character column width for all paragraph sections.
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 4ee33ea4
...@@ -12,15 +12,14 @@ ...@@ -12,15 +12,14 @@
Introduction Introduction
============ ============
The Virtual File System (also known as the Virtual Filesystem Switch) The Virtual File System (also known as the Virtual Filesystem Switch) is
is the software layer in the kernel that provides the filesystem the software layer in the kernel that provides the filesystem interface
interface to userspace programs. It also provides an abstraction to userspace programs. It also provides an abstraction within the
within the kernel which allows different filesystem implementations to kernel which allows different filesystem implementations to coexist.
coexist.
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
on are called from a process context. Filesystem locking is described are called from a process context. Filesystem locking is described in
in the document Documentation/filesystems/Locking. the document Documentation/filesystems/Locking.
Directory Entry Cache (dcache) Directory Entry Cache (dcache)
...@@ -34,11 +33,10 @@ translate a pathname (filename) into a specific dentry. Dentries live ...@@ -34,11 +33,10 @@ translate a pathname (filename) into a specific dentry. Dentries live
in RAM and are never saved to disc: they exist only for performance. in RAM and are never saved to disc: they exist only for performance.
The dentry cache is meant to be a view into your entire filespace. As The dentry cache is meant to be a view into your entire filespace. As
most computers cannot fit all dentries in the RAM at the same time, most computers cannot fit all dentries in the RAM at the same time, some
some bits of the cache are missing. In order to resolve your pathname bits of the cache are missing. In order to resolve your pathname into a
into a dentry, the VFS may have to resort to creating dentries along dentry, the VFS may have to resort to creating dentries along the way,
the way, and then loading the inode. This is done by looking up the and then loading the inode. This is done by looking up the inode.
inode.
The Inode Object The Inode Object
...@@ -46,33 +44,32 @@ The Inode Object ...@@ -46,33 +44,32 @@ The Inode Object
An individual dentry usually has a pointer to an inode. Inodes are An individual dentry usually has a pointer to an inode. Inodes are
filesystem objects such as regular files, directories, FIFOs and other filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems) beasts. They live either on the disc (for block device filesystems) or
or in the memory (for pseudo filesystems). Inodes that live on the in the memory (for pseudo filesystems). Inodes that live on the disc
disc are copied into the memory when required and changes to the inode are copied into the memory when required and changes to the inode are
are written back to disc. A single inode can be pointed to by multiple written back to disc. A single inode can be pointed to by multiple
dentries (hard links, for example, do this). dentries (hard links, for example, do this).
To look up an inode requires that the VFS calls the lookup() method of To look up an inode requires that the VFS calls the lookup() method of
the parent directory inode. This method is installed by the specific the parent directory inode. This method is installed by the specific
filesystem implementation that the inode lives in. Once the VFS has filesystem implementation that the inode lives in. Once the VFS has the
the required dentry (and hence the inode), we can do all those boring required dentry (and hence the inode), we can do all those boring things
things like open(2) the file, or stat(2) it to peek at the inode like open(2) the file, or stat(2) it to peek at the inode data. The
data. The stat(2) operation is fairly simple: once the VFS has the stat(2) operation is fairly simple: once the VFS has the dentry, it
dentry, it peeks at the inode data and passes some of it back to peeks at the inode data and passes some of it back to userspace.
userspace.
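Illustrative aside, not part of the patch: the paragraphs above say that several dentries can point at one inode and that stat(2) hands some inode data back to userspace. A minimal userspace sketch of that, assuming a POSIX system and two writable paths on the same filesystem; the function name hard_link_shares_inode is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file at `path`, hard-link it as `link_path`, and stat(2) both
 * names.  Returns 1 if both names report the same device and inode number
 * with a link count of 2, 0 if they do not, and -1 if a system call fails.
 * Both names are removed again before returning.
 */
int hard_link_shares_inode(const char *path, const char *link_path)
{
	struct stat a, b;
	int result = -1;
	int fd;

	assert(path != NULL && link_path != NULL);

	/* O_EXCL: refuse to reuse a file that already exists. */
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	if (link(path, link_path) == 0) {
		if (stat(path, &a) == 0 && stat(link_path, &b) == 0)
			result = (a.st_dev == b.st_dev &&
				  a.st_ino == b.st_ino &&
				  a.st_nlink == 2 && b.st_nlink == 2);
		unlink(link_path);
	}
	unlink(path);
	return result;
}
```

Calling it with two fresh names in a scratch directory should return 1 on filesystems that support hard links: two pathnames (two dentries), one inode.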
The File Object The File Object
--------------- ---------------
Opening a file requires another operation: allocation of a file Opening a file requires another operation: allocation of a file
structure (this is the kernel-side implementation of file structure (this is the kernel-side implementation of file descriptors).
descriptors). The freshly allocated file structure is initialized with The freshly allocated file structure is initialized with a pointer to
a pointer to the dentry and a set of file operation member functions. the dentry and a set of file operation member functions. These are
These are taken from the inode data. The open() file method is then taken from the inode data. The open() file method is then called so the
called so the specific filesystem implementation can do its work. You specific filesystem implementation can do its work. You can see that
can see that this is another switch performed by the VFS. The file this is another switch performed by the VFS. The file structure is
structure is placed into the file descriptor table for the process. placed into the file descriptor table for the process.
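Illustrative aside, not part of the patch: the "switch" described above can be sketched as a toy userspace model. Every name here (toy_file, toy_file_ops, toy_open and so on) is hypothetical and only mimics the shape of the kernel structures; it is not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Per-filesystem method table, standing in for the file operations. */
struct toy_file_ops {
	int  (*open)(struct toy_file *f);
	long (*read)(struct toy_file *f, char *buf, size_t len);
};

/* The inode carries the method table and, in this toy, the file data. */
struct toy_inode {
	const struct toy_file_ops *fops;
	const char *data;
};

struct toy_dentry {
	const char *name;
	struct toy_inode *inode;
};

/* Toy counterpart of the file structure behind a file descriptor. */
struct toy_file {
	struct toy_dentry *dentry;
	const struct toy_file_ops *ops;
	size_t pos;
};

static int memfs_open(struct toy_file *f)
{
	f->pos = 0;
	return 0;
}

static long memfs_read(struct toy_file *f, char *buf, size_t len)
{
	const char *data = f->dentry->inode->data;
	size_t left = strlen(data) - f->pos;

	if (len > left)
		len = left;
	memcpy(buf, data + f->pos, len);
	f->pos += len;
	return (long)len;
}

/* Methods of one hypothetical in-memory filesystem. */
const struct toy_file_ops memfs_ops = {
	.open = memfs_open,
	.read = memfs_read,
};

/* Initialize the file from the dentry, then let the filesystem do its work. */
int toy_open(struct toy_dentry *d, struct toy_file *f)
{
	assert(d != NULL && d->inode != NULL && f != NULL);
	f->dentry = d;
	f->ops = d->inode->fops;	/* methods are taken from the inode */
	f->pos = 0;
	return f->ops->open ? f->ops->open(f) : 0;
}

/* Generic entry point: dispatch through the method table. */
long toy_read(struct toy_file *f, char *buf, size_t len)
{
	return f->ops->read(f, buf, len);
}
```

The generic toy_open() and toy_read() never name a specific filesystem; they only follow the pointers installed by whoever set up the inode.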
Reading, writing and closing files (and other assorted VFS operations) Reading, writing and closing files (and other assorted VFS operations)
is done by using the userspace file descriptor to grab the appropriate is done by using the userspace file descriptor to grab the appropriate
...@@ -93,11 +90,12 @@ functions: ...@@ -93,11 +90,12 @@ functions:
extern int unregister_filesystem(struct file_system_type *); extern int unregister_filesystem(struct file_system_type *);
The passed struct file_system_type describes your filesystem. When a The passed struct file_system_type describes your filesystem. When a
request is made to mount a filesystem onto a directory in your namespace, request is made to mount a filesystem onto a directory in your
the VFS will call the appropriate mount() method for the specific namespace, the VFS will call the appropriate mount() method for the
filesystem. New vfsmount referring to the tree returned by ->mount() specific filesystem. New vfsmount referring to the tree returned by
will be attached to the mountpoint, so that when pathname resolution ->mount() will be attached to the mountpoint, so that when pathname
reaches the mountpoint it will jump into the root of that vfsmount. resolution reaches the mountpoint it will jump into the root of that
vfsmount.
You can see all filesystems that are registered to the kernel in the You can see all filesystems that are registered to the kernel in the
file /proc/filesystems. file /proc/filesystems.
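Illustrative aside, not part of the patch: a small userspace sketch that checks /proc/filesystems for a given filesystem type. It assumes the usual layout of that file (an optional "nodev" marker, a tab, then the type name); the function name filesystem_is_registered is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if `name` is listed in /proc/filesystems, 0 if it is not,
 * and -1 if the file cannot be opened (for example, /proc not mounted).
 */
int filesystem_is_registered(const char *name)
{
	char line[256];
	int found = 0;
	FILE *fp;

	assert(name != NULL);

	fp = fopen("/proc/filesystems", "r");
	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		char *fs;

		line[strcspn(line, "\n")] = '\0';
		/* The type name follows the last tab on the line. */
		fs = strrchr(line, '\t');
		fs = fs ? fs + 1 : line;
		if (strcmp(fs, name) == 0) {
			found = 1;
			break;
		}
	}
	fclose(fp);
	return found;
}
```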
...@@ -156,21 +154,21 @@ The mount() method must return the root dentry of the tree requested by ...@@ -156,21 +154,21 @@ The mount() method must return the root dentry of the tree requested by
caller. An active reference to its superblock must be grabbed and the caller. An active reference to its superblock must be grabbed and the
superblock must be locked. On failure it should return ERR_PTR(error). superblock must be locked. On failure it should return ERR_PTR(error).
The arguments match those of mount(2) and their interpretation The arguments match those of mount(2) and their interpretation depends
depends on filesystem type. E.g. for block filesystems, dev_name is on filesystem type. E.g. for block filesystems, dev_name is interpreted
interpreted as block device name, that device is opened and if it as block device name, that device is opened and if it contains a
contains a suitable filesystem image the method creates and initializes suitable filesystem image the method creates and initializes struct
struct super_block accordingly, returning its root dentry to caller. super_block accordingly, returning its root dentry to caller.
->mount() may choose to return a subtree of existing filesystem - it ->mount() may choose to return a subtree of existing filesystem - it
doesn't have to create a new one. The main result from the caller's doesn't have to create a new one. The main result from the caller's
point of view is a reference to dentry at the root of (sub)tree to point of view is a reference to dentry at the root of (sub)tree to be
be attached; creation of new superblock is a common side effect. attached; creation of new superblock is a common side effect.
The most interesting member of the superblock structure that the The most interesting member of the superblock structure that the mount()
mount() method fills in is the "s_op" field. This is a pointer to method fills in is the "s_op" field. This is a pointer to a "struct
a "struct super_operations" which describes the next level of the super_operations" which describes the next level of the filesystem
filesystem implementation. implementation.
Usually, a filesystem uses one of the generic mount() implementations Usually, a filesystem uses one of the generic mount() implementations
and provides a fill_super() callback instead. The generic variants are: and provides a fill_super() callback instead. The generic variants are:
...@@ -317,16 +315,16 @@ or bottom half). ...@@ -317,16 +315,16 @@ or bottom half).
implementations will cause holdoff problems due to large scan batch implementations will cause holdoff problems due to large scan batch
sizes. sizes.
Whoever sets up the inode is responsible for filling in the "i_op" field. This Whoever sets up the inode is responsible for filling in the "i_op"
is a pointer to a "struct inode_operations" which describes the methods that field. This is a pointer to a "struct inode_operations" which describes
can be performed on individual inodes. the methods that can be performed on individual inodes.
struct xattr_handlers struct xattr_handlers
--------------------- ---------------------
On filesystems that support extended attributes (xattrs), the s_xattr On filesystems that support extended attributes (xattrs), the s_xattr
superblock field points to a NULL-terminated array of xattr handlers. Extended superblock field points to a NULL-terminated array of xattr handlers.
attributes are name:value pairs. Extended attributes are name:value pairs.
name: Indicates that the handler matches attributes with the specified name name: Indicates that the handler matches attributes with the specified name
(such as "system.posix_acl_access"); the prefix field must be NULL. (such as "system.posix_acl_access"); the prefix field must be NULL.
...@@ -346,9 +344,9 @@ attributes are name:value pairs. ...@@ -346,9 +344,9 @@ attributes are name:value pairs.
attribute. This method is called by the the setxattr(2) and attribute. This method is called by the the setxattr(2) and
removexattr(2) system calls. removexattr(2) system calls.
When none of the xattr handlers of a filesystem match the specified attribute When none of the xattr handlers of a filesystem match the specified
name or when a filesystem doesn't support extended attributes, the various attribute name or when a filesystem doesn't support extended attributes,
*xattr(2) system calls return -EOPNOTSUPP. the various *xattr(2) system calls return -EOPNOTSUPP.
The Inode Object The Inode Object
...@@ -360,8 +358,8 @@ An inode object represents an object within the filesystem. ...@@ -360,8 +358,8 @@ An inode object represents an object within the filesystem.
struct inode_operations struct inode_operations
----------------------- -----------------------
This describes how the VFS can manipulate an inode in your This describes how the VFS can manipulate an inode in your filesystem.
filesystem. As of kernel 2.6.22, the following members are defined: As of kernel 2.6.22, the following members are defined:
struct inode_operations { struct inode_operations {
int (*create) (struct inode *,struct dentry *, umode_t, bool); int (*create) (struct inode *,struct dentry *, umode_t, bool);
...@@ -517,42 +515,40 @@ The Address Space Object ...@@ -517,42 +515,40 @@ The Address Space Object
======================== ========================
The address space object is used to group and manage pages in the page The address space object is used to group and manage pages in the page
cache. It can be used to keep track of the pages in a file (or cache. It can be used to keep track of the pages in a file (or anything
anything else) and also track the mapping of sections of the file into else) and also track the mapping of sections of the file into process
process address spaces. address spaces.
There are a number of distinct yet related services that an There are a number of distinct yet related services that an
address-space can provide. These include communicating memory address-space can provide. These include communicating memory pressure,
pressure, page lookup by address, and keeping track of pages tagged as page lookup by address, and keeping track of pages tagged as Dirty or
Dirty or Writeback. Writeback.
The first can be used independently to the others. The VM can try to The first can be used independently to the others. The VM can try to
either write dirty pages in order to clean them, or release clean either write dirty pages in order to clean them, or release clean pages
pages in order to reuse them. To do this it can call the ->writepage in order to reuse them. To do this it can call the ->writepage method
method on dirty pages, and ->releasepage on clean pages with on dirty pages, and ->releasepage on clean pages with PagePrivate set.
PagePrivate set. Clean pages without PagePrivate and with no external Clean pages without PagePrivate and with no external references will be
references will be released without notice being given to the released without notice being given to the address_space.
address_space.
To achieve this functionality, pages need to be placed on an LRU with To achieve this functionality, pages need to be placed on an LRU with
lru_cache_add and mark_page_active needs to be called whenever the lru_cache_add and mark_page_active needs to be called whenever the page
page is used. is used.
Pages are normally kept in a radix tree index by ->index. This tree Pages are normally kept in a radix tree index by ->index. This tree
maintains information about the PG_Dirty and PG_Writeback status of maintains information about the PG_Dirty and PG_Writeback status of each
each page, so that pages with either of these flags can be found page, so that pages with either of these flags can be found quickly.
quickly.
The Dirty tag is primarily used by mpage_writepages - the default The Dirty tag is primarily used by mpage_writepages - the default
->writepages method. It uses the tag to find dirty pages to call ->writepages method. It uses the tag to find dirty pages to call
->writepage on. If mpage_writepages is not used (i.e. the address ->writepage on. If mpage_writepages is not used (i.e. the address
provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is almost
almost unused. write_inode_now and sync_inode do use it (through unused. write_inode_now and sync_inode do use it (through
__sync_single_inode) to check if ->writepages has been successful in __sync_single_inode) to check if ->writepages has been successful in
writing out the whole address_space. writing out the whole address_space.
The Writeback tag is used by filemap*wait* and sync_page* functions, The Writeback tag is used by filemap*wait* and sync_page* functions, via
via filemap_fdatawait_range, to wait for all writeback to complete. filemap_fdatawait_range, to wait for all writeback to complete.
An address_space handler may attach extra information to a page, An address_space handler may attach extra information to a page,
typically using the 'private' field in the 'struct page'. If such typically using the 'private' field in the 'struct page'. If such
...@@ -562,25 +558,24 @@ handler to deal with that data. ...@@ -562,25 +558,24 @@ handler to deal with that data.
An address space acts as an intermediate between storage and An address space acts as an intermediate between storage and
application. Data is read into the address space a whole page at a application. Data is read into the address space a whole page at a
time, and provided to the application either by copying of the page, time, and provided to the application either by copying of the page, or
or by memory-mapping the page. by memory-mapping the page. Data is written into the address space by
Data is written into the address space by the application, and then the application, and then written-back to storage typically in whole
written-back to storage typically in whole pages, however the pages, however the address_space has finer control of write sizes.
address_space has finer control of write sizes.
The read process essentially only requires 'readpage'. The write The read process essentially only requires 'readpage'. The write
process is more complicated and uses write_begin/write_end or process is more complicated and uses write_begin/write_end or
set_page_dirty to write data into the address_space, and writepage set_page_dirty to write data into the address_space, and writepage and
and writepages to writeback data to storage. writepages to writeback data to storage.
Adding and removing pages to/from an address_space is protected by the Adding and removing pages to/from an address_space is protected by the
inode's i_mutex. inode's i_mutex.
When data is written to a page, the PG_Dirty flag should be set. It When data is written to a page, the PG_Dirty flag should be set. It
typically remains set until writepage asks for it to be written. This typically remains set until writepage asks for it to be written. This
should clear PG_Dirty and set PG_Writeback. It can be actually should clear PG_Dirty and set PG_Writeback. It can be actually written
written at any point after PG_Dirty is clear. Once it is known to be at any point after PG_Dirty is clear. Once it is known to be safe,
safe, PG_Writeback is cleared. PG_Writeback is cleared.
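Illustrative aside, not part of the patch: the flag lifecycle described above, sketched as a toy state model. The TOY_PG_* flags and toy_* functions are hypothetical stand-ins and do not reproduce the kernel's page flag helpers.

```c
#include <assert.h>

enum {
	TOY_PG_DIRTY     = 1 << 0,
	TOY_PG_WRITEBACK = 1 << 1,
};

struct toy_page {
	unsigned int flags;
};

/* Data was written into the page: mark it dirty. */
void toy_write_to_page(struct toy_page *p)
{
	p->flags |= TOY_PG_DIRTY;
}

/*
 * Start writeback: clear Dirty and set Writeback.  Returns 1 if
 * writeback was started, 0 if the page was clean.
 */
int toy_writepage(struct toy_page *p)
{
	if (!(p->flags & TOY_PG_DIRTY))
		return 0;
	p->flags &= ~TOY_PG_DIRTY;
	p->flags |= TOY_PG_WRITEBACK;
	return 1;
}

/* The data is known to be safe on storage: clear Writeback. */
void toy_end_writeback(struct toy_page *p)
{
	assert(p->flags & TOY_PG_WRITEBACK);
	p->flags &= ~TOY_PG_WRITEBACK;
}
```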
Writeback makes use of a writeback_control structure to direct the Writeback makes use of a writeback_control structure to direct the
operations. This gives the the writepage and writepages operations some operations. This gives the the writepage and writepages operations some
...@@ -609,9 +604,10 @@ file descriptors should get back an error is not possible. ...@@ -609,9 +604,10 @@ file descriptors should get back an error is not possible.
Instead, the generic writeback error tracking infrastructure in the Instead, the generic writeback error tracking infrastructure in the
kernel settles for reporting errors to fsync on all file descriptions kernel settles for reporting errors to fsync on all file descriptions
that were open at the time that the error occurred. In a situation with that were open at the time that the error occurred. In a situation with
multiple writers, all of them will get back an error on a subsequent fsync, multiple writers, all of them will get back an error on a subsequent
even if all of the writes done through that particular file descriptor fsync, even if all of the writes done through that particular file
succeeded (or even if there were no writes on that file descriptor at all). descriptor succeeded (or even if there were no writes on that file
descriptor at all).
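Illustrative aside, not part of the patch: from the userspace side, the practical consequence is that a writer should check the return value of fsync(2), since a writeback error may only surface there. A minimal sketch, assuming a POSIX system; the function name write_and_sync is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes from `buf` to `path`, then fsync(2) and close(2).
 * Returns 0 on success or a negative errno value for the first failure.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int err = 0;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* A successful write() only reached the page cache; ask for writeback. */
	if (fsync(fd) < 0 && err == 0)
		err = -errno;
	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```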
Filesystems that wish to use this infrastructure should call Filesystems that wish to use this infrastructure should call
mapping_set_error to record the error in the address_space when it mapping_set_error to record the error in the address_space when it
...@@ -623,8 +619,8 @@ point in the stream of errors emitted by the backing device(s). ...@@ -623,8 +619,8 @@ point in the stream of errors emitted by the backing device(s).
struct address_space_operations struct address_space_operations
------------------------------- -------------------------------
This describes how the VFS can manipulate mapping of a file to page cache in This describes how the VFS can manipulate mapping of a file to page
your filesystem. The following members are defined: cache in your filesystem. The following members are defined:
struct address_space_operations { struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc); int (*writepage)(struct page *page, struct writeback_control *wbc);
...@@ -1231,8 +1227,8 @@ filesystems. ...@@ -1231,8 +1227,8 @@ filesystems.
Showing options Showing options
--------------- ---------------
If a filesystem accepts mount options, it must define show_options() If a filesystem accepts mount options, it must define show_options() to
to show all the currently active options. The rules are: show all the currently active options. The rules are:
- options MUST be shown which are not default or their values differ - options MUST be shown which are not default or their values differ
from the default from the default
...@@ -1240,14 +1236,14 @@ to show all the currently active options. The rules are: ...@@ -1240,14 +1236,14 @@ to show all the currently active options. The rules are:
- options MAY be shown which are enabled by default or have their - options MAY be shown which are enabled by default or have their
default value default value
Options used only internally between a mount helper and the kernel Options used only internally between a mount helper and the kernel (such
(such as file descriptors), or which only have an effect during the as file descriptors), or which only have an effect during the mounting
mounting (such as ones controlling the creation of a journal) are exempt (such as ones controlling the creation of a journal) are exempt from the
from the above rules. above rules.
The underlying reason for the above rules is to make sure, that a The underlying reason for the above rules is to make sure, that a mount
mount can be accurately replicated (e.g. umounting and mounting again) can be accurately replicated (e.g. umounting and mounting again) based
based on the information found in /proc/mounts. on the information found in /proc/mounts.
Resources Resources
========= =========
......