Commit a8f6c2e5 authored by Darrick J. Wong

xfs: document the motivation for online fsck design

Start the first chapter of the online fsck design documentation.
This covers the motivations for creating this in the first place.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
parent 09a9639e
@@ -123,4 +123,5 @@ Documentation for filesystem implementations.
vfat
xfs-delayed-logging-design
xfs-self-describing-metadata
xfs-online-fsck-design
zonefs
.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:
..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================
This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:
- To help kernel distributors understand exactly what the XFS online fsck
feature is, and issues about which they should be aware.
- To help people reading the code to familiarize themselves with the relevant
concepts and design points before they start digging into the code.
- To help developers maintaining the system by capturing the reasons
supporting higher level decision making.
As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.
This document is licensed under the terms of the GNU General Public License, v2.
The primary author is Darrick J. Wong.
This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process works
and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.
.. contents:: Table of Contents
   :local:

1. What is a Filesystem Check?
==============================
A Unix filesystem has four main responsibilities:
- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,
- Virtualize physical storage media across those names,
- Retrieve the named data blobs at any time, and
- Examine resource usage.
Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.
The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
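
The cross-referencing idea can be sketched with a toy model.
The record layout and the names ``Extent`` and ``cross_reference`` below are
hypothetical and are not XFS ondisk structures or kernel functions; the sketch
only shows how primary metadata (file block mappings) might be compared against
secondary metadata (reverse mappings) to find records that disagree.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Hypothetical mapping record; not an XFS ondisk format."""
    owner: int   # inode number that claims the blocks
    start: int   # first physical block
    length: int  # number of blocks


def cross_reference(forward, reverse):
    """Compare forward mappings against reverse mappings.

    Returns two sorted lists: forward records with no matching reverse
    record, and reverse records with no matching forward record.
    """
    fwd, rev = set(forward), set(reverse)
    return sorted(fwd - rev), sorted(rev - fwd)


# Inode 129 maps blocks that the reverse index does not know about, and
# the reverse index holds a record for inode 130 that no file claims.
forward = [Extent(128, 10, 4), Extent(129, 20, 2)]
reverse = [Extent(128, 10, 4), Extent(130, 30, 1)]
missing, orphans = cross_reference(forward, reverse)
```

A real checker would also have to cope with overlapping and partially matching
records; exact set comparison is used here only to keep the sketch short.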
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.
Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which means that fsck can only respond to errors by erasing files until
errors are no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both goals: restoring consistency and maximizing
the data recovered.
+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+
TLDR; Show Me the Code!
-----------------------
Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.
Existing Tools
--------------
The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:
The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies in the
metadata, though it lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.
The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead IO appropriately to reduce I/O waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.
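
As a rough illustration of rebuilding space usage metadata from observed file
metadata, the following sketch derives the free extents of one region from the
extents that files were seen to occupy.
The function name ``rebuild_free_space`` and the ``(start, length)`` tuple
layout are assumptions made for this example; this is not how ``xfs_repair``
is implemented.

```python
def rebuild_free_space(total_blocks, used_extents):
    """Return free (start, length) extents for a region of total_blocks.

    used_extents is an iterable of (start, length) tuples observed while
    scanning file metadata; the tuples may overlap or arrive unsorted.
    """
    free = []
    cursor = 0  # first block not yet accounted for
    for start, length in sorted(used_extents):
        if start > cursor:
            # Gap between the previous used extent and this one.
            free.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < total_blocks:
        # Trailing free space after the last used extent.
        free.append((cursor, total_blocks - cursor))
    return free


free = rebuild_free_space(100, [(10, 5), (40, 10), (12, 8)])
```

Whatever the scan did not observe as in use is treated as free, which is why
the result is only as trustworthy as the file metadata it was derived from.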
Problem Statement
-----------------
The current XFS tools leave several problems unsolved:
1. **User programs** suddenly **lose access** to the filesystem when unexpected
shutdowns occur as a result of silent corruptions in the metadata.
These occur **unpredictably** and often without warning.
2. **Users** experience a **total loss of service** during the recovery period
after an **unexpected shutdown** occurs.
3. **Users** experience a **total loss of service** if the filesystem is taken
offline to **look for problems** proactively.
4. **Data owners** cannot **check the integrity** of their stored data without
reading all of it.
This may expose them to substantial billing costs when a linear media scan
performed by the storage system administrator might suffice.
5. **System administrators** cannot **schedule** a maintenance window to deal
with corruptions if they **lack the means** to assess filesystem health
while the filesystem is online.
6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
health when doing so requires **manual intervention** and downtime.
7. **Users** can be tricked into **doing things they do not desire** when
malicious actors **exploit quirks of Unicode** to place misleading names
in directories.
Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.
This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.
+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+
The naming hierarchy is broken up into objects known as directories and files
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.
While this is going on, other parts continue processing IO requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.
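
The benefit of sharding for targeted checks can be sketched as follows.
This is a userspace toy model under stated assumptions: the class
``AllocationGroup``, the functions ``check_ag`` and ``allocate``, and the use
of one lock per group are all hypothetical and do not describe the kernel's
locking scheme.
It shows only that checking one shard does not have to block activity in
another.

```python
import threading


class AllocationGroup:
    """Hypothetical shard holding a summary counter and its records."""

    def __init__(self, agno, free_blocks, free_extents):
        self.agno = agno
        self.lock = threading.Lock()
        self.free_blocks = free_blocks          # summary metadata
        self.free_extents = list(free_extents)  # (start, length) records


def check_ag(ag):
    """Check one shard: does the summary counter match the records?"""
    with ag.lock:
        return ag.free_blocks == sum(length for _, length in ag.free_extents)


def allocate(ag, nblocks):
    """Stand-in for regular activity: take blocks from the first fit."""
    with ag.lock:
        for i, (start, length) in enumerate(ag.free_extents):
            if length >= nblocks:
                ag.free_extents[i] = (start + nblocks, length - nblocks)
                ag.free_blocks -= nblocks
                return start
    return None


# Group 1 starts with a summary counter that disagrees with its records.
ags = [AllocationGroup(0, 8, [(0, 8)]), AllocationGroup(1, 9, [(0, 8)])]

ags[0].lock.acquire()       # pretend a check of group 0 is in progress
got = allocate(ags[1], 3)   # group 1 still services requests meanwhile
ags[0].lock.release()

results = [check_ag(ag) for ag in ags]
```

Only the lock of the group under examination is held, so the allocation in the
other group proceeds without waiting.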
In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.