Commit 96485e44 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The siginificant new ext4 feature this time around is Harshad's new
  fast_commit mode.

  In addition, thanks to Mauricio for fixing a race where mmap'ed pages
  that are being changed in parallel with a data=journal transaction
  commit could result in bad checksums in the failure that could cause
  journal replays to fail.

  Also notable is Ritesh's buffered write optimization which can result
  in significant improvements on parallel write workloads. (The kernel
  test robot reported a 330.6% improvement on fio.write_iops on a 96
  core system using DAX)

  Besides that, we have the usual miscellaneous cleanups and bug fixes"

Link: https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
  ext4: fix invalid inode checksum
  ext4: add fast commit stats in procfs
  ext4: add a mount opt to forcefully turn fast commits on
  ext4: fast commit recovery path
  jbd2: fast commit recovery path
  ext4: main fast-commit commit path
  jbd2: add fast commit machinery
  ext4 / jbd2: add fast commit initialization
  ext4: add fast_commit feature and handling for extended mount options
  doc: update ext4 and journalling docs to include fast commit feature
  ext4: Detect already used quota file early
  jbd2: avoid transaction reuse after reformatting
  ext4: use the normal helper to get the actual inode
  ext4: fix bs < ps issue reported with dioread_nolock mount opt
  ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
  ext4: data=journal: fixes for ext4_page_mkwrite()
  jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()
  jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
  ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()
  ext4: use ext4_sb_bread() instead of sb_bread()
  ...
parents f56e65df 13221811
......@@ -28,6 +28,17 @@ metadata are written to disk through the journal. This is slower but
safest. If ``data=writeback``, dirty data blocks are not flushed to the
disk before the metadata are written to disk through the journal.
In case of ``data=ordered`` mode, Ext4 also supports fast commits which
help reduce commit latency significantly. The default ``data=ordered``
mode works by logging metadata blocks to the journal. In fast commit
mode, Ext4 only stores the minimal delta needed to recreate the
affected metadata in fast commit space that is shared with JBD2.
Once the fast commit area fills in or if fast commit is not possible
or if JBD2 commit timer goes off, Ext4 performs a traditional full commit.
A full commit invalidates all the fast commits that happened before
it and thus it makes the fast commit area empty for further fast
commits. This feature needs to be enabled at mkfs time.
The journal inode is typically inode 8. The first 68 bytes of the
journal inode are replicated in the ext4 superblock. The journal itself
is normal (but hidden) file within the filesystem. The file usually
......@@ -609,3 +620,58 @@ bytes long (but uses a full block):
- h\_commit\_nsec
- Nanoseconds component of the above timestamp.
Fast commits
~~~~~~~~~~~~
Fast commit area is organized as a log of tag length values. Each TLV has
a ``struct ext4_fc_tl`` in the beginning which stores the tag and the length
of the entire field. It is followed by variable length tag specific value.
Here is the list of supported tags and their meanings:
.. list-table::
:widths: 8 20 20 32
:header-rows: 1
* - Tag
- Meaning
- Value struct
- Description
* - EXT4_FC_TAG_HEAD
- Fast commit area header
- ``struct ext4_fc_head``
- Stores the TID of the transaction after which these fast commits should
be applied.
* - EXT4_FC_TAG_ADD_RANGE
- Add extent to inode
- ``struct ext4_fc_add_range``
- Stores the inode number and extent to be added in this inode
* - EXT4_FC_TAG_DEL_RANGE
- Remove logical offsets to inode
- ``struct ext4_fc_del_range``
- Stores the inode number and the logical offset range that needs to be
removed
* - EXT4_FC_TAG_CREAT
- Create directory entry for a newly created file
- ``struct ext4_fc_dentry_info``
- Stores the parent inode number, inode number and directory entry of the
newly created file
* - EXT4_FC_TAG_LINK
- Link a directory entry to an inode
- ``struct ext4_fc_dentry_info``
- Stores the parent inode number, inode number and directory entry
* - EXT4_FC_TAG_UNLINK
- Unlink a directory entry of an inode
- ``struct ext4_fc_dentry_info``
- Stores the parent inode number, inode number and directory entry
* - EXT4_FC_TAG_PAD
- Padding (unused area)
- None
- Unused bytes in the fast commit area.
* - EXT4_FC_TAG_TAIL
- Mark the end of a fast commit
- ``struct ext4_fc_tail``
- Stores the TID of the commit, CRC of the fast commit of which this tag
represents the end of
......@@ -132,6 +132,39 @@ The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing
these calls.
Fast commits
~~~~~~~~~~~~
JBD2 to also allows you to perform file-system specific delta commits known as
fast commits. In order to use fast commits, you first need to call
:c:func:`jbd2_fc_init` and tell how many blocks at the end of journal
area should be reserved for fast commits. Along with that, you will also need
to set following callbacks that perform correspodning work:
`journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and
fast commit.
`journal->j_fc_replay_cb`: Replay function called for replay of fast commit
blocks.
File system is free to perform fast commits as and when it wants as long as it
gets permission from JBD2 to do so by calling the function
:c:func:`jbd2_fc_begin_commit()`. Once a fast commit is done, the client
file system should tell JBD2 about it by calling
:c:func:`jbd2_fc_end_commit()`. If file system wants JBD2 to perform a full
commit immediately after stopping the fast commit it can do so by calling
:c:func:`jbd2_fc_end_commit_fallback()`. This is useful if fast commit operation
fails for some reason and the only way to guarantee consistency is for JBD2 to
perform the full traditional commit.
JBD2 helper functions to manage fast commit buffers. File system can use
:c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()` to allocate
and wait on IO completion of fast commit buffers.
Currently, only Ext4 implements fast commits. For details of its implementation
of fast commits, please refer to the top level comments in
fs/ext4/fast_commit.c.
Summary
~~~~~~~
......
......@@ -10,7 +10,7 @@ ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \
xattr_user.o
xattr_user.o fast_commit.o
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
......
......@@ -242,6 +242,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
if (IS_ERR(handle))
return PTR_ERR(handle);
ext4_fc_start_update(inode);
if ((type == ACL_TYPE_ACCESS) && acl) {
error = posix_acl_update_mode(inode, &mode, &acl);
......@@ -259,6 +260,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
}
out_stop:
ext4_journal_stop(handle);
ext4_fc_stop_update(inode);
if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
return error;
......
......@@ -368,7 +368,12 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
struct buffer_head *bh)
{
ext4_fsblk_t blk;
struct ext4_group_info *grp = ext4_get_group_info(sb, block_group);
struct ext4_group_info *grp;
if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
grp = ext4_get_group_info(sb, block_group);
if (buffer_verified(bh))
return 0;
......@@ -495,10 +500,9 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
*/
set_buffer_new(bh);
trace_ext4_read_block_bitmap_load(sb, block_group, ignore_locked);
bh->b_end_io = ext4_end_bitmap_read;
get_bh(bh);
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO |
(ignore_locked ? REQ_RAHEAD : 0), bh);
ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO |
(ignore_locked ? REQ_RAHEAD : 0),
ext4_end_bitmap_read);
return bh;
verify:
err = ext4_validate_block_bitmap(sb, desc, block_group, bh);
......
......@@ -131,7 +131,7 @@ static void debug_print_tree(struct ext4_sb_info *sbi)
printk(KERN_INFO "System zones: ");
rcu_read_lock();
system_blks = rcu_dereference(sbi->system_blks);
system_blks = rcu_dereference(sbi->s_system_blks);
node = rb_first(&system_blks->root);
while (node) {
entry = rb_entry(node, struct ext4_system_zone, node);
......@@ -261,7 +261,7 @@ int ext4_setup_system_zone(struct super_block *sb)
* with ext4_data_block_valid() accessing the rbtree at the same
* time.
*/
rcu_assign_pointer(sbi->system_blks, system_blks);
rcu_assign_pointer(sbi->s_system_blks, system_blks);
if (test_opt(sb, DEBUG))
debug_print_tree(sbi);
......@@ -286,9 +286,9 @@ void ext4_release_system_zone(struct super_block *sb)
{
struct ext4_system_blocks *system_blks;
system_blks = rcu_dereference_protected(EXT4_SB(sb)->system_blks,
system_blks = rcu_dereference_protected(EXT4_SB(sb)->s_system_blks,
lockdep_is_held(&sb->s_umount));
rcu_assign_pointer(EXT4_SB(sb)->system_blks, NULL);
rcu_assign_pointer(EXT4_SB(sb)->s_system_blks, NULL);
if (system_blks)
call_rcu(&system_blks->rcu, ext4_destroy_system_zone);
......@@ -319,7 +319,7 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
* mount option.
*/
rcu_read_lock();
system_blks = rcu_dereference(sbi->system_blks);
system_blks = rcu_dereference(sbi->s_system_blks);
if (system_blks == NULL)
goto out_rcu;
......
......@@ -674,7 +674,7 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
{
struct qstr qstr = {.name = str, .len = len };
const struct dentry *parent = READ_ONCE(dentry->d_parent);
const struct inode *inode = READ_ONCE(parent->d_inode);
const struct inode *inode = d_inode_rcu(parent);
char strbuf[DNAME_INLINE_LEN];
if (!inode || !IS_CASEFOLDED(inode) ||
......@@ -706,7 +706,7 @@ static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
{
const struct ext4_sb_info *sbi = EXT4_SB(dentry->d_sb);
const struct unicode_map *um = sbi->s_encoding;
const struct inode *inode = READ_ONCE(dentry->d_inode);
const struct inode *inode = d_inode_rcu(dentry);
unsigned char *norm;
int len, ret = 0;
......
This diff is collapsed.
......@@ -100,7 +100,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
return ERR_PTR(err);
journal = EXT4_SB(sb)->s_journal;
if (!journal)
if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
return ext4_get_nojournal();
return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
GFP_NOFS, type, line);
......
This diff is collapsed.
......@@ -311,6 +311,9 @@ void ext4_es_find_extent_range(struct inode *inode,
ext4_lblk_t lblk, ext4_lblk_t end,
struct extent_status *es)
{
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;
trace_ext4_es_find_extent_range_enter(inode, lblk);
read_lock(&EXT4_I(inode)->i_es_lock);
......@@ -361,6 +364,9 @@ bool ext4_es_scan_range(struct inode *inode,
{
bool ret;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return false;
read_lock(&EXT4_I(inode)->i_es_lock);
ret = __es_scan_range(inode, matching_fn, lblk, end);
read_unlock(&EXT4_I(inode)->i_es_lock);
......@@ -404,6 +410,9 @@ bool ext4_es_scan_clu(struct inode *inode,
{
bool ret;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return false;
read_lock(&EXT4_I(inode)->i_es_lock);
ret = __es_scan_clu(inode, matching_fn, lblk);
read_unlock(&EXT4_I(inode)->i_es_lock);
......@@ -812,6 +821,9 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
int err = 0;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
es_debug("add [%u/%u) %llu %x to extent status tree of inode %lu\n",
lblk, len, pblk, status, inode->i_ino);
......@@ -873,6 +885,9 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
struct extent_status newes;
ext4_lblk_t end = lblk + len - 1;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;
newes.es_lblk = lblk;
newes.es_len = len;
ext4_es_store_pblock_status(&newes, pblk, status);
......@@ -908,6 +923,9 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
struct rb_node *node;
int found = 0;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
trace_ext4_es_lookup_extent_enter(inode, lblk);
es_debug("lookup extent in block %u\n", lblk);
......@@ -1419,6 +1437,9 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
int err = 0;
int reserved = 0;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
trace_ext4_es_remove_extent(inode, lblk, len);
es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
lblk, len, inode->i_ino);
......@@ -1969,6 +1990,9 @@ int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
struct extent_status newes;
int err = 0;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
lblk, inode->i_ino);
......
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __FAST_COMMIT_H__
#define __FAST_COMMIT_H__
/* Number of blocks in journal area to allocate for fast commits */
#define EXT4_NUM_FC_BLKS 256
/* Fast commit tags */
#define EXT4_FC_TAG_ADD_RANGE 0x0001
#define EXT4_FC_TAG_DEL_RANGE 0x0002
#define EXT4_FC_TAG_CREAT 0x0003
#define EXT4_FC_TAG_LINK 0x0004
#define EXT4_FC_TAG_UNLINK 0x0005
#define EXT4_FC_TAG_INODE 0x0006
#define EXT4_FC_TAG_PAD 0x0007
#define EXT4_FC_TAG_TAIL 0x0008
#define EXT4_FC_TAG_HEAD 0x0009
#define EXT4_FC_SUPPORTED_FEATURES 0x0
/* On disk fast commit tlv value structures */
/* Fast commit on disk tag length structure */
struct ext4_fc_tl {
__le16 fc_tag;
__le16 fc_len;
};
/* Value structure for tag EXT4_FC_TAG_HEAD. */
struct ext4_fc_head {
__le32 fc_features;
__le32 fc_tid;
};
/* Value structure for EXT4_FC_TAG_ADD_RANGE. */
struct ext4_fc_add_range {
__le32 fc_ino;
__u8 fc_ex[12];
};
/* Value structure for tag EXT4_FC_TAG_DEL_RANGE. */
struct ext4_fc_del_range {
__le32 fc_ino;
__le32 fc_lblk;
__le32 fc_len;
};
/*
* This is the value structure for tags EXT4_FC_TAG_CREAT, EXT4_FC_TAG_LINK
* and EXT4_FC_TAG_UNLINK.
*/
struct ext4_fc_dentry_info {
__le32 fc_parent_ino;
__le32 fc_ino;
u8 fc_dname[0];
};
/* Value structure for EXT4_FC_TAG_INODE and EXT4_FC_TAG_INODE_PARTIAL. */
struct ext4_fc_inode {
__le32 fc_ino;
__u8 fc_raw_inode[0];
};
/* Value structure for tag EXT4_FC_TAG_TAIL. */
struct ext4_fc_tail {
__le32 fc_tid;
__le32 fc_crc;
};
/*
* In memory list of dentry updates that are performed on the file
* system used by fast commit code.
*/
struct ext4_fc_dentry_update {
int fcd_op; /* Type of update create / unlink / link */
int fcd_parent; /* Parent inode number */
int fcd_ino; /* Inode number */
struct qstr fcd_name; /* Dirent name */
unsigned char fcd_iname[DNAME_INLINE_LEN]; /* Dirent name string */
struct list_head fcd_list;
};
/*
* Fast commit reason codes
*/
enum {
/*
* Commit status codes:
*/
EXT4_FC_REASON_OK = 0,
EXT4_FC_REASON_INELIGIBLE,
EXT4_FC_REASON_ALREADY_COMMITTED,
EXT4_FC_REASON_FC_START_FAILED,
EXT4_FC_REASON_FC_FAILED,
/*
* Fast commit ineligiblity reasons:
*/
EXT4_FC_REASON_XATTR = 0,
EXT4_FC_REASON_CROSS_RENAME,
EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
EXT4_FC_REASON_MEM,
EXT4_FC_REASON_SWAP_BOOT,
EXT4_FC_REASON_RESIZE,
EXT4_FC_REASON_RENAME_DIR,
EXT4_FC_REASON_FALLOC_RANGE,
EXT4_FC_COMMIT_FAILED,
EXT4_FC_REASON_MAX
};
struct ext4_fc_stats {
unsigned int fc_ineligible_reason_count[EXT4_FC_REASON_MAX];
unsigned long fc_num_commits;
unsigned long fc_ineligible_commits;
unsigned long fc_numblks;
};
#define EXT4_FC_REPLAY_REALLOC_INCREMENT 4
/*
* Physical block regions added to different inodes due to fast commit
* recovery. These are set during the SCAN phase. During the replay phase,
* our allocator excludes these from its allocation. This ensures that
* we don't accidentally allocating a block that is going to be used by
* another inode.
*/
struct ext4_fc_alloc_region {
ext4_lblk_t lblk;
ext4_fsblk_t pblk;
int ino, len;
};
/*
* Fast commit replay state.
*/
struct ext4_fc_replay_state {
int fc_replay_num_tags;
int fc_replay_expected_off;
int fc_current_pass;
int fc_cur_tag;
int fc_crc;
struct ext4_fc_alloc_region *fc_regions;
int fc_regions_size, fc_regions_used, fc_regions_valid;
int *fc_modified_inodes;
int fc_modified_inodes_used, fc_modified_inodes_size;
};
#define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1)
#define fc_for_each_tl(__start, __end, __tl) \
for (tl = (struct ext4_fc_tl *)start; \
(u8 *)tl < (u8 *)end; \
tl = (struct ext4_fc_tl *)((u8 *)tl + \
sizeof(struct ext4_fc_tl) + \
+ le16_to_cpu(tl->fc_len)))
#endif /* __FAST_COMMIT_H__ */
......@@ -260,6 +260,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
ext4_fc_start_update(inode);
inode_lock(inode);
ret = ext4_write_checks(iocb, from);
if (ret <= 0)
......@@ -271,6 +272,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
out:
inode_unlock(inode);
ext4_fc_stop_update(inode);
if (likely(ret > 0)) {
iocb->ki_pos += ret;
ret = generic_write_sync(iocb, ret);
......@@ -534,7 +536,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
}
ext4_fc_start_update(inode);
ret = ext4_orphan_add(handle, inode);
ext4_fc_stop_update(inode);
if (ret) {
ext4_journal_stop(handle);
goto out;
......@@ -656,8 +660,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
#endif
if (iocb->ki_flags & IOCB_DIRECT)
return ext4_dio_write_iter(iocb, from);
return ext4_buffered_write_iter(iocb, from);
else
return ext4_buffered_write_iter(iocb, from);
}
#ifdef CONFIG_FS_DAX
......@@ -757,6 +761,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
ext4_fc_start_update(inode);
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
......@@ -764,6 +769,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
} else {
vma->vm_ops = &ext4_file_vm_ops;
}
ext4_fc_stop_update(inode);
return 0;
}
......@@ -844,7 +850,7 @@ static int ext4_file_open(struct inode *inode, struct file *filp)
return ret;
}
filp->f_mode |= FMODE_NOWAIT;
filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
return dquot_file_open(inode, filp);
}
......
......@@ -108,6 +108,9 @@ static int ext4_getfsmap_helper(struct super_block *sb,
/* Are we just counting mappings? */
if (info->gfi_head->fmh_count == 0) {
if (info->gfi_head->fmh_entries == UINT_MAX)
return EXT4_QUERY_RANGE_ABORT;
if (rec_fsblk > info->gfi_next_fsblk)
info->gfi_head->fmh_entries++;
......@@ -571,8 +574,8 @@ static bool ext4_getfsmap_is_valid_device(struct super_block *sb,
if (fm->fmr_device == 0 || fm->fmr_device == UINT_MAX ||
fm->fmr_device == new_encode_dev(sb->s_bdev->bd_dev))
return true;
if (EXT4_SB(sb)->journal_bdev &&
fm->fmr_device == new_encode_dev(EXT4_SB(sb)->journal_bdev->bd_dev))
if (EXT4_SB(sb)->s_journal_bdev &&
fm->fmr_device == new_encode_dev(EXT4_SB(sb)->s_journal_bdev->bd_dev))
return true;
return false;
}
......@@ -642,9 +645,9 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
memset(handlers, 0, sizeof(handlers));
handlers[0].gfd_dev = new_encode_dev(sb->s_bdev->bd_dev);
handlers[0].gfd_fn = ext4_getfsmap_datadev;
if (EXT4_SB(sb)->journal_bdev) {
if (EXT4_SB(sb)->s_journal_bdev) {
handlers[1].gfd_dev = new_encode_dev(
EXT4_SB(sb)->journal_bdev->bd_dev);
EXT4_SB(sb)->s_journal_bdev->bd_dev);
handlers[1].gfd_fn = ext4_getfsmap_logdev;
}
......
......@@ -112,7 +112,7 @@ static int ext4_fsync_journal(struct inode *inode, bool datasync,
!jbd2_trans_will_send_data_barrier(journal, commit_tid))
*needs_barrier = true;
return jbd2_complete_transaction(journal, commit_tid);
return ext4_fc_commit(journal, commit_tid);
}
/*
......@@ -150,7 +150,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
ret = file_write_and_wait_range(file, start, end);
if (ret)
return ret;
goto out;
/*
* data=writeback,ordered:
......
......@@ -82,7 +82,12 @@ static int ext4_validate_inode_bitmap(struct super_block *sb,
struct buffer_head *bh)
{
ext4_fsblk_t blk;
struct ext4_group_info *grp = ext4_get_group_info(sb, block_group);
struct ext4_group_info *grp;
if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
grp = ext4_get_group_info(sb, block_group);
if (buffer_verified(bh))
return 0;
......@@ -189,10 +194,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
* submit the buffer_head for reading
*/
trace_ext4_load_inode_bitmap(sb, block_group);
bh->b_end_io = ext4_end_bitmap_read;
get_bh(bh);
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
ext4_read_bh(bh, REQ_META | REQ_PRIO, ext4_end_bitmap_read);
ext4_simulate_fail_bh(sb, bh, EXT4_SIM_IBITMAP_EIO);
if (!buffer_uptodate(bh)) {
put_bh(bh);
......@@ -284,15 +286,17 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
bitmap_bh = ext4_read_inode_bitmap(sb, block_group);
/* Don't bother if the inode bitmap is corrupt. */
grp = ext4_get_group_info(sb, block_group);
if (IS_ERR(bitmap_bh)) {
fatal = PTR_ERR(bitmap_bh);
bitmap_bh = NULL;
goto error_return;
}
if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
fatal = -EFSCORRUPTED;
goto error_return;
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
grp = ext4_get_group_info(sb, block_group);
if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
fatal = -EFSCORRUPTED;
goto error_return;
}
}
BUFFER_TRACE(bitmap_bh, "get_write_access");
......@@ -742,6 +746,122 @@ static int find_inode_bit(struct super_block *sb, ext4_group_t group,
return 1;
}
int ext4_mark_inode_used(struct super_block *sb, int ino)
{
unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count);
struct buffer_head *inode_bitmap_bh = NULL, *group_desc_bh = NULL;
struct ext4_group_desc *gdp;
ext4_group_t group;
int bit;
int err = -EFSCORRUPTED;
if (ino < EXT4_FIRST_INO(sb) || ino > max_ino)
goto out;
group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
if (IS_ERR(inode_bitmap_bh))
return PTR_ERR(inode_bitmap_bh);
if (ext4_test_bit(bit, inode_bitmap_bh->b_data)) {
err = 0;
goto out;
}
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
if (!gdp || !group_desc_bh) {
err = -EINVAL;
goto out;
}
ext4_set_bit(bit, inode_bitmap_bh->b_data);
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(NULL, NULL, inode_bitmap_bh);
if (err) {
ext4_std_error(sb, err);
goto out;
}
err = sync_dirty_buffer(inode_bitmap_bh);
if (err) {
ext4_std_error(sb, err);
goto out;
}
/* We may have to initialize the block bitmap if it isn't already */
if (ext4_has_group_desc_csum(sb) &&
gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
struct buffer_head *block_bitmap_bh;
block_bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(block_bitmap_bh)) {
err = PTR_ERR(block_bitmap_bh);
goto out;
}
BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
err = ext4_handle_dirty_metadata(NULL, NULL, block_bitmap_bh);
sync_dirty_buffer(block_bitmap_bh);
/* recheck and clear flag under lock if we still need to */
ext4_lock_group(sb, group);
if (ext4_has_group_desc_csum(sb) &&
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
ext4_free_group_clusters_set(sb, gdp,
ext4_free_clusters_after_init(sb, group, gdp));
ext4_block_bitmap_csum_set(sb, group, gdp,
block_bitmap_bh);
ext4_group_desc_csum_set(sb, group, gdp);
}
ext4_unlock_group(sb, group);
brelse(block_bitmap_bh);
if (err) {
ext4_std_error(sb, err);
goto out;
}
}
/* Update the relevant bg descriptor fields */
if (ext4_has_group_desc_csum(sb)) {
int free;
ext4_lock_group(sb, group); /* while we modify the bg desc */
free = EXT4_INODES_PER_GROUP(sb) -
ext4_itable_unused_count(sb, gdp);
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
free = 0;
}
/*
* Check the relative inode number against the last used
* relative inode number in this group. if it is greater
* we need to update the bg_itable_unused count
*/
if (bit >= free)
ext4_itable_unused_set(sb, gdp,
(EXT4_INODES_PER_GROUP(sb) - bit - 1));
} else {
ext4_lock_group(sb, group);
}
ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
if (ext4_has_group_desc_csum(sb)) {
ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
EXT4_INODES_PER_GROUP(sb) / 8);
ext4_group_desc_csum_set(sb, group, gdp);
}
ext4_unlock_group(sb, group);
err = ext4_handle_dirty_metadata(NULL, NULL, group_desc_bh);
sync_dirty_buffer(group_desc_bh);
out:
return err;
}
static int ext4_xattr_credits_for_new_inode(struct inode *dir, mode_t mode,
bool encrypt)
{
......@@ -818,7 +938,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
struct inode *ret;
ext4_group_t i;
ext4_group_t flex_group;
struct ext4_group_info *grp;
struct ext4_group_info *grp = NULL;
bool encrypt = false;
/* Cannot create files in a deleted directory */
......@@ -918,15 +1038,21 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
if (ext4_free_inodes_count(sb, gdp) == 0)
goto next_group;
grp = ext4_get_group_info(sb, group);
/* Skip groups with already-known suspicious inode tables */
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
goto next_group;
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
grp = ext4_get_group_info(sb, group);
/*
* Skip groups with already-known suspicious inode
* tables
*/
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
goto next_group;
}
brelse(inode_bitmap_bh);
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
/* Skip groups with suspicious inode tables */
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp) ||
if (((!(sbi->s_mount_state & EXT4_FC_REPLAY))
&& EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) ||
IS_ERR(inode_bitmap_bh)) {
inode_bitmap_bh = NULL;
goto next_group;
......@@ -945,7 +1071,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
goto next_group;
}
if (!handle) {
if ((!(sbi->s_mount_state & EXT4_FC_REPLAY)) && !handle) {
BUG_ON(nblocks <= 0);
handle = __ext4_journal_start_sb(dir->i_sb, line_no,
handle_type, nblocks, 0,
......@@ -1049,9 +1175,15 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
/* Update the relevant bg descriptor fields */
if (ext4_has_group_desc_csum(sb)) {
int free;
struct ext4_group_info *grp = ext4_get_group_info(sb, group);
down_read(&grp->alloc_sem); /* protect vs itable lazyinit */
struct ext4_group_info *grp = NULL;
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
grp = ext4_get_group_info(sb, group);
down_read(&grp->alloc_sem); /*
* protect vs itable
* lazyinit
*/
}
ext4_lock_group(sb, group); /* while we modify the bg desc */
free = EXT4_INODES_PER_GROUP(sb) -
ext4_itable_unused_count(sb, gdp);
......@@ -1067,7 +1199,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
if (ino > free)
ext4_itable_unused_set(sb, gdp,
(EXT4_INODES_PER_GROUP(sb) - ino));
up_read(&grp->alloc_sem);
if (!(sbi->s_mount_state & EXT4_FC_REPLAY))
up_read(&grp->alloc_sem);
} else {
ext4_lock_group(sb, group);
}
......
......@@ -163,7 +163,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
}
if (!bh_uptodate_or_lock(bh)) {
if (bh_submit_read(bh) < 0) {
if (ext4_read_bh(bh, 0, NULL) < 0) {
put_bh(bh);
goto failure;
}
......@@ -593,7 +593,8 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
if (ext4_has_feature_bigalloc(inode->i_sb)) {
EXT4_ERROR_INODE(inode, "Can't allocate blocks for "
"non-extent mapped inodes with bigalloc");
return -EFSCORRUPTED;
err = -EFSCORRUPTED;
goto out;
}
/* Set up for the direct block allocation */
......@@ -1012,14 +1013,14 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
}
/* Go read the buffer for the next level down */
bh = sb_bread(inode->i_sb, nr);
bh = ext4_sb_bread(inode->i_sb, nr, 0);
/*
* A read failure? Report error and clear slot
* (should be rare).
*/
if (!bh) {
ext4_error_inode_block(inode, nr, EIO,
if (IS_ERR(bh)) {
ext4_error_inode_block(inode, nr, -PTR_ERR(bh),
"Read failure");
continue;
}
......@@ -1033,7 +1034,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
brelse(bh);
/*
* Everything below this this pointer has been
* Everything below this pointer has been
* released. Now let this top-of-subtree go.
*
* We want the freeing of this indirect block to be
......
......@@ -354,7 +354,7 @@ static int ext4_update_inline_data(handle_t *handle, struct inode *inode,
if (error)
goto out;
/* Update the xttr entry. */
/* Update the xattr entry. */
i.value = value;
i.value_len = len;
......
This diff is collapsed.
......@@ -86,7 +86,7 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2)
i_size_write(inode2, isize);
}
static void reset_inode_seed(struct inode *inode)
void ext4_reset_inode_seed(struct inode *inode)
{
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
......@@ -165,6 +165,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
err = -EINVAL;
goto err_out;
}
ext4_fc_start_ineligible(sb, EXT4_FC_REASON_SWAP_BOOT);
/* Protect extent tree against block allocations via delalloc */
ext4_double_down_write_data_sem(inode, inode_bl);
......@@ -199,8 +200,8 @@ static long swap_inode_boot_loader(struct super_block *sb,
inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
reset_inode_seed(inode);
reset_inode_seed(inode_bl);
ext4_reset_inode_seed(inode);
ext4_reset_inode_seed(inode_bl);
ext4_discard_preallocations(inode, 0);
......@@ -247,6 +248,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
err_out1:
ext4_journal_stop(handle);
ext4_fc_stop_ineligible(sb);
ext4_double_up_write_data_sem(inode, inode_bl);
err_out:
......@@ -807,7 +809,7 @@ static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
return error;
}
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct super_block *sb = inode->i_sb;
......@@ -1074,6 +1076,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
err = ext4_resize_fs(sb, n_blocks_count);
if (EXT4_SB(sb)->s_journal) {
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
......@@ -1308,6 +1311,17 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
}
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
long ret;
ext4_fc_start_update(file_inode(filp));
ret = __ext4_ioctl(filp, cmd, arg);
ext4_fc_stop_update(file_inode(filp));
return ret;
}
#ifdef CONFIG_COMPAT
long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
......
This diff is collapsed.
......@@ -85,15 +85,11 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
}
}
get_bh(*bh);
lock_buffer(*bh);
(*bh)->b_end_io = end_buffer_read_sync;
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, *bh);
wait_on_buffer(*bh);
if (!buffer_uptodate(*bh)) {
ret = -EIO;
ret = ext4_read_bh(*bh, REQ_META | REQ_PRIO, NULL);
if (ret)
goto warn_exit;
}
mmp = (struct mmp_struct *)((*bh)->b_data);
if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC) {
ret = -EFSCORRUPTED;
......
......@@ -215,7 +215,7 @@ mext_page_mkuptodate(struct page *page, unsigned from, unsigned to)
for (i = 0; i < nr; i++) {
bh = arr[i];
if (!bh_uptodate_or_lock(bh)) {
err = bh_submit_read(bh);
err = ext4_read_bh(bh, 0, NULL);
if (err)
return err;
}
......
This diff is collapsed.
......@@ -843,8 +843,10 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
BUFFER_TRACE(dind, "get_write_access");
err = ext4_journal_get_write_access(handle, dind);
if (unlikely(err))
if (unlikely(err)) {
ext4_std_error(sb, err);
goto errout;
}
/* ext4_reserve_inode_write() gets a reference on the iloc */
err = ext4_reserve_inode_write(handle, inode, &iloc);
......@@ -1243,7 +1245,7 @@ static struct buffer_head *ext4_get_bitmap(struct super_block *sb, __u64 block)
if (unlikely(!bh))
return NULL;
if (!bh_uptodate_or_lock(bh)) {
if (bh_submit_read(bh) < 0) {
if (ext4_read_bh(bh, 0, NULL) < 0) {
brelse(bh);
return NULL;
}
......@@ -1806,8 +1808,8 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
o_blocks_count + add, add);
/* See if the device is actually as big as what was requested */
bh = sb_bread(sb, o_blocks_count + add - 1);
if (!bh) {
bh = ext4_sb_bread(sb, o_blocks_count + add - 1, 0);
if (IS_ERR(bh)) {
ext4_warning(sb, "can't read last block, resize aborted");
return -ENOSPC;
}
......@@ -1932,8 +1934,8 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
int meta_bg;
/* See if the device is actually as big as what was requested */
bh = sb_bread(sb, n_blocks_count - 1);
if (!bh) {
bh = ext4_sb_bread(sb, n_blocks_count - 1, 0);
if (IS_ERR(bh)) {
ext4_warning(sb, "can't read last block, resize aborted");
return -ENOSPC;
}
......
This diff is collapsed.
......@@ -521,6 +521,8 @@ int ext4_register_sysfs(struct super_block *sb)
proc_create_single_data("es_shrinker_info", S_IRUGO,
sbi->s_proc, ext4_seq_es_shrinker_info_show,
sb);
proc_create_single_data("fc_info", 0444, sbi->s_proc,
ext4_fc_info_show, sb);
proc_create_seq_data("mb_groups", S_IRUGO, sbi->s_proc,
&ext4_mb_seq_groups_ops, sb);
}
......
......@@ -2419,6 +2419,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (IS_SYNC(inode))
ext4_handle_sync(handle);
}
ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
cleanup:
brelse(is.iloc.bh);
......@@ -2496,6 +2497,7 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
if (error == 0)
error = error2;
}
ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
return error;
}
......@@ -2928,6 +2930,7 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
error);
goto cleanup;
}
ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
}
error = 0;
cleanup:
......
......@@ -187,20 +187,48 @@ static int journal_wait_on_commit_record(journal_t *journal,
* use writepages() because with delayed allocation we may be doing
* block allocation in writepages().
*/
static int journal_submit_inode_data_buffers(struct address_space *mapping,
loff_t dirty_start, loff_t dirty_end)
int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
int ret;
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = mapping->nrpages * 2,
.range_start = dirty_start,
.range_end = dirty_end,
.range_start = jinode->i_dirty_start,
.range_end = jinode->i_dirty_end,
};
ret = generic_writepages(mapping, &wbc);
return ret;
/*
* submit the inode data buffers. We use writepage
* instead of writepages. Because writepages can do
* block allocation with delalloc. We need to write
* only allocated blocks here.
*/
return generic_writepages(mapping, &wbc);
}
/* Send all the data buffers related to an inode */
int jbd2_submit_inode_data(struct jbd2_inode *jinode)
{
if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
return 0;
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
return jbd2_journal_submit_inode_data_buffers(jinode);
}
EXPORT_SYMBOL(jbd2_submit_inode_data);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
{
if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
!jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
return 0;
return filemap_fdatawait_range_keep_errors(
jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
jinode->i_dirty_end);
}
EXPORT_SYMBOL(jbd2_wait_inode_data);
/*
* Submit all the data buffers of inode associated with the transaction to
......@@ -215,29 +243,20 @@ static int journal_submit_data_buffers(journal_t *journal,
{
struct jbd2_inode *jinode;
int err, ret = 0;
struct address_space *mapping;
spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
loff_t dirty_start = jinode->i_dirty_start;
loff_t dirty_end = jinode->i_dirty_end;
if (!(jinode->i_flags & JI_WRITE_DATA))
continue;
mapping = jinode->i_vfs_inode->i_mapping;
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);
/*
* submit the inode data buffers. We use writepage
* instead of writepages. Because writepages can do
* block allocation with delalloc. We need to write
* only allocated blocks here.
*/
/* submit the inode data buffers. */
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
err = journal_submit_inode_data_buffers(mapping, dirty_start,
dirty_end);
if (!ret)
ret = err;
if (journal->j_submit_inode_data_buffers) {
err = journal->j_submit_inode_data_buffers(jinode);
if (!ret)
ret = err;
}
spin_lock(&journal->j_list_lock);
J_ASSERT(jinode->i_transaction == commit_transaction);
jinode->i_flags &= ~JI_COMMIT_RUNNING;
......@@ -248,6 +267,15 @@ static int journal_submit_data_buffers(journal_t *journal,
return ret;
}
int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
{
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
return filemap_fdatawait_range_keep_errors(mapping,
jinode->i_dirty_start,
jinode->i_dirty_end);
}
/*
* Wait for data submitted for writeout, refile inodes to proper
* transaction if needed.
......@@ -262,18 +290,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
/* For locking, see the comment in journal_submit_data_buffers() */
spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
loff_t dirty_start = jinode->i_dirty_start;
loff_t dirty_end = jinode->i_dirty_end;
if (!(jinode->i_flags & JI_WAIT_DATA))
continue;
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);
err = filemap_fdatawait_range_keep_errors(
jinode->i_vfs_inode->i_mapping, dirty_start,
dirty_end);
if (!ret)
ret = err;
/* wait for the inode data buffers writeout. */
if (journal->j_finish_inode_data_buffers) {
err = journal->j_finish_inode_data_buffers(jinode);
if (!ret)
ret = err;
}
spin_lock(&journal->j_list_lock);
jinode->i_flags &= ~JI_COMMIT_RUNNING;
smp_mb();
......@@ -413,6 +439,20 @@ void jbd2_journal_commit_transaction(journal_t *journal)
J_ASSERT(journal->j_running_transaction != NULL);
J_ASSERT(journal->j_committing_transaction == NULL);
write_lock(&journal->j_state_lock);
journal->j_flags |= JBD2_FULL_COMMIT_ONGOING;
while (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) {
DEFINE_WAIT(wait);
prepare_to_wait(&journal->j_fc_wait, &wait,
TASK_UNINTERRUPTIBLE);
write_unlock(&journal->j_state_lock);
schedule();
write_lock(&journal->j_state_lock);
finish_wait(&journal->j_fc_wait, &wait);
}
write_unlock(&journal->j_state_lock);
commit_transaction = journal->j_running_transaction;
trace_jbd2_start_commit(journal, commit_transaction);
......@@ -420,6 +460,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
commit_transaction->t_tid);
write_lock(&journal->j_state_lock);
journal->j_fc_off = 0;
J_ASSERT(commit_transaction->t_state == T_RUNNING);
commit_transaction->t_state = T_LOCKED;
......@@ -1119,12 +1160,16 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (journal->j_commit_callback)
journal->j_commit_callback(journal, commit_transaction);
if (journal->j_fc_cleanup_callback)
journal->j_fc_cleanup_callback(journal, 1);
trace_jbd2_end_commit(journal, commit_transaction);
jbd_debug(1, "JBD2: commit %d complete, head %d\n",
journal->j_commit_sequence, journal->j_tail_sequence);
write_lock(&journal->j_state_lock);
journal->j_flags &= ~JBD2_FULL_COMMIT_ONGOING;
journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING;
spin_lock(&journal->j_list_lock);
commit_transaction->t_state = T_FINISHED;
/* Check if the transaction can be dropped now that we are finished */
......@@ -1136,6 +1181,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
spin_unlock(&journal->j_list_lock);
write_unlock(&journal->j_state_lock);
wake_up(&journal->j_wait_done_commit);
wake_up(&journal->j_fc_wait);
/*
* Calculate overall stats
......
This diff is collapsed.
This diff is collapsed.
......@@ -883,6 +883,10 @@ int ocfs2_journal_init(struct ocfs2_journal *journal, int *dirty)
OCFS2_JOURNAL_DIRTY_FL);
journal->j_journal = j_journal;
journal->j_journal->j_submit_inode_data_buffers =
jbd2_journal_submit_inode_data_buffers;
journal->j_journal->j_finish_inode_data_buffers =
jbd2_journal_finish_inode_data_buffers;
journal->j_inode = inode;
journal->j_bh = bh;
......
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment