Commits · afd09b617db3786b6ef3dc43e28fe728cfea84df · Kirill Smelkov / linux

06 Jun, 2021 3 commits

ext4: fix memory leak in ext4_fill_super · afd09b61

Alexey Makhalov authored May 21, 2021

Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Fixes: ce40733c ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.comSigned-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

afd09b61

ext4: fix fast commit alignment issues · a7ba36bc

Harshad Shirwadkar authored May 19, 2021

Fast commit recovery data on disk may not be aligned. So, when the
recovery code reads it, this patch makes sure that fast commit info
found on-disk is first memcpy-ed into an aligned variable before
accessing it. As a consequence of it, we also remove some macros that
could resulted in unaligned accesses.

Cc: stable@kernel.org
Fixes: 8016e29f ("ext4: fast commit recovery path")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

a7ba36bc

ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed · 082cd4ec

Ye Bin authored May 06, 2021

We got follow bug_on when run fsstress with injecting IO fault:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
[130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995]  ext4_readpage+0x54/0x100 [ext4]
[130747.337359]  generic_file_buffered_read+0x410/0xae8
[130747.337767]  generic_file_read_iter+0x114/0x190
[130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556]  __vfs_read+0x11c/0x188
[130747.338851]  vfs_read+0x94/0x150
[130747.339110]  ksys_read+0x74/0xf0

This patch's modification is according to Jan Kara's suggestion in:
https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
"I see. Now I understand your patch. Honestly, seeing how fragile is trying
to fix extent tree after split has failed in the middle, I would probably
go even further and make sure we fix the tree properly in case of ENOSPC
and EDQUOT (those are easily user triggerable).  Anything else indicates a
HW problem or fs corruption so I'd rather leave the extent tree as is and
don't try to fix it (which also means we will not create overlapping
extents)."

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

082cd4ec

03 Jun, 2021 1 commit

ext4: fix accessing uninit percpu counter variable with fast_commit · b45f189a

Ritesh Harjani authored Apr 29, 2021

When running generic/527 with fast_commit configuration, the following
issue is seen on Power.  With fast_commit, during ext4_fc_replay()
(which can be called from ext4_fill_super()), if inode eviction
happens then it can access an uninitialized percpu counter variable.

This patch adds the check before accessing the counters in
ext4_free_inode() path.

[  321.165371] run fstests generic/527 at 2021-04-29 08:38:43
[  323.027786] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: block_validity. Quota mode: none.
[  323.618772] BUG: Unable to handle kernel data access on read at 0x1fbd80000
[  323.619767] Faulting instruction address: 0xc000000000bae78c
cpu 0x1: Vector: 300 (Data Access) at [c000000010706ef0]
    pc: c000000000bae78c: percpu_counter_add_batch+0x3c/0x100
    lr: c0000000006d0bb0: ext4_free_inode+0x780/0xb90
    pid   = 5593, comm = mount
	ext4_free_inode+0x780/0xb90
	ext4_evict_inode+0xa8c/0xc60
	evict+0xfc/0x1e0
	ext4_fc_replay+0xc50/0x20f0
	do_one_pass+0xfe0/0x1350
	jbd2_journal_recover+0x184/0x2e0
	jbd2_journal_load+0x1c0/0x4a0
	ext4_fill_super+0x2458/0x4200
	mount_bdev+0x1dc/0x290
	ext4_mount+0x28/0x40
	legacy_get_tree+0x4c/0xa0
	vfs_get_tree+0x4c/0x120
	path_mount+0xcf8/0xd70
	do_mount+0x80/0xd0
	sys_mount+0x3fc/0x490
	system_call_exception+0x384/0x3d0
	system_call_common+0xec/0x278

Cc: stable@kernel.org
Fixes: 8016e29f ("ext4: fast commit recovery path")
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/6cceb9a75c54bef8fa9696c1b08c8df5ff6169e2.1619692410.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

b45f189a

21 May, 2021 1 commit

ext4: fix memory leak in ext4_mb_init_backend on error path. · a8867f4e

Phillip Potter authored Apr 12, 2021

Fix a memory leak discovered by syzbot when a file system is corrupted
with an illegally large s_log_groups_per_flex.

Reported-by: syzbot+aa12d6106ea4ca1b6aae@syzkaller.appspotmail.com
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20210412073837.1686-1-phil@philpotter.co.ukSigned-off-by: Theodore Ts'o <tytso@mit.edu>

a8867f4e

22 Apr, 2021 2 commits

ext4: wipe ext4_dir_entry2 upon file deletion · 6c091273

Leah Rumancik authored Apr 22, 2021

Upon file deletion, zero out all fields in ext4_dir_entry2 besides rec_len.
In case sensitive data is stored in filenames, this ensures no potentially
sensitive data is left in the directory entry upon deletion. Also, wipe
these fields upon moving a directory entry during the conversion to an
htree and when splitting htree nodes.

The data wiped may still exist in the journal, but there are future
commits planned to address this.
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Link: https://lore.kernel.org/r/20210422180834.2242353-1-leah.rumancik@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

6c091273

ext4: Fix occasional generic/418 failure · 5899593f

Jan Kara authored Apr 15, 2021

Eric has noticed that after pagecache read rework, generic/418 is
occasionally failing for ext4 when blocksize < pagesize. In fact, the
pagecache rework just made hard to hit race in ext4 more likely. The
problem is that since ext4 conversion of direct IO writes to iomap
framework (commit 378f32ba), we update inode size after direct IO
write only after invalidating page cache. Thus if buffered read sneaks
at unfortunate moment like:

CPU1 - write at offset 1k                       CPU2 - read from offset 0
iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT);
                                                ext4_readpage();
ext4_handle_inode_extension()

the read will zero out tail of the page as it still sees smaller inode
size and thus page cache becomes inconsistent with on-disk contents with
all the consequences.

Fix the problem by moving inode size update into end_io handler which
gets called before the page cache is invalidated.
Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com>
Fixes: 378f32ba ("ext4: introduce direct I/O write using iomap infrastructure")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>

5899593f

18 Apr, 2021 1 commit

fs: fix reporting supported extra file attributes for statx() · 5afa7e8b

Theodore Ts'o authored Apr 17, 2021

statx(2) notes that any attribute that is not indicated as supported
by stx_attributes_mask has no usable value.  Commits 801e5237
("fs: move generic stat response attr handling to vfs_getattr_nosec")
and 712b2698 ("fs/stat: Define DAX statx attribute") sets
STATX_ATTR_AUTOMOUNT and STATX_ATTR_DAX, respectively, without setting
stx_attributes_mask, which can cause xfstests generic/532 to fail.

Fix this in the same way as commit 1b9598c8 ("xfs: fix reporting
supported extra file attributes for statx()")

Fixes: 801e5237 ("fs: move generic stat response attr handling to vfs_getattr_nosec")
Fixes: 712b2698 ("fs/stat: Define DAX statx attribute")
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

5afa7e8b

13 Apr, 2021 1 commit

ext4: allow the dax flag to be set and cleared on inline directories · 4811d992

Theodore Ts'o authored Apr 12, 2021

This is needed to allow generic/607 to pass for file systems with the
inline data_feature enabled, and it allows the use of file systems
where the directories use inline_data, while the files are accessed
via DAX.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4811d992

10 Apr, 2021 9 commits

ext4: fix debug format string warning · fcdf3c34

Arnd Bergmann authored Apr 09, 2021

Using no_printk() for jbd_debug() revealed two warnings:

fs/jbd2/recovery.c: In function 'fc_do_one_pass':
fs/jbd2/recovery.c:256:30: error: format '%d' expects a matching 'int' argument [-Werror=format=]
  256 |                 jbd_debug(3, "Processing fast commit blk with seq %d");
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ext4/fast_commit.c: In function 'ext4_fc_replay_add_range':
fs/ext4/fast_commit.c:1732:30: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=]
 1732 |                 jbd_debug(1, "Converting from %d to %d %lld",

The first one was added incorrectly, and was also missing a few newlines
in debug output, and the second one happened when the type of an
argument changed.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: d5564351 ("jbd2: avoid -Wempty-body warnings")
Fixes: 6db07461 ("ext4: use BIT() macro for BH_** state bits")
Fixes: 5b849b5f ("jbd2: fast commit recovery path")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210409201211.1866633-1-arnd@kernel.orgSigned-off-by: Theodore Ts'o <tytso@mit.edu>

fcdf3c34

ext4: fix trailing whitespace · 666245d9

Jack Qiu authored Apr 09, 2021

Made suggested modifications from checkpatch in reference to ERROR:
 trailing whitespace
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20210409042035.15516-1-jack.qiu@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

666245d9

ext4: fix various seppling typos · 3088e5a5

Bhaskar Chowdhury authored Mar 27, 2021

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Link: https://lore.kernel.org/r/cover.1616840203.git.unixbhaskar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

3088e5a5

ext4: fix error return code in ext4_fc_perform_commit() · e1262cd2

Xu Yihang authored Apr 08, 2021

In case of if not ext4_fc_add_tlv branch, an error return code is missing.

Cc: stable@kernel.org
Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Yihang <xuyihang@huawei.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

e1262cd2

ext4: annotate data race in jbd2_journal_dirty_metadata() · 83fe6b18

Jan Kara authored Apr 06, 2021

Assertion checks in jbd2_journal_dirty_metadata() are known to be racy
but we don't want to be grabbing locks just for them.  We thus recheck
them under b_state_lock only if it looks like they would fail. Annotate
the checks with data_race().

Cc: stable@kernel.org
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>

83fe6b18

ext4: annotate data race in start_this_handle() · 3b1833e9

Jan Kara authored Apr 06, 2021

Access to journal->j_running_transaction is not protected by appropriate
lock and thus is racy. We are well aware of that and the code handles
the race properly. Just add a comment and data_race() annotation.

Cc: stable@kernel.org
Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>

3b1833e9

ext4: fix ext4_error_err save negative errno into superblock · 6810fad9

Ye Bin authored Apr 06, 2021

Fix As write_mmp_block() so that it returns -EIO instead of 1, so that
the correct error gets saved into the superblock.

Cc: stable@kernel.org
Fixes: 54d3adbc ("ext4: save all error info in save_error_info() and drop ext4_set_errno()")
Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

6810fad9

ext4: fix error code in ext4_commit_super · f88f1466

Fengnan Chang authored Apr 02, 2021

We should set the error code when ext4_commit_super check argument failed.
Found in code review.
Fixes: c4be0c1d ("filesystem freeze: add error handling of write_super_lockfs/unlockfs").

Cc: stable@kernel.org
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

f88f1466

ext4: always panic when errors=panic is specified · ac2f7ca5

Ye Bin authored Apr 01, 2021

Before commit 014c9caa ("ext4: make ext4_abort() use
__ext4_error()"), the following series of commands would trigger a
panic:

1. mount /dev/sda -o ro,errors=panic test
2. mount /dev/sda -o remount,abort test

After commit 014c9caa, remounting a file system using the test
mount option "abort" will no longer trigger a panic.  This commit will
restore the behaviour immediately before commit 014c9caa.
(However, note that the Linux kernel's behavior has not been
consistent; some previous kernel versions, including 5.4 and 4.19
similarly did not panic after using the mount option "abort".)

This also makes a change to long-standing behaviour; namely, the
following series commands will now cause a panic, when previously it
did not:

1. mount /dev/sda -o ro,errors=panic test
2. echo test > /sys/fs/ext4/sda/trigger_fs_error

However, this makes ext4's behaviour much more consistent, so this is
a good thing.

Cc: stable@kernel.org
Fixes: 014c9caa ("ext4: make ext4_abort() use __ext4_error()")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20210401081903.3421208-1-yebin10@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

ac2f7ca5

09 Apr, 2021 10 commits

ext4: delete redundant uptodate check for buffer · 3cd46171

Yang Guo authored Apr 01, 2021

The buffer uptodate state has been checked in function set_buffer_uptodate,
there is no need use buffer_uptodate before calling set_buffer_uptodate and
delete it.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/1617260610-29770-1-git-send-email-zhangshaokun@hisilicon.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

3cd46171

ext4: do not set SB_ACTIVE in ext4_orphan_cleanup() · 72ffb49a

Zhang Yi authored Mar 31, 2021

When CONFIG_QUOTA is enabled, if we failed to mount the filesystem due
to some error happens behind ext4_orphan_cleanup(), it will end up
triggering a after free issue of super_block. The problem is that
ext4_orphan_cleanup() will set SB_ACTIVE flag if CONFIG_QUOTA is
enabled, after we cleanup the truncated inodes, the last iput() will put
them into the lru list, and these inodes' pages may probably dirty and
will be write back by the writeback thread, so it could be raced by
freeing super_block in the error path of mount_bdev().

After check the setting of SB_ACTIVE flag in ext4_orphan_cleanup(), it
was used to ensure updating the quota file properly, but evict inode and
trash data immediately in the last iput does not affect the quotafile,
so setting the SB_ACTIVE flag seems not required[1]. Fix this issue by
just remove the SB_ACTIVE setting.

[1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00

Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210331033138.918975-1-yi.zhang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

72ffb49a

ext4: make prefetch_block_bitmaps default · 21175ca4

Harshad Shirwadkar authored Apr 01, 2021

Block bitmap prefetching is needed for these allocator optimization
data structures to get populated and provide better group scanning
order. So, turn it on bu default. prefetch_block_bitmaps mount option
is now marked as removed and a new option no_prefetch_block_bitmaps is
added to disable block bitmap prefetching.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-8-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

21175ca4

ext4: add proc files to monitor new structures · f68f4063

Harshad Shirwadkar authored Apr 01, 2021

This patch adds a new file "mb_structs_summary" which allows us to see
the summary of the new allocator structures added in this
series. Here's the sample output of file:

optimize_scan: 1
max_free_order_lists:
        list_order_0_groups: 0
        list_order_1_groups: 0
        list_order_2_groups: 0
        list_order_3_groups: 0
        list_order_4_groups: 0
        list_order_5_groups: 0
        list_order_6_groups: 0
        list_order_7_groups: 0
        list_order_8_groups: 0
        list_order_9_groups: 0
        list_order_10_groups: 0
        list_order_11_groups: 0
        list_order_12_groups: 0
        list_order_13_groups: 40
fragment_size_tree:
        tree_min: 16384
        tree_max: 32768
        tree_nodes: 40
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-7-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

f68f4063

ext4: improve cr 0 / cr 1 group scanning · 196e402a

Harshad Shirwadkar authored Apr 01, 2021

Instead of traversing through groups linearly, scan groups in specific
orders at cr 0 and cr 1. At cr 0, we want to find groups that have the
largest free order >= the order of the request. So, with this patch,
we maintain lists for each possible order and insert each group into a
list based on the largest free order in its buddy bitmap. During cr 0
allocation, we traverse these lists in the increasing order of largest
free orders. This allows us to find a group with the best available cr
0 match in constant time. If nothing can be found, we fallback to cr 1
immediately.

At CR1, the story is slightly different. We want to traverse in the
order of increasing average fragment size. For CR1, we maintain a rb
tree of groupinfos which is sorted by average fragment size. Instead
of traversing linearly, at CR1, we traverse in the order of increasing
average fragment size, starting at the most optimal group. This brings
down cr 1 search complexity to log(num groups).

For cr >= 2, we just perform the linear search as before. Also, in
case of lock contention, we intermittently fallback to linear search
even in CR 0 and CR 1 cases. This allows us to proceed during the
allocation path even in case of high contention.

There is an opportunity to do optimization at CR2 too. That's because
at CR2 we only consider groups where bb_free counter (number of free
blocks) is greater than the request extent size. That's left as future
work.

All the changes introduced in this patch are protected under a new
mount option "mb_optimize_scan".

With this patchset, following experiment was performed:

Created a highly fragmented disk of size 65TB. The disk had no
contiguous 2M regions. Following command was run consecutively for 3
times:

time dd if=/dev/urandom of=file bs=2M count=10

Here are the results with and without cr 0/1 optimizations introduced
in this patch:

|---------+------------------------------+---------------------------|
|         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
|---------+------------------------------+---------------------------|
| 1st run | 5m1.871s                     | 2m47.642s                 |
| 2nd run | 2m28.390s                    | 0m0.611s                  |
| 3rd run | 2m26.530s                    | 0m1.255s                  |
|---------+------------------------------+---------------------------|
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

196e402a

ext4: add MB_NUM_ORDERS macro · 4b68f6df

Harshad Shirwadkar authored Apr 01, 2021

A few arrays in mballoc.c use the total number of valid orders as
their size. Currently, this value is set as "sb->s_blocksize_bits +
2". This makes code harder to read. So, instead add a new macro
MB_NUM_ORDERS(sb) to make the code more readable.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-5-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

4b68f6df

ext4: add mballoc stats proc file · a6c75eaf

Harshad Shirwadkar authored Apr 01, 2021

Add new stats for measuring the performance of mballoc. This patch is
forked from Artem Blagodarenko's work that can be found here:

https://github.com/lustre/lustre-release/blob/master/ldiskfs/kernel_patches/patches/rhel8/ext4-simple-blockalloc.patch

This patch reorganizes the stats by cr level. This is how the output
looks like:

mballoc:
	reqs: 0
	success: 0
	groups_scanned: 0
	cr0_stats:
		hits: 0
		groups_considered: 0
		useless_loops: 0
		bad_suggestions: 0
	cr1_stats:
		hits: 0
		groups_considered: 0
		useless_loops: 0
		bad_suggestions: 0
	cr2_stats:
		hits: 0
		groups_considered: 0
		useless_loops: 0
	cr3_stats:
		hits: 0
		groups_considered: 0
		useless_loops: 0
	extents_scanned: 0
		goal_hits: 0
		2^n_hits: 0
		breaks: 0
		lost: 0
	buddies_generated: 0/40
	buddies_time_used: 0
	preallocated: 0
	discarded: 0
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-4-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

a6c75eaf

ext4: add ability to return parsed options from parse_options · b237e304

Harshad Shirwadkar authored Apr 01, 2021

Before this patch, the function parse_options() was returning
journal_devnum and journal_ioprio variables to the caller. This patch
generalizes that interface to allow parse_options to return any parsed
options to return back to the caller. In this patch series, it gets
used to capture the value of "mb_optimize_scan=%u" mount option.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-3-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

b237e304

ext4: drop s_mb_bal_lock and convert protected fields to atomic · 67d25186

Harshad Shirwadkar authored Apr 01, 2021

s_mb_buddies_generated gets used later in this patch series to
determine if the cr 0 and cr 1 optimziations should be performed or
not. Currently, s_mb_buddies_generated is protected under a
spin_lock. In the allocation path, it is better if we don't depend on
the lock and instead read the value atomically. In order to do that,
we drop s_bal_lock altogether and we convert the only two protected
fields by it s_mb_buddies_generated and s_mb_generation_time to atomic
type.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20210401172129.189766-2-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

67d25186

ext4: fix check to prevent false positive report of incorrect used inodes · a149d2a5

Zhang Yi authored Mar 31, 2021

Commit <50122847> ("ext4: fix check to prevent initializing reserved
inodes") check the block group zero and prevent initializing reserved
inodes. But in some special cases, the reserved inode may not all belong
to the group zero, it may exist into the second group if we format
filesystem below.

  mkfs.ext4 -b 4096 -g 8192 -N 1024 -I 4096 /dev/sda

So, it will end up triggering a false positive report of a corrupted
file system. This patch fix it by avoid check reserved inodes if no free
inode blocks will be zeroed.

Cc: stable@kernel.org
Fixes: 50122847 ("ext4: fix check to prevent initializing reserved inodes")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210331121516.2243099-1-yi.zhang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

a149d2a5

06 Apr, 2021 3 commits

jbd2: avoid -Wempty-body warnings · d5564351

Arnd Bergmann authored Mar 22, 2021

Building with 'make W=1' shows a harmless -Wempty-body warning:

fs/jbd2/recovery.c: In function 'fc_do_one_pass':
fs/jbd2/recovery.c:267:75: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  267 |                 jbd_debug(3, "Fast commit replay failed, err = %d\n", err);
      |                                                                           ^

Change the empty dprintk() macros to no_printk(), which avoids this
warning and adds format string checking.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210322102152.95684-1-arnd@kernel.orgSigned-off-by: Theodore Ts'o <tytso@mit.edu>

d5564351

ext4: optimize match for casefolded encrypted dirs · 1ae98e29

Daniel Rosenberg authored Mar 19, 2021

Matching names with casefolded encrypting directories requires
decrypting entries to confirm case since we are case preserving. We can
avoid needing to decrypt if our hash values don't match.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210319073414.1381041-3-drosen@google.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

1ae98e29

ext4: handle casefolding with encryption · 471fbbea

Daniel Rosenberg authored Mar 19, 2021

This adds support for encryption with casefolding.

Since the name on disk is case preserving, and also encrypted, we can no
longer just recompute the hash on the fly. Additionally, to avoid
leaking extra information from the hash of the unencrypted name, we use
siphash via an fscrypt v2 policy.

The hash is stored at the end of the directory entry for all entries
inside of an encrypted and casefolded directory apart from those that
deal with '.' and '..'. This way, the change is backwards compatible
with existing ext4 filesystems.

[ Changed to advertise this feature via the file:
  /sys/fs/ext4/features/encrypted_casefold -- TYT ]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210319073414.1381041-2-drosen@google.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

471fbbea

02 Apr, 2021 4 commits

ext4: remove unnecessary braces in fs/ext4/dir.c · 400086d7

Milan Djurovic authored Mar 15, 2021

Removes braces to follow the coding style.
Signed-off-by: Milan Djurovic <mdjurovic@zohomail.com>
Link: https://lore.kernel.org/r/20210316052953.67616-1-mdjurovic@zohomail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

400086d7

ext4: delete some unused tracepoint definitions · 6b3caab4

Eric Whitney authored Feb 16, 2021

A number of tracepoint instances have been removed from ext4 by past
patches but the definitions of those tracepoints have not.

All instances of ext4_ext_in_cache and ext4_ext_put_in_cache were
removed by commit 69eb33dc ("ext4: remove single extent cache").
ext4_get_reserved_cluster_alloc was removed by commit b6bf9171
("ext4: reduce reserved cluster count by number of allocated
clusters").  ext4_find_delalloc_range was removed by commit
7d1b1fbc ("ext4: reimplement ext4_find_delay_alloc_range on extent
status tree").

All instances of ext4_direct_IO_enter and ext4_direct_IO_exit were
removed by commit 378f32ba ("ext4: introduce direct I/O write
using iomap infrastructure").
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20210216191634.20957-1-enwlinux@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

6b3caab4

Updated locking documentation for transaction_t · 3042b1b4

Alexander Lochmann authored Feb 11, 2021

Some members of transaction_t are allowed to be read without any lock
being held if accessed from the correct context. We used LockDoc's
findings to determine those members. Each member of them is marked
with a short comment: "no lock needed for jbd2 thread".
Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210211171410.17984-1-alexander.lochmann@tu-dortmund.deSigned-off-by: Theodore Ts'o <tytso@mit.edu>

3042b1b4

ext4: updated locking documentation for journal_t · d699ae4f

Alexander Lochmann authored Feb 11, 2021

Some members of transaction_t are allowed to be read without any lock
being held if consistency doesn't matter. Based on LockDoc's
findings, we extended the locking documentation of those members.
Each one of them is marked with a short comment: "no lock for quick
racy checks".
Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/ad82c7a9-a624-4ed5-5ada-a6410c44c0b3@tu-dortmund.deSigned-off-by: Theodore Ts'o <tytso@mit.edu>

d699ae4f

25 Mar, 2021 2 commits

ext4: use memcpy_to_page() in pagecache_write() · bd256fda

Chaitanya Kulkarni authored Feb 07, 2021

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210207190425.38107-7-chaitanya.kulkarni@wdc.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

bd256fda

ext4: use memcpy_from_page() in pagecache_read() · 4d93874b

Chaitanya Kulkarni authored Feb 07, 2021

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210207190425.38107-6-chaitanya.kulkarni@wdc.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>

4d93874b

21 Mar, 2021 3 commits

Linux 5.12-rc4 · 0d02ec6b
Linus Torvalds authored Mar 21, 2021

0d02ec6b

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · d7f5f1bd

Linus Torvalds authored Mar 21, 2021

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes for v5.12"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: initialize ret to suppress smatch warning
  ext4: stop inode update before return
  ext4: fix rename whiteout with fast commit
  ext4: fix timer use-after-free on failed mount
  ext4: fix potential error in ext4_do_update_inode
  ext4: do not try to set xattr into ea_inode if value is empty
  ext4: do not iput inode under running transaction in ext4_rename()
  ext4: find old entry again if failed to rename whiteout
  ext4: fix error handling in ext4_end_enable_verity()
  ext4: fix bh ref count on error paths
  fs/ext4: fix integer overflow in s_log_groups_per_flex
  ext4: add reclaim checks to xattr code
  ext4: shrink race window in ext4_should_retry_alloc()

d7f5f1bd

Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block · 2c41fab1

Linus Torvalds authored Mar 21, 2021

Pull io_uring followup fixes from Jens Axboe:

 - The SIGSTOP change from Eric, so we properly ignore that for
   PF_IO_WORKER threads.

 - Disallow sending signals to PF_IO_WORKER threads in general, we're
   not interested in having them funnel back to the io_uring owning
   task.

 - Stable fix from Stefan, ensuring we properly break links for short
   send/sendmsg recv/recvmsg if MSG_WAITALL is set.

 - Catch and loop when needing to run task_work before a PF_IO_WORKER
   threads goes to sleep.

* tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block:
  io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  io-wq: ensure task is running before processing task_work
  signal: don't allow STOP on PF_IO_WORKER threads
  signal: don't allow sending any signals to PF_IO_WORKER threads

2c41fab1