Commits · 783ae448b7a21ca59ffe5bc261c17d9c3ebfe2ad · Kirill Smelkov / linux

14 Apr, 2023 8 commits

ext4: Fix special handling of journalled data from extent zeroing · 783ae448

Jan Kara authored Mar 29, 2023

The handling of journalled data in ext4_zero_range() is incomplete. We
do not need to commit running transaction but we rather need to
checkpoint pages with journalled data. If we don't, journal tail can be
advanced beyond transaction containing the journalled data and if we
then crash before committing the transaction doing the zeroing we will
have inconsistent (too old) data in the file. Make sure file pages with
journalled data are properly checkpointed before removing them from the
page cache.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

783ae448

ext4: Drop special handling of journalled data from extent shifting operations · c000dfec

Jan Kara authored Mar 29, 2023

Now that filemap_write_and_wait() makes sure pages with journalled data
are safely on disk, ext4_collapse_range() and ext4_insert_range() do
not need special handling of journalled data.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-7-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c000dfec

ext4: Drop special handling of journalled data from ext4_sync_file() · e360c6ed

Jan Kara authored Mar 29, 2023

Now that ext4_writepages() makes sure all pages with journalled data are
stable on disk, we don't need special handling of journalled data in
ext4_sync_file().
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-6-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e360c6ed

ext4: Commit transaction before writing back pages in data=journal mode · 1f1a55f0

Jan Kara authored Mar 29, 2023

When journalling data we currently just walk over pages, journal those
that are marked for delayed dirtying (only pinned pages dirtied behind
our back these days) and checkpoint other dirty pages. Because some
pages may be part of running transaction the result is that after
filemap_write_and_wait() we are not guaranteed pages are stable on disk.
Thus places that want to flush current pagecache content need to jump
through hoops to make sure journalled data is not lost. This is
manageable in cases completely controlled by ext4 (such as extent
shifting operations or inode eviction) but it gets ugly for stuff like
fsverity. Furthermore it is rather error prone as people often do not
realize journalled data needs special handling.

So change ext4_writepages() to commit transaction with inode's data
before going through the writeback loop in WB_SYNC_ALL mode. As a result
filemap_write_and_wait() is now really getting pages to stable storage
and makes pagecache pages safe to reclaim. Consequently we can remove
the special handling of journalled data from several places in follow up
patches.

Note that this will make fsync(2) for journalled data more expensive as
we will end up not only committing the transaction we need but also
checkpointing the data (which we may have previously skipped if the data
was part of the running transaction). If we really cared, we would need
to introduce special VFS function for writing out & invalidating page
cache for a range, use ->launder_page callback to perform checkpointing,
and use it from all the places that need this functionality. But at this
point I'm not convinced the complexity is worth it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

1f1a55f0
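
The ordering change described in this commit can be sketched with a small userspace model. This is not the kernel implementation: every name below (model_page, model_writepages(), MODEL_WB_SYNC_ALL and so on) is hypothetical, and the model only captures the idea that committing the running transaction first lets the writeback loop checkpoint every dirty page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the writeback sync modes; not kernel definitions. */
enum model_wb_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

struct model_page {
	bool dirty;           /* page still needs writeback */
	bool in_running_txn;  /* journalled data lives only in the running transaction */
	bool stable;          /* contents have reached their final on-disk location */
};

/* Model of a transaction commit: afterwards no page is pinned by the running transaction. */
static void model_commit_running_txn(struct model_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].in_running_txn = false;
}

/*
 * Model of the writeback loop. In MODEL_WB_SYNC_ALL mode the transaction is
 * committed first, so every dirty page can be checkpointed. Returns the number
 * of dirty pages that could NOT be made stable.
 */
static size_t model_writepages(struct model_page *pages, size_t n,
			       enum model_wb_mode mode)
{
	size_t not_stable = 0;

	assert(pages != NULL || n == 0);
	if (mode == MODEL_WB_SYNC_ALL)
		model_commit_running_txn(pages, n);

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].dirty)
			continue;
		if (pages[i].in_running_txn) {
			/* Cannot checkpoint data of a transaction that has not committed. */
			not_stable++;
			continue;
		}
		pages[i].dirty = false;
		pages[i].stable = true;
	}
	return not_stable;
}
```

In this model a caller that uses MODEL_WB_SYNC_NONE can be left with unstable pages, while MODEL_WB_SYNC_ALL always returns 0, which mirrors the guarantee the commit message gives for filemap_write_and_wait().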

ext4: Clear dirty bit from pages without data to write · 5e1bdea6

Jan Kara authored Mar 29, 2023

With journalled data it can happen that checkpointing code will write
out page contents without clearing the page dirty bit. The logic in
ext4_page_nomap_can_writeout() then results in us never calling
mpage_submit_page() and thus never clearing the dirty bit. Drop the
optimization with ext4_page_nomap_can_writeout() and just always call to
mpage_submit_page(). ext4_bio_write_page() knows when to redirty the
page and the additional clearing & setting of page dirty bit for ordered
mode writeout is not that expensive to jump through the hoops for it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

5e1bdea6

ext4: Keep pages with journalled data dirty · 265e72ef

Jan Kara authored Mar 29, 2023

Currently we clear page dirty bit when we checkpoint some buffers from a
page with journalled data or when we perform delayed dirtying of a page
in ext4_writepages(). In a quest to simplify handling of journalled data
we want to keep page dirty as long as it has either buffers to
checkpoint or journalled dirty data. So make sure to keep page dirty in
ext4_writepages() if it still has journalled data attached to it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

265e72ef
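
The rule stated in this commit, that a page stays dirty while it has buffers to checkpoint or journalled dirty data, can be written as a one-line predicate. The sketch below is a userspace illustration with hypothetical names (model_page_state, model_page_must_stay_dirty()); the kernel derives this state from buffer heads and journal structures rather than from two plain fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the per-page state the commit message talks about. */
struct model_page_state {
	unsigned int buffers_to_checkpoint;  /* buffers still waiting for checkpoint */
	bool has_journalled_dirty_data;      /* journalled data still attached to the page */
};

/* The page may be cleaned only when neither condition holds. */
static bool model_page_must_stay_dirty(const struct model_page_state *s)
{
	return s->buffers_to_checkpoint > 0 || s->has_journalled_dirty_data;
}
```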

ext4: Mark pages with journalled data dirty · d84c9ebd

Jan Kara authored Mar 29, 2023

Currently pages with journalled data written by write(2) or modified by
block zeroing during truncate(2) are not marked as dirty. They are
dirtied only once the transaction commits. This however makes writeback
code think inode has no pages to write and so ext4_writepages() is not
called to make pages with journalled data persistent. Mark pages with
journalled data dirty (similarly as it happens for writes through mmap)
so that writeback code knows about them and ext4_writepages() can do
what it needs to do to the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

d84c9ebd

jbd2: Don't refuse invalidation of already invalidated buffers · bd159398

Jan Kara authored Mar 29, 2023

When invalidating buffers under the partial tail page,
jbd2_journal_invalidate_folio() returns -EBUSY if the buffer is part of
the committing transaction as we cannot safely modify buffer state.
However if the buffer is already invalidated (due to previous
invalidation attempts from ext4_wait_for_tail_page_commit()), there's
nothing to do and there's no point in returning -EBUSY. This fixes
occasional warnings from ext4_journalled_invalidate_folio() triggered by
generic/051 fstest when blocksize < pagesize.

Fixes: 53e87268 ("ext4: fix deadlock in journal_unmap_buffer()")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

bd159398

06 Apr, 2023 32 commits

ext4: Use a folio in ext4_read_merkle_tree_page · e9ebecf2

Matthew Wilcox authored Mar 24, 2023

This is an implementation of fsverity_operations read_merkle_tree_page,
so it must still return the precise page asked for, but we can use the
folio API to reduce the number of conversions between folios & pages.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-30-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e9ebecf2

ext4: Convert pagecache_read() to use a folio · b23fb762

Matthew Wilcox authored Mar 24, 2023

Use the folio API and support folios of arbitrary sizes.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-29-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b23fb762

ext4: Convert mext_page_mkuptodate() to take a folio · 3060b6ef

Matthew Wilcox authored Mar 24, 2023

Use a folio throughout. Does not support large folios due to
an array sized for MAX_BUF_PER_PAGE, but it does remove a few
calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-28-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

3060b6ef

ext4: Use a folio iterator in __read_end_io() · f2b229a8

Matthew Wilcox authored Mar 24, 2023

Iterate once per folio, not once per page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-27-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

f2b229a8

ext4: Use a folio in ext4_page_mkwrite() · 9ea0e45b

Matthew Wilcox authored Mar 24, 2023

Convert to the folio API, saving a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-26-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9ea0e45b

ext4: Convert ext4_block_write_begin() to take a folio · 86b38c27

Matthew Wilcox authored Mar 24, 2023

All the callers now have a folio, so pass that in and operate on folios.
Removes four calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-25-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

86b38c27

ext4: Convert ext4_mpage_readpages() to work on folios · c0be8e6f

Matthew Wilcox authored Mar 24, 2023

This definitely doesn't include support for large folios; there
are all kinds of assumptions about the number of buffers attached
to a folio. But it does remove several calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-24-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c0be8e6f

ext4: Use a folio in ext4_da_write_begin() · 0b5a2543

Matthew Wilcox authored Mar 24, 2023

Remove a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-23-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

0b5a2543

ext4: Convert ext4_page_nomap_can_writeout to ext4_folio_nomap_can_writeout · 02e4b04c

Matthew Wilcox authored Mar 24, 2023

Its one caller already uses a folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-22-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

02e4b04c

ext4: Convert __ext4_block_zero_page_range() to use a folio · 9d3973de

Matthew Wilcox authored Mar 24, 2023

Use folio APIs throughout. Saves many calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-21-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9d3973de

ext4: Convert ext4_journalled_zero_new_buffers() to use a folio · 86324a21

Matthew Wilcox authored Mar 24, 2023

Remove a call to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-20-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

86324a21

ext4: Use a folio in ext4_journalled_write_end() · feb22b77

Matthew Wilcox authored Mar 24, 2023

Convert the incoming page to a folio to remove a few calls to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-19-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

feb22b77

ext4: Convert ext4_write_end() to use a folio · 64fb3136

Matthew Wilcox authored Mar 24, 2023

Convert the incoming struct page to a folio. Replaces two implicit
calls to compound_head() with one explicit call.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-18-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

64fb3136

ext4: Convert ext4_write_begin() to use a folio · 4d934a5e

Matthew Wilcox authored Mar 24, 2023

Remove a lot of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-17-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4d934a5e

ext4: Convert ext4_write_inline_data_end() to use a folio · 6b90d413

Matthew Wilcox authored Mar 24, 2023

Convert the incoming page to a folio so that we call compound_head()
only once instead of seven times.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-16-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

6b90d413

ext4: Convert ext4_read_inline_page() to ext4_read_inline_folio() · 6b87fbe4

Matthew Wilcox authored Mar 24, 2023

All callers now have a folio, so pass it and use it. The folio may
be large, although I doubt we'll want to use a large folio for an
inline file.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-15-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

6b87fbe4

ext4: Convert ext4_da_write_inline_data_begin() to use a folio · 9a9d01f0

Matthew Wilcox authored Mar 24, 2023

Saves a number of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-14-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9a9d01f0

ext4: Convert ext4_da_convert_inline_data_to_extent() to use a folio · 4ed9b598

Matthew Wilcox authored Mar 24, 2023

Saves a number of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-13-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4ed9b598

ext4: Convert ext4_try_to_write_inline_data() to use a folio · f8f8c89f

Matthew Wilcox authored Mar 24, 2023

Saves a number of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-12-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

f8f8c89f

ext4: Convert ext4_convert_inline_data_to_extent() to use a folio · 83eba701

Matthew Wilcox authored Mar 24, 2023

Saves a number of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-11-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

83eba701

ext4: Convert ext4_readpage_inline() to take a folio · 3edde93e

Matthew Wilcox authored Mar 24, 2023

Use the folio API in this function; this saves a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-10-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

3edde93e

ext4: Convert ext4_bio_write_page() to ext4_bio_write_folio() · e8d6062c

Matthew Wilcox authored Mar 24, 2023

The only caller now has a folio so pass it in directly and avoid the call
to page_folio() at the beginning.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-9-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e8d6062c

ext4: Convert mpage_page_done() to mpage_folio_done() · 33483b3b

Matthew Wilcox authored Mar 24, 2023

All callers now have a folio so we can pass one in and use the folio
APIs to support large folios as well as save instructions by eliminating
a call to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-8-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

33483b3b

ext4: Convert mpage_submit_page() to mpage_submit_folio() · 81a0d3e1

Matthew Wilcox authored Mar 24, 2023

All callers now have a folio so we can pass one in and use the folio
APIs to support large folios as well as save instructions by eliminating
calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-7-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

81a0d3e1

ext4: Turn mpage_process_page() into mpage_process_folio() · 4da2f6e3

Matthew Wilcox authored Mar 24, 2023

The page/folio is only used to extract the buffers, so this is a
simple change.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-6-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4da2f6e3

ext4: Convert ext4_finish_bio() to use folios · bb64c08b

Matthew Wilcox authored Mar 24, 2023

Prepare ext4 to support large folios in the page writeback path.
Also set the actual error in the mapping, not just -EIO.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-5-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

bb64c08b

ext4: Convert ext4_bio_write_page() to use a folio · cd57b771

Matthew Wilcox authored Mar 24, 2023

Remove several calls to compound_head() and the last caller of
set_page_writeback_keepwrite(), so remove the wrapper too.

Also export bio_add_folio() as this is the first caller from a module.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-4-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

cd57b771

fscrypt: Add some folio helper functions · c76e14dc

Matthew Wilcox authored Mar 24, 2023

fscrypt_is_bounce_folio() is the equivalent of fscrypt_is_bounce_page()
and fscrypt_pagecache_folio() is the equivalent of fscrypt_pagecache_page().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-3-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c76e14dc

fs: Add FGP_WRITEBEGIN · e999a5c5

Matthew Wilcox authored Mar 24, 2023

This particular combination of flags is used by most filesystems
in their ->write_begin method, although it does find use in a
few other places.  Before folios, it warranted its own function
(grab_cache_page_write_begin()), but I think that just having specialised
flags is enough.  It certainly helps the few places that have been
converted from grab_cache_page_write_begin() to __filemap_get_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-2-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e999a5c5
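
The idea of a named flag combination can be illustrated in plain C. The commit message does not list which flags make up FGP_WRITEBEGIN; the sketch below assumes the combination is lock + write + create + stable, and both the MODEL_ names and the numeric values are illustrative stand-ins rather than the kernel's definitions.

```c
#include <assert.h>

/* Illustrative flag bits (assumed values, not copied from the kernel headers). */
enum {
	MODEL_FGP_LOCK   = 0x002,  /* return the folio locked */
	MODEL_FGP_CREAT  = 0x004,  /* create the folio if it is not cached */
	MODEL_FGP_WRITE  = 0x008,  /* the caller intends to write to it */
	MODEL_FGP_STABLE = 0x200,  /* wait until the folio is stable */
};

/* One name for the combination that ->write_begin style callers want (assumed). */
#define MODEL_FGP_WRITEBEGIN \
	(MODEL_FGP_LOCK | MODEL_FGP_WRITE | MODEL_FGP_CREAT | MODEL_FGP_STABLE)

/* Returns nonzero when every bit in 'wanted' is set in 'flags'. */
static int model_fgp_has_all(unsigned int flags, unsigned int wanted)
{
	return (flags & wanted) == wanted;
}
```

A caller then passes the single combined constant instead of spelling out each bit, which is the convenience the commit message argues for.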

ext4: Remove the logic to trim inode PAs · 361eb69f

Ojaswin Mujoo authored Mar 25, 2023

Earlier, inode PAs were stored in a linked list. This caused a need to
periodically trim the list down in order to avoid growing it to a very
large size, as this would severely affect performance during list
iteration.

Recent patches changed this list to an rbtree, and since the tree scales
up much better, we no longer need to have the trim functionality, hence
remove it.
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/c409addceaa3ade4b40328e28e3b54b2f259689e.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

361eb69f

ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list · 38727786

Ojaswin Mujoo authored Mar 25, 2023

Currently, the kernel uses i_prealloc_list to hold all the inode
preallocations. This is known to cause degradation in performance in
workloads which perform a large number of sparse writes on a single file.
This is mainly because functions like ext4_mb_normalize_request() and
ext4_mb_use_preallocated() iterate over this complete list, resulting in
slowdowns when a large number of PAs are present.

Patch 27bc446e partially fixed this by enforcing a limit of 512 for
the inode preallocation list and adding logic to continually trim the
list if it grows above the threshold, however our testing revealed that
a hardcoded value is not suitable for all kinds of workloads.

To optimize this, add an rbtree to the inode and hold the inode
preallocations in this rbtree. This will make iterating over inode PAs
faster and scale much better than a linked list. Additionally, we also
had to remove the LRU logic that was added during trimming of the list
(in ext4_mb_release_context()) as it will add extra overhead in rbtree.
The discards now happen in the lowest-logical-offset-first order.
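
Why a tree keyed by logical offset helps can be sketched in userspace C. The code below is a simplification under stated assumptions: it uses a plain, unbalanced binary search tree as a stand-in for the kernel's self-balancing rbtree, it assumes PAs cover non-overlapping logical ranges, and all names (model_pa, model_pa_insert(), model_pa_find(), model_pa_inorder()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PA node covering logical blocks [lstart, lstart + len). */
struct model_pa {
	unsigned long lstart;
	unsigned long len;
	struct model_pa *left, *right;
};

/* Insert 'pa' keyed by lstart and return the (possibly new) root. */
static struct model_pa *model_pa_insert(struct model_pa *root, struct model_pa *pa)
{
	struct model_pa **link = &root;

	assert(pa != NULL);
	while (*link)
		link = pa->lstart < (*link)->lstart ? &(*link)->left : &(*link)->right;
	pa->left = pa->right = NULL;
	*link = pa;
	return root;
}

/* Find the PA covering 'block' by descending the tree instead of scanning a list. */
static struct model_pa *model_pa_find(struct model_pa *root, unsigned long block)
{
	while (root) {
		if (block < root->lstart)
			root = root->left;
		else if (block >= root->lstart + root->len)
			root = root->right;
		else
			return root;
	}
	return NULL;
}

/* In-order walk: stores lstart values lowest-logical-offset-first, returns the new count. */
static size_t model_pa_inorder(const struct model_pa *root, unsigned long *out, size_t n)
{
	if (!root)
		return n;
	n = model_pa_inorder(root->left, out, n);
	out[n++] = root->lstart;
	return model_pa_inorder(root->right, out, n);
}
```

The in-order walk visits PAs in ascending logical offset, which matches the lowest-logical-offset-first discard order mentioned above. A balanced tree, as in the actual patch, additionally bounds the depth of each lookup.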

** Locking notes **

With the introduction of rbtree to maintain inode PAs, we can't use RCU
to walk the tree for searching since it can result in partial traversals
which might miss some nodes (or entire subtrees) while discards happen
in parallel (which happens under a lock). Hence this patch converts the
ei->i_prealloc_lock spin_lock to rw_lock.

Almost all the codepaths that read/modify the PA rbtrees are protected
by the higher level inode->i_data_sem (except
ext4_mb_discard_group_preallocations() and ext4_clear_inode()). IIUC, the
only place we need lock protection is when one thread is reading
("searching") the PA rbtree (earlier protected under rcu_read_lock()) and
another is "deleting" the PAs in the ext4_mb_discard_group_preallocations()
function (which iterates all the PAs using the grp->bb_prealloc_list and
deletes PAs from the tree without taking any inode lock (i_data_sem)).

So, this patch converts all rcu_read_lock/unlock() paths for inode list
PA to use read_lock() and all places where we were using
ei->i_prealloc_lock spinlock will now be using write_lock().

Note that this makes the fast path (searching of the right PA e.g.
ext4_mb_use_preallocated() or ext4_mb_normalize_request()), now use
read_lock() instead of rcu_read_lock/unlock(). This also will now block
due to slow discard path (ext4_mb_discard_group_preallocations()) which
uses write_lock().

But this is not as bad as it looks. This is because:

1. The slow path only occurs when the normal allocation failed and we
can say that we are low on disk space. One can argue this scenario
won't be very frequent.

2. ext4_mb_discard_group_preallocations() locks and unlocks the rwlock
for deleting every individual PA. This gives enough opportunity for
the fast path to acquire the read_lock for searching the PA inode
list.
Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/4137bce8f6948fedd8bae134dabae24acfe699c6.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

38727786

ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union · a8e38fd3

Ojaswin Mujoo authored Mar 25, 2023

** Splitting pa->pa_inode_list **

Currently, we use the same pa->pa_inode_list to add a pa to either
the inode preallocation list or the locality group preallocation list.
For better clarity, split this list into a union of 2 list_heads and use
either of them based on the type of pa.

** Splitting pa->pa_obj_lock **

Currently, pa->pa_obj_lock is either assigned &ei->i_prealloc_lock for
inode PAs or lg_prealloc_lock for lg PAs, and is then used to lock the
lists containing these PAs. Make the distinction between the 2 PA types
clear by changing this lock to a union of 2 locks. Explicitly use the
pa_lock_node.inode_lock for inode PAs and pa_lock_node.lg_lock for lg
PAs.

This patch is required so that the locality group preallocation code
remains the same as in upcoming patches we are going to make changes to
inode preallocation code to move from list to rbtree based
implementation. This patch also makes it easier to review the upcoming
patches.

There are no functional changes in this patch.
Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1d7ac0557e998c3fc7eef422b52e4bc67bdef2b0.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

a8e38fd3
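
The shape of the two unions described in this commit can be sketched as follows. The member names pa_lock_node.inode_lock and pa_lock_node.lg_lock come from the commit message; everything else (model_prealloc, model_list_head, model_lock, the pa_node union name, the type enum and the helper) is a hypothetical userspace stand-in, not the kernel's struct ext4_prealloc_space.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's list_head and lock types. */
struct model_list_head { struct model_list_head *next, *prev; };
struct model_lock { int locked; };

enum model_pa_type { MODEL_INODE_PA, MODEL_GROUP_PA };

struct model_prealloc {
	enum model_pa_type pa_type;
	/* Exactly one member is meaningful, selected by pa_type. */
	union {
		struct model_list_head inode_list;  /* PA is linked into an inode's set */
		struct model_list_head lg_list;     /* PA is linked into a locality group */
	} pa_node;
	union {
		struct model_lock *inode_lock;      /* lock protecting the inode's PAs */
		struct model_lock *lg_lock;         /* lock protecting the locality group's PAs */
	} pa_lock_node;
};

/* Pick the lock that matches the PA type instead of using one ambiguous pointer. */
static struct model_lock *model_pa_lock(struct model_prealloc *pa)
{
	assert(pa != NULL);
	return pa->pa_type == MODEL_INODE_PA ? pa->pa_lock_node.inode_lock
					     : pa->pa_lock_node.lg_lock;
}
```

Because both members of each union have the same type and size, the structure does not grow, which is consistent with the commit's statement that there are no functional changes.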