Commits · dcfc593f7b3a35e340f0cefa3281a3285ddb48e8 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Fix page state after fallocate · dcfc593f

Kent Overstreet authored 3 years ago

This tweaks the fallocate code to also update the page cache to reflect
the new on disk reservations, giving us better i_sectors consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

dcfc593f

bcachefs: Fix page state when reading into !PageUptodate pages · e6ec361f

Kent Overstreet authored 3 years ago

This patch adds code to read page state before writing to pages that
aren't uptodate, which corrects i_sectors being temporarily too large
and means we may not need to get a disk reservation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# Conflicts:
#	fs/bcachefs/fs-io.c

e6ec361f

bcachefs: Kill PAGE_SECTOR_SHIFT · 7279c1a2

Kent Overstreet authored 3 years ago

Replace it with the new, standard PAGE_SECTORS_SHIFT.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

7279c1a2

bcachefs: Apply workaround for too many btree iters to read path · 084d42bb

Kent Overstreet authored 3 years ago

Reading from cached data, which calls bch2_bucket_io_time_reset(), is
leading to transaction iterator overflows - this standardizes the
workaround.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

084d42bb

bcachefs: SECTOR_DIRTY_RESERVED · b44a66a6

Kent Overstreet authored 3 years ago

This fixes another i_sectors accounting bug - we need to differentiate
between dirty writes that overwrite a reservation and dirty writes to
unallocated space - dirty writes to unallocated space increase
i_sectors, dirty writes over a reservation do not.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

b44a66a6
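
The accounting rule described in this commit message can be sketched as a small state machine. This is an illustrative toy, not the bcachefs code: the enum values other than SECTOR_DIRTY_RESERVED, and the helpers sector_mark_dirty() and mark_range_dirty(), are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-sector page cache states (assumed for illustration). */
enum sector_state {
	SECTOR_UNALLOCATED,    /* nothing on disk, nothing dirty */
	SECTOR_RESERVED,       /* covered by an on-disk reservation */
	SECTOR_DIRTY,          /* dirty write to unallocated space */
	SECTOR_DIRTY_RESERVED, /* dirty write over a reservation */
	SECTOR_ALLOCATED,      /* data already allocated on disk */
};

/*
 * Mark one sector dirty and return the resulting change to i_sectors.
 * Only a dirty write to unallocated space grows i_sectors; a dirty write
 * over a reservation (or over already-counted sectors) does not.
 */
static int sector_mark_dirty(enum sector_state *s)
{
	assert(s != NULL);

	switch (*s) {
	case SECTOR_UNALLOCATED:
		*s = SECTOR_DIRTY;
		return 1;
	case SECTOR_RESERVED:
		*s = SECTOR_DIRTY_RESERVED;
		return 0;
	default:
		/* Already dirty or allocated: already counted. */
		return 0;
	}
}

/* Mark a run of sectors dirty and return the total i_sectors delta. */
static int mark_range_dirty(enum sector_state *s, size_t nr)
{
	int delta = 0;

	for (size_t i = 0; i < nr; i++)
		delta += sector_mark_dirty(&s[i]);
	return delta;
}
```

Keeping the two dirty states distinct means later code can tell, from the state alone, whether a dirty sector has already been counted.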

bcachefs: Fix i_sectors_leak in bch2_truncate_page · b19d307d

Kent Overstreet authored 3 years ago

When bch2_truncate_page() discards dirty sectors in the page cache, we
need to account for that - we don't need to account for allocated
sectors because that'll be done by the bch2_fpunch() call when it
updates the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

b19d307d

bcachefs: Fix an i_sectors accounting bug · 8810386f

Kent Overstreet authored 3 years ago

We weren't checking for errors before calling i_sectors_acct().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

8810386f

bcachefs: Don't check for -ENOSPC in page writeback · f74a5051

Kent Overstreet authored 3 years ago

If at all possible we'd prefer to not fail page writeback unless the
filesystem has been shut down; allowing errors in page writeback means
things we'd like to assert about i_size consistency between the VFS and
the btree go out the window.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

f74a5051

bcachefs: Fallocate fixes · 74163da7

Kent Overstreet authored 3 years ago

- fpunch wasn't always correctly updating i_size - when we drop buffered
  writes that were extending a file, we become responsible for writing
  i_size.

- fzero was sometimes zeroing out more data than it should have -
  block_start and block_end were being rounded in the wrong directions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

74163da7
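
The rounding issue in the second item can be illustrated with a short sketch. It assumes the intended behaviour is to round inward, so that only blocks lying entirely inside the requested byte range are treated as whole blocks; the helper names and struct block_range are hypothetical and are not taken from the bcachefs source.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t round_down_u64(uint64_t v, uint64_t bs)
{
	return v - v % bs;
}

static uint64_t round_up_u64(uint64_t v, uint64_t bs)
{
	return round_down_u64(v + bs - 1, bs);
}

struct block_range {
	uint64_t start; /* first byte of the first whole block */
	uint64_t end;   /* one past the last byte of the last whole block */
};

/*
 * Return the block-aligned range fully contained in
 * [offset, offset + len). The start is rounded up and the end is
 * rounded down; rounding the other way would cover bytes outside
 * the request.
 */
static struct block_range inner_blocks(uint64_t offset, uint64_t len,
				       uint64_t bs)
{
	struct block_range r;

	assert(bs > 0);

	r.start = round_up_u64(offset, bs);
	r.end = round_down_u64(offset + len, bs);
	if (r.end < r.start)
		r.end = r.start; /* request lies within a single block */
	return r;
}
```

With 4096-byte blocks, a request for bytes 1000 to 11000 yields the whole-block range 4096 to 8192; the partial blocks at either edge would need separate handling.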

bcachefs: Switch fsync to use bi_journal_seq · 68a2054d

Kent Overstreet authored 3 years ago

Now that we're recording in each inode the journal sequence number of
the most recent update, fsync becomes a lot simpler and we can delete
all the plumbing for ei_journal_seq.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

68a2054d

bcachefs: Fix restart handling in for_each_btree_key() · e5fa91d7

Kent Overstreet authored 3 years ago

Code that uses for_each_btree_key often wants transaction restarts to be
handled locally and not returned. Originally, we wouldn't return
transaction restarts if there was a single iterator in the transaction -
the reasoning being if there weren't other iterators being invalidated,
and the current iterator was being advanced/retraversed, there weren't
any locks or iterators we were required to preserve.

But with the btree_path conversion that approach doesn't work anymore -
even when we're using for_each_btree_key() with a single iterator there
will still be two paths in the transaction, since we now always preserve
the path at the pos the iterator was initialized at - the reason being
that on restart we often restart from the same place.

And it turns out there's now a lot of for_each_btree_key() uses that _do
not_ want transaction restarts handled locally, and should be returning
them.

This patch splits out for_each_btree_key_norestart() and
for_each_btree_key_continue_norestart(), and converts existing users as
appropriate. for_each_btree_key(), for_each_btree_key_continue(), and
for_each_btree_node() now handle transaction restarts themselves by
calling bch2_trans_begin() when necessary - and the old hack to not
return transaction restarts when there's a single path in the
transaction has been deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

e5fa91d7
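
The difference between handling restarts locally and returning them can be sketched with a toy transaction. Everything here is a stand-in: TOY_RESTART, struct toy_trans, toy_trans_begin() and toy_peek() are hypothetical and only mimic the control flow described above, with toy_trans_begin() playing the role of bch2_trans_begin().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RESTART 1 /* stand-in for the transaction restart error */

struct toy_trans {
	int restarts_pending; /* how many upcoming peeks will ask for a restart */
	int begin_calls;      /* how often the transaction was (re)started */
};

static void toy_trans_begin(struct toy_trans *t)
{
	t->begin_calls++;
}

static int toy_peek(struct toy_trans *t, const int *keys, size_t pos, int *out)
{
	assert(out != NULL);

	if (t->restarts_pending > 0) {
		t->restarts_pending--;
		return -TOY_RESTART;
	}
	*out = keys[pos];
	return 0;
}

/* Restart-handling variant: retry the same position after a restart. */
static int toy_sum_keys(struct toy_trans *t, const int *keys, size_t nr,
			long *sum)
{
	size_t pos = 0;

	*sum = 0;
	while (pos < nr) {
		int k;
		int ret = toy_peek(t, keys, pos, &k);

		if (ret == -TOY_RESTART) {
			toy_trans_begin(t);
			continue;
		}
		if (ret)
			return ret;
		*sum += k;
		pos++;
	}
	return 0;
}

/* "norestart" variant: any error, including a restart, goes to the caller. */
static int toy_sum_keys_norestart(struct toy_trans *t, const int *keys,
				  size_t nr, long *sum)
{
	*sum = 0;
	for (size_t pos = 0; pos < nr; pos++) {
		int k;
		int ret = toy_peek(t, keys, pos, &k);

		if (ret)
			return ret;
		*sum += k;
	}
	return 0;
}
```

The second variant suits callers that are themselves inside a larger retry loop and need to see the restart.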

bcachefs: bch2_trans_exit() no longer returns errors · 9a796fdb

Kent Overstreet authored 3 years ago

Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

9a796fdb

bcachefs: Convert io paths for snapshots · 8c6d298a

Kent Overstreet authored 3 years ago

This plumbs around the subvolume ID as was done previously for other
filesystem code, but now for the IO paths - the control flow in the IO
paths is trickier so the changes in this patch are more involved.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

8c6d298a

bcachefs: Plumb through subvolume id · 6fed42bb

Kent Overstreet authored 3 years ago

To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

6fed42bb

bcachefs: btree_path · 67e0dd8f

Kent Overstreet authored 3 years ago

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

67e0dd8f
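
The sharing scheme described in this commit message is a form of copy-on-write over a reference-counted object. The sketch below shows that general technique only; struct toy_path, struct toy_iter and the toy_* helpers are hypothetical and far simpler than the real btree_iter and btree_path.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_path {
	int ref; /* number of iterators pointing at this path */
	int pos; /* position the path points to */
};

struct toy_iter {
	struct toy_path *path;
};

static struct toy_path *toy_path_new(int pos)
{
	struct toy_path *p = malloc(sizeof(*p));

	if (!p)
		abort();
	p->ref = 1;
	p->pos = pos;
	return p;
}

static void toy_path_put(struct toy_path *p)
{
	assert(p->ref > 0);
	if (--p->ref == 0)
		free(p);
}

static void toy_iter_init(struct toy_iter *it, int pos)
{
	it->path = toy_path_new(pos);
}

/* Copying an iterator shares the path instead of cloning it up front. */
static void toy_iter_copy(struct toy_iter *dst, const struct toy_iter *src)
{
	dst->path = src->path;
	dst->path->ref++;
}

/* Mutating clones the path only if another iterator still uses it. */
static void toy_iter_set_pos(struct toy_iter *it, int pos)
{
	if (it->path->ref > 1) {
		struct toy_path *p = toy_path_new(pos);

		toy_path_put(it->path);
		it->path = p;
	} else {
		it->path->pos = pos;
	}
}

static void toy_iter_exit(struct toy_iter *it)
{
	toy_path_put(it->path);
	it->path = NULL;
}
```

An iterator that is copied but never moved costs one reference count increment rather than a full clone.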

bcachefs: Reduce iter->trans usage · 9f6bd307

Kent Overstreet authored 3 years ago

Use of iter->trans is disfavoured, and should go away.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

9f6bd307

bcachefs: Fix an unhandled transaction restart · 3737e0dd

Kent Overstreet authored 3 years ago

__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may
cause a transaction restart, which we don't return an error for because
it doesn't prevent us from making forward progress on the read we're
submitting.

Instead, change __bch2_read() and bchfs_read() to check for transaction
restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

3737e0dd

bcachefs: Use bch2_trans_begin() more consistently · 700c25b3

Kent Overstreet authored 3 years ago

Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

700c25b3

bcachefs: Always check for transaction restarts · 8b3e9bd6

Kent Overstreet authored 3 years ago

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

8b3e9bd6

bcachefs: Use bch2_inode_find_by_inum() in truncate · b97bbd4e

Kent Overstreet authored 3 years ago

This is needed for snapshots because we need to start handling lock
restarts even when just calling bch2_inode_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

b97bbd4e

bcachefs: Fix a memory leak in the dio write path · 5468f119

Kent Overstreet authored 3 years ago

There were some error paths where we were leaking page refs - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5468f119

bcachefs: fix truncate without a size change · 78d66ab1

Dan Robertson authored 3 years ago

Do not attempt to shortcut a truncate when the given new size is
the same as the current size. There may be blocks allocated to the
file that extend beyond the i_size. The ctime and mtime should
not be updated in this case.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

78d66ab1

bcachefs: fix truncate with ATTR_MODE · 68a507a2

Kent Overstreet authored 3 years ago

After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

68a507a2

bcachefs: Improve iter->should_be_locked · 8c3f6da9

Kent Overstreet authored 3 years ago

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

8c3f6da9

bcachefs: Fix a memory leak in dio write path · 2ed5cd50

Kent Overstreet authored 3 years ago

Commit c42bca92 "bio: don't copy bvec for direct IO" changed
bio_iov_iter_get_pages() to point bio->bi_iovec at the incoming biovec,
meaning if we already allocated one, it'll be leaked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

2ed5cd50

bcachefs: Preallocate transaction mem · f7beb4ca

Kent Overstreet authored 3 years ago

This helps avoid transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

f7beb4ca

bcachefs: Don't use bch_write_op->cl for delivering completions · 9f311f21

Kent Overstreet authored 2 years ago

We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.

Two reasons:
 - op->end_io is more efficient, due to fewer atomic ops; this completes
   the conversion that was originally only done for the direct IO path.
 - We'll be restructuring the write path to use a different mechanism for
   punting to process context; refactoring to not use op->cl will make
   that easier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9f311f21

bcachefs: Fix for buffered writes getting -ENOSPC · a6336910

Kent Overstreet authored 3 years ago

Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

a6336910

bcachefs: Make bch2_remap_range respect O_SYNC · e7084c9c

Kent Overstreet authored 3 years ago

Caught by xfstest generic/628
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

e7084c9c

bcachefs: Ratelimiting for writeback IOs · ef1b2092

Kent Overstreet authored 3 years ago

Writeback throttling is a kernel config option and not always enabled.
When it's not enabled we need a fallback, to avoid unbounded memory
pinning and work item backlogs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ef1b2092

bcachefs: Ensure that fpunch updates inode timestamps · 050197b1

Kent Overstreet authored 3 years ago

Fixes xfstests generic/059
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

050197b1

bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack · 694015c2

Kent Overstreet authored 3 years ago

Upcoming patch is going to disallow multiple btree_trans on the stack.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

694015c2

bcachefs: Require all btree iterators to be freed · 50dc0f69

Kent Overstreet authored 3 years ago

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

50dc0f69

bcachefs: Kill reflink option · 87a432f5

Kent Overstreet authored 3 years ago

An option was added to control whether reflink support was on or off
because for a long time, reflink + inline data extent support was
missing - but that's since been fixed, so we can drop the option now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

87a432f5

bcachefs: Fix read retry path for indirect extents · 5ff75ccb

Kent Overstreet authored 3 years ago

In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5ff75ccb

bcachefs: Rename BTREE_ID enums for consistency with other enums · 41f8b09e

Kent Overstreet authored 4 years ago

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

41f8b09e

bcachefs: Fix bch2_btree_iter_peek_prev() · 3d495595

Kent Overstreet authored 4 years ago

This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev()
consistent with peek() and next(), w.r.t. iter->pos.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3d495595

bcachefs: Fix loopback in dio mode · b4725cc1

Kent Overstreet authored 4 years ago

We had a deadlock on page_lock, because buffered reads signal completion
by unlocking the page, but the dio read path normally dirties the pages
it's reading to with set_page_dirty_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b4725cc1

bcachefs: Fix .splice_write · 032ac32c

Kent Overstreet authored 3 years ago

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

032ac32c

bcachefs: Reduce/kill BKEY_PADDED use · 07a1006a

Kent Overstreet authored 4 years ago

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

07a1006a
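
The idea of a key buffer that lives on the stack but can fall back to dynamic allocation can be sketched as follows. This is a generic illustration of the technique, not the bkey_on_stack implementation: struct toy_key_buf, TOY_INLINE_U64S and the toy_key_buf_* helpers are hypothetical, and plain malloc() is assumed as the allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_U64S 12 /* assumed inline capacity, in 64-bit words */

/*
 * k points either at the inline array or at a heap allocation, so the
 * struct must not be copied by value once initialised.
 */
struct toy_key_buf {
	uint64_t *k;
	size_t cap;
	uint64_t onstack[TOY_INLINE_U64S];
};

static void toy_key_buf_init(struct toy_key_buf *b)
{
	b->k = b->onstack;
	b->cap = TOY_INLINE_U64S;
}

/* Ensure room for u64s words; previous contents are discarded on growth. */
static int toy_key_buf_reserve(struct toy_key_buf *b, size_t u64s)
{
	if (u64s > b->cap) {
		uint64_t *n = malloc(u64s * sizeof(*n));

		if (!n)
			return -1;
		if (b->k != b->onstack)
			free(b->k);
		b->k = n;
		b->cap = u64s;
	}
	return 0;
}

/* Copy a key of any size into the buffer, growing it if necessary. */
static int toy_key_buf_copy(struct toy_key_buf *b, const uint64_t *src,
			    size_t u64s)
{
	assert(src != NULL);

	if (toy_key_buf_reserve(b, u64s))
		return -1;
	memcpy(b->k, src, u64s * sizeof(*src));
	return 0;
}

static void toy_key_buf_exit(struct toy_key_buf *b)
{
	if (b->k != b->onstack)
		free(b->k);
	toy_key_buf_init(b);
}
```

Small keys never touch the allocator, while oversized keys no longer depend on a compile-time maximum being computed correctly.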