- 18 Jul, 2024 5 commits
-
-
Kent Overstreet authored
We can't have more updates than paths, so btree_path_idx_t is the correct type to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
If a btree_trans is in use it's supposed to be passed to fsck_err() so that it can be unlocked if we're waiting on userspace input; but the btree IO paths do call fsck errors where a btree_trans exists on the stack without being passed through. That's ok, because the trans is unlocked while doing IO. Fixes: a850bde6 ("bcachefs: fsck_err() may now take a btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This causes us to go read-only - need an error message saying why. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Tavian Barnes authored
Shifting a negative value left is undefined. Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
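Left-shifting a negative signed value is undefined behaviour in C; the usual fix is to do the shift in an unsigned type and convert back. The actual patched call site isn't quoted here, so the following is only a self-contained illustration of the pattern:
```
#include <stdint.h>

/* Undefined behaviour when v is negative */
static inline int64_t shl_undefined(int64_t v, unsigned int n)
{
	return v << n;
}

/* Well-defined: shift in an unsigned type, then convert back */
static inline int64_t shl_defined(int64_t v, unsigned int n)
{
	return (int64_t)((uint64_t)v << n);
}
```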
-
- 15 Jul, 2024 1 commit
-
-
Tavian Barnes authored
memcpy's second parameter must not be NULL, even if size is zero. Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
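Per the C standard, memcpy()'s pointer arguments must be valid even for a zero-length copy, so a source that can legitimately be NULL needs a guard. A self-contained illustration of the pattern (hypothetical wrapper, not the actual patched call site):
```
#include <stddef.h>
#include <string.h>

/* Tolerate src == NULL when there is nothing to copy */
static inline void *memcpy_if_nonzero(void *dst, const void *src, size_t len)
{
	if (len)
		memcpy(dst, src, len);
	return dst;
}
```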
-
- 14 Jul, 2024 34 commits
-
-
Kent Overstreet authored
We no longer track individual btree node locks with lockdep, so this will never be enabled. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Perusal of /sys/kernel/debug/bcachefs/*/btree_transaction_stats shows that the read path has been accumulating unneeded paths on the reflink btree, which we don't want. The solution is to call bch2_trans_begin(), which drops paths not used on the previous loop iteration.
```
bch2_readahead:
Max mem used: 0
Transaction duration:
count:          194235
                        since mount      recent
duration of events
  min:                       150 ns
  max:                         9 ms
  total:                     838 ms
  mean:                        4 us        6 us
  stddev:                     34 us        7 us
time between events
  min:                        10 ns
  max:                        15 h
  mean:                        2 s         12 s
  stddev:                      2 s          3 ms

Maximum allocated btree paths (193):
  path: idx  2 ref 0:0 P   btree=extents l=0 pos 270943112:392:U32_MAX   locks 0
  path: idx  3 ref 1:0 S   btree=extents l=0 pos 270943112:24578:U32_MAX locks 1
  path: idx  4 ref 0:0 P   btree=reflink l=0 pos 0:24773509:0            locks 0
  path: idx  5 ref 0:0 P S btree=reflink l=0 pos 0:24773631:0            locks 1
  path: idx  6 ref 0:0 P S btree=reflink l=0 pos 0:24773759:0            locks 1
  path: idx  7 ref 0:0 P S btree=reflink l=0 pos 0:24773887:0            locks 1
  path: idx  8 ref 0:0 P S btree=reflink l=0 pos 0:24774015:0            locks 1
  path: idx  9 ref 0:0 P S btree=reflink l=0 pos 0:24774143:0            locks 1
  path: idx 10 ref 0:0 P S btree=reflink l=0 pos 0:24774271:0            locks 1
  <many more reflink paths>
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
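A rough sketch of the shape of the fix (illustrative only, with hypothetical helper names; the real readahead code isn't quoted here): calling bch2_trans_begin() at the top of each loop iteration resets the transaction, which is what lets paths the previous iteration no longer references get dropped.
```
/*
 * Sketch only: restart the transaction each time around the read loop
 * so that btree paths left over from the previous iteration (e.g. the
 * reflink lookups above) are dropped instead of accumulating.
 */
while (have_more_to_read(rbio)) {		/* hypothetical condition */
	bch2_trans_begin(trans);

	ret = read_one_extent(trans, rbio);	/* hypothetical helper */
	if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
		continue;
	if (ret)
		break;
}
```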
-
Hongbo Li authored
If the label is not set, the Label field in the superblock info now shows '(none)'.
```
[Before]
Device index:   0
Label:
Version:        1.4: member_seq

[After]
Device index:   0
Label:          (none)
Version:        1.4: member_seq
```
Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Unnecessary here, and this broke the rust bindings:
```
error[E0588]: packed type cannot transitively contain a `#[repr(align)]` type
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:29025:1
      |
29025 | pub struct bkey_i_inode_v3 {
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
note: `bch_inode_v3` has a `#[repr(align)]` attribute
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:8949:1
      |
 8949 | pub struct bch_inode_v3 {
      | ^^^^^^^^^^^^^^^^^^^^^^^

error[E0588]: packed type cannot transitively contain a `#[repr(align)]` type
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:32826:1
      |
32826 | pub struct bkey_inode_buf {
      | ^^^^^^^^^^^^^^^^^^^^^^^^^
      |
note: `bch_inode_v3` has a `#[repr(align)]` attribute
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:8949:1
      |
 8949 | pub struct bch_inode_v3 {
      | ^^^^^^^^^^^^^^^^^^^^^^^
note: `bkey_inode_buf` contains a field of type `bkey_i_inode_v3`
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:32827:9
      |
32827 |     pub inode: bkey_i_inode_v3,
      |     ^^^^^
note: ...which contains a field of type `bch_inode_v3`
     --> /build/source/target/release/build/bch_bindgen-9445b24c90aca2a3/out/bcachefs.rs:29027:9
      |
29027 |     pub v: bch_inode_v3,
      |     ^
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Highly damaged filesystems, or filesystems that have been damaged and repaired and damaged again, may have sequence numbers we can't fully trust - which is in itself something we need to debug. Add a journal_seq fallback so that repair doesn't get stuck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This adds lockdep tracking for held btree locks with a single dep_map in btree_trans, i.e. tracking all held btree locks as one object. This is more practical and more useful than having lockdep track held btree locks individually, because
- we can take more locks than lockdep can track (unbounded, now that we have dynamically resizable btree paths)
- there's no lock ordering between btree locks for lockdep to track (we do cycle detection)
- and this makes it easy to teach lockdep that btree locks are not safe to hold while invoking memory reclaim.

The last rule is one that lockdep would never learn, because we only do trylock() from within shrinkers - but we very much do not want to be invoking memory reclaim while holding btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
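A rough sketch of the single-map idea, assuming btree_trans carries one lockdep_map for all of its held node locks (the struct, field and helper names below are illustrative, not the actual bcachefs code):
```
#include <linux/lockdep.h>

/* one dep_map per transaction: "this trans holds btree node locks" */
struct trans_lock_sketch {
	unsigned		nr_locks_held;
	struct lockdep_map	dep_map;
};

static void trans_node_lock_acquired(struct trans_lock_sketch *trans)
{
	/* first node lock taken: tell lockdep the map is now held */
	if (!trans->nr_locks_held++)
		lock_map_acquire(&trans->dep_map);
}

static void trans_node_lock_released(struct trans_lock_sketch *trans)
{
	/* last node lock dropped: nothing left for lockdep to worry about */
	if (!--trans->nr_locks_held)
		lock_map_release(&trans->dep_map);
}
```
With all held btree locks represented as one held map, lockdep's existing machinery can flag "btree locks held while entering memory reclaim" without knowing about each individual node lock.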
-
Kent Overstreet authored
Add a new helper to disable lockdep tracking entirely for a given class. This is needed for bcachefs, which takes too many btree node locks for lockdep to track. Instead, we have a single lockdep_map for "btree_trans has any btree nodes locked", which makes more sense given that we have centralized lock management and a cycle detector. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
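Usage of such a helper would presumably mirror the existing lockdep_set_novalidate_class() pattern; the exact spelling used below (lockdep_set_notrack_class()) is an assumption, not something confirmed by this log:
```
#include <linux/lockdep.h>
#include <linux/mutex.h>

static struct mutex demo_lock;

static void demo_init(void)
{
	mutex_init(&demo_lock);
	/* assumed helper name: opt this lock out of lockdep tracking */
	lockdep_set_notrack_class(&demo_lock);
}
```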
-
Kent Overstreet authored
printing the raw values can occasionally be very useful Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Eliminate possible integer truncation bugs on 32 bit Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
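A self-contained illustration of the class of bug being guarded against (not the actual bcachefs code): on a 32-bit kernel unsigned long is 32 bits, so assigning a large byte or sector count to it silently drops the high bits, whereas u64 keeps the full value everywhere.
```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t nr_bytes = 0x123456789ULL;		  /* ~4.9 GB, needs more than 32 bits */
	unsigned long as_ulong = (unsigned long)nr_bytes; /* truncated to 0x23456789 on 32-bit */
	uint64_t as_u64 = nr_bytes;			  /* full value on 32-bit and 64-bit alike */

	printf("unsigned long: %#lx\n", as_ulong);
	printf("u64:           %#llx\n", (unsigned long long)as_u64);
	return 0;
}
```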
-
Kent Overstreet authored
We're not always mounting when we start the filesystem Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This repurposes the promote path, which already knows how to call data_update() after a read: we now automatically rewrite bad data when we get a read error and then successfully retry from a different replica. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
fsck passes read_only as a mount option, and it's required for nochanges, which it also uses. Usually read_only is handled by the VFS, but we need to be able to handle it too; we just don't want to print it out twice, so mark it as a hidden option. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Don't allocate the new bkey_cached until after we've done the btree lookup; this means we can kill bkey_cached.valid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This fixes an accounting mismatch for cached data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This adds a new helper, bch2_folio_reservation_get_partial(), which reserves as many blocks as possible and may return partial success. __bch2_buffered_write() is switched to the new helper - this fixes fstests generic/275, the write until -ENOSPC test. generic/230 now fails: this appears to be a test bug, where xfs_io isn't looping after a partial write to get the error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
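A rough sketch of the partial-success pattern (hypothetical names; the real signature of bch2_folio_reservation_get_partial() isn't quoted here): reserve as much as possible, write that much, and return a short write rather than failing the whole write with -ENOSPC.
```
#include <linux/errno.h>
#include <linux/types.h>

/*
 * Sketch only: reserve_blocks_partial() and write_blocks() stand in for
 * the real bcachefs helpers and are assumed to return a count or a
 * negative error.
 */
static ssize_t buffered_write_sketch(size_t blocks_wanted)
{
	long got = reserve_blocks_partial(blocks_wanted);

	if (got < 0)
		return got;		/* hard error */
	if (!got)
		return -ENOSPC;		/* nothing at all could be reserved */

	/* short write if got < blocks_wanted; the caller is expected to retry */
	return write_blocks(got);
}
```
That retry expectation is exactly where generic/230 trips up: xfs_io stops after the short write instead of looping until it gets the -ENOSPC.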
-
Hongbo Li authored
Add support for STATX_DIOALIGN to bcachefs, so that direct I/O alignment restrictions are exposed to userspace in a generic way.
[Before]
```
./statx_test /mnt/bcachefs/test
statx(/mnt/bcachefs/test) = 0
dio mem align:0
dio offset align:0
```
[After]
```
./statx_test /mnt/bcachefs/test
statx(/mnt/bcachefs/test) = 0
dio mem align:1
dio offset align:512
```
Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
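The statx_test program itself isn't included in the commit message; a minimal userspace sketch of what such a check might look like (assuming glibc's statx() wrapper and headers new enough to define STATX_DIOALIGN):
```
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	struct statx stx;
	int ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}

	/* Ask specifically for the direct-I/O alignment fields */
	ret = statx(AT_FDCWD, argv[1], 0, STATX_DIOALIGN, &stx);
	printf("statx(%s) = %d\n", argv[1], ret);
	if (ret)
		return 1;

	/* Both fields read back as 0 if the filesystem doesn't report DIO alignment */
	printf("dio mem align:%u\n", stx.stx_dio_mem_align);
	printf("dio offset align:%u\n", stx.stx_dio_offset_align);
	return 0;
}
```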
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Pankaj Raghav authored
Set the preferred folio order in the fgp_flags by calling fgf_set_order(). The page cache will then try to allocate a large folio of the preferred order whenever possible, instead of allocating multiple order-0 folios. This improves buffered write performance by up to 1.25x with default mount options and up to 1.57x when mounted with the no_data_io option, with the following fio workload:
```
fio --name=bcachefs --filename=/mnt/test --size=100G \
    --ioengine=io_uring --iodepth=16 --rw=write --bs=128k
```
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
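A minimal sketch of the kernel-side pattern (illustrative, not the exact bcachefs hunk): pass the write size through fgf_set_order() so __filemap_get_folio() knows what folio order to prefer.
```
#include <linux/pagemap.h>

/*
 * Sketch: look up or create the folio covering a buffered write of
 * 'len' bytes at 'pos', hinting the preferred folio order via
 * fgf_set_order(). FGP_WRITEBEGIN bundles the usual write-begin flags.
 */
static struct folio *get_write_folio(struct address_space *mapping,
				     loff_t pos, size_t len)
{
	return __filemap_get_folio(mapping, pos >> PAGE_SHIFT,
				   FGP_WRITEBEGIN | fgf_set_order(len),
				   mapping_gfp_mask(mapping));
}
```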
-
Pankaj Raghav authored
Use FGP_WRITEBEGIN to avoid repeating the individual FGP flags before starting a buffered write. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
gc_lock is now only for synchronization between check_alloc_info and interior btree updates - nothing else Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Brian Foster authored
Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
this is an internal implementation detail - and we're improving key cache coherency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
new helper - small refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
we have a separate helper for releasing write locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
gc_pos is now based on keys, not nodes, for invariance w.r.t. splits and merges. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-