Commits · 360f0342b2e9374298e2222c846f3fe9d0295f0d · Kirill Smelkov / linux

03 Jan, 2024 2 commits

Merge tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 360f0342

Linus Torvalds authored Jan 03, 2024

Pull tracing fixes from Steven Rostedt:

 - Fix a NULL kernel dereference in set_gid() on tracefs mounting.

   When tracefs is mounted with "gid=1000", it will update the existing
   dentries to have the new gid. The tracefs_inode which is retrieved by
   a container_of(dentry->d_inode) has flags to see if the inode belongs
   to the eventfs system.

   The issue that was fixed was if getdents() was called on tracefs that
   was previously mounted, and was not closed. It will leave a "cursor
   dentry" in the subdirs list of the current dentries that set_gid()
   walks. On a remount of tracefs, the container_of(dentry->d_inode)
   will dereference a NULL pointer and cause a crash when referenced.

   Simply have a check for dentry->d_inode to see if it is NULL and if
   so, skip that entry.

 - Fix the bits of the eventfs_inode structure.

   The "is_events" bit was taken from the nr_entries field, but the
   nr_entries field wasn't updated to be 30 bits and was still 31.
   Including the "is_freed" bit this would use 33 bits which would make
   the structure use another integer for just one bit.

* tag 'trace-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Fix bitwise fields for "is_events"
  tracefs: Check for dentry->d_inode exists in set_gid()

360f0342

Merge tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefs · 981d0413

Linus Torvalds authored Jan 03, 2024

Pull bcachefs from Kent Overstreet:
 "More bcachefs bugfixes for 6.7, and forwards compatibility work:

   - fix for a nasty extents + snapshot interaction, reported when
     reflink of a snapshotted file wouldn't complete but turned out to
     be a more general bug

   - fix for an invalid free in dio write path when iov vector was
     longer than our inline vector

   - fix for a buffer overflow in the nocow write path -
     BCH_REPLICAS_MAX doesn't actually limit the number of pointers in
     an extent when cached pointers are included

   - RO snapshots are actually RO now

   - And, a new superblock section to avoid future breakage when the
     disk space acounting rewrite rolls out: the new superblock section
     describes versions that need work to downgrade, where the work
     required is a list of recovery passes and errors to silently fix"

* tag 'bcachefs-2024-01-01' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: make RO snapshots actually RO
  bcachefs: bch_sb_field_downgrade
  bcachefs: bch_sb.recovery_passes_required
  bcachefs: Add persistent identifiers for recovery passes
  bcachefs: prt_bitflags_vector()
  bcachefs: move BCH_SB_ERRS() to sb-errors_types.h
  bcachefs: fix buffer overflow in nocow write path
  bcachefs: DARRAY_PREALLOCATED()
  bcachefs: Switch darray to kvmalloc()
  bcachefs: Factor out darray resize slowpath
  bcachefs: fix setting version_upgrade_complete
  bcachefs: fix invalid free in dio write path
  bcachefs: Fix extents iteration + snapshots interaction

981d0413

02 Jan, 2024 2 commits

eventfs: Fix bitwise fields for "is_events" · fd56cd5f

Steven Rostedt (Google) authored Jan 02, 2024

A flag was needed to denote which eventfs_inode was the "events"
directory, so a bit was taken from the "nr_entries" field, as there's not
that many entries, and 2^30 is plenty. But the bit number for nr_entries
was not updated to reflect the bit taken from it, which would add an
unnecessary integer to the structure.

Link: https://lore.kernel.org/linux-trace-kernel/20240102151832.7ca87275@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 7e8358ed ("eventfs: Fix file and directory uid and gid ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

fd56cd5f

tracefs: Check for dentry->d_inode exists in set_gid() · ad579864

Steven Rostedt (Google) authored Jan 02, 2024

If a getdents() is called on the tracefs directory but does not get all
the files, it can leave a "cursor" dentry in the d_subdirs list of tracefs
dentry. This cursor dentry does not have a d_inode for it. Before
referencing tracefs_inode from the dentry, the d_inode must first be
checked if it has content. If not, then it's not a tracefs_inode and can
be ignored.

The following caused a crash:

 #define getdents64(fd, dirp, count) syscall(SYS_getdents64, fd, dirp, count)
 #define BUF_SIZE 256
 #define TDIR "/tmp/file0"

 int main(void)
 {
	char buf[BUF_SIZE];
	int fd;
       	int n;

       	mkdir(TDIR, 0777);
	mount(NULL, TDIR, "tracefs", 0, NULL);
       	fd = openat(AT_FDCWD, TDIR, O_RDONLY);
       	n = getdents64(fd, buf, BUF_SIZE);
       	ret = mount(NULL, TDIR, NULL, MS_NOSUID|MS_REMOUNT|MS_RELATIME|MS_LAZYTIME,
		    "gid=1000");
	return 0;
 }

That's because the 256 BUF_SIZE was not big enough to read all the
dentries of the tracefs file system and it left a "cursor" dentry in the
subdirs of the tracefs root inode. Then on remounting with "gid=1000",
it would cause an iteration of all dentries which hit:

	ti = get_tracefs(dentry->d_inode);
	if (ti && (ti->flags & TRACEFS_EVENT_INODE))
		eventfs_update_gid(dentry, gid);

Which crashed because of the dereference of the cursor dentry which had a NULL
d_inode.

In the subdir loop of the dentry lookup of set_gid(), if a child has a
NULL d_inode, simply skip it.

Link: https://lore.kernel.org/all/20240102135637.3a21fb10@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da244d@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 7e8358ed ("eventfs: Fix file and directory uid and gid ownership")
Reported-by: "Ubisectech Sirius" <bugreport@ubisectech.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ad579864

01 Jan, 2024 13 commits

bcachefs: make RO snapshots actually RO · 0d72ab35

Kent Overstreet authored Dec 29, 2023

Add checks to all the VFS paths for "are we in a RO snapshot?".

Note - we don't check this when setting inode options via our xattr
interface, since those generally only affect data placement, not
contents of data.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>

0d72ab35

bcachefs: bch_sb_field_downgrade · 84f16387

Kent Overstreet authored Dec 29, 2023

Add a new superblock section that contains a list of
  { minor version, recovery passes, errors_to_fix }

that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.

The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

84f16387

bcachefs: bch_sb.recovery_passes_required · 8b16413c

Kent Overstreet authored Dec 29, 2023

Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.

 - recovery_passes_requried: recovery passes that must be run on the
   next mount
 - errors_silent: errors that will be silently fixed

These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8b16413c

bcachefs: Add persistent identifiers for recovery passes · 808c680f

Kent Overstreet authored Dec 29, 2023

The next patch will start to refer to recovery passes from the
superblock; naturally, we now need identifiers that don't change, since
the existing enum is in the order in which they are run and is not
fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

808c680f

bcachefs: prt_bitflags_vector() · 560661d4

Kent Overstreet authored Dec 30, 2023

similar to prt_bitflags(), but for ulong arrays
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

560661d4

bcachefs: move BCH_SB_ERRS() to sb-errors_types.h · 6b49b0f7
Kent Overstreet authored Dec 30, 2023
```
we need BCH_SB_ERR_MAX in bcachefs.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
6b49b0f7

bcachefs: fix buffer overflow in nocow write path · d9534cc9

Kent Overstreet authored Dec 30, 2023

BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an
extent, it's the maximum number of dirty pointers.

We don't have a real restriction on the number of cached pointers, and
we don't want a fixed size array here anyways - so switch to
DARRAY_PREALLOCATED().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>

d9534cc9

bcachefs: DARRAY_PREALLOCATED() · 099dc5c2

Kent Overstreet authored Dec 30, 2023

Add support to darray for preallocating some number of elements.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

099dc5c2

bcachefs: Switch darray to kvmalloc() · a58a6a58

Kent Overstreet authored Nov 16, 2023

We sometimes use darrays for quite large buffers - the btree write
buffer in particular needs large buffers, since it must be sized to hold
all the write buffer keys outstanding in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a58a6a58

bcachefs: Factor out darray resize slowpath · 73ab9e03

Kent Overstreet authored Nov 08, 2023

Move the slowpath (actually growing the darray) to an out-of-line
function; also, add some helpers for the upcoming btree write buffer
rewrite.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

73ab9e03

bcachefs: fix setting version_upgrade_complete · f87bf892

Kent Overstreet authored Dec 29, 2023

If a superblock write hasn't happened (i.e. we never had to go rw), then
c->sb.version will be out of date w.r.t. c->disk_sb.sb->version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f87bf892

bcachefs: fix invalid free in dio write path · f2eb8434

Kent Overstreet authored Dec 30, 2023

turns out iterate_iovec() mutates __iov, we need to save our own copy
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: Marcin Mirosław <marcin@mejor.pl>

f2eb8434

bcachefs: Fix extents iteration + snapshots interaction · fa014953

Kent Overstreet authored Dec 29, 2023

peek_upto() checks against the end position and bails out before
FILTER_SNAPSHOTS checks; this is because if we end up at a different
inode number than the original search key none of the keys we see might
be visibile in the current snapshot - we might be looking at inode in a
completely different subvolume.

But this is broken, because when we're iterating over extents we're
checking against the extent start position to decide when to bail out,
and the extent start position isn't monotonically increasing until after
we've run FILTER_SNAPSHOTS.

Fix this by adding a simple inode number check where the old bailout
check was, and moving the main check to the correct position.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>

fa014953

31 Dec, 2023 3 commits

Linux 6.7-rc8 · 610a9b8f
Linus Torvalds authored Dec 31, 2023

610a9b8f

get_maintainer: remove stray punctuation when cleaning file emails · 2639772a

Alvin Šipraga authored Dec 19, 2023

When parsing emails from .yaml files in particular, stray punctuation
such as a leading '-' can end up in the name.  For example, consider a
common YAML section such as:

  maintainers:
    - devicetree@vger.kernel.org

This would previously be processed by get_maintainer.pl as:

  - <devicetree@vger.kernel.org>

Make the logic in clean_file_emails more robust by deleting any
sub-names which consist of common single punctuation marks before
proceeding to the best-effort name extraction logic.  The output is then
correct:

  devicetree@vger.kernel.org

Some additional comments are added to the function to make things
clearer to future readers.

Link: https://lore.kernel.org/all/0173e76a36b3a9b4e7f324dd3a36fd4a9757f302.camel@perches.com/Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2639772a

get_maintainer: correctly parse UTF-8 encoded names in files · 9c334eb9

Alvin Šipraga authored Dec 19, 2023

While the script correctly extracts UTF-8 encoded names from the
MAINTAINERS file, the regular expressions damage my name when parsing
from .yaml files. Fix this by replacing the Latin-1-compatible regular
expressions with the unicode property matcher \p{L}, which matches on
any letter according to the Unicode General Category of letters.

The proposed solution only works if the script uses proper string
encoding from the outset, so instruct Perl to unconditionally open all
files with UTF-8 encoding. This should be safe, as the entire source
tree is either UTF-8 or ASCII encoded anyway. See [1] for a detailed
analysis.

Furthermore, to prevent the \w expression from matching non-ASCII when
checking for whether a name should be escaped with quotes, add the /a
flag to the regular expression. The escaping logic was duplicated in
two places, so it has been factored out into its own function.

The original issue was also identified on the tools mailing list [2].
This should solve the observed side effects there as well.

Link: https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/ [1]
Link: https://lore.kernel.org/tools/20230726-gush-slouching-a5cd41@meerkat/ [2]
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c334eb9

30 Dec, 2023 5 commits

Merge tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 453f5db0

Linus Torvalds authored Dec 30, 2023

Pull tracing fixes from Steven Rostedt:

 - Fix readers that are blocked on the ring buffer when buffer_percent
   is 100%. They are supposed to wake up when the buffer is full, but
   because the sub-buffer that the writer is on is never considered
   "dirty" in the calculation, dirty pages will never equal nr_pages.
   Add +1 to the dirty count in order to count for the sub-buffer that
   the writer is on.

 - When a reader is blocked on the "snapshot_raw" file, it is to be
   woken up when a snapshot is done and be able to read the snapshot
   buffer. But because the snapshot swaps the buffers (the main one with
   the snapshot one), and the snapshot reader is waiting on the old
   snapshot buffer, it was not woken up (because it is now on the main
   buffer after the swap). Worse yet, when it reads the buffer after a
   snapshot, it's not reading the snapshot buffer, it's reading the live
   active main buffer.

   Fix this by forcing a wakeup of all readers on the snapshot buffer
   when a new snapshot happens, and then update the buffer that the
   reader is reading to be back on the snapshot buffer.

 - Fix the modification of the direct_function hash. There was a race
   when new functions were added to the direct_function hash as when it
   moved function entries from the old hash to the new one, a direct
   function trace could be hit and not see its entry.

   This is fixed by allocating the new hash, copy all the old entries
   onto it as well as the new entries, and then use rcu_assign_pointer()
   to update the new direct_function hash with it.

   This also fixes a memory leak in that code.

 - Fix eventfs ownership

* tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Fix modification of direct_function hash while in use
  tracing: Fix blocked reader of snapshot buffer
  ring-buffer: Fix wake ups when buffer_percent is set to 100
  eventfs: Fix file and directory uid and gid ownership

453f5db0

locking/osq_lock: Clarify osq_wait_next() · b106bcf0

David Laight authored Dec 29, 2023

Directly return NULL or 'next' instead of breaking out of the loop.
Signed-off-by: David Laight <david.laight@aculab.com>
[ Split original patch into two independent parts  - Linus ]
Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b106bcf0

locking/osq_lock: Clarify osq_wait_next() calling convention · 563adbfc

David Laight authored Dec 29, 2023

osq_wait_next() is passed 'prev' from osq_lock() and NULL from
osq_unlock() but only needs the 'cpu' value to write to lock->tail.

Just pass prev->cpu or OSQ_UNLOCKED_VAL instead.

Should have no effect on the generated code since gcc manages to assume
that 'prev != NULL' due to an earlier dereference.
Signed-off-by: David Laight <david.laight@aculab.com>
[ Changed 'old' to 'old_cpu' by request from Waiman Long  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

563adbfc

locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c · 7c223098

David Laight authored Dec 29, 2023

struct optimistic_spin_node is private to the implementation.
Move it into the C file to ensure nothing is accessing it.
Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c223098

ftrace: Fix modification of direct_function hash while in use · d05cb470

Steven Rostedt (Google) authored Dec 29, 2023

Masami Hiramatsu reported a memory leak in register_ftrace_direct() where
if the number of new entries are added is large enough to cause two
allocations in the loop:

        for (i = 0; i < size; i++) {
                hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
                        new = ftrace_add_rec_direct(entry->ip, addr, &free_hash);
                        if (!new)
                                goto out_remove;
                        entry->direct = addr;
                }
        }

Where ftrace_add_rec_direct() has:

        if (ftrace_hash_empty(direct_functions) ||
            direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
                struct ftrace_hash *new_hash;
                int size = ftrace_hash_empty(direct_functions) ? 0 :
                        direct_functions->count + 1;

                if (size < 32)
                        size = 32;

                new_hash = dup_hash(direct_functions, size);
                if (!new_hash)
                        return NULL;

                *free_hash = direct_functions;
                direct_functions = new_hash;
        }

The "*free_hash = direct_functions;" can happen twice, losing the previous
allocation of direct_functions.

But this also exposed a more serious bug.

The modification of direct_functions above is not safe. As
direct_functions can be referenced at any time to find what direct caller
it should call, the time between:

                new_hash = dup_hash(direct_functions, size);
 and
                direct_functions = new_hash;

can have a race with another CPU (or even this one if it gets interrupted),
and the entries being moved to the new hash are not referenced.

That's because the "dup_hash()" is really misnamed and is really a
"move_hash()". It moves the entries from the old hash to the new one.

Now even if that was changed, this code is not proper as direct_functions
should not be updated until the end. That is the best way to handle
function reference changes, and is the way other parts of ftrace handles
this.

The following is done:

 1. Change add_hash_entry() to return the entry it created and inserted
    into the hash, and not just return success or not.

 2. Replace ftrace_add_rec_direct() with add_hash_entry(), and remove
    the former.

 3. Allocate a "new_hash" at the start that is made for holding both the
    new hash entries as well as the existing entries in direct_functions.

 4. Copy (not move) the direct_function entries over to the new_hash.

 5. Copy the entries of the added hash to the new_hash.

 6. If everything succeeds, then use rcu_pointer_assign() to update the
    direct_functions with the new_hash.

This simplifies the code and fixes both the memory leak as well as the
race condition mentioned above.

Link: https://lore.kernel.org/all/170368070504.42064.8960569647118388081.stgit@devnote2/
Link: https://lore.kernel.org/linux-trace-kernel/20231229115134.08dd5174@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 763e34e7 ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

d05cb470

29 Dec, 2023 10 commits

Merge tag 'gpio-fixes-for-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · f016f754

Linus Torvalds authored Dec 29, 2023

Pull gpio fixes from Bartosz Golaszewski:

 - Andy steps down as GPIO reviewer

 - Kent becomes a reviewer for GPIO uAPI

 - add missing intel file to the relevant MAINTAINERS section

* tag 'gpio-fixes-for-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: Add a missing file to the INTEL GPIO section
  MAINTAINERS: Remove Andy from GPIO maintainers
  MAINTAINERS: split out the uAPI into a new section

f016f754

Merge tag 'platform-drivers-x86-v6.7-6' of... · e543d0b5

Linus Torvalds authored Dec 29, 2023

Merge tag 'platform-drivers-x86-v6.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

 - Intel PMC GBE LTR regression

 - P2SB / PCI deadlock fix

* tag 'platform-drivers-x86-v6.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback
  platform/x86/intel/pmc: Allow reenabling LTRs
  platform/x86/intel/pmc: Add suspend callback
  platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe

e543d0b5

Merge tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux · 09c57a76

Linus Torvalds authored Dec 29, 2023

Pull block fixes from Jens Axboe:
 "Fix for a badly numbered flag, and a regression fix for the badblocks
  updates from this merge window"

* tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux:
  block: renumber QUEUE_FLAG_HW_WC
  badblocks: avoid checking invalid range in badblocks_check()

09c57a76

tracing: Fix blocked reader of snapshot buffer · 39a7dc23

Steven Rostedt (Google) authored Dec 28, 2023

If an application blocks on the snapshot or snapshot_raw files, expecting
to be woken up when a snapshot occurs, it will not happen. Or it may
happen with an unexpected result.

That result is that the application will be reading the main buffer
instead of the snapshot buffer. That is because when the snapshot occurs,
the main and snapshot buffers are swapped. But the reader has a descriptor
still pointing to the buffer that it originally connected to.

This is fine for the main buffer readers, as they may be blocked waiting
for a watermark to be hit, and when a snapshot occurs, the data that the
main readers want is now on the snapshot buffer.

But for waiters of the snapshot buffer, they are waiting for an event to
occur that will trigger the snapshot and they can then consume it quickly
to save the snapshot before the next snapshot occurs. But to do this, they
need to read the new snapshot buffer, not the old one that is now
receiving new data.

Also, it does not make sense to have a watermark "buffer_percent" on the
snapshot buffer, as the snapshot buffer is static and does not receive new
data except all at once.

Link: https://lore.kernel.org/linux-trace-kernel/20231228095149.77f5b45d@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

39a7dc23

ring-buffer: Fix wake ups when buffer_percent is set to 100 · 623b1f89

Steven Rostedt (Google) authored Dec 26, 2023

The tracefs file "buffer_percent" is to allow user space to set a
water-mark on how much of the tracing ring buffer needs to be filled in
order to wake up a blocked reader.

 0 - is to wait until any data is in the buffer
 1 - is to wait for 1% of the sub buffers to be filled
 50 - would be half of the sub buffers are filled with data
 100 - is not to wake the waiter until the ring buffer is completely full

Unfortunately the test for being full was:

	dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
	return (dirty * 100) > (full * nr_pages);

Where "full" is the value for "buffer_percent".

There is two issues with the above when full == 100.

1. dirty * 100 > 100 * nr_pages will never be true
   That is, the above is basically saying that if the user sets
   buffer_percent to 100, more pages need to be dirty than exist in the
   ring buffer!

2. The page that the writer is on is never considered dirty, as dirty
   pages are only those that are full. When the writer goes to a new
   sub-buffer, it clears the contents of that sub-buffer.

That is, even if the check was ">=" it would still not be equal as the
most pages that can be considered "dirty" is nr_pages - 1.

To fix this, add one to dirty and use ">=" in the compare.

Link: https://lore.kernel.org/linux-trace-kernel/20231226125902.4a057f1d@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 03329f99 ("tracing: Add tracefs file buffer_percentage")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

623b1f89

platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback · 70681aa0

David E. Box authored Dec 22, 2023

Commit 80495120 ("platform/x86:intel/pmc: Combine core_init() and
core_configure()") caused a network performance regression due to the GBE
LTR ignore that it added at probe. This was needed in order to allow the
SoC to enter the deepest Package C state. To fix the regression and at
least support PC10 during suspend, move the LTR ignore from probe to the
suspend callback, and enable it again on resume. This solution will allow
PC10 during suspend but restrict Package C entry at runtime to no deeper
than PC8/9 while a network cable it attach to the PCH LAN.

Fixes: 80495120 ("platform/x86:intel/pmc: Combine core_init() and core_configure()")
Signed-off-by: "David E. Box" <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20231223032548.1680738-6-david.e.box@linux.intel.comReviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

70681aa0

platform/x86/intel/pmc: Allow reenabling LTRs · 6f9cc5c1

David E. Box authored Dec 22, 2023

Commit 80495120 ("platform/x86:intel/pmc: Combine core_init() and
core_configure()") caused a network performance regression due to the GBE
LTR ignore that it added during probe. The fix will move the ignore to
occur at suspend-time (so as to not affect suspend power). This will
require the ability to enable the LTR again on resume. Modify
pmc_core_send_ltr_ignore() to allow enabling an LTR.

Fixes: 80495120 ("platform/x86:intel/pmc: Combine core_init() and core_configure()")
Signed-off-by: "David E. Box" <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20231223032548.1680738-5-david.e.box@linux.intel.comReviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

6f9cc5c1

platform/x86/intel/pmc: Add suspend callback · 7c13f365

David E. Box authored Dec 22, 2023

Add a suspend callback to struct pmc for performing platform specific tasks
before device suspend. This is needed in order to perform GBE LTR ignore on
certain platforms at suspend-time instead of at probe-time and replace the
GBE LTR ignore removal that was done in order to fix a bug introduced by
commit 80495120 ("platform/x86:intel/pmc: Combine core_init() and
core_configure()").

Fixes: 80495120 ("platform/x86:intel/pmc: Combine core_init() and core_configure()")
Signed-off-by: "David E. Box" <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20231223032548.1680738-4-david.e.box@linux.intel.comReviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

7c13f365

platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe · b28ff7a7

Shin'ichiro Kawasaki authored Dec 29, 2023

p2sb_bar() unhides P2SB device to get resources from the device. It
guards the operation by locking pci_rescan_remove_lock so that parallel
rescans do not find the P2SB device. However, this lock causes deadlock
when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan
locks pci_rescan_remove_lock and probes PCI devices. When PCI devices
call p2sb_bar() during probe, it locks pci_rescan_remove_lock again.
Hence the deadlock.

To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar().
Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources()
for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(),
refer the cache and return to the caller.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes: 9745fb07 ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support")
Cc: stable@vger.kernel.org
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/
Link: https://lore.kernel.org/r/20231229063912.2517922-2-shinichiro.kawasaki@wdc.comSigned-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

b28ff7a7

Merge tag '6.7rc7-smb3-srv-fix' of git://git.samba.org/ksmbd · 8735c7c8

Linus Torvalds authored Dec 28, 2023

Pull ksmbd server fix from Steve French:

 - address possible slab out of bounds in parsing of open requests

* tag '6.7rc7-smb3-srv-fix' of git://git.samba.org/ksmbd:
  ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()

8735c7c8

28 Dec, 2023 5 commits

Merge tag 'kbuild-fixes-v6.7-2' of... · 505e701c

Linus Torvalds authored Dec 28, 2023

Merge tag 'kbuild-fixes-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Revive proper alignment for the ksymtab and kcrctab sections

 - Fix gen_compile_commands.py tool to resolve symbolic links

 - Fix symbolic links to installed debug VDSO files

 - Update MAINTAINERS

* tag 'kbuild-fixes-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  linux/export: Ensure natural alignment of kcrctab array
  kbuild: fix build ID symlinks to installed debug VDSO files
  gen_compile_commands.py: fix path resolve with symlinks in it
  MAINTAINERS: Add scripts/clang-tools to Kbuild section
  linux/export: Fix alignment for 64-bit ksymtab entries

505e701c

Merge tag 'bcachefs-2023-12-27' of https://evilpiepirate.org/git/bcachefs · eeec2599

Linus Torvalds authored Dec 28, 2023

Pull bcachefs fixes from Kent Overstreet:
 "Just a few fixes: besides a few one liners, we have a fix for
  snapshots + compression where the extent update path didn't account
  for the fact that with snapshots, we might split an existing extent
  into three, not just two; and a small fixup for promotes which were
  broken by the recent changes in the data update path to correctly take
  into account device durability"

* tag 'bcachefs-2023-12-27' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix promotes
  bcachefs: Fix leakage of internal error code
  bcachefs: Fix insufficient disk reservation with compression + snapshots
  bcachefs: fix BCH_FSCK_ERR enum

eeec2599

linux/export: Ensure natural alignment of kcrctab array · 753547de

Helge Deller authored Dec 28, 2023

The ___kcrctab section holds an array of 32-bit CRC values.
Add a .balign 4 to tell the linker the correct memory alignment.

Fixes: f3304ecd ("linux/export: use inline assembler to populate symbol CRCs")
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

753547de

ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16() · d10c7787

Namjae Jeon authored Dec 20, 2023

If ->NameOffset/Length is bigger than ->CreateContextsOffset/Length,
ksmbd_check_message doesn't validate request buffer it correctly.
So slab-out-of-bounds warning from calling smb_strndup_from_utf16()
in smb2_open() could happen. If ->NameLength is non-zero, Set the larger
of the two sums (Name and CreateContext size) as the offset and length of
the data area.
Reported-by: Yang Chaoming <lometsj@live.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

d10c7787

Merge tag 'mm-hotfixes-stable-2023-12-27-15-00' of... · f5837722

Linus Torvalds authored Dec 27, 2023

Merge tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "11 hotfixes. 7 are cc:stable and the other 4 address post-6.6 issues
  or are not considered backporting material"

* tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: add an old address for Naoya Horiguchi
  mm/memory-failure: cast index to loff_t before shifting it
  mm/memory-failure: check the mapcount of the precise page
  mm/memory-failure: pass the folio and the page to collect_procs()
  selftests: secretmem: floor the memory size to the multiple of page_size
  mm: migrate high-order folios in swap cache correctly
  maple_tree: do not preallocate nodes for slot stores
  mm/filemap: avoid buffered read/write race to read inconsistent data
  kunit: kasan_test: disable fortify string checker on kmalloc_oob_memset
  kexec: select CRYPTO from KEXEC_FILE instead of depending on it
  kexec: fix KEXEC_FILE dependencies

f5837722