Commits · 21ca59b365c091d583f36ac753eaa8baf947be6f · Kirill Smelkov / linux

11 Oct, 2023 2 commits

binfmt_misc: enable sandboxed mounts · 21ca59b3

Christian Brauner authored Oct 28, 2021

Enable unprivileged sandboxes to create their own binfmt_misc mounts.
This is based on Laurent's work in [1] but has been significantly
reworked to fix various issues we identified in earlier versions.

While binfmt_misc can currently only be mounted in the initial user
namespace, binary types registered in this binfmt_misc instance are
available to all sandboxes (Either by having them installed in the
sandbox or by registering the binary type with the F flag causing the
interpreter to be opened right away). So binfmt_misc binary types are
already delegated to sandboxes implicitly.

However, while a sandbox has access to all registered binary types in
binfmt_misc a sandbox cannot currently register its own binary types
in binfmt_misc. This has prevented various use-cases some of which were
already outlined in [1] but we have a range of issues associated with
this (cf. [3]-[5] below which are just a small sample).

Extend binfmt_misc to be mountable in non-initial user namespaces.
Similar to other filesystem such as nfsd, mqueue, and sunrpc we use
keyed superblock management. The key determines whether we need to
create a new superblock or can reuse an already existing one. We use the
user namespace of the mount as key. This means a new binfmt_misc
superblock is created once per user namespace creation. Subsequent
mounts of binfmt_misc in the same user namespace will mount the same
binfmt_misc instance. We explicitly do not create a new binfmt_misc
superblock on every binfmt_misc mount as the semantics for
load_misc_binary() line up with the keying model. This also allows us to
retrieve the relevant binfmt_misc instance based on the caller's user
namespace which can be done in a simple (bounded to 32 levels) loop.

Similar to the current binfmt_misc semantics allowing access to the
binary types in the initial binfmt_misc instance we do allow sandboxes
access to their parent's binfmt_misc mounts if they do not have created
a separate binfmt_misc instance.

Overall, this will unblock the use-cases mentioned below and in general
will also allow to support and harden execution of another
architecture's binaries in tight sandboxes. For instance, using the
unshare binary it possible to start a chroot of another architecture and
configure the binfmt_misc interpreter without being root to run the
binaries in this chroot and without requiring the host to modify its
binary type handlers.

Henning had already posted a few experiments in the cover letter at [1].
But here's an additional example where an unprivileged container
registers qemu-user-static binary handlers for various binary types in
its separate binfmt_misc mount and is then seamlessly able to start
containers with a different architecture without affecting the host:

root    [lxc monitor] /var/snap/lxd/common/lxd/containers f1
1000000  \_ /sbin/init
1000000      \_ /lib/systemd/systemd-journald
1000000      \_ /lib/systemd/systemd-udevd
1000100      \_ /lib/systemd/systemd-networkd
1000101      \_ /lib/systemd/systemd-resolved
1000000      \_ /usr/sbin/cron -f
1000103      \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1000000      \_ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1000104      \_ /usr/sbin/rsyslogd -n -iNONE
1000000      \_ /lib/systemd/systemd-logind
1000000      \_ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
1000107      \_ dnsmasq --conf-file=/dev/null -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --liste
1000000      \_ [lxc monitor] /var/lib/lxc f1-s390x
1100000          \_ /usr/bin/qemu-s390x-static /sbin/init
1100000              \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-journald
1100000              \_ /usr/bin/qemu-s390x-static /usr/sbin/cron -f
1100103              \_ /usr/bin/qemu-s390x-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-ac
1100000              \_ /usr/bin/qemu-s390x-static /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1100104              \_ /usr/bin/qemu-s390x-static /usr/sbin/rsyslogd -n -iNONE
1100000              \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-logind
1100000              \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
1100000              \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/0 115200,38400,9600 vt220
1100000              \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/1 115200,38400,9600 vt220
1100000              \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/2 115200,38400,9600 vt220
1100000              \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/3 115200,38400,9600 vt220
1100000              \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-udevd

[1]: https://lore.kernel.org/all/20191216091220.465626-1-laurent@vivier.eu
[2]: https://discuss.linuxcontainers.org/t/binfmt-misc-permission-denied
[3]: https://discuss.linuxcontainers.org/t/lxd-binfmt-support-for-qemu-static-interpreters
[4]: https://discuss.linuxcontainers.org/t/3-1-0-binfmt-support-service-in-unprivileged-guest-requires-write-access-on-hosts-proc-sys-fs-binfmt-misc
[5]: https://discuss.linuxcontainers.org/t/qemu-user-static-not-working-4-11

Link: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu (origin)
Link: https://lore.kernel.org/r/20211028103114.2849140-2-brauner@kernel.org (v1)
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jann Horn <jannh@google.com>
Cc: Henning Schild <henning.schild@siemens.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
/* v2 */
- Serge Hallyn <serge@hallyn.com>:
  - Use GFP_KERNEL_ACCOUNT for userspace triggered allocations when a
    new binary type handler is registered.
- Christian Brauner <christian.brauner@ubuntu.com>:
  - Switch authorship to me. I refused to do that earlier even though
    Laurent said I should do so because I think it's genuinely bad form.
    But by now I have changed so many things that it'd be unfair to
    blame Laurent for any potential bugs in here.
  - Add more comments that explain what's going on.
  - Rename functions while changing them to better reflect what they are
    doing to make the code easier to understand.
  - In the first version when a specific binary type handler was removed
    either through a write to the entry's file or all binary type
    handlers were removed by a write to the binfmt_misc mount's status
    file all cleanup work happened during inode eviction.
    That includes removal of the relevant entries from entry list. While
    that works fine I disliked that model after thinking about it for a
    bit. Because it means that there was a window were someone has
    already removed a or all binary handlers but they could still be
    safely reached from load_misc_binary() when it has managed to take
    the read_lock() on the entries list while inode eviction was already
    happening. Again, that perfectly benign but it's cleaner to remove
    the binary handler from the list immediately meaning that ones the
    write to then entry's file or the binfmt_misc status file returns
    the binary type cannot be executed anymore. That gives stronger
    guarantees to the user.

21ca59b3

binfmt_misc: cleanup on filesystem umount · 1c5976ef

Christian Brauner authored Oct 28, 2021

Currently, registering a new binary type pins the binfmt_misc
filesystem. Specifically, this means that as long as there is at least
one binary type registered the binfmt_misc filesystem survives all
umounts, i.e. the superblock is not destroyed. Meaning that a umount
followed by another mount will end up with the same superblock and the
same binary type handlers. This is a behavior we tend to discourage for
any new filesystems (apart from a few special filesystems such as e.g.
configfs or debugfs). A umount operation without the filesystem being
pinned - by e.g. someone holding a file descriptor to an open file -
should usually result in the destruction of the superblock and all
associated resources. This makes introspection easier and leads to
clearly defined, simple and clean semantics. An administrator can rely
on the fact that a umount will guarantee a clean slate making it
possible to reinitialize a filesystem. Right now all binary types would
need to be explicitly deleted before that can happen.

This allows us to remove the heavy-handed calls to simple_pin_fs() and
simple_release_fs() when creating and deleting binary types. This in
turn allows us to replace the current brittle pinning mechanism abusing
dget() which has caused a range of bugs judging from prior fixes in [2]
and [3]. The additional dget() in load_misc_binary() pins the dentry but
only does so for the sake to prevent ->evict_inode() from freeing the
node when a user removes the binary type and kill_node() is run. Which
would mean ->interpreter and ->interp_file would be freed causing a UAF.

This isn't really nicely documented nor is it very clean because it
relies on simple_pin_fs() pinning the filesystem as long as at least one
binary type exists. Otherwise it would cause load_misc_binary() to hold
on to a dentry belonging to a superblock that has been shutdown.
Replace that implicit pinning with a clean and simple per-node refcount
and get rid of the ugly dget() pinning. A similar mechanism exists for
e.g. binderfs (cf. [4]). All the cleanup work can now be done in
->evict_inode().

In a follow-up patch we will make it possible to use binfmt_misc in
sandboxes. We will use the cleaner semantics where a umount for the
filesystem will cause the superblock and all resources to be
deallocated. In preparation for this apply the same semantics to the
initial binfmt_misc mount. Note, that this is a user-visible change and
as such a uapi change but one that we can reasonably risk. We've
discussed this in earlier versions of this patchset (cf. [1]).

The main user and provider of binfmt_misc is systemd. Systemd provides
binfmt_misc via autofs since it is configurable as a kernel module and
is used by a few exotic packages and users. As such a binfmt_misc mount
is triggered when /proc/sys/fs/binfmt_misc is accessed and is only
provided on demand. Other autofs on demand filesystems include EFI ESP
which systemd umounts if the mountpoint stays idle for a certain amount
of time. This doesn't apply to the binfmt_misc autofs mount which isn't
touched once it is mounted meaning this change can't accidently wipe
binary type handlers without someone having explicitly unmounted
binfmt_misc. After speaking to systemd folks they don't expect this
change to affect them.

In line with our general policy, if we see a regression for systemd or
other users with this change we will switch back to the old behavior for
the initial binfmt_misc mount and have binary types pin the filesystem
again. But while we touch this code let's take the chance and let's
improve on the status quo.

[1]: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu
[2]: commit 43a4f261 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()"
[3]: commit 83f91827 ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()")
[4]: commit f0fe2c0f ("binder: prevent UAF for binderfs devices II")

Link: https://lore.kernel.org/r/20211028103114.2849140-1-brauner@kernel.org (v1)
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jann Horn <jannh@google.com>
Cc: Henning Schild <henning.schild@siemens.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: linux-fsdevel@vger.kernel.org
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
/* v2 */
- Christian Brauner <christian.brauner@ubuntu.com>:
  - Add more comments that explain what's going on.
  - Rename functions while changing them to better reflect what they are
    doing to make the code easier to understand.
  - In the first version when a specific binary type handler was removed
    either through a write to the entry's file or all binary type
    handlers were removed by a write to the binfmt_misc mount's status
    file all cleanup work happened during inode eviction.
    That includes removal of the relevant entries from entry list. While
    that works fine I disliked that model after thinking about it for a
    bit. Because it means that there was a window were someone has
    already removed a or all binary handlers but they could still be
    safely reached from load_misc_binary() when it has managed to take
    the read_lock() on the entries list while inode eviction was already
    happening. Again, that perfectly benign but it's cleaner to remove
    the binary handler from the list immediately meaning that ones the
    write to then entry's file or the binfmt_misc status file returns
    the binary type cannot be executed anymore. That gives stronger
    guarantees to the user.

1c5976ef

04 Oct, 2023 4 commits

binfmt_elf_fdpic: clean up debug warnings · 553e41d1

Greg Ungerer authored Sep 27, 2023

The binfmt_elf_fdpic loader has some debug trace that can be enabled at
build time. The recent 64-bit additions cause some warnings if that
debug is enabled, such as:

    fs/binfmt_elf_fdpic.c: In function ‘elf_fdpic_map_file’:
    fs/binfmt_elf_fdpic.c:46:33: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘Elf64_Addr’ {aka ‘long long unsigned int’} [-Wformat=]
       46 | #define kdebug(fmt, ...) printk("FDPIC "fmt"\n" ,##__VA_ARGS__ )
          |                                 ^~~~~~~~
    ./include/linux/printk.h:427:25: note: in definition of macro ‘printk_index_wrap’
      427 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
          |                         ^~~~

Cast values to the largest possible type (which is equivilent to unsigned
long long in this case) and use appropriate format specifiers to match.

Fixes: b922bf04 ("binfmt_elf_fdpic: support 64-bit systems")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Link: https://lore.kernel.org/r/20230927132933.3290734-1-gerg@kernel.orgSigned-off-by: Kees Cook <keescook@chromium.org>

553e41d1

mm: Remove unused vm_brk() · 2632bb84

Kees Cook authored Sep 28, 2023

With fs/binfmt_elf.c fully refactored to use the new elf_load() helper,
there are no more users of vm_brk(), so remove it.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20230929032435.2391507-6-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>

2632bb84

binfmt_elf: Only report padzero() errors when PROT_WRITE · f9c0a39d

Kees Cook authored Sep 28, 2023

Errors with padzero() should be caught unless we're expecting a
pathological (non-writable) segment. Report -EFAULT only when PROT_WRITE
is present.

Additionally add some more documentation to padzero(), elf_map(), and
elf_load().

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20230929032435.2391507-5-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>

f9c0a39d

binfmt_elf: Use elf_load() for library · d5ca24f6

Kees Cook authored Sep 28, 2023

While load_elf_library() is a libc5-ism, we can still replace most of
its contents with elf_load() as well, further simplifying the code.

Some historical context:
- libc4 was a.out and used uselib (a.out support has been removed)
- libc5 was ELF and used uselib (there may still be users)
- libc6 is ELF and has never used uselib

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20230929032435.2391507-4-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>

d5ca24f6

29 Sep, 2023 3 commits

binfmt_elf: Use elf_load() for interpreter · 8b04d326

Kees Cook authored Sep 28, 2023

Handle arbitrary memsz>filesz in interpreter ELF segments, instead of
only supporting it in the last segment (which is expected to be the
BSS).

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Closes: https://lore.kernel.org/lkml/20221106021657.1145519-1-pedro.falcato@gmail.com/Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20230929032435.2391507-3-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>

8b04d326

binfmt_elf: elf_bss no longer used by load_elf_binary() · 8ed2ef21

Kees Cook authored Sep 28, 2023

With the BSS handled generically via the new filesz/memsz mismatch
handling logic in elf_load(), elf_bss no longer needs to be tracked.
Drop the variable.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20230929032435.2391507-2-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>

8ed2ef21

binfmt_elf: Support segments with 0 filesz and misaligned starts · 585a0186

Eric W. Biederman authored Sep 28, 2023

Implement a helper elf_load() that wraps elf_map() and performs all
of the necessary work to ensure that when "memsz > filesz" the bytes
described by "memsz > filesz" are zeroed.

An outstanding issue is if the first segment has filesz 0, and has a
randomized location. But that is the same as today.

In this change I replaced an open coded padzero() that did not clear
all of the way to the end of the page, with padzero() that does.

I also stopped checking the return of padzero() as there is at least
one known case where testing for failure is the wrong thing to do.
It looks like binfmt_elf_fdpic may have the proper set of tests
for when error handling can be safely completed.

I found a couple of commits in the old history
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git,
that look very interesting in understanding this code.

commit 39b56d90 ("[PATCH] binfmt_elf: clearing bss may fail")
commit c6e2227e ("[SPARC64]: Missing user access return value checks in fs/binfmt_elf.c and fs/compat.c")
commit 5bf3be03 ("v2.4.10.1 -> v2.4.10.2")

Looking at commit 39b56d90 ("[PATCH] binfmt_elf: clearing bss may fail"):
>  commit 39b56d90
>  Author: Pavel Machek <pavel@ucw.cz>
>  Date:   Wed Feb 9 22:40:30 2005 -0800
>
>     [PATCH] binfmt_elf: clearing bss may fail
>
>     So we discover that Borland's Kylix application builder emits weird elf
>     files which describe a non-writeable bss segment.
>
>     So remove the clear_user() check at the place where we zero out the bss.  I
>     don't _think_ there are any security implications here (plus we've never
>     checked that clear_user() return value, so whoops if it is a problem).
>
>     Signed-off-by: Pavel Machek <pavel@suse.cz>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>

It seems pretty clear that binfmt_elf_fdpic with skipping clear_user() for
non-writable segments and otherwise calling clear_user(), aka padzero(),
and checking it's return code is the right thing to do.

I just skipped the error checking as that avoids breaking things.

And notably, it looks like Borland's Kylix died in 2005 so it might be
safe to just consider read-only segments with memsz > filesz an error.
Reported-by: Sebastian Ott <sebott@redhat.com>
Reported-by: Thomas Weißschuh <linux@weissschuh.net>
Closes: https://lkml.kernel.org/r/20230914-bss-alloc-v1-1-78de67d2c6dd@weissschuh.netSigned-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/r/87sf71f123.fsf@email.froward.int.ebiederm.orgTested-by: Pedro Falcato <pedro.falcato@gmail.com>
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20230929032435.2391507-1-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>

585a0186

25 Sep, 2023 1 commit

elf, uapi: Remove struct tag 'dynamic' · ff7a6549

Alejandro Colomar authored Aug 29, 2023

Such a generic struct tag shouldn't have been exposed in a public
header.  Since it's undocumented, we can assume it's a historical
accident.  And since no software (at least on Debian) relies on this
tag, we can safely remove it.

Here are the results of a Debian Code Search[1]:

$ # packages that contain 'include [<"]linux/elf\.h[">]'
$ curl -s https://codesearch.debian.net/results/e5e7c74dfcdae609/packages.txt > include
$ # packages that contain '\bstruct dynamic\b'
$ curl -s https://codesearch.debian.net/results/b23577e099048c6a/packages.txt > struct
$ cat struct include | sort | uniq -d
chromium
hurd
linux
qemu
qt6-webengine
qtwebengine-opensource-src
$ # chromium: Seems to hold a copy of the UAPI header.  No uses of the tag.
$ # hurd:     Same thing as chromium.
$ # linux:    :)
$ # qemu:     Same thing as chromium.
$ # qt6-webengine:  Same thing as all.
$ # qtwebengine-opensource-src:  Yet another copy.

Link: https://codesearch.debian.net/ [1]
Link: https://lore.kernel.org/linux-mm/87wmxdokum.fsf@email.froward.int.ebiederm.org/T/
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Rolf Eike Beer <eb@emlix.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>

ff7a6549

17 Sep, 2023 11 commits

Linux 6.6-rc2 · ce9ecca0
Linus Torvalds authored Sep 17, 2023

ce9ecca0

Merge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7892864

Linus Torvalds authored Sep 17, 2023

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - Fix an UV boot crash

   - Skip spurious ENDBR generation on _THIS_IP_

   - Fix ENDBR use in putuser() asm methods

   - Fix corner case boot crashes on 5-level paging

   - and fix a false positive WARNING on LTO kernels"

* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory: Remove LTO flags
  x86/boot/compressed: Reserve more memory for page tables
  x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
  x86/ibt: Suppress spurious ENDBR
  x86/platform/uv: Use alternate source for socket to node data

e7892864

Merge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e5a710d1

Linus Torvalds authored Sep 17, 2023

Pull scheduler fixes from Ingo Molnar:
 "Fix a performance regression on large SMT systems, an Intel SMT4
  balancing bug, and a topology setup bug on (Intel) hybrid processors"

* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
  sched/fair: Fix SMT4 group_smt_balance handling
  sched/fair: Optimize should_we_balance() for large SMT systems

e5a710d1

Merge tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e54ca3c8

Linus Torvalds authored Sep 17, 2023

Pull objtool fix from Ingo Molnar:
 "Fix a cold functions related false-positive objtool warning that
  triggers on Clang"

* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix _THIS_IP_ detection for cold functions

e54ca3c8

Merge tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 99a73f9e

Linus Torvalds authored Sep 17, 2023

Pull WARN fix from Ingo Molnar:
 "Fix a missing preempt-enable in the WARN() slowpath"

* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  panic: Reenable preemption in WARN slowpath

99a73f9e

stat: remove no-longer-used helper macros · 42aadec8

Linus Torvalds authored Sep 03, 2023

The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971 ("vfs: de-crapify "cp_new_stat()" function").

Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place.  So commit 932aba1e ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.

But the helpers remained around, unused.

Get rid of them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

42aadec8

Merge tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 45c3c627

Linus Torvalds authored Sep 17, 2023

Pull smb client fixes from Steve French:
 "Three small SMB3 client fixes, one to improve a null check and two
  minor cleanups"

* tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix some minor typos and repeated words
  smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
  smb3: move server check earlier when setting channel sequence number

45c3c627

Merge tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd · 39e0c8af

Linus Torvalds authored Sep 17, 2023

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes"

* tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd:
  ksmbd: fix passing freed memory 'aux_payload_buf'
  ksmbd: remove unneeded mark_inode_dirty in set_info_sec()

39e0c8af

Merge tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 3fde3003

Linus Torvalds authored Sep 17, 2023

Pull ext4 fixes from Ted Ts'o:
 "Regression and bug fixes for ext4"

* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix rec_len verify error
  ext4: do not let fstrim block system suspend
  ext4: move setting of trimmed bit into ext4_try_to_trim_range()
  jbd2: Fix memory leak in journal_init_common()
  jbd2: Remove page size assumptions
  buffer: Make bh_offset() work for compound pages

3fde3003

x86/purgatory: Remove LTO flags · 75b2f7e4

Song Liu authored Sep 14, 2023

-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:

  $ readelf -S purgatory.ro  | grep " .text"
    [ 1] .text             PROGBITS         0000000000000000  00000040
    [ 7] .text.purgatory   PROGBITS         0000000000000000  000020e0
    [ 9] .text.warn        PROGBITS         0000000000000000  000021c0
    [13] .text.sha256_upda PROGBITS         0000000000000000  000022f0
    [15] .text.sha224_upda PROGBITS         0000000000000000  00002be0
    [17] .text.sha256_fina PROGBITS         0000000000000000  00002bf0
    [19] .text.sha224_fina PROGBITS         0000000000000000  00002cc0

This causes WARNING from kexec_purgatory_setup_sechdrs():

  WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
  kexec_load_purgatory+0x37f/0x390

Fix this by disabling LTO for purgatory.

[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]

We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.

Fixes: b33fff07 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org

75b2f7e4

x86/boot/compressed: Reserve more memory for page tables · f530ee95

Kirill A. Shutemov authored Sep 15, 2023

The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.

The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.

In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.

Update wost-case calculations to include 5-level paging.

To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.

[ Also add a warning, should this memory run low - by Dave Hansen. ]

Fixes: 34bbb000 ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com

f530ee95

16 Sep, 2023 12 commits

Merge tag 'kbuild-fixes-v6.6' of... · f0b0d403

Linus Torvalds authored Sep 16, 2023

Merge tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix kernel-devel RPM and linux-headers Deb package

 - Fix too long argument list error in 'make modules_install'

* tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: avoid long argument lists in make modules_install
  kbuild: fix kernel-devel RPM package and linux-headers Deb package

f0b0d403

vm: fix move_vma() memory accounting being off · 3cec5049

Linus Torvalds authored Sep 16, 2023

Commit 408579cd ("mm: Update do_vmi_align_munmap() return
semantics") seems to have updated one of the callers of do_vmi_munmap()
incorrectly: it used to check for the error case (which didn't
change: negative means error).

That commit changed the check to the success case (which did change:
before that commit, 0 was success, and 1 was "success and lock
downgraded". After the change, it's always 0 for success, and the lock
will have been released if requested).

This didn't change any actual VM behavior _except_ for memory accounting
when 'VM_ACCOUNT' was set on the vma. Which made the wrong return value
test fairly subtle, since everything continues to work.

Or rather - it continues to work but the "Committed memory" accounting
goes all wonky (Committed_AS value in /proc/meminfo), and depending on
settings that then causes problems much much later as the VM relies on
bogus statistics for its heuristics.

Revert that one line of the change back to the original logic.

Fixes: 408579cd ("mm: Update do_vmi_align_munmap() return semantics")
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3cec5049

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ad8a69f3

Linus Torvalds authored Sep 16, 2023

Pull SCSI fixes from James Bottomley:
"16 small(ish) fixes all in drivers.

The major fixes are in pm8001 (fixes MSI-X issue going back to its
origin), the qla2xxx endianness fix, which fixes a bug on big endian
and the lpfc ones which can cause an oops on module removal without
them"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
scsi: target: core: Fix target_cmd_counter leak
scsi: pm8001: Setup IRQs on resume
scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
scsi: qedf: Add synchronization between I/O completions and abort
scsi: target: Replace strlcpy() with strscpy()
scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
scsi: qla2xxx: Correct endianness for rqstlen and rsplen
scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
scsi: megaraid_sas: Fix deadlock on firmware crashdump

ad8a69f3

Merge tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · cc3e5afc

Linus Torvalds authored Sep 16, 2023

Pull ata fixes from Damien Le Moal:

 - Fix link power management transitions to disallow unsupported states
   (Niklas)

 - A small string handling fix for the sata_mv driver (Christophe)

 - Clear port pending interrupts before reset, as per AHCI
   specifications (Szuying).

   Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
   ata_eh_reset() to allow EH to continue on with other actions recorded
   with error interrupts triggered before EH completes. And an
   additional fix to avoid thawing a port twice in EH (Niklas)

 - Small code style fixes in the pata_parport driver to silence the
   build bot as it keeps complaining about bad indentation (me)

 - A fix for the recent CDL code to avoid fetching sense data for
   successful commands when not necessary for correct operation (Niklas)

* tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: fetch sense data for successful commands iff CDL enabled
  ata: libata-eh: do not thaw the port twice in ata_eh_reset()
  ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
  ata: pata_parport: Fix code style issues
  ata: libahci: clear pending interrupt status
  ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
  ata: libata: disallow dev-initiated LPM transitions to unsupported states

cc3e5afc

Merge tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cce67b6b

Linus Torvalds authored Sep 16, 2023

Pull USB fix from Greg KH:
 "Here is a single USB fix for a much-reported regression for 6.6-rc1.

  It resolves a crash in the typec debugfs code for many systems. It's
  been in linux-next with no reported issues, and many people have
  reported it resolving their problem with 6.6-rc1"

* tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix NULL pointer dereference

cce67b6b

Merge tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 205d0494

Linus Torvalds authored Sep 16, 2023

Pull driver core fixes from Greg KH:
 "Here is a single driver core fix for a much-reported-by-sysbot issue
  that showed up in 6.6-rc1. It's been submitted by many people, all in
  the same way, so it obviously fixes things for them all.

  Also in here is a single documentation update adding riscv to the
  embargoed hardware document in case there are any future issues with
  that processor family.

  Both of these have been in linux-next with no reported problems"

* tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
  driver core: return an error when dev_set_name() hasn't happened

205d0494

Merge tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · fd455e77

Linus Torvalds authored Sep 16, 2023

Pull char/misc fix from Greg KH:
 "Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
  comedi subsystem that has ended up being incorrect and caused drivers
  that were working for people to be unable to be able to be selected to
  build at all.

  To fix this, the Kconfig change needs to be reverted and a future set
  of fixes for the ioport dependancies will show up in 6.7-rc1 (there's
  no rush for them.)

  This has been in linux-next with no reported issues"

* tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "comedi: add HAS_IOPORT dependencies"

fd455e77

Merge tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · c37f8efc

Linus Torvalds authored Sep 16, 2023

Pull i2c fixes from Wolfram Sang:
 "The main thing is the removal of 'probe_new' because all i2c client
  drivers are converted now. Thanks Uwe, this marks the end of a long
  conversion process.

  Other than that, we have a few Kconfig updates and driver bugfixes"

* tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: cadence: Fix the kernel-doc warnings
  i2c: aspeed: Reset the i2c controller when timeout occurs
  i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
  i2c: Make I2C_ATR invisible
  i2c: Drop legacy callback .probe_new()
  w1: ds2482: Switch back to use struct i2c_driver's .probe()

c37f8efc

ata: libata-core: fetch sense data for successful commands iff CDL enabled · 5e35a9ac

Niklas Cassel authored Sep 13, 2023

Currently, we fetch sense data for a _successful_ command if either:
1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
sense data supported bit is set); or
2) Command was non-NCQ and regular sense data reporting is enabled.

This means that case 2) will trigger for a non-NCQ command which has
ATA_SENSE bit set, regardless if CDL is enabled or not.

This decision was by design. If the device reports that it has sense data
available, it makes sense to fetch that sense data, since the sk/asc/ascq
could be important information regardless if CDL is enabled or not.

However, the fetching of sense data for a successful command is done via
ATA EH. Considering how intricate the ATA EH is, we really do not want to
invoke ATA EH unless absolutely needed.

Before commit 18bd7718 ("scsi: ata: libata: Handle completion of CDL
commands using policy 0xD") we never fetched sense data for successful
commands.

In order to not invoke the ATA EH unless absolutely necessary, even if the
device claims support for sense data reporting, only fetch sense data for
successful (NCQ and non-NCQ commands) commands that are using CDL.

[Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
instead of the device support for CDL, which is implied for commands
using CDL.

Fixes: 3ac873c7 ("ata: libata-core: fix when to fetch sense data for successful commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

5e35a9ac

ata: libata-eh: do not thaw the port twice in ata_eh_reset() · 7a3bc2b3

Niklas Cassel authored Sep 14, 2023

commit 1e641060 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.

Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/

He even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/

While removing the workaround solved the issue, however, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), so these workarounds can now be removed.

Since commit 1e641060 ("libata: clear eh_info on reset completion") has
now been reverted, the ATA EH retry mechanism is functional again, so there
is once again no need to thaw the port more than once in ata_eh_reset().

This reverts "the workaround on top of the workaround" introduced in commit
8c56cacc ("libata: fix unexpectedly frozen port after ata_eh_reset()").
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

7a3bc2b3

ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() · 80cc944e

Niklas Cassel authored Sep 14, 2023

ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).

If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.

Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.

The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().

ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.

The additional clearing in ata_eh_reset() was introduced in commit
1e641060 ("libata: clear eh_info on reset completion").

Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2

We can see the following happening:
[    1.074659] ata3: XXX port freeze
[    1.074700] ata3: XXX hardresetting link, stopping engine
[    1.074746] ata3: XXX flipping SControl

[    1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[    1.411475] ata3: XXX port freeze

[    1.420049] ata3: XXX starting engine
[    1.420096] ata3: XXX rc=0, class=1
[    1.420142] ata3: XXX clearing IRQs for thawing
[    1.420188] ata3: XXX port thawed
[    1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."

Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.

So IS.IPS(x) only gets set if PxIS and PxIE is set.

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."

So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.

Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.

However, since commit 737dd811 ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060 ("libata:
clear eh_info on reset completion") fixed can no longer happen.

Thus, revert commit 1e641060 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

80cc944e

Merge tag 'linux-kselftest-fixes-6.6-rc2' of... · 57d88e8a

Linus Torvalds authored Sep 15, 2023

Merge tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull more kselftest fixes from Shuah Khan
 "Fixes to user_events test and ftrace test.

  The user_events test was enabled by default in Linux 6.6-rc1. The
  following fixes are for bugs found since then:

   - add checks for dependencies and skip the test if they aren't met.

     The user_events test requires root access, and tracefs and
     user_events enabled. It leaves tracefs mounted and a fix is in
     progress for that missing piece.

   - create user_events test-specific Kconfig fragments

  ftrace test fixes:

   - unmount tracefs for recovering environment. Fix identified during
     the above mentioned user_events dependencies fix.

   - adds softlink to latest log directory improving usage"

* tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: tracing: Fix to unmount tracefs for recovering environment
  selftests: user_events: create test-specific Kconfig fragments
  ftrace/selftests: Add softlink to latest log directory
  selftests/user_events: Fix failures when user_events is not installed

57d88e8a

15 Sep, 2023 7 commits

Merge tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · d8d7cd65

Linus Torvalds authored Sep 15, 2023

Pull nfsd fixes from Chuck Lever:

 - Use correct order when encoding NFSv4 RENAME change_info

 - Fix a potential oops during NFSD shutdown

* tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix possible oops when nfsd/pool_stats is closed.
  nfsd: fix change_info in NFSv4 RENAME replies

d8d7cd65

Merge tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4eb2bd24

Linus Torvalds authored Sep 15, 2023

Pull power management fixes from Rafael Wysocki:
 "Fix the handling of block devices in the test_resume mode of
  hibernation (Chen Yu)"

* tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: Fix the exclusive get block device in test_resume mode
  PM: hibernate: Rename function parameter from snapshot_test to exclusive

4eb2bd24

Merge tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e2dd7a16

Linus Torvalds authored Sep 15, 2023

Pull thermal control fixes from Rafael Wysocki:
 "These fix a thermal core breakage introduced by one of the recent
  changes, amend those changes by adding 'const' to a new callback
  argument and fix two memory leaks.

  Specifics:

   - Unbreak disabled trip point check in handle_thermal_trip() that may
     cause it to skip enabled trip points (Rafael Wysocki)

   - Add missing of_node_put() to of_find_trip_id() and
     thermal_of_for_each_cooling_maps() that each break out of a
     for_each_child_of_node() loop without dropping the reference to the
     child object (Julia Lawall)

   - Constify the recently added trip argument of the .get_trend()
     thermal zone callback (Rafael Wysocki)"

* tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Fix disabled trip point check in handle_thermal_trip()
  thermal: Constify the trip argument of the .get_trend() zone callback
  thermal/of: add missing of_node_put()

e2dd7a16

Merge tag 'for-6.6/dm-fixes' of... · e39bfb59

Linus Torvalds authored Sep 15, 2023

Merge tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM core retrieve_deps() UAF race due to missing locking of a DM
   table's list of devices that is managed using dm_{get,put}_device.

 - Revert DM core's half-baked RCU optimization if IO submitter has set
   REQ_NOWAIT. Can be revisited, and properly justified, after
   comprehensively auditing all of DM to also pass GFP_NOWAIT for any
   allocations if REQ_NOWAIT used.

* tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: don't attempt to queue IO under RCU protection
  dm: fix a race condition in retrieve_deps

e39bfb59

Merge tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux · 5bc357b2

Linus Torvalds authored Sep 15, 2023

Pull block fixes from Jens Axboe:

 - NVMe pull via Keith:
      - nvme-tcp iov len fix (Varun)
      - nvme-hwmon const qualifier for safety (Krzysztof)
      - nvme-fc null pointer checks (Nigel)
      - nvme-pci no numa node fix (Pratyush)
      - nvme timeout fix for non-compliant controllers (Keith)

 - MD pull via Song fixing regressions with both 6.5 and 6.6

 - Fix a use-after-free regression in resizing blk-mq tags (Chengming)

* tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux:
  nvme: avoid bogus CRTO values
  md: Put the right device in md_seq_next
  nvme-pci: do not set the NUMA node of device if it has none
  blk-mq: fix tags UAF when shrinking q->nr_hw_queues
  md/raid1: fix error: ISO C90 forbids mixed declarations
  md: fix warning for holder mismatch from export_rdev()
  md: don't dereference mddev after export_rdev()
  nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
  nvme: host: hwmon: constify pointers to hwmon_channel_info
  nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()

5bc357b2

Merge tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux · 31d8fddb

Linus Torvalds authored Sep 15, 2023

Pull io_uring fix from Jens Axboe:
 "Just a single fix, fixing a regression with poll first, recvmsg, and
  using a provided buffer"

* tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux:
  io_uring/net: fix iter retargeting for selected buf

31d8fddb

Merge tag 'firewire-fixes-6.6-rc2' of... · 0e494be7

Linus Torvalds authored Sep 15, 2023

Merge tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A change applied to v6.5 kernel brings an issue that usual GFP
  allocation is done in atomic context under acquired spin-lock. Let us
  revert it"

* tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  Revert "firewire: core: obsolete usage of GFP_ATOMIC at building node tree"

0e494be7