Commits · 5fadbd992996e9dda7ebcb62f5352866057bd619 · Kirill Smelkov / linux

21 Jul, 2022 2 commits

ceph: rely on vfs for setgid stripping · 5fadbd99

Yang Xu authored Jul 14, 2022

Now that we finished moving setgid stripping for regular files in setgid
directories into the vfs, individual filesystem don't need to manually
strip the setgid bit anymore. Drop the now unneeded code from ceph.

Link: https://lore.kernel.org/r/1657779088-2242-4-git-send-email-xuyang2018.jy@fujitsu.comReviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft)<brauner@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

5fadbd99

fs: move S_ISGID stripping into the vfs_*() helpers · 1639a49c

Yang Xu authored Jul 14, 2022

Move setgid handling out of individual filesystems and into the VFS
itself to stop the proliferation of setgid inheritance bugs.

Creating files that have both the S_IXGRP and S_ISGID bit raised in
directories that themselves have the S_ISGID bit set requires additional
privileges to avoid security issues.

When a filesystem creates a new inode it needs to take care that the
caller is either in the group of the newly created inode or they have
CAP_FSETID in their current user namespace and are privileged over the
parent directory of the new inode. If any of these two conditions is
true then the S_ISGID bit can be raised for an S_IXGRP file and if not
it needs to be stripped.

However, there are several key issues with the current implementation:

* S_ISGID stripping logic is entangled with umask stripping.

  If a filesystem doesn't support or enable POSIX ACLs then umask
  stripping is done directly in the vfs before calling into the
  filesystem.
  If the filesystem does support POSIX ACLs then unmask stripping may be
  done in the filesystem itself when calling posix_acl_create().

  Since umask stripping has an effect on S_ISGID inheritance, e.g., by
  stripping the S_IXGRP bit from the file to be created and all relevant
  filesystems have to call posix_acl_create() before inode_init_owner()
  where we currently take care of S_ISGID handling S_ISGID handling is
  order dependent. IOW, whether or not you get a setgid bit depends on
  POSIX ACLs and umask and in what order they are called.

  Note that technically filesystems are free to impose their own
  ordering between posix_acl_create() and inode_init_owner() meaning
  that there's additional ordering issues that influence S_SIGID
  inheritance.

* Filesystems that don't rely on inode_init_owner() don't get S_ISGID
  stripping logic.

  While that may be intentional (e.g. network filesystems might just
  defer setgid stripping to a server) it is often just a security issue.

This is not just ugly it's unsustainably messy especially since we do
still have bugs in this area years after the initial round of setgid
bugfixes.

So the current state is quite messy and while we won't be able to make
it completely clean as posix_acl_create() is still a filesystem specific
call we can improve the S_SIGD stripping situation quite a bit by
hoisting it out of inode_init_owner() and into the vfs creation
operations. This means we alleviate the burden for filesystems to handle
S_ISGID stripping correctly and can standardize the ordering between
S_ISGID and umask stripping in the vfs.

We add a new helper vfs_prepare_mode() so S_ISGID handling is now done
in the VFS before umask handling. This has S_ISGID handling is
unaffected unaffected by whether umask stripping is done by the VFS
itself (if no POSIX ACLs are supported or enabled) or in the filesystem
in posix_acl_create() (if POSIX ACLs are supported).

The vfs_prepare_mode() helper is called directly in vfs_*() helpers that
create new filesystem objects. We need to move them into there to make
sure that filesystems like overlayfs hat have callchains like:

sys_mknod()
-> do_mknodat(mode)
   -> .mknod = ovl_mknod(mode)
      -> ovl_create(mode)
         -> vfs_mknod(mode)

get S_ISGID stripping done when calling into lower filesystems via
vfs_*() creation helpers. Moving vfs_prepare_mode() into e.g.
vfs_mknod() takes care of that. This is in any case semantically cleaner
because S_ISGID stripping is VFS security requirement.

Security hooks so far have seen the mode with the umask applied but
without S_ISGID handling done. The relevant hooks are called outside of
vfs_*() creation helpers so by calling vfs_prepare_mode() from vfs_*()
helpers the security hooks would now see the mode without umask
stripping applied. For now we fix this by passing the mode with umask
settings applied to not risk any regressions for LSM hooks. IOW, nothing
changes for LSM hooks. It is worth pointing out that security hooks
never saw the mode that is seen by the filesystem when actually creating
the file. They have always been completely misplaced for that to work.

The following filesystems use inode_init_owner() and thus relied on
S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus,
hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs,
reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs.

All of the above filesystems end up calling inode_init_owner() when new
filesystem objects are created through the ->mkdir(), ->mknod(),
->create(), ->tmpfile(), ->rename() inode operations.

Since directories always inherit the S_ISGID bit with the exception of
xfs when irix_sgid_inherit mode is turned on S_ISGID stripping doesn't
apply. The ->symlink() and ->link() inode operations trivially inherit
the mode from the target and the ->rename() inode operation inherits the
mode from the source inode. All other creation inode operations will get
S_ISGID handling via vfs_prepare_mode() when called from their relevant
vfs_*() helpers.

In addition to this there are filesystems which allow the creation of
filesystem objects through ioctl()s or - in the case of spufs -
circumventing the vfs in other ways. If filesystem objects are created
through ioctl()s the vfs doesn't know about it and can't apply regular
permission checking including S_ISGID logic. Therfore, a filesystem
relying on S_ISGID stripping in inode_init_owner() in their ioctl()
callpath will be affected by moving this logic into the vfs. We audited
those filesystems:

* btrfs allows the creation of filesystem objects through various
  ioctls(). Snapshot creation literally takes a snapshot and so the mode
  is fully preserved and S_ISGID stripping doesn't apply.

  Creating a new subvolum relies on inode_init_owner() in
  btrfs_new_subvol_inode() but only creates directories and doesn't
  raise S_ISGID.

* ocfs2 has a peculiar implementation of reflinks. In contrast to e.g.
  xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with
  the actual extents ocfs2 uses a separate ioctl() that also creates the
  target file.

  Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on
  inode_init_owner() to strip the S_ISGID bit. This is the only place
  where a filesystem needs to call mode_strip_sgid() directly but this
  is self-inflicted pain.

* spufs doesn't go through the vfs at all and doesn't use ioctl()s
  either. Instead it has a dedicated system call spufs_create() which
  allows the creation of filesystem objects. But spufs only creates
  directories and doesn't allo S_SIGID bits, i.e. it specifically only
  allows 0777 bits.

* bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created.

The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount
option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used,
and on xfs when the XFS_FEAT_GRPID mount option is used. When any of
these filesystems are mounted with their respective GRPID option then
newly created files inherit the parent directories group
unconditionally. In these cases non of the filesystems call
inode_init_owner() and thus did never strip the S_ISGID bit for newly
created files. Moving this logic into the VFS means that they now get
the S_ISGID bit stripped. This is a user visible change. If this leads
to regressions we will either need to figure out a better way or we need
to revert. However, given the various setgid bugs that we found just in
the last two years this is a regression risk we should take.

Associated with this change is a new set of fstests to enforce the
semantics for all new filesystems.

Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein [1]
Link: e014f37d ("xfs: use setattr_copy to set vfs inode attributes") [2]
Link: 01ea173e ("xfs: fix up non-directory creation in SGID directories") [3]
Link: fd84bfdd ("ceph: fix up non-directory creation in SGID directories") [4]
Link: https://lore.kernel.org/r/1657779088-2242-3-git-send-email-xuyang2018.jy@fujitsu.comSuggested-by: Dave Chinner <david@fromorbit.com>
Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
[<brauner@kernel.org>: rewrote commit message]
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

1639a49c

19 Jul, 2022 2 commits

fs: Add missing umask strip in vfs_tmpfile · ac6800e2

Yang Xu authored Jul 14, 2022

All creation paths except for O_TMPFILE handle umask in the vfs directly
if the filesystem doesn't support or enable POSIX ACLs. If the filesystem
does then umask handling is deferred until posix_acl_create().
Because, O_TMPFILE misses umask handling in the vfs it will not honor
umask settings. Fix this by adding the missing umask handling.

Link: https://lore.kernel.org/r/1657779088-2242-2-git-send-email-xuyang2018.jy@fujitsu.com
Fixes: 60545d0d ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

ac6800e2

fs: add mode_strip_sgid() helper · 2b3416ce

Yang Xu authored Jul 14, 2022

Add a dedicated helper to handle the setgid bit when creating a new file
in a setgid directory. This is a preparatory patch for moving setgid
stripping into the vfs. The patch contains no functional changes.

Currently the setgid stripping logic is open-coded directly in
inode_init_owner() and the individual filesystems are responsible for
handling setgid inheritance. Since this has proven to be brittle as
evidenced by old issues we uncovered over the last months (see [1] to
[3] below) we will try to move this logic into the vfs.

Link: e014f37d ("xfs: use setattr_copy to set vfs inode attributes") [1]
Link: 01ea173e ("xfs: fix up non-directory creation in SGID directories") [2]
Link: fd84bfdd ("ceph: fix up non-directory creation in SGID directories") [3]
Link: https://lore.kernel.org/r/1657779088-2242-1-git-send-email-xuyang2018.jy@fujitsu.comReviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

2b3416ce

17 Jul, 2022 15 commits

Linux 5.19-rc7 · ff699273
Linus Torvalds authored Jul 17, 2022

ff699273

Merge tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel · 55ea9bd6

Linus Torvalds authored Jul 17, 2022

Pull intel drm build fix from Rodrigo Vivi:
 "Our 'dim' flow has a problem with fixes of fixes getting missed. We
  need to take a look on that later.

  Meanwhile, please allow me to quickly propagate this fix for the
  32-bit build issue here upstream"

* tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915/ttm: fix 32b build

55ea9bd6

Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of... · f7f4da30

Linus Torvalds authored Jul 17, 2022

Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix SIGSEGV when processing syscall args in perf.data files in 'perf
   trace'

 - Sync kvm, msr-index and cpufeatures headers with the kernel sources

 - Fix 'convert perf time to TSC' 'perf test':
     - No need to open events twice
     - Fix finding correct event on hybrid systems

* tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf trace: Fix SIGSEGV when processing syscall args
  perf tests: Fix Convert perf time to TSC test for hybrid
  perf tests: Stop Convert perf time to TSC test opening events twice
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources

f7f4da30

drm/i915/ttm: fix 32b build · ced7866d

Matthew Auld authored Jul 12, 2022

Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: aff1e0b0 ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
(cherry picked from commit 9306b2b2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

ced7866d

Merge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b18593e

Linus Torvalds authored Jul 17, 2022

Pull perf fix from Borislav Petkov:

 - A single data race fix on the perf event cleanup path to avoid
   endless loops due to insufficient locking

* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()

2b18593e

Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 59c80f05

Linus Torvalds authored Jul 17, 2022

Pull x86 fixes from Borislav Petkov:

 - Improve the check whether the kernel supports WP mappings so that it
   can accomodate a XenPV guest due to how the latter is setting up the
   PAT machinery

  - Now that the retbleed nightmare is public, here's the first round of
    fallout fixes:

      * Fix a build failure on 32-bit due to missing include

      * Remove an untraining point in espfix64 return path

      * other small cleanups

* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Remove apostrophe typo
  um: Add missing apply_returns()
  x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
  x86/bugs: Mark retbleed_strings static
  x86/pat: Fix x86_has_pat_wp()
  x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit

59c80f05

Merge tag 'gpio-fixes-for-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 2eccaca7

Linus Torvalds authored Jul 17, 2022

Pull gpio fix from Bartosz Golaszewski:

 - fix a configfs attribute of the gpio-sim module

* tag 'gpio-fixes-for-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sim: fix the chip_name configfs item

2eccaca7

Merge tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 8ad4b6fa

Linus Torvalds authored Jul 17, 2022

Pull input fixes from Dmitry Torokhov:

 - fix Goodix driver to properly behave on the Aya Neo Next

 - some more sanity checks in usbtouchscreen driver

 - a tweak in wm97xx driver in preparation for remove() to return void

 - a clarification in input core regarding units of measurement for
   resolution on touch events.

* tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: document the units for resolution of size axes
  Input: goodix - call acpi_device_fix_up_power() in some cases
  Input: wm97xx - make .remove() obviously always return 0
  Input: usbtouchscreen - add driver_info sanity check

8ad4b6fa

Merge tag 'for-v5.19-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 396df700

Linus Torvalds authored Jul 17, 2022

Pull power supply fixes from Sebastian Reichel:

 - power-supply core temperature interpolation regression fix for
   incorrect boundaries

 - ab8500 needs to destroy its work queues in error paths

 - Fix old DT refcount leak in arm-versatile

* tag 'for-v5.19-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: core: Fix boundary conditions in interpolation
  power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
  power: supply: ab8500_fg: add missing destroy_workqueue in ab8500_fg_probe

396df700

perf trace: Fix SIGSEGV when processing syscall args · 4b335e1e

Naveen N. Rao authored Jul 07, 2022

On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':

  #0  0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
  #1  syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
  #2  syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
  #3  0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
  #4  syscall__scnprintf_args <snip> at builtin-trace.c:2041
  #5  0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319

That points to the below code in tools/perf/builtin-trace.c:
	/*
	 * If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
	 * arguments, even if the syscall being handled, say "openat", uses only 4 arguments
	 * this breaks syscall__augmented_args() check for augmented args, as we calculate
	 * syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
	 * so when handling, say the openat syscall, we end up getting 6 args for the
	 * raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
	 * thinking that the extra 2 u64 args are the augmented filename, so just check
	 * here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
	 */
	if (evsel != trace->syscalls.events.sys_enter)
		augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);

As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.
Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

4b335e1e

perf tests: Fix Convert perf time to TSC test for hybrid · deb44a62

Adrian Hunter authored Jul 13, 2022

The test does not always correctly determine the number of events for
hybrids, nor allow for more than 1 evsel when parsing.

Fix by iterating the events actually created and getting the correct
evsel for the events processed.

Fixes: d9da6f70 ("perf tests: Support 'Convert perf time to TSC' test for hybrid")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713123459.24145-3-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

deb44a62

perf tests: Stop Convert perf time to TSC test opening events twice · 498c7a54

Adrian Hunter authored Jul 13, 2022

Do not call evlist__open() twice.

Fixes: 5bb017d4 ("perf test: Fix error message for test case 71 on s390, where it is not supported")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713123459.24145-2-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

498c7a54

tools arch x86: Sync the msr-index.h copy with the kernel sources · 91d248c3

Arnaldo Carvalho de Melo authored Jul 01, 2021

To pick up the changes from these csets:

  4ad3278d ("x86/speculation: Disable RRSBA behavior")
  d7caac99 ("x86/cpu/amd: Add Spectral Chicken")

That cause no changes to tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  $

Just silences this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YtQTm9wsB3hxQWvy@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

91d248c3

tools headers cpufeatures: Sync with the kernel sources · f098addb

Arnaldo Carvalho de Melo authored Jul 01, 2021

To pick the changes from:

  f43b9876 ("x86/retbleed: Add fine grained Kconfig knobs")
  a149180f ("x86: Add magic AMD return-thunk")
  15e67227 ("x86: Undo return-thunk damage")
  369ae6ff ("x86/retpoline: Cleanup some #ifdefery")
  4ad3278d x86/speculation: Disable RRSBA behavior
  26aae8cc x86/cpu/amd: Enumerate BTC_NO
  9756bba2 x86/speculation: Fill RSB on vmexit for IBRS
  3ebc1700 x86/bugs: Add retbleed=ibpb
  2dbb887e x86/entry: Add kernel IBRS implementation
  6b80b59b x86/bugs: Report AMD retbleed vulnerability
  a149180f x86: Add magic AMD return-thunk
  15e67227 x86: Undo return-thunk damage
  a883d624 x86/cpufeatures: Move RETPOLINE flags to word 11
  51802186 x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org
Link: https://lore.kernel.org/lkml/YtQM40VmiLTkPND2@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

f098addb

tools headers UAPI: Sync linux/kvm.h with the kernel sources · eee51fe3

Arnaldo Carvalho de Melo authored May 09, 2021

To pick the changes in:

  1b870fa5 ("kvm: stats: tell userspace which values are boolean")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/YtQLDvQrBhJNl3n5@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

eee51fe3

16 Jul, 2022 12 commits

Merge tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 972a278f

Linus Torvalds authored Jul 16, 2022

Pull btrfs reverts from David Sterba:
 "Due to a recent report [1] we need to revert the radix tree to xarray
  conversion patches.

  There's a problem with sleeping under spinlock, when xa_insert could
  allocate memory under pressure. We use GFP_NOFS so this is a real
  problem that we unfortunately did not discover during review.

  I'm sorry to do such change at rc6 time but the revert is IMO the
  safer option, there are patches to use mutex instead of the spin locks
  but that would need more testing. The revert branch has been tested on
  a few setups, all seem ok.

  The conversion to xarray will be revisited in the future"

Link: https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ [1]

* tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: turn delayed_nodes_tree into an XArray"
  Revert "btrfs: turn name_cache radix tree into XArray in send_ctx"
  Revert "btrfs: turn fs_info member buffer_radix into XArray"
  Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray"

972a278f

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c5fe7a97

Linus Torvalds authored Jul 16, 2022

Pull SCSI fixes from James Bottomley:
 "Six small and reasonably obvious fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: pm80xx: Set stopped phy's linkrate to Disabled
  scsi: pm80xx: Fix 'Unknown' max/min linkrate
  scsi: ufs: core: Fix missing clk change notification on host reset
  scsi: ufs: core: Drop loglevel of WriteBoost message
  scsi: megaraid: Clear READ queue map's nr_queues
  scsi: target: Fix WRITE_SAME No Data Buffer crash

c5fe7a97

Merge tag 'block-5.19-2022-07-15' of git://git.kernel.dk/linux-block · 6bca047e

Linus Torvalds authored Jul 16, 2022

Pull block fixes from Jens Axboe:
 "Two NVMe fixes, and a regression fix for the core block layer from
  this merge window"

* tag 'block-5.19-2022-07-15' of git://git.kernel.dk/linux-block:
  block: fix missing blkcg_bio_issue_init
  nvme: fix block device naming collision
  nvme-pci: fix freeze accounting for error handling

6bca047e

Merge tag 'usb-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 9ed714db

Linus Torvalds authored Jul 16, 2022

Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes and new device ids for 5.19-rc7.
  They include:

   - new usb-serial driver ids

   - typec uevent fix

   - uvc gadget driver fix

   - dwc3 driver fixes

   - ehci-fsl driver fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: ftdi_sio: add Belimo device ids
  drivers/usb/host/ehci-fsl: Fix interrupt setup in host mode.
  usb: gadget: uvc: fix changing interface name via configfs
  usb: typec: add missing uevent when partner support PD
  usb: dwc3-am62: remove unnecesary clk_put()
  usb: dwc3: gadget: Fix event pending check

9ed714db

Merge tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 8c91723a

Linus Torvalds authored Jul 16, 2022

Pull tty and serial driver fixes from Greg KH:
 "Here are some TTY and Serial driver fixes for 5.19-rc7. They resolve a
  number of reported problems including:

   - longtime bug in pty_write() that has been reported in the past.

   - 8250 driver fixes

   - new serial device ids

   - vt overlapping data copy bugfix

   - other tiny serial driver bugfixes

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
  tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
  serial: 8250: dw: Fix the macro RZN1_UART_xDMACR_8_WORD_BURST
  vt: fix memory overlapping when deleting chars in the buffer
  serial: mvebu-uart: correctly report configured baudrate value
  serial: 8250: Fix PM usage_count for console handover
  serial: 8250: fix return error code in serial8250_request_std_resource()
  serial: stm32: Clear prev values before setting RTS delays
  tty: Add N_CAN327 line discipline ID for ELM327 based CAN driver
  serial: 8250: Fix __stop_tx() & DMA Tx restart races
  serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
  tty: serial: samsung_tty: set dma burst_size to 1
  serial: 8250: dw: enable using pdata with ACPI

8c91723a

Merge tag 's390-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · c658cabb

Linus Torvalds authored Jul 16, 2022

Pull s390 fixes from Alexander Gordeev:

 - Fix building of out-of-tree kernel modules without a pre-built kernel
   in case CONFIG_EXPOLINE_EXTERN=y.

 - Fix a reference counting error that could prevent unloading of zcrypt
   modules.

* tag 's390-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ap: fix error handling in __verify_queue_reservations()
  s390/nospec: remove unneeded header includes
  s390/nospec: build expoline.o for modules_prepare target

c658cabb

Merge tag 'pm-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ab6efe68

Linus Torvalds authored Jul 16, 2022

Pull power management fix from Rafael Wysocki
 "Fix recent regression in the cpufreq mediatek driver related to
  incorrect handling of regulator_get_optional() return value
  (AngeloGioacchino Del Regno)"

* tag 'pm-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: mediatek: Handle sram regulator probe deferral

ab6efe68

Merge tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 16c957f0

Linus Torvalds authored Jul 16, 2022

Pull ACPI fix from Rafael Wysocki:
 "Fix more fallout from recent changes of the ACPI CPPC handling on AMD
  platforms (Mario Limonciello)"

* tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory

16c957f0

Merge tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · be9b7b6a

Linus Torvalds authored Jul 16, 2022

Pull printk fix from Petr Mladek:

 - Make pr_flush() fast when consoles are suspended.

* tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: do not wait for consoles when suspended

be9b7b6a

random: cap jitter samples per bit to factor of HZ · 829d680e

Jason A. Donenfeld authored Jul 13, 2022

Currently the jitter mechanism will require two timer ticks per
iteration, and it requires N iterations per bit. This N is determined
with a small measurement, and if it's too big, it won't waste time with
jitter entropy because it'd take too long or not have sufficient entropy
anyway.

With the current max N of 32, there are large timeouts on systems with a
small CONFIG_HZ. Rather than set that maximum to 32, instead choose a
factor of CONFIG_HZ. In this case, 1/30 seems to yield sane values for
different configurations of CONFIG_HZ.
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Fixes: 78c768e6 ("random: vary jitter iterations based on cycle counter speed")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

829d680e

efi/x86: use naked RET on mixed mode call wrapper · 51a6fa07

Thadeu Lima de Souza Cascardo authored Jul 15, 2022

When running with return thunks enabled under 32-bit EFI, the system
crashes with:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle page fault for address: 000000005bc02900
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0011) - permissions violation
  PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
  Oops: 0011 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:0x5bc02900
  Code: Unable to access opcode bytes at RIP 0x5bc028d6.
  RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
  RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
  RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
  R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
  R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
  CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
  Call Trace:
   ? efi_set_virtual_address_map+0x9c/0x175
   efi_enter_virtual_mode+0x4a6/0x53e
   start_kernel+0x67c/0x71e
   x86_64_start_reservations+0x24/0x2a
   x86_64_start_kernel+0xe9/0xf4
   secondary_startup_64_no_verify+0xe5/0xeb

That's because it cannot jump to the return thunk from the 32-bit code.

Using a naked RET and marking it as safe allows the system to proceed
booting.

Fixes: aa3d4803 ("x86: Use return-thunk in asm code")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: <stable@vger.kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

51a6fa07

x86/bugs: Remove apostrophe typo · bcf16315

Kim Phillips authored Jul 08, 2022

Remove a superfluous ' in the mitigation string.

Fixes: e8ec1b6e ("x86/bugs: Enable STIBP for JMP2RET")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

bcf16315

15 Jul, 2022 9 commits

Merge tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 9b59ec8d

Linus Torvalds authored Jul 15, 2022

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to avoid printing a warning when modules do not exercise any
   errata-dependent behavior and the SiFive errata are enabled.

 - A fix to the Microchip PFSOC to attach the L2 cache to the CPU nodes.

* tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: don't warn for sifive erratas in modules
  riscv: dts: microchip: hook up the mpfs' l2cache

9b59ec8d

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a8ebfcd3

Linus Torvalds authored Jul 15, 2022

Pull KVM fixes from Paolo Bonzini:
 "RISC-V:
   - Fix missing PAGE_PFN_MASK

   - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()

  x86:
   - Fix for nested virtualization when TSC scaling is active

   - Estimate the size of fastcc subroutines conservatively, avoiding
     disastrous underestimation when return thunks are enabled

   - Avoid possible use of uninitialized fields of 'struct
     kvm_lapic_irq'

  Generic:
   - Mark as such the boolean values available from the statistics file
     descriptors

   - Clarify statistics documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: emulate: do not adjust size of fastop and setcc subroutines
  KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
  Documentation: kvm: clarify histogram units
  kvm: stats: tell userspace which values are boolean
  x86/kvm: fix FASTOP_SIZE when return thunks are enabled
  KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
  RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
  riscv: Fix missing PAGE_PFN_MASK

a8ebfcd3

Merge tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client · 1ce9d792

Linus Torvalds authored Jul 15, 2022

Pull ceph fix from Ilya Dryomov:
 "A folio locking fixup that Xiubo and David cooperated on, marked for
  stable. Most of it is in netfs but I picked it up into ceph tree on
  agreement with David"

* tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client:
  netfs: do not unlock and put the folio twice

1ce9d792

Merge tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 8006112d

Linus Torvalds authored Jul 15, 2022

Pull spi fixes from Mark Brown:
 "A few driver specific fixes, none especially remarkable, plus a
  MAINTAINERS file update due to the previous maintainer for the NXP
  FSPI driver having left the company"

* tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cadence-quadspi: Remove spi_master_put() in probe failure path
  MAINTAINERS: change the NXP FSPI driver maintainer.
  spi: amd: Limit max transfer and message size
  spi: aspeed: Fix division by zero
  spi: aspeed: Add dev_dbg() to dump the spi-mem direct mapping descriptor

8006112d

Merge tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 1c49f281

Linus Torvalds authored Jul 15, 2022

Pull ARM SoC fixes from Arnd Bergmann:
 "Most of the contents are bugfixes for the devicetree files:

   - A Qualcomm MSM8974 pin controller regression, caused by a cleanup
     patch that gets partially reverted here.

   - Missing properties for Broadcom BCM49xx to fix timer detection and
     SMP boot.

   - Fix touchscreen pinctrl for imx6ull-colibri board

   - Multiple fixes for Rockchip rk3399 based machines including the vdu
     clock-rate fix, otg port fix on Quartz64-A and ethernet on
     Quartz64-B

   - Fixes for misspelled DT contents causing minor problems on
     imx6qdl-ts7970m, orangepi-zero, sama5d2, kontron-kswitch-d10, and
     ls1028a

  And a couple of changes elsewhere:

   - Fix binding for Allwinner D1 display pipeline

   - Trivial code fixes to the TEE and reset controller driver
     subsystems and the rockchip platform code.

   - Multiple updates to the MAINTAINERS files, marking the Palm Treo
     support as orphaned, and fixing some entries for added or changed
     file names"

* tag 'soc-fixes-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
  arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
  arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
  ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
  ARM: dts: at91: sama5d2: Fix typo in i2s1 node
  tee: tee_get_drvdata(): fix description of return value
  optee: Remove duplicate 'of' in two places.
  ARM: dts: kswitch-d10: use open drain mode for coma-mode pins
  ARM: dts: colibri-imx6ull: fix snvs pinmux group
  optee: smc_abi.c: fix wrong pointer passed to IS_ERR/PTR_ERR()
  MAINTAINERS: add polarfire rng, pci and clock drivers
  MAINTAINERS: mark ARM/PALM TREO SUPPORT orphan
  ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
  arm64: dts: ls1028a: Update SFP node to include clock
  dt-bindings: display: sun4i: Fix D1 pipeline count
  ARM: dts: qcom: msm8974: re-add missing pinctrl
  reset: Fix devm bulk optional exclusive control getter
  MAINTAINERS: rectify entry for SYNOPSYS AXS10x RESET CONTROLLER DRIVER
  ARM: rockchip: Add missing of_node_put() in rockchip_suspend_init()
  arm64: dts: rockchip: Assign RK3399 VDU clock rate
  arm64: dts: rockchip: Fix Quartz64-A dwc3 otg port behavior
  ...

1c49f281

Revert "btrfs: turn delayed_nodes_tree into an XArray" · 088aea3b

David Sterba authored Jul 15, 2022

This reverts commit 253bf575.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

088aea3b

Revert "btrfs: turn name_cache radix tree into XArray in send_ctx" · 5b8418b8

David Sterba authored Jul 15, 2022

This reverts commit 40769420.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5b8418b8

Revert "btrfs: turn fs_info member buffer_radix into XArray" · 01cd3909

David Sterba authored Jul 15, 2022

This reverts commit 8ee92268.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

01cd3909

Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray" · fc7cbcd4

David Sterba authored Jul 15, 2022

This reverts commit 48b36a60.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

fc7cbcd4