Commits · 792b9f65a568f48c50b3175536db9cde5a1edcc0 · Kirill Smelkov / linux

13 Jun, 2022 8 commits

sched: Allow newidle balancing to bail out of load_balance · 792b9f65

Josh Don authored Jun 08, 2022

While doing newidle load balancing, it is possible for new tasks to
arrive, such as with pending wakeups. newidle_balance() already accounts
for this by exiting the sched_domain load_balance() iteration if it
detects these cases. This is very important for minimizing wakeup
latency.

However, if we are already in load_balance(), we may stay there for a
while before returning back to newidle_balance(). This is most
exacerbated if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A
very straightforward workaround to this is to adjust should_we_balance()
to bail out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
detected.

This was tested with the following reproduction:
- two threads that take turns sleeping and waking each other up are
  affined to two cores
- a large number of threads with 100% utilization are pinned to all
  other cores

Without this patch, wakeup latency was ~120us for the pair of threads,
almost entirely spent in load_balance(). With this patch, wakeup latency
is ~6us.
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220609025515.2086253-1-joshdon@google.com

792b9f65

sched/deadline: Use proc_douintvec_minmax() limit minimum value · 2ed81e76

Yajun Deng authored Jun 07, 2022

sysctl_sched_dl_period_max and sysctl_sched_dl_period_min are unsigned
integer, but proc_dointvec() wouldn't return error even if we set a
negative number.

Use proc_douintvec_minmax() instead of proc_dointvec(). Add extra1 for
sysctl_sched_dl_period_max and extra2 for sysctl_sched_dl_period_min.

It's just an optimization for match data and proc_handler in struct
ctl_table. The 'if (period < min || period > max)' in __checkparam_dl()
will work fine even if there hasn't this patch.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Link: https://lore.kernel.org/r/20220607101807.249965-1-yajun.deng@linux.dev

2ed81e76

sched/fair: Optimize and simplify rq leaf_cfs_rq_list · 51bf903b

Chengming Zhou authored Jun 01, 2022

We notice the rq leaf_cfs_rq_list has two problems when do bugfix
backports and some test profiling.

1. cfs_rqs under throttled subtree could be added to the list, and
make their fully decayed ancestors on the list, even though not needed.

2. #1 also make the leaf_cfs_rq_list management complex and error prone,
this is the list of related bugfix so far:

commit 31bc6aea ("sched/fair: Optimize update_blocked_averages()")
commit fe61468b ("sched/fair: Fix enqueue_task_fair warning")
commit b34cb07d ("sched/fair: Fix enqueue_task_fair() warning some more")
commit 39f23ce0 ("sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list")
commit 0258bdfa ("sched/fair: Fix unfairness caused by missing load decay")
commit a7b359fc ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
commit fdaba61e ("sched/fair: Ensure that the CFS parent is added after unthrottling")
commit 2630cde2 ("sched/fair: Add ancestors of unthrottled undecayed cfs_rq")

commit 31bc6aea ("sched/fair: Optimize update_blocked_averages()")
delete every cfs_rq under throttled subtree from rq->leaf_cfs_rq_list,
and delete the throttled_hierarchy() test in update_blocked_averages(),
which optimized update_blocked_averages().

But those later bugfix add cfs_rqs under throttled subtree back to
rq->leaf_cfs_rq_list again, with their fully decayed ancestors, for
the integrity of rq->leaf_cfs_rq_list.

This patch takes another method, skip all cfs_rqs under throttled
hierarchy when list_add_leaf_cfs_rq(), to completely make cfs_rqs
under throttled subtree off the leaf_cfs_rq_list.

So we don't need to consider throttled related things in
enqueue_entity(), unthrottle_cfs_rq() and enqueue_task_fair(),
which simplify the code a lot. Also optimize update_blocked_averages()
since cfs_rqs under throttled hierarchy and their ancestors
won't be on the leaf_cfs_rq_list.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20220601021848.76943-1-zhouchengming@bytedance.com

51bf903b

sched/fair: Consider CPU affinity when allowing NUMA imbalance in find_idlest_group() · f5b2eeb4

K Prateek Nayak authored Apr 07, 2022

In the case of systems containing multiple LLCs per socket, like
AMD Zen systems, users want to spread bandwidth hungry applications
across multiple LLCs. Stream is one such representative workload where
the best performance is obtained by limiting one stream thread per LLC.
To ensure this, users are known to pin the tasks to a specify a subset
of the CPUs consisting of one CPU per LLC while running such bandwidth
hungry tasks.

Suppose we kickstart a multi-threaded task like stream with 8 threads
using taskset or numactl to run on a subset of CPUs on a 2 socket Zen3
server where each socket contains 128 CPUs
(0-63,128-191 in one socket, 64-127,192-255 in another socket)

Eg: numactl -C 0,16,32,48,64,80,96,112 ./stream8

Here each CPU in the list is from a different LLC and 4 of those LLCs
are on one socket, while the other 4 are on another socket.

Ideally we would prefer that each stream thread runs on a different
CPU from the allowed list of CPUs. However, the current heuristics in
find_idlest_group() do not allow this during the initial placement.

Suppose the first socket (0-63,128-191) is our local group from which
we are kickstarting the stream tasks. The first four stream threads
will be placed in this socket. When it comes to placing the 5th
thread, all the allowed CPUs are from the local group (0,16,32,48)
would have been taken.

However, the current scheduler code simply checks if the number of
tasks in the local group is fewer than the allowed numa-imbalance
threshold. This threshold was previously 25% of the NUMA domain span
(in this case threshold = 32) but after the v6 of Mel's patchset
"Adjust NUMA imbalance for multiple LLCs", got merged in sched-tip,
Commit: e496132e ("sched/fair: Adjust the allowed NUMA imbalance
when SD_NUMA spans multiple LLCs") it is now equal to number of LLCs
in the NUMA domain, for processors with multiple LLCs.
(in this case threshold = 8).

For this example, the number of tasks will always be within threshold
and thus all the 8 stream threads will be woken up on the first socket
thereby resulting in sub-optimal performance.

The following sched_wakeup_new tracepoint output shows the initial
placement of tasks in the current tip/sched/core on the Zen3 machine:

stream-5313    [016] d..2.   627.005036: sched_wakeup_new: comm=stream pid=5315 prio=120 target_cpu=032
stream-5313    [016] d..2.   627.005086: sched_wakeup_new: comm=stream pid=5316 prio=120 target_cpu=048
stream-5313    [016] d..2.   627.005141: sched_wakeup_new: comm=stream pid=5317 prio=120 target_cpu=000
stream-5313    [016] d..2.   627.005183: sched_wakeup_new: comm=stream pid=5318 prio=120 target_cpu=016
stream-5313    [016] d..2.   627.005218: sched_wakeup_new: comm=stream pid=5319 prio=120 target_cpu=016
stream-5313    [016] d..2.   627.005256: sched_wakeup_new: comm=stream pid=5320 prio=120 target_cpu=016
stream-5313    [016] d..2.   627.005295: sched_wakeup_new: comm=stream pid=5321 prio=120 target_cpu=016

Once the first four threads are distributed among the allowed CPUs of
socket one, the rest of the treads start piling on these same CPUs
when clearly there are CPUs on the second socket that can be used.

Following the initial pile up on a small number of CPUs, though the
load-balancer eventually kicks in, it takes a while to get to {4}{4}
and even {4}{4} isn't stable as we observe a bunch of ping ponging
between {4}{4} to {5}{3} and back before a stable state is reached
much later (1 Stream thread per allowed CPU) and no more migration is
required.

We can detect this piling and avoid it by checking if the number of
allowed CPUs in the local group are fewer than the number of tasks
running in the local group and use this information to spread the
5th task out into the next socket (after all, the goal in this
slowpath is to find the idlest group and the idlest CPU during the
initial placement!).

The following sched_wakeup_new tracepoint output shows the initial
placement of tasks after adding this fix on the Zen3 machine:

stream-4485    [016] d..2.   230.784046: sched_wakeup_new: comm=stream pid=4487 prio=120 target_cpu=032
stream-4485    [016] d..2.   230.784123: sched_wakeup_new: comm=stream pid=4488 prio=120 target_cpu=048
stream-4485    [016] d..2.   230.784167: sched_wakeup_new: comm=stream pid=4489 prio=120 target_cpu=000
stream-4485    [016] d..2.   230.784222: sched_wakeup_new: comm=stream pid=4490 prio=120 target_cpu=112
stream-4485    [016] d..2.   230.784271: sched_wakeup_new: comm=stream pid=4491 prio=120 target_cpu=096
stream-4485    [016] d..2.   230.784322: sched_wakeup_new: comm=stream pid=4492 prio=120 target_cpu=080
stream-4485    [016] d..2.   230.784368: sched_wakeup_new: comm=stream pid=4493 prio=120 target_cpu=064

We see that threads are using all of the allowed CPUs and there is
no pileup.

No output is generated for tracepoint sched_migrate_task with this
patch due to a perfect initial placement which removes the need
for balancing later on - both across NUMA boundaries and within
NUMA boundaries for stream.

Following are the results from running 8 Stream threads with and
without pinning on a dual socket Zen3 Machine (2 x 64C/128T):

During the testing of this patch, the tip sched/core was at
commit: 089c02ae "ftrace: Use preemption model accessors for trace
header printout"

Pinning is done using: numactl -C 0,16,32,48,64,80,96,112 ./stream8

	           5.18.0-rc1               5.18.0-rc1                5.18.0-rc1
               tip sched/core           tip sched/core            tip sched/core
                 (no pinning)                + pinning              + this-patch
								       + pinning

 Copy:   109364.74 (0.00 pct)     94220.50 (-13.84 pct)    158301.28 (44.74 pct)
Scale:   109670.26 (0.00 pct)     90210.59 (-17.74 pct)    149525.64 (36.34 pct)
  Add:   129029.01 (0.00 pct)    101906.00 (-21.02 pct)    186658.17 (44.66 pct)
Triad:   127260.05 (0.00 pct)    106051.36 (-16.66 pct)    184327.30 (44.84 pct)

Pinning currently hurts the performance compared to unbound case on
tip/sched/core. With the addition of this patch, we are able to
outperform tip/sched/core by a good margin with pinning.

Following are the results from running 16 Stream threads with and
without pinning on a dual socket IceLake Machine (2 x 32C/64T):

NUMA Topology of Intel Skylake machine:
Node 1: 0,2,4,6 ... 126 (Even numbers)
Node 2: 1,3,5,7 ... 127 (Odd numbers)

Pinning is done using: numactl -C 0-15 ./stream16

	           5.18.0-rc1               5.18.0-rc1                5.18.0-rc1
               tip sched/core           tip sched/core            tip sched/core
                 (no pinning)                 +pinning              + this-patch
								       + pinning

 Copy:    85815.31 (0.00 pct)     149819.21 (74.58 pct)    156807.48 (82.72 pct)
Scale:    64795.60 (0.00 pct)      97595.07 (50.61 pct)     99871.96 (54.13 pct)
  Add:    71340.68 (0.00 pct)     111549.10 (56.36 pct)    114598.33 (60.63 pct)
Triad:    68890.97 (0.00 pct)     111635.16 (62.04 pct)    114589.24 (66.33 pct)

In case of Icelake machine, with single LLC per socket, pinning across
the two sockets reduces cache contention, thus showing great
improvement in pinned case which is further benefited by this patch.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20220407111222.22649-1-kprateek.nayak@amd.com

f5b2eeb4

sched/numa: Adjust imb_numa_nr to a better approximation of memory channels · 026b98a9

Mel Gorman authored May 20, 2022

For a single LLC per node, a NUMA imbalance is allowed up until 25%
of CPUs sharing a node could be active. One intent of the cut-off is
to avoid an imbalance of memory channels but there is no topological
information based on active memory channels. Furthermore, there can
be differences between nodes depending on the number of populated
DIMMs.

A cut-off of 25% was arbitrary but generally worked. It does have a severe
corner cases though when an parallel workload is using 25% of all available
CPUs over-saturates memory channels. This can happen due to the initial
forking of tasks that get pulled more to one node after early wakeups
(e.g. a barrier synchronisation) that is not quickly corrected by the
load balancer. The LB may fail to act quickly as the parallel tasks are
considered to be poor migrate candidates due to locality or cache hotness.

On a range of modern Intel CPUs, 12.5% appears to be a better cut-off
assuming all memory channels are populated and is used as the new cut-off
point. A minimum of 1 is specified to allow a communicating pair to
remain local even for CPUs with low numbers of cores. For modern AMDs,
there are multiple LLCs and are not affected.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220520103519.1863-5-mgorman@techsingularity.net

026b98a9

sched/numa: Apply imbalance limitations consistently · cb29a5c1

Mel Gorman authored May 20, 2022

The imbalance limitations are applied inconsistently at fork time
and at runtime. At fork, a new task can remain local until there are
too many running tasks even if the degree of imbalance is larger than
NUMA_IMBALANCE_MIN which is different to runtime. Secondly, the imbalance
figure used during load balancing is different to the one used at NUMA
placement. Load balancing uses the number of tasks that must move to
restore imbalance where as NUMA balancing uses the total imbalance.

In combination, it is possible for a parallel workload that uses a small
number of CPUs without applying scheduler policies to have very variable
run-to-run performance.

[lkp@intel.com: Fix build breakage for arc-allyesconfig]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220520103519.1863-4-mgorman@techsingularity.net

cb29a5c1

sched/numa: Do not swap tasks between nodes when spare capacity is available · 13ede331

Mel Gorman authored May 20, 2022

If a destination node has spare capacity but there is an imbalance then
two tasks are selected for swapping. If the tasks have no numa group
or are within the same NUMA group, it's simply shuffling tasks around
without having any impact on the compute imbalance. Instead, it's just
punishing one task to help another.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220520103519.1863-3-mgorman@techsingularity.net

13ede331

sched/numa: Initialise numa_migrate_retry · 70ce3ea9

Mel Gorman authored May 20, 2022

On clone, numa_migrate_retry is inherited from the parent which means
that the first NUMA placement of a task is non-deterministic. This
affects when load balancing recognises numa tasks and whether to
migrate "regular", "remote" or "all" tasks between NUMA scheduler
domains.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220520103519.1863-2-mgorman@techsingularity.net

70ce3ea9

12 Jun, 2022 10 commits

Linux 5.19-rc2 · b13baccc
Linus Torvalds authored Jun 12, 2022

b13baccc

Merge tag 'platform-drivers-x86-v5.19-2' of... · 99795285

Linus Torvalds authored Jun 12, 2022

Merge tag 'platform-drivers-x86-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Highlights:

   - Fix hp-wmi regression on HP Omen laptops introduced in 5.18

   - Several hardware-id additions

   - A couple of other tiny fixes"

* tag 'platform-drivers-x86-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel: hid: Add Surface Go to VGBS allow list
  platform/x86: hp-wmi: Use zero insize parameter only when supported
  platform/x86: hp-wmi: Resolve WMI query failures on some devices
  platform/x86: gigabyte-wmi: Add support for B450M DS3H-CF
  platform/x86: gigabyte-wmi: Add Z690M AORUS ELITE AX DDR4 support
  platform/x86: barco-p50-gpio: Add check for platform_driver_register
  platform/x86/intel: pmc: Support Intel Raptorlake P
  platform/x86/intel: Fix pmt_crashlog array reference
  platform/mellanox: Add static in struct declaration.
  platform/mellanox: Spelling s/platfom/platform/

99795285

Merge tag 'wq-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · b0cb8db3

Linus Torvalds authored Jun 12, 2022

Pull workqueue fixes from Tejun Heo:
 "Tetsuo's patch to trigger build warnings if system-wide wq's are
  flushed along with a TP type update and trivial comment update"

* tag 'wq-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Switch to new kerneldoc syntax for named variable macro argument
  workqueue: Fix type of cpu in trace event
  workqueue: Wrap flush_workqueue() using a macro

b0cb8db3

Merge tag 'kbuild-fixes-v5.19' of... · e3b8e2de

Linus Torvalds authored Jun 12, 2022

Merge tag 'kbuild-fixes-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Make the *.mod build rule portable for POSIX awk

 - Fix regression of 'make nsdeps'

 - Make scripts/check-local-export working for older bash versions

 - Fix scripts/gdb to extract the .config data from vmlinux

* tag 'kbuild-fixes-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  scripts/gdb: change kernel config dumping method
  scripts/check-local-export: avoid 'wait $!' for process substitution
  scripts/nsdeps: adjust to the format change of *.mod files
  kbuild: avoid regex RS for POSIX awk

e3b8e2de

Merge tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 2275c6ba

Linus Torvalds authored Jun 12, 2022

Pull cifs client fixes from Steve French:
 "Three reconnect fixes, all for stable as well.

  One of these three reconnect fixes does address a problem with
  multichannel reconnect, but this does not include the additional
  fix (still being tested) for dynamically detecting multichannel
  adapter changes which will improve those reconnect scenarios even
  more"

* tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: populate empty hostnames for extra channels
  cifs: return errors during session setup during reconnects
  cifs: fix reconnect on smb3 mount types

2275c6ba

Merge tag 'random-5.19-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 3cae0d84

Linus Torvalds authored Jun 12, 2022

Pull random number generator fixes from Jason Donenfeld:

- A fix for a 5.19 regression for a case in which early device tree
initializes the RNG, which flips a static branch.

On most plaforms, jump labels aren't initialized until much later, so
this caused splats. On a few mailing list threads, we cooked up easy
fixes for arm64, arm32, and risc-v. But then things looked slightly
more involved for xtensa, powerpc, arc, and mips. And at that point,
when we're patching 7 architectures in a place before the console is
even available, it seems like the cost/risk just wasn't worth it.

So random.c works around it now by checking the already exported
`static_key_initialized` boolean, as though somebody already ran into
this issue in the past. I'm not super jazzed about that; it'd be
prettier to not have to complicate downstream code. But I suppose
it's practical.

- A few small code nits and adding a missing __init annotation.

- A change to the default config values to use the cpu and bootloader's
seeds for initializing the RNG earlier.

This brings them into line with what all the distros do (Fedora/RHEL,
Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine, SUSE, and Void... at
least), and moreover will now give us test coverage in various test
beds that might have caught the above device tree bug earlier.

- A change to WireGuard CI's configuration to increase test coverage
around the RNG.

- A documentation comment fix to unrelated maintainerless CRC code that
I was asked to take, I guess because it has to do with polynomials
(which the RNG thankfully no longer uses).

* tag 'random-5.19-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
wireguard: selftests: use maximum cpu features and allow rng seeding
random: remove rng_has_arch_random()
random: credit cpu and bootloader seeds by default
random: do not use jump labels before they are initialized
random: account for arch randomness in bits
random: mark bootloader randomness code as __init
random: avoid checking crng_ready() twice in random_init()
crc-itu-t: fix typo in CRC ITU-T polynomial comment

3cae0d84

platform/x86/intel: hid: Add Surface Go to VGBS allow list · d4fe9cc4

Duke Lee authored Jun 07, 2022

The Surface Go reports Chassis Type 9 (Laptop,) so the device needs to be
added to dmi_vgbs_allow_list to enable tablet mode when an attached Type
Cover is folded back.

BugLink: https://github.com/linux-surface/linux-surface/issues/837Signed-off-by: Duke Lee <krnhotwings@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220607213654.5567-1-krnhotwings@gmail.comReviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

d4fe9cc4

platform/x86: hp-wmi: Use zero insize parameter only when supported · 65f936f3

Bedant Patnaik authored Jun 09, 2022

commit be9d73e6 ("platform/x86: hp-wmi: Fix 0x05 error code reported by
several WMI calls") and commit 12b19f14 ("platform/x86: hp-wmi: Fix
hp_wmi_read_int() reporting error (0x05)") cause ACPI BIOS Error (bug):
Attempt to CreateField of length zero (20211217/dsopcode-133) because of
the ACPI method HWMC, which unconditionally creates a Field of
size (insize*8) bits:
	CreateField (Arg1, 0x80, (Local5 * 0x08), DAIN)
In cases where args->insize = 0, the Field size is 0, resulting in
an error.

Fix this by using zero insize only if 0x5 error code is returned

Tested on Omen 15 AMD (2020) board ID: 8786.

Fixes: be9d73e6 ("platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls")
Signed-off-by: Bedant Patnaik <bedant.patnaik@gmail.com>
Tested-by: Jorge Lopez <jorge.lopez2@hp.com>
Link: https://lore.kernel.org/r/41be46743d21c78741232a47bbb5f1cdbcc3d21e.camel@gmail.comReviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

65f936f3

platform/x86: hp-wmi: Resolve WMI query failures on some devices · dc6a6ab5

Jorge Lopez authored Jun 08, 2022

WMI queries fail on some devices where the ACPI method HWMC
unconditionally attempts to create Fields beyond the buffer
if the buffer is too small, this breaks essential features
such as power profiles:

         CreateByteField (Arg1, 0x10, D008)
         CreateByteField (Arg1, 0x11, D009)
         CreateByteField (Arg1, 0x12, D010)
         CreateDWordField (Arg1, 0x10, D032)
         CreateField (Arg1, 0x80, 0x0400, D128)

In cases where args->data had zero length, ACPI BIOS Error
(bug): AE_AML_BUFFER_LIMIT, Field [D008] at bit
offset/length 128/8 exceeds size of target Buffer (128 bits)
(20211217/dsopcode-198) was obtained.

ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [D009] at bit
offset/length 136/8 exceeds size of target Buffer (136bits)
(20211217/dsopcode-198)

The original code created a buffer size of 128 bytes regardless if
the WMI call required a smaller buffer or not.  This particular
behavior occurs in older BIOS and reproduced in OMEN laptops.  Newer
BIOS handles buffer sizes properly and meets the latest specification
requirements.  This is the reason why testing with a dynamically
allocated buffer did not uncover any failures with the test systems at
hand.

This patch was tested on several OMEN, Elite, and Zbooks.  It was
confirmed the patch resolves HPWMI_FAN GET/SET calls in an OMEN
Laptop 15-ek0xxx.  No problems were reported when testing on several Elite
and Zbooks notebooks.

Fixes: 4b4967cb ("platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated")
Signed-off-by: Jorge Lopez <jorge.lopez2@hp.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220608212923.8585-2-jorge.lopez2@hp.comReviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

dc6a6ab5

workqueue: Switch to new kerneldoc syntax for named variable macro argument · 8bee9dd9

Jonathan Neuschäfer authored Jun 10, 2022

The syntax without dots is available since commit 43756e34
("scripts/kernel-doc: Add support for named variable macro arguments").

The same HTML output is produced with and without this patch.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

8bee9dd9

11 Jun, 2022 9 commits

Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 7a68065e

Linus Torvalds authored Jun 11, 2022

Pull gpio fixes from Bartosz Golaszewski:
 "A set of fixes. Most address the new warning we emit at build time
  when irq chips are not immutable with some additional tweaks to
  gpio-crystalcove from Andy and a small tweak to gpio-dwapd.

   - make irq_chip structs immutable in several Diolan and intel drivers
     to get rid of the new warning we emit when fiddling with irq chips

   - don't print error messages on probe deferral in gpio-dwapb"

* tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: dwapb: Don't print error on -EPROBE_DEFER
  gpio: dln2: make irq_chip immutable
  gpio: sch: make irq_chip immutable
  gpio: merrifield: make irq_chip immutable
  gpio: wcove: make irq_chip immutable
  gpio: crystalcove: Join function declarations and long lines
  gpio: crystalcove: Use specific type and API for IRQ number
  gpio: crystalcove: make irq_chip immutable

7a68065e

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · cecb3540

Linus Torvalds authored Jun 11, 2022

Pull SCSI fixes from James Bottomley:
 "Driver fixes and and one core patch.

  Nine of the driver patches are minor fixes and reworks to lpfc and the
  rest are trivial and minor fixes elsewhere"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: pmcraid: Fix missing resource cleanup in error case
  scsi: ipr: Fix missing/incorrect resource cleanup in error case
  scsi: mpt3sas: Fix out-of-bounds compiler warning
  scsi: lpfc: Update lpfc version to 14.2.0.4
  scsi: lpfc: Allow reduced polling rate for nvme_admin_async_event cmd completion
  scsi: lpfc: Add more logging of cmd and cqe information for aborted NVMe cmds
  scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topology
  scsi: lpfc: Resolve NULL ptr dereference after an ELS LOGO is aborted
  scsi: lpfc: Address NULL pointer dereference after starget_to_rport()
  scsi: lpfc: Resolve some cleanup issues following SLI path refactoring
  scsi: lpfc: Resolve some cleanup issues following abort path refactoring
  scsi: lpfc: Correct BDE type for XMIT_SEQ64_WQE in lpfc_ct_reject_event()
  scsi: vmw_pvscsi: Expand vcpuHint to 16 bits
  scsi: sd: Fix interpretation of VPD B9h length

cecb3540

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · abe71eb3

Linus Torvalds authored Jun 11, 2022

Pull virtio fixes from Michael Tsirkin:
 "Fixes all over the place, most notably fixes for latent bugs in
  drivers that got exposed by suppressing interrupts before DRIVER_OK,
  which in turn has been done by 8b4ec69d ("virtio: harden vring
  IRQ")"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  um: virt-pci: set device ready in probe()
  vdpa: make get_vq_group and set_group_asid optional
  virtio: Fix all occurences of the "the the" typo
  vduse: Fix NULL pointer dereference on sysfs access
  vringh: Fix loop descriptors check in the indirect cases
  vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
  vdpa/mlx5: fix error code for deleting vlan
  virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed
  vdpa/mlx5: Fix syntax errors in comments
  virtio-rng: make device ready before making request

abe71eb3

Merge tag 'loongarch-fixes-5.19-1' of... · 0678afa6

Linus Torvalds authored Jun 11, 2022

Merge tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen.
 "Fix build errors and a stale comment"

* tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Remove MIPS comment about cycle counter
  LoongArch: Fix copy_thread() build errors
  LoongArch: Fix the !CONFIG_SMP build

0678afa6

iov_iter: fix build issue due to possible type mis-match · 1c27f1fc

Linus Torvalds authored Jun 11, 2022

Commit 6c776766 ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
introduced a problem on some 32-bit architectures (at least arm, xtensa,
csky,sparc and mips), that have a 'size_t' that is 'unsigned int'.

The reason is that we now do

    min(nr * PAGE_SIZE - offset, maxsize);

where 'nr' and 'offset' and both 'unsigned int', and PAGE_SIZE is
'unsigned long'.  As a result, the normal C type rules means that the
first argument to 'min()' ends up being 'unsigned long'.

In contrast, 'maxsize' is of type 'size_t'.

Now, 'size_t' and 'unsigned long' are always the same physical type in
the kernel, so you'd think this doesn't matter, and from an actual
arithmetic standpoint it doesn't.

But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
it could also be 'unsigned long'.  In that situation, both are unsigned
32-bit types, but they are not the *same* type.

And as a result 'min()' will complain about the distinct types (ignore
the "pointer types" part of the error message: that's an artifact of the
way we have made 'min()' check types for being the same):

  lib/iov_iter.c: In function 'iter_xarray_get_pages':
  include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
     20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
        |                                   ^~
  lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
   1464 |         return min(nr * PAGE_SIZE - offset, maxsize);
        |                ^~~

This was not visible on 64-bit architectures (where we always define
'size_t' to be 'unsigned long').

Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
and avoid the issue.

[ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
  long' arithmetically. We've certainly historically seen environments
  with 16-bit address spaces and 32-bit 'unsigned long'.

  Similarly, even in 64-bit modern environments, 'size_t' could be its
  own type distinct from 'unsigned long', even if it were arithmetically
  identical.

  So the above type commentary is only really descriptive of the kernel
  environment, not some kind of universal truth for the kinds of wild
  and crazy situations that are allowed by the C standard ]
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
Cc: Jeff Layton <jlayton@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1c27f1fc

wireguard: selftests: use maximum cpu features and allow rng seeding · 17b0128a

Jason A. Donenfeld authored Jun 10, 2022

By forcing the maximum CPU that QEMU has available, we expose additional
capabilities, such as the RNDR instruction, which increases test
coverage. This then allows the CI to skip the fake seeding step in some
cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
jump labels when the RNG is initialized at boot.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

17b0128a

scripts/gdb: change kernel config dumping method · 1f7a6cf6

Kuan-Ying Lee authored Jun 10, 2022

MAGIC_START("IKCFG_ST") and MAGIC_END("IKCFG_ED") are moved out
from the kernel_config_data variable.

Thus, we parse kernel_config_data directly instead of considering
offset of MAGIC_START and MAGIC_END.

Fixes: 13610aa9 ("kernel/configs: use .incbin directive to embed config_data.gz")
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

1f7a6cf6

um: virt-pci: set device ready in probe() · eacea844

Vincent Whitchurch authored Jun 10, 2022

Call virtio_device_ready() to make this driver work after commit
b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
virtqueues in the probe function.  (The virtio core sets the device
ready when probe returns.)

Fixes: 8b4ec69d ("virtio: harden vring IRQ")
Fixes: 68f5d3f3 ("um: add PCI over virtio emulation driver")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Johannes Berg <johannes@sipsolutions.net>

eacea844

Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 0885eacd

Linus Torvalds authored Jun 10, 2022

Pull nfsd fixes from Chuck Lever:
 "Notable changes:

   - There is now a backup maintainer for NFSD

  Notable fixes:

   - Prevent array overruns in svc_rdma_build_writes()

   - Prevent buffer overruns when encoding NFSv3 READDIR results

   - Fix a potential UAF in nfsd_file_put()"

* tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
  SUNRPC: Clean up xdr_get_next_encode_buffer()
  SUNRPC: Clean up xdr_commit_encode()
  SUNRPC: Optimize xdr_reserve_space()
  SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
  SUNRPC: Trap RDMA segment overflows
  NFSD: Fix potential use-after-free in nfsd_file_put()
  MAINTAINERS: reciprocal co-maintainership for file locking and nfsd

0885eacd

10 Jun, 2022 13 commits

cifs: populate empty hostnames for extra channels · 4c14d704

Shyam Prasad N authored Jun 06, 2022

Currently, the secondary channels of a multichannel session
also get hostname populated based on the info in primary channel.
However, this will end up with a wrong resolution of hostname to
IP address during reconnect.

This change fixes this by not populating hostname info for all
secondary channels.

Fixes: 5112d80c ("cifs: populate server_hostname for extra channels")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

4c14d704

Merge tag 'for-5.19/dm-fixes-2' of... · 90add6d4

Linus Torvalds authored Jun 10, 2022

Merge tag 'for-5.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM core's bioset initialization so that blk integrity pool is
   properly setup. Remove now unused bioset_init_from_src.

 - Fix DM zoned hang from locking imbalance due to needless check in
   clone_endio().

* tag 'for-5.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix zoned locking imbalance due to needless check in clone_endio
  block: remove bioset_init_from_src
  dm: fix bio_set allocation

90add6d4

Merge branch 'fscache-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 045fb9c2

Linus Torvalds authored Jun 10, 2022

Pull fscache cleanups from David Howells:

 - fix checker complaint in afs

 - two netfs cleanups:

    - netfs_inode calling convention cleanup plus the requisite
      documentation changes

    -  replace the ->cleanup op with a ->free_request op.

       This is possible as the I/O request is now always available at
       the cleanup point as the stuff to be cleaned up is no longer
       passed into the API functions, but rather obtained by ->init_request.

* 'fscache-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: Rename the netfs_io_request cleanup op and give it an op pointer
  netfs: Further cleanups after struct netfs_inode wrapper introduced
  afs: Fix some checker issues

045fb9c2

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · b0989159

Linus Torvalds authored Jun 10, 2022

Pull iov_iter fix from Al Viro:
 "ITER_XARRAY get_pages fix; now the return value is a lot saner (and
  more similar to logics for other flavours)"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  iov_iter: Fix iter_xarray_get_pages{,_alloc}()

b0989159

platform/x86: gigabyte-wmi: Add support for B450M DS3H-CF · c6bc7e8e

August Wikerfors authored Jun 08, 2022

Tested and works on my system.
Signed-off-by: August Wikerfors <git@augustwikerfors.se>
Link: https://lore.kernel.org/r/20220608212028.28307-1-git@augustwikerfors.seSigned-off-by: Hans de Goede <hdegoede@redhat.com>

c6bc7e8e

platform/x86: gigabyte-wmi: Add Z690M AORUS ELITE AX DDR4 support · 8a041afe

Piotr Chmura authored Jun 06, 2022

Add dmi_system_id of Gigabyte Z690M AORUS ELITE AX DDR4 board.
Tested on my PC.
Signed-off-by: Piotr Chmura <chmooreck@gmail.com>
Link: https://lore.kernel.org/r/bd83567e-ebf5-0b31-074b-5f6dc7f7c147@gmail.comSigned-off-by: Hans de Goede <hdegoede@redhat.com>

8a041afe

platform/x86: barco-p50-gpio: Add check for platform_driver_register · 011881b8

Jiasheng Jiang authored May 26, 2022

As platform_driver_register() could fail, it should be better
to deal with the return value in order to maintain the code
consisitency.

Fixes: 86af1d02 ("platform/x86: Support for EC-connected GPIOs for identify LED/button on Barco P50 board")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Peter Korsgaard <peter.korsgaard@barco.com>
Link: https://lore.kernel.org/r/20220526090345.1444172-1-jiasheng@iscas.ac.cnSigned-off-by: Hans de Goede <hdegoede@redhat.com>

011881b8

platform/x86/intel: pmc: Support Intel Raptorlake P · 552f3b80

George D Sworo authored Jun 01, 2022

Add Raptorlake P to the list of the platforms that intel_pmc_core driver
supports for pmc_core device. Raptorlake P PCH is based on Alderlake P
PCH.
Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220602012617.20100-1-george.d.sworo@intel.comSigned-off-by: Hans de Goede <hdegoede@redhat.com>

552f3b80

platform/x86/intel: Fix pmt_crashlog array reference · 66cb3a2d

David Arcari authored May 26, 2022

The probe function pmt_crashlog_probe() may incorrectly reference
the 'priv->entry array' as it uses 'i' to reference the array instead
of 'priv->num_entries' as it should.  This is similar to the problem
that was addressed in pmt_telemetry_probe via commit 2cdfa0c2
("platform/x86/intel: Fix 'rmmod pmt_telemetry' panic").

Cc: "David E. Box" <david.e.box@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <markgross@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David Arcari <darcari@redhat.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220526203140.339120-1-darcari@redhat.comSigned-off-by: Hans de Goede <hdegoede@redhat.com>

66cb3a2d

platform/mellanox: Add static in struct declaration. · b9c29f39

Michael Shych authored Jun 02, 2022

Fix problem of missing static in struct declaration.

Fixes: 662f2482 ("platform/mellanox: Add support for new SN2201 system")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/20220602145103.11859-1-michaelsh@nvidia.comSigned-off-by: Hans de Goede <hdegoede@redhat.com>

b9c29f39

iov_iter: Fix iter_xarray_get_pages{,_alloc}() · 6c776766

David Howells authored Jun 09, 2022

The maths at the end of iter_xarray_get_pages() to calculate the actual
size doesn't work under some circumstances, such as when it's been asked to
extract a partial single page.  Various terms of the equation cancel out
and you end up with actual == offset.  The same issue exists in
iter_xarray_get_pages_alloc().

Fix these to just use min() to select the lesser amount from between the
amount of page content transcribed into the buffer, minus the offset, and
the size limit specified.

This doesn't appear to have caused a problem yet upstream because network
filesystems aren't getting the pages from an xarray iterator, but rather
passing it directly to the socket, which just iterates over it.  Cachefiles
*does* do DIO from one to/from ext4/xfs/btrfs/etc. but it always asks for
whole pages to be written or read.

Fixes: 7ff50620 ("iov_iter: Add ITER_XARRAY")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: Gao Xiang <xiang@kernel.org>
cc: linux-afs@lists.infradead.org
cc: v9fs-developer@lists.sourceforge.net
cc: devel@lists.orangefs.org
cc: linux-erofs@lists.ozlabs.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6c776766

netfs: Rename the netfs_io_request cleanup op and give it an op pointer · 40a81101

David Howells authored Feb 25, 2022

The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).

So rename the ->cleanup op to ->free_request (to match ->init_request) and
pass in the I/O pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

40a81101

netfs: Further cleanups after struct netfs_inode wrapper introduced · e81fb419

Linus Torvalds authored Jun 09, 2022

Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up.  For
type safety, it's better not to do that (and it's less typing too).

Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer.  Note that the ->write_begin() and ->write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.

netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.

Changes
=======
- Updated the kerneldoc comments and documentation [DH].
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/

e81fb419