Commits · 9d246053a69196c7c27068870e9b4b66ac536f68 · Kirill Smelkov / linux

08 Jul, 2020 10 commits

sched: Add a tracepoint to track rq->nr_running · 9d246053

Phil Auld authored Jun 29, 2020

Add a bare tracepoint trace_sched_update_nr_running_tp which tracks
->nr_running CPU's rq. This is used to accurately trace this data and
provide a visualization of scheduler imbalances in, for example, the
form of a heat map.  The tracepoint is accessed by loading an external
kernel module. An example module (forked from Qais' module and including
the pelt related tracepoints) can be found at:

  https://github.com/auldp/tracepoints-helpers.git

A script to turn the trace-cmd report output into a heatmap plot can be
found at:

  https://github.com/jirvoz/plot-nr-running

The tracepoints are added to add_nr_running() and sub_nr_running() which
are in kernel/sched/sched.h. In order to avoid CREATE_TRACE_POINTS in
the header a wrapper call is used and the trace/events/sched.h include
is moved before sched.h in kernel/sched/core.
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200629192303.GC120228@lorien.usersys.redhat.com

9d246053

net: Restrict receive packets queuing to housekeeping CPUs · 07bbecb3

Alex Belits authored Jun 25, 2020

With the existing implementation of store_rps_map(), packets are queued
in the receive path on the backlog queues of other CPUs irrespective of
whether they are isolated or not. This could add a latency overhead to
any RT workload that is running on the same CPU.

Ensure that store_rps_map() only uses available housekeeping CPUs for
storing the rps_map.
Signed-off-by: Alex Belits <abelits@marvell.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200625223443.2684-4-nitesh@redhat.com

07bbecb3

PCI: Restrict probe functions to housekeeping CPUs · 69a18b18

Alex Belits authored Jun 25, 2020

pci_call_probe() prevents the nesting of work_on_cpu() for a scenario
where a VF device is probed from work_on_cpu() of the PF.

Replace the cpumask used in pci_call_probe() from all online CPUs to only
housekeeping CPUs. This is to ensure that there are no additional latency
overheads caused due to the pinning of jobs on isolated CPUs.
Signed-off-by: Alex Belits <abelits@marvell.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/20200625223443.2684-3-nitesh@redhat.com

69a18b18

lib: Restrict cpumask_local_spread to houskeeping CPUs · 1abdfe70

Alex Belits authored Jun 25, 2020

The current implementation of cpumask_local_spread() does not respect the
isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task,
it will return it to the caller for pinning of its IRQ threads. Having
these unwanted IRQ threads on an isolated CPU adds up to a latency
overhead.

Restrict the CPUs that are returned for spreading IRQs only to the
available housekeeping CPUs.
Signed-off-by: Alex Belits <abelits@marvell.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200625223443.2684-2-nitesh@redhat.com

1abdfe70

sched/uclamp: Protect uclamp fast path code with static key · 46609ce2

Qais Yousef authored Jun 30, 2020

There is a report that when uclamp is enabled, a netperf UDP test
regresses compared to a kernel compiled without uclamp.

https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/

While investigating the root cause, there were no sign that the uclamp
code is doing anything particularly expensive but could suffer from bad
cache behavior under certain circumstances that are yet to be
understood.

https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

To reduce the pressure on the fast path anyway, add a static key that is
by default will skip executing uclamp logic in the
enqueue/dequeue_task() fast path until it's needed.

As soon as the user start using util clamp by:

1. Changing uclamp value of a task with sched_setattr()
2. Modifying the default sysctl_sched_util_clamp_{min, max}
3. Modifying the default cpu.uclamp.{min, max} value in cgroup

We flip the static key now that the user has opted to use util clamp.
Effectively re-introducing uclamp logic in the enqueue/dequeue_task()
fast path. It stays on from that point forward until the next reboot.

This should help minimize the effect of util clamp on workloads that
don't need it but still allow distros to ship their kernels with uclamp
compiled in by default.

SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end
up with unbalanced call to uclamp_rq_dec_id() if we flip the key while
a task is running in the rq. Since we know it is harmless we just
quietly return if we attempt a uclamp_rq_dec_id() when
rq->uclamp[].bucket[].tasks is 0.

In schedutil, we introduce a new uclamp_is_enabled() helper which takes
the static key into account to ensure RT boosting behavior is retained.

The following results demonstrates how this helps on 2 Sockets Xeon E5
2x10-Cores system.

nouclamp uclamp uclamp-static-key
Hmean send-64 162.43 ( 0.00%) 157.84 * -2.82%* 163.39 * 0.59%*
Hmean send-128 324.71 ( 0.00%) 314.78 * -3.06%* 326.18 * 0.45%*
Hmean send-256 641.55 ( 0.00%) 628.67 * -2.01%* 648.12 * 1.02%*
Hmean send-1024 2525.28 ( 0.00%) 2448.26 * -3.05%* 2543.73 * 0.73%*
Hmean send-2048 4836.14 ( 0.00%) 4712.08 * -2.57%* 4867.69 * 0.65%*
Hmean send-3312 7540.83 ( 0.00%) 7425.45 * -1.53%* 7621.06 * 1.06%*
Hmean send-4096 9124.53 ( 0.00%) 8948.82 * -1.93%* 9276.25 * 1.66%*
Hmean send-8192 15589.67 ( 0.00%) 15486.35 * -0.66%* 15819.98 * 1.48%*
Hmean send-16384 26386.47 ( 0.00%) 25752.25 * -2.40%* 26773.74 * 1.47%*

The perf diff between nouclamp and uclamp-static-key when uclamp is
disabled in the fast path:

8.73% -1.55% [kernel.kallsyms] [k] try_to_wake_up
0.07% +0.04% [kernel.kallsyms] [k] deactivate_task
0.13% -0.02% [kernel.kallsyms] [k] activate_task

The diff between nouclamp and uclamp-static-key when uclamp is enabled
in the fast path:

8.73% -0.72% [kernel.kallsyms] [k] try_to_wake_up
0.13% +0.39% [kernel.kallsyms] [k] activate_task
0.07% +0.38% [kernel.kallsyms] [k] deactivate_task

Fixes: 69842cba ("sched/uclamp: Add CPU's clamp buckets refcounting")
Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/20200630112123.12076-3-qais.yousef@arm.com

46609ce2

sched/uclamp: Fix initialization of struct uclamp_rq · d81ae8aa

Qais Yousef authored Jun 30, 2020

struct uclamp_rq was zeroed out entirely in assumption that in the first
call to uclamp_rq_inc() they'd be initialized correctly in accordance to
default settings.

But when next patch introduces a static key to skip
uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
will fail to perform any frequency changes because the
rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such. Which
means all rqs are capped to 0 by default.

Fix it by making sure we do proper initialization at init without
relying on uclamp_rq_inc() doing it later.

Fixes: 69842cba ("sched/uclamp: Add CPU's clamp buckets refcounting")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/20200630112123.12076-2-qais.yousef@arm.com

d81ae8aa

sched, vmlinux.lds: Increase STRUCT_ALIGNMENT to 64 bytes for GCC-4.9 · 85c2ce91

Peter Zijlstra authored Jun 30, 2020

For some mysterious reason GCC-4.9 has a 64 byte section alignment for
structures, all other GCC versions (and Clang) tested (including 4.8
and 5.0) are fine with the 32 bytes alignment.

Getting this right is important for the new SCHED_DATA macro that
creates an explicitly ordered array of 'struct sched_class' in the
linker script and expect pointer arithmetic to work.

Fixes: c3a340f7 ("sched: Have sched_class_highest define by vmlinux.lds.h")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200630144905.GX4817@hirez.programming.kicks-ass.net

85c2ce91

Merge branch 'sched/urgent' · faa2fd7c
Peter Zijlstra authored Jul 08, 2020

faa2fd7c

sched: Fix unreliable rseq cpu_id for new tasks · ce3614da

Mathieu Desnoyers authored Jul 06, 2020

While integrating rseq into glibc and replacing glibc's sched_getcpu
implementation with rseq, glibc's tests discovered an issue with
incorrect __rseq_abi.cpu_id field value right after the first time
a newly created process issues sched_setaffinity.

For the records, it triggers after building glibc and running tests, and
then issuing:

  for x in {1..2000} ; do posix/tst-affinity-static  & done

and shows up as:

error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0

This is caused by the scheduler invoking __set_task_cpu() directly from
sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which
is done by set_task_cpu().

Add the missing rseq_migrate() to both functions. The only other direct
use of __set_task_cpu() is done by init_idle(), which does not involve a
user-space task.

Based on my testing with the glibc test-case, just adding rseq_migrate()
to wake_up_new_task() is sufficient to fix the observed issue. Also add
it to sched_fork() to keep things consistent.

The reason why this never triggered so far with the rseq/basic_test
selftest is unclear.

The current use of sched_getcpu(3) does not typically require it to be
always accurate. However, use of the __rseq_abi.cpu_id field within rseq
critical sections requires it to be accurate. If it is not accurate, it
can cause corruption in the per-cpu data targeted by rseq critical
sections in user-space.
Reported-By: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-By: Florian Weimer <fweimer@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com

ce3614da

sched: Fix loadavg accounting race · dbfb089d

Peter Zijlstra authored Jul 03, 2020

The recent commit:

  c6e7bd7a ("sched/core: Optimize ttwu() spinning on p->on_cpu")

moved these lines in ttwu():

	p->sched_contributes_to_load = !!task_contributes_to_load(p);
	p->state = TASK_WAKING;

up before:

	smp_cond_load_acquire(&p->on_cpu, !VAL);

into the 'p->on_rq == 0' block, with the thinking that once we hit
schedule() the current task cannot change it's ->state anymore. And
while this is true, it is both incorrect and flawed.

It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0'
to avoid weak hardware from re-ordering things for us. This can fairly
easily be achieved by relying on the control-dependency already in
place.

The second problem, which makes the flaw in the original argument, is
that while schedule() will not change prev->state, it will read it a
number of times (arguably too many times since it's marked volatile).
The previous condition 'p->on_cpu == 0' was sufficient because that
indicates schedule() has completed, and will no longer read
prev->state. So now the trick is to make this same true for the (much)
earlier 'prev->on_rq == 0' case.

Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0'
assignment needs to he a RELEASE, but adding additional ordering to
schedule() is an unwelcome proposition at the best of times, doubly so
for mere accounting.

Luckily we can push the prev->state load up before rq->lock, with the
only caveat that we then have to re-read the state after. However, we
know that if it changed, we no longer have to worry about the blocking
path. This gives us the required ordering, if we block, we did the
prev->state load before an (effective) smp_mb() and the p->on_rq store
needs not change.

With this we end up with the effective ordering:

	LOAD p->state           LOAD-ACQUIRE p->on_rq == 0
	MB
	STORE p->on_rq, 0       STORE p->state, TASK_WAKING

which ensures the TASK_WAKING store happens after the prev->state
load, and all is well again.

Fixes: c6e7bd7a ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net

dbfb089d

05 Jul, 2020 14 commits

Linux 5.8-rc4 · dcb7fd82
Linus Torvalds authored Jul 05, 2020

dcb7fd82

x86/ldt: use "pr_info_once()" instead of open-coding it badly · bb5a93aa

Linus Torvalds authored Jul 05, 2020

Using a mutex for "print this warning only once" is so overdesigned as
to be actively offensive to my sensitive stomach.

Just use "pr_info_once()" that already does this, although in a
(harmlessly) racy manner that can in theory cause the message to be
printed twice if more than one CPU races on that "is this the first
time" test.

[ If somebody really cares about that harmless data race (which sounds
  very unlikely indeed), that person can trivially fix printk_once() by
  using a simple atomic access, preferably with an optimistic non-atomic
  test first before even bothering to treat the pointless "make sure it
  is _really_ just once" case.

  A mutex is most definitely never the right primitive to use for
  something like this. ]

Yes, this is a small and meaningless detail in a code path that hardly
matters.  But let's keep some code quality standards here, and not
accept outrageously bad code.

Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bb5a93aa

Merge tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 72674d48

Linus Torvalds authored Jul 05, 2020

Pull x86 fixes from Thomas Gleixner:
 "A series of fixes for x86:

   - Reset MXCSR in kernel_fpu_begin() to prevent using a stale user
     space value.

   - Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
     whitelisted for split lock detection. Some CPUs which do not
     support it crash even when the MSR is written to 0 which is the
     default value.

   - Fix the XEN PV fallout of the entry code rework

   - Fix the 32bit fallout of the entry code rework

   - Add more selftests to ensure that these entry problems don't come
     back.

   - Disable 16 bit segments on XEN PV. It's not supported because XEN
     PV does not implement ESPFIX64"

* tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Disable 16-bit segments on Xen PV
  x86/entry/32: Fix #MC and #DB wiring on x86_32
  x86/entry/xen: Route #DB correctly on Xen PV
  x86/entry, selftests: Further improve user entry sanity checks
  x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
  selftests/x86: Consolidate and fix get/set_eflags() helpers
  selftests/x86/syscall_nt: Clear weird flags after each test
  selftests/x86/syscall_nt: Add more flag combinations
  x86/entry/64/compat: Fix Xen PV SYSENTER frame setup
  x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C
  x86/entry: Assert that syscalls are on the right stack
  x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  x86/fpu: Reset MXCSR to default in kernel_fpu_begin()

72674d48

Merge tag 'irq-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f23dbe18

Linus Torvalds authored Jul 05, 2020

Pull irq fixes from Thomas Gleixner:
 "A set of interrupt chip driver fixes:

   - Ensure the atomicity of affinity updates in the GIC driver

   - Don't try to sleep in atomic context when waiting for the GICv4.1
     to respond. Use polling instead.

   - Typo fixes in Kconfig and warnings"

* tag 'irq-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic: Atomically update affinity
  irqchip/riscv-intc: Fix a typo in a pr_warn()
  irqchip/gic-v4.1: Use readx_poll_timeout_atomic() to fix sleep in atomic
  irqchip/loongson-pci-msi: Fix a typo in Kconfig

f23dbe18

Merge tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5465a324

Linus Torvalds authored Jul 05, 2020

Pull rcu fixlet from Thomas Gleixner:
 "A single fix for a printk format warning in RCU"

* tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcuperf: Fix printk format warning

5465a324

Merge tag 'kbuild-fixes-v5.8-2' of... · 4bc92736

Linus Torvalds authored Jul 05, 2020

Merge tag 'kbuild-fixes-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes frin Masahiro Yamada:

 - fix various bugs in xconfig

 - fix some issues in cross-compilation using Clang

 - fix documentation

* tag 'kbuild-fixes-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  .gitignore: Do not track `defconfig` from `make savedefconfig`
  kbuild: make Clang build userprogs for target architecture
  kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang
  kconfig: qconf: parse newer types at debug info
  kconfig: qconf: navigate menus on hyperlinks
  kconfig: qconf: don't show goback button on splitMode
  kconfig: qconf: simplify the goBack() logic
  kconfig: qconf: re-implement setSelected()
  kconfig: qconf: make debug links work again
  kconfig: qconf: make search fully work again on split mode
  kconfig: qconf: cleanup includes
  docs: kbuild: fix ReST formatting
  gcc-plugins: fix gcc-plugins directory path in documentation

4bc92736

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 19a61a75

Linus Torvalds authored Jul 05, 2020

Pull SCSI fixes from James Bottomley:
 "Four small fixes in three drivers.

  The mptfusion one has actually caused user visible issues in certain
  kernel configurations"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mptfusion: Don't use GFP_ATOMIC for larger DMA allocations
  scsi: libfc: Skip additional kref updating work event
  scsi: libfc: Handling of extra kref
  scsi: qla2xxx: Fix a condition in qla2x00_find_all_fabric_devs()

19a61a75

Merge tag 'block-5.8-2020-07-05' of git://git.kernel.dk/linux-block · 29206c63

Linus Torvalds authored Jul 05, 2020

Pull block fixes from Jens Axboe:

 - NVMe fixes from Christoph:
    - Fix crash in multi-path disk add (Christoph)
    - Fix ignore of identify error (Sagi)

 - Fix a compiler complaint that a function should be static (Wei)

* tag 'block-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
  block: make function __bio_integrity_free() static
  nvme: fix a crash in nvme_mpath_add_disk
  nvme: fix identify error status silent ignore

29206c63

Merge tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block · 9fbe565c

Linus Torvalds authored Jul 05, 2020

Pull io_uring fix from Jens Axboe:
 "Andres reported a regression with the fix that was merged earlier this
  week, where his setup of using signals to interrupt io_uring CQ waits
  no longer worked correctly.

  Fix this, and also limit our use of TWA_SIGNAL to the case where we
  need it, and continue using TWA_RESUME for task_work as before.

  Since the original is marked for 5.7 stable, let's flush this one out
  early"

* tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
  io_uring: fix regression with always ignoring signals in io_cqring_wait()

9fbe565c

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 77834854

Linus Torvalds authored Jul 05, 2020

Pull i2c fixes from Wolfram Sang:
 "The usual driver fixes and documentation updates"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mlxcpld: check correct size of maximum RECV_LEN packet
  i2c: add Kconfig help text for slave mode
  i2c: slave-eeprom: update documentation
  i2c: eg20t: Load module automatically if ID matches
  i2c: designware: platdrv: Set class based on DMI
  i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665

77834854

Merge tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 45a5ac7a

Linus Torvalds authored Jul 05, 2020

Pull MIPS fixes from Thomas Bogendoerfer:

 - fix for missing hazard barrier

 - DT fix for ingenic

 - DT fix of GPHY names for lantiq

 - fix usage of smp_processor_id() while preemption is enabled

* tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Do not use smp_processor_id() in preemptible code
  MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
  MIPS: ingenic: gcw0: Fix HP detection GPIO.
  MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names

45a5ac7a

MIPS: Do not use smp_processor_id() in preemptible code · 5868347a

Xingxing Su authored Jul 03, 2020

Use preempt_disable() to fix the following bug under CONFIG_DEBUG_PREEMPT.

[   21.915305] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-mip/1056
[   21.923996] caller is do_ri+0x1d4/0x690
[   21.927921] CPU: 0 PID: 1056 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
[   21.934913] Stack : 0000000000000001 ffffffff81370000 ffffffff8071cd60 a80f926d5ac95694
[   21.942984]         a80f926d5ac95694 0000000000000000 98000007f0043c88 ffffffff80f2fe40
[   21.951054]         0000000000000000 0000000000000000 0000000000000001 0000000000000000
[   21.959123]         ffffffff802d60cc 98000007f0043dd8 ffffffff81f4b1e8 ffffffff81f60000
[   21.967192]         ffffffff81f60000 ffffffff80fe0000 ffff000000000000 0000000000000000
[   21.975261]         fffffffff500cce1 0000000000000001 0000000000000002 0000000000000000
[   21.983331]         ffffffff80fe1a40 0000000000000006 ffffffff8077f940 0000000000000000
[   21.991401]         ffffffff81460000 98000007f0040000 98000007f0043c80 000000fffba8cf20
[   21.999471]         ffffffff8071cd60 0000000000000000 0000000000000000 0000000000000000
[   22.007541]         0000000000000000 0000000000000000 ffffffff80212ab4 a80f926d5ac95694
[   22.015610]         ...
[   22.018086] Call Trace:
[   22.020562] [<ffffffff80212ab4>] show_stack+0xa4/0x138
[   22.025732] [<ffffffff8071cd60>] dump_stack+0xf0/0x150
[   22.030903] [<ffffffff80c73f5c>] check_preemption_disabled+0xf4/0x100
[   22.037375] [<ffffffff80213b84>] do_ri+0x1d4/0x690
[   22.042198] [<ffffffff8020b828>] handle_ri_int+0x44/0x5c
[   24.359386] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-mip/1072
[   24.368204] caller is do_ri+0x1a8/0x690
[   24.372169] CPU: 4 PID: 1072 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
[   24.379170] Stack : 0000000000000001 ffffffff81370000 ffffffff8071cd60 a80f926d5ac95694
[   24.387246]         a80f926d5ac95694 0000000000000000 98001007ef06bc88 ffffffff80f2fe40
[   24.395318]         0000000000000000 0000000000000000 0000000000000001 0000000000000000
[   24.403389]         ffffffff802d60cc 98001007ef06bdd8 ffffffff81f4b818 ffffffff81f60000
[   24.411461]         ffffffff81f60000 ffffffff80fe0000 ffff000000000000 0000000000000000
[   24.419533]         fffffffff500cce1 0000000000000001 0000000000000002 0000000000000000
[   24.427603]         ffffffff80fe0000 0000000000000006 ffffffff8077f940 0000000000000020
[   24.435673]         ffffffff81460020 98001007ef068000 98001007ef06bc80 000000fffbbbb370
[   24.443745]         ffffffff8071cd60 0000000000000000 0000000000000000 0000000000000000
[   24.451816]         0000000000000000 0000000000000000 ffffffff80212ab4 a80f926d5ac95694
[   24.459887]         ...
[   24.462367] Call Trace:
[   24.464846] [<ffffffff80212ab4>] show_stack+0xa4/0x138
[   24.470029] [<ffffffff8071cd60>] dump_stack+0xf0/0x150
[   24.475208] [<ffffffff80c73f5c>] check_preemption_disabled+0xf4/0x100
[   24.481682] [<ffffffff80213b58>] do_ri+0x1a8/0x690
[   24.486509] [<ffffffff8020b828>] handle_ri_int+0x44/0x5c
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

5868347a

MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen · fcec538e

Hauke Mehrtens authored Jul 03, 2020

This resolves the hazard between the mtc0 in the change_c0_status() and
the mfc0 in configure_exception_vector(). Without resolving this hazard
configure_exception_vector() could read an old value and would restore
this old value again. This would revert the changes change_c0_status()
did. I checked this by printing out the read_c0_status() at the end of
per_cpu_trap_init() and the ST0_MX is not set without this patch.

The hazard is documented in the MIPS Architecture Reference Manual Vol.
III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
6.03 table 8.1 which includes:

   Producer | Consumer | Hazard
  ----------|----------|----------------------------
   mtc0     | mfc0     | any coprocessor 0 register

I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
There the change_c0_status() function would activate the DSPen by
setting ST0_MX in the c0_status register. This was reverted and then the
system got a DSP exception when the DSP registers were saved in
save_dsp() in the first process switch. The crash looks like this:

[    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.107070] Kernel panic - not syncing: Unexpected DSP exception
[    0.113470] Rebooting in 1 seconds..

We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
did not happen.

In the kernel I bisected this problem to commit 9012d011 ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
reverted it also happened after commit 172dcd93 ("MIPS: Always
allocate exception vector for MIPSr2+").

Commit 0b24cae4 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
does similar changes to a different file. I am not sure if there are
more places affected by this problem.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

fcec538e

.gitignore: Do not track `defconfig` from `make savedefconfig` · ba77dca5

Paul Menzel authored Jul 02, 2020

Running `make savedefconfig` creates by default `defconfig`, which is,
currently, on git’s radar, for example, `git status` lists this file as
untracked.

So, add the file to `.gitignore`, so it’s ignored by git.
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

ba77dca5

04 Jul, 2020 16 commits

Merge tag 'powerpc-5.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 9bc0b029

Linus Torvalds authored Jul 04, 2020

Pull powerpc fixes from Michael Ellerman:
 "One fix for a regression in our pkey handling, which exhibits as
  PROT_EXEC mappings taking continuous page faults.

  Thanks to: Jan Stancek, Aneesh Kumar K.V"

* tag 'powerpc-5.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/pkeys: Make pkey access check work on execute_only_key

9bc0b029

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · ec84c3f6

Linus Torvalds authored Jul 04, 2020

Pull arm64 fixes from Will Deacon:
 "Nothing earth-shattering, really - some CPU errata workarounds (one
  day they'll get it right, ha!) and a fix for a boot failure with very
  large kernel images where the alternative patching gets confused when
  patching relative branches using veneers.

   - Fix alternative patching for very large kernel images and modules

   - Hook up existing CPU errata workarounds for Qualcomm Kryo CPUs"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add KRYO4XX silver CPU cores to erratum list 1530923 and 1024718
  arm64: Add KRYO4XX gold CPU cores to erratum list 1463225 and 1418040
  arm64: Add MIDR value for KRYO4XX gold CPU cores
  arm64/alternatives: use subsections for replacement sequences

ec84c3f6

io_uring: fix regression with always ignoring signals in io_cqring_wait() · b7db41c9

Jens Axboe authored Jul 04, 2020

When switching to TWA_SIGNAL for task_work notifications, we also made
any signal based condition in io_cqring_wait() return -ERESTARTSYS.
This breaks applications that rely on using signals to abort someone
waiting for events.

Check if we have a signal pending because of queued task_work, and
repeat the signal check once we've run the task_work. This provides a
reliable way of telling the two apart.

Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
we don't have the dependency situation described in the original commit,
and we can get by with just using TWA_RESUME like we previously did.

Fixes: ce593a6c ("io_uring: use signal based task_work running")
Cc: stable@vger.kernel.org # v5.7
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7db41c9

x86/ldt: Disable 16-bit segments on Xen PV · cc801833

Andy Lutomirski authored Jul 03, 2020

Xen PV doesn't implement ESPFIX64, so they don't work right. Disable
them. Also print a warning the first time anyone tries to use a
16-bit segment on a Xen PV guest that would otherwise allow it
to help people diagnose this change in behavior.

This gets us closer to having all x86 selftests pass on Xen PV.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/92b2975459dfe5929ecf34c3896ad920bd9e3f2d.1593795633.git.luto@kernel.org

cc801833

x86/entry/32: Fix #MC and #DB wiring on x86_32 · 13cbc0cd

Andy Lutomirski authored Jul 03, 2020

DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW
on x86_32, but the code expected them to be RAW.

Get rid of all the macro indirection for them on 32-bit and just use
DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly.

Also add a warning to make sure that we only hit the _kernel paths
in kernel mode.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e90a7ee8e72fd757db6d92e1e5ff16339c1ecf9.1593795633.git.luto@kernel.org

13cbc0cd

x86/entry/xen: Route #DB correctly on Xen PV · f41f0824

Andy Lutomirski authored Jul 03, 2020

On Xen PV, #DB doesn't use IST. It still needs to be correctly routed
depending on whether it came from user or kernel mode.

Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the
logic. Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on
Xen, and do the right thing for #DB. Also add more warnings to the
exc_debug* handlers to make this type of failure more obvious.

This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.

Fixes: 4c0dcd83 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org

f41f0824

x86/entry, selftests: Further improve user entry sanity checks · 3c73b81a

Andy Lutomirski authored Jul 03, 2020

Chasing down a Xen bug caused me to realize that the new entry sanity
checks are still fairly weak. Add some more checks.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/881de09e786ab93ce56ee4a2437ba2c308afe7a9.1593795633.git.luto@kernel.org

3c73b81a

x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER · db5b2c5a

Andy Lutomirski authored Jul 03, 2020

Move the clearing of the high bits of RAX after Xen PV joins the SYSENTER
path so that Xen PV doesn't skip it.

Arguably this code should be deleted instead, but that would belong in the
merge window.

Fixes: ffae641f ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9d33b3f3216dcab008070f1c28b6091ae7199969.1593795633.git.luto@kernel.org

db5b2c5a

Merge tag 'for-linus-5.8b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 35e884f8

Linus Torvalds authored Jul 03, 2020

Pull xen fixes from Juergen Gross:
 "One small cleanup patch for ARM and two patches for the xenbus driver
  fixing latent problems (large stack allocations and bad return code
  settings)"

* tag 'for-linus-5.8b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: let xenbus_map_ring_valloc() return errno values only
  xen/xenbus: avoid large structs and arrays on the stack
  arm/xen: remove the unused macro GRANT_TABLE_PHYSADDR

35e884f8

i2c: mlxcpld: check correct size of maximum RECV_LEN packet · 59791128

Wolfram Sang authored Jun 28, 2020

I2C_SMBUS_BLOCK_MAX defines already the maximum number as defined in the
SMBus 2.0 specs. I don't see a reason to add 1 here. Also, fix the errno
to what is suggested for this error.

Fixes: c9bfdc7c ("i2c: mlxcpld: Add support for smbus block read transaction")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Michael Shych <michaelsh@mellanox.com>
Tested-by: Michael Shych <michaelsh@mellanox.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

59791128

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 8b082a41

Linus Torvalds authored Jul 03, 2020

Pull sysctl fix from Al Viro:
 "Another regression fix for sysctl changes this cycle..."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Call sysctl_head_finish on error

8b082a41

i2c: add Kconfig help text for slave mode · 58e64b05

Wolfram Sang authored Jun 28, 2020

I can't recall why there was none, but we surely want to have it.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

58e64b05

i2c: slave-eeprom: update documentation · 59d3d604

Wolfram Sang authored Jun 28, 2020

Add more details which have either been missing ever since or describe
recent additions.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

59d3d604

i2c: eg20t: Load module automatically if ID matches · 5f90786b

Andy Shevchenko authored Jul 02, 2020

The driver can't be loaded automatically because it misses
module alias to be provided. Add corresponding MODULE_DEVICE_TABLE()
call to the driver.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

5f90786b

i2c: designware: platdrv: Set class based on DMI · db2a8b6f

Ricardo Ribalda authored Jul 02, 2020

Current AMD's zen-based APUs use this core for some of its i2c-buses.

With this patch we re-enable autodetection of hwmon-alike devices, so
lm-sensors will be able to work automatically.

It does not affect the boot-time of embedded devices, as the class is
set based on the DMI information.

DMI is probed only on Qtechnology QT5222 Industrial Camera Platform.

DocLink: https://qtec.com/camera-technology-camera-platforms/
Fixes: 3eddad96 ("i2c: designware: reverts "i2c: designware: Add support for AMD I2C controller"")
Signed-off-by: Ricardo Ribalda <ribalda@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

db2a8b6f

i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 · cd217f23

Chris Packham authored Jul 02, 2020

The PCA9665 datasheet says that I2CSTA = 78h indicates that SCL is stuck
low, this differs to the PCA9564 which uses 90h for this indication.
Treat either 0x78 or 0x90 as an indication that the SCL line is stuck.

Based on looking through the PCA9564 and PCA9665 datasheets this should
be safe for both chips. The PCA9564 should not return 0x78 for any valid
state and the PCA9665 should not return 0x90.

Fixes: eff9ec95 ("i2c-algo-pca: Add PCA9665 support")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

cd217f23