Commits · 8b24e69fc47e43679bb29ddb481aa0e8dce2a3c5 · nexedi / linux

01 Jul, 2017 2 commits

KVM: PPC: Book3S HV: Close race with testing for signals on guest entry · 8b24e69f

Paul Mackerras authored Jun 26, 2017

At present, interrupts are hard-disabled fairly late in the guest
entry path, in the assembly code. Since we check for pending signals
for the vCPU(s) task(s) earlier in the guest entry path, it is
possible for a signal to be delivered before we enter the guest but
not be noticed until after we exit the guest for some other reason.

Similarly, it is possible for the scheduler to request a reschedule
while we are in the guest entry path, and we won't notice until after
we have run the guest, potentially for a whole timeslice.

Furthermore, with a radix guest on POWER9, we can take the interrupt
with the MMU on. In this case we end up leaving interrupts
hard-disabled after the guest exit, and they are likely to stay
hard-disabled until we exit to userspace or context-switch to
another process. This was masking the fact that we were also not
setting the RI (recoverable interrupt) bit in the MSR, meaning
that if we had taken an interrupt, it would have crashed the host
kernel with an unrecoverable interrupt message.

To close these races, we need to check for signals and reschedule
requests after hard-disabling interrupts, and then keep interrupts
hard-disabled until we enter the guest. If there is a signal or a
reschedule request from another CPU, it will send an IPI, which will
cause a guest exit.

This puts the interrupt disabling before we call kvmppc_start_thread()
for all the secondary threads of this core that are going to run vCPUs.
The reason for that is that once we have started the secondary threads
there is no easy way to back out without going through at least part
of the guest entry path. However, kvmppc_start_thread() includes some
code for radix guests which needs to call smp_call_function(), which
must be called with interrupts enabled. To solve this problem, this
patch moves that code into a separate function that is called earlier.

When the guest exit is caused by an external interrupt, a hypervisor
doorbell or a hypervisor maintenance interrupt, we now handle these
using the replay facility. __kvmppc_vcore_entry() now returns the
trap number that caused the exit on this thread, and instead of the
assembly code jumping to the handler entry, we return to C code with
interrupts still hard-disabled and set the irq_happened flag in the
PACA, so that when we do local_irq_enable() the appropriate handler
gets called.

With all this, we now have the interrupt soft-enable flag clear while
we are in the guest. This is useful because code in the real-mode
hypercall handlers that checks whether interrupts are enabled will
now see that they are disabled, which is correct, since interrupts
are hard-disabled in the real-mode code.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

8b24e69f

KVM: PPC: Book3S HV: Simplify dynamic micro-threading code · 898b25b2

Paul Mackerras authored Jun 22, 2017

Since commit b009031f ("KVM: PPC: Book3S HV: Take out virtual
core piggybacking code", 2016-09-15), we only have at most one
vcore per subcore.  Previously, the fact that there might be more
than one vcore per subcore meant that we had the notion of a
"master vcore", which was the vcore that controlled thread 0 of
the subcore.  We also needed a list per subcore in the core_info
struct to record which vcores belonged to each subcore.  Now that
there can only be one vcore in the subcore, we can replace the
list with a simple pointer and get rid of the notion of the
master vcore (and in fact treat every vcore as a master vcore).

We can also get rid of the subcore_vm[] field in the core_info
struct since it is never read.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

898b25b2

22 Jun, 2017 2 commits

KVM: PPC: Book3S HV: Add capability to report possible virtual SMT modes · 2ed4f9dd

Paul Mackerras authored Jun 21, 2017

Now that userspace can set the virtual SMT mode by enabling the
KVM_CAP_PPC_SMT capability, it is useful for userspace to be able
to query the set of possible virtual SMT modes.  This provides a
new capability, KVM_CAP_PPC_SMT_POSSIBLE, to provide this
information.  The return value is a bitmap of possible modes, with
bit N set if virtual SMT mode 2^N is available.  That is, 1 indicates
SMT1 is available, 2 indicates that SMT2 is available, 3 indicates
that both SMT1 and SMT2 are available, and so on.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2ed4f9dd

KVM: PPC: Book3S HV: Exit guest upon MCE when FWNMI capability is enabled · e20bbd3d

Aravinda Prasad authored May 11, 2017

Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reason upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering a 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivery of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Note:

This patch now directly invokes machine_check_print_event_info()
from kvmppc_handle_exit_hv() to print the event to host console
at the time of guest exit before the exception is passed on to the
guest. Hence, the host-side handling which was performed earlier
via machine_check_fwnmi is removed.

The reasons for this approach is (i) it is not possible
to distinguish whether the exception occurred in the
guest or the host from the pt_regs passed on the
machine_check_exception(). Hence machine_check_exception()
calls panic, instead of passing on the exception to
the guest, if the machine check exception is not
recoverable. (ii) the approach introduced in this
patch gives opportunity to the host kernel to perform
actions in virtual mode before passing on the exception
to the guest. This approach does not require complex
tweaks to machine_check_fwnmi and friends.
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

e20bbd3d

21 Jun, 2017 2 commits

powerpc/book3s: EXPORT_SYMBOL_GPL machine_check_print_event_info · 8aa586c6

Mahesh Salgaonkar authored May 11, 2017

It will be used in arch/powerpc/kvm/book3s_hv.c KVM module.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

8aa586c6

KVM: PPC: Book3S HV: Add new capability to control MCE behaviour · 134764ed

Aravinda Prasad authored May 11, 2017

This introduces a new KVM capability to control how KVM behaves
on machine check exception (MCE) in HV KVM guests.

If this capability has not been enabled, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in error belongs to
the guest. With this capability enabled, KVM will cause a guest exit
with the exit reason indicating an NMI.

The new capability is required to avoid problems if a new kernel/KVM
is used with an old QEMU, running a guest that doesn't issue
"ibm,nmi-register".  As old QEMU does not understand the NMI exit
type, it treats it as a fatal error.  However, the guest could have
handled the machine check error if the exception was delivered to
guest's 0x200 interrupt vector instead of NMI exit in case of old
QEMU.

[paulus@ozlabs.org - Reworded the commit message to be clearer,
 enable only on HV KVM.]
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

134764ed

20 Jun, 2017 1 commit

KVM: PPC: Book3S HV: Don't sleep if XIVE interrupt pending on POWER9 · ee3308a2

Paul Mackerras authored Jun 20, 2017

On a POWER9 system, it is possible for an interrupt to become pending
for a VCPU when that VCPU is about to cede (execute a H_CEDE hypercall)
and has already disabled interrupts, or in the H_CEDE processing up
to the point where the XIVE context is pulled from the hardware. In
such a case, the H_CEDE should not sleep, but should return immediately
to the guest. However, the conditions tested in kvmppc_vcpu_woken()
don't include the condition that a XIVE interrupt is pending, so the
VCPU could sleep until the next decrementer interrupt.

To fix this, we add a new xive_interrupt_pending() helper which looks
in the XIVE context that was pulled from the hardware to see if the
priority of any pending interrupt is higher (numerically lower than)
the CPU priority. If so then kvmppc_vcpu_woken() will return true.
If the XIVE context has never been used, then both the pipr and the
cppr fields will be zero and the test will indicate that no interrupt
is pending.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

ee3308a2

19 Jun, 2017 5 commits

KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9 · 57900694

Paul Mackerras authored May 16, 2017

On POWER9, we no longer have the restriction that we had on POWER8
where all threads in a core have to be in the same partition, so
the CPU threads are now independent. However, we still want to be
able to run guests with a virtual SMT topology, if only to allow
migration of guests from POWER8 systems to POWER9.

A guest that has a virtual SMT mode greater than 1 will expect to
be able to use the doorbell facility; it will expect the msgsndp
and msgclrp instructions to work appropriately and to be able to read
sensible values from the TIR (thread identification register) and
DPDES (directed privileged doorbell exception status) special-purpose
registers. However, since each CPU thread is a separate sub-processor
in POWER9, these instructions and registers can only be used within
a single CPU thread.

In order for these instructions to appear to act correctly according
to the guest's virtual SMT mode, we have to trap and emulate them.
We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
register. The emulation is triggered by the hypervisor facility
unavailable interrupt that occurs when the guest uses them.

To cause a doorbell interrupt to occur within the guest, we set the
DPDES register to 1. If the guest has interrupts enabled, the CPU
will generate a doorbell interrupt and clear the DPDES register in
hardware. The DPDES hardware register for the guest is saved in the
vcpu->arch.vcore->dpdes field. Since this gets written by the guest
exit code, other VCPUs wishing to cause a doorbell interrupt don't
write that field directly, but instead set a vcpu->arch.doorbell_request
flag. This is consumed and set to 0 by the guest entry code, which
then sets DPDES to 1.

Emulating reads of the DPDES register is somewhat involved, because
it requires reading the doorbell pending interrupt status of all of the
VCPU threads in the virtual core, and if any of those VCPUs are
running, their doorbell status is only up-to-date in the hardware
DPDES registers of the CPUs where they are running. In order to get
a reasonable approximation of the current doorbell status, we send
those CPUs an IPI, causing an exit from the guest which will update
the vcpu->arch.vcore->dpdes field. We then use that value in
constructing the emulated DPDES register value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

57900694

KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode · 3c313524

Paul Mackerras authored Feb 06, 2017

This allows userspace to set the desired virtual SMT (simultaneous
multithreading) mode for a VM, that is, the number of VCPUs that
get assigned to each virtual core.  Previously, the virtual SMT mode
was fixed to the number of threads per subcore, and if userspace
wanted to have fewer vcpus per vcore, then it would achieve that by
using a sparse CPU numbering.  This had the disadvantage that the
vcpu numbers can get quite large, particularly for SMT1 guests on
a POWER8 with 8 threads per core.  With this patch, userspace can
set its desired virtual SMT mode and then use contiguous vcpu
numbering.

On POWER8, where the threading mode is "strict", the virtual SMT mode
must be less than or equal to the number of threads per subcore.  On
POWER9, which implements a "loose" threading mode, the virtual SMT
mode can be any power of 2 between 1 and 8, even though there is
effectively one thread per subcore, since the threads are independent
and can all be in different partitions.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

3c313524

KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9 · 769377f7

Paul Mackerras authored Feb 15, 2017

This adds code to allow us to use a different value for the HFSCR
(Hypervisor Facilities Status and Control Register) when running the
guest from that which applies in the host.  The reason for doing this
is to allow us to trap the msgsndp instruction and related operations
in future so that they can be virtualized.  We also save the value of
HFSCR when a hypervisor facility unavailable interrupt occurs, because
the high byte of HFSCR indicates which facility the guest attempted to
access.

We save and restore the host value on guest entry/exit because some
bits of it affect host userspace execution.

We only do all this on POWER9, not on POWER8, because we are not
intending to virtualize any of the facilities controlled by HFSCR on
POWER8.  In particular, the HFSCR bit that controls execution of
msgsndp and related operations does not exist on POWER8.  The HFSCR
doesn't exist at all on POWER7.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

769377f7

KVM: PPC: Book3S HV: Don't let VCPU sleep if it has a doorbell pending · 1da4e2f4

Paul Mackerras authored May 19, 2017

It is possible, through a narrow race condition, for a VCPU to exit
the guest with a H_CEDE hypercall while it has a doorbell interrupt
pending.  In this case, the H_CEDE should return immediately, but in
fact it puts the VCPU to sleep until some other interrupt becomes
pending or a prod is received (via another VCPU doing H_PROD).

This fixes it by checking the DPDES (Directed Privileged Doorbell
Exception Status) bit for the thread along with the other interrupt
pending bits.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

1da4e2f4

KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9 · 1bc3fe81

Paul Mackerras authored May 22, 2017

This allows userspace (e.g. QEMU) to enable large decrementer mode for
the guest when running on a POWER9 host, by setting the LPCR_LD bit in
the guest LPCR value. With this, the guest exit code saves 64 bits of
the guest DEC value on exit. Other places that use the guest DEC
value check the LPCR_LD bit in the guest LPCR value, and if it is set,
omit the 32-bit sign extension that would otherwise be done.

This doesn't change the DEC emulation used by PR KVM because PR KVM
is not supported on POWER9 yet.

This is partly based on an earlier patch by Oliver O'Halloran.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

1bc3fe81

16 Jun, 2017 2 commits

KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · 3d3efb68

Paul Mackerras authored Jun 06, 2017

POWER9 DD1 has an erratum where writing to the TBU40 register, which
is used to apply an offset to the timebase, can cause the timebase to
lose counts.  This results in the timebase on some CPUs getting out of
sync with other CPUs, which then results in misbehaviour of the
timekeeping code.

To work around the problem, we make KVM ignore the timebase offset for
all guests on POWER9 DD1 machines.  This means that live migration
cannot be supported on POWER9 DD1 machines.

Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

3d3efb68

KVM: PPC: Book3S HV: Save/restore host values of debug registers · 7ceaa6dc

Paul Mackerras authored Jun 16, 2017

At present, HV KVM on POWER8 and POWER9 machines loses any instruction
or data breakpoint set in the host whenever a guest is run.
Instruction breakpoints are currently only used by xmon, but ptrace
and the perf_event subsystem can set data breakpoints as well as xmon.

To fix this, we save the host values of the debug registers (CIABR,
DAWR and DAWRX) before entering the guest and restore them on exit.
To provide space to save them in the stack frame, we expand the stack
frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.

Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

7ceaa6dc

15 Jun, 2017 2 commits

KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 46a704f8

Paul Mackerras authored Jun 15, 2017

If userspace attempts to call the KVM_RUN ioctl when it has hardware
transactional memory (HTM) enabled, the values that it has put in the
HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
guest values.  To fix this, we detect this condition and save those
SPR values in the thread struct, and disable HTM for the task.  If
userspace goes to access those SPRs or the HTM facility in future,
a TM-unavailable interrupt will occur and the handler will reload
those SPRs and re-enable HTM.

If userspace has started a transaction and suspended it, we would
currently lose the transactional state in the guest entry path and
would almost certainly get a "TM Bad Thing" interrupt, which would
cause the host to crash.  To avoid this, we detect this case and
return from the KVM_RUN ioctl with an EINVAL error, with the KVM
exit reason set to KVM_EXIT_FAIL_ENTRY.

Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

46a704f8

KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 4c3bb4cc

Paul Mackerras authored Jun 15, 2017

This restores several special-purpose registers (SPRs) to sane values
on guest exit that were missed before.

TAR and VRSAVE are readable and writable by userspace, and we need to
save and restore them to prevent the guest from potentially affecting
userspace execution (not that TAR or VRSAVE are used by any known
program that run uses the KVM_RUN ioctl). We save/restore these
in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.

FSCR affects userspace execution in that it can prohibit access to
certain facilities by userspace. We restore it to the normal value
for the task on exit from the KVM_RUN ioctl.

IAMR is normally 0, and is restored to 0 on guest exit. However,
with a radix host on POWER9, it is set to a value that prevents the
kernel from executing user-accessible memory. On POWER9, we save
IAMR on guest entry and restore it on guest exit to the saved value
rather than 0. On POWER8 we continue to set it to 0 on guest exit.

PSPB is normally 0. We restore it to 0 on guest exit to prevent
userspace taking advantage of the guest having set it non-zero
(which would allow userspace to set its SMT priority to high).

UAMOR is normally 0. We restore it to 0 on guest exit to prevent
the AMR from being used as a covert channel between userspace
processes, since the AMR is not context-switched at present.

Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

4c3bb4cc

13 Jun, 2017 1 commit

KVM: PPC: Book3S HV: Context-switch EBB registers properly · ca8efa1d

Paul Mackerras authored Jun 06, 2017

This adds code to save the values of three SPRs (special-purpose
registers) used by userspace to control event-based branches (EBBs),
which are essentially interrupts that get delivered directly to
userspace.  These registers are loaded up with guest values when
entering the guest, and their values are saved when exiting the
guest, but we were not saving the host values and restoring them
before going back to userspace.

On POWER8 this would only affect userspace programs which explicitly
request the use of EBBs and also use the KVM_RUN ioctl, since the
only source of EBBs on POWER8 is the PMU, and there is an explicit
enable bit in the PMU registers (and those PMU registers do get
properly context-switched between host and guest).  On POWER9 there
is provision for externally-generated EBBs, and these are not subject
to the control in the PMU registers.

Since these registers only affect userspace, we can save them when
we first come in from userspace and restore them before returning to
userspace, rather than saving/restoring the host values on every
guest entry/exit.  Similarly, we don't need to worry about their
values on offline secondary threads since they execute in the context
of the idle task, which never executes in userspace.

Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

ca8efa1d

29 May, 2017 1 commit

KVM: PPC: Book3S HV: Cope with host using large decrementer mode · 2f272463

Paul Mackerras authored May 22, 2017

POWER9 introduces a new mode for the decrementer register, called
large decrementer mode, in which the decrementer counter is 56 bits
wide rather than 32, and reads are sign-extended rather than
zero-extended.  For the decrementer, this new mode is optional and
controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
is 56 bits wide on POWER9 and has no mode control.

Since KVM code reads and writes the decrementer and hypervisor
decrementer registers in a few places, it needs to be aware of the
need to treat the decrementer value as a 64-bit quantity, and only do
a 32-bit sign extension when large decrementer mode is not in effect.
Similarly, the HDEC should always be treated as a 64-bit quantity on
POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
test for POWER9 and the sign extension.

To enable the sign extension to be removed in large decrementer mode,
we test the LPCR_LD bit in the host LPCR image stored in the struct
kvm for the guest.  If is set then large decrementer mode is enabled
and the sign extension should be skipped.

This is partly based on an earlier patch by Oliver O'Halloran.

Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2f272463

22 May, 2017 2 commits

Linux 4.12-rc2 · 08332893
Linus Torvalds authored May 21, 2017

08332893

x86: fix 32-bit case of __get_user_asm_u64() · 33c9e972

Linus Torvalds authored May 21, 2017

The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit
b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").

Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.

The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.

There were two independent bugs in __get_user_asm_u64():

 - it still did the STAC/CLAC user space access marking, even though
   that is now done by the wrapper macros, see commit 11f1a4b9
   ("x86: reorganize SMAP handling in user space accesses").

   This didn't result in a semantic error, it just means that the
   inlined optimized version was hugely less efficient than the
   allegedly slower standard version, since the CLAC/STAC overhead is
   quite high on modern Intel CPU's.

 - the double register %eax/%edx was marked as an output, but the %eax
   part of it was touched early in the asm, and could thus clobber other
   inputs to the asm that gcc didn't expect it to touch.

   In particular, that meant that the generated code could look like
   this:

        mov    (%eax),%eax
        mov    0x4(%eax),%edx

   where the load of %edx obviously was _supposed_ to be from the 32-bit
   word that followed the source of %eax, but because %eax was
   overwritten by the first instruction, the source of %edx was
   basically random garbage.

The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org   # v4.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

33c9e972

21 May, 2017 7 commits

Clean up x86 unsafe_get/put_user() type handling · 334a023e

Linus Torvalds authored May 21, 2017

Al noticed that unsafe_put_user() had type problems, and fixed them in
commit a7cc722f ("fix unsafe_put_user()"), which made me look more
at those functions.

It turns out that unsafe_get_user() had a type issue too: it limited the
largest size of the type it could handle to "unsigned long".  Which is
fine with the current users, but doesn't match our existing normal
get_user() semantics, which can also handle "u64" even when that does
not fit in a long.

While at it, also clean up the type cast in unsafe_put_user().  We
actually want to just make it an assignment to the expected type of the
pointer, because we actually do want warnings from types that don't
convert silently.  And it makes the code more readable by not having
that one very long and complex line.

[ This patch might become stable material if we ever end up back-porting
  any new users of the unsafe uaccess code, but as things stand now this
  doesn't matter for any current existing uses. ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

334a023e

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f3926e4c

Linus Torvalds authored May 21, 2017

Pull misc uaccess fixes from Al Viro:
 "Fix for unsafe_put_user() (no callers currently in mainline, but
  anyone starting to use it will step into that) + alpha osf_wait4()
  infoleak fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  osf_wait4(): fix infoleak
  fix unsafe_put_user()

f3926e4c

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 970c305a

Linus Torvalds authored May 21, 2017

Pull scheduler fix from Thomas Gleixner:
 "A single scheduler fix:

  Prevent idle task from ever being preempted. That makes sure that
  synchronize_rcu_tasks() which is ignoring idle task does not pretend
  that no task is stuck in preempted state. If that happens and idle was
  preempted on a ftrace trampoline the machine crashes due to
  inconsistent state"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Call __schedule() from do_idle() without enabling preemption

970c305a

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7a3d627

Linus Torvalds authored May 21, 2017

Pull irq fixes from Thomas Gleixner:
 "A set of small fixes for the irq subsystem:

   - Cure a data ordering problem with chained interrupts

   - Three small fixlets for the mbigen irq chip"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Fix chained interrupt data ordering
  irqchip/mbigen: Fix the clear register offset calculation
  irqchip/mbigen: Fix potential NULL dereferencing
  irqchip/mbigen: Fix memory mapping code

e7a3d627

osf_wait4(): fix infoleak · a8c39544

Al Viro authored May 14, 2017

failing sys_wait4() won't fill struct rusage...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a8c39544

fix unsafe_put_user() · a7cc722f

Al Viro authored May 21, 2017

__put_user_size() relies upon its first argument having the same type as what
the second one points to; the only other user makes sure of that and
unsafe_put_user() should do the same.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a7cc722f

Merge tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 56f410cf

Linus Torvalds authored May 20, 2017

Pull tracing fixes from Steven Rostedt:

 - Fix a bug caused by not cleaning up the new instance unique triggers
   when deleting an instance. It also creates a selftest that triggers
   that bug.

 - Fix the delayed optimization happening after kprobes boot up self
   tests being removed by freeing of init memory.

 - Comment kprobes on why the delay optimization is not a problem for
   removal of modules, to keep other developers from searching that
   riddle.

 - Fix another case of rcu not watching in stack trace tracing.

* tag 'trace-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Make sure RCU is watching before calling a stack trace
  kprobes: Document how optimized kprobes are removed from module unload
  selftests/ftrace: Add test to remove instance with active event triggers
  selftests/ftrace: Fix bashisms
  ftrace: Remove #ifdef from code and add clear_ftrace_function_probes() stub
  ftrace/instances: Clear function triggers when removing instances
  ftrace: Simplify glob handling in unregister_ftrace_function_probe_func()
  tracing/kprobes: Enforce kprobes teardown after testing
  tracing: Move postpone selftests to core from early_initcall

56f410cf

20 May, 2017 13 commits

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 894e2164

Linus Torvalds authored May 20, 2017

Pull block fixes from Jens Axboe:
 "A small collection of fixes that should go into this cycle.

   - a pull request from Christoph for NVMe, which ended up being
     manually applied to avoid pulling in newer bits in master. Mostly
     fibre channel fixes from James, but also a few fixes from Jon and
     Vijay

   - a pull request from Konrad, with just a single fix for xen-blkback
     from Gustavo.

   - a fuseblk bdi fix from Jan, fixing a regression in this series with
     the dynamic backing devices.

   - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull().

   - a request leak fix for drbd from Lars, fixing a regression in the
     last series with the kref changes. This will go to stable as well"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvmet: release the sq ref on rdma read errors
  nvmet-fc: remove target cpu scheduling flag
  nvme-fc: stop queues on error detection
  nvme-fc: require target or discovery role for fc-nvme targets
  nvme-fc: correct port role bits
  nvme: unmap CMB and remove sysfs file in reset path
  blktrace: fix integer parse
  fuseblk: Fix warning in super_setup_bdi_name()
  block: xen-blkback: add null check to avoid null pointer dereference
  drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()

894e2164

nvmet: release the sq ref on rdma read errors · 549f01ae

Vijay Immanuel authored May 08, 2017

On rdma read errors, release the sq ref that was taken
when the req was initialized. This avoids a hang in
nvmet_sq_destroy() when the queue is being freed.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

549f01ae

nvmet-fc: remove target cpu scheduling flag · 4b8ba5fa

James Smart authored Apr 25, 2017

Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

4b8ba5fa

nvme-fc: stop queues on error detection · 2952a879

James Smart authored Apr 25, 2017

Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Rather than waiting for reset work thread to stop queues and abort the ios,
immediately stop the queues on error detection. Reset thread will restop
the queues (as it's called on other paths), but it does not appear to have
a side effect.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

2952a879

nvme-fc: require target or discovery role for fc-nvme targets · 85e6a6ad

James Smart authored May 05, 2017

In order to create an association, the remoteport must be
serving either a target role or a discovery role.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

85e6a6ad

nvme-fc: correct port role bits · 41231090

James Smart authored May 05, 2017

FC Port roles is a bit mask, not individual values.
Correct nvme definitions to unique bits.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

41231090

nvme: unmap CMB and remove sysfs file in reset path · f63572df

Jon Derrick authored May 05, 2017

CMB doesn't get unmapped until removal while getting remapped on every
reset. Add the unmapping and sysfs file removal to the reset path in
nvme_pci_disable to match the mapping path in nvme_pci_enable.

Fixes: 202021c1 ("nvme : Add sysfs entry for NVMe CMBs when appropriate")
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-By: Stephen Bates <sbates@raithlin.com>
Cc: <stable@vger.kernel.org> # 4.9+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

f63572df

Merge tag 'staging-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · ef82f1ad

Linus Torvalds authored May 20, 2017

Pull staging driver fixes from Greg KH:
 "Here are a number of staging driver fixes for 4.12-rc2

  Most of them are typec driver fixes found by reviewers and users of
  the code. There are also some removals of files no longer needed in
  the tree due to the ion driver rewrite in 4.12-rc1, as well as some
  wifi driver fixes. And to round it out, a MAINTAINERS file update.

  All have been in linux-next with no reported issues"

* tag 'staging-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (22 commits)
  MAINTAINERS: greybus-dev list is members-only
  staging: fsl-dpaa2/eth: add ETHERNET dependency
  staging: typec: fusb302: refactor resume retry mechanism
  staging: typec: fusb302: reset i2c_busy state in error
  staging: rtl8723bs: remove re-positioned call to kfree in os_dep/ioctl_cfg80211.c
  staging: rtl8192e: GetTs Fix invalid TID 7 warning.
  staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
  staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
  staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
  staging: vc04_services: Fix bulk cache maintenance
  staging: ccree: remove extraneous spin_unlock_bh() in error handler
  staging: typec: Fix sparse warnings about incorrect types
  staging: typec: fusb302: do not free gpio from managed resource
  staging: typec: tcpm: Fix Port Power Role field in PS_RDY messages
  staging: typec: tcpm: Respond to Discover Identity commands
  staging: typec: tcpm: Set correct flags in PD request messages
  staging: typec: tcpm: Drop duplicate PD messages
  staging: typec: fusb302: Fix chip->vbus_present init value
  staging: typec: fusb302: Fix module autoload
  staging: typec: tcpci: declare private structure as static
  ...

ef82f1ad

Merge tag 'usb-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 32026293

Linus Torvalds authored May 20, 2017

Pull USB fixes from Greg KH:
 "Here are a number of small USB fixes for 4.12-rc2

  Most of them come from Johan, in his valiant quest to fix up all
  drivers that could be affected by "malicious" USB devices. There's
  also some fixes for more "obscure" drivers to handle some of the
  vmalloc stack fallout (which for USB drivers, was always the case, but
  very few people actually ran those systems...)

  Other than that, the normal set of xhci and gadget and musb driver
  fixes as well.

  All have been in linux-next with no reported issues"

* tag 'usb-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (42 commits)
  usb: musb: tusb6010_omap: Do not reset the other direction's packet size
  usb: musb: Fix trying to suspend while active for OTG configurations
  usb: host: xhci-plat: propagate return value of platform_get_irq()
  xhci: Fix command ring stop regression in 4.11
  xhci: remove GFP_DMA flag from allocation
  USB: xhci: fix lock-inversion problem
  usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd
  usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
  xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
  usb: xhci: trace URB before giving it back instead of after
  USB: serial: qcserial: add more Lenovo EM74xx device IDs
  USB: host: xhci: use max-port define
  USB: hub: fix SS max number of ports
  USB: hub: fix non-SS hub-descriptor handling
  USB: hub: fix SS hub-descriptor handling
  USB: usbip: fix nonconforming hub descriptor
  USB: gadget: dummy_hcd: fix hub-descriptor removable fields
  doc-rst: fixed kernel-doc directives in usb/typec.rst
  USB: core: of: document reference taken by companion helper
  USB: ehci-platform: fix companion-device leak
  ...

32026293

Merge tag 'char-misc-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 331da109

Linus Torvalds authored May 20, 2017

Pull char/misc driver fixes from Greg KH:
 "Here are five small bugfixes for reported issues with 4.12-rc1 and
  earlier kernels. Nothing huge here, just a lp, mem, vpd, and uio
  driver fix, along with a Kconfig fixup for one of the misc drivers.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  firmware: Google VPD: Fix memory allocation error handling
  drivers: char: mem: Check for address space wraparound with mmap()
  uio: fix incorrect memory leak cleanup
  misc: pci_endpoint_test: select CRC32
  char: lp: fix possible integer overflow in lp_setup()

331da109

Merge git://www.linux-watchdog.org/linux-watchdog · ec53c027

Linus Torvalds authored May 20, 2017

Pull watchdog fixes from Wim Van Sebroeck:
 - orion_wdt compile-test dependencies
 - sama5d4_wdt: WDDIS handling and a race confition
 - pcwd_usb: fix NULL-deref at probe
 - cadence_wdt: fix timeout setting
 - wdt_pci: fix build error if SOFTWARE_REBOOT is defined
 - iTCO_wdt: all versions count down twice
 - zx2967: remove redundant dev_err call in zx2967_wdt_probe()
 - bcm281xx: Fix use of uninitialized spinlock

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: bcm281xx: Fix use of uninitialized spinlock.
  watchdog: zx2967: remove redundant dev_err call in zx2967_wdt_probe()
  iTCO_wdt: all versions count down twice
  watchdog: wdt_pci: fix build error if define SOFTWARE_REBOOT
  watchdog: cadence_wdt: fix timeout setting
  watchdog: pcwd_usb: fix NULL-deref at probe
  watchdog: sama5d4: fix race condition
  watchdog: sama5d4: fix WDDIS handling
  watchdog: orion: fix compile-test dependencies

ec53c027

Merge tag 'drm-fixes-for-v4.12-rc2' of git://people.freedesktop.org/~airlied/linux · cf80a6fb

Linus Torvalds authored May 20, 2017

Pull drm fixes from Dave Airlie:
 "Mostly nouveau and i915, fairly quiet as usual for rc2"

* tag 'drm-fixes-for-v4.12-rc2' of git://people.freedesktop.org/~airlied/linux:
  drm/atmel-hlcdc: Fix output initialization
  gpu: host1x: select IOMMU_IOVA
  drm/nouveau/fifo/gk104-: Silence a locking warning
  drm/nouveau/secboot: plug memory leak in ls_ucode_img_load_gr() error path
  drm/nouveau: Fix drm poll_helper handling
  drm/i915: don't do allocate_va_range again on PIN_UPDATE
  drm/i915: Fix rawclk readout for g4x
  drm/i915: Fix runtime PM for LPE audio
  drm/i915/glk: Fix DSI "*ERROR* ULPS is still active" messages
  drm/i915/gvt: avoid unnecessary vgpu switch
  drm/i915/gvt: not to restore in-context mmio
  drm/etnaviv: don't put fence in case of submit failure
  drm/i915/gvt: fix typo: "supporte" -> "support"
  drm: hdlcd: Fix the calculation of the scanout start address

cf80a6fb

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 6fe1de43

Linus Torvalds authored May 19, 2017

Pull SCSI fixes from James Bottomley:
 "This is the first sweep of mostly minor fixes. There's one security
  one: the read past the end of a buffer in qedf, and a panic fix for
  lpfc SLI-3 adapters, but the rest are a set of include and build
  dependency tidy ups and assorted other small fixes and updates"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: pmcraid: remove redundant check to see if request_size is less than zero
  scsi: lpfc: ensure els_wq is being checked before destroying it
  scsi: cxlflash: Select IRQ_POLL
  scsi: qedf: Avoid reading past end of buffer
  scsi: qedf: Cleanup the type of io_log->op
  scsi: lpfc: double lock typo in lpfc_ns_rsp()
  scsi: qedf: properly update arguments position in function call
  scsi: scsi_lib: Add #include <scsi/scsi_transport.h>
  scsi: MAINTAINERS: update OSD entries
  scsi: Skip deleted devices in __scsi_device_lookup
  scsi: lpfc: Fix panic on BFS configuration
  scsi: libfc: do not flood console with messages 'libfc: queue full ...'

6fe1de43