Commits · 6d8c6c0f97ad8a3517c42b179c1dc8e77397d0e2 · Kirill Smelkov / linux

07 Apr, 2017 9 commits

blk-mq: Restart a single queue if tag sets are shared · 6d8c6c0f

Bart Van Assche authored Apr 07, 2017

To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

6d8c6c0f

dm rq: Avoid that request processing stalls sporadically · 6077c2d7

Bart Van Assche authored Apr 07, 2017

While running the srp-test software I noticed that request
processing stalls sporadically at the beginning of a test, namely
when mkfs is run against a dm-mpath device. Every time when that
happened the following command was sufficient to resume request
processing:

    echo run >/sys/kernel/debug/block/dm-0/state

This patch avoids that such request processing stalls occur. The
test I ran is as follows:

    while srp-test/run_tests -d -r 30 -t 02-mq; do :; done
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Jens Axboe <axboe@fb.com>

6077c2d7

scsi: Avoid that SCSI queues get stuck · 36e3cf27

Bart Van Assche authored Apr 07, 2017

If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.

commit 52d7f1b5 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.

Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.

Fixes: commit 52d7f1b5 ("blk-mq: Avoid that requeueing starts stopped queues")
Fixes: commit 7e79dadc ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

36e3cf27

blk-mq: Introduce blk_mq_delay_run_hw_queue() · 7587a5ae

Bart Van Assche authored Apr 07, 2017

Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().

This function will be used in the next patch in this series.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

7587a5ae

blk-mq: remap queues when adding/removing hardware queues · ebe8bddb

Omar Sandoval authored Apr 07, 2017

blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
behavior that drivers expect. However, commit 4e68a011 changed
blk_mq_queue_reinit() to not remap queues for the case of CPU
hotplugging, inadvertently making blk_mq_update_nr_hw_queues() not remap
queues as well. This breaks, for example, NBD's multi-connection mode,
leaving the added hardware queues unused. Fix it by making
blk_mq_update_nr_hw_queues() explicitly remap the queues.

Fixes: 4e68a011 ("blk-mq: don't redistribute hardware queues on a CPU hotplug event")
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

ebe8bddb

blk-mq-sched: fix crash in switch error path · 54d5329d

Omar Sandoval authored Apr 07, 2017

In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
back to the original scheduler. However, at this point, we've already
torn down the original scheduler's tags, so this causes a crash. Doing
the fallback like the legacy elevator path is much harder for mq, so fix
it by just falling back to none, instead.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

54d5329d

blk-mq-sched: set up scheduler tags when bringing up new queues · 93252632

Omar Sandoval authored Apr 05, 2017

If a new hardware queue is added at runtime, we don't allocate scheduler
tags for it, leading to a crash. This hooks up the scheduler framework
to blk_mq_{init,exit}_hctx() to make sure everything gets properly
initialized/freed.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

93252632

blk-mq-sched: refactor scheduler initialization · 6917ff0b

Omar Sandoval authored Apr 05, 2017

Preparation cleanup for the next couple of fixes, push
blk_mq_sched_setup() and e->ops.mq.init_sched() into a helper.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

6917ff0b

blk-mq: use the right hctx when getting a driver tag fails · 81380ca1

Omar Sandoval authored Apr 07, 2017

While dispatching requests, if we fail to get a driver tag, we mark the
hardware queue as waiting for a tag and put the requests on a
hctx->dispatch list to be run later when a driver tag is freed. However,
blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
queues if using a single-queue scheduler with a multiqueue device. If
blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
are processing. This means we end up using the hardware queue of the
previous request, which may or may not be the same as that of the
current request. If it isn't, the wrong hardware queue will end up
waiting for a tag, and the requests will be on the wrong dispatch list,
leading to a hang.

The fix is twofold:

1. Make sure we save which hardware queue we were trying to get a
   request for in blk_mq_get_driver_tag() regardless of whether it
   succeeds or not.
2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
   blk_mq_hw_queue to make it clear that it must handle multiple
   hardware queues, since I've already messed this up on a couple of
   occasions.

This didn't appear in testing with nvme and mq-deadline because nvme has
more driver tags than the default number of scheduler tags. However,
with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

81380ca1

04 Apr, 2017 1 commit
- Merge branch 'nvme-4.11-rc' of git://git.infradead.org/nvme into for-linus · 8945927c
  Jens Axboe authored Apr 04, 2017
```
Sagi writes:

We have one spec mis-match fix from Roland and several sparse fixes from
Christoph.
```
  8945927c
02 Apr, 2017 5 commits

nvmet: fix byte swap in nvmet_parse_io_cmd · 793c7ed9

Christoph Hellwig authored Mar 31, 2017

We need to do arithmetics after byte swapping, not before.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

793c7ed9

nvmet: fix byte swap in nvmet_execute_write_zeroes · 78ce3daa

Christoph Hellwig authored Mar 31, 2017

The length field in the Write Zeroes command is a 16-bit field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

78ce3daa

nvmet: add missing byte swap in nvmet_get_smart_log · 5ac5fcc6

Christoph Hellwig authored Mar 31, 2017

In this case entirely harmless as it's all-ones, but still nice to
shut up sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

5ac5fcc6

nvme: add missing byte swap in nvme_setup_discard · f1dd03a8

Christoph Hellwig authored Mar 31, 2017

Fixes: b35ba01e ("nvme: support ranged discard requests")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

f1dd03a8

nvme: Correct NVMF enum values to match NVMe-oF rev 1.0 · bf17aa36

Roland Dreier authored Mar 01, 2017

The enum values for QPTYPE, PRTYPE and CMS are off by 1 from the
values defined in figure 42 of the NVM Express over Fabrics 1.0:

    http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605-1.pdf

Fix our enums to match the final spec.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

bf17aa36

30 Mar, 2017 2 commits

block: do not put mq context in blk_mq_alloc_request_hctx · ac77a0c4

Minchan Kim authored Mar 30, 2017

In blk_mq_alloc_request_hctx, blk_mq_sched_get_request doesn't
get sw context so we don't need to put the context with
blk_mq_put_ctx. Unless, we will see preempt counter underflow.

Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

ac77a0c4

Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 89970a04

Linus Torvalds authored Mar 29, 2017

Pull thermal management fixes from Zhang Rui:

 - Fix a potential deadlock in cpu_cooling driver, which was introduced
   in 4.11-rc1. (Matthew Wilcox)

 - Fix the cpu_cooling and devfreq_cooling code to handle possible error
   return value from OPP calls, together with three minor fixes in the
   same patch series. (Viresh Kumar)

* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: cpu_cooling: Check OPP for errors
  thermal: cpu_cooling: Replace dev_warn with dev_err
  thermal: devfreq: Check OPP for errors
  thermal: devfreq_cooling: Replace dev_warn with dev_err
  thermal: devfreq: Simplify expression
  thermal: Fix potential deadlock in cpu_cooling

89970a04

29 Mar, 2017 12 commits

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 806276b7

Linus Torvalds authored Mar 29, 2017

Pull block fixes from Jens Axboe:
 "Five fixes for this series:

   - a fix from me to ensure that blk-mq drivers that terminate IO in
     their ->queue_rq() handler by returning QUEUE_ERROR don't stall
     with a scheduler enabled.

   - four nbd fixes from Josef and Ratna, fixing various problems that
     are critical enough to go in for this cycle. They have been well
     tested"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nbd: replace kill_bdev() with __invalidate_device()
  nbd: set queue timeout properly
  nbd: set rq->errors to actual error code
  nbd: handle ERESTARTSYS properly
  blk-mq: include errors in did_work calculation

806276b7

Merge branch 'apw' (xfrm_user fixes) · 52b9c816

Linus Torvalds authored Mar 29, 2017

Merge xfrm_user validation fixes from Andy Whitcroft:
 "Two patches we are applying to Ubuntu for XFRM_MSG_NEWAE validation
  issue reported by ZDI.

  The first of these is the primary fix, and the second is for a more
  theoretical issue that Kees pointed out when reviewing the first"

* emailed patches from Andy Whitcroft <apw@canonical.com>:
  xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
  xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window

52b9c816

Merge branch 'regset' (PTRACE_SETREGSET data leakage) · 72c33734

Linus Torvalds authored Mar 29, 2017

Merge PTRACE_SETREGSET leakage fixes from Dave Martin:
 "This series is the collection of fixes I proposed on this topic, that
  have not yet appeared upstream or in the stable branches,

  The issue can leak kernel stack, but doesn't appear to allow userspace
  to attack the kernel directly.  The affected architectures are c6x,
  h8300, metag, mips and sparc.

  [ Mark Salter points out that c6x has no MMU or other mechanism to
    prevent userspace access to kernel code or data on c6x, but it
    doesn't hurt to clean that case up too. ]

  The bugs arise from use of user_regset_copyin(). Users of
  user_regset_copyin() can work in one of two ways:

   1) Copy directly to thread_struct or equivalent. (This seems to be
      the design assumption of the regset API, and is the most common
      approach.)

   2) Copy to a local variable and then transfer to thread_struct. (A
      significant minority of cases.)

  Buggy code typically involves approach 2"

* emailed patches from Dave Martin <Dave.Martin@arm.com>:
  sparc/ptrace: Preserve previous registers for short regset write
  mips/ptrace: Preserve previous registers for short regset write
  metag/ptrace: Reject partial NT_METAG_RPIPE writes
  metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
  metag/ptrace: Preserve previous registers for short regset write
  h8300/ptrace: Fix incorrect register transfer count
  c6x/ptrace: Remove useless PTRACE_SETREGSET implementation

72c33734

sparc/ptrace: Preserve previous registers for short regset write · d3805c54

Dave Martin authored Mar 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d3805c54

mips/ptrace: Preserve previous registers for short regset write · d614fd58

Dave Martin authored Mar 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d614fd58

metag/ptrace: Reject partial NT_METAG_RPIPE writes · 7195ee31

Dave Martin authored Mar 27, 2017

It's not clear what behaviour is sensible when doing partial write of
NT_METAG_RPIPE, so just don't bother.

This patch assumes that userspace will never rely on a partial SETREGSET
in this case, since it's not clear what should happen anyway.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7195ee31

metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS · 5fe81fe9

Dave Martin authored Mar 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.
Suggested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5fe81fe9

metag/ptrace: Preserve previous registers for short regset write · a78ce80d

Dave Martin authored Mar 27, 2017

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a78ce80d

h8300/ptrace: Fix incorrect register transfer count · 502585c7

Dave Martin authored Mar 27, 2017

regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.

So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

502585c7

c6x/ptrace: Remove useless PTRACE_SETREGSET implementation · fb411b83

Dave Martin authored Mar 27, 2017

gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.

So, just remove it.  The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fb411b83

xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder · f843ee6d

Andy Whitcroft authored Mar 23, 2017

Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues.  To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f843ee6d

xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window · 677e806d

Andy Whitcroft authored Mar 22, 2017

When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

677e806d

28 Mar, 2017 9 commits

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · fe82203b

Linus Torvalds authored Mar 28, 2017

Pull virtio fixes from Michael Tsirkin:
 "Fixes to multiple issues in virtio.

  Most notably a regression fix for crashes reported by Fedora users.
  Hibernate is still reportedly broken, working on it"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: prevent uninitialized variable use
  virtio-balloon: use actual number of stats for stats queue buffers
  virtio_balloon: init 1st buffer in stats vq
  virtio_pci: fix out of bound access for msix_names

fe82203b

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 050fc52d

Linus Torvalds authored Mar 28, 2017

Pull KVM fixes from Paolo Bonzini:
 "All x86-specific, apart from some arch-independent syzkaller fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: cleanup the page tracking SRCU instance
  KVM: nVMX: fix nested EPT detection
  KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
  KVM: kvm_io_bus_unregister_dev() should never fail
  KVM: VMX: Fix enable VPID conditions
  KVM: nVMX: Fix nested VPID vmx exec control
  KVM: x86: correct async page present tracepoint
  kvm: vmx: Flush TLB when the APIC-access address changes
  KVM: x86: use pic/ioapic destructor when destroy vm
  KVM: x86: check existance before destroy
  KVM: x86: clear bus pointer when destroyed
  KVM: Documentation: document MCE ioctls
  KVM: nVMX: don't reset kvm mmu twice
  PTP: fix ptr_ret.cocci warnings
  kvm: fix usage of uninit spinlock in avic_vm_destroy()
  KVM: VMX: downgrade warning on unexpected exit code

050fc52d

virtio_balloon: prevent uninitialized variable use · f0bb2d50

Arnd Bergmann authored Mar 28, 2017

The latest gcc-7.0.1 snapshot reports a new warning:

virtio/virtio_balloon.c: In function 'update_balloon_stats':
virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized]

This seems absolutely right, so we should add an extra check to
prevent copying uninitialized stack data into the statistics.
>From all I can tell, this has been broken since the statistics code
was originally added in 2.6.34.

Fixes: 9564e138 ("virtio: Add memory statistics reporting to the balloon driver (V4)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

f0bb2d50

virtio-balloon: use actual number of stats for stats queue buffers · 9646b26e

Ladi Prosek authored Mar 28, 2017

The virtio balloon driver contained a not-so-obvious invariant that
update_balloon_stats has to update exactly VIRTIO_BALLOON_S_NR counters
in order to send valid stats to the host. This commit fixes it by having
update_balloon_stats return the actual number of counters, and its
callers use it when pushing buffers to the stats virtqueue.

Note that it is still out of spec to change the number of counters
at run-time. "Driver MUST supply the same subset of statistics in all
buffers submitted to the statsq."
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

9646b26e

virtio_balloon: init 1st buffer in stats vq · fc865322

Ladi Prosek authored Mar 23, 2017

When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.

This patch updates the stats before pushing the initial buffer.

Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
  virtio implementation and violates the spec "Driver MUST supply the
  same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
  spec clause, plus "invalid tag" is not really defined.

Note: the spec says:
	When using the legacy interface, the device SHOULD ignore all values in
	the first buffer in the statsq supplied by the driver after device
	initialization. Note: Historically, drivers supplied an uninitialized
	buffer in the first buffer.

Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.

Cc: stable@vger.kernel.org
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

fc865322

virtio_pci: fix out of bound access for msix_names · de85ec8b

Jason Wang authored Mar 23, 2017

Fedora has received multiple reports of crashes when running
4.11 as a guest

https://bugzilla.redhat.com/show_bug.cgi?id=1430297
https://bugzilla.redhat.com/show_bug.cgi?id=1434462
https://bugzilla.kernel.org/show_bug.cgi?id=194911
https://bugzilla.redhat.com/show_bug.cgi?id=1433899

The crashes are not always consistent but they are generally
some flavor of oops or GPF in virtio related code. Multiple people
have done bisections (Thank you Thorsten Leemhuis and
Richard W.M. Jones) and found this commit to be at fault

07ec5148 is the first bad commit
commit 07ec5148
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Feb 5 18:15:19 2017 +0100

    virtio_pci: use shared interrupts for virtqueues

The issue seems to be an out of bounds access to the msix_names
array corrupting kernel memory.

Fixes: 07ec5148 ("virtio_pci: use shared interrupts for virtqueues")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>

de85ec8b

KVM: x86: cleanup the page tracking SRCU instance · 2beb6dad

Paolo Bonzini authored Mar 27, 2017

SRCU uses a delayed work item.  Skip cleaning it up, and
the result is use-after-free in the work item callbacks.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: 0eb05bf2Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2beb6dad

KVM: nVMX: fix nested EPT detection · 7ad658b6

Ladi Prosek authored Mar 23, 2017

The nested_ept_enabled flag introduced in commit 7ca29de2 was not
computed correctly. We are interested only in L1's EPT state, not the
the combined L0+L1 value.

In particular, if L0 uses EPT but L1 does not, nested_ept_enabled must
be false to make sure that PDPSTRs are loaded based on CR3 as usual,
because the special case described in 26.3.2.4 Loading Page-Directory-
Pointer-Table Entries does not apply.

Fixes: 7ca29de2 ("KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT")
Cc: qemu-stable@nongnu.org
Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7ad658b6

KVM: pci-assign: do not map smm memory slot pages in vt-d page tables · 0292e169

Herongguang (Stephen) authored Mar 27, 2017

or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when
destroy VM.

This is consistent with current vfio implementation.
Signed-off-by: herongguang <herongguang.he@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0292e169

27 Mar, 2017 2 commits

Merge tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · ad0376eb

Linus Torvalds authored Mar 27, 2017

Pull EDAC updates from Borislav Petkov:
 "A new EDAC driver for the Pondicherry2 memory controller IP found in
  the Intel Apollo Lake platform and the Denverton microserver.

  Plus small fixlets.

  Normally I had this queued for 4.12 but Tony requested for the
  pnd2_edac driver to possibly land in 4.11 therefore I'm sending it to
  you now.

  It is a driver for new hardware which people don't have yet so it
  shouldn't cause any regressions.

  The couple of patches ontop of it show that Qiuxu actually did test it
  on the hardware he has access to :)"

* tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, pnd2_edac: Fix reported DIMM number
  EDAC, pnd2_edac: Fix !EDAC_DEBUG build
  EDAC: Select DEBUG_FS
  EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms
  EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
  EDAC, xgene: Fix wrongly spelled "procesing"

ad0376eb

Merge tag 'pinctrl-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 85f91d5c

Linus Torvalds authored Mar 27, 2017

Pull more pin control fixes from Linus Walleij:
 "Here is a bunch of pin control fixes again

  A bit more than I'd like for this subsystem at this point, but what
  can I do. They are all driver fixes for hardware issues, as like "we
  forgot", "we didn't think of the fact that this could happen", "oops
  that one goes there" etc

   - Kconfig fixup for the TI IOdelay pinctrl-single add-on

   - fix up a typo in the meson i2c ao groups

   - switch a remapping back to use devm_ioremap() as
     devm_ioremap_resource() does not allow for sharing memory regions

   - do not clear the Qualcomm irq status bit in irq_unmask(), as this
     can lead to missing interrupts while the irq handler is executing

   - add irq_request/release_resources() on the ST driver

   - add a bunch of mysteriously missing pingroups for high numbered
     pins in the Qualcomm ipq4019 driver"

* tag 'pinctrl-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: ipq4019: add missing pingroups for pins > 70
  pinctrl: st: add irq_request/release_resources callbacks
  pinctrl: qcom: Don't clear status bit on irq_unmask
  pinctrl: samsung: Fix memory mapping code
  pinctrl: meson-gxbb: Fix typo in i2c ao groups
  pinctrl: ti: The IODelay driver is a DRA7xxx feature so depend on that SoC

85f91d5c