Commits · 0185604c2d82c560dab2f2933a18f797e74ab5a8 · nexedi / linux

22 Dec, 2015 4 commits

KVM: x86: Reload pit counters for all channels when restoring state · 0185604c

Andrew Honig authored Nov 18, 2015

Currently if userspace restores the pit counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, then KVM will perform a mod of 0 and crash.  This will ensure
that 0 values are converted to 65536 as per the spec.

This is CVE-2015-7513.
Signed-off-by: Andy Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0185604c

KVM: MTRR: treat memory as writeback if MTRR is disabled in guest CPUID · e24dea2a

Paolo Bonzini authored Dec 22, 2015

Virtual machines can be run with CPUID such that there are no MTRRs.
In that case, the firmware will never enable MTRRs and it is obviously
undesirable to run the guest entirely with UC memory. Check out guest
CPUID, and use WB memory if MTRR do not exist.

Cc: qemu-stable@nongnu.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107561Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e24dea2a

KVM: MTRR: observe maxphyaddr from guest CPUID, not host · fa7c4ebd

Paolo Bonzini authored Dec 14, 2015

Conversion of MTRRs to ranges used the maxphyaddr from the boot CPU.
This is wrong, because var_mtrr_range's mask variable then is discontiguous
(like FF00FFFF000, where the first run of 0s corresponds to the bits
between host and guest maxphyaddr).  Instead always set up the masks
to be full 64-bit values---we know that the reserved bits at the top
are zero, and we can restore them when reading the MSR.  This way
var_mtrr_range gets a mask that just works.

Fixes: a13842dc
Cc: qemu-stable@nongnu.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107561Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fa7c4ebd

KVM: MTRR: fix fixed MTRR segment look up · a7f2d786

Alexis Dambricourt authored Dec 14, 2015

This fixes the slow-down of VM running with pci-passthrough, since some MTRR
range changed from MTRR_TYPE_WRBACK to MTRR_TYPE_UNCACHABLE.  Memory in the
0K-640K range was incorrectly treated as uncacheable.

Fixes: f7bfb57b
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107561
Cc: qemu-stable@nongnu.org
Signed-off-by: Alexis Dambricourt <alexis.dambricourt@gmail.com>
[Use correct BZ for "Fixes" annotation.  - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a7f2d786

18 Dec, 2015 1 commit

Merge tag 'kvm-arm-for-v4.4-rc6' of... · 6fd08621

Paolo Bonzini authored Dec 18, 2015

Merge tag 'kvm-arm-for-v4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM fixes for v4.4-rc6

- Fix for the active interrupt detection code, affecting
  the timer interrupt injection.

6fd08621

14 Dec, 2015 1 commit

KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX · 81b1b9ca

Haozhong Zhang authored Dec 14, 2015

The current handling of accesses to guest MSR_TSC_AUX returns error if
vcpu does not support rdtscp, though those accesses are initiated by
host. This can result in the reboot failure of some versions of
QEMU. This patch fixes this issue by passing those host initiated
accesses for further handling instead.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81b1b9ca

11 Dec, 2015 3 commits

KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check · fdec12c1

Christoffer Dall authored Dec 10, 2015

External inputs to the vgic from time to time need to poke into the
state of a virtual interrupt, the prime example is the architected timer
code.

Since the IRQ's active state can be represented in two places; the LR or
the distributor, we first loop over the LRs but if not active in the LRs
we just return if *any* IRQ is active on the VCPU in question.

This is of course bogus, as we should check if the specific IRQ in
quesiton is active on the distributor instead.
Reported-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

fdec12c1

Merge branch 'kvm-ppc-fixes' of... · e307beca

Paolo Bonzini authored Dec 11, 2015

Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master

e307beca

kvm: x86: move tracepoints outside extended quiescent state · 8b89fe1f

Paolo Bonzini authored Dec 10, 2015

Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
lockdep splat.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8b89fe1f

10 Dec, 2015 4 commits

Merge tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio · 6764e5eb

Linus Torvalds authored Dec 09, 2015

Pull VFIO fixes from Alex Williamson:

 - Various fixes for removing redundancy, const'ifying structs, avoiding
   stack usage, fixing WARN usage (Krzysztof Kozlowski, Julia Lawall,
   Kees Cook, Dan Carpenter)

 - Revert No-IOMMU mode as the intended user has not emerged (Alex
   Williamson)

* tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio:
  Revert: "vfio: Include No-IOMMU mode"
  vfio: fix a warning message
  vfio: platform: remove needless stack usage
  vfio-pci: constify pci_error_handlers structures
  vfio: Drop owner assignment from platform_driver

6764e5eb

Merge tag 'devicetree-fixes-for-4.4-rc4' of... · eef121f4

Linus Torvalds authored Dec 09, 2015

Merge tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DT fixes from Rob Herring:
 "I think this should be all for 4.4:

   - Fix incorrect warning about overlapping memory regions

   - Export of_irq_find_parent again which was made static in 4.4, but
     has users pending for 4.5.

   - Fix of_msi_map_rid declaration location

   - Fix re-entrancy for of_fdt_unflatten_tree

   - Clean-up of phys_addr_t printks"

* tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of/irq: move of_msi_map_rid declaration to the correct ifdef section
  of/irq: Export of_irq_find_parent again
  of/fdt: Add mutex protection for calls to __unflatten_device_tree()
  of/address: fix typo in comment block of of_translate_one()
  of: do not use 0x in front of %pa
  of: Fix comparison of reserved memory regions

eef121f4

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · abb7e2b3

Linus Torvalds authored Dec 09, 2015

Pull clk fixes from Stephen Boyd:
 "One small build fix, a couple do_div() fixes, and a fix for the gpio
  basic clock type are the major changes here.  There's also a couple
  fixes for the TI, sunxi, and scpi clock drivers"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi: pll2: Fix clock running too fast
  clk: scpi: add missing of_node_put
  clk: qoriq: fix memory leak
  imx/clk-pllv2: fix wrong do_div() usage
  imx/clk-pllv1: fix wrong do_div() usage
  clk: mmp: add linux/clk.h includes
  clk: ti: drop locking code from mux/divider drivers
  clk: ti816x: Add missing dmtimer clkdev entries
  clk: ti: fapll: fix wrong do_div() usage
  clk: ti: clkt_dpll: fix wrong do_div() usage
  clk: gpio: Get parent clk names in of_gpio_clk_setup()

abb7e2b3

KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR · c20875a3

Paul Mackerras authored Nov 12, 2015

Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination.  The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point.  If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Paul Mackerras <paulus@samba.org>

c20875a3

09 Dec, 2015 8 commits

Merge tag 'for-linus-4.4-1' of git://git.code.sf.net/p/openipmi/linux-ipmi · 9a0f76fd

Linus Torvalds authored Dec 09, 2015

Pull IPMI fix from Corey Minyard:
 "Fix an Oops if an interrupt occurs at startup.  This can happen on
  some hardware"

* tag 'for-linus-4.4-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: move timer init to before irq is setup

9a0f76fd

ipmi: move timer init to before irq is setup · 27f972d3

Jan Stancek authored Dec 08, 2015

We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.

static int smi_start_processing(void       *send_info,
                                ipmi_smi_t intf)
{
        /* Try to claim any interrupts. */
        if (new_smi->irq_setup)
                new_smi->irq_setup(new_smi);

 --> IRQ arrives here and irq handler tries to modify uninitialized timer

    which triggers BUG_ON(!timer->function) in __mod_timer().

 Call Trace:
   <IRQ>
   [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
   [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
   [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
   [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
   [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
   [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
   [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
   [<ffffffff8100fc59>] handle_irq+0x49/0xa0
   [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
   [<ffffffff8100ba53>] ret_from_intr+0x0/0x11

        /* Set up the timer that drives the interface. */
        setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);

The following patch fixes the problem.

To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # Applies cleanly to 3.10-, needs small rework before

27f972d3

bitops.h: correctly handle rol32 with 0 byte shift · d7e35dfa

Sasha Levin authored Dec 03, 2015

ROL on a 32 bit integer with a shift of 32 or more is undefined and the
result is arch-dependent. Avoid this by handling the trivial case of
roling by 0 correctly.

The trivial solution of checking if shift is 0 breaks gcc's detection
of this code as a ROL instruction, which is unacceptable.

This bug was reported and fixed in GCC
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

	The standard rotate idiom,

	  (x << n) | (x >> (32 - n))

	is recognized by gcc (for concreteness, I discuss only the case that x
	is an uint32_t here).

	However, this is portable C only for n in the range 0 < n < 32. For n
	== 0, we get x >> 32 which gives undefined behaviour according to the
	C standard (6.5.7, Bitwise shift operators). To portably support n ==
	0, one has to write the rotate as something like

	  (x << n) | (x >> ((-n) & 31))

	And this is apparently not recognized by gcc.

Note that this is broken on older GCCs and will result in slower ROL.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d7e35dfa

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 626d114f

Linus Torvalds authored Dec 09, 2015

Pull vfs fixes from Al Viro:
 "A couple of fixes, both -stable fodder (9p one all way back to 2.6.32,
  dio - to all branches where "Fix negative return from dio read beyond
  eof" will end up it; it's a fixup to commit marked for -stable)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix the regression from "direct-io: Fix negative return from dio read beyond eof"
  9p: ->evict_inode() should kick out ->i_data, not ->i_mapping

626d114f

Merge tag 'pci-v4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 978d6a90

Linus Torvalds authored Dec 09, 2015

Pull PCI fixes from Bjorn Helgaas:
 "These are more fixes I'd like to have in v4.4.  Several for the Altera
  driver added for v4.4, and one for an MSI domain problem that affects
  several arm64 platforms:

  MSI:
   - Only use the generic MSI layer when domain is hierarchical (Marc
     Zyngier)

  Altera host bridge driver:
   - Fix loop in tlp_read_packet() (Dan Carpenter)
   - Fix Requester ID for config accesses (Ley Foon Tan)
   - Check TLP completion status (Ley Foon Tan)
   - Fix error when INTx is 4 (Ley Foon Tan)"

* tag 'pci-v4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: altera: Fix error when INTx is 4
  PCI: altera: Check TLP completion status
  PCI: altera: Fix Requester ID for config accesses
  PCI: altera: Fix loop in tlp_read_packet()
  PCI/MSI: Only use the generic MSI layer when domain is hierarchical

978d6a90

of/irq: move of_msi_map_rid declaration to the correct ifdef section · eaddb572

Rob Herring authored Dec 09, 2015

In checking fixes for of_irq_find_parent declaration location, I found
that of_msi_map_rid is also wrong. of_msi_map_rid is not implemented for
Sparc, so it should not be in the Sparc specific section of the header.
Move it to just depend on OF_IRQ.

Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>

eaddb572

of/irq: Export of_irq_find_parent again · 4c3141e0

Carlo Caione authored Dec 01, 2015

of_irq_find_parent was made static since it had no users outside of
of_irq.c. Export it again since we are going to use it again.
Signed-off-by: Carlo Caione <carlo@endlessm.com>
[robh: move of_irq_find_parent to correct ifdef section]
Signed-off-by: Rob Herring <robh@kernel.org>

4c3141e0

Merge branch 'for-linus-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · aa536855

Linus Torvalds authored Dec 08, 2015

Pull uml fixes from Richard Weinberger:
 "This contains various bug fixes, most of them are fall out from the
  merge window"

* 'for-linus-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: fix returns without va_end
  um: Fix fpstate handling
  arch: um: fix error when linking vmlinux.
  um: Fix get_signal() usage

aa536855

08 Dec, 2015 10 commits

Merge branch 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 5406812e

Linus Torvalds authored Dec 08, 2015

Pull cgroup fixes from Tejun Heo:
 "More change than I'd have liked at this stage.  The pids controller
  and the changes made to cgroup core to support it introduced and
  revealed several important issues.

   - Assigning membership to a newly created task and migrating it can
     race leading to incorrect accounting.  Oleg fixed it by widening
     threadgroup synchronization.  It looks like we'll be able to merge
     it with a different percpu rwsem which is used in fork path making
     things simpler and cheaper.

   - The recent change to extend cgroup membership to zombies (so that
     pid accounting can extend till the pid is actually released) missed
     pinning the underlying data structures leading to use-after-free.
     Fixed.

   - v2 hierarchy was calling subsystem callbacks with the wrong target
     cgroup_subsys_state based on the incorrect assumption that they
     share the same target.  pids is the first controller affected by
     this.  Subsys callbacks updated so that they can deal with
     multi-target migrations"

* 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup_pids: don't account for the root cgroup
  cgroup: fix handling of multi-destination migration from subtree_control enabling
  cgroup_freezer: simplify propagation of CGROUP_FROZEN clearing in freezer_attach()
  cgroup: pids: kill pids_fork(), simplify pids_can_fork() and pids_cancel_fork()
  cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
  cgroup: make css_set pin its css's to avoid use-afer-free
  cgroup: fix cftype->file_offset handling

5406812e

Merge branch 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 633bb738

Linus Torvalds authored Dec 08, 2015

Pull libata fixes from Tejun Heo:
 "Nothing too interesting.  All are device specific additions and
  workarounds"

* 'for-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata/sata_fsl.c: add ATA_FLAG_NO_LOG_PAGE to blacklist the controller for log page reads
  libata-eh.c: Introduce new ata port flag for controller which lockup on read log page
  sata_sil: disable trim
  AHCI: Fix softreset failed issue of Port Multiplier
  sata/mvebu: use #ifdef around suspend/resume code
  ahci: Order SATA device IDs for codename Lewisburg
  ahci: Add Device ID for Intel Sunrise Point PCH

633bb738

um: fix returns without va_end · 887a9853

Geyslan G. Bem authored Dec 01, 2015

When using va_list ensure that va_start will be followed by va_end.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

887a9853

um: Fix fpstate handling · 8090bfd2

Richard Weinberger authored Nov 29, 2015

The x86 FPU cleanup changed fpstate to a plain integer.
UML on x86 has to deal with that too.
Signed-off-by: Richard Weinberger <richard@nod.at>

8090bfd2

arch: um: fix error when linking vmlinux. · fb1770aa

Lorenzo Colitti authored Nov 18, 2015

On gcc Ubuntu 4.8.4-2ubuntu1~14.04, linking vmlinux fails with:

arch/um/os-Linux/built-in.o: In function `os_timer_create':
/android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'
arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
/android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_remain':
/android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime'
arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
/android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_disable':
/android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime'

This is because -lrt appears in the generated link commandline
after arch/um/os-Linux/built-in.o. Fix this by removing -lrt from
arch/um/Makefile and adding it to the UM-specific section of
scripts/link-vmlinux.sh.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

fb1770aa

um: Fix get_signal() usage · db2f24dc

Richard Weinberger authored Nov 18, 2015

If get_signal() returns us a signal to post
we must not call it again, otherwise the already
posted signal will be overridden.
Before commit a610d6e6 this was the case as we stopped
the while after a successful handle_signal().

Cc: <stable@vger.kernel.org> # 3.10-
Fixes: a610d6e6 ("pull clearing RESTORE_SIGMASK into block_sigmask()")
Signed-off-by: Richard Weinberger <richard@nod.at>

db2f24dc

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 51825c8a

Linus Torvalds authored Dec 08, 2015

Pull perf fixes from Ingo Molnar:
 "This tree includes four core perf fixes for misc bugs, three fixes to
  x86 PMU drivers, and two updates to old email addresses"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do not send exit event twice
  perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
  perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
  perf: Fix PERF_EVENT_IOC_PERIOD deadlock
  treewide: Remove old email address
  perf/x86: Fix LBR call stack save/restore
  perf: Update email address in MAINTAINERS
  perf/core: Robustify the perf_cgroup_from_task() RCU checks
  perf/core: Fix RCU problem with cgroup context switching code

51825c8a

fix the regression from "direct-io: Fix negative return from dio read beyond eof" · 2d4594ac

Al Viro authored Dec 08, 2015

Sure, it's better to bail out of past-the-eof read and return 0 than return
a bogus negative value on such.  Only we'd better make sure we are bailing out
with 0 and not -ENOMEM...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2d4594ac

9p: ->evict_inode() should kick out ->i_data, not ->i_mapping · 4ad78628

Al Viro authored Dec 08, 2015

For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own ->i_data empty and ->i_mapping pointing
to the (unique per major/minor) bdevfs inode.  That guarantees
cache coherence between all block device inodes with the same
device number.

Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, ->i_mapping is only
safe to access when the thing is opened.  At the time of
->evict_inode() the victim is definitely *not* opened.  We are
about to kill the address space embedded into struct inode
(inode->i_data) and that's what we need to empty of any pages.

9p instance tries to empty inode->i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.

Fortunately, other instances in the tree are OK; they are
evicting from &inode->i_data instead, as 9p one should.

Cc: stable@vger.kernel.org # v2.6.32+, ones prior to 2.6.36 need only half of that
Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com>
Tested-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4ad78628

of/fdt: Add mutex protection for calls to __unflatten_device_tree() · f8062386

Guenter Roeck authored Dec 05, 2015

__unflatten_device_tree() calls unflatten_dt_node(), which declares
a static variable. It is therefore not reentrant.

One of the callers of __unflatten_device_tree(), unflatten_device_tree(),
is only called once during early initialization and does not need to be
protected. The other caller, of_fdt_unflatten_tree(), can be called at
any time, possibly multiple times in parallel. This can happen, for
example, if multiple devicetree overlays have to be loaded and installed.

Without this protection, errors such as the following may be seen.

kernel: End of tree marker overwritten: e6a3a458
kernel: find_target_node:
	Failed to find target-indirect node at /fragment@0
kernel: __of_overlay_create: of_build_overlay_info() failed for tree@/

Add a mutex to of_fdt_unflatten_tree() to make the call reentrant.

Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Rob Herring <robh@kernel.org>

f8062386

07 Dec, 2015 9 commits

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 62ea1ec5

Linus Torvalds authored Dec 07, 2015

Pull virtio fixes from Michael Tsirkin:
 "This includes some fixes and cleanups in virtio and vhost code.

  Most notably, shadowing the index fixes the excessive cacheline
  bouncing observed on AMD platforms"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_ring: shadow available ring flags & index
  virtio: Do not drop __GFP_HIGH in alloc_indirect
  vhost: replace % with & on data path
  tools/virtio: fix byteswap logic
  tools/virtio: move list macro stubs
  virtio: fix memory leak of virtio ida cache layers
  vhost: relax log address alignment
  virtio-net: Stop doing DMA from the stack

62ea1ec5

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · f41683a2

Linus Torvalds authored Dec 07, 2015

Pull ext4 fixes from Ted Ts'o:
 "Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings,
  some endian conversion problems with ext4 encryption, potential memory
  leaks after truncate in data=journal mode, and an ocfs2 regression
  caused by a jbd2 performance improvement"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix null committed data return in undo_access
  ext4: add "static" to ext4_seq_##name##_fops struct
  ext4: fix an endianness bug in ext4_encrypted_follow_link()
  ext4: fix an endianness bug in ext4_encrypted_zeroout()
  jbd2: Fix unreclaimed pages after truncate in data=journal mode
  ext4: Fix handling of extended tv_sec

f41683a2

virtio_ring: shadow available ring flags & index · f277ec42

Venkatesh Srinivas authored Nov 10, 2015

Improves cacheline transfer flow of available ring header.

Virtqueues are implemented as a pair of rings, one producer->consumer
avail ring and one consumer->producer used ring; preceding the
avail ring in memory are two contiguous u16 fields -- avail->flags
and avail->idx. A producer posts work by writing to avail->idx and
a consumer reads avail->idx.

The flags and idx fields only need to be written by a producer CPU
and only read by a consumer CPU; when the producer and consumer are
running on different CPUs and the virtio_ring code is structured to
only have source writes/sink reads, we can continuously transfer the
avail header cacheline between 'M' states between cores. This flow
optimizes core -> core bandwidth on certain CPUs.

(see: "Software Optimization Guide for AMD Family 15h Processors",
Section 11.6; similar language appears in the 10h guide and should
apply to CPUs w/ exclusive caches, using LLC as a transfer cache)

Unfortunately the existing virtio_ring code issued reads to the
avail->idx and read-modify-writes to avail->flags on the producer.

This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.

In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:

(w/o shadowing):
 Performance counter stats for './vring_bench':
     5,451,082,016      L1-dcache-loads
     ...
       2.221477739 seconds time elapsed

(w/ shadowing):
 Performance counter stats for './vring_bench':
     5,405,701,361      L1-dcache-loads
     ...
       2.168405376 seconds time elapsed

The further away (in a NUMA sense) virtio producers and consumers are
from each other, the more we expect to benefit. Physical implementations
of virtio devices and implementations of virtio where the consumer polls
vring avail indexes (vhost) should also benefit.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

f277ec42

virtio: Do not drop __GFP_HIGH in alloc_indirect · 82107539

Michal Hocko authored Dec 01, 2015

b92b1b89 ("virtio: force vring descriptors to be allocated from
lowmem") tried to exclude highmem pages for descriptors so it cleared
__GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH
which doesn't make much sense for this fix because __GFP_HIGH only
controls access to memory reserves and it doesn't have any influence
on the zone selection. Some of the call paths use GFP_ATOMIC and
dropping __GFP_HIGH will reduce their changes for success because the
lack of access to memory reserves.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>

82107539

vhost: replace % with & on data path · 5fba13b5

Michael S. Tsirkin authored Nov 29, 2015

We know vring num is a power of 2, so use &
to mask the high bits.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

5fba13b5

tools/virtio: fix byteswap logic · 55564a02

Michael S. Tsirkin authored Nov 29, 2015

commit cf561f0d ("virtio: introduce
virtio_is_little_endian() helper") changed byteswap logic to
skip feature bit checks for LE platforms, but didn't
update tools/virtio, so vring_bench started failing.

Update the copy under tools/virtio/ (TODO: find a way to avoid this code
duplication).

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

55564a02

tools/virtio: move list macro stubs · 40c172e5
Michael S. Tsirkin authored Nov 29, 2015
```
Makes them more generally available.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
```
40c172e5

virtio: fix memory leak of virtio ida cache layers · c13f99b7

Suman Anna authored Sep 16, 2015

The virtio core uses a static ida named virtio_index_ida for
assigning index numbers to virtio devices during registration.
The ida core may allocate some internal idr cache layers and
an ida bitmap upon any ida allocation, and all these layers are
truely freed only upon the ida destruction. The virtio_index_ida
is not destroyed at present, leading to a memory leak when using
the virtio core as a module and atleast one virtio device is
registered and unregistered.

Fix this by invoking ida_destroy() in the virtio core module
exit.

Cc: stable@vger.kernel.org
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

c13f99b7

vhost: relax log address alignment · d5424838

Michael S. Tsirkin authored Nov 16, 2015

commit 5d9a07b0 ("vhost: relax used
address alignment") fixed the alignment for the used virtual address,
but not for the physical address used for logging.

That's a mistake: alignment should clearly be the same for virtual and
physical addresses,

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

d5424838