Commits · 525d5ef5ae91f4114a819b3e6a27e6e040328e29 · Kirill Smelkov / linux

25 May, 2017 40 commits

cxl: Route eeh events to all drivers in cxl_pci_error_detected() · 525d5ef5

Vaibhav Jain authored Apr 27, 2017

commit 4f58f0bf upstream.

Fix a boundary condition where in some cases an eeh event that results
in card reset isn't passed on to a driver attached to the virtual PCI
device associated with a slice. This will happen in case when a slice
attached device driver returns a value other than
PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
would result in an early return from cxl_pci_error_detected() and
other drivers attached to other AFUs on the card wont be notified.

The patch fixes this by making sure that all slice attached
device-drivers are notified and the return values from
error_detected() callback are aggregated in a scheme where request for
'disconnect' trumps all and 'none' trumps 'need_reset'.

Fixes: 9e8df8a2 ("cxl: EEH support")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

525d5ef5

cxl: Force context lock during EEH flow · 4a0e8757

Vaibhav Jain authored Apr 27, 2017

commit ea9a26d1 upstream.

During an eeh event when the cxl card is fenced and card sysfs attr
perst_reloads_same_image is set following warning message is seen in the
kernel logs:

  Adapter context unlocked with 0 active contexts
  ------------[ cut here ]------------
  WARNING: CPU: 12 PID: 627 at
  ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]

Even though this warning is harmless, it clutters the kernel log
during an eeh event. This warning is triggered as the EEH callback
cxl_pci_error_detected doesn't obtain a context-lock before forcibly
detaching all active context and when context-lock is released during
call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
cxl_adapter_context_unlock is triggered.

To fix this warning, we acquire the adapter context-lock via
cxl_adapter_context_lock() in the eeh callback
cxl_pci_error_detected() once all the virtual AFU PHBs are notified
and their contexts detached. The context-lock is released in
cxl_pci_slot_reset() after the adapter is successfully reconfigured
and before the we call the slot_reset callback on slice attached
device-drivers.

Fixes: 70b565bb ("cxl: Prevent adapter reset if an active context exists")
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Tested-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4a0e8757

ohci-pci: add qemu quirk · 6c58e144

Gerd Hoffmann authored Mar 20, 2017

commit 21a60f6e upstream.

On a loaded virtualization host (dozen guests booting at the same time)
it may happen that the ohci controller emulation doesn't manage to do
timely frame processing, with the result that the io watchdog fires and
considers the controller being dead, even though it's only the emulation
being unusual slow due to the load peak.

So, add a quirk for qemu and don't use the watchdog in case we figure we
are running on emulated ohci.  The virtual ohci controller masquerades
as apple ohci controller, but we can identify it by subsystem id.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6c58e144

cdc-acm: fix possible invalid access when processing notification · a83f3099

Tobias Herzog authored Mar 30, 2017

commit 1bb9914e upstream.

Notifications may only be 8 bytes long. Accessing the 9th and
10th byte of unimplemented/unknown notifications may be insecure.
Also check the length of known notifications before accessing anything
behind the 8th byte.
Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a83f3099

gpio: omap: return error if requested debounce time is not possible · 9541621a

David Rivshin authored Apr 24, 2017

commit 83977443 upstream.

omap_gpio_debounce() does not validate that the requested debounce
is within a range it can handle. Instead it lets the register value
wrap silently, and always returns success.

This can lead to all sorts of unexpected behavior, such as gpio_keys
asking for a too-long debounce, but getting a very short debounce in
practice.

Fix this by returning -EINVAL if the requested value does not fit into
the register field. If there is no debounce clock available at all,
return -ENOTSUPP.

Fixes: e85ec6c3 ("gpio: omap: fix omap2_set_gpio_debounce")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Acked-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9541621a

drm/nouveau/tmr: handle races with hw when updating the next alarm time · 33f379db

Ben Skeggs authored May 11, 2017

commit 1b0f8438 upstream.

If the time to the next alarm is short enough, we could race with HW and
end up with an ~4 second delay until it triggers.

Fix this by checking again after we update HW.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

33f379db

drm/nouveau/tmr: avoid processing completed alarms when adding a new one · 53b8e320

Ben Skeggs authored May 11, 2017

commit 330bdf62 upstream.

The idea here was to avoid having to "manually" program the HW if there's
a new earliest alarm.  This was lazy and bad, as it leads to loads of fun
races between inter-related callers (ie. therm).

Turns out, it's not so difficult after all.  Go figure ;)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

53b8e320

drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm · 952030b8

Ben Skeggs authored May 11, 2017

commit 9fc64667 upstream.

At least therm/fantog "attempts" to work around this issue, which could
lead to corruption of the pending alarm list.

Fix it properly by not updating the timestamp without the lock held, or
trying to add an already pending alarm to the pending alarm list....
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

952030b8

drm/nouveau/tmr: ack interrupt before processing alarms · 4c168033

Ben Skeggs authored May 11, 2017

commit 3733bd8b upstream.

Fixes a race where we can miss an alarm that triggers while we're already
processing previous alarms.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4c168033

drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes · a5e41c9e

Ben Skeggs authored May 01, 2017

commit e6db9579 upstream.

The DRM core used to only call prepare_fb/cleanup_fb() when a plane's
framebuffer changed, which achieved the desired effect.

It's apparently now up to the driver to decide on its own.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a5e41c9e

drm/nouveau/kms/nv50: fix source-rect-only plane updates · 4aecf3a7

Ben Skeggs authored May 01, 2017

commit 36601c2b upstream.

This "optimisation" (which was originally meant to skip updating cursor
settings in the core channel on position-only updates) turned out to be
pointless in the final design of the code before it was merged.

Remove it completely, as it breaks other cases.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4aecf3a7

drm/nouveau/therm: remove ineffective workarounds for alarm bugs · 2138faf0

Ben Skeggs authored May 11, 2017

commit e4311ee5 upstream.

These were ineffective due to touching the list without the alarm lock,
but should no longer be required.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2138faf0

drm/amdgpu: Add missing lb_vblank_lead_lines setup to DCE-6 path. · 21d3947b

Mario Kleiner authored Apr 24, 2017

commit effaf848 upstream.

This apparently got lost when implementing the new DCE-6 support
and would cause failures in pageflip scheduling and timestamping.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

21d3947b

drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations. · 3f0db81c

Mario Kleiner authored Mar 29, 2017

commit e190ed1e upstream.

At dot clocks > approx. 250 Mhz, some of these calcs will overflow and
cause miscalculation of latency watermarks, and for some overflows also
divide-by-zero driver crash ("divide error: 0000 [#1] PREEMPT SMP" in
"dce_v10_0_latency_watermark+0x12d/0x190").

This zero-divide happened, e.g., on AMD Tonga Pro under DCE-10,
on a Displayport panel when trying to set a video mode of 2560x1440
at 165 Hz vrefresh with a dot clock of 635.540 Mhz.

Refine calculations to avoid the overflows.

Tested for DCE-10 with R9 380 Tonga + ASUS ROG PG279 panel.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3f0db81c

drm/amdgpu: Make display watermark calculations more accurate · de2fa45a

Mario Kleiner authored Mar 29, 2017

commit d63c277d upstream.

Avoid big roundoff errors in scanline/hactive durations for
high pixel clocks, especially for >= 500 Mhz, and thereby
program more accurate display fifo watermarks.

Implemented here for DCE 6,8,10,11.
Successfully tested on DCE 10 with AMD R9 380 Tonga.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

de2fa45a

ath9k_htc: fix NULL-deref at probe · fab44023

Johan Hovold authored Mar 13, 2017

commit ebeb3667 upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.

Fixes: 36bcce43 ("ath9k_htc: Handle storage devices")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fab44023

ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device · 43068b45

Dmitry Tunin authored Mar 08, 2017

commit 16ff1fb0 upstream.

T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=01 Dev#=  7 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=ff MxPS=64 #Cfgs=  1
P:  Vendor=1eda ProdID=2315 Rev=01.08
S:  Manufacturer=ATHEROS
S:  Product=USB2.0 WLAN
S:  SerialNumber=12345
C:  #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

43068b45

s390/cputime: fix incorrect system time · 82e31de0

Martin Schwidefsky authored May 02, 2017

commit 07a63cbe upstream.

git commit c5328901 "[S390] entry[64].S improvements" removed
the update of the exit_timer lowcore field from the critical section
cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done
blocks. If the PSW is updated by the critical section cleanup to point to
user space again, the interrupt entry code will do a vtime calculation
after the cleanup completed with an exit_timer value which has *not* been
updated. Due to this incorrect system time deltas are calculated.

If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done
or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry
time of the interrupt.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

82e31de0

s390/kdump: Add final note · 5d1adbaf

Michael Holzheu authored Mar 23, 2017

commit dcc00b79 upstream.

Since linux v3.14 with commit 38dfac84 ("vmcore: prevent PT_NOTE
p_memsz overflow during header update") on s390 we get the following
message in the kdump kernel:

  Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b,
  n_descsz=0x6b6b6b6b

The reason for this is that we don't create a final zero note in
the ELF header which the proc/vmcore code uses to find out the end
of the notes section (see also kernel/kexec_core.c:final_note()).

It still worked on s390 by chance because we (most of the time?) have the
byte pattern 0x6b6b6b6b after the notes section which also makes the notes
parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
interpreded as note size:

  if ((real_sz + sz) > max_sz) {
          pr_warn("Warning: Exceeded p_memsz, dropping P ...);
          break;
  }

So fix this and add the missing final note to the ELF header.
We don't have to adjust the memory size for ELF header ("alloc_size")
because the new ELF note still fits into the 0x1000 base memory.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5d1adbaf

regulator: tps65023: Fix inverted core enable logic. · d2cc3a8c

Richard Cochran authored Apr 17, 2017

commit c90722b5 upstream.

Commit 43530b69 ("regulator: Use
regmap_read/write(), regmap_update_bits functions directly") intended
to replace working inline helper functions with standard regmap
calls.  However, it also inverted the set/clear logic of the "CORE ADJ
Allowed" bit.  That patch was clearly never tested, since without that
bit cleared, the core VDCDC1 voltage output does not react to I2C
configuration changes.

This patch fixes the issue by clearing the bit as in the original,
correct implementation.  Note for stable back porting that, due to
subsequent driver churn, this patch will not apply on every kernel
version.

Fixes: 43530b69 ("regulator: Use regmap_read/write(), regmap_update_bits functions directly")
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d2cc3a8c

regulator: rk808: Fix RK818 LDO2 · fde4f790

Wadim Egorov authored Mar 22, 2017

commit 75f88115 upstream.

Set the correct voltage select register for LDO2.
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fde4f790

x86: fix 32-bit case of __get_user_asm_u64() · efb209c8

Linus Torvalds authored May 21, 2017

commit 33c9e972 upstream.

The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit
b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").

Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.

The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.

There were two independent bugs in __get_user_asm_u64():

 - it still did the STAC/CLAC user space access marking, even though
   that is now done by the wrapper macros, see commit 11f1a4b9
   ("x86: reorganize SMAP handling in user space accesses").

   This didn't result in a semantic error, it just means that the
   inlined optimized version was hugely less efficient than the
   allegedly slower standard version, since the CLAC/STAC overhead is
   quite high on modern Intel CPU's.

 - the double register %eax/%edx was marked as an output, but the %eax
   part of it was touched early in the asm, and could thus clobber other
   inputs to the asm that gcc didn't expect it to touch.

   In particular, that meant that the generated code could look like
   this:

        mov    (%eax),%eax
        mov    0x4(%eax),%edx

   where the load of %edx obviously was _supposed_ to be from the 32-bit
   word that followed the source of %eax, but because %eax was
   overwritten by the first instruction, the source of %edx was
   basically random garbage.

The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

efb209c8

KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation · 5bbaf88d

Wanpeng Li authored May 19, 2017

commit cbfc6c91 upstream.

Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.

- "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
  a read should be ignored) in guest can get a random number.
- "rep insb" instruction to access PIT register port 0x43 can control memcpy()
  in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
  which will disclose the unimportant kernel memory in host but no crash.

The similar test program below can reproduce the read out-of-bounds vulnerability:

void hexdump(void *mem, unsigned int len)
{
        unsigned int i, j;

        for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
        {
                /* print offset */
                if(i % HEXDUMP_COLS == 0)
                {
                        printf("0x%06x: ", i);
                }

                /* print hex data */
                if(i < len)
                {
                        printf("%02x ", 0xFF & ((char*)mem)[i]);
                }
                else /* end of block, just aligning for ASCII dump */
                {
                        printf("   ");
                }

                /* print ASCII dump */
                if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                {
                        for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                        {
                                if(j >= len) /* end of block, not really printing */
                                {
                                        putchar(' ');
                                }
                                else if(isprint(((char*)mem)[j])) /* printable char */
                                {
                                        putchar(0xFF & ((char*)mem)[j]);
                                }
                                else /* other char */
                                {
                                        putchar('.');
                                }
                        }
                        putchar('\n');
                }
        }
}

int main(void)
{
	int i;
	if (iopl(3))
	{
		err(1, "set iopl unsuccessfully\n");
		return -1;
	}
	static char buf[0x40];

	/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */

	memset(buf, 0xab, sizeof(buf));

	asm volatile("push %rdi;");
	asm volatile("mov %0, %%rdi;"::"q"(buf));

	asm volatile ("mov $0x40, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x41, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x42, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x43, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x44, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x45, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("pop %rdi;");
	hexdump(buf, 0x40);

	printf("\n");

	/* ins port 0x40 */

	memset(buf, 0xab, sizeof(buf));

	asm volatile("push %rdi;");
	asm volatile("mov %0, %%rdi;"::"q"(buf));

	asm volatile ("mov $0x20, %rcx;");
	asm volatile ("mov $0x40, %rdx;");
	asm volatile ("rep insb;");

	asm volatile ("pop %rdi;");
	hexdump(buf, 0x40);

	printf("\n");

	/* ins port 0x43 */

	memset(buf, 0xab, sizeof(buf));

	asm volatile("push %rdi;");
	asm volatile("mov %0, %%rdi;"::"q"(buf));

	asm volatile ("mov $0x20, %rcx;");
	asm volatile ("mov $0x43, %rdx;");
	asm volatile ("rep insb;");

	asm volatile ("pop %rdi;");
	hexdump(buf, 0x40);

	printf("\n");
	return 0;
}

The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
w/o clear after using which results in some random datas are left over in
the buffer. Guest reads port 0x43 will be ignored since it is write only,
however, the function kernel_pio() can't distigush this ignore from successfully
reads data from device's ioport. There is no new data fill the buffer from
port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
the buffer to the guest unconditionally. This patch fixes it by clearing the
buffer before in instruction emulation to avoid to grant guest the stale data
in the buffer.

In addition, string I/O is not supported for in kernel device. So there is no
iteration to read ioport %RCX times for string I/O. The function kernel_pio()
just reads one round, and then copy the io size * %RCX to the guest unconditionally,
actually it copies the one round ioport data w/ other random datas which are left
over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
introducing the string I/O support for in kernel device in order to grant the right
ioport datas to the guest.

Before the patch:

0x000000: fe 38 93 93 ff ff ab ab .8......
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

After the patch:

0x000000: 1e 02 f8 00 ff ff ab ab ........
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: d2 e2 d2 df d2 db d2 d7 ........
0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: 00 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 00 00 00 00 ........
0x000018: 00 00 00 00 00 00 00 00 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
Reported-by: Moguofang <moguofang@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Moguofang <moguofang@huawei.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5bbaf88d

KVM: x86: Fix potential preemption when get the current kvmclock timestamp · b67dcf8c

Wanpeng Li authored May 11, 2017

commit e2c2206a upstream.

 BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/2809
 caller is __this_cpu_preempt_check+0x13/0x20
 CPU: 2 PID: 2809 Comm: qemu-system-x86 Not tainted 4.11.0+ #13
 Call Trace:
  dump_stack+0x99/0xce
  check_preemption_disabled+0xf5/0x100
  __this_cpu_preempt_check+0x13/0x20
  get_kvmclock_ns+0x6f/0x110 [kvm]
  get_time_ref_counter+0x5d/0x80 [kvm]
  kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
  ? kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
  ? kvm_arch_vcpu_ioctl_run+0xac9/0x1ce0 [kvm]
  kvm_arch_vcpu_ioctl_run+0x5bf/0x1ce0 [kvm]
  kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? __fget+0xf3/0x210
  do_vfs_ioctl+0xa4/0x700
  ? __fget+0x114/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2
 RIP: 0033:0x7f9d164ed357
  ? __this_cpu_preempt_check+0x13/0x20

This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/
CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.

Safe access to per-CPU data requires a couple of constraints, though: the
thread working with the data cannot be preempted and it cannot be migrated
while it manipulates per-CPU variables. If the thread is preempted, the
thread that replaces it could try to work with the same variables; migration
to another CPU could also cause confusion. However there is no preemption
disable when reads host per-CPU tsc rate to calculate the current kvmclock
timestamp.

This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both
__this_cpu_read() and rdtsc() are not preempted.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b67dcf8c

KVM: x86: Fix load damaged SSEx MXCSR register · 8223e8f9

Wanpeng Li authored May 11, 2017

commit a575813b upstream.

Reported by syzkaller:

   BUG: unable to handle kernel paging request at ffffffffc07f6a2e
   IP: report_bug+0x94/0x120
   PGD 348e12067
   P4D 348e12067
   PUD 348e14067
   PMD 3cbd84067
   PTE 80000003f7e87161

   Oops: 0003 [#1] SMP
   CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G           OE   4.11.0+ #8
   task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
   RIP: 0010:report_bug+0x94/0x120
   RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
    do_trap+0x156/0x170
    do_error_trap+0xa3/0x170
    ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
    ? mark_held_locks+0x79/0xa0
    ? retint_kernel+0x10/0x10
    ? trace_hardirqs_off_thunk+0x1a/0x1c
    do_invalid_op+0x20/0x30
    invalid_op+0x1e/0x30
   RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
    ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
    kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
    kvm_vcpu_ioctl+0x384/0x780 [kvm]
    ? kvm_vcpu_ioctl+0x384/0x780 [kvm]
    ? sched_clock+0x13/0x20
    ? __do_page_fault+0x2a0/0x550
    do_vfs_ioctl+0xa4/0x700
    ? up_read+0x1f/0x40
    ? __do_page_fault+0x2a0/0x550
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x23/0xc2

SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
a 1 to any of these bits will cause a general-protection exception(#GP) to be
generated". The syzkaller forks' testcase overrides xsave area w/ random values
and steps on the reserved bits of MXCSR register. The damaged MXCSR register
values of guest will be restored to SSEx MXCSR register before vmentry. This
patch fixes it by catching userspace override MXCSR register reserved bits w/
random values and bails out immediately.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8223e8f9

ima: accept previously set IMA_NEW_FILE · efa8cd1e

Daniel Glöckner authored Feb 24, 2017

commit 1ac202e9 upstream.

Modifying the attributes of a file makes ima_inode_post_setattr reset
the IMA cache flags. So if the file, which has just been created,
is opened a second time before the first file descriptor is closed,
verification fails since the security.ima xattr has not been written
yet. We therefore have to look at the IMA_NEW_FILE even if the file
already existed.

With this patch there should no longer be an error when cat tries to
open testfile:

$ rm -f testfile
$ ( echo test >&3 ; touch testfile ; cat testfile ) 3>testfile

A file being new is no reason to accept that it is missing a digital
signature demanded by the policy.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

efa8cd1e

mwifiex: pcie: fix cmd_buf use-after-free in remove/reset · d5811285

Brian Norris authored Apr 14, 2017

commit 3c8cb9ad upstream.

Command buffers (skb's) are allocated by the main driver, and freed upon
the last use. That last use is often in mwifiex_free_cmd_buffer(). In
the meantime, if the command buffer gets used by the PCI driver, we map
it as DMA-able, and store the mapping information in the 'cb' memory.

However, if a command was in-flight when resetting the device (and
therefore was still mapped), we don't get a chance to unmap this memory
until after the core has cleaned up its command handling.

Let's keep a refcount within the PCI driver, so we ensure the memory
only gets freed after we've finished unmapping it.

Noticed by KASAN when forcing a reset via:

  echo 1 > /sys/bus/pci/.../reset

The same code path can presumably be exercised in remove() and
shutdown().

[  205.390377] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
[  205.400393] ==================================================================
[  205.407719] BUG: KASAN: use-after-free in mwifiex_unmap_pci_memory.isra.14+0x4c/0x100 [mwifiex_pcie] at addr ffffffc0ad471b28
[  205.419040] Read of size 16 by task bash/1913
[  205.423421] =============================================================================
[  205.431625] BUG skbuff_head_cache (Tainted: G    B          ): kasan: bad access detected
[  205.439815] -----------------------------------------------------------------------------
[  205.439815]
[  205.449534] INFO: Allocated in __build_skb+0x48/0x114 age=1311 cpu=4 pid=1913
[  205.456709] 	alloc_debug_processing+0x124/0x178
[  205.461282] 	___slab_alloc.constprop.58+0x528/0x608
[  205.466196] 	__slab_alloc.isra.54.constprop.57+0x44/0x54
[  205.471542] 	kmem_cache_alloc+0xcc/0x278
[  205.475497] 	__build_skb+0x48/0x114
[  205.479019] 	__netdev_alloc_skb+0xe0/0x170
[  205.483244] 	mwifiex_alloc_cmd_buffer+0x68/0xdc [mwifiex]
[  205.488759] 	mwifiex_init_fw+0x40/0x6cc [mwifiex]
[  205.493584] 	_mwifiex_fw_dpc+0x158/0x520 [mwifiex]
[  205.498491] 	mwifiex_reinit_sw+0x2c4/0x398 [mwifiex]
[  205.503510] 	mwifiex_pcie_reset_notify+0x114/0x15c [mwifiex_pcie]
[  205.509643] 	pci_reset_notify+0x5c/0x6c
[  205.513519] 	pci_reset_function+0x6c/0x7c
[  205.517567] 	reset_store+0x68/0x98
[  205.521003] 	dev_attr_store+0x54/0x60
[  205.524705] 	sysfs_kf_write+0x9c/0xb0
[  205.528413] INFO: Freed in __kfree_skb+0xb0/0xbc age=131 cpu=4 pid=1913
[  205.535064] 	free_debug_processing+0x264/0x370
[  205.539550] 	__slab_free+0x84/0x40c
[  205.543075] 	kmem_cache_free+0x1c8/0x2a0
[  205.547030] 	__kfree_skb+0xb0/0xbc
[  205.550465] 	consume_skb+0x164/0x178
[  205.554079] 	__dev_kfree_skb_any+0x58/0x64
[  205.558304] 	mwifiex_free_cmd_buffer+0xa0/0x158 [mwifiex]
[  205.563817] 	mwifiex_shutdown_drv+0x578/0x5c4 [mwifiex]
[  205.569164] 	mwifiex_shutdown_sw+0x178/0x310 [mwifiex]
[  205.574353] 	mwifiex_pcie_reset_notify+0xd4/0x15c [mwifiex_pcie]
[  205.580398] 	pci_reset_notify+0x5c/0x6c
[  205.584274] 	pci_dev_save_and_disable+0x24/0x6c
[  205.588837] 	pci_reset_function+0x30/0x7c
[  205.592885] 	reset_store+0x68/0x98
[  205.596324] 	dev_attr_store+0x54/0x60
[  205.600017] 	sysfs_kf_write+0x9c/0xb0
...
[  205.800488] Call trace:
[  205.802980] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
[  205.808415] [<ffffffc00020a96c>] show_stack+0x20/0x28
[  205.813506] [<ffffffc0005d020c>] dump_stack+0xa4/0xcc
[  205.818598] [<ffffffc0003be44c>] print_trailer+0x158/0x168
[  205.824120] [<ffffffc0003be5f0>] object_err+0x4c/0x5c
[  205.829210] [<ffffffc0003c45bc>] kasan_report+0x334/0x500
[  205.834641] [<ffffffc0003c3994>] check_memory_region+0x20/0x14c
[  205.840593] [<ffffffc0003c3b14>] __asan_loadN+0x14/0x1c
[  205.845879] [<ffffffbffc46171c>] mwifiex_unmap_pci_memory.isra.14+0x4c/0x100 [mwifiex_pcie]
[  205.854282] [<ffffffbffc461864>] mwifiex_pcie_delete_cmdrsp_buf+0x94/0xa8 [mwifiex_pcie]
[  205.862421] [<ffffffbffc462028>] mwifiex_pcie_free_buffers+0x11c/0x158 [mwifiex_pcie]
[  205.870302] [<ffffffbffc4620d4>] mwifiex_pcie_down_dev+0x70/0x80 [mwifiex_pcie]
[  205.877736] [<ffffffbffc1397a8>] mwifiex_shutdown_sw+0x190/0x310 [mwifiex]
[  205.884658] [<ffffffbffc4606b4>] mwifiex_pcie_reset_notify+0xd4/0x15c [mwifiex_pcie]
[  205.892446] [<ffffffc000635f54>] pci_reset_notify+0x5c/0x6c
[  205.898048] [<ffffffc00063a044>] pci_dev_save_and_disable+0x24/0x6c
[  205.904350] [<ffffffc00063cf0c>] pci_reset_function+0x30/0x7c
[  205.910134] [<ffffffc000641118>] reset_store+0x68/0x98
[  205.915312] [<ffffffc000771588>] dev_attr_store+0x54/0x60
[  205.920750] [<ffffffc00046f53c>] sysfs_kf_write+0x9c/0xb0
[  205.926182] [<ffffffc00046dfb0>] kernfs_fop_write+0x184/0x1f8
[  205.931963] [<ffffffc0003d64f4>] __vfs_write+0x6c/0x17c
[  205.937221] [<ffffffc0003d7164>] vfs_write+0xf0/0x1c4
[  205.942310] [<ffffffc0003d7da0>] SyS_write+0x78/0xd8
[  205.947312] [<ffffffc000204634>] el0_svc_naked+0x24/0x28
...
[  205.998268] ==================================================================

This bug has been around in different forms for a while. It was sort of
noticed in commit 955ab095 ("mwifiex: Do not kfree cmd buf while
unregistering PCIe"), but it just fixed the double-free, without
acknowledging the potential for use-after-free.

Fixes: fc331460 ("mwifiex: use pci_alloc/free_consistent APIs for PCIe")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d5811285

mwifiex: MAC randomization should not be persistent · a32a0c82

Brian Norris authored Apr 05, 2017

commit 7e2f18f0 upstream.

nl80211 provides the NL80211_SCAN_FLAG_RANDOM_ADDR for every scan
request that should be randomized; the absence of such a flag means we
should not randomize. However, mwifiex was stashing the latest
randomization request and *always* using it for future scans, even those
that didn't set the flag.

Let's zero out the randomization info whenever we get a scan request
without NL80211_SCAN_FLAG_RANDOM_ADDR. I'd prefer to remove
priv->random_mac entirely (and plumb the randomization MAC properly
through the call sequence), but the spaghetti is a little difficult to
unravel here for me.

Fixes: c2a8f0ff ("mwifiex: support random MAC address for scanning")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a32a0c82

rtlwifi: rtl8821ae: setup 8812ae RFE according to device type · cf6df2bc

Larry Finger authored Apr 16, 2017

commit 46cfa214 upstream.

Current channel switch implementation sets 8812ae RFE reg value assuming
that device always has type 2.

Extend possible RFE types set and write corresponding reg values.

Source for new code is
http://dlcdnet.asus.com/pub/ASUS/wireless/PCE-AC51/DR_PCE_AC51_20232801152016.zipSigned-off-by: Maxim Samoylov <max7255@gmail.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Yan-Hsuan Chuang <yhchuang@realtek.com>
Cc: Pkshih <pkshih@realtek.com>
Cc: Birming Chiu <birming@realtek.com>
Cc: Shaofu <shaofu@realtek.com>
Cc: Steven Ting <steventing@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cf6df2bc

md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop · 699ae096

NeilBrown authored Apr 06, 2017

commit 065e519e upstream.

if called md_set_readonly and set MD_CLOSING bit, the mddev cannot
be opened any more due to the MD_CLOING bit wasn't cleared. Thus it
needs to be cleared in md_ioctl after any call to md_set_readonly()
or do_md_stop().
Signed-off-by: NeilBrown <neilb@suse.com>
Fixes: af8d8e6f ("md: changes for MD_STILL_CLOSED flag")
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

699ae096

md: update slab_cache before releasing new stripes when stripes resizing · 1800fde3

Dennis Yang authored Mar 29, 2017

commit 583da48e upstream.

When growing raid5 device on machine with small memory, there is chance that
mdadm will be killed and the following bug report can be observed. The same
bug could also be reproduced in linux-4.10.6.

[57600.075774] BUG: unable to handle kernel NULL pointer dereference at           (null)
[57600.083796] IP: [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.110378] PGD 421cf067 PUD 4442d067 PMD 0
[57600.114678] Oops: 0002 [#1] SMP
[57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P           O    4.2.8 #1
[57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013
[57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000
[57600.204963] RIP: 0010:[<ffffffff81a6aa87>]  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.213057] RSP: 0018:ffff880043073810  EFLAGS: 00010046
[57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0
[57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000
[57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282
[57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838
[57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00
[57600.253999] FS:  00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
[57600.262078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0
[57600.274942] Stack:
[57600.276949]  ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f
[57600.284383]  ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98
[57600.291820]  ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968
[57600.299254] Call Trace:
[57600.301698]  [<ffffffff8114ee35>] ? cache_flusharray+0x35/0xe0
[57600.307523]  [<ffffffff81119043>] ? __page_cache_release+0x23/0x110
[57600.313779]  [<ffffffff8114eb53>] kmem_cache_free+0x63/0xc0
[57600.319344]  [<ffffffff81579942>] drop_one_stripe+0x62/0x90
[57600.324915]  [<ffffffff81579b5b>] raid5_cache_scan+0x8b/0xb0
[57600.330563]  [<ffffffff8111b98a>] shrink_slab.part.36+0x19a/0x250
[57600.336650]  [<ffffffff8111e38c>] shrink_zone+0x23c/0x250
[57600.342039]  [<ffffffff8111e4f3>] do_try_to_free_pages+0x153/0x420
[57600.348210]  [<ffffffff8111e851>] try_to_free_pages+0x91/0xa0
[57600.353959]  [<ffffffff811145b1>] __alloc_pages_nodemask+0x4d1/0x8b0
[57600.360303]  [<ffffffff8157a30b>] check_reshape+0x62b/0x770
[57600.365866]  [<ffffffff8157a4a5>] raid5_check_reshape+0x55/0xa0
[57600.371778]  [<ffffffff81583df7>] update_raid_disks+0xc7/0x110
[57600.377604]  [<ffffffff81592b73>] md_ioctl+0xd83/0x1b10
[57600.382827]  [<ffffffff81385380>] blkdev_ioctl+0x170/0x690
[57600.388307]  [<ffffffff81195238>] block_ioctl+0x38/0x40
[57600.393525]  [<ffffffff811731c5>] do_vfs_ioctl+0x2b5/0x480
[57600.399010]  [<ffffffff8115e07b>] ? vfs_write+0x14b/0x1f0
[57600.404400]  [<ffffffff811733cc>] SyS_ioctl+0x3c/0x70
[57600.409447]  [<ffffffff81a6ad97>] entry_SYSCALL_64_fastpath+0x12/0x6a
[57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d
[57600.435460] RIP  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.441208]  RSP <ffff880043073810>
[57600.444690] CR2: 0000000000000000
[57600.448000] ---[ end trace cbc6b5cc4bf9831d ]---

The problem is that resize_stripes() releases new stripe_heads before assigning new
slab cache to conf->slab_cache. If the shrinker function raid5_cache_scan() gets called
after resize_stripes() starting releasing new stripes but right before new slab cache
being assigned, it is possible that these new stripe_heads will be freed with the old
slab_cache which was already been destoryed and that triggers this bug.
Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Fixes: edbe83ab ("md/raid5: allow the stripe_cache to grow and shrink.")
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1800fde3

dm space map disk: fix some book keeping in the disk space map · d7252903

Joe Thornber authored May 15, 2017

commit 0377a07c upstream.

When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d7252903

dm thin metadata: call precommit before saving the roots · f568a102

Joe Thornber authored May 15, 2017

commit 91bcdb92 upstream.

These calls were the wrong way round in __write_initial_superblock.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f568a102

dm bufio: make the parameter "retain_bytes" unsigned long · 17cedf3b

Mikulas Patocka authored Apr 30, 2017

commit 13840d38 upstream.

Change the type of the parameter "retain_bytes" from unsigned to
unsigned long, so that on 64-bit machines the user can set more than
4GiB of data to be retained.

Also, change the type of the variable "count" in the function
"__evict_old_buffers" to unsigned long.  The assignment
"count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
could result in unsigned long to unsigned overflow and that could result
in buffers not being freed when they should.

While at it, avoid division in get_retain_buffers().  Division is slow,
we can change it to shift because we have precalculated the log2 of
block size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

17cedf3b

dm cache metadata: fail operations if fail_io mode has been established · 2e80638c

Mike Snitzer authored May 05, 2017

commit 10add84e upstream.

Otherwise it is possible to trigger crashes due to the metadata being
inaccessible yet these methods don't safely account for that possibility
without these checks.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2e80638c

dm mpath: delay requeuing while path initialization is in progress · a5b9afd6

Bart Van Assche authored Apr 27, 2017

commit c1d7ecf7 upstream.

Requeuing a request immediately while path initialization is ongoing
causes high CPU usage, something that is undesired.  Hence delay
requeuing while path initialization is in progress.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a5b9afd6

dm mpath: avoid that path removal can trigger an infinite loop · 5087ded2

Bart Van Assche authored Apr 27, 2017

commit 7083abbb upstream.

If blk_get_request() fails, check whether the failure is due to a path
being removed.  If that is the case, fail the path by triggering a call
to fail_path().  This avoids that the following scenario can be
encountered while removing paths:
* CPU usage of a kworker thread jumps to 100%.
* Removing the DM device becomes impossible.

Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK,
and the queue is not dying, because in these cases immediate requeuing
is inappropriate.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5087ded2

dm mpath: split and rename activate_path() to prepare for its expanded use · 5a094143

Bart Van Assche authored Apr 27, 2017

commit 89bfce76 upstream.

activate_path() is renamed to activate_path_work() which now calls
activate_or_offline_path().  activate_or_offline_path() will be used
by the next commit.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5a094143

dm mpath: requeue after a small delay if blk_get_request() fails · 4ff00db7

Bart Van Assche authored Apr 07, 2017

commit 06eb061f upstream.

If blk_get_request() returns ENODEV then multipath_clone_and_map()
causes a request to be requeued immediately. This can cause a kworker
thread to spend 100% of the CPU time of a single core in
__blk_mq_run_hw_queue() and also can cause device removal to never
finish.

Avoid this by only requeuing after a delay if blk_get_request() fails.
Additionally, reduce the requeue delay.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4ff00db7

dm bufio: check new buffer allocation watermark every 30 seconds · 7d67cd02

Mikulas Patocka authored Apr 30, 2017

commit 390020ad upstream.

dm-bufio checks a watermark when it allocates a new buffer in
__bufio_new().  However, it doesn't check the watermark when the user
changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.

This may result in a problem - if the watermark is high enough so that
all possible buffers are allocated and if the user lowers the value of
"max_cache_size_bytes", the watermark will never be checked against the
new value because no new buffer would be allocated.

To fix this, change __evict_old_buffers() so that it checks the
watermark.  __evict_old_buffers() is called every 30 seconds, so if the
user reduces "max_cache_size_bytes", dm-bufio will react to this change
within 30 seconds and decrease memory consumption.

Depends-on: 1b0fb5a5 ("dm bufio: avoid a possible ABBA deadlock")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7d67cd02