Commits · 9ab9ee7a87bbea7b4719e0445bcda2d7f0b7d197 · Kirill Smelkov / linux

28 Mar, 2015 40 commits

kvm: move advertising of KVM_CAP_IRQFD to common code · 9ab9ee7a

Paolo Bonzini authored Mar 05, 2015

[ Upstream commit dc9be0fa ]

POWER supports irqfds but forgot to advertise them.  Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.

To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e2105Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

9ab9ee7a

x86/fpu: Drop_fpu() should not assume that tsk equals current · 1b6bd919

Oleg Nesterov authored Mar 13, 2015

[ Upstream commit f4c36863 ]

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()->restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

1b6bd919

x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() · 1a29c27a

Oleg Nesterov authored Mar 13, 2015

[ Upstream commit a7c80ebc ]

math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.

This triggers:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
   dump_stack
   ___might_sleep
   ? _raw_spin_unlock_irqrestore
   __might_sleep
   down_read
   ? _raw_spin_unlock_irqrestore
   print_vma_addr
   signal_fault
   sys32_rt_sigreturn

Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.

[ Note: this is only the first step. math_state_restore() should
        not check used_math(), it should set this flag. While
	init_fpu() should simply die. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

1a29c27a

crypto: aesni - fix memory usage in GCM decryption · 3b389956

Stephan Mueller authored Mar 12, 2015

[ Upstream commit ccfe8c3f ]

The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.

The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.

In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.

Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.

Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.

[1] http://www.chronox.de/libkcapi.html

CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

3b389956

crypto: arm/aes update NEON AES module to latest OpenSSL version · 73a115c5

Ard Biesheuvel authored Feb 26, 2015

[ Upstream commit 001eabfd ]

This updates the bit sliced AES module to the latest version in the
upstream OpenSSL repository (e620e5ae37bc). This is needed to fix a
bug in the XTS decryption path, where data chunked in a certain way
could trigger the ciphertext stealing code, which is not supposed to
be active in the kernel build (The kernel implementation of XTS only
supports round multiples of the AES block size of 16 bytes, whereas
the conformant OpenSSL implementation of XTS supports inputs of
arbitrary size by applying ciphertext stealing). This is fixed in
the upstream version by adding the missing #ifndef XTS_CHAIN_TWEAK
around the offending instructions.

The upstream code also contains the change applied by Russell to
build the code unconditionally, i.e., even if __LINUX_ARM_ARCH__ < 7,
but implemented slightly differently.

Cc: stable@vger.kernel.org
Fixes: e4e7f10b ("ARM: add support for bit sliced AES using NEON instructions")
Reported-by: Adrian Kotelba <adrian.kotelba@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

73a115c5

pagemap: do not leak physical addresses to non-privileged userspace · 1cd3d374

Kirill A. Shutemov authored Mar 09, 2015

[ Upstream commit ab676b7d ]

As pointed by recent post[1] on exploiting DRAM physical imperfection,
/proc/PID/pagemap exposes sensitive information which can be used to do
attacks.

This disallows anybody without CAP_SYS_ADMIN to read the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do anything more finegrained, but for now
  this is the simple model.   - Linus ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Seaborn <mseaborn@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

1cd3d374

irqchip: armada-370-xp: Fix chained per-cpu interrupts · 016958bf

Maxime Ripard authored Mar 03, 2015

[ Upstream commit 5724be84 ]

On the Cortex-A9-based Armada SoCs, the MPIC is not the primary interrupt
controller. Yet, it still has to handle some per-cpu interrupt.

To do so, it is chained with the GIC using a per-cpu interrupt. However, the
current code only call irq_set_chained_handler, which is called and enable that
interrupt only on the boot CPU, which means that the parent per-CPU interrupt
is never unmasked on the secondary CPUs, preventing the per-CPU interrupt to
actually work as expected.

This was not seen until now since the only MPIC PPI users were the Marvell
timers that were not working, but not used either since the system use the ARM
TWD by default, and the ethernet controllers, that are faking there interrupts
as SPI, and don't really expect to have interrupts on the secondary cores
anyway.

Add a CPU notifier that will enable the PPI on the secondary cores when they
are brought up.

Cc: <stable@vger.kernel.org> # 3.15+
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1425378443-28822-1-git-send-email-maxime.ripard@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

016958bf

PCI: Don't read past the end of sysfs "driver_override" buffer · a10f2890

Sasha Levin authored Feb 04, 2015

[ Upstream commit 4efe874a ]

When printing the driver_override parameter when it is 4095 and 4094 bytes
long, the printing code would access invalid memory because we need count+1
bytes for printing.

Fixes: 782a985d ("PCI: Introduce new device binding path using pci_dev.driver_override")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
CC: stable@vger.kernel.org	# v3.16+
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Alexander Graf <agraf@suse.de>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

a10f2890

libsas: Fix Kernel Crash in smp_execute_task · 89410138

James Bottomley authored Mar 04, 2015

[ Upstream commit 6302ce4d ]

This crash was reported:

[  366.947370] sd 3:0:1:0: [sdb] Spinning up disk....
[  368.804046] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  368.804072] IP: [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.804098] PGD 0
[  368.804114] Oops: 0002 [#1] SMP
[  368.804143] CPU 1
[  368.804151] Modules linked in: sg netconsole s3g(PO) uinput joydev hid_multitouch usbhid hid snd_hda_codec_via cpufreq_userspace cpufreq_powersave cpufreq_stats uhci_hcd cpufreq_conservative snd_hda_intel snd_hda_codec snd_hwdep snd_pcm sdhci_pci snd_page_alloc sdhci snd_timer snd psmouse evdev serio_raw pcspkr soundcore xhci_hcd shpchp s3g_drm(O) mvsas mmc_core ahci libahci drm i2c_core acpi_cpufreq mperf video processor button thermal_sys dm_dmirror exfat_fs exfat_core dm_zcache dm_mod padlock_aes aes_generic padlock_sha iscsi_target_mod target_core_mod configfs sswipe libsas libata scsi_transport_sas picdev via_cputemp hwmon_vid fuse parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif usb_storage scsi_mod ehci_hcd usbcore usb_common
[  368.804749]
[  368.804764] Pid: 392, comm: kworker/u:3 Tainted: P        W  O 3.4.87-logicube-ng.22 #1 To be filled by O.E.M. To be filled by O.E.M./EPIA-M920
[  368.804802] RIP: 0010:[<ffffffff81358457>]  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.804827] RSP: 0018:ffff880117001cc0  EFLAGS: 00010246
[  368.804842] RAX: 0000000000000000 RBX: ffff8801185030d0 RCX: ffff88008edcb420
[  368.804857] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8801185030d4
[  368.804873] RBP: ffff8801181531c0 R08: 0000000000000020 R09: 00000000fffffffe
[  368.804885] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801185030d4
[  368.804899] R13: 0000000000000002 R14: ffff880117001fd8 R15: ffff8801185030d8
[  368.804916] FS:  0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[  368.804931] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  368.804946] CR2: 0000000000000000 CR3: 000000000160b000 CR4: 00000000000006e0
[  368.804962] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.804978] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.804995] Process kworker/u:3 (pid: 392, threadinfo ffff880117000000, task ffff8801181531c0)
[  368.805009] Stack:
[  368.805017]  ffff8801185030d8 0000000000000000 ffffffff8161ddf0 ffffffff81056f7c
[  368.805062]  000000000000b503 ffff8801185030d0 ffff880118503000 0000000000000000
[  368.805100]  ffff8801185030d0 ffff8801188b8000 ffff88008edcb420 ffffffff813583ac
[  368.805135] Call Trace:
[  368.805153]  [<ffffffff81056f7c>] ? up+0xb/0x33
[  368.805168]  [<ffffffff813583ac>] ? mutex_lock+0x16/0x25
[  368.805194]  [<ffffffffa018c414>] ? smp_execute_task+0x4e/0x222 [libsas]
[  368.805217]  [<ffffffffa018ce1c>] ? sas_find_bcast_dev+0x3c/0x15d [libsas]
[  368.805240]  [<ffffffffa018ce4f>] ? sas_find_bcast_dev+0x6f/0x15d [libsas]
[  368.805264]  [<ffffffffa018e989>] ? sas_ex_revalidate_domain+0x37/0x2ec [libsas]
[  368.805280]  [<ffffffff81355a2a>] ? printk+0x43/0x48
[  368.805296]  [<ffffffff81359a65>] ? _raw_spin_unlock_irqrestore+0xc/0xd
[  368.805318]  [<ffffffffa018b767>] ? sas_revalidate_domain+0x85/0xb6 [libsas]
[  368.805336]  [<ffffffff8104e5d9>] ? process_one_work+0x151/0x27c
[  368.805351]  [<ffffffff8104f6cd>] ? worker_thread+0xbb/0x152
[  368.805366]  [<ffffffff8104f612>] ? manage_workers.isra.29+0x163/0x163
[  368.805382]  [<ffffffff81052c4e>] ? kthread+0x79/0x81
[  368.805399]  [<ffffffff8135fea4>] ? kernel_thread_helper+0x4/0x10
[  368.805416]  [<ffffffff81052bd5>] ? kthread_flush_work_fn+0x9/0x9
[  368.805431]  [<ffffffff8135fea0>] ? gs_change+0x13/0x13
[  368.805442] Code: 83 7d 30 63 7e 04 f3 90 eb ab 4c 8d 63 04 4c 8d 7b 08 4c 89 e7 e8 fa 15 00 00 48 8b 43 10 4c 89 3c 24 48 89 63 10 48 89 44 24 08 <48> 89 20 83 c8 ff 48 89 6c 24 10 87 03 ff c8 74 35 4d 89 ee 41
[  368.805851] RIP  [<ffffffff81358457>] __mutex_lock_common.isra.7+0x9c/0x15b
[  368.805877]  RSP <ffff880117001cc0>
[  368.805886] CR2: 0000000000000000
[  368.805899] ---[ end trace b720682065d8f4cc ]---

It's directly caused by 89d3cf6a [SCSI] libsas: add mutex for SMP task
execution, but shows a deeper cause: expander functions expect to be able to
cast to and treat domain devices as expanders.  The correct fix is to only do
expander discover when we know we've got an expander device to avoid wrongly
casting a non-expander device.
Reported-by: Praveen Murali <pmurali@logicube.com>
Tested-by: Praveen Murali <pmurali@logicube.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

89410138

gadgetfs: use-after-free in ->aio_read() · c81fc59b

Al Viro authored Feb 06, 2015

[ Upstream commit f01d35a1 ]

AIO_PREAD requests call ->aio_read() with iovec on caller's stack, so if
we are going to access it asynchronously, we'd better get ourselves
a copy - the one on kernel stack of aio_run_iocb() won't be there
anymore.  function/f_fs.c take care of doing that, legacy/inode.c
doesn't...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

c81fc59b

xen-pciback: limit guest control of command register · c7fd1867

Jan Beulich authored Mar 11, 2015

[ Upstream commit af6fc858 ]

Otherwise the guest can abuse that control to cause e.g. PCIe
Unsupported Request responses by disabling memory and/or I/O decoding
and subsequently causing (CPU side) accesses to the respective address
ranges, which (depending on system configuration) may be fatal to the
host.

Note that to alter any of the bits collected together as
PCI_COMMAND_GUEST permissive mode is now required to be enabled
globally or on the specific device.

This is CVE-2015-2150 / XSA-120.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

c7fd1867

xen/events: avoid NULL pointer dereference in dom0 on large machines · 72c7a855

Juergen Gross authored Feb 26, 2015

[ Upstream commit 85e40b05 ]

Using the pvops kernel a NULL pointer dereference was detected on a
large machine (144 processors) when booting as dom0 in
evtchn_fifo_unmask() during assignment of a pirq.

The event channel in question was the first to need a new entry in
event_array[] in events_fifo.c. Unfortunately xen_irq_info_pirq_setup()
is called with evtchn being 0 for a new pirq and the real event channel
number is assigned to the pirq only during __startup_pirq().

It is mandatory to call xen_evtchn_port_setup() after assigning the
event channel number to the pirq to make sure all memory needed for the
event channel is allocated.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

72c7a855

drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data · 75391143

Javier Martinez Canillas authored Mar 12, 2015

[ Upstream commit 8792f777 ]

Commit df9e26d0 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
added an "rtc_src" DT property to specify the clock used as a source to
the S3C real-time clock.

Not all SoCs needs this so commit eaf3a659 ("drivers/rtc/rtc-s3c.c:
fix initialization failure without rtc source clock") changed to check
the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock.

But that commit didn't update the data for each IP version so the RTC
broke on the boards that needs a source clock. This is the case of at
least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block.

This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420
Peach Pit and Pi Chromebooks.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Tyler Baker <tyler.baker@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

75391143

drm: Don't assign fbs for universal cursor support to files · af6887e2

Chris Wilson authored Feb 25, 2015

[ Upstream commit 9a6f5130 ]

The internal framebuffers we create to remap legacy cursor ioctls to
plane operations for the universal plane support shouldn't be linke to
the file like normal userspace framebuffers. This bug goes back to the
original universal cursor plane support introduced in

commit 161d0dc1
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Tue Jun 10 08:28:10 2014 -0700

    drm: Support legacy cursor ioctls via universal planes when possible (v4)

The isn't too disastrous since fbs are small, we only create one when the
cursor bo gets changed and ultimately they'll be reaped when the window
server restarts.

Conceptually we'd want to just pass NULL for file_priv when creating it,
but the driver needs the file to lookup the underlying buffer object for
cursor id. Instead let's move the file_priv linking out of
add_framebuffer_internal() into the addfb ioctl implementation, which is
the only place it is needed. And also rename the function for a more
accurate since it only creates the fb, but doesn't add it anywhere.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> (fix & commit msg)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (provider of lipstick)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

af6887e2

drm/vmwgfx: Fix a couple of lock dependency violations · 2c7f0370

Thomas Hellstrom authored Mar 09, 2015

[ Upstream commit 5151adb3 ]

Experimental lockdep annotation added to the TTM lock has unveiled a
couple of lock dependency violations in the vmwgfx driver. In both
cases it turns out that the device_private::reservation_sem is not
needed so the offending code is moved out of that lock.

Cc: <stable@vger.kernel.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

2c7f0370

drm/vmwgfx: Reorder device takedown somewhat · c95800d0

Thomas Hellstrom authored Mar 05, 2015

[ Upstream commit 3458390b ]

To take down the MOB and GMR memory types, the driver may have to issue
fence objects and thus make sure that the fence manager is taken down
after those memory types.
Reorder device init accordingly.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

c95800d0

Revert "i2c: core: Dispose OF IRQ mapping at client removal time" · 9297c326

Jakub Kicinski authored Mar 11, 2015

[ Upstream commit a4944572 ]

This reverts commit e4df3a0b
("i2c: core: Dispose OF IRQ mapping at client removal time")

Calling irq_dispose_mapping() will destroy the mapping and disassociate
the IRQ from the IRQ chip to which it belongs. Keeping it is OK, because
existent mappings are reused properly.

Also, this commit breaks drivers using devm* for IRQ management on
OF-based systems because devm* cleanup happens in device code, after
bus's remove() method returns.
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[wsa: updated the commit message with findings fromt the other bug report]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Fixes: e4df3a0bSigned-off-by: Sasha Levin <sasha.levin@oracle.com>

9297c326

nilfs2: fix deadlock of segment constructor during recovery · 5e0c3d9e

Ryusuke Konishi authored Mar 12, 2015

[ Upstream commit 283ee148 ]

According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
during recovery at mount time.  The code path that caused the deadlock was
as follows:

  nilfs_fill_super()
    load_nilfs()
      nilfs_salvage_orphan_logs()
        * Do roll-forwarding, attach segment constructor for recovery,
          and kick it.

        nilfs_segctor_thread()
          nilfs_segctor_thread_construct()
           * A lock is held with nilfs_transaction_lock()
             nilfs_segctor_do_construct()
               nilfs_segctor_drop_written_files()
                 iput()
                   iput_final()
                     write_inode_now()
                       writeback_single_inode()
                         __writeback_single_inode()
                           do_writepages()
                             nilfs_writepage()
                               nilfs_construct_dsync_segment()
                                 nilfs_transaction_lock() --> deadlock

This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
constructor over I_SYNC flag") is applied and roll-forward recovery was
performed at mount time.  The roll-forward recovery can happen if datasync
write is done and the file system crashes immediately after that.  For
instance, we can reproduce the issue with the following steps:

 < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
 # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
 # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
 count=1 && reboot -nfh
 < the system will immediately reboot >
 # mount -t nilfs2 /dev/sdb1 /nilfs

The deadlock occurs because iput() can run segment constructor through
writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
above commit changed segment constructor so that it calls iput()
asynchronously for inodes with i_nlink == 0, but that change was
imperfect.

This fixes the another deadlock by deferring iput() in segment constructor
even for the case that mount is not finished, that is, for the case that
MS_ACTIVE flag is not set.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

5e0c3d9e

regulator: core: Fix enable GPIO reference counting · bf90526a

Doug Anderson authored Mar 03, 2015

[ Upstream commit 29d62ec5 ]

Normally _regulator_do_enable() isn't called on an already-enabled
rdev.  That's because the main caller, _regulator_enable() always
calls _regulator_is_enabled() and only calls _regulator_do_enable() if
the rdev was not already enabled.

However, there is one caller of _regulator_do_enable() that doesn't
check: regulator_suspend_finish().  While we might want to make
regulator_suspend_finish() behave more like _regulator_enable(), it's
probably also a good idea to make _regulator_do_enable() robust if it
is called on an already enabled rdev.

At the moment, _regulator_do_enable() is _not_ robust for already
enabled rdevs if we're using an ena_pin.  Each time
_regulator_do_enable() is called for an rdev using an ena_pin the
reference count of the ena_pin is incremented even if the rdev was
already enabled.  This is not as intended because the ena_pin is for
something else: for keeping track of how many active rdevs there are
sharing the same ena_pin.

Here's how the reference counting works here:

* Each time _regulator_enable() is called we increment
  rdev->use_count, so _regulator_enable() calls need to be balanced
  with _regulator_disable() calls.

* There is no explicit reference counting in _regulator_do_enable()
  which is normally just a warapper around rdev->desc->ops->enable()
  with code for supporting delays.  It's not expected that the
  "ops->enable()" call do reference counting.

* Since regulator_ena_gpio_ctrl() does have reference counting
  (handling the sharing of the pin amongst multiple rdevs), we
  shouldn't call it if the current rdev is already enabled.

Note that as part of this we cleanup (remove) the initting of
ena_gpio_state in regulator_register().  In _regulator_do_enable(),
_regulator_do_disable() and _regulator_is_enabled() is is clear that
ena_gpio_state should be the state of whether this particular rdev has
requested the GPIO be enabled.  regulator_register() was initting it
as the actual state of the pin.

Fixes: 967cfb18 ("regulator: core: manage enable GPIO list")
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

bf90526a

regulator: Only enable disabled regulators on resume · 4aeea725

Javier Martinez Canillas authored Mar 02, 2015

[ Upstream commit 0548bf4f ]

The _regulator_do_enable() call ought to be a no-op when called on an
already-enabled regulator.  However, as an optimization
_regulator_enable() doesn't call _regulator_do_enable() on an already
enabled regulator.  That means we never test the case of calling
_regulator_do_enable() during normal usage and there may be hidden
bugs or warnings.  We have seen warnings issued by the tps65090 driver
and bugs when using the GPIO enable pin.

Let's match the same optimization that _regulator_enable() in
regulator_suspend_finish().  That may speed up suspend/resume and also
avoids exposing hidden bugs.

[Use much clearer commit message from Doug Anderson]
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

4aeea725

regulator: rk808: Set the enable time for LDOs · 084968ad

Doug Anderson authored Feb 20, 2015

[ Upstream commit 28249b0c ]

The LDOs are documented in the rk808 datasheet to have a soft start
time of 400us.  Add that to the driver.  If this time takes longer on
a certain board the device tree should be able to override with
"regulator-enable-ramp-delay".

This fixes some dw_mmc probing problems (together with other patches
posted to the mmc maiing lists) on rk3288.
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

084968ad

bnx2x: Force fundamental reset for EEH recovery · 45eacb50

Brian King authored Mar 04, 2015

[ Upstream commit da293700 ]

EEH recovery for bnx2x based adapters is not reliable on all Power
systems using the default hot reset, which can result in an
unrecoverable EEH error. Forcing the use of fundamental reset
during EEH recovery fixes this.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

45eacb50

mtd: nand: pxa3xx: Fix PIO FIFO draining · 8c07b3ab

Maxime Ripard authored Feb 18, 2015

[ Upstream commit 8dad0386 ]

The NDDB register holds the data that are needed by the read and write
commands.

However, during a read PIO access, the datasheet specifies that after each 32
bytes read in that register, when BCH is enabled, we have to make sure that the
RDDREQ bit is set in the NDSR register.

This fixes an issue that was seen on the Armada 385, and presumably other mvebu
SoCs, when a read on a newly erased page would end up in the driver reporting a
timeout from the NAND.

Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

8c07b3ab

ALSA: hda - Treat stereo-to-mono mix properly · 79eb59a6

Takashi Iwai authored Mar 16, 2015

[ Upstream commit cc261738 ]

The commit [ef403edb: ALSA: hda - Don't access stereo amps for
mono channel widgets] fixed the handling of mono widgets in general,
but it still misses an exceptional case: namely, a mono mixer widget
taking a single stereo input.  In this case, it has stereo volumes
although it's a mono widget, and thus we have to take care of both
left and right input channels, as stated in HD-audio spec ("7.1.3
Widget Interconnection Rules").

This patch covers this missing piece by adding proper checks of stereo
amps in both the generic parser and the proc output codes.
Reported-by: Raymond Yau <superquad.vortex2@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

79eb59a6

ALSA: hda - Fix regression of HD-audio controller fallback modes · 6fdbb0ba

Takashi Iwai authored Mar 08, 2015

[ Upstream commit a1f3f1ca ]

The commit [63e51fd7: ALSA: hda - Don't take unresponsive D3
transition too serious] introduced a conditional fallback behavior to
the HD-audio controller depending on the flag set.  However, it
introduced a silly bug, too, that the flag was evaluated in a reverse
way.  This resulted in a regression of HD-audio controller driver
where it can't go to the fallback mode at communication errors.

Unfortunately (or fortunately?) this didn't come up until recently
because the affected code path is an error handling that happens only
on an unstable hardware chip.  Most of recent chips work stably, thus
they didn't hit this problem.  Now, we've got a regression report with
a VIA chip, and this seems indeed requiring the fallback to the
polling mode, and finally the bug was revealed.

The fix is a oneliner to remove the wrong logical NOT in the check.
(Lesson learned - be careful about double negation.)

The bug should be backported to stable, but the patch won't be
applicable to 3.13 or earlier because of the code splits.  The stable
fix patches for earlier kernels will be posted later manually.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94021
Fixes: 63e51fd7 ('ALSA: hda - Don't take unresponsive D3 transition too serious')
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

6fdbb0ba

ALSA: hda - Add workaround for MacBook Air 5,2 built-in mic · 73de0edf

Takashi Iwai authored Mar 12, 2015

[ Upstream commit 2ddee91a ]

MacBook Air 5,2 has the same problem as MacBook Pro 8,1 where the
built-in mic records only the right channel.  Apply the same
workaround as MBP8,1 to spread the mono channel via a Cirrus codec
vendor-specific COEF setup.
Reported-and-tested-by: Vasil Zlatanov <vasil.zlatanov@gmail.com>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

73de0edf

ALSA: hda - Set single_adc_amp flag for CS420x codecs · b0501ca4

Takashi Iwai authored Mar 12, 2015

[ Upstream commit bad994f5 ]

CS420x codecs seem to deal only the single amps of ADC nodes even
though the nodes receive multiple inputs.  This leads to the
inconsistent amp value after S3/S4 resume, for example.

The fix is just to set codec->single_adc_amp flag.  Then the driver
handles these ADC amps as if single connections.
Reported-and-tested-by: Vasil Zlatanov <vasil.zlatanov@gmail.com>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

b0501ca4

ALSA: hda - Don't access stereo amps for mono channel widgets · e2b501a9

Takashi Iwai authored Mar 12, 2015

[ Upstream commit ef403edb ]

The current HDA generic parser initializes / modifies the amp values
always in stereo, but this seems causing the problem on ALC3229 codec
that has a few mono channel widgets: namely, these mono widgets react
to actions for both channels equally.

In the driver code, we do care the mono channel and create a control
only for the left channel (as defined in HD-audio spec) for such a
node.  When the control is updated, only the left channel value is
changed.  However, in the resume, the right channel value is also
restored from the initial value we took as stereo, and this overwrites
the left channel value.  This ends up being the silent output as the
right channel has been never touched and remains muted.

This patch covers the places where unconditional stereo amp accesses
are done and converts to the conditional accesses.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94581
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

e2b501a9

ALSA: hda - Fix built-in mic on Compaq Presario CQ60 · 05e83bd5

Takashi Iwai authored Mar 11, 2015

[ Upstream commit ddb6ca75 ]

Compaq Presario CQ60 laptop with CX20561 gives a wrong pin for the
built-in mic NID 0x17 instead of NID 0x1d, and it results in the
non-working mic.  This patch just remaps the pin correctly via fixup.

Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=920604
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

05e83bd5

ALSA: control: Add sanity checks for user ctl id name string · c0527b93

Takashi Iwai authored Mar 11, 2015

[ Upstream commit be3bb823 ]

There was no check about the id string of user control elements, so we
accepted even a control element with an empty string, which is
obviously bogus.  This patch adds more sanity checks of id strings.

Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

c0527b93

ALSA: snd-usb: add quirks for Roland UA-22 · 095ea4b4

Daniel Mack authored Mar 12, 2015

[ Upstream commit fcdcd1de ]

The device complies to the UAC1 standard but hides that fact with
proprietary descriptors. The autodetect quirk for Roland devices
catches the audio interface but misses the MIDI part, so a specific
quirk is needed.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reported-by: Rafa Lafuente <rafalafuente@gmail.com>
Tested-by: Raphaël Doursenaud <raphael@doursenaud.fr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

095ea4b4

spi: pl022: Fix race in giveback() leading to driver lock-up · 766afbab

Alexander Sverdlin authored Feb 27, 2015

[ Upstream commit cd6fa8d2 ]

Commit fd316941 ("spi/pl022: disable port when unused") introduced a race,
which leads to possible driver lock up (easily reproducible on SMP).

The problem happens in giveback() function where the completion of the transfer
is signalled to SPI subsystem and then the HW SPI controller is disabled. Another
transfer might be setup in between, which brings driver in locked-up state.

Exact event sequence on SMP:

core0                                   core1

                                        => pump_transfers()
                                        /* message->state == STATE_DONE */
                                          => giveback()
                                            => spi_finalize_current_message()

=> pl022_unprepare_transfer_hardware()
=> pl022_transfer_one_message
  => flush()
  => do_interrupt_dma_transfer()
    => set_up_next_transfer()
    /* Enable SSP, turn on interrupts */
    writew((readw(SSP_CR1(pl022->virtbase)) |
           SSP_CR1_MASK_SSE), SSP_CR1(pl022->virtbase));

...

=> pl022_interrupt_handler()
  => readwriter()

                                        /* disable the SPI/SSP operation */
                                        => writew((readw(SSP_CR1(pl022->virtbase)) &
                                                  (~SSP_CR1_MASK_SSE)), SSP_CR1(pl022->virtbase));

Lockup! SPI controller is disabled and the data will never be received. Whole
SPI subsystem is waiting for transfer ACK and blocked.

So, only signal transfer completion after disabling the controller.

Fixes: fd316941 (spi/pl022: disable port when unused)
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

766afbab

spi: dw-mid: avoid potential NULL dereference · f3aa9105

Andy Shevchenko authored Mar 02, 2015

[ Upstream commit c9dafb27 ]

When DMA descriptor allocation fails we should not try to assign any fields in
the bad descriptor. The patch adds the necessary checks for that.

Fixes: 7063c0d9 (spi/dw_spi: add DMA support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

f3aa9105

spi: atmel: Fix interrupt setup for PDC transfers · fbafcf8f

Torsten Fleischer authored Feb 24, 2015

[ Upstream commit 76e1d14b ]

Additionally to the current DMA transfer the PDC allows to set up a next DMA
transfer. This is useful for larger SPI transfers.

The driver currently waits for ENDRX as end of the transfer. But ENDRX is set
when the current DMA transfer is done (RCR = 0), i.e. it doesn't include the
next DMA transfer.
Thus a subsequent SPI transfer could be started although there is currently a
transfer in progress. This can cause invalid accesses to the SPI slave devices
and to SPI transfer errors.

This issue has been observed on a hardware with a M25P128 SPI NOR flash.

So instead of ENDRX we should wait for RXBUFF. This flag is set if there is
no more DMA transfer in progress (RCR = RNCR = 0).
Signed-off-by: Torsten Fleischer <torfl6749@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

fbafcf8f

tpm/tpm_i2c_stm_st33: Add status check when reading data on the FIFO · a0366884

Christophe Ricard authored Jan 13, 2015

[ Upstream commit c4eadfaf ]

Add a return value check when reading data from the FIFO register.

Cc: <stable@vger.kernel.org>
Reviewed-by: Jason Gunthorpe <jason.gunthorpe@obsidianresearch.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Reviewed-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

a0366884

tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send · 33c5b3ad

jmlatten@linux.vnet.ibm.com authored Feb 20, 2015

[ Upstream commit 62dfd912 ]

Problem: When IMA and VTPM are both enabled in kernel config,
kernel hangs during bootup on LE OS.

Why?: IMA calls tpm_pcr_read() which results in tpm_ibmvtpm_send
and tpm_ibmtpm_recv getting called. A trace showed that
tpm_ibmtpm_recv was hanging.

Resolution: tpm_ibmtpm_recv was hanging because tpm_ibmvtpm_send
was sending CRQ message that probably did not make much sense
to phype because of Endianness. The fix below sends correctly
converted CRQ for LE. This was not caught before because it
seems IMA is not enabled by default in kernel config and
IMA exercises this particular code path in vtpm.

Tested with IMA and VTPM enabled in kernel config and VTPM
enabled on both a BE OS and a LE OS ppc64 lpar. This exercised
CRQ and TPM command code paths in vtpm.
Patch is against Peter's tpmdd tree on github which included
Vicky's previous vtpm le patches.
Signed-off-by: Joy Latten <jmlatten@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # eb71f8a5: "Added Little Endian support to vtpm module"
Cc: <stable@vger.kernel.org>
Reviewed-by: Ashley Lai <ashley@ahsleylai.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

33c5b3ad

cpuset: Fix cpuset sched_relax_domain_level · ea7358ff

Jason Low authored Feb 13, 2015

[ Upstream commit 283cb41f ]

The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.

The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.

This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.

Fixes: fc560a26 ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

ea7358ff

cpuset: fix a warning when clearing configured masks in old hierarchy · 79692efa

Zefan Li authored Feb 13, 2015

[ Upstream commit 79063bff ]

When we clear cpuset.cpus, cpuset.effective_cpus won't be cleared:

  # mount -t cgroup -o cpuset xxx /mnt
  # mkdir /mnt/tmp
  # echo 0 > /mnt/tmp/cpuset.cpus
  # echo > /mnt/tmp/cpuset.cpus
  # cat cpuset.cpus

  # cat cpuset.effective_cpus
  0-15

And a kernel warning in update_cpumasks_hier() is triggered:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4028 at kernel/cpuset.c:894 update_cpumasks_hier+0x471/0x650()

Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

79692efa

cpuset: initialize effective masks when clone_children is enabled · dacb6ccd

Zefan Li authored Feb 13, 2015

[ Upstream commit 790317e1 ]

If clone_children is enabled, effective masks won't be initialized
due to the bug:

  # mount -t cgroup -o cpuset xxx /mnt
  # echo 1 > cgroup.clone_children
  # mkdir /mnt/tmp
  # cat /mnt/tmp/
  # cat cpuset.effective_cpus

  # cat cpuset.cpus
  0-15

And then this cpuset won't constrain the tasks in it.

Either the bug or the fix has no effect on unified hierarchy, as
there's no clone_chidren flag there any more.
Reported-by: Christian Brauner <christianvanbrauner@gmail.com>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

dacb6ccd

workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · d4bc18f7

Tejun Heo authored Mar 05, 2015

[ Upstream commit 8603e1b3 ]

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>

d4bc18f7