Commits · 65037c17e587212222a4c39936ecf14face6fb02 · Kirill Smelkov / linux

06 Aug, 2015 22 commits

ALSA: usb-audio: Add MIDI support for Steinberg MI2/MI4 · 65037c17

Dominic Sacré authored Jun 30, 2015

commit 0689a86a upstream.

The Steinberg MI2 and MI4 interfaces are compatible with the USB class
audio spec, but the MIDI part of the devices is reported as a vendor
specific interface.

This patch adds entries to quirks-table.h to recognize the MIDI
endpoints. Audio functionality was already working and is unaffected by
this change.
Signed-off-by: Dominic Sacré <dominic.sacre@gmx.de>
Signed-off-by: Albert Huitsing <albert@huitsing.nl>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

65037c17

iio: light: tcs3414: Fix bug preventing to set integration time · ca54c928

Peter Meerwald authored Jun 14, 2015

commit 33361e56 upstream.

the millisecond values in tcs3414_times should be checked against
val2, not val, which is always zero.
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Reported-by: Stephan Kleisinger <stephan.kleisinger@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ca54c928

iio: adc: at91_adc: allow to use full range of startup time · d1ccf1c4

Jan Leupold authored Jun 17, 2015

commit 2ab5f39b upstream.

The DT-Property "atmel,adc-startup-time" is stored in an u8 for a microsecond
value. When trying to increase the value of STARTUP in Register AT91_ADC_MR
some higher values can't be reached.

Change the type in function parameter and private structure field from u8 to
u32.
Signed-off-by: Jan Leupold <leupold@rsi-elektrotechnik.de>
[nicolas.ferre@atmel.com: change commit message, increase u16 to u32 for startup time]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d1ccf1c4

iio: DAC: ad5624r_spi: fix bit shift of output data value · c0dc841d

JM Friedt authored Jun 19, 2015

commit adfa9698 upstream.

The value sent on the SPI bus is shifted by an erroneous number of bits.
The shift value was already computed in the iio_chan_spec structure and
hence subtracting this argument to 16 yields an erroneous data position
in the SPI stream.
Signed-off-by: JM Friedt <jmfriedt@femto-st.fr>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

c0dc841d

iio: inv-mpu: Specify the expected format/precision for write channels · 9c86e9b9

Adriana Reus authored Jun 12, 2015

commit 6a3c45bb upstream.

The gyroscope needs IIO_VAL_INT_PLUS_NANO for the scale channel and
unless specified write returns MICRO by default.
This needs to be properly specified so that write operations into scale
have the expected behaviour.
Signed-off-by: Adriana Reus <adriana.reus@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

9c86e9b9

iio: twl4030-madc: Pass the IRQF_ONESHOT flag · 3466a6bb

Fabio Estevam authored May 24, 2015

commit 6c0d48cb upstream.

Since commit 1c6c6952 ("genirq: Reject bogus threaded irq requests")
threaded IRQs without a primary handler need to be requested with
IRQF_ONESHOT, otherwise the request will fail.

So pass the IRQF_ONESHOT flag in this case.

The semantic patch that makes this change is available
in scripts/coccinelle/misc/irqf_oneshot.cocci.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

3466a6bb

iio: adc: rockchip_saradc: add missing MODULE_* data · 8b4fcae6

Heiko Stuebner authored May 25, 2015

commit dc7b8d98 upstream.

The module-data is currently missing. This includes the license-information
which makes the driver taint the kernel and miss symbols when compiled as
module.

Fixes: 44d6f2ef ("iio: adc: add driver for Rockchip saradc")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8b4fcae6

drivers: clk: st: Fix flexgen lock init · 2f502e43

Giuseppe Cavallaro authored Jun 23, 2015

commit 0f4f2afd upstream.

While proving lock, the following warning happens
and it is fixed after initializing lock in the setup
function

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.27-02861-g39df285-dirty #33
[<c00154ac>] (unwind_backtrace+0x0/0xf4) from [<c0011b50>] (show_stack+0x10/0x14)
[<c0011b50>] (show_stack+0x10/0x14) from [<c00689ac>] (__lock_acquire+0x900/0xb14)
[<c00689ac>] (__lock_acquire+0x900/0xb14) from [<c0069394>] (lock_acquire+0x68/0x7c)
[<c0069394>] (lock_acquire+0x68/0x7c) from [<c04958f8>] (_raw_spin_lock_irqsave+0x48/0x5c)
[<c04958f8>] (_raw_spin_lock_irqsave+0x48/0x5c) from [<c0381e6c>] (clk_gate_endisable+0x28/0x88)
[<c0381e6c>] (clk_gate_endisable+0x28/0x88) from [<c0381ee0>] (clk_gate_enable+0xc/0x14)
[<c0381ee0>] (clk_gate_enable+0xc/0x14) from [<c0386c68>] (flexgen_enable+0x28/0x40)
[<c0386c68>] (flexgen_enable+0x28/0x40) from [<c037f260>] (__clk_enable+0x5c/0x9c)
[<c037f260>] (__clk_enable+0x5c/0x9c) from [<c037f558>] (clk_enable+0x18/0x2c)
[<c037f558>] (clk_enable+0x18/0x2c) from [<c064a1dc>] (st_lpc_of_register+0xc0/0x248)
[<c064a1dc>] (st_lpc_of_register+0xc0/0x248) from [<c0649e44>] (clocksource_of_init+0x34/0x58)
[<c0649e44>] (clocksource_of_init+0x34/0x58) from [<c0637ddc>] (sti_timer_init+0x10/0x18)
[<c0637ddc>] (sti_timer_init+0x10/0x18) from [<c06343f8>] (time_init+0x20/0x30)
[<c06343f8>] (time_init+0x20/0x30) from [<c0632984>] (start_kernel+0x20c/0x2e8)
[<c0632984>] (start_kernel+0x20c/0x2e8) from [<40008074>] (0x40008074)
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Fixes: b1165170 ("clk: st: STiH407: Support for Flexgen Clocks")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

2f502e43

ARM: 8393/1: smp: Fix suspicious RCU usage with ipi tracepoints · e266acbc

Stephen Boyd authored Jun 19, 2015

commit 398f7456 upstream.

John Stultz reports an RCU splat on boot with ARM ipi trace
events enabled.

===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc7-00033-gb5bed2f #153 Not tainted
-------------------------------
include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc7-00033-gb5bed2f #153
Hardware name: Qualcomm (Flattened Device Tree)
[<c0216b08>] (unwind_backtrace) from [<c02136e8>] (show_stack+0x10/0x14)
[<c02136e8>] (show_stack) from [<c075e678>] (dump_stack+0x70/0xbc)
[<c075e678>] (dump_stack) from [<c0215a80>] (handle_IPI+0x428/0x604)
[<c0215a80>] (handle_IPI) from [<c020942c>] (gic_handle_irq+0x54/0x5c)
[<c020942c>] (gic_handle_irq) from [<c0766604>] (__irq_svc+0x44/0x7c)
Exception stack(0xc09f3f48 to 0xc09f3f90)
3f40:                   00000001 00000001 00000000 c09f73b8 c09f4528 c0a5de9c
3f60: c076b4f0 00000000 00000000 c09ef108 c0a5cec1 00000001 00000000 c09f3f90
3f80: c026bf60 c0210ab8 20000113 ffffffff
[<c0766604>] (__irq_svc) from [<c0210ab8>] (arch_cpu_idle+0x20/0x3c)
[<c0210ab8>] (arch_cpu_idle) from [<c02647f0>] (cpu_startup_entry+0x2c0/0x5dc)
[<c02647f0>] (cpu_startup_entry) from [<c099bc1c>] (start_kernel+0x358/0x3c4)
[<c099bc1c>] (start_kernel) from [<8020807c>] (0x8020807c)

At this point in the IPI handling path we haven't called
irq_enter() yet, so RCU doesn't know that we're about to exit
idle and properly warns that we're using RCU from an idle CPU.
Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
that RCU is informed about our exit from idle.

Fixes: 365ec7b1 ("ARM: add IPI tracepoints")
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

e266acbc

drivers: clk: st: Fix mux bit-setting for Cortex A9 clocks · 8a92e0c5

Gabriel Fernandez authored Jun 23, 2015

commit 3be6d8ce upstream.

This patch fixes the mux bit-setting for ClockgenA9.
Signed-off-by: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Fixes: 13e6f2da ("clk: st: STiH407: Support for A9 MUX Clocks")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8a92e0c5

drivers: clk: st: Incorrect register offset used for lock_status · ddac306d

Pankaj Dev authored Jul 07, 2015

commit 56551da9 upstream.

Incorrect register offset used for sthi407 clockgenC
Signed-off-by: Pankaj Dev <pankaj.dev@st.com>
Signed-off-by: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Fixes: 51306d56 ("clk: st: STiH407: Support for clockgenC0")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ddac306d

clk: qcom: Use parent rate when set rate to pixel RCG clock · 547dca92

Hai Li authored Jun 25, 2015

commit 6d451367 upstream.

Since the parent rate has been recalculated, pixel RCG clock
should rely on it to find the correct M/N values during set_rate,
instead of calling __clk_round_rate() to its parent again.
Signed-off-by: Hai Li <hali@codeaurora.org>
Tested-by: Archit Taneja <architt@codeaurora.org>
Fixes: 99cbd064 ("clk: qcom: Support display RCG clocks")
[sboyd@codeaurora.org: Silenced unused parent variable warning]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[ kamal: backport to 3.19-stable: context ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

547dca92

freeing unlinked file indefinitely delayed · f644ddac

Al Viro authored Jul 08, 2015

commit 75a6f82a upstream.

	Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course).  However, there's one case where that
does *not* happen.  Namely, if you open it by fhandle with cold dcache,
then unlink() and close().

	In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal.  In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()).  The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in.  As the result, we have the final iput() delayed
indefinitely.  It's trivial to reproduce -

void flush_dcache(void)
{
        system("mount -o remount,rw /");
}

static char buf[20 * 1024 * 1024];

main()
{
        int fd;
        union {
                struct file_handle f;
                char buf[MAX_HANDLE_SZ];
        } x;
        int m;

        x.f.handle_bytes = sizeof(x);
        chdir("/root");
        mkdir("foo", 0700);
        fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
        close(fd);
        name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
        flush_dcache();
        fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
        unlink("foo/bar");
        write(fd, buf, sizeof(buf));
        system("df .");			/* 20Mb eaten */
        close(fd);
        system("df .");			/* should've freed those 20Mb */
        flush_dcache();
        system("df .");			/* should be the same as #2 */
}

will spit out something like
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 283282     21692  93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ kamal: backport to 3.19-stable: no fast_dput() ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f644ddac

9p: don't leave a half-initialized inode sitting around · 43af4b20

Al Viro authored Jul 12, 2015

commit 0a73d0a2 upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

43af4b20

parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results · fbe677d6

John David Anglin authored Jul 01, 2015

commit 01ab6057 upstream.

The increased use of pdtlb/pitlb instructions seemed to increase the
frequency of random segmentation faults building packages. Further, we
had a number of cases where TLB inserts would repeatedly fail and all
forward progress would stop. The Haskell ghc package caused a lot of
trouble in this area. The final indication of a race in pte handling was
this syslog entry on sibaris (C8000):

 swap_free: Unused swap offset entry 00000004
 BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
 addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
 CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
 Backtrace:
  [<0000000040173eb0>] show_stack+0x20/0x38
  [<0000000040444424>] dump_stack+0x9c/0x110
  [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
  [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
  [<00000000402a4090>] zap_page_range+0xf0/0x198
  [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0

Note that the pte value is 0 except for the accessed bit 0x100. This bit
shouldn't be set without the present bit.

It should be noted that the madvise system call is probably a trigger for many
of the random segmentation faults.

In looking at the kernel code, I found the following problems:

1) The pte_clear define didn't take TLB lock when clearing a pte.
2) We didn't test pte present bit inside lock in exception support.
3) The pte and tlb locks needed to merged in order to ensure consistency
between page table and TLB. This also has the effect of serializing TLB
broadcasts on SMP systems.

The attached change implements the above and a few other tweaks to try
to improve performance. Based on the timing code, TLB purges are very
slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
beneficial to test the split_tlb variable to avoid duplicate purges.
Probably, all PA 2.0 machines have combined TLBs.

I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
applications and most threads have a stack size that is too large to
make this useful. I added some comments to this effect.

Since implementing 1 through 3, I haven't had any random segmentation
faults on mx3210 (rp3440) in about one week of building code and running
as a Debian buildd.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

fbe677d6

cxl: Check if afu is not null in cxl_slbia · 6005d1af

Daniel Axtens authored Jul 10, 2015

commit 2c069a11 upstream.

The pointer to an AFU in the adapter's list of AFUs can be null
if we're in the process of removing AFUs. The afu_list_lock
doesn't guard against this.

Say we have 2 slices, and we're in the process of removing cxl.
 - We remove the AFUs in order (see cxl_remove). In cxl_remove_afu
   for AFU 0, we take the lock, set adapter->afu[0] = NULL, and
   release the lock.
 - Then we get an slbia. In cxl_slbia we take the lock, and set
   afu = adapter->afu[0], which is NULL.
 - Therefore our attempt to check afu->enabled will blow up.

Therefore, check if afu is a null pointer before dereferencing it.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6005d1af

hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead · bb7728b8

Joe Perches authored Mar 26, 2015

commit a28e4b2b upstream.

Removing unnecessary static buffers is good.
Use the vsprintf %pV extension instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bb7728b8

hpfs: kstrdup() out of memory handling · 2fab688e

Sanidhya Kashyap authored Mar 21, 2015

commit ce657611 upstream.

There is a possibility of nothing being allocated to the new_opts in
case of memory pressure, therefore return ENOMEM for such case.
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

2fab688e

selinux: don't waste ebitmap space when importing NetLabel categories · d80ad5a2

Paul Moore authored Jul 09, 2015

commit 33246035 upstream.

At present we don't create efficient ebitmaps when importing NetLabel
category bitmaps.  This can present a problem when comparing ebitmaps
since ebitmap_cmp() is very strict about these things and considers
these wasteful ebitmaps not equal when compared to their more
efficient counterparts, even if their values are the same.  This isn't
likely to cause problems on 64-bit systems due to a bit of luck on
how NetLabel/CIPSO works and the default ebitmap size, but it can be
a problem on 32-bit systems.

This patch fixes this problem by being a bit more intelligent when
importing NetLabel category bitmaps by skipping over empty sections
which should result in a nice, efficient ebitmap.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d80ad5a2

mm: avoid setting up anonymous pages into file mapping · b2ec585b

Kirill A. Shutemov authored Jul 06, 2015

commit 6b7339f4 upstream.

Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.

For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ kamal: backport to 3.19-stable: s/do_fault/do_linear_fault/ ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b2ec585b

drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2) · e198d1d8

Grigori Goronzy authored Jul 07, 2015

commit f3cbb17b upstream.

Everything is evicted from VRAM before suspend, so we need to make
sure all BOs are unpinned and re-pinned after resume. Fixes broken
mouse cursor after resume introduced by commit b9729b17.

[Michel Dänzer: Add pinning BOs on resume]

v2:
[Alex Deucher: merge cursor unpin into fb unpin loop]

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100541
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Grigori Goronzy <greg@chown.ath.cx>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

e198d1d8

drm/radeon: Clean up reference counting and pinning of the cursor BOs · 997f42e2

Michel Dänzer authored Jul 07, 2015

commit cd404af0 upstream.

Take a GEM reference for and pin the new cursor BO, unpin and drop the
GEM reference for the old cursor BO in radeon_crtc_cursor_set2, and use
radeon_crtc->cursor_addr in radeon_set_cursor.

This fixes radeon_cursor_reset accidentally incrementing the cursor BO
pin count, and cleans up the code a little.
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

997f42e2

05 Aug, 2015 14 commits

drm/radeon: fix HDP flushing · 8b72decc

Grigori Goronzy authored Jul 03, 2015

commit 54e03986 upstream.

This was regressed by commit 39e7f6f8, although I don't know of any
actual issues caused by it.

The storage domain is read without TTM locking now, but the lock
never helped to prevent any races.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Grigori Goronzy <greg@chown.ath.cx>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8b72decc

cxl: Fix off by one error allowing subsequent mmap page to be accessed · 8c980662

Ian Munsie authored Jul 07, 2015

commit 10a5894f upstream.

It was discovered that if a process mmaped their problem state area they
were able to access one page more than expected, potentially allowing
them to access the problem state area of an unrelated process.

This was due to a simple off by one error in the mmap fault handler
introduced in 0712dc7e ("cxl: Fix issues
when unmapping contexts"), which is fixed in this patch.

Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8c980662

powerpc/powernv: Fix race in updating core_idle_state · 55fbd98c

Shreyas B. Prabhu authored Jul 07, 2015

commit b32aadc1 upstream.

core_idle_state is maintained for each core. It uses 0-7 bits to track
whether a thread in the core has entered fastsleep or winkle. 8th bit is
used as a lock bit.
The lock bit is set in these 2 scenarios-
 - The thread is first in subcore to wakeup from sleep/winkle.
 - If its the last thread in the core about to enter sleep/winkle

While the lock bit is set, if any other thread in the core wakes up, it
loops until the lock bit is cleared before proceeding in the wakeup
path. This helps prevent race conditions w.r.t fastsleep workaround and
prevents threads from switching to process context before core/subcore
resources are restored.

But, in the path to sleep/winkle entry, we currently don't check for
lock-bit. This exposes us to following race when running with subcore
on-

First thread in the subcorea		Another thread in the same
waking up		   		core entering sleep/winkle

lwarx   r15,0,r14
ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
stwcx.  r15,0,r14
[Code to restore subcore state]

						lwarx   r15,0,r14
						[clear thread bit]
						stwcx.  r15,0,r14

andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
stw     r15,0(r14)

Here, after the thread entering sleep clears its thread bit in
core_idle_state, the value is overwritten by the thread waking up.
In such cases when the core enters fastsleep, code mistakes an idle
thread as running. Because of this, the first thread waking up from
fastsleep which is supposed to resync timebase skips it. So we can
end up having a core with stale timebase value.

This patch fixes the above race by looping on the lock bit even while
entering the idle states.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

55fbd98c

ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage · 39893e3b

Rafael J. Wysocki authored Jul 04, 2015

commit 0294112e upstream.

This effectively reverts the following three commits:

 7bc10388 ACPI / resources: free memory on error in add_region_before()
 0f1b414d ACPI / PNP: Avoid conflicting resource reservations
 b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources()

(commit b9a5e5e1 introduced regressions some of which, but not
all, were addressed by commit 0f1b414d and commit 7bc10388
was a fixup on top of the latter) and causes ACPI fixed hardware
resources to be reserved at the fs_initcall_sync stage of system
initialization.

The story is as follows.  First, a boot regression was reported due
to an apparent resource reservation ordering change after a commit
that shouldn't lead to such changes.  Investigation led to the
conclusion that the problem happened because acpi_reserve_resources()
was executed at the device_initcall() stage of system initialization
which wasn't strictly ordered with respect to driver initialization
(and with respect to the initialization of the pcieport driver in
particular), so a random change causing the device initcalls to be
run in a different order might break things.

The response to that was to attempt to run acpi_reserve_resources()
as soon as we knew that ACPI would be in use (commit b9a5e5e1).
However, that turned out to be too early, because it caused resource
reservations made by the PNP system driver to fail on at least one
system and that failure was addressed by commit 0f1b414d.

That fix still turned out to be insufficient, though, because
calling acpi_reserve_resources() before the fs_initcall stage of
system initialization caused a boot regression to happen on the
eCAFE EC-800-H20G/S netbook.  That meant that we only could call
acpi_reserve_resources() at the fs_initcall initialization stage
or later, but then we might just as well call it after the PNP
initalization in which case commit 0f1b414d wouldn't be
necessary any more.

For this reason, the changes made by commit 0f1b414d are reverted
(along with a memory leak fixup on top of that commit), the changes
made by commit b9a5e5e1 that went too far are reverted too and
acpi_reserve_resources() is changed into fs_initcall_sync, which
will cause it to be executed after the PNP subsystem initialization
(which is an fs_initcall) and before device initcalls (including
the pcieport driver initialization) which should avoid the initial
issue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
Link: http://marc.info/?t=143092384600002&r=1&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

39893e3b

ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2 · 934ab6eb

Roger Quadros authored Jun 17, 2015

commit 9ab402ae upstream.

Without this USB2 breaks if USB1 is disabled or USB1
initializes after USB2 e.g. due to deferred probing.

Fixes: 5a0f93c6 ("ARM: dts: Add am57xx-beagle-x15")
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

934ab6eb

ext4: replace open coded nofail allocation in ext4_free_blocks() · ef505188

Michal Hocko authored Jul 05, 2015

commit 7444a072 upstream.

ext4_free_blocks is looping around the allocation request and mimics
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open coded loop and replace it with __GFP_NOFAIL. Without the
flag the allocator has no way to find out never-fail requirement and
cannot help in any way.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ef505188

ext4: correctly migrate a file with a hole at the beginning · 47eb4b36

Eryu Guan authored Jul 04, 2015

commit 8974fec7 upstream.

Currently ext4_ind_migrate() doesn't correctly handle a file which
contains a hole at the beginning of the file.  This caused the migration
to be done incorrectly, and then if there is a subsequent following
delayed allocation write to the "hole", this would reclaim the same data
blocks again and results in fs corruption.

  # assmuing 4k block size ext4, with delalloc enabled
  # skip the first block and write to the second block
  xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile

  # converting to indirect-mapped file, which would move the data blocks
  # to the beginning of the file, but extent status cache still marks
  # that region as a hole
  chattr -e /mnt/ext4/testfile

  # delayed allocation writes to the "hole", reclaim the same data block
  # again, results in i_blocks corruption
  xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
  umount /mnt/ext4
  e2fsck -nf /dev/sda6
  ...
  Inode 53, i_blocks is 16, should be 8.  Fix? no
  ...
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

47eb4b36

ext4: be more strict when migrating to non-extent based file · ca842bc9

Eryu Guan authored Jul 03, 2015

commit d6f123a9 upstream.

Currently the check in ext4_ind_migrate() is not enough before doing the
real conversion:

a) delayed allocated extents could bypass the check on eh->eh_entries
   and eh->eh_depth

This can be demonstrated by this script

  xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
  chattr -e /mnt/ext4/testfile

where testfile has two extents but still be converted to non-extent
based file format.

b) only extent length is checked but not the offset, which would result
   in data lose (delalloc) or fs corruption (nodelalloc), because
   non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
   blocks

This can be demostrated by

  xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
  chattr -e /mnt/ext4/testfile
  sync

If delalloc is enabled, dmesg prints
  EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
  EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
  EXT4-fs (dm-4): This should not happen!! Data will be lost

If delalloc is disabled, e2fsck -nf shows corruption
  Inode 53, i_size is 5497558142976, should be 4096.  Fix? no

Fix the two issues by

a) forcing all delayed allocation blocks to be allocated before checking
   eh->eh_depth and eh->eh_entries
b) limiting the last logical block of the extent is within direct map
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ca842bc9

ext4: fix reservation release on invalidatepage for delalloc fs · bbc80299

Lukas Czerner authored Jul 03, 2015

commit 9705acd6 upstream.

On delalloc enabled file system on invalidatepage operation
in ext4_da_page_release_reservation() we want to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.

However currently there is a bug where on the systems with page size >
block size we will always remove extents from the start of the page
regardless where the actual delayed buffers are positioned in the page.
This leads to the errors like this:

EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks

This however can cause data loss on writeback time if the file system is
in ENOSPC condition because we're releasing reservation for someones
else delayed buffer.

Fix this by only removing extents that corresponds to the part of the
page we want to invalidate.

This problem is reproducible by the following fio receipt (however I was
only able to reproduce it with fio-2.1 or older.

[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bbc80299

ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp · 1fc56e88

Nikolay Borisov authored Jul 02, 2015

commit c45653c3 upstream.

Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible
deadlocks in the page writeback path.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1fc56e88

bufferhead: Add _gfp version for sb_getblk() · 52a0b6ce

Nikolay Borisov authored Jul 02, 2015

commit bd7ade3c upstream.

sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.

This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

52a0b6ce

Btrfs: fix fsync data loss after append write · 895ee048

Filipe Manana authored Jun 17, 2015

commit e4545de5 upstream.

If we do an append write to a file (which increases its inode's i_size)
that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
and the previous transaction added a new hard link to the file, which sets
the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
the file, the inode's new i_size isn't logged. This has the consequence
that after the fsync log is replayed, the file size remains what it was
before the append write operation, which means users/applications will
not be able to read the data that was successsfully fsync'ed before.

This happens because neither the inode item nor the delayed inode get
their i_size updated when the append write is made - doing so would
require starting a transaction in the buffered write path, something that
we do not do intentionally for performance reasons.

Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
set the inode is logged with its current i_size (log the in-memory inode
into the log tree).

This issue is not a recent regression and is easy to reproduce with the
following test case for fstests:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  here=`pwd`
  tmp=/tmp/$$
  status=1	# failure is the default!

  _cleanup()
  {
          _cleanup_flakey
          rm -f $tmp.*
  }
  trap "_cleanup; exit \$status" 0 1 2 3 15

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs generic
  _supported_os Linux
  _need_to_be_root
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  _crash_and_mount()
  {
          # Simulate a crash/power loss.
          _load_flakey_table $FLAKEY_DROP_WRITES
          _unmount_flakey
          # Allow writes again and mount. This makes the fs replay its fsync log.
          _load_flakey_table $FLAKEY_ALLOW_WRITES
          _mount_flakey
  }

  rm -f $seqres.full

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create the test file with some initial data and then fsync it.
  # The fsync here is only needed to trigger the issue in btrfs, as it causes the
  # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                  -c "fsync" \
                  $SCRATCH_MNT/foo | _filter_xfs_io
  sync

  # Add a hard link to our file.
  # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
  # which is a necessary condition to trigger the issue.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Sync the filesystem to force a commit of the current btrfs transaction, this
  # is a necessary condition to trigger the bug on btrfs.
  sync

  # Now append more data to our file, increasing its size, and fsync the file.
  # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
  # write path did not update the inode item in the btree nor the delayed inode
  # item (in memory struture) in the current transaction (created by the fsync
  # handler), the fsync did not record the inode's new i_size in the fsync
  # log/journal. This made the data unavailable after the fsync log/journal is
  # replayed.
  $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
               -c "fsync" \
               $SCRATCH_MNT/foo | _filter_xfs_io

  echo "File content after fsync and before crash:"
  od -t x1 $SCRATCH_MNT/foo

  _crash_and_mount

  echo "File content after crash and log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected file output before and after the crash/power failure expects the
appended data to be available, which is:

  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0200000
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

895ee048

Btrfs: fix race between caching kthread and returning inode to inode cache · ff30e038

Filipe Manana authored Jun 13, 2015

commit ae9d8f17 upstream.

While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.

This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up doing freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.

This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:

[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
[132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
[132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
[132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
[132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
[132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
[132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
[132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
[132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
[132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!

Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.

Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ff30e038

Btrfs: use kmem_cache_free when freeing entry in inode cache · 4899b628

Filipe Manana authored Jun 13, 2015

commit c3f4a168 upstream.

The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:

  /*
   * (...)
   *
   * Don't free memory not originally allocated by kmalloc()
   * or you will run into trouble.
   */

So better be safe and use kmem_cache_free().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4899b628

04 Aug, 2015 1 commit

sg_start_req(): make sure that there's not too many elements in iovec · b50ae828

Al Viro authored Aug 01, 2015

commit 451a2886 upstream.

unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there.  If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.

X-Coverup: TINC (and there's no lumber cartel either)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: s/MAX_UIOVEC/UIO_MAXIOV/. This was fixed upstream by commit
 fdc81f45 ("sg_start_req(): use import_iovec()"), but we don't have
 that function.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reference: CVE-2015-5707
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b50ae828

30 Jul, 2015 3 commits

KEYS: ensure we free the assoc array edit if edit is valid · 21b0f383

Colin Ian King authored Jul 27, 2015

commit ca4da5dd upstream.

__key_link_end is not freeing the associated array edit structure
and this leads to a 512 byte memory leak each time an identical
existing key is added with add_key().

The reason the add_key() system call returns okay is that
key_create_or_update() calls __key_link_begin() before checking to see
whether it can update a key directly rather than adding/replacing - which
it turns out it can.  Thus __key_link() is not called through
__key_instantiate_and_link() and __key_link_end() must cancel the edit.

CVE-2015-1333
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

21b0f383

x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · cff4ef3d

Andy Lutomirski authored Jul 10, 2015

commit 810bc075 upstream.

We have a tricky bug in the nested NMI code: if we see RSP pointing
to the NMI stack on NMI entry from kernel mode, we assume that we
are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point RSP
at the NMI stack, issue SYSCALL, and arrange for an NMI to happen
while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that the RSP
check is intended to detect.  IRET will clear DF atomically.

(Note: other than paravirt, there's little need for all this complexity.
 We could check RIP instead of RSP.)

Fixes CVE-2015-3291.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
CVE-2015-3291
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

cff4ef3d

x86/nmi/64: Reorder nested NMI checks · 7eddd205

Andy Lutomirski authored Jul 12, 2015

commit a27507ca upstream.

Check the repeat_nmi .. end_repeat_nmi special case first.  The next
patch will rework the RSP check and, as a side effect, the RSP check
will no longer detect repeat_nmi .. end_repeat_nmi, so we'll need
this ordering of the checks.

Note: this is more subtle than it appears.  The check for repeat_nmi
.. end_repeat_nmi jumps straight out of the NMI code instead of
adjusting the "iret" frame to force a repeat.  This is necessary,
because the code between repeat_nmi and end_repeat_nmi sets "NMI
executing" and then writes to the "iret" frame itself.  If a nested
NMI comes in and modifies the "iret" frame while repeat_nmi is also
modifying it, we'll end up with garbage.  The old code got this
right, as does the new code, but the new code is a bit more
explicit.

If we were to move the check right after the "NMI executing" check,
then we'd get it wrong and have random crashes.

This is a prerequisite for the fix for CVE-2015-3291.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, spacing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

7eddd205