Commits · 1264f1e3728bb2e684395148ec61468c5edbf66c · Kirill Smelkov / linux

23 Mar, 2015 6 commits

workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · 1264f1e3

Tejun Heo authored Mar 05, 2015

commit 8603e1b3 upstream.

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

1264f1e3

cpuset: Fix cpuset sched_relax_domain_level · 6efeedf2

Jason Low authored Feb 13, 2015

commit 283cb41f upstream.

The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.

The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.

This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.

Fixes: fc560a26 ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

6efeedf2

mtd: nand: pxa3xx: Fix PIO FIFO draining · 41ba1af9

Maxime Ripard authored Feb 18, 2015

commit 8dad0386 upstream.

The NDDB register holds the data that are needed by the read and write
commands.

However, during a read PIO access, the datasheet specifies that after each 32
bytes read in that register, when BCH is enabled, we have to make sure that the
RDDREQ bit is set in the NDSR register.

This fixes an issue that was seen on the Armada 385, and presumably other mvebu
SoCs, when a read on a newly erased page would end up in the driver reporting a
timeout from the NAND.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

41ba1af9

spi: atmel: Fix interrupt setup for PDC transfers · 293a6372

Torsten Fleischer authored Feb 24, 2015

commit 76e1d14b upstream.

Additionally to the current DMA transfer the PDC allows to set up a next DMA
transfer. This is useful for larger SPI transfers.

The driver currently waits for ENDRX as end of the transfer. But ENDRX is set
when the current DMA transfer is done (RCR = 0), i.e. it doesn't include the
next DMA transfer.
Thus a subsequent SPI transfer could be started although there is currently a
transfer in progress. This can cause invalid accesses to the SPI slave devices
and to SPI transfer errors.

This issue has been observed on a hardware with a M25P128 SPI NOR flash.

So instead of ENDRX we should wait for RXBUFF. This flag is set if there is
no more DMA transfer in progress (RCR = RNCR = 0).
Signed-off-by: Torsten Fleischer <torfl6749@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

293a6372

spi: dw: revisit FIFO size detection again · e44ea338

Andy Shevchenko authored Feb 25, 2015

commit 9d239d35 upstream.

The commit d297933c (spi: dw: Fix detecting FIFO depth) tries to fix the
logic of the FIFO detection based on the description on the comments. However,
there is a slight difference between numbers in TX Level and TX FIFO size.

So, by specification the FIFO size would be in a range 2-256 bytes. From TX
Level prospective it means we can set threshold in the range 0-(FIFO size - 1)
bytes. Hence there are currently two issues:
  a) FIFO size 2 bytes is actually skipped since TX Level is 1 bit and could be
     either 0 or 1 byte;
  b) FIFO size is incorrectly decreased by 1 which already done by meaning of
     TX Level register.

This patch fixes it eventually right.

Fixes: d297933c (spi: dw: Fix detecting FIFO depth)
Reviewed-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

e44ea338

PCI: Don't read past the end of sysfs "driver_override" buffer · d81b1275

Sasha Levin authored Feb 04, 2015

commit 4efe874a upstream.

When printing the driver_override parameter when it is 4095 and 4094 bytes
long, the printing code would access invalid memory because we need count+1
bytes for printing.

Fixes: 782a985d ("PCI: Introduce new device binding path using pci_dev.driver_override")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Alexander Graf <agraf@suse.de>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d81b1275

20 Mar, 2015 4 commits

ip6_gre: fix endianness errors in ip6gre_err · 468763f7

Sabrina Dubroca authored Feb 04, 2015

commit d1e158e2 upstream.

info is in network byte order, change it back to host byte order
before use. In particular, the current code sets the MTU of the tunnel
to a wrong (too big) value.

Fixes: c12b395a ("gre: Support GRE over IPv6")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

468763f7

qlge: Fix qlge_update_hw_vlan_features to handle if interface is down · 167fa487

Marcelo Leitner authored Jan 30, 2015

commit 61132bf7 upstream.

Currently qlge_update_hw_vlan_features() will always first put the
interface down, then update features and then bring it up again. But it
is possible to hit this code while the adapter is down and this causes a
non-paired call to napi_disable(), which will get stuck.

This patch fixes it by skipping these down/up actions if the interface
is already down.

Fixes: a45adbe8 ("qlge: Enhance nested VLAN (Q-in-Q) handling.")
Cc: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

167fa487

net: cls_bpf: fix auto generation of per list handles · d817b892

Daniel Borkmann authored Jan 22, 2015

commit 3f2ab135 upstream.

When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually
altering of specific filters is being addressed over major id, but
in case of collisions we result in a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for head->hgen handle
in cls_bpf_get() and in case we found a free handle, we're supposed
to use exactly head->hgen. In case of insufficient numbers of handles,
we bail out later as handle id 0 is not allowed.

Fixes: 7d1d65cb ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d817b892

net: cls_bpf: fix size mismatch on filter preparation · 709178a6

Daniel Borkmann authored Jan 22, 2015

commit 7913ecf6 upstream.

In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block on that size, and copy over the
BPF instruction blob from user space, then pass everything through the
classic BPF checker prior to installation of the classifier.

We should reject mismatches here, there are 2 scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected from the verifier or a valid BPF program will be run;
in the other case, we'll end up copying more than we're supposed to,
and most likely the trailing garbage will be rejected by the verifier
as well (i.e. we need to fit instruction pattern, ret {A,K} needs to be
last instruction, load/stores must be correct, etc); in case not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len() as Dave noted to avoid this from the beginning.
Anyway, lets fix it by rejecting such load attempts.

Fixes: 7d1d65cb ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

709178a6

18 Mar, 2015 30 commits

ath5k: fix spontaneus AR5312 freezes · 032f0e4f

Sergey Ryazanov authored Feb 04, 2015

commit 8bfae4f9 upstream.

Sometimes while CPU have some load and ath5k doing the wireless
interface reset the whole WiSoC completely freezes. Set of tests shows
that using atomic delay function while we wait interface reset helps to
avoid such freezes.

The easiest way to reproduce this issue: create a station interface,
start continous scan with wpa_supplicant and load CPU by something. Or
just create multiple station interfaces and put them all in continous
scan.

This patch partially reverts the commit 1846ac3d ("ath5k: Use
usleep_range where possible"), which replaces initial udelay()
by usleep_range().

I do not know actual source of this issue, but all looks like that HW
freeze is caused by transaction on internal SoC bus, while wireless
block is in reset state.

Also I should note that I do not know how many chips are affected, but I
did not see this issue with chips, other than AR5312.

CC: Jiri Slaby <jirislaby@gmail.com>
CC: Nick Kossifidis <mickflemm@gmail.com>
CC: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Fixes: 1846ac3d ("ath5k: Use usleep_range where possible")
Reported-by: Christophe Prevotaux <c.prevotaux@rural-networks.com>
Tested-by: Christophe Prevotaux <c.prevotaux@rural-networks.com>
Tested-by: Eric Bree <ebree@nltinc.com>
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

032f0e4f

SUNRPC: Always manipulate rpc_rqst::rq_bc_pa_list under xprt->bc_pa_lock · bd074fc2

Chuck Lever authored Feb 13, 2015

commit 813b00d6 upstream.

Other code that accesses rq_bc_pa_list holds xprt->bc_pa_lock.
xprt_complete_bc_request() should do the same.

Fixes: 2ea24497 ("SUNRPC: RPC callbacks may be split . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

bd074fc2

svcrpc: fix memory leak in gssp_accept_sec_context_upcall · ee0edc3e

David Ramos authored Feb 13, 2015

commit a1d1e9be upstream.

Our UC-KLEE tool found a kernel memory leak of 512 bytes (on x86_64) for
each call to gssp_accept_sec_context_upcall()
(net/sunrpc/auth_gss/gss_rpc_upcall.c). Since it appears that this call
can be triggered by remote connections (at least, from a cursory a
glance at the call chain), it may be exploitable to cause kernel memory
exhaustion. We found the bug in kernel 3.16.3, but it appears to date
back to commit 9dfd87da (2013-08-20).

The gssp_accept_sec_context_upcall() function performs a pair of calls
to gssp_alloc_receive_pages() and gssp_free_receive_pages().  The first
allocates memory for arg->pages.  The second then frees the pages
pointed to by the arg->pages array, but not the array itself.
Reported-by: David A. Ramos <daramos@stanford.edu>
Fixes: 9dfd87da ("rpc: fix huge kmalloc's in gss-proxy”)
Signed-off-by: David A. Ramos <daramos@stanford.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ee0edc3e

sunrpc: fix braino in ->poll() · 4574bc36

Al Viro authored Mar 07, 2015

commit 1711fd9a upstream.

POLL_OUT isn't what callers of ->poll() are expecting to see; it's
actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap
bit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

4574bc36

TTY: fix tty_wait_until_sent on 64-bit machines · 5a4dbe15

Johan Hovold authored Mar 04, 2015

commit 79fbf4a5 upstream.

Fix overflow bug in tty_wait_until_sent on 64-bit machines, where an
infinite timeout (0) would be passed to the underlying tty-driver's
wait_until_sent-operation as a negative timeout (-1), causing it to
return immediately.

This manifests itself for example as tcdrain() returning immediately,
drivers not honouring the drain flags when setting terminal attributes,
or even dropped data on close as a requested infinite closing-wait
timeout would be ignored.

The first symptom  was reported by Asier LLANO who noted that tcdrain()
returned prematurely when using the ftdi_sio usb-serial driver.

Fix this by passing 0 rather than MAX_SCHEDULE_TIMEOUT (LONG_MAX) to the
underlying tty driver.

Note that the serial-core wait_until_sent-implementation is not affected
by this bug due to a lucky chance (comparison to an unsigned maximum
timeout), and neither is the cyclades one that had an explicit check for
negative timeouts, but all other tty drivers appear to be affected.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: ZIV-Asier Llano Palacios <asier.llano@cgglobal.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

5a4dbe15

USB: serial: fix infinite wait_until_sent timeout · 8b47d032

Johan Hovold authored Mar 04, 2015

commit f528bf4f upstream.

Make sure to handle an infinite timeout (0).

Note that wait_until_sent is currently never called with a 0-timeout
argument due to a bug in tty_wait_until_sent.

Fixes: dcf01050 ("USB: serial: add generic wait_until_sent
implementation")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

8b47d032

net: irda: fix wait_until_sent poll timeout · 232cccf2

Johan Hovold authored Mar 04, 2015

commit 2c3fbe3c upstream.

In case an infinite timeout (0) is requested, the irda wait_until_sent
implementation would use a zero poll timeout rather than the default
200ms.

Note that wait_until_sent is currently never called with a 0-timeout
argument due to a bug in tty_wait_until_sent.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

232cccf2

console: Fix console name size mismatch · 11f41b27

Peter Hurley authored Mar 01, 2015

commit 30a22c21 upstream.

commit 6ae9200f ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

11f41b27

tty: fix up atime/mtime mess, take four · c468bf26

Jiri Slaby authored Feb 27, 2015

commit f0bf0bd0 upstream.

This problem was taken care of three times already in
* b0de59b5 (TTY: do not update
  atime/mtime on read/write),
* 37b7f3c7 (TTY: fix atime/mtime
  regression), and
* b0b88565 (tty: fix up atime/mtime
  mess, take three)

But it still misses one point. As John Paul correctly points out, we
do not care about setting date. If somebody ever changes wall
time backwards (by mistake for example), tty timestamps are never
updated until the original wall time passes.

So check the absolute difference of times and if it large than "8
seconds or so", always update the time. That means we will update
immediatelly when changing time. Ergo, CAP_SYS_TIME can foul the
check, but it was always that way.

Thanks John for serving me this so nicely debugged.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: John Paul Perry <john_paul.perry@alcatel-lucent.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

c468bf26

Change email address for 8250_pci · 5b428ea8

Russell King authored Mar 06, 2015

commit f2e0ea86 upstream.

I'm still receiving reports to my email address, so let's point this
at the linux-serial mailing list instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

5b428ea8

xhci: Workaround for PME stuck issues in Intel xhci · 41f2c377

Mathias Nyman authored Mar 06, 2015

commit b8cb91e0 upstream.

The xhci in Intel Sunrisepoint and Cherryview platforms need a driver
workaround for a Stuck PME that might either block PME events in suspend,
or create spurious PME events preventing runtime suspend.

Workaround is to clear a internal PME flag, BIT(28) in a vendor specific
PMCTRL register at offset 0x80a4, in both suspend resume callbacks

Without this, xhci connected usb devices might never be able to wake up the
system from suspend, or prevent device from going to suspend (xhci d3)
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

41f2c377

xhci: fix reporting of 0-sized URBs in control endpoint · 34a58e7e

Aleksander Morgado authored Mar 06, 2015

commit 45ba2154 upstream.

When a control transfer has a short data stage, the xHCI controller generates
two transfer events: a COMP_SHORT_TX event that specifies the untransferred
amount, and a COMP_SUCCESS event. But when the data stage is not short, only the
COMP_SUCCESS event occurs. Therefore, xhci-hcd must set urb->actual_length to
urb->transfer_buffer_length while processing the COMP_SUCCESS event, unless
urb->actual_length was set already by a previous COMP_SHORT_TX event.

The driver checks this by seeing whether urb->actual_length == 0, but this alone
is the wrong test, as it is entirely possible for a short transfer to have an
urb->actual_length = 0.

This patch changes the xhci driver to rely on a new td->urb_length_set flag,
which is set to true when a COMP_SHORT_TX event is received and the URB length
updated at that stage.

This fixes a bug which affected the HSO plugin, which relies on URBs with
urb->actual_length == 0 to halt re-submitting the RX URB in the control
endpoint.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

34a58e7e

Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. · 028a0a83

Quentin Casasnovas authored Mar 03, 2015

commit dd9ef135 upstream.

Improper arithmetics when calculting the address of the extended ref could
lead to an out of bounds memory read and kernel panic.
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

028a0a83

Btrfs: fix data loss in the fast fsync path · f8da4b44

Filipe Manana authored Mar 01, 2015

commit 3a8b36f3 upstream.

When using the fast file fsync code path we can miss the fact that new
writes happened since the last file fsync and therefore return without
waiting for the IO to finish and write the new extents to the fsync log.

Here's an example scenario where the fsync will miss the fact that new
file data exists that wasn't yet durably persisted:

1. fs_info->last_trans_committed == N - 1 and current transaction is
   transaction N (fs_info->generation == N);

2. do a buffered write;

3. fsync our inode, this clears our inode's full sync flag, starts
   an ordered extent and waits for it to complete - when it completes
   at btrfs_finish_ordered_io(), the inode's last_trans is set to the
   value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
   btrfs_set_inode_last_trans);

4. transaction N is committed, so fs_info->last_trans_committed is now
   set to the value N and fs_info->generation remains with the value N;

5. do another buffered write, when this happens btrfs_file_write_iter
   sets our inode's last_trans to the value N + 1 (that is
   fs_info->generation + 1 == N + 1);

6. transaction N + 1 is started and fs_info->generation now has the
   value N + 1;

7. transaction N + 1 is committed, so fs_info->last_trans_committed
   is set to the value N + 1;

8. fsync our inode - because it doesn't have the full sync flag set,
   we only start the ordered extent, we don't wait for it to complete
   (only in a later phase) therefore its last_trans field has the
   value N + 1 set previously by btrfs_file_write_iter(), and so we
   have:

       inode->last_trans <= fs_info->last_trans_committed
           (N + 1)              (N + 1)

   Which made us not log the last buffered write and exit the fsync
   handler immediately, returning success (0) to user space and resulting
   in data loss after a crash.

This can actually be triggered deterministically and the following excerpt
from a testcase I made for xfstests triggers the issue. It moves a dummy
file across directories and then fsyncs the old parent directory - this
is just to trigger a transaction commit, so moving files around isn't
directly related to the issue but it was chosen because running 'sync' for
example does more than just committing the current transaction, as it
flushes/waits for all file data to be persisted. The issue can also happen
at random periods, since the transaction kthread periodicaly commits the
current transaction (about every 30 seconds by default).
The body of the test is:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our main test file 'foo', the one we check for data loss.
  # By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
  # bit from its flags (btrfs inode specific flags).
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
                  -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io

  # Now create one other file and 2 directories. We will move this second file
  # from one directory to the other later because it forces btrfs to commit its
  # currently open transaction if we fsync the old parent directory. This is
  # necessary to trigger the data loss bug that affected btrfs.
  mkdir $SCRATCH_MNT/testdir_1
  touch $SCRATCH_MNT/testdir_1/bar
  mkdir $SCRATCH_MNT/testdir_2

  # Make sure everything is durably persisted.
  sync

  # Write more 8Kb of data to our file.
  $XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io

  # Move our 'bar' file into a new directory.
  mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar

  # Fsync our first directory. Because it had a file moved into some other
  # directory, this made btrfs commit the currently open transaction. This is
  # a condition necessary to trigger the data loss bug.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1

  # Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
  # data we wrote previously to be persisted and available if a crash happens.
  # This did not happen with btrfs, because of the transaction commit that
  # happened when we fsynced the parent directory.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Now check that all data we wrote before are available.
  echo "File content after log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected golden output for the test, which is what we get with this
fix applied (or when running against ext3/4 and xfs), is:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 8192/8192 bytes at offset 8192
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0040000

Without this fix applied, the output shows the test file does not have
the second 8Kb extent that we successfully fsynced:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 8192/8192 bytes at offset 8192
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000

So fix this by skipping the fsync only if we're doing a full sync and
if the inode's last_trans is <= fs_info->last_trans_committed, or if
the inode is already in the log. Also remove setting the inode's
last_trans in btrfs_file_write_iter since it's useless/unreliable.

Also because btrfs_file_write_iter no longer sets inode->last_trans to
fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
bail out if last_trans is 0, otherwise something as simple as the following
example wouldn't log the second write on the last fsync:

  1. write to file

  2. fsync file

  3. fsync file
       |--> btrfs_inode_in_log() returns true and it set last_trans to 0

  4. write to file
       |--> btrfs_file_write_iter() no longers sets last_trans, so it
            remained with a value of 0
  5. fsync
       |--> inode->last_trans == 0, so it bails out without logging the
            second write

A test case for xfstests will be sent soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

f8da4b44

x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization · c1dc5bb3

Andy Lutomirski authored Mar 05, 2015

commit 956421fb upstream.

'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
[ Backported from tip:x86/asm. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

c1dc5bb3

ASoC: omap-pcm: Correct dma mask · ae43ec32

Peter Ujfalusi authored Mar 03, 2015

commit d51199a8 upstream.

DMA_BIT_MASK of 64 is not valid dma address mask for OMAPs, it should be
set to 32.
The 64 was introduced by commit (in 2009):
a152ff24 ASoC: OMAP: Make DMA 64 aligned

But the dma_mask and coherent_dma_mask can not be used to specify alignment.

Fixes: a152ff24 (ASoC: OMAP: Make DMA 64 aligned)
Reported-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ae43ec32

ACPI / video: Load the module even if ACPI is disabled · 9491ccd6

Chris Wilson authored Mar 01, 2015

commit 6e17cb12 upstream.

i915.ko depends upon the acpi/video.ko module and so refuses to load if
ACPI is disabled at runtime if for example the BIOS is broken beyond
repair. acpi/video provides an optional service for i915.ko and so we
should just allow the modules to load, but do no nothing in order to let
the machines boot correctly.
Reported-by: Bill Augur <bill-auger@programmer.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Aaron Lu <aaron.lu@intel.com>
[ rjw: Fixed up the new comment in acpi_video_init() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

9491ccd6

drm/radeon: fix interlaced modes on DCE8 · d379f9d5

Alex Deucher authored Mar 03, 2015

commit 77ae5f4b upstream.

Need to double the viewport height.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d379f9d5

drm/radeon: fix DRM_IOCTL_RADEON_CS oops · accc44db

Tommi Rantala authored Mar 02, 2015

commit a28b2a47 upstream.

Passing zeroed drm_radeon_cs struct to DRM_IOCTL_RADEON_CS produces the
following oops.

Fix by always calling INIT_LIST_HEAD() to avoid the crash in list_sort().

----------------------------------

 #include <stdint.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/ioctl.h>
 #include <drm/radeon_drm.h>

 static const struct drm_radeon_cs cs;

 int main(int argc, char **argv)
 {
         return ioctl(open(argv[1], O_RDWR), DRM_IOCTL_RADEON_CS, &cs);
 }

----------------------------------

[ttrantal@test2 ~]$ ./main /dev/dri/card0
[   46.904650] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   46.905022] IP: [<ffffffff814d6df2>] list_sort+0x42/0x240
[   46.905022] PGD 68f29067 PUD 688b5067 PMD 0
[   46.905022] Oops: 0002 [#1] SMP
[   46.905022] CPU: 0 PID: 2413 Comm: main Not tainted 4.0.0-rc1+ #58
[   46.905022] Hardware name: Hewlett-Packard HP Compaq dc5750 Small Form Factor/0A64h, BIOS 786E3 v02.10 01/25/2007
[   46.905022] task: ffff880058e2bcc0 ti: ffff880058e64000 task.ti: ffff880058e64000
[   46.905022] RIP: 0010:[<ffffffff814d6df2>]  [<ffffffff814d6df2>] list_sort+0x42/0x240
[   46.905022] RSP: 0018:ffff880058e67998  EFLAGS: 00010246
[   46.905022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   46.905022] RDX: ffffffff81644410 RSI: ffff880058e67b40 RDI: ffff880058e67a58
[   46.905022] RBP: ffff880058e67a88 R08: 0000000000000000 R09: 0000000000000000
[   46.905022] R10: ffff880058e2bcc0 R11: ffffffff828e6ca0 R12: ffffffff81644410
[   46.905022] R13: ffff8800694b8018 R14: 0000000000000000 R15: ffff880058e679b0
[   46.905022] FS:  00007fdc65a65700(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[   46.905022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.905022] CR2: 0000000000000000 CR3: 0000000058dd9000 CR4: 00000000000006f0
[   46.905022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   46.905022] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[   46.905022] Stack:
[   46.905022]  ffff880058e67b40 ffff880058e2bcc0 ffff880058e67a78 0000000000000000
[   46.905022]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   46.905022]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   46.905022] Call Trace:
[   46.905022]  [<ffffffff81644a65>] radeon_cs_parser_fini+0x195/0x220
[   46.905022]  [<ffffffff81645069>] radeon_cs_ioctl+0xa9/0x960
[   46.905022]  [<ffffffff815e1f7c>] drm_ioctl+0x19c/0x640
[   46.905022]  [<ffffffff810f8fdd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[   46.905022]  [<ffffffff810f90ad>] ? trace_hardirqs_on+0xd/0x10
[   46.905022]  [<ffffffff8160c066>] radeon_drm_ioctl+0x46/0x80
[   46.905022]  [<ffffffff81211868>] do_vfs_ioctl+0x318/0x570
[   46.905022]  [<ffffffff81462ef6>] ? selinux_file_ioctl+0x56/0x110
[   46.905022]  [<ffffffff81211b41>] SyS_ioctl+0x81/0xa0
[   46.905022]  [<ffffffff81dc6312>] system_call_fastpath+0x12/0x17
[   46.905022] Code: 48 89 b5 10 ff ff ff 0f 84 03 01 00 00 4c 8d bd 28 ff ff
ff 31 c0 48 89 fb b9 15 00 00 00 49 89 d4 4c 89 ff f3 48 ab 48 8b 46 08 <48> c7
00 00 00 00 00 48 8b 0e 48 85 c9 0f 84 7d 00 00 00 c7 85
[   46.905022] RIP  [<ffffffff814d6df2>] list_sort+0x42/0x240
[   46.905022]  RSP <ffff880058e67998>
[   46.905022] CR2: 0000000000000000
[   47.149253] ---[ end trace 09576b4e8b2c20b8 ]---
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

accc44db

drm/radeon: do a posting read in cik_set_irq · ed9becce

Alex Deucher authored Mar 02, 2015

commit cffefd9b upstream.

To make sure the writes go through the pci bridge.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=90741Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ed9becce

drm/radeon: do a posting read in si_set_irq · 7fd68839

Alex Deucher authored Mar 02, 2015

commit 0586915e upstream.

To make sure the writes go through the pci bridge.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=90741Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

7fd68839

drm/radeon: do a posting read in evergreen_set_irq · 5d4c2f2b

Alex Deucher authored Mar 02, 2015

commit c320bb5f upstream.

To make sure the writes go through the pci bridge.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=90741Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

5d4c2f2b

drm/radeon: do a posting read in r600_set_irq · d76ffb4c

Alex Deucher authored Mar 02, 2015

commit 9d1393f2 upstream.

To make sure the writes go through the pci bridge.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=90741Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d76ffb4c

drm/radeon: do a posting read in rs600_set_irq · ec1975e1

Alex Deucher authored Mar 02, 2015

commit 54acf107 upstream.

To make sure the writes go through the pci bridge.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=90741Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ec1975e1

drm/radeon: do a posting read in r100_set_irq · d1ba0003

Alex Deucher authored Mar 02, 2015

commit f957063f upstream.

To make sure the writes go through the pci bridge.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=90741Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d1ba0003

eCryptfs: don't pass fs-specific ioctl commands through · 1c28096a

Tyler Hicks authored Feb 24, 2015

commit 6d65261a upstream.

eCryptfs can't be aware of what to expect when after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.

One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.

This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.

https://bugzilla.kernel.org/show_bug.cgi?id=93691
https://launchpad.net/bugs/1305335Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

1c28096a

ecryptfs: ->f_op is never NULL · ae4538c2

Al Viro authored Sep 02, 2014

commit c2e3f5d5 upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ luis: 3.16-stable, just to make 6d65261a ("eCryptfs: don't pass
  fs-specific ioctl commands through") a clean cherry-pick ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ae4538c2

usb: ftdi_sio: Add jtag quirk support for Cyber Cortex AV boards · 4795f346

Max Mansfield authored Mar 02, 2015

commit c7d373c3 upstream.

This patch integrates Cyber Cortex AV boards with the existing
ftdi_jtag_quirk in order to use serial port 0 with JTAG which is
required by the manufacturers' software.

Steps: 2

[ftdi_sio_ids.h]
1. Defined the device PID

[ftdi_sio.c]
2. Added a macro declaration to the ids array, in order to enable the
jtag quirk for the device.
Signed-off-by: Max Mansfield <max.m.mansfield@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

4795f346

KVM: MIPS: Fix trace event to save PC directly · d48fc177

James Hogan authored Feb 24, 2015

commit b3cffac0 upstream.

Currently the guest exit trace event saves the VCPU pointer to the
structure, and the guest PC is retrieved by dereferencing it when the
event is printed rather than directly from the trace record. This isn't
safe as the printing may occur long afterwards, after the PC has changed
and potentially after the VCPU has been freed. Usually this results in
the same (wrong) PC being printed for multiple trace events. It also
isn't portable as userland has no way to access the VCPU data structure
when interpreting the trace record itself.

Lets save the actual PC in the structure so that the correct value is
accessible later.

Fixes: 669e846e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d48fc177

btrfs: fix lost return value due to variable shadowing · d5d21e3a

David Sterba authored Feb 24, 2015

commit 1932b7be upstream.

A block-local variable stores error code but btrfs_get_blocks_direct may
not return it in the end as there's a ret defined in the function scope.

Fixes: d187663e ("Btrfs: lock extents as we map them in DIO")
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d5d21e3a