Commits · 41585808f5f877106a69b8ab4d6ca8d660a522a3 · Kirill Smelkov / linux

12 Apr, 2016 40 commits

perf/x86/pebs: Add workaround for broken OVFL status on HSW+ · 41585808

Stephane Eranian authored Mar 03, 2016

commit 8077eca0 upstream.

This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on
Haswell, Broadwell and Skylake processors when using PEBS.

The SDM stipulates that when the PEBS interrupt threshold is crossed,
an interrupt is posted and the kernel is interrupted. The kernel will
find GLOBAL_OVF_STATUS bit 62 set indicating there are PEBS records to
drain. But the bits corresponding to the actual counters should NOT be
set. The kernel follows the SDM and assumes that all PEBS events are
processed in the drain_pebs() callback. The kernel then checks for
remaining overflows on any other (non-PEBS) events and processes these
in the for_each_bit_set(&status) loop.

As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be
set as well. In that case, the kernel drains PEBS and generates
SAMPLES with the EXACT tag, then it processes the counter bits, and
generates normal (non-EXACT) SAMPLES.

I ran into this problem by trying to understand why on HSW sampling on
a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user level code because HSW has the
eventing_ip which always points to the instruction that caused the
event.

The workaround in this patch simply ensures that the bits for the
counters used for PEBS events are cleared after the PEBS buffer has
been drained. With this fix 100% of the PEBS samples on my user code
report the EXACT tag.
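A minimal sketch of the idea behind the workaround, not the kernel code itself: once bit 62 has been seen and the PEBS buffer drained, the bits of the counters used for PEBS events are masked out of the status word before the regular overflow loop runs. The function name, the `pebs_enabled` mask parameter and the standalone form are assumptions made for illustration.

```c
#include <stdint.h>

/* Bit 62 of the global overflow status: "PEBS records need draining". */
#define PEBS_OVF_BIT (1ULL << 62)

/*
 * Hypothetical model of the status filtering in the PMU interrupt path.
 * 'status' is the raw overflow status; 'pebs_enabled' has one bit set
 * per counter that is currently programmed for a PEBS event.
 * Returns the bits that still need regular (non-EXACT) processing.
 */
uint64_t filter_overflow_status(uint64_t status, uint64_t pebs_enabled)
{
    if (status & PEBS_OVF_BIT) {
        /* drain_pebs() would run here and emit the EXACT samples. */
        status &= ~PEBS_OVF_BIT;
        /* Workaround: these counters were already handled by the drain. */
        status &= ~pebs_enabled;
    }
    return status;
}
```

With counter 0 used for PEBS and counters 0 and 2 flagged alongside bit 62, only counter 2 is left for the regular loop.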

Before:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is missing

After:
  $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
  $ perf report -D | fgrep SAMPLES
  PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                           \--- EXACT tag is set

The problem tends to appear more often when multiple PEBS events are used.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sched/cputime: Fix steal time accounting vs. CPU hotplug · a9a3cef5

Thomas Gleixner authored Mar 04, 2016

commit e9532e69 upstream.

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wrecks itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insanely large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.
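The wrap-around described above can be reproduced in a few lines of user-space C. This is only an illustration under assumptions: `struct rq_model` and both helpers are hypothetical stand-ins for the runqueue bookkeeping, not kernel interfaces.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-runqueue bookkeeping. */
struct rq_model {
    uint64_t prev_steal_time;
};

/* Mirrors the quoted arithmetic: an unsigned subtraction that can wrap. */
uint64_t steal_delta(const struct rq_model *rq, uint64_t steal_clock)
{
    return steal_clock - rq->prev_steal_time;
}

/* The fix in spirit: forget the stale value when the CPU comes back up. */
void rq_reset_on_cpu_up(struct rq_model *rq)
{
    rq->prev_steal_time = 0;
}
```

With a stale `prev_steal_time` of 500 and a fresh steal clock of 100, the delta is 2^64 - 400 rather than a small number; after the reset it is simply 100.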

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa8 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808 "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685a "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi_common: do not clobber fixed sense information · be7cc5a6

Hannes Reinecke authored Mar 18, 2016

commit ba083116 upstream.

For fixed sense the information field is 32 bits, so we need to truncate
the information field to avoid clobbering the sense code.
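As a rough illustration of the truncation, assuming the standard fixed-format layout in which the information field occupies bytes 3 to 6 of the sense buffer: storing only the low 32 bits keeps the write inside the field, whereas an 8-byte store at the same offset would run on into the bytes that follow. The helper name is hypothetical and the sketch ignores any handling of the VALID bit.

```c
#include <stdint.h>

/*
 * Store the low 32 bits of 'info' big-endian into bytes 3..6 of a
 * fixed-format sense buffer. Bytes from offset 7 onward are not touched.
 */
void set_fixed_sense_information(uint8_t *buf, uint64_t info)
{
    uint32_t v = (uint32_t)info;   /* truncate to the 32-bit field */

    buf[3] = (uint8_t)(v >> 24);
    buf[4] = (uint8_t)(v >> 16);
    buf[5] = (uint8_t)(v >> 8);
    buf[6] = (uint8_t)v;
}
```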

Fixes: a1524f22 ("libata-eh: Set 'information' field for autosense")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PM / sleep: Clear pm_suspend_global_flags upon hibernate · ab8a4365

Lukas Wunner authored Mar 23, 2016

commit 27614273 upstream.

When suspending to RAM, waking up and later suspending to disk,
we gratuitously runtime resume devices after the thaw phase.
This does not occur if we always suspend to RAM or always to disk.

pm_complete_with_resume_check(), which gets called from
pci_pm_complete() among others, schedules a runtime resume
if PM_SUSPEND_FLAG_FW_RESUME is set. The flag is set during
a suspend-to-RAM cycle. It is cleared at the beginning of
the suspend-to-RAM cycle but not afterwards and it is not
cleared during a suspend-to-disk cycle at all. Fix it.

Fixes: ef25ba04 (PM / sleep: Add flags to indicate platform firmware involvement)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled · 87afa6a7

Len Brown authored Mar 13, 2016

commit d70e28f5 upstream.

Some SKL-H configurations require "intel_idle.max_cstate=7" to boot.
While that is an effective workaround, it disables C10.

This patch detects the problematic configuration,
and disables C8 and C9, keeping C10 enabled.

Note that enabling SGX in BIOS SETUP can also prevent this issue,
if the system BIOS provides that option.

https://bugzilla.kernel.org/show_bug.cgi?id=109081
"Freezes with Intel i7 6700HQ (Skylake), unless intel_idle.max_cstate=7"

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mtd: onenand: fix deadlock in onenand_block_markbad · 5bb18668

Aaro Koskinen authored Feb 20, 2016

commit 5e64c29e upstream.

Commit 5942ddbc ("mtd: introduce mtd_block_markbad interface")
incorrectly changed onenand_block_markbad() to call mtd_block_markbad
instead of onenand_chip's block_markbad function. As a result the function
will now recurse and deadlock. Fix by reverting the change.

Fixes: 5942ddbc ("mtd: introduce mtd_block_markbad interface")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm/page_alloc: prevent merging between isolated and other pageblocks · cbb6e048

Vlastimil Babka authored Mar 25, 2016

commit d9dddbf5 upstream.

Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:

> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:          195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:         6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal:       16342016 kB
> MemFree:        22367268 kB
> MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim.  Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:

> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partially isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.)
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
but since it doesn't hold the zone->lock between pageblocks, a race
window does exist.

It's also likely that unexpected merging can occur between
MIGRATE_ISOLATE and non-CMA pageblocks.  This should be prevented in
__free_one_page() since commit 3c605096 ("mm/page_alloc: restrict
max order of merging on isolated pageblock").  However, we only check
the migratetype of the pageblock where buddy merging has been initiated,
not the migratetype of the buddy pageblock (or group of pageblocks)
which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath
and bloat-o-meter has shown significant code bloat (the function is
inline).

This patch reduces the bloat at some expense of more complicated code.
The buddy-merging while-loop in __free_one_page() is initially bounded
to pageblock_border and without any migratetype checks.  The checks are
placed outside, bumping the max_order if merging is allowed, and
returning to the while-loop with a statement which can't be possibly
considered harmful.

This fixes the accounting bug and also removes the arguably weird state
in the original commit 3c605096 where buddies could be left
unmerged.

Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Debugged-by: Laura Abbott <labbott@redhat.com>
Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list · ce492bc8

Joseph Qi authored Mar 25, 2016

commit be12b299 upstream.

When the master handles a convert request, it queues the ast first and
then returns the status.  The ast may be sent before the request status
because the two messages are sent by two different threads.  If the
master goes down right after the ast is sent, this can trigger the BUG in
dlm_move_lockres_to_recovery_list on the requesting node, because the ast
handler moves the lock to the grant list without clearing
lock->convert_pending.  So remove the BUG_ON statement and check whether
the ast has been processed in dlmconvert_remote.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2/dlm: fix race between convert and recovery · e12a8511

Joseph Qi authored Mar 25, 2016

commit ac7cf246 upstream.

There is a race window between dlmconvert_remote and
dlm_move_lockres_to_recovery_list, which can leave a lock with
OCFS2_LOCK_BUSY set on the grant list, so the system hangs.

dlmconvert_remote
{
        spin_lock(&res->spinlock);
        list_move_tail(&lock->list, &res->converting);
        lock->convert_pending = 1;
        spin_unlock(&res->spinlock);

        status = dlm_send_remote_convert_request();
        /*
         * Race window: the master has queued the ast and returned
         * DLM_NORMAL, and then goes down before sending the ast.
         * This node detects that the master is down and calls
         * dlm_move_lockres_to_recovery_list, which reverts the lock
         * to the grant list.  OCFS2_LOCK_BUSY is then never cleared,
         * because the new master will not send the ast any more: it
         * thinks the request has already been authorized.
         */

        spin_lock(&res->spinlock);
        lock->convert_pending = 0;
        if (status != DLM_NORMAL)
                dlm_revert_pending_convert(res, lock);
        spin_unlock(&res->spinlock);
}

In this case, check whether res->state has the DLM_LOCK_RES_RECOVERING bit
set (res is still recovering) or the res master has changed (the new master
has finished recovery); if so, reset the status to DLM_RECOVERING so that
the convert is retried.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
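The check described in the convert/recovery fix above can be modelled as follows. This is a sketch under assumptions, not the ocfs2 code: the enum, the `RES_RECOVERING` flag, `struct lockres_model` and the `owner_at_send` parameter are hypothetical names, and the sketch assumes the status is only overridden when the remote request itself reported success.

```c
#include <stdbool.h>

/* Hypothetical status codes and resource state for illustration only. */
enum dlm_status_model { ST_NORMAL, ST_RECOVERING, ST_OTHER };

#define RES_RECOVERING 0x1u   /* stand-in for DLM_LOCK_RES_RECOVERING */

struct lockres_model {
    unsigned int state;   /* bit flags */
    int owner;            /* current master node number */
};

/*
 * Decide the final status after the remote convert request returned.
 * 'owner_at_send' is the master that was recorded before sending.
 */
enum dlm_status_model
convert_status_after_send(const struct lockres_model *res, int owner_at_send,
                          enum dlm_status_model status)
{
    bool recovering = (res->state & RES_RECOVERING) != 0;
    bool master_changed = res->owner != owner_at_send;

    /* A "successful" reply from a master that then died must be retried. */
    if (status == ST_NORMAL && (recovering || master_changed))
        return ST_RECOVERING;
    return status;
}
```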

ocfs2: o2hb: fix double free bug · 9aff4c76

Junxiao Bi authored Mar 25, 2016

commit 9e13f1f9 upstream.

This is a regression that caused the following kernel panic when running
the ocfs2 multiple test.

  BUG: unable to handle kernel paging request at 00000002000800c0
  IP: [<ffffffff81192978>] kmem_cache_alloc+0x78/0x160
  PGD 7bbe5067 PUD 0
  Oops: 0000 [#1] SMP
  Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
  CPU: 2 PID: 4044 Comm: mpirun Not tainted 4.5.0-rc5-next-20160225 #1
  Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
  task: ffff88007a521a80 ti: ffff88007aed0000 task.ti: ffff88007aed0000
  RIP: 0010:[<ffffffff81192978>]  [<ffffffff81192978>] kmem_cache_alloc+0x78/0x160
  RSP: 0018:ffff88007aed3a48  EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001991
  RDX: 0000000000001990 RSI: 00000000024000c0 RDI: 000000000001b330
  RBP: ffff88007aed3a98 R08: ffff88007d29b330 R09: 00000002000800c0
  R10: 0000000c51376d87 R11: ffff8800792cac38 R12: ffff88007cc30f00
  R13: 00000000024000c0 R14: ffffffff811b053f R15: ffff88007aed3ce7
  FS:  0000000000000000(0000) GS:ffff88007d280000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000002000800c0 CR3: 000000007aeb2000 CR4: 00000000000406e0
  Call Trace:
    __d_alloc+0x2f/0x1a0
    d_alloc+0x17/0x80
    lookup_dcache+0x8a/0xc0
    path_openat+0x3c3/0x1210
    do_filp_open+0x80/0xe0
    do_sys_open+0x110/0x200
    SyS_open+0x19/0x20
    do_syscall_64+0x72/0x230
    entry_SYSCALL64_slow_path+0x25/0x25
  Code: 05 e6 77 e7 7e 4d 8b 08 49 8b 40 10 4d 85 c9 0f 84 dd 00 00 00 48 85 c0 0f 84 d4 00 00 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 3c 01 75 b6 49 63
  RIP   kmem_cache_alloc+0x78/0x160
  CR2: 00000002000800c0
  ---[ end trace 823969e602e4aaac ]---

Fixes: a4a1dfa4 ("ocfs2/cluster: fix memory leak in o2hb_region_release")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: ati_remote2 - fix crashes on detecting device with invalid descriptor · e0a40d84

Vladis Dronov authored Mar 23, 2016

commit 950336ba upstream.

The ati_remote2 driver expects at least two interfaces with one
endpoint each. If given a malicious descriptor that specifies one
interface or no endpoints, it will crash in the probe function.
Ensure there are at least two interfaces and one endpoint for each
interface before using them.
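The validation described above amounts to a guard at the top of the probe routine. The following is a simplified sketch with hypothetical types (`struct iface_model`, `struct device_model`); the real driver works on USB core structures, which are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a device's descriptors. */
struct iface_model {
    unsigned int num_endpoints;
};

struct device_model {
    unsigned int num_interfaces;
    const struct iface_model *interfaces;
};

/* True only if interfaces 0 and 1 exist and each has an endpoint. */
bool descriptors_look_sane(const struct device_model *dev)
{
    if (dev == NULL || dev->interfaces == NULL)
        return false;
    if (dev->num_interfaces < 2)
        return false;
    for (unsigned int i = 0; i < 2; i++) {
        if (dev->interfaces[i].num_endpoints < 1)
            return false;
    }
    return true;
}
```

A probe function would return an error such as -ENODEV when this check fails, before touching any endpoint.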

The full disclosure: http://seclists.org/bugtraq/2016/Mar/90

Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: ims-pcu - sanity check against missing interfaces · cadaf14c

Oliver Neukum authored Mar 17, 2016

commit a0ad220c upstream.

A malicious device with a missing interface can make the driver oops.
Add sanity checking.

Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Input: synaptics - handle spurious release of trackstick buttons, again · 2c84af56

Benjamin Tissoires authored Mar 17, 2016

commit 82be788c upstream.

Looks like the firmware 8.2 still has the extra buttons spurious release
bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=114321
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode · 5fc5642b

Tejun Heo authored Mar 18, 2016

commit aaf25593 upstream.

When cgroup writeback is in use, there can be multiple wb's
(bdi_writeback's) per bdi and an inode may switch among them
dynamically. In a couple places, the wrong wb was used leading to
performing operations on the wrong list under the wrong lock
corrupting the io lists.

* writeback_single_inode() was taking @wb parameter and used it to
remove the inode from io lists if it becomes clean after writeback.
The callers of this function were always passing in the root wb
regardless of the actual wb that the inode was associated with,
which could also change while writeback is in progress.

Fix it by dropping the @wb parameter and using
inode_to_wb_and_lock_list() to determine and lock the associated wb.

* After writeback_sb_inodes() writes out an inode, it re-locks @wb and
inode to remove it from or move it to the right io list. It assumes
that the inode is still associated with @wb; however, the inode may
have switched to another wb while writeback was in progress.

Fix it by using inode_to_wb_and_lock_list() to determine and lock
the associated wb after writeback is complete. As the function
requires the original @wb->list_lock locked for the next iteration,
in the unlikely case where the inode has changed association, switch
the locks.

Kudos to Tahsin for pinpointing these subtle breakages.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: d10c8095 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.com
Reported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list() · 7ea4d0c1

Tejun Heo authored Mar 18, 2016

commit 614a4e37 upstream.

locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with
the target inode, unlocks inode, locks the wb's list_lock and verifies
that the inode is still associated with the wb.  To prevent the wb
going away between dropping inode lock and acquiring list_lock, the wb
is pinned while inode lock is held.  The wb reference is put right
after acquiring list_lock citing that the wb won't be dereferenced
anymore.

This isn't true.  If the inode is still associated with the wb, the
inode has reference and it's safe to return the wb; however, if inode
has been switched, the wb still needs to be unlocked which is a
dereference and can lead to use-after-free if it races with wb
destruction.

Fix it by putting the reference after releasing list_lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 87e1d789 ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ACPI / PM: Runtime resume devices when waking from hibernate · 2444c61a

Lukas Wunner authored Mar 23, 2016

commit fbda4b38 upstream.

Commit 58a1fbbb ("PM / PCI / ACPI: Kick devices that might have been
reset by firmware") added a runtime resume for devices that were runtime
suspended when the system entered suspend-to-RAM.

Briefly, the motivation was to ensure that devices did not remain in a
reset-power-on state after resume, potentially preventing deep SoC-wide
low-power states from being entered on idle.

Currently we're not doing the same when leaving suspend-to-disk and this
asymmetry is a problem if drivers rely on the automatic resume triggered
by pm_complete_with_resume_check(). Fix it.

Fixes: 58a1fbbb (PM / PCI / ACPI: Kick devices that might have been reset by firmware)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: at91: sama5d4 Xplained: don't disable hsmci regulator · 5af4e0be

Ludovic Desroches authored Mar 11, 2016

commit b02acd4e upstream.

If the hsmci regulator is only enabled on card detection, the board can
reboot on sd card insertion. Keeping the regulator always enabled fixes
this issue.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: 8d545f32 ("ARM: at91/dt: sama5d4 xplained: add regulators for v(q)mmc1 supplies")
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: at91: sama5d3 Xplained: don't disable hsmci regulator · 8daeb040

Ludovic Desroches authored Mar 11, 2016

commit ae3fc8ea upstream.

If the hsmci regulator is only enabled on card detection, the board can
reboot on sd card insertion. Keeping the regulator always enabled fixes
this issue.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: 1b53e341 ("ARM: at91/dt: sama5d3 xplained: add fixed regulator for vmmc0")
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: fix deadlock secinfo+readdir compound · 4eb648c6

J. Bruce Fields authored Mar 02, 2016

commit 2f6fc056 upstream.

nfsd_lookup_dentry exits with the parent filehandle locked.  fh_put also
unlocks if necessary (nfsd filehandle locking is probably too lenient),
so it gets unlocked eventually, but if the following op in the compound
needs to lock it again, we can deadlock.

A fuzzer ran into this; normal clients don't send a secinfo followed by
a readdir in the same compound.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: fix bad bounds checking · cbeacd73

J. Bruce Fields authored Feb 29, 2016

commit 4aed9c46 upstream.

A number of spots in the xdr decoding follow a pattern like

	n = be32_to_cpup(p++);
	READ_BUF(n + 4);

where n is a u32.  The only bounds checking is done in READ_BUF itself,
but since it's checking (n + 4), it won't catch cases where n is very
large, (u32)(-4) or higher.  I'm not sure exactly what the consequences
are, but we've seen crashes soon after.

Instead, just break these up into two READ_BUF()s.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
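The overflow in the nfsd4 bounds check above is easy to demonstrate outside the kernel. In this sketch the two helper functions are hypothetical; they only model "is there enough room left", not the real READ_BUF() macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single check on (n + 4): the sum wraps for n >= (uint32_t)-4. */
bool fits_combined(uint32_t n, uint32_t remaining)
{
    return (uint32_t)(n + 4) <= remaining;
}

/* Two separate checks, in the spirit of two READ_BUF() calls. */
bool fits_split(uint32_t n, uint32_t remaining)
{
    if (remaining < 4)
        return false;
    remaining -= 4;          /* the fixed 4 bytes are accounted first */
    return n <= remaining;   /* then the variable part, with no addition */
}
```

For n = 0xfffffffc and 16 bytes remaining, the combined check sees 0 <= 16 and passes, while the split check rejects the request.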

iser-target: Rework connection termination · 924d4bb8

Jenny Derzhavetz authored Feb 24, 2016

commit 6d1fba0c upstream.

When we receive an event that triggers connection termination,
we have a couple of things we may want to do:
1. In case we are already terminating, bail out early
2. In case we are connected but not bound, disconnect and schedule
   a connection cleanup silently (don't reinstate)
3. In case we are connected and bound, disconnect and reinstate the connection
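The three cases listed above map naturally onto a small state switch. The sketch below uses hypothetical state and action names and leaves out locking and the actual disconnect calls; it only shows which action each state leads to.

```c
/* Hypothetical connection states and actions; names are illustrative. */
enum conn_state { CONN_UP, CONN_BOUND, CONN_TERMINATING };

enum term_action {
    ACT_NONE,                  /* already terminating: bail out */
    ACT_DISCONNECT_CLEANUP,    /* connected, not bound: silent cleanup */
    ACT_DISCONNECT_REINSTATE   /* connected and bound: reinstate */
};

/* Pick the action for a termination event and move to TERMINATING. */
enum term_action on_terminate_event(enum conn_state *state)
{
    enum term_action act;

    switch (*state) {
    case CONN_UP:
        act = ACT_DISCONNECT_CLEANUP;
        break;
    case CONN_BOUND:
        act = ACT_DISCONNECT_REINSTATE;
        break;
    case CONN_TERMINATING:
    default:
        return ACT_NONE;
    }
    *state = CONN_TERMINATING;
    return act;
}
```

A second event on the same connection returns `ACT_NONE`, which is the early bail-out of case 1.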

This rework fixes a bug that was detected against a misbehaving
initiator which rejected our rdma_cm accept; at this stage the
isert_conn is not bound and the reinstate caused a bogus dereference.

What's great about this is that we don't need the
post_recv_buf_count anymore, so get rid of it.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iser-target: Separate flows for np listeners and connections cma events · 86ea155a

Jenny Derzhavetz authored Feb 24, 2016

commit f81bf458 upstream.

No need to restrict this check to specific events.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iser-target: Add new state ISER_CONN_BOUND to isert_conn · 8cfea5d3

Jenny Derzhavetz authored Feb 24, 2016

commit aea92980 upstream.

We need an indication that isert_conn->iscsi_conn binding has
happened so we'll know not to invoke a connection reinstatement
on an unbound connection which will lead to a bogus isert_conn->conn
dereference.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iser-target: Fix identification of login rx descriptor type · cce41f40

Jenny Derzhavetz authored Feb 24, 2016

commit b89a7c25 upstream.

Once a connection request is accepted, one rx descriptor
is posted to receive the login request. This descriptor has rx type,
but is outside the main pool of rx descriptors, and thus
was mistreated as tx type.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

target: Fix target_release_cmd_kref shutdown comp leak · 653a9b30

Himanshu Madhani authored Mar 14, 2016

commit 5e47f198 upstream.

This patch fixes an active I/O shutdown bug for fabric
drivers using target_wait_for_sess_cmds(), where se_cmd
descriptor shutdown would result in hung tasks waiting
indefinitely for se_cmd->cmd_wait_comp to complete().

To address this bug, drop the incorrect list_del_init()
usage in target_wait_for_sess_cmds() and always complete()
during se_cmd target_release_cmd_kref() put, in order to
let caller invoke the final fabric release callback
into se_cmd->se_tfo->release_cmd() code.

Reported-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Tested-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: bcm2835: Fix setting of PLL divider clock rates · 68e89b11

Eric Anholt authored Feb 15, 2016

commit 773b3966 upstream.

Our dividers weren't being set successfully because CM_PASSWORD wasn't
included in the register write.  It looks easier to just compute the
divider to write ourselves than to update clk-divider for the ability
to OR in some arbitrary bits on write.
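To illustrate why a plain divider write is ignored, here is a small sketch of composing the register value by hand. The password value 0x5a in the top byte is an assumption about the clock manager's register format, and `cm_reg_value()` is a hypothetical helper, not a function from the driver.

```c
#include <stdint.h>

/* Assumed: writes only take effect with this password in bits 31..24. */
#define CM_PASSWORD      0x5a000000u
#define CM_PASSWORD_MASK 0xff000000u

/* Build the value to write: divider bits with the password ORed in. */
uint32_t cm_reg_value(uint32_t divider_bits)
{
    return CM_PASSWORD | (divider_bits & ~CM_PASSWORD_MASK);
}
```

A generic divider helper that writes only `divider_bits` would omit the top byte, which matches the symptom described above.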

Fixes about half of the video modes on my HDMI monitor (everything
except 720x400).

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: rockchip: add hclk_cpubus to the list of rk3188 critical clocks · df775ddb

Alexander Kochetkov authored Jan 26, 2016

commit e8b63288 upstream.

hclk_cpubus needs to keep running because it is needed for devices like
the rom, i2s0 or spdif to be accessible via cpu. Without that all
accesses to devices (readl/writel) return wrong data. So add it
to the list of critical clocks.

Fixes: 78eaf609 ("clk: rockchip: disable unused clocks")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: rockchip: rk3368: fix hdmi_cec gate-register · 76f5f39b

Heiko Stuebner authored Jan 20, 2016

commit fd0c0740 upstream.

Fix a typo making the sclk_hdmi_cec access a wrong register to handle
its gate.

Fixes: 3536c97a ("clk: rockchip: add rk3368 clock controller")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: zhangqing <zhangqing@rock-chips.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: rockchip: rk3368: fix parents of video encoder/decoder · c21ff6a5

Heiko Stuebner authored Jan 20, 2016

commit 0f28d984 upstream.

The vdpu and vepu clocks can also be parented to the npll, and the current
parent list is also wrong as it would use the npll as the "usbphy" source,
so adapt the parent list to the correct one.

Fixes: 3536c97a ("clk: rockchip: add rk3368 clock controller")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: zhangqing <zhangqing@rock-chips.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: rockchip: rk3368: fix cpuclk core dividers · 2c72f046

Heiko Stuebner authored Jan 19, 2016

commit c6d5fe2c upstream.

Similar to commit 9880d427 ("clk: rockchip: fix rk3288 cpuclk core
dividers") it seems the cpuclk dividers are one too high on the rk3368
as well.

And again similar to the previous fix, we opt to make the divider list
contain the values to be written to use the same paradigm for them on all
supported socs.

Fixes: 3536c97a ("clk: rockchip: add rk3368 clock controller")
Reported-by: Zhang Qing <zhangqing@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: zhangqing <zhangqing@rock-chips.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: rockchip: rk3368: fix cpuclk mux bit of big cpu-cluster · 02bde5aa

Heiko Stuebner authored Jan 19, 2016

commit 535ebd42 upstream.

Both clusters have their mux bit in bit 7 of their respective register.
For whatever reason the big cluster currently lists bit 15, which is
definitely wrong.

Fixes: 3536c97a ("clk: rockchip: add rk3368 clock controller")
Reported-by: Zhang Qing <zhangqing@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: zhangqing <zhangqing@rock-chips.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

02bde5aa

mmc: atmel-mci: Check pdata for NULL before dereferencing it at DMA config · ae854643

Brent Taylor authored Mar 13, 2016

commit 93c77d29 upstream.

Using an at91sam9g20ek development board with DTS configuration may trigger
a kernel panic because of a NULL pointer dereference exception, while
configuring DMA. Let's fix this by adding a check for pdata before
dereferencing it.

Signed-off-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ae854643

mmc: sdhci: Fix override of timeout clk wrt max_busy_timeout · 6576116d

Adrian Hunter authored Mar 07, 2016

commit 99513624 upstream.

Normally the timeout clock frequency is read from the capabilities
register.  It is also possible to set the value prior to calling
sdhci_add_host() in which case that value will override the
capabilities register value.  However that was being done after
calculating max_busy_timeout so that max_busy_timeout was being
calculated using the wrong value of timeout_clk.

Fix that by moving the override before max_busy_timeout is
calculated.
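The ordering problem can be sketched in plain userspace C. This is a simplified model, not the driver code: the function names are hypothetical, and the formula (1 << 27) / timeout_clk, with timeout_clk in kHz and the result in milliseconds, is an assumption used only to make the effect visible.

```c
#include <stdint.h>

/*
 * Old order (model): max_busy_timeout is derived from the capabilities
 * value, and the override is applied only afterwards, so it has no
 * effect on the result.
 */
static uint32_t busy_timeout_before_fix(uint32_t caps_clk_khz,
					uint32_t override_clk_khz)
{
	uint32_t timeout_clk = caps_clk_khz;
	uint32_t max_busy_timeout = timeout_clk ? (1u << 27) / timeout_clk : 0;

	if (override_clk_khz)
		timeout_clk = override_clk_khz;	/* too late to matter */
	(void)timeout_clk;

	return max_busy_timeout;
}

/* New order (model): apply the override first, then derive the timeout. */
static uint32_t busy_timeout_after_fix(uint32_t caps_clk_khz,
				       uint32_t override_clk_khz)
{
	uint32_t timeout_clk = caps_clk_khz;

	if (override_clk_khz)
		timeout_clk = override_clk_khz;

	return timeout_clk ? (1u << 27) / timeout_clk : 0;
}
```

With a capabilities value of 50000 kHz and an override of 1000 kHz, the first function returns a timeout based on 50000 kHz, while the second uses the overriding 1000 kHz and returns a much larger value.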

The result is that the max_busy_timeout and max_discard
increase for BSW devices so that, for example, the time for
mkfs.ext4 on a 64GB eMMC drops from about 1 minute 40 seconds
to about 20 seconds.

Note, in the future, the capabilities setting will be tidied up
and this override won't be used anymore.  However this fix is
needed for stable.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6576116d

mmc: tegra: properly disable card clock · 6eadae1a

Lucas Stach authored Feb 29, 2016

commit 3491b690 upstream.

The new code to do the clock rate setting externally to the SDMMC
module has a shortcut to not propagate changes with a 0 rate to
the CAR by simply bailing out. This breaks proper cutting of the
card clock. Fix it by directly calling the correct sdhci function.
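A minimal model of the shortcut and its fix, assuming a hypothetical struct card_clk that tracks the externally programmed rate and whether the card clock is running. None of these names come from the driver; they only mirror the behaviour described above.

```c
/* Hypothetical stand-in for the controller state; not a kernel structure. */
struct card_clk {
	unsigned long car_rate;	/* rate programmed in the external clock controller */
	int card_clock_on;	/* nonzero while the card clock is running */
};

/* Model of the broken shortcut: a rate of 0 bails out and changes nothing. */
static void set_clock_before_fix(struct card_clk *c, unsigned long rate)
{
	if (!rate)
		return;		/* the card clock keeps running */

	c->car_rate = rate;
	c->card_clock_on = 1;
}

/* Model of the fix: a rate of 0 still stops the card clock. */
static void set_clock_after_fix(struct card_clk *c, unsigned long rate)
{
	if (!rate) {
		c->card_clock_on = 0;	/* stop the clock, leave the external rate alone */
		return;
	}

	c->car_rate = rate;
	c->card_clock_on = 1;
}
```

In this model, requesting a rate of 0 on a running clock leaves card_clock_on set in the first variant and clears it in the second, without touching car_rate in either case.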

Fixes: a8e326a9 ("mmc: tegra: implement module external clock change")
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6eadae1a

mmc: tegra: Disable UHS-I modes for tegra114 · 8af25535

Jon Hunter authored Feb 26, 2016

commit 7bf037d6 upstream.

SD card support for Tegra114 started failing after commit a8e326a9
("mmc: tegra: implement module external clock change") was merged. This
commit was part of a series to enable UHS-I modes for Tegra. To
work around this problem for now, disable UHS-I modes for Tegra114 by
separating the soc data structures for Tegra114 and Tegra124 so that
UHS-I is still enabled for Tegra124 but not Tegra114.

Fixes: a8e326a9 ("mmc: tegra: implement module external clock change")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Lucas Stach <dev@lynxeye.de>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8af25535

mmc: sdhci-pxav3: fix higher speed mode capabilities · 82e3506e

Russell King authored Jan 26, 2016

commit 0ca33b4a upstream.

Commit 1140011e ("mmc: sdhci-pxav3: Modify clock settings for the
SDR50 and DDR50 modes") broke any chance of the SDR50 or DDR50 modes
being used.

The commit claims that SDR50 and DDR50 require clock adjustments in
the SDIO3 Configuration register, which is located via the "conf-sdio3"
resource.  However, when this resource is given, we fail to read the
host capabilities 1 register, resulting in host->caps1 being zero.
Hence, both SDHCI_SUPPORT_SDR50 and SDHCI_SUPPORT_DDR50 bits remain
zero, disabling the SDR50 and DDR50 modes.

The underlying idea in this function appears to be to read the device
capabilities, modify them, and set SDHCI_QUIRK_MISSING_CAPS to cause
our modified capabilities to be used.  Implement exactly that.

Fixes: 1140011e ("mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

82e3506e

mmc: sdhci: fix data timeout (part 2) · 1909289b

Russell King authored Jan 26, 2016

commit 7f05538a upstream.

The calculation for the timeout based on the number of card clocks is
incorrect.  The calculation assumed:

	timeout in microseconds = clock cycles / clock in Hz

which is clearly several orders of magnitude wrong.  Fix this by
multiplying the clock cycles by 1000000 prior to dividing by the Hz
based clock.  Also, as per part 1, ensure that the division rounds
up.
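The corrected conversion can be sketched in userspace C as follows. The helper name clks_to_usecs() is hypothetical, ordinary 64-bit division stands in for the kernel's do_div(), and the guard against a zero clock rate is an addition made here for safety rather than something the patch describes.

```c
#include <stdint.h>

/*
 * Convert a number of card clock cycles into microseconds, rounding up.
 * Hypothetical helper for illustration; not the kernel's actual code.
 */
static uint32_t clks_to_usecs(uint32_t clks, uint32_t clock_hz)
{
	uint64_t val;

	/* Skip the 64-bit math when there is nothing to convert. */
	if (clks == 0 || clock_hz == 0)
		return 0;

	/* Scale to microseconds first, in 64 bits so the product cannot overflow. */
	val = (uint64_t)clks * 1000000ULL;

	/* Round up so the timeout is never shorter than requested. */
	return (uint32_t)((val + clock_hz - 1) / clock_hz);
}
```

For 100 cycles at 50 MHz this gives 2 microseconds, where the old "cycles / Hz" form would give 0.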

As this needs 64-bit math via do_div(), avoid it if the number of
clock cycles is zero.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1909289b

mmc: sdhci: fix data timeout (part 1) · e2592e82

Russell King authored Jan 26, 2016

commit fafcfda9 upstream.

The data timeout gives the minimum amount of time that should be
waited before timing out if no data is received from the card.
Simply dividing the nanosecond part by 1000 does not give this
required guarantee, since such a division rounds down.  Use
DIV_ROUND_UP() to give the desired timeout.
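The difference between the two roundings can be shown with a small userspace sketch. The macro below mirrors the usual definition of the kernel's DIV_ROUND_UP() for non-negative operands; the two helper functions are hypothetical names introduced only for this comparison.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's DIV_ROUND_UP() for non-negative operands. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Plain division rounds down, so the timeout can come out too short. */
static uint32_t ns_to_usecs_round_down(uint32_t ns)
{
	return ns / 1000u;
}

/* Rounding up guarantees at least the requested time is waited. */
static uint32_t ns_to_usecs_round_up(uint32_t ns)
{
	return DIV_ROUND_UP(ns, 1000u);
}
```

For 1500 ns the first helper yields 1 microsecond and the second yields 2, which is the minimum that still covers the requested time.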

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e2592e82

mmc: sdhci: plug DMA mapping leak on error · aef1cde8

Russell King authored Jan 26, 2016

commit 054cedff upstream.

If we terminate a command early, we fail to properly clean up the DMA
mappings for the data part of the request.  Move this cleanup into the
tasklet, which is the common path for finishing a request, so we always
clean up after ourselves.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ Split original patch so that it now contains only the fix ]
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

aef1cde8

mmc: sdhci: avoid unnecessary mapping/unmapping of align buffer · 15c1e7e4

Russell King authored Jan 26, 2016

commit edd63fcc upstream.

Unnecessarily mapping and unmapping the align buffer for SD cards is
expensive: performance measurements on iMX6 show that this gives a hit
of 10% on hdparm buffered disk reads.

MMC/SD card IO comes from the mm/vfs which gives us page based IO, so
for this case, the align buffer is not going to be used.  However, we
still map and unmap this buffer.

Eliminate this by switching the align buffer to be a DMA coherent
buffer, which needs no DMA maintenance to access the buffer.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

15c1e7e4