Commits · f199bdbaab37585ff6912dfb5524cf2a0ef06a05 · Kirill Smelkov / linux

06 Jan, 2017 40 commits

Revert "netfilter: move nat hlist_head to nf_conn" · f199bdba

Greg Kroah-Hartman authored Jan 04, 2017

This reverts commit 7c966435 as it is
not working properly.  Please move to 4.9 to get the full fix.
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f199bdba

Revert "netfilter: nat: convert nat bysrc hash to rhashtable" · 99d6d4e0

Greg Kroah-Hartman authored Jan 04, 2017

This reverts commit 870190a9 as it is
not working properly.  Please move to 4.9 to get the full fix.
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

99d6d4e0

arm64: mark reserved memblock regions explicitly in iomem · 77422569

AKASHI Takahiro authored Aug 22, 2016

commit e7cd1903 upstream.

Kdump(kexec-tools) parses /proc/iomem to identify all the memory regions
on the system. Since the current kernel names "nomap" regions, like UEFI
runtime services code/data, as "System RAM," kexec-tools sets up elf core
header to include them in a crash dump file (/proc/vmcore).

Then crash dump kernel parses UEFI memory map again, re-marks those regions
as "nomap" and does not create a memory mapping for them unlike the other
areas of System RAM. In this case, copying /proc/vmcore through
copy_oldmem_page() on crash dump kernel will end up with a kernel abort,
as reported in [1].

This patch names all the "nomap" regions explicitly as "reserved" so that
we can exclude them from a crash dump file. acpi_os_ioremap() must also
be modified because those regions have WB attributes [2].

Apart from kdump, this change also matches x86's use of acpi (and
/proc/iomem).

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/448186.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450089.htmlReviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

77422569

xfs: set AGI buffer type in xlog_recover_clear_agi_bucket · 587e89bd

Eric Sandeen authored Dec 05, 2016

commit 6b10b23c upstream.

xlog_recover_clear_agi_bucket didn't set the
type to XFS_BLFT_AGI_BUF, so we got a warning during log
replay (or an ASSERT on a debug build).

    XFS (md0): Unknown buffer type 0!
    XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1

Fix this, as was done in f19b872b for 2 other locations
with the same problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

587e89bd

arm/xen: Use alloc_percpu rather than __alloc_percpu · 959e363e

Julien Grall authored Dec 07, 2016

commit 24d5373d upstream.

The function xen_guest_init is using __alloc_percpu with an alignment
which are not power of two.

However, the percpu allocator never supported alignments which are not power
of two and has always behaved incorectly in thise case.

Commit 3ca45a46 "percpu: ensure requested alignment is power of two"
introduced a check which trigger a warning [1] when booting linux-next
on Xen. But in reality this bug was always present.

This can be fixed by replacing the call to __alloc_percpu with
alloc_percpu. The latter will use an alignment which are a power of two.

[1]

[    0.023921] illegal size (48) or align (48) for percpu allocation
[    0.024167] ------------[ cut here ]------------
[    0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0
[    0.024584] Modules linked in:
[    0.024708]
[    0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.9.0-rc7-next-20161128 #473
[    0.025012] Hardware name: Foundation-v8A (DT)
[    0.025162] task: ffff80003d870000 task.stack: ffff80003d844000
[    0.025351] PC is at pcpu_alloc+0x88/0x6c0
[    0.025490] LR is at pcpu_alloc+0x88/0x6c0
[    0.025624] pc : [<ffff00000818e678>] lr : [<ffff00000818e678>]
pstate: 60000045
[    0.025830] sp : ffff80003d847cd0
[    0.025946] x29: ffff80003d847cd0 x28: 0000000000000000
[    0.026147] x27: 0000000000000000 x26: 0000000000000000
[    0.026348] x25: 0000000000000000 x24: 0000000000000000
[    0.026549] x23: 0000000000000000 x22: 00000000024000c0
[    0.026752] x21: ffff000008e97000 x20: 0000000000000000
[    0.026953] x19: 0000000000000030 x18: 0000000000000010
[    0.027155] x17: 0000000000000a3f x16: 00000000deadbeef
[    0.027357] x15: 0000000000000006 x14: ffff000088f79c3f
[    0.027573] x13: ffff000008f79c4d x12: 0000000000000041
[    0.027782] x11: 0000000000000006 x10: 0000000000000042
[    0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c
[    0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84
[    0.028419] x5 : 0000000000000005 x4 : 0000000000000000
[    0.028628] x3 : 0000000000000000 x2 : 000000000000017f
[    0.028840] x1 : ffff80003d870000 x0 : 0000000000000035
[    0.029056]
[    0.029152] ---[ end trace 0000000000000000 ]---
[    0.029297] Call trace:
[    0.029403] Exception stack(0xffff80003d847b00 to
                               0xffff80003d847c30)
[    0.029621] 7b00: 0000000000000030 0001000000000000
ffff80003d847cd0 ffff00000818e678
[    0.029901] 7b20: 0000000000000002 0000000000000004
ffff000008f7c060 0000000000000035
[    0.030153] 7b40: ffff000008f79000 ffff000008c4cd88
ffff80003d847bf0 ffff000008101778
[    0.030402] 7b60: 0000000000000030 0000000000000000
ffff000008e97000 00000000024000c0
[    0.030647] 7b80: 0000000000000000 0000000000000000
0000000000000000 0000000000000000
[    0.030895] 7ba0: 0000000000000035 ffff80003d870000
000000000000017f 0000000000000000
[    0.031144] 7bc0: 0000000000000000 0000000000000005
ffff000008f79c84 6120757063726570
[    0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40
0000000000000042 0000000000000006
[    0.031643] 7c00: 0000000000000041 ffff000008f79c4d
ffff000088f79c3f 0000000000000006
[    0.031877] 7c20: 00000000deadbeef 0000000000000a3f
[    0.032051] [<ffff00000818e678>] pcpu_alloc+0x88/0x6c0
[    0.032229] [<ffff00000818ece8>] __alloc_percpu+0x18/0x20
[    0.032409] [<ffff000008d9606c>] xen_guest_init+0x174/0x2f4
[    0.032591] [<ffff0000080830f8>] do_one_initcall+0x38/0x130
[    0.032783] [<ffff000008d90c34>] kernel_init_freeable+0xe0/0x248
[    0.032995] [<ffff00000899a890>] kernel_init+0x10/0x100
[    0.033172] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
Reported-by: Wei Chen <wei.chen@arm.com>
Link: https://lkml.org/lkml/2016/11/28/669Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

959e363e

xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing · 6fbd3fb6

Boris Ostrovsky authored Nov 21, 2016

commit 30faaafd upstream.

Commit 9c17d965 ("xen/gntdev: Grant maps should not be subject to
NUMA balancing") set VM_IO flag to prevent grant maps from being
subjected to NUMA balancing.

It was discovered recently that this flag causes get_user_pages() to
always fail with -EFAULT.

check_vma_flags
__get_user_pages
__get_user_pages_locked
__get_user_pages_unlocked
get_user_pages_fast
iov_iter_get_pages
dio_refill_pages
do_direct_IO
do_blockdev_direct_IO
do_blockdev_direct_IO
ext4_direct_IO_read
generic_file_read_iter
aio_run_iocb

(which can happen if guest's vdisk has direct-io-safe option).

To avoid this let's use VM_MIXEDMAP flag instead --- it prevents
NUMA balancing just as VM_IO does and has no effect on
check_vma_flags().
Reported-by: Olaf Hering <olaf@aepfle.de>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6fbd3fb6

tpm xen: Remove bogus tpm_chip_unregister · 883f12a2

Jason Gunthorpe authored Oct 26, 2016

commit 1f0f30e4 upstream.

tpm_chip_unregister can only be called after tpm_chip_register.
devm manages the allocation so no unwind is needed here.

Fixes: afb5abc2 ("tpm: two-phase chip management functions")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

883f12a2

kernel/debug/debug_core.c: more properly delay for secondary CPUs · 8419f521

Douglas Anderson authored Dec 14, 2016

commit 2d13bb64 upstream.

We've got a delay loop waiting for secondary CPUs.  That loop uses
loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
many tight loops make up a jiffy on all architectures.  It is quite
common to see things like this in the boot log:

  Calibrating delay loop (skipped), value calculated using timer
  frequency.. 48.00 BogoMIPS (lpj=24000)

In my case I was seeing lots of cases where other CPUs timed out
entering the debugger only to print their stack crawls shortly after the
kdb> prompt was written.

Elsewhere in kgdb we already use udelay(), so that should be safe enough
to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
should give us a full second of delay (just like the old code wanted)
but allow us to notice that we're done every 1 ms.

[akpm@linux-foundation.org: simplifications, per Daniel]
Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.orgSigned-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Brian Norris <briannorris@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8419f521

watchdog: qcom: fix kernel panic due to external abort on non-linefetch · 63b33e08

Christian Lamparter authored Nov 14, 2016

commit f06f35c6 upstream.

This patch fixes a off-by-one in the "watchdog: qcom: add option for
standalone watchdog not in timer block" patch that causes the
following panic on boot:

> Unhandled fault: external abort on non-linefetch (0x1008) at 0xc8874002
> pgd = c0204000
> [c8874002] *pgd=87806811, *pte=0b017653, *ppte=0b017453
> Internal error: : 1008 [#1] SMP ARM
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.8.6 #0
> Hardware name: Generic DT based system
> PC is at 0xc02222f4
> LR is at 0x1
> pc : [<c02222f4>]    lr : [<00000001>]    psr: 00000113
> sp : c782fc98  ip : 00000003  fp : 00000000
> r10: 00000004  r9 : c782e000  r8 : c04ab98c
> r7 : 00000001  r6 : c8874002  r5 : c782fe00  r4 : 00000002
> r3 : 00000000  r2 : c782fe00  r1 : 00100000  r0 : c8874002
> Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 8020406a  DAC: 00000051
> Process swapper/0 (pid: 1, stack limit = 0xc782e210)
> Stack: (0xc782fc98 to 0xc7830000)
> [...]

The WDT_STS (status) needs to be translated via wdt_addr as well.

fixes: f0d9d0f4 ("watchdog: qcom: add option for standalone watchdog not in timer block")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

63b33e08

watchdog: mei_wdt: request stop on reboot to prevent false positive event · bf902ead

Alexander Usyskin authored Nov 08, 2016

commit 9eff1140 upstream.

Systemd on reboot enables shutdown watchdog that leaves the watchdog
device open to ensure that even if power down process get stuck the
platform reboots nonetheless.
The iamt_wdt is an alarm-only watchdog and can't reboot system, but the
FW will generate an alarm event reboot was completed in time, as the
watchdog is not automatically disabled during power cycle.
So we should request stop watchdog on reboot to eliminate wrong alarm
from the FW.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bf902ead

kernel/watchdog: use nmi registers snapshot in hardlockup handler · 2f826a72

Konstantin Khlebnikov authored Dec 14, 2016

commit 4d1f0fb0 upstream.

NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ.
Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP
pointing to the code interrupted by IRQ which was interrupted by NMI.
NULL isn't a problem: in this case watchdog calls dump_stack() and
prints full stack trace including NMI. But if we're stuck in IRQ
handler then NMI watchlog will print stack trace without IRQ part at
all.

This patch uses registers snapshot passed into NMI handler as arguments:
these registers point exactly to the instruction interrupted by NMI.

Fixes: 55537871 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup")
Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzzSigned-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2f826a72

CIFS: Fix a possible memory corruption in push locks · bbf23f00

Pavel Shilovsky authored Nov 29, 2016

commit e3d240e9 upstream.

If maxBuf is not 0 but less than a size of SMB2 lock structure
we can end up with a memory corruption.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bbf23f00

CIFS: Fix missing nls unload in smb2_reconnect() · 9f1f5076

Pavel Shilovsky authored Nov 29, 2016

commit 4772c795 upstream.
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9f1f5076

CIFS: Fix a possible memory corruption during reconnect · ff04da38

Pavel Shilovsky authored Nov 04, 2016

commit 53e0e11e upstream.

We can not unlock/lock cifs_tcp_ses_lock while walking through ses
and tcon lists because it can corrupt list iterator pointers and
a tcon structure can be released if we don't hold an extra reference.
Fix it by moving a reconnect process to a separate delayed work
and acquiring a reference to every tcon that needs to be reconnected.
Also do not send an echo request on newly established connections.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ff04da38

ASoC: intel: Fix crash at suspend/resume without card registration · 6cb589c7

Takashi Iwai authored Nov 25, 2016

commit 2fc995a8 upstream.

When ASoC Intel SST Medfield driver is probed but without codec / card
assigned, it causes an Oops and freezes the kernel at suspend/resume,

 PM: Suspending system (freeze)
 Suspending console(s) (use no_console_suspend to debug)
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
 IP: [<ffffffffc09d9409>] sst_soc_prepare+0x19/0xa0 [snd_soc_sst_mfld_platform]
 Oops: 0000 [#1] PREEMPT SMP
 CPU: 0 PID: 1552 Comm: systemd-sleep Tainted: G W 4.9.0-rc6-1.g5f5c2ad-default #1
 Call Trace:
  [<ffffffffb45318f9>] dpm_prepare+0x209/0x460
  [<ffffffffb4531b61>] dpm_suspend_start+0x11/0x60
  [<ffffffffb40d3cc2>] suspend_devices_and_enter+0xb2/0x710
  [<ffffffffb40d462e>] pm_suspend+0x30e/0x390
  [<ffffffffb40d2eba>] state_store+0x8a/0x90
  [<ffffffffb43c670f>] kobj_attr_store+0xf/0x20
  [<ffffffffb42b0d97>] sysfs_kf_write+0x37/0x40
  [<ffffffffb42b02bc>] kernfs_fop_write+0x11c/0x1b0
  [<ffffffffb422be68>] __vfs_write+0x28/0x140
  [<ffffffffb43728a8>] ? apparmor_file_permission+0x18/0x20
  [<ffffffffb433b2ab>] ? security_file_permission+0x3b/0xc0
  [<ffffffffb422d095>] vfs_write+0xb5/0x1a0
  [<ffffffffb422e3d6>] SyS_write+0x46/0xa0
  [<ffffffffb4719fbb>] entry_SYSCALL_64_fastpath+0x1e/0xad

Add proper NULL checks in the PM code of mdfld driver.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6cb589c7

dm space map metadata: fix 'struct sm_metadata' leak on failed create · 769c0922

Benjamin Marzinski authored Nov 30, 2016

commit 314c25c5 upstream.

In dm_sm_metadata_create() we temporarily change the dm_space_map
operations from 'ops' (whose .destroy function deallocates the
sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
dm_sm_destroy() with the intention of freeing the sm_metadata, but it
doesn't (because the dm_space_map operations is still set to
'bootstrap_ops').

Fix this by setting the dm_space_map operations back to 'ops' if
dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

769c0922

dm raid: fix discard support regression · ab10ab0a

Heinz Mauelshagen authored Nov 29, 2016

commit 11e29684 upstream.

Commit ecbfb9f1 ("dm raid: add raid level takeover support") moved the
configure_discard_support() call from raid_ctr() to raid_preresume().

Enabling/disabling discard _must_ happen during table load (through the
.ctr hook).  Fix this regression by moving the
configure_discard_support() call back to raid_ctr().

Fixes: ecbfb9f1 ("dm raid: add raid level takeover support")
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ab10ab0a

dm rq: fix a race condition in rq_completed() · 454b98d3

Bart Van Assche authored Nov 11, 2016

commit d15bb3a6 upstream.

It is required to hold the queue lock when calling blk_run_queue_async()
to avoid that a race between blk_run_queue_async() and
blk_cleanup_queue() is triggered.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

454b98d3

dm crypt: mark key as invalid until properly loaded · 26011e67

Ondrej Kozina authored Nov 02, 2016

commit 265e9098 upstream.

In crypt_set_key(), if a failure occurs while replacing the old key
(e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
set.  Otherwise, the crypto layer would have an invalid key that still
has DM_CRYPT_KEY_VALID flag set.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

26011e67

dm flakey: return -EINVAL on interval bounds error in flakey_ctr() · bd5fcd18

Wei Yongjun authored Aug 08, 2016

commit bff7e067 upstream.

Fix to return error code -EINVAL instead of 0, as is done elsewhere in
this function.

Fixes: e80d1c80 ("dm: do not override error code returned from dm_get_device()")
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bd5fcd18

dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM device · 1ca66d6a

Bart Van Assche authored Dec 07, 2016

commit 301fc3f5 upstream.

When dm_table_set_type() is used by a target to establish a DM table's
type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath) the
DM core must go on to verify that the devices in the table are
compatible with the established type.

Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1ca66d6a

dm table: fix 'all_blk_mq' inconsistency when an empty table is loaded · d948d3b1

Mike Snitzer authored Nov 23, 2016

commit 6936c12c upstream.

An earlier DM multipath table could have been build ontop of underlying
devices that were all using blk-mq.  In that case, if that active
multipath table is replaced with an empty DM multipath table (that
reflects all paths have failed) then it is important that the
'all_blk_mq' state of the active table is transfered to the new empty DM
table.  Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a
request that isn't needed by the DM multipath target when it is to issue
IO to an underlying blk-mq device.

Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d948d3b1

blk-mq: Do not invoke .queue_rq() for a stopped queue · 45f63111

Bart Van Assche authored Oct 28, 2016

commit bc27c01b upstream.

The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
Reported-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

45f63111

PM / OPP: Pass opp_table to dev_pm_opp_put_regulator() · e3742a15

Stephen Boyd authored Nov 30, 2016

commit 91291d9a upstream.

Joonyoung Shim reported an interesting problem on his ARM octa-core
Odoroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator()
was failing for a struct device for which dev_pm_opp_set_regulator() is
called earlier.

This happened because an earlier call to
dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file)
removed all the entries from opp_table->dev_list apart from the last CPU
device in the cpumask of CPUs sharing the OPP.

But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator()
routines get CPU device for the first CPU in the cpumask. And so the OPP
core failed to find the OPP table for the struct device.

This patch attempts to fix this problem by returning a pointer to the
opp_table from dev_pm_opp_set_regulator() and using that as the
parameter to dev_pm_opp_put_regulator(). This ensures that the
dev_pm_opp_put_regulator() doesn't fail to find the opp table.

Note that similar design problem also exists with other
dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and
so we don't need to update them for now.
Reported-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ Viresh: Wrote commit log and tested on exynos 5250 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e3742a15

usb: gadget: composite: always set ep->mult to a sensible value · 8b63a922

Felipe Balbi authored Sep 28, 2016

commit eaa496ff upstream.

ep->mult is supposed to be set to Isochronous and
Interrupt Endapoint's multiplier value. This value
is computed from different places depending on the
link speed.

If we're dealing with HighSpeed, then it's part of
bits [12:11] of wMaxPacketSize. This case wasn't
taken into consideration before.

While at that, also make sure the ep->mult defaults
to one so drivers can use it unconditionally and
assume they'll never multiply ep->maxpacket to zero.

Cc: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8b63a922

mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted · d4f4b2e6

Mel Gorman authored Dec 12, 2016

commit a6de734b upstream.

Vlastimil Babka pointed out that commit 479f854a ("mm, page_alloc:
defer debugging checks of pages allocated from the PCP") will allow the
per-cpu list counter to be out of sync with the per-cpu list contents if
a struct page is corrupted.

The consequence is an infinite loop if the per-cpu lists get fully
drained by free_pcppages_bulk because all the lists are empty but the
count is positive.  The infinite loop occurs here

                do {
                        batch_free++;
                        if (++migratetype == MIGRATE_PCPTYPES)
                                migratetype = 0;
                        list = &pcp->lists[migratetype];
                } while (list_empty(list));

What the user sees is a bad page warning followed by a soft lockup with
interrupts disabled in free_pcppages_bulk().

This patch keeps the accounting in sync.

Fixes: 479f854a ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Link: http://lkml.kernel.org/r/20161202112951.23346-2-mgorman@techsingularity.netSigned-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d4f4b2e6

mm/vmscan.c: set correct defer count for shrinker · 0927d281

Shaohua Li authored Dec 12, 2016

commit 5f33a080 upstream.

Our system uses significantly more slab memory with memcg enabled with
the latest kernel.  With 3.10 kernel, slab uses 2G memory, while with
4.6 kernel, 6G memory is used.  The shrinker has problem.  Let's see we
have two memcg for one shrinker.  In do_shrink_slab:

1. Check cg1.  nr_deferred = 0, assume total_scan = 700.  batch size
   is 1024, then no memory is freed.  nr_deferred = 700

2. Check cg2.  nr_deferred = 700.  Assume freeable = 20, then
   total_scan = 10 or 40.  Let's assume it's 10.  No memory is freed.
   nr_deferred = 10.

The deferred share of cg1 is lost in this case.  kswapd will free no
memory even run above steps again and again.

The fix makes sure one memcg's deferred share isn't lost.

Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.comSigned-off-by: Shaohua Li <shli@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0927d281

nvmet: Fix possible infinite loop triggered on hot namespace removal · 3e0ef1b8

Solganik Alexander authored Oct 30, 2016

commit e4fcf07c upstream.

When removing a namespace we delete it from the subsystem namespaces
list with list_del_init which allows us to know if it is enabled or
not.

The problem is that list_del_init initialize the list next and does
not respect the RCU list-traversal we do on the IO path for locating
a namespace. Instead we need to use list_del_rcu which is allowed to
run concurrently with the _rcu list-traversal primitives (keeps list
next intact) and guarantees concurrent nvmet_find_naespace forward
progress.

By changing that, we cannot rely on ns->dev_link for knowing if the
namspace is enabled, so add enabled indicator entry to nvmet_ns for
that.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3e0ef1b8

loop: return proper error from loop_queue_rq() · 6290a3bc

Omar Sandoval authored Nov 14, 2016

commit b4a567e8 upstream.

->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not
an errno.

Fixes: f4aa4c7b ("block: loop: convert to per-device workqueue")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6290a3bc

f2fs: fix overflow due to condition check order · bf0f0207

Jaegeuk Kim authored Nov 23, 2016

commit e87f7329 upstream.

In the last ilen case, i was already increased, resulting in accessing out-
of-boundary entry of do_replace and blkaddr.
Fix to check ilen first to exit the loop.

Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bf0f0207

f2fs: set ->owner for debugfs status file's file_operations · 154d83a8

Nicolai Stange authored Nov 20, 2016

commit 05e6ea26 upstream.

The struct file_operations instance serving the f2fs/status debugfs file
lacks an initialization of its ->owner.

This means that although that file might have been opened, the f2fs module
can still get removed. Any further operation on that opened file, releasing
included,  will cause accesses to unmapped memory.

Indeed, Mike Marshall reported the following:

  BUG: unable to handle kernel paging request at ffffffffa0307430
  IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
  <...>
  Call Trace:
   [] __fput+0xdf/0x1d0
   [] ____fput+0xe/0x10
   [] task_work_run+0x8e/0xc0
   [] do_exit+0x2ae/0xae0
   [] ? __audit_syscall_entry+0xae/0x100
   [] ? syscall_trace_enter+0x1ca/0x310
   [] do_group_exit+0x44/0xc0
   [] SyS_exit_group+0x14/0x20
   [] do_syscall_64+0x61/0x150
   [] entry_SYSCALL64_slow_path+0x25/0x25
  <...>
  ---[ end trace f22ae883fa3ea6b8 ]---
  Fixing recursive fault but reboot is needed!

Fix this by initializing the f2fs/status file_operations' ->owner with
THIS_MODULE.

This will allow debugfs to grab a reference to the f2fs module upon any
open on that file, thus preventing it from getting removed.

Fixes: 902829aa ("f2fs: move proc files to debugfs")
Reported-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

154d83a8

Revert "f2fs: use percpu_counter for # of dirty pages in inode" · 67e5239c

Jaegeuk Kim authored Dec 02, 2016

commit 204706c7 upstream.

This reverts commit 1beba1b3.

The perpcu_counter doesn't provide atomicity in single core and consume more
DRAM. That incurs fs_mark test failure due to ENOMEM.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

67e5239c

ext4: do not perform data journaling when data is encrypted · d06eaf28

Sergey Karamov authored Dec 10, 2016

commit 73b92a2a upstream.

Currently data journalling is incompatible with encryption: enabling both
at the same time has never been supported by design, and would result in
unpredictable behavior. However, users are not precluded from turning on
both features simultaneously. This change programmatically replaces data
journaling for encrypted regular files with ordered data journaling mode.

Background:
Journaling encrypted data has not been supported because it operates on
buffer heads of the page in the page cache. Namely, when the commit
happens, which could be up to five seconds after caching, the commit
thread uses the buffer heads attached to the page to copy the contents of
the page to the journal. With encryption, it would have been required to
keep the bounce buffer with ciphertext for up to the aforementioned five
seconds, since the page cache can only hold plaintext and could not be
used for journaling. Alternatively, it would be required to setup the
journal to initiate a callback at the commit time to perform deferred
encryption - in this case, not only would the data have to be written
twice, but it would also have to be encrypted twice. This level of
complexity was not justified for a mode that in practice is very rarely
used because of the overhead from the data journalling.

Solution:
If data=journaled has been set as a mount option for a filesystem, or if
journaling is enabled on a regular file, do not perform journaling if the
file is also encrypted, instead fall back to the data=ordered mode for the
file.

Rationale:
The intent is to allow seamless and proper filesystem operation when
journaling and encryption have both been enabled, and have these two
conflicting features gracefully resolved by the filesystem.

Fixes: 44614711Signed-off-by: Sergey Karamov <skaramov@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d06eaf28

ext4: return -ENOMEM instead of success · e33673be

Dan Carpenter authored Dec 10, 2016

commit 578620f4 upstream.

We should set the error code if kzalloc() fails.

Fixes: 67cf5b09 ("ext4: add the basic function for inline data support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e33673be

ext4: reject inodes with negative size · 36648770

Darrick J. Wong authored Dec 10, 2016

commit 7e6e1ef4 upstream.

Don't load an inode with a negative size; this causes integer overflow
problems in the VFS.

[ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]

Fixes: a48380f7 (ext4: rename i_dir_acl to i_size_high)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

36648770

ext4: add sanity checking to count_overhead() · 1bfcffbb

Theodore Ts'o authored Nov 18, 2016

commit c48ae41b upstream.

The commit "ext4: sanity check the block and cluster size at mount
time" should prevent any problems, but in case the superblock is
modified while the file system is mounted, add an extra safety check
to make sure we won't overrun the allocated buffer.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1bfcffbb

ext4: fix in-superblock mount options processing · 9689eb99

Theodore Ts'o authored Nov 18, 2016

commit 5aee0f8a upstream.

Fix a large number of problems with how we handle mount options in the
superblock.  For one, if the string in the superblock is long enough
that it is not null terminated, we could run off the end of the string
and try to interpret superblocks fields as characters.  It's unlikely
this will cause a security problem, but it could result in an invalid
parse.  Also, parse_options is destructive to the string, so in some
cases if there is a comma-separated string, it would be modified in
the superblock.  (Fortunately it only happens on file systems with a
1k block size.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9689eb99

ext4: use more strict checks for inodes_per_block on mount · 52a9daa3

Theodore Ts'o authored Nov 18, 2016

commit cd6bb35b upstream.

Centralize the checks for inodes_per_block and be more strict to make
sure the inodes_per_block_group can't end up being zero.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

52a9daa3

ext4: fix stack memory corruption with 64k block size · 75055843

Chandan Rajendra authored Nov 14, 2016

commit 30a9d7af upstream.

The number of 'counters' elements needed in 'struct sg' is
super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
elements in the array. This is insufficient for block sizes >= 32k. In
such cases the memcpy operation performed in ext4_mb_seq_groups_show()
would cause stack memory corruption.

Fixes: c9de560dSigned-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

75055843

ext4: fix mballoc breakage with 64k block size · 86efd99f

Chandan Rajendra authored Nov 14, 2016

commit 69e43e8c upstream.

'border' variable is set to a value of 2 times the block size of the
underlying filesystem. With 64k block size, the resulting value won't
fit into a 16-bit variable. Hence this commit changes the data type of
'border' to 'unsigned int'.

Fixes: c9de560dSigned-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

86efd99f