1. 21 Nov, 2018 40 commits
    • Lu Fengqi's avatar
      btrfs: fix pinned underflow after transaction aborted · 84e7fe7b
      Lu Fengqi authored
      commit fcd5e742 upstream.
      
      When running generic/475, we may get the following warning in dmesg:
      
      [ 6902.102154] WARNING: CPU: 3 PID: 18013 at fs/btrfs/extent-tree.c:9776 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
      [ 6902.109160] CPU: 3 PID: 18013 Comm: umount Tainted: G        W  O      4.19.0-rc8+ #8
      [ 6902.110971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      [ 6902.112857] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
      [ 6902.118921] RSP: 0018:ffffc9000459bdb0 EFLAGS: 00010286
      [ 6902.120315] RAX: ffff880175050bb0 RBX: ffff8801124a8000 RCX: 0000000000170007
      [ 6902.121969] RDX: 0000000000000002 RSI: 0000000000170007 RDI: ffffffff8125fb74
      [ 6902.123716] RBP: ffff880175055d10 R08: 0000000000000000 R09: 0000000000000000
      [ 6902.125417] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175055d88
      [ 6902.127129] R13: ffff880175050bb0 R14: 0000000000000000 R15: dead000000000100
      [ 6902.129060] FS:  00007f4507223780(0000) GS:ffff88017ba00000(0000) knlGS:0000000000000000
      [ 6902.130996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6902.132558] CR2: 00005623599cac78 CR3: 000000014b700001 CR4: 00000000003606e0
      [ 6902.134270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6902.135981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6902.137836] Call Trace:
      [ 6902.138939]  close_ctree+0x171/0x330 [btrfs]
      [ 6902.140181]  ? kthread_stop+0x146/0x1f0
      [ 6902.141277]  generic_shutdown_super+0x6c/0x100
      [ 6902.142517]  kill_anon_super+0x14/0x30
      [ 6902.143554]  btrfs_kill_super+0x13/0x100 [btrfs]
      [ 6902.144790]  deactivate_locked_super+0x2f/0x70
      [ 6902.146014]  cleanup_mnt+0x3b/0x70
      [ 6902.147020]  task_work_run+0x9e/0xd0
      [ 6902.148036]  do_syscall_64+0x470/0x600
      [ 6902.149142]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [ 6902.150375]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 6902.151640] RIP: 0033:0x7f45077a6a7b
      [ 6902.157324] RSP: 002b:00007ffd589f3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [ 6902.159187] RAX: 0000000000000000 RBX: 000055e8eec732b0 RCX: 00007f45077a6a7b
      [ 6902.160834] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055e8eec73490
      [ 6902.162526] RBP: 0000000000000000 R08: 000055e8eec734b0 R09: 00007ffd589f26c0
      [ 6902.164141] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e8eec73490
      [ 6902.165815] R13: 00007f4507ac61a4 R14: 0000000000000000 R15: 00007ffd589f40d8
      [ 6902.167553] irq event stamp: 0
      [ 6902.168998] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
      [ 6902.170731] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
      [ 6902.172773] softirqs last  enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
      [ 6902.174671] softirqs last disabled at (0): [<0000000000000000>]           (null)
      [ 6902.176407] ---[ end trace 463138c2986b275c ]---
      [ 6902.177636] BTRFS info (device dm-3): space_info 4 has 273465344 free, is not full
      [ 6902.179453] BTRFS info (device dm-3): space_info total=276824064, used=4685824, pinned=18446744073708158976, reserved=0, may_use=0, readonly=65536
      
      In the above line there's "pinned=18446744073708158976" which is an
      unsigned u64 value of -1392640, an obvious underflow.
      
      When transaction_kthread is running cleanup_transaction(), another
      fsstress is running btrfs_commit_transaction(). The
      btrfs_finish_extent_commit() may get the same range as
      btrfs_destroy_pinned_extent() got, which causes the pinned underflow.
      
      Fixes: d4b450cd ("Btrfs: fix race between transaction commit and empty block group removal")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84e7fe7b
    • Mathieu Malaterre's avatar
      watchdog/core: Add missing prototypes for weak functions · ca35750b
      Mathieu Malaterre authored
      commit 81bd415c upstream.
      
      The split out of the hard lockup detector exposed two new weak functions,
      but no prototypes for them, which triggers the build warning:
      
        kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes]
        kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes]
      
      Add the prototypes.
      
      Fixes: 73ce0511 ("kernel/watchdog.c: move hardlockup detector to separate file")
      Signed-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Babu Moger <babu.moger@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca35750b
    • H. Peter Anvin (Intel)'s avatar
      arch/alpha, termios: implement BOTHER, IBSHIFT and termios2 · 4fa1a679
      H. Peter Anvin (Intel) authored
      commit d0ffb805 upstream.
      
      Alpha has had c_ispeed and c_ospeed, but still set speeds in c_cflags
      using arbitrary flags. Because BOTHER is not defined, the general
      Linux code doesn't allow setting arbitrary baud rates, and because
      CBAUDEX == 0, we can have an array overrun of the baud_rate[] table in
      drivers/tty/tty_baudrate.c if (c_cflags & CBAUD) == 037.
      
      Resolve both problems by #defining BOTHER to 037 on Alpha.
      
      However, userspace still needs to know if setting BOTHER is actually
      safe given legacy kernels (does anyone actually care about that on
      Alpha anymore?), so enable the TCGETS2/TCSETS*2 ioctls on Alpha, even
      though they use the same structure. Define struct termios2 just for
      compatibility; it is the exact same structure as struct termios. In a
      future patchset, this will be cleaned up so the uapi headers are
      usable from libc.
      Signed-off-by: default avatarH. Peter Anvin (Intel) <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Eugene Syromiatnikov <esyr@redhat.com>
      Cc: <linux-alpha@vger.kernel.org>
      Cc: <linux-serial@vger.kernel.org>
      Cc: Johan Hovold <johan@kernel.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fa1a679
    • H. Peter Anvin's avatar
      termios, tty/tty_baudrate.c: fix buffer overrun · d85114a0
      H. Peter Anvin authored
      commit 991a2519 upstream.
      
      On architectures with CBAUDEX == 0 (Alpha and PowerPC), the code in tty_baudrate.c does
      not do any limit checking on the tty_baudrate[] array, and in fact a
      buffer overrun is possible on both architectures. Add a limit check to
      prevent that situation.
      
      This will be followed by a much bigger cleanup/simplification patch.
      Signed-off-by: default avatarH. Peter Anvin (Intel) <hpa@zytor.com>
      Requested-by: default avatarCc: Johan Hovold <johan@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Eugene Syromiatnikov <esyr@redhat.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d85114a0
    • Michael Kelley's avatar
      x86/hyper-v: Enable PIT shutdown quirk · de51aafd
      Michael Kelley authored
      commit 1de72c70 upstream.
      
      Hyper-V emulation of the PIT has a quirk such that the normal PIT shutdown
      path doesn't work, because clearing the counter register restarts the
      timer.
      
      Disable the counter clearing on PIT shutdown.
      Signed-off-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org>
      Cc: "jgross@suse.com" <jgross@suse.com>
      Cc: "akataria@vmware.com" <akataria@vmware.com>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: vkuznets <vkuznets@redhat.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1541303219-11142-3-git-send-email-mikelley@microsoft.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de51aafd
    • Steven Rostedt (VMware)'s avatar
      x86/cpu/vmware: Do not trace vmware_sched_clock() · c46d3436
      Steven Rostedt (VMware) authored
      commit 15035388 upstream.
      
      When running function tracing on a Linux guest running on VMware
      Workstation, the guest would crash. This is due to tracing of the
      sched_clock internal call of the VMware vmware_sched_clock(), which
      causes an infinite recursion within the tracing code (clock calls must
      not be traced).
      
      Make vmware_sched_clock() not traced by ftrace.
      
      Fixes: 80e9a4f2 ("x86/vmware: Add paravirt sched clock")
      Reported-by: default avatarGwanYeong Kim <gy741.kim@gmail.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      CC: Alok Kataria <akataria@vmware.com>
      CC: GwanYeong Kim <gy741.kim@gmail.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: virtualization@lists.linux-foundation.org
      CC: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20181109152207.4d3e7d70@gandalf.local.homeSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c46d3436
    • John Garry's avatar
      of, numa: Validate some distance map rules · 9c6f231e
      John Garry authored
      commit 89c38422 upstream.
      
      Currently the NUMA distance map parsing does not validate the distance
      table for the distance-matrix rules 1-2 in [1].
      
      However the arch NUMA code may enforce some of these rules, but not all.
      Such is the case for the arm64 port, which does not enforce the rule that
      the distance between separates nodes cannot equal LOCAL_DISTANCE.
      
      The patch adds the following rules validation:
      - distance of node to self equals LOCAL_DISTANCE
      - distance of separate nodes > LOCAL_DISTANCE
      
      This change avoids a yet-unresolved crash reported in [2].
      
      A note on dealing with symmetrical distances between nodes:
      
      Validating symmetrical distances between nodes is difficult. If it were
      mandated in the bindings that every distance must be recorded in the
      table, then it would be easy. However, it isn't.
      
      In addition to this, it is also possible to record [b, a] distance only
      (and not [a, b]). So, when processing the table for [b, a], we cannot
      assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
      distance may not be present in the table and current distance would be
      default at REMOTE_DISTANCE.
      
      As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
      for b > a. This policy is different to kernel ACPI SLIT validation, which
      allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
      the distance debug message is dropped as it may be misleading (for a distance
      which is later overwritten).
      
      Some final notes on semantics:
      
      - It is implied that it is the responsibility of the arch NUMA code to
        reset the NUMA distance map for an error in distance map parsing.
      
      - It is the responsibility of the FW NUMA topology parsing (whether OF or
        ACPI) to enforce NUMA distance rules, and not arch NUMA code.
      
      [1] Documents/devicetree/bindings/numa.txt
      [2] https://www.spinics.net/lists/arm-kernel/msg683304.html
      
      Cc: stable@vger.kernel.org # 4.7
      Signed-off-by: default avatarJohn Garry <john.garry@huawei.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c6f231e
    • Arnd Bergmann's avatar
      mtd: docg3: don't set conflicting BCH_CONST_PARAMS option · fbd70354
      Arnd Bergmann authored
      commit be2e1c9d upstream.
      
      I noticed during the creation of another bugfix that the BCH_CONST_PARAMS
      option that is set by DOCG3 breaks setting variable parameters for any
      other users of the BCH library code.
      
      The only other user we have today is the MTD_NAND software BCH
      implementation (most flash controllers use hardware BCH these days
      and are not affected). I considered removing BCH_CONST_PARAMS entirely
      because of the inherent conflict, but according to the description in
      lib/bch.c there is a significant performance benefit in keeping it.
      
      To avoid the immediate problem of the conflict between MTD_NAND_BCH
      and DOCG3, this only sets the constant parameters if MTD_NAND_BCH
      is disabled, which should fix the problem for all cases that
      are affected. This should also work for all stable kernels.
      
      Note that there is only one machine that actually seems to use the
      DOCG3 driver (arch/arm/mach-pxa/mioa701.c), so most users should have
      the driver disabled, but it almost certainly shows up if we wanted
      to test random kernels on machines that use software BCH in MTD.
      
      Fixes: d13d19ec ("mtd: docg3: add ECC correction code")
      Cc: stable@vger.kernel.org
      Cc: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbd70354
    • Ard Biesheuvel's avatar
      ARM: 8809/1: proc-v7: fix Thumb annotation of cpu_v7_hvc_switch_mm · 2d155427
      Ard Biesheuvel authored
      commit 6282e916 upstream.
      
      Due to what appears to be a copy/paste error, the opening ENTRY()
      of cpu_v7_hvc_switch_mm() lacks a matching ENDPROC(), and instead,
      the one for cpu_v7_smc_switch_mm() is duplicated.
      
      Given that it is ENDPROC() that emits the Thumb annotation, the
      cpu_v7_hvc_switch_mm() routine will be called in ARM mode on a
      Thumb2 kernel, resulting in the following splat:
      
        Internal error: Oops - undefined instruction: 0 [#1] SMP THUMB2
        Modules linked in:
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc1-00030-g4d28ad89189d-dirty #488
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        PC is at cpu_v7_hvc_switch_mm+0x12/0x18
        LR is at flush_old_exec+0x31b/0x570
        pc : [<c0316efe>]    lr : [<c04117c7>]    psr: 00000013
        sp : ee899e50  ip : 00000000  fp : 00000001
        r10: eda28f34  r9 : eda31800  r8 : c12470e0
        r7 : eda1fc00  r6 : eda53000  r5 : 00000000  r4 : ee88c000
        r3 : c0316eec  r2 : 00000001  r1 : eda53000  r0 : 6da6c000
        Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      
      Note the 'ISA ARM' in the last line.
      
      Fix this by using the correct name in ENDPROC().
      
      Cc: <stable@vger.kernel.org>
      Fixes: 10115105 ("ARM: spectre-v2: add firmware based hardening")
      Reviewed-by: default avatarDave Martin <Dave.Martin@arm.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d155427
    • Vasily Khoruzhick's avatar
      netfilter: conntrack: fix calculation of next bucket number in early_drop · 54ab5952
      Vasily Khoruzhick authored
      commit f393808d upstream.
      
      If there's no entry to drop in bucket that corresponds to the hash,
      early_drop() should look for it in other buckets. But since it increments
      hash instead of bucket number, it actually looks in the same bucket 8
      times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
      reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
      most cases.
      
      Fix it by increasing bucket number instead of hash and rename _hash
      to bucket to avoid future confusion.
      
      Fixes: 3e86638e ("netfilter: conntrack: consider ct netns in early_drop logic")
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: default avatarVasily Khoruzhick <vasilykh@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54ab5952
    • Andrea Arcangeli's avatar
      mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings · e6e4f052
      Andrea Arcangeli authored
      commit ac5b2c18 upstream.
      
      THP allocation might be really disruptive when allocated on NUMA system
      with the local node full or hard to reclaim.  Stefan has posted an
      allocation stall report on 4.12 based SLES kernel which suggests the
      same issue:
      
        kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null)
        kvm cpuset=/ mems_allowed=0-1
        CPU: 10 PID: 84752 Comm: kvm Tainted: G        W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased)
        Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017
        Call Trace:
         dump_stack+0x5c/0x84
         warn_alloc+0xe0/0x180
         __alloc_pages_slowpath+0x820/0xc90
         __alloc_pages_nodemask+0x1cc/0x210
         alloc_pages_vma+0x1e5/0x280
         do_huge_pmd_wp_page+0x83f/0xf00
         __handle_mm_fault+0x93d/0x1060
         handle_mm_fault+0xc6/0x1b0
         __do_page_fault+0x230/0x430
         do_page_fault+0x2a/0x70
         page_fault+0x7b/0x80
         [...]
        Mem-Info:
        active_anon:126315487 inactive_anon:1612476 isolated_anon:5
         active_file:60183 inactive_file:245285 isolated_file:0
         unevictable:15657 dirty:286 writeback:1 unstable:0
         slab_reclaimable:75543 slab_unreclaimable:2509111
         mapped:81814 shmem:31764 pagetables:370616 bounce:0
         free:32294031 free_pcp:6233 free_cma:0
        Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
        Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
      
      The defrag mode is "madvise" and from the above report it is clear that
      the THP has been allocated for MADV_HUGEPAGA vma.
      
      Andrea has identified that the main source of the problem is
      __GFP_THISNODE usage:
      
      : The problem is that direct compaction combined with the NUMA
      : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very
      : hard the local node, instead of failing the allocation if there's no
      : THP available in the local node.
      :
      : Such logic was ok until __GFP_THISNODE was added to the THP allocation
      : path even with MPOL_DEFAULT.
      :
      : The idea behind the __GFP_THISNODE addition, is that it is better to
      : provide local memory in PAGE_SIZE units than to use remote NUMA THP
      : backed memory. That largely depends on the remote latency though, on
      : threadrippers for example the overhead is relatively low in my
      : experience.
      :
      : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
      : extremely slow qemu startup with vfio, if the VM is larger than the
      : size of one host NUMA node. This is because it will try very hard to
      : unsuccessfully swapout get_user_pages pinned pages as result of the
      : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE
      : allocations and instead of trying to allocate THP on other nodes (it
      : would be even worse without vfio type1 GUP pins of course, except it'd
      : be swapping heavily instead).
      
      Fix this by removing __GFP_THISNODE for THP requests which are
      requesting the direct reclaim.  This effectivelly reverts 5265047a
      on the grounds that the zone/node reclaim was known to be disruptive due
      to premature reclaim when there was memory free.  While it made sense at
      the time for HPC workloads without NUMA awareness on rare machines, it
      was ultimately harmful in the majority of cases.  The existing behaviour
      is similar, if not as widespare as it applies to a corner case but
      crucially, it cannot be tuned around like zone_reclaim_mode can.  The
      default behaviour should always be to cause the least harm for the
      common case.
      
      If there are specialised use cases out there that want zone_reclaim_mode
      in specific cases, then it can be built on top.  Longterm we should
      consider a memory policy which allows for the node reclaim like behavior
      for the specific memory ranges which would allow a
      
      [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com
      
      Mel said:
      
      : Both patches look correct to me but I'm responding to this one because
      : it's the fix.  The change makes sense and moves further away from the
      : severe stalling behaviour we used to see with both THP and zone reclaim
      : mode.
      :
      : I put together a basic experiment with usemem configured to reference a
      : buffer multiple times that is 80% the size of main memory on a 2-socket
      : box with symmetric node sizes and defrag set to "always".  The defrag
      : setting is not the default but it would be functionally similar to
      : accessing a buffer with madvise(MADV_HUGEPAGE).  Usemem is configured to
      : reference the buffer multiple times and while it's not an interesting
      : workload, it would be expected to complete reasonably quickly as it fits
      : within memory.  The results were;
      :
      : usemem
      :                                   vanilla           noreclaim-v1
      : Amean     Elapsd-1       42.78 (   0.00%)       26.87 (  37.18%)
      : Amean     Elapsd-3       27.55 (   0.00%)        7.44 (  73.00%)
      : Amean     Elapsd-4        5.72 (   0.00%)        5.69 (   0.45%)
      :
      : This shows the elapsed time in seconds for 1 thread, 3 threads and 4
      : threads referencing buffers 80% the size of memory.  With the patches
      : applied, it's 37.18% faster for the single thread and 73% faster with two
      : threads.  Note that 4 threads showing little difference does not indicate
      : the problem is related to thread counts.  It's simply the case that 4
      : threads gets spread so their workload mostly fits in one node.
      :
      : The overall view from /proc/vmstats is more startling
      :
      :                          4.19.0-rc1  4.19.0-rc1
      :                             vanillanoreclaim-v1r1
      : Minor Faults               35593425      708164
      : Major Faults                 484088          36
      : Swap Ins                    3772837           0
      : Swap Outs                   3932295           0
      :
      : Massive amounts of swap in/out without the patch
      :
      : Direct pages scanned        6013214           0
      : Kswapd pages scanned              0           0
      : Kswapd pages reclaimed            0           0
      : Direct pages reclaimed      4033009           0
      :
      : Lots of reclaim activity without the patch
      :
      : Kswapd efficiency              100%        100%
      : Kswapd velocity               0.000       0.000
      : Direct efficiency               67%        100%
      : Direct velocity           11191.956       0.000
      :
      : Mostly from direct reclaim context as you'd expect without the patch.
      :
      : Page writes by reclaim  3932314.000       0.000
      : Page writes file                 19           0
      : Page writes anon            3932295           0
      : Page reclaim immediate        42336           0
      :
      : Writes from reclaim context is never good but the patch eliminates it.
      :
      : We should never have default behaviour to thrash the system for such a
      : basic workload.  If zone reclaim mode behaviour is ever desired but on a
      : single task instead of a global basis then the sensible option is to build
      : a mempolicy that enforces that behaviour.
      
      This was a severe regression compared to previous kernels that made
      important workloads unusable and it starts when __GFP_THISNODE was
      added to THP allocations under MADV_HUGEPAGE.  It is not a significant
      risk to go to the previous behavior before __GFP_THISNODE was added, it
      worked like that for years.
      
      This was simply an optimization to some lucky workloads that can fit in
      a single node, but it ended up breaking the VM for others that can't
      possibly fit in a single node, so going back is safe.
      
      [mhocko@suse.com: rewrote the changelog based on the one from Andrea]
      Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org
      Fixes: 5265047a ("mm, thp: really limit transparent hugepage allocation to local node")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarStefan Priebe <s.priebe@profihost.ag>
      Debugged-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Tested-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: <stable@vger.kernel.org>	[4.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6e4f052
    • Wengang Wang's avatar
      ocfs2: free up write context when direct IO failed · 1d0ad8ac
      Wengang Wang authored
      commit 5040f8df upstream.
      
      The write context should also be freed even when direct IO failed.
      Otherwise a memory leak is introduced and entries remain in
      oi->ip_unwritten_list causing the following BUG later in unlink path:
      
        ERROR: bug expression: !list_empty(&oi->ip_unwritten_list)
        ERROR: Clear inode of 215043, inode has unwritten extents
        ...
        Call Trace:
        ? __set_current_blocked+0x42/0x68
        ocfs2_evict_inode+0x91/0x6a0 [ocfs2]
        ? bit_waitqueue+0x40/0x33
        evict+0xdb/0x1af
        iput+0x1a2/0x1f7
        do_unlinkat+0x194/0x28f
        SyS_unlinkat+0x1b/0x2f
        do_syscall_64+0x79/0x1ae
        entry_SYSCALL_64_after_hwframe+0x151/0x0
      
      This patch also logs, with frequency limit, direct IO failures.
      
      Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.comSigned-off-by: default avatarWengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d0ad8ac
    • Changwei Ge's avatar
      ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entry · ce8bf4cf
      Changwei Ge authored
      commit 29aa3016 upstream.
      
      Somehow, file system metadata was corrupted, which causes
      ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el().
      
      According to the original design intention, if above happens we should
      skip the problematic block and continue to retrieve dir entry.  But
      there is obviouse misuse of brelse around related code.
      
      After failure of ocfs2_check_dir_entry(), current code just moves to
      next position and uses the problematic buffer head again and again
      during which the problematic buffer head is released for multiple times.
      I suppose, this a serious issue which is long-lived in ocfs2.  This may
      cause other file systems which is also used in a the same host insane.
      
      So we should also consider about bakcporting this patch into linux
      -stable.
      
      Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.comSigned-off-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Suggested-by: default avatarChangkuo Shi <shi.changkuo@h3c.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce8bf4cf
    • Marc Zyngier's avatar
      soc: ti: QMSS: Fix usage of irq_set_affinity_hint · d172f252
      Marc Zyngier authored
      commit 832ad0e3 upstream.
      
      The Keystone QMSS driver is pretty damaged, in the sense that it
      does things like this:
      
      	irq_set_affinity_hint(irq, to_cpumask(&cpu_map));
      
      where cpu_map is a local variable. As we leave the function, this
      will point to nowhere-land, and things will end-up badly.
      
      Instead, let's use a proper cpumask that gets allocated, giving
      the driver a chance to actually work with things like irqbalance
      as well as have a hypothetical 64bit future.
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d172f252
    • Ming Lei's avatar
      SCSI: fix queue cleanup race before queue initialization is done · a0613957
      Ming Lei authored
      commit 8dc765d4 upstream.
      
      c2856ae2 ("blk-mq: quiesce queue before freeing queue") has
      already fixed this race, however the implied synchronize_rcu()
      in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
      performance regression.
      
      Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      tried to quiesce queue for avoiding unnecessary synchronize_rcu()
      only when queue initialization is done, because it is usual to see
      lots of inexistent LUNs which need to be probed.
      
      However, turns out it isn't safe to quiesce queue only when queue
      initialization is done. Because when one SCSI command is completed,
      the user of sending command can be waken up immediately, then the
      scsi device may be removed, meantime the run queue in scsi_end_request()
      is still in-progress, so kernel panic can be caused.
      
      In Red Hat QE lab, there are several reports about this kind of kernel
      panic triggered during kernel booting.
      
      This patch tries to address the issue by grabing one queue usage
      counter during freeing one request and the following run queue.
      
      Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: jianchao.wang <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0613957
    • Quinn Tran's avatar
      scsi: qla2xxx: Initialize port speed to avoid setting lower speed · 2cbff556
      Quinn Tran authored
      commit f635e48e upstream.
      
      This patch initializes port speed so that firmware does not set lower
      operating speed. Setting lower speed in firmware impacts WRITE perfomance.
      
      Fixes: 726b8548 ("qla2xxx: Add framework for async fabric discovery")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarQuinn Tran <quinn.tran@cavium.com>
      Signed-off-by: default avatarHimanshu Madhani <himanshu.madhani@cavium.com>
      Tested-by: default avatarLaurence Oberman <loberman@redhat.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cbff556
    • Greg Edwards's avatar
      vhost/scsi: truncate T10 PI iov_iter to prot_bytes · 56bce0e5
      Greg Edwards authored
      commit 4542d623 upstream.
      
      Commands with protection information included were not truncating the
      protection iov_iter to the number of protection bytes in the command.
      This resulted in vhost_scsi mis-calculating the size of the protection
      SGL in vhost_scsi_calc_sgls(), and including both the protection and
      data SG entries in the protection SGL.
      
      Fixes: 09b13fa8 ("vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq")
      Signed-off-by: default avatarGreg Edwards <gedwards@ddn.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Fixes: 09b13fa8
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56bce0e5
    • Gustavo A. R. Silva's avatar
      reset: hisilicon: fix potential NULL pointer dereference · b3d31806
      Gustavo A. R. Silva authored
      commit e9a2310f upstream.
      
      There is a potential execution path in which function
      platform_get_resource() returns NULL. If this happens,
      we will end up having a NULL pointer dereference.
      
      Fix this by replacing devm_ioremap with devm_ioremap_resource,
      which has the NULL check and the memory region request.
      
      This code was detected with the help of Coccinelle.
      
      Cc: stable@vger.kernel.org
      Fixes: 97b7129c ("reset: hisilicon: change the definition of hisi_reset_init")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3d31806
    • Mikulas Patocka's avatar
      mach64: fix image corruption due to reading accelerator registers · 859eac94
      Mikulas Patocka authored
      commit c09bcc91 upstream.
      
      Reading the registers without waiting for engine idle returns
      unpredictable values. These unpredictable values result in display
      corruption - if atyfb_imageblit reads the content of DP_PIX_WIDTH with the
      bit DP_HOST_TRIPLE_EN set (from previous invocation), the driver would
      never ever clear the bit, resulting in display corruption.
      
      We don't want to wait for idle because it would degrade performance, so
      this patch modifies the driver so that it never reads accelerator
      registers.
      
      HOST_CNTL doesn't have to be read, we can just write it with
      HOST_BYTE_ALIGN because no other part of the driver cares if
      HOST_BYTE_ALIGN is set.
      
      DP_PIX_WIDTH is written in the functions atyfb_copyarea and atyfb_fillrect
      with the default value and in atyfb_imageblit with the value set according
      to the source image data.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: default avatarVille Syrjälä <syrjala@sci.fi>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      859eac94
    • Mikulas Patocka's avatar
      mach64: fix display corruption on big endian machines · 07ea1362
      Mikulas Patocka authored
      commit 3c6c6a78 upstream.
      
      The code for manual bit triple is not endian-clean. It builds the variable
      "hostdword" using byte accesses, therefore we must read the variable with
      "le32_to_cpu".
      
      The patch also enables (hardware or software) bit triple only if the image
      is monochrome (image->depth). If we want to blit full-color image, we
      shouldn't use the triple code.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: default avatarVille Syrjälä <syrjala@sci.fi>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07ea1362
    • Allen Wild's avatar
      thermal: enable broadcom menu for arm64 bcm2835 · 3b29cbb5
      Allen Wild authored
      commit fec3624f upstream.
      
      Moving the bcm2835 thermal driver to the broadcom directory prevented it
      from getting enabled for arm64 builds, since the broadcom directory is only
      available when 32-bit specific ARCH_BCM is set.
      
      Fix this by enabling the Broadcom menu for ARCH_BCM or ARCH_BCM2835.
      
      Fixes: 6892cf07 ("thermal: bcm2835: move to the broadcom subdirectory")
      Reviewed-by: default avatarEric Anholt <eric@anholt.net>
      Signed-off-by: default avatarAllen Wild <allenwild93@gmail.com>
      Signed-off-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b29cbb5
    • Yan, Zheng's avatar
      Revert "ceph: fix dentry leak in splice_dentry()" · 32ffde79
      Yan, Zheng authored
      commit efe32823 upstream.
      
      This reverts commit 8b8f53af.
      
      splice_dentry() is used by three places. For two places, req->r_dentry
      is passed to splice_dentry(). In the case of error, req->r_dentry does
      not get updated. So splice_dentry() should not drop reference.
      
      Cc: stable@vger.kernel.org # 4.18+
      Signed-off-by: default avatar"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32ffde79
    • Ilya Dryomov's avatar
      libceph: bump CEPH_MSG_MAX_DATA_LEN · fc07f543
      Ilya Dryomov authored
      commit 94e6992b upstream.
      
      If the read is large enough, we end up spinning in the messenger:
      
        libceph: osd0 192.168.122.1:6801 io error
        libceph: osd0 192.168.122.1:6801 io error
        libceph: osd0 192.168.122.1:6801 io error
      
      This is a receive side limit, so only reads were affected.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc07f543
    • Lubomir Rintel's avatar
      media: ov7670: make "xclk" clock optional · 65dd3e59
      Lubomir Rintel authored
      commit 786fa584 upstream.
      
      When the "xclk" clock was added, it was made mandatory. This broke the
      driver on an OLPC plaform which doesn't know such clock. Make it
      optional.
      
      Tested on a OLPC XO-1 laptop.
      
      Fixes: 0a024d63 ("[media] ov7670: get xclk")
      
      Cc: stable@vger.kernel.org # 4.11+
      Signed-off-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65dd3e59
    • Chris Packham's avatar
      clk: mvebu: use correct bit for 98DX3236 NAND · d2aaeb9a
      Chris Packham authored
      commit 00c5a926 upstream.
      
      The correct fieldbit value for the NAND PLL reload trigger is 27.
      
      Fixes: commit e120c17a ("clk: mvebu: support for 98DX3236 SoC")
      Signed-off-by: default avatarChris Packham <chris.packham@alliedtelesis.co.nz>
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2aaeb9a
    • Enric Balletbo i Serra's avatar
      clk: rockchip: Fix static checker warning in rockchip_ddrclk_get_parent call · 864aede9
      Enric Balletbo i Serra authored
      commit 665636b2 upstream.
      
      Fixes the signedness bug returning '(-22)' on the return type by removing the
      sanity checker in rockchip_ddrclk_get_parent(). The function should return
      and unsigned value only and it's safe to remove the sanity checker as the
      core functions that call get_parent like clk_core_get_parent_by_index already
      ensures the validity of the clk index returned (index >= core->num_parents).
      
      Fixes: a4f182bf ("clk: rockchip: add new clock-type for the ddrclk")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEnric Balletbo i Serra <enric.balletbo@collabora.com>
      Reviewed-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      864aede9
    • Ronald Wahl's avatar
      clk: at91: Fix division by zero in PLL recalc_rate() · abe960b7
      Ronald Wahl authored
      commit 0f5cb0e6 upstream.
      
      Commit a982e45d ("clk: at91: PLL recalc_rate() now using cached MUL
      and DIV values") removed a check that prevents a division by zero. This
      now causes a stacktrace when booting the kernel on a at91 platform if
      the PLL DIV register contains zero. This commit reintroduces this check.
      
      Fixes: a982e45d ("clk: at91: PLL recalc_rate() now using cached...")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarRonald Wahl <rwahl@gmx.de>
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@microchip.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abe960b7
    • Krzysztof Kozlowski's avatar
      clk: s2mps11: Fix matching when built as module and DT node contains compatible · f074414a
      Krzysztof Kozlowski authored
      commit 8985167e upstream.
      
      When driver is built as module and DT node contains clocks compatible
      (e.g. "samsung,s2mps11-clk"), the module will not be autoloaded because
      module aliases won't match.
      
      The modalias from uevent: of:NclocksT<NULL>Csamsung,s2mps11-clk
      The modalias from driver: platform:s2mps11-clk
      
      The devices are instantiated by parent's MFD.  However both Device Tree
      bindings and parent define the compatible for clocks devices.  In case
      of module matching this DT compatible will be used.
      
      The issue will not happen if this is a built-in (no need for module
      matching) or when clocks DT node does not contain compatible (not
      correct from bindings perspective but working for driver).
      
      Note when backporting to stable kernels: adjust the list of device ID
      entries.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 53c31b34 ("mfd: sec-core: Add of_compatible strings for clock MFD cells")
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Acked-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f074414a
    • Richard Weinberger's avatar
      um: Drop own definition of PTRACE_SYSEMU/_SINGLESTEP · 5d33c77b
      Richard Weinberger authored
      commit 0676b957 upstream.
      
      32bit UML used to define PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP
      own its own because many years ago not all libcs had these request codes
      in their UAPI.
      These days PTRACE_SYSEMU/_SINGLESTEP is well known and part of glibc
      and our own define becomes problematic.
      
      With change c48831d0eebf ("linux/x86: sync sys/ptrace.h with Linux 4.14
      [BZ #22433]") glibc turned PTRACE_SYSEMU/_SINGLESTEP into a enum and
      UML failed to build.
      
      Let's drop our define and rely on the fact that every libc has
      PTRACE_SYSEMU/_SINGLESTEP.
      
      Cc: <stable@vger.kernel.org>
      Cc: Ritesh Raj Sarraf <rrs@researchut.com>
      Reported-and-tested-by: default avatarRitesh Raj Sarraf <rrs@researchut.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d33c77b
    • Max Filippov's avatar
      xtensa: fix boot parameters address translation · fa68fdf0
      Max Filippov authored
      commit 40dc948f upstream.
      
      The bootloader may pass physical address of the boot parameters structure
      to the MMUv3 kernel in the register a2. Code in the _SetupMMU block in
      the arch/xtensa/kernel/head.S is supposed to map that physical address to
      the virtual address in the configured virtual memory layout.
      
      This code haven't been updated when additional 256+256 and 512+512
      memory layouts were introduced and it may produce wrong addresses when
      used with these layouts.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa68fdf0
    • Max Filippov's avatar
      xtensa: make sure bFLT stack is 16 byte aligned · f221f7b4
      Max Filippov authored
      commit 0773495b upstream.
      
      Xtensa ABI requires stack alignment to be at least 16. In noMMU
      configuration ARCH_SLAB_MINALIGN is used to align stack. Make it at
      least 16.
      
      This fixes the following runtime error in noMMU configuration, caused by
      interaction between insufficiently aligned stack and alloca function,
      that results in corruption of on-stack variable in the libc function
      glob:
      
       Caught unhandled exception in 'sh' (pid = 47, pc = 0x02d05d65)
        - should not happen
        EXCCAUSE is 15
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f221f7b4
    • Max Filippov's avatar
      xtensa: add NOTES section to the linker script · 1592c520
      Max Filippov authored
      commit 4119ba21 upstream.
      
      This section collects all source .note.* sections together in the
      vmlinux image. Without it .note.Linux section may be placed at address
      0, while the rest of the kernel is at its normal address, resulting in a
      huge vmlinux.bin image that may not be linked into the xtensa Image.elf.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1592c520
    • Huacai Chen's avatar
      MIPS: Loongson-3: Fix BRIDGE irq delivery problem · 14563f42
      Huacai Chen authored
      [ Upstream commit 360fe725 ]
      
      After commit e509bd7d ("genirq: Allow migration of chained
      interrupts by installing default action") Loongson-3 fails at here:
      
      setup_irq(LOONGSON_HT1_IRQ, &cascade_irqaction);
      
      This is because both chained_action and cascade_irqaction don't have
      IRQF_SHARED flag. This will cause Loongson-3 resume fails because HPET
      timer interrupt can't be delivered during S3. So we set the irqchip of
      the chained irq to loongson_irq_chip which doesn't disable the chained
      irq in CP0.Status.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/20434/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: Huacai Chen <chenhuacai@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      14563f42
    • Huacai Chen's avatar
      MIPS: Loongson-3: Fix CPU UART irq delivery problem · 639fa868
      Huacai Chen authored
      [ Upstream commit d06f8a2f ]
      
      Masking/unmasking the CPU UART irq in CP0_Status (and redirecting it to
      other CPUs) may cause interrupts be lost, especially in multi-package
      machines (Package-0's UART irq cannot be delivered to others). So make
      mask_loongson_irq() and unmask_loongson_irq() be no-ops.
      
      The original problem (UART IRQ may deliver to any core) is also because
      of masking/unmasking the CPU UART irq in CP0_Status. So it is safe to
      remove all of the stuff.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/20433/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: Huacai Chen <chenhuacai@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      639fa868
    • Amir Goldstein's avatar
      ovl: fix recursive oi->lock in ovl_link() · 3c5cf798
      Amir Goldstein authored
      commit 6cd07870 upstream.
      
      linking a non-copied-up file into a non-copied-up parent results in a
      nested call to mutex_lock_interruptible(&oi->lock). Fix this by copying up
      target parent before ovl_nlink_start(), same as done in ovl_rename().
      
      ~/unionmount-testsuite$ ./run --ov -s
      ~/unionmount-testsuite$ ln /mnt/a/foo100 /mnt/a/dir100/
      
       WARNING: possible recursive locking detected
       --------------------------------------------
       ln/1545 is trying to acquire lock:
       00000000bcce7c4c (&ovl_i_lock_key[depth]){+.+.}, at:
           ovl_copy_up_start+0x28/0x7d
       but task is already holding lock:
       0000000026d73d5b (&ovl_i_lock_key[depth]){+.+.}, at:
           ovl_nlink_start+0x3c/0xc1
      
      [SzM: this seems to be a false positive, but doing the copy-up first is
      harmless and removes the lockdep splat]
      
      Reported-by: syzbot+3ef5c0d1a5cb0b21e6be@syzkaller.appspotmail.com
      Fixes: 5f8415d6 ("ovl: persistent overlay inode nlink for...")
      Cc: <stable@vger.kernel.org> # v4.13
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      [amir: backport to v4.18]
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c5cf798
    • Miklos Szeredi's avatar
      fuse: set FR_SENT while locked · 78da72ee
      Miklos Szeredi authored
      commit 4c316f2f upstream.
      
      Otherwise fuse_dev_do_write() could come in and finish off the request, and
      the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...))
      in request_end().
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai
      Fixes: 46c34a34 ("fuse: no fc->lock for pqueue parts")
      Cc: <stable@vger.kernel.org> # v4.2
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78da72ee
    • Miklos Szeredi's avatar
      fuse: fix blocked_waitq wakeup · f6f21a2b
      Miklos Szeredi authored
      commit 908a572b upstream.
      
      Using waitqueue_active() is racy.  Make sure we issue a wake_up()
      unconditionally after storing into fc->blocked.  After that it's okay to
      optimize with waitqueue_active() since the first wake up provides the
      necessary barrier for all waiters, not the just the woken one.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 3c18ef81 ("fuse: optimize wake_up")
      Cc: <stable@vger.kernel.org> # v3.10
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6f21a2b
    • Kirill Tkhai's avatar
      fuse: Fix use-after-free in fuse_dev_do_write() · ab962e91
      Kirill Tkhai authored
      commit d2d2d4fb upstream.
      
      After we found req in request_find() and released the lock,
      everything may happen with the req in parallel:
      
      cpu0                              cpu1
      fuse_dev_do_write()               fuse_dev_do_write()
        req = request_find(fpq, ...)    ...
        spin_unlock(&fpq->lock)         ...
        ...                             req = request_find(fpq, oh.unique)
        ...                             spin_unlock(&fpq->lock)
        queue_interrupt(&fc->iq, req);   ...
        ...                              ...
        ...                              ...
        request_end(fc, req);
          fuse_put_request(fc, req);
        ...                              queue_interrupt(&fc->iq, req);
      Signed-off-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 46c34a34 ("fuse: no fc->lock for pqueue parts")
      Cc: <stable@vger.kernel.org> # v4.2
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab962e91
    • Kirill Tkhai's avatar
      fuse: Fix use-after-free in fuse_dev_do_read() · d94b3a23
      Kirill Tkhai authored
      commit bc78abbd upstream.
      
      We may pick freed req in this way:
      
      [cpu0]                                  [cpu1]
      fuse_dev_do_read()                      fuse_dev_do_write()
         list_move_tail(&req->list, ...);     ...
         spin_unlock(&fpq->lock);             ...
         ...                                  request_end(fc, req);
         ...                                    fuse_put_request(fc, req);
         if (test_bit(FR_INTERRUPTED, ...))
               queue_interrupt(fiq, req);
      
      Fix that by keeping req alive until we finish all manipulations.
      
      Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com
      Signed-off-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 46c34a34 ("fuse: no fc->lock for pqueue parts")
      Cc: <stable@vger.kernel.org> # v4.2
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d94b3a23
    • Quinn Tran's avatar
      scsi: qla2xxx: Fix re-using LoopID when handle is in use · 0a7a9ed0
      Quinn Tran authored
      commit 5c640053 upstream.
      
      This patch fixes issue where driver clears NPort ID map instead of marking
      handle in use. Once driver clears NPort ID from the database, it can reuse
      the same NPort ID resulting in a PLOGI failure.
      
      [mkp: fixed Himanshu's SoB]
      
      Fixes: a084fd68 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
      Cc: <stable@vger.kernel.org>
      Signed-of-by: default avatarQuinn Tran <quinn.tran@cavium.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarHimanshu Madhani <hmadhani@cavium.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a7a9ed0