1. 30 May, 2018 40 commits
    • David S. Miller's avatar
      sparc64: Make atomic_xchg() an inline function rather than a macro. · 4a6cd791
      David S. Miller authored
      [ Upstream commit d13864b6 ]
      
      This avoids a lot of -Wunused warnings such as:
      
      ====================
      kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
      ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
       #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
      
      ./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
       #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
                                    ^~~~
      kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
          atomic_xchg(&kgdb_active, cpu);
          ^~~~~~~~~~~
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a6cd791
    • David Howells's avatar
      fscache: Fix hanging wait on page discarded by writeback · 22f1bde5
      David Howells authored
      [ Upstream commit 2c984257 ]
      
      If the fscache asynchronous write operation elects to discard a page that's
      pending storage to the cache because the page would be over the store limit
      then it needs to wake the page as someone may be waiting on completion of
      the write.
      
      The problem is that the store limit may be updated by a different
      asynchronous operation - and so may miss the write - and that the store
      limit may not even get updated until later by the netfs.
      
      Fix the kernel hang by making fscache_write_op() mark as written any pages
      that are over the limit.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22f1bde5
    • Alexander Graf's avatar
      lan78xx: Connect phy early · 6d03ff16
      Alexander Graf authored
      [ Upstream commit 92571a1a ]
      
      When using wicked with a lan78xx device attached to the system, we
      end up with ethtool commands issued on the device before an ifup
      got issued. That lead to the following crash:
      
          Unable to handle kernel NULL pointer dereference at virtual address 0000039c
          pgd = ffff800035b30000
          [0000039c] *pgd=0000000000000000
          Internal error: Oops: 96000004 [#1] SMP
          Modules linked in: [...]
          Supported: Yes
          CPU: 3 PID: 638 Comm: wickedd Tainted: G            E      4.12.14-0-default #1
          Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
          task: ffff800035e74180 task.stack: ffff800036718000
          PC is at phy_ethtool_ksettings_get+0x20/0x98
          LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
          pc : [<ffff0000086f7f30>] lr : [<ffff000000dcca84>] pstate: 20000005
          sp : ffff80003671bb20
          x29: ffff80003671bb20 x28: ffff800035e74180
          x27: ffff000008912000 x26: 000000000000001d
          x25: 0000000000000124 x24: ffff000008f74d00
          x23: 0000004000114809 x22: 0000000000000000
          x21: ffff80003671bbd0 x20: 0000000000000000
          x19: ffff80003671bbd0 x18: 000000000000040d
          x17: 0000000000000001 x16: 0000000000000000
          x15: 0000000000000000 x14: ffffffffffffffff
          x13: 0000000000000000 x12: 0000000000000020
          x11: 0101010101010101 x10: fefefefefefefeff
          x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
          x7 : 0000000080808080 x6 : ffff80003671bc9c
          x5 : ffff80003671b9f8 x4 : ffff80002c296190
          x3 : 0000000000000000 x2 : 0000000000000000
          x1 : ffff80003671bbd0 x0 : ffff80003671bc00
          Process wickedd (pid: 638, stack limit = 0xffff800036718000)
          Call trace:
          Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
          b9e0: ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
          ba00: ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
          ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
          ba40: 0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
          ba60: 0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
          ba80: 0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
          baa0: ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
          bac0: ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
          bae0: ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
          bb00: 0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
          [<ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
          [<ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
          [<ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
          [<ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
          [<ffff0000087e5008>] dev_ioctl+0x400/0x630
          [<ffff00000879dd00>] sock_do_ioctl+0x70/0x88
          [<ffff00000879f5f8>] sock_ioctl+0x208/0x368
          [<ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
          [<ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
          Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
          bec0: 0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
          bee0: 0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
          bf00: 000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
          bf20: 0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
          bf40: 0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
          bf60: 0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
          bf80: 0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
          bfa0: 0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
          bfc0: 0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
          bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
      The culprit is quite simple: The driver tries to access the phy left and right,
      but only actually has a working reference to it when the device is up.
      
      The fix thus is quite simple too: Get a reference to the phy on probe already
      and keep it even when the device is going down.
      
      With this patch applied, I can successfully run wicked on my system and bring
      the interface up and down as many times as I want, without getting NULL pointer
      dereferences in between.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d03ff16
    • Sean Christopherson's avatar
      KVM: VMX: raise internal error for exception during invalid protected mode state · 80b8f3da
      Sean Christopherson authored
      [ Upstream commit add5ff7a ]
      
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80b8f3da
    • Sai Praneeth's avatar
      x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of... · fd97bbca
      Sai Praneeth authored
      x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
      
      [ Upstream commit 162ee5a8 ]
      
      Linus reported the following boot warning:
      
        WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
        [...]
        Call Trace:
        switch_mm_irqs_off+0x267/0x590
        switch_mm+0xe/0x20
        efi_switch_mm+0x3e/0x50
        efi_enter_virtual_mode+0x43f/0x4da
        start_kernel+0x3bf/0x458
        secondary_startup_64+0xa5/0xb0
      
      ... after merging:
      
        03781e40: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
      
      When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled,
      build_cr3_noflush() (called via switch_mm()) does a sanity check to see
      if X86_FEATURE_PCID is set.
      
      Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to
      perform the check but this_cpu_has() works only after SMP is initialized
      (i.e. per cpu cpu_info's should be populated) and this happens to be very
      late in the boot process (during rest_init()).
      
      As efi_runtime_services() are called during (early) kernel boot time
      and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
      time. As suggested by Dave Hansen, this should be OK because all CPU's have
      same capabilities on x86.
      
      With this change the warning is fixed.
      
      ( Dave also suggested that we put a warning in this_cpu_has() if it's used
        early in the boot process. This is still work in progress as it affects
        MCE. )
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Lee Chun-Yi <jlee@suse.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd97bbca
    • Davidlohr Bueso's avatar
      sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning · 3aeaeecd
      Davidlohr Bueso authored
      [ Upstream commit d29a2064 ]
      
      While running rt-tests' pi_stress program I got the following splat:
      
        rq->clock_update_flags < RQCF_ACT_SKIP
        WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
      
        [...]
      
        <IRQ>
        enqueue_top_rt_rq+0xf4/0x150
        ? cpufreq_dbs_governor_start+0x170/0x170
        sched_rt_rq_enqueue+0x65/0x80
        sched_rt_period_timer+0x156/0x360
        ? sched_rt_rq_enqueue+0x80/0x80
        __hrtimer_run_queues+0xfa/0x260
        hrtimer_interrupt+0xcb/0x220
        smp_apic_timer_interrupt+0x62/0x120
        apic_timer_interrupt+0xf/0x20
        </IRQ>
      
        [...]
      
        do_idle+0x183/0x1e0
        cpu_startup_entry+0x5f/0x70
        start_secondary+0x192/0x1d0
        secondary_startup_64+0xa5/0xb0
      
      We can get rid of it be the "traditional" means of adding an
      update_rq_clock() call after acquiring the rq->lock in
      do_sched_rt_period_timer().
      
      The case for the RT task throttling (which this workload also hits)
      can be ignored in that the skip_update call is actually bogus and
      quite the contrary (the request bits are removed/reverted).
      
      By setting RQCF_UPDATED we really don't care if the skip is happening
      or not and will therefore make the assert_clock_updated() check happy.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: linux-kernel@vger.kernel.org
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3aeaeecd
    • Nicholas Piggin's avatar
      powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep · be6a5ad5
      Nicholas Piggin authored
      [ Upstream commit c1b25a17 ]
      
      POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
      because it does not go through the subcore restore.
      
      Have POWER9 restore it in core restore.
      
      Fixes: ee97b6b9 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be6a5ad5
    • Jun Piao's avatar
      ocfs2/dlm: don't handle migrate lockres if already in shutdown · 839c27f7
      Jun Piao authored
      [ Upstream commit bb34f24c ]
      
      We should not handle migrate lockres if we are already in
      'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
      dlm domain.  At last other nodes will get stuck into infinite loop when
      requsting lock from us.
      
      The problem is caused by concurrency umount between nodes.  Before
      receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
      migrate target.  So N2 will continue sending lockres to N1 even though
      N1 has left domain.
      
              N1                             N2 (owner)
                                             touch file
      
          access the file,
          and get pr lock
      
                                             begin leave domain and
                                             pick up N1 as new owner
      
          begin leave domain and
          migrate all lockres done
      
                                             begin migrate lockres to N1
      
          end leave domain, but
          the lockres left
          unexpectedly, because
          migrate task has passed
      
      [piaojun@huawei.com: v3]
        Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
      Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.comSigned-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      839c27f7
    • Mikhail Malygin's avatar
      IB/rxe: Fix for oops in rxe_register_device on ppc64le arch · 9ebe2977
      Mikhail Malygin authored
      [ Upstream commit efc365e7 ]
      
      On ppc64le arch rxe_add command causes oops in kernel log:
      
      [   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
      [   92.499710] SMP NR_CPUS=2048 NUMA pSeries
      [   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
      _nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
       nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
      b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
      m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
      generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
      a2xxx(E)
      [   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
      [   92.499834] Supported: No, Unsupported modules are loaded
      [   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
      [   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
      [   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
      [   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
      [   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
      [   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
                     GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
                     GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
                     GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
                     GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
                     GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
                     GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
                     GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
                     GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
      [   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
      [   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
      [   92.499886] Call Trace:
      [   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
      [   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
      [   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
      [   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
      [   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
      [   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
      [   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
      [   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
      [   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
      [   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
      [   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
      [   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
      [   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
      [   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
      [   92.499949] Instruction dump:
      [   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
      [   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
      [   92.499962] ---[ end trace bed077e15eb420cf ]---
      
      It fails in dma_get_required_mask, that has ppc-specific implementation,
      and fail if provided device argument is NULL
      Signed-off-by: default avatarMikhail Malygin <mikhail@malygin.me>
      Reviewed-by: default avatarYonatan Cohen <yonatanc@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ebe2977
    • Nikolay Borisov's avatar
      btrfs: Fix possible softlock on single core machines · 370b3353
      Nikolay Borisov authored
      [ Upstream commit 1e1c50a9 ]
      
      do_chunk_alloc implements a loop checking whether there is a pending
      chunk allocation and if so causes the caller do loop. Generally this
      loop is executed only once, however testing with btrfs/072 on a single
      core vm machines uncovered an extreme case where the system could loop
      indefinitely. This is due to a missing cond_resched when loop which
      doesn't give a chance to the previous chunk allocator finish its job.
      
      The fix is to simply add the missing cond_resched.
      
      Fixes: 6d74119f ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      370b3353
    • Liu Bo's avatar
      Btrfs: fix NULL pointer dereference in log_dir_items · acfd8e88
      Liu Bo authored
      [ Upstream commit 80c0b421 ]
      
      0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
      returned, path->nodes[0] could be NULL, log_dir_items lacks such a
      check for <0 and we may run into a null pointer dereference panic.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acfd8e88
    • Liu Bo's avatar
      Btrfs: bail out on error during replay_dir_deletes · afef64b1
      Liu Bo authored
      [ Upstream commit b98def7c ]
      
      If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
      to bail out, otherwise @ret would be forced to be 0 after 'break;' and
      the caller won't be aware of it.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afef64b1
    • Yang Shi's avatar
      mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one() · 5ade3c96
      Yang Shi authored
      [ Upstream commit f0849ac0 ]
      
      For PTE-mapped THP, the compound THP has not been split to normal 4K
      pages yet, the whole THP is considered referenced if any one of sub page
      is referenced.
      
      When walking PTE-mapped THP by pvmw, all relevant PTEs will be checked
      to retrieve referenced bit.  But, the current code just returns the
      result of the last PTE.  If the last PTE has not referenced, the
      referenced flag will be cleared.
      
      Just set referenced when ptep{pmdp}_clear_young_notify() returns true.
      
      Link: http://lkml.kernel.org/r/1518212451-87134-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Reported-by: default avatarGang Deng <gavin.dg@linux.alibaba.com>
      Suggested-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ade3c96
    • Huang Ying's avatar
      mm: fix races between address_space dereference and free in page_evicatable · 8d700626
      Huang Ying authored
      [ Upstream commit e92bb4dd ]
      
      When page_mapping() is called and the mapping is dereferenced in
      page_evicatable() through shrink_active_list(), it is possible for the
      inode to be truncated and the embedded address space to be freed at the
      same time.  This may lead to the following race.
      
      CPU1                                                CPU2
      
      truncate(inode)                                     shrink_active_list()
        ...                                                 page_evictable(page)
        truncate_inode_page(mapping, page);
          delete_from_page_cache(page)
            spin_lock_irqsave(&mapping->tree_lock, flags);
              __delete_from_page_cache(page, NULL)
                page_cache_tree_delete(..)
                  ...                                         mapping = page_mapping(page);
                  page->mapping = NULL;
                  ...
            spin_unlock_irqrestore(&mapping->tree_lock, flags);
            page_cache_free_page(mapping, page)
              put_page(page)
                if (put_page_testzero(page)) -> false
      - inode now has no pages and can be freed including embedded address_space
      
                                                              mapping_unevictable(mapping)
      							  test_bit(AS_UNEVICTABLE, &mapping->flags);
      - we've dereferenced mapping which is potentially already free.
      
      Similar race exists between swap cache freeing and page_evicatable()
      too.
      
      The address_space in inode and swap cache will be freed after a RCU
      grace period.  So the races are fixed via enclosing the page_mapping()
      and address_space usage in rcu_read_lock/unlock().  Some comments are
      added in code to make it clear what is protected by the RCU read lock.
      
      Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.comSigned-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d700626
    • Claudio Imbrenda's avatar
      mm/ksm: fix interaction with THP · 763111d9
      Claudio Imbrenda authored
      [ Upstream commit 77da2ba0 ]
      
      This patch fixes a corner case for KSM.  When two pages belong or
      belonged to the same transparent hugepage, and they should be merged,
      KSM fails to split the page, and therefore no merging happens.
      
      This bug can be reproduced by:
      * making sure ksm is running (in case disabling ksmtuned)
      * enabling transparent hugepages
      * allocating a THP-aligned 1-THP-sized buffer
        e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
      * filling it with the same values
        e.g. memset(p, 42, 1<<21)
      * performing madvise to make it mergeable
        e.g. madvise(p, 1<<21, MADV_MERGEABLE)
      * waiting for KSM to perform a few scans
      
      The expected outcome is that the all the pages get merged (1 shared and
      the rest sharing); the actual outcome is that no pages get merged (1
      unshared and the rest volatile)
      
      The reason of this behaviour is that we increase the reference count
      once for both pages we want to merge, but if they belong to the same
      hugepage (or compound page), the reference counter used in both cases is
      the one of the head of the compound page.  This means that
      split_huge_page will find a value of the reference counter too high and
      will fail.
      
      This patch solves this problem by testing if the two pages to merge
      belong to the same hugepage when attempting to merge them.  If so, the
      hugepage is split safely.  This means that the hugepage is not split if
      not necessary.
      
      Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.comSigned-off-by: default avatarClaudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Co-authored-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      763111d9
    • Thomas Falcon's avatar
      ibmvnic: Zero used TX descriptor counter on reset · 378a1e49
      Thomas Falcon authored
      [ Upstream commit 41f71467 ]
      
      The counter that tracks used TX descriptors pending completion
      needs to be zeroed as part of a device reset. This change fixes
      a bug causing transmit queues to be stopped unnecessarily and in
      some cases a transmit queue stall and timeout reset. If the counter
      is not reset, the remaining descriptors will not be "removed",
      effectively reducing queue capacity. If the queue is over half full,
      it will cause the queue to stall if stopped.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      378a1e49
    • Esben Haabendal's avatar
      dp83640: Ensure against premature access to PHY registers after reset · d04e5e72
      Esben Haabendal authored
      [ Upstream commit 76327a35 ]
      
      The datasheet specifies a 3uS pause after performing a software
      reset. The default implementation of genphy_soft_reset() does not
      provide this, so implement soft_reset with the needed pause.
      Signed-off-by: default avatarEsben Haabendal <eha@deif.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d04e5e72
    • Sandipan Das's avatar
      perf clang: Add support for recent clang versions · 4be06bc0
      Sandipan Das authored
      [ Upstream commit 7854e499 ]
      
      The clang API calls used by perf have changed in recent releases and
      builds succeed with libclang-3.9 only. This introduces compatibility
      with libclang-4.0 and above.
      
      Without this patch, we will see the following compilation errors with
      libclang-4.0+:
      
       util/c++/clang.cpp: In function ‘clang::CompilerInvocation* perf::createCompilerInvocation(llvm::opt::ArgStringList, llvm::StringRef&, clang::DiagnosticsEngine&)’:
       util/c++/clang.cpp:62:33: error: ‘IK_C’ was not declared in this scope
         Opts.Inputs.emplace_back(Path, IK_C);
                                        ^~~~
       util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::Module> perf::getModuleFromSource(llvm::opt::ArgStringList, llvm::StringRef, llvm::IntrusiveRefCntPtr<clang::vfs::FileSystem>)’:
       util/c++/clang.cpp:75:26: error: no matching function for call to ‘clang::CompilerInstance::setInvocation(clang::CompilerInvocation*)’
         Clang.setInvocation(&*CI);
                                 ^
       In file included from util/c++/clang.cpp:14:0:
       /usr/include/clang/Frontend/CompilerInstance.h:231:8: note: candidate: void clang::CompilerInstance::setInvocation(std::shared_ptr<clang::CompilerInvocation>)
          void setInvocation(std::shared_ptr<CompilerInvocation> Value);
               ^~~~~~~~~~~~~
      
      Committer testing:
      
      Tested on Fedora 27 after installing the clang-devel and llvm-devel
      packages, versions:
      
        # rpm -qa | egrep llvm\|clang
        llvm-5.0.1-6.fc27.x86_64
        clang-libs-5.0.1-5.fc27.x86_64
        clang-5.0.1-5.fc27.x86_64
        clang-tools-extra-5.0.1-5.fc27.x86_64
        llvm-libs-5.0.1-6.fc27.x86_64
        llvm-devel-5.0.1-6.fc27.x86_64
        clang-devel-5.0.1-5.fc27.x86_64
        #
      
      Make sure you don't have some older version lying around in /usr/local,
      etc, then:
      
        $ make LIBCLANGLLVM=1 -C tools/perf install-bin
      
      And in the end perf will be linked agains these libraries:
      
        # ldd ~/bin/perf | egrep -i llvm\|clang
      	libclangAST.so.5 => /lib64/libclangAST.so.5 (0x00007f8bb2eb4000)
      	libclangBasic.so.5 => /lib64/libclangBasic.so.5 (0x00007f8bb29e3000)
      	libclangCodeGen.so.5 => /lib64/libclangCodeGen.so.5 (0x00007f8bb23f7000)
      	libclangDriver.so.5 => /lib64/libclangDriver.so.5 (0x00007f8bb2060000)
      	libclangFrontend.so.5 => /lib64/libclangFrontend.so.5 (0x00007f8bb1d06000)
      	libclangLex.so.5 => /lib64/libclangLex.so.5 (0x00007f8bb1a3e000)
      	libclangTooling.so.5 => /lib64/libclangTooling.so.5 (0x00007f8bb17d4000)
      	libclangEdit.so.5 => /lib64/libclangEdit.so.5 (0x00007f8bb15c5000)
      	libclangSema.so.5 => /lib64/libclangSema.so.5 (0x00007f8bb0cc9000)
      	libclangAnalysis.so.5 => /lib64/libclangAnalysis.so.5 (0x00007f8bb0a23000)
      	libclangParse.so.5 => /lib64/libclangParse.so.5 (0x00007f8bb0725000)
      	libclangSerialization.so.5 => /lib64/libclangSerialization.so.5 (0x00007f8bb039a000)
      	libLLVM-5.0.so => /lib64/libLLVM-5.0.so (0x00007f8bace98000)
      	libclangASTMatchers.so.5 => /lib64/../lib64/libclangASTMatchers.so.5 (0x00007f8bab735000)
      	libclangFormat.so.5 => /lib64/../lib64/libclangFormat.so.5 (0x00007f8bab4b2000)
      	libclangRewrite.so.5 => /lib64/../lib64/libclangRewrite.so.5 (0x00007f8bab2a1000)
      	libclangToolingCore.so.5 => /lib64/../lib64/libclangToolingCore.so.5 (0x00007f8bab08e000)
        #
      Signed-off-by: default avatarSandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: 00b86691 ("perf clang: Add builtin clang support ant test case")
      Link: http://lkml.kernel.org/r/20180404180419.19056-2-sandipan@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4be06bc0
    • Sandipan Das's avatar
      perf tools: Fix perf builds with clang support · ee7c28b2
      Sandipan Das authored
      [ Upstream commit c2fb54a1 ]
      
      For libclang, some distro packages provide static libraries (.a) while
      some provide shared libraries (.so). Currently, perf code can only be
      linked with static libraries. This makes perf build possible for both
      cases.
      Signed-off-by: default avatarSandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: d58ac0bf ("perf build: Add clang and llvm compile and linking support")
      Link: http://lkml.kernel.org/r/20180404180419.19056-1-sandipan@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee7c28b2
    • Anshuman Khandual's avatar
      powerpc/fscr: Enable interrupts earlier before calling get_user() · 6689a4c7
      Anshuman Khandual authored
      [ Upstream commit 709b973c ]
      
      The function get_user() can sleep while trying to fetch instruction
      from user address space and causes the following warning from the
      scheduler.
      
      BUG: sleeping function called from invalid context
      
      Though interrupts get enabled back but it happens bit later after
      get_user() is called. This change moves enabling these interrupts
      earlier covering the function get_user(). While at this, lets check
      for kernel mode and crash as this interrupt should not have been
      triggered from the kernel context.
      Signed-off-by: default avatarAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6689a4c7
    • Shunyong Yang's avatar
      cpufreq: CPPC: Initialize shared perf capabilities of CPUs · 96fdc64d
      Shunyong Yang authored
      [ Upstream commit 8913315e ]
      
      When multiple CPUs are related in one cpufreq policy, the first online
      CPU will be chosen by default to handle cpufreq operations. Let's take
      cpu0 and cpu1 as an example.
      
      When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf
      capabilities should be initialized. Otherwise, perf capabilities are 0s
      and speed change can not take effect.
      
      This patch copies perf capabilities of the first online CPU to other
      shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY.
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarShunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96fdc64d
    • Carlos Maiolino's avatar
      Force log to disk before reading the AGF during a fstrim · 8bff7ca9
      Carlos Maiolino authored
      [ Upstream commit 8c81dd46 ]
      
      Forcing the log to disk after reading the agf is wrong, we might be
      calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
      
      This can cause a deadlock when racing a fstrim with a filesystem
      shutdown.
      
      The deadlock has been identified due a miscalculation bug in device-mapper
      dm-thin, which returns lack of space to its users earlier than the device itself
      really runs out of space, changing the device-mapper volume into an error state.
      
      The problem happened while filling the filesystem with a single file,
      triggering the bug in device-mapper, consequently causing an IO error
      and shutting down the filesystem.
      
      If such file is removed, and fstrim executed before the XFS finishes the
      shut down process, the fstrim process will end up holding the buffer
      lock, and going to sleep on the cil wait queue.
      
      At this point, the shut down process will try to wake up all the threads
      waiting on the cil wait queue, but for this, it will try to hold the
      same buffer log already held my the fstrim, locking up the filesystem.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bff7ca9
    • Jens Axboe's avatar
      sr: get/drop reference to device in revalidate and check_events · 28143fe3
      Jens Axboe authored
      [ Upstream commit 2d097c50 ]
      
      We can't just use scsi_cd() to get the scsi_cd structure, we have
      to grab a live reference to the device. For both callbacks, we're
      not inside an open where we already hold a reference to the device.
      
      This fixes device removal/addition under concurrent device access,
      which otherwise could result in the below oops.
      
      NULL pointer dereference at 0000000000000010
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      sr 12:0:0:0: [sr2] scsi-1 drive
       scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl
      sr 12:0:0:0: Attached scsi CD-ROM sr2
       sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc
      sr 12:0:0:0: Attached scsi generic sg7 type 5
       igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif]
      CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
      Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
      RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
      RSP: 0018:ffff883ff357bb58 EFLAGS: 00010292
      RAX: ffffffffa00b07d0 RBX: ffff883ff3058000 RCX: ffff883ff357bb66
      RDX: 0000000000000003 RSI: 0000000000007530 RDI: ffff881fea631000
      RBP: 0000000000000000 R08: ffff881fe4d38400 R09: 0000000000000000
      R10: 0000000000000000 R11: 00000000000001b6 R12: 000000000800005d
      R13: 000000000800005d R14: ffff883ffd9b3790 R15: 0000000000000000
      FS:  00007f7dc8e6d8c0(0000) GS:ffff883fff340000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000003ffda98005 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? __invalidate_device+0x48/0x60
       check_disk_change+0x4c/0x60
       sr_block_open+0x16/0xd0 [sr_mod]
       __blkdev_get+0xb9/0x450
       ? iget5_locked+0x1c0/0x1e0
       blkdev_get+0x11e/0x320
       ? bdget+0x11d/0x150
       ? _raw_spin_unlock+0xa/0x20
       ? bd_acquire+0xc0/0xc0
       do_dentry_open+0x1b0/0x320
       ? inode_permission+0x24/0xc0
       path_openat+0x4e6/0x1420
       ? cpumask_any_but+0x1f/0x40
       ? flush_tlb_mm_range+0xa0/0x120
       do_filp_open+0x8c/0xf0
       ? __seccomp_filter+0x28/0x230
       ? _raw_spin_unlock+0xa/0x20
       ? __handle_mm_fault+0x7d6/0x9b0
       ? list_lru_add+0xa8/0xc0
       ? _raw_spin_unlock+0xa/0x20
       ? __alloc_fd+0xaf/0x160
       ? do_sys_open+0x1a6/0x230
       do_sys_open+0x1a6/0x230
       do_syscall_64+0x5a/0x100
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28143fe3
    • Xidong Wang's avatar
      z3fold: fix memory leak · 3a0de65a
      Xidong Wang authored
      [ Upstream commit 1ec6995d ]
      
      In z3fold_create_pool(), the memory allocated by __alloc_percpu() is not
      released on the error path that pool->compact_wq , which holds the
      return value of create_singlethread_workqueue(), is NULL.  This will
      result in a memory leak bug.
      
      [akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
      Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.comSigned-off-by: default avatarXidong Wang <wangxidong_97@163.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a0de65a
    • Tom Abraham's avatar
      swap: divide-by-zero when zero length swap file on ssd · 2ab77381
      Tom Abraham authored
      [ Upstream commit a06ad633 ]
      
      Calling swapon() on a zero length swap file on SSD can lead to a
      divide-by-zero.
      
      Although creating such files isn't possible with mkswap and they woud be
      considered invalid, it would be better for the swapon code to be more
      robust and handle this condition gracefully (return -EINVAL).
      Especially since the fix is small and straightforward.
      
      To help with wear leveling on SSD, the swapon syscall calculates a
      random position in the swap file using modulo p->highest_bit, which is
      set to maxpages - 1 in read_swap_header.
      
      If the swap file is zero length, read_swap_header sets maxpages=1 and
      last_page=0, resulting in p->highest_bit=0 and we divide-by-zero when we
      modulo p->highest_bit in swapon syscall.
      
      This can be prevented by having read_swap_header return zero if
      last_page is zero.
      
      Link: http://lkml.kernel.org/r/5AC747C1020000A7001FA82C@prv-mh.provo.novell.comSigned-off-by: default avatarThomas Abraham <tabraham@suse.com>
      Reported-by: <Mark.Landis@Teradata.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ab77381
    • Danilo Krummrich's avatar
      fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table · 9c9844d9
      Danilo Krummrich authored
      [ Upstream commit a0b0d1c3 ]
      
      proc_sys_link_fill_cache() does not take currently unregistering sysctl
      tables into account, which might result into a page fault in
      sysctl_follow_link() - add a check to fix it.
      
      This bug has been present since v3.4.
      
      Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: default avatarDanilo Krummrich <danilokrummrich@dk-develop.de>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c9844d9
    • Dave Hansen's avatar
      x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init · 59bdc587
      Dave Hansen authored
      [ Upstream commit 639d6aaf ]
      
      __ro_after_init data gets stuck in the .rodata section.  That's normally
      fine because the kernel itself manages the R/W properties.
      
      But, if we run __change_page_attr() on an area which is __ro_after_init,
      the .rodata checks will trigger and force the area to be immediately
      read-only, even if it is early-ish in boot.  This caused problems when
      trying to clear the _PAGE_GLOBAL bit for these area in the PTI code:
      it cleared _PAGE_GLOBAL like I asked, but also took it up on itself
      to clear _PAGE_RW.  The kernel then oopses the next time it wrote to
      a __ro_after_init data structure.
      
      To fix this, add the kernel_set_to_readonly check, just like we have
      for kernel text, just a few lines below in this function.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59bdc587
    • Joerg Roedel's avatar
      x86/pgtable: Don't set huge PUD/PMD on non-leaf entries · c1af6891
      Joerg Roedel authored
      [ Upstream commit e3e28812 ]
      
      The pmd_set_huge() and pud_set_huge() functions are used from
      the generic ioremap() code to establish large mappings where this
      is possible.
      
      But the generic ioremap() code does not check whether the
      PMD/PUD entries are already populated with a non-leaf entry,
      so that any page-table pages these entries point to will be
      lost.
      
      Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
      BUG_ON() in vmalloc_sync_one() when PMD entries are synced
      from swapper_pg_dir to the current page-table. This happens
      because the PMD entry from swapper_pg_dir was promoted to a
      huge-page entry while the current PGD still contains the
      non-leaf entry. Because both entries are present and point
      to a different page, the BUG_ON() triggers.
      
      This was actually triggered with pti-x32 enabled in a KVM
      virtual machine by the graphics driver.
      
      A real and better fix for that would be to improve the
      page-table handling in the generic ioremap() code. But that is
      out-of-scope for this patch-set and left for later work.
      Reported-by: default avatarDavid H. Gutteridge <dhgutteridge@sympatico.ca>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Waiman Long <llong@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1af6891
    • Filipe Manana's avatar
      Btrfs: fix loss of prealloc extents past i_size after fsync log replay · c527ab91
      Filipe Manana authored
      [ Upstream commit 471d557a ]
      
      Currently if we allocate extents beyond an inode's i_size (through the
      fallocate system call) and then fsync the file, we log the extents but
      after a power failure we replay them and then immediately drop them.
      This behaviour happens since about 2009, commit c71bf099 ("Btrfs:
      Avoid orphan inodes cleanup while replaying log"), because it marks
      the inode as an orphan instead of dropping any extents beyond i_size
      before replaying logged extents, so after the log replay, and while
      the mount operation is still ongoing, we find the inode marked as an
      orphan and then perform a truncation (drop extents beyond the inode's
      i_size). Because the processing of orphan inodes is still done
      right after replaying the log and before the mount operation finishes,
      the intention of that commit does not make any sense (at least as
      of today). However reverting that behaviour is not enough, because
      we can not simply discard all extents beyond i_size and then replay
      logged extents, because we risk dropping extents beyond i_size created
      in past transactions, for example:
      
        add prealloc extent beyond i_size
        fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
        transaction commit
        add another prealloc extent beyond i_size
        fsync - triggers the fast fsync path
        power failure
      
      In that scenario, we would drop the first extent and then replay the
      second one. To fix this just make sure that all prealloc extents
      beyond i_size are logged, and if we find too many (which is far from
      a common case), fallback to a full transaction commit (like we do when
      logging regular extents in the fast fsync path).
      
      Trivial reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
       $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
       $ sync
       $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
       $ xfs_io -c "fsync" /mnt/foo
       <power failure>
      
       # mount to replay log
       $ mount /dev/sdb /mnt
       # at this point the file only has one extent, at offset 0, size 256K
      
      A test case for fstests follows soon, covering multiple scenarios that
      involve adding prealloc extents with previous shrinking truncates and
      without such truncates.
      
      Fixes: c71bf099 ("Btrfs: Avoid orphan inodes cleanup while replaying log")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c527ab91
    • Liu Bo's avatar
      Btrfs: clean up resources during umount after trans is aborted · f2924e32
      Liu Bo authored
      [ Upstream commit af722733 ]
      
      Currently if some fatal errors occur, like all IO get -EIO, resources
      would be cleaned up when
      a) transaction is being committed or
      b) BTRFS_FS_STATE_ERROR is set
      
      However, in some rare cases, resources may be left alone after transaction
      gets aborted and umount may run into some ASSERT(), e.g.
      ASSERT(list_empty(&block_group->dirty_list));
      
      For case a), in btrfs_commit_transaciton(), there're several places at the
      beginning where we just call btrfs_end_transaction() without cleaning up
      resources.  For case b), it is possible that the trans handle doesn't have
      any dirty stuff, then only trans hanlde is marked as aborted while
      BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
      
      This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
      all resources won't stay in memory after umount.
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2924e32
    • Johannes Thumshirn's avatar
      nvme: don't send keep-alives to the discovery controller · 1908ca22
      Johannes Thumshirn authored
      [ Upstream commit 74c6c715 ]
      
      NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
      Command Support" Figure 31 "Discovery Controller – Admin Commands"
      explicitly listst all commands but "Get Log Page" and "Identify" as
      reserved, but NetApp report the Linux host is sending Keep Alive
      commands to the discovery controller, which is a violation of the
      Spec.
      
      We're already checking for discovery controllers when configuring the
      keep alive timeout but when creating a discovery controller we're not
      hard wiring the keep alive timeout to 0 and thus remain on
      NVME_DEFAULT_KATO for the discovery controller.
      
      This can be easily remproduced when issuing a direct connect to the
      discovery susbsystem using:
      'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: 07bfcd09 ("nvme-fabrics: add a generic NVMe over Fabrics library")
      Reported-by: default avatarMartin George <marting@netapp.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1908ca22
    • Jean Delvare's avatar
      firmware: dmi_scan: Fix UUID length safety check · 145b7e06
      Jean Delvare authored
      [ Upstream commit 90fe6f8f ]
      
      The test which ensures that the DMI type 1 structure is long enough
      to hold the UUID is off by one. It would fail if the structure is
      exactly 24 bytes long, while that's sufficient to hold the UUID.
      
      I don't expect this bug to cause problem in practice because all
      implementations I have seen had length 8, 25 or 27 bytes, in line
      with the SMBIOS specifications. But let's fix it still.
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Fixes: a814c359 ("firmware: dmi_scan: Check DMI structure length")
      Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      145b7e06
    • Rich Felker's avatar
      sh: fix debug trap failure to process signals before return to user · d9179b4a
      Rich Felker authored
      [ Upstream commit 96a59899 ]
      
      When responding to a debug trap (breakpoint) in userspace, the
      kernel's trap handler raised SIGTRAP but returned from the trap via a
      code path that ignored pending signals, resulting in an infinite loop
      re-executing the trapping instruction.
      Signed-off-by: default avatarRich Felker <dalias@libc.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9179b4a
    • Yelena Krivosheev's avatar
      net: mvneta: fix enable of all initialized RXQs · 4ee9130f
      Yelena Krivosheev authored
      [ Upstream commit e81b5e01 ]
      
      In mvneta_port_up() we enable relevant RX and TX port queues by write
      queues bit map to an appropriate register.
      
      q_map must be ZERO in the beginning of this process.
      Signed-off-by: default avatarYelena Krivosheev <yelena@marvell.com>
      Acked-by: default avatarThomas Petazzoni <thomas.petazzoni@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ee9130f
    • Toshiaki Makita's avatar
      vlan: Fix vlan insertion for packets without ethernet header · 20619941
      Toshiaki Makita authored
      [ Upstream commit c769accd ]
      
      In some situation vlan packets do not have ethernet headers. One example
      is packets from tun devices. Users can specify vlan protocol in tun_pi
      field instead of IP protocol. When we have a vlan device with reorder_hdr
      disabled on top of the tun device, such packets from tun devices are
      untagged in skb_vlan_untag() and vlan headers will be inserted back in
      vlan_insert_inner_tag().
      
      vlan_insert_inner_tag() however did not expect packets without ethernet
      headers, so in such a case size argument for memmove() underflowed.
      
      We don't need to copy headers for packets which do not have preceding
      headers of vlan headers, so skip memmove() in that case.
      Also don't write vlan protocol in skb->data when it does not have enough
      room for it.
      
      Fixes: cbe7128c ("vlan: Fix out of order vlan headers with reorder header off")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20619941
    • Toshiaki Makita's avatar
      net: Fix untag for vlan packets without ethernet header · 34a9a036
      Toshiaki Makita authored
      [ Upstream commit ae474573 ]
      
      In some situation vlan packets do not have ethernet headers. One example
      is packets from tun devices. Users can specify vlan protocol in tun_pi
      field instead of IP protocol, and skb_vlan_untag() attempts to untag such
      packets.
      
      skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it)
      however did not expect packets without ethernet headers, so in such a case
      size argument for memmove() underflowed and triggered crash.
      
      ====
      BUG: unable to handle kernel paging request at ffff8801cccb8000
      IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      PGD 9cee067 P4D 9cee067 PUD 1d9401063 PMD 1cccb7063 PTE 2810100028101
      Oops: 000b [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      RSP: 0018:ffff8801cc046e28 EFLAGS: 00010287
      RAX: ffff8801ccc244c4 RBX: fffffffffffffffe RCX: fffffffffff6c4c2
      RDX: fffffffffffffffe RSI: ffff8801cccb7ffc RDI: ffff8801cccb8000
      RBP: ffff8801cc046e48 R08: ffff8801ccc244be R09: ffffed0039984899
      R10: 0000000000000001 R11: ffffed0039984898 R12: ffff8801ccc244c4
      R13: ffff8801ccc244c0 R14: ffff8801d96b7c06 R15: ffff8801d96b7b40
      FS:  00007febd562d700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8801cccb8000 CR3: 00000001ccb2f006 CR4: 00000000001606e0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
       memmove include/linux/string.h:360 [inline]
       skb_reorder_vlan_header net/core/skbuff.c:5031 [inline]
       skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061
       __netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701
       netif_receive_skb+0xae/0x390 net/core/dev.c:4725
       tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555
       tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962
       tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990
       call_write_iter include/linux/fs.h:1782 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x454879
      RSP: 002b:00007febd562cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007febd562d6d4 RCX: 0000000000454879
      RDX: 0000000000000157 RSI: 0000000020000180 RDI: 0000000000000014
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000006b0 R14: 00000000006fc120 R15: 0000000000000000
      Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
      RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: ffff8801cc046e28
      CR2: ffff8801cccb8000
      ====
      
      We don't need to copy headers for packets which do not have preceding
      headers of vlan headers, so skip memmove() in that case.
      
      Fixes: 4bbb3e0e ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34a9a036
    • Manish Chopra's avatar
      qede: Do not drop rx-checksum invalidated packets. · 235ca6a0
      Manish Chopra authored
      [ Upstream commit 58f101bf ]
      
      Today, driver drops received packets which are indicated as
      invalid checksum by the device. Instead of dropping such packets,
      pass them to the stack with CHECKSUM_NONE indication in skb.
      Signed-off-by: default avatarAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: default avatarManish Chopra <manish.chopra@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      235ca6a0
    • Stephen Hemminger's avatar
      hv_netvsc: enable multicast if necessary · 78c986bf
      Stephen Hemminger authored
      [ Upstream commit f03dbb06 ]
      
      My recent change to netvsc drive in how receive flags are handled
      broke multicast.  The Hyper-v/Azure virtual interface there is not a
      multicast filter list, filtering is only all or none. The driver must
      enable all multicast if any multicast address is present.
      
      Fixes: 009f766c ("hv_netvsc: filter multicast/broadcast")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78c986bf
    • Vinayak Menon's avatar
      mm/kmemleak.c: wait for scan completion before disabling free · 28bbb0d9
      Vinayak Menon authored
      [ Upstream commit 914b6dff ]
      
      A crash is observed when kmemleak_scan accesses the object->pointer,
      likely due to the following race.
      
        TASK A             TASK B                     TASK C
        kmemleak_write
         (with "scan" and
         NOT "scan=on")
        kmemleak_scan()
                           create_object
                           kmem_cache_alloc fails
                           kmemleak_disable
                           kmemleak_do_cleanup
                           kmemleak_free_enabled = 0
                                                      kfree
                                                      kmemleak_free bails out
                                                       (kmemleak_free_enabled is 0)
                                                      slub frees object->pointer
        update_checksum
        crash - object->pointer
         freed (DEBUG_PAGEALLOC)
      
      kmemleak_do_cleanup waits for the scan thread to complete, but not for
      direct call to kmemleak_scan via kmemleak_write.  So add a wait for
      kmemleak_scan completion before disabling kmemleak_free, and while at it
      fix the comment on stop_scan_thread.
      
      [vinmenon@codeaurora.org: fix stop_scan_thread comment]
        Link: http://lkml.kernel.org/r/1522219972-22809-1-git-send-email-vinmenon@codeaurora.org
      Link: http://lkml.kernel.org/r/1522063429-18992-1-git-send-email-vinmenon@codeaurora.orgSigned-off-by: default avatarVinayak Menon <vinmenon@codeaurora.org>
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28bbb0d9
    • Steven J. Hill's avatar
      mm/vmstat.c: fix vmstat_update() preemption BUG · 08e9dbd5
      Steven J. Hill authored
      [ Upstream commit c7f26ccf ]
      
      Attempting to hotplug CPUs with CONFIG_VM_EVENT_COUNTERS enabled can
      cause vmstat_update() to report a BUG due to preemption not being
      disabled around smp_processor_id().
      
      Discovered on Ubiquiti EdgeRouter Pro with Cavium Octeon II processor.
      
        BUG: using smp_processor_id() in preemptible [00000000] code:
        kworker/1:1/269
        caller is vmstat_update+0x50/0xa0
        CPU: 0 PID: 269 Comm: kworker/1:1 Not tainted
        4.16.0-rc4-Cavium-Octeon-00009-gf83bbd5-dirty #1
        Workqueue: mm_percpu_wq vmstat_update
        Call Trace:
          show_stack+0x94/0x128
          dump_stack+0xa4/0xe0
          check_preemption_disabled+0x118/0x120
          vmstat_update+0x50/0xa0
          process_one_work+0x144/0x348
          worker_thread+0x150/0x4b8
          kthread+0x110/0x140
          ret_from_kernel_thread+0x14/0x1c
      
      Link: http://lkml.kernel.org/r/1520881552-25659-1-git-send-email-steven.hill@cavium.comSigned-off-by: default avatarSteven J. Hill <steven.hill@cavium.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Tejun Heo <htejun@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08e9dbd5