1. 02 Sep, 2017 9 commits
    • Waiman Long's avatar
      locking/spinlock/debug: Remove spinlock lockup detection code · c0c6dff9
      Waiman Long authored
      commit bc88c10d upstream.
      
      The current spinlock lockup detection code can sometimes produce false
      positives because of the unfairness of the locking algorithm itself.
      
      So the lockup detection code is now removed. Instead, we are relying
      on the NMI watchdog to detect potential lockup. We won't have lockup
      detection if the watchdog isn't running.
      
      The commented-out read-write lock lockup detection code are also
      removed.
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1486583208-11038-1-git-send-email-longman@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0c6dff9
    • Dave Martin's avatar
      arm64: fpsimd: Prevent registers leaking across exec · 27e7506c
      Dave Martin authored
      commit 09662210 upstream.
      
      There are some tricky dependencies between the different stages of
      flushing the FPSIMD register state during exec, and these can race
      with context switch in ways that can cause the old task's regs to
      leak across.  In particular, a context switch during the memset() can
      cause some of the task's old FPSIMD registers to reappear.
      
      Disabling preemption for this small window would be no big deal for
      performance: preemption is already disabled for similar scenarios
      like updating the FPSIMD registers in sigreturn.
      
      So, instead of rearranging things in ways that might swap existing
      subtle bugs for new ones, this patch just disables preemption
      around the FPSIMD state flushing so that races of this type can't
      occur here.  This brings fpsimd_flush_thread() into line with other
      code paths.
      
      Fixes: 674c242c ("arm64: flush FP/SIMD state correctly after execve()")
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarDave Martin <Dave.Martin@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27e7506c
    • Arnd Bergmann's avatar
      x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl · 43f776da
      Arnd Bergmann authored
      commit 7206f9bf upstream.
      
      The x86 version of insb/insw/insl uses an inline assembly that does
      not have the target buffer listed as an output. This can confuse
      the compiler, leading it to think that a subsequent access of the
      buffer is uninitialized:
      
        drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
        drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c: In function 'sb1000_rx':
        drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      I tried to mark the exact input buffer as an output here, but couldn't
      figure it out. As suggested by Linus, marking all memory as clobbered
      however is good enough too. For the outs operations, I also add the
      memory clobber, to force the input to be written to local variables.
      This is probably already guaranteed by the "asm volatile", but it can't
      hurt to do this for symmetry.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-5-arnd@arndb.de
      Link: https://lkml.org/lkml/2017/7/12/605Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43f776da
    • Mark Rutland's avatar
      arm64: mm: abort uaccess retries upon fatal signal · 509d8b52
      Mark Rutland authored
      commit 289d07a2 upstream.
      
      When there's a fatal signal pending, arm64's do_page_fault()
      implementation returns 0. The intent is that we'll return to the
      faulting userspace instruction, delivering the signal on the way.
      
      However, if we take a fatal signal during fixing up a uaccess, this
      results in a return to the faulting kernel instruction, which will be
      instantly retried, resulting in the same fault being taken forever. As
      the task never reaches userspace, the signal is not delivered, and the
      task is left unkillable. While the task is stuck in this state, it can
      inhibit the forward progress of the system.
      
      To avoid this, we must ensure that when a fatal signal is pending, we
      apply any necessary fixup for a faulting kernel instruction. Thus we
      will return to an error path, and it is up to that code to make forward
      progress towards delivering the fatal signal.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Reviewed-by: default avatarSteve Capper <steve.capper@arm.com>
      Tested-by: default avatarSteve Capper <steve.capper@arm.com>
      Reviewed-by: default avatarJames Morse <james.morse@arm.com>
      Tested-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      509d8b52
    • Suzuki K Poulose's avatar
      kvm: arm/arm64: Fix race in resetting stage2 PGD · 3e033635
      Suzuki K Poulose authored
      commit 6c0d706b upstream.
      
      In kvm_free_stage2_pgd() we check the stage2 PGD before holding
      the lock and proceed to take the lock if it is valid. And we unmap
      the page tables, followed by releasing the lock. We reset the PGD
      only after dropping this lock, which could cause a race condition
      where another thread waiting on or even holding the lock, could
      potentially see that the PGD is still valid and proceed to perform
      a stage2 operation and later encounter a NULL PGD.
      
      [223090.242280] Unable to handle kernel NULL pointer dereference at
      virtual address 00000040
      [223090.262330] PC is at unmap_stage2_range+0x8c/0x428
      [223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
      [223090.262531] Call trace:
      [223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428
      [223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c
      [223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104
      [223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc
      [223090.262543] [<ffff0000080a2478>]
      kvm_mmu_notifier_invalidate_page+0x50/0x8c
      [223090.262547] [<ffff0000082274f8>]
      __mmu_notifier_invalidate_page+0x5c/0x84
      [223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0
      [223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0
      [223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4
      [223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac
      [223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac
      [223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0
      [223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290
      [223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac
      [223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4
      [223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254
      [223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538
      [223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4
      [223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c
      [223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8
      [223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c
      [223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c
      [223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774
      [223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac
      [223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648
      [223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768
      [223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4
      [223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4
      [223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c
      
      This patch moves the stage2 PGD manipulation under the lock.
      Reported-by: default avatarAlexander Graf <agraf@suse.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e033635
    • Martin Liska's avatar
      gcov: support GCC 7.1 · b8a1532b
      Martin Liska authored
      commit 05384213 upstream.
      
      Starting from GCC 7.1, __gcov_exit is a new symbol expected to be
      implemented in a profiling runtime.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mliska@suse.cz: v2]
        Link: http://lkml.kernel.org/r/e63a3c59-0149-c97e-4084-20ca8f146b26@suse.cz
      Link: http://lkml.kernel.org/r/8c4084fa-3885-29fe-5fc4-0d4ca199c785@suse.czSigned-off-by: default avatarMartin Liska <mliska@suse.cz>
      Acked-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8a1532b
    • Arnd Bergmann's avatar
      staging: wilc1000: simplify vif[i]->ndev accesses · 47974403
      Arnd Bergmann authored
      commit 735bb39c upstream.
      
      With gcc-7, I got a new warning for this driver:
      
      wilc1000/linux_wlan.c: In function 'wilc_netdev_cleanup':
      wilc1000/linux_wlan.c:1224:15: error: 'vif[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      wilc1000/linux_wlan.c:1224:15: error: 'vif[0]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      A closer look at the function reveals that it's more complex than
      it needs to be, given that based on how the device is created
      we always get
      
      	netdev_priv(vif->ndev) == vif
      
      Based on this assumption, I found a few other places in the same file
      that can be simplified. That code appears to be a relic from times
      when the assumption above was not valid.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47974403
    • Arnd Bergmann's avatar
      scsi: isci: avoid array subscript warning · dd758f82
      Arnd Bergmann authored
      commit 5cfa2a3c upstream.
      
      I'm getting a new warning with gcc-7:
      
      isci/remote_node_context.c: In function 'sci_remote_node_context_destruct':
      isci/remote_node_context.c:69:16: error: array subscript is above array bounds [-Werror=array-bounds]
      
      This is odd, since we clearly cover all values for enum
      scis_sds_remote_node_context_states here. Anyway, checking for an array
      overflow can't harm and it makes the warning go away.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd758f82
    • Jiri Slaby's avatar
      p54: memset(0) whole array · f71996c3
      Jiri Slaby authored
      commit 6f175817 upstream.
      
      gcc 7 complains:
      drivers/net/wireless/intersil/p54/fwio.c: In function 'p54_scan':
      drivers/net/wireless/intersil/p54/fwio.c:491:4: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
      
      Fix that by passing the correct size to memset.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Christian Lamparter <chunkeey@googlemail.com>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Acked-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f71996c3
  2. 30 Aug, 2017 31 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.46 · 0eed54bd
      Greg Kroah-Hartman authored
      0eed54bd
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Ensure cpumask update is ordered · 5aa523a9
      Benjamin Herrenschmidt authored
      commit 1a92a80a upstream.
      
      There is no guarantee that the various isync's involved with
      the context switch will order the update of the CPU mask with
      the first TLB entry for the new context being loaded by the HW.
      
      Be safe here and add a memory barrier to order any subsequent
      load/store which may bring entries into the TLB.
      
      The corresponding barrier on the other side already exists as
      pte updates use pte_xchg() which uses __cmpxchg_u64 which has
      a sync after the atomic operation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Add comments in the code]
      [mpe: Backport to 4.12, minor context change]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5aa523a9
    • Lv Zheng's avatar
      ACPI: EC: Fix regression related to wrong ECDT initialization order · 5906715b
      Lv Zheng authored
      commit 98529b92 upstream.
      
      Commit 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle
      EC events) introduced acpi_ec_ecdt_start(), but that function is
      invoked before acpi_ec_query_init(), which is too early.  This causes
      the kernel to crash if an EC event occurs after boot, when ec_query_wq
      is not valid:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
       ...
       Workqueue: events acpi_ec_event_handler
       task: ffff9f539790dac0 task.stack: ffffb437c0e10000
       RIP: 0010:__queue_work+0x32/0x430
      
      Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
      is actually a no-op in the majority of cases.  However, commit
      c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      caused the probing of the DSDT EC as the "boot EC" to be skipped when
      the ECDT EC is valid and uncovered the bug.
      
      Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
      in acpi_ec_init().
      
      Link: https://jira01.devtools.intel.com/browse/LCK-4348
      Fixes: 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
      Fixes: c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      Reported-by: default avatarWang Wendy <wendy.wang@intel.com>
      Tested-by: default avatarFeng Chenzhou <chenzhoux.feng@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5906715b
    • James Morse's avatar
      ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal · 3bc8e4f9
      James Morse authored
      commit 7d64f82c upstream.
      
      When removing a GHES device notified by SCI, list_del_rcu() is used,
      ghes_remove() should call synchronize_rcu() before it goes on to call
      kfree(ghes), otherwise concurrent RCU readers may still hold this list
      entry after it has been freed.
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Fixes: 81e88fdc (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bc8e4f9
    • Joerg Roedel's avatar
      ACPI: ioapic: Clear on-stack resource before using it · 454cac5d
      Joerg Roedel authored
      commit e3d5092b upstream.
      
      The on-stack resource-window 'win' in setup_res() is not
      properly initialized. This causes the pointers in the
      embedded 'struct resource' to contain stale addresses.
      
      These pointers (in my case the ->child pointer) later get
      propagated to the global iomem_resources list, causing a #GP
      exception when the list is traversed in
      iomem_map_sanity_check().
      
      Fixes: c183619b (x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug)
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      454cac5d
    • Dave Jiang's avatar
      ntb: transport shouldn't disable link due to bogus values in SPADs · c1628774
      Dave Jiang authored
      commit f3fd2afe upstream.
      
      It seems that under certain scenarios the SPAD can have bogus values caused
      by an agent (i.e. BIOS or other software) that is not the kernel driver, and
      that causes memory window setup failure. This should not cause the link to
      be disabled because if we do that, the driver will never recover again. We
      have verified in testing that this issue happens and prevents proper link
      recovery.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: 84f76685 ("ntb: stop link work when we do not have memory")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1628774
    • Logan Gunthorpe's avatar
      ntb: ntb_test: ensure the link is up before trying to configure the mws · 4d4f3547
      Logan Gunthorpe authored
      commit 0eb46345 upstream.
      
      After the link tests, there is a race on one side of the test for
      the link coming up. It's possible, in some cases, for the test script
      to write to the 'peer_trans' files before the link has come up.
      
      To fix this, we simply use the link event file to ensure both sides
      see the link as up before continuning.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: a9c59ef7 ("ntb_test: Add a selftest script for the NTB subsystem")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d4f3547
    • Allen Hubbe's avatar
      ntb: no sleep in ntb_async_tx_submit · 7592db55
      Allen Hubbe authored
      commit 88931ec3 upstream.
      
      Do not sleep in ntb_async_tx_submit, which could deadlock.
      This reverts commit "8c874cc1"
      
      Fixes: 8c874cc1 ("NTB: Address out of DMA descriptor issue with NTB")
      Reported-by: default avatarJia-Ju Bai <baijiaju1990@163.com>
      Signed-off-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Acked-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7592db55
    • Logan Gunthorpe's avatar
      NTB: ntb_test: fix bug printing ntb_perf results · bff04a46
      Logan Gunthorpe authored
      commit 07b0b22b upstream.
      
      The code mistakenly prints the local perf results for the remote test
      so the script reports identical results for both directions. Fix this
      by ensuring we print the remote result.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Fixes: a9c59ef7 ("ntb_test: Add a selftest script for the NTB subsystem")
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bff04a46
    • Logan Gunthorpe's avatar
      ntb_transport: fix bug calculating num_qps_mw · 471954c3
      Logan Gunthorpe authored
      commit 8e8496e0 upstream.
      
      A divide by zero error occurs if qp_count is less than mw_count because
      num_qps_mw is calculated to be zero. The calculation appears to be
      incorrect.
      
      The requirement is for num_qps_mw to be set to qp_count / mw_count
      with any remainder divided among the earlier mws.
      
      For example, if mw_count is 5 and qp_count is 12 then mws 0 and 1
      will have 3 qps per window and mws 2 through 4 will have 2 qps per window.
      Thus, when mw_num < qp_count % mw_count, num_qps_mw is 1 higher
      than when mw_num >= qp_count.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Fixes: e26a5843 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      471954c3
    • Logan Gunthorpe's avatar
      ntb_transport: fix qp count bug · 4743d1b3
      Logan Gunthorpe authored
      commit cb827ee6 upstream.
      
      In cases where there are more mw's than spads/2-2, the mw count gets
      reduced to match the limitation. ntb_transport also tries to ensure that
      there are fewer qps than mws but uses the full mw count instead of
      the reduced one. When this happens, the math in
      'ntb_transport_setup_qp_mw' will get confused and result in a kernel
      paging request bug.
      
      This patch fixes the bug by reducing qp_count to the reduced mw count
      instead of the full mw count.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Fixes: e26a5843 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4743d1b3
    • Linus Torvalds's avatar
      Clarify (and fix) MAX_LFS_FILESIZE macros · b8fce382
      Linus Torvalds authored
      commit 0cc3b0ec upstream.
      
      We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
      filesystems (and other IO targets) that know they are 64-bit clean and
      don't have any 32-bit limits in their IO path.
      
      It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
      the VM layer is limited by the page cache to only 32-bit index values,
      but our logic for that was confusing and actually wrong.  We used to
      define that value to
      
      	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
      
      which is actually odd in several ways: it limits the index to 31 bits,
      and then it limits files so that they can't have data in that last byte
      of a page that has the highest 31-bit index (ie page index 0x7fffffff).
      
      Neither of those limitations make sense.  The index is actually the full
      32 bit unsigned value, and we can use that whole full page.  So the
      maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".
      
      However, we do wan tto avoid the maximum index, because we have code
      that iterates over the page indexes, and we don't want that code to
      overflow.  So the maximum size of a file on a 32-bit host should
      actually be one page less than the full 32-bit index.
      
      So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
      not actually be using the page of that last index (ULONG_MAX), but we
      can grow a file up to that limit.
      
      The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
      Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
      volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
      byte less), but the actual true VM limit is one page less than 16TiB.
      
      This was invisible until commit c2a9737f ("vfs,mm: fix a dead loop
      in truncate_inode_pages_range()"), which started applying that
      MAX_LFS_FILESIZE limit to block devices too.
      
      NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
      actually just the offset type itself (loff_t), which is signed.  But for
      clarity, on 64-bit, just use the maximum signed value, and don't make
      people have to count the number of 'f' characters in the hex constant.
      
      So just use LLONG_MAX for the 64-bit case.  That was what the value had
      been before too, just written out as a hex constant.
      
      Fixes: c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      Reported-and-tested-by: default avatarDoug Nazar <nazard@nazar.ca>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8fce382
    • Charles Milette's avatar
      staging: rtl8188eu: add RNX-N150NUB support · ab4be3a6
      Charles Milette authored
      commit f299aec6 upstream.
      
      Add support for USB Device Rosewill RNX-N150NUB.
      VendorID: 0x0bda, ProductID: 0xffef
      Signed-off-by: default avatarCharles Milette <charles.milette@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab4be3a6
    • Srinivas Pandruvada's avatar
      iio: hid-sensor-trigger: Fix the race with user space powering up sensors · 23caaf2f
      Srinivas Pandruvada authored
      commit f1664eaa upstream.
      
      It has been reported for a while that with iio-sensor-proxy service the
      rotation only works after one suspend/resume cycle. This required a wait
      in the systemd unit file to avoid race. I found a Yoga 900 where I could
      reproduce this.
      
      The problem scenerio is:
      - During sensor driver init, enable run time PM and also set a
        auto-suspend for 3 seconds.
      	This result in one runtime resume. But there is a check to avoid
      a powerup in this sequence, but rpm is active
      - User space iio-sensor-proxy tries to power up the sensor. Since rpm is
        active it will simply return. But sensors were not actually
      powered up in the prior sequence, so actaully the sensors will not work
      - After 3 seconds the auto suspend kicks
      
      If we add a wait in systemd service file to fire iio-sensor-proxy after
      3 seconds, then now everything will work as the runtime resume will
      actually powerup the sensor as this is a user request.
      
      To avoid this:
      - Remove the check to match user requested state, this will cause a
        brief powerup, but if the iio-sensor-proxy starts immediately it will
      still work as the sensors are ON.
      - Also move the autosuspend delay to place when user requested turn off
        of sensors, like after user finished raw read or buffer disable
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: default avatarBastien Nocera <hadess@hadess.net>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23caaf2f
    • Dragos Bogdan's avatar
      iio: imu: adis16480: Fix acceleration scale factor for adis16480 · b150ee06
      Dragos Bogdan authored
      commit fdd0d32e upstream.
      
      According to the datasheet, the range of the acceleration is [-10 g, + 10 g],
      so the scale factor should be 10 instead of 5.
      Signed-off-by: default avatarDragos Bogdan <dragos.bogdan@analog.com>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b150ee06
    • Martijn Coenen's avatar
      ANDROID: binder: fix proc->tsk check. · cbd854d9
      Martijn Coenen authored
      commit b2a6d1b9 upstream.
      
      Commit c4ea41ba ("binder: use group leader instead of open thread")'
      was incomplete and didn't update a check in binder_mmap(), causing all
      mmap() calls into the binder driver to fail.
      Signed-off-by: default avatarMartijn Coenen <maco@android.com>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbd854d9
    • Riley Andrews's avatar
      binder: Use wake up hint for synchronous transactions. · 8fb0b0ce
      Riley Andrews authored
      commit 00b40d61 upstream.
      
      Use wake_up_interruptible_sync() to hint to the scheduler binder
      transactions are synchronous wakeups. Disable preemption while waking
      to avoid ping-ponging on the binder lock.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarOmprakash Dhyade <odhyade@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8fb0b0ce
    • Todd Kjos's avatar
      binder: use group leader instead of open thread · 51050750
      Todd Kjos authored
      commit c4ea41ba upstream.
      
      The binder allocator assumes that the thread that
      called binder_open will never die for the lifetime of
      that proc. That thread is normally the group_leader,
      however it may not be. Use the group_leader instead
      of current.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51050750
    • Todd Kjos's avatar
      Revert "android: binder: Sanity check at binder ioctl" · eda70a55
      Todd Kjos authored
      commit a2b18708 upstream.
      
      This reverts commit a906d693.
      
      The patch introduced a race in the binder driver. An attempt to fix the
      race was submitted in "[PATCH v2] android: binder: fix dangling pointer
      comparison", however the conclusion in the discussion for that patch
      was that the original patch should be reverted.
      
      The reversion is being done as part of the fine-grained locking
      patchset since the patch would need to be refactored when
      proc->vmm_vm_mm is removed from struct binder_proc and added
      in the binder allocator.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eda70a55
    • Jeffy Chen's avatar
      Bluetooth: bnep: fix possible might sleep error in bnep_session · 242cea2d
      Jeffy Chen authored
      commit 25717382 upstream.
      
      It looks like bnep_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      242cea2d
    • Jeffy Chen's avatar
      Bluetooth: cmtp: fix possible might sleep error in cmtp_session · ffb7640a
      Jeffy Chen authored
      commit f06d9773 upstream.
      
      It looks like cmtp_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffb7640a
    • Jeffy Chen's avatar
      Bluetooth: hidp: fix possible might sleep error in hidp_session_thread · 1b5fcb3b
      Jeffy Chen authored
      commit 5da8e47d upstream.
      
      It looks like hidp_session_thread has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Tested-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Tested-by: default avatarRohit Vaswani <rvaswani@nvidia.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b5fcb3b
    • Florian Westphal's avatar
      netfilter: nat: fix src map lookup · 5f81b1f5
      Florian Westphal authored
      commit 97772bcd upstream.
      
      When doing initial conversion to rhashtable I replaced the bucket
      walk with a single rhashtable_lookup_fast().
      
      When moving to rhlist I failed to properly walk the list of identical
      tuples, but that is what is needed for this to work correctly.
      The table contains the original tuples, so the reply tuples are all
      distinct.
      
      We currently decide that mapping is (not) in range only based on the
      first entry, but in case its not we need to try the reply tuple of the
      next entry until we either find an in-range mapping or we checked
      all the entries.
      
      This bug makes nat core attempt collision resolution while it might be
      able to use the mapping as-is.
      
      Fixes: 870190a9 ("netfilter: nat: convert nat bysrc hash to rhashtable")
      Reported-by: default avatarJaco Kroon <jaco@uls.co.za>
      Tested-by: default avatarJaco Kroon <jaco@uls.co.za>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f81b1f5
    • Zhang Bo's avatar
      Revert "leds: handle suspend/resume in heartbeat trigger" · 090911a2
      Zhang Bo authored
      commit 436c4c45 upstream.
      
      This reverts commit 5ab92a7c.
      
      System cannot enter suspend mode because of heartbeat led trigger.
      In autosleep_wq, try_to_suspend function will try to enter suspend
      mode in specific period. it will get wakeup_count then call pm_notifier
      chain callback function and freeze processes.
      Heartbeat_pm_notifier is called and it call led_trigger_unregister to
      change the trigger of led device to none. It will send uevent message
      and the wakeup source count changed. As wakeup_count changed, suspend
      will abort.
      
      Fixes: 5ab92a7c ("leds: handle suspend/resume in heartbeat trigger")
      Signed-off-by: default avatarZhang Bo <bo.zhang@nxp.com>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJacek Anaszewski <jacek.anaszewski@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      090911a2
    • Vadim Lomovtsev's avatar
      net: sunrpc: svcsock: fix NULL-pointer exception · d4c5c26c
      Vadim Lomovtsev authored
      commit eebe53e8 upstream.
      
      While running nfs/connectathon tests kernel NULL-pointer exception
      has been observed due to races in svcsock.c.
      
      Race is appear when kernel accepts connection by kernel_accept
      (which creates new socket) and start queuing ingress packets
      to new socket. This happens in ksoftirq context which could run
      concurrently on a different core while new socket setup is not done yet.
      
      The fix is to re-order socket user data init sequence and add
      write/read barrier calls to be sure that we got proper values
      for callback pointers before actually calling them.
      
      Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
      
      Crash log:
      ---<-snip->---
      [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [ 6708.647093] pgd = ffff0000094e0000
      [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
      [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
      [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
      [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
      [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
      [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
      [ 6708.773822] PC is at 0x0
      [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
      [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
      [ 6708.789248] sp : ffff810ffbad3900
      [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
      [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
      [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
      [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
      [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
      [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
      [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
      [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
      [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
      [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
      [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
      [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
      [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
      [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
      [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
      [ 6708.872084]
      [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
      [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
      [ 6708.886075] Call trace:
      [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
      [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
      [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
      [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
      [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
      [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
      [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
      [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
      [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
      [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
      [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
      [ 6708.973117] [<          (null)>]           (null)
      [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
      [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
      [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
      [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
      [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
      [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
      [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
      [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
      [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
      [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
      [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
      [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
      [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
      [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
      [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
      [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
      [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
      [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
      [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
      [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
      [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
      [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
      [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
      [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
      [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
      [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
      [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
      [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
      [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
      [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
      [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
      [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
      [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
      [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
      [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
      [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
      [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
      [ 6709.207967] Code: bad PC value
      [ 6709.211061] SMP: stopping secondary CPUs
      [ 6709.218830] Starting crashdump kernel...
      [ 6709.222749] Bye!
      ---<-snip>---
      Signed-off-by: default avatarVadim Lomovtsev <vlomovts@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4c5c26c
    • Eric Biggers's avatar
      x86/mm: Fix use-after-free of ldt_struct · 3559de45
      Eric Biggers authored
      commit ccd5b323 upstream.
      
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3559de45
    • Nicholas Piggin's avatar
      timers: Fix excessive granularity of new timers after a nohz idle · 70b3fd5c
      Nicholas Piggin authored
      commit 2fe59f50 upstream.
      
      When a timer base is idle, it is forwarded when a new timer is added
      to ensure that granularity does not become excessive. When not idle,
      the timer tick is expected to increment the base.
      
      However there are several problems:
      
      - If an existing timer is modified, the base is forwarded only after
        the index is calculated.
      
      - The base is not forwarded by add_timer_on.
      
      - There is a window after a timer is restarted from a nohz idle, after
        it is marked not-idle and before the timer tick on this CPU, where a
        timer may be added but the ancient base does not get forwarded.
      
      These result in excessive granularity (a 1 jiffy timeout can blow out
      to 100s of jiffies), which cause the rcu lockup detector to trigger,
      among other things.
      
      Fix this by keeping track of whether the timer base has been idle
      since it was last run or forwarded, and if so then forward it before
      adding a new timer.
      
      There is still a case where mod_timer optimises the case of a pending
      timer mod with the same expiry time, where the timer can see excessive
      granularity relative to the new, shorter interval. A comment is added,
      but it's not changed because it is an important fastpath for
      networking.
      
      This has been tested and found to fix the RCU softlockup messages.
      
      Testing was also done with tracing to measure requested versus
      achieved wakeup latencies for all non-deferrable timers in an idle
      system (with no lockup watchdogs running). Wakeup latency relative to
      absolute latency is calculated (note this suffers from round-up skew
      at low absolute times) and analysed:
      
                   max     avg      std
      upstream   506.0    1.20     4.68
      patched      2.0    1.08     0.15
      
      The bug was noticed due to the lockup detector Kconfig changes
      dropping it out of people's .configs and resulting in larger base
      clk skew When the lockup detectors are enabled, no CPU can go idle for
      longer than 4 seconds, which limits the granularity errors.
      Sub-optimal timer behaviour is observable on a smaller scale in that
      case:
      
      	     max     avg      std
      upstream     9.0    1.05     0.19
      patched      2.0    1.04     0.11
      
      Fixes: Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: dzickus@redhat.com
      Cc: sfr@canb.auug.org.au
      Cc: mpe@ellerman.id.au
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: linuxarm@huawei.com
      Cc: abdhalee@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: akpm@linux-foundation.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70b3fd5c
    • Thomas Gleixner's avatar
      perf/x86/intel/rapl: Make package handling more robust · 3df3b2ef
      Thomas Gleixner authored
      commit dd86e373 upstream.
      
      The package management code in RAPL relies on package mapping being
      available before a CPU is started. This changed with:
      
        9d85eb91 ("x86/smpboot: Make logical package management more robust")
      
      because the ACPI/BIOS information turned out to be unreliable, but that
      left RAPL in broken state. This was not noticed because on a regular boot
      all CPUs are online before RAPL is initialized.
      
      A possible fix would be to reintroduce the mess which allocates a package
      data structure in CPU prepare and when it turns out to already exist in
      starting throw it away later in the CPU online callback. But that's a
      horrible hack and not required at all because RAPL becomes functional for
      perf only in the CPU online callback. That's correct because user space is
      not yet informed about the CPU being onlined, so nothing caan rely on RAPL
      being available on that particular CPU.
      
      Move the allocation to the CPU online callback and simplify the hotplug
      handling. At this point the package mapping is established and correct.
      
      This also adds a missing check for available package data in the
      event_init() function.
      Reported-by: default avatarYasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
      Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ jwang: backport to 4.9 fix Null pointer deref during hotplug cpu.]
      Signed-off-by: default avatarJack Wang <jinpu.wang@profitbricks.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3df3b2ef
    • Masami Hiramatsu's avatar
      perf probe: Fix --funcs to show correct symbols for offline module · bac83e5c
      Masami Hiramatsu authored
      commit eebc509b upstream.
      
      Fix --funcs (-F) option to show correct symbols for offline module.
      Since previous perf-probe uses machine__findnew_module_map() for offline
      module, even if user passes a module file (with full path) which is for
      other architecture, perf-probe always tries to load symbol map for
      current kernel module.
      
      This fix uses dso__new_map() to load the map from given binary as same
      as a map for user applications.
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devboxSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bac83e5c
    • Mark Rutland's avatar
      perf/core: Fix group {cpu,task} validation · bde6608d
      Mark Rutland authored
      commit 64aee2a9 upstream.
      
      Regardless of which events form a group, it does not make sense for the
      events to target different tasks and/or CPUs, as this leaves the group
      inconsistent and impossible to schedule. The core perf code assumes that
      these are consistent across (successfully intialised) groups.
      
      Core perf code only verifies this when moving SW events into a HW
      context. Thus, we can violate this requirement for pure SW groups and
      pure HW groups, unless the relevant PMU driver happens to perform this
      verification itself. These mismatched groups subsequently wreak havoc
      elsewhere.
      
      For example, we handle watchpoints as SW events, and reserve watchpoint
      HW on a per-CPU basis at pmu::event_init() time to ensure that any event
      that is initialised is guaranteed to have a slot at pmu::add() time.
      However, the core code only checks the group leader's cpu filter (via
      event_filter_match()), and can thus install follower events onto CPUs
      violating thier (mismatched) CPU filters, potentially installing them
      into a CPU without sufficient reserved slots.
      
      This can be triggered with the below test case, resulting in warnings
      from arch backends.
      
        #define _GNU_SOURCE
        #include <linux/hw_breakpoint.h>
        #include <linux/perf_event.h>
        #include <sched.h>
        #include <stdio.h>
        #include <sys/prctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
      
        static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
      			   int group_fd, unsigned long flags)
        {
      	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
        }
      
        char watched_char;
      
        struct perf_event_attr wp_attr = {
      	.type = PERF_TYPE_BREAKPOINT,
      	.bp_type = HW_BREAKPOINT_RW,
      	.bp_addr = (unsigned long)&watched_char,
      	.bp_len = 1,
      	.size = sizeof(wp_attr),
        };
      
        int main(int argc, char *argv[])
        {
      	int leader, ret;
      	cpu_set_t cpus;
      
      	/*
      	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
      	 */
      	CPU_ZERO(&cpus);
      	CPU_SET(0, &cpus);
      	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
      	if (ret) {
      		printf("Unable to set cpu affinity\n");
      		return 1;
      	}
      
      	/* open leader event, bound to this task, CPU0 only */
      	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	if (leader < 0) {
      		printf("Couldn't open leader: %d\n", leader);
      		return 1;
      	}
      
      	/*
      	 * Open a follower event that is bound to the same task, but a
      	 * different CPU. This means that the group should never be possible to
      	 * schedule.
      	 */
      	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
      	if (ret < 0) {
      		printf("Couldn't open mismatched follower: %d\n", ret);
      		return 1;
      	} else {
      		printf("Opened leader/follower with mismastched CPUs\n");
      	}
      
      	/*
      	 * Open as many independent events as we can, all bound to the same
      	 * task, CPU0 only.
      	 */
      	do {
      		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	} while (ret >= 0);
      
      	/*
      	 * Force enable/disble all events to trigger the erronoeous
      	 * installation of the follower event.
      	 */
      	printf("Opened all events. Toggling..\n");
      	for (;;) {
      		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
      		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
      	}
      
      	return 0;
        }
      
      Fix this by validating this requirement regardless of whether we're
      moving events.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhou Chengming <zhouchengming1@huawei.com>
      Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bde6608d
    • Steven Rostedt (VMware)'s avatar
      ftrace: Check for null ret_stack on profile function graph entry function · 741397d1
      Steven Rostedt (VMware) authored
      commit a8f0f9e4 upstream.
      
      There's a small race when function graph shutsdown and the calling of the
      registered function graph entry callback. The callback must not reference
      the task's ret_stack without first checking that it is not NULL. Note, when
      a ret_stack is allocated for a task, it stays allocated until the task exits.
      The problem here, is that function_graph is shutdown, and a new task was
      created, which doesn't have its ret_stack allocated. But since some of the
      functions are still being traced, the callbacks can still be called.
      
      The normal function_graph code handles this, but starting with commit
      8861dd30 ("ftrace: Access ret_stack->subtime only in the function
      profiler") the profiler code references the ret_stack on function entry, but
      doesn't check if it is NULL first.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611
      
      Fixes: 8861dd30 ("ftrace: Access ret_stack->subtime only in the function profiler")
      Reported-by: lilydjwg@gmail.com
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      741397d1