1. 26 Jul, 2019 3 commits
  2. 25 Jul, 2019 5 commits
    • Merge tag 'pm-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6789f873
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki
       "These fix two issues related to the RAPL MMIO interface support added
        recently and one cpufreq driver issue.
      
        Specifics:
      
         - Initialize the power capping subsystem and the RAPL driver earlier
           in case the int340X thermal driver is built-in and attempts to
           register an MMIO interface for RAPL, which must not happen before
           the requisite infrastructure is ready (Zhang Rui)
      
         - Fix the int340X thermal driver's RAPL MMIO interface registration
           error path (Rafael Wysocki)
      
         - Fix possible use-after-free in the pasemi cpufreq driver (Wen
           Yang)"
      
      * tag 'pm-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
        int340X/processor_thermal_device: Fix proc_thermal_rapl_remove()
        powercap: Invoke powercap_init() and rapl_init() earlier
    • Merge tag 'riscv/for-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · a51edf75
      Linus Torvalds authored
      Pull RISC-V updates from Paul Walmsley:
       "Four minor RISC-V-related changes:
      
         - Add support for the new clone3 syscall for RV64, relying on the
           generic support
      
         - Add DT data for the gigabit Ethernet controller on the SiFive FU540
           and the HiFive Unleashed board
      
         - Update MAINTAINERS to add me to the arch/riscv maintainers' list
      
         - Add support for PCIe message-signaled interrupts by reusing the
           generic header file"
      
      * tag 'riscv/for-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver
        riscv: include generic support for MSI irqdomains
        MAINTAINERS: Add Paul as a RISC-V maintainer
        riscv: enable sys_clone3 syscall for rv64
    • Merge tag 'ktest-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · da3cc2e6
      Linus Torvalds authored
      Pull ktest fixlets from Steven Rostedt:
       "This contains only simple spelling fixes"
      
      * tag 'ktest-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Fix some typos in config-bisect.pl
    • Merge branch 'access-creds' · a29a0a46
      Linus Torvalds authored
      The access() (and faccessat()) credentials change can cause an
      unnecessary load on the RCU machinery because every access() call ends
      up freeing the temporary access credential using RCU.
      
      This isn't really noticeable on small machines, but if you have hundreds
      of cores you can cause huge slowdowns due to RCU storms.
      
      It's easy to avoid: the temporary access credentials aren't actually
      normally accessed using RCU at all, so we can avoid the whole issue by
      just marking them as such.
      
      * access-creds:
        access: avoid the RCU grace period for the temporary subjective credentials
    • Merge branch 'pm-cpufreq' · fdc75701
      Rafael J. Wysocki authored
      * pm-cpufreq:
        cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
  3. 24 Jul, 2019 7 commits
    • aecea57f
      Masanari Iida authored
    • access: avoid the RCU grace period for the temporary subjective credentials · d7852fbd
      Linus Torvalds authored
      It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
      work because it installs a temporary credential that gets allocated and
      freed for each system call.
      
      The allocation and freeing overhead is mostly benign, but because
      credentials can be accessed under the RCU read lock, the freeing
      involves an RCU grace period.
      
      Which is not a huge deal normally, but if you have a lot of access()
      calls, this causes a fair amount of secondary damage: instead of having a
      nice alloc/free pattern that hits in hot per-CPU slab caches, you have
      all those delayed frees, and on big machines with hundreds of cores,
      the RCU overhead can end up being enormous.
      
      But it turns out that all of this is entirely unnecessary.  Exactly
      because access() only installs the credential as the thread-local
      subjective credential, the temporary cred pointer doesn't actually need
      to be RCU-freed at all.  Once we're done using it, we can just free it
      synchronously and avoid all the RCU overhead.
      
      So add a 'non_rcu' flag to 'struct cred', which can be set by users that
      know they only use it in non-RCU context (there are other potential
      users for this).  We can make it a union with the rcu freeing list head
      that we need for the RCU case, so this doesn't need any extra storage.
      
      Note that this also makes 'get_current_cred()' clear the new non_rcu
      flag, in case we have filesystems that take a long-term reference to the
      cred and then expect the RCU delayed freeing afterwards.  It's not
      entirely clear that this is required, but it makes for clear semantics:
      the subjective cred remains non-RCU as long as you only access it
      synchronously using the thread-local accessors, but you _can_ use it as
      a generic cred if you want to.
      
      It is possible that we should just remove the whole RCU markings for
      ->cred entirely.  Only ->real_cred is really supposed to be accessed
      through RCU, and the long-term cred copies that nfs uses might want to
      explicitly re-enable RCU freeing if required, rather than have
      get_current_cred() do it implicitly.
      
      But this is a "minimal semantic changes" change for the immediate
      problem.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Glauber <jglauber@marvell.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · bed38c3e
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "An assortment of non-regression fixes that have accumulated since the
        start of the merge window.
      
         - A fix for a user triggerable oops on machines where transactional
           memory is disabled, e.g. Power9 bare metal, Power8 with TM disabled
           on the command line, or all Power7 or earlier machines.
      
         - Three fixes for handling of PMU and power saving registers when
           running nested KVM on Power9.
      
         - Two fixes for bugs found while stress testing the XIVE interrupt
           controller code, also on Power9.
      
         - A fix to allow guests to boot under Qemu/KVM on Power9 using
           the Hash MMU with >= 1TB of memory.
      
         - Two fixes for bugs in the recent DMA cleanup, one of which could
           lead to checkstops.
      
         - And finally three fixes for the PAPR SCM nvdimm driver.
      
        Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater,
        Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling,
        Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar
        Singh, Vaibhav Jain"
      
      * tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
        powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
        powerpc/pseries: Update SCM hcall op-codes in hvcall.h
        powerpc/tm: Fix oops on sigreturn on systems without TM
        powerpc/dma: Fix invalid DMA mmap behavior
        KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails
        powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
        powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA
        KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
        powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR
        KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
        powerpc/mm: Limit rma_size to 1TB when running without HV mode
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 76260774
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Bugfixes, a pvspinlock optimization, and documentation moving"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
        Documentation: move Documentation/virtual to Documentation/virt
        KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
        KVM: X86: Dynamically allocate user_fpu
        KVM: X86: Fix fpu state crash in kvm guest
        Revert "kvm: x86: Use task structs fpu field for user"
        KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
    • Merge tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping · c2626876
      Linus Torvalds authored
      Pull dma-mapping regression fix from Christoph Hellwig:
       "Ensure that dma_addressing_limited doesn't crash on devices without a
        dma mask (Eric Auger)"
      
      * tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: use dma_get_mask in dma_addressing_limited
    • KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption · 266e85a5
      Wanpeng Li authored
      Commit 11752adb ("locking/pvqspinlock: Implement hybrid PV queued/unfair locks")
      introduced hybrid PV queued/unfair locks:
       - queued mode (no starvation)
       - unfair mode (good performance on a lock that is not heavily contended)
      The lock waiter goes into unfair mode especially in VMs with
      over-committed vCPUs, since increasing over-commitment increases the
      likelihood that the queue head vCPU has been preempted and is not
      actively spinning.

      However, rescheduling the queue head vCPU promptly so that it can
      acquire the lock can still perform better than relying only on lock
      stealing in over-subscribed scenarios.
      
      Testing on an 80-HT, 2-socket Xeon Skylake server, with an 80-vCPU VM and 80GB RAM:
      ebizzy -M
                   vanilla     boosting    improved
       1VM          23520        25040         6%
       2VM           8000        13600        70%
       3VM           3100         5400        74%
      
      On unlock, the lock holder vCPU yields to the queue head vCPU. This
      boosts a queue head vCPU that was either involuntarily preempted or
      voluntarily halted after failing to acquire the lock within a short
      spin in the guest.
      
      Cc: Waiman Long <longman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Documentation: move Documentation/virtual to Documentation/virt · 2f5947df
      Christoph Hellwig authored
      Renaming docs seems to be en vogue at the moment, so fix one of the
      grossly misnamed directories.  We practically never use "virtual" as
      a shorthand for virtualization in the kernel, but always virt,
      as seen in the virt/ top-level directory.  Fix up the documentation
      to match that.
      
      Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 23 Jul, 2019 6 commits
  5. 22 Jul, 2019 19 commits
    • riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver · 26091eef
      Yash Shah authored
      Add a DT node for the SiFive FU540-C000 GEMGXL Ethernet controller driver.
      Signed-off-by: Yash Shah <yash.shah@sifive.com>
      Reviewed-by: Sagar Kadam <sagar.kadam@sifive.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      [paul.walmsley@sifive.com: changed "phy1" to "phy0" at Andrew Lunn's
       suggestion]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • riscv: include generic support for MSI irqdomains · 251a4488
      Wesley Terpstra authored
      Some RISC-V systems include PCIe host controllers that support PCIe
      message-signaled interrupts.  For this to work on Linux, we need to
      enable PCI_MSI_IRQ_DOMAIN and define struct msi_alloc_info.  Support
      for the latter is enabled by including the architecture-generic msi.h
      header.
      Signed-off-by: Wesley Terpstra <wesley@sifive.com>
      [paul.walmsley@sifive.com: split initial patch into one arch/riscv
       patch and one drivers/pci patch]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • MAINTAINERS: Add Paul as a RISC-V maintainer · f4da5d07
      Palmer Dabbelt authored
      The RISC-V port has grown significantly over the past year.  Paul's been
      helping out for a while.  We agreed in person that he'd take over
      collecting the patches and submitting the PRs, but it looks like I
      forgot to make it official.
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b5cf701
      Linus Torvalds authored
      Pull preemption Kconfig fix from Thomas Gleixner:
       "The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
        PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
        and PREEMPT_RT.
      
        Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
        obviously can't work anymore. oldconfig builds are affected as well,
        but it's more obvious as the user gets asked. [old]defconfig silently
        fixes it up and selects PREEMPT_NONE.
      
        Unbreak it by undoing the rename and adding an intermediate config
        symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
        chasing down a few #ifdefs, but it's better than tweaking 114
        defconfigs and annoying users"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
    • Merge tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 44b912cd
      Linus Torvalds authored
      Pull pidfd polling fix from Christian Brauner:
       "A fix for pidfd polling. It ensures that the task's exit state is
        visible to all waiters"
      
      * tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        pidfd: fix a poll race when setting exit_state
    • Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 21c730d7
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - fixes for leaks caused by recently merged patches
      
       - one build fix
      
       - a fix to prevent mixing of incompatible features
      
      * tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: don't leak extent_map in btrfs_get_io_geometry()
        btrfs: free checksum hash on in close_ctree
        btrfs: Fix build error while LIBCRC32C is module
        btrfs: inode: Don't compress if NODATASUM or NODATACOW set
    • sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y · b8d33498
      Thomas Gleixner authored
      The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
      CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
      set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
      the preemption mode choice which defaults to NONE. This also affects
      oldconfig builds.
      
      So rather than changing 114 defconfig files and being an annoyance to
      users, revert the rename and select a new config symbol PREEMPTION. That
      keeps everything working smoothly and the relevant #ifdefs are going to be
      fixed up step by step.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: a50a3f4b ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Merge tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · c92f0380
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "For two regressions in media core:
      
         - v4l2-subdev: fix regression in check_pad()
      
         - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
           in use"
      
      * tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
        media: v4l2-subdev: fix regression in check_pad()
    • iommu/vt-d: Print pasid table entries MSB to LSB in debugfs · 7f6cade5
      Sai Praneeth Prakhya authored
      Commit dd5142ca ("iommu/vt-d: Add debugfs support to show scalable mode
      DMAR table internals") prints the content of PASID table entries from
      LSB to MSB, whereas other entries are printed MSB to LSB.  To keep all
      entries uniform and avoid confusing the user, print the MSB first.
      
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: Sohil Mehta <sohil.mehta@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Fixes: dd5142ca ("iommu/vt-d: Add debugfs support to show scalable mode DMAR table internals")
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/iova: Remove stale cached32_node · 9eed17d3
      Chris Wilson authored
      Since the cached32_node is allowed to be advanced above dma_32bit_pfn
      (to provide a shortcut into the limited range), we need to be careful to
      remove the to-be-freed node from the cache if it is the cached32_node.
      
      [   48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
      [   48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37
      [   48.477843]
      [   48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G     U            5.2.0+ #735
      [   48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
      [   48.478047] Workqueue: i915 __i915_gem_free_work [i915]
      [   48.478075] Call Trace:
      [   48.478111]  dump_stack+0x5b/0x90
      [   48.478137]  print_address_description+0x67/0x237
      [   48.478178]  ? __cached_rbnode_delete_update+0x68/0x110
      [   48.478212]  __kasan_report.cold.3+0x1c/0x38
      [   48.478240]  ? __cached_rbnode_delete_update+0x68/0x110
      [   48.478280]  ? __cached_rbnode_delete_update+0x68/0x110
      [   48.478308]  __cached_rbnode_delete_update+0x68/0x110
      [   48.478344]  private_free_iova+0x2b/0x60
      [   48.478378]  iova_magazine_free_pfns+0x46/0xa0
      [   48.478403]  free_iova_fast+0x277/0x340
      [   48.478443]  fq_ring_free+0x15a/0x1a0
      [   48.478473]  queue_iova+0x19c/0x1f0
      [   48.478597]  cleanup_page_dma.isra.64+0x62/0xb0 [i915]
      [   48.478712]  __gen8_ppgtt_cleanup+0x63/0x80 [i915]
      [   48.478826]  __gen8_ppgtt_cleanup+0x42/0x80 [i915]
      [   48.478940]  __gen8_ppgtt_clear+0x433/0x4b0 [i915]
      [   48.479053]  __gen8_ppgtt_clear+0x462/0x4b0 [i915]
      [   48.479081]  ? __sg_free_table+0x9e/0xf0
      [   48.479116]  ? kfree+0x7f/0x150
      [   48.479234]  i915_vma_unbind+0x1e2/0x240 [i915]
      [   48.479352]  i915_vma_destroy+0x3a/0x280 [i915]
      [   48.479465]  __i915_gem_free_objects+0xf0/0x2d0 [i915]
      [   48.479579]  __i915_gem_free_work+0x41/0xa0 [i915]
      [   48.479607]  process_one_work+0x495/0x710
      [   48.479642]  worker_thread+0x4c7/0x6f0
      [   48.479687]  ? process_one_work+0x710/0x710
      [   48.479724]  kthread+0x1b2/0x1d0
      [   48.479774]  ? kthread_create_worker_on_cpu+0xa0/0xa0
      [   48.479820]  ret_from_fork+0x1f/0x30
      [   48.479864]
      [   48.479907] Allocated by task 631:
      [   48.479944]  save_stack+0x19/0x80
      [   48.479994]  __kasan_kmalloc.constprop.6+0xc1/0xd0
      [   48.480038]  kmem_cache_alloc+0x91/0xf0
      [   48.480082]  alloc_iova+0x2b/0x1e0
      [   48.480125]  alloc_iova_fast+0x58/0x376
      [   48.480166]  intel_alloc_iova+0x90/0xc0
      [   48.480214]  intel_map_sg+0xde/0x1f0
      [   48.480343]  i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
      [   48.480465]  huge_get_pages+0x232/0x2b0 [i915]
      [   48.480590]  ____i915_gem_object_get_pages+0x40/0xb0 [i915]
      [   48.480712]  __i915_gem_object_get_pages+0x90/0xa0 [i915]
      [   48.480834]  i915_gem_object_prepare_write+0x2d6/0x330 [i915]
      [   48.480955]  create_test_object.isra.54+0x1a9/0x3e0 [i915]
      [   48.481075]  igt_shared_ctx_exec+0x365/0x3c0 [i915]
      [   48.481210]  __i915_subtests.cold.4+0x30/0x92 [i915]
      [   48.481341]  __run_selftests.cold.3+0xa9/0x119 [i915]
      [   48.481466]  i915_live_selftests+0x3c/0x70 [i915]
      [   48.481583]  i915_pci_probe+0xe7/0x220 [i915]
      [   48.481620]  pci_device_probe+0xe0/0x180
      [   48.481665]  really_probe+0x163/0x4e0
      [   48.481710]  device_driver_attach+0x85/0x90
      [   48.481750]  __driver_attach+0xa5/0x180
      [   48.481796]  bus_for_each_dev+0xda/0x130
      [   48.481831]  bus_add_driver+0x205/0x2e0
      [   48.481882]  driver_register+0xca/0x140
      [   48.481927]  do_one_initcall+0x6c/0x1af
      [   48.481970]  do_init_module+0x106/0x350
      [   48.482010]  load_module+0x3d2c/0x3ea0
      [   48.482058]  __do_sys_finit_module+0x110/0x180
      [   48.482102]  do_syscall_64+0x62/0x1f0
      [   48.482147]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   48.482190]
      [   48.482224] Freed by task 37:
      [   48.482273]  save_stack+0x19/0x80
      [   48.482318]  __kasan_slab_free+0x12e/0x180
      [   48.482363]  kmem_cache_free+0x70/0x140
      [   48.482406]  __free_iova+0x1d/0x30
      [   48.482445]  fq_ring_free+0x15a/0x1a0
      [   48.482490]  queue_iova+0x19c/0x1f0
      [   48.482624]  cleanup_page_dma.isra.64+0x62/0xb0 [i915]
      [   48.482749]  __gen8_ppgtt_cleanup+0x63/0x80 [i915]
      [   48.482873]  __gen8_ppgtt_cleanup+0x42/0x80 [i915]
      [   48.482999]  __gen8_ppgtt_clear+0x433/0x4b0 [i915]
      [   48.483123]  __gen8_ppgtt_clear+0x462/0x4b0 [i915]
      [   48.483250]  i915_vma_unbind+0x1e2/0x240 [i915]
      [   48.483378]  i915_vma_destroy+0x3a/0x280 [i915]
      [   48.483500]  __i915_gem_free_objects+0xf0/0x2d0 [i915]
      [   48.483622]  __i915_gem_free_work+0x41/0xa0 [i915]
      [   48.483659]  process_one_work+0x495/0x710
      [   48.483704]  worker_thread+0x4c7/0x6f0
      [   48.483748]  kthread+0x1b2/0x1d0
      [   48.483787]  ret_from_fork+0x1f/0x30
      [   48.483831]
      [   48.483868] The buggy address belongs to the object at ffff88870fc19000
      [   48.483868]  which belongs to the cache iommu_iova of size 40
      [   48.483920] The buggy address is located 32 bytes inside of
      [   48.483920]  40-byte region [ffff88870fc19000, ffff88870fc19028)
      [   48.483964] The buggy address belongs to the page:
      [   48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0
      [   48.484045] flags: 0x8000000000010200(slab|head)
      [   48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
      [   48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
      [   48.484188] page dumped because: kasan: bad access detected
      [   48.484230]
      [   48.484265] Memory state around the buggy address:
      [   48.484314]  ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   48.484361]  ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
      [   48.484451]                                ^
      [   48.484494]  ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   48.484530]  ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
      Fixes: e60aa7b5 ("iommu/iova: Extend rbtree node caching")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: <stable@vger.kernel.org> # v4.15+
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 83768245
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Several netfilter fixes including an nfnetlink deadlock fix from
          Florian Westphal and fix for dropping VRF packets from Miaohe Lin.
      
       2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
          proper block sharing.
      
       3) Fix r8169 PHY init from Thomas Voegtle.
      
       4) Fix memory leak in mac80211, from Lorenzo Bianconi.
      
       5) Missing NULL check on object allocation in cxgb4, from Navid
          Emamdoost.
      
       6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
      
       7) Check that there is actually an ip header to access in skb->data in
          VRF, from Peter Kosyh.
      
       8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
      
       9) One more tweak to the TCP fragmentation memory limit changes, to be
          less harmful to applications setting small SO_SNDBUF values. From
          Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
        tcp: be more careful in tcp_fragment()
        hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
        vrf: make sure skb->data contains ip header to make routing
        connector: remove redundant input callback from cn_dev
        qed: Prefer pcie_capability_read_word()
        igc: Prefer pcie_capability_read_word()
        cxgb4: Prefer pcie_capability_read_word()
        be2net: Synchronize be_update_queues with dev_watchdog
        bnx2x: Prevent load reordering in tx completion processing
        net: phy: sfp: hwmon: Fix scaling of RX power
        net: sched: verify that q!=NULL before setting q->flags
        chelsio: Fix a typo in a function name
        allocate_flower_entry: should check for null deref
        net: hns3: typo in the name of a constant
        kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
        tipc: Fix a typo
        mac80211: don't warn about CW params when not using them
        mac80211: fix possible memory leak in ieee80211_assign_beacon
        nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
        nl80211: fix VENDOR_CMD_RAW_DATA
        ...
    • iommu/vt-d: Check if domain->pgd was allocated · 3ee9eca7
      Dmitry Safonov authored
      There are a couple of places where domain_exit() is called on
      domain_init() failure, and currently domain_init() can fail only if
      alloc_pgtable_page() has failed.

      Make domain_exit() check whether domain->pgd is present before calling
      domain_unmap(), as otherwise it would theoretically crash while clearing
      PTE entries in dma_pte_clear_level().
      
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: iommu@lists.linux-foundation.org
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/vt-d: Don't queue_iova() if there is no flush queue · effa4678
      Dmitry Safonov authored
      The Intel VT-d driver was reworked to use the common deferred flushing
      implementation. Previously there was one global per-CPU flush queue;
      afterwards, there is one per domain.

      Before deferring a flush, the queue should be allocated and initialized.

      Currently only domains of type IOMMU_DOMAIN_DMA initialize their flush
      queue. It's probably worth initializing it for static or unmanaged
      domains too, but that may be arguable; I'm leaving it to the iommu folks.

      Prevent queuing an iova flush if the domain doesn't have a queue.
      The defensive check seems worth keeping even if the queue were
      initialized for all kinds of domains, and it is easy to backport.
      
      On the 4.19.43 stable kernel it has a user-visible effect: devices in
      the si domain previously crashed.  On SATA devices:
      
       BUG: spinlock bad magic on CPU#6, swapper/0/1
        lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x45/0x115
        intel_unmap+0x107/0x113
        intel_unmap_sg+0x6b/0x76
        __ata_qc_complete+0x7f/0x103
        ata_qc_complete+0x9b/0x26a
        ata_qc_complete_multiple+0xd0/0xe3
        ahci_handle_port_interrupt+0x3ee/0x48a
        ahci_handle_port_intr+0x73/0xa9
        ahci_single_level_irq_intr+0x40/0x60
        __handle_irq_event_percpu+0x7f/0x19a
        handle_irq_event_percpu+0x32/0x72
        handle_irq_event+0x38/0x56
        handle_edge_irq+0x102/0x121
        handle_irq+0x147/0x15c
        do_IRQ+0x66/0xf2
        common_interrupt+0xf/0xf
       RIP: 0010:__do_softirq+0x8c/0x2df
      
      The same happens for USB devices that use ehci-pci:
       BUG: spinlock bad magic on CPU#0, swapper/0/1
        lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
       Call Trace:
        <IRQ>
        dump_stack+0x61/0x7e
        spin_bug+0x9d/0xa3
        do_raw_spin_lock+0x22/0x8e
        _raw_spin_lock_irqsave+0x32/0x3a
        queue_iova+0x77/0x145
        intel_unmap+0x107/0x113
        intel_unmap_page+0xe/0x10
        usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
        usb_hcd_unmap_urb_for_dma+0x17/0x100
        unmap_urb_for_dma+0x22/0x24
        __usb_hcd_giveback_urb+0x51/0xc3
        usb_giveback_urb_bh+0x97/0xde
        tasklet_action_common.isra.4+0x5f/0xa1
        tasklet_action+0x2d/0x30
        __do_softirq+0x138/0x2df
        irq_exit+0x7d/0x8b
        smp_apic_timer_interrupt+0x10f/0x151
        apic_timer_interrupt+0xf/0x20
        </IRQ>
       RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
      
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Lu Baolu <baolu.lu@linux.intel.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: <stable@vger.kernel.org> # 4.14+
      Fixes: 13cf0174 ("iommu/vt-d: Make use of iova deferred flushing")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      effa4678
    • iommu/vt-d: Avoid duplicated pci dma alias consideration · 55752949
      Lu Baolu authored
      As we have abandoned the home-made lazy domain allocation
      and delegated the DMA domain life cycle to the default
      domain mechanism defined in the generic iommu layer, we
      no longer need to consider PCI aliases when mapping/unmapping
      the context entries. Without this fix, we see a kernel NULL
      pointer dereference during PCI device hot-plug tests.
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Fixes: fa954e68 ("iommu/vt-d: Delegate the dma domain to upper layer")
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Reported-and-tested-by: Xu Pengfei <pengfei.xu@intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      55752949
    • Revert "iommu/vt-d: Consolidate domain_init() to avoid duplication" · 301e7ee1
      Joerg Roedel authored
      This reverts commit 123b2ffc.
      
      This commit reportedly caused boot failures on some systems
      and needs to be reverted for now.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      301e7ee1
    • pidfd: fix a poll race when setting exit_state · b191d649
      Suren Baghdasaryan authored
      There is a race between reading task->exit_state in pidfd_poll and
      writing it after do_notify_parent calls do_notify_pidfd. The expected
      sequence of events is:
      
      CPU 0                            CPU 1
      ------------------------------------------------
      exit_notify
        do_notify_parent
          do_notify_pidfd
        tsk->exit_state = EXIT_DEAD
                                        pidfd_poll
                                           if (tsk->exit_state)
      
      However nothing prevents the following sequence:
      
      CPU 0                            CPU 1
      ------------------------------------------------
      exit_notify
        do_notify_parent
          do_notify_pidfd
                                         pidfd_poll
                                            if (tsk->exit_state)
        tsk->exit_state = EXIT_DEAD
      
      This causes a polling task to wait forever: poll blocks because
      exit_state is still 0, and the waiting task is never notified again.
      A stress test that continuously polls pidfds while processes exit
      uncovered this bug.
      
      To fix it, we make sure that the task's exit_state is always set before
      calling do_notify_pidfd.
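
To make the ordering argument concrete, the C model below replays both writer orderings with the poll landing at each possible point. It is a simplified, hypothetical model: struct task_model, pidfd_check(), notify_pidfd() and replay() are illustrative names, not kernel functions, and it assumes that a woken poller re-checks exit_state immediately and that no wakeups arrive after the single notification.

```c
#include <assert.h>
#include <stdbool.h>

#define EXIT_DEAD_MODEL 16	/* stand-in value; only "nonzero" matters */

/* Hypothetical model of the state shared by the exiting task and a poller. */
struct task_model {
	int exit_state;		/* 0 until the exiting task publishes its state */
	bool waiter_queued;	/* poller is asleep on the pidfd wait queue */
	bool reported;		/* poller has observed a nonzero exit_state */
};

enum writer_step { STEP_NOTIFY, STEP_SET_STATE };

/* Analogue of the check in pidfd_poll(): report, or go to sleep. */
static void pidfd_check(struct task_model *t)
{
	if (t->exit_state) {
		t->reported = true;
		t->waiter_queued = false;
	} else {
		t->waiter_queued = true;	/* sleep until the next wakeup */
	}
}

/* Analogue of do_notify_pidfd(): only wakes waiters that are already queued. */
static void notify_pidfd(struct task_model *t)
{
	if (t->waiter_queued)
		pidfd_check(t);		/* woken poller re-checks immediately */
}

/* Runs the two writer steps in order, with the poll inserted at poll_at. */
static bool replay(const enum writer_step writer[2], int poll_at)
{
	struct task_model t = { 0 };

	assert(poll_at >= 0 && poll_at <= 2);
	for (int i = 0; i <= 2; i++) {
		if (i == poll_at)
			pidfd_check(&t);
		if (i == 2)
			break;
		if (writer[i] == STEP_NOTIFY)
			notify_pidfd(&t);
		else
			t.exit_state = EXIT_DEAD_MODEL;
	}
	return t.reported;
}
```

Under these assumptions the old ordering (notify, then set the state) loses the exit whenever the poll runs before exit_state is written, while the fixed ordering (set the state, then notify) reports it for every placement of the poll.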
      
      Fixes: b53b0b9d ("pidfd: add polling support")
      Cc: kernel-team@android.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
      [christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
      Signed-off-by: Christian Brauner <christian@brauner.io>
      b191d649
    • powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails · 3a855b7a
      Vaibhav Jain authored
      In some cases the initial bind of SCM memory for an LPAR can fail if
      it was not previously released with an scm-unbind hcall. This can
      happen after a panic of the previous kernel or a forced LPAR fadump.
      In such cases H_SCM_BIND_MEM returns an H_OVERLAP error.
      
      To mitigate this, the patch updates papr_scm_probe() to force a call
      to drc_pmem_unbind() if the initial bind of SCM memory fails with an
      EBUSY error. If the scm-bind operation fails again after the forced
      scm-unbind, we follow the existing error path. We also update
      drc_pmem_bind() to handle the H_OVERLAP error returned by PHYP and
      report it back to the caller as an EBUSY error.
      Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
      3a855b7a
    • powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL · 0d7fc080
      Vaibhav Jain authored
      A new hcall, H_SCM_UNBIND_ALL, has been introduced that can unbind
      all or specific SCM memory assigned to an LPAR. It is more efficient
      than H_SCM_UNBIND_MEM, since we currently don't support partial
      unbind of SCM memory.
      
      Hence this patch makes the following changes to drc_pmem_unbind():
      
          * Update drc_pmem_unbind() to replace the H_SCM_UNBIND_MEM hcall
            with H_SCM_UNBIND_ALL.

          * Update drc_pmem_unbind() to handle cases where PHYP asks the
            guest kernel to wait for a specific amount of time before
            retrying the hcall via a 'LONG_BUSY' return value.

          * Ensure an appropriate error code is returned from the function
            in case of an error.
      Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
      0d7fc080
    • powerpc/pseries: Update SCM hcall op-codes in hvcall.h · 6d140e75
      Vaibhav Jain authored
      Update hvcall.h to include op-codes for the new hcalls introduced to
      manage SCM memory. Also update the existing hcall definitions to reflect
      the current PAPR specification for SCM.
      
      The removed hcall op-codes H_SCM_MEM_QUERY and H_SCM_BLOCK_CLEAR were
      transient proposals; their support was never implemented by Power-VM,
      nor were they used anywhere in the Linux kernel. Hence we don't
      expect anyone to be impacted by this change.
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
      6d140e75