1. 18 Jul, 2018 7 commits
    • Linus Torvalds's avatar
      Merge tag 'devicetree-fixes-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 05df2045
      Linus Torvalds authored
      Pull DeviceTree fixes from Rob Herring:
      
       - Fix phandle cache to work with overlays
      
       - Correct the default clock-frequency for QCom geni-i2c
      
       - Binding doc quote and spelling fixes
      
      * tag 'devicetree-fixes-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        of: overlay: update phandle cache on overlay apply and remove
        dt-bindings: Fix unbalanced quotation marks
        dt-bindings: soc: qcom: Fix default clock-freq for qcom,geni-i2c
        dt-bindings: w1-gpio: Remove unneeded unit address
        Documentation: devicetree: tilcdc: fix spelling mistake "suppors" -> "supports"
      05df2045
    • Linus Torvalds's avatar
      Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 04a13206
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Three regression fixes. They're few-liners and fixing some corner
        cases missed in the origial patches"
      
      * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
        btrfs: fix use-after-free of cmp workspace pages
        btrfs: restore uuid_mutex in btrfs_open_devices
      04a13206
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 47f7dc4b
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "Miscellaneous bugfixes, plus a small patchlet related to Spectre v2"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvmclock: fix TSC calibration for nested guests
        KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
        KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
        KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
        x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
        x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
        kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading
        x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks
        KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
      47f7dc4b
    • Peng Hao's avatar
      kvmclock: fix TSC calibration for nested guests · e10f7805
      Peng Hao authored
      Inside a nested guest, access to hardware can be slow enough that
      tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
      to be called periodically and the nested guest to spend a lot of time
      reading the ACPI timer.
      
      However, if the TSC frequency is available from the pvclock page,
      we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
      'refine' operation.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarPeng Hao <peng.hao2@zte.com.cn>
      [Commit message rewritten. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e10f7805
    • Liran Alon's avatar
      KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled · 2307af1c
      Liran Alon authored
      When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
      with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
      by MSR_IA32_VMX_BASIC.
      
      However, even though not explictly documented by TLFS, VMXArea passed
      as VMXON argument should still be marked with revision_id reported by
      physical CPU.
      
      This issue was found by the following setup:
      * L0 = KVM which expose eVMCS to it's L1 guest.
      * L1 = KVM which consume eVMCS reported by L0.
      This setup caused the following to occur:
      1) L1 execute hardware_enable().
      2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
      3) L0 intercept L1 VMXON and execute handle_vmon() which notes
      vmxarea->revision_id != VMCS12_REVISION and therefore fails with
      nested_vmx_failInvalid() which sets RFLAGS.CF.
      4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
      hardware_enable() continues as usual.
      5) L1 hardware_enable() then calls ept_sync_global() which executes
      INVEPT.
      6) L0 intercept INVEPT and execute handle_invept() which notes
      !vmx->nested.vmxon and thus raise a #UD to L1.
      7) Raised #UD caused L1 to panic.
      Reviewed-by: default avatarKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 773e8a04Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2307af1c
    • Paolo Bonzini's avatar
      KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer · 9432a317
      Paolo Bonzini authored
      A comment warning against this bug is there, but the code is not doing what
      the comment says.  Therefore it is possible that an EPOLLHUP races against
      irq_bypass_register_consumer.  The EPOLLHUP handler schedules irqfd_shutdown,
      and if that runs soon enough, you get a use-after-free.
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      9432a317
    • Lan Tianyu's avatar
      KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. · b5020a8e
      Lan Tianyu authored
      Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free
      when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
      for one specific eventfd. When the assign path hasn't finished but irqfd
      has been added to kvm->irqfds.items list, another thead may deassign the
      eventfd and free struct kvm_kernel_irqfd(). The assign path then uses
      the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid
      such issue, keep irqfd under kvm->irq_srcu protection after the irqfd
      has been added to kvm->irqfds.items list, and call synchronize_srcu()
      in irq_shutdown() to make sure that irqfd has been fully initialized in
      the assign path.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarTianyu Lan <tianyu.lan@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b5020a8e
  2. 17 Jul, 2018 2 commits
    • Linus Torvalds's avatar
      Mark HI and TASKLET softirq synchronous · 3c53776e
      Linus Torvalds authored
      Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do
      its job"), and ever since we've had small nagging issues with it.  For
      example, we've had:
      
        1ff68820 ("watchdog: core: make sure the watchdog_worker is not deferred")
        8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
        217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      
      all of which worked around some of the effects of that commit.
      
      The DVB people have also complained that the commit causes excessive USB
      URB latencies, which seems to be due to the USB code using tasklets to
      schedule USB traffic.  This seems to be an issue mainly when already
      living on the edge, but waiting for ksoftirqd to handle it really does
      seem to cause excessive latencies.
      
      Now Hanna Hawa reports that this issue isn't just limited to USB URB and
      DVB, but also causes timeout problems for the Marvell SoC team:
      
       "I'm facing kernel panic issue while running raid 5 on sata disks
        connected to Macchiatobin (Marvell community board with Armada-8040
        SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
        and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
        uses a tasklet to clean the done descriptors from the queue"
      
      The latency problem causes a panic:
      
        mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
        Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
      
      We've discussed simply just reverting the original commit entirely, and
      also much more involved solutions (with per-softirq threads etc).  This
      patch is intentionally stupid and fairly limited, because the issue
      still remains, and the other solutions either got sidetracked or had
      other issues.
      
      We should probably also consider the timer softirqs to be synchronous
      and not be delayed to ksoftirqd (since they were the issue with the
      earlier watchdog problems), but that should be done as a separate patch.
      This does only the tasklet cases.
      Reported-and-tested-by: default avatarHanna Hawa <hannah@marvell.com>
      Reported-and-tested-by: default avatarJosef Griebichler <griebichler.josef@gmx.at>
      Reported-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3c53776e
    • Qu Wenruo's avatar
      btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() · 665d4953
      Qu Wenruo authored
      In commit ac0b4145 ("btrfs: scrub: Don't use inode pages for device
      replace") we removed the branch of copy_nocow_pages() to avoid
      corruption for compressed nodatasum extents.
      
      However above commit only solves the problem in scrub_extent(), if
      during scrub_pages() we failed to read some pages,
      sctx->no_io_error_seen will be non-zero and we go to fixup function
      scrub_handle_errored_block().
      
      In scrub_handle_errored_block(), for sctx without csum (no matter if
      we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
      which does the similar thing with copy_nocow_pages(), but does it
      without the extra check in copy_nocow_pages() routine.
      
      So for test cases like btrfs/100, where we emulate read errors during
      replace/scrub, we could corrupt compressed extent data again.
      
      This patch will fix it just by avoiding any "optimization" for
      nodatasum, just falls back to the normal fixup routine by try read from
      any good copy.
      
      This also solves WARN_ON() or dead lock caused by lame backref iteration
      in scrub_fixup_nodatasum() routine.
      
      The deadlock or WARN_ON() won't be triggered before commit ac0b4145
      ("btrfs: scrub: Don't use inode pages for device replace") since
      copy_nocow_pages() have better locking and extra check for data extent,
      and it's already doing the fixup work by try to read data from any good
      copy, so it won't go scrub_fixup_nodatasum() anyway.
      
      This patch disables the faulty code and will be removed completely in a
      followup patch.
      
      Fixes: ac0b4145 ("btrfs: scrub: Don't use inode pages for device replace")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      665d4953
  3. 16 Jul, 2018 5 commits
  4. 15 Jul, 2018 12 commits
  5. 14 Jul, 2018 14 commits