1. 09 Feb, 2018 2 commits
    • David Gibson's avatar
      KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 · 790a9df5
      David Gibson authored
      This adds code to enable the HPT resizing code to work on POWER9,
      which uses a slightly modified HPT entry format compared to POWER8.
      On POWER9, we convert HPTEs read from the HPT from the new format to
      the old format so that the rest of the HPT resizing code can work as
      before.  HPTEs written to the new HPT are converted to the new format
      as the last step before writing them into the new HPT.
      
      This takes out the checks added by commit bcd3bb63 ("KVM: PPC:
      Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
      now that HPT resizing works on POWER9.
      
      On POWER9, when we pivot to the new HPT, we now call
      kvmppc_setup_partition_table() to update the partition table in order
      to make the hardware use the new HPT.
      
      [paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
       wrote commit message.]
      Tested-by: default avatarLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      790a9df5
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · 05f2bb03
      Paul Mackerras authored
      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      05f2bb03
  2. 08 Feb, 2018 1 commit
  3. 01 Feb, 2018 2 commits
    • Alexander Graf's avatar
      KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled · 07ae5389
      Alexander Graf authored
      When copying between the vcpu and svcpu, we may get scheduled away onto
      a different host CPU which in turn means our svcpu pointer may change.
      
      That means we need to atomically copy to and from the svcpu with preemption
      disabled, so that all code around it always sees a coherent state.
      Reported-by: default avatarSimon Guo <wei.guo.simon@gmail.com>
      Fixes: 3d3319b4 ("KVM: PPC: Book3S: PR: Enable interrupts earlier")
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      07ae5389
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Drop locks before reading guest memory · 36ee41d1
      Paul Mackerras authored
      Running with CONFIG_DEBUG_ATOMIC_SLEEP reveals that HV KVM tries to
      read guest memory, in order to emulate guest instructions, while
      preempt is disabled and a vcore lock is held.  This occurs in
      kvmppc_handle_exit_hv(), called from post_guest_process(), when
      emulating guest doorbell instructions on POWER9 systems, and also
      when checking whether we have hit a hypervisor breakpoint.
      Reading guest memory can cause a page fault and thus cause the
      task to sleep, so we need to avoid reading guest memory while
      holding a spinlock or when preempt is disabled.
      
      To fix this, we move the preempt_enable() in kvmppc_run_core() to
      before the loop that calls post_guest_process() for each vcore that
      has just run, and we drop and re-take the vcore lock around the calls
      to kvmppc_emulate_debug_inst() and kvmppc_emulate_doorbell_instr().
      
      Dropping the lock is safe with respect to the iteration over the
      runnable vcpus in post_guest_process(); for_each_runnable_thread
      is actually safe to use locklessly.  It is possible for a vcpu
      to become runnable and add itself to the runnable_threads array
      (code near the beginning of kvmppc_run_vcpu()) and then get included
      in the iteration in post_guest_process despite the fact that it
      has not just run.  This is benign because vcpu->arch.trap and
      vcpu->arch.ceded will be zero.
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      36ee41d1
  4. 19 Jan, 2018 7 commits
  5. 18 Jan, 2018 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9 · d075745d
      Paul Mackerras authored
      Hypervisor maintenance interrupts (HMIs) are generated by various
      causes, signalled by bits in the hypervisor maintenance exception
      register (HMER).  In most cases calling OPAL to handle the interrupt
      is the correct thing to do, but the "debug trigger" HMIs signalled by
      PPC bit 17 (bit 46) of HMER are used to invoke software workarounds
      for hardware bugs, and OPAL does not have any code to handle this
      cause.  The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips
      to work around a hardware bug in executing vector load instructions to
      cache inhibited memory.  In POWER9 DD2.2 chips, it is generated when
      conditions are detected relating to threads being in TM (transactional
      memory) suspended mode when the core SMT configuration needs to be
      reconfigured.
      
      The kernel currently has code to detect the vector CI load condition,
      but only when the HMI occurs in the host, not when it occurs in a
      guest.  If a HMI occurs in the guest, it is always passed to OPAL, and
      then we always re-sync the timebase, because the HMI cause might have
      been a timebase error, for which OPAL would re-sync the timebase, thus
      removing the timebase offset which KVM applied for the guest.  Since
      we don't know what OPAL did, we don't know whether to subtract the
      timebase offset from the timebase, so instead we re-sync the timebase.
      
      This adds code to determine explicitly what the cause of a debug
      trigger HMI will be.  This is based on a new device-tree property
      under the CPU nodes called ibm,hmi-special-triggers, if it is
      present, or otherwise based on the PVR (processor version register).
      The handling of debug trigger HMIs is pulled out into a separate
      function which can be called from the KVM guest exit code.  If this
      function handles and clears the HMI, and no other HMI causes remain,
      then we skip calling OPAL and we proceed to subtract the guest
      timebase offset from the timebase.
      
      The overall handling for HMIs that occur in the host (i.e. not in a
      KVM guest) is largely unchanged, except that we now don't set the flag
      for the vector CI load workaround on DD2.2 processors.
      
      This also removes a BUG_ON in the KVM code.  BUG_ON is generally not
      useful in KVM guest entry/exit code since it is difficult to handle
      the resulting trap gracefully.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      d075745d
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2 · 00608e1f
      Paul Mackerras authored
      POWER9 chip versions starting with "Nimbus" v2.2 can support running
      with some threads of a core in HPT mode and others in radix mode.
      This means that we don't have to prohibit independent-threads mode
      when running a HPT guest on a radix host, and we don't have to do any
      of the synchronization between threads that was introduced in commit
      c0101509 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix
      hosts", 2017-10-19).
      
      Rather than using up another CPU feature bit, we just do an
      explicit test on the PVR (processor version register) at module
      startup time to determine whether we have to take steps to avoid
      having some threads in HPT mode and some in radix mode (so-called
      "mixed mode").  We test for "Nimbus" (indicated by 0 or 1 in the top
      nibble of the lower 16 bits) v2.2 or later, or "Cumulus" (indicated by
      2 or 3 in that nibble) v1.1 or later.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      00608e1f
  6. 17 Jan, 2018 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded · 6964e6a4
      Paul Mackerras authored
      This moves the code that loads and unloads the guest SLB values so that
      it is done while the guest LPCR value is loaded in the LPCR register.
      The reason for doing this is that on POWER9, the behaviour of the
      slbmte instruction depends on the LPCR[UPRT] bit.  If UPRT is 1, as
      it is for a radix host (or guest), the SLB index is truncated to
      2 bits.  This means that for a HPT guest on a radix host, the SLB
      was not being loaded correctly, causing the guest to crash.
      
      The SLB is now loaded much later in the guest entry path, after the
      LPCR is loaded, which for a secondary thread is after it sees that
      the primary thread has switched the MMU to the guest.  The loop that
      waits for the primary thread has a branch out to the exit code that
      is taken if it sees that other threads have commenced exiting the
      guest.  Since we have now not loaded the SLB at this point, we make
      this path branch to a new label 'guest_bypass' and we move the SLB
      unload code to before this label.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      6964e6a4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Make sure we don't re-enter guest without XIVE loaded · 43ff3f65
      Paul Mackerras authored
      This fixes a bug where it is possible to enter a guest on a POWER9
      system without having the XIVE (interrupt controller) context loaded.
      This can happen because we unload the XIVE context from the CPU
      before doing the real-mode handling for machine checks.  After the
      real-mode handler runs, it is possible that we re-enter the guest
      via a fast path which does not load the XIVE context.
      
      To fix this, we move the unloading of the XIVE context to come after
      the real-mode machine check handler is called.
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      43ff3f65
  7. 16 Jan, 2018 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras authored
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      5855564c
  8. 12 Jan, 2018 2 commits
  9. 11 Jan, 2018 2 commits
  10. 09 Jan, 2018 1 commit
  11. 03 Dec, 2017 5 commits
  12. 02 Dec, 2017 4 commits
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 2db767d9
      Linus Torvalds authored
      Pull NFS client fixes from Anna Schumaker:
       "These patches fix a problem with compiling using an old version of
        gcc, and also fix up error handling in the SUNRPC layer.
      
         - NFSv4: Ensure gcc 4.4.4 can compile initialiser for
           "invalid_stateid"
      
         - SUNRPC: Allow connect to return EHOSTUNREACH
      
         - SUNRPC: Handle ENETDOWN errors"
      
      * tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        SUNRPC: Handle ENETDOWN errors
        SUNRPC: Allow connect to return EHOSTUNREACH
        NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
      2db767d9
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 788c1da0
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Here are some bug fixes for 4.15-rc2.
      
         - fix memory leaks that appeared after removing ifork inline data
           buffer
      
         - recover deferred rmap update log items in correct order
      
         - fix memory leaks when buffer construction fails
      
         - fix memory leaks when bmbt is corrupt
      
         - fix some uninitialized variables and math problems in the quota
           scrubber
      
         - add some omitted attribution tags on the log replay commit
      
         - fix some UBSAN complaints about integer overflows with large sparse
           files
      
         - implement an effective inode mode check in online fsck
      
         - fix log's inability to retry quota item writeout due to transient
           errors"
      
      * tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Properly retry failed dquot items in case of error during buffer writeback
        xfs: scrub inode mode properly
        xfs: remove unused parameter from xfs_writepage_map
        xfs: ubsan fixes
        xfs: calculate correct offset in xfs_scrub_quota_item
        xfs: fix uninitialized variable in xfs_scrub_quota
        xfs: fix leaks on corruption errors in xfs_bmap.c
        xfs: fortify xfs_alloc_buftarg error handling
        xfs: log recovery should replay deferred ops in order
        xfs: always free inline data before resetting inode fork during ifree
      788c1da0
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of... · e1ba1c99
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
      
      Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
       "This contains a handful of small cleanups that are a result of
        feedback that didn't make it into our original patch set, either
        because the feedback hadn't been given yet, I missed the original
        emails, or we weren't ready to submit the changes yet.
      
        I've been maintaining the various cleanup patch sets I have as their
        own branches, which I then merged together and signed. Each merge
        commit has a short summary of the changes, and each branch is based on
        your latest tag (4.15-rc1, in this case). If this isn't the right way
        to do this then feel free to suggest something else, but it seems sane
        to me.
      
        Here's a short summary of the changes, roughly in order of how
        interesting they are.
      
         - libgcc.h has been moved from include/lib, where it's the only
           member, to include/linux. This is meant to avoid tab completion
           conflicts.
      
         - VDSO entries for clock_get/gettimeofday/getcpu have been added.
           These are simple syscalls now, but we want to let glibc use them
           from the start so we can make them faster later.
      
         - A VDSO entry for instruction cache flushing has been added so
           userspace can flush the instruction cache.
      
         - The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
           removed, as those VDSO entries don't actually exist.
      
         - __io_writes has been corrected to respect the given type.
      
         - A new READ_ONCE in arch_spin_is_locked().
      
         - __test_and_op_bit_ord() is now actually ordered.
      
         - Various small fixes throughout the tree to enable allmodconfig to
           build cleanly.
      
         - Removal of some dead code in our atomic support headers.
      
         - Improvements to various comments in our atomic support headers"
      
      * tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
        RISC-V: __io_writes should respect the length argument
        move libgcc.h to include/linux
        RISC-V: Clean up an unused include
        RISC-V: Allow userspace to flush the instruction cache
        RISC-V: Flush I$ when making a dirty page executable
        RISC-V: Add missing include
        RISC-V: Use define for get_cycles like other architectures
        RISC-V: Provide stub of setup_profiling_timer()
        RISC-V: Export some expected symbols for modules
        RISC-V: move empty_zero_page definition to C and export it
        RISC-V: io.h: type fixes for warnings
        RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
        RISC-V: use generic serial.h
        RISC-V: remove spin_unlock_wait()
        RISC-V: `sfence.vma` orderes the instruction cache
        RISC-V: Add READ_ONCE in arch_spin_is_locked()
        RISC-V: __test_and_op_bit_ord should be strongly ordered
        RISC-V: Remove smb_mb__{before,after}_spinlock()
        RISC-V: Remove __smp_bp__{before,after}_atomic
        RISC-V: Comment on why {,cmp}xchg is ordered how it is
        ...
      e1ba1c99
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 4b1967c9
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The critical one here is a fix for fpsimd register corruption across
        signals which was introduced by the SVE support code (the register
        files overlap), but the others are worth having as well.
      
        Summary:
      
         - Fix FP register corruption when SVE is not available or in use
      
         - Fix out-of-tree module build failure when CONFIG_ARM64_MODULE_PLTS=y
      
         - Missing 'const' generating errors with LTO builds
      
         - Remove unsupported events from Cortex-A73 PMU description
      
         - Removal of stale and incorrect comments"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: context: Fix comments and remove pointless smp_wmb()
        arm64: cpu_ops: Add missing 'const' qualifiers
        arm64: perf: remove unsupported events for Cortex-A73
        arm64: fpsimd: Fix failure to restore FPSIMD state after signals
        arm64: pgd: Mark pgd_cache as __ro_after_init
        arm64: ftrace: emit ftrace-mod.o contents through code
        arm64: module-plts: factor out PLT generation code for ftrace
        arm64: mm: cleanup stale AIVIVT references
      4b1967c9
  13. 01 Dec, 2017 9 commits
    • Palmer Dabbelt's avatar
      RISC-V: Fixes for clean allmodconfig build · 3b62de26
      Palmer Dabbelt authored
      Olaf said: Here's a short series of patches that produces a working
      allmodconfig. Would be nice to see them go in so we can add build
      coverage.
      
      I've dropped patches 8 and 10 from the original set:
      
      * [PATCH 08/10] (RISC-V: Set __ARCH_WANT_RENAMEAT to pick up generic
        version) has a better fix that I've sent out for review, we don't want
        renameat.
      * [PATCH 10/10] (input: joystick: riscv has get_cycles) has already been
        taken into Dmitry Torokhov's tree.
      3b62de26
    • Palmer Dabbelt's avatar
      move libgcc.h to include/linux · 185e788c
      Palmer Dabbelt authored
      185e788c
    • Palmer Dabbelt's avatar
      7382fbde
    • Palmer Dabbelt's avatar
      RISC-V: User-Visible Changes · 07f8ba74
      Palmer Dabbelt authored
      This merge contains the user-visible, ABI-breaking changes that we want
      to make sure we have in Linux before our first release.   Highlights
      include:
      
      * VDSO entries for clock_get/gettimeofday/getcpu have been added.  These
        are simple syscalls now, but we want to let glibc use them from the
        start so we can make them faster later.
      * A VDSO entry for instruction cache flushing has been added so
        userspace can flush the instruction cache.
      * The VDSO symbol versions for __vdso_cmpxchg{32,64} have been removed,
        as those VDSO entries don't actually exist.
      
      Conflicts:
              arch/riscv/include/asm/tlbflush.h
      07f8ba74
    • Palmer Dabbelt's avatar
      RISC-V Atomic Cleanups · f8182f61
      Palmer Dabbelt authored
      This patch set is the result of some feedback that filtered through
      after our original patch set was reviewed, some of which was the result
      of me missing some email.  It contains:
      
      * A new READ_ONCE in arch_spin_is_locked()
      * __test_and_op_bit_ord() is now actually ordered
      * Improvements to various comments
      * Removal of some dead code
      f8182f61
    • Palmer Dabbelt's avatar
      RISC-V: __io_writes should respect the length argument · da894ff1
      Palmer Dabbelt authored
      Whoops -- I must have just been being an idiot again.  Thanks to Segher
      for finding the bug :).
      
      CC: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: default avatarPalmer Dabbelt <palmer@sifive.com>
      da894ff1
    • Christoph Hellwig's avatar
      move libgcc.h to include/linux · 4db2b604
      Christoph Hellwig authored
      Introducing a new include/lib directory just for this file totally
      messes up tab completion for include/linux, which is highly annoying.
      
      Move it to include/linux where we have headers for all kinds of other
      lib/ code as well.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarPalmer Dabbelt <palmer@sifive.com>
      4db2b604
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a0651c7f
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Two fixes for nasty kexec/kdump crashes in certain configurations.
      
        A couple of minor fixes for the new TIDR code.
      
        A fix for an oops in a CXL error handling path.
      
        Thanks to: Andrew Donnellan, Christophe Lombard, David Gibson, Mahesh
        Salgaonkar, Vaibhav Jain"
      
      * tag 'powerpc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc: Do not assign thread.tidr if already assigned
        powerpc: Avoid signed to unsigned conversion in set_thread_tidr()
        powerpc/kexec: Fix kexec/kdump in P9 guest kernels
        powerpc/powernv: Fix kexec crashes caused by tlbie tracing
        cxl: Check if vphb exists before iterating over AFU devices
      a0651c7f
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · ae753ee2
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
       "Two fix patches for the AFS filesystem:
      
         - Fix the refcounting on permit caching.
      
         - AFS inode (afs_vnode) fields need resetting after allocation
           because they're only initialised when slab pages are obtained from
           the page allocator"
      
      * tag 'afs-fixes-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Properly reset afs_vnode (inode) fields
        afs: Fix permit refcounting
      ae753ee2