  1. 28 Aug, 2020 1 commit
    • powerpc/book3s64/radix: Fix boot failure with large amount of guest memory · 103a8542
      Aneesh Kumar K.V authored
      If the hypervisor doesn't support hugepages, the kernel ends up allocating a
      large number of page table pages. The early page table allocation was wrongly
      setting the max memblock limit to ppc64_rma_size even when using radix
      translation, which resulted in the boot failure shown below.
      
      Kernel panic - not syncing:
      early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000 nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff
       CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
       Call Trace:
       [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114 (unreliable)
       [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418
       [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec
       [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4
       [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c
       [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170
      
      This happened because the kernel checked for the radix feature before the
      feature was enabled via mmu_features, and therefore applied the hash
      restrictions even when running with radix.
      
      Rework the early init code such that the kernel boots with the memblock
      restrictions imposed by hash, since at that point the kernel still hasn't
      finalized which translation it will end up using.
      
      We have three different ways of detecting radix.
      
      1. dt_cpu_ftrs_scan -> used only on PowerNV
      2. ibm,pa-features -> used when we don't use dt_cpu_ftrs_scan
      3. CAS -> where we negotiate with the hypervisor about the supported translation.
      
      We look at 1 or 2 early in the boot and after that, we look at the CAS vector to
      finalize the translation the kernel will use. We also support a kernel command
      line option (disable_radix) to switch to hash.
      
      Update the memblock limit after mmu_early_init_devtree() if the kernel is going
      to use radix translation. This forces some of the memblock allocations we do before
      mmu_early_init_devtree() to be within the RMA limit.
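
      A minimal sketch of the idea (the helper names and placement follow the
      existing early-init code; this is illustrative, not the exact diff):

          if (early_radix_enabled()) {
              radix__early_init_devtree();
              /*
               * The translation is now known to be radix, which is not
               * limited by the RMA, so stop restricting memblock allocations.
               */
              memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
          } else {
              hash__early_init_devtree();
          }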
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200828100852.426575-1-aneesh.kumar@linux.ibm.com
  2. 24 Aug, 2020 2 commits
  3. 07 Aug, 2020 1 commit
    • mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf() · 56993b4e
      Anshuman Khandual authored
      There are many instances where the vmemmap allocation is switched between
      regular memory and device memory based solely on whether an altmap is
      available or not.  vmemmap_alloc_block_buf() is used on various platforms to
      allocate vmemmap mappings.  Let's also enable it to handle altmap based
      device memory allocation along with the existing regular memory allocations.
      This will help avoid the altmap based allocation switch in many places.  To
      summarize, there are two different ways to call vmemmap_alloc_block_buf():
      
      vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
      vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */
      
      This converts altmap_alloc_block_buf() into a static function, drops its
      entry from the header and updates Documentation/vm/memory-model.rst.
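
      A minimal sketch of the resulting dispatch (hedged; the internal system-RAM
      path helpers are assumed from context rather than quoted from the diff):

          void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node,
                                                   struct vmem_altmap *altmap)
          {
              void *ptr;

              if (altmap)
                  return altmap_alloc_block_buf(size, altmap); /* device memory */

              ptr = sparse_buffer_alloc(size);                 /* system RAM */
              if (!ptr)
                  ptr = vmemmap_alloc_block(size, node);
              return ptr;
          }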
      Suggested-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Jia He <justin.he@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Will Deacon <will@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 29 Jul, 2020 1 commit
  5. 16 Jul, 2020 1 commit
  6. 09 Jun, 2020 1 commit
    • mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport authored
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definitions of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always the
      possibility of overriding the generic version with the usual ifdef magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point in explicitly including <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are removed with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 11 May, 2020 1 commit
  8. 13 Nov, 2019 1 commit
  9. 28 Oct, 2019 1 commit
    • powerpc/nvdimm: Update vmemmap_populated to check sub-section range · 5f5d6e40
      Aneesh Kumar K.V authored
      With commit 7cc7867f ("mm/devm_memremap_pages: enable sub-section remap")
      pmem namespaces are remapped in 2M chunks. On architectures like ppc64 we
      can map the memmap area using 16MB hugepage size and that can cover
      a memory range of 16G.
      
      Since memory is added in sub-section chunks when enabling new pmem namespaces,
      the kernel should check, before creating a new memmap mapping, whether an
      existing memmap mapping already covers the new namespace. Currently this is
      validated by checking whether the section covering the range is already
      initialized. Since there can be multiple namespaces in the same section, this
      can result in wrong validation. Update this to check at sub-section granularity
      within the range, by checking every pfn in the range we are mapping.
      
      We could optimize this by checking just one pfn in each sub-section, but
      since this is not a fast path we keep it simple.
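
      A minimal sketch of that check (function and variable names are assumed from
      the ppc64 vmemmap code; the point is that every pfn in the range is tested):

          static int __meminit vmemmap_populated(unsigned long vmemmap_addr,
                                                 int vmemmap_map_size)
          {
              unsigned long nr_pfn = vmemmap_map_size / sizeof(struct page);
              unsigned long start_pfn = page_to_pfn((struct page *)vmemmap_addr);
              unsigned long pfn;

              /* If any pfn described by this vmemmap block is already valid,
               * an existing mapping covers (part of) the block. */
              for (pfn = start_pfn; pfn < start_pfn + nr_pfn; pfn++)
                  if (pfn_valid(pfn))
                      return 1;

              return 0;
          }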
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190917123851.22553-1-aneesh.kumar@linux.ibm.com
  10. 24 Sep, 2019 1 commit
    • libnvdimm/altmap: Track namespace boundaries in altmap · cf387d96
      Aneesh Kumar K.V authored
      With a PFN_MODE_PMEM namespace, the memmap area is allocated from the device
      area. Some architectures map the memmap area with a large page size. On
      architectures like ppc64, a 16MB page used for the memmap mapping can map
      262144 pfns, which corresponds to a namespace size of 16G.
      
      When populating the memmap region with 16MB pages from the device area,
      make sure the allocated space is not used to map resources outside this
      namespace. Such usage of the device area would prevent the namespace from
      being destroyed.
      
      Add the resource end pfn to the altmap and use it to check whether a memmap
      area allocation would map pfns outside the namespace. On ppc64, in that case
      we fall back to allocating from regular memory.
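
      A minimal sketch of the boundary check (the end_pfn field and helper name are
      assumptions based on the description above):

          /* Would a device-backed memmap block for [start, start + page_size)
           * describe pfns beyond the namespace that owns this altmap? */
          static bool altmap_cross_boundary(struct vmem_altmap *altmap,
                                            unsigned long start,
                                            unsigned long page_size)
          {
              unsigned long nr_pfn = page_size / sizeof(struct page);
              unsigned long start_pfn = page_to_pfn((struct page *)start);

              if (start_pfn < altmap->base_pfn)
                  return true;
              return start_pfn + nr_pfn > altmap->end_pfn;
          }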
      
      This fixes the kernel crash reported below:
      
      [  132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
      [  133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
      [  133.464760] Faulting instruction address: 0xc00000000007580c
      [  133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
      [  133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
      .....
      [  133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
      [  133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
      [  133.464910] Call Trace:
      [  133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
      [  133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
      [  133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
      [  133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
      [  133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
      [  133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
      [  133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
      [  133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
      [  133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
      [  133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
      [  133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
      [  133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
      [  133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
      [  133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250
      
      djbw: Aneesh notes that this crash can likely be triggered in any kernel that
      supports 'papr_scm', so flagging that commit for -stable consideration.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Cc: <stable@vger.kernel.org>
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Santosh Sivaraj <santosh@fossix.org>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  11. 04 Jul, 2019 1 commit
  12. 30 May, 2019 1 commit
  13. 02 May, 2019 1 commit
  14. 02 Mar, 2019 1 commit
  15. 09 Dec, 2018 1 commit
    • powerpc/mm: Fallback to RAM if the altmap is unusable · 9ef34630
      Oliver O'Halloran authored
      The "altmap" is used to provide a pool of memory that is reserved for
      the vmemmap backing of hot-plugged memory. This is useful when adding
      a large amount of ZONE_DEVICE memory to a system with a limited amount of
      normal memory.
      
      On ppc64 we use huge pages to map the vmemmap, which requires the backing
      storage to be contiguous and aligned to the hugepage size. The altmap
      implementation allows the altmap provider to reserve a few PFNs at
      the start of the range for its own use, and when this occurs the
      first chunk of the altmap is not usable for hugepage mappings. On hash
      there is no sane way to fall back to a normal sized page mapping so we
      fail the allocation. This results in memory hotplug failing with
      ENOMEM when the new range doesn't fall into an existing vmemmap block.
      
      This patch handles this case by falling back to using system memory
      rather than failing if we cannot allocate from the altmap. This
      fallback should only ever be used for the first vmemmap block so it
      should not cause excess memory consumption.
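
      A minimal sketch of the fallback (names follow the vmemmap allocation helpers
      mentioned elsewhere in this log; treat it as illustrative, not the exact diff):

          void *p = NULL;

          if (altmap)
              p = altmap_alloc_block_buf(page_size, altmap);
          if (!p) /* altmap unusable or exhausted: fall back to system RAM */
              p = vmemmap_alloc_block_buf(page_size, node);
          if (!p)
              return -ENOMEM;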
      
      Fixes: 7b73d978 ("mm: pass the vmem_altmap to vmemmap_populate")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 11 Sep, 2018 1 commit
    • KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode · 425333bf
      Alexey Kardashevskiy authored
      At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm()
      which in turn reads the old TCE and if it was a valid entry, marks
      the physical page dirty if it was mapped for writing. Since it is in
      real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
      to get the page struct. However, SetPageDirty() itself reads the compound
      page head, which yields the virtual address of the head page struct, and
      setting the dirty bit through that virtual address in real mode kills the system.
      
      This adds additional dirty bit tracking into the MM/IOMMU API for use
      in real mode. Note that this does not change how VFIO and
      KVM (in virtual mode) set this bit. The KVM (real mode) changes include:
      - use the lowest bit of the cached host phys address to carry
      the dirty bit (see the sketch below);
      - mark pages dirty when they are unpinned, which happens when
      the preregistered memory is released, and that always happens in virtual
      mode;
      - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit;
      - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use
      in the new mm_iommu_ua_mark_dirty_rm() helper;
      - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only
      caller anyway) to reduce the real mode KVM and IOMMU knowledge
      across different subsystems.
      
      This removes realmode_pfn_to_page() as it is not used anymore.
      
      While we're at it, remove some EXPORT_SYMBOL_GPL() as that code is for
      the real mode only and modules cannot call it anyway.
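
      A minimal sketch of the low-bit trick from the list above (the flag name and
      array layout are assumptions; only the bit is set in real mode, the struct
      page is only dereferenced later in virtual mode):

          #define MM_IOMMU_TABLE_GROUP_PAGE_DIRTY 0x1UL

          /* Real mode: just remember that the page will need SetPageDirty(). */
          static void mark_dirty_rm(unsigned long *hpas, long entry)
          {
              hpas[entry] |= MM_IOMMU_TABLE_GROUP_PAGE_DIRTY;
          }

          /* Virtual mode, at unpin time: transfer the flag to the struct page. */
          static void unpin_one(unsigned long hpa)
          {
              struct page *page = pfn_to_page(hpa >> PAGE_SHIFT);

              if (hpa & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
                  SetPageDirty(page);
              put_page(page);
          }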
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  17. 04 Apr, 2018 1 commit
  18. 30 Mar, 2018 1 commit
  19. 08 Jan, 2018 3 commits
  20. 04 Dec, 2017 1 commit
  21. 06 Nov, 2017 2 commits
    • powerpc/mm: Add a CONFIG option to choose if radix is used by default · 1fd6c022
      Michael Ellerman authored
      Currently if the hardware supports the radix MMU we will use
      it, *unless* "disable_radix" is passed on the kernel command line.
      
      However some users would like the reverse semantics, i.e. the kernel
      uses the hash MMU by default, unless radix is explicitly requested on
      the command line.
      
      So add a CONFIG option to choose whether we use radix by default or
      not, and expand the disable_radix command line option to allow
      "disable_radix=no" which *enables* radix.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Michael Ellerman authored
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use, however that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  22. 10 Aug, 2017 1 commit
  23. 24 Jul, 2017 1 commit
  24. 02 Jul, 2017 2 commits
  25. 28 Jun, 2017 1 commit
  26. 31 Mar, 2017 1 commit
  27. 21 Mar, 2017 1 commit
  28. 06 Mar, 2017 1 commit
    • powerpc: Update to new option-vector-5 format for CAS · 014d02cb
      Suraj Jitindar Singh authored
      On POWER9 the ibm,client-architecture-support (CAS) negotiation process
      has been updated to change how the host to guest negotiation is done for
      the new hash/radix mmu as well as the nest mmu, process tables and guest
      translation shootdown (GTSE).
      
      This is documented in the unreleased PAPR ACR "CAS option vector
      additions for P9".
      
      The host tells the guest which options it supports in
      ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
      to request in the CAS call and these are agreed to in the
      ibm,architecture-vec-5 property of the chosen node.
      
      Thus we read ibm,arch-vec-5-platform-support and make our selection before
      calling CAS. We then parse the ibm,architecture-vec-5 property of the
      chosen node to check whether we should run as hash or radix.
      
      ibm,arch-vec-5-platform-support format:
      
      index value pairs: <index, val> ... <index, val>
      
      index: Option vector 5 byte number
      val:   Some representation of supported values
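
      A minimal sketch of walking that property (the helper and vector names are
      assumptions; the property is simply a flat list of byte pairs):

          /* prop[] holds <index, val> byte pairs read from
           * ibm,arch-vec-5-platform-support */
          for (i = 0; i + 1 < prop_len; i += 2)
              prom_parse_platform_support(prop[i], prop[i + 1],
                                          &ibm_architecture_vec.vec5);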
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      [mpe: Don't print about unknown options, be consistent with OV5_FEAT]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  29. 16 Feb, 2017 1 commit
    • powerpc/64: Disable use of radix under a hypervisor · 3f91a89d
      Paul Mackerras authored
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it may try to use the radix MMU even though it doesn't have
      the necessary code to do so (it doesn't negotiate use of radix, and it
      doesn't do the H_REGISTER_PROC_TBL hcall).  If the hypervisor supports
      both radix and HPT, then it will set up the guest to use HPT (since the
      guest doesn't request radix in the CAS call), but if the radix feature
      bit is set in the ibm,pa-features property (which is valid, since
      ibm,pa-features is defined to represent the capabilities of the
      processor) the guest will try to use radix, resulting in a crash when
      it turns the MMU on.
      
      This makes the minimal fix for the current code, which is to disable
      radix unless we are running in hypervisor mode.
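
      A minimal sketch of the described fix (the feature-bit names are assumed from
      the early MMU init code):

          /* No radix guest support yet: only use radix when in hypervisor mode. */
          if (!(mfmsr() & MSR_HV))
              cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;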
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  30. 31 Jan, 2017 2 commits
    • powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras authored
      To use radix as a guest, we first need to tell the hypervisor via the
      ibm,client-architecture-support call that we support POWER9 and
      architecture v3.00, that we can do either radix or hash, and that we
      would like to choose later using an hcall (the H_REGISTER_PROC_TBL
      hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
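
      A minimal sketch of the registration step (the flag and size names are
      assumptions; shown only to illustrate the hcall described above):

          unsigned long flags = PROC_TABLE_NEW | PROC_TABLE_RADIX | PROC_TABLE_GTSE;
          long rc;

          /* Tell the hypervisor where the radix process table lives. */
          rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags,
                                  __pa(process_tb), 0, PRTB_SIZE_SHIFT - 12);
          if (rc != H_SUCCESS)
              pr_err("Failed to register process table (rc=%ld)\n", rc);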
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Don't try to use radix MMU under a hypervisor · 18569c1f
      Paul Mackerras authored
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it will try to use the radix MMU even though it doesn't have
      the necessary code to use radix under a hypervisor (it doesn't negotiate
      use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
      result is that the guest kernel will crash when it tries to turn on the
      MMU.
      
      This fixes it by looking for the /chosen/ibm,architecture-vec-5
      property and, if it exists, clearing the radix MMU feature bit before we
      decide whether to initialize for radix or HPT. This property is created
      by the hypervisor as a result of the guest calling the
      ibm,client-architecture-support method to indicate its capabilities, so
      it will indicate whether the hypervisor agreed to us using radix.
      
      Systems without a hypervisor may have this property also (for example,
      skiboot creates it), so we check the HV bit in the MSR to see whether we
      are running as a guest or not. If we are in hypervisor mode, then we can
      do whatever we like including using the radix MMU.
      
      The reason for using this property is that in future, when we have
      support for using radix under a hypervisor, we will need to check this
      property to see whether the hypervisor agreed to us using radix.
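
      A minimal sketch of the check described above (the property-lookup helper is
      hypothetical; MSR_HV distinguishes bare metal from a guest):

          /*
           * Under a hypervisor (MSR_HV clear) radix is only usable if it was
           * negotiated via CAS, which we can't do yet. If the CAS result
           * (/chosen/ibm,architecture-vec-5) exists, clear the radix bit.
           */
          if (!(mfmsr() & MSR_HV) && chosen_has_arch_vec5())
              cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;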
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  31. 24 Dec, 2016 1 commit
  32. 10 Dec, 2016 1 commit
    • powerpc: port 64 bits pgtable_cache to 32 bits · 9b081e10
      Christophe Leroy authored
      Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
      standard pages when using 4k pages and a single pgtable_cache
      if using other size pages.
      
      In preparation for implementing huge pages on the 8xx, this patch
      replaces the powerpc32-specific handling with the 64-bit approach.
      
      This is done by:
      * moving 64 bits pgtable_cache_add() and pgtable_cache_init()
      in a new file called init-common.c
      * modifying pgtable_cache_init() to also handle the case
      without PMD (a sketch follows the size list below)
      * removing the 32 bits version of pgtable_cache_add() and
      pgtable_cache_init()
      * copying related header contents from 64 bits into both the
      book3s/32 and nohash/32 header files
      
      On the 8xx, the following cache sizes will be used:
      * 4k pages mode:
      - PGT_CACHE(10) for PGD
      - PGT_CACHE(3) for 512k hugepage tables
      * 16k pages mode:
      - PGT_CACHE(6) for PGD
      - PGT_CACHE(7) for 512k hugepage tables
      - PGT_CACHE(3) for 8M hugepage tables
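
      A minimal sketch of the shared init handling the no-PMD case (the function
      shape is assumed; configurations without a real PMD level simply skip that
      cache):

          void pgtable_cache_init(void)
          {
              pgtable_cache_add(PGD_INDEX_SIZE, pgd_ctor);

              /* PMD_CACHE_INDEX is 0 when there is no separate PMD level. */
              if (PMD_CACHE_INDEX && !PGT_CACHE(PMD_CACHE_INDEX))
                  pgtable_cache_add(PMD_CACHE_INDEX, pmd_ctor);

              if (!PGT_CACHE(PGD_INDEX_SIZE))
                  panic("Couldn't allocate pgtable caches");
          }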
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Scott Wood <oss@buserror.net>
  33. 01 Aug, 2016 2 commits