1. 17 Dec, 2015 19 commits
    • powerpc/pseries: Verify CPU doesn't exist before adding · 1f859adb
      Nathan Fontenot authored
      When adding a CPU via DLPAR we should verify that the CPU does not
      already exist. Failure to do so can generate a kernel oops:
      
      [    9.465585] kernel BUG at arch/powerpc/platforms/pseries/dlpar.c:382!
      [    9.465796] Oops: Exception in kernel mode, sig: 5 [#1]
      
      This oops can be triggered by forcing a probe of a CPU through the
      sysfs CPU probe file (/sys/devices/system/cpu/probe). This patch adds
      a check for the existence of the CPU prior to probing it, so userspace
      doing the wrong thing won't trigger a BUG_ON().
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
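      A minimal user-space sketch of the guard described above, assuming
      CPUs are tracked by DRC index in a simple table. The names
      present_cpus, dlpar_cpu_exists() and dlpar_cpu_add() are hypothetical
      and chosen for illustration; this is not the pseries implementation.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical model: each present CPU is identified by its DRC index. */
      #define MAX_CPUS 8

      static uint32_t present_cpus[MAX_CPUS];
      static size_t nr_present;

      /* Return true if a CPU with this DRC index is already present. */
      static bool dlpar_cpu_exists(uint32_t drc_index)
      {
          for (size_t i = 0; i < nr_present; i++)
              if (present_cpus[i] == drc_index)
                  return true;
          return false;
      }

      /*
       * Add a CPU, rejecting a duplicate up front instead of letting it
       * reach code that assumes the CPU is new (where the kernel hit BUG_ON()).
       */
      static int dlpar_cpu_add(uint32_t drc_index)
      {
          if (dlpar_cpu_exists(drc_index))
              return -EINVAL;
          if (nr_present == MAX_CPUS)
              return -ENOSPC;
          present_cpus[nr_present++] = drc_index;
          assert(nr_present <= MAX_CPUS);
          return 0;
      }
      ```

      With the check in place, probing the same index twice returns an
      error instead of tripping an assertion further down the add path.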
    • powerpc/476fpe: Add support for kexec · 4450022b
      Alistair Popple authored
      PPC476FPE has a different PVR from previous PPC476 processors. The
      kexec code checks the PVR in order to correctly set up the MMU. When
      the initial support for 476FPE processors was added, the corresponding
      change in the kexec code was missed. This patch simply adds the check
      and fixes the following bug on kexec:
      
      kexec: Starting new kernel
      Bye!
      Unable to handle kernel paging request for instruction fetch
      Faulting instruction address: 0xee9a50f8
      cpu 0x0: Vector: 400 (Instruction Access) at [ee9d7d20]
          pc: ee9a50f8
          lr: ee9a50e4
          sp: ee9d7dd0
          msr: 21020
          current = 0xee40f000
          pid   = 960, comm = kexec
      enter ? for help
      [link register   ] ee9a50e4
      [ee9d7dd0] c0013748 default_machine_kexec+0x58/0x70 (unreliable)
      [ee9d7df0] c0012f04 machine_kexec+0x34/0x40
      [ee9d7e00] c00aa1ec kernel_kexec+0x9c/0xb0
      [ee9d7e20] c005d704 SyS_reboot+0x1f4/0x220
      [ee9d7f40] c000db68 ret_from_syscall+0x0/0x3c
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
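      The fix amounts to one more PVR comparison on the kexec path. The C
      sketch below only models that logic: kexec_is_47x() is a hypothetical
      name, and the PVR constants are assumed from the kernel's asm/reg.h
      and should be checked against your tree.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /*
       * PVR values assumed from asm/reg.h; verify against your tree.
       * Only the upper 16 bits (the processor version) are compared.
       */
      #define PVR_476     0x11a52000u
      #define PVR_476FPE  0x7ff50000u

      #define PVR_VER(pvr) (((pvr) >> 16) & 0xFFFFu)

      /* The 476FPE reports a different version, so it needs its own check. */
      static_assert(PVR_VER(PVR_476) != PVR_VER(PVR_476FPE),
                    "476 and 476FPE versions differ");

      /* Return true if kexec should take the 47x MMU setup path. */
      static bool kexec_is_47x(uint32_t pvr)
      {
          uint32_t ver = PVR_VER(pvr);

          return ver == PVR_VER(PVR_476) ||
                 ver == PVR_VER(PVR_476FPE);   /* the previously missing check */
      }
      ```

      Without the second comparison a 476FPE falls through to the non-47x
      path, which matches the instruction-fetch fault shown above.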
    • powerpc/powernv: Add support for Nvlink NPUs · 5d2aa710
      Alistair Popple authored
      NVLink is a high speed interconnect that is used in conjunction with a
      PCI-E connection to create an interface between CPU and GPU that
      provides very high data bandwidth. A PCI-E connection to a GPU is used
      as the control path to initiate and report status of large data
      transfers sent via the NVLink.
      
      On IBM Power systems the NVLink processing unit (NPU) is similar to
      the existing PHB3. This patch adds support for a new NPU PHB type. DMA
      operations on the NPU are not supported, as this patch sets the TCE
      translation tables for each NVLink to be the same as those of the
      related GPU PCIe device. Therefore all DMA operations are set up and
      controlled via the PCIe device.

      EEH is not presently supported for the NPU devices, although it may be
      added in the future.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add __raw_rm_writeq() function · a84bf321
      Alistair Popple authored
      Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to
      include/asm/io.h so that it can be used by other code.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field" · 94973b24
      Alistair Popple authored
      This commit removed the pcidev field from struct pci_dn as it was no
      longer in use by the kernel. However to support finding the
      association of Nvlink devices to GPU devices from the device-tree this
      field is required.
      
      This reverts commit 250c7b27 ("powerpc/pci: Remove unused struct
      pci_dn.pcidev field").
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Fix M64 resource name in /proc/iomem · e80c4e7c
      Gavin Shan authored
      The name of the PCI root bus's M64 resource isn't initialized
      properly. When dumping "/proc/iomem", "<BAD>" is shown for those M64
      resources on PCI root buses.
      
         ~# cat /proc/iomem | grep -e "BAD"
         3b0000000000-3b0fefffffff : <BAD>
         3b1000000000-3b1fefffffff : <BAD>
         3c0000000000-3c0fefffffff : <BAD>
         3c1000000000-3c1fefffffff : <BAD>
         3c2000000000-3c2fefffffff : <BAD>
      
      This fixes the issue by setting the name of the PCI root bus's M64
      resource to the full name of the PHB's device node. With the patch,
      no "<BAD>" appears in "/proc/iomem".
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
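      /proc/iomem prints a resource whose name is NULL as "<BAD>" (assumed
      here to mirror kernel/resource.c), so the fix only has to give the M64
      window a name. The sketch models this with a cut-down, hypothetical
      struct resource; format_iomem_line() and init_m64_name() are
      illustrative names, and the device-node path used with it is made up.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>
      #include <string.h>

      /* Cut-down, hypothetical stand-in for the kernel's struct resource. */
      struct resource {
          unsigned long long start, end;
          const char *name;
      };

      /* Format one /proc/iomem-style line; a NULL name shows up as "<BAD>". */
      static int format_iomem_line(char *buf, size_t len, const struct resource *r)
      {
          return snprintf(buf, len, "%llx-%llx : %s", r->start, r->end,
                          r->name ? r->name : "<BAD>");
      }

      /* The fix: name the root bus's M64 window after the PHB's full node name. */
      static void init_m64_name(struct resource *m64, const char *phb_full_name)
      {
          assert(phb_full_name != NULL);
          m64->name = phb_full_name;
      }
      ```

      Before init_m64_name() runs, the window formats as
      "3b0000000000-3b0fefffffff : <BAD>", matching the first line of the
      output above; afterwards the PHB's node name appears instead.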
    • powerpc/mm: Add page soft dirty tracking · 7207f436
      Laurent Dufour authored
      The user space checkpoint and restore tool (CRIU) needs page changes
      to be soft-tracked. This allows it to take a pre-checkpoint and then
      dump only the pages touched since.
      
      This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
      the page is resident in memory, and a new _PAGE_SWP_SOFT_DIRTY bit
      when the page is swapped out.
      
      To introduce a new _PAGE_SOFT_DIRTY PTE bit common to both the hash 4k
      and hash 64k PTE formats, the bits already defined in hash-*4k.h are
      shifted left by one.

      The _PAGE_SWP_SOFT_DIRTY bit is placed dynamically, immediately after
      the swap type in the swap PTE. A check is added to ensure that the bit
      is not overwritten by _PAGE_HPTEFLAGS.
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
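      The sketch below models the two bits on a bare 64-bit PTE. The bit
      positions are hypothetical and chosen only to show the layout rules:
      _PAGE_SWP_SOFT_DIRTY sits immediately after the swap type, and
      static_assert() stands in for the check that nothing overlaps
      _PAGE_HPTEFLAGS. The pte_*soft_dirty() helpers follow the kernel's
      naming, but pte_to_swp_soft_dirty() is a hypothetical helper.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      typedef uint64_t pte_t;   /* simplified: a PTE as a bare 64-bit value */

      /* Hypothetical bit layout, for illustration only. */
      #define _PAGE_HPTEFLAGS       0x000000000000f000ULL  /* hash-table tracking bits */
      #define _PAGE_SOFT_DIRTY      0x0000000000000800ULL  /* resident page was written */

      #define _PAGE_BIT_SWAP_TYPE   2
      #define SWP_TYPE_BITS         5
      #define SWP_TYPE_MASK  (((1ULL << SWP_TYPE_BITS) - 1) << _PAGE_BIT_SWAP_TYPE)
      /* Soft-dirty bit for swapped-out pages, placed right after the swap type. */
      #define _PAGE_SWP_SOFT_DIRTY  (1ULL << (_PAGE_BIT_SWAP_TYPE + SWP_TYPE_BITS))

      /* No soft-dirty or swap-type bit may be overwritten by _PAGE_HPTEFLAGS. */
      static_assert((_PAGE_SOFT_DIRTY & _PAGE_HPTEFLAGS) == 0, "soft-dirty overlaps HPTEFLAGS");
      static_assert((SWP_TYPE_MASK & _PAGE_HPTEFLAGS) == 0, "swap type overlaps HPTEFLAGS");
      static_assert((_PAGE_SWP_SOFT_DIRTY & _PAGE_HPTEFLAGS) == 0, "swap soft-dirty overlaps HPTEFLAGS");

      static bool pte_soft_dirty(pte_t pte)        { return (pte & _PAGE_SOFT_DIRTY) != 0; }
      static pte_t pte_mksoft_dirty(pte_t pte)     { return pte | _PAGE_SOFT_DIRTY; }
      static pte_t pte_clear_soft_dirty(pte_t pte) { return pte & ~_PAGE_SOFT_DIRTY; }

      /* On swap-out, carry the soft-dirty state over into the swap PTE. */
      static pte_t pte_to_swp_soft_dirty(pte_t pte, pte_t swp)
      {
          return pte_soft_dirty(pte) ? (swp | _PAGE_SWP_SOFT_DIRTY) : swp;
      }
      ```

      With these values the swap type occupies bits 2 to 6, so the swap
      soft-dirty bit lands on bit 7 (0x80); if the swap type grew into the
      hash flags, the build would fail rather than silently lose the bit.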
    • powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIES · 2613265c
      Michael Ellerman authored
      The STD_EXCEPTION_PSERIES macro takes both a vector number and a
      location (memory address). However, both are always identical, so
      combine them to save repeating ourselves.

      This does mean an exception handler must always exist at the location
      in memory that matches its vector number. But that's OK because this
      is the "STD" (standard) macro, which does exactly that. We have other
      macros for the other cases, e.g. STD_EXCEPTION_PSERIES_OOL (out of
      line).
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Open code SET_DEFAULT_THREAD_PPR · d8725ce8
      Michael Ellerman authored
      This is only used in one location, so open code it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPR · d030a4b5
      Michael Ellerman authored
      HMT_MEDIUM_LOW_HAS_PPR is only used in one place, so open code it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARD · d6265aea
      Michael Ellerman authored
      HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most
      of our first level exception handlers. It conditionally executes a
      HMT_MEDIUM instruction, which sets the processor priority to medium.
      
      On modern systems, i.e. Power7 and later, it is nop'ed out at boot.
      All it does is make the exception vectors more cramped and consume 4
      bytes of icache.
      
      On old systems it has the effect of boosting the processor priority at
      the start of exception processing. If we were previously in the idle
      loop for example, we may be at low or very low priority. This is
      desirable as we want to process the exception as fast as possible.
      
      However, looking closely at the generated code, we see that in all
      cases we execute another HMT_MEDIUM just four instructions later. With
      code patching applied, the final code on an old (Power6) system looks
      like this, for example:
      
        c000000000000300 <data_access_pSeries>:
        c000000000000300:	7c 42 13 78	mr	r2,r2		<-
        c000000000000304:	7d b2 43 a6	mtsprg	2,r13
        c000000000000308:	7d b1 42 a6	mfsprg	r13,1
        c00000000000030c:	f9 2d 00 80	std	r9,128(r13)
        c000000000000310:	60 00 00 00	nop
        c000000000000314:	7c 42 13 78	mr	r2,r2		<-
      
      So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is
      not justified by the benefit of boosting the processor priority for the
      duration of four instructions, and therefore we drop it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/rtas: Make enter_rtas() private · cd5cdeb6
      Michael Ellerman authored
      There are no longer any users of enter_rtas() outside of rtas.c, so make
      it "private" by moving the declaration inside rtas.c. Hopefully this
      will encourage people to use one of the wrappers, which take the sharp
      edges off the RTAS calling sequence.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status() · 4456f452
      Michael Ellerman authored
      Although call_rtas_display_status() does actually want to use the
      regular RTAS locking, it doesn't want the extra logic that is in
      rtas_call(), so currently it open codes the logic.
      
      Instead we can use rtas_call_unlocked(), after taking the RTAS lock.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
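      The resulting pattern is: take the regular RTAS lock, make the
      unlocked call, release the lock. A minimal sketch, assuming a C11
      atomic_flag spinlock in place of the kernel's RTAS lock; lock_rtas()
      and unlock_rtas() here are simplified stand-ins, and the stub
      rtas_call_unlocked() and display_status() have hypothetical signatures.

      ```c
      #include <assert.h>
      #include <stdatomic.h>

      /* Hypothetical stand-in for the RTAS lock, as a C11 spinlock. */
      static atomic_flag rtas_lock = ATOMIC_FLAG_INIT;
      static int rtas_calls;   /* counts calls made by the stub below */

      static void lock_rtas(void)
      {
          while (atomic_flag_test_and_set_explicit(&rtas_lock, memory_order_acquire))
              ;   /* spin until the holder releases it */
      }

      static void unlock_rtas(void)
      {
          atomic_flag_clear_explicit(&rtas_lock, memory_order_release);
      }

      /* Stub: performs the call and takes no lock, so locking is up to the caller. */
      static int rtas_call_unlocked(int token, int arg)
      {
          (void)token;
          (void)arg;
          rtas_calls++;
          return 0;
      }

      /* Display path: wants the regular lock, but none of rtas_call()'s extra logic. */
      static int display_status(int token, char c)
      {
          int rc;

          lock_rtas();
          rc = rtas_call_unlocked(token, c);
          unlock_rtas();
          return rc;
      }
      ```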
    • powerpc/pseries: Use rtas_call_unlocked() in pseries hotplug · b2e8590f
      Michael Ellerman authored
      Avoid open coding the logic by using rtas_call_unlocked().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/xmon: Use rtas_call_unlocked() in xmon · 08eb105a
      Michael Ellerman authored
      Avoid open coding the logic by using rtas_call_unlocked().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/rtas: Add rtas_call_unlocked() · 209eb4e5
      Michael Ellerman authored
      Most users of RTAS (Run-Time Abstraction Services) use rtas_call(),
      which deals with locking as well as endian handling.
      
      However we have two users outside of rtas.c that can't use rtas_call()
      because they have different locking requirements.
      
      The hotplug CPU code can't take the RTAS lock because the CPU would go
      offline with the lock held and no other CPUs would be able to call RTAS
      until the CPU came back online.
      
      The xmon code doesn't want to take the lock because it would risk
      deadlocking when we are trying to recover from a crash.
      
      Both sites required multiple patches when we added little endian
      support, proving that programmers can't do endian right.
      
      Although that ship has sailed, we can still clean the code up by
      providing an unlocked version of rtas_call() which avoids the need to
      open code the logic elsewhere.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
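      A minimal sketch of the idea, assuming a simplified argument buffer:
      the unlocked variant does all the marshalling, including the
      big-endian conversion RTAS expects, and a locked rtas_call() would
      simply take the lock and delegate to it. The struct layout, the
      enter_firmware() stub and this rtas_call_unlocked() signature are
      hypothetical; cpu_to_be32()/be32_to_cpu() are user-space stand-ins
      for the kernel helpers of the same name.

      ```c
      #include <assert.h>
      #include <stdarg.h>
      #include <stdint.h>

      /* Simplified, hypothetical RTAS argument buffer; every field is big-endian. */
      struct rtas_args {
          uint32_t token;
          uint32_t nargs;
          uint32_t nret;
          uint32_t args[16];   /* inputs, followed by the return slots */
      };

      /* User-space stand-in: convert host order to big-endian. */
      static uint32_t cpu_to_be32(uint32_t v)
      {
          const uint16_t probe = 1;

          if (*(const uint8_t *)&probe == 1)   /* little-endian host: swap bytes */
              return (v >> 24) | ((v >> 8) & 0xff00u) |
                     ((v << 8) & 0xff0000u) | (v << 24);
          return v;
      }

      /* Swapping bytes is its own inverse, so the reverse conversion is identical. */
      #define be32_to_cpu(v) cpu_to_be32(v)

      /* Hypothetical firmware stub: echoes the first input into the first return slot. */
      static void enter_firmware(struct rtas_args *a)
      {
          uint32_t nargs = be32_to_cpu(a->nargs);

          if (nargs > 0 && be32_to_cpu(a->nret) > 0)
              a->args[nargs] = a->args[0];   /* both already big-endian */
      }

      /*
       * Marshal and issue a call without taking any lock; the caller is
       * responsible for locking (or deliberately not locking, as xmon does).
       * Returns the first output, converted back to host order.
       */
      static int rtas_call_unlocked(struct rtas_args *a, int token, int nargs,
                                    int nret, ...)
      {
          va_list ap;

          assert(nargs >= 0 && nret >= 0 && nargs + nret <= 16);
          a->token = cpu_to_be32((uint32_t)token);
          a->nargs = cpu_to_be32((uint32_t)nargs);
          a->nret = cpu_to_be32((uint32_t)nret);

          va_start(ap, nret);
          for (int i = 0; i < nargs; i++)
              a->args[i] = cpu_to_be32(va_arg(ap, uint32_t));
          va_end(ap);
          for (int i = 0; i < nret; i++)
              a->args[nargs + i] = 0;

          enter_firmware(a);
          return nret > 0 ? (int)be32_to_cpu(a->args[nargs]) : 0;
      }
      ```

      Keeping the endian conversion in one function means callers with
      unusual locking needs, like hotplug and xmon, no longer repeat it.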
    • powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL · e4d54f71
      Stewart Smith authored
      Long ago, only in the lab, there were OPALv1 and OPALv2. Now there is
      just OPALv3, and nobody expects anything pre-OPALv3 to be cared about
      or supported by mainline kernels.
      
      So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
      exclusively.
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Remove OPALv2 firmware define and references · 7261aafc
      Stewart Smith authored
      OPALv2 only ever existed in the lab and didn't escape to the world.
      All OPAL systems in the wild are OPALv3.
      
      The probability of there being an OPALv2 system still powered on
      anywhere inside IBM is approximately zero, let alone anyone
      expecting to run mainline kernels.
      
      So, start to remove references to OPALv2.
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: panic() on OPAL < V3 · 786842b6
      Stewart Smith authored
      The OpenPower Abstraction Layer firmware went through a couple
      of iterations in the lab before being released. What we now know
      as OPAL advertises itself as OPALv3.
      
      OPALv2 and OPALv1 never made it outside the lab, and the possibility
      of anyone at all ever building a mainline kernel today and expecting
      it to boot on such hardware is zero.
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
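      A sketch of the boot-time check, assuming the firmware advertises its
      version through a device-tree "compatible" string. The "ibm,opal-v3"
      string, the FW_FEATURE_OPAL bit value and opal_detect() are
      assumptions for illustration. Where the kernel would call panic(), the
      sketch returns -1 so that it can be tested.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>

      #define FW_FEATURE_OPAL (1UL << 0)   /* hypothetical bit position */

      static unsigned long firmware_features;

      /*
       * `compat` models a "compatible" property as NUL-terminated strings laid
       * end to end, ended by an empty string (a simplification: a real
       * property is bounded by its length).
       */
      static bool is_compatible(const char *compat, const char *want)
      {
          for (const char *s = compat; *s != '\0'; s += strlen(s) + 1)
              if (strcmp(s, want) == 0)
                  return true;
          return false;
      }

      /* Accept only OPALv3; returns -1 where the kernel would call panic(). */
      static int opal_detect(const char *compat)
      {
          if (!is_compatible(compat, "ibm,opal-v3")) {
              fprintf(stderr, "OPAL older than v3 detected, not supported\n");
              return -1;
          }
          firmware_features |= FW_FEATURE_OPAL;   /* single OPAL feature flag */
          return 0;
      }
      ```

      Failing hard at detection time means no later code needs to
      distinguish between OPAL versions.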
  2. 16 Dec, 2015 6 commits
  3. 14 Dec, 2015 15 commits