1. 05 Dec, 2013 18 commits
    • powerpc/eeh: Output PHB diag-data · 2c77e957
      Gavin Shan authored
      When hitting a frozen PE or fenced PHB, it is always useful to
      have the PHB diag-data dumped for further analysis and diagnosis.
      However, we never dump it in those cases. This patch dumps the
      PHB diag-data in the backend of eeh_ops::get_log() for the PowerNV
      platform.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Move PHB-diag dump functions around · 93aef2a7
      Gavin Shan authored
      Prior to the completion of PCI enumeration, we actively detect
      EEH errors on PCI config cycles and dump PHB diag-data if necessary.
      The EEH backend also dumps PHB diag-data in case of frozen PE or
      fenced PHB. However, we are using different functions to dump the
      PHB diag-data for those 2 cases.
      
      The patch merges the functions for dumping PHB diag-data into one so
      that we can avoid duplicate code. Also, we never dump PHB3 diag-data
      during PCI config cycles with frozen PE. The patch fixes it as well.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • PPC: POWERNV: move iommu_add_device earlier · d905c5df
      Alexey Kardashevskiy authored
      The current implementation of IOMMU on sPAPR does not use iommu_ops
      and therefore does not call IOMMU API's bus_set_iommu() which
      1) sets iommu_ops for a bus
      2) registers a bus notifier
      Instead, PCI devices are added to IOMMU groups from
      subsys_initcall_sync(tce_iommu_init) which does basically the same
      thing without using iommu_ops callbacks.
      
      However, the Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
      implements iommu_ops, and by the time tce_iommu_init is called every PCI
      device has already been added to some group, so there is a conflict.
      
      This patch does 2 things:
      1. removes the loop in which PCI devices were added to groups and
      adds explicit iommu_add_device() calls to add devices as soon as they get
      the iommu_table pointer assigned to them.
      2. moves a bus notifier to powernv code in order to avoid conflict with
      the notifier from Freescale driver.
      
      iommu_add_device() and iommu_del_device() are public now.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Move SG list structure to header file · 7e1ce5a4
      Vasant Hegde authored
      Move SG list and entry structure to header file so that
      it can be used in other places as well.
      Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Infrastructure to read opal messages in generic format. · 24366360
      Mahesh Salgaonkar authored
      OPAL now has a new messaging infrastructure to push messages to
      Linux in a generic format for different types of messages using only one
      event bit. The format of the OPAL message is as below:
      
      struct opal_msg {
      	uint32_t msg_type;
      	uint32_t reserved;
      	uint64_t params[8];
      };
      
      This patch allows clients to subscribe for notification of a specific
      message type. It is up to the subscriber that registered interest in a
      message type to decipher the messages of that type.
      
      The interface to subscribe for notification is:
      
      	int opal_message_notifier_register(enum OpalMessageType msg_type,
                                              struct notifier_block *nb)
      
      The notifier will fetch the OPAL message when available and notify the
      subscriber with the message type and the OPAL message. It is the subscriber's
      responsibility to copy the message data before returning from the notifier
      callback.
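
      The subscription model described above can be sketched in user space
      roughly as follows. This is only an illustration: the fixed-size
      subscriber table, the sketch_* names and the callback signature are
      assumptions made here, whereas the kernel interface takes a
      struct notifier_block, which is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message layout as given in the commit message. */
struct opal_msg {
	uint32_t msg_type;
	uint32_t reserved;
	uint64_t params[8];
};

/* Hypothetical bounds and callback type, for this sketch only. */
#define SKETCH_MSG_TYPE_MAX    4
#define SKETCH_MAX_SUBSCRIBERS 4

typedef void (*sketch_msg_cb)(uint32_t msg_type, const struct opal_msg *msg);

static sketch_msg_cb sketch_subscribers[SKETCH_MSG_TYPE_MAX][SKETCH_MAX_SUBSCRIBERS];

/* Subscribe cb to one message type; returns 0 on success, -1 on failure. */
int sketch_notifier_register(uint32_t msg_type, sketch_msg_cb cb)
{
	if (msg_type >= SKETCH_MSG_TYPE_MAX || cb == NULL)
		return -1;
	for (size_t i = 0; i < SKETCH_MAX_SUBSCRIBERS; i++) {
		if (sketch_subscribers[msg_type][i] == NULL) {
			sketch_subscribers[msg_type][i] = cb;
			return 0;
		}
	}
	return -1; /* no free slot for this message type */
}

/* Deliver a fetched message to the subscribers of its type only.
 * Returns the number of subscribers notified. */
int sketch_dispatch(const struct opal_msg *msg)
{
	int notified = 0;

	if (msg->msg_type >= SKETCH_MSG_TYPE_MAX)
		return 0;
	for (size_t i = 0; i < SKETCH_MAX_SUBSCRIBERS; i++) {
		if (sketch_subscribers[msg->msg_type][i] != NULL) {
			sketch_subscribers[msg->msg_type][i](msg->msg_type, msg);
			notified++;
		}
	}
	return notified;
}

/* Example subscriber: copies the message before returning, because the
 * buffer it is handed may be reused once the callback is done. */
struct opal_msg sketch_last_copy;

void sketch_copying_subscriber(uint32_t msg_type, const struct opal_msg *msg)
{
	(void)msg_type;
	memcpy(&sketch_last_copy, msg, sizeof(sketch_last_copy));
}
```

      A subscriber registered for one type is not called for messages of
      any other type, which is the point of carrying msg_type in the header.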
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/windfarm: Remove superfluous name casts · 8bb61fe1
      Geert Uytterhoeven authored
      wf_sensor.name is "const char *"
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Machine check exception handling. · b63a0ffe
      Mahesh Salgaonkar authored
      Add basic error handling in machine check exception handler.
      
      - If MSR_RI isn't set, we cannot recover.
      - Check if the disposition is set to OpalMCE_DISPOSITION_RECOVERED.
      - Check if the faulting address is inside the kernel address space; if it is
        not and we hit the exception while in userspace, send SIGBUS to the process.
      - If the faulting address is not provided and we get a synchronous machine
        check while in userspace, kill the task.
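
      Read as a decision procedure, the checks above might look like the
      sketch below. The context structure, the enum names, the kernel/user
      address split and the "unrecoverable" fall-through cases are all
      assumptions for illustration; the commit message only lists the
      checks, not the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_mce_action {
	SKETCH_MCE_UNRECOVERABLE,
	SKETCH_MCE_RECOVERED,
	SKETCH_MCE_SIGBUS_TASK,
	SKETCH_MCE_KILL_TASK,
};

/* Hypothetical summary of what the handler knows about the event. */
struct sketch_mce_ctx {
	bool msr_ri;                /* MSR_RI was set when the exception hit */
	bool disposition_recovered; /* disposition is "recovered" */
	bool in_userspace;          /* exception was taken while in userspace */
	bool synchronous;           /* synchronous machine check */
	bool have_addr;             /* a faulting address was provided */
	uint64_t fault_addr;
};

/* Assumed start of the kernel address space, for this sketch only. */
#define SKETCH_KERNEL_START 0xc000000000000000ULL

enum sketch_mce_action sketch_mce_decide(const struct sketch_mce_ctx *c)
{
	if (!c->msr_ri)
		return SKETCH_MCE_UNRECOVERABLE;
	if (c->disposition_recovered)
		return SKETCH_MCE_RECOVERED;
	if (c->have_addr && c->fault_addr < SKETCH_KERNEL_START) {
		/* Faulting address is outside the kernel address space. */
		return c->in_userspace ? SKETCH_MCE_SIGBUS_TASK
				       : SKETCH_MCE_UNRECOVERABLE;
	}
	if (!c->have_addr && c->in_userspace && c->synchronous)
		return SKETCH_MCE_KILL_TASK;
	return SKETCH_MCE_UNRECOVERABLE;
}
```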
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Remove machine check handling in OPAL. · 28446de2
      Mahesh Salgaonkar authored
      Now that we are ready to handle machine checks directly in Linux, do not
      register with firmware to handle machine check exception.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Queue up and process delayed MCE events. · b5ff4211
      Mahesh Salgaonkar authored
      When the machine check real mode handler cannot continue into the host kernel
      in virtual mode, it returns from the interrupt and we lose the MCE event, which
      never gets logged. In such a situation, queue up the MCE event so that
      we can log it later when we get back into the host kernel with r1 pointing to
      the kernel stack, e.g. during syscall exit.
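
      The queue-now, log-later idea can be illustrated with a small
      fixed-size queue. This is a sketch under assumptions: the queue depth,
      the event fields and the sketch_* names are invented here, and the
      actual logging is reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUED 4   /* assumed queue depth */

/* Hypothetical, trimmed-down event record. */
struct sketch_queued_mce {
	uint64_t srr0;
	uint64_t srr1;
};

static struct sketch_queued_mce sketch_queue[SKETCH_MAX_QUEUED];
static int sketch_queue_count;
int sketch_logged_count;

/* Real-mode side: the event cannot be logged now, so stash it.
 * Returns 0 on success, -1 if the queue is full. */
int sketch_queue_mce_event(const struct sketch_queued_mce *evt)
{
	if (sketch_queue_count >= SKETCH_MAX_QUEUED)
		return -1;
	sketch_queue[sketch_queue_count++] = *evt;
	return 0;
}

/* Later, back in the host kernel on a normal stack: drain the queue.
 * Returns the number of events processed. */
int sketch_process_queued_events(void)
{
	int processed = 0;

	while (sketch_queue_count > 0) {
		sketch_queue_count--;
		/* A real handler would log sketch_queue[sketch_queue_count] here. */
		sketch_logged_count++;
		processed++;
	}
	return processed;
}
```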
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Decode and save machine check event. · 36df96f8
      Mahesh Salgaonkar authored
      Now that we handle machine checks in Linux, the MCE decoding should also
      take place in the Linux host. This info is crucial to log before we go down
      in case we cannot handle the machine check errors. This patch decodes
      and populates a machine check event which contains high-level, meaningful
      MCE information.
      
      We do this in real mode C code with ME bit on. The MCE information is still
      available on emergency stack (in pt_regs structure format). Even if we take
      another exception at this point the MCE early handler will allocate a new
      stack frame on top of current one. So when we return back here we still have
      our MCE information safe on current stack.
      
      We use per cpu buffer to save high level MCE information. Each per cpu buffer
      is an array of machine check event structure indexed by per cpu counter
      mce_nest_count. The mce_nest_count is incremented every time we enter
      machine check early handler in real mode to get the current free slot
      (index = mce_nest_count - 1). The mce_nest_count is decremented once the
      MCE info is consumed by virtual mode machine exception handler.
      
      This patch provides save_mce_event(), get_mce_event() and release_mce_event()
      generic routines that can be used by machine check handlers to populate and
      retrieve the event. The routine release_mce_event() will free the event slot so
      that it can be reused. Caller can invoke get_mce_event() with a release flag
      either to release the event slot immediately OR keep it so that it can be
      fetched again. The event slot can be also released anytime by invoking
      release_mce_event().
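
      A single-CPU, user-space sketch of the slot scheme described above is
      shown below. The array depth, the trimmed event structure and the
      sketch_ prefixes are assumptions, and the counter increment is folded
      into the save routine here for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MC_EVT 4   /* assumed nesting depth */

/* Hypothetical, heavily trimmed event structure. */
struct sketch_mce_event {
	bool in_use;
	uint64_t srr0;
	uint64_t srr1;
};

/* One CPU's worth of state; the kernel keeps this per CPU. */
static struct sketch_mce_event sketch_mce_events[SKETCH_MAX_MC_EVT];
static int sketch_mce_nest_count;

/* Claim the next free slot (index = mce_nest_count - 1 after the
 * increment) and fill it in. */
void sketch_save_mce_event(uint64_t srr0, uint64_t srr1)
{
	int index = sketch_mce_nest_count++;

	if (index >= SKETCH_MAX_MC_EVT)
		return; /* nested too deep: details are dropped */
	sketch_mce_events[index].in_use = true;
	sketch_mce_events[index].srr0 = srr0;
	sketch_mce_events[index].srr1 = srr1;
}

/* Copy out the most recent event. With release set, the slot is freed;
 * otherwise it stays available for a later fetch. Returns 1 if an event
 * was found, 0 otherwise. */
int sketch_get_mce_event(struct sketch_mce_event *out, bool release)
{
	int index = sketch_mce_nest_count - 1;
	int ret = 0;

	if (index < 0)
		return 0; /* nothing pending */
	if (index < SKETCH_MAX_MC_EVT) {
		if (out != NULL)
			*out = sketch_mce_events[index];
		if (release)
			sketch_mce_events[index].in_use = false;
		ret = 1;
	}
	if (release)
		sketch_mce_nest_count--;
	return ret;
}

/* Free the current slot without reading it. */
void sketch_release_mce_event(void)
{
	sketch_get_mce_event(NULL, true);
}
```

      A caller that may have to pass the event on, as KVM does, would fetch
      with release set to false and call the release routine only once it
      has handled the error itself.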
      
      This patch also updates kvm code to invoke get_mce_event to retrieve generic
      mce event rather than paca->opal_mce_evt.
      
      The KVM code always calls get_mce_event() with the release flag set to false so
      that the event remains available to the Linux host.

      If a machine check occurs while we are in the guest, KVM tries to handle the error.
      If KVM is able to handle the MC error successfully, it enters the guest and
      delivers the machine check to the guest. If KVM is not able to handle the MC error, it
      exits the guest and passes control to the Linux host machine check handler,
      which then logs the MC event and decides how to handle it in the Linux host. In the
      failure case, KVM needs to make sure that the MC event is available for the Linux host
      to consume. Hence KVM always calls get_mce_event() with the release flag set to false
      and later invokes release_mce_event() only if it succeeds in handling the error.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8. · ae744f34
      Mahesh Salgaonkar authored
      This patch handles the memory errors on power8. If we get a machine check
      exception due to SLB or TLB errors, then flush SLBs/TLBs and reload SLBs to
      recover.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7. · e22a2274
      Mahesh Salgaonkar authored
      If we get a machine check exception due to SLB or TLB errors, then flush
      SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning
      on MMU. Otherwise we would run into nested machine checks.
      
      If we get a machine check when we are in guest, then just flush the
      SLBs and continue. This patch handles errors for power7. The next
      patch will handle errors for power8.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Add flush_tlb operation in cpu_spec. · 04407050
      Mahesh Salgaonkar authored
      This patch introduces a flush_tlb operation in the cpu_spec structure. This
      lets us invoke the appropriate CPU-side TLB flush routine and lays the
      foundation for invoking CPU-specific flush routines on the respective
      architectures. Currently this patch introduces flush_tlb for p7 and p8.
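
      The per-CPU dispatch this enables can be sketched as a function
      pointer in a table entry. The structure below is a hypothetical
      stand-in for cpu_spec; the flush routines only count calls, and the
      selector parameter is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down cpu_spec with just the new hook. */
struct sketch_cpu_spec {
	const char *cpu_name;
	void (*flush_tlb)(unsigned long inval_selector);
};

int sketch_p7_flushes;
int sketch_p8_flushes;

/* Stand-ins for the CPU-specific flush routines. */
void sketch_flush_tlb_power7(unsigned long inval_selector)
{
	(void)inval_selector;
	sketch_p7_flushes++;
}

void sketch_flush_tlb_power8(unsigned long inval_selector)
{
	(void)inval_selector;
	sketch_p8_flushes++;
}

const struct sketch_cpu_spec sketch_cpu_specs[] = {
	{ "POWER7", sketch_flush_tlb_power7 },
	{ "POWER8", sketch_flush_tlb_power8 },
};

/* Callers go through the hook of the running CPU's spec and never name
 * a CPU-specific routine directly. Returns 0 if a flush ran, -1 if the
 * CPU has no hook. */
int sketch_flush_current(const struct sketch_cpu_spec *cur, unsigned long sel)
{
	if (cur == NULL || cur->flush_tlb == NULL)
		return -1;
	cur->flush_tlb(sel);
	return 0;
}
```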
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Introduce a early machine check hook in cpu_spec. · 4c703416
      Mahesh Salgaonkar authored
      This patch adds the early machine check function pointer in cputable for
      CPU specific early machine check handling. The early machine check handler
      routine will be called in real mode to handle SLB and TLB errors. We cannot
      reuse the existing machine_check hook because it is always invoked in kernel
      virtual mode and we would already be in trouble if we get SLB or TLB errors.
      This patch just sets up a mechanism to invoke CPU specific handler. The
      subsequent patches will populate the function pointer.
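
      Since the pointer is left unpopulated by this patch, the call site has
      to tolerate a missing handler. A minimal sketch of such a call site,
      with a hypothetical register structure and return convention (non-zero
      meaning "handled"), could be:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pt_regs; only what the sketch needs. */
struct sketch_pt_regs {
	unsigned long msr;
	unsigned long dsisr;
};

/* Hypothetical cut-down cpu_spec with just the early hook. */
struct sketch_early_spec {
	long (*machine_check_early)(struct sketch_pt_regs *regs);
};

/* Call the CPU-specific early handler if one is installed.
 * Returns its result, or 0 when no hook has been populated. */
long sketch_machine_check_early(const struct sketch_early_spec *spec,
				struct sketch_pt_regs *regs)
{
	long handled = 0;

	if (spec != NULL && spec->machine_check_early != NULL)
		handled = spec->machine_check_early(regs);
	return handled;
}

/* Example hook that claims to have handled every error. */
long sketch_always_handled(struct sketch_pt_regs *regs)
{
	(void)regs;
	return 1;
}
```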
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Return from interrupt if coming from evil context. · 1c51089f
      Mahesh Salgaonkar authored
      We can get machine checks from any context. We need to make sure that
      we handle all of them correctly. If we are coming from hypervisor user-space,
      we can continue in host kernel in virtual mode to deliver the MC event.
      If we got woken up from power-saving mode then we may come in with one of
      the following states:
       a. No state loss
       b. Supervisor state loss
       c. Hypervisor state loss
      For (a) and (b), we go back to nap again. State (c) is fatal, keep spinning.
      
      For all other contexts that we are not sure of, queue up the MCE event and
      return from the interrupt.
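
      The routing described above amounts to a small table from the context
      we came from to the path we take. The enum names below are invented
      for this sketch; the real code makes these decisions in assembly.

```c
#include <assert.h>

/* Where the machine check came from (hypothetical names). */
enum sketch_mce_origin {
	SKETCH_FROM_HV_USERSPACE,
	SKETCH_FROM_NAP_NO_LOSS,
	SKETCH_FROM_NAP_SUPERVISOR_LOSS,
	SKETCH_FROM_NAP_HYPERVISOR_LOSS,
	SKETCH_FROM_UNKNOWN,
};

/* What the early handler does next (hypothetical names). */
enum sketch_mce_path {
	SKETCH_CONTINUE_VIRTUAL_MODE,
	SKETCH_NAP_AGAIN,
	SKETCH_SPIN_FATAL,
	SKETCH_QUEUE_AND_RETURN,
};

enum sketch_mce_path sketch_mce_route(enum sketch_mce_origin origin)
{
	switch (origin) {
	case SKETCH_FROM_HV_USERSPACE:
		return SKETCH_CONTINUE_VIRTUAL_MODE; /* deliver the MC event */
	case SKETCH_FROM_NAP_NO_LOSS:
	case SKETCH_FROM_NAP_SUPERVISOR_LOSS:
		return SKETCH_NAP_AGAIN;             /* cases (a) and (b) */
	case SKETCH_FROM_NAP_HYPERVISOR_LOSS:
		return SKETCH_SPIN_FATAL;            /* case (c) */
	default:
		return SKETCH_QUEUE_AND_RETURN;      /* context we are not sure of */
	}
}
```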
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: handle machine check in Linux host. · 1e9b4507
      Mahesh Salgaonkar authored
      Move machine check entry point into Linux. So far we were dependent on
      firmware to decode MCE error details and hand over the high level info to the OS.
      
      This patch introduces early machine check routine that saves the MCE
      information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
      stack frame on emergency stack and set the r1 accordingly. This allows us to be
      prepared to take another exception without losing context. One thing to note
      here is that, if we get another machine check while the ME bit is off, then we risk a
      checkstop. Hence we restrict ourselves to saving only the MCE information and
      the registers saved in the PACA_EXMC save area before we turn the ME bit on. We use
      paca->in_mce flag to differentiate between first entry and nested machine check
      entry which helps proper use of emergency stack. We increment paca->in_mce
      every time we enter in early machine check handler and decrement it while
      leaving. When we enter machine check early handler first time (paca->in_mce ==
      0), we are sure nobody is using MC emergency stack and allocate a stack frame
      at the start of the emergency stack. During subsequent entry (paca->in_mce >
      0), we know that r1 points inside emergency stack and we allocate separate
      stack frame accordingly. This prevents us from clobbering MCE information
      during nested machine checks.
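
      The first-entry versus nested-entry frame allocation can be sketched
      in C as below. The real logic lives in assembly; the frame size, the
      structure layout and the field name for the emergency stack pointer
      are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FRAME_SIZE 512u   /* assumed frame size */

/* Hypothetical cut-down paca. */
struct sketch_paca {
	int in_mce;              /* nesting depth of the early MCE handler */
	uintptr_t mc_stack_top;  /* where the MC emergency stack starts */
};

/* Pick the new r1 on entry to the early machine check handler. */
uintptr_t sketch_mce_early_enter(struct sketch_paca *paca, uintptr_t r1)
{
	/* First entry: nobody is using the emergency stack, so start from
	 * its beginning. Nested entry: r1 already points inside it, so the
	 * new frame goes below the current one. */
	uintptr_t base = (paca->in_mce == 0) ? paca->mc_stack_top : r1;

	paca->in_mce++;
	return base - SKETCH_FRAME_SIZE; /* stacks grow downward */
}

/* Undo the nesting count when leaving the early handler. */
void sketch_mce_early_exit(struct sketch_paca *paca)
{
	assert(paca->in_mce > 0);
	paca->in_mce--;
}
```

      Because a nested entry never restarts from the beginning of the stack,
      the frame holding the outer machine check's srr0, srr1, dar and dsisr
      is left intact.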
      
      The early machine check handler changes are placed under CPU_FTR_HVMODE
      section. This makes sure that the early machine check handler will get executed
      only in hypervisor kernel.
      
      This is the code flow:
      
      		Machine Check Interrupt
      			|
      			V
      		   0x200 vector				  ME=0, IR=0, DR=0
      			|
      			V
      	+-----------------------------------------------+
      	|machine_check_pSeries_early:			| ME=0, IR=0, DR=0
      	|	Alloc frame on emergency stack		|
      	|	Save srr1, srr0, dar and dsisr on stack |
      	+-----------------------------------------------+
      			|
      		(ME=1, IR=0, DR=0, RFID)
      			|
      			V
      		machine_check_handle_early		  ME=1, IR=0, DR=0
      			|
      			V
      	+-----------------------------------------------+
      	|	machine_check_early (r3=pt_regs)	| ME=1, IR=0, DR=0
      	|	Things to do: (in next patches)		|
      	|		Flush SLB for SLB errors	|
      	|		Flush TLB for TLB errors	|
      	|		Decode and save MCE info	|
      	+-----------------------------------------------+
      			|
      	(Fall through existing exception handler routine.)
      			|
      			V
      		machine_check_pSeries			  ME=1, IR=0, DR=0
      			|
      		(ME=1, IR=1, DR=1, RFID)
      			|
      			V
      		machine_check_common			  ME=1, IR=1, DR=1
      			.
      			.
      			.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Introduce exclusive emergency stack for machine check exception. · 729b0f71
      Mahesh Salgaonkar authored
      This patch introduces exclusive emergency stack for machine check exception.
      We use emergency stack to handle machine check exception so that we can save
      MCE information (srr1, srr0, dar and dsisr) before turning on ME bit and be
      ready for re-entrancy. This helps us to prevent clobbering of MCE information
      in case of nested machine checks.
      
      The reason for using emergency stack over normal kernel stack is that the
      machine check might occur in the middle of setting up a stack frame, which may
      result in improper use of the kernel stack.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/book3s: Split the common exception prolog logic into two section. · b14a7253
      Mahesh Salgaonkar authored
      This patch splits the common exception prolog logic into three parts to
      facilitate reuse of existing code in the next patch. This patch also
      re-arranges a few instructions in such a way that the second part now deals
      with saving register values from paca save area to stack frame, and
      the third part deals with saving current register values to stack frame.
      
      The second and third part will be reused in the machine check exception
      routine in the subsequent patch.
      
      Please note that this patch does not introduce or change existing code
      logic. Instead it is just a code movement and instruction re-ordering.
      
      This patch was Acked-by Paul, but some minor modifications (explained above)
      were made to address Paul's comment on the later patch (3).
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 02 Dec, 2013 12 commits
  3. 27 Nov, 2013 1 commit
  4. 25 Nov, 2013 5 commits
    • arch/powerpc/kernel: Use %12.12s instead of %12s to avoid memory overflow · e0513d9e
      Chen Gang authored
      For tmp_part->header.name:
          the field is documented as "Terminating null required only for names < 12 chars",
          so the printk needs to limit it with %.12s.

        Additional info:

          %12s  sets the minimum field width; it does not limit how much of the string is output.
                If the name is longer than 12, it is still displayed in full.
                If the name is shorter than 12, it is padded with ' ' before the name.

          %.12s truly limits the output length of the string (precision).
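
      The difference between width and precision can be checked in user
      space with snprintf(), which follows the same rules for %s. The helper
      name below is hypothetical; the 12-byte field mirrors the name field
      discussed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a fixed 12-byte name field that may lack a terminating NUL.
 * The precision (.12) caps how many bytes are read from name, so a full
 * 12-character name is never read past its end; the width (12) pads
 * shorter names with leading spaces. A plain %12s would keep reading
 * until it found a NUL byte. */
int sketch_format_name(char *out, size_t out_len, const char name[12])
{
	return snprintf(out, out_len, "%12.12s", name);
}
```

      With a 12-character name that has no terminator, exactly those 12
      characters are written; a short name such as "ppc" comes out
      right-aligned in a 12-character field.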
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/signals: Improved mark VSX not saved with small contexts fix · ec67ad82
      Michael Neuling authored
      In a recent patch:
        commit c13f20ac
        Author: Michael Neuling <mikey@neuling.org>
        powerpc/signals: Mark VSX not saved with small contexts
      
      We fixed an issue but an improved solution was later discussed after the patch
      was merged.
      
      Firstly, the original patch doesn't handle the 64bit signals case, which could
      also hit this issue (but has never been reported).
      
      Secondly, the original patch isn't clear what MSR VSX should be set to.  The
      new approach below always clears the MSR VSX bit (to indicate no VSX is in the
      context) and sets it only in the specific case where VSX is available (ie. when
      VSX has been used and the signal context passed has space to provide the
      state).
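
      The clear-then-conditionally-set approach can be sketched as a pure
      function on the MSR value. The bit position used for the VSX flag and
      the two boolean inputs are assumptions for this illustration, not
      taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed position of the VSX bit in the MSR, for this sketch only. */
#define SKETCH_MSR_VSX (1UL << 23)

/* Compute the MSR value to store in the signal context: always clear the
 * VSX bit first, then set it only when VSX state is actually provided. */
unsigned long sketch_signal_msr(unsigned long msr, bool used_vsx,
				bool ctx_has_vsx_space)
{
	msr &= ~SKETCH_MSR_VSX;
	if (used_vsx && ctx_has_vsx_space)
		msr |= SKETCH_MSR_VSX;
	return msr;
}
```

      Starting from a cleared bit means a small context can never be left
      claiming VSX state that it has no room to hold.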
      
      This reverts the original patch and replaces it with the improved solution.  It
      also adds a 64 bit version.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/kdump: Adding symbols in vmcoreinfo to facilitate dump filtering · 8ff81271
      Hari Bathini authored
      When the CONFIG_SPARSEMEM_VMEMMAP option is used in the kernel, makedumpfile fails
      to filter the vmcore dump as it fails to do vmemmap translations. So far
      dump filtering on ppc64 never had to deal with vmemmap addresses separately
      as vmemmap regions were mapped in zone normal. But with the inclusion of
      the CONFIG_SPARSEMEM_VMEMMAP config option in the kernel, this vmemmap address
      translation support becomes necessary for dump filtering. For vmemmap address
      translation, a few kernel symbols are needed by the dump filtering tool. This patch
      adds those symbols to vmcoreinfo, which a dump filtering tool can use for
      filtering the kernel dump. Tested these changes successfully with a makedumpfile
      tool that supports vmemmap to physical address translation outside zone normal.
      
      [ Removed unneeded #ifdef as suggested by Michael Ellerman --BenH ]
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: allyesconfig should not select CONFIG_CPU_LITTLE_ENDIAN · 962bc221
      Anton Blanchard authored
      Stephen reported a failure in an allyesconfig build.
      CONFIG_CPU_LITTLE_ENDIAN=y gets set but his toolchain is not
      new enough to support little endian. We really want to
      default to a big endian build; Ben suggested using a choice
      which defaults to CPU_BIG_ENDIAN.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix error when cross building TAGS & cscope · 924dd50b
      Michael Neuling authored
      Currently if I cross build TAGS or cscope from x86 I get this:
        % make ARCH=powerpc TAGS
        gcc-4.8.real: error: unrecognized command line option ‘-mbig-endian’
        GEN     TAGS
        %
      
      I'm not setting CROSS_COMPILE= as logically I shouldn't need to and I
      haven't needed to in the past when building TAGS or cscope.  Also, the
      above completes correctly as the error is not fatal to the build.
      
      This was caused by:
          commit d72b0801
          Author: Ian Munsie <imunsie@au1.ibm.com>
          powerpc: Add ability to build little endian kernels
      
      The below fixes this by testing for the -mbig-endian option before
      adding it.
      
      I've not done the same thing in the little endian case as if
      -mlittle-endian doesn't exist, we probably want to fail quickly as you
      probably have an old big endian compiler.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 24 Nov, 2013 1 commit
  6. 22 Nov, 2013 3 commits