1. 28 May, 2010 2 commits
  2. 20 May, 2010 13 commits
    • Huang Ying's avatar
      ACPI, APEI, EINJ injection parameters support · 6e320ec1
      Huang Ying authored
      Some hardware error injection needs parameters, for example, it is
      useful to specify memory address and memory address mask for memory
      errors.
      
      Some BIOSes allow parameters to be specified via an unpublished
      extension. This patch adds support to it. The parameters will be
      ignored on machines without necessary BIOS support.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      6e320ec1
    • Huang Ying's avatar
      Add x64 support to debugfs · 15b0beaa
      Huang Ying authored
      Add debugfs_create_x64. This is needed by ACPI APEI EINJ parameters support.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      15b0beaa
    • Huang Ying's avatar
      ACPI, APEI, Use ERST for persistent storage of MCE · 482908b4
      Huang Ying authored
      Traditionally, fatal MCE will cause Linux print error log to console
      then reboot. Because MCE registers will preserve their content after
      warm reboot, the hardware error can be logged to disk or network after
      reboot. But system may fail to warm reboot, then you may lose the
      hardware error log. ERST can help here. Through saving the hardware
      error log into flash via ERST before go panic, the hardware error log
      can be gotten from the flash after system boot successful again.
      
      The fatal MCE processing procedure with ERST involved is as follow:
      
      - Hardware detect error, MCE raised
      - MCE read MCE registers, check error severity (fatal), prepare error record
      - Write MCE error record into flash via ERST
      - Go panic, then trigger system reboot
      - System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
        for error record of previous boot via ERST, and output and clear
        them if available
      - /sbin/mcelog logs error records into disk or network
      
      ERST only accepts CPER record format, but there is no pre-defined CPER
      section can accommodate all information in struct mce, so a customized
      section type is defined to hold struct mce inside a CPER record as an
      error section.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      482908b4
    • Huang Ying's avatar
      ACPI, APEI, Error Record Serialization Table (ERST) support · a08f82d0
      Huang Ying authored
      ERST is a way provided by APEI to save and retrieve hardware error
      record to and from some simple persistent storage (such as flash).
      
      The Linux kernel support implementation is quite simple and workable
      in NMI context. So it can be used to save hardware error record into
      flash in hardware error exception or NMI handler, where other more
      complex persistent storage such as disk is not usable. After saving
      hardware error records via ERST in hardware error exception or NMI
      handler, the error records can be retrieved and logged into disk or
      network after a clean reboot.
      
      For more information about ERST, please refer to ACPI Specification
      version 4.0, section 17.4.
      
      This patch incorporate fixes from Jin Dongming.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      a08f82d0
    • Huang Ying's avatar
      ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Huang Ying authored
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as that from chipset). It works in so called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware firstly, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      Now, only SCI notification type and memory errors are supported. More
      notification type and hardware error type will be added later. These
      memory errors are reported to user space through /dev/mcelog via
      faking a corrected Machine Check, so that the error memory page can be
      offlined by /sbin/mcelog if the error count for one page is beyond the
      threshold.
      
      On some machines, Machine Check can not report physical address for
      some corrected memory errors, but GHES can do that. So this simplified
      GHES is implemented firstly.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      d334a491
    • Huang Ying's avatar
      ACPI, APEI, UEFI Common Platform Error Record (CPER) header · 06d65dea
      Huang Ying authored
      CPER stands for Common Platform Error Record, it is the hardware error
      record format used to describe platform hardware error by various APEI
      tables, such as ERST, BERT and HEST etc.
      
      For more information about CPER, please refer to Appendix N of UEFI
      Specification version 2.3.
      
      This patch mainly includes the data structure difinition header file
      used by other files.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      06d65dea
    • Huang Ying's avatar
      Unified UUID/GUID definition · fab1c232
      Huang Ying authored
      There are many different UUID/GUID definitions in kernel, such as that
      in EFI, many file systems, some drivers, etc. Every kernel components
      need UUID/GUID has its own definition. This patch provides a unified
      definition for UUID/GUID.
      
      UUID is defined via typedef. This makes that UUID appears more like a
      preliminary type, and makes the data type explicit (comparing with
      implicit "u8 uuid[16]").
      
      The binary representation of UUID/GUID can be little-endian (used by
      EFI, etc) or big-endian (defined by RFC4122), so both is defined.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      fab1c232
    • Huang Ying's avatar
      ACPI Hardware Error Device (PNP0C33) support · 801eab81
      Huang Ying authored
      Hardware Error Device (PNP0C33) is used to report some hardware errors
      notified via SCI, mainly the corrected errors. Some APEI Generic
      Hardware Error Source (GHES) may use SCI on hardware error device to
      notify hardware error to kernel.
      
      After receiving notification from ACPI core, it is forwarded to all
      listeners via a notifier chain. The listener such as APEI GHES should
      check corresponding error source for new events when notified.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      801eab81
    • Huang Ying's avatar
      ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup · affb72c3
      Huang Ying authored
      Now, a dedicated HEST tabling parsing code is used for PCIE AER
      firmware_first setup. It is rebased on general HEST tabling parsing
      code of APEI. The firmware_first setup code is moved from PCI core to
      AER driver too, because it is only AER related.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Reviewed-by: default avatarHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      affb72c3
    • Huang Ying's avatar
      ACPI, APEI, Document for APEI · ea8c071c
      Huang Ying authored
      Add document for APEI, including kernel parameters and EINJ debug file
      sytem interface.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      ea8c071c
    • Huang Ying's avatar
      ACPI, APEI, EINJ support · e4021345
      Huang Ying authored
      EINJ provides a hardware error injection mechanism, this is useful for
      debugging and testing of other APEI and RAS features.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      e4021345
    • Huang Ying's avatar
      ACPI, APEI, HEST table parsing · 9dc96664
      Huang Ying authored
      HEST describes error sources in detail; communicating operational
      parameters (i.e. severity levels, masking bits, and threshold values)
      to OS as necessary. It also allows the platform to report error
      sources for which OS would typically not implement support (for
      example, chipset-specific error registers).
      
      HEST information may be needed by other subsystems. For example, HEST
      PCIE AER error source information describes whether a PCIE root port
      works in "firmware first" mode, this is needed by general PCIE AER
      error subsystem. So a public HEST tabling parsing interface is
      provided.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      9dc96664
    • Huang Ying's avatar
      ACPI, APEI, APEI supporting infrastructure · a643ce20
      Huang Ying authored
      APEI stands for ACPI Platform Error Interface, which allows to report
      errors (for example from the chipset) to the operating system. This
      improves NMI handling especially. In addition it supports error
      serialization and error injection.
      
      For more information about APEI, please refer to ACPI Specification
      version 4.0, chapter 17.
      
      This patch provides some common functions used by more than one APEI
      tables, mainly framework of interpreter for EINJ and ERST.
      
      A machine readable language is defined for EINJ and ERST for OS to
      execute, and so to drive the firmware to fulfill the corresponding
      functions. The machine language for EINJ and ERST is compatible, so a
      common framework is defined for them.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      a643ce20
  3. 19 May, 2010 1 commit
    • Huang Ying's avatar
      ACPI, IO memory pre-mapping and atomic accessing · 15651291
      Huang Ying authored
      Some ACPI IO accessing need to be done in atomic context. For example,
      APEI ERST operations may be used for permanent storage in hardware
      error handler. That is, it may be called in atomic contexts such as
      IRQ or NMI, etc. And, ERST/EINJ implement their operations via IO
      memory/port accessing.  But the IO memory accessing method provided by
      ACPI (acpi_read/acpi_write) maps the IO memory during it is accessed,
      so it can not be used in atomic context. To solve the issue, the IO
      memory should be pre-mapped during EINJ/ERST initializing. A linked
      list is used to record which memory area has been mapped, when memory
      is accessed in hardware error handler, search the linked list for the
      mapped virtual address from the given physical address.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      15651291
  4. 16 May, 2010 6 commits
  5. 15 May, 2010 17 commits
  6. 14 May, 2010 1 commit