1. 03 Aug, 2011 6 commits
    • HWPoison: add memory_failure_queue() · ea8f5fb8
      Huang Ying authored
      memory_failure() is the entry point for HWPoison memory error
      recovery.  It must be called in process context.  But commonly
      hardware memory errors are notified via MCE or NMI, so some delayed
      execution mechanism must be used.  In MCE handler, a work queue + ring
      buffer mechanism is used.
      
      In addition to MCE, now APEI (ACPI Platform Error Interface) GHES
      (Generic Hardware Error Source) can be used to report memory errors
      too.  To support APEI GHES memory recovery, a mechanism similar
      to that of MCE is implemented.  memory_failure_queue() is the new
      entry point that can be called in IRQ context.  The next step is to
      make the MCE handler use this interface too.
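
The deferral pattern described above can be sketched in user space as a fixed-size ring buffer that is filled in a context that must not sleep and drained later in a context that may. This is only an illustration under stated assumptions: the names mf_ring, mf_ring_put and mf_ring_drain, the capacity, and the entry layout are hypothetical and are not taken from the patch, and the sketch is single-threaded, so it omits the per-CPU handling and memory barriers a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MF_RING_SIZE 8  /* hypothetical capacity; must be a power of two */

struct mf_entry {
    unsigned long pfn;  /* page frame number of the failing page */
    int flags;
};

struct mf_ring {
    struct mf_entry slots[MF_RING_SIZE];
    unsigned int head;  /* count of entries ever written */
    unsigned int tail;  /* count of entries ever read */
};

/* Producer side: only copies data, never blocks. Returns false when full. */
static bool mf_ring_put(struct mf_ring *r, unsigned long pfn, int flags)
{
    if (r->head - r->tail == MF_RING_SIZE)
        return false;
    r->slots[r->head % MF_RING_SIZE].pfn = pfn;
    r->slots[r->head % MF_RING_SIZE].flags = flags;
    r->head++;
    return true;
}

/*
 * Consumer side: meant to run later, where the real recovery routine
 * could be called on each returned entry. Copies up to max entries
 * into out in FIFO order and returns how many were copied.
 */
static size_t mf_ring_drain(struct mf_ring *r, struct mf_entry *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (n < max && r->tail != r->head) {
        out[n++] = r->slots[r->tail % MF_RING_SIZE];
        r->tail++;
    }
    return n;
}
```

In this sketch the producer drops the event when the ring is full; whether to drop, overwrite, or log in that case is a design choice the commit message does not specify.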
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, GHES, Error records content based throttle · 152cef40
      Huang Ying authored
      printk is used by GHES to report hardware errors.  A ratelimit is
      enforced on the printk to avoid flooding the kernel log with
      hardware error reports, because there may be thousands or even
      millions of corrected hardware errors while the system is running.
      
      Currently, a simple scheme is used.  That is, the total number of
      hardware error reports is ratelimited.  This may cause some issues
      in practice.
      
      For example, suppose two kinds of hardware errors occur in a
      system.  One is a corrected memory error: because the faulty memory
      address is accessed frequently, there may be hundreds of error
      reports per second.  The other is a corrected PCIe AER error, which
      is reported once per second.  Because they share one ratelimit
      control structure, it is highly likely that only the memory error
      is reported.
      
      To avoid the above issue, an error record content based throttle
      algorithm is implemented in this patch.  After the first successful
      report, all identical error records are throttled for some time, to
      give other kinds of error records the opportunity to be reported.
      
      In the above example, the memory errors will be throttled for some
      time after being printked.  Then the PCIe AER error will be printked
      successfully.
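
One way to picture a content based throttle is a small cache of recently reported records: an identical record seen again within a time window is suppressed, while a different record gets its own slot. The sketch below is user-space C under stated assumptions. The cache size, the window length, the abstract tick counter, the byte-wise comparison, and all names (throttle_cache, throttle_should_report) are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define THROTTLE_SLOTS  4   /* hypothetical cache size */
#define THROTTLE_WINDOW 10  /* hypothetical suppression window, in ticks */
#define RECORD_MAX      64  /* hypothetical maximum record length */

struct throttle_slot {
    bool used;
    size_t len;
    unsigned long last_report;  /* tick of the last successful report */
    unsigned char data[RECORD_MAX];
};

struct throttle_cache {
    struct throttle_slot slots[THROTTLE_SLOTS];
};

/* Returns true if the record should be reported now, false if throttled. */
static bool throttle_should_report(struct throttle_cache *c, const void *rec,
                                   size_t len, unsigned long now)
{
    struct throttle_slot *victim = NULL;

    assert(len <= RECORD_MAX);
    for (size_t i = 0; i < THROTTLE_SLOTS; i++) {
        struct throttle_slot *s = &c->slots[i];

        if (s->used && s->len == len && memcmp(s->data, rec, len) == 0) {
            if (now - s->last_report < THROTTLE_WINDOW)
                return false;       /* same content, still inside the window */
            s->last_report = now;   /* window expired: report again */
            return true;
        }
        /* Remember a slot to reuse: prefer an unused one, else the oldest. */
        if (victim == NULL)
            victim = s;
        else if (victim->used &&
                 (!s->used || s->last_report < victim->last_report))
            victim = s;
    }

    /* Content not seen recently: record it and let it through. */
    victim->used = true;
    victim->len = len;
    victim->last_report = now;
    memcpy(victim->data, rec, len);
    return true;
}
```

With this scheme the frequent memory error from the example occupies one slot and is suppressed after its first report, so the once-per-second PCIe AER record still gets through.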
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99
      Huang Ying authored
      Some APEI GHES recoverable errors are reported via NMI, but printk is
      not safe in NMI context.
      
      To solve the issue, a lock-less memory allocator is used to allocate
      memory in the NMI handler; the error record is saved into the
      allocated memory and put on a lock-less list.  An irq_work is then
      used to defer the remaining work from NMI context to IRQ context.
      The irq_work handler removes nodes from the lock-less list, printks
      the error record, does further processing including the recovery
      operation, and then frees the memory.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • lib, Make gen_pool memory allocator lockless · 7f184275
      Huang Ying authored
      This version of the gen_pool memory allocator supports lockless
      operation.
      
      This makes it safe to use in NMI handlers and other special
      unblockable contexts that could otherwise deadlock on locks.  This is
      implemented by using atomic operations and retries on any conflicts.
      The disadvantage is that there may be livelocks in extreme cases.  For
      better scalability, one gen_pool allocator can be used for each CPU.
      
      The lockless operation only works if there is enough memory available.
      If new memory is added to the pool, a lock still has to be taken.  So
      any user relying on locklessness has to ensure that sufficient memory
      is preallocated.
      
      The basic atomic operation of this allocator is cmpxchg on long.  On
      architectures that don't have an NMI-safe cmpxchg implementation, the
      allocator can NOT be used in NMI handlers.  So code that uses the
      allocator in an NMI handler should depend on
      CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
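
The "atomic operations and retries on any conflicts" idea can be illustrated with a tiny preallocated pool whose free map is a single unsigned long updated by compare-and-swap. This is a user-space C11 sketch, not the gen_pool code: the one-word bitmap, the chunk size, and the names tiny_pool_init, tiny_pool_alloc and tiny_pool_free are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_CHUNKS (sizeof(unsigned long) * CHAR_BIT)  /* one bit per chunk */
#define CHUNK_SIZE  32                                   /* hypothetical size */

struct tiny_pool {
    _Atomic unsigned long bitmap;  /* bit i set => chunk i is in use */
    unsigned char mem[POOL_CHUNKS][CHUNK_SIZE];  /* preallocated backing store */
};

static void tiny_pool_init(struct tiny_pool *p)
{
    atomic_init(&p->bitmap, 0UL);
}

/* Lock-free allocation: claim the lowest clear bit, retrying on conflict. */
static void *tiny_pool_alloc(struct tiny_pool *p)
{
    unsigned long old = atomic_load(&p->bitmap);

    for (;;) {
        unsigned long free_bits = ~old;
        unsigned long bit;
        size_t idx = 0;

        if (free_bits == 0)
            return NULL;                       /* pool exhausted */
        bit = free_bits & (~free_bits + 1);    /* isolate lowest clear bit */
        if (atomic_compare_exchange_weak(&p->bitmap, &old, old | bit)) {
            while ((bit >> idx) != 1UL)
                idx++;
            return p->mem[idx];
        }
        /* CAS failed: old now holds the current bitmap, so just retry. */
    }
}

/* Lock-free release: clear the bit that belongs to this chunk. */
static void tiny_pool_free(struct tiny_pool *p, void *ptr)
{
    size_t idx = (size_t)((unsigned char *)ptr - &p->mem[0][0]) / CHUNK_SIZE;

    assert(idx < POOL_CHUNKS);
    atomic_fetch_and(&p->bitmap, ~(1UL << idx));
}
```

The retry loop has no upper bound, which is where the livelock caveat mentioned above comes from, and the fixed backing store reflects the requirement that enough memory be preallocated.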
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • lib, Add lock-less NULL terminated single list · f49f23ab
      Huang Ying authored
      cmpxchg is used to implement adding a new entry to the list, deleting
      all entries from the list, deleting the first entry of the list, and
      some other operations.
      
      Because this is a singly linked list, the tail cannot be accessed in
      O(1).
      
      If there are multiple producers and multiple consumers, llist_add can
      be used in producers and llist_del_all can be used in consumers.  They
      can work simultaneously without a lock.  But llist_del_first cannot be
      used here, because llist_del_first depends on list->first->next not
      changing as long as list->first does not change during its operation,
      and a llist_del_first, llist_add, llist_add (or llist_del_all,
      llist_add, llist_add) sequence in another consumer may violate that.
      
      If there are multiple producers and one consumer, llist_add can be
      used in producers and llist_del_all or llist_del_first can be used in
      the consumer.
      
      This can be summarized as follows:
      
                 |   add    | del_first |  del_all
       add       |    -     |     -     |     -
       del_first |          |     L     |     L
       del_all   |          |           |     -
      
      Here "-" means no lock is needed, while "L" means a lock is needed.
      
      The list entries deleted via llist_del_all can be traversed with a
      traversal function such as llist_for_each.  But the list entries
      cannot be traversed safely before they are deleted from the list.
      The deleted entries are ordered from the newest to the oldest added
      one.  If you want to traverse from the oldest to the newest, you must
      reverse the order yourself before traversing.
      
      The basic atomic operation of this list is cmpxchg on long.  On
      architectures that don't have an NMI-safe cmpxchg implementation, the
      list can NOT be used in NMI handlers.  So code that uses the list in
      an NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
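
The add and delete-all operations, and the newest-first order they produce, can be sketched with C11 atomics in user space. The names below (sketch_node, sketch_add, sketch_del_all, sketch_reverse) are hypothetical stand-ins chosen for this example, not the kernel functions, and the delete-all step uses a plain atomic exchange for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
    struct sketch_node *next;
};

struct sketch_head {
    _Atomic(struct sketch_node *) first;  /* NULL when the list is empty */
};

/* Push one node at the front; retried until the compare-and-swap succeeds. */
static void sketch_add(struct sketch_head *h, struct sketch_node *n)
{
    struct sketch_node *old = atomic_load(&h->first);

    do {
        n->next = old;  /* old is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Detach the whole chain at once; the caller then owns every node in it. */
static struct sketch_node *sketch_del_all(struct sketch_head *h)
{
    return atomic_exchange(&h->first, NULL);
}

/* The detached chain is newest-first; reverse it to get oldest-first. */
static struct sketch_node *sketch_reverse(struct sketch_node *n)
{
    struct sketch_node *prev = NULL;

    while (n != NULL) {
        struct sketch_node *next = n->next;

        n->next = prev;
        prev = n;
        n = next;
    }
    return prev;
}
```

Because the consumer takes the entire chain in one step, it never relies on first->next staying stable while other threads push, which is why this pairing needs no lock in the table above.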
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG · df013ffb
      Huang Ying authored
      cmpxchg() is widely used by lockless code, including NMI-safe lockless
      code.  But on some architectures, the cmpxchg() implementation is not
      NMI-safe; on these architectures the lockless code may need a
      spin_trylock_irqsave() based implementation.
      
      This patch adds a Kconfig option, ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
      NMI-safe lockless code can depend on it or provide a different
      implementation depending on it.
      
      On many architectures, cmpxchg is only NMI-safe for several specific
      operand sizes.  So the ARCH_HAVE_NMI_SAFE_CMPXCHG option defined in
      this patch only guarantees that cmpxchg is NMI-safe for
      sizeof(unsigned long).
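
As a rough illustration of "provide a different implementation depending on it", the user-space sketch below selects between a compare-and-swap path and a try-lock path at compile time. Only the macro name CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG comes from this patch; the counter, the function name event_count_inc, and the use of a C11 atomic_flag as a stand-in for spin_trylock_irqsave() are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* An unsigned long counter, the only operand size the option covers. */
static _Atomic unsigned long event_count;

#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
/* Stand-in for a spinlock that is only ever taken with a try-lock. */
static atomic_flag event_lock = ATOMIC_FLAG_INIT;
#endif

/*
 * Increment the counter from a context that must never spin on a lock.
 * Returns false if the update had to be skipped.
 */
static bool event_count_inc(void)
{
#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
    /* Lockless path: retry the compare-and-swap until it succeeds. */
    unsigned long old = atomic_load(&event_count);

    while (!atomic_compare_exchange_weak(&event_count, &old, old + 1))
        ;
    return true;
#else
    /* Fallback path: try the lock once and give up if it is busy. */
    if (atomic_flag_test_and_set(&event_lock))
        return false;
    atomic_store(&event_count, atomic_load(&event_count) + 1);
    atomic_flag_clear(&event_lock);
    return true;
#endif
}
```

The fallback can fail where the lockless path cannot, so callers of such code need a policy for the skipped case.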
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: Richard Henderson <rth@twiddle.net>
      CC: Mikael Starvik <starvik@axis.com>
      Acked-by: David Howells <dhowells@redhat.com>
      CC: Yoshinori Sato <ysato@users.sourceforge.jp>
      CC: Tony Luck <tony.luck@intel.com>
      CC: Hirokazu Takata <takata@linux-m32r.org>
      CC: Geert Uytterhoeven <geert@linux-m68k.org>
      CC: Michal Simek <monstr@monstr.eu>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      CC: Kyle McMartin <kyle@mcmartin.ca>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Chen Liqin <liqin.chen@sunplusct.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: Chris Zankel <chris@zankel.net>
      Signed-off-by: Len Brown <len.brown@intel.com>
  2. 14 Jul, 2011 10 commits
  3. 12 Jul, 2011 8 commits
  4. 11 Jul, 2011 16 commits