Commits · 6e320ec1d98f9eb93d5b2a5d70e2f40dce923f1b · Kirill Smelkov / linux

20 May, 2010 13 commits

ACPI, APEI, EINJ injection parameters support · 6e320ec1

Huang Ying authored May 18, 2010

Some hardware error injection needs parameters, for example, it is
useful to specify memory address and memory address mask for memory
errors.

Some BIOSes allow parameters to be specified via an unpublished
extension. This patch adds support to it. The parameters will be
ignored on machines without necessary BIOS support.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

6e320ec1

Add x64 support to debugfs · 15b0beaa

Huang Ying authored May 18, 2010

Add debugfs_create_x64. This is needed by ACPI APEI EINJ parameters support.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>

15b0beaa

ACPI, APEI, Use ERST for persistent storage of MCE · 482908b4

Huang Ying authored May 18, 2010

Traditionally, fatal MCE will cause Linux print error log to console
then reboot. Because MCE registers will preserve their content after
warm reboot, the hardware error can be logged to disk or network after
reboot. But system may fail to warm reboot, then you may lose the
hardware error log. ERST can help here. Through saving the hardware
error log into flash via ERST before go panic, the hardware error log
can be gotten from the flash after system boot successful again.

The fatal MCE processing procedure with ERST involved is as follow:

- Hardware detect error, MCE raised
- MCE read MCE registers, check error severity (fatal), prepare error record
- Write MCE error record into flash via ERST
- Go panic, then trigger system reboot
- System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
  for error record of previous boot via ERST, and output and clear
  them if available
- /sbin/mcelog logs error records into disk or network

ERST only accepts CPER record format, but there is no pre-defined CPER
section can accommodate all information in struct mce, so a customized
section type is defined to hold struct mce inside a CPER record as an
error section.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

482908b4

ACPI, APEI, Error Record Serialization Table (ERST) support · a08f82d0

Huang Ying authored May 18, 2010

ERST is a way provided by APEI to save and retrieve hardware error
record to and from some simple persistent storage (such as flash).

The Linux kernel support implementation is quite simple and workable
in NMI context. So it can be used to save hardware error record into
flash in hardware error exception or NMI handler, where other more
complex persistent storage such as disk is not usable. After saving
hardware error records via ERST in hardware error exception or NMI
handler, the error records can be retrieved and logged into disk or
network after a clean reboot.

For more information about ERST, please refer to ACPI Specification
version 4.0, section 17.4.

This patch incorporate fixes from Jin Dongming.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Len Brown <len.brown@intel.com>

a08f82d0

ACPI, APEI, Generic Hardware Error Source memory error support · d334a491

Huang Ying authored May 18, 2010

Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

Now, only SCI notification type and memory errors are supported. More
notification type and hardware error type will be added later. These
memory errors are reported to user space through /dev/mcelog via
faking a corrected Machine Check, so that the error memory page can be
offlined by /sbin/mcelog if the error count for one page is beyond the
threshold.

On some machines, Machine Check can not report physical address for
some corrected memory errors, but GHES can do that. So this simplified
GHES is implemented firstly.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

d334a491

ACPI, APEI, UEFI Common Platform Error Record (CPER) header · 06d65dea

Huang Ying authored May 18, 2010

CPER stands for Common Platform Error Record, it is the hardware error
record format used to describe platform hardware error by various APEI
tables, such as ERST, BERT and HEST etc.

For more information about CPER, please refer to Appendix N of UEFI
Specification version 2.3.

This patch mainly includes the data structure difinition header file
used by other files.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

06d65dea

Unified UUID/GUID definition · fab1c232

Huang Ying authored May 18, 2010

There are many different UUID/GUID definitions in kernel, such as that
in EFI, many file systems, some drivers, etc. Every kernel components
need UUID/GUID has its own definition. This patch provides a unified
definition for UUID/GUID.

UUID is defined via typedef. This makes that UUID appears more like a
preliminary type, and makes the data type explicit (comparing with
implicit "u8 uuid[16]").

The binary representation of UUID/GUID can be little-endian (used by
EFI, etc) or big-endian (defined by RFC4122), so both is defined.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

fab1c232

ACPI Hardware Error Device (PNP0C33) support · 801eab81

Huang Ying authored May 18, 2010

Hardware Error Device (PNP0C33) is used to report some hardware errors
notified via SCI, mainly the corrected errors. Some APEI Generic
Hardware Error Source (GHES) may use SCI on hardware error device to
notify hardware error to kernel.

After receiving notification from ACPI core, it is forwarded to all
listeners via a notifier chain. The listener such as APEI GHES should
check corresponding error source for new events when notified.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

801eab81

ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup · affb72c3

Huang Ying authored May 18, 2010

Now, a dedicated HEST tabling parsing code is used for PCIE AER
firmware_first setup. It is rebased on general HEST tabling parsing
code of APEI. The firmware_first setup code is moved from PCI core to
AER driver too, because it is only AER related.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Len Brown <len.brown@intel.com>

affb72c3

ACPI, APEI, Document for APEI · ea8c071c

Huang Ying authored May 18, 2010

Add document for APEI, including kernel parameters and EINJ debug file
sytem interface.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ea8c071c

ACPI, APEI, EINJ support · e4021345

Huang Ying authored May 18, 2010

EINJ provides a hardware error injection mechanism, this is useful for
debugging and testing of other APEI and RAS features.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

e4021345

ACPI, APEI, HEST table parsing · 9dc96664

Huang Ying authored May 18, 2010

HEST describes error sources in detail; communicating operational
parameters (i.e. severity levels, masking bits, and threshold values)
to OS as necessary. It also allows the platform to report error
sources for which OS would typically not implement support (for
example, chipset-specific error registers).

HEST information may be needed by other subsystems. For example, HEST
PCIE AER error source information describes whether a PCIE root port
works in "firmware first" mode, this is needed by general PCIE AER
error subsystem. So a public HEST tabling parsing interface is
provided.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

9dc96664

ACPI, APEI, APEI supporting infrastructure · a643ce20

Huang Ying authored May 18, 2010

APEI stands for ACPI Platform Error Interface, which allows to report
errors (for example from the chipset) to the operating system. This
improves NMI handling especially. In addition it supports error
serialization and error injection.

For more information about APEI, please refer to ACPI Specification
version 4.0, chapter 17.

This patch provides some common functions used by more than one APEI
tables, mainly framework of interpreter for EINJ and ERST.

A machine readable language is defined for EINJ and ERST for OS to
execute, and so to drive the firmware to fulfill the corresponding
functions. The machine language for EINJ and ERST is compatible, so a
common framework is defined for them.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

a643ce20

19 May, 2010 1 commit

ACPI, IO memory pre-mapping and atomic accessing · 15651291

Huang Ying authored May 18, 2010

Some ACPI IO accessing need to be done in atomic context. For example,
APEI ERST operations may be used for permanent storage in hardware
error handler. That is, it may be called in atomic contexts such as
IRQ or NMI, etc. And, ERST/EINJ implement their operations via IO
memory/port accessing. But the IO memory accessing method provided by
ACPI (acpi_read/acpi_write) maps the IO memory during it is accessed,
so it can not be used in atomic context. To solve the issue, the IO
memory should be pre-mapped during EINJ/ERST initializing. A linked
list is used to record which memory area has been mapped, when memory
is accessed in hardware error handler, search the linked list for the
mapped virtual address from the given physical address.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

15651291

16 May, 2010 6 commits

Linus 2.6.34 · e40152ee
Linus Torvalds authored May 16, 2010

e40152ee

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · b5dbc858

Linus Torvalds authored May 16, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  rtnetlink: make SR-IOV VF interface symmetric
  sctp: delete active ICMP proto unreachable timer when free transport
  tcp: fix MD5 (RFC2385) support

b5dbc858

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus · d34e14f6

Linus Torvalds authored May 16, 2010

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Oprofile: Fix Loongson irq handler
  MIPS: N32: Use compat version for sys_ppoll.
  MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1

d34e14f6

rtnetlink: make SR-IOV VF interface symmetric · c02db8c6

Chris Wright authored May 16, 2010

Now we have a set of nested attributes:

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE

This allows a single set to operate on multiple attributes if desired.
Among other things, it means a dump can be replayed to set state.

The current interface has yet to be released, so this seems like
something to consider for 2.6.34.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c02db8c6

sctp: delete active ICMP proto unreachable timer when free transport · 55fa0cfd

Wei Yongjun authored May 09, 2010

transport may be free before ICMP proto unreachable timer expire, so
we should delete active ICMP proto unreachable timer when transport
is going away.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55fa0cfd

tcp: fix MD5 (RFC2385) support · 35790c04

Eric Dumazet authored May 16, 2010

TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.

We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.

Fix is to disable preemption and BH.
Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35790c04

15 May, 2010 17 commits

MIPS: Oprofile: Fix Loongson irq handler · 4e73238d

Wu Zhangjin authored May 07, 2010

    
    The interrupt enable bit for the performance counters is in the Control
    Register $24, not in the counter register.
    loongson2_perfcount_handler(), we need to use
Reported-by: Xu Hengyang <hengyang@mail.ustc.edu.cn>
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
    Cc: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1198/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

---

4e73238d

MIPS: N32: Use compat version for sys_ppoll. · 46afb829

Chandrakala Chavva authored May 10, 2010

    
    The sys_ppoll() takes struct 'struct timespec'. This is different for the
    N32 and N64 ABIs. Use the compat version to do the proper conversions.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
    To: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1210/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

---

46afb829

MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1 · 95e8f634

Shane McDonald authored May 06, 2010

In the FPU emulator code of the MIPS, the Cause bits of the FCSR register
are not currently writeable by the ctc1 instruction. In odd corner cases,
this can cause problems. For example, a case existed where a divide-by-zero
exception was generated by the FPU, and the signal handler attempted to
restore the FPU registers to their state before the exception occurred. In
this particular setup, writing the old value to the FCSR register would
cause another divide-by-zero exception to occur immediately. The solution
is to change the ctc1 instruction emulator code to allow the Cause bits of
the FCSR register to be writeable. This is the behaviour of the hardware
that the code is emulating.

This problem was found by Shane McDonald, but the credit for the fix goes
to Kevin Kissell. In Kevin's words:

I submit that the bug is indeed in that ctc_op: case of the emulator. The
Cause bits (17:12) are supposed to be writable by that instruction, but the
CTC1 emulation won't let them be updated by the instruction. I think that
actually if you just completely removed lines 387-388 [...] things would
work a good deal better. At least, it would be a more accurate emulation of
the architecturally defined FPU. If I wanted to be really, really pedantic
(which I sometimes do), I'd also protect the reserved bits that aren't
necessarily writable.
Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
To: anemo@mba.ocn.ne.jp
To: kevink@paralogos.com
To: sshtylyov@mvista.com
Patchwork: http://patchwork.linux-mips.org/patch/1205/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

---

95e8f634

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · 18e41da8

Linus Torvalds authored May 15, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: check for read permission on src file in the clone ioctl

18e41da8

lib/btree: fix possible NULL pointer dereference · 43aa7ac7

kirjanov@gmail.com authored May 15, 2010

mempool_alloc() can return null in atomic case.
Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

43aa7ac7

mmc: at91_mci: modify cache flush routines · bdef2fe8

Nicolas Ferre authored May 15, 2010

As we were using an internal dma flushing routine, this patch changes to
the DMA API flush_kernel_dcache_page().  Driver is able to compile now.

[akpm@linux-foundation.org: flush_kernel_dcache_page() comes before kunmap_atomic()]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bdef2fe8

Btrfs: check for read permission on src file in the clone ioctl · 5dc64164

Dan Rosenberg authored May 15, 2010

The existing code would have allowed you to clone a file that was
only open for writing
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5dc64164

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 3f8bf8f0

Linus Torvalds authored May 15, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  JFS: Free sbi memory in error path
  fs/sysv: dereferencing ERR_PTR()
  Fix double-free in logfs
  Fix the regression created by "set S_DEAD on unlink()..." commit

3f8bf8f0

Merge branch 'perf-fixes-for-linus' of... · c28f3f86

Linus Torvalds authored May 15, 2010

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf record: Add a fallback to the reference relocation symbol

c28f3f86

JFS: Free sbi memory in error path · 684bdc7f

Jan Blunck authored Apr 12, 2010

I spotted the missing kfree() while removing the BKL.

[akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

684bdc7f

fs/sysv: dereferencing ERR_PTR() · 404e7812

Dan Carpenter authored Apr 21, 2010

I moved the dir_put_page() inside the if condition so we don't dereference
"page", if it's an ERR_PTR().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

404e7812

Fix double-free in logfs · 26562449

Al Viro authored Apr 28, 2010

iput() is needed *until* we'd done successful d_alloc_root()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26562449

Fix the regression created by "set S_DEAD on unlink()..." commit · d83c49f3

Al Viro authored Apr 30, 2010

1) i_flags simply doesn't work for mount/unlink race prevention;
we may have many links to file and rm on one of those obviously
shouldn't prevent bind on top of another later on.  To fix it
right way we need to mark _dentry_ as unsuitable for mounting
upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
i_mutex on the inode in question.  Set it (with dont_mount(dentry))
in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
in namespace.c that used to check for S_DEAD.  Setting S_DEAD
is still needed in places where we used to set it (for directories
getting killed), since we rely on it for readdir/rmdir race
prevention.

2) rename()/mount() protection has another bogosity - we unhash
the target before we'd checked that it's not a mountpoint.  Fixed.

3) ancient bogosity in pivot_root() - we locked i_mutex on the
right directory, but checked S_DEAD on the different (and wrong)
one.  Noticed and fixed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d83c49f3

Merge master.kernel.org:/home/rmk/linux-2.6-arm · bfcf1ae2

Linus Torvalds authored May 14, 2010

* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6126/1: ARM mpcore_wdt: fix build failure and other fixes
  ARM: 6125/1: ARM TWD: move TWD registers to common header
  ARM: 6110/1: Fix Thumb-2 kernel builds when UACCESS_WITH_MEMCPY is enabled
  ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
  ARM: 6111/1: Implement read/write for ownership in the ARMv6 DMA cache ops
  ARM: 6106/1: Implement copy_to_user_page() for noMMU
  ARM: 6105/1: Fix the __arm_ioremap_caller() definition in nommu.c

bfcf1ae2

Merge branch 'x86-fixes-for-linus' of... · ecbb458a

Linus Torvalds authored May 14, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mrst: Don't blindly access extended config space

ecbb458a

profile: fix stats and data leakage · 16a2164b

Hugh Dickins authored May 14, 2010

If the kernel is large or the profiling step small, /proc/profile
leaks data and readprofile shows silly stats, until readprofile -r
has reset the buffer: clear the prof_buffer when it is vmalloc()ed.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16a2164b

hughd: update email address · bfcc6e2e

Hugh Dickins authored May 14, 2010

My old address will shut down in a couple of weeks: update the tree.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bfcc6e2e

14 May, 2010 3 commits

x86, mrst: Don't blindly access extended config space · e9b1d5d0

H. Peter Anvin authored May 14, 2010

Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform.  The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.

This fixes booting certain VMware configurations with CONFIG_MRST=y.

Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.
Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>

e9b1d5d0

Merge branch 'x86-fixes-for-linus' of... · ef0e9180

Linus Torvalds authored May 14, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
  x86, k8: Fix build error when K8_NB is disabled
  x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
  x86: Fix fake apicid to node mapping for numa emulation

ef0e9180

x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments · 7f284d3c

Frank Arnold authored Apr 22, 2010

When running a quest kernel on xen we get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
 ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
 <IRQ>
 [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
 [<ffffffff81059140>] ? mod_timer+0x23/0x25
 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
 [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
 [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
 [<ffffffff810114eb>] ? default_idle+0x36/0x53
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81423a9a>] rest_init+0x7e/0x80
 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
 ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
 RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---

The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.

Check if northbridge devices are present and do not enable the feature
if there are none.

[ hpa: backported to 2.6.34 ]
Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>

7f284d3c