Commits · 83d435dd6e24b17955f91480f633bb39f0a7c2b4 · Kirill Smelkov / linux

02 Apr, 2013 3 commits

Fix example error_injection_tool · 83d435dd

Xie XiuQi authored Mar 29, 2013

I got a "sched_setaffinity:: Invalid argument" error when using
err_injection_tool to inject error on a system with over 32 cpus.

Error information when injecting an error on a system with over 32 cpus:
$ ./err_injection_tool -i
/sys/devices/system/cpu/cpu0/err_inject//err_type_info
Begine at Tue Mar 26 11:20:08 2013
Configurations:
On cpu32: loop=10, interval=5(s) err_type_info=4101,err_struct_info=95
Error sched_setaffinity:: Invalid argument
All done

This because there is overflow when calculating the cpumask: the
type of (1<<k) is int, while mask[j] is unsigned long. When k > 31,
(1<<k) is truncated to ZERO, resulting in a unexpected cpumask.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

83d435dd

Fix build error for numa_clear_node() under IA64 · eee46b3d

Yijing Wang authored Mar 21, 2013

numa_clear_node() function is not implemented under IA64,
it will be called in unmap_cpu_on_node() in mm/memory_hotplug.c.
This cause build error under IA64, this patch adds numa_clear_node()
in IA64 to fix this problem.

[Added __cpuinit notation to numa_clear_node() to keep linker happy -Tony]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

eee46b3d

Fix initialization of CMCI/CMCP interrupts · d303e9e9

Tony Luck authored Mar 20, 2013

Back 2010 during a revamp of the irq code some initializations
were moved from ia64_mca_init() to ia64_mca_late_init() in

	commit c75f2aa1
	Cannot use register_percpu_irq() from ia64_mca_init()

But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
	ia64_mca_cmc_vector_setup();       /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.

Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.
Reported-by: Fred Hartnett <fred.hartnett@hp.com>
Tested-by: Fred Hartnett <fred.hartnett@hp.com>
Cc: stable@kernel.org # v2.6.37+
Signed-off-by: Tony Luck <tony.luck@intel.com>

d303e9e9

19 Mar, 2013 9 commits

Change "select DMAR" to "select INTEL_IOMMU" · 96edc754

Paul Bolle authored Mar 05, 2013

Commit d3f13810 ("iommu: Rename the DMAR and INTR_REMAP config
options") changed all references to DMAR in Kconfig files to INTEL_IOMMU
(and, likewise, changed the references to CONFIG_DMAR everywhere else
to CONFIG_INTEL_IOMMU). That commit missed one "select DMAR" statement
in ia64's Kconfig file. Change that one too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Tony Luck <tony.luck@intel.com>

96edc754

Wrong asm register contraints in the kvm implementation · de53e9ca

Stephan Schreiber authored Mar 19, 2013

The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/kvm/vtlb.c.

I observed this on Kernel 3.2.35 but it is also true on the most
recent Kernel 3.9-rc1.

File arch/ia64/kvm/vtlb.c:

u64 guest_vhpt_lookup(u64 iha, u64 *pte)
{
	u64 ret;
	struct thash_data *data;

	data = __vtr_lookup(current_vcpu, iha, D_TLB);
	if (data != NULL)
		thash_vhpt_insert(current_vcpu, data->page_flags,
			data->itir, iha, D_TLB);

	asm volatile (
			"rsm psr.ic|psr.i;;"
			"srlz.d;;"
			"ld8.s r9=[%1];;"
			"tnat.nz p6,p7=r9;;"
			"(p6) mov %0=1;"
			"(p6) mov r9=r0;"
			"(p7) extr.u r9=r9,0,53;;"
			"(p7) mov %0=r0;"
			"(p7) st8 [%2]=r9;;"
			"ssm psr.ic;;"
			"srlz.d;;"
			"ssm psr.i;;"
			"srlz.d;;"
			: "=r"(ret) : "r"(iha), "r"(pte):"memory");

	return ret;
}

The list of output registers is
			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are iha, pte on the example.
If the predicate p7 is true, the 8th assembly instruction
			"(p7) mov %0=r0;"
is the first one which writes to a register which is maintained by the
register constraints; it sets %0. %0 means the first register operand;
it is ret here.
This instruction might overwrite the %2 register (pte) which is needed
by the next instruction:
			"(p7) st8 [%2]=r9;;"
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.

The attached patch  fixes the register operand constraints in
arch/ia64/kvm/vtlb.c.
The register constraints should be
			: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
The & means that GCC must not use any of the input registers to place
this output register in.

This is Debian bug#702639
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).

The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>

de53e9ca

Wrong asm register contraints in the futex implementation · 136f39dd

Stephan Schreiber authored Mar 19, 2013

The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/include/asm/futex.h.

I observed this on Kernel 3.2.23 but it is also true on the most
recent Kernel 3.9-rc1.

File arch/ia64/include/asm/futex.h:

static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
			      u32 oldval, u32 newval)
{
	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
		return -EFAULT;

	{
		register unsigned long r8 __asm ("r8");
		unsigned long prev;
		__asm__ __volatile__(
			"	mf;;					\n"
			"	mov %0=r0				\n"
			"	mov ar.ccv=%4;;				\n"
			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
			"[2:]"
			: "=r" (r8), "=r" (prev)
			: "r" (uaddr), "r" (newval),
			  "rO" ((long) (unsigned) oldval)
			: "memory");
		*uval = prev;
		return r8;
	}
}

The list of output registers is
			: "=r" (r8), "=r" (prev)
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are uaddr, newval, oldval on the
example.
The second assembly instruction
			"	mov %0=r0				\n"
is the first one which writes to a register; it sets %0 to 0. %0 means
the first register operand; it is r8 here. (The r0 is read-only and
always 0 on the Itanium; it can be used if an immediate zero value is
needed.)
This instruction might overwrite one of the other registers which are
still needed.
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.

The objdump utility can give us disassembly.
The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
look for a module that uses the funtion. This is the
cmpxchg_futex_value_locked() function in
kernel/futex.c:

static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
				      u32 uval, u32 newval)
{
	int ret;

	pagefault_disable();
	ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
	pagefault_enable();

	return ret;
}

Now the disassembly. At first from the Kernel package 3.2.23 which has
been compiled with GCC 4.4, remeber this Kernel seemed to work:
objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o

0000000000000230 <cmpxchg_futex_value_locked>:
      230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
      236:	80 40 0d 00 42 00 	            adds r8=40,r3
      23c:	00 00 04 00       	            nop.i 0x0;;
      240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
      246:	90 08 28 00 42 00 	            adds r9=1,r10
      24c:	00 00 04 00       	            nop.i 0x0;;
      250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
      256:	00 48 20 20 23 00 	            st4 [r8]=r9
      25c:	00 00 04 00       	            nop.i 0x0;;
      260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
      266:	00 00 00 02 00 00 	            nop.m 0x0
      26c:	02 08 f1 52       	            extr.u r16=r33,0,61
      270:	05 40 88 00 08 e0 	[MLX]       addp4 r8=r34,r0
      276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
      27c:	f1 f7 ff 65
      280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
      286:	00 00 00 02 00 c0 	            nop.m 0x0
      28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
      290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
      296:	00 00 00 02 00 40 	            nop.m 0x0
      29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
      2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
      2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
      2b6:	80 00 00 00 42 00 	            mov r8=r0
      2bc:	00 00 04 00       	            nop.i 0x0
      2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
      2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
      2cc:	00 00 04 00       	            nop.i 0x0;;
      2d0:	10 00 84 40 90 11 	[MIB]       st4 [r32]=r33
      2d6:	00 00 00 02 00 00 	            nop.i 0x0
      2dc:	20 00 00 40       	            br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
      2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
      2e6:	00 00 00 02 00 00 	            nop.m 0x0
      2ec:	00 00 04 00       	            nop.i 0x0;;
      2f0:	0b 58 20 1a 19 21 	[MMI]       adds r11=3208,r13;;
      2f6:	20 01 2c 20 20 00 	            ld4 r18=[r11]
      2fc:	00 00 04 00       	            nop.i 0x0;;
      300:	0b 88 fc 25 3f 23 	[MMI]       adds r17=-1,r18;;
      306:	00 88 2c 20 23 00 	            st4 [r11]=r17
      30c:	00 00 04 00       	            nop.i 0x0;;
      310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
      316:	00 00 00 02 00 80 	            nop.i 0x0
      31c:	08 00 84 00       	            br.ret.sptk.many b0;;

The lines
      2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
      2b6:	80 00 00 00 42 00 	            mov r8=r0
      2bc:	00 00 04 00       	            nop.i 0x0
      2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
      2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
      2cc:	00 00 04 00       	            nop.i 0x0;;
are the instructions of the assembly block.
The line
      2b6:	80 00 00 00 42 00 	            mov r8=r0
sets the r8 register to 0 and after that
      2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
is wrong.
What happened here is what I explained above: An input register is
overwritten which is still needed.
The register operand constraints in futex.h are wrong.

(The problem doesn't occur when the Kernel is compiled with GCC 4.6.)

The attached patch fixes the register operand constraints in futex.h.
The code after patching of it:

static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
			      u32 oldval, u32 newval)
{
	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
		return -EFAULT;

	{
		register unsigned long r8 __asm ("r8") = 0;
		unsigned long prev;
		__asm__ __volatile__(
			"	mf;;					\n"
			"	mov ar.ccv=%4;;				\n"
			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
			"[2:]"
			: "+r" (r8), "=&r" (prev)
			: "r" (uaddr), "r" (newval),
			  "rO" ((long) (unsigned) oldval)
			: "memory");
		*uval = prev;
		return r8;
	}
}

I also initialized the 'r8' var with the C programming language.
The _asm qualifier on the definition of the 'r8' var forces GCC to use
the r8 processor register for it.
I don't believe that we should use inline assembly for zeroing out a
local variable.
The constraint is
"+r" (r8)
what means that it is both an input register and an output register.
Note that the page fault handler will modify the r8 register which
will be the return value of the function.
The real fix is
"=&r" (prev)
The & means that GCC must not use any of the input registers to place
this output register in.

Patched the Kernel 3.2.23 and compiled it with GCC4.4:

0000000000000230 <cmpxchg_futex_value_locked>:
      230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
      236:	80 40 0d 00 42 00 	            adds r8=40,r3
      23c:	00 00 04 00       	            nop.i 0x0;;
      240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
      246:	90 08 28 00 42 00 	            adds r9=1,r10
      24c:	00 00 04 00       	            nop.i 0x0;;
      250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
      256:	00 48 20 20 23 00 	            st4 [r8]=r9
      25c:	00 00 04 00       	            nop.i 0x0;;
      260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
      266:	20 12 01 10 40 00 	            addp4 r34=r34,r0
      26c:	02 08 f1 52       	            extr.u r16=r33,0,61
      270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
      276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
      27c:	f1 f7 ff 65
      280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
      286:	00 00 00 02 00 c0 	            nop.m 0x0
      28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
      290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
      296:	00 00 00 02 00 40 	            nop.m 0x0
      29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
      2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
      2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
      2b0:	0b 00 00 00 22 00 	[MMI]       mf;;
      2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
      2bc:	00 00 04 00       	            nop.i 0x0;;
      2c0:	09 58 8c 42 11 10 	[MMI]       cmpxchg4.acq r11=[r33],r35,ar.ccv
      2c6:	00 00 00 02 00 00 	            nop.m 0x0
      2cc:	00 00 04 00       	            nop.i 0x0;;
      2d0:	10 00 2c 40 90 11 	[MIB]       st4 [r32]=r11
      2d6:	00 00 00 02 00 00 	            nop.i 0x0
      2dc:	20 00 00 40       	            br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
      2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
      2e6:	00 00 00 02 00 00 	            nop.m 0x0
      2ec:	00 00 04 00       	            nop.i 0x0;;
      2f0:	0b 88 20 1a 19 21 	[MMI]       adds r17=3208,r13;;
      2f6:	30 01 44 20 20 00 	            ld4 r19=[r17]
      2fc:	00 00 04 00       	            nop.i 0x0;;
      300:	0b 90 fc 27 3f 23 	[MMI]       adds r18=-1,r19;;
      306:	00 90 44 20 23 00 	            st4 [r17]=r18
      30c:	00 00 04 00       	            nop.i 0x0;;
      310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
      316:	00 00 00 02 00 80 	            nop.i 0x0
      31c:	08 00 84 00       	            br.ret.sptk.many b0;;

Much better.
There is a
      270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
which was generated by C code r8 = 0. Below
      2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
what means that oldval is no longer overwritten.

This is Debian bug#702641
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).

The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>

136f39dd

Remove cast for kmalloc return value · 7c13e0d1

Zhang Yanfei authored Mar 12, 2013

remove cast for kmalloc return value.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

7c13e0d1

Fix kexec oops when iosapic was removed · ffa90955

Hanjun Guo authored Mar 08, 2013

Iosapic hotplug was supported in IA64 code, but will lead to kexec oops
when iosapic was removed. here is the code logic:

iosapic_remove
  iosapic_free
    memset(&iosapic_lists[index], 0, sizeof(iosapic_lists[0]))
      iosapic_lists[index].addr was set to 0;

and then kexec a new kernel
kexec_disable_iosapic
  iosapic_write(rte->iosapic,..)
    __iosapic_write(iosapic->addr, reg, val);
      addr was set to 0 when iosapic_remove, and oops happened

The call trace is:
Starting new kernel
kexec[11336]: Oops 8804682956800 [1]
Modules linked in: raw(N) ipv6(N) acpi_cpufreq(N) binfmt_misc(N) fuse(N) nls_iso
8859_1(N) loop(N) ipmi_si(N) ipmi_devintf(N) ipmi_msghandler(N) mca_ereport(N) s
csi_ereport(N) nic_ereport(N) pcie_ereport(N) err_transport(N) nvlist(PN) dm_mod
(N) tpm_tis(N) tpm(N) ppdev(N) tpm_bios(N) serio_raw(N) i2c_i801(N) iTCO_wdt(N)
i2c_core(N) iTCO_vendor_support(N) sg(N) ioatdma(N) igb(N) mptctl(N) dca(N) parp
ort_pc(N) parport(N) container(N) button(N) usbhid(N) hid(N) uhci_hcd(N) ehci_hc
d(N) usbcore(N) sd_mod(N) crc_t10dif(N) ext3(N) mbcache(N) jbd(N) fan(N) process
or(N) ide_pci_generic(N) ide_core(N) ata_piix(N) libata(N) mptsas(N) mptscsih(N)
 mptbase(N) scsi_transport_sas(N) scsi_mod(N) thermal(N) thermal_sys(N) hwmon(N)

Supported: Yes, External

Pid: 11336, CPU 0, comm:                kexec
psr : 0000101009522030 ifs : 8000000000000791 ip  : [<a00000010004c160>]    Tain
ted: P          N  (2.6.32.12_RAS_V1R3C00B011)
ip is at kexec_disable_iosapic+0x120/0x1e0
unat: 0000000000000000 pfs : 0000000000000791 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a555a659
ldrs: 0000000000000000 ccv : 00000000ea3cf51e fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010004c150 b6  : a000000100012620 b7  : a00000010000cda0
f6  : 000000000000000000000 f7  : 1003e0000000002000000
f8  : 1003e0000000050000003 f9  : 1003e0000028fb97183cd
f10 : 1003ee9f380df3c548b67 f11 : 1003e00000000000000cc
r1  : a0000001016cf660 r2  : 0000000000000000 r3  : 0000000000000000
r8  : 0000001009526030 r9  : a000000100012620 r10 : e00000010053f600
r11 : c0000000fec34040 r12 : e00000078f76fd30 r13 : e00000078f760000
r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
r17 : 0000000000000000 r18 : 0000000000007fff r19 : 0000000000000000
r20 : 0000000000000000 r21 : e00000010053f590 r22 : a000000100cf0000
r23 : 0000000000000036 r24 : e0000007002f8a84 r25 : 0000000000000022
r26 : e0000007002f8a88 r27 : 0000000000000020 r28 : 0000000000000002
r29 : a0000001012c8c60 r30 : 0000000000000000 r31 : 0000000000322e49

Call Trace:
 [<a000000100018ca0>] show_stack+0x80/0xa0
                                sp=e00000078f76f8f0 bsp=e00000078f761380
 [<a000000100019300>] show_regs+0x640/0x920
                                sp=e00000078f76fac0 bsp=e00000078f761328
 [<a00000010002a130>] die+0x190/0x2e0
                                sp=e00000078f76fad0 bsp=e00000078f7612e8
 [<a000000100922fa0>] ia64_do_page_fault+0x840/0xb20
                                sp=e00000078f76fad0 bsp=e00000078f761288
 [<a00000010000d5c0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e00000078f76fb60 bsp=e00000078f761288
 [<a00000010004c160>] kexec_disable_iosapic+0x120/0x1e0
                                sp=e00000078f76fd30 bsp=e00000078f761200
 [<a000000100016970>] machine_shutdown+0x110/0x140
                                sp=e00000078f76fd30 bsp=e00000078f7611c8
 [<a000000100133530>] kernel_kexec+0xd0/0x120
                                sp=e00000078f76fd30 bsp=e00000078f7611a0
 [<a0000001000eca40>] sys_reboot+0x480/0x4e0
                                sp=e00000078f76fd30 bsp=e00000078f761128
 [<a00000010000d420>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000078f76fe30 bsp=e00000078f761120
Kernel panic - not syncing: Fatal exception

With Tony and Toshi's advice, the patch removes the "rte" from rte_list
when the iosapic was removed.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

ffa90955

iosapic: fix a minor typo in comments · c74edea3

Hanjun Guo authored Mar 08, 2013

describeinterrupts -> describe interrupts
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

c74edea3

Add WB/UC check for early_ioremap · a4279e62

Li, Zhen-Hua authored Mar 18, 2013

On ia64 system, the function early_ioremap returned an uncached memory
reference without checking whether this was consistent with existing
mappings. This causes efi error and the kernel failed during boot.  Add a
check to test whether memory has EFI_MEMORY_WB set.  Use the function
kern_mem_attribute() in early_iomap() function to provide appropriate
cacheable or uncacheable mapped address.

See the document Documentation/ia64/aliasing.txt for more details.
Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

a4279e62

Fix broken fsys_getppid() · deb60015

Eric W. Biederman authored Mar 18, 2013

In particular fsys_getppid always returns the ppid in the initial pid
namespace so it does not work for a process in a pid namespace.

Fix from Eric Biederman just removes the fast system call path.
While it is a little bit sad to see another one of these bite
the dust ... I can't imagine that getppid() is really on any
real applications critical path.
Signed-off-by: Tony Luck <tony.luck@intel.com>

deb60015

tiocx: check retval from bus_register() · d7c6797f

Jiri Kosina authored Mar 19, 2013

Properly check return value from bus_register() and propagate it out of
tiocx_init() in case of failure.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>

d7c6797f

17 Mar, 2013 4 commits

Linux 3.9-rc3 · a937536b
Linus Torvalds authored Mar 17, 2013

a937536b

perf,x86: fix link failure for non-Intel configs · 6c4d3bc9

David Rientjes authored Mar 17, 2013

Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:

	arch/x86/power/built-in.o: In function `restore_processor_state':
	(.text+0x45c): undefined reference to `perf_restore_debug_store'

Fix it by defining the dummy function appropriately.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6c4d3bc9

perf,x86: fix wrmsr_on_cpu() warning on suspend/resume · 2a6e06b2

Linus Torvalds authored Mar 17, 2013

Commit 1d9d8639 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.

init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.

This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2a6e06b2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 08637024

Linus Torvalds authored Mar 17, 2013

Pull btrfs fixes from Chris Mason:
 "Eric's rcu barrier patch fixes a long standing problem with our
  unmount code hanging on to devices in workqueue helpers.  Liu Bo
  nailed down a difficult assertion for in-memory extent mappings."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix warning of free_extent_map
  Btrfs: fix warning when creating snapshots
  Btrfs: return as soon as possible when edquot happens
  Btrfs: return EIO if we have extent tree corruption
  btrfs: use rcu_barrier() to wait for bdev puts at unmount
  Btrfs: remove btrfs_try_spin_lock
  Btrfs: get better concurrency for snapshot-aware defrag work

08637024

16 Mar, 2013 8 commits

Btrfs: fix warning of free_extent_map · 3b277594

Liu Bo authored Mar 15, 2013

Users report that an extent map's list is still linked when it's actually
going to be freed from cache.

The story is that

a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.

b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.

The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.

So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

3b277594

Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · e2043785

Linus Torvalds authored Mar 15, 2013

Pull kbuild fix from Michal Marek:
 "One fix for for make headers_install/headers_check to not require make
  3.81.  The requirement has been accidentally introduced in 3.7."

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: fix make headers_check with make 3.80

e2043785

Merge tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux · 23659587

Linus Torvalds authored Mar 15, 2013

Pull OpenRISC bug fixes from Jonas Bonn:

 - The GPIO descriptor work has exposed how broken the non-GPIOLIB bits
   for OpenRISC were.  We now require GPIOLIB as this is the preferred
   way forward.

 - The system.h split introduced a bug in llist.h for arches using
   asm-generic/cmpxchg.h directly, which is currently only OpenRISC.
   The patch here moves two defines from asm-generic/atomic.h to
   asm-generic/cmpxchg.h to make things work as they should.

 - The VIRT_TO_BUS selector was added for OpenRISC, but OpenRISC does
   not have the virt_to_bus methods, so there's a patch to remove it
   again.

* tag 'for-3.9-rc3' of git://openrisc.net/jonas/linux:
  openrisc: remove HAVE_VIRT_TO_BUS
  asm-generic: move cmpxchg*_local defs to cmpxchg.h
  openrisc: require gpiolib

23659587

Merge tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9e1a0aab

Linus Torvalds authored Mar 15, 2013

Pull char/misc fixes from Greg Kroah-Hartman:
 "Here are some tiny fixes for the w1 drivers and the final removal
  patch for getting rid of CONFIG_EXPERIMENTAL (all users of it are now
  gone from your tree, this just drops the Kconfig item itself.)

  All have been in the linux-next tree for a while"

* tag 'char-misc-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  final removal of CONFIG_EXPERIMENTAL
  w1: fix oops when w1_search is called from netlink connector
  w1-gpio: fix unused variable warning
  w1-gpio: remove erroneous __exit and __exit_p()
  ARM: w1-gpio: fix erroneous gpio requests

9e1a0aab

Merge tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5cd8846c

Linus Torvalds authored Mar 15, 2013

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes, as expected for the middle rc:
   - A couple of fixes for potential NULL dereferences and out-of-range
     array accesses revealed by static code parsers
   - A fix for the wrong error handling detected by trinity
   - A regression fix for missing audio on some MacBooks
   - CA0132 DSP loader fixes
   - Fix for EAPD control of IDT codecs on machines w/o speaker
   - Fix a regression in the HD-audio widget list parser code
   - Workaround for the NuForce UDH-100 USB audio"

* tag 'sound-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix missing EAPD/GPIO setup for Cirrus codecs
  sound: sequencer: cap array index in seq_chn_common_event()
  ALSA: hda/ca0132 - Remove extra setting of dsp_state.
  ALSA: hda/ca0132 - Check download state of DSP.
  ALSA: hda/ca0132 - Check if dspload_image succeeded.
  ALSA: hda - Disable IDT eapd_switch if there are no internal speakers
  ALSA: hda - Fix snd_hda_get_num_raw_conns() to return a correct value
  ALSA: usb-audio: add a workaround for the NuForce UDH-100
  ALSA: asihpi - fix potential NULL pointer dereference
  ALSA: seq: Fix missing error handling in snd_seq_timer_open()

5cd8846c

Merge branch 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping · c7f17deb

Linus Torvalds authored Mar 15, 2013

Pull DMA-mapping fix from Marek Szyprowski:
 "An important fix for all ARM architectures which use ZONE_DMA.
  Without it dma_alloc_* calls with GFP_ATOMIC flag might have allocated
  buffers outsize DMA zone."

* 'fixes-for-3.9' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: DMA-mapping: add missing GFP_DMA flag for atomic buffer allocation

c7f17deb

Merge tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes · de1893f6

Linus Torvalds authored Mar 15, 2013

Pull MFD fixes from Samuel Ortiz:
 "This is the first batch of MFD fixes for 3.9.

  With this one we have:

   - An ab8500 build failure fix.
   - An ab8500 device tree parsing fix.
   - A fix for twl4030_madc remove routine to work properly (when
     built-in).
   - A fix for properly registering palmas interrupt handler.
   - A fix for omap-usb init routine to actually write into the
     hostconfig register.
   - A couple of warning fixes for ab8500-gpadc and tps65912"

* tag 'mfd-fixes-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
  mfd: twl4030-madc: Remove __exit_p annotation
  mfd: ab8500: Kill "reg" property from binding
  mfd: ab8500-gpadc: Complain if we fail to enable vtvout LDO
  mfd: wm831x: Don't forward declare enum wm831x_auxadc
  mfd: twl4030-audio: Fix argument type for twl4030_audio_disable_resource()
  mfd: tps65912: Declare and use tps65912_irq_exit()
  mfd: palmas: Provide irq flags through DT/platform data
  mfd: Make AB8500_CORE select POWER_SUPPLY to fix build error
  mfd: omap-usb-host: Actually update hostconfig

de1893f6

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 92fbb1c9

Linus Torvalds authored Mar 15, 2013

Pull hwmon fixes from Guenter Roeck:
 "Bug fixes for pmbus, ltc2978, and lineage-pem drivers

  Added specific maintainer for some hwmon drivers"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus/ltc2978) Fix temperature reporting
  hwmon: (pmbus) Fix krealloc() misuse in pmbus_add_attribute()
  hwmon: (lineage-pem) Add missing terminating entry for pem_[input|fan]_attributes
  MAINTAINERS: Add maintainer for MAX6697, INA209, and INA2XX drivers

92fbb1c9

15 Mar, 2013 8 commits

perf,x86: fix kernel crash with PEBS/BTS after suspend/resume · 1d9d8639

Stephane Eranian authored Mar 15, 2013

This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1d9d8639

ALSA: hda - Fix missing EAPD/GPIO setup for Cirrus codecs · 6d3073e1

Takashi Iwai authored Mar 15, 2013

During the transition to the generic parser, the hook to the codec
specific automute function was forgotten.  This resulted in the silent
output on some MacBooks.
Signed-off-by: Takashi Iwai <tiwai@suse.de>

6d3073e1

sound: sequencer: cap array index in seq_chn_common_event() · 57220bc1

Dan Carpenter authored Mar 15, 2013

"chn" here is a number between 0 and 255, but ->chn_info[] only has
16 elements so there is a potential write beyond the end of the
array.

If the seq_mode isn't SEQ_2 then we let the individual drivers
(either opl3.c or midi_synth.c) handle it.  Those functions all
do a bounds check on "chn" so I haven't changed anything here.
The opl3.c driver has up to 18 channels and not 16.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

57220bc1

mfd: twl4030-madc: Remove __exit_p annotation · 03715410

Arnd Bergmann authored Mar 14, 2013

4740f73f "mfd: remove use of __devexit" removed the __devexit annotation
on the twl4030_madc_remove function, but left an __exit_p() present on the
pointer to this function. Using __exit_p was as wrong with the devexit in
place as it is now, but now we get a gcc warning about an unused function.

In order for the twl4030_madc_remove to work correctly in built-in code, we
have to remove the __exit_p.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

03715410

ALSA: hda/ca0132 - Remove extra setting of dsp_state. · b714a710

Dylan Reid authored Mar 14, 2013

spec->dsp_state is initialized to DSP_DOWNLOAD_INIT, no need to reset
and check it in ca0132_download_dsp().
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

b714a710

ALSA: hda/ca0132 - Check download state of DSP. · e8f1bd5d

Dylan Reid authored Mar 14, 2013

Instead of using the dspload_is_loaded() function, check the dsp_state
that is kept in the spec.  The dspload_is_loaded() function returns
true if the DSP transfer was never started.  This false-positive leads
to multiple second delays when ca0132_setup_efaults() times out on
each write.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

e8f1bd5d

ALSA: hda/ca0132 - Check if dspload_image succeeded. · d1d28500

Dylan Reid authored Mar 14, 2013

If dspload_image() fails, it was ignored and dspload_wait_loaded() was
still called.  dsp_loaded should never be set to true in this case,
skip it.  The check in dspload_wait_loaded() return true if the DSP is
loaded or if it never started.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

d1d28500

mm/fremap.c: fix possible oops on error path · a2362d24

Michel Lespinasse authored Mar 14, 2013

The vm_flags introduced in 6d7825b1 ("mm/fremap.c: fix oops on error
path") is supposed to avoid a compiler warning about unitialized
vm_flags without changing the generated code.

However I am concerned that this is going to be very brittle, and fail
with some compiler versions. The failure could be either of:

- compiler could actually load vma->vm_flags before checking for the
  !vma condition, thus reintroducing the oops

- compiler could optimize out the !vma check, since the pointer just got
  dereferenced shortly before (so the compiler knows it can't be NULL!)

I propose reversing this part of the change and initializing vm_flags to 0
just to avoid the bogus uninitialized use warning.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a2362d24

14 Mar, 2013 8 commits

Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · f4846e52

Linus Torvalds authored Mar 14, 2013

Pull fix for hlist_entry_safe() regression from Paul McKenney:
 "This contains a single commit that fixes a regression in
  hlist_entry_safe().  This macro references its argument twice, which
  can cause NULL-pointer errors.  This commit applies a gcc statement
  expression, creating a temporary variable to avoid the double
  reference.  This has been posted to LKML at

    https://lkml.org/lkml/2013/3/9/75.

  Kudos to CAI Qian, whose testing uncovered this, to Eric Dumazet, who
  spotted root cause, and to Li Zefan, who tested this commit."

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  list: Fix double fetch of pointer in hlist_entry_safe()

f4846e52

list: Fix double fetch of pointer in hlist_entry_safe() · f65846a1

Paul E. McKenney authored Mar 09, 2013

The current version of hlist_entry_safe() fetches the pointer twice,
once to test for NULL and the other to compute the offset back to the
enclosing structure.  This is OK for normal lock-based use because in
that case, the pointer cannot change.  However, when the pointer is
protected by RCU (as in "rcu_dereference(p)"), then the pointer can
change at any time.  This use case can result in the following sequence
of events:

1.	CPU 0 invokes hlist_entry_safe(), fetches the RCU-protected
	pointer as sees that it is non-NULL.

2.	CPU 1 invokes hlist_del_rcu(), deleting the entry that CPU 0
	just fetched a pointer to.  Because this is the last entry
	in the list, the pointer fetched by CPU 0 is now NULL.

3.	CPU 0 refetches the pointer, obtains NULL, and then gets a
	NULL-pointer crash.

This commit therefore applies gcc's "({ })" statement expression to
create a temporary variable so that the specified pointer is fetched
only once, avoiding the above sequence of events.  Please note that
it is the caller's responsibility to use rcu_dereference() as needed.
This allows RCU-protected uses to work correctly without imposing
any additional overhead on the non-RCU case.

Many thanks to Eric Dumazet for spotting root cause!
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Li Zefan <lizefan@huawei.com>

f65846a1

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 40e4591d

Linus Torvalds authored Mar 14, 2013

Pull ext2, ext3, reiserfs, quota fixes from Jan Kara:
 "A fix for regression in ext2, and a format string issue in ext3.  The
  rest isn't too serious."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix BUG_ON in evict() on inode deletion
  reiserfs: Use kstrdup instead of kmalloc/strcpy
  ext3: Fix format string issues
  quota: add missing use of dq_data_lock in __dquot_initialize

40e4591d

Btrfs: fix warning when creating snapshots · 7c2ec3f0

Liu Bo authored Mar 13, 2013

Creating snapshot passes extent_root to commit its transaction,
but it can lead to the warning of checking root for quota in
the __btrfs_end_transaction() when someone else is committing
the current transaction.  Since we've recorded the needed root
in trans_handle, just use it to get rid of the warning.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

7c2ec3f0

Btrfs: return as soon as possible when edquot happens · 720f1e20

Wang Shilong authored Mar 06, 2013

If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

720f1e20

Btrfs: return EIO if we have extent tree corruption · 492104c8

Josef Bacik authored Mar 08, 2013

The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing.  Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

492104c8

btrfs: use rcu_barrier() to wait for bdev puts at unmount · bc178622

Eric Sandeen authored Mar 09, 2013

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
	__btrfs_close_devices
		call_rcu(&device->rcu, free_device);
			free_device
				INIT_WORK(&device->rcu_work, __free_device);
				schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

bc178622

Btrfs: remove btrfs_try_spin_lock · d340d247

Liu Bo authored Mar 11, 2013

Remove a useless function declaration
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

d340d247