- 25 Jan, 2016 30 commits
-
-
Ashok Raj authored
commit d90167a9 upstream. Intel's MCA implementation broadcasts MCEs to all CPUs on the node. This poses a problem for offlined CPUs which cannot participate in the rendezvous process: Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler Kernel Offset: disabled Rebooting in 100 seconds.. More specifically, Linux does a soft offline of a CPU when writing a 0 to /sys/devices/system/cpu/cpuX/online, which doesn't prevent the #MC exception from being broadcasted to that CPU. Ensure that offline CPUs don't participate in the MCE rendezvous and clear the RIP valid status bit so that a second MCE won't cause a shutdown. Without the patch, mce_start() will increment mce_callin and wait for all CPUs. Offlined CPUs should avoid participating in the rendezvous process altogether. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Alan Stern authored
commit e50293ef upstream. Commit 8520f380 ("USB: change hub initialization sleeps to delayed_work") changed the hub_activate() routine to make part of it run in a workqueue. However, the commit failed to take a reference to the usb_hub structure or to lock the hub interface while doing so. As a result, if a hub is plugged in and quickly unplugged before the work routine can run, the routine will try to access memory that has been deallocated. Or, if the hub is unplugged while the routine is running, the memory may be deallocated while it is in active use. This patch fixes the problem by taking a reference to the usb_hub at the start of hub_activate() and releasing it at the end (when the work is finished), and by locking the hub interface while the work routine is running. It also adds a check at the start of the routine to see if the hub has already been disconnected, in which nothing should be done. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Alexandru Cornea <alexandru.cornea@intel.com> Tested-by: Alexandru Cornea <alexandru.cornea@intel.com> Fixes: 8520f380 ("USB: change hub initialization sleeps to delayed_work") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ luis: backported to 3.16: - Added forward declaration of hub_release() which mainline had with commit 32a69589 ("usb: hub: convert khubd into workqueue") ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Dan Carpenter authored
commit abdc9a3b upstream. The code expects the loop to end with "retries" set to zero but, because it is a post-op, it will end set to -1. I have fixed this by moving the decrement inside the loop. Fixes: 014aa2a3 ('USB: ipaq: minor ipaq_open() cleanup.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Konrad Rzeszutek Wilk authored
commit 408fb0e5 upstream. commit f598282f ("PCI: Fix the NIU MSI-X problem in a better way") teaches us that dealing with MSI-X can be troublesome. Further checks in the MSI-X architecture shows that if the PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we may not be able to access the BAR (since they are memory regions). Since the MSI-X tables are located in there.. that can lead to us causing PCIe errors. Inhibit us performing any operation on the MSI-X unless the MEMORY bit is set. Note that Xen hypervisor with: "x86/MSI-X: access MSI-X table only after having enabled MSI-X" will return: xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3! When the generic MSI code tries to setup the PIRQ without MEMORY bit set. Which means with later versions of Xen (4.6) this patch is not neccessary. This is part of XSA-157 Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Konrad Rzeszutek Wilk authored
commit 7cfb905b upstream. Otherwise just continue on, returning the same values as previously (return of 0, and op->result has the PIRQ value). This does not change the behavior of XEN_PCI_OP_disable_msi[|x]. The pci_disable_msi or pci_disable_msix have the checks for msi_enabled or msix_enabled so they will error out immediately. However the guest can still call these operations and cause us to disable the 'ack_intr'. That means the backend IRQ handler for the legacy interrupt will not respond to interrupts anymore. This will lead to (if the device is causing an interrupt storm) for the Linux generic code to disable the interrupt line. Naturally this will only happen if the device in question is plugged in on the motherboard on shared level interrupt GSI. This is part of XSA-157 Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Konrad Rzeszutek Wilk authored
commit a396f3a2 upstream. Otherwise an guest can subvert the generic MSI code to trigger an BUG_ON condition during MSI interrupt freeing: for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); Xen PCI backed installs an IRQ handler (request_irq) for the dev->irq whenever the guest writes PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is done in case the device has legacy interrupts the GSI line is shared by the backend devices. To subvert the backend the guest needs to make the backend to change the dev->irq from the GSI to the MSI interrupt line, make the backend allocate an interrupt handler, and then command the backend to free the MSI interrupt and hit the BUG_ON. Since the backend only calls 'request_irq' when the guest writes to the PCI_COMMAND register the guest needs to call XEN_PCI_OP_enable_msi before any other operation. This will cause the generic MSI code to setup an MSI entry and populate dev->irq with the new PIRQ value. Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY and cause the backend to setup an IRQ handler for dev->irq (which instead of the GSI value has the MSI pirq). See 'xen_pcibk_control_isr'. Then the guest disables the MSI: XEN_PCI_OP_disable_msi which ends up triggering the BUG_ON condition in 'free_msi_irqs' as there is an IRQ handler for the entry->irq (dev->irq). Note that this cannot be done using MSI-X as the generic code does not over-write dev->irq with the MSI-X PIRQ values. The patch inhibits setting up the IRQ handler if MSI or MSI-X (for symmetry reasons) code had been called successfully. P.S. Xen PCIBack when it sets up the device for the guest consumption ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device). XSA-120 addendum patch removed that - however when upstreaming said addendum we found that it caused issues with qemu upstream. That has now been fixed in qemu upstream. This is part of XSA-157 Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Konrad Rzeszutek Wilk authored
commit 5e0ce145 upstream. The guest sequence of: a) XEN_PCI_OP_enable_msix b) XEN_PCI_OP_enable_msix results in hitting an NULL pointer due to using freed pointers. The device passed in the guest MUST have MSI-X capability. The a) constructs and SysFS representation of MSI and MSI groups. The b) adds a second set of them but adding in to SysFS fails (duplicate entry). 'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that in a) pdev->msi_irq_groups is still set) and also free's ALL of the MSI-X entries of the device (the ones allocated in step a) and b)). The unwind code: 'free_msi_irqs' deletes all the entries and tries to delete the pdev->msi_irq_groups (which hasn't been set to NULL). However the pointers in the SysFS are already freed and we hit an NULL pointer further on when 'strlen' is attempted on a freed pointer. The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard against that. The check for msi_enabled is not stricly neccessary. This is part of XSA-157 Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Konrad Rzeszutek Wilk authored
commit 56441f3c upstream. The guest sequence of: a) XEN_PCI_OP_enable_msi b) XEN_PCI_OP_enable_msi c) XEN_PCI_OP_disable_msi results in hitting an BUG_ON condition in the msi.c code. The MSI code uses an dev->msi_list to which it adds MSI entries. Under the above conditions an BUG_ON() can be hit. The device passed in the guest MUST have MSI capability. The a) adds the entry to the dev->msi_list and sets msi_enabled. The b) adds a second entry but adding in to SysFS fails (duplicate entry) and deletes all of the entries from msi_list and returns (with msi_enabled is still set). c) pci_disable_msi passes the msi_enabled checks and hits: BUG_ON(list_empty(dev_to_msi_list(&dev->dev))); and blows up. The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard against that. The check for msix_enabled is not stricly neccessary. This is part of XSA-157. Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Konrad Rzeszutek Wilk authored
commit 8135cf8b upstream. Double fetch vulnerabilities that happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statements on the op->cmd value which is stored in shared memory. Interestingly this can result in a double fetch vulnerability depending on the performed compiler optimization. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Roger Pau Monné authored
commit 18779149 upstream. Since indirect descriptors are in memory shared with the frontend, the frontend could alter the first_sect and last_sect values after they have been validated but before they are recorded in the request. This may result in I/O requests that overflow the foreign page, possibly overwriting local pages when the I/O request is executed. When parsing indirect descriptors, only read first_sect and last_sect once. This is part of XSA155. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [ luis: backported to 3.16: - Use ACCESS_ONCE instead of READ_ONCE - Use PAGE_SIZE instead of XEN_PAGE_SIZE ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Roger Pau Monné authored
commit 1f13d75c upstream. A compiler may load a switch statement value multiple times, which could be bad when the value is in memory shared with the frontend. When converting a non-native request to a native one, ensure that src->operation is only loaded once by using READ_ONCE(). This is part of XSA155. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [ luis: backported to 3.16: - replaced READ_ONCE() by ACCESS_ONCE() ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
David Vrabel authored
commit 68a33bfd upstream. Instead of open-coding memcpy()s and directly accessing Tx and Rx requests, use the new RING_COPY_REQUEST() that ensures the local copy is correct. This is more than is strictly necessary for guest Rx requests since only the id and gref fields are used and it is harmless if the frontend modifies these. This is part of XSA155. Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [ kamal: backport to 3.13-stable: context (s/queue/vif/) ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
David Vrabel authored
commit 0f589967 upstream. The last from guest transmitted request gives no indication about the minimum amount of credit that the guest might need to send a packet since the last packet might have been a small one. Instead allow for the worst case 128 KiB packet. This is part of XSA155. Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [ kamal: backport to 3.13-stable: context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
David Vrabel authored
commit 454d5d88 upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Michael Holzheu authored
commit 272fa59c upstream. The print_insn() function returns strings like "lghi %r1,0". To escape the '%' character in sprintf() a second '%' is used. For example "lghi %%r1,0" is converted into "lghi %r1,0". After print_insn() the output string is passed to printk(). Because format specifiers like "%r" or "%f" are ignored by printk() this works by chance most of the time. But for instructions with control registers like "lctl %c6,%c6,780" this fails because printk() interprets "%c" as character format specifier. Fix this problem and escape the '%' characters twice. For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf() into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780". Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ luis: backported to 3.16: - drop condition with OPERAND_VR introduced only with commit 3585cb02 ("s390/disassembler: add vector instructions") ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Xiong Zhang authored
commit 3e6db33a upstream. It takes three minutes to enter into hibernation on some OEM SKL machines and we see many codec spurious response after thaw() opertion. This is because HDA is still in D0 state after freeze() call and pci_pm_freeze/pci_pm_freeze_noirq() don't set D3 hot in pci_bus driver. It seems bios still access HDA when system enter into freeze state, HDA will receive codec response interrupt immediately after thaw() call. Because of this unexpected interrupt, HDA enter into a abnormal state and slow down the system enter into hibernation. In this patch, we put HDA into D3 hot state in azx_freeze_noirq() and put HDA into D0 state in azx_thaw_noirq(). V2: Only apply this fix to SKL+ Fix compile error when CONFIG_PM_SLEEP isn't defined [Yet another fix for CONFIG_PM_SLEEP ifdef and the additional comment by tiwai] Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Vineet Gupta authored
commit 323f41f9 upstream. ARC dwarf unwinder only supports CIE version == 1 The boot time dwarf sanitizer (part of binary lookup table constructor) would simply bail if it saw CIE version == 3, rendering unwinder with a NULL lookup table. It seems libgcc linked with kernel does have such entries. With fallback linear search removed, and a NULL binary lookup table, unwinder fails to generate any stack trace. So allow graceful ignoring of unsupported CIE entries. This problem was initially seen in Alexey's setup (and not mine) as he was using buildroot built toolchain (libgcc) which doesn't get built with CFLAGS_FOR_TARGET="-gdwarf-2 which is my default Fixes STAR 9000985048: "kernel unwinder broken with stock tools" Fixes: 2e22502c ARC: dw2 unwind: Remove falllback linear search thru FDE entries Reported-by Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Vineet Gupta authored
commit bc79c9a7 upstream. The fix which removed linear searching of dwarf (because binary lookup data always exists) missed out on the fact that modules don't get the binary lookup tables info. This caused unwinding out of modules to stop working. So add binary lookup header setup (equivalent of eh_frame_hdr setup) to modules as well. While at it, confine the header setup to within unwinder code, reducing one API exposed out of unwinder code. Fixes: 2e22502c ARC: dw2 unwind: Remove falllback linear search thru FDE entries Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Steven Rostedt (Red Hat) authored
commit a50bd439 upstream. Russell King found that he had weird side effects when compiling the kernel with hard linked ccache. The reason was that recordmcount modified the kernel in place via mmap, and when a file gets modified twice by recordmcount, it will complain about it. To fix this issue, Russell wrote a patch that checked if the file was hard linked more than once and would unlink it if it was. Linus Torvalds was not happy with the fact that recordmcount does this in place modification. Instead of doing the unlink only if the file has two or more hard links, it does the unlink all the time. In otherwords, it always does a copy if it changed something. That is, it does the write out if a change was made. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Russell King authored
commit dd39a265 upstream. recordmcount edits the file in-place, which can cause problems when using ccache in hardlink mode. Arrange for recordmcount to break a hardlinked object. Link: http://lkml.kernel.org/r/E1a7MVT-0000et-62@rmk-PC.arm.linux.org.ukSigned-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Johan Hovold authored
commit 157f38f9 upstream. Fix parent-device reference leak due to SPI-core taking an unnecessary reference to the parent when allocating the master structure, a reference that was never released. Note that driver core takes its own reference to the parent when the master device is registered. Fixes: 49dce689 ("spi doesn't need class_device") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Anson Huang authored
commit fa0708b3 upstream. In cpu_v7_do_suspend routine, r11 is used while it is NOT saved/restored, different compiler may have different usage of ARM general registers, so it may cause issues during calling cpu_v7_do_suspend. We meet kernel fault occurs when using GCC 4.8.3, r11 contains valid value before calling into cpu_v7_do_suspend, but when returned from this routine, r11 is corrupted and lead to kernel fault. Doing save/restore for those corrupted registers is a must in assemble code. Signed-off-by: Anson Huang <Anson.Huang@freescale.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Anssi Hannula authored
commit 42e3121d upstream. AudioQuest DragonFly DAC reports a volume control range of 0..50 (0x0000..0x0032) which in USB Audio means a range of 0 .. 0.2dB, which is obviously incorrect and would cause software using the dB information in e.g. volume sliders to have a massive volume difference in 100..102% range. Commit 2d1cb7f6 ("ALSA: usb-audio: add dB range mapping for some devices") added a dB range mapping for it with range 0..50 dB. However, the actual volume mapping seems to be neither linear volume nor linear dB scale, but instead quite close to the cubic mapping e.g. alsamixer uses, with a range of approx. -53...0 dB. Replace the previous quirk with a custom dB mapping based on some basic output measurements, using a 10-item range TLV (which will still fit in alsa-lib MAX_TLV_RANGE_SIZE). Tested on AudioQuest DragonFly HW v1.2. The quirk is only applied if the range is 0..50, so if this gets fixed/changed in later HW revisions it will no longer be applied. v2: incorporated Takashi Iwai's suggestion for the quirk application method Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: Takashi Iwai <tiwai@suse.de> [ kamal: backport to 3.13-stable: use snd_printk instead of usb_audio_info ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Thomas Gleixner authored
commit abc7e40c upstream. If a interrupt chip utilizes chip->buslock then free_irq() can deadlock in the following way: CPU0 CPU1 interrupt(X) (Shared or spurious) free_irq(X) interrupt_thread(X) chip_bus_lock(X) irq_finalize_oneshot(X) chip_bus_lock(X) synchronize_irq(X) synchronize_irq() waits for the interrupt thread to complete, i.e. forever. Solution is simple: Drop chip_bus_lock() before calling synchronize_irq() as we do with the irq_desc lock. There is nothing to be protected after the point where irq_desc lock has been released. This adds chip_bus_lock/unlock() to the remove_irq() code path, but that's actually correct in the case where remove_irq() is called on such an interrupt. The current users of remove_irq() are not affected as none of those interrupts is on a chip which requires buslock. Reported-by: Fredrik Markström <fredrik.markstrom@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Peter Hurley authored
commit 9ce119f3 upstream. A line discipline which does not define a receive_buf() method can can cause a GPF if data is ever received [1]. Oddly, this was known to the author of n_tracesink in 2011, but never fixed. [1] GPF report BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 3752d067 PUD 37a7b067 PMD 0 Oops: 0010 [#1] SMP KASAN Modules linked in: CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events_unbound flush_to_ldisc task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff88006db67b50 EFLAGS: 00010246 RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102 RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388 RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0 R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000 R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8 FS: 0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0 Stack: ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000 ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90 ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078 Call Trace: [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030 [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162 [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302 [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468 Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff88006db67b50> CR2: 0000000000000000 ---[ end trace a587f8947e54d6ea ]--- Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Peter Hurley authored
commit ac8f3bf8 upstream. commit 40d5e090 ("n_tty: Fix EOF push handling") fixed EOF push for reads. However, that approach still allows a condition mismatch between poll() and read(), where poll() returns POLLIN but read() blocks. This state can happen when a previous read() returned because the user buffer was full and the next character was an EOF not at the beginning of the line. While the next read() will properly identify the condition and advance the read buffer tail without improperly indicating an EOF file condition (ie., read() will not mistakenly return 0), poll() will mistakenly indicate POLLIN. Although a possible solution would be to peek at the input buffer in n_tty_poll(), the better solution in this patch is to eat the EOF during the previous read() (ie., fix the problem by eliminating the condition). The current canon line buffer copy limits the scan for next end-of-line to the smaller of either, a. the remaining user buffer size b. completed lines in the input buffer When the remaining user buffer size is exactly one less than the end-of-line marked by EOF push, the EOF is not scanned nor skipped but left for subsequent reads. In the example below, the scan index 'eol' has stopped at the EOF because it is past the scan limit of 5 (not because it has found the next set bit in read_flags) user buffer [*nr = 5] _ _ _ _ _ read_flags 0 0 0 0 0 1 input buffer h e l l o [EOF] ^ ^ / / tail eol result: found = 0, tail += 5, *nr += 5 Instead, allow the scan to peek ahead 1 byte (while still limiting the scan to completed lines in the input buffer). For the example above, result: found = 1, tail += 6, *nr += 5 Because the scan limit is now bumped +1 byte, when the scan is completed, the tail advance and the user buffer copy limit is re-clamped to *nr when EOF is _not_ found. Fixes: 40d5e090 ("n_tty: Fix EOF push handling") Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> [ kamal: backported to 3.13: adjusted context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Dmitry V. Levin authored
commit 2d33fa10 upstream. According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr has to be defined to 259, but it doesn't. Instead, it's defined to 269, which is of course used by another syscall, __NR_sched_setaffinity in this case. This bug was found by strace test suite. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Seth Jennings authored
commit 26bbe7ef upstream. Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for generic x86 64bit") introduced large block sizes for x86. This made it possible to have multiple sections per memory block where previously, there was a only every one section per block. Since blocks consist of contiguous ranges of section, there can be holes in the blocks where sections are not present. If one attempts to offline such a block, a crash occurs since the code is not designed to deal with this. This patch is a quick fix to gaurd against the crash by not allowing blocks with non-present sections to be offlined. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: Seth Jennings <sjennings@variantweb.net> Reported-by: Andrew Banman <abanman@sgi.com> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Russ Anderson <rja@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Naoya Horiguchi authored
commit 0d777df5 upstream. Currently at the beginning of hugetlb_fault(), we call huge_pte_offset() and check whether the obtained *ptep is a migration/hwpoison entry or not. And if not, then we get to call huge_pte_alloc(). This is racy because the *ptep could turn into migration/hwpoison entry after the huge_pte_offset() check. This race results in BUG_ON in huge_pte_alloc(). We don't have to call huge_pte_alloc() when the huge_pte_offset() returns non-NULL, so let's fix this bug with moving the code into else block. Note that the *ptep could turn into a migration/hwpoison entry after this block, but that's not a problem because we have another !pte_present check later (we never go into hugetlb_no_page() in that case.) Fixes: 290408d4 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Michal Hocko authored
commit 373ccbe5 upstream. Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99 ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> [ luis: backported to 3.16, based on Ben's backport to 3.2: - use queue_delayed_work instead of queue_delayed_work_on in function vmstat_update() - change start_cpu_timer() instead of vmstat_shepherd() - adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> [ kamal: backport to 3.13-stable: context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
- 22 Jan, 2016 10 commits
-
-
Mikulas Patocka authored
commit e46e31a3 upstream. When using the Promise TX2+ SATA controller on PA-RISC, the system often crashes with kernel panic, for example just writing data with the dd utility will make it crash. Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2 Backtrace: [<000000004021497c>] show_stack+0x14/0x20 [<0000000040410bf0>] dump_stack+0x88/0x100 [<000000004023978c>] panic+0x124/0x360 [<0000000040452c18>] sba_alloc_range+0x698/0x6a0 [<0000000040453150>] sba_map_sg+0x260/0x5b8 [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata] [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata] [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata] [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130 [<000000004049da34>] scsi_request_fn+0x6e4/0x970 [<00000000403e95a8>] __blk_run_queue+0x40/0x60 [<00000000403e9d8c>] blk_run_queue+0x3c/0x68 [<000000004049a534>] scsi_run_queue+0x2a4/0x360 [<000000004049be68>] scsi_end_request+0x1a8/0x238 [<000000004049de84>] scsi_io_completion+0xfc/0x688 [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0 The cause of the crash is not exhaustion of the IOMMU space, there is plenty of free pages. The function sba_alloc_range is called with size 0x11000, thus the pages_needed variable is 0x11. The function sba_search_bitmap is called with bits_wanted 0x11 and boundary size is 0x10 (because dma_get_seg_boundary(dev) returns 0xffff). The function sba_search_bitmap attempts to allocate 17 pages that must not cross 16-page boundary - it can't satisfy this requirement (iommu_is_span_boundary always returns true) and fails even if there are many free entries in the IOMMU space. How did it happen that we try to allocate 17 pages that don't cross 16-page boundary? The cause is in the function iommu_coalesce_chunks. This function tries to coalesce adjacent entries in the scatterlist. The function does several checks if it may coalesce one entry with the next, one of those checks is this: if (startsg->length + dma_len > max_seg_size) break; When it finishes coalescing adjacent entries, it allocates the mapping: sg_dma_len(contig_sg) = dma_len; dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); sg_dma_address(contig_sg) = PIDE_FLAG | (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT) | dma_offset; It is possible that (startsg->length + dma_len > max_seg_size) is false (we are just near the 0x10000 max_seg_size boundary), so the funcion decides to coalesce this entry with the next entry. When the coalescing succeeds, the function performs dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE); And now, because of non-zero dma_offset, dma_len is greater than 0x10000. iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts to allocate 17 pages for a device that must not cross 16-page boundary. To fix the bug, we must make sure that dma_len after addition of dma_offset and alignment doesn't cross the segment boundary. I.e. change if (startsg->length + dma_len > max_seg_size) break; to if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size) break; This patch makes this change (it precalculates max_seg_boundary at the beginning of the function iommu_coalesce_chunks). I also added a check that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is not needed for Promise TX2+ SATA, but it may be needed for other devices that have dma_get_seg_boundary lower than dma_get_max_seg_size). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Prarit Bhargava authored
commit 79a21dbf upstream. Intel RAPL initialized on several systems where the BIOS lock bit (msr 0x610, bit 63) was set. This occured because the return value of rapl_read_data_raw() was being checked, rather than the value of the variable passed in, locked. This patch properly implments the rapl_read_data_raw() call to check the variable locked, and now the Intel RAPL driver outputs the warning: intel_rapl: RAPL package 0 domain package locked by BIOS and does not initialize for the package. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Alan Stern authored
commit ad87e032 upstream. Some USB device / host controller combinations seem to have problems with Link Power Management. For example, Steinar found that his xHCI controller wouldn't handle bandwidth calculations correctly for two video cards simultaneously when LPM was enabled, even though the bus had plenty of bandwidth available. This patch introduces a new quirk flag for devices that should remain disabled for LPM, and creates quirk entries for Steinar's devices. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Mathias Nyman authored
commit f69115fd upstream. According to USB 2 specs ports need to signal resume for at least 20ms, in practice even longer, before moving to U0 state. Both host and devices can initiate resume. On device initiated resume, a port status interrupt with the port in resume state in issued. The interrupt handler tags a resume_done[port] timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer. Root hub timer requests for port status, finds the port in resume state, checks if resume_done[port] timestamp passed, and set port to U0 state. On host initiated resume, current code sets the port to resume state, sleep 20ms, and finally sets the port to U0 state. This should also be changed to work in a similar way as the device initiated resume, with timestamp tagging, but that is not yet tested and will be a separate fix later. There are a few issues with this approach 1. A host initiated resume will also generate a resume event. The event handler will find the port in resume state, believe it's a device initiated resume, and act accordingly. 2. A port status request might cut the resume signalling short if a get_port_status request is handled during the host resume signalling. The port will be found in resume state. The timestamp is not set leading to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0. get_port_status will proceed with moving the port to U0. 3. If an error, or anything else happens to the port during device initiated resume signalling it will leave all the device resume parameters hanging uncleared, preventing further suspend, returning -EBUSY, and cause the pm thread to busyloop trying to enter suspend. Fix this by using the existing resuming_ports bitfield to indicate that resume signalling timing is taken care of. Check if the resume_done[port] is set before using it for timestamp comparison, and also clear out any resume signalling related variables if port is not in U0 or Resume state This issue was discovered when a PM thread busylooped, trying to runtime suspend the xhci USB 2 roothub on a Dell XPS Reported-by: Daniel J Blueman <daniel@quora.org> Tested-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
James Bottomley authored
commit 5e103356 upstream. KASAN found that our additional element processing scripts drop off the end of the VPD page into unallocated space. The reason is that not every element has additional information but our traversal routines think they do, leading to them expecting far more additional information than is present. Fix this by adding a gate to the traversal routine so that it only processes elements that are expected to have additional information (list is in SES-2 section 6.1.13.1: Additional Element Status diagnostic page overview) Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Kirill A. Shutemov authored
commit 9f5bd308 upstream. There are few defects in vga_get() related to signal hadning: - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE case; - if we found pending signal we must remove ourself from wait queue and change task state back to running; - -ERESTARTSYS is more appropriate, I guess. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
James Bottomley authored
commit 3417c1b5 upstream. Simple enclosure implementations (mostly USB) are allowed to return only page 8 to every diagnostic query. That really confuses our implementation because we assume the return is the page we asked for and end up doing incorrect offsets based on bogus information leading to accesses outside of allocated ranges. Fix that by checking the page code of the return and giving an error if it isn't the one we asked for. This should fix reported bugs with USB storage by simply refusing to attach to enclosures that behave like this. It's also good defensive practise now that we're starting to see more USB enclosures. Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Joe Thornber authored
commit ed8b45a3 upstream. If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Johannes Berg authored
commit b7bb1100 upstream. Some users of rfkill, like NFC and cfg80211, use a dynamic name when allocating rfkill, in those cases dev_name(). Therefore, the pointer passed to rfkill_alloc() might not be valid forever, I specifically found the case that the rfkill name was quite obviously an invalid pointer (or at least garbage) when the wiphy had been renamed. Fix this by making a copy of the rfkill name in rfkill_alloc(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Paul Mackerras authored
commit c20875a3 upstream. Currently it is possible for userspace (e.g. QEMU) to set a value for the MSR for a guest VCPU which has both of the TS bits set, which is an illegal combination. The result of this is that when we execute a hrfid (hypervisor return from interrupt doubleword) instruction to enter the guest, the CPU will take a TM Bad Thing type of program interrupt (vector 0x700). Now, if PR KVM is configured in the kernel along with HV KVM, we actually handle this without crashing the host or giving hypervisor privilege to the guest; instead what happens is that we deliver a program interrupt to the guest, with SRR0 reflecting the address of the hrfid instruction and SRR1 containing the MSR value at that point. If PR KVM is not configured in the kernel, then we try to run the host's program interrupt handler with the MMU set to the guest context, which almost certainly causes a host crash. This closes the hole by making kvmppc_set_msr_hv() check for the illegal combination and force the TS field to a safe value (00, meaning non-transactional). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-