- 26 Aug, 2010 1 commit
-
-
Maxim Levitsky authored
commit d862b13b upstream. mspro_block_remove() is called from detect thread that first calls the mspro_block_stop(), which stops the request queue. If we call del_gendisk() with the queue stopped we get a deadlock. Signed-off-by:
Maxim Levitsky <maximlevitsky@gmail.com> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
- 20 Aug, 2010 3 commits
-
-
Greg Kroah-Hartman authored
-
Linus Torvalds authored
commit d7824370 upstream. This commit makes the stack guard page somewhat less visible to user space. It does this by: - not showing the guard page in /proc/<pid>/maps It looks like lvm-tools will actually read /proc/self/maps to figure out where all its mappings are, and effectively do a specialized "mlockall()" in user space. By not showing the guard page as part of the mapping (by just adding PAGE_SIZE to the start for grows-up pages), lvm-tools ends up not being aware of it. - by also teaching the _real_ mlock() functionality not to try to lock the guard page. That would just expand the mapping down to create a new guard page, so there really is no point in trying to lock it in place. It would perhaps be nice to show the guard page specially in /proc/<pid>/maps (or at least mark grow-down segments some way), but let's not open ourselves up to more breakage by user space from programs that depends on the exact deails of the 'maps' file. Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools source code to see what was going on with the whole new warning. Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be Reported-by:
Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Linus Torvalds authored
commit 11ac5524 upstream. We do in fact need to unmap the page table _before_ doing the whole stack guard page logic, because if it is needed (mainly 32-bit x86 with PAE and CONFIG_HIGHPTE, but other architectures may use it too) then it will do a kmap_atomic/kunmap_atomic. And those kmaps will create an atomic region that we cannot do allocations in. However, the whole stack expand code will need to do anon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an atomic region. Now, a better model might actually be to do the anon_vma_prepare() when _creating_ a VM_GROWSDOWN segment, and not have to worry about any of this at page fault time. But in the meantime, this is the straightforward fix for the issue. See https://bugzilla.kernel.org/show_bug.cgi?id=16588 for details. Reported-by:
Wylda <wylda@volny.cz> Reported-by:
Sedat Dilek <sedat.dilek@gmail.com> Reported-by:
Mike Pagano <mpagano@gentoo.org> Reported-by:
François Valenduc <francois.valenduc@tvcablenet.be> Tested-by:
Ed Tomlinson <edt@aei.ca> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
- 13 Aug, 2010 36 commits
-
-
Greg Kroah-Hartman authored
-
Linus Torvalds authored
commit 96054569 upstream. It's wrong for several reasons, but the most direct one is that the fault may be for the stack accesses to set up a previous SIGBUS. When we have a kernel exception, the kernel exception handler does all the fixups, not some user-level signal handler. Even apart from the nested SIGBUS issue, it's also wrong to give out kernel fault addresses in the signal handler info block, or to send a SIGBUS when a system call already returns EFAULT. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Linus Torvalds authored
commit 5528f913 upstream. .. which didn't show up in my tests because it's a no-op on x86-64 and most other architectures. But we enter the function with the last-level page table mapped, and should unmap it at exit. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Linus Torvalds authored
commit 320b2b8d upstream. This is a rather minimally invasive patch to solve the problem of the user stack growing into a memory mapped area below it. Whenever we fill the first page of the stack segment, expand the segment down by one page. Now, admittedly some odd application might _want_ the stack to grow down into the preceding memory mapping, and so we may at some point need to make this a process tunable (some people might also want to have more than a single page of guarding), but let's try the minimal approach first. Tested with trivial application that maps a single page just below the stack, and then starts recursing. Without this, we will get a SIGSEGV _after_ the stack has smashed the mapping. With this patch, we'll get a nice SIGBUS just as the stack touches the page just above the mapping. Requested-by:
Keith Packard <keithp@keithp.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
KAMEZAWA Hiroyuki authored
commit 966cca02 upstream. Since 2.6.31, swap_map[]'s refcounting was changed to show that a used swap entry is just for swap-cache, can be reused. Then, while scanning free entry in swap_map[], a swap entry may be able to be reclaimed and reused. It was caused by commit c9e44410 ("mm: reuse unused swap entry if necessary"). But this caused deta corruption at resume. The scenario is - Assume a clean-swap cache, but mapped. - at hibernation_snapshot[], clean-swap-cache is saved as clean-swap-cache and swap_map[] is marked as SWAP_HAS_CACHE. - then, save_image() is called. And reuse SWAP_HAS_CACHE entry to save image, and break the contents. After resume: - the memory reclaim runs and finds clean-not-referenced-swap-cache and discards it because it's marked as clean. But here, the contents on disk and swap-cache is inconsistent. Hance memory is corrupted. This patch avoids the bug by not reclaiming swap-entry during hibernation. This is a quick fix for backporting. Signed-off-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Reported-by:
Ondreg Zary <linux@rainbow-software.org> Tested-by:
Ondreg Zary <linux@rainbow-software.org> Tested-by:
Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by:
Hugh Dickins <hughd@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
NeilBrown authored
commit e555190d upstream. When a raid1 array is configured to support write-behind on some devices, it normally only reads from other devices. If all devices are write-behind (because the rest have failed) it is possible for a read request to be serviced before a behind-write request, which would appear as data corruption. So when forced to read from a WriteMostly device, wait for any write-behind to complete, and don't start any more behind-writes. Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Brian King authored
commit daa142d1 upstream. If a command times out resulting in EH getting invoked, we wait for the aborted commands to come back after sending the abort. Shorten the amount of time we wait for these responses, to ensure we don't get stuck in EH for several minutes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Signed-off-by:
James Bottomley <James.Bottomley@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Brian King authored
commit f5832fa2 upstream. Commands which are completed by the VIOS are placed on a CRQ in kernel memory for the ibmvfc driver to process. Each CRQ entry is 16 bytes. The ibmvfc driver reads the first 8 bytes to check if the entry is valid, then reads the next 8 bytes to get the handle, which is a pointer the completed command. This fixes an issue seen on Power 7 where the processor reordered the loads from memory, resulting in processing command completion with a stale handle. This could result in command timeouts, and also early completion of commands. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Signed-off-by:
James Bottomley <James.Bottomley@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Hannes Reinecke authored
commit 534ef056 upstream. When removing several devices aic79xx will occasionally Oops in ahd_handle_nonpkt_busfree during rescan. Looking at the code I found that we're indeed not checking if the scb in question is NULL. So check for it before accessing it. Signed-off-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
James Bottomley <James.Bottomley@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Nikanth Karthikesan authored
commit 02246c41 upstream. Update mtime when writing to backing filesystem using the address space operations write_begin and write_end. Signed-off-by:
Nikanth Karthikesan <knikanth@suse.de> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Sachin Prabhu authored
commit ee860b6a upstream. ocfs2_lock() will skip locks on file which has mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. ocfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by:
Sachin Prabhu <sprabhu@redhat.com> Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Neil Brown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jan Kara authored
commit 57b09bb5 upstream. We have to set MS_POSIXACL on remount as well. Otherwise VFS would not know we started supporting ACLs after remount and thus ACLs would not work. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tao Ma authored
commit 38a04e43 upstream. ocfs2 refcount tree is stored as an extent tree while the leaf ocfs2_refcount_rec points to a refcount block. The following step can trip a kernel panic. mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE mount -t ocfs2 $DEVICE $MNT_DIR FILE_NAME=$RANDOM FILE_NAME_1=$RANDOM FILE_REF="${FILE_NAME}_ref" FILE_REF_1="${FILE_NAME}_ref_1" for((i=0;i<305;i++)) do # /mnt/1048576 is a file with 1048576 sizes. cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1 done for((i=0;i<3;i++)) do cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME done for((i=0;i<2;i++)) do cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1 done cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME for((i=0;i<11;i++)) do cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1 done reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF # write_f is a program which will write some bytes to a file at offset. # write_f -f file_name -l offset -w write_bytes. ./write_f -f $MNT_DIR/$FILE_REF -l $[310*1048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_REF -l $[306*1048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_REF -l $[311*1048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_NAME -l $[310*1048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096 reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF_1 ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096 #kernel panic here. The reason is that if the ocfs2_extent_rec is the last record in a leaf extent block, the old solution fails to find the suitable end cpos. So this patch try to walk through the b-tree, find the next sub root and get the c_pos the next sub-tree starts from. btw, I have runned tristan's test case against the patched kernel for several days and this type of kernel panic never happens again. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
David Teigland authored
commit cf6620ac upstream. When the lock master processes a successful operation (request, convert, cancel, or unlock), it will process the effects of the change before sending the reply for the operation. The "effects" of the operation are: - blocking callbacks (basts) for any newly granted locks - waiting or converting locks that can now be granted The cast is queued on the local node when the reply from the lock master is received. This means that a lock holder can receive a bast for a lock mode that is doesn't yet know has been granted. Signed-off-by:
David Teigland <teigland@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
David Teigland authored
commit 7fe2b319 upstream. When both blocking and completion callbacks are queued for lock, the dlm would always deliver the completion callback (cast) first. In some cases the blocking callback (bast) is queued before the cast, though, and should be delivered first. This patch keeps track of the order in which they were queued and delivers them in that order. This patch also keeps track of the granted mode in the last cast and eliminates the following bast if the bast mode is compatible with the preceding cast mode. This happens when a remotely mastered lock is demoted, e.g. EX->NL, in which case the local node queues a cast immediately after sending the demote message. In this way a cast can be queued for a mode, e.g. NL, that makes an in-transit bast extraneous. Signed-off-by:
David Teigland <teigland@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
David Teigland authored
commit 573c24c4 upstream. Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by:
David Teigland <teigland@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jeff Mahoney authored
commit 6cb4aff0 upstream. Commit 57fe60df ("reiserfs: add atomic addition of selinux attributes during inode creation") contains a bug that will cause it to oops when mounting a file system that didn't previously contain extended attributes on a system using security.* xattrs. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15309Signed-off-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jeff Mahoney authored
commit 3f8b5ee3 upstream. The reiserfs journal behaves inconsistently when determining whether to allow a mount of a read-only device. This is due to the use of the continue_replay variable to short circuit the journal scanning. If it's set, it's assumed that there are transactions to replay, but there may not be. If it's unset, it's assumed that there aren't any, and that may not be the case either. I've observed two failure cases: 1) Where a clean file system on a read-only device refuses to mount 2) Where a clean file system on a read-only device passes the optimization and then tries writing the journal header to update the latest mount id. The former is easily observable by using a freshly created file system on a read-only loopback device. This patch moves the check into journal_read_transaction, where it can bail out before it's about to replay a transaction. That way it can go through and skip transactions where appropriate, yet still refuse to mount a file system with outstanding transactions. Signed-off-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric Sandeen authored
commit 15121c18 upstream. We have 2 mount options, "barrier" and "auto_da_alloc" which may or may not take a 1/0 argument. This causes the ext4 superblock mount code to subtract uninitialized pointers and pass the result to kmalloc, which results in very noisy failures. Per Ted's suggestion, initialize the args struct so that we know whether match_token() found an argument for the option, and skip match_int() if not. Also, return error (0) from parse_options if we thought we found an argument, but match_int() Fails. Reported-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Theodore Ts'o authored
commit 1f5a81e4 upstream. Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the donor file is an append-only file, we should not allow the operation to proceed, lest we end up overwriting the contents of an append-only file. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Darrick J. Wong authored
commit 455c0d71 upstream. Earlier, Ingo Molnar posted a patch to make it so that the kernel would avoid reading _PPC on his broken T60. Unfortunately, it seems that with Thomas Renninger's patch last July to eliminate _PPC evaluations when the processor driver loads, the kernel never actually reads _PPC at all! This is problematic if you happen to boot your non-T60 computer in a state where the BIOS _wants_ _PPC to be something other than zero. So, put the _PPC evaluation back into acpi_processor_get_performance_info if ignore_ppc isn't 1. Signed-off-by:
Darrick J. Wong <djwong@us.ibm.com> Signed-off-by:
Len Brown <len.brown@intel.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Breno Leitao authored
commit 8d3d50bf upstream. During a EEH recover, the pci_dev structure can be null, mainly if an eeh event is detected during cpi config operation. In this case, the pci_dev will not be known (and will be null) the kernel will crash with the following message: Unable to handle kernel paging request for data at address 0x000000a0 Faulting instruction address: 0xc00000000006b8b4 Oops: Kernel access of bad area, sig: 11 [#1] NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0 LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 Call Trace: [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70 The bug occurs because pci_name() tries to access a null pointer. This patch just guarantee that pci_name() is not called on Null pointers. Signed-off-by:
Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by:
Linas Vepstas <linasvepstas@gmail.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Wu Fengguang authored
commit 1668bfd5 upstream. Don't try to isolate a still mapped page. Otherwise we will hit the BUG_ON(page_mapped(page)) in __remove_from_page_cache(). Signed-off-by:
Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Renninger <trenn@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Wu Fengguang authored
commit 9b9a29ec upstream. (PG_swapbacked && !PG_lru) pages should not happen. Better to treat them as unknown pages. Signed-off-by:
Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Renninger <trenn@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric W. Biederman authored
commit fad53995 upstream. Iranna D Ankad reported that IBM x3950 systems have boot problems after this commit: | | commit b9c61b70 | | x86/pci: update pirq_enable_irq() to setup io apic routing | The problem is that with the patch, the machine freezes when console=ttyS0,... kernel serial parameter is passed. It seem to freeze at DVD initialization and the whole problem seem to be DVD/pata related, but somehow exposed through the serial parameter. Such apic problems can expose really weird behavior: ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[0]) IOAPIC[0]: apic_id 16, version 0, address 0xfecff000, GSI 0-2 ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[3]) IOAPIC[1]: apic_id 15, version 0, address 0xfec00000, GSI 3-38 ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[39]) IOAPIC[2]: apic_id 14, version 0, address 0xfec01000, GSI 39-74 ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 4 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 5 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 6 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 7 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 9 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 10 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 11 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 12 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 15 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 16 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 17 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 18 dfl dfl) It turns out that the system has three io apic controllers, but boot ioapic routing is in the second one, and that gsi_base is not 0 - it is using a bunch of INT_SRC_OVR... So these recent changes: 1. one set routing for first io apic controller 2. assume irq = gsi ... will break that system. So try to remap those gsis, need to seperate boot_ioapic_idx detection out of enable_IO_APIC() and call them early. So introduce boot_ioapic_idx, and remap_ioapic_gsi()... -v2: shift gsi with delta instead of gsi_base of boot_ioapic_idx -v3: double check with find_isa_irq_apic(0, mp_INT) to get right boot_ioapic_idx -v4: nr_legacy_irqs -v5: add print out for boot_ioapic_idx, and also make it could be applied for current kernel and previous kernel -v6: add bus_irq, in acpi_sci_ioapic_setup, so can get overwride for sci right mapping... -v7: looks like pnpacpi get irq instead of gsi, so need to revert them back... -v8: split into two patches -v9: according to Eric, use fixed 16 for shifting instead of remap -v10: still need to touch rsparser.c -v11: just revert back to way Eric suggest... anyway the ioapic in first ioapic is blocked by second... -v12: two patches, this one will add more loop but check apic_id and irq > 16 Reported-by:
Iranna D Ankad <iranna.ankad@in.ibm.com> Bisected-by:
Iranna D Ankad <iranna.ankad@in.ibm.com> Tested-by:
Gary Hade <garyhade@us.ibm.com> Signed-off-by:
Yinghai Lu <yinghai@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: len.brown@intel.com LKML-Reference: <4B8A321A.1000008@kernel.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Shaohui Zheng authored
commit ea085417 upstream. Newly added memory can not be accessed via /dev/mem, because we do not update the variables high_memory, max_pfn and max_low_pfn. Add a function update_end_of_memory_vars() to update these variables for 64-bit kernels. [akpm@linux-foundation.org: simplify comment] Signed-off-by:
Shaohui Zheng <shaohui.zheng@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Li Haicheng <haicheng.li@intel.com> Reviewed-by:
Wu Fengguang <fengguang.wu@intel.com> Reviewed-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Song Youquan authored
commit 863b557a upstream. When load aesni-intel and ghash_clmulni-intel driver,kernel will complain no test for some internal used algorithm. The strange information as following: alg: No test for __aes-aesni (__driver-aes-aesni) alg: No test for __ecb-aes-aesni (__driver-ecb-aes-aesni) alg: No test for __cbc-aes-aesni (__driver-cbc-aes-aesni) alg: No test for __ecb-aes-aesni (cryptd(__driver-ecb-aes-aesni) alg: No test for __ghash (__ghash-pclmulqdqni) alg: No test for __ghash (cryptd(__ghash-pclmulqdqni)) This patch add NULL test entries for these algorithm and driver. Signed-off-by:
Song Youquan <youquan.song@intel.com> Signed-off-by:
Hang Ying <ying.huang@intel.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Acked-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
FUJITA Tomonori authored
commit e2a46567 upstream. It's possible that SBA IOMMU might fail to find I/O space under heavy I/Os. SBA IOMMU panics on allocation failure but it shouldn't; drivers can handle the failure. The majority of other IOMMU drivers don't panic on allocation failure. This patch fixes SBA IOMMU path to handle allocation failure properly. Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Tony Luck <tony.luck@intel.com> Acked-by:
Leonardo Chiquitto <lchiquitto@novell.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Benjamin Herrenschmidt authored
commit 4b402210 upstream. Due to recent load-balancer changes that delay the task migration to the next wakeup, the adaptive mutex spinning ends up in a live lock when the owner's CPU gets offlined because the cpu_online() check lives before the owner running check. This patch changes mutex_spin_on_owner() to return 0 (don't spin) in any case where we aren't sure about the owner struct validity or CPU number, and if the said CPU is offline. There is no point going back & re-evaluate spinning in corner cases like that, let's just go to sleep. Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271212509.13059.135.camel@pasglop> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Hidetoshi Seto authored
commit 0cf55e1e upstream. This is a real fix for problem of utime/stime values decreasing described in the thread: http://lkml.org/lkml/2009/11/3/522 Now cputime is accounted in the following way: - {u,s}time in task_struct are increased every time when the thread is interrupted by a tick (timer interrupt). - When a thread exits, its {u,s}time are added to signal->{u,s}time, after adjusted by task_times(). - When all threads in a thread_group exits, accumulated {u,s}time (and also c{u,s}time) in signal struct are added to c{u,s}time in signal struct of the group's parent. So {u,s}time in task struct are "raw" tick count, while {u,s}time and c{u,s}time in signal struct are "adjusted" values. And accounted values are used by: - task_times(), to get cputime of a thread: This function returns adjusted values that originates from raw {u,s}time and scaled by sum_exec_runtime that accounted by CFS. - thread_group_cputime(), to get cputime of a thread group: This function returns sum of all {u,s}time of living threads in the group, plus {u,s}time in the signal struct that is sum of adjusted cputimes of all exited threads belonged to the group. The problem is the return value of thread_group_cputime(), because it is mixed sum of "raw" value and "adjusted" value: group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time) This misbehavior can break {u,s}time monotonicity. Assume that if there is a thread that have raw values greater than adjusted values (e.g. interrupted by 1000Hz ticks 50 times but only runs 45ms) and if it exits, cputime will decrease (e.g. -5ms). To fix this, we could do: group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time) But task_times() contains hard divisions, so applying it for every thread should be avoided. This patch fixes the above problem in the following way: - Modify thread's exit (= __exit_signal()) not to use task_times(). It means {u,s}time in signal struct accumulates raw values instead of adjusted values. As the result it makes thread_group_cputime() to return pure sum of "raw" values. - Introduce a new function thread_group_times(*task, *utime, *stime) that converts "raw" values of thread_group_cputime() to "adjusted" values, in same calculation procedure as task_times(). - Modify group's exit (= wait_task_zombie()) to use this introduced thread_group_times(). It make c{u,s}time in signal struct to have adjusted values like before this patch. - Replace some thread_group_cputime() by thread_group_times(). This replacements are only applied where conveys the "adjusted" cputime to users, and where already uses task_times() near by it. (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.) This patch have a positive side effect: - Before this patch, if a group contains many short-life threads (e.g. runs 0.9ms and not interrupted by ticks), the group's cputime could be invisible since thread's cputime was accumulated after adjusted: imagine adjustment function as adj(ticks, runtime), {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0. After this patch it will not happen because the adjustment is applied after accumulated. v2: - remove if()s, put new variables into signal_struct. Signed-off-by:
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: Spencer Candland <spencer@bluehost.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <4B162517.8040909@jp.fujitsu.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Hidetoshi Seto authored
commit 761b1d26 upstream. Originally task_s/utime() were designed to return clock_t but later changed to return cputime_t by following commit: commit efe567fc Author: Christian Borntraeger <borntraeger@de.ibm.com> Date: Thu Aug 23 15:18:02 2007 +0200 It only changed the type of return value, but not the implementation. As the result the granularity of task_s/utime() is still that of clock_t, not that of cputime_t. So using task_s/utime() in __exit_signal() makes values accumulated to the signal struct to be rounded and coarse grained. This patch removes casts to clock_t in task_u/stime(), to keep granularity of cputime_t over the calculation. v2: Use div_u64() to avoid error "undefined reference to `__udivdi3`" on some 32bit systems. Signed-off-by:
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: xiyou.wangcong@gmail.com Cc: Spencer Candland <spencer@bluehost.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Lin Ming authored
commit 0696b711 upstream. Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by:
Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Kurt Garloff <garloff@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Martin Schwidefsky authored
commit eed3b9cf upstream. On a system with NOHZ=y tick_check_idle calls tick_nohz_stop_idle and tick_nohz_update_jiffies. Given the right conditions (ts->idle_active and/or ts->tick_stopped) both function get a time stamp with ktime_get. The same time stamp can be reused if both function require one. On s390 this change has the additional benefit that gcc inlines the tick_nohz_stop_idle function into tick_check_idle. The number of instructions to execute tick_check_idle drops from 225 to 144 (without the ktime_get optimization it is 367 vs 215 instructions). before: 0) | tick_check_idle() { 0) | tick_nohz_stop_idle() { 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.601 us | } 0) 1.765 us | } 0) 3.047 us | } 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.570 us | } 0) 1.727 us | } 0) | tick_do_update_jiffies64() { 0) 0.609 us | } 0) 8.055 us | } after: 0) | tick_check_idle() { 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.617 us | } 0) 1.773 us | } 0) | tick_do_update_jiffies64() { 0) 0.593 us | } 0) 4.477 us | } Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: john stultz <johnstul@us.ibm.com> LKML-Reference: <20090929122533.206589318@de.ibm.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
John Jolly <jjolly@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Martin Schwidefsky authored
commit 3c5d92a0 upstream. Allow the architecture to request a normal jiffy tick when the system goes idle and tick_nohz_stop_sched_tick is called . On s390 the hook is used to prevent the system going fully idle if there has been an interrupt other than a clock comparator interrupt since the last wakeup. On s390 the HiperSockets response time for 1 connection ping-pong goes down from 42 to 34 microseconds. The CPU cost decreases by 27%. Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> LKML-Reference: <20090929122533.402715150@de.ibm.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
John Jolly <jjolly@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit da495ecc upstream. We kstrdup the options string, but then strsep screws with the pointer, so when we kfree() it, we're not giving it the right pointer. Tested-by:
Andy Lutomirski <luto@mit.edu> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Shaohua Li authored
commit 3f6fae95 upstream. My test do: fallocate a big file and do write. The file is 512M, but after file write is done btrfs-debug-tree shows: item 6 key (257 EXTENT_DATA 0) itemoff 3516 itemsize 53 extent data disk byte 1103101952 nr 536870912 extent data offset 0 nr 399634432 ram 536870912 extent compression 0 Looks like a regression introducted by 6c7d54ac, where we set wrong slot. Signed-off-by:
Shaohua Li <shaohua.li@intel.com> Acked-by:
Yan Zheng <zheng.yan@oracle.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-