- 13 Aug, 2010 40 commits
-
-
David Teigland authored
commit 7fe2b319 upstream. When both blocking and completion callbacks are queued for lock, the dlm would always deliver the completion callback (cast) first. In some cases the blocking callback (bast) is queued before the cast, though, and should be delivered first. This patch keeps track of the order in which they were queued and delivers them in that order. This patch also keeps track of the granted mode in the last cast and eliminates the following bast if the bast mode is compatible with the preceding cast mode. This happens when a remotely mastered lock is demoted, e.g. EX->NL, in which case the local node queues a cast immediately after sending the demote message. In this way a cast can be queued for a mode, e.g. NL, that makes an in-transit bast extraneous. Signed-off-by:
David Teigland <teigland@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
David Teigland authored
commit 573c24c4 upstream. Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by:
David Teigland <teigland@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jeff Mahoney authored
commit 6cb4aff0 upstream. Commit 57fe60df ("reiserfs: add atomic addition of selinux attributes during inode creation") contains a bug that will cause it to oops when mounting a file system that didn't previously contain extended attributes on a system using security.* xattrs. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15309Signed-off-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jeff Mahoney authored
commit 3f8b5ee3 upstream. The reiserfs journal behaves inconsistently when determining whether to allow a mount of a read-only device. This is due to the use of the continue_replay variable to short circuit the journal scanning. If it's set, it's assumed that there are transactions to replay, but there may not be. If it's unset, it's assumed that there aren't any, and that may not be the case either. I've observed two failure cases: 1) Where a clean file system on a read-only device refuses to mount 2) Where a clean file system on a read-only device passes the optimization and then tries writing the journal header to update the latest mount id. The former is easily observable by using a freshly created file system on a read-only loopback device. This patch moves the check into journal_read_transaction, where it can bail out before it's about to replay a transaction. That way it can go through and skip transactions where appropriate, yet still refuse to mount a file system with outstanding transactions. Signed-off-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric Sandeen authored
commit 15121c18 upstream. We have 2 mount options, "barrier" and "auto_da_alloc" which may or may not take a 1/0 argument. This causes the ext4 superblock mount code to subtract uninitialized pointers and pass the result to kmalloc, which results in very noisy failures. Per Ted's suggestion, initialize the args struct so that we know whether match_token() found an argument for the option, and skip match_int() if not. Also, return error (0) from parse_options if we thought we found an argument, but match_int() Fails. Reported-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Theodore Ts'o authored
commit 1f5a81e4 upstream. Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the donor file is an append-only file, we should not allow the operation to proceed, lest we end up overwriting the contents of an append-only file. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Darrick J. Wong authored
commit 455c0d71 upstream. Earlier, Ingo Molnar posted a patch to make it so that the kernel would avoid reading _PPC on his broken T60. Unfortunately, it seems that with Thomas Renninger's patch last July to eliminate _PPC evaluations when the processor driver loads, the kernel never actually reads _PPC at all! This is problematic if you happen to boot your non-T60 computer in a state where the BIOS _wants_ _PPC to be something other than zero. So, put the _PPC evaluation back into acpi_processor_get_performance_info if ignore_ppc isn't 1. Signed-off-by:
Darrick J. Wong <djwong@us.ibm.com> Signed-off-by:
Len Brown <len.brown@intel.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Breno Leitao authored
commit 8d3d50bf upstream. During a EEH recover, the pci_dev structure can be null, mainly if an eeh event is detected during cpi config operation. In this case, the pci_dev will not be known (and will be null) the kernel will crash with the following message: Unable to handle kernel paging request for data at address 0x000000a0 Faulting instruction address: 0xc00000000006b8b4 Oops: Kernel access of bad area, sig: 11 [#1] NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0 LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 Call Trace: [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70 The bug occurs because pci_name() tries to access a null pointer. This patch just guarantee that pci_name() is not called on Null pointers. Signed-off-by:
Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by:
Linas Vepstas <linasvepstas@gmail.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Wu Fengguang authored
commit 1668bfd5 upstream. Don't try to isolate a still mapped page. Otherwise we will hit the BUG_ON(page_mapped(page)) in __remove_from_page_cache(). Signed-off-by:
Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Renninger <trenn@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Wu Fengguang authored
commit 9b9a29ec upstream. (PG_swapbacked && !PG_lru) pages should not happen. Better to treat them as unknown pages. Signed-off-by:
Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
Andi Kleen <ak@linux.intel.com> Signed-off-by:
Thomas Renninger <trenn@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric W. Biederman authored
commit fad53995 upstream. Iranna D Ankad reported that IBM x3950 systems have boot problems after this commit: | | commit b9c61b70 | | x86/pci: update pirq_enable_irq() to setup io apic routing | The problem is that with the patch, the machine freezes when console=ttyS0,... kernel serial parameter is passed. It seem to freeze at DVD initialization and the whole problem seem to be DVD/pata related, but somehow exposed through the serial parameter. Such apic problems can expose really weird behavior: ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[0]) IOAPIC[0]: apic_id 16, version 0, address 0xfecff000, GSI 0-2 ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[3]) IOAPIC[1]: apic_id 15, version 0, address 0xfec00000, GSI 3-38 ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[39]) IOAPIC[2]: apic_id 14, version 0, address 0xfec01000, GSI 39-74 ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 4 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 5 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 6 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 7 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 9 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 10 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 11 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 12 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 15 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 16 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 17 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 18 dfl dfl) It turns out that the system has three io apic controllers, but boot ioapic routing is in the second one, and that gsi_base is not 0 - it is using a bunch of INT_SRC_OVR... So these recent changes: 1. one set routing for first io apic controller 2. assume irq = gsi ... will break that system. So try to remap those gsis, need to seperate boot_ioapic_idx detection out of enable_IO_APIC() and call them early. So introduce boot_ioapic_idx, and remap_ioapic_gsi()... -v2: shift gsi with delta instead of gsi_base of boot_ioapic_idx -v3: double check with find_isa_irq_apic(0, mp_INT) to get right boot_ioapic_idx -v4: nr_legacy_irqs -v5: add print out for boot_ioapic_idx, and also make it could be applied for current kernel and previous kernel -v6: add bus_irq, in acpi_sci_ioapic_setup, so can get overwride for sci right mapping... -v7: looks like pnpacpi get irq instead of gsi, so need to revert them back... -v8: split into two patches -v9: according to Eric, use fixed 16 for shifting instead of remap -v10: still need to touch rsparser.c -v11: just revert back to way Eric suggest... anyway the ioapic in first ioapic is blocked by second... -v12: two patches, this one will add more loop but check apic_id and irq > 16 Reported-by:
Iranna D Ankad <iranna.ankad@in.ibm.com> Bisected-by:
Iranna D Ankad <iranna.ankad@in.ibm.com> Tested-by:
Gary Hade <garyhade@us.ibm.com> Signed-off-by:
Yinghai Lu <yinghai@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: len.brown@intel.com LKML-Reference: <4B8A321A.1000008@kernel.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Shaohui Zheng authored
commit ea085417 upstream. Newly added memory can not be accessed via /dev/mem, because we do not update the variables high_memory, max_pfn and max_low_pfn. Add a function update_end_of_memory_vars() to update these variables for 64-bit kernels. [akpm@linux-foundation.org: simplify comment] Signed-off-by:
Shaohui Zheng <shaohui.zheng@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Li Haicheng <haicheng.li@intel.com> Reviewed-by:
Wu Fengguang <fengguang.wu@intel.com> Reviewed-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Song Youquan authored
commit 863b557a upstream. When load aesni-intel and ghash_clmulni-intel driver,kernel will complain no test for some internal used algorithm. The strange information as following: alg: No test for __aes-aesni (__driver-aes-aesni) alg: No test for __ecb-aes-aesni (__driver-ecb-aes-aesni) alg: No test for __cbc-aes-aesni (__driver-cbc-aes-aesni) alg: No test for __ecb-aes-aesni (cryptd(__driver-ecb-aes-aesni) alg: No test for __ghash (__ghash-pclmulqdqni) alg: No test for __ghash (cryptd(__ghash-pclmulqdqni)) This patch add NULL test entries for these algorithm and driver. Signed-off-by:
Song Youquan <youquan.song@intel.com> Signed-off-by:
Hang Ying <ying.huang@intel.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Acked-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
FUJITA Tomonori authored
commit e2a46567 upstream. It's possible that SBA IOMMU might fail to find I/O space under heavy I/Os. SBA IOMMU panics on allocation failure but it shouldn't; drivers can handle the failure. The majority of other IOMMU drivers don't panic on allocation failure. This patch fixes SBA IOMMU path to handle allocation failure properly. Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Tony Luck <tony.luck@intel.com> Acked-by:
Leonardo Chiquitto <lchiquitto@novell.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Benjamin Herrenschmidt authored
commit 4b402210 upstream. Due to recent load-balancer changes that delay the task migration to the next wakeup, the adaptive mutex spinning ends up in a live lock when the owner's CPU gets offlined because the cpu_online() check lives before the owner running check. This patch changes mutex_spin_on_owner() to return 0 (don't spin) in any case where we aren't sure about the owner struct validity or CPU number, and if the said CPU is offline. There is no point going back & re-evaluate spinning in corner cases like that, let's just go to sleep. Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271212509.13059.135.camel@pasglop> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Hidetoshi Seto authored
commit 0cf55e1e upstream. This is a real fix for problem of utime/stime values decreasing described in the thread: http://lkml.org/lkml/2009/11/3/522 Now cputime is accounted in the following way: - {u,s}time in task_struct are increased every time when the thread is interrupted by a tick (timer interrupt). - When a thread exits, its {u,s}time are added to signal->{u,s}time, after adjusted by task_times(). - When all threads in a thread_group exits, accumulated {u,s}time (and also c{u,s}time) in signal struct are added to c{u,s}time in signal struct of the group's parent. So {u,s}time in task struct are "raw" tick count, while {u,s}time and c{u,s}time in signal struct are "adjusted" values. And accounted values are used by: - task_times(), to get cputime of a thread: This function returns adjusted values that originates from raw {u,s}time and scaled by sum_exec_runtime that accounted by CFS. - thread_group_cputime(), to get cputime of a thread group: This function returns sum of all {u,s}time of living threads in the group, plus {u,s}time in the signal struct that is sum of adjusted cputimes of all exited threads belonged to the group. The problem is the return value of thread_group_cputime(), because it is mixed sum of "raw" value and "adjusted" value: group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time) This misbehavior can break {u,s}time monotonicity. Assume that if there is a thread that have raw values greater than adjusted values (e.g. interrupted by 1000Hz ticks 50 times but only runs 45ms) and if it exits, cputime will decrease (e.g. -5ms). To fix this, we could do: group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time) But task_times() contains hard divisions, so applying it for every thread should be avoided. This patch fixes the above problem in the following way: - Modify thread's exit (= __exit_signal()) not to use task_times(). It means {u,s}time in signal struct accumulates raw values instead of adjusted values. As the result it makes thread_group_cputime() to return pure sum of "raw" values. - Introduce a new function thread_group_times(*task, *utime, *stime) that converts "raw" values of thread_group_cputime() to "adjusted" values, in same calculation procedure as task_times(). - Modify group's exit (= wait_task_zombie()) to use this introduced thread_group_times(). It make c{u,s}time in signal struct to have adjusted values like before this patch. - Replace some thread_group_cputime() by thread_group_times(). This replacements are only applied where conveys the "adjusted" cputime to users, and where already uses task_times() near by it. (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.) This patch have a positive side effect: - Before this patch, if a group contains many short-life threads (e.g. runs 0.9ms and not interrupted by ticks), the group's cputime could be invisible since thread's cputime was accumulated after adjusted: imagine adjustment function as adj(ticks, runtime), {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0. After this patch it will not happen because the adjustment is applied after accumulated. v2: - remove if()s, put new variables into signal_struct. Signed-off-by:
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: Spencer Candland <spencer@bluehost.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <4B162517.8040909@jp.fujitsu.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Hidetoshi Seto authored
commit 761b1d26 upstream. Originally task_s/utime() were designed to return clock_t but later changed to return cputime_t by following commit: commit efe567fc Author: Christian Borntraeger <borntraeger@de.ibm.com> Date: Thu Aug 23 15:18:02 2007 +0200 It only changed the type of return value, but not the implementation. As the result the granularity of task_s/utime() is still that of clock_t, not that of cputime_t. So using task_s/utime() in __exit_signal() makes values accumulated to the signal struct to be rounded and coarse grained. This patch removes casts to clock_t in task_u/stime(), to keep granularity of cputime_t over the calculation. v2: Use div_u64() to avoid error "undefined reference to `__udivdi3`" on some 32bit systems. Signed-off-by:
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: xiyou.wangcong@gmail.com Cc: Spencer Candland <spencer@bluehost.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Lin Ming authored
commit 0696b711 upstream. Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by:
Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Kurt Garloff <garloff@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Martin Schwidefsky authored
commit eed3b9cf upstream. On a system with NOHZ=y tick_check_idle calls tick_nohz_stop_idle and tick_nohz_update_jiffies. Given the right conditions (ts->idle_active and/or ts->tick_stopped) both function get a time stamp with ktime_get. The same time stamp can be reused if both function require one. On s390 this change has the additional benefit that gcc inlines the tick_nohz_stop_idle function into tick_check_idle. The number of instructions to execute tick_check_idle drops from 225 to 144 (without the ktime_get optimization it is 367 vs 215 instructions). before: 0) | tick_check_idle() { 0) | tick_nohz_stop_idle() { 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.601 us | } 0) 1.765 us | } 0) 3.047 us | } 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.570 us | } 0) 1.727 us | } 0) | tick_do_update_jiffies64() { 0) 0.609 us | } 0) 8.055 us | } after: 0) | tick_check_idle() { 0) | ktime_get() { 0) | read_tod_clock() { 0) 0.617 us | } 0) 1.773 us | } 0) | tick_do_update_jiffies64() { 0) 0.593 us | } 0) 4.477 us | } Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: john stultz <johnstul@us.ibm.com> LKML-Reference: <20090929122533.206589318@de.ibm.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
John Jolly <jjolly@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Martin Schwidefsky authored
commit 3c5d92a0 upstream. Allow the architecture to request a normal jiffy tick when the system goes idle and tick_nohz_stop_sched_tick is called . On s390 the hook is used to prevent the system going fully idle if there has been an interrupt other than a clock comparator interrupt since the last wakeup. On s390 the HiperSockets response time for 1 connection ping-pong goes down from 42 to 34 microseconds. The CPU cost decreases by 27%. Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> LKML-Reference: <20090929122533.402715150@de.ibm.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
John Jolly <jjolly@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit da495ecc upstream. We kstrdup the options string, but then strsep screws with the pointer, so when we kfree() it, we're not giving it the right pointer. Tested-by:
Andy Lutomirski <luto@mit.edu> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Shaohua Li authored
commit 3f6fae95 upstream. My test do: fallocate a big file and do write. The file is 512M, but after file write is done btrfs-debug-tree shows: item 6 key (257 EXTENT_DATA 0) itemoff 3516 itemsize 53 extent data disk byte 1103101952 nr 536870912 extent data offset 0 nr 399634432 ram 536870912 extent compression 0 Looks like a regression introducted by 6c7d54ac, where we set wrong slot. Signed-off-by:
Shaohua Li <shaohua.li@intel.com> Acked-by:
Yan Zheng <zheng.yan@oracle.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Aneesh Kumar K.V authored
commit 23b5c509 upstream. This version of the i_size fix for fallocate makes sure we only update the i_size when the current fallocate is really operating outside of i_size. Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit efd049fb upstream. When running the following fio job [torrent] filename=torrent-test rw=randwrite size=4g filesize=4g bs=4k ioengine=sync you would see long stalls where no work was being done. That is because we were doing all this extra work to read in the file extent outside of the transaction, however in the random io case this ends up hurting us because the file extents are not there to begin with. So axe this logic, since we end up reading in the file extent when we go to update it anyway. This took the fio job from 11 mb/s with several ~10 second stalls to 24 mb/s to a couple of 1-2 second stalls. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Yan, Zheng authored
commit 7a7965f8 upstream. When dropping a empty tree, walk_down_tree() skips checking extent information for the tree root. This will triggers a BUG_ON in walk_up_proc(). Signed-off-by:
Yan Zheng <zheng.yan@oracle.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Miao Xie authored
commit d7ce5843 upstream. Mounting a bad filesystem caused a BUG_ON(). The following is steps to reproduce it. # mkfs.btrfs /dev/sda2 # mount /dev/sda2 /mnt # mkfs.btrfs /dev/sda1 /dev/sda2 (the program says that /dev/sda2 was mounted, and then exits. ) # umount /mnt # mount /dev/sda1 /mnt At the third step, mkfs.btrfs exited in the way of make filesystem. So the initialization of the filesystem didn't finish. So the filesystem was bad, and it caused BUG_ON() when mounting it. But BUG_ON() should be called by the wrong code, not user's operation, so I think it is a bug of btrfs. This patch fixes it. Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Roel Kluin authored
commit 014e4ac4 upstream. It appears the error return should be negative Signed-off-by:
Roel Kluin <roel.kluin@gmail.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Yan, Zheng authored
commit f044ba78 upstream. Increase extent buffer's reference count while holding the lock. Otherwise it can race with try_release_extent_buffer. Signed-off-by:
Yan Zheng <zheng.yan@oracle.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit 035fe03a upstream. If you have a disk failure in RAID1 and then add a new disk to the array, and then try to remove the missing volume, it will fail. The reason is the sanity check only looks at the total number of rw devices, which is just 2 because we have 2 good disks and 1 bad one. Instead check the total number of devices in the array to make sure we can actually remove the device. Tested this with a failed disk setup and with this test we can now run btrfs-vol -r missing /mount/point and it works fine. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit 7f59203a upstream. Hit this problem while testing RAID1 failure stuff. open_bdev_exclusive returns ERR_PTR(), not NULL. So change the return value properly. This is important if you accidently specify a device that doesn't exist when trying to add a new device to an array, you will panic the box dereferencing bdev. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit f48b9075 upstream. If a RAID setup has chunks that span multiple disks, and one of those disks has failed, btrfs_chunk_readonly will return 1 since one of the disks in that chunk's stripes is dead and therefore not writeable. So instead if we are in degraded mode, return 0 so we can go ahead and allocate stuff. Without this patch all of the block groups in a RAID1 setup will end up read-only, which will mean we can't add new disks to the array since we won't be able to make allocations. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit e3acc2a6 upstream. This patch revert's commit 6c090a11 Since it introduces this problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Yang Hongyang authored
commit f858153c upstream. In btrfs_init_acl() cloned acl is not released Signed-off-by:
Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Aneesh Kumar K.V authored
commit d1ea6a61 upstream. commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit 11dfe35a upstream. We can race with the unmount of an fs and the stopping of a kthread where we will free the block group before we're done using it. The reason for this is because we do not hold a reference on the block group while its caching, since the allocator drops its reference once it exits or moves on to the next block group. This patch fixes the problem by taking a reference to the block group before we start caching and dropping it when we're done to make sure all accesses to the block group are safe. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Chris Mason authored
commit a9cc71a6 upstream. It is legal for btrfs_set_acl to be sent a NULL acl. This makes sure we don't dereference it. A similar patch was sent by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Josef Bacik authored
commit 6c090a11 upstream. Currently orphan cleanup only ever gets triggered if we cross subvolumes during a lookup, which means that if we just mount a plain jane fs that has orphans in it, they will never get cleaned up. This results in panic's like these http://www.kerneloops.org/oops.php?number=1109085 where adding an orphan entry results in -EEXIST being returned and we panic. In order to fix this, we check to see on lookup if our root has had the orphan cleanup done, and if not go ahead and do it. This is easily reproduceable by running this testcase #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <unistd.h> #include <stdio.h> int main(int argc, char **argv) { char data[4096]; char newdata[4096]; int fd1, fd2; memset(data, 'a', 4096); memset(newdata, 'b', 4096); while (1) { int i; fd1 = creat("file1", 0666); if (fd1 < 0) break; for (i = 0; i < 512; i++) write(fd1, data, 4096); fsync(fd1); close(fd1); fd2 = creat("file2", 0666); if (fd2 < 0) break; ftruncate(fd2, 4096 * 512); for (i = 0; i < 512; i++) write(fd2, newdata, 4096); close(fd2); i = rename("file2", "file1"); unlink("file1"); } return 0; } and then pulling the power on the box, and then trying to run that test again when the box comes back up. I've tested this locally and it fixes the problem. Thanks to Tomas Carnecky for helping me track this down initially. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Yan, Zheng authored
commit 6c7d54ac upstream. Fix bug reported by Johannes Hirte. The reason of that bug is btrfs_del_items is called after btrfs_duplicate_item and btrfs_del_items triggers tree balance. The fix is check that case and call btrfs_search_slot when needed. Signed-off-by:
Yan Zheng <zheng.yan@oracle.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jiri Slaby authored
commit 2423fdfb upstream. Stanse found 2 memory leaks in relocate_block_group and __btrfs_map_block. cluster and multi are not freed/assigned on all paths. Fix that. Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Cc: linux-btrfs@vger.kernel.org Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Yan, Zheng authored
commit a038fab0 upstream. Some callers of btrfs_ordered_update_i_size can now pass in a NULL for the ordered extent to update against. This makes sure we properly align the offset they pass in when deciding how much to bump the on disk i_size. Signed-off-by:
Chris Mason <chris.mason@oracle.com> Acked-by:
Jeff Mahoney <jeffm@suse.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-