- 18 May, 2014 23 commits
-
-
Miao Xie authored
commit 1c70d8fb upstream. Currently, with the inode cache enabled, we reuse an inode's id immediately after unlinking the file, so we may hit something like the following:

    |->iput inode
    |->return inode id into inode cache
    |->create dir,fsync
    |->power off

An easy way to reproduce this problem is:

    mkfs.btrfs -f /dev/sdb
    mount /dev/sdb /mnt -o inode_cache,commit=100
    dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
    inode_id=`ls -i /mnt/data | awk '{print $1}'`
    rm -f /mnt/data
    i=1
    while [ 1 ]
    do
            mkdir /mnt/dir_$i
            test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
            if [ $test1 -eq $inode_id ]
            then
                    dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
                    echo b > /proc/sysrq-trigger
            fi
            sleep 1
            i=$(($i+1))
    done

    mount /dev/sdb /mnt
    umount /dev/sdb
    btrfs check /dev/sdb

We fix this problem by adding the unlinked inode's id into the pinned tree, so that it cannot be reused until the transaction is committed. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
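A minimal sketch of the pinning idea, assuming the kernel context; helper names follow fs/btrfs/inode-map.c loosely and the exact signatures in the 3.2 backport may differ:

    /* on unlink: park the freed objectid in the pinned tree rather
     * than the free-ino cache, so it cannot be handed out again
     * before the transaction that logged the unlink commits */
    void btrfs_return_ino(struct btrfs_root *root, u64 objectid)
    {
            struct btrfs_free_space_ctl *pinned = root->free_ino_pinned;

            __btrfs_add_free_space(pinned, objectid, 1);
    }

    /* at transaction commit: only now migrate pinned ids back into
     * the allocatable inode cache */
    btrfs_unpin_free_ino(root);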
-
Stefan Behrens authored
commit ff76b056 upstream. Due to an off-by-one error, it is possible to reproduce a bug when the inode cache is used. The same inode number is assigned twice, and the second time this leads to an EEXIST in btrfs_insert_empty_items(). The issue can happen when a file is removed right after a subvolume is created and then a new inode number is created before the inodes in free_inode_pinned are processed. unlink() calls btrfs_return_ino(), which calls start_caching() in this case, which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by searching for the highest inode (which already cannot find the unlinked one anymore in btrfs_find_free_objectid()). So if this unlinked inode's number is equal to highest_ino + 1 (or >= this value instead of > this value, which was the off-by-one error), we mustn't add the inode number to free_ino_pinned (caching_thread() does it right). In this case we need to try directly to add the number to the inode_cache, which will fail in this case. When this inode number is allocated while it is still in free_ino_pinned, it is allocated and still added to the free inode cache when the pinned inodes are processed, thus one of the following inode number allocations will get an inode that is already in use and fail with EEXIST in btrfs_insert_empty_items().

One example which was created with the reproducer below:

    Create a snapshot, work in the newly created snapshot for the rest.
    In unlink(inode 34284) call btrfs_return_ino() which calls
    start_caching().
    start_caching() calls add_free_space [34284, 18446744073709517077].
    In btrfs_return_ino(), call start_caching pinned [34284, 1] which is
    wrong.
    mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
    btrfs_unpin_free_ino calls add_free_space [34284, 1].
    mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
    EEXIST when the new inode is inserted.

One possible reproducer is this one:

    #!/bin/sh
    # preparation
    TEST_DEV=/dev/sdc1
    TEST_MNT=/mnt
    umount ${TEST_MNT} 2>/dev/null || true
    mkfs.btrfs -f ${TEST_DEV}
    mount ${TEST_DEV} ${TEST_MNT} -o \
            rw,relatime,compress=lzo,space_cache,inode_cache
    btrfs subv create ${TEST_MNT}/s1
    for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
    btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
    FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
    rm ${TEST_MNT}/s2/$FILENAME
    touch ${TEST_MNT}/s2/$FILENAME

    # the following steps can be repeated to reproduce the issue again and again
    [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
    btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
    rm ${TEST_MNT}/s3/$FILENAME
    touch ${TEST_MNT}/s3/$FILENAME

    ls -alFi ${TEST_MNT}/s?/$FILENAME
    touch ${TEST_MNT}/s3/_1 || logger FAILED
    ls -alFi ${TEST_MNT}/s?/_1
    touch ${TEST_MNT}/s3/_2 || logger FAILED
    ls -alFi ${TEST_MNT}/s?/_2
    touch ${TEST_MNT}/s3/__1 || logger FAILED
    ls -alFi ${TEST_MNT}/s?/__1
    touch ${TEST_MNT}/s3/__2 || logger FAILED
    ls -alFi ${TEST_MNT}/s?/__2

    # if the above is not enough, add the following loop:
    for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
    #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done

    # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
    # already in use that btrfs_find_ino_for_alloc() returns.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
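A sketch of the corrected comparison (names illustrative, not the literal diff); recall that start_caching() has already added [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] to the free-ino cache:

    if (objectid >= root->highest_objectid + 1) {
            /* the range start_caching() added already covers this
             * number, so add it directly to the inode cache, where
             * the insert harmlessly fails as a duplicate.  The old
             * test used '>', so objectid == highest_ino + 1 was
             * wrongly pinned and later handed out twice. */
            __btrfs_add_free_space(ctl, objectid, 1);
    } else {
            /* genuinely stale number: pin it until the transaction
             * commits, as caching_thread() already does */
            __btrfs_add_free_space(pinned, objectid, 1);
    }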
-
Johan Hovold authored
commit 10164c2a upstream. Fix driver new_id sysfs-attribute removal deadlock by making sure to not hold any locks that the attribute operations grab when removing the attribute. Specifically, usb_serial_deregister holds the table mutex when deregistering the driver, which includes removing the new_id attribute. This can lead to a deadlock as writing to new_id increments the attribute's active count before trying to grab the same mutex in usb_serial_probe. The deadlock can easily be triggered by inserting a sleep in usb_serial_deregister and writing the id of an unbound device to new_id during module unload. As the table mutex (in this case) is used to prevent subdriver unload during probe, it should be sufficient to only hold the lock while manipulating the usb-serial driver list during deregister. A racing probe will then either fail to find a matching subdriver or fail to get the corresponding module reference. Since v3.15-rc1 this also triggers the following lockdep warning:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.15.0-rc2 #123 Tainted: G        W
    -------------------------------------------------------
    modprobe/190 is trying to acquire lock:
     (s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94

    but task is already holding lock:
     (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (table_lock){+.+.+.}:
           [<c0075f84>] __lock_acquire+0x1694/0x1ce4
           [<c0076de8>] lock_acquire+0xb4/0x154
           [<c03af3cc>] _raw_spin_lock+0x4c/0x5c
           [<c02bbc24>] usb_store_new_id+0x14c/0x1ac
           [<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
           [<c025f568>] drv_attr_store+0x30/0x3c
           [<c01690e0>] sysfs_kf_write+0x5c/0x60
           [<c01682c0>] kernfs_fop_write+0xd4/0x194
           [<c010881c>] vfs_write+0xbc/0x198
           [<c0108e4c>] SyS_write+0x4c/0xa0
           [<c000f880>] ret_fast_syscall+0x0/0x48

    -> #0 (s_active#4){++++.+}:
           [<c03a7a28>] print_circular_bug+0x68/0x2f8
           [<c0076218>] __lock_acquire+0x1928/0x1ce4
           [<c0076de8>] lock_acquire+0xb4/0x154
           [<c0166b70>] __kernfs_remove+0x254/0x310
           [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
           [<c0169fb8>] remove_files.isra.1+0x48/0x84
           [<c016a2fc>] sysfs_remove_group+0x58/0xac
           [<c016a414>] sysfs_remove_groups+0x34/0x44
           [<c02623b8>] driver_remove_groups+0x1c/0x20
           [<c0260e9c>] bus_remove_driver+0x3c/0xe4
           [<c026235c>] driver_unregister+0x38/0x58
           [<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
           [<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
           [<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
           [<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
           [<c009d6cc>] SyS_delete_module+0x184/0x210
           [<c000f880>] ret_fast_syscall+0x0/0x48

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(table_lock);
                                   lock(s_active#4);
                                   lock(table_lock);
      lock(s_active#4);

     *** DEADLOCK ***

    1 lock held by modprobe/190:
     #0:  (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]

    stack backtrace:
    CPU: 0 PID: 190 Comm: modprobe Tainted: G        W    3.15.0-rc2 #123
    [<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
    [<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
    [<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
    [<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
    [<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
    [<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
    [<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
    [<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
    [<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
    [<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
    [<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
    [<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
    [<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
    [<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
    [<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
    [<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
    [<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
    [<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
    [<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)

Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
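The shape of the fix, sketched against the usb-serial core (surrounding context assumed; the 3.2 backport adjusts it):

    static void usb_serial_deregister(struct usb_serial_driver *device)
    {
            pr_info("USB Serial deregistering driver %s\n",
                    device->description);

            /* only guard the driver-list manipulation; do not hold
             * table_lock across the bus deregistration that removes
             * the new_id attribute */
            mutex_lock(&table_lock);
            list_del(&device->driver_list);
            mutex_unlock(&table_lock);

            usb_serial_bus_deregister(device);
    }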
-
Xiangyu Lu authored
commit 80bb3ef1 upstream. On big-endian systems, "%1" gets the most significant part of the value, causing the instruction to produce the wrong result. When viewing ftrace records on big-endian ARM systems, we found that the timestamps were wrong:

    swapper-0     [001] 1325.970000:  0:120:R   ==> [001]   16:120:R events/1
    events/1-16   [001] 1325.970000: 16:120:S   ==> [001]    0:120:R swapper
    swapper-0     [000] 1325.1000000: 0:120:R     + [000]   15:120:R events/0
    swapper-0     [000] 1325.1000000: 0:120:R   ==> [000]   15:120:R events/0
    swapper-0     [000] 1326.030000:  0:120:R     + [000] 1150:120:R sshd
    swapper-0     [000] 1326.030000:  0:120:R   ==> [000] 1150:120:R sshd

Viewing ftrace records calls the do_div(n, base) function, which is implemented in arch/arm/include/asm/div64.h. When n = 10000000 and base = 1000000, do_div(n, base) executes "umull %Q0, %R0, %1, %Q2". Reviewed-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Alex Wu <wuquanming@huawei.com> Signed-off-by: Xiangyu Lu <luxiangyu@huawei.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
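The gist of the fix, as a sketch with the surrounding div64.h code elided: on ARM, the %Q and %R operand modifiers select the low and high 32-bit halves of a 64-bit operand regardless of endianness, whereas a bare %1 names the first register of the pair, which is the high half on big-endian.

    /* before: wrong on big-endian, "%1" picks the high word there */
    asm ("umull %Q0, %R0, %1, %Q2" : ...);

    /* after: "%Q1" is the low word on either endianness */
    asm ("umull %Q0, %R0, %Q1, %Q2" : ...);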
-
Linus Torvalds authored
commit 1b17844b upstream. fixup_user_fault() is used by the futex code when the direct user access fails, and the futex code wants it to either map in the page in a usable form or return an error. It relied on handle_mm_fault() to map the page, and correctly checked the error return from that, but while that does map the page, it doesn't actually guarantee that the page will be mapped with sufficient permissions to be then accessed. So do the appropriate tests of the vma access rights by hand. [ Side note: arguably handle_mm_fault() could just do that itself, but we have traditionally done it in the caller, because some callers - notably get_user_pages() - have been able to access pages even when they are mapped with PROT_NONE. Maybe we should re-visit that design decision, but in the meantime this is the minimal patch. ] Found by Dave Jones running his trinity tool. Reported-by: Dave Jones <davej@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
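A sketch of the added check (mm-internal context assumed; the flag-to-rights mapping follows the commit's description):

    vm_flags = (fault_flags & FAULT_FLAG_WRITE) ? VM_WRITE : VM_READ;

    vma = find_extend_vma(mm, address);
    if (!vma || address < vma->vm_start)
            return -EFAULT;

    /* new: fail if the vma was not mapped with the needed rights,
     * instead of assuming handle_mm_fault() enforces them */
    if (!(vm_flags & vma->vm_flags))
            return -EFAULT;

    ret = handle_mm_fault(mm, vma, address, fault_flags);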
-
Hans de Goede authored
commit 46a2986e upstream. We expect that all the Haswell series will need such quirks, sigh. The T431s seems to be T430 hardware in a T440s case, using the T440s touchpad, with the same min/max issue. The X1 Carbon 3rd generation name says 2nd while it is a 3rd generation. The X1 and T431s share a PnPID with the T540p, but the reported ranges are closer to those of the T440s. HdG: Squashed 5 quirk patches into one. T431s + L440 + L540 are written by me, S1 Yoga and X1 are written by Benjamin Tissoires. Hdg: Standardized S1 Yoga and X1 values, Yoga uses the same touchpad as the X240, X1 uses the same touchpad as the T440. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jani Nikula authored
commit 5017b285 upstream. dmi_match() considers a substring match to be a successful match. This is not always sufficient to distinguish between DMI data for different systems. Add support for exact string matching using strcmp() in addition to the substring matching using strstr(). The specific use case in the i915 driver is to allow us to use an exact match for D510MO, without also incorrectly matching D510MOV:

    {
            .ident = "Intel D510MO",
            .matches = {
                    DMI_MATCH(DMI_BOARD_VENDOR, "Intel"),
                    DMI_EXACT_MATCH(DMI_BOARD_NAME, "D510MO"),
            },
    }

Signed-off-by: Jani Nikula <jani.nikula@intel.com> Cc: <annndddrr@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Cornel Panceac <cpanceac@gmail.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
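A sketch of the comparison this adds; struct and field names are assumed for illustration, the real code lives in drivers/firmware/dmi_scan.c:

    /* each match slot carries an exact_match bit, set by
     * DMI_EXACT_MATCH() and clear for the legacy DMI_MATCH() */
    static bool dmi_string_matches(const struct dmi_strmatch *m)
    {
            const char *s = dmi_get_system_info(m->slot);

            if (s == NULL)
                    return false;
            if (m->exact_match)
                    return strcmp(s, m->substr) == 0;  /* whole string */
            return strstr(s, m->substr) != NULL;       /* substring */
    }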
-
Mizuma, Masayoshi authored
commit 7848a4bf upstream. The soft lockup when freeing a large number of hugepages, fixed in commit 55f67141 ("mm: hugetlb: fix softlockup when a large number of hugepages are freed."), can also happen in return_unused_surplus_pages(), so let's fix it there as well, as sketched below. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
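The same remedy as the earlier fix, sketched under the assumption that the caller holds hugetlb_lock (node-mask name assumed for the 3.2 context):

    while (nr_pages--) {
            if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
                    break;
            /* drop and retake hugetlb_lock if a reschedule is due,
             * so a huge free doesn't monopolize the CPU */
            cond_resched_lock(&hugetlb_lock);
    }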
-
Dan Williams authored
commit 8a4aeec8 upstream. The AHCI spec allows implementations to issue commands in tag order rather than FIFO order:

    5.3.2.12 P:SelectCmd
    HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
    or HBA selects the command to issue that has had the PxCI bit set to '1'
    longer than any other command pending to be issued.

The result is that commands posted sequentially (time-wise) may play out of sequence when issued by hardware. This behavior has likely been hidden by drives that arrange for commands to complete in issue order. However, it appears recent drives (two from different vendors that we have found so far) inflict out-of-order completions as a matter of course. So, we need to take care to maintain ordered submission, otherwise we risk triggering a drive to fall out of sequential-io automation and back to random-io processing, which incurs large latency and degrades throughput. This issue was found in simple benchmarks where QD=2 seq-write performance was 30-50% *greater* than QD=32 seq-write performance. Tagging for -stable and making the change globally since it has a low risk-to-reward ratio. Also, word is that recent versions of an unnamed OS also do it this way now, so drives in the field are already experienced with this tag ordering scheme. Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ed Ciechanowski <ed.ciechanowski@intel.com> Reviewed-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
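A sketch of ordered tag allocation in the libata command setup, simplified from the upstream change (field names assumed):

    /* hand out tags in circular order starting just after the last
     * issued tag, so controllers that issue lowest-tag-first still
     * see commands in submission order */
    for (i = 0, tag = ap->last_tag + 1; i < max_queue; i++, tag++) {
            tag = tag < max_queue ? tag : 0;

            if (!test_and_set_bit(tag, &ap->qc_allocated)) {
                    qc = __ata_qc_from_tag(ap, tag);
                    qc->tag = tag;
                    ap->last_tag = tag;
                    break;
            }
    }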
-
Jeff Layton authored
commit 3758cf7e upstream. ...otherwise the logic in the timeout handling doesn't work correctly. Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [bwh: Backported to 3.2: max_cb_time() takes no parameters] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Michael Ulbricht authored
commit 895d240d upstream. By specifying NO_UNION_NORMAL, the ACM driver only uses the first two USB interfaces (modem data & control). The AT Port, Diagnostic and NMEA interfaces are left to the USB serial driver. Signed-off-by: Michael Ulbricht <michael.ulbricht@systec-electronic.com> Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com> Signed-off-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
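The shape of such a quirk entry in the cdc-acm id table; the VID/PID below are placeholders, not the ones this commit adds:

    { USB_DEVICE(0x1234, 0x5678),           /* hypothetical modem */
      .driver_info = NO_UNION_NORMAL,       /* claim only data + control */
    },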
-
Aaron Sanders authored
commit b16c02fb upstream. Add device ids to pl2303 for the Hewlett-Packard HP POS pole displays:

    LD960:  03f0:0B39
    LCM220: 03f0:3139
    LCM960: 03f0:3239

[ Johan: fix indentation and sort PIDs numerically ] Signed-off-by: Aaron Sanders <aaron.sanders@hp.com> Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
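A sketch of the resulting table entries, derived from the IDs above (macro names assumed; vendor id 0x03f0 is HP's):

    #define HP_LD960_PRODUCT_ID     0x0B39
    #define HP_LCM220_PRODUCT_ID    0x3139
    #define HP_LCM960_PRODUCT_ID    0x3239

    /* in the pl2303 usb_device_id table: */
    { USB_DEVICE(HP_VENDOR_ID, HP_LD960_PRODUCT_ID) },
    { USB_DEVICE(HP_VENDOR_ID, HP_LCM220_PRODUCT_ID) },
    { USB_DEVICE(HP_VENDOR_ID, HP_LCM960_PRODUCT_ID) },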
-
Tristan Bruns authored
commit 72b30079 upstream. Signed-off-by: Tristan Bruns <tristan@tristanbruns.de> Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniele Palmas authored
commit d6de486b upstream. Add the VID/PID for the Telit UE910v2 modem to the option driver. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Johan Hovold authored
commit 2e01280d upstream. This reverts commit 1ebca9da. This device was erroneously added to the sierra driver even though it's not a Sierra device and was already handled by the option driver. Cc: Richard Farina <sidhayn@gmail.com> Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Michele Baldessari authored
commit efe26e16 upstream. Custom VID/PIDs for Brainboxes cards as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1071914 Signed-off-by: Michele Baldessari <michele@acksyn.org> Signed-off-by: Johan Hovold <jhovold@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Larry Finger authored
commit f764cd68 upstream. Zero-initializing ether_type masked the fact that the ether type was never actually obtained for 802.1x packets, so the comparison against eapol_type would always fail. Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
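An illustrative sketch of the pattern, not the literal driver diff (offsets follow the standard Ethernet header; ETH_P_PAE is the EAPOL ethertype 0x888e):

    __be16 be_type;
    u16 ether_type;

    /* actually read the type field from the frame instead of
     * relying on a zero initialization that hides failures */
    memcpy(&be_type, pframe + 2 * ETH_ALEN, sizeof(be_type));
    ether_type = ntohs(be_type);

    if (ether_type == ETH_P_PAE) {
            /* handle the 802.1x / EAPOL frame */
    }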
-
Chris Mason authored
commit c98235cb upstream. The mlx4 driver is triggering schedules while atomic inside mlx4_en_netpoll:

    spin_lock_irqsave(&cq->lock, flags);
    napi_synchronize(&cq->napi);
    ^^^^^ msleep here
    mlx4_en_process_rx_cq(dev, cq, 0);
    spin_unlock_irqrestore(&cq->lock, flags);

This was part of a patch by Alexander Guller from Mellanox in 2011, but it still isn't upstream. Signed-off-by: Chris Mason <clm@fb.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
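The shape of the fix, as a sketch: drop the lock and the synchronize entirely and just kick NAPI, which may not sleep (struct layout assumed from the mlx4_en driver):

    static void mlx4_en_netpoll(struct net_device *dev)
    {
            struct mlx4_en_priv *priv = netdev_priv(dev);
            struct mlx4_en_cq *cq;
            int i;

            /* no spinlock, no napi_synchronize(): schedule NAPI for
             * every rx completion queue and let it do the polling */
            for (i = 0; i < priv->rx_ring_num; i++) {
                    cq = &priv->rx_cq[i];
                    napi_schedule(&cq->napi);
            }
    }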
-
Jeff Layton authored
commit f1c6bb2c upstream. A fl->fl_break_time of 0 has a special meaning to the lease break code that basically means "never break the lease". knfsd uses this to ensure that leases don't disappear out from under it. Unfortunately, the code in __break_lease can end up passing this value to wait_event_interruptible as a timeout, which prevents it from going to sleep at all. This makes __break_lease spin in a tight loop and causes soft lockups. Fix this by ensuring that we pass a minimum value of 1 as a timeout instead. Cc: J. Bruce Fields <bfields@fieldses.org> Reported-by: Terry Barnaby <terry1@beam.ltd.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
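The gist, sketched from the description (fs/locks.c context assumed):

    break_time = flock->fl_break_time;
    if (break_time != 0)
            break_time -= jiffies;
    /* fl_break_time == 0 means "never break"; a 0 timeout would make
     * the wait return immediately and spin, so wait at least 1 jiffy */
    if (break_time == 0)
            break_time++;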
-
Helge Deller authored
commit ab3e55b1 upstream. This bug was detected with the libio-epoll-perl debian package where the test case IO-Ppoll-compat.t failed. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Theodore Ts'o authored
commit 6e6358fc upstream. We haven't taken i_mutex yet, so we need to use i_size_read(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Matthew Wilcox authored
commit 9503c67c upstream. ext4_end_bio() currently throws away the error that it receives. Chances are this is part of a spate of errors, one of which will end up getting the error returned to userspace somehow, but we shouldn't take that risk. Also print out the errno to aid in debug. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kazuya Mio authored
commit 4adb6ab3 upstream. When we try to get 2^32-1 block of the file which has the extent (ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON in ext4_ext_put_gap_in_cache(). To avoid the problem, ext4_map_blocks() needs to check the file logical block number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot handle 2^32-1 because the maximum file logical block number is 2^32-2. Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid. So ext4_map_blocks() should also return the same errno. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
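The gist of the added check, as a sketch (EXT_MAX_BLOCKS is the real ext4 constant; the surrounding ext4_map_blocks() context is assumed):

    /* the extent code cannot represent a gap at block 2^32 - 1, so
     * refuse to map logical blocks past the format's maximum,
     * matching the -EIO the indirect-block path already returns */
    if (unlikely(map->m_lblk >= EXT_MAX_BLOCKS))
            return -EIO;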
-
- 30 Apr, 2014 17 commits
-
-
Ben Hutchings authored
-
Ben Hutchings authored
This reverts commit 584ec122, which was commit ddfadd77 upstream. It causes boot failure on 3.2 although no such problem occurs upstream. Reported-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Dan Williams <dan.j.williams@intel.com>
-
Ben Hutchings authored
This reverts commit b93b90ff, which was commit 0ef38d70 upstream. It was intended to fix a regression which never occurred in 3.2.
-
Mikulas Patocka authored
commit 22c73795 upstream. This patch reorders reported frequencies from the highest to the lowest, just like in other frequency drivers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [bwh: Backported to 3.2: cpufreq_frequency_table::driver_data is called index] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mikulas Patocka authored
commit d82b922a upstream. The powernow-k6 driver used to read the initial multiplier from the powernow register. However, there is a problem with this: * If there was a frequency transition before, the multiplier read from the register corresponds to the current multiplier. * If there was no frequency transition since reset, the field in the register always reads as zero, regardless of the current multiplier that is set using switches on the mainboard and that the CPU is running at. The zero value corresponds to multiplier 4.5, so as a consequence, the powernow-k6 driver always assumes multiplier 4.5. For example, if we have 550MHz CPU with bus frequency 100MHz and multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5 and bus frequency is 122MHz. The powernow-k6 driver then sets the multiplier to 4.5, underclocking the CPU to 450MHz, but reports the current frequency as 550MHz. There is no reliable way how to read the initial multiplier. I modified the driver so that it contains a table of known frequencies (based on parameters of existing CPUs and some common overclocking schemes) and sets the multiplier according to the frequency. If the frequency is unknown (because of unusual overclocking or underclocking), the user must supply the bus speed and maximum multiplier as module parameters. This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [bwh: Backported to 3.2: - Adjust context - s/driver_data/index/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
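A sketch of the fallback interface described above; parameter names are assumed, and the table of known frequencies lives in the driver itself:

    /* when the measured CPU frequency is not in the table of known
     * configurations, the user supplies these at module load time */
    static int param_busfreq;        /* front-side bus clock, in kHz */
    static int param_max_multiplier; /* maximum multiplier, times 10 */

    module_param_named(bus_frequency, param_busfreq, int, S_IRUGO);
    MODULE_PARM_DESC(bus_frequency, "Bus frequency in kHz");
    module_param_named(max_multiplier, param_max_multiplier, int, S_IRUGO);
    MODULE_PARM_DESC(max_multiplier, "Maximum multiplier (x10)");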
-
Mikulas Patocka authored
commit e20e1d0a upstream. I found out that a system with k6-3+ processor is unstable during network server load. The system locks up or the network card stops receiving. The reason for the instability is the CPU frequency scaling. During frequency transition the processor is in "EPM Stop Grant" state. The documentation says that the processor doesn't respond to inquiry requests in this state. Consequently, coherency of processor caches and bus master devices is not maintained, causing the system instability. This patch flushes the cache during frequency transition. It fixes the instability.

Other minor changes:
* u64 invalue changed to unsigned long because the variable is 32-bit
* move the logic to set the multiplier to a separate function powernow_k6_set_cpu_multiplier
* preserve lower 5 bits of the powernow port instead of 4 (the voltage field has 5 bits)
* mask interrupts when reading the multiplier, so that the port is not open during other activity (running other kernel code with the port open shouldn't cause any misbehavior, but we should better be safe and keep the port closed)

This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
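A sketch of the transition sequence this implies, simplified and with the port I/O elided (the real function is powernow_k6_set_cpu_multiplier; wbinvd() is an assumption for "flush the cache"):

    unsigned long flags;

    local_irq_save(flags);  /* nothing may run while the port is open */
    wbinvd();               /* write back and invalidate caches, since
                             * inquiry requests go unanswered in the
                             * EPM Stop Grant state */
    /* ... open the PowerNow! port, write the new multiplier while
     * preserving the low 5 voltage bits, then close the port ... */
    local_irq_restore(flags);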
-
Paul Moore authored
commit f64410ec upstream. This patch is based on an earlier patch by Eric Paris, he describes the problem below: "If an inode is accessed before policy load it will get placed on a list of inodes to be initialized after policy load. After policy load we call inode_doinit() which calls inode_doinit_with_dentry() on all inodes accessed before policy load. In the case of inodes in procfs that means we'll end up at the bottom where it does:

    /* Default to the fs superblock SID. */
    isec->sid = sbsec->sid;

    if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
            if (opt_dentry) {
                    isec->sclass = inode_mode_to_security_class(...)
                    rc = selinux_proc_get_sid(opt_dentry,
                                              isec->sclass,
                                              &sid);
                    if (rc)
                            goto out_unlock;
                    isec->sid = sid;
            }
    }

Since opt_dentry is null, we'll never call selinux_proc_get_sid() and will leave the inode labeled with the label on the superblock. I believe a fix would be to mimic the behavior of xattrs. Look for an alias of the inode. If it can't be found, just leave the inode uninitialized (and pick it up later) if it can be found, we should be able to call selinux_proc_get_sid() ..." On a system exhibiting this problem, you will notice a lot of files in /proc with the generic "proc_t" type (at least the ones that were accessed early in the boot), for example:

    # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
    system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax

However, with this patch in place we see the expected result:

    # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
    system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax

Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
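A sketch of the alias lookup the fix performs (kernel context assumed; this mirrors the xattr path as the quoted analysis suggests):

    /* no dentry was passed in (post-policy-load relabel): look for
     * an alias of the inode instead of giving up */
    struct dentry *dentry = d_find_alias(inode);

    if (dentry) {
            rc = selinux_proc_get_sid(dentry, isec->sclass, &sid);
            dput(dentry);
            if (!rc)
                    isec->sid = sid;
    }
    /* if no alias exists, leave the inode uninitialized; it will be
     * picked up again on a later lookup */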
-
Jiri Slaby authored
commit a94cdd1f upstream. In read_all_bytes, we do

    unsigned char i;
    ...
    bt->read_data[0] = BMC2HOST;
    bt->read_count = bt->read_data[0];
    ...
    for (i = 1; i <= bt->read_count; i++)
            bt->read_data[i] = BMC2HOST;

If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the 'for' loop. Make 'i' an 'int' instead of 'char' to get rid of the overflow and finish the loop after 255 iterations every time. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-and-debugged-by: Rui Hui Dian <rhdian@novell.com> Cc: Tomas Cech <tcech@suse.cz> Cc: Corey Minyard <minyard@acm.org> Cc: <openipmi-developer@lists.sourceforge.net> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Qiang Huang authored
In commit e4af376d ("drivers: hv: switch to use mb() instead of smp_mb()"), the adjustment mistakenly dropped the change in hv_ringbuffer_read(), so add it back. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Grover authored
commit 2c42be2d upstream. ft_del_tpg checks tpg->tport is set before unlinking the tpg from the tport when the tpg is being removed. Set this pointer in ft_tport_create, or the unlinking won't happen in ft_del_tpg and tport->tpg will reference a deleted object. This patch sets tpg->tport in ft_tport_create, because that's what ft_del_tpg checks, and it is the only way to get back to the tport to clear tport->tpg.

The bug was occurring when:

- lport created, tport (our per-lport, per-provider context) is allocated. tport->tpg = NULL
- tpg created
- a PRLI is received. ft_tport_create is called, tpg is found and tport->tpg is set
- tpg removed. ft_tpg is freed in ft_del_tpg. Since tpg->tport was not set, tport->tpg is not cleared and points at freed memory
- Future calls to ft_tport_create return tport via first conditional, instead of searching for new tpg by calling ft_lport_find_tpg. tport->tpg is still invalid, and will access freed memory.

see https://bugzilla.redhat.com/show_bug.cgi?id=1071340 Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
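The essence of the fix, sketched (tcm_fc context assumed):

    /* in ft_tport_create(), once the ft_tpg has been found: */
    tport->tpg = tpg;
    tpg->tport = tport;     /* previously never set, so ft_del_tpg()
                             * skipped the unlink and tport->tpg kept
                             * pointing at freed memory */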
-
H. Peter Anvin authored
commit b3b42ac2 upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 32-bit mode. Since 16-bit support is somewhat crippled anyway on a 64-bit kernel (no V86 mode), and most (if not quite all) 64-bit processors support virtualization for the users who really need it, simply reject attempts at creating a 16-bit segment when running on top of a 64-bit kernel. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
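A sketch of where the ban lands, in modify_ldt's write path (variable names per arch/x86/kernel/ldt.c; surrounding context assumed):

    #ifdef CONFIG_X86_64
            /* IRET restores only the low 16 bits of %rsp when
             * returning to a 16-bit segment, and the 32-bit espfix
             * workaround is unavailable here, so refuse outright */
            if (!ldt_info.seg_32bit) {
                    error = -EINVAL;
                    goto out_unlock;
            }
    #endif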
-
Rafał Miłecki authored
commit 12cd43c6 upstream. Register B43_MMIO_PSM_PHY_HDR is a 16-bit one, so accessing it with 32-bit functions isn't safe. On my machine it causes a delayed (!) CPU exception:

    Disabling lock debugging due to kernel taint
    mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
    mce: [Hardware Error]: TSC 164083803dc
    mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
    mce: [Hardware Error]: Run the above through 'mcelog --ascii'
    mce: [Hardware Error]: Machine check: Processor context corrupt
    Kernel panic - not syncing: Fatal machine check on current CPU
    Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
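The gist of the fix, as an illustrative sketch (the b43 16-bit accessors are real; the read-modify-write around them is assumed):

    /* pair the access width to the register: B43_MMIO_PSM_PHY_HDR is
     * 16 bits wide, so use the 16-bit accessors, not read32/write32 */
    u16 tmp = b43_read16(dev, B43_MMIO_PSM_PHY_HDR);
    b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp);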
-
Jens Axboe authored
commit e39435ce upstream. I got a bug report yesterday from Laszlo Ersek in which he states that his kvm instance fails to suspend. Laszlo bisected it down to this commit 1cf7e9c6 ("virtio_blk: blk-mq support") where virtio-blk is converted to use the blk-mq infrastructure. After digging a bit, it became clear that the issue was with the queue drain. blk-mq tracks queue usage in a percpu counter, which is incremented on request alloc and decremented when the request is freed. The initial hunt was for an inconsistency in blk-mq, but everything seemed fine. In fact, the counter only returned crazy values when suspend was in progress. When a CPU is unplugged, the percpu counter code merges that CPU's state with the general state. blk-mq takes care to register a hotcpu notifier with the appropriate priority, so we know it runs after the percpu counter notifier. However, the percpu counter notifier only merges the state when the CPU is fully gone. This leaves a state transition where the CPU going away is no longer in the online mask, yet it still holds private values. This means that in this state, percpu_counter_sum() returns invalid results, and the suspend then hangs waiting for abs(dead-cpu-value) requests to complete which of course will never happen. Fix this by clearing the state earlier, so we never have a case where the CPU isn't in the online mask but still holds private state. This bug has been there since forever; I guess we don't have a lot of users where percpu counters need to be reliable during the suspend cycle. Signed-off-by: Jens Axboe <axboe@fb.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Takashi Iwai authored
commit 4f8e9400 upstream. PCM pointer callbacks in ice1712 driver check the buffer size boundary wrongly between bytes and frames. This leads to PCM core warnings like:

    snd_pcm_update_hw_ptr0: 105 callbacks suppressed
    ALSA pcm_lib.c:352 BUG: pcmC3D0c:0, pos = 5461, buffer size = 5461, period size = 2730

This patch fixes these checks to be placed after the proper unit conversions. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
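A sketch of the corrected ordering in a pointer callback (the ALSA bytes_to_frames() helper is real; the surrounding driver code is assumed):

    /* convert the hardware byte position to frames first ... */
    ptr = bytes_to_frames(substream->runtime, ptr);
    /* ... and only then apply the frame-based boundary check */
    if (ptr == substream->runtime->buffer_size)
            ptr = 0;
    return ptr;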
-
Oleg Nesterov authored
commit dfccbb5e upstream. wait_task_zombie() first does the EXIT_ZOMBIE->EXIT_DEAD transition and drops tasklist_lock. If this task is not the natural child and it is traced, we change its state back to EXIT_ZOMBIE for ->real_parent. The last transition is racy; this is even documented in 50b8d257 "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race". wait_consider_task() tries to detect this transition and clear ->notask_error, but we can't rely on ptrace_reparented(): the debugger can exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE. And there is another problem which was missed before: this transition can also race with reparent_leader(), which doesn't reset ->exit_signal if EXIT_DEAD, assuming that this task must be reaped by someone else. So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if /sbin/init doesn't use __WALL it becomes unreapable. Change reparent_leader() to update ->exit_signal even if EXIT_DEAD. Note: this is the simple temporary hack for -stable, it doesn't try to solve all problems, it will be reverted by the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Reported-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Lennart Poettering <lpoetter@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mizuma, Masayoshi authored
commit 55f67141 upstream. When I decrease the value of nr_hugepage in procfs a lot, softlockup happens. It is because there is no chance of context switch during this process. On the other hand, when I allocate a large number of hugepages, there is some chance of context switch. Hence softlockup doesn't happen during this process. So it's necessary to add a context switch in the freeing process, the same as in the allocating process, to avoid softlockup. When I freed 12 TB of hugepages with kernel-2.6.32-358.el6, the freeing process occupied a CPU over 150 seconds and the following softlockup message appeared twice or more.

    $ echo 6000000 > /proc/sys/vm/nr_hugepages
    $ cat /proc/sys/vm/nr_hugepages
    6000000
    $ grep ^Huge /proc/meminfo
    HugePages_Total:   6000000
    HugePages_Free:    6000000
    HugePages_Rsvd:          0
    HugePages_Surp:          0
    Hugepagesize:         2048 kB
    $ echo 0 > /proc/sys/vm/nr_hugepages

    BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883]
    ...
    Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
    Call Trace:
      free_pool_huge_page+0xb8/0xd0
      set_max_huge_pages+0x128/0x190
      hugetlb_sysctl_handler_common+0x113/0x140
      hugetlb_sysctl_handler+0x1e/0x20
      proc_sys_call_handler+0x97/0xd0
      proc_sys_write+0x14/0x20
      vfs_write+0xb8/0x1a0
      sys_write+0x51/0x90
      __audit_syscall_exit+0x265/0x290
      system_call_fastpath+0x16/0x1b

I have not confirmed this problem with upstream kernels because I am not able to prepare a machine equipped with 12TB memory now. However I confirmed that the amount of decreasing hugepages was directly proportional to the amount of required time. I measured required times on a smaller machine. It showed 130-145 hugepages decreased in a millisecond.

    Amount of decreasing     Required time   Decreasing rate
    hugepages                (msec)          (pages/msec)
    ------------------------------------------------------------
    10,000 pages == 20GB      70 -  74       135-142
    30,000 pages == 60GB     208 - 229       131-144

It means a decrement of 6TB of hugepages will trigger softlockup with the default threshold of 20sec, at this decreasing rate. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Vlastimil Babka authored
commit 57e68e9c upstream. A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin fuzzing with trinity. The call site try_to_unmap_cluster() does not lock the pages other than its check_page parameter (which is already locked). The BUG_ON in mlock_vma_page() is not documented and its purpose is somewhat unclear, but apparently it serializes against page migration, which could otherwise fail to transfer the PG_mlocked flag. This would not be fatal, as the page would be eventually encountered again, but NR_MLOCK accounting would become distorted nevertheless. This patch adds a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that effect. The call site try_to_unmap_cluster() is fixed so that for page != check_page, trylock_page() is attempted (to avoid possible deadlocks as we already have check_page locked) and mlock_vma_page() is performed only upon success. If the page lock cannot be obtained, the page is left without PG_mlocked, which is again not a problem in the whole unevictable memory design. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
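A sketch of the fixed locking in try_to_unmap_cluster(), simplified (mm-internal context assumed; the loop over the cluster's pages is elided):

    if (!(vma->vm_flags & VM_LOCKED))
            continue;

    if (page == check_page) {
            /* the caller already holds this page's lock */
            mlock_vma_page(page);
            ret = SWAP_MLOCK;
    } else if (trylock_page(page)) {
            /* other pages: trylock to avoid deadlock; losing the
             * race just leaves the page without PG_mlocked, which
             * the unevictable design tolerates */
            if (!PageMlocked(page))
                    mlock_vma_page(page);
            unlock_page(page);
    }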
-