- 30 Apr, 2014 40 commits
-
Jens Axboe authored
commit e39435ce upstream. I got a bug report yesterday from Laszlo Ersek in which he states that his kvm instance fails to suspend. Laszlo bisected it down to this commit 1cf7e9c6 ("virtio_blk: blk-mq support") where virtio-blk is converted to use the blk-mq infrastructure. After digging a bit, it became clear that the issue was with the queue drain. blk-mq tracks queue usage in a percpu counter, which is incremented on request alloc and decremented when the request is freed. The initial hunt was for an inconsistency in blk-mq, but everything seemed fine. In fact, the counter only returned crazy values when suspend was in progress. When a CPU is unplugged, the percpu counter code merges that CPU's state into the general state. blk-mq takes care to register a hotcpu notifier with the appropriate priority, so we know it runs after the percpu counter notifier. However, the percpu counter notifier only merges the state when the CPU is fully gone. This leaves a state transition where the CPU going away is no longer in the online mask, yet it still holds private values. This means that in this state, percpu_counter_sum() returns invalid results, and the suspend then hangs waiting for abs(dead-cpu-value) requests to complete, which of course will never happen. Fix this by clearing the state earlier, so we never have a case where the CPU isn't in the online mask but still holds private state. This bug has been there since forever; I guess we don't have a lot of users where percpu counters need to be reliable during the suspend cycle. Signed-off-by: Jens Axboe <axboe@fb.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
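A minimal sketch of the hotplug-time merge involved, assuming the struct percpu_counter layout (lock, count, counters) from <linux/percpu_counter.h>; this illustrates the idea, not the upstream patch itself:

    #include <linux/percpu_counter.h>
    #include <linux/percpu.h>

    /* Fold a departing CPU's private delta into the shared count under the
     * counter lock, so that percpu_counter_sum() never sees a CPU that has
     * already left the online mask but still holds a private value. */
    static void fold_dead_cpu(struct percpu_counter *fbc, unsigned int cpu)
    {
            unsigned long flags;
            s32 *pcount;

            raw_spin_lock_irqsave(&fbc->lock, flags);
            pcount = per_cpu_ptr(fbc->counters, cpu);
            fbc->count += *pcount;
            *pcount = 0;
            raw_spin_unlock_irqrestore(&fbc->lock, flags);
    }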
-
Takashi Iwai authored
commit 4f8e9400 upstream. The PCM pointer callbacks in the ice1712 driver check the buffer size boundary against the wrong unit, mixing bytes and frames. This leads to PCM core warnings like: snd_pcm_update_hw_ptr0: 105 callbacks suppressed ALSA pcm_lib.c:352 BUG: pcmC3D0c:0, pos = 5461, buffer size = 5461, period size = 2730 This patch fixes these checks to be placed after the proper unit conversions. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Oleg Nesterov authored
commit dfccbb5e upstream. wait_task_zombie() first does the EXIT_ZOMBIE->EXIT_DEAD transition and drops tasklist_lock. If this task is not the natural child and it is traced, we change its state back to EXIT_ZOMBIE for ->real_parent. The last transition is racy; this is even documented in 50b8d257 "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race". wait_consider_task() tries to detect this transition and clear ->notask_error, but we can't rely on ptrace_reparented(): the debugger can exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE. And there is another problem which was missed before: this transition can also race with reparent_leader(), which doesn't reset ->exit_signal if EXIT_DEAD, assuming that this task must be reaped by someone else. So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if /sbin/init doesn't use __WALL it becomes unreapable. Change reparent_leader() to update ->exit_signal even if EXIT_DEAD. Note: this is a simple temporary hack for -stable; it doesn't try to solve all problems, and it will be reverted by the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Reported-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Lennart Poettering <lpoetter@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mizuma, Masayoshi authored
commit 55f67141 upstream. When I decrease the value of nr_hugepages in procfs a lot, a softlockup happens. It is because there is no chance of a context switch during this process. On the other hand, when I allocate a large number of hugepages, there is some chance of a context switch, so a softlockup doesn't happen during that process. So it is necessary to add a context switch in the freeing process, just as in the allocating process, to avoid the softlockup. When I freed 12 TB of hugepages with kernel-2.6.32-358.el6, the freeing process occupied a CPU for over 150 seconds and the following softlockup message appeared twice or more. $ echo 6000000 > /proc/sys/vm/nr_hugepages $ cat /proc/sys/vm/nr_hugepages 6000000 $ grep ^Huge /proc/meminfo HugePages_Total: 6000000 HugePages_Free: 6000000 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB $ echo 0 > /proc/sys/vm/nr_hugepages BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ... Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1 Call Trace: free_pool_huge_page+0xb8/0xd0 set_max_huge_pages+0x128/0x190 hugetlb_sysctl_handler_common+0x113/0x140 hugetlb_sysctl_handler+0x1e/0x20 proc_sys_call_handler+0x97/0xd0 proc_sys_write+0x14/0x20 vfs_write+0xb8/0x1a0 sys_write+0x51/0x90 __audit_syscall_exit+0x265/0x290 system_call_fastpath+0x16/0x1b I have not confirmed this problem with upstream kernels because I am not able to prepare a machine equipped with 12 TB of memory now. However, I confirmed that the number of hugepages decreased was directly proportional to the required time. I measured the required times on a smaller machine; it showed 130-145 hugepages decreased per millisecond: 10,000 pages == 20 GB took 70-74 msec (135-142 pages/msec); 30,000 pages == 60 GB took 208-229 msec (131-144 pages/msec). At this decreasing rate, a decrement of 6 TB of hugepages will trigger a softlockup with the default 20-second threshold. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
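The essence of the fix, as a hedged sketch (the loop and the free_one_pool_huge_page() helper are simplified stand-ins, not the real set_max_huge_pages() code):

    #include <linux/sched.h>

    /* Free pages one at a time and offer a context switch after each one,
     * mirroring the allocation path, so that a huge decrease of nr_hugepages
     * cannot monopolize a CPU past the soft-lockup threshold. */
    static void shrink_hugepage_pool(unsigned long nr_to_free)
    {
            while (nr_to_free--) {
                    if (!free_one_pool_huge_page())  /* hypothetical helper */
                            break;
                    cond_resched();                  /* the actual fix */
            }
    }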
-
Vlastimil Babka authored
commit 57e68e9c upstream. A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin fuzzing with trinity. The call site try_to_unmap_cluster() does not lock the pages other than its check_page parameter (which is already locked). The BUG_ON in mlock_vma_page() is not documented and its purpose is somewhat unclear, but apparently it serializes against page migration, which could otherwise fail to transfer the PG_mlocked flag. This would not be fatal, as the page would be eventually encountered again, but NR_MLOCK accounting would become distorted nevertheless. This patch adds a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that effect. The call site try_to_unmap_cluster() is fixed so that for page != check_page, trylock_page() is attempted (to avoid possible deadlocks as we already have check_page locked) and mlock_vma_page() is performed only upon success. If the page lock cannot be obtained, the page is left without PG_mlocked, which is again not a problem in the whole unevictable memory design. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
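A sketch of the corrected call-site pattern, simplified from the description above (the surrounding scan loop and return-value handling are omitted):

    /* Only check_page arrives locked; any other page must be trylocked first
     * to avoid deadlock, and is skipped if its lock cannot be taken. */
    if (page == check_page) {
            mlock_vma_page(page);           /* already locked by the caller */
    } else if (trylock_page(page)) {
            mlock_vma_page(page);
            unlock_page(page);
    }
    /* If trylock_page() fails, the page is simply left without PG_mlocked. */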
-
Nicholas Bellinger authored
commit d444edc6 upstream. This patch fixes a long-standing bug in iscsit_build_conn_drop_async_message() where during ERL=2 connection recovery, a bogus conn_p pointer could end up being used to send the ISCSI_OP_ASYNC_EVENT + DROPPING_CONNECTION notifying the initiator that cmd->logout_cid has failed. The bug was manifesting itself as an OOPs in iscsit_allocate_cmd() with a bogus conn_p pointer in iscsit_build_conn_drop_async_message(). Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com> Reported-by: santosh kulkarni <santosh.kulkarni@calsoftinc.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
alex chen authored
commit f7cf4f5b upstream. Do not put bh when buffer_uptodate failed in ocfs2_write_block and ocfs2_write_super_or_backup, because it will put bh in b_end_io. Otherwise it will hit a warning "VFS: brelse: Trying to free free buffer". Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Junxiao Bi authored
commit ded2cf71 upstream. There is a race window in dlm_do_recovery() between dlm_remaster_locks() and dlm_reset_recovery() when the recovery master has nearly finished the recovery process for a dead node. After the master sends the FINALIZE_RECO message in dlm_remaster_locks(), another node may become the recovery master for another dead node and then send the BEGIN_RECO message to all the nodes, including the old master. In the old master's handler for this message, dlm_begin_reco_handler(), dlm->reco.dead_node and dlm->reco.new_master will be set to the second dead node and the new master; then in dlm_reset_recovery(), these two variables will be reset to their default values. This will prevent the new recovery master from finishing the recovery process, leaving it hung, and eventually the whole cluster hangs in recovery. old recovery master: new recovery master: dlm_remaster_locks() become recovery master for another dead node. dlm_send_begin_reco_message() dlm_begin_reco_handler() { if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) { return -EAGAIN; } dlm_set_reco_master(dlm, br->node_idx); dlm_set_reco_dead_node(dlm, br->dead_node); } dlm_reset_recovery() { dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM); dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM); } will hang in dlm_remaster_locks() waiting for the dlm lock info requests Before sending the FINALIZE_RECO message, the recovery master should set DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery is done; this closes the race window, as BEGIN_RECO messages will not be handled before the DLM_RECO_STATE_FINALIZE flag is cleared. A similar race may happen between the new recovery master and a normal node which is in dlm_finalize_reco_handler(); fix that as well. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
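In sketch form, the ordering the fix enforces on the recovery master (helper and field names follow ocfs2/dlm but are shown illustratively, not verbatim):

    spin_lock(&dlm->spinlock);
    dlm->reco.state |= DLM_RECO_STATE_FINALIZE;     /* BEGIN_RECO now gets -EAGAIN */
    spin_unlock(&dlm->spinlock);

    /* ... send FINALIZE_RECO to all nodes and complete recovery ... */

    spin_lock(&dlm->spinlock);
    dlm->reco.state &= ~DLM_RECO_STATE_FINALIZE;    /* only now may a new BEGIN_RECO
                                                       update reco.dead_node/new_master */
    spin_unlock(&dlm->spinlock);
    dlm_reset_recovery(dlm);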
-
Junxiao Bi authored
commit 34aa8dac upstream. This issue was introduced by commit 800deef3 ("ocfs2: use list_for_each_entry where benefical") in 2007, where it replaced list_for_each with list_for_each_entry. The variable "lock" will point to invalid data if the "tmpq" list is empty, and a panic will be triggered because of this. Sunil advised reverting it back, but the old version was also not right. At the end of the outer for loop, that list_for_each_entry will also leave "lock" pointing to invalid data; then in the next iteration, if the "tmpq" list is empty, "lock" will be stale invalid data and cause the panic. So revert to list_for_each and reset "lock" to NULL to fix this issue. Another concern is that this seemingly cannot happen because the "tmpq" list should not be empty. Let me describe how it can. old lock resource owner(node 1): migration target(node 2): imagine there's a lockres with an EX lock from node 2 in the granted list and an NR lock from node x with convert_type EX in the converting list. dlm_empty_lockres() { dlm_pick_migration_target() { pick node 2 as target as its lock is the first one in granted list. } dlm_migrate_lockres() { dlm_mark_lockres_migrating() { res->state |= DLM_LOCK_RES_BLOCK_DIRTY; wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res)); //after the above code, we can not dirty lockres any more, // so dlm_thread shuffle list will not run downconvert lock from EX to NR upconvert lock from NR to EX <<< migration may schedule out here, then <<< node 2 send down convert request to convert type from EX to <<< NR, then send up convert request to convert type from NR to <<< EX, at this time, lockres granted list is empty, and two locks <<< in the converting list, node x up convert lock followed by <<< node 2 up convert lock. // will set lockres RES_MIGRATING flag, the following // lock/unlock can not run dlm_lockres_release_ast(dlm, res); } dlm_send_one_lockres() dlm_process_recovery_data() for (i=0; i<mres->num_locks; i++) if (ml->node == dlm->node_num) for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) { list_for_each_entry(lock, tmpq, list) if (lock) break; <<< lock is invalid as grant list is empty. } if (lock->ml.node != ml->node) BUG() >>> crash here } I saw the above lock status in a vmcore from one of our internal bugs. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
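The underlying pitfall, as a generic hedged sketch rather than the dlmrecovery.c code itself: after list_for_each_entry() completes without a match, the cursor aliases the list head, so a later "if (lock)" test proves nothing; keeping the match in a separate, NULL-initialized pointer (or reverting to list_for_each as the patch does) avoids dereferencing garbage.

    #include <linux/list.h>

    struct item { struct list_head list; int key; };

    static struct item *find_item(struct list_head *q, int key)
    {
            struct item *found = NULL;      /* reset for every list we scan */
            struct item *it;

            list_for_each_entry(it, q, list) {
                    if (it->key == key) {
                            found = it;     /* remember the real entry */
                            break;
                    }
            }
            /* 'it' must not be used here: on an empty list or a full scan it
             * points at the list head cast to struct item, i.e. garbage. */
            return found;
    }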
-
Matt Fleming authored
commit a0c32761 upstream. Kees reported the following error: arch/sh/kernel/dumpstack.c: In function 'print_trace_address': arch/sh/kernel/dumpstack.c:118:2: error: format not a string literal and no format arguments [-Werror=format-security] Use the "%s" format so that it's impossible to interpret 'data' as a format string. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Reported-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
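The class of fix, reduced to a sketch:

    printk("%s", data);     /* safe: 'data' can never be parsed as a format string */
    /* whereas printk(data) lets a stray '%' in 'data' be interpreted as a format */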
-
Alex Deucher authored
commit 16086279 upstream. This needs to be done to update some of the fields in the connector structure used by the audio code. Noticed by several users on irc. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Christopher Friedt authored
commit aa6de142 upstream. Previously, the vmwgfx_fb driver would allow users to call FBIOSET_VINFO, but it would not adjust the FINFO properly, resulting in distorted screen rendering. The patch corrects that behaviour. See https://bugs.gentoo.org/show_bug.cgi?id=494794 for examples. Signed-off-by: Christopher Friedt <chrisfriedt@gmail.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jeff Mahoney authored
commit 01d88857 upstream. jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2) The -ENOENT is due to readdir calling dir_emit on the same entry twice. If the dir_emit callback sleeps and the tree is changed underneath us, we won't be able to trust deh_offset(deh) anymore. We need to save next_pos before we might sleep so we can find the next entry. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Yann Droneaud authored
commit 5bdb0f02 upstream. In case of error when writing to userspace, function ehca_create_cq() does not set an error code before following its error path. This patch sets the error code to -EFAULT when ib_copy_to_udata() fails. This was caught when using spatch (aka. coccinelle) to rewrite calls to ib_copy_{from,to}_udata(). Link: https://www.gitorious.org/opteya/coccib/source/75ebf2c1033c64c1d81df13e4ae44ee99c989eba:ib_copy_udata.cocci Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
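This entry and the mthca fix that follows apply the same error-path pattern; a hedged sketch (the cleanup label and response struct are illustrative, not the driver code verbatim):

    if (context) {
            if (ib_copy_to_udata(udata, &resp, sizeof(resp))) {
                    cq = ERR_PTR(-EFAULT);  /* previously the error code was left unset */
                    goto create_cq_exit;    /* hypothetical cleanup label */
            }
    }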
-
Yann Droneaud authored
commit 08e74c4b upstream. In case of error when writing to userspace, the function mthca_create_cq() does not set an error code before following its error path. This patch sets the error code to -EFAULT when ib_copy_to_udata() fails. This was caught when using spatch (aka. coccinelle) to rewrite calls to ib_copy_{from,to}_udata(). Link: https://www.gitorious.org/opteya/coccib/source/75ebf2c1033c64c1d81df13e4ae44ee99c989eba:ib_copy_udata.cocci Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
W. Trevor King authored
commit a4b7f21d upstream. The `lspci -nnvv` output contains (wrapped for line length): 00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller [8086:1e20] (rev 04) Subsystem: ASUSTeK Computer Inc. Device [1043:115d] Signed-off-by: W. Trevor King <wking@tremily.us> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Huacai Chen authored
commit c14af233 upstream. The original MIPS hibernate code flushed cache and TLB entries in swsusp_arch_resume(), but those flushes were removed in commit 44eeab67 (MIPS: Hibernation: Remove SMP TLB and cacheflushing code.). A cross-CPU flush is surely unnecessary, because all but the local CPU have already been disabled; but a local flush (at least the TLB flush) is needed. When we do hibernation on Loongson-3 with an E1000E NIC, it is very easy to produce a kernel panic (kernel page fault, or unaligned access). The root cause is that the E1000E driver uses vzalloc_node() to allocate pages, and the stale TLB entries of the booting kernel will be misused by the resumed target kernel. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: John Crispin <john@phrozen.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/6643/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
J. Bruce Fields authored
commit 480efaee upstream. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mike Snitzer authored
commit fe76cd88 upstream. If unable to ensure_next_mapping() we must add the current bio, which was removed from the @bios list via bio_list_pop, back to the deferred_bios list before all the remaining @bios. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
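One plausible shape of the fix, sketched with dm-thin-style names (not the patch verbatim):

    if (ensure_next_mapping(pool)) {
            /* Put the bio we already popped back in front of the remaining
             * ones, then defer them all, so no bio is silently dropped. */
            spin_lock_irqsave(&pool->lock, flags);
            bio_list_add(&pool->deferred_bios, bio);
            bio_list_merge(&pool->deferred_bios, &bios);
            spin_unlock_irqrestore(&pool->lock, flags);
            break;
    }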
-
Jani Nikula authored
commit e1f23f3d upstream. This is *not* bisected, but the likely regression is commit c3561438 Author: Zhao Yakui <yakui.zhao@intel.com> Date: Tue Nov 24 09:48:48 2009 +0800 drm/i915: Don't set up the TV port if it isn't in the BIOS table. The commit does not check for all TV device types that might be present in the VBT, disabling TV out for the missing ones. Add composite S-video. Reported-and-tested-by: Matthew Khouzam <matthew.khouzam@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73362 Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> [bwh: Backported to 3.2: s/old\.device_type/device_type/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
J. Bruce Fields authored
commit 9f67f189 upstream. Looks like this bug has been here since these write counts were introduced, not sure why it was just noticed now. Thanks also to Jan Kara for pointing out the problem. Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
Part of commit bad0dcff ('new helpers: fh_{want,drop}_write()') upstream. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
J. Bruce Fields authored
commit 4c69d585 upstream. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
J. Bruce Fields authored
commit de3997a7 upstream. This was an omission from 8c18f205 "nfsd41: SUPPATTR_EXCLCREAT attribute". Cc: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jason Wang authored
commit ca3ba2a2 upstream. This patch bypasses the timer_irq_works() check for Hyper-V guests since: - It was guaranteed to work. - timer_irq_works() may sometimes fail because the lpj calibration is inaccurate in a Hyper-V guest or on a buggy host. In the future, we should get the TSC frequency from the hypervisor and use a preset lpj instead. [ hpa: I would prefer to not defer things to "the future" in the future... ] Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1393558229-14755-1-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Marek Vasut authored
commit a585f87c upstream. The scenario here is that someone calls enable_irq_wake() from somewhere in the code. This will result in the lockdep producing a backtrace as can be seen below. In my case, this problem is triggered when using the wl1271 (TI WlCore) driver found in drivers/net/wireless/ti/ . The problem cause is rather obvious from the backtrace, but let's outline the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(), which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq() calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not marked as recursive, lockdep will spew the stuff below. We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to fix the spew. ============================================= [ INFO: possible recursive locking detected ] 3.10.33-00012-gf06b763-dirty #61 Not tainted --------------------------------------------- kworker/0:1/18 is trying to acquire lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 but task is already holding lock: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/18: #0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4 #1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4 #2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88 stack backtrace: CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty #61 Workqueue: events request_firmware_work_func [<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14) [<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64) [<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104) [<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) [<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) [<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44) [<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58) [<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4) [<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394) [<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0) [<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34) wlcore: loaded Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
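The fix amounts to one flag at generic-chip setup time; a sketch (the mask and the clr/set arguments shown here are illustrative defaults, not necessarily the driver's exact call):

    irq_setup_generic_chip(gc, IRQ_MSK(32), IRQ_GC_INIT_NESTED_LOCK,
                           IRQ_NOREQUEST, 0);
    /* IRQ_GC_INIT_NESTED_LOCK puts the chip's irq_desc locks into a separate
     * lockdep class, so taking one under the parent GPIO IRQ's buslock in
     * mxs_gpio_set_wake_irq() no longer looks like recursive locking. */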
-
Josef Bacik authored
commit 3bbb24b2 upstream. Zach found this deadlock that would happen like this btrfs_end_transaction <- reduce trans->use_count to 0 btrfs_run_delayed_refs btrfs_cow_block find_free_extent btrfs_start_transaction <- increase trans->use_count to 1 allocate chunk btrfs_end_transaction <- decrease trans->use_count to 0 btrfs_run_delayed_refs lock tree block we are cowing above ^^ We need to only decrease trans->use_count if it is above 1, otherwise leave it alone. This will make nested trans be the only ones who decrease their added ref, and will let us get rid of the trans->use_count++ hack if we have to commit the transaction. Thanks, Reported-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Zach Brown <zab@redhat.com> Signed-off-by: Chris Mason <clm@fb.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
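The guard, reduced to a sketch of the entry check in the transaction-end path (context simplified):

    if (trans->use_count > 1) {
            trans->use_count--;     /* nested caller: drop only the reference we added */
            return 0;               /* the outermost caller actually ends the transaction */
    }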
-
Richard Guy Briggs authored
commit c92cdeb4 upstream. sys_getppid() returns the parent pid of the current process in its own pid namespace. Since audit filters are based in the init pid namespace, a process could avoid a filter or trigger an unintended one by being in an alternate pid namespace or log meaningless information. Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the init_pid_ns. (informed by ebiederman's 6c621b7e) Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [bwh: Backported to 3.2: sys_getppid() is used by audit_exit() but not audit_log_task_info()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Richard Guy Briggs authored
commit ad36d282 upstream. Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup of the PPID (real_parent's pid_t) of a process, including rcu locking, in the arbitrary and init_pid_ns. This provides an alternative to sys_getppid(), which is relative to the child process' pid namespace. (informed by ebiederman's 6c621b7e) Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
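The helpers are, in sketch form, along these lines (treat this as an approximation of the addition to <linux/sched.h>; the audit entry above then uses task_ppid_nr() in place of sys_getppid()):

    static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk,
                                        struct pid_namespace *ns)
    {
            pid_t pid = 0;

            rcu_read_lock();
            if (pid_alive(tsk))
                    pid = task_tgid_nr_ns(rcu_dereference(tsk->real_parent), ns);
            rcu_read_unlock();

            return pid;
    }

    static inline pid_t task_ppid_nr(const struct task_struct *tsk)
    {
            return task_ppid_nr_ns(tsk, &init_pid_ns);      /* anchored in init_pid_ns */
    }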
-
Krzysztof Kozlowski authored
commit 159ce52a upstream. During probe the driver allocates a dummy I2C device for the companion chip with i2c_new_dummy(), but it does not check the return value of this call. In case of error (i2c_new_device(): memory allocation failure or I2C address cannot be used) this function returns NULL, which is later used by regmap_init_i2c(). If i2c_new_dummy() fails for the companion device, fail also the probe for the main MFD driver. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> [bwh: Backported to 3.2: - Adjust filename, context - Add kfree() before return, as driver is not using managed allocations] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
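The same defensive pattern runs through this entry and the three that follow; a hedged sketch (the device field, address constant and error label are illustrative):

    info->companion = i2c_new_dummy(i2c->adapter, COMPANION_I2C_ADDR);
    if (!info->companion) {                 /* i2c_new_dummy() returns NULL, not ERR_PTR */
            dev_err(&i2c->dev, "failed to allocate companion I2C device\n");
            ret = -ENODEV;
            goto err_free;                  /* fail the MFD probe cleanly */
    }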
-
Krzysztof Kozlowski authored
commit 96cf3ded upstream. During probe the driver allocates dummy I2C devices for RTC and ADC with i2c_new_dummy(), but it does not check the return value of these calls. In case of error (i2c_new_device(): memory allocation failure or I2C address cannot be used) this function returns NULL, which is later used by i2c_unregister_device(). If i2c_new_dummy() fails for the RTC or ADC devices, fail also the probe for the main MFD driver. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Krzysztof Kozlowski authored
commit ed26f87b upstream. During probe the driver allocates a dummy I2C device for the RTC with i2c_new_dummy(), but it does not check the return value of this call. In case of error (i2c_new_device(): memory allocation failure or I2C address cannot be used) this function returns NULL, which is later used by i2c_unregister_device(). If i2c_new_dummy() fails for the RTC device, fail also the probe for the main MFD driver. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Krzysztof Kozlowski authored
commit 97dc4ed3 upstream. During probe the driver allocates dummy I2C devices for RTC, haptic and MUIC with i2c_new_dummy(), but it does not check the return value of these calls. In case of error (i2c_new_device(): memory allocation failure or I2C address cannot be used) this function returns NULL, which is later used by i2c_unregister_device(). If i2c_new_dummy() fails for the RTC, haptic or MUIC devices, fail also the probe for the main MFD driver. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Linus Walleij authored
commit a6e6e660 upstream. It is currently not possible to select the SA1100 or Vexpress drivers in the MFD subsystem, because the menu for the entire subsystem ends before these options are presented. Move the main menu closing and the endif for HAS_IOMEM to the end of the file so these are selectable again. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Yann Droneaud authored
commit 9d194d10 upstream. In case of error while accessing userspace memory, the function nes_create_qp() returns NULL instead of an error code wrapped with ERR_PTR(). But NULL is not expected by ib_uverbs_create_qp(), as it checks for errors with IS_ERR(). As page 0 is likely not mapped, this is going to trigger an Oops when the kernel tries to dereference the NULL pointer to access struct ib_qp's fields. In some rare cases, page 0 could be mapped by userspace, which could turn this bug into a vulnerability that could be exploited: the function pointers in struct ib_device would be under userspace's total control. This was caught when using spatch (aka. coccinelle) to rewrite calls to ib_copy_{from,to}_udata(). Link: https://www.gitorious.org/opteya/ib-hw-nes-create-qp-null Link: https://www.gitorious.org/opteya/coccib/source/75ebf2c1033c64c1d81df13e4ae44ee99c989eba:ib_copy_udata.cocci Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
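The shape of the fix, sketched with the driver-specific cleanup elided to a comment:

    if (ib_copy_from_udata(&req, udata, sizeof(req))) {
            /* free the QP resources allocated so far */
            return ERR_PTR(-EFAULT);        /* was "return NULL;", which IS_ERR() misses */
    }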
-
Dennis Dalessandro authored
commit a2cb0eb8 upstream. Guard against a potential buffer overrun. The size to read from the user is passed in, and due to the padding that needs to be taken into account, as well as the place holder for the ICRC it is possible to overflow the 32bit value which would cause more data to be copied from user space than is allocated in the buffer. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Felix Fietkau authored
commit 3b3e0efb upstream. qi->tqi_readyTime is written directly to registers that expect microseconds as unit instead of TU. When setting the CABQ ready time, cur_conf->beacon_interval is in TU, so convert it to microseconds before passing it to ath9k_hw. This should hopefully fix some Tx DMA issues with buffered multicast frames in AP mode. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
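A sketch of the unit conversion involved (ath9k-style names assumed; 1 TU = 1024 microseconds):

    /* beacon_interval is in TU, but tqi_readyTime is programmed into a register
     * that expects microseconds, so convert before handing it to ath9k_hw. */
    qi.tqi_readyTime = (cur_conf->beacon_interval * 1024) *
                       ATH_CABQ_READY_TIME / 100;   /* ready time as % of beacon interval */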
-
Eric Whitney authored
commit c0634493 upstream. Commit 9cb00419, which enables hole punching for bigalloc file systems, exposed a bug introduced by commit 6ae06ff5 in an earlier release. When run on a bigalloc file system, xfstests generic/013, 068, 075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors or cause kernel error messages indicating that previously freed blocks are being freed again. The latter commit optimizes the selection of the starting extent in ext4_ext_rm_leaf() when hole punching by beginning with the extent supplied in the path argument rather than with the last extent in the leaf node (as is still done when truncating). However, the code in rm_leaf that initially sets partial_cluster to track cluster sharing on extent boundaries is only guaranteed to run if rm_leaf starts with the last node in the leaf. Consequently, partial_cluster is not correctly initialized when hole punching, and a cluster on the boundary of a punched region that should be retained may instead be deallocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Rusty Russell authored
commit 1f74ef0f upstream. When adding or removing 100G from a balloon: BUG: soft lockup - CPU#0 stuck for 22s! [vballoon:367] We have a wait_event_interruptible(), but the condition is always true (more ballooning to do) so we don't ever sleep. We also have a wait_event() for the host to ack, but that is also always true as QEMU is synchronous for balloon operations. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Emmanuel Grumbach authored
commit 82e5a649 upstream. There is a flow in which we send the host command in SYNC mode, but we don't take priv->mutex. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1046495 Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [bwh: Backported to 3.2: - Adjust filename, context - mutex is priv->shrd->mutex] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
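The fix's shape, as a sketch (the command-sending helper name is hypothetical; in the 3.2 backport the lock is priv->shrd->mutex):

    mutex_lock(&priv->mutex);
    ret = send_bt_config_sync(priv);        /* hypothetical SYNC host command */
    mutex_unlock(&priv->mutex);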
-