- 31 Mar, 2016 40 commits
-
-
Vlastimil Babka authored
commit d9dddbf5 upstream. Hanjun Guo has reported that a CMA stress test causes broken accounting of CMA and free pages: > Before the test, I got: > -bash-4.3# cat /proc/meminfo | grep Cma > CmaTotal: 204800 kB > CmaFree: 195044 kB > > > After running the test: > -bash-4.3# cat /proc/meminfo | grep Cma > CmaTotal: 204800 kB > CmaFree: 6602584 kB > > So the freed CMA memory is more than total.. > > Also the the MemFree is more than mem total: > > -bash-4.3# cat /proc/meminfo > MemTotal: 16342016 kB > MemFree: 22367268 kB > MemAvailable: 22370528 kB Laura Abbott has confirmed the issue and suspected the freepage accounting rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA pageblocks: > CMA isolates MAX_ORDER aligned blocks, but, during the process, > partialy isolated block exists. If MAX_ORDER is 11 and > pageblock_order is 9, two pageblocks make up MAX_ORDER > aligned block and I can think following scenario because pageblock > (un)isolation would be done one by one. > > (each character means one pageblock. 'C', 'I' means MIGRATE_CMA, > MIGRATE_ISOLATE, respectively. > > CC -> IC -> II (Isolation) > II -> CI -> CC (Un-isolation) > > If some pages are freed at this intermediate state such as IC or CI, > that page could be merged to the other page that is resident on > different type of pageblock and it will cause wrong freepage count. This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but since it doesn't hold the zone->lock between pageblocks, a race window does exist. It's also likely that unexpected merging can occur between MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in __free_one_page() since commit 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock"). However, we only check the migratetype of the pageblock where buddy merging has been initiated, not the migratetype of the buddy pageblock (or group of pageblocks) which can be MIGRATE_ISOLATE. Joonsoo has suggested checking for buddy migratetype as part of page_is_buddy(), but that would add extra checks in allocator hotpath and bloat-o-meter has shown significant code bloat (the function is inline). This patch reduces the bloat at some expense of more complicated code. The buddy-merging while-loop in __free_one_page() is initially bounded to pageblock_border and without any migratetype checks. The checks are placed outside, bumping the max_order if merging is allowed, and returning to the while-loop with a statement which can't be possibly considered harmful. This fixes the accounting bug and also removes the arguably weird state in the original commit 3c605096 where buddies could be left unmerged. Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Link: https://lkml.org/lkml/2016/3/2/280Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Debugged-by: Laura Abbott <labbott@redhat.com> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ kamal: backport to 4.2-stable: context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Joseph Qi authored
commit be12b299 upstream. When master handles convert request, it queues ast first and then returns status. This may happen that the ast is sent before the request status because the above two messages are sent by two threads. And right after the ast is sent, if master down, it may trigger BUG in dlm_move_lockres_to_recovery_list in the requested node because ast handler moves it to grant list without clear lock->convert_pending. So remove BUG_ON statement and check if the ast is processed in dlmconvert_remote. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Joseph Qi authored
commit ac7cf246 upstream. There is a race window between dlmconvert_remote and dlm_move_lockres_to_recovery_list, which will cause a lock with OCFS2_LOCK_BUSY in grant list, thus system hangs. dlmconvert_remote { spin_lock(&res->spinlock); list_move_tail(&lock->list, &res->converting); lock->convert_pending = 1; spin_unlock(&res->spinlock); status = dlm_send_remote_convert_request(); >>>>>> race window, master has queued ast and return DLM_NORMAL, and then down before sending ast. this node detects master down and calls dlm_move_lockres_to_recovery_list, which will revert the lock to grant list. Then OCFS2_LOCK_BUSY won't be cleared as new master won't send ast any more because it thinks already be authorized. spin_lock(&res->spinlock); lock->convert_pending = 0; if (status != DLM_NORMAL) dlm_revert_pending_convert(res, lock); spin_unlock(&res->spinlock); } In this case, check if res->state has DLM_LOCK_RES_RECOVERING bit set (res is still in recovering) or res master changed (new master has finished recovery), reset the status to DLM_RECOVERING, then it will retry convert. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Guenter Roeck authored
commit 968ce1b1 upstream. The old web page for the hwmon subsystem is no longer operational, and the mailing list has become unreliable. Move both to kernel.org. Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> [ kamal: backport to 4.2-stable: context ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
John Dahlstrom authored
commit 4db9675d upstream. Some Lenovo ideapad models lack a physical rfkill switch. On Lenovo models ideapad Y700 Touch-15ISK and ideapad Y700-15ISK, ideapad-laptop would wrongly report all radios as blocked by hardware which caused wireless network connections to fail. Add these models without an rfkill switch to the no_hw_rfkill list. Signed-off-by: John Dahlstrom <jodarom@sdf.org> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Vladimir Zapolskiy authored
commit ccbc2a9e upstream. On error platform_device_register_simple() returns ERR_PTR() value, check for NULL always fails. The change corrects the check itself and propagates the returned error upwards. Fixes: 81fb0b90 ("staging: android: ion_test: unregister the platform device") Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
H Hartley Sweeten authored
commit bd3a3cd6 upstream. Memory mapped io (dev->mmio) should not also be writing to the ioport (dev->iobase) registers. Add the missing 'else' to these functions. Fixes: 0953ee4a ("staging: comedi: ni_mio_common: checkpatch.pl cleanup (else not useful)") Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Aurelien Jacquiot authored
commit 36915976 upstream. Fix deadlocking during concurrent receive and transmit operations on SMP platforms caused by the use of incorrect lock: on transmit 'tx_lock' spinlock should be used instead of 'lock' which is used for receive operation. This fix is applicable to kernel versions starting from v2.15. Signed-off-by: Aurelien Jacquiot <a-jacquiot@ti.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Jann Horn authored
commit 378c6520 upstream. This commit fixes the following security hole affecting systems where all of the following conditions are fulfilled: - The fs.suid_dumpable sysctl is set to 2. - The kernel.core_pattern sysctl's value starts with "/". (Systems where kernel.core_pattern starts with "|/" are not affected.) - Unprivileged user namespace creation is permitted. (This is true on Linux >=3.8, but some distributions disallow it by default using a distro patch.) Under these conditions, if a program executes under secure exec rules, causing it to run with the SUID_DUMP_ROOT flag, then unshares its user namespace, changes its root directory and crashes, the coredump will be written using fsuid=0 and a path derived from kernel.core_pattern - but this path is interpreted relative to the root directory of the process, allowing the attacker to control where a coredump will be written with root privileges. To fix the security issue, always interpret core_pattern for dumps that are written under SUID_DUMP_ROOT relative to the root directory of init. Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Steven Rostedt (Red Hat) authored
commit 3debb0a9 upstream. The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 07d777fe "tracing: Add percpu buffers for trace_printk()" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit e9ad4ec8 upstream. Moving the initialization earlier is needed in 4.6 because kvm_arch_init_vm is now using mmu_lock, causing lockdep to complain: [ 284.440294] INFO: trying to register non-static key. [ 284.445259] the code is fine but needs lockdep annotation. [ 284.450736] turning off the locking correctness validator. ... [ 284.528318] [<ffffffff810aecc3>] lock_acquire+0xd3/0x240 [ 284.533733] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm] [ 284.541467] [<ffffffff81715581>] _raw_spin_lock+0x41/0x80 [ 284.546960] [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm] [ 284.554707] [<ffffffffa0305aa0>] kvm_page_track_register_notifier+0x20/0x60 [kvm] [ 284.562281] [<ffffffffa02ece70>] kvm_mmu_init_vm+0x20/0x30 [kvm] [ 284.568381] [<ffffffffa02dbf7a>] kvm_arch_init_vm+0x1ea/0x200 [kvm] [ 284.574740] [<ffffffffa02bff3f>] kvm_dev_ioctl+0xbf/0x4d0 [kvm] However, it also helps fixing a preexisting problem, which is why this patch is also good for stable kernels: kvm_create_vm was incrementing current->mm->mm_count but not decrementing it at the out_err label (in case kvm_init_mmu_notifier failed). The new initialization order makes it possible to add the required mmdrop without adding a new error label. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit 2849eb4f upstream. A guest executing an invalid invept instruction would hang because the instruction pointer was not updated. Fixes: bfd0a56bReviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Himanshu Madhani authored
commit 5e47f198 upstream. This patch fixes an active I/O shutdown bug for fabric drivers using target_wait_for_sess_cmds(), where se_cmd descriptor shutdown would result in hung tasks waiting indefinitely for se_cmd->cmd_wait_comp to complete(). To address this bug, drop the incorrect list_del_init() usage in target_wait_for_sess_cmds() and always complete() during se_cmd target_release_cmd_kref() put, in order to let caller invoke the final fabric release callback into se_cmd->se_tfo->release_cmd() code. Reported-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Tested-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Peter Zijlstra authored
commit f75d4864 upstream. __clear_bit_unlock() is a special little snowflake. While it carries the non-atomic '__' prefix, it is specifically documented to pair with test_and_set_bit() and therefore should be 'somewhat' atomic. Therefore the generic implementation of __clear_bit_unlock() cannot use the fully non-atomic __clear_bit() as a default. If an arch is able to do better; is must provide an implementation of __clear_bit_unlock() itself. Specifically, this came up as a result of hackbench livelock'ing in slab_lock() on ARC with SMP + SLUB + !LLSC. The issue was incorrect pairing of atomic ops. slab_lock() -> bit_spin_lock() -> test_and_set_bit() slab_unlock() -> __bit_spin_unlock() -> __clear_bit() The non serializing __clear_bit() was getting "lost" 80543b8e: ld_s r2,[r13,0] <--- (A) Finds PG_locked is set 80543b90: or r3,r2,1 <--- (B) other core unlocks right here 80543b94: st_s r3,[r13,0] <--- (C) sets PG_locked (overwrites unlock) Fixes ARC STAR 9000817404 (and probably more). Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160309114054.GJ6356@twins.programming.kicks-ass.netSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Tejun Heo authored
commit aaf25593 upstream. When cgroup writeback is in use, there can be multiple wb's (bdi_writeback's) per bdi and an inode may switch among them dynamically. In a couple places, the wrong wb was used leading to performing operations on the wrong list under the wrong lock corrupting the io lists. * writeback_single_inode() was taking @wb parameter and used it to remove the inode from io lists if it becomes clean after writeback. The callers of this function were always passing in the root wb regardless of the actual wb that the inode was associated with, which could also change while writeback is in progress. Fix it by dropping the @wb parameter and using inode_to_wb_and_lock_list() to determine and lock the associated wb. * After writeback_sb_inodes() writes out an inode, it re-locks @wb and inode to remove it from or move it to the right io list. It assumes that the inode is still associated with @wb; however, the inode may have switched to another wb while writeback was in progress. Fix it by using inode_to_wb_and_lock_list() to determine and lock the associated wb after writeback is complete. As the function requires the original @wb->list_lock locked for the next iteration, in the unlikely case where the inode has changed association, switch the locks. Kudos to Tahsin for pinpointing these subtle breakages. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: d10c8095 ("writeback: implement foreign cgroup inode bdi_writeback switching") Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.comReported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com> Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Chris Mason authored
commit 590dca3a upstream. Commit 505a666e ("writeback: plug writeback in wb_writeback() and writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes, which increases the merge rate when relatively contiguous small files are written by the filesystem. It helps both on flash and spindles. For an fs_mark workload creating 4K files in parallel across 8 drives, this commit improves performance ~9% more by unplugging before calling cond_resched(). cond_resched() doesn't trigger an implicit unplug, so explicitly getting the IO down to the device before scheduling reduces latencies for anyone waiting on clean pages. It also cuts down on how often we use kblockd to unplug, which means less work bouncing from one workqueue to another. Many more details about how we got here: https://lkml.org/lkml/2015/9/11/570Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Tejun Heo authored
commit 614a4e37 upstream. locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with the target inode, unlocks inode, locks the wb's list_lock and verifies that the inode is still associated with the wb. To prevent the wb going away between dropping inode lock and acquiring list_lock, the wb is pinned while inode lock is held. The wb reference is put right after acquiring list_lock citing that the wb won't be dereferenced anymore. This isn't true. If the inode is still associated with the wb, the inode has reference and it's safe to return the wb; however, if inode has been switched, the wb still needs to be unlocked which is a dereference and can lead to use-after-free if it it races with wb destruction. Fix it by putting the reference after releasing list_lock. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 87e1d789 ("writeback: implement [locked_]inode_to_wb_and_lock_list()") Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Victor Clément authored
commit 0ef21100 upstream. The Microsoft HD-5001 webcam microphone does not support sample rate reading as the HD-5000 one. This results in dmesg errors and sound hanging with pulseaudio. Signed-off-by: Victor Clément <victor.clement@openmailbox.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Rabin Vincent authored
commit d6785d91 upstream. Running the following command: busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null with any tracing enabled pretty very quickly leads to various NULL pointer dereferences and VM BUG_ON()s, such as these: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0) kernel BUG at include/linux/mm.h:367! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89 (busybox's cat uses sendfile(2), unlike the coreutils version) This is because tracing_splice_read_pipe() can call splice_to_pipe() with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and we fill the page pointers and the other fields of the pipe_buffers with garbage. All other callers of splice_to_pipe() avoid calling it when nr_pages == 0, and we could make tracing_splice_read_pipe() do that too, but it seems reasonable to have splice_to_page() handle this condition gracefully. Signed-off-by: Rabin Vincent <rabin@rab.in> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Steven Rostedt (Red Hat) authored
commit a29054d9 upstream. If tracing contains data and the trace_pipe file is read with sendfile(), then it can trigger a NULL pointer dereference and various BUG_ON within the VM code. There's a patch to fix this in the splice_to_pipe() code, but it's also a good idea to not let that happen from trace_pipe either. Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.inReported-by: Rabin Vincent <rabin.vincent@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Steven Rostedt (Red Hat) authored
commit cb86e053 upstream. Joel Fernandes reported that the function tracing of preempt disabled sections was not being reported when running either the preemptirqsoff or preemptoff tracers. This was due to the fact that the function tracer callback for those tracers checked if irqs were disabled before tracing. But this fails when we want to trace preempt off locations as well. Joel explained that he wanted to see funcitons where interrupts are enabled but preemption was disabled. The expected output he wanted: <...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit <...>-2265 1d..1 3419us : __do_softirq <-irq_exit <...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq <...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq <...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq <...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq <...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq <...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq <...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq <...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq <...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq <...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq <...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq <...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq There's a comment saying that the irq disabled check is because there's a possible race that tracing_cpu may be set when the function is executed. But I don't remember that race. For now, I added a check for preemption being enabled too to not record the function, as there would be no race if that was the case. I need to re-investigate this, as I'm now thinking that the tracing_cpu will always be correct. But no harm in keeping the check for now, except for the slight performance hit. Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com Fixes: 5e6d2b9c "tracing: Use one prologue for the preempt irqs off tracer function tracers" Reported-by: Joel Fernandes <agnel.joel@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Hans de Goede authored
commit 55ff8cfb upstream. The uas driver can never queue more then MAX_CMNDS (- 1) tags and tags are shared between luns, so there is no need to claim that we can_queue some random large number. Not claiming that we can_queue 65536 commands, fixes the uas driver failing to initialize while allocating the tag map with a "Page allocation failure (order 7)" error on systems which have been running for a while and thus have fragmented memory. Reported-and-tested-by: Yves-Alexis Perez <corsac@corsac.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Oliver Neukum authored
commit 0b818e39 upstream. Attacks that trick drivers into passing a NULL pointer to usb_driver_claim_interface() using forged descriptors are known. This thwarts them by sanity checking. Signed-off-by: Oliver Neukum <ONeukum@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Dave Jones authored
commit 7834c103 upstream. Since 4.4, I've been able to trigger this occasionally: =============================== [ INFO: suspicious RCU usage. ] 4.5.0-rc7-think+ #3 Not tainted Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20160315012054.GA17765@codemonkey.org.ukSigned-off-by: Thomas Gleixner <tglx@linutronix.de> ------------------------------- ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 1 RCU used illegally from extended quiescent state! no locks held by swapper/3/0. stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc7-think+ #3 ffffffff92f821e0 1f3e5c340597d7fc ffff880468e07f10 ffffffff92560c2a ffff880462145280 0000000000000001 ffff880468e07f40 ffffffff921376a6 ffffffff93665ea0 0000cc7c876d28da 0000000000000005 ffffffff9383dd60 Call Trace: <IRQ> [<ffffffff92560c2a>] dump_stack+0x67/0x9d [<ffffffff921376a6>] lockdep_rcu_suspicious+0xe6/0x100 [<ffffffff925ae7a7>] do_trace_write_msr+0x127/0x1a0 [<ffffffff92061c83>] native_apic_msr_eoi_write+0x23/0x30 [<ffffffff92054408>] smp_trace_call_function_interrupt+0x38/0x360 [<ffffffff92d1ca60>] trace_call_function_interrupt+0x90/0xa0 <EOI> [<ffffffff92ac5124>] ? cpuidle_enter_state+0x1b4/0x520 Move the entering_irq() call before ack_APIC_irq(), because entering_irq() tells the RCU susbstems to end the extended quiescent state, so that the following trace call in ack_APIC_irq() works correctly. Suggested-by: Andi Kleen <ak@linux.intel.com> Fixes: 4787c368 "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()" Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Benjamin Tissoires authored
commit 82be788c upstream. Looks like the fimware 8.2 still has the extra buttons spurious release bug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=114321Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Johannes Weiner authored
commit b6e6edcf upstream. Setting the original memory.limit_in_bytes hardlimit is subject to a race condition when the desired value is below the current usage. The code tries a few times to first reclaim and then see if the usage has dropped to where we would like it to be, but there is no locking, and the workload is free to continue making new charges up to the old limit. Thus, attempting to shrink a workload relies on pure luck and hope that the workload happens to cooperate. To fix this in the cgroup2 memory.max knob, do it the other way round: set the limit first, then try enforcement. And if reclaim is not able to succeed, trigger OOM kills in the group. Keep going until the new limit is met, we run out of OOM victims and there's only unreclaimable memory left, or the task writing to memory.max is killed. This allows users to shrink groups reliably, and the behavior is consistent with what happens when new charges are attempted in excess of memory.max. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ kamal: backport to 4.2-stable: no Documentation/cgroup-v2.txt ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Johannes Weiner authored
commit 588083bb upstream. When setting memory.high below usage, nothing happens until the next charge comes along, and then it will only reclaim its own charge and not the now potentially huge excess of the new memory.high. This can cause groups to stay in excess of their memory.high indefinitely. To fix that, when shrinking memory.high, kick off a reclaim cycle that goes after the delta. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Joshua Hunt authored
commit a1ee1932 upstream. While working on a script to restore all sysctl params before a series of tests I found that writing any value into the /proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh} causes them to call proc_watchdog_update(). NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. There doesn't appear to be a reason for doing this work every time a write occurs, so only do it when the values change. Signed-off-by: Josh Hunt <johunt@akamai.com> Acked-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Oliver Neukum authored
commit a0ad220c upstream. A malicious device missing interface can make the driver oops. Add sanity checking. Signed-off-by: Oliver Neukum <ONeukum@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Ken Wang authored
commit 16a8a49b upstream. Signed-off-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Adrian Hunter authored
commit 99513624 upstream. Normally the timeout clock frequency is read from the capabilities register. It is also possible to set the value prior to calling sdhci_add_host() in which case that value will override the capabilities register value. However that was being done after calculating max_busy_timeout so that max_busy_timeout was being calculated using the wrong value of timeout_clk. Fix that by moving the override before max_busy_timeout is calculated. The result is that the max_busy_timeout and max_discard increase for BSW devices so that, for example, the time for mkfs.ext4 on a 64GB eMMC drops from about 1 minute 40 seconds to about 20 seconds. Note, in the future, the capabilities setting will be tidied up and this override won't be used anymore. However this fix is needed for stable. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Boris BREZILLON authored
commit dfe97ad3 upstream. Forward devm_ioremap_resource() error code instead of returning -ENOMEM. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk> Fixes: f63601fd ("crypto: marvell/cesa - add a new driver for Marvell's CESA") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Andy Lutomirski authored
commit c29016cf upstream. iopl(3) is supposed to work if iopl is already 3, even if unprivileged. This didn't work right on Xen PV. Fix it. Reviewewd-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8ce12013e6e4c0a44a97e316be4a6faff31bd5ea.1458162709.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Dmitry V. Levin authored
commit 5f8d498d upstream. Explicitly check show_devname method return code and bail out in case of an error. This fixes regression introduced by commit 9d4d6574. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
J. Bruce Fields authored
commit 2f6fc056 upstream. nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also unlocks if necessary (nfsd filehandle locking is probably too lenient), so it gets unlocked eventually, but if the following op in the compound needs to lock it again, we can deadlock. A fuzzer ran into this; normal clients don't send a secinfo followed by a readdir in the same compound. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Magnus Damm authored
commit bcdc9f26 upstream. This patch fixes the MMC SPI driver from doing polling card detect when a CD GPIO that supports interrupts is specified using the gpios DT property. Without this patch the DT node below results in the following output: spi_gpio: spi-gpio { /* SD2 @ CN12 */ compatible = "spi-gpio"; #address-cells = <1>; #size-cells = <0>; gpio-sck = <&gpio6 16 GPIO_ACTIVE_HIGH>; gpio-mosi = <&gpio6 17 GPIO_ACTIVE_HIGH>; gpio-miso = <&gpio6 18 GPIO_ACTIVE_HIGH>; num-chipselects = <1>; cs-gpios = <&gpio6 21 GPIO_ACTIVE_LOW>; status = "okay"; spi@0 { compatible = "mmc-spi-slot"; reg = <0>; voltage-ranges = <3200 3400>; spi-max-frequency = <25000000>; gpios = <&gpio6 22 GPIO_ACTIVE_LOW>; /* CD */ }; }; # dmesg | grep mmc mmc_spi spi32766.0: SD/MMC host mmc0, no WP, no poweroff, cd polling mmc0: host does not support reading read-only switch, assuming write-enable mmc0: new SDHC card on SPI mmcblk0: mmc0:0000 SU04G 3.69 GiB mmcblk0: p1 With this patch applied the "cd polling" portion above disappears. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Takashi Iwai authored
commit 1f7c6658 upstream. Cirrus HD-audio driver may adjust GPIO pins for EAPD dynamically depending on the jack plug state. This works fine for the auto-mute mode where the speaker gets muted upon the HP jack plug. OTOH, when the auto-mute mode is off, this turns off the EAPD unexpectedly depending on the jack state, which results in the silent speaker output. This patch fixes the silent speaker output issue by setting GPIO bits constantly when the auto-mute mode is off. Reported-and-tested-by: moosotc@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Dmitry Torokhov authored
commit 3b654288 upstream. Even though hid_hw_* checks that passed in data_len is less than HID_MAX_BUFFER_SIZE it is not enough, as i2c-hid does not necessarily allocate buffers of HID_MAX_BUFFER_SIZE but rather checks all device reports and select largest size. In-kernel users normally just send as much data as report needs, so there is no problem, but hidraw users can do whatever they please: BUG: KASAN: slab-out-of-bounds in memcpy+0x34/0x54 at addr ffffffc07135ea80 Write of size 4101 by task syz-executor/8747 CPU: 2 PID: 8747 Comm: syz-executor Tainted: G BU 3.18.0 #37 Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT) Call trace: [<ffffffc00020ebcc>] dump_backtrace+0x0/0x258 arch/arm64/kernel/traps.c:83 [<ffffffc00020ee40>] show_stack+0x1c/0x2c arch/arm64/kernel/traps.c:172 [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffc001958114>] dump_stack+0x90/0x140 lib/dump_stack.c:50 [< inline >] print_error_description mm/kasan/report.c:97 [< inline >] kasan_report_error mm/kasan/report.c:278 [<ffffffc0004597dc>] kasan_report+0x268/0x530 mm/kasan/report.c:305 [<ffffffc0004592e8>] __asan_storeN+0x20/0x150 mm/kasan/kasan.c:718 [<ffffffc0004594e0>] memcpy+0x30/0x54 mm/kasan/kasan.c:299 [<ffffffc001306354>] __i2c_hid_command+0x2b0/0x7b4 drivers/hid/i2c-hid/i2c-hid.c:178 [< inline >] i2c_hid_set_or_send_report drivers/hid/i2c-hid/i2c-hid.c:321 [<ffffffc0013079a0>] i2c_hid_output_raw_report.isra.2+0x3d4/0x4b8 drivers/hid/i2c-hid/i2c-hid.c:589 [<ffffffc001307ad8>] i2c_hid_output_report+0x54/0x68 drivers/hid/i2c-hid/i2c-hid.c:602 [< inline >] hid_hw_output_report include/linux/hid.h:1039 [<ffffffc0012cc7a0>] hidraw_send_report+0x400/0x414 drivers/hid/hidraw.c:154 [<ffffffc0012cc7f4>] hidraw_write+0x40/0x64 drivers/hid/hidraw.c:177 [<ffffffc0004681dc>] vfs_write+0x1d4/0x3cc fs/read_write.c:534 [< inline >] SYSC_pwrite64 fs/read_write.c:627 [<ffffffc000468984>] SyS_pwrite64+0xec/0x144 fs/read_write.c:614 Object at ffffffc07135ea80, in cache kmalloc-512 Object allocated with size 268 bytes. Let's check data length against the buffer size before attempting to copy data over. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Dmitry Torokhov <dtor@chromium.org> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Bryn M. Reeves authored
commit 98dbc9c6 upstream. An "old" (.request_fn) DM 'struct request' stores a pointer to the associated 'struct dm_rq_target_io' in rq->special. dm_requeue_original_request(), previously named dm_requeue_unmapped_original_request(), called dm_unprep_request() to reset rq->special to NULL. But rq_end_stats() would go on to hit a NULL pointer deference because its call to tio_from_request() returned NULL. Fix this by calling rq_end_stats() _before_ dm_unprep_request() Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: e262f347 ("dm stats: add support for request-based DM devices") Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Dmitri Epshtein authored
commit 928b6519 upstream. Function eth_prepare_mac_addr_change() is called as part of MAC address change. This function check if interface is running. To enable change MAC address when interface is running: IFF_LIVE_ADDR_CHANGE flag must be set to dev->priv_flags field Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-