- 15 Aug, 2018 17 commits
-
-
Fabio Estevam authored
commit 069f0534 upstream. devm_kasprintf() may fail, so add a NULL check and propagate an error on failure. Signed-off-by:
Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by:
Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by:
Amit Pundir <amit.pundir@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
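A minimal sketch of the pattern being added, assuming a hypothetical device pointer, format string and caller that can return an error (illustrative only, not the actual driver hunk):

    /* devm_kasprintf() returns NULL on allocation failure, so check it. */
    name = devm_kasprintf(dev, GFP_KERNEL, "partition%d", i);  /* hypothetical name/format */
    if (!name)
            return -ENOMEM;    /* propagate the failure instead of using a NULL pointer */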
-
Al Viro authored
commit 119e1ef8 upstream. __legitimize_mnt() has two problems - one is that in case of success the check of mount_lock is not ordered wrt preceding increment of refcount, making it possible to have successful __legitimize_mnt() on one CPU just before the otherwise final mntput() on another, with __legitimize_mnt() not seeing mntput() taking the lock and mntput() not seeing the increment done by __legitimize_mnt(). Solved by a pair of barriers. Another is that failure of __legitimize_mnt() on the second read_seqretry() leaves us with a reference that'll need to be dropped by the caller; however, if that races with final mntput() we can end up with the caller dropping rcu_read_lock() and doing mntput() to release that reference - with the first mntput() having freed the damn thing just as rcu_read_lock() had been dropped. Solution: in the "do mntput() yourself" failure case grab mount_lock, check if MNT_DOOMED has been set by a racing final mntput() that has missed our increment, and if it has - undo the increment and treat that as the "failure, caller doesn't need to drop anything" case. It's not easy to hit - the final mntput() has to come right after the first read_seqretry() in __legitimize_mnt() *and* manage to miss the increment done by __legitimize_mnt() before the second read_seqretry() in there. The things that are almost impossible to hit on bare hardware are not impossible on SMP KVM, though... Reported-by:
Oleg Nesterov <oleg@redhat.com> Fixes: 48a066e7 ("RCU'd vfsmounts") Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
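A hedged sketch of the shape of the fix described above (not the exact upstream diff; the surrounding __legitimize_mnt() context and locking helpers are only approximated):

    mnt_add_count(mnt, 1);          /* speculative reference */
    smp_mb();                       /* order the increment before re-reading mount_lock;
                                       pairs with the lock acquisition in mntput_no_expire() */
    if (likely(!read_seqretry(&mount_lock, seq)))
            return 0;               /* success: the reference is legitimate */
    lock_mount_hash();
    if (mnt->mnt.mnt_flags & MNT_DOOMED) {
            /* final mntput() won the race and missed our increment: undo it */
            mnt_add_count(mnt, -1);
            unlock_mount_hash();
            return 1;               /* "failure, caller doesn't need to drop anything" */
    }
    unlock_mount_hash();
    return -1;                      /* caller must drop the reference itself */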
-
Al Viro authored
commit 9ea0a46c upstream. mntput_no_expire() does the calculation of total refcount under mount_lock; unfortunately, the decrement (as well as all increments) are done outside of it, leading to false positives in the "are we dropping the last reference" test. Consider the following situation: * mnt is a lazy-umounted mount, kept alive by two opened files. One of those files gets closed. Total refcount of mnt is 2. On CPU 42 mntput(mnt) (called from __fput()) drops one reference, decrementing component * After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again - on CPU 69. There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock. * On CPU 42 our first mntput() finishes counting. It observes the decrement of component #69, but not the increment of component #0. As the result, the total it gets is not 1 as it should've been - it's 0. At which point we decide that vfsmount needs to be killed and proceed to free it and shut the filesystem down. However, there's still another opened file on that filesystem, with reference to (now freed) vfsmount, etc. and we are screwed. It's not a wide race, but it can be reproduced with artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups. Fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want (and can) keep the fast case (i.e. mount that still has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu() before that mntput(). IOW, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped. Reported-by:
Jann Horn <jannh@google.com> Tested-by:
Jann Horn <jannh@google.com> Fixes: 48a066e7 ("RCU'd vfsmounts") Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
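The interleaving above can be replayed in plain user-space C; the sketch below stands an array in for the per-CPU components and walks through the same sequence, showing the summed total coming out as 0 instead of 1 (array size and indices are illustrative):

    #include <stdio.h>

    int main(void)
    {
            long count[128] = { 0 };        /* per-CPU components of the refcount */
            count[0] = 2;                   /* total refcount of mnt is 2 */

            count[42] -= 1;                 /* CPU 42: mntput() drops one reference */

            long sum = count[0];            /* CPU 42 starts summing; reads #0 as 2 */

            count[0] += 1;                  /* CPU 0: mntget() */
            count[69] -= 1;                 /* same task, now on CPU 69: mntput() */

            for (int i = 1; i < 128; i++)   /* CPU 42 finishes the walk: it sees the     */
                    sum += count[i];        /* decrement on #69 but missed the +1 on #0  */

            printf("observed total = %ld (true total = 1)\n", sum);   /* prints 0 */
            return 0;
    }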
-
Al Viro authored
commit 4c0d7cd5 upstream. RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by:
"Dae R. Jeong" <threeearcat@gmail.com> Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit 90bad5e0 upstream. Since mountpoint crossing can happen without leaving lazy mode, root dentries do need the same protection against having their memory freed without RCU delay as everything else in the tree. It's partially hidden by RCU delay between detaching from the mount tree and dropping the vfsmount reference, but the starting point of pathwalk can be on an already detached mount, in which case umount-caused RCU delay has already passed by the time the lazy pathwalk grabs rcu_read_lock(). If the starting point happens to be at the root of that vfsmount *and* that vfsmount covers the entire filesystem, we get trouble. Fixes: 48a066e7 ("RCU'd vfsmounts") Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit b5b1404d upstream. This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by:
Vlastimil Babka <vbabka@suse.cz> Reported-by:
Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Suggested-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Quinn Tran authored
commit 5e53be8e upstream. In the case of IOCB QFull, Initiator code can leave behind a stale pointer to an SRB structure on the outstanding command array. Fixes: 82de802a ("scsi: qla2xxx: Preparation for Target MQ.") Cc: stable@vger.kernel.org #v4.16+ Signed-off-by:
Quinn Tran <quinn.tran@cavium.com> Signed-off-by:
Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bart Van Assche authored
commit 1214fd7b upstream. Surround scsi_execute() calls with scsi_autopm_get_device() and scsi_autopm_put_device(). Note: removing sr_mutex protection from the scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of sr_mutex is to serialize cdrom_*() calls. This patch prevents complaints similar to the following from appearing in the kernel log if runtime power management is enabled: INFO: task systemd-udevd:650 blocked for more than 120 seconds. Not tainted 4.18.0-rc7-dbg+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. systemd-udevd D28176 650 513 0x00000104 Call Trace: __schedule+0x444/0xfe0 schedule+0x4e/0xe0 schedule_preempt_disabled+0x18/0x30 __mutex_lock+0x41c/0xc70 mutex_lock_nested+0x1b/0x20 __blkdev_get+0x106/0x970 blkdev_get+0x22c/0x5a0 blkdev_open+0xe9/0x100 do_dentry_open.isra.19+0x33e/0x570 vfs_open+0x7c/0xd0 path_openat+0x6e3/0x1120 do_filp_open+0x11c/0x1c0 do_sys_open+0x208/0x2d0 __x64_sys_openat+0x59/0x70 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Cc: Maurizio Lombardi <mlombard@redhat.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Tested-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
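A hedged sketch of the wrapping described above (the surrounding sr driver context, the sense-header variable and the exact scsi_execute() argument list are assumptions):

    scsi_autopm_get_device(sdev);   /* resume the drive if it is runtime-suspended */
    ret = scsi_execute(sdev, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
                       SR_TIMEOUT, MAX_RETRIES, 0, 0, NULL);
    scsi_autopm_put_device(sdev);   /* drop the runtime-PM reference again */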
-
Juergen Gross authored
commit d472b3a6 upstream. skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache its return value. Cc: stable@vger.kernel.org Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
Wei Liu <wei.liu2@citrix.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
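An illustrative before/after of the caching problem (not the exact driver hunk; pull_to and the touched field are placeholders):

    /* Broken: __pskb_pull_tail() may reallocate the skb head, so a cached
     * skb_shinfo() pointer can end up pointing at freed memory. */
    struct skb_shared_info *shinfo = skb_shinfo(skb);
    __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
    shinfo->nr_frags--;                     /* possibly stale pointer */

    /* Fixed: re-evaluate skb_shinfo() after the pull. */
    __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
    skb_shinfo(skb)->nr_frags--;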
-
Isaac J. Manjarres authored
commit 2610e889 upstream. This commit: 9fb8d5dc ("stop_machine, Disable preemption when waking two stopper threads") does not fully address the race condition that can occur as follows: On one CPU, call it CPU 3, thread 1 invokes cpu_stop_queue_two_works(2, 3,...), and the execution is such that thread 1 queues the works for migration/2 and migration/3, and is preempted after releasing the locks for migration/2 and migration/3, but before waking the threads. Then, on CPU 2, a kworker, call it thread 2, is running, and it invokes cpu_stop_queue_two_works(1, 2,...), such that thread 2 queues the works for migration/1 and migration/2. Meanwhile, on CPU 3, thread 1 resumes execution, and wakes migration/2 and migration/3. This means that when CPU 2 releases the locks for migration/1 and migration/2, but before it wakes those threads, it can be preempted by migration/2. If thread 2 is preempted by migration/2, then migration/2 will execute the first work item successfully, since migration/3 was woken up by CPU 3, but when it goes to execute the second work item, it disables preemption, calls multi_cpu_stop(), and thus, CPU 2 will wait forever for migration/1, which should have been woken up by thread 2. However, migration/1 cannot be woken up by thread 2, because thread 2 is a kworker and therefore affine to CPU 2, and CPU 2 is running migration/2 with preemption disabled, so thread 2 will never run. Disable preemption after queueing works for stopper threads to ensure that the operation of queueing the works and waking the stopper threads is atomic. Co-Developed-by:
Prasad Sodagudi <psodagud@codeaurora.org> Co-Developed-by:
Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by:
Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by:
Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by:
Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: gregkh@linuxfoundation.org Cc: matt@codeblueprint.co.uk Fixes: 9fb8d5dc ("stop_machine, Disable preemption when waking two stopper threads") Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org Signed-off-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
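A hedged sketch of the resulting shape of cpu_stop_queue_two_works() (error handling and the queueing details are omitted; this is not the exact diff):

    preempt_disable();                      /* neither stopper can preempt us from here on */
    raw_spin_lock_irq(&stopper1->lock);
    raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
    /* ... queue work1/work2 and collect both stopper threads on wakeq ... */
    raw_spin_unlock(&stopper2->lock);
    raw_spin_unlock_irq(&stopper1->lock);

    wake_up_q(&wakeq);                      /* both stoppers are woken before we can be preempted */
    preempt_enable();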
-
Linus Torvalds authored
commit 3c53776e upstream. Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do its job"), and ever since we've had small nagging issues with it. For example, we've had: 1ff68820 ("watchdog: core: make sure the watchdog_worker is not deferred") 8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run") 217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()") all of which worked around some of the effects of that commit. The DVB people have also complained that the commit causes excessive USB URB latencies, which seems to be due to the USB code using tasklets to schedule USB traffic. This seems to be an issue mainly when already living on the edge, but waiting for ksoftirqd to handle it really does seem to cause excessive latencies. Now Hanna Hawa reports that this issue isn't just limited to USB URB and DVB, but also causes timeout problems for the Marvell SoC team: "I'm facing kernel panic issue while running raid 5 on sata disks connected to Macchiatobin (Marvell community board with Armada-8040 SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2) uses a tasklet to clean the done descriptors from the queue" The latency problem causes a panic: mv_xor_v2 f0400000.xor: dma_sync_wait: timeout! Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction We've discussed simply just reverting the original commit entirely, and also much more involved solutions (with per-softirq threads etc). This patch is intentionally stupid and fairly limited, because the issue still remains, and the other solutions either got sidetracked or had other issues. We should probably also consider the timer softirqs to be synchronous and not be delayed to ksoftirqd (since they were the issue with the earlier watchdog problems), but that should be done as a separate patch. This does only the tasklet cases. Reported-and-tested-by:
Hanna Hawa <hannah@marvell.com> Reported-and-tested-by:
Josef Griebichler <griebichler.josef@gmx.at> Reported-by:
Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrey Konovalov authored
commit 12c8f25a upstream. KASAN uses the __no_sanitize_address macro to disable instrumentation of particular functions. Right now it's defined only for GCC builds, which causes false positives when clang is used. This patch adds a definition for clang. Note that clang revision 329612 or higher is required. [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check] Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com Signed-off-by:
Andrey Konovalov <andreyknvl@google.com> Acked-by:
Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Lawrence <paullawrence@google.com> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Cc: Sodagudi Prasad <psodagud@codeaurora.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
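The addition is essentially a one-line attribute definition in the clang compiler header; a sketch of its likely form (the exact spelling in the upstream patch may differ):

    /* include/linux/compiler-clang.h */
    #define __no_sanitize_address __attribute__((no_sanitize("address")))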
-
Ming Lei authored
commit b5b6e8c8 upstream. Since commit 84676c1f ("genirq/affinity: assign vectors to all possible CPUs") it is possible to end up in a scenario where only offline CPUs are mapped to an interrupt vector. This is only an issue for the legacy I/O path since with blk-mq/scsi-mq an I/O can't be submitted to a hardware queue if the queue isn't mapped to an online CPU. Fix this issue by forcing virtio-scsi to use blk-mq. [mkp: commit desc] Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Fixes: 84676c1f ("genirq/affinity: assign vectors to all possible CPUs") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Acked-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ming Lei authored
commit 2f31115e upstream. This patch introduces 'force_blk_mq' to the scsi_host_template so that drivers that have no desire to support the legacy I/O path can signal blk-mq only support. [mkp: commit desc] Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
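A hedged sketch of how the new flag is declared and how a blk-mq-only driver such as virtio_scsi would set it (the field name comes from the commit; everything else is illustrative):

    /* include/scsi/scsi_host.h (sketch) */
    struct scsi_host_template {
            /* ... */
            unsigned force_blk_mq:1;        /* driver only supports the blk-mq path */
    };

    /* driver side, e.g. virtio_scsi: */
    static struct scsi_host_template virtscsi_host_template = {
            /* ... */
            .force_blk_mq = 1,
    };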
-
Ming Lei authored
commit 8b834bff upstream. Since commit 84676c1f ("genirq/affinity: assign vectors to all possible CPUs") we could end up with an MSI-X vector that did not have any online CPUs mapped. This would lead to I/O hangs since there was no CPU to receive the completion. Retrieve IRQ affinity information using pci_irq_get_affinity() and use this mapping to choose a reply queue. [mkp: tweaked commit desc] Cc: Hannes Reinecke <hare@suse.de> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Meelis Roos <mroos@linux.ee> Cc: Artem Bityutskiy <artem.bityutskiy@intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Fixes: 84676c1f ("genirq/affinity: assign vectors to all possible CPUs") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Tested-by:
Laurence Oberman <loberman@redhat.com> Tested-by:
Don Brace <don.brace@microsemi.com> Tested-by:
Artem Bityutskiy <artem.bityutskiy@intel.com> Acked-by:
Don Brace <don.brace@microsemi.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
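A hedged sketch of building a CPU-to-reply-queue map from the IRQ affinity (pci_irq_get_affinity() is the real API; the host structure and field names are hypothetical):

    static void setup_reply_map(struct my_host *h, struct pci_dev *pdev)
    {
            const struct cpumask *mask;
            unsigned int queue, cpu;

            for (queue = 0; queue < h->nreply_queues; queue++) {
                    mask = pci_irq_get_affinity(pdev, queue);
                    if (!mask)
                            goto fallback;
                    for_each_cpu(cpu, mask)
                            h->reply_map[cpu] = queue;  /* completions go to a vector mapped to this CPU */
            }
            return;

    fallback:
            for_each_possible_cpu(cpu)
                    h->reply_map[cpu] = cpu % h->nreply_queues;
    }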
-
John David Anglin authored
commit fedb8da9 upstream. For years I thought all parisc machines executed loads and stores in order. However, Jeff Law recently indicated on gcc-patches that this is not correct. There are various degrees of out-of-order execution all the way back to the PA7xxx processor series (hit-under-miss). The PA8xxx series has full out-of-order execution for both integer operations, and loads and stores. This is described in the following article: http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml For this reason, we need to define mb() and to insert a memory barrier before the store unlocking spinlocks. This ensures that all memory accesses are complete prior to unlocking. The ldcw instruction performs the same function on entry. Signed-off-by:
John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
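Conceptually the change orders all prior accesses before the store that releases the lock; a generic sketch of that pattern (not the actual parisc assembler or data layout):

    static inline void arch_spin_unlock(arch_spinlock_t *lock)
    {
            mb();                           /* all earlier loads/stores complete first */
            WRITE_ONCE(lock->locked, 0);    /* hypothetical field: the release store */
    }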
-
Helge Deller authored
commit 66509a27 upstream. Enable the -mlong-calls compiler option by default, because otherwise in most cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F relocations. This fixes building the 64-bit defconfig. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 09 Aug, 2018 22 commits
-
-
Greg Kroah-Hartman authored
-
Shankara Pailoor authored
commit 92d34134 upstream. The code assumes the buffer is max_size in length, but we weren't allocating enough space for it. Signed-off-by:
Shankara Pailoor <shankarapailoor@gmail.com> Signed-off-by:
Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Sandeen authored
commit bb3d48dc upstream. xfs_attr3_leaf_create may have errored out before instantiating a buffer, for example if the blkno is out of range. In that case there is no work to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops if we try. This also seems to fix a flaw where the original error from xfs_attr3_leaf_create gets overwritten in the cleanup case, and it removes a pointless assignment to bp which isn't used after this. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969 Reported-by:
Xu, Wen <wen.xu@gatech.edu> Tested-by:
Xu, Wen <wen.xu@gatech.edu> Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Cc: Eduardo Valentin <eduval@amazon.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Chinner authored
commit afca6c5b upstream. A recent fuzzed filesystem image caused random dcache corruption when the reproducer was run. This often showed up as panics in lookup_slow() on a null inode->i_ops pointer when doing pathwalks. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 .... Call Trace: lookup_slow+0x44/0x60 walk_component+0x3dd/0x9f0 link_path_walk+0x4a7/0x830 path_lookupat+0xc1/0x470 filename_lookup+0x129/0x270 user_path_at_empty+0x36/0x40 path_listxattr+0x98/0x110 SyS_listxattr+0x13/0x20 do_syscall_64+0xf5/0x280 entry_SYSCALL_64_after_hwframe+0x42/0xb7 but had many different failure modes including deadlocks trying to lock the inode that was just allocated or KASAN reports of use-after-free violations. The cause of the problem was a corrupt INOBT on a v4 fs where the root inode was marked as free in the inobt record. Hence when we allocated an inode, it chose the root inode to allocate, found it in the cache and re-initialised it. We recently fixed a similar inode allocation issue caused by inobt record corruption problem in xfs_iget_cache_miss() in commit ee457001 ("xfs: catch inode allocation state mismatch corruption"). This change adds similar checks to the cache-hit path to catch it, and turns the reproducer into a corruption shutdown situation. Reported-by:
Wen Xu <wen.xu@gatech.edu> Signed-Off-By:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix typos in comment] Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Cc: Eduardo Valentin <eduval@amazon.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Chinner authored
commit ee457001 upstream. We recently came across a V4 filesystem causing memory corruption due to a newly allocated inode being setup twice and being added to the superblock inode list twice. From code inspection, the only way this could happen is if a newly allocated inode was not marked as free on disk (i.e. di_mode wasn't zero). Running the metadump on an upstream debug kernel fails during inode allocation like so: XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:114! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:assfail+0x28/0x30 RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202 RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30 R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10 FS: 00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0 Call Trace: xfs_ialloc+0x383/0x570 xfs_dir_ialloc+0x6a/0x2a0 xfs_create+0x412/0x670 xfs_generic_create+0x1f7/0x2c0 ? capable_wrt_inode_uidgid+0x3f/0x50 vfs_mkdir+0xfb/0x1b0 SyS_mkdir+0xcf/0xf0 do_syscall_64+0x73/0x1a0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Extracting the inode number we crashed on from an event trace and looking at it with xfs_db: xfs_db> inode 184452204 xfs_db> p core.magic = 0x494e core.mode = 0100644 core.version = 2 core.format = 2 (extents) core.nlinkv2 = 1 core.onlink = 0 ..... Confirms that it is not a free inode on disk. xfs_repair also trips over this inode: ..... zero length extent (off = 0, fsbno = 0) in ino 184452204 correcting nextents for inode 184452204 bad attribute fork in inode 184452204, would clear attr fork bad nblocks 1 for inode 184452204, would reset to 0 bad anextents 1 for inode 184452204, would reset to 0 imap claims in-use inode 184452204 is free, would correct imap would have cleared inode 184452204 ..... disconnected inode 184452204, would move to lost+found And so we have a situation where the directory structure and the inobt thinks the inode is free, but the inode on disk thinks it is still in use. Where this corruption came from is not possible to diagnose, but we can detect it and prevent the kernel from oopsing on lookup. The reproducer now results in: $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5} mkdir: cannot create directory '/mnt/scratch/00': File exists mkdir: cannot create directory '/mnt/scratch/01': File exists mkdir: cannot create directory '/mnt/scratch/03': Structure needs cleaning mkdir: cannot create directory '/mnt/scratch/04': Input/output error mkdir: cannot create directory '/mnt/scratch/05': Input/output error .... And this corruption shutdown: [ 54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk [ 54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.
Caller xfs_create+0x425/0x670 [ 54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443 [ 54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 54.852859] Call Trace: [ 54.853531] dump_stack+0x85/0xc5 [ 54.854385] xfs_trans_cancel+0x197/0x1c0 [ 54.855421] xfs_create+0x425/0x670 [ 54.856314] xfs_generic_create+0x1f7/0x2c0 [ 54.857390] ? capable_wrt_inode_uidgid+0x3f/0x50 [ 54.858586] vfs_mkdir+0xfb/0x1b0 [ 54.859458] SyS_mkdir+0xcf/0xf0 [ 54.860254] do_syscall_64+0x73/0x1a0 [ 54.861193] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 54.862492] RIP: 0033:0x7fb73bddf547 [ 54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053 [ 54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547 [ 54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a [ 54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0 [ 54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff [ 54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500 [ 54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c. Return address = ffffffff814cd050 [ 54.882790] XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem [ 54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s) Note that this crash is only possible on v4 filesystems or v5 filesystems mounted with the ikeep mount option. For all other V5 filesystems, this problem cannot occur because we don't read inodes we are allocating from disk - we simply overwrite them with the new inode information. Signed-Off-By:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Tested-by:
Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Cc: Eduardo Valentin <eduval@amazon.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Len Brown authored
commit a4c44753 upstream. When MWAIT is disabled, intel_idle refuses to probe. But it may mislead the user by blaming this on the model number: intel_idle: does not run on family 6 model 79 So defer the check for MWAIT until after the model# white-list check succeeds, and if the MWAIT check fails, tell the user how to fix it: intel_idle: Please enable MWAIT in BIOS SETUP Signed-off-by:
Len Brown <len.brown@intel.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Eduardo Valentin <eduval@amazon.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
James Smart authored
commit d082dc15 upstream. The existing code to carve up the sg list expected an sg element-per-page which can be very incorrect with iommu's remapping multiple memory pages to fewer bus addresses. To hit this error required a large io payload (greater than 256k) and a system that maps on a per-page basis. It's possible that large ios could get by fine if the system condensed the sgl list into the first 64 elements. This patch corrects the sg list handling by specifically walking the sg list element by element and attempting to divide the transfer up on a per-sg element boundary. While doing so, it still tries to keep sequences under 256k, but will exceed that rule if a single sg element is larger than 256k. Fixes: 48fa362b ("nvmet-fc: simplify sg list handling") Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by:
James Smart <james.smart@broadcom.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Keith Busch authored
commit 62314e40 upstream. The queue count says the highest queue that's been allocated, so don't reallocate a queue lower than that. Fixes: 147b27e4 ("nvme-pci: allocate device queues storage space at probe") Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jon Derrick <jonathan.derrick@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sagi Grimberg authored
commit 147b27e4 upstream. It may cause a race by setting 'nvmeq' in nvme_init_request(), because .init_request is called while switching the I/O scheduler, which may happen when the NVMe device is being reset and its nvme queues are being freed and created. We don't have any sync between the two paths. This patch changes the nvmeq allocation to occur at probe time so there is no way we can dereference it at init_request. [ 93.268391] kernel BUG at drivers/nvme/host/pci.c:408! [ 93.274146] invalid opcode: 0000 [#1] SMP [ 93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3 megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4 [ 93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017 [ 93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1 [ 93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme] [ 93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246 [ 93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008 [ 93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008 [ 93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00 [ 93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00 [ 93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071 [ 93.422969] FS: 00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000 [ 93.431996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0 [ 93.446368] Call Trace: [ 93.449103] blk_mq_alloc_rqs+0x231/0x2a0 [ 93.453579] blk_mq_sched_alloc_tags.isra.8+0x42/0x80 [ 93.459214] blk_mq_init_sched+0x7e/0x140 [ 93.463687] elevator_switch+0x5a/0x1f0 [ 93.467966] ? elevator_get.isra.17+0x52/0xc0 [ 93.472826] elv_iosched_store+0xde/0x150 [ 93.477299] queue_attr_store+0x4e/0x90 [ 93.481580] kernfs_fop_write+0xfa/0x180 [ 93.485958] __vfs_write+0x33/0x170 [ 93.489851] ? __inode_security_revalidate+0x4c/0x60 [ 93.495390] ? selinux_file_permission+0xda/0x130 [ 93.500641] ?
_cond_resched+0x15/0x30 [ 93.504815] vfs_write+0xad/0x1a0 [ 93.508512] SyS_write+0x52/0xc0 [ 93.512113] do_syscall_64+0x61/0x1a0 [ 93.516199] entry_SYSCALL64_slow_path+0x25/0x25 [ 93.521351] RIP: 0033:0x7f33ce96aab0 [ 93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0 [ 93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001 [ 93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740 [ 93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400 [ 93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000 [ 93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0 74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89 [ 93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8 [ 93.602273] ---[ end trace 810dde3993e5f14e ]--- Reported-by:
Yi Zhang <yi.zhang@redhat.com> Signed-off-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jon Derrick <jonathan.derrick@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filipe Manana authored
commit bd3599a0 upstream. When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified extents in the inode's extent map tree, so that a "fast" fsync (the flag BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps and log corresponding extent items. However, at the end of the range cloning operation we do truncate all the pages in the affected range (in order to ensure future reads will not get stale data). Sometimes this truncation will release the corresponding extent maps besides the pages from the page cache. If this happens, then a "fast" fsync operation will miss logging some extent items, because it relies exclusively on the extent maps being present in the inode's extent tree, leading to data loss/corruption if the fsync ends up using the same transaction used by the clone operation (that transaction was not committed in the meanwhile). An extent map is released through the callback btrfs_invalidatepage(), which gets called by truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The latter ends up calling try_release_extent_mapping() which will release the extent map if some conditions are met, like the file size being greater than 16Mb, gfp flags allow blocking and the range not being locked (which is the case during the clone operation) nor being the extent map flagged as pinned (also the case for cloning). The following example, turned into a test for fstests, reproduces the issue: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar # reflink destination offset corresponds to the size of file bar, # 2728Kb minus 4Kb. $ xfs_io -c "reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar $ md5sum /mnt/bar 95a95813a8c2abc9aa75a6c2914a077e /mnt/bar <power fail> $ mount /dev/sdb /mnt $ md5sum /mnt/bar 207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the # power failure In the above example, the destination offset of the clone operation corresponds to the size of the "bar" file minus 4Kb. So during the clone operation, the extent map covering the range from 2572Kb to 2728Kb gets trimmed so that it ends at offset 2724Kb, and a new extent map covering the range from 2724Kb to 11724Kb is created. So at the end of the clone operation when we ask to truncate the pages in the range from 2724Kb to 2724Kb + 15908Kb, the page invalidation callback ends up removing the new extent map (through try_release_extent_mapping()) when the page at offset 2724Kb is passed to that callback. Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent map is removed at try_release_extent_mapping(), forcing the next fsync to search for modified extents in the fs/subvolume tree instead of relying on the presence of extent maps in memory. This way we can continue doing a "fast" fsync if the destination range of a clone operation does not overlap with an existing range or if any of the criteria necessary to remove an extent map at try_release_extent_mapping() is not met (file size not bigger than 16Mb or gfp flags do not allow blocking). CC: stable@vger.kernel.org # 3.16+ Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Esben Haabendal authored
commit 9f9e3e0d upstream. Make sure to call reinit_completion() before dma is started to avoid race condition where reinit_completion() is called after complete() and before wait_for_completion_timeout(). Signed-off-by:
Esben Haabendal <eha@deif.com> Fixes: ce1a7884 ("i2c: imx: add DMA support for freescale i2c driver") Reviewed-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by:
Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
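A hedged sketch of the corrected ordering (variable and constant names only loosely follow the driver):

    reinit_completion(&dma->cmd_complete);  /* re-arm before the DMA can call complete() */
    dmaengine_submit(txdesc);
    dma_async_issue_pending(dma->chan_using);

    time_left = wait_for_completion_timeout(&dma->cmd_complete,
                                            msecs_to_jiffies(DMA_TIMEOUT));
    if (!time_left)
            return -ETIMEDOUT;              /* illustrative error handling */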
-
Masami Hiramatsu authored
commit 73c8d894 upstream. Maintain the tracing on/off setting of the ring_buffer when switching to the trace buffer snapshot. Taking a snapshot is done by swapping the backup ring buffer (max_tr_buffer). But since the tracing on/off setting is defined by the ring buffer, when swapping it, the tracing on/off setting can also be changed. This causes a strange result like below: /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 0 > tracing_on /sys/kernel/debug/tracing # cat tracing_on 0 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 0 We don't touch tracing_on, but snapshot changes tracing_on setting each time. This is an anomaly, because user doesn't know that each "ring_buffer" stores its own tracing-enable state and the snapshot is done by swapping ring buffers. Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp> Cc: stable@vger.kernel.org Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace") Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> [ Updated commit log and comment in the code ] Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
commit a0040c01 upstream. Hyper-V instances support PCI pass-through which is implemented through PV pci-hyperv driver. When a device is passed through, a new root PCI bus is created in the guest. The bus sits on top of VMBus and has no associated information in ACPI. acpi_pci_add_bus() in this case proceeds all the way to acpi_evaluate_dsm(), which reports ACPI: \: failed to evaluate _DSM (0x1001) While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not and gives us the error. It seems the correct fix is to not do anything in acpi_pci_add_bus() in such cases. Signed-off-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Cc: Sinan Kaya <okaya@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
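A hedged sketch of the early-return approach (close to, but not guaranteed to match, the upstream hunk):

    void acpi_pci_add_bus(struct pci_bus *bus)
    {
            if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge))
                    return;         /* e.g. a VMBus-backed root bus: no ACPI handle, nothing to evaluate */

            /* ... slot enumeration and _DSM evaluation follow ... */
    }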
-
Theodore Ts'o authored
commit 44de022c upstream. Ext4_check_descriptors() was getting called before s_gdb_count was initialized. So for file systems w/o the meta_bg feature, allocation bitmaps could overlap the block group descriptors and ext4 wouldn't notice. For file systems with the meta_bg feature enabled, there was a fencepost error which would cause the ext4_check_descriptors() to incorrectly believe that the block allocation bitmap overlaps with the block group descriptor blocks, and it would reject the mount. Fix both of these problems. Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by:
Benjamin Gilbert <bgilbert@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Safonov authored
commit 91874ecf upstream. It's legal to have 64 groups for a netlink_sock. As the user-supplied nladdr->nl_groups is __u32, it's possible to subscribe only to the first 32 groups. The correctness check of the userspace-supplied .bind() parameter is done by applying a mask made from an ngroups shift. This broke Android, which has 64 groups, because the shift used to build the mask resulted in an overflow. Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups") Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-Tested-by:
Nathan Chancellor <natechancellor@gmail.com> Signed-off-by:
Dmitry Safonov <dima@arista.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
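A user-space illustration of the arithmetic (this is not the kernel diff): nl_groups is a __u32, so only the first 32 of the 64 groups are addressable through bind(), and the validity mask has to be capped rather than built from a full 64-bit shift:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t valid_groups_mask(unsigned int ngroups)
    {
            if (ngroups >= 32)                      /* cap at the width of nl_groups */
                    return UINT32_MAX;
            return (UINT32_C(1) << ngroups) - 1;    /* safe for ngroups < 32 */
    }

    int main(void)
    {
            printf("ngroups=8  -> mask %#x\n", (unsigned)valid_groups_mask(8));
            printf("ngroups=64 -> mask %#x\n", (unsigned)valid_groups_mask(64));
            return 0;
    }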
-
Frederic Weisbecker authored
commit 0a0e0829 upstream. The full nohz tick is reprogrammed in irq_exit() only if the exit is not in a nesting interrupt. This stands as an optimization: whether a hardirq or a softirq is interrupted, the tick is going to be reprogrammed when necessary at the end of the inner interrupt, with even potential new updates on the timer queue. When soft interrupts are interrupted, it's assumed that they are executing on the tail of an interrupt return. In that case tick_nohz_irq_exit() is called after softirq processing to take care of the tick reprogramming. But the assumption is wrong: softirqs can be processed inline as well, ie: outside of an interrupt, like in a call to local_bh_enable() or from ksoftirqd. Inline softirqs don't reprogram the tick once they are done, as opposed to interrupt tail softirq processing. So if a tick interrupts an inline softirq processing, the next timer will neither be reprogrammed from the interrupting tick's irq_exit() nor after the interrupted softirq processing. This situation may leave the tick unprogrammed while timers are armed. To fix this, simply keep reprogramming the tick even if a softirq has been interrupted. That can be optimized further, but for now correctness is more important. Note that new timers enqueued in nohz_full mode after a softirq gets interrupted will still be handled just fine through self-IPIs triggered by the timer code. Reported-by:
Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org # 4.14+ Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anna-Maria Gleixner authored
commit 80d20d35 upstream. local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: 5d62c183 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by:
Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by:
Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by:
Frederic Weisbecker <frederic@kernel.org> Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
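A small user-space illustration of why the original test was wrong: TIMER_SOFTIRQ is an index, not a bitmask, so masking with it checks the wrong bit (the softirq numbers here mirror the kernel's, but the program itself is just a demonstration):

    #include <stdio.h>

    #define HI_SOFTIRQ     0
    #define TIMER_SOFTIRQ  1
    #define BIT(n)         (1UL << (n))

    int main(void)
    {
            unsigned long pending = BIT(TIMER_SOFTIRQ);  /* only the timer softirq is pending */

            printf("pending & TIMER_SOFTIRQ      = %lu (misses it)\n",
                   pending & TIMER_SOFTIRQ);
            printf("pending & BIT(TIMER_SOFTIRQ) = %lu (detects it)\n",
                   pending & BIT(TIMER_SOFTIRQ));
            return 0;
    }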
-
Thomas Gleixner authored
commit d1f0301b upstream. The support for force threading interrupts which are set up with both a primary and a threaded handler wrecked the setup of regularly requested threaded interrupts (primary handler == NULL). The reason is that it does not check whether the primary handler is set to the default handler which wakes the handler thread. Instead it replaces the thread handler with the primary handler as it would do with force threaded interrupts which have been requested via request_irq(). So both the primary and the thread handler become the same, which then triggers the warning that the thread handler tries to wake up a not-configured secondary thread. Fortunately this only happens when the driver omits the IRQF_ONESHOT flag when requesting the threaded interrupt, which is normally caught by the sanity checks when force irq threading is disabled. Fix it by skipping the force threading setup when a regular threaded interrupt is requested. As a consequence, the interrupt request which lacks the IRQF_ONESHOT flag is rejected correctly instead of being silently wrecked. Fixes: 2a1d3ab8 ("genirq: Handle force threading of irqs with primary and thread handler") Reported-by:
Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Kurt Kanzenbach <kurt.kanzenbach@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anil Gurumurthy authored
commit b4146c49 upstream. Propagate the task management completion status properly to avoid unnecessary waits for commands to complete. Fixes: faef62d1 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling") Cc: <stable@vger.kernel.org> Signed-off-by:
Anil Gurumurthy <anil.gurumurthy@cavium.com> Signed-off-by:
Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Quinn Tran authored
commit b08abbd9 upstream. During the unload process, the chip can encounter a problem where a FW dump would be captured. For this case, the full reset sequence will be skipped to bring the chip back to a fully operational state. Fixes: e315cd28 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring") Cc: <stable@vger.kernel.org> Signed-off-by:
Quinn Tran <quinn.tran@cavium.com> Signed-off-by:
Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Quinn Tran authored
commit efa93f48 upstream. Add wait for session deletion to finish before freeing an NPIV scsi host. Fixes: 726b8548 ("qla2xxx: Add framework for async fabric discovery") Cc: <stable@vger.kernel.org> Signed-off-by:
Quinn Tran <quinn.tran@cavium.com> Signed-off-by:
Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Quinn Tran authored
commit e3dde080 upstream. In the case of an IOCB queue-full condition, or on a system where memory is low and the driver receives a large RSCN storm, a stale sp pointer can stay on the gpnid_list, resulting in a page fault. This patch fixes the issue by initializing the sp->elem list head and removing sp->elem before the memory is freed. The following stack trace is seen: 9 [ffff987b37d1bc60] page_fault at ffffffffad516768 [exception RIP: qla24xx_async_gpnid+496] 10 [ffff987b37d1bd10] qla24xx_async_gpnid at ffffffffc039866d [qla2xxx] 11 [ffff987b37d1bd80] qla2x00_do_work at ffffffffc036169c [qla2xxx] 12 [ffff987b37d1be38] qla2x00_do_dpc_all_vps at ffffffffc03adfed [qla2xxx] 13 [ffff987b37d1be78] qla2x00_do_dpc at ffffffffc036458a [qla2xxx] 14 [ffff987b37d1bec8] kthread at ffffffffacebae31 Fixes: 2d73ac61 ("scsi: qla2xxx: Serialize GPNID for multiple RSCN") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by:
Quinn Tran <quinn.tran@cavium.com> Signed-off-by:
Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 Aug, 2018 1 commit
-
-
Greg Kroah-Hartman authored
-