- 15 Jun, 2016 40 commits
-
-
Peter Zijlstra authored
commit 9916e214 upstream. Remove the direct {push,pull} balancing operations from switched_{from,to}_dl() / prio_changed_dl() and use the balance callback queue. Again, err on the side of too many reschedules; since too few is a hard bug while too many is just annoying. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.968262663@infradead.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [Conflicts: kernel/sched/deadline.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> [bwh: Backported to 3.16: - The check_resched / !CONFIG_SMP case in switched_to_dl() is different - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Peter Zijlstra authored
commit 0ea60c20 upstream. In order to be able to use pull_dl_task() from a callback, we need to do away with the return value. Since the return value indicates if we should reschedule, do this inside the function. Since not all callers currently do this, this can increase the number of reschedules due rt balancing. Too many reschedules is not a correctness issues, too few are. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.859398977@infradead.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [Conflicts: kernel/sched/deadline.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> [bwh: Backported to 3.16: use resched_task() instead of resched_curr()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Peter Zijlstra authored
commit 9916e214 upstream. Remove the direct {push,pull} balancing operations from switched_{from,to}_rt() / prio_changed_rt() and use the balance callback queue. Again, err on the side of too many reschedules; since too few is a hard bug while too many is just annoying. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.766832367@infradead.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [Conflicts: kernel/sched/rt.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Peter Zijlstra authored
commit 8046d680 upstream. In order to be able to use pull_rt_task() from a callback, we need to do away with the return value. Since the return value indicates if we should reschedule, do this inside the function. Since not all callers currently do this, this can increase the number of reschedules due rt balancing. Too many reschedules is not a correctness issues, too few are. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.679002000@infradead.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [Conflicts: kernel/sched/rt.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> [bwh: Backported to 3.16: use resched_task() instead of resched_curr()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Peter Zijlstra authored
commit 4c9a4bc8 upstream. In order to remove dropping rq->lock from the switched_{to,from}()/prio_changed() sched_class methods, run the balance callbacks after it. We need to remove dropping rq->lock because its buggy, suppose using sched_setattr()/sched_setscheduler() to change a running task from FIFO to OTHER. By the time we get to switched_from_rt() the task is already enqueued on the cfs runqueues. If switched_from_rt() does pull_rt_task() and drops rq->lock, load-balancing can come in and move our task @p to another rq. The subsequent switched_to_fair() still assumes @p is on @rq and bad things will happen. By using balance callbacks we delay the load-balancing operations {rt,dl}x{push,pull} until we've done all the important work and the task is fully set up. Furthermore, the balance callbacks do not know about @p, therefore they cannot get confused like this. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Link: http://lkml.kernel.org/r/20150611124742.615343911@infradead.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [Conflicts: kernel/sched/core.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Peter Zijlstra authored
commit e3fca9e7 upstream. Generalize the post_schedule() stuff into a balance callback list. This allows us to more easily use it outside of schedule() and cross sched_class. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: oleg@redhat.com Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124742.424032725@infradead.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [Conflicts: kernel/sched/core.c] Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Vineet Gupta authored
commit daaf40e5 upstream. Fixes: f7d11e93 locking,arch,arc: Fold atomic_ops Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Guenter Roeck <linux@roeck-us.net>
-
Will Deacon authored
commit f5e0a12c upstream. An arm64 allmodconfig fails to build with GCC 5 due to __asmeq assertions in the PSCI firmware calling code firing due to mcount preambles breaking our assumptions about register allocation of function arguments: /tmp/ccDqJsJ6.s: Assembler messages: /tmp/ccDqJsJ6.s:60: Error: .err encountered /tmp/ccDqJsJ6.s:61: Error: .err encountered /tmp/ccDqJsJ6.s:62: Error: .err encountered /tmp/ccDqJsJ6.s:99: Error: .err encountered /tmp/ccDqJsJ6.s:100: Error: .err encountered /tmp/ccDqJsJ6.s:101: Error: .err encountered This patch fixes the issue by moving the PSCI calls out-of-line into their own assembly files, which are safe from the compiler's meddling fingers. Reported-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Guenter Roeck <linux@roeck-us.net>
-
Konstantin Khlebnikov authored
commit 4d88e6f7 upstream. If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages with balloon and doesn't set PagePrivate flag, as a result balloon_page_dequeue() cannot get any pages because it thinks that all of them are isolated. Without balloon compaction nobody can isolate ballooned pages. It's safe to remove this check. Fixes: d6d86c0a ("mm/balloon_compaction: redesign ballooned pages management"). Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Matt Mullins <mmullins@mmlx.us> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: jian wang <wangjian@bytedance.com>
-
Konstantin Khlebnikov authored
commit d6d86c0a upstream. Sasha Levin reported KASAN splash inside isolate_migratepages_range(). Problem is in the function __is_movable_balloon_page() which tests AS_BALLOON_MAP in page->mapping->flags. This function has no protection against anonymous pages. As result it tried to check address space flags inside struct anon_vma. Further investigation shows more problems in current implementation: * Special branch in __unmap_and_move() never works: balloon_page_movable() checks page flags and page_count. In __unmap_and_move() page is locked, reference counter is elevated, thus balloon_page_movable() always fails. As a result execution goes to the normal migration path. virtballoon_migratepage() returns MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS, move_to_new_page() thinks this is an error code and assigns newpage->mapping to NULL. Newly migrated page lose connectivity with balloon an all ability for further migration. * lru_lock erroneously required in isolate_migratepages_range() for isolation ballooned page. This function releases lru_lock periodically, this makes migration mostly impossible for some pages. * balloon_page_dequeue have a tight race with balloon_page_isolate: balloon_page_isolate could be executed in parallel with dequeue between picking page from list and locking page_lock. Race is rare because they use trylock_page() for locking. This patch fixes all of them. Instead of fake mapping with special flag this patch uses special state of page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark directly in struct page makes everything safer and easier. PagePrivate is used to mark pages present in page list (i.e. not isolated, like PageLRU for normal pages). It replaces special rules for reference counter and makes balloon migration similar to migration of normal pages. This flag is protected by page_lock together with link to the balloon device. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: - Remove an additional check for MIGRATEPAGE_BALLOON_SUCCESS in __unmap_and_move() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: jian wang <wangjian@bytedance.com>
-
Lorenzo Pieralisi authored
commit f436b2ac upstream. The Performance Monitors extension is an optional feature of the AArch64 architecture, therefore, in order to access Performance Monitors registers safely, the kernel should detect the architected PMU unit presence through the ID_AA64DFR0_EL1 register PMUVer field before accessing them. This patch implements a guard by reading the ID_AA64DFR0_EL1 register PMUVer field to detect the architected PMU presence and prevent accessing PMU system registers if the Performance Monitors extension is not implemented in the core. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 60792ad3 ("arm64: kernel: enforce pmuserenr_el0 initialization and restore") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kangjie Lu authored
commit 79e48650 upstream. Stack object "dte_facilities" is allocated in x25_rx_call_request(), which is supposed to be initialized in x25_negotiate_facilities. However, 5 fields (8 bytes in total) are not initialized. This object is then copied to userland via copy_to_user, thus infoleak occurs. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kangjie Lu authored
commit 5f8e4474 upstream. The stack object “map” has a total size of 32 bytes. Its last 4 bytes are padding generated by compiler. These padding bytes are not initialized and sent out via “nla_put”. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kangjie Lu authored
commit b8670c09 upstream. The stack object “info” has a total size of 12 bytes. Its last byte is padding which is not initialized and leaked via “put_cmsg”. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Hugh Dickins authored
commit 42cb14b1 upstream. clear_page_dirty_for_io() has accumulated writeback and memcg subtleties since v2.6.16 first introduced page migration; and the set_page_dirty() which completed its migration of PageDirty, later had to be moderated to __set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too. No actual problems seen with this procedure recently, but if you look into what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually achieving, it turns out to be nothing more than moving the PageDirty flag, and its NR_FILE_DIRTY stat from one zone to another. It would be good to avoid a pile of irrelevant decrementations and incrementations, and improper event counting, and unnecessary descent of the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which radix_tree_replace_slot() left in place anyway). Do the NR_FILE_DIRTY movement, like the other stats movements, while interrupts still disabled in migrate_page_move_mapping(); and don't even bother if the zone is the same. Do the PageDirty movement there under tree_lock too, where old page is frozen and newpage not yet visible: bearing in mind that as soon as newpage becomes visible in radix_tree, an un-page-locked set_page_dirty() might interfere (or perhaps that's just not possible: anything doing so should already hold an additional reference to the old page, preventing its migration; but play safe). But we do still need to transfer PageDirty in migrate_page_copy(), for those who don't go the mapping route through migrate_page_move_mapping(). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: adjust context. This is not just an optimisation, but turned out to fix a possible oops (CVE-2016-3070).] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David Howells authored
commit 23c8a812 upstream. This fixes CVE-2016-0758. In the ASN.1 decoder, when the length field of an ASN.1 value is extracted, it isn't validated against the remaining amount of data before being added to the cursor. With a sufficiently large size indicated, the check: datalen - dp < 2 may then fail due to integer overflow. Fix this by checking the length indicated against the amount of remaining data in both places a definite length is determined. Whilst we're at it, make the following changes: (1) Check the maximum size of extended length does not exceed the capacity of the variable it's being stored in (len) rather than the type that variable is assumed to be (size_t). (2) Compare the EOC tag to the symbolic constant ASN1_EOC rather than the integer 0. (3) To reduce confusion, move the initialisation of len outside of: for (len = 0; n > 0; n--) { since it doesn't have anything to do with the loop counter n. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Peter Jones <pjones@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Joe Perches authored
commit cb984d10 upstream. As gcc major version numbers are going to advance rather rapidly in the future, there's no real value in separate files for each compiler version. Deduplicate some of the macros #defined in each file too. Neaten comments using normal kernel commenting style. Signed-off-by: Joe Perches <joe@perches.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ philm: backport to 3.16-stable ] Signed-off-by: Philip Müller <philm@manjaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dave Chinner authored
commit ec56b1f1 upstream. Lock ordering for the new mmap lock needs to be: mmap_sem sb_start_pagefault i_mmap_lock page lock <fault processsing> Right now xfs_vm_page_mkwrite gets this the wrong way around, While technically it cannot deadlock due to the current freeze ordering, it's still a landmine that might explode if we change anything in future. Hence we need to nest the locks correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
-
Dave Chinner authored
commit 723cac48 upstream. Extent swap operations are another extent manipulation operation that we need to ensure does not race against mmap page faults. The current code returns if the file is mapped prior to the swap being done, but it could potentially race against new page faults while the swap is in progress. Hence we should use the XFS_MMAPLOCK_EXCL for this operation, too. While there, fix the error path handling that can result in double unlocks of the inodes when cancelling the swapext transaction. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> [bwh: Backported to 3.16: - The obsoleted check for mmap'd files was directly in xfs_swap_extents() and used VN_MAPPED - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dave Chinner authored
commit 0f9160b4 upstream. Now that truncate locks out new page faults, we no longer need to do special writeback hacks in truncate to work around potential races between page faults, page cache truncation and file size updates to ensure we get write page faults for extending truncates on sub-page block size filesystems. Hence we can remove the code in xfs_setattr_size() that handles this and update the comments around the code tha thandles page cache truncate and size updates to reflect the new reality. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> [bwh: Backported to 3.16: we never had the previous hack, so just update the comment] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dave Chinner authored
commit e8e9ad42 upstream. Now we have the i_mmap_lock being held across the page fault IO path, we now add extent manipulation operation exclusion by adding the lock to the paths that directly modify extent maps. This includes truncate, hole punching and other fallocate based operations. The operations will now take both the i_iolock and the i_mmaplock in exclusive mode, thereby ensuring that all IO and page faults block without holding any page locks while the extent manipulation is in progress. This gives us the lock order during truncate of i_iolock -> i_mmaplock -> page_lock -> i_lock, hence providing the same lock order as the iolock provides the normal IO path without involving the mmap_sem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> [bwh: Backported to 3.16: - We never need to break layouts, so take both i_iolock and i_mmaplock at the same time - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
-
Dave Chinner authored
commit 075a924d upstream. Take the i_mmaplock over write page faults. These come through the ->page_mkwrite callout, so we need to wrap that calls with the i_mmaplock. This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock -> i_lock. Also, move the page_mkwrite wrapper to the same region of xfs_file.c as the read fault wrappers and add a tracepoint. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
-
Dave Chinner authored
commit de0e8c20 upstream. Take the i_mmaplock over read page faults. These come through the ->fault callout, so we need to wrap the generic implementation with the i_mmaplock. While there, add tracepoints for the read fault as it passes through XFS. This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock -> i_lock. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
-
Dave Chinner authored
commit 653c60b6 upstream. Right now we cannot serialise mmap against truncate or hole punch sanely. ->page_mkwrite is not able to take locks that the read IO path normally takes (i.e. the inode iolock) because that could result in lock inversions (read - iolock - page fault - page_mkwrite - iolock) and so we cannot use an IO path lock to serialise page write faults against truncate operations. Instead, introduce a new lock that is used *only* in the ->page_mkwrite path that is the equivalent of the iolock. The lock ordering in a page fault is i_mmaplock -> page lock -> i_ilock, and so in truncate we can i_iolock -> i_mmaplock and so lock out new write faults during the process of truncation. Because i_mmap_lock is outside the page lock, we can hold it across all the same operations we hold the i_iolock for. The only difference is that we never hold the i_mmaplock in the normal IO path and so do not ever have the possibility that we can page fault inside it. Hence there are no recursion issues on the i_mmap_lock and so we can use it to serialise page fault IO against inode modification operations that affect the IO path. This patch introduces the i_mmaplock infrastructure, lockdep annotations and initialisation/destruction code. Use of the new lock will be in subsequent patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: xfs@oss.sgi.com
-
Dave Chinner authored
commit 81217683 upstream. xfs_swap_extents() holds the ilock over a call to filemap_write_and_wait(), which can then try to write data and take the ilock. That causes a self-deadlock. Fix the deadlock and clean up the code by separating the locking appropriately. Add a lockflags variable to track what locks we are holding as we gain and drop them and cleanup the error handling to always use "out_unlock" with the lockflags variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Linus Torvalds authored
commit 31b0b385 upstream. The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501fa ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Junxiao Bi authored
commit c25a1e06 upstream. Commit 702e5bc6 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster wide inode lock making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And same deadlock happened in ocfs2_reflink. This fix is to revert back using ocfs2_init_acl. Fixes: 702e5bc6 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dan Carpenter authored
commit e073fc58 upstream. The code at the "out" label assumes that "default_acl" and "acl" are NULL, but actually the pointers can be NULL, unitialized, or freed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt authored
commit 106b816c upstream. At the end of process_filter(), collapse_tree() was changed to update the parg parameter, but the reassignment after the call wasn't removed. What happens is that the "current_op" gets modified and freed and parg is assigned to the new allocated argument. But after the call to collapse_tree(), parg is assigned again to the just freed "current_op", and this causes the tool to crash. The current_op variable must also be assigned to NULL in case of error, otherwise it will cause it to be free()ed twice. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 42d6194d ("tools lib traceevent: Refactor process_filter()") Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.homeSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Yura Pakhuchiy authored
commit 3231e205 upstream. Subwoofer does not work out of the box on ASUS N751/N551 laptops. This patch fixes it. Patch tested on N751 laptop. N551 part is not tested, but according to [1] and [2] this laptop requires similar changes, so I included them in the patch. 1. https://github.com/honsiorovskyi/asus-n551-hda-fix 2. https://bugs.launchpad.net/ubuntu/+source/alsa-tools/+bug/1405691 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=117781Signed-off-by: Yura Pakhuchiy <pakhuchiy@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Vetter authored
commit 2700818a upstream. LPT is pch, so might run into the fdi bandwidth constraint (especially since it has only 2 lanes). But right now we just force pipe_bpp back to 24, resulting in a nice loop (which we bail out with a loud WARN_ON). Fix this. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=93477Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1462264381-7573-1-git-send-email-daniel.vetter@ffwll.ch (cherry picked from commit f58a1acc) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kaho Ng authored
commit 2da2dc9e upstream. For reducing the noise from the headset output on ASUS UX501VW, call the existing fixup, alc_fixup_headset_mode_alc668(), additionally. Thread: https://bbs.archlinux.org/viewtopic.php?id=209554Signed-off-by: Kaho Ng <ngkaho1234@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
commit be0bd316 upstream. If GSO packet is segmented and its segments are properly queued, we call consume_skb() instead of kfree_skb() to be drop monitor friendly. Fixes: 3e4f8b78 ("macvtap: Perform GSO on forwarding path.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Al Viro authored
commit 99d82582 upstream. Payloads of NM entries are not supposed to contain NUL. When we run into such, only the part prior to the first NUL goes into the concatenation (i.e. the directory entry name being encoded by a bunch of NM entries). We do stop when the amount collected so far + the claimed amount in the current NM entry exceed 254. So far, so good, but what we return as the total length is the sum of *claimed* sizes, not the actual amount collected. And that can grow pretty large - not unlimited, since you'd need to put CE entries in between to be able to get more than the maximum that could be contained in one isofs directory entry / continuation chunk and we are stop once we'd encountered 32 CEs, but you can get about 8Kb easily. And that's what will be passed to readdir callback as the name length. 8Kb __copy_to_user() from a buffer allocated by __get_free_page() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dmitry V. Levin authored
commit f0b22d1b upstream. Do not load one entry beyond the end of the syscall table when the syscall number of a traced process equals to __NR_Linux_syscalls. Similar bug with regular processes was fixed by commit 3bb457af ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls"). This bug was found by strace test suite. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Chen Yu authored
commit 886123fb upstream. Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f; Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM (35.5), the ratio bits are bit 8-15. Ignoring the upper bits can result in an incorrect tsc ratio, which causes the TSC calibration and the Local APIC timer frequency to be incorrect. Fix this problem by masking 0xff instead. [ tglx: Massaged changelog ] Fixes: 7da7c156 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs" Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Bin Gao <bin.gao@intel.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mathias Krause authored
commit 8148a73c upstream. If /proc/<PID>/environ gets read before the envp[] array is fully set up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to read more bytes than are actually written, as env_start will already be set but env_end will still be zero, making the range calculation underflow, allowing to read beyond the end of what has been written. Fix this as it is done for /proc/<PID>/cmdline by testing env_end for zero. It is, apparently, intentionally set last in create_*_tables(). This bug was found by the PaX size_overflow plugin that detected the arithmetic underflow of 'this_len = env_end - (env_start + src)' when env_end is still zero. The expected consequence is that userland trying to access /proc/<PID>/environ of a not yet fully set up process may get inconsistent data as we're in the middle of copying in the environment variables. Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Lucas Stach authored
commit e3c00d87 upstream. On DCE6.1 PPLL2 is exclusively available to UNIPHYA, so it should not be taken into consideration when looking for an already enabled PLL to be shared with other outputs. This fixes the broken VGA port (TRAVIS DP->VGA bridge) on my Richland based laptop, where the internal display is connected to UNIPHYA through a TRAVIS DP->LVDS bridge. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=78987 v2: agd: add check in radeon_get_shared_nondp_ppll as well, drop extra parameter. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eric W. Biederman authored
commit 5ec0811d upstream. When the first propgated copy was a slave the following oops would result: > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > PGD bacd4067 PUD bac66067 PMD 0 > Oops: 0000 [#1] SMP > Modules linked in: > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000 > RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283 > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010 > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480 > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000 > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00 > FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0 > Stack: > ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85 > ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40 > 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0 > Call Trace: > [<ffffffff811fbf85>] propagate_mnt+0x105/0x140 > [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0 > [<ffffffff811f1ec3>] graft_tree+0x63/0x70 > [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100 > [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0 > [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70 > [<ffffffff811f3a45>] SyS_mount+0x75/0xc0 > [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0 > [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25 > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30 > RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0 > RSP <ffff8800bac3fd38> > CR2: 0000000000000010 > ---[ end trace 2725ecd95164f217 ]--- This oops happens with the namespace_sem held and can be triggered by non-root users. An all around not pleasant experience. To avoid this scenario when finding the appropriate source mount to copy stop the walk up the mnt_master chain when the first source mount is encountered. Further rewrite the walk up the last_source mnt_master chain so that it is clear what is going on. The reason why the first source mount is special is that it it's mnt_parent is not a mount in the dest_mnt propagation tree, and as such termination conditions based up on the dest_mnt mount propgation tree do not make sense. To avoid other kinds of confusion last_dest is not changed when computing last_source. last_dest is only used once in propagate_one and that is above the point of the code being modified, so changing the global variable is meaningless and confusing. fixes: f2ebb3a9 ("smarter propagate_mnt()") Reported-by: Tycho Andersen <tycho.andersen@canonical.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Maxim Patlasov authored
commit 7ae8fd03 upstream. propagate_one(m) calculates "type" argument for copy_tree() like this: > if (m->mnt_group_id == last_dest->mnt_group_id) { > type = CL_MAKE_SHARED; > } else { > type = CL_SLAVE; > if (IS_MNT_SHARED(m)) > type |= CL_MAKE_SHARED; > } The "type" argument then governs clone_mnt() behavior with respect to flags and mnt_master of new mount. When we iterate through a slave group, it is possible that both current "m" and "last_dest" are not shared (although, both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison above erroneously makes new mount shared and sets its mnt_master to last_source->mnt_master. The patch fixes the problem by handling zero mnt_group_id-s as though they are unequal. The similar problem exists in the implementation of "else" clause above when we have to ascend upward in the master/slave tree by calling: > last_source = last_source->mnt_master; > last_dest = last_source->mnt_parent; proper number of times. The last step is governed by "n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if both are zero. The patch fixes this case in the same way as the former one. [AV: don't open-code an obvious helper...] Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-