- 16 Jul, 2014 40 commits
-
Tony Luck authored
commit 74614de1 upstream. When Linux sees an "action optional" machine check (where h/w has reported an error that is not in the current execution path) we generally do not want to signal a process, since most processes do not have a SIGBUS handler - we'd just prematurely terminate the process for a problem it might never actually see. task_early_kill() decides whether to consider a process - it checks whether this specific process has been marked for early signals with "prctl", or whether the system administrator has requested early signals for all processes using /proc/sys/vm/memory_failure_early_kill. But in the MF_ACTION_REQUIRED case we must not defer: the error is in the execution path of the current thread, so we must send the SIGBUS immediately. Fix by passing a flag argument through collect_procs*() to task_early_kill() so it knows whether we can defer or must take action (see the sketch below). Signed-off-by:
Tony Luck <tony.luck@intel.com> Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chen Gong <gong.chen@linux.jf.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
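A minimal sketch of the resulting check, assuming the mm/memory-failure.c names above (the flag name and plumbing are simplified, not the verbatim upstream diff):

    static int task_early_kill(struct task_struct *tsk, int force_early)
    {
            if (!tsk->mm)
                    return 0;
            if (force_early)
                    return 1;       /* MF_ACTION_REQUIRED: never defer */
            if (tsk->flags & PF_MCE_PROCESS)
                    return !!(tsk->flags & PF_MCE_EARLY);   /* per-task prctl() */
            return sysctl_memory_failure_early_kill;        /* system-wide sysctl */
    }

The collect_procs*() callers would then pass force_early = !!(flags & MF_ACTION_REQUIRED).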
-
Tony Luck authored
commit a70ffcac upstream. When a thread in a multi-threaded application hits a machine check because of an uncorrectable error in memory - we want to send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread. Currently we fail to do that if the active thread is not the primary thread in the process. collect_procs() just finds primary threads, and this test: if ((flags & MF_ACTION_REQUIRED) && t == current) { will see that the thread we found isn't the current thread and so sends a si.si_code = BUS_MCEERR_AO to the primary thread (and nothing to the active thread at this time). We can fix this by checking whether "current" shares the same mm with the process that collect_procs() said owned the page. If so, we send the SIGBUS to current (with code BUS_MCEERR_AR); see the sketch below. Signed-off-by:
Tony Luck <tony.luck@intel.com> Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by:
Otto Bruggeman <otto.g.bruggeman@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chen Gong <gong.chen@linux.jf.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
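A hedged sketch of the adjusted dispatch in the kill path described above (simplified; surrounding code assumed):

    /* t is the primary thread collect_procs() found for the page */
    if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
            si.si_code = BUS_MCEERR_AR;
            ret = force_sig_info(SIGBUS, &si, current); /* the faulting thread */
    } else {
            si.si_code = BUS_MCEERR_AO;         /* deferrable delivery to t */
            ret = send_sig_info(SIGBUS, &si, t);
    }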
-
Mel Gorman authored
commit e58469ba upstream. The test_bit operations in get/set pageblock flags are expensive. This patch reads the bitmap on a word basis and uses shifts and masks to isolate the bits of interest. Similarly, masks are used to set a local copy of the bitmap, and cmpxchg is then used to update the bitmap if there have been no other changes made in parallel (see the sketch below). In a test running dd onto tmpfs the overhead of the pageblock-related functions went from 1.27% in profiles to 0.5%. In addition to the performance benefits, this patch closes races that are possible between: a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype() reads part of the bits before and the other part of the bits after set_pageblock_migratetype() has updated them. b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic read-modify-update set bit operation in set_pageblock_skip() will cause lost updates to some bits changed in set_pageblock_migratetype(). Joonsoo Kim first reported case a) via code inspection. Vlastimil Babka's testing with a debug patch showed that either a) or b) occurs roughly once per mmtests' stress-highalloc benchmark (although not necessarily in the same pageblock). Furthermore, during development of unrelated compaction patches, it was observed that with frequent calls to {start,undo}_isolate_page_range() the race occurs several thousand times and has resulted in NULL pointer dereferences in move_freepages() and free_one_page() in places where free_list[migratetype] is manipulated by e.g. list_move(). Further debugging confirmed that migratetype had the invalid value 6, causing out-of-bounds access to the free_list array. That confirmed that the race exists, although it may be extremely rare, and currently only fatal where page isolation is performed due to memory hot remove. Races on pageblocks being updated by set_pageblock_migratetype(), where both the old and new migratetype are lower than MIGRATE_RESERVE, currently cannot result in an invalid value being observed, although theoretically they may still lead to unexpected creation or destruction of MIGRATE_RESERVE pageblocks. Furthermore, things could get suddenly worse when memory isolation is used more, or when new migratetypes are added. After this patch, the race has no longer been observed in testing. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Reported-by:
Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-and-tested-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [ kamal: backport to 3.13-stable: context ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
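A minimal sketch of the word-based update, assuming a helper like the one the patch adds to mm/page_alloc.c (names and signature simplified):

    static void set_pageblock_flags_word(unsigned long *bitmap,
                                         unsigned long flags,
                                         unsigned long shift,
                                         unsigned long mask)
    {
            unsigned long word, old_word;

            word = ACCESS_ONCE(*bitmap);
            for (;;) {
                    /* build the new word locally, then publish it atomically */
                    old_word = cmpxchg(bitmap, word,
                                       (word & ~(mask << shift)) | (flags << shift));
                    if (old_word == word)
                            break;          /* no parallel change: done */
                    word = old_word;        /* lost a race: retry on new value */
            }
    }

Readers simply load the whole word once and mask out the bits of interest, so they can never observe a half-updated migratetype.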
-
Michal Hocko authored
commit d8dc595c upstream. Eric has reported that he can see task(s) stuck in the memcg OOM handler regularly. The only way out is to echo 0 > $GROUP/memory.oom_control His usecase is:
- Setup a hierarchy with memory and the freezer (disable kernel oom and have a process watch for oom).
- In that memory cgroup add a process with one thread per cpu.
- In one thread slowly allocate once per second I think it is 16M of ram and mlock and dirty it (just to force the pages into ram and stay there).
- When oom is achieved loop:
  * attempt to freeze all of the tasks.
  * if frozen send every task SIGKILL, unfreeze, remove the directory in cgroupfs.
Eric has then pinpointed the issue to be memcg specific. All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled. Those that have received a fatal signal will bypass the charge and should continue on their way out. The tricky part is that the exit path might trigger a page fault (e.g. exit_robust_list), thus the memcg charge, while its memcg is still under OOM because nobody has released any charges yet. Unlike with the in-kernel OOM handler, the exiting task doesn't get TIF_MEMDIE set, so it doesn't shortcut further charges of the killed task and falls into the memcg OOM again with no way out, as there are no fatal signals pending anymore. This patch fixes the issue by checking PF_EXITING early in mem_cgroup_try_charge and bypassing the charge, the same as if the task had a fatal signal pending or TIF_MEMDIE set (see the sketch below). Normally exiting tasks (i.e. not killed) will bypass the charge now, but this should be OK: the task is leaving and will release memory, and increasing the memory pressure just to release it a moment later seems a dubious waste of cycles. Besides that, charges after exit_signals should be rare. I am bringing this patch again (rebased on the current mmotm tree). I hope we can move forward finally. If there is still opposition then I would really appreciate a concurrent approach so that we can discuss alternatives. http://comments.gmane.org/gmane.linux.kernel.stable/77650 is a reference to the followup discussion when the patch was dropped from mmotm last time. Reported-by:
Eric W. Biederman <ebiederm@xmission.com> Signed-off-by:
Michal Hocko <mhocko@suse.cz> Acked-by:
David Rientjes <rientjes@google.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [ kamal: backport to 3.13: whitespace ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
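A minimal sketch of the bypass condition, assuming the 3.13-era __mem_cgroup_try_charge() flow (simplified; the exact surrounding code differs):

    /*
     * Tasks on their way out must not be trapped waiting on a memcg whose
     * OOM handling is disabled: treat PF_EXITING like a pending SIGKILL.
     */
    if (unlikely(test_thread_flag(TIF_MEMDIE) ||
                 fatal_signal_pending(current) ||
                 current->flags & PF_EXITING))      /* the new condition */
            goto bypass;    /* charge forgiven; the task frees memory soon */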
-
Mel Gorman authored
commit 675becce upstream. throttle_direct_reclaim() is meant to trigger during swap-over-network, during which the min watermark is treated as a pfmemalloc reserve. It throttles on the first node in the zonelist, but this is flawed. The user-visible impact is that a process running on a CPU whose local memory node has no ZONE_NORMAL will stall for prolonged periods of time, possibly indefinitely. This is due to throttle_direct_reclaim thinking the pfmemalloc reserves are depleted when in fact they don't exist on that node. On a NUMA machine running a 32-bit kernel (I know) allocation requests from CPUs on node 1 would detect no pfmemalloc reserves and the process gets throttled. This patch adjusts throttling of direct reclaim to throttle based on the first node in the zonelist that has a usable ZONE_NORMAL or lower zone (see the sketch below). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
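A hedged sketch of the zonelist walk, assuming the mm/vmscan.c helpers of that era (simplified; upstream folds this into the pfmemalloc watermark check):

    /* throttle against the first node that can hold pfmemalloc reserves */
    for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                    gfp_zone(gfp_mask), NULL) {
            if (zone_idx(zone) > ZONE_NORMAL)
                    continue;       /* highmem/movable: no usable reserve */

            pgdat = zone->zone_pgdat;
            if (pfmemalloc_watermark_ok(pgdat))
                    goto out;       /* reserves intact: do not throttle */
            break;                  /* this node's reserves are depleted */
    }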
-
Tetsuo Handa authored
commit 8fe6929c upstream. Commit 786235ee ("kthread: make kthread_create() killable") was meant to allow kthread_create() to abort as soon as it is killed by the OOM-killer. But returning -ENOMEM is wrong if killed by SIGKILL from userspace. Change kthread_create() to return -EINTR upon SIGKILL (see the sketch below). Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
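A minimal sketch of the change in kernel/kthread.c (simplified; upstream must also hand the create request's cleanup off to kthreadd):

    /* in kthread_create_on_node(), waiting for kthreadd to spawn the task */
    if (unlikely(wait_for_completion_killable(&done))) {
            /* killed by SIGKILL (user or OOM killer): -EINTR, not -ENOMEM */
            return ERR_PTR(-EINTR);
    }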
-
Naoya Horiguchi authored
commit c177c81e upstream. Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there are bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking whether hugepage migration is supported in vma_migratable() (see the sketch below). Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by:
Michael Ellerman <mpe@ellerman.id.au> Tested-by:
Michael Ellerman <mpe@ellerman.id.au> Acked-by:
Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
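A hedged sketch of the vma_migratable() check; the Kconfig symbol gating per-arch support is an assumption based on the description above:

    static inline int vma_migratable(struct vm_area_struct *vma)
    {
            if (vma->vm_flags & (VM_IO | VM_PFNMAP))
                    return 0;
    #ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
            /* hugepage migration untested on this arch: refuse hugetlb VMAs */
            if (vma->vm_flags & VM_HUGETLB)
                    return 0;
    #endif
            return 1;
    }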
-
Hugh Dickins authored
commit 7f39dda9 upstream. Trinity reports:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:47
in_atomic(): 0, irqs_disabled(): 0, pid: 5787, name: trinity-c27
__might_sleep < down_write < __put_anon_vma < page_get_anon_vma < migrate_pages < compact_zone < compact_zone_order < try_to_compact_pages ..
Right, since the conversion to mutex and then rwsem, we should not put_anon_vma() from inside an rcu_read_lock()ed section: fix the two places that did so (see the sketch below). And add might_sleep() to anon_vma_free(), as suggested by Peter Zijlstra. Fixes: 88c22088 ("mm: optimize page_lock_anon_vma() fast-path") Reported-by:
Dave Jones <davej@redhat.com> Signed-off-by:
Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
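A minimal sketch of the reordering in one affected error path (mm/rmap.c's page_get_anon_vma(), simplified from the description above):

    if (!page_mapped(page)) {
            rcu_read_unlock();          /* leave the RCU section first */
            put_anon_vma(anon_vma);     /* may sleep taking the rwsem  */
            return NULL;
    }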
-
Jérôme Carretero authored
commit d2518365 upstream. This device normally comes with a proprietary driver, using a web GUI to configure RAID: http://www.highpoint-tech.com/USA_new/series_rr600-download.htm But thankfully it also works out of the box with the AHCI driver, being just a Marvell 88SE9235 (see the ID-table sketch below). Devices 640L, 644L, 644LS should also be supported but were not tested here. Signed-off-by:
Jérôme Carretero <cJ-ko@zougloub.eu> Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
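Presumably the change is a PCI ID table entry in drivers/ata/ahci.c; a sketch of the mechanism, with the device ID (0x0642) an assumption based on the "642L" name rather than taken from the diff:

    static const struct pci_device_id ahci_pci_tbl[] = {
            /* ... existing entries ... */
            { PCI_VDEVICE(MARVELL_EXT, 0x0642), board_ahci }, /* RocketRAID 642L */
            { }     /* terminate list */
    };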
-
Alex Deucher authored
commit 7d5ab300 upstream. May fix display issues with non-HDMI displays. Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Alex Deucher authored
commit 64252835 upstream. We need to specify the encoder mode as LVDS for eDP when using the Crtc_Source atom table in order to properly set up the FMT hardware. bug: https://bugs.freedesktop.org/show_bug.cgi?id=73911 Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Alex Deucher authored
commit 3b6d9fd2 upstream. Only DCE5+ asics support DP 1.2. Noticed by ArtForz on IRC. Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Alex Deucher authored
commit af5d3653 upstream. We were checking the ext clock rather than the display clock. Noticed by ArtForz on IRC. Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Alex Deucher authored
commit f2bc5616 upstream. Avoids blank screens on muxed systems when runpm is active. bug: https://bugs.freedesktop.org/show_bug.cgi?id=75917 Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Jukka Taimisto authored
commit 8a96f3cd upstream.
-[0x01 Introduction
We have found a programming error causing a deadlock in the Bluetooth subsystem of the Linux kernel. The problem is caused by a missing release_sock() call when L2CAP connection creation fails due to a full accept queue. The issue can be reproduced with the 3.15-rc5 kernel and is also present in earlier kernels.
-[0x02 Details
The problem occurs when multiple L2CAP connections are created to a PSM which contains a listening socket (like SDP) and left pending, for example, in configuration (the underlying ACL link is not disconnected between connections). When an L2CAP connection request is received and a listening socket is found, the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called. This function locks the 'parent' socket and then checks if the accept queue is full:
1178 lock_sock(parent);
1179
1180 /* Check for backlog size */
1181 if (sk_acceptq_is_full(parent)) {
1182         BT_DBG("backlog full %d", parent->sk_ack_backlog);
1183         return NULL;
1184 }
In case the accept queue is full, NULL is returned, but the 'parent' socket is not released. Thus when the next L2CAP connection request is received, the code blocks on lock_sock() since the parent is still locked. Also note that for connections already established and waiting for configuration to complete, a timeout will occur and l2cap_chan_timeout() (net/bluetooth/l2cap_core.c) will be called. All threads calling this function will also be blocked waiting for the channel mutex, since the thread which is waiting on lock_sock() already holds the channel mutex. We were able to reproduce this by continuously sending an L2CAP connection request followed by a disconnection request containing an invalid CID. This left the created connections pending configuration. After the deadlock occurs it is impossible to kill bluetoothd, btmon will not get any more data etc., requiring a reboot to recover.
-[0x03 Fix
Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL seems to fix the issue (see the sketch below). Signed-off-by:
Jukka Taimisto <jtt@codenomicon.com> Reported-by:
Tommi Mäkilä <tmakila@codenomicon.com> Signed-off-by:
Johan Hedberg <johan.hedberg@intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
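A minimal sketch of the fix, following the code quoted above (kernel source line numbers omitted):

    lock_sock(parent);

    /* Check for backlog size */
    if (sk_acceptq_is_full(parent)) {
            BT_DBG("backlog full %d", parent->sk_ack_backlog);
            release_sock(parent);   /* the missing unlock */
            return NULL;
    }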
-
hujianyang authored
commit 72abc8f4 upstream. I hit the same assert failure as Dolev Raviv reported; in kernel v3.10 it shows like this:
[ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297)
[ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1
[ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24)
[ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28)
[ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs])
[ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs])
[ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8)
[ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544)
[ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398)
[ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8)
[ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238)
[ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c)
[ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188)
[ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs])
[ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs])
[ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418)
[ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530)
[ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54)
[ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184)
[ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac)
[ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0)
[ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40)
After analyzing the code, I found a condition that may cause this failure during correct operation. Thus, I think this assertion is wrong and should be removed. Suppose there are two clean znodes and one dirty znode in the TNC, so the per-filesystem atomic_t @clean_zn_cnt is (2). When commit starts, dirty_znode is set to COW_ZNODE in get_znodes_to_commit() in case of potential ops on this znode. We clear the COW bit and the DIRTY bit in write_index() without @tnc_mutex locked. We don't increase @clean_zn_cnt at this point. As the comments in write_index() show, if another process holds @tnc_mutex and dirties this znode after we cleaned it, @clean_zn_cnt is decreased to (1). We then increase @clean_zn_cnt back to (2) with @tnc_mutex locked in free_obsolete_znodes() to keep it right. If shrink_tnc() runs between the decrease and the increase, it will release the other 2 clean znodes it holds and find @clean_zn_cnt is less than zero (1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will soon correct @clean_zn_cnt and there is no harm to the fs in this case, I think this assertion can be removed. The interleaving:
2 clean znodes and 1 dirty znode, @clean_zn_cnt == 2

Thread A (commit)             Thread B (write or others)     Thread C (shrinker)
->write_index
  ->clear_bit(DIRTY_NODE)
  ->clear_bit(COW_ZNODE)
                              @clean_zn_cnt == 2
                              ->mutex_locked(&tnc_mutex)
                              ->dirty_cow_znode
                                ->!ubifs_zn_cow(znode)
                                ->!test_and_set_bit(DIRTY_NODE)
                                ->atomic_dec(&clean_zn_cnt)
                              ->mutex_unlocked(&tnc_mutex)
                              @clean_zn_cnt == 1
                                                             ->mutex_locked(&tnc_mutex)
                                                             ->shrink_tnc
                                                               ->destroy_tnc_subtree
                                                               ->atomic_sub(&clean_zn_cnt, 2)
                                                               ->ubifs_assert   <- hit
                                                             ->mutex_unlocked(&tnc_mutex)
                                                             @clean_zn_cnt == -1
->mutex_lock(&tnc_mutex)
->free_obsolete_znodes
  ->atomic_inc(&clean_zn_cnt)
->mutex_unlock(&tnc_mutex)
@clean_zn_cnt == 0 (correct after shrink)
Signed-off-by:
hujianyang <hujianyang@huawei.com> Signed-off-by:
Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Peter Ujfalusi authored
commit e6c111fa upstream. For some unknown reason the parameters for snd_soc_test_bits() were in the wrong order. It was:
snd_soc_test_bits(codec, val, mask, reg); /* WRONG!!! */
while it should be:
snd_soc_test_bits(codec, reg, mask, val);
Signed-off-by:
Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by:
Mark Brown <broonie@linaro.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Christoph Hellwig authored
commit 12337901 upstream. Note nobody's ever noticed because the typical client probably never requests FILES_AVAIL without also requesting something else on the list. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Ming Lei authored
commit e8edca6f upstream. Firstly, it isn't necessary to hold vblk->vq_lock when notifying the hypervisor about queued I/O. Secondly, virtqueue_notify() causes a world switch and may take a long time on some hypervisors (such as qemu-arm), so it isn't good to hold the lock and block other vCPUs (see the sketch below). On an arm64 quad core VM (qemu-kvm), the patch increases I/O performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
- without the patch: 14K IOPS
- with the patch: 34K IOPS
fio script:
[global]
direct=1
bsrange=4k-4k
timeout=10
numjobs=4
ioengine=libaio
iodepth=64
filename=/dev/vdc
group_reporting=1
[f1]
rw=randread
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by:
Ming Lei <ming.lei@canonical.com> Acked-by:
Rusty Russell <rusty@rustcorp.com.au> Signed-off-by:
Jens Axboe <axboe@fb.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
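A hedged sketch of the reordering in virtio_blk's request path (drivers/block/virtio_blk.c, simplified):

    spin_lock_irqsave(&vblk->vq_lock, flags);
    /* ... add the request to the virtqueue ... */
    notify = virtqueue_kick_prepare(vblk->vq);
    spin_unlock_irqrestore(&vblk->vq_lock, flags);

    if (notify)
            virtqueue_notify(vblk->vq);     /* world switch, now unlocked */

virtqueue_kick_prepare() only inspects the ring under the lock; the expensive hypervisor exit in virtqueue_notify() no longer blocks other vCPUs queueing I/O.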
-
James Hogan authored
commit 7006e2df upstream. Each MIPS KVM guest has its own copy of the KVM exception vector. This contains the TLB refill exception handler at offset 0x000, the general exception handler at offset 0x180, and interrupt exception handlers at offset 0x200 in case Cause_IV=1. A common handler is copied to offset 0x2000, and offset 0x3000 is used for temporarily storing k1 during entry from the guest. However, the amount of memory allocated for this purpose is calculated as 0x200 rounded up to the next page boundary, which is insufficient if 4KB pages are in use. This can lead to the common handler at offset 0x2000 being overwritten and infinitely recursive exceptions on the next exit from the guest. Increase the minimum size from 0x200 to 0x4000 to cover the full use of the page; the layout is summarized below. Signed-off-by:
James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
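For reference, the layout described above as a comment block (derived entirely from the commit message):

    /*
     * Per-guest exception vector layout:
     *   0x0000  TLB refill exception handler
     *   0x0180  general exception handler
     *   0x0200  interrupt exception handlers (when Cause_IV=1)
     *   0x2000  copy of the common handler
     *   0x3000  scratch slot for k1 on guest entry
     *
     * So the allocation must cover at least 0x4000 bytes; rounding 0x200
     * up to a 4KB page boundary only covers the first 0x1000.
     */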
-
Yann Droneaud authored
commit b6f04d3d upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at the end of the structures, while it is not required on i386. So for most ABIs struct c4iw_create_cq_resp gets implicitly padded to be aligned on an 8-byte multiple, while for i386 such padding is not added. The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
         --nested_anon_include \
         --recursive \
         --class_name c4iw_create_cq_resp \
         drivers/infiniband/hw/cxgb4/iw_cxgb4.o
Then, the structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100
--- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
@@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
        __u32 size;     /* 28 4 */
        __u32 qid_mask; /* 32 4 */
-       /* size: 36, cachelines: 1, members: 6 */
-       /* last cacheline: 36 bytes */
+       /* size: 40, cachelines: 1, members: 6 */
+       /* padding: 4 */
+       /* last cacheline: 40 bytes */
 };
This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary checks are implemented, the x86_64 kernel will refuse to write past the i386 userspace-provided buffer and the uverb will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. This patch adds explicit padding at the end of struct c4iw_create_cq_resp and, like 92b0ca7c ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq() not write this padding field to userspace (see the sketch below). This way, the x86_64 kernel will be able to write struct c4iw_create_cq_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: cfdda9d7 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Fixes: e24a72a3 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()") Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Yann Droneaud <ydroneaud@opteya.com> Acked-by:
Steve Wise <swise@opengridcomputing.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
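A sketch of the padded structure; the field names before size/qid_mask are inferred from the pahole offsets above (8+8+8+4+4+4 = 36 bytes, 6 members), so treat them as assumptions:

    struct c4iw_create_cq_resp {
            __u64 key;          /* offset  0 (inferred) */
            __u64 gts_key;      /* offset  8 (inferred) */
            __u64 memsize;      /* offset 16 (inferred) */
            __u32 cqid;         /* offset 24 (inferred) */
            __u32 size;         /* offset 28, per pahole */
            __u32 qid_mask;     /* offset 32, per pahole */
            __u32 reserved;     /* new: explicit tail padding; c4iw_create_cq()
                                   must not copy this field to userspace */
    };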
-
Bart Van Assche authored
commit 8ec0a0e6 upstream. Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL or if nonseekable_open() fails. In ib_umad_sm_open(), if nonseekable_open() fails, avoid leaking a kref count, leaving sm_sem held, and leaving the IB_PORT_SM capability mask set. Since container_of() never returns NULL, remove the code that tests whether container_of() returns NULL. Moving the kref_get() call from the start of ib_umad_*open() to the end is safe, since it is the responsibility of the caller of these functions to ensure that the cdev pointer remains valid until at least when these functions return. Signed-off-by:
Bart Van Assche <bvanassche@acm.org> [ydroneaud@opteya.com: rework a bit to reduce the amount of code changed] Signed-off-by:
Yann Droneaud <ydroneaud@opteya.com> [ nonseekable_open() can't actually fail, but.... - Roland ] Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Aleksander Morgado authored
commit 0ce5fb58 upstream. A set of new VID/PIDs retrieved from the out-of-tree GobiNet/GobiSerial Sierra Wireless drivers. Signed-off-by:
Aleksander Morgado <aleksander@aleksander.es> Link: http://marc.info/?l=linux-usb&m=140136310027293&w=2 Signed-off-by:
Johan Hovold <jhovold@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Aleksander: backport to 3.13-stable ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Aleksander Morgado authored
commit ff1fcd50 upstream. Signed-off-by:
Aleksander Morgado <aleksander@aleksander.es> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Aleksander: backport to 3.13-stable ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Radim Krčmář authored
commit a100d88d upstream. We try to free two pages when only one has been allocated. Cleanup path is unlikely, so I haven't found any trace that would fit, but I hope that free_pages_prepare() does catch it. Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Reviewed-by:
Amos Kong <akong@redhat.com> Acked-by:
Jason Wang <jasowang@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Thomas Petazzoni authored
commit b7e46062 upstream. The pxa3xx_nand driver currently uses __raw_writel() and __raw_readl() to access I/O registers. However, those functions do not do any endianness swapping, which means that they won't work when the CPU runs big-endian but the I/O registers are little-endian, which is the common situation for ARM systems running big-endian. Since __raw_writel() and __raw_readl() do not include any memory barriers and the pxa3xx_nand driver can only be compiled for ARM platforms, the closest I/O accessor functions that do endianness swapping are writel_relaxed() and readl_relaxed() (see the sketch below). This patch has been verified to work on Armada XP GP: without the patch, the NAND is not detected when the kernel runs big-endian while it is properly detected when the kernel runs little-endian. With the patch applied, the NAND is properly detected in both situations (little and big endian). Signed-off-by:
Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by:
Brian Norris <computersforpeace@gmail.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
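A representative before/after for one register access (drivers/mtd/nand/pxa3xx_nand.c; the NDCR register shown is an illustrative pick, the patch converts every accessor):

    /* before: no byte swapping, breaks when the CPU is big-endian */
    ndcr = __raw_readl(info->mmio_base + NDCR);
    __raw_writel(ndcr | NDCR_ND_RUN, info->mmio_base + NDCR);

    /* after: the _relaxed accessors swap to little-endian register order
     * and, like the __raw_* variants, add no memory barriers */
    ndcr = readl_relaxed(info->mmio_base + NDCR);
    writel_relaxed(ndcr | NDCR_ND_RUN, info->mmio_base + NDCR);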
-
Arik Nemtsov authored
commit 923eaf36 upstream. Doing so will lead to an oops for a p2p-dev interface, since it has no netdev. Signed-off-by:
Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by:
Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Felix Fietkau authored
commit 53d04525 upstream. If the rate control algorithm uses a selection table, it is leaked when the station is destroyed - fix that (see the sketch below). Signed-off-by:
Felix Fietkau <nbd@openwrt.org> Reported-by:
Christophe Prévotaux <cprevotaux@nltinc.com> Fixes: 0d528d85 ("mac80211: improve the rate control API") [add commit log entry, remove pointless NULL check] Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
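A hedged sketch of the fix, assuming the leak is plugged in the station teardown path (net/mac80211/sta_info.c, simplified):

    static void __sta_info_free(struct ieee80211_local *local,
                                struct sta_info *sta)
    {
            /* ... existing teardown ... */

            /* free the rate-selection table rate control may have attached */
            kfree(rcu_access_pointer(sta->sta.rates));

            kfree(sta);
    }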
-
Christian Borntraeger authored
commit 993072ee upstream. The IRB might be 96 bytes if the extended-I/O-measurement facility is used. This feature is currently not used by Linux, but struct irb already has the emw defined. So let's make the irb in lowcore match the size of the internal data structure to be future proof. We also have to add a pad to correctly align the paste. The bigger irb field also circumvents a bug in some QEMU versions that always write the emw field on test subchannel and therefore destroy the paste definitions of this CPU. Running under these QEMU versions broke some timing functions in the VDSO and all users of these functions, e.g. some JREs. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Huang Rui authored
commit e4d58f5d upstream. TEST 12 and TEST 24 unlink the URB write request N times. When host and gadget both initialize a pattern 1 (mod 63) data series to transfer, the gadget side will complain about wrong data, which is not expected. This is because on the host side usbtest doesn't fill the data buffer as mod 63; this patch fixes it (see the sketch below).
[20285.488974] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
[20285.489181] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Not Active
[20285.489423] dwc3 dwc3.0.auto: ep1out-bulk: req ffff8800aa6cb480 dma aeb50800 length 512 last
[20285.489727] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Start Transfer' params 00000000 a9eaf000 00000000
[20285.490055] dwc3 dwc3.0.auto: Command Complete --> 0
[20285.490281] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
[20285.490492] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Active
[20285.490713] dwc3 dwc3.0.auto: ep1out-bulk: endpoint busy
[20285.490909] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Complete
[20285.491117] dwc3 dwc3.0.auto: request ffff8800aa6cb480 from ep1out-bulk completed 512/512 ===> 0
[20285.491431] zero gadget: bad OUT byte, buf[1] = 0
[20285.491605] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Set Stall' params 00000000 00000000 00000000
[20285.491915] dwc3 dwc3.0.auto: Command Complete --> 0
[20285.492099] dwc3 dwc3.0.auto: queing request ffff8800aa6cb480 to ep1out-bulk length 512
[20285.492387] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
[20285.492595] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Not Active
[20285.492830] dwc3 dwc3.0.auto: ep1out-bulk: req ffff8800aa6cb480 dma aeb51000 length 512 last
[20285.493135] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Start Transfer' params 00000000 a9eaf000 00000000
[20285.493465] dwc3 dwc3.0.auto: Command Complete --> 0
Signed-off-by:
Huang Rui <ray.huang@amd.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
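A hedged sketch of the host-side fill, assuming drivers/usb/misc/usbtest.c's pattern convention (pattern 1 = byte offset mod 63; the helper name here is illustrative):

    static void fill_mod63_buf(struct urb *urb)
    {
            unsigned i;
            u8 *buf = urb->transfer_buffer;

            /* pattern 1: each byte carries its offset modulo 63, so the
             * gadget can verify data integrity across unlinked URBs */
            for (i = 0; i < urb->transfer_buffer_length; i++)
                    buf[i] = i % 63;
    }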
-
Tomas Winkler authored
commit c40765d9 upstream. According to the spec, the host should read H_CSR again after asserting reset H_RST to ensure that the reset was read by the firmware (see the sketch below). Signed-off-by:
Tomas Winkler <tomas.winkler@intel.com> Signed-off-by:
Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ kamal: backport to 3.13-stable: also includes mei_me_hw_reset() change from 33ec0826 mei: revamp mei reset state machine ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
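A minimal sketch of the read-back, assuming the mei_me register helpers of that era (drivers/misc/mei/hw-me.c, simplified):

    hcsr = mei_hcsr_read(dev);
    mei_hcsr_set(dev, hcsr | H_RST);

    /* read H_CSR back so the firmware is known to have seen the reset */
    hcsr = mei_hcsr_read(dev);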
-
Tomas Winkler authored
commit 07cd7be3 upstream. It may take time until ME_RDY is cleared after the reset, so we cannot check the bit before we get the interrupt. Signed-off-by:
Tomas Winkler <tomas.winkler@intel.com> Signed-off-by:
Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Tomas Winkler authored
commit b04ada92 upstream. We cleared H_RST in H_CSR on a spurious interrupt generated when ME_RDY was cleared, and not while ME_RDY is set. The spurious interrupt is not delivered on all platforms; in this case the driver may fail to initialize. Signed-off-by:
Tomas Winkler <tomas.winkler@intel.com> Signed-off-by:
Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ kamal: backport to 3.13-stable: context ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Dennis Dalessandro authored
commit 7e6d3e5c upstream. This patch addresses an issue where the legacy diagpacket is sent in from the user, but the driver operates only on the extended diagpkt. This patch specifically initializes the extended diagpkt based on the legacy packet. Reported-by:
Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mike Marciniszyn authored
commit 911eccd2 upstream. The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE. As of the dual-port qib QDR card, this is not necessarily correct. Change it to use the port as specified in the call (see the sketch below). Reported-by:
Alex Estrin <alex.estrin@intel.com> Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
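A minimal sketch of the change (drivers/infiniband/hw/qib, simplified; the device pointer path is assumed):

    struct ib_event event;

    event.device = &dd->verbs_dev.ibdev;    /* field path assumed */
    event.event = IB_EVENT_PKEY_CHANGE;
    event.element.port_num = port;          /* was a hard-coded 1: wrong on
                                               port 2 of dual-port QDR cards */
    ib_dispatch_event(&event);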
-
Yann Droneaud authored
commit 43bc8893 upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at the end of the structures, while it is not required on i386. So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be aligned on an 8-byte multiple, while for i386 such padding is not added. The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
         --nested_anon_include \
         --recursive \
         --class_name mlx5_ib_create_srq \
         drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, the structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
--- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
@@ -69,7 +68,6 @@ struct mlx5_ib_create_srq {
        __u64 db_addr;  /* 8 8 */
        __u32 flags;    /* 16 4 */
-       /* size: 20, cachelines: 1, members: 3 */
-       /* last cacheline: 20 bytes */
+       /* size: 24, cachelines: 1, members: 3 */
+       /* padding: 4 */
+       /* last cacheline: 24 bytes */
 };
This ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary checks are implemented, the x86_64 kernel will refuse to read past the i386 userspace-provided buffer and the uverb will fail. Anyway, if the structure lies in memory on a page boundary and the next page is not mapped, ib_copy_from_udata() will fail and the uverb will fail. This patch makes create_srq_user() take care of the input data size to handle the case where no padding was provided (see the sketch below). This way, the x86_64 kernel will be able to handle struct mlx5_ib_create_srq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by:
Yann Droneaud <ydroneaud@opteya.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
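A hedged sketch of the size-aware copy in create_srq_user() (drivers/infiniband/hw/mlx5/srq.c, simplified):

    struct mlx5_ib_create_srq ucmd = {};

    /* copy only what userspace actually sent: an unpatched i386 libmlx5
     * legitimately provides 4 bytes less than sizeof(ucmd) */
    if (ib_copy_from_udata(&ucmd, udata,
                           min(udata->inlen, sizeof(ucmd))))
            return -EFAULT;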
-
Yann Droneaud authored
commit a8237b32 upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at the end of the structures, while it is not required on i386. So for most ABIs struct mlx5_ib_create_cq gets implicitly padded to be aligned on an 8-byte multiple, while for i386 such padding is not added. The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
         --nested_anon_include \
         --recursive \
         --class_name mlx5_ib_create_cq \
         drivers/infiniband/hw/mlx5/mlx5_ib.o
Then, the structure layout can be compared between i386 and x86_64:
+++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100
--- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100
@@ -34,9 +34,8 @@ struct mlx5_ib_create_cq {
        __u64 db_addr;  /* 8 8 */
        __u32 cqe_size; /* 16 4 */
-       /* size: 20, cachelines: 1, members: 3 */
-       /* last cacheline: 20 bytes */
+       /* size: 24, cachelines: 1, members: 3 */
+       /* padding: 4 */
+       /* last cacheline: 24 bytes */
 };
This ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary checks are implemented, an x86_64 kernel will refuse to read past the i386 userspace-provided buffer and the uverb will fail. Anyway, if the structure lies in memory on a page boundary and the next page is not mapped, ib_copy_from_udata() will fail when trying to read the 4 bytes of padding and the uverb will fail. This patch makes create_cq_user() take care of the input data size to handle the case where no padding is provided, following the same min(inlen, ...) approach sketched for the SRQ case above. This way, the x86_64 kernel will be able to handle struct mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by:
Yann Droneaud <ydroneaud@opteya.com> Signed-off-by:
Roland Dreier <roland@purestorage.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Maurizio Lombardi authored
commit b5b60778 upstream. The variable "size" is expressed as a number of blocks and not as a number of clusters; this could trigger a kernel panic when using ext4 with a cluster size different from the block size. Signed-off-by:
Maurizio Lombardi <mlombard@redhat.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Jan Kara authored
commit eeece469 upstream. The tail of a page straddling the inode size must be zeroed when being written out, due to the POSIX requirement that modifications of an mmaped page beyond the inode size must not be written to the file. ext4_bio_write_page() did this only for blocks fully beyond the inode size, but didn't properly zero blocks partially beyond the inode size; fix this (see the sketch below). The problem was uncovered by the mmap_11-4 test in the openposix test suite (part of LTP). Reported-by:
Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Fixes: 5a0dc736 Fixes: bd2d0210 Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
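A hedged sketch of the zeroing in ext4_bio_write_page() (fs/ext4/page-io.c, simplified; 3.13 still used the PAGE_CACHE_* macros):

    /* len is how many bytes of this page lie within i_size */
    if (page->index == size >> PAGE_CACHE_SHIFT)
            len = size & ~PAGE_CACHE_MASK;
    else
            len = PAGE_CACHE_SIZE;

    /*
     * Zero the page tail past EOF so a block straddling i_size is written
     * without stray bytes from mmap stores beyond EOF. Previously only
     * blocks lying entirely past EOF were handled.
     */
    if (len < PAGE_CACHE_SIZE)
            zero_user_segment(page, len, PAGE_CACHE_SIZE);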
-
Andreas Schrägle authored
commit 754a292f upstream. Add support for Marvell Technology Group Ltd. 88SE91A0 SATA 6Gb/s Controller by adding its PCI ID. Signed-off-by:
Andreas Schrägle <ajs124.ajs124@gmail.com> Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-