- 15 Dec, 2016 20 commits
-
-
Romain Perier authored
commit 68c7f8c1 upstream. No need to copy the template of an hash operation twice into the SRAM from the step function. Fixes: commit 85030c51 ("crypto: marvell - Add support for chai...") Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit d6eb270c upstream. Given dimms and bus commands share the same command number space we need to be careful that we are translating status in the correct context. Otherwise we can, for example, fail an ND_CMD_GET_CONFIG_SIZE command because max_xfer is zero. It fails because that condition erroneously correlates with the 'cleared == 0' failure of ND_CMD_CLEAR_ERROR. Fixes: aef25338 ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit 82aa37cf upstream. If an ARS Status command returns truncated output, do not process partial records or otherwise consume non-status fields. Fixes: 0caeef63 ("libnvdimm: Add a poison list and export badblocks") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit efda1b5d upstream. Given ambiguities in the ACPI 6.1 definition of the "Output (Size)" field of the ARS (Address Range Scrub) Status command, a firmware implementation may in practice return 0, 4, or 8 to indicate that there is no output payload to process. The specification states "Size of Output Buffer in bytes, including this field.". However, 'Output Buffer' is also the name of the entire payload, and earlier in the specification it states "Max Query ARS Status Output Buffer Size: Maximum size of buffer (including the Status and Extended Status fields)". Without this fix if the BIOS happens to return 0 it causes memory corruption as evidenced by this result from the acpi_nfit_ctl() unit test. ars_status00000000: 00020000 00000000 ........ BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff) kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC task: ffff8803332d2ec0 task.stack: ffffc9000174c000 RIP: 0010:[<ffffffff814cfe72>] [<ffffffff814cfe72>] __memcpy+0x12/0x20 RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0 FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0 Stack: ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac 0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000 0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246 Call Trace: [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit] [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test] Fixes: 747ffe11 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vishal Verma authored
commit 9a901f54 upstream. ACPI DSMs can have an 'extended' status which can be non-zero to convey additional information about the command. In the xlat_status routine, where we translate the command statuses, we were returning an error for a non-zero extended status, even if the primary status indicated success. Return from each command's 'case' once we have verified both its status and extend status are good. Fixes: 11294d63 ("nfit: fail DSMs that return non-zero status by default") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra (Intel) authored
commit 7f612a7f upstream. Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM. Both these parts have full_width_write set, and that does indeed have a problem. In order to deal with counter wrap, we must sample the counter at at least half the counter period (see also the sampling theorem) such that we can unambiguously reconstruct the count. However commit: 069e0c3c ("perf/x86/intel: Support full width counting") sets the sampling interval to the full period, not half. Fixing that exposes another issue, in that we must not sign extend the delta value when we shift it right; the counter cannot have decremented after all. With both these issues fixed, counter overflow functions correctly again. Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Tested-by: Liang, Kan <kan.liang@intel.com> Tested-by: Odzioba, Lukasz <lukasz.odzioba@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 069e0c3c ("perf/x86/intel: Support full width counting") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peng Tao authored
commit c4587631 upstream. local_addr.svm_cid is host cid. We should check guest cid instead, which is remote_addr.svm_cid. Otherwise we end up resetting all connections to all guests. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Galbraith authored
commit 83929cce upstream. Michael Kerrisk reported: > Regarding the previous paragraph... My tests indicate > that writing *any* value to the autogroup [nice priority level] > file causes the task group to get a lower priority. Because autogroup didn't call the then meaningless scale_load()... Autogroup nice level adjustment has been broken ever since load resolution was increased for 64-bit kernels. Use scale_load() to scale group weight. Michael Kerrisk tested this patch to fix the problem: > Applied and tested against 4.9-rc6 on an Intel u7 (4 cores). > Test setup: > > Terminal window 1: running 40 CPU burner jobs > Terminal window 2: running 40 CPU burner jobs > Terminal window 1: running 1 CPU burner job > > Demonstrated that: > * Writing "0" to the autogroup file for TW1 now causes no change > to the rate at which the process on the terminal consume CPU. > * Writing -20 to the autogroup file for TW1 caused those processes > to get the lion's share of CPU while TW2 TW3 get a tiny amount. > * Writing -20 to the autogroup files for TW1 and TW3 allowed the > process on TW3 to get as much CPU as it was getting as when > the autogroup nice values for both terminals were 0. Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-man <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauricio Faria de Oliveira authored
commit 2319f847 upstream. The BUG_ON() recently introduced in lpfc_sli_ringtxcmpl_put() is hit in the lpfc_els_abort() > lpfc_sli_issue_abort_iotag() > lpfc_sli_abort_iotag_issue() function path [similar names], due to 'piocb->vport == NULL': BUG_ON(!piocb || !piocb->vport); This happens because lpfc_sli_abort_iotag_issue() doesn't set the 'abtsiocbp->vport' pointer -- but this is not the problem. Previously, lpfc_sli_ringtxcmpl_put() accessed 'piocb->vport' only if 'piocb->iocb.ulpCommand' is neither CMD_ABORT_XRI_CN nor CMD_CLOSE_XRI_CN, which are the only possible values for lpfc_sli_abort_iotag_issue(): lpfc_sli_ringtxcmpl_put(): if ((unlikely(pring->ringno == LPFC_ELS_RING)) && (piocb->iocb.ulpCommand != CMD_ABORT_XRI_CN) && (piocb->iocb.ulpCommand != CMD_CLOSE_XRI_CN) && (!(piocb->vport->load_flag & FC_UNLOADING))) lpfc_sli_abort_iotag_issue(): if (phba->link_state >= LPFC_LINK_UP) iabt->ulpCommand = CMD_ABORT_XRI_CN; else iabt->ulpCommand = CMD_CLOSE_XRI_CN; So, this function path would not have hit this possible NULL pointer dereference before. In order to fix this regression, move the second part of the BUG_ON() check prior to the pointer dereference that it does check for. For reference, this is the stack trace observed. The problem happened because an unsolicited event was received - a PLOGI was received after our PLOGI was issued but not yet complete, so the discovery state machine goes on to sw-abort our PLOGI. kernel BUG at drivers/scsi/lpfc/lpfc_sli.c:1326! Oops: Exception in kernel mode, sig: 5 [#1] <...> NIP [...] lpfc_sli_ringtxcmpl_put+0x1c/0xf0 [lpfc] LR [...] __lpfc_sli_issue_iocb_s4+0x188/0x200 [lpfc] Call Trace: [...] [...] __lpfc_sli_issue_iocb_s4+0xb0/0x200 [lpfc] (unreliable) [...] [...] lpfc_sli_issue_abort_iotag+0x2b4/0x350 [lpfc] [...] [...] lpfc_els_abort+0x1a8/0x4a0 [lpfc] [...] [...] lpfc_rcv_plogi+0x6d4/0x700 [lpfc] [...] [...] lpfc_rcv_plogi_plogi_issue+0xd8/0x1d0 [lpfc] [...] [...] lpfc_disc_state_machine+0xc0/0x2b0 [lpfc] [...] [...] lpfc_els_unsol_buffer+0xcc0/0x26c0 [lpfc] [...] [...] lpfc_els_unsol_event+0xa8/0x220 [lpfc] [...] [...] lpfc_complete_unsol_iocb+0xb8/0x138 [lpfc] [...] [...] lpfc_sli4_handle_received_buffer+0x6a0/0xec0 [lpfc] [...] [...] lpfc_sli_handle_slow_ring_event_s4+0x1c4/0x240 [lpfc] [...] [...] lpfc_sli_handle_slow_ring_event+0x24/0x40 [lpfc] [...] [...] lpfc_do_work+0xd88/0x1970 [lpfc] [...] [...] kthread+0x108/0x130 [...] [...] ret_from_kernel_thread+0x5c/0xbc <...> Fixes: 22466da5 ("lpfc: Fix possible NULL pointer dereference") Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit 325896ff upstream. Hugh notes in response to commit 4cb19355 "device-dax: fail all private mapping attempts": "I think that is more restrictive than you intended: haven't tried, but I believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no way to mmap /dev/dax without write permission to it." Indeed it does restrict read-only mappings, switch to checking VM_MAYSHARE, not VM_SHARED. Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pawel Lebioda <pawel.lebioda@intel.com> Fixes: 4cb19355 ("device-dax: fail all private mapping attempts") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 1be5d4fa upstream. While debugging the rtmutex unlock vs. dequeue race Will suggested to use READ_ONCE() in rt_mutex_owner() as it might race against the cmpxchg_release() in unlock_rt_mutex_safe(). Will: "It's a minor thing which will most likely not matter in practice" Careful search did not unearth an actual problem in todays code, but it's better to be safe than surprised. Suggested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit dbb26055 upstream. David reported a futex/rtmutex state corruption. It's caused by the following problem: CPU0 CPU1 CPU2 l->owner=T1 rt_mutex_lock(l) lock(l->wait_lock) l->owner = T1 | HAS_WAITERS; enqueue(T2) boost() unlock(l->wait_lock) schedule() rt_mutex_lock(l) lock(l->wait_lock) l->owner = T1 | HAS_WAITERS; enqueue(T3) boost() unlock(l->wait_lock) schedule() signal(->T2) signal(->T3) lock(l->wait_lock) dequeue(T2) deboost() unlock(l->wait_lock) lock(l->wait_lock) dequeue(T3) ===> wait list is now empty deboost() unlock(l->wait_lock) lock(l->wait_lock) fixup_rt_mutex_waiters() if (wait_list_empty(l)) { owner = l->owner & ~HAS_WAITERS; l->owner = owner ==> l->owner = T1 } lock(l->wait_lock) rt_mutex_unlock(l) fixup_rt_mutex_waiters() if (wait_list_empty(l)) { owner = l->owner & ~HAS_WAITERS; cmpxchg(l->owner, T1, NULL) ===> Success (l->owner = NULL) l->owner = owner ==> l->owner = T1 } That means the problem is caused by fixup_rt_mutex_waiters() which does the RMW to clear the waiters bit unconditionally when there are no waiters in the rtmutexes rbtree. This can be fatal: A concurrent unlock can release the rtmutex in the fastpath because the waiters bit is not set. If the cmpxchg() gets in the middle of the RMW operation then the previous owner, which just unlocked the rtmutex is set as the owner again when the write takes place after the successfull cmpxchg(). The solution is rather trivial: verify that the owner member of the rtmutex has the waiters bit set before clearing it. This does not require a cmpxchg() or other atomic operations because the waiters bit can only be set and cleared with the rtmutex wait_lock held. It's also safe against the fast path unlock attempt. The unlock attempt via cmpxchg() will either see the bit set and take the slowpath or see the bit cleared and release it atomically in the fastpath. It's remarkable that the test program provided by David triggers on ARM64 and MIPS64 really quick, but it refuses to reproduce on x86-64, while the problem exists there as well. That refusal might explain that this got not discovered earlier despite the bug existing from day one of the rtmutex implementation more than 10 years ago. Thanks to David for meticulously instrumenting the code and providing the information which allowed to decode this subtle problem. Reported-by: David Daney <ddaney@caviumnetworks.com> Tested-by: David Daney <david.daney@cavium.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core") Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sergey Senozhatsky authored
commit 5c7e9ccd upstream. zram hot_add sysfs attribute is a very 'special' attribute - reading from it creates a new uninitialized zram device. This file, by a mistake, can be read by a 'normal' user at the moment, while only root must be able to create a new zram device, therefore hot_add attribute must have S_IRUSR mode, not S_IRUGO. [akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols] Fixes: 6566d1a3 ("zram: add dynamic device add/remove functionality") Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.comSigned-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Steven Allen <steven@stebalien.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Helge Deller authored
commit 24d0492b upstream. At bootup we run measurements to calculate the best threshold for when we should be using full TLB flushes instead of just flushing a specific amount of TLB entries. This performance test is run over the kernel text segment. But running this TLB performance test on the kernel text segment turned out to crash some SMP machines when the kernel text pages were mapped as huge pages. To avoid those crashes this patch simply skips this test on some SMP machines and calculates an optimal threshold based on the maximum number of available TLB entries and number of online CPUs. On a technical side, this seems to happen: The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge instruction seems to work without problems even if the pages were mapped as huge pages. But on SMP systems the TLB purge instruction is broadcasted to other CPUs. Those CPUs then crash the machine because the page size is not as expected. C8000 machines with PA8800/PA8900 CPUs were not affected by this problem, because the required cache coherency prohibits to use huge pages at all. Sadly I didn't found any documentation about this behaviour, so this finding is purely based on testing with phyiscal SMP machines (A500-44 and J5000, both were 2-way boxes). Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John David Anglin authored
commit febe4296 upstream. We have four routines in pacache.S that use temporary alias pages: copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and flush_icache_page_asm(). copy_user_page_asm() and clear_user_page_asm() don't purge the TLB entry used for the operation. flush_dcache_page_asm() and flush_icache_page_asm do purge the entry. Presumably, this was thought to optimize TLB use. However, the operation is quite heavy weight on PA 1.X processors as we need to take the TLB lock and a TLB broadcast is sent to all processors. This patch removes the purges from flush_dcache_page_asm() and flush_icache_page_asm. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John David Anglin authored
commit c78e710c upstream. The attached change interchanges the order of purging the TLB and setting the corresponding page table entry. TLB purges are strongly ordered. It occurred to me one night that setting the PTE first might have subtle ordering issues on SMP machines and cause random memory corruption. A TLB lock guards the insertion of user TLB entries. So after the TLB is purged, a new entry can't be inserted until the lock is released. This ensures that the new PTE value is used when the lock is released. Since making this change, no random segmentation faults have been observed on the Debian hppa buildd servers. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miklos Szeredi authored
commit c01638f5 upstream. Basically, the pjdfstests set the ownership of a file to 06555, and then chowns it (as root) to a new uid/gid. Prior to commit a09f99ed ("fuse: fix killing s[ug]id in setattr"), fuse would send down a setattr with both the uid/gid change and a new mode. Now, it just sends down the uid/gid change. Technically this is NOTABUG, since POSIX doesn't _require_ that we clear these bits for a privileged process, but Linux (wisely) has done that and I think we don't want to change that behavior here. This is caused by the use of should_remove_suid(), which will always return 0 when the process has CAP_FSETID. In fact we really don't need to be calling should_remove_suid() at all, since we've already been indicated that we should remove the suid, we just don't want to use a (very) stale mode for that. This patch should fix the above as well as simplify the logic. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: a09f99ed ("fuse: fix killing s[ug]id in setattr") Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ben Hutchings authored
commit 10c77dba upstream. OPAL is not callable from 32-bit mode and the assembly code for it may not even build (depending on how binutils was configured). References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpcspe&ver=4.8.7-1&stamp=1479203712 Fixes: 656ad58e ("powerpc/boot: Add OPAL console to epapr wrappers") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Benjamin Herrenschmidt authored
commit dd7b2f03 upstream. On 64-bit CPUs with no-execute support and non-snooping icache, such as 970 or POWER4, we have a software mechanism to ensure coherency of the cache (using exec faults when needed). This was broken due to a logic error when the code was rewritten from assembly to C, previously the assembly code did: BEGIN_FTR_SECTION mr r4,r30 mr r5,r7 bl hash_page_do_lazy_icache END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE) Which tests that: (cpu_features & (NOEXECUTE | COHERENT_ICACHE)) == NOEXECUTE Which says that the current cpu does have NOEXECUTE, but does not have COHERENT_ICACHE. Fixes: 91f1da99 ("powerpc/mm: Convert 4k hash insert to C") Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C") Fixes: a43c0eb8 ("powerpc/mm: Convert 4k insert from asm to C") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Change log verbosification] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Donnellan authored
commit 409bf7f8 upstream. In eeh_reset_device(), we take the pci_rescan_remove_lock immediately after after we call eeh_reset_pe() to reset the PCI controller. We then call eeh_clear_pe_frozen_state(), which can return an error. In this case, we bail out of eeh_reset_device() without calling pci_unlock_rescan_remove(). Add a call to pci_unlock_rescan_remove() in the eeh_clear_pe_frozen_state() error path so that we don't cause a deadlock later on. Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Fixes: 78954700 ("powerpc/eeh: Avoid I/O access during PE reset") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 Dec, 2016 20 commits
-
-
Greg Kroah-Hartman authored
-
Tobias Brunner authored
commit a55e2386 upstream. When handling inbound packets, the two halves of the sequence number stored on the skb are already in network order. Fixes: 000ae7b2 ("esp6: Switch to new AEAD interface") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tobias Brunner authored
commit 7c7fedd5 upstream. When handling inbound packets, the two halves of the sequence number stored on the skb are already in network order. Fixes: 7021b2e1 ("esp4: Switch to new AEAD interface") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miroslav Urbanek authored
commit 6b226487 upstream. The threshold for OOM protection is too small for systems with large number of CPUs. Applications report ENOBUFs on connect() every 10 minutes. The problem is that the variable net->xfrm.flow_cache_gc_count is a global counter while the variable fc->high_watermark is a per-CPU constant. Take the number of CPUs into account as well. Fixes: 6ad3122a ("flowcache: Avoid OOM condition under preasure") Reported-by: Lukáš Koldrt <lk@excello.cz> Tested-by: Jan Hejl <jh@excello.cz> Signed-off-by: Miroslav Urbanek <mu@miroslavurbanek.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eli Cooper authored
commit 80d1106a upstream. This reverts commit ae148b08 ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"). skb->protocol is now set in __ip_local_out() and __ip6_local_out() before dst_output() is called. It is no longer necessary to do it for each tunnel. Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eli Cooper authored
commit f4180439 upstream. When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IP before dst_output() is called, fixing a bug where GSO packets sent through a sit tunnel are dropped when xfrm is involved. Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eli Cooper authored
commit b4e479a9 upstream. When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called, fixing a bug where GSO packets sent through an ipip6 tunnel are dropped when xfrm is involved. Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit a0ac402c upstream. In theory we could map other things, but there's a reason that function is called "user_iov". Using anything else (like splice can do) just confuses it. Reported-and-tested-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit b57332b4 upstream. [stable note, need this to prevent build warning in commit a0ac402c] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Larsson authored
[ Upstream commit 07b5ab3f ] Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Tai authored
[ Upstream commit 87a349f9 ] A compile warning is introduced by a commit to fix the find_node(). This patch fix the compile warning by moving find_node() into __init section. Because find_node() is only used by memblock_nid_range() which is only used by a __init add_node_ranges(). find_node() and memblock_nid_range() should also be inside __init section. Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Tai authored
[ Upstream commit 74a5ed5c ] When booting up LDOM, find_node() warns that a physical address doesn't match a NUMA node. WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835 find_node+0xf4/0x120 find_node: A physical address doesn't match a NUMA node rule. Some physical memory will be owned by node 0.Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4 Call Trace: [0000000000468ba0] __warn+0xc0/0xe0 [0000000000468c74] warn_slowpath_fmt+0x34/0x60 [00000000004592f4] find_node+0xf4/0x120 [0000000000dd0774] add_node_ranges+0x38/0xe4 [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4 [0000000000dd0e9c] bootmem_init+0xb8/0x160 [0000000000dd174c] paging_init+0x808/0x8fc [0000000000dcb0d0] setup_arch+0x2c8/0x2f0 [0000000000dc68a0] start_kernel+0x48/0x424 [0000000000dcb374] start_early_boot+0x27c/0x28c [0000000000a32c08] tlb_fixup_done+0x4c/0x64 [0000000000027f08] 0x27f08 It is because linux use an internal structure node_masks[] to keep the best memory latency node only. However, LDOM mdesc can contain single latency-group with multiple memory latency nodes. If the address doesn't match the best latency node within node_masks[], it should check for an alternative via mdesc. The warning message should only be printed if the address doesn't match any node_masks[] nor within mdesc. To minimize the impact of searching mdesc every time, the last matched mask and index is stored in a variable. Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Reviewed-by: Chris Hyser <chris.hyser@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Duyck authored
[ Upstream commit a52ca62c ] It has been reported that update_suffix can be expensive when it is called on a large node in which most of the suffix lengths are the same. The time required to add 200K entries had increased from around 3 seconds to almost 49 seconds. In order to address this we need to move the code for updating the suffix out of resize and instead just have it handled in the cases where we are pushing a node that increases the suffix length, or will decrease the suffix length. Fixes: 5405afd1 ("fib_trie: Add tracking value for suffix length") Reported-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Robert Shearman <rshearma@brocade.com> Tested-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Duyck authored
[ Upstream commit 1a239173 ] It wasn't necessary to pass a leaf in when doing the suffix updates so just drop it. Instead just pass the suffix and work with that. Since we dropped the leaf there is no need to include that in the name so the names are updated to node_push_suffix and node_pull_suffix. Finally I noticed that the logic for pulling the suffix length back actually had some issues. Specifically it would stop prematurely if there was a longer suffix, but it was not as long as the original suffix. I updated the code to address that in node_pull_suffix. Fixes: 5405afd1 ("fib_trie: Add tracking value for suffix length") Suggested-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Robert Shearman <rshearma@brocade.com> Tested-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Duyck authored
[ Upstream commit 3114cdfe ] Fix a small memory leak that can occur where we leak a fib_alias in the event of us not being able to insert it into the local table. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Duyck authored
[ Upstream commit 3b709334, the FIB offload removal didn't occur in 4.8 so that part of this patch isn't here. However we still need to fib_unmerge() bits. ] The patch that removed the FIB offload infrastructure was a bit too aggressive and also removed code needed to clean up us splitting the table if additional rules were added. Specifically the function fib_trie_flush_external was called at the end of a new rule being added to flush the foreign trie entries from the main trie. I updated the code so that we only call fib_trie_flush_external on the main table so that we flush the entries for local from main. This way we don't call it for every rule change which is what was happening previously. Fixes: 347e3b28 ("switchdev: remove FIB offload infrastructure") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
[ Upstream commit 0eab121e ] Prior to commit c0371da6 ("put iov_iter into msghdr") in v3.19, there was no check that the iovec contained enough bytes for an ICMP header, and the read loop would walk across neighboring stack contents. Since the iov_iter conversion, bad arguments are noticed, but the returned error is EFAULT. Returning EINVAL is a clearer error and also solves the problem prior to v3.19. This was found using trinity with KASAN on v3.18: BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0 Read of size 8 by task trinity-c2/9623 page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15 Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT) Call trace: [<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90 [<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171 [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50 [< inline >] print_address_description mm/kasan/report.c:147 [< inline >] kasan_report_error mm/kasan/report.c:236 [<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259 [< inline >] check_memory_region mm/kasan/kasan.c:264 [<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507 [<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15 [< inline >] memcpy_from_msg include/linux/skbuff.h:2667 [<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674 [<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714 [<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749 [< inline >] __sock_sendmsg_nosec net/socket.c:624 [< inline >] __sock_sendmsg net/socket.c:632 [<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643 [< inline >] SYSC_sendto net/socket.c:1797 [<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761 CVE-2016-8399 Reported-by: Qidan He <i@flanker017.me> Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit b98b0bc8 ] CAP_NET_ADMIN users should not be allowed to set negative sk_sndbuf or sk_rcvbuf values, as it can lead to various memory corruptions, crashes, OOM... Note that before commit 82981930 ("net: cleanups in sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF and SO_RCVBUF were vulnerable. This needs to be backported to all known linux kernels. Again, many thanks to syzkaller team for discovering this gem. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sabrina Dubroca authored
[ Upstream commit 5b010147 ] geneve{,6}_build_skb can end up doing a pskb_expand_head(), which makes the ip_hdr(skb) reference we stashed earlier stale. Since it's only needed as an argument to ip_tunnel_ecn_encap(), move this directly in the function call. Fixes: 08399efc ("geneve: ensure ECN info is handled properly in all tx/rx paths") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Kubeček authored
[ Upstream commit 3de81b75 ] Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-