- 13 Sep, 2014 40 commits
-
-
Ben Hutchings authored
-
Michal Simek authored
commit 00708d42 upstream. When building with latest binutils, vmlinux includes some sections which need to be stripped out when building the binary image. Signed-off-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Boris Ostrovsky authored
commit 8762e509 upstream. init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.comReviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andy Lutomirski authored
commit 7209a75d upstream. This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.netSigned-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit 34273f41 upstream. Embedded systems, which may be very memory-size-sensitive, are extremely unlikely to ever encounter any 16-bit software, so make it a CONFIG_EXPERT option to turn off support for any 16-bit software whatsoever. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit 197725de upstream. Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit 20b68535 upstream. Header guard is #ifndef, not #ifdef... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit e1fe9ed8 upstream. Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit 3891a04a upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
H. Peter Anvin authored
commit 7ed6fb9b upstream. This reverts commit fa81511b in preparation of merging in the proper fix (espfix64). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Sam Ravnborg authored
commit cbf1ef6b upstream. In sparc headers we use the following pattern: #if defined(__sparc__) && defined(__arch64__) sparc64 specific stuff #else sparc32 specific stuff #endif In types.h this pattern was not followed and here we only checked for __sparc__ for no good reason. It was a left-over from long time ago. I checked other architectures - and most of them do not have any such checks. And all the recently merged versions uses the asm-generic version. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> [bwh: Guenter backported this to 3.2: - Adjusted filenames, context - There's no duplicate export of types.h to delete] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andi Kleen authored
commit e7b691b0 upstream. slab_node() could access current->mempolicy from interrupt context. However there's a race condition during exit where the mempolicy is first freed and then the pointer zeroed. Using this from interrupts seems bogus anyways. The interrupt will interrupt a random process and therefore get a random mempolicy. Many times, this will be idle's, which noone can change. Just disable this here and always use local for slab from interrupts. I also cleaned up the callers of slab_node a bit which always passed the same argument. I believe the original mempolicy code did that in fact, so it's likely a regression. v2: send version with correct logic v3: simplify. fix typo. Reported-by: Arun Sharma <asharma@fb.com> Cc: penberg@kernel.org Cc: cl@linux.com Signed-off-by: Andi Kleen <ak@linux.intel.com> [tdmackey@twitter.com: Rework control flow based on feedback from cl@linux.com, fix logic, and cleanup current task_struct reference] Acked-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Mackey <tdmackey@twitter.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Andrey Utkin authored
[ Upstream commit 093758e3 ] This commit is a guesswork, but it seems to make sense to drop this break, as otherwise the following line is never executed and becomes dead code. And that following line actually saves the result of local calculation by the pointer given in function argument. So the proposed change makes sense if this code in the whole makes sense (but I am unable to analyze it in the whole). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Sowmini Varadhan authored
[ Upstream commit 4ec1b010 ] The LDC handshake could have been asynchronously triggered after ldc_bind() enables the ldc_rx() receive interrupt-handler (and thus intercepts incoming control packets) and before vio_port_up() calls ldc_connect(). If that is the case, ldc_connect() should return 0 and let the state-machine progress. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Karl Volz <karl.volz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Christopher Alexander Tobias Schulze authored
[ Upstream commit fe418231 ] Fix detection of BREAK on sunsab serial console: BREAK detection was only performed when there were also serial characters received simultaneously. To handle all BREAKs correctly, the check for BREAK and the corresponding call to uart_handle_break() must also be done if count == 0, therefore duplicate this code fragment and pull it out of the loop over the received characters. Patch applies to 3.16-rc6. Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Christopher Alexander Tobias Schulze authored
[ Upstream commit 5cdceab3 ] Fix regression in bbc i2c temperature and fan control on some Sun systems that causes the driver to refuse to load due to the bbc_i2c_bussel resource not being present on the (second) i2c bus where the temperature sensors and fan control are located. (The check for the number of resources was removed when the driver was ported to a pure OF driver in mid 2008.) Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit 4ca9a237 ] Based almost entirely upon a patch by Christopher Alexander Tobias Schulze. In commit db64fe02 ("mm: rewrite vmap layer") lazy VMAP tlb flushing was added to the vmalloc layer. This causes problems on sparc64. Sparc64 has two VMAP mapped regions and they are not contiguous with eachother. First we have the malloc mapping area, then another unrelated region, then the vmalloc region. This "another unrelated region" is where the firmware is mapped. If the lazy TLB flushing logic in the vmalloc code triggers after we've had both a module unload and a vfree or similar, it will pass an address range that goes from somewhere inside the malloc region to somewhere inside the vmalloc region, and thus covering the openfirmware area entirely. The sparc64 kernel learns about openfirmware's dynamic mappings in this region early in the boot, and then services TLB misses in this area. But openfirmware has some locked TLB entries which are not mentioned in those dynamic mappings and we should thus not disturb them. These huge lazy TLB flush ranges causes those openfirmware locked TLB entries to be removed, resulting in all kinds of problems including hard hangs and crashes during reboot/reset. Besides causing problems like this, such huge TLB flush ranges are also incredibly inefficient. A plea has been made with the author of the VMAP lazy TLB flushing code, but for now we'll put a safety guard into our flush_tlb_kernel_range() implementation. Since the implementation has become non-trivial, stop defining it as a macro and instead make it a function in a C source file. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit 18f38132 ] The assumption was that update_mmu_cache() (and the equivalent for PMDs) would only be called when the PTE being installed will be accessible by the user. This is not true for code paths originating from remove_migration_pte(). There are dire consequences for placing a non-valid PTE into the TSB. The TLB miss frramework assumes thatwhen a TSB entry matches we can just load it into the TLB and return from the TLB miss trap. So if a non-valid PTE is in there, we will deadlock taking the TLB miss over and over, never satisfying the miss. Just exit early from update_mmu_cache() and friends in this situation. Based upon a report and patch from Christopher Alexander Tobias Schulze. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit 5aa4ecfd ] This is the prevent previous stores from overlapping the block stores done by the memcpy loop. Based upon a glibc patch by Jose E. Marchesi Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit b18eb2d7 ] Access to the TSB hash tables during TLB misses requires that there be an atomic 128-bit quad load available so that we fetch a matching TAG and DATA field at the same time. On cpus prior to UltraSPARC-III only virtual address based quad loads are available. UltraSPARC-III and later provide physical address based variants which are easier to use. When we only have virtual address based quad loads available this means that we have to lock the TSB into the TLB at a fixed virtual address on each cpu when it runs that process. We can't just access the PAGE_OFFSET based aliased mapping of these TSBs because we cannot take a recursive TLB miss inside of the TLB miss handler without risking running out of hardware trap levels (some trap combinations can be deep, such as those generated by register window spill and fill traps). Without huge pages it's working perfectly fine, but when the huge TSB got added another chunk of fixed virtual address space was not allocated for this second TSB mapping. So we were mapping both the 8K and 4MB TSBs to the same exact virtual address, causing multiple TLB matches which gives undefined behavior. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit e5c460f4 ] This was found using Dave Jone's trinity tool. When a user process which is 32-bit performs a load or a store, the cpu chops off the top 32-bits of the effective address before translating it. This is because we run 32-bit tasks with the PSTATE_AM (address masking) bit set. We can't run the kernel with that bit set, so when the kernel accesses userspace no address masking occurs. Since a 32-bit process will have no mappings in that region we will properly fault, so we don't try to handle this using access_ok(), which can safely just be a NOP on sparc64. Real faults from 32-bit processes should never generate such addresses so a bug check was added long ago, and it barks in the logs if this happens. But it also barks when a kernel user access causes this condition, and that _can_ happen. For example, if a pointer passed into a system call is "0xfffffffc" and the kernel access 4 bytes offset from that pointer. Just handle such faults normally via the exception entries. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit 70ffc6eb ] Make get_user_insn() able to cope with huge PMDs. Next, make do_fault_siginfo() more robust when get_user_insn() can't actually fetch the instruction. In particular, use the MMU announced fault address when that happens, instead of calling compute_effective_address() and computing garbage. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit d037d163 ] If we have a 32-bit task we must chop off the top 32-bits of the 64-bit value just as the cpu would. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kirill Tkhai authored
[ Upstream commit 49b6c01f ] One more place where we must not be able to be preempted or to be interrupted in RT. Always actually disable interrupts during synchronization cycle. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
David S. Miller authored
[ Upstream commit aa3449ee ] Only the second argument, 'op', is signed. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 757efd32 ] Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp] [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp] [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp] [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp] [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp] [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp] [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba6 ("inet: fix possible seqlock deadlocks") and 7f88c6b2 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Sasha Levin authored
[ Upstream commit 06ebb06d ] Check for cases when the caller requests 0 bytes instead of running off and dereferencing potentially invalid iovecs. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Vlad Yasevich authored
[ Upstream commit 081e83a7 ] Macvlan devices do not initialize vlan_features. As a result, any vlan devices configured on top of macvlans perform very poorly. Initialize vlan_features based on the vlan features of the lower-level device. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Borkmann authored
[ Upstream commit 1be9a950 ] Jason reported an oops caused by SCTP on his ARM machine with SCTP authentication enabled: Internal error: Oops: 17 [#1] ARM CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1 task: c6eefa40 ti: c6f52000 task.ti: c6f52000 PC is at sctp_auth_calculate_hmac+0xc4/0x10c LR is at sg_init_table+0x20/0x38 pc : [<c024bb80>] lr : [<c00f32dc>] psr: 40000013 sp : c6f538e8 ip : 00000000 fp : c6f53924 r10: c6f50d80 r9 : 00000000 r8 : 00010000 r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254 r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005397f Table: 06f28000 DAC: 00000015 Process sctp-test (pid: 104, stack limit = 0xc6f521c0) Stack: (0xc6f538e8 to 0xc6f54000) [...] Backtrace: [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8) [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844) [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28) [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220) [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4) [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160) [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74) [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888) While we already had various kind of bugs in that area ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable") and b14878cc ("net: sctp: cache auth_enable per endpoint"), this one is a bit of a different kind. Giving a bit more background on why SCTP authentication is needed can be found in RFC4895: SCTP uses 32-bit verification tags to protect itself against blind attackers. These values are not changed during the lifetime of an SCTP association. Looking at new SCTP extensions, there is the need to have a method of proving that an SCTP chunk(s) was really sent by the original peer that started the association and not by a malicious attacker. To cause this bug, we're triggering an INIT collision between peers; normal SCTP handshake where both sides intent to authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO parameters that are being negotiated among peers: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- RFC4895 says that each endpoint therefore knows its own random number and the peer's random number *after* the association has been established. The local and peer's random number along with the shared key are then part of the secret used for calculating the HMAC in the AUTH chunk. Now, in our scenario, we have 2 threads with 1 non-blocking SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling sctp_bindx(3), listen(2) and connect(2) against each other, thus the handshake looks similar to this, e.g.: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------- -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------> ... Since such collisions can also happen with verification tags, the RFC4895 for AUTH rather vaguely says under section 6.1: In case of INIT collision, the rules governing the handling of this Random Number follow the same pattern as those for the Verification Tag, as explained in Section 5.2.4 of RFC 2960 [5]. Therefore, each endpoint knows its own Random Number and the peer's Random Number after the association has been established. In RFC2960, section 5.2.4, we're eventually hitting Action B: B) In this case, both sides may be attempting to start an association at about the same time but the peer endpoint started its INIT after responding to the local endpoint's INIT. Thus it may have picked a new Verification Tag not being aware of the previous Tag it had sent this endpoint. The endpoint should stay in or enter the ESTABLISHED state but it MUST update its peer's Verification Tag from the State Cookie, stop any init or cookie timers that may running and send a COOKIE ACK. In other words, the handling of the Random parameter is the same as behavior for the Verification Tag as described in Action B of section 5.2.4. Looking at the code, we exactly hit the sctp_sf_do_dupcook_b() case which triggers an SCTP_CMD_UPDATE_ASSOC command to the side effect interpreter, and in fact it properly copies over peer_{random, hmacs, chunks} parameters from the newly created association to update the existing one. Also, the old asoc_shared_key is being released and based on the new params, sctp_auth_asoc_init_active_key() updated. However, the issue observed in this case is that the previous asoc->peer.auth_capable was 0, and has *not* been updated, so that instead of creating a new secret, we're doing an early return from the function sctp_auth_asoc_init_active_key() leaving asoc->asoc_shared_key as NULL. However, we now have to authenticate chunks from the updated chunk list (e.g. COOKIE-ACK). That in fact causes the server side when responding with ... <------------------ AUTH; COOKIE-ACK ----------------- ... to trigger a NULL pointer dereference, since in sctp_packet_transmit(), it discovers that an AUTH chunk is being queued for xmit, and thus it calls sctp_auth_calculate_hmac(). Since the asoc->active_key_id is still inherited from the endpoint, and the same as encoded into the chunk, it uses asoc->asoc_shared_key, which is still NULL, as an asoc_key and dereferences it in ... crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len) ... causing an oops. All this happens because sctp_make_cookie_ack() called with the *new* association has the peer.auth_capable=1 and therefore marks the chunk with auth=1 after checking sctp_auth_send_cid(), but it is *actually* sent later on over the then *updated* association's transport that didn't initialize its shared key due to peer.auth_capable=0. Since control chunks in that case are not sent by the temporary association which are scheduled for deletion, they are issued for xmit via SCTP_CMD_REPLY in the interpreter with the context of the *updated* association. peer.auth_capable was 0 in the updated association (which went from COOKIE_WAIT into ESTABLISHED state), since all previous processing that performed sctp_process_init() was being done on temporary associations, that we eventually throw away each time. The correct fix is to update to the new peer.auth_capable value as well in the collision case via sctp_assoc_update(), so that in case the collision migrated from 0 -> 1, sctp_auth_asoc_init_active_key() can properly recalculate the secret. This therefore fixes the observed server panic. Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Christoph Paasch authored
[ Upstream commit 1f74e613 ] In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Doug Leith <doug.leith@nuim.ie> Fixes: 8d3a564d (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Christoph Paasch authored
[ Upstream commit 45a07695 ] In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing 76f10177 ([TCP]: TCP Veno congestion control) was made by 15913114 (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: 76f10177 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 04ca6973 ] In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu> Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Eric Dumazet authored
[ Upstream commit 73f156a6 ] Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jonas Bonn authored
commit abdf8b5e upstream. Use of EXPORT_SYMBOL requires inclusion of export.h Signed-off-by: Jonas Bonn <jonas@southpole.se> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ralf Baechle authored
commit ff522058 upstream. This fixes the following issue BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/1761 caller is blast_dcache32+0x30/0x254 Call Trace: [<8047f02c>] dump_stack+0x8/0x34 [<802e7e40>] debug_smp_processor_id+0xe0/0xf0 [<80114d94>] blast_dcache32+0x30/0x254 [<80118484>] r4k_dma_cache_wback_inv+0x200/0x288 [<80110ff0>] mips_dma_map_sg+0x108/0x180 [<80355098>] ide_dma_prepare+0xf0/0x1b8 [<8034eaa4>] do_rw_taskfile+0x1e8/0x33c [<8035951c>] ide_do_rw_disk+0x298/0x3e4 [<8034a3c4>] do_ide_request+0x2e0/0x704 [<802bb0dc>] __blk_run_queue+0x44/0x64 [<802be000>] queue_unplugged.isra.36+0x1c/0x54 [<802beb94>] blk_flush_plug_list+0x18c/0x24c [<802bec6c>] blk_finish_plug+0x18/0x48 [<8026554c>] journal_commit_transaction+0x3b8/0x151c [<80269648>] kjournald+0xec/0x238 [<8014ac00>] kthread+0xb8/0xc0 [<8010268c>] ret_from_kernel_thread+0x14/0x1c Caches in most systems are identical - but not always, so we can't avoid the use of smp_call_function() by just looking at the boot CPU's data, have to fiddle with preemption instead. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/5835Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Florian Fainelli authored
commit 6c37c958 upstream. cc1: warnings being treated as errors arch/mips/kernel/perf_event_mipsxx.c:166: error: 'counters_per_cpu_to_total' defined but not used make[2]: *** [arch/mips/kernel/perf_event_mipsxx.o] Error 1 make[2]: *** Waiting for unfinished jobs.... It was first introduced by 82091564 [MIPS: perf: Add support for 64-bit perf counters.] in 3.2. Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: david.daney@cavium.com Patchwork: https://patchwork.linux-mips.org/patch/3357/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Stefan Kristiansson authored
commit 160d8378 upstream. Prevents build issue with updated toolchain Reported-by: Jack Thomasson <jkt@moonlitsw.com> Tested-by: Christian Svensson <blue@cmd.nu> Signed-off-by: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Signed-off-by: Jonas Bonn <jonas@southpole.se> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Johan Hovold authored
commit 5654699f upstream. Make sure to verify the number of ports requested by subdriver to avoid writing beyond the end of fixed-size array in interface data. The current usb-serial implementation is limited to eight ports per interface but failed to verify that the number of ports requested by a subdriver (which could have been determined from device descriptors) did not exceed this limit. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: s/ddev/\&interface->dev/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Johan Hovold authored
commit d979e9f9 upstream. Make sure to verify the maximum number of endpoints per type to avoid writing beyond the end of a stack-allocated array. The current usb-serial implementation is limited to eight ports per interface but failed to verify that the number of endpoints of a certain type reported by a device did not exceed this limit. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mark Rutland authored
commit 2c32c65e upstream. On revisions of Cortex-A15 prior to r3p3, a CLREX instruction at PL1 may falsely trigger a watchpoint exception, leading to potential data aborts during exception return and/or livelock. This patch resolves the issue in the following ways: - Replacing our uses of CLREX with a dummy STREX sequence instead (as we did for v6 CPUs). - Removing the clrex code from v7_exit_coherency_flush and derivatives, since this only exists as a minor performance improvement when non-cached exclusives are in use (Linux doesn't use these). Benchmarking on a variety of ARM cores revealed no measurable performance difference with this change applied, so the change is performed unconditionally and no new Kconfig entry is added. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> [bwh: Backported to 3.2: - Drop inapplicable changes to arch/arm/include/asm/cacheflush.h and arch/arm/mach-exynos/mcpm-exynos.c] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-