- 23 Jul, 2016 1 commit
-
-
Andy Lutomirski authored
Valdis Kletnieks bisected a boot failure back to this recent commit: 360cb4d1 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated") I broke the case where a PUD table got allocated -- populate_pud() would wander off a pgd_none entry and get lost. I'm not sure how this survived my testing. Fix the original issue in a much simpler way. The problem was that, if we allocated a PUD table, failed to populate it, and freed it, another CPU could potentially keep using the PGD entry we installed (either by copying it via vmalloc_fault or by speculatively caching it). There's a straightforward fix: simply leave the top-level entry in place if this happens. This can't waste any significant amount of memory -- there are at most 256 entries like this systemwide and, as a practical matter, if we hit this failure path repeatedly, we're likely to reuse the same page anyway. For context, this is a reversion with this hunk added in: if (ret < 0) { + /* + * Leave the PUD page in place in case some other CPU or thread + * already found it, but remove any useless entries we just + * added to it. + */ - unmap_pgd_range(cpa->pgd, addr, + unmap_pud_range(pgd_entry, addr, addr + (cpa->numpages << PAGE_SHIFT)); return ret; } This effectively open-codes what the now-deleted unmap_pgd_range() function used to do except that unmap_pgd_range() used to try to free the page as well. Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Mike Krinkin <krinkin.m.u@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 15 Jul, 2016 14 commits
-
-
H.J. Lu authored
Don't use the same syscall numbers for 2 different syscalls: 534 x32 preadv compat_sys_preadv64 535 x32 pwritev compat_sys_pwritev64 534 x32 preadv2 compat_sys_preadv2 535 x32 pwritev2 compat_sys_pwritev2 Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset is passed in one 64-bit register on x32, similar to compat_sys_preadv64() and compat_sys_pwritev64(). Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
It's statically initialized to zero -- no need to dynamically initialize it to zero as well. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6cf6314dce3051371a913ee19d1b88e29c68c560.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
It serves no purpose -- raw_smp_processor_id() works fine. This change will be needed to move thread_info off the stack. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
struct thread_info is a legacy mess. To prepare for its partial removal, move thread_info::addr_limit out. As an added benefit, this way is simpler. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Rename it to match the thread_struct::uaccess_err pattern and also because it was too long. Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
struct thread_info is a legacy mess. To prepare for its partial removal, move the uaccess control fields out -- they're straightforward. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
If we call do_exit() with a clean stack, we greatly reduce the risk of recursive oopses due to stack overflow in do_exit, and we allow do_exit to work even if we OOPS from an IST stack. The latter gives us a much better chance of surviving long enough after we detect a stack overflow to write out our logs. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/32f73ceb372ec61889598da5e5b145889b9f2e19.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
If we get a vmalloc fault while current->active_mm->pgd doesn't match CR3, we'll crash without this change. I've seen this failure mode on heavily instrumented kernels with virtually mapped stacks. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4650d7674185f165ed8fdf9ac4c5c35c5c179ba8.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
If we overflow the stack into a guard page, we'll recursively fault when trying to dump the contents of the guard page. Use probe_kernel_address() so we can recover if this happens. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e626d47a55d7b04dcb1b4d33faa95e8505b217c8.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
If we overflow the stack, print_context_stack() will abort. Detect this case and rewind back into the valid part of the stack so that we can trace it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ee1690eb2715ccc5dc187fde94effa4ca0ccbbcd.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in init_mm.pgd were to be cleared, callers would need to ensure that the pgd entry hadn't been propagated to any other pgd. Its only caller was efi_cleanup_page_tables(), and that, in turn, was unused, so just delete both functions. This leaves a couple of other helpers unused, so delete them, too. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Andy Lutomirski authored
This avoids pointless races in which another CPU or task might see a partially populated global PGD entry. These races should normally be harmless, but, if another CPU propagates the entry via vmalloc_fault() and then populate_pgd() fails (due to memory allocation failure, for example), this prevents a use-after-free of the PGD entry. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bf99df27eac6835f687005364bd1fbd89130946c.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
So when memory hotplug removes a piece of physical memory from pagetable mappings, it also frees the underlying PGD entry. This complicates PGD management, so don't do this. We can keep the PGD mapped and the PUD table all clear - it's only a single 4K page per 512 GB of memory hotplugged. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/064ff6c7275734537f969e876f6cd0baa954d2cc.1468527351.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Conflicts: tools/testing/selftests/x86/Makefile Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 13 Jul, 2016 4 commits
-
-
Dave Hansen authored
The page table manipulation code seems to have grown a couple of sites that are looking for empty PTEs. Just in case one of these entries got a stray bit set, use pte_none() instead of checking for a zero pte_val(). The use pte_same() makes me a bit nervous. If we were doing a pte_same() check against two cleared entries and one of them had a stray bit set, it might fail the pte_same() check. But, I don't think we ever _do_ pte_same() for cleared entries. It is almost entirely used for checking for races in fault-in paths. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dave.hansen@intel.com Cc: linux-mm@kvack.org Cc: mhocko@suse.com Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Dave Hansen authored
The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights Landing) has an erratum where a processor thread setting the Accessed or Dirty bits may not do so atomically against its checks for the Present bit. This may cause a thread (which is about to page fault) to set A and/or D, even though the Present bit had already been atomically cleared. These bits are truly "stray". In the case of the Dirty bit, the thread associated with the stray set was *not* allowed to write to the page. This means that we do not have to launder the bit(s); we can simply ignore them. If the PTE is used for storing a swap index or a NUMA migration index, the A bit could be misinterpreted as part of the swap type. The stray bits being set cause a software-cleared PTE to be interpreted as a swap entry. In some cases (like when the swap index ends up being for a non-existent swapfile), the kernel detects the stray value and WARN()s about it, but there is no guarantee that the kernel can always detect it. When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able to move the swap PTE format around to avoid these troublesome bits. But, 32-bit non-PAE is tight on bits. So, disallow it from running on this hardware. I can't imagine anyone wanting to run 32-bit non-highmem kernels on this hardware, but disallowing them from running entirely is surely the safe thing to do. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dave.hansen@intel.com Cc: linux-mm@kvack.org Cc: mhocko@suse.com Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Dave Hansen authored
The erratum we are fixing here can lead to stray setting of the A and D bits. That means that a pte that we cleared might suddenly have A/D set. So, stop considering those bits when determining if a pte is pte_none(). The same goes for the other pmd_none() and pud_none(). pgd_none() can be skipped because it is not affected; we do not use PGD entries for anything other than pagetables on affected configurations. This adds a tiny amount of overhead to all pte_none() checks. I doubt we'll be able to measure it anywhere. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dave.hansen@intel.com Cc: linux-mm@kvack.org Cc: mhocko@suse.com Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Dave Hansen authored
This erratum can result in Accessed/Dirty getting set by the hardware when we do not expect them to be (on !Present PTEs). Instead of trying to fix them up after this happens, we just allow the bits to get set and try to ignore them. We do this by shifting the layout of the bits we use for swap offset/type in our 64-bit PTEs. It looks like this: bitnrs: | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0| names: | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| before: | OFFSET (9-63) |0|X|X| TYPE(1-5) |0| after: | OFFSET (14-63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0| Note that D was already a don't care (X) even before. We just move TYPE up and turn its old spot (which could be hit by the A bit) into all don't cares. We take 5 bits away from the offset, but that still leaves us with 50 bits which lets us index into a 62-bit swapfile (4 EiB). I think that's probably fine for the moment. We could theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it doesn't gain us anything. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dave.hansen@intel.com Cc: linux-mm@kvack.org Cc: mhocko@suse.com Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 10 Jul, 2016 2 commits
-
-
Paolo Bonzini authored
This matches what is already done for prepare_exit_to_usermode(), and saves about 60 clock cycles (4% speedup) with the benchmark in the previous commit message. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1466434712-31440-3-git-send-email-pbonzini@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Paolo Bonzini authored
Thanks to all the work that was done by Andy Lutomirski and others, enter_from_user_mode() and prepare_exit_to_usermode() are now called only with interrupts disabled. Let's provide them a version of user_enter()/user_exit() that skips saving and restoring the interrupt flag. On an AMD-based machine I tested this patch on, with force-enabled context tracking, the speed-up in system calls was 90 clock cycles or 6%, measured with the following simple benchmark: #include <sys/signal.h> #include <time.h> #include <unistd.h> #include <stdio.h> unsigned long rdtsc() { unsigned long result; asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n" "or %%rdx, %%rax" : "=a" (result) : : "rdx"); return result; } int main() { unsigned long tsc1, tsc2; int pid = getpid(); int i; tsc1 = rdtsc(); for (i = 0; i < 100000000; i++) kill(pid, SIGWINCH); tsc2 = rdtsc(); printf("%ld\n", tsc2 - tsc1); } Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
- 09 Jul, 2016 2 commits
-
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Three fixes. One is the qla24xx MSI regression, one is a theoretical problem over blacklist matching, which would bite USB badly if it ever triggered and one is a system hang with a particular type of IPR device" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: qla2xxx: Fix NULL pointer deref in QLA interrupt SCSI: fix new bug in scsi_dev_info_list string matching ipr: Clear interrupt on croc/crocodile when running with LSI
-
- 08 Jul, 2016 17 commits
-
-
Linus Torvalds authored
Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull eCryptfs fixes from Tyler Hicks: "Provide a more concise fix for CVE-2016-1583: - Additionally fixes linux-stable regressions caused by the cherry-picking of the original fix Some very minor changes that have queued up: - Fix typos in code comments - Remove unnecessary check for NULL before destroying kmem_cache" * tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: don't allow mmap when the lower fs doesn't support it Revert "ecryptfs: forbid opening files without mmap handler" ecryptfs: fix spelling mistakes eCryptfs: fix typos in comment ecryptfs: drop null test before destroy functions
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull IOMMU fixes from Joerg Roedel: "Two Fixes: - Intel VT-d fix for a suspend/resume issue, introduced with the scalability improvements in this cycle. - AMD IOMMU fix for systems that have unity mappings defined. There was a race where translation got enabled before the unity mappings were in place. This issue was seen on some HP servers" * tag 'iommu-fixes-v4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix unity mapping initialization race iommu/vt-d: Fix infinite loop in free_all_cpu_cached_iovas
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen bug fixes from David Vrabel: - Fix two bugs in the handling of xenbus transactions. - Make the xen acpi driver compatible with Xen 4.7. * tag 'for-linus-4.7b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7 xenbus: simplify xenbus_dev_request_and_reply() xenbus: don't bail early from xenbus_dev_request_and_reply() xenbus: don't BUG() on user mode induced condition
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fixes from Will Deacon: "A couple of late fixes here, but one that we've been sitting on for a few weeks while the details were worked out. Specifically, we now enforce USER_DS on taking exceptions whilst in the kernel, which avoids leaking kernel data to userspace through things like perf. The other patch is an update to a workaround for a hardware erratum on some Cavium SoCs. Summary: - Enforce USER_DS on exception entry from EL1 - Apply workaround for Cavium errata #27456 on Thunderx-81xx parts" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Enable workaround for Cavium erratum 27456 on thunderx-81xx arm64: kernel: Save and restore UAO and addr_limit on exception entry
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Ingo Molnar: "Three fixes: - A boot crash fix with certain configs - a MAINTAINERS entry update - Documentation typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Documentation: Fix various typos in Documentation/x86/ files x86/amd_nb: Fix boot crash on non-AMD systems MAINTAINERS: Update the Calgary IOMMU entry
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Ingo Molnar: "Two load-balancing fixes for cgroups-intense workloads" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion sched/fair: Fix effective_load() to consistently use smoothed load
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Ingo Molnar: "Various fixes: - 32-bit callgraph bug fix - suboptimal event group scheduling bug fix - event constraint fixes for Broadwell/Skylake - RAPL module name collision fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix pmu::filter_match for SW-led groups x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl perf/x86: Fix 32-bit perf user callgraph collection perf/x86/intel: Update event constraints when HT is off
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Ingo Molnar: "Two MIPS-GIC irqchip driver fixes to unbreak certain MIPS boards" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mips-gic: Match IPI IRQ domain by bus token only irqchip/mips-gic: Map to VPs using HW VPNum
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpioLinus Torvalds authored
Pull GPIO fixes from Linus Walleij: "I don't like to toss in last minute patches, but these are all for things that are broken, and have bitten people for real. Two of them go into stable. Maybe all of them if the compile test problem is a pain in the ass also for stable folks. Final (hopefully) GPIO fixes for v4.7: - Fix an oops on the Asus Eee PC 1201 - Revert a patch trying to split GPIO parsing and GPIO configuration - Revert a too liberal compile testing thing" * tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: Revert "gpio: gpiolib-of: Allow compile testing" Revert "gpiolib: Split GPIO flags parsing and GPIO configuration" gpio: sch: Fix Oops on module load on Asus Eee PC 1201
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull drm fixes from Dave Airlie: "One nouveau fix, and a few AMD Polaris fixes and some Allwinner fixes. I've got some vmware fixes that I might send separate over the weekend, they fix some black screens, but I'm still debating them" * tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux: drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation. drm/amd/powerplay: fix bug that get wrong polaris evv voltage. drm/amd/powerplay: incorrectly use of the function return value drm/amd/powerplay: fix incorrect voltage table value for tonga drm/amd/powerplay: fix incorrect voltage table value for polaris10 drm/nouveau/disp/sor/gf119: select correct sor when poking training pattern gpu: drm: sun4i_drv: add missing of_node_put after calling of_parse_phandle drm/sun4i: Send vblank event when the CRTC is disabled drm/sun4i: Report proper vblank
-
Jeff Mahoney authored
There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
-
Borislav Petkov authored
No need to have it appear in objdump output. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160708141016.GH3808@pd.tnicSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Jan Beulich authored
As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on both Intel and AMD systems. Doing any kind of hardware capability checks in the driver as a prerequisite was wrong anyway: With the hypervisor being in charge, all such checking should be done by it. If ACPI data gets uploaded despite some missing capability, the hypervisor is free to ignore part or all of that data. Ditch the entire check_prereq() function, and do the only valid check (xen_initial_domain()) in the caller in its place. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
-
Dmitry Safonov authored
Should print this on vDSO remapping success (on new kernels): [root@localhost ~]# ./test_mremap_vdso_32 AT_SYSINFO_EHDR is 0xf773f000 [NOTE] Moving vDSO: [f773f000, f7740000] -> [a000000, a001000] [OK] Or print that mremap() for vDSOs is unsupported: [root@localhost ~]# ./test_mremap_vdso_32 AT_SYSINFO_EHDR is 0xf773c000 [NOTE] Moving vDSO: [0xf773c000, 0xf773d000] -> [0xf7737000, 0xf7738000] [FAIL] mremap() of the vDSO does not work on this kernel! Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160628113539.13606-3-dsafonov@virtuozzo.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Dmitry Safonov authored
Add possibility for 32-bit user-space applications to move the vDSO mapping. Previously, when a user-space app called mremap() for the vDSO address, in the syscall return path it would land on the previous address of the vDSOpage, resulting in segmentation violation. Now it lands fine and returns to userspace with a remapped vDSO. This will also fix the context.vdso pointer for 64-bit, which does not affect the user of vDSO after mremap() currently, but this may change in the future. As suggested by Andy, return -EINVAL for mremap() that would split the vDSO image: that operation cannot possibly result in a working system so reject it. Renamed and moved the text_mapping structure declaration inside map_vdso(), as it used only there and now it complements the vvar_mapping variable. There is still a problem for remapping the vDSO in glibc applications: the linker relocates addresses for syscalls on the vDSO page, so you need to relink with the new addresses. Without that the next syscall through glibc may fail: Program received signal SIGSEGV, Segmentation fault. #0 0xf7fd9b80 in __kernel_vsyscall () #1 0xf7ec8238 in _exit () from /usr/lib32/libc.so.6 Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Jan Beulich authored
No need to retain a local copy of the full request message, only the type is really needed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
-
Jan Beulich authored
xenbus_dev_request_and_reply() needs to track whether a transaction is open. For XS_TRANSACTION_START messages it calls transaction_start() and for XS_TRANSACTION_END messages it calls transaction_end(). If sending an XS_TRANSACTION_START message fails or responds with an an error, the transaction is not open and transaction_end() must be called. If sending an XS_TRANSACTION_END message fails, the transaction is still open, but if an error response is returned the transaction is closed. Commit 027bd7e8 ("xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart") introduced a regression where failed XS_TRANSACTION_START messages were leaving the transaction open. This can cause problems with suspend (and migration) as all transactions must be closed before suspending. It appears that the problematic change was added accidentally, so just remove it. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
-