1. 11 Dec, 2017 1 commit
    • Karol Herbst's avatar
      x86/mm/kmmio: Fix mmiotrace for page unaligned addresses · 6d60ce38
      Karol Herbst authored
      If something calls ioremap() with an address not aligned to PAGE_SIZE, the
      returned address might be not aligned as well. This led to a probe
      registered on exactly the returned address, but the entire page was armed
      for mmiotracing.
      
      On calling iounmap() the address passed to unregister_kmmio_probe() was
      PAGE_SIZE aligned by the caller leading to a complete freeze of the
      machine.
      
      We should always page align addresses while (un)registerung mappings,
      because the mmiotracer works on top of pages, not mappings. We still keep
      track of the probes based on their real addresses and lengths though,
      because the mmiotrace still needs to know what are mapped memory regions.
      
      Also move the call to mmiotrace_iounmap() prior page aligning the address,
      so that all probes are unregistered properly, otherwise the kernel ends up
      failing memory allocations randomly after disabling the mmiotracer.
      Tested-by: default avatarLyude <lyude@redhat.com>
      Signed-off-by: default avatarKarol Herbst <kherbst@redhat.com>
      Acked-by: default avatarPekka Paalanen <ppaalanen@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: nouveau@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/20171127075139.4928-1-kherbst@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6d60ce38
  2. 07 Dec, 2017 3 commits
  3. 06 Dec, 2017 6 commits
  4. 28 Nov, 2017 1 commit
  5. 27 Nov, 2017 1 commit
  6. 25 Nov, 2017 2 commits
    • Nadav Amit's avatar
      x86/tlb: Disable interrupts when changing CR4 · 9d0b6232
      Nadav Amit authored
      CR4 modifications are implemented as RMW operations which update a shadow
      variable and write the result to CR4. The RMW operation is protected by
      preemption disable, but there is no enforcement or debugging mechanism.
      
      CR4 modifications happen also in interrupt context via
      __native_flush_tlb_global(). This implementation does not affect a
      interrupted thread context CR4 operation, because the CR4 toggle restores
      the original content and does not modify the shadow variable.
      
      So the current situation seems to be safe, but a recent patch tried to add
      an actual RMW operation in interrupt context, which will cause subtle
      corruptions.
      
      To prevent that and make the CR4 handling future proof:
      
       - Add a lockdep assertion to __cr4_set() which will catch interrupt
         enabled invocations
      
       - Disable interrupts in the cr4 manipulator inlines
      
       - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
         from __switch_to_xtra() where interrupts are already disabled and
         performance matters.
      
      All other call sites are not performance critical, so the extra overhead of
      an additional local_irq_save/restore() pair is not a problem. If new call
      sites care about performance then the necessary _irqsoff() variants can be
      added.
      
      [ tglx: Condensed the patch by moving the irq protection inside the
        	manipulator functions. Updated changelog ]
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Luck <tony.luck@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: nadav.amit@gmail.com
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
      9d0b6232
    • Nadav Amit's avatar
      x86/tlb: Refactor CR4 setting and shadow write · 0c3292ca
      Nadav Amit authored
      Refactor the write to CR4 and its shadow value. This is done in
      preparation for the addition of an assertion to check that IRQs are
      disabled during CR4 update.
      
      No functional change.
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
      0c3292ca
  7. 24 Nov, 2017 1 commit
    • Masami Hiramatsu's avatar
      x86/decoder: Add new TEST instruction pattern · 12a78d43
      Masami Hiramatsu authored
      The kbuild test robot reported this build warning:
      
        Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
      
        Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
        Warning: objdump says 3 bytes, but insn_get_length() says 2
        Warning: decoded and checked 1569014 instructions with 1 warnings
      
      This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
      
      The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
      Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
      the ModR/M Byte (bits 2,1,0 in parenthesis)"
      
      In that table, opcodes listed by the index REG bits as:
      
        000         001       010 011  100        101        110         111
       TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
      
      So, it seems TEST Ib is assigned to 001.
      
      Add the new pattern.
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      12a78d43
  8. 23 Nov, 2017 4 commits
  9. 22 Nov, 2017 2 commits
    • Andrey Ryabinin's avatar
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin authored
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.comSigned-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f68d62a5
    • Andy Lutomirski's avatar
      x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing · 548c3050
      Andy Lutomirski authored
      When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
      before it.  This means that users of entry_SYSCALL_64_after_hwframe()
      were responsible for invoking TRACE_IRQS_OFF, and the one and only
      user (Xen, added in the same commit) got it wrong.
      
      I think this would manifest as a warning if a Xen PV guest with
      CONFIG_DEBUG_LOCKDEP=y were used with context tracking.  (The
      context tracking bit is to cause lockdep to get invoked before we
      turn IRQs back on.)  I haven't tested that for real yet because I
      can't get a kernel configured like that to boot at all on Xen PV.
      
      Move TRACE_IRQS_OFF below the label.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
      Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      548c3050
  10. 21 Nov, 2017 5 commits
  11. 17 Nov, 2017 6 commits
  12. 16 Nov, 2017 3 commits
    • Craig Bergstrom's avatar
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · be62a320
      Craig Bergstrom authored
      One thing /dev/mem access APIs should verify is that there's no way
      that excessively large pfn's can leak into the high bits of the
      page table entry.
      
      In particular, if people can use "very large physical page addresses"
      through /dev/mem to set the bits past bit 58 - SOFTW4 and permission
      key bits and NX bit, that could *really* confuse the kernel.
      
      We had an earlier attempt:
      
        ce56a86e ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      
      ... which turned out to be too restrictive (breaking mem=... bootups for example) and
      had to be reverted in:
      
        90edaac6 ("Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"")
      
      This v2 attempt modifies the original patch and makes sure that mmap(/dev/mem)
      limits the pfns so that it at least fits in the actual pteval_t architecturally:
      
       - Make sure mmap_mem() actually validates that the offset fits in phys_addr_t
      
          ( This may be indirectly true due to some other check, but it's not
            entirely obvious. )
      
       - Change valid_mmap_phys_addr_range() to just use phys_addr_valid()
         on the top byte
      
          ( Top byte is sufficient, because mmap_mem() has already checked that
            it cannot wrap. )
      
       - Add a few comments about what the valid_phys_addr_range() vs.
         valid_mmap_phys_addr_range() difference is.
      Signed-off-by: default avatarCraig Bergstrom <craigb@google.com>
      [ Fixed the checks and added comments. ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ Collected the discussion and patches into a commit. ]
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/CA+55aFyEcOMb657vWSmrM13OxmHxC-XxeBmNis=DwVvpJUOogQ@mail.gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      be62a320
    • Kirill A. Shutemov's avatar
      x86/selftests: Add test for mapping placement for 5-level paging · 97f404ad
      Kirill A. Shutemov authored
      5-level paging provides a 56-bit virtual address space for user space
      application. But the kernel defaults to mappings below the 47-bit address
      space boundary, which is the upper bound for 4-level paging, unless an
      application explicitely request it by using a mmap(2) address hint above
      the 47-bit boundary. The kernel prevents mappings which spawn across the
      47-bit boundary unless mmap(2) was invoked with MAP_FIXED.
      
      Add a self-test that covers the corner cases of the interface and validates
      the correctness of the implementation.
      
      [ tglx: Massaged changelog once more ]
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-2-kirill.shutemov@linux.intel.com
      97f404ad
    • Kirill A. Shutemov's avatar
      x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border · 1e0f25db
      Kirill A. Shutemov authored
      In case of 5-level paging, the kernel does not place any mapping above
      47-bit, unless userspace explicitly asks for it.
      
      Userspace can request an allocation from the full address space by
      specifying the mmap address hint above 47-bit.
      
      Nicholas noticed that the current implementation violates this interface:
      
        If user space requests a mapping at the end of the 47-bit address space
        with a length which causes the mapping to cross the 47-bit border
        (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
        below and above.
      
      Sanity check the mmap address hint so that start and end of the resulting
      vma are on the same side of the 47-bit border. If that's not the case fall
      back to the code path which ignores the address hint and allocate from the
      regular address space below 47-bit.
      
      To make the checks consistent, mask out the address hints lower bits
      (either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can
      push them up to the next boundary.
      
      [ tglx: Moved the address check to a function and massaged comment and
        	changelog ]
      Reported-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com
      1e0f25db
  13. 14 Nov, 2017 5 commits