1. 23 Jul, 2016 1 commit
    • Andy Lutomirski's avatar
      x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs · 530dd8d4
      Andy Lutomirski authored
      Valdis Kletnieks bisected a boot failure back to this recent commit:
      
        360cb4d1 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")
      
      I broke the case where a PUD table got allocated -- populate_pud()
      would wander off a pgd_none entry and get lost.  I'm not sure how
      this survived my testing.
      
      Fix the original issue in a much simpler way.  The problem
      was that, if we allocated a PUD table, failed to populate it, and
      freed it, another CPU could potentially keep using the PGD entry we
      installed (either by copying it via vmalloc_fault or by speculatively
      caching it).  There's a straightforward fix: simply leave the
      top-level entry in place if this happens.  This can't waste any
      significant amount of memory -- there are at most 256 entries like
      this systemwide and, as a practical matter, if we hit this failure
      path repeatedly, we're likely to reuse the same page anyway.
      
      For context, this is a reversion with this hunk added in:
      
      	if (ret < 0) {
      +		/*
      +		 * Leave the PUD page in place in case some other CPU or thread
      +		 * already found it, but remove any useless entries we just
      +		 * added to it.
      +		 */
      -		unmap_pgd_range(cpa->pgd, addr,
      +		unmap_pud_range(pgd_entry, addr,
      			        addr + (cpa->numpages << PAGE_SHIFT));
      		return ret;
      	}
      
      This effectively open-codes what the now-deleted unmap_pgd_range()
      function used to do except that unmap_pgd_range() used to try to
      free the page as well.
      Reported-by: default avatarValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      530dd8d4
  2. 15 Jul, 2016 14 commits
  3. 13 Jul, 2016 4 commits
    • Dave Hansen's avatar
      x86/mm: Use pte_none() to test for empty PTE · dcb32d99
      Dave Hansen authored
      The page table manipulation code seems to have grown a couple of
      sites that are looking for empty PTEs.  Just in case one of these
      entries got a stray bit set, use pte_none() instead of checking
      for a zero pte_val().
      
      The use pte_same() makes me a bit nervous.  If we were doing a
      pte_same() check against two cleared entries and one of them had
      a stray bit set, it might fail the pte_same() check.  But, I
      don't think we ever _do_ pte_same() for cleared entries.  It is
      almost entirely used for checking for races in fault-in paths.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dcb32d99
    • Dave Hansen's avatar
      x86/mm: Disallow running with 32-bit PTEs to work around erratum · e4a84be6
      Dave Hansen authored
      The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
      Landing) has an erratum where a processor thread setting the Accessed
      or Dirty bits may not do so atomically against its checks for the
      Present bit.  This may cause a thread (which is about to page fault)
      to set A and/or D, even though the Present bit had already been
      atomically cleared.
      
      These bits are truly "stray".  In the case of the Dirty bit, the
      thread associated with the stray set was *not* allowed to write to
      the page.  This means that we do not have to launder the bit(s); we
      can simply ignore them.
      
      If the PTE is used for storing a swap index or a NUMA migration index,
      the A bit could be misinterpreted as part of the swap type.  The stray
      bits being set cause a software-cleared PTE to be interpreted as a
      swap entry.  In some cases (like when the swap index ends up being
      for a non-existent swapfile), the kernel detects the stray value
      and WARN()s about it, but there is no guarantee that the kernel can
      always detect it.
      
      When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
      to move the swap PTE format around to avoid these troublesome bits.
      But, 32-bit non-PAE is tight on bits.  So, disallow it from running
      on this hardware.  I can't imagine anyone wanting to run 32-bit
      non-highmem kernels on this hardware, but disallowing them from
      running entirely is surely the safe thing to do.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e4a84be6
    • Dave Hansen's avatar
      x86/mm: Ignore A/D bits in pte/pmd/pud_none() · 97e3c602
      Dave Hansen authored
      The erratum we are fixing here can lead to stray setting of the
      A and D bits.  That means that a pte that we cleared might
      suddenly have A/D set.  So, stop considering those bits when
      determining if a pte is pte_none().  The same goes for the
      other pmd_none() and pud_none().  pgd_none() can be skipped
      because it is not affected; we do not use PGD entries for
      anything other than pagetables on affected configurations.
      
      This adds a tiny amount of overhead to all pte_none() checks.
      I doubt we'll be able to measure it anywhere.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      97e3c602
    • Dave Hansen's avatar
      x86/mm: Move swap offset/type up in PTE to work around erratum · 00839ee3
      Dave Hansen authored
      This erratum can result in Accessed/Dirty getting set by the hardware
      when we do not expect them to be (on !Present PTEs).
      
      Instead of trying to fix them up after this happens, we just
      allow the bits to get set and try to ignore them.  We do this by
      shifting the layout of the bits we use for swap offset/type in
      our 64-bit PTEs.
      
      It looks like this:
      
       bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
       names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
       before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
        after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|
      
      Note that D was already a don't care (X) even before.  We just
      move TYPE up and turn its old spot (which could be hit by the
      A bit) into all don't cares.
      
      We take 5 bits away from the offset, but that still leaves us
      with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
      I think that's probably fine for the moment.  We could
      theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
      doesn't gain us anything.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      00839ee3
  4. 10 Jul, 2016 2 commits
  5. 09 Jul, 2016 2 commits
  6. 08 Jul, 2016 17 commits