Commits · 71389703839ebe9cb426c72d5f0bd549592e583c · nexedi / linux

01 May, 2017 1 commit

mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash · 71389703

Dan Williams authored Apr 28, 2017

The x86 conversion to the generic GUP code included a small change which causes
crashes and data corruption in the pmem code - not good.

The root cause is that the /dev/pmem driver code implicitly relies on the x86
get_user_pages() implementation doing a get_page() on the page refcount, because
get_page() does a get_zone_device_page() which properly refcounts pmem's separate
page struct arrays that are not present in the regular page struct structures.
(The pmem driver does this because it can cover huge memory areas.)

But the x86 conversion to the generic GUP code changed the get_page() to
page_cache_get_speculative() which is faster but doesn't do the
get_zone_device_page() call the pmem code relies on.

One way to solve the regression would be to change the generic GUP code to use
get_page(), but that would slow things down a bit and punish other generic-GUP
using architectures for an x86-ism they did not care about. (Arguably the pmem
driver was probably not working reliably for them: but nvdimm is an Intel
feature, so non-x86 exposure is probably still limited.)

So restructure the pmem code's interface with the MM instead: get rid of the
get/put_zone_device_page() distinction, integrate put_zone_device_page() into
__put_page() and and restructure the pmem completion-wait and teardown machinery:

Kirill points out that the calls to {get,put}_dev_pagemap() can be
removed from the mm fast path if we take a single get_dev_pagemap()
reference to signify that the page is alive and use the final put of the
page to drop that reference.

This does require some care to make sure that any waits for the
percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
since it now maintains its own elevated reference.

This speeds up things while also making the pmem refcounting more robust going
forward.
Suggested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

71389703

26 Apr, 2017 5 commits

x86/mm: Fix flush_tlb_page() on Xen · dbd68d8e

Andy Lutomirski authored Apr 22, 2017

flush_tlb_page() passes a bogus range to flush_tlb_others() and
expects the latter to fix it up.  native_flush_tlb_others() has the
fixup but Xen's version doesn't.  Move the fixup to
flush_tlb_others().

AFAICS the only real effect is that, without this fix, Xen would
flush everything instead of just the one page on remote vCPUs in
when flush_tlb_page() was called.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: e7b52ffd ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

dbd68d8e

x86/mm: Make flush_tlb_mm_range() more predictable · ce27374f

Andy Lutomirski authored Apr 22, 2017

I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way.  Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs.  This seems to be
an accident, and preserving it will be awkward.  Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.

The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.

As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

ce27374f

x86/mm: Remove flush_tlb() and flush_tlb_current_task() · 29961b59

Andy Lutomirski authored Apr 22, 2017

I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

29961b59

x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() · 9ccee237

Andy Lutomirski authored Apr 22, 2017

mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.

Compile-tested only because I don't know whether software that uses
this mechanism even exists.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

9ccee237

x86/mm/64: Fix crash in remove_pagetable() · e6ab9c4d

Kirill A. Shutemov authored Apr 25, 2017

remove_pagetable() does page walk using p*d_page_vaddr() plus cast.
It's not canonical approach -- we usually use p*d_offset() for that.

It works fine as long as all page table levels are present. We broke the
invariant by introducing folded p4d page table level.

As result, remove_pagetable() interprets PMD as PUD and it leads to
crash:

	BUG: unable to handle kernel paging request at ffff880300000000
	IP: memchr_inv+0x60/0x110
	PGD 317d067
	P4D 317d067
	PUD 3180067
	PMD 33f102067
	PTE 8000000300000060

Let's fix this by using p*d_offset() instead of p*d_page_vaddr() for
page walk.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: f2a6a705 ("x86: Convert the rest of the code to support p4d_t")
Link: http://lkml.kernel.org/r/20170425092557.21852-1-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

e6ab9c4d

23 Apr, 2017 1 commit

Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d

Ingo Molnar authored Apr 23, 2017

This reverts commit 2947ba05.

Dan Williams reported dax-pmem kernel warnings with the following signature:

   WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
   percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic

... and bisected it to this commit, which suggests possible memory corruption
caused by the x86 fast-GUP conversion.

He also pointed out:

 "
  This is similar to the backtrace when we were not properly handling
  pud faults and was fixed with this commit: 220ced16 "mm: fix
  get_user_pages() vs device-dax pud mappings"

  I've found some missing _devmap checks in the generic
  get_user_pages_fast() path, but this does not fix the regression
  [...]
 "

So given that there are known bugs, and a pretty robust looking bisection
points to this commit suggesting that are unknown bugs in the conversion
as well, revert it for the time being - we'll re-try in v4.13.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dann.frazier@canonical.com
Cc: dave.hansen@intel.com
Cc: steve.capper@linaro.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

6dd29b3d

14 Apr, 2017 1 commit

x86/boot/e820: Remove a redundant self assignment · ace2fb5a

Colin King authored Apr 13, 2017

Remove a redundant self assignment of table->nr_entries, it does
nothing and is an artifact of code simplification re-work.

Detected by CoverityScan, CID#1428450 ("Self assignment")

Fixes: 441ac2f3 ("x86/boot/e820: Simplify e820__update_table()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: kernel-janitors@vger.kernel.org
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20170413155912.12078-1-colin.king@canonical.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

ace2fb5a

12 Apr, 2017 3 commits

x86/mm: Fix dump pagetables for 4 levels of page tables · 84bbabc3

Juergen Gross authored Apr 12, 2017

Commit fdd3d8ce ("x86/dump_pagetables: Add support for 5-level
paging") introduced an error for dumping with only 4 levels by setting
PGD_LEVEL_MULT to a wrong value.

This is leading to e.g. addresses printed as "(null)" for ranges:

x86/mm: Found insecure W+X mapping at address (null)/(null)

Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD

Fixes: fdd3d8ce ("x86/dump_pagetables: Add support for 5-level paging")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgross@suse.comSigned-off-by: Thomas Gleixner <tglx@linutronix.de>

84bbabc3

x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow · 5f2173e0

Joerg Roedel authored Apr 06, 2017

The check between the hardware state and our shadow of it is
checked in the signal handler for all bounds exceptions,
even for the ones where we don't keep the shadow up2date.
This is a problem because when no shadow is kept the handler
fails at this point and hides the real reason of the
exception.

Move the check into the code-path evaluating normal bounds
exceptions to prevent this.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Link: http://lkml.kernel.org/r/1491488598-27346-1-git-send-email-joro@8bytes.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

5f2173e0

x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space · 5ed386ec

Joerg Roedel authored Apr 06, 2017

When this function fails it just sends a SIGSEGV signal to
user-space using force_sig(). This signal is missing
essential information about the cause, e.g. the trap_nr or
an error code.

Fix this by propagating the error to the only caller of
mpx_handle_bd_fault(), do_bounds(), which sends the correct
SIGSEGV signal to the process.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fe3d197f ('x86, mpx: On-demand kernel allocation of bounds tables')
Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

5ed386ec

11 Apr, 2017 2 commits

Merge branch 'x86/boot' into x86/mm, to avoid conflict · e5185a76

Ingo Molnar authored Apr 11, 2017

There's a conflict between ongoing level-5 paging support and
the E820 rewrite. Since the E820 rewrite is essentially ready,
merge it into x86/mm to reduce tree conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>

e5185a76

Merge branch 'WIP.x86/boot' into x86/boot, to pick up ready branch · 47292771

Ingo Molnar authored Apr 11, 2017

The E820 rework in WIP.x86/boot has gone through a couple of weeks
of exposure in -tip, merge it in a wider fashion.
Signed-off-by: Ingo Molnar <mingo@kernel.org>

47292771

07 Apr, 2017 1 commit

Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" · b678c91a

Thomas Gleixner authored Apr 08, 2017

This reverts commit 474aeffd due to testing
failures.
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20170406124459.dwn5zhpr2xqg3lqm@node.shutemov.name

b678c91a

04 Apr, 2017 7 commits

x86/espfix: Add support for 5-level paging · 1d33b219

Kirill A. Shutemov authored Mar 30, 2017

We don't need extra virtual address space for ESPFIX, so it stays within
one PUD page table for both 4- and 5-level paging.

Redefining ESPFIX_BASE_ADDR using P4D_SHIFT instead of PGDIR_SHIFT would
make it stay in the same place regarding of paging mode.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-8-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

1d33b219

x86/kasan: Extend KASAN to support 5-level paging · 5480bb61

Kirill A. Shutemov authored Mar 30, 2017

This patch bring support for a non-folded additional page table level.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-7-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

5480bb61

x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y · b8504058

Kirill A. Shutemov authored Mar 30, 2017

Extends pagetable headers to support the new paging mode.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-6-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

b8504058

x86/paravirt: Add 5-level support to the paravirt code · 335437fb

Kirill A. Shutemov authored Mar 30, 2017

Add operations to allocate/release p4ds.

Xen requires more work. We will need to come back to it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-5-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

335437fb

x86/mm: Define virtual memory map for 5-level paging · 4c7c4483

Kirill A. Shutemov authored Mar 30, 2017

The first part of memory map (up to %esp fixup) simply scales existing
map for 4-level paging by factor of 9 -- number of bits addressed by
the additional page table level.

The rest of the map is unchanged.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

4c7c4483

x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert · 361b4b58

Kirill A. Shutemov authored Mar 30, 2017

We don't need the assert anymore, as:

  17be0aec ("x86/asm/entry/64: Implement better check for canonical addresses")

made canonical address checks generic wrt. address width.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-3-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

361b4b58

x86/boot: Detect 5-level paging support · 3677d4c6

Kirill A. Shutemov authored Mar 30, 2017

In this initial implementation we force-require 5-level paging support
from the hardware, when compiled with CONFIG_X86_5LEVEL=y. (The kernel
will panic during boot on CPUs that don't support 5-level paging.)

We will implement boot-time switch between 4- and 5-level paging later.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-2-kirill.shutemov@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

3677d4c6

03 Apr, 2017 4 commits

Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch · 7f75540f
Ingo Molnar authored Apr 03, 2017
```
Signed-off-by: Ingo Molnar <mingo@kernel.org>
```
7f75540f

x86/mm/numa: Remove numa_nodemask_from_meminfo() · 474aeffd

Wei Yang authored Mar 14, 2017

numa_nodemask_from_meminfo() generates a nodemask of nodes which have
memory according to a meminfo descriptor.

The two callsites of that function both set bits in copies of the
numa_nodes_parsed nodemask. In both cases, the information in supplied
numa_meminfo is a subset of numa_nodes_parsed. So setting those bits
again is not really necessary.

Here are the three call paths which show that the supplied numa_meminfo
argument describes memory regions in nodes which are already in
numa_nodes_parsed:

    x86_numa_init()
        numa_init()
            Case 1:
            acpi_numa_init()
	    acpi_parse_memory_affinity()
                    numa_add_memblk()
                    node_set(numa_nodes_parsed)
                acpi_parse_slit()
		 acpi_numa_slit_init()
		  numa_set_distance()
		   numa_alloc_distance()
                    numa_nodemask_from_meminfo()

            Case 2:
            amd_numa_init()
                numa_add_memblk()
                node_set(numa_nodes_parsed)

            Case 3
            dummy_numa_init()
                node_set(numa_nodes_parsed)
                numa_add_memblk()

            numa_register_memblks()
                numa_nodemask_from_meminfo()

Thus, in all three cases, the respective bit in numa_nodes_parsed is
set, which means it is not necessary to set it again in a copy of
numa_nodes_parsed.

So remove that function.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20170314030801.13656-2-richard.weiyang@gmail.com
[ Heavily massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

474aeffd

x86/mm/numa: Improve alloc_node_data() error path message · 43dac8f6

Wei Yang authored Mar 14, 2017

alloc_node_data() tries to allocate from the local node first and, if
that attempt fails, falls back to any node. Improve the error message to
issue the initial node for ease during debugging.

Fix a typo in the comments, while at it.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Link: http://lkml.kernel.org/r/20170314030801.13656-1-richard.weiyang@gmail.com
[ Masssage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

43dac8f6

Linux 4.11-rc5 · a71c9a1c
Linus Torvalds authored Apr 02, 2017

a71c9a1c

02 Apr, 2017 10 commits

Merge tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma · f49237bf

Linus Torvalds authored Apr 02, 2017

Pull dmaengine fixes from Vinod Koul:
 "A couple of minor fixes for 4.11:

   - array bound fix for __get_unmap_pool()

   - cyclic period splitting for bcm2835"

* tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
  dmaengine: bcm2835: Fix cyclic DMA period splitting

f49237bf

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 496dcc50

Linus Torvalds authored Apr 02, 2017

Pull x86 fixes from Thomas Gleixner:
 "This update provides:

   - prevent KASLR from randomizing EFI regions

   - restrict the usage of -maccumulate-outgoing-args and document when
     and why it is required.

   - make the Global Physical Address calculation for UV4 systems work
     correctly.

   - address a copy->paste->forgot-edit problem in the MCE exception
     table entries.

   - assign a name to AMD MCA bank 3, so the sysfs file registration
     works.

   - add a missing include in the boot code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Include missing header file
  x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
  x86/build: Mostly disable '-maccumulate-outgoing-args'
  x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
  x86/mce: Fix copy/paste error in exception table entries
  x86/platform/uv: Fix calculation of Global Physical Address

496dcc50

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 128c434a

Linus Torvalds authored Apr 02, 2017

Pull scheduler fixes from Thomas Gleixner:
 "This update provides:

   - make the scheduler clock switch to unstable mode smooth so the
     timestamps stay at microseconds granularity instead of switching to
     tick granularity.

   - unbreak perf test tsc by taking the new offset into account which
     was added in order to proveide better sched clock continuity

   - switching sched clock to unstable mode runs all clock related
     computations which affect the sched clock output itself from a work
     queue. In case of preemption sched clock uses half updated data and
     provides wrong timestamps. Keep the math in the protected context
     and delegate only the static key switch to workqueue context.

   - remove a duplicate header include"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers: Remove duplicate #include <linux/sched/debug.h> line
  sched/clock: Fix broken stable to unstable transfer
  sched/clock, x86/perf: Fix "perf test tsc"
  sched/clock: Fix clear_sched_clock_stable() preempt wobbly

128c434a

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a89b5eb

Linus Torvalds authored Apr 02, 2017

Pull EFI fix from Thomas Gleixner:
 "Downgrade the missing ESRT header printk to warning level and remove a
  useless error printk which just generates noise for no value"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/esrt: Cleanup bad memory map log messages

0a89b5eb

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4a6808f3

Linus Torvalds authored Apr 02, 2017

Pull timer fixes from Thomas Gleixner:
 "Two small fixes for the new CLKEVT_OF infrastructure"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vmlinux.lds: Add __clkevt_of_table to kernel
  clockevents: Fix syntax error in clkevt-of macro

4a6808f3

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 907977b2

Linus Torvalds authored Apr 02, 2017

Pull irq fixes from Thomas Gleixner:
 "Two small fixlets:

   - select a required Kconfig to make the MVEBU driver compile

   - add the missing MIPS local GIC interrupts which prevent drivers to
     probe successfully"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mips-gic: Fix Local compare interrupt
  irqchip/mvebu-odmi: Select GENERIC_MSI_IRQ_DOMAIN

907977b2

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ada63c61

Linus Torvalds authored Apr 02, 2017

Pull core fix from Thomas Gleixner:
 "Prevent leaking kernel memory via /proc/$pid/syscall when the queried
  task is not in a syscall"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/syscall: Clear return values when no stack

ada63c61

Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 346ce1d7

Linus Torvalds authored Apr 01, 2017

Pull parisc fixes from Helge Deller:
"Al Viro reported that - in case of read faults - our copy_from_user()
implementation may claim to have copied more bytes than it actually
did. In order to fix this bug and because of the way how gcc optimizes
register usage for inline assembly in C code, we had to replace our
pa_memcpy() function with a pure assembler implementation.

While fixing the memcpy bug we noticed some other issues with our
get_user() and put_user() functions, e.g. nested faults may return
wrong data. This is now fixed by a common fixup handler for
get_user/put_user in the exception handler which additionally makes
generated code smaller and faster.

The third patch is a trivial one-line fix for a patch which went in
during 4.11-rc and which avoids stalled CPU warnings after power
shutdown (for parisc machines which can't plug power off themselves).

Due to the rewrite of pa_memcpy() into assembly this patch got bigger
than what I wanted to have sent at this stage.

Those patches have been running in production during the last few days
on our debian build servers without any further issues"

* 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid stalled CPU warnings after system shutdown
parisc: Clean up fixup routines for get_user()/put_user()
parisc: Fix access fault handling in pa_memcpy()

346ce1d7

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 7d34ddbe

Linus Torvalds authored Apr 01, 2017

Pull SCSI fixes from James Bottomley:
 "Thirteen small fixes: The hopefully final effort to get the lpfc nvme
  kconfig problems sorted, there's one important sg fix (user can induce
  read after end of buffer) and one minor enhancement (adding an extra
  PCI ID to qedi). The rest are a set of minor fixes, which mostly occur
  as user visible in error legs or on specific devices"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: remove the duplicated checking for supporting clkscaling
  scsi: lpfc: fix building without debugfs support
  scsi: lpfc: Fix PT2PT PRLI reject
  scsi: hpsa: fix volume offline state
  scsi: libsas: fix ata xfer length
  scsi: scsi_dh_alua: Warn if the first argument of alua_rtpg_queue() is NULL
  scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
  scsi: scsi_dh_alua: Check scsi_device_get() return value
  scsi: sg: check length passed to SG_NEXT_CMD_LEN
  scsi: ufshcd-platform: remove the useless cast in ERR_PTR/IS_ERR
  scsi: qedi: Add PCI device-ID for QL41xxx adapters.
  scsi: aacraid: Fix potential null access
  scsi: qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr

7d34ddbe

Merge branch 'akpm' (patches from Andrew) · 978e0f92

Linus Torvalds authored Apr 01, 2017

Merge misc fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kasan: do not sanitize kexec purgatory
  drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
  mm/hugetlb.c: don't call region_abort if region_chg fails
  kasan: report only the first error by default
  hugetlbfs: initialize shared policy as part of inode allocation
  mm: fix section name for .data..ro_after_init
  mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
  mm: workingset: fix premature shadow node shrinking with cgroups
  mm: rmap: fix huge file mmap accounting in the memcg stats
  mm: move mm_percpu_wq initialization earlier
  mm: migrate: fix remove_migration_pte() for ksm pages

978e0f92

01 Apr, 2017 5 commits

Merge tag 'usb-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · a9f6b6b8

Linus Torvalds authored Apr 01, 2017

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 4.11-rc5.

  The usual xhci fixes are here, as well as a fix for yet-another-bug-
  found-by-KASAN, those developers are doing great stuff here.

  And there's a phy build warning fix that showed up in 4.11-rc1.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled
  xhci: Manually give back cancelled URB if we can't queue it for cancel
  xhci: Set URB actual length for stopped control transfers
  xhci: plat: Register shutdown for xhci_plat
  USB: fix linked-list corruption in rh_call_control()

a9f6b6b8

Merge tag 'tty-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · b3ff4fac

Linus Torvalds authored Apr 01, 2017

Pull tty/serial fixes from Greg KH:
 "Here are some small fixes for some serial drivers and Kconfig help
  text for 4.11-rc5. Nothing major here at all, a few things resolving
  reported bugs in some random serial drivers.

  I don't think these made the last linux-next due to me getting to them
  yesterday, but I am not sure, they might have snuck in. The patches
  only affect drivers that the maintainers of sent me these patches for,
  so we should be safe here :)"

* tag 'tty-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: pl011: fix earlycon work-around for QDF2400 erratum 44
  serial: 8250_EXAR: fix duplicate Kconfig text and add missing help text
  tty/serial: atmel: fix TX path in atmel_console_write()
  tty/serial: atmel: fix race condition (TX+DMA)
  serial: mxs-auart: Fix baudrate calculation

b3ff4fac

Merge tag 'acpi-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7ece03b0

Linus Torvalds authored Apr 01, 2017

Pull ACPI fixes from Rafael Wysocki:
 "These fix two issues related to IOAPIC hotplug, an overzealous build
  optimization that prevents the function graph tracer from working with
  the ACPI subsystem correctly and an RCU synchronization issue in the
  ACPI APEI code.

  Specifics:

   - drop the unconditional setting of the '-Os' gcc flag from the ACPI
     Makefile to make the function graph tracer work correctly with the
     ACPI subsystem (Josh Poimboeuf).

   - add missing synchronize_rcu() to ghes_remove() which removes an
     element from an RCU-protected list, but fails to synchronize it
     properly afterward (James Morse).

   - fix two problems related to IOAPIC hotplug, a local variable
     initialization in setup_res() and the creation of platform device
     objects for IO(x)APICs which are (a) unused and (b) leaked on
     hot-removal (Joerg Roedel)"

* tag 'acpi-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Fix incompatibility with mcount-based function graph tracing
  ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal
  ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
  ACPI: ioapic: Clear on-stack resource before using it

7ece03b0

Merge tag 'pm-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0d2ceec6

Linus Torvalds authored Apr 01, 2017

Pull power management fixes from Rafael Wysocki:
 "These fix a cpufreq core issue with the initialization of the cpufreq
  sysfs interface and a cpuidle powernv driver initialization issue.

  Specifics:

   - symbolic links from CPU directories to the corresponding cpufreq
     policy directories in sysfs are not created during initialization
     in some cases which confuses user space, so prevent that from
     happening (Rafael Wysocki).

   - the powernv cpuidle driver fails to pass a correct cpumaks to the
     cpuidle core in some cases which causes subsequent failures to
     occur, so fix it (Vaidyanathan Srinivasan)"

* tag 'pm-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: powernv: Pass correct drv->cpumask for registration
  cpufreq: Fix creation of symbolic links to policy directories

0d2ceec6

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 1300dc68

Linus Torvalds authored Apr 01, 2017

Pull i2c fixes from Wolfram Sang:
 "Two bugfixes from I2C, specifically the I2C mux section. Thanks to
  peda for collecting them"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mux: pca954x: Add missing pca9546 definition to chip_desc
  Revert "i2c: mux: pca954x: Add ACPI support for pca954x"

1300dc68