1. 02 Jun, 2017 22 commits
    • Michal Hocko's avatar
      mm: consider memblock reservations for deferred memory initialization sizing · 864b9a39
      Michal Hocko authored
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.czSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Tested-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      864b9a39
    • James Morse's avatar
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 9a291a7c
      James Morse authored
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.comSigned-off-by: default avatarJames Morse <james.morse@arm.com>
      Acked-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.11.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a291a7c
    • Yisheng Xie's avatar
      mlock: fix mlock count can not decrease in race condition · 70feee0e
      Yisheng Xie authored
      Kefeng reported that when running the follow test, the mlock count in
      meminfo will increase permanently:
      
       [1] testcase
       linux:~ # cat test_mlockal
       grep Mlocked /proc/meminfo
        for j in `seq 0 10`
        do
       	for i in `seq 4 15`
       	do
       		./p_mlockall >> log &
       	done
       	sleep 0.2
       done
       # wait some time to let mlock counter decrease and 5s may not enough
       sleep 5
       grep Mlocked /proc/meminfo
      
       linux:~ # cat p_mlockall.c
       #include <sys/mman.h>
       #include <stdlib.h>
       #include <stdio.h>
      
       #define SPACE_LEN	4096
      
       int main(int argc, char ** argv)
       {
      	 	int ret;
      	 	void *adr = malloc(SPACE_LEN);
      	 	if (!adr)
      	 		return -1;
      
      	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
      	 	printf("mlcokall ret = %d\n", ret);
      
      	 	ret = munlockall();
      	 	printf("munlcokall ret = %d\n", ret);
      
      	 	free(adr);
      	 	return 0;
      	 }
      
      In __munlock_pagevec() we should decrement NR_MLOCK for each page where
      we clear the PageMlocked flag.  Commit 1ebb7cc6 ("mm: munlock: batch
      NR_MLOCK zone state updates") has introduced a bug where we don't
      decrement NR_MLOCK for pages where we clear the flag, but fail to
      isolate them from the lru list (e.g.  when the pages are on some other
      cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
      accounting gets permanently disrupted by this.
      
      Fix it by counting the number of page whose PageMlock flag is cleared.
      
      Fixes: 1ebb7cc6 (" mm: munlock: batch NR_MLOCK zone state updates")
      Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.comSigned-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Reported-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Tested-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: zhongjiang <zhongjiang@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      70feee0e
    • Punit Agrawal's avatar
      mm/migrate: fix refcount handling when !hugepage_migration_supported() · 30809f55
      Punit Agrawal authored
      On failing to migrate a page, soft_offline_huge_page() performs the
      necessary update to the hugepage ref-count.
      
      But when !hugepage_migration_supported() , unmap_and_move_hugepage()
      also decrements the page ref-count for the hugepage.  The combined
      behaviour leaves the ref-count in an inconsistent state.
      
      This leads to soft lockups when running the overcommitted hugepage test
      from mce-tests suite.
      
        Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
        soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
        INFO: rcu_preempt detected stalls on CPUs/tasks:
         Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
          (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
          thugetlb_overco R  running task        0  2715   2685 0x00000008
          Call trace:
            dump_backtrace+0x0/0x268
            show_stack+0x24/0x30
            sched_show_task+0x134/0x180
            rcu_print_detail_task_stall_rnp+0x54/0x7c
            rcu_check_callbacks+0xa74/0xb08
            update_process_times+0x34/0x60
            tick_sched_handle.isra.7+0x38/0x70
            tick_sched_timer+0x4c/0x98
            __hrtimer_run_queues+0xc0/0x300
            hrtimer_interrupt+0xac/0x228
            arch_timer_handler_phys+0x3c/0x50
            handle_percpu_devid_irq+0x8c/0x290
            generic_handle_irq+0x34/0x50
            __handle_domain_irq+0x68/0xc0
            gic_handle_irq+0x5c/0xb0
      
      Address this by changing the putback_active_hugepage() in
      soft_offline_huge_page() to putback_movable_pages().
      
      This only triggers on systems that enable memory failure handling
      (ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
      (!ARCH_ENABLE_HUGEPAGE_MIGRATION).
      
      I imagine this wasn't triggered as there aren't many systems running
      this configuration.
      
      [akpm@linux-foundation.org: remove dead comment, per Naoya]
      Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.comReported-by: default avatarManoj Iyer <manoj.iyer@canonical.com>
      Tested-by: default avatarManoj Iyer <manoj.iyer@canonical.com>
      Suggested-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarPunit Agrawal <punit.agrawal@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[3.14+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30809f55
    • Ross Zwisler's avatar
      dax: fix race between colliding PMD & PTE entries · e2093926
      Ross Zwisler authored
      We currently have two related PMD vs PTE races in the DAX code.  These
      can both be easily triggered by having two threads reading and writing
      simultaneously to the same private mapping, with the key being that
      private mapping reads can be handled with PMDs but private mapping
      writes are always handled with PTEs so that we can COW.
      
      Here is the first race:
      
        CPU 0					CPU 1
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
          handle_pte_fault()
            passes check for pmd_devmap()
      
      					(private mapping read)
      					__handle_mm_fault()
      					  create_huge_pmd()
      					    dax_iomap_pmd_fault() inserts PMD
      
            dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
            			  installed in our page tables at this spot.
      
      Here's the second race:
      
        CPU 0					CPU 1
      
        (private mapping read)
        __handle_mm_fault()
          passes check for pmd_none()
          create_huge_pmd()
            dax_iomap_pmd_fault() inserts PMD
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
      					(private mapping read)
      					__handle_mm_fault()
      					  passes check for pmd_none()
      					  create_huge_pmd()
      
          handle_pte_fault()
            dax_iomap_pte_fault() inserts PTE
      					    dax_iomap_pmd_fault() inserts PMD,
      					       but we already have a PTE at
      					       this spot.
      
      The core of the issue is that while there is isolation between faults to
      the same range in the DAX fault handlers via our DAX entry locking,
      there is no isolation between faults in the code in mm/memory.c.  This
      means for instance that this code in __handle_mm_fault() can run:
      
      	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
      		ret = create_huge_pmd(&vmf);
      
      But by the time we actually get to run the fault handler called by
      create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
      fault has installed a normal PMD here as a parent.  This is the cause of
      the 2nd race.  The first race is similar - there is the following check
      in handle_pte_fault():
      
      	} else {
      		/* See comment in pte_alloc_one_map() */
      		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
      			return 0;
      
      So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
      will bail and retry the fault.  This is correct, but there is nothing
      preventing the PMD from being installed after this check but before we
      actually get to the DAX PTE fault handlers.
      
      In my testing these races result in the following types of errors:
      
        BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
        BUG: non-zero nr_ptes on freeing mm: 15
      
      Fix this issue by having the DAX fault handlers verify that it is safe
      to continue their fault after they have taken an entry lock to block
      other racing faults.
      
      [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
        Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
      Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: default avatarPawel Lebioda <pawel.lebioda@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e2093926
    • Ross Zwisler's avatar
      mm: avoid spurious 'bad pmd' warning messages · d0f0931d
      Ross Zwisler authored
      When the pmd_devmap() checks were added by 5c7fb56e ("mm, dax:
      dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
      pages, they were all added to the end of if() statements after existing
      pmd_trans_huge() checks.  So, things like:
      
        -       if (pmd_trans_huge(*pmd))
        +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
      
      When further checks were added after pmd_trans_unstable() checks by
      commit 7267ec00 ("mm: postpone page table allocation until we have
      page to map") they were also added at the end of the conditional:
      
        +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
      
      This ordering is fine for pmd_trans_huge(), but doesn't work for
      pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
      check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
      pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
      do end up doing the right thing, but only after spamming dmesg with
      suspicious looking messages:
      
        mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
      
      Reorder these checks in a helper so that pmd_devmap() is checked first,
      avoiding the error messages, and add a comment explaining why the
      ordering is important.
      
      Fixes: commit 7267ec00 ("mm: postpone page table allocation until we have page to map")
      Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0f0931d
    • Tetsuo Handa's avatar
      mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once · c288983d
      Tetsuo Handa authored
      Roman Gushchin has reported that the OOM killer can trivially selects
      next OOM victim when a thread doing memory allocation from page fault
      path was selected as first OOM victim.
      
          allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
          allocate cpuset=/ mems_allowed=0
          CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          Call Trace:
           oom_kill_process+0x219/0x3e0
           out_of_memory+0x11d/0x480
           __alloc_pages_slowpath+0xc84/0xd40
           __alloc_pages_nodemask+0x245/0x260
           alloc_pages_vma+0xa2/0x270
           __handle_mm_fault+0xca9/0x10c0
           handle_mm_fault+0xf3/0x210
           __do_page_fault+0x240/0x4e0
           trace_do_page_fault+0x37/0xe0
           do_async_page_fault+0x19/0x70
           async_page_fault+0x28/0x30
          ...
          Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
          Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
          allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
          allocate cpuset=/ mems_allowed=0
          CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          Call Trace:
           __alloc_pages_slowpath+0xd32/0xd40
           __alloc_pages_nodemask+0x245/0x260
           alloc_pages_vma+0xa2/0x270
           __handle_mm_fault+0xca9/0x10c0
           handle_mm_fault+0xf3/0x210
           __do_page_fault+0x240/0x4e0
           trace_do_page_fault+0x37/0xe0
           do_async_page_fault+0x19/0x70
           async_page_fault+0x28/0x30
          ...
          oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
          ...
          allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
          allocate cpuset=/ mems_allowed=0
          CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          Call Trace:
           oom_kill_process+0x219/0x3e0
           out_of_memory+0x11d/0x480
           pagefault_out_of_memory+0x68/0x80
           mm_fault_error+0x8f/0x190
           ? handle_mm_fault+0xf3/0x210
           __do_page_fault+0x4b2/0x4e0
           trace_do_page_fault+0x37/0xe0
           do_async_page_fault+0x19/0x70
           async_page_fault+0x28/0x30
          ...
          Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
          Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB
      
      There is a race window that the OOM reaper completes reclaiming the
      first victim's memory while nothing but mutex_trylock() prevents the
      first victim from calling out_of_memory() from pagefault_out_of_memory()
      after memory allocation for page fault path failed due to being selected
      as an OOM victim.
      
      This is a side effect of commit 9a67f648 ("mm: consolidate
      GFP_NOFAIL checks in the allocator slowpath") because that commit
      silently changed the behavior from
      
          /* Avoid allocations with no watermarks from looping endlessly */
      
      to
      
          /*
           * Give up allocations without trying memory reserves if selected
           * as an OOM victim
           */
      
      in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE
      flag.  I have noticed this change but I didn't post a patch because I
      thought it is an acceptable change other than noise by warn_alloc()
      because !__GFP_NOFAIL allocations are allowed to fail.  But we
      overlooked that failing memory allocation from page fault path makes
      difference due to the race window explained above.
      
      While it might be possible to add a check to pagefault_out_of_memory()
      that prevents the first victim from calling out_of_memory() or remove
      out_of_memory() from pagefault_out_of_memory(), changing
      pagefault_out_of_memory() does not suppress noise by warn_alloc() when
      allocating thread was selected as an OOM victim.  There is little point
      with printing similar backtraces and memory information from both
      out_of_memory() and warn_alloc().
      
      Instead, if we guarantee that current thread can try allocations with no
      watermarks once when current thread looping inside
      __alloc_pages_slowpath() was selected as an OOM victim, we can follow "who
      can use memory reserves" rules and suppress noise by warn_alloc() and
      prevent memory allocations from page fault path from calling
      pagefault_out_of_memory().
      
      If we take the comment literally, this patch would do
      
        -    if (test_thread_flag(TIF_MEMDIE))
        -        goto nopage;
        +    if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
        +        goto nopage;
      
      because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
      given.  But if I recall correctly (I couldn't find the message), the
      condition is meant to apply to only OOM victims despite the comment.
      Therefore, this patch preserves TIF_MEMDIE check.
      
      Fixes: 9a67f648 ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
      Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jpSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarRoman Gushchin <guro@fb.com>
      Tested-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.11]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c288983d
    • Nicolas Iooss's avatar
      pcmcia: remove left-over %Z format · ff5a2016
      Nicolas Iooss authored
      Commit 5b5e0928 ("lib/vsprintf.c: remove %Z support") removed some
      usages of format %Z but forgot "%.2Zx".  This makes clang 4.0 reports a
      -Wformat-extra-args warning because it does not know about %Z.
      
      Replace %Z with %z.
      
      Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.orgSigned-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Harald Welte <laforge@gnumonks.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.11+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ff5a2016
    • Thomas Gleixner's avatar
      slub/memcg: cure the brainless abuse of sysfs attributes · 478fe303
      Thomas Gleixner authored
      memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
      to propagate settings from the root kmem_cache to a newly created
      kmem_cache.  It does that with:
      
           attr->show(root, buf);
           attr->store(new, buf, strlen(bug);
      
      Aside of being a lazy and absurd hackery this is broken because it does
      not check the return value of the show() function.
      
      Some of the show() functions return 0 w/o touching the buffer.  That
      means in such a case the store function is called with the stale content
      of the previous show().  That causes nonsense like invoking
      kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
      would cause handing in an uninitialized buffer.
      
      This should be rewritten proper by adding a propagate() callback to
      those slub_attributes which must be propagated and avoid that insane
      conversion to and from ASCII, but that's too large for a hot fix.
      
      Check at least the return value of the show() function, so calling
      store() with stale content is prevented.
      
      Steven said:
       "It can cause a deadlock with get_online_cpus() that has been uncovered
        by recent cpu hotplug and lockdep changes that Thomas and Peter have
        been doing.
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(cpu_hotplug.lock);
                                         lock(slab_mutex);
                                         lock(cpu_hotplug.lock);
            lock(slab_mutex);
      
           *** DEADLOCK ***"
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      478fe303
    • Florian Fainelli's avatar
      initramfs: fix disabling of initramfs (and its compression) · 57ddfdaa
      Florian Fainelli authored
      Commit db2aa7fd ("initramfs: allow again choice of the embedded
      initram compression algorithm") introduced the possibility to select the
      initramfs compression algorithm from Kconfig and while this is a nice
      feature it broke the use case described below.
      
      Here is what my build system does:
      
       - kernel is initially configured not to have an initramfs included
      
       - build the user space root file system
      
       - re-configure the kernel to have an initramfs included
         (CONFIG_INITRAMFS_SOURCE="/path/to/romfs") and set relevant
         CONFIG_INITRAMFS options, in my case, no compression option
         (CONFIG_INITRAMFS_COMPRESSION_NONE)
      
       - kernel is re-built with these options -> kernel+initramfs image is
         copied
      
       - kernel is re-built again without these options -> kernel image is
         copied
      
      Building a kernel without an initramfs means setting this option:
      
        CONFIG_INITRAMFS_SOURCE="" (and this one only)
      
      whereas building a kernel with an initramfs means setting these options:
      
        CONFIG_INITRAMFS_SOURCE="/home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev"
        CONFIG_INITRAMFS_ROOT_UID=1000
        CONFIG_INITRAMFS_ROOT_GID=1000
        CONFIG_INITRAMFS_COMPRESSION_NONE=y
        CONFIG_INITRAMFS_COMPRESSION=""
      
      Commit db2aa7fd ("initramfs: allow again choice of the embedded
      initram compression algorithm") is problematic because
      CONFIG_INITRAMFS_COMPRESSION which is used to determine the
      initramfs_data.cpio extension/compression is a string, and due to how
      Kconfig works it will evaluate in order, how to assign it.
      
      Setting CONFIG_INITRAMFS_COMPRESSION_NONE with CONFIG_INITRAMFS_SOURCE=""
      cannot possibly work (because of the depends on INITRAMFS_SOURCE!=""
      imposed on CONFIG_INITRAMFS_COMPRESSION ) yet we still get
      CONFIG_INITRAMFS_COMPRESSION assigned to ".gz" because CONFIG_RD_GZIP=y
      is set in my kernel, even when there is no initramfs being built.
      
      So we basically end-up generating two initramfs_data.cpio* files, one
      without extension, and one with .gz.  This causes usr/Makefile to track
      usr/initramfs_data.cpio.gz, and not usr/initramfs_data.cpio anymore,
      that is also largely problematic after 9e3596b0 ("kbuild:
      initramfs cleanup, set target from Kconfig") because we used to track
      all possible initramfs_data files in the $(targets) variable before that
      commit.
      
      The end result is that the kernel with an initramfs clearly does not
      contain what we expect it to, it has a stale initramfs_data.cpio file
      built into it, and we keep re-generating an initramfs_data.cpio.gz file
      which is not the one that we want to include in the kernel image proper.
      
      The fix consists in hiding CONFIG_INITRAMFS_COMPRESSION when
      CONFIG_INITRAMFS_SOURCE="".  This puts us back in a state to the
      pre-4.10 behavior where we can properly disable and re-enable initramfs
      within the same kernel .config file, and be in control of what
      CONFIG_INITRAMFS_COMPRESSION is set to.
      
      Fixes: db2aa7fd ("initramfs: allow again choice of the embedded initram compression algorithm")
      Fixes: 9e3596b0 ("kbuild: initramfs cleanup, set target from Kconfig")
      Link: http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.comSigned-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Cc: P J P <ppandit@redhat.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      57ddfdaa
    • Michal Hocko's avatar
      mm: clarify why we want kmalloc before falling backto vmallock · 4f4f2ba9
      Michal Hocko authored
      While converting drm_[cm]alloc* helpers to kvmalloc* variants Chris
      Wilson has wondered why we want to try kmalloc before vmalloc fallback
      even for larger allocations requests.  Let's clarify that one larger
      physically contiguous block is less likely to fragment memory than many
      scattered pages which can prevent more large blocks from being created.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170517080932.21423-1-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Suggested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4f4f2ba9
    • Matthias Kaehlcke's avatar
      frv: declare jiffies to be located in the .data section · 60b0a8c3
      Matthias Kaehlcke authored
      Commit 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with
      ____cacheline_aligned_in_smp") removed a section specification from the
      jiffies declaration that caused conflicts on some platforms.
      
      Unfortunately this change broke the build for frv:
      
        kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
            symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
        ...
      
      Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
      include the section specification.  For all other platforms
      __jiffy_arch_data (currently) has no effect.
      
      Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
      Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.orgSigned-off-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      60b0a8c3
    • Michal Hocko's avatar
      include/linux/gfp.h: fix ___GFP_NOLOCKDEP value · 1bde33e0
      Michal Hocko authored
      Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
      the time commit 7e784422 ("lockdep: allow to disable reclaim lockup
      detection") was written we still had __GFP_OTHER_NODE but I have removed
      it in commit 41b6167e ("mm: get rid of __GFP_OTHER_NODE") and forgot
      to lower the bit value.
      
      The current value is outside of __GFP_BITS_SHIFT so it cannot be used
      actually.
      
      Fixes: 7e784422 ("lockdep: allow to disable reclaim lockup detection")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarIgor Stoppa <igor.stoppa@nokia.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1bde33e0
    • Andrea Arcangeli's avatar
      ksm: prevent crash after write_protect_page fails · a7306c34
      Andrea Arcangeli authored
      "err" needs to be left set to -EFAULT if split_huge_page succeeds.
      Otherwise if "err" gets clobbered with zero and write_protect_page
      fails, try_to_merge_one_page() will succeed instead of returning -EFAULT
      and then try_to_merge_with_ksm_page() will continue thinking kpage is a
      PageKsm when in fact it's still an anonymous page.  Eventually it'll
      crash in page_add_anon_rmap.
      
      This has been reproduced on Fedora25 kernel but I can reproduce with
      upstream too.
      
      The bug was introduced in commit f765f540 ("ksm: prepare to new THP
      semantics") introduced in v4.5.
      
          page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673
          flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked)
          page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
          page->mem_cgroup:ffffa09674bf0000
          ------------[ cut here ]------------
          kernel BUG at mm/rmap.c:1222!
          CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1
          RIP: do_page_add_anon_rmap+0x1c4/0x240
          Call Trace:
            page_add_anon_rmap+0x18/0x20
            try_to_merge_with_ksm_page+0x50b/0x780
            ksm_scan_thread+0x1211/0x1410
            ? prepare_to_wait_event+0x100/0x100
            ? try_to_merge_with_ksm_page+0x780/0x780
            kthread+0xd9/0xf0
            ? kthread_park+0x60/0x60
            ret_from_fork+0x25/0x30
      
      Fixes: f765f540 ("ksm: prepare to new THP semantics")
      Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.comSigned-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarFederico Simoncelli <fsimonce@redhat.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a7306c34
    • Linus Torvalds's avatar
      Merge tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c531577b
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This contains the fixes for a few reported regression for HD-audio and
        USB-audio. All small, trivial, and boring"
      
      * tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Fix applying MSI dual-codec mobo quirk
        ALSA: usb: Avoid VLA in mixer_us16x08.c
        ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
        Revert "ALSA: usb-audio: purge needless variable length array"
      c531577b
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma · f8e72db3
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "Here is the dmaengine fixes request for 4.12. Fixes bunch of issues in
        the driver, npthing exciting though..
      
         - mv_xor_v2 driver fixes for handling descriptors, tx_submit
           implementation, removing interrupt coalescing and setting DMA mask
           properly
      
         - fix usb-dmac DMAOR AE bit definition
      
         - fix ep93xx start buffer from BASE0 and not drain the transfers in
           terminate_all
      
         - fix rcar-dmac to use right descriptor pointer for residue
           calculation
      
         - pl330 fix warn for irq freeup"
      
      * tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: pl330: fix warning in pl330_remove
        rcar-dmac: fixup descriptor pointer for descriptor mode
        dmaengine: ep93xx: Don't drain the transfers in terminate_all()
        dmaengine: ep93xx: Always start from BASE0
        dmaengine: usb-dmac: Fix DMAOR AE bit definition
        dmaengine: mv_xor_v2: set DMA mask to 40 bits
        dmaengine: mv_xor_v2: remove interrupt coalescing
        dmaengine: mv_xor_v2: fix tx_submit() implementation
        dmaengine: mv_xor_v2: enable XOR engine after its configuration
        dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
        dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
        dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
      f8e72db3
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 6df62e79
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - corner-case oops fixes for Asus and Wacom drivers from Carlo Caione
         and Jason Gerecke
      
       - power management fix (reported on SIS0817 touchscreen) for i2c-hid
         devices from Hans de Goede
      
       - device-id-specific fixes and quirks from Hans de Goede, Diego Elio
         Pettenò and Che-Liang Chiou
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: asus: Stop underlying hardware on remove
        HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
        HID: asus: Add support for T100 keyboard
        HID: elecom: extend to fix the descriptor for DEFT trackballs
        HID: magicmouse: Set multi-touch keybits for Magic Mouse
        HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
      6df62e79
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 035f1456
      Linus Torvalds authored
      Pull livepatching fix from Jiri Kosina:
       "Kconfig dependency fix for livepatching infrastructure from Miroslav
        Benes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
      035f1456
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f2a025de
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - revert a broken PAT commit that broke a number of systems
      
         - fix two preemptability warnings/bugs that can trigger under certain
           circumstances, in the debug code and in the microcode loader"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
        x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
        x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug
      f2a025de
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f56f88ee
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Misc fixes:
      
         - three boot crash fixes for uncommon configurations
      
         - silence a boot warning under virtualization
      
         - plus a GCC 7 related (harmless) build warning fix"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
        x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
        x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
        efi: Remove duplicate 'const' specifiers
        efi: Don't issue error message when booted under Xen
      f56f88ee
    • Carlo Caione's avatar
      HID: asus: Stop underlying hardware on remove · 715e944f
      Carlo Caione authored
      We are missing a call to hid_hw_stop() on the remove hook.
      Among other things this is causing an Oops when (re-)starting GNOME /
      upowerd / ... after the module has been already rmmod-ed.
      Signed-off-by: default avatarCarlo Caione <carlo@endlessm.com>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      715e944f
    • Jean-Philippe Brucker's avatar
      dmaengine: pl330: fix warning in pl330_remove · ebcdaee4
      Jean-Philippe Brucker authored
      When removing a device with less than 9 IRQs (AMBA_NR_IRQS), we'll get a
      big WARN_ON from devres.c because pl330_remove calls devm_free_irqs for
      unallocated irqs. Similarly to pl330_probe, check that IRQ number is
      present before calling devm_free_irq.
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      ebcdaee4
  2. 01 Jun, 2017 14 commits
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux · 3b1e342b
      Linus Torvalds authored
      Pull nfsd fixes from Bruce Fields:
       "Revert patch accidentally included in the merge window pull request,
        and fix a crash that was likely a result of buggy client behavior"
      
      * tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux:
        nfsd4: fix null dereference on replay
        nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"
      3b1e342b
    • Linus Torvalds's avatar
      Merge tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 2f48641c
      Linus Torvalds authored
      Pull gcc-plugin prepwork from Kees Cook:
       "Use designated initializers for mtk-vcodec, powerplay, amdgpu, and
        sgi-xp. Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs,
        and NFS.
      
        Christoph Hellwig recommended that I send these fixes now, rather than
        waiting for the v4.13 merge window. These are all initializer and cast
        fixes needed for the future randstruct plugin that haven't been picked
        up by the respective maintainers"
      
      * tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mtk-vcodec: Use designated initializers
        drm/amd/powerplay: Use designated initializers
        drm/amdgpu: Use designated initializers
        sgi-xp: Use designated initializers
        ocfs2: Use ERR_CAST() to avoid cross-structure cast
        ntfs: Use ERR_CAST() to avoid cross-structure cast
        NFS: Use ERR_CAST() to avoid cross-structure cast
      2f48641c
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 9ea15a59
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Many small x86 bug fixes: SVM segment registers access rights, nested
        VMX, preempt notifiers, LAPIC virtual wire mode, NMI injection"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Fix nmi injection failure when vcpu got blocked
        KVM: SVM: do not zero out segment attributes if segment is unusable or not present
        KVM: SVM: ignore type when setting segment registers
        KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
        KVM: x86: Fix virtual wire mode
        KVM: nVMX: Fix handling of lmsw instruction
        KVM: X86: Fix preempt the preemption timer cancel
      9ea15a59
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 0bb23039
      Linus Torvalds authored
      Pull Reiserfs and GFS2 fixes from Jan Kara:
       "Fixes to GFS2 & Reiserfs for the fallout of the recent WRITE_FUA
        cleanup from Christoph.
      
        Fixes for other filesystems were already merged by respective
        maintainers."
      
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        reiserfs: Make flush bios explicitely sync
        gfs2: Make flush bios explicitely sync
      0bb23039
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 393bcfae
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are the target-pending fixes for v4.12-rc4:
      
         - ibmviscsis ABORT_TASK handling fixes that missed the v4.12 merge
           window. (Bryant Ly and Michael Cyr)
      
         - Re-add a target-core check enforcing WRITE overflow reject that was
           relaxed in v4.3, to avoid unsupported iscsi-target immediate data
           overflow. (nab)
      
         - Fix a target-core-user OOPs during device removal. (MNC + Bryant
           Ly)
      
         - Fix a long standing iscsi-target potential issue where kthread exit
           did not wait for kthread_should_stop(). (Jiang Yi)
      
         - Fix a iscsi-target v3.12.y regression OOPs involving initial login
           PDU processing during asynchronous TCP connection close. (MNC +
           nab)
      
        This is a little larger than usual for an -rc4, primarily due to the
        iscsi-target v3.12.y regression OOPs bug-fix.
      
        However, it's an important patch as MNC + Hannes where both able to
        trigger it using a reduced iscsi initiator login timeout combined with
        a backend taking a long time to complete I/Os during iscsi login
        driven session reinstatement"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        iscsi-target: Always wait for kthread_should_stop() before kthread exit
        iscsi-target: Fix initial login PDU asynchronous socket close OOPs
        tcmu: fix crash during device removal
        target: Re-add check to reject control WRITEs with overflow data
        ibmvscsis: Fix the incorrect req_lim_delta
        ibmvscsis: Clear left-over abort_cmd pointers
      393bcfae
    • Ingo Molnar's avatar
      Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT" · c08d5174
      Ingo Molnar authored
      This reverts commit cbed27cd.
      
      As Andy Lutomirski observed:
      
       "I think this patch is bogus. pat_enabled() sure looks like it's
        supposed to return true if PAT is *enabled*, and these days PAT is
        'enabled' even if there's no HW PAT support."
      Reported-by: default avatarBernhard Held <berny156@gmx.de>
      Reported-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org # v4.2+
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c08d5174
    • ZhuangYanying's avatar
      KVM: x86: Fix nmi injection failure when vcpu got blocked · 47a66eed
      ZhuangYanying authored
      When spin_lock_irqsave() deadlock occurs inside the guest, vcpu threads,
      other than the lock-holding one, would enter into S state because of
      pvspinlock. Then inject NMI via libvirt API "inject-nmi", the NMI could
      not be injected into vm.
      
      The reason is:
      1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and sets
      cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile.
      2 It sets nmi_queued to 0 in process_nmi(), before entering guest, because
      cpu->kvm_vcpu_dirty is true.
      
      It's not enough just to check nmi_queued to decide whether to stay in
      vcpu_block() or not. NMI should be injected immediately at any situation.
      Add checking nmi_pending, and testing KVM_REQ_NMI replaces nmi_queued
      in vm_vcpu_has_events().
      
      Do the same change for SMIs.
      Signed-off-by: default avatarZhuang Yanying <ann.zhuangyanying@huawei.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      47a66eed
    • Roman Pen's avatar
      KVM: SVM: do not zero out segment attributes if segment is unusable or not present · d9c1b543
      Roman Pen authored
      This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
      was taken on userspace stack.  The root cause lies in the specific AMD CPU
      behaviour which manifests itself as unusable segment attributes on SYSRET.
      The corresponding work around for the kernel is the following:
      
      61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")
      
      In other turn virtualization side treated unusable segment incorrectly and
      restored CPL from SS attributes, which were zeroed out few lines above.
      
      In current patch it is assured only that P bit is cleared in VMCB.save state
      and segment attributes are not zeroed out if segment is not presented or is
      unusable, therefore CPL can be safely restored from DPL field.
      
      This is only one part of the fix, since QEMU side should be fixed accordingly
      not to zero out attributes on its side.  Corresponding patch will follow.
      
      [1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com
      Signed-off-by: default avatarRoman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: default avatarMikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim KrÄmáŠ<rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d9c1b543
    • Takashi Iwai's avatar
      ALSA: hda - Fix applying MSI dual-codec mobo quirk · d2c3b14e
      Takashi Iwai authored
      The previous commit [63691587: ALSA: hda - Apply dual-codec quirk
      for MSI Z270-Gaming mobo] attempted to apply the existing dual-codec
      quirk for a MSI mobo.  But it turned out that this isn't applied
      properly due to the MSI-vendor quirk before this entry.  I overlooked
      such two MSI entries just because they were put in the wrong position,
      although we have a list ordered by PCI SSID numbers.
      
      This patch fixes it by rearranging the unordered entries.
      
      Fixes: 63691587 ("ALSA: hda - Apply dual-codec quirk for MSI Z270-Gaming mobo")
      Reported-by: default avatarRudolf Schmidt <info@rudolfschmidt.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      d2c3b14e
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux · a3748463
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This is the main set of fixes for rc4, one amdgpu fix, some exynos
        regression fixes, some msm fixes and some i915 and GVT fixes.
      
        I've got a second regression fix for some DP chips that might be a
        bit large, but I think we'd like to land it now, I'll send it along
        tomorrow, once you are happy with this set"
      
      * tag 'drm-fixes-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux: (24 commits)
        drm/amdgpu: Program ring for vce instance 1 at its register space
        drm/exynos: clean up description of exynos_drm_crtc
        drm/exynos: dsi: Remove bridge node reference in removal
        drm/exynos: dsi: Fix the parse_dt function
        drm/exynos: Merge pre/postclose hooks
        drm/msm: Fix the check for the command size
        drm/msm: Take the mutex before calling msm_gem_new_impl
        drm/msm: for array in-fences, check if all backing fences are from our own context before waiting
        drm/msm: constify irq_domain_ops
        drm/msm/mdp5: release hwpipe(s) for unused planes
        drm/msm: Reuse dma_fence_release.
        drm/msm: Expose our reservation object when exporting a dmabuf.
        drm/msm/gpu: check legacy clk names in get_clocks()
        drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
        drm/msm: select PM_OPP
        drm/i915: Stop pretending to mask/unmask LPE audio interrupts
        drm/i915/selftests: Silence compiler warning in igt_ctx_exec
        Revert "drm/i915: Restore lost "Initialized i915" welcome message"
        drm/i915/gvt: clean up unsubmited workloads before destroying kmem cache
        drm/i915/gvt: Disable compression workaround for Gen9
        ...
      a3748463
    • Dave Airlie's avatar
      Merge tag 'exynos-drm-fixes-for-v4.12' of... · 400129f0
      Dave Airlie authored
      Merge tag 'exynos-drm-fixes-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
      
      - Fix a regression to description of exynos_drm_crtc
      - Remove preclose hook of Exynos
        . This was a exynos change of the patch series[1] merged already.
      - Fix one dt broken issue
      - Make sure to release bridge_node of Exynos MIPI-DSI driver.
      
      [1] https://lists.freedesktop.org/archives/dri-devel/2017-March/135111.html
      
      * tag 'exynos-drm-fixes-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
        drm/exynos: clean up description of exynos_drm_crtc
        drm/exynos: dsi: Remove bridge node reference in removal
        drm/exynos: dsi: Fix the parse_dt function
        drm/exynos: Merge pre/postclose hooks
      400129f0
    • Dave Airlie's avatar
      Merge branch 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 8ef6fcc8
      Dave Airlie authored
      * 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux:
        drm/amdgpu: Program ring for vce instance 1 at its register space
      8ef6fcc8
    • Dave Airlie's avatar
      Merge branch 'msm-fixes-4.12-rc4' of git://people.freedesktop.org/~robclark/linux into drm-fixes · 58b58f6e
      Dave Airlie authored
      a few fixes for 4.12..
      
      * 'msm-fixes-4.12-rc4' of git://people.freedesktop.org/~robclark/linux:
        drm/msm: Fix the check for the command size
        drm/msm: Take the mutex before calling msm_gem_new_impl
        drm/msm: for array in-fences, check if all backing fences are from our own context before waiting
        drm/msm: constify irq_domain_ops
        drm/msm/mdp5: release hwpipe(s) for unused planes
        drm/msm: Reuse dma_fence_release.
        drm/msm: Expose our reservation object when exporting a dmabuf.
        drm/msm/gpu: check legacy clk names in get_clocks()
        drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
        drm/msm: select PM_OPP
      58b58f6e
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2017-05-29' of... · 25f480e8
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
      
      drm/i915 fixes for v4.12-rc4
      
      * tag 'drm-intel-fixes-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel:
        drm/i915: Stop pretending to mask/unmask LPE audio interrupts
        drm/i915/selftests: Silence compiler warning in igt_ctx_exec
        Revert "drm/i915: Restore lost "Initialized i915" welcome message"
        drm/i915/gvt: clean up unsubmited workloads before destroying kmem cache
        drm/i915/gvt: Disable compression workaround for Gen9
        drm/i915: set initialised only when init_context callback is NULL
        drm/i915: Fix new -Wint-in-bool-context gcc compiler warning
        drm/i915: use vma->size for appgtt allocate_va_range
        drm/i915: Do not sync RCU during shrinking
      25f480e8
  3. 31 May, 2017 4 commits
    • Jiang Yi's avatar
      iscsi-target: Always wait for kthread_should_stop() before kthread exit · 5e0cf5e6
      Jiang Yi authored
      There are three timing problems in the kthread usages of iscsi_target_mod:
      
       - np_thread of struct iscsi_np
       - rx_thread and tx_thread of struct iscsi_conn
      
      In iscsit_close_connection(), it calls
      
       send_sig(SIGINT, conn->tx_thread, 1);
       kthread_stop(conn->tx_thread);
      
      In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
      SIGINT the kthread will exit without checking the return value of
      kthread_should_stop().
      
      So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
      and kthread_stop(...), the kthread_stop() will try to stop an already
      stopped kthread.
      
      This is invalid according to the documentation of kthread_stop().
      
      (Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
       early iscsi_target_rx_thread failure case - nab)
      Signed-off-by: default avatarJiang Yi <jiangyilism@gmail.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      5e0cf5e6
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 25cdda95
      Nicholas Bellinger authored
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      25cdda95
    • Leo Liu's avatar
      drm/amdgpu: Program ring for vce instance 1 at its register space · 45cc6586
      Leo Liu authored
      We need program ring buffer on instance 1 register space domain,
      when only if instance 1 available, with two instances or instance 0,
      and we need only program instance 0 regsiter space domain for ring.
      Signed-off-by: default avatarLeo Liu <leo.liu@amd.com>
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      45cc6586
    • Linus Torvalds's avatar
      Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · d602fb68
      Linus Torvalds authored
      Pull overlayfs fixes from Miklos Szeredi:
       "Fix regressions:
      
         - missing CONFIG_EXPORTFS dependency
      
         - failure if upper fs doesn't support xattr
      
         - bad error cleanup
      
        This also adds the concept of "impure" directories complementing the
        "origin" marking introduced in -rc1. Together they enable getting
        consistent st_ino and d_ino for directory listings.
      
        And there's a bug fix and a cleanup as well"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: filter trusted xattr for non-admin
        ovl: mark upper merge dir with type origin entries "impure"
        ovl: mark upper dir with type origin entries "impure"
        ovl: remove unused arg from ovl_lookup_temp()
        ovl: handle rename when upper doesn't support xattr
        ovl: don't fail copy-up if upper doesn't support xattr
        ovl: check on mount time if upper fs supports setting xattr
        ovl: fix creds leak in copy up error path
        ovl: select EXPORTFS
      d602fb68