1. 25 Feb, 2017 22 commits
    • mm, page_alloc: only use per-cpu allocator for irq-safe requests · 374ad05a
      Mel Gorman authored
      Many workloads that allocate pages are not handling an interrupt at the
      time.  As allocation requests may come from IRQ context, it's necessary
      to disable/enable IRQs for every page allocation.  This cost is the bulk
      of the free path but also a significant percentage of the allocation path.
      
      This patch alters the locking and checks such that only irq-safe
      allocation requests use the per-cpu allocator.  All others acquire the
      irq-safe zone->lock and allocate from the buddy allocator.  It relies on
      disabling preemption to safely access the per-cpu structures.  It could
      be slightly modified to avoid soft IRQs using it but it's not clear it's
      worthwhile.
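
      As a rough illustration of the rule described above, the following
      user-space sketch models which path a request would take.  The names
      alloc_ctx, alloc_path and choose_alloc_path are hypothetical and exist
      only for this example; the real decision is made inside the page
      allocator and is not reproduced here.

      ```c
      #include <stdbool.h>

      /* Hypothetical model of the calling context; not a kernel structure. */
      struct alloc_ctx {
          bool in_interrupt;   /* true when the request comes from IRQ context */
          unsigned int order;  /* allocation order; 0 means a single page */
      };

      enum alloc_path {
          PATH_PER_CPU_LIST,   /* protected by disabling preemption only */
          PATH_ZONE_LOCK       /* irq-safe zone->lock, buddy allocator */
      };

      /*
       * Only order-0 requests made outside interrupt context are served
       * from the per-cpu lists; everything else goes to the buddy allocator.
       */
      static enum alloc_path choose_alloc_path(const struct alloc_ctx *ctx)
      {
          if (ctx->order == 0 && !ctx->in_interrupt)
              return PATH_PER_CPU_LIST;
          return PATH_ZONE_LOCK;
      }
      ```

      In this model a page fault (order 0, not in interrupt) takes the per-cpu
      path, while the same request issued from an interrupt handler falls back
      to the zone lock.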
      
      This modification may slow allocations from IRQ context slightly but the
      main gain from the per-cpu allocator is that it scales better for
      allocations from multiple contexts.  There is an implicit assumption
      that intensive allocations from IRQ contexts on multiple CPUs from a
      single NUMA node are rare and that the vast majority of scaling issues
      are encountered in !IRQ contexts such as page faulting.  It's worth
      noting that this patch is not required for a bulk page allocator but it
      significantly reduces the overhead.
      
      The following are results from a page allocator micro-benchmark.  Only
      order-0 is interesting as higher orders do not use the per-cpu allocator.
      
                                                4.10.0-rc2                 4.10.0-rc2
                                                   vanilla               irqsafe-v1r5
      Amean    alloc-odr0-1               287.15 (  0.00%)           219.00 ( 23.73%)
      Amean    alloc-odr0-2               221.23 (  0.00%)           183.23 ( 17.18%)
      Amean    alloc-odr0-4               187.00 (  0.00%)           151.38 ( 19.05%)
      Amean    alloc-odr0-8               167.54 (  0.00%)           132.77 ( 20.75%)
      Amean    alloc-odr0-16              156.00 (  0.00%)           123.00 ( 21.15%)
      Amean    alloc-odr0-32              149.00 (  0.00%)           118.31 ( 20.60%)
      Amean    alloc-odr0-64              138.77 (  0.00%)           116.00 ( 16.41%)
      Amean    alloc-odr0-128             145.00 (  0.00%)           118.00 ( 18.62%)
      Amean    alloc-odr0-256             136.15 (  0.00%)           125.00 (  8.19%)
      Amean    alloc-odr0-512             147.92 (  0.00%)           121.77 ( 17.68%)
      Amean    alloc-odr0-1024            147.23 (  0.00%)           126.15 ( 14.32%)
      Amean    alloc-odr0-2048            155.15 (  0.00%)           129.92 ( 16.26%)
      Amean    alloc-odr0-4096            164.00 (  0.00%)           136.77 ( 16.60%)
      Amean    alloc-odr0-8192            166.92 (  0.00%)           138.08 ( 17.28%)
      Amean    alloc-odr0-16384           159.00 (  0.00%)           138.00 ( 13.21%)
      Amean    free-odr0-1                165.00 (  0.00%)            89.00 ( 46.06%)
      Amean    free-odr0-2                113.00 (  0.00%)            63.00 ( 44.25%)
      Amean    free-odr0-4                 99.00 (  0.00%)            54.00 ( 45.45%)
      Amean    free-odr0-8                 88.00 (  0.00%)            47.38 ( 46.15%)
      Amean    free-odr0-16                83.00 (  0.00%)            46.00 ( 44.58%)
      Amean    free-odr0-32                80.00 (  0.00%)            44.38 ( 44.52%)
      Amean    free-odr0-64                72.62 (  0.00%)            43.00 ( 40.78%)
      Amean    free-odr0-128               78.00 (  0.00%)            42.00 ( 46.15%)
      Amean    free-odr0-256               80.46 (  0.00%)            57.00 ( 29.16%)
      Amean    free-odr0-512               96.38 (  0.00%)            64.69 ( 32.88%)
      Amean    free-odr0-1024             107.31 (  0.00%)            72.54 ( 32.40%)
      Amean    free-odr0-2048             108.92 (  0.00%)            78.08 ( 28.32%)
      Amean    free-odr0-4096             113.38 (  0.00%)            82.23 ( 27.48%)
      Amean    free-odr0-8192             112.08 (  0.00%)            82.85 ( 26.08%)
      Amean    free-odr0-16384            110.38 (  0.00%)            81.92 ( 25.78%)
      Amean    total-odr0-1               452.15 (  0.00%)           308.00 ( 31.88%)
      Amean    total-odr0-2               334.23 (  0.00%)           246.23 ( 26.33%)
      Amean    total-odr0-4               286.00 (  0.00%)           205.38 ( 28.19%)
      Amean    total-odr0-8               255.54 (  0.00%)           180.15 ( 29.50%)
      Amean    total-odr0-16              239.00 (  0.00%)           169.00 ( 29.29%)
      Amean    total-odr0-32              229.00 (  0.00%)           162.69 ( 28.96%)
      Amean    total-odr0-64              211.38 (  0.00%)           159.00 ( 24.78%)
      Amean    total-odr0-128             223.00 (  0.00%)           160.00 ( 28.25%)
      Amean    total-odr0-256             216.62 (  0.00%)           182.00 ( 15.98%)
      Amean    total-odr0-512             244.31 (  0.00%)           186.46 ( 23.68%)
      Amean    total-odr0-1024            254.54 (  0.00%)           198.69 ( 21.94%)
      Amean    total-odr0-2048            264.08 (  0.00%)           208.00 ( 21.24%)
      Amean    total-odr0-4096            277.38 (  0.00%)           219.00 ( 21.05%)
      Amean    total-odr0-8192            279.00 (  0.00%)           220.92 ( 20.82%)
      Amean    total-odr0-16384           269.38 (  0.00%)           219.92 ( 18.36%)
      
      This is the alloc, free and total overhead of allocating order-0 pages
      in batches of 1 page up to 16384 pages.  Avoiding the cost of
      disabling/enabling IRQs massively reduces overhead.  Alloc overhead is
      reduced by roughly 14-20% in most cases.  The free path is reduced by
      26-46% and the total reduction is significant.
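
      The percentages in the table can be reproduced from the two Amean
      columns.  A minimal sketch, assuming the figure in parentheses is the
      relative reduction against the vanilla kernel (percent_reduction is a
      name made up for this example):

      ```c
      /* Relative reduction, in percent, of `patched` against `baseline`. */
      static double percent_reduction(double baseline, double patched)
      {
          return (baseline - patched) / baseline * 100.0;
      }
      ```

      For the alloc-odr0-1 row, percent_reduction(287.15, 219.00) comes out at
      about 23.73, and for free-odr0-1, percent_reduction(165.00, 89.00) at
      about 46.06, matching the table.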
      
      Many users require zeroing of pages from the page allocator, which
      accounts for most of the cost of allocation.  Hence, the impact on a
      basic page faulting benchmark is not that significant:
      
                                    4.10.0-rc2            4.10.0-rc2
                                       vanilla          irqsafe-v1r5
      Hmean    page_test   656632.98 (  0.00%)   675536.13 (  2.88%)
      Hmean    brk_test   3845502.67 (  0.00%)  3867186.94 (  0.56%)
      Stddev   page_test    10543.29 (  0.00%)     4104.07 ( 61.07%)
      Stddev   brk_test     33472.36 (  0.00%)    15538.39 ( 53.58%)
      CoeffVar page_test        1.61 (  0.00%)        0.61 ( 62.15%)
      CoeffVar brk_test         0.87 (  0.00%)        0.40 ( 53.84%)
      Max      page_test   666513.33 (  0.00%)   678640.00 (  1.82%)
      Max      brk_test   3882800.00 (  0.00%)  3887008.66 (  0.11%)
      
      This is from aim9 and the most notable outcome is that fault variability
      is reduced by the patch.  The headline improvement is small as the
      overall fault cost, zeroing, page table insertion etc dominate relative
      to disabling/enabling IRQs in the per-cpu allocator.
      
      Similarly, little benefit was seen on networking benchmarks both
      localhost and between physical server/clients where other costs
      dominate.  It's possible that this will only be noticeable on very high
      speed networks.
      
      Jesper Dangaard Brouer independently tested this with a separate
      microbenchmark from
        https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
      
      Micro-benchmarked with [1] page_bench02:
       modprobe page_bench02 page_order=0 run_flags=$((2#010)) loops=$((10**8)); \
        rmmod page_bench02 ; dmesg --notime | tail -n 4
      
      Compared to baseline: 213 cycles(tsc) 53.417 ns
       - against this     : 184 cycles(tsc) 46.056 ns
       - Saving           : -29 cycles
       - Very close to expected 27 cycles saving [see below [2]]
      
      Micro benchmarking via time_bench_sample[3], we get the cost of these
      operations:
      
       time_bench: Type:for_loop                 Per elem: 0 cycles(tsc) 0.232 ns (step:0)
       time_bench: Type:spin_lock_unlock         Per elem: 33 cycles(tsc) 8.334 ns (step:0)
       time_bench: Type:spin_lock_unlock_irqsave Per elem: 62 cycles(tsc) 15.607 ns (step:0)
       time_bench: Type:irqsave_before_lock      Per elem: 57 cycles(tsc) 14.344 ns (step:0)
       time_bench: Type:spin_lock_unlock_irq     Per elem: 34 cycles(tsc) 8.560 ns (step:0)
       time_bench: Type:simple_irq_disable_before_lock Per elem: 37 cycles(tsc) 9.289 ns (step:0)
       time_bench: Type:local_BH_disable_enable  Per elem: 19 cycles(tsc) 4.920 ns (step:0)
       time_bench: Type:local_IRQ_disable_enable Per elem: 7 cycles(tsc) 1.864 ns (step:0)
       time_bench: Type:local_irq_save_restore   Per elem: 38 cycles(tsc) 9.665 ns (step:0)
       [Mel's patch removes a ^^^^^^^^^^^^^^^^]            ^^^^^^^^^ expected saving - preempt cost
       time_bench: Type:preempt_disable_enable   Per elem: 11 cycles(tsc) 2.794 ns (step:0)
       [adds a preempt  ^^^^^^^^^^^^^^^^^^^^^^]            ^^^^^^^^^ adds this cost
       time_bench: Type:funcion_call_cost        Per elem: 6 cycles(tsc) 1.689 ns (step:0)
       time_bench: Type:func_ptr_call_cost       Per elem: 11 cycles(tsc) 2.767 ns (step:0)
       time_bench: Type:page_alloc_put           Per elem: 211 cycles(tsc) 52.803 ns (step:0)
      
      Thus, expected improvement is: 38-11 = 27 cycles.
      
      [mgorman@techsingularity.net: s/preempt_enable_no_resched/preempt_enable/]
        Link: http://lkml.kernel.org/r/20170208143128.25ahymqlyspjcixu@techsingularity.net
      Link: http://lkml.kernel.org/r/20170123153906.3122-5-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: do not depend on cpu hotplug locks inside the allocator · a459eeb7
      Michal Hocko authored
      Dmitry has reported the following lockdep splat
        lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
        __mutex_lock_common kernel/locking/mutex.c:521 [inline]
        mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
        pcpu_alloc+0xbda/0x1280 mm/percpu.c:896
        __alloc_percpu+0x24/0x30 mm/percpu.c:1075
        smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44
        cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136
        cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493
        _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057
        do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
        cpu_up+0x18/0x20 kernel/cpu.c:1095
        smp_init+0xe9/0xee kernel/smp.c:564
        kernel_init_freeable+0x439/0x690 init/main.c:1010
        kernel_init+0x13/0x180 init/main.c:941
        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
      
      cpu_hotplug_begin
        cpu_hotplug.lock
      pcpu_alloc
        pcpu_alloc_mutex
      
        get_online_cpus+0x62/0x90 kernel/cpu.c:248
        drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385
        __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline]
        __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778
        __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980
        __alloc_pages include/linux/gfp.h:426 [inline]
        __alloc_pages_node include/linux/gfp.h:439 [inline]
        alloc_pages_node include/linux/gfp.h:453 [inline]
        pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
        pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
        pcpu_alloc+0xe01/0x1280 mm/percpu.c:998
        __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062
        bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline]
        array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99
        find_and_alloc_map kernel/bpf/syscall.c:34 [inline]
        map_create kernel/bpf/syscall.c:188 [inline]
        SYSC_bpf kernel/bpf/syscall.c:870 [inline]
        SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827
        entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      pcpu_alloc
        pcpu_alloc_mutex
      drain_all_pages
        get_online_cpus
          cpu_hotplug.lock
      
        cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304
        _cpu_up+0xca/0x2a0 kernel/cpu.c:1011
        do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
        cpu_up+0x18/0x20 kernel/cpu.c:1095
        smp_init+0xe9/0xee kernel/smp.c:564
        kernel_init_freeable+0x439/0x690 init/main.c:1010
        kernel_init+0x13/0x180 init/main.c:941
        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
      
      cpu_hotplug_begin
        cpu_hotplug.lock
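
      The two chains above take the same pair of locks in opposite order.  The
      sketch below is a much simplified model of that kind of ordering check,
      not lockdep's implementation; lock_graph, record_order and
      would_deadlock are hypothetical names used only for illustration.

      ```c
      #include <stdbool.h>

      enum lock_id { CPU_HOTPLUG_LOCK, PCPU_ALLOC_MUTEX, NR_LOCKS };

      /* held_before[a][b] is true when some path takes b while holding a. */
      struct lock_graph {
          bool held_before[NR_LOCKS][NR_LOCKS];
      };

      static void record_order(struct lock_graph *g, enum lock_id held,
                               enum lock_id taken)
      {
          g->held_before[held][taken] = true;
      }

      /* Depth-first search: can `target` be reached from `from`? */
      static bool reaches(const struct lock_graph *g, int from, int target,
                          bool *visited)
      {
          if (from == target)
              return true;
          visited[from] = true;
          for (int next = 0; next < NR_LOCKS; next++) {
              if (g->held_before[from][next] && !visited[next] &&
                  reaches(g, next, target, visited))
                  return true;
          }
          return false;
      }

      /*
       * Taking `taken` while holding `held` closes a cycle if a recorded
       * chain already leads from `taken` back to `held`.
       */
      static bool would_deadlock(const struct lock_graph *g, enum lock_id held,
                                 enum lock_id taken)
      {
          bool visited[NR_LOCKS] = { false };

          return reaches(g, taken, held, visited);
      }
      ```

      Recording the CPU bring-up chain (cpu_hotplug.lock, then
      pcpu_alloc_mutex) and then asking about the allocator chain
      (pcpu_alloc_mutex held, cpu_hotplug.lock wanted) makes would_deadlock()
      return true in this model.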
      
      Pulling cpu hotplug locks inside the page allocator is just too
      dangerous.  Let's remove the dependency by dropping get_online_cpus()
      from drain_all_pages.  This is not so simple though because now we do
      not have a protection against cpu hotplug which means 2 things:
      
        - the work item might be executed on a different CPU by a worker from
          an unbound pool, so it is not guaranteed to run pinned to that CPU
      
        - we have to make sure that we do not race with page_alloc_cpu_dead
          calling drain_pages_zone
      
      Disabling preemption in drain_local_pages_wq solves the first problem:
      drain_local_pages will determine its local CPU from the WQ context,
      which will be stable after that point, and page_alloc_cpu_dead is pinned
      to the CPU already.  The latter condition is achieved by disabling IRQs
      in drain_pages_zone.
      
      Fixes: mm, page_alloc: drain per-cpu pages from workqueue context
      Link: http://lkml.kernel.org/r/20170207201950.20482-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: drain per-cpu pages from workqueue context · 0ccce3b9
      Mel Gorman authored
      The per-cpu page allocator can be drained immediately via
      drain_all_pages() which sends IPIs to every CPU.  In the next patch, the
      per-cpu allocator will only be used for interrupt-safe allocations which
      prevents draining it from IPI context.  This patch uses workqueues to
      drain the per-cpu lists instead.
      
      This is slower but no slowdown during intensive reclaim was measured and
      the paths that use drain_all_pages() are not that sensitive to
      performance.  This is particularly true as the path would only be
      triggered when reclaim is failing.  It also makes some sense to avoid
      storming a machine with IPIs when it's under memory pressure.  Arguably,
      it should be further adjusted so that only one caller at a time is
      draining pages but it's beyond the scope of the current patch.
      
      Link: http://lkml.kernel.org/r/20170123153906.3122-4-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: split alloc_pages_nodemask() · 9cd75558
      Mel Gorman authored
      alloc_pages_nodemask does a number of preparation steps that determine
      what zones can be used for the allocation depending on a variety of
      factors.  This is fine but a hypothetical caller that wanted multiple
      order-0 pages has to do the preparation steps multiple times.  This
      patch structures __alloc_pages_nodemask such that it's relatively easy
      to build a bulk order-0 page allocator.  There is no functional change.
      
      Link: http://lkml.kernel.org/r/20170123153906.3122-3-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: split buffered_rmqueue() · 066b2393
      Mel Gorman authored
      Patch series "Use per-cpu allocator for !irq requests and prepare for a
      bulk allocator", v5.
      
      This series is motivated by a conversation led by Jesper Dangaard Brouer
      at the last LSF/MM proposing a generic page pool for DMA-coherent pages.
      Part of his motivation was the overhead of allocating multiple order-0
      pages, which led some drivers to use high-order allocations and split
      them.  This is very slow in some cases.
      
      The first two patches in this series restructure the page allocator such
      that it is relatively easy to introduce an order-0 bulk page allocator.
      A patch exists to do that and has been handed over to Jesper until an
      in-kernel user is created.  The third patch prevents the per-cpu
      allocator being drained from IPI context as that can potentially corrupt
      the list after patch four is merged.  The final patch alters the per-cpu
      allocator to make it exclusive to !irq requests.  This cuts
      allocation/free overhead by roughly 30%.
      
      Performance tests from both Jesper and me are included in the patch.
      
      This patch (of 4):
      
      buffered_rmqueue removes a page from a given zone and uses the per-cpu
      list for order-0.  This is fine but a hypothetical caller that wanted
      multiple order-0 pages has to disable/reenable interrupts multiple
      times.  This patch structures buffered_rmqueue such that it's relatively
      easy to build a bulk order-0 page allocator.  There is no functional
      change.
      
      [mgorman@techsingularity.net: failed per-cpu refill may blow up]
        Link: http://lkml.kernel.org/r/20170124112723.mshmgwq2ihxku2um@techsingularity.net
      Link: http://lkml.kernel.org/r/20170123153906.3122-2-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: move dirty pages out of the way until they're flushed · c55e8d03
      Johannes Weiner authored
      We noticed a performance regression when moving hadoop workloads from
      3.10 kernels to 4.0 and 4.6.  This is accompanied by increased pageout
      activity initiated by kswapd as well as frequent bursts of allocation
      stalls and direct reclaim scans.  Even lowering the dirty ratios to the
      equivalent of less than 1% of memory would not eliminate the issue,
      suggesting that dirty pages concentrate where the scanner is looking.
      
      This can be traced back to recent efforts of thrash avoidance.  Where
      3.10 would not detect refaulting pages and continuously supply clean
      cache to the inactive list, a thrashing workload on 4.0+ will detect and
      activate refaulting pages right away, distilling used-once pages on the
      inactive list much more effectively.  This is by design, and it makes
      sense for clean cache.  But for the most part our workload's cache
      faults are refaults and its use-once cache is from streaming writes.  We
      end up with most of the inactive list dirty, and we don't go after the
      active cache as long as we have use-once pages around.
      
      But waiting for writes to avoid reclaiming clean cache that *might*
      refault is a bad trade-off.  Even if the refaults happen, reads are
      faster than writes.  Before getting bogged down on writeback, reclaim
      should first look at *all* cache in the system, even active cache.
      
      To accomplish this, activate pages that are dirty or under writeback
      when they reach the end of the inactive LRU.  The pages are marked for
      immediate reclaim, meaning they'll get moved back to the inactive LRU
      tail as soon as they're written back and become reclaimable.  But in the
      meantime, by reducing the inactive list to only immediately reclaimable
      pages, we allow the scanner to deactivate and refill the inactive list
      with clean cache from the active list tail to guarantee forward
      progress.
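
      The rotation described above can be modelled in user space as follows.
      This is only a sketch of the idea; cache_page, scan_inactive_tail and
      writeback_complete are hypothetical names, and the real code works on
      struct page and the LRU lists under the appropriate locks.

      ```c
      #include <stdbool.h>

      enum lru_list { LRU_INACTIVE, LRU_ACTIVE };

      /* Hypothetical stand-in for the page state that matters here. */
      struct cache_page {
          bool dirty;
          bool under_writeback;
          bool immediate_reclaim;  /* marked to be reclaimed once written back */
          enum lru_list list;
      };

      /* Called when the scanner finds the page at the end of the inactive LRU. */
      static void scan_inactive_tail(struct cache_page *page)
      {
          if (page->dirty || page->under_writeback) {
              /* Not reclaimable yet: move it out of the scanner's way. */
              page->immediate_reclaim = true;
              page->list = LRU_ACTIVE;
          }
      }

      /* Called when writeback of the page has finished. */
      static void writeback_complete(struct cache_page *page)
      {
          page->dirty = false;
          page->under_writeback = false;
          if (page->immediate_reclaim) {
              /* Now clean: return it to the inactive tail for reclaim. */
              page->immediate_reclaim = false;
              page->list = LRU_INACTIVE;
          }
      }
      ```

      A clean page is left where it is by scan_inactive_tail(), so in this
      model the inactive list ends up holding only pages that can be
      reclaimed immediately.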
      
      [hannes@cmpxchg.org: update comment]
        Link: http://lkml.kernel.org/r/20170202191957.22872-8-hannes@cmpxchg.org
      Link: http://lkml.kernel.org/r/20170123181641.23938-6-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: only write dirty pages that the scanner has seen twice · 4eda4823
      Johannes Weiner authored
      Dirty pages can easily reach the end of the LRU while there are still
      clean pages to reclaim around.  Don't let kswapd write them back just
      because there are a lot of them.  It costs more CPU to find the clean
      pages, but that's almost certainly better than to disrupt writeback from
      the flushers with LRU-order single-page writes from reclaim.  And the
      flushers have been woken up by that point, so we spend IO capacity on
      flushing and CPU capacity on finding the clean cache.
      
      Only start writing dirty pages if they have cycled around the LRU twice
      now and STILL haven't been queued on the IO device.  It's possible that
      the dirty pages are so sparsely distributed across different bdis,
      inodes, memory cgroups, that the flushers take forever to get to the
      ones we want reclaimed.  Once we see them twice on the LRU, we know
      that's the quicker way to find them, so do LRU writeback.
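
      A minimal user-space sketch of the "seen twice" rule, under the
      assumption that one per-page marker is enough to remember the first
      encounter.  page_model, scan_action and scan_page are hypothetical
      names for this example, not kernel interfaces.

      ```c
      #include <stdbool.h>

      /* Hypothetical stand-in for the relevant page state. */
      struct page_model {
          bool dirty;
          bool queued_for_io;    /* writeback already submitted */
          bool seen_by_scanner;  /* set on the first encounter while dirty */
      };

      enum scan_action {
          SCAN_RECLAIM,            /* clean page: reclaim it */
          SCAN_LEAVE_TO_FLUSHERS,  /* IO already queued: nothing to write */
          SCAN_SKIP_AND_MARK,      /* first encounter: remember and move on */
          SCAN_WRITE_FROM_RECLAIM  /* second encounter: do LRU writeback */
      };

      static enum scan_action scan_page(struct page_model *page)
      {
          if (!page->dirty)
              return SCAN_RECLAIM;
          if (page->queued_for_io)
              return SCAN_LEAVE_TO_FLUSHERS;
          if (!page->seen_by_scanner) {
              page->seen_by_scanner = true;
              return SCAN_SKIP_AND_MARK;
          }
          return SCAN_WRITE_FROM_RECLAIM;
      }
      ```

      Calling scan_page() twice on the same dirty, unqueued page returns
      SCAN_SKIP_AND_MARK first and SCAN_WRITE_FROM_RECLAIM second, which is
      the behaviour the paragraph above describes.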
      
      Link: http://lkml.kernel.org/r/20170123181641.23938-5-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: remove old flusher wakeup from direct reclaim path · bbef9384
      Johannes Weiner authored
      Direct reclaim has been replaced by kswapd reclaim in pretty much all
      common memory pressure situations, so this code most likely doesn't
      accomplish the described effect anymore.  The previous patch wakes up
      flushers for all reclaimers when we encounter dirty pages at the tail
      end of the LRU.  Remove the crufty old direct reclaim invocation.
      
      Link: http://lkml.kernel.org/r/20170123181641.23938-4-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: kick flushers when we encounter dirty pages on the LRU · 726d061f
      Johannes Weiner authored
      Memory pressure can put dirty pages at the end of the LRU without
      anybody running into dirty limits.  Don't start writing individual pages
      from kswapd while the flushers might be asleep.
      
      Unlike the old direct reclaim flusher wakeup (removed in the next patch)
      that flushes the number of pages just scanned, this patch wakes the
      flushers for all outstanding dirty pages.  That seemed to perform better
      in a synthetic test that pushes dirty pages to the end of the LRU and
      into reclaim, because we know LRU aging outstrips writeback already, and
      this way we give younger dirty pages a headstart rather than wait until
      reclaim runs into them as well.  It also means less plugging and risk of
      exhausting the struct request pool from reclaim.
      
      There is a concern that this will cause temporary files that used to get
      dirtied and truncated before writeback to now get written to disk under
      memory pressure.  If this turns out to be a real problem, we'll have to
      revisit this and tame the reclaim flusher wakeups.
      
      [hannes@cmpxchg.org: mention dirty expiration as a condition]
        Link: http://lkml.kernel.org/r/20170126174739.GA30636@cmpxchg.org
      Link: http://lkml.kernel.org/r/20170123181641.23938-3-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: scan dirty pages even in laptop mode · 1276ad68
      Johannes Weiner authored
      Patch series "mm: vmscan: fix kswapd writeback regression".
      
      We noticed a regression on multiple hadoop workloads when moving from
      3.10 to 4.0 and 4.6, which involves kswapd getting tangled up in page
      writeout, causing direct reclaim herds that also don't make progress.
      
      I tracked it down to the thrash avoidance efforts after 3.10 that make
      the kernel better at keeping use-once cache and use-many cache sorted on
      the inactive and active list, with more aggressive protection of the
      active list as long as there is inactive cache.  Unfortunately, our
      workload's use-once cache is mostly from streaming writes.  Waiting for
      writes to avoid potential reloads in the future is not a good tradeoff.
      
      These patches do the following:
      
      1. Wake the flushers when kswapd sees a lump of dirty pages. It's
         possible to be below the dirty background limit and still have cache
         velocity push them through the LRU. So start a-flushin'.
      
      2. Let kswapd only write pages that have been rotated twice. This makes
         sure we really tried to get all the clean pages on the inactive list
         before resorting to horrible LRU-order writeback.
      
      3. Move rotating dirty pages off the inactive list. Instead of churning
         or waiting on page writeback, we'll go after clean active cache. This
         might lead to thrashing, but in this state memory demand outstrips IO
         speed anyway, and reads are faster than writes.
      
      Mel backported the series to 4.10-rc5 with one minor conflict and ran a
      couple of tests on it.  Mix of read/write random workload didn't show
      anything interesting.  Write-only database didn't show much difference
      in performance but there were slight reductions in IO -- probably in the
      noise.
      
      simoop did show big differences although not as big as Mel expected.
      This is Chris Mason's workload that simulates the VM activity of hadoop.
      Mel won't go through the full details but over the samples measured
      during an hour it reported
      
                                               4.10.0-rc5            4.10.0-rc5
                                                  vanilla         johannes-v1r1
      Amean    p50-Read             21346531.56 (  0.00%) 21697513.24 ( -1.64%)
      Amean    p95-Read             24700518.40 (  0.00%) 25743268.98 ( -4.22%)
      Amean    p99-Read             27959842.13 (  0.00%) 28963271.11 ( -3.59%)
      Amean    p50-Write                1138.04 (  0.00%)      989.82 ( 13.02%)
      Amean    p95-Write             1106643.48 (  0.00%)    12104.00 ( 98.91%)
      Amean    p99-Write             1569213.22 (  0.00%)    36343.38 ( 97.68%)
      Amean    p50-Allocation          85159.82 (  0.00%)    79120.70 (  7.09%)
      Amean    p95-Allocation         204222.58 (  0.00%)   129018.43 ( 36.82%)
      Amean    p99-Allocation         278070.04 (  0.00%)   183354.43 ( 34.06%)
      Amean    final-p50-Read       21266432.00 (  0.00%) 21921792.00 ( -3.08%)
      Amean    final-p95-Read       24870912.00 (  0.00%) 26116096.00 ( -5.01%)
      Amean    final-p99-Read       28147712.00 (  0.00%) 29523968.00 ( -4.89%)
      Amean    final-p50-Write          1130.00 (  0.00%)      977.00 ( 13.54%)
      Amean    final-p95-Write       1033216.00 (  0.00%)     2980.00 ( 99.71%)
      Amean    final-p99-Write       1517568.00 (  0.00%)    32672.00 ( 97.85%)
      Amean    final-p50-Allocation    86656.00 (  0.00%)    78464.00 (  9.45%)
      Amean    final-p95-Allocation   211712.00 (  0.00%)   116608.00 ( 44.92%)
      Amean    final-p99-Allocation   287232.00 (  0.00%)   168704.00 ( 41.27%)
      
      The latencies are actually completely horrific in comparison to 4.4 (and
      4.10-rc5 is worse than 4.9 according to historical data for reasons Mel
      hasn't analysed yet).
      
      Still, 95% of write latency (p95-write) is halved by the series and
      allocation latency is way down.  Direct reclaim activity is one fifth of
      what it was according to vmstats.  Kswapd activity is higher but this is
      not necessarily surprising.  Kswapd efficiency is unchanged at 99% (99%
      of pages scanned were reclaimed) but direct reclaim efficiency went from
      77% to 99%
      
      In the vanilla kernel, 627MB of data was written back from reclaim
      context.  With the series, no data was written back.  With or without
      the patch, pages are being immediately reclaimed after writeback
      completes.  However, with the patch, only 1/8th of the pages are
      reclaimed like this.
      
      This patch (of 5):
      
      We have an elaborate dirty/writeback throttling mechanism inside the
      reclaim scanner, but for that to work the pages have to go through
      shrink_page_list() and get counted for what they are.  Otherwise, we
      mess up the LRU order and don't match reclaim speed to writeback.
      
      Especially during deactivation, there is never a reason to skip dirty
      pages; nothing is even trying to write them out from there.  Don't mess
      up the LRU order for nothing, shuffle these pages along.
      
      Link: http://lkml.kernel.org/r/20170123181641.23938-2-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1276ad68
    • userfaultfd: non-cooperative: selftest: enable REMOVE event test for shmem · 64527f5d
      Mike Rapoport authored
      Now that madvise(MADV_REMOVE) notifies the uffd reader, we should verify
      that the application actually sees zeros in the removed range.
      
      Link: http://lkml.kernel.org/r/1484814154-1557-4-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64527f5d
    • userfaultfd: non-cooperative: add madvise() event for MADV_REMOVE request · a6bf53eb
      Mike Rapoport authored
      When a page is removed from a shared mapping, the uffd reader should be
      notified, so that it won't attempt to handle #PF events for the removed
      pages.
      
      We can reuse the UFFD_EVENT_REMOVE because from the uffd monitor point
      of view, the semantics of madvise(MADV_DONTNEED) and
      madvise(MADV_REMOVE) are exactly the same.
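From the monitor's side, both madvise() flavours then arrive as one event type. A minimal sketch of how a reader might extract the removed range from an already-read message, assuming kernel headers that define UFFD_EVENT_REMOVE and the `remove` member of `struct uffd_msg` (the helper name `uffd_removed_range` is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <linux/userfaultfd.h>

/*
 * Returns 1 and fills start/end when the message reports a removed range
 * (madvise(MADV_DONTNEED) or madvise(MADV_REMOVE)); returns 0 for any
 * other event.  Reading the message from the uffd is left to the caller.
 */
int uffd_removed_range(const struct uffd_msg *msg,
                       unsigned long long *start,
                       unsigned long long *end)
{
    assert(msg != NULL && start != NULL && end != NULL);

    if (msg->event != UFFD_EVENT_REMOVE)
        return 0;

    *start = msg->arg.remove.start;
    *end = msg->arg.remove.end;
    return 1;
}
```

A monitor would typically stop resolving faults for `[start, end)` once this returns 1, since the pages in that range are gone.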
      
      Link: http://lkml.kernel.org/r/1484814154-1557-3-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6bf53eb
    • userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE · d811914d
      Mike Rapoport authored
      Patch series "userfaultfd: non-cooperative: add madvise() event for
      MADV_REMOVE request".
      
      These patches add notification of madvise(MADV_REMOVE) event to
      non-cooperative userfaultfd monitor.
      
      The first patch renames EVENT_MADVDONTNEED to EVENT_REMOVE along with
      relevant functions and structures.  Using _REMOVE instead of
      _MADVDONTNEED describes the event semantics more clearly and I hope it's
      not too late for such a change in the ABI.
      
      This patch (of 3):
      
      The purpose of UFFD_EVENT_MADVDONTNEED is to notify the uffd monitor
      about removal of a certain range from the address space tracked by
      userfaultfd.  Hence, UFFD_EVENT_REMOVE seems to better reflect the
      operation semantics.  Accordingly, the 'madv_dn' field of uffd_msg is
      renamed to 'remove' and the madvise_userfault_dontneed callback is
      renamed to userfaultfd_remove.
      
      Link: http://lkml.kernel.org/r/1484814154-1557-2-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d811914d
    • memblock: embed memblock type name within struct memblock_type · 0262d9c8
      Heiko Carstens authored
      Provide the name of each memblock type with struct memblock_type.  This
      allows getting rid of the function memblock_type_name() and of the
      duplicated type names in __memblock_dump_all().

      The only memblock_type usage outside of mm/memblock.c seems to be
      arch/s390/kernel/crash_dump.c.  While at it, give it a name.
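The idea of carrying the label inside the structure can be sketched in plain C. This is a simplified stand-in, not the kernel definition: `struct model_memblock_type`, its fields, and `model_dump` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a memblock type with an embedded name. */
struct model_memblock_type {
    unsigned long cnt;      /* number of regions */
    const char *name;       /* label stored with the data it describes */
};

struct model_memblock_type model_memory   = { .cnt = 0, .name = "memory" };
struct model_memblock_type model_reserved = { .cnt = 0, .name = "reserved" };

/*
 * A dump routine can print the label directly, so no separate
 * type-to-name lookup function and no repeated string literals are needed.
 */
int model_dump(const struct model_memblock_type *type, char *buf, size_t len)
{
    assert(type != NULL && type->name != NULL);
    return snprintf(buf, len, "%s.cnt = 0x%lx", type->name, type->cnt);
}
```

Adding another list then only requires one more initializer with its own `.name`; the dump code stays unchanged.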
      
      Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.com
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0262d9c8
    • memblock: also dump physmem list within __memblock_dump_all · 409efd4c
      Heiko Carstens authored
      Since commit 70210ed9 ("mm/memblock: add physical memory list") the
      memblock structure knows about a physical memory list.
      
      The physical memory list should also be dumped if memblock_dump_all() is
      called in case memblock_debug is switched on.  This makes debugging a
      bit easier.
      
      Link: http://lkml.kernel.org/r/20170120123456.46508-3-heiko.carstens@de.ibm.com
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      409efd4c
    • memblock: let memblock_type_name know about physmem type · 7409c5f7
      Heiko Carstens authored
      Since commit 70210ed9 ("mm/memblock: add physical memory list") the
      memblock structure knows about a physical memory list.
      
      memblock_type_name() should return "physmem" instead of "unknown" if the
      name of the physmem memblock_type is being asked for.
      
      Link: http://lkml.kernel.org/r/20170120123456.46508-2-heiko.carstens@de.ibm.com
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7409c5f7
    • mm/memory_hotplug.c: unexport __remove_pages() · 997126bb
      Andrew Morton authored
      It has no modular callers.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      997126bb
    • mm: validate device_hotplug is held for memory hotplug · 3fc21924
      Dan Williams authored
      mem_hotplug_begin() assumes that it can set mem_hotplug.active_writer
      and run the hotplug process without racing another thread.  Validate
      this assumption with a lockdep assertion.
      
      Link: http://lkml.kernel.org/r/148693886229.16345.1770484669403334689.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fc21924
    • mm, devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin, done} · b5d24fda
      Dan Williams authored
      The mem_hotplug_{begin,done} lock coordinates with {get,put}_online_mems()
      to hold off "readers" of the current state of memory from new hotplug
      actions.  mem_hotplug_begin() expects exclusive access, via the
      device_hotplug lock, to set mem_hotplug.active_writer.  Calling
      mem_hotplug_begin() without locking device_hotplug can lead to
      corrupting mem_hotplug.refcount and missed wakeups / soft lockups.
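The required nesting can be illustrated with a toy, single-threaded model. This is not kernel code: the booleans stand in for the real locks, and every `model_` name is hypothetical. It only shows the rule that the inner begin/done pair must sit inside the outer device_hotplug lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: booleans stand in for the real kernel locks. */
struct hotplug_model {
    bool device_hotplug_held;   /* models the device_hotplug lock */
    bool active_writer;         /* models mem_hotplug.active_writer */
};

void model_lock_device_hotplug(struct hotplug_model *m)
{
    assert(!m->device_hotplug_held);
    m->device_hotplug_held = true;
}

void model_unlock_device_hotplug(struct hotplug_model *m)
{
    /* The inner section must be finished before the outer lock drops. */
    assert(m->device_hotplug_held && !m->active_writer);
    m->device_hotplug_held = false;
}

/* Refuses to start without the outer lock; returns true on success. */
bool model_mem_hotplug_begin(struct hotplug_model *m)
{
    if (!m->device_hotplug_held || m->active_writer)
        return false;
    m->active_writer = true;
    return true;
}

void model_mem_hotplug_done(struct hotplug_model *m)
{
    assert(m->active_writer);
    m->active_writer = false;
}
```

In this model a caller that skips `model_lock_device_hotplug()` gets a refusal from `model_mem_hotplug_begin()`. In the kernel, the commit message describes a lockdep assertion making the same assumption explicit.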
      
      [dan.j.williams@intel.com: v2]
        Link: http://lkml.kernel.org/r/148728203365.38457.17804568297887708345.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/148693885680.16345.17802627926777862337.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: f931ab47 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5d24fda
    • mm, oom: header nodemask is NULL when cpusets are disabled · 299c517a
      David Rientjes authored
      Commit 82e7d3ab ("oom: print nodemask in the oom report") implicitly
      sets the allocation nodemask to cpuset_current_mems_allowed when there
      is no effective mempolicy.  cpuset_current_mems_allowed is only
      effective when cpusets are enabled, which is also printed by
      dump_header(), so setting the nodemask to cpuset_current_mems_allowed is
      redundant and prevents debugging issues where ac->nodemask is not set
      properly in the page allocator.
      
      This provides better debugging output since
      cpuset_print_current_mems_allowed() is already provided.
      
      [rientjes@google.com: newline per Hillf]
        Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701200158300.88321@chino.kir.corp.google.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701191454470.2381@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      299c517a
    • mm/ksm: improve deduplication of zero pages with colouring · e86c59b1
      Claudio Imbrenda authored
      Some architectures have a set of zero pages (coloured zero pages)
      instead of only one zero page, in order to improve the cache
      performance.  In those cases, the kernel samepage merger (KSM) would
      merge all the allocated pages that happen to be filled with zeroes to
      the same deduplicated page, thus losing all the advantages of coloured
      zero pages.
      
      This behaviour is noticeable when a process accesses large arrays of
      allocated pages containing zeroes.  A test I conducted on s390 shows
      that there is a speed penalty when KSM merges such pages, compared to
      not merging them or using actual zero pages from the start without
      breaking the COW.
      
      This patch fixes this behaviour.  When coloured zero pages are present,
      the checksum of a zero page is calculated during initialisation, and
      compared with the checksum of the current candidate during merging.  In
      case of a match, the normal merging routine is used to merge the page
      with the correct coloured zero page, which ensures the candidate page is
      checked to be equal to the target zero page.
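The two-stage check described above (cheap checksum first, full comparison second) can be sketched in user space. This is an illustration only: the FNV-1a checksum, the fixed 4096-byte page size, and all function names are assumptions made for the sketch, and it does not model the selection of the correctly coloured zero page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical checksum (FNV-1a); the kernel uses its own hash. */
uint32_t model_page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < MODEL_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

static const unsigned char model_zero_page[MODEL_PAGE_SIZE];
static uint32_t model_zero_checksum;

/* Computed once at initialisation, as the commit message describes. */
void model_zero_checksum_init(void)
{
    model_zero_checksum = model_page_checksum(model_zero_page);
}

/*
 * Stage 1: reject most pages with a checksum mismatch.
 * Stage 2: on a match, confirm with a byte-for-byte comparison, because
 * equal checksums alone do not prove equal contents.
 */
bool model_is_zero_candidate(const unsigned char *page)
{
    assert(page != NULL);

    if (model_page_checksum(page) != model_zero_checksum)
        return false;
    return memcmp(page, model_zero_page, MODEL_PAGE_SIZE) == 0;
}
```

The second stage corresponds to the statement that the normal merging routine still checks the candidate against the target zero page.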
      
      A sysfs entry is also added to toggle this behaviour, since it can
      potentially introduce performance regressions, especially on
      architectures without coloured zero pages.  The default value is
      disabled, for backwards compatibility.
      
      With this patch, the performance with KSM is the same as with non
      COW-broken actual zero pages, which is also the same as without KSM.
      
      [akpm@linux-foundation.org: make zero_checksum and ksm_use_zero_pages __read_mostly, per Andrea]
      [imbrenda@linux.vnet.ibm.com: documentation for coloured zero pages deduplication]
        Link: http://lkml.kernel.org/r/1484927522-1964-1-git-send-email-imbrenda@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/1484850953-23941-1-git-send-email-imbrenda@linux.vnet.ibm.com
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e86c59b1
    • cris: use generic current.h · 8d4a0170
      Davidlohr Bueso authored
      Given that the arch does not add its own implementations, simply use the
      asm-generic/current.h (generic-y) header instead of duplicating code.
      
      Link: http://lkml.kernel.org/r/1485992878-4780-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d4a0170
  2. 24 Feb, 2017 7 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · f1ef09fd
      Linus Torvalds authored
      Pull namespace updates from Eric Biederman:
       "There is a lot here. A lot of these changes result in subtle user
        visible differences in kernel behavior. I don't expect anything will
        care but I will revert/fix things immediately if any regressions show
        up.
      
        From Seth Forshee there is a continuation of the work to make the vfs
        ready for unprivileged mounts. We had thought the previous changes
        prevented the creation of files outside of s_user_ns of a filesystem,
        but it turns out we missed the O_CREAT path. Ooops.
      
        Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
        standing bug in the implementation of PR_SET_CHILD_SUBREAPER where only
        children that are forked after the prctl are considered and not
        children forked before the prctl. The only known user of this prctl,
        systemd, forks all children after the prctl. So no userspace
        regressions will occur. Holding earlier forked children to the same
        rules as later forked children creates a semantic that is sane enough
        to allow checkpointing of processes that use this feature.
      
        There is a long delayed change by Nikolay Borisov to limit inotify
        instances inside a user namespace.
      
        Michael Kerrisk extends the API for files used to manipulate
        namespaces with two new trivial ioctls to allow discovery of the
        hierarchy and properties of namespaces.

        Konstantin Khlebnikov with the help of Al Viro adds code that when a
        network namespace exits purges its sysctl entries from the dcache, as
        in some circumstances this could use a lot of memory.
      
        Vivek Goyal fixed a bug with stacked filesystems where the permissions
        on the wrong inode were being checked.
      
        I continue previous work on ptracing across exec. Allowing a file to
        be setuid across exec while being ptraced if the tracer has enough
        credentials in the user namespace, and if the process has CAP_SETUID
        in its own namespace. Proc files for setuid or otherwise undumpable
        executables are now owned by the root in the user namespace of their
        mm. Allowing debugging of setuid applications in containers to work
        better.
      
        A bug I introduced with permission checking and automount is now
        fixed. The big change is to mark the mounts that the kernel initiates
        as a result of an automount. This allows the permission checks in sget
        to be safely suppressed for this kind of mount. As the permission
        check happened when the original filesystem was mounted.
      
        Finally a special case in the mount namespace is removed preventing
        unbounded chains in the mount hash table, and making the semantics
        simpler which benefits CRIU.
      
        The vfs fix along with related work in ima and evm I believe makes us
        ready to finish developing and merge fully unprivileged mounts of the
        fuse filesystem. The cleanups of the mount namespace make it possible
        to discuss how to fix the worst case complexity of umount. The stacked filesystem
        fixes pave the way for adding multiple mappings for the filesystem
        uids so that efficient and safer containers can be implemented"
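For readers unfamiliar with the subreaper prctl mentioned in the pull message, the following is a minimal user-space sketch of setting and querying the flag. It uses only the documented prctl(2) options; the wrapper names are hypothetical, and the sketch does not demonstrate the changed propagation to already-forked descendants.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Mark the calling process as a child subreaper; returns 0 on success. */
int become_subreaper(void)
{
    return prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL);
}

/* Returns 1 if the caller is a subreaper, 0 if not, -1 on error. */
int is_subreaper(void)
{
    int flag = 0;

    if (prctl(PR_GET_CHILD_SUBREAPER, (unsigned long)&flag, 0UL, 0UL, 0UL) != 0)
        return -1;

    assert(flag == 0 || flag == 1);
    return flag;
}
```

A service manager would call `become_subreaper()` early, so that orphaned descendants are reparented to it rather than to init.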
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        proc/sysctl: Don't grab i_lock under sysctl_lock.
        vfs: Use upper filesystem inode in bprm_fill_uid()
        proc/sysctl: prune stale dentries during unregistering
        mnt: Tuck mounts under others instead of creating shadow/side mounts.
        prctl: propagate has_child_subreaper flag to every descendant
        introduce the walk_process_tree() helper
        nsfs: Add an ioctl() to return owner UID of a userns
        fs: Better permission checking for submounts
        exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
        vfs: open() with O_CREAT should not create inodes with unknown ids
        nsfs: Add an ioctl() to return the namespace type
        proc: Better ownership of files for non-dumpable tasks in user namespaces
        exec: Remove LSM_UNSAFE_PTRACE_CAP
        exec: Test the ptracer's saved cred to see if the tracee can gain caps
        exec: Don't reset euid and egid when the tracee has CAP_SETUID
        inotify: Convert to using per-namespace limits
      f1ef09fd
    • Merge tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux · ef96152e
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "This is the main drm pull request for v4.11.
      
        Nothing too major, the tinydrm and mmu-less support should make
        writing smaller drivers easier for some of the simpler platforms, and
        there are a bunch of documentation updates.
      
        Intel grew displayport MST audio support which is hopefully useful to
        people, and FBC is on by default for GEN9+ (so people know where to
        look for regressions). AMDGPU has a lot of fixes that would like new
        firmware files installed for some GPUs.
      
        Other than that it's pretty scattered all over.
      
        I may have a follow up pull request as I know BenH has a bunch of AST
        rework and fixes and I'd like to get those in once they've been tested
        by AST, and I've got at least one pull request I'm just trying to get
        the author to fix up.
      
        Core:
         - drm_mm reworked
         - Connector list locking and iterators
         - Documentation updates
         - Format handling rework
         - MMU-less support for fbdev helpers
         - drm_crtc_from_index helper
         - Core CRC API
         - Remove drm_framebuffer_unregister_private
         - Debugfs cleanup
         - EDID/Infoframe fixes
         - Release callback
         - Tinydrm support (smaller drivers for simple hw)
      
        panel:
         - Add support for some new simple panels
      
        i915:
         - FBC by default for gen9+
         - Shared dpll cleanups and docs
         - GEN8 powerdomain cleanup
         - DMC support on GLK
         - DP MST audio support
         - HuC loading support
         - GVT init ordering fixes
         - GVT IOMMU workaround fix
      
        amdgpu/radeon:
         - Power/clockgating improvements
         - Preliminary SR-IOV support
         - TTM buffer priority and eviction fixes
         - SI DPM quirks removed due to firmware fixes
         - Powerplay improvements
         - VCE/UVD powergating fixes
         - Cleanup SI GFX code to match CI/VI
         - Support for > 2 displays on 3/5 crtc asics
         - SI headless fixes
      
        nouveau:
         - Rework secure boot code in prep for GP10x secure boot
         - Channel recovery improvements
         - Initial power budget code
         - MMU rework preparation
      
        vmwgfx:
         - Bunch of fixes and cleanups
      
        exynos:
         - Runtime PM support for MIC driver
         - Cleanups to use atomic helpers
         - UHD Support for TM2/TM2E boards
         - Trigger mode fix for Rinato board
      
        etnaviv:
         - Shader performance fix
         - Command stream validator fixes
         - Command buffer suballocator
      
        rockchip:
         - CDN DisplayPort support
         - IOMMU support for arm64 platform
      
        imx-drm:
         - Fix i.MX5 TV encoder probing
         - Remove lower fb size limits
      
        msm:
         - Support for HW cursor on MDP5 devices
         - DSI encoder cleanup
         - GPU DT bindings cleanup
      
        sti:
         - stih410 cleanups
         - Create fbdev at binding
         - HQVDP fixes
         - Remove stih416 chip functionality
         - DVI/HDMI mode selection fixes
         - FPS statistic reporting
      
        omapdrm:
         - IRQ code cleanup
      
        dw-hdmi bridge:
         - Cleanups and fixes

        adv-bridge:
         - Updates for nexus

        sii8620 bridge:
         - Add interlace mode support
         - Rework HDMI and lots of fixes
      
        qxl:
         - probing/teardown cleanups
      
        ZTE drm:
         - HDMI audio via SPDIF interface
         - Video Layer overlay plane support
         - Add TV encoder output device
      
        atmel-hlcdc:
         - Rework fbdev creation logic
      
        tegra:
         - OF node fix
      
        fsl-dcu:
         - Minor fixes
      
        mali-dp:
         - Assorted fixes
      
        sunxi:
         - Minor fix"
      
      [ This was the "fixed" pull, that still had build warnings due to people
        not even having build tested the result. I'm not a happy camper
      
        I've fixed the things I noticed up in this merge.      - Linus ]
      
      * tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
        lib/Kconfig: make PRIME_NUMBERS not user selectable
        drm/tinydrm: helpers: Properly fix backlight dependency
        drm/tinydrm: mipi-dbi: Fix field width specifier warning
        drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
        drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
        drm/amd/powerplay: fix PSI feature on Polars12
        drm/amdgpu: refuse to reserve io mem for split VRAM buffers
        drm/ttm: fix use-after-free races in vm fault handling
        drm/tinydrm: Add support for Multi-Inno MI0283QT display
        dt-bindings: Add Multi-Inno MI0283QT binding
        dt-bindings: display/panel: Add common rotation property
        of: Add vendor prefix for Multi-Inno
        drm/tinydrm: Add MIPI DBI support
        drm/tinydrm: Add helper functions
        drm: Add DRM support for tiny LCD displays
        drm/amd/amdgpu: post card if there is real hw resetting performed
        drm/nouveau/tmr: provide backtrace when a timeout is hit
        drm/nouveau/pci/g92: Fix rearm
        drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
        drm/nouveau/hwmon: expose power_max and power_crit
        ..
      ef96152e
    • lib/Kconfig: make PRIME_NUMBERS not user selectable. · 64a57719
      Dave Airlie authored
      Linus doesn't like it user selectable, so kill it until
      someone needs it for something else.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      64a57719
    • drm/tinydrm: helpers: Properly fix backlight dependency · 7fef80a4
      Noralf Trønnes authored
      BACKLIGHT_CLASS_DEVICE was selected in the last version of the
      tinydrm patchset to fix the backlight dependency, but the
      ifdef CONFIG_BACKLIGHT_CLASS_DEVICE was forgotten. Fix that.
      Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      7fef80a4
    • drm/tinydrm: mipi-dbi: Fix field width specifier warning · ce8c0137
      Noralf Trønnes authored
      This warning is seen on 64-bit builds in functions:
         'mipi_dbi_typec1_command':
         'mipi_dbi_typec3_command_read':
         'mipi_dbi_typec3_command':
      
      >> drivers/gpu/drm/tinydrm/mipi-dbi.c:65:20: warning: field width specifier '*' expects argument of type 'int', but argument 5 has type 'size_t {aka long unsigned int}' [-Wformat=]
            DRM_DEBUG_DRIVER("cmd=%02x, par=%*ph\n", cmd, len, data); \
                             ^
         include/drm/drmP.h:228:40: note: in definition of macro 'DRM_DEBUG_DRIVER'
           drm_printk(KERN_DEBUG, DRM_UT_DRIVER, fmt, ##__VA_ARGS__)
                                                 ^~~
      >> drivers/gpu/drm/tinydrm/mipi-dbi.c:671:2: note: in expansion of macro 'MIPI_DBI_DEBUG_COMMAND'
           MIPI_DBI_DEBUG_COMMAND(cmd, parameters, num);
           ^~~~~~~~~~~~~~~~~~~~~~
      
      Fix by casting 'len' to int in the macro MIPI_DBI_DEBUG_COMMAND().
      There is no chance of overflow.
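The same class of warning can be reproduced in user space, where a `*` width or precision in a printf-style format also expects an `int`. A minimal sketch of the cast, assuming a standard C library (the kernel's `%*ph` is a kernel-only extension, so this example uses `%.*s` instead, and `format_prefix` is a hypothetical helper):

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats the first len bytes of data.  The '*' in "%.*s" consumes an
 * int argument, so a size_t length has to be cast explicitly.
 */
int format_prefix(char *out, size_t outlen, const char *data, size_t len)
{
    /* Guard the narrowing cast: the length must fit in an int. */
    assert(len <= INT_MAX);
    return snprintf(out, outlen, "par=%.*s", (int)len, data);
}
```

Passing `len` without the cast is what triggers `-Wformat` on targets where `size_t` is wider than `int`.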
      Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      ce8c0137
    • drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized · b401f343
      Noralf Trønnes authored
      Fix this warning:
      drivers/gpu/drm/tinydrm/mipi-dbi.c: In function ‘mipi_dbi_debugfs_command_write’:
      drivers/gpu/drm/tinydrm/mipi-dbi.c:905:8: warning: ‘cmd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        ret = mipi_dbi_command_buf(mipi, cmd, parameters, i);
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      cmd can't be used uninitialized, but to satisfy the compiler,
      initialize it to zero.
      Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      b401f343
    • Merge tag 'usercopy-v4.11-rc1.fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · d5500a07
      Linus Torvalds authored
      Pull usercopy test fix from Kees Cook:
       "Fix for non-MMU ARM testing, from Arnd Bergmann"
      
      * tag 'usercopy-v4.11-rc1.fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        usercopy: ARM NOMMU has no 64-bit get_user
      d5500a07
  3. 23 Feb, 2017 11 commits
    • Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · b2e3c431
      Linus Torvalds authored
      Pull ARM SoC driver updates from Arnd Bergmann:
       "Driver updates for ARM SoCs.
      
        A handful of driver changes this time around. The larger changes are:
      
         - Reset drivers for hi3660 and zx2967
      
         - AHCI driver for Davinci, acked by Tejun and brought in here due to
           platform dependencies
      
         - Cleanups of atmel-ebi (External Bus Interface)
      
         - Tweaks for Rockchip GRF (General Register File) usage (kitchensink
           misc register range on the SoCs)
      
         - PM domains changes for support of two new ZTE SoCs (zx296718 and
           zx2967)"
      
      * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (53 commits)
        soc: samsung: pmu: Add register defines for pad retention control
        reset: make zx2967 explicitly non-modular
        reset: core: fix reset_control_put
        soc: samsung: pm_domains: Read domain name from the new label property
        soc: samsung: pm_domains: Remove message about failed memory allocation
        soc: samsung: pm_domains: Remove unused name field
        soc: samsung: pm_domains: Use full names in subdomains registration log
        sata: ahci-da850: un-hardcode the MPY bits
        sata: ahci-da850: add a workaround for controller instability
        sata: ahci: export ahci_do_hardreset() locally
        sata: ahci-da850: implement a workaround for the softreset quirk
        sata: ahci-da850: add device tree match table
        sata: ahci-da850: get the sata clock using a connection id
        soc: samsung: pmu: Remove duplicated define for ARM_L2_OPTION register
        memory: atmel-ebi: Enable the SMC clock if specified
        soc: samsung: pmu: Remove unused and duplicated defines
        memory: atmel-ebi: Properly handle multiple reference to the same CS
        memory: atmel-ebi: Fix the test to enable generic SMC logic
        soc: samsung: pm_domains: Add new Exynos5433 compatible
        soc: samsung: pmu: Add dummy support for Exynos5433 SoC
        ...
      b2e3c431
    • Merge tag 'armsoc-dt64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · c61c15e0
      Linus Torvalds authored
      Pull ARM 64-bit DT updates from Arnd Bergmann:
       "ARM64 DT updates are fairly small this time, only two new SoCs and a
        handful of new machines get added, all of them similar to other
        hardware we already support.
      
        New SoC:
      
         - HiSilicon Kirin960/Hi3660 and HiKey960 development board
      
         - NXP LS1012a with three reference boards:
              http://www.nxp.com/products/microcontrollers-and-processors/arm-processors/qoriq-layerscape-arm-processors/qoriq-layerscape-1012a-low-power-communication-processor:LS1012A
      
        New development board:
      
         - Banana Pi M64, based on Allwinner A64:
              http://www.banana-pi.org/m64.html
      
         - SolidRun MACCHIATOBin based on Marvell Armada 8K:
              https://www.solid-run.com/marvell-armada-family/armada-8040-community-board/
      
         - Broadcom BCM958712DxXMC NorthStar2 reference board (another one)
      
        A lot of platforms improve support for existing machines by adding
        extra devices for which a binding and driver are available:
      
        Allwinner:
         - MMC, USB
      
        ARM Juno:
         - Coresight, STM
      
        Broadcom:
         - NS2 GICv2m irqchip and PCIe
      
        Marvell:
         - Armada 3700 SPI, I2C, ethernet switch
      
        Mediatek:
         - MT8173 thermal
      
        NXP i.MX:
         - LS1046A thermal
      
        Qualcomm:
         - coresight on MSM8916, HDMI, WCNSS, SCM
      
        Renesas:
         - r8a779[56] thermal, powerdomain, ethernet, sound, pwm, can, can fd
      
        Rockchip:
         - thermal, eDP, pinctrl enhancements
      
        Samsung:
         - TM2 touchkey, Exynos5433 HDMI and power management improvements
      
        UniPhier:
         - SD reset, eMMC controller
      
        ZTE:
         - oppv2 cpufreq"
      
      * tag 'armsoc-dt64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (110 commits)
        arm64: dts: qcom: Add msm8916 CoreSight components
        arm64: dts: marvell: adjust name of sd-mmc-gop clock in syscon
        arm64: allwinner: add BananaPi-M64 support
        arm64: allwinner: a64: add UART1 pin nodes
        arm64: allwinner: pine64: add MMC support
        arm64: allwinner: a64: Increase the MMC max frequency
        arm64: allwinner: a64: Add MMC pinctrl nodes
        arm64: allwinner: a64: Add MMC nodes
        dt-bindings: clockgen: Add compatible string for LS1012A
        Documentation: DT: add LS1012A compatible for SCFG and DCFG
        Documentation: DT: Add entry for FSL LS1012A RDB, FRDM, QDS boards
        arm64: dts: marvell: add generic-ahci compatibles for CP110 ahci
        arm64: tegra: Use symbolic reset identifiers
        arm64: dts: r8a7796: Mark EthernetAVB device node disabled
        arm64: dts: r8a7795: Mark EthernetAVB device node disabled
        arm64: dts: r8a7795: tidyup audma definition order
        arm64: dts: r8a7796: Link ARM GIC to clock and clock domain
        arm64: dts: r8a7795: Link ARM GIC to clock and clock domain
        arm64: dts: r8a7796: Add R-Car Gen3 thermal support
        arm64: dts: r8a7795: Add R-Car Gen3 thermal support
        ...
      c61c15e0
    • Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 195849ea
      Linus Torvalds authored
      Pull ARM DT updates from Arnd Bergmann:
       "A total of 380 patches this time, mostly adding support for more
        hardware in the device tree descriptions. There is not much exciting
        here for 4.11, but I've tried my best to condense the information from
        the pull requests I got into a readable summary.
      
        Noteworthy changes to existing platforms include:
      
         - The GIC memory map was a bit wrong almost everywhere and now gets
           fixed up
      
         - The Allwinner platforms convert to the generic pinmux properties
      
         - The Marvell EBU platforms now use the new DSA binding
      
         - Samsung Exynos4212 was unused and gets removed
      
         - The Renesas power management got improved
      
        New production machines:
      
         - Lego Mindstorms EV3:
              https://www.lego.com/en-us/mindstorms/about-ev3
      
         - Beelink X2 Android media box:
              http://linux-sunxi.org/Beelink_X2
      
         - "Romulus" baseboard management controller for OpenPower
      
         - Axentia TSE-850 Data Radio Channel (DARC) encoder:
              http://www.axentia.se/db/equipment.html
      
         - Luxul XAP-1410 and XWR-1200 wireless access points:
              https://luxul.com/xap-1410
      
        New SoCs:
      
         - Allwinner H2+ and V3s, both minor variations of already supported
           chips:
              http://www.allwinnertech.com/index.php?c=product&a=index&id=38
      
         - Marvell Prestera DX packet processors based on Armada XP
           architecture:
              http://www.marvell.com/switching/prestera-dx/
      
         - Samsung Exynos4412 Prime gets added, a minor variation of
           Exynos4412
      
        New developer and reference boards:
      
         - Lichee Pi One, Lichee Pi Zero and Orange Pi Zero, all based on
           Allwinner SoCs:
              http://linux-sunxi.org/LicheePi_One
              http://www.orangepi.org/orangepizero/
      
         - SAMA5d36ek Reference platform:
              http://www.atmel.com/tools/sama5d36-ek.aspx
      
         - Beaglebone Green Wireless and Black Wireless:
              https://beagleboard.org/black-wireless
              https://beagleboard.org/green-wireless
      
         - phyCORE-AM335x System on Module:
              http://phytec.com/products/system-on-modules/phycore/am335x/
      
         - New revision of "vf610-zii" Zodiac Inflight Innovations board
      
         - Various i.MX System-on-Module: Is.IoT MX6UL, SavageBoard, Engicam
           i.Core:
              http://www.opossom.com/english/index.html
              http://www.savageboard.org/
              http://www.engicam.com/en/products/embedded/som/sodimm/is-iot-mx6ul
              http://www.engicam.com/en/products/embedded/som/sodimm/i-core-m6s-dl-d-q
      
         - Liebherr (LWN) monitor 6 based on i.MX6 Quad, no idea what this is
      
         - Cleanups and bugfixes on at91, bcm53xx, i.MX, mvebu, omap, oxnas,
           qcom, rockchip, sti, stm32 and tegra
      
        New device support added for some boards and SoCs, briefly by platform:
      
         - Allwinner: SPDIF, A33 cpufreq, A33 Mali GPU
      
         - Aspeed: network, ipmi bt, gpio, pinmux
      
         - Broadcom: video encoder for raspberry pi, qspi, ethernet, sd/mmc
      
         - TI DaVinci: gpio, lcdc, usb, video-in, uart
      
         - TI Keystone 2: MSM RAM, power/reset, uart
      
         - Mediatek MT2701: clocks, iommu, spi, nand, adc, thermal
      
         - Marvell EBU: ethernet switch on Turris Omnia
      
         - NXP i.MX: otp ram, USB, wifi, bluetooth, spdif, spi, pmic, eeprom,
           mmc, nand
      
         - TI OMAP:
      
         - Qualcomm: coresight, gyro/accelerometer, hdmi
      
         - Renesas: pmic, soc-id
      
         - Rockchip: qos
      
         - Samsung: audio on Odroid-X
      
         - Socfpga: FPGA manager, i2c, led, can, watchdog, nand, power monitor
      
         - STi: video in/out
      
         - STM32: timer, pwm, i2c, rtc, add, i2s
      
         - NVIDIA Tegra: tpm
      
         - Uniphier: mmc/sd pinmux"
      
      * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (380 commits)
        ARM: dts: armada-385-linksys: fix DSA compatible property
        ARM: dts: Fix typo in armada-xp-98dx4251
        ARM: DTS: Fix register map for virt-capable GIC
        dt-bindings: arm,gic: Fix binding example for a virt-capable GIC
        ARM: dts: sun8i: sinlinx: Enable audio nodes
        ARM: dts: sun8i: parrot: Enable audio nodes
        ARM: dts: sun8i: Add audio codec, dai and card for A33
        ARM: dts: Add EMAC AXI settings for Arria10
        ARM: dts: am335x-chiliboard: Support charger
        ARM: dts: am335x-chiliboard: Support power button
        ARM: sun8i: dt: Add mali node
        dt-bindings: gpu: Add Mali Utgard bindings
        ARM: dts: stm32: Add I2C1 support for STM32429 eval board
        ARM: dts: stm32: Add I2C1 support for STM32F429 SoC
        ARM: dts: stm32: Use clock DT binding definition on stm32f429 family
        dt-bindings: mfd: stm32f4: Add missing binding definition
        dt-bindings: mfd: stm32f4: Fix STM32F4_X_CLOCK() macro
        ARM: dts: stm32: Enable pwm1 and pwm3 for stm32f469-disco
        ARM: dts: stm32: add Timers driver for stm32f429 MCU
        ARM: dts: add the AB8500 sysclk to the device trees
        ...
      195849ea
    • Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 54fff785
      Linus Torvalds authored
      Pull ARM SoC defconfig updates from Arnd Bergmann:
       "Defconfig additions, removals, etc. Almost all of them just turn on
        drivers that we want on some platform, usually after the driver has
        been merged into mainline.
      
        There is now a new defconfig file for tango4"
      
      * tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits)
        ARM: multi_v7_defconfig: enable pstore configs
        ARM: multi_v7_defconfig: enable some newly added crypto modules
        ARM: davinci_all_defconfig: enable SATA modules
        arm64: defconfig: enable CONFIG_MTD_NAND and CONFIG_MTD_NAND_DENALI_DT
        arm64: defconfig: enable CONFIG_MTD_BLOCK
        ARM: Import tango4_defconfig
        ARM: omap2plus_defconfig: Enable support for RTC M41T80
        ARM: omap2plus_defconfig: Enable support for micrell phys
        ARM: vf610m4: defconfig: enable EXT4 filesystem
        ARM: omap2plus_defconfig: Fix probe errors on UARTs 5 and 6
        arm64: defconfig: Enable NUMA and NUMA_BALANCING
        arm64: defconfig: enable SMMUv3 config
        ARM: davinci_all_defconfig: enable iio
        ARM: Keystone: Enable ARCH_HAS_RESET_CONTROLLER
        ARM: configs: stm32: Add RTC support in STM32 defconfig
        ARM: defconfig: qcom: add APQ8060 DragonBoard devices
        ARM: qcom_defconfig: enable thermal sensors
        ARM: qcom_defconfig: add ahci configs
        ARM: qcom_defconfig: add pcie and atl1c ethernet configs
        ARM: qcom_defconfig: add usb related configs
        ...
      54fff785
    • Merge tag 'armsoc-arm64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · c35675f3
      Linus Torvalds authored
      Pull ARM SoC 64-bit updates from Arnd Bergmann:
       "Changes to platform code for 64-bit ARM platforms, only trivial stuff
        this time, a few defconfig changes to enable drivers, and a new entry
        for the Cavium ThunderX2 platform"
      
      * tag 'armsoc-arm64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        MAINTAINERS: Add Cavium ThunderX2 entry
        arm64: add ARCH_THUNDER2 to defconfig
        arm64: add THUNDER2 processor family
        MAINTAINERS: Extend ARM/Mediatek SoC support section
        arm64: defconfig: enable CONFIG_MMC_SDHCI_CADENCE
        arm64: defconfig: enable XORv2 for Marvell Armada 7K/8K
      c35675f3
    • Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 6ae52c65
      Linus Torvalds authored
      Pull ARM SoC platform updates from Arnd Bergmann:
       "In the SoC branch we normally collect classic arch/arm/mach-*
        contents, i.e. C code changes for SoC platforms. This release cycle
        the diffstat is quite nice, in that we're removing 3x the amount of
        code that's being added.
      
        The main reason for this is that there's a removal of camera drivers
        for Freescale i.MX chips (driver was removed so the device
        registration isn't needed any more). There's also removal of display
        initialization code for OMAP that is no longer needed.
      
        The rest are mostly minor tweaks and cleanups; constification on
        Samsung platforms, cleanup of ux500 platform data, purge of other
        unused platform data/device setup on i.MX and other good stuff.
      
        New SoC support this cycle is for two Allwinner platforms, H2+ and
        V3s"
      
      * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (55 commits)
        ARM: ux500: remove deleted file from Makefile
        ARM: ep93xx: Disable TS-72xx watchdog before uncompressing
        ARM: ux500: cut some platform data
        MAINTAINERS: Update for the current location of the bcm2835 tree.
        ARM: davinci: remove BUG_ON() from da850_register_sata()
        ARM: davinci: da850: model the SATA refclk
        ARM: davinci: da850: add con_id for the SATA clock
        ARM: davinci: da8xx-dt: add OF_DEV_AUXDATA entry for SATA
        arm: mvebu: support for SMP on 98DX3336 SoC
        dt-bindings: video: exynos7-decon: Remove obsolete samsung,power-domain property
        soc: dove: constify reset_control_ops structures
        ARM: mv78xx0: fix possible PCI buffer overflow
        MAINTAINERS: transfer maintainership for the EZX platform
        ARM: shmobile: rcar-gen2: Add more register documentation
        ARM: tegra: paz00: Fix __initdata placement
        ARM: OMAP: clock: Remove unused mpurate cmdline option
        ARM: davinci: add skeleton for pdata-quirks
        arm: sunxi: add support for V3s SoC
        ARM: OMAP2+: omap_hwmod: Add support for earlycon
        arm: hisi: drop extern hip01_cpu_die
        ...
      6ae52c65
    • Merge tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · af8999f6
      Linus Torvalds authored
      Pull ARM SoC non-urgent fixes from Arnd Bergmann:
       "We sometimes collect non-critical fixes that come in during the later
        part of the merge window in a branch for the next release instead, and
        these are the contents for v4.11.
      
        Most of these are OMAP fixes, dealing with OMAP36/37 detection, quirks
        and setup. There are also some fixes for DaVinci and a Kconfig fix for
        SCPI to only enable on ARM{,64}"
      
      * tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        firmware: arm_scpi: Add hardware dependencies
        ARM: OMAP3: Fix SoC detection of OMAP36/37 Family
        ARM: OMAP5: Add HWMOD_SWSUP_SIDLE_ACT flag for UART
        ARM: dts: Fix compatible for ti81xx uarts for 8250
        ARM: dts: Fix am335x and dm814x scm syscon to probe children
        ARM: OMAP2+: Fix init for multiple quirks for the same SoC
        ARM: dts: Fix omap3 off mode pull defines
        bus: da850-mstpri: fix my e-mail address
        ARM: davinci: da850: fix da850_set_pll0rate()
        ARM: davinci: da850: coding style fix
      af8999f6
    • Merge branch 'drm-next-4.11' of git://people.freedesktop.org/~agd5f/linux into drm-next · 1e8ad3d8
      Dave Airlie authored
      Some ttm/amd fixes.
      
      * 'drm-next-4.11' of git://people.freedesktop.org/~agd5f/linux:
        drm/amd/powerplay: fix PSI feature on Polars12.
        drm/amdgpu: refuse to reserve io mem for split VRAM buffers
        drm/ttm: fix use-after-free races in vm fault handling
        drm/amd/amdgpu: post card if there is real hw resetting performed
      1e8ad3d8
    • Merge tag 'drm/panel/for-4.11-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next · 894ebc41
      Dave Airlie authored
      drm/panel: Changes for v4.11-rc1
      
      This set contains a couple of cleanups as well as support for a few more
      simple panels.
      
      * tag 'drm/panel/for-4.11-rc1' of git://anongit.freedesktop.org/tegra/linux:
        drm/panel: simple: Specify bus width and flags for EDT displays
        drm/panel: simple: Add Netron DY E231732
        of: Add vendor prefix for Netron DY
        drm/panel: simple: Add support for Tianma TM070JDHG30
        of: Add vendor prefix for Tianma Micro-electronics
        drm/panel: simple: Add support BOE NV101WXMN51
        dt-bindings: display: Add BOE NV101WXMN51 panel binding
        drm/panel: Constify device node argument to of_drm_find_panel()
      894ebc41
    • Merge tag 'drm/tegra/for-4.11-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next · 84f7174b
      Dave Airlie authored
      drm/tegra: Changes for v4.11-rc1
      
      Just a single change that hooks up the Tegra DRM parent device to the
      correct device tree node.
      
      * tag 'drm/tegra/for-4.11-rc1' of git://anongit.freedesktop.org/tegra/linux:
        gpu: host1x: Set OF node for new host1x devices
      84f7174b
    • Merge tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 60e8d3e1
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
      
       - add ASPM L1 substate support
      
       - enable PCIe Extended Tags when supported
      
       - configure PCIe MPS settings on iProc, Versatile, X-Gene, and Xilinx
      
       - increase VPD access timeout
      
       - add ACS quirks for Intel Union Point, Qualcomm QDF2400 and QDF2432
      
       - use new pci_alloc_irq_vectors() in more drivers
      
       - fix MSI affinity memory leak
      
       - remove unused MSI interfaces and update documentation
      
       - remove unused AER .link_reset() callback
      
       - avoid pci_lock / p->pi_lock deadlock seen with perf
      
       - serialize sysfs enable/disable num_vfs operations
      
       - move DesignWare IP from drivers/pci/host/ to drivers/pci/dwc/ and
         refactor so we can support both hosts and endpoints
      
       - add DT ECAM-like support for HiSilicon Hip06/Hip07 controllers
      
       - add Rockchip system power management support
      
       - add Thunder-X cn81xx and cn83xx support
      
       - add Exynos 5440 PCIe PHY support
      
      * tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (93 commits)
        PCI: dwc: Remove dependency of designware on CONFIG_PCI
        PCI: dwc: Add CONFIG_PCIE_DW_HOST to enable PCI dwc host
        PCI: dwc: Split pcie-designware.c into host and core files
        PCI: dwc: designware: Fix style errors in pcie-designware.c
        PCI: dwc: designware: Parse "num-lanes" property in dw_pcie_setup_rc()
        PCI: dwc: all: Split struct pcie_port into host-only and core structures
        PCI: dwc: designware: Get device pointer at the start of dw_pcie_host_init()
        PCI: dwc: all: Rename cfg_read/cfg_write to read/write
        PCI: dwc: all: Use platform_set_drvdata() to save private data
        PCI: dwc: designware: Move register defines to designware header file
        PCI: dwc: Use PTR_ERR_OR_ZERO to simplify code
        PCI: dra7xx: Group PHY API invocations
        PCI: dra7xx: Enable MSI and legacy interrupts simultaneously
        PCI: dra7xx: Add support to force RC to work in GEN1 mode
        PCI: dra7xx: Simplify probe code with devm_gpiod_get_optional()
        PCI: Move DesignWare IP support to new drivers/pci/dwc/ directory
        PCI: exynos: Support the PHY generic framework
        Documentation: binding: Modify the exynos5440 PCIe binding
        phy: phy-exynos-pcie: Add support for Exynos PCIe PHY
        Documentation: samsung-phy: Add exynos-pcie-phy binding
        ...
      60e8d3e1