1. 12 Aug, 2020 11 commits
    • mm/swap: implement workingset detection for anonymous LRU · aae466b0
      Joonsoo Kim authored
      This patch implements workingset detection for the anonymous LRU.  All the
      infrastructure was implemented by the previous patches, so this patch just
      activates workingset detection by installing/retrieving the shadow
      entries and adding the refault calculation.
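      As a rough illustration of the refault calculation, here is a minimal
      sketch assuming a simplified model: a per-LRU "nonresident age" clock that
      advances on each eviction, a shadow entry that remembers the clock value at
      eviction time, and an activation rule that compares the refault distance
      with a workingset size.  struct toy_lruvec and the toy_* functions are
      hypothetical; the real mm/workingset.c code also encodes memcg and node
      IDs in the shadow, truncates the counter, and derives the workingset size
      from several LRU lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-LRU state used only for illustration. */
struct toy_lruvec {
	unsigned long nonresident_age; /* advances on each eviction */
	unsigned long workingset_size; /* pages a refaulting page competes with */
};

/* On eviction: advance the clock and return the value to store as the shadow. */
static unsigned long toy_evict(struct toy_lruvec *lruvec)
{
	return ++lruvec->nonresident_age;
}

/*
 * On refault: the distance is the number of evictions since this page was
 * evicted, i.e. roughly how much more memory it would have needed to stay
 * resident. Unsigned subtraction keeps the result right across wraparound.
 */
static bool toy_refault_should_activate(const struct toy_lruvec *lruvec,
					unsigned long shadow_eviction)
{
	unsigned long refault_distance = lruvec->nonresident_age - shadow_eviction;

	return refault_distance <= lruvec->workingset_size;
}
```

      With workingset_size = 50, a page refaulted after 49 further evictions is
      activated, while one refaulted after 59 further evictions is not.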
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Link: http://lkml.kernel.org/r/1595490560-15117-6-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swapcache: support to handle the shadow entries · 3852f676
      Joonsoo Kim authored
      Workingset detection for anonymous pages will be implemented in the
      following patch, and it requires storing the shadow entries in the
      swapcache.  This patch implements the infrastructure to store the shadow
      entry in the swapcache.
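      One way to picture a swap cache slot that can hold either a page or a
      shadow entry is the XArray value-entry convention: xa_mk_value(v) stores
      (v << 1) | 1 and xa_is_value() tests bit 0, which cannot collide with a
      word-aligned page pointer.  This sketch assumes a plain array in place of
      the XArray; struct toy_page and the toy_* helpers are hypothetical and
      only mimic that convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 64

/* Toy stand-in for struct page; only its (aligned) address matters here. */
struct toy_page {
	long dummy;
};

/* Value entries have bit 0 set, like XArray value entries. */
static uintptr_t toy_mk_value(uintptr_t v) { return (v << 1) | 1; }
static int toy_is_value(uintptr_t entry) { return (int)(entry & 1); }
static uintptr_t toy_to_value(uintptr_t entry) { return entry >> 1; }

/* Each slot is 0 (empty), a page pointer, or a shadow value entry. */
static uintptr_t toy_swap_cache[TOY_SLOTS];

/* Remove the page at offset, leaving a shadow entry behind in its slot. */
static void toy_delete_from_swap_cache(size_t offset, uintptr_t shadow)
{
	assert(offset < TOY_SLOTS);
	assert(toy_swap_cache[offset] != 0 && !toy_is_value(toy_swap_cache[offset]));
	toy_swap_cache[offset] = toy_mk_value(shadow);
}

/*
 * Insert page at offset. If the slot held a shadow entry, hand it back via
 * *shadowp and return 1 so the caller can run the refault check.
 */
static int toy_add_to_swap_cache(size_t offset, struct toy_page *page,
				 uintptr_t *shadowp)
{
	uintptr_t old;

	assert(offset < TOY_SLOTS && page != NULL);
	old = toy_swap_cache[offset];
	assert(old == 0 || toy_is_value(old)); /* slot must not hold a page */
	toy_swap_cache[offset] = (uintptr_t)page;
	if (toy_is_value(old)) {
		*shadowp = toy_to_value(old);
		return 1;
	}
	return 0;
}
```

      Deleting a page leaves its shadow in the slot; a later add at the same
      offset returns 1 and recovers that shadow, which is where the refault
      check would run.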
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: http://lkml.kernel.org/r/1595490560-15117-5-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/workingset: prepare the workingset detection infrastructure for anon LRU · 170b04b7
      Joonsoo Kim authored
      To prepare the workingset detection for anon LRU, this patch splits
      workingset event counters for refault, activate and restore into anon and
      file variants, as well as the refaults counter in struct lruvec.
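      The split can be pictured as giving each workingset event one anon slot
      and one file slot, addressed as BASE + type, so a single call site serves
      both LRUs.  This is a hypothetical standalone model loosely following that
      naming pattern; the toy_* enum and struct are not the kernel's definitions.

```c
#include <assert.h>

/* LRU type index (hypothetical toy convention): 0 = anon, 1 = file. */
enum toy_lru_type { TOY_ANON = 0, TOY_FILE = 1, TOY_NR_TYPES = 2 };

/* Each workingset event gets an anon and a file variant: BASE + type. */
enum toy_stat_item {
	TOY_REFAULT_BASE,
	TOY_REFAULT_ANON = TOY_REFAULT_BASE,
	TOY_REFAULT_FILE,
	TOY_ACTIVATE_BASE,
	TOY_ACTIVATE_ANON = TOY_ACTIVATE_BASE,
	TOY_ACTIVATE_FILE,
	TOY_RESTORE_BASE,
	TOY_RESTORE_ANON = TOY_RESTORE_BASE,
	TOY_RESTORE_FILE,
	TOY_NR_STAT_ITEMS
};

struct toy_lruvec {
	unsigned long stat[TOY_NR_STAT_ITEMS];
	/* refault snapshot, likewise split per LRU type */
	unsigned long refaults[TOY_NR_TYPES];
};

/* One call site serves both LRUs: the type picks the counter variant. */
static void toy_count_refault(struct toy_lruvec *lruvec, enum toy_lru_type type)
{
	assert(type == TOY_ANON || type == TOY_FILE);
	lruvec->stat[TOY_REFAULT_BASE + type]++;
}
```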
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Link: http://lkml.kernel.org/r/1595490560-15117-4-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan: protect the workingset on anonymous LRU · b518154e
      Joonsoo Kim authored
      In the current implementation, newly created or swapped-in anonymous pages
      start on the active list.  Growing the active list results in rebalancing
      the active/inactive lists, so old pages on the active list are demoted to
      the inactive list.  Hence, pages on the active list aren't protected at all.
      
      Following is an example of this situation.
      
      Assume that there are 50 hot pages on the active list.  Numbers denote the
      number of pages on the active/inactive list (active | inactive).  (h)
      stands for hot pages and (uo) stands for used-once pages.
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (used-once) pages
      50(uo) | 50(h)
      
      3. workload: another 50 newly created (used-once) pages
      50(uo) | 50(uo), swap-out 50(h)
      
      This patch tries to fix this issue.  Like the file LRU, newly created or
      swapped-in anonymous pages will be inserted into the inactive list.  They
      are promoted to the active list if they are referenced enough.  This simple
      modification changes the above example as follows.
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (used-once) pages
      50(h) | 50(uo)
      
      3. workload: another 50 newly created (used-once) pages
      50(h) | 50(uo), swap-out 50(uo)
      
      As you can see, hot pages on the active list are protected.
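      The three-step example can be reproduced with a small toy simulation,
      assuming a system that holds 100 pages (the capacity given in the series
      cover letter), FIFO lists whose used-once pages are never referenced
      again, reclaim that evicts from the inactive tail, and demotion that keeps
      the lists balanced 1:1.  It is not the kernel's reclaim code; struct
      toy_lru, fault_pages() and the other helpers are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CAPACITY 100           /* pages the toy system can hold */
#define MAXLIST (2 * CAPACITY) /* room for a temporary overshoot */

/* Toy LRU list: page[0] is the tail (oldest), page[len - 1] the head. */
struct toy_list {
	char page[MAXLIST];
	int len;
};

struct toy_lru {
	struct toy_list active, inactive;
	int swapped_out[128]; /* evicted pages, counted per tag character */
};

static void push_head(struct toy_list *l, char tag, int n)
{
	assert(l->len + n <= MAXLIST);
	for (int i = 0; i < n; i++)
		l->page[l->len++] = tag;
}

static char pop_tail(struct toy_list *l)
{
	char tag;

	assert(l->len > 0);
	tag = l->page[0];
	memmove(l->page, l->page + 1, (size_t)(l->len - 1));
	l->len--;
	return tag;
}

/*
 * Fault in n used-once pages tagged 'tag'. start_active selects the old
 * policy (new pages start on the active list) or the patched one (they
 * start on the inactive list). Reclaim evicts from the inactive tail until
 * everything fits, then active pages are demoted until the lists are 1:1.
 */
static void fault_pages(struct toy_lru *lru, char tag, int n, int start_active)
{
	push_head(start_active ? &lru->active : &lru->inactive, tag, n);

	while (lru->active.len + lru->inactive.len > CAPACITY) {
		if (lru->inactive.len == 0) /* nothing to evict: age active first */
			push_head(&lru->inactive, pop_tail(&lru->active), 1);
		lru->swapped_out[(unsigned char)pop_tail(&lru->inactive)]++;
	}
	while (lru->active.len > lru->inactive.len)
		push_head(&lru->inactive, pop_tail(&lru->active), 1);
}
```

      Starting from 50 'h' pages on the active list and faulting in two batches
      of 50 'u' pages, start_active = 1 ends with swapped_out['h'] == 50, while
      start_active = 0 ends with swapped_out['u'] == 50 and the hot pages still
      active, matching the two example sequences above.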
      
      Note that this implementation has a drawback: a page cannot be promoted,
      and will be swapped out, if its re-access interval is greater than the size
      of the inactive list but less than the total size (active + inactive).  To
      solve this potential issue, the following patch applies workingset
      detection similar to the one already applied to the file LRU.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Link: http://lkml.kernel.org/r/1595490560-15117-3-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan: make active/inactive ratio as 1:1 for anon lru · ccc5dc67
      Joonsoo Kim authored
      Patch series "workingset protection/detection on the anonymous LRU list", v7.
      
      * PROBLEM
      In the current implementation, newly created or swapped-in anonymous pages
      start on the active list.  Growing the active list results in
      rebalancing the active/inactive lists, so old pages on the active list are
      demoted to the inactive list.  Hence, hot pages on the active list aren't
      protected at all.
      
      Following is an example of this situation.
      
      Assume that there are 50 hot pages on the active list and that the system
      can hold 100 pages in total.  Numbers denote the number of pages on the
      active/inactive list (active | inactive).  (h) stands for hot pages and
      (uo) stands for used-once pages.
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (used-once) pages
      50(uo) | 50(h)
      
      3. workload: another 50 newly created (used-once) pages
      50(uo) | 50(uo), swap-out 50(h)
      
      As we can see, hot pages are swapped out, which will cause swap-ins later.
      
      * SOLUTION
      Since this is what we want to avoid, this patchset implements workingset
      protection.  Like the file LRU list, newly created or swapped-in anonymous
      pages start on the inactive list.  Also, like the file LRU list, a page is
      promoted if it is referenced enough.  This simple modification changes the
      above example as follows.
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (used-once) pages
      50(h) | 50(uo)
      
      3. workload: another 50 newly created (used-once) pages
      50(h) | 50(uo), swap-out 50(uo)
      
      Hot pages remain in the active list. :)
      
      * EXPERIMENT
      I tested this scenario on my test bed and confirmed that this problem
      happens in the current implementation.  I also checked that it is fixed by
      this patchset.
      
      * SUBJECT
      workingset detection
      
      * PROBLEM
      The later part of the patchset implements workingset detection for the
      anonymous LRU list.  There is a corner case where workingset protection
      can cause thrashing.  If we can avoid thrashing with workingset detection,
      we get better performance.
      
      Following is an example of thrashing due to the workingset protection.
      
      1. 50 hot pages on active list
      50(h) | 0
      
      2. workload: 50 newly created (will be hot) pages
      50(h) | 50(wh)
      
      3. workload: another 50 newly created (used-once) pages
      50(h) | 50(uo), swap-out 50(wh)
      
      4. workload: 50 (will be hot) pages
      50(h) | 50(wh), swap-in 50(wh)
      
      5. workload: another 50 newly created (used-once) pages
      50(h) | 50(uo), swap-out 50(wh)
      
      6. repeat 4, 5
      
      Without workingset detection, the pages of this kind of workload cannot be
      promoted, and thrashing continues forever.
      
      * SOLUTION
      Therefore, this patchset implements workingset detection.  All the
      infrastructure for workingset detection is already implemented, so there is
      not much work to do.  First, extend the workingset detection code to deal
      with the anonymous LRU list.  Then, make the swap cache handle the
      exceptional value for the shadow entry.  Lastly, install/retrieve the shadow
      value into/from the swap cache and check the refault distance.
      
      * EXPERIMENT
      I made a test program to imitate the above scenario and confirmed that the
      problem exists.  Then, I checked that this patchset fixes it.
      
      My test setup is a virtual machine with 8 cpus and 6100MB memory.  However,
      the test program can use only about 280 MB of it, because the system uses
      a large ram-backed swap and a large ramdisk to capture the trace.
      
      The test scenario is as follows.
      
      1. allocate cold memory (512MB)
      2. allocate hot-1 memory (96MB)
      3. activate hot-1 memory (96MB)
      4. allocate another hot-2 memory (96MB)
      5. access cold memory (128MB)
      6. access hot-2 memory (96MB)
      7. repeat 5, 6
      
      Since hot-1 memory (96MB) is on the active list, the inactive list can
      contain roughly 190MB of pages.  hot-2 memory's re-access interval (96+128
      MB = 224MB) is more than 190MB, so it cannot be promoted without workingset
      detection, and swap-in/out happens repeatedly.  With this patchset,
      workingset detection works and promotion happens.  Therefore, less
      swap-in/out occurs.
      
      Here is the result. (average of 5 runs)
      
      type    swap-in   swap-out
      base    863240    989945
      patch   681565    809273
      
      As we can see, the patched kernel does less swap-in/out.
      
      * OVERALL TEST (ebizzy using modified random function)
      ebizzy is a test program in which the main thread allocates lots of memory
      and child threads access it randomly for a given time.  Swap-in will
      happen if the allocated memory is larger than the system memory.
      
      A random function that follows a zipf distribution is used to create
      hot/cold memory.  The hot/cold ratio is controlled by a parameter.  If the
      parameter is high, hot memory is accessed much more often than cold memory.
      If the parameter is low, the number of accesses to each memory region is
      similar.  I used various parameters to show the effect of the patchset on
      workloads with various hot/cold ratios.
      
      My test setup is a virtual machine with 8 cpus, 1024 MB memory and 5120 MB
      ram swap.
      
      The result format is as follows.
      
      param: 1-1024-0.1
      - 1 (number of thread)
      - 1024 (allocated memory size, MB)
      - 0.1 (zipf distribution alpha;
      0.1 behaves roughly like a uniform random distribution,
      1.3 makes a small portion of memory hot and the rest cold)
      
      pswpin: smaller is better
      std: standard deviation
      improvement: negative is better
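      The improvement column appears to be the percentage change of the mean
      pswpin relative to base; this formula is inferred from the numbers rather
      than stated.  For example, the first single-thread prot row gives
      (14065875.80 - 14101983.40) / 14101983.40 * 100 = -0.256, printed as
      -0.26.  A small helper (hypothetical name) computing it:

```c
#include <assert.h>

/*
 * Relative change of a patched kernel's pswpin against the base kernel,
 * in percent. Negative means fewer swap-ins, i.e. an improvement.
 */
static double improvement_pct(double patched_pswpin, double base_pswpin)
{
	assert(base_pswpin > 0.0);
	return (patched_pswpin - base_pswpin) / base_pswpin * 100.0;
}
```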
      
      * single thread
                 param        pswpin       std       improvement
            base 1-1024.0-0.1 14101983.40   79441.19
            prot 1-1024.0-0.1 14065875.80  136413.01  (   -0.26 )
          detect 1-1024.0-0.1 13910435.60  100804.82  (   -1.36 )
            base 1-1024.0-0.7 7998368.80   43469.32
            prot 1-1024.0-0.7 7622245.80   88318.74  (   -4.70 )
          detect 1-1024.0-0.7 7618515.20   59742.07  (   -4.75 )
            base 1-1024.0-1.3 1017400.80   38756.30
            prot 1-1024.0-1.3  940464.60   29310.69  (   -7.56 )
          detect 1-1024.0-1.3  945511.40   24579.52  (   -7.07 )
            base 1-1280.0-0.1 22895541.40   50016.08
            prot 1-1280.0-0.1 22860305.40   51952.37  (   -0.15 )
          detect 1-1280.0-0.1 22705565.20   93380.35  (   -0.83 )
            base 1-1280.0-0.7 13717645.60   46250.65
            prot 1-1280.0-0.7 12935355.80   64754.43  (   -5.70 )
          detect 1-1280.0-0.7 13040232.00   63304.00  (   -4.94 )
            base 1-1280.0-1.3 1654251.40    4159.68
            prot 1-1280.0-1.3 1522680.60   33673.50  (   -7.95 )
          detect 1-1280.0-1.3 1599207.00   70327.89  (   -3.33 )
            base 1-1536.0-0.1 31621775.40   31156.28
            prot 1-1536.0-0.1 31540355.20   62241.36  (   -0.26 )
          detect 1-1536.0-0.1 31420056.00  123831.27  (   -0.64 )
            base 1-1536.0-0.7 19620760.60   60937.60
            prot 1-1536.0-0.7 18337839.60   56102.58  (   -6.54 )
          detect 1-1536.0-0.7 18599128.00   75289.48  (   -5.21 )
            base 1-1536.0-1.3 2378142.40   20994.43
            prot 1-1536.0-1.3 2166260.60   48455.46  (   -8.91 )
          detect 1-1536.0-1.3 2183762.20   16883.24  (   -8.17 )
            base 1-1792.0-0.1 40259714.80   90750.70
            prot 1-1792.0-0.1 40053917.20   64509.47  (   -0.51 )
          detect 1-1792.0-0.1 39949736.40  104989.64  (   -0.77 )
            base 1-1792.0-0.7 25704884.40   69429.68
            prot 1-1792.0-0.7 23937389.00   79945.60  (   -6.88 )
          detect 1-1792.0-0.7 24271902.00   35044.30  (   -5.57 )
            base 1-1792.0-1.3 3129497.00   32731.86
            prot 1-1792.0-1.3 2796994.40   19017.26  (  -10.62 )
          detect 1-1792.0-1.3 2886840.40   33938.82  (   -7.75 )
            base 1-2048.0-0.1 48746924.40   50863.88
            prot 1-2048.0-0.1 48631954.40   24537.30  (   -0.24 )
          detect 1-2048.0-0.1 48509419.80   27085.34  (   -0.49 )
            base 1-2048.0-0.7 32046424.40   78624.22
            prot 1-2048.0-0.7 29764182.20   86002.26  (   -7.12 )
          detect 1-2048.0-0.7 30250315.80  101282.14  (   -5.60 )
            base 1-2048.0-1.3 3916723.60   24048.55
            prot 1-2048.0-1.3 3490781.60   33292.61  (  -10.87 )
          detect 1-2048.0-1.3 3585002.20   44942.04  (   -8.47 )
      
      * multi thread
                 param        pswpin       std       improvement
            base 8-1024.0-0.1 16219822.60  329474.01
            prot 8-1024.0-0.1 15959494.00  654597.45  (   -1.61 )
          detect 8-1024.0-0.1 15773790.80  502275.25  (   -2.75 )
            base 8-1024.0-0.7 9174107.80  537619.33
            prot 8-1024.0-0.7 8571915.00  385230.08  (   -6.56 )
          detect 8-1024.0-0.7 8489484.20  364683.00  (   -7.46 )
            base 8-1024.0-1.3 1108495.60   83555.98
            prot 8-1024.0-1.3 1038906.20   63465.20  (   -6.28 )
          detect 8-1024.0-1.3  941817.80   32648.80  (  -15.04 )
            base 8-1280.0-0.1 25776114.20  450480.45
            prot 8-1280.0-0.1 25430847.00  465627.07  (   -1.34 )
          detect 8-1280.0-0.1 25282555.00  465666.55  (   -1.91 )
            base 8-1280.0-0.7 15218968.00  702007.69
            prot 8-1280.0-0.7 13957947.80  492643.86  (   -8.29 )
          detect 8-1280.0-0.7 14158331.20  238656.02  (   -6.97 )
            base 8-1280.0-1.3 1792482.80   30512.90
            prot 8-1280.0-1.3 1577686.40   34002.62  (  -11.98 )
          detect 8-1280.0-1.3 1556133.00   22944.79  (  -13.19 )
            base 8-1536.0-0.1 33923761.40  575455.85
            prot 8-1536.0-0.1 32715766.20  300633.51  (   -3.56 )
          detect 8-1536.0-0.1 33158477.40  117764.51  (   -2.26 )
            base 8-1536.0-0.7 20628907.80  303851.34
            prot 8-1536.0-0.7 19329511.20  341719.31  (   -6.30 )
          detect 8-1536.0-0.7 20013934.00  385358.66  (   -2.98 )
            base 8-1536.0-1.3 2588106.40  130769.20
            prot 8-1536.0-1.3 2275222.40   89637.06  (  -12.09 )
          detect 8-1536.0-1.3 2365008.40  124412.55  (   -8.62 )
            base 8-1792.0-0.1 43328279.20  946469.12
            prot 8-1792.0-0.1 41481980.80  525690.89  (   -4.26 )
          detect 8-1792.0-0.1 41713944.60  406798.93  (   -3.73 )
            base 8-1792.0-0.7 27155647.40  536253.57
            prot 8-1792.0-0.7 24989406.80  502734.52  (   -7.98 )
          detect 8-1792.0-0.7 25524806.40  263237.87  (   -6.01 )
            base 8-1792.0-1.3 3260372.80  137907.92
            prot 8-1792.0-1.3 2879187.80   63597.26  (  -11.69 )
          detect 8-1792.0-1.3 2892962.20   33229.13  (  -11.27 )
            base 8-2048.0-0.1 50583989.80  710121.48
            prot 8-2048.0-0.1 49599984.40  228782.42  (   -1.95 )
          detect 8-2048.0-0.1 50578596.00  660971.66  (   -0.01 )
            base 8-2048.0-0.7 33765479.60  812659.55
            prot 8-2048.0-0.7 30767021.20  462907.24  (   -8.88 )
          detect 8-2048.0-0.7 32213068.80  211884.24  (   -4.60 )
            base 8-2048.0-1.3 3941675.80   28436.45
            prot 8-2048.0-1.3 3538742.40   76856.08  (  -10.22 )
          detect 8-2048.0-1.3 3579397.80   58630.95  (   -9.19 )
      
      As we can see, all the cases show improvement.  In particular, the test
      cases with zipf alpha 1.3 show larger improvements.  This means that if
      anon pages have a hot/cold tendency, this patchset works better.
      
      This patch (of 6):
      
      The current implementation of LRU management for anonymous pages has some
      problems.  The most important one is that it doesn't protect the
      workingset, that is, the pages on the active LRU list.  Although this
      problem will be fixed by the following patches, some preparation is
      required, and this patch does it.
      
      The following patch implements workingset protection.  After it, newly
      created or swapped-in pages will start their lifetime on the inactive list.
      If the inactive list is too small, a page does not get enough chance to be
      referenced and cannot become part of the workingset.
      
      To give newly created anonymous or swapped-in pages enough chance to be
      referenced again, this patch makes the active/inactive LRU ratio 1:1.
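      For context, mm/vmscan.c decides whether the inactive list is too small
      using a size-dependent target ratio of about sqrt(10 * total GB) between
      active and inactive pages; making anon 1:1 amounts to using a ratio of 1
      for the anon LRU.  The standalone sketch below models that rule;
      toy_inactive_is_low(), isqrt() and the page_shift parameter are
      hypothetical stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer square root by simple search (stand-in for the kernel's int_sqrt()). */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Is the inactive list too small? For file pages the target ratio grows
 * with LRU size (sqrt(10 * GB)); the anon LRU is modeled as always 1:1.
 */
static bool toy_inactive_is_low(unsigned long inactive, unsigned long active,
				bool is_file, unsigned int page_shift)
{
	unsigned long gb, ratio;

	assert(page_shift < 30);
	gb = (inactive + active) >> (30 - page_shift);
	ratio = (gb && is_file) ? isqrt(10 * gb) : 1;
	return inactive * ratio < active;
}
```

      With 4KB pages (page_shift = 12), 1GB inactive and 3GB active gives gb = 4
      and a file ratio of isqrt(40) = 6, so the file check treats the inactive
      list as large enough, while the 1:1 anon check reports it as low.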
      
      This is just a temporary measure.  A later patch in the series introduces
      workingset detection for the anonymous LRU, which will be used to better
      decide whether pages should start on the active or inactive list.
      Afterwards, this patch is effectively reverted.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/1595490560-15117-1-git-send-email-iamjoonsoo.kim@lge.com
      Link: http://lkml.kernel.org/r/1595490560-15117-2-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: add mempolicy check in the reservation routine · 8ca39e68
      Muchun Song authored
      In the reservation routine, we only check whether the cpuset meets the
      memory allocation requirements, but we ignore the MPOL_BIND mempolicy
      case.  An mmap of hugetlb memory may succeed, yet the subsequent memory
      allocation may fail due to mempolicy restrictions, and the process then
      receives a SIGBUS signal.  This can be reproduced by the following steps.
      
       1) Compile the test case.
          cd tools/testing/selftests/vm/
          gcc map_hugetlb.c -o map_hugetlb
      
       2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the
          system. Each node will pre-allocate one huge page.
          echo 2 > /proc/sys/vm/nr_hugepages
      
       3) Run the test case (mmap 4MB).  We receive the SIGBUS signal.
          numactl --membind=0 ./map_hugetlb 4
      
      With this patch applied, the mmap fails in step 3) with
      "mmap: Cannot allocate memory".
      
      [akpm@linux-foundation.org: include sched.h for `current']
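      The fix can be pictured as counting only the free huge pages on nodes
      allowed by both the cpuset and the MPOL_BIND node mask before accepting a
      reservation.  A minimal sketch using bitmask node sets; the toy_* names are
      hypothetical, and the 2MB huge page size (so the 4MB mmap in the
      reproducer needs 2 pages) is an assumption.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_NODES 8

/*
 * Count free huge pages on nodes allowed by both the cpuset and an
 * MPOL_BIND-style policy mask (bit n set = node n allowed).
 */
static unsigned long toy_allowed_free_huge_pages(const unsigned long free_per_node[],
						 unsigned int cpuset_mask,
						 unsigned int bind_mask)
{
	unsigned int allowed = cpuset_mask & bind_mask;
	unsigned long nr = 0;

	for (int node = 0; node < TOY_MAX_NODES; node++)
		if (allowed & (1u << node))
			nr += free_per_node[node];
	return nr;
}

/* Refuse the reservation up front (mmap gets ENOMEM) instead of SIGBUS later. */
static int toy_reserve_huge_pages(const unsigned long free_per_node[],
				  unsigned int cpuset_mask, unsigned int bind_mask,
				  unsigned long delta)
{
	if (delta > toy_allowed_free_huge_pages(free_per_node, cpuset_mask, bind_mask))
		return -ENOMEM;
	return 0;
}
```

      With one free huge page on each of two nodes, binding to node 0 (mask 0x1)
      makes a 2-page reservation return -ENOMEM, while allowing both nodes (0x3)
      succeeds.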
      Reported-by: Jianchao Guo <guojianchao@bytedance.com>
      Suggested-by: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Link: http://lkml.kernel.org/r/20200728034938.14993-1-songmuchun@bytedance.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kselftests: cgroup: add percpu memory accounting test · 90631e1d
      Roman Gushchin authored
      Add a simple test to check the percpu memory accounting.  The test creates
      a cgroup tree with 1000 child cgroups and checks values of memory.current
      and memory.stat::percpu.
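      Such a test has to read single keys out of memory.stat, such as the
      percpu line.  A minimal parsing helper might look like this; the function
      name is hypothetical and this is not the selftest's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of `key` from memory.stat-style text ("key value" per
 * line), or -1 if the key is missing. Only whole keys match, so "percpu"
 * does not match a line whose key merely starts with "percpu".
 */
static long long toy_stat_value(const char *stat, const char *key)
{
	size_t klen = strlen(key);
	const char *line = stat;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return strtoll(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```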
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Bixuan Cui <cuibixuan@huawei.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200608230819.832349-6-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: charge memcg percpu memory to the parent cgroup · 3e38e0aa
      Roman Gushchin authored
      Memory cgroups are using large chunks of percpu memory to store vmstat
      data.  Yet this memory is not accounted at all, so in the case when there
      are many (dying) cgroups, it's not exactly clear where all the memory is.
      
      Because the size of memory cgroup internal structures can dramatically
      exceed the size of the object or page that pins them in memory, it's not
      a good idea to simply ignore it.  It actually breaks the isolation
      between cgroups.
      
      Let's account the consumed percpu memory to the parent cgroup.
      
      [guro@fb.com: add WARN_ON_ONCE()s, per Johannes]
        Link: http://lkml.kernel.org/r/20200811170611.GB1507044@carbon.DHCP.thefacebook.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Bixuan Cui <cuibixuan@huawei.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200623184515.4132564-5-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg/percpu: per-memcg percpu memory statistics · 772616b0
      Roman Gushchin authored
      Percpu memory can represent a noticeable chunk of the total memory
      consumption, especially on big machines with many CPUs.  Let's track
      percpu memory usage for each memcg and display it in memory.stat.
      
      A percpu allocation is usually scattered over multiple pages (and nodes),
      and can be significantly smaller than a page.  So let's add a byte-sized
      counter on the memcg level: MEMCG_PERCPU_B.  The byte-sized vmstat
      infrastructure created for slabs can be reused as-is for the percpu case.
      
      [guro@fb.com: v3]
        Link: http://lkml.kernel.org/r/20200623184515.4132564-4-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Bixuan Cui <cuibixuan@huawei.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200608230819.832349-4-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg/percpu: account percpu memory to memory cgroups · 3c7be18a
      Roman Gushchin authored
      Percpu memory is becoming more and more widely used by various subsystems,
      and the total amount of memory controlled by the percpu allocator can make
      a good part of the total memory.
      
      As an example, bpf maps can consume a lot of percpu memory, and they are
      created by a user.  Also, some cgroup internals (e.g.  memory controller
      statistics) can be quite large.  On a machine with many CPUs and big
      number of cgroups they can consume hundreds of megabytes.
      
      So the lack of memcg accounting is creating a breach in the memory
      isolation.  Similar to the slab memory, percpu memory should be accounted
      by default.
      
      To implement percpu accounting, the slab memory accounting can be taken
      as a model.  Let's introduce two types of percpu chunks: root and memcg.
      What makes memcg chunks different is additional space allocated to store
      memcg membership information.  If __GFP_ACCOUNT is passed on allocation, a
      memcg chunk should be used.  If it's possible to charge the corresponding
      size to the target memory cgroup, the allocation is performed and the memcg
      ownership data is recorded.  System-wide allocations are performed using
      root chunks, so there is no additional memory overhead.
      
      To implement a fast reparenting of percpu memory on memcg removal, we
      don't store mem_cgroup pointers directly: instead we use obj_cgroup API,
      introduced for slab accounting.
      
      [akpm@linux-foundation.org: fix CONFIG_MEMCG_KMEM=n build errors and warning]
      [akpm@linux-foundation.org: move unreachable code, per Roman]
      [cuibixuan@huawei.com: mm/percpu: fix 'defined but not used' warning]
        Link: http://lkml.kernel.org/r/6d41b939-a741-b521-a7a2-e7296ec16219@huawei.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Bixuan Cui <cuibixuan@huawei.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200623184515.4132564-3-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • percpu: return number of released bytes from pcpu_free_area() · 5b32af91
      Roman Gushchin authored
      Patch series "mm: memcg accounting of percpu memory", v3.
      
      This patchset adds percpu memory accounting to memory cgroups.  It's based
      on the rework of the slab controller and reuses concepts and features
      introduced for the per-object slab accounting.
      
      Percpu memory is becoming more and more widely used by various subsystems,
      and the total amount of memory controlled by the percpu allocator can make
      a good part of the total memory.
      
      As an example, bpf maps can consume a lot of percpu memory, and they are
      created by a user.  Also, some cgroup internals (e.g.  memory controller
      statistics) can be quite large.  On a machine with many CPUs and big
      number of cgroups they can consume hundreds of megabytes.
      
      So the lack of memcg accounting is creating a breach in the memory
      isolation.  Similar to the slab memory, percpu memory should be accounted
      by default.
      
      Percpu allocations by their nature are scattered over multiple pages, so
      they can't be tracked on the per-page basis.  So the per-object tracking
      introduced by the new slab controller is reused.
      
      The patchset implements charging of percpu allocations, adds memcg-level
      statistics, enables accounting for percpu allocations made by memory
      cgroup internals and provides some basic tests.
      
      To implement the accounting of percpu memory without a significant memory
      and performance overhead the following approach is used: all accounted
      allocations are placed into a separate percpu chunk (or chunks).  These
      chunks are similar to default chunks, except that they have an attached
      vector of pointers to obj_cgroup objects, which is big enough to save a
      pointer for each allocated object.  On allocation, if the allocation has
      to be accounted (__GFP_ACCOUNT is passed, the allocating process belongs
      to a non-root memory cgroup, etc.), the memory cgroup is charged and, if
      the maximum limit is not exceeded, the allocation is performed using a
      memcg-aware chunk.  Otherwise -ENOMEM is returned or the allocation is
      forced over the limit, depending on the gfp flags (as with any other
      kernel memory allocation).  The memory cgroup information is saved in
      the obj_cgroup vector at the corresponding offset.  At release time the
      memcg information is restored from the vector and the cgroup is
      uncharged.  Unaccounted allocations (at this point the vast majority of
      all percpu allocations) are performed in the old way, so no additional
      overhead is expected.
      
      To avoid pinning dying memory cgroups by outstanding allocations,
      obj_cgroup API is used instead of directly saving memory cgroup pointers.
      obj_cgroup is basically a pointer to a memory cgroup with a standalone
      reference counter.  The trick is that it can be atomically swapped to
      point at the parent cgroup, so that the original memory cgroup can be
      released before all the objects that were charged to it.  Because all
      charges and statistics are fully recursive, it's perfectly correct to
      uncharge the parent cgroup instead.  This scheme is used in the slab
      memory accounting, and percpu memory can just follow the scheme.
      
      This patch (of 5):
      
      To implement accounting of percpu memory we need to know the size of the
      freed object.  Return it from pcpu_free_area().
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Bixuan Cui <cuibixuan@huawei.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200623184515.4132564-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20200608230819.832349-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20200608230819.832349-2-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b32af91
  2. 11 Aug, 2020 6 commits
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · 00e4db51
      Linus Torvalds authored
      Pull perf tools updates from Arnaldo Carvalho de Melo:
       "New features:
      
         - Introduce controlling how 'perf stat' and 'perf record' works via a
           control file descriptor, allowing starting with events configured
           but disabled until commands are received via the control file
           descriptor. This allows, for instance, tools such as Intel VTune
           to make further use of perf as their Linux platform driver.
      
         - Improve 'perf record' to store the clockid used in the perf.data
           file header, to help later correlate things like syslog files with
           the perf events recorded.
      
         - Add basic syscall and find_next_bit benchmarks to 'perf bench'.
      
         - Allow using computed metrics in calculating other metrics. For
           instance:
      
      	  {
      	    .metric_expr    = "l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit",
      	    .metric_name    = "DCache_L2_All_Hits",
      	  },
      	  {
      	    .metric_expr    = "max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss",
      	    .metric_name    = "DCache_L2_All_Miss",
      	  },
      	  {
      	     .metric_expr    = "dcache_l2_all_hits + dcache_l2_all_miss",
      	     .metric_name    = "DCache_L2_All",
      	  }
      
         - Add support for 'd_ratio', '>' and '<' operators to the expression
           resolver used in calculating metrics in 'perf stat'.
      
        Support for new kernel features:
      
         - Support TEXT_POKE and KSYMBOL_TYPE_OOL perf metadata events to cope
           with things like ftrace, trampolines, i.e. changes in the kernel
           text that get in the way of properly decoding Intel PT hardware
           traces, for instance.
      
        Intel PT:
      
         - Add various knobs to reduce the volume of Intel PT traces by
           reducing the level of detail, such as decoding just some types of
           packets (e.g., FUP/TIP, PSB+), also filtering by time range.
      
         - Add new itrace options (log flags to the 'd' option, error flags to
           the 'e' one, etc), controlling how Intel PT is transformed into
           perf events, and document some missing options (e.g., how to synthesize
           callchains).
      
        BPF:
      
         - Properly report BPF errors when parsing events.
      
         - Do not set up side-band events if LIBBPF is not linked, fixing a
           segfault.
      
        Libraries:
      
         - Improvements to the libtraceevent plugin mechanism.
      
         - Improve libtraceevent support for KVM trace events SVM exit reasons.
      
         - Add libtraceevent plugins for decoding syscalls/sys_enter_futex
           and for tlb_flush.
      
         - Ensure sample_period is set for libpfm4 events in 'perf test'.
      
         - Fixup libperf namespacing, to make sure what is in libperf has the
           perf_ namespace while what is now only in tools/perf/ doesn't use
           that prefix.
      
        Arch specific:
      
         - Improve the testing of vendor events and metrics in 'perf test'.
      
         - Allow no ARM CoreSight hardware tracer sink to be specified on
           command line.
      
         - Fix arm_spe_x recording when mixed with other perf events.
      
         - Add s390 idle functions 'psw_idle' and 'psw_idle_exit' to list of
           idle symbols.
      
         - List kernel supplied event aliases for arm64 in 'perf list'.
      
         - Add support for extended register capability in PowerPC 9 and 10.
      
         - Add nest IMC power9 metric events.
      
        Miscellaneous:
      
         - Do not set up sample_regs_intr/sample_regs_user for dummy
           events.
      
         - Update various copies of kernel headers, some causing perf to
           handle new syscalls, MSRs, etc.
      
         - Improve usage of flex and yacc, enabling warnings and addressing
           the fallout.
      
         - Add missing '--output' option to 'perf kmem' so that it can pass it
           along to 'perf record'.
      
         - 'perf probe' fixes related to adding multiple probes on the same
           address for the same event.
      
         - Make 'perf probe' warn if the target function is a GNU indirect
           function.
      
         - Remove //anon mmap events from 'perf inject jit' to fix supporting
           both using ELF files for generated functions and the perf-PID.map
           approaches"
      
      * tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (144 commits)
        perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set
        perf tools powerpc: Add support for extended regs in power10
        perf tools powerpc: Add support for extended register capability
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        tools arch x86: Sync asm/cpufeatures.h with the kernel sources
        tools arch x86: Sync the msr-index.h copy with the kernel sources
        tools headers UAPI: update linux/in.h copy
        tools headers API: Update close_range affected files
        perf script: Add 'tod' field to display time of day
        perf script: Change the 'enum perf_output_field' enumerators to be 64 bits
        perf data: Add support to store time of day in CTF data conversion
        perf tools: Move clockid_res_ns under clock struct
        perf header: Store clock references for -k/--clockid option
        perf tools: Add clockid_name function
        perf clockid: Move parse_clockid() to new clockid object
        tools lib traceevent: Handle possible strdup() error in tep_add_plugin_path() API
        libtraceevent: Fixed description of tep_add_plugin_path() API
        libtraceevent: Fixed type in PRINT_FMT_STING
        libtraceevent: Fixed broken indentation in parse_ip4_print_args()
        libtraceevent: Improve error handling of tep_plugin_add_option() API
        ...
      00e4db51
    • Linus Torvalds's avatar
      Merge tag 'ktest-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · ed3854ff
      Linus Torvalds authored
      Pull ktest updates from Steven Rostedt:
      
       - Have config-bisect save the good/bad configs at each step.
      
       - Show log file location even on success
      
       - Add PRE_TEST_DIE to kill test if the PRE_TEST fails
      
       - Add a NOT operator for conditionals in config file
      
       - Add the log output of the last test when emailing on failure.
      
       - Other minor clean ups and small fixes.
      
      * tag 'ktest-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest.pl: Fix spelling mistake "Cant" -> "Can't"
        ktest.pl: Change the logic to control the size of the log file emailed
        ktest.pl: Add MAIL_MAX_SIZE to limit the amount of log emailed
        ktest.pl: Add the log of last test in email on failure
        ktest.pl: Turn off buffering to the log file
        ktest.pl: Just open up the log file once
        ktest.pl: Add a NOT operator
        ktest.pl: Define PRE_TEST_DIE to kill the test if the PRE_TEST fails
        ktest.pl: Always show log file location if defined even on success
        ktest.pl: Have config-bisect save each config used in the bisect
      ed3854ff
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 97d052ea
      Linus Torvalds authored
      Pull locking updates from Thomas Gleixner:
       "A set of locking fixes and updates:
      
         - Untangle the header spaghetti which causes build failures in
           various situations caused by the lockdep additions to seqcount to
           validate that the write side critical sections are non-preemptible.
      
         - The seqcount associated lock debug addons which were blocked by the
           above fallout.
      
           seqcount writers, contrary to seqlock writers, must be externally
           serialized, which usually happens via locking - except for strict
           per CPU seqcounts. As the lock is not part of the seqcount, lockdep
           cannot validate that the lock is held.
      
           This new debug mechanism adds the concept of associated locks.
           A sequence count now has lock type variants and corresponding
           initializers which take a pointer to the associated lock used for
           writer serialization. If lockdep is enabled the pointer is stored
           and write_seqcount_begin() has a lockdep assertion to validate that
           the lock is held.
      
           Aside from the type and the initializer, no other code changes are
           required at the seqcount usage sites. The rest of the seqcount API
           is unchanged and determines the type at compile time with the help
           of _Generic which is possible now that the minimal GCC version has
           been moved up.
      
           Adding this lockdep coverage unearthed a handful of seqcount bugs
           which have been addressed already independent of this.
      
           While generally useful, this comes with a Trojan Horse twist: On RT
           kernels the write side critical section can become preemptible if
           the writers are serialized by an associated lock, which leads to
           the well known reader preempts writer livelock. RT prevents this by
           storing the associated lock pointer independent of lockdep in the
           seqcount and changing the reader side to block on the lock when a
           reader detects that a writer is in the write side critical section.
      
         - Conversion of seqcount usage sites to associated types and
           initializers"
      
      * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
        locking/seqlock, headers: Untangle the spaghetti monster
        locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
        x86/headers: Remove APIC headers from <asm/smp.h>
        seqcount: More consistent seqprop names
        seqcount: Compress SEQCNT_LOCKNAME_ZERO()
        seqlock: Fold seqcount_LOCKNAME_init() definition
        seqlock: Fold seqcount_LOCKNAME_t definition
        seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
        hrtimer: Use sequence counter with associated raw spinlock
        kvm/eventfd: Use sequence counter with associated spinlock
        userfaultfd: Use sequence counter with associated spinlock
        NFSv4: Use sequence counter with associated spinlock
        iocost: Use sequence counter with associated spinlock
        raid5: Use sequence counter with associated spinlock
        vfs: Use sequence counter with associated spinlock
        timekeeping: Use sequence counter with associated raw spinlock
        xfrm: policy: Use sequence counters with associated lock
        netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
        netfilter: conntrack: Use sequence counter with associated spinlock
        sched: tasks: Use sequence counter with associated spinlock
        ...
      97d052ea
    • Linus Torvalds's avatar
      Merge tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 086ba2ec
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "In this round, we've added two small interfaces: (a) GC_URGENT_LOW
        mode for performance and (b) F2FS_IOC_SEC_TRIM_FILE ioctl for
        security.
      
        The new GC mode allows Android to run some lower priority GCs in
        background, while the new ioctl discards user information without race
        condition when the account is removed.
      
        In addition, some patches were merged to address latency-related
        issues. We've fixed some compression-related bugs as well as edge
        race conditions.
      
        Enhancements:
         - add GC_URGENT_LOW mode in gc_urgent
         - introduce F2FS_IOC_SEC_TRIM_FILE ioctl
         - bypass racy readahead to improve read latencies
         - shrink node_write lock coverage to avoid long latency
      
        Bug fixes:
         - fix missing compression flag control, i_size, and mount option
         - fix deadlock between quota writes and checkpoint
         - remove inode eviction path in synchronous path to avoid deadlock
         - fix to wait GCed compressed page writeback
         - fix a kernel panic in f2fs_is_compressed_page
         - check page dirty status before writeback
         - wait page writeback before update in node page write flow
         - fix a race condition between f2fs_write_end_io and f2fs_del_fsync_node_entry
      
        We've added some minor sanity checks and refactored trivial code
        blocks for better readability and debugging information"
      
      * tag 'f2fs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
        f2fs: prepare a waiter before entering io_schedule
        f2fs: update_sit_entry: Make the judgment condition of f2fs_bug_on more intuitive
        f2fs: replace test_and_set/clear_bit() with set/clear_bit()
        f2fs: make file immutable even if releasing zero compression block
        f2fs: compress: disable compression mount option if compression is off
        f2fs: compress: add sanity check during compressed cluster read
        f2fs: use macro instead of f2fs verity version
        f2fs: fix deadlock between quota writes and checkpoint
        f2fs: correct comment of f2fs_exist_written_data
        f2fs: compress: delay temp page allocation
        f2fs: compress: fix to update isize when overwriting compressed file
        f2fs: space related cleanup
        f2fs: fix use-after-free issue
        f2fs: Change the type of f2fs_flush_inline_data() to void
        f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctl
        f2fs: should avoid inode eviction in synchronous path
        f2fs: segment.h: delete a duplicated word
        f2fs: compress: fix to avoid memory leak on cc->cpages
        f2fs: use generic names for generic ioctls
        f2fs: don't keep meta inode pages used for compressed block migration
        ...
      086ba2ec
    • Linus Torvalds's avatar
      Merge tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 8c2618a6
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Make sure transactions won't be started recursively in
         gfs2_block_zero_range (bug introduced in 5.4 when switching to
         iomap_zero_range)
      
       - Fix a glock holder refcount leak introduced in the iopen glock
         locking scheme rework merged in 5.8.
      
       - A few other small improvements (debugging, stack usage, comment
         fixes).
      
      * tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: When gfs2_dirty_inode gets a glock error, dump the glock
        gfs2: Never call gfs2_block_zero_range with an open transaction
        gfs2: print details on transactions that aren't properly ended
        gfs2: Fix inaccurate comment
        fs: Fix typo in comment
        gfs2: Fix refcount leak in gfs2_glock_poke
        gfs2: Pass glock holder to gfs2_file_direct_{read,write}
        gfs2: Add some flags missing from glock output
      8c2618a6
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs · 163c3e3d
      Linus Torvalds authored
      Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
       "JFFS2:
         - Fix for a corner case while mounting
         - Fix for a use-after-free issue
      
        UBI:
         - Fix for a memory leak while attaching
         - Don't produce an anchor PEB with fastmap being disabled
      
        UBIFS:
         - Fix for orphan inode logic
         - Spelling fixes
         - New mount option to specify filesystem version"
      
      * tag 'for-linus-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
        jffs2: fix UAF problem
        jffs2: fix jffs2 mounting failure
        ubifs: Fix wrong orphan node deletion in ubifs_jnl_update|rename
        ubi: fastmap: Free fastmap next anchor peb during detach
        ubi: fastmap: Don't produce the initial next anchor PEB when fastmap is disabled
        ubifs: misc.h: delete a duplicated word
        ubifs: add option to specify version for new file systems
      163c3e3d
  3. 10 Aug, 2020 8 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 4bcf69e5
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - an update to Elan touchpad controller driver supporting newer ICs
         with enhanced precision reports and a new firmware update process
      
       - an update to EXC3000 touch controller supporting additional parts
      
       - assorted driver fixups
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (27 commits)
        Input: exc3000 - add support to query model and fw_version
        Input: exc3000 - add reset gpio support
        Input: exc3000 - add EXC80H60 and EXC80H84 support
        dt-bindings: touchscreen: Convert EETI EXC3000 touchscreen to json-schema
        Input: sentelic - fix error return when fsp_reg_write fails
        Input: alps - remove redundant assignment to variable ret
        Input: ims-pcu - return error code rather than -ENOMEM
        Input: elan_i2c - add ic type 0x15
        Input: atmel_mxt_ts - only read messages in mxt_acquire_irq() when necessary
        Input: uinput - fix typo in function name documentation
        Input: ati_remote2 - add missing newlines when printing module parameters
        Input: psmouse - add a newline when printing 'proto' by sysfs
        Input: synaptics-rmi4 - drop a duplicated word
        Input: elan_i2c - add support for high resolution reports
        Input: elan_i2c - do not constantly re-query pattern ID
        Input: elan_i2c - add firmware update info for ICs 0x11, 0x13, 0x14
        Input: elan_i2c - handle firmware updated on newer ICs
        Input: elan_i2c - add support for different firmware page sizes
        Input: elan_i2c - fix detecting IAP version on older controllers
        Input: elan_i2c - handle devices with patterns above 1
        ...
      4bcf69e5
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · b7b8e368
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
      
       - fix for some modern devices that return a multi-byte battery report,
         from Grant Likely
      
       - fix for devices with Resolution Multiplier, from Peter Hutterer
      
       - device probing speed increase, from Dmitry Torokhov
      
       - ThinkPad 10 Ultrabook Keyboard support, from Hans de Goede
      
       - other small assorted fixes and device ID additions
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: quirks: add NOGET quirk for Logitech GROUP
        HID: Replace HTTP links with HTTPS ones
        HID: udraw-ps3: Replace HTTP links with HTTPS ones
        HID: mcp2221: Replace HTTP links with HTTPS ones
        HID: input: Fix devices that return multiple bytes in battery report
        HID: lenovo: Fix spurious F23 key press report during resume from suspend
        HID: lenovo: Add ThinkPad 10 Ultrabook Keyboard fn_lock support
        HID: lenovo: Add ThinkPad 10 Ultrabook Keyboard support
        HID: lenovo: Rename fn_lock sysfs attr handlers to make them generic
        HID: lenovo: Factor out generic parts of the LED code
        HID: lenovo: Merge tpkbd and cptkbd data structures
        HID: intel-ish-hid: Replace PCI_DEV_FLAGS_NO_D3 with pci_save_state
        HID: Wiimote: Treat the d-pad as an analogue stick
        HID: input: do not run GET_REPORT unless there's a Resolution Multiplier
        HID: usbhid: remove redundant assignment to variable retval
        HID: usbhid: do not sleep when opening device
      b7b8e368
    • Colin Ian King's avatar
      ff131eff
    • Steven Rostedt (VMware)'s avatar
      ktest.pl: Change the logic to control the size of the log file emailed · 855d8abd
      Steven Rostedt (VMware) authored
      If the log of a given test is larger than the given max size, then seek
      from the end of the log file instead of from the start of the test.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      855d8abd
    • Jiri Kosina's avatar
      e6b6e19a
    • Jiri Kosina's avatar
      Merge branch 'for-5.9/lenovo' into for-linus · ccac9cec
      Jiri Kosina authored
      - ThinkPad 10 Ultrabook Keyboard support, from Hans de Goede
      ccac9cec
    • Jiri Kosina's avatar
      cd6cad55
    • Jiri Kosina's avatar
      Merge branch 'for-5.9/core-v2' into for-linus · a66eebd7
      Jiri Kosina authored
      - fix for some modern devices that return a multi-byte battery report,
        from Grant Likely
      - fix for devices with Resolution Multiplier, from Peter Hutterer
      - device probing speed increase, from Dmitry Torokhov
      a66eebd7
  4. 09 Aug, 2020 15 commits
    • Linus Torvalds's avatar
      Merge tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · fc80c51f
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - run the checker (e.g. sparse) after the compiler
      
       - remove unneeded cc-option tests for old compiler flags
      
       - fix tar-pkg to install dtbs
      
       - introduce ccflags-remove-y and asflags-remove-y syntax
      
       - allow tracing functions in sub-directories of lib/
      
       - introduce hostprogs-always-y and userprogs-always-y syntax
      
       - various Makefile cleanups
      
      * tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base
        kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled
        kbuild: introduce hostprogs-always-y and userprogs-always-y
        kbuild: sort hostprogs before passing it to ifneq
        kbuild: move host .so build rules to scripts/gcc-plugins/Makefile
        kbuild: Replace HTTP links with HTTPS ones
        kbuild: trace functions in subdirectories of lib/
        kbuild: introduce ccflags-remove-y and asflags-remove-y
        kbuild: do not export LDFLAGS_vmlinux
        kbuild: always create directories of targets
        powerpc/boot: add DTB to 'targets'
        kbuild: buildtar: add dtbs support
        kbuild: remove cc-option test of -ffreestanding
        kbuild: remove cc-option test of -fno-stack-protector
        Revert "kbuild: Create directory for target DTB"
        kbuild: run the checker after the compiler
      fc80c51f
    • Linus Torvalds's avatar
      Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6 · 7a6b6044
      Linus Torvalds authored
      Pull NFS server updates from Chuck Lever:
       "Highlights:
         - Support for user extended attributes on NFS (RFC 8276)
         - Further reduce unnecessary NFSv4 delegation recalls
      
        Notable fixes:
         - Fix recent krb5p regression
         - Address a few resource leaks and a rare NULL dereference
      
        Other:
         - De-duplicate RPC/RDMA error handling and other utility functions
         - Replace storage and display of kernel memory addresses by tracepoints"
      
      * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
        svcrdma: CM event handler clean up
        svcrdma: Remove transport reference counting
        svcrdma: Fix another Receive buffer leak
        SUNRPC: Refresh the show_rqstp_flags() macro
        nfsd: netns.h: delete a duplicated word
        SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
        nfsd: avoid a NULL dereference in __cld_pipe_upcall()
        nfsd4: a client's own opens needn't prevent delegations
        nfsd: Use seq_putc() in two functions
        svcrdma: Display chunk completion ID when posting a rw_ctxt
        svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
        svcrdma: Introduce Send completion IDs
        svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
        svcrdma: Introduce Receive completion IDs
        svcrdma: Introduce infrastructure to support completion IDs
        svcrdma: Add common XDR encoders for RDMA and Read segments
        svcrdma: Add common XDR decoders for RDMA and Read segments
        SUNRPC: Add helpers for decoding list discriminators symbolically
        svcrdma: Remove declarations for functions long removed
        svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
        ...
      7a6b6044
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 8d3e09b4
      Linus Torvalds authored
      Pull regset conversion fix from Al Viro:
       "Fix a regression from an unnoticed bisect hazard in the regset series.
      
        A bunch of old (aout, originally) primitives used by coredumps became
        dead code after fdpic conversion to regsets. Removal of that dead code
        had been the first commit in the followups to regset series;
        unfortunately, it happened to hide the bisect hazard on sh (extern for
        fpregs_get() had not been updated in the main series when it should
        have been; followup simply made fpregs_get() static). And without that
        followup commit this bisect hazard became breakage in the mainline"
      Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        kill unused dump_fpu() instances
      8d3e09b4
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 9420f1ce
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "This is the bulk of the pin control changes for the v5.9 kernel
        series:
      
        Core changes:
      
         - The GPIO patch "gpiolib: Introduce for_each_requested_gpio_in_range()
           macro" was put in an immutable branch and merged into the pinctrl
           tree as well. We see these changes also here.
      
         - Improved debug output for pins used as GPIO.
      
        New drivers:
      
         - Ocelot Sparx5 SoC driver.
      
         - Intel Emmitsburg SoC subdriver.
      
         - Intel Tiger Lake-H SoC subdriver.
      
         - Qualcomm PM660 SoC subdriver.
      
         - Renesas SH-PFC R8A774E1 subdriver.
      
        Driver improvements:
      
         - Linear improvement and cleanups of the Intel drivers for
           Cherryview, Lynxpoint, Baytrail etc. Improved locking among other
           things.
      
         - Renesas SH-PFC has added support for RPC pins, groups, and
           functions to r8a77970 and r8a77980.
      
         - The newer Freescale (now NXP) i.MX8 pin controllers have been
           modularized. This is driven by the Google Android GKI initiative I
           think.
      
         - Open drain support for pins on the Qualcomm IPQ4019.
      
         - The Ingenic driver can handle IRQ detection on both edges.
      
         - A big slew of documentation fixes all over the place.
      
         - A few irqchip template conversions by yours truly.
      
      * tag 'pinctrl-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (107 commits)
        dt-bindings: pinctrl: add bindings for MediaTek MT6779 SoC
        pinctrl: stmfx: Use irqchip template
        pinctrl: amd: Use irqchip template
        pinctrl: mediatek: fix build for tristate changes
        pinctrl: samsung: Use bank name as irqchip name
        pinctrl: core: print gpio in pins debugfs file
        pinctrl: mediatek: add mt6779 eint support
        pinctrl: mediatek: add pinctrl support for MT6779 SoC
        pinctrl: mediatek: avoid virtual gpio trying to set reg
        pinctrl: mediatek: update pinmux definitions for mt6779
        pinctrl: stm32: use the hwspin_lock_timeout_in_atomic() API
        pinctrl: mcp23s08: Use irqchip template
        pinctrl: sx150x: Use irqchip template
        dt-bindings: ingenic,pinctrl: Support pinmux/pinconf nodes
        pinctrl: intel: Add Intel Emmitsburg pin controller support
        pinctl: ti: iodelay: Replace HTTP links with HTTPS ones
        Revert "gpio: omap: handle pin config bias flags"
        pinctrl: single: Use fallthrough pseudo-keyword
        pinctrl: qcom: spmi-gpio: Use fallthrough pseudo-keyword
        pinctrl: baytrail: Use fallthrough pseudo-keyword
        ...
      9420f1ce
    • Linus Torvalds's avatar
      Merge tag 'mtd/for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · dec1fbbc
      Linus Torvalds authored
      Pull mtd updates from Miquel Raynal:
       "MTD core changes:
         - Spelling
         - http to https updates
      
        NAND core changes:
         - Drop useless 'depends on' in Kconfig
         - Add an extra level in the Kconfig hierarchy
         - Trivial spellings
         - Dynamic allocation of the interface configurations
         - Dropping the default ONFI timing mode
         - Various cleanups (types, structures, naming, comments)
         - Hide the chip->data_interface indirection
         - Add the generic rb-gpios property
         - Add the ->choose_interface_config() hook
         - Introduce nand_choose_best_sdr_timings()
         - Use default values for tPROG_max and tBERS_max
         - Avoid redefining tR_max and tCCS_min
         - Add a helper to find the closest ONFI mode
         - bcm63xx MTD parsers: simplify CFE detection
      
        Raw NAND controller drivers changes:
         - fsl-upm: Deprecation of specific DT properties
         - fsl_upm: Driver rework and cleanup in favor of ->exec_op()
         - Ingenic: Cleanup ARRAY_SIZE() vs sizeof() use
         - brcmnand: ECC error handling on EDU transfers
         - brcmnand: Don't default to EDU transfers
         - qcom: Set BAM mode only if not set already
         - qcom: Avoid write to unavailable register
         - gpio: Driver rework in favor of ->exec_op()
         - tango: ->exec_op() conversion
         - mtk: ->exec_op() conversion
      
        Raw NAND chip drivers changes:
         - toshiba: Implement ->choose_interface_config() for TH58NVG2S3HBAI4,
           TC58NVG0S3E, and TC58TEG5DCLTA00
         - hynix: Implement ->choose_interface_config() for H27UCG8T2ATR-BC
      
        SPI NOR core changes:
         - Disable Quad Mode in spi_nor_restore().
         - Don't abort BFPT parsing when QER reserved value is used.
         - Add support/update capabilities for a few flashes.
         - Drop s70fl01gs flash: it does not support RDSR(05h) which is
           critical for erase/write.
         - Merge the SPIMEM DTR bits in spi-nor/next to avoid conflicts during
           the release cycle.
      
        SPI NOR controller drivers changes:
         - Move the cadence-quadspi driver to spi-mem. The series was taken
           through the SPI tree. Merge it also in spi-nor/next to avoid
           conflicts during the release cycle.
         - intel-spi:
            - Add new PCI IDs.
            - Ignore the Write Disable command, the controller doesn't support
              it.
            - Fix performance regression"
      
      * tag 'mtd/for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (79 commits)
        MTD: pfow.h: drop a duplicated word
        MTD: mtd-abi.h: drop a duplicated word
        mtd: rawnand: omap_elm: Replace HTTP links with HTTPS ones
        mtd: Replace HTTP links with HTTPS ones
        mtd: hyperbus: Replace HTTP links with HTTPS ones
        mtd: revert "spi-nor: intel: provide a range for poll_timout"
        mtd: spi-nor: update read capabilities for w25q64 and s25fl064k
        mtd: spi-nor: micron: Add SPI_NOR_DUAL_READ flag on mt25qu02g
        mtd: spi-nor: macronix: Add support for mx66u2g45g
        mtd: spi-nor: intel-spi: Simulate WRDI command
        mtd: spi-nor: Disable the flash quad mode in spi_nor_restore()
        mtd: spi-nor: Add capability to disable flash quad mode
        mtd: spi-nor: spansion: Remove s70fl01gs from flash_info
        mtd: spi-nor: sfdp: do not make invalid quad enable fatal
        dt-bindings: mtd: fsl-upm-nand: Deprecate chip-delay and fsl, upm-wait-flags
        mtd: rawnand: stm32_fmc2: get resources from parent node
        mtd: rawnand: stm32_fmc2: use regmap APIs
        memory: stm32-fmc2-ebi: add STM32 FMC2 EBI controller driver
        dt-bindings: memory-controller: add STM32 FMC2 EBI controller documentation
        dt-bindings: mtd: update STM32 FMC2 NAND controller documentation
        ...
      dec1fbbc
    • Stephen Rothwell's avatar
    • Masahiro Yamada's avatar
      kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base · 132305b3
      Masahiro Yamada authored
      Commit d26e9414 ("kbuild: no gcc-plugins during cc-option tests")
      was needed because scripts/Makefile.gcc-plugins was included too early.
      
      It is no longer needed if scripts/Makefile.gcc-plugins is included
      last and no cc-option tests are added after it.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      132305b3
    • Masahiro Yamada's avatar
      kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled · e0fe0bbe
      Masahiro Yamada authored
      Currently, the top Makefile includes all of scripts/Makefile.<feature>
      even if the associated CONFIG option is disabled.
      
      Do not include unneeded Makefiles in order to slightly optimize the
      parse stage.
      
      Include $(include-y), and ignore $(include-).
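      As a rough illustration of the include-$(CONFIG_...) idiom, here is a
      small Python model (not kbuild itself): each option expands to 'y' or
      to an empty string, so a file lands either in include-y, which is
      included, or in include-, which is ignored. The option names and the
      Makefile list are hypothetical examples, and expand_name() and
      makefiles_to_include() are made-up helpers.

```python
import re

# Matches a $(CONFIG_FOO) reference inside a variable name.
CONFIG_REF = re.compile(r"\$\((CONFIG_\w+)\)")


def expand_name(template, config):
    """Expand $(CONFIG_FOO) to its value; unset options expand to ''."""
    return CONFIG_REF.sub(lambda m: config.get(m.group(1), ""), template)


def makefiles_to_include(assignments, config):
    """Collect the files that land in include-y; include- is ignored."""
    variables = {}
    for template, path in assignments:
        variables.setdefault(expand_name(template, config), []).append(path)
    return variables.get("include-y", [])


# Hypothetical assignments for illustration only.
assignments = [
    ("include-y", "scripts/Makefile.extrawarn"),
    ("include-$(CONFIG_KASAN)", "scripts/Makefile.kasan"),
    ("include-$(CONFIG_UBSAN)", "scripts/Makefile.ubsan"),
    ("include-$(CONFIG_GCC_PLUGINS)", "scripts/Makefile.gcc-plugins"),
]
config = {"CONFIG_KASAN": "y", "CONFIG_GCC_PLUGINS": "y"}  # CONFIG_UBSAN unset

# The UBSAN file goes to "include-" and is therefore skipped.
print(makefiles_to_include(assignments, config))
```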
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      e0fe0bbe
    • Masahiro Yamada's avatar
      kbuild: introduce hostprogs-always-y and userprogs-always-y · faabed29
      Masahiro Yamada authored
      To build host programs, you need to add the program names to 'hostprogs'
      so that the necessary build rule is used, but that alone does not build
      them because nothing depends on them.
      
      There are two types of host programs: built as the prerequisite of
      another (e.g. gen_crc32table in lib/Makefile), or always built when
      Kbuild visits the Makefile (e.g. genksyms in scripts/genksyms/Makefile).
      
      The latter is typical in Makefiles under scripts/, which contain host
      programs used globally during the kernel build. To build them, you need
      to add them to both 'hostprogs' and 'always-y'.
      
      This commit adds hostprogs-always-y as a shorthand.
      
      The same applies to user programs. net/bpfilter/Makefile builds
      bpfilter_umh on demand, hence always-y is unneeded. In contrast,
      programs under samples/ are added to both 'userprogs' and 'always-y'
      so they are always built when Kbuild visits the Makefiles.
      
      userprogs-always-y works as a shorthand.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      faabed29
    • Masahiro Yamada's avatar
      kbuild: sort hostprogs before passing it to ifneq · 85569d19
      Masahiro Yamada authored
      The conditional:
      
        ifneq ($(hostprogs),)
      
      ... evaluates to true even if $(hostprogs) contains nothing but
      whitespace characters.
      
        ifneq ($(strip $(hostprogs)),)
      
      ... is a safe way to avoid interpreting whitespace as a non-empty value,
      but I would rather use the side effect of $(sort ...) to do the
      equivalent.
      
      $(sort ...) is used in scripts/Makefile.host to drop duplicates from
      $(hostprogs). It also strips excess whitespace.
      
      Move $(sort ...) before evaluating the ifneq.
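      The whitespace pitfall can be sketched with a Python model of the two
      Make behaviors described above. make_sort() and ifneq_nonempty() are
      hypothetical helpers that only mimic $(sort ...) (split into words,
      drop duplicates, sort lexically) and a raw ifneq test against an
      empty string.

```python
def make_sort(text):
    """Model of $(sort ...): split on whitespace, drop duplicates, sort lexically."""
    return " ".join(sorted(set(text.split())))


def ifneq_nonempty(value):
    """Model of 'ifneq ($(value),)': true for any non-empty string, whitespace included."""
    return value != ""


hostprogs = "  modpost fixdep   fixdep "  # duplicates plus stray whitespace

print(ifneq_nonempty("   "))             # True: whitespace alone passes the raw test
print(ifneq_nonempty(make_sort("   ")))  # False: $(sort) reduces it to ''
print(make_sort(hostprogs))              # fixdep modpost
```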
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      85569d19
    • Masahiro Yamada's avatar
      kbuild: move host .so build rules to scripts/gcc-plugins/Makefile · 42640b13
      Masahiro Yamada authored
      The host shared library rules are currently implemented in
      scripts/Makefile.host, but GCC plugins are actually their only user.
      (The VDSO .so files are built for the target by different build
      rules.) Hence, they do not need to be available tree-wide.
      
      Move all the relevant build rules to scripts/gcc-plugins/Makefile.
      
      I also optimized the build steps so that *.so is built directly from
      *.c, because every upstream plugin is compiled from a single source file.
      
      I am still keeping the multi-file plugin support, which Kees Cook
      mentioned might be needed by out-of-tree plugins.
      (https://lkml.org/lkml/2019/1/11/1107)
      
      If the plugin foo.so is compiled from two files, foo.c and foo2.c,
      you can write the following:
      
        foo-objs := foo.o foo2.o
      
      Single-file plugins do not need the *-objs notation.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      42640b13
    • Alexander A. Klimov's avatar
      kbuild: Replace HTTP links with HTTPS ones · 16a122c7
      Alexander A. Klimov authored
      Rationale:
      Reduces the attack surface for MITM attacks on kernel developers who
      open the links, as HTTPS traffic is much harder to manipulate.
      
      Deterministic algorithm:
      For each file:
        If not .svg:
          For each line:
            If doesn't contain `\bxmlns\b`:
              For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
                If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
                  If both the HTTP and HTTPS versions
                  return 200 OK and serve the same content:
                    Replace HTTP with HTTPS.
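      A minimal Python sketch of the link-selection part of the algorithm
      above, assuming the regexes are applied exactly as written. The
      "both return 200 OK and serve the same content" check needs network
      access, so it is delegated to a caller-supplied, hypothetical
      same_content(http_url, https_url) callback.

```python
import re
from typing import Callable

LINK_RE = re.compile(r"\bhttp://[^# \t\r\n]*(?:\w|/)")
XMLNS_RE = re.compile(r"\bxmlns\b")
EXEMPT_RE = re.compile(r"\bgnu\.org/license|\bmozilla\.org/MPL\b")


def convert_line(line: str, same_content: Callable[[str, str], bool]) -> str:
    # Lines mentioning xmlns are left untouched.
    if XMLNS_RE.search(line):
        return line

    def replace(match):
        url = match.group(0)
        # License URLs are deliberately never rewritten.
        if EXEMPT_RE.search(url):
            return url
        https = "https://" + url[len("http://"):]
        # Only switch when the (hypothetical) check says both serve the same page.
        return https if same_content(url, https) else url

    return LINK_RE.sub(replace, line)


def convert_file(path: str, text: str, same_content: Callable[[str, str], bool]) -> str:
    # .svg files are skipped entirely.
    if path.endswith(".svg"):
        return text
    return "".join(convert_line(l, same_content) for l in text.splitlines(keepends=True))
```

      Because the pattern must end in a word character or '/', trailing
      punctuation such as a sentence-final '.' or ')' is not treated as
      part of the link.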
      Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      16a122c7
    • Masahiro Yamada's avatar
      kbuild: trace functions in subdirectories of lib/ · b16838c6
      Masahiro Yamada authored
      ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)
      
      exists here in sub-directories of lib/ to keep the behavior of
      commit 2464a609 ("ftrace: do not trace library functions").
      
      Since that commit, not only the objects in lib/ but also the ones in
      the sub-directories are excluded from ftrace (although the commit
      description did not explicitly mention this).
      
      However, most library functions in sub-directories are not hot paths.
      Re-add them to ftrace.
      
      Going forward, only the objects directly under lib/ will be excluded.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      b16838c6
    • Masahiro Yamada's avatar
      kbuild: introduce ccflags-remove-y and asflags-remove-y · 15d5761a
      Masahiro Yamada authored
      CFLAGS_REMOVE_<file>.o filters out flags when compiling a particular
      object, but there is no convenient way to do that for every object in
      a directory.
      
      Add ccflags-remove-y and asflags-remove-y to make this easy.
      
      Use ccflags-remove-y to clean up some Makefiles.
      
      The add/remove order works as follows:
      
       [1] KBUILD_CFLAGS specifies compiler flags used globally
      
       [2] ccflags-y adds compiler flags for all objects in the
           current Makefile
      
       [3] ccflags-remove-y removes compiler flags for all objects in the
           current Makefile (New feature)
      
       [4] CFLAGS_<file> adds compiler flags per file.
      
       [5] CFLAGS_REMOVE_<file> removes compiler flags per file.
      
      Having [3] before [4] allows us to remove flags from most (but not all)
      objects in the current Makefile.
      
      For example, kernel/trace/Makefile removes $(CC_FLAGS_FTRACE)
      from all objects in the directory, then adds it back to
      trace_selftest_dynamic.o and trace_kprobe_selftest.o.
      
      The same applies to lib/livepatch/Makefile.
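      The add/remove order [1]-[5] can be modeled in Python roughly as
      follows. This is a simplified sketch, not the actual kbuild
      implementation: filter_out() matches exact words only (the real
      $(filter-out ...) also accepts % patterns), and FTRACE is a
      hypothetical stand-in for $(CC_FLAGS_FTRACE).

```python
def filter_out(remove, words):
    """Like $(filter-out ...) restricted to exact words: drop every word in remove."""
    drop = set(remove)
    return [w for w in words if w not in drop]


def final_cflags(kbuild_cflags, ccflags_y, ccflags_remove_y, cflags_file, cflags_remove_file):
    # [1] + [2]: global flags, then directory-wide additions
    flags = kbuild_cflags + ccflags_y
    # [3]: directory-wide removals apply only to [1] and [2]
    flags = filter_out(ccflags_remove_y, flags)
    # [4]: per-file additions come after [3], so they survive it
    flags = flags + cflags_file
    # [5]: per-file removals apply to everything gathered so far
    return filter_out(cflags_remove_file, flags)


FTRACE = ["-pg", "-mrecord-mcount"]  # hypothetical stand-in for $(CC_FLAGS_FTRACE)
KBUILD = ["-O2"] + FTRACE

# Most objects in the directory: ftrace flags removed by ccflags-remove-y.
print(final_cflags(KBUILD, [], FTRACE, [], []))      # ['-O2']
# An object that adds them back via CFLAGS_<file>.
print(final_cflags(KBUILD, [], FTRACE, FTRACE, []))  # ['-O2', '-pg', '-mrecord-mcount']
```

      Placing [3] before [4] is what lets a single directory-wide removal
      coexist with per-file exceptions.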
      
      Please note that ccflags-remove-y has no effect on sub-directories.
      In contrast, the previous notation also removed compiler flags from
      all the sub-directories.
      
      The following are not affected because they have no sub-directories:
      
        arch/arm/boot/compressed/
        arch/powerpc/xmon/
        arch/sh/
        kernel/trace/
      
      However, lib/ has several sub-directories.
      
      To keep the behavior, I added ccflags-remove-y to all Makefiles
      in subdirectories of lib/, except the following:
      
        lib/vdso/Makefile        - Kbuild does not descend into this Makefile
        lib/raid/test/Makefile   - This is not used for the kernel build
      
      I think commit 2464a609 ("ftrace: do not trace library functions")
      excluded too much. In the next commit, I will remove ccflags-remove-y
      from the sub-directories of lib/.
      Suggested-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Brendan Higgins <brendanhiggins@google.com> (KUnit)
      Tested-by: Anders Roxell <anders.roxell@linaro.org>
      15d5761a
    • Masahiro Yamada's avatar
      kbuild: do not export LDFLAGS_vmlinux · 3ec8a5b3
      Masahiro Yamada authored
      When you clean the build tree for ARCH=arm, you may see the following
      error message from the 'nm' command:
      
      $ make -j24 ARCH=arm clean
        CLEAN   arch/arm/crypto
        CLEAN   arch/arm/kernel
        CLEAN   arch/arm/mach-at91
        CLEAN   arch/arm/mach-omap2
        CLEAN   arch/arm/vdso
        CLEAN   certs
        CLEAN   lib
        CLEAN   usr
        CLEAN   net/wireless
        CLEAN   drivers/firmware/efi/libstub
      nm: 'arch/arm/boot/compressed/../../../../vmlinux': No such file
      /bin/sh: 1: arithmetic expression: expecting primary: " "
        CLEAN   arch/arm/boot/compressed
        CLEAN   drivers/scsi
        CLEAN   drivers/tty/vt
        CLEAN   arch/arm/boot
        CLEAN   vmlinux.symvers modules.builtin modules.builtin.modinfo
      
      If you rerun the same command, the error message is not shown, even
      though vmlinux is already gone.
      
      To reproduce it, the parallel option -j is needed. Single-threaded
      cleaning always executes 'archclean' and 'vmlinuxclean' in this order,
      so vmlinux still exists when arch/arm/boot/compressed/ is cleaned.
      
      Looking at arch/arm/boot/compressed/Makefile does not help explain
      the error message. Both KBSS_SZ and LDFLAGS_vmlinux are assigned with
      the '=' operator, so they are not expanded unless used. Obviously,
      'make clean' does not use them.
      
      In fact, the root cause exists in the top Makefile:
      
        export LDFLAGS_vmlinux
      
      Since LDFLAGS_vmlinux is an exported variable, LDFLAGS_vmlinux in
      arch/arm/boot/compressed/Makefile is expanded when scripts/Makefile.clean
      has a command to execute. This is why the error message shows up only
      when build artifacts exist in arch/arm/boot/compressed/.
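      The mechanism can be illustrated with a toy Python model, not GNU Make
      itself. MiniMake and fake_nm() are hypothetical names, and the model
      mimics only two rules: a '=' variable is stored unexpanded, and an
      exported variable is expanded whenever a recipe runs, because its
      value must be placed in the command's environment. The LDFLAGS_vmlinux
      value below is an assumed simplification of the ARM decompressor's.

```python
class MiniMake:
    """Toy model: '=' variables are thunks; exported ones expand when a recipe runs."""

    def __init__(self):
        self.variables = {}
        self.exported = set()

    def recursive(self, name, thunk):
        # Models 'NAME = value': nothing is evaluated at assignment time.
        self.variables[name] = thunk

    def export(self, name):
        self.exported.add(name)

    def run_recipe(self, command):
        # Exported variables are expanded to build the environment,
        # even if the command itself never references them.
        env = {n: self.variables[n]() for n in self.exported if n in self.variables}
        return command, env


calls = []


def fake_nm():
    # Stands in for the nm invocation hidden inside KBSS_SZ.
    calls.append("nm vmlinux")
    return ""


m = MiniMake()
m.recursive("KBSS_SZ", fake_nm)
m.recursive("LDFLAGS_vmlinux", lambda: "--defsym _kernel_bss_size=" + m.variables["KBSS_SZ"]())

m.run_recipe("rm -f piggy_data")
print(calls)  # []: not exported, so nothing is expanded
m.export("LDFLAGS_vmlinux")
m.run_recipe("rm -f piggy_data")
print(calls)  # ['nm vmlinux']: export forces expansion, which runs nm
```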
      
      Adding 'unexport LDFLAGS_vmlinux' to arch/arm/boot/compressed/Makefile
      will fix it as far as ARCH=arm is concerned, but I think the proper fix
      is to get rid of 'export LDFLAGS_vmlinux' from the top Makefile.
      
      LDFLAGS_vmlinux in the top Makefile contains linker flags for the top
      vmlinux. LDFLAGS_vmlinux in arch/arm/boot/compressed/Makefile is for
      arch/arm/boot/compressed/vmlinux. They just happen to have the same
      variable name, but are used for different purposes. Stop shadowing
      LDFLAGS_vmlinux.
      
      This commit passes LDFLAGS_vmlinux to scripts/link-vmlinux.sh via a
      command line parameter instead of via an environment variable. LD and
      KBUILD_LDFLAGS are exported, but I did the same for consistency. Anyway,
      they must be included in cmd_link-vmlinux to allow if_changed to detect
      the changes in LD or KBUILD_LDFLAGS.
      
      The following Makefiles are not affected:
      
        arch/arm/boot/compressed/Makefile
        arch/h8300/boot/compressed/Makefile
        arch/nios2/boot/compressed/Makefile
        arch/parisc/boot/compressed/Makefile
        arch/s390/boot/compressed/Makefile
        arch/sh/boot/compressed/Makefile
        arch/sh/boot/romimage/Makefile
        arch/x86/boot/compressed/Makefile
      
      They use ':=' or '=' to clear the LDFLAGS_vmlinux inherited from the
      top Makefile.
      
      We need to take a closer look at the impact on unicore32 and xtensa.
      
      arch/unicore32/boot/compressed/Makefile only uses '+=' operator for
      LDFLAGS_vmlinux. So, the decompressor previously inherited the linker
      flags from the top Makefile.
      
      However, commit 70fac51f ("unicore32 additional architecture files:
      boot process") was merged before commit 1f2bfbd0 ("kbuild: link of
      vmlinux moved to a script"). So, I consider this a bug fix for
      1f2bfbd0.
      
      arch/xtensa/boot/boot-elf/Makefile is also affected, but this is also
      considered a fix for the same reason. It did not inherit LDFLAGS_vmlinux
      when commit 4bedea94 ("[PATCH] xtensa: Architecture support for
      Tensilica Xtensa Part 2") was merged. I deleted $(LDFLAGS_vmlinux),
      which is now empty.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      3ec8a5b3