1. 13 Oct, 2020 8 commits
    • perf sched: Show start of latency as well · dc000c45
      Joel Fernandes (Google) authored
      The 'perf sched latency' tool is really useful for showing the worst-case
      latencies that a task encountered since wakeup. However, it shows only the
      end of the latency. Often the start of a latency is interesting, as
      it can show what else was going on at the time to cause the latency. I
      certainly find myself spending a lot of time backtracking to the start of
      the latency in "perf sched script", which wastes a lot of time.
      
      This patch therefore adds a new column, "Max delay start". While at it,
      also rename "Maximum delay at" to "Max delay end", as it's easier to
      understand.
      
      Example of the new output:
      
        ----------------------------------------------------------------------------------------------------------------------------------
         Task                  | Runtime ms  | Switches | Avg delay ms  | Max delay ms   | Max delay start         | Max delay end       |
        ----------------------------------------------------------------------------------------------------------------------------------
         MediaScannerSer:11936 |  651.296 ms |    67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s
         audio@2.0-servi:(3)   |    0.000 ms |     3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s
         AudioOut_1D:8112      |    0.000 ms |     2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s
         Time-limited te:14973 | 7966.090 ms |    24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s
         surfaceflinger:8049   |    9.680 ms |      603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s
         HeapTaskDaemon:(3)    | 1588.830 ms |     7040 | avg: 0.065 ms | max:  6.880 ms | max start: 473.666043 s | max end: 473.672922 s
         mount-passthrou:(3)   | 1370.809 ms |    68904 | avg: 0.011 ms | max:  6.524 ms | max start: 478.090630 s | max end: 478.097154 s
         ReferenceQueueD:(3)   |   11.794 ms |     1725 | avg: 0.014 ms | max:  6.521 ms | max start: 476.119782 s | max end: 476.126303 s
         writer:14077          |   18.410 ms |     1427 | avg: 0.036 ms | max:  6.131 ms | max start: 474.169675 s | max end: 474.175805 s
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
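The relationship between the three "Max delay" columns in the example output above can be checked with a little arithmetic: the window from "max start" to "max end" should equal "Max delay ms". The sketch below is only an illustration, not part of the patch; the helper name `delay_window_ms` is hypothetical, and the values are copied from the first three rows of the example. Timestamps are printed rounded to microseconds, so some later rows (for example surfaceflinger) differ from the reported delay by one digit in the last place.

```python
from decimal import Decimal

# (task, max delay in ms, max delay start in s, max delay end in s),
# copied from the first three rows of the example output.
ROWS = [
    ("MediaScannerSer:11936", "77.250", "477.691360", "477.768610"),
    ("audio@2.0-servi:(3)", "72.267", "477.697051", "477.769318"),
    ("AudioOut_1D:8112", "64.020", "477.710740", "477.774760"),
]


def delay_window_ms(start_s, end_s):
    """Length of the [start, end] window in milliseconds (hypothetical helper)."""
    # Decimal avoids binary floating-point noise on the microsecond digits.
    return (Decimal(end_s) - Decimal(start_s)) * 1000


for task, max_ms, start_s, end_s in ROWS:
    window = delay_window_ms(start_s, end_s)
    print(f"{task}: window {window:.3f} ms, reported max {max_ms} ms")
```

Reading the output this way makes it easy to jump to the "max start" timestamp in "perf sched script" instead of backtracking from the end.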
    • perf vendor events: Fix typos in power8 PMU events · 70830f97
      Sandipan Das authored
      This replaces the incorrectly spelled word "localtion" with "location"
      in some power8 PMU event descriptions.
      
      Fixes: 2a81fa3b ("perf vendor events: Add power8 PMU events")
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20201012050205.328523-1-sandipan@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Run inject-build-id with --buildid-all option too · bf7ef5dd
      Namhyung Kim authored
      For comparison, it now runs the benchmark twice: once with the regular -b
      option and once with --buildid-all.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.002 msec (+- 0.172 msec)
          Average time per event: 2.059 usec (+- 0.017 usec)
          Average memory usage: 8169 KB (+- 0 KB)
          Average build-id-all injection took: 19.543 msec (+- 0.124 msec)
          Average time per event: 1.916 usec (+- 0.012 usec)
          Average memory usage: 7348 KB (+- 0 KB)
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Add --buildid-all option · 27c9c342
      Namhyung Kim authored
      As with 'perf record', we can speed up build-id processing even more by
      just using all DSOs.  Then we don't need to look at all the sample events
      anymore.  The following patch will update 'perf bench' to show the result
      of the --buildid-all option too.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Do not load map/dso when injecting build-id · e7b60c5a
      Namhyung Kim authored
      There is no need to load symbols in a DSO when injecting build-ids.  I
      guess the reason was to check whether the DSO is a special file, such as
      an anon file.  Use some helper functions in map.c to check that before
      reading the build-id.  Also pass the sample event's cpumode to the new
      build-id event.
      
      It brought a speedup in the benchmark of 25 -> 21 msec on my laptop.
      Also the memory usage (Max RSS) went down by ~200 KB.
      
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.389 msec (+- 0.138 msec)
          Average time per event: 2.097 usec (+- 0.014 usec)
          Average memory usage: 8225 KB (+- 0 KB)
      
      Committer notes:
      
      Before:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,020.56 msec task-clock:u              #    1.271 CPUs utilized            ( +-  0.74% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   123,354      page-faults:u             #    0.031 M/sec                    ( +-  0.81% )
             7,119,951,568      cycles:u                  #    1.771 GHz                      ( +-  1.74% )  (83.27%)
               230,086,969      stalled-cycles-frontend:u #    3.23% frontend cycles idle     ( +-  1.97% )  (83.41%)
             1,168,298,765      stalled-cycles-backend:u  #   16.41% backend cycles idle      ( +-  1.13% )  (83.44%)
            11,173,083,669      instructions:u            #    1.57  insn per cycle
                                                          #    0.10  stalled cycles per insn  ( +-  1.58% )  (83.31%)
             2,413,908,936      branches:u                #  600.392 M/sec                    ( +-  1.69% )  (83.26%)
                46,576,289      branch-misses:u           #    1.93% of all branches          ( +-  2.20% )  (83.31%)
      
                    3.1638 +- 0.0309 seconds time elapsed  ( +-  0.98% )
      
        $
      
      After:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  2,379.94 msec task-clock:u              #    1.473 CPUs utilized            ( +-  0.18% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                    62,584      page-faults:u             #    0.026 M/sec                    ( +-  0.07% )
             2,372,389,668      cycles:u                  #    0.997 GHz                      ( +-  0.29% )  (83.14%)
               106,937,862      stalled-cycles-frontend:u #    4.51% frontend cycles idle     ( +-  4.89% )  (83.20%)
               581,697,915      stalled-cycles-backend:u  #   24.52% backend cycles idle      ( +-  0.71% )  (83.47%)
             3,659,692,199      instructions:u            #    1.54  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.10% )  (83.63%)
               791,372,961      branches:u                #  332.518 M/sec                    ( +-  0.27% )  (83.39%)
                10,648,083      branch-misses:u           #    1.35% of all branches          ( +-  0.22% )  (83.16%)
      
                   1.61570 +- 0.00172 seconds time elapsed  ( +-  0.11% )
      
        $
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
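To put the committer's Before/After 'perf stat' numbers side by side, the relative change of a few counters can be computed as below. This is a rough sketch for reading the figures, not part of the commit; the helper `pct_change` and the dictionary keys are hypothetical names, and the values are copied from the two 'perf stat -r5' runs quoted in the committer notes.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Values copied from the "Before:" and "After:" perf stat outputs.
BEFORE = {
    "elapsed_s": 3.1638,
    "task_clock_ms": 4020.56,
    "page_faults": 123_354,
    "instructions": 11_173_083_669,
}
AFTER = {
    "elapsed_s": 1.61570,
    "task_clock_ms": 2379.94,
    "page_faults": 62_584,
    "instructions": 3_659_692_199,
}

for key in BEFORE:
    print(f"{key}: {pct_change(BEFORE[key], AFTER[key]):+.1f}%")
```

A negative percentage means the counter went down after the patch, which is the case for all four values listed here.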
    • perf inject: Enter namespace when reading build-id · 336c95b2
      Namhyung Kim authored
      It should be in a proper mnt namespace when accessing the file.
      
      I think this had no problem since the build-id was actually read from
      map__load() -> dso__load() already.  But I'd like to change it in the
      following commit.
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Add missing callbacks in perf_tool · 2946eced
      Namhyung Kim authored
      I found some events (like PERF_RECORD_CGROUP) are not copied by perf
      inject due to the missing callbacks.  Let's add them.
      
      While at it, I've changed the order of the callbacks to match with
      struct perf_tool so that we can compare them easily.
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Add build-id injection benchmark · 0bf02a0d
      Namhyung Kim authored
      Sometimes I see 'perf record' piped to 'perf inject' take a long time
      processing build-ids.
      
      So introduce an inject-build-id benchmark in the internals benchmark
      suite to measure its overhead regularly.
      
      It runs the 'perf inject' command internally and feeds the given number
      of synthesized events (MMAP2 + SAMPLE basically).
      
        Usage: perf bench internals inject-build-id <options>
      
          -i, --iterations <n>  Number of iterations used to compute average (default: 100)
          -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
          -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
          -v, --verbose         be more verbose (show iteration count, DSO name, etc)
      
      By default, it measures average processing time of 100 MMAP2 events
      and 10000 SAMPLE events.  Below is a result on my laptop.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 25.789 msec (+- 0.202 msec)
          Average time per event: 2.528 usec (+- 0.020 usec)
          Average memory usage: 8411 KB (+- 7 KB)
      
      Committer testing:
      
        $ perf bench
        Usage:
        	perf bench [<common options>] <collection> <benchmark> [<options>]
      
                # List of all available benchmark collections:
      
                 sched: Scheduler and IPC benchmarks
               syscall: System call benchmarks
                   mem: Memory access benchmarks
                  numa: NUMA scheduling and MM benchmarks
                 futex: Futex stressing benchmarks
                 epoll: Epoll stressing benchmarks
             internals: Perf-internals benchmarks
                   all: All benchmarks
      
        $ perf bench internals
      
                # List of available benchmarks for collection 'internals':
      
            synthesize: Benchmark perf event synthesis
        kallsyms-parse: Benchmark kallsyms parsing
        inject-build-id: Benchmark build-id injection
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.202 msec (+- 0.059 msec)
          Average time per event: 1.392 usec (+- 0.006 usec)
          Average memory usage: 12650 KB (+- 10 KB)
          Average build-id-all injection took: 12.831 msec (+- 0.071 msec)
          Average time per event: 1.258 usec (+- 0.007 usec)
          Average memory usage: 11895 KB (+- 10 KB)
        $
      
        $ perf stat -r5 perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.380 msec (+- 0.056 msec)
          Average time per event: 1.410 usec (+- 0.006 usec)
          Average memory usage: 12608 KB (+- 11 KB)
          Average build-id-all injection took: 11.889 msec (+- 0.064 msec)
          Average time per event: 1.166 usec (+- 0.006 usec)
          Average memory usage: 11838 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.246 msec (+- 0.065 msec)
          Average time per event: 1.397 usec (+- 0.006 usec)
          Average memory usage: 12744 KB (+- 10 KB)
          Average build-id-all injection took: 12.019 msec (+- 0.066 msec)
          Average time per event: 1.178 usec (+- 0.006 usec)
          Average memory usage: 11963 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.321 msec (+- 0.067 msec)
          Average time per event: 1.404 usec (+- 0.007 usec)
          Average memory usage: 12690 KB (+- 10 KB)
          Average build-id-all injection took: 11.909 msec (+- 0.041 msec)
          Average time per event: 1.168 usec (+- 0.004 usec)
          Average memory usage: 11938 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.287 msec (+- 0.059 msec)
          Average time per event: 1.401 usec (+- 0.006 usec)
          Average memory usage: 12864 KB (+- 10 KB)
          Average build-id-all injection took: 11.862 msec (+- 0.058 msec)
          Average time per event: 1.163 usec (+- 0.006 usec)
          Average memory usage: 12103 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.402 msec (+- 0.053 msec)
          Average time per event: 1.412 usec (+- 0.005 usec)
          Average memory usage: 12876 KB (+- 10 KB)
          Average build-id-all injection took: 11.826 msec (+- 0.061 msec)
          Average time per event: 1.159 usec (+- 0.006 usec)
          Average memory usage: 12111 KB (+- 10 KB)
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,267.48 msec task-clock:u              #    1.502 CPUs utilized            ( +-  0.14% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   102,092      page-faults:u             #    0.024 M/sec                    ( +-  0.08% )
             3,894,589,578      cycles:u                  #    0.913 GHz                      ( +-  0.19% )  (83.49%)
               140,078,421      stalled-cycles-frontend:u #    3.60% frontend cycles idle     ( +-  0.77% )  (83.34%)
               948,581,189      stalled-cycles-backend:u  #   24.36% backend cycles idle      ( +-  0.46% )  (83.25%)
             5,835,587,719      instructions:u            #    1.50  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.21% )  (83.24%)
             1,267,423,636      branches:u                #  296.996 M/sec                    ( +-  0.22% )  (83.12%)
                17,484,290      branch-misses:u           #    1.38% of all branches          ( +-  0.12% )  (83.55%)
      
                   2.84176 +- 0.00222 seconds time elapsed  ( +-  0.08% )
      
        $
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 01 Oct, 2020 2 commits
    • perf trace: Use the autogenerated mmap 'prot' string/id table · 388968d8
      Arnaldo Carvalho de Melo authored
      No change in behaviour:
      
        # perf trace -e mmap sleep 1
             0.000 ( 0.009 ms): sleep/751870 mmap(len: 143317, prot: READ, flags: PRIVATE, fd: 3)                  = 0x7fa96d0f7000
             0.028 ( 0.004 ms): sleep/751870 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS)           = 0x7fa96d0f5000
             0.037 ( 0.005 ms): sleep/751870 mmap(len: 1872744, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3)       = 0x7fa96cf2b000
             0.044 ( 0.011 ms): sleep/751870 mmap(addr: 0x7fa96cf50000, len: 1376256, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x25000) = 0x7fa96cf50000
             0.056 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0a0000, len: 307200, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x175000) = 0x7fa96d0a0000
             0.064 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0eb000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bf000) = 0x7fa96d0eb000
             0.075 ( 0.005 ms): sleep/751870 mmap(addr: 0x7fa96d0f1000, len: 13160, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7fa96d0f1000
             0.253 ( 0.005 ms): sleep/751870 mmap(len: 218049136, prot: READ, flags: PRIVATE, fd: 3)               = 0x7fa95ff38000
        #
        #
        # set -o vi
        # strace -e mmap sleep 1
        mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f333bd83000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f333bd81000
        mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f333bbb7000
        mmap(0x7f333bbdc000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f333bbdc000
        mmap(0x7f333bd2c000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f333bd2c000
        mmap(0x7f333bd77000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f333bd77000
        mmap(0x7f333bd7d000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f333bd7d000
        mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f332ebc4000
        +++ exited with 0 +++
        #
      
      You can also tweak the 'perf trace' output to more closely match
      strace's:
      
        # perf config trace.show_arg_names=no
        # perf config trace.show_duration=no
        # perf config trace.show_prefix=yes
        # perf config trace.show_timestamp=no
        # perf config trace.show_zeros=yes
        # perf config trace.no_inherit=yes
        # perf trace -e mmap sleep 1
        mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0)                      = 0x7f0d287ca000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS)     = 0x7f0d287c8000
        mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0)       = 0x7f0d285fe000
        mmap(0x7f0d28623000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f0d28623000
        mmap(0x7f0d28773000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f0d28773000
        mmap(0x7f0d287be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f0d287be000
        mmap(0x7f0d287c4000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f0d287c4000
        mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0)                   = 0x7f0d1b60b000
        #
      
        # perf config | grep ^trace
        trace.show_arg_names=no
        trace.show_duration=no
        trace.show_prefix=yes
        trace.show_timestamp=no
        trace.show_zeros=yes
        trace.no_inherit=yes
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools beauty: Add script to generate table of mmap's 'prot' argument · 08fc4762
      Arnaldo Carvalho de Melo authored
      Will be wired up in the following csets:
      
        $ tools/perf/trace/beauty/mmap_prot.sh
        static const char *mmap_prot[] = {
        	[ilog2(0x1) + 1] = "READ",
        #ifndef PROT_READ
        #define PROT_READ 0x1
        #endif
        	[ilog2(0x2) + 1] = "WRITE",
        #ifndef PROT_WRITE
        #define PROT_WRITE 0x2
        #endif
        	[ilog2(0x4) + 1] = "EXEC",
        #ifndef PROT_EXEC
        #define PROT_EXEC 0x4
        #endif
        	[ilog2(0x8) + 1] = "SEM",
        #ifndef PROT_SEM
        #define PROT_SEM 0x8
        #endif
        	[ilog2(0x01000000) + 1] = "GROWSDOWN",
        #ifndef PROT_GROWSDOWN
        #define PROT_GROWSDOWN 0x01000000
        #endif
        	[ilog2(0x02000000) + 1] = "GROWSUP",
        #ifndef PROT_GROWSUP
        #define PROT_GROWSUP 0x02000000
        #endif
        };
        $
        $
        $
        $ tools/perf/trace/beauty/mmap_prot.sh alpha
        static const char *mmap_prot[] = {
        	[ilog2(0x4) + 1] = "EXEC",
        #ifndef PROT_EXEC
        #define PROT_EXEC 0x4
        #endif
        	[ilog2(0x01000000) + 1] = "GROWSDOWN",
        #ifndef PROT_GROWSDOWN
        #define PROT_GROWSDOWN 0x01000000
        #endif
        	[ilog2(0x02000000) + 1] = "GROWSUP",
        #ifndef PROT_GROWSUP
        #define PROT_GROWSUP 0x02000000
        #endif
        	[ilog2(0x1) + 1] = "READ",
        #ifndef PROT_READ
        #define PROT_READ 0x1
        #endif
        	[ilog2(0x8) + 1] = "SEM",
        #ifndef PROT_SEM
        #define PROT_SEM 0x8
        #endif
        	[ilog2(0x2) + 1] = "WRITE",
        #ifndef PROT_WRITE
        #define PROT_WRITE 0x2
        #endif
        };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
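The generated table stores the name for bit B at index ilog2(B) + 1. A minimal sketch of how such a table can turn a 'prot' value into the strings seen in the 'perf trace' output ("READ|WRITE", or "PROT_READ|PROT_WRITE" with trace.show_prefix=yes) follows. It is an illustration in Python, not perf's actual C routine: `format_prot` and its `show_prefix` parameter are hypothetical, the handling of a zero value is an assumption (PROT_NONE is 0 in the mman headers, but the source does not show how perf prints it), and the reason for the "+ 1" offset is not stated in the commit.

```python
def ilog2(value):
    """Integer log2 of a power of two, standing in for the kernel macro."""
    return value.bit_length() - 1


# Mirrors the generated C table: name for bit B is stored at ilog2(B) + 1.
MMAP_PROT = {
    ilog2(0x1) + 1: "READ",
    ilog2(0x2) + 1: "WRITE",
    ilog2(0x4) + 1: "EXEC",
    ilog2(0x8) + 1: "SEM",
    ilog2(0x01000000) + 1: "GROWSDOWN",
    ilog2(0x02000000) + 1: "GROWSUP",
}


def format_prot(prot, show_prefix=False):
    """Render a prot bitmask as 'A|B|...' (hypothetical helper)."""
    prefix = "PROT_" if show_prefix else ""
    if prot == 0:
        # Assumption: print the zero value as NONE, after PROT_NONE.
        return prefix + "NONE"
    names = []
    unknown = 0
    for shift in range(prot.bit_length()):
        bit = 1 << shift
        if not prot & bit:
            continue
        name = MMAP_PROT.get(shift + 1)
        if name is None:
            unknown |= bit  # collect bits the table does not know about
        else:
            names.append(prefix + name)
    if unknown:
        names.append(hex(unknown))
    return "|".join(names)


print(format_prot(0x1 | 0x2))
print(format_prot(0x1 | 0x4, show_prefix=True))
```

Because the lookup is keyed by bit position, the order of entries in the generated table does not matter, which is consistent with the differently ordered output shown for the alpha architecture.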
  3. 30 Sep, 2020 1 commit
    • perf beauty mmap_flags: Conditionaly define the mmap flags · 61693228
      Arnaldo Carvalho de Melo authored
      So that on older systems the defines are still available to the mmap
      flags scnprintf routines:
      
        $ tools/perf/trace/beauty/mmap_flags.sh  | head -9 2> /dev/null
        static const char *mmap_flags[] = {
        	[ilog2(0x40) + 1] = "32BIT",
        #ifndef MAP_32BIT
        #define MAP_32BIT 0x40
        #endif
        	[ilog2(0x01) + 1] = "SHARED",
        #ifndef MAP_SHARED
        #define MAP_SHARED 0x01
        #endif
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 29 Sep, 2020 2 commits
    • perf trace beauty: Add script to autogenerate mremap's flags args string/id table · 9012e3dd
      Arnaldo Carvalho de Melo authored
      It'll also conditionally generate the defines, so that if we don't have
      them when building a new tool tarball on an older system, we still get
      them. We sometimes need them in the actual scnprintf routine, such
      as when checking whether a flag means we have an extra arg, as with
      MREMAP_FIXED.
      
        $ tools/perf/trace/beauty/mremap_flags.sh
        static const char *mremap_flags[] = {
        	[ilog2(1) + 1] = "MAYMOVE",
        #ifndef MREMAP_MAYMOVE
        #define MREMAP_MAYMOVE 1
        #endif
        	[ilog2(2) + 1] = "FIXED",
        #ifndef MREMAP_FIXED
        #define MREMAP_FIXED 2
        #endif
        	[ilog2(4) + 1] = "DONTUNMAP",
        #ifndef MREMAP_DONTUNMAP
        #define MREMAP_DONTUNMAP 4
        #endif
        };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
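The point about MREMAP_FIXED implying an extra argument can be sketched as follows: a pretty-printer needs the flag's numeric value, not only its name, to decide whether the new address is meaningful. This is a hedged Python illustration, not perf's scnprintf code; the function names and the argument labels (`addr`, `old_len`, `new_len`, `new_addr`) are hypothetical, while the flag values are the ones shown in the generated table above.

```python
# Flag values as shown in the generated mremap_flags table.
MREMAP_MAYMOVE = 1
MREMAP_FIXED = 2
MREMAP_DONTUNMAP = 4

MREMAP_FLAGS = {
    MREMAP_MAYMOVE: "MAYMOVE",
    MREMAP_FIXED: "FIXED",
    MREMAP_DONTUNMAP: "DONTUNMAP",
}


def format_flags(flags):
    """Render an mremap flags bitmask as 'A|B|...' (hypothetical helper)."""
    if flags == 0:
        return "0"
    names = [name for bit, name in MREMAP_FLAGS.items() if flags & bit]
    unknown = flags & ~sum(MREMAP_FLAGS)  # bits with no table entry
    if unknown:
        names.append(hex(unknown))
    return "|".join(names)


def format_mremap(addr, old_len, new_len, flags, new_addr):
    """Format an mremap() call, showing new_addr only with MREMAP_FIXED."""
    args = [
        f"addr: {addr:#x}",
        f"old_len: {old_len}",
        f"new_len: {new_len}",
        f"flags: {format_flags(flags)}",
    ]
    if flags & MREMAP_FIXED:
        # The extra argument is only meaningful when MREMAP_FIXED is set.
        args.append(f"new_addr: {new_addr:#x}")
    return "mremap(" + ", ".join(args) + ")"


print(format_mremap(0x7F0000000000, 4096, 8192, MREMAP_MAYMOVE, 0))
print(format_mremap(0x7F0000000000, 4096, 8192,
                    MREMAP_MAYMOVE | MREMAP_FIXED, 0x7F0000100000))
```

If MREMAP_FIXED were missing from the build system's headers, a check like the one in `format_mremap` could not be compiled, which is why the script also emits the guarded defines.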
    • perf tools: Separate the checking of headers only used to build beautification tables · d758d5d4
      Arnaldo Carvalho de Melo authored
      Some headers are not used directly in building the tools, but instead to
      generate tables that then get included in the source code to do id->string
      and string->id lookups for things like syscall flags and commands.
      
      We were adding them directly to tools/include/, and this sometimes gets in
      the way of building with system headers, so let's untangle this a bit.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 28 Sep, 2020 12 commits
  6. 27 Sep, 2020 9 commits
    • Linux 5.9-rc7 · a1b8638b
      Linus Torvalds authored
    • Merge tag 'kbuild-fixes-v5.9-4' of... · 16bc1d54
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - ignore compiler stubs for PPC to fix builds
      
       - fix the usage of --target mentioned in the LLVM document
      
      * tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation/llvm: Fix clang target examples
        scripts/kallsyms: skip ppc compiler stub *.long_branch.* / *.plt_branch.*
    • Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f8818559
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for the x86 interrupt code:
      
         - Unbreak the magic 'search the timer interrupt' logic in IO/APIC
           code which got wrecked when the core interrupt code made the
           state tracking logic stricter.
      
           That caused the interrupt line to stay masked after switching from
           IO/APIC to PIC delivery mode, which obviously prevents interrupts
           from being delivered.
      
         - Make run_on_irqstack_cond() typesafe. The function argument is a
           void pointer which is then cast to 'void (*fun)(void *)'.
      
           This breaks Control Flow Integrity checking in clang. Use proper
           helper functions for the three variants required"
      
      * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioapic: Unbreak check_timer()
        x86/irq: Make run_on_irqstack_cond() typesafe
    • Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ba25f057
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "A set of clocksource/clockevents updates:
      
         - Reset the TI/DM timer before enabling it instead of doing it the
           other way round.
      
         - Initialize the reload value for the GX6605s timer correctly so the
           hardware counter starts at 0 again after overrun.
      
         - Make error return value negative in the h8300 timer init function"
      
      * tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-gx6605s: Fixup counter reload
        clocksource/drivers/timer-ti-dm: Do reset before enable
        clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
    • mm/thp: Split huge pmds/puds if they're pinned when fork() · d042035e
      Peter Xu authored
      Pinned pages shouldn't be write-protected when fork() happens, because
      follow up copy-on-write on these pages could cause the pinned pages to
      be replaced by random newly allocated pages.
      
      For huge PMDs, we split the huge pmd if pinning is detected, so that
      future handling will be done at the PTE level (with our latest changes,
      each of the small pages will be copied).  We can achieve this by letting
      copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll
      fall through in copy_pmd_range() and finally land in the next
      copy_pte_range() call.
      
      Huge PUDs are even more special: so far they do not support
      anonymous pages.  But they can actually be handled the same way as huge
      PMDs, even if splitting a huge PUD means erasing the PUD entries.  It'll
      guarantee that follow-up faults will remap the same pages in either
      parent/child later.
      
      This might not be the most efficient way, but it should be easy and
      clean enough.  It should be fine, since we're tackling a very rare
      case just to make sure userspace programs that pinned some thps will
      still work even without MADV_DONTFORK and after they fork()ed.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: Do early cow for pinned pages during fork() for ptes · 70e806e4
      Peter Xu authored
      This allows copy_pte_range() to do early cow if the pages were pinned on
      the source mm.
      
      Currently we don't have an accurate way to know whether a page is pinned
      or not.  The only thing we have is page_maybe_dma_pinned().  However,
      that's good enough for now, especially with the newly added
      mm->has_pinned flag making sure we won't affect processes that never
      pinned any pages.
      
      It would be easier if we could do a GFP_KERNEL allocation within
      copy_one_pte().  Unfortunately, we can't, because the page table
      locks are held for both the parent and child processes.  So the page
      allocation needs to be done outside copy_one_pte().
      
      There is some trickery in copy_present_pte(), mainly the wrprotect trick
      to block concurrent fast-gup.  The comments in the function explain it
      in place.
      
      Oleg Nesterov reported a (probably harmless) bug during review: we
      didn't reset entry.val properly in copy_pte_range(), so there is
      potentially a chance of calling add_swap_count_continuation() multiple
      times on the same swp entry.  However, that should be harmless, since
      even if it happens, the same function (add_swap_count_continuation())
      will return directly on noticing that there is enough space for the swp
      counter.  So instead of a standalone stable patch, it is touched up in
      this patch directly.
      
      Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      70e806e4
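The constraint that the page allocation must happen outside the locked section suggests a "fail with -EAGAIN, drop the lock, preallocate, retry" loop. The userspace sketch below illustrates that pattern only. All `toy_` names are hypothetical, `malloc()` stands in for a sleeping GFP_KERNEL allocation, the lock is represented by comments, and the real function's batching and the wrprotect trick are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a present PTE; not a kernel type. */
struct toy_pte {
    unsigned char *page;  /* backing page */
    bool maybe_pinned;    /* page may be DMA-pinned in the parent */
};

/*
 * Runs "under the page table lock", so it must not allocate.
 * Returns -EAGAIN when a pinned page needs a copy but no page
 * has been preallocated yet.
 */
static int toy_copy_present_pte(struct toy_pte *dst, const struct toy_pte *src,
                                unsigned char **prealloc)
{
    if (!src->maybe_pinned) {
        dst->page = src->page;     /* share; CoW would resolve it later */
        dst->maybe_pinned = false;
        return 0;
    }
    if (!*prealloc)
        return -EAGAIN;

    /* Early copy: the child gets its own page right away. */
    memcpy(*prealloc, src->page, TOY_PAGE_SIZE);
    dst->page = *prealloc;
    dst->maybe_pinned = false;
    *prealloc = NULL;              /* preallocated page consumed */
    return 0;
}

static int toy_copy_pte_range(struct toy_pte *dst, const struct toy_pte *src,
                              size_t n)
{
    unsigned char *prealloc = NULL;
    size_t i = 0;

    while (i < n) {
        /* lock taken here */
        int ret = toy_copy_present_pte(&dst[i], &src[i], &prealloc);
        /* lock dropped here */

        if (ret == -EAGAIN) {
            prealloc = malloc(TOY_PAGE_SIZE);  /* allocation outside the lock */
            if (!prealloc)
                return -ENOMEM;
            continue;                          /* retry the same entry */
        }
        i++;
    }
    free(prealloc);  /* NULL here unless a preallocation went unused */
    return 0;
}
```

The point of the shape is that the allocating step never runs between the "lock taken" and "lock dropped" comments.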
    • mm/fork: Pass new vma pointer into copy_page_range() · 7a4830c3
      Peter Xu authored
      This prepares for the future work to trigger early cow on pinned pages
      during fork().
      
      No functional change intended.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7a4830c3
    • mm: Introduce mm_struct.has_pinned · 008cfe44
      Peter Xu authored
      (Commit message mostly collected from Jason Gunthorpe)
      
      Reduce the chance of false positives from page_maybe_dma_pinned() by
      keeping track of whether the mm_struct has ever been used with
      pin_user_pages().  This allows cases that might drive up the page
      ref_count to avoid any penalty from handling dma_pinned pages.
      
      Future work is planned, to provide a more sophisticated solution, likely
      to turn it into a real counter.  For now, make it atomic_t but use it as
      a boolean for simplicity.
      Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      008cfe44
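How a per-mm flag filters false positives from a refcount-based check can be sketched in userspace C. This is an illustrative model, not the kernel implementation: the `toy_` types and helpers are hypothetical, and the bias of 1024 is an assumption chosen to mirror the idea of a large per-pin refcount increment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed per-pin refcount bias for this model. */
#define TOY_PIN_BIAS 1024

/* Hypothetical stand-ins; not the kernel's mm_struct or struct page. */
struct toy_mm {
    atomic_int has_pinned;  /* atomic, but used as a boolean */
};

struct toy_page {
    int refcount;
};

/* Models pin_user_pages() for a single page. */
static void toy_pin_user_page(struct toy_mm *mm, struct toy_page *page)
{
    if (atomic_load(&mm->has_pinned) == 0)
        atomic_store(&mm->has_pinned, 1);  /* set once, never cleared */
    page->refcount += TOY_PIN_BIAS;
}

/* Heuristic: a large refcount might mean a pin, or just many references. */
static bool toy_page_maybe_dma_pinned(const struct toy_page *page)
{
    return page->refcount >= TOY_PIN_BIAS;
}

/* Only trust the heuristic if this mm has ever pinned anything. */
static bool toy_treat_as_pinned(struct toy_mm *mm, const struct toy_page *page)
{
    if (!atomic_load(&mm->has_pinned))
        return false;  /* never pinned: skip the penalty entirely */
    return toy_page_maybe_dma_pinned(page);
}
```

A page with a very high ordinary refcount trips the heuristic, but in this model it is still not treated as pinned as long as its mm has never called the pin helper.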
    • Merge tag 'timers-v5.9-rc4' of... · a7b6c0fe
      Thomas Gleixner authored
      Merge tag 'timers-v5.9-rc4' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
      
      Pull clocksource/clockevent fixes from Daniel Lezcano:
      
       - Fix wrong signed return value when checking of_iomap in the probe
         function for the h8300 timer (Tianjia Zhang)
      
       - Fix reset sequence when setting up the timer on the dm_timer (Tony
         Lindgren)
      
       - Fix counter reload when the interrupt fires on gx6605s (Guo Ren)
      a7b6c0fe
  7. 26 Sep, 2020 6 commits
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · a1bffa48
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three fixes: one in drivers (lpfc) and two for zoned block devices.
      
        The latter also impinges on the block layer but only to introduce a
        new block API for setting the zone model rather than fiddling with the
        queue directly in the zoned block driver"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: sd_zbc: Fix ZBC disk initialization
        scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks
        scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported
      a1bffa48
    • Merge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block · 692495ba
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Two fixes for regressions in this cycle, and one that goes to 5.8
        stable:
      
         - fix leak of getname() retrieved filename
      
         - remove plug->nowait assignment, fixing a regression with btrfs
      
         - fix for async buffered retry"
      
      * tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
        io_uring: ensure async buffered read-retry is setup properly
        io_uring: don't unconditionally set plug->nowait = true
        io_uring: ensure open/openat2 name is cleaned on cancelation
      692495ba
    • Merge tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block · 9d2fbaef
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "NVMe pull request from Christoph, and removal of a dead define.
      
         - fix error during controller probe that causes a double free of
           irqs (Keith Busch)
      
         - FC connection establishment fix (James Smart)
      
         - properly handle completions for invalid tags (Xianting Tian)
      
         - pass the correct nsid to the command effects and supported log
           (Chaitanya Kulkarni)"
      
      * tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
        block: remove unused BLK_QC_T_EAGAIN flag
        nvme-core: don't use NVME_NSID_ALL for command effects and supported log
        nvme-fc: fail new connections to a deleted host or remote port
        nvme-pci: fix NULL req in completion handler
        nvme: return errors for hwmon init
      9d2fbaef
    • Merge tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · eeddbe68
      Linus Torvalds authored
      Pull s390 fix from Vasily Gorbik:
       "Fix truncated ZCRYPT_PERDEV_REQCNT ioctl result. Copy entire reqcnt
        list"
      
      * tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl
      eeddbe68
    • Merge branch 'akpm' (patches from Andrew) · 8fb1e910
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "9 patches.
      
        Subsystems affected by this patch series: mm (thp, memcg, gup,
        migration, memory-hotplug), lib, and x86"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: don't rely on system state to detect hot-plug operations
        mm: replace memmap_context by meminit_context
        arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
        lib/memregion.c: include memregion.h
        lib/string.c: implement stpcpy
        mm/migrate: correct thp migration stats
        mm/gup: fix gup_fast with dynamic page table folding
        mm: memcontrol: fix missing suffix of workingset_restore
        mm, THP, swap: fix allocating cluster for swapfile by mistake
      8fb1e910
    • mm: validate pmd after splitting · ce268425
      Minchan Kim authored
      syzbot reported the following KASAN splat:
      
        general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
        CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
        Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
        RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
        Call Trace:
          lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
          __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
          _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
          spin_lock include/linux/spinlock.h:354 [inline]
          madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
          walk_pmd_range mm/pagewalk.c:89 [inline]
          walk_pud_range mm/pagewalk.c:160 [inline]
          walk_p4d_range mm/pagewalk.c:193 [inline]
          walk_pgd_range mm/pagewalk.c:229 [inline]
          __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
          walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
          madvise_pageout_page_range mm/madvise.c:521 [inline]
          madvise_pageout mm/madvise.c:557 [inline]
          madvise_vma mm/madvise.c:946 [inline]
          do_madvise+0x12d0/0x2090 mm/madvise.c:1145
          __do_sys_madvise mm/madvise.c:1171 [inline]
          __se_sys_madvise mm/madvise.c:1169 [inline]
          __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
          do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The backing vma was shmem.
      
      When a file-backed THP is split, madvise zaps the pmd instead of
      remapping the sub-pages.  So we need to check pmd validity after the
      split.
      
      Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
      Fixes: 1a4e58cc ("mm: introduce MADV_PAGEOUT")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ce268425
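The "revalidate after split" rule from this fix can be illustrated with a toy state machine. This is a hedged sketch, not the actual patch: the `toy_` names, the three-state enum, and the 512-entry count are hypothetical simplifications of what a pmd can look like after a split.

```c
#include <stdbool.h>

/* Hypothetical: number of PTEs under one pmd in this model. */
#define TOY_PTES_PER_PMD 512

enum toy_pmd_state {
    TOY_PMD_NONE,   /* empty: nothing mapped below */
    TOY_PMD_HUGE,   /* maps a transparent huge page */
    TOY_PMD_TABLE   /* points at a page table of small pages */
};

struct toy_pmd_entry {
    enum toy_pmd_state state;
    bool file_backed;   /* e.g. shmem, as in the report */
};

/*
 * Models the split: anonymous THPs are remapped as small pages,
 * file-backed THPs have their pmd zapped instead.
 */
static void toy_split_huge_pmd(struct toy_pmd_entry *pmd)
{
    if (pmd->state != TOY_PMD_HUGE)
        return;
    pmd->state = pmd->file_backed ? TOY_PMD_NONE : TOY_PMD_TABLE;
}

/* Returns how many PTEs were walked; 0 means the range was skipped. */
static int toy_pageout_pmd(struct toy_pmd_entry *pmd)
{
    if (pmd->state == TOY_PMD_HUGE)
        toy_split_huge_pmd(pmd);

    /* Revalidate: after a split there may be no page table to walk. */
    if (pmd->state != TOY_PMD_TABLE)
        return 0;

    /* Safe to take the PTE lock and walk the small pages here. */
    return TOY_PTES_PER_PMD;
}
```

Without the check after the split, the file-backed case would go on to lock and walk a page table that no longer exists, which corresponds to the null dereference in the splat above.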