1. 01 Sep, 2020 8 commits
    • Al Grant's avatar
      perf cs-etm: Fix corrupt data after perf inject from · f5f8e7e5
      Al Grant authored
      Commit 42bbabed ("perf tools: Add hw_idx in struct branch_stack")
      changed the format of branch stacks in perf samples. When samples use
      this new format, a flag must be set in the corresponding event.
      
      Synthesized branch stacks generated from CoreSight ETM trace were using
      the new format, but not setting the event attribute, leading to
      consumers seeing corrupt data. This patch fixes the issue by setting the
      event attribute to indicate use of the new format.
      
      Fixes: 42bbabed ("perf tools: Add hw_idx in struct branch_stack")
      Signed-off-by: default avatarAl Grant <al.grant@arm.com>
      Reviewed-by: default avatarAndrea Brunato <andrea.brunato@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Link: http://lore.kernel.org/lkml/20200819084751.17686-1-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f5f8e7e5
    • Arnaldo Carvalho de Melo's avatar
      perf top/report: Fix infinite loop in the TUI for grouped events · d4ccbacb
      Arnaldo Carvalho de Melo authored
      For a while we need to have a dummy event for doing things like
      receiving PERF_RECORD_COMM, PERF_RECORD_EXEC, etc for threads being
      created and dying while we synthesize the pre-existing ones at tool
      start.
      
      This 'dummy' event is needed for keeping track of thread lifetime events
      early in the session but are uninteresting otherwise, i.e. no need to
      have it in a initial events menu for the non-grouped case, i.e. for:
      
       # perf top -e cycles,instructions
      
      or even for plain:
      
       # perf top
      
      When 'cycles' and that 'dummy' event are in place.
      
      The code to remove that 'dummy' event ended up creating an endless loop
      for the grouped case, i.e.:
      
       # perf top -e '{cycles,instructions}'
      
      Fix it.
      
      Fixes: bee9ca1c ("perf report TUI: Remove needless 'dummy' event from menu")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d4ccbacb
    • Ian Rogers's avatar
      perf parse-events: Avoid an uninitialized read when using fake PMUs · 33321a06
      Ian Rogers authored
      With a fake_pmu the pmu_info isn't populated by perf_pmu__check_alias.
      In this case, don't try to copy the uninitialized values to the evsel.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200826042910.1902374-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      33321a06
    • Thomas Richter's avatar
      perf stat: Fix out of bounds array access in the print_counters() evlist method · 313146a8
      Thomas Richter authored
      Fix a compile error on F32 and gcc version 10.1 on s390 in file
      utils/stat-display.c.  The error does not show up with make DEBUG=y.  In
      fact the issue shows up when using both compiler options -O6 and
      -D_FORTIFY_SOURCE=2 (which are omitted with DEBUG=Y).
      
      This is the offending call chain:
      
      print_counter_aggr()
        printout(config, -1, 0, ...)  with 2nd parm id set to -1
          aggr_printout(config, x, id --> -1, ...) which leads to this code:
      		case AGGR_NONE:
                      if (evsel->percore && !config->percore_show_thread) {
                              ....
                      } else {
                              fprintf(config->output, "CPU%*d%s",
                                      config->csv_output ? 0 : -7,
                                      evsel__cpus(evsel)->map[id],
      				                        ^^ id is -1 !!!!
                                      config->csv_sep);
                      }
      
      This is a compiler inlining issue which is detected on s390 but not on
      other plattforms.
      
      Output before:
      
       # make util/stat-display.o
          .....
      
        util/stat-display.c: In function ‘perf_evlist__print_counters’:
        util/stat-display.c:121:4: error: array subscript -1 is below array
            bounds of ‘int[]’ [-Werror=array-bounds]
        121 |    fprintf(config->output, "CPU%*d%s",
            |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        122 |     config->csv_output ? 0 : -7,
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        123 |     evsel__cpus(evsel)->map[id],
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        124 |     config->csv_sep);
            |     ~~~~~~~~~~~~~~~~
        In file included from util/evsel.h:13,
                       from util/evlist.h:13,
                       from util/stat-display.c:9:
        /root/linux/tools/lib/perf/include/internal/cpumap.h:10:7:
        note: while referencing ‘map’
         10 |  int  map[];
            |       ^~~
        cc1: all warnings being treated as errors
        mv: cannot stat 'util/.stat-display.o.tmp': No such file or directory
        make[3]: *** [/root/linux/tools/build/Makefile.build:97: util/stat-display.o]
        Error 1
        make[2]: *** [Makefile.perf:716: util/stat-display.o] Error 2
        make[1]: *** [Makefile.perf:231: sub-make] Error 2
        make: *** [Makefile:110: util/stat-display.o] Error 2
        [root@t35lp46 perf]#
      
      Output after:
      
        # make util/stat-display.o
          .....
        CC       util/stat-display.o
        [root@t35lp46 perf]#
      
      Committer notes:
      
      Removed the removal of {} enclosing the multiline else block, as pointed
      out by Jiri Olsa.
      Suggested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200825063304.77733-1-tmricht@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      313146a8
    • Thomas Richter's avatar
      perf test: Set NULL sentinel in pmu_events table in "Parse and process metrics" test · 492d4d87
      Thomas Richter authored
      Linux 5.9 introduced perf test case "Parse and process metrics" and
      on s390 this test case always dumps core:
      
        [root@t35lp67 perf]# ./perf test -vvvv -F 67
        67: Parse and process metrics                             :
        --- start ---
        metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
        parsing metric: inst_retired.any / cpu_clk_unhalted.thread
        Segmentation fault (core dumped)
        [root@t35lp67 perf]#
      
      I debugged this core dump and gdb shows this call chain:
      
        (gdb) where
         #0  0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6
         #1  0x000003ffabc293de in strcasestr () from /lib64/libc.so.6
         #2  0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any",
                  n=<optimized out>)
             at util/metricgroup.c:368
         #3  find_metric (map=<optimized out>, map=<optimized out>,
                 metric=0x1e6ea20 "inst_retired.any")
            at util/metricgroup.c:765
         #4  __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0,
                 metric_no_group=<optimized out>, m=<optimized out>)
            at util/metricgroup.c:844
         #5  resolve_metric (ids=0x0, map=0x0, metric_list=0x0,
                metric_no_group=<optimized out>)
            at util/metricgroup.c:881
         #6  metricgroup__add_metric (metric=<optimized out>,
              metric_no_group=metric_no_group@entry=false, events=<optimized out>,
              events@entry=0x3ffd84fb878, metric_list=0x0,
              metric_list@entry=0x3ffd84fb868, map=0x0)
            at util/metricgroup.c:943
         #7  0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>,
              metric_list=0x3ffd84fb868, events=0x3ffd84fb878,
              metric_no_group=<optimized out>, list=<optimized out>)
            at util/metricgroup.c:988
         #8  parse_groups (perf_evlist=perf_evlist@entry=0x1e70260,
                str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>,
                metric_no_merge=<optimized out>,
                fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>,
                metric_events=0x3ffd84fba58, map=0x1)
            at util/metricgroup.c:1040
         #9  0x0000000001103eb2 in metricgroup__parse_groups_test(
        	evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>,
        	str=str@entry=0x12f34b2 "IPC",
        	metric_no_group=metric_no_group@entry=false,
        	metric_no_merge=metric_no_merge@entry=false,
        	metric_events=0x3ffd84fba58)
            at util/metricgroup.c:1082
         #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0,
                ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC",
        	vals=0x3ffd84fbad8, name=0x12f34b2 "IPC")
            at tests/parse-metric.c:159
         #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8,
        	name=0x12f34b2 "IPC")
            at tests/parse-metric.c:189
         #12 test_ipc () at tests/parse-metric.c:208
      .....
      ..... omitted many more lines
      
      This test case was added with
      commit 218ca91d ("perf tests: Add parse metric test for frontend metric").
      
      When I compile with make DEBUG=y it works fine and I do not get a core dump.
      
      It turned out that the above listed function call chain worked on a struct
      pmu_event array which requires a trailing element with zeroes which was
      missing. The marco map_for_each_event() loops over that array tests for members
      metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes
      the issue.
      
      Output after:
      
        [root@t35lp46 perf]# ./perf test 67
        67: Parse and process metrics                             : Ok
        [root@t35lp46 perf]#
      
      Committer notes:
      
      As Ian remarks, this is not s390 specific:
      
      <quote Ian>
        This also shows up with address sanitizer on all architectures
        (perhaps change the patch title) and perhaps add a "Fixes: <commit>"
        tag.
      
        =================================================================
        ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address
        0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp
        0x7ffd24327c58
        READ of size 8 at 0x55c93b4d59e8 thread T0
            #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2
            #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9
            #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9
            #3 0x55c93a1528db in metricgroup__add_metric
        tools/perf/util/metricgroup.c:943:9
            #4 0x55c93a151996 in metricgroup__add_metric_list
        tools/perf/util/metricgroup.c:988:9
            #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8
            #6 0x55c93a1513e1 in metricgroup__parse_groups_test
        tools/perf/util/metricgroup.c:1082:9
            #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8
            #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9
            #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2
            #10 0x55c93a00f1e8 in test__parse_metric
        tools/perf/tests/parse-metric.c:345:2
            #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9
            #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9
            #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4
            #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9
            #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11
            #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8
            #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2
            #18 0x55c939e45f7e in main tools/perf/perf.c:539:3
      
        0x55c93b4d59e8 is located 0 bytes to the right of global variable
        'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25'
        (0x55c93b4d54a0) of size 1352
        SUMMARY: AddressSanitizer: global-buffer-overflow
        tools/perf/util/metricgroup.c:764:2 in find_metric
        Shadow bytes around the buggy address:
          0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9
          0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
          0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
          0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
          0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
        Shadow byte legend (one shadow byte represents 8 application bytes):
          Addressable:           00
          Partially addressable: 01 02 03 04 05 06 07
          Heap left redzone:	   fa
          Freed heap region:	   fd
          Stack left redzone:	   f1
          Stack mid redzone:	   f2
          Stack right redzone:     f3
          Stack after return:	   f5
          Stack use after scope:   f8
          Global redzone:          f9
          Global init order:	   f6
          Poisoned by user:        f7
          Container overflow:	   fc
          Array cookie:            ac
          Intra object redzone:    bb
          ASan internal:           fe
          Left alloca redzone:     ca
          Right alloca redzone:    cb
          Shadow gap:              cc
      </quote>
      
      I'm also adding the missing "Fixes" tag and setting just .name to NULL,
      as doing it that way is more compact (the compiler will zero out
      everything else) and the table iterators look for .name being NULL as
      the sentinel marking the end of the table.
      
      Fixes: 0a507af9 ("perf tests: Add parse metric test for ipc metric")
      Signed-off-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: default avatarSumanth Korikkar <sumanthk@linux.ibm.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      492d4d87
    • Jin Yao's avatar
      perf parse-events: Set exclude_guest=1 for user-space counting · 943b69ac
      Jin Yao authored
      Currently if we run 'perf record -e cycles:u', exclude_guest=0.
      
      But it doesn't make sense in most cases that we request for
      user-space counting but we also get the guest report.
      
      Of course, we also need to consider 'perf kvm' usage case that
      authorized perf users on the host may only want to count guest user
      space events. For example,
      
        # perf kvm --guest record -e cycles:u
      
      When we have 'exclude_guest=1' for 'perf kvm' usage, we may get nothing
      from guest events.
      
      To keep perf semantics consistent and clear, this patch sets
      exclude_guest=1 for user-space counting but except for 'perf kvm' usage.
      
      Before:
      
        perf record -e cycles:u ./div
        perf evlist -v
        cycles:u: ..., exclude_kernel: 1, exclude_hv: 1, ...
      
      After:
        perf record -e cycles:u ./div
        perf evlist -v
        cycles:u: ..., exclude_kernel: 1, exclude_hv: 1,  exclude_guest: 1, ...
      
      Before:
        perf kvm --guest record -e cycles:u -vvv
      
      perf_event_attr:
      
        size                             120
        { sample_period, sample_freq }   4000
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD
        read_format                      ID
        disabled                         1
        inherit                          1
        exclude_kernel                   1
        exclude_hv                       1
        freq                             1
        sample_id_all                    1
      
      After:
      
        perf kvm --guest record -e cycles:u -vvv
      
      perf_event_attr:
        size                             120
        { sample_period, sample_freq }   4000
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD
        read_format                      ID
        disabled                         1
        inherit                          1
        exclude_kernel                   1
        exclude_hv                       1
        freq                             1
        sample_id_all                    1
      
      For Before/After, exclude_guest are both 0 for perf kvm usage.
      
      perf test 6
      
       6: Parse event definition strings             : Ok
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarLike Xu <like.xu@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200814012120.16647-1-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      943b69ac
    • Wei Li's avatar
      perf record: Correct the help info of option "--no-bpf-event" · a060c1f1
      Wei Li authored
      The help info of option "--no-bpf-event" is wrongly described as "record
      bpf events", correct it.
      
      Committer testing:
      
        $ perf record -h bpf
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
                --clang-opt <clang options>
                                  options passed to clang when compiling BPF scriptlets
                --clang-path <clang path>
                                  clang binary to use for compiling BPF scriptlets
                --no-bpf-event    do not record bpf events
      
        $
      
      Fixes: 71184c6a ("perf record: Replace option --bpf-event with --no-bpf-event")
      Signed-off-by: default avatarWei Li <liwei391@huawei.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200819031947.12115-1-liwei391@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a060c1f1
    • Chris Wilson's avatar
      perf tools: Use %zd for size_t printf formats on 32-bit · 20befbb1
      Chris Wilson authored
      A couple of trivial fixes for using %zd for size_t in the code
      supporting the ZSTD compression library.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200820212501.24421-1-chris@chris-wilson.co.ukSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      20befbb1
  2. 21 Aug, 2020 6 commits
  3. 20 Aug, 2020 4 commits
  4. 19 Aug, 2020 2 commits
    • Arvind Sankar's avatar
      lib/string.c: Use freestanding environment · 33d0f96f
      Arvind Sankar authored
      gcc can transform the loop in a naive implementation of memset/memcpy
      etc into a call to the function itself.  This optimization is enabled by
      -ftree-loop-distribute-patterns.
      
      This has been the case for a while, but gcc-10.x enables this option at
      -O2 rather than -O3 as in previous versions.
      
      Add -ffreestanding, which implicitly disables this optimization with
      gcc.  It is unclear whether clang performs such optimizations, but
      hopefully it will also not do so in a freestanding environment.
      Signed-off-by: default avatarArvind Sankar <nivedita@alum.mit.edu>
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      33d0f96f
    • Arvind Sankar's avatar
      x86/boot/compressed: Use builtin mem functions for decompressor · 394b19d6
      Arvind Sankar authored
      Since commits
      
        c041b5ad ("x86, boot: Create a separate string.h file to provide standard string functions")
        fb4cac57 ("x86, boot: Move memcmp() into string.h and string.c")
      
      the decompressor stub has been using the compiler's builtin memcpy,
      memset and memcmp functions, _except_ where it would likely have the
      largest impact, in the decompression code itself.
      
      Remove the #undef's of memcpy and memset in misc.c so that the
      decompressor code also uses the compiler builtins.
      
      The rationale given in the comment doesn't really apply: just because
      some functions use the out-of-line version is no reason to not use the
      builtin version in the rest.
      
      Replace the comment with an explanation of why memzero and memmove are
      being #define'd.
      
      Drop the suggestion to #undef in boot/string.h as well: the out-of-line
      versions are not really optimized versions, they're generic code that's
      good enough for the preboot environment. The compiler will likely
      generate better code for constant-size memcpy/memset/memcmp if it is
      allowed to.
      
      Most decompressors' performance is unchanged, with the exception of LZ4
      and 64-bit ZSTD.
      
      	Before	After ARCH
      LZ4	  73ms	 10ms   32
      LZ4	 120ms	 10ms	64
      ZSTD	  90ms	 74ms	64
      
      Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels.
      
      Decompressor code size has small differences, with the largest being
      that 64-bit ZSTD decreases just over 2k. The largest code size increase
      was on 64-bit XZ, of about 400 bytes.
      Signed-off-by: default avatarArvind Sankar <nivedita@alum.mit.edu>
      Suggested-by: default avatarNick Terrell <nickrterrell@gmail.com>
      Tested-by: default avatarNick Terrell <nickrterrell@gmail.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      394b19d6
  5. 18 Aug, 2020 5 commits
    • Linus Torvalds's avatar
      Merge tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 18445bf4
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A bunch of fixes that came in for SPI during the merge window.
      
        Some from ST and others for their controller, one from Lukas for a
        race between device addition and controller unregistration and one
        from fix from Geert for the DT bindings which unbreaks validation"
      
      * tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        dt-bindings: lpspi: Add missing boolean type for fsl,spi-only-use-cs1-sel
        spi: stm32: always perform registers configuration prior to transfer
        spi: stm32: fixes suspend/resume management
        spi: stm32: fix stm32_spi_prepare_mbr in case of odd clk_rate
        spi: stm32: fix fifo threshold level in case of short transfer
        spi: stm32h7: fix race condition at end of transfer
        spi: stm32: clear only asserted irq flags on interrupt
        spi: Prevent adding devices below an unregistering controller
      18445bf4
    • Linus Torvalds's avatar
      Merge tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · 9899b587
      Linus Torvalds authored
      Pull ia64 page table fix from Mike Rapoport:
       "Fix regression in IA-64 caused by page table allocation refactoring
      
        The refactoring and consolidation of <asm/pgalloc.h> caused regression
        on parisc and ia64. The fix for parisc made it into v5.9-rc1 while the
        fix ia64 got delayed a bit and here it is"
      
      * tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
        arch/ia64: Restore arch-specific pgd_offset_k implementation
      9899b587
    • Yang Shi's avatar
      mm/memory.c: skip spurious TLB flush for retried page fault · b7333b58
      Yang Shi authored
      Recently we found regression when running will_it_scale/page_fault3 test
      on ARM64.  Over 70% down for the multi processes cases and over 20% down
      for the multi threads cases.  It turns out the regression is caused by
      commit 89b15332 ("mm: drop mmap_sem before calling
      balance_dirty_pages() in write fault").
      
      The test mmaps a memory size file then write to the mapping, this would
      make all memory dirty and trigger dirty pages throttle, that upstream
      commit would release mmap_sem then retry the page fault.  The retried
      page fault would see correct PTEs installed then just fall through to
      spurious TLB flush.  The regression is caused by the excessive spurious
      TLB flush.  It is fine on x86 since x86's spurious TLB flush is no-op.
      
      We could just skip the spurious TLB flush to mitigate the regression.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: default avatarXu Yu <xuyu@linux.alibaba.com>
      Debugged-by: default avatarXu Yu <xuyu@linux.alibaba.com>
      Tested-by: default avatarXu Yu <xuyu@linux.alibaba.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarYang Shi <shy828301@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b7333b58
    • Linus Torvalds's avatar
      Merge tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 06a4ec1d
      Linus Torvalds authored
      Pull mailmap update from Kees Cook:
       "This was originally part of my pstore tree, but when I realized that
        mailmap needed re-alphabetizing, I decided to wait until -rc1 to send
        this, as I saw a lot of mailmap additions pending in -next for the
        merge window.
      
        It's a programmatic reordering and the addition of a pstore
        contributor's preferred email address"
      
      * tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mailmap: Add WeiXiong Liao
        mailmap: Restore dictionary sorting
      06a4ec1d
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4cf75621
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Another batch of fixes:
      
        1) Remove nft_compat counter flush optimization, it generates warnings
           from the refcount infrastructure. From Florian Westphal.
      
        2) Fix BPF to search for build id more robustly, from Jiri Olsa.
      
        3) Handle bogus getopt lengths in ebtables, from Florian Westphal.
      
        4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and
           Oleksij Rempel.
      
        5) Reset iter properly on mptcp sendmsg() error, from Florian
           Westphal.
      
        6) Show a saner speed in bonding broadcast mode, from Jarod Wilson.
      
        7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones.
      
        8) Fix double unregister in bonding during namespace tear down, from
           Cong Wang.
      
        9) Disable RP filter during icmp_redirect selftest, from David Ahern"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
        otx2_common: Use devm_kcalloc() in otx2_config_npa()
        net: qrtr: fix usage of idr in port assignment to socket
        selftests: disable rp_filter for icmp_redirect.sh
        Revert "net: xdp: pull ethernet header off packet after computing skb->protocol"
        phylink: <linux/phylink.h>: fix function prototype kernel-doc warning
        mptcp: sendmsg: reset iter on error redux
        net: devlink: Remove overzealous WARN_ON with snapshots
        tipc: not enable tipc when ipv6 works as a module
        tipc: fix uninit skb->data in tipc_nl_compat_dumpit()
        net: Fix potential wrong skb->protocol in skb_vlan_untag()
        net: xdp: pull ethernet header off packet after computing skb->protocol
        ipvlan: fix device features
        bonding: fix a potential double-unregister
        can: j1939: add rxtimer for multipacket broadcast session
        can: j1939: abort multipacket broadcast session when timeout occurs
        can: j1939: cancel rxtimer on multipacket broadcast session complete
        can: j1939: fix support for multipacket broadcast message
        net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs'
        net: fddi: skfp: cfm: Remove set but unused variable 'oldstate'
        net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs'
        ...
      4cf75621
  6. 17 Aug, 2020 15 commits