- 08 Jan, 2018 15 commits
-
-
Jiri Olsa authored
Currently we use perf_event_context::task_ctx_data to save and restore the LBR status when the task is scheduled out and in. We don't allocate it for child contexts, which results in shorter task's LBR stack, because we don't save the history from previous run and start over every time we schedule the task in. I made a test to generate samples with LBR call stack and got higher numbers on bigger chain depths: before: after: LBR call chain: nr: 1 60561 498127 LBR call chain: nr: 2 0 0 LBR call chain: nr: 3 107030 2172 LBR call chain: nr: 4 466685 62758 LBR call chain: nr: 5 2307319 878046 LBR call chain: nr: 6 48713 495218 LBR call chain: nr: 7 1040 4551 LBR call chain: nr: 8 481 172 LBR call chain: nr: 9 878 120 LBR call chain: nr: 10 2377 6698 LBR call chain: nr: 11 28830 151487 LBR call chain: nr: 12 29347 339867 LBR call chain: nr: 13 4 22 LBR call chain: nr: 14 3 53 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 4af57ef2 ("perf: Add pmu specific data for perf task context") Link: http://lkml.kernel.org/r/20180107160356.28203-4-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Display namespaces bit in -vv debug display: $ perf record -vv --namespaces ... ... perf_event_attr: size 112 ... namespaces 1 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-3-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
There's no reason anymore to treat babel trace in a special way, because a) we no longer display its state b) the needed babeltrace library is now out and well adopted among distros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180107160356.28203-2-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
perf script has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf script --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-7-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
perf report has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf report --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf report --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-6-git-send-email-yao.jin@linux.intel.com [ Add missing colons at end of examples in the man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
Previous patch supports the multiple time range. For example, select the first and second 10% time slices. perf report --time 10%/1,10%/2 We need a function to check if a timestamp is in the ranges of [0, 10%) and [10%, 20%]. Note that it includes the last element in [10%, 20%] but it doesn't include the last element in [0, 10%). It's to avoid the overlap. This patch implments a new function perf_time__ranges_skip_sample for this checking. Change log: v4: Let perf_time__ranges_skip_sample be compatible with perf_time__skip_sample when only one time range. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-5-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
Current perf report/script/... have a --time option to limit the time range of output. But right now it only supports absolute time, add support for time percentage. For example: 1. Select the second 10% time slice perf report --time 10%/2 2. Select from 0% to 10% time slice perf report --time 0%-10% It also support the multiple time ranges. 3. Select the first and second 10% time slices perf report --time 10%/1,10%/2 4. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Changelog: v4: An issue is found. Following passes. perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now it uses strtol to replace atoi. Committer notes: This just puts in place the infrastructure, so the examples in this cset comment will only work later, after more patches in this series are applied. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. Committer testing: # perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ] [root@jouet home]# perf report --header | grep "time of " # time of first sample : 22947.909226 # time of last sample : 22948.910704 # # perf report -D | grep PERF_RECORD_SAMPLE\( 0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0 0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0 <SNIP> 3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0 0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0 2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0 # Changelog: v7: Just update the patch description according to Arnaldo's suggestion. v6: Currently '--buildid-all' is not enabled at default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. v5: There is an issue that the sample walking can only work when '--buildid-all' is not enabled. So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. At first, I want to provide a new option "--record-time-boundaries". While after consideration, I think a new option is not very necessary. v3: Remove the definitions of first_sample_time and last_sample_time from struct record and directly save them in perf_evlist. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
perf report/script/... have a --time option to limit the time range of output. That's very useful to slice large traces, e.g. when processing the output of perf script for some analysis. But right now --time only supports absolute time. Also there is no fast way to get the start/end times of a given trace except for looking at it. This makes it hard to e.g. only decode the first half of the trace, which is useful for parallelization of scripts Another problem is that perf records are variable size and there is no synchronization mechanism. So the only way to find the last sample reliably would be to walk all samples. But we want to avoid that in perf report/... because it is already quite expensive. That is why storing the first sample time and last sample time in perf record is better. This patch creates a new header feature type HEADER_SAMPLE_TIME and related ops. Save the first sample time and the last sample time to the feature section in perf file header. That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time. Committer testing: After this patch is applied the header is written with zeroes, we need the next patch, for "perf record" to actually write the timestamps: # perf report -D | grep PERF_RECORD_SAMPLE\( 22501155244406 0x44f0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21be8c5 period: 1 addr: 0 <SNIP> 22501155793625 0x4a30 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21ffd50 period: 2828043 addr: 0 # perf report --header | grep "time of " # time of first sample : 0.000000 # time of last sample : 0.000000 # Changelog: v7: 1. Rebase to latest perf/core branch. 2. Add following clarification in patch description according to Arnaldo's suggestion. "That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time." v4: Use perf script time style for timestamp printing. Also add with the printing of sample duration. v3: Remove the definitions of first_sample_time/last_sample_time from perf_session. Just define them in perf_evlist Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
When enabling '-b' option in perf record, for example, perf record -b ... perf report and then browsing the annotate browser from perf report (press 'A'), it would fail (annotate browser can't be displayed). It's because the '.add_entry_cb' op of struct report is overwritten by hist_iter__branch_callback() in builtin-report.c. But this function doesn't do something like mapping symbols and sources. So next, do_annotate() will return directly. notes = symbol__annotation(act->ms.sym); if (!notes->src) return 0; This patch adds the lost code to hist_iter__branch_callback (refer to hist_iter__report_callback). v2: Fix a crash bug when perform 'perf report --stdio'. The reason is that we init the symbol annotation only in browser mode, it doesn't allocate/init resources for stdio mode. So now in hist_iter__branch_callback(), it will return directly if it's not in browser mode. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1514284963-18587-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
When a valid vmlinux is not found, 'perf report' falls back to look at /proc/kcore. In this case, it will report the impossible large offset. For example: # perf record -b -e cycles:k find /etc/ > /dev/null # perf report --stdio --branch-history 22.77% _vm_normal_page+18446603336221188162 | ---page_remove_rmap +18446603336221188324 page_remove_rmap +18446603336221188487 (cycles:5) unlock_page_memcg +18446603336221188096 page_remove_rmap +18446603336221188327 (cycles:1) The issue is the value which is passed to parameter 'addr' in __get_srcline() is the objdump address. It's not correct if we calculate the offset by using 'addr - sym->start'. This patch creates a new parameter 'ip' in __get_srcline(). It is not converted to objdump address. With this patch, the perf report output is: 22.77% _vm_normal_page+66 | ---page_remove_rmap +228 page_remove_rmap +391 (cycles:5) unlock_page_memcg +0 page_remove_rmap +231 (cycles:1) page_remove_rmap +236 Committer testing: Make sure you get any valid vmlinux out of the way, using '-v' on the 'perf report' case and deleting it from places where perf searches them, like your kernel build dir and the build-id cache, in ~/.debug/. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1514564812-17344-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Wang Nan authored
Fix a compile error: ... CC util/libunwind/x86_32.o In file included from util/libunwind/x86_32.c:33:0: util/libunwind/../../arch/x86/util/unwind-libunwind.c: In function 'libunwind__x86_reg_id': util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: error: 'EINVAL' undeclared (first use in this function) return -EINVAL; ^ util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: note: each undeclared identifier is reported only once for each function it appears in mv: cannot stat 'util/libunwind/.x86_32.o.tmp': No such file or directory make[4]: *** [util/libunwind/x86_32.o] Error 1 make[3]: *** [util] Error 2 make[2]: *** [libperf-in.o] Error 2 make[1]: *** [sub-make] Error 2 make: *** [all] Error 2 It happens when libunwind-x86 feature is detected. Signed-off-by: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171206015040.114574-1-wangnan0@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
The 'perf test bpf' was hooking a eBPF program on the SyS_epoll_wait() kernel function, that was what the epoll_wait() glibc function ended up calling, but since at least glibc 2.26, the one that comes with, for instance, Fedora 27, glibc ends up calling SyS_epoll_pwait() when epoll_wait() is used, causing this 'perf test' entry to fail. So switch to using epoll_pwait() and hook the eBPF program to the SyS_epoll_pwait() kernel function to make it work on a wider range of glibc and kernel versions. Tested-by: Wang Nan <wangnan0@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-zynvquy63er8s5mrgsz65pto@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To follow standard practice in the kernel sources, documenting the initialization better and helping quickly finding the value for some field in a struct with many entries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-syn3hz9hz7ukxlxbx5x6hv20@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
When failing on one of the BPF tests we were just stating: BPF filter result incorrect Add some more info to help figuring out the problem: BPF filter result incorrect, expected 56, got 0 samples This came out while investigating this failure, first seen after updating the kernel to the 4.15.0-rc6 tag: [root@jouet ~]# perf test bpf 39: BPF filter : 39.1: Basic BPF filtering : FAILED! 39.2: BPF pinning : Skip 39.3: BPF prologue generation: Skip 39.4: BPF relocation checker : Skip [root@jouet ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-403npu7daupv6b2bmxliv5pk@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 06 Jan, 2018 3 commits
-
-
Ingo Molnar authored
Recent changes made a bit of an inconsistent mess out of arch/x86/events/msr.c, fix it: - re-align the initialization tables to be vertically aligned and readable again - harmonize comment style in terms of punctuation, capitalization and spelling - use curly braces for multi-condition branches - remove extra newlines - simplify the code a bit Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1515169132-3980-1-git-send-email-eranian@google.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Stephane Eranian authored
This patch adds support for the Digital Readout provided by the IA32_THERM_STATUS MSR (0x19C) on Intel X86 processors. The readout shows the number of degrees Celcius to the TCC (critical temperature) supported by the processor. Thus, the larger, the better. The perf_event support is provided via the msr PMU. The new logical event is called cpu_thermal_margin. It comes with a unit and snapshot files. The event shows the current temprature distance (margin). It is not an accumulating event. The unit is degrees C. The event is provided per logical CPU to make things simpler but it is the same for both hyper-threads sharing a physical core. $ perf stat -I 1000 -a -A -e msr/cpu_thermal_margin/ This will print the temperature for all logical CPUs. time CPU counts unit events 1.000123741 CPU0 38 C msr/cpu_thermal_margin/ 1.000161837 CPU1 37 C msr/cpu_thermal_margin/ 1.000187906 CPU2 36 C msr/cpu_thermal_margin/ 1.000189046 CPU3 39 C msr/cpu_thermal_margin/ 1.000283044 CPU4 40 C msr/cpu_thermal_margin/ 1.000344297 CPU5 40 C msr/cpu_thermal_margin/ 1.000365832 CPU6 39 C msr/cpu_thermal_margin/ ... In case the temperature margin cannot be read, the reported value would be -1. Works on all processors supporting the Digital Readout (dtherm in cpuinfo) Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1515169132-3980-1-git-send-email-eranian@google.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 05 Jan, 2018 22 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: "We have two more fixes for 4.15, both aimed for stable. The leak fix is obvious, the second patch fixes a bug revealed by the refcount API, when it behaves differently than previous atomic_t and reports refs going from 0 to 1 in one case" * tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes btrfs: Fix flush bio leak
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull XFS fixes from Darrick Wong: "I have just a few fixes for bugs and resource cleanup problems this week: - Fix resource cleanup of failed quota initialization - Fix integer overflow problems wrt s_maxbytes" * tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix s_maxbytes overflow problems xfs: quota: check result of register_shrinker() xfs: quota: fix missed destroy of qi_tree_lock
-
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds authored
Pull MFD fix from Lee Jones: "Late bugfix to plug a leak in rtsx_pcr" * tag 'mfd-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: rtsx: Release IRQ during shutdown
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull more x86 pti fixes from Thomas Gleixner: "Another small stash of fixes for fallout from the PTI work: - Fix the modules vs. KASAN breakage which was caused by making MODULES_END depend of the fixmap size. That was done when the cpu entry area moved into the fixmap, but now that we have a separate map space for that this is causing more issues than it solves. - Use the proper cache flush methods for the debugstore buffers as they are mapped/unmapped during runtime and not statically mapped at boot time like the rest of the cpu entry area. - Make the map layout of the cpu_entry_area consistent for 4 and 5 level paging and fix the KASLR vaddr_end wreckage. - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak nvidia gfx drivers by dropping the GPL export. The subject line of the commit tells it the other way around, but I noticed that too late. - Fix the ASM alternative macros so they can be used in the middle of an inline asm block. - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack vector is properly identified. The Spectre mitigations will come with their own bug bits later" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm x86/tlb: Drop the _GPL from the cpu_tlbstate export x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers x86/kaslr: Fix the vaddr_end mess x86/mm: Map cpu_entry_area at the same place on 4/5 level x86/mm: Set MODULES_END to 0xffffffffff000000
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI updates from Thomas Gleixner: - A fix for a add_efi_memmap parameter regression which ensures that the parameter is parsed before it is used. - Reinstate the virtual capsule mapping as the cached copy turned out to break Quark and other things - Remove Matt Fleming as EFI co-maintainer. He stepped back a few days ago. Thanks Matt for all your great work! * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove Matt Fleming as EFI co-maintainer efi/capsule-loader: Reinstate virtual capsule mapping x86/efi: Fix kernel param add_efi_memmap regression
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Martin Schwidefsky: "Four bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/dasd: fix wrongly assigned configuration data s390: fix preemption race in disable_sacf_uaccess s390/sclp: disable FORTIFY_SOURCE for early sclp code s390/pci: handle insufficient resources during dma tlb flush
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen fix from Juergen Gross: "One minor fix adjusting the kmalloc flags in the new pvcalls driver added in rc1" * tag 'for-linus-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: use GFP_ATOMIC under spin lock
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fixes from Herbert Xu: "This fixes the following issues: - racy use of ctx->rcvused in af_alg - algif_aead crash in chacha20poly1305 - freeing bogus pointer in pcrypt - build error on MIPS in mpi - memory leak in inside-secure - memory overwrite in inside-secure - NULL pointer dereference in inside-secure - state corruption in inside-secure - build error without CRYPTO_GF128MUL in chelsio - use after free in n2" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: inside-secure - do not use areq->result for partial results crypto: inside-secure - fix request allocations in invalidation path crypto: inside-secure - free requests even if their handling failed crypto: inside-secure - per request invalidation lib/mpi: Fix umul_ppmm() for MIPS64r6 crypto: pcrypt - fix freeing pcrypt instances crypto: n2 - cure use after free crypto: af_alg - Fix race around ctx->rcvused by making it atomic_t crypto: chacha20poly1305 - validate the digest size crypto: chelsio - select CRYPTO_GF128MUL
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "9 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: update Mark Yao's email address userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails mm/sparse.c: wrong allocation for mem_section mm/zsmalloc.c: include fs.h mm/debug.c: provide useful debugging information for VM_BUG kernel/exit.c: export abort() to modules mm/mprotect: add a cond_resched() inside change_pmd_range() kernel/acct.c: fix the acct->needcheck check in check_free_space() mm: check pfn_valid first in zero_resv_unavail
-
Thomas Gleixner authored
Use the name associated with the particular attack which needs page table isolation for mitigation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jiri Koshina <jikos@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Lutomirski <luto@amacapital.net> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Greg KH <gregkh@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
-
David Woodhouse authored
Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile. Fixes: 9cebed42 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: ak@linux.intel.com Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk
-
Sinan Kaya authored
'Commit cc27b735 ("PCI/portdrv: Turn off PCIe services during shutdown")' revealed a resource leak in rtsx_pci driver during shutdown. Issue shows up as a warning during shutdown as follows: remove_proc_entry: removing non-empty directory 'irq/17', leaking at least 'rtsx_pci' WARNING: CPU: 0 PID: 1578 at fs/proc/generic.c:572 remove_proc_entry+0x11d/0x130 Modules linked in <long list but none that are out-of-tree> ... Call Trace: unregister_irq_proc free_desc irq_free_descs mp_unmap_irq acpi_unregister_gsi_apic acpi_pci_irq_disable do_pci_disable_device pci_disable_device device_shutdown kernel_restart Sys_reboot Even though rtsx_pci driver implements a shutdown callback, it is not releasing the interrupt that it registered during probe. This is causing the ACPI layer to complain that the shared IRQ is in use while freeing IRQ. This code releases the IRQ to prevent resource leak and eliminate the warning. Fixes: cc27b735 ("PCI/portdrv: Turn off PCIe services during shutdown") Link: https://bugzilla.kernel.org/show_bug.cgi?id=198141Reported-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull drm fixes from Dave Airlie: "Just collecting some fixes to finish my hoildays :-). A few fixes for i915 (one documentation build fix), one ttm fix, one AMD display fix, one omapdrm fix, and a set of armada fixes from Russell. All seem pretty small, you can now return to your latest security news site" * tag 'drm-fixes-for-v4.15-rc7' of git://people.freedesktop.org/~airlied/linux: drm/i915: Apply Display WA #1183 on skl, kbl, and cfl drm/ttm: check the return value of kzalloc drm/amd/display: call set csc_default if enable adjustment is false docs: fix, intel_guc_loader.c has been moved to intel_guc_fw.c omapdrm/dss/hdmi4_cec: fix interrupt handling documentation/gpu/i915: fix docs build error after file rename drm/i915: Put all non-blocking modesets onto an ordered wq drm/i915: Disable DC states around GMBUS on GLK drm/i915/psr: Fix register name mess up. drm/armada: fix YUV planar format framebuffer offsets drm/armada: improve efficiency of armada_drm_plane_calc_addrs() drm/armada: fix UV swap code drm/armada: fix SRAM powerdown drm/armada: fix leak of crtc structure
-
Jeffy Chen authored
Change the previous employers email addresses to the current email address. Link: http://lkml.kernel.org/r/20171229121726.31589-1-jeffy.chen@rock-chips.comSigned-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Acked-by: Martin Kepplinger <martink@posteo.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrea Arcangeli authored
The previous fix in commit 384632e6 ("userfaultfd: non-cooperative: fix fork use after free") corrected the refcounting in case of UFFD_EVENT_FORK failure for the fork userfault paths. That still didn't clear the vma->vm_userfaultfd_ctx of the vmas that were set to point to the aborted new uffd ctx earlier in dup_userfaultfd. Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.comSigned-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Baoquan He authored
In commit 83e3c487 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to save memory. It allocates the first dimension of array with sizeof(struct mem_section). It costs extra memory, should be sizeof(struct mem_section *). Fix it. Link: http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com Fixes: 83e3c487 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Signed-off-by: Baoquan He <bhe@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sergey Senozhatsky authored
`struct file_system_type' and alloc_anon_inode() function are defined in fs.h, include it directly. Link: http://lkml.kernel.org/r/20171219104219.3017-1-sergey.senozhatsky@gmail.comSigned-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
With the recent addition of hashed kernel pointers, places which need to produce useful debug output have to specify %px, not %p. This patch fixes all the VM debug to use %px. This is appropriate because it's debug output that the user should never be able to trigger, and kernel developers need to see the actual pointers. Link: http://lkml.kernel.org/r/20171219133236.GE13680@bombadil.infradead.orgSigned-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Tobin C. Harding" <me@tobin.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
gcc -fisolate-erroneous-paths-dereference can generate calls to abort() from modular code too. [arnd@arndb.de: drop duplicate exports of abort()] Link: http://lkml.kernel.org/r/20180102103311.706364-1-arnd@arndb.deReported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
While testing on a large CPU system, detected the following RCU stall many times over the span of the workload. This problem is solved by adding a cond_resched() in the change_pmd_range() function. INFO: rcu_sched detected stalls on CPUs/tasks: 154-....: (670 ticks this GP) idle=022/140000000000000/0 softirq=2825/2825 fqs=612 (detected by 955, t=6002 jiffies, g=4486, c=4485, q=90864) Sending NMI from CPU 955 to CPUs 154: NMI backtrace for cpu 154 CPU: 154 PID: 147071 Comm: workload Not tainted 4.15.0-rc3+ #3 NIP: c0000000000b3f64 LR: c0000000000b33d4 CTR: 000000000000aa18 REGS: 00000000a4b0fb44 TRAP: 0501 Not tainted (4.15.0-rc3+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22422082 XER: 00000000 CFAR: 00000000006cf8f0 SOFTE: 1 GPR00: 0010000000000000 c00003ef9b1cb8c0 c0000000010cc600 0000000000000000 GPR04: 8e0000018c32b200 40017b3858fd6e00 8e0000018c32b208 40017b3858fd6e00 GPR08: 8e0000018c32b210 40017b3858fd6e00 8e0000018c32b218 40017b3858fd6e00 GPR12: ffffffffffffffff c00000000fb25100 NIP [c0000000000b3f64] plpar_hcall9+0x44/0x7c LR [c0000000000b33d4] pSeries_lpar_flush_hash_range+0x384/0x420 Call Trace: flush_hash_range+0x48/0x100 __flush_tlb_pending+0x44/0xd0 hpte_need_flush+0x408/0x470 change_protection_range+0xaac/0xf10 change_prot_numa+0x30/0xb0 task_numa_work+0x2d0/0x3e0 task_work_run+0x130/0x190 do_notify_resume+0x118/0x120 ret_from_except_lite+0x70/0x74 Instruction dump: 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008 Link: http://lkml.kernel.org/r/20171214140551.5794-1-khandual@linux.vnet.ibm.comSigned-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check is very wrong, we need time_is_after_jiffies() to make sys_acct() work. Ignoring the overflows, the code should "goto out" if needcheck > jiffies, while currently it checks "needcheck < jiffies" and thus in the likely case check_free_space() does nothing until jiffies overflow. In particular this means that sys_acct() is simply broken, acct_on() sets acct->needcheck = jiffies and expects that check_free_space() should set acct->active = 1 after the free-space check, but this won't happen if jiffies increments in between. This was broken by commit 32dc7308 ("get rid of timer in kern/acct.c") in 2011, then another (correct) commit 795a2f22 ("acct() should honour the limits from the very beginning") made the problem more visible. Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com Fixes: 32dc7308 ("get rid of timer in kern/acct.c") Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp> Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dave Young authored
With latest kernel I get below bug while testing kdump: BUG: unable to handle kernel paging request at ffffea00034b1040 IP: zero_resv_unavail+0xbd/0x126 PGD 37b98067 P4D 37b98067 PUD 37b97067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc1+ #316 Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET92WW (2.42 ) 03/03/2017 task: ffffffff81a0e4c0 task.stack: ffffffff81a00000 RIP: 0010:zero_resv_unavail+0xbd/0x126 RSP: 0000:ffffffff81a03d88 EFLAGS: 00010006 RAX: 0000000000000000 RBX: ffffea00034b1040 RCX: 0000000000000010 RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffffea00034b1040 RBP: 00000000000d2c41 R08: 00000000000000c0 R09: 0000000000000a0d R10: 0000000000000002 R11: 0000000000007f01 R12: ffffffff81a03d90 R13: ffffea0000000000 R14: 0000000000000063 R15: 0000000000000062 FS: 0000000000000000(0000) GS:ffffffff81c73000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea00034b1040 CR3: 0000000037609000 CR4: 00000000000606b0 Call Trace: ? free_area_init_nodes+0x640/0x664 ? zone_sizes_init+0x58/0x72 ? setup_arch+0xb50/0xc6c ? start_kernel+0x64/0x43d ? secondary_startup_64+0xa5/0xb0 Code: c1 e8 0c 48 39 d8 76 27 48 89 de 48 c1 e3 06 48 c7 c7 7a 87 79 81 e8 b0 c0 3e ff 4c 01 eb b9 10 00 00 00 31 c0 48 89 df 49 ff c6 <f3> ab eb bc 6a 00 49 c7 c0 f0 93 d1 81 31 d2 83 ce ff 41 54 49 RIP: zero_resv_unavail+0xbd/0x126 RSP: ffffffff81a03d88 CR2: ffffea00034b1040 ---[ end trace f5ba9e8f73c7ee26 ]--- This is introduced by commit a4a3ede2 ("mm: zero reserved and unavailable struct pages"). The reason is some efi reserved boot ranges is not reported in E820 ram. In my case it is a bgrt buffer: efi: mem00: [Boot Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x00000000d2c41000-0x00000000d2c85fff] (0MB) Use "add_efi_memmap" can workaround the problem with another fix: http://lkml.kernel.org/r/20171130052327.GA3500@dhcp-128-65.nay.redhat.com In zero_resv_unavail it would be better to check pfn_valid first before zero the page struct. This fixes the problem and potential other similar problems. Also as Pavel Tatashin suggested checks pfn_valid at the beginning of the section. The range is backed by real memory. The memory range is efi "Boot Service Data", that means after ExitBootServices() these ranges can be used as system ram. But some of them need to be reserved, for example the bgrt image address in an acpi table, if the image memory is freed then kexec reboot will fail because kexec inherit same acpi table to initialize the driver. Link: http://lkml.kernel.org/r/20171201095048.GA3084@dhcp-128-65.nay.redhat.com Fixes: a4a3ede2 ("mm: zero reserved and unavailable struct pages") Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-