- 07 Aug, 2013 13 commits
-
Arnaldo Carvalho de Melo authored
It is an errno, so print an error string.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-zt68gijvvoe8gd7kmclo43si@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
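A minimal C sketch of the pattern. The helper `report_open_error()` is hypothetical and is not perf's code: an errno value is passed through `strerror()` so the user sees a description rather than a bare number.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn an errno value into a readable message
 * instead of printing the raw number. */
static int report_open_error(char *buf, size_t len, const char *what, int err)
{
    /* strerror() maps the errno value to its description,
     * e.g. ENOENT -> "No such file or directory" with glibc. */
    return snprintf(buf, len, "failed to open %s: %s", what, strerror(err));
}
```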
-
David Ahern authored
On Fedora 18 with gcc 4.6.4, the compile fails with:
    arch/x86/util/tsc.c: In function ‘perf_time_to_tsc’:
    arch/x86/util/tsc.c:13:6: error: declaration of ‘time’ shadows a global declaration [-Werror=shadow]
    cc1: all warnings being treated as errors
    make: *** [/tmp/junk/arch/x86/util/tsc.o] Error 1
    make: *** Waiting for unfinished jobs....
Fix by renaming the local variable.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1374848843-43127-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
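For context, here is a sketch of the perf-time-to-TSC conversion with the local renamed from `time` to `t`. The `struct tsc_conv` type is a hypothetical stand-in for the `time_shift`, `time_mult` and `time_zero` fields of `struct perf_event_mmap_page`. The arithmetic follows the formula documented in `<linux/perf_event.h>`. Older GCC releases, such as the 4.6 above, report a local named `time` as shadowing the `time()` function declared in `<time.h>`, and `-Werror=shadow` turns that report into a build failure.

```c
#include <stdint.h>
#include <time.h>   /* declares the global function time() */

/* Hypothetical stand-in for the conversion fields of perf_event_mmap_page. */
struct tsc_conv {
    uint16_t time_shift;
    uint32_t time_mult;
    uint64_t time_zero;
};

/* Perf timestamp (ns) -> TSC. The local is 't', not 'time', so it does not
 * shadow time() from <time.h>. */
static uint64_t perf_time_to_tsc(uint64_t ns, const struct tsc_conv *tc)
{
    uint64_t t, quot, rem;

    t = ns - tc->time_zero;
    /* Split into quotient and remainder so the shift cannot overflow. */
    quot = t / tc->time_mult;
    rem  = t % tc->time_mult;
    return (quot << tc->time_shift) + (rem << tc->time_shift) / tc->time_mult;
}
```

With `time_mult = 1024` and `time_shift = 10`, one nanosecond equals one cycle, so a timestamp of 6000 with `time_zero = 1000` maps to 5000 cycles.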
-
David Ahern authored
Symbol offset is one of the fields that can be requested in perf-script. Currently you do not get that data when requested, e.g.:
    perf script -f comm,tid,pid,time,cpu,sym,symoff,ip
    ...
    gcc  6201/6201  [006] 762250.617897:
            ffffffff81090d95 update_curr
            ffffffff810911b8 dequeue_entity
            ffffffff81091825 dequeue_task_fair
            ffffffff81087163 dequeue_task
            ffffffff81087c03 deactivate_task
    ...
With this patch you get the offset:
    ...
    gcc  6201/6201  [006] 762250.617897:
            ffffffff81090d95 update_curr+0x1c5
            ffffffff810911b8 dequeue_entity+0x28
            ffffffff81091825 dequeue_task_fair+0x45
            ffffffff81087163 dequeue_task+0x93
            ffffffff81087c03 deactivate_task+0x23
    ...
Signed-off-by: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1375024474-45726-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
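The sketch below shows how the `symoff` form is derived. It assumes a hypothetical `struct sym` that holds a symbol's name and start address. The offset is the sample IP minus the symbol's start address.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal symbol record: name and start address. */
struct sym {
    const char *name;
    uint64_t start;
};

/* Format "name+0xOFF", where OFF is the distance of ip from the symbol start. */
static int format_sym_offset(char *buf, size_t len, const struct sym *s, uint64_t ip)
{
    return snprintf(buf, len, "%s+0x%" PRIx64, s->name, ip - s->start);
}
```

In the test, the start address of `update_curr` is back-computed from the output above (0xffffffff81090d95 - 0x1c5 = 0xffffffff81090bd0).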
-
Jiri Olsa authored
Adding 2 more tests to the automated parse-events suite for the following event configs:
    '{cycles,cache-misses,branch-misses}:S'
    '{instructions,branch-misses}:Su'
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-tmcy0ir7i8id2t54qg5ifbio@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding a test to validate perf_event_attr data for the command:
    record -e '{cycles,cache-misses}:S'
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-9eppxvhkly6gse5ximudckrp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding the 'S' event/group modifier to specify that the event value(s) are read by PERF_SAMPLE_READ sample type processing, instead of the period value offered by lower layers.
The 'S' modifier also changes behaviour when it is specified on an event group. Currently, all the events within a group make samples. If the user now specifies 'S' within the group modifier, only the leader triggers samples, and the rest of the events in the group have sampling disabled. As for single events, the values of all events within the group (including the leader) are read by PERF_SAMPLE_READ sample type processing.
The following example creates an event group with cycles and cache-misses events. It sets cycles as the group leader and as the only event that actually samples. The period values of both cycles and cache-misses are read by PERF_SAMPLE_READ sample type processing with the PERF_FORMAT_GROUP read format.
Example:
    $ perf record -e '{cycles,cache-misses}:S' ls
    ...
    $ perf report --group --show-total-period --stdio
    ...
    # Samples: 36  of event 'anon group { cycles, cache-misses }'
    # Event count (approx.): 12585593
    #
    #       Overhead          Period  Command      Shared Object                      Symbol
    # ..............  ..............  .......  .................  ..........................
    #
        19.92%   1.20%   2505936   31       ls  [kernel.kallsyms]  [k] mark_held_locks
        13.74%   0.47%   1729327   12       ls  [kernel.kallsyms]  [k] sched_clock_local
        13.64%  23.72%   1716147  612       ls  ld-2.14.90.so      [.] check_match.10805
        13.12%  23.22%   1650778  599       ls  libc-2.14.90.so    [.] _nl_intern_locale_data
        11.24%  29.19%   1414554  753       ls  [kernel.kallsyms]  [k] sched_clock_cpu
         8.50%   0.35%   1070150    9       ls  [kernel.kallsyms]  [k] check_chain_key
    ...
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-iyoinu3axi11mymwnh2b7fxj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
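The sketch below shows the kind of `perf_event_attr` settings this behaviour corresponds to. It uses the real `<linux/perf_event.h>` definitions, but the helper `setup_leader_sampling()` is hypothetical and the period is arbitrary. perf's own configuration code sets more fields than shown, and opening the member with the leader's fd as `group_fd` is not shown. Only the leader gets a sampling period. Each of its samples carries the whole group's counts via `PERF_SAMPLE_READ` with `PERF_FORMAT_GROUP | PERF_FORMAT_ID`.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: attrs roughly matching '{cycles,cache-misses}:S'. */
static void setup_leader_sampling(struct perf_event_attr *leader,
                                  struct perf_event_attr *member,
                                  unsigned long long period)
{
    memset(leader, 0, sizeof(*leader));

    /* Group leader: the only event that generates samples. */
    leader->size = sizeof(*leader);
    leader->type = PERF_TYPE_HARDWARE;
    leader->config = PERF_COUNT_HW_CPU_CYCLES;
    leader->sample_period = period;
    /* Each sample also carries the counter values of the group ... */
    leader->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_READ;
    /* ... in group format, with an ID per value. */
    leader->read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;

    /* Member: same read setup, but it only counts (no sampling period). */
    *member = *leader;
    member->config = PERF_COUNT_HW_CACHE_MISSES;
    member->sample_period = 0;
}
```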
-
Jiri Olsa authored
For samples with the PERF_SAMPLE_READ sample type, the period value is stored in 'struct sample_read'. If the read format also has PERF_FORMAT_GROUP, 'struct sample_read' contains period values for all events in the group that the sample's event leads.
We deliver a separate sample for each of the values contained within 'struct sample_read'.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-6mdm5xkrm6kypouh1c33cyys@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
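The sketch below illustrates the delivery step. It assumes that the values in `struct sample_read` are cumulative counter totals. Under that assumption, each delivered sample's period is the delta against the value last stored for the same event ID. The types `struct read_value` and `struct sample_id` and the function `group_sample_periods()` are hypothetical simplifications. Each period in the output array stands in for one separately delivered sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirrors of perf's structures. */
struct read_value {
    uint64_t value;   /* cumulative count read for one group member */
    uint64_t id;      /* event ID identifying that member */
};

struct sample_id {
    uint64_t id;
    uint64_t period;  /* last cumulative value seen for this ID */
};

static struct sample_id *find_sid(struct sample_id *sids, size_t n, uint64_t id)
{
    for (size_t i = 0; i < n; i++)
        if (sids[i].id == id)
            return &sids[i];
    return NULL;
}

/* Split one group sample into one period per value: the delta from the value
 * stored for the same ID, which is then updated. Returns the number of
 * periods written; values with unknown IDs are skipped. */
static size_t group_sample_periods(const struct read_value *vals, size_t nr,
                                   struct sample_id *sids, size_t nsids,
                                   uint64_t *periods)
{
    size_t out = 0;

    for (size_t i = 0; i < nr; i++) {
        struct sample_id *sid = find_sid(sids, nsids, vals[i].id);

        if (!sid)
            continue;
        periods[out++] = vals[i].value - sid->period;
        sid->period = vals[i].value;
    }
    return out;
}
```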
-
Jiri Olsa authored
This will be helpful for PERF_FORMAT_GROUP samples, where we need to store the ID-related period value for each event.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-twmlgsbyim97p7cyohjwb1df@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
We need to fail the event ID retrieval when both of the following conditions are true:
  - we are on a kernel without PERF_EVENT_IOC_ID support
  - the PERF_FORMAT_GROUP read format is set
The PERF_FORMAT_GROUP read format bit is the killer for retrieving the event ID out of the read syscall. There is no guarantee of where the event is placed within the leader's sibling list in the kernel.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-e93pgyj20rqx48qzw10vj4r4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding support to parse out the PERF_SAMPLE_READ sample bits. The code handles both the single and the group format specifications. It parses PERF_SAMPLE_READ data and prepares it in the perf_sample struct. It will be used by the group leader sampling feature coming shortly.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-0tgdoln5rwk3wocshb442cl3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
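The sketch below parses both layouts, following the `struct read_format` description in `<linux/perf_event.h>`. Newer kernels add an optional lost-sample count (`PERF_FORMAT_LOST`), which this sketch ignores. `struct parsed_read`, `MAX_GROUP` and `parse_sample_read()` are hypothetical names, not perf's.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/perf_event.h>

#define MAX_GROUP 8   /* hypothetical cap on group size for this sketch */

struct parsed_read {
    uint64_t time_enabled, time_running;
    uint64_t nr;                          /* 1 when not a group read */
    struct { uint64_t value, id; } v[MAX_GROUP];
};

/* Single: value, [time_enabled], [time_running], [id]
 * Group:  nr, [time_enabled], [time_running], { value, [id] } * nr
 * Returns the number of u64 words consumed, or -1 if the input is invalid. */
static long parse_sample_read(const uint64_t *p, size_t words,
                              uint64_t read_format, struct parsed_read *out)
{
    size_t i = 0;
    int group = (read_format & PERF_FORMAT_GROUP) != 0;

#define TAKE(dst) do { if (i >= words) return -1; (dst) = p[i++]; } while (0)
    out->time_enabled = out->time_running = 0;
    if (group) {
        TAKE(out->nr);
        if (out->nr > MAX_GROUP)
            return -1;
    } else {
        out->nr = 1;
        TAKE(out->v[0].value);            /* single format: value comes first */
    }
    if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
        TAKE(out->time_enabled);
    if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
        TAKE(out->time_running);
    for (uint64_t k = 0; k < out->nr; k++) {
        if (group)
            TAKE(out->v[k].value);
        out->v[k].id = 0;
        if (read_format & PERF_FORMAT_ID)
            TAKE(out->v[k].id);
    }
#undef TAKE
    return (long)i;
}
```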
-
Jiri Olsa authored
Changing the way we retrieve the event ID. Instead of parsing the ID out of the read data, use the PERF_EVENT_IOC_ID ioctl. The old way is kept in place to support kernels without PERF_EVENT_IOC_ID ioctl support.
This will be useful for retrieving the event ID of events with the PERF_FORMAT_GROUP read format set, where it's impossible to get the correct event ID out of the read call data.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-psgb4n7kte8e6tfenbe7nj2h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
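The sketch below shows that order of attempts. It tries `PERF_EVENT_IOC_ID` first and falls back to the legacy read-based path only when both of these hold:
  - the kernel answers `ENOTTY`, meaning it does not know the ioctl
  - the read format is not `PERF_FORMAT_GROUP`
The function names are hypothetical. The ioctl and the `read_format` bits come from `<linux/perf_event.h>`. The 4-word buffer is enough for the single format: value, two times and the ID.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/perf_event.h>

/* Legacy path: pick the ID out of single-format read() data,
 * laid out as value, [time_enabled], [time_running], id. */
static int id_from_read_data(const uint64_t data[4], uint64_t read_format,
                             uint64_t *id)
{
    int idx = 1;                      /* data[0] is the counter value */

    if (!(read_format & PERF_FORMAT_ID))
        return -1;                    /* the read data carries no ID */
    if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
        idx++;
    if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
        idx++;
    *id = data[idx];
    return 0;
}

/* Prefer the ioctl; fall back to read() only on kernels lacking it. */
static int event_id_from_fd(int fd, uint64_t read_format, uint64_t *id)
{
    uint64_t data[4] = { 0 };

    if (ioctl(fd, PERF_EVENT_IOC_ID, id) == 0)
        return 0;
    if (errno != ENOTTY)
        return -1;
    /* Group reads return the whole group's data; there is no reliable way
     * to tell which entry belongs to this fd. */
    if (read_format & PERF_FORMAT_GROUP)
        return -1;
    if (read(fd, data, sizeof(data)) < 0)
        return -1;
    return id_from_read_data(data, read_format, id);
}
```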
-
Jiri Olsa authored
Some of the counters in a group may be disabled when the group's sampling member reads the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fix that by reading only enabled counters for PERF_SAMPLE_READ sample type processing.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-wwkjb0bbcuslnz0klrmqi26r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
The only way to get the event ID is to read the event fd and then parse the ID value out of the returned data. This is OK for the read format currently used by the perf tool, but not when we use the PERF_FORMAT_GROUP format. With that format, the data is returned for the whole group. A non-leader event has no way to find out which ID belongs to its own fd.
Adding a simple ioctl that returns the event's primary ID for a given fd.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-v1bn5cto707jn0bon34afqr1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 30 Jul, 2013 8 commits
-
Frederic Weisbecker authored
A perf event can be used without forcing the tick to stay alive if it uses a sample period rather than a frequency and if it doesn't throttle (raise a storm of events).
The lockup detector does not use a perf event frequency, and its high period means it should never throttle. It can therefore now run concurrently with the full dynticks feature. So remove the hack that disabled the watchdog.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anish Singh <anish198519851985@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Currently the full dynticks subsystem keeps the tick alive as long as any perf events are running. So the tick cannot be stopped while features such as the lockup detector are running. As a temporary fix, the lockup detector is disabled by default when full dynticks is built, but this is not a viable long-term solution.
To fix this, only keep the tick alive in two cases:
  - an event configured with a frequency rather than a period is running on the CPU
  - an event throttles on the CPU
These are the only purposes of the perf tick, especially now that the rotation of flexible events is handled from a separate hrtimer. The tick can be shut down the rest of the time.
Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-8-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
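The sketch below is a simplified model of the new rule. The names are hypothetical, and the kernel's actual counters, their per-CPU handling and their atomics are not reproduced. The tick must stay alive only while frequency-based events exist or while events have been throttled. In either case, the tick has work to do: re-tuning periods or unthrottling.

```c
#include <stdbool.h>

/* Hypothetical model of the state the decision depends on. */
struct perf_tick_state {
    int nr_freq_events;     /* events configured with a frequency */
    int throttled_count;    /* events currently throttled */
};

/* The tick may stop only when neither reason to keep it exists. */
static bool perf_can_stop_tick(const struct perf_tick_state *s)
{
    return s->nr_freq_events == 0 && s->throttled_count == 0;
}
```

Under this model, a period-based event such as the lockup detector's contributes to neither counter, so it no longer pins the tick.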
-
Frederic Weisbecker authored
This is going to be used by the full dynticks subsystem as finer-grained information to know when to keep and when to stop the tick.
Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-7-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
When an event is migrated, move the event's per-cpu accounting accordingly so that branch stack and cgroup events work correctly on the new CPU.
Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
This way we can use the per-cpu handling separately. This is going to be used to fix the event migration code accounting.
Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Gather all the event accounting code into a single place, once all the prerequisites are completed. This simplifies the refcounting.
Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
In case of allocation failure, get_callchain_buffers() keeps the refcount incremented for the current event. As a result, when get_callchain_buffers() returns an error, we must clean up what it did by cancelling its last refcount with a call to put_callchain_buffers().
This is a hack to allow free_event() to be called after that failure. Its original purpose was to simplify the failure path. But this error handling is counter-intuitive, ugly and hard to follow. One expects the callee, not the caller, to clean up the resources it used to perform a service when it fails.
So let's clean this up by cancelling the refcount from get_callchain_buffers() in case of failure, and correctly free the event accordingly in perf_event_alloc().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
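The sketch below is a single-threaded, user-space model of the cleanup rule. The names `get_buffers()`, `put_buffers()` and the `fail_next_alloc` test hook are hypothetical, and the kernel version also takes a mutex. The function that takes the reference also drops it when it fails, so the caller never has to undo a failed call.

```c
#include <stdlib.h>

static int nr_users;          /* references to the shared buffers */
static void *buffers;         /* allocated by the first user */
static int fail_next_alloc;   /* test hook: force the next allocation to fail */

static int get_buffers(size_t size)
{
    int err = 0;

    if (++nr_users > 1)
        return 0;             /* already set up by an earlier user */

    if (fail_next_alloc) {
        fail_next_alloc = 0;
        buffers = NULL;
    } else {
        buffers = malloc(size);
    }
    if (!buffers)
        err = -1;

    if (err)
        nr_users--;           /* the callee undoes its own reference */
    return err;
}

static void put_buffers(void)
{
    if (--nr_users == 0) {
        free(buffers);
        buffers = NULL;
    }
}
```

After a failed `get_buffers()` call, the caller can go straight to its generic teardown without a compensating `put_buffers()`.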
-
Frederic Weisbecker authored
On callchain buffer allocation failure, free_event() is called and all the accounting performed in perf_event_alloc() for that event is cancelled. But if the event has branch stack sampling, it is also unaccounted from the branch stack sampling events refcount. This is a bug because this accounting is performed after the callchain buffer allocation. As a result, the branch stack sampling events refcount can become negative.
To fix this, move the branch stack event accounting before the callchain buffer allocation.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 23 Jul, 2013 9 commits
-
Peter Zijlstra authored
Smart wake-affine currently uses the node size as its factor, but the overhead of the mask operation is high.
This patch therefore introduces the 'sd_llc_size' per-cpu variable, which records the size of the highest cache-sharing domain, and makes it the new factor. This reduces the overhead and makes the factor more reasonable.
Tested-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/51D5008E.6030102@linux.vnet.ibm.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Michael Wang authored
The wake-affine scheduler feature currently always tries to pull the wakee close to the waker. In theory this should be beneficial if the waker's CPU caches hot data for the wakee. It is also beneficial in the extreme ping-pong, high context-switch-rate case. Testing shows it can benefit hackbench by up to 15%.
However, the feature is somewhat blind, and some workloads, such as pgbench, suffer from it. It is also algorithmically time-consuming. Testing shows it can hurt pgbench by up to 50%, far more than the benefit it brings in the best case.
So wake-affine should be smarter: it should realize when to stop its thankless effort at trying to find a suitable CPU to wake on.
This patch introduces 'wakee_flips', which is incremented each time the task flips (switches) its wakee target. A high 'wakee_flips' value means the task has more than one wakee, and the bigger the number, the higher the wakeup frequency.
When deciding whether to pull, pay attention to a wakee with a high 'wakee_flips'. Pulling such a task may benefit the wakee, but it also implies that the waker will face fierce competition later. That competition could be very harsh or very brief depending on the story behind 'wakee_flips', and the waker therefore suffers.
Furthermore, a high 'wakee_flips' on the waker implies that multiple tasks rely on it. The waker's higher latency will then hurt all of them, so pulling the wakee seems to be a bad deal.
Thus, the higher 'waker->wakee_flips / wakee->wakee_flips' grows, the worse the cost of pulling becomes.
The patch therefore helps the wake-affine feature to stop its pulling work when:
    wakee->wakee_flips > factor &&
    waker->wakee_flips > (factor * wakee->wakee_flips)
The 'factor' here is the number of CPUs in the current CPU's NUMA node, so a bigger node will lead to more pulling, since the trial becomes more severe.
After applying the patch, pgbench shows up to 40% improvements and no regressions.
Tested with a 12-CPU x86 server and tip 3.10.0-rc7. The percentages in the final column highlight the areas with the biggest wins; all other areas improved as well:
    pgbench                    base           smart
    | db_size | clients |  tps  |        |  tps  |
    +---------+---------+-------+        +-------+
    | 22 MB   |       1 | 10598 |        | 10796 |
    | 22 MB   |       2 | 21257 |        | 21336 |
    | 22 MB   |       4 | 41386 |        | 41622 |
    | 22 MB   |       8 | 51253 |        | 57932 |
    | 22 MB   |      12 | 48570 |        | 54000 |
    | 22 MB   |      16 | 46748 |        | 55982 | +19.75%
    | 22 MB   |      24 | 44346 |        | 55847 | +25.93%
    | 22 MB   |      32 | 43460 |        | 54614 | +25.66%
    | 7484 MB |       1 |  8951 |        |  9193 |
    | 7484 MB |       2 | 19233 |        | 19240 |
    | 7484 MB |       4 | 37239 |        | 37302 |
    | 7484 MB |       8 | 46087 |        | 50018 |
    | 7484 MB |      12 | 42054 |        | 48763 |
    | 7484 MB |      16 | 40765 |        | 51633 | +26.66%
    | 7484 MB |      24 | 37651 |        | 52377 | +39.11%
    | 7484 MB |      32 | 37056 |        | 51108 | +37.92%
    | 15 GB   |       1 |  8845 |        |  9104 |
    | 15 GB   |       2 | 19094 |        | 19162 |
    | 15 GB   |       4 | 36979 |        | 36983 |
    | 15 GB   |       8 | 46087 |        | 49977 |
    | 15 GB   |      12 | 41901 |        | 48591 |
    | 15 GB   |      16 | 40147 |        | 50651 | +26.16%
    | 15 GB   |      24 | 37250 |        | 52365 | +40.58%
    | 15 GB   |      32 | 36470 |        | 50015 | +37.14%
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51D50057.9000809@linux.vnet.ibm.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
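The sketch below is a simplified model of the heuristic as stated in this changelog. The function names echo the scheduler's, but this is not the kernel code. The periodic decay of the flip count and all locking are left out. `factor` stands for the CPU count used as the scaling factor: the NUMA node size in this patch, later replaced by 'sd_llc_size'.

```c
#include <stdbool.h>

/* Hypothetical task model: just what the heuristic needs. */
struct task {
    const struct task *last_wakee;
    unsigned int wakee_flips;   /* how often this task switched wakee */
};

/* Called when 'waker' wakes 'wakee': count a flip when the target changes. */
static void record_wakee(struct task *waker, const struct task *wakee)
{
    if (waker->last_wakee != wakee) {
        waker->last_wakee = wakee;
        waker->wakee_flips++;
    }
}

/* True when wake-affine should stop pulling: the wakee is "hot" (more than
 * 'factor' flips) and the waker flips more than factor times as often. */
static bool wake_wide(const struct task *waker, const struct task *wakee,
                      unsigned int factor)
{
    return wakee->wakee_flips > factor &&
           waker->wakee_flips > factor * wakee->wakee_flips;
}
```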
-
Vladimir Davydov authored
The problem with update_h_load(), which computes the hierarchical load factor for task groups, is that it is called for every task group in the system before every load balancer run. Rebalance can be triggered very often, so this function can consume a lot of CPU time when there are many cpu cgroups in the system.
Commit a35b6466 ('sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies') improved the situation significantly. The problem can still arise under some kinds of load, e.g. when CPUs switch from idle to busy and back very frequently.
For instance, when I start 1000 processes that wake up every millisecond on my 8-CPU host, 'top' and 'perf top' show:
    Cpu(s): 17.8%us, 24.3%sy,  0.0%ni, 57.9%id,  0.0%wa,  0.0%hi,  0.0%si
    Events: 243K cycles
      7.57%  [kernel]      [k] __schedule
      7.08%  [kernel]      [k] timerqueue_add
      6.13%  libc-2.12.so  [.] usleep
If I then create 10000 *idle* cpu cgroups (no processes in them), CPU usage increases significantly, although the 'wakers' are still executing in the root cpu cgroup:
    Cpu(s): 19.1%us, 48.7%sy,  0.0%ni, 31.6%id,  0.0%wa,  0.0%hi,  0.7%si
    Events: 230K cycles
     24.56%  [kernel]      [k] tg_load_down
      5.76%  [kernel]      [k] __schedule
This particular kind of load triggers 'new idle' rebalance very frequently. Each rebalance calls update_h_load(), which in turn calls tg_load_down() for every *idle* cpu cgroup. This is useless, because idle cpu cgroups have no tasks to pull.
This patch improves the situation by computing h_load only when it is really needed. It replaces update_h_load() with update_cfs_rq_h_load(), which computes h_load only for a given cfs_rq and all its ancestors. The load balancer calls this function whenever it considers whether a task should be pulled, i.e. the h_load calculation moves directly into task_h_load().
Several tasks in the same cgroup may be considered during the same balance run. To avoid updating the h_load of the same cfs_rq multiple times, the patch records the time of the last h_load update for each cfs_rq and stops the calculation when it finds h_load to be up to date. As a result, h_load is computed only for the cfs_rqs that really need it; in particular, all idle task groups are skipped.
This moves the h_load calculation under the rq lock, but it should not affect latency much. The amount of work done under the rq lock while trying to pull tasks is limited by sched_nr_migrate.
With the patch applied and the setup described above (1000 wakers in the root cgroup and 10000 idle cgroups), I get:
    Cpu(s): 16.9%us, 24.8%sy,  0.0%ni, 58.4%id,  0.0%wa,  0.0%hi,  0.0%si
    Events: 242K cycles
      7.57%  [kernel]      [k] __schedule
      6.70%  [kernel]      [k] timerqueue_add
      5.93%  libc-2.12.so  [.] usleep
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1373896159-1278-1-git-send-email-vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
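The sketch below is a simplified, recursive model of the on-demand calculation, using hypothetical types. A group's h_load is its parent's h_load scaled by the group's share of the parent's load. Each result is cached with the timestamp of the balance run that produced it. The kernel walks the hierarchy iteratively and uses its own load-tracking fields. This sketch only shows why idle groups are never visited and why repeated queries in one run are cheap.

```c
#include <stddef.h>

/* Hypothetical group model. */
struct group {
    struct group *parent;
    unsigned long load;         /* total runnable load queued in this group */
    unsigned long contrib;      /* this group's contribution to parent->load */
    unsigned long h_load;
    unsigned long last_update;  /* balance-run timestamp of the cached h_load */
};

static unsigned long nr_computed;   /* instrumentation for the example */

/* Compute h_load for 'g' and, as needed, its ancestors; stop at the first
 * one that is already up to date for 'now'. Unqueried groups stay untouched. */
static unsigned long group_h_load(struct group *g, unsigned long now)
{
    if (g->last_update == now)
        return g->h_load;
    if (!g->parent)
        g->h_load = g->load;            /* root: its own load */
    else
        g->h_load = group_h_load(g->parent, now) * g->contrib /
                    (g->parent->load + 1);  /* +1 avoids division by zero */
    g->last_update = now;
    nr_computed++;
    return g->h_load;
}
```

Timestamps start at 1 in the test because 0 marks "never computed".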
-
Adrian Hunter authored
The test uses the newly added cap_usr_time_zero and time_zero fields of perf_event_mmap_page. TSC from rdtsc is compared with the times from 2 perf events. The test passes if the calculated times are all in the correct order.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1372425741-1676-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Adrian Hunter authored
For modern CPUs, perf clock is directly related to TSC. TSC can be calculated from perf clock and vice versa using a simple calculation. Two of the three components of that calculation are already exported in struct perf_event_mmap_page. This patch exports the third.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
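The sketch below shows the TSC-to-perf-time direction that exporting `time_zero` makes possible. It follows the formula documented alongside the `time_zero` field in `<linux/perf_event.h>`. `struct tsc_conv` is a hypothetical stand-in for the three `perf_event_mmap_page` fields.

```c
#include <stdint.h>

/* Hypothetical stand-in for time_shift, time_mult and time_zero. */
struct tsc_conv {
    uint16_t time_shift;
    uint32_t time_mult;
    uint64_t time_zero;
};

/* TSC cycles -> perf clock (ns). Splitting cyc into the part above and
 * below time_shift keeps the intermediate products within 64 bits. */
static uint64_t tsc_to_perf_time(uint64_t cyc, const struct tsc_conv *tc)
{
    uint64_t quot = cyc >> tc->time_shift;
    uint64_t rem  = cyc & (((uint64_t)1 << tc->time_shift) - 1);

    return tc->time_zero + quot * tc->time_mult +
           ((rem * tc->time_mult) >> tc->time_shift);
}
```

With `time_mult = 512` and `time_shift = 10`, one cycle is half a nanosecond, so 6000 cycles map to 1000 + 3000 = 4000 ns when `time_zero = 1000`.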
-
Adrian Hunter authored
The capabilities bits must not be "union'ed" together. Put them in a separate struct.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372425741-1676-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
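The commit does not spell out the failure mode. The sketch below, with hypothetical field names, illustrates why bit-fields declared directly as union members alias one another. Bit-field layout is implementation-defined, and the comments describe what GCC and Clang do on common little-endian targets.

```c
#include <stdint.h>

/* Broken: every bit-field is its own union member, so all of them start at
 * bit 0 and overlay the same bit. */
union caps_broken {
    uint64_t capabilities;
    uint64_t cap_a : 1;
    uint64_t cap_b : 1;
};

/* Fixed: the bit-fields share one struct inside the union, so each gets its
 * own bit while 'capabilities' still exposes the whole word. */
union caps_fixed {
    uint64_t capabilities;
    struct {
        uint64_t cap_a : 1,
                 cap_b : 1,
                 cap_res : 62;
    };
};
```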
-
Peter Zijlstra authored
Due to a discussion with Adrian, I had a good look at the perf_event_type record layout and found the documentation to be somewhat unclear.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130716150907.GL23818@dyad.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Jiri Kosina authored
In fd4363ff ("x86: Introduce int3 (breakpoint)-based instruction patching"), the mechanism introduced for notifying the alternatives code from the int3 exception handler that an exception occurred was die_notifier.
This is problematic, however, because early code might use jump labels before the notifier registration has been performed. That leads to an oops due to an unhandled exception. Fengguang encountered one such occurrence:
    int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in:
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
    task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
    RIP: 0010:[<ffffffff811098cc>]  [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
    RSP: 0000:ffff88000dd03f10  EFLAGS: 00000006
    RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
    RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
    R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
    R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
    FS:  0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
    Stack:
     ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
     ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
     ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
    Call Trace:
     <IRQ>
     [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
     [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
     [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
     [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
     [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
     <EOI>
     [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
     [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
     [<ffffffff81015f10>] default_idle+0x147/0x282
     [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
     [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
     [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
     [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
     [...]
Fix this by calling poke_int3_handler() directly from the int3 exception handler, as ftrace already does. This no longer relies on the notifier, whose registration might not yet be finalized by the time of the first trap.
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
  * Fix memcpy benchmark for large sizes, from Andi Kleen.
  * Support callchain sorting based on addresses, from Andi Kleen.
  * Move weight back to common sort keys, from Andi Kleen.
  * Fix named threads support in 'perf script', from David Ahern.
  * Handle ENODEV on default cycles event, fix from David Ahern.
  * More install tests, from Jiri Olsa.
  * Fix build with perl 5.18, from Kirill A. Shutemov.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 22 Jul, 2013 10 commits
-
Andi Kleen authored
This is a partial revert of Namhyung's patch
    afab87b9 perf sort: Separate out memory-specific sort keys
He wrote:
    For global/local weights, I'm not entirely sure to place them into the
    memory dimension. But it's the only user at this time.
Well, TSX is another (in fact the original) user of the flags, and it needs them to be common. So move local/global weight back to the common sort keys.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Link: http://lkml.kernel.org/r/1374188333-17899-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding install-* tests into tests/make. These tests are broken, so they are commented out right away.
  * Nothing gets installed for the install-man, install_doc and install_html targets; they just rebuild the documentation.
  * I got the following error for 'install-info':
      $ make -f tests/make make_install_info
      - make_install_info: cd . && make -f Makefile DESTDIR=/tmp/tmp.Xi4mb9J1a0 install-info
      $ tail -f make_install_info
      ...
      PERF_VERSION = 3.11.rc1.g9b3c2d
      make[2]: *** No rule to make target `user-manual.xml', needed by `user-manual.texi'.  Stop.
      make[1]: *** [install-info] Error 2
  * I got the following error for 'install-pdf':
      $ make -f tests/make make_install_pdf
      - make_install_pdf: cd . && make -f Makefile DESTDIR=/tmp/tmp.fXseECBbt1 install-pdf
      $ tail -f make_install_pdf
      ...
      PERF_VERSION = 3.11.rc1.g9b3c2d
      make[2]: *** No rule to make target `user-manual.xml', needed by `user-manual.pdf'.  Stop.
      make[1]: *** [install-pdf] Error 2
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1374497014-2817-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Adding 'make install' and 'make install-bin' tests to tests/make. They run as part of the suite, but can also be run separately:

    $ make -f tests/make make_install
    - make_install: cd . && make -f Makefile DESTDIR=/tmp/tmp.LpkYbk5pfs install
      test: test -x /tmp/tmp.LpkYbk5pfs/bin/perf

    $ make -f tests/make make_install_bin
    - make_install_bin: cd . && make -f Makefile DESTDIR=/tmp/tmp.dMxePBMcFT install-bin
      test: test -x /tmp/tmp.dMxePBMcFT/bin/perf

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1374497014-2817-5-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
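The `test -x` check these targets run can be mirrored in C with POSIX `access()` and `X_OK`. This is a minimal sketch, not part of tests/make: `installed_binary_is_executable()` is a hypothetical helper, and the DESTDIR is passed in rather than created with a temporary directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: true if DESTDIR + relpath (e.g. "/bin/perf") exists
 * and is executable, like `test -x "$DESTDIR/bin/perf"`. Note that access()
 * checks the real rather than the effective user and group IDs. */
static bool installed_binary_is_executable(const char *destdir, const char *relpath)
{
	char path[4096];
	int n;

	assert(destdir != NULL && relpath != NULL);
	n = snprintf(path, sizeof(path), "%s%s", destdir, relpath);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path did not fit in the buffer */
	return access(path, X_OK) == 0;
}
```

For example, `installed_binary_is_executable("/tmp/tmp.LpkYbk5pfs", "/bin/perf")` corresponds to the first test line above.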
-
Jiri Olsa authored
Adding the TMP_DEST tests/make variable to provide the DESTDIR directory for installation tests. It is also added to the existing test targets, since the DESTDIR variable 'should not' affect targets other than install*. We can always separate this if there's a need for a DESTDIR-free build test.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1374497014-2817-4-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Renaming the TMP tests/make variable to TMP_O to make a name space for other temp variables.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1374497014-2817-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiri Olsa authored
Running tags and cscope make tests only if the 'ctags' and 'cscope' binaries are installed, so we don't have false alarm test failures.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1374497014-2817-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
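tests/make performs this check in make/shell; as a C analogy only, a search of the colon-separated PATH with `access()` expresses the same "skip the test unless the tool is installed" decision. `binary_in_path()` is a hypothetical helper, not code from tests/make.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if an executable named `name` exists in one of
 * the PATH directories. Empty PATH components are skipped by strtok_r(). */
static bool binary_in_path(const char *name)
{
	const char *path = getenv("PATH");
	char *copy, *dir, *saveptr = NULL;
	bool found = false;

	assert(name != NULL);
	if (path == NULL || *path == '\0')
		return false;

	copy = strdup(path);	/* strtok_r() modifies its input */
	if (copy == NULL)
		return false;

	for (dir = strtok_r(copy, ":", &saveptr); dir != NULL;
	     dir = strtok_r(NULL, ":", &saveptr)) {
		char candidate[4096];
		int n = snprintf(candidate, sizeof(candidate), "%s/%s", dir, name);

		if (n < 0 || (size_t)n >= sizeof(candidate))
			continue;	/* skip entries that do not fit */
		if (access(candidate, X_OK) == 0) {
			found = true;
			break;
		}
	}
	free(copy);
	return found;
}
```

A caller would then skip the tags test when `binary_in_path("ctags")` is false, and likewise for cscope.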
-
Kirill A. Shutemov authored
perl.h from a new Perl release doesn't like -Wundef and -Wswitch-default:

    /usr/lib/perl5/core_perl/CORE/perl.h:548:5: error: "SILENT_NO_TAINT_SUPPORT" is not defined [-Werror=undef]
     #if SILENT_NO_TAINT_SUPPORT && !defined(NO_TAINT_SUPPORT)
         ^
    /usr/lib/perl5/core_perl/CORE/perl.h:556:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
     #if NO_TAINT_SUPPORT
         ^
    In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3471:0,
                     from util/scripting-engines/trace-event-perl.c:30:
    /usr/lib/perl5/core_perl/CORE/sv.h:1455:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
     #if NO_TAINT_SUPPORT
         ^
    In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3472:0,
                     from util/scripting-engines/trace-event-perl.c:30:
    /usr/lib/perl5/core_perl/CORE/regexp.h:436:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
     #if NO_TAINT_SUPPORT
         ^
    In file included from /usr/lib/perl5/core_perl/CORE/hv.h:592:0,
                     from /usr/lib/perl5/core_perl/CORE/perl.h:3480,
                     from util/scripting-engines/trace-event-perl.c:30:
    /usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_siphash_2_4’:
    /usr/lib/perl5/core_perl/CORE/hv_func.h:222:3: error: switch missing default case [-Werror=switch-default]
       switch( left )
       ^
    /usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_superfast’:
    /usr/lib/perl5/core_perl/CORE/hv_func.h:274:5: error: switch missing default case [-Werror=switch-default]
         switch (rem) { \
         ^
    /usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_murmur3’:
    /usr/lib/perl5/core_perl/CORE/hv_func.h:398:5: error: switch missing default case [-Werror=switch-default]
         switch(bytes_in_carry) { /* how many bytes in carry */
         ^

Let's disable the warnings for code which uses perl.h.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372063394-20126-1-git-send-email-kirill@shutemov.name
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
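For readers unfamiliar with these two warnings, here is a small, self-contained C illustration of the constructs they flag, with made-up names (`DEMO_TAINT_FEATURE`, `tail_bytes`) standing in for the perl.h code. It does not reproduce the patch, which leaves perl.h alone and turns the warnings off for the files that include it; the `defined(...) &&` spelling and the explicit `default:` label are shown only as the warning-clean forms.

```c
#include <assert.h>

/* DEMO_TAINT_FEATURE is a made-up macro that is never defined. */

/* Under -Wundef this #if warns: an undefined identifier in #if silently
 * evaluates to 0. This is the pattern perl.h uses. */
#if DEMO_TAINT_FEATURE
static const int undef_form = 1;
#else
static const int undef_form = 0;
#endif

/* -Wundef-clean spelling: the right operand of && is not evaluated. */
#if defined(DEMO_TAINT_FEATURE) && DEMO_TAINT_FEATURE
static const int defined_form = 1;
#else
static const int defined_form = 0;
#endif

/* The hv_func.h pattern: a switch over the leftover tail bytes of a hash
 * input. Without the default label, -Wswitch-default warns. */
static int tail_bytes(int rem)
{
	int n = 0;

	switch (rem) {
	case 3:
		n++;
		/* fall through */
	case 2:
		n++;
		/* fall through */
	case 1:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```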
-
Andi Kleen authored
For programs with very large functions, it can be useful to distinguish callgraph nodes by more than just the function name. For example, if you have multiple calls to the same function, they end up as separate nodes in the chain.

This patch adds a new key field to the callgraph options that allows comparing nodes on functions (the default, as today) or on addresses.

Longer term, it would be nice to also handle source lines, but that would need more changes, and the address is a reasonable proxy for it today.

Right now I reference the global params, as there was no simple way to register a params pointer.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-0uskktybf0e7wrnoi5e9b9it@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
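A minimal C sketch of what comparing callchain entries by function versus by address means. The enum, struct and comparison function are hypothetical, simplified stand-ins and not the patch's actual data structures; each entry carries only a resolved symbol name and a raw instruction address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the new callgraph "key" option. */
enum chain_key_mode {
	KEY_FUNCTION,	/* default: merge entries that resolve to the same symbol */
	KEY_ADDRESS,	/* keep every distinct instruction address separate */
};

/* Simplified callchain entry: resolved symbol name plus raw address. */
struct chain_entry {
	const char *sym;
	uint64_t ip;
};

/* Returns 0 when the two entries belong in the same callgraph node. */
static int chain_entry_cmp(const struct chain_entry *a,
			   const struct chain_entry *b,
			   enum chain_key_mode mode)
{
	assert(a != NULL && b != NULL);
	if (mode == KEY_FUNCTION)
		return strcmp(a->sym, b->sym);
	if (a->ip == b->ip)
		return 0;
	return a->ip < b->ip ? -1 : 1;
}
```

Two call sites inside the same caller, say `{ "foo", 0x1000 }` and `{ "foo", 0x1040 }`, collapse into one node with `KEY_FUNCTION` but stay separate with `KEY_ADDRESS`.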
-
Andi Kleen authored
The glibc calloc() function has an optimization to not explicitly memset() very large calloc allocations that just came from mmap(), because they are known to be zero. This could result in the perf memcpy benchmark reading only from the zero page, which gives unrealistic results.

Always call memset explicitly on the source area to avoid this problem.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-pzz2qrdq9eymxda0y8yxdn33@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
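A minimal C sketch of the idea, assuming a plain `malloc()` allocation and a hypothetical `alloc_bench_src()` helper (not the benchmark's actual code). Writing every byte forces the kernel to back each page with real memory instead of leaving it mapped to the shared zero page. The nonzero fill byte is a choice of this sketch: some compilers may fold `malloc()` followed by `memset(p, 0, n)` back into `calloc()`, which would undo the fix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill byte for the source buffer (nonzero on purpose). */
#define BENCH_FILL 0x5a

/* Allocate a memcpy benchmark source buffer whose pages are actually
 * written, so later reads do not all hit the shared zero page. */
static void *alloc_bench_src(size_t len)
{
	void *buf = malloc(len);

	if (buf == NULL)
		return NULL;
	memset(buf, BENCH_FILL, len);	/* touch every page explicitly */
	return buf;
}
```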
-
David Ahern authored
Some systems (e.g., VMs on qemu-0.13 with the default vcpu model) report an unsupported CPU model:

    Performance Events: unsupported p6 CPU model 2 no PMU driver, software events only.

Subsequent invocations of perf fail with:

    The sys_perf_event_open() syscall returned with 19 (No such device) for event (cycles).
    /bin/dmesg may provide additional information.
    No CONFIG_PERF_EVENTS=y kernel support configured?

Add ENODEV to the list of errno values that trigger a fallback to cpu-clock.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374190079-28507-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
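A minimal sketch of an errno-driven fallback decision. `should_fallback_to_cpu_clock()` is hypothetical, and apart from ENODEV, which this patch adds, the other errno values listed (ENOENT, ENXIO) are an assumption about what the existing list contains. Restricting the fallback to the hardware cycles event is also a simplifying assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether a failed open of the hardware
 * "cycles" event should be retried with the software cpu-clock event. */
static bool should_fallback_to_cpu_clock(int err, bool is_hw_cycles_event)
{
	if (!is_hw_cycles_event)
		return false;

	switch (err) {
	case ENOENT:	/* assumed to be in the existing list */
	case ENXIO:	/* assumed to be in the existing list */
	case ENODEV:	/* added: no PMU driver, e.g. unsupported CPU model in a VM */
		return true;
	default:
		return false;
	}
}
```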
-