Commit c98d5d94 authored by Len Brown

tools/power: turbostat v2 - re-write for efficiency

Measuring large profoundly-idle configurations
requires turbostat to be more lightweight.
Otherwise, the operation of turbostat itself
can interfere with the measurements.

This re-write makes turbostat topology aware.
Hardware is accessed in "topology order".
Redundant hardware accesses are deleted.
Redundant output is deleted.
Also, output is buffered and
local RDTSC use replaces remote MSR access for TSC.

From a feature point of view, the output
looks different since redundant figures are absent.
Also, there are now -c and -p options -- to restrict
output to the 1st thread in each core, and the 1st
thread in each package, respectively.  This is helpful
to reduce output on big systems, where more detail
than the "-s" system summary is desired.
Finally, periodic mode output is now on stdout, not stderr.
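
As a rough illustration of what -c and -p select, the sketch below filters a list of threads kept in topology order. It is not taken from turbostat.c: the struct, the function names, and the single-package layout of the x980 table are assumptions made for this example, with the core and CPU numbers copied from the sample output in the man page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one hardware thread; not turbostat's own type. */
struct thread_id {
    int pkg;
    int core;
    int cpu;
};

/* Core/CPU pairs from the x980 sample output, assumed to be one package. */
static const struct thread_id x980_topology[] = {
    {0, 0, 0},  {0, 0, 6},  {0, 1, 2},  {0, 1, 8},
    {0, 2, 4},  {0, 2, 10}, {0, 8, 1},  {0, 8, 7},
    {0, 9, 3},  {0, 9, 9},  {0, 10, 5}, {0, 10, 11},
};

/* Rows must be sorted in topology order: package, then core, then thread. */
static int first_in_package(const struct thread_id *t, size_t i)
{
    return i == 0 || t[i].pkg != t[i - 1].pkg;
}

static int first_in_core(const struct thread_id *t, size_t i)
{
    return first_in_package(t, i) || t[i].core != t[i - 1].core;
}

/*
 * Copy the CPU numbers that would be shown into cpus_out and return how
 * many were kept. per_pkg mimics -p, per_core mimics -c; with neither
 * set, every thread is kept.
 */
size_t select_rows(const struct thread_id *t, size_t n,
                   int per_core, int per_pkg, int *cpus_out)
{
    size_t kept = 0;

    assert(t != NULL && cpus_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (per_pkg && !first_in_package(t, i))
            continue;
        if (!per_pkg && per_core && !first_in_core(t, i))
            continue;
        cpus_out[kept++] = t[i].cpu;
    }
    return kept;
}
```

With the table above, the per-core selection keeps CPUs 0, 2, 4, 1, 3 and 5, which are the rows that carry the core-level columns in the new sample output; the per-package selection keeps only CPU 0.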

Turbostat v2 is also slightly more robust in
handling run-time CPU online/offline events,
as it now checks the actual map of on-line cpus rather
than just the total number of on-line cpus.
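
To show why comparing the map matters more than comparing the count, here is a minimal sketch that parses a Linux cpulist string (the "0-3,6" format used by files such as /sys/devices/system/cpu/online) into a bitmask. The commit message does not say where turbostat reads its map from, so the input format, the 64-CPU limit, and both function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist string such as "0-3,6" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or a CPU number >= 64.
 * A trailing newline is tolerated.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
    uint64_t m = 0;

    assert(s != NULL && mask != NULL);
    while (*s && *s != '\n') {
        char *end;
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;

        if (end == s)
            return -1;              /* no digits where a number was expected */
        if (*end == '-') {          /* range "lo-hi" */
            s = end + 1;
            hi = strtoul(s, &end, 10);
            if (end == s)
                return -1;
        }
        if (hi < lo || hi >= 64)
            return -1;
        for (unsigned long c = lo; c <= hi; c++)
            m |= (uint64_t)1 << c;
        s = end;
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return -1;              /* unexpected separator */
    }
    *mask = m;
    return 0;
}

/* Count the set bits, i.e. the number of on-line CPUs in the map. */
int count_cpus(uint64_t mask)
{
    int n = 0;

    while (mask) {
        n += (int)(mask & 1);
        mask >>= 1;
    }
    return n;
}
```

If CPU 3 goes offline and CPU 5 comes online between two samples, "0-3" and "0-2,5" both count four CPUs, yet the masks differ, so only a map comparison notices the change.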

Signed-off-by: Len Brown <len.brown@intel.com>
parent d3514abc
turbostat : turbostat.c turbostat : turbostat.c
CFLAGS += -Wall
clean : clean :
rm -f turbostat rm -f turbostat
......
...@@ -27,7 +27,11 @@ supports an "invariant" TSC, plus the APERF and MPERF MSRs. ...@@ -27,7 +27,11 @@ supports an "invariant" TSC, plus the APERF and MPERF MSRs.
on processors that additionally support C-state residency counters. on processors that additionally support C-state residency counters.
.SS Options .SS Options
The \fB-s\fP option prints only a 1-line summary for each sample interval. The \fB-s\fP option limits output to a 1-line system summary for each interval.
.PP
The \fB-c\fP option limits output to the 1st thread in each core.
.PP
The \fB-p\fP option limits output to the 1st thread in each package.
.PP .PP
The \fB-v\fP option increases verbosity. The \fB-v\fP option increases verbosity.
.PP .PP
...@@ -65,19 +69,19 @@ Subsequent rows show per-CPU statistics. ...@@ -65,19 +69,19 @@ Subsequent rows show per-CPU statistics.
.nf .nf
[root@x980]# ./turbostat [root@x980]# ./turbostat
cor CPU %c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6 cor CPU %c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6
0.60 1.63 3.38 2.91 0.00 96.49 0.00 76.64 0.09 1.62 3.38 1.83 0.32 97.76 1.26 83.61
0 0 0.59 1.62 3.38 4.51 0.00 94.90 0.00 76.64 0 0 0.15 1.62 3.38 10.23 0.05 89.56 1.26 83.61
0 6 1.13 1.64 3.38 3.97 0.00 94.90 0.00 76.64 0 6 0.05 1.62 3.38 10.34
1 2 0.08 1.62 3.38 0.07 0.00 99.85 0.00 76.64 1 2 0.03 1.62 3.38 0.07 0.05 99.86
1 8 0.03 1.62 3.38 0.12 0.00 99.85 0.00 76.64 1 8 0.03 1.62 3.38 0.06
2 4 0.01 1.62 3.38 0.06 0.00 99.93 0.00 76.64 2 4 0.21 1.62 3.38 0.10 1.49 98.21
2 10 0.04 1.62 3.38 0.02 0.00 99.93 0.00 76.64 2 10 0.02 1.62 3.38 0.29
8 1 2.85 1.62 3.38 11.71 0.00 85.44 0.00 76.64 8 1 0.04 1.62 3.38 0.04 0.08 99.84
8 7 1.98 1.62 3.38 12.58 0.00 85.44 0.00 76.64 8 7 0.01 1.62 3.38 0.06
9 3 0.36 1.62 3.38 0.71 0.00 98.93 0.00 76.64 9 3 0.53 1.62 3.38 0.10 0.20 99.17
9 9 0.09 1.62 3.38 0.98 0.00 98.93 0.00 76.64 9 9 0.02 1.62 3.38 0.60
10 5 0.03 1.62 3.38 0.09 0.00 99.87 0.00 76.64 10 5 0.01 1.62 3.38 0.02 0.04 99.92
10 11 0.07 1.62 3.38 0.06 0.00 99.87 0.00 76.64 10 11 0.02 1.62 3.38 0.02
.fi .fi
.SH SUMMARY EXAMPLE .SH SUMMARY EXAMPLE
The "-s" option prints the column headers just once, The "-s" option prints the column headers just once,
...@@ -86,9 +90,10 @@ and then the one line system summary for each sample interval. ...@@ -86,9 +90,10 @@ and then the one line system summary for each sample interval.
.nf .nf
[root@x980]# ./turbostat -s [root@x980]# ./turbostat -s
%c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6 %c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6
0.61 1.89 3.38 5.95 0.00 93.44 0.00 66.33 0.23 1.67 3.38 2.00 0.30 97.47 1.07 82.12
0.52 1.62 3.38 6.83 0.00 92.65 0.00 61.11 0.10 1.62 3.38 1.87 2.25 95.77 12.02 72.60
0.62 1.92 3.38 5.47 0.00 93.91 0.00 67.31 0.20 1.64 3.38 1.98 0.11 97.72 0.30 83.36
0.11 1.70 3.38 1.86 1.81 96.22 9.71 74.90
.fi .fi
.SH VERBOSE EXAMPLE .SH VERBOSE EXAMPLE
The "-v" option adds verbosity to the output: The "-v" option adds verbosity to the output:
...@@ -120,30 +125,28 @@ until ^C while the other CPUs are mostly idle: ...@@ -120,30 +125,28 @@ until ^C while the other CPUs are mostly idle:
[root@x980 lenb]# ./turbostat cat /dev/zero > /dev/null [root@x980 lenb]# ./turbostat cat /dev/zero > /dev/null
^C ^C
cor CPU %c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6 cor CPU %c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6
8.63 3.64 3.38 14.46 0.49 76.42 0.00 0.00 8.86 3.61 3.38 15.06 31.19 44.89 0.00 0.00
0 0 0.34 3.36 3.38 99.66 0.00 0.00 0.00 0.00 0 0 1.46 3.22 3.38 16.84 29.48 52.22 0.00 0.00
0 6 99.96 3.64 3.38 0.04 0.00 0.00 0.00 0.00 0 6 0.21 3.06 3.38 18.09
1 2 0.14 3.50 3.38 1.75 2.04 96.07 0.00 0.00 1 2 0.53 3.33 3.38 2.80 46.40 50.27
1 8 0.38 3.57 3.38 1.51 2.04 96.07 0.00 0.00 1 8 0.89 3.47 3.38 2.44
2 4 0.01 2.65 3.38 0.06 0.00 99.93 0.00 0.00 2 4 1.36 3.43 3.38 9.04 23.71 65.89
2 10 0.03 2.12 3.38 0.04 0.00 99.93 0.00 0.00 2 10 0.18 2.86 3.38 10.22
8 1 0.91 3.59 3.38 35.27 0.92 62.90 0.00 0.00 8 1 0.04 2.87 3.38 99.96 0.01 0.00
8 7 1.61 3.63 3.38 34.57 0.92 62.90 0.00 0.00 8 7 99.72 3.63 3.38 0.27
9 3 0.04 3.38 3.38 0.20 0.00 99.76 0.00 0.00 9 3 0.31 3.21 3.38 7.64 56.55 35.50
9 9 0.04 3.29 3.38 0.20 0.00 99.76 0.00 0.00 9 9 0.08 2.95 3.38 7.88
10 5 0.03 3.08 3.38 0.12 0.00 99.85 0.00 0.00 10 5 1.42 3.43 3.38 2.14 30.99 65.44
10 11 0.05 3.07 3.38 0.10 0.00 99.85 0.00 0.00 10 11 0.16 2.88 3.38 3.40
4.907015 sec
.fi .fi
Above the cycle soaker drives cpu6 up 3.6 Ghz turbo limit Above the cycle soaker drives cpu7 up its 3.6 Ghz turbo limit
while the other processors are generally in various states of idle. while the other processors are generally in various states of idle.
Note that cpu0 is an HT sibling sharing core0 Note that cpu1 and cpu7 are HT siblings within core8.
with cpu6, and thus it is unable to get to an idle state As cpu7 is very busy, it prevents its sibling, cpu1,
deeper than c1 while cpu6 is busy. from entering a c-state deeper than c1.
Note that turbostat reports average GHz of 3.64, while Note that turbostat reports average GHz of 3.63, while
the arithmetic average of the GHz column above is lower. the arithmetic average of the GHz column above is lower.
This is a weighted average, where the weight is %c0. ie. it is the total number of This is a weighted average, where the weight is %c0. ie. it is the total number of
un-halted cycles elapsed per time divided by the number of CPUs. un-halted cycles elapsed per time divided by the number of CPUs.
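
The weighted average described above can be checked by hand. The sketch below assumes the summary GHz is the sum of (%c0 x GHz) over all CPUs divided by the sum of %c0; that formula is an interpretation of the man page text, not code from turbostat.c, and the struct and function names are made up for the example. The table holds the twelve per-CPU rows from the new cycle-soaker output.

```c
#include <assert.h>
#include <stddef.h>

/* One per-CPU row: percent of time in c0 and average GHz while in c0. */
struct cpu_sample {
    double c0_pct;
    double ghz;
};

/* %c0 and GHz columns from the new "cat /dev/zero" example output. */
static const struct cpu_sample soaker_rows[] = {
    {1.46, 3.22}, {0.21, 3.06}, {0.53, 3.33}, {0.89, 3.47},
    {1.36, 3.43}, {0.18, 2.86}, {0.04, 2.87}, {99.72, 3.63},
    {0.31, 3.21}, {0.08, 2.95}, {1.42, 3.43}, {0.16, 2.88},
};

/* GHz averaged with %c0 as the weight, so busy CPUs dominate. */
double weighted_avg_ghz(const struct cpu_sample *rows, size_t n)
{
    double weighted_sum = 0.0, weight_total = 0.0;

    assert(rows != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        weighted_sum += rows[i].c0_pct * rows[i].ghz;
        weight_total += rows[i].c0_pct;
    }
    assert(weight_total > 0.0);
    return weighted_sum / weight_total;
}

/* Plain mean of the GHz column, every CPU counted equally. */
double arithmetic_avg_ghz(const struct cpu_sample *rows, size_t n)
{
    double sum = 0.0;

    assert(rows != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += rows[i].ghz;
    return sum / (double)n;
}
```

For these rows the weighted average comes out at about 3.61, which is the GHz figure in the summary row of the new output, while the plain mean of the column is about 3.2. The busy cpu7 row dominates because its %c0 weight is 99.72.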