• Andi Kleen's avatar
    perf callchain: Support handling complete branch stacks as histograms · 8b7bad58
    Andi Kleen authored
    Currently branch stacks can be only shown as edge histograms for
    individual branches. I never found this display particularly useful.
    
    This implements an alternative mode that creates histograms over
    complete branch traces, instead of individual branches, similar to how
    normal callgraphs are handled. This is done by putting it in front of
    the normal callgraph and then using the normal callgraph histogram
    infrastructure to unify them.
    
    This way in complex functions we can understand the control flow that
    lead to a particular sample, and may even see some control flow in the
    caller for short functions.
    
    Example (simplified, of course for such simple code this is usually not
    needed), please run this after the whole patchkit is in, as at this
    point in the patch order there is no --branch-history, that will be
    added in a patch after this one:
    
    tcall.c:
    
    volatile a = 10000, b = 100000, c;
    
    __attribute__((noinline)) f2()
    {
    	c = a / b;
    }
    
    __attribute__((noinline)) f1()
    {
    	f2();
    	f2();
    }
    main()
    {
    	int i;
    	for (i = 0; i < 1000000; i++)
    		f1();
    }
    
    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --no-children --branch-history
    ...
        54.91%  tcall.c:6  [.] f2                      tcall
                |
                |--65.53%-- f2 tcall.c:5
                |          |
                |          |--70.83%-- f1 tcall.c:11
                |          |          f1 tcall.c:10
                |          |          main tcall.c:18
                |          |          main tcall.c:18
                |          |          main tcall.c:17
                |          |          main tcall.c:17
                |          |          f1 tcall.c:13
                |          |          f1 tcall.c:13
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:12
                |          |          f1 tcall.c:12
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:11
                |          |
                |           --29.17%-- f1 tcall.c:12
                |                     f1 tcall.c:12
                |                     f2 tcall.c:7
                |                     f2 tcall.c:5
                |                     f1 tcall.c:11
                |                     f1 tcall.c:10
                |                     main tcall.c:18
                |                     main tcall.c:18
                |                     main tcall.c:17
                |                     main tcall.c:17
                |                     f1 tcall.c:13
                |                     f1 tcall.c:13
                |                     f2 tcall.c:7
                |                     f2 tcall.c:5
                |                     f1 tcall.c:12
    
    The default output is unchanged.
    
    This is only implemented in perf report, no change to record or anywhere
    else.
    
    This adds the basic code to report:
    
    - add a new "branch" option to the -g option parser to enable this mode
    - when the flag is set include the LBR into the callstack in machine.c.
    
    The rest of the history code is unchanged and doesn't know the
    difference between LBR entry and normal call entry.
    
    - detect overlaps with the callchain
    - remove small loop duplicates in the LBR
    
    Current limitations:
    
    - The LBR flags (mispredict etc.) are not shown in the history
    and LBR entries have no special marker.
    - It would be nice if annotate marked the LBR entries somehow
    (e.g. with arrows)
    
    v2: Various fixes.
    v3: Merge further patches into this one. Fix white space.
    v4: Improve manpage. Address review feedback.
    v5: Rename functions. Better error message without -g. Fix crash without
        -b.
    v6: Rebase
    v7: Rebase. Use NO_ENTRY in memset.
    v8: Port to latest tip. Move add_callchain_ip to separate
        patch. Skip initial entries in callchain. Minor cleanups.
    Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
    8b7bad58
perf-report.txt 10.3 KB