    perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
    Robert Richter authored
    The current x86 event scheduler fails to resolve scheduling problems
    of certain combinations of events and constraints. This happens if the
    counter mask of such an event is not a subset of any other counter
    mask of a constraint with an equal or higher weight, e.g. constraints
    of the AMD family 15h pmu:
    
                            counter mask    weight
    
     amd_f15_PMC30          0x09            2  <--- overlapping counters
     amd_f15_PMC20          0x07            3
     amd_f15_PMC53          0x38            3
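
    As a rough illustration of the rule above, the following userspace
    sketch computes the weight of each mask and checks which constraint
    would need special treatment. The struct, the function names and the
    exact test are assumptions made for this example, not kernel code:
    it reads the rule as "another constraint of equal or higher weight
    shares a counter with this one without fully containing it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a counter constraint. */
struct constraint {
	const char *name;
	uint64_t    mask;	/* bit i set: counter i may be used */
};

/* Weight of a constraint: number of counters set in its mask. */
static int weight(uint64_t mask)
{
	int w = 0;

	for (; mask; mask &= mask - 1)	/* clear lowest set bit */
		w++;
	return w;
}

/*
 * Assumed reading of the rule: c[i] overlaps if some other constraint
 * of equal or higher weight shares a counter with it but does not
 * contain its mask completely.
 */
static bool has_overlap(const struct constraint *c, size_t n, size_t i)
{
	assert(i < n);
	for (size_t j = 0; j < n; j++) {
		if (j == i || weight(c[j].mask) < weight(c[i].mask))
			continue;
		bool shares = (c[i].mask & c[j].mask) != 0;
		bool subset = (c[i].mask & ~c[j].mask) == 0;
		if (shares && !subset)
			return true;
	}
	return false;
}

/* The three family 15h constraints from the table above. */
static const struct constraint f15h[] = {
	{ "amd_f15_PMC30", 0x09 },
	{ "amd_f15_PMC20", 0x07 },
	{ "amd_f15_PMC53", 0x38 },
};
```

    With these inputs only amd_f15_PMC30 is reported: it shares counter 0
    with PMC20 and counter 3 with PMC53, and neither of them contains it.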
    
    In such cases the scheduler fails to find a solution even though one
    exists. Here is an example:
    
     event code     counter         failure         possible solution
    
     0x02E          PMC[3,0]        0               3
     0x043          PMC[2:0]        1               0
     0x045          PMC[2:0]        2               1
     0x046          PMC[2:0]        FAIL            2
    
    The event scheduler may pick the wrong counter in the first cycle
    because the right choice depends on which events are scheduled
    afterwards. It may then fail to schedule the remaining events.
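
    The failure column of the example can be reproduced with a minimal
    greedy pass. This is a sketch under simplifying assumptions, not the
    kernel scheduler: schedule_greedy(), NUM_COUNTERS and UNSCHEDULED are
    names made up here, and unlike a real scheduler the loop keeps going
    after the first event that cannot be placed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_COUNTERS 6		/* assumed: counters 0..5 */
#define UNSCHEDULED  (-1)

/*
 * Hypothetical greedy pass: each event takes the lowest-numbered free
 * counter allowed by its mask and never revisits that choice.
 * Returns the number of events that received a counter.
 */
static size_t schedule_greedy(const uint64_t *masks, size_t n, int *assign)
{
	uint64_t used = 0;
	size_t scheduled = 0;

	assert(masks && assign);
	for (size_t i = 0; i < n; i++) {
		assign[i] = UNSCHEDULED;
		for (int idx = 0; idx < NUM_COUNTERS; idx++) {
			uint64_t bit = UINT64_C(1) << idx;

			if ((masks[i] & bit) && !(used & bit)) {
				used |= bit;
				assign[i] = idx;
				scheduled++;
				break;
			}
		}
	}
	return scheduled;
}

/* Counter masks of the four example events, lowest weight first. */
static const uint64_t example_masks[4] = { 0x09, 0x07, 0x07, 0x07 };
```

    Event 0x02E grabs counter 0 first, so the three PMC[2:0] events are
    left with only counters 1 and 2 and the last one cannot be placed.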
    
    To solve this, we now save the scheduler state of events with
    overlapping counter constraints. If we fail to schedule the events,
    we roll back to those states and try to use another free counter.
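
    The save-and-roll-back idea can be sketched in userspace as follows.
    All names here (struct sched, find_counter(), restore_state(),
    schedule()) are hypothetical and only loosely follow the description
    above; the fixed stack of 2 saved states is taken from this commit
    message, the rest is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_COUNTERS 6	/* assumed: counters 0..5 */
#define MAX_SAVED    2	/* stack depth named in the commit message */

struct event { uint64_t mask; bool overlap; };

struct sched_state {
	size_t   event;		/* index of the event being placed */
	int      counter;	/* counter to try next / counter chosen */
	uint64_t used;		/* counters already taken */
};

struct sched {
	struct sched_state state;
	struct sched_state saved[MAX_SAVED];
	size_t             nsaved;
};

/* Pop the last saved state, free its counter and move to the next one. */
static bool restore_state(struct sched *s)
{
	if (s->nsaved == 0)
		return false;
	s->state = s->saved[--s->nsaved];
	s->state.used &= ~(UINT64_C(1) << s->state.counter);
	s->state.counter++;
	return true;
}

/* Take the first free counter at or after state.counter for this event. */
static bool find_counter(struct sched *s, const struct event *ev)
{
	const struct event *e = &ev[s->state.event];
	int idx;

	for (idx = s->state.counter; idx < NUM_COUNTERS; idx++) {
		uint64_t bit = UINT64_C(1) << idx;

		if ((e->mask & bit) && !(s->state.used & bit)) {
			s->state.used |= bit;
			break;
		}
	}
	s->state.counter = idx;
	if (idx >= NUM_COUNTERS)
		return false;
	/* Remember the choice only for overlap events, if room is left. */
	if (e->overlap && s->nsaved < MAX_SAVED)
		s->saved[s->nsaved++] = s->state;
	return true;
}

/* Place all n events; on a dead end, roll back to a saved state. */
static bool schedule(const struct event *ev, size_t n, int *assign)
{
	struct sched s = { 0 };

	assert(ev && assign);
	while (s.state.event < n) {
		while (!find_counter(&s, ev))
			if (!restore_state(&s))
				return false;
		assign[s.state.event] = s.state.counter;
		s.state.event++;
		s.state.counter = 0;
	}
	return true;
}

/* The example events: only the PMC[3,0] event carries the overlap flag. */
static const struct event example[4] = {
	{ 0x09, true }, { 0x07, false }, { 0x07, false }, { 0x07, false },
};
```

    For the example events, the first attempt hits the same dead end as
    before. The roll-back then moves event 0x02E from counter 0 to
    counter 3, and the remaining events fit into counters 0, 1 and 2.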
    
    Constraints with overlapping counters are marked with a newly
    introduced overlap flag. We set the overlap flag for such constraints
    to give the scheduler a hint which events to select for counter
    rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for
    this.
    
    Care must be taken as the rescheduling algorithm is O(n!), which will
    dramatically increase scheduling cycles on an over-committed system.
    The number of EVENT_CONSTRAINT_OVERLAP() macros and the size of their
    counter masks must be kept to a minimum. For the same reason, the
    stack of saved states is currently limited to 2 entries, which bounds
    the number of loops the algorithm takes in the worst case.
    
    On systems with no overlapping-counter constraints, this
    implementation does not increase the loop count compared to the
    previous algorithm.
    
    V2:
    * Renamed redo -> overlap.
    * Reimplementation using perf scheduling helper functions.
    
    V3:
    * Added WARN_ON_ONCE() if out of save states.
    * Changed function interface of perf_sched_restore_state() to use bool
      as return value.
    
    Signed-off-by: Robert Richter <robert.richter@amd.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Stephane Eranian <eranian@google.com>
    Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>