    perf lock contention: Use per-cpu array map for spinlocks · b5711042
    Namhyung Kim authored
    Currently the lock contention timestamp is maintained in a hash map keyed
    by pid.  That means it needs to get and release a map element (which is
    protected by a spinlock!) on each contention begin and end pair.  This
    can hurt performance if there is a lot of contention (usually from
    spinlocks).
    
    It used to use task local storage, but that had an issue with memory
    allocation in some critical paths.  Although it's addressed in recent
    kernels IIUC, the tool should support old kernels too.  So it cannot
    simply switch to task local storage, at least for now.
    
    As spinlocks create lots of contention and they disable preemption
    while spinning, the task stays on the same CPU from contention begin
    to end.  So it can use a per-cpu array to keep the timestamp and avoid
    the overhead of hash map updates and deletes.
    
    In contention_begin, it's easy to check the lock type since it can see
    the flags.  But contention_end cannot see them.  So let's try the per-cpu
    array first (unconditionally) and check if it has an active element
    (lock != 0).  If so, that element should be used, and the per-task tstamp
    map should not be used until the per-cpu array element is cleared, which
    means the nested spinlock contention (if any) has finished and it now
    sees the (outer) lock.
    
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Acked-by: Ian Rogers <irogers@google.com>
    Cc: Hao Luo <haoluo@google.com>
    Cc: Song Liu <song@kernel.org>
    Cc: bpf@vger.kernel.org
    Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
lock_contention.bpf.c 12.6 KB