• Jiri Olsa's avatar
    kretprobe: Prevent triggering kretprobe from within kprobe_flush_task · 9b38cc70
    Jiri Olsa authored
    Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
    My test was also able to trigger lockdep output:
    
     ============================================
     WARNING: possible recursive locking detected
     5.6.0-rc6+ #6 Not tainted
     --------------------------------------------
     sched-messaging/2767 is trying to acquire lock:
     ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
    
     but task is already holding lock:
     ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
    
     other info that might help us debug this:
      Possible unsafe locking scenario:
    
            CPU0
            ----
       lock(&(kretprobe_table_locks[i].lock));
       lock(&(kretprobe_table_locks[i].lock));
    
      *** DEADLOCK ***
    
      May be due to missing lock nesting notation
    
     1 lock held by sched-messaging/2767:
      #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
    
     stack backtrace:
     CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
     Call Trace:
      dump_stack+0x96/0xe0
      __lock_acquire.cold.57+0x173/0x2b7
      ? native_queued_spin_lock_slowpath+0x42b/0x9e0
      ? lockdep_hardirqs_on+0x590/0x590
      ? __lock_acquire+0xf63/0x4030
      lock_acquire+0x15a/0x3d0
      ? kretprobe_hash_lock+0x52/0xa0
      _raw_spin_lock_irqsave+0x36/0x70
      ? kretprobe_hash_lock+0x52/0xa0
      kretprobe_hash_lock+0x52/0xa0
      trampoline_handler+0xf8/0x940
      ? kprobe_fault_handler+0x380/0x380
      ? find_held_lock+0x3a/0x1c0
      kretprobe_trampoline+0x25/0x50
      ? lock_acquired+0x392/0xbc0
      ? _raw_spin_lock_irqsave+0x50/0x70
      ? __get_valid_kprobe+0x1f0/0x1f0
      ? _raw_spin_unlock_irqrestore+0x3b/0x40
      ? finish_task_switch+0x4b9/0x6d0
      ? __switch_to_asm+0x34/0x70
      ? __switch_to_asm+0x40/0x70
    
    The code within the kretprobe handler checks for probe reentrancy,
    so we won't trigger any _raw_spin_lock_irqsave probe in there.
    
    The problem is in outside kprobe_flush_task, where we call:
    
      kprobe_flush_task
        kretprobe_table_lock
          raw_spin_lock_irqsave
            _raw_spin_lock_irqsave
    
    where _raw_spin_lock_irqsave triggers the kretprobe and installs
    kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
    
    The kretprobe_trampoline handler is then executed with already
    locked kretprobe_table_locks, and first thing it does is to
    lock kretprobe_table_locks ;-) the whole lockup path like:
    
      kprobe_flush_task
        kretprobe_table_lock
          raw_spin_lock_irqsave
            _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
    
            ---> kretprobe_table_locks locked
    
            kretprobe_trampoline
              trampoline_handler
                kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
    
    Adding kprobe_busy_begin/end helpers that mark code with fake
    probe installed to prevent triggering of another kprobe within
    this code.
    
    Using these helpers in kprobe_flush_task, so the probe recursion
    protection check is hit and the probe is never set to prevent
    above lockup.
    
    Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2
    
    Fixes: ef53d9c5 ("kprobes: improve kretprobe scalability with hashed locking")
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
    Cc: Anders Roxell <anders.roxell@linaro.org>
    Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
    Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
    Cc: David Miller <davem@davemloft.net>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Reported-by: default avatar"Ziqian SUN (Zamir)" <zsun@redhat.com>
    Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
    Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
    Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
    9b38cc70
core.c 30.8 KB