• John Stultz's avatar
    timekeeping: Avoid possible deadlock from clock_was_set_delayed · 6fdda9a9
    John Stultz authored
    As part of normal operaions, the hrtimer subsystem frequently calls
    into the timekeeping code, creating a locking order of
      hrtimer locks -> timekeeping locks
    
    clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
    between the timekeeping the hrtimer subsystem, so that we could
    notify the hrtimer subsytem the time had changed while holding
    the timekeeping locks. This was done by scheduling delayed work
    that would run later once we were out of the timekeeing code.
    
    But unfortunately the lock chains are complex enoguh that in
    scheduling delayed work, we end up eventually trying to grab
    an hrtimer lock.
    
    Sasha Levin noticed this in testing when the new seqlock lockdep
    enablement triggered the following (somewhat abrieviated) message:
    
    [  251.100221] ======================================================
    [  251.100221] [ INFO: possible circular locking dependency detected ]
    [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
    [  251.101967] -------------------------------------------------------
    [  251.101967] kworker/10:1/4506 is trying to acquire lock:
    [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
    [  251.101967]
    [  251.101967] but task is already holding lock:
    [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
    [  251.101967]
    [  251.101967] which lock already depends on the new lock.
    [  251.101967]
    [  251.101967]
    [  251.101967] the existing dependency chain (in reverse order) is:
    [  251.101967]
    -> #5 (hrtimer_bases.lock#11){-.-...}:
    [snipped]
    -> #4 (&rt_b->rt_runtime_lock){-.-...}:
    [snipped]
    -> #3 (&rq->lock){-.-.-.}:
    [snipped]
    -> #2 (&p->pi_lock){-.-.-.}:
    [snipped]
    -> #1 (&(&pool->lock)->rlock){-.-...}:
    [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
    [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
    [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
    [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
    [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
    [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
    [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
    [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
    [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
    [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
    [  251.101967]
    -> #0 (timekeeper_seq){----..}:
    [snipped]
    [  251.101967] other info that might help us debug this:
    [  251.101967]
    [  251.101967] Chain exists of:
      timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
    
    [  251.101967]  Possible unsafe locking scenario:
    [  251.101967]
    [  251.101967]        CPU0                    CPU1
    [  251.101967]        ----                    ----
    [  251.101967]   lock(hrtimer_bases.lock#11);
    [  251.101967]                                lock(&rt_b->rt_runtime_lock);
    [  251.101967]                                lock(hrtimer_bases.lock#11);
    [  251.101967]   lock(timekeeper_seq);
    [  251.101967]
    [  251.101967]  *** DEADLOCK ***
    [  251.101967]
    [  251.101967] 3 locks held by kworker/10:1/4506:
    [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
    [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
    [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
    [  251.101967]
    [  251.101967] stack backtrace:
    [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
    [  251.101967] Workqueue: events clock_was_set_work
    
    So the best solution is to avoid calling clock_was_set_delayed() while
    holding the timekeeping lock, and instead using a flag variable to
    decide if we should call clock_was_set() once we've released the locks.
    
    This works for the case here, where the do_adjtimex() was the deadlock
    trigger point. Unfortuantely, in update_wall_time() we still hold
    the jiffies lock, which would deadlock with the ipi triggered by
    clock_was_set(), preventing us from calling it even after we drop the
    timekeeping lock. So instead call clock_was_set_delayed() at that point.
    
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Prarit Bhargava <prarit@redhat.com>
    Cc: Richard Cochran <richardcochran@gmail.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Sasha Levin <sasha.levin@oracle.com>
    Cc: stable <stable@vger.kernel.org> #3.10+
    Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
    Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
    Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
    6fdda9a9
timekeeping.c 45.9 KB