• Nikolay Borisov's avatar
    userns: Make ucounts lock irq-safe · 880a3854
    Nikolay Borisov authored
    The ucounts_lock is being used to protect various ucounts lifecycle
    management functionalities. However, those services can also be invoked
    when a pidns is being freed in an RCU callback (e.g. softirq context).
    This can lead to deadlocks. There were already efforts trying to
    prevent similar deadlocks in add7c65c ("pid: fix lockdep deadlock
    warning due to ucount_lock"), however they just moved the context
    from hardirq to softrq. Fix this issue once and for all by explictly
    making the lock disable irqs altogether.
    
    Dmitry Vyukov <dvyukov@google.com> reported:
    
    > I've got the following deadlock report while running syzkaller fuzzer
    > on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid
    > device if it matters):
    >
    > =================================
    > [ INFO: inconsistent lock state ]
    > 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted
    > ---------------------------------
    > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    > swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    >  (ucounts_lock){+.?...}, at: [<     inline     >] spin_lock
    > ./include/linux/spinlock.h:302
    >  (ucounts_lock){+.?...}, at: [<ffff2000081678c8>]
    > put_ucounts+0x60/0x138 kernel/ucount.c:162
    > {SOFTIRQ-ON-W} state was registered at:
    > [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054
    > [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2941
    > [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295
    > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
    > [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
    > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
    > [<     inline     >] spin_lock ./include/linux/spinlock.h:302
    > [<     inline     >] get_ucounts kernel/ucount.c:131
    > [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189
    > [<     inline     >] inc_mnt_namespaces fs/namespace.c:2818
    > [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849
    > [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959
    > [<     inline     >] init_mount_tree fs/namespace.c:3199
    > [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251
    > [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626
    > [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648
    > [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456
    > irq event stamp: 2316924
    > hardirqs last  enabled at (2316924): [<     inline     >] rcu_do_batch
    > kernel/rcu/tree.c:2911
    > hardirqs last  enabled at (2316924): [<     inline     >]
    > invoke_rcu_callbacks kernel/rcu/tree.c:3182
    > hardirqs last  enabled at (2316924): [<     inline     >]
    > __rcu_process_callbacks kernel/rcu/tree.c:3149
    > hardirqs last  enabled at (2316924): [<ffff200008210414>]
    > rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166
    > hardirqs last disabled at (2316923): [<     inline     >] rcu_do_batch
    > kernel/rcu/tree.c:2900
    > hardirqs last disabled at (2316923): [<     inline     >]
    > invoke_rcu_callbacks kernel/rcu/tree.c:3182
    > hardirqs last disabled at (2316923): [<     inline     >]
    > __rcu_process_callbacks kernel/rcu/tree.c:3149
    > hardirqs last disabled at (2316923): [<ffff20000820fe80>]
    > rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166
    > softirqs last  enabled at (2316912): [<ffff20000811b4c4>]
    > _local_bh_enable+0x4c/0x80 kernel/softirq.c:155
    > softirqs last disabled at (2316913): [<     inline     >]
    > do_softirq_own_stack ./include/linux/interrupt.h:488
    > softirqs last disabled at (2316913): [<     inline     >]
    > invoke_softirq kernel/softirq.c:371
    > softirqs last disabled at (2316913): [<ffff20000811c994>]
    > irq_exit+0x264/0x308 kernel/softirq.c:405
    >
    > other info that might help us debug this:
    >  Possible unsafe locking scenario:
    >
    >        CPU0
    >        ----
    >   lock(ucounts_lock);
    >   <Interrupt>
    >     lock(ucounts_lock);
    >
    >  *** DEADLOCK ***
    >
    > 1 lock held by swapper/2/0:
    >  #0:  (rcu_callback){......}, at: [<     inline     >] __rcu_reclaim
    > kernel/rcu/rcu.h:108
    >  #0:  (rcu_callback){......}, at: [<     inline     >] rcu_do_batch
    > kernel/rcu/tree.c:2919
    >  #0:  (rcu_callback){......}, at: [<     inline     >]
    > invoke_rcu_callbacks kernel/rcu/tree.c:3182
    >  #0:  (rcu_callback){......}, at: [<     inline     >]
    > __rcu_process_callbacks kernel/rcu/tree.c:3149
    >  #0:  (rcu_callback){......}, at: [<ffff200008210390>]
    > rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166
    >
    > stack backtrace:
    > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6
    > Hardware name: Hardkernel ODROID-C2 (DT)
    > Call trace:
    > [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500
    > [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225
    > [<ffff2000088a99e0>] dump_stack+0x110/0x168
    > [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc
    > kernel/locking/lockdep.c:2387
    > [<     inline     >] print_usage_bug kernel/locking/lockdep.c:2357
    > [<     inline     >] valid_state kernel/locking/lockdep.c:2400
    > [<     inline     >] mark_lock_irq kernel/locking/lockdep.c:2617
    > [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065
    > [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2923
    > [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295
    > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
    > [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
    > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
    > [<     inline     >] spin_lock ./include/linux/spinlock.h:302
    > [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162
    > [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214
    > [<     inline     >] dec_pid_namespaces kernel/pid_namespace.c:89
    > [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156
    > [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
    > [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2919
    > [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3182
    > [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3149
    > [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166
    > [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284
    > [<     inline     >] do_softirq_own_stack ./include/linux/interrupt.h:488
    > [<     inline     >] invoke_softirq kernel/softirq.c:371
    > [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405
    > [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636
    > [<ffff200008081c80>] gic_handle_irq+0x68/0xd8
    > Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00)
    > 7dc0:                                   ffff8000648d4b3c 0000000000000007
    > 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967
    > 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001
    > 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002
    > 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000
    > 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140
    > 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d
    > 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00
    > 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145
    > 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478
    > [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486
    > [<     inline     >] arch_local_irq_restore
    > ./arch/arm64/include/asm/irqflags.h:81
    > [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030
    > [<     inline     >] cpuidle_idle_call kernel/sched/idle.c:200
    > [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243
    > [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345
    > [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358
    > arch/arm64/kernel/smp.c:276
    > [<000000000279f1a4>] 0x279f1a4
    Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
    Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
    Fixes: add7c65c ("pid: fix lockdep deadlock warning due to ucount_lock")
    Fixes: f333c700 ("pidns: Add a limit on the number of pid namespaces")
    Cc: stable@vger.kernel.org
    Link: https://www.spinics.net/lists/kernel/msg2426637.htmlSigned-off-by: default avatarNikolay Borisov <n.borisov.lkml@gmail.com>
    Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
    880a3854
ucount.c 5.52 KB