• Tejun Heo's avatar
    kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce
    Tejun Heo authored
    d911d987 ("kernfs: make kernfs_notify() trigger inotify events
    too") added fsnotify triggering to kernfs_notify() which requires a
    sleepable context.  There are already existing users of
    kernfs_notify() which invoke it from an atomic context and in general
    it's silly to require a sleepable context for triggering a
    notification.
    
    The following is an invalid context bug triggerd by md invoking
    sysfs_notify() from IO completion path.
    
     BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
     in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
     2 locks held by swapper/1/0:
      #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
      #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
     irq event stamp: 33518
     hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
     hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
     softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
     softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
     CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
     Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
      0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
      0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
     Call Trace:
      <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
      [<ffffffff810d4f14>] __might_sleep+0x184/0x240
      [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
      [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
      [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
      [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
      [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
      [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
      [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
      [<ffffffff813b46c4>] blk_update_request+0x94/0x420
      [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
      [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
      [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
      [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
      [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
      [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
      [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
      [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
      [<ffffffff81119436>] handle_edge_irq+0x66/0x130
      [<ffffffff8101c3e4>] handle_irq+0x84/0x150
      [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
      [<ffffffff818122f2>] common_interrupt+0x72/0x72
      <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
      [<ffffffff81025454>] default_idle+0x24/0x230
      [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
      [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
      [<ffffffff8104df1b>] start_secondary+0x25b/0x300
    
    This patch fixes it by punting the notification delivery through a
    work item.  This ends up adding an extra pointer to kernfs_elem_attr
    enlarging kernfs_node by a pointer, which is not ideal but not a very
    big deal either.  If this turns out to be an actual issue, we can move
    kernfs_elem_attr->size to kernfs_node->iattr later.
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    Reported-by: default avatarJosh Boyer <jwboyer@fedoraproject.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Reviewed-by: default avatarMichael S. Tsirkin <mst@redhat.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    ecca47ce
file.c 23.4 KB