    implement in-kernel gendisk events handling · 77ea887e
    Tejun Heo authored
    Currently, media presence polling for removable block devices is done
    from userland.  There are several issues with this.
    
    * Polling is done by periodically opening the device.  For SCSI
      devices, the command sequence generated by such action involves a
      few different commands including TEST_UNIT_READY.  This behavior,
      while perfectly legal, is different from Windows, which issues only
      a single command, GET_EVENT_STATUS_NOTIFICATION.  Unfortunately, some
      ATAPI devices lock up after being periodically queried with such
      command sequences.
    
    * There is no reliable and non-intrusive way for a userland program to
      tell whether the target device is safe for media presence polling.
      For example, polling for media presence during an ongoing burning
      session can make it fail.  The polling program can avoid this by
      opening the device with O_EXCL, but then it risks making a valid
      exclusive user of the device fail with -EBUSY.
    
    * Userland polling is unnecessarily heavy, while an in-kernel
      implementation is lighter and better coordinated (workqueue, timer
      slack).
    
    This patch implements a framework for in-kernel disk event handling,
    which includes media presence polling.
    
    * bdops->check_events() is added, which supersedes ->media_changed().
      It should check whether there are any pending events and, if so,
      return them.  Currently, two events are defined:
      DISK_EVENT_MEDIA_CHANGE and DISK_EVENT_EJECT_REQUEST.
      ->check_events() is guaranteed never to be called concurrently.
    
    * gendisk->events and ->async_events are added.  These should be
      initialized by block driver before passing the device to add_disk().
      The former contains the mask of all supported events and the latter
      the mask of all events which the device can report without polling.
      /sys/block/*/events[_async] export these to userland.
    
    * The kernel parameter block.events_dfl_poll_msecs controls the
      system-wide polling interval (default is 0, which disables polling),
      and /sys/block/*/events_poll_msecs controls the polling interval of
      each device (default is -1, meaning use the system setting).  Note
      that if a device can report all supported events asynchronously and
      its polling interval isn't explicitly set, the device won't be
      polled regardless of the system polling interval.
    
    * If a device is opened exclusively with write access, event checking
      is automatically disabled until all write exclusive accesses are
      released.
    
    * Some events are cleared by specific actions.  For example, both
      currently defined events are cleared after the device has been
      successfully opened.  This information is passed to the
      ->check_events() callback as a hint through the @clearing argument.
    
    * Event checking is always performed from system_nrt_wq, and the
      polling timer's slack is set to 25% of the polling interval.
    
    * Nothing changes for drivers which implement ->media_changed() but
      not ->check_events().  Going forward, all drivers will be converted
      to ->check_events() and ->media_changed() will be dropped.
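    The polling-interval rules above (per-device setting, system default,
    asynchronous events, 25% timer slack) can be summarized as a small
    decision function.  This is a user-space sketch, not kernel code:
    struct model_disk, effective_poll_msecs(), poll_slack_msecs() and the
    dfl_poll_msecs variable are hypothetical stand-ins for the gendisk
    fields, /sys/block/*/events_poll_msecs and block.events_dfl_poll_msecs,
    and the EV_* values are assumed to mirror DISK_EVENT_MEDIA_CHANGE and
    DISK_EVENT_EJECT_REQUEST.

```c
#include <assert.h>

/* Assumed to mirror DISK_EVENT_MEDIA_CHANGE / DISK_EVENT_EJECT_REQUEST. */
enum {
	EV_MEDIA_CHANGE  = 1u << 0,
	EV_EJECT_REQUEST = 1u << 1,
};

/* Hypothetical stand-in for the gendisk fields described above. */
struct model_disk {
	unsigned int events;		/* all supported events */
	unsigned int async_events;	/* events reported without polling */
	long poll_msecs;		/* per-device interval, -1 = use system default */
};

/* Hypothetical stand-in for block.events_dfl_poll_msecs; 0 disables polling. */
static unsigned long dfl_poll_msecs = 0;

/* Effective polling interval in milliseconds; 0 means "don't poll". */
static unsigned long effective_poll_msecs(const struct model_disk *d)
{
	/* This model only accepts -1 or a non-negative interval. */
	assert(d->poll_msecs >= -1);

	/* An explicitly set per-device interval always wins. */
	if (d->poll_msecs >= 0)
		return (unsigned long)d->poll_msecs;

	/*
	 * Using the default: poll only if some supported event cannot be
	 * reported asynchronously, whatever the system interval says.
	 */
	if (d->events & ~d->async_events)
		return dfl_poll_msecs;

	return 0;
}

/* Slack for the polling timer: 25% of the effective interval. */
static unsigned long poll_slack_msecs(const struct model_disk *d)
{
	return effective_poll_msecs(d) / 4;
}
```

    With the system default left at 0, a disk with both events supported
    and none asynchronous is not polled.  Raising dfl_poll_msecs to 2000
    gives a 2000 ms interval with 500 ms of slack; marking every event as
    asynchronous drops the interval back to 0 unless poll_msecs is set
    explicitly.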
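    The ->check_events() contract and event clearing can likewise be
    modelled in user space.  In this single-threaded sketch, struct
    model_drive, struct model_events, model_check_events(), model_poll()
    and model_clear() are all hypothetical: model_check_events() plays the
    driver's callback, model_poll() a periodic poll that accumulates
    pending events, and model_clear() the step after a successful open,
    where the events in @mask are passed as the clearing hint and then
    dropped from the pending set.  The real framework runs checks from a
    workqueue under locking and differs in detail.

```c
#include <assert.h>

/* Assumed to mirror DISK_EVENT_MEDIA_CHANGE / DISK_EVENT_EJECT_REQUEST. */
enum {
	EV_MEDIA_CHANGE  = 1u << 0,
	EV_EJECT_REQUEST = 1u << 1,
};

/* Hypothetical drive state as seen by a driver. */
struct model_drive {
	int media_present;	/* current hardware state */
	int last_media_present;	/* state seen by the previous check */
	int eject_pressed;	/* latched eject-button request */
	int in_check;		/* detects violations of the no-concurrency rule */
};

/* Hypothetical framework-side bookkeeping. */
struct model_events {
	unsigned int pending;	/* events reported but not yet cleared */
};

/* Driver callback: return the mask of events seen since the last call. */
static unsigned int model_check_events(struct model_drive *d,
				       unsigned int clearing)
{
	unsigned int ev = 0;

	assert(!d->in_check);	/* never called in parallel */
	d->in_check = 1;

	if (d->media_present != d->last_media_present) {
		ev |= EV_MEDIA_CHANGE;
		d->last_media_present = d->media_present;
	}
	if (d->eject_pressed) {
		ev |= EV_EJECT_REQUEST;
		d->eject_pressed = 0;
	}

	/* @clearing is only a hint; this model does not need it. */
	(void)clearing;

	d->in_check = 0;
	return ev;
}

/* Periodic poll: accumulate whatever the driver reports. */
static void model_poll(struct model_events *e, struct model_drive *d)
{
	e->pending |= model_check_events(d, 0);
}

/* After a successful open: check once more with @mask as the hint,
 * then return and drop the pending events in @mask. */
static unsigned int model_clear(struct model_events *e,
				struct model_drive *d, unsigned int mask)
{
	unsigned int cleared;

	e->pending |= model_check_events(d, mask);
	cleared = e->pending & mask;
	e->pending &= ~mask;
	return cleared;
}
```

    For example, after media is inserted, model_poll() leaves
    EV_MEDIA_CHANGE pending; if the eject button is then pressed, clearing
    both events returns both bits and leaves nothing pending.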
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>