• Jun'ichi Nomura's avatar
    dm mpath: fix IO hang due to logic bug in multipath_busy · 7a7a3b45
    Jun'ichi Nomura authored
    Commit e8099177 ("dm mpath: push back requests instead of queueing")
    modified multipath_busy() to return true if !pg_ready().  pg_ready()
    checks the current state of the multipath device and may return false
    even if a new IO is needed to change the state.
    
    Bart Van Assche reported that he had multipath IO lockup when he was
    performing cable pull tests.  Analysis showed that the multipath
    device had a single path group with both paths active, but that the
    path group itself was not active.  During the multipath device state
    transitions 'queue_io' got set but nothing could clear it.  Clearing
    'queue_io' only happens in __choose_pgpath(), but it won't be called
    if multipath_busy() returns true due to pg_ready() returning false
    when 'queue_io' is set.
    
    As such the !pg_ready() check in multipath_busy() is wrong because new
    IO will not be sent to multipath target and the multipath state change
    won't happen.  That results in multipath IO lockup.
    
    The intent of multipath_busy() is to avoid unnecessary cycles of
    dequeue + request_fn + requeue if it is known that the multipath
    device will requeue.
    
    Such "busy" situations would be:
      - path group is being activated
      - there is no path and the multipath is setup to requeue if no path
    
    Fix multipath_busy() to return "busy" early only for these specific
    situations.
    Reported-by: default avatarBart Van Assche <bvanassche@acm.org>
    Tested-by: default avatarBart Van Assche <bvanassche@acm.org>
    Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
    Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    Cc: stable@vger.kernel.org # v3.15
    7a7a3b45
dm-mpath.c 39.9 KB