1. 16 Mar, 2017 2 commits
    • Brian Foster's avatar
      xfs: pass total block res. as total xfs_bmapi_write() parameter · 2d288808
      Brian Foster authored
      commit dbd5c8c9 upstream.
      
      The total field from struct xfs_alloc_arg is a bit of an unknown
      commodity. It is documented as the total block requirement for the
      transaction and is used in this manner from most call sites by virtue of
      passing the total block reservation of the transaction associated with
      an allocation. Several xfs_bmapi_write() callers pass hardcoded values
      of 0 or 1 for the total block requirement, which is a historical oddity
      without any clear reasoning.
      
      The xfs_iomap_write_direct() caller, for example, passes 0 for the total
      block requirement. This has been determined to cause problems in the
      form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
      the block allocator. Specifically, the xfs_alloc_space_available()
      function incorrectly selects an AG that doesn't actually have sufficient
      space for the allocation. This occurs because the args.total field is 0
      and thus the remaining free space check on the AG doesn't actually
      consider the size of the allocation request. This locks the AGF buffer,
      the allocation attempt proceeds and ultimately fails (in
      xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
      AG. In turn, this can lead to incorrect AG locking order (if the
      allocator wraps around, attempting to lock AG 0 after acquiring AG N)
      and thus deadlock if racing with another operation. This problem has
      been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
      
      To avoid this problem, replace the undocumented hardcoded total
      parameters from the iomap and utility callers to pass the block
      reservation used for the associated transaction. This is consistent with
      other xfs_bmapi_write() callers throughout XFS. The assumption is that
      the total field allows the selection of an AG that can handle the entire
      operation rather than simply the allocation/range being requested (e.g.,
      resulting btree splits, etc.). This addresses the aforementioned
      generic/299 hang by ensuring AG selection only occurs when the
      allocation can be satisfied by the AG.
      
      [nb] backport to 3.12
      Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Acked-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2d288808
    • Mikulas Patocka's avatar
      dm: flush queued bios when process blocks to avoid deadlock · d4b83d12
      Mikulas Patocka authored
      commit d67a5f4b upstream.
      
      Commit df2cb6da ("block: Avoid deadlocks with bio allocation by
      stacking drivers") created a workqueue for every bio set and code
      in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
      by redirecting bios queued on current->bio_list to the workqueue if the
      system is low on memory.  However other deadlocks (see below **) may
      happen, without any low memory condition, because generic_make_request
      is queuing bios to current->bio_list (rather than submitting them).
      
      ** the related dm-snapshot deadlock is detailed here:
      https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html
      
      Fix this deadlock by redirecting any bios on current->bio_list to the
      bio_set's rescue workqueue on every schedule() call.  Consequently,
      when the process blocks on a mutex, the bios queued on
      current->bio_list are dispatched to independent workqueus and they can
      complete without waiting for the mutex to be available.
      
      The structure blk_plug contains an entry cb_list and this list can contain
      arbitrary callback functions that are called when the process blocks.
      To implement this fix DM (ab)uses the onstack plug's cb_list interface
      to get its flush_current_bio_list() called at schedule() time.
      
      This fixes the snapshot deadlock - if the map method blocks,
      flush_current_bio_list() will be called and it redirects bios waiting
      on current->bio_list to appropriate workqueues.
      
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
      Depends-on: df2cb6da ("block: Avoid deadlocks with bio allocation by stacking drivers")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d4b83d12
  2. 14 Mar, 2017 1 commit
  3. 13 Mar, 2017 37 commits