• Brian Foster's avatar
    xfs: flush inodegc workqueue tasks before cancel · 6191cf3a
    Brian Foster authored
    The xfs_inodegc_stop() helper performs a high level flush of pending
    work on the percpu queues and then runs a cancel_work_sync() on each
    of the percpu work tasks to ensure all work has completed before
    returning.  While cancel_work_sync() waits for wq tasks to complete,
    it does not guarantee work tasks have started. This means that the
    _stop() helper can queue and instantly cancel a wq task without
    having completed the associated work. This can be observed by
    tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>"
    test:
    
    	xfs_destroy_inode: ... ino 0x83 ...
    	xfs_inode_set_need_inactive: ... ino 0x83 ...
    	xfs_inodegc_stop: ...
    	...
    	xfs_inodegc_start: ...
    	xfs_inodegc_worker: ...
    	xfs_inode_inactivating: ... ino 0x83 ...
    
    The first few lines show that the inode is removed and need inactive
    state set, but the inactivation work has not completed before the
    inodegc mechanism stops. The inactivation doesn't actually occur
    until the fs is unfrozen and the gc mechanism starts back up. Note
    that this test requires fsfreeze to reproduce because xfs_freeze
    indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush().
    
    When this occurs, the workqueue try_to_grab_pending() logic first
    tries to steal the pending bit, which does not succeed because the
    bit has been set by queue_work_on(). Subsequently, it checks for
    association of a pool workqueue from the work item under the pool
    lock. This association is set at the point a work item is queued and
    cleared when dequeued for processing. If the association exists, the
    work item is removed from the queue and cancel_work_sync() returns
    true. If the pwq association is cleared, the remove attempt assumes
    the task is busy and retries (eventually returning false to the
    caller after waiting for the work task to complete).
    
    To avoid this race, we can flush each work item explicitly before
    cancel. However, since the _queue_all() already schedules each
    underlying work item, the workqueue level helpers are sufficient to
    achieve the same ordering effect. E.g., the inodegc enabled flag
    prevents scheduling any further work in the _stop() case. Use the
    drain_workqueue() helper in this particular case to make the intent
    a bit more self explanatory.
    Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    6191cf3a
xfs_icache.c 54.1 KB