    xfs: prevent CIL push holdoff in log recovery · 8ab39f11
    Dave Chinner authored
    generic/530 on a machine with enough RAM and a non-preemptible
    kernel can run the AGI processing phase of log recovery entirely out
    of cache. This means it never blocks on locks, never waits for IO
    and runs entirely through the unlinked lists until it either
    completes or blocks and hangs because it has run out of log space.
    
    It runs out of log space because the background CIL push is
    scheduled but never runs. queue_work() queues the CIL work on the
    current CPU that is busy, and the workqueue code will not run it on
    any other CPU. Hence if the unlinked list processing never yields
    the CPU voluntarily, the push work is delayed indefinitely. This
    results in the CIL aggregating changes until all the log space is
    consumed.
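
    The starvation described above can be sketched with a toy userspace
    model. This is not kernel code: struct toy_cpu, toy_queue_push(),
    toy_yield() and toy_recover() are hypothetical names invented for
    illustration, and the log space, push threshold and yield interval
    are arbitrary numbers. The only behaviour it models is that work
    queued on a busy CPU runs only when the running loop gives up that
    CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one CPU with at most one push work item bound to it. */
struct toy_cpu {
	bool push_pending;	/* background push queued on this CPU */
	int  cil_used;		/* log space held by uncommitted changes */
	int  pushes_run;	/* how many times the push actually ran */
};

/* Hypothetical stand-in for queue_work(): the push is only marked pending. */
static void toy_queue_push(struct toy_cpu *cpu)
{
	cpu->push_pending = true;
}

/* Hypothetical stand-in for a voluntary yield: bound work runs only here. */
static void toy_yield(struct toy_cpu *cpu)
{
	if (cpu->push_pending) {
		cpu->cil_used = 0;	/* the push empties the CIL */
		cpu->push_pending = false;
		cpu->pushes_run++;
	}
}

/*
 * Process 'items' unlinked-list entries, each consuming one unit of log
 * space. A push is queued once usage reaches 'push_threshold'. The loop
 * yields every 'yield_every' items, or never if yield_every is 0.
 * Returns the number of items processed before log space ran out, or
 * 'items' if the whole list was processed.
 */
static int toy_recover(struct toy_cpu *cpu, int items, int log_space,
		       int push_threshold, int yield_every)
{
	assert(log_space > 0 && push_threshold > 0);

	for (int i = 0; i < items; i++) {
		if (cpu->cil_used >= log_space)
			return i;	/* out of log space: recovery would hang */
		cpu->cil_used++;
		if (cpu->cil_used >= push_threshold)
			toy_queue_push(cpu);
		if (yield_every > 0 && (i + 1) % yield_every == 0)
			toy_yield(cpu);
	}
	return items;
}
```

    In this model, toy_recover(&cpu, 1000, 100, 12, 0) on a zeroed
    toy_cpu stops after 100 items with the push still pending and never
    run, while the same call with yield_every set to 1 processes all
    1000 items because the pending push gets to run each time the
    threshold is reached.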
    
    When the log recovery processing eventually blocks, the CIL flushes.
    However, the last iclog is not submitted for IO because it is not
    full, so the CIL flush never completes. Nothing moves the log head
    forwards, or indeed inserts anything into the tail of the log, and
    hence nothing is able to get the log moving again and recovery
    hangs.
    
    There are several problems here, but the two obvious ones from
    the trace are that:
    	a) log recovery does not yield the CPU for over 4 seconds,
    	b) binding CIL pushes to a single CPU is a really bad idea.
    
    This patch addresses just these two aspects of the problem, and is
    suitable for backporting to work around any issues in older kernels.
    The more fundamental problem of preventing the CIL from consuming
    more than 50% of the log without committing will take more invasive
    and complex work, so will be done as followup work.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>