    gfs2: Warn when a journal replay overwrites a rgrp with buffers · d14e1ca3
    Bob Peterson authored
    This patch adds some instrumentation in gfs2's journal replay that
    indicates when we're about to overwrite a rgrp for which we already
    have a valid buffer_head.
    
    When this problem occurs, this node has been granted a rgrp glock,
    has subsequently read in buffer_heads for it, and has possibly even
    made changes to the rgrp bits and/or allocation values. Another
    node has then failed and forced us to replay its journal, and that
    journal contains a copy of the same rgrp without a revoke. That
    means we're about to overwrite a rgrp that we now rightfully own
    with an obsolete copy. That is always a problem. It means the
    other node (which failed and left its journal to be replayed)
    failed to flush out its rgrp buffers, write out the revoke, and
    invalidate its copy before it released the glock to our possession.
    
    No node should ever release a glock until its metadata has been
    written to the journal, revoked, and invalidated.
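
The ordering rule above can be sketched as a small state check. This is
only an illustration under stated assumptions: struct held_glock,
can_release() and release_glock() are hypothetical names invented for
this sketch, not gfs2 functions, and the three flags stand in for the
real flush, revoke, and invalidate steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one glock held by a node (not a gfs2 type). */
struct held_glock {
    bool held;         /* this node currently owns the glock */
    bool flushed;      /* metadata written to the journal and flushed out */
    bool revoked;      /* revoke for the journaled blocks written */
    bool invalidated;  /* local cached copy invalidated */
};

/* Release is safe only after all three steps have completed. */
bool can_release(const struct held_glock *gl)
{
    assert(gl != 0);
    return gl->held && gl->flushed && gl->revoked && gl->invalidated;
}

/* Give up the glock; refuse (return false) if any step is missing. */
bool release_glock(struct held_glock *gl)
{
    if (!can_release(gl))
        return false;
    gl->held = false;
    return true;
}
```

In this model, a node that skipped the revoke gets false back from
release_glock() and keeps the glock, which is the state the failed node
in this report should have stayed in.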
    
    We also kludge around the problem: we refuse to replace our good
    copy with the journal's bad copy by not marking the buffer dirty.
    But we never do that silently, because it only wallpapers over a
    larger problem that still exists. IOW, if this situation can happen
    to this node, it can also happen to a different node, and we
    wouldn't even know it or be able to circumvent it. Suppose we have
    a 3-node cluster:
    Node 1 fails, leaving an obsolete rgrp block in its journal without
    a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is
    released and starts making changes, allocating and freeing blocks
    from the rgrp, etc. Node 3 replays the journal from node 1,
    unaware that it's about to overwrite node 2's changes.
    So we still need to be vocal and log the error to make it apparent
    that a corruption path still exists in gfs2.
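
As a rough illustration of the replay-time check described above, here
is a self-contained toy model. It is not the code added to lops.c:
struct cached_block, replay_block() and BLOCK_SIZE are hypothetical
names chosen for this sketch, and a plain char array stands in for a
buffer_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 16  /* toy block size, assumed for the sketch */

/* Hypothetical view of one block as seen by the replaying node. */
struct cached_block {
    unsigned long long blkno;  /* block address */
    bool is_rgrp;              /* block holds a resource group header */
    bool have_buffer;          /* this node already has a valid buffer */
    char data[BLOCK_SIZE];     /* contents this node currently sees */
    bool dirty;                /* replay scheduled a write-back */
};

/*
 * Replay one journaled copy of a block. Returns true if the journal
 * copy was applied. Returns false, after warning, if the block is a
 * rgrp for which we already hold a valid buffer; in that case the
 * buffer is left untouched and is not marked dirty.
 */
bool replay_block(struct cached_block *blk, const char *journal_copy)
{
    assert(blk != NULL && journal_copy != NULL);

    if (blk->is_rgrp && blk->have_buffer) {
        fprintf(stderr,
                "replay: journal would overwrite rgrp 0x%llx with buffers\n",
                blk->blkno);
        return false;
    }

    memcpy(blk->data, journal_copy, BLOCK_SIZE);
    blk->dirty = true;
    return true;
}
```

Note that this guard only protects a node that both replays the journal
and holds the rgrp. In the 3-node case above, node 3 has no buffer for
the rgrp, so a check like this would not fire and node 2's changes
would still be overwritten.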
    Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
lops.c 27.2 KB