    ocfs2: issue zeroout to EOF blocks · 9449ad33
    Junxiao Bi authored
    When punching holes in EOF blocks, fallocate used buffered writes to
    zero the EOF blocks in the last cluster.  But since ->writepage ignores
    EOF pages, those zeros are never flushed to disk.
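
To make "EOF blocks in the last cluster" concrete, here is a small user-space sketch. It assumes that a page counts as an EOF page when it starts at or beyond i_size, and that the cluster size is a multiple of the page size. The function names are hypothetical and are not ocfs2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a page is an "EOF page" when it starts at or beyond
 * i_size, so writeback has no in-file bytes to write from it. */
static bool is_eof_page(uint64_t index, uint64_t i_size, uint64_t page_size)
{
    assert(page_size > 0);
    return index * page_size >= i_size;
}

/* Number of pages between EOF and the end of the cluster holding EOF.
 * Zeros written into these pages through the page cache would land only
 * in pages that writeback skips. */
static uint64_t eof_pages_in_last_cluster(uint64_t i_size, uint64_t page_size,
                                          uint64_t cluster_size)
{
    assert(page_size > 0 && cluster_size % page_size == 0);
    uint64_t cluster_end =
        (i_size + cluster_size - 1) / cluster_size * cluster_size;
    uint64_t first_eof_page = (i_size + page_size - 1) / page_size;
    return cluster_end / page_size - first_eof_page;
}
```

For example, with 4 KiB pages, a 1 MiB cluster and i_size = 5000, pages 2 through 255 (254 pages) are EOF pages inside the last cluster.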
    
    This "looks" OK, since commit 6bba4471 ("ocfs2: fix data corruption by
    fallocate") zeroes the EOF blocks when the file size is extended, but it
    isn't.  The problem lies in those EOF pages.  Before writeback, the pages
    had the DIRTY flag set, and so did every buffer_head in them.  When
    writeback ran through write_cache_pages(), the DIRTY flag on the page
    was cleared, but the DIRTY flag on its buffer_heads was not.
    
    When the next write hit one of those EOF pages, its buffer_head already
    had the DIRTY flag set, so the page was not marked DIRTY again.  As a
    result, writeback ignored the page forever, which causes data
    corruption.  Even direct I/O writes do not avoid it: a direct write
    first tries to drop the page cache for the range, fails because the
    buffer_heads of those pages still have the DIRTY flag set, and then
    falls back to buffered I/O.
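
The dirty-flag sequence above can be replayed in a tiny user-space state machine. This is a hypothetical model of the behaviour the commit message describes, not kernel code: the page is re-dirtied only when a buffer_head goes from clean to dirty (as mark_buffer_dirty() does), and writeback of an EOF page clears only the page flag. The 4 KiB page split into 1 KiB blocks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define BHS_PER_PAGE 4 /* assumption: 4 KiB page split into 1 KiB blocks */

/* Hypothetical model of one page-cache page and its buffer_heads. */
struct model_page {
    bool page_dirty;
    bool bh_dirty[BHS_PER_PAGE];
    bool on_disk[BHS_PER_PAGE]; /* latest data for the block is on disk */
};

/* Buffered write to block i: the page is dirtied only when the
 * buffer_head transitions from clean to dirty. */
static void model_buffered_write(struct model_page *p, int i)
{
    assert(i >= 0 && i < BHS_PER_PAGE);
    p->on_disk[i] = false;
    if (!p->bh_dirty[i]) {
        p->bh_dirty[i] = true;
        p->page_dirty = true;
    }
}

/* One writeback pass: only dirty pages are visited and their page DIRTY
 * flag is cleared; an EOF page is skipped without writing its blocks or
 * cleaning its buffer_heads. */
static void model_writeback(struct model_page *p, bool page_is_eof)
{
    if (!p->page_dirty)
        return;
    p->page_dirty = false;
    if (page_is_eof)
        return;
    for (int i = 0; i < BHS_PER_PAGE; i++) {
        if (p->bh_dirty[i]) {
            p->bh_dirty[i] = false;
            p->on_disk[i] = true;
        }
    }
}
```

Zeroing every block of an EOF page, running writeback, and then writing to the page again leaves the page clean with all buffer_heads dirty, so a later writeback pass never reaches the disk even once the page is no longer past EOF.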
    
    In summary: because writeback ignores EOF pages, once an EOF page is
    created, every write to it goes only to the page cache and is never
    flushed to disk, even after the file size is extended and the page is
    no longer an EOF page.  The fix is to stop zeroing EOF blocks with
    buffered writes and to issue a zeroout to those blocks instead.
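
Issuing a zeroout means working in whole disk blocks rather than pages. The sketch below only computes which blocks such a zeroout could cover; it is a hypothetical helper, not the code from this commit. It assumes the range is trimmed at the next cluster boundary at or after start, and that blocks only partly inside the range are left out (start is rounded up, end is rounded down).

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a > 0). */
static uint64_t round_up_u64(uint64_t x, uint64_t a)
{
    return (x + a - 1) / a * a;
}

/* Hypothetical helper: whole blocks to zero on disk for the byte range
 * [start, start + len), trimmed at the next cluster boundary at or after
 * start.  Result is the block range [*first, *first + *count). */
static void eof_zero_blocks(uint64_t start, uint64_t len, uint64_t block_size,
                            uint64_t cluster_size, uint64_t *first,
                            uint64_t *count)
{
    assert(block_size > 0 && cluster_size > 0 &&
           cluster_size % block_size == 0);
    uint64_t end = round_up_u64(start, cluster_size); /* cluster boundary */
    if (start + len < end)
        end = start + len;
    uint64_t first_blk = round_up_u64(start, block_size) / block_size;
    uint64_t end_blk = end / block_size; /* round down: skip partial tail */
    *first = first_blk;
    *count = end_blk > first_blk ? end_blk - first_blk : 0;
}
```

A block range like this could then be handed to a block-layer zeroout instead of dirtying page-cache pages, which sidesteps the skipped-EOF-page problem entirely.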
    
    The following strace excerpt from qemu-img shows a system-call sequence
    that could trigger the corruption.
    
      656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
      ...
      660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
      660   fallocate(11, 0, 2275868672, 327680) = 0
      658   pwrite64(11, "
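
The call order in the trace can be replayed from user space with the sketch below. It is a hypothetical reproducer, not part of the commit: the pwrite64 offset and data are truncated in the trace, so writing a filled buffer into the punched range is an assumption, as are the function name and the 4096-byte buffer alignment used for O_DIRECT. On an unpatched ocfs2 volume the interesting check would be whether the data survives a cache drop or remount; the sketch itself only performs the calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical reproducer: replay punch-hole, allocate, write on `path`
 * (which must already exist).  Returns 0 on success or -errno. */
static int replay_punch_then_alloc(const char *path, off_t off, off_t len,
                                   int use_direct, char fill)
{
    assert(len > 0);
    int flags = O_RDWR | O_CLOEXEC | (use_direct ? O_DIRECT : 0);
    int fd = open(path, flags);
    if (fd < 0)
        return -errno;

    /* O_DIRECT needs an aligned buffer; 4096 is an assumed alignment. */
    void *buf = NULL;
    if (posix_memalign(&buf, 4096, (size_t)len) != 0) {
        close(fd);
        return -ENOMEM;
    }
    memset(buf, fill, (size_t)len);

    int ret = 0;
    /* 1. Punch a hole without changing the file size. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, off, len) < 0) {
        ret = -errno;
    /* 2. Allocate the same range again, extending the file size. */
    } else if (fallocate(fd, 0, off, len) < 0) {
        ret = -errno;
    /* 3. Write into the range (assumed target; the trace is truncated). */
    } else {
        ssize_t n = pwrite(fd, buf, (size_t)len, off);
        if (n < 0)
            ret = -errno;
        else if (n != len)
            ret = -EIO;
    }

    free(buf);
    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}
```

Passing use_direct = 1 mirrors the O_DIRECT open in the trace; the filesystem must support O_DIRECT for that variant.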
    
    Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>