• Hugh Dickins's avatar
    tmpfs: fix shmem_evict_inode() warnings on i_blocks · 267a4c76
    Hugh Dickins authored
    Dmitry Vyukov provides a little program, autogenerated by syzkaller,
    which races a fault on a mapping of a sparse memfd object, against
    truncation of that object below the fault address: run repeatedly for a
    few minutes, it reliably generates shmem_evict_inode()'s
    WARN_ON(inode->i_blocks).
    
    (But there's nothing specific to memfd here, nor to the fstat which it
    happened to use to generate the fault: though that looked suspicious,
    since a shmem_recalc_inode() had been added there recently.  The same
    problem can be reproduced with open+unlink in place of memfd_create, and
    with fstatfs in place of fstat.)
    
    v3.7 commit 0f3c42f5 ("tmpfs: change final i_blocks BUG to WARNING")
    explains one cause of such a warning (a race with shmem_writepage to
    swap), and possible solutions; but we never took it further, and this
    syzkaller incident turns out to have a different cause.
    
    shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
    then found to be beyond eof, looks plausible - decrementing the alloced
    count that was just before incremented - but in fact can go wrong, if a
    racing thread (the truncator, for example) gets its shmem_recalc_inode()
    in just after our delete_from_page_cache().  delete_from_page_cache()
    decrements nrpages, that shmem_recalc_inode() will balance the books by
    decrementing alloced itself, then our decrement of alloced take it one
    too low: leading to the WARNING when the object is finally evicted.
    
    Once the new page has been exposed in the page cache,
    shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
    the accounting right in all cases (and not fall through from "trunc:" to
    "decused:").  Adjust that error recovery block; and the reinitialization
    of info and sbinfo can be removed too.
    
    While we're here, fix shmem_writepage() to avoid the original issue: it
    will be safe against a racing shmem_recalc_inode(), if it merely
    increments swapped before the shmem_delete_from_page_cache() which
    decrements nrpages (but it must then do its own shmem_recalc_inode()
    before that, while still in balance, instead of after).  (Aside: why do
    we shmem_recalc_inode() here in the swap path? Because its raison d'etre
    is to cope with clean sparse shmem pages being reclaimed behind our
    back: so here when swapping is a good place to look for that case.) But
    I've not now managed to reproduce this bug, even without the patch.
    
    I don't see why I didn't do that earlier: perhaps inhibited by the
    preference to eliminate shmem_recalc_inode() altogether.  Driven by this
    incident, I do now have a patch to do so at last; but still want to sit
    on it for a bit, there's a couple of questions yet to be resolved.
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    267a4c76
shmem.c 89.7 KB