    iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" · a837eca2
    Dave Chinner authored
    This reverts commit 61c6de66.
    
    The reverted commit added page reference counting to iomap page
    structures that are used to track block size < page size state. This
    was supposed to align the code with page migration page accounting
    assumptions, but what it has done instead is break XFS filesystems.
    Every fstests run I've done on sub-page block size XFS filesystems
    since picking up this commit 2 days ago has failed with bad page
    state errors such as:
    
    # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
    ....
    SECTION       -- xfs
    FSTYP         -- xfs (debug)
    PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
    MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
    MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
    
    generic/038 454s ...
     run fstests generic/038 at 2018-12-20 18:43:05
     XFS (sdc): Unmounting Filesystem
     XFS (sdc): Mounting V5 Filesystem
     XFS (sdc): Ending clean mount
     BUG: Bad page state in process kswapd0  pfn:3a7fa
     page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
     flags: 0xfffffc0000000()
     raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
     raw: 0000000000000001 0000000000000000 00000000ffffffff
     page dumped because: non-NULL mapping
     CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
     Call Trace:
      dump_stack+0x67/0x90
      bad_page.cold.116+0x8a/0xbd
      free_pcppages_bulk+0x4bf/0x6a0
      free_unref_page_list+0x10f/0x1f0
      shrink_page_list+0x49d/0xf50
      shrink_inactive_list+0x19d/0x3b0
      shrink_node_memcg.constprop.77+0x398/0x690
      ? shrink_slab.constprop.81+0x278/0x3f0
      shrink_node+0x7a/0x2f0
      kswapd+0x34b/0x6d0
      ? node_reclaim+0x240/0x240
      kthread+0x11f/0x140
      ? __kthread_bind_mask+0x60/0x60
      ret_from_fork+0x24/0x30
     Disabling lock debugging due to kernel taint
    ....
    
    The failures come from any code path that frees pages and empties
    the per-cpu page magazines, so the failure is neither predictable
    nor easy to debug.
    
    generic/038 is a reliable reproducer of this problem: it has a 9 in
    10 failure rate on one of my test machines. Failures on other
    machines have occurred at random points in fstests runs, but every
    run has ended up tripping this problem. Hence generic/038 was used
    to bisect the failure because it was the most reliable reproducer.
    
    It is too close to the 4.20 release (not to mention holidays) to
    try to diagnose, fix and test the underlying cause of the problem,
    so reverting the commit is the only option we have right now. The
    revert has been tested against a current top-of-tree 4.20-rc7+ kernel
    across multiple machines running sub-page block size XFS filesystems,
    and none of the bad page state failures have been seen.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: William Kucharski <william.kucharski@oracle.com>
    Cc: Darrick J. Wong <darrick.wong@oracle.com>
    Cc: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>