    mm/migrate: fix deadlock in migrate_pages_batch() on large folios · 2e6506e1
    Gao Xiang authored
    Currently, migrate_pages_batch() can hold the locks of multiple folios
    at once, taken in an arbitrary order.  Although folio_trylock() is used
    to avoid deadlock, as commit 2ef7dbb2 ("migrate_pages: try migrate in
    batch asynchronously firstly") mentioned, try_split_folio() was
    overlooked and still takes the folio lock unconditionally.
    
    This was found by a compaction stress test after I explicitly enabled
    large folios for EROFS compressed files; I cannot reproduce it with the
    same workload when large folio support is off (current mainline).
    Typically, filesystem reads (with locked file-backed folios) could use
    another bdev/meta inode to issue other I/Os (e.g. for inode extent
    metadata or for caching compressed data), so the locking order will be:
    
      file-backed folios  (A)
         bdev/meta folios (B)
    
    The following call traces show the deadlock:
       Thread 1 takes (B) lock and tries to take folio (A) lock
       Thread 2 takes (A) lock and tries to take folio (B) lock
    
    [Thread 1]
    INFO: task stress:1824 blocked for more than 30 seconds.
          Tainted: G           OE      6.10.0-rc7+ #6
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:stress          state:D stack:0     pid:1824  tgid:1824  ppid:1822   flags:0x0000000c
    Call trace:
     __switch_to+0xec/0x138
     __schedule+0x43c/0xcb0
     schedule+0x54/0x198
     io_schedule+0x44/0x70
     folio_wait_bit_common+0x184/0x3f8
    			<-- folio mapping ffff00036d69cb18 index 996  (**)
     __folio_lock+0x24/0x38
     migrate_pages_batch+0x77c/0xea0	// try_split_folio (mm/migrate.c:1486:2)
    					// migrate_pages_batch (mm/migrate.c:1734:16)
    		<--- LIST_HEAD(unmap_folios) has
    			..
    			folio mapping 0xffff0000d184f1d8 index 1711;   (*)
    			folio mapping 0xffff0000d184f1d8 index 1712;
    			..
     migrate_pages+0xb28/0xe90
     compact_zone+0xa08/0x10f0
     compact_node+0x9c/0x180
     sysctl_compaction_handler+0x8c/0x118
     proc_sys_call_handler+0x1a8/0x280
     proc_sys_write+0x1c/0x30
     vfs_write+0x240/0x380
     ksys_write+0x78/0x118
     __arm64_sys_write+0x24/0x38
     invoke_syscall+0x78/0x108
     el0_svc_common.constprop.0+0x48/0xf0
     do_el0_svc+0x24/0x38
     el0_svc+0x3c/0x148
     el0t_64_sync_handler+0x100/0x130
     el0t_64_sync+0x190/0x198
    
    [Thread 2]
    INFO: task stress:1825 blocked for more than 30 seconds.
          Tainted: G           OE      6.10.0-rc7+ #6
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:stress          state:D stack:0     pid:1825  tgid:1825  ppid:1822   flags:0x0000000c
    Call trace:
     __switch_to+0xec/0x138
     __schedule+0x43c/0xcb0
     schedule+0x54/0x198
     io_schedule+0x44/0x70
     folio_wait_bit_common+0x184/0x3f8
    			<-- folio = 0xfffffdffc6b503c0 (mapping == 0xffff0000d184f1d8 index == 1711) (*)
     __folio_lock+0x24/0x38
     z_erofs_runqueue+0x384/0x9c0 [erofs]
     z_erofs_readahead+0x21c/0x350 [erofs]       <-- folio mapping 0xffff00036d69cb18 range from [992, 1024] (**)
     read_pages+0x74/0x328
     page_cache_ra_order+0x26c/0x348
     ondemand_readahead+0x1c0/0x3a0
     page_cache_sync_ra+0x9c/0xc0
     filemap_get_pages+0xc4/0x708
     filemap_read+0x104/0x3a8
     generic_file_read_iter+0x4c/0x150
     vfs_read+0x27c/0x330
     ksys_pread64+0x84/0xd0
     __arm64_sys_pread64+0x28/0x40
     invoke_syscall+0x78/0x108
     el0_svc_common.constprop.0+0x48/0xf0
     do_el0_svc+0x24/0x38
     el0_svc+0x3c/0x148
     el0t_64_sync_handler+0x100/0x130
     el0t_64_sync+0x190/0x198
    
    Link: https://lkml.kernel.org/r/20240729021306.398286-1-hsiangkao@linux.alibaba.com
    Fixes: 5dfab109 ("migrate_pages: batch _unmap and _move")
    Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>