    mm: fix incorrect vbq reference in purge_fragmented_block · 8c61291f
    Zhaoyang Huang authored
    xa_for_each() in _vm_unmap_aliases() loops through all vbs.  However,
    since commit 062eacf5 ("mm: vmalloc: remove a global vmap_blocks
    xarray") the vb from xarray may not be on the corresponding CPU
    vmap_block_queue.  Consequently, purge_fragmented_block() might use the
    wrong vbq->lock to protect the free list, leading to vbq->free breakage.
    
    Incorrect lock protection can exhaust all vmalloc space as follows:
    CPU0                                            CPU1
    +--------------------------------------------+
    |    +--------------------+     +-----+      |
    +--> |                    |---->|     |------+
         | CPU1:vbq free_list |     | vb1 |
    +--- |                    |<----|     |<-----+
    |    +--------------------+     +-----+      |
    +--------------------------------------------+
    
    _vm_unmap_aliases()                             vb_alloc()
                                                    new_vmap_block()
    xa_for_each(&vbq->vmap_blocks, idx, vb)
    --> vb in CPU1:vbq->freelist
    
    purge_fragmented_block(vb)
    spin_lock(&vbq->lock)                           spin_lock(&vbq->lock)
    --> use CPU0:vbq->lock                          --> use CPU1:vbq->lock
    
    list_del_rcu(&vb->free_list)                    list_add_tail_rcu(&vb->free_list, &vbq->free)
        __list_del(vb->prev, vb->next)
            next->prev = prev
        +--------------------+
        |                    |
        | CPU1:vbq free_list |
    +---|                    |<--+
    |   +--------------------+   |
    +----------------------------+
                                                    __list_add(new, head->prev, head)
    +--------------------------------------------+
    |    +--------------------+     +-----+      |
    +--> |                    |---->|     |------+
         | CPU1:vbq free_list |     | vb2 |
    +--- |                    |<----|     |<-----+
    |    +--------------------+     +-----+      |
    +--------------------------------------------+
    
            prev->next = next
    +--------------------------------------------+
    |----------------------------+               |
    |    +--------------------+  |  +-----+      |
    +--> |                    |--+  |     |------+
         | CPU1:vbq free_list |     | vb2 |
    +--- |                    |<----|     |<-----+
    |    +--------------------+     +-----+      |
    +--------------------------------------------+
    The list is now corrupted: the head's next pointer points back at the
    head while its prev pointer still points at vb2. Any vb added through
    'prev' from here on cannot be reached by
    list_for_each_entry_rcu(vb, &vbq->free, free_list) in vb_alloc().
    Thus, vmalloc space is exhausted.
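
    The interleaving in the diagrams can be replayed on a single thread
    in user space. The following is a minimal sketch, not kernel code:
    struct list_head is a local stand-in for the kernel type, and
    init_list_head(), count_forward() and replay_race() are hypothetical
    helper names introduced only for this illustration. The stores are
    written out by hand in the order the diagrams show them.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Count nodes reachable by following ->next from head, which is the
 * direction a forward iterator walks. */
static int count_forward(const struct list_head *head)
{
    int n = 0;
    const struct list_head *p;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Replay the two-CPU interleaving sequentially.
 * Returns how many blocks a forward walk can still find afterwards. */
static int replay_race(struct list_head *head, struct list_head *vb1,
                       struct list_head *vb2)
{
    struct list_head *del_prev, *del_next, *add_prev;

    /* Start state: head <-> vb1 */
    init_list_head(head);
    head->next = vb1;
    head->prev = vb1;
    vb1->next = head;
    vb1->prev = head;

    /* "CPU0": deleting vb1, first store only (next->prev = prev). */
    del_prev = vb1->prev;          /* head */
    del_next = vb1->next;          /* head */
    del_next->prev = del_prev;     /* head->prev = head */

    /* "CPU1": tail-add of vb2 runs to completion in between.
     * It reads head->prev, which already points at head. */
    add_prev = head->prev;
    head->prev = vb2;
    vb2->next = head;
    vb2->prev = add_prev;
    add_prev->next = vb2;          /* head->next = vb2 */
    assert(count_forward(head) == 1);

    /* "CPU0": second store of the delete (prev->next = next)
     * overwrites the link CPU1 just published. */
    del_prev->next = del_next;     /* head->next = head */

    return count_forward(head);
}
```

    Calling replay_race() on three stack-allocated nodes leaves the head
    with next pointing at itself and prev pointing at vb2, so a forward
    walk finds no blocks even though vb2 was added.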
    
    This issue affects both erofs and f2fs; the stack traces are as follows:
    erofs:
    [<ffffffd4ffb93ad4>] __switch_to+0x174
    [<ffffffd4ffb942f0>] __schedule+0x624
    [<ffffffd4ffb946f4>] schedule+0x7c
    [<ffffffd4ffb947cc>] schedule_preempt_disabled+0x24
    [<ffffffd4ffb962ec>] __mutex_lock+0x374
    [<ffffffd4ffb95998>] __mutex_lock_slowpath+0x14
    [<ffffffd4ffb95954>] mutex_lock+0x24
    [<ffffffd4fef2900c>] reclaim_and_purge_vmap_areas+0x44
    [<ffffffd4fef25908>] alloc_vmap_area+0x2e0
    [<ffffffd4fef24ea0>] vm_map_ram+0x1b0
    [<ffffffd4ff1b46f4>] z_erofs_lz4_decompress+0x278
    [<ffffffd4ff1b8ac4>] z_erofs_decompress_queue+0x650
    [<ffffffd4ff1b8328>] z_erofs_runqueue+0x7f4
    [<ffffffd4ff1b66a8>] z_erofs_read_folio+0x104
    [<ffffffd4feeb6fec>] filemap_read_folio+0x6c
    [<ffffffd4feeb68c4>] filemap_fault+0x300
    [<ffffffd4fef0ecac>] __do_fault+0xc8
    [<ffffffd4fef0c908>] handle_mm_fault+0xb38
    [<ffffffd4ffb9f008>] do_page_fault+0x288
    [<ffffffd4ffb9ed64>] do_translation_fault[jt]+0x40
    [<ffffffd4fec39c78>] do_mem_abort+0x58
    [<ffffffd4ffb8c3e4>] el0_ia+0x70
    [<ffffffd4ffb8c260>] el0t_64_sync_handler[jt]+0xb0
    [<ffffffd4fec11588>] ret_to_user[jt]+0x0
    
    f2fs:
    [<ffffffd4ffb93ad4>] __switch_to+0x174
    [<ffffffd4ffb942f0>] __schedule+0x624
    [<ffffffd4ffb946f4>] schedule+0x7c
    [<ffffffd4ffb947cc>] schedule_preempt_disabled+0x24
    [<ffffffd4ffb962ec>] __mutex_lock+0x374
    [<ffffffd4ffb95998>] __mutex_lock_slowpath+0x14
    [<ffffffd4ffb95954>] mutex_lock+0x24
    [<ffffffd4fef2900c>] reclaim_and_purge_vmap_areas+0x44
    [<ffffffd4fef25908>] alloc_vmap_area+0x2e0
    [<ffffffd4fef24ea0>] vm_map_ram+0x1b0
    [<ffffffd4ff1a3b60>] f2fs_prepare_decomp_mem+0x144
    [<ffffffd4ff1a6c24>] f2fs_alloc_dic+0x264
    [<ffffffd4ff175468>] f2fs_read_multi_pages+0x428
    [<ffffffd4ff17b46c>] f2fs_mpage_readpages+0x314
    [<ffffffd4ff1785c4>] f2fs_readahead+0x50
    [<ffffffd4feec3384>] read_pages+0x80
    [<ffffffd4feec32c0>] page_cache_ra_unbounded+0x1a0
    [<ffffffd4feec39e8>] page_cache_ra_order+0x274
    [<ffffffd4feeb6cec>] do_sync_mmap_readahead+0x11c
    [<ffffffd4feeb6764>] filemap_fault+0x1a0
    [<ffffffd4ff1423bc>] f2fs_filemap_fault+0x28
    [<ffffffd4fef0ecac>] __do_fault+0xc8
    [<ffffffd4fef0c908>] handle_mm_fault+0xb38
    [<ffffffd4ffb9f008>] do_page_fault+0x288
    [<ffffffd4ffb9ed64>] do_translation_fault[jt]+0x40
    [<ffffffd4fec39c78>] do_mem_abort+0x58
    [<ffffffd4ffb8c3e4>] el0_ia+0x70
    [<ffffffd4ffb8c260>] el0t_64_sync_handler[jt]+0xb0
    [<ffffffd4fec11588>] ret_to_user[jt]+0x0
    
    To fix this, introduce a cpu field within vmap_block to record which
    CPU's vmap_block_queue this vb belongs to.
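
    The idea can be modelled in user space as follows. This is only a
    sketch under stated assumptions, not the patch itself: vb_model,
    vbq_model, NR_CPUS_MODEL, new_block_model() and queue_for_purge()
    are hypothetical names, and a plain array stands in for the per-CPU
    vmap_block_queue. It shows the block remembering its owning CPU at
    creation, and the purge path selecting the queue (and therefore the
    lock) from that record instead of from the CPU it happens to run on.

```c
#include <assert.h>

#define NR_CPUS_MODEL 2            /* hypothetical CPU count for the model */

/* Stand-in for a per-CPU queue; the real one holds a lock and a free list. */
struct vbq_model {
    int unused;
};

/* Stand-in for a vmap block, reduced to the new field. */
struct vb_model {
    unsigned int cpu;              /* CPU whose queue owns this block */
};

static struct vbq_model queues[NR_CPUS_MODEL];

/* Allocation path: record the owning CPU when the block is created. */
static void new_block_model(struct vb_model *vb, unsigned int this_cpu)
{
    assert(this_cpu < NR_CPUS_MODEL);
    vb->cpu = this_cpu;
}

/* Purge path: pick the queue from the block's record. The CPU running
 * the purge is deliberately ignored. */
static struct vbq_model *queue_for_purge(const struct vb_model *vb,
                                         unsigned int running_cpu)
{
    (void)running_cpu;
    return &queues[vb->cpu];
}
```

    With a block created on CPU 1, queue_for_purge() returns CPU 1's
    queue even when called with running_cpu set to 0, so both paths
    would serialize on the same lock.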
    
    Link: https://lkml.kernel.org/r/20240614021352.1822225-1-zhaoyang.huang@unisoc.com
    Link: https://lkml.kernel.org/r/20240607023116.1720640-1-zhaoyang.huang@unisoc.com
    Fixes: fc1e0d98 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
    Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
    Suggested-by: Hailong.Liu <hailong.liu@oppo.com>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>