1. 09 Dec, 2022 34 commits
    • ext4: handle redirtying in ext4_bio_write_page() · 04e568a3
      Jan Kara authored
      Since we want to transition transaction commits to use ext4_writepages()
      for writing back ordered data, add handling of page redirtying into
      ext4_bio_write_page(). Also move buffer dirty bit clearing into the same
      place as other buffer state handling.
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221207112722.22220-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      04e568a3
    • ext4: fix kernel BUG in 'ext4_write_inline_data_end()' · 5c099c4f
      Ye Bin authored
      Syzbot reported the following issue:
      ------------[ cut here ]------------
      kernel BUG at fs/ext4/inline.c:227!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 3629 Comm: syz-executor212 Not tainted 6.1.0-rc5-syzkaller-00018-g59d0d52c #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      RIP: 0010:ext4_write_inline_data+0x344/0x3e0 fs/ext4/inline.c:227
      RSP: 0018:ffffc90003b3f368 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff8880704e16c0 RCX: 0000000000000000
      RDX: ffff888021763a80 RSI: ffffffff821e31a4 RDI: 0000000000000006
      RBP: 000000000006818e R08: 0000000000000006 R09: 0000000000068199
      R10: 0000000000000079 R11: 0000000000000000 R12: 000000000000000b
      R13: 0000000000068199 R14: ffffc90003b3f408 R15: ffff8880704e1c82
      FS:  000055555723e3c0(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fffe8ac9080 CR3: 0000000079f81000 CR4: 0000000000350ee0
      Call Trace:
       <TASK>
       ext4_write_inline_data_end+0x2a3/0x12f0 fs/ext4/inline.c:768
       ext4_write_end+0x242/0xdd0 fs/ext4/inode.c:1313
       ext4_da_write_end+0x3ed/0xa30 fs/ext4/inode.c:3063
       generic_perform_write+0x316/0x570 mm/filemap.c:3764
       ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:285
       ext4_file_write_iter+0x8bc/0x16e0 fs/ext4/file.c:700
       call_write_iter include/linux/fs.h:2191 [inline]
       do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
       do_iter_write+0x182/0x700 fs/read_write.c:861
       vfs_iter_write+0x74/0xa0 fs/read_write.c:902
       iter_file_splice_write+0x745/0xc90 fs/splice.c:686
       do_splice_from fs/splice.c:764 [inline]
       direct_splice_actor+0x114/0x180 fs/splice.c:931
       splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
       do_splice_direct+0x1ab/0x280 fs/splice.c:974
       do_sendfile+0xb19/0x1270 fs/read_write.c:1255
       __do_sys_sendfile64 fs/read_write.c:1323 [inline]
       __se_sys_sendfile64 fs/read_write.c:1309 [inline]
       __x64_sys_sendfile64+0x1d0/0x210 fs/read_write.c:1309
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      ---[ end trace 0000000000000000 ]---
      
      The above issue may happen as follows:
      ext4_da_write_begin
        ext4_da_write_inline_data_begin
          ext4_da_convert_inline_data_to_extent
            ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
      ext4_da_write_end
      
      ext4_run_li_request
        ext4_mb_prefetch
          ext4_read_block_bitmap_nowait
            ext4_validate_block_bitmap
              ext4_mark_group_bitmap_corrupted(sb, block_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT)
                percpu_counter_sub(&sbi->s_freeclusters_counter,grp->bb_free);
                  -> sbi->s_freeclusters_counter becomes zero
      ext4_da_write_begin
        if (ext4_nonda_switch(inode->i_sb)) -> returns true because freeclusters_counter is zero
          *fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
          ext4_write_begin
      ext4_da_write_end
        if (write_mode == FALL_BACK_TO_NONDELALLOC)
          ext4_write_end
            if (inline_data)
              ext4_write_inline_data_end
                ext4_write_inline_data
                  BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
                  -> The inode has already been converted to extents, so 'pos + len' > inline_size
                  -> and the BUG is triggered.
      
      To solve this issue, instead of checking ext4_has_inline_data() which
      is only cleared after data has been written back, check the
      EXT4_STATE_MAY_INLINE_DATA flag in ext4_write_end().
      
      Fixes: f19d5870 ("ext4: add normal write support for inline data")
      Reported-by: syzbot+4faa160fa96bfba639f8@syzkaller.appspotmail.com
      Reported-by: Jun Nie <jun.nie@linaro.org>
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20221206144134.1919987-1-yebin@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      5c099c4f
    • ext4: make ext4_mb_initialize_context return void · d73eff68
      Guoqing Jiang authored
      Change the return type to void since it always returns 0, so there is
      no need to check the result in ext4_mb_new_blocks().
      Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221202120409.24098-1-guoqing.jiang@linux.dev
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d73eff68
    • ext4: fix deadlock due to mbcache entry corruption · a44e84a9
      Jan Kara authored
      When manipulating xattr blocks, we can deadlock infinitely looping
      inside ext4_xattr_block_set() where we constantly keep finding xattr
      block for reuse in mbcache but we are unable to reuse it because its
      reference count is too big. This happens because cache entry for the
      xattr block is marked as reusable (e_reusable set) although its
      reference count is too big. When this inconsistency happens, this
      inconsistent state is kept indefinitely and so ext4_xattr_block_set()
      keeps retrying indefinitely.
      
      The inconsistent state is caused by non-atomic update of e_reusable bit.
      e_reusable is part of a bitfield and e_reusable update can race with
      update of e_referenced bit in the same bitfield resulting in loss of one
      of the updates. Fix the problem by using atomic bitops instead.
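
      The lost-update problem described above can be sketched in user-space C.
      This is only an illustration using C11 atomics, not the kernel's bitops
      API; the struct, the flag names and the helper functions are all
      hypothetical. The point is that each flag change is a single atomic
      read-modify-write on the shared word, so concurrent updates of two
      different flags cannot overwrite each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the names echo the commit text, not the kernel layout. */
enum {
    ENTRY_REFERENCED = 1u << 0,
    ENTRY_REUSABLE   = 1u << 1,
};

/* Hypothetical cache entry: both flags share one atomically updated word. */
struct cache_entry {
    atomic_uint flags;
};

/* Set one flag with a single atomic OR; other bits are left untouched. */
void entry_set_flag(struct cache_entry *e, unsigned int flag)
{
    assert(e != NULL);
    atomic_fetch_or(&e->flags, flag);
}

/* Clear one flag with a single atomic AND; other bits are left untouched. */
void entry_clear_flag(struct cache_entry *e, unsigned int flag)
{
    assert(e != NULL);
    atomic_fetch_and(&e->flags, ~flag);
}

/* Read the current state of one flag. */
bool entry_test_flag(struct cache_entry *e, unsigned int flag)
{
    assert(e != NULL);
    return (atomic_load(&e->flags) & flag) != 0;
}
```

      With a plain C bitfield, setting one flag compiles to a load, a modify
      and a store of the whole containing word, which is how one of two racing
      updates can be lost.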
      
      This bug has been around for many years, but it became *much* easier
      to hit after commit 65f8b800 ("ext4: fix race when reusing xattr
      blocks").
      
      Cc: stable@vger.kernel.org
      Fixes: 6048c64b ("mbcache: add reusable flag to cache entries")
      Fixes: 65f8b800 ("ext4: fix race when reusing xattr blocks")
      Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Reported-by: Thilo Fromm <t-lo@linux.microsoft.com>
      Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      a44e84a9
    • ext4: avoid BUG_ON when creating xattrs · b40ebaf6
      Jan Kara authored
      Commit fb0a387d ("ext4: limit block allocations for indirect-block
      files to < 2^32") added code to try to allocate xattr block with 32-bit
      block number for indirect block based files on the grounds that these
      files cannot use larger block numbers. It also added BUG_ON when
      allocated block could not fit into 32 bits. This is however bogus
      reasoning because xattr block is stored in inode->i_file_acl and
      inode->i_file_acl_hi and as such even indirect block based files can
      happily use full 48 bits for xattr block number. The proper handling
      seems to be there basically since 64-bit block number support was added.
      So remove the bogus limitation and BUG_ON.
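
      To see why 48 bits are available, here is a minimal sketch of a block
      number split across a 32-bit low part and a 16-bit high part, as the
      message describes for i_file_acl and i_file_acl_hi. The struct and helper
      names are hypothetical and the real on-disk handling (endianness, feature
      checks) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory model of the two fields that hold the xattr block number. */
struct xattr_block_ref {
    uint32_t file_acl_lo;   /* low 32 bits */
    uint16_t file_acl_hi;   /* high 16 bits */
};

/* Split a block number into the two fields; it must fit in 48 bits. */
struct xattr_block_ref xattr_block_split(uint64_t blk)
{
    struct xattr_block_ref ref;

    assert(blk < (UINT64_C(1) << 48));
    ref.file_acl_lo = (uint32_t)(blk & 0xFFFFFFFFu);
    ref.file_acl_hi = (uint16_t)(blk >> 32);
    return ref;
}

/* Recombine the two fields into the full block number. */
uint64_t xattr_block_join(struct xattr_block_ref ref)
{
    return ((uint64_t)ref.file_acl_hi << 32) | ref.file_acl_lo;
}
```

      A block number above 2^32 round-trips through these two fields without
      loss, which is why the 32-bit restriction is unnecessary for xattr blocks.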
      
      Cc: Eric Sandeen <sandeen@redhat.com>
      Fixes: fb0a387d ("ext4: limit block allocations for indirect-block files to < 2^32")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221121130929.32031-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      b40ebaf6
    • fs: ext4: initialize fsdata in pagecache_write() · 956510c0
      Alexander Potapenko authored
      When aops->write_begin() does not initialize fsdata, KMSAN reports
      an error passing the latter to aops->write_end().
      
      Fix this by unconditionally initializing fsdata.
      
      Cc: Eric Biggers <ebiggers@kernel.org>
      Fixes: c93d8f88 ("ext4: add basic fs-verity support")
      Reported-by: syzbot+9767be679ef5016b6082@syzkaller.appspotmail.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221121112134.407362-1-glider@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      956510c0
    • ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline · 131294c3
      Eric Whitney authored
      When converting files with inline data to extents, delayed allocations
      made on a file system created with both the bigalloc and inline options
      can result in invalid extent status cache content, incorrect reserved
      cluster counts, kernel memory leaks, and potential kernel panics.
      
      With bigalloc, the code that determines whether a block must be
      delayed allocated searches the extent tree to see if that block maps
      to a previously allocated cluster.  If not, the block is delayed
      allocated, and otherwise, it isn't.  However, if the inline option is
      also used, and if the file containing the block is marked as able to
      store data inline, there isn't a valid extent tree associated with
      the file.  The current code in ext4_clu_mapped() calls
      ext4_find_extent() to search the non-existent tree for a previously
      allocated cluster anyway, which typically finds nothing, as desired.
      However, a side effect of the search can be to cache invalid content
      from the non-existent tree (garbage) in the extent status tree,
      including bogus entries in the pending reservation tree.
      
      To fix this, avoid searching the extent tree when allocating blocks
      for bigalloc + inline files that are being converted from inline to
      extent mapped.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Link: https://lore.kernel.org/r/20221117152207.2424-1-enwlinux@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      131294c3
    • ext4: fix uninititialized value in 'ext4_evict_inode' · 7ea71af9
      Ye Bin authored
      Syzbot found the following issue:
      =====================================================
      BUG: KMSAN: uninit-value in ext4_evict_inode+0xdd/0x26b0 fs/ext4/inode.c:180
       ext4_evict_inode+0xdd/0x26b0 fs/ext4/inode.c:180
       evict+0x365/0x9a0 fs/inode.c:664
       iput_final fs/inode.c:1747 [inline]
       iput+0x985/0xdd0 fs/inode.c:1773
       __ext4_new_inode+0xe54/0x7ec0 fs/ext4/ialloc.c:1361
       ext4_mknod+0x376/0x840 fs/ext4/namei.c:2844
       vfs_mknod+0x79d/0x830 fs/namei.c:3914
       do_mknodat+0x47d/0xaa0
       __do_sys_mknodat fs/namei.c:3992 [inline]
       __se_sys_mknodat fs/namei.c:3989 [inline]
       __ia32_sys_mknodat+0xeb/0x150 fs/namei.c:3989
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
       do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:246
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      
      Uninit was created at:
       __alloc_pages+0x9f1/0xe80 mm/page_alloc.c:5578
       alloc_pages+0xaae/0xd80 mm/mempolicy.c:2285
       alloc_slab_page mm/slub.c:1794 [inline]
       allocate_slab+0x1b5/0x1010 mm/slub.c:1939
       new_slab mm/slub.c:1992 [inline]
       ___slab_alloc+0x10c3/0x2d60 mm/slub.c:3180
       __slab_alloc mm/slub.c:3279 [inline]
       slab_alloc_node mm/slub.c:3364 [inline]
       slab_alloc mm/slub.c:3406 [inline]
       __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
       kmem_cache_alloc_lru+0x6f3/0xb30 mm/slub.c:3429
       alloc_inode_sb include/linux/fs.h:3117 [inline]
       ext4_alloc_inode+0x5f/0x860 fs/ext4/super.c:1321
       alloc_inode+0x83/0x440 fs/inode.c:259
       new_inode_pseudo fs/inode.c:1018 [inline]
       new_inode+0x3b/0x430 fs/inode.c:1046
       __ext4_new_inode+0x2a7/0x7ec0 fs/ext4/ialloc.c:959
       ext4_mkdir+0x4d5/0x1560 fs/ext4/namei.c:2992
       vfs_mkdir+0x62a/0x870 fs/namei.c:4035
       do_mkdirat+0x466/0x7b0 fs/namei.c:4060
       __do_sys_mkdirat fs/namei.c:4075 [inline]
       __se_sys_mkdirat fs/namei.c:4073 [inline]
       __ia32_sys_mkdirat+0xc4/0x120 fs/namei.c:4073
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
       do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:246
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      
      CPU: 1 PID: 4625 Comm: syz-executor.2 Not tainted 6.1.0-rc4-syzkaller-62821-gcb231e2f67ec #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      =====================================================
      
      Currently, 'ext4_alloc_inode()' does not initialize 'ei->i_flags'. If creating
      a new inode fails in '__ext4_new_inode()' before 'ei->i_flags' is set, 'iput()'
      is called on it. Since commit 6bc0d63d, 'ext4_evict_inode()' reads
      'ei->i_flags', so this path accesses an uninitialized value.
      To solve this issue, initialize 'ei->i_flags' in 'ext4_alloc_inode()'.
      
      Reported-by: syzbot+57b25da729eb0b88177d@syzkaller.appspotmail.com
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Fixes: 6bc0d63d ("ext4: remove EA inode entry from mbcache on inode eviction")
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221117073603.2598882-1-yebin@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      7ea71af9
    • ext4: fix corruption when online resizing a 1K bigalloc fs · 0aeaa255
      Baokun Li authored
      When a backup superblock is updated in update_backups(), the primary
      superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used
      as the backup superblock's offset in its group. However, when the block
      size is 1K and bigalloc is enabled, the two offsets are not equal. This
      causes the backup group descriptors to be overwritten by the superblock
      in update_backups(). Moreover, if meta_bg is enabled, the file system will
      be corrupted because this feature uses backup group descriptors.
      
      To solve this issue, we use a more accurate ext4_group_first_block_no() as
      the offset of the backup superblock in its group.
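
      The arithmetic behind a "first block of the group" helper can be sketched
      as follows. This is a user-space model with a hypothetical geometry
      struct, not the kernel function itself. It assumes a layout in which the
      primary superblock sits in block 1 while group 0 starts at block 0, so
      the primary's in-group offset (1) differs from that of a backup, which
      sits at the very first block of its group (offset 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the file system geometry needed for the calculation. */
struct fs_geometry {
    uint64_t blocks_per_group;
    uint32_t first_data_block;   /* block number where group 0 starts */
};

/* First block of a group: group index times group size, plus the start of group 0. */
uint64_t group_first_block_no(const struct fs_geometry *geo, uint32_t group)
{
    assert(geo != NULL);
    return (uint64_t)group * geo->blocks_per_group + geo->first_data_block;
}
```

      With a made-up geometry of 8192 blocks per group and group 0 starting at
      block 0, group 3 begins at block 24576. Reusing the primary's block
      number 1 as an in-group offset would instead point at block 24577, one
      block past the backup superblock.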
      
      Fixes: d77147ff ("ext4: add support for online resizing with bigalloc")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20221117040341.1380702-4-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0aeaa255
    • ext4: fix corrupt backup group descriptors after online resize · 8f49ec60
      Baokun Li authored
      In commit 9a8c5b0d ("ext4: update the backup superblock's at the end
      of the online resize"), it is assumed that update_backups() only updates
      backup superblocks, so each b_data is treated as a backup superblock to
      update its s_block_group_nr and s_checksum. However, update_backups()
      also updates the backup group descriptors, which causes the backup group
      descriptors to be corrupted.
      
      The above commit fixes the problem of invalid checksum of the backup
      superblock. The root cause of this problem is that the checksum of
      ext4_update_super() is not set correctly. This problem has been fixed
      in the previous patch ("ext4: fix bad checksum after online resize").
      
      However, we do need to set block_group_nr for the backup superblock in
      update_backups(). When a block is in a group that contains a backup
      superblock, and the block is the first block in the group, the block is
      definitely a superblock. We add a helper function that includes setting
      s_block_group_nr and updating checksum, and then call it only when the
      above conditions are met to prevent the backup group descriptors from
      being incorrectly modified.
      
      Fixes: 9a8c5b0d ("ext4: update the backup superblock's at the end of the online resize")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20221117040341.1380702-3-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      8f49ec60
    • ext4: fix bad checksum after online resize · a408f33e
      Baokun Li authored
      When online resizing is performed twice consecutively, the error message
      "Superblock checksum does not match superblock" is displayed for the
      second time. Here's the reproducer:
      
      	mkfs.ext4 -F /dev/sdb 100M
      	mount /dev/sdb /tmp/test
      	resize2fs /dev/sdb 5G
      	resize2fs /dev/sdb 6G
      
      To solve this issue, move the checksum update to after
      es->s_overhead_clusters is updated.
      
      Fixes: 026d0d27 ("ext4: reduce computation of overhead during resize")
      Fixes: de394a86 ("ext4: update s_overhead_clusters in the superblock during an on-line resize")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/r/20221117040341.1380702-2-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      a408f33e
    • ext4: don't fail GETFSUUID when the caller provides a long buffer · a7e9d977
      Darrick J. Wong authored
      If userspace provides a longer UUID buffer than is required, we
      shouldn't fail the call with EINVAL -- rather, we can fill the caller's
      buffer with the bytes we /can/ fill, and update the length field to
      reflect what we copied.  This doesn't break the UAPI since we're
      enabling a case that currently fails, and so far Ted hasn't released a
      version of e2fsprogs that uses the new ext4 ioctl.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
      Link: https://lore.kernel.org/r/166811139478.327006.13879198441587445544.stgit@magnolia
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      a7e9d977
    • ext4: dont return EINVAL from GETFSUUID when reporting UUID length · b76abb51
      Darrick J. Wong authored
      If userspace calls this ioctl with fsu_length (the length of the
      fsuuid.fsu_uuid array) set to zero, ext4 copies the desired uuid length
      out to userspace.  The kernel call returned a result from a valid input,
      so the return value here should be zero, not EINVAL.
      
      While we're at it, fix the copy_to_user call to make it clear that we're
      only copying out fsu_len.
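
      The length handling described in this entry and in the long-buffer entry
      above it can be modelled in plain C. This is a user-space sketch with
      hypothetical names, not the ioctl implementation; in particular, treating
      a non-zero length shorter than the UUID as EINVAL is an assumption that
      neither commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_SIZE 16u   /* size of a file system UUID in bytes */

/*
 * Hypothetical model of the call: *len is the caller's buffer length on
 * entry and the number of meaningful bytes on return.
 */
int model_getfsuuid(const uint8_t fs_uuid[MODEL_UUID_SIZE],
                    uint32_t *len, uint8_t *buf)
{
    assert(len != NULL);

    /* Length query: report the required size and succeed. */
    if (*len == 0) {
        *len = MODEL_UUID_SIZE;
        return 0;
    }

    /* Assumption: a buffer that cannot hold the UUID is still rejected. */
    if (*len < MODEL_UUID_SIZE)
        return -EINVAL;

    /* Exact or oversized buffer: copy what exists and report how much. */
    assert(buf != NULL);
    memcpy(buf, fs_uuid, MODEL_UUID_SIZE);
    *len = MODEL_UUID_SIZE;
    return 0;
}
```

      A caller can therefore pass a zero length first to learn the size, or
      simply pass a generously sized buffer and read back the updated length.
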
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
      Link: https://lore.kernel.org/r/166811138914.327006.9241306894437166566.stgit@magnolia
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      b76abb51
    • ext4: fix error code return to user-space in ext4_get_branch() · 26d75a16
      Luís Henriques authored
      If a block is out of range in ext4_get_branch(), -ENOMEM is returned
      to user-space.  This error code is misleading, since the failure has
      nothing to do with memory.  Fix it by making sure the right error code
      (-EFSCORRUPTED) is propagated to user-space, where it shows up as
      EUCLEAN, which is more informative than ENOMEM.
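
      The relationship between the two names can be shown with a small sketch.
      The kernel defines EFSCORRUPTED as an alias for EUCLEAN; the helper
      functions below are hypothetical and only model a kernel-style return
      value (0 or a negative errno) on a Linux system.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Mirror the kernel-side alias when building in user-space. */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN
#endif

/* Hypothetical stand-in for the range check: 0 on success, negative errno on failure. */
int model_get_branch(int block_in_range)
{
    return block_in_range ? 0 : -EFSCORRUPTED;
}

/* Turn a kernel-style return value into the message user-space would print. */
const char *describe_kernel_error(int ret)
{
    assert(ret <= 0);
    return strerror(-ret);
}
```

      Printing describe_kernel_error(model_get_branch(0)) gives the C
      library's text for EUCLEAN rather than an out-of-memory message.
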
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Link: https://lore.kernel.org/r/20221109181445.17843-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      26d75a16
    • ext4: replace kmem_cache_create with KMEM_CACHE · 060f7739
      JunChao Sun authored
      Replace kmem_cache_create() with the KMEM_CACHE() macro, which
      guarantees struct alignment.
      Signed-off-by: JunChao Sun <sunjunchao2870@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221109153822.80250-1-sunjunchao2870@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      060f7739
    • ext4: correct inconsistent error msg in nojournal mode · 89481b5f
      Baokun Li authored
      When the journal_async_commit mount option is used in nojournal mode,
      the kernel reports "can't mount with journal_checksum", which is very
      confusing. Mounting with journal_async_commit sets both the
      JOURNAL_ASYNC_COMMIT and EXPLICIT_JOURNAL_CHECKSUM flags. However,
      in the error branch, CHECKSUM is checked before ASYNC_COMMIT. As a result,
      the wrong message is printed, and the ASYNC_COMMIT branch becomes dead
      code that can never be executed. Therefore, swap the order of the
      two checks to make the error message accurate.
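
      The effect of the check order can be sketched with two small functions.
      The flag and result names are hypothetical stand-ins for the mount
      options in the message; only the ordering matters here.

```c
#include <assert.h>

/* Hypothetical option bits standing in for the two mount flags. */
enum {
    OPT_JOURNAL_ASYNC_COMMIT      = 1u << 0,
    OPT_EXPLICIT_JOURNAL_CHECKSUM = 1u << 1,
    OPT_ALL                       = (1u << 2) - 1,
};

/* Which complaint is reported when mounting without a journal. */
enum nojournal_error {
    ERR_NONE,
    ERR_JOURNAL_CHECKSUM,
    ERR_JOURNAL_ASYNC_COMMIT,
};

/* Old order: the checksum test runs first and hides the async-commit branch. */
enum nojournal_error nojournal_error_old(unsigned int opts)
{
    assert((opts & ~(unsigned int)OPT_ALL) == 0);
    if (opts & OPT_EXPLICIT_JOURNAL_CHECKSUM)
        return ERR_JOURNAL_CHECKSUM;
    if (opts & OPT_JOURNAL_ASYNC_COMMIT)
        return ERR_JOURNAL_ASYNC_COMMIT;
    return ERR_NONE;
}

/* New order: async commit is tested first, so its message can be reached. */
enum nojournal_error nojournal_error_new(unsigned int opts)
{
    assert((opts & ~(unsigned int)OPT_ALL) == 0);
    if (opts & OPT_JOURNAL_ASYNC_COMMIT)
        return ERR_JOURNAL_ASYNC_COMMIT;
    if (opts & OPT_EXPLICIT_JOURNAL_CHECKSUM)
        return ERR_JOURNAL_CHECKSUM;
    return ERR_NONE;
}
```

      With both bits set, as happens for journal_async_commit, the old order
      reports the checksum error and the new order reports the async-commit
      error. A mount that only asks for the checksum gets the same answer from
      both.
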
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221109074343.4184862-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      89481b5f
    • ext4: print file system UUID on mount, remount and unmount · bb0fbc78
      Lukas Czerner authored
      The device names are not necessarily consistent across reboots which can
      make it more difficult to identify the right file system when tracking
      down issues using system logs.
      
      Print file system UUID string on every mount, remount and unmount to
      make this task easier.
      
      This is similar to the functionality recently proposed for XFS.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Lukas Herbolt <lukas@herbolt.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/20221108145042.85770-1-lczerner@redhat.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      bb0fbc78
    • ext4: init quota for 'old.inode' in 'ext4_rename' · fae381a3
      Ye Bin authored
      Syzbot found the following issue:
      ext4_parse_param: s_want_extra_isize=128
      ext4_inode_info_init: s_want_extra_isize=32
      ext4_rename: old.inode=ffff88823869a2c8 old.dir=ffff888238699828 new.inode=ffff88823869d7e8 new.dir=ffff888238699828
      __ext4_mark_inode_dirty: inode=ffff888238699828 ea_isize=32 want_ea_size=128
      __ext4_mark_inode_dirty: inode=ffff88823869a2c8 ea_isize=32 want_ea_size=128
      ext4_xattr_block_set: inode=ffff88823869a2c8
      ------------[ cut here ]------------
      WARNING: CPU: 13 PID: 2234 at fs/ext4/xattr.c:2070 ext4_xattr_block_set.cold+0x22/0x980
      Modules linked in:
      RIP: 0010:ext4_xattr_block_set.cold+0x22/0x980
      RSP: 0018:ffff888227d3f3b0 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff88823007a000 RCX: 0000000000000000
      RDX: 0000000000000a03 RSI: 0000000000000040 RDI: ffff888230078178
      RBP: 0000000000000000 R08: 000000000000002c R09: ffffed1075c7df8e
      R10: ffff8883ae3efc6b R11: ffffed1075c7df8d R12: 0000000000000000
      R13: ffff88823869a2c8 R14: ffff8881012e0460 R15: dffffc0000000000
      FS:  00007f350ac1f740(0000) GS:ffff8883ae200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f350a6ed6a0 CR3: 0000000237456000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ? ext4_xattr_set_entry+0x3b7/0x2320
       ? ext4_xattr_block_set+0x0/0x2020
       ? ext4_xattr_set_entry+0x0/0x2320
       ? ext4_xattr_check_entries+0x77/0x310
       ? ext4_xattr_ibody_set+0x23b/0x340
       ext4_xattr_move_to_block+0x594/0x720
       ext4_expand_extra_isize_ea+0x59a/0x10f0
       __ext4_expand_extra_isize+0x278/0x3f0
       __ext4_mark_inode_dirty.cold+0x347/0x410
       ext4_rename+0xed3/0x174f
       vfs_rename+0x13a7/0x2510
       do_renameat2+0x55d/0x920
       __x64_sys_rename+0x7d/0xb0
       do_syscall_64+0x3b/0xa0
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      'ext4_rename' modifies the ctime of 'old.inode' and marks the inode dirty,
      which may trigger expansion of 'extra_isize' and a block allocation. If quota
      has not been initialized for the inode, this leads to the warning above.  To
      solve this issue, initialize quota for 'old.inode' first in 'ext4_rename'.
      
      Reported-by: syzbot+98346927678ac3059c77@syzkaller.appspotmail.com
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221107015335.2524319-1-yebin@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      fae381a3
    • ext4: simplify fast-commit CRC calculation · 8805dbcb
      Eric Biggers authored
      Instead of checksumming each field as it is added to the block, just
      checksum each block before it is written.  This is simpler, and also
      much more efficient.
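
      Why the two approaches give the same checksum can be sketched with a
      generic running CRC. The bitwise CRC32C routine below is a simple
      stand-in written for this example, not the kernel's checksum helper, and
      the function name is hypothetical. Feeding a block to a running CRC
      field by field yields the same value as feeding the whole block in one
      call, so the per-field bookkeeping can be dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C update (reflected polynomial 0x82F63B78).
 * Takes the running CRC state and returns the new state; no final XOR is
 * applied, so calls can be chained over consecutive byte ranges.
 */
uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    assert(buf != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}
```

      For a 12-byte buffer, chaining crc32c_update() over bytes 0..3 and then
      4..11 returns the same state as a single call over all 12 bytes, but the
      single call needs only one pass set up per block.
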
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-8-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      8805dbcb
    • ext4: fix off-by-one errors in fast-commit block filling · 48a6a66d
      Eric Biggers authored
      Due to several different off-by-one errors, or perhaps due to a late
      change in design that wasn't fully reflected in the code that was
      actually merged, there are several very strange constraints on how
      fast-commit blocks are filled with tlv entries:
      
      - tlvs must start at least 10 bytes before the end of the block, even
        though the minimum tlv length is 8.  Otherwise, the replay code will
        ignore them.  (BUG: ext4_fc_reserve_space() could violate this
        requirement if called with a len of blocksize - 9 or blocksize - 8.
        Fortunately, this doesn't seem to happen currently.)
      
      - tlvs must end at least 1 byte before the end of the block.  Otherwise
        the replay code will consider them to be invalid.  This quirk
        contributed to a bug (fixed by an earlier commit) where uninitialized
        memory was being leaked to disk in the last byte of blocks.
      
      Also, strangely these constraints don't apply to the replay code in
      e2fsprogs, which will accept any tlvs in the blocks (with no bounds
      checks at all, but that is a separate issue...).
      
      Given that this all seems to be a bug, let's fix it by just filling
      blocks with tlv entries in the natural way.
      
      Note that old kernels will be unable to replay fast-commit journals
      created by kernels that have this commit.
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-7-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      48a6a66d
    • ext4: fix unaligned memory access in ext4_fc_reserve_space() · 8415ce07
      Eric Biggers authored
      As is done elsewhere in the file, build the struct ext4_fc_tl on the
      stack and memcpy() it into the buffer, rather than directly writing it
      to a potentially-unaligned location in the buffer.
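
      The stack-then-memcpy() pattern looks roughly like this in user-space C.
      The header struct is a simplified, host-endian stand-in for the
      tag/length header, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified tag/length header; fields are host-endian in this sketch. */
struct tl_header {
    uint16_t tag;
    uint16_t len;
};

/*
 * Write a header at dst, which may have any alignment.
 * The struct is built on the stack (properly aligned) and copied bytewise,
 * instead of casting dst to a struct pointer and storing through it.
 * Returns the position just past the header.
 */
uint8_t *put_tl_header(uint8_t *dst, uint16_t tag, uint16_t len)
{
    struct tl_header tl = { .tag = tag, .len = len };

    assert(dst != NULL);
    memcpy(dst, &tl, sizeof(tl));
    return dst + sizeof(tl);
}

/* Read a header back from a possibly unaligned position, again via memcpy(). */
struct tl_header get_tl_header(const uint8_t *src)
{
    struct tl_header tl;

    assert(src != NULL);
    memcpy(&tl, src, sizeof(tl));
    return tl;
}
```

      Writing at an odd offset such as buf + 1 is safe with these helpers,
      whereas a store through a cast struct pointer at that address would be
      an unaligned access.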
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-6-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      8415ce07
    • ext4: add missing validation of fast-commit record lengths · 64b4a25c
      Eric Biggers authored
      Validate the inode and filename lengths in fast-commit journal records
      so that a malicious fast-commit journal cannot cause a crash by having
      invalid values for these.  Also validate EXT4_FC_TAG_DEL_RANGE.
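
      A length check of this kind might look like the sketch below for a
      record that carries a file name. The record layout (an 8-byte fixed part
      followed by the name) and the 255-byte name limit are assumptions made
      for the example, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DENTRY_FIXED_LEN 8u     /* assumed fixed part before the name */
#define MODEL_NAME_MAX         255u   /* assumed maximum file name length */

/*
 * Validate a name-carrying record of len bytes at val and, if it is sane,
 * return the name's position and length. Nothing past val[len - 1] is touched.
 */
bool dentry_record_name(const uint8_t *val, size_t len,
                        const uint8_t **name, size_t *name_len)
{
    assert(val != NULL && name != NULL && name_len != NULL);

    /* Too short to hold the fixed part plus at least one name byte. */
    if (len <= MODEL_DENTRY_FIXED_LEN)
        return false;

    /* Name longer than any valid file name: reject the record. */
    if (len - MODEL_DENTRY_FIXED_LEN > MODEL_NAME_MAX)
        return false;

    *name = val + MODEL_DENTRY_FIXED_LEN;
    *name_len = len - MODEL_DENTRY_FIXED_LEN;
    return true;
}
```

      Replay code that only proceeds when such a check returns true never
      computes a negative or oversized name length from a crafted journal.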
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-5-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      64b4a25c
    • ext4: fix leaking uninitialized memory in fast-commit journal · 594bc43b
      Eric Biggers authored
      When space at the end of fast-commit journal blocks is unused, make sure
      to zero it out so that uninitialized memory is not leaked to disk.
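
      Zeroing the unused tail of a block before writing it out can be sketched
      as follows. The function name and the idea of tracking a "used" byte
      count are assumptions for this example, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Clear everything after the first `used` bytes of a block so that stale
 * memory contents never reach the disk. Returns the number of bytes zeroed.
 */
size_t zero_block_tail(uint8_t *block, size_t block_size, size_t used)
{
    assert(block != NULL && used <= block_size);
    memset(block + used, 0, block_size - used);
    return block_size - used;
}
```

      Calling zero_block_tail(buf, 32, 20) leaves the first 20 bytes untouched
      and clears the remaining 12.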
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-4-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      594bc43b
    • ext4: don't set up encryption key during jbd2 transaction · 4c0d5778
      Eric Biggers authored
      Commit a80f7fcf ("ext4: fixup ext4_fc_track_* functions' signature")
      extended the scope of the transaction in ext4_unlink() too far, making
      it include the call to ext4_find_entry().  However, ext4_find_entry()
      can deadlock when called from within a transaction because it may need
      to set up the directory's encryption key.
      
      Fix this by restoring the transaction to its original scope.
      
      Reported-by: syzbot+1a748d0007eeac3ab079@syzkaller.appspotmail.com
      Fixes: a80f7fcf ("ext4: fixup ext4_fc_track_* functions' signature")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-3-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      4c0d5778
    • Eric Biggers's avatar
      ext4: disable fast-commit of encrypted dir operations · 0fbcb525
      Eric Biggers authored
      fast-commit of create, link, and unlink operations in encrypted
      directories is completely broken because the unencrypted filenames are
      being written to the fast-commit journal instead of the encrypted
      filenames.  These operations can't be replayed, as encryption keys
      aren't present at journal replay time.  It is also an information leak.
      
      Until if/when we can get this working properly, make encrypted directory
      operations ineligible for fast-commit.
      
      Note that fast-commit operations on encrypted regular files continue to
      be allowed, as they seem to work.
      
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Cc: <stable@vger.kernel.org> # v5.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221106224841.279231-2-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      0fbcb525
    • Baokun Li's avatar
      ext4: fix use-after-free in ext4_orphan_cleanup · a71248b1
      Baokun Li authored
      I caught an issue as follows:
      ==================================================================
       BUG: KASAN: use-after-free in __list_add_valid+0x28/0x1a0
       Read of size 8 at addr ffff88814b13f378 by task mount/710
      
       CPU: 1 PID: 710 Comm: mount Not tainted 6.1.0-rc3-next #370
       Call Trace:
        <TASK>
        dump_stack_lvl+0x73/0x9f
        print_report+0x25d/0x759
        kasan_report+0xc0/0x120
        __asan_load8+0x99/0x140
        __list_add_valid+0x28/0x1a0
        ext4_orphan_cleanup+0x564/0x9d0 [ext4]
        __ext4_fill_super+0x48e2/0x5300 [ext4]
        ext4_fill_super+0x19f/0x3a0 [ext4]
        get_tree_bdev+0x27b/0x450
        ext4_get_tree+0x19/0x30 [ext4]
        vfs_get_tree+0x49/0x150
        path_mount+0xaae/0x1350
        do_mount+0xe2/0x110
        __x64_sys_mount+0xf0/0x190
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
        </TASK>
       [...]
      ==================================================================
      
      The above issue may happen as follows:
      -------------------------------------
      ext4_fill_super
        ext4_orphan_cleanup
         --- loop1: assume last_orphan is 12 ---
          list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan)
          ext4_truncate --> return 0
            ext4_inode_attach_jinode --> return -ENOMEM
          iput(inode) --> free inode<12>
         --- loop2: last_orphan is still 12 ---
          list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
          // use inode<12> and trigger UAF
      
      To solve this issue, we need to propagate the return value of
      ext4_inode_attach_jinode() appropriately.
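
The difference between dropping and propagating the helper's return value can be sketched in userspace C. All function names here are hypothetical stand-ins, not the kernel functions themselves.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for ext4_inode_attach_jinode(): fails on request. */
static int attach_jinode(int simulate_enomem)
{
    return simulate_enomem ? -ENOMEM : 0;
}

/* Buggy shape: the helper's error is dropped and the caller sees success. */
static int truncate_ignoring_error(int simulate_enomem)
{
    attach_jinode(simulate_enomem);
    return 0;
}

/* Fixed shape: the helper's error is returned to the caller. */
static int truncate_propagating_error(int simulate_enomem)
{
    int err = attach_jinode(simulate_enomem);

    if (err)
        return err;
    /* ... the actual truncation work would go here ... */
    return 0;
}
```

With the second shape, a caller such as the orphan cleanup loop can notice the failure and stop instead of processing the same inode again.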
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221102080633.1630225-1-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      a71248b1
    • Eric Biggers's avatar
      ext4: don't allow journal inode to have encrypt flag · 105c78e1
      Eric Biggers authored
      Mounting a filesystem whose journal inode has the encrypt flag causes a
      NULL dereference in fscrypt_limit_io_blocks() when the 'inlinecrypt'
      mount option is used.
      
      The problem is that when jbd2_journal_init_inode() calls bmap(), it
      eventually finds its way into ext4_iomap_begin(), which calls
      fscrypt_limit_io_blocks().  fscrypt_limit_io_blocks() requires that if
      the inode is encrypted, then its encryption key must already be set up.
      That's not the case here, since the journal inode is never "opened" like
      a normal file would be.  Hence the crash.
      
      A reproducer is:
      
          mkfs.ext4 -F /dev/vdb
          debugfs -w /dev/vdb -R "set_inode_field <8> flags 0x80808"
          mount /dev/vdb /mnt -o inlinecrypt
      
      To fix this, make ext4 consider journal inodes with the encrypt flag to
      be invalid.  (Note, maybe other flags should be rejected on the journal
      inode too.  For now, this is just the minimal fix for the above issue.)
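
For illustration, the flags value used in the reproducer can be decoded with a small C sketch. The numeric flag values match those in fs/ext4/ext4.h (the `u` suffix is added here); `journal_inode_flags_ok()` is a hypothetical helper, not the function the commit modifies.

```c
#include <assert.h>
#include <stdint.h>

/* Inode flag values as defined in fs/ext4/ext4.h. */
#define EXT4_SYNC_FL     0x00000008u
#define EXT4_ENCRYPT_FL  0x00000800u
#define EXT4_EXTENTS_FL  0x00080000u

/* Hypothetical helper: return 1 if the flags are acceptable for a journal inode. */
static int journal_inode_flags_ok(uint32_t flags)
{
    return (flags & EXT4_ENCRYPT_FL) == 0;
}
```

The reproducer's `0x80808` is `EXT4_EXTENTS_FL | EXT4_ENCRYPT_FL | EXT4_SYNC_FL`, so a check of this shape would reject it.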
      
      I've marked this as fixing the commit that introduced the call to
      fscrypt_limit_io_blocks(), since that's what made an actual crash start
      being possible.  But this fix could be applied to any version of ext4
      that supports the encrypt feature.
      
      Reported-by: syzbot+ba9dac45bc76c490b7c3@syzkaller.appspotmail.com
      Fixes: 38ea50da ("ext4: support direct I/O with fscrypt using blk-crypto")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20221102053312.189962-1-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      105c78e1
    • Gaosheng Cui's avatar
      ext4: fix undefined behavior in bit shift for ext4_check_flag_values · 3bf678a0
      Gaosheng Cui authored
      Shifting a signed 32-bit value by 31 bits is undefined, so change the
      most significant bit to unsigned. The UBSAN warning call trace is shown below:
      
      UBSAN: shift-out-of-bounds in fs/ext4/ext4.h:591:2
      left shift of 1 by 31 places cannot be represented in type 'int'
      Call Trace:
       <TASK>
       dump_stack_lvl+0x7d/0xa5
       dump_stack+0x15/0x1b
       ubsan_epilogue+0xe/0x4e
       __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
       ext4_init_fs+0x5a/0x277
       do_one_initcall+0x76/0x430
       kernel_init_freeable+0x3b3/0x422
       kernel_init+0x24/0x1e0
       ret_from_fork+0x1f/0x30
       </TASK>
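
The shift reported in the trace above can be illustrated with a short C sketch, assuming a platform where `int` is 32 bits wide. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Undefined: 1 is a signed int, and 1 << 31 is not representable in int. */
/* #define TOP_BIT_SIGNED (1 << 31) */

/* Well defined: the operand is unsigned, so the result is 0x80000000. */
#define TOP_BIT_UNSIGNED (1U << 31)

static uint32_t top_bit(void)
{
    return TOP_BIT_UNSIGNED;
}
```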
      
      Fixes: 9a4c8019 ("ext4: ensure Inode flags consistency are checked at build time")
      Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Link: https://lore.kernel.org/r/20221031055833.3966222-1-cuigaosheng1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      3bf678a0
    • Baokun Li's avatar
      ext4: fix bug_on in __es_tree_search caused by bad boot loader inode · 991ed014
      Baokun Li authored
      We got an issue as follows:
      ==================================================================
       kernel BUG at fs/ext4/extents_status.c:203!
       invalid opcode: 0000 [#1] PREEMPT SMP
       CPU: 1 PID: 945 Comm: cat Not tainted 6.0.0-next-20221007-dirty #349
       RIP: 0010:ext4_es_end.isra.0+0x34/0x42
       RSP: 0018:ffffc9000143b768 EFLAGS: 00010203
       RAX: 0000000000000000 RBX: ffff8881769cd0b8 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffffffff8fc27cf7 RDI: 00000000ffffffff
       RBP: ffff8881769cd0bc R08: 0000000000000000 R09: ffffc9000143b5f8
       R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881769cd0a0
       R13: ffff8881768e5668 R14: 00000000768e52f0 R15: 0000000000000000
       FS: 00007f359f7f05c0(0000)GS:ffff88842fd00000(0000)knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f359f5a2000 CR3: 000000017130c000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        __es_tree_search.isra.0+0x6d/0xf5
        ext4_es_cache_extent+0xfa/0x230
        ext4_cache_extents+0xd2/0x110
        ext4_find_extent+0x5d5/0x8c0
        ext4_ext_map_blocks+0x9c/0x1d30
        ext4_map_blocks+0x431/0xa50
        ext4_mpage_readpages+0x48e/0xe40
        ext4_readahead+0x47/0x50
        read_pages+0x82/0x530
        page_cache_ra_unbounded+0x199/0x2a0
        do_page_cache_ra+0x47/0x70
        page_cache_ra_order+0x242/0x400
        ondemand_readahead+0x1e8/0x4b0
        page_cache_sync_ra+0xf4/0x110
        filemap_get_pages+0x131/0xb20
        filemap_read+0xda/0x4b0
        generic_file_read_iter+0x13a/0x250
        ext4_file_read_iter+0x59/0x1d0
        vfs_read+0x28f/0x460
        ksys_read+0x73/0x160
        __x64_sys_read+0x1e/0x30
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
        </TASK>
      ==================================================================
      
      In the above issue, an ioctl invokes the swap_inode_boot_loader function to
      swap inode<5> and inode<12>. However, inode<5> contains an incorrect imode and
      disordered extents, and its i_nlink is set to 1. The extents check in the
      ext4_iget function is bypassed because 5 is EXT4_BOOT_LOADER_INO.
      Because links_count is set to 1, the extents are not initialized in
      swap_inode_boot_loader. After the ioctl completes successfully,
      the extents are swapped into inode<12>. Running the `cat` command on
      inode<12> then triggers the BUG_ON because of the incorrect extents.
      
      When the boot loader inode is not initialized, its imode can be one of the
      following:
      1) the imode is a bad type, which is marked as bad_inode in ext4_iget and
         set to S_IFREG.
      2) the imode is a good type but not S_IFREG.
      3) the imode is S_IFREG.
      
      The BUG_ON may be triggered by bypassing the check in cases 1 and 2.
      Therefore, when the boot loader inode is bad_inode or its imode is not
      S_IFREG, initialize the inode to avoid triggering the BUG.
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221026042310.3839669-5-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      991ed014
    • Baokun Li's avatar
      ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode · 63b1e9bc
      Baokun Li authored
      There are many places that will get unhappy (and crash) when ext4_iget()
      returns a bad inode. However, when ext4_iget() is called on the boot loader
      inode, a bad inode is allowed to be returned, because the inode may not be
      initialized. This mechanism can be used to bypass some checks and cause a
      panic. To solve this problem, we add a special iget flag, EXT4_IGET_BAD. Only
      with this flag do we return a bad inode from ext4_iget(); otherwise we always
      return an error code if the inode is a bad inode. (Suggested by Jan Kara.)
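
The gating idea can be sketched in userspace C. Everything here is hypothetical: the flag values, `struct inode_stub`, `iget_stub()`, and the placeholder error code `-EIO` are illustrative and are not taken from the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IGET_NORMAL 0x0u   /* hypothetical flag values */
#define IGET_BAD    0x1u   /* caller explicitly accepts a bad inode */

struct inode_stub {
    int is_bad;
};

/*
 * Return the inode only if it is good, or if the caller opted in to bad
 * inodes with IGET_BAD. Otherwise return NULL and set *err.
 */
static struct inode_stub *iget_stub(struct inode_stub *ino,
                                    unsigned int flags, int *err)
{
    if (ino->is_bad && !(flags & IGET_BAD)) {
        *err = -EIO;    /* placeholder error code for this sketch */
        return NULL;
    }
    *err = 0;
    return ino;
}
```

Callers that never pass the opt-in flag can then rely on never receiving a bad inode.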
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221026042310.3839669-4-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      63b1e9bc
    • Baokun Li's avatar
      ext4: add helper to check quota inums · 07342ec2
      Baokun Li authored
      Before quota is enabled, a check on the preset quota inums in
      ext4_super_block is added to prevent wrong quota inodes from being loaded.
      In addition, when the quota fails to be enabled, the quota type and quota
      inum are printed to facilitate fault locating.
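
A plausibility check of this kind might look like the following userspace C sketch. It is an assumption about the general shape of such a check, not the helper the commit adds: `quota_inum_plausible()` and its parameters are hypothetical, and the rule it applies (accept the reserved quota inodes 3 and 4, or any non-reserved inode within range) is chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define USR_QUOTA_INO 3u   /* reserved user quota inode number in ext4 */
#define GRP_QUOTA_INO 4u   /* reserved group quota inode number in ext4 */

/*
 * Hypothetical check: a quota inode number is plausible if it is one of the
 * reserved quota inodes, or a regular (non-reserved) inode within range.
 */
static int quota_inum_plausible(uint32_t inum, uint32_t first_ino,
                                uint32_t inodes_count)
{
    if (inum == USR_QUOTA_INO || inum == GRP_QUOTA_INO)
        return 1;
    return inum >= first_ino && inum <= inodes_count;
}
```

Under this rule, an inum of 5 (the boot loader inode) would be rejected when the first non-reserved inode is 11.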
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221026042310.3839669-3-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      07342ec2
    • Baokun Li's avatar
      ext4: fix bug_on in __es_tree_search caused by bad quota inode · d3238774
      Baokun Li authored
      We got an issue as follows:
      ==================================================================
       kernel BUG at fs/ext4/extents_status.c:202!
       invalid opcode: 0000 [#1] PREEMPT SMP
       CPU: 1 PID: 810 Comm: mount Not tainted 6.1.0-rc1-next-g9631525255e3 #352
       RIP: 0010:__es_tree_search.isra.0+0xb8/0xe0
       RSP: 0018:ffffc90001227900 EFLAGS: 00010202
       RAX: 0000000000000000 RBX: 0000000077512a0f RCX: 0000000000000000
       RDX: 0000000000000002 RSI: 0000000000002a10 RDI: ffff8881004cd0c8
       RBP: ffff888177512ac8 R08: 47ffffffffffffff R09: 0000000000000001
       R10: 0000000000000001 R11: 00000000000679af R12: 0000000000002a10
       R13: ffff888177512d88 R14: 0000000077512a10 R15: 0000000000000000
       FS: 00007f4bd76dbc40(0000)GS:ffff88842fd00000(0000)knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00005653bf993cf8 CR3: 000000017bfdf000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        ext4_es_cache_extent+0xe2/0x210
        ext4_cache_extents+0xd2/0x110
        ext4_find_extent+0x5d5/0x8c0
        ext4_ext_map_blocks+0x9c/0x1d30
        ext4_map_blocks+0x431/0xa50
        ext4_getblk+0x82/0x340
        ext4_bread+0x14/0x110
        ext4_quota_read+0xf0/0x180
        v2_read_header+0x24/0x90
        v2_check_quota_file+0x2f/0xa0
        dquot_load_quota_sb+0x26c/0x760
        dquot_load_quota_inode+0xa5/0x190
        ext4_enable_quotas+0x14c/0x300
        __ext4_fill_super+0x31cc/0x32c0
        ext4_fill_super+0x115/0x2d0
        get_tree_bdev+0x1d2/0x360
        ext4_get_tree+0x19/0x30
        vfs_get_tree+0x26/0xe0
        path_mount+0x81d/0xfc0
        do_mount+0x8d/0xc0
        __x64_sys_mount+0xc0/0x160
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
        </TASK>
      ==================================================================
      
      The above issue may happen as follows:
      -------------------------------------
      ext4_fill_super
       ext4_orphan_cleanup
        ext4_enable_quotas
         ext4_quota_enable
          ext4_iget --> get error inode <5>
           ext4_ext_check_inode --> Wrong imode makes it escape inspection
           make_bad_inode(inode) --> EXT4_BOOT_LOADER_INO set imode
          dquot_load_quota_inode
           vfs_setup_quota_inode --> check pass
           dquot_load_quota_sb
            v2_check_quota_file
             v2_read_header
              ext4_quota_read
               ext4_bread
                ext4_getblk
                 ext4_map_blocks
                  ext4_ext_map_blocks
                   ext4_find_extent
                    ext4_cache_extents
                     ext4_es_cache_extent
                      __es_tree_search.isra.0
                       ext4_es_end --> Wrong extents trigger BUG_ON
      
      In the above issue, s_usr_quota_inum is set to 5, but inode<5> contains an
      incorrect imode and disordered extents. Because 5 is EXT4_BOOT_LOADER_INO,
      the ext4_ext_check_inode check in the ext4_iget function can be bypassed.
      As a result, the unchecked extents trigger the BUG_ON in the
      __es_tree_search function. To solve this issue, check whether the inode is
      bad_inode in vfs_setup_quota_inode().
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221026042310.3839669-2-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      d3238774
    • Luís Henriques's avatar
      ext4: remove trailing newline from ext4_msg() message · 78742d4d
      Luís Henriques authored
      The ext4_msg() function adds a new line to the message.  Remove extra '\n'
      from call to ext4_msg() in ext4_orphan_cleanup().
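
Why the extra '\n' matters can be seen in a small userspace C sketch of a wrapper that, like ext4_msg(), appends its own newline. `msg_format()` is a hypothetical function written for this illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a message into `out` and append a single newline.
 * Returns the resulting length, or -1 if the message does not fit.
 */
static int msg_format(char *out, size_t out_len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, out_len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n + 1 >= out_len)
        return -1;          /* no room for the newline and terminator */
    out[n] = '\n';
    out[n + 1] = '\0';
    return n + 1;
}
```

A caller that also ends its format string with '\n' ends up with two newlines, which shows up as a blank line in the log.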
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Link: https://lore.kernel.org/r/20221011155758.15287-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      78742d4d
    • Bixuan Cui's avatar
      jbd2: use the correct print format · d87a7b4c
      Bixuan Cui authored
      A print format error was found when using the ftrace events:
          <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
          <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
      
      Use the correct print format for transaction, head and tid.
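
The effect can be reproduced in userspace C. The sketch assumes a platform where `unsigned int` is 32 bits and where converting an out-of-range value to `int` wraps around (as on common two's-complement compilers); the function names are hypothetical. The value 2428750331 is the unsigned interpretation of the -1866216965 shown in the trace above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Wrong: an unsigned transaction ID printed with a signed conversion. */
static void format_tid_signed(char *out, size_t out_len, uint32_t tid)
{
    snprintf(out, out_len, "%d", (int)tid);
}

/* Right: the same transaction ID printed as unsigned. */
static void format_tid_unsigned(char *out, size_t out_len, uint32_t tid)
{
    snprintf(out, out_len, "%u", (unsigned int)tid);
}
```

Once a transaction ID exceeds INT_MAX, the signed format shows a negative number while the unsigned format shows the real value.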
      
      Fixes: 879c5e6b ('jbd2: convert instrumentation from markers to tracepoints')
      Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Link: https://lore.kernel.org/r/1665488024-95172-1-git-send-email-cuibixuan@linux.alibaba.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      d87a7b4c
  2. 01 Dec, 2022 5 commits
  3. 29 Nov, 2022 1 commit