1. 01 Nov, 2022 1 commit
      Merge tag 'refcount-cow-domain-6.1_2022-10-31' of... · 4eb559dd
      Darrick J. Wong authored
      Merge tag 'refcount-cow-domain-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.1-fixesA
      
      xfs: improve runtime refcountbt corruption detection
      
      Fuzz testing of the refcount btree demonstrated a weakness in validation
      of refcount btree records during normal runtime.  The idea of using the
      upper bit of the rc_startblock field to separate the refcount records
      into one group for shared space and another for CoW staging extents was
      added at the last minute.  The incore struct left this bit encoded in
      the upper bit of the startblock field, which makes it all too easy for
      arithmetic operations to overflow if we don't detect the cowflag
      properly.
      
      When I ran a norepair fuzz tester, I was able to crash the kernel on one
      of these accidental overflows by fuzzing a key record in a node block,
      which broke lookups.  To fix the problem, make the domain (shared/cow) a
      separate field in the incore record.
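
The idea of splitting the flag bit out of the start block can be sketched in userspace C. This is only an illustration under assumptions: the names `refc_irec`, `refc_decode`, `refc_encode_startblock`, and `refc_end` are hypothetical and are not the kernel's identifiers; only the use of the upper bit of the 32-bit start block as the CoW flag comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bit of the 32-bit on-disk start block marks a CoW staging extent. */
#define REFC_COWFLAG (UINT32_C(1) << 31)

/* Hypothetical names, for illustration only. */
enum refc_domain { REFC_DOMAIN_SHARED = 0, REFC_DOMAIN_COW = 1 };

struct refc_irec {
    uint32_t startblock;      /* flag bit already stripped */
    uint32_t blockcount;
    uint32_t refcount;
    enum refc_domain domain;  /* kept separate from startblock */
};

/* Decode an on-disk record: move the flag bit into its own field. */
static struct refc_irec refc_decode(uint32_t disk_startblock,
                                    uint32_t blockcount, uint32_t refcount)
{
    struct refc_irec irec;

    irec.domain = (disk_startblock & REFC_COWFLAG) ? REFC_DOMAIN_COW
                                                   : REFC_DOMAIN_SHARED;
    irec.startblock = disk_startblock & ~REFC_COWFLAG;
    irec.blockcount = blockcount;
    irec.refcount = refcount;
    return irec;
}

/* Re-encode for disk: the flag is only ever added here. */
static uint32_t refc_encode_startblock(const struct refc_irec *irec)
{
    assert((irec->startblock & REFC_COWFLAG) == 0);
    return irec->domain == REFC_DOMAIN_COW
        ? (irec->startblock | REFC_COWFLAG)
        : irec->startblock;
}

/* End of the extent, computed in 64 bits so the sum cannot wrap. */
static uint64_t refc_end(const struct refc_irec *irec)
{
    return (uint64_t)irec->startblock + irec->blockcount;
}
```

With the domain held in its own field, arithmetic on `startblock` never sees the flag bit, so a record decoded from `REFC_COWFLAG | 100` with length 8 ends at block 108 rather than at a value near 2^31.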
      
      Unfortunately, a customer also hit this once in production.  Due to bugs
      in the kernel running on the VM host, writes to the disk image would
      occasionally be lost.  Given sufficient memory pressure on the VM guest,
      a refcountbt xfs_buf could be reclaimed and later reloaded from the
      stale copy on the virtual disk.  The stale disk contents were a refcount
      btree leaf block full of records for the wrong domain, and this caused
      an infinite loop in the guest VM.
      
      v2: actually include the refcount adjust loop invariant checking patch;
          move the deferred refcount continuation checks earlier in the series;
          break up the megapatch into smaller pieces; fix an uninitialized list
          error.
      v3: in the continuation check patch, verify the per-ag extent before
          converting it to a fsblock
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'refcount-cow-domain-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: rename XFS_REFC_COW_START to _COWFLAG
        xfs: fix uninitialized list head in struct xfs_refcount_recovery
        xfs: fix agblocks check in the cow leftover recovery function
        xfs: check record domain when accessing refcount records
        xfs: remove XFS_FIND_RCEXT_SHARED and _COW
        xfs: refactor domain and refcount checking
        xfs: report refcount domain in tracepoints
        xfs: track cow/shared record domains explicitly in xfs_refcount_irec
        xfs: refactor refcount record usage in xchk_refcountbt_rec
        xfs: move _irec structs to xfs_types.h
        xfs: check deferred refcount op continuation parameters
        xfs: create a predicate to verify per-AG extents
        xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents
  2. 31 Oct, 2022 23 commits
  3. 26 Oct, 2022 1 commit
  4. 20 Oct, 2022 4 commits
      xfs: Fix unreferenced object reported by kmemleak in xfs_sysfs_init() · d08af403
      Li Zetao authored
      kmemleak reported a sequence of memory leaks, and one of them indicated we
      failed to free a pointer:
        comm "mount", pid 19610, jiffies 4297086464 (age 60.635s)
          hex dump (first 8 bytes):
            73 64 61 00 81 88 ff ff                          sda.....
          backtrace:
            [<00000000d77f3e04>] kstrdup_const+0x46/0x70
            [<00000000e51fa804>] kobject_set_name_vargs+0x2f/0xb0
            [<00000000247cd595>] kobject_init_and_add+0xb0/0x120
            [<00000000f9139aaf>] xfs_mountfs+0x367/0xfc0
            [<00000000250d3caf>] xfs_fs_fill_super+0xa16/0xdc0
            [<000000008d873d38>] get_tree_bdev+0x256/0x390
            [<000000004881f3fa>] vfs_get_tree+0x41/0xf0
            [<000000008291ab52>] path_mount+0x9b3/0xdd0
            [<0000000022ba8f2d>] __x64_sys_mount+0x190/0x1d0
      
      As mentioned in the kobject_init_and_add() comment, if this function
      returns an error, kobject_put() must be called to properly clean up
      the memory associated with the object. xfs_sysfs_init() does not
      follow this requirement. When kobject_init_and_add() returns an
      error, the kobj->kobject.name string allocated by kstrdup_const()
      is never freed, which causes the leak reported above.
      
      Fix it by calling kobject_put() when kobject_init_and_add() returns an
      error.
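
The cleanup rule can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel code: `mock_kobj`, `mock_init_and_add`, `mock_put`, and the two `sysfs_init_*` helpers are hypothetical stand-ins that only mimic the "name is allocated even when the call fails" behaviour described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kobject: a name plus a reference count. */
struct mock_kobj {
    char *name;
    int refcount;
};

static int live_names;  /* name allocations not yet freed */

/* Models the init-and-add step: the name is allocated before the failure. */
static int mock_init_and_add(struct mock_kobj *k, const char *name, int fail)
{
    k->refcount = 1;
    k->name = malloc(strlen(name) + 1);
    if (!k->name)
        return -ENOMEM;
    strcpy(k->name, name);
    live_names++;
    return fail ? -EEXIST : 0;
}

/* Models the put step: dropping the last reference frees the name. */
static void mock_put(struct mock_kobj *k)
{
    assert(k->refcount > 0);
    if (--k->refcount == 0 && k->name) {
        free(k->name);
        k->name = NULL;
        live_names--;
    }
}

/* Leaky pattern: the error is returned without dropping the reference. */
static int sysfs_init_leaky(struct mock_kobj *k, const char *name, int fail)
{
    return mock_init_and_add(k, name, fail);
}

/* Fixed pattern: drop the reference on failure so the name is freed. */
static int sysfs_init_fixed(struct mock_kobj *k, const char *name, int fail)
{
    int error = mock_init_and_add(k, name, fail);

    if (error)
        mock_put(k);
    return error;
}
```

After a failed `sysfs_init_leaky()` call the `live_names` counter stays at 1, while a failed `sysfs_init_fixed()` call leaves it at 0.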
      
      Fixes: a31b1d3d ("xfs: add xfs_mount sysfs kobject")
      Signed-off-by: Li Zetao <lizetao1@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      xfs: fix memory leak in xfs_errortag_init · cf4f4c12
      Zeng Heng authored
      When `xfs_sysfs_init` fails, `mp->m_errortag` must be freed.
      Otherwise kmemleak reports a memory leak after mounting an xfs image:
      
      unreferenced object 0xffff888101364900 (size 192):
        comm "mount", pid 13099, jiffies 4294915218 (age 335.207s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f08ad25c>] __kmalloc+0x41/0x1b0
          [<00000000dca9aeb6>] kmem_alloc+0xfd/0x430
          [<0000000040361882>] xfs_errortag_init+0x20/0x110
          [<00000000b384a0f6>] xfs_mountfs+0x6ea/0x1a30
          [<000000003774395d>] xfs_fs_fill_super+0xe10/0x1a80
          [<000000009cf07b6c>] get_tree_bdev+0x3e7/0x700
          [<00000000046b5426>] vfs_get_tree+0x8e/0x2e0
          [<00000000952ec082>] path_mount+0xf8c/0x1990
          [<00000000beb1f838>] do_mount+0xee/0x110
          [<000000000e9c41bb>] __x64_sys_mount+0x14b/0x1f0
          [<00000000f7bb938e>] do_syscall_64+0x3b/0x90
          [<000000003fcd67a9>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: c6840101 ("xfs: expose errortag knobs via sysfs")
      Signed-off-by: Zeng Heng <zengheng4@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      xfs: remove redundant pointer lip · fc93812c
      Colin Ian King authored
      The assignment to pointer lip is not really required; the pointer lip
      is redundant and can be removed.
      
      Cleans up clang-scan warning:
      warning: Although the value stored to 'lip' is used in the enclosing
      expression, the value is never actually read from 'lip'
      [deadcode.DeadStores]
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      xfs: fix exception caused by unexpected illegal bestcount in leaf dir · 13cf24e0
      Guo Xuenan authored
      For a leaf dir, in most cases there should be as many bestfree slots
      as there are dir data blocks that can fit under i_size (except for [1]).
      
      The root cause is that we don't examine the number of bestfree slots.
      When there are fewer slots than dir data blocks and we need to allocate
      a new dir data block and update the bestfree array, we use the dir block
      number as the index into the bestfree array without checking the leaf
      buffer boundary, which may cause a UAF or other memory access problem.
      This issue can also be triggered with test case xfs/473 from fstests.
      
      Following Dave Chinner's and Darrick's suggestion, add a buffer verifier
      to detect this abnormal situation in time.
      Simplify the testcase for fstest xfs/554 [1]
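
The failure mode described above is an array store indexed by a block number that was never compared against the slot count. A minimal sketch of the missing guard follows. It is not the verifier added by this patch: `bests_table` and `bests_store` are hypothetical names, and the real on-disk layout is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a bestfree table: 'bestcount' valid slots. */
struct bests_table {
    uint16_t *bests;
    uint32_t bestcount;
};

/*
 * Store the free-space value for data block 'db'.
 * Returns false instead of writing when 'db' has no slot, which is the
 * condition the unchecked code path ran past.
 */
static bool bests_store(struct bests_table *t, uint32_t db, uint16_t free_bytes)
{
    assert(t != NULL && t->bests != NULL);
    if (db >= t->bestcount)
        return false;  /* would write past the end of the table */
    t->bests[db] = free_bytes;
    return true;
}
```

A caller that gets `false` back can treat the metadata as corrupt rather than writing outside the buffer.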
      
      The error log is shown as follows:
      ==================================================================
      BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0
      Write of size 2 at addr ffff88810168b000 by task touch/1552
      CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.13.0-1ubuntu1.1 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x4d/0x66
       print_report.cold+0xf6/0x691
       kasan_report+0xa8/0x120
       xfs_dir2_leaf_addname+0x1995/0x1ac0
       xfs_dir_createname+0x58c/0x7f0
       xfs_create+0x7af/0x1010
       xfs_generic_create+0x270/0x5e0
       path_openat+0x270b/0x3450
       do_filp_open+0x1cf/0x2b0
       do_sys_openat2+0x46b/0x7a0
       do_sys_open+0xb7/0x130
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fe4d9e9312b
      Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
      75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
      f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
      RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
      RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
      RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c
      RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
      R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000
      index:0x0 pfn:0x10168b
      flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
      raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000
      raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                         ^
       ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      Disabling lock debugging due to kernel taint
      00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78
      XDD3[S5........x
      XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file
      fs/xfs/libxfs/xfs_dir2_data.c.  Caller
      xfs_dir2_data_use_free+0x28a/0xeb0
      CPU: 5 PID: 1552 Comm: touch Tainted: G    B              6.0.0-rc3+
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.13.0-1ubuntu1.1 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x4d/0x66
       xfs_corruption_error+0x132/0x150
       xfs_dir2_data_use_free+0x198/0xeb0
       xfs_dir2_leaf_addname+0xa59/0x1ac0
       xfs_dir_createname+0x58c/0x7f0
       xfs_create+0x7af/0x1010
       xfs_generic_create+0x270/0x5e0
       path_openat+0x270b/0x3450
       do_filp_open+0x1cf/0x2b0
       do_sys_openat2+0x46b/0x7a0
       do_sys_open+0xb7/0x130
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fe4d9e9312b
      Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
      75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
      f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
      RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
      RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
      RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c
      RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001
      R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
      R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000
       </TASK>
      XFS (sdb): Corruption detected. Unmount and run xfs_repair
      
      [1] https://lore.kernel.org/all/20220928095355.2074025-1-guoxuenan@huawei.com/
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  5. 18 Oct, 2022 1 commit
      xfs: avoid a UAF when log intent item recovery fails · 97cf7967
      Darrick J. Wong authored
      KASAN reported a UAF bug when I was running xfs/235:
      
       BUG: KASAN: use-after-free in xlog_recover_process_intents+0xa77/0xae0 [xfs]
       Read of size 8 at addr ffff88804391b360 by task mount/5680
      
       CPU: 2 PID: 5680 Comm: mount Not tainted 6.0.0-xfsx #6.0.0 77e7b52a4943a975441e5ac90a5ad7748b7867f6
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
       Call Trace:
        <TASK>
        dump_stack_lvl+0x34/0x44
        print_report.cold+0x2cc/0x682
        kasan_report+0xa3/0x120
        xlog_recover_process_intents+0xa77/0xae0 [xfs fb841c7180aad3f8359438576e27867f5795667e]
        xlog_recover_finish+0x7d/0x970 [xfs fb841c7180aad3f8359438576e27867f5795667e]
        xfs_log_mount_finish+0x2d7/0x5d0 [xfs fb841c7180aad3f8359438576e27867f5795667e]
        xfs_mountfs+0x11d4/0x1d10 [xfs fb841c7180aad3f8359438576e27867f5795667e]
        xfs_fs_fill_super+0x13d5/0x1a80 [xfs fb841c7180aad3f8359438576e27867f5795667e]
        get_tree_bdev+0x3da/0x6e0
        vfs_get_tree+0x7d/0x240
        path_mount+0xdd3/0x17d0
        __x64_sys_mount+0x1fa/0x270
        do_syscall_64+0x2b/0x80
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
       RIP: 0033:0x7ff5bc069eae
       Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48
       RSP: 002b:00007ffe433fd448 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff5bc069eae
       RDX: 00005575d7213290 RSI: 00005575d72132d0 RDI: 00005575d72132b0
       RBP: 00005575d7212fd0 R08: 00005575d7213230 R09: 00005575d7213fe0
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       R13: 00005575d7213290 R14: 00005575d72132b0 R15: 00005575d7212fd0
        </TASK>
      
       Allocated by task 5680:
        kasan_save_stack+0x1e/0x40
        __kasan_slab_alloc+0x66/0x80
        kmem_cache_alloc+0x152/0x320
        xfs_rui_init+0x17a/0x1b0 [xfs]
        xlog_recover_rui_commit_pass2+0xb9/0x2e0 [xfs]
        xlog_recover_items_pass2+0xe9/0x220 [xfs]
        xlog_recover_commit_trans+0x673/0x900 [xfs]
        xlog_recovery_process_trans+0xbe/0x130 [xfs]
        xlog_recover_process_data+0x103/0x2a0 [xfs]
        xlog_do_recovery_pass+0x548/0xc60 [xfs]
        xlog_do_log_recovery+0x62/0xc0 [xfs]
        xlog_do_recover+0x73/0x480 [xfs]
        xlog_recover+0x229/0x460 [xfs]
        xfs_log_mount+0x284/0x640 [xfs]
        xfs_mountfs+0xf8b/0x1d10 [xfs]
        xfs_fs_fill_super+0x13d5/0x1a80 [xfs]
        get_tree_bdev+0x3da/0x6e0
        vfs_get_tree+0x7d/0x240
        path_mount+0xdd3/0x17d0
        __x64_sys_mount+0x1fa/0x270
        do_syscall_64+0x2b/0x80
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
       Freed by task 5680:
        kasan_save_stack+0x1e/0x40
        kasan_set_track+0x21/0x30
        kasan_set_free_info+0x20/0x30
        ____kasan_slab_free+0x144/0x1b0
        slab_free_freelist_hook+0xab/0x180
        kmem_cache_free+0x1f1/0x410
        xfs_rud_item_release+0x33/0x80 [xfs]
        xfs_trans_free_items+0xc3/0x220 [xfs]
        xfs_trans_cancel+0x1fa/0x590 [xfs]
        xfs_rui_item_recover+0x913/0xd60 [xfs]
        xlog_recover_process_intents+0x24e/0xae0 [xfs]
        xlog_recover_finish+0x7d/0x970 [xfs]
        xfs_log_mount_finish+0x2d7/0x5d0 [xfs]
        xfs_mountfs+0x11d4/0x1d10 [xfs]
        xfs_fs_fill_super+0x13d5/0x1a80 [xfs]
        get_tree_bdev+0x3da/0x6e0
        vfs_get_tree+0x7d/0x240
        path_mount+0xdd3/0x17d0
        __x64_sys_mount+0x1fa/0x270
        do_syscall_64+0x2b/0x80
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
       The buggy address belongs to the object at ffff88804391b300
        which belongs to the cache xfs_rui_item of size 688
       The buggy address is located 96 bytes inside of
        688-byte region [ffff88804391b300, ffff88804391b5b0)
      
       The buggy address belongs to the physical page:
       page:ffffea00010e4600 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888043919320 pfn:0x43918
       head:ffffea00010e4600 order:2 compound_mapcount:0 compound_pincount:0
       flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff)
       raw: 04fff80000010200 0000000000000000 dead000000000122 ffff88807f0eadc0
       raw: ffff888043919320 0000000080140010 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff88804391b200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff88804391b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff88804391b300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                              ^
        ffff88804391b380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff88804391b400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      The test fuzzes an rmap btree block and starts writer threads to induce
      a filesystem shutdown on the corrupt block.  When the filesystem is
      remounted, recovery will try to replay the committed rmap intent item,
      but the corruption problem causes the recovery transaction to fail.
      Cancelling the transaction frees the RUD, which frees the RUI that we
      recovered.
      
      When we return to xlog_recover_process_intents, @lip is now a dangling
      pointer, and we cannot use it to find the iop_recover method for the
      tracepoint.  Hence we must store the item ops before calling
      ->iop_recover if we want to give it to the tracepoint so that the trace
      data will tell us exactly which intent item failed.
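
The pattern described above, saving the ops pointer before a callback that may free the item, can be sketched in userspace C. The names here (`intent_item`, `intent_ops`, `process_intent`, `last_failed`) are hypothetical and only model the shape of the problem. They are not the XFS structures.

```c
#include <assert.h>
#include <stdlib.h>

struct intent_item;

/* Hypothetical ops table: statically allocated, outlives any item. */
struct intent_ops {
    const char *name;
    int (*recover)(struct intent_item *lip);
};

struct intent_item {
    const struct intent_ops *ops;
};

/* Models a recovery that fails and frees the item on its error path. */
static int failing_recover(struct intent_item *lip)
{
    free(lip);
    return -1;
}

static const struct intent_ops rui_like_ops = { "rui-like", failing_recover };

static const char *last_failed;  /* stands in for the tracepoint argument */

static struct intent_item *make_item(const struct intent_ops *ops)
{
    struct intent_item *lip;

    assert(ops != NULL);
    lip = malloc(sizeof(*lip));
    if (lip)
        lip->ops = ops;
    return lip;
}

static int process_intent(struct intent_item *lip)
{
    /* Save the ops first: after recover() fails, lip may be dangling. */
    const struct intent_ops *ops = lip->ops;
    int error = ops->recover(lip);

    if (error)
        last_failed = ops->name;  /* uses the saved pointer, never lip */
    return error;
}
```

Reading `lip->ops` after the failed callback would be the use-after-free. The saved `ops` pointer stays valid because the ops table is static.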
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  6. 16 Oct, 2022 10 commits
      Linux 6.1-rc1 · 9abf2313
      Linus Torvalds authored
      Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · f1947d7c
      Linus Torvalds authored
      Pull more random number generator updates from Jason Donenfeld:
       "This time with some large scale treewide cleanups.
      
        The intent of this pull is to clean up the way callers fetch random
        integers. The current rules for doing this right are:
      
         - If you want a secure or an insecure random u64, use get_random_u64()
      
         - If you want a secure or an insecure random u32, use get_random_u32()
      
           The old function prandom_u32() has been deprecated for a while
           now and is just a wrapper around get_random_u32(). Same for
           get_random_int().
      
         - If you want a secure or an insecure random u16, use get_random_u16()
      
         - If you want a secure or an insecure random u8, use get_random_u8()
      
         - If you want secure or insecure random bytes, use get_random_bytes().
      
           The old function prandom_bytes() has been deprecated for a while
           now and has long been a wrapper around get_random_bytes()
      
         - If you want a non-uniform random u32, u16, or u8 bounded by a
           certain open interval maximum, use prandom_u32_max()
      
           I say "non-uniform", because it doesn't do any rejection sampling
           or divisions. Hence, it stays within the prandom_*() namespace, not
           the get_random_*() namespace.
      
           I'm currently investigating a "uniform" function for 6.2. We'll see
           what comes of that.
      
        By applying these rules uniformly, we get several benefits:
      
         - By using prandom_u32_max() with an upper-bound that the compiler
           can prove at compile-time is ≤65536 or ≤256, internally
           get_random_u16() or get_random_u8() is used, which wastes fewer
           batched random bytes, and hence has higher throughput.
      
         - By using prandom_u32_max() instead of %, when the upper-bound is
           not a constant, division is still avoided, because
           prandom_u32_max() uses a faster multiplication-based trick instead.
      
         - By using get_random_u16() or get_random_u8() in cases where the
           return value is intended to indeed be a u16 or a u8, we waste fewer
           batched random bytes, and hence have higher throughput.
      
        This series was originally done by hand while I was on an airplane
        without Internet. Later, Kees and I worked on retroactively figuring
        out what could be done with Coccinelle and what had to be done
        manually, and then we split things up based on that.
      
        So while this touches a lot of files, the actual amount of code that's
        hand fiddled is comfortably small"
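
The "multiplication-based trick" and the "non-uniform" remark in the message above can be illustrated as follows. This is a sketch, not the kernel source: `bounded_from_u32` takes the random word as a parameter so the example stays deterministic, and the 8-bit helpers are hypothetical analogues that make the bias small enough to count exhaustively.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 32-bit random word into [0, ceil) with one multiply and one
 * shift: no division and no rejection sampling.
 */
static uint32_t bounded_from_u32(uint32_t rnd, uint32_t ceil)
{
    return (uint32_t)(((uint64_t)rnd * ceil) >> 32);
}

/* 8-bit analogue of the same trick, small enough to enumerate. */
static uint8_t bounded_from_u8(uint8_t rnd, uint8_t ceil)
{
    return (uint8_t)(((unsigned int)rnd * ceil) >> 8);
}

/* Count how many of the 256 possible inputs map to 'value'. */
static unsigned int count_hits(uint8_t ceil, uint8_t value)
{
    unsigned int r, hits = 0;

    assert(ceil > 0);
    for (r = 0; r < 256; r++)
        if (bounded_from_u8((uint8_t)r, ceil) == value)
            hits++;
    return hits;
}
```

For a bound of 3, the 256 possible 8-bit inputs cannot split evenly: `count_hits(3, 0)` is 86 while `count_hits(3, 1)` and `count_hits(3, 2)` are 85 each. The same slight bias exists at 32 bits, where it is far smaller relative to the input range.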
      
      * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        prandom: remove unused functions
        treewide: use get_random_bytes() when possible
        treewide: use get_random_u32() when possible
        treewide: use get_random_{u8,u16}() when possible, part 2
        treewide: use get_random_{u8,u16}() when possible, part 1
        treewide: use prandom_u32_max() when possible, part 2
        treewide: use prandom_u32_max() when possible, part 1
      Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of... · 8636df94
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
      
       - Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
         when using bperf (perf BPF based counters) with cgroups.
      
       - Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
         monitors bandwidth, latency, bus utilization and buffer occupancy.
      
         Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
      
       - User space tasks can migrate between CPUs, so when tracing selected
         CPUs, system-wide sideband is still needed, fix it in the setup of
         Intel PT on hybrid systems.
      
       - Fix metricgroups title message in 'perf list', it should state that
         the metrics groups are to be used with the '-M' option, not '-e'.
      
       - Sync the msr-index.h copy with the kernel sources, adding support for
         using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
         as decoding it when printing the MSR tracepoint arguments.
      
       - Fix program header size and alignment when generating a JIT ELF in
         'perf inject'.
      
       - Add multiple new Intel PT 'perf test' entries, including a jitdump
         one.
      
       - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
         running on PowerPC due to an invalid topology number in that arch.
      
       - Fix the 'perf test' for arm_coresight failures on the ARM Juno
         system.
      
       - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
         option to the or expression expected in the intercepted
         perf_event_open() syscall.
      
       - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
         'perf annotate' asm parser.
      
       - Fix 'perf mem record -C' option processing, it was being chopped up
         when preparing the underlying 'perf record -e mem-events' and thus
         being ignored, requiring using '-- -C CPUs' as a workaround.
      
       - Improvements and tidy ups for 'perf test' shell infra.
      
       - Fix Intel PT information printing segfault in uClibc, where a NULL
         format was being passed to fprintf.
      
      * tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
        tools arch x86: Sync the msr-index.h copy with the kernel sources
        perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
        perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
        perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
        perf tests stat+json_output: Include sanity check for topology
        perf tests stat+csv_output: Include sanity check for topology
        perf intel-pt: Fix system_wide dummy event for hybrid
        perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
        perf test: Fix attr tests for PERF_FORMAT_LOST
        perf test: test_intel_pt.sh: Add 9 tests
        perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
        perf test: test_intel_pt.sh: Add jitdump test
        perf test: test_intel_pt.sh: Tidy some alignment
        perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
        perf test: test_intel_pt.sh: Tidy some perf record options
        perf test: test_intel_pt.sh: Fix return checking again
        perf: Skip and warn on unknown format 'configN' attrs
        perf list: Fix metricgroups title message
        perf mem: Fix -C option behavior for perf mem record
        perf annotate: Add missing condition flags for arm64
        ...
      Merge tag 'kbuild-fixes-v6.1' of... · 2df76606
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
         combination of Clang >= 14 and GAS <= 2.35.
      
       - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
         the package size.
      
       - Fix modpost error under build environments using musl.
      
       - Make *.ll files keep value names for easier debugging
      
       - Fix single directory build
      
       - Prevent RISC-V from selecting the broken DWARF5 support when Clang
         and GAS are used together.
      
      * tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
        kbuild: fix single directory build
        kbuild: add -fno-discard-value-names to cmd_cc_ll_c
        scripts/clang-tools: Convert clang-tidy args to list
        modpost: put modpost options before argument
        kbuild: Stop including vmlinux.bz2 in the rpm's
        Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
        Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
      Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 2fcd8f10
      Linus Torvalds authored
      Pull more clk updates from Stephen Boyd:
       "This is the final part of the clk patches for this merge window.
      
        The clk rate range series needed another week to fully bake. Maxime
        fixed the bug that broke clk notifiers and prevented this from being
        included in the first pull request. He also added a unit test on top
        to make sure it doesn't break so easily again. The majority of the
        series fixes up how the clk_set_rate_*() APIs work, particularly
        around when the rate constraints are dropped and how they move around
        when reparenting clks. Overall it's a much needed improvement to the
        clk rate range APIs that used to be pretty broken if you looked
        sideways.
      
        Beyond the core changes there are a few driver fixes for a compilation
        issue or improper data causing clks to fail to register or have the
        wrong parents. These are good to get in before the first -rc so that
        the system actually boots on the affected devices"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
        clk: tegra: Fix Tegra PWM parent clock
        clk: at91: fix the build with binutils 2.27
        clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
        clk: mediatek: clk-mux: Add .determine_rate() callback
        clk: tests: Add tests for notifiers
        clk: Update req_rate on __clk_recalc_rates()
        clk: tests: Add missing test case for ranges
        clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
        clk: Introduce the clk_hw_get_rate_range function
        clk: Zero the clk_rate_request structure
        clk: Stop forwarding clk_rate_requests to the parent
        clk: Constify clk_has_parent()
        clk: Introduce clk_core_has_parent()
        clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
        clk: Add our request boundaries in clk_core_init_rate_req
        clk: Introduce clk_hw_init_rate_request()
        clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
        clk: Change clk_core_init_rate_req prototype
        clk: Set req_rate on reparenting
        clk: Take into account uncached clocks in clk_set_rate_range()
        ...
      Merge tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · b08cd744
      Linus Torvalds authored
      Pull more cifs updates from Steve French:
      
       - fix a regression in guest mounts to old servers
      
       - improvements to directory leasing (caching directory entries safely
         beyond the root directory)
      
       - symlink improvement (reducing roundtrips needed to process symlinks)
      
       - an lseek fix (to problem where some dir entries could be skipped)
      
       - improved ioctl for returning more detailed information on directory
         change notifications
      
       - clarify multichannel interface query warning
      
       - cleanup fix (for better aligning buffers using ALIGN and round_up)
      
       - a compounding fix
      
       - fix some uninitialized variable bugs found by Coverity and the kernel
         test robot
      
      * tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: improve SMB3 change notification support
        cifs: lease key is uninitialized in two additional functions when smb1
        cifs: lease key is uninitialized in smb1 paths
        smb3: must initialize two ACL struct fields to zero
        cifs: fix double-fault crash during ntlmssp
        cifs: fix static checker warning
        cifs: use ALIGN() and round_up() macros
        cifs: find and use the dentry for cached non-root directories also
        cifs: enable caching of directories for which a lease is held
        cifs: prevent copying past input buffer boundaries
        cifs: fix uninitialised var in smb2_compound_op()
        cifs: improve symlink handling for smb2+
        smb3: clarify multichannel warning
        cifs: fix regression in very old smb1 mounts
        cifs: fix skipping to incorrect offset in emit_cached_dirents
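      The "cifs: use ALIGN() and round_up() macros" entry above concerns
      replacing open-coded alignment arithmetic with named helpers.  The
      patch itself is not shown here, so the following is only a minimal
      userspace sketch of the general technique for power-of-two
      alignments.  The names align_up() and align_up_open_coded() are
      hypothetical and are not taken from the cifs code or the kernel
      headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round x up to the next multiple of a.
 * Assumption: a is a non-zero power of two, so (a - 1) is a bit mask.
 */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * The kind of open-coded arithmetic a named helper replaces.
 * It gives the same result, but the intent is harder to see at a glance.
 */
static size_t align_up_open_coded(size_t x, size_t a)
{
    return ((x + a - 1) / a) * a;
}
```

      For example, align_up(13, 8) yields 16, and a value that is already
      aligned, such as 16, is returned unchanged.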
    • Revert "cpumask: fix checking valid cpu range". · 80493877
      Tetsuo Handa authored
      This reverts commit 78e5a339 ("cpumask: fix checking valid cpu range").
      
      syzbot is hitting the WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning in
      cpu_max_bits_warn() [1] because commit 78e5a339 ("cpumask: fix checking
      valid cpu range") is broken.  That patch triggers WARN_ON_ONCE() when,
      e.g., reading /proc/cpuinfo, because passing "cpu + 1" instead of
      "cpu" trivially hits the cpu == nr_cpumask_bits condition.
      
      Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
      it was not fixed immediately.  As a result, the patch was sent to
      linux.git before its author recognized the problem, and syzbot has
      been failing to test changes in linux.git since 2022/10/10 [3].
      
      Andrew Jones proposed a fix for the x86 and riscv architectures [4],
      but [2] and [5] indicate that the affected locations are not limited
      to arch code.  The longer it takes to find and fix the affected
      locations, the less tested the kernel is before release (and the
      harder it becomes to bisect and fix).
      
      We should have inspected and fixed basically all cpumask users before
      applying that patch.  We should not crash kernels in order to ask
      existing cpumask users to update their code, even if that is limited
      to the CONFIG_DEBUG_PER_CPU_MAPS=y case.
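
      The off-by-one described in this message can be modelled in a few
      lines of userspace C.  This is a sketch under stated assumptions, not
      the kernel code: NR_CPUMASK_BITS and the three function names are
      hypothetical, and only the "cpu >= nr_cpumask_bits" condition and the
      "cpu + 1" argument come from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mask width; valid CPU numbers are 0..3. */
#define NR_CPUMASK_BITS 4u

/* Models the condition inside WARN_ON_ONCE(cpu >= nr_cpumask_bits). */
static bool range_check_warns(unsigned int cpu)
{
    return cpu >= NR_CPUMASK_BITS;
}

/* Range-checks the caller's own argument. */
static bool check_arg_warns(unsigned int cpu)
{
    return range_check_warns(cpu);
}

/* Range-checks cpu + 1, the argument the message says was passed. */
static bool check_next_warns(unsigned int cpu)
{
    return range_check_warns(cpu + 1);
}
```

      With the last valid CPU (3 in this model), check_arg_warns() stays
      quiet while check_next_warns() fires, because 3 + 1 equals the mask
      width.  Any loop that walks up to the last CPU therefore hits the
      warning.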
      
      Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
      Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
      Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
      Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4]
      Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
      Reported-by: Andrew Jones <ajones@ventanamicro.com>
      Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Yury Norov <yury.norov@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5 · 0a6de78c
      Nathan Chancellor authored
      When building a RISC-V kernel with DWARF5 debug info using clang and
      the GNU assembler, several instances of the following error appear:
      
        /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
      
      Dumping the .s file reveals these .uleb128 directives come from
      .debug_loc and .debug_ranges:
      
        .Ldebug_loc0:
                .byte   4                               # DW_LLE_offset_pair
                .uleb128 .Lfunc_begin0-.Lfunc_begin0    #   starting offset
                .uleb128 .Ltmp1-.Lfunc_begin0           #   ending offset
                .byte   1                               # Loc expr size
                .byte   90                              # DW_OP_reg10
                .byte   0                               # DW_LLE_end_of_list
      
        .Ldebug_ranges0:
                .byte   4                               # DW_RLE_offset_pair
                .uleb128 .Ltmp6-.Lfunc_begin0           #   starting offset
                .uleb128 .Ltmp27-.Lfunc_begin0          #   ending offset
                .byte   4                               # DW_RLE_offset_pair
                .uleb128 .Ltmp28-.Lfunc_begin0          #   starting offset
                .uleb128 .Ltmp30-.Lfunc_begin0          #   ending offset
                .byte   0                               # DW_RLE_end_of_list
      
      There is an outstanding binutils issue to support a non-constant operand
      to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
      be any movement on it, due to concerns over how it would work with
      linker relaxation.
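
      As background on why the operand value matters: ULEB128 is a
      variable-length encoding that stores 7 bits of the value per byte, so
      the number of bytes emitted depends on the value itself.  A symbol
      delta whose value can still change late in the build therefore has
      no fixed encoded size, which is presumably where the relaxation
      concern comes from.  The encoder below is a minimal sketch; the name
      uleb128_encode() is hypothetical and not taken from binutils or the
      kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode value as ULEB128 into out and return the number of bytes written.
 * out must have room for 10 bytes, the maximum for a 64-bit value.
 */
static size_t uleb128_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;

    do {
        uint8_t byte = value & 0x7f;    /* low 7 bits of the value */

        value >>= 7;
        if (value != 0)
            byte |= 0x80;               /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);

    return n;
}
```

      A delta of 127 fits in one byte, while 128 needs two, so a small
      change in the distance between two labels can change the size of the
      section that records it.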
      
      To avoid these build errors, prevent DWARF5 from being selected when
      using clang and an assembler that does not have support for these symbol
      deltas, which can be easily checked in Kconfig with as-instr plus the
      small test program from the dwz test suite from the binutils issue.
      
      Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
      Link: https://github.com/ClangBuiltLinux/linux/issues/1719
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: fix single directory build · 3753af77
      Masahiro Yamada authored
      Commit f110e5a2 ("kbuild: refactor single builds of *.ko") was wrong.
      
      KBUILD_MODULES _is_ needed for single builds.
      
      Otherwise, "make foo/bar/baz/" does not build module objects at all.
      
      Fixes: f110e5a2 ("kbuild: refactor single builds of *.ko")
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: David Sterba <dsterba@suse.com>
    • Merge tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 1501278b
      Linus Torvalds authored
      Pull slab hotfix from Vlastimil Babka:
       "A single fix for the common-kmalloc series, for warnings on mips and
        sparc64 reported by Guenter Roeck"
      
      * tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation