1. 09 Aug, 2024 2 commits
  2. 08 Aug, 2024 4 commits
    • tracefs: Use generic inode RCU for synchronizing freeing · 0b6743bd
      Steven Rostedt authored
      With structure layout randomization enabled for 'struct inode', we need to
      avoid overlapping any of the RCU-used / initialized-only-once members,
      e.g. i_lru or i_sb_list, so as not to corrupt related list traversals when
      making use of the rcu_head.
      
      For an unlucky structure layout of 'struct inode' we may end up with the
      following splat when running the ftrace selftests:
      
      [<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object])
      [<...>] ------------[ cut here ]------------
      [<...>] kernel BUG at lib/list_debug.c:54!
      [<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G                 N  6.8.12-grsec+ #122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65
      [<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
      [<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0
      [<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f
      [<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283
      [<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000
      [<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001
      [<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25
      [<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d
      [<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000
      [<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object]
      [<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0
      [<...>] RSI: __func__.47+0x4340/0x4400
      [<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object]
      [<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550]
      [<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550]
      [<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550]
      [<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object]
      [<...>] FS:  00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000
      [<...>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0
      [<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [<...>] ASID: 0003
      [<...>] Stack:
      [<...>]  ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0
      [<...>]  ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f
      [<...>]  0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392
      [<...>] Call Trace:
      [<...>]  <TASK>
      [<...>]  [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0
      [<...>]  [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48
      [<...>]  [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88
      [<...>]  [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90
      [<...>]  [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8
      [<...>]  [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8
      [<...>]  [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40
      [<...>]  [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48
      [<...>]  [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70
      [<...>]  [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78
      [<...>]  [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80
      [<...>]  [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8
      [<...>]  [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0
      [<...>]  [<ffffffff8294ad20>] ? do_one_tree+0x140/0x140 fffffe80416afcf8
      [<...>]  [<ffffffff82997349>] ? do_remount+0x329/0xa00 fffffe80416afd18
      [<...>]  [<ffffffff83ebf7a1>] ? security_sb_remount+0x81/0x1c0 fffffe80416afd38
      [<...>]  [<ffffffff82892096>] reconfigure_super+0x856/0x14e0 fffffe80416afd70
      [<...>]  [<ffffffff815d1327>] ? ns_capable_common+0xe7/0x2a0 fffffe80416afd90
      [<...>]  [<ffffffff82997436>] do_remount+0x416/0xa00 fffffe80416afdd0
      [<...>]  [<ffffffff829b2ba4>] path_mount+0x5c4/0x900 fffffe80416afe28
      [<...>]  [<ffffffff829b25e0>] ? finish_automount+0x13a0/0x13a0 fffffe80416afe60
      [<...>]  [<ffffffff82903812>] ? user_path_at_empty+0xb2/0x140 fffffe80416afe88
      [<...>]  [<ffffffff829b2ff5>] do_mount+0x115/0x1c0 fffffe80416afeb8
      [<...>]  [<ffffffff829b2ee0>] ? path_mount+0x900/0x900 fffffe80416afed8
      [<...>]  [<ffffffff8272461c>] ? __kasan_check_write+0x1c/0xa0 fffffe80416afee0
      [<...>]  [<ffffffff829b31cf>] __do_sys_mount+0x12f/0x280 fffffe80416aff30
      [<...>]  [<ffffffff829b36cd>] __x64_sys_mount+0xcd/0x2e0 fffffe80416aff70
      [<...>]  [<ffffffff819f8818>] ? syscall_trace_enter+0x218/0x380 fffffe80416aff88
      [<...>]  [<ffffffff8111655e>] x64_sys_call+0x5d5e/0x6720 fffffe80416affa8
      [<...>]  [<ffffffff8952756d>] do_syscall_64+0xcd/0x3c0 fffffe80416affb8
      [<...>]  [<ffffffff8100119b>] entry_SYSCALL_64_safe_stack+0x4c/0x87 fffffe80416affe8
      [<...>]  </TASK>
      [<...>]  <PTREGS>
      [<...>] RIP: 0033:[<00006dcb382ff66a>] vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
      [<...>] Code: 48 8b 0d 29 18 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 17 0d 00 f7 d8 64 89 01 48
      [<...>] RSP: 002b:0000763d68192558 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [<...>] RAX: ffffffffffffffda RBX: 00006dcb38433264 RCX: 00006dcb382ff66a
      [<...>] RDX: 000017c3e0d11210 RSI: 000017c3e0d1a5a0 RDI: 000017c3e0d1ae70
      [<...>] RBP: 000017c3e0d10fb0 R08: 000017c3e0d11260 R09: 00006dcb383d1be0
      [<...>] R10: 000000000020002e R11: 0000000000000246 R12: 0000000000000000
      [<...>] R13: 000017c3e0d1ae70 R14: 000017c3e0d11210 R15: 000017c3e0d10fb0
      [<...>] RBX: vm_area_struct[mount 2550 2550 file 6dcb38433000-6dcb38434000 5b 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] RCX: vm_area_struct[mount 2550 2550 file 6dcb38225000-6dcb3837e000 22 55(read|exec|mayread|mayexec)]+0x0/0xb8 [userland map]
      [<...>] RDX: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] RSI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] RDI: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] RBP: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] RSP: vm_area_struct[mount 2550 2550 anon 763d68173000-763d68195000 7ffffffdd 100133(read|write|mayread|maywrite|growsdown|account)]+0x0/0xb8 [userland map]
      [<...>] R08: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] R09: vm_area_struct[mount 2550 2550 file 6dcb383d1000-6dcb383d3000 1cd 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] R13: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] R14: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>] R15: vm_area_struct[mount 2550 2550 anon 17c3e0d0f000-17c3e0d31000 17c3e0d0f 100033(read|write|mayread|maywrite|account)]+0x0/0xb8 [userland map]
      [<...>]  </PTREGS>
      [<...>] Modules linked in:
      [<...>] ---[ end trace 0000000000000000 ]---
      
      The list debug message as well as RBX's symbolic value point out that the
      object in question was allocated from 'tracefs_inode_cache' and that the
      list's '->next' member is at offset 0. Dumping the layout of the relevant
      parts of 'struct tracefs_inode' gives the following:
      
        struct tracefs_inode {
          union {
            struct inode {
              struct list_head {
                struct list_head * next;                    /*     0     8 */
                struct list_head * prev;                    /*     8     8 */
              } i_lru;
              [...]
            } vfs_inode;
            struct callback_head {
              void (*func)(struct callback_head *);         /*     0     8 */
              struct callback_head * next;                  /*     8     8 */
            } rcu;
          };
          [...]
        };
      
      Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
      destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
      call_rcu() or later when calling the RCU callback. This will disturb
      concurrent list traversals as well as object reuse which assumes these
      list heads will keep their integrity.
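The following is a minimal user-space sketch of the aliasing described above. It assumes simplified stand-in types: fake_inode, fake_tracefs_inode and fake_call_rcu() are hypothetical, not kernel code. It also assumes a layout where data and function pointers have the same size. The sketch mimics the two stores that call_rcu() makes into the rcu_head, then checks whether those stores clobbered the links of 'i_lru'.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins; NOT the kernel's definitions. */
struct list_head {
	struct list_head *next, *prev;
};

struct callback_head {
	void (*func)(struct callback_head *);
	struct callback_head *next;
};

/* Hypothetical "unlucky" layout: i_lru placed at offset 0 of the inode. */
struct fake_inode {
	struct list_head i_lru;
	unsigned int i_mode;
};

struct fake_tracefs_inode {
	union {
		struct fake_inode vfs_inode;
		struct callback_head rcu;	/* shares storage with i_lru */
	};
};

static void fake_free_cb(struct callback_head *head) { (void)head; }

/* Mimics the two stores call_rcu() makes into the rcu_head. */
static void fake_call_rcu(struct callback_head *head,
			  void (*func)(struct callback_head *))
{
	head->func = func;
	head->next = NULL;
}

/* Returns 1 if queueing the RCU callback clobbered i_lru.prev. */
static int rcu_clobbers_lru(void)
{
	struct fake_tracefs_inode ti;

	/* Both members start at offset 0 of the union. */
	assert(offsetof(struct fake_tracefs_inode, vfs_inode.i_lru) ==
	       offsetof(struct fake_tracefs_inode, rcu));

	/* Point the links back at the node (stand-in for real neighbours). */
	ti.vfs_inode.i_lru.next = &ti.vfs_inode.i_lru;
	ti.vfs_inode.i_lru.prev = &ti.vfs_inode.i_lru;

	fake_call_rcu(&ti.rcu, fake_free_cb);

	/* rcu.next aliases i_lru.prev, so that list link is now NULL. */
	return ti.vfs_inode.i_lru.prev == NULL;
}
```

A later list_del() on such an inode would follow the overwritten links, which is the kind of corruption the splat above reports.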
      
      To reproduce this, the following diff manually overlays 'i_lru' with
      'rcu'; otherwise, one would need a good portion of luck to hit an
      unlucky RANDSTRUCT seed:
      
        --- a/include/linux/fs.h
        +++ b/include/linux/fs.h
        @@ -629,6 +629,7 @@ struct inode {
         	umode_t			i_mode;
         	unsigned short		i_opflags;
         	kuid_t			i_uid;
        +	struct list_head	i_lru;		/* inode LRU list */
         	kgid_t			i_gid;
         	unsigned int		i_flags;
      
        @@ -690,7 +691,6 @@ struct inode {
         	u16			i_wb_frn_avg_time;
         	u16			i_wb_frn_history;
         #endif
        -	struct list_head	i_lru;		/* inode LRU list */
         	struct list_head	i_sb_list;
         	struct list_head	i_wb_list;	/* backing dev writeback list */
         	union {
      
      The tracefs inode does not need to supply its own RCU-delayed destruction
      of its inode. The inode code itself offers both a "destroy_inode()"
      callback that gets called when the last reference to the inode is
      released, and a "free_inode()" callback that is called after an RCU
      grace period following "destroy_inode()".
      
      The tracefs code can unlink the inode from its list in the destroy_inode()
      callback, and then simply free it from the free_inode() callback. This
      should provide the same protection.
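The following is a user-space model of the two-phase teardown described above: the destroy step unlinks the object, and its memory is released only after a grace period. All names are hypothetical. The grace period is simulated by an explicit drain call rather than real RCU. The deferred-free link is a separate field, and that is the property that matters here: it never overlays the list links.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model, not tracefs or VFS code. */
struct node {
	struct node *prev, *next;	/* stand-in for the tracefs inode list */
	struct node *deferred;		/* separate link for the deferred-free queue */
};

static struct node head = { &head, &head, NULL };
static struct node *pending;	/* objects waiting for the grace period */
static int freed_count;

static struct node *model_alloc(void)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	/* Insert right after the list head. */
	n->prev = &head;
	n->next = head.next;
	head.next->prev = n;
	head.next = n;
	return n;
}

/* Phase 1, like ->destroy_inode(): unlink now, but do not free yet. */
static void model_destroy_inode(struct node *n)
{
	/* Similar to the sanity checks CONFIG_DEBUG_LIST does on deletion. */
	assert(n->prev->next == n && n->next->prev == n);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->deferred = pending;	/* does not touch prev/next */
	pending = n;
}

/* Phase 2, like ->free_inode(): only runs after the grace period. */
static void model_free_inode(struct node *n)
{
	free(n);
	freed_count++;
}

/* Stand-in for the end of an RCU grace period: drain deferred frees. */
static void model_grace_period_end(void)
{
	while (pending) {
		struct node *n = pending;

		pending = n->deferred;
		model_free_inode(n);
	}
}
```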
      
      Link: https://lore.kernel.org/all/20240807115143.45927-3-minipli@grsecurity.net/
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Ajay Kaher <ajay.kaher@broadcom.com>
      Cc: Ilkka Naulapää <digirigawa@gmail.com>
      Link: https://lore.kernel.org/20240807185402.61410544@gandalf.local.home
      Fixes: baa23a8d ("tracefs: Reset permissions on remount if permissions are options")
      Reported-by: Mathias Krause <minipli@grsecurity.net>
      Reported-by: Brad Spengler <spender@grsecurity.net>
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ring-buffer: Remove unused function ring_buffer_nr_pages() · 58f7e4d7
      Jianhui Zhou authored
      Because ring_buffer_nr_pages() is not an inline function and users access
      buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages()
      is removed.
      Signed-off-by: Jianhui Zhou <912460177@qq.com>
      Link: https://lore.kernel.org/tencent_F4A7E9AB337F44E0F4B858D07D19EF460708@qq.com
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Fix overflow in get_free_elt() · bcf86c01
      Tze-nan Wu authored
      "tracing_map->next_elt" in get_free_elt() is at risk of overflowing.
      
      Once it overflows, new elements can still be inserted into the tracing_map
      even though the maximum number of elements (`max_elts`) has been reached.
      Continuing to insert elements after the overflow could result in the
      tracing_map containing "tracing_map->max_size" elements, leaving no empty
      entries.
      If any attempt is made to insert an element into a full tracing_map using
      `__tracing_map_insert()`, it will cause an infinite loop with preemption
      disabled, leading to a CPU hang problem.
      
      Fix this by preventing any further increments to "tracing_map->next_elt"
      once it reaches "tracing_map->max_elts".
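The following sketch, written with C11 atomics, shows a counter that stops at its limit instead of wrapping. claim_slot() is a hypothetical helper. The compare-and-swap loop is one way to implement the fix described; the actual kernel patch is not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hand out slot indices 0..max_elts-1 at most once each. The shared
 * counter is never advanced past max_elts, so it cannot overflow and
 * wrap around no matter how many callers keep trying.
 * Returns -1 once all slots are taken.
 */
static int claim_slot(atomic_int *next_elt, int max_elts)
{
	int cur = atomic_load(next_elt);

	assert(max_elts > 0);
	while (cur < max_elts) {
		/* Advance only if no other thread moved the counter meanwhile. */
		if (atomic_compare_exchange_weak(next_elt, &cur, cur + 1))
			return cur;
		/* On failure, cur now holds the latest value; check it again. */
	}
	return -1;
}
```

By contrast, a counter that is incremented unconditionally on every attempt keeps growing after the limit and can eventually wrap.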
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Fixes: 08d43a5f ("tracing: Add lock-free tracing_map")
      Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
      Link: https://lore.kernel.org/20240805055922.6277-1-Tze-nan.Wu@mediatek.com
      Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
      Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • function_graph: Fix the ret_stack used by ftrace_graph_ret_addr() · 604b72b3
      Petr Pavlu authored
      When ftrace_graph_ret_addr() is invoked to convert a found stack return
      address to its original value, the function can end up producing the
      following crash:
      
      [   95.442712] BUG: kernel NULL pointer dereference, address: 0000000000000028
      [   95.442720] #PF: supervisor read access in kernel mode
      [   95.442724] #PF: error_code(0x0000) - not-present page
      [   95.442727] PGD 0 P4D 0-
      [   95.442731] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
      [   95.442736] CPU: 1 UID: 0 PID: 2214 Comm: insmod Kdump: loaded Tainted: G           OE K    6.11.0-rc1-default #1 67c62a3b3720562f7e7db5f11c1fdb40b7a2857c
      [   95.442747] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
      [   95.442750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
      [   95.442754] RIP: 0010:ftrace_graph_ret_addr+0x42/0xc0
      [   95.442766] Code: [...]
      [   95.442773] RSP: 0018:ffff979b80ff7718 EFLAGS: 00010006
      [   95.442776] RAX: ffffffff8ca99b10 RBX: ffff979b80ff7760 RCX: ffff979b80167dc0
      [   95.442780] RDX: ffffffff8ca99b10 RSI: ffff979b80ff7790 RDI: 0000000000000005
      [   95.442783] RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
      [   95.442786] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff8e9491e0
      [   95.442790] R13: ffffffff8d6f70f0 R14: ffff979b80167da8 R15: ffff979b80167dc8
      [   95.442793] FS:  00007fbf83895740(0000) GS:ffff8a0afdd00000(0000) knlGS:0000000000000000
      [   95.442797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   95.442800] CR2: 0000000000000028 CR3: 0000000005070002 CR4: 0000000000370ef0
      [   95.442806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   95.442809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   95.442816] Call Trace:
      [   95.442823]  <TASK>
      [   95.442896]  unwind_next_frame+0x20d/0x830
      [   95.442905]  arch_stack_walk_reliable+0x94/0xe0
      [   95.442917]  stack_trace_save_tsk_reliable+0x7d/0xe0
      [   95.442922]  klp_check_and_switch_task+0x55/0x1a0
      [   95.442931]  task_call_func+0xd3/0xe0
      [   95.442938]  klp_try_switch_task.part.5+0x37/0x150
      [   95.442942]  klp_try_complete_transition+0x79/0x2d0
      [   95.442947]  klp_enable_patch+0x4db/0x890
      [   95.442960]  do_one_initcall+0x41/0x2e0
      [   95.442968]  do_init_module+0x60/0x220
      [   95.442975]  load_module+0x1ebf/0x1fb0
      [   95.443004]  init_module_from_file+0x88/0xc0
      [   95.443010]  idempotent_init_module+0x190/0x240
      [   95.443015]  __x64_sys_finit_module+0x5b/0xc0
      [   95.443019]  do_syscall_64+0x74/0x160
      [   95.443232]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [   95.443236] RIP: 0033:0x7fbf82f2c709
      [   95.443241] Code: [...]
      [   95.443247] RSP: 002b:00007fffd5ea3b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [   95.443253] RAX: ffffffffffffffda RBX: 000056359c48e750 RCX: 00007fbf82f2c709
      [   95.443257] RDX: 0000000000000000 RSI: 000056356ed4efc5 RDI: 0000000000000003
      [   95.443260] RBP: 000056356ed4efc5 R08: 0000000000000000 R09: 00007fffd5ea3c10
      [   95.443263] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
      [   95.443267] R13: 000056359c48e6f0 R14: 0000000000000000 R15: 0000000000000000
      [   95.443272]  </TASK>
      [   95.443274] Modules linked in: [...]
      [   95.443385] Unloaded tainted modules: intel_uncore_frequency(E):1 isst_if_common(E):1 skx_edac(E):1
      [   95.443414] CR2: 0000000000000028
      
      The bug can be reproduced with kselftests:
      
       cd linux/tools/testing/selftests
       make TARGETS='ftrace livepatch'
       (cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc)
       (cd livepatch; ./test-livepatch.sh)
      
      The problem is that ftrace_graph_ret_addr() is supposed to operate on the
      ret_stack of a selected task but wrongly accesses the ret_stack of the
      current task. Specifically, the above NULL dereference occurs when
      task->curr_ret_stack is non-zero, but current->ret_stack is NULL.
      
      Correct ftrace_graph_ret_addr() to work with the right ret_stack.
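The following is a simplified model of the mix-up. It assumes a hypothetical fake_task type and a current_task global standing in for 'current'. The buggy shape bounds the index by the target task's depth but reads the current task's stack. The fixed shape uses the target task throughout. This is not the real ftrace_graph_ret_addr() signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task model; not the kernel's task_struct. */
struct fake_task {
	int curr_ret_stack;		/* number of saved return addresses */
	unsigned long *ret_stack;	/* NULL if graph tracing never ran */
};

struct fake_task *current_task;	/* stand-in for the kernel's "current" */

/* Buggy shape: bounds idx by @task but reads from current's stack. */
unsigned long ret_addr_buggy(struct fake_task *task, int idx)
{
	if (idx >= 0 && idx < task->curr_ret_stack)
		return current_task->ret_stack[idx];	/* may be NULL */
	return 0;
}

/* Fixed shape: every access goes through the task being unwound. */
unsigned long ret_addr_fixed(struct fake_task *task, int idx)
{
	assert(task != NULL);
	if (idx >= 0 && idx < task->curr_ret_stack && task->ret_stack)
		return task->ret_stack[idx];
	return 0;
}
```

Calling ret_addr_buggy() with a target task that has saved frames, while current_task->ret_stack is NULL, reproduces the NULL dereference pattern from the oops.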
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Link: https://lore.kernel.org/20240803131211.17255-1-petr.pavlu@suse.com
      Fixes: 7aa1eaef ("function_graph: Allow multiple users to attach to function graph")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  3. 07 Aug, 2024 5 commits
    • eventfs: Use SRCU for freeing eventfs_inodes · 8e556432
      Mathias Krause authored
      To mirror the SRCU lock held in eventfs_iterate() when iterating over
      eventfs inodes, use call_srcu() to free them too.
      
      This was accidentally(?) degraded to RCU in commit 43aa6f97
      ("eventfs: Get rid of dentry pointers without refcounts").
      
      Cc: Ajay Kaher <ajay.kaher@broadcom.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/20240723210755.8970-1-minipli@grsecurity.net
      Fixes: 43aa6f97 ("eventfs: Get rid of dentry pointers without refcounts")
      Signed-off-by: Mathias Krause <minipli@grsecurity.net>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • eventfs: Don't return NULL in eventfs_create_dir() · 12c20c65
      Mathias Krause authored
      Commit 77a06c33 ("eventfs: Test for ei->is_freed when accessing
      ei->dentry") added another check, testing if the parent was freed after
      we released the mutex. If so, the function returns NULL. However, all
      callers expect it to either return a valid pointer or an error pointer,
      at least since commit 5264a2f4 ("tracing: Fix a NULL vs IS_ERR() bug
      in event_subsystem_dir()"). Returning NULL will therefore fail the error
      condition check in the caller.
      
      Fix this by substituting the NULL return value with a fitting error
      pointer.
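The following user-space re-creation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention shows why a NULL return slips past an IS_ERR() check. The create_dir_*() functions are hypothetical. -ENODEV is an arbitrary error code chosen for this sketch, not necessarily the one the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space re-creation of the kernel's error-pointer helpers (sketch). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_dir;	/* stands in for a successfully created object */

/* Hypothetical creators; parent_freed models the ei->is_freed case. */
static void *create_dir_buggy(int parent_freed)
{
	return parent_freed ? NULL : &dummy_dir;
}

static void *create_dir_fixed(int parent_freed)
{
	return parent_freed ? ERR_PTR(-ENODEV) : &dummy_dir;
}
```

A caller that only checks IS_ERR() treats the NULL from create_dir_buggy(1) as success and would go on to dereference it.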
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: stable@vger.kernel.org
      Fixes: 77a06c33 ("eventfs: Test for ei->is_freed when accessing ei->dentry")
      Link: https://lore.kernel.org/20240723122522.2724-1-minipli@grsecurity.net
      Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Ajay Kaher <ajay.kaher@broadcom.com>
      Signed-off-by: Mathias Krause <minipli@grsecurity.net>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracefs: Fix inode allocation · 0df2ac59
      Mathias Krause authored
      The leading comment above alloc_inode_sb() is pretty explicit about it:
      
        /*
         * This must be used for allocating filesystems specific inodes to set
         * up the inode reclaim context correctly.
         */
      
      Switch tracefs over to alloc_inode_sb() to make sure inodes are properly
      linked.
      
      Cc: Ajay Kaher <ajay.kaher@broadcom.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/20240807115143.45927-2-minipli@grsecurity.net
      Fixes: ba37ff75 ("eventfs: Implement tracefs_inode_cache")
      Signed-off-by: Mathias Krause <minipli@grsecurity.net>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Use refcount for trace_event_file reference counter · 6e2fdcef
      Steven Rostedt authored
      Instead of using an atomic counter for the trace_event_file reference
      counter, use the refcount interface. It has various checks to make sure
      the reference counting is correct, and will warn if it detects an error
      (like refcount_inc() on '0').
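The following minimal sketch, using C11 atomics, shows the kind of check the refcount interface performs. my_refcount_t and its helpers are hypothetical and much simpler than the kernel's refcount_t, which among other things saturates instead of wrapping. The point is only that an increment from 0 is detected and reported rather than silently resurrecting the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical, much-simplified refcount; not the kernel's refcount_t. */
typedef struct {
	atomic_int refs;
} my_refcount_t;

static int refcount_warnings;	/* counts detected misuse */

static void my_refcount_inc(my_refcount_t *r)
{
	int old;

	assert(r != NULL);
	old = atomic_fetch_add(&r->refs, 1);
	if (old == 0) {
		/* 0 -> 1: the object may already be on its way to being freed. */
		fprintf(stderr, "refcount: increment on 0; use-after-free?\n");
		refcount_warnings++;
	}
}

/* Returns 1 if this call dropped the last reference. */
static int my_refcount_dec_and_test(my_refcount_t *r)
{
	int old;

	assert(r != NULL);
	old = atomic_fetch_sub(&r->refs, 1);
	if (old <= 0) {
		fprintf(stderr, "refcount: underflow\n");
		refcount_warnings++;
		return 0;
	}
	return old == 1;
}
```

A plain atomic counter performs the same arithmetic but never notices the 0 -> 1 transition. That is the gap the refcount interface closes.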
      
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Link: https://lore.kernel.org/20240726144208.687cce24@rorschach.local.home
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Have format file honor EVENT_FILE_FL_FREED · b1560408
      Steven Rostedt authored
      When eventfs was introduced, special care had to be taken to coordinate the
      freeing of the file meta data with the files that are exposed to user
      space. The file meta data would have a ref count that is set when the file
      is created and would be decremented and freed after the last user that
      opened the file closed it. When the file meta data was to be freed, it
      would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
      and any new references made (like new opens or reads) would fail as it is
      marked freed. This allowed other meta data to be freed after this flag was
      set (under the event_mutex).
      
      All the files that were dynamically created in the events directory had a
      pointer to the file meta data and would call event_release() when the last
      reference to the user space file was closed. This would be the time that it
      is safe to free the file meta data.
      
      A shortcut was made for the "format" file. Its i_private would point to
      the "call" entry directly and not point to the file's meta data. This is
      because all format files are the same for the same "call", so it was
      thought there was no reason to differentiate them.  The other files
      maintain state (like the "enable", "trigger", etc). But this meant if the
      file were to disappear, the "format" file would be unaware of it.
      
      This caused a race that could be triggered via the user_events test (that
      would create dynamic events and free them), and running a loop that would
      read the user_events format files:
      
      In one console run:
      
       # cd tools/testing/selftests/user_events
       # while true; do ./ftrace_test; done
      
      And in another console run:
      
       # cd /sys/kernel/tracing/
       # while true; do cat events/user_events/__test_event/format; done 2>/dev/null
      
      With KASAN memory checking, it would trigger a use-after-free bug report
      (which was a real bug). This was because the format file was not checking
      the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the
      event that the file meta data pointed to after the event was freed.
      
      Inspection found other locations that do not check the
      EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a
      new helper function: event_file_file() that will make sure that the
      event_mutex is held, and will return NULL if the trace_event_file has the
      EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file
      pointer use event_file_file() and check for NULL. Later uses can still use
      the event_file_data() helper function if the event_mutex is still held and
      was not released since the event_file_file() call.
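The following user-space model illustrates the helper's contract. It assumes a hypothetical fake_file type and an event_mutex_held flag standing in for lockdep_assert_held(); the flag value is illustrative only. The first access goes through event_file_file(), which refuses freed files. Later accesses under the same lock hold may use the unchecked accessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value; the kernel's bit assignment may differ. */
#define EVENT_FILE_FL_FREED (1u << 0)

struct trace_event_file {	/* simplified stand-in */
	unsigned int flags;
	void *event_call;
};

struct fake_file {		/* hypothetical stand-in for struct file */
	void *i_private;	/* points at the trace_event_file */
};

static bool event_mutex_held;	/* models lockdep_assert_held(&event_mutex) */

/* First access after taking the mutex: refuse freed meta data. */
static struct trace_event_file *event_file_file(struct fake_file *filp)
{
	struct trace_event_file *file = filp->i_private;

	assert(event_mutex_held);
	if (file->flags & EVENT_FILE_FL_FREED)
		return NULL;
	return file;
}

/*
 * Later accesses within the same mutex hold may use the unchecked
 * accessor. (This model also asserts the lock; the real helper may not.)
 */
static struct trace_event_file *event_file_data(struct fake_file *filp)
{
	assert(event_mutex_held);
	return filp->i_private;
}
```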
      
      Link: https://lore.kernel.org/all/20240719204701.1605950-1-minipli@grsecurity.net/
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Ajay Kaher <ajay.kaher@broadcom.com>
      Cc: Ilkka Naulapää <digirigawa@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Carpenter <dan.carpenter@linaro.org>
      Cc: Beau Belgrave <beaub@linux.microsoft.com>
      Cc: Florian Fainelli <florian.fainelli@broadcom.com>
      Cc: Alexey Makhalov <alexey.makhalov@broadcom.com>
      Cc: Vasavi Sirnapalli <vasavi.sirnapalli@broadcom.com>
      Link: https://lore.kernel.org/20240730110657.3b69d3c1@gandalf.local.home
      Fixes: b63db58e ("eventfs/tracing: Add callback for release of an eventfs_inode")
      Reported-by: Mathias Krause <minipli@grsecurity.net>
      Tested-by: Mathias Krause <minipli@grsecurity.net>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  4. 04 Aug, 2024 11 commits
    • Linux 6.11-rc2 · de9c2c66
      Linus Torvalds authored
    • profiling: remove profile=sleep support · b88f5538
      Tetsuo Handa authored
      The kernel sleep profile is no longer working due to a recursive locking
      bug introduced by commit 42a20f86 ("sched: Add wrapper for get_wchan()
      to keep task blocked").
      
      Booting with the 'profile=sleep' kernel command line option added or
      executing
      
        # echo -n sleep > /sys/kernel/profiling
      
      after boot causes the system to lock up.
      
      Lockdep reports
      
        kthreadd/3 is trying to acquire lock:
        ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
      
        but task is already holding lock:
        ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
      
      with the call trace being
      
         lock_acquire+0xc8/0x2f0
         get_wchan+0x32/0x70
         __update_stats_enqueue_sleeper+0x151/0x430
         enqueue_entity+0x4b0/0x520
         enqueue_task_fair+0x92/0x6b0
         ttwu_do_activate+0x73/0x140
         try_to_wake_up+0x213/0x370
         swake_up_locked+0x20/0x50
         complete+0x2f/0x40
         kthread+0xfb/0x180
      
      However, since nobody noticed this regression for more than two years,
      let's remove 'profile=sleep' support based on the assumption that nobody
      needs this functionality.
      
      Fixes: 42a20f86 ("sched: Add wrapper for get_wchan() to keep task blocked")
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a5dbd76a
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.
      
         A recent change in the ACPI code, which consolidated code paths, moved
         the invocation of init_freq_invariance_cppc() into a CPU hotplug
         handler. The first invocation on AMD CPUs ends up enabling a static
         branch, which deadlocks because enabling the static branch tries to
         acquire cpu_hotplug_lock, but that lock is already held for write by
         the hotplug machinery.

         Use static_branch_enable_cpuslocked() instead and take the hotplug
         lock for read in the Intel code path, which is invoked from the
         architecture code outside of the CPU hotplug operations.
      
       - Fix the number of reserved bits in the sev_config structure bit field
         so that the bitfield does not exceed 64 bits.
      
       - Add missing Zen5 model numbers
      
       - Fix the alignment assumptions of pti_clone_pgtable() and
         clone_entry_text() on 32-bit:
      
         The code assumes PMD aligned code sections, but on 32-bit the kernel
         entry text is not PMD aligned. So depending on the code size and
         location, which is configuration and compiler dependent, entry text
         can cross a PMD boundary. As the start is not PMD aligned, adding the
         PMD size to the start address yields an address beyond the end
         address, which results in partially mapped entry code for user space.
         That causes endless recursion on the first entry from userspace
         (usually #PF).
      
         Cure this by aligning the start address in the addition so it ends up
         at the next PMD start address.
      
         clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
         eventually be PTE mapped, which causes a mapping failure because the PMD for
         the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
         the clone() invocation which resolves to PTE on 32-bit and PMD on
         64-bit.
      
       - Zero the 8-byte case for get_user() on range check failure on 32-bit
      
         The recent consolidation of the 8-byte get_user() case broke the
         zeroing in the failure case again. Restore it by clearing ECX
         before the range check and not afterwards, as that obviously can't be
         reached when the range check fails.
      
      * tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
        x86/mm: Fix pti_clone_entry_text() for i386
        x86/mm: Fix pti_clone_pgtable() alignment assumption
        x86/setup: Parse the builtin command line before merging
        x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
        x86/sev: Fix __reserved field in sev_config
        x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
    • Linus Torvalds
      Merge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61ca6c78
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two fixes for the timer/clocksource code:
      
         - The recent fix to make the takeover of the broadcast timer more
           reliable retrieves a per-CPU pointer in preemptible context.

           This went unnoticed in testing as some compilers hoist the access
           into the non-preemptible section where the pointer is actually
           used, but compilers are equally entitled to emit it where the
           code put it.

           Move it into the non-preemptible section, right at the actual
           usage site, to cure it.
      
         - The clocksource watchdog is supposed to emit a warning when the
           retry count is greater than one and the number of retries reaches
           the limit.
      
           The condition is backwards and always warns when the count is
           greater than one. Fix up the condition to prevent spamming dmesg"
      
      * tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
        tick/broadcast: Move per CPU pointer access into the atomic section
    • Linus Torvalds
      Merge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6cc82dc2
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
      
       - When stime is larger than rtime due to accounting imprecision, then
         utime = rtime - stime becomes negative. As this is unsigned math, the
         result becomes a huge positive number.
      
         Cure it by resetting stime to rtime in that case, so utime becomes 0.
      
       - Restore consistent state when sched_cpu_deactivate() fails.
      
         When offlining a CPU fails in sched_cpu_deactivate() after the SMT
         present counter has been decremented, the function aborts but
         fails to increment the SMT present counter and leaves it imbalanced.
         Subsequent operations cause it to underflow. Add the missing fixup
         for the error path.

         For SMT accounting, the runqueue needs to be marked online again in
         the error exit path to restore a consistent state.
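The cputime fix can be illustrated with plain unsigned arithmetic. The helpers below are hypothetical and are not the kernel's cputime accounting code: one shows the wraparound when stime exceeds rtime, the other the clamp that makes utime 0 instead.

```c
#include <stdint.h>

/* Hypothetical: unguarded subtraction, as when stime > rtime slips through. */
uint64_t utime_naive(uint64_t rtime, uint64_t stime)
{
	return rtime - stime;	/* wraps modulo 2^64 */
}

/* Hypothetical: clamp stime to rtime first, so utime becomes 0. */
uint64_t utime_clamped(uint64_t rtime, uint64_t stime)
{
	if (stime > rtime)
		stime = rtime;
	return rtime - stime;
}
```

With rtime = 100 and stime = 120, the naive version returns UINT64_MAX - 19, while the clamped version returns 0.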
      
      * tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
        sched/core: Introduce sched_set_rq_on/offline() helper
        sched/smt: Fix unbalance sched_smt_present dec/inc
        sched/smt: Introduce sched_smt_present_inc/dec() helper
        sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
    • Linus Torvalds
      Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ddeb0ef
      Linus Torvalds authored
      Pull x86 perf fixes from Thomas Gleixner:
      
       - Move the smp_processor_id() invocation back into the non-preemptible
         region, so that the result is valid to use
      
       - Add the missing package C2 residency counters for Sierra Forest CPUs
         to make the newly added support actually useful
      
      * tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix smp_processor_id()-in-preemptible warnings
        perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
    • Linus Torvalds
      Merge tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 953f7764
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A couple of fixes for interrupt chip drivers:
      
         - Make sure to skip the clear register space in the MBIGEN driver
           when calculating the node register index. Otherwise the clear
           register is clobbered and the wrong node registers are accessed.
      
         - Fix a signed/unsigned confusion in the loongarch CPU driver, which
           converts an error code to a huge "valid" interrupt number.

         - Convert the meson GPIO interrupt controller lock to a raw spinlock
           so it works on RT.

         - Add a missing static to an internal function in the pic32 EVIC
           driver"
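A sketch of the signed/unsigned confusion described for the loongarch CPU driver. The function names are hypothetical, and returning 0 ("no interrupt") on failure is an assumption about the fixed behavior. A negative errno held in an int and returned as unsigned int turns into a huge value that looks like a valid interrupt number.

```c
/*
 * Hypothetical: ret stands in for the result of the GSI registration
 * call, either an interrupt number or a negative errno.
 */
unsigned int gsi_to_irq_buggy(int ret)
{
	/* Implicit int -> unsigned int conversion: -22 becomes UINT_MAX - 21. */
	return ret;
}

/* Hypothetical fixed shape: report any failure as 0 ("no interrupt"). */
unsigned int gsi_to_irq_fixed(int ret)
{
	return ret < 0 ? 0 : (unsigned int)ret;
}
```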
      
      * tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/mbigen: Fix mbigen node address layout
        irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t'
        irqchip/irq-pic32-evic: Add missing 'static' to internal function
        irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()
    • Linus Torvalds
      Merge tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3bc70ad1
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "Two fixes for locking and jump labels:
      
         - Ensure that the atomic_cmpxchg() conditions are correct. The
           condition evaluated to true on any non-zero old value, not only 1,
           and the missing check of the return value leads to an inconsistent
           state of the jump label counter.

         - Add a missing type conversion in the paravirt spinlock code, which
           makes the loongson build work again"
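The jump label bug follows from atomic_cmpxchg() returning the old value rather than a success flag. This userspace sketch emulates that return convention with C11 atomics; the function names are hypothetical. Using the returned value directly as a boolean is true for any non-zero old value, including 2, where no 1 to 0 transition happened; comparing it with 1 restricts success to the actual transition.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's atomic_cmpxchg(): returns the value
 * found in *v, whether or not the exchange happened.
 */
int cmpxchg_int(atomic_int *v, int old, int newv)
{
	/* On failure, the current value of *v is written back into old. */
	atomic_compare_exchange_strong(v, &old, newv);
	return old;
}

/* Buggy: true for any non-zero old value, even without a 1 -> 0 transition. */
bool dec_to_zero_buggy(atomic_int *enabled)
{
	return cmpxchg_int(enabled, 1, 0);
}

/* Fixed: true only when this call performed the 1 -> 0 transition. */
bool dec_to_zero_fixed(atomic_int *enabled)
{
	return cmpxchg_int(enabled, 1, 0) == 1;
}
```

With the counter at 2, the buggy check reports success although the counter is left unchanged, which is how the counter and the patched code drift out of sync.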
      
      * tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        jump_label: Fix the fix, brown paper bags galore
        locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()
    • Rob Herring (Arm)
      arm: dts: arm: versatile-ab: Fix duplicate clock node name · ff588380
      Rob Herring (Arm) authored
      Commit 04f08ef2 ("arm/arm64: dts: arm: Use generic clock and
      regulator nodenames") renamed nodes and created 2 "clock-24000000" nodes
      (at different paths).
      
      The kernel can't handle these duplicate names even though they are at
      different paths.  Fix this by renaming one of the nodes to "clock-pclk".
      
      This name is aligned with other Arm boards (those didn't have a known
      frequency to use in the node name).
      
      Fixes: 04f08ef2 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Tested-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Linus Torvalds
      Merge tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 3f3f6d61
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - two reparse point fixes
      
       - minor cleanup
      
       - additional trace point (to help debug a recent problem)
      
      * tag '6.11-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal version number
        smb: client: fix FSCTL_GET_REPARSE_POINT against NetApp
        smb3: add dynamic tracepoints for shutdown ioctl
        cifs: Remove cifs_aio_ctx
        smb: client: handle lack of FSCTL_GET_REPARSE_POINT support
    • Linus Torvalds
      Merge tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 3c41df42
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
      
       - two Kconfig fixes
      
       - one fix for the UVC driver addressing probing-time detection of UVC
         custom controls
      
       - one fix related to PDF generation
      
      * tag 'media/v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: v4l: Fix missing tabular column hint for Y14P format
        media: intel/ipu6: select AUXILIARY_BUS in Kconfig
        media: ipu-bridge: fix ipu6 Kconfig dependencies
        media: uvcvideo: Fix custom control mapping probing
  5. 03 Aug, 2024 5 commits
  6. 02 Aug, 2024 13 commits
    • Linus Torvalds
      Merge tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux · 17712b7e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Two minor tweaks for the NAPI handling, both from Olivier:
      
         - Kill two unused list definitions
      
         - Ensure that multishot NAPI doesn't age away"
      
      * tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux:
        io_uring: remove unused local list heads in NAPI functions
        io_uring: keep multishot request NAPI timeout current
    • Linus Torvalds
      Merge tag 'thermal-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · d9ef02e5
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "These fix a few issues related to MSI IRQ management in the
        int340x thermal driver, fix a thermal core issue that may lead to
        missing trip point crossing events and update the thermal core
        documentation.
      
        Specifics:
      
         - Fix MSI error path cleanup in int340x, allow it to work with a
           subset of thermal MSI IRQs if some of them are not working and make
           it free all MSI IRQs on module exit (Srinivas Pandruvada)
      
         - Fix a thermal core issue that may lead to missing trip point
           crossing events in some cases when thermal_zone_set_trips() is used
           and update the thermal core documentation (Rafael Wysocki)"
      
      * tag 'thermal-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: core: Update thermal zone registration documentation
        thermal: trip: Avoid skipping trips in thermal_zone_set_trips()
        thermal: intel: int340x: Free MSI IRQ vectors on module exit
        thermal: intel: int340x: Allow limited thermal MSI support
        thermal: intel: int340x: Fix kernel warning during MSI cleanup
    • Linus Torvalds
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 041b1061
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Expand the speculative SSBS errata workaround to more CPUs
      
       - Ensure jump label changes are visible to all CPUs with a
         kick_all_cpus_sync() (and also enable jump label batching as part of
         the fix)
      
       - The shadow call stack sanitiser is currently incompatible with Rust;
         make CONFIG_RUST conditional on !CONFIG_SHADOW_CALL_STACK
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: jump_label: Ensure patched jump_labels are visible to all CPUs
        rust: SHADOW_CALL_STACK is incompatible with Rust
        arm64: errata: Expand speculative SSBS workaround (again)
        arm64: cputype: Add Cortex-A725 definitions
        arm64: cputype: Add Cortex-X1C definitions
    • Linus Torvalds
      Merge tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client · 1c424629
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "A fix for a potential hang in the MDS when cap revocation races with
        the client releasing the caps in question, marked for stable"
      
      * tag 'ceph-for-6.11-rc2' of https://github.com/ceph/ceph-client:
        ceph: force sending a cap update msg back to MDS for revoke op
    • Linus Torvalds
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 725d410f
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "The bulk of the changes here is a largish change to guest_memfd,
        delaying the clearing and encryption of guest-private pages until they
        are actually added to guest page tables. This started as "let's make
        it impossible to misuse the API" for SEV-SNP; but then it ballooned a
        bit.
      
        The new logic is generally simpler and more ready for hugepage support
        in guest_memfd.
      
        Summary:
      
         - fix latent bug in how usage of large pages is determined for
           confidential VMs
      
         - fix "underline too short" in docs
      
         - eliminate log spam from limited APIC timer periods
      
         - disallow pre-faulting of memory before SEV-SNP VMs are initialized
      
         - delay clearing and encrypting private memory until it is added to
           guest page tables
      
         - this change also enables another small cleanup: the checks in
           SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can
           now be moved into the common kvm_gmem_populate() function

         - fix a compilation error that the RISC-V merge introduced in selftests"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86/mmu: fix determination of max NPT mapping level for private pages
        KVM: riscv: selftests: Fix compile error
        KVM: guest_memfd: abstract how prepared folios are recorded
        KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
        KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
        KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
        KVM: guest_memfd: move check for already-populated page to common code
        KVM: remove kvm_arch_gmem_prepare_needed()
        KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
        KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
        KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
        KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
        KVM: guest_memfd: do not go through struct page
        KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
        KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
        KVM: x86: disallow pre-fault for SNP VMs before initialization
        KVM: Documentation: Fix title underline too short warning
        KVM: x86: Eliminate log spam from limited APIC timer periods
    • Paolo Bonzini
      Merge branch 'kvm-fixes' into HEAD · 1773014a
      Paolo Bonzini authored
      * fix latent bug in how usage of large pages is determined for
        confidential VMs
      
      * fix "underline too short" in docs
      
      * eliminate log spam from limited APIC timer periods
      
      * disallow pre-faulting of memory before SEV-SNP VMs are initialized
      
      * delay clearing and encrypting private memory until it is added to
        guest page tables
      
      * this change also enables another small cleanup: the checks in
        SNP_LAUNCH_UPDATE that limit it to non-populated, private pages
        can now be moved into the common kvm_gmem_populate() function
    • Linus Torvalds
      Merge tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 948752d2
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A fix to avoid dropping some of the internal pseudo-extensions, which
         breaks *envcfg dependency parsing
      
       - The kernel entry address is now aligned in purgatory, which avoids a
         misaligned load that can lead to a crash on systems that don't
         support misaligned accesses early in boot

       - The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
         perf JSON configurations; one of them has been updated to
         FW_SFENCE_VMA_ASID_SENT
      
       - The starfive cache driver is now restricted to 64-bit systems, as it
         isn't 32-bit clean
      
       - A fix to avoid aliasing legacy-mode perf counters with software
         perf counters
      
       - VM_FAULT_SIGSEGV is now handled in the page fault code
      
       - A fix for stalls during CPU hotplug due to IPIs being disabled
      
       - A fix for memblock bounds checking. This manifests as a crash on
         systems with discontinuous memory maps that have regions that don't
         fit in the linear map
      
      * tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix linear mapping checks for non-contiguous memory regions
        RISC-V: Enable the IPI before workqueue_online_cpu()
        riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
        perf: riscv: Fix selecting counters in legacy mode
        cache: StarFive: Require a 64-bit system
        perf arch events: Fix duplicate RISC-V SBI firmware event name
        riscv/purgatory: align riscv_kernel_entry
        riscv: cpufeature: Do not drop Linux-internal extensions
    • Paolo Bonzini
      Merge tag 'kvm-riscv-fixes-6.11-1' of https://github.com/kvm-riscv/linux into HEAD · 29b5bbf7
      Paolo Bonzini authored
      KVM/riscv fixes for 6.11, take #1
      
      - Fix compile error in get-reg-list selftests
    • Linus Torvalds
      Merge tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 66242ef2
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - remove unused empty CPU alternatives header file
      
       - fix recently and erroneously removed exception handling when loading
         an invalid floating point register
      
       - ptdump fixes to reflect the recent changes due to the uncoupling of
         physical vs virtual kernel address spaces
      
       - changes to avoid the unnecessary splitting of large pages in kernel
         mappings
      
       - add the missing MODULE_DESCRIPTION for the CIO modules
      
      * tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: Keep inittext section writable
        s390/vmlinux.lds.S: Move ro_after_init section behind rodata section
        s390/mm: Get rid of RELOC_HIDE()
        s390/mm/ptdump: Improve sorting of markers
        s390/mm/ptdump: Add support for relocated lowcore mapping
        s390/mm/ptdump: Fix handling of identity mapping area
        s390/cio: Add missing MODULE_DESCRIPTION() macros
        s390/alternatives: Remove unused empty header file
        s390/fpu: Re-add exception handling in load_fpu_state()
    • Paul E. McKenney
      clocksource: Fix brown-bag boolean thinko in cs_watchdog_read() · f2655ac2
      Paul E. McKenney authored
      The current "nretries > 1 || nretries >= max_retries" check in
      cs_watchdog_read() will always evaluate to true, and thus pr_warn(), if
      nretries is greater than 1.  The intent is instead to never warn on the
      first try, but otherwise warn if the successful retry was the last retry.
      
      Therefore, change that "||" to "&&".
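The two conditions from the commit message, isolated as predicates in a small C sketch. The function names are hypothetical; in cs_watchdog_read() the expression decides whether pr_warn() is called.

```c
#include <stdbool.h>

/* Original condition: warns on every retry after the first, at any limit. */
bool should_warn_buggy(int nretries, int max_retries)
{
	return nretries > 1 || nretries >= max_retries;
}

/* Fixed condition: never warn on the first try; otherwise only at the limit. */
bool should_warn_fixed(int nretries, int max_retries)
{
	return nretries > 1 && nretries >= max_retries;
}
```

With nretries = 2 and max_retries = 5, the original condition warns although the retry limit was not reached; the fixed one stays quiet until nretries reaches 5.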
      
      Fixes: db3a34e1 ("clocksource: Retry clock read if long delays detected")
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/all/20240802154618.4149953-2-paulmck@kernel.org
    • Linus Torvalds
      Merge tag 'asm-generic-fixes-6.11-1' of... · 29ccb40f
      Linus Torvalds authored
      Merge tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
      
      Pull asm-generic fixes from Arnd Bergmann:
       "These are three important bug fixes for the cross-architecture tree,
        fixing a regression with the new syscall.tbl file, the inconsistent
        numbering for the new uretprobe syscall and a bug with iowrite64be on
        alpha"
      
      * tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        syscalls: fix syscall macros for newfstat/newfstatat
        uretprobe: change syscall number, again
        alpha: fix ioread64be()/iowrite64be() helpers
    • Linus Torvalds
      Merge tag 'sound-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 6b779f8a
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A small collection of fixes:
      
         - Revert of FireWire changes that caused a long-time regression
      
         - Another long-time regression fix for AMD HDMI
      
         - MIDI2 UMP fixes
      
         - HD-audio Conexant codec fixes and a quirk"
      
      * tag 'sound-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda: Conditionally use snooping for AMD HDMI
        ALSA: usb-audio: Correct surround channels in UAC1 channel map
        ALSA: seq: ump: Explicitly reset RPN with Null RPN
        ALSA: seq: ump: Transmit RPN/NRPN message at each MSB/LSB data reception
        ALSA: seq: ump: Use the common RPN/bank conversion context
        ALSA: ump: Explicitly reset RPN with Null RPN
        ALSA: ump: Transmit RPN/NRPN message at each MSB/LSB data reception
        Revert "ALSA: firewire-lib: operate for period elapse event in process context"
        Revert "ALSA: firewire-lib: obsolete workqueue for period update"
        ALSA: hda/realtek: Add quirk for Acer Aspire E5-574G
        ALSA: seq: ump: Optimize conversions from SysEx to UMP
        ALSA: hda/conexant: Mute speakers at suspend / shutdown
        ALSA: hda/generic: Add a helper to mute speakers at suspend/shutdown
        ALSA: hda: conexant: Fix headset auto detect fail in the polling mode
    • Linus Torvalds
      Merge tag 'drm-fixes-2024-08-02' of https://gitlab.freedesktop.org/drm/kernel · 29b4a699
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Regular weekly fixes. This is a bit larger than usual but doesn't seem
        too crazy.
      
        Most of it is vmwgfx changes that fix a bunch of issues with wayland
        userspaces with dma-buf/external buffers and modesetting fixes.
      
        Otherwise it's kinda spread out, v3d fixes some new ioctls, nouveau
        has a regression revert and fixes, amdgpu, i915 and ast have some small
        fixes, and some core fixes spread about.
      
        client:
         - fix error code
      
        atomic:
         - allow damage clips with async flips
         - allow explicit sync with async flips
      
        kselftests:
         - fix dmabuf-heaps test
      
        panic:
         - fix schedule_work in panic paths
      
        panel:
         - fix OrangePi Neo orientation
      
        gpuvm:
         - fix missing dependency
      
        amdgpu:
         - SMU 14.x update
         - Fix contiguous VRAM handling for IB parsing
         - GFX 12 fix
         - Regression fix for old APUs
      
        i915:
         - Static analysis fix for int overflow
         - Fix for HDCP2_STREAM_STATUS macro and removal of PWR_CLK_STATE for gen12
      
        nouveau:
         - revert busy wait change that caused a resume regression
         - fix buffer placement fault on dynamic pm s/r
         - fix refcount underflow
      
        ast:
         - fix black screen on resume
         - wake during connector status detect
      
        v3d:
         - fix issues with perf/timestamp ioctls
      
        vmwgfx:
         - fix deadlock in dma-buf fence polling
         - fix screen surface refcounting
         - fix dumb buffer handling
         - fix support for external buffers
         - fix overlay with screen targets
         - trigger modeset on screen moves"
      
      * tag 'drm-fixes-2024-08-02' of https://gitlab.freedesktop.org/drm/kernel: (31 commits)
        Revert "nouveau: rip out busy fence waits"
        nouveau: set placement to original placement on uvmm validate.
        drm/atomic: Allow userspace to use damage clips with async flips
        drm/atomic: Allow userspace to use explicit sync with atomic async flips
        drm/i915: Fix possible int overflow in skl_ddi_calculate_wrpll()
        drm/i915/hdcp: Fix HDCP2_STREAM_STATUS macro
        drm/ast: astdp: Wake up during connector status detection
        i915/perf: Remove code to update PWR_CLK_STATE for gen12
        kselftests: dmabuf-heaps: Ensure the driver name is null-terminated
        drm/client: Fix error code in drm_client_buffer_vmap_local()
        drm/amdgpu: Fix APU handling in amdgpu_pm_load_smu_firmware()
        drm/amdgpu: increase mes log buffer size for gfx12
        drm/amdgpu: fix contiguous handling for IB parsing v2
        drm/amdgpu/pm: support gpu_metrics sysfs interface for smu v14.0.2/3
        drm/vmwgfx: Trigger a modeset when the screen moves
        drm/vmwgfx: Fix overlay when using Screen Targets
        drm/vmwgfx: Add basic support for external buffers
        drm/vmwgfx: Fix handling of dumb buffers
        drm/vmwgfx: Make sure the screen surface is ref counted
        drm/vmwgfx: Fix a deadlock in dma buf fence polling
        ...