1. 06 Sep, 2024 25 commits
  2. 05 Sep, 2024 15 commits
    • Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · f9535999
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A few small driver specific fixes (including some of the widespread
        work on fixing missing ID tables for module autoloading and the revert
        of some problematic PM work in spi-rockchip), some improvements to the
        MAINTAINERS information for the NXP drivers and the addition of a new
        device ID to spidev"
      
      * tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
        MAINTAINERS: SPI: Add freescale lpspi maintainer information
        spi: spi-fsl-lpspi: Fix off-by-one in prescale max
        spi: spidev: Add missing spi_device_id for jg10309-01
        spi: bcm63xx: Enable module autoloading
        spi: intel: Add check devm_kasprintf() returned value
        spi: spidev: Add an entry for elgin,jg10309-01
        spi: rockchip: Resolve unbalanced runtime PM / system PM handling
    • Merge tag 'regulator-fix-v6.11-stub' of... · 2a660447
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fix from Mark Brown:
       "A fix from Doug Anderson for a missing stub, required to fix the build
        for some newly added users of devm_regulator_bulk_get_const() in
        !REGULATOR configurations"
      
      * tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
    • Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux · 6c5b3e30
      Linus Torvalds authored
      Pull Rust fixes from Miguel Ojeda:
       "Toolchain and infrastructure:
      
         - Fix builds for nightly compiler users now that 'new_uninit' was
           split into new features by using an alternative approach for the
           code that used what is now called the 'box_uninit_write' feature
      
         - Allow the 'stable_features' lint to preempt upcoming warnings about
           them, since soon there will be unstable features that will become
           stable in nightly compilers
      
         - Export bss symbols too
      
        'kernel' crate:
      
         - 'block' module: fix wrong usage of lockdep API
      
        'macros' crate:
      
         - Provide correct provenance when constructing 'THIS_MODULE'
      
        Documentation:
      
         - Remove unintended indentation (blockquotes) in generated output
      
         - Fix a couple typos
      
        MAINTAINERS:
      
         - Remove Wedson as Rust maintainer
      
         - Update Andreas' email"
      
      * tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux:
        MAINTAINERS: update Andreas Hindborg's email address
        MAINTAINERS: Remove Wedson as Rust maintainer
        rust: macros: provide correct provenance when constructing THIS_MODULE
        rust: allow `stable_features` lint
        docs: rust: remove unintended blockquote in Quick Start
        rust: alloc: eschew `Box<MaybeUninit<T>>::write`
        rust: kernel: fix typos in code comments
        docs: rust: remove unintended blockquote in Coding Guidelines
        rust: block: fix wrong usage of lockdep API
        rust: kbuild: fix export of bss symbols
    • Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · e4b42053
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix adding a new fgraph callback after function graph tracing has
         already started.
      
         If the new caller does not initialize its hash before registering the
         fgraph_ops, it can cause a NULL pointer dereference. Fix this by
         adding a new parameter to ftrace_graph_enable_direct() that passes in
         the newly added gops directly rather than relying on the
         fgraph_array[], as entries in the fgraph_array[] must be initialized.
      
         Assign the new gops to the fgraph_array[] after it goes through
         ftrace_startup_subops() as that will properly initialize the
         gops->ops and initialize its hashes.
      
       - Fix a memory leak in fgraph storage memory test.
      
         If the "multiple fgraph storage on a function" boot up selftest fails
         in the registering of the function graph tracer, it will not free the
         memory it allocated for the filter. Break the loop up into two where
         it allocates the filters first and then registers the functions where
         any errors will do the appropriate clean ups.
      
       - Only clear the timerlat timers if it has an associated kthread.
      
         In the rtla tool that uses timerlat, if it was killed just as it was
         shutting down, the signals can free the kthread and the timer. But
         the closing of the timerlat files could cause the hrtimer_cancel() to
         be called on the already freed timer. As the kthread variable is
         set to NULL when the kthreads are stopped and the timers are freed it
         can be used to know not to call hrtimer_cancel() on the timer if the
         kthread variable is NULL.
      
       - Use a cpumask to keep track of osnoise/timerlat kthreads
      
         The timerlat tracer can use user space threads for its analysis. When
         the rtla tool is killed, the kernel can get confused about whether it
         is using a user space thread for the analysis or one of its own kernel
         threads. When this confusion happens, kthread_stop() can be called on
         a user space thread and bad things happen. As the kernel threads are
         per-cpu, a bitmask can be used to know when a kernel thread is used
         or when a user space thread is used.
      
       - Add missing interface_lock to osnoise/timerlat stop_kthread()
      
         The stop_kthread() function in osnoise/timerlat clears the osnoise
         kthread variable, and if it was a user space thread does a put_task
         on it. But this can race with the closing of the timerlat files that
         also does a put_task on the kthread, and if the race happens the task
         will have put_task called on it twice and oops.
      
       - Add cond_resched() to the tracing_iter_reset() loop.
      
         The latency tracers keep writing to the ring buffer without resetting
         when it issues a new "start" event (like interrupts being disabled).
         When reading the buffer with an iterator, the tracing_iter_reset()
         sets its pointer to that start event by walking through all the
         events in the buffer until it gets to the time stamp of the start
         event. In the case of a very large buffer, the loop that looks for
         the start event has been reported to take so long on a non-preempt
         kernel that it can trigger a soft lockup warning. Add a
         cond_resched() into that loop to make sure that doesn't happen.
      
       - Use list_del_rcu() for eventfs ei->list variable
      
         It was reported that running loops of creating and deleting kprobe
         events could cause a crash due to the eventfs list iteration hitting
         a LIST_POISON variable. This is because the list is protected by SRCU
         but when an item is deleted from the list, it was using list_del()
         which poisons the "next" pointer. This is what list_del_rcu() is
         meant to prevent.
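
The difference between the two deletion helpers can be illustrated with a small user-space model. This is only a sketch, not kernel code: `Node`, `POISON`, `reader_step()` and `build()` are hypothetical stand-ins, and the model ignores memory reclamation and grace periods entirely. It only shows why a reader that is already standing on a removed entry needs an intact "next" pointer.

```python
# Toy user-space model of why list_del() is unsafe for lockless readers.
# Not kernel code: Node, POISON and all helpers are illustrative stand-ins.

POISON = object()  # stands in for the kernel's LIST_POISON values


class Node:
    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Insert 'new' just before 'head' in a circular doubly linked list."""
    new.prev, new.next = head.prev, head
    head.prev.next = new
    head.prev = new


def _unlink(node):
    node.prev.next = node.next
    node.next.prev = node.prev


def list_del(node):
    """Unlink and poison both pointers, like the non-RCU helper."""
    _unlink(node)
    node.next = POISON
    node.prev = POISON


def list_del_rcu(node):
    """Unlink but keep 'next' intact so in-flight readers can move on."""
    _unlink(node)
    node.prev = POISON


def reader_step(node):
    """A reader already standing on 'node' advances to the next entry."""
    if node.next is POISON:
        raise RuntimeError("reader followed a poisoned next pointer")
    return node.next


def build():
    """Return a list head plus three entries a, b, c linked in order."""
    head = Node("head")
    nodes = [Node(n) for n in ("a", "b", "c")]
    for n in nodes:
        list_add_tail(n, head)
    return head, nodes
```

In this model, a reader positioned on a node removed with `list_del_rcu()` can still step to the following entry, while the same step after `list_del()` raises, which mirrors the LIST_POISON crash described above.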
      
      * tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
        tracing/timerlat: Only clear timer if a kthread exists
        tracing/osnoise: Use a cpumask to know what threads are kthreads
        eventfs: Use list_del_rcu() for SRCU protected list variable
        tracing: Avoid possible softlockup in tracing_iter_reset()
        tracing: Fix memory leak in fgraph storage selftest
        tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
    • ila: call nf_unregister_net_hooks() sooner · 031ae728
      Eric Dumazet authored
      syzbot found a use-after-free read in ila_nf_input [1]

      The issue here is that ila_xlat_exit_net() frees the rhashtable,
      then calls nf_unregister_net_hooks().

      It should be done in the reverse order, with a synchronize_rcu().
      
      This is a good match for a pre_exit() method.
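
The ordering problem can be sketched with a toy model. Everything here is an assumption made for illustration: the `Netns` class, `buggy_exit()` and `fixed_exit()` are hypothetical names, a Python dict stands in for the rhashtable, and the RCU grace period that the real teardown relies on is reduced to a comment.

```python
# Toy model of per-netns teardown ordering; all names are illustrative.

class Netns:
    def __init__(self):
        self.table = {"key": "mapping"}   # stands in for the rhashtable
        self.hooks = [self.ila_input]     # stands in for registered nf hooks

    def ila_input(self, key):
        if self.table is None:
            raise RuntimeError("use-after-free: lookup in freed table")
        return self.table.get(key)

    def deliver(self, key):
        """Run every registered hook, as the receive path would."""
        return [hook(key) for hook in self.hooks]

    def unregister_hooks(self):
        self.hooks.clear()

    def free_table(self):
        self.table = None


def buggy_exit(net, on_packet):
    net.free_table()
    on_packet()              # a packet can still reach the hook here
    net.unregister_hooks()


def fixed_exit(net, on_packet):
    net.unregister_hooks()   # pre_exit step: no new hook invocations
    on_packet()              # the kernel also waits for RCU readers here
    net.free_table()         # exit step: nothing can reach the table now
```

With `buggy_exit()` a packet arriving between the two steps hits the freed table; with `fixed_exit()` the same packet finds no hook to run.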
      
      [1]
       BUG: KASAN: use-after-free in rht_key_hashfn include/linux/rhashtable.h:159 [inline]
       BUG: KASAN: use-after-free in __rhashtable_lookup include/linux/rhashtable.h:604 [inline]
       BUG: KASAN: use-after-free in rhashtable_lookup include/linux/rhashtable.h:646 [inline]
       BUG: KASAN: use-after-free in rhashtable_lookup_fast+0x77a/0x9b0 include/linux/rhashtable.h:672
      Read of size 4 at addr ffff888064620008 by task ksoftirqd/0/16
      
      CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.11.0-rc4-syzkaller-00238-g2ad6d23f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:93 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        rht_key_hashfn include/linux/rhashtable.h:159 [inline]
        __rhashtable_lookup include/linux/rhashtable.h:604 [inline]
        rhashtable_lookup include/linux/rhashtable.h:646 [inline]
        rhashtable_lookup_fast+0x77a/0x9b0 include/linux/rhashtable.h:672
        ila_lookup_wildcards net/ipv6/ila/ila_xlat.c:132 [inline]
        ila_xlat_addr net/ipv6/ila/ila_xlat.c:652 [inline]
        ila_nf_input+0x1fe/0x3c0 net/ipv6/ila/ila_xlat.c:190
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK+0x29e/0x450 include/linux/netfilter.h:312
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1ea/0x650 net/core/dev.c:5775
        process_backlog+0x662/0x15b0 net/core/dev.c:6108
        __napi_poll+0xcb/0x490 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0x89b/0x1240 net/core/dev.c:6963
        handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
        run_ksoftirqd+0xca/0x130 kernel/softirq.c:928
        smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
        kthread+0x2f0/0x390 kernel/kthread.c:389
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
      
      The buggy address belongs to the physical page:
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x64620
      flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xbfffffff(buddy)
      raw: 00fff00000000000 ffffea0000959608 ffffea00019d9408 0000000000000000
      raw: 0000000000000000 0000000000000003 00000000bfffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as freed
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 5242, tgid 5242 (syz-executor), ts 73611328570, free_ts 618981657187
        set_page_owner include/linux/page_owner.h:32 [inline]
        post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493
        prep_new_page mm/page_alloc.c:1501 [inline]
        get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439
        __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695
        __alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
        alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
        ___kmalloc_large_node+0x8b/0x1d0 mm/slub.c:4103
        __kmalloc_large_node_noprof+0x1a/0x80 mm/slub.c:4130
        __do_kmalloc_node mm/slub.c:4146 [inline]
        __kmalloc_node_noprof+0x2d2/0x440 mm/slub.c:4164
        __kvmalloc_node_noprof+0x72/0x190 mm/util.c:650
        bucket_table_alloc lib/rhashtable.c:186 [inline]
        rhashtable_init_noprof+0x534/0xa60 lib/rhashtable.c:1071
        ila_xlat_init_net+0xa0/0x110 net/ipv6/ila/ila_xlat.c:613
        ops_init+0x359/0x610 net/core/net_namespace.c:139
        setup_net+0x515/0xca0 net/core/net_namespace.c:343
        copy_net_ns+0x4e2/0x7b0 net/core/net_namespace.c:508
        create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110
        unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228
        ksys_unshare+0x619/0xc10 kernel/fork.c:3328
        __do_sys_unshare kernel/fork.c:3399 [inline]
        __se_sys_unshare kernel/fork.c:3397 [inline]
        __x64_sys_unshare+0x38/0x40 kernel/fork.c:3397
      page last free pid 11846 tgid 11846 stack trace:
        reset_page_owner include/linux/page_owner.h:25 [inline]
        free_pages_prepare mm/page_alloc.c:1094 [inline]
        free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612
        __folio_put+0x2c8/0x440 mm/swap.c:128
        folio_put include/linux/mm.h:1486 [inline]
        free_large_kmalloc+0x105/0x1c0 mm/slub.c:4565
        kfree+0x1c4/0x360 mm/slub.c:4588
        rhashtable_free_and_destroy+0x7c6/0x920 lib/rhashtable.c:1169
        ila_xlat_exit_net+0x55/0x110 net/ipv6/ila/ila_xlat.c:626
        ops_exit_list net/core/net_namespace.c:173 [inline]
        cleanup_net+0x802/0xcc0 net/core/net_namespace.c:640
        process_one_work kernel/workqueue.c:3231 [inline]
        process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
        worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
        kthread+0x2f0/0x390 kernel/kthread.c:389
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      Memory state around the buggy address:
       ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                            ^
       ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Fixes: 7f00feaf ("ila: Add generic ILA translation facility")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools/net/ynl: fix cli.py --subscribe feature · 6fda63c4
      Arkadiusz Kubalewski authored
      Execution of command:
      ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
      	--subscribe "monitor" --sleep 10
      fails with:
        File "/repo/./tools/net/ynl/cli.py", line 109, in main
          ynl.check_ntf()
        File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
          op = self.rsp_by_value[nl_msg.cmd()]
      KeyError: 19
      
      Parsing Generic Netlink notification messages performs a lookup of the
      op for the message. At that point the message has not been decoded yet
      and is not yet treated as a GenlMsg, so msg.cmd() returns the Generic
      Netlink family id (19) instead of the proper notification command id
      (e.g. DPLL_CMD_PIN_CHANGE_NTF=13).

      Allow the op to be obtained within NetlinkProtocol.decode(..) itself if
      no op was passed to the decode function, so that Generic Netlink
      notifications can be parsed without this failure.
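
The idea of deferring the op lookup until after decoding can be sketched as follows. This is not the ynl implementation: `RawMsg`, `GenlMsg`, `Protocol` and the one-entry `rsp_by_value` table are simplified, hypothetical stand-ins, and only the two numeric ids are taken from the report above.

```python
# Minimal sketch of the "look up the op after decoding" idea.
# This is not the ynl code: every class and field is a simplified stand-in.

FAMILY_ID = 19          # value seen in the KeyError from the report
PIN_CHANGE_NTF = 13     # DPLL_CMD_PIN_CHANGE_NTF from the commit message


class RawMsg:
    """Undecoded netlink message: cmd() reports the family id."""
    def __init__(self, nl_type, genl_cmd):
        self.nl_type = nl_type
        self.genl_cmd = genl_cmd

    def cmd(self):
        return self.nl_type


class GenlMsg:
    """Decoded Generic Netlink message: cmd() reports the genl command."""
    def __init__(self, raw):
        self.raw = raw

    def cmd(self):
        return self.raw.genl_cmd


class Protocol:
    def decode(self, rsp_by_value, raw, op=None):
        msg = GenlMsg(raw)
        if op is None:
            # Only the decoded message knows the real command id.
            op = rsp_by_value[msg.cmd()]
        return msg, op


rsp_by_value = {PIN_CHANGE_NTF: "pin-change-ntf"}
raw = RawMsg(FAMILY_ID, PIN_CHANGE_NTF)
```

Indexing `rsp_by_value` with `raw.cmd()` raises `KeyError: 19` in this model, while `Protocol().decode(rsp_by_value, raw)` resolves the op from the decoded command id.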
      Suggested-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/netdev/m2le0n5xpn.fsf@gmail.com/
      Fixes: 0a966d60 ("tools/net/ynl: Fix extack decoding for directional ops")
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://patch.msgid.link/20240904135034.316033-1-arkadiusz.kubalewski@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: fix ptp ocp driver maintainers address · 20d664eb
      Vadim Fedorenko authored
      While checking the latest series for the ptp_ocp driver I realised that
      the MAINTAINERS file has a wrong entry for the email on the linux.dev domain.
      
      Fixes: 795fd934 ("ptp_ocp: adjust MAINTAINERS and mailmap")
      Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240904131855.559078-1-vadim.fedorenko@linux.dev
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: enable bind tests · e4af74a5
      Jamie Bainbridge authored
      bind_wildcard is compiled but not run, bind_timewait is not compiled.
      
      These two tests complete in a very short time, use the test harness
      properly, and seem reasonable to enable.
      
      The author of the tests confirmed via email that these were
      intended to be run.
      
      Enable these two tests.
      
      Fixes: 13715acf ("selftest: Add test for bind() conflicts.")
      Fixes: 2c042e8e ("tcp: Add selftest for bind() and TIME_WAIT.")
      Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/5a009b26cf5fb1ad1512d89c61b37e2fac702323.1725430322.git.jamie.bainbridge@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers · c9ca76e8
      Frank Li authored
      Add mailing list imx@lists.linux.dev for NXP SPI drivers (qspi, fspi and
      dspi).
      Signed-off-by: Frank Li <Frank.Li@nxp.com>
      Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
      Link: https://patch.msgid.link/20240905155230.1901787-1-Frank.Li@nxp.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • MAINTAINERS: SPI: Add freescale lpspi maintainer information · fb9820c5
      Frank Li authored
      Add imx@lists.linux.dev and NXP maintainer information for lpspi driver
      (drivers/spi/spi-fsl-lpspi.c).
      Signed-off-by: Frank Li <Frank.Li@nxp.com>
      Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
      Link: https://patch.msgid.link/20240905154124.1901311-1-Frank.Li@nxp.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • Merge tag 'platform-drivers-x86-v6.11-6' of... · ad618736
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Ilpo Järvinen:
      
       - amd/pmf: ASUS GA403 quirk matching tweak
      
       - dell-smbios: Fix to the init function rollback path
      
      * tag 'platform-drivers-x86-v6.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86/amd: pmf: Make ASUS GA403 quirk generic
        platform/x86: dell-smbios: Fix error path in dell_smbios_init()
    • Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of... · 120434e5
      Linus Torvalds authored
      Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kunit fix from Shuah Khan:
       "One single fix to a use-after-free bug resulting from
        kunit_driver_create() failing to copy the driver name leaving it on
        the stack or freeing it"
      
      * tag 'linux_kselftest-kunit-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kunit: Device wrappers should also manage driver name
    • tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread() · 5bfbcd1e
      Steven Rostedt authored
      The timerlat interface will get and put the task that is part of the
      "kthread" field of the osn_var to keep it around until all references are
      released. But there's a race in the "stop_kthread()" code that will call
      put_task_struct() on the kthread if it is not a kernel thread. This can
      race with the releasing of the references to that task struct and the
      put_task_struct() can be called twice when it should have been called just
      once.
      
      Take the interface_lock() in stop_kthread() to synchronize this change.
      But to do so, the function stop_per_cpu_kthreads() needs to change the
      loop from for_each_online_cpu() to for_each_possible_cpu() and remove the
      cpu_read_lock(), as the interface_lock can not be taken while the cpu
      locks are held. The only side effect of this change is that it may do some
      extra work, as the per_cpu variables of the offline CPUs would not be set
      anyway, and would simply be skipped in the loop.
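
The double put can be modelled in user space with a lock that makes "read the pointer and clear it" one atomic step. This is a rough sketch under stated assumptions: `Task`, `OsnVar`, `release_kthread()` and `run_racers()` are hypothetical names, a `threading.Lock` stands in for the interface_lock, and the per-CPU loop change described above is not modelled.

```python
import threading

# Toy model of dropping a task reference at most once; names are illustrative.


class Task:
    def __init__(self):
        self.refs = 1

    def put(self):
        if self.refs == 0:
            raise RuntimeError("put on a task with no references left")
        self.refs -= 1


class OsnVar:
    def __init__(self, task):
        self.kthread = task


interface_lock = threading.Lock()  # stands in for the kernel's interface_lock


def release_kthread(var):
    """Clear var.kthread under the lock and drop its reference at most once."""
    with interface_lock:
        task, var.kthread = var.kthread, None
    if task is not None:
        task.put()
        return True
    return False


def run_racers(var, n=8):
    """Call release_kthread() from n threads and collect who did the put."""
    results = []

    def racer():
        results.append(release_kthread(var))

    threads = [threading.Thread(target=racer) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because only one caller can observe a non-None pointer, exactly one of the racing paths performs the put, whichever of them gets there first.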
      
      Remove unneeded "return;" in stop_kthread().
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tomas Glozar <tglozar@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Link: https://lore.kernel.org/20240905113359.2b934242@gandalf.local.home
      Fixes: e88ed227 ("tracing/timerlat: Add user-space interface")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/timerlat: Only clear timer if a kthread exists · e6a53481
      Steven Rostedt authored
      The timerlat tracer can use user space threads to check for osnoise and
      timer latency. If the program using this is killed via a SIGTERM, the
      threads are shut down one at a time and another tracing instance can start
      up resetting the threads before they are fully closed. That causes the
      hrtimer assigned to the kthread to be shut down and freed twice when the
      dying thread finally closes the file descriptors, causing a use-after-free
      bug.
      
      Only cancel the hrtimer if the associated thread is still around. Also add
      the interface_lock around the resetting of the tlat_var->kthread.
      
      Note, this is just a quick fix that can be backported to stable. A real
      fix is to have a better synchronization between the shutdown of old
      threads and the starting of new ones.
      
      Link: https://lore.kernel.org/all/20240820130001.124768-1-tglozar@redhat.com/
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Link: https://lore.kernel.org/20240905085330.45985730@gandalf.local.home
      Fixes: e88ed227 ("tracing/timerlat: Add user-space interface")
      Reported-by: Tomas Glozar <tglozar@redhat.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/osnoise: Use a cpumask to know what threads are kthreads · 177e1cc2
      Steven Rostedt authored
      The start_kthread() and stop_kthread() code was not always called with the
      interface_lock held. This means that the kthread variable could be
      unexpectedly changed causing the kthread_stop() to be called on it when it
      should not have been, leading to:
      
       while true; do
         rtla timerlat top -u -q & PID=$!;
         sleep 5;
         kill -INT $PID;
         sleep 0.001;
         kill -TERM $PID;
         wait $PID;
       done
      
      Causing the following OOPS:
      
       Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN PTI
       KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
       CPU: 5 UID: 0 PID: 885 Comm: timerlatu/5 Not tainted 6.11.0-rc4-test-00002-gbc754cc7-dirty #125 a533010b71dab205ad2f507188ce8c82203b0254
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
       RIP: 0010:hrtimer_active+0x58/0x300
       Code: 48 c1 ee 03 41 54 48 01 d1 48 01 d6 55 53 48 83 ec 20 80 39 00 0f 85 30 02 00 00 49 8b 6f 30 4c 8d 75 10 4c 89 f0 48 c1 e8 03 <0f> b6 3c 10 4c 89 f0 83 e0 07 83 c0 03 40 38 f8 7c 09 40 84 ff 0f
       RSP: 0018:ffff88811d97f940 EFLAGS: 00010202
       RAX: 0000000000000002 RBX: ffff88823c6b5b28 RCX: ffffed10478d6b6b
       RDX: dffffc0000000000 RSI: ffffed10478d6b6c RDI: ffff88823c6b5b28
       RBP: 0000000000000000 R08: ffff88823c6b5b58 R09: ffff88823c6b5b60
       R10: ffff88811d97f957 R11: 0000000000000010 R12: 00000000000a801d
       R13: ffff88810d8b35d8 R14: 0000000000000010 R15: ffff88823c6b5b28
       FS:  0000000000000000(0000) GS:ffff88823c680000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000561858ad7258 CR3: 000000007729e001 CR4: 0000000000170ef0
       Call Trace:
        <TASK>
        ? die_addr+0x40/0xa0
        ? exc_general_protection+0x154/0x230
        ? asm_exc_general_protection+0x26/0x30
        ? hrtimer_active+0x58/0x300
        ? __pfx_mutex_lock+0x10/0x10
        ? __pfx_locks_remove_file+0x10/0x10
        hrtimer_cancel+0x15/0x40
        timerlat_fd_release+0x8e/0x1f0
        ? security_file_release+0x43/0x80
        __fput+0x372/0xb10
        task_work_run+0x11e/0x1f0
        ? _raw_spin_lock+0x85/0xe0
        ? __pfx_task_work_run+0x10/0x10
        ? poison_slab_object+0x109/0x170
        ? do_exit+0x7a0/0x24b0
        do_exit+0x7bd/0x24b0
        ? __pfx_migrate_enable+0x10/0x10
        ? __pfx_do_exit+0x10/0x10
        ? __pfx_read_tsc+0x10/0x10
        ? ktime_get+0x64/0x140
        ? _raw_spin_lock_irq+0x86/0xe0
        do_group_exit+0xb0/0x220
        get_signal+0x17ba/0x1b50
        ? vfs_read+0x179/0xa40
        ? timerlat_fd_read+0x30b/0x9d0
        ? __pfx_get_signal+0x10/0x10
        ? __pfx_timerlat_fd_read+0x10/0x10
        arch_do_signal_or_restart+0x8c/0x570
        ? __pfx_arch_do_signal_or_restart+0x10/0x10
        ? vfs_read+0x179/0xa40
        ? ksys_read+0xfe/0x1d0
        ? __pfx_ksys_read+0x10/0x10
        syscall_exit_to_user_mode+0xbc/0x130
        do_syscall_64+0x74/0x110
        ? __pfx___rseq_handle_notify_resume+0x10/0x10
        ? __pfx_ksys_read+0x10/0x10
        ? fpregs_restore_userregs+0xdb/0x1e0
        ? fpregs_restore_userregs+0xdb/0x1e0
        ? syscall_exit_to_user_mode+0x116/0x130
        ? do_syscall_64+0x74/0x110
        ? do_syscall_64+0x74/0x110
        ? do_syscall_64+0x74/0x110
        entry_SYSCALL_64_after_hwframe+0x71/0x79
       RIP: 0033:0x7ff0070eca9c
       Code: Unable to access opcode bytes at 0x7ff0070eca72.
       RSP: 002b:00007ff006dff8c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
       RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007ff0070eca9c
       RDX: 0000000000000400 RSI: 00007ff006dff9a0 RDI: 0000000000000003
       RBP: 00007ff006dffde0 R08: 0000000000000000 R09: 00007ff000000ba0
       R10: 00007ff007004b08 R11: 0000000000000246 R12: 0000000000000003
       R13: 00007ff006dff9a0 R14: 0000000000000007 R15: 0000000000000008
        </TASK>
       Modules linked in: snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep snd_hda_core
       ---[ end trace 0000000000000000 ]---
      
      This is because it would mistakenly call kthread_stop() on a user space
      thread making it "exit" before it actually exits.
      
      Since kthreads are created based on global behavior, use a cpumask to know
      when kthreads are running and that they need to be shutdown before
      proceeding to do new work.
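
The bookkeeping idea can be sketched with a plain integer used as a per-CPU bitmask. This is only an illustration: `ThreadTracker` and its methods are hypothetical names, the kernel uses a struct cpumask rather than a Python int, and the `stopped` list merely records which threads a kthread_stop() stand-in would have been called on.

```python
# Sketch of tracking kernel-owned threads with a per-CPU bitmask.
# Names are illustrative; the kernel uses a struct cpumask, not a Python int.


class ThreadTracker:
    def __init__(self):
        self.kthread_mask = 0     # bit N set => CPU N runs a kernel thread
        self.threads = {}         # cpu -> thread name (kernel or user)
        self.stopped = []         # names passed to the kthread_stop() stand-in

    def start_kthread(self, cpu):
        self.threads[cpu] = f"kthread/{cpu}"
        self.kthread_mask |= 1 << cpu

    def attach_user_thread(self, cpu):
        self.threads[cpu] = f"user/{cpu}"   # mask bit stays clear

    def stop_thread(self, cpu):
        name = self.threads.pop(cpu, None)
        if name is None:
            return
        if self.kthread_mask & (1 << cpu):
            self.kthread_mask &= ~(1 << cpu)
            self.stopped.append(name)       # only kernel threads get stopped
```

Stopping a CPU whose bit is clear simply forgets the user space thread and never reaches the stop path, which is the property the cpumask is meant to guarantee.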
      
      Link: https://lore.kernel.org/all/20240820130001.124768-1-tglozar@redhat.com/
      
      This was debugged by using the persistent ring buffer:
      
      Link: https://lore.kernel.org/all/20240823013902.135036960@goodmis.org/
      
      Note, locking was originally used to fix this, but that proved to cause too
      many deadlocks to work around:
      
        https://lore.kernel.org/linux-trace-kernel/20240823102816.5e55753b@gandalf.local.home/
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Link: https://lore.kernel.org/20240904103428.08efdf4c@gandalf.local.home
      Fixes: e88ed227 ("tracing/timerlat: Add user-space interface")
      Reported-by: Tomas Glozar <tglozar@redhat.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>