- 05 Feb, 2024 1 commit
-
-
Tejun Heo authored
This reverts commit d412ace1. This leads to build failures as it depends on a driver-core commit 32f78abe ("driver core: bus: constantify subsys_register() calls"). Let's drop it from wq tree and route it through driver-core tree. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402051505.kM9Rr3CJ-lkp@intel.com/
-
- 04 Feb, 2024 5 commits
-
-
Tejun Heo authored
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws such as the execution code accessing the tasklet item after the execution is complete which can lead to subtle use-after-free in certain usage scenarios and less-developed flush and cancel mechanisms. This patch implements BH workqueues which share the same semantics and features of regular workqueues but execute their work items in the softirq context. As there is always only one BH execution context per CPU, none of the concurrency management mechanisms applies and a BH workqueue can be thought of as a convenience wrapper around softirq. Except for the inability to sleep while executing and lack of max_active adjustments, BH workqueues and work items should behave the same as regular workqueues and work items. Currently, the execution is hooked to tasklet[_hi]. However, the goal is to convert all tasklet users over to BH workqueues. Once the conversion is complete, tasklet can be removed and BH workqueues can directly take over the tasklet softirqs. system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in tasklet, all existing tasklet users should be able to use the system BH workqueues without creating their own workqueues. v3: - Add missing interrupt.h include. v2: - Instead of using tasklets, hook directly into its softirq action functions - tasklet[_hi]_action(). This is slightly cheaper and closer to the eventual code structure we want to arrive at. Suggested by Lai. - Lai also pointed out several places which need NULL worker->task handling or can use clarification. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.comTested-by: Allen Pais <allen.lkml@gmail.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
Factor out init_cpu_worker_pool() from workqueue_init_early(). This is pure reorganization in preparation of BH workqueue support. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Allen Pais <allen.lkml@gmail.com>
-
Tejun Heo authored
These changes are in preparation of BH workqueue which will execute work items from BH context. - Update lock and RCU depth checks in process_one_work() so that it remembers and checks against the starting depths and prints out the depth changes. - Factor out lockdep annotations in the flush paths into touch_{wq|work}_lockdep_map(). The work->lockdep_map touching is moved from __flush_work() to its callee - start_flush_work(). This brings it closer to the wq counterpart and will allow testing the associated wq's flags which will be needed to support BH workqueues. This is not expected to cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Allen Pais <allen.lkml@gmail.com>
-
Ricardo B. Marliere authored
Now that the driver core can properly handle constant struct bus_type, move the wq_subsys variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-and-reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
dd6c3c54 ("workqueue: Move pwq_dec_nr_in_flight() to the end of work item handling") relocated pwq_dec_nr_in_flight() after set_work_pool_and_keep_pending(). However, the latter destroys information contained in work->data that's needed by pwq_dec_nr_in_flight() including the flush color. With flush color destroyed, flush_workqueue() can stall easily when mixed with cancel_work*() usages. This is easily triggered by running xfstests generic/001 test on xfs: INFO: task umount:6305 blocked for more than 122 seconds. ... task:umount state:D stack:13008 pid:6305 tgid:6305 ppid:6301 flags:0x00004000 Call Trace: <TASK> __schedule+0x2f6/0xa20 schedule+0x36/0xb0 schedule_timeout+0x20b/0x280 wait_for_completion+0x8a/0x140 __flush_workqueue+0x11a/0x3b0 xfs_inodegc_flush+0x24/0xf0 xfs_unmountfs+0x14/0x180 xfs_fs_put_super+0x3d/0x90 generic_shutdown_super+0x7c/0x160 kill_block_super+0x1b/0x40 xfs_kill_sb+0x12/0x30 deactivate_locked_super+0x35/0x90 deactivate_super+0x42/0x50 cleanup_mnt+0x109/0x170 __cleanup_mnt+0x12/0x20 task_work_run+0x60/0x90 syscall_exit_to_user_mode+0x146/0x150 do_syscall_64+0x5d/0x110 entry_SYSCALL_64_after_hwframe+0x6c/0x74 Fix it by stashing work_data before calling set_work_pool_and_keep_pending() and using the stashed value for pwq_dec_nr_in_flight(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Chandan Babu R <chandanbabu@kernel.org> Link: http://lkml.kernel.org/r/87o7cxeehy.fsf@debian-BULLSEYE-live-builder-AMD64 Fixes: dd6c3c54 ("workqueue: Move pwq_dec_nr_in_flight() to the end of work item handling")
-
- 01 Feb, 2024 1 commit
-
-
Miguel Ojeda authored
Commit e563d0a7 ("workqueue: Break up enum definitions and give names to the types") gives a name to the `enum` where `WORK_CPU_UNBOUND` was defined, so `bindgen` changes its output from e.g.: pub type _bindgen_ty_10 = core::ffi::c_uint; pub const WORK_CPU_UNBOUND: _bindgen_ty_10 = 64; to e.g.: pub type wq_misc_consts = core::ffi::c_uint; pub const wq_misc_consts_WORK_CPU_UNBOUND: wq_misc_consts = 64; Thus update Rust's side to match the change (which requires a slight reformat of the code), fixing the build error. Closes: https://lore.kernel.org/rust-for-linux/CANiq72=9PZ89bCAVX0ZV4cqrYSLoZWyn-d_K4KpBMHjwUMdC3A@mail.gmail.com/ Fixes: e563d0a7 ("workqueue: Break up enum definitions and give names to the types") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 31 Jan, 2024 2 commits
-
-
Tejun Heo authored
System workqueues are allocated early during boot from workqueue_init_early(). While allocating unbound workqueues, wq_update_node_max_active() is invoked from apply_workqueue_attrs() and accesses NUMA topology to initialize wq->node_nr_active[].max. However, topology information may not be set up at this point. wq_update_node_max_active() is explicitly invoked from workqueue_init_topology() later when topology information is known to be available. This doesn't seem to crash anything but it's doing useless work with dubious data. Let's skip the premature and duplicate node_max_active updates by initializing the field to WQ_DFL_MIN_ACTIVE on allocation and making wq_update_node_max_active() noop until workqueue_init_topology(). Signed-off-by: Tejun Heo <tj@kernel.org> --- kernel/workqueue.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 9221a4c57ae1..a65081ec6780 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -386,6 +386,8 @@ static const char *wq_affn_names[WQ_AFFN_NR_TYPES] = { [WQ_AFFN_SYSTEM] = "system", }; +static bool wq_topo_initialized = false; + /* * Per-cpu work items which run for longer than the following threshold are * automatically considered CPU intensive and excluded from concurrency @@ -1510,6 +1512,9 @@ static void wq_update_node_max_active(struct workqueue_struct *wq, int off_cpu) lockdep_assert_held(&wq->mutex); + if (!wq_topo_initialized) + return; + if (!cpumask_test_cpu(off_cpu, effective)) off_cpu = -1; @@ -4356,6 +4361,7 @@ static void free_node_nr_active(struct wq_node_nr_active **nna_ar) static void init_node_nr_active(struct wq_node_nr_active *nna) { + nna->max = WQ_DFL_MIN_ACTIVE; atomic_set(&nna->nr, 0); raw_spin_lock_init(&nna->lock); INIT_LIST_HEAD(&nna->pending_pwqs); @@ -7400,6 +7406,8 @@ void __init workqueue_init_topology(void) init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache); init_pod_type(&wq_pod_types[WQ_AFFN_NUMA], cpus_share_numa); + wq_topo_initialized = true; + mutex_lock(&wq_pool_mutex); /*
-
Tejun Heo authored
For wq_update_node_max_active(), @off_cpu of -1 indicates that no CPU is going down. The function was incorrectly calling cpumask_test_cpu() with -1 CPU leading to oopses like the following on some archs: Unable to handle kernel paging request at virtual address ffff0002100296e0 .. pc : wq_update_node_max_active+0x50/0x1fc lr : wq_update_node_max_active+0x1f0/0x1fc ... Call trace: wq_update_node_max_active+0x50/0x1fc apply_wqattrs_commit+0xf0/0x114 apply_workqueue_attrs_locked+0x58/0xa0 alloc_workqueue+0x5ac/0x774 workqueue_init_early+0x460/0x540 start_kernel+0x258/0x684 __primary_switched+0xb8/0xc0 Code: 9100a273 35000d01 53067f00 d0016dc1 (f8607a60) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Attempted to kill the idle task! ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: http://lkml.kernel.org/r/91eacde0-df99-4d5c-a980-91046f66e612@samsung.com Fixes: 5797b1c1 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
-
- 30 Jan, 2024 1 commit
-
-
Leonardo Bras authored
When __queue_delayed_work() is called, it chooses a cpu for handling the timer interrupt. As of today, it will pick either the cpu passed as parameter or the last cpu used for this. This is not good if a system does use CPU isolation, because it can take away some valuable cpu time to: 1 - deal with the timer interrupt, 2 - schedule-out the desired task, 3 - queue work on a random workqueue, and 4 - schedule the desired task back to the cpu. So to fix this, during __queue_delayed_work(), if cpu isolation is in place, pick a random non-isolated cpu to handle the timer interrupt. As an optimization, if the current cpu is not isolated, use it instead of looking for another candidate. Signed-off-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 29 Jan, 2024 10 commits
-
-
Tejun Heo authored
Print out per-node nr/max_active numbers to improve visibility into node_nr_active operations. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
A pool_workqueue (pwq) represents the connection between a workqueue and a worker_pool. One of the roles that a pwq plays is enforcement of the max_active concurrency limit. Before 636b927e ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues"), there was one pwq per each CPU for per-cpu workqueues and per each NUMA node for unbound workqueues, which was a natural result of per-cpu workqueues being served by per-cpu pools and unbound by per-NUMA pools. In terms of max_active enforcement, this was, while not perfect, workable. For per-cpu workqueues, it was fine. For unbound, it wasn't great in that NUMA machines would get max_active that's multiplied by the number of nodes but didn't cause huge problems because NUMA machines are relatively rare and the node count is usually pretty low. However, cache layouts are more complex now and sharing a worker pool across a whole node didn't really work well for unbound workqueues. Thus, a series of commits culminating on 8639eceb ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") implemented more flexible affinity mechanism for unbound workqueues which enables using e.g. last-level-cache aligned pools. In the process, 636b927e ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") made unbound workqueues use per-cpu pwqs like per-cpu workqueues. While the change was necessary to enable more flexible affinity scopes, this came with the side effect of blowing up the effective max_active for unbound workqueues. Before, the effective max_active for unbound workqueues was multiplied by the number of nodes. After, by the number of CPUs. 636b927e ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") claims that this should generally be okay. It is okay for users which self-regulates concurrency level which are the vast majority; however, there are enough use cases which actually depend on max_active to prevent the level of concurrency from going bonkers including several IO handling workqueues that can issue a work item for each in-flight IO. With targeted benchmarks, the misbehavior can easily be exposed as reported in http://lkml.kernel.org/r/dbu6wiwu3sdhmhikb2w6lns7b27gbobfavhjj57kwi2quafgwl@htjcc5oikcr3. Unfortunately, there is no way to express what these use cases need using per-cpu max_active. A CPU may issue most of in-flight IOs, so we don't want to set max_active too low but as soon as we increase max_active a bit, we can end up with unreasonable number of in-flight work items when many CPUs issue IOs at the same time. ie. The acceptable lowest max_active is higher than the acceptable highest max_active. Ideally, max_active for an unbound workqueue should be system-wide so that the users can regulate the total level of concurrency regardless of node and cache layout. The reasons workqueue hasn't implemented that yet are: - One max_active enforcement decouples from pool boundaires, chaining execution after a work item finishes requires inter-pool operations which would require lock dancing, which is nasty. - Sharing a single nr_active count across the whole system can be pretty expensive on NUMA machines. - Per-pwq enforcement had been more or less okay while we were using per-node pools. It looks like we no longer can avoid decoupling max_active enforcement from pool boundaries. This patch implements system-wide nr_active mechanism with the following design characteristics: - To avoid sharing a single counter across multiple nodes, the configured max_active is split across nodes according to the proportion of each workqueue's online effective CPUs per node. e.g. A node with twice more online effective CPUs will get twice higher portion of max_active. - Workqueue used to be able to process a chain of interdependent work items which is as long as max_active. We can't do this anymore as max_active is distributed across the nodes. Instead, a new parameter min_active is introduced which determines the minimum level of concurrency within a node regardless of how max_active distribution comes out to be. It is set to the smaller of max_active and WQ_DFL_MIN_ACTIVE which is 8. This can lead to higher effective max_weight than configured and also deadlocks if a workqueue was depending on being able to handle chains of interdependent work items that are longer than 8. I believe these should be fine given that the number of CPUs in each NUMA node is usually higher than 8 and work item chain longer than 8 is pretty unlikely. However, if these assumptions turn out to be wrong, we'll need to add an interface to adjust min_active. - Each unbound wq has an array of struct wq_node_nr_active which tracks per-node nr_active. When its pwq wants to run a work item, it has to obtain the matching node's nr_active. If over the node's max_active, the pwq is queued on wq_node_nr_active->pending_pwqs. As work items finish, the completion path round-robins the pending pwqs activating the first inactive work item of each, which involves some pool lock dancing and kicking other pools. It's not the simplest code but doesn't look too bad. v4: - wq_adjust_max_active() updated to invoke wq_update_node_max_active(). - wq_adjust_max_active() is now protected by wq->mutex instead of wq_pool_mutex. v3: - wq_node_max_active() used to calculate per-node max_active on the fly based on system-wide CPU online states. Lai pointed out that this can lead to skewed distributions for workqueues with restricted cpumasks. Update the max_active distribution to use per-workqueue effective online CPU counts instead of system-wide and cache the calculation results in node_nr_active->max. v2: - wq->min/max_active now uses WRITE/READ_ONCE() as suggested by Lai. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Naohiro Aota <Naohiro.Aota@wdc.com> Link: http://lkml.kernel.org/r/dbu6wiwu3sdhmhikb2w6lns7b27gbobfavhjj57kwi2quafgwl@htjcc5oikcr3 Fixes: 636b927e ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
Currently, for both percpu and unbound workqueues, max_active applies per-cpu, which is a recent change for unbound workqueues. The change for unbound workqueues was a significant departure from the previous behavior of per-node application. It made some use cases create undesirable number of concurrent work items and left no good way of fixing them. To address the problem, workqueue is implementing a NUMA node segmented global nr_active mechanism, which will be explained further in the next patch. As a preparation, this patch introduces struct wq_node_nr_active. It's a data structured allocated for each workqueue and NUMA node pair and currently only tracks the workqueue's number of active work items on the node. This is split out from the next patch to make it easier to understand and review. Note that there is an extra wq_node_nr_active allocated for the invalid node nr_node_ids which is used to track nr_active for pools which don't have NUMA node associated such as the default fallback system-wide pool. This doesn't cause any behavior changes visible to userland yet. The next patch will expand to implement the control mechanism on top. v4: - Fixed out-of-bound access when freeing per-cpu workqueues. v3: - Use flexible array for wq->node_nr_active as suggested by Lai. v2: - wq->max_active now uses WRITE/READ_ONCE() as suggested by Lai. - Lai pointed out that pwq_tryinc_nr_active() incorrectly dropped pwq->max_active check. Restored. As the next patch replaces the max_active enforcement mechanism, this doesn't change the end result. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
The planned shared nr_active handling for unbound workqueues will make pwq_dec_nr_active() sometimes drop the pool lock temporarily to acquire other pool locks, which is necessary as retirement of an nr_active count from one pool may need kick off an inactive work item in another pool. This patch moves pwq_dec_nr_in_flight() call in try_to_grab_pending() to the end of work item handling so that work item state changes stay atomic. process_one_work() which is the other user of pwq_dec_nr_in_flight() already calls it at the end of work item handling. Comments are added to both call sites and pwq_dec_nr_in_flight(). This shouldn't cause any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
wq->cpu_pwq is RCU protected but wq->dfl_pwq isn't. This is okay because currently wq->dfl_pwq is used only accessed to install it into wq->cpu_pwq which doesn't require RCU access. However, we want to be able to access wq->dfl_pwq under RCU in the future to access its __pod_cpumask and the code can be made easier to read by making the two pwq fields behave in the same way. - Make wq->dfl_pwq RCU protected. - Add unbound_pwq_slot() and unbound_pwq() which can access both ->dfl_pwq and ->cpu_pwq. The former returns the double pointer that can be used access and update the pwqs. The latter performs locking check and dereferences the double pointer. - pwq accesses and updates are converted to use unbound_pwq[_slot](). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
wq_adjust_max_active() needs to activate work items after max_active is increased. Previously, it did that by visiting each pwq once activating all that could be activated. While this makes sense with per-pwq nr_active, nr_active will be shared across multiple pwqs for unbound wqs. Then, we'd want to round-robin through pwqs to be fairer. In preparation, this patch makes wq_adjust_max_active() round-robin pwqs while activating. While the activation ordering changes, this shouldn't cause user-noticeable behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
__queue_work(), pwq_dec_nr_in_flight() and wq_adjust_max_active() were open-coding nr_active handling, which is fine given that the operations are trivial. However, the planned unbound nr_active update will make them more complicated, so let's move them into helpers. - pwq_tryinc_nr_active() is added. It increments nr_active if under max_active limit and return a boolean indicating whether inc was successful. Note that the function is structured to accommodate future changes. __queue_work() is updated to use the new helper. - pwq_activate_first_inactive() is updated to use pwq_tryinc_nr_active() and thus no longer assumes that nr_active is under max_active and returns a boolean to indicate whether a work item has been activated. - wq_adjust_max_active() no longer tests directly whether a work item can be activated. Instead, it's updated to use the return value of pwq_activate_first_inactive() to tell whether a work item has been activated. - nr_active decrement and activating the first inactive work item is factored into pwq_dec_nr_active(). v3: - WARN_ON_ONCE(!WORK_STRUCT_INACTIVE) added to __pwq_activate_work() as now we're calling the function unconditionally from pwq_activate_first_inactive(). v2: - wq->max_active now uses WRITE/READ_ONCE() as suggested by Lai. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
To prepare for unbound nr_active handling improvements, move work activation part of pwq_activate_inactive_work() into __pwq_activate_work() and add pwq_activate_work() which tests WORK_STRUCT_INACTIVE and updates nr_active. pwq_activate_first_inactive() and try_to_grab_pending() are updated to use pwq_activate_work(). The latter conversion is functionally identical. For the former, this conversion adds an unnecessary WORK_STRUCT_INACTIVE testing. This is temporary and will be removed by the next patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
"!pwq->nr_active && list_empty(&pwq->inactive_works)" test is repeated multiple times. Let's factor it out into pwq_is_empty(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
Tejun Heo authored
max_active is a workqueue-wide setting and the configured value is stored in wq->saved_max_active; however, the effective value was stored in pwq->max_active. While this is harmless, it makes max_active update process more complicated and gets in the way of the planned max_active semantic updates for unbound workqueues. This patches moves pwq->max_active to wq->max_active. This simplifies the code and makes freezing and noop max_active updates cheaper too. No user-visible behavior change is intended. As wq->max_active is updated while holding wq mutex but read without any locking, it now uses WRITE/READ_ONCE(). A new locking locking rule WO is added for it. v2: wq->max_active now uses WRITE/READ_ONCE() as suggested by Lai. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
-
- 26 Jan, 2024 2 commits
-
-
Tejun Heo authored
workqueue is collecting different sorts of enums into a single unnamed enum type which can increase confusion around enum width. Also, unnamed enums can't be accessed from BPF. Let's break up enum definitions according to their purposes and give them type names. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
After creating a new worker, create_worker() is calling kick_pool() to wake up the new worker task. However, as kick_pool() doesn't do anything if there is no work pending, it also calls wake_up_process() explicitly. There's no reason to call kick_pool() at all. wake_up_process() is enough by itself. Drop the unnecessary kick_pool() call. Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 25 Jan, 2024 2 commits
-
-
Audra Mitchell authored
Since we have set the WQ_NAME_LEN to 32, decrease the name of events_freezable_power_efficient so that it does not trip the name length warning when the workqueue is created. Signed-off-by: Audra Mitchell <audra@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
- Factor out wq_type_str() - Improve formatting so that it adapts to actual field widths. - Drop duplicate information from "Workqueue -> rescuer" section. If anything, we should add more rescuer-specific info - e.g. the number of work items rescued. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com>
-
- 19 Jan, 2024 1 commit
-
-
Marcelo Tosatti authored
A customer using nohz_full has experienced the following interruption: oslat-1004510 [018] timer_cancel: timer=0xffff90a7ca663cf8 oslat-1004510 [018] timer_expire_entry: timer=0xffff90a7ca663cf8 function=delayed_work_timer_fn now=4709188240 baseclk=4709188240 oslat-1004510 [018] workqueue_queue_work: work struct=0xffff90a7ca663cd8 function=fb_flashcursor workqueue=events_power_efficient req_cpu=8192 cpu=18 oslat-1004510 [018] workqueue_activate_work: work struct 0xffff90a7ca663cd8 oslat-1004510 [018] sched_wakeup: kworker/18:1:326 [120] CPU:018 oslat-1004510 [018] timer_expire_exit: timer=0xffff90a7ca663cf8 oslat-1004510 [018] irq_work_entry: vector=246 oslat-1004510 [018] irq_work_exit: vector=246 oslat-1004510 [018] tick_stop: success=0 dependency=SCHED oslat-1004510 [018] hrtimer_start: hrtimer=0xffff90a70009cb00 function=tick_sched_timer/0x0 ... oslat-1004510 [018] softirq_exit: vec=1 [action=TIMER] oslat-1004510 [018] softirq_entry: vec=7 [action=SCHED] oslat-1004510 [018] softirq_exit: vec=7 [action=SCHED] oslat-1004510 [018] tick_stop: success=0 dependency=SCHED oslat-1004510 [018] sched_switch: oslat:1004510 [120] R ==> kworker/18:1:326 [120] kworker/18:1-326 [018] workqueue_execute_start: work struct 0xffff90a7ca663cd8: function fb_flashcursor kworker/18:1-326 [018] workqueue_queue_work: work struct=0xffff9078f119eed0 function=drm_fb_helper_damage_work workqueue=events req_cpu=8192 cpu=18 kworker/18:1-326 [018] workqueue_activate_work: work struct 0xffff9078f119eed0 kworker/18:1-326 [018] timer_start: timer=0xffff90a7ca663cf8 function=delayed_work_timer_fn ... Set wq_power_efficient to true, in case nohz_full is enabled. This makes the power efficient workqueue be unbounded, which allows workqueue items there to be moved to HK CPUs. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 16 Jan, 2024 4 commits
-
-
Xuewen Yan authored
Currently the workqueue just checks the atomic and locking states after work execution ends. However, sometimes, a work item may not unlock rcu after acquiring rcu_read_lock(). And as a result, it would cause rcu stall, but the rcu stall warning can not dump the work func, because the work has finished. In order to quickly discover those works that do not call rcu_read_unlock() after rcu_read_lock(), add the rcu lock check. Use rcu_preempt_depth() to check the work's rcu status. Normally, this value is 0. If this value is bigger than 0, it means the work are still holding rcu lock. If so, print err info and the work func. tj: Reworded the description for clarity. Minor formatting tweak. Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Juri Lelli authored
At the time they are created unbound workqueues rescuers currently use cpu_possible_mask as their affinity, but this can be too wide in case a workqueue unbound mask has been set as a subset of cpu_possible_mask. Make new rescuers use their associated workqueue unbound cpumask from the start. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Juri Lelli authored
Retrieving rescuers information (e.g., affinity and name) is quite useful when debugging workqueues configurations. Add printing of such information to the existing wq_dump.py script. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Audra Mitchell authored
Currently we limit the size of the workqueue name to 24 characters due to commit ecf6881f ("workqueue: make workqueue->name[] fixed len") Increase the size to 32 characters and print a warning in the event the requested name is larger than the limit of 32 characters. Signed-off-by: Audra Mitchell <audra@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 13 Jan, 2024 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfatLinus Torvalds authored
Pull exfat updates from Namjae Jeon: - Replace the internal table lookup algorithm with the hweight library and ffs of the bitops library. - Handle the two types of stream entry, valid data size (has been written) and data size separately. It improves compatibility with two differently sized files created on Windows. * tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: do not zero the extended part exfat: change to get file size from DataLength exfat: using ffs instead of internal logic exfat: using hweight instead of internal logic
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull bcachefs locking fix from Al Viro: "Fix broken locking in bch2_ioctl_subvolume_destroy()" * tag 'pull-bcachefs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bch2_ioctl_subvolume_destroy(): fix locking new helper: user_path_locked_at()
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull nfsctl update from Al Viro: "More simple_recursive_removal() conversions. nfsctl this time..." * tag 'pull-simple_recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsctl: switch to simple_recursive_removal()
-
https://github.com/neeraju/linuxLinus Torvalds authored
Pull RCU updates from Neeraj Upadhyay: - Documentation and comment updates - RCU torture, locktorture updates that include cleanups; nolibc init build support for mips, ppc and rv64; testing of mid stall duration scenario and fixing fqs task creation conditions - Misc fixes, most notably restricting usage of RCU CPU stall notifiers, to confine their usage primarily to debug kernels - RCU tasks minor fixes - lockdep annotation fix for NMI-safe accesses, callback advancing/acceleration cleanup and documentation improvements * tag 'rcu.release.v6.8' of https://github.com/neeraju/linux: rcu: Force quiescent states only for ongoing grace period doc: Clarify historical disclaimers in memory-barriers.txt doc: Mention address and data dependencies in rcu_dereference.rst doc: Clarify RCU Tasks reader/updater checklist rculist.h: docs: Fix wrong function summary Documentation: RCU: Remove repeated word in comments srcu: Use try-lock lockdep annotation for NMI-safe access. srcu: Explain why callbacks invocations can't run concurrently srcu: No need to advance/accelerate if no callback enqueued srcu: Remove superfluous callbacks advancing from srcu_gp_start() rcu: Remove unused macros from rcupdate.h rcu: Restrict access to RCU CPU stall notifiers rcu-tasks: Mark RCU Tasks accesses to current->rcu_tasks_idle_cpu rcutorture: Add fqs_holdoff check before fqs_task is created rcutorture: Add mid-sized stall to TREE07 rcutorture: add nolibc init support for mips, ppc and rv64 locktorture: Increase Hamming distance between call_rcu_chain and rcu_call_chains
-
- 12 Jan, 2024 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linuxLinus Torvalds authored
Pull devicetree updates from Rob Herring: - Convert FPGA bridge, all TPMs (finally), and Rockchip HDMI bindings to schemas - Improvements in Samsung GPU schemas - A few more cases of dropping unneeded quotes in schemas - Merge QCom idle-states txt binding into common idle-states schema - Add X1E80100, SM8650, SM8650, and SDX75 SoCs to QCom Power Domain Controller - Add NXP i.mx8dl to SCU PD - Add synaptics r63353 panel controller - Clarify the wording around the use of 'wakeup-source' property - Add a DTS coding style doc - Add smi vendor prefix - Fix DT_SCHEMA_FILES incorrect matching of paths outside the kernel tree - Disable sysfb (e.g. EFI FB) when simple-framebuffer node is present - Fix double free in of_parse_phandle_with_args_map() - A couple of kerneldoc fixes * tag 'devicetree-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (37 commits) of: unittest: Fix of_count_phandle_with_args() expected value message dt-bindings: fpga: altera: Convert bridge bindings to yaml dt-bindings: fpga: Convert bridge binding to yaml dt-bindings: vendor-prefixes: Add smi dt-bindings: power: Clarify wording for wakeup-source property of: Fix double free in of_parse_phandle_with_args_map dt-bindings: ignore paths outside kernel for DT_SCHEMA_FILES drivers: of: Fixed kernel doc warning dt-bindings: tpm: Document Microsoft fTPM bindings dt-bindings: tpm: Convert IBM vTPM bindings to DT schema dt-bindings: tpm: Convert Google Cr50 bindings to DT schema dt-bindings: tpm: Consolidate TCG TIS bindings dt-bindings: display: rockchip,inno-hdmi: Document RK3128 compatible dt-bindings: arm: Add remote etm dt-binding dt-bindings: mmc: sdhci-pxa: Fix 'regs' typo media: dt-bindings: samsung,s5p-mfc: Fix iommu properties schemas dt-bindings: display: panel: Add synaptics r63353 panel controller dt-bindings: arm: merge qcom,idle-state with idle-state dt-bindings: drm: rockchip: convert inno_hdmi-rockchip.txt to yaml dt-bindings: cache: qcom,llcc: correct QDU1000 reg entries ...
-
Linus Torvalds authored
Merge tag 'pwm/for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This contains a bunch of cleanups and simplifications across the board, as well as a number of small fixes. Perhaps the most notable change here is the addition of an API that allows PWMs to be used in atomic contexts, which is useful when time- critical operations are involved, such as using a PWM to generate IR signals. Finally, I have decided to step down as PWM subsystem maintainer. Due to other responsibilities I have lately not been able to find the time that the subsystem deserves and Uwe, who has been helping out a lot for the past few years and has many things planned for the future, has kindly volunteered to take over. I have no doubt that he will be a suitable replacement" * tag 'pwm/for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (44 commits) MAINTAINERS: pwm: Thierry steps down, Uwe takes over pwm: linux/pwm.h: fix Excess kernel-doc description warning pwm: Add pwm_apply_state() compatibility stub pwm: cros-ec: Drop documentation for dropped struct member pwm: Drop two unused API functions pwm: lpc18xx-sct: Don't modify the cached period of other PWM outputs pwm: meson: Simplify using dev_err_probe() pwm: stmpe: Silence duplicate error messages pwm: Reduce number of pointer dereferences in pwm_device_request() pwm: crc: Use consistent variable naming for driver data pwm: omap-dmtimer: Drop locking dt-bindings: pwm: ti,pwm-omap-dmtimer: Update binding for yaml media: pwm-ir-tx: Trigger edges from hrtimer interrupt context pwm: bcm2835: Allow PWM driver to be used in atomic context pwm: Make it possible to apply PWM changes in atomic context pwm: renesas: Remove unused include pwm: Replace ENOTSUPP with EOPNOTSUPP pwm: Rename pwm_apply_state() to pwm_apply_might_sleep() pwm: Stop referencing pwm->chip pwm: Update kernel doc for struct pwm_chip ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID updates from Jiri Kosina: - assorted functional fixes for hid-steam ported from SteamOS betas (Vicki Pfau) - fix for custom sensor-hub sensors (hinge angle sensor and LISS sensors) not working (Yauhen Kharuzhy) - functional fix for handling Confidence in Wacom driver (Jason Gerecke) - support for Ilitek ili2901 touchscreen (Zhengqiao Xia) - power management fix for Wacom userspace battery exporting (Tatsunosuke Tobita) - rework of wait-for-reset in order to reduce the need for I2C_HID_QUIRK_NO_IRQ_AFTER_RESET qurk; the success rate is now 50% better, but there are still further improvements to be made (Hans de Goede) - greatly improved coverage of Tablets in hid-selftests (Benjamin Tissoires) - support for Nintendo NSO controllers -- SNES, Genesis and N64 (Ryan McClelland) - support for controlling mcp2200 GPIOs (Johannes Roith) - power management improvement for EHL OOB wakeup in intel-ish (Kai-Heng Feng) - other assorted device-specific fixes and code cleanups * tag 'hid-for-linus-2024010801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (53 commits) HID: amd_sfh: Add a new interface for exporting ALS data HID: amd_sfh: Add a new interface for exporting HPD data HID: amd_sfh: rename float_to_int() to amd_sfh_float_to_int() HID: i2c-hid: elan: Add ili2901 timing dt-bindings: HID: i2c-hid: elan: Introduce Ilitek ili2901 HID: bpf: make bus_type const in struct hid_bpf_ops HID: make ishtp_cl_bus_type const HID: make hid_bus_type const HID: hid-steam: Add gamepad-only mode switched to by holding options HID: hid-steam: Better handling of serial number length HID: hid-steam: Update list of identifiers from SDL HID: hid-steam: Make client_opened a counter HID: hid-steam: Clean up locking HID: hid-steam: Disable watchdog instead of using a heartbeat HID: hid-steam: Avoid overwriting smoothing parameter HID: magicmouse: fix kerneldoc for struct magicmouse_sc HID: sensor-hub: Enable hid core report processing for all devices HID: wacom: Add additional tests of confidence behavior HID: wacom: Correct behavior when processing some confidence == false touches HID: nintendo: add support for nso controllers ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdevLinus Torvalds authored
Pull fbdev updates from Helge Deller: "Three fbdev drivers (~8500 lines of code) removed. The Carillo Ranch fbdev driver is for an Intel product which was never shipped, and for the intelfb and the amba-clcd drivers the drm drivers can be used instead. The other code changes are minor: some fb_deferred_io flushing fixes, imxfb margin fixes and stifb cleanups. Summary: - Remove intelfb fbdev driver (Thomas Zimmermann) - Remove amba-clcd fbdev driver (Linus Walleij) - Remove vmlfb Carillo Ranch fbdev driver (Matthew Wilcox) - fb_deferred_io flushing fixes (Nam Cao) - imxfb code fixes and cleanups (Dario Binacchi) - stifb primary screen detection cleanups (Thomas Zimmermann)" * tag 'fbdev-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: (28 commits) fbdev/intelfb: Remove driver fbdev/hyperv_fb: Do not clear global screen_info firmware/sysfb: Clear screen_info state after consuming it fbdev/hyperv_fb: Remove firmware framebuffers with aperture helpers drm/hyperv: Remove firmware framebuffers with aperture helper fbdev/sis: Remove dependency on screen_info video/logo: use %u format specifier for unsigned int values video/sticore: Remove info field from STI struct arch/parisc: Detect primary video device from device instance fbdev/stifb: Allocate fb_info instance with framebuffer_alloc() video/sticore: Store ROM device in STI struct fbdev: flush deferred IO before closing fbdev: flush deferred work in fb_deferred_io_fsync() fbdev: amba-clcd: Delete the old CLCD driver fbdev: Remove support for Carillo Ranch driver fbdev: hgafb: fix kernel-doc comments fbdev: mmp: Fix typo and wording in code comment fbdev: fsl-diu-fb: Fix sparse warning due to virt_to_phys() prototype change fbdev: imxfb: add '*/' on a separate line in block comment fbdev: imxfb: use __func__ for function name ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds authored
Pull media updates from Mauro Carvalho Chehab: - v4l core: subdev frame interval now supports which field - v4l kapi: moves and renames the init_cfg pad op to init_state as an internal op. - new sensor drivers: gc0308, gc2145, Avnet Alvium, ov64a40, tw9900 - new camera driver: STM32 DCMIPP - s5p-mfc has gained MFC v12 support - new ISP driver added to staging: Starfive - new stateful encoder/decoded: Wave5 codec It is found on the J721S2 SoC, JH7100 SoC, ssd202d SoC. Etc. - fwnode gained support for MIPI "DisCo for Imaging" (https://www.mipi.org/specifications/mipi-disco-imaging) - as usual, lots of cleanups, fixups and driver improvements. * tag 'media/v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (309 commits) media: i2c: thp7312: select CONFIG_FW_LOADER media: i2c: mt9m114: use fsleep() in place of udelay() media: videobuf2: core: Rename min_buffers_needed field in vb2_queue media: i2c: thp7312: Store frame interval in subdev state media: docs: uAPI: Fix documentation of 'which' field for routing ioctls media: docs: uAPI: Expand error documentation for invalid 'which' value media: docs: uAPI: Clarify error documentation for invalid 'which' value media: v4l2-subdev: Store frame interval in subdev state media: v4l2-subdev: Add which field to struct v4l2_subdev_frame_interval media: v4l2-subdev: Turn .[gs]_frame_interval into pad operations media: v4l: subdev: Move out subdev state lock macros outside CONFIG_MEDIA_CONTROLLER media: s5p-mfc: DPB Count Independent of VIDIOC_REQBUF media: s5p-mfc: Load firmware for each run in MFCv12. media: s5p-mfc: Set context for valid case before calling try_run media: s5p-mfc: Add support for DMABUF for encoder media: s5p-mfc: Add support for UHD encoding. media: s5p-mfc: Add support for rate controls in MFCv12 media: s5p-mfc: Add YV12 and I420 multiplanar format support media: s5p-mfc: Add initial support for MFCv12 media: s5p-mfc: Rename IS_MFCV10 macro ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds authored
Pull libnvdimm updates from Ira Weiny: "A mix of bug fixes and updates to interfaces used by nvdimm: - Updates to interfaces include: Use the new scope based management Remove deprecated ida interfaces Update to sysfs_emit() - Fixup kdoc comments" * tag 'libnvdimm-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: acpi/nfit: Use sysfs_emit() for all attributes nvdimm/namespace: fix kernel-doc for function params nvdimm/dimm_devs: fix kernel-doc for function params nvdimm/btt: fix btt_blk_cleanup() kernel-doc nvdimm-btt: simplify code with the scope based resource management nvdimm: Remove usage of the deprecated ida_simple_xx() API ACPI: NFIT: Use cleanup.h helpers instead of devm_*()
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds authored
Pull MMC updates from Ulf Hansson: "MMC core: - Don't force a retune before eMMC RPMB switch - Add optional HS400 tuning in HS400es initialization - Add a sysfs node to for write-protect-group-size - Add re-tuning test to the mmc-test module - Use mrq.sbc to support close-ended ioctl requests MMC host: - mmci: Add support for SDIO in-band irqs for the stm32 variant - mmc_spi: Remove broken support custom DMA mapped buffers - mtk-sd: Improve and extend the support for tunings - renesas_sdhi: Document support for the RZ/Five variant - sdhci_am654: Drop support for the ti,otap-del-sel DT property - sdhci-brcmstb: Add support for the brcm 74165b0 variant - sdhci-msm: Add compatibles for IPQ4019 and IPQ8074 - sdhci-of-dwcmshc: Add support for the T-Head TH1520 variant - sdhci-xenon: Add support for the Marvell ac5 variant" * tag 'mmc-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (27 commits) mmc: xenon: Add ac5 support via bounce buffer dt-bindings: mmc: add Marvell ac5 mmc: sdhci-brcmstb: add new sdhci reset sequence for brcm 74165b0 dt-bindings: mmc: brcm,sdhci-brcmstb: Add support for 74165b0 mmc: core: Do not force a retune before RPMB switch mmc: core: Add HS400 tuning in HS400es initialization mmc: sdhci_omap: Fix TI SoC dependencies mmc: sdhci_am654: Fix TI SoC dependencies mmc: core: Add wp_grp_size sysfs node mmc: mmc_test: Add re-tuning test mmc: mmc_spi: remove custom DMA mapped buffers dt-bindings: mmc: sdhci-msm: document dedicated IPQ4019 and IPQ8074 dt-bindings: mmc: synopsys-dw-mshc: add iommus for Intel SocFPGA mmc: mtk-sd: Extend number of tuning steps dt-bindings: mmc: mtk-sd: add tuning steps related property mmc: sdhci-omap: don't misuse kernel-doc marker mmc: mtk-sd: Increase the verbosity of msdc_track_cmd_data mmc: core: Use mrq.sbc in close-ended ffu mmc: sdhci_am654: Drop lookup for deprecated ti,otap-del-sel mmc: sdhci-of-dwcmshc: Use logical OR instead of bitwise OR in dwcmshc_probe() ...
-