- 25 Mar, 2013 4 commits
-
-
Lai Jiangshan authored
We're expanding wq->mutex to cover all fields specific to each workqueue with the end goal of replacing pwq_lock which will make locking simpler and easier to understand. init_and_link_pwq() and pwq_unbound_release_workfn() already grab wq->mutex when adding or removing a pwq from wq->pwqs list. This patch makes it official that the list is wq->mutex protected for writes and updates readers accoridingly. Explicit IRQ toggles for sched-RCU read-locking in flush_workqueue_prep_pwqs() and drain_workqueues() are removed as the surrounding wq->mutex can provide sufficient synchronization. Also, assert_rcu_or_pwq_lock() is renamed to assert_rcu_or_wq_mutex() and checks for wq->mutex too. pwq_lock locking and assertion are not removed by this patch and a couple of for_each_pwq() iterations are still protected by it. They'll be removed by future patches. tj: Rebased on top of the current dev branch. Updated description. Folded in assert_rcu_or_wq_mutex() renaming from a later patch along with associated comment updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
We're expanding wq->mutex to cover all fields specific to each workqueue with the end goal of replacing pwq_lock which will make locking simpler and easier to understand. wq->nr_drainers and ->flags are specific to each workqueue. Protect ->nr_drainers and ->flags with wq->mutex instead of pool_mutex. tj: Rebased on top of the current dev branch. Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
Currently pwq->flush_mutex protects many fields of a workqueue including, especially, the pwqs list. We're going to expand this mutex to protect most of a workqueue and eventually replace pwq_lock, which will make locking simpler and easier to understand. Drop the "flush_" prefix in preparation. This patch is pure rename. tj: Rebased on top of the current dev branch. Updated description. Use WQ: and WR: instead of Q: and QR: for synchronization labels. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
wq->flush_mutex will be renamed to wq->mutex and cover all fields specific to each workqueue and eventually replace pwq_lock, which will make locking simpler and easier to understand. Rename wq_mutex to wq_pool_mutex to avoid confusion with wq->mutex. After the scheduled changes, wq_pool_mutex won't be protecting anything specific to each workqueue instance anyway. This patch is pure rename. tj: s/wqs_mutex/wq_pool_mutex/. Rewrote description. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
-
- 20 Mar, 2013 5 commits
-
-
Lai Jiangshan authored
If lockdep complains something for other subsystem, lockdep_is_held() can be false negative, so we need to also test debug_locks before triggering WARN. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
rcu_read_lock_sched() is better than preempt_disable() if the code is protected by RCU_SCHED. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
If pwq_adjust_max_active() changes max_active from 0 to saved_max_active, it needs to wakeup worker. This is already done by thaw_workqueues(). If pwq_adjust_max_active() increases max_active for an unbound wq, while not strictly necessary for correctness, it's still desirable to wake up a worker so that the requested concurrency level is reached sooner. Move wake_up_worker() call from thaw_workqueues() to pwq_adjust_max_active() so that it can handle both of the above two cases. This also makes thaw_workqueues() simpler. tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
We can test worker->recue_wq instead of reaching into current_pwq->wq->rescuer and then comparing it to self. tj: Commit message. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Lai Jiangshan authored
get_unbound_pool() forgot to set POOL_FREEZING if workqueue_freezing is set and a new pool could go out of sync with the global freezing state. Fix it by adding POOL_FREEZING if workqueue_freezing. wq_mutex is already held so no further locking is necessary. This also removes the unused static variable warning when !CONFIG_FREEZER. tj: Updated commit message. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 19 Mar, 2013 5 commits
-
-
Tejun Heo authored
With the recent addition of the custom attributes support, unbound pools may have allowed cpumask which isn't full. As long as some of CPUs in the cpumask are online, its workers will maintain cpus_allowed as set on worker creation; however, once no online CPU is left in cpus_allowed, the scheduler will reset cpus_allowed of any workers which get scheduled so that they can execute. To remain compliant to the user-specified configuration, CPU affinity needs to be restored when a CPU becomes online for an unbound pool which doesn't currently have any online CPUs before. This patch implement restore_unbound_workers_cpumask(), which is called from CPU_ONLINE for all unbound pools, checks whether the coming up CPU is the first allowed online one, and, if so, invokes set_cpus_allowed_ptr() with the configured cpumask on all workers. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
Rebinding workers of a per-cpu pool after a CPU comes online involves a lot of back-and-forth mostly because only the task itself could adjust CPU affinity if PF_THREAD_BOUND was set. As CPU_ONLINE itself couldn't adjust affinity, it had to somehow coerce the workers themselves to perform set_cpus_allowed_ptr(). Due to the various states a worker can be in, this led to three different paths a worker may be rebound. worker->rebind_work is queued to busy workers. Idle ones are signaled by unlinking worker->entry and call idle_worker_rebind(). The manager isn't covered by either and implements its own mechanism. PF_THREAD_BOUND has been relaced with PF_NO_SETAFFINITY and CPU_ONLINE itself now can manipulate CPU affinity of workers. This patch replaces the existing rebind mechanism with direct one where CPU_ONLINE iterates over all workers using for_each_pool_worker(), restores CPU affinity, and clears WORKER_UNBOUND. There are a couple subtleties. All bound idle workers should have their runqueues set to that of the bound CPU; however, if the target task isn't running, set_cpus_allowed_ptr() just updates the cpus_allowed mask deferring the actual migration to when the task wakes up. This is worked around by waking up idle workers after restoring CPU affinity before any workers can become bound. Another subtlety is stems from matching @pool->nr_running with the number of running unbound workers. While DISASSOCIATED, all workers are unbound and nr_running is zero. As workers become bound again, nr_running needs to be adjusted accordingly; however, there is no good way to tell whether a given worker is running without poking into scheduler internals. Instead of clearing UNBOUND directly, rebind_workers() replaces UNBOUND with another new NOT_RUNNING flag - REBOUND, which will later be cleared by the workers themselves while preparing for the next round of work item execution. The only change needed for the workers is clearing REBOUND along with PREP. * This patch leaves for_each_busy_worker() without any user. Removed. * idle_worker_rebind(), busy_worker_rebind_fn(), worker->rebind_work and rebind logic in manager_workers() removed. * worker_thread() now looks at WORKER_DIE instead of testing whether @worker->entry is empty to determine whether it needs to do something special as dying is the only special thing now. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
rebind_workers() will be reimplemented in a way which makes it mostly decoupled from the rest of worker management. Move rebind_workers() so that it's located with other CPU hotplug related functions. This patch is pure function relocation. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
Make worker_ida an idr - worker_idr and use it to implement for_each_pool_worker() which will be used to simplify worker rebinding on CPU_ONLINE. pool->worker_idr is protected by both pool->manager_mutex and pool->lock so that it can be iterated while holding either lock. * create_worker() allocates ID without installing worker pointer and installs the pointer later using idr_replace(). This is because worker ID is needed when creating the actual task to name it and the new worker shouldn't be visible to iterations before fully initialized. * In destroy_worker(), ID removal is moved before kthread_stop(). This is again to guarantee that only fully working workers are visible to for_each_pool_worker(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
PF_THREAD_BOUND was originally used to mark kernel threads which were bound to a specific CPU using kthread_bind() and a task with the flag set allows cpus_allowed modifications only to itself. Workqueue is currently abusing it to prevent userland from meddling with cpus_allowed of workqueue workers. What we need is a flag to prevent userland from messing with cpus_allowed of certain kernel tasks. In kernel, anyone can (incorrectly) squash the flag, and, for worker-type usages, restricting cpus_allowed modification to the task itself doesn't provide meaningful extra proection as other tasks can inject work items to the task anyway. This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY. sched_setaffinity() checks the flag and return -EINVAL if set. set_cpus_allowed_ptr() is no longer affected by the flag. This will allow simplifying workqueue worker CPU affinity management. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 14 Mar, 2013 7 commits
-
-
Tejun Heo authored
With the recent locking updates, the only thing protected by workqueue_lock is workqueue->maydays list. Rename workqueue_lock to wq_mayday_lock. This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
This patch continues locking cleanup from the previous patch. It breaks out pool_workqueue synchronization from workqueue_lock into a new spinlock - pwq_lock. The followings are protected by pwq_lock. * workqueue->pwqs * workqueue->saved_max_active The conversion is straight-forward. workqueue_lock usages which cover the above two are converted to pwq_lock. New locking label PW added for things protected by pwq_lock and FR is updated to mean flush_mutex + pwq_lock + sched-RCU. This patch shouldn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Currently, workqueue_lock protects most shared workqueue resources - the pools, workqueues, pool_workqueues, draining, ID assignments, mayday handling and so on. The coverage has grown organically and there is no identified bottleneck coming from workqueue_lock, but it has grown a bit too much and scheduled rebinding changes need the pools and workqueues to be protected by a mutex instead of a spinlock. This patch breaks out pool and workqueue synchronization from workqueue_lock into a new mutex - wq_mutex. The followings are protected by wq_mutex. * worker_pool_idr and unbound_pool_hash * pool->refcnt * workqueues list * workqueue->flags, ->nr_drainers Most changes are mostly straight-forward. workqueue_lock is replaced with wq_mutex where applicable and workqueue_lock lock/unlocks are added where wq_mutex conversion leaves data structures not protected by wq_mutex without locking. irq / preemption flippings were added where the conversion affects them. Things worth noting are * New WQ and WR locking lables added along with assert_rcu_or_wq_mutex(). * worker_pool_assign_id() now expects to be called under wq_mutex. * create_mutex is removed from get_unbound_pool(). It now just holds wq_mutex. This patch shouldn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
They're split across debugobj code for some reason. Collect them. This patch is pure relocation. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
When a manager creates or destroys workers, the operations are always done with the manager_mutex held; however, initial worker creation or worker destruction during pool release don't grab the mutex. They are still correct as initial worker creation doesn't require synchronization and grabbing manager_arb provides enough exclusion for pool release path. Still, let's make everyone follow the same rules for consistency and such that lockdep annotations can be added. Update create_and_start_worker() and put_unbound_pool() to grab manager_mutex around thread creation and destruction respectively and add lockdep assertions to create_worker() and destroy_worker(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
get_unbound_pool(), workqueue_cpu_up_callback() and init_workqueues() have similar code pieces to create and start the initial worker factor those out into create_and_start_worker(). This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Manager operations are currently governed by two mutexes - pool->manager_arb and ->assoc_mutex. The former is used to decide who gets to be the manager and the latter to exclude the actual manager operations including creation and destruction of workers. Anyone who grabs ->manager_arb must perform manager role; otherwise, the pool might stall. Grabbing ->assoc_mutex blocks everyone else from performing manager operations but doesn't require the holder to perform manager duties as it's merely blocking manager operations without becoming the manager. Because the blocking was necessary when [dis]associating per-cpu workqueues during CPU hotplug events, the latter was named assoc_mutex. The mutex is scheduled to be used for other purposes, so this patch gives it a more fitting generic name - manager_mutex - and updates / adds comments to explain synchronization around the manager role and operations. This patch is pure rename / doc update. Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 13 Mar, 2013 7 commits
-
-
Tejun Heo authored
There's no reason to make these trivial wrappers full (exported) functions. Inline the followings. queue_work() queue_delayed_work() mod_delayed_work() schedule_work_on() schedule_work() schedule_delayed_work_on() schedule_delayed_work() keventd_up() Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Rename @id argument of for_each_pool() to @pi so that it doesn't get reused accidentally when for_each_pool() is used in combination with other iterators. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
* Update incorrect and add missing synchronization labels. * Update incorrect or misleading comments. Add new comments where clarification is necessary. Reformat / rephrase some comments. * drain_workqueue() can be used separately from destroy_workqueue() but its warning message was incorrectly referring to destruction. Other than the warning message change, this patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Since 9e8cd2f5 ("workqueue: implement apply_workqueue_attrs()"), init_and_link_pwq() may be called to initialize a new pool_workqueue for a workqueue which is already online, but the function was setting pwq->max_active to wq->saved_max_active without proper synchronization. Fix it by calling pwq_adjust_max_active() under proper locking instead of manually setting max_active. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Rename pwq_set_max_active() to pwq_adjust_max_active() and move pool_workqueue->max_active synchronization and max_active determination logic into it. The new function should be called with workqueue_lock held for stable workqueue->saved_max_active, determines the current max_active value the target pool_workqueue should be using from @wq->saved_max_active and the state of the associated pool, and applies it with proper synchronization. The current two users - workqueue_set_max_active() and thaw_workqueues() - are updated accordingly. In addition, the manual freezing handling in __alloc_workqueue_key() and freeze_workqueues_begin() are replaced with calls to pwq_adjust_max_active(). This centralizes max_active handling so that it's less error-prone. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
pwq_set_max_active() is gonna be modified and used during pool_workqueue init. Move it above init_and_link_pwq(). This patch is pure code reorganization and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Implement a function which queries whether it currently is running off a workqueue rescuer. This will be used to convert writeback to workqueue. Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 12 Mar, 2013 12 commits
-
-
Tejun Heo authored
There are cases where workqueue users want to expose control knobs to userland. e.g. Unbound workqueues with custom attributes are scheduled to be used for writeback workers and depending on configuration it can be useful to allow admins to tinker with the priority or allowed CPUs. This patch implements workqueue_sysfs_register(), which makes the workqueue visible under /sys/bus/workqueue/devices/WQ_NAME. There currently are two attributes common to both per-cpu and unbound pools and extra attributes for unbound pools including nice level and cpumask. If alloc_workqueue*() is called with WQ_SYSFS, workqueue_sysfs_register() is called automatically as part of workqueue creation. This is the preferred method unless the workqueue user wants to apply workqueue_attrs before making the workqueue visible to userland. v2: Disallow exposing ordered workqueues as ordered workqueues can't be tuned in any way. Signed-off-by: Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
-
Tejun Heo authored
Kay tells me the most appropriate place to expose workqueues to userland would be /sys/devices/virtual/workqueues/WQ_NAME which is symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking a way to do that outside of driver core as virtual_device_parent() isn't exported and there's no inteface to conveniently create a virtual subsystem. This patch implements subsys_virtual_register() by factoring out subsys_register() from subsys_system_register() and using it with virtual_device_parent() as the origin directory. It's identical to subsys_system_register() other than the origin directory but we aren't gonna restrict the device names which should be used under it. This will be used to expose workqueue attributes to userland. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org>
-
Tejun Heo authored
We have cpulist_parse() but not cpumask_parse(). Implement it using bitmap_parse(). bitmap_parse() is weird in that it takes @len for a string in kernel-memory which also is inconsistent with bitmap_parselist(). Make cpumask_parse() calculate the length and don't expose the inconsistency to cpumask users. Maybe we can fix up bitmap_parse() later. This will be used to expose workqueue cpumask knobs to userland via sysfs. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
Tejun Heo authored
Adjusting max_active of or applying new workqueue_attrs to an ordered workqueue breaks its ordering guarantee. The former is obvious. The latter is because applying attrs creates a new pwq (pool_workqueue) and there is no ordering constraint between the old and new pwqs. Make apply_workqueue_attrs() and workqueue_set_max_active() trigger WARN_ON() if those operations are requested on an ordered workqueue and fail / ignore respectively. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
We're gonna add another internal WQ flag. Let's make the distinction clear. Prefix WQ_DRAINING with __ and move it to bit 16. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
Implement apply_workqueue_attrs() which applies workqueue_attrs to the specified unbound workqueue by creating a new pwq (pool_workqueue) linked to worker_pool with the specified attributes. A new pwq is linked at the head of wq->pwqs instead of tail and __queue_work() verifies that the first unbound pwq has positive refcnt before choosing it for the actual queueing. This is to cover the case where creation of a new pwq races with queueing. As base ref on a pwq won't be dropped without making another pwq the first one, __queue_work() is guaranteed to make progress and not add work item to a dead pwq. init_and_link_pwq() is updated to return the last first pwq the new pwq replaced, which is put by apply_workqueue_attrs(). Note that apply_workqueue_attrs() is almost identical to unbound pwq part of alloc_and_link_pwqs(). The only difference is that there is no previous first pwq. apply_workqueue_attrs() is implemented to handle such cases and replaces unbound pwq handling in alloc_and_link_pwqs(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
Because per-cpu workqueues have multiple pwqs (pool_workqueues) to serve the CPUs, to guarantee that a single work item isn't queued on one pwq while still executing another, __queue_work() takes a look at the previous pool the target work item was on and if it's still executing there, queue the work item on that pool. To support changing workqueue_attrs on the fly, unbound workqueues too will have multiple pwqs and thus need non-reentrancy test when queueing. This patch modifies __queue_work() such that the reentrancy test is performed regardless of the workqueue type. per_cpu_ptr(wq->cpu_pwqs, cpu) used to be used to determine the matching pwq for the last pool. This can't be used for unbound workqueues and is replaced with worker->current_pwq which also happens to be simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
Unbound pwqs (pool_workqueues) will be dynamically created and destroyed with the scheduled unbound workqueue w/ custom attributes support. This patch synchronizes pwq linking and unlinking against flush_workqueue() so that its operation isn't disturbed by pwqs coming and going. Linking and unlinking a pwq into wq->pwqs is now protected also by wq->flush_mutex and a new pwq's work_color is initialized to wq->work_color during linking. This ensures that pwqs changes don't disturb flush_workqueue() in progress and the new pwq's work coloring stays in sync with the rest of the workqueue. flush_mutex during unlinking isn't strictly necessary but it's simpler to do it anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
Add pool_workqueue->refcnt along with get/put_pwq(). Both per-cpu and unbound pwqs have refcnts and any work item inserted on a pwq increments the refcnt which is dropped when the work item finishes. For per-cpu pwqs the base ref is never dropped and destroy_workqueue() frees the pwqs as before. For unbound ones, destroy_workqueue() simply drops the base ref on the first pwq. When the refcnt reaches zero, pwq_unbound_release_workfn() is scheduled on system_wq, which unlinks the pwq, puts the associated pool and frees the pwq and wq as necessary. This needs to be done from a work item as put_pwq() needs to be protected by pool->lock but release can't happen with the lock held - e.g. put_unbound_pool() involves blocking operations. Unbound pool->locks are marked with lockdep subclas 1 as put_pwq() will schedule the release work item on system_wq while holding the unbound pool's lock and triggers recursive locking warning spuriously. This will be used to implement dynamic creation and destruction of unbound pwqs. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
* Move initialization and linking of pool_workqueues into init_and_link_pwq(). * Make the failure path use destroy_workqueue() once pool_workqueue initialization succeeds. These changes are to prepare for dynamic management of pool_workqueues and don't introduce any functional changes. While at it, convert list_del(&wq->list) to list_del_init() as a precaution as scheduled changes will make destruction more complex. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-
Tejun Heo authored
WQ_RESCUER is superflous. WQ_MEM_RECLAIM indicates that the user wants a rescuer and testing wq->rescuer for NULL can answer whether a given workqueue has a rescuer or not. Drop WQ_RESCUER and test wq->rescuer directly. This will help simplifying __alloc_workqueue_key() failure path by allowing it to use destroy_workqueue() on a partially constructed workqueue, which in turn will help implementing dynamic management of pool_workqueues. While at it, clear wq->rescuer after freeing it in destroy_workqueue(). This is a precaution as scheduled changes will make destruction more complex. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
-