    cpuset: make mm migration asynchronous · e93ad19d
    Tejun Heo authored
    If "cpuset.memory_migrate" is set, when a process is moved from one
    cpuset to another with a different memory node mask, pages in use by
    the process are migrated to the new set of nodes.  This was performed
    synchronously in the ->attach() callback, which is synchronized
    against process management.  Recently, the synchronization was changed
    from per-process rwsem to global percpu rwsem for simplicity and
    optimization.
    
    Combined with the synchronous mm migration, this led to deadlocks
    because mm migration could schedule a work item which may in turn try
    to create a new worker blocking on the process management lock held
    from cgroup process migration path.
    
    This heavy an operation shouldn't be performed synchronously from that
    deep inside cgroup migration in the first place.  This patch punts the
    actual migration to an ordered workqueue and updates cgroup process
    migration and cpuset config update paths to flush the workqueue after
    all locks are released.  This way, the operations still seem
    synchronous to userland without entangling mm migration with process
    management synchronization.  CPU hotplug can also invoke mm migration
    but there's no reason for it to wait for mm migrations and thus
    doesn't synchronize against their completions.
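    
    The pattern described above can be sketched in userspace as follows.
    This is only an analogy under stated assumptions, not the code in
    cpuset.c: a single pthread worker stands in for the ordered
    workqueue, big_lock stands in for the process management lock, and
    every name below (ordered_wq, wq_queue, wq_flush, attach_and_flush,
    migrate_fn) is hypothetical.  The point it illustrates is that the
    heavy work is only queued while the lock is held, and the flush
    happens after the lock is dropped, so the work item itself may take
    that lock without deadlocking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One queued request; a stand-in for a per-mm migration work item. */
struct work {
	void (*fn)(void *arg);
	void *arg;
	struct work *next;
};

/* Ordered queue: one worker thread runs items strictly in FIFO order. */
struct ordered_wq {
	pthread_mutex_t lock;
	pthread_cond_t more;		/* signalled when an item is queued */
	pthread_cond_t idle;		/* signalled when an item finishes */
	struct work *head, *tail;
	unsigned long queued, done;
	int stop;
	pthread_t worker;
};

static void *wq_worker(void *p)
{
	struct ordered_wq *wq = p;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->head && !wq->stop)
			pthread_cond_wait(&wq->more, &wq->lock);
		if (!wq->head)
			break;		/* stop requested and queue drained */
		struct work *w = wq->head;
		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		pthread_mutex_unlock(&wq->lock);
		w->fn(w->arg);		/* heavy work runs with no queue lock held */
		free(w);
		pthread_mutex_lock(&wq->lock);
		wq->done++;
		pthread_cond_broadcast(&wq->idle);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static int wq_init(struct ordered_wq *wq)
{
	memset(wq, 0, sizeof(*wq));
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->more, NULL);
	pthread_cond_init(&wq->idle, NULL);
	return pthread_create(&wq->worker, NULL, wq_worker, wq);
}

/* Queue an item; cheap enough to call while other locks are held. */
static int wq_queue(struct ordered_wq *wq, void (*fn)(void *), void *arg)
{
	struct work *w = malloc(sizeof(*w));

	assert(fn != NULL);
	if (!w)
		return -1;
	w->fn = fn;
	w->arg = arg;
	w->next = NULL;
	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
	wq->queued++;
	pthread_cond_signal(&wq->more);
	pthread_mutex_unlock(&wq->lock);
	return 0;
}

/* Wait until everything queued before this call has completed. */
static void wq_flush(struct ordered_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	unsigned long target = wq->queued;
	while (wq->done < target)
		pthread_cond_wait(&wq->idle, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

static void wq_destroy(struct ordered_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stop = 1;
	pthread_cond_signal(&wq->more);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->worker, NULL);
}

/* Stand-in for the global process management lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the migration; it needs big_lock, like the worker creation did. */
static void migrate_fn(void *arg)
{
	pthread_mutex_lock(&big_lock);
	(*(int *)arg)++;
	pthread_mutex_unlock(&big_lock);
}

/* Queue under the lock, flush only after the lock has been released. */
static int attach_and_flush(struct ordered_wq *wq, int *counter)
{
	pthread_mutex_lock(&big_lock);
	int ret = wq_queue(wq, migrate_fn, counter);
	pthread_mutex_unlock(&big_lock);
	wq_flush(wq);
	return ret;
}
```

    Calling wq_flush() while still holding big_lock would recreate the
    deadlock: the worker would block in migrate_fn() on big_lock while
    the caller waits for that same item to finish.  A caller that has
    no reason to wait, as with CPU hotplug in this patch, would simply
    call wq_queue() and skip the flush.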
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: stable@vger.kernel.org # v4.4+