    cpuset: don't nest cgroup_mutex inside get_online_cpus() · 3a5a6d0c
    Tejun Heo authored
    The CPU and memory hotplug paths currently grab cgroup_mutex from
    hotplug event notifications.  We want to separate cpuset locking from
    cgroup core and make cgroup_mutex outer to hotplug synchronization so
    that, among other things, mechanisms which depend on get_online_cpus()
    can be used from cgroup callbacks.  In general, we want to keep
    cgroup_mutex the outermost lock to minimize locking interactions among
    different controllers.
    
    Convert cpuset_handle_hotplug() to cpuset_hotplug_workfn() and
    schedule it from the hotplug notifications.  As the function can
    already handle multiple mixed events without any input, converting it
    to a work function is mostly trivial; however, one complication is
    that cpuset_update_active_cpus() needs to update sched domains
    synchronously to reflect an offlined CPU, or the scheduler gets
    confused.  This is worked around by falling back to the default
    single sched domain synchronously before scheduling the actual hotplug
    work.  As a result, sched domains are rebuilt twice per CPU hotplug
    event, but the operation isn't that heavy, and much of the second
    rebuild is a no-op on systems with a single sched domain, which is the
    common case.
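    The deferral plus synchronous fallback can be modeled in a few lines
    of userspace C.  This is a minimal, single-threaded sketch under stated
    assumptions, not the kernel code: model_update_active_cpus(),
    model_hotplug_workfn(), rebuild_domains() and the two-partition
    starting state are hypothetical names chosen for illustration, and
    setting hotplug_work_pending merely stands in for schedule_work().

```c
#include <stdbool.h>

/* Hypothetical userspace model; these names are NOT kernel APIs. */
#define NR_CPUS 4

static bool cpu_active[NR_CPUS] = { true, true, true, true };
static int nr_sched_domains = 2;      /* assumed: two cpuset partitions */
static bool domain_has_cpu[NR_CPUS];  /* CPUs the scheduler may use */
static bool hotplug_work_pending;     /* stands in for a queued work item */
static int rebuild_count;             /* how many domain rebuilds happened */

/* Rebuild 'nr' domains that together span only the active CPUs. */
static void rebuild_domains(int nr)
{
    nr_sched_domains = nr;
    for (int i = 0; i < NR_CPUS; i++)
        domain_has_cpu[i] = cpu_active[i];
    rebuild_count++;
}

/*
 * Models the notifier path.  In the kernel this runs inside the hotplug
 * critical section, so it must not take cgroup_mutex; it only does the
 * cheap synchronous fallback and defers everything else.
 */
static void model_update_active_cpus(int cpu, bool online)
{
    cpu_active[cpu] = online;
    rebuild_domains(1);           /* fallback: one domain, no dead CPU */
    hotplug_work_pending = true;  /* analogous to schedule_work() */
}

/* Models the deferred work function, run later from a worker context. */
static void model_hotplug_workfn(void)
{
    if (!hotplug_work_pending)
        return;
    hotplug_work_pending = false;
    rebuild_domains(2);           /* restore the real partitioning */
}
```

    Once the notifier returns, the scheduler already sees only the online
    CPUs; the real partitioning comes back when the work function runs,
    which is the double rebuild described above.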
    
    This decouples cpuset hotplug handling from the notification
    callbacks, so there can be an arbitrary delay between the actual event
    and the corresponding cpuset updates.  The scheduler and mm can handle
    this fine, but moving tasks out of an empty cpuset may race against
    writes to the cpuset that restore execution resources, which can lead
    to confusing behavior.  Flush the hotplug work item from
    cpuset_write_resmask() to avoid such confusion.
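    The fix relies on flush semantics: the writer waits until every
    hotplug run queued before the flush has finished.  The sketch below
    models that with one pthread worker and two generation counters.
    schedule_hotplug_work(), flush_hotplug_work(), write_resmask() and
    hotplug_workfn() are hypothetical stand-ins, loosely analogous to the
    kernel's schedule_work() and flush_work(); this is not the actual
    implementation.  Build with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of one work item serviced by one worker thread. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static unsigned long requested, completed;  /* generation counters */
static bool stopping;

static bool hotplug_processed;      /* set by the deferred handler */
static bool write_saw_stale_state;  /* would indicate the race */

/* Stand-in for cpuset_hotplug_workfn(); deliberately slow. */
static void hotplug_workfn(void)
{
    struct timespec ts = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    nanosleep(&ts, NULL);
    hotplug_processed = true;
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (completed == requested && !stopping)
            pthread_cond_wait(&cond, &lock);
        if (completed == requested)  /* stopping and nothing queued */
            break;
        unsigned long gen = requested;  /* covers all requests so far */
        pthread_mutex_unlock(&lock);
        hotplug_workfn();               /* run without the lock held */
        pthread_mutex_lock(&lock);
        completed = gen;
        pthread_cond_broadcast(&cond);  /* wake any flushers */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Notifier side: request a run and return immediately. */
static void schedule_hotplug_work(void)
{
    pthread_mutex_lock(&lock);
    requested++;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* Wait until every run requested before this call has completed. */
static void flush_hotplug_work(void)
{
    pthread_mutex_lock(&lock);
    unsigned long target = requested;
    while (completed < target)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Stand-in for cpuset_write_resmask(): flush before applying the write. */
static void write_resmask(void)
{
    flush_hotplug_work();
    write_saw_stale_state = !hotplug_processed;
    /* ... apply the new mask here ... */
}

static void stop_worker(pthread_t t)
{
    pthread_mutex_lock(&lock);
    stopping = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
}
```

    Because requests are counted rather than flagged, an event that
    arrives while the handler is running triggers another run instead of
    being lost, roughly matching how a work item that is already executing
    can be queued again.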
    
    v2: Added synchronous sched domain rebuilding using the fallback sched
        domain.  This fixes various issues caused by the confused
        scheduler putting tasks on a dead CPU, including the one reported
        by Li Zefan.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>