• Xiaochen Shen's avatar
    x86/resctrl: Fix a deadlock due to inaccurate reference · 334b0f4e
    Xiaochen Shen authored
    There is a race condition which results in a deadlock when rmdir and
    mkdir execute concurrently:
    
    $ ls /sys/fs/resctrl/c1/mon_groups/m1/
    cpus  cpus_list  mon_data  tasks
    
    Thread 1: rmdir /sys/fs/resctrl/c1
    Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1
    
    3 locks held by mkdir/48649:
     #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
     #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
     #2:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
    
    4 locks held by rmdir/48652:
     #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
     #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
     #2:  (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
     #3:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70
    
    Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
    kernfs_remove() removes all kernfs nodes under directory "c1"
    recursively, then waits for sub kernfs node "mon_groups" to drop active
    reference.
    
    Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
    directory. The wrapper kernfs_iop_mkdir() takes an active reference to
    the "mon_groups" directory but the code drops the active reference to
    the parent directory "c1" instead.
    
    As a result, Thread 1 is blocked on waiting for active reference to drop
    and never release rdtgroup_mutex, while Thread 2 is also blocked on
    trying to get rdtgroup_mutex.
    
    Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
    (rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
    -------------------------   -------------------------
                                kernfs_iop_mkdir
                                  /*
                                   * kn: "m1", parent_kn: "mon_groups",
                                   * prgrp_kn: parent_kn->parent: "c1",
                                   *
                                   * "mon_groups", parent_kn->active++: 1
                                   */
                                  kernfs_get_active(parent_kn)
    kernfs_iop_rmdir
      /* "c1", kn->active++ */
      kernfs_get_active(kn)
    
      rdtgroup_kn_lock_live
        atomic_inc(&rdtgrp->waitcount)
        /* "c1", kn->active-- */
        kernfs_break_active_protection(kn)
        mutex_lock
    
      rdtgroup_rmdir_ctrl
        free_all_child_rdtgrp
          sentry->flags = RDT_DELETED
    
        rdtgroup_ctrl_remove
          rdtgrp->flags = RDT_DELETED
          kernfs_get(kn)
          kernfs_remove(rdtgrp->kn)
            __kernfs_remove
              /* "mon_groups", sub_kn */
              atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
              kernfs_drain(sub_kn)
                /*
                 * sub_kn->active == KN_DEACTIVATED_BIAS + 1,
                 * waiting on sub_kn->active to drop, but it
                 * never drops in Thread 2 which is blocked
                 * on getting rdtgroup_mutex.
                 */
    Thread 1 hangs here ---->
                wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
                ...
                                  rdtgroup_mkdir
                                    rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                      mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                        rdtgroup_kn_lock_live(prgrp_kn)
                                          atomic_inc(&rdtgrp->waitcount)
                                          /*
                                           * "c1", prgrp_kn->active--
                                           *
                                           * The active reference on "c1" is
                                           * dropped, but not matching the
                                           * actual active reference taken
                                           * on "mon_groups", thus causing
                                           * Thread 1 to wait forever while
                                           * holding rdtgroup_mutex.
                                           */
                                          kernfs_break_active_protection(
                                                                   prgrp_kn)
                                          /*
                                           * Trying to get rdtgroup_mutex
                                           * which is held by Thread 1.
                                           */
    Thread 2 hangs here ---->             mutex_lock
                                          ...
    
    The problem is that the creation of a subdirectory in the "mon_groups"
    directory incorrectly releases the active protection of its parent
    directory instead of itself before it starts waiting for rdtgroup_mutex.
    This is triggered by the rdtgroup_mkdir() flow calling
    rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the
    parent control group ("c1") as argument. It should be called with kernfs
    node "mon_groups" instead. What is currently missing is that the
    kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp.
    
    Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then
    it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
    instead. And then it operates on the same rdtgroup structure but handles
    the active reference of kernfs node "mon_groups" to prevent deadlock.
    The same changes are also made to the "mon_data" directories.
    
    This results in some unused function parameters that will be cleaned up
    in follow-up patch as the focus here is on the fix only in support of
    backporting efforts.
    
    Fixes: c7d9aac6 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
    Suggested-by: default avatarReinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: default avatarXiaochen Shen <xiaochen.shen@intel.com>
    Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
    Reviewed-by: default avatarReinette Chatre <reinette.chatre@intel.com>
    Reviewed-by: default avatarTony Luck <tony.luck@intel.com>
    Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
    334b0f4e
rdtgroup.c 76.7 KB