    [PATCH] cpuset release ABBA deadlock fix · 3077a260
    Paul Jackson authored
    Fix possible cpuset_sem ABBA deadlock if 'notify_on_release' set.
    
    For a particular usage pattern, creating and destroying cpusets fairly
    frequently using notify_on_release, on a very large system, this deadlock
    can be seen every few days.  If you are not using the cpuset
    notify_on_release feature, you will never see this deadlock.
    
    The existing code, on task exit (or cpuset deletion), did:
    
      get cpuset_sem
      if cpuset marked notify_on_release and is ready to release:
        compute cpuset path relative to /dev/cpuset mount point
        call_usermodehelper() forks /sbin/cpuset_release_agent with path
      drop cpuset_sem
    
    Unfortunately, the fork in call_usermodehelper can allocate memory, and
    allocating memory can require cpuset_sem, if the mems_generation values
    changed in the interim.  This results in an ABBA deadlock, trying to obtain
    cpuset_sem when it is already held by the current task.
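
A minimal user-space sketch of the old ordering, assuming a pthread mutex as a stand-in for cpuset_sem. The names alloc_needing_cpuset_sem() and old_release_path() are hypothetical and do not appear in the kernel. The sketch uses a trylock so that the conflict shows up as EBUSY instead of hanging; the kernel code would block at that point.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* User-space stand-in for cpuset_sem (an assumption of this sketch). */
static pthread_mutex_t cpuset_sem = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical stand-in for a memory allocation that has to take
 * cpuset_sem. Returns 0 on success, or EBUSY in the situation where
 * the kernel would block forever.
 */
static int alloc_needing_cpuset_sem(void)
{
    int err = pthread_mutex_trylock(&cpuset_sem);

    if (err != 0)
        return err;
    pthread_mutex_unlock(&cpuset_sem);
    return 0;
}

/* Old ordering: the helper is launched while cpuset_sem is still held. */
static int old_release_path(void)
{
    int rc, err;

    rc = pthread_mutex_lock(&cpuset_sem);
    assert(rc == 0);
    err = alloc_needing_cpuset_sem();   /* stands in for the fork */
    pthread_mutex_unlock(&cpuset_sem);
    return err;
}
```

Called on its own, alloc_needing_cpuset_sem() succeeds; called from old_release_path(), it cannot obtain the lock that its caller already holds.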
    
    To fix this, I put the cpuset path (which must be computed while holding
    cpuset_sem) in a temporary buffer, to be used in the call_usermodehelper
    call of /sbin/cpuset_release_agent only _after_ dropping cpuset_sem.
    
    So the new logic is:
    
      get cpuset_sem
      if cpuset marked notify_on_release and is ready to release:
        compute cpuset path relative to /dev/cpuset mount point
        stash path in kmalloc'd buffer
      drop cpuset_sem
      call_usermodehelper() forks /sbin/cpuset_release_agent with path
      free path
    
    The sharp-eyed reader might notice that this patch does not contain any
    calls to kmalloc.  The existing code in the check_for_release() routine was
    already kmalloc'ing a buffer to hold the cpuset path.  In the old code, it
    just held the buffer for a few lines, over the cpuset_release_agent() call
    that in turn invoked call_usermodehelper().  In the new code, with the
    application of this patch, it returns that buffer via the new char
    **ppathbuf parameter, for later use and freeing in cpuset_release_agent(),
    which is called after cpuset_sem is dropped.  Whereas the old code has just
    one call to cpuset_release_agent(), right in the check_for_release()
    routine, the new code has three calls to cpuset_release_agent(), from the
    various places that a cpuset can be released.
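
The hand-off described above can be sketched in user-space C. This is an illustration only, not the kernel code: the pthread mutex, the simplified struct cpuset, the release_cpuset() wrapper, and the return values are assumptions of the sketch, and malloc() and printf() stand in for kmalloc() and call_usermodehelper(). Only the names check_for_release(), cpuset_release_agent(), and ppathbuf come from the patch description.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-in for cpuset_sem (an assumption of this sketch). */
static pthread_mutex_t cpuset_sem = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, stripped-down cpuset with only the fields used here. */
struct cpuset {
    const char *path;       /* path relative to the mount point */
    int notify_on_release;  /* set by the user */
    int in_use;             /* nonzero while tasks or children remain */
};

/*
 * Call with cpuset_sem held. If the cpuset is ready for release, hand a
 * heap copy of its path back through *ppathbuf; otherwise leave it alone.
 */
static void check_for_release(const struct cpuset *cs, char **ppathbuf)
{
    assert(ppathbuf != NULL);
    if (cs->notify_on_release && !cs->in_use) {
        size_t len = strlen(cs->path) + 1;
        char *buf = malloc(len);

        if (buf != NULL) {
            memcpy(buf, cs->path, len);
            *ppathbuf = buf;
        }
    }
}

/*
 * Call WITHOUT cpuset_sem held. Returns 1 if the agent was "run", 0 if
 * there was nothing to do, -1 if cpuset_sem was unexpectedly held.
 * Always frees pathbuf.
 */
static int cpuset_release_agent(char *pathbuf)
{
    if (pathbuf == NULL)
        return 0;
    /* Simulate the helper's allocation taking cpuset_sem itself. */
    if (pthread_mutex_trylock(&cpuset_sem) != 0) {
        free(pathbuf);
        return -1;
    }
    pthread_mutex_unlock(&cpuset_sem);
    printf("would run /sbin/cpuset_release_agent %s\n", pathbuf);
    free(pathbuf);
    return 1;
}

/* New ordering: compute the path under the lock, use it after dropping it. */
static int release_cpuset(const struct cpuset *cs)
{
    char *pathbuf = NULL;

    pthread_mutex_lock(&cpuset_sem);
    check_for_release(cs, &pathbuf);
    pthread_mutex_unlock(&cpuset_sem);
    return cpuset_release_agent(pathbuf);
}
```

Because the buffer outlives the locked region, each call site can follow the same three steps: lock, check_for_release(), unlock, then cpuset_release_agent().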
    
    This patch has been built and booted on SN2, and passed a stress test that
    previously hit the deadlock within a few seconds.
    Signed-off-by: Paul Jackson <pj@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpuset.c 43.6 KB