    cgroup: fix a race condition in manipulating tsk->cg_list · 0e04388f
    Li Zefan authored
    When I ran a test program that forks a large number of processes while
    concurrently running 'cat /cgroup/tasks', I got the following oops:
    
      ------------[ cut here ]------------
      kernel BUG at lib/list_debug.c:72!
      invalid opcode: 0000 [#1] SMP
      Pid: 4178, comm: a.out Not tainted (2.6.25-rc9 #72)
      ...
      Call Trace:
       [<c044a5f9>] ? cgroup_exit+0x55/0x94
       [<c0427acf>] ? do_exit+0x217/0x5ba
       [<c0427ed7>] ? do_group_exit+0x65/0x7c
       [<c0427efd>] ? sys_exit_group+0xf/0x11
       [<c0404842>] ? syscall_call+0x7/0xb
       [<c05e0000>] ? init_cyrix+0x2fa/0x479
      ...
      EIP: [<c04df671>] list_del+0x35/0x53 SS:ESP 0068:ebc7df4
      ---[ end trace caffb7332252612b ]---
      Fixing recursive fault but reboot is needed!
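    
    The test program itself is not included here. As a rough sketch of the
    kind of load described, assuming hypothetical helpers fork_and_reap()
    and read_tasks_file() (neither is taken from the original test), one
    process can fork short-lived children in a loop while another keeps
    re-reading the tasks file:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n children that exit immediately, so each one
 * runs the kernel exit path, then reap them. Returns the number reaped.
 */
int fork_and_reap(int n)
{
    int reaped = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0);   /* child: exit right away */
        if (pid < 0)
            break;      /* fork failed: stop creating children */
    }
    while (wait(NULL) > 0)
        reaped++;
    return reaped;
}

/*
 * Hypothetical helper: read the whole file once, as 'cat' would.
 * Returns the number of bytes read, or -1 if the file cannot be opened.
 */
long read_tasks_file(const char *path)
{
    char buf[4096];
    long total = 0;
    size_t got;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while ((got = fread(buf, 1, sizeof(buf), f)) > 0)
        total += (long)got;
    fclose(f);
    return total;
}
```

    Calling fork_and_reap() repeatedly in one process and
    read_tasks_file("/cgroup/tasks") repeatedly in another approximates the
    concurrent fork/exit and iteration load; whether it hits the race on a
    given kernel is not guaranteed.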
    
    After digging into the code and debugging, I finally found a race
    condition:
    
    				do_exit()
    				  ->cgroup_exit()
    				    ->if (!list_empty(&tsk->cg_list))
    				        list_del(&tsk->cg_list);
    
      cgroup_iter_start()
        ->cgroup_enable_task_cg_list()
          ->list_add(&tsk->cg_list, ..);
    
    In this case the task's cg_list entry is never removed from the list,
    even though the process has exited.
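    
    The interleaving above can be replayed deterministically in userspace.
    The sketch below is a single-threaded illustration only: the list
    helpers, struct sketch_task, its exiting flag and the check_exiting
    option are all hypothetical stand-ins, and the flag check is just one
    plausible way to close the window, not a description of what this patch
    changes:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_entry(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void list_del_entry(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

/* Hypothetical, simplified task: only the fields this sketch needs. */
struct sketch_task {
    struct list_head cg_list;
    bool exiting;               /* assumption: set once exit has begun */
};

/* Exit path from the diagram: unlink only if currently linked. */
static void sketch_cgroup_exit(struct sketch_task *t)
{
    t->exiting = true;
    if (!list_is_empty(&t->cg_list))
        list_del_entry(&t->cg_list);
}

/* Iterator path: link the task; optionally skip tasks already exiting. */
static void sketch_enable_task_cg_list(struct sketch_task *t,
                                       struct list_head *tasks,
                                       bool check_exiting)
{
    if (check_exiting && t->exiting)
        return;
    if (list_is_empty(&t->cg_list))
        list_add_entry(&t->cg_list, tasks);
}

/*
 * Replay the order from the diagram: the exit path runs first, then the
 * iterator path. Returns true if the exited task is left on the list.
 */
bool replay_race(bool check_exiting)
{
    struct list_head tasks;
    struct sketch_task t;

    init_list_head(&tasks);
    init_list_head(&t.cg_list);
    t.exiting = false;
    assert(list_is_empty(&t.cg_list));  /* task starts unlinked */

    sketch_cgroup_exit(&t);             /* sees an empty cg_list, no list_del */
    sketch_enable_task_cg_list(&t, &tasks, check_exiting);

    return !list_is_empty(&tasks);
}
```

    Here replay_race(false) leaves the exited task linked, while
    replay_race(true) does not. In real concurrent code the flag check and
    the list operations would also have to be serialized by a common lock;
    the sketch only replays one fixed ordering.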
    
    We got two bug reports in the past, which seem to be the same bug as
    this one:
    	http://lkml.org/lkml/2008/3/5/332
    	http://lkml.org/lkml/2007/10/17/224
    
    In practice, I sometimes got the oops in list_del and sometimes in
    list_add, and with small changes to the test program I could trigger
    other oopses as well.
    
    The patch has been tested on both x86_32 and x86_64.
    
    Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
    Acked-by: Paul Menage <menage@google.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>