    memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event · 7d36665a
    Chunguang Xu authored
    When an eventfd that monitors multiple memory thresholds of a cgroup is
    closed, the kernel deletes all events related to that eventfd.  If
    another eventfd starts monitoring a memory threshold of the same cgroup
    before all of those events have been deleted, the kernel crashes:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000004
      #PF: supervisor write access in kernel mode
      #PF: error_code(0x0002) - not-present page
      PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
      Oops: 0002 [#1] SMP PTI
      CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
      Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
      Workqueue: events memcg_event_remove
      RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
      RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
      RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
      RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
      R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
      R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
      Call Trace:
        memcg_event_remove+0x32/0x90
        process_one_work+0x172/0x380
        worker_thread+0x49/0x3f0
        kthread+0xf8/0x130
        ret_from_fork+0x35/0x40
      CR2: 0000000000000004
    
    The problem can be reproduced as follows:
    
    1. We create a new cgroup subdirectory and a new eventfd, and then we
       monitor multiple memory thresholds of the cgroup through this eventfd.
    
    2. We close this eventfd, and __mem_cgroup_usage_unregister_event()
       is called multiple times to delete all events related to this
       eventfd.
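
    The two steps above, plus the racing registration described in the
    following paragraphs, can be sketched from user space roughly as below.
    This is an illustration, not the reporter's actual reproducer.  It
    assumes the cgroup v1 memory controller files memory.usage_in_bytes and
    cgroup.event_control inside a cgroup directory supplied by the caller.
    The helper names watch_thresholds() and try_race(), the threshold
    values and the count of 16 are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Register `count` usage thresholds of the cgroup at `cgdir` on one new
 * eventfd.  Returns the eventfd, or -1 on any failure.
 * The threshold values (1 MiB steps) are arbitrary, for illustration only.
 */
static int watch_thresholds(const char *cgdir, int count)
{
    char path[4096], line[64];
    int efd = -1, ufd = -1, cfd = -1, i, len;

    assert(cgdir != NULL);

    snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgdir);
    ufd = open(path, O_RDONLY);
    snprintf(path, sizeof(path), "%s/cgroup.event_control", cgdir);
    cfd = open(path, O_WRONLY);
    efd = eventfd(0, 0);
    if (ufd < 0 || cfd < 0 || efd < 0)
        goto fail;

    for (i = 0; i < count; i++) {
        /* cgroup v1 format: "<event_fd> <usage file fd> <threshold>" */
        len = snprintf(line, sizeof(line), "%d %d %lu", efd, ufd,
                       (unsigned long)(i + 1) << 20);
        if (write(cfd, line, (size_t)len) != (ssize_t)len)
            goto fail;
    }
    close(cfd);
    close(ufd);
    return efd;

fail:
    if (efd >= 0)
        close(efd);
    if (cfd >= 0)
        close(cfd);
    if (ufd >= 0)
        close(ufd);
    return -1;
}

/*
 * Steps 1 and 2, followed by a registration from a second eventfd.
 * Timing dependent: a return value of 0 only means the calls succeeded,
 * not that the race window was actually hit.
 */
static int try_race(const char *cgdir)
{
    int first = watch_thresholds(cgdir, 16);
    int second;

    if (first < 0)
        return -1;
    close(first);                         /* kernel starts removing events */
    second = watch_thresholds(cgdir, 1);  /* may land between two removals */
    if (second < 0)
        return -1;
    close(second);
    return 0;
}
```

    On a kernel without this fix, a loop around try_race() may crash the
    machine, so it should only be tried in a disposable test environment.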
    
    The first time __mem_cgroup_usage_unregister_event() is called, the
    kernel clears all items related to this eventfd in
    thresholds->primary.
    
    Since there is currently only one eventfd, thresholds->primary becomes
    empty, so the kernel sets thresholds->primary and thresholds->spare
    to NULL.  If, at this time, the user creates a new eventfd and monitors
    a memory threshold of this cgroup, the kernel re-initializes
    thresholds->primary.
    
    Then, when __mem_cgroup_usage_unregister_event() is called for the
    second time, thresholds->primary is no longer empty, so the kernel
    accesses thresholds->spare.  But thresholds->spare is still NULL, which
    triggers the crash.
    
    In general, the longer it takes to delete all events related to this
    eventfd, the easier it is to trigger this problem.
    
    The solution is to check, when deleting an event, whether the
    thresholds associated with the eventfd have already been cleared.  If
    so, we do nothing.
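
    The sequence and the check can be illustrated with a small user-space
    model.  This is a simplified sketch, not the kernel code: the structs,
    model_register() and model_unregister() are stand-ins invented here,
    locking and threshold sorting are left out, and the `guarded` flag
    exists only so that both behaviours can be compared.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel's threshold bookkeeping. */
struct entry { int eventfd; unsigned long threshold; };
struct ary { unsigned int size; struct entry entries[]; };
struct thresholds { struct ary *primary, *spare; };

static struct ary *ary_alloc(unsigned int n)
{
    struct ary *a = malloc(sizeof(*a) + n * sizeof(a->entries[0]));

    assert(a != NULL);
    a->size = n;
    return a;
}

/* Add one threshold: build a larger array, keep the old primary as spare. */
static void model_register(struct thresholds *t, int eventfd, unsigned long thr)
{
    unsigned int i, old = t->primary ? t->primary->size : 0;
    struct ary *new = ary_alloc(old + 1);

    for (i = 0; i < old; i++)
        new->entries[i] = t->primary->entries[i];
    new->entries[old].eventfd = eventfd;
    new->entries[old].threshold = thr;

    free(t->spare);
    t->spare = t->primary;
    t->primary = new;
}

/*
 * Remove every entry that belongs to `eventfd`.  Returns the number of
 * entries removed, or -1 in the situation where the unguarded path would
 * go on to write through a NULL spare pointer.
 */
static int model_unregister(struct thresholds *t, int eventfd, int guarded)
{
    unsigned int i, j, keep = 0, drop = 0;
    struct ary *new;

    if (!t->primary)
        return 0;                       /* nothing registered at all */

    for (i = 0; i < t->primary->size; i++) {
        if (t->primary->entries[i].eventfd != eventfd)
            keep++;
        else
            drop++;
    }

    new = t->spare;

    /* The check described above: this eventfd was already cleared. */
    if (!drop)
        return (guarded || new) ? 0 : -1;

    if (!keep) {
        free(new);                      /* array becomes empty */
        new = NULL;
    } else {
        for (i = 0, j = 0; i < t->primary->size; i++)
            if (t->primary->entries[i].eventfd != eventfd)
                new->entries[j++] = t->primary->entries[i];
        new->size = keep;
    }

    /* Swap primary and spare; drop both when nothing is left. */
    t->spare = t->primary;
    t->primary = new;
    if (!new) {
        free(t->spare);
        t->spare = NULL;
    }
    return (int)drop;
}
```

    In this model, registering two thresholds on one eventfd and
    unregistering once leaves both arrays NULL.  A registration from a
    second eventfd then recreates primary while spare stays NULL, and a
    further unregister call for the first eventfd finds nothing to remove,
    which is the case the check turns into a no-op.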
    
    [akpm@linux-foundation.org: fix comment, per Kirill]
    Fixes: 907860ed ("cgroups: make cftype.unregister_event() void-returning")
    Signed-off-by: Chunguang Xu <brookxu@tencent.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>