    mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction · a264df74
    Roman Gushchin authored
    Christian reported a warning like the following, obtained while running
    some KVM-related tests on s390:
    
        WARNING: CPU: 8 PID: 208 at lib/percpu-refcount.c:108 percpu_ref_exit+0x50/0x58
        Modules linked in: kvm(-) xt_CHECKSUM xt_MASQUERADE bonding xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_na>
        CPU: 8 PID: 208 Comm: kworker/8:1 Not tainted 5.2.0+ #66
        Hardware name: IBM 2964 NC9 712 (LPAR)
        Workqueue: events sysfs_slab_remove_workfn
        Krnl PSW : 0704e00180000000 0000001529746850 (percpu_ref_exit+0x50/0x58)
                   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
        Krnl GPRS: 00000000ffff8808 0000001529746740 000003f4e30e8e18 0036008100000000
                   0000001f00000000 0035008100000000 0000001fb3573ab8 0000000000000000
                   0000001fbdb6de00 0000000000000000 0000001529f01328 0000001fb3573b00
                   0000001fbb27e000 0000001fbdb69300 000003e009263d00 000003e009263cd0
        Krnl Code: 0000001529746842: f0a0000407fe        srp        4(11,%r0),2046,0
                   0000001529746848: 47000700            bc         0,1792
                  #000000152974684c: a7f40001            brc        15,152974684e
                  >0000001529746850: a7f4fff2            brc        15,1529746834
                   0000001529746854: 0707                bcr        0,%r7
                   0000001529746856: 0707                bcr        0,%r7
                   0000001529746858: eb8ff0580024        stmg       %r8,%r15,88(%r15)
                   000000152974685e: a738ffff            lhi        %r3,-1
        Call Trace:
        ([<000003e009263d00>] 0x3e009263d00)
         [<00000015293252ea>] slab_kmem_cache_release+0x3a/0x70
         [<0000001529b04882>] kobject_put+0xaa/0xe8
         [<000000152918cf28>] process_one_work+0x1e8/0x428
         [<000000152918d1b0>] worker_thread+0x48/0x460
         [<00000015291942c6>] kthread+0x126/0x160
         [<0000001529b22344>] ret_from_fork+0x28/0x30
         [<0000001529b2234c>] kernel_thread_starter+0x0/0x10
        Last Breaking-Event-Address:
         [<000000152974684c>] percpu_ref_exit+0x4c/0x58
        ---[ end trace b035e7da5788eb09 ]---
    
    The problem occurs because kmem_cache_destroy() is called immediately
    after a memcg is deleted, so it races with the memcg kmem_cache
    deactivation.
    
    flush_memcg_workqueue() at the beginning of kmem_cache_destroy() is
    supposed to guarantee that all deactivation processes are finished, but
    it fails to do so.  It waits for an rcu grace period, after which all
    child kmem_caches should be deactivated.  During the deactivation
    percpu_ref_kill() is called for non-root kmem_cache refcounters, but it
    requires yet another rcu grace period to finish the transition to the
    atomic (dead) state.
    
    So in the rare case when not all child kmem_caches have been destroyed
    by the time the root kmem_cache is about to go away, we need to wait for
    another rcu grace period before destroying the root kmem_cache.
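
    The two-grace-period reasoning above can be illustrated with a toy
    userspace model.  This is only a sketch and not the kernel change
    itself: every toy_* name is hypothetical, a grace period is reduced to
    a single function call, and each child refcounter is assumed to advance
    by exactly one state per grace period.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model, NOT kernel code. Every name here is hypothetical.
 * A child's refcounter moves through three states; each transition is
 * assumed to need its own grace period, modelled by toy_rcu_barrier().
 */
enum toy_ref_state {
	TOY_REF_LIVE,	/* deactivation callback queued, not yet run */
	TOY_REF_KILLED,	/* kill requested, switch to atomic mode pending */
	TOY_REF_DEAD,	/* atomic (dead) state reached */
};

#define TOY_NR_CHILDREN 2

struct toy_root_cache {
	enum toy_ref_state child_ref[TOY_NR_CHILDREN];
};

/* One grace period: every child that is not dead advances by one state. */
static void toy_rcu_barrier(struct toy_root_cache *root)
{
	for (int i = 0; i < TOY_NR_CHILDREN; i++) {
		if (root->child_ref[i] != TOY_REF_DEAD)
			root->child_ref[i] =
				(enum toy_ref_state)(root->child_ref[i] + 1);
	}
}

/* In this model a child stays around until its refcounter is dead. */
static bool toy_has_children(const struct toy_root_cache *root)
{
	for (int i = 0; i < TOY_NR_CHILDREN; i++) {
		if (root->child_ref[i] != TOY_REF_DEAD)
			return true;
	}
	return false;
}

/*
 * Models the flush done before the root cache is destroyed and reports
 * whether destruction would then be safe. second_wait selects the
 * behaviour with (true) or without (false) the extra grace period.
 */
bool toy_flush_then_check(bool second_wait)
{
	struct toy_root_cache root = {
		.child_ref = { TOY_REF_LIVE, TOY_REF_LIVE },
	};

	/* First grace period: the deactivation callbacks run. */
	toy_rcu_barrier(&root);

	/* Second grace period, only if children are still around. */
	if (second_wait && toy_has_children(&root))
		toy_rcu_barrier(&root);

	return !toy_has_children(&root);
}
```

    In this model toy_flush_then_check(false) leaves the children in the
    killed-but-not-dead state, which corresponds to the race described
    above, while toy_flush_then_check(true) lets them reach the dead state
    before the root is torn down.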
    
    This issue can be triggered only with dynamically created kmem_caches
    which are used with memcg accounting.  In this case per-memcg child
    kmem_caches are created.  They are deactivated from the cgroup removal
    path.  If the destruction of the root kmem_cache is racing with the
    removal of the cgroup (both are quite complicated multi-stage
    processes), the described issue can occur.  The only known way to
    trigger it in real life is to unload a kernel module which creates a
    dedicated kmem_cache that is used from different memory cgroups with
    the GFP_ACCOUNT flag.  If the unloading happens immediately after calling
    rmdir on the corresponding cgroup, there is some chance to trigger the
    issue.
    
    Link: http://lkml.kernel.org/r/20191129025011.3076017-1-guro@fb.com
    Fixes: f0a3a24b ("mm: memcg/slab: rework non-root kmem_cache lifecycle management")
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>