    mm: use special value SHRINKER_REGISTERING instead of list_empty() check · 7e010df5
    Kirill Tkhai authored
    The patch introduces a special value, SHRINKER_REGISTERING, to use
    instead of the list_empty() check to distinguish a shrinker that is
    still being registered from an unregistered one.  Why do we need that
    at all?
    
    Shrinker registration is split in two parts.  The first one is
    prealloc_shrinker(), which allocates shrinker memory and reserves ID in
    shrinker_idr.  This function can fail.  The second is
    register_shrinker_prepared(), and it finalizes the registration.  This
    function actually makes shrinker available to be used from
    shrink_slab(), and it can't fail.
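    
    The two phases can be illustrated with a small user-space sketch.  It
    is not the kernel code: the toy_* names and the fixed-size array
    standing in for shrinker_idr are assumptions made for illustration,
    and the placeholder value anticipates the ~0UL marker described later
    in this message.

```c
#include <stddef.h>

#define TOY_MAX_IDS 4
/* Hypothetical placeholder stored while registration is in progress. */
#define TOY_REGISTERING ((struct toy_shrinker *)~0UL)

struct toy_shrinker {
	int id;
};

/* Stand-in for shrinker_idr: NULL means the ID is free. */
static struct toy_shrinker *toy_idr[TOY_MAX_IDS];

/* Phase 1: reserve an ID.  May fail (here: when the table is full). */
static int toy_prealloc_shrinker(struct toy_shrinker *s)
{
	for (int id = 0; id < TOY_MAX_IDS; id++) {
		if (toy_idr[id] == NULL) {
			toy_idr[id] = TOY_REGISTERING;
			s->id = id;
			return 0;
		}
	}
	return -1;
}

/* Phase 2: publish the shrinker.  Cannot fail: the slot is reserved. */
static void toy_register_shrinker_prepared(struct toy_shrinker *s)
{
	toy_idr[s->id] = s;
}
```

    Because every resource is acquired in the first phase, the second
    phase reduces to a single store, which is why it has no error path.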
    
    One shrinker may be based on more than one LRU list.  So we never
    clear the bit in the memcg shrinker maps when one of the corresponding
    LRU lists becomes empty, since the other LRU lists may be non-empty.
    See the superblock shrinker for an example: it is based on two LRU
    lists, s_inode_lru and s_dentry_lru.  We do not want to clear the
    shrinker bit when there are no inodes in s_inode_lru, as s_dentry_lru
    may still contain dentries.
    
    Instead, we use a special algorithm to detect shrinkers that have no
    elements on any of their LRU lists; this is done in
    shrink_slab_memcg().  See the comment in that function for details.
    
    Also, in shrink_slab_memcg() we clear the shrinker bit in the map when
    we meet an unregistered shrinker (the bit is set, but there is no
    shrinker in the IDR).  Otherwise, we would have to do that at shrinker
    unregistration time for all memcgs, which is worse, since iterating
    over all memcgs may take a long time.  It would also impose
    restrictions on the unregistration order for shrinker users: they
    would have to guarantee that no new elements are added after
    unregister_shrinker() (otherwise, a newly added element would set the
    bit again).
    
    So, if we meet a set bit in the map and no shrinker in the IDR while
    iterating over the map in shrink_slab_memcg(), this means the
    corresponding shrinker is unregistered, and we must clear the bit.
    
    Another case is shrinker registration.  We want two things there:
    
    1) do_shrink_slab() can be called only for completely registered
       shrinkers;
    
    2) shrinker internal lists may be populated in any order relative to
       register_shrinker_prepared() (take the superblock as an example).
       Both of:
    
      a)list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru); [cpu0]
        memcg_set_shrinker_bit();                               [cpu0]
        ...
        register_shrinker_prepared();                           [cpu1]
    
      and
    
      b)register_shrinker_prepared();                           [cpu0]
        ...
        list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru); [cpu1]
        memcg_set_shrinker_bit();                               [cpu1]
    
       are legitimate.  We don't want to impose a restriction here and
       force people to use only variant (b).  We don't want to force people
       to ensure that there are no elements in the LRU lists before the
       shrinker is completely registered.  Internal users of LRU lists and
       the shrinker code are two different subsystems, and each should
       remain self-contained.
    
    In case (a) the bit is set before the shrinker is completely
    registered.  We don't want do_shrink_slab() to be called at this
    moment, so we have to detect such registering shrinkers.
    
    Before this patch, a list_empty() check (the shrinker is not linked to
    the list) was used for that.  So in (a) a bit could be set, but we
    did not call do_shrink_slab() unless the shrinker was linked to the
    list.  Linking to the list was simply overloaded as an indicator.
    
    This was not the best solution, since it's better not to touch the
    shrinker memory from shrink_slab_memcg() before the shrinker is
    completely registered (this will also be useful in the future for
    making shrink_slab() completely lockless).
    
    So this patch introduces a better way to detect a registering
    shrinker, one that does not require dereferencing shrinker memory.  It
    is just a ~0UL value, which we insert into the IDR during ID
    allocation.  After the shrinker is ready to be used, we insert the
    actual shrinker pointer into the IDR, and it becomes available to
    shrink_slab_memcg().
    
    We can't use NULL instead of this new value for this purpose, because
    shrink_slab_memcg() already uses NULL to detect unregistered
    shrinkers, and we don't want the function to see NULL and clear the
    bit; otherwise (a) won't work.
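    
    The three states seen by the map walk can be modelled in user space
    as follows.  This is a simplified sketch, not the kernel function:
    the toy_* names, the boolean array used as the memcg shrinker map,
    and the call counter standing in for do_shrink_slab() are all
    assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IDS 4
/* Hypothetical placeholder, mirroring the ~0UL value described above. */
#define TOY_REGISTERING ((struct toy_shrinker *)~0UL)

struct toy_shrinker {
	int shrink_calls;	/* counts stand-in do_shrink_slab() calls */
};

static struct toy_shrinker *toy_idr[TOY_MAX_IDS];	/* stand-in for the IDR */
static bool toy_map[TOY_MAX_IDS];			/* stand-in for the map */

/* Walk the set bits and classify each slot by its IDR entry. */
static void toy_shrink_slab_memcg(void)
{
	for (int i = 0; i < TOY_MAX_IDS; i++) {
		if (!toy_map[i])
			continue;

		struct toy_shrinker *s = toy_idr[i];

		if (s == NULL) {
			/* Unregistered shrinker: clear the stale bit. */
			toy_map[i] = false;
			continue;
		}
		if (s == TOY_REGISTERING) {
			/* Case (a): keep the bit, never dereference s. */
			continue;
		}
		/* Fully registered: safe to touch shrinker memory. */
		s->shrink_calls++;
	}
}
```

    With a bit set on a slot that still holds the placeholder, the walk
    leaves the bit alone; once the real pointer is stored in that slot,
    the next walk reaches the shrinker.  If NULL were used as the
    placeholder, the first walk would clear the bit and the elements
    added in ordering (a) would be missed.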
    
    This is the only thing the patch does: it provides a better way to
    detect a registering shrinker.  Nothing else is changed.
    
    This also gives better assembly, but that is a minor aspect of the
    patch:
    
    Before:
      callq  <idr_find>
      mov    %rax,%r15
      test   %rax,%rax
      je     <shrink_slab_memcg+0x1d5>
      mov    0x20(%rax),%rax
      lea    0x20(%r15),%rdx
      cmp    %rax,%rdx
      je     <shrink_slab_memcg+0xbd>
      mov    0x8(%rsp),%edx
      mov    %r15,%rsi
      lea    0x10(%rsp),%rdi
      callq  <do_shrink_slab>
    
    After:
      callq  <idr_find>
      mov    %rax,%r15
      lea    -0x1(%rax),%rax
      cmp    $0xfffffffffffffffd,%rax
      ja     <shrink_slab_memcg+0x1cd>
      mov    0x8(%rsp),%edx
      mov    %r15,%rsi
      lea    0x10(%rsp),%rdi
      callq  ffffffff810cefd0 <do_shrink_slab>
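    
    The lea/cmp/ja sequence in the "After" listing folds two equality
    tests into one unsigned comparison: subtracting 1 maps 0 to the
    largest value and ~0UL to the second largest, so both end up above
    0xfffffffffffffffd.  The sketch below checks that equivalence in
    plain C; the function names are hypothetical, and 64-bit values are
    assumed to match the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* The check as one would write it: NULL or the ~0 placeholder. */
static bool skip_two_compares(uint64_t p)
{
	return p == 0 || p == UINT64_MAX;
}

/* The folded form: wrapping subtract of 1, then one unsigned compare. */
static bool skip_one_compare(uint64_t p)
{
	return (p - 1) > UINT64_C(0xfffffffffffffffd);
}
```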
    
    [ktkhai@virtuozzo.com: add #ifdef CONFIG_MEMCG_KMEM around idr_replace()]
      Link: http://lkml.kernel.org/r/758b8fec-7573-47eb-b26a-7b2847ae7b8c@virtuozzo.com
    Link: http://lkml.kernel.org/r/153355467546.11522.4518015068123480218.stgit@localhost.localdomain
    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Josef Bacik <jbacik@fb.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vmscan.c 120 KB