• Byungchul Park's avatar
    rcu: Add basic support for kfree_rcu() batching · a35d1690
    Byungchul Park authored
    Recently a discussion about stability and performance of a system
    involving a high rate of kfree_rcu() calls surfaced on the list [1]
    which led to another discussion how to prepare for this situation.
    
    This patch adds basic batching support for kfree_rcu(). It is "basic"
    because we do none of the slab management, dynamic allocation, code
    moving or any of the other things, some of which previous attempts did
    [2]. These fancier improvements can be follow-up patches and there are
    different ideas being discussed in those regards. This is an effort to
    start simple, and build up from there. In the future, an extension to
    use kfree_bulk and possibly per-slab batching could be done to further
    improve performance due to cache-locality and slab-specific bulk free
    optimizations. By using an array of pointers, the worker thread
    processing the work would need to read lesser data since it does not
    need to deal with large rcu_head(s) any longer.
    
    Torture tests follow in the next patch and show improvements of around
    5x reduction in number of  grace periods on a 16 CPU system. More
    details and test data are in that patch.
    
    There is an implication with rcu_barrier() with this patch. Since the
    kfree_rcu() calls can be batched, and may not be handed yet to the RCU
    machinery in fact, the monitor may not have even run yet to do the
    queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
    to wait for those kfree_rcu()s that are already made. So this means a
    kfree_rcu() followed by an rcu_barrier() does not imply that memory will
    be freed once rcu_barrier() returns.
    
    Another implication is higher active memory usage (although not
    run-away..) until the kfree_rcu() flooding ends, in comparison to
    without batching. More details about this are in the second patch which
    adds an rcuperf test.
    
    Finally, in the near future we will get rid of kfree_rcu() special casing
    within RCU such as in rcu_do_batch and switch everything to just
    batching. Currently we don't do that since timer subsystem is not yet up
    and we cannot schedule the kfree_rcu() monitor as the timer subsystem's
    lock are not initialized. That would also mean getting rid of
    kfree_call_rcu_nobatch() entirely.
    
    [1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org
    [2] https://lkml.org/lkml/2017/12/19/824
    
    Cc: kernel-team@android.com
    Cc: kernel-team@lge.com
    Co-developed-by: default avatarByungchul Park <byungchul.park@lge.com>
    Signed-off-by: default avatarByungchul Park <byungchul.park@lge.com>
    Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
    [ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
    [ paulmck: Make it work during early boot. ]
    [ paulmck: Add a crude early boot self-test. ]
    [ paulmck: Style adjustments and experimental docbook structure header. ]
    Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.9999.1908161931110.32497@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59Signed-off-by: default avatarPaul E. McKenney <paulmck@kernel.org>
    a35d1690
tree.c 120 KB