    memcg: add memory.pressure_level events · 70ddf637
    Anton Vorontsov authored
    With this patch, userland applications that want to stay interactive
    or keep memory allocation cost in check can use pressure level
    notifications.  The levels are defined as follows:
    
    The "low" level means that the system is reclaiming memory for new
    allocations.  Monitoring this reclaiming activity might be useful for
    maintaining cache level.  Upon notification, the program (typically
    "Activity Manager") might analyze vmstat and act in advance (e.g.
    shut down unimportant services early).
    
    The "medium" level means that the system is experiencing medium memory
    pressure: it might be swapping, paging out active file caches, etc.
    Upon this event, applications may decide to further analyze
    vmstat/zoneinfo/memcg or internal memory usage statistics and free any
    resources that can be easily reconstructed or re-read from a disk.
    
    The "critical" level means that the system is actively thrashing: it is
    about to run out of memory (OOM), or the in-kernel OOM killer is even
    about to trigger.  Applications should do whatever they can to help the
    system.  It might be too late to consult vmstat or any other
    statistics, so it is advisable to take immediate action.
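
    As an illustration of how a listener might subscribe to one of these
    levels, here is a minimal sketch.  It assumes the eventfd-based
    registration described in the cgroup v1 memory documentation (write
    "<eventfd> <fd of memory.pressure_level> <level>" to
    cgroup.event_control); that mechanism is not spelled out in this commit
    message.  The function names and the cgroup directory argument are
    hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Returns 1 if `level` is one of the three levels named above. */
int pressure_level_is_valid(const char *level)
{
    return strcmp(level, "low") == 0 ||
           strcmp(level, "medium") == 0 ||
           strcmp(level, "critical") == 0;
}

/*
 * Builds the registration line "<eventfd> <pressure_level fd> <level>".
 * Returns its length, or -1 on an unknown level or a too-small buffer.
 */
int format_pressure_registration(char *buf, size_t size,
                                 int event_fd, int level_fd,
                                 const char *level)
{
    int n;

    if (!pressure_level_is_valid(level))
        return -1;
    n = snprintf(buf, size, "%d %d %s", event_fd, level_fd, level);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/*
 * Registers for `level` events on the memcg mounted at `cgroup_dir`
 * (assumed layout: cgroup v1 memory controller).  On success returns the
 * eventfd and stores the still-open memory.pressure_level descriptor in
 * *level_fd_out; the caller closes both.  Returns -1 on any failure.
 */
int register_pressure_listener(const char *cgroup_dir, const char *level,
                               int *level_fd_out)
{
    char path[4096];
    char line[64];
    int efd, lfd, cfd, n;
    ssize_t written;

    if (!pressure_level_is_valid(level))
        return -1;
    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/memory.pressure_level", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        goto fail_efd;
    lfd = open(path, O_RDONLY);
    if (lfd < 0)
        goto fail_efd;

    n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        goto fail_lfd;
    cfd = open(path, O_WRONLY);
    if (cfd < 0)
        goto fail_lfd;

    n = format_pressure_registration(line, sizeof(line), efd, lfd, level);
    written = (n < 0) ? -1 : write(cfd, line, (size_t)n);
    close(cfd);
    if (n < 0 || written != (ssize_t)n)
        goto fail_lfd;

    *level_fd_out = lfd;
    return efd;

fail_lfd:
    close(lfd);
fail_efd:
    close(efd);
    return -1;
}
```

    A caller would then block in read(2) on the returned eventfd, reading
    an 8-byte counter, and react according to the level it registered for.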
    
    The events are propagated upward until the event is handled, i.e.  the
    events are not pass-through.  Here is what this means: for example you
    have three cgroups: A->B->C.  Now you set up an event listener on
    cgroups A, B and C, and suppose group C experiences some pressure.  In
    this situation, only group C will receive the notification, i.e.  groups
    A and B will not receive it.  This is done to avoid excessive
    "broadcasting" of messages, which disturbs the system and which is
    especially bad if we are low on memory or thrashing.  So, organize the
    cgroups wisely, or propagate the events manually (or ask us to
    implement pass-through events, explaining why you would need them).
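
    The propagation rule above can be modelled in a few lines.  This is a
    toy sketch of the described semantics, not the kernel implementation:
    struct cg_node and deliver_pressure_event() are hypothetical names, and
    "handled" is simplified to "this group has a listener registered".

```c
#include <stddef.h>

/* Toy model of one cgroup in a hierarchy such as A->B->C. */
struct cg_node {
    const char *name;
    struct cg_node *parent;   /* NULL at the top of the hierarchy */
    int has_listener;         /* non-zero if a listener is registered */
};

/*
 * Walks upward from the group under pressure and returns the first group
 * that has a listener.  That group alone receives the event; its
 * ancestors are not notified.  Returns NULL if nobody is listening.
 */
const struct cg_node *deliver_pressure_event(const struct cg_node *origin)
{
    const struct cg_node *g;

    for (g = origin; g != NULL; g = g->parent) {
        if (g->has_listener)
            return g;
    }
    return NULL;
}
```

    With listeners on A, B and C, pressure in C is delivered to C only.  If
    C had no listener, the event would stop at B instead, and A would still
    not see it.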
    
    Performance-wise, the memory pressure notifications feature itself is
    lightweight and does not require much bookkeeping, in contrast to the
    rest of the memcg features.  Unfortunately, in the current memcg
    implementation, page accounting is an inseparable part and cannot be
    turned off.  The good news is that there are some efforts[1] to improve
    the situation; plus, implementing the same, fully API-compatible[2]
    interface for CONFIG_MEMCG=n case (e.g.  embedded) is also a viable
    option, so it will not require any changes on the userland side.
    
    [1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291
    [2] http://lkml.org/lkml/2013/2/21/454
    
    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix CONFIG_CGROUPS=n warnings]
    Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
    Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
    Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Glauber Costa <glommer@parallels.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Luiz Capitulino <lcapitulino@redhat.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Leonid Moiseichuk <leonid.moiseichuk@nokia.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
    Cc: John Stultz <john.stultz@linaro.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>