    mm: lift gfp_kmemleak_mask() to gfp.h · 1c00f936
    Dave Chinner authored
    Patch series "mm: fix nested allocation context filtering".
    
    This patchset is the followup to the comment I made earlier today:
    
    https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
    
    Tl;dr: Memory allocations that are done inside the public memory
    allocation API need to obey the reclaim recursion constraints placed on
    the allocation by the original caller, including the "don't track
    recursion for this allocation" case defined by __GFP_NOLOCKDEP.
    
    These nested allocations are generally in debug code that is tracking
    something about the allocation (kmemleak, KASAN, etc) and so are
    allocating private kernel objects that only that debug system will use.
    
    Neither the page-owner code nor the stack depot code get this right.  They
    also clear GFP_ZONEMASK as a separate operation, which is completely
    redundant because the constraint filter applied immediately after
    guarantees that GFP_ZONEMASK bits are cleared.
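
    To illustrate why the separate clear is redundant, here is a minimal
    userspace sketch.  The type, the bit values and the function names are
    hypothetical stand-ins, not the kernel's definitions; the only assumption
    that matters is that the constraint filter shares no bits with the zone
    mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real gfp bit layout is different. */
typedef uint32_t sketch_gfp_t;

#define SKETCH_ZONEMASK    0x00fu  /* zone selector bits */
#define SKETCH_CONSTRAINT  0x0f0u  /* bits the constraint filter keeps */

/* Old pattern: clear the zone bits, then apply the constraint filter. */
static sketch_gfp_t sketch_two_step(sketch_gfp_t flags)
{
	flags &= ~SKETCH_ZONEMASK;
	flags &= SKETCH_CONSTRAINT;
	return flags;
}

/* The filter alone: it already drops every bit outside the kept set. */
static sketch_gfp_t sketch_filter_only(sketch_gfp_t flags)
{
	return flags & SKETCH_CONSTRAINT;
}

/* Compare both forms over every combination of the low 12 bits. */
static void sketch_zone_clear_check(void)
{
	for (sketch_gfp_t f = 0; f <= 0xfffu; f++)
		assert(sketch_two_step(f) == sketch_filter_only(f));
}
```

    Calling sketch_zone_clear_check() passes for every input, so under that
    assumption the first step of the two-step form does no work.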
    
    kmemleak gets this filtering right.  It preserves the allocation
    constraints for deadlock prevention and clears all other context flags
    whilst also ensuring that the nested allocation will fail quickly,
    silently and without depleting emergency kernel reserves if there is no
    memory available.
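
    The shape of such a filter can be sketched in userspace as follows.  This
    is an illustration only: the DEMO_* names and bit values are made up for
    the example, and which bits count as "allocation constraints" is an
    assumption here (I/O, filesystem and reclaim permission plus the lockdep
    opt-out).  The real definition is in the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag bits for illustration; not the kernel's values. */
typedef uint32_t demo_gfp_t;

#define DEMO_ZONE_DMA    0x001u  /* zone selector: caller context only */
#define DEMO_IO          0x002u  /* caller allows I/O during reclaim */
#define DEMO_FS          0x004u  /* caller allows filesystem recursion */
#define DEMO_RECLAIM     0x008u  /* caller allows blocking in reclaim */
#define DEMO_NOLOCKDEP   0x010u  /* caller opted out of lockdep tracking */
#define DEMO_NORETRY     0x020u  /* fail quickly */
#define DEMO_NOMEMALLOC  0x040u  /* do not touch emergency reserves */
#define DEMO_NOWARN      0x080u  /* fail silently */
#define DEMO_ZERO        0x100u  /* unrelated context flag */

/* Constraints inherited from the caller (assumed set). */
#define DEMO_KEEP   (DEMO_IO | DEMO_FS | DEMO_RECLAIM | DEMO_NOLOCKDEP)
/* Behaviour forced onto every nested allocation. */
#define DEMO_FORCE  (DEMO_NORETRY | DEMO_NOMEMALLOC | DEMO_NOWARN)

/* Keep the caller's constraints, drop everything else, force fail-fast. */
static inline demo_gfp_t demo_nested_mask(demo_gfp_t flags)
{
	return (flags & DEMO_KEEP) | DEMO_FORCE;
}

static void demo_nested_mask_check(void)
{
	/* A "no filesystem recursion" caller asking for zeroed DMA memory. */
	demo_gfp_t caller = DEMO_IO | DEMO_RECLAIM | DEMO_NOLOCKDEP |
			    DEMO_ZERO | DEMO_ZONE_DMA;
	demo_gfp_t nested = demo_nested_mask(caller);

	assert(!(nested & DEMO_FS));                      /* constraint kept */
	assert(nested & DEMO_NOLOCKDEP);                  /* opt-out kept */
	assert(!(nested & (DEMO_ZERO | DEMO_ZONE_DMA)));  /* context dropped */
	assert((nested & DEMO_FORCE) == DEMO_FORCE);      /* fail-fast forced */
}
```

    The point of the sketch is that the nested allocation never gains a
    permission the caller did not have, and never inherits flags that only
    make sense for the caller's own object.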
    
    This can be made much more robust, immune to whack-a-mole games and the
    code greatly simplified by lifting gfp_kmemleak_mask() to
    include/linux/gfp.h and using that everywhere.  Also document it so that
    there is no excuse for not knowing about it when writing new debug code
    that nests allocations.
    
    Tested with lockdep, KASAN + page_owner=on and kmemleak=on over multiple
    fstests runs with XFS.
    
    
    This patch (of 3):
    
    Any "internal" nested allocation done from within an allocation context
    needs to obey the high level allocation gfp_mask constraints.  This is
    necessary for debug code like KASAN, kmemleak, lockdep, etc that allocate
    memory for saving stack traces and other information during memory
    allocation.  If they don't obey things like __GFP_NOLOCKDEP or
    __GFP_NOWARN, they produce false positive failure detections.
    
    kmemleak gets this right by using gfp_kmemleak_mask() to pass through the
    relevant context flags to the nested allocation to ensure that the
    allocation follows the constraints of the caller context.
    
    KASAN was recently found to be missing __GFP_NOLOCKDEP due to stack depot
    allocations, and even more recently the page owner tracking code was also
    found to be missing __GFP_NOLOCKDEP support.
    
    We also don't want KASAN or lockdep to drive the system into OOM
    kill territory by exhausting emergency reserves.  This is something that
    kmemleak also gets right by adding (__GFP_NORETRY | __GFP_NOMEMALLOC |
    __GFP_NOWARN) to the allocation mask.
    
    Hence it is clear that we need to define a common nested allocation filter
    mask for these sorts of third party nested allocations used in debug code.
    So to start this process, lift gfp_kmemleak_mask() to gfp.h and rename it
    to gfp_nested_mask(), and convert the kmemleak callers to use it.
    
    Link: https://lkml.kernel.org/r/20240430054604.4169568-1-david@fromorbit.com
    Link: https://lkml.kernel.org/r/20240430054604.4169568-2-david@fromorbit.com
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Marco Elver <elver@google.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>