    mm: madvise: MADV_DONTNEED_LOCKED · 9457056a
    Johannes Weiner authored
    MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT
    and MCL_ONFAULT allowing memory to be mlocked without being populated,
    there are valid use cases for depopulating locked ranges as well.
    
    Users mlock memory to protect secrets.  There are allocators for secure
    buffers that want in-use memory generally mlocked, but cleared and
    invalidated memory to give up the physical pages.  This could be done with
    explicit munlock -> mlock calls on free -> alloc of course, but that adds
    two unnecessary syscalls, heavy mmap_sem write locks, vma splits and
    re-merges - only to get rid of the backing pages.
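
    As a rough illustration of the allocator case above, a free path could
    wipe the buffer and drop its pages in one madvise call while the range
    stays mlocked. This is a minimal sketch, not code from this patch:
    secure_buf_release() is a hypothetical helper name, the buffer is assumed
    to be page-aligned, and the fallback value 24 is the one the uapi headers
    use for the flag, supplied only in case the libc headers predate it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for explicit_bzero() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Older libc headers may not know the flag yet (assumed fallback value). */
#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24
#endif

/*
 * Hypothetical helper for a secure-buffer allocator's free path.
 *
 * Clears a page-aligned buffer and gives its physical pages back to the
 * kernel. The mapping and its mlock state are left untouched, so no
 * munlock/mlock round trip and no vma split is needed.
 *
 * Returns 0 on success, -1 with errno set on failure. Kernels that do not
 * implement MADV_DONTNEED_LOCKED are expected to fail with EINVAL.
 */
static int secure_buf_release(void *buf, size_t len)
{
	/* Wipe the secret first; madvise only drops the backing pages. */
	explicit_bzero(buf, len);

	return madvise(buf, len, MADV_DONTNEED_LOCKED);
}
```

    The next access to the range faults in fresh zero pages, which become
    locked again because the vma is still mlocked. A caller that must run on
    older kernels could fall back to munlock + MADV_DONTNEED + mlock when
    the helper fails with EINVAL.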
    
    Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
    okay with on-demand initial population.  It seems valid to selectively
    free some memory during the lifetime of such a process, without having to
    mess with its overall policy.
    
    Why add a separate flag? Isn't this a pretty niche use case?
    
    - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
      conceivable that someone, somewhere is relying on mlock to protect
      data from perhaps broader invalidation calls. Changing this behavior
      now could lead to quiet data corruption.
    
    - It also clarifies expectations around MADV_FREE and maybe
      MADV_REMOVE. It avoids the situation where one quietly behaves
      differently from the others. MADV_FREE_LOCKED can be added later.
    
    - The combination of mlock() and madvise() in the first place is
      probably niche. But where it happens, I'd say that dropping pages
      from a locked region once they don't contain secrets or won't page
      anymore is much saner than relying on mlock to protect memory from
      speculative or errant invalidation calls. It's just that we can't
      change the default behavior because of the two previous points.
    
    Given that, an explicit new flag seems to make the most sense.
    
    [hannes@cmpxchg.org: fix mips build]
    
    Link: https://lkml.kernel.org/r/20220304171912.305060-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>