    mm: implement memory-deny-write-execute as a prctl · b507808e
    Joey Gouly authored
    Patch series "mm: In-kernel support for memory-deny-write-execute (MDWE)",
    v2.
    
    The background to this is that systemd has a configuration option called
    MemoryDenyWriteExecute [2], implemented as a SECCOMP BPF filter.  Its aim
    is to prevent a user task from inadvertently creating an executable
    mapping that is (or was) writeable.  Since such a BPF filter is stateless,
    it cannot detect mappings that were previously writeable but subsequently
    changed to read-only.  Therefore the filter simply rejects any
    mprotect(PROT_EXEC).  The side-effect is that on arm64 with BTI support
    (Branch Target Identification), the dynamic loader cannot change an ELF
    section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().  For
    libraries, it can resort to unmapping and re-mapping but for the main
    executable it does not have a file descriptor.  See the original bug
    report in the Red Hat bugzilla [3] and the subsequent glibc workaround for
    libraries [4].
    
    This series adds in-kernel support for this feature as a prctl
    PR_SET_MDWE, that is inherited on fork().  The prctl denies PROT_WRITE |
    PROT_EXEC mappings.  Like the systemd BPF filter it also denies adding
    PROT_EXEC to mappings.  However, unlike the BPF filter, it only denies it
    if the mapping didn't previously have PROT_EXEC.  This allows changing
    PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(), which the BPF filter
    rejects.
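
The rule described above can be modelled as a small predicate. This is only an illustrative sketch of the policy as worded in this cover letter, not the kernel's implementation: the function names are hypothetical, and the PROT_BTI fallback is an assumption that uses the arm64 value so the example also builds on other architectures.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumption: fallback for non-arm64 builds; 0x10 is the arm64 PROT_BTI value. */
#ifndef PROT_BTI
#define PROT_BTI 0x10
#endif

/* Hypothetical model of the mprotect() rule, not the kernel's code. */
static bool mdwe_model_denies_mprotect(int old_prot, int new_prot)
{
	/* Rule 1: never allow writeable and executable at the same time. */
	if ((new_prot & PROT_WRITE) && (new_prot & PROT_EXEC))
		return true;
	/* Rule 2: never add PROT_EXEC to a mapping that lacked it. */
	if (!(old_prot & PROT_EXEC) && (new_prot & PROT_EXEC))
		return true;
	return false;
}

/* A fresh mmap() has no previous protection, so only rule 1 applies. */
static bool mdwe_model_denies_mmap(int prot)
{
	return (prot & PROT_WRITE) && (prot & PROT_EXEC);
}

/* Walks through the cases named in the text. */
static void mdwe_model_selfcheck(void)
{
	assert(mdwe_model_denies_mmap(PROT_READ | PROT_WRITE | PROT_EXEC));
	assert(!mdwe_model_denies_mmap(PROT_READ | PROT_EXEC));
	assert(mdwe_model_denies_mprotect(PROT_READ | PROT_WRITE,
					  PROT_READ | PROT_EXEC));
	assert(!mdwe_model_denies_mprotect(PROT_READ | PROT_EXEC,
					   PROT_READ | PROT_EXEC | PROT_BTI));
}
```

The last assertion is the BTI case: the mapping already has PROT_EXEC, so adding PROT_BTI does not gain execute permission and the model lets it through.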
    
    
    This patch (of 2):
    
    The aim of such a policy is to prevent a user task from creating an
    executable mapping that is also writeable.
    
    An example of mmap() returning -EACCES if the policy is enabled:
    
    	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
    
    Similarly, mprotect() would return -EACCES below:
    
    	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
    	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
    
    The BPF filter that systemd MDWE uses is stateless, and disallows
    mprotect() with PROT_EXEC completely. This new prctl allows mprotect()
    with PROT_EXEC if the mapping was already PROT_EXEC, which allows the
    following case:
    
    	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
    	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
    
    where PROT_BTI enables Branch Target Identification on arm64.
    
    Link: https://lkml.kernel.org/r/20230119160344.54358-1-joey.gouly@arm.com
    Link: https://lkml.kernel.org/r/20230119160344.54358-2-joey.gouly@arm.com
    Signed-off-by: Joey Gouly <joey.gouly@arm.com>
    Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Jeremy Linton <jeremy.linton@arm.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Lennart Poettering <lennart@poettering.net>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: nd <nd@arm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
    Cc: Topi Miettinen <toiwoton@gmail.com>
    Cc: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
    Cc: David Hildenbrand <david@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>