1. 25 Oct, 2023 34 commits
  2. 18 Oct, 2023 6 commits
    • mm: perform the mapping_map_writable() check after call_mmap() · 15897894
      Lorenzo Stoakes authored
      In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
      clear VM_MAYWRITE, we must be able to invoke the file's f_op->mmap()
      handler to do so.  We would otherwise fail the mapping_map_writable()
      check before we had the opportunity to avoid it.
      
      This patch moves this check after the call_mmap() invocation.  Only memfd
      actively denies write access (in memfd_add_seals()), which is what could
      cause a failure here, so there should be no impact on non-memfd cases.
      
      This patch makes the userland-visible change that MAP_SHARED, PROT_READ
      mappings of an F_SEAL_WRITE sealed memfd will now succeed.
      
      There is a delicate situation with cleanup paths assuming that a writable
      mapping must have occurred in circumstances where it may now not have.  To
      ensure we do not accidentally mark a writable file unwritable, we
      explicitly track whether we have a writable mapping and only call
      mapping_unmap_writable() if we do.
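      The reordering and the tracking flag can be illustrated with a small
      userspace model.  This is a sketch, not the kernel's mmap_region(): struct
      model_mapping, the model_* functions and the hook type are hypothetical
      stand-ins, only the VM_* bit values mirror the kernel's, and the counter
      is not atomic.  The hook runs first and may clear VM_MAYWRITE; only then
      is the writable count taken, and a later failure undoes it only if this
      call actually took it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* VM_* bit values mirror the kernel's; everything else is a model. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical stand-in for struct address_space: a positive count means
 * writable shared mappings exist, a negative one means writes are denied. */
struct model_mapping {
	int i_mmap_writable;
};

/* Hypothetical stand-in for the file's ->mmap() handler. */
typedef int (*model_mmap_hook)(struct model_mapping *m, unsigned long *vm_flags);

static bool is_shared_maywrite(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
	       (VM_SHARED | VM_MAYWRITE);
}

/* Non-atomic model of mapping_map_writable(): increment unless negative. */
static int model_map_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

/* Sketch of the reordered path.  'fail_later' simulates an error that
 * occurs after the writable check (e.g. an allocation failure). */
static int model_mmap_region(struct model_mapping *m, unsigned long vm_flags,
			     model_mmap_hook hook, bool fail_later)
{
	bool writable_file_mapping = false;
	int err = hook(m, &vm_flags);	/* may clear VM_MAYWRITE */

	if (err)
		return err;

	if (is_shared_maywrite(vm_flags)) {
		err = model_map_writable(m);
		if (err)
			return err;
		writable_file_mapping = true;
	}

	if (fail_later) {
		/* Undo only what this call actually took. */
		if (writable_file_mapping) {
			assert(m->i_mmap_writable > 0);
			m->i_mmap_writable--;
		}
		return -ENOMEM;
	}
	return 0;
}

/* Hypothetical memfd-like hook for a file carrying a write seal. */
static int sealed_hook(struct model_mapping *m, unsigned long *vm_flags)
{
	(void)m;
	if ((*vm_flags & VM_SHARED) && (*vm_flags & VM_WRITE))
		return -EPERM;
	if (*vm_flags & VM_SHARED)
		*vm_flags &= ~VM_MAYWRITE;
	return 0;
}

/* Hypothetical hook for an ordinary file: changes nothing. */
static int plain_hook(struct model_mapping *m, unsigned long *vm_flags)
{
	(void)m;
	(void)vm_flags;
	return 0;
}
```

      In this model, running the writable check before the hook (the old
      order) would hit the negative counter of the sealed file and return
      -EPERM for the read-only shared case as well.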
      
      [lstoakes@gmail.com: do not set writable_file_mapping in inappropriate case]
        Link: https://lkml.kernel.org/r/c9eb4cc6-7db4-4c2b-838d-43a0b319a4f0@lucifer.local
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
      Link: https://lkml.kernel.org/r/55e413d20678a1bb4c7cce889062bbb07b0df892.1697116581.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: update memfd seal write check to include F_SEAL_WRITE · 28464bbb
      Lorenzo Stoakes authored
      The seal_check_future_write() function is called by shmem_mmap() or
      hugetlbfs_file_mmap() to disallow any future writable mappings of a memfd
      sealed with F_SEAL_FUTURE_WRITE.
      
      The F_SEAL_WRITE flag is not checked here, as that is handled via the
      mapping->i_mmap_writable mechanism, and so any attempt at a shared mapping
      would fail before this could be run.
      
      However, we intend to change this, meaning this check can then be
      performed for F_SEAL_WRITE mappings as well.
      
      The logic here is equally applicable to both flags, so update this
      function to accommodate both and rename it accordingly.
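      A userspace sketch of a single check covering both seals follows.  The
      SEAL_* constants reuse the values of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE
      under different names so they cannot clash with <fcntl.h>; struct
      model_vma and model_seal_check_write() are hypothetical and only
      approximate the renamed kernel helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same values as F_SEAL_WRITE / F_SEAL_FUTURE_WRITE, renamed for the model. */
#define SEAL_WRITE        0x0008
#define SEAL_FUTURE_WRITE 0x0010

/* VM_* bit values mirror the kernel's. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical minimal VMA: only the flags matter for this check. */
struct model_vma {
	unsigned long vm_flags;
};

/* Sketch of a write-seal check run from a file's mmap handler. */
static int model_seal_check_write(int seals, struct model_vma *vma)
{
	assert(vma != NULL);

	if (!(seals & (SEAL_WRITE | SEAL_FUTURE_WRITE)))
		return 0;

	/* A new shared, writable mapping is refused under either seal. */
	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
		return -EPERM;

	/* A read-only shared mapping is allowed but loses VM_MAYWRITE, so a
	 * later mprotect(PROT_WRITE) cannot make it writable.  Private
	 * mappings keep VM_MAYWRITE: their writes are copy-on-write. */
	if (vma->vm_flags & VM_SHARED)
		vma->vm_flags &= ~VM_MAYWRITE;

	return 0;
}
```

      Treating both flags identically is what lets the same code path serve
      F_SEAL_WRITE once the earlier i_mmap_writable rejection no longer fires
      first.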
      
      Link: https://lkml.kernel.org/r/913628168ce6cce77df7d13a63970bae06a526e0.1697116581.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: drop the assumption that VM_SHARED always implies writable · e8e17ee9
      Lorenzo Stoakes authored
      Patch series "permit write-sealed memfd read-only shared mappings", v4.
      
      The man page for fcntl() describing memfd file seals states the following
      about F_SEAL_WRITE:-
      
          Furthermore, trying to create new shared, writable memory-mappings via
          mmap(2) will also fail with EPERM.
      
      With emphasis on 'writable'.  It turns out, in fact, that the kernel
      currently disallows all new shared memory mappings of a memfd with
      F_SEAL_WRITE applied, rendering this documentation inaccurate.
      
      This matters because users are therefore unable to obtain any shared
      mapping of a memfd after write sealing it, which limits the usefulness of
      write-sealed memfds.
      This was reported in the discussion thread [1] originating from a bug
      report [2].
      
      This is a product of both using the struct address_space->i_mmap_writable
      atomic counter to determine whether writing may be permitted, and the
      kernel adjusting this counter when any VM_SHARED mapping is performed and
      more generally implicitly assuming VM_SHARED implies writable.
      
      It seems sensible that we should only update this counter if VM_MAYWRITE
      is specified, i.e. if it is possible that this mapping could at any
      point be written to.
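      The counter semantics and the two counting rules can be contrasted in a
      non-atomic userspace model.  struct model_mapping and the model_*
      functions are hypothetical; they loosely follow the
      increment-unless-negative and decrement-unless-positive behavior of
      mapping_map_writable() and mapping_deny_writable().  The sketch assumes
      VM_MAYWRITE has already been cleared for the read-only mapping, as the
      seal handling described below would do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* VM_* bit values mirror the kernel's. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Model of address_space->i_mmap_writable: a positive value counts
 * mappings that may write, a negative value means writes are denied. */
struct model_mapping {
	int i_mmap_writable;
};

/* Like mapping_map_writable(): increment unless negative. */
static int model_map_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

/* Like mapping_deny_writable(): decrement unless positive, so a write
 * seal can only be applied while no counted mapping exists. */
static int model_deny_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable > 0)
		return -EBUSY;
	m->i_mmap_writable--;
	return 0;
}

/* Old rule: every VM_SHARED mapping touches the counter. */
static bool counts_old(unsigned long vm_flags)
{
	return (vm_flags & VM_SHARED) != 0;
}

/* Proposed rule: only shared mappings that may become writable. */
static bool counts_new(unsigned long vm_flags)
{
	return (vm_flags & VM_SHARED) && (vm_flags & VM_MAYWRITE);
}

/* Take the writable count only if the chosen rule says so. */
static int model_mmap(struct model_mapping *m, unsigned long vm_flags,
		      bool (*counts)(unsigned long))
{
	return counts(vm_flags) ? model_map_writable(m) : 0;
}
```

      With a write seal in place the counter is negative, so under the old
      rule even a VM_SHARED mapping without VM_MAYWRITE fails, while the new
      rule lets it through and still refuses one that keeps VM_MAYWRITE.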
      
      If we do so then all we need to do to permit write seals to function as
      documented is to clear VM_MAYWRITE when mapping read-only.  It turns out
      this functionality already exists for F_SEAL_FUTURE_WRITE - we can
      therefore simply adapt this logic to do the same for F_SEAL_WRITE.
      
      We then hit a chicken-and-egg situation in mmap_region() where the check
      for VM_MAYWRITE occurs before we are able to clear this flag.  To work
      around this, perform this check after we invoke call_mmap(), with careful
      consideration of error paths.
      
      Thanks to Andy Lutomirski for the suggestion!
      
      [1]: https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
      [2]: https://bugzilla.kernel.org/show_bug.cgi?id=217238
      
      
      This patch (of 3):
      
      There is a general assumption that VMAs with the VM_SHARED flag set are
      writable.  If the VM_MAYWRITE flag is not set, then this is simply not the
      case.
      
      Update those checks which affect the struct address_space->i_mmap_writable
      field to explicitly test for this by introducing
      [vma_]is_shared_maywrite() helper functions.
      
      This remains entirely conservative, as the lack of VM_MAYWRITE guarantees
      that the VMA cannot be written to.
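      A sketch of what such helpers might look like, together with a simplified
      model of the mprotect() permission test that makes the "conservative"
      argument concrete: a VMA lacking VM_MAYWRITE can never gain VM_WRITE.
      The VM_* values mirror the kernel's bit layout (each VM_MAYx is VM_x
      shifted left by four); struct model_vma and model_mprotect() are
      hypothetical, and the helper bodies are an illustration rather than a
      copy of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit layout mirrors the kernel's: VM_MAYx == VM_x << 4. */
#define VM_READ     0x00000001UL
#define VM_WRITE    0x00000002UL
#define VM_EXEC     0x00000004UL
#define VM_SHARED   0x00000008UL
#define VM_MAYREAD  0x00000010UL
#define VM_MAYWRITE 0x00000020UL
#define VM_MAYEXEC  0x00000040UL
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)

/* Hypothetical minimal VMA. */
struct model_vma {
	unsigned long vm_flags;
};

/* True only if the mapping is shared AND could ever be written. */
static inline bool is_shared_maywrite(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
	       (VM_SHARED | VM_MAYWRITE);
}

static inline bool vma_is_shared_maywrite(const struct model_vma *vma)
{
	return is_shared_maywrite(vma->vm_flags);
}

/* Model of the mprotect() permission test: every requested access bit
 * must have its VM_MAYx counterpart, otherwise the change is refused. */
static int model_mprotect(struct model_vma *vma, unsigned long prot_bits)
{
	unsigned long newflags = (vma->vm_flags & ~VM_ACCESS_FLAGS) | prot_bits;

	assert((prot_bits & ~VM_ACCESS_FLAGS) == 0);
	if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS)
		return -EACCES;
	vma->vm_flags = newflags;
	return 0;
}
```

      Because a VMA without VM_MAYWRITE cannot be upgraded to VM_WRITE,
      leaving it out of i_mmap_writable cannot let a write slip past a seal.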
      
      Link: https://lkml.kernel.org/r/cover.1697116581.git.lstoakes@gmail.com
      Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Docs/admin-guide/mm/damon/usage: update for tried regions update time interval · bc17ea26
      SeongJae Park authored
      The documentation says that the DAMOS tried regions update feature of the
      DAMON sysfs interface performs the update for one aggregation interval
      after the request is made.  Since the introduction of the per-scheme
      apply interval, that behavior makes little sense.  The implementation
      has therefore been changed to update the regions for each scheme for
      only its apply interval.  Update the document to reflect the actual
      behavior.
      
      Link: https://lkml.kernel.org/r/20231012192256.33556-4-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/sysfs: avoid empty scheme tried regions for large apply interval · 76126332
      SeongJae Park authored
      DAMON_SYSFS assumes that every scheme will be applied to at least one
      DAMON monitoring results snapshot within one aggregation interval, or
      that it makes no sense to wait for it while DAMON is deactivated by the
      watermarks.  The assumption about the deactivated status still holds,
      but the aggregation interval based assumption is now invalid because
      each scheme can have its own apply interval.  For schemes whose apply
      interval is larger than the aggregation or watermarks check interval, a
      DAMOS tried regions update request can finish without any update being
      made.  Avoid this case by explicitly checking the status of each
      scheme's tried regions update and of watermarks-based DAMON
      deactivation.
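      The completion condition can be sketched as follows.  The enum, struct
      model_scheme and tried_regions_update_done() are hypothetical names, not
      the DAMON_SYSFS implementation; the sketch only captures the rule that a
      request is done once every scheme has finished its own update, or once
      watermarks have deactivated DAMON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-scheme progress of one tried regions update request. */
enum upd_status { UPD_IDLE, UPD_STARTED, UPD_FINISHED };

struct model_scheme {
	enum upd_status status;
};

/* The request is complete when DAMON is deactivated by watermarks (no
 * scheme will be applied, so waiting is pointless) or when every scheme
 * has finished its update, however long its apply interval is. */
static bool tried_regions_update_done(const struct model_scheme *schemes,
				      size_t nr, bool deactivated_by_watermarks)
{
	assert(schemes != NULL || nr == 0);
	if (deactivated_by_watermarks)
		return true;
	for (size_t i = 0; i < nr; i++)
		if (schemes[i].status != UPD_FINISHED)
			return false;
	return true;
}
```

      Unlike a fixed one-aggregation-interval window, a scheme with a long
      apply interval that has not yet been applied (still UPD_IDLE) keeps the
      request open instead of letting it end with no regions.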
      
      Link: https://lkml.kernel.org/r/20231012192256.33556-3-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/sysfs-schemes: do not update tried regions more than one DAMON snapshot · 4d4e41b6
      SeongJae Park authored
      Patch series "mm/damon/sysfs-schemes: Do DAMOS tried regions update for
      only one apply interval".
      
      The DAMOS tried regions update feature of the DAMON sysfs interface
      performs the update for one aggregation interval after the request is
      made.  Now that per-scheme apply intervals are supported, that behavior
      makes little sense: the tried regions directory will contain regions
      from multiple DAMON monitoring results snapshots when the apply interval
      is much shorter than the aggregation interval, or no regions at all when
      it is much longer.  Change the behavior to update the regions for each
      scheme for only its apply interval, and update the document.
      
      Since the DAMOS apply interval is the aggregation interval by default,
      this change makes no visible behavioral difference to existing users who
      don't explicitly set the apply intervals.
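      The default can be sketched as a fallback: assuming, as the paragraph
      states, that an unset apply interval means the aggregation interval, and
      modelling "unset" as zero (an assumption of this sketch), one apply
      interval equals one aggregation interval for users who never set it.
      struct model_ctx, struct model_scheme and effective_apply_interval_us()
      are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subsets of the DAMON context and scheme attributes. */
struct model_ctx {
	uint64_t aggr_interval_us;
};

struct model_scheme {
	uint64_t apply_interval_us;	/* 0: not set by the user (assumed) */
};

/* Effective apply interval: fall back to the aggregation interval. */
static uint64_t effective_apply_interval_us(const struct model_scheme *s,
					    const struct model_ctx *ctx)
{
	assert(ctx->aggr_interval_us > 0);
	return s->apply_interval_us ? s->apply_interval_us
				    : ctx->aggr_interval_us;
}
```

      Limiting the update to one effective apply interval therefore gives the
      same window as before whenever the fallback applies.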
      
      Patches Sequence
      ----------------
      
      The first two patches make schemes whose apply intervals are much
      shorter or longer than the aggregation interval respect, respectively, a
      maximum and a minimum time for continuing the update.  After these two
      patches, the update aligns with each scheme's apply interval.
      
      Finally, the third patch updates the document to reflect the behavior.
      
      
      This patch (of 3):
      
      DAMON_SYSFS exposes every DAMON-found region that is eligible for the
      scheme action during one aggregation interval.  However, each
      DAMON-based operation scheme has its own apply interval.  Hence, for a
      scheme whose apply interval is much smaller than the aggregation
      interval, DAMON_SYSFS will expose regions to which the scheme was
      applied across more than one DAMON monitoring results snapshot.  Since
      the purpose of DAMON tried regions is to expose a single snapshot, this
      makes little sense.  Track the progress of each scheme's tried regions
      update and avoid this case.
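      The per-scheme tracking can be sketched as a three-state tracker: the
      first time a scheme is tried after a request, recording starts and a
      deadline one apply interval later is set; regions are recorded only
      before that deadline, so a scheme applied several times per aggregation
      interval contributes a single snapshot.  struct tried_regions_tracker,
      on_region_tried() and the deadline approach are assumptions for
      illustration, not the DAMON_SYSFS code, and apply_interval_us is assumed
      to be the effective, non-zero interval.

```c
#include <assert.h>
#include <stdint.h>

enum upd_status { UPD_IDLE, UPD_STARTED, UPD_FINISHED };

/* Hypothetical per-scheme state for one tried regions update request. */
struct tried_regions_tracker {
	enum upd_status status;
	uint64_t apply_interval_us;	/* effective interval, must be > 0 */
	uint64_t deadline_us;		/* end of the single recorded interval */
	unsigned int nr_regions;	/* regions exposed so far */
};

/* Called for each region the scheme is tried on at time now_us.  All
 * regions of one apply pass share the same now_us. */
static void on_region_tried(struct tried_regions_tracker *t, uint64_t now_us)
{
	assert(t->apply_interval_us > 0);

	if (t->status == UPD_IDLE) {
		t->status = UPD_STARTED;
		t->deadline_us = now_us + t->apply_interval_us;
	}
	if (t->status == UPD_STARTED && now_us >= t->deadline_us)
		t->status = UPD_FINISHED;
	if (t->status == UPD_STARTED)
		t->nr_regions++;
}
```

      In this model, with an apply interval of 1000 us and three regions per
      pass, five passes within a 5000 us aggregation interval leave only the
      first pass's three regions recorded; a fixed one-aggregation-interval
      window would have recorded all fifteen.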
      
      Link: https://lkml.kernel.org/r/20231012192256.33556-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20231012192256.33556-2-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>