1. 16 Oct, 2023 7 commits
  2. 06 Oct, 2023 13 commits
  3. 04 Oct, 2023 20 commits
    • mm: mlock: update mlock_pte_range to handle large folio · dc68badc
      Yin Fengwei authored
      The current kernel only locks base-size folios during the mlock
      syscall.  Add large folio support with the following rules:
        - Only mlock a large folio when it is in a VM_LOCKED VMA range
          and fully mapped in the page table.

          A fully mapped folio is required because, if the folio is not
          fully mapped to a VM_LOCKED VMA and the system is under memory
          pressure, page reclaim is allowed to pick up this folio, split
          it, and reclaim the pages that are not in the VM_LOCKED VMA.

        - munlock applies to a large folio that is within the VMA range
          or crosses the VMA boundary.

          This is required to handle the case where the large folio is
          mlocked and the VMA is later split in the middle of the folio.
      
      Link: https://lkml.kernel.org/r/20230918073318.1181104-4-fengwei.yin@intel.com
      Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: handle large folio when large folio in VM_LOCKED VMA range · 1acbc3f9
      Yin Fengwei authored
      If a large folio is in the range of a VM_LOCKED VMA, it should be
      mlocked to avoid being picked by page reclaim, which may split the
      large folio and then mlock each page again.

      Mlock this kind of large folio to prevent it from being picked by page
      reclaim.

      For a large folio that crosses the boundary of a VM_LOCKED VMA or is
      not fully mapped to a VM_LOCKED VMA, we had better not mlock it.  Then,
      if the system is under memory pressure, this kind of large folio will
      be split and the pages out of the VM_LOCKED VMA can be reclaimed.

      Ideally, for a large folio, we should mlock it when the large folio is
      fully mapped to the VMA and munlock it if any page is unmapped from the
      VMA.  But it's not easy to detect whether the large folio is fully
      mapped to the VMA in some cases (like add/remove rmap).  So we update
      mlock_vma_folio() and munlock_vma_folio() to mlock/munlock the folio
      according to vma->vm_flags and let the caller decide whether it should
      call these two functions.

      For add rmap, only mlock normal 4K folios and postpone large folio
      handling to the page reclaim phase.  It is possible to reuse the page
      table iterator to detect whether a folio is fully mapped or not during
      the page reclaim phase.  For remove rmap, invoke munlock_vma_folio() to
      munlock the folio unconditionally because removing the rmap makes the
      folio not fully mapped to the VMA.
      
      Link: https://lkml.kernel.org/r/20230918073318.1181104-3-fengwei.yin@intel.com
      Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add functions folio_in_range() and folio_within_vma() · 28e56657
      Yin Fengwei authored
      Patch series "support large folio for mlock", v3.
      
      Yu mentioned at [1] that mlock() can't be applied to large folios.

      I studied the related code and here is my understanding:

      - For RLIMIT_MEMLOCK, there is no problem, because the
        RLIMIT_MEMLOCK statistics are not tied to the underlying pages.  That
        means mlocking or munlocking the underlying pages doesn't affect the
        RLIMIT_MEMLOCK statistics collection, which is always correct.

      - For keeping the page in RAM, there is no problem either.  At least
        during try_to_unmap_one(), once the VMA is detected to have the
        VM_LOCKED bit set in vm_flags, the folio will be kept whether or not
        the folio is mlocked.

      So mlock works correctly for large folios.  But it's not optimized,
      because page reclaim needs to scan these large folios and may split
      them.
      
      This series classifies large folios for mlock into four types:
        - The large folio is in the VM_LOCKED range and fully mapped to the
          range

        - The large folio is in the VM_LOCKED range but not fully mapped to
          the range

        - The large folio crosses the VM_LOCKED VMA boundary

        - The large folio crosses the last level page table boundary

      For the first type, we mlock the large folio so page reclaim will skip
      it.

      For the second/third types, we don't mlock the large folio.  As the
      pages not mapped to the VM_LOCKED range are mapped to a non-VM_LOCKED
      range, if the system is under memory pressure, the large folio can be
      picked by page reclaim and split.  Then the pages not mapped to the
      VM_LOCKED range can be reclaimed.

      For the fourth type, we don't mlock the large folio because holding one
      page table lock can't prevent the part in another last level page table
      from being unmapped.  Thanks to Ryan for pointing this out.
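
      The four types above can be read as one decision function.  The
      following is a user-space sketch only, not kernel code: the struct
      names, fields, and the 512-page reach of a last level page table are
      assumptions made for illustration, and the real series makes this
      decision while walking PTEs under the page table lock.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ 4096UL                /* assumes 4 KiB base pages */
#define PT_SPAN (512UL * PAGE_SZ)     /* assumed reach of one last-level page table */

/* Hypothetical user-space models; these are not kernel types. */
struct vma_model {
	unsigned long vm_start, vm_end; /* covers [vm_start, vm_end) */
	bool locked;                    /* stands in for VM_LOCKED */
};

struct folio_model {
	unsigned long addr;      /* virtual address of the folio's first page */
	unsigned long nr_pages;  /* pages in the folio */
	unsigned long nr_mapped; /* pages currently mapped through this VMA */
};

/*
 * Returns true only for the first type: inside the locked VMA, inside one
 * last-level page table, and fully mapped.
 */
static bool should_mlock_large_folio(const struct folio_model *f,
				     const struct vma_model *v)
{
	unsigned long end;

	assert(f->nr_pages > 0);
	end = f->addr + f->nr_pages * PAGE_SZ;

	if (!v->locked)
		return false;
	if (f->addr < v->vm_start || end > v->vm_end)
		return false;               /* third type: crosses the VMA */
	if (f->addr / PT_SPAN != (end - 1) / PT_SPAN)
		return false;               /* fourth type: crosses a page table */
	return f->nr_mapped == f->nr_pages; /* first type vs. second type */
}
```

      In this model a 16-page folio at the start of a locked VMA is mlocked
      only when all 16 pages are mapped; the same folio straddling the VMA
      end or a page table boundary is left for page reclaim to split.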
      
      
      To check whether the folio is fully mapped to the range, the PTEs need
      to be checked to see whether each page of the folio is associated, which
      requires taking the page table lock and is a heavy operation.  So far,
      the only places that need this check are madvise and page reclaim.
      These functions already have their own PTE iterators.

      patch1 introduces an API to check whether a large folio is in a VMA range.
      patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
             large folio mlock/munlock.
      patch3 makes the mlock/munlock syscalls support large folios.

      Yu also mentioned a race which can make a folio unevictable after
      munlock during the RFC v2 discussion [3].
      We decided that the race issue doesn't block this series because:
        - The race issue was not introduced by this series

        - We have a looks-ok fix for the race issue, which needs to wait
          for the mlock_count fixing patch as Yosry Ahmed suggested [4]
      
      [1] https://lore.kernel.org/linux-mm/CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com/
      [2] https://lore.kernel.org/linux-mm/20230809061105.3369958-1-fengwei.yin@intel.com/
      [3] https://lore.kernel.org/linux-mm/CAOUHufZ6=9P_=CAOQyw0xw-3q707q-1FVV09dBNDC-hpcpj2Pg@mail.gmail.com/
      
      
      This patch (of 3):
      
      folio_in_range() will be used to check whether the folio is mapped to a
      specific VMA and whether the mapping address of the folio is in the range.

      Also add a helper function, folio_within_vma(), to check whether the
      folio is in the range of the VMA, based on folio_in_range().
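
      A range check of this kind can be sketched in user space as below.
      This is an illustration under stated assumptions, not the patch's
      code: the model structs, the *_model function names, and the 4 KiB
      page size are all hypothetical, and the folio's address is derived
      from its page offset relative to the VMA's page offset.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT_MODEL 12 /* assumes 4 KiB pages */

/* Hypothetical user-space models; these are not kernel types. */
struct vma_model {
	unsigned long vm_start, vm_end; /* covers [vm_start, vm_end) */
	unsigned long vm_pgoff;         /* object offset of vm_start, in pages */
};

struct folio_model {
	unsigned long pgoff;    /* object offset of the folio's first page */
	unsigned long nr_pages; /* pages in the folio */
};

/* True if the whole folio maps inside [start, end) clamped to the VMA. */
static bool folio_in_range_model(const struct folio_model *f,
				 const struct vma_model *v,
				 unsigned long start, unsigned long end)
{
	unsigned long vma_pages, addr, size;

	assert(v->vm_start <= v->vm_end);
	if (start > end)
		return false;

	/* Clamp the requested range to the VMA. */
	if (start < v->vm_start)
		start = v->vm_start;
	if (end > v->vm_end)
		end = v->vm_end;

	/* The folio's first page must belong to this VMA at all. */
	vma_pages = (v->vm_end - v->vm_start) >> PAGE_SHIFT_MODEL;
	if (f->pgoff < v->vm_pgoff || f->pgoff - v->vm_pgoff >= vma_pages)
		return false;

	addr = v->vm_start + ((f->pgoff - v->vm_pgoff) << PAGE_SHIFT_MODEL);
	size = f->nr_pages << PAGE_SHIFT_MODEL;
	return addr >= start && addr <= end && end - addr >= size;
}

/* Helper in the spirit of folio_within_vma(): check against the whole VMA. */
static bool folio_within_vma_model(const struct folio_model *f,
				   const struct vma_model *v)
{
	return folio_in_range_model(f, v, v->vm_start, v->vm_end);
}
```

      The helper simply passes the VMA's own bounds as the range, which
      mirrors how the commit describes folio_within_vma() as being built on
      folio_in_range().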
      
      Link: https://lkml.kernel.org/r/20230918073318.1181104-1-fengwei.yin@intel.com
      Link: https://lkml.kernel.org/r/20230918073318.1181104-2-fengwei.yin@intel.com
      Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core-test: fix memory leak in damon_new_ctx() · a0ce7925
      Jinjie Ruan authored
      With CONFIG_DAMON_KUNIT_TEST=y, CONFIG_DEBUG_KMEMLEAK=y, and
      CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the memory leak below is detected.

      The damon_ctx objects allocated by kzalloc() in damon_new_ctx() in
      damon_test_ops_registration() and damon_test_set_attrs() are not freed.
      So use damon_destroy_ctx() to free them.  After applying this patch, the
      following memory leak is no longer detected.
      
          unreferenced object 0xffff2b49c6968800 (size 512):
            comm "kunit_try_catch", pid 350, jiffies 4294895294 (age 557.028s)
            hex dump (first 32 bytes):
              88 13 00 00 00 00 00 00 a0 86 01 00 00 00 00 00  ................
              00 87 93 03 00 00 00 00 0a 00 00 00 00 00 00 00  ................
            backtrace:
              [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368
              [<0000000073acab3b>] __kmem_cache_alloc_node+0x174/0x290
              [<00000000b5f89cef>] kmalloc_trace+0x40/0x164
              [<00000000eb19e83f>] damon_new_ctx+0x28/0xb4
              [<00000000daf6227b>] damon_test_ops_registration+0x34/0x328
              [<00000000559c4801>] kunit_try_run_case+0x50/0xac
              [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c
              [<000000003c3e9211>] kthread+0x124/0x130
              [<0000000028f85bdd>] ret_from_fork+0x10/0x20
          unreferenced object 0xffff2b49c1a9cc00 (size 512):
            comm "kunit_try_catch", pid 356, jiffies 4294895306 (age 557.000s)
            hex dump (first 32 bytes):
              88 13 00 00 00 00 00 00 a0 86 01 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00  ................
            backtrace:
              [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368
              [<0000000073acab3b>] __kmem_cache_alloc_node+0x174/0x290
              [<00000000b5f89cef>] kmalloc_trace+0x40/0x164
              [<00000000eb19e83f>] damon_new_ctx+0x28/0xb4
              [<00000000058495c4>] damon_test_set_attrs+0x30/0x1a8
              [<00000000559c4801>] kunit_try_run_case+0x50/0xac
              [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c
              [<000000003c3e9211>] kthread+0x124/0x130
              [<0000000028f85bdd>] ret_from_fork+0x10/0x20
      
      Link: https://lkml.kernel.org/r/20230918120951.2230468-3-ruanjinjie@huawei.com
      Fixes: d1836a3b ("mm/damon/core-test: initialise context before test in damon_test_set_attrs()")
      Fixes: 4f540f5a ("mm/damon/core-test: add a kunit test case for ops registration")
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Reviewed-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Cc: Brendan Higgins <brendan.higgins@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core-test: fix memory leak in damon_new_region() · f950fa6e
      Jinjie Ruan authored
      Patch series "mm/damon/core-test: Fix memory leaks in core-test", v3.
      
      There are a few memory leaks in core-test which are detected by kmemleak. 
      This patchset fixes the issues.
      
      
      This patch (of 2):
      
      With CONFIG_DAMON_KUNIT_TEST=y, CONFIG_DEBUG_KMEMLEAK=y, and
      CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the memory leak below is detected.

      The damon_region objects allocated by kmem_cache_alloc() in
      damon_new_region() in damon_test_regions() and
      damon_test_update_monitoring_result() are not freed.

      So for damon_test_regions(), replace the damon_del_region() call with
      damon_destroy_region() so that it calls both damon_del_region() and
      damon_free_region(); the latter frees the damon_region.  For
      damon_test_update_monitoring_result(), call damon_free_region() to
      free it.  After applying this patch, the following memory leak is no
      longer detected.
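
      The difference between unlinking a region and destroying it can be
      shown with a small user-space model.  Everything here is hypothetical
      (the *_model names, the singly linked list, and the live_regions
      counter standing in for kmemleak); it only illustrates why a "del"
      without a matching "free" leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space models; these are not the DAMON types. */
struct region_model {
	struct region_model *next;
	unsigned long start, end;
};

struct target_model {
	struct region_model *first;
	unsigned int nr_regions;
};

static int live_regions; /* allocations not yet freed; a leak leaves this > 0 */

static struct region_model *new_region_model(unsigned long start, unsigned long end)
{
	struct region_model *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->next = NULL;
	r->start = start;
	r->end = end;
	live_regions++;
	return r;
}

static void add_region_model(struct region_model *r, struct target_model *t)
{
	r->next = t->first;
	t->first = r;
	t->nr_regions++;
}

/* Unlink only: the memory is still owned by the caller afterwards. */
static void del_region_model(struct region_model *r, struct target_model *t)
{
	struct region_model **pp = &t->first;

	while (*pp && *pp != r)
		pp = &(*pp)->next;
	assert(*pp == r); /* the region must be on this target's list */
	*pp = r->next;
	r->next = NULL;
	t->nr_regions--;
}

static void free_region_model(struct region_model *r)
{
	free(r);
	live_regions--;
}

/* Unlink and free, so nothing is left for the caller to release. */
static void destroy_region_model(struct region_model *r, struct target_model *t)
{
	del_region_model(r, t);
	free_region_model(r);
}
```

      A test that only calls the "del" variant ends with live_regions still
      at one, which is the situation the kmemleak report above describes.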
      
          unreferenced object 0xffff2b49c3edc000 (size 56):
            comm "kunit_try_catch", pid 338, jiffies 4294895280 (age 557.084s)
            hex dump (first 32 bytes):
              01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 00 00 00 00 49 2b ff ff  ............I+..
            backtrace:
              [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368
              [<00000000b528f67c>] kmem_cache_alloc+0x168/0x284
              [<000000008603f022>] damon_new_region+0x28/0x54
              [<00000000a3b8c64e>] damon_test_regions+0x38/0x270
              [<00000000559c4801>] kunit_try_run_case+0x50/0xac
              [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c
              [<000000003c3e9211>] kthread+0x124/0x130
              [<0000000028f85bdd>] ret_from_fork+0x10/0x20
          unreferenced object 0xffff2b49c5b20000 (size 56):
            comm "kunit_try_catch", pid 354, jiffies 4294895304 (age 556.988s)
            hex dump (first 32 bytes):
              03 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 96 00 00 00 49 2b ff ff  ............I+..
            backtrace:
              [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368
              [<00000000b528f67c>] kmem_cache_alloc+0x168/0x284
              [<000000008603f022>] damon_new_region+0x28/0x54
              [<00000000ca019f80>] damon_test_update_monitoring_result+0x18/0x34
              [<00000000559c4801>] kunit_try_run_case+0x50/0xac
              [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c
              [<000000003c3e9211>] kthread+0x124/0x130
              [<0000000028f85bdd>] ret_from_fork+0x10/0x20
      
      Link: https://lkml.kernel.org/r/20230918120951.2230468-1-ruanjinjie@huawei.com
      Link: https://lkml.kernel.org/r/20230918120951.2230468-2-ruanjinjie@huawei.com
      Fixes: 17ccae8b ("mm/damon: add kunit tests")
      Fixes: f4c978b6 ("mm/damon/core-test: add a test for damon_update_monitoring_results()")
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Cc: Brendan Higgins <brendan.higgins@linux.dev>
      Cc: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/writeback: update filemap_dirty_folio() comment · ab428b4c
      Jianguo Bao authored
      Update the comment to refer to the new address space operation
      dirty_folio().
      
      Link: https://lkml.kernel.org/r/20230917-trycontrib1-v1-1-db22630b8839@gmail.com
      Fixes: 6f31a5a2 ("fs: Add aops->dirty_folio")
      Signed-off-by: Jianguo Bau <roidinev@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Docs/ABI/damon: update for DAMOS apply intervals · d57d36b5
      SeongJae Park authored
      Update DAMON ABI document for the newly added DAMON sysfs file for DAMOS
      apply intervals (apply_interval_us file).
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-10-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Docs/admin-guide/mm/damon/usage: update for DAMOS apply intervals · 033343d5
      SeongJae Park authored
      Update DAMON usage document's DAMON sysfs interface section for the newly
      added DAMOS apply intervals support (apply_interval_us file).
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-9-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/damon/sysfs: test DAMOS apply intervals · 65ded14e
      SeongJae Park authored
      Update DAMON selftests to test existence of the file for reading/writing
      DAMOS apply interval under each scheme directory.
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-8-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/sysfs-schemes: support DAMOS apply interval · a2a9f68e
      SeongJae Park authored
      Update DAMON sysfs interface to support DAMOS apply intervals by adding a
      new file, 'apply_interval_us' in each scheme directory.  Users can set and
      get the interval for each scheme in microseconds by writing to and reading
      from the file.
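
      Setting the interval from a program could look like the sketch below.
      The commit only names the file and says it lives in each scheme
      directory; the full path layout used here
      (/sys/kernel/mm/damon/admin/kdamonds/<N>/contexts/<N>/schemes/<N>/)
      is an assumption based on the DAMON sysfs hierarchy, and both helper
      names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the path of a scheme's apply_interval_us file.
 * The directory layout is an assumption; only the file name comes from
 * the commit message.  Returns 0 on success, -1 if the buffer is too small.
 */
static int damos_apply_interval_path(char *buf, size_t len,
				     unsigned int kdamond,
				     unsigned int context,
				     unsigned int scheme)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len,
		     "/sys/kernel/mm/damon/admin/kdamonds/%u/contexts/%u/"
		     "schemes/%u/apply_interval_us",
		     kdamond, context, scheme);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an interval in microseconds to the given file; 0 on success. */
static int write_interval_us(const char *path, unsigned long interval_us)
{
	FILE *fp = fopen(path, "w");
	int ok;

	if (!fp)
		return -1;
	ok = fprintf(fp, "%lu\n", interval_us) > 0;
	if (fclose(fp) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

      Writing to the real file requires a kernel with this patch, DAMON
      sysfs enabled, and sufficient privileges; otherwise
      write_interval_us() simply reports failure.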
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-7-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Docs/mm/damon/design: document DAMOS apply intervals · 3f8723f1
      SeongJae Park authored
      Update the DAMON design doc to explain DAMOS apply intervals.
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-6-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core: implement scheme-specific apply interval · 42f994b7
      SeongJae Park authored
      DAMON-based operation schemes are applied for every aggregation interval.
      That was mainly because schemes were using nr_accesses, which becomes
      complete and usable only at every aggregation interval.  However, the
      schemes are now using nr_accesses_bp, which is updated for each sampling
      interval in a way that is reasonable to use.  Therefore, there is no
      reason to apply schemes only for each aggregation interval.
      
      The unnecessary alignment with the aggregation interval was also making
      some use cases of DAMOS tricky.  Quota setting under a long aggregation
      interval is one such example.  Suppose the aggregation interval is ten
      seconds, and there is a scheme having a CPU quota of 100ms per 1s.  The
      scheme will actually use 100ms per ten seconds, since it cannot be
      applied before the next aggregation interval.  The feature is working as
      intended, but the results might not be that intuitive for some users.
      This could be fixed by updating the quota to 1s per 10s.  But, in that
      case, the CPU usage of DAMOS could look like spikes, and would actually
      have a bad effect on other CPU-sensitive workloads.
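
      The arithmetic of the quota example can be checked with a short
      sketch.  It is a simplified model, not DAMOS code: the function name
      is hypothetical, and it assumes a scheme spends its whole quota once
      per application with no unused quota carried over.

```c
#include <assert.h>

/*
 * Average CPU time (in ms) that a scheme uses per second of wall time.
 * Simplifying assumption of this sketch: the quota is spent at most once
 * per application, so the effective period is the longer of the quota
 * window and the interval between applications.
 */
static unsigned long avg_cpu_ms_per_sec(unsigned long quota_ms,
					unsigned long quota_window_ms,
					unsigned long apply_every_ms)
{
	unsigned long period_ms;

	assert(quota_window_ms > 0 && apply_every_ms > 0);
	period_ms = apply_every_ms > quota_window_ms ? apply_every_ms
						     : quota_window_ms;
	return quota_ms * 1000UL / period_ms;
}
```

      With a 100ms-per-1s quota applied every ten seconds the model gives
      10 ms/s instead of the intended 100 ms/s.  Raising the quota to 1s per
      10s restores the 100 ms/s average, but as a single one-second burst
      every ten seconds, which is the spike the message warns about.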
      
      Implement a dedicated timing interval for each DAMON-based operation
      scheme, namely apply_interval.  The interval will be sampling interval
      aligned, and each scheme will be applied for its apply_interval.  The
      interval is set to 0 by default, and it means the scheme should use the
      aggregation interval instead.  This avoids old users getting any
      behavioral difference.
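
      The default-to-aggregation rule and the sampling-interval alignment
      can be sketched as follows.  The function names are hypothetical, and
      rounding down with a minimum of one sampling interval is a choice made
      for this sketch, not something the commit message specifies.

```c
#include <assert.h>

/* An apply interval of 0 means "use the aggregation interval instead". */
static unsigned long effective_apply_interval_us(unsigned long apply_interval_us,
						 unsigned long aggr_interval_us)
{
	return apply_interval_us ? apply_interval_us : aggr_interval_us;
}

/*
 * Express the effective interval as a whole number of sampling intervals.
 * Rounding down and clamping to at least 1 are assumptions of this sketch.
 */
static unsigned long apply_period_in_samples(unsigned long apply_interval_us,
					     unsigned long aggr_interval_us,
					     unsigned long sample_interval_us)
{
	unsigned long n;

	assert(sample_interval_us > 0);
	n = effective_apply_interval_us(apply_interval_us, aggr_interval_us) /
	    sample_interval_us;
	return n ? n : 1;
}
```

      With a 5ms sampling interval and a 100ms aggregation interval, an
      unset (zero) apply interval yields 20 sampling intervals per
      application, which matches the old behavior of applying once per
      aggregation interval.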
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-5-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core: use nr_accesses_bp as a source of damos_before_apply tracepoint · a72217ad
      SeongJae Park authored
      damos_before_apply tracepoint is exposing access rate of DAMON regions
      using nr_accesses field of regions, which was actually used by DAMOS in
      the past.  However, it has changed to use nr_accesses_bp instead.  Update
      the tracepoint to expose the value that DAMOS is really using.
      
      Note that it doesn't expose the value as is in the basis point, but after
      converting it to the natural number by dividing it by 10,000.  Therefore
      this change doesn't make user-visible behavioral differences.
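
      The conversion described above is a plain division by 10,000.  The
      sketch below uses hypothetical helper names, and the truncating
      integer division for values between two whole counts is an assumption
      of this sketch.

```c
#include <assert.h>
#include <limits.h>

#define BP_PER_ACCESS 10000U /* basis points that represent one counted access */

/* Scale a whole access count up to basis points. */
static unsigned int nr_accesses_to_bp(unsigned int nr_accesses)
{
	assert(nr_accesses <= UINT_MAX / BP_PER_ACCESS); /* avoid overflow */
	return nr_accesses * BP_PER_ACCESS;
}

/* Convert basis points back to the natural number exposed to users. */
static unsigned int bp_to_nr_accesses(unsigned int nr_accesses_bp)
{
	return nr_accesses_bp / BP_PER_ACCESS;
}
```

      At aggregation boundaries, where the basis-point value is an exact
      multiple of 10,000, the round trip returns the original count, which
      is why the exposed number does not change for users.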
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-4-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/sysfs-schemes: use nr_accesses_bp as the source of tried_regions/<N>/nr_accesses · e7639bb4
      SeongJae Park authored
      DAMON sysfs interface exposes access rate of each region via DAMOS tried
      regions directory.  For this, the nr_accesses field of the region is used.
      DAMOS was actually using nr_accesses in the past, but it uses
      nr_accesses_bp now.  Use the value that it is really using as the source.
      
      Note that this doesn't expose nr_accesses_bp as is (in basis point), but
      after converting it to the natural number by dividing the value by 10,000.
      Hence there is no behavioral change from users' perspective.
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-3-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core: make DAMOS uses nr_accesses_bp instead of nr_accesses · affa87c7
      SeongJae Park authored
      Patch series "mm/damon: implement DAMOS apply intervals".
      
      DAMON-based operation schemes are applied for every aggregation interval.
      That is mainly because schemes are using nr_accesses, which becomes
      complete and usable only at every aggregation interval.

      This makes some DAMOS use cases tricky.  Quota setting under a long
      aggregation interval is one such example.  Suppose the aggregation
      interval is ten seconds, and there is a scheme having a CPU quota of
      100ms per 1s.  The scheme will actually use 100ms per ten seconds, since
      it cannot be applied before the next aggregation interval.  The feature
      is working as intended, but the results might not be that intuitive for
      some users.  This could be fixed by updating the quota to 1s per 10s.
      But, in that case, the CPU usage of DAMOS could look like spikes, and
      actually have a bad effect on other CPU-sensitive workloads.

      Also, with such a huge aggregation interval, users may want schemes to
      be applied more frequently.

      DAMON provides nr_accesses_bp, which is updated for each sampling interval
      in a way that is reasonable to use.  By using that instead of
      nr_accesses, DAMOS can have its own time interval and mitigate the
      above-mentioned issues.

      This patchset makes DAMOS schemes use nr_accesses_bp instead of
      nr_accesses, and have their own timing intervals.  It also updates the
      DAMOS tried regions sysfs files and the DAMOS before_apply tracepoint to
      use the new data as their source.  Note that the interval is zero by
      default, and it is interpreted to use the aggregation interval instead.
      This avoids making user-visible behavioral changes.
      
      
      Patches Sequence
      ----------------

      The first patch (patch 1/9) makes DAMOS use nr_accesses_bp instead of
      nr_accesses, and the following two patches (patches 2/9 and 3/9) update
      the DAMON sysfs interface for DAMOS tried regions and the DAMOS
      before_apply tracepoint to use nr_accesses_bp instead of nr_accesses,
      respectively.

      The following two patches (patches 4/9 and 5/9) implement the
      scheme-specific apply interval for DAMON kernel API users and update the
      design document for the new feature.

      Finally, the following four patches (patches 6/9, 7/9, 8/9 and 9/9) add
      support of the feature in the DAMON sysfs interface, add a simple
      selftest test case, and document the new file in the usage and the ABI
      documents, respectively.
      
      
      This patch (of 9):
      
      DAMON provides nr_accesses_bp, which becomes the same as
      nr_accesses * 10000 for every aggregation interval, but is updated every
      sampling interval with a reasonable accuracy.  Since DAMON-based
      operation schemes are applied in every aggregation interval using
      nr_accesses, using nr_accesses_bp instead will make no difference to
      users.  Meanwhile, it allows DAMOS to apply the schemes in a time
      interval that is less than the aggregation interval.  It could be useful
      and more flexible for some cases.  Do it.
      
      Link: https://lkml.kernel.org/r/20230916020945.47296-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20230916020945.47296-2-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: convert remove_pool_huge_page() to remove_pool_hugetlb_folio() · d5b43e96
      Matthew Wilcox (Oracle) authored
      Convert the callers to expect a folio and remove the unnecessary
      conversion back to a struct page.
      
      Link: https://lkml.kernel.org/r/20230824141325.2704553-4-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: remove a few calls to page_folio() · 04bbfd84
      Matthew Wilcox (Oracle) authored
      Anything found on a linked list threaded through ->lru is guaranteed to be
      a folio as the compound_head found in a tail page overlaps the ->lru
      member of struct page.  So we can pull folios directly off these lists no
      matter whether pages or folios were added to the list.
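
      The overlap argument can be illustrated with a toy layout.  This is a
      user-space sketch, not the kernel's struct page: the *_model types and
      helpers are hypothetical, and it relies on the kernel convention that
      a tail page stores a pointer to its head with bit 0 set in the word
      that otherwise holds the first ->lru pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space models; these are not kernel types. */
struct list_head_model {
	struct list_head_model *next, *prev;
};

struct page_model {
	unsigned long flags;
	union {
		struct list_head_model lru; /* head or order-0 page: list linkage */
		uintptr_t compound_head;    /* tail page: head pointer with bit 0 set */
	};
};

/* Mark a page as a tail of the given head by storing a tagged pointer. */
static void mark_tail(struct page_model *tail, const struct page_model *head)
{
	assert(((uintptr_t)head & 1) == 0); /* head pointers are aligned */
	tail->compound_head = (uintptr_t)head | 1;
}

/* A page is a tail if the shared word has bit 0 set. */
static bool is_tail(const struct page_model *p)
{
	return (p->compound_head & 1) != 0;
}
```

      Because the shared word of a tail page is occupied by the tagged head
      pointer, a tail page cannot also be linked into a list through lru,
      so anything taken off such a list is a head or order-0 page and can
      be treated as a folio.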
      
      Link: https://lkml.kernel.org/r/20230824141325.2704553-3-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: use a folio in free_hpage_workfn() · 3ec145f9
      Matthew Wilcox (Oracle) authored
      Patch series "Small hugetlb cleanups", v2.
      
      Some trivial folio conversions
      
      
      This patch (of 3):
      
      update_and_free_hugetlb_folio puts the memory on hpage_freelist as a folio
      so we can take it off the list as a folio.
      
      Link: https://lkml.kernel.org/r/20230824141325.2704553-1-willy@infradead.org
      Link: https://lkml.kernel.org/r/20230824141325.2704553-2-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO · fde1c4ec
      Usama Arif authored
      The new boot flow when it comes to initialization of gigantic pages is as
      follows:
      
      - At boot time, for a gigantic page during __alloc_bootmem_hugepage, the
        region after the first struct page is marked as noinit.
      
      - This results in only the first struct page being initialized in
        reserve_bootmem_region.  As the tail struct pages are not initialized at
        this point, there can be a significant saving in boot time if HVO
        succeeds later on.

      - Later on in the boot, the head page is prepped and the first
        HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
        are initialized.

      - HVO is attempted.  If it is not successful, then the rest of the tail
        struct pages are initialized.  If it is successful, no more tail struct
        pages need to be initialized, saving significant boot time.
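
      To get a feel for the numbers in the flow above, the counts can be
      worked out with a small sketch.  The helper and struct names are
      hypothetical, and the concrete figures in the usage note (1 GiB
      gigantic page, 4 KiB base pages, 64-byte struct page, a reserve size
      of one base page) are assumptions, not values stated in the commit.

```c
#include <assert.h>

/* Hypothetical result type for this sketch. */
struct hvo_counts {
	unsigned long total;          /* struct pages covering the gigantic page */
	unsigned long early_tails;    /* tails initialized before HVO is attempted */
	unsigned long deferred_tails; /* tails initialized only if HVO fails */
};

static struct hvo_counts gigantic_struct_pages(unsigned long huge_bytes,
					       unsigned long page_bytes,
					       unsigned long struct_page_bytes,
					       unsigned long reserve_bytes)
{
	struct hvo_counts c;

	assert(page_bytes > 0 && struct_page_bytes > 0);
	assert(reserve_bytes >= struct_page_bytes);

	c.total = huge_bytes / page_bytes;
	/* reserve size / sizeof(struct page) - 1, as in the message above */
	c.early_tails = reserve_bytes / struct_page_bytes - 1;
	assert(c.total > c.early_tails);
	c.deferred_tails = c.total - 1 - c.early_tails; /* minus the head page */
	return c;
}
```

      Under those assumed sizes, a 1 GiB page is described by 262144 struct
      pages, 63 tails are initialized early, and the remaining 262080 are
      skipped entirely when HVO succeeds.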
      
      The WARN_ON for increased ref count in gather_bootmem_prealloc was changed
      to a VM_BUG_ON.  This is OK as there should be no speculative references
      this early in boot process.  The VM_BUG_ON's are there just in case such
      code is introduced.
      
      [akpm@linux-foundation.org: make it nicer for 80 cols]
      Link: https://lkml.kernel.org/r/20230913105401.519709-5-usama.arif@bytedance.com
      Signed-off-by: Usama Arif <usama.arif@bytedance.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Punit Agrawal <punit.agrawal@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • memblock: introduce MEMBLOCK_RSRV_NOINIT flag · 77e6c43e
      Usama Arif authored
      For reserved memory regions marked with this flag, reserve_bootmem_region
      is not called during memmap_init_reserved_pages.  This can be used to
      avoid struct page initialization for regions which won't need it, for
      example hugepages with Hugepage Vmemmap Optimization enabled.
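
      The effect of the flag can be pictured as a filter in the loop that
      walks reserved regions.  The sketch below is a user-space model with
      hypothetical names (RSRV_NOINIT_MODEL, rsrv_region_model,
      bytes_to_init); it only counts how much reserved memory would still
      have its struct pages initialized.

```c
#include <assert.h>
#include <stddef.h>

#define RSRV_NOINIT_MODEL 0x1u /* hypothetical stand-in for MEMBLOCK_RSRV_NOINIT */

/* Hypothetical user-space model of one reserved region. */
struct rsrv_region_model {
	unsigned long base;
	unsigned long size;
	unsigned int flags;
};

/*
 * Sum the sizes of reserved regions whose struct pages would be
 * initialized, skipping regions marked with the no-init flag.
 */
static unsigned long bytes_to_init(const struct rsrv_region_model *regions,
				   size_t nr_regions)
{
	unsigned long total = 0;
	size_t i;

	assert(regions != NULL || nr_regions == 0);
	for (i = 0; i < nr_regions; i++) {
		if (regions[i].flags & RSRV_NOINIT_MODEL)
			continue; /* caller promises to initialize, or not need, these */
		total += regions[i].size;
	}
	return total;
}
```

      A large flagged region, such as one backing a gigantic page, then
      contributes nothing to the early initialization work.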
      
      Link: https://lkml.kernel.org/r/20230913105401.519709-4-usama.arif@bytedance.com
      Signed-off-by: Usama Arif <usama.arif@bytedance.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Punit Agrawal <punit.agrawal@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>