mm/hugetlb_vmemmap: batch HVO work when demoting
Batch the HVO work, including de-HVO of the source and HVO of the destination hugeTLB folios, to speed up demotion. After commit bd225530 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers"), each request of HVO or de-HVO, batched or not, invokes synchronize_rcu() once. For example, when not batched, demoting one 1GB hugeTLB folio to 512 2MB hugeTLB folios invokes synchronize_rcu() 513 times (1 de-HVO plus 512 HVO requests), whereas when batched, only twice (1 de-HVO plus 1 HVO request). And the performance difference between the two cases is significant, e.g., echo 2048kB >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote_size time echo 100 >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote Before this patch: real 8m58.158s user 0m0.009s sys 0m5.900s After this patch: real 0m0.900s user 0m0.000s sys 0m0.851s Note that this patch changes the behavior of the `demote` interface when de-HVO fails. Before, the interface aborts immediately upon failure; now, it tries to finish an entire batch, meaning it can make extra progress if the rest of the batch contains folios that do not need to de-HVO. Link: https://lkml.kernel.org/r/20240812224823.3914837-1-yuzhao@google.com Fixes: bd225530 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") Signed-off-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Showing
Please register or sign in to comment