    arm64/mm: Hoist synchronization out of set_ptes() loop · 3425cec4
    Ryan Roberts authored
    set_ptes() maps a physically contiguous block of memory (all of which
    belongs to the same folio) to a contiguous block of ptes. The arm64
    implementation previously just looped, operating on each pte
    individually. But the __sync_icache_dcache() and mte_sync_tags()
    operations can both be hoisted out of the loop so that they are
    performed once for the contiguous set of pages (which may be smaller
    than the whole folio). This should result in minor performance gains.
    
    __sync_icache_dcache() already acts on the whole folio, and sets a flag
    in the folio so that it skips duplicate calls. But by hoisting the call,
    all the pte testing is done only once.
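
    As a rough illustration of the hoisting described above, the following
    user-space sketch models the before/after shape of the loop. All names
    (toy_folio, toy_pte, toy_sync_icache_dcache(), toy_set_ptes_*()) are
    hypothetical stand-ins, not kernel types or the actual patch; the
    "synced" field only imitates the idea of a per-folio flag that makes
    repeat calls cheap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not kernel definitions. */
struct toy_folio {
    bool synced;   /* imitates the "already synced" flag kept in the folio */
    int flushes;   /* how many times the expensive sync actually ran */
};

struct toy_pte {
    unsigned long pfn;
    bool valid;
    bool user_exec;
};

static int pte_tests;  /* how many times a pte was inspected */

/* Whole-folio sync: inspects the pte, then skips work if already done. */
static void toy_sync_icache_dcache(struct toy_pte pte, struct toy_folio *folio)
{
    pte_tests++;
    if (!pte.valid || !pte.user_exec)
        return;
    if (!folio->synced) {
        folio->flushes++;
        folio->synced = true;
    }
}

/* Old shape: the sync (and its pte tests) runs once per entry. */
static void toy_set_ptes_looped(struct toy_pte *ptep, struct toy_pte pte,
                                size_t nr, struct toy_folio *folio)
{
    for (size_t i = 0; i < nr; i++) {
        toy_sync_icache_dcache(pte, folio);
        ptep[i] = pte;
        pte.pfn++;  /* next page of the physically contiguous block */
    }
}

/* New shape: the sync is hoisted and runs once for the whole range. */
static void toy_set_ptes_hoisted(struct toy_pte *ptep, struct toy_pte pte,
                                 size_t nr, struct toy_folio *folio)
{
    toy_sync_icache_dcache(pte, folio);
    for (size_t i = 0; i < nr; i++) {
        ptep[i] = pte;
        pte.pfn++;
    }
}

/* Usage: both shapes flush once, but the hoisted one tests the pte once. */
static void toy_demo(void)
{
    struct toy_pte table[16];
    struct toy_pte pte = { .pfn = 100, .valid = true, .user_exec = true };
    struct toy_folio a = { 0 }, b = { 0 };

    pte_tests = 0;
    toy_set_ptes_looped(table, pte, 16, &a);
    assert(pte_tests == 16 && a.flushes == 1);

    pte_tests = 0;
    toy_set_ptes_hoisted(table, pte, 16, &b);
    assert(pte_tests == 1 && b.flushes == 1);
    assert(table[0].pfn == 100 && table[15].pfn == 115);
}
```

    In this model the number of flushes is the same either way, because
    the flag already suppresses duplicates; the saving is only the
    repeated pte inspection, which matches the "minor performance gains"
    expectation.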
    
    mte_sync_tags() operates on each individual page with its own loop. But
    by passing the number of pages explicitly, we can rely solely on its
    loop and do the checks only once. This approach is also more robust for
    the future: rather than assuming that if the head page of a compound
    page is being mapped then the whole compound page is being mapped, we
    explicitly know how many pages are being mapped. The old assumption may
    not continue to hold once the "anonymous large folios" feature is
    merged.
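
    The explicit page count can be pictured with the following user-space
    sketch. It is an assumption-laden model, not the kernel function:
    toy_page, toy_pte_wants_tags() and toy_sync_tags() are hypothetical
    names, and the "tagged" field merely stands in for whatever per-page
    state the real code tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not a kernel definition. */
struct toy_page {
    bool tagged;  /* imitates a per-page "tags initialised" marker */
};

static int tag_checks;  /* how many times the pte precondition was tested */

/* Hypothetical precondition: does this mapping want tags at all? */
static bool toy_pte_wants_tags(bool pte_tagged)
{
    tag_checks++;
    return pte_tagged;
}

/*
 * Sketch: the caller states exactly how many pages are being mapped, so
 * the precondition is tested once and only those pages are touched.
 * Returns the number of pages newly initialised.
 */
static size_t toy_sync_tags(bool pte_tagged, struct toy_page *page,
                            size_t nr_pages)
{
    size_t initialised = 0;

    if (!toy_pte_wants_tags(pte_tagged))
        return 0;

    for (size_t i = 0; i < nr_pages; i++) {
        if (!page[i].tagged) {
            page[i].tagged = true;
            initialised++;
        }
    }
    return initialised;
}

/* Usage: map only the first 3 pages of an 8-page folio. */
static void toy_tags_demo(void)
{
    struct toy_page folio[8] = { 0 };

    tag_checks = 0;
    assert(toy_sync_tags(true, &folio[0], 3) == 3);
    assert(tag_checks == 1);
    /* Pages beyond the mapped range are left alone. */
    assert(folio[2].tagged && !folio[3].tagged && !folio[7].tagged);
}
```

    The point of the model is the last assertion: mapping the head page
    does not imply touching the remaining pages of the folio.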

    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Reviewed-by: Steven Price <steven.price@arm.com>
    Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
mte.c 14.9 KB