    mm: add a ptep_modify_prot transaction abstraction · 1ea0704e
    Jeremy Fitzhardinge authored
    This patch adds an API for doing read-modify-write updates to a pte's
    protection bits which may race against hardware updates to the pte.
    After reading the pte, the hardware may asynchronously set the accessed
    or dirty bits on a pte, which would be lost when writing back the
    modified pte value.
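    
    The lost update can be reproduced in a small user-space model. This is
    only a sketch: pte_t is a plain integer here, the bit positions are
    illustrative, and hw_set_dirty() and naive_wrprotect() are hypothetical
    helpers standing in for the MMU and for a careless caller. Neither
    exists in the kernel.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;            /* simplified stand-in for the kernel's pte_t */

#define PTE_PRESENT  (1ULL << 0)   /* illustrative, x86-like bit positions */
#define PTE_RW       (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/* Hypothetical stand-in for the MMU marking a present page dirty. */
static void hw_set_dirty(pte_t *ptep)
{
    if (*ptep & PTE_PRESENT)
        *ptep |= PTE_DIRTY;
}

/*
 * Naive read-modify-write that removes write permission.
 * hw_step models a hardware update that lands between the read and
 * the write-back.
 */
static pte_t naive_wrprotect(pte_t *ptep, void (*hw_step)(pte_t *))
{
    pte_t val;

    assert(ptep && hw_step);    /* the model needs both pointers */

    val = *ptep;                /* read the current entry */
    hw_step(ptep);              /* hardware sets DIRTY in memory */
    val &= ~PTE_RW;             /* modify the now stale copy */
    *ptep = val;                /* write back: the DIRTY bit is overwritten */
    return *ptep;
}
```

    Starting from a present, writable, clean entry, naive_wrprotect()
    returns an entry without the dirty bit even though the modelled
    hardware set it in between.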
    
    The existing technique to handle this race is to use
    ptep_get_and_clear() to atomically fetch the old pte value and clear it
    in memory.  This has the effect of marking the pte as non-present,
    which will prevent the hardware from updating its state.  When the new
    value is written back, the pte will be present again, and the hardware
    can resume updating the access/dirty flags.
    
    When running in a virtualized environment, pagetable updates are
    relatively expensive, since they generally involve some trap into the
    hypervisor.  To mitigate the cost of these updates, we tend to batch
    them.
    
    However, because of the atomic nature of ptep_get_and_clear(), it is
    inherently non-batchable.  This new interface allows batching by
    giving the underlying implementation enough information to open a
    transaction between the read and write phases:
    
    ptep_modify_prot_start() returns the current pte value, and puts the
      pte entry into a state where either the hardware will not update the
      pte, or if it does, the updates will be preserved on commit.
    
    ptep_modify_prot_commit() writes back the updated pte, making sure that
      any hardware updates made since ptep_modify_prot_start() are
      preserved.
    
    ptep_modify_prot_start() and _commit() must be exactly paired, and
    used while holding the appropriate pte lock.  They do not protect
    against other software updates of the pte in any way.
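    
    The second option allowed for ptep_modify_prot_start(), leaving the pte
    present and preserving hardware updates on commit, could look roughly
    like the user-space model below. It is an assumption made for
    illustration, not the implementation in this patch or in any
    hypervisor: the sketch_* functions are hypothetical, the pte is a plain
    atomic integer, and the bit positions are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pteval_t;                 /* simplified: raw pte bits */

#define PTE_PRESENT  (1ULL << 0)           /* illustrative bit layout */
#define PTE_RW       (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/*
 * Hypothetical start: return the current value and leave the entry
 * present, so the (modelled) hardware may keep updating it.
 */
static pteval_t sketch_start_keep_present(_Atomic pteval_t *ptep)
{
    return atomic_load(ptep);
}

/*
 * Hypothetical commit: install newval, but carry over any accessed/dirty
 * bits found in the entry at the moment of the write. The
 * compare-and-swap retries if the entry changes underneath us.
 */
static void sketch_commit_preserve_ad(_Atomic pteval_t *ptep, pteval_t newval)
{
    pteval_t cur = atomic_load(ptep);
    pteval_t merged;

    do {
        merged = newval | (cur & (PTE_ACCESSED | PTE_DIRTY));
    } while (!atomic_compare_exchange_weak(ptep, &cur, merged));
}
```

    In this model a dirty bit set between start and commit survives,
    because commit merges it from memory instead of trusting the value
    read at start.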
    
    The current implementations of ptep_modify_prot_start and _commit are
    functionally unchanged from before: _start() uses ptep_get_and_clear()
    to fetch the pte and zero the entry, preventing any hardware updates.
    _commit() simply writes the new pte value back knowing that the
    hardware has not updated the pte in the meantime.
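    
    As a rough user-space model of that default behaviour (a sketch, not
    the kernel code: the model_* names are hypothetical, the pte is a plain
    atomic integer, the bit positions are illustrative, and locking and TLB
    flushing are left out):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pteval_t;                 /* simplified: raw pte bits */

#define PTE_PRESENT (1ULL << 0)            /* illustrative bit layout */
#define PTE_RW      (1ULL << 1)
#define PTE_DIRTY   (1ULL << 6)

/* Model of ptep_get_and_clear(): atomically fetch the entry and zero it. */
static pteval_t model_get_and_clear(_Atomic pteval_t *ptep)
{
    return atomic_exchange(ptep, 0);
}

/* Open the transaction: the entry becomes non-present. */
static pteval_t model_modify_prot_start(_Atomic pteval_t *ptep)
{
    return model_get_and_clear(ptep);
}

/* Close the transaction: the entry is still clear, so a plain store
 * cannot overwrite a hardware update. */
static void model_modify_prot_commit(_Atomic pteval_t *ptep, pteval_t newval)
{
    assert(atomic_load(ptep) == 0);        /* start and commit must be paired */
    atomic_store(ptep, newval);
}

/* Modelled hardware: sets DIRTY, but only on a present entry. */
static void model_hw_set_dirty(_Atomic pteval_t *ptep)
{
    pteval_t old = atomic_load(ptep);

    while ((old & PTE_PRESENT) &&
           !atomic_compare_exchange_weak(ptep, &old, old | PTE_DIRTY))
        ;
}

/* mprotect-style caller: drop write permission inside a transaction. */
static pteval_t model_wrprotect(_Atomic pteval_t *ptep)
{
    pteval_t val = model_modify_prot_start(ptep);

    model_hw_set_dirty(ptep);              /* no effect: entry is non-present */
    val &= ~PTE_RW;
    model_modify_prot_commit(ptep, val);
    return atomic_load(ptep);
}
```

    A dirty bit that was already set before the transaction is carried in
    the value returned by start and written back by commit; a hardware
    update attempted during the transaction finds a non-present entry and
    does nothing.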
    
    The only current user of this interface is mprotect.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Acked-by: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
pgtable.h 8.68 KB