    x86_64, entry: Use sysret to return to userspace when possible · 2a23c6b8
    Andy Lutomirski authored
    The x86_64 entry code currently jumps through complex and
    inconsistent hoops to try to minimize the impact of syscall exit
    work.  For a true fast-path syscall, almost nothing needs to be
    done, so returning is just a check for exit work and sysret.  For a
    full slow-path return from a syscall, the C exit hook is invoked if
    needed and we join the iret path.
    
    Using iret to return to userspace is very slow, so the entry code
    has accumulated various special cases to try to do certain forms of
    exit work without invoking iret.  This is error-prone, since it
    duplicates assembly code paths, and it's dangerous, since sysret
    can malfunction in interesting ways if used carelessly.  It's
    also inefficient, since a lot of useful cases aren't optimized
    and therefore force an iret out of a combination of paranoia and
    the fact that no one has bothered to write even more asm code
    to avoid it.
    
    I would argue that this approach is backwards.  Rather than trying
    to avoid the iret path, we should instead try to make the iret path
    fast.  Under a specific set of conditions, iret is unnecessary.  In
    particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not
    set, and both SS and CS are as expected, then
    movq 32(%rsp),%rsp;sysret does the same thing as iret.  This set of
    conditions is nearly always satisfied on return from syscalls, and
    it can even occasionally be satisfied on return from an irq.
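
The conditions above can be written out as a single predicate. What follows is only an illustrative C sketch, not the code from this patch: the real check lives in the assembly in entry_64.S. The struct saved_frame layout, the function names, and the ASSUMED_* selector values are hypothetical stand-ins, and the canonical-address test assumes 48-bit virtual addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the saved user state; not the kernel's pt_regs. */
struct saved_frame {
    uint64_t rip, cs, rflags, rsp, ss; /* values iret would restore */
    uint64_t rcx, r11;                 /* values sysret would consume */
};

/* Assumed selector values for 64-bit userspace; illustrative only. */
#define ASSUMED_USER_CS 0x33u
#define ASSUMED_USER_SS 0x2bu

/* RF (resume flag) is bit 16 of RFLAGS. */
#define RFLAGS_RF (UINT64_C(1) << 16)

/* Canonical under 48-bit addressing: bits 63..47 are all 0 or all 1. */
static bool rip_is_canonical(uint64_t rip)
{
    uint64_t top = rip >> 47; /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}

/* True when sysret would leave userspace in the same state as iret. */
static bool sysret_matches_iret(const struct saved_frame *f)
{
    return f->rip == f->rcx              /* sysret loads RIP from RCX */
        && f->rflags == f->r11           /* sysret loads RFLAGS from R11 */
        && rip_is_canonical(f->rip)      /* avoid a fault on sysret */
        && !(f->rflags & RFLAGS_RF)      /* RF must not be set */
        && f->cs == ASSUMED_USER_CS      /* CS as expected */
        && f->ss == ASSUMED_USER_SS;     /* SS as expected */
}
```

A frame that fails any one of these tests would simply stay on the ordinary iret path, so the check only ever selects the faster return when it is equivalent.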
    
    Even with the careful checks for sysret applicability, this cuts
    nearly 80ns off of the overhead from syscalls with unoptimized exit
    work.  This includes tracing and context tracking, and any return
    that invokes KVM's user return notifier.  For example, the cost of
    getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to
    ~280ns on my computer.
    
    This may allow the removal and even eventual conversion to C
    of a respectable amount of exit asm.
    
    This may require further tweaking to give the full benefit on Xen.
    
    It may be worthwhile to adjust signal delivery and exec to try to
    hit the sysret path.
    
    This does not optimize returns to 32-bit userspace.  Making the same
    optimization for CS == __USER32_CS is conceptually straightforward,
    but it will require some tedious code to handle the differences
    between sysretl and sysexitl.
    
    Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net
    Signed-off-by: Andy Lutomirski <luto@amacapital.net>
entry_64.S 43.5 KB