    fs: use __fput_sync in close(2) · 021a160a
    Linus Torvalds authored
    close(2) is a special case which guarantees a shallow kernel stack,
    making delegation to task_work machinery unnecessary. Said delegation is
    problematic as it involves atomic ops and interrupt masking trips, none
    of which are cheap on x86-64. Forcing close(2) to do it looks like an
    oversight in the original work.
    
    Moreover, the presence of CONFIG_RSEQ adds further overhead: fput()
    -> task_work_add(..., TWA_RESUME) -> set_notify_resume() makes the
    thread returning to userspace land in resume_user_mode_work(), where
    rseq_handle_notify_resume() takes a SMAP round-trip if rseq is enabled
    for the thread (and it is by default with contemporary glibc).
    
    Sample result when benchmarking open1_processes -t 1 from will-it-scale
    (that's an open + close loop) + tmpfs on /tmp, running on a Sapphire
    Rapids CPU (ops/s):
    stock+RSEQ:     1329857
    stock-RSEQ:     1421667 (+7%)
    patched:        1523521 (+14.5% / +7%) (with / without rseq)
    
    The patched result is the same with or without rseq, as the task_work
    codepath is avoided entirely.
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
open.c 40.3 KB