    x86: bring back rep movsq for user access on CPUs without ERMS · ca96b162
    Mateusz Guzik authored
    Intel CPUs have shipped with ERMS for over a decade, but this is not true for
    AMD.  In particular one reasonably recent uarch (EPYC 7R13) does not
    have it (or at least the bit is inactive when running on the Amazon EC2
    cloud -- I found rather conflicting information about AMD CPUs vs the
    extension).
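
To check whether the bit is visible on a given machine or cloud instance, a userspace probe along these lines can help. It is a minimal sketch, not part of the patch: it assumes GCC or Clang with `<cpuid.h>`, and `cpu_has_erms` is a hypothetical helper name. ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper: returns 1 if CPUID advertises ERMS,
 * 0 if it does not (or if this is not an x86 build).
 */
static int cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7, subleaf 0 holds the structured extended feature flags. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;

    return (ebx >> 9) & 1;  /* EBX bit 9 = ERMS */
#else
    return 0;
#endif
}
```

On Linux the same information shows up as the `erms` flag in `/proc/cpuinfo`; a hypervisor may mask the bit, so the result reflects what the guest sees rather than what the silicon supports.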
    
    The hand-rolled mov loops executed in this case are quite pessimal compared
    to rep movsq for bigger sizes.  While the size at which rep movsq starts
    winning depends on uarch, it is well below 1KB everywhere AFAICS, and sizes
    bigger than that are common.
    
    While ancient CPUs may technically suffer from rep usage, gcc has been
    emitting it for years all over kernel code, so I don't think this is a
    legitimate concern.
    
    Sample result from read1_processes from will-it-scale (4KB reads/s):
    
      before:   1507021
      after:    1721828 (+14%)
    
    Note that the cutoff point for rep usage is set to 64 bytes, which is
    way too conservative but I'm sticking to what was done in 47ee3f1d
    ("x86: re-introduce support for ERMS copies for user space accesses").
    That is to say *some* copies will now go slower, which is fixable but
    beyond the scope of this patch.
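
The size-based dispatch described above can be sketched in userspace C. This is an illustration only, not the kernel's assembly: `copy_sketch`, `copy_small`, `copy_rep_movsq` and `REP_CUTOFF` are hypothetical names, the sketch assumes that copies at or above the cutoff take the rep path, and it omits the fault handling that real user-access copies need. The inline asm assumes GCC or Clang on x86-64; other targets fall back to `memcpy`.

```c
#include <stddef.h>
#include <string.h>

#define REP_CUTOFF 64  /* cutoff named in the commit message */

/* Small copies: a plain loop, standing in for the hand-rolled mov loops. */
static void copy_small(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
}

/* Larger copies: move 8-byte words with rep movsq, then copy the tail. */
static void copy_rep_movsq(void *dst, const void *src, size_t len)
{
    size_t qwords = len / 8;
    size_t tail = len % 8;

#if defined(__x86_64__) && defined(__GNUC__)
    /* rep movsq advances RDI/RSI and counts RCX down to zero. */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds elsewhere. */
    memcpy(dst, src, qwords * 8);
    dst = (char *)dst + qwords * 8;
    src = (const char *)src + qwords * 8;
#endif
    memcpy(dst, src, tail);  /* remaining 0..7 bytes */
}

static void copy_sketch(void *dst, const void *src, size_t len)
{
    if (len < REP_CUTOFF)
        copy_small(dst, src, len);
    else
        copy_rep_movsq(dst, src, len);
}
```

Raising `REP_CUTOFF` moves more mid-sized copies onto the loop path, which is the tuning the message leaves for later work.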

    Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_user_64.S 1.74 KB