    Merge patch series "riscv: Add fine-tuned checksum functions" · c6408684
    Palmer Dabbelt authored
    Charlie Jenkins <charlie@rivosinc.com> says:
    
    Each architecture generally implements fine-tuned checksum functions to
    leverage the instruction set. This patch adds the main checksum
    functions that are used in networking. Tested on QEMU, this series
    allows the CHECKSUM_KUNIT tests to complete 50.9% faster on average.
    
    This patch makes heavy use of the Zbb extension via alternatives
    patching.
    
    To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.
    
    I have attempted to make these functions as optimal as possible, but I
    have not run anything on actual RISC-V hardware. My performance testing
    has been limited to inspecting the assembly, running the algorithms on
    x86 hardware, and running in QEMU.
    
    ip_fast_csum is a relatively small function, so even though it is
    possible to read 64 bits at a time on compatible hardware, the
    bottleneck becomes the cleanup and setup code, and loading 32 bits at a
    time is actually faster.
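
The 32-bit load strategy described above can be sketched in portable C. This is only an illustration, not the code from the series: the names ip_header_csum and fold_csum are hypothetical, the 64-bit accumulator is an assumption made for clarity, and the real ip_fast_csum uses kernel types and architecture-specific instructions.

```c
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit one's-complement accumulator to 16 bits and invert it. */
static uint16_t fold_csum(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);   /* add carries back in */
    sum = (sum & 0xffff) + (sum >> 16);   /* the first add may carry again */
    return (uint16_t)~sum;
}

/*
 * Hypothetical stand-in for ip_fast_csum: checksum an IPv4 header of
 * `ihl` 32-bit words, loading one 32-bit word per iteration.
 */
static uint16_t ip_header_csum(const void *iph, unsigned int ihl)
{
    const unsigned char *p = iph;
    uint64_t acc = 0;                     /* wide accumulator, cannot overflow here */

    for (unsigned int i = 0; i < ihl; i++) {
        uint32_t word;
        memcpy(&word, p + 4 * i, sizeof(word));  /* alignment-safe 32-bit load */
        acc += word;
    }

    /* Fold 64 bits down to 32 with end-around carry, then down to 16. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return fold_csum((uint32_t)acc);
}
```

The result is in native byte order, so it can be copied straight into the header's checksum field. Running the function over a header that already carries a valid checksum returns 0.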
    
    * b4-shazam-merge:
      kunit: Add tests for csum_ipv6_magic and ip_fast_csum
      riscv: Add checksum library
      riscv: Add checksum header
      riscv: Add static key for misaligned accesses
      asm-generic: Improve csum_fold
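
For readers unfamiliar with csum_fold, the sketch below compares the classic two-step fold with a well-known rotate-based formulation, which suits instruction sets that have a rotate instruction (Zbb provides one). Whether this matches the patch exactly is an assumption. The function names fold_classic and fold_rotate are hypothetical, and ror32 is reimplemented locally rather than taken from kernel headers.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by s bits (local helper). */
static uint32_t ror32(uint32_t w, unsigned int s)
{
    return (w >> (s & 31)) | (w << ((32 - s) & 31));
}

/* Classic fold: add the two halves, add the carry back, then invert. */
static uint16_t fold_classic(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Rotate-based fold: subtracting the half-swapped word from the inverted
 * word leaves the inverted, carry-corrected 16-bit sum in the upper half.
 */
static uint16_t fold_rotate(uint32_t sum)
{
    return (uint16_t)((~sum - ror32(sum, 16)) >> 16);
}
```

Both functions return the same value for every 32-bit input. The rotate form replaces the two mask-and-add steps with one rotate, one subtract, and one shift.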
    
    Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-0-1c50de5f2167@rivosinc.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
cpufeature.c 29.4 KB