    riscv: select ARCH_HAS_FAST_MULTIPLIER · 48b4fc66
    Jisheng Zhang authored
    Currently, riscv linux requires at least IMA, so every supported
    platform has a hardware multiplier. I assume 'mul' is comparable to
    or faster than a sequence of five or so register-dependent
    arithmetic instructions, so select ARCH_HAS_FAST_MULTIPLIER to get
    slightly nicer codegen. Refer to commit f9b41929 ("[PATCH] bitops:
    hweight() speedup") for more details.
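
    The trade-off described above can be sketched in userspace C. The
    two functions below are modelled on the kernel's generic software
    hweight32(); the names hweight32_mul and hweight32_shift are
    illustrative and not kernel symbols. Both share the same first three
    steps and differ only in how the four byte counts are summed: one
    multiply, or a chain of five dependent shift/add/mask operations.

```c
#include <stdint.h>

/*
 * Illustrative sketch, not the kernel source: multiply-based tail,
 * the kind of code preferred when a fast multiplier is assumed.
 */
static inline unsigned int hweight32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;                       /* 2-bit sums */
	w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
	w  = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
	/* One multiply accumulates all four byte sums into the top byte. */
	return (w * 0x01010101u) >> 24;
}

/*
 * Illustrative sketch: shift-and-add tail, used when no fast
 * multiplier is assumed. Each step depends on the previous result.
 */
static inline unsigned int hweight32_shift(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);
	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u);
	res = (res + (res >> 4)) & 0x0f0f0f0fu;
	res = res + (res >> 8);               /* shift, add */
	return (res + (res >> 16)) & 0xffu;   /* shift, add, mask */
}
```

    Both variants return the same bit count for every input; only the
    instruction sequence, and therefore the latency on a given core,
    differs.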
    
    A simple benchmark that calls hweight64() in a loop shows:
    about 14% performance improvement on JH7110, tested on Milkv Mars.
    
    about 23% performance improvement on TH1520 and SG2042, tested on
    Sipeed LPI4A and SG2042 platform.
    
    a slight performance drop on CV1800B, tested on Milkv Duo. Among all
    the riscv platforms I have on hand, this is the only one that sees a
    slight performance drop, which means its 'mul' isn't quick enough.
    However, the same situation exists on x86: as the above commit
    notes, P4 doesn't have fast integer multiplies, yet x86 still
    selects ARCH_HAS_FAST_MULTIPLIER. So let's select
    ARCH_HAS_FAST_MULTIPLIER, which benefits almost all riscv platforms.
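
    The benchmark itself is not part of this commit. A minimal userspace
    sketch of a loop of that kind is shown below, assuming a 64-bit
    popcount in both styles; hweight64_mul, hweight64_shift and
    bench_hweight64 are hypothetical names, and the timing uses the
    standard C clock() rather than anything kernel-specific.

```c
#include <stdint.h>
#include <time.h>

typedef unsigned int (*hweight64_fn)(uint64_t);

/* Hypothetical helper: multiply-based 64-bit popcount. */
static unsigned int hweight64_mul(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ull;
	w  = (w & 0x3333333333333333ull) + ((w >> 2) & 0x3333333333333333ull);
	w  = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0full;
	return (unsigned int)((w * 0x0101010101010101ull) >> 56);
}

/* Hypothetical helper: shift-and-add 64-bit popcount. */
static unsigned int hweight64_shift(uint64_t w)
{
	uint64_t res = w - ((w >> 1) & 0x5555555555555555ull);
	res = (res & 0x3333333333333333ull) + ((res >> 2) & 0x3333333333333333ull);
	res = (res + (res >> 4)) & 0x0f0f0f0f0f0f0f0full;
	res = res + (res >> 8);
	res = res + (res >> 16);
	return (unsigned int)((res + (res >> 32)) & 0xffull);
}

/*
 * Call fn on `iters` pseudo-random inputs and return the CPU time in
 * seconds. The running total is stored in *sum so the loop has an
 * observable result and the two variants can be cross-checked.
 */
static double bench_hweight64(hweight64_fn fn, unsigned long iters,
			      uint64_t *sum)
{
	uint64_t x = 0x9e3779b97f4a7c15ull; /* arbitrary nonzero seed */
	uint64_t acc = 0;
	clock_t start = clock();

	for (unsigned long i = 0; i < iters; i++) {
		/* xorshift64 step: cheap, deterministic input sequence */
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
		acc += fn(x);
	}
	*sum = acc;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

    Running bench_hweight64() once per variant with the same iteration
    count and comparing the returned times gives a rough relative
    figure. The call through a function pointer adds overhead to both
    variants, and an optimizing compiler may treat either pattern
    differently, so such numbers are only indicative.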
    
    Samuel also provided some performance numbers:
    On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for
    __sw_hweight64.
    On D1: 8% speedup for __sw_hweight32 and 8% slowdown for
    __sw_hweight64.

    Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
    Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
    Tested-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Link: https://lore.kernel.org/r/20240325105823.1483-1-jszhang@kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>