    riscv: allow kmalloc() caches aligned to the smallest value · 29267151
    Jisheng Zhang authored
    Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e.
    64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified kernel
    image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT, which
    has some bad effects on coherent platforms:
    
    Firstly, it wastes memory: the kmalloc-96, kmalloc-32, kmalloc-16 and
    kmalloc-8 slab caches no longer exist; they are replaced with
    either kmalloc-128 or kmalloc-64.
    
    Secondly, kmalloc allocations aligned more strictly than necessary
    result in unnecessary cache/TLB pressure.
    
    This issue also exists on arm64 platforms. Since last year, Catalin
    has been working to solve it by decoupling ARCH_KMALLOC_MINALIGN from
    ARCH_DMA_MINALIGN, limiting the kmalloc() minimum alignment to
    dma_get_cache_alignment(), and replacing ARCH_KMALLOC_MINALIGN usage
    in various drivers with ARCH_DMA_MINALIGN, etc. [1]
    
    One fact we can make use of for riscv: if the CPU doesn't support
    ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on
    Catalin's work and the above fact, we can easily solve the kmalloc
    alignment issue for riscv: we override dma_get_cache_alignment() so that
    it returns ARCH_DMA_MINALIGN at the beginning and returns 1 once we know
    the underlying HW supports neither ZICBOM nor T-HEAD CMO.
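    
    The decision described above can be modelled in plain userspace C. This
    is only a sketch, not the patch itself: every sketch_* name and the
    SKETCH_DMA_MINALIGN constant are hypothetical stand-ins, and the two
    boolean parameters stand in for whatever feature probing the kernel does.

```c
#include <stdbool.h>

/* ASSUMPTION: stand-in for ARCH_DMA_MINALIGN (L1_CACHE_BYTES, 64 bytes). */
#define SKETCH_DMA_MINALIGN 64

/*
 * Start conservative: until the CPU features are known, behave as if
 * DMA may be non-coherent and require the full alignment.
 */
static int sketch_dma_cache_alignment = SKETCH_DMA_MINALIGN;

/* Models the overridden dma_get_cache_alignment(). */
static int sketch_dma_get_cache_alignment(void)
{
	return sketch_dma_cache_alignment;
}

/*
 * Hypothetical hook, meant to be called once after feature probing.
 * Without ZICBOM and without T-HEAD CMO the platform must be coherent,
 * so the alignment requirement can drop to 1.
 */
static void sketch_set_dma_cache_alignment(bool has_zicbom, bool has_thead_cmo)
{
	if (!has_zicbom && !has_thead_cmo)
		sketch_dma_cache_alignment = 1;
}
```

    In this model the getter returns 64 until the hook is called with both
    flags false; if either flag is true, the value stays at 64.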
    
    So what if the CPU supports ZICBOM or T-HEAD CMO, but all the
    devices are DMA coherent? Then we still use ARCH_DMA_MINALIGN as the
    kmalloc minimum alignment, so nothing changes in this case. This case
    can be improved in the future.
    
    After this patch, a simple test of booting to a small buildroot rootfs
    on qemu shows:
    
    kmalloc-96           5041    5041     96  ...
    kmalloc-64           9606    9606     64  ...
    kmalloc-32           5128    5128     32  ...
    kmalloc-16           7682    7682     16  ...
    kmalloc-8           10246   10246      8  ...
    
    So we save about 1268KB of memory. The saving will be much larger in a
    normal OS environment on real HW platforms.
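    
    The 1268KB figure can be reproduced from the listing with a short
    calculation. The sketch below assumes (this mapping is inferred, not
    stated row by row in the listing) that before the patch kmalloc-96
    objects were served from kmalloc-128 and kmalloc-32/16/8 objects from
    kmalloc-64, and it takes the first numeric column as the object count.
    All sketch_* names are hypothetical.

```c
#include <stddef.h>

/* One row of the post-patch slab listing quoted above. */
struct sketch_slab_row {
	unsigned long objs;     /* object count from the listing */
	unsigned long size;     /* object size after the patch, in bytes */
	unsigned long old_size; /* ASSUMPTION: size of the cache used before */
};

static const struct sketch_slab_row sketch_rows[] = {
	{  5041, 96, 128 },
	{  9606, 64,  64 }, /* kmalloc-64 is unchanged, contributes nothing */
	{  5128, 32,  64 },
	{  7682, 16,  64 },
	{ 10246,  8,  64 },
};

/* Sum, over all rows, of objects times bytes no longer wasted per object. */
static unsigned long sketch_saved_bytes(void)
{
	unsigned long total = 0;
	size_t n = sizeof(sketch_rows) / sizeof(sketch_rows[0]);

	for (size_t i = 0; i < n; i++)
		total += sketch_rows[i].objs *
			 (sketch_rows[i].old_size - sketch_rows[i].size);
	return total;
}
```

    Under these assumptions the total is 1267920 bytes, which rounds to
    1268KB when counting 1000 bytes per KB.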
    
    Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]
    Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
    Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
    Link: https://lore.kernel.org/r/20230718152214.2907-2-jszhang@kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>