1. 23 Aug, 2023 1 commit
      riscv: allow kmalloc() caches aligned to the smallest value · 29267151
      Jisheng Zhang authored
      Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e.
      64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified kernel
      image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT, which
      has some bad effects on coherent platforms:
      
      Firstly, it wastes memory: the kmalloc-96, kmalloc-32, kmalloc-16 and
      kmalloc-8 slab caches no longer exist; they are replaced with
      either kmalloc-128 or kmalloc-64.
      
      Secondly, larger-than-necessary kmalloc-aligned allocations result
      in unnecessary cache/TLB pressure.
      
      This issue also exists on arm64 platforms. Since last year, Catalin
      has worked to solve it by decoupling ARCH_KMALLOC_MINALIGN from
      ARCH_DMA_MINALIGN, limiting the kmalloc() minimum alignment to
      dma_get_cache_alignment(), and replacing ARCH_KMALLOC_MINALIGN usage
      in various drivers with ARCH_DMA_MINALIGN, etc. [1]
      
      One fact we can make use of for riscv: if the CPU supports neither
      ZICBOM nor T-HEAD CMO, we know the platform is coherent. Based on
      Catalin's work and the above fact, we can easily solve the kmalloc
      alignment issue for riscv: we override dma_get_cache_alignment() so
      that it initially returns ARCH_DMA_MINALIGN, and returns 1 once we know
      the underlying HW supports neither ZICBOM nor T-HEAD CMO.
      
      What if the CPU supports ZICBOM or T-HEAD CMO, but all the
      devices are DMA coherent? In that case we still use ARCH_DMA_MINALIGN
      as the kmalloc minimum alignment, so nothing changes. This case
      can be improved in the future.
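
      The override described above can be modelled in plain userspace C.
      This is only a sketch of the idea, not the patch itself: the hook
      name set_dma_cache_alignment() and its two boolean parameters are
      hypothetical, and ARCH_DMA_MINALIGN is hard-coded to the 64 bytes
      mentioned earlier.

```c
#include <stdbool.h>

/* Assumed value: the text above gives ARCH_DMA_MINALIGN as 64 bytes. */
#define ARCH_DMA_MINALIGN 64

/* Start conservative: until CPU features are known, assume non-coherent DMA. */
static int dma_cache_alignment = ARCH_DMA_MINALIGN;

/*
 * Hypothetical hook, meant to run once after CPU feature detection.
 * Only when neither cache-management mechanism is present do we know
 * the platform is coherent and can drop the alignment to 1.
 */
void set_dma_cache_alignment(bool has_zicbom, bool has_thead_cmo)
{
	if (!has_zicbom && !has_thead_cmo)
		dma_cache_alignment = 1;
}

/* Model of the overridden accessor: returns the current minimum alignment. */
int dma_get_cache_alignment(void)
{
	return dma_cache_alignment;
}
```

      In this model, a CPU with either ZICBOM or T-HEAD CMO keeps the
      64-byte alignment even if every device is coherent, which is the
      unchanged case noted above.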
      
      After this patch, a simple test of booting to a small buildroot rootfs
      on qemu shows:
      
      kmalloc-96           5041    5041     96  ...
      kmalloc-64           9606    9606     64  ...
      kmalloc-32           5128    5128     32  ...
      kmalloc-16           7682    7682     16  ...
      kmalloc-8           10246   10246      8  ...
      
      So we save about 1268KB of memory. The saving will be much larger in a
      normal OS environment on real HW platforms.
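
      The quoted saving can be reproduced with a short calculation. The
      sketch below assumes that the first numeric column of the listing is
      the object count, that without this patch kmalloc-96 objects would
      land in kmalloc-128 and objects of 32 bytes or fewer in kmalloc-64,
      and that per-slab overhead can be ignored. The struct and function
      names are illustrative only.

```c
#include <stddef.h>

struct kmalloc_cache {
	unsigned long objs;     /* object count from the listing above */
	unsigned int size;      /* object size with this patch, in bytes */
	unsigned int old_size;  /* assumed size class without the patch */
};

static const struct kmalloc_cache caches[] = {
	{  5041, 96, 128 },
	{  9606, 64,  64 },  /* unchanged: already at the 64-byte minimum */
	{  5128, 32,  64 },
	{  7682, 16,  64 },
	{ 10246,  8,  64 },
};

/* Sum, over all caches, of the bytes no longer wasted per object. */
unsigned long saved_bytes(void)
{
	unsigned long total = 0;

	for (size_t i = 0; i < sizeof(caches) / sizeof(caches[0]); i++)
		total += caches[i].objs * (caches[i].old_size - caches[i].size);
	return total;
}
```

      Under these assumptions saved_bytes() returns 1267920, which matches
      the figure above when KB is read as 1000 bytes.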
      
      Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Link: https://lore.kernel.org/r/20230718152214.2907-2-jszhang@kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
  2. 09 Jul, 2023 10 commits
  3. 08 Jul, 2023 29 commits