1. 12 Oct, 2022 15 commits
    • Huacai Chen's avatar
      LoongArch: Add qspinlock support · 5f1e001b
      Huacai Chen authored
      On NUMA system, the performance of qspinlock is better than generic
      spinlock. Below is the UnixBench test results on a 8 nodes (4 cores
      per node, 32 cores in total) machine.
      
      A. With generic spinlock:
      
      System Benchmarks Index Values               BASELINE       RESULT    INDEX
      Dhrystone 2 using register variables         116700.0  449574022.5  38523.9
      Double-Precision Whetstone                       55.0      85190.4  15489.2
      Execl Throughput                                 43.0      14696.2   3417.7
      File Copy 1024 bufsize 2000 maxblocks          3960.0     143157.8    361.5
      File Copy 256 bufsize 500 maxblocks            1655.0      37631.8    227.4
      File Copy 4096 bufsize 8000 maxblocks          5800.0     444814.2    766.9
      Pipe Throughput                               12440.0    5047490.7   4057.5
      Pipe-based Context Switching                   4000.0    2021545.7   5053.9
      Process Creation                                126.0      23829.8   1891.3
      Shell Scripts (1 concurrent)                     42.4      33756.7   7961.5
      Shell Scripts (8 concurrent)                      6.0       4062.9   6771.5
      System Call Overhead                          15000.0    2479748.6   1653.2
                                                                         ========
      System Benchmarks Index Score                                        2955.6
      
      B. With qspinlock:
      
      System Benchmarks Index Values               BASELINE       RESULT    INDEX
      Dhrystone 2 using register variables         116700.0  449467876.9  38514.8
      Double-Precision Whetstone                       55.0      85174.6  15486.3
      Execl Throughput                                 43.0      14769.1   3434.7
      File Copy 1024 bufsize 2000 maxblocks          3960.0     146150.5    369.1
      File Copy 256 bufsize 500 maxblocks            1655.0      37496.8    226.6
      File Copy 4096 bufsize 8000 maxblocks          5800.0     447527.0    771.6
      Pipe Throughput                               12440.0    5175989.2   4160.8
      Pipe-based Context Switching                   4000.0    2207747.8   5519.4
      Process Creation                                126.0      25125.5   1994.1
      Shell Scripts (1 concurrent)                     42.4      33461.2   7891.8
      Shell Scripts (8 concurrent)                      6.0       4024.7   6707.8
      System Call Overhead                          15000.0    2917278.6   1944.9
                                                                         ========
      System Benchmarks Index Score                                        3040.1
      Signed-off-by: default avatarRui Wang <wangrui@loongson.cn>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      5f1e001b
    • Huacai Chen's avatar
      LoongArch: Use TLB for ioremap() · d2791341
      Huacai Chen authored
      We can support more cache attributes (e.g., CC, SUC and WUC) and page
      protection when we use TLB for ioremap(). The implementation is based
      on GENERIC_IOREMAP.
      
      The existing simple ioremap() implementation has better performance so
      we keep it and introduce ARCH_IOREMAP to control the selection.
      
      We move pagetable_init() earlier to make early ioremap() works, and we
      modify the PCI ecam mapping because the TLB-based version of ioremap()
      will actually take the size into account.
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      d2791341
    • Huacai Chen's avatar
      LoongArch: Support access filter to /dev/mem interface · 235d074f
      Huacai Chen authored
      Accidental access to /dev/mem is obviously disastrous, but specific
      access can be used by people debugging the kernel. So select GENERIC_
      LIB_DEVMEM_IS_ALLOWED, as well as define ARCH_HAS_VALID_PHYS_ADDR_RANGE
      and related helpers, to support access filter to /dev/mem interface.
      Signed-off-by: default avatarWeihao Li <liweihao@loongson.cn>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      235d074f
    • Huacai Chen's avatar
      LoongArch: Refactor cache probe and flush methods · b61a40af
      Huacai Chen authored
      Current cache probe and flush methods have some drawbacks:
      1, Assume there are 3 cache levels and only 3 levels;
      2, Assume L1 = I + D, L2 = V, L3 = S, V is exclusive, S is inclusive.
      
      However, the fact is I + D, I + D + V, I + D + S and I + D + V + S are
      all valid. So, refactor the cache probe and flush methods to adapt more
      types of cache hierarchy.
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      b61a40af
    • Rui Wang's avatar
      LoongArch: mm: Refactor TLB exception handlers · a2a84e36
      Rui Wang authored
      This patch simplifies TLB load, store and modify exception handlers:
      
      1. Reduce instructions, such as alu/csr and memory access;
      2. Execute tlb search instruction only in the fast path;
      3. Return directly from the fast path for both normal and huge pages;
      4. Re-tab the assembly for better vertical alignment.
      
      And fixes the concurrent modification issue of fast path for huge pages.
      
      This issue will occur in the following steps:
      
         CPU-1 (In TLB exception)         CPU-2 (In THP splitting)
      1: Load PMD entry (HUGE=1)
      2: Goto huge path
      3:                                  Store PMD entry (HUGE=0)
      4: Reload PMD entry (HUGE=0)
      5: Fill TLB entry (PA is incorrect)
      
      This patch also slightly improves the TLB processing performance:
      
      * Normal pages: 2.15%, Huge pages: 1.70%.
      
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mman.h>
      
        int main(int argc, char *argv[])
        {
              size_t page_size;
              size_t mem_size;
              size_t off;
              void *base;
              int flags;
              int i;
      
              if (argc < 2) {
                      fprintf(stderr, "%s MEM_SIZE [HUGE]\n", argv[0]);
                      return -1;
              }
      
              page_size = sysconf(_SC_PAGESIZE);
              flags = MAP_PRIVATE | MAP_ANONYMOUS;
              mem_size = strtoul(argv[1], NULL, 10);
              if (argc > 2)
                      flags |= MAP_HUGETLB;
      
              for (i = 0; i < 10; i++) {
                      base = mmap(NULL, mem_size, PROT_READ, flags, -1, 0);
                      if (base == MAP_FAILED) {
                              fprintf(stderr, "Map memory failed!\n");
                              return -1;
                      }
      
                      for (off = 0; off < mem_size; off += page_size)
                              *(volatile int *)(base + off);
      
                      munmap(base, mem_size);
              }
      
              return 0;
        }
      Signed-off-by: default avatarRui Wang <wangrui@loongson.cn>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      a2a84e36
    • Xi Ruoyao's avatar
      LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules · 59b3d4a9
      Xi Ruoyao authored
      GCC >= 13 and GNU assembler >= 2.40 use these relocations to address
      external symbols, so we need to add them.
      
      Let the module loader emit GOT entries for data symbols so we would be
      able to handle GOT relocations. The GOT entry is just the data's symbol
      address.
      
      In module.lds, emit a stub .got section for a section header entry. The
      actual content of the section entry will be filled at runtime by module_
      frob_arch_sections().
      Tested-by: default avatarWANG Xuerui <git@xen0n.name>
      Signed-off-by: default avatarXi Ruoyao <xry111@xry111.site>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      59b3d4a9
    • Xi Ruoyao's avatar
      LoongArch: Support PC-relative relocations in modules · 9bd1e380
      Xi Ruoyao authored
      Binutils >= 2.40 uses R_LARCH_B26 instead of R_LARCH_SOP_PUSH_PLT_PCREL,
      and R_LARCH_PCALA* instead of R_LARCH_SOP_PUSH_PCREL.
      
      Handle R_LARCH_B26 and R_LARCH_PCALA* in the module loader. For R_LARCH_
      B26, also create a PLT entry as needed.
      Tested-by: default avatarWANG Xuerui <git@xen0n.name>
      Signed-off-by: default avatarXi Ruoyao <xry111@xry111.site>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      9bd1e380
    • Xi Ruoyao's avatar
      LoongArch: Define ELF relocation types added in ABIv2.0 · 0a75e5d1
      Xi Ruoyao authored
      These relocation types are used by GNU binutils >= 2.40 and GCC >= 13.
      Add their definitions so we will be able to use them in later patches.
      
      Link: https://github.com/loongson/LoongArch-Documentation/pull/57Tested-by: default avatarWANG Xuerui <git@xen0n.name>
      Signed-off-by: default avatarXi Ruoyao <xry111@xry111.site>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      0a75e5d1
    • Xi Ruoyao's avatar
      LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS · 11cd8a64
      Xi Ruoyao authored
      If explicit relocation hints are used by the toolchain, -Wa,-mla-*
      options will be useless for the C code. So only use them for the
      !CONFIG_AS_HAS_EXPLICIT_RELOCS case.
      
      Replace "la" with "la.pcrel" in head.S to keep the semantic consistent
      with new and old toolchains for the low level startup code.
      
      For per-CPU variables, the "address" of the symbol is actually an offset
      from $r21. The value is near the loading address of main kernel image,
      but far from the loading address of modules. So we use model("extreme")
      attibute to tell the compiler that a PC-relative addressing with 32-bit
      offset is not sufficient for local per-CPU variables.
      
      The behavior with different assemblers and compilers are summarized in
      the following table:
      
      AS has            CC has
      explicit relocs   explicit relocs * Behavior
      ==============================================================
      No                No                Use la.* macros.
                                          No change from Linux 6.0.
      --------------------------------------------------------------
      No                Yes               Disable explicit relocs.
                                          No change from Linux 6.0.
      --------------------------------------------------------------
      Yes               No                Not supported.
      --------------------------------------------------------------
      Yes               Yes               Enable explicit relocs.
                                          No -Wa,-mla* options used.
      ==============================================================
      *: We assume CC must have model attribute if it has explicit relocs.
         Both features are added in GCC 13 development cycle, so any GCC
         release >= 13 should be OK. Using early GCC 13 development snapshots
         may produce modules with unsupported relocations.
      
      Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f09482a
      Link: https://gcc.gnu.org/r13-1834
      Link: https://gcc.gnu.org/r13-2199Tested-by: default avatarWANG Xuerui <git@xen0n.name>
      Signed-off-by: default avatarXi Ruoyao <xry111@xry111.site>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      11cd8a64
    • Xi Ruoyao's avatar
      LoongArch: Add Kconfig option AS_HAS_EXPLICIT_RELOCS · 0d8dad70
      Xi Ruoyao authored
      GNU as >= 2.40 and GCC >= 13 will support using explicit relocation
      hints in the assembly code, instead of la.* macros. The usage of
      explicit relocation hints can improve code generation so it's enabled
      by default by GCC >= 13.
      
      Introduce a Kconfig option AS_HAS_EXPLICIT_RELOCS as the switch for
      "use explicit relocation hints or not".
      Tested-by: default avatarWANG Xuerui <git@xen0n.name>
      Signed-off-by: default avatarXi Ruoyao <xry111@xry111.site>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      0d8dad70
    • Colin Ian King's avatar
      LoongArch: Kconfig: Fix spelling mistake "delibrately" -> "deliberately" · 9550dfde
      Colin Ian King authored
      There is a spelling mistake in a commented section. Fix it.
      Signed-off-by: default avatarColin Ian King <colin.i.king@gmail.com>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      9550dfde
    • Huacai Chen's avatar
      LoongArch: Mark __xchg() and __cmpxchg() as __always_inline · ddf50271
      Huacai Chen authored
      Commit ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING
      forcibly") allows compiler to uninline functions marked as 'inline'.
      In case of __xchg()/__cmpxchg() this would cause to reference
      BUILD_BUG(), which is an error case for catching bugs and will not
      happen for correct code, if __xchg()/__cmpxchg() is inlined.
      
      This bug can be produced with CONFIG_DEBUG_SECTION_MISMATCH enabled,
      and the solution is similar to below commits:
      46f16195 ("MIPS: include: Mark __xchg as __always_inline"),
      88356d09 ("MIPS: include: Mark __cmpxchg as __always_inline").
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      ddf50271
    • Huacai Chen's avatar
      LoongArch: Flush TLB earlier at initialization · 1299a129
      Huacai Chen authored
      Move local_flush_tlb_all() earlier (just after setup_ptwalker() and
      before page allocation). This can avoid stale TLB entries misguiding
      the later page allocation. Without this patch the second kernel of
      kexec/kdump fails to boot SMP.
      
      BTW, move output_pgtable_bits_defines() into tlb_init() since it has
      nothing to do with tlb handler setup.
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      1299a129
    • Tiezhu Yang's avatar
      LoongArch: Do not create sysfs control file for io master CPUs · a522b7ad
      Tiezhu Yang authored
      Now io master CPUs are not hotpluggable on LoongArch, but in the current
      code only /sys/devices/system/cpu/cpu0/online is not created. Let us set
      the hotpluggable field of all the io master CPUs as 0, then prevent to
      create sysfs control file for all the io master CPUs which confuses some
      user space tools. This is similar with commit 9cce844a ("MIPS: CPU#0
      is not hotpluggable").
      Signed-off-by: default avatarTiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      a522b7ad
    • Jianmin Lv's avatar
      LoongArch: Fix cpu name after CPU-hotplug · 4b2edd38
      Jianmin Lv authored
      Don't overwrite the SMBIOS-provided CPU name on coming back from CPU-
      hotplug (including S3/S4) if it is already initialized.
      Reviewed-by: default avatarWANG Xuerui <git@xen0n.name>
      Signed-off-by: default avatarJianmin Lv <lvjianmin@loongson.cn>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      4b2edd38
  2. 03 Oct, 2022 1 commit
  3. 02 Oct, 2022 4 commits
  4. 01 Oct, 2022 9 commits
  5. 30 Sep, 2022 11 commits