1. 17 Apr, 2024 15 commits
    • s390/mm: Uncouple physical vs virtual address spaces · c98d2eca
      Alexander Gordeev authored
      Uncoupling the physical and virtual address spaces brings
      the following benefits to s390:
      
      - virtual memory layout flexibility;
      - closes the address gap between kernel and modules, which
        caused s390-only problems in the past (e.g. 'perf' bugs);
      - allows getting rid of trampolines used for module calls
        into kernel;
      - allows simplifying BPF trampoline;
      - minor performance improvement in branch prediction;
      - kernel randomization entropy is much bigger, as it is
        derived from the amount of available virtual, not physical,
        memory;
      
      The whole change can be described by the two pictures below:
      before and after the change.
      
      Some aspects of the virtual memory layout setup are not
      clarified (number of page levels, alignment, DMA memory),
      since these are not a part of this change or are secondary
      with regard to how the uncoupling itself is implemented.
      
      The focus of the pictures is to explain why __va() and __pa()
      macros are implemented the way they are.
      
              Memory layout in V==R mode:
      
      |    Physical      |    Virtual       |
      +- 0 --------------+- 0 --------------+ identity mapping start
      |                  | S390_lowcore     | Low-address memory
      |                  +- 8 KB -----------+
      |                  |                  |
      |                  | identity         | phys == virt
      |                  | mapping          | virt == phys
      |                  |                  |
      +- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
      |.amode31 text/data|.amode31 text/data|
      +- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt end
      |                  |                  |
      |                  |                  |
      +- __kaslr_offset, __kaslr_offset_phys| kernel rand. phys/virt start
      |                  |                  |
      | kernel text/data | kernel text/data | phys == kvirt
      |                  |                  |
      +------------------+------------------+ kernel phys/virt end
      |                  |                  |
      |                  |                  |
      |                  |                  |
      |                  |                  |
      +- ident_map_size -+- ident_map_size -+ identity mapping end
                         |                  |
                         |  ... unused gap  |
                         |                  |
                         +---- vmemmap -----+ 'struct page' array start
                         |                  |
                         | virtually mapped |
                         | memory map       |
                         |                  |
                         +- __abs_lowcore --+
                         |                  |
                         | Absolute Lowcore |
                         |                  |
                         +- __memcpy_real_area
                         |                  |
                         |  Real Memory Copy|
                         |                  |
                         +- VMALLOC_START --+ vmalloc area start
                         |                  |
                         |  vmalloc area    |
                         |                  |
                         +- MODULES_VADDR --+ modules area start
                         |                  |
                         |  modules area    |
                         |                  |
                         +------------------+ UltraVisor Secure Storage limit
                         |                  |
                         |  ... unused gap  |
                         |                  |
                         +KASAN_SHADOW_START+ KASAN shadow memory start
                         |                  |
                         |   KASAN shadow   |
                         |                  |
                         +------------------+ ASCE limit
      
              Memory layout in V!=R mode:
      
      |    Physical      |    Virtual       |
      +- 0 --------------+- 0 --------------+
      |                  | S390_lowcore     | Low-address memory
      |                  +- 8 KB -----------+
      |                  |                  |
      |                  |                  |
      |                  | ... unused gap   |
      |                  |                  |
      +- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
      |.amode31 text/data|.amode31 text/data|
      +- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt end (<2GB)
      |                  |                  |
      |                  |                  |
      +- __kaslr_offset_phys		     | kernel rand. phys start
      |                  |                  |
      | kernel text/data |                  |
      |                  |                  |
      +------------------+		     | kernel phys end
      |                  |                  |
      |                  |                  |
      |                  |                  |
      |                  |                  |
      +- ident_map_size -+		     |
                         |                  |
                         |  ... unused gap  |
                         |                  |
                         +- __identity_base + identity mapping start (>= 2GB)
                         |                  |
                         | identity         | phys == virt - __identity_base
                         | mapping          | virt == phys + __identity_base
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         |                  |
                         +---- vmemmap -----+ 'struct page' array start
                         |                  |
                         | virtually mapped |
                         | memory map       |
                         |                  |
                         +- __abs_lowcore --+
                         |                  |
                         | Absolute Lowcore |
                         |                  |
                         +- __memcpy_real_area
                         |                  |
                         |  Real Memory Copy|
                         |                  |
                         +- VMALLOC_START --+ vmalloc area start
                         |                  |
                         |  vmalloc area    |
                         |                  |
                         +- MODULES_VADDR --+ modules area start
                         |                  |
                         |  modules area    |
                         |                  |
                         +- __kaslr_offset -+ kernel rand. virt start
                         |                  |
                         | kernel text/data | phys == (kvirt - __kaslr_offset) +
                         |                  |         __kaslr_offset_phys
                         +- kernel .bss end + kernel rand. virt end
                         |                  |
                         |  ... unused gap  |
                         |                  |
                         +------------------+ UltraVisor Secure Storage limit
                         |                  |
                         |  ... unused gap  |
                         |                  |
                         +KASAN_SHADOW_START+ KASAN shadow memory start
                         |                  |
                         |   KASAN shadow   |
                         |                  |
                         +------------------+ ASCE limit
      
      Unused gaps in the virtual memory layout may or may not be
      present, depending on how a particular system is configured.
      No page tables are created for the unused gaps.
      
      The relative order of vmalloc, modules and kernel image in
      virtual memory is defined by the following considerations:
      
      - the start of the modules area and the end of the kernel should
        reside within 4GB of each other to accommodate relative 32-bit
        jumps. The best way to achieve that is to place the kernel next
        to the modules;
      
      - the vmalloc and module areas should be located next to each
        other to prevent failures and extra rework in user level tools
        (makedumpfile, crash, etc.) which treat vmalloc and module
        addresses similarly;
      
      - the kernel needs to be the last area in the virtual memory
        layout to easily distinguish between kernel and non-kernel
        virtual addresses. That (again) simplifies the handling of
        addresses in user level tools and makes the __pa() macro
        faster (see below);
      
      Based on the above, the relative order of the considered
      virtual areas in memory is: vmalloc - modules - kernel.
      Therefore, the only change to the current memory layout is
      moving the kernel to the end of the virtual address space.
      
      With that approach the implementation of the __pa() macro is
      straightforward - all linear virtual addresses less than the
      kernel base are considered identity mapping addresses:
      
      	phys == virt - __identity_base
      
      All addresses greater than the kernel base are kernel ones:
      
      	phys == (kvirt - __kaslr_offset) + __kaslr_offset_phys
      
      By contrast, the __va() macro deals only with identity mapping
      addresses:
      
      	virt == phys + __identity_base
      
      The .amode31 section is mapped separately and is not covered by
      the __pa() macro. In fact, it could easily have been handled by
      checking whether a virtual address is within the section or
      not, but there is no need for that. Thus, the __pa() code uses
      as few machine cycles as possible.
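      
      The following is a small standalone C sketch of the translation
      rules described above. The variable values are made up for
      illustration only; the real variables are set up at boot and the
      actual kernel macro implementations may differ in detail:
      
      	#include <stdio.h>
      
      	/* Illustrative values only - set up at boot in the kernel. */
      	static unsigned long __identity_base     = 0x100000000UL;
      	static unsigned long __kaslr_offset      = 0xff0000000000UL;
      	static unsigned long __kaslr_offset_phys = 0x40000000UL;
      
      	/* Linear virtual addresses below the kernel base belong to the
      	 * identity mapping; everything above it is the kernel image. */
      	static unsigned long pa(unsigned long vaddr)
      	{
      		if (vaddr < __kaslr_offset)
      			return vaddr - __identity_base;
      		return (vaddr - __kaslr_offset) + __kaslr_offset_phys;
      	}
      
      	/* __va() only ever deals with identity mapping addresses. */
      	static unsigned long va(unsigned long paddr)
      	{
      		return paddr + __identity_base;
      	}
      
      	int main(void)
      	{
      		unsigned long phys = 0x12345000UL;
      
      		printf("va(%#lx)     = %#lx\n", phys, va(phys));
      		printf("pa(va(%#lx)) = %#lx\n", phys, pa(va(phys)));
      		printf("pa(kernel+0x1000) = %#lx\n", pa(__kaslr_offset + 0x1000));
      		return 0;
      	}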
      
      The KASAN shadow memory is located at the very end of the
      virtual memory layout, at addresses higher than the kernel.
      However, that is not a linear mapping and no code other than
      KASAN instrumentation or API is expected to access it.
      
      When KASLR mode is enabled the kernel base address is randomized
      within a memory window that spans the whole unused virtual address
      space. The size of that window depends on the amount of physical
      memory available to the system, the limit imposed by the
      UltraVisor (if present) and the vmalloc area size as provided
      by the vmalloc= kernel command line parameter.
      
      In case the virtual memory is exhausted the minimum size of
      the randomization window is forcefully set to 2GB, which
      amounts to 15 bits of entropy if KASAN is enabled or 17
      bits of entropy in the default configuration.
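      
      To put those numbers in perspective (derived purely from the
      figures above, not from the implementation): 17 bits of entropy
      over a 2GB (2^31 byte) window corresponds to a kernel placement
      granularity of 2^(31-17) = 16KB, while 15 bits with KASAN enabled
      corresponds to 2^(31-15) = 64KB.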
      
      The default kernel offset 0x100000 is used as a magic value
      both in the decompressor code and in the vmlinux linker script,
      but it will be removed by a follow-up change.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/crash: Use old os_info to create PT_LOAD headers · f4cac27d
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      The vmcore ELF program headers describe the virtual memory
      regions of a crashed kernel. User level tools use that
      information for kernel text and data analysis (e.g.
      vmcore-dmesg extracts the kernel log).
      
      Currently the kernel image is covered by program headers
      describing the identity mapping regions. But in the future
      the kernel image will be mapped into a separate region outside
      of the identity mapping. Create an additional ELF program
      header that covers the kernel image only, so that vmcore tools
      can locate kernel text and data.
      
      Further, the identity mappings in the crashed and capture
      kernels will have different base addresses. Because of that
      the __va() macro can not be used in the capture kernel.
      Instead, read the crashed kernel's identity mapping base
      address from os_info and use it when creating PT_LOAD
      program headers.
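      
      A rough standalone illustration of the kind of program header this
      adds - one that covers the kernel image itself rather than an
      identity mapping region. The addresses below are hypothetical
      stand-ins for values that would be read from the old kernel's
      os_info; this is a sketch, not the actual kernel code:
      
      	#include <elf.h>
      	#include <stdio.h>
      	#include <string.h>
      
      	static Elf64_Phdr kernel_image_phdr(unsigned long virt_start,
      					    unsigned long phys_start,
      					    unsigned long size)
      	{
      		Elf64_Phdr phdr;
      
      		memset(&phdr, 0, sizeof(phdr));
      		phdr.p_type   = PT_LOAD;
      		phdr.p_vaddr  = virt_start;	/* where the crashed kernel ran */
      		phdr.p_paddr  = phys_start;	/* where its image sits in memory */
      		phdr.p_filesz = size;
      		phdr.p_memsz  = size;
      		phdr.p_flags  = PF_R | PF_W | PF_X;
      		phdr.p_align  = 0x1000;
      		return phdr;
      	}
      
      	int main(void)
      	{
      		/* Made-up kernel virtual/physical bases and size. */
      		Elf64_Phdr p = kernel_image_phdr(0xff0000000000UL,
      						 0x40000000UL, 0x2000000UL);
      
      		printf("PT_LOAD vaddr=%#llx paddr=%#llx size=%#llx\n",
      		       (unsigned long long)p.p_vaddr,
      		       (unsigned long long)p.p_paddr,
      		       (unsigned long long)p.p_memsz);
      		return 0;
      	}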
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vmcoreinfo: Store virtual memory layout · 378e32aa
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      The virtual memory layout is needed for address translation
      by the crash tool when the /proc/kcore device is used as the
      memory image.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/os_info: Store virtual memory layout · 8572f525
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      The virtual memory layout will be read out by makedumpfile,
      crash and other user tools for virtual address translation.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/os_info: Introduce value entries · 88702793
      Alexander Gordeev authored
      Introduce entries that do not reference any data in memory,
      but rather provide values. Set the size of such entries to
      zero and do not compute a checksum for them, since there is
      no data whose integrity needs to be checked. The integrity of
      the value entries themselves is still covered by the os_info
      checksum.
      
      Reserve the lowest unused entry index OS_INFO_RESERVED for
      future use - presumably for the number of entries present.
      That could later be used by user level tools. The existing
      tools would not notice any difference.
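      
      A simplified standalone sketch of the idea. This is not the actual
      s390 os_info entry layout, just an illustration of a value entry
      versus a data-referencing entry:
      
      	#include <stdint.h>
      	#include <stdio.h>
      
      	struct entry {
      		uint64_t addr;	/* data address, or the value itself */
      		uint64_t size;	/* 0 marks a value entry             */
      		uint32_t csum;	/* only meaningful when size != 0    */
      	};
      
      	/* A value entry carries the value in place: no referenced data,
      	 * hence nothing to checksum. */
      	static void store_value(struct entry *e, uint64_t val)
      	{
      		e->addr = val;
      		e->size = 0;
      		e->csum = 0;
      	}
      
      	int main(void)
      	{
      		struct entry e;
      
      		store_value(&e, 0xff0000000000UL);
      		printf("value=%#llx size=%llu\n",
      		       (unsigned long long)e.addr, (unsigned long long)e.size);
      		return 0;
      	}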
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Make .amode31 section address range explicit · 5fb50fa6
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      Introduce the AMODE31_START and AMODE31_END macros for the
      .amode31 section address range, for later use.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Make identity mapping base address explicit · 7de0446f
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      Currently the identity mapping base address is implicit
      and is always set to zero. Make it explicit by putting it
      into the __identity_base persistent boot variable and use
      it in the proper context - as the value of PAGE_OFFSET.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Uncouple virtual and physical kernel offsets · 3bb11234
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      Currently __kaslr_offset is the kernel offset in both
      physical memory on boot and in virtual memory after DAT
      mode is enabled.
      
      Uncouple these offsets and rename the physical address
      space variant to __kaslr_offset_phys, while keeping the name
      __kaslr_offset for the offset in the virtual address space.
      
      Do not use __kaslr_offset_phys after DAT mode is enabled
      just yet, but still make it a persistent boot variable
      for later use.
      
      Use the __kaslr_offset and __kaslr_offset_phys offsets in
      the proper contexts and alter the handle_relocs() function
      to distinguish between the two.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/mm: Create virtual memory layout structure · 236f324b
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      Put virtual memory layout information into a structure
      to improve code generation when accessing the structure
      members, which are currently only ident_map_size and
      __kaslr_offset.
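      
      A minimal standalone sketch of the idea, using illustrative member
      names (the actual structure introduced by this change may look
      different): keeping the values in one structure lets the compiler
      address all of them relative to a single base.
      
      	#include <stdio.h>
      
      	/* Group the layout values so they share one base address. */
      	struct vm_layout {
      		unsigned long kaslr_offset;
      		unsigned long ident_map_size;
      	};
      
      	static struct vm_layout vm_layout;
      
      	/* Existing names can stay in place as simple aliases. */
      	#define __kaslr_offset	vm_layout.kaslr_offset
      	#define ident_map_size	vm_layout.ident_map_size
      
      	int main(void)
      	{
      		__kaslr_offset = 0xff0000000000UL;	/* made-up values */
      		ident_map_size = 0x1000000000UL;
      		printf("%#lx %#lx\n", __kaslr_offset, ident_map_size);
      		return 0;
      	}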
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/mm: Move KASLR related to <asm/page.h> · bbe72f39
      Alexander Gordeev authored
      Move everything KASLR related to <asm/page.h>,
      similarly to many other architectures.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Suggested-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Swap vmalloc and Lowcore/Real Memory Copy areas · c8aef260
      Alexander Gordeev authored
      This is a preparatory rework to allow uncoupling virtual
      and physical address spaces.
      
      Currently the order of the virtual memory areas is as follows
      (the lowcore and the .amode31 section are skipped, as they are
      irrelevant here):
      
      	identity mapping (the kernel is contained within)
      	vmemmap
      	vmalloc
      	modules
      	Absolute Lowcore
      	Real Memory Copy
      
      In the future the kernel will be mapped separately and placed
      at the end of the virtual address space, so the layout would
      look like this:
      
      	identity mapping
      	vmemmap
      	vmalloc
      	modules
      	Absolute Lowcore
      	Real Memory Copy
      	kernel
      
      However, the distance between the kernel and the modules needs
      to be as small as possible, ideally none. Thus, the Absolute
      Lowcore and Real Memory Copy areas would be in the way and
      therefore need to be moved as well:
      
      	identity mapping
      	vmemmap
      	Absolute Lowcore
      	Real Memory Copy
      	vmalloc
      	modules
      	kernel
      
      To facilitate such a layout, swap the vmalloc area with the
      Absolute Lowcore and Real Memory Copy areas. As a result, the
      current layout turns into:
      
      	identity mapping (the kernel is contained within)
      	vmemmap
      	Absolute Lowcore
      	Real Memory Copy
      	vmalloc
      	modules
      
      This will allow the kernel to be located directly next to the
      modules once it gets mapped separately.
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Reduce size of identity mapping on overlap · ecf74da6
      Alexander Gordeev authored
      In case the vmemmap array could overlap with the vmalloc area
      during virtual memory layout setup, the size of the vmalloc
      area is decreased. That could result in less memory than the
      user requested with the vmalloc= kernel command line parameter.
      Instead, reduce the size of the identity mapping (and, as a
      result, the size of the vmemmap array) to avoid such overlap.
      
      Further, currently the virtual memory allocation "rolls"
      from top to bottom and it is only VMALLOC_START that could
      get increased due to the overlap. Change that to decrease-
      only, which makes the whole allocation algorithm easier
      to comprehend.
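      
      A toy standalone model of the decrease-only, top-down approach:
      each area is carved out below the previous one and a start address
      is never moved back up. The starting point, sizes and alignments
      are made up for illustration and are not the real s390 values:
      
      	#include <stdio.h>
      
      	static unsigned long pos = 0x0001000000000000UL;	/* made-up top */
      
      	/* Carve an area below the current position; round down only. */
      	static unsigned long carve(unsigned long size, unsigned long align)
      	{
      		pos = (pos - size) & ~(align - 1);
      		return pos;
      	}
      
      	int main(void)
      	{
      		printf("modules at %#lx\n", carve(0x80000000UL,  0x100000UL));
      		printf("vmalloc at %#lx\n", carve(0x200000000UL, 0x100000UL));
      		printf("vmemmap at %#lx\n", carve(0x40000000UL,  0x100000UL));
      		return 0;
      	}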
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Consider DCSS segments on memory layout setup · b2b15f07
      Alexander Gordeev authored
      The maximum mappable physical address (as returned by the
      arch_get_mappable_range() callback) is limited by the
      value of (1UL << MAX_PHYSMEM_BITS).
      
      The maximum physical address available to a DCSS segment
      is 512GB.
      
      In case the available online or offline memory size is less
      than the DCSS limit, arch_get_mappable_range() would include
      the never used [512GB..(1UL << MAX_PHYSMEM_BITS)] range.
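      
      One way to express the idea in a standalone sketch. The constants
      and the exact clamping rule here are assumptions for illustration,
      not the actual s390 implementation:
      
      	#include <stdio.h>
      
      	#define MAX_PHYSMEM_BITS	52		/* illustrative */
      	#define DCSS_LIMIT		(512UL << 30)	/* 512GB */
      
      	static unsigned long max2(unsigned long a, unsigned long b)
      	{
      		return a > b ? a : b;
      	}
      
      	static unsigned long min2(unsigned long a, unsigned long b)
      	{
      		return a < b ? a : b;
      	}
      
      	int main(void)
      	{
      		unsigned long mem_end = 64UL << 30;	/* 64GB, made up */
      		unsigned long end;
      
      		/* Cover real memory and possible DCSS segments, but do not
      		 * extend into the never used range above both. */
      		end = min2(max2(mem_end, DCSS_LIMIT), 1UL << MAX_PHYSMEM_BITS);
      		printf("max mappable end: %#lx\n", end);
      		return 0;
      	}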
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/boot: Do not force vmemmap to start at MAX_PHYSMEM_BITS · 47bf8176
      Alexander Gordeev authored
      vmemmap is forcefully set to start at MAX_PHYSMEM_BITS at most.
      That might have been needed in the past to limit ident_map_size
      to MAX_PHYSMEM_BITS. However, since commit 75eba6ec0de1 ("s390:
      unify identity mapping limits handling") ident_map_size is
      limited in the setup_ident_map_size() function, which is called
      earlier.
      
      Another reason to limit the vmemmap start to MAX_PHYSMEM_BITS
      was that it was returned by arch_get_mappable_range() as the
      maximum mappable physical address. Since commit f641679dfe55
      ("s390/mm: rework arch_get_mappable_range() callback") that
      is not required anymore.
      
      As a result, there is no necessity to limit the vmemmap
      starting address to MAX_PHYSMEM_BITS.
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • KVM: s390: vsie: Use virt_to_phys for facility control block · 22fdd8ba
      Nina Schoetterl-Glausch authored
      In order for SIE to interpretively execute STFLE, it requires the real
      or absolute address of a facility-list control block.
      Before writing the location into the shadow SIE control block, convert
      it from a virtual address.
      We currently do not run into this bug because the lower 31 bits are the
      same for virtual and physical addresses.
      Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240319164420.4053380-3-nsg@linux.ibm.com
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
      Message-Id: <20240319164420.4053380-3-nsg@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
  2. 12 Apr, 2024 6 commits
  3. 09 Apr, 2024 12 commits
  4. 31 Mar, 2024 7 commits