    powerpc/64s/radix: Fix crash with unaligned relocated kernel · 98d0219e
    Michael Ellerman authored
    If a relocatable kernel is loaded at an address that is not 2MB aligned
    and told not to relocate to zero, the kernel can crash due to
    mark_rodata_ro() incorrectly changing some read-write data to read-only.
    
    Scenarios where the misalignment can occur are when the kernel is
    loaded by kdump or using the RELOCATABLE_TEST config option.
    
    Example crash with the kernel loaded at 5MB:
    
      Run /sbin/init as init process
      BUG: Unable to handle kernel data access on write at 0xc000000000452000
      Faulting instruction address: 0xc0000000005b6730
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
      Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
      NIP:  c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
      REGS: c000000004503250 TRAP: 0300   Not tainted  (6.2.0-rc1-00011-g349188be4841)
      MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44288480  XER: 00000000
      CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
      ...
      NIP memset+0x68/0x104
      LR  zero_user_segments.constprop.0+0xa8/0xf0
      Call Trace:
        ext4_mpage_readpages+0x7f8/0x830
        ext4_readahead+0x48/0x60
        read_pages+0xb8/0x380
        page_cache_ra_unbounded+0x19c/0x250
        filemap_fault+0x58c/0xae0
        __do_fault+0x60/0x100
        __handle_mm_fault+0x1230/0x1a40
        handle_mm_fault+0x120/0x300
        ___do_page_fault+0x20c/0xa80
        do_page_fault+0x30/0xc0
        data_access_common_virt+0x210/0x220
    
    This happens because mark_rodata_ro() tries to change permissions on the
    range _stext..__end_rodata, but _stext sits in the middle of the 2MB
    page from 4MB to 6MB:
    
      radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
      radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
      radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)
    
    The logic that changes the permissions assumes the linear mapping was
    split correctly at boot, so it marks the entire 2MB page read-only. That
    leads to the write fault above.
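
    The spillover can be checked with a little arithmetic. The sketch below
    is illustrative only and is not code from the patch: the helper names
    are hypothetical, and it assumes the faulting address is a linear-map
    address whose physical offset is obtained by stripping the
    0xc000000000000000 base. It shows that with _stext at 5MB, the 1MB
    below it shares the same 2MB page, and that the faulting address from
    the oops falls inside that region.

```c
#include <stdint.h>

#define SZ_2M       0x200000ULL
/* Assumption: base of the powerpc64 linear mapping seen in the oops. */
#define LINEAR_BASE 0xc000000000000000ULL

/* Hypothetical helper: start of the 2MB page that contains addr. */
static uint64_t large_page_start(uint64_t addr)
{
    return addr & ~(SZ_2M - 1);
}

/* Bytes below stext that live in the same 2MB page as stext. */
static uint64_t spillover_bytes(uint64_t stext)
{
    return stext - large_page_start(stext);
}

/* True if phys sits below stext but inside stext's 2MB page. */
static int in_spillover(uint64_t phys, uint64_t stext)
{
    return phys >= large_page_start(stext) && phys < stext;
}

/* Strip the linear-map base to get the physical offset. */
static uint64_t linear_to_phys(uint64_t vaddr)
{
    return vaddr - LINEAR_BASE;
}
```

    For the crash above, in_spillover(linear_to_phys(0xc000000000452000),
    0x500000) is true: the written data lives between 4MB and 5MB, which
    was made read-only along with the rest of the 2MB page.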
    
    To fix it, the boot time mapping logic needs to consider that if the
    kernel is running at a non-zero address then _stext is a boundary where
    it must split the mapping.
    
    That leads to the mapping being split correctly, allowing the rodata
    permission change to happen correctly, with no spillover:
    
      radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
      radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
      radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
      radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
      radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)
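
    The effect of treating _stext as a boundary can be modelled with a
    small standalone sketch. This is a simplified simulation, not the
    kernel's implementation: the function names and signatures are
    hypothetical, only 64KB and 2MB page sizes are considered, all
    addresses are assumed to be 64KB aligned, and passing stext == 0 stands
    in for "no extra boundary". Unlike the real log output, it groups runs
    by page size only and ignores the exec permission.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_64K 0x10000ULL
#define SZ_2M  0x200000ULL

/* Hypothetical: next address at which the mapping must be split. */
static uint64_t next_boundary(uint64_t addr, uint64_t stext, uint64_t end)
{
    /* A kernel running at a non-zero address makes _stext a boundary. */
    if (stext != 0 && addr < stext)
        return stext;
    return end;
}

/* Use a 2MB page only if addr is 2MB aligned and a whole 2MB page
 * fits before the next boundary; otherwise fall back to 64KB. */
static uint64_t pick_page_size(uint64_t addr, uint64_t stext, uint64_t end)
{
    uint64_t gap = next_boundary(addr, stext, end) - addr;

    if ((addr % SZ_2M) == 0 && gap >= SZ_2M)
        return SZ_2M;
    return SZ_64K;
}

static void print_run(uint64_t start, uint64_t end, uint64_t size)
{
    printf("Mapped 0x%016" PRIx64 "-0x%016" PRIx64 " with %s pages\n",
           start, end, size == SZ_2M ? "2.00 MiB" : "64.0 KiB");
}

/* Walk [start, end), print each run of equally sized pages, and
 * return how many 64KB pages were needed. */
static unsigned int map_range(uint64_t start, uint64_t end, uint64_t stext)
{
    unsigned int small_pages = 0;
    uint64_t run_start = start, run_size = 0, addr = start;

    while (addr < end) {
        uint64_t size = pick_page_size(addr, stext, end);

        if (run_size != 0 && size != run_size) {
            print_run(run_start, addr, run_size);
            run_start = addr;
        }
        run_size = size;
        if (size == SZ_64K)
            small_pages++;
        addr += size;
    }
    if (run_size != 0)
        print_run(run_start, end, run_size);
    return small_pages;
}
```

    Calling map_range(0x400000, 0x800000, 0x500000) in this model uses
    64KB pages from 4MB to 6MB and a 2MB page from 6MB, so a permission
    change starting at _stext no longer touches anything below 5MB.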
    
    If the kernel is loaded at a 2MB aligned address, the mapping continues
    to use 2MB pages as before:
    
      radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
      radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
      radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
      radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages
    
    Fixes: c55d7b5e ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
radix_pgtable.c 27.3 KB