• Paolo Bonzini's avatar
    x86/setup: Extend low identity map to cover whole kernel range · f5f3497c
    Paolo Bonzini authored
    On 32-bit systems, the initial_page_table is reused by
    efi_call_phys_prolog as an identity map to call
    SetVirtualAddressMap.  efi_call_phys_prolog takes care of
    converting the current CPU's GDT to a physical address too.
    
    For PAE kernels the identity mapping is achieved by aliasing the
    first PDPE for the kernel memory mapping into the first PDPE
    of initial_page_table.  This makes the EFI stub's trick "just work".
    
    However, for non-PAE kernels there is no guarantee that the identity
    mapping in the initial_page_table extends as far as the GDT; in this
    case, accesses to the GDT will cause a page fault (which quickly becomes
    a triple fault).  Fix this by copying the kernel mappings from
    swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
    identity mapping.
    
    For some reason, this is only reproducible with QEMU's dynamic translation
    mode, and not for example with KVM.  However, even under KVM one can clearly
    see that the page table is bogus:
    
        $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
        $ gdb
        (gdb) target remote localhost:1234
        (gdb) hb *0x02858f6f
        Hardware assisted breakpoint 1 at 0x2858f6f
        (gdb) c
        Continuing.
    
        Breakpoint 1, 0x02858f6f in ?? ()
        (gdb) monitor info registers
        ...
        GDT=     0724e000 000000ff
        IDT=     fffbb000 000007ff
        CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
        ...
    
    The page directory is sane:
    
        (gdb) x/4wx 0x32b7000
        0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
        (gdb) x/4wx 0x3398000
        0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
        (gdb) x/4wx 0x3399000
        0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003
    
    but our particular page directory entry is empty:
    
        (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
        0x32b7070:	0x00000000
    
    [ It appears that you can skate past this issue if you don't receive
      any interrupts while the bogus GDT pointer is loaded, or if you avoid
      reloading the segment registers in general.
    
      Andy Lutomirski provides some additional insight:
    
       "AFAICT it's entirely permissible for the GDTR and/or LDT
        descriptor to point to unmapped memory.  Any attempt to use them
        (segment loads, interrupts, IRET, etc) will try to access that memory
        as if the access came from CPL 0 and, if the access fails, will
        generate a valid page fault with CR2 pointing into the GDT or
        LDT."
    
      Up until commit 23a0d4e8 ("efi: Disable interrupts around EFI
      calls, not in the epilog/prolog calls") interrupts were disabled
      around the prolog and epilog calls, and the functional GDT was
      re-installed before interrupts were re-enabled.
    
      Which explains why no one has hit this issue until now. ]
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    Reported-by: default avatarLaszlo Ersek <lersek@redhat.com>
    Cc: <stable@vger.kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
    [ Updated changelog. ]
    f5f3497c
setup.c 30.9 KB