• H. Peter Anvin's avatar
    x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 36102caa
    H. Peter Anvin authored
    commit 3891a04a upstream.
    
    The IRET instruction, when returning to a 16-bit segment, only
    restores the bottom 16 bits of the user space stack pointer.  This
    causes some 16-bit software to break, but it also leaks kernel state
    to user space.  We have a software workaround for that ("espfix") for
    the 32-bit kernel, but it relies on a nonzero stack segment base which
    is not available in 64-bit mode.
    
    In checkin:
    
        b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
    
    we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
    the logic that 16-bit support is crippled on 64-bit kernels anyway (no
    V86 support), but it turns out that people are doing stuff like
    running old Win16 binaries under Wine and expect it to work.
    
    This works around this by creating percpu "ministacks", each of which
    is mapped 2^16 times 64K apart.  When we detect that the return SS is
    on the LDT, we copy the IRET frame to the ministack and use the
    relevant alias to return to userspace.  The ministacks are mapped
    readonly, so if IRET faults we promote #GP to #DF which is an IST
    vector and thus has its own stack; we then do the fixup in the #DF
    handler.
    
    (Making #GP an IST exception would make the msr_safe functions unsafe
    in NMI/MC context, and quite possibly have other effects.)
    
    Special thanks to:
    
    - Andy Lutomirski, for the suggestion of using very small stack slots
      and copy (as opposed to map) the IRET frame there, and for the
      suggestion to mark them readonly and let the fault promote to #DF.
    - Konrad Wilk for paravirt fixup and testing.
    - Borislav Petkov for testing help and useful comments.
    Reported-by: default avatarBrian Gerst <brgerst@gmail.com>
    Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Andrew Lutomriski <amluto@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Dirk Hohndel <dirk@hohndel.org>
    Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
    Cc: comex <comexk@gmail.com>
    Cc: Alexander van Heukelum <heukelum@fastmail.fm>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
    (cherry picked from 3.2 commit e7836514)
    [wt: backport notes for 2.6.32 differences :
         - use DECLARE_PER_CPU instead of DECLARE_PER_CPU_READ_MOSTLY
         - replace this_cpu_read(foo) with per_cpu(foo, smp_processor_id())
         - replace this_cpu_write(foo,bar) with per_cpu(foo,smp_processor_id())=bar
    /wt]
    Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
    36102caa
espfix_64.c 6.44 KB