• Eric Biggers's avatar
    x86/fpu: Don't let userspace set bogus xcomp_bv · 814fb7bb
    Eric Biggers authored
    On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
    set a task's extended state (xstate) or "FPU" registers.  ptrace() can
    set them for another task using the PTRACE_SETREGSET request with
    NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
    In either case, registers can be set to any value, but the kernel
    assumes that the XSAVE area itself remains valid in the sense that the
    CPU can restore it.
    
    However, in the case where the kernel is using the uncompacted xstate
    format (which it does whenever the XSAVES instruction is unavailable),
    it was possible for userspace to set the xcomp_bv field in the
    xstate_header to an arbitrary value.  However, all bits in that field
    are reserved in the uncompacted case, so when switching to a task with
    nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault.  This
    caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit.  In
    addition, since the error is otherwise ignored, the FPU registers from
    the task previously executing on the CPU were leaked.
    
    Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
    the uncompacted case, and returning an error otherwise.
    
    The reason for validating xcomp_bv rather than simply overwriting it
    with 0 is that we want userspace to see an error if it (incorrectly)
    provides an XSAVE area in compacted format rather than in uncompacted
    format.
    
    Note that as before, in case of error we clear the task's FPU state.
    This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
    better to return an error before changing anything.  But it seems the
    "clear on error" behavior is fine for now, and it's a little tricky to
    do otherwise because it would mean we couldn't simply copy the full
    userspace state into kernel memory in one __copy_from_user().
    
    This bug was found by syzkaller, which hit the above-mentioned
    WARN_ON_FPU():
    
        WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
        RIP: 0010:__switch_to+0x5b5/0x5d0
        RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
        RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
        RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
        RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
        R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
        FS:  00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
        Call Trace:
        Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f
    
    Here is a C reproducer.  The expected behavior is that the program spin
    forever with no output.  However, on a buggy kernel running on a
    processor with the "xsave" feature but without the "xsaves" feature
    (e.g. Sandy Bridge through Broadwell for Intel), within a second or two
    the program reports that the xmm registers were corrupted, i.e. were not
    restored correctly.  With CONFIG_X86_DEBUG_FPU=y it also hits the above
    kernel warning.
    
        #define _GNU_SOURCE
        #include <stdbool.h>
        #include <inttypes.h>
        #include <linux/elf.h>
        #include <stdio.h>
        #include <sys/ptrace.h>
        #include <sys/uio.h>
        #include <sys/wait.h>
        #include <unistd.h>
    
        int main(void)
        {
            int pid = fork();
            uint64_t xstate[512];
            struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };
    
            if (pid == 0) {
                bool tracee = true;
                for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
                    tracee = (fork() != 0);
                uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
                asm volatile("   movdqu %0, %%xmm0\n"
                             "   mov %0, %%rbx\n"
                             "1: movdqu %%xmm0, %0\n"
                             "   mov %0, %%rax\n"
                             "   cmp %%rax, %%rbx\n"
                             "   je 1b\n"
                             : "+m" (xmm0) : : "rax", "rbx", "xmm0");
                printf("BUG: xmm registers corrupted!  tracee=%d, xmm0=%08X%08X%08X%08X\n",
                       tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
            } else {
                usleep(100000);
                ptrace(PTRACE_ATTACH, pid, 0, 0);
                wait(NULL);
                ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
                xstate[65] = -1;
                ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
                ptrace(PTRACE_CONT, pid, 0, 0);
                wait(NULL);
            }
            return 1;
        }
    
    Note: the program only tests for the bug using the ptrace() system call.
    The bug can also be reproduced using the rt_sigreturn() system call, but
    only when called from a 32-bit program, since for 64-bit programs the
    kernel restores the FPU state from the signal frame by doing XRSTOR
    directly from userspace memory (with proper error checking).
    Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
    Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
    Reviewed-by: default avatarKees Cook <keescook@chromium.org>
    Reviewed-by: default avatarRik van Riel <riel@redhat.com>
    Acked-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
    Cc: <stable@vger.kernel.org> [v3.17+]
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Eric Biggers <ebiggers3@gmail.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Kevin Hao <haokexin@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Michael Halcrow <mhalcrow@google.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Wanpeng Li <wanpeng.li@hotmail.com>
    Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
    Cc: kernel-hardening@lists.openwall.com
    Fixes: 0b29643a ("x86/xsaves: Change compacted format xsave area header")
    Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
    Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    814fb7bb
regset.c 9.76 KB