1. 14 Nov, 2014 34 commits
  2. 13 Nov, 2014 6 commits
    • Jan Kara's avatar
      mm: Remove false WARN_ON from pagecache_isize_extended() · ae2490a0
      Jan Kara authored
      commit f55fefd1 upstream.
      
      The WARN_ON checking whether i_mutex is held in
      pagecache_isize_extended() was wrong because some filesystems (e.g.
      XFS) use different locks for serialization of truncates / writes. So
      just remove the check.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ae2490a0
    • Eric W. Biederman's avatar
      mnt: Prevent pivot_root from creating a loop in the mount tree · 31b7cb6b
      Eric W. Biederman authored
      commit 0d082601 upstream.
      
      Andy Lutomirski recently demonstrated that when chroot is used to set
      the root path below the path for the new ``root'' passed to pivot_root
      the pivot_root system call succeeds and leaks mounts.
      
      In examining the code I see that starting with a new root that is
      below the current root in the mount tree will result in a loop in the
      mount tree after the mounts are detached and then reattached to one
      another.  Resulting in all kinds of ugliness including a leak of that
      mounts involved in the leak of the mount loop.
      
      Prevent this problem by ensuring that the new mount is reachable from
      the current root of the mount tree.
      
      [Added stable cc.  Fixes CVE-2014-7970.  --Andy]
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Reviewed-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.orgSigned-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      31b7cb6b
    • Andy Lutomirski's avatar
      x86_64, entry: Fix out of bounds read on sysenter · 83e8cfda
      Andy Lutomirski authored
      commit 653bc77a upstream.
      
      Rusty noticed a Really Bad Bug (tm) in my NT fix.  The entry code
      reads out of bounds, causing the NT fix to be unreliable.  But, and
      this is much, much worse, if your stack is somehow just below the
      top of the direct map (or a hole), you read out of bounds and crash.
      
      Excerpt from the crash:
      
      [    1.129513] RSP: 0018:ffff88001da4bf88  EFLAGS: 00010296
      
        2b:*    f7 84 24 90 00 00 00     testl  $0x4000,0x90(%rsp)
      
      That read is deterministically above the top of the stack.  I
      thought I even single-stepped through this code when I wrote it to
      check the offset, but I clearly screwed it up.
      
      Fixes: 8c7aa698 ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
      Reported-by: default avatarRusty Russell <rusty@ozlabs.org>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      83e8cfda
    • Andy Lutomirski's avatar
      x86_64, entry: Filter RFLAGS.NT on entry from userspace · 04de774b
      Andy Lutomirski authored
      commit 8c7aa698 upstream.
      
      The NT flag doesn't do anything in long mode other than causing IRET
      to #GP.  Oddly, CPL3 code can still set NT using popf.
      
      Entry via hardware or software interrupt clears NT automatically, so
      the only relevant entries are fast syscalls.
      
      If user code causes kernel code to run with NT set, then there's at
      least some (small) chance that it could cause trouble.  For example,
      user code could cause a call to EFI code with NT set, and who knows
      what would happen?  Apparently some games on Wine sometimes do
      this (!), and, if an IRET return happens, they will segfault.  That
      segfault cannot be handled, because signal delivery fails, too.
      
      This patch programs the CPU to clear NT on entry via SYSCALL (both
      32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
      in software on entry via SYSENTER.
      
      To save a few cycles, this borrows a trick from Jan Beulich in Xen:
      it checks whether NT is set before trying to clear it.  As a result,
      it seems to have very little effect on SYSENTER performance on my
      machine.
      
      There's another minor bug fix in here: it looks like the CFI
      annotations were wrong if CONFIG_AUDITSYSCALL=n.
      
      Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
      
      I haven't touched anything on 32-bit kernels.
      
      The syscall mask change comes from a variant of this patch by Anish
      Bhatt.
      
      Note to stable maintainers: there is no known security issue here.
      A misguided program can set NT and cause the kernel to try and fail
      to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
      on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275Reported-by: default avatarAnish Bhatt <anish@chelsio.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.netSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      04de774b
    • Rabin Vincent's avatar
      tracing/syscalls: Ignore numbers outside NR_syscalls' range · 9f156016
      Rabin Vincent authored
      commit 086ba77a upstream.
      
      ARM has some private syscalls (for example, set_tls(2)) which lie
      outside the range of NR_syscalls.  If any of these are called while
      syscall tracing is being performed, out-of-bounds array access will
      occur in the ftrace and perf sys_{enter,exit} handlers.
      
       # trace-cmd record -e raw_syscalls:* true && trace-cmd report
       ...
       true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
       true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
       true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
       true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
       ...
      
       # trace-cmd record -e syscalls:* true
       [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
       [   17.289590] pgd = 9e71c000
       [   17.289696] [aaaaaace] *pgd=00000000
       [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
       [   17.290169] Modules linked in:
       [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
       [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
       [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
       [   17.290866] LR is at syscall_trace_enter+0x124/0x184
      
      Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
      
      Commit cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      added the check for less than zero, but it should have also checked
      for greater than NR_syscalls.
      
      Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
      
      Fixes: cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      Signed-off-by: default avatarRabin Vincent <rabin@rab.in>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      9f156016
    • Eric Rannaud's avatar
      fs: allow open(dir, O_TMPFILE|..., 0) with mode 0 · cd8d57dc
      Eric Rannaud authored
      commit 69a91c23 upstream.
      
      The man page for open(2) indicates that when O_CREAT is specified, the
      'mode' argument applies only to future accesses to the file:
      
      	Note that this mode applies only to future accesses of the newly
      	created file; the open() call that creates a read-only file
      	may well return a read/write file descriptor.
      
      The man page for open(2) implies that 'mode' is treated identically by
      O_CREAT and O_TMPFILE.
      
      O_TMPFILE, however, behaves differently:
      
      	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
      	assert(fd == -1);
      	assert(errno == EACCES);
      
      	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
      	assert(fd > 0);
      
      For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
      
      	if (*opened & FILE_CREATED) {
      		/* Don't check for write permission, don't truncate */
      		open_flag &= ~O_TRUNC;
      		will_truncate = false;
      		acc_mode = MAY_OPEN;
      		path_to_nameidata(path, nd);
      		goto finish_open_created;
      	}
      
      But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
      may_open().
      
      This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
      inode is created, may_open() is called with acc_mode = MAY_OPEN, in
      do_tmpfile().
      
      A different, but related glibc bug revealed the discrepancy:
      https://sourceware.org/bugzilla/show_bug.cgi?id=17523
      
      The glibc lazily loads the 'mode' argument of open() and openat() using
      va_arg() only if O_CREAT is present in 'flags' (to support both the 2
      argument and the 3 argument forms of open; same idea for openat()).
      However, the glibc ignores the 'mode' argument if O_TMPFILE is in
      'flags'.
      
      On x86_64, for open(), it magically works anyway, as 'mode' is in
      RDX when entering open(), and is still in RDX on SYSCALL, which is where
      the kernel looks for the 3rd argument of a syscall.
      
      But openat() is not quite so lucky: 'mode' is in RCX when entering the
      glibc wrapper for openat(), while the kernel looks for the 4th argument
      of a syscall in R10. Indeed, the syscall calling convention differs from
      the regular calling convention in this respect on x86_64. So the kernel
      sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
      fails with EACCES.
      Signed-off-by: default avatarEric Rannaud <e@nanocritical.com>
      Acked-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      cd8d57dc