1. 12 Mar, 2016 18 commits
    • libata: disable forced PORTS_IMPL for >= AHCI 1.3 · 2bd8e679
      Tejun Heo authored
      commit 566d1827 upstream.
      
      Some early controllers incorrectly reported zero ports in PORTS_IMPL
      register and the ahci driver fabricates PORTS_IMPL from the number of
      ports in those cases.  This hasn't mattered but with the new nvme
      controllers there are cases where zero PORTS_IMPL is valid and should
      be honored.
      
      Disable the workaround for >= AHCI 1.3.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/g/CALCETrU7yMvXEDhjAUShoHEhDwifJGapdw--BKxsP0jmjKGmRw@mail.gmail.com
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      [wt: file is drivers/ata/ahci.c in 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • PCI/AER: Flush workqueue on device remove to avoid use-after-free · c5ab1213
      Sebastian Andrzej Siewior authored
      commit 4ae2182b upstream.
      
      A Root Port's AER structure (rpc) contains a queue of events.  aer_irq()
      enqueues AER status information and schedules aer_isr() to dequeue and
      process it.  When we remove a device, aer_remove() waits for the queue to
      be empty, then frees the rpc struct.
      
      But aer_isr() references the rpc struct after dequeueing and possibly
      emptying the queue, which can cause a use-after-free error as in the
      following scenario with two threads, aer_isr() on the left and a
      concurrent aer_remove() on the right:
      
        Thread A                      Thread B
        --------                      --------
        aer_irq():
          rpc->prod_idx++
                                      aer_remove():
                                        wait_event(rpc->prod_idx == rpc->cons_idx)
                                        # now blocked until queue becomes empty
        aer_isr():                      # ...
          rpc->cons_idx++               # unblocked because queue is now empty
          ...                           kfree(rpc)
          mutex_unlock(&rpc->rpc_mutex)
      
      To prevent this problem, use flush_work() to wait until the last scheduled
      instance of aer_isr() has completed before freeing the rpc struct in
      aer_remove().
      
      I reproduced this use-after-free by flashing a device FPGA and
      re-enumerating the bus to find the new device.  With SLUB debug, this
      crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
      GPR25:
      
        pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
        Unable to handle kernel paging request for data at address 0x27ef9e3e
        Workqueue: events aer_isr
        GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
        NIP [602f5328] pci_walk_bus+0xd4/0x104
      
      [bhelgaas: changelog, stable tag]
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      [wt: in 2.6.32, kfree() is called from aer_delete_rootport()]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • af_unix: fix struct pid memory leak · 2a890807
      Eric Dumazet authored
      commit fa0dc04d upstream.
      
      Dmitry reported a struct pid leak detected by a syzkaller program.
      
      Bug happens in unix_stream_recvmsg() when we break the loop when a
      signal is pending, without properly releasing scm.
      
      Fixes: b3ca9b02 ("net: fix multithreaded signal handling in unix recv routines")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      [wt: note, according to Rainer & Ben the bug was really introduced in
       2.5.65, not by the commit mentioned in Fixes. 2.6.32 uses siocb->scm
       instead of scm]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • pipe: Fix buffer offset after partially failed read · 42646fbf
      Ben Hutchings authored
      Quoting the RHEL advisory:
      
      > It was found that the fix for CVE-2015-1805 incorrectly kept buffer
      > offset and buffer length in sync on a failed atomic read, potentially
      > resulting in a pipe buffer state corruption. A local, unprivileged user
      > could use this flaw to crash the system or leak kernel memory to user
      > space. (CVE-2016-0774, Moderate)
      
      The same flawed fix was applied to stable branches from 2.6.32.y to
      3.14.y inclusive, and I was able to reproduce the issue on 3.2.y.
      We need to give pipe_iov_copy_to_user() a separate offset variable
      and only update the buffer offset if it succeeds.
      
      References: https://rhn.redhat.com/errata/RHSA-2016-0103.html
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • readv/writev: do the same MAX_RW_COUNT truncation that read/write does · adfdad98
      Linus Torvalds authored
      commit 435f49a5 upstream.
      
      We used to protect against overflow, but rather than return an error, do
      what read/write does, namely to limit the total size to MAX_RW_COUNT.
      This is not only more consistent, but it also means that any broken
      low-level read/write routine that still keeps counts in 'int' can't
      break.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
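The truncation described in the readv/writev commit above can be sketched in user-space C. This is only an illustration, not the kernel code: the names `SKETCH_PAGE_SIZE`, `SKETCH_MAX_RW_COUNT` and `clamp_iov_lengths()` are hypothetical, and the limit is assumed to be INT_MAX rounded down to a 4 KiB page boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL
/* Assumed limit: INT_MAX rounded down to a page boundary. */
#define SKETCH_MAX_RW_COUNT ((size_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))

/*
 * Walk the segment lengths and shorten them so that the running total
 * never exceeds SKETCH_MAX_RW_COUNT. Segments past the limit end up
 * with length 0. Returns the (possibly truncated) total instead of
 * reporting an error.
 */
size_t clamp_iov_lengths(size_t *lens, size_t nr_segs)
{
    size_t total = 0;

    assert(lens != NULL || nr_segs == 0);
    for (size_t i = 0; i < nr_segs; i++) {
        size_t room = SKETCH_MAX_RW_COUNT - total;

        if (lens[i] > room)
            lens[i] = room;     /* shorten rather than fail */
        total += lens[i];
    }
    return total;
}
```

Because the total is capped below INT_MAX, a lower-level routine that still keeps its count in an 'int' cannot be handed a value that overflows it.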
    • vfs: make AIO use the proper rw_verify_area() area helpers · cde5406e
      Linus Torvalds authored
      commit a70b52ec upstream.
      
      We had for some reason overlooked the AIO interface, and it didn't use
      the proper rw_verify_area() helper function that checks (for example)
      mandatory locking on the file, and that the size of the access doesn't
      cause us to overflow the provided offset limits etc.
      
      Instead, AIO did just the security_file_permission() thing (that
      rw_verify_area() also does) directly.
      
      This fixes it to do all the proper helper functions, which not only
      means that now mandatory file locking works with AIO too, we can
      actually remove lines of code.
      Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • l2tp: fix another panic in pppol2tp · e3dea307
      Willy Tarreau authored
      Commit 3feec909 ("l2tp: Fix oops in pppol2tp_xmit") was backported
      into 2.6.32.16 to fix a possible null deref in pppol2tp. But the same
      bug still exists in pppol2tp_sendmsg(), possibly causing the same crash.
      Note that this bug doesn't appear to have any other impact than crashing
      the system, as the dereferenced pointer is only used to test a value
      against a 3-bit mask, so it can hardly be abused for anything except
      leaking one third of a bit of memory.
      
      This issue doesn't exist upstream because the code was replaced in 2.6.35
      and the new function l2tp_xmit_skb() performs the appropriate check.
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • locks: fix unlock when fcntl_setlk races with a close · 5a7c752d
      Jeff Layton authored
      commit 7f3697e2 upstream.
      
      Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
      fires in locks_free_lock_context when the flc_posix list isn't empty.
      
      The problem turns out to be that we're basically rebuilding the
      file_lock from scratch in fcntl_setlk when we discover that the setlk
      has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
      then we may end up with fl_start and fl_end values that differ from
      when the lock was initially set, if the file position or length of the
      file has changed in the interim.
      
      Fix this by just reusing the same lock request structure, and simply
      override fl_type value with F_UNLCK as appropriate. That ensures that
      we really are unlocking the lock that was initially set.
      
      While we're there, make sure that we do pop a WARN_ON_ONCE if the
      removal ever fails. Also return -EBADF in this event, since that's
      what we would have returned if the close had happened earlier.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Fixes: c293621b (stale POSIX lock handling)
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
      [bwh: Backported to 3.2: s/i_flctx->flc_posix/inode->i_flock/ in comments]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • sparc64: fix incorrect sign extension in sys_sparc64_personality · 7cbfc442
      Dmitry V. Levin authored
      commit 525fd5a9 upstream.
      
      The value returned by sys_personality has type "long int".
      It is saved to a variable of type "int", which is not a problem
      yet because the type of task_struct->personality is "unsigned int".
      The problem is the sign extension from "int" to "long int"
      that happens on return from sys_sparc64_personality.
      
      For example, a userspace call personality((unsigned) -EINVAL) will
      result in any subsequent personality call, including an absolutely
      harmless read-only personality(0xffffffff) call, failing with
      errno set to EINVAL.
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
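The sign-extension problem in the sparc64 personality commit above can be reproduced in plain C. The sketch below is not the kernel code; `personality_return_buggy()` and `personality_return_fixed()` are hypothetical names, and it assumes an LP64 target (32-bit int, 64-bit long) where converting an out-of-range value to int wraps in the usual two's-complement way.

```c
#include <assert.h>

/*
 * Buggy pattern: the previous personality (an unsigned 32-bit value)
 * is stored in an int and then returned as long. The int-to-long
 * conversion sign-extends, so a value with bit 31 set comes back
 * negative and looks like an error code to the caller.
 */
long personality_return_buggy(unsigned int old_persona)
{
    int ret;

    assert(sizeof(long) > sizeof(int));  /* LP64 assumption */
    ret = (int)old_persona;  /* implementation-defined above INT_MAX */
    return ret;              /* sign extension happens here */
}

/*
 * One way to avoid it: keep the value in a wide unsigned variable so
 * it is zero-extended instead.
 */
long personality_return_fixed(unsigned int old_persona)
{
    unsigned long ret = old_persona;

    assert(sizeof(long) > sizeof(int));  /* LP64 assumption */
    return (long)ret;
}
```

With old_persona equal to (unsigned) -EINVAL, that is 0xffffffea, the buggy variant returns -22, which a syscall wrapper would interpret as an EINVAL failure, while the wide variant returns the positive value 4294967274.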
    • x86/mm/pat: Avoid truncation when converting cpa->numpages to address · 67022807
      Matt Fleming authored
      commit 74256377 upstream.
      
      There are a couple of nasty truncation bugs lurking in the pageattr
      code that can be triggered when mapping EFI regions, e.g. when we pass
      a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
      left by PAGE_SHIFT will truncate the resultant address to 32-bits.
      
      Viorel-Cătălin managed to trigger this bug on his Dell machine that
      provides a ~5GB EFI region which requires 1236992 pages to be mapped.
      When calling populate_pud() the end of the region gets calculated
      incorrectly in the following buggy expression,
      
        end = start + (cpa->numpages << PAGE_SHIFT);
      
      And only 188416 pages are mapped. Next, populate_pud() gets invoked
      for a second time because of the loop in __change_page_attr_set_clr(),
      only this time no pages get mapped because shifting the remaining
      number of pages (1048576) by PAGE_SHIFT is zero. At which point the
      loop in __change_page_attr_set_clr() spins forever because we fail to
      make progress.
      
      Hitting this bug depends very much on the virtual address we pick to
      map the large region at and how many pages we map on the initial run
      through the loop. This explains why this issue was only recently hit
      with the introduction of commit
      
        a5caa209 ("x86/efi: Fix boot crash by mapping EFI memmap
         entries bottom-up at runtime, instead of top-down")
      
      It's interesting to note that safe uses of cpa->numpages do exist in
      the pageattr code. If instead of shifting ->numpages we multiply by
      PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
      so the result is unsigned long.
      
      To avoid surprises when users try to convert very large cpa->numpages
      values to addresses, change the data type from 'int' to 'unsigned
      long', thereby making it suitable for shifting by PAGE_SHIFT without
      any type casting.
      
      The alternative would be to make liberal use of casting, but that is
      far more likely to cause problems in the future when someone adds more
      code and fails to cast properly; this bug was difficult enough to
      track down in the first place.
      Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
      Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
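The page counts quoted in the pageattr commit above can be checked with a few lines of C. This is a sketch under assumptions: PAGE_SHIFT is taken to be 12, the function names are hypothetical, and uint32_t stands in for the kernel's 32-bit 'int' field so that the wraparound is well defined in standard C.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumed 4 KiB pages */

/*
 * Models shifting a 32-bit page count: the result is computed in
 * 32 bits and wraps modulo 2^32 before it is widened.
 */
uint64_t span_bytes_32bit(uint32_t numpages)
{
    uint32_t bytes = numpages << SKETCH_PAGE_SHIFT;

    return bytes;
}

/* Models the same shift once the page count itself is 64 bits wide. */
uint64_t span_bytes_64bit(uint64_t numpages)
{
    assert(numpages <= (UINT64_MAX >> SKETCH_PAGE_SHIFT));
    return numpages << SKETCH_PAGE_SHIFT;
}
```

For 1236992 pages the 32-bit variant yields 771751936 bytes, which is 188416 pages, and for the remaining 1048576 pages it yields 0, matching the two passes described in the commit message. The 64-bit variant yields the full 5066719232 bytes.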
    • x86/mm: Improve switch_mm() barrier comments · 2e79d2ae
      Andy Lutomirski authored
      commit 4eaffdd5 upstream.
      
      My previous comments were still a bit confusing and there was a
      typo. Fix it up.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 71b3c126 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
      Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · f7d29168
      Andy Lutomirski authored
      commit 71b3c126 upstream.
      
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [bwh: Backported to 2.6.32:
       - There's no flush_tlb_mm_range(), only flush_tlb_mm() which does not use
         INVLPG
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tty: Fix unsafe ldisc reference via ioctl(TIOCGETD) · f8b1cc04
      Peter Hurley authored
      commit 5c17c861 upstream.
      
      ioctl(TIOCGETD) retrieves the line discipline id directly from the
      ldisc because the line discipline id (c_line) in termios is untrustworthy;
      userspace may have set termios via ioctl(TCSETS*) without actually
      changing the line discipline via ioctl(TIOCSETD).
      
      However, directly accessing the current ldisc via tty->ldisc is
      unsafe; the ldisc ptr dereferenced may be stale if the line discipline
      is changing via ioctl(TIOCSETD) or hangup.
      
      Wait for the line discipline reference (just like read() or write())
      to retrieve the "current" line discipline id.
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      [bwh: Backported to 2.6.32: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • sctp: translate network order to host order when users get a hmacid · 3125a4fd
      Xin Long authored
      commit 7a84bd46 upstream.
      
      Commit ed5a377d ("sctp: translate host order to network order when
      setting a hmacid") corrected the hmacid byte-order when setting a hmacid,
      but the same issue also exists on getting a hmacid.
      
      We fix it by changing hmacids to host order when users get them with
      getsockopt.
      
      Fixes: Commit ed5a377d ("sctp: translate host order to network order when setting a hmacid")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
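The byte-order conversion that the sctp hmacid commit above describes for the getsockopt path can be illustrated in user-space C. This sketch is not the kernel code: `hmacids_to_host()` is a hypothetical helper, and it only assumes that the identifiers are stored as 16-bit values in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a list of 16-bit HMAC identifiers kept in network byte order
 * into a caller-supplied array in host byte order, as a user would
 * expect to read them back.
 */
void hmacids_to_host(const uint16_t *wire, uint16_t *host, size_t n)
{
    assert((wire != NULL && host != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        host[i] = ntohs(wire[i]);
}
```

On a big-endian machine ntohs() is a no-op, which is why a missing conversion of this kind tends to go unnoticed until the code runs on a little-endian host.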
    • sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · 491026ee
      Karl Heiss authored
      commit 635682a1 upstream.
      
      A case can occur when sctp_accept() is called by the user during
      a heartbeat timeout event after the 4-way handshake.  Since
      sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
      bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
      the listening socket but released with the new association socket.
      The result is a deadlock on any future attempts to take the listening
      socket lock.
      
      Note that this race can occur with other SCTP timeouts that take
      the bh_lock_sock() in the event sctp_accept() is called.
      
       BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
       ...
       RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
       RSP: 0018:ffff880028323b20  EFLAGS: 00000206
       RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
       RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
       R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
       FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
       CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
       Stack:
       ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
       <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
       <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
       Call Trace:
       <IRQ>
       [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
       [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
       [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
       [<ffffffff81497255>] ? ip_rcv+0x275/0x350
       [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
       ...
      
      With lockdep debugging:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       CslRx/12087 is trying to release lock (slock-AF_INET) at:
       [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
       but there are no more locks to release!
      
       other info that might help us debug this:
       2 locks held by CslRx/12087:
       #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
       #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
      
      Ensure the socket taken is also the same one that is released by
      saving a copy of the socket before entering the timeout event
      critical section.
      Signed-off-by: Karl Heiss <kheiss@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 2.6.32:
       - Net namespaces are not used
       - Keep using sctp_bh_{,un}lock_sock()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
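The lock imbalance described in the sctp_accept() commit above, and the "save a copy of the socket" remedy, can be modelled with a toy example. Everything here is hypothetical (the `sketch_*` types, the counter used in place of a real spinlock, and `timeout_event()`); it only illustrates why the pointer must be read once.

```c
#include <assert.h>
#include <stddef.h>

/* A counter stands in for the socket lock: 0 means balanced. */
struct sketch_sock  { int lock_depth; };
struct sketch_assoc { struct sketch_sock *sk; };

static void sketch_lock(struct sketch_sock *sk)   { sk->lock_depth++; }
static void sketch_unlock(struct sketch_sock *sk) { sk->lock_depth--; }

/*
 * Models a timeout handler during which the association migrates to
 * another socket. With save_sock == 0 the handler re-reads asoc->sk
 * when unlocking (the buggy pattern); with save_sock != 0 it unlocks
 * the socket pointer it saved before taking the lock.
 */
void timeout_event(struct sketch_assoc *asoc,
                   struct sketch_sock *migrate_to, int save_sock)
{
    struct sketch_sock *sk;

    assert(asoc != NULL && asoc->sk != NULL && migrate_to != NULL);
    sk = asoc->sk;              /* copy taken before locking */
    sketch_lock(sk);
    asoc->sk = migrate_to;      /* models a migration during the event */
    sketch_unlock(save_sock ? sk : asoc->sk);
}
```

In the buggy mode the listening socket is left with a lock depth of 1 and the new socket with -1, which corresponds to the soft lockup and the "bad unlock balance" report quoted above. In the saved-pointer mode both end at 0.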
    • USB: visor: fix null-deref at probe · ea26131f
      Johan Hovold authored
      commit cac9b50b upstream.
      
      Fix null-pointer dereference at probe should a (malicious) Treo device
      lack the expected endpoints.
      
      Specifically, the Treo port-setup hack was dereferencing the bulk-in and
      interrupt-in urbs without first making sure they had been allocated by
      core.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • usbvision fix overflow of interfaces array · 9b583e2a
      Oliver Neukum authored
      commit 588afcc1 upstream.
      
      This fixes the crash reported in:
      http://seclists.org/bugtraq/2015/Oct/35
      The interface number needs a sanity check.
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Cc: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      [bwh: Backported to 2.6.32: adjust filename, context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • usb: serial: visor: fix crash on detecting device without write_urbs · f188d26d
      Vladis Dronov authored
      commit cb323213 upstream.
      
      The visor driver crashes in clie_5_attach() when a specially crafted USB
      device without bulk-out endpoint is detected. This fix adds a check that
      the device has proper configuration expected by the driver.
      Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
      Signed-off-by: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
  2. 29 Jan, 2016 22 commits
    • Linux 2.6.32.70 · 1a5b69df
      Willy Tarreau authored
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • kvm: x86: only channel 0 of the i8254 is linked to the HPET · 24c14705
      Paolo Bonzini authored
      commit e5e57e7a upstream.
      
      While setting the KVM PIT counters in 'kvm_pit_load_count', if
      'hpet_legacy_start' is set, the function disables the timer on
      channel[0], instead of the respective index 'channel'. This is
      because channels 1-3 are not linked to the HPET.  Fix the caller
      to only activate the special HPET processing for channel 0.
      Reported-by: P J P <pjp@fedoraproject.org>
      Fixes: 0185604c
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit ef90cf3d)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • KVM: x86: Reload pit counters for all channels when restoring state · 4d1805f9
      Andrew Honig authored
      commit 0185604c upstream.
      
      Currently if userspace restores the pit counters with a count of 0
      on channels 1 or 2 and the guest attempts to read the count on those
      channels, then KVM will perform a mod of 0 and crash.  This will ensure
      that 0 values are converted to 65536 as per the spec.
      
      This is CVE-2015-7513.
      Signed-off-by: Andy Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 08b8d1a6)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
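The zero-count rule mentioned in the KVM PIT commit above (a programmed count of 0 means 65536) and the modulo it protects can be sketched as follows. The helper names are hypothetical and this is not the KVM code; it only shows why the conversion has to happen before the value is used as a divisor.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit counter programmed with 0 counts 65536 ticks. */
uint32_t pit_effective_count(uint32_t programmed)
{
    assert(programmed <= 0xffffu);  /* the counter is 16 bits wide */
    return programmed == 0 ? 0x10000u : programmed;
}

/*
 * Position within the current counting period. Using the raw
 * programmed value here would divide by zero when it is 0.
 */
uint32_t pit_position(uint32_t elapsed_ticks, uint32_t programmed)
{
    uint32_t count = pit_effective_count(programmed);

    return elapsed_ticks % count;
}
```

For example, with a programmed count of 0 and 70000 elapsed ticks, the position is 70000 mod 65536, that is 4464, rather than a division-by-zero fault.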
    • mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() · c7bde200
      Andrew Banman authored
      commit 5f0f2887 upstream.
      
      test_pages_in_a_zone() does not account for the possibility of missing
      sections in the given pfn range.  pfn_valid_within always returns 1 when
      CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
      sections to pass the test, leading to a kernel oops.
      
      Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
      for missing sections before proceeding into the zone-check code.
      
      This also prevents a crash from offlining memory devices with missing
      sections.  Despite this, it may be a good idea to keep the related patch
      '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
      missing sections' because missing sections in a memory block may lead to
      other problems not covered by the scope of this fix.
      Signed-off-by: Andrew Banman <abanman@sgi.com>
      Acked-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 17f6a291)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • mm: add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro · 73734375
      Daniel Kiper authored
      commit a539f353 upstream.
      
      Add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macros, which align the given
      pfn to the upper and lower section boundary respectively.
      
      Required for the latest memory hotplug support for the Xen balloon driver.
      Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [wt: only needed for next patch]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
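A user-space sketch of what the two alignment macros in the commit above compute. The macro bodies follow the usual round-up and round-down mask idiom and should be read as an illustration rather than a copy of the kernel header; PAGES_PER_SECTION is assumed to be 32768 here (the real value is architecture dependent), and `sections_touched()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumption for illustration: 2^15 pages per memory section. */
#define PAGES_PER_SECTION   (1UL << 15)
#define PAGE_SECTION_MASK   (~(PAGES_PER_SECTION - 1))

/* Round a pfn up or down to a section boundary. */
#define SECTION_ALIGN_UP(pfn) \
    (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
#define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)

/* Number of sections overlapped by the pfn range [start_pfn, end_pfn). */
unsigned long sections_touched(unsigned long start_pfn,
                               unsigned long end_pfn)
{
    assert(start_pfn < end_pfn);
    return (SECTION_ALIGN_UP(end_pfn) - SECTION_ALIGN_DOWN(start_pfn))
           / PAGES_PER_SECTION;
}
```

A loop that steps through a pfn range one section at a time, such as the missing-section check in the following patch, can use these boundaries as its start and stride.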
    • ipv6/addrlabel: fix ip6addrlbl_get() · 5a7fbabb
      Andrey Ryabinin authored
      commit e459dfee upstream.
      
      ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
      ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
      ip6addrlbl_get() will use an about-to-be-freed ip6addrlbl_entry pointer.
      
      Fix this by inverting ip6addrlbl_hold() check.
      
      Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Cong Wang <cwang@twopensource.com>
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 39b214ba)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • parisc: Fix syscall restarts · 3bf5fe19
      Helge Deller authored
      commit 71a71fb5 upstream.
      
      On parisc, syscalls which are interrupted by signals sometimes failed to
      restart and instead returned -ENOSYS, which in the worst case led to
      userspace crashes.
      A similar problem existed on MIPS and was fixed by commit e967ef02
      ("MIPS: Fix restart of indirect syscalls").
      
      On parisc the current syscall restart code assumes that all syscall
      callers load the syscall number in the delay slot of the ble
      instruction. That's how it is e.g. done in the unistd.h header file:
      	ble 0x100(%sr2, %r0)
      	ldi #syscall_nr, %r20
      Because of that assumption the current code never restored %r20 before
      returning to userspace.
      
      This assumption is at least not true for code which uses the glibc
      syscall() function, which instead uses this syntax:
      	ble 0x100(%sr2, %r0)
      	copy regX, %r20
      where regX depends on how the compiler optimizes the code and register
      usage.
      
      This patch fixes this problem by adding code to analyze how the syscall
      number is loaded in the delay slot and - if needed - copy the syscall
      number to regX prior to returning to userspace for the syscall restart.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 9f2dcffe)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • Ed Swierk's avatar
      MIPS: Fix restart of indirect syscalls · 912fcae9
      Ed Swierk authored
      commit e967ef02 upstream.
      
      When 32-bit MIPS userspace invokes a syscall indirectly via syscall(number,
      arg1, ..., arg7), the kernel looks up the actual syscall based on the given
      number, shifts the other arguments to the left, and jumps to the syscall.
      
      If the syscall is interrupted by a signal and indicates it needs to be
      restarted by the kernel (by returning ERESTARTNOINTR for example), the
      syscall must be called directly, since the number is no longer the first
      argument, and the other arguments are now staged for a direct call.
      
      Before shifting the arguments, store the syscall number in pt_regs->regs[2].
      This gets copied temporarily into pt_regs->regs[0] after the syscall returns.
      If the syscall needs to be restarted, handle_signal()/do_signal() copies the
      number back to pt_regs->regs[2], which ends up in $v0 once control returns to
      userspace.
      Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/8929/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 08f865bb)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      912fcae9
    • Alan Stern's avatar
      USB: fix invalid memory access in hub_activate() · 533030ca
      Alan Stern authored
      commit e50293ef upstream.
      
      Commit 8520f380 ("USB: change hub initialization sleeps to
      delayed_work") changed the hub_activate() routine to make part of it
      run in a workqueue.  However, the commit failed to take a reference to
      the usb_hub structure or to lock the hub interface while doing so.  As
      a result, if a hub is plugged in and quickly unplugged before the work
      routine can run, the routine will try to access memory that has been
      deallocated.  Or, if the hub is unplugged while the routine is
      running, the memory may be deallocated while it is in active use.
      
      This patch fixes the problem by taking a reference to the usb_hub at
      the start of hub_activate() and releasing it at the end (when the work
      is finished), and by locking the hub interface while the work routine
      is running.  It also adds a check at the start of the routine to see
      if the hub has already been disconnected, in which case nothing should
      be done.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
      Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
      Fixes: 8520f380 ("USB: change hub initialization sleeps to delayed_work")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: add prototype for hub_release() before first use]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 10037421)
      [wt: made a few changes :
        - adjusted context due to some autopm code being added only in 2.6.33
        - no device_{lock,unlock}() in 2.6.32, use up/down(&->sem) instead]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      533030ca
    • Dan Carpenter's avatar
      USB: ipaq.c: fix a timeout loop · 00972bcd
      Dan Carpenter authored
      commit abdc9a3b upstream.
      
      The code expects the loop to end with "retries" set to zero but, because
      it is a post-op, it will end set to -1.  I have fixed this by moving the
      decrement inside the loop.
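
      As a rough user-space illustration (not the ipaq driver code), the
      two loop shapes can be compared side by side.  try_once() is a
      hypothetical stand-in for the operation being retried and is
      assumed to fail every time, so both loops run until they time out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the retried operation; it always fails here. */
static bool try_once(void)
{
    return false;
}

/* Post-decrement in the loop condition: on timeout, retries ends at -1. */
static int retry_postop(int retries)
{
    assert(retries > 0);
    while (retries--) {
        if (try_once())
            break;
    }
    return retries;
}

/* Decrement inside the loop body: on timeout, retries ends at 0. */
static int retry_in_body(int retries)
{
    assert(retries > 0);
    while (retries) {
        retries--;
        if (try_once())
            break;
    }
    return retries;
}
```

      With three retries the first variant returns -1, so a later
      "!retries" test never fires, while the second returns 0.  Since the
      last attempt may also succeed with retries at 0, a caller would
      still need to look at the result of that attempt as well.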
      
      Fixes: 014aa2a3 ('USB: ipaq: minor ipaq_open() cleanup.')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 53a68d3f)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      00972bcd
    • Michael Holzheu's avatar
      s390/dis: Fix handling of format specifiers · 22a1caa0
      Michael Holzheu authored
      commit 272fa59c upstream.
      
      The print_insn() function returns strings like "lghi %r1,0". To escape the
      '%' character in sprintf() a second '%' is used. For example "lghi %%r1,0"
      is converted into "lghi %r1,0".
      
      After print_insn() the output string is passed to printk(). Because format
      specifiers like "%r" or "%f" are ignored by printk() this works by chance
      most of the time. But for instructions with control registers like
      "lctl %c6,%c6,780" this fails because printk() interprets "%c" as
      character format specifier.
      
      Fix this problem and escape the '%' characters twice.
      
      For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf()
      into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780".
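
      The two expansion passes can be modelled in user space.  The sketch
      below does not call sprintf() or printk(); collapse_percent() is a
      hypothetical helper that only mimics how one printf-style pass turns
      an escaped "%%" into a single "%".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of one printf-style pass over a string that contains only
 * escaped percent signs: every "%%" pair becomes a single "%".
 * Returns the length of the string written to out.
 */
static size_t collapse_percent(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    assert(outlen > 0);
    while (*in && n + 1 < outlen) {
        if (in[0] == '%' && in[1] == '%')
            in++;               /* skip the first '%' of an escaped pair */
        out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

      Feeding the doubly escaped string through the helper twice yields
      first the singly escaped form and then the plain text, which mirrors
      the sprintf()-then-printk() sequence described above.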
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [bwh: Backported to 3.2: drop the OPERAND_VR case]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 45f32e35)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      22a1caa0
    • Johan Hovold's avatar
      spi: fix parent-device reference leak · 2bafc5d5
      Johan Hovold authored
      commit 157f38f9 upstream.
      
      Fix parent-device reference leak due to SPI-core taking an unnecessary
      reference to the parent when allocating the master structure, a
      reference that was never released.
      
      Note that driver core takes its own reference to the parent when the
      master device is registered.
      
      Fixes: 49dce689 ("spi doesn't need class_device")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 40ddf30c)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      2bafc5d5
    • Tilman Schmidt's avatar
      ser_gigaset: fix deallocation of platform device structure · 0e0eaf7e
      Tilman Schmidt authored
      commit 4c5e354a upstream.
      
      When shutting down the device, the struct ser_cardstate must not be
      kfree()d immediately after the call to platform_device_unregister()
      since the embedded struct platform_device is still in use.
      Move the kfree() call to the release method instead.
      Signed-off-by: Tilman Schmidt <tilman@imap.cc>
      Fixes: 2869b23e ("drivers/isdn/gigaset: new M101 driver (v2)")
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 521e4101)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      0e0eaf7e
    • Dan Carpenter's avatar
      mISDN: fix a loop count · 1fe6e687
      Dan Carpenter authored
      commit 40d24c4d upstream.
      
      There are two issues here.
      1)  cnt starts as maxloop + 1 so all these loops iterate one more time
          than intended.
      2)  At the end of the loop we test for "if (maxloop && !cnt)" but for
          the first two loops, we end with cnt equal to -1.  Changing this to
          a pre-op means we end with cnt set to 0.
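
      Both points can be seen in a small user-space model (a sketch, not
      the mISDN code).  It assumes the "work pending" part of the loop
      condition stays true, so only the counter ends the loop; struct
      loop_result and both functions are hypothetical names.

```c
#include <assert.h>

struct loop_result {
    int iterations;     /* how many times the loop body ran */
    int cnt;            /* value of the counter after the loop */
};

/* Post-decrement: maxloop + 1 iterations, counter ends at -1. */
static struct loop_result run_postop(int maxloop)
{
    struct loop_result r = { 0, maxloop + 1 };

    assert(maxloop >= 0);
    while (r.cnt--)
        r.iterations++;
    return r;
}

/* Pre-decrement: maxloop iterations, counter ends at 0. */
static struct loop_result run_preop(int maxloop)
{
    struct loop_result r = { 0, maxloop + 1 };

    assert(maxloop >= 0);
    while (--r.cnt)
        r.iterations++;
    return r;
}
```

      For maxloop = 4 the post-op loop runs five times and leaves cnt at
      -1, so a "maxloop && !cnt" test stays false; the pre-op loop runs
      four times and leaves cnt at 0.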
      
      Fixes: cae86d4a ('mISDN: Add driver for Infineon ISDN chipset family')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 7eb2a015)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      1fe6e687
    • Peter Hurley's avatar
      tty: Fix GPF in flush_to_ldisc() · 0dbf4344
      Peter Hurley authored
      commit 9ce119f3 upstream.
      
      A line discipline which does not define a receive_buf() method can
      cause a GPF if data is ever received [1]. Oddly, this was known
      to the author of n_tracesink in 2011, but never fixed.
      
      [1] GPF report
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          PGD 3752d067 PUD 37a7b067 PMD 0
          Oops: 0010 [#1] SMP KASAN
          Modules linked in:
          CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: events_unbound flush_to_ldisc
          task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000
          RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
          RSP: 0018:ffff88006db67b50  EFLAGS: 00010246
          RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102
          RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388
          RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0
          R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000
          R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8
          FS:  0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0
          Stack:
           ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000
           ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90
           ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078
          Call Trace:
           [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030
           [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162
           [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302
           [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
          Code:  Bad RIP value.
          RIP  [<          (null)>]           (null)
           RSP <ffff88006db67b50>
          CR2: 0000000000000000
          ---[ end trace a587f8947e54d6ea ]---
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit b23324ff)
      [wt: applied to drivers/char/tty_buffer.c instead]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      0dbf4344
    • James Bottomley's avatar
      ses: fix additional element traversal bug · d4b6a10d
      James Bottomley authored
      commit 5e103356 upstream.
      
      KASAN found that our additional element processing scripts drop off
      the end of the VPD page into unallocated space.  The reason is that
      not every element has additional information but our traversal
      routines think they do, leading to them expecting far more additional
      information than is present.  Fix this by adding a gate to the
      traversal routine so that it only processes elements that are expected
      to have additional information (list is in SES-2 section 6.1.13.1:
      Additional Element Status diagnostic page overview)
      Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 344d6d02)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      d4b6a10d
    • James Bottomley's avatar
      ses: Fix problems with simple enclosures · f49fbe9e
      James Bottomley authored
      commit 3417c1b5 upstream.
      
      Simple enclosure implementations (mostly USB) are allowed to return only
      page 8 to every diagnostic query.  That really confuses our
      implementation because we assume the return is the page we asked for and
      end up doing incorrect offsets based on bogus information leading to
      accesses outside of allocated ranges.  Fix that by checking the page
      code of the return and giving an error if it isn't the one we asked for.
      This should fix reported bugs with USB storage by simply refusing to
      attach to enclosures that behave like this.  It's also good defensive
      practise now that we're starting to see more USB enclosures.
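
      The kind of check described above might look roughly like this in
      user space.  It is a sketch rather than the driver code:
      check_diag_page() is a hypothetical helper, and it relies only on
      byte 0 of a returned diagnostic page holding that page's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Return 0 if the buffer holds the diagnostic page that was asked for,
 * or -EINVAL if it is empty or carries a different page code.
 */
static int check_diag_page(const unsigned char *buf, size_t len,
                           unsigned char requested_page)
{
    assert(buf != NULL);
    if (len < 1)
        return -EINVAL;         /* nothing to inspect */
    if (buf[0] != requested_page)
        return -EINVAL;         /* device answered with another page */
    return 0;
}
```

      An enclosure that answers every query with page 8 would pass this
      check only when page 8 was actually requested; any other query is
      rejected before offsets are computed from the wrong page layout.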
      Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Tomas Henzl <thenzl@redhat.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 25ef9385)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      f49fbe9e
    • Johannes Berg's avatar
      rfkill: copy the name into the rfkill struct · 8a90c757
      Johannes Berg authored
      commit b7bb1100 upstream.
      
      Some users of rfkill, like NFC and cfg80211, use a dynamic name when
      allocating rfkill, in those cases dev_name(). Therefore, the pointer
      passed to rfkill_alloc() might not be valid forever. I specifically
      found a case where the rfkill name was quite obviously an invalid
      pointer (or at least garbage) after the wiphy had been renamed.
      
      Fix this by making a copy of the rfkill name in rfkill_alloc().
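
      One way to keep a private copy is to store the string inline at the
      end of the structure.  The user-space sketch below uses hypothetical
      names (struct toy_rfkill, toy_rfkill_alloc()) and malloc() in place
      of the kernel allocator; it is not the rfkill code itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_rfkill {
    int  type;
    char name[];        /* private copy of the name, stored inline */
};

/* Allocate the structure and copy the caller's name into it. */
static struct toy_rfkill *toy_rfkill_alloc(const char *name, int type)
{
    size_t len;
    struct toy_rfkill *r;

    assert(name != NULL);
    len = strlen(name) + 1;             /* include the terminating NUL */
    r = malloc(sizeof(*r) + len);
    if (!r)
        return NULL;
    r->type = type;
    memcpy(r->name, name, len);
    return r;
}
```

      Because the structure owns its copy, later changes to the caller's
      string (for example a rename) no longer affect the stored name, and
      the copy is released together with the structure by a single free().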
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 6f23bc6f)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      8a90c757
    • Eric Dumazet's avatar
      af_unix: fix a fatal race with bit fields · 24ba3c53
      Eric Dumazet authored
      commit 60bc851a upstream.
      
      Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
      uses 64bit instructions to manipulate them.
      If the 64bit word includes any atomic_t or spinlock_t, we can lose
      critical concurrent changes.
      
      This is happening in af_unix, where unix_sk(sk)->gc_candidate/
      gc_maybe_cycle/lock share the same 64bit word.
      
      This leads to fatal deadlock, as one/several cpus spin forever
      on a spinlock that will never be available again.
      
      A safer way would be to use a long to store flags.
      This way we are sure compiler/arch won't do bad things.
      
      As we own unix_gc_lock spinlock when clearing or setting bits,
      we can use the non atomic __set_bit()/__clear_bit().
      
      recursion_level can share the same 64bit location with the spinlock,
      as it is set only with this spinlock held.
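
      A user-space sketch of the "flags in a long" idea follows.  All
      names are hypothetical (toy_ prefix), the helpers are plain
      non-atomic read-modify-write operations in the spirit of
      __set_bit()/__clear_bit(), and lock_word merely stands in for a
      spinlock_t living next to the flags.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bit numbers replacing the two one-bit bit fields. */
enum { TOY_GC_CANDIDATE = 0, TOY_GC_MAYBE_CYCLE = 1 };

#define TOY_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

struct toy_unix_sock {
    unsigned long gc_flags;     /* flags get a full word of their own */
    int           lock_word;    /* stand-in for the neighbouring lock */
};

/* Non-atomic helpers; the caller is assumed to hold the protecting lock. */
static void toy_set_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && nr < TOY_BITS_PER_LONG);
    *addr |= 1UL << nr;
}

static void toy_clear_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && nr < TOY_BITS_PER_LONG);
    *addr &= ~(1UL << nr);
}

static int toy_test_bit(int nr, const unsigned long *addr)
{
    assert(nr >= 0 && nr < TOY_BITS_PER_LONG);
    return (*addr >> nr) & 1UL;
}
```

      Every update here reads and writes exactly the gc_flags word, so a
      wide store generated for a flag change cannot overwrite a
      concurrently modified neighbour the way a bit-field update sharing
      a word with the lock could.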
      
      [1] bug fixed in gcc-4.8.0 :
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
      Reported-by: Ambrose Feinstein <ambrose@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 2ee9cbe7)
      [wt: adjusted context]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      24ba3c53
    • Marcelo Ricardo Leitner's avatar
      sctp: update the netstamp_needed counter when copying sockets · 7e0e67b0
      Marcelo Ricardo Leitner authored
      [ Upstream commit 01ce63c9 ]
      
      Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy
      related to disabling sock timestamp.
      
      When SCTP accepts an association or peels one off, it copies sock flags
      but forgot to call net_enable_timestamp() if a packet timestamping flag
      was copied, leading to extra calls to net_disable_timestamp() whenever
      such clones were closed.
      
      The fix is to call net_enable_timestamp() whenever we copy a sock with
      that flag on, like tcp does.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: SK_FLAGS_TIMESTAMP is newly defined]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit d85242d9)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      7e0e67b0
    • Daniel Borkmann's avatar
      net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds · 224994fa
      Daniel Borkmann authored
      [ Upstream commit 6900317f ]
      
      David and HacKurx reported the following/similar size overflow triggered
      in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:
      
      (Already fixed in later grsecurity versions by Brad and PaX Team.)
      
      [ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
                     cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
      [ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
      [ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
      [ 1002.296153]  ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
      [ 1002.296162]  ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
      [ 1002.296169]  ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
      [ 1002.296176] Call Trace:
      [ 1002.296190]  [<ffffffff818129ba>] dump_stack+0x45/0x57
      [ 1002.296200]  [<ffffffff8121f838>] report_size_overflow+0x38/0x60
      [ 1002.296209]  [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
      [ 1002.296220]  [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
      [ 1002.296228]  [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
      [ 1002.296236]  [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
      [ 1002.296243]  [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
      [ 1002.296248]  [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
      [ 1002.296257]  [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
      [ 1002.296263]  [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
      [ 1002.296271]  [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85
      
      Further investigation showed that this can happen when an *odd* number of
      fds are being passed over AF_UNIX sockets.
      
      In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
      where i is the number of successfully passed fds, differ by 4 bytes due
      to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
      on 64 bit. The padding is used to align subsequent cmsg headers in the
      control buffer.
      
      When the control buffer passed in from the receiver side *lacks* these 4
      bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
      overflow in scm_detach_fds():
      
        int cmlen = CMSG_LEN(i * sizeof(int));  <--- cmlen w/o tail-padding
        err = put_user(SOL_SOCKET, &cm->cmsg_level);
        if (!err)
          err = put_user(SCM_RIGHTS, &cm->cmsg_type);
        if (!err)
          err = put_user(cmlen, &cm->cmsg_len);
        if (!err) {
          cmlen = CMSG_SPACE(i * sizeof(int));  <--- cmlen w/ 4 byte extra tail-padding
          msg->msg_control += cmlen;
          msg->msg_controllen -= cmlen;         <--- iff no tail-padding space here ...
        }                                            ... wrap-around
      
      F.e. it will wrap to a length of 18446744073709551612 bytes in case the
      receiver passed in msg->msg_controllen of 20 bytes, and the sender
      properly transferred 1 fd to the receiver, so that its CMSG_LEN results
      in 20 bytes and CMSG_SPACE in 24 bytes.
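
      The arithmetic can be reproduced in user space with the standard
      CMSG_LEN()/CMSG_SPACE() macros.  This is only a sketch of the length
      bookkeeping, not the scm_detach_fds() code; both function names are
      hypothetical, and the exact macro values depend on the platform's
      alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Space left in the control buffer after one SCM_RIGHTS message with
 * nfds descriptors, computed without any bound check.  If controllen
 * is smaller than CMSG_SPACE(), the unsigned subtraction wraps around.
 */
static size_t remaining_unchecked(size_t controllen, int nfds)
{
    assert(nfds > 0);
    return controllen - CMSG_SPACE((size_t)nfds * sizeof(int));
}

/* Same computation, but never consuming more than the buffer holds. */
static size_t remaining_clamped(size_t controllen, int nfds)
{
    size_t space;

    assert(nfds > 0);
    space = CMSG_SPACE((size_t)nfds * sizeof(int));
    if (space > controllen)
        space = controllen;
    return controllen - space;
}
```

      With a buffer of exactly CMSG_LEN(sizeof(int)) bytes and one fd, the
      unchecked version wraps to a huge value wherever CMSG_SPACE() adds
      tail padding (20 versus 24 bytes in the example above), while the
      clamped version returns 0.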
      
      In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
      issue in my tests as alignment seems always on 4 byte boundary. Same
      should be in case of native 32 bit, where we end up with 4 byte boundaries
      as well.
      
      In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
      a single fd would mean that on successful return, msg->msg_controllen is
      being set by the kernel to 24 bytes instead, thus more than the input
      buffer advertised. It could f.e. become an issue if such application later
      on zeroes or copies the control buffer based on the returned msg->msg_controllen
      elsewhere.
      
      Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).
      
      Going over the code, it seems like msg->msg_controllen is not being read
      after scm_detach_fds() in scm_recv() anymore by the kernel, good!
      
      Relevant recvmsg() handlers are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
      and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
      and ___sys_recvmsg() places the updated length, that is, new msg_control -
      old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
      in the example).
      
      Long time ago, Wei Yongjun fixed something related in commit 1ac70e7a
      ("[NET]: Fix function put_cmsg() which may cause usr application memory
      overflow").
      
      RFC3542, section 20.2. says:
      
        The fields shown as "XX" are possible padding, between the cmsghdr
        structure and the data, and between the data and the next cmsghdr
        structure, if required by the implementation. While sending an
        application may or may not include padding at the end of last
        ancillary data in msg_controllen and implementations must accept both
        as valid. On receiving a portable application must provide space for
        padding at the end of the last ancillary data as implementations may
        copy out the padding at the end of the control message buffer and
        include it in the received msg_controllen. When recvmsg() is called
        if msg_controllen is too small for all the ancillary data items
        including any trailing padding after the last item an implementation
        may set MSG_CTRUNC.
      
      Since we didn't place MSG_CTRUNC for already quite a long time, just do
      the same as in 1ac70e7a to avoid an overflow.
      
      Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
      error in SCM_RIGHTS code sample"). Some people must have copied this (?),
      thus it got triggered in the wild (reported several times during boot by
      David and HacKurx).
      
      No Fixes tag this time as pre 2002 (that is, pre history tree).
      Reported-by: David Sterba <dave@jikos.cz>
      Reported-by: HacKurx <hackurx@gmail.com>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 831a2a17)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      224994fa
    • Eric Dumazet's avatar
      tcp: initialize tp->copied_seq in case of cross SYN connection · 03bc31b9
      Eric Dumazet authored
      [ Upstream commit 142a2e7e ]
      
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      generated program that triggers the WARNING at
      net/ipv4/tcp.c:1729 in tcp_recvmsg() :
      
      WARN_ON(tp->copied_seq != tp->rcv_nxt &&
              !(flags & (MSG_PEEK | MSG_TRUNC)));
      
      His program is specifically attempting a Cross SYN TCP exchange,
      that we support (for the pleasure of hackers ?), but it looks like we
      lack proper tp->copied_seq initialization.
      
      Thanks again Dmitry for your report and testing.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 6cfa9781)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      03bc31b9