1. 05 Jun, 2017 14 commits
  2. 04 Apr, 2017 5 commits
    • Linux 3.2.88 · ac11752c
      Ben Hutchings authored
    • keys: Guard against null match function in keyring_search_aux() · e2b41f76
      Ben Hutchings authored
      The "dead" key type has no match operation, and a search for keys of
      this type can cause a null dereference in keyring_search_aux().
      keyring_search() has a check for this, but request_keyring_and_link()
      does not.  Move the check into keyring_search_aux(), covering both of
      them.
      
      This was fixed upstream by commit c06cfb08 ("KEYS: Remove
      key_type::match in favour of overriding default by match_preparse"),
      part of a series of large changes that are not suitable for
      backporting.
      
      CVE-2017-2647 / CVE-2017-6951
      Reported-by: Igor Redko <redkoi@virtuozzo.com>
      Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      References: https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2017-2647
      Reported-by: idl3r <idler1984@gmail.com>
      References: https://www.spinics.net/lists/keyrings/msg01845.html
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: David Howells <dhowells@redhat.com>
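
Moving the null check into the shared helper, so that every caller is covered, can be sketched in user-space C as below. This is only an illustration: `struct key_type_sketch`, `match_exact()` and `search_aux_sketch()` are hypothetical stand-ins, not the kernel's definitions, and the return values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key type; "match" may be NULL,
 * as it is for the "dead" key type described above. */
struct key_type_sketch {
    const char *name;
    int (*match)(const char *candidate, const char *description);
};

static int match_exact(const char *candidate, const char *description)
{
    return strcmp(candidate, description) == 0;
}

/*
 * Shared search helper. Checking for a NULL match operation here covers
 * every caller, instead of relying on each caller to check first.
 * Returns 1 if found, 0 if not found, -1 if the type cannot be searched.
 */
static int search_aux_sketch(const struct key_type_sketch *type,
                             const char *const *keys, size_t nkeys,
                             const char *description)
{
    assert(type != NULL);
    if (type->match == NULL)
        return -1;
    for (size_t i = 0; i < nkeys; i++)
        if (type->match(keys[i], description))
            return 1;
    return 0;
}
```

With a type whose `match` is NULL, the helper returns -1 instead of calling through a null pointer.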
    • l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() · 2147a170
      Guillaume Nault authored
      commit 32c23116 upstream.
      
      Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
      Without lock, a concurrent call could modify the socket flags between
      the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
      a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
      would then leave a stale pointer there, generating use-after-free
      errors when walking through the list or modifying adjacent entries.
      
      BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
      Write of size 8 by task syz-executor/10987
      CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
       ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
       ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
       ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
      Call Trace:
       [<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
       [<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
       [<     inline     >] print_address_description mm/kasan/report.c:194
       [<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
       [<     inline     >] kasan_report mm/kasan/report.c:303
       [<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
       [<     inline     >] __write_once_size ./include/linux/compiler.h:249
       [<     inline     >] __hlist_del ./include/linux/list.h:622
       [<     inline     >] hlist_del_init ./include/linux/list.h:637
       [<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
       [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
      Allocated:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
       [ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
       [ 1116.897025] [<     inline     >] slab_post_alloc_hook mm/slab.h:417
       [ 1116.897025] [<     inline     >] slab_alloc_node mm/slub.c:2708
       [ 1116.897025] [<     inline     >] slab_alloc mm/slub.c:2716
       [ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
       [ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
       [ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
       [ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
       [ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
       [ 1116.897025] [<     inline     >] sock_create net/socket.c:1193
       [ 1116.897025] [<     inline     >] SYSC_socket net/socket.c:1223
       [ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
       [ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Freed:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
       [ 1116.897025] [<     inline     >] slab_free_hook mm/slub.c:1352
       [ 1116.897025] [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
       [ 1116.897025] [<     inline     >] slab_free mm/slub.c:2951
       [ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
       [ 1116.897025] [<     inline     >] sk_prot_free net/core/sock.c:1369
       [ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
       [ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
       [ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
       [ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
       [ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
       [ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
       [ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [ 1116.897025] [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Memory state around the buggy address:
       ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                          ^
       ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      ==================================================================
      
      The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.
      
      Fixes: c51ce497 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
      Reported-by: Baozeng Ding <sploving1@gmail.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Baozeng Ding <sploving1@gmail.com>
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: drop IPv6 changes]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
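
The "lock first, then test the flag" pattern described in this commit can be sketched in user-space C with a pthread mutex. The structure and function names (`struct sock_sketch`, `bind_sketch()`) and the -1 return value are assumptions made for illustration. The kernel code uses lock_sock() and sock_flag() instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with a "zapped" (not yet bound) flag. */
struct sock_sketch {
    pthread_mutex_t lock;
    bool zapped;      /* true until the socket has been bound */
    int table_refs;   /* how many times it was inserted in the bind table */
};

/* Returns 0 on success, -1 if the socket was already bound. */
static int bind_sketch(struct sock_sketch *sk)
{
    int ret = -1;

    assert(sk != NULL);
    pthread_mutex_lock(&sk->lock);
    if (sk->zapped) {             /* tested only while holding the lock */
        sk->table_refs++;         /* insert into the bind table exactly once */
        sk->zapped = false;
        ret = 0;
    }
    pthread_mutex_unlock(&sk->lock);
    return ret;
}
```

Because the flag is tested and cleared inside one critical section, two concurrent callers cannot both see it set, so the socket is inserted at most once.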
    • mm/huge_memory.c: fix up "mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp" backport · 2ea68951
      Michal Hocko authored
      This is a stable follow up fix for an incorrect backport. The issue is
      not present in the upstream kernel.
      
      Miroslav has noticed the following splat when testing my 3.2 forward
      port of 8310d48b ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for
      thp") to 3.12:
      
      BUG: Bad page state in process a.out  pfn:26400
      page:ffffea000085e000 count:0 mapcount:1 mapping:          (null) index:0x7f049d600
      page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked)
      page dumped because: nonzero mapcount
      [iii]
      CPU: 2 PID: 5926 Comm: a.out Tainted: G            E    3.12.61-0-default #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
       0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7
       ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000
       ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009
      Call Trace:
       [<ffffffff8100475d>] dump_trace+0x7d/0x2d0
       [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170
       [<ffffffff81005ce1>] show_stack+0x21/0x50
       [<ffffffff81515830>] dump_stack+0x5d/0x78
       [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102
       [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0
       [<ffffffff81141275>] __free_pages_ok+0x15/0xd0
       [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0
       [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0
       [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0
       [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0
       [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b
       [<00007f049da18573>] 0x7f049da18572
      
      The problem is that the original 3.2 backport didn't return NULL page on
      the FOLL_COW page and so the page got reused.
      Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv4: keep skb->dst around in presence of IP options · 6892986c
      Eric Dumazet authored
      Upstream commit 34b2cef2
      ("ipv4: keep skb->dst around in presence of IP options") incorrectly
      identified commit d826eb14 ("ipv4: PKTINFO doesnt need dst
      reference") as the origin of the bug.
      
      This patch should fix the issue for 3.2.xx stable kernels, since IPv4
      options seem to get more traction these days, after years of oblivion ;)
      
      Fixes: f84af32c ("net: ip_queue_rcv_skb() helper")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Anarcheuz Fritz <anarcheuz@gmail.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
  3. 16 Mar, 2017 21 commits
    • Linux 3.2.87 · a5d34b6e
      Ben Hutchings authored
    • tty: n_hdlc: get rid of racy n_hdlc.tbuf · d7ac6cf6
      Alexander Popov authored
      commit 82f2341c upstream.
      
      Currently the N_HDLC line discipline uses a self-made singly linked list
      for data buffers and has an n_hdlc.tbuf pointer for retransmitting a
      buffer after an error.
      
      Commit be10eb75 ("tty: n_hdlc add buffer flushing") introduced racy
      access to n_hdlc.tbuf. After a tx error, concurrent flush_tx_queue() and
      n_hdlc_send_frames() calls can put one data buffer on tx_free_buf_list
      twice. That causes a double free in n_hdlc_release().
      
      Let's use the standard kernel linked list and get rid of n_hdlc.tbuf:
      in case of a tx error, put the current data buffer after the head of
      tx_buf_list.
      Signed-off-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • list: introduce list_first_entry_or_null · 4157e287
      Jiri Pirko authored
      commit 6d7581e6 upstream.
      
      non-rcu variant of list_first_or_null_rcu
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
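
To illustrate what a "first entry or NULL" helper does, here is a user-space sketch of a circular doubly linked list in the style of the kernel's. It is a reimplementation written for this example, not the kernel header: every name ending in `_sketch` and `struct item` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its embedded list node. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* First entry of the list, or NULL when the list is empty. */
#define first_entry_or_null_sketch(head, type, member) \
    ((head)->next != (head) \
        ? container_of_sketch((head)->next, type, member) \
        : NULL)

struct item { int value; struct list_head node; };

/* An empty list is a head that points to itself. */
static void list_init_sketch(struct list_head *h)
{
    assert(h != NULL);
    h->next = h->prev = h;
}

/* Link node n in just before the head, i.e. at the tail of the list. */
static void list_add_tail_sketch(struct list_head *n, struct list_head *h)
{
    assert(n != NULL && h != NULL);
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}
```

A caller can then test the result against NULL directly, rather than checking for an empty list first and fetching the first entry in a second step.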
    • TTY: n_hdlc, fix lockdep false positive · bbaa1c66
      Jiri Slaby authored
      commit e9b736d8 upstream.
      
      The class of the 4 n_hdlc buf locks is the same because a single function
      n_hdlc_buf_list_init is used to init all the locks. But since
      flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
      n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
      emits a warning:
      =============================================
      [ INFO: possible recursive locking detected ]
      4.3.0-25.g91e30a7-default #1 Not tainted
      ---------------------------------------------
      a.out/1248 is trying to acquire lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
      
      but task is already holding lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->spinlock)->rlock);
        lock(&(&list->spinlock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by a.out/1248:
       #0:  (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
       #1:  (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      ...
      Call Trace:
      ...
       [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
       [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
       [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
       [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
      ...
      
      Fix it by initializing the spin_locks separately. This also removes a
      redundant memset of freshly kzalloc'ed space.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sctp: deny peeloff operation on asocs with threads sleeping on it · 6c24f537
      Marcelo Ricardo Leitner authored
      commit dfcb9f4f upstream.
      
      commit 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      attempted to avoid a BUG_ON call when the association being used for a
      sendmsg() is blocked waiting for more sndbuf and another thread did a
      peeloff operation on such asoc, moving it to another socket.
      
      As Ben Hutchings noticed, in such a case it would return without
      re-locking the socket and would cause two unlocks in a row.
      
      Further analysis also revealed that it could allow a double free if the
      application managed to peeloff the asoc that is created during the
      sendmsg call, because then sctp_sendmsg() would try to free the asoc
      that was created only for that call.
      
      This patch takes another approach. It will deny the peeloff operation
      if there is a thread sleeping on the asoc, so this situation doesn't
      exist anymore. This avoids the issues described above and also honors
      the syscalls that are already being handled (it can be multiple sendmsg
      calls).
      
      Joint work with Xin Long.
      
      Fixes: 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sctp: avoid BUG_ON on sctp_wait_for_sndbuf · 8b9f297c
      Marcelo Ricardo Leitner authored
      commit 2dcab598 upstream.
      
      Alexander Popov reported that an application may trigger a BUG_ON in
      sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
      waiting on it to queue more data and meanwhile another thread peels off
      the association being used by the first thread.
      
      This patch replaces the BUG_ON call with proper error handling. It
      will return -EPIPE to the original sendmsg call, similarly to what would
      have been done if the association wasn't found in the first place.
      Acked-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipc/shm: Fix shmat mmap nil-page protection · c14d51eb
      Davidlohr Bueso authored
      commit 95e91b83 upstream.
      
      The issue is described here, with a nice testcase:
      
          https://bugzilla.kernel.org/show_bug.cgi?id=192931
      
      The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
      the address rounded down to 0.  For the regular mmap case, the
      protection mentioned above is that the kernel gets to generate the
      address -- arch_get_unmapped_area() will always check for MAP_FIXED and
      return that address.  So by the time we do security_mmap_addr(0) things
      get funky for shmat().
      
      The testcase itself shows that while a regular user crashes, root will
      not have a problem attaching a nil-page.  There are two possible fixes
      to this.  The first, and which this patch does, is to simply allow root
      to crash as well -- this is also regular mmap behavior, ie when hacking
      up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
      is the safer option, the second alternative is to ignore SHM_RND if the
      rounded address is 0, thus only having MAP_SHARED flags.  This makes the
      behavior of shmat() identical to the mmap() case.  The downside of this
      is obviously user visible, but does make sense in that it maintains
      semantics after the round-down wrt 0 address and mmap.
      
      Passes shm related ltp tests.
      
      Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: use SHMLBA constant instead of shmlba parameter]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
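
The rounding arithmetic behind the problem can be sketched as follows. It only shows how a small nonzero address rounds down to 0 and does not reproduce the check the patch adds. `SHMLBA_SKETCH` is an assumed value of 4096 chosen for the example, since the real SHMLBA is architecture-specific, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a 4 KiB boundary; the real SHMLBA varies by arch. */
#define SHMLBA_SKETCH ((uintptr_t)4096)

/* Round an attach address down to the boundary, as SHM_RND requests. */
static uintptr_t round_down_to_shmlba(uintptr_t addr)
{
    /* The mask trick below is only valid for power-of-two boundaries. */
    assert((SHMLBA_SKETCH & (SHMLBA_SKETCH - 1)) == 0);
    return addr & ~(SHMLBA_SKETCH - 1);
}

/* True when rounding turns a nonzero request into the nil page (address 0). */
static int rounds_to_nil_page(uintptr_t addr)
{
    return addr != 0 && round_down_to_shmlba(addr) == 0;
}
```

Any nonzero address below the boundary ends up at address 0 after rounding, which is the case the commit message discusses.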
    • Revert "KVM: x86: expose MSR_TSC_AUX to userspace" · 8e19e5ef
      Ben Hutchings authored
      This reverts commit bc48f6f5, which was
      commit 9dbe6cf9 upstream.  It depends on
      several other large commits to work, and without them causes a regression.
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1408333
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Eric Wheeler <kvm@lists.ewheeler.net>
    • igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() · 688ddc50
      Hangbin Liu authored
      commit 9c8bb163 upstream.
      
      In function igmpv3/mld_add_delrec() we allocate pmc and put it in
      idev->mc_tomb, so we should free it when we don't need it in del_delrec().
      But I incorrectly removed kfree(pmc) in the latest two patches. Now fix it.
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info when ...")
      Fixes: 1666d49e ("mld: do not remove mld souce list info when ...")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mld: do not remove mld souce list info when set link down · 7c906c36
      Hangbin Liu authored
      commit 1666d49e upstream.
      
      This is an IPv6 version of commit 24803f38 ("igmp: do not remove igmp
      souce list..."). In mld_del_delrec(), we now restore all source filter
      info instead of flushing it.
      
      Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
      we should not remove source list info when the link is set down. Remove
      igmp6_group_dropped() from ipv6_mc_destroy_dev() since it has already been
      called in ipv6_mc_down().
      
      Also clear all source info after igmp6_group_dropped() instead of inside
      it, because ipv6_mc_down() will call igmp6_group_dropped().
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Timer code moved around in ipv6_mc_down() is different
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • igmp: do not remove igmp souce list info when set link down · bd1b664a
      Hangbin Liu authored
      commit 24803f38 upstream.
      
      In commit 24cf3af3 ("igmp: call ip_mc_clear_src..."), we forgot to remove
      igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src().
      This makes us clear all IGMPv3 source filter info after NETDEV_DOWN.
      Move igmpv3_clear_delrec() to ip_mc_destroy_dev(), after which there is no
      need for ip_mc_clear_src() in ip_mc_destroy_dev().
      
      On the other hand, we should restore rather than free all source filter
      info in igmpv3_del_delrec(). Otherwise we will not be able to restore IGMPv3
      source filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE.
      
      Fixes: 24cf3af3 ("igmp: call ip_mc_clear_src() only when ...")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Use IGMP_Unsolicited_Report_Count instead of sysctl_igmp_qrv
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • macvtap: read vnet_hdr_size once · 43df37db
      Willem de Bruijn authored
      [ Upstream commit 837585a5 ]
      
      When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
      Data length is verified to be greater than or equal to expected header
      length tun->vnet_hdr_sz before copying.
      
      Macvtap functions read the value once, but unless READ_ONCE is used,
      the compiler may ignore this and read multiple times. Enforce a single
      read and locally cached value to avoid updates between test and use.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Use ACCESS_ONCE() instead of READ_ONCE()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tun: read vnet_hdr_sz once · 6451245e
      Willem de Bruijn authored
      [ Upstream commit e1edab87 ]
      
      When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
      Data length is verified to be greater than or equal to expected header
      length tun->vnet_hdr_sz before copying.
      
      Read this value once and cache locally, as it can be updated between
      the test and use (TOCTOU).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      CC: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Use ACCESS_ONCE() instead of READ_ONCE()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
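
The "read once, cache locally" idea can be sketched in plain C as below. `READ_INT_ONCE()` is a hypothetical macro written for this example that forces a single load through a volatile access, similar in spirit to ACCESS_ONCE(). `struct tun_sketch` and `split_packet_sketch()` are likewise illustrative and do not match the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device state; vnet_hdr_sz may be changed by another thread. */
struct tun_sketch { int vnet_hdr_sz; };

/* Force exactly one load of an int (illustrative, not the kernel macro). */
#define READ_INT_ONCE(x) (*(const volatile int *)&(x))

/*
 * Copies the header into hdr_out (capacity hdr_cap) and returns the
 * payload length, or -1 if the packet is shorter than the header.
 */
static long split_packet_sketch(const struct tun_sketch *tun,
                                const unsigned char *buf, size_t len,
                                unsigned char *hdr_out, size_t hdr_cap)
{
    int hdr_sz;

    assert(tun != NULL && buf != NULL && hdr_out != NULL);
    hdr_sz = READ_INT_ONCE(tun->vnet_hdr_sz);   /* one read, cached locally */

    if (hdr_sz < 0 || (size_t)hdr_sz > len || (size_t)hdr_sz > hdr_cap)
        return -1;
    memcpy(hdr_out, buf, (size_t)hdr_sz);       /* same value that was tested */
    return (long)(len - (size_t)hdr_sz);
}
```

Since both the length test and the copy use the local `hdr_sz`, a concurrent update to the shared field cannot slip in between the two.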
    • tun: Fix TUN_PKT_STRIP setting · 12e8c6a7
      Herbert Xu authored
      commit 2eb783c4 upstream.
      
      We set the flag TUN_PKT_STRIP if the user buffer provided is too
      small to contain the entire packet plus meta-data.  However, this
      has been broken ever since we added GSO meta-data.  VLAN acceleration
      also has the same problem.
      
      This patch fixes this by taking both into account when setting the
      TUN_PKT_STRIP flag.
      
      The fact that this has been broken for six years without anyone
      realising means that nobody actually uses this flag.
      
      Fixes: f43798c2 ("tun: Allow GSO using virtio_net_hdr")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - No VLAN acceleration support
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tcp: fix 0 divide in __tcp_select_window() · 35558f80
      Eric Dumazet authored
      [ Upstream commit 06425c30 ]
      
      The syzkaller fuzzer was able to trigger a divide by zero when
      TCP window scaling is not enabled.
      
      SO_RCVBUF can be used not only to increase sk_rcvbuf, but also
      to decrease it below the current receive buffer utilization.
      
      If mss is negative or 0, just return a zero TCP window.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
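
A heavily simplified sketch of the guard: if the segment size is zero or negative, advertise a zero window instead of dividing by it. `select_window_sketch()` is a hypothetical function that only rounds free space down to a multiple of the MSS. The real __tcp_select_window() does considerably more.

```c
#include <assert.h>

/* Round the free space down to a whole number of segments. */
static int select_window_sketch(int free_space, int mss)
{
    int window;

    /* Nothing sensible to advertise; this also avoids dividing by zero. */
    if (mss <= 0 || free_space <= 0)
        return 0;

    window = (free_space / mss) * mss;
    assert(window % mss == 0);
    return window;
}
```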
    • ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() · 27562537
      Dan Carpenter authored
      [ Upstream commit 63117f09 ]
      
      Casting is a high precedence operation but "off" and "i" are in terms of
      bytes so we need to have some parentheses here.
      
      Fixes: fbfa743a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
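
The precedence issue can be shown with a small sketch. `struct tlv_sketch` and `tlv_at()` are hypothetical names for this example. The point is only that a cast binds tighter than `+`, so a byte offset must be added before the cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-byte option header. */
struct tlv_sketch { uint8_t type; uint8_t length; };

/* Return the option located "off" BYTES into the raw buffer. */
static const struct tlv_sketch *tlv_at(const uint8_t *raw, size_t off)
{
    assert(raw != NULL);
    /*
     * Writing (const struct tlv_sketch *)raw + off would instead advance
     * by off * sizeof(struct tlv_sketch) bytes, because the cast is
     * applied before the addition.
     */
    return (const struct tlv_sketch *)(raw + off);
}
```

With a two-byte structure, the unparenthesized form with `off = 3` lands 6 bytes into the buffer rather than 3.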
    • ipv6: fix ip6_tnl_parse_tlv_enc_lim() · a6f6bb6b
      Eric Dumazet authored
      [ Upstream commit fbfa743a ]
      
      This function suffers from multiple issues.
      
      First one is that pskb_may_pull() may reallocate skb->head,
      so the 'raw' pointer needs either to be reloaded or not used at all.
      
      Second issue is that NEXTHDR_DEST handling does not validate
      that the options are present in skb->data, so we might read
      garbage or access non existent memory.
      
      With help from Willem de Bruijn.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
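
The first issue, a saved pointer going stale after a call that may reallocate, has a loose user-space analogue in realloc(). The sketch below keeps an offset and re-derives the pointer from the current buffer head after every call that may move it. `struct buf_sketch`, `ensure_len()` and `read_byte_at()` are hypothetical, and realloc() is only an analogy for pskb_may_pull(), not a model of it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf_sketch { unsigned char *head; size_t len; };

/* Grow the buffer to at least "need" bytes; this may move b->head. */
static int ensure_len(struct buf_sketch *b, size_t need)
{
    unsigned char *p;

    assert(b != NULL);
    if (need <= b->len)
        return 1;
    p = realloc(b->head, need);
    if (p == NULL)
        return 0;
    memset(p + b->len, 0, need - b->len);   /* zero the newly added tail */
    b->head = p;
    b->len = need;
    return 1;
}

/* Read one byte at offset "off", growing the buffer first if needed. */
static int read_byte_at(struct buf_sketch *b, size_t off, unsigned char *out)
{
    if (!ensure_len(b, off + 1))
        return 0;
    /* Derived from the current head, never from a pointer saved earlier. */
    *out = b->head[off];
    return 1;
}
```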
    • can: Fix kernel panic at security_sock_rcv_skb · 47c192ef
      Eric Dumazet authored
      [ Upstream commit f1712c73 ]
      
      Zhang Yanmin reported crashes [1] and provided a patch adding a
      synchronize_rcu() call in can_rx_unregister().
      
      The main problem seems to be that the sockets themselves are not RCU
      protected.
      
      If CAN uses RCU for delivery, then sockets should be freed only after
      one RCU grace period.
      
      Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
      ease stable backports with the following fix instead.
      
      [1]
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
      
      Call Trace:
       <IRQ>
       [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
       [<ffffffff81d55771>] sk_filter+0x41/0x210
       [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
       [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
       [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
       [<ffffffff81f07af9>] can_receive+0xd9/0x120
       [<ffffffff81f07beb>] can_rcv+0xab/0x100
       [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
       [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
       [<ffffffff81d37f67>] process_backlog+0x127/0x280
       [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
       [<ffffffff810c88d4>] __do_softirq+0x184/0x440
       [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
       [<ffffffff810c8bed>] do_softirq+0x1d/0x20
       [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
       [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
       [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
       [<ffffffff810e3baf>] process_one_work+0x24f/0x670
       [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
       [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
       [<ffffffff810ebafc>] kthread+0x12c/0x150
       [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
      Reported-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      47c192ef
    • Maxime Jayat's avatar
      net: socket: fix recvmmsg not returning error from sock_error · 82df12b2
      Maxime Jayat authored
      [ Upstream commit e623a9e9 ]
      
      Commit 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
      changed the exit path of recvmmsg to always return the datagrams
      variable and modified the error paths to set the variable to the error
      code returned by recvmsg if necessary.
      
      However, when sock_error returned an error, the error code was
      then ignored, and recvmmsg returned 0.
      
      Change the error path of recvmmsg to correctly return the error code
      of sock_error.
      
      The bug was triggered by using recvmmsg on a CAN interface which was
      not up. Linux 4.6 and later return 0 in this case while earlier
      releases returned -ENETDOWN.
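
The behaviour change described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel source: the function names recvmmsg_exit_buggy() and recvmmsg_exit_fixed() are hypothetical, the receive loop is elided, and the pending socket error is passed in as a plain negative errno value instead of being read with sock_error().

```c
#include <errno.h>

/*
 * Hypothetical model of the recvmmsg() exit path. Both functions
 * return whatever ends up in "datagrams", as the commit message
 * describes. The receive loop itself is elided.
 */

/* Described buggy behaviour: the pending error is detected but never
 * stored, so the caller sees 0 instead of the error code. */
static int recvmmsg_exit_buggy(int pending_sock_err)
{
	int datagrams = 0;
	int err = pending_sock_err;   /* stands in for sock_error() */

	if (err)
		goto out;             /* error code is dropped here */

	/* ... receive loop would increment datagrams ... */
out:
	return datagrams;
}

/* Described fix: copy the error into the value that is returned. */
static int recvmmsg_exit_fixed(int pending_sock_err)
{
	int datagrams = 0;
	int err = pending_sock_err;   /* stands in for sock_error() */

	if (err) {
		datagrams = err;      /* propagate the error to the caller */
		goto out;
	}

	/* ... receive loop would increment datagrams ... */
out:
	return datagrams;
}
```

With a pending -ENETDOWN, the first variant returns 0 and the second returns -ENETDOWN, which matches the difference between releases reported above.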
      
      Fixes: 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
      Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      82df12b2
    • Kefeng Wang's avatar
      ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock · b774af06
      Kefeng Wang authored
      [ Upstream commit 03e4deff ]
      
      Just like commit 4acd4945 ("ipv6: addrconf: Avoid calling
      netdevice notifiers with RCU read-side lock"), it is unnecessary
      to make addrconf_disable_change() use RCU iteration over the
      netdev list, since it already holds the RTNL lock. Otherwise we may
      hit "Illegal context switch in RCU read-side critical section".
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      b774af06
    • Michal Tesar's avatar
      igmp: Make igmp group member RFC 3376 compliant · afc5bf35
      Michal Tesar authored
      [ Upstream commit 7ababb78 ]
      
      5.2. Action on Reception of a Query
      
       When a system receives a Query, it does not respond immediately.
       Instead, it delays its response by a random amount of time, bounded
       by the Max Resp Time value derived from the Max Resp Code in the
       received Query message.  A system may receive a variety of Queries on
       different interfaces and of different kinds (e.g., General Queries,
       Group-Specific Queries, and Group-and-Source-Specific Queries), each
       of which may require its own delayed response.
      
       Before scheduling a response to a Query, the system must first
       consider previously scheduled pending responses and in many cases
       schedule a combined response.  Therefore, the system must be able to
       maintain the following state:
      
       o A timer per interface for scheduling responses to General Queries.
      
       o A per-group and interface timer for scheduling responses to Group-
         Specific and Group-and-Source-Specific Queries.
      
       o A per-group and interface list of sources to be reported in the
         response to a Group-and-Source-Specific Query.
      
       When a new Query with the Router-Alert option arrives on an
       interface, provided the system has state to report, a delay for a
       response is randomly selected in the range (0, [Max Resp Time]) where
       Max Resp Time is derived from Max Resp Code in the received Query
       message.  The following rules are then used to determine if a Report
       needs to be scheduled and the type of Report to schedule.  The rules
       are considered in order and only the first matching rule is applied.
      
       1. If there is a pending response to a previous General Query
          scheduled sooner than the selected delay, no additional response
          needs to be scheduled.
      
       2. If the received Query is a General Query, the interface timer is
          used to schedule a response to the General Query after the
          selected delay.  Any previously pending response to a General
          Query is canceled.
      --8<--
      
      Currently the timer is rearmed with a new random expiration time for
      every incoming query, regardless of any report that is already pending.
      This does not comply with the RFC text quoted above.
      A high rate of incoming queries can also postpone the report past
      the expiration time of the first query, causing group membership loss.
      
      Now the per-interface general query timer is rearmed only
      when no report is already scheduled on that interface or
      the newly selected expiration time is earlier than that of the
      already pending report.
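
The rearm rule stated above can be sketched as a small userspace model. It is an illustration under assumptions, not the kernel implementation: struct gq_timer and gq_timer_schedule() are hypothetical names, times are plain tick counts, and counter wraparound (which kernel code has to handle when comparing jiffies) is ignored.

```c
#include <stdbool.h>

/* Hypothetical model of the per-interface general query timer. */
struct gq_timer {
	bool pending;            /* is a report already scheduled? */
	unsigned long expires;   /* absolute expiry time in ticks */
};

/*
 * Schedule a report at new_expires. The timer is left alone when a
 * report is already pending and would fire no later than the newly
 * selected time. Returns true if the timer was armed or rearmed.
 */
static bool gq_timer_schedule(struct gq_timer *t, unsigned long new_expires)
{
	if (t->pending && t->expires <= new_expires)
		return false;    /* keep the earlier pending report */

	t->pending = true;
	t->expires = new_expires;
	return true;
}
```

Under this rule a burst of queries can only move the pending report earlier, never later, so the report for the first query is not pushed past its own deadline.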
      Signed-off-by: Michal Tesar <mtesar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      afc5bf35