1. 03 May, 2017 29 commits
  2. 30 Apr, 2017 11 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.65 · 418b9904
      Greg Kroah-Hartman authored
      418b9904
    • Peter Zijlstra's avatar
      perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race · 416bd4a3
      Peter Zijlstra authored
      commit 321027c1 upstream.
      
      Di Shen reported a race between two concurrent sys_perf_event_open()
      calls where both try and move the same pre-existing software group
      into a hardware context.
      
      The problem is exactly that described in commit:
      
        f63a8daa ("perf: Fix event->ctx locking")
      
      ... where, while we wait for a ctx->mutex acquisition, the event->ctx
      relation can have changed under us.
      
      That very same commit failed to recognise sys_perf_event_context() as an
      external access vector to the events and thereby didn't apply the
      established locking rules correctly.
      
      So while one sys_perf_event_open() call is stuck waiting on
      mutex_lock_double(), the other (which owns said locks) moves the group
      about. So by the time the former sys_perf_event_open() acquires the
      locks, the context we've acquired is stale (and possibly dead).
      
      Apply the established locking rules as per perf_event_ctx_lock_nested()
      to the mutex_lock_double() for the 'move_group' case. This obviously means
      we need to validate state after we acquire the locks.
      
      Reported-by: Di Shen (Keen Lab)
      Tested-by: default avatarJohn Dias <joaodias@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Min Chong <mchong@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: f63a8daa ("perf: Fix event->ctx locking")
      Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 4.4:
       - Test perf_event::group_flags instead of group_caps
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      416bd4a3
    • Eric Dumazet's avatar
      ping: implement proper locking · b7f47c79
      Eric Dumazet authored
      commit 43a66845 upstream.
      
      We got a report of yet another bug in ping
      
      http://www.openwall.com/lists/oss-security/2017/03/24/6
      
      ->disconnect() is not called with socket lock held.
      
      Fix this by acquiring ping rwlock earlier.
      
      Thanks to Daniel, Alexander and Andrey for letting us know this problem.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDaniel Jiang <danieljiang0415@gmail.com>
      Reported-by: default avatarSolar Designer <solar@openwall.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7f47c79
    • EunTaik Lee's avatar
      staging/android/ion : fix a race condition in the ion driver · a7544fdd
      EunTaik Lee authored
      commit 9590232b upstream.
      
      There is a use-after-free problem in the ion driver.
      This is caused by a race condition in the ion_ioctl()
      function.
      
      A handle has ref count of 1 and two tasks on different
      cpus calls ION_IOC_FREE simultaneously.
      
      cpu 0                                   cpu 1
      -------------------------------------------------------
      ion_handle_get_by_id()
      (ref == 2)
                                  ion_handle_get_by_id()
                                  (ref == 3)
      
      ion_free()
      (ref == 2)
      
      ion_handle_put()
      (ref == 1)
      
                                  ion_free()
                                  (ref == 0 so ion_handle_destroy() is
                                  called
                                  and the handle is freed.)
      
                                  ion_handle_put() is called and it
                                  decreases the slub's next free pointer
      
      The problem is detected as an unaligned access in the
      spin lock functions since it uses load exclusive
       instruction. In some cases it corrupts the slub's
      free pointer which causes a mis-aligned access to the
      next free pointer.(kmalloc returns a pointer like
      ffffc0745b4580aa). And it causes lots of other
      hard-to-debug problems.
      
      This symptom is caused since the first member in the
      ion_handle structure is the reference count and the
      ion driver decrements the reference after it has been
      freed.
      
      To fix this problem client->lock mutex is extended
      to protect all the codes that uses the handle.
      Signed-off-by: default avatarEun Taik Lee <eun.taik.lee@samsung.com>
      Reviewed-by: default avatarLaura Abbott <labbott@redhat.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      index 7ff2a7ec871f..33b390e7ea31
      a7544fdd
    • Vlad Tsyrklevich's avatar
      vfio/pci: Fix integer overflows, bitmask check · d23ef85b
      Vlad Tsyrklevich authored
      commit 05692d70 upstream.
      
      The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
      user-supplied integers, potentially allowing memory corruption. This
      patch adds appropriate integer overflow checks, checks the range bounds
      for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
      in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
      VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
      vfio_pci_set_irqs_ioctl().
      
      Furthermore, a kzalloc is changed to a kcalloc because the use of a
      kzalloc with an integer multiplication allowed an integer overflow
      condition to be reached without this patch. kcalloc checks for overflow
      and should prevent a similar occurrence.
      Signed-off-by: default avatarVlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d23ef85b
    • Michal Kubeček's avatar
      tipc: check minimum bearer MTU · 65d30f75
      Michal Kubeček authored
      commit 3de81b75 upstream.
      
      Qian Zhang (张谦) reported a potential socket buffer overflow in
      tipc_msg_build() which is also known as CVE-2016-8632: due to
      insufficient checks, a buffer overflow can occur if MTU is too short for
      even tipc headers. As anyone can set device MTU in a user/net namespace,
      this issue can be abused by a regular user.
      
      As agreed in the discussion on Ben Hutchings' original patch, we should
      check the MTU at the moment a bearer is attached rather than for each
      processed packet. We also need to repeat the check when bearer MTU is
      adjusted to new device MTU. UDP case also needs a check to avoid
      overflow when calculating bearer MTU.
      
      Fixes: b97bf3fd ("[TIPC] Initial merge")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Reported-by: default avatarQian Zhang (张谦) <zhangqian-c@360.cn>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 4.4:
       - Adjust context
       - NETDEV_GOING_DOWN and NETDEV_CHANGEMTU cases in net notifier were combined]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65d30f75
    • Phil Turnbull's avatar
      netfilter: nfnetlink: correctly validate length of batch messages · 9540baad
      Phil Turnbull authored
      commit c58d6c93 upstream.
      
      If nlh->nlmsg_len is zero then an infinite loop is triggered because
      'skb_pull(skb, msglen);' pulls zero bytes.
      
      The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len <
      NLMSG_HDRLEN' which bypasses the length validation and will later
      trigger an out-of-bound read.
      
      If the length validation does fail then the malformed batch message is
      copied back to userspace. However, we cannot do this because the
      nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in
      netlink_ack:
      
          [   41.455421] ==================================================================
          [   41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr ffff880119e79340
          [   41.456431] Read of size 4294967280 by task a.out/987
          [   41.456431] =============================================================================
          [   41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected
          [   41.456431] -----------------------------------------------------------------------------
          ...
          [   41.456431] Bytes b4 ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00  ................
          [   41.456431] Object ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00   ...............
          [   41.456431] Object ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05  .......@EV."3...
          [   41.456431] Object ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb  ................
                                                  ^^ start of batch nlmsg with
                                                     nlmsg_len=4294967280
          ...
          [   41.456431] Memory state around the buggy address:
          [   41.456431]  ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          [   41.456431]  ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          [   41.456431] >ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
          [   41.456431]                                ^
          [   41.456431]  ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          [   41.456431]  ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
          [   41.456431] ==================================================================
      
      Fix this with better validation of nlh->nlmsg_len and by setting
      NFNL_BATCH_FAILURE if any batch message fails length validation.
      
      CAP_NET_ADMIN is required to trigger the bugs.
      
      Fixes: 9ea2aa8b ("netfilter: nfnetlink: validate nfnetlink header from batch")
      Signed-off-by: default avatarPhil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9540baad
    • Mauro Carvalho Chehab's avatar
      xc2028: avoid use after free · 0d9dac5d
      Mauro Carvalho Chehab authored
      commit 8dfbcc43 upstream.
      
      If struct xc2028_config is passed without a firmware name,
      the following trouble may happen:
      
      [11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
      [11009.907491] ==================================================================
      [11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40
      [11009.907992] Read of size 1 by task modprobe/28992
      [11009.907994] =============================================================================
      [11009.907997] BUG kmalloc-16 (Tainted: G        W      ): kasan: bad access detected
      [11009.907999] -----------------------------------------------------------------------------
      
      [11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992
      [11009.908012] 	___slab_alloc+0x581/0x5b0
      [11009.908014] 	__slab_alloc+0x51/0x90
      [11009.908017] 	__kmalloc+0x27b/0x350
      [11009.908022] 	xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd]
      [11009.908026] 	usb_hcd_submit_urb+0x1e8/0x1c60
      [11009.908029] 	usb_submit_urb+0xb0e/0x1200
      [11009.908032] 	usb_serial_generic_write_start+0xb6/0x4c0
      [11009.908035] 	usb_serial_generic_write+0x92/0xc0
      [11009.908039] 	usb_console_write+0x38a/0x560
      [11009.908045] 	call_console_drivers.constprop.14+0x1ee/0x2c0
      [11009.908051] 	console_unlock+0x40d/0x900
      [11009.908056] 	vprintk_emit+0x4b4/0x830
      [11009.908061] 	vprintk_default+0x1f/0x30
      [11009.908064] 	printk+0x99/0xb5
      [11009.908067] 	kasan_report_error+0x10a/0x550
      [11009.908070] 	__asan_report_load1_noabort+0x43/0x50
      [11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992
      [11009.908077] 	__slab_free+0x2ec/0x460
      [11009.908080] 	kfree+0x266/0x280
      [11009.908083] 	xc2028_set_config+0x90/0x630 [tuner_xc2028]
      [11009.908086] 	xc2028_attach+0x310/0x8a0 [tuner_xc2028]
      [11009.908090] 	em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
      [11009.908094] 	em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
      [11009.908098] 	em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
      [11009.908101] 	em28xx_register_extension+0xd9/0x190 [em28xx]
      [11009.908105] 	em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
      [11009.908108] 	do_one_initcall+0x141/0x300
      [11009.908111] 	do_init_module+0x1d0/0x5ad
      [11009.908114] 	load_module+0x6666/0x9ba0
      [11009.908117] 	SyS_finit_module+0x108/0x130
      [11009.908120] 	entry_SYSCALL_64_fastpath+0x16/0x76
      [11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x          (null) flags=0x2ffff8000004080
      [11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001
      
      [11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00  ....*....(......
      [11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff  ...........j....
      [11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G    B   W       4.5.0-rc1+ #43
      [11009.908140] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
      [11009.908142]  ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80
      [11009.908148]  ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280
      [11009.908153]  ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4
      [11009.908158] Call Trace:
      [11009.908162]  [<ffffffff81932007>] dump_stack+0x4b/0x64
      [11009.908165]  [<ffffffff81556759>] print_trailer+0xf9/0x150
      [11009.908168]  [<ffffffff8155ccb4>] object_err+0x34/0x40
      [11009.908171]  [<ffffffff8155f260>] kasan_report_error+0x230/0x550
      [11009.908175]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
      [11009.908179]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908182]  [<ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
      [11009.908185]  [<ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
      [11009.908189]  [<ffffffff8194cea6>] ? strcmp+0x96/0xb0
      [11009.908192]  [<ffffffff8194cea6>] strcmp+0x96/0xb0
      [11009.908196]  [<ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
      [11009.908200]  [<ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
      [11009.908203]  [<ffffffff8155ea78>] ? memset+0x28/0x30
      [11009.908206]  [<ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
      [11009.908211]  [<ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
      [11009.908215]  [<ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
      [11009.908219]  [<ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
      [11009.908222]  [<ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
      [11009.908226]  [<ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
      [11009.908230]  [<ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
      [11009.908233]  [<ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
      [11009.908238]  [<ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
      [11009.908242]  [<ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
      [11009.908245]  [<ffffffff8195222d>] ? string+0x14d/0x1f0
      [11009.908249]  [<ffffffff8195381f>] ? symbol_string+0xff/0x1a0
      [11009.908253]  [<ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
      [11009.908257]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
      [11009.908260]  [<ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
      [11009.908264]  [<ffffffff812e9846>] ? __module_address+0xb6/0x360
      [11009.908268]  [<ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
      [11009.908271]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
      [11009.908275]  [<ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
      [11009.908278]  [<ffffffff8104a24b>] ? dump_trace+0x11b/0x300
      [11009.908282]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
      [11009.908285]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
      [11009.908289]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
      [11009.908292]  [<ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
      [11009.908296]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
      [11009.908299]  [<ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
      [11009.908302]  [<ffffffff810021a1>] ? do_one_initcall+0x131/0x300
      [11009.908306]  [<ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
      [11009.908309]  [<ffffffff8159e708>] ? put_object+0x48/0x70
      [11009.908314]  [<ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
      [11009.908317]  [<ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
      [11009.908320]  [<ffffffffa0150000>] ? 0xffffffffa0150000
      [11009.908324]  [<ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
      [11009.908327]  [<ffffffff810021b1>] do_one_initcall+0x141/0x300
      [11009.908330]  [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
      [11009.908333]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
      [11009.908337]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908340]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908343]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908346]  [<ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
      [11009.908350]  [<ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
      [11009.908353]  [<ffffffff812f2626>] load_module+0x6666/0x9ba0
      [11009.908356]  [<ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
      [11009.908361]  [<ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
      [11009.908366]  [<ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
      [11009.908369]  [<ffffffff815bc940>] ? open_exec+0x50/0x50
      [11009.908374]  [<ffffffff811671bb>] ? ns_capable+0x5b/0xd0
      [11009.908377]  [<ffffffff812f5e58>] SyS_finit_module+0x108/0x130
      [11009.908379]  [<ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
      [11009.908383]  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
      [11009.908394]  [<ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
      [11009.908396] Memory state around the buggy address:
      [11009.908398]  ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908401]  ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
      [11009.908405]                                            ^
      [11009.908407]  ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908409]  ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908411] ==================================================================
      
      In order to avoid it, let's set the cached value of the firmware
      name to NULL after freeing it. While here, return an error if
      the memory allocation fails.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d9dac5d
    • Eric W. Biederman's avatar
      mnt: Add a per mount namespace limit on the number of mounts · c50fd34e
      Eric W. Biederman authored
      commit d2921684 upstream.
      
      CAI Qian <caiqian@redhat.com> pointed out that the semantics
      of shared subtrees make it possible to create an exponentially
      increasing number of mounts in a mount namespace.
      
          mkdir /tmp/1 /tmp/2
          mount --make-rshared /
          for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done
      
      Will create create 2^20 or 1048576 mounts, which is a practical problem
      as some people have managed to hit this by accident.
      
      As such CVE-2016-6213 was assigned.
      
      Ian Kent <raven@themaw.net> described the situation for autofs users
      as follows:
      
      > The number of mounts for direct mount maps is usually not very large because of
      > the way they are implemented, large direct mount maps can have performance
      > problems. There can be anywhere from a few (likely case a few hundred) to less
      > than 10000, plus mounts that have been triggered and not yet expired.
      >
      > Indirect mounts have one autofs mount at the root plus the number of mounts that
      > have been triggered and not yet expired.
      >
      > The number of autofs indirect map entries can range from a few to the common
      > case of several thousand and in rare cases up to between 30000 and 50000. I've
      > not heard of people with maps larger than 50000 entries.
      >
      > The larger the number of map entries the greater the possibility for a large
      > number of active mounts so it's not hard to expect cases of a 1000 or somewhat
      > more active mounts.
      
      So I am setting the default number of mounts allowed per mount
      namespace at 100,000.  This is more than enough for any use case I
      know of, but small enough to quickly stop an exponential increase
      in mounts.  Which should be perfect to catch misconfigurations and
      malfunctioning programs.
      
      For anyone who needs a higher limit this can be changed by writing
      to the new /proc/sys/fs/mount-max sysctl.
      Tested-by: default avatarCAI Qian <caiqian@redhat.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      [bwh: Backported to 4.4: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c50fd34e
    • Jon Paul Maloy's avatar
      tipc: fix socket timer deadlock · 59e0cd11
      Jon Paul Maloy authored
      commit f1d048f2 upstream.
      
      We sometimes observe a 'deadly embrace' type deadlock occurring
      between mutually connected sockets on the same node. This happens
      when the one-hour peer supervision timers happen to expire
      simultaneously in both sockets.
      
      The scenario is as follows:
      
      CPU 1:                          CPU 2:
      --------                        --------
      tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
        lock(sk1.slock)                 lock(sk2.slock)
        msg_create(probe)               msg_create(probe)
        unlock(sk1.slock)               unlock(sk2.slock)
        tipc_node_xmit_skb()            tipc_node_xmit_skb()
          tipc_node_xmit()                tipc_node_xmit()
            tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
              lock(sk2.slock)                 lock((sk1.slock)
              filter_rcv()                    filter_rcv()
                tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
                  msg_create(probe_rsp)           msg_create(probe_rsp)
                  tipc_sk_respond()               tipc_sk_respond()
                    tipc_node_xmit_skb()            tipc_node_xmit_skb()
                      tipc_node_xmit()                tipc_node_xmit()
                        tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                          lock((sk1.slock)                lock((sk2.slock)
                          ===> DEADLOCK                   ===> DEADLOCK
      
      Further analysis reveals that there are three different locations in the
      socket code where tipc_sk_respond() is called within the context of the
      socket lock, with ensuing risk of similar deadlocks.
      
      We now solve this by passing a buffer queue along with all upcalls where
      sk_lock.slock may potentially be held. Response or rejected message
      buffers are accumulated into this queue instead of being sent out
      directly, and only sent once we know we are safely outside the slock
      context.
      Reported-by: default avatarGUNA <gbalasun@gmail.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59e0cd11
    • Parthasarathy Bhuvaragan's avatar
      tipc: fix random link resets while adding a second bearer · abc025d1
      Parthasarathy Bhuvaragan authored
      commit d2f394dc upstream.
      
      In a dual bearer configuration, if the second tipc link becomes
      active while the first link still has pending nametable "bulk"
      updates, it randomly leads to reset of the second link.
      
      When a link is established, the function named_distribute(),
      fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
      with NAME_DISTRIBUTOR message for each PUBLICATION.
      However, the function named_distribute() allocates the buffer by
      increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
      This consumes the space allocated for TUNNEL_PROTOCOL.
      
      When establishing the second link, the link shall tunnel all the
      messages in the first link queue including the "bulk" update.
      As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
      the link mtu the transmission fails (-EMSGSIZE).
      
      Thus, the synch point based on the message count of the tunnel
      packets is never reached leading to link timeout.
      
      In this commit, we adjust the size of name distributor message so that
      they can be tunnelled.
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abc025d1