1. 02 Dec, 2015 6 commits
  2. 01 Dec, 2015 5 commits
    • Eric Dumazet's avatar
      net: fix sock_wake_async() rcu protection · ceb5d58b
      Eric Dumazet authored
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      triggering a fault in sock_wake_async() when async IO is requested.
      
      Said program stressed af_unix sockets, but the issue is generic
      and should be addressed in core networking stack.
      
      The problem is that by the time sock_wake_async() is called,
      we should not access the @flags field of 'struct socket',
      as the inode containing this socket might be freed without
      further notice, and without RCU grace period.
      
      We already maintain an RCU protected structure, "struct socket_wq"
      so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
      is the safe route.
      
      It also reduces number of cache lines needing dirtying, so might
      provide a performance improvement anyway.
      
      In followup patches, we might move remaining flags (SOCK_NOSPACE,
      SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
      being mostly read and let it being shared between cpus.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ceb5d58b
    • Eric Dumazet's avatar
      net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Eric Dumazet authored
      This patch is a cleanup to make following patch easier to
      review.
      
      Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to a (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async()
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int net, struct sock *sk) are added so that
      following patch can change their implementation.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cd3e072
    • Alexey Khoroshilov's avatar
      vmxnet3: fix checks for dma mapping errors · 5738a09d
      Alexey Khoroshilov authored
      vmxnet3_drv does not check dma_addr with dma_mapping_error()
      after mapping dma memory. The patch adds the checks and
      tries to handle failures.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: default avatarShrikrishna Khare <skhare@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5738a09d
    • Peter Hurley's avatar
      wan/x25: Fix use-after-free in x25_asy_open_tty() · ee9159dd
      Peter Hurley authored
      The N_X25 line discipline may access the previous line discipline's closed
      and already-freed private data on open [1].
      
      The tty->disc_data field _never_ refers to valid data on entry to the
      line discipline's open() method. Rather, the ldisc is expected to
      initialize that field for its own use for the lifetime of the instance
      (ie. from open() to close() only).
      
      [1]
          [  634.336761] ==================================================================
          [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
          [  634.339558] Read of size 4 by task syzkaller_execu/8981
          [  634.340359] =============================================================================
          [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
          ...
          [  634.405018] Call Trace:
          [  634.405277] dump_stack (lib/dump_stack.c:52)
          [  634.405775] print_trailer (mm/slub.c:655)
          [  634.406361] object_err (mm/slub.c:662)
          [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
          [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
          [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
          [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
          [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
          [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
          [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
          [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
          [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
      Reported-and-tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee9159dd
    • Nicolas Dichtel's avatar
      Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" · 304d888b
      Nicolas Dichtel authored
      This reverts commit ab450605.
      
      In IPv6, we cannot inherit the dst of the original dst. ndisc packets
      are IPv6 packets and may take another route than the original packet.
      
      This patch breaks the following scenario: a packet comes from eth0 and
      is forwarded through vxlan1. The encapsulated packet triggers an NS
      which cannot be sent because of the wrong route.
      
      CC: Jiri Benc <jbenc@redhat.com>
      CC: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      304d888b
  3. 30 Nov, 2015 12 commits
  4. 25 Nov, 2015 2 commits
    • Daniel Borkmann's avatar
      bpf: fix clearing on persistent program array maps · c9da161c
      Daniel Borkmann authored
      Currently, when having map file descriptors pointing to program arrays,
      there's still the issue that we unconditionally flush program array
      contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens
      when such a file descriptor is released and is independent of the map's
      refcount.
      
      Having this flush independent of the refcount is for a reason: there
      can be arbitrary complex dependency chains among tail calls, also circular
      ones (direct or indirect, nesting limit determined during runtime), and
      we need to make sure that the map drops all references to eBPF programs
      it holds, so that the map's refcount can eventually drop to zero and
      initiate its freeing. Btw, a walk of the whole dependency graph would
      not be possible for various reasons, one being complexity and another
      one inconsistency, i.e. new programs can be added to parts of the graph
      at any time, so there's no guaranteed consistent state for the time of
      such a walk.
      
      Now, the program array pinning itself works, but the issue is that each
      derived file descriptor on close would nevertheless call unconditionally
      into bpf_fd_array_map_clear(). Instead, keep track of users and postpone
      this flush until the last reference to a user is dropped. As this only
      concerns a subset of references (f.e. a prog array could hold a program
      that itself has reference on the prog array holding it, etc), we need to
      track them separately.
      
      Short analysis on the refcounting: on map creation time usercnt will be
      one, so there's no change in behaviour for bpf_map_release(), if unpinned.
      If we already fail in map_create(), we are immediately freed, and no
      file descriptor has been made public yet. In bpf_obj_pin_user(), we need
      to probe for a possible map in bpf_fd_probe_obj() already with a usercnt
      reference, so before we drop the reference on the fd with fdput().
      Therefore, if actual pinning fails, we need to drop that reference again
      in bpf_any_put(), otherwise we keep holding it. When last reference
      drops on the inode, the bpf_any_put() in bpf_evict_inode() will take
      care of dropping the usercnt again. In the bpf_obj_get_user() case, the
      bpf_any_get() will grab a reference on the usercnt, still at a time when
      we have the reference on the path. Should we later on fail to grab a new
      file descriptor, bpf_any_put() will drop it, otherwise we hold it until
      bpf_map_release() time.
      
      Joint work with Alexei.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9da161c
    • Christoph Biedl's avatar
      isdn: Partially revert debug format string usage clean up · 19cebbcb
      Christoph Biedl authored
      Commit 35a4a573 ("isdn: clean up debug format string usage") introduced
      a safeguard to avoid accidential format string interpolation of data
      when calling debugl1 or HiSax_putstatus. This did however not take into
      account VHiSax_putstatus (called by HiSax_putstatus) does *not* call
      vsprintf if the head parameter is NULL - the format string is treated
      as plain text then instead. As a result, the string "%s" is processed
      literally, and the actual information is lost. This affects the isdnlog
      userspace program which stopped logging information since that commit.
      
      So revert the HiSax_putstatus invocations to the previous state.
      
      Fixes: 35a4a573 ("isdn: clean up debug format string usage")
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Signed-off-by: default avatarChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19cebbcb
  5. 24 Nov, 2015 11 commits
    • Quentin Casasnovas's avatar
      RDS: fix race condition when sending a message on unbound socket · 8c7188b2
      Quentin Casasnovas authored
      Sasha's found a NULL pointer dereference in the RDS connection code when
      sending a message to an apparently unbound socket.  The problem is caused
      by the code checking if the socket is bound in rds_sendmsg(), which checks
      the rs_bound_addr field without taking a lock on the socket.  This opens a
      race where rs_bound_addr is temporarily set but where the transport is not
      in rds_bind(), leading to a NULL pointer dereference when trying to
      dereference 'trans' in __rds_conn_create().
      
      Vegard wrote a reproducer for this issue, so kindly ask him to share if
      you're interested.
      
      I cannot reproduce the NULL pointer dereference using Vegard's reproducer
      with this patch, whereas I could without.
      
      Complete earlier incomplete fix to CVE-2015-6937:
      
        74e98eb0 ("RDS: verify the underlying transport exists before creating a connection")
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Reviewed-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c7188b2
    • Aaron Conole's avatar
      net: openvswitch: Remove invalid comment · 20f79566
      Aaron Conole authored
      During pre-upstream development, the openvswitch datapath used a custom
      hashtable to store vports that could fail on delete due to lack of
      memory. However, prior to upstream submission, this code was reworked to
      use an hlist based hastable with flexible-array based buckets. As such
      the failure condition was eliminated from the vport_del path, rendering
      this comment invalid.
      Signed-off-by: default avatarAaron Conole <aconole@bytheb.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      20f79566
    • Nikolay Aleksandrov's avatar
      net: ipmr, ip6mr: fix vif/tunnel failure race condition · fbdd29bf
      Nikolay Aleksandrov authored
      Since (at least) commit b17a7c17 ("[NET]: Do sysfs registration as
      part of register_netdevice."), netdev_run_todo() deals only with
      unregistration, so we don't need to do the rtnl_unlock/lock cycle to
      finish registration when failing pimreg or dvmrp device creation. In
      fact that opens a race condition where someone can delete the device
      while rtnl is unlocked because it's fully registered. The problem gets
      worse when netlink support is introduced as there are more points of entry
      that can cause it and it also makes reusing that code correctly impossible.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fbdd29bf
    • David Howells's avatar
      rxrpc: Correctly handle ack at end of client call transmit phase · 33c40e24
      David Howells authored
      Normally, the transmit phase of a client call is implicitly ack'd by the
      reception of the first data packet of the response being received.
      However, if a security negotiation happens, the transmit phase, if it is
      entirely contained in a single packet, may get an ack packet in response
      and then may get aborted due to security negotiation failure.
      
      Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due
      to having transmitted all the data, the code that handles processing of the
      received ack packet doesn't note the hard ack the data packet.
      
      The following abort packet in the case of security negotiation failure then
      incurs an assertion failure when it tries to drain the Tx queue because the
      hard ack state is out of sync (hard ack means the packets have been
      processed and can be discarded by the sender; a soft ack means that the
      packets are received but could still be discarded and rerequested by the
      receiver).
      
      To fix this, we should record the hard ack we received for the ack packet.
      
      The assertion failure looks like:
      
      	RxRPC: Assertion failed
      	1 <= 0 is false
      	0x1 <= 0x0 is false
      	------------[ cut here ]------------
      	kernel BUG at ../net/rxrpc/ar-ack.c:431!
      	...
      	RIP: 0010:[<ffffffffa006857b>]  [<ffffffffa006857b>] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc]
      	...
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33c40e24
    • Michal Kubeček's avatar
      ipv6: distinguish frag queues by device for multicast and link-local packets · 264640fc
      Michal Kubeček authored
      If a fragmented multicast packet is received on an ethernet device which
      has an active macvlan on top of it, each fragment is duplicated and
      received both on the underlying device and the macvlan. If some
      fragments for macvlan are processed before the whole packet for the
      underlying device is reassembled, the "overlapping fragments" test in
      ip6_frag_queue() discards the whole fragment queue.
      
      To resolve this, add device ifindex to the search key and require it to
      match reassembling multicast packets and packets to link-local
      addresses.
      
      Note: similar patch has been already submitted by Yoshifuji Hideaki in
      
        http://patchwork.ozlabs.org/patch/220979/
      
      but got lost and forgotten for some reason.
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      264640fc
    • Iyappan Subramanian's avatar
      drivers: net: xgene: fix: ifconfig up/down crash · aeb20b6b
      Iyappan Subramanian authored
      Fixing kernel crash when doing ifconfig down and up in a loop,
      
      [ 124.028237] Call trace:
      [ 124.030670] [<ffffffc000367ce0>] memcpy+0x20/0x180
      [ 124.035436] [<ffffffc00053c250>] skb_clone+0x3c/0xa8
      [ 124.040374] [<ffffffc00053ffa4>] __skb_tstamp_tx+0xc0/0x118
      [ 124.045918] [<ffffffc00054000c>] skb_tstamp_tx+0x10/0x1c
      [ 124.051203] [<ffffffc00049bc84>] xgene_enet_start_xmit+0x2e4/0x33c
      [ 124.057352] [<ffffffc00054fc20>] dev_hard_start_xmit+0x2e8/0x400
      [ 124.063327] [<ffffffc00056cb14>] sch_direct_xmit+0x90/0x1d4
      [ 124.068870] [<ffffffc000550100>] __dev_queue_xmit+0x28c/0x498
      [ 124.074585] [<ffffffc00055031c>] dev_queue_xmit_sk+0x10/0x1c
      [ 124.080216] [<ffffffc0005c3f14>] ip_finish_output2+0x3d0/0x438
      [ 124.086017] [<ffffffc0005c5794>] ip_finish_output+0x198/0x1ac
      [ 124.091732] [<ffffffc0005c61d4>] ip_output+0xec/0x164
      [ 124.096755] [<ffffffc0005c5910>] ip_local_out_sk+0x38/0x48
      [ 124.102211] [<ffffffc0005c5d84>] ip_queue_xmit+0x288/0x330
      [ 124.107668] [<ffffffc0005da8bc>] tcp_transmit_skb+0x908/0x964
      [ 124.113383] [<ffffffc0005dc0d4>] tcp_send_ack+0x128/0x138
      [ 124.118753] [<ffffffc0005d1580>] __tcp_ack_snd_check+0x5c/0x94
      [ 124.124555] [<ffffffc0005d7a0c>] tcp_rcv_established+0x554/0x68c
      [ 124.130530] [<ffffffc0005df0d4>] tcp_v4_do_rcv+0xa4/0x37c
      [ 124.135900] [<ffffffc000539430>] release_sock+0xb4/0x150
      [ 124.141184] [<ffffffc0005cdf88>] tcp_recvmsg+0x448/0x9e0
      [ 124.146468] [<ffffffc0005f2f3c>] inet_recvmsg+0xa0/0xc0
      [ 124.151666] [<ffffffc000533660>] sock_recvmsg+0x10/0x1c
      [ 124.156863] [<ffffffc0005370d4>] SyS_recvfrom+0xa4/0xf8
      [ 124.162061] Code: f2400c84 540001c0 cb040042 36000064 (38401423)
      [ 124.168133] ---[ end trace 7ab2550372e8a65b ]---
      
      The fix was to reorder napi_enable, napi_disable, request_irq and
      free_irq calls, move register_netdev after dma_coerce_mask_and_coherent.
      Signed-off-by: default avatarIyappan Subramanian <isubramanian@apm.com>
      Tested-by: default avatarKhuong Dinh <kdinh@apm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aeb20b6b
    • Bjørn Mork's avatar
      net: cdc_ncm: fix NULL pointer deref in cdc_ncm_bind_common · 6527f833
      Bjørn Mork authored
      Commit 77b0a099 ("cdc-ncm: use common parser") added a dangerous
      new trust in the CDC functional descriptors presented by the device,
      unconditionally assuming that any device handled by the driver has
      a CDC Union descriptor.
      
      This descriptor is required by the NCM and MBIM specs, but crashing
      on non-compliant devices is still unacceptable. Not only will that
      allow malicious devices to crash the kernel, but in this case it is
      also well known that there are non-compliant real devices on the
      market - as shown by the comment accompanying the IAD workaround
      in the same function.
      
      The Sierra Wireless EM7305 is an example of such device, having
      a CDC header and a CDC MBIM descriptor but no CDC Union:
      
          Interface Descriptor:
            bLength                 9
            bDescriptorType         4
            bInterfaceNumber       12
            bAlternateSetting       0
            bNumEndpoints           1
            bInterfaceClass         2 Communications
            bInterfaceSubClass     14
            bInterfaceProtocol      0
            iInterface              0
            CDC Header:
              bcdCDC               1.10
            CDC MBIM:
              bcdMBIMVersion       1.00
              wMaxControlMessage   4096
              bNumberFilters       16
              bMaxFilterSize       128
              wMaxSegmentSize      4064
              bmNetworkCapabilities 0x20
                8-byte ntb input size
            Endpoint Descriptor:
      	..
      
      The conversion to a common parser also left the local cdc_union
      variable untouched.  This caused the IAD workaround code to be applied
      to all devices with an IAD descriptor, which was never intended.  Finish
      the conversion by testing for hdr.usb_cdc_union_desc instead.
      
      Cc: Oliver Neukum <oneukum@suse.com>
      Fixes: 77b0a099 ("cdc-ncm: use common parser")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6527f833
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-4.4-20151123' of... · 90bb81f3
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.4-20151123' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2015-11-23
      
      this is a pull request of three patches for the upcoming v4.4 release.
      
      The first patch is by Mirza Krak, it fixes a problem with the sja1000 driver
      after resuming from suspend to disk, by clearing all outstanding interrupts.
      Oliver Hartkopp contributes two patches targeting almost all driver, they fix
      the assignment of the error location in CAN error messages.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90bb81f3
    • Ying Xue's avatar
      tipc: fix error handling of expanding buffer headroom · 7098356b
      Ying Xue authored
      Coverity says:
      
      *** CID 1338065:  Error handling issues  (CHECKED_RETURN)
      /net/tipc/udp_media.c: 162 in tipc_udp_send_msg()
      156     	struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
      157     	struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
      158     	struct sk_buff *clone;
      159     	struct rtable *rt;
      160
      161     	if (skb_headroom(skb) < UDP_MIN_HEADROOM)
      >>>     CID 1338065:  Error handling issues  (CHECKED_RETURN)
      >>>     Calling "pskb_expand_head" without checking return value (as is done elsewhere 51 out of 56 times).
      162     		pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
      163
      164     	clone = skb_clone(skb, GFP_ATOMIC);
      165     	skb_set_inner_protocol(clone, htons(ETH_P_TIPC));
      166     	ub = rcu_dereference_rtnl(b->media_ptr);
      167     	if (!ub) {
      
      When expanding buffer headroom over udp tunnel with pskb_expand_head(),
      it's unfortunate that we don't check its return value. As a result, if
      the function returns an error code due to the lack of memory, it may
      cause unpredictable consequence as we unconditionally consider that
      it's always successful.
      
      Fixes: e5356794 ("tipc: conditionally expand buffer headroom over udp tunnel")
      Reported-by: <scan-admin@coverity.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7098356b
    • Ying Xue's avatar
      tipc: avoid packets leaking on socket receive queue · f4195d1e
      Ying Xue authored
      Even if we drain receive queue thoroughly in tipc_release() after tipc
      socket is removed from rhashtable, it is possible that some packets
      are in flight because some CPU runs receiver and did rhashtable lookup
      before we removed socket. They will achieve receive queue, but nobody
      delete them at all. To avoid this leak, we register a private socket
      destructor to purge receive queue, meaning releasing packets pending
      on receive queue will be delayed until the last reference of tipc
      socket will be released.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4195d1e
    • Aaro Koskinen's avatar
      broadcom: fix PHY_ID_BCM5481 entry in the id table · 3c25a860
      Aaro Koskinen authored
      Commit fcb26ec5 ("broadcom: move all PHY_ID's to header")
      updated broadcom_tbl to use PHY_IDs, but incorrectly replaced 0x0143bca0
      with PHY_ID_BCM5482 (making a duplicate entry, and completely omitting
      the original). Fix that.
      
      Fixes: fcb26ec5 ("broadcom: move all PHY_ID's to header")
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3c25a860
  6. 23 Nov, 2015 4 commits
    • Nikolay Aleksandrov's avatar
      vrf: fix double free and memory corruption on register_netdevice failure · 7f109f7c
      Nikolay Aleksandrov authored
      When vrf's ->newlink is called, if register_netdevice() fails then it
      does free_netdev(), but that's also done by rtnl_newlink() so a second
      free happens and memory gets corrupted, to reproduce execute the
      following line a couple of times (1 - 5 usually is enough):
      $ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
      This works because we fail in register_netdevice() because of the wrong
      name "vrf:".
      
      And here's a trace of one crash:
      [   28.792157] ------------[ cut here ]------------
      [   28.792407] kernel BUG at fs/namei.c:246!
      [   28.792608] invalid opcode: 0000 [#1] SMP
      [   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry
      nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul
      crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse
      glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev
      parport_pc parport serio_raw pcspkr virtio_balloon virtio_console
      i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4
      ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom
      ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix
      libata virtio_pci virtio_ring virtio scsi_mod floppy
      [   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted
      4.4.0-rc1+ #24
      [   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.8.1-20150318_183358- 04/01/2014
      [   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti:
      ffff88003592c000
      [   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>]
      putname+0x43/0x60
      [   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
      [   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX:
      0000000000000001
      [   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
      ffff88003784f000
      [   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09:
      0000000000000000
      [   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12:
      0000000000000000
      [   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15:
      ffff8800358c4a00
      [   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
      knlGS:0000000000000000
      [   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4:
      00000000000406f0
      [   28.796016] Stack:
      [   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0
      ffff880035a91660
      [   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940
      00ffffff81218684
      [   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000
      ffff880035b36d80
      [   28.796016] Call Trace:
      [   28.796016]  [<ffffffff8121045d>] ?
      do_execveat_common.isra.34+0x74d/0x930
      [   28.796016]  [<ffffffff812102d3>] ?
      do_execveat_common.isra.34+0x5c3/0x930
      [   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
      [   28.796016]  [<ffffffff810939a0>]
      call_usermodehelper_exec_async+0xf0/0x140
      [   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
      [   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
      [   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6
      74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d
      f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9
      [   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
      [   28.796016]  RSP <ffff88003592fe88>
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f109f7c
    • Dan Carpenter's avatar
      net/hsr: fix a warning message · 3d1a54e8
      Dan Carpenter authored
      WARN_ON_ONCE() takes a condition, it doesn't take an error message.  I
      have converted this to WARN() instead.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d1a54e8
    • Rainer Weikusat's avatar
      unix: avoid use-after-free in ep_remove_wait_queue · 7d267278
      Rainer Weikusat authored
      Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
      An AF_UNIX datagram socket being the client in an n:1 association with
      some server socket is only allowed to send messages to the server if the
      receive queue of this socket contains at most sk_max_ack_backlog
      datagrams. This implies that prospective writers might be forced to go
      to sleep despite none of the message presently enqueued on the server
      receive queue were sent by them. In order to ensure that these will be
      woken up once space becomes again available, the present unix_dgram_poll
      routine does a second sock_poll_wait call with the peer_wait wait queue
      of the server socket as queue argument (unix_dgram_recvmsg does a wake
      up on this queue after a datagram was received). This is inherently
      problematic because the server socket is only guaranteed to remain alive
      for as long as the client still holds a reference to it. In case the
      connection is dissolved via connect or by the dead peer detection logic
      in unix_dgram_sendmsg, the server socket may be freed despite "the
      polling mechanism" (in particular, epoll) still has a pointer to the
      corresponding peer_wait queue. There's no way to forcibly deregister a
      wait queue with epoll.
      
      Based on an idea by Jason Baron, the patch below changes the code such
      that a wait_queue_t belonging to the client socket is enqueued on the
      peer_wait queue of the server whenever the peer receive queue full
      condition is detected by either a sendmsg or a poll. A wake up on the
      peer queue is then relayed to the ordinary wait queue of the client
      socket via wake function. The connection to the peer wait queue is again
      dissolved if either a wake up is about to be relayed or the client
      socket reconnects or a dead peer is detected or the client socket is
      itself closed. This enables removing the second sock_poll_wait from
      unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
      that no blocked writer sleeps forever.
      Signed-off-by: default avatarRainer Weikusat <rweikusat@mobileactivedefense.com>
      Fixes: ec0d215f ("af_unix: fix 'poll for write'/connected DGRAM sockets")
      Reviewed-by: default avatarJason Baron <jbaron@akamai.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d267278
    • Nina Schiff's avatar
      cgroups: Allow dynamically changing net_classid · 3b13758f
      Nina Schiff authored
      The classid of a process is changed either when a process is moved to
      or from a cgroup or when the net_cls.classid file is updated.
      Previously net_cls only supported propogating these changes to the
      cgroup's related sockets when a process was added or removed from the
      cgroup. This means it was neccessary to remove and re-add all processes
      to a cgroup in order to update its classid. This change introduces
      support for doing this dynamically - i.e. when the value is changed in
      the net_cls_classid file, this will also trigger an update to the
      classid associated with all sockets controlled by the cgroup.
      This mimics the behaviour of other cgroup subsystems.
      net_prio circumvents this issue by storing an index into a table with
      each socket (and so any updates to the table, don't require updating
      the value associated with the socket). net_cls, however, passes the
      socket the classid directly, and so this additional step is needed.
      Signed-off-by: default avatarNina Schiff <ninasc@fb.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3b13758f