1. 03 Dec, 2015 9 commits
    • Eric Dumazet's avatar
      ipv6: add complete rcu protection around np->opt · 45f6fad8
      Eric Dumazet authored
      This patch addresses multiple problems :
      
      UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
      while socket is not locked : Other threads can change np->opt
      concurrently. Dmitry posted a syzkaller
      (http://github.com/google/syzkaller) program desmonstrating
      use-after-free.
      
      Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
      and dccp_v6_request_recv_sock() also need to use RCU protection
      to dereference np->opt once (before calling ipv6_dup_options())
      
      This patch adds full RCU protection to np->opt
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45f6fad8
    • Alexei Starovoitov's avatar
      bpf: fix allocation warnings in bpf maps and integer overflow · 01b3f521
      Alexei Starovoitov authored
      For large map->value_size the user space can trigger memory allocation warnings like:
      WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
      __alloc_pages_nodemask+0x695/0x14e0()
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50
       [<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
       [<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
       [<     inline     >] __alloc_pages_slowpath mm/page_alloc.c:2989
       [<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
       [<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
       [<     inline     >] alloc_pages include/linux/gfp.h:451
       [<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
       [<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
       [<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
       [<     inline     >] kmalloc_large include/linux/slab.h:390
       [<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525
       [<     inline     >] kmalloc include/linux/slab.h:463
       [<     inline     >] map_update_elem kernel/bpf/syscall.c:288
       [<     inline     >] SYSC_bpf kernel/bpf/syscall.c:744
      
      To avoid never succeeding kmalloc with order >= MAX_ORDER check that
      elem->value_size and computed elem_size are within limits for both hash and
      array type maps.
      Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
      Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512,
      so keep those kmalloc-s as-is.
      
      Large value_size can cause integer overflows in elem_size and map.pages
      formulas, so check for that as well.
      
      Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01b3f521
    • David S. Miller's avatar
      Merge branch 'mvneta-fixes' · a64f3f83
      David S. Miller authored
      Marcin Wojtas says:
      
      ====================
      Marvell Armada XP/370/38X Neta fixes
      
      I'm sending v4 with corrected commit log of the last patch, in order to
      avoid possible conflicts between the branches as suggested by Gregory
      Clement.
      
      Best regards,
      Marcin Wojtas
      
      Changes from v4:
      * Correct commit log of patch 6/6
      
      Changes from v2:
      * Style fixes in patch updating mbus protection
      * Remove redundant stable notifications except for patch 4/6
      
      Changes from v1:
      * update MBUS windows access protection register once, after whole loop
      * add fixing value of MVNETA_RXQ_INTR_ENABLE_ALL_MASK
      * add fixing error path for skb_build()
      * add possibility of setting custom TX IP checksum limit in DT property
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a64f3f83
    • Marcin Wojtas's avatar
      mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0 · c4a25007
      Marcin Wojtas authored
      The Ethernet controller found in the Armada 38x SoC's family support
      TCP/IP checksumming with frame sizes larger than 1600 bytes, however
      only on port 0.
      
      This commit enables it by setting 'tx-csum-limit' to 9800B in
      'ethernet@70000' node.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4a25007
    • Marcin Wojtas's avatar
      net: mvneta: enable setting custom TX IP checksum limit · 9110ee07
      Marcin Wojtas authored
      Since Armada 38x SoC can support IP checksum for jumbo frames only on
      a single port, it means that this feature should be enabled per-port,
      rather than for the whole SoC.
      
      This patch enables setting custom TX IP checksum limit by adding new
      optional property to the mvneta device tree node. If not used, by
      default 1600B is set for "marvell,armada-370-neta" and 9800B for other
      strings, which ensures backward compatibility. Binding documentation
      is updated accordingly.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9110ee07
    • Marcin Wojtas's avatar
      net: mvneta: fix error path for building skb · 26c17a17
      Marcin Wojtas authored
      In the actual RX processing, there is same error path for both descriptor
      ring refilling and building skb fails. This is not correct, because after
      successful refill, the ring is already updated with newly allocated
      buffer. Then, in case of build_skb() fail, hitherto code left the original
      buffer unmapped.
      
      This patch fixes above situation by swapping error check of skb build with
      DMA-unmap of original buffer.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Acked-by: default avatarSimon Guinot <simon.guinot@sequanux.org>
      Cc: <stable@vger.kernel.org> # v4.2+
      Fixes a84e3289 ("net: mvneta: fix refilling for Rx DMA buffers")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26c17a17
    • Marcin Wojtas's avatar
      net: mvneta: fix bit assignment for RX packet irq enable · dc1aadf6
      Marcin Wojtas authored
      A value originally defined in the driver was inappropriate. Even though
      the ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK
      to MVNETA_INTR_ENABLE didn't make any effect, because the bits [31:16]
      are reserved and read-only.
      
      This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with
      the controller's documentation.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      
      Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
      unit")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc1aadf6
    • Marcin Wojtas's avatar
      net: mvneta: fix bit assignment in MVNETA_RXQ_CONFIG_REG · e5bdf689
      Marcin Wojtas authored
      MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer
      allocation was mistakenly set as BIT(1). This commit fixes the assignment.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Reviewed-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      
      Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
      unit")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5bdf689
    • Marcin Wojtas's avatar
      net: mvneta: add configuration for MBUS windows access protection · db6ba9a5
      Marcin Wojtas authored
      This commit adds missing configuration of MBUS windows access protection
      in mvneta_conf_mbus_windows function - a dedicated variable for that
      purpose remained there unused since v3.8 initial mvneta support. Because
      of that the register contents were inherited from the bootloader.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Reviewed-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      
      Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
      unit")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db6ba9a5
  2. 02 Dec, 2015 8 commits
  3. 01 Dec, 2015 5 commits
    • Eric Dumazet's avatar
      net: fix sock_wake_async() rcu protection · ceb5d58b
      Eric Dumazet authored
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      triggering a fault in sock_wake_async() when async IO is requested.
      
      Said program stressed af_unix sockets, but the issue is generic
      and should be addressed in core networking stack.
      
      The problem is that by the time sock_wake_async() is called,
      we should not access the @flags field of 'struct socket',
      as the inode containing this socket might be freed without
      further notice, and without RCU grace period.
      
      We already maintain an RCU protected structure, "struct socket_wq"
      so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
      is the safe route.
      
      It also reduces number of cache lines needing dirtying, so might
      provide a performance improvement anyway.
      
      In followup patches, we might move remaining flags (SOCK_NOSPACE,
      SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
      being mostly read and let it being shared between cpus.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ceb5d58b
    • Eric Dumazet's avatar
      net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Eric Dumazet authored
      This patch is a cleanup to make following patch easier to
      review.
      
      Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to a (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async()
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int net, struct sock *sk) are added so that
      following patch can change their implementation.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cd3e072
    • Alexey Khoroshilov's avatar
      vmxnet3: fix checks for dma mapping errors · 5738a09d
      Alexey Khoroshilov authored
      vmxnet3_drv does not check dma_addr with dma_mapping_error()
      after mapping dma memory. The patch adds the checks and
      tries to handle failures.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: default avatarShrikrishna Khare <skhare@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5738a09d
    • Peter Hurley's avatar
      wan/x25: Fix use-after-free in x25_asy_open_tty() · ee9159dd
      Peter Hurley authored
      The N_X25 line discipline may access the previous line discipline's closed
      and already-freed private data on open [1].
      
      The tty->disc_data field _never_ refers to valid data on entry to the
      line discipline's open() method. Rather, the ldisc is expected to
      initialize that field for its own use for the lifetime of the instance
      (ie. from open() to close() only).
      
      [1]
          [  634.336761] ==================================================================
          [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
          [  634.339558] Read of size 4 by task syzkaller_execu/8981
          [  634.340359] =============================================================================
          [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
          ...
          [  634.405018] Call Trace:
          [  634.405277] dump_stack (lib/dump_stack.c:52)
          [  634.405775] print_trailer (mm/slub.c:655)
          [  634.406361] object_err (mm/slub.c:662)
          [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
          [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
          [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
          [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
          [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
          [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
          [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
          [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
          [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
      Reported-and-tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee9159dd
    • Nicolas Dichtel's avatar
      Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" · 304d888b
      Nicolas Dichtel authored
      This reverts commit ab450605.
      
      In IPv6, we cannot inherit the dst of the original dst. ndisc packets
      are IPv6 packets and may take another route than the original packet.
      
      This patch breaks the following scenario: a packet comes from eth0 and
      is forwarded through vxlan1. The encapsulated packet triggers an NS
      which cannot be sent because of the wrong route.
      
      CC: Jiri Benc <jbenc@redhat.com>
      CC: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      304d888b
  4. 30 Nov, 2015 12 commits
  5. 25 Nov, 2015 2 commits
    • Daniel Borkmann's avatar
      bpf: fix clearing on persistent program array maps · c9da161c
      Daniel Borkmann authored
      Currently, when having map file descriptors pointing to program arrays,
      there's still the issue that we unconditionally flush program array
      contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens
      when such a file descriptor is released and is independent of the map's
      refcount.
      
      Having this flush independent of the refcount is for a reason: there
      can be arbitrary complex dependency chains among tail calls, also circular
      ones (direct or indirect, nesting limit determined during runtime), and
      we need to make sure that the map drops all references to eBPF programs
      it holds, so that the map's refcount can eventually drop to zero and
      initiate its freeing. Btw, a walk of the whole dependency graph would
      not be possible for various reasons, one being complexity and another
      one inconsistency, i.e. new programs can be added to parts of the graph
      at any time, so there's no guaranteed consistent state for the time of
      such a walk.
      
      Now, the program array pinning itself works, but the issue is that each
      derived file descriptor on close would nevertheless call unconditionally
      into bpf_fd_array_map_clear(). Instead, keep track of users and postpone
      this flush until the last reference to a user is dropped. As this only
      concerns a subset of references (f.e. a prog array could hold a program
      that itself has reference on the prog array holding it, etc), we need to
      track them separately.
      
      Short analysis on the refcounting: on map creation time usercnt will be
      one, so there's no change in behaviour for bpf_map_release(), if unpinned.
      If we already fail in map_create(), we are immediately freed, and no
      file descriptor has been made public yet. In bpf_obj_pin_user(), we need
      to probe for a possible map in bpf_fd_probe_obj() already with a usercnt
      reference, so before we drop the reference on the fd with fdput().
      Therefore, if actual pinning fails, we need to drop that reference again
      in bpf_any_put(), otherwise we keep holding it. When last reference
      drops on the inode, the bpf_any_put() in bpf_evict_inode() will take
      care of dropping the usercnt again. In the bpf_obj_get_user() case, the
      bpf_any_get() will grab a reference on the usercnt, still at a time when
      we have the reference on the path. Should we later on fail to grab a new
      file descriptor, bpf_any_put() will drop it, otherwise we hold it until
      bpf_map_release() time.
      
      Joint work with Alexei.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9da161c
    • Christoph Biedl's avatar
      isdn: Partially revert debug format string usage clean up · 19cebbcb
      Christoph Biedl authored
      Commit 35a4a573 ("isdn: clean up debug format string usage") introduced
      a safeguard to avoid accidential format string interpolation of data
      when calling debugl1 or HiSax_putstatus. This did however not take into
      account VHiSax_putstatus (called by HiSax_putstatus) does *not* call
      vsprintf if the head parameter is NULL - the format string is treated
      as plain text then instead. As a result, the string "%s" is processed
      literally, and the actual information is lost. This affects the isdnlog
      userspace program which stopped logging information since that commit.
      
      So revert the HiSax_putstatus invocations to the previous state.
      
      Fixes: 35a4a573 ("isdn: clean up debug format string usage")
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Signed-off-by: default avatarChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19cebbcb
  6. 24 Nov, 2015 4 commits
    • Quentin Casasnovas's avatar
      RDS: fix race condition when sending a message on unbound socket · 8c7188b2
      Quentin Casasnovas authored
      Sasha's found a NULL pointer dereference in the RDS connection code when
      sending a message to an apparently unbound socket.  The problem is caused
      by the code checking if the socket is bound in rds_sendmsg(), which checks
      the rs_bound_addr field without taking a lock on the socket.  This opens a
      race where rs_bound_addr is temporarily set but where the transport is not
      in rds_bind(), leading to a NULL pointer dereference when trying to
      dereference 'trans' in __rds_conn_create().
      
      Vegard wrote a reproducer for this issue, so kindly ask him to share if
      you're interested.
      
      I cannot reproduce the NULL pointer dereference using Vegard's reproducer
      with this patch, whereas I could without.
      
      Complete earlier incomplete fix to CVE-2015-6937:
      
        74e98eb0 ("RDS: verify the underlying transport exists before creating a connection")
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Reviewed-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c7188b2
    • Aaron Conole's avatar
      net: openvswitch: Remove invalid comment · 20f79566
      Aaron Conole authored
      During pre-upstream development, the openvswitch datapath used a custom
      hashtable to store vports that could fail on delete due to lack of
      memory. However, prior to upstream submission, this code was reworked to
      use an hlist based hastable with flexible-array based buckets. As such
      the failure condition was eliminated from the vport_del path, rendering
      this comment invalid.
      Signed-off-by: default avatarAaron Conole <aconole@bytheb.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      20f79566
    • Nikolay Aleksandrov's avatar
      net: ipmr, ip6mr: fix vif/tunnel failure race condition · fbdd29bf
      Nikolay Aleksandrov authored
      Since (at least) commit b17a7c17 ("[NET]: Do sysfs registration as
      part of register_netdevice."), netdev_run_todo() deals only with
      unregistration, so we don't need to do the rtnl_unlock/lock cycle to
      finish registration when failing pimreg or dvmrp device creation. In
      fact that opens a race condition where someone can delete the device
      while rtnl is unlocked because it's fully registered. The problem gets
      worse when netlink support is introduced as there are more points of entry
      that can cause it and it also makes reusing that code correctly impossible.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fbdd29bf
    • David Howells's avatar
      rxrpc: Correctly handle ack at end of client call transmit phase · 33c40e24
      David Howells authored
      Normally, the transmit phase of a client call is implicitly ack'd by the
      reception of the first data packet of the response being received.
      However, if a security negotiation happens, the transmit phase, if it is
      entirely contained in a single packet, may get an ack packet in response
      and then may get aborted due to security negotiation failure.
      
      Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due
      to having transmitted all the data, the code that handles processing of the
      received ack packet doesn't note the hard ack the data packet.
      
      The following abort packet in the case of security negotiation failure then
      incurs an assertion failure when it tries to drain the Tx queue because the
      hard ack state is out of sync (hard ack means the packets have been
      processed and can be discarded by the sender; a soft ack means that the
      packets are received but could still be discarded and rerequested by the
      receiver).
      
      To fix this, we should record the hard ack we received for the ack packet.
      
      The assertion failure looks like:
      
      	RxRPC: Assertion failed
      	1 <= 0 is false
      	0x1 <= 0x0 is false
      	------------[ cut here ]------------
      	kernel BUG at ../net/rxrpc/ar-ack.c:431!
      	...
      	RIP: 0010:[<ffffffffa006857b>]  [<ffffffffa006857b>] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc]
      	...
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33c40e24