1. 16 Mar, 2017 40 commits
    • net: socket: fix recvmmsg not returning error from sock_error · 82df12b2
      Maxime Jayat authored
      [ Upstream commit e623a9e9 ]
      
      Commit 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
      changed the exit path of recvmmsg to always return the datagrams
      variable and modified the error paths to set the variable to the error
      code returned by recvmsg if necessary.
      
      However, when sock_error returned an error, the error code was
      ignored and recvmmsg returned 0.
      
      Change the error path of recvmmsg to correctly return the error code
      of sock_error.
      
      The bug was triggered by using recvmmsg on a CAN interface which was
      not up. Linux 4.6 and later return 0 in this case while earlier
      releases returned -ENETDOWN.
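
The control flow described above can be modelled in user space. This is a minimal sketch and not the kernel code: `pending_sock_err` stands in for whatever sock_error() would report, and both function names are hypothetical.

```c
#include <errno.h>

/*
 * Model of the exit path before the fix: the error reported by
 * sock_error() is held in err, but only datagrams is returned.
 */
int recvmmsg_exit_before_fix(int pending_sock_err)
{
	int datagrams = 0;
	int err = pending_sock_err;	/* 0 or a negative errno */

	if (err)
		return datagrams;	/* err is dropped, the caller sees 0 */

	/* ... the receive loop would run here ... */
	return datagrams;
}

/*
 * Model of the exit path after the fix: the error is copied into the
 * variable that is actually returned.
 */
int recvmmsg_exit_after_fix(int pending_sock_err)
{
	int datagrams = 0;
	int err = pending_sock_err;

	if (err) {
		datagrams = err;
		return datagrams;
	}

	/* ... the receive loop would run here ... */
	return datagrams;
}
```

With a pending -ENETDOWN, the first function returns 0 and the second returns -ENETDOWN, which is the behaviour difference described for the CAN interface.
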
      
      Fixes: 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
      Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock · b774af06
      Kefeng Wang authored
      [ Upstream commit 03e4deff ]
      
      Just like commit 4acd4945 ("ipv6: addrconf: Avoid calling
      netdevice notifiers with RCU read-side lock"), it is unnecessary
      to make addrconf_disable_change() use RCU iteration over the
      netdev list, since it already holds the RTNL lock. Otherwise we may
      hit an "Illegal context switch in RCU read-side critical section"
      warning.
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • igmp: Make igmp group member RFC 3376 compliant · afc5bf35
      Michal Tesar authored
      [ Upstream commit 7ababb78 ]
      
      5.2. Action on Reception of a Query
      
       When a system receives a Query, it does not respond immediately.
       Instead, it delays its response by a random amount of time, bounded
       by the Max Resp Time value derived from the Max Resp Code in the
       received Query message.  A system may receive a variety of Queries on
       different interfaces and of different kinds (e.g., General Queries,
       Group-Specific Queries, and Group-and-Source-Specific Queries), each
       of which may require its own delayed response.
      
       Before scheduling a response to a Query, the system must first
       consider previously scheduled pending responses and in many cases
       schedule a combined response.  Therefore, the system must be able to
       maintain the following state:
      
       o A timer per interface for scheduling responses to General Queries.
      
       o A per-group and interface timer for scheduling responses to Group-
         Specific and Group-and-Source-Specific Queries.
      
       o A per-group and interface list of sources to be reported in the
         response to a Group-and-Source-Specific Query.
      
       When a new Query with the Router-Alert option arrives on an
       interface, provided the system has state to report, a delay for a
       response is randomly selected in the range (0, [Max Resp Time]) where
       Max Resp Time is derived from Max Resp Code in the received Query
       message.  The following rules are then used to determine if a Report
       needs to be scheduled and the type of Report to schedule.  The rules
       are considered in order and only the first matching rule is applied.
      
       1. If there is a pending response to a previous General Query
          scheduled sooner than the selected delay, no additional response
          needs to be scheduled.
      
       2. If the received Query is a General Query, the interface timer is
          used to schedule a response to the General Query after the
          selected delay.  Any previously pending response to a General
          Query is canceled.
      --8<--
      
      Currently the timer is rearmed with a new random expiration time for
      every incoming query, regardless of any report that may already be
      pending, which is not aligned with the RFC text above.
      A higher rate of incoming queries can also postpone the report
      past the expiration time of the first query, causing group
      membership loss.
      
      Now the per interface general query timer is rearmed only
      when there is no pending report already scheduled on that interface or
      the newly selected expiration time is before the already pending
      scheduled report.
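
As an illustration only, the rearm rule can be written as a small user-space model. The struct and function names are hypothetical, expiry times are plain tick counts, and counter wraparound is ignored for brevity.

```c
#include <stdbool.h>

/* Hypothetical model of the per-interface general query timer. */
struct gq_timer_model {
	bool pending;		/* a report is already scheduled */
	unsigned long expires;	/* tick at which it will be sent */
};

/*
 * Rearm only if nothing is pending, or if the newly selected expiry is
 * earlier than the one already scheduled. Returns true when the timer
 * was (re)armed.
 */
bool gq_timer_update(struct gq_timer_model *t, unsigned long new_expires)
{
	if (t->pending && t->expires <= new_expires)
		return false;	/* keep the earlier pending report */

	t->pending = true;
	t->expires = new_expires;
	return true;
}
```

In this model a burst of queries can only move the report earlier, never later, so it cannot be pushed past the expiration time of the first query.
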
      Signed-off-by: Michal Tesar <mtesar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • drop_monitor: consider inserted data in genlmsg_end · ad998a40
      Reiter Wolfgang authored
      [ Upstream commit 3b48ab22 ]
      
      Final nlmsg_len field update must reflect inserted net_dm_drop_point
      data.
      
      This patch depends on previous patch:
      "drop_monitor: add missing call to genlmsg_end"
      Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • drop_monitor: add missing call to genlmsg_end · cf7688c4
      Reiter Wolfgang authored
      [ Upstream commit 4200462d ]
      
      Update nlmsg_len field with genlmsg_end to enable userspace processing
      using nlmsg_next helper. Also adds error handling.
      Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: handle -EFAULT from skb_copy_bits · e094fdd4
      Dave Jones authored
      [ Upstream commit a98f9175 ]
      
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
      
      Reproducer:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      }
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • netvsc: reduce maximum GSO size · a643d659
      stephen hemminger authored
      [ Upstream commit a50af86d ]
      
      Hyper-V (and Azure) support using NVGRE which requires some extra space
      for encapsulation headers. Because of this the largest allowed TSO
      packet is reduced.
      
      For older releases, hard code a fixed reduced value.  For next release,
      there is a better solution which uses result of host offload
      negotiation.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust filename, context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net/dccp: fix use-after-free in dccp_invalid_packet · 00b9bf63
      Eric Dumazet authored
      [ Upstream commit 648f0c28 ]
      
      pskb_may_pull() can reallocate skb->head, so we need to reload the dh
      pointer in dccp_invalid_packet() or risk a use after free.
      
      Bug found by Andrey Konovalov using syzkaller.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net/sched: pedit: make sure that offset is valid · 03aeee0d
      Amir Vadai authored
      [ Upstream commit 95c2027b ]
      
      Add a validation function to make sure offset is valid:
      1. Not below skb head (could happen when offset is negative).
      2. Validate both 'offset' and 'at'.
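
A user-space sketch of such a check is shown below. It is an assumption about the shape of the validation, not the kernel function itself: `buf_model`, `headroom` and `len` are hypothetical stand-ins for the skb, the bytes available before the data pointer, and the packet length.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a packet buffer: `headroom` bytes precede the
 * data pointer and `len` bytes of packet data follow it.
 */
struct buf_model {
	unsigned int headroom;
	unsigned int len;
};

/* Reject offsets past the end of the data or below the buffer head. */
bool offset_valid_model(const struct buf_model *b, int offset)
{
	if (offset > 0 && (unsigned int)offset > b->len)
		return false;

	/* Widen before negating so that INT_MIN does not overflow. */
	if (offset < 0 && -(long long)offset > (long long)b->headroom)
		return false;

	return true;
}
```

The same helper would be called once for 'offset' and once for 'at', so that both values are checked before any byte is touched.
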
      Signed-off-by: Amir Vadai <amir@vadai.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: sky2: Fix shutdown crash · 38ae3328
      Jeremy Linton authored
      [ Upstream commit 06ba3b21 ]
      
      The sky2 frequently crashes during machine shutdown with:
      
      sky2_get_stats+0x60/0x3d8 [sky2]
      dev_get_stats+0x68/0xd8
      rtnl_fill_stats+0x54/0x140
      rtnl_fill_ifinfo+0x46c/0xc68
      rtmsg_ifinfo_build_skb+0x7c/0xf0
      rtmsg_ifinfo.part.22+0x3c/0x70
      rtmsg_ifinfo+0x50/0x5c
      netdev_state_change+0x4c/0x58
      linkwatch_do_dev+0x50/0x88
      __linkwatch_run_queue+0x104/0x1a4
      linkwatch_event+0x30/0x3c
      process_one_work+0x140/0x3e0
      worker_thread+0x60/0x44c
      kthread+0xdc/0xf0
      ret_from_fork+0x10/0x50
      
      This is caused by the sky2 being called after it has been shut down.
      A previous thread about this can be found here:
      
      https://lkml.org/lkml/2016/4/12/410
      
      An alternative fix is to ensure that IFF_UP gets cleared by
      calling dev_close() during shutdown. This is similar to what the
      bnx2/tg3/xgene and maybe others are doing to ensure that the driver
      isn't being called following _shutdown().
      Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ip6_tunnel: disable caching when the traffic class is inherited · 0fe8ea69
      Paolo Abeni authored
      [ Upstream commit b5c2d495 ]
      
      If an ip6 tunnel is configured to inherit the traffic class from
      the inner header, the dst_cache must be disabled or it will foul
      the policy routing.
      
      The issue has apparently been there since at least Linux-2.6.12-rc2.
      Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
      Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sock: fix sendmmsg for partial sendmsg · 058627c4
      Soheil Hassas Yeganeh authored
      [ Upstream commit 3023898b ]
      
      Do not send the next message in sendmmsg for partial sendmsg
      invocations.
      
      sendmmsg assumes that it can continue sending the next message
      when the return value of the individual sendmsg invocations
      is positive. It results in corrupting the data for TCP,
      SCTP, and UNIX streams.
      
      For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
      of "aefgh" if the first sendmsg invocation sends only the first
      byte while the second sendmsg goes through.
      
      Datagram sockets either send the entire datagram or fail, so
      this patch affects only sockets of type SOCK_STREAM and
      SOCK_SEQPACKET.
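
The loop behaviour can be illustrated with a user-space model. Everything here is hypothetical scaffolding: `sendmmsg_model` stands in for the batching loop, `send_fn` for a single sendmsg call, and `sink_send` is a toy stream that accepts only a limited number of bytes on its first call.

```c
#include <stddef.h>
#include <string.h>

struct msg_model {
	const char *data;
	size_t len;
	size_t sent;		/* bytes accepted for this message */
};

/* Stand-in for one sendmsg call: returns bytes sent or a negative error. */
typedef long (*send_fn)(const char *data, size_t len, void *ctx);

/* Send messages in order and stop right after the first partial send. */
int sendmmsg_model(struct msg_model *msgs, int count, send_fn send_one,
		   void *ctx)
{
	int done = 0;

	while (done < count) {
		struct msg_model *m = &msgs[done];
		long ret = send_one(m->data, m->len, ctx);

		if (ret < 0)
			break;
		m->sent = (size_t)ret;
		done++;
		if (m->sent < m->len)
			break;	/* partial send: do not start the next message */
	}
	return done;
}

/* Toy byte stream that truncates only the first call it receives. */
struct sink_model {
	char buf[32];
	size_t used;
	size_t first_call_limit;
	int calls;
};

long sink_send(const char *data, size_t len, void *ctx)
{
	struct sink_model *s = ctx;
	size_t n = len;

	if (s->calls++ == 0 && n > s->first_call_limit)
		n = s->first_call_limit;
	if (n > sizeof(s->buf) - s->used)
		n = sizeof(s->buf) - s->used;
	memcpy(s->buf + s->used, data, n);
	s->used += n;
	return (long)n;
}
```

With a first-call limit of one byte, the model sends "a", reports one message with a length of 1, and leaves "efgh" untouched, so the caller can resume from "bcd" instead of producing "aefgh".
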
      
      Fixes: 228e548e ("net: Add sendmmsg socket system call")
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: we don't have the iov_iter API, so make
       ___sys_sendmsg() calculate and write back the remaining length]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sctp: assign assoc_id earlier in __sctp_connect · d1e28108
      Marcelo Ricardo Leitner authored
      [ Upstream commit 7233bc84 ]
      
      sctp_wait_for_connect() already holds the asoc to keep it alive
      during the sleep, in case another thread releases it. But Andrey
      Konovalov and Dmitry Vyukov reported a use-after-free in such a
      situation.
      
      The problem is that __sctp_connect() doesn't get a ref on the asoc and
      reads from the asoc after calling sctp_wait_for_connect(), but by then
      another thread may have closed it and the _put in sctp_wait_for_connect()
      will actually release it, causing the use-after-free.
      
      The fix is to do the read before waiting for the connect instead of
      after it, which avoids this issue as the socket is still locked by then.
      There should be no issue with returning the asoc id in case of failure,
      as the application shouldn't rely on that number in such situations
      anyway.
      
      This issue doesn't exist in sctp_sendmsg() path.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipv6: dccp: fix out of bound access in dccp_v6_err() · 4ca7e66f
      Eric Dumazet authored
      [ Upstream commit 1aa9d1a0 ]
      
      dccp_v6_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmpv6_notify() are more than enough.
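
The "first 4 bytes fit in the 8 bytes already pulled" argument can be expressed as an offsetof() + sizeof() bound. The sketch below is an illustration under assumptions: `hdr_model` is a hypothetical layout that only mirrors the two leading 16-bit port fields of a DCCP header, and `FIELD_END` is a local helper, not a kernel macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout: two 16-bit ports come first. */
struct hdr_model {
	uint16_t sport;
	uint16_t dport;
	uint8_t rest[8];
};

/* Offset of the first byte after `member`, built from offsetof + sizeof. */
#define FIELD_END(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* True if both port fields lie inside the bytes that were pulled. */
int ports_fit_in(size_t pulled_bytes)
{
	return FIELD_END(struct hdr_model, dport) <= pulled_bytes;
}
```

For this layout the bound is 4, so an 8-byte pull covers the fields that are read.
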
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: use offsetof() + sizeof() instead of
       offsetofend()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • dccp: fix out of bound access in dccp_v4_err() · 96106a20
      Eric Dumazet authored
      [ Upstream commit 6706a97f ]
      
      dccp_v4_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
      
      This patch might allow more ICMP messages to be processed, as some
      routers still limit the size of reflected bytes to 28 (RFC 792) instead
      of using extended lengths (RFC 1812 4.3.2.3).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: use offsetof() + sizeof() instead of
       offsetofend()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • dccp: do not send reset to already closed sockets · 0d8a6712
      Eric Dumazet authored
      [ Upstream commit 346da62c ]
      
      Andrey reported the following warning while fuzzing with syzkaller:
      
      WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
       ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
       0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
       [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
       [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
       [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
       [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
       [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
       [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
       [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
       [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
       [<     inline     >] exit_task_work include/linux/task_work.h:21
       [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
       [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
       [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
       [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
       [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259
       [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      
      Fix this the same way we did for TCP in commit 565b7b2d
      ("tcp: do not send reset to already closed sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: mangle zero checksum in skb_checksum_help() · f8c5af06
      Eric Dumazet authored
      [ Upstream commit 4f2e4ad5 ]
      
      Sending zero checksum is ok for TCP, but not for UDP.
      
      UDPv6 receiver should by default drop a frame with a 0 checksum,
      and UDPv4 would not verify the checksum and might accept a corrupted
      packet.
      
      Simply replace such checksum by 0xffff, regardless of transport.
      
      This error was caught on SIT tunnels, but seems generic.
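
The substitution can be sketched in user space as follows. This is an illustration, not the kernel helper: `inet_csum` is a plain RFC 1071 style checksum over a byte buffer, and both function names are hypothetical. The replacement is safe because 0x0000 and 0xffff are the two representations of zero in one's-complement arithmetic, so a receiver that verifies the checksum reaches the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum of a buffer, using big-endian 16-bit words. */
uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* zero-pad odd tail */

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries */

	return (uint16_t)~sum;
}

/* Same checksum, but never 0: a computed 0 is sent as 0xffff instead. */
uint16_t inet_csum_nonzero(const uint8_t *data, size_t len)
{
	uint16_t csum = inet_csum(data, len);

	return csum ? csum : 0xffff;
}
```

For UDP over IPv4 a checksum field of 0 means "no checksum was computed", which is why a genuinely computed 0 has to be sent as 0xffff.
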
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: clear sk_err_soft in sk_clone_lock() · 5a479261
      Eric Dumazet authored
      [ Upstream commit e551c32d ]
      
      At accept() time, it is possible the parent has a non zero
      sk_err_soft, leftover from a prior error.
      
      Make sure we do not leave this value in the child, as it
      makes future getsockopt(SO_ERROR) calls quite unreliable.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: sctp, forbid negative length · 74761f65
      Jiri Slaby authored
      [ Upstream commit a4b8e71b ]
      
      Most of getsockopt handlers in net/sctp/socket.c check len against
      sizeof some structure like:
              if (len < sizeof(int))
                      return -EINVAL;
      
      At first glance, the check seems to be correct. But since len is int
      and sizeof returns size_t, len is converted to the unsigned size_t
      before the comparison. So the test returns false for negative lengths.
      Yes, (-1 < sizeof(long)) is false.
      
      Fix this in sctp by explicitly checking len < 0 before any getsockopt
      handler is called.
      
      Note that sctp_getsockopt_events already handled the negative case.
      Since we added the < 0 check elsewhere, this one can be removed.
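
The conversion can be reproduced in a few lines of user-space C. The function names are hypothetical, and the cast in the first function only spells out the conversion that the compiler applies implicitly when an int is compared against a size_t.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Models `if (len < sizeof(int)) return -EINVAL;`. The explicit cast
 * shows what happens implicitly: a negative len becomes a huge
 * unsigned value, so it is not rejected.
 */
bool len_rejected_before_fix(int len)
{
	return (size_t)len < sizeof(int);
}

/* Models the same check with an explicit negative-length test first. */
bool len_rejected_after_fix(int len)
{
	if (len < 0)
		return true;
	return (size_t)len < sizeof(int);
}
```

A length of -1 passes the first check and is rejected by the second, while valid and too-short non-negative lengths are treated the same by both.
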
      
      If not checked, this is the result:
      UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
      shift exponent 52 is too large for 32-bit type 'int'
      CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
       0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
       ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
       0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
      Call Trace:
       [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
      ...
       [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
       [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
       [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
       [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
       [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
       [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
       [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-sctp@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tcp: fix wrong checksum calculation on MTU probing · 016082bf
      Douglas Caetano dos Santos authored
      [ Upstream commit 2fe664f1 ]
      
      With TCP MTU probing enabled and offload TX checksumming disabled,
      tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
      into the probe's SKB had an odd length. This was caused by the direct use
      of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
      fragment being copied, if needed. When this fragment was not the last, a
      subsequent call used the previous checksum without considering this
      padding.
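
The effect of the padding can be shown with a small user-space model. It is a sketch under assumptions, not the kernel change: all three function names are hypothetical, sums use big-endian 16-bit words, and the combination step relies on the byte-order property of the one's-complement sum (RFC 1071), under which a block that starts at an odd offset contributes its own sum with the two bytes swapped.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of a buffer, zero-padded when len is odd. */
uint16_t csum_partial_model(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
		sum = (sum & 0xffff) + (sum >> 16);
	}
	if (len & 1) {
		sum += (uint32_t)p[len - 1] << 8;
		sum = (sum & 0xffff) + (sum >> 16);
	}
	return (uint16_t)sum;
}

/* One's-complement addition of two 16-bit sums. */
uint16_t csum_add_model(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Add the sum of a block that starts `offset` bytes into the whole. */
uint16_t csum_block_add_model(uint16_t total, uint16_t block, size_t offset)
{
	if (offset & 1)
		block = (uint16_t)((block << 8) | (block >> 8));
	return csum_add_model(total, block);
}
```

For the bytes 01 02 03 04 05 split after the third byte, the sum over the whole buffer is 0x0906. Adding the two fragment sums directly gives 0x0807, while the offset-aware combination gives 0x0906 again.
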
      
      The effect was a connection stalled in one direction, as even
      retransmissions wouldn't solve the problem, because the checksum was
      never recalculated for the full SKB length.
      Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: avoid sk_forward_alloc overflows · 5e8b0208
      Eric Dumazet authored
      [ Upstream commit 20c64d5c ]
      
      A malicious TCP receiver, sending SACK, can force the sender to split
      skbs in write queue and increase its memory usage.
      
      Then, when the socket is closed and its write queue purged, we might
      overflow sk_forward_alloc (it becomes negative).
      
      sk_mem_reclaim() does nothing in this case, and more than 2GB
      are leaked from TCP perspective (tcp_memory_allocated is not changed).
      
      Then warnings trigger from inet_sock_destruct() and
      sk_stream_kill_queues() seeing a non-zero sk_forward_alloc.
      
      The whole TCP stack can be stuck because TCP is under memory pressure.
      
      A simple fix is to preemptively reclaim from sk_mem_uncharge().
      
      This makes sure a socket won't have more than 2 MB forward allocated,
      after burst and idle period.
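
The idea of reclaiming from the uncharge path can be modelled in user space as below. This is only a sketch: the struct, the function name and the 1 MB reclaim step are assumptions, and only the 2 MB ceiling comes from the text above.

```c
/* 2 MB ceiling taken from the text; the 1 MB step is an assumption. */
#define FWD_ALLOC_LIMIT	(1 << 21)
#define RECLAIM_STEP	(1 << 20)

/* Hypothetical model of the per-socket and protocol-wide accounting. */
struct sock_model {
	int forward_alloc;		/* bytes reserved by this socket */
	long long pool_allocated;	/* bytes accounted protocol-wide */
};

/*
 * Return `size` bytes to the socket's forward allocation and, once the
 * reserve reaches the ceiling, hand a chunk back to the shared pool
 * right away instead of waiting for the socket to be closed.
 */
void mem_uncharge_model(struct sock_model *sk, int size)
{
	sk->forward_alloc += size;
	if (sk->forward_alloc >= FWD_ALLOC_LIMIT) {
		sk->forward_alloc -= RECLAIM_STEP;
		sk->pool_allocated -= RECLAIM_STEP;
	}
}
```

In this model, repeated 1 MB uncharges keep forward_alloc below 2 MB, so the counter stays far away from the signed 32-bit limit.
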
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tcp: fix overflow in __tcp_retransmit_skb() · b2c5b704
      Eric Dumazet authored
      [ Upstream commit ffb4d6c8 ]
      
      If a TCP socket gets a large write queue, an overflow can happen
      in a test in __tcp_retransmit_skb() preventing all retransmits.
      
      The flow then stalls and resets after timeouts.
      
      Tested:
      
      sysctl -w net.core.wmem_max=1000000000
      netperf -H dest -- -s 1000000000
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: fix sk_mem_reclaim_partial() · 065d8e5a
      Eric Dumazet authored
      commit 1a24e04e upstream.
      
      The goal of sk_mem_reclaim_partial() is to ensure each socket has
      one SK_MEM_QUANTUM forward allocation. This is needed both for
      performance and better handling of memory pressure situations in
      follow up patches.
      
      SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
      as some arches have 64KB pages.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Keep using atomic_long_sub() directly, not sk_memory_allocated_sub()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ipmr/ip6mr: Initialize the last assert time of mfc entries. · 093582b9
      Tom Goff authored
      [ Upstream commit 70a0dec4 ]
      
      This fixes wrong-interface signaling on 32-bit platforms for entries
      created when jiffies > 2^31 + MFC_ASSERT_THRESH.
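
The wraparound can be reproduced with a 32-bit tick counter in user space. This is a sketch under assumptions: the threshold value, the function names and the particular initial value chosen in `initial_last_assert` are all hypothetical and only illustrate why a zero-initialised timestamp misbehaves late in the counter's range.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSERT_THRESH_MODEL 3000u	/* hypothetical threshold in ticks */

/* Wrap-safe "a is after b" for a 32-bit tick counter. */
bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* An assert may be signalled once the threshold has passed. */
bool assert_allowed(uint32_t now, uint32_t last_assert)
{
	return tick_after(now, last_assert + ASSERT_THRESH_MODEL);
}

/* One possible initial value: just older than the threshold. */
uint32_t initial_last_assert(uint32_t now)
{
	return now - ASSERT_THRESH_MODEL - 1;
}
```

With last_assert left at 0, the check passes early in uptime but fails once the counter is more than 2^31 + threshold ticks ahead, because the signed difference changes sign. Initialising the field relative to the current tick avoids that.
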
      Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: bridge: fix old ioctl unlocked net device walk · 91fbc717
      Nikolay Aleksandrov authored
      [ Upstream commit 31ca0458 ]
      
      get_bridge_ifindices() is used from the old "deviceless" bridge ioctl
      calls which aren't called with rtnl held. The comment above says that it is
      called with rtnl but that is not really the case.
      Here's a sample output from a test ASSERT_RTNL() which I put in
      get_bridge_ifindices and executed "brctl show":
      [  957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30)
      [  957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G        W  O 4.6.0-rc4+ #157
      [  957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
      [  957.423009]  0000000000000000 ffff880058adfdf0 ffffffff8138dec5 0000000000000400
      [  957.423009]  ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32 0000000000000001
      [  957.423009]  00007ffec1a444b0 0000000000000400 ffff880053c19130 0000000000008940
      [  957.423009] Call Trace:
      [  957.423009]  [<ffffffff8138dec5>] dump_stack+0x85/0xc0
      [  957.423009]  [<ffffffffa05ead32>] br_ioctl_deviceless_stub+0x212/0x2e0 [bridge]
      [  957.423009]  [<ffffffff81515beb>] sock_ioctl+0x22b/0x290
      [  957.423009]  [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700
      [  957.423009]  [<ffffffff8126c159>] SyS_ioctl+0x79/0x90
      [  957.423009]  [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
      
      Since it only reads bridge ifindices, we can use rcu to safely walk the net
      device list. Also remove the wrong rtnl comment above.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sch_dsmark: update backlog as well · be3b6736
      WANG Cong authored
      [ Upstream commit bdf17661 ]
      
      Similarly, we need to update backlog too when we update qlen.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: open-code qdisc_qstats_backlog_{inc,dec}()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • sch_htb: update backlog as well · 46d263c2
      WANG Cong authored
      [ Upstream commit 431e3a8e ]
      
      We saw qlen!=0 but backlog==0 on our production machine:
      
      qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
       Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
       backlog 0b 72p requeues 0
      
      The problem is we only count qlen for HTB qdisc but not backlog.
      We need to update backlog too when we update qlen, so that we
      can at least know the average packet length.
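
      The relationship between the two counters can be sketched in plain C.
      This is a minimal userspace model, not the HTB code: struct toy_qstats
      and the toy_* helpers are hypothetical names. It only illustrates that
      when qlen and backlog are updated together, backlog / qlen gives the
      average queued packet length.

```c
#include <assert.h>

/* Hypothetical stand-in for the qdisc counters; not the kernel's struct. */
struct toy_qstats {
    unsigned int qlen;     /* packets currently queued */
    unsigned int backlog;  /* bytes currently queued */
};

/* Account for one enqueued packet: both counters move together. */
static void toy_enqueue(struct toy_qstats *q, unsigned int pkt_len)
{
    q->qlen++;
    q->backlog += pkt_len;
}

/* Account for one dequeued packet of the given length. */
static void toy_dequeue(struct toy_qstats *q, unsigned int pkt_len)
{
    assert(q->qlen > 0 && q->backlog >= pkt_len);
    q->qlen--;
    q->backlog -= pkt_len;
}

/* Average queued packet length in bytes, or 0 for an empty queue. */
static unsigned int toy_avg_len(const struct toy_qstats *q)
{
    return q->qlen ? q->backlog / q->qlen : 0;
}
```

      If only qlen were updated, toy_avg_len() would always report 0, which
      matches the "backlog 0b 72p" symptom shown above.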
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Open-code qdisc_qstats_backlog_{inc,dec}()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • route: do not cache fib route info on local routes with oif · a784a2be
      Chris Friesen authored
      [ Upstream commit d6d5e999 ]
      
      For local routes that require a particular output interface we do not want
      to cache the result.  Caching the result causes incorrect behaviour when
      there are multiple source addresses on the interface.  The end result is
      that a recipient waiting on that interface for the packet never receives
      it, because it is delivered on the loopback interface and the IP_PKTINFO
      ipi_ifindex is set to the loopback interface as well.
      
      This can be tested by running a program such as "dhcp_release" which
      attempts to inject a packet on a particular interface so that it is
      received by another program on the same board.  The receiving process
      should see an IP_PKTINFO ipi_ifindex value of the source interface
      (e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
      will still appear on the loopback interface in tcpdump but the important
      aspect is that the CMSG info is correct.
      
      Sample dhcp_release command line:
      
         dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
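
      On the receiving side, the ipi_ifindex check could look roughly like
      the sketch below. It assumes a Linux system, a socket on which
      IP_PKTINFO was enabled with setsockopt(), and a struct msghdr already
      filled in by recvmsg(); pktinfo_ifindex() is a hypothetical helper
      name, not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Return the ipi_ifindex carried in an IP_PKTINFO control message,
 * or -1 if the message carries none.
 */
static int pktinfo_ifindex(struct msghdr *msg)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL);
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IP &&
            cmsg->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo info;

            /* Copy out: CMSG_DATA() is not guaranteed to be aligned. */
            memcpy(&info, CMSG_DATA(cmsg), sizeof(info));
            return info.ipi_ifindex;
        }
    }
    return -1;
}
```

      A test program would compare the returned index against
      if_nametoindex("eth1") rather than the index of lo.
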
      Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
      Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • decnet: Do not build routes to devices without decnet private data. · d5536274
      David S. Miller authored
      [ Upstream commit a36a0d40 ]
      
      In particular, make sure we check for decnet private presence
      for loopback devices.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • lib/vsprintf.c: improve sanity check in vsnprintf() · bebe2f0f
      Rasmus Villemoes authored
      commit 2aa2f9e2 upstream.
      
      On 64 bit, size may very well be huge even if bit 31 happens to be 0.
      Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
      3 GiB one.  So cap at INT_MAX as was probably the intention all along.
      This is also the made-up value passed by sprintf and vsprintf.
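
      The difference between the two checks can be shown with a small
      userspace sketch. It is an illustration under assumptions, not the
      kernel code: the function names are hypothetical, the old check is
      modelled as "bit 31 of the size is set" as the text describes, and
      int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define GIB ((uint64_t)1 << 30)

/* Old-style check as described: only bit 31 of the size is examined. */
static int rejected_by_bit31(uint64_t size)
{
    return (int)((size >> 31) & 1);
}

/* New check: any size above INT_MAX is rejected. */
static int rejected_by_cap(uint64_t size)
{
    return size > (uint64_t)INT_MAX;
}

/* Walk through the two buffer sizes named in the commit message. */
static void check_commit_examples(void)
{
    /* 3 GiB has bit 31 set, 5 GiB does not. */
    assert(rejected_by_bit31(3 * GIB) && !rejected_by_bit31(5 * GIB));
    /* With the cap, both oversized buffers are refused. */
    assert(rejected_by_cap(3 * GIB) && rejected_by_cap(5 * GIB));
    /* INT_MAX itself remains acceptable. */
    assert(!rejected_by_cap((uint64_t)INT_MAX));
}
```
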
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Willy Tarreau <w@1wt.eu>
    • irda: Fix lockdep annotations in hashbin_delete(). · c512d177
      David S. Miller authored
      commit 4c03b862 upstream.
      
      A nested lock depth was added to the hashbin_delete() code but it
      doesn't actually work so well and results in tons of lockdep splats.
      
      Fix the code instead to properly drop the lock around the operation
      and just keep peeking the head of the hashbin queue.
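
      The "drop the lock around the operation and keep peeking the head"
      pattern can be sketched in userspace C. Everything here is a
      hypothetical model: struct toy_lock only records whether it is held
      (standing in for a spinlock), and the toy_bin_* names are not the
      irda API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock that only tracks its held state; asserts on nested use. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct toy_entry { struct toy_entry *next; int value; };
struct toy_bin { struct toy_lock lock; struct toy_entry *head; };

typedef void (*toy_free_fn)(struct toy_bin *bin, struct toy_entry *e);

/* Add one entry at the head of the bin. */
static void toy_bin_push(struct toy_bin *bin, int value)
{
    struct toy_entry *e = malloc(sizeof(*e));

    assert(e != NULL);
    e->value = value;
    toy_lock_acquire(&bin->lock);
    e->next = bin->head;
    bin->head = e;
    toy_lock_release(&bin->lock);
}

/* Example callback: briefly re-takes the bin lock, then frees the entry. */
static void toy_free_entry(struct toy_bin *bin, struct toy_entry *e)
{
    toy_lock_acquire(&bin->lock);   /* would assert if still held */
    toy_lock_release(&bin->lock);
    free(e);
}

/*
 * Empty the bin.  Each pass takes the lock, unlinks the current head,
 * drops the lock, and only then runs free_fn, so the callback never
 * executes with the bin lock held.  Returns the number of entries freed.
 */
static int toy_bin_drain(struct toy_bin *bin, toy_free_fn free_fn)
{
    int freed = 0;

    for (;;) {
        struct toy_entry *e;

        toy_lock_acquire(&bin->lock);
        e = bin->head;              /* peek the head */
        if (e != NULL)
            bin->head = e->next;    /* unlink while locked */
        toy_lock_release(&bin->lock);

        if (e == NULL)
            break;
        free_fn(bin, e);            /* lock is dropped here */
        freed++;
    }
    return freed;
}
```

      Because the lock is never held across the callback, no nested lock
      annotation is needed in this model.
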
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • Fix missing sanity check in /dev/sg · fd3a84ea
      Al Viro authored
      commit 137d01df upstream.
      
      What happens is that a write to /dev/sg is given a request with non-zero
      ->iovec_count combined with zero ->dxfer_len.  Or with ->dxferp pointing
      to an array full of empty iovecs.
      
      Having write permission to /dev/sg shouldn't be equivalent to the
      ability to trigger BUG_ON() while holding spinlocks...
      
      Found by Dmitry Vyukov and syzkaller.
      
      [ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
        underlying issue.  - Linus ]
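
      The kind of request validation involved can be sketched as follows.
      This is a hypothetical userspace validator, not the sg driver's code:
      check_sg_request() and its parameters are made-up names that mirror
      the ->iovec_count and ->dxfer_len fields mentioned above. It rejects
      a request that declares iovec segments but carries no bytes once the
      total is truncated to dxfer_len.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Validate a transfer description.  On success, store the number of
 * bytes to transfer in *out_len and return 0; otherwise return -EINVAL.
 */
static int check_sg_request(const struct iovec *iov, int iovec_count,
                            size_t dxfer_len, size_t *out_len)
{
    size_t total = 0;
    int i;

    assert(out_len != NULL);
    if (iovec_count < 0)
        return -EINVAL;
    if (iovec_count == 0) {         /* flat buffer, no iovec array */
        *out_len = dxfer_len;
        return 0;
    }
    for (i = 0; i < iovec_count; i++) {
        if (iov[i].iov_len > SIZE_MAX - total)
            return -EINVAL;         /* sum would overflow */
        total += iov[i].iov_len;
    }
    if (total > dxfer_len)
        total = dxfer_len;          /* truncate to the declared length */
    if (total == 0)
        return -EINVAL;             /* segments declared, nothing to move */
    *out_len = total;
    return 0;
}
```
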
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: we're not using iov_iter, but can check the
       byte length after truncation]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • packet: Do not call fanout_release from atomic contexts · 9a9c1bae
      Anoob Soman authored
      commit 2bd624b4 upstream.
      
      Commit 66644982 ("packet: call fanout_release, while UNREGISTERING a
      netdev"), unfortunately, introduced the following issues.
      
      1. calling mutex_lock(&fanout_mutex) (via fanout_release()) from inside an
      RCU read-side critical section. rcu_read_lock() usually disables preemption,
      which prohibits calling sleeping functions.
      
      [  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
      [  ]
      [  ] rcu_scheduler_active = 1, debug_locks = 0
      [  ] 4 locks held by ovs-vswitchd/1969:
      [  ]  #0:  (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40
      [  ]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
      [  ]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20
      [  ]  #3:  (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0
      [  ]
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110
      [  ]  [<ffffffff810a2da7>] ___might_sleep+0x57/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30
      [  ]  [<ffffffff81186e88>] ? printk+0x4d/0x4f
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
      "sleeping function called from invalid context"
      
      [  ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810a2f52>] ___might_sleep+0x202/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      3. calling dev_remove_pack(&fanout->prot_hook), from inside
      spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
      -> synchronize_net(), which might sleep.
      
      [  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff81186274>] __schedule_bug+0x64/0x73
      [  ]  [<ffffffff8162b8cb>] __schedule+0x6b/0xd10
      [  ]  [<ffffffff8162c5db>] schedule+0x6b/0x80
      [  ]  [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410
      [  ]  [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810
      [  ]  [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10
      [  ]  [<ffffffff8154eab5>] synchronize_net+0x35/0x50
      [  ]  [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20
      [  ]  [<ffffffff8161077e>] fanout_release+0xbe/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      4. fanout_release() races with calls from different CPUs.
      
      To fix the above problems, remove the call to fanout_release() under
      rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
      netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
      to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
      __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
      fanout->prot_hook is removed as well.
      
      Fixes: 66644982 ("packet: call fanout_release, while UNREGISTERING a netdev")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Don't call fanout_release_data()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • packet: call fanout_release, while UNREGISTERING a netdev · 4d872cbe
      Anoob Soman authored
      commit 66644982 upstream.
      
      If a socket has FANOUT sockopt set, a new prot_hook is registered
      as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
      af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
      registered as part of fanout_add is not removed. Call fanout_release, on a
      NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
      fanout_list.
      
      This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
      Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • vfs: fix uninitialized flags in splice_to_pipe() · da3fd214
      Miklos Szeredi authored
      commit 5a81e6a1 upstream.
      
      Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
      unused part of the pipe ring buffer.  Previously splice_to_pipe() left
      the flags value alone, which could result in incorrect behavior.
      
      The uninitialized flags appear to have been there since the introduction
      of the splice syscall.
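
      How stale flags can survive in a reused ring slot, and how setting
      the field on every fill avoids it, can be modelled in a few lines.
      This is a hypothetical userspace ring, not the pipe code: the toy_*
      names and TOY_FLAG_* values are stand-ins for the pipe buffer and
      the PIPE_BUF_FLAG_* bits.

```c
#include <assert.h>

#define TOY_RING_SLOTS  4
#define TOY_FLAG_PACKET 0x1u   /* stand-in for PIPE_BUF_FLAG_PACKET */
#define TOY_FLAG_GIFT   0x2u   /* stand-in for PIPE_BUF_FLAG_GIFT */

struct toy_buf {
    const void *data;
    unsigned int len;
    unsigned int flags;
};

struct toy_ring {
    struct toy_buf bufs[TOY_RING_SLOTS];
    unsigned int head;   /* index of the oldest used slot */
    unsigned int count;  /* number of used slots */
};

/*
 * Fill the next free slot.  The flags field is always written, so bits
 * left behind by a previous user of the slot cannot leak through.
 * Returns 0 on success, -1 if the ring is full.
 */
static int toy_ring_push(struct toy_ring *r, const void *data,
                         unsigned int len, unsigned int flags)
{
    struct toy_buf *b;

    if (r->count == TOY_RING_SLOTS)
        return -1;
    b = &r->bufs[(r->head + r->count) % TOY_RING_SLOTS];
    b->data = data;
    b->len = len;
    b->flags = flags;    /* never inherit the slot's old value */
    r->count++;
    return 0;
}

/* Release the oldest slot; its contents, including flags, stay behind. */
static void toy_ring_pop(struct toy_ring *r)
{
    assert(r->count > 0);
    r->head = (r->head + 1) % TOY_RING_SLOTS;
    r->count--;
}
```

      A producer that skipped the flags assignment would hand out slot 0
      with whatever bits its previous occupant had set.
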
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context, indentation]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net: xilinx_emaclite: fix receive buffer overflow · 4091620d
      Anssi Hannula authored
      commit cd224553 upstream.
      
      xilinx_emaclite looks at the received data to try to determine the
      Ethernet packet length but does not properly clamp it if
      proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
      overflow and a panic via skb_panic() as the length exceeds the allocated
      skb size.
      
      Fix those cases.
      
      Also add an additional unconditional check with WARN_ON() at the end.
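
      The general idea of clamping a length taken from received data can
      be sketched like this. It is a generic userspace illustration, not
      the driver's code: clamp_rx_len() and copy_frame() are hypothetical
      names, and the source buffer is assumed to hold at least as many
      bytes as end up being copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clamp a length parsed from untrusted frame data to the buffer size. */
static size_t clamp_rx_len(size_t claimed_len, size_t buf_size)
{
    return claimed_len > buf_size ? buf_size : claimed_len;
}

/*
 * Copy a received frame into dst without ever writing past dst_size.
 * Returns the number of bytes actually copied.
 */
static size_t copy_frame(unsigned char *dst, size_t dst_size,
                         const unsigned char *src, size_t claimed_len)
{
    size_t n = clamp_rx_len(claimed_len, dst_size);

    /* Unconditional final check, in the spirit of the added WARN_ON(). */
    assert(n <= dst_size);
    memcpy(dst, src, n);
    return n;
}
```
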
      Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
      Fixes: bb81b2dd ("net: add Xilinx emac lite device driver")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • siano: make it work again with CONFIG_VMAP_STACK · f606625d
      Mauro Carvalho Chehab authored
      commit f9c85ee6 upstream.
      
      Reported as a Kaffeine bug:
      	https://bugs.kde.org/show_bug.cgi?id=375811
      
      The USB control messages require DMA to work. We cannot pass
      a stack-allocated buffer, as it is not guaranteed that the
      stack is in a DMA-capable area.
      
      On Kernel 4.9, the default is to no longer accept DMA on the stack
      on the x86 architecture. On other architectures, this has been a
      requirement since Kernel 2.2. So, after this patch, this driver
      should likely work fine on all archs.
      
      Tested with USB ID 2040:5510: Hauppauge Windham
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      [bwh: Backported to 3.2:
       - s/sms_msg_hdr/SmsMsgHdr_ST/
       - Adjust filename, context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • packet: fix races in fanout_add() · 382299a0
      Eric Dumazet authored
      commit d199fab6 upstream.
      
      Multiple threads can call fanout_add() at the same time.
      
      We need to grab fanout_mutex earlier to avoid races that could
      lead to one thread freeing po->rollover that was set by another thread.
      
      Do the same in fanout_release(), for peace of mind, and to help us
      find lockdep issues earlier.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - No rollover queue stats
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • futex: Move futex_init() to core_initcall · 1560f610
      Yang Yang authored
      commit 25f71d1c upstream.
      
      The UEVENT user mode helper is enabled before the initcalls are executed
      and is available when the root filesystem has been mounted.
      
      The user mode helper is triggered by device init calls and the executable
      might use the futex syscall.
      
      futex_init() is marked __initcall which maps to device_initcall, but there
      is no guarantee that futex_init() is invoked _before_ the first device init
      call which triggers the UEVENT user mode helper.
      
      If the user mode helper uses the futex syscall before futex_init() then the
      syscall crashes with a NULL pointer dereference because the futex subsystem
      has not been initialized yet.
      
      Move futex_init() to core_initcall so futexes are initialized before the
      root filesystem is mounted and the usermode helper becomes available.
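
      The ordering problem can be modelled with a toy initcall runner.
      This is a simplified userspace sketch, not the kernel's initcall
      machinery: all toy_* names are hypothetical, calls within one level
      are assumed to run in table order, and the only fact borrowed from
      the kernel is that the core level runs before the device level.

```c
#include <assert.h>
#include <stddef.h>

/* Numeric order mirrors core_initcall (1) before device_initcall (6). */
enum toy_level { TOY_CORE = 1, TOY_DEVICE = 6 };

struct toy_initcall {
    enum toy_level level;
    void (*fn)(void);
};

static int futex_ready;
static int helper_saw_futex_ready = -1;

/* Stands in for futex_init(). */
static void toy_futex_init(void)
{
    futex_ready = 1;
}

/* Stands in for a device init call that triggers a futex-using helper. */
static void toy_device_init(void)
{
    helper_saw_futex_ready = futex_ready;
}

/* Run all calls level by level; within a level, in table order. */
static void toy_run_initcalls(const struct toy_initcall *calls, size_t n)
{
    int level;
    size_t i;

    assert(calls != NULL || n == 0);
    for (level = 0; level <= 7; level++)
        for (i = 0; i < n; i++)
            if ((int)calls[i].level == level)
                calls[i].fn();
}
```

      With both calls at TOY_DEVICE the helper may run first and see an
      uninitialized futex subsystem; moving toy_futex_init() to TOY_CORE
      makes the order independent of the table layout.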
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
      Cc: jiang.biao2@zte.com.cn
      Cc: jiang.zhengxiong@zte.com.cn
      Cc: zhong.weidong@zte.com.cn
      Cc: deng.huali@zte.com.cn
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • net/llc: avoid BUG_ON() in skb_orphan() · 0887b3f3
      Eric Dumazet authored
      commit 8b74d439 upstream.
      
      It seems nobody used LLC since linux-3.12.
      
      Fortunately fuzzers like syzkaller still know how to run this code,
      otherwise it would be no fun.
      
      Setting skb->sk without skb->destructor leads to all kinds of
      bugs, we now prefer to be very strict about it.
      
      Ideally here we would use skb_set_owner() but this helper does not exist yet,
      only CAN seems to have a private helper for that.
      
      Fixes: 376c7311 ("net: add a temporary sanity check in skb_orphan()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>