1. 19 May, 2014 40 commits
    • net: rework recvmsg handler msg_name and msg_namelen logic · 4485f23c
      Hannes Frederic Sowa authored
      CVE-2013-7266
      
      BugLink: http://bugs.launchpad.net/bugs/1267081
      
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (back ported from commit f3d33426)
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com>
      Acked-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • inet: prevent leakage of uninitialized memory to user in recv syscalls · f8e4f8fc
      Hannes Frederic Sowa authored
      [ Upstream commit bceaa902 ]
      
      Only update *addr_len when we actually fill in sockaddr, otherwise we
      can return uninitialized memory from the stack to the caller in the
      recvfrom, recvmmsg and recvmsg syscalls. Drop the (addr_len == NULL)
      checks because we only get called with a valid addr_len pointer either
      from sock_common_recvmsg or inet_recvmsg.
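
      The rule in the paragraph above can be illustrated with a minimal
      user-space sketch. struct fake_addr and fake_recv() are hypothetical
      stand-ins, not kernel code; the only point is that the length is
      written exactly when the address buffer is filled.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical stand-in for a socket address; not a kernel type. */
      struct fake_addr {
      	unsigned short family;
      	unsigned char data[14];
      };

      /*
       * Hypothetical receive helper. addr may be NULL (the caller does not
       * want the peer address); addr_len is assumed to be a valid pointer,
       * mirroring the guarantee described in the commit message.
       */
      static int fake_recv(struct fake_addr *addr, int *addr_len)
      {
      	if (addr) {
      		/* Fill every byte first, then report how much was written. */
      		memset(addr, 0, sizeof(*addr));
      		addr->family = 2;
      		*addr_len = (int)sizeof(*addr);
      	}
      	/* If addr is NULL, *addr_len is deliberately left untouched. */
      	return 0;
      }
      ```

      A caller that passes NULL for the address sees its length variable
      unchanged, so no stale stack bytes are ever described as valid.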
      
      If a blocking read waits on a socket which is concurrently shut down we
      now return zero and set msg_namelen to 0.

      Reported-by: mpb <mpb.mail@gmail.com>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [wt: no ieee802154, ping nor l2tp in 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv4: fix possible seqlock deadlock · c091517b
      Eric Dumazet authored
      [ Upstream commit c9e90429 ]
      
      ip4_datagram_connect() being called from process context,
      it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
      otherwise we can deadlock on 32bit arches, or get corruptions of
      SNMP counters.
      
      Fixes: 584bdf8c ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • isdnloop: use strlcpy() instead of strcpy() · ed488d77
      Dan Carpenter authored
      [ Upstream commit f9a23c84 ]
      
      These strings come from a copy_from_user() and there is no way to be
      sure they are NUL terminated.
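
      As a rough user-space illustration of why a bounded copy is safer,
      the sketch below uses a hypothetical helper, bounded_copy(), with
      strlcpy()-like semantics. It is not the kernel's implementation, and
      standard C has no strlcpy(), hence the hand-written helper.

      ```c
      #include <stddef.h>
      #include <string.h>

      /*
       * Hypothetical helper with strlcpy()-like semantics: copy at most
       * size - 1 bytes and always NUL-terminate when size > 0. Like
       * strlcpy(), it still reads src up to its terminator, so src itself
       * must be NUL-terminated somewhere.
       */
      static size_t bounded_copy(char *dst, const char *src, size_t size)
      {
      	size_t len = strlen(src);

      	if (size > 0) {
      		size_t n = (len >= size) ? size - 1 : len;

      		memcpy(dst, src, n);
      		dst[n] = '\0';
      	}
      	return len; /* length of src, so callers can detect truncation */
      }
      ```

      A return value greater than or equal to the destination size tells
      the caller that the string was truncated.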

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • bonding: fix two race conditions in bond_store_updelay/downdelay · f1a8a3ff
      Nikolay Aleksandrov authored
      [ Upstream commit b869ccfa ]
      
      This patch fixes two race conditions between bond_store_updelay/downdelay
      and bond_store_miimon which could lead to division by zero: miimon can
      be set to 0 while updelay or downdelay is being set, after the zero
      check at the beginning has already passed. The division by zero happens
      because updelay/downdelay are stored as new_value / bond->params.miimon.
      Use rtnl to synchronize with miimon setting.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • random32: fix off-by-one in seeding requirement · 77ef71a8
      Daniel Borkmann authored
      [ Upstream commit 51c37a70 ]
      
      For properly initialising the Tausworthe generator [1], we have
      a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.
      
      Commit 697f8d03 ("random32: seeding improvement") introduced
      a __seed() function that imposes boundary checks proposed by the
      errata paper [2] to properly ensure above conditions.
      
      However, we're off by one, as the function is implemented as:
      "return (x < m) ? x + m : x;", and called with __seed(X, 1),
      __seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
      would be possible, whereas the lower boundary should actually
      be of at least 2, 8, 16, just as GSL does. Fix this, as otherwise
      an initialization with an unwanted seed could have the effect
      that Tausworthe's PRNG properties cannot be ensured.
      
      Note that this PRNG is *not* used for cryptography in the kernel.
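
      The off-by-one is easy to reproduce in isolation. In the sketch
      below, seed_min() is the expression quoted above under a hypothetical
      name; old_s1()/old_s2() use the old bounds, and new_s1()/new_s2()
      assume the fix simply passes the first acceptable value (2, 8) as the
      bound.

      ```c
      #include <stdint.h>

      /* The helper expression as quoted in the commit message. */
      static uint32_t seed_min(uint32_t x, uint32_t m)
      {
      	return (x < m) ? x + m : x;
      }

      /* Old calls: a bound of 1 or 7 lets the values 1 or 7 through. */
      static uint32_t old_s1(uint32_t x) { return seed_min(x, 1); }
      static uint32_t old_s2(uint32_t x) { return seed_min(x, 7); }

      /* Assumed fix: pass the smallest acceptable value as the bound. */
      static uint32_t new_s1(uint32_t x) { return seed_min(x, 2); }
      static uint32_t new_s2(uint32_t x) { return seed_min(x, 8); }
      ```

      With the old bounds, a seed of exactly 1 or 7 is returned unchanged
      and violates s1 > 1 or s2 > 7; with the raised bounds it is bumped
      into the valid range.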
      
       [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
       [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 697f8d03 ("random32: seeding improvement")
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: use rt6_get_dflt_router to get default router in rt6_route_rcv · e2c48fd8
      Duan Jiong authored
      [ Upstream commit f104a567 ]
      
      As RFC 4191 says, the Router Preference and Lifetime values in a
      ::/0 Route Information Option should override the preference and lifetime
      values in the Router Advertisement header. But when the kernel deals with
      a ::/0 Route Information Option, rt6_get_route_info() always returns
      NULL, which means the override never happens, because those default
      routers were added without the RTF_ROUTEINFO flag in rt6_add_dflt_router().
      
      In order to deal with that condition, we should call rt6_get_dflt_router
      when the prefix length is 0.

      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: Fix "ip rule delete table 256" · 062e4edd
      Andreas Henriksson authored
      [ Upstream commit 13eb2ab2 ]
      
      When trying to delete a table >= 256 using iproute2 the local table
      will be deleted.
      The table id is specified as a netlink attribute when it needs more than
      8 bits and iproute2 then sets the table field to RT_TABLE_UNSPEC (0).
      The preconditions for matching the table id in the rule delete code
      don't seem to take the "table id in netlink attribute" case into account,
      so the frh_get_table helper function never gets to do its job when
      matching against the current rule.
      Use the helper function twice instead of peeking at the table value directly.
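
      A simplified model of the matching logic may help. The types and
      names below (struct rule_req, req_table(), table_matches()) are
      hypothetical and only mimic the idea of preferring the attribute
      over the 8-bit header field; they are not the kernel's structures.

      ```c
      #include <stdint.h>
      #include <stddef.h>

      /* Hypothetical, simplified request: an 8-bit header field plus an
       * optional 32-bit attribute (NULL when absent). */
      struct rule_req {
      	uint8_t hdr_table;
      	const uint32_t *attr_table;
      };

      /* Prefer the attribute when present, else fall back to the header. */
      static uint32_t req_table(const struct rule_req *req)
      {
      	return req->attr_table ? *req->attr_table : req->hdr_table;
      }

      /* Match a request against an existing rule's table id. A resolved
       * id of 0 (unspecified) is treated as "no table constraint". */
      static int table_matches(const struct rule_req *req, uint32_t rule_table)
      {
      	uint32_t wanted = req_table(req);

      	return wanted == 0 || wanted == rule_table;
      }
      ```

      Because both the "is a table given?" test and the comparison go
      through req_table(), a request for table 256 (header field 0,
      attribute 256) no longer matches a rule that points at another table.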
      
      Originally reported at: http://bugs.debian.org/724783

      Reported-by: Nicolas HICHER <nhicher@avencall.com>
      Signed-off-by: Andreas Henriksson <andreas@fatal.se>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tipc: fix lockdep warning during bearer initialization · b0027af4
      Ying Xue authored
      [ Upstream commit 4225a398 ]
      
      When the lockdep validator is enabled, it will report the below
      warning when we enable a TIPC bearer:
      
      [ INFO: possible irq lock inversion dependency detected ]
      ---------------------------------------------------------
      Possible interrupt unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(ptype_lock);
                                      local_irq_disable();
                                      lock(tipc_net_lock);
                                      lock(ptype_lock);
         <Interrupt>
         lock(tipc_net_lock);
      
        *** DEADLOCK ***
      
      the shortest dependencies between 2nd lock and 1st lock:
        -> (ptype_lock){+.+...} ops: 10 {
      [...]
      SOFTIRQ-ON-W at:
                            [<c1089418>] __lock_acquire+0x528/0x13e0
                            [<c108a360>] lock_acquire+0x90/0x100
                            [<c1553c38>] _raw_spin_lock+0x38/0x50
                            [<c14651ca>] dev_add_pack+0x3a/0x60
                            [<c182da75>] arp_init+0x1a/0x48
                            [<c182dce5>] inet_init+0x181/0x27e
                            [<c1001114>] do_one_initcall+0x34/0x170
                            [<c17f7329>] kernel_init+0x110/0x1b2
                            [<c155b6a2>] kernel_thread_helper+0x6/0x10
      [...]
         ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
         ... acquired at:
          [<c108a360>] lock_acquire+0x90/0x100
          [<c1553c38>] _raw_spin_lock+0x38/0x50
          [<c14651ca>] dev_add_pack+0x3a/0x60
          [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
          [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
          [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
          [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
          [<c148e802>] genl_rcv_msg+0x232/0x280
          [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
          [<c148e5bc>] genl_rcv+0x1c/0x30
          [<c148d144>] netlink_unicast+0x174/0x1f0
          [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
          [<c1456bc1>] sock_aio_write+0x161/0x170
          [<c1135a7c>] do_sync_write+0xac/0xf0
          [<c11360f6>] vfs_write+0x156/0x170
          [<c11361e2>] sys_write+0x42/0x70
          [<c155b0df>] sysenter_do_call+0x12/0x38
      [...]
      }
        -> (tipc_net_lock){+..-..} ops: 4 {
      [...]
          IN-SOFTIRQ-R at:
                           [<c108953a>] __lock_acquire+0x64a/0x13e0
                           [<c108a360>] lock_acquire+0x90/0x100
                           [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                           [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                           [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                           [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                           [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                           [<c13c43d2>] pcnet32_poll+0x292/0x780
                           [<c146b00a>] net_rx_action+0xfa/0x1e0
                           [<c103a4be>] __do_softirq+0xae/0x1e0
      [...]
      }

      From the log, we can see three different call chains between
      CPU0 and CPU1:
      
      Time 0 on CPU0:
      
        kernel_init()->inet_init()->dev_add_pack()
      
      At time 0, the ptype_lock is held by CPU0 in dev_add_pack();
      
      Time 1 on CPU1:
      
        tipc_enable_bearer()->enable_bearer()->dev_add_pack()
      
      At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
      wants to take ptype_lock to register TIPC protocol handler into the
      networking stack.  But the ptype_lock has been taken by dev_add_pack()
      on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
      busy looping.
      
      Time 2 on CPU0:
      
        netif_receive_skb()->recv_msg()->tipc_recv_msg()
      
      At time 2, an incoming TIPC packet arrives at CPU0, hence
      tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
      to hold tipc_net_lock.  At the moment, below scenario happens:
      
      On CPU0, below is our sequence of taking locks:
      
        lock(ptype_lock)->lock(tipc_net_lock)
      
      On CPU1, our sequence of taking locks looks like:
      
        lock(tipc_net_lock)->lock(ptype_lock)
      
      Obviously deadlock may happen in this case.
      
      But please note the deadlock possibly doesn't occur at all when the
      first TIPC bearer is enabled.  Until enable_bearer(), running on
      CPU1, has taken ptype_lock, the TIPC receive handler (i.e.
      recv_msg()) is not yet registered via dev_add_pack(), so
      tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
      message comes to CPU0. But when the second TIPC bearer is
      registered, the deadlock can really happen.
      
      To fix it, we will push the work of registering TIPC protocol
      handler into workqueue context. After the change, both paths taking
      ptype_lock are always in process contexts, thus, the deadlock should
      never occur.

      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO · 43350267
      Jiri Bohac authored
      [ Upstream commit 61e76b17 ]
      
      RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination
      unreachable) messages:
              5 - Source address failed ingress/egress policy
              6 - Reject route to destination

      Currently they are treated as protocol errors and icmpv6_err_convert()
      converts them to EPROTO.
      
      RFC 4443 says:
      	"Codes 5 and 6 are more informative subsets of code 1."
      
      Treat codes 5 and 6 as code 1 (EACCES)
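
      The intended mapping can be sketched in user space as follows. The
      enum names and unreach_to_errno() are illustrative only; just the
      codes discussed in this message are modelled, and every other code
      falls through to EPROTO purely for the sake of the example.

      ```c
      #include <errno.h>

      /* ICMPv6 destination-unreachable codes discussed in the commit
       * message; the enum names are illustrative. */
      enum {
      	UNREACH_ADMIN_PROHIBITED = 1,
      	UNREACH_POLICY_FAIL = 5,
      	UNREACH_REJECT_ROUTE = 6,
      };

      static int unreach_to_errno(int code)
      {
      	switch (code) {
      	case UNREACH_ADMIN_PROHIBITED:
      	case UNREACH_POLICY_FAIL:
      	case UNREACH_REJECT_ROUTE:
      		return EACCES;
      	default:
      		/* Other codes are not modelled in this sketch. */
      		return EPROTO;
      	}
      }
      ```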
      
      Btw, connect() returning -EPROTO confuses firefox, so that fallback to
      other/IPv4 addresses does not work:
      https://bugzilla.mozilla.org/show_bug.cgi?id=910773

      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: Don't depend on per socket memory for neighbour discovery messages · 8eb7263e
      Thomas Graf authored
      [ Upstream commit 25a6e6b8 ]
      
      Allocating skbs when sending out neighbour discovery messages
      currently uses sock_alloc_send_skb() based on a per net namespace
      socket and thus shares a socket wmem buffer space.

      If a netdevice is temporarily unable to transmit due to carrier
      loss or for other reasons, the queued up ndisc messages will consume
      all of the wmem space and will thus prevent any more skbs from
      being allocated even for netdevices that are able to transmit packets.

      The number of neighbour discovery messages sent is very limited.
      Use of alloc_skb() bypasses the socket wmem buffer size enforcement
      while the manual call to skb_set_owner_w() maintains the socket
      reference needed for the IPv6 output path.

      This patch was originally posted by Eric Dumazet in a modified
      form.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Fabio Estevam <festevam@gmail.com>
      Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: drop packets with multiple fragmentation headers · ede46183
      Hannes Frederic Sowa authored
      [ Upstream commit f46078cf ]
      
      It is not allowed for an ipv6 packet to contain multiple fragmentation
      headers. So discard packets which were already reassembled by
      fragmentation logic and send back a parameter problem icmp.
      
      The updates for RFC 6980 will come in later, I have to do a bit more
      research here.
      
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: remove max_addresses check from ipv6_create_tempaddr · 789ee2ae
      Hannes Frederic Sowa authored
      [ Upstream commit 4b08a8f1 ]
      
      Because of the max_addresses check attackers were able to disable privacy
      extensions on an interface by creating enough autoconfigured addresses:
      
      <http://seclists.org/oss-sec/2012/q4/292>
      
      But the check is not actually needed: max_addresses protects the
      kernel from installing too many ipv6 addresses on an interface and guards
      addrconf_prefix_rcv against installing further addresses as soon as this
      limit is reached. We only generate temporary addresses in direct response
      to a new address showing up. As soon as we have filled up the maximum
      number of addresses of an interface, we stop installing more addresses
      and thus also stop generating more temp addresses.

      Even if the attacker tries to generate a lot of temporary addresses
      by announcing a prefix and removing it again (lifetime == 0) we won't
      install more temp addresses, because the temporary addresses count
      towards the maximum number of addresses, thus we would stop installing
      new autoconfigured addresses when the limit is reached.
      
      This patch fixes CVE-2013-0343 (but other layer-2 attacks are still
      possible).
      
      Thanks to Ding Tianhong for bringing this topic up again.
      
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: George Kargiotakis <kargig@void.gr>
      Cc: P J P <ppandit@redhat.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match · 8d4e8ab0
      Hannes Frederic Sowa authored
      [ Upstream commit 3e3be275 ]
      
      In case a subtree did not match we currently stop backtracking and return
      NULL (root table from fib_lookup). This could result in invalid routing
      table lookups when using subtrees.
      
      Instead continue to backtrack until a valid subtree or node is found
      and return this match.
      
      Also remove unneeded NULL check.

      Reported-by: Teco Boot <teco@inf-net.nl>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: David Lamparter <equinox@diac24.net>
      Cc: <boutier@pps.univ-paris-diderot.fr>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: cubic: fix bug in bictcp_acked() · 91b5dbd8
      Eric Dumazet authored
      [ Upstream commit cd6b423a ]
      
      While investigating a strange increase in retransmit rates
      on hosts ~24 days after boot, Van found hystart was disabled
      if ca->epoch_start was 0, as the following condition is true
      when the tcp_time_stamp high order bit is set.
      
      (s32)(tcp_time_stamp - ca->epoch_start) < HZ
      
      Quoting Van :
      
       At initialization & after every loss ca->epoch_start is set to zero so
       I believe that the above line will turn off hystart as soon as the 2^31
       bit is set in tcp_time_stamp & hystart will stay off for 24 days.
       I think we've observed that cubic's restart is too aggressive without
       hystart so this might account for the higher drop rate we observe.
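
      The wraparound can be reproduced with plain 32-bit arithmetic. The
      sketch below assumes HZ is 1000 (so bit 31 of a millisecond counter
      is first set roughly 24.8 days after boot) and uses hypothetical
      function names. The guarded variant is one plausible way to skip the
      check while the epoch is unset; this message does not spell out the
      actual fix.

      ```c
      #include <stdint.h>

      #define HZ 1000 /* assumed tick rate */

      /* The check quoted in the commit message. The conversion to int32_t
       * relies on the usual two's-complement behaviour. */
      static int recently_started_old(uint32_t now, uint32_t epoch_start)
      {
      	return (int32_t)(now - epoch_start) < HZ;
      }

      /* Assumed guard: an unset epoch (0) never counts as "recent". */
      static int recently_started_guarded(uint32_t now, uint32_t epoch_start)
      {
      	return epoch_start && (int32_t)(now - epoch_start) < HZ;
      }
      ```

      With epoch_start at 0 and the high bit of the timestamp set, the
      signed difference is negative, so the old check reports "recent" for
      as long as that bit stays set.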

      Diagnosed-by: Van Jacobson <vanj@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: check net.core.somaxconn sysctl values · eedcafdc
      Roman Gushchin authored
      [ Upstream commit 5f671d6b ]
      
      It's possible to assign an invalid value to the net.core.somaxconn
      sysctl variable, because there are no checks at all.
      
      The sk_max_ack_backlog field of the sock structure is defined as
      unsigned short. Therefore, the backlog argument in inet_listen()
      shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
      is truncated to the somaxconn value. So, the somaxconn value shouldn't
      exceed 65535 (USHRT_MAX).
      Also, negative values of somaxconn are meaningless.
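
      The reasoning above boils down to a range check plus the observation
      that an unsigned short silently wraps. Both helpers below are
      hypothetical user-space illustrations, not the sysctl handler itself.

      ```c
      #include <limits.h>

      /* Hypothetical validator: 0 if the value is acceptable, -1 otherwise. */
      static int somaxconn_valid(long value)
      {
      	return (value >= 0 && value <= USHRT_MAX) ? 0 : -1;
      }

      /* What an unchecked value would become once stored in an unsigned
       * short backlog field: it is reduced modulo USHRT_MAX + 1. */
      static unsigned short as_backlog(long value)
      {
      	return (unsigned short)value;
      }
      ```

      An unchecked value of USHRT_MAX + 1 would end up as a backlog of 0,
      which is why rejecting it at the sysctl boundary is the safer choice.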
      
      before:
      $ sysctl -w net.core.somaxconn=256
      net.core.somaxconn = 256
      $ sysctl -w net.core.somaxconn=65536
      net.core.somaxconn = 65536
      $ sysctl -w net.core.somaxconn=-100
      net.core.somaxconn = -100
      
      after:
      $ sysctl -w net.core.somaxconn=256
      net.core.somaxconn = 256
      $ sysctl -w net.core.somaxconn=65536
      error: "Invalid argument" setting key "net.core.somaxconn"
      $ sysctl -w net.core.somaxconn=-100
      error: "Invalid argument" setting key "net.core.somaxconn"
      
      Based on a prior patch from Changli Gao.

      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Reported-by: Changli Gao <xiaosuo@gmail.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • htb: fix sign extension bug · 63251ed0
      stephen hemminger authored
      [ Upstream commit cbd37556 ]
      
      When userspace passes a large priority value,
      the assignment of the unsigned value hopt->prio
      to the signed int cl->prio causes cl->prio to become negative, and the
      comparison with TC_HTB_NUMPRIO is always false.
      
      The result is that HTB crashes by referencing outside
      the array when processing packets. With this patch the large value
      wraps around like other values outside the normal range.
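
      The sign problem can be shown with two small functions. NUMPRIO
      stands in for TC_HTB_NUMPRIO with an assumed value of 8, and limiting
      out-of-range values to NUMPRIO - 1 is likewise an assumption made for
      this sketch; the function names are hypothetical.

      ```c
      #define NUMPRIO 8 /* stands in for TC_HTB_NUMPRIO; value assumed */

      /* Buggy shape: the unsigned input lands in a signed int first, so a
       * huge value turns negative (on the usual two's-complement targets)
       * and slips past the upper-bound check. */
      static int prio_signed(unsigned int requested)
      {
      	int prio = (int)requested;

      	if (prio >= NUMPRIO)
      		prio = NUMPRIO - 1;
      	return prio;
      }

      /* Compare while the value is still unsigned. */
      static unsigned int prio_unsigned(unsigned int requested)
      {
      	return (requested >= NUMPRIO) ? NUMPRIO - 1 : requested;
      }
      ```

      A negative result from prio_signed() is exactly the kind of value
      that would later index outside a NUMPRIO-sized array.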
      
      See: https://bugzilla.kernel.org/show_bug.cgi?id=60669

      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net_sched: info leak in atm_tc_dump_class() · f5a2119c
      Dan Carpenter authored
      [ Upstream commit 8cb3b9c3 ]
      
      The "pvc" struct has a hole after pvc.sap_family which is not cleared.
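
      A generic illustration of this class of leak and the usual remedy:
      clear the whole structure before filling in the members. struct
      sample is a hypothetical layout, not the real "pvc" struct, and the
      sketch relies on the common compiler behaviour (which kernel code
      also assumes) that member stores leave already-cleared padding alone.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical layout with a padding hole after `family` on
       * typical ABIs. */
      struct sample {
      	unsigned short family;
      	int value;
      };

      static void fill_sample(struct sample *out, unsigned short family,
      			int value)
      {
      	memset(out, 0, sizeof(*out)); /* clears padding as well as fields */
      	out->family = family;
      	out->value = value;
      }
      ```

      Without the memset(), whatever happened to be on the stack in the
      hole would be copied out along with the real members.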

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • af_key: more info leaks in pfkey messages · f084fd38
      Dan Carpenter authored
      [ Upstream commit ff862a46 ]
      
      This is inspired by a5cc68f3 "af_key: fix info leaks in notify
      messages".  There are some struct members which don't get initialized
      and could disclose small amounts of private information.

      Acked-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net_sched: Fix stack info leak in cbq_dump_wrr(). · dd0938fa
      David S. Miller authored
      [ Upstream commit a0db856a ]
      
      Make sure the reserved fields, and padding (if any), are
      fully initialized.
      
      Based upon a patch by Dan Carpenter and feedback from
      Joe Perches.

      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • sctp: fully initialize sctp_outq in sctp_outq_init · e409e24a
      Neil Horman authored
      [ Upstream commit c5c7774d ]
      
      In commit 2f94aabd
      (refactor sctp_outq_teardown to insure proper re-initalization)
      we modified sctp_outq_teardown to use sctp_outq_init to fully re-initialize the
      outq structure.  Steve West recently asked me why I removed the q->error = 0
      initialization from sctp_outq_teardown.  I did so because I was operating under
      the impression that sctp_outq_init would properly initialize that value for us,
      but it doesn't.  sctp_outq_init operates under the assumption that the outq
      struct is all 0's (as it is when called from sctp_association_init), but using
      it in __sctp_outq_teardown violates that assumption. We should do a memset in
      sctp_outq_init to ensure that the entire structure is in a known state there
      instead.

      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: "West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: netdev@vger.kernel.org
      CC: davem@davemloft.net
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • sysctl net: Keep tcp_syn_retries inside the boundary · b7c9e4ee
      Michal Tesar authored
      [ Upstream commit 651e9271 ]
      
      Limit the min/max value passed to the
      /proc/sys/net/ipv4/tcp_syn_retries.

      Signed-off-by: Michal Tesar <mtesar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • arcnet: cleanup sizeof parameter · 8c46d377
      Dan Carpenter authored
      [ Upstream commit 087d273c ]
      
      This patch doesn't change the compiled code because ARC_HDR_SIZE is 4
      and sizeof(int) is 4, but the intent was to use the header size and not
      the sizeof the header size.
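
      The coincidence described above is easy to see in isolation. The
      macro value comes from this message; the two helper functions are
      hypothetical and exist only to contrast the expressions.

      ```c
      #include <stddef.h>

      #define ARC_HDR_SIZE 4 /* value stated in the commit message */

      /* sizeof applied to the macro measures the type of the constant 4
       * (int), not the header length; the two agree only where
       * sizeof(int) == 4. */
      static size_t hdr_len_accidental(void) { return sizeof(ARC_HDR_SIZE); }

      /* The intended expression: the header length itself. */
      static size_t hdr_len_intended(void) { return ARC_HDR_SIZE; }
      ```

      If the header size ever changed, only the second form would follow
      it.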

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • vlan: fix a race in egress prio management · 28a7606f
      Eric Dumazet authored
      [ Upstream commit 3e3aac49 ]
      
      egress_priority_map[] hash table updates are protected by rtnl,
      and we never remove elements until device is dismantled.
      
      We have to make sure that before inserting a new element in hash table,
      all its fields are committed to memory or else another cpu could
      find corrupt values and crash.
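
      The ordering requirement can be sketched in portable C11 using
      release/acquire atomics in place of kernel memory barriers. This is
      a swapped-in technique for illustration; struct prio_node, publish()
      and lookup() are hypothetical, and writers are assumed to be
      serialized by an external lock, as rtnl does in the description above.

      ```c
      #include <stdatomic.h>
      #include <stddef.h>

      struct prio_node {
      	unsigned int priority;
      	unsigned int vlan_qos;
      	struct prio_node *next;
      };

      static _Atomic(struct prio_node *) bucket_head;

      /* Writer: fill in every field, then publish with release ordering so
       * the stores above are visible before the node becomes reachable. */
      static void publish(struct prio_node *node, unsigned int prio,
      		    unsigned int qos)
      {
      	node->priority = prio;
      	node->vlan_qos = qos;
      	node->next = atomic_load_explicit(&bucket_head,
      					  memory_order_relaxed);
      	atomic_store_explicit(&bucket_head, node, memory_order_release);
      }

      /* Reader: the acquire load pairs with the release store above. */
      static const struct prio_node *lookup(unsigned int prio)
      {
      	const struct prio_node *n =
      		atomic_load_explicit(&bucket_head, memory_order_acquire);

      	while (n && n->priority != prio)
      		n = n->next;
      	return n;
      }
      ```

      Since elements are never removed while readers run, a reader that
      sees a node at all also sees its fully initialized fields.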

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ifb: fix oops when loading the ifb failed · 5393c4b2
      dingtianhong authored
      [ Upstream commit f2966cd5 ]
      
      If __rtnl_link_register() fails when loading ifb, the error handling
      takes the wrong path and triggers an oops, so fix it just like dummy.

      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • dummy: fix oops when loading the dummy failed · 8665a8c4
      dingtianhong authored
      [ Upstream commit 2c8a0189 ]
      
      We rename the dummy in modprobe.conf like this:
      
      install dummy0 /sbin/modprobe -o dummy0 --ignore-install dummy
      install dummy1 /sbin/modprobe -o dummy1 --ignore-install dummy
      
      We got an oops when we ran the commands:
      
      modprobe dummy0
      modprobe dummy1
      
      ------------[ cut here ]------------
      
      [ 3302.187584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 3302.195411] IP: [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
      [ 3302.201844] PGD 85c94a067 PUD 8517bd067 PMD 0
      [ 3302.206305] Oops: 0002 [#1] SMP
      [ 3302.299737] task: ffff88105ccea300 ti: ffff880eba4a0000 task.ti: ffff880eba4a0000
      [ 3302.307186] RIP: 0010:[<ffffffff813fe62a>]  [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
      [ 3302.316044] RSP: 0018:ffff880eba4a1dd8  EFLAGS: 00010246
      [ 3302.321332] RAX: 0000000000000000 RBX: ffffffff81a9d738 RCX: 0000000000000002
      [ 3302.328436] RDX: 0000000000000000 RSI: ffffffffa04d602c RDI: ffff880eba4a1dd8
      [ 3302.335541] RBP: ffff880eba4a1e18 R08: dead000000200200 R09: dead000000100100
      [ 3302.342644] R10: 0000000000000080 R11: 0000000000000003 R12: ffffffff81a9d788
      [ 3302.349748] R13: ffffffffa04d7020 R14: ffffffff81a9d670 R15: ffff880eba4a1dd8
      [ 3302.364910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3302.370630] CR2: 0000000000000008 CR3: 000000085e15e000 CR4: 00000000000427e0
      [ 3302.377734] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
      [ 3302.384838] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 3302.391940] Stack:
      [ 3302.393944]  ffff880eba4a1dd8 ffff880eba4a1dd8 ffff880eba4a1e18 ffffffffa04d70c0
      [ 3302.401350]  00000000ffffffef ffffffffa01a8000 0000000000000000 ffffffff816111c8
      [ 3302.408758]  ffff880eba4a1e48 ffffffffa01a80be ffff880eba4a1e48 ffffffffa04d70c0
      [ 3302.416164] Call Trace:
      [ 3302.418605]  [<ffffffffa01a8000>] ? 0xffffffffa01a7fff
      [ 3302.423727]  [<ffffffffa01a80be>] dummy_init_module+0xbe/0x1000 [dummy0]
      [ 3302.430405]  [<ffffffffa01a8000>] ? 0xffffffffa01a7fff
      [ 3302.435535]  [<ffffffff81000322>] do_one_initcall+0x152/0x1b0
      [ 3302.441263]  [<ffffffff810ab24b>] do_init_module+0x7b/0x200
      [ 3302.446824]  [<ffffffff810ad3d2>] load_module+0x4e2/0x530
      [ 3302.452215]  [<ffffffff8127ae40>] ? ddebug_dyndbg_boot_param_cb+0x60/0x60
      [ 3302.458979]  [<ffffffff810ad5f1>] SyS_init_module+0xd1/0x130
      [ 3302.464627]  [<ffffffff814b9652>] system_call_fastpath+0x16/0x1b
      [ 3302.490090] RIP  [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0
      [ 3302.496607]  RSP <ffff880eba4a1dd8>
      [ 3302.500084] CR2: 0000000000000008
      [ 3302.503466] ---[ end trace 8342d49cd49f78ed ]---
      
      The reason is that when __rtnl_link_register() fails while loading dummy,
      init_module should return right away instead of taking the wrong path.
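
The error-path rule stated above (do not undo a registration that never succeeded) can be sketched as follows. All `demo_*` names and the simulated failures are hypothetical stand-ins, not the driver's real functions; the actual patch may be structured differently.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the link-ops registration calls. */
static bool ops_registered;
static int unregister_calls;

static int demo_link_register(bool simulate_failure)
{
    if (simulate_failure)
        return -EEXIST;          /* registration refused */
    ops_registered = true;
    return 0;
}

static void demo_link_unregister(void)
{
    unregister_calls++;
    ops_registered = false;
}

/* Creates device number idx; fails when idx equals fail_at. */
static int demo_init_one(int idx, int fail_at)
{
    return idx == fail_at ? -ENOMEM : 0;
}

static int demo_init_module(bool register_fails, int numdevs, int fail_at)
{
    int i, err;

    err = demo_link_register(register_fails);
    if (err < 0)
        goto out;                /* nothing was registered, nothing to undo */

    for (i = 0; i < numdevs && !err; i++)
        err = demo_init_one(i, fail_at);
    if (err < 0)
        demo_link_unregister();  /* undo only a registration that succeeded */
out:
    return err;
}
```

Without the early `goto out`, a failed registration would fall through to the unregister call, which is the kind of wrong path the message describes.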
      Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      8665a8c4
    • dingtianhong's avatar
      ifb: fix rcu_sched self-detected stalls · 895f0ade
      dingtianhong authored
      [ Upstream commit 440d57bc ]
      
      According to the commit 16b0dc29
      (dummy: fix rcu_sched self-detected stalls)
      
      Eric Dumazet fixed the problem in dummy, but ifb has the same
      problem as the dummy module.
      
      Trying to "modprobe ifb numifbs=30000" triggers :
      
      INFO: rcu_sched self-detected stall on CPU
      
      After this splat, RTNL is locked and reboot is needed.
      
      We must call cond_resched() to avoid this, even holding RTNL.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [wt: 2.6.32: cond_resched() needs linux/sched.h]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      895f0ade
    • Dave Kleikamp's avatar
      sunvnet: vnet_port_remove must call unregister_netdev · 63d632e5
      Dave Kleikamp authored
      [ Upstream commit aabb9875 ]
      
      The missing call to unregister_netdev() leaves the interface active
      after the driver is unloaded by rmmod.
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      63d632e5
    • Changli Gao's avatar
      net: Swap ver and type in pppoe_hdr · b0d43763
      Changli Gao authored
      [ Upstream commit b1a5a34b ]
      
      Ver and type in pppoe_hdr should be swapped as defined by RFC2516
      section-4.
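
As an illustration of the layout in RFC 2516 section 4, where VER occupies the high four bits of the first octet and TYPE the low four bits, here is a minimal sketch using explicit shifts instead of bitfields. The helper names are hypothetical and are not part of the kernel header.

```c
#include <stdint.h>

/* Pack VER into the high nibble and TYPE into the low nibble. */
static uint8_t pppoe_pack_ver_type(unsigned int ver, unsigned int type)
{
    return (uint8_t)(((ver & 0xfu) << 4) | (type & 0xfu));
}

/* Extract VER (high nibble) from the first header octet. */
static unsigned int pppoe_ver(uint8_t first_octet)
{
    return first_octet >> 4;
}

/* Extract TYPE (low nibble) from the first header octet. */
static unsigned int pppoe_type(uint8_t first_octet)
{
    return first_octet & 0xfu;
}
```

Since RFC 2516 requires both fields to be 0x1, the octet is 0x11 whichever way the two fields are ordered, which may explain why swapped field names could go unnoticed.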
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      b0d43763
    • Eric Dumazet's avatar
      neighbour: fix a race in neigh_destroy() · af6739a5
      Eric Dumazet authored
      [ Upstream commit c9ab4d85 ]
      
      There is a race in neighbour code, because neigh_destroy() uses
      skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
      while other parts of the code assume neighbour rwlock is what
      protects arp_queue
      
      Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
      
      Use __skb_queue_head_init() instead of skb_queue_head_init()
      to make clear we do not use arp_queue.lock
      
      And hold neigh->lock in neigh_destroy() to close the race.
      Reported-by: Joe Jin <joe.jin@oracle.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      af6739a5
    • Daniel Borkmann's avatar
      packet: packet_getname_spkt: make sure string is always 0-terminated · 6597b256
      Daniel Borkmann authored
      [ Upstream commit 2dc85bf3 ]
      
      uaddr->sa_data is exactly of size 14, which is hard-coded here and
      passed as a size argument to strncpy(). A device name can be of size
      IFNAMSIZ (== 16), meaning we might leave the destination string
      unterminated. Thus, use strlcpy() and also sizeof() while we're
      at it. We need to memset the data area beforehand, since strlcpy
      does not pad the remaining buffer with zeroes for user space, so
      that we do not possibly leak anything.
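
The combination described above (zero the destination first, then do a bounded copy that always leaves a terminator) can be sketched in portable C. `fill_name()` and `SA_DATA_LEN` are hypothetical names; since strlcpy() is not available in every C library, the sketch uses memset() plus a length-clamped memcpy() to get the same effect.

```c
#include <string.h>

#define SA_DATA_LEN 14  /* size of sa_data, as stated in the message */

/*
 * Copy an interface name into a 14-byte buffer so that the result is
 * always NUL-terminated and every unused byte is zero.
 */
static void fill_name(char dst[SA_DATA_LEN], const char *ifname)
{
    size_t n = strlen(ifname);

    /* Zero the whole area first so no stale bytes remain after the name. */
    memset(dst, 0, SA_DATA_LEN);

    /* Keep at most 13 characters; dst[13] stays 0 from the memset. */
    if (n > SA_DATA_LEN - 1)
        n = SA_DATA_LEN - 1;
    memcpy(dst, ifname, n);
}
```

With strncpy() and a size of 14, a 14- or 15-character name would fill the buffer completely and leave no room for the terminator.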
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      6597b256
    • Daniel Borkmann's avatar
      net: sctp: fix NULL pointer dereference in socket destruction · de892243
      Daniel Borkmann authored
      [ Upstream commit 1abd165e ]
      
      While stress testing sctp sockets, I hit the following panic:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      IP: [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp]
      PGD 7cead067 PUD 7ce76067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: sctp(F) libcrc32c(F) [...]
      CPU: 7 PID: 2950 Comm: acc Tainted: GF            3.10.0-rc2+ #1
      Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011
      task: ffff88007ce0e0c0 ti: ffff88007b568000 task.ti: ffff88007b568000
      RIP: 0010:[<ffffffffa0490c4e>]  [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp]
      RSP: 0018:ffff88007b569e08  EFLAGS: 00010292
      RAX: 0000000000000000 RBX: ffff88007db78a00 RCX: dead000000200200
      RDX: ffffffffa049fdb0 RSI: ffff8800379baf38 RDI: 0000000000000000
      RBP: ffff88007b569e18 R08: ffff88007c230da0 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880077990d00 R14: 0000000000000084 R15: ffff88007db78a00
      FS:  00007fc18ab61700(0000) GS:ffff88007fc60000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000020 CR3: 000000007cf9d000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff88007b569e38 ffff88007db78a00 ffff88007b569e38 ffffffffa049fded
       ffffffff81abf0c0 ffff88007db78a00 ffff88007b569e58 ffffffff8145b60e
       0000000000000000 0000000000000000 ffff88007b569eb8 ffffffff814df36e
      Call Trace:
       [<ffffffffa049fded>] sctp_destroy_sock+0x3d/0x80 [sctp]
       [<ffffffff8145b60e>] sk_common_release+0x1e/0xf0
       [<ffffffff814df36e>] inet_create+0x2ae/0x350
       [<ffffffff81455a6f>] __sock_create+0x11f/0x240
       [<ffffffff81455bf0>] sock_create+0x30/0x40
       [<ffffffff8145696c>] SyS_socket+0x4c/0xc0
       [<ffffffff815403be>] ? do_page_fault+0xe/0x10
       [<ffffffff8153cb32>] ? page_fault+0x22/0x30
       [<ffffffff81544e02>] system_call_fastpath+0x16/0x1b
      Code: 0c c9 c3 66 2e 0f 1f 84 00 00 00 00 00 e8 fb fe ff ff c9 c3 66 0f
            1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 <48>
            8b 47 20 48 89 fb c6 47 1c 01 c6 40 12 07 e8 9e 68 01 00 48
      RIP  [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp]
       RSP <ffff88007b569e08>
      CR2: 0000000000000020
      ---[ end trace e0d71ec1108c1dd9 ]---
      
      I did not hit this with the lksctp-tools functional tests, but with a
      small, multi-threaded test program, that heavily allocates, binds,
      listens and waits in accept on sctp sockets, and then randomly kills
      some of them (no need for an actual client in this case to hit this).
      Then, again, allocating, binding, etc, and then killing child processes.
      
      This panic then only occurs when ``echo 1 > /proc/sys/net/sctp/auth_enable''
      is set. The cause for that is actually very simple: in sctp_endpoint_init()
      we enter the path of sctp_auth_init_hmacs(). There, we try to allocate
      our crypto transforms through crypto_alloc_hash(). In our scenario,
      it then can happen that crypto_alloc_hash() fails with -EINTR from
      crypto_larval_wait(), thus we bail out and release the socket via
      sk_common_release(), sctp_destroy_sock() and hit the NULL pointer
      dereference as soon as we try to access members in the endpoint during
      sctp_endpoint_free(), since endpoint at that time is still NULL. Now,
      if we have that case, we do not need to do any cleanup work and just
      leave the destruction handler.
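
The guard described above amounts to an early return in the destruction handler when the endpoint was never set up. A minimal sketch, with hypothetical `demo_*` types standing in for the socket and endpoint structures:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the endpoint and the socket that owns it. */
struct demo_endpoint {
    int dead;
};

struct demo_sock {
    struct demo_endpoint *ep;   /* stays NULL if initialization failed early */
};

static int endpoint_free_calls;

static void demo_endpoint_free(struct demo_endpoint *ep)
{
    ep->dead = 1;               /* dereferences ep, so ep must not be NULL */
    endpoint_free_calls++;
}

static void demo_destroy_sock(struct demo_sock *sk)
{
    /* Initialization bailed out before the endpoint existed: nothing to do. */
    if (sk->ep == NULL)
        return;

    demo_endpoint_free(sk->ep);
    sk->ep = NULL;
}
```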
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      de892243
    • Eric Dumazet's avatar
      ip_tunnel: fix kernel panic with icmp_dest_unreach · 09ef3754
      Eric Dumazet authored
      [ Upstream commit a6222602 ]
      
      Daniel Petre reported crashes in icmp_dst_unreach() with following call
      graph:
      
      Daniel found a similar problem mentioned in
       http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html
      
      And indeed this is the root cause : skb->cb[] contains data fooling IP
      stack.
      
      We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
      is called. Or else skb->cb[] might contain garbage from GSO segmentation
      layer.
      
      A similar fix was tested on linux-3.9, but gre code was refactored in
      linux-3.10. I'll send patches for stable kernels as well.
      
      Many thanks to Daniel for providing reports, patches and testing !
      Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      09ef3754
    • Eric Dumazet's avatar
      ipv6: fix possible crashes in ip6_cork_release() · 46e280b6
      Eric Dumazet authored
      [ Upstream commit 284041ef ]
      
      commit 0178b695 ("ipv6: Copy cork options in ip6_append_data")
      added some code duplication and bad error recovery, leading to potential
      crash in ip6_cork_release() as kfree() could be called with garbage.
      
      Use kzalloc() to make sure this won't happen.
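
Why zeroed allocation helps here can be shown with a user-space analogue, using calloc() in place of kzalloc() and free() in place of kfree(). The structure and function names are hypothetical; the point is only that pointers which were never assigned are NULL, so a release routine can free all of them unconditionally.

```c
#include <stdlib.h>

/* Hypothetical options block holding several separately allocated parts. */
struct demo_cork_opts {
    void *dst0opt;
    void *dst1opt;
    void *hopopt;
};

/* Frees every member; safe because free(NULL) is a no-op. */
static void demo_cork_release(struct demo_cork_opts *o)
{
    if (!o)
        return;
    free(o->dst0opt);
    free(o->dst1opt);
    free(o->hopopt);
    free(o);
}

/* fail_second simulates an allocation failure part way through setup. */
static struct demo_cork_opts *demo_cork_setup(size_t len, int fail_second)
{
    /* Zeroed allocation: members not reached below remain NULL. */
    struct demo_cork_opts *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    o->dst0opt = malloc(len);
    if (!o->dst0opt)
        goto err;
    if (fail_second)
        goto err;
    o->dst1opt = malloc(len);
    if (!o->dst1opt)
        goto err;
    return o;                   /* hopopt intentionally left NULL */
err:
    demo_cork_release(o);       /* never sees an uninitialized pointer */
    return NULL;
}
```

Had the block come from plain malloc(), the members not yet assigned at the time of the failure would hold garbage, and freeing them would be undefined behaviour.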
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      46e280b6
    • Eric Dumazet's avatar
      tcp: fix tcp_md5_hash_skb_data() · 048c4543
      Eric Dumazet authored
      [ Upstream commit 54d27fcb ]
      
      TCP md5 communications fail [1] for some devices, because sg/crypto code
      assumes page offsets are below PAGE_SIZE.
      
      This was discovered using the mlx4 driver [2], but I suspect loopback
      might trigger the same bug now that we use order-3 pages in tcp_sendmsg().
      
      [1] Failure is giving following messages.
      
      huh, entered softirq 3 NET_RX ffffffff806ad230 preempt_count 00000100,
      exited with 00000101?
      
      [2] mlx4 driver uses order-2 pages to allocate RX frags
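
One way to satisfy code that expects offsets below the page size is to fold whole pages out of the offset before handing it over. The sketch below shows only that arithmetic; the 4 KiB page size, `struct frag_ref` and `normalize_frag()` are assumptions for illustration, and the message does not say the upstream fix is written this way.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT)

/* A fragment start expressed as (page index, offset within that page). */
struct frag_ref {
    size_t page_index;
    unsigned int offset;
};

/*
 * A fragment inside a multi-page (high-order) allocation may carry an
 * offset of PAGE_SIZE or more relative to the first page. Move the whole
 * pages into the page index so the remaining offset is below PAGE_SIZE.
 */
static struct frag_ref normalize_frag(size_t first_page, unsigned int offset)
{
    struct frag_ref r;

    r.page_index = first_page + (offset >> DEMO_PAGE_SHIFT);
    r.offset = offset & (DEMO_PAGE_SIZE - 1);
    return r;
}
```

For example, an offset of 9000 from page 10 becomes offset 808 within page 12, since 9000 = 2 * 4096 + 808.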
      Reported-by: Matt Schnall <mischnal@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Bernhard Beck <bbeck@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      048c4543
    • Ricardo Ribalda's avatar
      ll_temac: Reset dma descriptors indexes on ndo_open · dc070123
      Ricardo Ribalda authored
      [ Upstream commit 7167cf0e ]
      
      The DMA descriptor indexes are only initialized in the probe function.
      
      If a packet is in the buffer when temac_stop is called, the DMA
      descriptor indexes can be left in an incorrect state where no other
      packet can be sent.
      
      So an interface could be left in an unusable state after ifdown/ifup.
      
      This patch makes sure that the descriptor indexes are in a proper
      state when the device is opened.
      Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      dc070123
    • Neil Horman's avatar
      bonding: Fix broken promiscuity reference counting issue · d619b073
      Neil Horman authored
      [ Upstream commit 5a0068de ]
      
      Recently grabbed this report:
      https://bugzilla.redhat.com/show_bug.cgi?id=1005567
      
      Of an issue in which the bonding driver, with an attached vlan encountered the
      following errors when bond0 was taken down and back up:
      
      dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of
      device might be broken.
      
      The error occurs because, during __bond_release_one, if we release our last
      slave, we take on a random mac address and issue a NETDEV_CHANGEADDR
      notification.  With an attached vlan, the vlan may see that the vlan and bond
      mac address were in sync, but no longer are.  This triggers a call to dev_uc_add
      and dev_set_rx_mode, which enables IFF_PROMISC on the bond device.  Then, when
      we complete __bond_release_one, we use the current state of the bond flags to
      determine if we should decrement the promiscuity of the releasing slave.  But
      since the bond changed promiscuity state during the release operation, we
      incorrectly decrement the slave promisc count when it wasn't in promiscuous mode
      to begin with, causing the above error
      
      Fix is pretty simple, just cache the bonding flags at the start of the function
      and use those when determining the need to set promiscuity.
      
      This is also needed for the ALLMULTI flag
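
The fix described above is a snapshot-before-side-effects pattern. A minimal sketch, where the `demo_*` names are hypothetical and `demo_notify_addr_change()` simulates the notification that turns on promiscuous mode on the bond in the middle of the release:

```c
#define DEMO_IFF_PROMISC  0x100u
#define DEMO_IFF_ALLMULTI 0x200u

/* Hypothetical, heavily reduced device state. */
struct demo_dev {
    unsigned int flags;
    int promiscuity;    /* promiscuous-mode reference count */
    int allmulti;       /* all-multicast reference count */
};

/* Simulated side effect: a listener enables promiscuous mode on the bond. */
static void demo_notify_addr_change(struct demo_dev *bond)
{
    bond->flags |= DEMO_IFF_PROMISC;
}

static void demo_release_slave(struct demo_dev *bond, struct demo_dev *slave)
{
    /* Snapshot the flags before anything in this function can change them. */
    unsigned int old_flags = bond->flags;

    demo_notify_addr_change(bond);

    /* Decide from the snapshot, not from the flags as they are now. */
    if (old_flags & DEMO_IFF_PROMISC)
        slave->promiscuity--;
    if (old_flags & DEMO_IFF_ALLMULTI)
        slave->allmulti--;
}
```

Testing `bond->flags` after the notification instead of `old_flags` would decrement a count that was never incremented for the slave.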
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Mark Wu <wudxw@linux.vnet.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      d619b073
    • Peter Korsgaard's avatar
      dm9601: fix IFF_ALLMULTI handling · cc2a9147
      Peter Korsgaard authored
      [ Upstream commit bf0ea638 ]
      
      Pass-all-multicast is controlled by bit 3 in RX control, not bit 2
      (pass undersized frames).
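
Assuming bits are numbered from 0 at the least significant end, the two bits named above correspond to the masks 0x04 and 0x08. The macro and function names below are hypothetical and only illustrate the off-by-one-bit mistake; they are not taken from the driver.

```c
#include <stdint.h>

#define DEMO_RX_BIT(n)             (1u << (n))
#define DEMO_RX_PASS_UNDERSIZED    DEMO_RX_BIT(2)   /* 0x04 */
#define DEMO_RX_PASS_ALL_MULTICAST DEMO_RX_BIT(3)   /* 0x08 */

/* Build the RX control value, setting bit 3 when all-multicast is wanted. */
static uint8_t demo_rx_ctl(uint8_t base, int allmulti)
{
    if (allmulti)
        base |= DEMO_RX_PASS_ALL_MULTICAST;
    return base;
}
```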
      Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
      Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      cc2a9147
    • Salam Noureddine's avatar
      ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put · 5d71a2b3
      Salam Noureddine authored
      [ Upstream commit e2401654 ]
      
      It is possible for the timer handlers to run after the call to
      ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
      function in order to do proper cleanup when the refcnt reaches 0.
      Otherwise, the refcnt can reach zero without the in_device being
      destroyed and we end up leaking a reference to the net_device and
      see messages like the following,
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
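
The difference between the two put variants can be sketched with a C11 atomic counter. The `demo_*` names are hypothetical; the first helper only drops the count, in the spirit of __in_dev_put, while the second also destroys the object when the count reaches zero, in the spirit of in_dev_put.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct demo_in_dev {
    atomic_int refcnt;
    bool destroyed;
};

static void demo_in_dev_destroy(struct demo_in_dev *d)
{
    d->destroyed = true;    /* stands in for releasing the net_device ref */
}

/* Drops a reference but never destroys: only safe if it cannot be the last. */
static void demo_put_no_destroy(struct demo_in_dev *d)
{
    atomic_fetch_sub(&d->refcnt, 1);
}

/* Drops a reference and destroys the object when that was the last one. */
static void demo_put(struct demo_in_dev *d)
{
    /* atomic_fetch_sub returns the old value; 1 means the count is now 0. */
    if (atomic_fetch_sub(&d->refcnt, 1) == 1)
        demo_in_dev_destroy(d);
}
```

If a timer handler holds the last reference and uses the first variant, the count reaches zero with nobody left to run the destructor, which matches the leak described above.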
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      5d71a2b3
    • Salam Noureddine's avatar
      ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put · 3b34788b
      Salam Noureddine authored
      [ Upstream commit 9260d3e1 ]
      
      It is possible for the timer handlers to run after the call to
      ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
      handler function in order to do proper cleanup when the refcnt
      reaches 0. Otherwise, the refcnt can reach zero without the
      inet6_dev being destroyed and we end up leaking a reference to
      the net_device and see messages like the following,
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      3b34788b