1. 15 Oct, 2014 39 commits
    • Nicolas Dichtel's avatar
      ip6_gre: fix flowi6_proto value in xmit path · c02b2427
      Nicolas Dichtel authored
      [ Upstream commit 3be07244 ]
      
      In xmit path, we build a flowi6 which will be used for the output route lookup.
      We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
      protocol should be IPPROTO_GRE.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarMatthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c02b2427
    • KY Srinivasan's avatar
      hyperv: Fix a bug in netvsc_start_xmit() · b3e77426
      KY Srinivasan authored
      [ Upstream commit dedb845d ]
      
      After the packet is successfully sent, we should not touch the skb
      as it may have been freed. This patch is based on the work done by
      Long Li <longli@microsoft.com>.
      
      In this version of the patch I have fixed issues pointed out by David.
      David, please queue this up for stable.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Tested-by: default avatarLong Li <longli@microsoft.com>
      Tested-by: default avatarSitsofe Wheeler <sitsofe@yahoo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3e77426
    • Hannes Frederic Sowa's avatar
      ipv6: remove rt6i_genid · 46e9e43e
      Hannes Frederic Sowa authored
      [ Upstream commit 705f1c86 ]
      
      Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
      are already marked DST_HOST (e.g. input routes routes) will always be
      invalidated during sk_dst_check. Thus per-socket dst caching absolutely
      had no effect and early demuxing had no effect.
      
      Thus this patch removes rt6i_genid: fn_sernum already gets modified during
      add operations, so we only must ensure we mutate fn_sernum during ipv6
      address remove operations. This is a fairly cost extensive operations,
      but address removal should not happen that often. Also our mtu update
      functions do the same and we heard no complains so far. xfrm policy
      changes also cause a call into fib6_flush_trees. Also plug a hole in
      rt6_info (no cacheline changes).
      
      I verified via tracing that this change has effect.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46e9e43e
    • Eric Dumazet's avatar
      gro: fix aggregation for skb using frag_list · 47a0965d
      Eric Dumazet authored
      [ Upstream commit 73d3fe6d ]
      
      In commit 8a29111c ("net: gro: allow to build full sized skb")
      I added a regression for linear skb that traditionally force GRO
      to use the frag_list fallback.
      
      Erez Shitrit found that at most two segments were aggregated and
      the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.
      
      This is because pinfo at this spot still points to the last skb in the
      chain, instead of the first one, where we find the correct gso_size
      information.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 8a29111c ("net: gro: allow to build full sized skb")
      Reported-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47a0965d
    • Matan Barak's avatar
      net/mlx4: Correctly configure single ported VFs from the host · 7197ff3f
      Matan Barak authored
      [ Upstream commit a91c772f ]
      
      Single port VFs are seen PCI wise on both ports of the PF (we don't have
      single port PFs with ConnectX). With this in mind, it's possible for
      virtualization tools to try and configure a single ported VF through
      the "wrong" PF port.
      
      To handle that, we use the PF driver mapping of single port VFs to NIC
      ports and adjust the port value before calling into the low level
      code that does the actual VF configuration
      
      Fixes: 449fc488 ('net/mlx4: Adapt code for N-Port VF')
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7197ff3f
    • Matan Barak's avatar
      net/mlx4_core: Allow not to specify probe_vf in SRIOV IB mode · 256ddc62
      Matan Barak authored
      [ Upstream commit effa4bc4 ]
      
      When the HCA is configured in SRIOV IB mode (that is, at least one of
      the ports is IB) and the probe_vf module param isn't specified,
      mlx4_init_one() failed because of the following condition:
      
      if (ib_ports && (num_vfs_argc > 1 || probe_vfs_argc > 1)) {
      	 .....
      }
      
      The root cause for that is a mistake in the initialization of num_vfs_argc
      and probe_vfs_argc. When num_vfs / probe_vf aren't given, their argument
      count counterpart should be 0, fix that.
      
      Fixes: dd41cc3b ('net/mlx4: Adapt num_vfs/probed_vf params for single port VF')
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      256ddc62
    • Soren Brinkmann's avatar
      Revert "net/macb: add pinctrl consumer support" · 897a8a7d
      Soren Brinkmann authored
      [ Upstream commit 9026968a ]
      
      This reverts commit 8ef29f8a.
      The driver core already calls pinctrl_get() and claims the default
      state. There is no need to replicate this in the driver.
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      897a8a7d
    • Vlad Yasevich's avatar
      macvtap: Fix race between device delete and open. · e92b35f5
      Vlad Yasevich authored
      [ Upstream commit 40b8fe45 ]
      
      In macvtap device delete and open calls can race and
      this causes a list curruption of the vlan queue_list.
      
      The race intself is triggered by the idr accessors
      that located the vlan device.  The device is stored
      into and removed from the idr under both an rtnl and
      a mutex.  However, when attempting to locate the device
      in idr, only a mutex is taken.  As a result, once cpu
      perfoming a delete may take an rtnl and wait for the mutex,
      while another cput doing an open() will take the idr
      mutex first to fetch the device pointer and later take
      an rtnl to add a queue for the device which may have
      just gotten deleted.
      
      With this patch, we now hold the rtnl for the duration
      of the macvtap_open() call thus making sure that
      open will not race with delete.
      
      CC: Michael S. Tsirkin <mst@redhat.com>
      CC: Jason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e92b35f5
    • Steffen Klassert's avatar
      ip_tunnel: Don't allow to add the same tunnel multiple times. · d7ea26ff
      Steffen Klassert authored
      [ Upstream commit d61746b2 ]
      
      When we try to add an already existing tunnel, we don't return
      an error. Instead we continue and call ip_tunnel_update().
      This means that we can change existing tunnels by adding
      the same tunnel multiple times. It is even possible to change
      the tunnel endpoints of the fallback device.
      
      We fix this by returning an error if we try to add an existing
      tunnel.
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7ea26ff
    • Steffen Klassert's avatar
      xfrm: Generate queueing routes only from route lookup functions · c4cb71c5
      Steffen Klassert authored
      [ Upstream commit b8c203b2 ]
      
      Currently we genarate a queueing route if we have matching policies
      but can not resolve the states and the sysctl xfrm_larval_drop is
      disabled. Here we assume that dst_output() is called to kill the
      queued packets. Unfortunately this assumption is not true in all
      cases, so it is possible that these packets leave the system unwanted.
      
      We fix this by generating queueing routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: a0073fe1 ("xfrm: Add a state resolution packet queue")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4cb71c5
    • Steffen Klassert's avatar
      xfrm: Generate blackhole routes only from route lookup functions · de541d0c
      Steffen Klassert authored
      [ Upstream commit f92ee619 ]
      
      Currently we genarate a blackhole route route whenever we have
      matching policies but can not resolve the states. Here we assume
      that dst_output() is called to kill the balckholed packets.
      Unfortunately this assumption is not true in all cases, so
      it is possible that these packets leave the system unwanted.
      
      We fix this by generating blackhole routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de541d0c
    • Vlad Yasevich's avatar
      tg3: Allow for recieve of full-size 8021AD frames · 5fe83529
      Vlad Yasevich authored
      [ Upstream commit 7d3083ee ]
      
      When receiving a vlan-tagged frame that still contains
      a vlan header, the length of the packet will be greater
      then MTU+ETH_HLEN since it will account of the extra
      vlan header.  TG3 checks this for the case for 802.1Q,
      but not for 802.1ad.  As a result, full sized 802.1ad
      frames get dropped by the card.
      
      Add a check for 802.1ad protocol when receving full
      sized frames.
      Suggested-by: default avatarPrashant Sreedharan <prashant@broadcom.com>
      CC: Prashant Sreedharan <prashant@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fe83529
    • Vlad Yasevich's avatar
      tg3: Work around HW/FW limitations with vlan encapsulated frames · b0ce10f7
      Vlad Yasevich authored
      [ Upstream commit 476c1885 ]
      
      TG3 appears to have an issue performing TSO and checksum offloading
      correclty when the frame has been vlan encapsulated (non-accelrated).
      In these cases, tcp checksum is not correctly updated.
      
      This patch attempts to work around this issue.  After the patch,
      802.1ad vlans start working correctly over tg3 devices.
      
      CC: Prashant Sreedharan <prashant@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0ce10f7
    • Nicolas Dichtel's avatar
      macvlan: allow to enqueue broadcast pkt on virtual device · 576642b9
      Nicolas Dichtel authored
      [ Upstream commit 07d92d5c ]
      
      Since commit 412ca155 ("macvlan: Move broadcasts into a work queue"), the
      driver uses tx_queue_len of the master device as the limit of packets enqueuing.
      Problem is that virtual drivers have this value set to 0, thus all broadcast
      packets were rejected.
      Because tx_queue_len was arbitrarily chosen, I replace it with a static limit
      of 1000 (also arbitrarily chosen).
      
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: default avatarThibaut Collet <thibaut.collet@6wind.com>
      Suggested-by: default avatarThibaut Collet <thibaut.collet@6wind.com>
      Tested-by: default avatarThibaut Collet <thibaut.collet@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      576642b9
    • Francesco Ruggeri's avatar
      net: allow macvlans to move to net namespace · 156c6b80
      Francesco Ruggeri authored
      [ Upstream commit 0d0162e7 ]
      
      I cannot move a macvlan interface created on top of a bonding interface
      to a different namespace:
      
      % ip netns add dummy0
      % ip link add link bond0 mac0 type macvlan
      % ip link set mac0 netns dummy0
      RTNETLINK answers: Invalid argument
      %
      
      The problem seems to be that commit f9399814 ("bonding: Don't allow
      bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
      on bonding interfaces, and commit 797f87f8 ("macvlan: fix netdev
      feature propagation from lower device") causes macvlan interfaces
      to inherit its features from the lower device.
      
      NETIF_F_NETNS_LOCAL should not be inherited from the lower device
      by a macvlan.
      Patch tested on 3.16.
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Acked-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      156c6b80
    • Vlad Yasevich's avatar
      bridge: Fix br_should_learn to check vlan_enabled · c78abd1a
      Vlad Yasevich authored
      [ Upstream commit c095f248 ]
      
      As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will
      not be initialized in br_should_learn() as that function
      is called only from br_handle_local_finish().  That is
      an input handler for link-local ethernet traffic so it perfectly
      correct to check br->vlan_enabled here.
      
      Reported-by: Toshiaki Makita<toshiaki.makita1@gmail.com>
      Fixes: 20adfa1a bridge: Check if vlan filtering is enabled only once.
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c78abd1a
    • Vlad Yasevich's avatar
      bridge: Check if vlan filtering is enabled only once. · 84351f1a
      Vlad Yasevich authored
      [ Upstream commit 20adfa1a ]
      
      The bridge code checks if vlan filtering is enabled on both
      ingress and egress.   When the state flip happens, it
      is possible for the bridge to currently be forwarding packets
      and forwarding behavior becomes non-deterministic.  Bridge
      may drop packets on some interfaces, but not others.
      
      This patch solves this by caching the filtered state of the
      packet into skb_cb on ingress.  The skb_cb is guaranteed to
      not be over-written between the time packet entres bridge
      forwarding path and the time it leaves it.  On egress, we
      can then check the cached state to see if we need to
      apply filtering information.
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84351f1a
    • Eric Dumazet's avatar
      net: filter: fix possible use after free · bcbfd0c0
      Eric Dumazet authored
      [ No appicable upstream commit, this bug has been subsequently been
        fixed as a side effect of other changes. ]
      
      If kmemdup() fails, we free fp->orig_prog and return -ENOMEM
      
      sk_attach_filter()
       -> sk_filter_uncharge(sk, fp)
        -> sk_filter_release(fp)
         -> call_rcu(&fp->rcu, sk_filter_release_rcu)
          -> sk_filter_release_rcu()
           -> sk_release_orig_filter()
              fprog = fp->orig_prog; // not NULL, but points to freed memory
      	  kfree(fprog->filter); // use after free, potential corruption
                kfree(fprog); // double free or corruption
      
      Note: This was fixed in 3.17+ with commit 278571ba
      ("net: filter: simplify socket charging")
      
      Found by AddressSanitizer
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: a3ea269b ("net: filter: keep original BPF program around")
      Acked-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bcbfd0c0
    • Nikolay Aleksandrov's avatar
      bonding: fix div by zero while enslaving and transmitting · e39b8b0e
      Nikolay Aleksandrov authored
      [ Upstream commit 9a72c2da ]
      
      The problem is that the slave is first linked and slave_cnt is
      incremented afterwards leading to a div by zero in the modes that use it
      as a modulus. What happens is that in bond_start_xmit()
      bond_has_slaves() is used to evaluate further transmission and it becomes
      true after the slave is linked in, but when slave_cnt is used in the xmit
      path it is still 0, so fetch it once and transmit based on that. Since
      it is used only in round-robin and XOR modes, the fix is only for them.
      Thanks to Eric Dumazet for pointing out the fault in my first try to fix
      this.
      
      Call trace (took it out of net-next kernel, but it's the same with net):
      [46934.330038] divide error: 0000 [#1] SMP
      [46934.330041] Modules linked in: bonding(O) 9p fscache
      snd_hda_codec_generic crct10dif_pclmul
      [46934.330041] bond0: Enslaving eth1 as an active interface with an up
      link
      [46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
      ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
      serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
      snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
      virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
      pata_acpi floppy [last unloaded: bonding]
      [46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
      3.17.0-rc4+ #27
      [46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
      ffff88005b728000
      [46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
      bond_start_xmit+0x1c3/0x450 [bonding]
      [46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
      [46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
      000000000000002a
      [46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
      ffff88004b077940
      [46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
      ffff88004a83e000
      [46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
      ffff88004b3f0500
      [46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
      ffff88004b077940
      [46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
      knlGS:0000000000000000
      [46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
      00000000000406e0
      [46934.330069] Stack:
      [46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
      ffff88004b3f0500
      [46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
      ffff88005b72b8a0
      [46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
      ffff88005b302000
      [46934.330073] Call Trace:
      [46934.330077]  [<ffffffff811e6169>] ?
      __kmalloc_node_track_caller+0x119/0x300
      [46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
      [46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
      [46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
      [46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
      [46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
      [46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
      [46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
      [46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
      [46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
      [46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
      [46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
      [46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
      [46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
      [46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
      [46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
      [46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
      [46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
      [46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
      [46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
      [46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
      [46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
      [46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
      [46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
      [46934.330115] bond0: Adding slave eth2
      [46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
      [46934.330118]  [<ffffffff81603248>] ?
      move_addr_to_kernel.part.20+0x28/0x80
      [46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
      [46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
      [46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
      [46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
      [46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
      [46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
      [46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
      [46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
      [46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
      1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
      89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
      [46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
      [bonding]
      [46934.330146]  RSP <ffff88005b72b7f8>
      
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      Fixes: 278b2083 ("bonding: initial RCU conversion")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e39b8b0e
    • WANG Cong's avatar
      ipv6: restore the behavior of ipv6_sock_ac_drop() · 2022f408
      WANG Cong authored
      [ Upstream commit de185ab4 ]
      
      It is possible that the interface is already gone after joining
      the list of anycast on this interface as we don't hold a refcount
      for the device, in this case we are safe to ignore the error.
      
      What's more important, for API compatibility we should not
      change this behavior for applications even if it were correct.
      
      Fixes: commit a9ed4a29 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2022f408
    • Guillaume Nault's avatar
      l2tp: fix race while getting PMTU on PPP pseudo-wire · e9d3d9a6
      Guillaume Nault authored
      [ Upstream commit eed4d839 ]
      
      Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
      
      The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
      could return NULL if tunnel->sock->sk_dst_cache was reset just before the
      call, thus making dst_mtu() dereference a NULL pointer:
      
      [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
      [ 1937.664005] Oops: 0000 [#1] SMP
      [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
      [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
      [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
      [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
      [ 1937.664005] RIP: 0010:[<ffffffffa049db88>]  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
      [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
      [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
      [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
      [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
      [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
      [ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
      [ 1937.664005] Stack:
      [ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
      [ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
      [ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
      [ 1937.664005] Call Trace:
      [ 1937.664005]  [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
      [ 1937.664005]  [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
      [ 1937.664005]  [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
      [ 1937.664005]  [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
      [ 1937.664005]  [<ffffffff81309196>] SYSC_connect+0x87/0xb1
      [ 1937.664005]  [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
      [ 1937.664005]  [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
      [ 1937.664005]  [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 1937.664005]  [<ffffffff8114c262>] ? spin_lock+0x9/0xb
      [ 1937.664005]  [<ffffffff813092b4>] SyS_connect+0x9/0xb
      [ 1937.664005]  [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
      [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
      [ 1937.664005] RIP  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005]  RSP <ffff8800c43c7de8>
      [ 1937.664005] CR2: 0000000000000020
      [ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
      
      Fixes: f34c4a35 ("l2tp: take PMTU from tunnel UDP socket")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9d3d9a6
    • Sabrina Dubroca's avatar
      ipv6: fix rtnl locking in setsockopt for anycast and multicast · e06d2bcb
      Sabrina Dubroca authored
      [ Upstream commit a9ed4a29 ]
      
      Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
      triggers the assertion in addrconf_join_solict()/addrconf_leave_solict()
      
      ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
      take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
      ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
      calling ipv6_dev_mc_inc/dec.
      
      This patch moves ASSERT_RTNL() up a level in the call stack.
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reported-by: default avatarTommi Rantala <tt.rantala@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e06d2bcb
    • Michal Kubeček's avatar
      net: fix checksum features handling in netif_skb_features() · af3e4e5d
      Michal Kubeček authored
      [ Upstream commit db115037 ]
      
      This is follow-up to
      
        da08143b ("vlan: more careful checksum features handling")
      
      which introduced more careful feature intersection in vlan code,
      taking into account that HW_CSUM should be considered superset
      of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
      in order to avoid offloading mismatch warning when vlan is
      created on top of a bond consisting of slaves supporting IP/IPv6
      checksumming but not vlan Tx offloading.
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3e4e5d
    • Gerhard Stenzel's avatar
      vxlan: fix incorrect initializer in union vxlan_addr · cd80ab0c
      Gerhard Stenzel authored
      [ Upstream commit a45e92a5 ]
      
      The first initializer in the following
      
              union vxlan_addr ipa = {
                  .sin.sin_addr.s_addr = tip,
                  .sa.sa_family = AF_INET,
              };
      
      is optimised away by the compiler, due to the second initializer,
      therefore initialising .sin.sin_addr.s_addr always to 0.
      This results in netlink messages indicating a L3 miss never contain the
      missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
      The problem affects user space programs relying on an IP address being
      sent as part of a netlink message indicating a L3 miss.
      
      Changing
                  .sa.sa_family = AF_INET,
      to
                  .sin.sin_family = AF_INET,
      fixes the problem.
      Signed-off-by: default avatarGerhard Stenzel <gerhard.stenzel@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd80ab0c
    • Jiri Benc's avatar
      openvswitch: fix panic with multiple vlan headers · 18460f5f
      Jiri Benc authored
      [ Upstream commit 2ba5af42 ]
      
      When there are multiple vlan headers present in a received frame, the first
      one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
      skb beyond the VLAN TPID may be still non-linear, including the inner TCI
      and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
      does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
      executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
      
      This leads to two things:
      
      1. Part of the resulting ethernet header is in the non-linear part of the
         skb. When eth_type_trans is called later as the result of
         OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
         is in fact accessing random data when it reads past the TPID.
      
      2. network_header points into the ethernet header instead of behind it.
         mac_len is set to a wrong value (10), too.
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18460f5f
    • Benjamin Block's avatar
      net: ipv6: fib: don't sleep inside atomic lock · 3f719b11
      Benjamin Block authored
      [ Upstream commit 793c3b40 ]
      
      The function fib6_commit_metrics() allocates a piece of memory in mode
      GFP_KERNEL while holding an atomic lock from higher up in the stack, in
      the function __ip6_ins_rt(). This produces the following BUG:
      
      > BUG: sleeping function called from invalid context at mm/slub.c:1250
      > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
      > 2 locks held by dhcpcd/2909:
      >  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20
      >  #1:  (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800
      > CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
      > Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
      >  0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
      >  ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
      >  0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
      > Call Trace:
      >  [<ffffffff81af135a>] dump_stack+0x4e/0x68
      >  [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120
      >  [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190
      >  [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110
      >  [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110
      >  [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80
      >  [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800
      >  [<ffffffff81a69535>] ip6_route_add+0x675/0x800
      >  [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800
      >  [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80
      >  [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260
      >  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
      >  [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180
      >  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
      >  [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20
      >  [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0
      >  [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40
      >  [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180
      >  [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770
      >  [<ffffffff81103735>] ? local_clock+0x25/0x30
      >  [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90
      >  [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0
      >  [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0
      >  [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0
      >  [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220
      >  [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10
      >  [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0
      >  [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150
      >  [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150
      >  [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230
      >  [<ffffffff811f989a>] ? might_fault+0x5a/0xb0
      >  [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0
      >  [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80
      >  [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20
      >  [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b
      
      Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC.
      Signed-off-by: default avatarBenjamin Block <bebl@mageta.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f719b11
    • Yuval Mintz's avatar
      bnx2x: Revert UNDI flushing mechanism · 39f5b1eb
      Yuval Mintz authored
      [ Upstream commit 7c3afd85 ]
      
      Commit 91ebb929 ("bnx2x: Add support for Multi-Function UNDI") [which was
      later supposedly fixed by de682941 ("bnx2x: Fix UNDI driver unload")]
      introduced a bug in which in some [yet-to-be-determined] scenarios the
      alternative flushing mechanism which was to guarantee the Rx buffers are
      empty before resetting them during device probe will fail.
      If this happens, when device will be loaded once more a fatal attention will
      occur; Since this most likely happens in boot from SAN scenarios, the machine
      will fail to load.
      
      Notice this may occur not only in the 'Multi-Function' scenario but in the
      regular scenario as well, i.e., this introduced a regression in the driver's
      ability to perform boot from SAN.
      
      The patch reverts the mechanism and applies the old scheme to multi-function
      devices as well as to single-function devices.
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarAriel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39f5b1eb
    • Eric Dumazet's avatar
      packet: handle too big packets for PACKET_V3 · 29c75569
      Eric Dumazet authored
      [ Upstream commit dc808110 ]
      
      af_packet can currently overwrite kernel memory by out of bound
      accesses, because it assumed a [new] block can always hold one frame.
      
      This is not generally the case, even if most existing tools do it right.
      
      This patch clamps too long frames as API permits, and issue a one time
      error on syslog.
      
      [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
      
      In this example, packet header tp_snaplen was set to 3966,
      and tp_len was set to 5042 (skb->len)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29c75569
    • Erik Hugne's avatar
      tipc: fix message importance range check · 8ef544dc
      Erik Hugne authored
      [ Upstream commit ac32c7f7 ]
      
      Commit 3b4f302d ("tipc: eliminate
      redundant locking") introduced a bug by removing the sanity check
      for message importance, allowing programs to assign any value to
      the msg_user field. This will mess up the packet reception logic
      and may cause random link resets.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ef544dc
    • Gwenhael Goavec-Merou's avatar
      net: phy: smsc: move smsc_phy_config_init reset part in a soft_reset function · 4775bfac
      Gwenhael Goavec-Merou authored
      [ Upstream commit 21009686 ]
      
      On the one hand, phy_device.c provides a generic reset function if the phy
      driver does not provide a soft_reset pointer. This generic reset does not take
      into account the state of the phy, with a potential failure if the phy is in
      powerdown mode. On the other hand, smsc driver provides a function with both
      correct reset behaviour and configuration.
      
      This patch moves the reset part into a new smsc_phy_reset function and provides
      the soft_reset pointer to have a correct reset behaviour by default.
      Signed-off-by: default avatarGwenhael Goavec-Merou <gwenhael.goavec-merou@armadeus.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4775bfac
    • Neal Cardwell's avatar
      tcp: fix ssthresh and undo for consecutive short FRTO episodes · 638228c5
      Neal Cardwell authored
      [ Upstream commit 0c9ab092 ]
      
      Fix TCP FRTO logic so that it always notices when snd_una advances,
      indicating that any RTO after that point will be a new and distinct
      loss episode.
      
      Previously there was a very specific sequence that could cause FRTO to
      fail to notice a new loss episode had started:
      
      (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
      (2) receiver ACKs packet 1
      (3) FRTO sends 2 more packets
      (4) RTO timer fires again (should start a new loss episode)
      
      The problem was in step (3) above, where tcp_process_loss() returned
      early (in the spot marked "Step 2.b"), so that it never got to the
      logic to clear icsk_retransmits. Thus icsk_retransmits stayed
      non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
      icsk_retransmits, decide that this RTO is not a new episode, and
      decide not to cut ssthresh and remember the current cwnd and ssthresh
      for undo.
      
      There were two main consequences to the bug that we have
      observed. First, ssthresh was not decreased in step (4). Second, when
      there was a series of such FRTO (1-4) sequences that happened to be
      followed by an FRTO undo, we would restore the cwnd and ssthresh from
      before the entire series started (instead of the cwnd and ssthresh
      from before the most recent RTO). This could result in cwnd and
      ssthresh being restored to values much bigger than the proper values.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      638228c5
    • Neal Cardwell's avatar
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 72dcbec4
      Neal Cardwell authored
      [ Upstream commit 4fab9071 ]
      
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Diagnosed-by: default avatarWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72dcbec4
    • Shmulik Ladkani's avatar
      sit: Fix ipip6_tunnel_lookup device matching criteria · e72790ed
      Shmulik Ladkani authored
      [ Upstream commit bc8fc7b8 ]
      
      As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"),
      when looking up a tunnel, tunnel's underlying interface (t->parms.link)
      is verified to match incoming traffic's ingress device.
      
      However the comparison was incorrectly based on skb->dev->iflink.
      
      Instead, dev->ifindex should be used, which correctly represents the
      interface from which the IP stack hands the ipip6 packets.
      
      This allows setting up sit tunnels bound to vlan interfaces (otherwise
      incoming ipip6 traffic on the vlan interface was dropped due to
      ipip6_tunnel_lookup match failure).
      Signed-off-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e72790ed
    • Andrey Vagin's avatar
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 48518112
      Andrey Vagin authored
      [ Upstream commit 9d186cac ]
      
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48518112
    • David S. Miller's avatar
      Revert "macvlan: simplify the structure port" · de25adff
      David S. Miller authored
      [ Upstream commit 5e3c516b ]
      
      This reverts commit a188a54d.
      
      It causes crashes
      
      ====================
      [   80.643286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000878
      [   80.670103] IP: [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
      [   80.691289] PGD 22c102067 PUD 235bf0067 PMD 0
      [   80.706611] Oops: 0002 [#1] SMP
      [   80.717836] Modules linked in: macvlan nfsd lockd nfs_acl exportfs auth_rpcgss sunrpc oid_registry ioatdma ixgbe(-) mdio igb dca
      [   80.757935] CPU: 37 PID: 6724 Comm: rmmod Not tainted 3.16.0-net-next-08-12-2014-FCoE+ #1
      [   80.785688] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
      [   80.820310] task: ffff880235a9eae0 ti: ffff88022e844000 task.ti: ffff88022e844000
      [   80.845770] RIP: 0010:[<ffffffff810832e4>]  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
      [   80.875326] RSP: 0018:ffff88022e847b28  EFLAGS: 00010046
      [   80.893251] RAX: 0000000000037a6a RBX: 0000000000000878 RCX: 0000000000000000
      [   80.917187] RDX: ffff880235a9eae0 RSI: 0000000000000001 RDI: ffffffff810832db
      [   80.941125] RBP: ffff88022e847b58 R08: 0000000000000000 R09: 0000000000000000
      [   80.965056] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022e847b70
      [   80.988994] R13: 0000000000000000 R14: ffff88022e847be8 R15: ffffffff81ebe440
      [   81.012929] FS:  00007fab90b07700(0000) GS:ffff88043f7a0000(0000) knlGS:0000000000000000
      [   81.040400] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   81.059757] CR2: 0000000000000878 CR3: 0000000235a42000 CR4: 00000000001407e0
      [   81.083689] Stack:
      [   81.090739]  ffff880235a9eae0 0000000000000878 ffff88022e847b70 0000000000000000
      [   81.116253]  ffff88022e847be8 ffffffff81ebe440 ffff88022e847b98 ffffffff810847f1
      [   81.141766]  ffff88022e847b78 0000000000000286 ffff880234200000 0000000000000000
      [   81.167282] Call Trace:
      [   81.175768]  [<ffffffff810847f1>] __cancel_work_timer+0x31/0x170
      [   81.195985]  [<ffffffff8108494b>] cancel_work_sync+0xb/0x10
      [   81.214769]  [<ffffffffa015ae68>] macvlan_port_destroy+0x28/0x60 [macvlan]
      [   81.237844]  [<ffffffffa015b930>] macvlan_uninit+0x40/0x50 [macvlan]
      [   81.259209]  [<ffffffff816bf6e2>] rollback_registered_many+0x1a2/0x2c0
      [   81.281140]  [<ffffffff816bf81a>] unregister_netdevice_many+0x1a/0xb0
      [   81.302786]  [<ffffffffa015a4ff>] macvlan_device_event+0x1ef/0x240 [macvlan]
      [   81.326439]  [<ffffffff8108a13d>] notifier_call_chain+0x4d/0x70
      [   81.346366]  [<ffffffff8108a201>] raw_notifier_call_chain+0x11/0x20
      [   81.367439]  [<ffffffff816bf25b>] call_netdevice_notifiers_info+0x3b/0x70
      [   81.390228]  [<ffffffff816bf2a1>] call_netdevice_notifiers+0x11/0x20
      [   81.411587]  [<ffffffff816bf6bd>] rollback_registered_many+0x17d/0x2c0
      [   81.433518]  [<ffffffff816bf925>] unregister_netdevice_queue+0x75/0x110
      [   81.455735]  [<ffffffff816bfb2b>] unregister_netdev+0x1b/0x30
      [   81.475094]  [<ffffffffa0039b50>] ixgbe_remove+0x170/0x1d0 [ixgbe]
      [   81.495886]  [<ffffffff813512a2>] pci_device_remove+0x32/0x60
      [   81.515246]  [<ffffffff814c75c4>] __device_release_driver+0x64/0xd0
      [   81.536321]  [<ffffffff814c76f8>] driver_detach+0xc8/0xd0
      [   81.554530]  [<ffffffff814c656e>] bus_remove_driver+0x4e/0xa0
      [   81.573888]  [<ffffffff814c828b>] driver_unregister+0x2b/0x60
      [   81.593246]  [<ffffffff8135143e>] pci_unregister_driver+0x1e/0xa0
      [   81.613749]  [<ffffffffa005db18>] ixgbe_exit_module+0x1c/0x2e [ixgbe]
      [   81.635401]  [<ffffffff810e738b>] SyS_delete_module+0x15b/0x1e0
      [   81.655334]  [<ffffffff8187a395>] ? sysret_check+0x22/0x5d
      [   81.673833]  [<ffffffff810abd2d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
      [   81.696339]  [<ffffffff8132bfde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [   81.717985]  [<ffffffff8187a369>] system_call_fastpath+0x16/0x1b
      [   81.738199] Code: 00 48 83 3d 6e bb da 00 00 48 89 c2 0f 84 67 01 00 00 fa 66 0f 1f 44 00 00 49 89 14 24 e8 b5 4b 02 00 45 84 ed 0f 85 ac 00 00 00 <f0> 0f ba 2b 00 72 1d 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8
      [   81.807026] RIP  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
      [   81.828468]  RSP <ffff88022e847b28>
      [   81.840384] CR2: 0000000000000878
      [   81.851731] ---[ end trace 9f6c7232e3464e11 ]---
      ====================
      
      This bug could be triggered by these steps:
      
      modprobe ixgbe ; modprobe macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:00 macvlan0 type macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:01 macvlan1 type macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:02 macvlan2 type macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:03 macvlan3 type macvlan
      rmmod ixgbe
      Reported-by: default avatar"Keller, Jacob E" <jacob.e.keller@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de25adff
    • Stanislaw Gruszka's avatar
      myri10ge: check for DMA mapping errors · a378b942
      Stanislaw Gruszka authored
      [ Upstream commit 10545937 ]
      
      On IOMMU systems DMA mapping can fail, we need to check for
      that possibility.
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a378b942
    • Vlad Yasevich's avatar
      net: Always untag vlan-tagged traffic on input. · 84beb1a9
      Vlad Yasevich authored
      [ Upstream commit 0d5501c1 ]
      
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is build only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configued.  The host VM is running on
      doest not have VLAN support and the outging interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I've only access to the TG3 hardware in this configuration thus have
      only tested this with TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffere from a similar issue.
      
      The patch attempt to fix this another way.  It moves the vlan header
      stipping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualizatoin host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84beb1a9
    • Jiri Benc's avatar
      rtnetlink: fix VF info size · 53f8c7d2
      Jiri Benc authored
      [ Upstream commit 945a3676 ]
      
      Commit 1d8faf48 ("net/core: Add VF link state control") added new
      attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size
      of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we
      may trigger warnings in rtnl_getlink and similar functions when many VF
      links are enabled, as the information does not fit into the allocated skb.
      
      Fixes: 1d8faf48 ("net/core: Add VF link state control")
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53f8c7d2
    • Daniel Borkmann's avatar
      netlink: reset network header before passing to taps · 47a0ff6c
      Daniel Borkmann authored
      [ Upstream commit 4e48ed88 ]
      
      netlink doesn't set any network header offset thus when the skb is
      being passed to tap devices via dev_queue_xmit_nit(), it emits klog
      false positives due to it being unset like:
      
        ...
        [  124.990397] protocol 0000 is buggy, dev nlmon0
        [  124.990411] protocol 0000 is buggy, dev nlmon0
        ...
      
      So just reset the network header before passing to the device; for
      packet sockets that just means nothing will change - mac and net
      offset hold the same value just as before.
      Reported-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47a0ff6c
  2. 09 Oct, 2014 1 commit