1. 11 Feb, 2016 7 commits
  2. 09 Feb, 2016 4 commits
  3. 08 Feb, 2016 1 commit
    • David Vrabel's avatar
      x86/xen/p2m: hint at the last populated P2M entry · 4e721da3
      David Vrabel authored
      commit 98dd166e upstream.
      
      With commit 633d6f17 (x86/xen: prepare
      p2m list for memory hotplug) the P2M may be sized to accomdate a much
      larger amount of memory than the domain currently has.
      
      When saving a domain, the toolstack must scan all the P2M looking for
      populated pages.  This results in a performance regression due to the
      unnecessary scanning.
      
      Instead of reporting (via shared_info) the maximum possible size of
      the P2M, hint at the last PFN which might be populated.  This hint is
      increased as new leaves are added to the P2M (in the expectation that
      they will be used for populated entries).
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      [ tim.gardner: backport to 4.2-stable ]
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4e721da3
  4. 02 Feb, 2016 1 commit
  5. 29 Jan, 2016 27 commits
    • Dan Streetman's avatar
      xfrm: dst_entries_init() per-net dst_ops · 72b2bd26
      Dan Streetman authored
      [ Upstream commit a8a572a6 ]
      
      Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
      templates; their dst_entries counters will never be used.  Move the
      xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
      xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
      and dst_entries_destroy for each net namespace.
      
      The ipv4 and ipv6 xfrms each create dst_ops template, and perform
      dst_entries_init on the templates.  The template values are copied to each
      net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
      pcpuc_entries field is a percpu counter and cannot be used correctly by
      simply copying it to another object.
      
      The result of this is a very subtle bug; changes to the dst entries
      counter from one net namespace may sometimes get applied to a different
      net namespace dst entries counter.  This is because of how the percpu
      counter works; it has a main count field as well as a pointer to the
      percpu variables.  Each net namespace maintains its own main count
      variable, but all point to one set of percpu variables.  When any net
      namespace happens to change one of the percpu variables to outside its
      small batch range, its count is moved to the net namespace's main count
      variable.  So with multiple net namespaces operating concurrently, the
      dst_ops entries counter can stray from the actual value that it should
      be; if counts are consistently moved from one net namespace to another
      (which my testing showed is likely), then one net namespace winds up
      with a negative dst_ops count while another winds up with a continually
      increasing count, eventually reaching its gc_thresh limit, which causes
      all new traffic on the net namespace to fail with -ENOBUFS.
      Signed-off-by: default avatarDan Streetman <dan.streetman@canonical.com>
      Signed-off-by: default avatarDan Streetman <ddstreet@ieee.org>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      72b2bd26
    • Joe Jin's avatar
      xen-netfront: update num_queues to real created · 3c9a1069
      Joe Jin authored
      [ Upstream commit ca88ea12 ]
      
      Sometimes xennet_create_queues() may failed to created all requested
      queues, we need to update num_queues to real created to avoid NULL
      pointer dereference.
      Signed-off-by: default avatarJoe Jin <joe.jin@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3c9a1069
    • Wei Liu's avatar
      xen-netfront: respect user provided max_queues · 2bc31391
      Wei Liu authored
      [ Upstream commit 32a84405 ]
      
      Originally that parameter was always reset to num_online_cpus during
      module initialisation, which renders it useless.
      
      The fix is to only set max_queues to num_online_cpus when user has not
      provided a value.
      Signed-off-by: default avatarWei Liu <wei.liu2@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Tested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2bc31391
    • Wei Liu's avatar
      xen-netback: respect user provided max_queues · bfc924ec
      Wei Liu authored
      [ Upstream commit 4c82ac3c ]
      
      Originally that parameter was always reset to num_online_cpus during
      module initialisation, which renders it useless.
      
      The fix is to only set max_queues to num_online_cpus when user has not
      provided a value.
      Reported-by: default avatarJohnny Strom <johnny.strom@linuxsolutions.fi>
      Signed-off-by: default avatarWei Liu <wei.liu2@citrix.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bfc924ec
    • Karl Heiss's avatar
      sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · b6a28a4f
      Karl Heiss authored
      [ Upstream commit 635682a1 ]
      
      A case can occur when sctp_accept() is called by the user during
      a heartbeat timeout event after the 4-way handshake.  Since
      sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
      bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
      the listening socket but released with the new association socket.
      The result is a deadlock on any future attempts to take the listening
      socket lock.
      
      Note that this race can occur with other SCTP timeouts that take
      the bh_lock_sock() in the event sctp_accept() is called.
      
       BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
       ...
       RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
       RSP: 0018:ffff880028323b20  EFLAGS: 00000206
       RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
       RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
       R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
       FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
       CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
       Stack:
       ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
       <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
       <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
       Call Trace:
       <IRQ>
       [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
       [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
       [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
       [<ffffffff81497255>] ? ip_rcv+0x275/0x350
       [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
       ...
      
      With lockdep debugging:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       CslRx/12087 is trying to release lock (slock-AF_INET) at:
       [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
       but there are no more locks to release!
      
       other info that might help us debug this:
       2 locks held by CslRx/12087:
       #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
       #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
      
      Ensure the socket taken is also the same one that is released by
      saving a copy of the socket before entering the timeout event
      critical section.
      Signed-off-by: default avatarKarl Heiss <kheiss@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b6a28a4f
    • Ido Schimmel's avatar
      team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid · 804f5e29
      Ido Schimmel authored
      [ Upstream commit 60a6531b ]
      
      We can't be within an RCU read-side critical section when deleting
      VLANs, as underlying drivers might sleep during the hardware operation.
      Therefore, replace the RCU critical section with a mutex. This is
      consistent with team_vlan_rx_add_vid.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      804f5e29
    • Sven Eckelmann's avatar
      batman-adv: Drop immediate orig_node free function · c24ba6ae
      Sven Eckelmann authored
      [ Upstream commit 42eff6a6 ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_orig_node_free_ref.
      
      Fixes: 72822225 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c24ba6ae
    • Sven Eckelmann's avatar
      batman-adv: Drop immediate batadv_hard_iface free function · 7e3ed644
      Sven Eckelmann authored
      [ Upstream commit b4d922cf ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_hardif_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7e3ed644
    • Sven Eckelmann's avatar
      batman-adv: Drop immediate neigh_ifinfo free function · 556b76f9
      Sven Eckelmann authored
      [ Upstream commit ae3e1e36 ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_neigh_ifinfo_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      556b76f9
    • Sven Eckelmann's avatar
      batman-adv: Drop immediate batadv_neigh_node free function · 988d1d15
      Sven Eckelmann authored
      [ Upstream commit 2baa753c ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_neigh_node_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      988d1d15
    • Sven Eckelmann's avatar
      batman-adv: Drop immediate batadv_orig_ifinfo free function · 1a028ab3
      Sven Eckelmann authored
      [ Upstream commit deed9660 ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_orig_ifinfo_free_ref.
      
      Fixes: 7351a482 ("batman-adv: split out router from orig_node")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1a028ab3
    • Sven Eckelmann's avatar
      batman-adv: Avoid recursive call_rcu for batadv_nc_node · 73234039
      Sven Eckelmann authored
      [ Upstream commit 44e8e7e9 ]
      
      The batadv_nc_node_free_ref function uses call_rcu to delay the free of the
      batadv_nc_node object until no (already started) rcu_read_lock is enabled
      anymore. This makes sure that no context is still trying to access the
      object which should be removed. But batadv_nc_node also contains a
      reference to orig_node which must be removed.
      
      The reference drop of orig_node was done in the call_rcu function
      batadv_nc_node_free_rcu but should actually be done in the
      batadv_nc_node_release function to avoid nested call_rcus. This is
      important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
      not detect the inner call_rcu as relevant for its execution. Otherwise this
      barrier will most likely be inserted in the queue before the callback of
      the first call_rcu was executed. The caller of rcu_barrier will therefore
      continue to run before the inner call_rcu callback finished.
      
      Fixes: d56b1705 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      73234039
    • Sven Eckelmann's avatar
      batman-adv: Avoid recursive call_rcu for batadv_bla_claim · 96a84694
      Sven Eckelmann authored
      [ Upstream commit 63b39927 ]
      
      The batadv_claim_free_ref function uses call_rcu to delay the free of the
      batadv_bla_claim object until no (already started) rcu_read_lock is enabled
      anymore. This makes sure that no context is still trying to access the
      object which should be removed. But batadv_bla_claim also contains a
      reference to backbone_gw which must be removed.
      
      The reference drop of backbone_gw was done in the call_rcu function
      batadv_claim_free_rcu but should actually be done in the
      batadv_claim_release function to avoid nested call_rcus. This is important
      because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not
      detect the inner call_rcu as relevant for its execution. Otherwise this
      barrier will most likely be inserted in the queue before the callback of
      the first call_rcu was executed. The caller of rcu_barrier will therefore
      continue to run before the inner call_rcu callback finished.
      
      Fixes: 23721387 ("batman-adv: add basic bridge loop avoidance code")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Acked-by: default avatarSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      96a84694
    • Nikolay Aleksandrov's avatar
      bridge: fix lockdep addr_list_lock false positive splat · d2f1feae
      Nikolay Aleksandrov authored
      [ Upstream commit c6894dec ]
      
      After promisc mode management was introduced a bridge device could do
      dev_set_promiscuity from its ndo_change_rx_flags() callback which in
      turn can be called after the bridge's addr_list_lock has been taken
      (e.g. by dev_uc_add). This causes a false positive lockdep splat because
      the port interfaces' addr_list_lock is taken when br_manage_promisc()
      runs after the bridge's addr list lock was already taken.
      To remove the false positive introduce a custom bridge addr_list_lock
      class and set it on bridge init.
      A simple way to reproduce this is with the following:
      $ brctl addbr br0
      $ ip l add l br0 br0.100 type vlan id 100
      $ ip l set br0 up
      $ ip l set br0.100 up
      $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
      $ brctl addif br0 eth0
      Splat:
      [   43.684325] =============================================
      [   43.684485] [ INFO: possible recursive locking detected ]
      [   43.684636] 4.4.0-rc8+ #54 Not tainted
      [   43.684755] ---------------------------------------------
      [   43.684906] brctl/1187 is trying to acquire lock:
      [   43.685047]  (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.685460]  but task is already holding lock:
      [   43.685618]  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.686015]  other info that might help us debug this:
      [   43.686316]  Possible unsafe locking scenario:
      
      [   43.686743]        CPU0
      [   43.686967]        ----
      [   43.687197]   lock(_xmit_ETHER);
      [   43.687544]   lock(_xmit_ETHER);
      [   43.687886] *** DEADLOCK ***
      
      [   43.688438]  May be due to missing lock nesting notation
      
      [   43.688882] 2 locks held by brctl/1187:
      [   43.689134]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
      [   43.689852]  #1:  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.690575] stack backtrace:
      [   43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
      [   43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
      [   43.691770]  ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
      [   43.692425]  ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
      [   43.693071]  ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
      [   43.693709] Call Trace:
      [   43.693931]  [<ffffffff81360ceb>] dump_stack+0x4b/0x70
      [   43.694199]  [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
      [   43.694483]  [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
      [   43.694789]  [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
      [   43.695064]  [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
      [   43.695340]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.695623]  [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
      [   43.695901]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.696180]  [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.696460]  [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
      [   43.696750]  [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
      [   43.697052]  [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
      [   43.697348]  [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
      [   43.697655]  [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
      [   43.697943]  [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
      [   43.698223]  [<ffffffff815072de>] dev_uc_add+0x5e/0x80
      [   43.698498]  [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
      [   43.698798]  [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
      [   43.699083]  [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
      [   43.699374]  [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
      [   43.699678]  [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
      [   43.699973]  [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
      [   43.700259]  [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
      [   43.700548]  [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
      [   43.700836]  [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
      [   43.701106]  [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
      [   43.701379]  [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
      [   43.701665]  [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
      [   43.701947]  [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
      [   43.702219]  [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
      [   43.702500]  [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
      [   43.702771]  [<ffffffff81243079>] SyS_ioctl+0x79/0x90
      [   43.703033]  [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Bridge list <bridge@lists.linux-foundation.org>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Fixes: 2796d0c6 ("bridge: Automatically manage port promiscuous mode.")
      Reported-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d2f1feae
    • Eric Dumazet's avatar
      ipv6: update skb->csum when CE mark is propagated · ca7f40ec
      Eric Dumazet authored
      [ Upstream commit 34ae6a1a ]
      
      When a tunnel decapsulates the outer header, it has to comply
      with RFC 6080 and eventually propagate CE mark into inner header.
      
      It turns out IP6_ECN_set_ce() does not correctly update skb->csum
      for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
      messages and stack traces.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca7f40ec
    • Rabin Vincent's avatar
      net: bpf: reject invalid shifts · 4540f1c1
      Rabin Vincent authored
      [ Upstream commit 229394e8 ]
      
      On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a
      constant shift that can't be encoded in the immediate field of the
      UBFM/SBFM instructions is passed to the JIT.  Since these shifts
      amounts, which are negative or >= regsize, are invalid, reject them in
      the eBPF verifier and the classic BPF filter checker, for all
      architectures.
      Signed-off-by: default avatarRabin Vincent <rabin@rab.in>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4540f1c1
    • Eric Dumazet's avatar
      phonet: properly unshare skbs in phonet_rcv() · 2558ef49
      Eric Dumazet authored
      [ Upstream commit 7aaed57c ]
      
      Ivaylo Dimitrov reported a regression caused by commit 7866a621
      ("dev: add per net_device packet type chains").
      
      skb->dev becomes NULL and we crash in __netif_receive_skb_core().
      
      Before above commit, different kind of bugs or corruptions could happen
      without major crash.
      
      But the root cause is that phonet_rcv() can queue skb without checking
      if skb is shared or not.
      
      Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
      Reported-by: default avatarIvaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
      Tested-by: default avatarIvaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Remi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2558ef49
    • Karl Heiss's avatar
      bonding: Prevent IPv6 link local address on enslaved devices · 68ab182d
      Karl Heiss authored
      [ Upstream commit 03d84a5f ]
      
      Commit 1f718f0f ("bonding: populate neighbour's private on enslave")
      undoes the fix provided by commit c2edacf8 ("bonding / ipv6: no addrconf
      for slaves separately from master") by effectively setting the slave flag
      after the slave has been opened.  If the slave comes up quickly enough, it
      will go through the IPv6 addrconf before the slave flag has been set and
      will get a link local IPv6 address.
      
      In order to ensure that addrconf knows to ignore the slave devices on state
      change, set IFF_SLAVE before dev_open() during bonding enslavement.
      
      Fixes: 1f718f0f ("bonding: populate neighbour's private on enslave")
      Signed-off-by: default avatarKarl Heiss <kheiss@gmail.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Reviewed-by: default avatarJarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      68ab182d
    • Konstantin Khlebnikov's avatar
      net: preserve IP control block during GSO segmentation · b2803c41
      Konstantin Khlebnikov authored
      [ Upstream commit 9207f9d4 ]
      
      Skb_gso_segment() uses skb control block during segmentation.
      This patch adds 32-bytes room for previous control block which
      will be copied into all resulting segments.
      
      This patch fixes kernel crash during fragmenting forwarded packets.
      Fragmentation requires valid IP CB in skb for clearing ip options.
      Also patch removes custom save/restore in ovs code, now it's redundant.
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.comSigned-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b2803c41
    • Michal Kubeček's avatar
      udp: disallow UFO for sockets with SO_NO_CHECK option · 5d5076ec
      Michal Kubeček authored
      [ Upstream commit 40ba3302 ]
      
      Commit acf8dd0a ("udp: only allow UFO for packets from SOCK_DGRAM
      sockets") disallows UFO for packets sent from raw sockets. We need to do
      the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if
      for a bit different reason: while such socket would override the
      CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and
      bad offloading flags warning is triggered in __skb_gso_segment().
      
      In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow
      UFO for packets sent by sockets with UDP_NO_CHECK6_TX option.
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Tested-by: default avatarShannon Nelson <shannon.nelson@intel.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5d5076ec
    • Neal Cardwell's avatar
      tcp_yeah: don't set ssthresh below 2 · 3ddce9ec
      Neal Cardwell authored
      [ Upstream commit 83d15e70 ]
      
      For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno
      and CUBIC, per RFC 5681 (equation 4).
      
      tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh
      value if the intended reduction is as big or bigger than the current
      cwnd. Congestion control modules should never return a zero or
      negative ssthresh. A zero ssthresh generally results in a zero cwnd,
      causing the connection to stall. A negative ssthresh value will be
      interpreted as a u32 and will set a target cwnd for PRR near 4
      billion.
      
      Oleksandr Natalenko reported that a system using tcp_yeah with ECN
      could see a warning about a prior_cwnd of 0 in
      tcp_cwnd_reduction(). Testing verified that this was due to
      tcp_yeah_ssthresh() misbehaving in this way.
      Reported-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3ddce9ec
    • Sasha Levin's avatar
      net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memory · 105ae375
      Sasha Levin authored
      [ Upstream commit 320f1a4a ]
      
      proc_dostring() needs an initialized destination string, while the one
      provided in proc_sctp_do_hmac_alg() contains stack garbage.
      
      Thus, writing to cookie_hmac_alg would strlen() that garbage and end up
      accessing invalid memory.
      
      Fixes: 3c68198e ("sctp: Make hmac algorithm selection for cookie generation dynamic")
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      105ae375
    • Nicolas Dichtel's avatar
      vxlan: fix test which detect duplicate vxlan iface · 780a5b75
      Nicolas Dichtel authored
      [ Upstream commit 07b9b37c ]
      
      When a vxlan interface is created, the driver checks that there is not
      another vxlan interface with the same properties. To do this, it checks
      the existing vxlan udp socket. Since commit 1c51a915, the creation of
      the vxlan socket is done only when the interface is set up, thus it breaks
      that test.
      
      Example:
      $ ip l a vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
      $ ip l a vxlan11 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
      $ ip -br l | grep vxlan
      vxlan10          DOWN           f2:55:1c:6a:fb:00 <BROADCAST,MULTICAST>
      vxlan11          DOWN           7a:cb:b9:38:59:0d <BROADCAST,MULTICAST>
      
      Instead of checking sockets, let's loop over the vxlan iface list.
      
      Fixes: 1c51a915 ("vxlan: fix race caused by dropping rtnl_unlock")
      Reported-by: default avatarThomas Faivre <thomas.faivre@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      780a5b75
    • Hannes Frederic Sowa's avatar
      bridge: Only call /sbin/bridge-stp for the initial network namespace · 40db2513
      Hannes Frederic Sowa authored
      [ Upstream commit ff621985 ]
      
      [I stole this patch from Eric Biederman. He wrote:]
      
      > There is no defined mechanism to pass network namespace information
      > into /sbin/bridge-stp therefore don't even try to invoke it except
      > for bridge devices in the initial network namespace.
      >
      > It is possible for unprivileged users to cause /sbin/bridge-stp to be
      > invoked for any network device name which if /sbin/bridge-stp does not
      > guard against unreasonable arguments or being invoked twice on the
      > same network device could cause problems.
      
      [Hannes: changed patch using netns_eq]
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      40db2513
    • willy tarreau's avatar
      unix: properly account for FDs passed over unix sockets · 75bf68f2
      willy tarreau authored
      [ Upstream commit 712f4aad ]
      
      It is possible for a process to allocate and accumulate far more FDs than
      the process' limit by sending them over a unix socket then closing them
      to keep the process' fd count low.
      
      This change addresses this problem by keeping track of the number of FDs
      in flight per user and preventing non-privileged processes from having
      more FDs in flight than their configured FD limit.
      
      Reported-by: socketpair@gmail.com
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Mitigates: CVE-2013-4312 (Linux 2.0+)
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      75bf68f2
    • Florian Westphal's avatar
      connector: bump skb->users before callback invocation · f8beafa5
      Florian Westphal authored
      [ Upstream commit 55285bf0 ]
      
      Dmitry reports memleak with syskaller program.
      Problem is that connector bumps skb usecount but might not invoke callback.
      
      So move skb_get to where we invoke the callback.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f8beafa5
    • Xin Long's avatar
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 7b42666b
      Xin Long authored
      [ Upstream commit 068d8bd3 ]
      
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7b42666b