1. 27 Jan, 2013 4 commits
    • net: loopback: fix a dst refcounting issue · 794ed393
      Eric Dumazet authored
      Ben Greear reported crashes in ip_rcv_finish() on a stress
      test involving many macvlans.
      
      We tracked the bug to a dst use after free. ip_rcv_finish()
      was calling dst->input() and got garbage for dst->input value.
      
      It appears the bug is in the loopback driver, which lacks
      a skb_dst_force() before calling netif_rx().
      
      As a result, a non-refcounted dst, normally protected by an
      RCU read_lock section, was escaping this section and could
      be freed before the packet was processed.
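      A minimal userspace model of the refcounted vs. noref distinction may help here. It assumes simplified stand-in types: dst_model, skb_model and the skb_model_* helpers are hypothetical and are not kernel APIs. The sketch shows what skb_dst_force() is meant to guarantee before a packet leaves the protected section (e.g. when queued by netif_rx()): a borrowed dst is upgraded to a held reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dst entry, not the kernel's struct dst_entry. */
struct dst_model {
    int refcnt;            /* references held by long-lived users */
};

/* Hypothetical model of an skb's dst slot: the pointer plus a flag that
 * says whether the skb owns a reference (refcounted) or only borrows the
 * dst inside an RCU read-side section (noref). */
struct skb_model {
    struct dst_model *dst;
    bool noref;
};

/* Borrow a dst without taking a reference: only safe while the caller
 * stays inside the protecting read-side section. */
static void skb_model_dst_set_noref(struct skb_model *skb, struct dst_model *dst)
{
    skb->dst = dst;
    skb->noref = true;
}

/* Analogue of skb_dst_force(): before the skb escapes the read-side
 * section (e.g. it is queued for later processing), turn a borrowed
 * dst into a real reference. Calling it twice is harmless. */
static void skb_model_dst_force(struct skb_model *skb)
{
    if (skb->dst && skb->noref) {
        skb->dst->refcnt++;
        skb->noref = false;
    }
}

/* Drop whatever the skb holds; a borrowed (noref) dst is just forgotten. */
static void skb_model_dst_drop(struct skb_model *skb)
{
    if (skb->dst && !skb->noref) {
        assert(skb->dst->refcnt > 0);
        skb->dst->refcnt--;
    }
    skb->dst = NULL;
    skb->noref = false;
}
```

      Without the force step, the queued skb would still point at a dst whose only protection was the read-side section it has already left.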
      
        [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
        [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
        [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
        [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
        [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
        [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
        [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
        [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
        [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
        [<ffffffff814ae412>] ip_output+0x61/0x67
        [<ffffffff814ab904>] dst_output+0x17/0x1b
        [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
        [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
        [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
        [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
        [<ffffffff814c3571>] tcp_connect+0x53c/0x587
        [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
        [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
        [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
        [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
        [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
        [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
        [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
        [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
        [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
        [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
        [<ffffffff814632c6>] sys_connect+0x75/0x98
      
      This bug was introduced in linux-2.6.35, in commit
      7fee226a (net: add a noref bit on skb dst).
      
      skb_dst_force() is already enforced in dev_queue_xmit() for devices
      that have a qdisc; loopback has none, so it must call it itself.
      Reported-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: reset virtqueue affinity when doing cpu hotplug · 8de4b2f3
      Wanlong Gao authored
      Add a CPU notifier to virtio-net so that we can reset the
      virtqueue affinity when CPU hotplug happens. This improves
      performance by enabling or disabling the virtqueue affinity
      after CPU hotplug.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Eric Dumazet <erdnetdev@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: virtualization@lists.linux-foundation.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: split out clean affinity function · 8898c21c
      Wanlong Gao authored
      Split out the clean affinity function to virtnet_clean_affinity().
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Eric Dumazet <erdnetdev@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: virtualization@lists.linux-foundation.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: fix the set affinity bug when CPU IDs are not consecutive · 47be2479
      Wanlong Gao authored
      As Michael mentioned, set affinity and select queue do not work
      well when CPU IDs are not consecutive, which can happen with hot unplug.
      Fix this bug by traversing the online CPUs and creating a per-CPU variable
      that maps each CPU to its preferred virtqueue.
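      The traversal described above can be sketched in userspace as follows, under stated assumptions: MODEL_NR_CPUS, vq_index and the model_* functions are hypothetical, a plain array stands in for the per-CPU variable, and the rule "only pin 1:1 when the number of online CPUs equals the number of queues, otherwise fall back to queue 0" is an assumption of this sketch. A CPU hotplug notifier would simply re-run model_set_affinity().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8   /* hypothetical upper bound on CPU IDs */

/* Stand-in for the per-CPU "preferred virtqueue" variable; -1 = unmapped. */
static int vq_index[MODEL_NR_CPUS];

/* Forget every CPU-to-queue mapping (the role of a clean-affinity helper). */
static void model_clean_affinity(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        vq_index[cpu] = -1;
}

/* Walk the online CPUs in ID order and give each one its own queue, so a
 * gap in the CPU IDs (e.g. CPU 1 hot-unplugged) does not push the mapping
 * past the end of the queue array. */
static void model_set_affinity(const bool online[MODEL_NR_CPUS], int nqueues)
{
    int n_online = 0;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        n_online += online[cpu];

    model_clean_affinity();
    if (n_online != nqueues)   /* assumption: only pin when counts match */
        return;

    int i = 0;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!online[cpu])
            continue;
        vq_index[cpu] = i++;
    }
    assert(i == nqueues);
}

/* Pick a TX queue for the given CPU, falling back to queue 0. */
static int model_select_queue(int cpu)
{
    int q = vq_index[cpu];
    return q < 0 ? 0 : q;
}
```

      With CPUs 0, 2 and 3 online and three queues, CPU 3 maps to queue 2 rather than to a nonexistent queue 3.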
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Eric Dumazet <erdnetdev@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: virtualization@lists.linux-foundation.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Jan, 2013 3 commits
  3. 23 Jan, 2013 4 commits
    • r8169: remove the obsolete and incorrect AMD workaround · 5d0feaff
      Timo Teräs authored
      This was introduced in commit 6dccd16b "r8169: merge with version
      6.001.00 of Realtek's r8169 driver". I did not find the version
      6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
      this hunk is no longer present.
      
      Also, commit 05af2142 "r8169: fix Ethernet Hangup for RTL8110SC
      rev d" claims to have fixed this issue in a different way.
      
      The magic compare mask of 0xfffe000 is dubious, as it masks
      parts of the Reserved field and parts of the VLAN tag. This
      makes little sense, because the VLAN tag bits are perfectly
      valid there. In fact, the test seems to trigger on any VLAN-tagged
      packet, because the RxVlanTag bit is matched. I suspect
      0xfffe0000 was intended, to test the reserved part only.
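      The bit arithmetic behind this can be checked directly. The sketch assumes, as in the r8169 driver headers, that RxVlanTag is bit 16 of the RX descriptor's opts2 word and the VLAN tag occupies bits 0..15, and that the workaround tested (opts2 & mask) for nonzero; the MODEL_* names and trips_mask() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: RxVlanTag is bit 16 of opts2; the VLAN TCI is bits 0..15. */
#define MODEL_RX_VLAN_TAG (UINT32_C(1) << 16)
#define MODEL_VLAN_TCI    UINT32_C(0x0000ffff)

/* Nonzero if a descriptor's opts2 word would trip the given mask test. */
static int trips_mask(uint32_t opts2, uint32_t mask)
{
    return (opts2 & mask) != 0;
}
```

      0xfffe000 covers bits 13..27, which include RxVlanTag and the top three bits of the VLAN tag, so any hardware-tagged frame trips it; 0xfffe0000 covers bits 17..31 and touches neither.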
      
      Finally, this hunk is evil: it can cause more packets to be
      handled than the NAPI quota allows, which triggers
      WARN_ON_ONCE(work > weight) in net_rx_action() (net/core/dev.c)
      and messes up the NAPI state, causing the device to hang.
      
      As a result, any system using VLANs and having high receive
      traffic (so that the NAPI poll budget limits rtl_rx) would end
      up with a hung device.
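      A toy model of the budget violation, assuming the workaround advanced the ring index once more inside the poll loop without charging it to the budget. model_rx_poll() and its parameters are hypothetical; the returned count plays the role of the "work" that net_rx_action() expects to be at most the weight.

```c
#include <assert.h>

/* Hypothetical rx poll: `ring_pending` descriptors are ready and at most
 * `budget` may be consumed. `extra_skip` models a workaround that bumps
 * the ring index an extra time without counting it against the budget. */
static int model_rx_poll(int ring_pending, int budget, int extra_skip)
{
    int cur_rx = 0;

    assert(budget >= 0 && ring_pending >= 0);
    for (int count = 0; count < budget && cur_rx < ring_pending; count++) {
        cur_rx++;                          /* normal descriptor consumption */
        if (extra_skip > 0 && cur_rx < ring_pending) {
            extra_skip--;
            cur_rx++;                      /* workaround consumes one more */
        }
    }
    return cur_rx;   /* reported work; NAPI requires work <= budget */
}
```

      With 100 pending descriptors and a budget of 64, one extra skip makes the poll report 65, which is exactly the work > weight case the warning catches.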
      Signed-off-by: Timo Teräs <timo.teras@iki.fi>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: limit the number of flow caches · b8732fb7
      Jason Wang authored
      We create a new flow cache whenever tuntap identifies a new flow. This may lead
      to some issues:
      
      - userspace may produce a huge number of short-lived flows to exhaust host memory
      - the unlimited number of flow caches may produce a long list, which increases
        the time spent in linear searching
      
      Solve this by introducing a limit on the total number of flow caches.
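      A minimal sketch of a capped flow cache, assuming a singly linked list searched linearly by flow hash. MODEL_MAX_FLOWS, the model_flow types and the helpers are hypothetical, and the cap of 4 is chosen only to keep the example small; it is not the value used by tuntap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_FLOWS 4   /* hypothetical cap, illustrative only */

struct model_flow {
    uint32_t rxhash;
    int queue_index;
    struct model_flow *next;
};

struct model_flow_table {
    struct model_flow *head;
    unsigned int count;
};

/* Linear search: its cost grows with the list, hence the cap. */
static struct model_flow *model_flow_find(struct model_flow_table *t, uint32_t rxhash)
{
    for (struct model_flow *f = t->head; f; f = f->next)
        if (f->rxhash == rxhash)
            return f;
    return NULL;
}

/* Update an existing flow, or create one unless the cap is reached. */
static struct model_flow *model_flow_update(struct model_flow_table *t,
                                            uint32_t rxhash, int queue_index)
{
    struct model_flow *f = model_flow_find(t, rxhash);
    if (f) {
        f->queue_index = queue_index;
        return f;
    }
    if (t->count >= MODEL_MAX_FLOWS)
        return NULL;                /* limit reached: do not cache */
    f = malloc(sizeof(*f));
    if (!f)
        return NULL;
    f->rxhash = rxhash;
    f->queue_index = queue_index;
    f->next = t->head;
    t->head = f;
    t->count++;
    assert(t->count <= MODEL_MAX_FLOWS);
    return f;
}

static void model_flow_table_free(struct model_flow_table *t)
{
    struct model_flow *f = t->head;
    while (f) {
        struct model_flow *next = f->next;
        free(f);
        f = next;
    }
    t->head = NULL;
    t->count = 0;
}
```

      Flows beyond the cap are simply not cached, so a burst of short-lived flows costs bounded memory and bounded lookup time.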
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: reduce memory using of queues · edfb6a14
      Jason Wang authored
      MAX_TAP_QUEUES (1024) queues are always allocated for a tuntap device,
      unconditionally, even when userspace only requires a single-queue device. This
      is unnecessary and leads to a very high-order page allocation, which has a high
      probability of failing. Solve this by creating a single-queue net device when
      userspace only uses one queue, and also reduce MAX_TAP_QUEUES to
      DEFAULT_MAX_NUM_RSS_QUEUES, which can guarantee the success of
      the allocation.
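      The sizing decision can be sketched as below. Assumptions: DEFAULT_MAX_NUM_RSS_QUEUES is 8, as in the kernel's netdevice.h of this era; "multiqueue requested" stands for whatever flag userspace passes (presumably IFF_MULTI_QUEUE); the per-queue size of 256 bytes in the usage check is an arbitrary illustrative figure; all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_OLD_MAX_TAP_QUEUES 1024             /* previous unconditional count */
#define MODEL_DEFAULT_MAX_NUM_RSS_QUEUES 8        /* assumption: kernel default */
#define MODEL_MAX_TAP_QUEUES MODEL_DEFAULT_MAX_NUM_RSS_QUEUES

/* Allocate one queue unless userspace asked for a multiqueue device. */
static unsigned int model_tun_queue_count(bool multi_queue)
{
    return multi_queue ? MODEL_MAX_TAP_QUEUES : 1;
}

/* Bytes needed for the queue array, given a per-queue structure size. */
static size_t model_queue_bytes(unsigned int nqueues, size_t per_queue)
{
    assert(per_queue > 0);
    return (size_t)nqueues * per_queue;
}
```

      For the common single-queue case the queue array shrinks by a factor of 1024 compared with the old unconditional allocation, which is what removes the high-order allocation.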
      Reported-by: Dirk Hohndel <dirk@hohndel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_mbim: send ZLP only for the specific buggy device · 844e88f0
      Bjørn Mork authored
      Reverting 328d7b8a and instead adding an exception for the
      Sierra Wireless MC7710.
      
      commit 328d7b8a (net: cdc_mbim: send ZLP after max sized NTBs)
      added a workaround for an issue observed on one specific device.
      Concerns were raised that this workaround adds a performance
      penalty to all devices based on questionable, if not buggy,
      behaviour of a single device:
      
       "If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting CPU
        load, increasing interrupt load by factor of 2 in high load traffic
        scenario and possibly decreasing throughput for all other devices
        which behaves correctly."
      
       "The idea of NCM was to avoid extra ZLPs. If your transfer is exactly
        dwNtbOutMaxSize, it's known, you can submit such request on the receiver
        side and you do not need any EOT indicatation, so the frametime can be
        used for useful data."
      
      Add a device-specific exception to prevent the workaround from
      affecting well-behaved devices.
      
      The assumption here is that needing a ZLP is truly an *exception*.
      We do not yet have enough data to verify this. The generic
      workaround in commit 328d7b8a should be considered acceptable despite
      the performance penalty if the exception list becomes a maintenance
      hassle.
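      A sketch of a quirk-gated ZLP decision, under stated assumptions: the quirk table and helpers are hypothetical, 0x1199 is Sierra Wireless's USB vendor ID but the product ID is only a placeholder, and the condition "NTB exactly dwNtbOutMaxSize long and ending on a wMaxPacketSize boundary" is this sketch's reading of "ZLP after max sized NTBs", not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_quirk {
    uint16_t vid, pid;
};

/* Hypothetical exception list; the PID is a placeholder, not the MC7710's. */
static const struct model_quirk zlp_quirks[] = {
    { 0x1199, 0x0000 },
};

static bool model_needs_zlp_quirk(uint16_t vid, uint16_t pid)
{
    for (size_t i = 0; i < sizeof(zlp_quirks) / sizeof(zlp_quirks[0]); i++)
        if (zlp_quirks[i].vid == vid && zlp_quirks[i].pid == pid)
            return true;
    return false;
}

/* Only quirky devices get a ZLP, and only after a max-sized NTB that ends
 * exactly on a USB packet boundary (otherwise a short packet ends it). */
static bool model_send_zlp(bool quirk, size_t ntb_len, size_t ntb_out_max,
                           size_t max_packet)
{
    assert(max_packet > 0);
    return quirk && ntb_len == ntb_out_max && ntb_len % max_packet == 0;
}
```

      Well-behaved devices never enter the quirk path, so they keep the NCM property of needing no extra end-of-transfer packet.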
      
      Cc: Alexey ORISHKO <alexey.orishko@stericsson.com>
      Cc: Yauheni Kaliuta <y.kaliuta@gmail.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 22 Jan, 2013 4 commits
    • ipv4: Fix route refcount on pmtu discovery · b44108db
      Steffen Klassert authored
      Commit 9cb3a50c (ipv4: Invalidate the socket cached route on
      pmtu events if possible) introduced a refcount problem. We don't
      get a refcount on the route if we get it from __sk_dst_get(), but
      we need one if we want to reuse this route, because __sk_dst_set()
      releases the refcount of the old route. This patch adds proper
      refcount handling for that case. We introduce a 'new' flag to
      indicate that we are going to use a new route, and we release the
      old route only if we replace it with a new one.
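      The 'new' flag logic can be modeled in userspace. The types and helpers are hypothetical stand-ins for the kernel's route and socket structures: model_sk_dst_set() plays the role of __sk_dst_set() (install, then release the old route), and model_route_lookup() returns a held reference as a route lookup would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_rt {
    int refcnt;
    bool obsolete;
};

struct model_sock {
    struct model_rt *dst;    /* cached route; the socket owns one reference */
};

static void model_rt_put(struct model_rt *rt)
{
    assert(rt->refcnt > 0);
    rt->refcnt--;
}

/* Like __sk_dst_set(): install rt and release the previously cached route. */
static void model_sk_dst_set(struct model_sock *sk, struct model_rt *rt)
{
    struct model_rt *old = sk->dst;
    sk->dst = rt;
    if (old)
        model_rt_put(old);
}

/* Lookups hand back a held reference. */
static struct model_rt *model_route_lookup(struct model_rt *table_entry)
{
    table_entry->refcnt++;
    return table_entry;
}

static void model_update_pmtu(struct model_sock *sk, struct model_rt *table_entry)
{
    struct model_rt *rt = sk->dst;   /* like __sk_dst_get(): no ref taken */
    bool new_rt = false;

    if (!rt || rt->obsolete) {
        rt = model_route_lookup(table_entry);
        new_rt = true;
    }
    /* ... apply the PMTU update to rt ... */
    if (new_rt)
        model_sk_dst_set(sk, rt);    /* old route released exactly once */
}
```

      Calling model_sk_dst_set() unconditionally on the borrowed route would drop the socket's own reference to it, which is the refcount problem described above.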
      Reported-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 0c8729c9
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      1) The transport header did not point to the right place after
         esp/ah processing in tunnel mode on the receive path. As a
         result, the ECN field of the inner header was not set correctly.
         Fixes from Li RongQing.
      
      2) We did a null check too late in one of the xfrm_replay advance
         functions. This can lead to a division by zero, fix from
         Nickolai Zeldovich.
      
      3) The size calculation of the hash table missed the multiplication
         by the actual struct size when the hash table is freed, so
         we might call the wrong free function. Fix from Michal Kubecek.
      
      4) On IPsec pmtu events we can't access the transport headers of
         the original packet, so force a relookup for all routes
         to notify about the pmtu event.
      ====================
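      Item 3 is an arithmetic slip; a sketch makes the consequence concrete. Assumptions: the free helper picks its release path from the byte size compared against the page size, pages are 4 KiB, and one bucket is a single list-head pointer; every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one hash bucket (a single list-head pointer). */
struct model_bucket {
    void *first;
};

enum model_free_kind { MODEL_SMALL_FREE, MODEL_PAGE_FREE };

/* Release path a size-dispatching free helper would choose. */
static enum model_free_kind model_free_kind_for(size_t sz)
{
    return sz <= MODEL_PAGE_SIZE ? MODEL_SMALL_FREE : MODEL_PAGE_FREE;
}

/* Correct size of a table with (hmask + 1) buckets, in bytes. */
static size_t model_table_bytes(unsigned int hmask)
{
    return ((size_t)hmask + 1) * sizeof(struct model_bucket);
}
```

      For hmask = 4095, passing only the bucket count (4096) selects the small-object path, while the real byte size (4096 times the bucket size) requires the page path: a mismatched free.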
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly · d8429506
      Daniel Wagner authored
      Commit 6a328d8c changed the update logic for the socket,
      but it did not update the SCM_RIGHTS path as well. This
      patch is based on the net_prio fix, commit
      
      48a87cc2
      
          net: netprio: fd passed in SCM_RIGHTS datagram not set correctly
      
          A socket fd passed in a SCM_RIGHTS datagram was not getting
          updated with the new tasks cgrp prioidx. This leaves IO on
          the socket tagged with the old tasks priority.
      
          To fix this add a check in the scm recvmsg path to update the
          sock cgrp prioidx with the new tasks value.
      
      Let's apply the same fix for net_cls.
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Reported-by: Li Zefan <lizefan@huawei.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: netdev@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netxen: fix off by one bug in netxen_release_tx_buffer() · a05948f2
      Eric Dumazet authored
      Christoph Paasch found netxen could trigger a BUG in its dismantle
      phase, in netxen_release_tx_buffer(), using full size TSO packets.
      
      cmd_buf->frag_count includes the skb->data part, so the loop must
      start at index 1 instead of 0; otherwise we can make an out-of-bounds
      access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2].
      
      Christoph provided the fixes in the netxen_map_tx_skb() function.
      In case of a DMA mapping error, it's better to clear the DMA fields
      so that we don't try to unmap them again in netxen_release_tx_buffer().
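      Both fixes can be sketched with a hypothetical mirror of the TX command buffer: slot 0 maps skb->data and slots 1..frag_count-1 map the paged fragments. MODEL_MAX_FRAGS, the model_* types and the unmap counter are illustrative assumptions, not netxen code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4   /* hypothetical stand-in for MAX_SKB_FRAGS */

struct model_frag { unsigned long dma; size_t len; };

struct model_cmd_buf {
    int frag_count;                                /* includes skb->data */
    struct model_frag frag_array[MODEL_MAX_FRAGS + 1];
};

static int unmapped;   /* counts unmap calls, for the example only */

/* Unmap one slot; a cleared (dma == 0) slot is skipped, which is why
 * clearing the fields after a mapping error prevents a double unmap. */
static void model_unmap(struct model_frag *f)
{
    if (f->dma) {
        unmapped++;
        f->dma = 0;
    }
}

static void model_release_tx_buffer(struct model_cmd_buf *cb)
{
    assert(cb->frag_count <= MODEL_MAX_FRAGS + 1);
    model_unmap(&cb->frag_array[0]);               /* the skb->data mapping */
    /* Paged fragments start at index 1: frag_count already counts slot 0. */
    for (int i = 1; i < cb->frag_count; i++)
        model_unmap(&cb->frag_array[i]);
    cb->frag_count = 0;
}
```

      A full-sized buffer touches exactly slots 0..MODEL_MAX_FRAGS and never reads past the array.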
      Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Cc: Sony Chacko <sony.chacko@qlogic.com>
      Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 Jan, 2013 13 commits
  6. 20 Jan, 2013 1 commit
  7. 19 Jan, 2013 2 commits
  8. 18 Jan, 2013 5 commits
  9. 17 Jan, 2013 4 commits