1. 10 Dec, 2013 6 commits
    • Daniel Borkmann's avatar
      packet: introduce PACKET_QDISC_BYPASS socket option · d346a3fa
      Daniel Borkmann authored
      This patch introduces a PACKET_QDISC_BYPASS socket option, that
      allows for using a similar xmit() function as in pktgen instead
      of taking the dev_queue_xmit() path. This can be very useful when
      PF_PACKET applications are required to be used in a similar
      scenario as pktgen, but with full, flexible packet payload that
      needs to be provided, for example.
      
      On default, nothing changes in behaviour for normal PF_PACKET
      TX users, so everything stays as is for applications. New users,
      however, can now set PACKET_QDISC_BYPASS if needed to prevent
      own packets from i) reentering packet_rcv() and ii) to directly
      push the frame to the driver.
      
      In doing so we can increase pps (here 64 byte packets) for
      PF_PACKET a bit:
      
        # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
        1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
        2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
        3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
        4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
        5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
        6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
        7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
        8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
        9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
       10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
       11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
       [...]
       20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081
      
       [**]: qdisc path with packet_rcv(), how probably most people
             seem to use it (hopefully not anymore if not needed)
      
      The test was done using a modified trafgen, sending a simple
      static 64 bytes packet, on all CPUs.  The trick in the fast
      "qdisc path" case, is to avoid reentering packet_rcv() by
      setting the RAW socket protocol to zero, like:
      socket(PF_PACKET, SOCK_RAW, 0);
      
      Tradeoffs are documented as well in this patch, clearly, if
      queues are busy, we will drop more packets, tc disciplines are
      ignored, and these packets are not visible to taps anymore. For
      a pktgen like scenario, we argue that this is acceptable.
      
      The pointer to the xmit function has been placed in packet
      socket structure hole between cached_dev and prot_hook that
      is hot anyway as we're working on cached_dev in each send path.
      
      Done in joint work together with Jesper Dangaard Brouer.
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d346a3fa
    • Daniel Borkmann's avatar
      net: dev: move inline skb_needs_linearize helper to header · 4262e5cc
      Daniel Borkmann authored
      As we need it elsewhere, move the inline helper function of
      skb_needs_linearize() over to skbuff.h include file. While
      at it, also convert the return to 'bool' instead of 'int'
      and add a proper kernel doc.
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4262e5cc
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 34f9f437
      David S. Miller authored
      Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
      Daniel's direct transmit changes depend upon.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34f9f437
    • Daniel Borkmann's avatar
      packet: fix send path when running with proto == 0 · 66e56cd4
      Daniel Borkmann authored
      Commit e40526cb introduced a cached dev pointer, that gets
      hooked into register_prot_hook(), __unregister_prot_hook() to
      update the device used for the send path.
      
      We need to fix this up, as otherwise this will not work with
      sockets created with protocol = 0, plus with sll_protocol = 0
      passed via sockaddr_ll when doing the bind.
      
      So instead, assign the pointer directly. The compiler can inline
      these helper functions automagically.
      
      While at it, also assume the cached dev fast-path as likely(),
      and document this variant of socket creation as it seems it is
      not widely used (seems not even the author of TX_RING was aware
      of that in his reference example [1]). Tested with reproducer
      from e40526cb.
      
       [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example
      
      Fixes: e40526cb ("packet: fix use after free race in send path when dev is released")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Tested-by: default avatarSalam Noureddine <noureddine@aristanetworks.com>
      Tested-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66e56cd4
    • Eric Dumazet's avatar
      pkt_sched: give visibility to mq slave qdiscs · 95dc1929
      Eric Dumazet authored
      Commit 6da7c8fc ("qdisc: allow setting default queuing discipline")
      added the ability to change default qdisc from pfifo_fast to say fq
      
      But as most modern ethernet devices are multiqueue, we cant really
      see all the statistics from "tc -s qdisc show", as the default root
      qdisc is mq.
      
      This patch adds the calls to qdisc_list_add() to mq and mqprio
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95dc1929
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · fbec3706
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to i40e only.
      
      Jacob provides a i40e patch to get 1588 work correctly by separating
      TSYNVALID and TSYNINDX fields in the receive descriptor.
      
      Jesse provides several i40e patches, first to correct the checking
      of the multi-bit state.  The hash is reported correctly in the RSS
      field if and only if the filter status is 3.  Other values of the
      filter status mean different things and we should not depend on a
      bitwise result.  Then provides a patch to enable a couple of
      workarounds based on revision ID that allow the driver to work
      more fully on early hardware.
      
      Shannon provides several i40e patches as well.  First sets the media
      type in the hardware structure based on the external connection type.
      Then provides a patch to only setup the rings that will be used.  Lastly
      provides a fix where the TESTING state was still set when exiting the
      ethtool diagnostics.
      
      Kevin Scott provides one i40e patch to add a new flag to the i40e_add_veb()
      which allows the driver to request the hardware to filter on layer 2
      parameters.
      
      Anjali provides four i40e patches, first refactors the reset code in
      order to re-size queues and vectors while the interface is still up.
      Then provides a patch to enable all PCTYPEs expect FCoE for RSS.  Adds
      a message to notify the user of how many VFs are initialized on each
      port.  Lastly adds a new variable to track the number of PF instances,
      this is a global counter on purpose so that each PF loaded has a
      unique ID.
      
      Catherine bumps the driver version.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fbec3706
  2. 09 Dec, 2013 12 commits
  3. 07 Dec, 2013 14 commits
  4. 06 Dec, 2013 8 commits
    • Joe Perches's avatar
      ether_addr_equal: Optimize implementation, remove unused compare_ether_addr · 0d74c42f
      Joe Perches authored
      Add a new check for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to reduce
      the number of or's used in the ether_addr_equal comparison to very
      slightly improve function performance.
      
      Simplify the ether_addr_equal_64bits implementation.
      Integrate and remove the zap_last_2bytes helper as it's now
      used only once.
      
      Remove the now unused compare_ether_addr function.
      
      Update the unaligned-memory-access documentation to remove the
      compare_ether_addr description and show how unaligned accesses
      could occur with ether_addr_equal.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d74c42f
    • wangweidong's avatar
      unix: convert printks to pr_<level> · 5cc208be
      wangweidong authored
      use pr_<level> instead of printk(LEVEL)
      Signed-off-by: default avatarWang Weidong <wangweidong1@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5cc208be
    • Jiri Pirko's avatar
      ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses · 53bd6749
      Jiri Pirko authored
      Creating an address with this flag set will result in kernel taking care
      of temporary addresses in the same way as if the address was created by
      kernel itself (after RA receive). This allows userspace applications
      implementing the autoconfiguration (NetworkManager for example) to
      implement ipv6 addresses privacy.
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarThomas Haller <thaller@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53bd6749
    • Jiri Pirko's avatar
      ipv6 addrconf: extend ifa_flags to u32 · 479840ff
      Jiri Pirko authored
      There is no more space in u8 ifa_flags. So do what davem suffested and
      add another netlink attr called IFA_FLAGS for carry more flags.
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarThomas Haller <thaller@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      479840ff
    • Michael Dalton's avatar
      virtio-net: free bufs correctly on invalid packet length · 98bfd23c
      Michael Dalton authored
      When a packet with invalid length arrives, ensure that the packet
      is freed correctly if mergeable packet buffers and big packets
      (GUEST_TSO4) are both enabled.
      Signed-off-by: default avatarMichael Dalton <mwdalton@google.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarAndrew Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98bfd23c
    • Ezequiel Garcia's avatar
      net: mvneta: Fix incorrect DMA unmapping size · a328f3a0
      Ezequiel Garcia authored
      The current code unmaps the DMA mapping created for rx skb_buff's by
      using the data_size as the the mapping size. This is wrong since the
      correct size to specify should match the size used to create the mapping.
      
      This commit removes the following DMA_API_DEBUG warning:
      
      ------------[ cut here ]------------
      WARNING: at lib/dma-debug.c:887 check_unmap+0x3a8/0x860()
      mvneta d0070000.ethernet: DMA-API: device driver frees DMA memory with different size [device address=0x000000002eb80000] [map size=1600 bytes] [unmap size=66 bytes]
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.21-01444-ga88ae13-dirty #92
      [<c0013600>] (unwind_backtrace+0x0/0xf8) from [<c0010fb8>] (show_stack+0x10/0x14)
      [<c0010fb8>] (show_stack+0x10/0x14) from [<c001afa0>] (warn_slowpath_common+0x48/0x68)
      [<c001afa0>] (warn_slowpath_common+0x48/0x68) from [<c001b01c>] (warn_slowpath_fmt+0x30/0x40)
      [<c001b01c>] (warn_slowpath_fmt+0x30/0x40) from [<c018d0fc>] (check_unmap+0x3a8/0x860)
      [<c018d0fc>] (check_unmap+0x3a8/0x860) from [<c018d734>] (debug_dma_unmap_page+0x64/0x70)
      [<c018d734>] (debug_dma_unmap_page+0x64/0x70) from [<c0233f78>] (mvneta_rx+0xec/0x468)
      [<c0233f78>] (mvneta_rx+0xec/0x468) from [<c023436c>] (mvneta_poll+0x78/0x16c)
      [<c023436c>] (mvneta_poll+0x78/0x16c) from [<c02db468>] (net_rx_action+0x94/0x160)
      [<c02db468>] (net_rx_action+0x94/0x160) from [<c0021e68>] (__do_softirq+0xe8/0x1d0)
      [<c0021e68>] (__do_softirq+0xe8/0x1d0) from [<c0021ff8>] (do_softirq+0x4c/0x58)
      [<c0021ff8>] (do_softirq+0x4c/0x58) from [<c0022228>] (irq_exit+0x58/0x90)
      [<c0022228>] (irq_exit+0x58/0x90) from [<c000e7c8>] (handle_IRQ+0x3c/0x94)
      [<c000e7c8>] (handle_IRQ+0x3c/0x94) from [<c0008548>] (armada_370_xp_handle_irq+0x4c/0xb4)
      [<c0008548>] (armada_370_xp_handle_irq+0x4c/0xb4) from [<c000dc20>] (__irq_svc+0x40/0x50)
      Exception stack(0xc04f1f70 to 0xc04f1fb8)
      1f60:                                     c1fe46f8 00000000 00001d92 00001d92
      1f80: c04f0000 c04f0000 c04f84a4 c03e081c c05220e7 00000001 c05220e7 c04f0000
      1fa0: 00000000 c04f1fb8 c000eaf8 c004c048 60000113 ffffffff
      [<c000dc20>] (__irq_svc+0x40/0x50) from [<c004c048>] (cpu_startup_entry+0x54/0x128)
      [<c004c048>] (cpu_startup_entry+0x54/0x128) from [<c04c1a14>] (start_kernel+0x29c/0x2f0)
      [<c04c1a14>] (start_kernel+0x29c/0x2f0) from [<00008074>] (0x8074)
      ---[ end trace d4955f6acd178110 ]---
      Mapped at:
       [<c018d600>] debug_dma_map_page+0x4c/0x11c
       [<c0235d6c>] mvneta_setup_rxqs+0x398/0x598
       [<c0236084>] mvneta_open+0x40/0x17c
       [<c02dbbd4>] __dev_open+0x9c/0x100
       [<c02dbe58>] __dev_change_flags+0x7c/0x134
      Signed-off-by: default avatarEzequiel Garcia <ezequiel.garcia@free-electrons.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a328f3a0
    • Jiri Pirko's avatar
      br: fix use of ->rx_handler_data in code executed on non-rx_handler path · 859828c0
      Jiri Pirko authored
      br_stp_rcv() is reached by non-rx_handler path. That means there is no
      guarantee that dev is bridge port and therefore simple NULL check of
      ->rx_handler_data is not enough. There is need to check if dev is really
      bridge port and since only rcu read lock is held here, do it by checking
      ->rx_handler pointer.
      
      Note that synchronize_net() in netdev_rx_handler_unregister() ensures
      this approach as valid.
      
      Introduced originally by:
      commit f350a0a8
        "bridge: use rx_handler_data pointer to store net_bridge_port pointer"
      
      Fixed but not in the best way by:
      commit b5ed54e9
        "bridge: fix RCU races with bridge port"
      
      Reintroduced by:
      commit 716ec052
        "bridge: fix NULL pointer deref of br_port_get_rcu"
      
      Please apply to stable trees as well. Thanks.
      
      RH bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1025770Reported-by: default avatarLaine Stump <laine@redhat.com>
      Debugged-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      859828c0
    • Andrey Vagin's avatar
      virtio: delete napi structures from netdev before releasing memory · d4fb84ee
      Andrey Vagin authored
      free_netdev calls netif_napi_del too, but it's too late, because napi
      structures are placed on vi->rq. netif_napi_add() is called from
      virtnet_alloc_queues.
      
      general protection fault: 0000 [#1] SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables virtio_balloon pcspkr virtio_net(-) i2c_pii
      CPU: 1 PID: 347 Comm: rmmod Not tainted 3.13.0-rc2+ #171
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      task: ffff8800b779c420 ti: ffff8800379e0000 task.ti: ffff8800379e0000
      RIP: 0010:[<ffffffff81322e19>]  [<ffffffff81322e19>] __list_del_entry+0x29/0xd0
      RSP: 0018:ffff8800379e1dd0  EFLAGS: 00010a83
      RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800379c2fd0 RCX: dead000000200200
      RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: ffff8800379c2fd0
      RBP: ffff8800379e1dd0 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800379c2f90
      R13: ffff880037839160 R14: 0000000000000000 R15: 00000000013352f0
      FS:  00007f1400e34740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f464124c763 CR3: 00000000b68cf000 CR4: 00000000000006e0
      Stack:
       ffff8800379e1df0 ffffffff8155beab 6b6b6b6b6b6b6b2b ffff8800378391c0
       ffff8800379e1e18 ffffffff8156499b ffff880037839be0 ffff880037839d20
       ffff88003779d3f0 ffff8800379e1e38 ffffffffa003477c ffff88003779d388
      Call Trace:
       [<ffffffff8155beab>] netif_napi_del+0x1b/0x80
       [<ffffffff8156499b>] free_netdev+0x8b/0x110
       [<ffffffffa003477c>] virtnet_remove+0x7c/0x90 [virtio_net]
       [<ffffffff813ae323>] virtio_dev_remove+0x23/0x80
       [<ffffffff813f62ef>] __device_release_driver+0x7f/0xf0
       [<ffffffff813f6ca0>] driver_detach+0xc0/0xd0
       [<ffffffff813f5f28>] bus_remove_driver+0x58/0xd0
       [<ffffffff813f72ec>] driver_unregister+0x2c/0x50
       [<ffffffff813ae65e>] unregister_virtio_driver+0xe/0x10
       [<ffffffffa0036942>] virtio_net_driver_exit+0x10/0x6ce [virtio_net]
       [<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220
       [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0
       [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b
      Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00
      RIP  [<ffffffff81322e19>] __list_del_entry+0x29/0xd0
       RSP <ffff8800379e1dd0>
      ---[ end trace d5931cd3f87c9763 ]---
      
      Fixes: 986a4f4d (virtio_net: multiqueue support)
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4fb84ee