1. 09 Jan, 2014 13 commits
  2. 08 Jan, 2014 18 commits
  3. 07 Jan, 2014 9 commits
    • Merge branch 'tipc' · 8752b5ca
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: link setup and failover improvements
      
      This series consists of four unrelated commits with different purposes.
      
      - Commit #1 is purely cosmetic and pedagogic, hopefully making the
        failover/tunneling logics slightly easier to understand.
      - Commit #2 fixes a bug that has always been in the code, but was not
        discovered until very recently.
      - Commit #3 fixes a non-fatal race issue in the neighbour discovery
        code.
      - Commit #4 removes an unnecessary indirection step during link
        startup.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make link start event synchronous · 581465fa
      Jon Paul Maloy authored
      When a link is created, we delay the start event by deferring it to
      a tasklet for later execution. Since we hold all the necessary locks
      at the moment of creation, and there is no risk of deadlock or
      contention, this delay serves no purpose in the current code.
      
      We remove this obsolete indirection step, and the associated function
      link_start(). At the same time, we rename the function tipc_link_stop()
      to the more appropriate tipc_link_purge_queues().
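      A minimal userspace sketch of the two creation patterns, assuming a
      hypothetical struct link with a single 'started' flag and a plain
      array standing in for the tasklet queue (none of these names are
      TIPC's). It shows the window the indirection opened: a deferred link
      exists but has not seen its start event until the queued work runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified link object; not the real struct tipc_link. */
struct link {
	bool started;
};

/* Stand-in for handing the link state machine its start event. */
static void link_handle_start_event(struct link *l)
{
	l->started = true;
}

/* Old pattern (sketch): creation queues the start event for later. */
#define MAX_PENDING 8
static struct link *pending[MAX_PENDING];
static size_t npending;

static void link_create_deferred(struct link *l)
{
	assert(npending < MAX_PENDING);
	l->started = false;
	pending[npending++] = l;	/* runs later, like a tasklet */
}

/* Stands in for the tasklet eventually running the queued events. */
static void run_pending(void)
{
	for (size_t i = 0; i < npending; i++)
		link_handle_start_event(pending[i]);
	npending = 0;
}

/* New pattern (sketch): the creator already holds the needed locks,
 * so the start event is delivered immediately. */
static void link_create_sync(struct link *l)
{
	l->started = false;
	link_handle_start_event(l);
}
```

      For example, link_create_deferred(&a) leaves a.started false until
      run_pending() is called, while link_create_sync(&b) returns with
      b.started already set.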
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce new spinlock to protect struct link_req · f9a2c80b
      Ying Xue authored
      Currently, only 'bearer_lock' is used to protect struct link_req in
      the function disc_timeout(). This is unsafe, since the member fields
      'num_nodes' and 'timer_intv' might be accessed simultaneously by the
      three different threads below, none of which grabs bearer_lock in
      the critical region:
      
      link_activate()
        tipc_bearer_add_dest()
          tipc_disc_add_dest()
            req->num_nodes++;
      
      tipc_link_reset()
        tipc_bearer_remove_dest()
          tipc_disc_remove_dest()
            req->num_nodes--;
            disc_update()
              read req->num_nodes
              write req->timer_intv
      
      disc_timeout()
        read req->num_nodes
        read/write req->timer_intv
      
      Without lock protection, the only symptom of a race is that discovery
      messages occasionally may not be sent out. This is not fatal, since such
      messages are best-effort anyway. On the other hand, since discovery
      messages are not time critical, adding a protecting lock brings no
      serious overhead either. So we add a new, dedicated spinlock in
      order to guarantee absolute data consistency in link_req objects.
      This also helps reduce the overall role of the bearer_lock, which
      we want to remove completely in a later commit series.
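      The locking discipline described above can be sketched in userspace
      C. This is an illustration under assumptions, not the TIPC code: a
      C11 atomic_flag spinlock stands in for the kernel spinlock_t, only
      the two contended fields are modelled, and the interval constants and
      update rules are hypothetical. What it shows is that each of the
      three paths, including the read of num_nodes followed by the write
      of timer_intv in the disc_update() step, runs inside one critical
      section on the link_req's own lock, independent of bearer_lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for a kernel spinlock (C11 atomic_flag). */
typedef struct {
	atomic_flag held;
} spin_t;

static void spin_init(spin_t *s) { atomic_flag_clear(&s->held); }

static void spin_lock(spin_t *s)
{
	while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
		;	/* busy-wait until the holder clears the flag */
}

static void spin_unlock(spin_t *s)
{
	atomic_flag_clear_explicit(&s->held, memory_order_release);
}

/* Hypothetical intervals, for illustration only (not TIPC's values). */
enum { INTV_FAST = 125, INTV_SLOW = 2000 };

/* Simplified link_req: only the two contended fields plus their lock. */
struct link_req {
	spin_t lock;
	int num_nodes;
	unsigned int timer_intv;
};

static void req_init(struct link_req *req)
{
	spin_init(&req->lock);
	req->num_nodes = 0;
	req->timer_intv = INTV_FAST;
}

/* Path reached via link_activate(): one more reachable node. */
static void disc_add_dest(struct link_req *req)
{
	spin_lock(&req->lock);
	req->num_nodes++;
	spin_unlock(&req->lock);
}

/* Path reached via tipc_link_reset(): decrement, then read num_nodes and
 * write timer_intv (the disc_update() step) in the same critical section. */
static void disc_remove_dest(struct link_req *req)
{
	spin_lock(&req->lock);
	req->num_nodes--;
	if (req->num_nodes == 0)
		req->timer_intv = INTV_FAST;
	spin_unlock(&req->lock);
}

/* Timer path: read num_nodes and read/write timer_intv under the lock. */
static void disc_timeout(struct link_req *req)
{
	spin_lock(&req->lock);
	if (req->num_nodes > 0)
		req->timer_intv = INTV_SLOW;
	spin_unlock(&req->lock);
}
```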
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove 'has_redundant_link' flag from STATE link protocol messages · b9d4c339
      Jon Paul Maloy authored
      The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
      protocol messages. Due to an ambiguity in the protocol specification it
      is currently also transferred in STATE messages. Its value is used to
      initialize a link state variable, 'permit_changeover', which is used
      to inhibit futile link failover attempts when it is known that the
      peer node has no working links at the moment, although the local node
      may still think it has one.
      
      The fact that 'has_redundant_link' is incorrectly read from STATE
      messages means that 'permit_changeover' sometimes gets a wrong value,
      which permanently blocks any links from being re-established. Such
      failures can only occur in dual-link systems, and are extremely rare.
      This bug seems to have always been present in the code.
      
      Furthermore, since commit b4b56102
      ("tipc: Ensure both nodes recognize loss of contact between them"),
      the 'permit_changeover' field serves no purpose any more. The task of
      enforcing 'lost contact' cycles at both peer endpoints is now taken
      by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
      in struct tipc_node to abort unnecessary failover attempts.
      
      We therefore remove the 'has_redundant_link' flag from STATE messages,
      as well as the now redundant 'permit_changeover' variable.
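      A minimal sketch of the receive-side rule behind the fix, using a
      hypothetical message model rather than TIPC's wire format: the flag
      is only meaningful in RESET and ACTIVATE messages, so its value is
      never taken from a STATE message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message model; not TIPC's wire format. */
enum msg_type { MSG_STATE, MSG_RESET, MSG_ACTIVATE };

struct proto_msg {
	enum msg_type type;
	bool has_redundant_link;	/* only defined for RESET/ACTIVATE */
};

/* Stores the flag and returns true only for message types that define
 * it; for STATE messages *out is left untouched and false is returned. */
static bool read_redundant_flag(const struct proto_msg *m, bool *out)
{
	switch (m->type) {
	case MSG_RESET:
	case MSG_ACTIVATE:
		*out = m->has_redundant_link;
		return true;
	case MSG_STATE:
	default:
		return false;
	}
}
```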
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: rename functions related to link failover and improve comments · 170b3927
      Jon Paul Maloy authored
      The functionality related to link addition and failover is unnecessarily
      hard to understand and maintain. We try to improve this by renaming
      some of the functions, at the same time adding or improving the
      explanatory comments around them. Names such as "tipc_rcv()" etc. also
      align better with what is used in other networking components.
      
      The changes in this commit are purely cosmetic; no functional changes
      are made.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: skbuff: const-ify casts in skb_queue_* functions · fd44b93c
      Daniel Borkmann authored
      The casts used in the comparisons inside the skb_queue_* inline
      helper functions should be const-ified: the helpers' parameters are
      const as well, so let's not drop that qualifier in the casts.
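      The pattern can be sketched with minimal stand-ins for sk_buff and
      sk_buff_head (hypothetical struct buf and struct buf_head, laid out so
      the head's next/prev pointers line up with a buffer's, mirroring the
      kernel's convention). Casting the const head to (struct buf *) for the
      comparison would silently discard the qualifier, which GCC's
      -Wcast-qual flags; casting to (const struct buf *) keeps it. The
      pointers are only compared, never dereferenced through the cast.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins; the head's first two members match a buffer's. */
struct buf {
	struct buf *next;
	struct buf *prev;
};

struct buf_head {
	struct buf *next;
	struct buf *prev;
	unsigned int qlen;
};

static void head_init(struct buf_head *h)
{
	h->next = h->prev = (struct buf *)h;	/* empty: head points to itself */
	h->qlen = 0;
}

static void queue_tail(struct buf_head *h, struct buf *b)
{
	b->next = (struct buf *)h;
	b->prev = h->prev;
	if (h->qlen == 0)
		h->next = b;		/* first buffer: head points to it */
	else
		h->prev->next = b;	/* link after the current tail */
	h->prev = b;
	h->qlen++;
}

/* Both parameters are const, so the casts keep the qualifier. */
static bool queue_is_last(const struct buf_head *list, const struct buf *b)
{
	return b->next == (const struct buf *)list;
}

static bool queue_empty(const struct buf_head *list)
{
	return list->next == (const struct buf *)list;
}
```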
      Suggested-by: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: xfrm: xfrm_policy: fix inline not at beginning of declaration · be7928d2
      Daniel Borkmann authored
      Fix three warnings related to:
      
        net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
        net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
        net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
      
      Simply removing the inline keyword is sufficient, as the compiler
      will decide on its own whether or not to inline.
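      A small illustration of the warning and of the two possible fixes,
      using a hypothetical helper twice() rather than the xfrm functions
      themselves. The warned form is kept in a comment so the file builds
      cleanly.

```c
#include <assert.h>

/*
 * Warned form (not compiled here):
 *
 *     static int inline twice(int x) { return 2 * x; }
 *
 * GCC's -Wold-style-declaration (also enabled by -Wextra) reports
 * "'inline' is not at beginning of declaration" for it.
 */

/* What the commit does: drop the keyword and let the compiler decide. */
static int twice(int x)
{
	return 2 * x;
}

/* Alternative that keeps the hint: put 'inline' before the type. */
static inline int twice_hinted(int x)
{
	return 2 * x;
}
```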
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4_en: Select PTP_1588_CLOCK · 74b9c3ea
      Shawn Bohrer authored
      Now that mlx4_en includes a PHC driver, it must select PTP_1588_CLOCK.
      
         drivers/built-in.o: In function `mlx4_en_get_ts_info':
      >> en_ethtool.c:(.text+0x391a11): undefined reference to `ptp_clock_index'
         drivers/built-in.o: In function `mlx4_en_remove_timestamp':
      >> (.text+0x397913): undefined reference to `ptp_clock_unregister'
         drivers/built-in.o: In function `mlx4_en_init_timestamp':
      >> (.text+0x397b20): undefined reference to `ptp_clock_register'
      
      Fixes: ad7d4eae ("mlx4_en: Add PTP hardware clock")
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-gre-gro: Add GRE support to the GRO stack · bf5a755f
      Jerry Chu authored
      This patch builds on top of commit 299603e8
      ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
      support for standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
      stack. It also serves as an example for supporting other encapsulation
      protocols in the GRO stack in the future.
      
      The patch supports version 0 and all the flags (key, csum, seq#) but
      will flush any pkt with the S (seq#) flag. This is because the S flag
      is not supported by GSO, and a GRO pkt may end up in the forwarding
      path, thus requiring GSO support to break it up correctly.
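      The flag handling described above can be sketched as a header-length
      check. The bit values follow the GRE header layout of RFC 2784 and
      RFC 2890 (C, K and S bits plus a 3-bit version field); the function
      name and the choice to also flush packets carrying the RFC 1701
      routing bit are assumptions of this sketch, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of the first 16-bit GRE word, in host byte order. */
#define GRE_CSUM    0x8000u	/* C: checksum + reserved1 present */
#define GRE_ROUTING 0x4000u	/* R: RFC 1701 routing (deprecated) */
#define GRE_KEY     0x2000u	/* K: key present */
#define GRE_SEQ     0x1000u	/* S: sequence number present */
#define GRE_VERSION 0x0007u

/* Illustrative GRO admission check (hypothetical helper): returns the
 * GRE header length in bytes if the packet may be merged, or 0 if it
 * must be flushed. */
static size_t gre_gro_hdr_len(uint16_t flags)
{
	size_t len = 4;	/* flags/version word + protocol type */

	if (flags & GRE_VERSION)
		return 0;	/* only version 0 is handled */
	if (flags & (GRE_ROUTING | GRE_SEQ))
		return 0;	/* S: GSO cannot re-segment it, so flush */
	if (flags & GRE_CSUM)
		len += 4;
	if (flags & GRE_KEY)
		len += 4;
	return len;
}
```

      With only the key flag set, as in the tunnel used for the tests
      below, the header is 8 bytes and the packet is eligible for merging.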
      
      Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
      ETH_P_IPV6) GRO offload support, so the encapped pkts are limited to
      IP pkts (i.e., w/o L2 hdr). Support for other protocol types can
      easily be added, as can support for GRE variations such as NVGRE.
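      A hypothetical registry in the spirit of packet_offload, keyed by the
      protocol type carried in the GRE header; the table layout and handler
      names are illustrative only. In this sketch a type without an entry,
      such as 0x6558 (Transparent Ethernet Bridging, used by NVGRE), finds
      no handler, so those packets would not be aggregated until one is
      registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD
#define ETH_P_TEB  0x6558	/* Transparent Ethernet Bridging */

/* Illustrative entry: protocol type -> inner-header GRO handler name. */
struct offload_entry {
	uint16_t type;
	const char *name;
};

/* Only L3 types are registered, matching the limitation described. */
static const struct offload_entry offloads[] = {
	{ ETH_P_IP, "inet" },
	{ ETH_P_IPV6, "ipv6" },
};

static const struct offload_entry *find_offload(uint16_t type)
{
	for (size_t i = 0; i < sizeof offloads / sizeof offloads[0]; i++)
		if (offloads[i].type == type)
			return &offloads[i];
	return NULL;	/* no handler: do not aggregate */
}
```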
      
      The patch also supports csum offload. Specifically, if the csum flag is
      on and the h/w is capable of checksumming the payload
      (CHECKSUM_COMPLETE), the code will take advantage of the csum computed
      by the h/w when validating the GRE csum.
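      The idea can be sketched with RFC 1071-style ones'-complement
      arithmetic. Assumptions: hw_sum simulates a CHECKSUM_COMPLETE-style
      value, i.e. a device-computed sum over the whole buffer; the helpers
      are simplified userspace versions named after, but not identical to,
      the kernel's csum helpers; and the bytes before the GRE header have
      even length so 16-bit word boundaries line up. Because ones'-complement
      addition is associative, subtracting the sum of the outer header from
      hw_sum leaves the sum over the GRE header and payload, so the payload
      never has to be read again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of big-endian 16-bit words, folded to 16 bits
 * (an odd trailing byte is padded with zero). */
static uint32_t csum_partial(const uint8_t *p, size_t len)
{
	uint64_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint32_t)sum;
}

/* Ones'-complement subtraction: add the complement, end-around carry. */
static uint32_t csum_sub(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + (uint32_t)~b;
	return (uint32_t)((s & 0xffffffffu) + (s >> 32));
}

/* Fold to 16 bits and complement. Applied to a sum that includes a
 * correct checksum field, the result is 0. */
static uint16_t csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Software path: re-read the whole GRE header + payload. */
static bool gre_csum_ok_sw(const uint8_t *gre, size_t len)
{
	return csum_fold(csum_partial(gre, len)) == 0;
}

/* CHECKSUM_COMPLETE-style path: only the bytes before the GRE header
 * are read, to remove their contribution from the device's sum. */
static bool gre_csum_ok_complete(uint32_t hw_sum, const uint8_t *pkt,
				 size_t gre_off)
{
	assert(gre_off % 2 == 0);
	return csum_fold(csum_sub(hw_sum, csum_partial(pkt, gre_off))) == 0;
}
```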
      
      Note that commit 60769a5d ("ipv4: gre: add GRO capability") already
      introduced GRO capability to IPv4 GRE tunnels, using the gro_cells
      infrastructure. But there, GRO is done after the GRE hdr has been
      removed (i.e., decapped). This patch applies GRO when pkts first come
      in (before hitting the GRE tunnel code). There is some performance
      advantage to applying GRO as early as possible. This approach is also
      transparent to other subsystems such as Open vSwitch, where GRE decap
      is handled outside of the IP stack, making it harder for the
      gro_cells approach to apply. On the other hand, some NICs are still
      not capable of hashing on the inner hdr of a GRE pkt (RSS). In that
      case, GRO processing of all pkts from the same remote host will
      happen on the same CPU, and performance may be suboptimal.
      
      I'm including some rough preliminary performance numbers below. Note
      that, as usual, performance will be highly dependent on traffic load
      and mix. It also depends on NIC offload features, so the following is
      by no means a comprehensive study. Local testing and tuning will be
      needed to decide the best setting.
      
      All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
      (super_netperf 50 -H 192.168.1.18 -l 30)
      
      An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
      mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
      is configured.
      
      GRO support for pkts AFTER decap is controlled through the device
      feature of the GRE device (e.g., ethtool -K gre1 gro on/off).
      
      1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
      thruput: 9.16Gbps
      CPU utilization: 19%
      
      1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
      thruput: 5.9Gbps
      CPU utilization: 15%
      
      1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
      thruput: 9.26Gbps
      CPU utilization: 12-13%
      
      1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
      thruput: 9.26Gbps
      CPU utilization: 10%
      
      The following tests were performed on a different NIC that is capable of
      csum offload. I.e., the h/w is capable of computing IP payload csum
      (CHECKSUM_COMPLETE).
      
      2.1 ethtool -K gre1 gro on (hence will use gro_cells)
      
      2.1.1 ethtool -K eth0 gro off; csum offload disabled
      thruput: 8.53Gbps
      CPU utilization: 9%
      
      2.1.2 ethtool -K eth0 gro off; csum offload enabled
      thruput: 8.97Gbps
      CPU utilization: 7-8%
      
      2.1.3 ethtool -K eth0 gro on; csum offload disabled
      thruput: 8.83Gbps
      CPU utilization: 5-6%
      
      2.1.4 ethtool -K eth0 gro on; csum offload enabled
      thruput: 8.98Gbps
      CPU utilization: 5%
      
      2.2 ethtool -K gre1 gro off
      
      2.2.1 ethtool -K eth0 gro off; csum offload disabled
      thruput: 5.93Gbps
      CPU utilization: 9%
      
      2.2.2 ethtool -K eth0 gro off; csum offload enabled
      thruput: 5.62Gbps
      CPU utilization: 8%
      
      2.2.3 ethtool -K eth0 gro on; csum offload disabled
      thruput: 7.69Gbps
      CPU utilization: 8%
      
      2.2.4 ethtool -K eth0 gro on; csum offload enabled
      thruput: 8.96Gbps
      CPU utilization: 5-6%
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>