1. 22 Jul, 2015 10 commits
  2. 21 Jul, 2015 30 commits
    • tipc: fix compatibility bug · 16040894
      Jon Paul Maloy authored
      In commit d999297c
      ("tipc: reduce locking scope during packet reception") we introduced
      a new function tipc_link_proto_rcv(). This function contains a bug
      that sometimes causes it to send out a non-zero link priority value
      in the protocol messages it creates.

      The bug may lead to an extra link reset at initial link establishment
      with older nodes. This will never happen more than once, after which
      the link will work as intended.

      We fix this bug in this commit.

      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'explicit-inbound-link-state' · 67b2914b
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: enable inband link state negotiation only when explicitly requested
      
      Changes in v5:
      
      - removed an invalid use of the link_update callback in the SF2 driver
        that appeared after merging "net: phy: fixed_phy: handle link-down case"
      
      - reworded the commit message for patch 2 to make it clear what it fixes and
        why this is required
      
      Initial cover letter from Stas:
      
      Hello.
      
      Currently the link status auto-negotiation is enabled
      for any SGMII link with fixed-link DT binding.
      A regression was reported:
      https://lkml.org/lkml/2015/7/8/865
      Apparently not all HW that implements the SGMII protocol generates the
      inband status required for the auto-negotiation to work.
      More details here:
      https://lkml.org/lkml/2015/7/10/206

      The following patches revert to the old behavior by default,
      which is to not enable the auto-negotiation for fixed-link.
      A new DT property is added that allows explicitly requesting
      the auto-negotiation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvneta: use inband status only when explicitly enabled · f8af8e6e
      Stas Sergeev authored
      Commit 898b2970 ("mvneta: implement SGMII-based in-band link state
      signaling") implemented the link parameters auto-negotiation unconditionally.
      Unfortunately, it appears that some HW that implements the SGMII protocol
      doesn't generate the inband status, so it is not possible to auto-negotiate
      anything with such HW.

      This patch enables the auto-negotiation only if explicitly requested with
      the 'managed' DT property.

      This patch fixes the following regression:
      https://lkml.org/lkml/2015/7/8/865

      Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
      CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_mdio: add new DT property 'managed' to specify the PHY management type · 4cba5c21
      Stas Sergeev authored
      Currently the PHY management type is selected by the MAC driver arbitrarily.
      The decision is based on the presence of the "fixed-link" node and on the
      preference of the driver's authors.
      This recently caused a regression, when the mvneta driver suddenly started
      to use the in-band status for auto-negotiation on fixed links.
      It appears the auto-negotiation may not work when expected by the MAC driver.
      Sebastien Rannou explains:
      << Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's
      a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with
      inband status) is connected to the switch through QSGMII, and in this context
      we are on the media side of the PHY. >>
      https://lkml.org/lkml/2015/7/10/206
      
      This patch introduces the new string property 'managed' that allows
      the user to set the management type explicitly.
      The supported values are:
      "auto" - default. Uses either MDIO or nothing, depending on the presence
      of the fixed-link node
      "in-band-status" - use in-band status

      Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
      CC: Rob Herring <robh+dt@kernel.org>
      CC: Pawel Moll <pawel.moll@arm.com>
      CC: Mark Rutland <mark.rutland@arm.com>
      CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
      CC: Kumar Gala <galak@codeaurora.org>
      CC: Florian Fainelli <f.fainelli@gmail.com>
      CC: Grant Likely <grant.likely@linaro.org>
      CC: devicetree@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
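
The value handling described for the 'managed' property can be sketched roughly as follows. This is only an illustration: enum phy_mgmt, struct port_node, parse_managed() and port_uses_inband() are hypothetical names, not identifiers from the kernel sources, and treating an absent property like "auto" is an assumption based on "auto" being the documented default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for illustration; not the kernel's own names. */
enum phy_mgmt {
    PHY_MGMT_AUTO,           /* MDIO or nothing, depending on fixed-link */
    PHY_MGMT_IN_BAND_STATUS, /* link parameters come from in-band status */
    PHY_MGMT_INVALID         /* unrecognized property value */
};

/* Simplified stand-in for a port's device tree node. */
struct port_node {
    int has_fixed_link;  /* a "fixed-link" sub-node is present */
    const char *managed; /* value of 'managed', NULL when absent */
};

/* Map the 'managed' string to a management type.
 * Assumption: an absent property behaves like "auto". */
static enum phy_mgmt parse_managed(const char *value)
{
    if (value == NULL || strcmp(value, "auto") == 0)
        return PHY_MGMT_AUTO;
    if (strcmp(value, "in-band-status") == 0)
        return PHY_MGMT_IN_BAND_STATUS;
    return PHY_MGMT_INVALID;
}

/* In-band auto-negotiation is used only when explicitly requested. */
static int port_uses_inband(const struct port_node *np)
{
    assert(np != NULL);
    return parse_managed(np->managed) == PHY_MGMT_IN_BAND_STATUS;
}
```

With this shape, a fixed-link port that does not set 'managed' never enables in-band negotiation, which matches the default behavior the series restores.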
    • net: phy: fixed_phy: handle link-down case · 868a4215
      Stas Sergeev authored
      fixed_phy_register() currently hardcodes the fixed PHY link to 1, and
      expects to find a "speed" parameter to provide correct information
      towards the fixed PHY consumer.
      
      In a subsequent change, where we allow "managed" (e.g. (RS)GMII in-band
      status auto-negotiation) fixed PHYs, none of these parameters can be
      provided since they will be auto-negotiated. Hence, we just provide a
      zero-initialized fixed_phy_status to fixed_phy_register(), which makes it
      fail when we call fixed_phy_update_regs(): status.speed = 0 makes us
      hit the "default" label and error out.
      
      Without this change, we would also see potentially inconsistent
      speed/duplex parameters for fixed PHYs when the link is DOWN.
      
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
      [florian: add more background to why this is correct and desirable]
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
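
The link-down handling described above can be pictured with the following minimal sketch. It is not the driver code: the register bits, struct demo_phy_status and demo_update_regs() are hypothetical stand-ins, and the only point illustrated is that speed is validated solely while the link is up, so a zero-initialized status is accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register bits and names, for illustration only. */
#define DEMO_LINK_UP     0x01u
#define DEMO_FULL_DUPLEX 0x02u
#define DEMO_SPEED_10    0x10u
#define DEMO_SPEED_100   0x20u
#define DEMO_SPEED_1000  0x40u

struct demo_phy_status {
    int link;   /* 0 = down, 1 = up */
    int speed;  /* Mbit/s, only meaningful while the link is up */
    int duplex; /* 0 = half, 1 = full */
};

/* Returns 0 on success, -1 for an unsupported speed on an up link. */
static int demo_update_regs(const struct demo_phy_status *st,
                            unsigned int *reg)
{
    unsigned int val = 0;

    assert(st != NULL && reg != NULL);

    if (st->link) {
        val |= DEMO_LINK_UP;
        switch (st->speed) {
        case 1000: val |= DEMO_SPEED_1000; break;
        case 100:  val |= DEMO_SPEED_100;  break;
        case 10:   val |= DEMO_SPEED_10;   break;
        default:   return -1; /* speed is validated only when link is up */
        }
        if (st->duplex)
            val |= DEMO_FULL_DUPLEX;
    }

    /* Link down: no speed or duplex bits are reported at all. */
    *reg = val;
    return 0;
}
```

A zero-initialized status therefore succeeds and reports an empty register value, while an up link with speed 0 is still rejected.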
    • net: dsa: bcm_sf2: Do not override speed settings · d2eac98f
      Florian Fainelli authored
      The SF2 driver currently overrides speed settings for its port
      configured using a fixed PHY. This is both unnecessary and incorrect,
      because we keep feeding back to the hardware parameters that we read from
      the PHY device, which in the case of a fixed PHY cannot possibly change
      speed.

      This is a required change to let the fixed PHY code register a PHY
      with a link configured as DOWN by default and to avoid
      some sort of circular dependency where we require the link_update
      callback to run to program the hardware, and we then utilize the fixed
      PHY parameters to program the hardware with the same settings.

      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: #ifdefify sk_classid member of struct sock · e181a543
      Mathias Krause authored
      The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
      enabled. #ifdefify it to reduce the size of struct sock on 32 bit
      systems, at least.

      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'lwtunnel' · e69724f3
      David S. Miller authored
      Thomas Graf says:
      
      ====================
      Lightweight & flow based encapsulation
      
      This series combines the work previously posted by Roopa, Robert and
      myself. It's according to what we discussed at NFWS. The motivation
      of this series is to:
      
       * Consolidate code between OVS and the rest of the kernel and get
         rid of OVS vports and instead represent them as pure net_devices.
       * Introduce a lightweight tunneling mechanism which enables flow
         based encapsulation to improve scalability on both RX and TX.
       * Do the above in an encapsulation unspecific way so that the
         encapsulation type is eventually abstracted away from the user.
       * Use the same forwarding decision for both native forwarding and
         encapsulation thus allowing to switch between native IPv6 and
         UDP encapsulation based on endpoint without requiring additional
         logic
      
      The fundamental changes introduced in this series are:
       * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
         instructions. Depending on the specified type, the instructions
         apply to UDP encapsulations, MPLS and possibly others in the future.
       * Depending on the encapsulation type, the output function of the
         dst is directly overwritten or the dst merely attaches metadata and
         relies on a subsequent net_device to apply it to the packet. The
         latter is typically used if an inner and outer IP header exist which
         require two subsequent routing lookups to be performed.
       * A new metadata_dst structure which can be attached to skbs to
         carry metadata in between subsystems. This new metadata transport
         is used to provide a single interface for VXLAN, routing and OVS
         to communicate through metadata.
      
      The OVS interfaces remain as-is but will transparently create a real
      VXLAN net_device in the background. iproute2 is extended with new
      use cases:
      
        VXLAN:
        ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0
      
        MPLS:
        ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1
      
      Performance implications:
        The additional memory allocation in the receive path should have
        performance implications although it is not observable in standard
        throughput tests if GRO is properly done. The correct net_device
        model outweighs the additional cost of the allocation. Furthermore,
        this implication can be relaxed by reintroducing a direct unqueued
        path from a software device to a consumer like bridge or OVS if
        needed.
      
          $ netperf  -t TCP_STREAM -H 15.1.1.201
          MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
          15.1.1.201 (15.1.1.201) port 0 AF_INET : demo
          Recv   Send    Send
          Socket Socket  Message  Elapsed
          Size   Size    Size     Time     Throughput
          bytes  bytes   bytes    secs.    10^6bits/sec
      
           87380  16384  16384    10.00    9118.17
      
      Changes since v1:
       * Properly initialize tun_id as reported by Julian
       * Drop duplicate netif_keep_dst() as reported by Alexei
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Use regular VXLAN net_device device · 614732ea
      Thomas Graf authored
      This gets rid of all OVS specific VXLAN code in the receive and
      transmit path by using a VXLAN net_device to represent the vport.
      Only a small shim layer remains which takes care of handling the
      VXLAN specific OVS Netlink configuration.

      Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
      since they are no longer needed.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Abstract vport name through ovs_vport_name() · c9db965c
      Thomas Graf authored
      This allows getting rid of the get_name() vport ops later on.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Move dev pointer into vport itself · be4ace6e
      Thomas Graf authored
      This is the first step in representing all OVS vports as regular
      struct net_devices. Move the net_device pointer into the vport
      structure itself to get rid of struct vport_netdev.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Make tunnel set action attach a metadata dst · 34ae932a
      Thomas Graf authored
      Utilize the new metadata dst to attach encapsulation instructions to
      the skb. The existing egress_tun_info via the OVS_CB() is left in
      place until all tunnel vports have been converted to the new method.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Factor out device configuration · 0dfbdf41
      Thomas Graf authored
      This factors the device configuration out of the RTNL newlink
      API, which allows for in-kernel creation of VXLAN net_devices.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib: Add fib rule match on tunnel id · e7030878
      Thomas Graf authored
      This adds the ability to select a routing table based on the tunnel
      id, which allows maintaining separate routing tables for each virtual
      tunnel network.

      ip rule add from all tunnel-id 100 lookup 100
      ip rule add from all tunnel-id 200 lookup 200

      A new static key controls the collection of metadata at tunnel level
      upon demand.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
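
The table selection in the two `ip rule` examples can be modelled with a small sketch. Everything here is hypothetical (struct demo_rule, demo_rule_lookup(), the fallback constant); in particular, treating a tunnel id of 0 as "match any" and falling back to a fixed table when no rule matches are assumptions made for the illustration, not a description of the kernel's rule engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_MAIN 254 /* fallback table used by this sketch */

/* Hypothetical rule: select 'table' for packets carrying 'tun_id'. */
struct demo_rule {
    uint64_t tun_id; /* 0 means "match any tunnel id" (assumption) */
    int table;
};

/* Return the table of the first matching rule, else the fallback. */
static int demo_rule_lookup(const struct demo_rule *rules, size_t n,
                            uint64_t pkt_tun_id)
{
    size_t i;

    assert(rules != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (rules[i].tun_id != 0 && rules[i].tun_id != pkt_tun_id)
            continue; /* rule names a different tunnel id */
        return rules[i].table;
    }
    return DEMO_TABLE_MAIN;
}
```

With rules mirroring the two commands above, packets from tunnel id 100 and 200 resolve to tables 100 and 200 respectively, and anything else falls through to the fallback table.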
    • route: Per route IP tunnel metadata via lightweight tunnel · 3093fbe7
      Thomas Graf authored
      This introduces a new IP tunnel lightweight tunnel type which allows
      specifying IP tunnel instructions per route. Only IPv4 is supported
      at this point.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • route: Extend flow representation with tunnel key · 1b7179d3
      Thomas Graf authored
      Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to
      allow routes to match on tunnel metadata. For now, the tunnel id is
      added to flowi_tunnel which allows for routes to be bound to specific
      virtual tunnels.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Flow based tunneling · ee122c79
      Thomas Graf authored
      Allows putting a VXLAN device into a new flow-based mode in which
      skbs with an ip_tunnel_info dst metadata attached will be encapsulated
      according to the instructions stored in there, with the VXLAN device
      defaults taken into consideration.

      Similarly, on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
      set, the packet processing will populate an ip_tunnel_info struct for
      each packet received and attach it to the skb using the new metadata
      dst. The metadata structure will contain the outer header and tunnel
      header fields which have been stripped off. Layers further up in the
      stack such as routing, tc or netfilter can later match on these fields
      and perform forwarding. It is the responsibility of upper layers to
      ensure that the flag is set if the metadata is needed. The flag limits
      the additional cost of metadata collection to cases where it is needed.

      This prepares the VXLAN device to be steered by the routing and other
      subsystems, which allows supporting encapsulation for a large number
      of tunnel endpoints and tunnel ids through a single net_device, which
      improves scalability.

      It also allows for OVS to leverage this mode which in turn allows for
      the removal of the OVS specific VXLAN code.

      Because the skb is currently scrubbed in vxlan_rcv(), the attachment of
      the new dst metadata is postponed until after scrubbing, which requires
      the temporary addition of a new member to vxlan_metadata. This member
      is removed again in a later commit after the indirect VXLAN receive API
      has been removed.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
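
The receive-side behavior, where metadata is collected only when the flag is set, can be sketched as follows. This is a simplified model and not the vxlan driver: struct demo_pkt, struct demo_rx_md and demo_vxlan_rcv() are hypothetical, and DEMO_F_COLLECT_METADATA merely stands in for VXLAN_F_COLLECT_METADATA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stands in for VXLAN_F_COLLECT_METADATA; the value is arbitrary. */
#define DEMO_F_COLLECT_METADATA 0x1u

/* Hypothetical copy of the header fields stripped during decapsulation. */
struct demo_rx_md {
    uint32_t vni;
    uint32_t outer_src;
    uint32_t outer_dst;
};

/* Hypothetical packet: outer header fields plus optional metadata. */
struct demo_pkt {
    uint32_t vni, outer_src, outer_dst;
    int has_md;
    struct demo_rx_md md;
};

static void demo_vxlan_rcv(unsigned int dev_flags, struct demo_pkt *pkt)
{
    assert(pkt != NULL);

    /* Pay the cost of collecting metadata only on demand. */
    if (dev_flags & DEMO_F_COLLECT_METADATA) {
        pkt->md.vni = pkt->vni;
        pkt->md.outer_src = pkt->outer_src;
        pkt->md.outer_dst = pkt->outer_dst;
        pkt->has_md = 1;
    }

    /* Decapsulation: the outer header fields leave the packet. */
    pkt->vni = 0;
    pkt->outer_src = 0;
    pkt->outer_dst = 0;
}
```

Upper layers that want to match on the stripped fields must make sure the flag is set beforehand; otherwise the information is gone after decapsulation.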
    • arp: Inherit metadata dst when creating ARP requests · 0accfc26
      Thomas Graf authored
      If the output device wants to see the dst, inherit the dst of the
      original skb and pass it on to generate the ARP request.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dst: Metadata destinations · f38a9eb1
      Thomas Graf authored
      Introduces a new dst_metadata which enables carrying per-packet metadata
      between forwarding and processing elements via the skb->dst pointer.

      The structure is set up to be a union. Thus, each separate type of
      metadata requires its own dst instance. If demand arises to carry
      multiple types of metadata concurrently, metadata dst entries can be
      made stackable.

      The metadata dst entry is refcnt'ed as expected for now but a non
      reference counted use is possible if the reference is forced before
      queueing the skb.

      In order to allow allocating dsts with variable length, the existing
      dst_alloc() is split into a dst_alloc() and dst_init() function. The
      existing dst_init() function to initialize the subsystem is being
      renamed to dst_subsys_init() to make it clear what is what.

      The check before ip_route_input() is changed to ignore metadata dsts
      and drop the dst inside the routing function, thus allowing metadata
      to be interpreted in a later commit.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
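
The two structural ideas in this commit message, a union keyed by metadata type and a variable-length allocation, can be illustrated with the sketch below. The types and functions (struct demo_md_dst, demo_md_dst_alloc(), demo_md_tun()) are hypothetical and deliberately simplified; they do not reproduce the kernel's struct layout or its dst_alloc()/dst_init() split.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified types; not the kernel's struct layout. */
enum demo_md_type { DEMO_MD_NONE, DEMO_MD_IP_TUNNEL };

struct demo_tun_info {
    uint64_t tun_id;
    uint32_t remote_ip;
};

struct demo_md_dst {
    enum demo_md_type type; /* selects the valid union member */
    size_t optslen;         /* bytes available in opts[] */
    union {
        struct demo_tun_info tun;
    } u;
    unsigned char opts[];   /* variable-length tail, sized at allocation */
};

/* Allocate a zeroed metadata dst with room for optslen option bytes. */
static struct demo_md_dst *demo_md_dst_alloc(enum demo_md_type type,
                                             size_t optslen)
{
    struct demo_md_dst *md = calloc(1, sizeof(*md) + optslen);

    if (md == NULL)
        return NULL;
    md->type = type;
    md->optslen = optslen;
    return md;
}

/* Return the tunnel metadata, or NULL if this dst carries another type. */
static struct demo_tun_info *demo_md_tun(struct demo_md_dst *md)
{
    assert(md != NULL);
    return md->type == DEMO_MD_IP_TUNNEL ? &md->u.tun : NULL;
}
```

Because the payload is a union, one instance carries exactly one kind of metadata at a time, which is the limitation the commit message notes when it mentions making entries stackable later.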
    • icmp: Don't leak original dst into ip_route_input() · 773a69d6
      Thomas Graf authored
      ip_route_input() unconditionally overwrites the dst. Hide the original
      dst attached to the skb by calling skb_dst_set(skb, NULL) prior to
      ip_route_input().

      Reported-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic · 1d8fff90
      Thomas Graf authored
      Rename the tunnel metadata data structures currently internal to
      OVS and make them generic for use by all IP tunnels.

      Both structures are kernel internal and will stay that way. Their
      members are exposed to user space through individual Netlink
      attributes by OVS. It will therefore be possible to extend/modify
      these structures without affecting user ABI.

      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: ip tunnel support · e3e4712e
      Roopa Prabhu authored
      This implementation uses lwtunnel infrastructure to register
      hooks for mpls tunnel encaps.

      It picks cues from iptunnel_encaps infrastructure and previous
      mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman.

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: rt6_info output redirect to tunnel output · 74a0f2fe
      Roopa Prabhu authored
      This is similar to ipv4 redirect of dst output to lwtunnel
      output function for encapsulation and xmit.

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: redirect dst output to lwtunnel output · 8602a625
      Roopa Prabhu authored
      For input routes with tunnel encap state this patch redirects
      dst output functions to lwtunnel_output which later resolves to
      the corresponding lwtunnel output function.

      This has been tested to work with mpls ip tunnels.

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwtunnel: support dst output redirect function · ffce4196
      Roopa Prabhu authored
      This patch introduces the lwtunnel_output function to call the
      corresponding lwtunnel's output function to xmit the packet.

      It adds two variants, lwtunnel_output and lwtunnel_output6, for ipv4 and
      ipv6 respectively today. But this is subject to change when lwtstate will
      reside in dst or dst_metadata (as per upstream discussions).

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
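
The idea of a generic output function that resolves to the per-encapsulation handler can be sketched as a small dispatch table. All names below (demo_encap_ops, demo_encap_register(), demo_lwtunnel_output(), demo_mpls_output()) are hypothetical and only illustrate the dispatch pattern; they are not the lwtunnel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; only the dispatch idea follows the text. */
enum demo_encap_type { DEMO_ENCAP_NONE, DEMO_ENCAP_MPLS, DEMO_ENCAP_MAX };

struct demo_pkt {
    uint32_t mpls_label;
    int encapsulated;
};

/* Per-route encapsulation state. */
struct demo_lwt_state {
    enum demo_encap_type type;
    uint32_t label;
};

struct demo_encap_ops {
    int (*output)(const struct demo_lwt_state *lws, struct demo_pkt *pkt);
};

/* One slot per encapsulation type, filled in at registration time. */
static const struct demo_encap_ops *demo_ops[DEMO_ENCAP_MAX];

static int demo_encap_register(enum demo_encap_type type,
                               const struct demo_encap_ops *ops)
{
    if (type <= DEMO_ENCAP_NONE || type >= DEMO_ENCAP_MAX ||
        demo_ops[type] != NULL)
        return -1; /* invalid type or slot already taken */
    demo_ops[type] = ops;
    return 0;
}

/* Generic entry point: resolve to the handler for this route's encap. */
static int demo_lwtunnel_output(const struct demo_lwt_state *lws,
                                struct demo_pkt *pkt)
{
    assert(lws != NULL && pkt != NULL);

    if (lws->type <= DEMO_ENCAP_NONE || lws->type >= DEMO_ENCAP_MAX ||
        demo_ops[lws->type] == NULL)
        return -1; /* no handler registered for this type */
    return demo_ops[lws->type]->output(lws, pkt);
}

/* Example handler: push the route's label onto the packet. */
static int demo_mpls_output(const struct demo_lwt_state *lws,
                            struct demo_pkt *pkt)
{
    pkt->mpls_label = lws->label;
    pkt->encapsulated = 1;
    return 0;
}

static const struct demo_encap_ops demo_mpls_ops = { demo_mpls_output };
```

The route only stores the encapsulation state; which code actually transmits the packet is decided at output time through the table.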
    • ipv6: support for fib route lwtunnel encap attributes · 19e42e45
      Roopa Prabhu authored
      This patch adds support in ipv6 fib functions to parse Netlink
      RTA encap attributes and attach encap state data to rt6_info.

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: support for fib route lwtunnel encap attributes · 571e7226
      Roopa Prabhu authored
      This patch adds support in ipv4 fib functions to parse user
      provided encap attributes and attach encap state data to fib_nh
      and rtable.

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwtunnel: infrastructure for handling light weight tunnels like mpls · 499a2425
      Roopa Prabhu authored
      Provides infrastructure to parse/dump/store encap information for
      light weight tunnels like mpls. Encap information for such tunnels
      is associated with fib routes.

      This infrastructure is based on previous suggestions from
      Eric Biederman to follow the xfrm infrastructure.

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes · a0d9a860
      Roopa Prabhu authored
      This patch introduces two new RTA attributes to attach encap
      data to fib routes.

      Example iproute2 command to attach mpls encap data to ipv4 routes:

      $ ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1

      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>