1. 25 Feb, 2014 23 commits
    • vti: Update the ipv4 side to use its own receive hook. · df3893c1
      Steffen Klassert authored
      With this patch, vti uses the IPsec protocol multiplexer to
      register its own receive side hooks for ESP, AH and IPCOMP.
      
      Vti now does the following on receive side:
      
      1. Do an input policy check for the IPsec packet we received.
         This is required because this packet could already have been
         processed by IPsec, so an inbound policy check is needed.
      
      2. Mark the packet with the i_key. The policy and the state
         must match this key now. Policy and state belong to the outer
         namespace and policy enforcement is done at the further layers.
      
      3. Call the generic xfrm layer to do decryption and decapsulation.
      
      4. Wait for a callback from the xfrm layer to properly clean the
         skb so that it does not leak information across namespaces,
         and to update the device statistics.
      
      On transmit side:
      
      1. Mark the packet with the o_key. The policy and the state
         must match this key now.
      
      2. Do a xfrm_lookup on the original packet with the mark applied.
      
      3. Check if we got an IPsec route.
      
      4. Clean the skb to not leak information on namespace
         transitions.
      
      5. Attach the dst_entry we got from the xfrm_lookup to the skb.
      
      6. Call dst_output to do the IPsec processing.
      
      7. Do the device statistics.
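
The transmit-side steps above can be sketched as a small user-space model. This is only an illustration in Python, not the kernel code: `Packet`, `VtiModel`, the `policies` dictionary (a stand-in for xfrm_lookup) and the `netns` field are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    mark: int = 0
    netns: object = "inner"   # hypothetical namespace tag carried by the packet
    route: object = None      # stand-in for the attached dst_entry


@dataclass
class VtiModel:
    o_key: int
    policies: dict            # mark -> IPsec route name (stand-in for xfrm_lookup)
    tx_packets: int = 0
    tx_errors: int = 0
    sent: list = field(default_factory=list)

    def xmit(self, pkt: Packet) -> bool:
        pkt.mark = self.o_key                 # 1. mark the packet with the o_key
        route = self.policies.get(pkt.mark)   # 2. lookup with the mark applied
        if route is None:                     # 3. no IPsec route: count an error
            self.tx_errors += 1
            return False
        pkt.netns = None                      # 4. scrub namespace-related state
        pkt.route = route                     # 5. attach the looked-up route
        self.sent.append(pkt)                 # 6. stand-in for dst_output
        self.tx_packets += 1                  # 7. device statistics
        return True
```

A packet is only sent when a policy exists for the o_key mark; otherwise the model counts a transmit error, mirroring the "check if we got an IPsec route" step.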
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • ip_tunnel: Make vti work with i_key set · 6d608f06
      Steffen Klassert authored
      Vti uses the o_key to mark packets that were transmitted or received
      by a vti interface. Unfortunately we can't apply different marks
      to inbound and outbound packets with only one key available. Vti interfaces
      typically use wildcard selectors for vti IPsec policies. On forwarding,
      the same output policy will match for both directions. This generates
      a loop between the IPsec gateways until the ttl of the packet is
      exceeded.
      
      The gre i_key/o_key are usually there to find the right gre tunnel
      during a lookup. When vti uses the i_key to mark packets, the tunnel
      lookup does not work any more because vti does not use the gre keys
      as a hash key for the lookup.
      
      This patch works around this by not including the i_key when computing
      the hash for the tunnel lookup in case of vti tunnels.
      
      With this we have separate keys available for the transmitting and
      receiving side of the vti interface.
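
The idea of leaving the i_key out of the lookup hash for vti can be modelled as follows. This is a hedged Python sketch: `tunnel_hash`, `lookup_bucket`, the bucket count and the `is_vti` flag are illustrative assumptions, not the kernel's ip_tunnel functions.

```python
def tunnel_hash(remote: str, key: int, buckets: int = 128) -> int:
    """Toy hash over the remote address and a key (hypothetical helper)."""
    return hash((remote, key)) % buckets


def lookup_bucket(remote: str, i_key: int, is_vti: bool) -> int:
    # For vti the i_key is used as a packet mark, not as a tunnel identifier,
    # so it is left out of the hash. Other tunnel types keep hashing on it.
    hash_key = 0 if is_vti else i_key
    return tunnel_hash(remote, hash_key)
```

With this, two vti lookups for the same remote address land in the same bucket regardless of the i_key, so the tunnel is still found after the key is repurposed as a mark.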
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer · 70be6c91
      Steffen Klassert authored
      IPsec vti_rcv needs to remember the tunnel pointer so it can
      check it later in the vti_rcv_cb callback. So add
      this pointer to the IPsec common buffer, initialize
      it and check it to avoid transport state matching of
      a tunneled packet.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • ipcomp4: Use the IPsec protocol multiplexer API · d099160e
      Steffen Klassert authored
      Switch ipcomp4 to use the new IPsec protocol multiplexer.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • ah4: Use the IPsec protocol multiplexer API · e5b56454
      Steffen Klassert authored
      Switch ah4 to use the new IPsec protocol multiplexer.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • esp4: Use the IPsec protocol multiplexer API · 827789cb
      Steffen Klassert authored
      Switch esp4 to use the new IPsec protocol multiplexer.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm4: Add IPsec protocol multiplexer · 3328715e
      Steffen Klassert authored
      This patch adds an IPsec protocol multiplexer. With this
      it is possible to add alternative protocol handlers as
      needed for IPsec virtual tunnel interfaces.
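
A protocol multiplexer of this kind can be pictured as a per-protocol list of handlers that are tried in turn until one accepts the packet. The Python sketch below is a model under assumptions: the class `ProtocolMux`, the rule that a higher priority is tried first, and the use of `-EINVAL` to mean "not mine" are chosen for illustration and are not taken from the patch text.

```python
EINVAL = 22  # errno value, used here as the "handler declines" signal


class ProtocolMux:
    """Toy multiplexer: several handlers may register for one IP protocol."""

    def __init__(self):
        self._handlers = {}  # protocol number -> list of (priority, handler)

    def register(self, protocol, handler, priority):
        handlers = self._handlers.setdefault(protocol, [])
        handlers.append((priority, handler))
        # Assumption: handlers with a higher priority are consulted first.
        handlers.sort(key=lambda entry: entry[0], reverse=True)

    def receive(self, protocol, packet):
        for _, handler in self._handlers.get(protocol, []):
            result = handler(packet)
            if result != -EINVAL:   # first handler that accepts wins
                return result
        return -EINVAL              # simplification: nobody claimed the packet
```

For example, a vti-like handler registered for ESP (protocol 50) with a higher priority can claim packets that belong to a tunnel and decline the rest, which then fall through to the plain ESP handler.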
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • net: bcmgenet: remove unused bh_lock member · 51adfcc3
      Florian Fainelli authored
      bh_lock spinlock is unused, remove it from the private driver structure.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: remove commented code in bcmgenet_xmit() · da56bbf7
      Florian Fainelli authored
      This code is commented out since it is unused, a left-over from the very
      first time this driver was merged.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: drop checks on priv->phydev · 80d8e96d
      Florian Fainelli authored
      Drop all the checks on priv->phydev since we will refuse probing the
      driver if we cannot attach to a PHY device. This also fixes some smatch
      issues reported by Dan Carpenter.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gianfar' · 432c5b3a
      David S. Miller authored
      Claudiu Manoil says:
      
      ====================
      gianfar: Device reset and reconfig fixes
      
      These patches end up fixing some notable device reset & reconfig
      related problems.  One issue is on-the-fly (Rx/Tx on) programming
      of interrupt coalescing (IC) registers on the processing path,
      against HW recommendation.  This is an old issue that became visible
      after BQL introduction, as under certain conditions (low traffic)
      one TX interrupt gets lost and BQL fires Tx timeout as a result.
      Another notable issue is a race on the Tx path (xmit, clean_tx)
      during device reset (i.e. during Tx timeout watchdog firing)
      that leads to NULL access.
      Fixing the problematic on-the-fly register writes (i.e. the IC regs)
      required the implementation of a MAC soft reset procedure.
      The race leading to NULL access was addressed by fixing the
      stop_gfar()/startup_gfar() pair (disable/enable napi a.s.o.)
      and adding the device state DOWN to sync with the TX path.
      
      v2: Refactored if() clauses from gfar_set_features(), PATCH 2.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix Tx int miss, don't write IC on-the-fly · f19015ba
      Claudiu Manoil authored
      Programming the interrupt coalescing (IC) registers while
      the controller/DMA is on may incur the loss of one Tx
      confirmation interrupt, under certain conditions.  This is
      a subtle hw race because it does not occur during a burst
      of Tx packets.  It has been observed on p2020 devices that,
      if just one packet is being xmit'ed, the Tx confirmation
      doesn't trigger and BQL eventually blocks the Tx queues,
      followed by Tx timeout and an un-responsive device.
      This issue was not apparent prior to introducing BQL
      support, as a late Tx confirmation was not an issue back then
      and the next burst of Tx frames would have triggered the
      Tx confirmation/ Tx ring cleanup anyway.
      
      Bottom line, the hw specifications state that the IC registers
      should not be programmed while the Rx/Tx blocks (the DMA) are
      enabled. Furthermore, these registers are currently re-written
      with the same values on the processing path, over and over again.
      To fix this, rewriting the IC registers has been removed from
      the processing path (napi poll).  A complete MAC reset procedure
      has been implemented for the ethtool -c option instead, to
      reliably update these registers while the controller is stopped.
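
The rule "only program the IC registers while the controller is stopped" can be modelled like this. It is a hedged Python sketch, not driver code: `MacModel`, `write_ic` and `set_coalesce` are hypothetical names, and the real MAC reset procedure involves more steps than a single flag.

```python
class MacModel:
    """Toy MAC: IC registers may only be written while Rx/Tx (DMA) is off."""

    def __init__(self):
        self.dma_enabled = True
        self.ic = {"rxic": 0, "txic": 0}

    def write_ic(self, rxic, txic):
        if self.dma_enabled:
            raise RuntimeError("IC registers written while Rx/Tx is enabled")
        self.ic.update(rxic=rxic, txic=txic)

    def set_coalesce(self, rxic, txic):
        # Stand-in for the ethtool coalescing path: stop, reconfigure, restart.
        self.dma_enabled = False
        try:
            self.write_ic(rxic, txic)
        finally:
            self.dma_enabled = True
```

The processing path never calls `write_ic` directly in this model; only the reconfiguration path does, and only after stopping the controller.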
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix device reset races (oops) for Tx · 0851133b
      Claudiu Manoil authored
      The device reset procedure, stop_gfar()/startup_gfar(), has
      concurrency issues.
      "Kernel access of bad area" oopses show up during Tx timeout
      device reset or other reset cases (like changing MTU) that
      happen while the interface still has traffic. The oopses
      happen in start_xmit and clean_tx_ring when accessing tx_queue->
      tx_skbuff which is NULL. The race comes from de-allocating the
      tx_skbuff while transmission and napi processing are still
      active. Though the Tx queues get temporarily stopped when Tx
      timeout occurs, they get re-enabled as a result of Tx congestion
      handling inside the napi context (see clean_tx_ring()). Not
      disabling the napi during reset is also a bug, because
      clean_tx_ring() will try to access tx_skbuff while it is being
      de-alloc'ed and re-alloc'ed.
      
      To fix this, stop_gfar() needs to disable napi processing
      after stopping the Tx queues. However, in order to prevent
      clean_tx_ring() from re-enabling the Tx queue before the napi
      gets disabled, the device state DOWN has been introduced.
      It prevents the Tx congestion management from re-enabling the
      de-congested Tx queue while the device is brought down.
      An additional locking state, RESETTING, has been introduced
      to prevent simultaneous resets or to prevent configuring the
      device while it is resetting.
      The bogus 'rxlock's (for each Rx queue) have been removed since
      their purpose is not justified, as they neither prevent nor are
      suited to prevent device reset/reconfig races (such as this one).
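
The interplay of the DOWN and RESETTING states described above can be sketched in a small Python model. All names here (`DeviceModel`, `try_begin_reset`, `end_reset`, the string flags) are hypothetical, and a lock stands in for the atomic bit operations a driver would typically use.

```python
import threading


class DeviceModel:
    def __init__(self):
        self._lock = threading.Lock()
        self.state = set()            # may contain "DOWN" and/or "RESETTING"
        self.tx_queue_stopped = False
        self.napi_enabled = True

    def try_begin_reset(self) -> bool:
        with self._lock:              # stand-in for an atomic test-and-set
            if "RESETTING" in self.state:
                return False          # another reset or reconfig is running
            self.state.add("RESETTING")
            return True

    def end_reset(self):
        with self._lock:
            self.state.discard("RESETTING")

    def stop(self):
        self.state.add("DOWN")        # set first, so Tx cleanup cannot re-wake
        self.tx_queue_stopped = True
        self.napi_enabled = False     # napi disabled after the queues stop

    def start(self):
        self.napi_enabled = True
        self.tx_queue_stopped = False
        self.state.discard("DOWN")

    def clean_tx_ring(self):
        # Tx congestion handling: wake the queue only if the device is not DOWN.
        if self.tx_queue_stopped and "DOWN" not in self.state:
            self.tx_queue_stopped = False

    def reset(self) -> bool:
        if not self.try_begin_reset():
            return False
        try:
            self.stop()
            self.start()
        finally:
            self.end_reset()
        return True
```

In this model a `clean_tx_ring()` call that races with `stop()` leaves the queue stopped, and a second reset attempt is refused while the first one holds RESETTING.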
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Don't free/request irqs on device reset · 80ec396c
      Claudiu Manoil authored
      Resetting the device (stop_gfar()/startup_gfar()) should
      be fast and to the point, in order to timely recover
      from an error condition (like Tx timeout) or during
      device reconfig.  The irq free/ request routines are just
      redundant here, and they should be part of the device
      close/ open routines instead.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix on-the-fly vlan and mtu updates · 88302648
      Claudiu Manoil authored
      The RCTRL and TCTRL registers should not be changed
      on-the-fly, while the controller is running, otherwise
      unexpected behaviour occurs.  But that's exactly what
      gfar_vlan_mode() does, updating the VLAN acceleration
      bits inside RCTRL/TCTRL.  The attempt to lock these
      operations doesn't help, but only adds to the confusion.
      There's also a dependency for Rx FCB insertion (activating
      /de-activating the TOE offload block on Rx) which might
      change the required rx buffer size.  This makes matters
      worse as gfar_vlan_mode() ends up calling gfar_change_mtu(),
      though the MTU size remains the same.  Note that there are
      other situations that may affect the required rx buffer size,
      like changing RXCSUM or rx hw timestamping, but erroneously
      the rx buffer size is not recomputed/ updated in the process.
      
      To fix this, do the vlan updates properly inside the MAC
      reset and reconfiguration procedure, which takes care of
      the rx buffer size dependency and the rx TOE block (PRSDEP)
      activation/deactivation as well (in the correct order).
      As a consequence, MTU/ rx buff size updates are done now
      by the same MAC reset and reconfig procedure, so that out
      of context updates to MAXFRM, MRBLR, and MACCFG inside
      change_mtu() are no longer needed.  The rx buffer size
      dependency on Rx FCB is now handled for the other cases too
      (RXCSUM and rx hw timestamping).
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Implement MAC reset and reconfig procedure · a328ac92
      Claudiu Manoil authored
      The main MAC config registers like: RCTRL/TCTRL, MRBLR,
      MAXFRM, RXIC/TXIC, most fields of MACCFG1/2, should not
      be changed on-the-fly, but at least after stopping the
      DMA and disabling the Rx/Tx blocks and, for increased
      reliability, after a MAC soft reset.
      
      Implement a complete MAC soft reset and reconfig procedure
      following the latest HW advisories - gfar_mac_reset() - to
      replace gfar_mac_init() and (the confusing) init_registers()
      functions.
      
      Factor out separate config functions for RCTRL and TCTRL,
      ensure programming order of the relevant config regs after
      MAC soft reset.
      
      Split gfar_hw_init() into gfar_mac_reset() and the remaining
      global regs that don't need to be reconfigured after MAC soft
      reset (FIFOCFG, ATTRELI, HW counters a.s.o).
      
      As gfar_hw_init() now makes all the register writes @probe()
      time, based on all the device flags and config options, it
      must be moved further down, just before register_netdev(),
      as the last config step when the config values are committed
      to HW.  Also, move netif_carrier_off() after register_netdev(),
      because it has no effect if called before.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
    • net: bcmgenet: Use devm_ioremap_resource() · 5343a10d
      Fabio Estevam authored
      According to Documentation/driver-model/devres.txt, devm_request_and_ioremap()
      is deprecated, so use devm_ioremap_resource() instead.
      Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: netfilter: Use ether_addr_copy · 04091142
      Joe Perches authored
      Convert the uses of memcpy to ether_addr_copy because
      for some architectures it is smaller and faster.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Use ether_addr_copy and ETH_ALEN · e5a727f6
      Joe Perches authored
      Convert the more obvious uses of memcpy to ether_addr_copy.
      
      There are still uses of memcpy that could be converted but
      these addresses are __aligned(2).
      
      Convert a couple of uses of 6 in br_private.h to ETH_ALEN.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Stop using ethtool SPEED_* constants · e8b39015
      Ben Hutchings authored
      ethtool speed values are just numbers of megabits and there is no need
      to add SPEED_40000.  To be consistent, use integer constants directly
      for all speeds.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: bpf_dbg: various misc code cleanups · 7debf780
      Daniel Borkmann authored
      Let's clean up bpf_dbg a bit and improve its code slightly
      in various areas: i) Get rid of some macros as there's no
      good reason for keeping them, ii) remove one unused variable
      and reduce scope of various variables found by cppcheck,
      iii) Close non-default file descriptors when exiting the shell.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • loopback: sctp: add NETIF_F_SCTP_CSUM to device features · b17c7069
      Daniel Borkmann authored
      Drivers are allowed to set NETIF_F_SCTP_CSUM if they have
      hardware crc32c checksumming support for the SCTP protocol.
      Currently, NETIF_F_SCTP_CSUM flag is available in igb,
      ixgbe, i40e/i40evf drivers and for vlan devices.
      
      If we don't have NETIF_F_SCTP_CSUM then crc32c is done
      through CPU instructions, invoked from crypto layer, or
      if not available as slow-path fallback in software.
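
To give a feel for what the software slow path has to compute, here is a minimal bit-by-bit CRC32c in Python. It is an illustrative sketch only; the kernel uses its own table-driven or instruction-accelerated implementations, and the function name `crc32c` is chosen here for convenience.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c: eight shift/xor steps per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))  # standard check input, prints 0xe3069283
```

Every payload byte costs eight loop iterations in this form, which illustrates why skipping the checksum on loopback or offloading it to hardware pays off.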
      
      Currently, loopback device propagates checksum offloading
      feature flags in dev->features, but is missing SCTP checksum
      offloading. Therefore, account for NETIF_F_SCTP_CSUM as
      well.
      
      Before patch:
      
      ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
      SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
      4194304 4194304   4096    10.00    4683.50
      
      After patch:
      
      ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
      SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
      4194304 4194304   4096    10.00    15348.26
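
The relative gain from the two netperf runs above can be worked out directly. The snippet is plain arithmetic on the throughput figures quoted in the commit message; the variable names are illustrative.

```python
before_mbps = 4683.50    # throughput before the patch, 10^6 bits/sec
after_mbps = 15348.26    # throughput after the patch, 10^6 bits/sec

speedup = after_mbps / before_mbps
print(f"{speedup:.2f}x")  # prints 3.28x
```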
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Feb, 2014 17 commits