1. 04 Nov, 2013 39 commits
    • David S. Miller
      Merge branch 'mlx4' · 5a6e55c4
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      Mellanox driver updates
      
      This patch set from Jack Morgenstein does the following:
      
      1. Fix MAC/VLAN SRIOV implementation, and add wrapper functions for VLAN allocation
         and de-allocation (patches 1-6).
      
      2. Implement resource quotas when running under SRIOV (patches 7-10).
         Patch 7 is a small bug fix, and patches 8-10 implement the quotas.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      The series is against net-next commit ba486502 "ipv6: remove the unnecessary statement in find_match()"
      
      changes from V0:
       - dropped the 1st patch which needs to go to -stable and hence through net,
         not net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Implement resource quota enforcement · 146f3ef4
      Jack Morgenstein authored
      Implements resource quota grant decision when resources are requested,
      for the following resources:  QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs,
      and Counters.
      
      When granting a resource, the quota system increases the allocated-count
      for that slave.
      
      When the slave later frees the resource, its allocated-count is reduced.
      
      A spinlock is used to protect the integrity of each resource's free-pool counter.
      (One slave may be in the process of being granted a resource while another
      slave has crashed, initiating cleanup of that slave's resource quotas).
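      A minimal sketch of the grant/free bookkeeping, assuming the
      guaranteed-minimum plus free-pool scheme from the companion patch
      (5a0d0a61). struct quota_res, quota_grant() and quota_free() are
      hypothetical names, not the mlx4 API. The sketch is single-threaded;
      in the driver this accounting runs under the spinlock mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 4 /* hypothetical: PF + 3 VFs */

/* Hypothetical per-resource accounting, not the mlx4 structures. */
struct quota_res {
	int free_pool;              /* shared, first-come-first-served */
	int quota[MAX_FUNCS];       /* maximum each function may hold */
	int guaranteed[MAX_FUNCS];  /* reserved minimum per function */
	int allocated[MAX_FUNCS];   /* currently held by each function */
};

/* Units held by function f beyond its guarantee (charged to the pool). */
static int above_guarantee(const struct quota_res *r, int f, int alloc)
{
	int extra = alloc - r->guaranteed[f];

	return extra > 0 ? extra : 0;
}

/* Grant 'count' units to function f if its quota and the pool allow. */
static bool quota_grant(struct quota_res *r, int f, int count)
{
	int new_alloc = r->allocated[f] + count;
	int from_pool;

	assert(f >= 0 && f < MAX_FUNCS && count > 0);
	if (new_alloc > r->quota[f])
		return false;   /* would exceed this function's quota */

	from_pool = above_guarantee(r, f, new_alloc) -
		    above_guarantee(r, f, r->allocated[f]);
	if (from_pool > r->free_pool)
		return false;   /* free pool exhausted */

	r->free_pool -= from_pool;
	r->allocated[f] = new_alloc;
	return true;
}

/* Release 'count' units; the part above the guarantee returns to the pool. */
static void quota_free(struct quota_res *r, int f, int count)
{
	int new_alloc = r->allocated[f] - count;

	assert(f >= 0 && f < MAX_FUNCS && count > 0 && new_alloc >= 0);
	r->free_pool += above_guarantee(r, f, r->allocated[f]) -
			above_guarantee(r, f, new_alloc);
	r->allocated[f] = new_alloc;
}
```

      Only units a slave holds beyond its guaranteed minimum are charged to
      the shared pool, so freeing them makes them available to other slaves.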
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapper · eb456a68
      Jack Morgenstein authored
      In current kernels, the mlx4 driver running on a VM does not
      differentiate between max resource numbers for the HCA and
      max quotas -- it simply takes the quota values passed to it
      as max-resource values.
      
      However, the driver actually requires the VFs to be aware of
      the actual number of resources that the HCA was initialized with,
      for QPs, CQs, SRQs and MPTs.
      
      For QPs, CQs and SRQs, the reason is that in completion handling
      the driver must know which of the 24 bits are the actual resource
      number, and which are "padding" bits.
      
      For MPTs, also, the driver assumes knowledge of the number of MPTs
      in the system.
      
      The previous commit fixes the quota logic on the VM for the quota values
      passed to it by QUERY_FUNC_CAPS.
      
      For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
      from QUERY_HCA (and not QUERY_FUNC_CAPS).  The quotas passed
      in QUERY_FUNC_CAPS are used to report max resource number values
      in the response to ib_query_device.
      
      However, the Hypervisor driver must consider that VMs
      may be running previous kernels, and compatibility must be preserved.
      
      To resolve the incompatibility with previous kernels running on VMs,
      we deprecated the quota fields in mlx4_QUERY_FUNC_CAP.  In the
      deprecated fields, we pass the max-resource values from INIT_HCA.
      
      The quota fields are moved to a new location, and the current kernel
      driver takes the proper values from that location. There is
      also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
      if this flag is set, the (VM) driver takes the quota values from the
      new location.
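      A sketch of the VM-side choice, assuming only what is stated above
      (a flag in dword 0, bit 28). struct func_cap_resp and its fields are
      hypothetical; the real mailbox offsets are defined in the mlx4 source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag described above: dword 0, bit 28 of the QUERY_FUNC_CAP mailbox. */
#define QUOTAS_IN_NEW_LOCATION (1u << 28)

/* Hypothetical decoded view of the response; field names are made up. */
struct func_cap_resp {
	uint32_t dword0;
	uint32_t legacy_qp_field; /* old hypervisor: quota; new one: HCA max */
	uint32_t new_qp_quota;    /* quota at the new location */
};

/* QP limit a VM driver would report via ib_query_device. */
static uint32_t vm_qp_quota(const struct func_cap_resp *r)
{
	assert(r != NULL);
	if (r->dword0 & QUOTAS_IN_NEW_LOCATION)
		return r->new_qp_quota;
	/* Old hypervisor: the legacy field still carries the quota. */
	return r->legacy_qp_field;
}
```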
      
      VMs running previous kernels will work properly, except that the max resource
      numbers reported in ib_query_device for these resources will be
      too high.  The Hypervisor driver will, however, enforce the quotas
      for these VMs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61
      Jack Morgenstein authored
      This is step #1 for implementing SRIOV resource quotas for VFs.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                                   VLAN, and Counters.
      
      The quota system works as follows:
      Each entity (VF or PF) is given a max number of a given resource (its quota),
      and a guaranteed minimum number for each resource (starvation prevention).
      
      For QPs, CQs, SRQs, MPTs and MTTs:
      50% of the available quantity for the resource is divided equally among
      the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
      parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
      The other 50% is the "free pool", allocated on a first-come-first-served basis.
      For each VF/PF, resources are first allocated from its "guaranteed-minimum"
      pool. When that pool is exhausted, the driver attempts to allocate from
      the resource "free-pool".
      
      The quota (i.e., max) for the VFs and the PF is:
        The free-pool amount (50% of the real max) + the guaranteed minimum
      
      For MACs:
        Guarantee 2 MACs per VF/PF per port. As a result, since we have only
        128 MACs per port, reduce the allowable number of VFs from 64 to 63.
        Any remaining MACs are put into a free pool.
      
      For VLANs:
        For the PF, the per-port quota is 128 and guarantee is 64
           (to allow the PF to register at least a VLAN per VF in VST mode).
        For the VFs, the per-port quota is 64 and the guarantee is 0.
            We assume that VGT VFs are trusted not to abuse the VLAN resource.
      
      For Counters:
        For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
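      The arithmetic above can be sketched as follows, assuming integer
      division and num_funcs = num_vfs + 1 (the PF plus the active VFs).
      split_resource() and max_vfs_for_macs() are hypothetical helpers.
      The MAC helper reproduces the 63-VF limit: 2 guaranteed MACs for each
      of 64 VFs plus the PF would need 130 MACs, more than the 128 per port.

```c
#include <assert.h>

/* Hypothetical result of splitting one resource among the functions. */
struct quota_split {
	int guaranteed; /* per-function guaranteed minimum */
	int free_pool;  /* shared first-come-first-served pool */
	int quota;      /* per-function maximum */
};

static struct quota_split split_resource(int hca_max, int num_vfs)
{
	struct quota_split s;
	int num_funcs = num_vfs + 1;     /* PF + active VFs */
	int half = hca_max / 2;

	assert(hca_max > 0 && num_vfs >= 0);
	s.guaranteed = half / num_funcs; /* 50% divided equally */
	s.free_pool = half;              /* the other 50% */
	s.quota = s.free_pool + s.guaranteed;
	return s;
}

/* Largest VF count such that every function gets 'per_func' MACs. */
static int max_vfs_for_macs(int macs_per_port, int per_func)
{
	assert(per_func > 0);
	return macs_per_port / per_func - 1; /* one share goes to the PF */
}
```

      For example, with a hypothetical HCA maximum of 131072 QPs and 63 VFs,
      each function is guaranteed 1024 QPs and may hold up to 66560.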
      
      In this patch, we define the needed structures, which are added to the
      resource-tracker struct.  In addition, we do initialization
      for the resource quota, and adjust the query_device response to use quotas
      rather than resource maxima.
      
      As part of the implementation, we introduce a new field in
      mlx4_dev: quotas.  This field holds the resource quotas used
      to report maxima to the upper layers (ib_core, via query_device).
      
      The HCA maxima of these values are passed to the VFs (via
      QUERY_HCA) so that they may continue to use these in handling
      QPs, CQs, SRQs and MPTs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Fix checking order in MR table init · a30f1bc5
      Jack Morgenstein authored
      In procedure mlx4_init_mr_table(), slaves should do no processing,
      but should return success. This initialization is hypervisor-only.
      
      However, the check for num_mpts being a power-of-2 was performed
      before the check to return immediately if the driver is for a slave.
      This resulted in spurious failures.
      
      The order of performing the checks is reversed, so that if the
      driver is for a slave, no processing is done and success is returned.
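      The corrected ordering as a userland sketch. struct dev_info and
      init_mr_table() are hypothetical stand-ins for the mlx4 device and
      mlx4_init_mr_table(), and -EINVAL is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state relevant to MR table init. */
struct dev_info {
	bool is_slave;          /* driver instance runs on a VF (guest) */
	unsigned int num_mpts;
};

static bool is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static int init_mr_table(const struct dev_info *dev)
{
	assert(dev != NULL);
	/* Slave check first: slaves do no processing and always succeed. */
	if (dev->is_slave)
		return 0;
	/* Only the hypervisor validates num_mpts. */
	if (!is_power_of_2(dev->num_mpts))
		return -EINVAL;
	/* ... hypervisor-only table setup would go here ... */
	return 0;
}
```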
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Don't fail reg/unreg vlan for older guests · 2c957ff2
      Jack Morgenstein authored
      In upstream kernels under SRIOV, the vlan register/unregister calls
      were NOPs (doing nothing and returning OK). We detect these old
      calls from guests (via the comm channel), since previously the
      port number in mlx4_register_vlan was passed (improperly) in the
      out_param. This has been corrected so that the port number is now
      passed in bits 8..15 of the in_modifier field.
      
      For old calls, these bits will be zero, so if the passed port
      number is zero, we can still look at the out_param field to see
      if it contains a valid port number. If yes, the VM is running
      an old driver.
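      A sketch of the detection logic described above. classify_vlan_call()
      and MAX_PORTS are hypothetical. The two-port limit is an assumption,
      used only to decide what counts as a valid port number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 2 /* assumption: used only to validate port numbers */

enum vlan_caller { CALLER_NEW, CALLER_OLD, CALLER_INVALID };

/* Classify a reg/unreg vlan request arriving over the comm channel. */
static enum vlan_caller classify_vlan_call(uint32_t in_mod, uint64_t out_param,
					   unsigned int *port)
{
	unsigned int p = (in_mod >> 8) & 0xff; /* bits 8..15 */

	assert(port != NULL);
	if (p != 0) {
		if (p > MAX_PORTS)
			return CALLER_INVALID;
		*port = p;
		return CALLER_NEW;
	}
	/* Old drivers left bits 8..15 zero and put the port in out_param. */
	if (out_param >= 1 && out_param <= MAX_PORTS) {
		*port = (unsigned int)out_param;
		return CALLER_OLD; /* keep reg/unreg as a NOP for these guests */
	}
	return CALLER_INVALID;
}
```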
      
      Since the register/unregister_vlan wrappers were NOPs for old drivers,
      we continue this policy. The reason is that upstream had an additional
      bug in the eth driver running on guests, where procedure
      mlx4_en_vlan_rx_kill_vid() had the following code:
      
      if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx))
              mlx4_unregister_vlan(mdev->dev, priv->port, idx);
      else
              en_err(priv, "could not find vid %d in cache\n", vid);
      
      On a VM, mlx4_find_cached_vlan() will always fail, since the
      vlan cache is located on the Hypervisor; on guests it is empty.
      
      Therefore, if we allow upstream guests to register vlans, we will
      have vlan leakage since the unregister will never be performed.
      Leaving vlan reg/unreg for old guest drivers as a NOP is not a
      feature regression, since in upstream the register/unregister
      vlan wrapper is a NOP.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Resource tracker for reg/unreg vlans · 4874080d
      Jack Morgenstein authored
      Add resource tracker support for reg/unreg vlans calls done by VFs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_en: Use vlan id instead of vlan index for unregistration · 2009d005
      Jack Morgenstein authored
      Use of vlan_index created problems unregistering vlans on guests.
      
      In addition, tools delete VLANs by tag, not by index; let's follow that.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Fix reg/unreg vlan/mac to conform to the firmware spec · acddd5dd
      Jack Morgenstein authored
      The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac,
      mlx4_unregister_mac all made illegal use of the out_param in multifunc mode
      to pass the port number. The firmware spec specifies that the port number
      should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and
      FREE_RES (sections 20.15.1 and 20.15.2).
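      A sketch of the corrected encoding. It assumes the resource type
      occupies the low byte of the input modifier, which is an assumption:
      the text above only specifies bits 8..15 for the port. make_in_mod(),
      in_mod_port() and the RES_*_HYP codes are hypothetical. Port 0 is
      rejected because a zero in bits 8..15 is what identifies an old guest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical resource-type codes for the low byte. */
enum { RES_MAC_HYP = 1, RES_VLAN_HYP = 2 };

/* Build an ALLOC_RES/FREE_RES input modifier with the port in bits 8..15. */
static uint32_t make_in_mod(uint8_t port, uint8_t res_type)
{
	assert(port != 0); /* 0 in bits 8..15 marks an old guest */
	return ((uint32_t)port << 8) | res_type;
}

/* Recover the port number from bits 8..15. */
static uint8_t in_mod_port(uint32_t in_mod)
{
	return (uint8_t)((in_mod >> 8) & 0xff);
}
```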
      
      For MAC register/unregister, this patch contains workarounds so that guests
      running previous kernels continue to work on a new Hypervisor, and guests
      running the new kernel will continue to work on old hypervisors.
      
      VLAN registration capability is still not operational in multifunction mode,
      since the vlan wrapper functions are not implemented in this patch.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jack Morgenstein
      net/mlx4_core: Fix register/unreg vlan flow · 162226a1
      Jack Morgenstein authored
      The reg/unreg vlan code was broken:
      
      1. a wrapped function called another wrapped function, causing a deadlock.
      
      2. unregister_vlan called cmd_box instead of cmd_box_imm, leading to
         incorrectly passed parameters.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sergei Shtylyov
      sh_eth: check platform data pointer · 3b4c5cbf
      Sergei Shtylyov authored
      Check the platform data pointer before dereferencing it and error out of the
      probe() method if it's NULL.
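      The check follows a common probe pattern, sketched here in userland with
      hypothetical struct pdev and struct eth_pdata types. The -EINVAL
      return value is an assumption and is not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a platform device and its platform data. */
struct eth_pdata { int phy_id; };
struct pdev { struct eth_pdata *platform_data; };

static int eth_probe(struct pdev *pdev)
{
	struct eth_pdata *pd;

	assert(pdev != NULL);
	pd = pdev->platform_data;
	if (!pd)
		return -EINVAL; /* error out instead of dereferencing NULL */

	/* ... from here on it is safe to read pd->phy_id ... */
	return 0;
}
```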
      
      This has the additional effect of preventing a kernel oops with outdated platform
      data that contains a zero PHY address instead (such as on SolutionEngine7710).
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Acked-by: Simon Horman <horms+renesas@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'usbnet' · 6f9fed0b
      David S. Miller authored
      Bjørn Mork says:
      
      ====================
      cdc_mbim + qmi_wwan trivial fixes
      
      This series fixes three problems Oliver pointed out during the
      review of the new huawei_cdc_ncm driver:
      http://patchwork.ozlabs.org/patch/278903/
      
      That innocent driver only used cdc_mbim as a blueprint, and
      all the blame should really have gone to me....
      
      I do have a similar fix for the manage_power issue in the
      cdc-wdm USB class driver as well.  It will be submitted to
      linux-usb as soon as Greg opens up his mailbox again :-)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Bjørn Mork
      net: cdc_mbim: fixup error return value · e62416e8
      Bjørn Mork authored
      Reported-by: Oliver Neukum <oneukum@suse.de>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Bjørn Mork
    • Bjørn Mork
    • Bjørn Mork
    • Bjørn Mork
    • David S. Miller
      Merge branch 'qlcnic' · 96635fbd
      David S. Miller authored
      Himanshu Madhani says:
      
      ====================
      qlcnic: Multiple Tx queue support and code refactoring
      
      This patch series contains the following changes:
      
      o Refactored code to calculate, validate and assign Tx/SDS rings for various modes of driver.
      o Enhanced ethtool statistics for multi Tx queue on all supported adapters.
      o Enable multiple Tx queue for 83xx and 84xx Series adapters.
      o Register netdev for failed device state.
      
      changes from v1 -> v2
      o Dropped patch to replace inappropriate usage of kzalloc() with vzalloc().
      
      Please apply to net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Himanshu Madhani
      db62d7d9
    • Himanshu Madhani
      qlcnic: Enable multiple Tx queue support for 83xx/84xx Series adapters. · 18afc102
      Himanshu Madhani authored
      o 83xx and 84xx firmware is capable of multiple Tx queues.
        This patch will enable multiple Tx queues for 83xx/84xx
        series adapters. Max number of Tx queues supported will be 8.
      Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Himanshu Madhani
      qlcnic: refactor Tx/SDS ring calculation and validation in driver. · 34e8c406
      Himanshu Madhani authored
      o The current driver has duplicate code for validating user input
        for changing Tx/SDS rings using the set_channel ethtool interface.
        This patch removes the duplicate code and refactors Tx/SDS ring
        validation for 82xx/83xx/84xx series adapters.
      o The refactored code now calculates the maximum number of Tx/Rx rings
        the driver can support based on Default, NPAR and SRIOV PF/VF mode.
      Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Himanshu Madhani
      qlcnic: Enhance ethtool Statistics for Multiple Tx queue. · f27c75b3
      Himanshu Madhani authored
      o Enhance ethtool statistics to display multiple Tx queue stats for
        all supported adapters.
      Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sucheta Chakraborty
      qlcnic: Register netdev in FAILED state for 83xx/84xx · 78ea2d97
      Sucheta Chakraborty authored
      o Instead of failing probe, register the netdev when the device is in FAILED state.
      o Device will come up with minimum functionality and allow diagnostics and
        repair of the adapter.
      Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      lib: crc32: reduce number of cases for crc32{, c}_combine · 16514839
      Daniel Borkmann authored
      We can safely reduce the number of test cases by a tenth.
      There is no particular need to run as many as we currently
      run for crc32{,c}_combine; this still leaves ~8000 tests
      when people run kernels with crc selftests enabled, which
      is perfectly fine.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      lib: crc32: conditionally resched when running testcases · cc0ac199
      Daniel Borkmann authored
      Fengguang reports that when crc32 selftests are running on startup, on
      some systems (e.g. 32-bit ones), we can get a CPU stall like "INFO: rcu_sched
      self-detected stall on CPU { 0} (t=2101 jiffies g=4294967081 c=4294967080
      q=41)". As this is not intended, add a cond_resched() at the end of a
      test case to fix it. Introduced by efba721f ("lib: crc32: add test cases
      for crc32{, c}_combine routines").
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      net: checksum: fix warning in skb_checksum · cea80ea8
      Daniel Borkmann authored
      This patch fixes a build warning in skb_checksum() by wrapping the
      csum_partial() usage in skb_checksum(). The problem is that on a few
      architectures, csum_partial is declared with the asmlinkage prefix, whereas
      on most architectures it's not. So fix this up generically as we did
      with csum_block_add_ext() to match the signature. Introduced by
      2817a336 ("net: skb_checksum: allow custom update/combine for
      walking skb").
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 394efd19
      David S. Miller authored
      Conflicts:
      	drivers/net/ethernet/emulex/benet/be.h
      	drivers/net/netconsole.c
      	net/bridge/br_private.h
      
      Three mostly trivial conflicts.
      
      The net/bridge/br_private.h conflict was a function signature (argument
      addition) change overlapping with the extern removals from Joe Perches.
      
      In drivers/net/netconsole.c we had one change adjusting a printk message
      whilst another changed "printk(KERN_INFO" into "pr_info(".
      
      Lastly, the emulex change was a new inline function addition overlapping
      with Joe Perches's extern removals.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linus Torvalds
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · be408cd3
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "I'm sending a pull request of these lingering bug fixes for networking
        before the normal merge window material because some of this stuff I'd
        like to get to -stable ASAP"
      
       1) cxgb3 stopped working on 32-bit machines, fix from Ben Hutchings.
      
       2) Structures passed via netlink for netfilter logging are not fully
          initialized.  From Mathias Krause.
      
       3) Properly unlink upper openvswitch device during notifications, from
          Alexei Starovoitov.
      
       4) Fix race conditions involving access to the IP compression scratch
          buffer, from Michal Kubrecek.
      
       5) We don't handle the expiration of MTU information contained in ipv6
          routes sometimes, fix from Hannes Frederic Sowa.
      
       6) With Fast Open we can miscompute the TCP SYN/ACK RTT, from Yuchung
          Cheng.
      
       7) Don't take TCP RTT sample when an ACK doesn't acknowledge new data,
          also from Yuchung Cheng.
      
       8) The decreased IPSEC garbage collection threshold causes problems for
          some people, bump it back up.  From Steffen Klassert.
      
       9) Fix skb->truesize calculated by tcp_tso_segment(), from Eric
          Dumazet.
      
      10) flow_dissector doesn't validate packet lengths sufficiently, from
          Jason Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
        net/mlx4_core: Fix call to __mlx4_unregister_mac
        net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb
        net: flow_dissector: fail on evil iph->ihl
        xfrm: Fix null pointer dereference when decoding sessions
        can: kvaser_usb: fix usb endpoints detection
        can: c_can: Fix RX message handling, handle lost message before EOB
        doc:net: Fix typo in Documentation/networking
        bgmac: don't update slot on skb alloc/dma mapping error
        ibm emac: Fix locking for enable/disable eob irq
        ibm emac: Don't call napi_complete if napi_reschedule failed
        virtio-net: correctly handle cpu hotplug notifier during resuming
        bridge: pass correct vlan id to multicast code
        net: x25: Fix dead URLs in Kconfig
        netfilter: xt_NFQUEUE: fix --queue-bypass regression
        xen-netback: use jiffies_64 value to calculate credit timeout
        cxgb3: Fix length calculation in write_ofld_wr() on 32-bit architectures
        bnx2x: Disable VF access on PF removal
        bnx2x: prevent FW assert on low mem during unload
        tcp: gso: fix truesize tracking
        xfrm: Increase the garbage collector threshold
        ...
    • Jack Morgenstein
      net/mlx4_core: Fix call to __mlx4_unregister_mac · c32b7dfb
      Jack Morgenstein authored
      In function mlx4_master_deactivate_admin_state() __mlx4_unregister_mac was
      called using the MAC index. It should be called with the value of the MAC itself.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'fixes-for-3.12' of git://gitorious.org/linux-can/linux-can · e9b51a19
      David S. Miller authored
      Marc Kleine-Budde says:
      
      ====================
      I have two late fixes for the v3.12 release:
      
      The first patch fixes a problem in the c_can's RX message handling, which can
      lead to an endless interrupt loop under heavy load if messages are lost. The
      second patch is by Olivier Sobrie and fixes the endpoint detection of the
      kvaser_usb driver, which is needed for some devices.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb · 7926c1d5
      Daniel Borkmann authored
      With f9e42b85 ("net: sctp: sideeffect: throw BUG if
      primary_path is NULL"), we intended to find a buggy assoc that's
      part of the assoc hash table with a primary_path that is NULL.
      However, we had better remove the BUG_ON for now and find a more
      suitable place to assert for these things, as Mark reports that
      this also triggers the bug when duplicate cookie processing
      happens and the assoc is not part of the hash table (which is
      fine in this case). Such a situation can, for example, easily be
      reproduced by:
      
        tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
        tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
        tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
                  protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2
      
      This drops 20% of COOKIE-ACK packets. After some follow-up
      discussion with Vlad, we came to the conclusion that for now we
      should still remove this BUG_ON() assertion and come up with
      two follow-ups later on: i) find a more suitable place for
      this assertion, and possibly ii) have a special
      allocator/initializer for such temporary assocs.
      Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com>
      Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Arvid Brodin
      net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) · f421436a
      Arvid Brodin authored
      High-availability Seamless Redundancy ("HSR") provides instant failover
      redundancy for Ethernet networks. It requires a special network topology where
      all nodes are connected in a ring (each node having two physical network
      interfaces). It is suited for applications that demand high availability and
      very short reaction time.
      
      HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
      send special HSR frames in both directions over the ring. The driver creates
      virtual network interfaces that can be used just like any ordinary Linux
      network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
      must be HSR capable.
      
      This code is a "best effort" to comply with the HSR standard as described in
      IEC 62439-3:2010 (HSRv0).
      Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      net: extend net_device allocation to vmalloc() · 74d332c1
      Eric Dumazet authored
      Joby Poriyath provided a xen-netback patch to reduce the size of
      xenvif structure as some netdev allocation could fail under
      memory pressure/fragmentation.
      
      This patch handles the problem at the core level, allowing
      any netdev structure to use vmalloc() if kmalloc() fails.
      
      As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
      to kzalloc() flags to do this fallback only when really needed.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'sctp_csum' · b397f999
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      SCTP fix/updates
      
      Please see patch 5 for the main description/motivation, the rest just
      brings in the needed functionality for that. Although this is actually
      a fix, I've based it against net-next as some additional work for
      fixing it was needed.
      ====================
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      net: sctp: fix and consolidate SCTP checksumming code · e6d8b64b
      Daniel Borkmann authored
      This fixes an outstanding bug found through IPVS, where SCTP packets
      with skb->data_len > 0 (non-linearized) and empty frag_list, but data
      accumulated in frags[] member, are forwarded with incorrect checksum
      letting SCTP initial handshake fail on some systems. Linearizing each
      SCTP skb in IPVS to prevent that would not be a good solution as
      this leads to an additional and unnecessary performance penalty on
      the load-balancer itself for no good reason (as we actually only want
      to update the checksum, and can do that in a different/better way
      presented here).
      
      The actual problem is elsewhere, namely, that SCTP's checksumming
      in sctp_compute_cksum() does not take frags[] into account like
      skb_checksum() does. So while we are fixing this up, we better reuse
      the existing code that we have anyway in __skb_checksum() and use it
      for walking through the data doing checksumming. This will not only
      fix this issue, but also consolidates some SCTP code with core
      sk_buff code, bringing the two closer together and avoiding a
      reimplementation of skb_checksum() for no good reason.
      
      As crc32c() can use hardware implementation within the crypto layer,
      we leave that intact (it wraps around / falls back to e.g. slice-by-8
      algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
      combinator for crc32c blocks.
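      To make the frags[] issue concrete, the sketch below chains a plain
      bitwise CRC32c (Castagnoli polynomial, reflected 0x82F63B78) over
      every fragment of a hypothetical struct frag list. It is not the
      kernel's crc32c() or __crc32c_le(). It only shows that each fragment
      must be fed to the checksum, not just the linear part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c with pre/post inversion, so calls can be chained. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical fragment descriptor: linear area first, then frags[]. */
struct frag {
	const unsigned char *data;
	size_t len;
};

/* Checksum every fragment in order, not only the first one. */
static uint32_t crc32c_frags(const struct frag *frags, size_t n)
{
	uint32_t crc = 0;

	assert(frags != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		crc = crc32c_update(crc, frags[i].data, frags[i].len);
	return crc;
}
```

      Sequential chaining like this works when fragments are visited in
      order. A combinator such as __crc32c_le_combine() is what allows
      separately computed block CRCs to be merged.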
      
      Also, we remove all other SCTP checksumming code, so that we only
      have to use sctp_compute_cksum() from now on; for doing that, we need
      to transform SCTP checksumming in the output path slightly, and can leave
      the rest intact.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      net: skb_checksum: allow custom update/combine for walking skb · 2817a336
      Daniel Borkmann authored
      Currently, skb_checksum walks over 1) linearized, 2) frags[], and
      3) frag_list data and calculates the one's complement, a 32 bit
      result suitable for feeding into itself or csum_tcpudp_magic(),
      but unsuitable for SCTP as we're calculating CRC32c there.
      
      Hence, in order to not re-implement the very same function in
      SCTP (and maybe other protocols) over and over again, use an
      update() + combine() callback internally to allow for walking
      over the skb with different algorithms.
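      A hypothetical sketch of the update()/combine() split: a walker takes
      a callback table, computes each chained segment from scratch, and
      merges it with combine(). For brevity it plugs in a toy multiplicative
      hash (h = h * 31 + byte) instead of the one's complement sum or
      CRC32c, because its combine step is short. struct csum_ops and walk()
      are not the kernel's names or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table in the spirit of update() + combine(). */
struct csum_ops {
	/* fold 'len' bytes into the running value 'seed' */
	uint32_t (*update)(const unsigned char *buf, size_t len, uint32_t seed);
	/* append independently computed 'b' (over len_b bytes) after 'a' */
	uint32_t (*combine)(uint32_t a, uint32_t b, size_t len_b);
};

/* Toy algorithm, mod 2^32; not a network checksum. */
static uint32_t poly_update(const unsigned char *buf, size_t len, uint32_t seed)
{
	while (len--)
		seed = seed * 31u + *buf++;
	return seed;
}

static uint32_t poly_combine(uint32_t a, uint32_t b, size_t len_b)
{
	while (len_b--)
		a *= 31u; /* shift 'a' past the len_b bytes of the second part */
	return a + b;
}

/* Walk a linear head, then chained segments (like frag_list). */
static uint32_t walk(const struct csum_ops *ops,
		     const unsigned char *head, size_t head_len,
		     const unsigned char *const *segs, const size_t *seg_lens,
		     size_t nsegs)
{
	uint32_t v;

	assert(ops != NULL && ops->update != NULL && ops->combine != NULL);
	v = ops->update(head, head_len, 0);
	for (size_t i = 0; i < nsegs; i++) {
		uint32_t s = ops->update(segs[i], seg_lens[i], 0);

		v = ops->combine(v, s, seg_lens[i]);
	}
	return v;
}

static const struct csum_ops poly_ops = { poly_update, poly_combine };
```

      Swapping in a different algorithm only means supplying another
      csum_ops table; the walking logic stays the same.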
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      lib: crc32: add test cases for crc32{, c}_combine routines · efba721f
      Daniel Borkmann authored
      We already have 100 test cases for the crcs themselves, so split the
      test buffer with a priori known checksums, and test the crc of two
      blocks against the crc of the whole block for the same results.
      
      Output/result with CONFIG_CRC32_SELFTEST=y:
      
        [    2.687095] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
        [    2.687097] crc32: self tests passed, processed 225944 bytes in 278177 nsec
        [    2.687383] crc32c: CRC_LE_BITS = 64
        [    2.687385] crc32c: self tests passed, processed 225944 bytes in 141708 nsec
        [    7.336771] crc32_combine: 113072 self tests passed
        [   12.050479] crc32c_combine: 113072 self tests passed
        [   17.633089] alg: No test for crc32 (crc32-pclmul)
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      lib: crc32: add functionality to combine two crc32{, c}s in GF(2) · 6e95fcaa
      Daniel Borkmann authored
      This patch adds a combinator to merge two or more crc32{,c}s
      into a new one. This is useful for checksum computations of
      fragmented skbs that use crc32/crc32c as checksums.
      
      The arithmetic for combining both in GF(2) was taken and
      slightly modified from zlib. Passing only two crcs is insufficient,
      as merging needs both crcs and the length of the second piece.
      The code is made generic, so that only the polynomials
      need to be passed for crc32_le resp. crc32c_le.
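      For a concrete picture of the GF(2) approach, here is a self-contained
      sketch modeled on zlib's crc32_combine() for the standard reflected
      CRC-32 (polynomial 0xEDB88320). It is not the kernel code added by this
      patch, and the function names are hypothetical. The identity used:
      crc(A||B) equals crc(A) advanced over len(B) zero bytes, XORed with
      crc(B). The zero-byte operator is a 32x32 matrix over GF(2), applied
      by repeated squaring, so the cost grows with log2(len(B)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF2_DIM 32

/* Standard CRC-32 (reflected 0xEDB88320), pre/post inverted. */
static uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Multiply a GF(2) matrix (mat[n] = image of bit n) by a vector. */
static uint32_t gf2_matrix_times(const uint32_t *mat, uint32_t vec)
{
	uint32_t sum = 0;

	for (; vec; vec >>= 1, mat++)
		if (vec & 1u)
			sum ^= *mat;
	return sum;
}

static void gf2_matrix_square(uint32_t *square, const uint32_t *mat)
{
	for (int n = 0; n < GF2_DIM; n++)
		square[n] = gf2_matrix_times(mat, mat[n]);
}

/* crc(A||B) from crc(A), crc(B) and len(B). */
static uint32_t crc32_combine_sketch(uint32_t crc1, uint32_t crc2, size_t len2)
{
	uint32_t even[GF2_DIM], odd[GF2_DIM], row = 1;

	if (len2 == 0)
		return crc1;

	odd[0] = 0xEDB88320u;             /* operator for one zero bit */
	for (int n = 1; n < GF2_DIM; n++, row <<= 1)
		odd[n] = row;
	gf2_matrix_square(even, odd);     /* two zero bits */
	gf2_matrix_square(odd, even);     /* four zero bits */

	/* Apply len2 zero bytes to crc1, squaring once per bit of len2. */
	do {
		gf2_matrix_square(even, odd); /* first pass: one zero byte */
		if (len2 & 1)
			crc1 = gf2_matrix_times(even, crc1);
		len2 >>= 1;
		if (len2 == 0)
			break;
		gf2_matrix_square(odd, even);
		if (len2 & 1)
			crc1 = gf2_matrix_times(odd, crc1);
		len2 >>= 1;
	} while (len2 != 0);

	return crc1 ^ crc2;
}
```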
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      lib: crc32: clean up spacing in test cases · d921e049
      Daniel Borkmann authored
      This is nothing more than a whitespace cleanup: 80 chars is a soft
      rather than a hard limit, and enforcing it makes the test cases array
      look really ugly. So fix it up.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Nov, 2013 1 commit