Commits · 5a6e55c461db3364aa5be919101db972bc859133 · Kirill Smelkov / linux

04 Nov, 2013 39 commits

David S. Miller authored Nov 04, 2013

Or Gerlitz says:

====================
Mellanox driver updates

This patch set from Jack Morgenstein does the following:

1. Fix MAC/VLAN SRIOV implementation, and add wrapper functions for VLAN allocation
   and de-allocation (patches 1-6).

2. Implements resource quotas when running under SRIOV (patches 7-10).
   Patch 7 is a small bug fix, and patches 8-10 implement the quotas.

Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.

The series is against net-next commit ba486502 "ipv6: remove the unnecessary statement in find_match()"

changes from V0:
 - dropped the 1st patch which needs to go to -stable and hence through net,
   not net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5a6e55c4

net/mlx4_core: Implement resource quota enforcement · 146f3ef4

Jack Morgenstein authored Nov 03, 2013

Implements resource quota grant decision when resources are requested,
for the following resources:  QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs,
and Counters.

When granting a resource, the quota system increases the allocated-count
for that slave.

When the slave later frees the resource, its allocated-count is reduced.

A spinlock is used to protect the integrity of each resource's free-pool counter.
(One slave may be in the process of being granted a resource while another
slave has crashed, initiating cleanup of that slave's resource quotas).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

146f3ef4

net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapper · eb456a68

Jack Morgenstein authored Nov 03, 2013

In current kernels, the mlx4 driver running on a VM does not
differentiate between max resource numbers for the HCA and
max quotas -- it simply takes the quota values passed to it
as max-resource values.

However, the driver actually requires the VFs to be aware of
the actual number of resources that the HCA was initialized with,
for QPs, CQs, SRQs and MPTs.

For QPs, CQs and SRQs, the reason is that in completion handling
the driver must know which of the 24 bits are the actual resource
number, and which are "padding" bits.

For MPTs, also, the driver assumes knowledge of the number of MPTs
in the system.

The previous commit fixes the quota logic on the VM for the quota values
passed to it by QUERY_FUNC_CAPS.

For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
from QUERY_HCA (and not QUERY_FUNC_CAPS).  The quotas passed
in QUERY_FUNC_CAPS are used to report max resource number values
in the response to ib_query_device.

However, the Hypervisor driver must consider that VMs
may be running previous kernels, and compatibility must be preserved.

To resolve the incompatibility with previous kernels running on VMs,
we deprecated the quota fields in mlx4_QUERY_FUNC_CAP.  In the
deprecated fields, we pass the max-resource values from INIT_HCA

The quota fields are moved to a new location, and the current kernel
driver takes the proper values from that location. There is
also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
if this flag is set, the (VM) driver takes the quota values from the
new location.

VMs running previous kernels will work properly, except that the max resource
numbers reported in ib_query_device for these resources will be
too high.  The Hypervisor driver will, however, enforce the quotas
for these VMs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eb456a68

mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61

Jack Morgenstein authored Nov 03, 2013

This is step #1 for implementing SRIOV resource quotas for VFs.

Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.

Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                             VLAN, and Counters.

The quota system works as follows:
Each entity (VF or PF) is given a max number of a given resource (its quota),
and a guaranteed minimum number for each resource (starvation prevention).

For QPs, CQs, SRQs, MPTs and MTTs:
50% of the available quantity for the resource is divided equally among
the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
The other 50% is the "free pool", allocated on a first-come-first-serve basis.
For each VF/PF, resources are first allocated from its "guaranteed-minimum"
pool. When that pool is exhausted, the driver attempts to allocate from
the resource "free-pool".

The quota (i.e., max) for the VFs and the PF is:
  The free-pool amount (50% of the real max) + the guaranteed minimum

For MACs:
  Guarantee 2 MACs per VF/PF per port. As a result, since we have only
  128 MACs per port, reduce the allowable number of VFs from 64 to 63.
  Any remaining MACs are put into a free pool.

For VLANs:
  For the PF, the per-port quota is 128 and guarantee is 64
     (to allow the PF to register at least a VLAN per VF in VST mode).
  For the VFs, the per-port quota is 64 and the guarantee is 0.
      We assume that VGT VFs are trusted not to abuse the VLAN resource.

For Counters:
  For all functions (PF and VFs), the quota is 128 and the guarantee is 0.

In this patch, we define the needed structures, which are added to the
resource-tracker struct.  In addition, we do initialization
for the resource quota, and adjust the query_device response to use quotas
rather than resource maxima.

As part of the implementation, we introduce a new field in
mlx4_dev: quotas.  This field holds the resource quotas used
to report maxima to the upper layers (ib_core, via query_device).

The HCA maxima of these values are passed to the VFs (via
QUERY_HCA) so that they may continue to use these in handling
QPs, CQs, SRQs and MPTs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a0d0a61

net/mlx4_core: Fix checking order in MR table init · a30f1bc5

Jack Morgenstein authored Nov 03, 2013

In procedure mlx4_init_mr_table(), slaves should do no processing,
but should return success. This initialization is hypervisor-only.

However, the check for num_mpts being a power-of-2 was performed
before the check to return immediately if the driver is for a slave.
This resulted in spurious failures.

The order of performing the checks is reversed, so that if the
driver is for a slave, no processing is done and success is returned.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a30f1bc5

net/mlx4_core: Don't fail reg/unreg vlan for older guests · 2c957ff2

Jack Morgenstein authored Nov 03, 2013

In upstream kernels under SRIOV, the vlan register/unregister calls
were NOPs (doing nothing and returning OK). We detect these old
calls from guests (via the comm channel), since previously the
port number in mlx4_register_vlan was passed (improperly) in the
out_param. This has been corrected so that the port number is now
passed in bits 8..15 of the in_modifier field.

For old calls, these bits will be zero, so if the passed port
number is zero, we can still look at the out_param field to see
if it contains a valid port number. If yes, the VM is running
an old driver.

Since for old drivers, the register/unregister_vlan wrappers were
NOPs, we continue this policy -- the reason being that upstream
had an additional bug in eth driver running on guests (where
procedure mlx4_en_vlan_rx_kill_vid() had the following code:

if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx))
        mlx4_unregister_vlan(mdev->dev, priv->port, idx);
else
        en_err(priv, "could not find vid %d in cache\n", vid);

On a VM, mlx4_find_cached_vlan() will always fail, since the
vlan cache is located on the Hypervisor; on guests it is empty.

Therefore, if we allow upstream guests to register vlans, we will
have vlan leakage since the unregister will never be performed.
Leaving vlan reg/unreg for old guest drivers as a NOP is not a
feature regression, since in upstream the register/unregister
vlan wrapper is a NOP.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c957ff2

net/mlx4_core: Resource tracker for reg/unreg vlans · 4874080d

Jack Morgenstein authored Nov 03, 2013

Add resource tracker support for reg/unreg vlans calls done by VFs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4874080d

net/mlx4_en: Use vlan id instead of vlan index for unregistration · 2009d005

Jack Morgenstein authored Nov 03, 2013

Use of vlan_index created problems unregistering vlans on guests.

In addition, tools delete vlan by tag, not by index, lets follow that.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2009d005

net/mlx4_core: Fix reg/unreg vlan/mac to conform to the firmware spec · acddd5dd

Jack Morgenstein authored Nov 03, 2013

The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac,
mlx4_unregister_mac all made illegal use of the out_param in multifunc mode
to pass the port number. The firmware spec specifies that the port number
should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and
FREE_RES (sections 20.15.1 and 20.15.2).

For MAC register/unregister, this patch contains workarounds so that guests
running previous kernels continue to work on a new Hypervisor, and guests
running the new kernel will continue to work on old hypervisors.

Vlan registeration capability is still not operational in multifunction mode,
since the vlan wrapper functions are not implemented in this patch.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

acddd5dd

net/mlx4_core: Fix register/unreg vlan flow · 162226a1

Jack Morgenstein authored Nov 03, 2013

The reg/unreg vlan code was broken:

1. a wrapped function called another wrapped function, causing a deadlock.

2. unregister_vlan called cmd_box instead of cmd_box_imm, leading to
   incorrectly passed parameters.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

162226a1

sh_eth: check platform data pointer · 3b4c5cbf

Sergei Shtylyov authored Oct 30, 2013

Check the platform data pointer before dereferencing it and error out of the
probe() method if it's NULL.

This has additional effect of preventing kernel oops with outdated platform data
containing zero PHY address instead (such as on SolutionEngine7710).
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b4c5cbf

Merge branch 'usbnet' · 6f9fed0b

David S. Miller authored Nov 04, 2013

Bjørn Mork says:

====================
cdc_mbim + qmi_wwan trivial fixes

This series fixes three problems Oliver pointed out during the
review of the new huawei_cdc_ncm driver:
http://patchwork.ozlabs.org/patch/278903/

That innocent driver only used cdc_mbim as a blueprint, and
all the blame should really have gone to me....

I do have a similar fix for the manage_power issue in the
cdc-wdm USB class driver as well.  It will be submitted to
linux-usb as soon as Greg opens up his mailbox again :-)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6f9fed0b

net: cdc_mbim: fixup error return value · e62416e8

Bjørn Mork authored Nov 01, 2013

Reported-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

e62416e8

net: cdc_mbim: no need to check for resume if suspend exists · c37ac9b2

Bjørn Mork authored Nov 01, 2013

Reported-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

c37ac9b2

net: qmi_wwan: no need to check for resume if suspend exists · a298807b

Bjørn Mork authored Nov 01, 2013

Reported-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

a298807b

net: qmi_wwan: manage_power should always set needs_remote_wakeup · 79d9b62a

Bjørn Mork authored Nov 01, 2013

Reported-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

79d9b62a

net: cdc_mbim: manage_power should always set needs_remote_wakeup · 04d948a9

Bjørn Mork authored Nov 01, 2013

Reported-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

04d948a9

Merge branch 'qlcnic' · 96635fbd

David S. Miller authored Nov 04, 2013

Himanshu Madhani says:

====================
qlcnic: Multiple Tx queue support and code refactoring

This Patch series contains following changes

o Refactored code to calculate, validate and assign Tx/SDS rings for  various modes of driver.
o Enhanced ethtool statistics for multi Tx queue on all supported adapters.
o Enable multiple Tx queue for 83xx and 84xx Series adapters.
o Register netdev for failed device state.

changes from v1 -> v2
o Dropped patch to replace inappropriate usage of kzalloc() with vzalloc().

Please apply to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

96635fbd

qlcnic: update version to 5.3.52 · db62d7d9

Himanshu Madhani authored Nov 04, 2013

Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db62d7d9

qlcnic: Enable multiple Tx queue support for 83xx/84xx Series adapters. · 18afc102

Himanshu Madhani authored Nov 04, 2013

o 83xx and 84xx firmware is capable of multiple Tx queues.
  This patch will enable multiple Tx queues for 83xx/84xx
  series adapters. Max number of Tx queues supported will be 8.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18afc102

qlcnic: refactor Tx/SDS ring calculation and validation in driver. · 34e8c406

Himanshu Madhani authored Nov 04, 2013

o Current driver has duplicate code for validating user input
  for changing Tx/SDS rings using set_channel ethtool interface.
  This patch removes duplicate code and refactored Tx/SDS ring
  validation for 82xx/83xx/84xx series adapter.
o Refactored code now calculates maximum Tx/Rx ring driver can
  support based on Default, NPAR and SRIOV PF/VF mode of driver.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34e8c406

qlcnic: Enhance ethtool Statistics for Multiple Tx queue. · f27c75b3

Himanshu Madhani authored Nov 04, 2013

o Enhance ethtool statistics to display multiple Tx queue stats for
  all supported adapters.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f27c75b3

qlcnic: Register netdev in FAILED state for 83xx/84xx · 78ea2d97

Sucheta Chakraborty authored Nov 04, 2013

o Without failing probe, register netdev when device is in FAILED state.
o Device will come up with minimum functionality and allow diagnostics and
  repair of the adapter.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78ea2d97

lib: crc32: reduce number of cases for crc32{, c}_combine · 16514839

Daniel Borkmann authored Nov 04, 2013

We can safely reduce the number of test cases by a tenth.
There is no particular need to run as many as we're running
now for crc32{,c}_combine, that gives us still ~8000 tests
we're doing if people run kernels with crc selftests enabled
which is perfectly fine.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16514839

lib: crc32: conditionally resched when running testcases · cc0ac199

Daniel Borkmann authored Nov 04, 2013

Fengguang reports that when crc32 selftests are running on startup, on
some e.g. 32bit systems, we can get a CPU stall like "INFO: rcu_sched
self-detected stall on CPU { 0} (t=2101 jiffies g=4294967081 c=4294967080
q=41)". As this is not intended, add a cond_resched() at the end of a
test case to fix it. Introduced by efba721f ("lib: crc32: add test cases
for crc32{, c}_combine routines").
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc0ac199

net: checksum: fix warning in skb_checksum · cea80ea8

Daniel Borkmann authored Nov 04, 2013

This patch fixes a build warning in skb_checksum() by wrapping the
csum_partial() usage in skb_checksum(). The problem is that on a few
architectures, csum_partial is used with prefix asmlinkage whereas
on most architectures it's not. So fix this up generically as we did
with csum_block_add_ext() to match the signature. Introduced by
2817a336 ("net: skb_checksum: allow custom update/combine for
walking skb").
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cea80ea8

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 394efd19

David S. Miller authored Nov 04, 2013

Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.
Signed-off-by: David S. Miller <davem@davemloft.net>

394efd19

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · be408cd3

Linus Torvalds authored Nov 04, 2013

Pull networking fixes from David Miller:
 "I'm sending a pull request of these lingering bug fixes for networking
  before the normal merge window material because some of this stuff I'd
  like to get to -stable ASAP"

 1) cxgb3 stopped working on 32-bit machines, fix from Ben Hutchings.

 2) Structures passed via netlink for netfilter logging are not fully
    initialized.  From Mathias Krause.

 3) Properly unlink upper openvswitch device during notifications, from
    Alexei Starovoitov.

 4) Fix race conditions involving access to the IP compression scratch
    buffer, from Michal Kubrecek.

 5) We don't handle the expiration of MTU information contained in ipv6
    routes sometimes, fix from Hannes Frederic Sowa.

 6) With Fast Open we can miscompute the TCP SYN/ACK RTT, from Yuchung
    Cheng.

 7) Don't take TCP RTT sample when an ACK doesn't acknowledge new data,
    also from Yuchung Cheng.

 8) The decreased IPSEC garbage collection threshold causes problems for
    some people, bump it back up.  From Steffen Klassert.

 9) Fix skb->truesize calculated by tcp_tso_segment(), from Eric
    Dumazet.

10) flow_dissector doesn't validate packet lengths sufficiently, from
    Jason Wang

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  net/mlx4_core: Fix call to __mlx4_unregister_mac
  net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb
  net: flow_dissector: fail on evil iph->ihl
  xfrm: Fix null pointer dereference when decoding sessions
  can: kvaser_usb: fix usb endpoints detection
  can: c_can: Fix RX message handling, handle lost message before EOB
  doc:net: Fix typo in Documentation/networking
  bgmac: don't update slot on skb alloc/dma mapping error
  ibm emac: Fix locking for enable/disable eob irq
  ibm emac: Don't call napi_complete if napi_reschedule failed
  virtio-net: correctly handle cpu hotplug notifier during resuming
  bridge: pass correct vlan id to multicast code
  net: x25: Fix dead URLs in Kconfig
  netfilter: xt_NFQUEUE: fix --queue-bypass regression
  xen-netback: use jiffies_64 value to calculate credit timeout
  cxgb3: Fix length calculation in write_ofld_wr() on 32-bit architectures
  bnx2x: Disable VF access on PF removal
  bnx2x: prevent FW assert on low mem during unload
  tcp: gso: fix truesize tracking
  xfrm: Increase the garbage collector threshold
  ...

be408cd3

net/mlx4_core: Fix call to __mlx4_unregister_mac · c32b7dfb

Jack Morgenstein authored Nov 03, 2013

In function mlx4_master_deactivate_admin_state() __mlx4_unregister_mac was
called using the MAC index. It should be called with the value of the MAC itself.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c32b7dfb

Merge branch 'fixes-for-3.12' of git://gitorious.org/linux-can/linux-can · e9b51a19

David S. Miller authored Nov 04, 2013

Marc Kleine-Budde says:

====================
I have two late fixes for the v3.12 release:

The first patch fixes a problem in the c_can's RX message handling, which can
lead to an endless interrupt loop under heavy load if messages are lost. The
second patch is by Olivier Sobrie and fixes the endpoint detection of the
kvaser_usb driver, which is needed for some devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e9b51a19

net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb · 7926c1d5

Daniel Borkmann authored Oct 31, 2013

Introduced in f9e42b85 ("net: sctp: sideeffect: throw BUG if
primary_path is NULL"), we intended to find a buggy assoc that's
part of the assoc hash table with a primary_path that is NULL.
However, we better remove the BUG_ON for now and find a more
suitable place to assert for these things as Mark reports that
this also triggers the bug when duplication cookie processing
happens, and the assoc is not part of the hash table (so all
good in this case). Such a situation can for example easily be
reproduced by:

  tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
  tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
  tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
            protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2

This drops 20% of COOKIE-ACK packets. After some follow-up
discussion with Vlad we came to the conclusion that for now we
should still better remove this BUG_ON() assertion, and come up
with two follow-ups later on, that is, i) find a more suitable
place for this assertion, and possibly ii) have a special
allocator/initializer for such kind of temporary assocs.
Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7926c1d5

net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) · f421436a

Arvid Brodin authored Oct 30, 2013

High-availability Seamless Redundancy ("HSR") provides instant failover
redundancy for Ethernet networks. It requires a special network topology where
all nodes are connected in a ring (each node having two physical network
interfaces). It is suited for applications that demand high availability and
very short reaction time.

HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
send special HSR frames in both directions over the ring. The driver creates
virtual network interfaces that can be used just like any ordinary Linux
network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
must be HSR capable.

This code is a "best effort" to comply with the HSR standard as described in
IEC 62439-3:2010 (HSRv0).
Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f421436a

net: extend net_device allocation to vmalloc() · 74d332c1

Eric Dumazet authored Oct 30, 2013

Joby Poriyath provided a xen-netback patch to reduce the size of
xenvif structure as some netdev allocation could fail under
memory pressure/fragmentation.

This patch is handling the problem at the core level, allowing
any netdev structures to use vmalloc() if kmalloc() failed.

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74d332c1

Merge branch 'sctp_csum' · b397f999

David S. Miller authored Nov 03, 2013

Daniel Borkmann says:

====================
SCTP fix/updates

Please see patch 5 for the main description/motivation, the rest just
brings in the needed functionality for that. Although this is actually
a fix, I've based it against net-next as some additional work for
fixing it was needed.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b397f999

net: sctp: fix and consolidate SCTP checksumming code · e6d8b64b

Daniel Borkmann authored Oct 30, 2013

This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).

The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.

As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.

Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6d8b64b

net: skb_checksum: allow custom update/combine for walking skb · 2817a336

Daniel Borkmann authored Oct 30, 2013

Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.

Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2817a336

lib: crc32: add test cases for crc32{, c}_combine routines · efba721f

Daniel Borkmann authored Oct 30, 2013

We already have 100 test cases for crcs itself, so split the test
buffer with a-prio known checksums, and test crc of two blocks
against crc of the whole block for the same results.

Output/result with CONFIG_CRC32_SELFTEST=y:

  [    2.687095] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
  [    2.687097] crc32: self tests passed, processed 225944 bytes in 278177 nsec
  [    2.687383] crc32c: CRC_LE_BITS = 64
  [    2.687385] crc32c: self tests passed, processed 225944 bytes in 141708 nsec
  [    7.336771] crc32_combine: 113072 self tests passed
  [   12.050479] crc32c_combine: 113072 self tests passed
  [   17.633089] alg: No test for crc32 (crc32-pclmul)
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

efba721f

lib: crc32: add functionality to combine two crc32{, c}s in GF(2) · 6e95fcaa

Daniel Borkmann authored Oct 30, 2013

This patch adds a combinator to merge two or more crc32{,c}s
into a new one. This is useful for checksum computations of
fragmented skbs that use crc32/crc32c as checksums.

The arithmetics for combining both in the GF(2) was taken and
slightly modified from zlib. Only passing two crcs is insufficient
as two crcs and the length of the second piece is needed for
merging. The code is made generic, so that only polynomials
need to be passed for crc32_le resp. crc32c_le.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

6e95fcaa

lib: crc32: clean up spacing in test cases · d921e049

Daniel Borkmann authored Oct 30, 2013

This is nothing more but a whitepace cleanup, as 80 chars is not a
hard but soft limit, and otherwise makes the test cases array really
look ugly. So fix it up.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

d921e049

03 Nov, 2013 1 commit
- Linux 3.12 · 5e01dc7b
  Linus Torvalds authored Nov 03, 2013
  
  5e01dc7b