Commits · a9a52506277275b73955504bf4df745502a28b8b · Kirill Smelkov / linux

16 Feb, 2012 13 commits

sfc: Pass NIC structure into efx_wanted_parallelism() · a9a52506

Ben Hutchings authored Feb 14, 2012

This lets us identify the NIC affected in case of failure, and
will be necessary to adjust for SR-IOV constraints.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Add support for 'extra' channel types · 7f967c01

Ben Hutchings authored Feb 13, 2012

Abstract some of the channel operations to allow for 'extra'
channels that do not have RX or TX queues.

- Try to assign a channel to each extra channel type that is enabled
  for the NIC, but gracefully degrade if we can't allocate sufficient
  MSI-X vectors
- Allow each extra channel type to generate its own channel name
- Allow channel types to disable reallocation and reinitialisation
  of their channels
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Make all CPU/IRQ/channel/queue counts unsigned · a16e5b24

Ben Hutchings authored Feb 14, 2012

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Make buffer table indices and counts consistently unsigned · 5bbe2f4f

Ben Hutchings authored Feb 13, 2012

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Disable flow control during flushes · a606f432

Steve Hodgson authored May 23, 2011

The TX DMA engine issues upstream read requests when there is room in
the TX FIFO for the completion. However, the fetches for the rest of
the packet might be delayed by any back pressure.  Since a flush must
wait for an EOP, the entire flush may be delayed by back pressure.

Mitigate this by disabling flow control before the flushes are
started.  Since PF and VF flushes run in parallel, introduce
fc_disable, a reference count of the number of flushes outstanding.

The same principle could be applied to Falcon, but that
would bring with it its own testing.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
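
The reference-counting idea in this commit can be illustrated with a small user-space sketch. This is not the driver's code: `struct fc_state`, `fc_update()`, `fc_flush_begin()` and `fc_flush_end()` are hypothetical names, and a real driver would also need locking and a register write where the model only sets a flag. Only the name `fc_disable` comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NIC's flow-control state (not the sfc types). */
struct fc_state {
	unsigned int fc_disable;	/* number of flushes outstanding */
	bool wanted_fc;			/* flow control requested by the user */
	bool hw_fc_enabled;		/* what the modelled hardware is using */
};

/* Flow control is on only if it is wanted and no flush is in progress. */
static void fc_update(struct fc_state *s)
{
	s->hw_fc_enabled = s->wanted_fc && s->fc_disable == 0;
}

/* Called before a flush starts; parallel flushes simply stack up. */
static void fc_flush_begin(struct fc_state *s)
{
	s->fc_disable++;
	fc_update(s);
}

/* Called when a flush completes; the last one restores flow control. */
static void fc_flush_end(struct fc_state *s)
{
	assert(s->fc_disable > 0);
	s->fc_disable--;
	fc_update(s);
}
```

With a counter rather than a boolean, a PF flush that finishes first cannot re-enable flow control while a VF flush is still waiting.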

sfc: Generalise event generation to cover VF-owned event queues · 90893000

Ben Hutchings authored Feb 10, 2012

For SR-IOV we will need to send events to event queues that belong to
VFs serviced by other drivers.  Change the parameters of
efx_generate_event() to allow this and declare it extern.

While we're at it, remove the existing declaration under the wrong
name efx_nic_generate_event().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Use proper function to test for RX channel in efx_poll() · 9d9a6973

Ben Hutchings authored Feb 10, 2012

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Leave interrupts and event queues enabled whenever we can · 9f2cb71c

Ben Hutchings authored Feb 08, 2012

When SR-IOV is enabled we may receive FLR (Function-Level Reset)
events, associated queue flush events and requests from VF drivers at
any time.  Therefore we need to keep event queues and interrupts
enabled whenever possible.

Currently we stop interrupt-driven event processing before flushing RX
and TX queues; efx_nic_flush_queues() then polls event queues for
flush events and discards any others it finds.  Change it to work with
the regular event handling functions.

Currently efx_start_channel() fills RX queues synchronously when a
device is brought up.  This could now race with NAPI, so change it to
send fill events.

This was almost entirely written by Steve Hodgson, formerly
shodgson@solarflare.com.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Generate RX fill events based on RX queues, not channels · 2ae75dac

Ben Hutchings authored Feb 07, 2012

This makes it harder to accidentally send such events to TX-only
channels.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Generalise driver event generation · 4ef594eb

Ben Hutchings authored Feb 07, 2012

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Correct MAC filter bitfield definitions · 055e0ad0

Ben Hutchings authored Feb 06, 2012

The RMFT_DEST_MAC and TMFT_SRC_MAC register fields were previously
documented as 44 bits wide, whereas a MAC address has 48 bits.
Thankfully the hardware uses the correct width and the driver has
used separate definitions that divide each of these into 32-bit and
16-bit fields.

Fix the initial definitions for these fields and rewrite the latter
definitions to use them.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
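
A short sketch of why the documented width mattered, and how a 48-bit MAC address divides into a 32-bit and a 16-bit part. All helper names below are hypothetical, and the byte ordering (first octet most significant) is an assumption made for illustration; the commit message does not state how the hardware orders the octets.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 6-byte MAC address into a 48-bit value (assumed ordering:
 * first octet ends up in the most significant byte). */
static uint64_t mac_to_u48(const uint8_t mac[6])
{
	uint64_t v = 0;

	for (int i = 0; i < 6; i++)
		v = (v << 8) | mac[i];
	return v;
}

/* Low 32 bits and high 16 bits of the 48-bit value. */
static uint32_t mac_field_lo(uint64_t mac48)
{
	return (uint32_t)(mac48 & 0xffffffffu);
}

static uint16_t mac_field_hi(uint64_t mac48)
{
	return (uint16_t)(mac48 >> 32);
}

/* Does the value fit in a field of the given width? */
static int fits_in_bits(uint64_t v, unsigned int width)
{
	assert(width > 0 && width < 64);
	return (v >> width) == 0;
}
```

For the broadcast address, `fits_in_bits()` succeeds for a width of 48 and fails for 44, which is the mismatch the commit corrects in the field definitions.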

sfc: Add support for TX MAC filters · 3d885e39

Ben Hutchings authored Feb 06, 2012

On Siena each TX queue can be configured to send only packets for
which there is a TX MAC filter that matches the source MAC address,
queue ID, and optionally VID.  This will be used to implement the
'spoofchk' feature for SR-IOV virtual functions.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Add support for configuring RX unicast/multicast default filters · c274d65c

Ben Hutchings authored Feb 02, 2012

On Siena all received packets that don't match a more specific filter
will match the unicast or multicast default filter. Currently we
leave these set to the default values (RSS with base queue number of
0). Allow them to be reconfigured to select a single RX queue.

These default filters are programmed through the FILTER_CTL register,
but we represent them internally as an additional table of size 2.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

13 Feb, 2012 4 commits

sfc: Warn if unable to create MTDs · 7c43161c

Ben Hutchings authored Jan 27, 2012

Log an explicit warning if we are unable to create MTDs for a net
device.  Also correct the comment about why mtd_device_register() may
fail; there is no longer an MTD table to fill up.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Replace some literal constants with EFX_PAGE_SIZE/EFX_BUF_SIZE · 5b6262d0

Ben Hutchings authored Feb 02, 2012

The 'page size' for PCIe DMA, i.e. the alignment of boundaries at
which DMA must be broken, is 4KB.  Name this value as EFX_PAGE_SIZE
and use it in efx_max_tx_len().  Redefine EFX_BUF_SIZE as
EFX_PAGE_SIZE since its value is also a result of that requirement,
and use it in efx_init_special_buffer().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
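
The 4KB boundary rule can be sketched as follows. This is a user-space illustration under stated assumptions, not the body of efx_max_tx_len(), which may include hardware-specific adjustments not described here. `PAGE_SIZE_4K` stands in for EFX_PAGE_SIZE, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u	/* hypothetical stand-in for EFX_PAGE_SIZE */

/* Largest length that can be transferred from dma_addr without
 * crossing the next 4KB boundary. */
static unsigned int max_len_to_boundary(uint64_t dma_addr)
{
	return PAGE_SIZE_4K - (unsigned int)(dma_addr & (PAGE_SIZE_4K - 1));
}

/* Number of pieces a buffer must be broken into so that no piece
 * crosses a 4KB boundary. */
static unsigned int count_fragments(uint64_t dma_addr, unsigned int len)
{
	unsigned int n = 0;

	assert(len > 0);
	while (len > 0) {
		unsigned int chunk = max_len_to_boundary(dma_addr);

		if (chunk > len)
			chunk = len;
		dma_addr += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

A buffer that starts 16 bytes before a boundary and is 32 bytes long is split into two pieces, while a 4096-byte buffer that starts exactly on a boundary needs only one.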

sfc: Do not retry hardware probe if it schedules a reset · fadac6aa

Ben Hutchings authored Nov 19, 2011

If efx_pci_probe_main() schedules an INVISIBLE or ALL reset (but
nothing more drastic), we retry it up to 5 times.  So far as I'm
aware, this was a workaround for bugs in Falcon A0 which were fixed
in production silicon.  Remove the retry.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

sfc: Skip RX end-of-batch work on channels without an RX queue · d9ab7007

Ben Hutchings authored Feb 13, 2012

The code in efx_process_channel() to update the RX queue after each
batch of RX completions works out as a no-op on a TX-only channel
where the RX queue structure is set to all-zeroes, but
(1) efx_channel_get_rx_queue() will BUG() if DEBUG is defined, and
(2) it's a waste of time.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

07 Feb, 2012 17 commits

sonice: Fix build due to botched netdev_alloc_skb() conversion. · 7280f5ae

David S. Miller authored Feb 07, 2012

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: remove duplicate initialization · af2ce213

Dan Carpenter authored Feb 06, 2012

"priv" is initialized twice.  I kept the second one, because it is next
to the check for NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh-eth: use netdev stats structure and fix dma_map_single · bb7d92e3

Eric Dumazet authored Feb 06, 2012

No need to maintain a parallel net_device_stats structure in
sh_eth_private, since we have a generic one in netdev.

Fix two dma_map_single() calls that were passing skb->tail instead of
skb->data. There seem to be no corresponding dma_unmap_single() calls
in this driver for the moment.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Fix build due to wrong dev annotation · b72061a3

Fabio Estevam authored Feb 07, 2012

commit 21a4e469 (netdev: ethernet dev_alloc_skb to netdev_alloc_skb)
should have used "ndev" instead of "dev".

This causes the following build errors:

drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_rx':
drivers/net/ethernet/freescale/fec.c:714: error: 'dev' undeclared (first use in this function)
drivers/net/ethernet/freescale/fec.c:714: error: (Each undeclared identifier is reported only once
drivers/net/ethernet/freescale/fec.c:714: error: for each function it appears in.)
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_alloc_buffers':
drivers/net/ethernet/freescale/fec.c:1213: error: 'dev' undeclared (first use in this function)

Fix it, so that fec driver can be built again.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: sch_plug - Queue traffic until an explicit release command · c3059be1

Shriram Rajagopalan authored Feb 05, 2012

The qdisc supports two operations - plug and unplug. When the
qdisc receives a plug command via netlink request, packets arriving
henceforth are buffered until a corresponding unplug command is received.
Depending on the type of unplug command, the queue can be unplugged
indefinitely or selectively.

This qdisc can be used to implement output buffering, an essential
functionality required for consistent recovery in checkpoint based
fault-tolerance systems. Output buffering enables speculative execution
by allowing generated network traffic to be rolled back. It is used to
provide network protection for Xen Guests in the Remus high availability
project, available as part of Xen.

This module is generic enough to be used by any other system that wishes
to add speculative execution and output buffering to its applications.

This module was originally available in the linux 2.6.32 PV-OPS tree,
used as dom0 for Xen.

For more information, please refer to http://nss.cs.ubc.ca/remus/
and http://wiki.xensource.com/xenwiki/Remus

Changes in V3:
  * Removed debug output (printk) on queue overflow
  * Added TCQ_PLUG_RELEASE_INDEFINITE - that allows the user to
    use this qdisc, for simple plug/unplug operations.
  * Use of packet counts instead of pointers to keep track of
    the buffers in the queue.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
[author of the code in the linux 2.6.32 pvops tree]
Signed-off-by: David S. Miller <davem@davemloft.net>
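
The plug/unplug behaviour and the use of packet counts can be modelled in a few lines of user-space C. This is a simplified sketch, not the module's code: the structure, its fields and the function names are hypothetical, the queue holds only a count rather than real packets, and limits and drops are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical count-only model of a plug-style queue. */
struct plug_model {
	unsigned int queued;		/* packets held in the queue */
	unsigned int current_epoch;	/* arrived since the last plug */
	unsigned int last_epoch;	/* previous epoch, awaiting release */
	unsigned int to_release;	/* packets dequeue may hand out */
	bool release_indefinite;	/* unplugged until the next plug */
};

static void plug_enqueue(struct plug_model *q)
{
	q->queued++;
	if (!q->release_indefinite)
		q->current_epoch++;
}

/* "Plug": close the current epoch and start buffering a new one. */
static void plug_buffer(struct plug_model *q)
{
	q->last_epoch = q->current_epoch;
	q->current_epoch = 0;
	q->release_indefinite = false;
}

/* Selective unplug: release only the packets of the closed epoch. */
static void plug_release_one(struct plug_model *q)
{
	q->to_release += q->last_epoch;
	q->last_epoch = 0;
}

/* Indefinite unplug: let everything through until the next plug. */
static void plug_release_indefinite(struct plug_model *q)
{
	q->release_indefinite = true;
	q->to_release = 0;
	q->last_epoch = 0;
	q->current_epoch = 0;
}

/* Returns true if a packet was handed out. */
static bool plug_dequeue(struct plug_model *q)
{
	assert(q->to_release <= q->queued);
	if (q->queued == 0)
		return false;
	if (!q->release_indefinite) {
		if (q->to_release == 0)
			return false;
		q->to_release--;
	}
	q->queued--;
	return true;
}
```

Tracking epochs by count means a selective release hands out exactly the packets that arrived before the last plug, while later arrivals stay buffered until the next release.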

Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux · 17b8a74f
David S. Miller authored Feb 07, 2012

e1000e: minor whitespace and indentation cleanup · 0e15df49

Bruce Allan authored Jan 31, 2012

Cleanup of some whitespace and indentation of a single code block.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: fix sparse warnings with -D__CHECK_ENDIAN__ · e885d762

Bruce Allan authored Jan 31, 2012

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: fix checkpatch warning from MINMAX test · a2a5b323

Bruce Allan authored Jan 31, 2012

WARNING: min() should probably be min_t(unsigned int, 4, skb->data_len)
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup - use braces in both branches of a conditional statement · 24b706b2

Bruce Allan authored Jan 31, 2012

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup e1000_set_phys_id · f23efdff

Bruce Allan authored Jan 31, 2012

Use the existing hw pointer.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup e1000_init_mac_params_82571() · 66092f59

Bruce Allan authored Jan 31, 2012

Combine two switch statements into one, convert a nebulous pointer to one
that is a bit more in keeping with the rest of the driver code and clean up
some coding style. No change in functionality, just cosmetic changes.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup e1000_init_mac_params_80003es2lan() · e68782ed

Bruce Allan authored Jan 31, 2012

Combine two switch statements into one, convert a nebulous pointer to one
that is a bit more in keeping with the rest of the driver code and remove
some dead code (there are no 80003es2lan devices with fiber).  No change in
functionality, just cosmetic changes.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup - check return values consistently · 9e2d7657

Bruce Allan authored Jan 31, 2012

The majority of the e1000e code checks most function return values using a
test like 'if (ret_val)' or 'if (!ret_val)' but there are a few instances
of 'if (ret_val == 0)'.  This patch converts the latter to the former for
consistency.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: add missing initializers reported when compiling with W=1 · f36bb6ca

Bruce Allan authored Jan 31, 2012

warning: missing initializer
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000: Adding e1000_dump function · b04e36ba

Tushar Dave authored Jan 27, 2012

When a TX hang occurs, e1000_dump prints the TX ring, RX ring and device registers.
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igbvf: refactor Interrupt Throttle Rate code · ab50a2a4

Mitch A Williams authored Jan 14, 2012

The existing ITR code is broken and confusing, with lots of similarly-named
variables that do different things. Additionally, after the driver carefully
determines the optimal interrupt rate for the adapter, it then
ignores it and always writes a fixed, suboptimal value.

This patch refactors that code to make variable names more descriptive of
what they actually do, and then actually writes the calculated result to
the hardware.

Preliminary testing shows that netperf TCP_STREAM tests go from ~918Mbps
to ~940Mbps, and TCP_RR goes from ~2k transactions/sec up to > 8k.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Robert E Garrett <robertX.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

06 Feb, 2012 6 commits

tipc: Minor optimization to rejection of connection-based messages · dff10e9e

Allan Stephens authored Nov 02, 2011

Modifies message rejection logic so that TIPC doesn't attempt to
send a FIN message to the rejecting port if it is known in advance
that there is no such message because the rejecting port doesn't exist.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Eliminate alteration of publication key during name table purging · 3175bd9a

Allan Stephens authored Oct 28, 2011

Removes code that alters the publication key of a name table entry
that is being forcibly purged from TIPC's name table after contact
with the publishing node has been lost.

Current TIPC ensures that all defunct names are purged before
re-establishing contact with a failed node.  There used to be a risk
that the publication might be accidentally deleted because it might be
re-added to the name table before the purge operation was completed.
But now there is no longer a need to ensure that the new key is different
than the old one.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Prevent loss of fragmented messages over broadcast link · 63e7f1ac

Allan Stephens authored Oct 27, 2011

Modifies broadcast link so that an incoming fragmented message is not
lost if reassembly cannot begin because there currently is no buffer
big enough to hold the entire reassembled message. The broadcast link
now ignores the first fragment completely, which causes the sending node
to retransmit the first fragment so that reassembly can be re-attempted.

Previously, the sender would have had no reason to retransmit the 1st
fragment, so we would never have a chance to re-try the allocation.

To do this cleanly without duplication, a new bclink_accept_pkt()
function is introduced.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Prevent loss of fragmented messages over unicast links · b76b27ca

Allan Stephens authored Oct 27, 2011

Modifies unicast link endpoint logic so an incoming fragmented message
is not lost if reassembly cannot begin because there is no buffer big
enough to hold the entire reassembled message. The link endpoint now
ignores the first fragment completely, which causes the sending node to
retransmit the first fragment so that reassembly can be re-attempted.

Previously, the sender would have had no reason to retransmit the 1st
fragment, so we would never have a chance to re-try the allocation.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>

tipc: Remove obsolete broadcast tag capability · 1ec2bb08

Allan Stephens authored Oct 27, 2011

Eliminates support for the broadcast tag field, which is no longer
used by broadcast link NACK messages.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Major redesign of broadcast link ACK/NACK algorithms · 7a54d4a9

Allan Stephens authored Oct 27, 2011

Completely redesigns broadcast link ACK and NACK mechanisms to prevent
spurious retransmit requests in dual LAN networks, and to prevent the
broadcast link from stalling due to the failure of a receiving node to
acknowledge receiving a broadcast message or request its retransmission.

Note: These changes only impact the timing of when ACK and NACK messages
are sent, and not the basic broadcast link protocol itself, so inter-
operability with nodes using the "classic" algorithms is maintained.

The revised algorithms are as follows:

1) An explicit ACK message is still sent after receiving 16 in-sequence
messages, and implicit ACK information continues to be carried in other
unicast link message headers (including link state messages). However,
the timing of explicit ACKs is now based on the receiving node's absolute
network address rather than its relative network address to ensure that
the failure of another node does not delay the ACK beyond its 16 message
target.

2) A NACK message is now typically sent only when a message gap persists
for two consecutive incoming link state messages; this ensures that a
suspected gap is not confirmed until both LANs in a dual LAN network have
had an opportunity to deliver the message, thereby preventing spurious NACKs.
A NACK message can also be generated by the arrival of a single link state
message, if the deferred queue is so big that the current message gap
cannot be the result of "normal" mis-ordering due to the use of dual LANs
(or one LAN using a bonded interface). Since link state messages typically
arrive at different nodes at different times the problem of multiple nodes
issuing identical NACKs simultaneously is inherently avoided.

3) Nodes continue to "peek" at NACK messages sent by other nodes. If
another node requests retransmission of a message gap suspected (but not
yet confirmed) by the peeking node, the peeking node forgets about the
gap and does not generate a duplicate retransmit request. (If the peeking
node subsequently fails to receive the lost message, later link state
messages will cause it to rediscover and confirm the gap and send another
NACK.)

4) Message gap "equality" is now determined by the start of the gap only.
This is sufficient to deal with the most common cases of message loss,
and eliminates the need for complex end of gap computations.

5) A peeking node no longer tries to determine whether it should send a
complementary NACK, since the most common cases of message loss don't
require it to be sent. Consequently, the node no longer examines the
"broadcast tag" field of a NACK message when peeking.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
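
The NACK rules in points 2 to 4 above can be sketched as a small state machine. This is an illustration under stated assumptions, not TIPC's implementation: `struct gap_tracker`, `on_link_state()`, `on_peeked_nack()` and the `DEFERRED_LIMIT` threshold are all hypothetical, and the choice to clear the suspicion after a NACK is sent is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFERRED_LIMIT 16u	/* hypothetical "queue is too big" threshold */

/* Hypothetical per-sender record of a suspected message gap. */
struct gap_tracker {
	bool suspected;		/* gap seen on the previous link state msg */
	uint32_t gap_start;	/* first missing sequence number */
};

/* Called for each incoming link state message.
 * Returns true if a NACK should be sent now. */
static bool on_link_state(struct gap_tracker *t, bool gap_present,
			  uint32_t gap_start, unsigned int deferred_len)
{
	assert(t);
	if (!gap_present) {
		t->suspected = false;
		return false;
	}
	/* Confirmed: either the deferred queue is too long to be normal
	 * reordering, or a gap with the same start was already suspected. */
	if (deferred_len > DEFERRED_LIMIT ||
	    (t->suspected && t->gap_start == gap_start)) {
		t->suspected = false;	/* assumption: start over after a NACK */
		return true;
	}
	t->suspected = true;
	t->gap_start = gap_start;
	return false;
}

/* Called when a NACK sent by another node is seen ("peeking").
 * Gaps are compared by their start only. */
static void on_peeked_nack(struct gap_tracker *t, uint32_t nack_gap_start)
{
	assert(t);
	if (t->suspected && t->gap_start == nack_gap_start)
		t->suspected = false;
}
```

In this model a gap triggers a NACK on the second consecutive link state message that reports it. If another node's NACK covers the same gap start first, the suspicion is dropped and must be rediscovered before a NACK is sent.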
