Commits · 6039e3dff09c6b55612125c53127827723b80ce9 · Kirill Smelkov / linux

26 Jan, 2015 22 commits

David S. Miller authored Jan 26, 2015

Joe Stringer says:

====================
openvswitch: Introduce 128-bit unique flow identifiers.

This series extends the openvswitch datapath interface for flow commands to use
128-bit unique identifiers as an alternative to the netlink-formatted flow key.
This significantly reduces the cost of assembling messages between the kernel
and userspace, in particular improving Open vSwitch revalidation performance by
40% or more.

v14:
- Perform lookup using unmasked key in legacy case.
- Fix minor checkpatch.pl style violations.

v13:
- Embed sw_flow_id in sw_flow to save memory allocation in UFID case.
- Malloc unmasked key for id in non-UFID case.
- Fix bug where non-UFID case could double-serialize keys.

v12:
- Userspace patches fully merged into Open vSwitch master
- New minor refactor patches (2,3,4)
- Merge unmasked_key, ufid representation of flow identifier in sw_flow
- Improve memory allocation sizes when serializing ufid
- Handle corner case where a flow_new is requested with a flow that has an
  identical ufid as an existing flow, but a different flow key
- Limit UFID to between 1-16 octets inclusive.
- Add various helper functions to improve readibility

v11:
- Pushed most of the prerequisite patches for this series to OVS master.
- Split out openvswitch.h interface changes from datapath implementation
- Datapath implementation to be reviewed on net-next, separately

v10:
- New patch allowing datapath to serialize masked keys
- Simplify datapath interface by accepting UFID or flow_key, but not both
- Flows set up with UFID must be queried/deleted using UFID
- Reduce sw_flow memory usage for UFID
- Don't periodically rehash UFID table in linux datapath
- Remove kernel_only UFID in linux datapath

v9:
- No kernel changes

v8:
- Rename UID -> UFID
- Fix null dereference in datapath when paired with older userspace
- All patches are reviewed/acked except datapath changes.

v7:
- Remove OVS_DP_F_INDEX_BY_UID
- Rework datapath UID serialization for variable length UIDs

v6:
- Reduce netlink conversions for all datapaths
- Various bugfixes

v5:
- Various bugfixes
- Improve logging

v4:
- Datapath memory leak fixes
- Enable UID-based terse dumping and deleting by default
- Various fixes

RFCv3:
- Add datapath implementation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6039e3df

openvswitch: Add support for unique flow IDs. · 74ed7ab9

Joe Stringer authored Jan 21, 2015

Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.

This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.

All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74ed7ab9

genetlink: Add genlmsg_parse() helper function. · 7b1883ce

Joe Stringer authored Jan 21, 2015

The first user will be the next patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b1883ce

openvswitch: Use sw_flow_key_range for key ranges. · 272c2cf8

Joe Stringer authored Jan 21, 2015

These minor tidyups make a future patch a little tidier.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

272c2cf8

openvswitch: Refactor ovs_flow_tbl_insert(). · d29ab6f8

Joe Stringer authored Jan 21, 2015

Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
This tidies up a future patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d29ab6f8

openvswitch: Refactor ovs_nla_fill_match(). · 5b4237bb

Joe Stringer authored Jan 21, 2015

Refactor the ovs_nla_fill_match() function into separate netlink
serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
parameter - all callers need to nest the flow, and callers have better
knowledge about whether it is serializing a mask or not.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b4237bb

Merge tag 'linux-can-next-for-3.20-20150121' of... · 707d79e5

David S. Miller authored Jan 26, 2015

Merge tag 'linux-can-next-for-3.20-20150121' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-21-01

this is a pull request of 4 patches for net-next/master.

Andri Yngvason contributes one patch to further consolidate the CAN
state change handling. The next patch is by kbuild test robot/Fengguang
Wu which fixes a coccinelle warning in the CAN infrastructure. The two
last patches are by me, they remove a unused variable from the flexcan
and at91_can driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

707d79e5

Merge branch 'sh_eth' · 4141d93f

David S. Miller authored Jan 26, 2015

Sergei Shtylyov says:

====================
sh_eth: massage PM code

   Here's a set of 2 patches against DaveM's 'net-next.git' repo. We're adding
the support for suspend/hibernation as well as somewhat changing the existing
code. There are still MDIO-related issue with suspend (kernel exception), we've
been working on it and shall address it with a separate patch...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4141d93f

sh_eth: add more PM methods · b71af046

Mikhail Ulyanov authored Jan 22, 2015

Add sh_eth_{suspend|resume}() implementing {suspend|resume|freeze|thaw|poweroff|
restore}() PM methods to make it possible to restore from hibernation not only
in Linux  but also in e.g. U-Boot and  to have more determined state on resume/
restore.
Signed-off-by: Mikhail Ulyanov <mikhail.ulyanov@cogentembedded.com>
[Sergei: moved sh_eth_{suspend|resume}() before sh_eth_runtime_nop(), enclosed
them with #ifdef CONFIG_PM_SLEEP, reordered the local variables, got rid of
*goto* and label, reordered macro invocations, renamed, modified the changelog.]
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b71af046

sh_eth: use SET_RUNTIME_PM_OPS() · e7d7e898

Mikhail Ulyanov authored Jan 22, 2015

Use SET_RUNTIME_PM_OPS() macro to initialize the runtime PM method pointers in
the 'struct dev_pm_ops'.
Signed-off-by: Mikhail Ulyanov <mikhail.ulyanov@cogentembedded.com>
[Sergei: renamed, added the changelog.]
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7d7e898

net: stmmac: add BQL support · 38979574

Beniamino Galvani authored Jan 21, 2015

Add support for Byte Queue Limits to the STMicro MAC driver.

Tested on a Amlogic S802 quad Cortex-A9 board, where the use of BQL
decreases the latency of a high priority ping from ~12ms to ~1ms when
the 100Mbit link is saturated by 20 TCP streams.
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38979574

cxgb4: Fixes cxgb4_inet6addr_notifier unregister call · 1793c798

Hariprasad Shenai authored Jan 21, 2015

commit b5a02f50 ("cxgb4 : Update ipv6 address handling api") introduced
a regression where unregister cxgb4_inet6addr_notifier wasn't getting called
during module_exit.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1793c798

rhashtable: rhashtable_remove() must unlink in both tbl and future_tbl · fe6a043c

Thomas Graf authored Jan 21, 2015

As removals can occur during resizes, entries may be referred to from
both tbl and future_tbl when the removal is requested. Therefore
rhashtable_remove() must unlink the entry in both tables if this is
the case. The existing code did search both tables but stopped when it
hit the first match.

Failing to unlink in both tables resulted in use after free.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe6a043c

ipv6: tcp: fix race in IPV6_2292PKTOPTIONS · 1dc7b90f

Eric Dumazet authored Jan 21, 2015

IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
to charge the skb to socket.

It means that destructor must be called while socket is locked.

Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
to protect ourselves : kfree_skb() might race with other users
manipulating sk->sk_forward_alloc

Fix this race by holding socket lock for the duration of
ip6_datagram_recv_ctl()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1dc7b90f

rhashtable: fix rht_for_each_entry_safe() endless loop · 607954b0

Patrick McHardy authored Jan 21, 2015

"next" is not updated, causing an endless loop for buckets with more than
one element.

Fixes: 88d6ed15 ("rhashtable: Convert bucket iterators to take table and index")
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

607954b0

net/fsl: Replace spin_event_timeout() with arch independent in xgmac_mdio · 22f6bba7

Shaohui Xie authored Jan 21, 2015

spin_event_timeout() is PPC dependent, use an arch independent
equivalent instead.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22f6bba7

net/fsl: drop in_be32() & out_be32() in xgmac_mdio · ca43e58c

Shaohui Xie authored Jan 21, 2015

Use ioread32be() & iowrite32be() instead.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca43e58c

bonding: handle more gso types · 24f87d4c

Eric Dumazet authored Jan 25, 2015

In commit 5a7baa78 ("bonding: Advertize vxlan offload features when
supported"), Or Gerlitz added support conditional vxlan offload.

In this patch I also add support for all kind of tunnels,
but we allow a bonding device to not require segmentation,
as it is always better to make this segmentation at the very last stage,
if a particular slave device requires it.

Tested:

 Setup a GRE tunnel,
 on a physical NIC not having tx-gre-segmentation.
 Results on bnx2x are even better, as we no longer have to segment
 in software.

ethtool -K bond0 tx-gre-segmentation off

super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
7538.32

ethtool -K bond0 tx-gre-segmentation on

super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
10200.5
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24f87d4c

bridge: simplify br_getlink() a bit · 1b846f92

Dan Carpenter authored Jan 21, 2015

Static checkers complain that we should maybe set "ret" before we do the
"goto out;".  They interpret the NULL return from br_port_get_rtnl() as
a failure and forgetting to set the error code is a common bug in this
situation.

The code is confusing but it's actually correct.  We are returning zero
deliberately.  Let's re-write it a bit to be more clear.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b846f92

Merge branch 'phy_dsa' · 5c66cfe0

David S. Miller authored Jan 25, 2015

Florian Fainelli says:

====================
net: phy and dsa random fixes/cleanups

These two patches were already present as part of my attempt to make
DSA modules work properly, these are the only two "valid" patches at
this point which should not need any further rework.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5c66cfe0

net: dsa: bcm_sf2: factor interrupt disabling in a function · 691c9a8f

Florian Fainelli authored Jan 20, 2015

Factor the interrupt disabling in a function: bcm_sf2_intr_disable()
since we are doing the same thing in the setup and suspend paths.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

691c9a8f

net: phy: fixed: allow setting no update_link callback · 799d4444

Florian Fainelli authored Jan 20, 2015

fixed_phy_set_link_update() contains an early check against a NULL
callback pointer, which basically prevents us from removing any
previous callback we may have set. The users of the fp->link_update
callback deal with a NULL callback just fine, so we really want to allow
"removing" a link_update callback to avoid dangling callback pointers
during e.g: module removal.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

799d4444

25 Jan, 2015 18 commits

net: ipv6: Add sysctl entry to disable MTU updates from RA · c2943f14

Harout Hedeshian authored Jan 20, 2015

The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2943f14

Merge branch 'fib_trie_next' · 46a93af2

David S. Miller authored Jan 25, 2015

Alexander Duyck says:

====================
Fixes and improvements for recent fib_trie updates

While performing testing and prepping the next round of patches I found a
few minor issues and improvements that could be made.

These changes should help to reduce the overall code size and improve the
performance slighlty as I noticed a 20ns or so improvement in my worst-case
testing which will likely only result in a 1ns difference with a standard
sized trie.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

46a93af2

fib_trie: Various clean-ups for handling slen · 64c62723

Alexander Duyck authored Jan 22, 2015

While doing further work on the fib_trie I noted a few items.

First I was using calls that were far more complicated than they needed to
be for determining when to push/pull the suffix length. I have updated the
code to reflect the simplier logic.

The second issue is that I realised we weren't necessarily handling the
case of a leaf_info struct surviving a flush. I have updated the logic so
that now we will call pull_suffix in the event of having a leaf info value
left in the leaf after flushing it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64c62723

fib_trie: Move fib_find_alias to file where it is used · 02525368

Alexander Duyck authored Jan 22, 2015

The function fib_find_alias is only accessed by functions in fib_trie.c as
such it makes sense to relocate it and cast it as static so that the
compiler can take advantage of optimizations it can do to it as a local
function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02525368

fib_trie: Use empty_children instead of counting empty nodes in stats collection · 30cfe7c9

Alexander Duyck authored Jan 22, 2015

It doesn't make much sense to count the pointers ourselves when
empty_children already has a count for the number of NULL pointers stored
in the tnode.  As such save ourselves the cycles and just use
empty_children.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30cfe7c9

fib_trie: Add collapse() and should_collapse() to resize · 95f60ea3

Alexander Duyck authored Jan 22, 2015

This patch really does two things.

First it pulls the logic for determining if we should collapse one node out
of the tree and the actual code doing the collapse into a separate pair of
functions. This helps to make the changes to these areas more readable.

Second it encodes the upper 32b of the empty_children value onto the
full_children value in the case of bits == KEYLENGTH. By doing this we are
able to handle the case of a 32b node where empty_children would appear to
be 0 when it was actually 1ul << 32.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95f60ea3

fib_trie: Fall back to slen update on inflate/halve failure · a80e89d4

Alexander Duyck authored Jan 22, 2015

This change corrects an issue where if inflate or halve fails we were
exiting the resize function without at least updating the slen for the
node.  To correct this I have moved the update of max_size into the while
loop so that it is only decremented on a successful call to either inflate
or halve.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a80e89d4

fib_trie: Fix RCU bug and merge similar bits of inflate/halve · 69fa57b1

Alexander Duyck authored Jan 22, 2015

This patch addresses two issues.

The first issue is the fact that I believe I had the RCU freeing sequence
slightly out of order. As a result we could get into an issue if a caller
went into a child of a child of the new node, then backtraced into the to be
freed parent, and then attempted to access a child of a child that may have
been consumed in a resize of one of the new nodes children. To resolve this I
have moved the resize after we have freed the oldtnode. The only side effect
of this is that we will now be calling resize on more nodes in the case of
inflate due to the fact that we don't have a good way to test to see if a
full_tnode on the new node was there before or after the allocation. This
should have minimal impact however since the node should already be
correctly size so it is just the cost of calling should_inflate that we
will be taking on the node which is only a couple of cycles.

The second issue is the fact that inflate and halve were essentially doing
the same thing after the new node was added to the trie replacing the old
one. As such it wasn't really necessary to keep the code in both functions
so I have split it out into two other functions, called replace and
update_children.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

69fa57b1

fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits · b3832117

Alexander Duyck authored Jan 22, 2015

In doing performance testing and analysis of the changes I recently found
that by shifting the index I had created an unnecessary dependency.

I have updated the code so that we instead shift a mask by bits and then
just test against that as that should save us about 2 CPU cycles since we
can generate the mask while the key and pos are being processed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3832117

Merge branch 'mlx4-next' · bc579ae5

David S. Miller authored Jan 25, 2015

Or Gerlitz says:

====================
mlx4: Fix and enhance the device reset flow

This series from Yishai Hadas fixes the device reset flow and adds SRIOV support.

Reset flows are required whenever a device experiences errors, is unresponsive,
or is not in a deterministic state. In such cases, the driver is expected to
reset the HW and continue operation. When SRIOV is enabled, these requirements
apply both to PF and VF devices.

Currently, the mlx4 reset flow doesn't work properly: when a fatal error is
detected on the FW internal buffer the chip is not reset and stays in its
bad state. There are cases that assumed to be fatal such as non-responsive FW,
errors via closing commands but are not handled today.

The AER mechanism should also be fixed:
- It should use mlx4_load_one instead of __mlx4_init_one which is done
  upon HCA probing.
- It must be aligned with concurrent catas flow, mark device to be in
  an error state, reset chip, etc.
- Port types should be restored to their original values before error occurred.

In addition, there the SRIOV use-case isn't supported.

In above cases when the device state becomes fatal we must act as follows:
1) Reset the chip and mark the HW device state as in fatal error.
2) Wake up any pending commands, preventing new ones to come in.
3) Restart the software stack.

We also address the SRIOV mode as follows: In case the PF detects a fatal error,
it lets VFs know about that, then both itself and VFs are restarted asynchronously.
However, in case only the VF encountered a fatal case or forced to be reset, they
reset the VF stuff and then restart software.

changes from V0:

No need to call pci_disable_device upon permanent PCI error. This will
be done as part of mlx4_remove_one which is called later once we
return PCI_ERS_RESULT_DISCONNECT from the pci error handler.

Initial toggle value should use only the T bit and not the whole byte value.
Not doing so sometimes broke SRIOV as of junky value seen by the VF as a
non-ready comm channel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bc579ae5

net/mlx4_core: Reset flow activation upon SRIOV fatal command cases · 0cd93027

Yishai Hadas authored Jan 25, 2015

When SRIOV commands are executed over the comm-channel and get
a fatal error (e.g. timeout, closing command failure) the VF enters
into error state and reset flow is activated.

To be able to recognize whether the failure was on a closing command, the
operational code for the given VHCR command is used. Once the device entered
into an error state we prevent redundant error messages from being printed.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cd93027

net/mlx4_core: Enable device recovery flow with SRIOV · 55ad3592

Yishai Hadas authored Jan 25, 2015

In SRIOV, both the PF and the VF may attempt device recovery whenever they
assume that the device is not functioning.  When the PF driver resets the
device, the VF should detect this and attempt to reinitialize itself.

The VF must be able to reset itself under all circumstances, even
if the PF is not responsive.

The VF shall reset itself in the following cases:

1. Commands are not processed within reasonable time over the communication channel.
This is done considering device state and the correct return code based on
the command as was done in the native mode, done in the next patch.

2. The VF driver receives an internal error event reported by the PF on the
communication channel. This occurs when the PF driver resets the device or
when VF is out of sync with the PF.

Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
PF is not responsive.

As PF and VF may run their reset flow simulantanisly, there are several cases
that are handled:
- Prevent freeing VF resources upon FLR, when PF is in its unloading stage.
- Prevent PF getting VF commands before it has finished initializing its resources.
- Upon VF startup, check that comm-channel is online before sending
  commands to the PF and getting timed-out.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55ad3592

net/mlx4_core: Handle AER flow properly · 2ba5fbd6

Yishai Hadas authored Jan 25, 2015

Fix AER callbacks to work properly, it includes:
- Refractoring AER to be aligned with Reset flow support.
- Sync with concurrent catas flow.

In addition, fix the shutdown PCI callback to sync with
concurrent catas flow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ba5fbd6

net/mlx4_core: Manage interface state for Reset flow cases · c69453e2

Yishai Hadas authored Jan 25, 2015

We need to manage interface state to sync between reset flow and some other
relative cases such as remove_one. This has to be done to prevent certain
races. For example in case software stack is down as a result of unload call,
the remove_one should skip the unload phase.

Implement the remove_one case, handling AER and other cases comes next.

The interface can be up/down, upon remove_one, the state will include an extra
bit indicating that the device is cleaned-up, forcing other tasks to finish
before the final cleanup.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c69453e2

net/mlx4_core: Activate reset flow upon fatal command cases · f5aef5aa

Yishai Hadas authored Jan 25, 2015

We activate reset flow upon command fatal errors, when the device enters an
erroneous state, and must be reset.

The cases below are assumed to be fatal: FW command timed-out, an error from FW
on closing commands, pci is offline when posting/pending a command.

In those cases we place the device into an error state: chip is reset, pending
commands are awakened and completed immediately. Subsequent commands will
return immediately.

The return code in the above cases will depend on the command. Commands which
free and close resources will return success (because the chip was reset, so
callers may safely free their kernel resources). Other commands will return -EIO.

Since the device's state was marked as error, the catas poller will
detect this and restart the device's software stack (as is done when a FW
internal error is directly detected). The device state is protected by a
persistent mutex lives on its mlx4_dev, as such no need any more for the
hcr_mutex which is removed.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5aef5aa

net/mlx4_core: Enhance the catas flow to support device reset · f6bc11e4

Yishai Hadas authored Jan 25, 2015

This includes:

- resetting the chip when a fatal error is detected (the current code
  does not do this).

- exposing the ability to enter error state from outside the catas code
  by calling its functionality. (E.g. FW Command timeout, AER error).

- managing a persistent device state. This is needed to sync between
  reset flow cases.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6bc11e4

net/mlx4_core: Refactor the catas flow to work per device · ad9a0bf0

Yishai Hadas authored Jan 25, 2015

Using a WQ per device instead of a single global WQ, this allows
independent reset handling per device even when SRIOV is used.

This comes as a pre-patch for supporting chip reset
for both native and SRIOV.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad9a0bf0

net/mlx4_core: Set device configuration data to be persistent across reset · dd0eefe3

Yishai Hadas authored Jan 25, 2015

When an HCA enters an internal error state, this is detected by the driver.
The driver then should reset the HCA and restart the software stack.

Keep ports information and some SRIOV configuration in a persistent area
to have it valid across reset.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd0eefe3