Commits · 44a4524c54af2059bd7e967b0527fb7c6618672a · Kirill Smelkov / linux

22 Jan, 2017 6 commits

net: systemport: Add support for SYSTEMPORT Lite · 44a4524c

Florian Fainelli authored Jan 20, 2017

Add supporf for the SYSTEMPORT Lite Ethernet controller, this piece of hardware
is largely based on the full-blown SYSTEMPORT and differs in the following:

- no full-blown UniMAC, instead we have the MagicPacket matching from UniMAC at
  same offset, and a GMII Interface Block (GIB) for the MAC-level stuff, since
  we are always interfaced to an Ethernet switch which is fully Ethernet compliant
  shortcuts could be made

- 16 transmit queues, whose interrupts are moved into the first Level-2 interrupt
  controller bank

- slight TDMA offset change (a register was inserted after TDMA_STATUS, *sigh*)

- 256 RX descriptors (512 words) and 256 TX descriptors (not visible)

As a consequence of these two things, update the code paths accordingly to
differentiate the full-blown from the light version.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44a4524c

net: systemport: Dynamically allocate number of TX rings · 7b78be48

Florian Fainelli authored Jan 20, 2017

In preparation for adding SYSTEMPORT Lite, which has twice as less transmit
queues than SYSTEMPORT make sure we do allocate TX rings based on the
systemport,txq property to get an appropriate memory footprint.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b78be48

ipv6: add NUMA awareness to seg6_hmac_init_algo() · 9ca677b1

Eric Dumazet authored Jan 20, 2017

Since we allocate per cpu storage, let's also use NUMA hints.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ca677b1

net: stmicro: fix LS field mask in EEE configuration · f4ec6064

jpinto authored Jan 20, 2017

This patch fixes the LS mask when setting EEE timer.
LS field is 10 bits long and not 11 as currently.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Reported-By: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4ec6064

net/mlx4: use rb_entry() · 3704eb6f

Geliang Tang authored Jan 20, 2017

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3704eb6f

6lowpan: use rb_entry() · 530cef21

Geliang Tang authored Jan 20, 2017

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

530cef21

20 Jan, 2017 34 commits

Merge branch 'dsa-hwmon' · 9a549c1e

David S. Miller authored Jan 20, 2017

Andrew Lunn says:

====================
net: dsa: Move temperature sensor code into PHY.

Marvell Ethernet switches contain a temperature sensor. There appears
to be one sensor, which is shared by each of the internal PHYs. Each
PHY has independent registers to read this sensor, and to set a limit
for when an alarm should be raised.

Some Marvell discrete PHY also have the same sensor and registers.
Moving the HWMON code from DSA into the PHY makes the sensor available
in discrete PHYs, and removes the layering violation, the switch
driver poking around in PHY registers.

While moving the code into the PHY driver, it has been re-written to
use the new HWMON APIs.

v2:

Better Cover note explaining one sensor, but multiple independent
registers

Simply error checking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9a549c1e

net: dsa: Remove hwmon support · cf1a56a4

Andrew Lunn authored Jan 20, 2017

Only the Marvell mv88e6xxx DSA driver made use of the HWMON support in
DSA. The temperature sensor registers are actually in the embedded
PHYs, and the PHY driver now supports it. So remove all HWMON support
from DSA and drivers.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf1a56a4

phy: marvell: Add support for temperature sensor · 0b04680f

Andrew Lunn authored Jan 20, 2017

Some Marvell PHYs have an inbuilt temperature sensor. Add hwmon
support for this sensor.

There are two different variants. The simpler, older chips have a 5
degree accuracy. The newer devices have 1 degree accuracy.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b04680f

inet: don't use sk_v6_rcv_saddr directly · 319554f2

Josef Bacik authored Jan 19, 2017

When comparing two sockets we need to use inet6_rcv_saddr so we get a NULL
sk_v6_rcv_saddr if the socket isn't AF_INET6, otherwise our comparison function
can be wrong.

Fixes: 637bc8bb ("inet: reset tb->fastreuseport when adding a reuseport sk")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

319554f2

Merge tag 'mlx5-updates-2017-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 7d982567

David S. Miller authored Jan 20, 2017

Saeed Mahameed says:

====================
mlx5 and mlx5e updates 2017-01-19

This series includes some updates for mlx5 core and mlx5e netdevice driver.

From Leon, a small fix that remove an unnecessary print.

From Eli Cohen, a fix to the FW version printout in case of internal error.

From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per
second) mlx5 infrastructure support and the 2nd adds the necessary bits
for mlx5e ptp logic and structures.

From Mohamad, add support for s-tagged packet receive when in promiscuous
mode.

Form Gal Pressman, MCAM (Management capabilities mask register) and PCAM
(Ports capabilities mask register) registers infrastructure, those
registers are needed in order to query the different statistics registers
support in FW, in order for the driver to enable/disable query and
reporting them back to user.  On top of this infrastructure we've exposed
new set of statistics groups:
   - MPCNT: Physical layer statistical counters (For symbol errors)
   - PPCNT: PCIe performance counters

In addition to the statistics capabilities series we've moved the mlx5 HCA
capabilities fields to a dedicated struct under the driver private data.

At the end a small patch to update & query statistics in the most desired
order.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7d982567

Merge branch 'cpsw-common-res-usage' · cc154783

David S. Miller authored Jan 20, 2017

Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: correct common res usage

This series is intended to remove unneeded redundancies connected with
common resource usage function.

Since v1:
- changed name to cpsw_get_usage_count()
- added comments to open/closw for cpsw_get_usage_count()
- added patch:
  net: ethernet: ti: cpsw: clarify ethtool ops changing num of descs

Based on net-next/master
====================
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc154783

net: ethernet: ti: cpsw: clarify ethtool ops changing num of descs · 022d7ad7

Ivan Khoronzhuk authored Jan 19, 2017

After adding cpsw_set_ringparam ethtool op, better to carry out
common parts of similar ops splitting descriptors in runtime. It
allows to reuse these parts and shows what the ops actually do.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

022d7ad7

net: ethernet: ti: cpsw: don't duplicate common res in rx handler · fe734d0a

Ivan Khoronzhuk authored Jan 19, 2017

No need to duplicate the same function in rx handler to get info
if any interface is running.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe734d0a

net: ethernet: ti: cpsw: don't duplicate ndev_running · 03fd01ad

Ivan Khoronzhuk authored Jan 19, 2017

No need to create additional vars to identify if interface is running.
So simplify code by removing redundant var and checking usage counter
instead.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

03fd01ad

net: ethernet: ti: cpsw: don't disable interrupts in ndo_open · 176b0cbf

Ivan Khoronzhuk authored Jan 19, 2017

No need to disable interrupts if no open devices,
they are disabled anyway.

Even no need to disable interrupts if some ndev is opened, In this
case shared resources are not touched, only parameters of ndev shell,
so no reason to disable them also. Removed lines have proved it.

So, no need in redundant check and interrupt disable.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

176b0cbf

net: ethernet: ti: cpsw: remove dual check from common res usage function · aafc93a3

Ivan Khoronzhuk authored Jan 19, 2017

Common res usage is possible only in case an interface is
running. In case of not dual emac here can be only one interface,
so while ndo_open and switch mode, only one interface can be opened,
thus if open is called no any interface is running ... and no common
res are used. So remove check on dual emac, it will simplify
code/understanding and will match the name it's called.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

aafc93a3

Merge branch 'rxbusy' · e9f3a685

David S. Miller authored Jan 20, 2017

Mahesh Bandewar says:

====================
use netdev_is_rx_handler_busy() in few known cases

netdev_rx_handler_register() was recently split into two parts - (a) check
if the handler is used, (b) register the new handler, parts. This is
helpful in scenarios like bonding where at the time of registration there
is too much state to unwind and it should check if the device is free
before building that state. IPvlan and macvlan drivers don't have this
issue however it can make use of the same check instead of using a device
specific check.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e9f3a685

macvlan: use netdev_is_rx_handler_busy instead of checking specific type · 322dc6e0

Mahesh Bandewar authored Jan 18, 2017

netdev_is_rx_handler_busy() check is a superset of netif_is_ipvlan_port()
check and hence should be preferred.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

322dc6e0

ipvlan: use netdev_is_rx_handler_busy instead of checking specific type · c3262d9d

Mahesh Bandewar authored Jan 18, 2017

IPvlan checks if the master device is already used by checking a
specific device (here it's macvlan device). This is technically not
sufficient and it should just ensure the rx_handler is busy or not.
This would be a super check that includes macvlan and any other that
has already registered rx-handler.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3262d9d

net: remove duplicate code. · 1b7cd004

Mahesh Bandewar authored Jan 18, 2017

netdev_rx_handler_register() checks to see if the handler is already
busy which was recently separated into netdev_is_rx_handler_busy(). So
use the same function inside register() to avoid code duplication.
Essentially this change should be a no-op
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b7cd004

fq_codel: Avoid regenerating skb flow hash unless necessary · 264b87fa

Andrew Collins authored Jan 18, 2017

The fq_codel qdisc currently always regenerates the skb flow hash.
This wastes some cycles and prevents flow seperation in cases where
the traffic has been encrypted and can no longer be understood by the
flow dissector.

Change it to use the prexisting flow hash if one exists, and only
regenerate if necessary.
Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

264b87fa

vxlan: preserve type of dst_port parm for encap_bypass_if_local() · 1f6cc07e

Lance Richardson authored Jan 18, 2017

Eliminate sparse warning by maintaining type of dst_port
as __be16.
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f6cc07e

csum: eliminate sparse warning in remcsum_unadjust() · 22fbece1

Lance Richardson authored Jan 18, 2017

Cast second parameter of csum_sub() from __sum16 to __wsum.
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22fbece1

Merge branch 'tipc-multicast-through-replication' · 8d00e202

David S. Miller authored Jan 20, 2017

Jon Maloy says:

====================
tipc: emulate multicast through replication

TIPC multicast messages are currently distributed via L2 broadcast
or IP multicast to all nodes in the cluster, irrespective of the
number of real destinations of the message.

In this series we introduce an option to transport messages via
replication ("replicast") across a selected number of unicast links,
instead of relying on the underlying media. This option is used when
true broadcast/multicast is not supported by the media, or when the
number of true destinations is much smaller than the cluster size.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8d00e202

tipc: make replicast a user selectable option · 01fd12bb

Jon Paul Maloy authored Jan 18, 2017

If the bearer carrying multicast messages supports broadcast, those
messages will be sent to all cluster nodes, irrespective of whether
these nodes host any actual destinations socket or not. This is clearly
wasteful if the cluster is large and there are only a few real
destinations for the message being sent.

In this commit we extend the eligibility of the newly introduced
"replicast" transmit option. We now make it possible for a user to
select which method he wants to be used, either as a mandatory setting
via setsockopt(), or as a relative setting where we let the broadcast
layer decide which method to use based on the ratio between cluster
size and the message's actual number of destination nodes.

In the latter case, a sending socket must stick to a previously
selected method until it enters an idle period of at least 5 seconds.
This eliminates the risk of message reordering caused by method change,
i.e., when changes to cluster size or number of destinations would
otherwise mandate a new method to be used.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01fd12bb

tipc: introduce replicast as transport option for multicast · a853e4c6

Jon Paul Maloy authored Jan 18, 2017

TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.

When the used bearer is lacking that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a853e4c6

tipc: add functionality to lookup multicast destination nodes · 2ae0b8af

Jon Paul Maloy authored Jan 18, 2017

As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ae0b8af

tipc: add function for checking broadcast support in bearer · 9999974a

Jon Paul Maloy authored Jan 18, 2017

As a preparation for the 'replicast' functionality we are going to
introduce in the next commits, we need the broadcast base structure to
store whether bearer broadcast is available at all from the currently
used bearer or bearers.

We do this by adding a new function tipc_bearer_bcast_support() to
the bearer layer, and letting the bearer selection function in
bcast.c use this to give a new boolean field, 'bcast_support' the
appropriate value.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9999974a

bpf: add bpf_probe_read_str helper · a5e8c070

Gianluca Borello authored Jan 18, 2017

Provide a simple helper with the same semantics of strncpy_from_unsafe():

int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr)

This gives more flexibility to a bpf program. A typical use case is
intercepting a file name during sys_open(). The current approach is:

SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
	char buf[PATHLEN]; // PATHLEN is defined to 256
	bpf_probe_read(buf, sizeof(buf), ctx->di);

	/* consume buf */
}

This is suboptimal because the size of the string needs to be estimated
at compile time, causing more memory to be copied than often necessary,
and can become more problematic if further processing on buf is done,
for example by pushing it to userspace via bpf_perf_event_output(),
since the real length of the string is unknown and the entire buffer
must be copied (and defining an unrolled strnlen() inside the bpf
program is a very inefficient and unfeasible approach).

With the new helper, the code can easily operate on the actual string
length rather than the buffer size:

SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
	char buf[PATHLEN]; // PATHLEN is defined to 256
	int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di);

	/* consume buf, for example push it to userspace via
	 * bpf_perf_event_output(), but this time we can use
	 * res (the string length) as event size, after checking
	 * its boundaries.
	 */
}

Another useful use case is when parsing individual process arguments or
individual environment variables navigating current->mm->arg_start and
current->mm->env_start: using this helper and the return value, one can
quickly iterate at the right offset of the memory area.

The code changes simply leverage the already existent
strncpy_from_unsafe() kernel function, which is safe to be called from a
bpf program as it is used in bpf_trace_printk().
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5e8c070

Merge branch 'bus-agnostic-num-vf' · 07604628

David S. Miller authored Jan 20, 2017

Phil Sutter says:

====================
Retrieve number of VFs in a bus-agnostic way

Previously, it was assumed that only PCI NICs would be capable of having
virtual functions - with my proposed enhancement of dummy NIC driver
implementing (fake) ones for testing purposes, this is no longer true.

Discussion of said patch has led to the suggestion of implementing a
bus-agnostic method for VF count retrieval so rtnetlink could work with
both real VF-capable PCI NICs as well as my dummy modifications without
introducing ugly hacks.

The following series tries to achieve just that by introducing a bus
type callback to retrieve a device's number of VFs, implementing this
callback for PCI bus and finally adjusting rtnetlink to make use of the
generalized infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

07604628

device: Implement a bus agnostic dev_num_vf routine · 9af15c38

Phil Sutter authored Jan 18, 2017

Now that pci_bus_type has num_vf callback set, dev_num_vf can be
implemented in a bus type independent way and the check for whether a
PCI device is being handled in rtnetlink can be dropped.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

9af15c38

PCI: implement num_vf bus type callback · 02e0bea6

Phil Sutter authored Jan 18, 2017

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

02e0bea6

device: bus_type: Introduce num_vf callback · 582a686f

Phil Sutter authored Jan 18, 2017

This allows for bus types to implement their own method of retrieving
the number of virtual functions a NIC on that type of bus supports.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

582a686f

sock: use hlist_entry_safe · 6c59ebd3

Geliang Tang authored Jan 20, 2017

Use hlist_entry_safe() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c59ebd3

gre6: Clean up unused struct ipv6_tel_txoption definition · c10aa71b

Jakub Sitnicki authored Jan 20, 2017

Commit b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE
functions") removed the ip6gre specific transmit function, but left the
struct ipv6_tel_txoption definition. Clean it up.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c10aa71b

net: remove bh disabling around percpu_counter accesses · c2a2efbb

Eric Dumazet authored Jan 20, 2017

Shaohua Li made percpu_counter irq safe in commit 098faf58
("percpu_counter: make APIs irq safe")

We can safely remove BH disable/enable sections around various
percpu_counter manipulations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2a2efbb

cxgb4: hide unused warnings · 0a327889

Arnd Bergmann authored Jan 18, 2017

The two new variables are only used inside of an #ifdef and cause
harmless warnings when that is disabled:

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one':
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:9: error: unused variable 'port_vec' [-Werror=unused-variable]
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:6: error: unused variable 'v' [-Werror=unused-variable]

This adds another #ifdef around the declarations.

Fixes: 96fe11f2 ("cxgb4: Implement ndo_get_phys_port_id for mgmt dev")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a327889

net: ipv6: Keep nexthop of multipath route on admin down · a1a22c12

David Ahern authored Jan 18, 2017

IPv6 deletes route entries associated with multipath routes on an
admin down where IPv4 does not. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24 metric 64
            nexthop via 10.100.1.254  dev eth1 weight 1
            nexthop via 10.100.2.254  dev eth2 weight 1
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.4
    10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2:: dev red proto none metric 0  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024  pref medium
    2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
    ...

Set link down:
    $ ip li set eth1 down

IPv4 retains the multihop route but flags eth1 route as dead:

    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
            nexthop via 10.100.1.16  dev eth1 weight 1 dead linkdown
            nexthop via 10.100.2.16  dev eth2 weight 1
    10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4

and IPv6 deletes the route as part of flushing all routes for the device:

    $ ip -6 ro ls vrf red
    2001:db8:2:: dev red proto none metric 0  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
    ...

Worse, on admin up of the device the multipath route has to be deleted
to get this leg of the route re-added.

This patch keeps routes that are part of a multipath route if
ignore_routes_with_linkdown is set with the dead and linkdown flags
enabling consistency between IPv4 and IPv6:

    $ ip -6 ro ls vrf red
    2001:db8:2:: dev red proto none metric 0  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 dead linkdown  pref medium
    2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
    ...
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1a22c12

mlx4: support __GFP_MEMALLOC for rx · dceeab0e

Eric Dumazet authored Jan 17, 2017

Commit 04aeb56a ("net/mlx4_en: allocate non 0-order pages for RX
ring with __GFP_NOMEMALLOC") added code that appears to be not needed at
that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.

As using memory reserves is a must in some situations (swap over NFS or
iSCSI), this patch adds this flag.

Note that this driver does not reuse pages (yet) so we do not have to
add anything else.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dceeab0e