Commits · c816061d27659cb666f8fd9a67669757f2c04a55 · Kirill Smelkov / linux

18 Nov, 2016 32 commits

liquidio CN23XX: bitwise vs logical AND typo · c816061d

Dan Carpenter authored Nov 18, 2016

We obviously intended a bitwise AND here, not a logical one.

Fixes: 8c978d05 ("liquidio CN23XX: Mailbox support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
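
As an illustration of the class of bug named in the subject line, here is a minimal sketch. The function and flag names are hypothetical and are not taken from the liquidio driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: logical AND is true whenever both operands are non-zero. */
static bool flag_set_logical(uint32_t status, uint32_t flag)
{
	return status && flag;
}

/* Intended form: bitwise AND tests only the bits selected by flag. */
static bool flag_set_bitwise(uint32_t status, uint32_t flag)
{
	return (status & flag) != 0;
}
```

With status 0x01 and flag 0x04, the logical form reports the flag as set even though bit 2 is clear, while the bitwise form does not.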

lan78xx: relocate mdix setting to phy driver · f6e3ef3e

Woojung Huh authored Nov 17, 2016

Relocate mdix code to phy driver to be called at config_init().
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-marvell-freescale-compile-test' · 82e527df

David S. Miller authored Nov 18, 2016

Florian Fainelli says:

====================
net: Enable COMPILE_TEST for Marvell & Freescale drivers

This patch series allows building the Freescale and Marvell Ethernet network
drivers with COMPILE_TEST.

Changes in v4:

- add proper HAS_DMA to fix build errors on m32r
- provide an inline stub for mvebu_mbus_get_dram_win_info
- added an additional patch to fix build errors with mv88e6xxx on m32r

Changes in v3:

- reorder patches to avoid introducing a build warning between commits

Changes in v2:

- rename register define clash when building for i386 (spotted by LKP)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Select IRQ_DOMAIN · 0717b876

Florian Fainelli authored Nov 17, 2016

Some architectures (like m32r) may not define IRQ_DOMAIN; selecting it
fixes undefined references to IRQ_DOMAIN functions.

Fixes: dc30c35b ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: Allow drivers to be built with COMPILE_TEST · a0627f77

Florian Fainelli authored Nov 17, 2016

All Marvell Ethernet drivers actually build fine with COMPILE_TEST with
a few warnings. We need to add a few HAS_DMA dependencies to fix linking
failures on problematic architectures like m32r.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info · 603ab573

Florian Fainelli authored Nov 17, 2016

In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
provide an inline stub for mvebu_mbus_get_dram_win_info().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
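
The inline-stub pattern referred to here can be sketched as follows. This is only an illustration: CONFIG_EXAMPLE_MBUS, the function name, its signature, and the -EINVAL return value are assumptions chosen for the example, not the actual kernel header contents.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * CONFIG_EXAMPLE_MBUS is a hypothetical stand-in for the real Kconfig
 * symbol. Leaving it undefined models a COMPILE_TEST build without the bus.
 */
#ifdef CONFIG_EXAMPLE_MBUS
int example_mbus_get_dram_win_info(uint64_t phyaddr, uint8_t *target,
				   uint8_t *attr);
#else
/* Stub: callers still compile and link, but the lookup always fails. */
static inline int example_mbus_get_dram_win_info(uint64_t phyaddr,
						 uint8_t *target,
						 uint8_t *attr)
{
	(void)phyaddr;
	(void)target;
	(void)attr;
	return -EINVAL;
}
#endif
```

A caller needs no #ifdef of its own: it calls the function unconditionally and handles the error return.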

net: fsl: Allow most drivers to be built with COMPILE_TEST · 0827be21

Florian Fainelli authored Nov 17, 2016

There are only a handful of Freescale Ethernet drivers that don't
actually build with COMPILE_TEST:

* FEC, for which we would need to define a default register layout if no
  supported architecture is defined

* UCC_GETH which depends on PowerPC cpm.h header (which could be moved
  to a generic location)

* GIANFAR needs to depend on HAS_DMA to fix linking failures on some
  architectures (like m32r)

We need to fix an unmet dependency to get there though:
warning: (FSL_XGMAC_MDIO) selects OF_MDIO which has unmet direct
dependencies (OF && PHYLIB)

which would result in CONFIG_OF_MDIO=[ym] without CONFIG_OF being set.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gianfar_ptp: Rename FS bit to FIPERST · 00a19e55

Florian Fainelli authored Nov 17, 2016

FS is a global symbol used by the x86 32-bit architecture; renaming the
bit fixes build redefinition warnings:

>> drivers/net/ethernet/freescale/gianfar_ptp.c:75:0: warning: "FS" redefined
    #define FS                    (1<<28) /* FIPER start indication */

   In file included from arch/x86/include/uapi/asm/ptrace.h:5:0,
                    from arch/x86/include/asm/ptrace.h:6,
                    from arch/x86/include/asm/math_emu.h:4,
                    from arch/x86/include/asm/processor.h:11,
                    from include/linux/mutex.h:19,
                    from include/linux/kernfs.h:13,
                    from include/linux/sysfs.h:15,
                    from include/linux/kobject.h:21,
                    from include/linux/device.h:17,
                    from drivers/net/ethernet/freescale/gianfar_ptp.c:23:
   arch/x86/include/uapi/asm/ptrace-abi.h:15:0: note: this is the location of the previous definition
    #define FS 9
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Update connection validation for backplane mode · 5a4e4c8f

Lendacky, Thomas authored Nov 17, 2016

Update the connection type enumeration for backplane mode and return
an error when there is a mismatch between the mode and the connection
type.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-phy-downshift' · d3c19c0a

David S. Miller authored Nov 18, 2016

Allan W. Nielsen says:

====================
Adding PHY-Tunables and downshift support

(This is a re-post of the v3 patch set with a new cover letter - I was not
aware that the cover letters were used as commit comments in merge commits).

This series adds support for PHY tunables, and uses this facility to
configure downshifting. The downshifting mechanism is implemented for MSCC
PHYs.

This series tries to address the comments provided back in mid October when
this feature was posted along with fast-link-failure. Fast-link-failure has
been separated out, but we would like to continue on that if/when we
agree on how the phy-tunables and downshifting should be done.

The proposed generic interface is similar to
ETHTOOL_GTUNABLE/ETHTOOL_STUNABLE, it uses the same type
(ethtool_tunable/tunable_type_id) but a new enum (phy_tunable_id) is added
to reflect the PHY tunable.

The implementation just calls the newly added get_tunable/set_tunable
function pointers in the phy_device structure.

To configure downshifting, the ethtool_tunable structure is used. 'id' must
be set to 'ETHTOOL_PHY_DOWNSHIFT', 'type_id' must be set to
'ETHTOOL_TUNABLE_U8' and the 'data' value configures the number of downshift
re-tries.

If configured to DOWNSHIFT_DEV_DISABLE, then downshift is disabled. If
configured to DOWNSHIFT_DEV_DEFAULT_COUNT, then it is up to the device to
choose a device-specific re-try count.

Tested on Beaglebone Black with VSC 8531 PHY.

Change set:
v0:

- Link Speed downshift and Fast Link failure-2 features coded by using
  Device tree.
v1:
- Split the Downshift and FLF2 features in different set of patches.
- Removed DT access and implemented IOCTL access suggested by Andrew.
- Added function pointers in get_tunable/set_tunable phy_device structure
v2:
- Added a trace message with a hint that is printed when downshifting could
  not be enabled with the requested count
- (ethtool) Syntax is changed from "--set-phy-tunable downshift on|off|%d"
  to "--set-phy-tunable [downshift on|off [count N]]" - as requested by
  Andrew.
v3:
- Fixed Spelling in "net: phy: Add downshift get/set support in Microsemi
  PHYs driver"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
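
The request layout described in the cover letter above can be sketched from user space as follows. This is a minimal sketch that assumes UAPI headers new enough to carry this series; the helper name make_downshift_request is hypothetical, and the sketch only builds the request without issuing the ioctl.

```c
#include <assert.h>
#include <linux/ethtool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate and fill an ETHTOOL_PHY_STUNABLE request that sets the
 * downshift retry count. The caller frees the result.
 */
static struct ethtool_tunable *make_downshift_request(uint8_t count)
{
	struct ethtool_tunable *tuna = calloc(1, sizeof(*tuna) + sizeof(count));

	if (!tuna)
		return NULL;
	tuna->cmd = ETHTOOL_PHY_STUNABLE;
	tuna->id = ETHTOOL_PHY_DOWNSHIFT;
	tuna->type_id = ETHTOOL_TUNABLE_U8;
	tuna->len = sizeof(count);
	/* The one-byte payload follows the fixed header. */
	memcpy(tuna->data, &count, sizeof(count));
	return tuna;
}
```

Such a request would typically be handed to the kernel through the SIOCETHTOOL ioctl, with ifr_data in struct ifreq pointing at the buffer. Passing DOWNSHIFT_DEV_DISABLE or DOWNSHIFT_DEV_DEFAULT_COUNT as the count selects the two special cases named above.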

net: phy: Add downshift get/set support in Microsemi PHYs driver · 310d9ad5

Raju Lakkaraju authored Nov 17, 2016

Implement the phy tunable function pointers and the downshift
functionality for MSCC PHYs.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Core impl for ETHTOOL_PHY_DOWNSHIFT tunable · 65feddd5

Raju Lakkaraju authored Nov 17, 2016

Adding validation support for the ETHTOOL_PHY_DOWNSHIFT. Functional
implementation needs to be done in the individual PHY drivers.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables · 607c7029

Raju Lakkaraju authored Nov 17, 2016

For operation in cabling environments that are incompatible with
1000BASE-T, a PHY device may provide an automatic link speed downshift
operation. When enabled, the device automatically changes its 1000BASE-T
auto-negotiation to the next slower speed after a configured number of
failed attempts at 1000BASE-T.  This feature is useful when setting up
networks using older cable installations that include only pairs A and B,
and not pairs C and D.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE · 968ad9da

Raju Lakkaraju authored Nov 17, 2016

Add get_tunable/set_tunable function pointers to the phy_driver
structure, and use these function pointers to implement the
ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE · 0d27f4e4

Raju Lakkaraju authored Nov 17, 2016

Defines a generic API to get/set phy tunables. The API uses the
existing ethtool_tunable/tunable_type_id types, which are already used
for MAC-level tunables.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-next' · 511d5d5b

David S. Miller authored Nov 18, 2016

Saeed Mahameed says:

====================
Mellanox 100G mlx5 update 2016-11-15

This series contains four humble mlx5 features.

From Gal,
 - Add the support for PCIe statistics and expose them in ethtool

From Huy,
 - Add the support for port module events reporting and statistics
 - Add the support for driver version setting into FW (for display purposes only)

From Mohamad,
 - Extended the command interface cache flexibility

This series was generated against commit
6a02f5eb ("Merge branch 'mlxsw-i2c")

V2:
 - Changed plain "unsigned" to "unsigned int"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Expose PCIe statistics to ethtool · 9c726239

Gal Pressman authored Nov 17, 2016

This patch exposes two groups of PCIe counters:
- Performance counters.
- Timers and states counters.
Queried with ethtool -S <devname>.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add MPCNT register infrastructure · 7f503169

Gal Pressman authored Nov 17, 2016

Add the needed infrastructure for future use of MPCNT register.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Set driver version into firmware · 012e50e1

Huy Nguyen authored Nov 17, 2016

If driver_version capability bit is enabled, set driver version
to firmware after the init HCA command, for display purposes.

Example of driver version: "Linux,mlx5_core,3.0-1"
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Set driver version infrastructure · 0dbc6fe0

Saeed Mahameed authored Nov 17, 2016

Add the driver_version capability bit and the set driver
version command to the mlx5_ifc firmware header.  The only purpose
of this command is to store a driver version/OS string in FW
to be reported and displayed in various management systems,
such as IPMI/BMC.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Add port module event counters to ethtool stats · bedb7c90

Huy Nguyen authored Nov 17, 2016

Add port module event counters to ethtool -S command
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add handling for port module event · d4eb4cd7

Huy Nguyen authored Nov 17, 2016

For each asynchronous port module event:
  1. print with ratelimit to the dmesg log
  2. increment the corresponding event counter
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Port module event hardware structures · 4ce3bf2f

Huy Nguyen authored Nov 17, 2016

Add hardware structures and constants definitions needed for module
events support.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Make the command interface cache more flexible · 0ac3ea70

Mohamad Haj Yahia authored Nov 17, 2016

Add more cache command size sets and more entries for each set, based on
the different sizes and frequency of the current command set.

Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-tso-v2' · 7a8bca04

David S. Miller authored Nov 18, 2016

Edward Cree says:

====================
sfc: Firmware-Assisted TSO version 2

The firmware on 8000 series SFC NICs supports a new TSO API ("FATSOv2"), and
 7000 series NICs will also support this in an imminent release.  This series
 adds driver support for this TSO implementation.
The series also removes SWTSO, as it's now equivalent to GSO.  This does not
 actually remove very much code, because SWTSO was grotesquely intertwingled
 with FATSOv1, which will also be removed once 7000 series supports FATSOv2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove Software TSO · 46d1efd8

Edward Cree authored Nov 17, 2016

It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: handle failure to allocate TSOv2 contexts · e638ee1d

Edward Cree authored Nov 17, 2016

If we fail to init the TXQ because of insufficient TSOv2 contexts,
try again with TSOv2 disabled.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Firmware-Assisted TSO version 2 · e9117e50

Bert Kenward authored Nov 17, 2016

Add support for FATSOv2 to the driver. FATSOv2 offloads far more of the task
 of TCP segmentation to the firmware, such that we now just pass a single
 super-packet to the NIC. This means TSO has a great deal in common with a
 normal DMA transmit, apart from adding a couple of option descriptors.
 NIC-specific checks have been moved off the fast path and in to
 initialisation where possible.

This also moves FATSOv1/SWTSO to a new file (tx_tso.c).  The end of transmit
 and some error handling is now outside TSO, since it is common with other
 code.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update EF10 register definitions · e17705c4

Edward Cree authored Nov 17, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update MCDI protocol definitions · ece0cc17

Edward Cree authored Nov 17, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: make struct pernet_operations::id unsigned int · c7d03a00

Alexey Dobriyan authored Nov 17, 2016

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into a zero-based array and
thus is an unsigned entity. Using a negative value is an out-of-bounds
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is a 3-byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation with the register
allocator being used differently. gcc decides that some variable
needs to live in the new r8+ registers and every access now requires a REX
prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
used, which is longer than [r8].

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
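
The signed versus unsigned indexing argument above can be tried with a pair of small functions like these. This is a sketch for inspecting compiler output (for example with `gcc -O2 -S`); the function names are hypothetical, and the exact instructions emitted depend on the compiler, its version, and its flags.

```c
#include <assert.h>

/* Signed index: the compiler must sign-extend i to 64 bits before addressing. */
static long load_signed(const long *p, int i)
{
	return p[i];
}

/*
 * Unsigned index: only zero extension is needed, which often comes for
 * free because 32-bit register writes clear the upper half on x86_64.
 */
static long load_unsigned(const long *p, unsigned int i)
{
	return p[i];
}
```

Both functions return the same value for any valid non-negative index; only the generated code differs.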

udp: enable busy polling for all sockets · e68b6e50

Eric Dumazet authored Nov 16, 2016

UDP busy polling is restricted to connected UDP sockets.

This is because sk_busy_loop() only takes care of one NAPI context.

There are cases where it could be extended.

1) Some hosts receive traffic on a single NIC, with one RX queue.

2) Some applications use SO_REUSEPORT and associated BPF filter
   to split the incoming traffic on one UDP socket per RX
   queue/thread/cpu

3) Some UDP sockets are used to send/receive traffic for one flow, but
   they do not bother with connect()

This patch records the napi_id of first received skb, giving more
reach to busy polling.

Tested:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

Before patch :
   27867   28870   37324   41060   41215
   36764   36838   44455   41282   43843
After patch :
   73920   73213   70147   74845   71697
   68315   68028   75219   70082   73707
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Nov, 2016 8 commits

Merge branch 'rds-ha-failover-fixes' · fcd2b0da

David S. Miller authored Nov 17, 2016

Sowmini Varadhan says:

====================
RDS: TCP: HA/Failover fixes

This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:

 while (1); do
   # modprobe rds-tcp on test nodes
   # run rds-stress in bi-dir mode between test machine pair
   # modprobe -r rds-tcp on test nodes
 done

rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.

Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes, we have
been able to run the test overnight, without any issues.

Each patch has a detailed description of the root-cause fixed
by the patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Force every connection to be initiated by numerically smaller IP address · 1a0e100f

Sowmini Varadhan authored Nov 16, 2016

When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically larger IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").

The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and the sender removes datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a TCP socket that got reset as part of duel arbitration (but
before data was delivered to the receiver's RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.

This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.

A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
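
The decision described above can be sketched in user-space C as follows. This is not the kernel code: the enum, the function names, and the choice to compare only IPv4 addresses (ignoring ports) are assumptions made for illustration, and the sketch models the side that initiated the TCP connection.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of handling TCP_ESTABLISHED on an RDS-TCP path. */
enum path_action { PATH_MARK_UP, PATH_DROP_AND_RECONNECT };

/* Addresses arrive in network byte order; compare them as host-order integers. */
static bool is_canonical(uint32_t laddr_be, uint32_t faddr_be)
{
	return ntohl(laddr_be) < ntohl(faddr_be);
}

/* Mark the path up only when the initiator has the numerically smaller address. */
static enum path_action on_tcp_established(uint32_t laddr_be, uint32_t faddr_be)
{
	return is_canonical(laddr_be, faddr_be) ? PATH_MARK_UP
						: PATH_DROP_AND_RECONNECT;
}
```

With local 10.0.0.1 and peer 10.0.0.2 the path is marked up; with the roles swapped it is dropped so that the reconnect restores canonical ordering.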

RDS: TCP: Track peer's connection generation number · 905dd418

Sowmini Varadhan authored Nov 16, 2016

The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
(b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().

In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.

To achieve this, the RDS handshake probe added as part of
commit 5916e2c1 ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
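
The generation-number check described above can be sketched like this. The structure, field names, the use of 0 as "no generation seen yet", and the reset value of the sequence number are all assumptions for illustration; the real rds_connection state is richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative. */
struct conn_state {
	uint32_t peer_gen_num;	/* last generation seen from the peer, 0 = none yet */
	uint64_t next_rx_seq;	/* next expected receive sequence number */
};

/*
 * Handle a generation number carried in a handshake probe.
 * Returns true when the peer restarted and local state was reset.
 */
static bool handle_peer_gen_num(struct conn_state *c, uint32_t gen)
{
	bool restarted = c->peer_gen_num != 0 && c->peer_gen_num != gen;

	if (restarted)
		c->next_rx_seq = 1;	/* case (b): new incarnation, start over */
	c->peer_gen_num = gen;
	return restarted;
}
```

A transport-only failure, case (a), leaves the generation number unchanged, so the sequence state survives the reconnect.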

RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list · 315ca6d9

Sowmini Varadhan authored Nov 16, 2016

As noted in rds_recv_incoming(), sequence numbers on data packets
can decrease for the failover case, and the Rx path is equipped
to recover from this if RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.

The RDS_FLAG_RETRANSMITTED header flag is predicated on the
RDS_MSG_RETRANSMITTED flag in the rds_message, so make sure the flag
is set on messages queued for retransmission.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart · b3e51069

LABBE Corentin authored Nov 16, 2016

As suggested by Joe Perches, we could replace all
if (netif_msg_type(priv)) dev_xxx(priv->devices, ...)
by the simpler macro netif_xxx(priv, hw, priv->dev, ...)
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: replace hardcoded function name by __func__ · de9a2165

LABBE Corentin authored Nov 16, 2016

Some print statements have the function name hardcoded.
It is better to use __func__ instead.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
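
A small user-space illustration of the idea: the function name below is hypothetical, and snprintf stands in for the kernel's printing helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function name, for illustration only. */
static int example_init_ring(char *buf, size_t len)
{
	/* __func__ holds "example_init_ring" and follows any later rename. */
	return snprintf(buf, len, "%s: ring setup failed", __func__);
}
```

If the function is renamed, the message prefix changes with it, which a hardcoded string would not do.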

net: stmmac: replace all pr_xxx by their netdev_xxx counterpart · 38ddc59d

LABBE Corentin authored Nov 16, 2016

The stmmac driver uses lots of pr_xxx functions to print information.
This is bad since we cannot know which device logs the information
(especially if two stmmac devices are present).

Furthermore, the driver seems to assume wrongly that all logs will always
be consecutive, using a dev_xxx followed by some indented pr_xxx like this:
kernel: sun7i-dwmac 1c50000.ethernet: no reset control found
kernel:  Ring mode enabled
kernel:  No HW DMA feature register supported
kernel:  Normal descriptors
kernel:  TX Checksum insertion supported

So this patch replaces all pr_xxx by their netdev_xxx counterpart,
except for some prints where netdev causes unpretty output like:
sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found
In those cases, I keep dev_xxx.

At the same time I remove some "stmmac:" prints since
they would duplicate what dev_xxx displays.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: sch_fq: use hash_ptr() · 29c58472

Eric Dumazet authored Nov 17, 2016

When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful,
and I chose hash_32().

Linus Torvalds and George Spelvin fixed this issue, so we can
use hash_ptr() to get more entropy on 64bit arches with Terabytes
of memory, and avoid the cast games.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
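
To show why hashing the full pointer helps on 64-bit machines, here is a user-space sketch of multiplicative hashing. The helper names are hypothetical, and the two multipliers are assumed to match the golden-ratio constants in include/linux/hash.h; treat it as an illustration rather than a copy of the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Multipliers assumed to match include/linux/hash.h. */
#define GOLDEN_RATIO_32 0x61C88647u
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* 32-bit multiplicative hash: only the low 32 bits of a pointer contribute. */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* 64-bit multiplicative hash: every bit of the pointer value contributes. */
static uint32_t hash_64_sketch(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}
```

Two addresses that differ only above bit 31 always collide when truncated to 32 bits first, whereas the 64-bit hash can tell them apart.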
