Commits · ed1e8679d8bc6537077d1f24bc83b396f6062f09 · nexedi / linux

29 Sep, 2016 12 commits

rxrpc: Note serial number being ACK'd in the congestion management trace · ed1e8679

David Howells authored Sep 29, 2016

Note the serial number of the packet being ACK'd in the congestion
management trace rather than the serial number of the ACK packet. Whilst
the serial number of the ACK packet is useful for matching ACK packets in
the output of wireshark, the serial number that the ACK is in response to
is of more use in working out how different trace lines relate.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Request more ACKs in slow-start mode · b112a670

David Howells authored Sep 29, 2016

Set the request-ACK on more DATA packets whilst we're in slow start mode so
that we get sufficient ACKs back to supply information to configure the
window.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95

David Howells authored Sep 29, 2016

Reduce the rxrpc_local::services list to just a pointer as we don't permit
multiple service endpoints to bind to a single transport endpoint (this is
excluded by rxrpc_lookup_local()).

The reason we don't allow this is that if you send a request to an AFS
filesystem service, it will try to talk back to your cache manager on the
port you sent from (this is how file change notifications are handled). To
prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
sockets share a UDP socket if at least one of them has a service bound.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: When activating client conn channels, do state check inside lock · 2629c7fa

David Howells authored Sep 29, 2016

In rxrpc_activate_channels(), the connection cache state is checked outside
of the lock, which means it can change whilst we're waking calls up,
thereby changing whether or not we're allowed to wake calls up.

Fix this by moving the check inside the locked region.  The check to see if
all the channels are currently busy can stay outside of the locked region.

Whilst we're at it:

 (1) Split the locked section out into its own function so that we can call
     it from other places in a later patch.

 (2) Determine the mask of channels dependent on the state as we're going
     to add another state in a later patch that will restrict the number of
     simultaneous calls to 1 on a connection.
Signed-off-by: David Howells <dhowells@redhat.com>
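
The locking change described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not the rxrpc code: the
struct, the enum, the four-channel mask and every function name below are
hypothetical, and a pthread mutex stands in for whatever lock the kernel
uses. The "all channels busy" test stays outside the lock as a cheap hint,
while the cache state is checked only once the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_CHANNELS	4u	/* assumed channel count, for illustration */
#define ALL_CHANNELS	((1u << NR_CHANNELS) - 1)

enum cache_state_example { CONN_ACTIVE, CONN_CULLED };

/* Hypothetical stand-in for a client connection. */
struct conn_example {
	pthread_mutex_t lock;
	enum cache_state_example cache_state;	/* protected by lock */
	unsigned int waiting;			/* calls waiting, protected by lock */
	atomic_uint busy_mask;			/* written under lock, peeked at without it */
};

static void conn_init(struct conn_example *conn,
		      enum cache_state_example state, unsigned int waiting)
{
	assert(conn != NULL);
	pthread_mutex_init(&conn->lock, NULL);
	conn->cache_state = state;
	conn->waiting = waiting;
	atomic_init(&conn->busy_mask, 0);
}

/* Locked section split into its own function; caller must hold conn->lock. */
static void activate_channels_locked(struct conn_example *conn)
{
	unsigned int i;

	/* The state check sits inside the locked region, so the state
	 * cannot change while calls are being handed channels. */
	if (conn->cache_state != CONN_ACTIVE)
		return;

	for (i = 0; i < NR_CHANNELS && conn->waiting > 0; i++) {
		if (atomic_load(&conn->busy_mask) & (1u << i))
			continue;
		atomic_fetch_or(&conn->busy_mask, 1u << i);
		conn->waiting--;	/* one waiting call gets channel i */
	}
}

static void activate_channels(struct conn_example *conn)
{
	/* Unlocked early-out: if every channel is busy there is nothing
	 * to hand out, and a stale answer only costs a lock round trip. */
	if (atomic_load(&conn->busy_mask) == ALL_CHANNELS)
		return;

	pthread_mutex_lock(&conn->lock);
	activate_channels_locked(conn);
	pthread_mutex_unlock(&conn->lock);
}
```

In this sketch a connection that is not in the active state leaves all
waiting calls untouched, and the same locked helper could be called from
other paths that already hold the lock.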


rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077

David Howells authored Sep 29, 2016

In rxrpc_send_data_packet() make the loss-injection path return through the
same code as the transmission path so that the RTT determination is
initiated and any future timer shuffling will be done, despite the packet
having been binned.

Whilst we're at it:

 (1) Add to the tx_data tracepoint an indication of whether or not we're
     retransmitting a data packet.

 (2) When we're deciding whether or not to request an ACK, rather than
     checking if we're in fast-retransmit mode check instead if we're
     retransmitting.

 (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
     not altering the sk_buff refcount nor are we just seeing it after
     getting it off the Tx list.

 (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

 (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
Signed-off-by: David Howells <dhowells@redhat.com>


rxrpc: Fix exclusive client connections · 8732db67

David Howells authored Sep 29, 2016

Exclusive connections are currently reusable (which they shouldn't be)
because rxrpc_alloc_client_connection() checks the exclusive flag in the
rxrpc_connection struct before it's initialised from the function
parameters. This means that the DONT_REUSE flag doesn't get set.

Fix this by checking the function parameters for the exclusive flag.
Signed-off-by: David Howells <dhowells@redhat.com>
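
The ordering bug described above can be reproduced in a small userspace
sketch. It is not the rxrpc source: the structs, the flag bit and both
function names are hypothetical, and the allocation is assumed to be
zeroed, so the not-yet-initialised field always reads as false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define CONN_DONT_REUSE	0x1ul	/* hypothetical flag bit */

/* Hypothetical stand-ins for the connection parameters and connection. */
struct conn_params_example {
	bool exclusive;
};

struct conn_example {
	bool exclusive;
	unsigned long flags;
};

/* Buggy ordering: tests conn->exclusive before it is copied from the
 * parameters, so on a zeroed allocation the flag is never set. */
static struct conn_example *conn_alloc_buggy(const struct conn_params_example *cp)
{
	struct conn_example *conn;

	assert(cp != NULL);
	conn = calloc(1, sizeof(*conn));
	if (!conn)
		return NULL;

	if (conn->exclusive)		/* always false at this point */
		conn->flags |= CONN_DONT_REUSE;
	conn->exclusive = cp->exclusive;
	return conn;
}

/* Fixed ordering: consult the function parameters instead. */
static struct conn_example *conn_alloc_fixed(const struct conn_params_example *cp)
{
	struct conn_example *conn;

	assert(cp != NULL);
	conn = calloc(1, sizeof(*conn));
	if (!conn)
		return NULL;

	if (cp->exclusive)
		conn->flags |= CONN_DONT_REUSE;
	conn->exclusive = cp->exclusive;
	return conn;
}
```

With an exclusive request, only the second variant ends up with the
don't-reuse flag set in this sketch.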


Merge branch 'qcom-emac-acpi' · 31fbe81f

David S. Miller authored Sep 29, 2016

Timur Tabi says:

====================
Add basic ACPI support to the Qualcomm Technologies EMAC driver

This patch series adds support to the EMAC driver for extracting addresses,
interrupts, and some _DSDs (properties) from ACPI.  The first two patches
clean up the code, and the third patch adds ACPI-specific functionality.

The first patch fixes a bug with handling the platform_device for the
internal PHY.  This phy is treated as a separate device in both DT and
ACPI, but since the platform is not released automatically when the
driver unloads, managed functions like devm_ioremap_resource cannot be
used.

The second patch replaces of_get_mac_address with its platform-independent
equivalent device_get_mac_address.

The third patch parses the ACPI tables to obtain the platform_device for
the primary EMAC node ("QCOM8070") and the internal phy node ("QCOM8071").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net: qcom/emac: initial ACPI support · 5f3d3807

Timur Tabi authored Sep 28, 2016

Add support for reading addresses, interrupts, and _DSD properties
from ACPI tables, just like with device tree.  The HID for the
EMAC device itself is QCOM8070.  The internal PHY is represented
by a child node with a HID of QCOM8071.

The EMAC also has some complex clock initialization requirements
that are not represented by this patch.  This will be addressed
in a future patch.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: qcom/emac: use device_get_mac_address · 0de709ac

Timur Tabi authored Sep 28, 2016

Replace the DT-specific of_get_mac_address() function with
device_get_mac_address, which works on both DT and ACPI platforms.  This
change makes it easier to add ACPI support.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: qcom/emac: do not use devm on internal phy pdev · 54e19bc7

Timur Tabi authored Sep 28, 2016

The platform_device returned by of_find_device_by_node() is not
automatically released when the driver unprobes.  Therefore,
managed calls like devm_ioremap_resource() should not be used.
Instead, we manually allocate the resources and then free them
on driver release.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


bpf: allow access into map value arrays · 48461135

Josef Bacik authored Sep 28, 2016

Suppose you have a map array value that is something like this:

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

You can easily insert this into an array, but you cannot modify the contents of
foo->array[] after the fact.  This is because we have no way to verify we won't
go off the end of the array at verification time.  This patch provides a start
for this work.  We accomplish this by keeping track of a minimum and maximum
value a register could be while we're checking the code.  Then at the time we
try to do an access into a MAP_VALUE we verify that the maximum offset into that
region is a valid access into that memory region.  So in practice, code such as
this

unsigned index = 0;

if (foo->iter >= SOME_CONSTANT)
	foo->iter = index;
else
	index = foo->iter++;
foo->array[index] = bar;

would be allowed, as we can verify that index will always be between 0 and
SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
check to make sure the index isn't less than 0, or do something like index %=
SOME_CONSTANT.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
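
The bounds reasoning in the message above can be pictured with a toy range
tracker. This is a sketch under stated assumptions, not the verifier's
code: struct reg_range, the helper names and the value chosen for
SOME_CONSTANT are all hypothetical. It models only the branch where the
index is known to be below SOME_CONSTANT and then checks the largest
possible offset against the size of the map value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SOME_CONSTANT 16	/* assumed value, for illustration only */

/* Hypothetical model of the range of values a register may hold. */
struct reg_range {
	uint64_t min;
	uint64_t max;
};

/* A register known only to hold some unsigned 32-bit value. */
static struct reg_range range_unknown_u32(void)
{
	return (struct reg_range){ .min = 0, .max = UINT32_MAX };
}

/* Narrow the range on the path where "reg < limit" is known to hold,
 * i.e. the fall-through of "if (reg >= limit)". */
static struct reg_range range_assume_lt(struct reg_range r, uint64_t limit)
{
	assert(limit > 0);
	if (r.max > limit - 1)
		r.max = limit - 1;
	return r;
}

/* Accept an elem_size-byte access at base_off + index * elem_size only if
 * the largest possible index still lands inside value_size bytes. */
static bool access_ok(struct reg_range index, uint64_t base_off,
		      uint64_t elem_size, uint64_t value_size)
{
	return base_off + index.max * elem_size + elem_size <= value_size;
}
```

For a value laid out like struct foo with 4-byte members, the array starts
at offset 4 and the value is 4 + 4 * SOME_CONSTANT bytes long. An
unconstrained index is rejected by access_ok(), whereas the same index
narrowed by range_assume_lt(idx, SOME_CONSTANT) is accepted.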


net: do not export sk_stream_write_space · 7836667c

Eric Dumazet authored Sep 28, 2016

Since commit 900f65d3 ("tcp: move duplicate code from
tcp_v4_init_sock()/tcp_v6_init_sock()") we no longer need
to export sk_stream_write_space().

From: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


28 Sep, 2016 18 commits

tcp: Change txhash on every SYN and RTO retransmit · 3acf3ec3

Lawrence Brakmo authored Sep 27, 2016

The current code changes txhash (flowlabels) on every retransmitted
SYN/ACK, but only after the 2nd retransmitted SYN and only after
tcp_retries1 RTO retransmits.

With this patch:
1) txhash is changed with every SYN retransmit
2) txhash is changed with every RTO.

The result is that we can start re-routing around failed (or very
congested) paths as soon as possible. Otherwise application health
checks may fail and the connection may be terminated before we start
to change txhash.

v4: Removed sysctl, txhash is changed for all RTOs
v3: Removed text saying default value of sysctl is 0 (it is 100)
v2: Added sysctl documentation and cleaned code

Tested with packetdrill tests
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/sched: pkt_cls: change tc actions order to be as the user sets · fa5effe7

Hadar Hen Zion authored Sep 27, 2016

Currently the created tc actions list is reversed against the order
set by the user.
Change the actions list order to be the same as was set by the user.

This patch doesn't affect dump actions behavior.
For dumping, action->order parameter is used so the list order doesn't
matter.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
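
One way such a reversal can arise is inserting each new action at the head
of the list instead of the tail. The sketch below illustrates only that
general effect and assumes nothing about the actual pkt_cls code: the
list type and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal action and list types. */
struct action_example {
	int id;
	struct action_example *next;
};

struct action_list_example {
	struct action_example *head;
	struct action_example *tail;
};

/* Head insertion: walking the list yields the reverse of the order in
 * which actions were added. */
static void action_push_front(struct action_list_example *l,
			      struct action_example *a)
{
	assert(l != NULL && a != NULL);
	a->next = l->head;
	l->head = a;
	if (!l->tail)
		l->tail = a;
}

/* Tail insertion: walking the list yields the order the user gave. */
static void action_push_back(struct action_list_example *l,
			     struct action_example *a)
{
	assert(l != NULL && a != NULL);
	a->next = NULL;
	if (l->tail)
		l->tail->next = a;
	else
		l->head = a;
	l->tail = a;
}
```

Adding actions 1, 2, 3 with action_push_front() leaves 3 at the head,
while action_push_back() keeps 1 at the head followed by 2 and 3.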


sh_eth: add R8A7743/5 support · c099ff3c

Sergei Shtylyov authored Sep 27, 2016

Add support for the first two members of the Renesas RZ/G family, RZ/G1M/E
(also known as  R8A7743/5). The Ether core is the same as in the R-Car gen2
SoCs, so will share the code/data with them...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'fib-offload-notifications' · 9c5982fe

David S. Miller authored Sep 28, 2016

Jiri Pirko says:

====================
fib offload: switch to notifier

The goal of this patchset is to allow a driver to propagate all prefixes
configured in the kernel down to HW. This is necessary for routing to work
as expected. If we don't do that, HW might incorrectly forward prefixes
known to the kernel. Take as an example the case where a default route is
set in switch HW and there is an IP address set on a management
(non-switch) port.

Currently, only FIB entries related to the switch port netdev are
offloaded using switchdev ops. This model is not extendable so the
first patch introduces a replacement: notifier to propagate FIB entry
additions and removals to whoever is interested.

The second patch introduces a couple of helpers to deal with RTNH_F_OFFLOAD
flags. Currently it is set in switchdev core. There the assumption is
that only one offload device exists. But for FIB notifier, we assume
multiple offload devices. So the patch introduces a per FIB entry
reference counter and helpers use it in order to achieve this:
   0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
   n means RTNH_F_OFFLOAD is set and the entry is offloaded by n devices
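
The counter semantics listed above can be sketched as follows. This is a
userspace illustration, not the helpers from the patch: the struct, the
function names and the EXAMPLE_F_OFFLOAD bit (a stand-in for
RTNH_F_OFFLOAD) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_F_OFFLOAD 0x8u	/* stand-in for RTNH_F_OFFLOAD */

/* Hypothetical FIB entry carrying a per-entry offload counter. */
struct fib_entry_example {
	unsigned int flags;
	unsigned int offload_cnt;	/* number of devices offloading this entry */
};

/* One more device offloads the entry: the flag is set from 1 upwards. */
static void entry_offload_inc(struct fib_entry_example *e)
{
	e->offload_cnt++;
	e->flags |= EXAMPLE_F_OFFLOAD;
}

/* One device stops offloading: the flag is cleared only when the last
 * offloading device goes away. */
static void entry_offload_dec(struct fib_entry_example *e)
{
	assert(e->offload_cnt > 0);
	if (--e->offload_cnt == 0)
		e->flags &= ~EXAMPLE_F_OFFLOAD;
}

static bool entry_is_offloaded(const struct fib_entry_example *e)
{
	return (e->flags & EXAMPLE_F_OFFLOAD) != 0;
}
```

With two devices offloading the same entry, the flag stays set after the
first decrement and is cleared by the second.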

Patches 3 and 4 convert mlxsw and rocker to adopt this new way, registering
one notifier block for each asic instance. Both of these patches also
implement an internal "abort" mechanism.

Using switchdev ops, "abort" is called by switchdev core whenever there is
an error during FIB entry add offload. This leads to removal of all
offloaded entries on system by fib_trie code.

Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload an entry does not mean that the
   others can't do it. So let only one entity abort and leave the rest
   to work happily.
2) The driver knows what to do in order to properly abort. For example,
   currently abort is broken for mlxsw, as for Spectrum there is a need
   to set 0.0.0.0/0 trap in RALUE register.

The fifth patch removes the old, no longer used FIB offload infrastructure.

The last patch reflects the changes into switchdev documentation file.

---
v2->v3:
 -patch 3/6
   -fixed offload inc/dec to be done in fib4_entry_init/fini and only
    in case !trap as suggested by Ido
v1->v2:
 -patch 3/6:
   -fixed lpm tree setup and binding for abort as pointed out by Ido
   -do nexthop checks as suggested by Ido
   -fix use after free during abort
 -patch 6/6:
   -fixed texts as suggested by Ido
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
