1. 02 Jul, 2014 7 commits
  2. 01 Jul, 2014 13 commits
    • Merge branch 'bnx2x-next' · b6fd8b7f
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      bnx2x: Enhancement patch series
      
      This patch series introduces the ability to propagate link parameters
      to VFs as well as control the VF link via hypervisor.
      
      In addition, it contains 2 small improvements [one IOV-related and the
      other improves performance on machines with short cache lines].
      
      Please consider applying these patches to `net-next'.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Fail probe of VFs using an old incompatible driver · ebf457f9
      Yuval Mintz authored
      There are Linux distributions where the inbox bnx2x driver contains SRIOV
      support but doesn't contain the changes introduced in b9871bcf
      "bnx2x: VF RSS support - PF side".
      
      A VF in a VM running that distribution over a new hypervisor will access
      incorrect addresses when trying to transmit packets, causing an attention
      in the hypervisor and making that VF inactive until FLRed.
      
      The driver in the VM has to be upgraded [there is no real way to overcome
      this], but due to the HW attention currently arising, merely upgrading the
      driver in the VM would not suffice [since the VF also needs to be FLRed if
      the previous driver was already loaded].

      This patch causes the PF to fail the acquire message from a VF running an
      old problematic driver; the VF will then gracefully fail its probe,
      preventing the HW attention [and allowing a clean upgrade of the driver in
      the VM].
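The acquire-time gate described above can be sketched in miniature. This is a hypothetical model, not the bnx2x PF/VF channel code: the version threshold and function names are illustrative.

```python
# Illustrative threshold: the first VF driver version the PF considers
# compatible (NOT the real bnx2x version number).
RSS_CAPABLE_VF_VERSION = (2, 0)

def pf_handle_acquire(vf_driver_version):
    # PF side: nack the acquire message from VF drivers that predate the
    # "VF RSS support - PF side" change.
    return vf_driver_version >= RSS_CAPABLE_VF_VERSION

def vf_probe(vf_driver_version):
    # VF side: a nacked acquire makes the probe fail gracefully, instead of
    # the VF later accessing bad addresses and raising a HW attention.
    if not pf_handle_acquire(vf_driver_version):
        return "probe failed"
    return "probe ok"
```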
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: enlarge minimal alignment of data offset · 9927b514
      Dmitry Kravkov authored
      This improves the performance of the driver on machines where the L1
      cache line size is at most 32 bytes [the HW was designed for 64-byte
      aligned fastpath data].
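The 64-byte alignment requirement boils down to a standard round-up. A small sketch; `HW_ALIGN` and `align_up` are illustrative names, not driver symbols:

```python
HW_ALIGN = 64  # alignment the hardware was designed for, even when
               # L1_CACHE_BYTES on the host is smaller (e.g. 32)

def align_up(offset, align=HW_ALIGN):
    # Round offset up to the next multiple of align (align must be a
    # power of two, as cache-line sizes are).
    return (offset + align - 1) & ~(align - 1)
```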
      Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: VF can report link speed · 6495d15a
      Dmitry Kravkov authored
      Until now, VFs were oblivious to the actual configured link parameters.
      This patch does 2 things:
      
        1. It enables a PF to inform its VF using the bulletin board of the link
           configured, and allows the VF to present that information.
      
        2. It adds support of `ndo_set_vf_link_state', allowing the hypervisor
           to set the VF link state.
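The two mechanisms can be modeled together in a short sketch. All names here are hypothetical; the real driver publishes link fields through a bulletin-board structure shared with the VF:

```python
# ndo_set_vf_link_state-style knobs: follow the physical link, or force
# the reported state up/down regardless of it.
AUTO, ENABLE, DISABLE = "auto", "enable", "disable"

class Bulletin:
    """Model of the PF-to-VF bulletin board carrying link information."""
    def __init__(self):
        self.link_speed = 0
        self.link_up = False

def pf_publish(bulletin, phys_speed, phys_up, forced=AUTO):
    # PF side: post the link state the VF should see.
    if forced == AUTO:
        bulletin.link_speed, bulletin.link_up = phys_speed, phys_up
    elif forced == ENABLE:
        bulletin.link_speed, bulletin.link_up = phys_speed, True
    else:  # DISABLE
        bulletin.link_speed, bulletin.link_up = 0, False

def vf_link_report(bulletin):
    # VF side: present whatever the bulletin says (e.g. via ethtool).
    return bulletin.link_speed, bulletin.link_up
```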
      Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'pktgen' · edd79ca8
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      Optimizing pktgen for single CPU performance
      
      This series focuses on optimizing "pktgen" for single CPU performance.

      V2-series:
       - Removed some patches
       - Documented the real reason for the TX ring buffer filling up
      
      NIC tuning for pktgen:
       http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html
      
      General overload setup according to:
       http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html
      
      Hardware:
       System: CPU E5-2630
       NIC: Intel ixgbe/82599 chip
      
      Testing done with net-next git tree on top of
       commit 6623b419 ("Merge branch 'master' of...jkirsher/net-next")
      
      Pktgen script exercising race condition:
       https://github.com/netoptimizer/network-testing/blob/master/pktgen/unit_test01_race_add_rem_device_loop.sh
      
      Tool for measuring LOCK overhead:
       https://github.com/netoptimizer/network-testing/blob/master/src/overhead_cmpxchg.c
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: RCU-ify "if_list" to remove lock in next_to_run() · 8788370a
      Jesper Dangaard Brouer authored
      The if_lock()/if_unlock() in next_to_run() adds significant
      overhead, because it is called for every packet in the busy loop of
      pktgen_thread_worker().  (Thomas Graf originally pointed me
      at this lock problem.)

      Removing these two "LOCK" operations should in theory save us approx
      16 ns (2 x 8 ns); as illustrated below, we do save 16 ns when removing
      the locks and introducing RCU protection.
      
      Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
       (single CPU performance, ixgbe 10Gbit/s, E5-2630)
       * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
       * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
       * Diff   : +588195 pps --> -16.50ns
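The ns-per-packet figures quoted above are just 1/pps scaled to nanoseconds; a quick check of the arithmetic:

```python
def ns_per_pkt(pps):
    # Time budget per packet in nanoseconds at a given packets-per-second rate.
    return 1e9 / pps

prev_ns = ns_per_pkt(5684009)   # before the RCU fix: ~175.93 ns/pkt
rcu_ns = ns_per_pkt(6272204)    # after the RCU fix:  ~159.43 ns/pkt
saved_ns = prev_ns - rcu_ns     # ~16.50 ns, matching the 2 x 8 ns estimate
```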
      
      To understand this RCU patch, I describe the pktgen thread model
      below.
      
      In pktgen there are several kernel threads, but only one CPU
      running each kernel thread.  Communication with the kernel threads is
      done through some thread control flags.  This allows a thread to
      change data structures at a known synchronization point; see the main
      thread func pktgen_thread_worker().
      
      Userspace changes are communicated through proc-file writes.  There
      are three types of changes, general control changes "pgctrl"
      (func:pgctrl_write), thread changes "kpktgend_X"
      (func:pktgen_thread_write), and interface config changes "ethX@N"
      (func:pktgen_if_write).
      
      Userspace "pgctrl" and "thread" changes are synchronized via the mutex
      pktgen_thread_lock, thus only a single userspace instance can run.
      The mutex is taken while the packet generator is running, by pgctrl
      "start".  Thus e.g. "add_device" cannot be invoked when pktgen is
      running/started.
      
      All "pgctrl" and all "thread" changes, except thread "add_device",
      communicate via the thread control flags.  The main problem is the
      exception "add_device", which modifies the thread's "if_list" directly.
      
      Fortunately "add_device" cannot be invoked while pktgen is running.
      But there exists a race between "rem_device_all" and "add_device"
      (which normally don't occur, because "rem_device_all" waits 125ms
      before returning). Background'ing "rem_device_all" and running
      "add_device" immediately allows the race to occur.
      
      The race affects the thread's device list, "if_list".  The if_lock
      is used for protecting this "if_list".  Other readers are given
      lock-free access to the list under RCU read sections.
      
      Note, interface config changes (via proc) can occur while pktgen is
      running, which worries me a bit.  I'm assuming proc_remove() takes
      appropriate locks, to assure no writers exist after proc_remove()
      finishes.
      
      I've been running a script exercising the race condition (leading me
      to fix the proc_remove order), without any issues.  The script also
      exercises concurrent proc writes, while the interface config is
      getting removed.
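Python has no RCU, but the access pattern the patch moves to can be modeled as copy-on-update with writer-side locking. This is an illustrative sketch, not kernel code: writers serialize on a lock and publish a new list, while readers simply dereference the current list with no lock, the analogue of an RCU read-side section:

```python
import threading

class PktgenThread:
    """Model of a pktgen thread's device list with RCU-style access."""

    def __init__(self):
        self._if_lock = threading.Lock()  # writers only: add/remove paths
        self.devices = ()                 # readers grab this reference lock-free

    def add_device(self, dev):
        with self._if_lock:
            # Publish a brand-new tuple; concurrent readers keep iterating
            # their old snapshot safely (the copy-on-update idea behind
            # RCU-protected list modification).
            self.devices = self.devices + (dev,)

    def rem_device_all(self):
        with self._if_lock:
            self.devices = ()

    def next_to_run(self):
        # Reader: no lock taken, mirroring the optimized next_to_run().
        for dev in self.devices:
            if dev["ready"]:
                return dev
        return None
```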
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: avoid expensive set_current_state() call in loop · baac167b
      Jesper Dangaard Brouer authored
      Avoid calling set_current_state() inside the busy loop in
      pktgen_thread_worker().  In the case of pkt_dev->delay, it is still
      used/enabled in pktgen_xmit() via the spin() call.

      The set_current_state(TASK_INTERRUPTIBLE) uses an xchg, which is
      implicitly LOCK-prefixed.  I've measured the asm LOCK operation to
      take approx 8 ns on this E5-2630 CPU.  The performance increase
      correlates with this measurement.
      
      Performance data with CLONE_SKB==100000, rx-usecs=30:
       (single CPU performance, ixgbe 10Gbit/s, E5-2630)
       * Prev:  5454050 pps --> 183.35ns (1/5454050*10^9)
       * Now:   5684009 pps --> 175.93ns (1/5684009*10^9)
       * Diff:  +229959 pps -->  -7.42ns
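Checking the quoted numbers: the per-packet saving is consistent with removing one ~8 ns LOCK'ed operation per loop iteration:

```python
prev_ns = 1e9 / 5454050   # ~183.35 ns/pkt with set_current_state() in the loop
now_ns = 1e9 / 5684009    # ~175.93 ns/pkt without it
saved_ns = prev_ns - now_ns  # ~7.42 ns, close to the ~8 ns measured LOCK cost
```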
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: document tuning for max NIC performance · 9ceb87fc
      Jesper Dangaard Brouer authored
      Using pktgen I'm seeing the ixgbe driver "push back" due to the TX ring
      running full.  Thus, the TX ring is artificially limiting pktgen.
      (Diagnose via "ethtool -S", look for the "tx_restart_queue" or "tx_busy"
      counters.)

      With ixgbe, the real reason the TX ring runs full is that it is not
      being cleaned up fast enough.  The ixgbe driver combines TX+RX ring
      cleanups, and the cleanup interval is affected by the
      ethtool --coalesce setting of parameter "rx-usecs".
      
      Do not increase the default NIC TX ring buffer or default cleanup
      interval.  Instead simply document that pktgen needs special NIC
      tuning for maximum packet per sec performance.
      
      Performance results with pktgen with clone_skb=100000.
      TX ring size 512 (default), adjusting "rx-usecs":
       (Single CPU performance, E5-2630, ixgbe)
       - 3935002 pps - rx-usecs:  1 (irqs:  9346)
       - 5132350 pps - rx-usecs: 10 (irqs: 99157)
       - 5375111 pps - rx-usecs: 20 (irqs: 50154)
       - 5454050 pps - rx-usecs: 30 (irqs: 33872)
       - 5496320 pps - rx-usecs: 40 (irqs: 26197)
       - 5502510 pps - rx-usecs: 50 (irqs: 21527)
      
      TX ring size adjusting (ethtool -G), "rx-usecs==1" (default):
       - 3935002 pps - tx-size:  512
       - 5354401 pps - tx-size:  768
       - 5356847 pps - tx-size: 1024
       - 5327595 pps - tx-size: 1536
       - 5356779 pps - tx-size: 2048
       - 5353438 pps - tx-size: 4096
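A useful sanity check on the rx-usecs table above: with coalescing set to N microseconds, the NIC can fire at most about 1e6/N interrupts per second, and the observed irq counts track that bound closely (rx-usecs=1 is the exception: there the effective irq rate is limited by other factors, hence only ~9346 irqs):

```python
def max_irqs_per_sec(rx_usecs):
    # Upper bound on the interrupt rate implied by the coalescing setting.
    return 1_000_000 / rx_usecs

# irq counts quoted in the commit message, keyed by rx-usecs
observed_irqs = {10: 99157, 20: 50154, 30: 33872, 40: 26197, 50: 21527}

for usecs, irqs in observed_irqs.items():
    # Each observed rate sits within ~10% of the coalescing-implied bound.
    assert abs(irqs - max_irqs_per_sec(usecs)) / irqs < 0.10
```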
      
      Notice that after commit 6f25cd47 (pktgen: fix xmit test for BQL enabled
      devices) pktgen uses netif_xmit_frozen_or_drv_stopped() and ignores
      the BQL "stack" pause (QUEUE_STATE_STACK_XOFF) flag.  This allows us to
      put more pressure on the TX ring buffers.

      It is the ixgbe_maybe_stop_tx() call that stops the transmits, and
      pktgen respects this via the call to netif_xmit_frozen_or_drv_stopped(txq).
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: introduce rtnl ops stub · 5b9e7e16
      Jiri Pirko authored
      This stub now allows userspace to see IFLA_INFO_KIND for ovs master and
      IFLA_INFO_SLAVE_KIND for slave.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: allow to register ops without ops->setup set · b0ab2fab
      Jiri Pirko authored
      So far, it is assumed that ops->setup is filled in.  But there may be
      cases where ops make sense even without ->setup.  In that case, forbid
      newlink and dellink.

      This allows registering simple rtnl link ops containing only ->kind,
      which gives a consistent way of passing the device kind (either
      device-kind or slave-kind) to userspace.
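The rule can be modeled in a few lines. This is a hypothetical sketch of the policy, not the rtnetlink implementation: ops without ->setup register fine (they still let userspace see the kind), but newlink is refused for them:

```python
class RtnlLinkOps:
    """Minimal model of rtnl_link_ops: a kind, and an optional setup hook."""
    def __init__(self, kind, setup=None):
        self.kind = kind
        self.setup = setup

_link_ops = {}

def rtnl_link_register(ops):
    # ->setup may legitimately be None; registration still succeeds.
    _link_ops[ops.kind] = ops

def rtnl_newlink(kind):
    # Creating a device requires ->setup; ops registered only to expose
    # their kind refuse newlink (EOPNOTSUPP in the kernel).
    ops = _link_ops[kind]
    if ops.setup is None:
        raise PermissionError("EOPNOTSUPP: ops have no ->setup")
    return ops.setup()
```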
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix some typos in comment · 9bf2b8c2
      Ying Xue authored
      In commit 37112105 ("net:
      QDISC_STATE_RUNNING dont need atomic bit ops"),
      __QDISC_STATE_RUNNING was renamed to __QDISC___STATE_RUNNING,
      but the old name was not completely replaced in the comments.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Allow accepting RA from local IP addresses. · d9333196
      Ben Greear authored
      This can be used in virtual networking applications, and
      may have other uses as well.  The option is disabled by
      default.
      
      A specific use case is setting up virtual routers, bridges, and
      hosts on a single OS without the use of network namespaces or
      virtual machines.  With proper use of ip rules, routing tables,
      veth interface pairs and/or other virtual interfaces,
      and applications that can bind to interfaces and/or IP addresses,
      it is possible to create one or more virtual routers with multiple
      hosts attached.  The host interfaces can act as IPv6 systems,
      with radvd running on the ports in the virtual routers.  With the
      option provided in this patch enabled, those hosts can now properly
      obtain IPv6 addresses from the radvd.
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Add more debugging around accept-ra logic. · f2a762d8
      Ben Greear authored
      This is disabled by default, just like similar debug info
      already in this module.  But it makes it easier to find out
      why an RA is not being accepted when debugging strange behaviour.
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 30 Jun, 2014 1 commit
  4. 27 Jun, 2014 19 commits
    • Merge branch 'tcp_conn_request_unification' · 9e1a21b6
      David S. Miller authored
      Octavian Purdila says:
      
      ====================
      tcp: remove code duplication in tcp_v[46]_conn_request
      
      This patch series unifies the TCPv4 and TCPv6 connection request flow
      in a single new function (tcp_conn_request).
      
      The first 3 patches are small cleanups and fixes found during the code
      merge process.
      
      The next patches add new methods in tcp_request_sock_ops to abstract
      the IPv4/IPv6 operations and keep the TCP connection request flow
      common.
      
      To identify potential performance issues, this patch series has been
      tested by measuring the connections-per-second rate with nginx and an
      httperf-like client (to allow for concurrent connection requests - 256
      CC were used during testing) using the loopback interface. A dual-core
      i5 Ivy Bridge processor was used, and each process was bound to a
      different core to make results consistent.
      
      Results for IPv4, unit is connections per second, higher is better, 20
      measurements have been collected:
      
      		before		after
      min		27917		27962
      max		28262		28366
      avg		28094.1		28212.75
      stdev		87.35		97.26
      
      Results for IPv6, unit is connections per second, higher is better, 20
      measurements have been collected:
      
      		before		after
      min		24813		24877
      max		25029		25119
      avg		24935.5		25017
      stdev		64.13		62.93
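Relating the two tables: both averages move by well under one percent, on the order of one standard deviation, i.e. the unification costs nothing measurable and may even help slightly:

```python
# Percent change of the quoted averages, before vs. after the series.
v4_gain_pct = (28212.75 - 28094.1) / 28094.1 * 100   # ~0.42 %
v6_gain_pct = (25017.0 - 24935.5) / 24935.5 * 100    # ~0.33 %
```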
      
      Changes since v1:
      
       * add benchmarking datapoints
      
       * fix a few issues in the last patch (IPv6 related)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add tcp_conn_request · 1fb6f159
      Octavian Purdila authored
      Create tcp_conn_request and remove most of the code from
      tcp_v4_conn_request and tcp_v6_conn_request.
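The shape of the resulting code can be sketched as a per-family ops table driving one shared request function. The field and function names below are illustrative, not the kernel's exact ones:

```python
class TcpRequestSockOps:
    """Model of tcp_request_sock_ops: family-specific steps as callbacks."""
    def __init__(self, family, init_req, route_req, send_synack):
        self.family = family
        self.init_req = init_req        # IPv4/IPv6-specific field setup
        self.route_req = route_req      # family-specific routing
        self.send_synack = send_synack  # family-specific SYNACK transmit

def tcp_conn_request(ops, skb):
    # The flow itself is written once; everything family-specific is
    # delegated through the ops table.
    req = {"family": ops.family}
    ops.init_req(req, skb)
    dst = ops.route_req(req)
    return ops.send_synack(req, dst)

v4_ops = TcpRequestSockOps(
    "ipv4",
    init_req=lambda req, skb: req.update(saddr=skb["src"]),
    route_req=lambda req: "rtable",
    send_synack=lambda req, dst: ("synack", req["family"], req["saddr"], dst),
)
```

An IPv6 ops table would plug different callbacks into the same `tcp_conn_request`, which is the whole point of the unification.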
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add queue_add_hash to tcp_request_sock_ops · 695da14e
      Octavian Purdila authored
      Add queue_add_hash member to tcp_request_sock_ops so that we can later
      unify tcp_v4_conn_request and tcp_v6_conn_request.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add mss_clamp to tcp_request_sock_ops · 2aec4a29
      Octavian Purdila authored
      Add mss_clamp member to tcp_request_sock_ops so that we can later
      unify tcp_v4_conn_request and tcp_v6_conn_request.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add send_synack method to tcp_request_sock_ops · d6274bd8
      Octavian Purdila authored
      Create a new tcp_request_sock_ops method to unify the IPv4/IPv6
      signature for tcp_v[46]_send_synack. This allows us to later unify
      tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with
      tcp_v6_conn_request.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add init_seq method to tcp_request_sock_ops · 936b8bdb
      Octavian Purdila authored
      More work in preparation for unifying tcp_v4_conn_request and
      tcp_v6_conn_request: indirect the init sequence calls via the
      tcp_request_sock_ops.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: move around a few calls in tcp_v6_conn_request · 94037159
      Octavian Purdila authored
      Make the tcp_v6_conn_request call flow similar to that of
      tcp_v4_conn_request.
      
      Note that want_cookie can be true only if isn is zero and that is why
      we can move the if (want_cookie) block out of the if (!isn) block.
      
      Moving security_inet_conn_request() has a couple of side effects:
      the hook now runs before the inet_rsk(req)->ecn_ok and req->cookie_ts
      updates. However, neither the SELinux nor the Smack security hooks
      seem to check those fields. This change should also avoid future
      behaviour differences between IPv4 and IPv6 in the security hooks.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add route_req method to tcp_request_sock_ops · d94e0417
      Octavian Purdila authored
      Create wrappers with the same signature for the IPv4/IPv6 request routing
      calls and use these wrappers (via route_req method from
      tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request
      with the purpose of unifying the two functions in a later patch.
      
      We can later drop the wrapper functions and modify inet_csk_route_req
      and inet6_csk_route_req to use the same signature.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add init_cookie_seq method to tcp_request_sock_ops · fb7b37a7
      Octavian Purdila authored
      Move the specific IPv4/IPv6 cookie sequence initialization to a new
      method in tcp_request_sock_ops in preparation for unifying
      tcp_v4_conn_request and tcp_v6_conn_request.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add init_req method to tcp_request_sock_ops · 16bea70a
      Octavian Purdila authored
      Move the specific IPv4/IPv6 initializations to a new method in
      tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request
      and tcp_v6_conn_request.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove inet6_reqsk_alloc · 476eab82
      Octavian Purdila authored
      Since pktopts is used only for IPv6 and opts only for IPv4, we can move
      these fields into a union, and this allows us to drop the
      inet6_reqsk_alloc function, as after this change it becomes
      equivalent to inet_reqsk_alloc.

      This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
      field was not annotated after a request_sock was allocated.
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: tcp_v[46]_conn_request: fix snt_synack initialization · aa27fc50
      Octavian Purdila authored
      Commit 016818d0 (tcp: TCP Fast Open Server - take SYNACK RTT after
      completing 3WHS) changes the code to only take a snt_synack timestamp
      when a SYNACK transmit or retransmit succeeds. This behaviour is later
      broken by commit 843f4a55 (tcp: use tcp_v4_send_synack on first
      SYN-ACK), as snt_synack is now updated even if tcp_v4_send_synack
      fails.
      
      Also, commit 3a19ce0e (tcp: IPv6 support for fastopen server) misses
      the required IPv6 updates for 016818d0.
      
      This patch makes sure that snt_synack is updated only when the SYNACK
      transmit/retransmit succeeds, for both IPv4 and IPv6.

      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Daniel Lee <longinus00@gmail.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · c1c27fb9
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-06-26
      
      This series contains updates to i40e and i40evf.
      
      Kamil provides a cleanup patch for i40e where we do not need to acquire
      the NVM for the shadow RAM checksum calculation, since we only read the
      shadow RAM through the SRCTL register.
      
      Paul provides a fix for handling HMC for big endian architectures for i40e
      and i40evf.
      
      Mitch provides four cleanups and fixes for i40evf.  Fix an issue where,
      if the VF driver fails to complete early init, rmmod can cause a soft
      lock when the driver tries to stop a watchdog timer that never got
      initialized, so add a check to see if the timer is actually initialized
      before stopping it.  Make the function i40evf_send_api_ver() return more
      useful information, instead of just returning -EIO, by propagating
      firmware errors back to the caller and logging a message if the PF sends
      an invalid reply.  Fix up a log message that was missing a word, which
      makes the log message more readable.  Fix an initialization failure when
      many VFs are instantiated at the same time and the VF module is
      autoloaded, by simply resending the firmware request if there is no
      response the first time.
      
      Jacob renames the function i40e_ptp_enable() to
      i40e_ptp_feature_enable(), as he did for ixgbe, to reduce possible
      confusion and ambiguity in the purpose of the function.  He does
      follow-on PTP work on i40e, as he did for ixgbe, by separating the PTP
      hardware control from the ioctl command for timestamping mode.  By doing
      this, we can maintain state about the 1588 timestamping mode and
      properly re-enable the last known mode during a re-initialization of the
      1588 bits.
      
      Anjali cleans up the i40e driver where TCP-IPv4 filters were being added
      twice, which seems to be left over from when we had to add two PTYPEs
      for one filter.  She fixes the flow director sideband logic to detect
      when the flow director table is full.  She also fixes the programming of
      FDIR, where a couple of fields in the descriptor setup were not being
      programmed, which left the opportunity for stale data to be pushed as
      part of the descriptor the next time it was used.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-next' · 0ff9275a
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: new unicast transmission code
      
      As a step towards making the data transmission code more maintainable
      and performant, we introduce a number of new functions for
      building, sending and rejecting messages. The new functions will
      eventually be used for all data transmission: user data unicast,
      service internal messaging, and multicast/broadcast.
      
      We start with this series, where we introduce the functions, and
      let user data unicast and the internal connection protocol use them.
      The remaining users will come in a later series.
      
      There are only minor changes to data structures, and no protocol
      changes, so the older functions can still be used in parallel for
      some time. Until the old functions are removed, we use temporary
      names for the new functions, such as tipc_build_msg2, tipc_link_xmit2.
      
      It should be noted that the first two commits are unrelated to the
      rest of the series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify connection congestion handling · 60120526
      Jon Paul Maloy authored
      As a consequence of the recently introduced serialized access
      to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa
      ("tipc: same receive code path for connection protocol and data
      messages") we can make a number of simplifications in the
      detection and handling of connection congestion situations.
      
      - We don't need to keep two counters, one for sent messages and one
        for acked messages. There is no longer any risk of races between
        acknowledge messages arriving in BH and data message sending
        running in user context. So we merge these into one counter,
        'sent_unacked', which is incremented at sending and decremented
        at acknowledge reception.
      
      - We don't need to set the 'congested' field in tipc_port to
        true before we send the message, and clear it when sending
        is successful. (As a matter of fact, it was never necessary;
        the field was set in link_schedule_port() before any wakeup
        could arrive anyway.)
      
      - We keep the conditions for link congestion and connection
        congestion separated. There would otherwise be a risk that an
        arriving acknowledge message may wake up a user sleeping because
        of link congestion.
      
      - We can simplify reception of acknowledge messages.
      
      We also make some cosmetic/structural changes:
      
      - We rename the 'congested' field to the more correct 'link_cong'.
      
      - We rename 'conn_unacked' to 'rcv_unacked'.
      
      - We move the above mentioned fields from struct tipc_port to
        struct tipc_sock.
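A minimal model of the single-counter scheme described above (the window size is illustrative, not TIPC's real limit): sending bumps sent_unacked, an incoming acknowledge subtracts the acked count, and the connection counts as congested once the unacked window is full.

```python
class TipcSock:
    """Sketch of the merged counter living in struct tipc_sock."""

    MAX_SENT_UNACKED = 1024  # hypothetical window, not TIPC's real limit

    def __init__(self):
        self.sent_unacked = 0

    def conn_congested(self):
        return self.sent_unacked >= self.MAX_SENT_UNACKED

    def on_send(self):
        # Incremented at sending (user context).
        self.sent_unacked += 1

    def on_ack(self, acked):
        # Decremented at acknowledge reception; with serialized socket
        # access there is no race against the send path.
        self.sent_unacked -= acked
```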
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: clean up connection protocol reception function · ac0074ee
      Jon Paul Maloy authored
      We simplify the code for receiving connection probes, leveraging the
      recently introduced tipc_msg_reverse() function. We also stick to
      the principle of sending a possible response message directly from
      the calling (tipc_sk_rcv or backlog_rcv) functions, hence making
      the call chain shallower and easier to follow.
      
      We make one small protocol change here, allowed according to
      the spec. If a protocol message arrives from a remote socket that
      is not the one we are connected to, we currently generate a
      connection abort message and send it to the source. This behavior
      is unnecessary, and might even be a security risk, so instead we
      now choose to simply ignore the message. The consequence for the sender
      is that it will take longer to discover the mistake (until the
      next timeout), but this is an extreme corner case, and may happen
      anyway under other circumstances, so we deem this change acceptable.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: same receive code path for connection protocol and data messages · ec8a2e56
      Jon Paul Maloy authored
      As a preparation to eliminate port_lock we need to bring reception
      of connection protocol messages under proper protection of bh_lock_sock
      or socket owner.
      
      We fix this by letting those messages follow the same code path as
      incoming data messages.
      
      As a side effect of this change, the last reference to the function
      net_route_msg() disappears, and we can eliminate that function.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>