Commits · 1ec8deac8c63505194e773b9657824ed3c2fbdd8 · Kirill Smelkov / linux

20 Mar, 2017 11 commits

i40e: explicitly fail on extended MAC field for ethtool_rx_flow_spec · 1ec8deac

Jacob Keller authored Feb 06, 2017

Although we will fail the filter later due to checking flow_type which
will have a bogus invalid type, it is possible future refactoring will
remove this hidden failure case. Avoid a possible issue in the future by
explicitly checking the flow type at the start.

Change-Id: Ia98eb26f7b93ccbe38c7141e8f203ef496fc6598
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
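
The explicit check described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the struct and function names are hypothetical, and the flag values are restated from the ethtool UAPI header only so the sketch compiles on its own (check them against linux/ethtool.h before relying on them).

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: values restated from the ethtool UAPI header. */
#define SKETCH_FLOW_MAC_EXT  0x40000000u
#define SKETCH_TCP_V4_FLOW   0x01u

/* Hypothetical stand-in for the flow_type field of ethtool_rx_flow_spec. */
struct sketch_flow_spec {
    uint32_t flow_type;
};

/* Reject the extended MAC field up front instead of relying on a later
 * flow_type comparison that only fails by accident. */
int sketch_check_flow_type(const struct sketch_flow_spec *fsp)
{
    if (fsp->flow_type & SKETCH_FLOW_MAC_EXT)
        return -EINVAL;
    return 0;
}
```

With the check at the top of the function, a later refactoring of the flow-type switch cannot silently start accepting the extended MAC field.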

1ec8deac

i40e: add counters for UDP/IPv4 and IPv4 filters · 097dbf52

Jacob Keller authored Feb 06, 2017

In preparation for adding code to properly check the mask values, we
will need to know the number of active filters for each type. Add
counters for each filter type. Rename the already existing fd_tcp_rule
to fd_tcp4_filter_cnt to match the style of other names. To avoid style
warnings, avoid assigning multiple parameters at once, and fix up one
other case where we did so previously.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

097dbf52

i40e: don't re-enable ATR when flushing filters if SB has TCP4/IPv4 rules · 510dd460

Jacob Keller authored Feb 06, 2017

When flushing and replaying FDIR filters, it is possible we would
disable ATR, and then re-enable it even though we should have kept
it disabled due to existing TCP/IPv4 filters. Fix this by checking
whether we have TCP4/IPv4 filters before re-enabling.

Alternatively, we could instead restore ATR and then replay filters,
however, this would cause us to rapidly enable and then disable ATR in
some cases.

Change-ID: I076e4cc1e4409bce7f98f3c213295433a4ff43d8
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Avinash Dayanand <avinash.dayanand@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

510dd460

i40e: reset fd_tcp_rule count when restoring filters · 6d069425

Jacob Keller authored Feb 06, 2017

Since we're about to reprogram the filters, we need to ensure that the
fd_tcp_rule count is correctly reset to 0. Otherwise, we will keep
a stale count that does not accurately reflect the number of programmed
TCPv4 filters.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6d069425

i40e: remove redundant check for fd_tcp_rule when restoring filters · e122eb74

Jacob Keller authored Feb 06, 2017

i40e_fdir_filter_restore re-adds all existing filters, and the add path
already checks whether to disable ATR when adding a TCPv4 filter. We don't
need to make the check twice, so remove this redundant code.

Change-ID: Ia0b0690e23523915199d601494557def135c9d7f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e122eb74

i40e: exit ATR mode only when adding TCP/IPv4 filter succeeds · 377cc249

Jacob Keller authored Feb 06, 2017

Move ATR exit check after we have sent the TCP/IPv4 filter to the ring
successfully. This avoids an issue where we potentially update the
filter count without actually succeeding in adding the filter. Now, we
only increment the fd_tcp_rule after we've succeeded. Additionally, we
will re-enable ATR mode only after deletion of the filter is actually
posted to the FDIR ring.

Change-ID: If5c1dea422081cc5e2de65618b01b4c3bf6bd586
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

377cc249

i40e: return immediately when failing to add fdir filter · e5187ee3

Jacob Keller authored Feb 06, 2017

Instead of setting err=true and checking this to determine when to free
the raw_packet near the end of the function, simply kfree and return
immediately. The resulting code is a bit cleaner and has one less
variable. This also resolves a subtle bug in the ipv4 case which could
fail to add the first filter and then never free the memory, resulting
in a small memory leak.

Change-ID: I7583aac033481dc794b4acaa14445059c8930ff1
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Avinash Dayanand <avinash.dayanand@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
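
The "free and return immediately" pattern described above can be sketched as follows. All names here are hypothetical and userspace malloc/free stand in for the kernel allocator; the outstanding-buffer counter exists only to make the absence of a leak visible.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PKT_LEN 64

/* Outstanding allocations, tracked only for illustration. */
int sketch_live_buffers;

/* Hypothetical programming step: nonzero return means failure. */
int sketch_program_filter(unsigned char *pkt, int simulate_failure)
{
    (void)pkt;
    return simulate_failure ? -1 : 0;
}

int sketch_add_filter(int simulate_failure)
{
    unsigned char *raw_packet = calloc(1, SKETCH_PKT_LEN);

    if (!raw_packet)
        return -ENOMEM;
    sketch_live_buffers++;

    if (sketch_program_filter(raw_packet, simulate_failure)) {
        /* Failure: release the buffer right here and return, so no
         * error flag has to be carried to the end of the function. */
        free(raw_packet);
        sketch_live_buffers--;
        return -EOPNOTSUPP;
    }

    /* Success: in a real driver ownership would pass on to the
     * hardware path; this sketch simply releases the buffer. */
    free(raw_packet);
    sketch_live_buffers--;
    return 0;
}
```

Because every failure exit frees the buffer at the point of failure, there is no path on which an early error skips the cleanup.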

e5187ee3

i40e: rework exit flow of i40e_add_fdir_ethtool · 01016da1

Jacob Keller authored Feb 06, 2017

Refactor the exit flow of the i40e_add_fdir_ethtool function. Move the
input_label to the end of the function, removing the dependency on
having a non-zero return value. Add a comment explaining why it is ok
not to free the fdir data structure, because the structure is now stored
in the fdir_filter_list.

Change-Id: I723342181d59cd0c9f3b31140c37961ba37bb242
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

01016da1

i40e: don't use arrays for (src|dst)_ip · 8ce43dce

Jacob Keller authored Feb 06, 2017

The code originally included src_ip and dst_ip with enough space to
support ipv6 filters. However, no actual support for ipv6 filters has
been implemented. Thus, remove the arrays and just use __be32 values.
Should ipv6 support be added in the future, we can replace these with
a union that has sizes for both values.

Change-Id: I1bc04032244a80eb6ebc8a4e6c723a4a665c1dd5
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
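
As a rough illustration of the layout change and of the union mentioned for possible future ipv6 support: the type names below are hypothetical, and uint32_t stands in for the kernel's __be32.

```c
#include <stdint.h>

/* IPv4-only form: plain 32-bit values in network byte order,
 * with no array indexing needed. */
struct sketch_fdir_filter_v4 {
    uint32_t dst_ip;
    uint32_t src_ip;
};

/* One possible future shape (an assumption, not part of this patch):
 * a union sized for the larger address family. */
union sketch_ip_addr {
    uint32_t v4;
    uint32_t v6[4];
};
```

The union costs 16 bytes per address regardless of family, which is the trade-off the commit avoids while only ipv4 filters exist.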

8ce43dce

i40e: send correct port number to AdminQ when enabling UDP tunnels · fe0b0cd9

Jacob Keller authored Feb 06, 2017

The firmware expects the port numbers for offloaded UDP tunnels in
Little Endian format. We accidentally sent the value in Big Endian
format which obviously will cause the wrong port number to be put into
the UDP tunnels list. This results in VxLAN and Geneve tunnel Rx
offloads being essentially disabled, unless the port number happens to
be identical after byte swapping. Note that i40e_aq_add_udp_tunnel()
will byteswap the parameter from host order into Little Endian so we
don't need to worry about passing strictly a __le16 value to the command.

This patch essentially reverts b3f5c7bc ("i40e: Fix for extra byte
swap in tunnel setup", 2016-08-24), but in a way that makes the result
much more clear to the reader.

Fixes: b3f5c7bc ("i40e: Fix for extra byte swap in tunnel setup", 2016-08-24)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Williams, Mitch A <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
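
To see why a wrongly swapped port effectively disables the offload, consider what a 16-bit byte swap does to a typical tunnel port. The helpers below are a standalone sketch with hypothetical names; they do not use the kernel's byte-order macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t sketch_swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* True when a port reads the same in either byte order, which is the
 * only case where sending the wrong-endian value would still match. */
bool sketch_port_survives_swap(uint16_t port)
{
    return sketch_swab16(port) == port;
}
```

For the IANA-assigned VXLAN port 4789 (0x12B5), the swapped value is 0xB512, i.e. 46354, so the hardware would watch the wrong port. A port such as 257 (0x0101) is unchanged by the swap.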

fe0b0cd9

i40evf: use new api ethtool_{get|set}_link_ksettings · 48ce8802

Philippe Reynes authored Feb 04, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

48ce8802

18 Mar, 2017 1 commit

i40e: use new api ethtool_{get|set}_link_ksettings · a7f90940

Philippe Reynes authored Feb 04, 2017

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

a7f90940

17 Mar, 2017 25 commits

liquidio: fix wrong information about link modes reported to ethtool · fe723dff

Manish Awasthi authored Mar 16, 2017

Information reported to ethtool about link modes is wrong for 25G NIC.  Fix
it by checking for presence of 25G NIC, checking the link speed reported by
NIC firmware, and then assigning proper values to the
ethtool_link_ksettings struct.
Signed-off-by: Manish Awasthi <manish.awasthi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe723dff

Merge branch 'netvsc-small-changes' · 513d2d01

David S. Miller authored Mar 16, 2017

Stephen Hemminger says:

====================
netvsc: small changes for net-next

One bugfix, and two non-code patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

513d2d01

netvsc: remove unused #define · 76f5ed88

stephen hemminger authored Mar 16, 2017

Not used anywhere.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76f5ed88

netvsc: add comments about callback's and NAPI · 262b7f14

stephen hemminger authored Mar 16, 2017

Add a short description of how callbacks and NAPI interoperate.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

262b7f14

netvsc: avoid race with callback · 6de38af6

stephen hemminger authored Mar 16, 2017

Change the argument to channel callback from the channel pointer
to the internal data structure containing per-channel info.
This avoids any possible races when callback happens during
initialization and makes IRQ code simpler.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6de38af6

Merge branch 'bpf-inline-lookups' · 3a70418b

David S. Miller authored Mar 16, 2017

Alexei Starovoitov says:

====================
bpf: inline bpf_map_lookup_elem()

bpf_map_lookup_elem() is one of the most frequently used helper functions.
Improve JITed program performance by inlining this helper.

bpf_map_type	before  after
hash		58M	74M
array		174M	280M

The values are number of lookups per second in ideal conditions
measured by micro-benchmark in patch 6.

The 'perf report' for HASH map type:
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

so the cost of htab_map_lookup_elem() and bpf_map_lookup_elem()
is gone after inlining.

'per-cpu' and 'lru' map types can be optimized similarly in the future.

Note that sparse will complain that bpf is addictive ;)
kernel/bpf/hashtab.c:438:19: sparse: subtraction of functions? Share your drugs
kernel/bpf/verifier.c:3342:38: sparse: subtraction of functions? Share your drugs
it's not a new warning, just in new places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3a70418b

samples/bpf: add map_lookup microbenchmark · 95ff141e

Alexei Starovoitov authored Mar 15, 2017

$ map_perf_test 128
speed of HASH bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	46M		58M
after	42M		74M

perf report
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
functions, yet they take a sizeable amount of cpu time.
htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
htab_map_lookup_elem() into three BPF insns, which causes the cpu time
for bpf_prog_da4fc6a3f41761a2() to increase slightly.

$ map_perf_test 256
speed of ARRAY bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	97M		174M
after	64M		280M

before:
    37.33%  map_perf_test  [kernel.kallsyms]  [k] array_map_lookup_elem
    13.95%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     6.54%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     4.57%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    32.86%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     6.54%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

array_map_gen_lookup() removes calls to array_map_lookup_elem()
and bpf_map_lookup_elem() and replaces them with 7 bpf insns.

The performance without JIT is slower, since executing extra insns
in the interpreter is slower than running native C code,
but with JIT the performance gains are obvious,
since native C->x86 code is replaced with fewer bpf->x86 instructions.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

95ff141e

bpf: inline htab_map_lookup_elem() · 9015d2f5

Alexei Starovoitov authored Mar 15, 2017

Optimize:
bpf_call
  bpf_map_lookup_elem
    map->ops->map_lookup_elem
      htab_map_lookup_elem
        __htab_map_lookup_elem
into:
bpf_call
  __htab_map_lookup_elem

to improve performance of JITed programs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9015d2f5

bpf: add helper inlining infra and optimize map_array lookup · 81ed18ab

Alexei Starovoitov authored Mar 15, 2017

Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem()
into a sequence of bpf instructions.
When JIT is on, the sequence of bpf instructions becomes a sequence
of native cpu instructions, which is significantly faster
than an indirect call plus two functions' prologues/epilogues.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
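
In plain C, the logic that the inlined instruction sequence has to reproduce for an array map is roughly a bounds check followed by pointer arithmetic. The sketch below is only an approximation under that assumption; the struct and function names are hypothetical and do not match the kernel's bpf_array layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an array map. */
struct sketch_array_map {
    uint32_t max_entries;
    uint32_t elem_size;
    unsigned char *value;   /* max_entries * elem_size bytes */
};

/* Return a pointer to the element for *key, or NULL when the index
 * is out of range. No function call is needed once this is inlined. */
void *sketch_array_lookup(const struct sketch_array_map *map,
                          const uint32_t *key)
{
    uint32_t index = *key;

    if (index >= map->max_entries)
        return NULL;
    return map->value + (size_t)index * map->elem_size;
}
```

Since max_entries and elem_size are fixed for the lifetime of a map, an inlined version can treat them as constants, which is part of why a short instruction sequence can replace the call chain.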

81ed18ab

bpf: adjust insn_aux_data when patching insns · 8041902d

Alexei Starovoitov authored Mar 15, 2017

convert_ctx_accesses() replaces single bpf instruction with a set of
instructions. Adjust corresponding insn_aux_data while patching.
It's needed to make sure subsequent 'for(all insn)' loops
have matching insn and insn_aux_data.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

8041902d

bpf: refactor fixup_bpf_calls() · 79741b3b

Alexei Starovoitov authored Mar 15, 2017

reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

79741b3b

bpf: move fixup_bpf_calls() function · e245c5c6

Alexei Starovoitov authored Mar 15, 2017

no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e245c5c6

tcp: remove tcp_tw_recycle · 4396e461

Soheil Hassas Yeganeh authored Mar 15, 2017

The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), the tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine are not monotonically increasing
anymore.

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Lutz Vieweg <lvml@5t9.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4396e461

tcp: remove per-destination timestamp cache · d82bae12

Soheil Hassas Yeganeh authored Mar 15, 2017

Commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection)
randomizes TCP timestamps per connection. After this commit,
there is no guarantee that the timestamps received from the
same destination are monotonically increasing. As a result,
the per-destination timestamp cache in TCP metrics (i.e., tcpm_ts
in struct tcp_metrics_block) is broken and cannot be relied upon.

Remove the per-destination timestamp cache and all related code
paths.

Note that this cache was already broken for caching timestamps of
multiple machines behind a NAT sharing the same address.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Lutz Vieweg <lvml@5t9.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d82bae12

Merge branch 'sunvnet-better-connection-management' · 8b705f52

David S. Miller authored Mar 16, 2017

Shannon Nelson says:

====================
sunvnet: better connection management

These patches remove some problems in handling of carrier state
with the ldmvsw vswitch, remove an xoff misuse in sunvnet, and
add stats for debug and tracking of point-to-point connections
between the ldom VMs.

v2:
 - added ldmvsw ndo_open to reset the LDC channel
 - updated copyrights
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8b705f52

sunvnet: xoff not needed when removing port link · 9c5a3a1f

Shannon Nelson authored Mar 14, 2017

The sunvnet netdev is connected to the controlling ldom's vswitch
for network bridging.  However, for higher performance between ldoms,
there also is a channel between each client ldom.  These connections are
represented in the sunvnet driver by a queue for each ldom.  The driver
uses select_queue to tell the stack which queue to use by tracking the mac
addresses on the other end of each port.  When a connected ldom shuts down,
the driver receives an LDC_EVENT_RESET and the port is removed from the
driver, thus a queue with no ldom on the other end will never be selected
for Tx.

The driver was trying to reinforce the "don't use this queue" notion with
netif_tx_stop_queue() and netif_tx_wake_queue(), which really should only
be used to signal a Tx queue is full (aka XOFF).  This misuse of queue
state resulted in NETDEV WATCHDOG messages and lots of unnecessary calls
into the driver's tx_timeout handler.  Simply removing these takes care
of the problem.

Orabug: 25190537
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c5a3a1f

sunvnet: count multicast packets · b12a96f5

Shannon Nelson authored Mar 14, 2017

Make sure multicast packets get counted in the device.

Orabug: 25190537
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b12a96f5

sunvnet: track port queues correctly · e1f1e5f7

Shannon Nelson authored Mar 14, 2017

Track our used and unused queue indices correctly.  Otherwise, as ports
dropped out and returned, they all eventually ended up with the same
queue index.

Orabug: 25190537
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1f1e5f7

sunvnet: add stats to track ldom to ldom packets and bytes · 0f512c84

Shannon Nelson authored Mar 14, 2017

In this driver, there is a "port" created for the connection to each of
the other ldoms; a netdev queue is mapped to each port, and they are
collected under a single netdev.  The generic netdev statistics show
us all the traffic in and out of our network device, but don't show
individual queue/port stats.  This patch breaks out the traffic counts
for the individual ports and gives us a little view into the state of
those connections.

Orabug: 25190537
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f512c84

ldmvsw: better use of link up and down on ldom vswitch · 867fa150

Shannon Nelson authored Mar 14, 2017

When an ldom VM is bound, the network vswitch infrastructure is set up for
it, but was being forced 'UP' by the userland switch configuration script.
When 'UP' but not actually connected to a running VM, the ipv6 neighbor
probes fail (not a horrible thing) and start cluttering up the kernel logs.
Funny thing: these are debug messages that never actually show up, but
we do see the net_ratelimited messages that say N callbacks were
suppressed.

This patch defers the netif_carrier_on() until an actual link has been
established with the VM, as indicated by receiving an LDC_EVENT_UP from
the underlying LDC protocol.  Similarly, we take the link down when we
see the LDC_EVENT_RESET.  Now when we see the ndo_open(), we reset the
link to get things talking again.

Orabug: 25525312
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

867fa150

bonding: add 802.3ad support for 25G speeds · 19ddde1e

Jarod Wilson authored Mar 14, 2017

Cut-n-paste enablement of 802.3ad bonding on 25G NICs, which currently
report 0 as their bandwidth.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

19ddde1e

tcp_westwood: fix tcp_westwood_info() style mistakes · be7164cd

chun Long authored Mar 14, 2017

Replace commas with semicolons in tcp_westwood_info().
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be7164cd

liquidio: use meaningful names for IRQs · 0c88a761

Rick Farrington authored Mar 13, 2017

All IRQs owned by the PF and VF drivers share the same nondescript name
"octeon"; this makes it difficult to setup interrupt affinity.

Change the IRQ names to reflect their specific purpose:

    LiquidIO<id>-<func>-<type>-<queue pair num>

Examples:
    LiquidIO0-pf0-rxtx-3
    LiquidIO1-vf1-rxtx-0
    LiquidIO0-pf0-aux

We cannot use netdev->name for naming the IRQs because:

    1.  Early during init, the PF and VF drivers require interrupts to
        send/receive control data from the NIC firmware; so the PF and VF
        must request IRQs long before the netdev struct is registered.

    2.  The IRQ name can only be specified at the time it is requested.
        It cannot be changed after that.
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
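
The naming scheme above can be produced with a bounded format call at request time. The helpers below are a hypothetical sketch, not the driver's code; the parameter names are assumptions chosen to match the fields in the format string.

```c
#include <stdio.h>

/* Build "LiquidIO<id>-<func>-rxtx-<queue pair num>", where func_kind
 * is "pf" or "vf". Returns the snprintf() result. */
int sketch_irq_name_rxtx(char *buf, size_t len, int id,
                         const char *func_kind, int func_num,
                         int queue_pair)
{
    return snprintf(buf, len, "LiquidIO%d-%s%d-rxtx-%d",
                    id, func_kind, func_num, queue_pair);
}

/* Build "LiquidIO<id>-<func>-aux" for the non-queue interrupt. */
int sketch_irq_name_aux(char *buf, size_t len, int id,
                        const char *func_kind, int func_num)
{
    return snprintf(buf, len, "LiquidIO%d-%s%d-aux",
                    id, func_kind, func_num);
}
```

Because the name is fixed when the IRQ is requested, the buffer holding it has to outlive the request, for example by living in the per-device structure.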

0c88a761

liquidio: remove/replace invalid code · b229487b

Rick Farrington authored Mar 13, 2017

Remove invalid call to dma_sync_single_for_cpu() because previous DMA
allocation was coherent--not streaming. Remove code that references fields
in struct list_head; replace it with calls to list_empty() and
list_first_entry(). Also, add comment to clarify complicated if statement.
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b229487b

netem: apply correct delay when rate throttling · 5080f39e

Nik Unger authored Mar 13, 2017

I recently reported on the netem list that iperf network benchmarks
show unexpected results when a bandwidth throttling rate has been
configured for netem. Specifically:

1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies
significantly across machines

The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no
delay is configured (e.g., "rate 512kbit"). This is because SKBs are
not orphaned when no delay is configured, so orphaning does not
occur until *after* the rate-induced delay has been applied. For
this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly
applied, allowing SKB delays to occur in parallel. The intended
approach is to compute the delay for an SKB and to add this delay to
the end of the current queue. However, the code does not detect
existing SKBs in the queue due to improperly testing sch->q.qlen,
which is nonzero even when packets exist only in the
rbtree. Consequently, new SKBs do not wait for the current queue to
empty. When packet delays vary significantly (e.g., if packet sizes
are different), then this also causes unintended reordering.

I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q. I have tested these changes on the latest kernel
(4.11.0-rc1+) and the iperf / ping test results are as expected.
Signed-off-by: Nik Unger <njunger@uwaterloo.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
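
The intended serialization of rate-induced delays can be sketched as follows. This is a simplified model with hypothetical names: it ignores any configured delay or jitter and only shows that each packet's transmission time is added after the last already-scheduled delivery time, so packets do not overlap.

```c
#include <stdint.h>

/* Nanoseconds needed to send len_bytes at rate_bytes_per_sec.
 * A rate of 0 is treated here as "no rate limit". */
uint64_t sketch_tx_time_ns(uint32_t len_bytes, uint64_t rate_bytes_per_sec)
{
    if (rate_bytes_per_sec == 0)
        return 0;
    return (uint64_t)len_bytes * 1000000000ull / rate_bytes_per_sec;
}

/* Delivery time for a new packet: it starts after the last scheduled
 * packet (or now, if the queue has already drained) and then takes
 * its own transmission time. */
uint64_t sketch_schedule_ns(uint64_t now_ns, uint64_t last_scheduled_ns,
                            uint32_t len_bytes,
                            uint64_t rate_bytes_per_sec)
{
    uint64_t start = last_scheduled_ns > now_ns ? last_scheduled_ns
                                                : now_ns;

    return start + sketch_tx_time_ns(len_bytes, rate_bytes_per_sec);
}
```

At "rate 512kbit" (64000 bytes per second), a 1500-byte packet takes 23437500 ns to send. Two such packets queued at time 0 should therefore be delivered at 23437500 ns and 46875000 ns, rather than both at 23437500 ns as happens when the delays run in parallel.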

5080f39e

16 Mar, 2017 3 commits

Merge branch 'sched-cleanups' · cd918afd

David S. Miller authored Mar 16, 2017

Or Gerlitz says:

====================
small set of sched cleanups

Just two cleanups -- but for the 2nd one I think we need ack from
Cong Wang to make sure this isn't actually a bug report..

changes from V1:
  - addressed comment from Sergei to use 12 hex digits etc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cd918afd

net/sched: fq_codel: Avoid set-but-unused variable · a5e6a3b0

Or Gerlitz authored Mar 16, 2017

The code introduced by commit 2ccccf5f ("net_sched: update
hierarchical backlog too") only sets prev_backlog in fq_codel_dequeue()
but does not use it anywhere; remove that assignment.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5e6a3b0

net/sched: act_ife: Staticfy find_decode_metaid() · 4dba87b0

Or Gerlitz authored Mar 16, 2017

As it's used only in that file.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4dba87b0