Commits · 3499abb249bb5ed9d21031944bc3059ec4aa2909 · Kirill Smelkov / linux

07 Aug, 2015 40 commits

netfilter: nfacct: per network namespace support · 3499abb2

Andreas Schultz authored Aug 05, 2015

- Move the nfnl_acct_list into the network namespace, initialize
  and destroy it per namespace
- Keep track of refcnt on nfacct objects, the old logic does not
  longer work with a per namespace list
- Adjust xt_nfacct to pass the namespace when registring objects
Signed-off-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3499abb2

netfilter: nft_limit: add per-byte limiting · d2168e84

Pablo Neira Ayuso authored Aug 05, 2015

This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type of
limiting.

Contrary to per-packet limiting, the cost is calculated from the packet path
since this depends on the packet length.

The burst attribute indicates the number of bytes in which the rate can be
exceeded.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d2168e84

netfilter: nft_limit: constant token cost per packet · 8bdf3626

Pablo Neira Ayuso authored Aug 02, 2015

The cost per packet can be calculated from the control plane path since this
doesn't ever change.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8bdf3626

netfilter: nft_limit: add burst parameter · 3e87baaf

Pablo Neira Ayuso authored Aug 02, 2015

This patch adds the burst parameter. This burst indicates the number of packets
that can exceed the limit.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3e87baaf

netfilter: nft_limit: factor out shared code with per-byte limiting · f8d3a6bc
Pablo Neira Ayuso authored Aug 02, 2015
```
This patch prepares the introduction of per-byte limiting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
f8d3a6bc

netfilter: nft_limit: convert to token-based limiting at nanosecond granularity · dba27ec1

Pablo Neira Ayuso authored Jul 31, 2015

Rework the limit expression to use a token-based limiting approach that refills
the bucket gradually. The tokens are calculated at nanosecond granularity
instead jiffies to improve precision.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

dba27ec1

netfilter: nft_limit: rename to nft_limit_pkts · 09e4e42a

Pablo Neira Ayuso authored Jul 31, 2015

To prepare introduction of bytes ratelimit support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

09e4e42a

netfilter: nf_tables: add nft_dup expression · d877f071

Pablo Neira Ayuso authored May 31, 2015

This new expression uses the nf_dup engine to clone packets to a given gateway.
Unlike xt_TEE, we use an index to indicate output interface which should be
fine at this stage.

Moreover, change to the preemtion-safe this_cpu_read(nf_skb_duplicated) from
nf_dup_ipv{4,6} to silence a lockdep splat.

Based on the original tee expression from Arturo Borrero Gonzalez, although
this patch has diverted quite a bit from this initial effort due to the
change to support maps.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d877f071

netfilter: factor out packet duplication for IPv4/IPv6 · bbde9fc1

Pablo Neira Ayuso authored May 31, 2015

Extracted from the xtables TEE target. This creates two new modules for IPv4
and IPv6 that are shared between the TEE target and the new nf_tables dup
expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bbde9fc1

netfilter: xt_TEE: get rid of WITH_CONNTRACK definition · 24b7811f
Pablo Neira Ayuso authored Jul 01, 2015
```
Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
24b7811f

netfilter: nft_counter: convert it to use per-cpu counters · 0c45e769

Pablo Neira Ayuso authored Jun 08, 2015

This patch converts the existing seqlock to per-cpu counters.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0c45e769

net_dbg_ratelimited: turn into no-op when !DEBUG · d92cff89

Jason A. Donenfeld authored Aug 04, 2015

The pr_debug family of functions turns into a no-op when -DDEBUG is not
specified, opting instead to call "no_printk", which gets compiled to a
no-op (but retains gcc's nice warnings about printf-style arguments).

The problem with net_dbg_ratelimited is that it is defined to be a
variant of net_ratelimited_function, which expands to essentially:

    if (net_ratelimit())
        pr_debug(fmt, ...);

When DEBUG is not defined, then this becomes,

    if (net_ratelimit())
        ;

This seems benign, except it isn't. Firstly, there's the obvious
overhead of calling net_ratelimit needlessly, which does quite some book
keeping for the rate limiting. Given that the pr_debug and
net_dbg_ratelimited family of functions are sprinkled liberally through
performance critical code, with developers assuming they'll be compiled
out to a no-op most of the time, we certainly do not want this needless
book keeping. Secondly, and most visibly, even though no debug message
is printed when DEBUG is not defined, if there is a flood of
invocations, dmesg winds up peppered with messages such as
"net_ratelimit: 320 callbacks suppressed". This is because our
aforementioned net_ratelimit() function actually prints this text in
some circumstances. It's especially odd to see this when there isn't any
other accompanying debug message.

So, in sum, it doesn't make sense to have this function's current
behavior, and instead it should match what every other debug family of
functions in the kernel does with !DEBUG -- nothing.

This patch replaces calls to net_dbg_ratelimited when !DEBUG with
no_printk, keeping with the idiom of all the other debug print helpers.

Also, though not strictly neccessary, it guards the call with an if (0)
so that all evaluation of any arguments are sure to be compiled out.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d92cff89

af_mpls: add null dev check in find_outdev · 3dcb615e

Roopa Prabhu authored Aug 04, 2015

This patch adds null dev check for the 'cfg->rc_via_table ==
NEIGH_LINK_TABLE or dev_get_by_index() failed' case
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3dcb615e

Merge branch 'test-bpf-next' · 3c621818

David S. Miller authored Aug 06, 2015

Nicolas Schichan says:

====================
test_bpf improvements

Please find below the patch series with my latest changes to test_bpf.

The first patch checks for unexpected NULL generated skbs before
running the filter.

The second patch adds the possibility for tests to generate fragmented
skbs.

The third patch tests LD_ABS and LD_IND on fragmented skbs.

The fourth patch adds the possibility to restrict the tests being run
by specifying the name/id/range of the test(s) to run via module
parameters.

The fifth patch tests LD_ABS and LD_IND on non fragmented skbs with
various sizes and alignments.

The sixth and final patch checks that the interpreter or JIT correctly
resets A and X to 0.

This serie is against today's net-next tree.

Changes in V2:

* move declaration of 'ptr' in if() block in patch 2/6.

* fix various typos in patch 4/6

* rework default init of test_range array and cleanup exclude_test()
  return condition in patch 4/6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3c621818

test_bpf: add tests checking that JIT/interpreter sets A and X to 0. · 86bf1721

Nicolas Schichan authored Aug 04, 2015

It is mandatory for the JIT or interpreter to reset the A and X
registers to 0 before running the filter. Check that it is the case on
various ALU and JMP instructions.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

86bf1721

test_bpf: add more tests for LD_ABS and LD_IND. · 08fcb08f

Nicolas Schichan authored Aug 04, 2015

This exerces the LD_ABS and LD_IND instructions for various sizes and
alignments. This also checks that X when used as an offset to a
BPF_IND instruction first in a filter is correctly set to 0.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

08fcb08f

test_bpf: add module parameters to filter the tests to run. · d2648d4e

Nicolas Schichan authored Aug 04, 2015

When developping on the interpreter or a particular JIT, it can be
interesting to restrict the tests list to a specific test or a
particular range of tests.

This patch adds the following module parameters to the test_bpf module:

* test_name=<string>: only the specified named test will be run.

* test_id=<number>: only the test with the specified id will be run
  (see the output of test_bpf without parameters to get the test id).

* test_range=<number>,<number>: only the tests within IDs in the
  specified id range are run (see the output of test_bpf without
  parameters to get the test ids).

Any invalid range, test id or test name will result in -EINVAL being
returned and no tests being run.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2648d4e

test_bpf: test LD_ABS and LD_IND instructions on fragmented skbs. · 2cf1ad75

Nicolas Schichan authored Aug 04, 2015

These new tests exercise various load sizes and offsets crossing the
head/fragment boundary.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2cf1ad75

test_bpf: allow tests to specify an skb fragment. · bac142ac

Nicolas Schichan authored Aug 04, 2015

This introduce a new test->aux flag (FLAG_SKB_FRAG) to tell the
populate_skb() function to add a fragment to the test skb containing
the data specified in test->frag_data).
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bac142ac

test_bpf: avoid oopsing the kernel when generate_test_data() fails. · e34684f8

Nicolas Schichan authored Aug 04, 2015

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e34684f8

Merge branch 'mlx5e-next' · c71b5ad0

David S. Miller authored Aug 06, 2015

Amir Vadai says:

====================
net/mlx5e: Driver updates 04-Aug-2015

This patchset introduces two features to the ConnectX-4 driver: Patch 8/8
("Support physical port counters") exposes some hardware counters through
ethtool. Rest of the patches are preparation and usage of what we call
light-weight netdev open/close. Some flows that used to be in the ndo_open/stop
are moved to the PCI probe/remove flows - i.e. we will make the netdev
open/close operations more "light-weight".

The benefits of this change are:
1) Reduce the execution time of the stop/open operations.
2) Avoid saving SW shadows of resource configurations that must
   persist through stop/open operations (e.g flow table steering
   rules), and avoid deleting/applying them from/to the device upon
   netdev stop/open.
3) Avoid synchronizing threads that access those resources with the
   netdev stop/open threads.

Instead of create/destroy the resource during netdev open/stop, This patchset
changes the behavior such that upon netdev stop, traffic is redirected to a
"Drop RQ" (a RQ that silently drops, at the NIC HW level all incoming traffic).
After redirecting the traffic, RX/TX software resources could be destroyed.
During netdev open, the RX/TX rings are created and traffic is redirected to
the RX rings.

Patchset was applied and tested over commit ba7591d8 ("ebpf: add skb->hash to
offset map for usage in {cls, act}_bpf or filters")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c71b5ad0

net/mlx5_core: Support physical port counters · efea389d

Gal Pressman authored Aug 04, 2015

Added physical port counters in the following standard formats to
ethtool statistics:
  - IEEE 802.3
  - RFC2863
  - RFC2819
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efea389d

net/mlx5e: Take advantage of the light-weight netdev open/stop · 9b37b07f

Achiad Shochat authored Aug 04, 2015

Now that TIRs, TISs and flow tables are kept alive while the netdev is
stopped (after executing ndo_stop()) we can do the following
improvements:

- Obsolete the active_vlans SW shadow.
- Do not delete/add flow table rules upon ndo_stop/open.
  In addition to simplifying the flow, this change also fastens
  the ndo_open/close operations.
- Obsolete synchronization of threads accessing the flow tables
  with the netdev stop/open threads.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b37b07f

net/mlx5e: Disable async events before unregister_netdev() · 1cefa326

Achiad Shochat authored Aug 04, 2015

It does not make sense to allow events while the netdev is
unregistered.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1cefa326

net/mlx5e: Rename/move functions following the ndo_stop flow change · 40ab6a6e

Achiad Shochat authored Aug 04, 2015

Rename some functions that used to be invoked upon ndo_open/stop and
are now invoked upon create/destroy_netdev() in order to better hint
their place in the flow.

Change some functions location in the file so that functions involved
in ndo_open/stop flow will not be interleaved with other functions.

This is a cosmetic change, no logical change here.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40ab6a6e

net/mlx5e: Light-weight netdev open/stop · 5c50368f

Achiad Shochat authored Aug 04, 2015

Create/destroy TIRs, TISs and flow tables upon PCI probe/remove rather
than upon the netdev ndo_open/stop.

Upon ndo_stop(), redirect all RX traffic to the (lately introduced)
"Drop RQ" and then close only the RX/TX rings, leaving the TIRs,
TISs and flow tables alive.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c50368f

net/mlx5_core: Introduce access function to modify RSS/LRO params · d9eea403

Achiad Shochat authored Aug 04, 2015

To be used by the mlx5 Eth driver in following commit.

This is in preparation for netdev "light-weight" open/stop flow
change described in previous commit.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9eea403

net/mlx5e: Introduce the "Drop RQ" · 50cfa25a

Achiad Shochat authored Aug 04, 2015

RX traffic routed to this RQ will be silently dropped, at the NIC HW
level.

This is in preparation for netdev "light-weight" open/stop flow
change described in previous commit.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50cfa25a

net/mlx5e: Unify the RX flow · 4cbeaff5

Achiad Shochat authored Aug 04, 2015

Generally an RX packet flows through the following objects:
Flow table --> TIR --> RQT --> RQ

Where:
- TIR stands for "Transport Interface Receive", defining the RSS and
  LRO paramaters.
- RQT stands for "RQ Table", implementing the RSS indirection table.
- RQ stands for "Receive Queue"

For flows that do not need LRO, nor RSS, the driver made a shortcut to
the above RX flow by pointing to the RQ directly from the TIR, yielding
this flow:
Flow table --> TIR --> RQ

In this commit we remove this shortcut by "inserting" a single-RQ RQT
between the TIR and the RQ, i.e RX packets will reach the same RQ but
will go through an RQT of size 1, pointing to just a single RQ.

This way the RX traffic re-direction to/from the "Drop RQ" will be more
uniform (AKA "one flow"), as it will involve only RQTs re-direction and
no TIRs re-direction.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4cbeaff5

Merge branch 'cpsw-next' · adc4cc99

David S. Miller authored Aug 06, 2015

Mugunthan V N says:

====================
CPSW interrupt handling cleanup and performance improvement

This patch series removes the irq controller disable interrupt and
adding a napi for tx event handling which improves the performance by
~180Mbps on dra7-evm

[  5] local 192.168.10.116 port 5001 connected with 192.168.10.165 port 44176
[  5]  0.0-60.0 sec  1.48 GBytes   210 Mbits/sec
[  4] local 192.168.10.116 port 5001 connected with 192.168.10.165 port 33257
[  4]  0.0-60.0 sec  2.71 GBytes   386 Mbits/sec

Changes from initial version:
* Added a patch to have napi only for first interface as there is
  no use of having seperate napis for each interface as the
  interrupt is shared by both interface and only one napi is
  scheduled for each interrupt.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

adc4cc99

drivers: net: cpsw: add separate napi for tx · 32a7432c

Mugunthan V N authored Aug 04, 2015

Instead of processing tx events in isr adding separate napi for
tx which improves performance by ~180Mbps with
omap2plus_defconfig on DRA74x platform. Also cleaning up rx napis
by renaming to napi_rx for better understanding the code.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32a7432c

drivers: net: cpsw: dual_emac: simplify napi usage · d354eb85

Mugunthan V N authored Aug 04, 2015

Since interrupt is shared between the two ethernet interface and
in isr only one napi is scheduled at an instance so having two
napis doesn't make any difference. So making napi also as a
common resource for the dual ethernet interfaces.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d354eb85

drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself · 870915fe

Mugunthan V N authored Aug 04, 2015

CPSW interrupts can be disabled by masking CPSW interrupts and
clearing interrupt by writing appropriate EOI. So removing all
disable_irq/enable_irq as discussed in [1]

[1] http://patchwork.ozlabs.org/patch/492741/Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

870915fe

mpls: small cleanup in inet/inet6_fib_lookup_dev() · 5a9348b5

Dan Carpenter authored Aug 04, 2015

We recently changed this code from returning NULL to returning ERR_PTR.
There are some left over NULL assignments which we can remove.  We can
preserve the error code from ip_route_output() instead of always
returning -ENODEV.  Also these functions use a mix of gotos and direct
returns.  There is no cleanup necessary so I changed the gotos to
direct returns.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a9348b5

Merge branch 'bnx2x-cnic-bnx2fc-bd-support' · 02b52428

David S. Miller authored Aug 06, 2015

Yuval Mintz says:

====================
bnx2x, cnic, bnx2fc: add support for BD

Commit 230d00eb ("bnx2x: new Multi-function mode - BD") added support
for a new multi-function mode, but it added only the support required by
bnx2x for L2 interfaces.

This adds the required changes to support the new multi-function mode in
the offloaded storage protocols.

Dave,

Please consider applying this series to `net-next'.

Do notice that this involves non-networking driver changes -
but sending this as a single series seemed like the best approach as
we had to have bnx2x changes to support the new functionality.
If this is problematic, please tell us what's the preferred solution here.

Changes from previous versions
------------------------------

 - From v1 - no actual changes; v1 failed to reach netdev so in order to
   keep things in line I've termed this one v2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

02b52428

bnx2fc: Read npiv table from nvram and create vports. · 2971ff67

Joe Carnuccio authored Aug 04, 2015

Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2971ff67

bnx2x: Add BD support for storage · 97ac4ef7

Yuval Mintz authored Aug 04, 2015

Commit 230d00eb ("bnx2x: new Multi-function mode - BD") adds support
for the new mode in bnx2x. This expands this support by implementing
APIs required by our storage drivers to support that mode.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

97ac4ef7

cnic: Add the interfaces to get FC-NPIV table. · 9b8d5044

Adheer Chandravanshi authored Aug 04, 2015

Signed-off-by: Adheer Chandravanshi <adheer.chandravanshi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b8d5044

cnic: Populate upper layer driver state in MFW · eddb7554

Tej Parkash authored Aug 04, 2015

Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eddb7554

rocker: use netdev_err after register_netdev · ff147028

Scott Feldman authored Aug 03, 2015

After successful register_netdev, we can use netdev_err rather the more
generic dev_err.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff147028