Commits · f330a7fdbe1611104622faff7e614a246a7d20f0 · nexedi / linux

30 Aug, 2016 3 commits

netfilter: conntrack: get rid of conntrack timer · f330a7fd

Florian Westphal authored Aug 25, 2016

With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.

Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9, 'timers: Switch to
a non-cascading wheel').

Remove the timer and use a 32-bit jiffies value that holds the timestamp
until which the entry is valid.

During conntrack lookup, even before doing tuple comparison, check
the timeout value and evict the entry in case it is too old.
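
The expiry test described above can be sketched in plain C. This is a minimal illustration, not the kernel code: `struct ct_entry`, `ct_entry_is_expired()` and `ct_entry_set_timeout()` are hypothetical names, and the caller passes the current 32-bit tick count in place of jiffies. The point it shows is that comparing through a signed difference keeps the check correct when the counter wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a conntrack entry: only the field we need. */
struct ct_entry {
	uint32_t timeout;	/* absolute expiry time in 32-bit ticks */
};

/*
 * Wraparound-safe expiry test: subtract in unsigned arithmetic, then
 * interpret the difference as signed. The entry is expired once the
 * difference is zero or negative.
 */
static bool ct_entry_is_expired(const struct ct_entry *e, uint32_t now)
{
	return (int32_t)(e->timeout - now) <= 0;
}

/* Arm the entry so that it expires 'ticks' after 'now'. */
static void ct_entry_set_timeout(struct ct_entry *e, uint32_t now,
				 uint32_t ticks)
{
	e->timeout = now + ticks;	/* may wrap, which is fine */
}
```

The signed comparison only works while timeouts stay below 2^31 ticks in the future, which is the usual constraint for this kind of arithmetic.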

The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.

Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict an already-dead entry that
is being recycled.

This is the standard/expected way when conntrack entries are destroyed.
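
A rough userspace sketch of the eviction handshake, using C11 atomics instead of the kernel's bit operations and refcount helpers. All names (`struct evict_entry`, `ENTRY_DYING`, `entry_try_evict()`) are made up for illustration. It shows the two ideas from the text: take a reference only if the entry is still alive, then let an atomic test-and-set of the dying flag pick exactly one winner.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define ENTRY_DYING (1u << 0)	/* hypothetical flag value */

struct evict_entry {
	atomic_uint status;
	atomic_int refcnt;
};

/* Take a reference only if the entry is still alive (refcnt > 0). */
static bool entry_get_unless_zero(struct evict_entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool entry_put(struct evict_entry *e)
{
	return atomic_fetch_sub(&e->refcnt, 1) == 1;
}

/* Returns true only for the single caller that wins the eviction. */
static bool entry_try_evict(struct evict_entry *e)
{
	bool won;

	if (!entry_get_unless_zero(e))
		return false;	/* already dead, possibly being recycled */

	/* Atomic test-and-set: only one caller sees the flag clear. */
	won = !(atomic_fetch_or(&e->status, ENTRY_DYING) & ENTRY_DYING);
	if (won)
		entry_put(e);	/* drop the reference owned by the table */
	entry_put(e);		/* drop our temporary reference */
	return won;
}
```

A real implementation would also unlink the entry and free it when the last reference goes away; the sketch leaves that out.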

Followup patches will introduce garbage collection via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps). This is needed to avoid expired conntracks hanging
around for too long when the lookup rate is low after a busy period.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f330a7fd

netfilter: don't rely on DYING bit to detect when destroy event was sent · 616b14b4

Florian Westphal authored Aug 25, 2016

The reliable event delivery mode currently (ab)uses the DYING bit to
detect which entries on the dying list have to be skipped when
re-delivering events from the ecache worker in reliable event mode.

Currently when we delete the conntrack from main table we only set this
bit if we could also deliver the netlink destroy event to userspace.

If we fail, we move it to the dying list; the ecache worker will
reattempt event delivery for all confirmed conntracks on the dying list
that do not have the DYING bit set.

Once the timer is gone, we can no longer use if (del_timer()) to detect
when we 'stole' the reference count owned by the timer/hash entry, so
we need some other way to avoid racing with another CPU.

Pablo suggested adding a marker in the ecache extension that skips
entries that have been unhashed from main table but are still waiting
for the last reference count to be dropped (e.g. because one skb waiting
on nfqueue verdict still holds a reference).

We do this by adding a tristate.
If we fail to deliver the destroy event, make a note of this in the
ecache extension.  The worker can then skip all entries that are in
a different state.  Either they never delivered a destroy event,
e.g. because the netlink backend was not loaded, or redelivery took
place already.
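
One way to picture the tristate, as a small standalone sketch. The enum and function names here are hypothetical and chosen for readability; they are not claimed to match the identifiers used in the patch.

```c
#include <stdbool.h>

/* Hypothetical tristate kept per entry in the event cache extension. */
enum destroy_state {
	DESTROY_UNKNOWN,	/* no destroy event attempted, e.g. no backend */
	DESTROY_FAILED,		/* delivery failed, the worker must retry */
	DESTROY_SENT,		/* event delivered, nothing left to do */
};

/* Record the outcome of a delivery attempt. */
static enum destroy_state record_delivery(bool delivered)
{
	return delivered ? DESTROY_SENT : DESTROY_FAILED;
}

/* The worker only retries entries whose delivery is known to have failed. */
static bool worker_should_redeliver(enum destroy_state s)
{
	return s == DESTROY_FAILED;
}
```

Keeping "never attempted" and "already sent" as separate values is what lets the worker skip both without consulting the DYING bit.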

Once the conntrack timer is removed we will now be able to replace
del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid
racing with another CPU that tries to evict the same conntrack.

Because DYING will then be set right before we report the destroy event
we can no longer skip event reporting when dying bit is set.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

616b14b4

netfilter: restart search if moved to other chain · 95a8d19f

Florian Westphal authored Aug 25, 2016

In case nf_conntrack_tuple_taken did not find a conflicting entry
check that all entries in this hash slot were tested and restart
in case an entry was moved to another chain.
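
The restart check can be illustrated with a toy list whose end marker encodes the bucket it belongs to. This is a single-threaded sketch under assumptions: `struct chain_node`, `NULLS_MARKER()` and `walk_chain()` are invented names, and the tagging scheme (low bit set means "end marker") only mimics the general idea of such lists.

```c
#include <stdbool.h>
#include <stdint.h>

struct chain_node {
	int key;
	uintptr_t next;	/* a struct chain_node * or a tagged end marker */
};

/* End marker: low bit set, bucket number stored in the upper bits. */
#define NULLS_MARKER(bucket)	(((uintptr_t)(bucket) << 1) | 1u)

static bool is_nulls(uintptr_t p)
{
	return (p & 1u) != 0;
}

static unsigned int nulls_value(uintptr_t p)
{
	return (unsigned int)(p >> 1);
}

enum walk_result { WALK_FOUND, WALK_NOT_FOUND, WALK_RESTART };

/* Walk one chain looking for 'key'; 'bucket' is the slot we started in. */
static enum walk_result walk_chain(uintptr_t first, unsigned int bucket,
				   int key)
{
	uintptr_t p = first;

	while (!is_nulls(p)) {
		const struct chain_node *n = (const struct chain_node *)p;

		if (n->key == key)
			return WALK_FOUND;
		p = n->next;
	}
	/* A foreign end marker means the walk drifted onto another chain. */
	return nulls_value(p) == bucket ? WALK_NOT_FOUND : WALK_RESTART;
}
```

A caller would loop while the result is `WALK_RESTART`, since a "not found" answer is only trustworthy when the walk ended on its own bucket's marker.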
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: ea781f19 ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

95a8d19f

26 Aug, 2016 3 commits

netfilter: nf_tables: Use nla_put_be32() to dump immediate parameters · 7073b16f

Pablo Neira Ayuso authored Aug 26, 2016

nft_dump_register() should only be used with registers, not with
immediates.

Fixes: cb1b69b0 ("netfilter: nf_tables: add hash expression")
Fixes: 91dbc6be ("netfilter: nf_tables: add number generator expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7073b16f

netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion · c016c7e4

Pablo Neira Ayuso authored Aug 24, 2016

If the NLM_F_EXCL flag is set, then new elements that clash with an
existing one return EEXIST. In case you try to add an element whose
data area differs from what we have, then this returns EBUSY. If no
flag is specified at all, then this returns success to userspace.

This patch also updates the set insert operation so we can fetch the
existing element that clashes with the one you want to add; we need
this to make sure the element data doesn't differ.
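
The three outcomes can be summarised in a small decision function. This is a sketch under assumptions: `FLAG_EXCL` is a stand-in for NLM_F_EXCL, `resolve_clash()` is a hypothetical helper, and checking the data mismatch before the flag is my reading of the description rather than something the text spells out.

```c
#include <errno.h>
#include <string.h>

#define FLAG_EXCL 0x1u	/* hypothetical stand-in for NLM_F_EXCL */

/*
 * Decide what to report when a new element clashes with an existing
 * one that has the same key. Both data areas are 'len' bytes long.
 */
static int resolve_clash(const void *old_data, const void *new_data,
			 size_t len, unsigned int flags)
{
	if (memcmp(old_data, new_data, len) != 0)
		return -EBUSY;	/* same key, different data */
	if (flags & FLAG_EXCL)
		return -EEXIST;	/* caller asked for exclusive creation */
	return 0;		/* identical element, treat as success */
}
```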
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c016c7e4

rhashtable: add rhashtable_lookup_get_insert_key() · 5ca8cc5b

Pablo Neira Ayuso authored Aug 24, 2016

This patch modifies __rhashtable_insert_fast() so it returns the
existing object that clashes with the one that you want to insert.
In case the object is successfully inserted, NULL is returned.
Otherwise, you get an error via ERR_PTR().

This patch adapts the existing callers of __rhashtable_insert_fast()
so they handle this new logic, and it adds a new
rhashtable_lookup_get_insert_key() interface to fetch this existing
object.

nf_tables needs this change to improve handling of EEXIST cases via
honoring the NLM_F_EXCL flag and by checking if the data part of the
mapping matches what we have.
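
To make the return convention concrete, here is a toy fixed-size table with the same "NULL on insert, existing object on clash" contract. It is not rhashtable: `struct toy_table` and `toy_lookup_get_insert()` are invented for this sketch, and errors are reported through an out-parameter instead of ERR_PTR() to keep the example self-contained.

```c
#include <errno.h>
#include <stddef.h>

#define TABLE_SLOTS 8u

struct toy_obj {
	unsigned int key;
	int value;
};

struct toy_table {
	struct toy_obj *slot[TABLE_SLOTS];
};

/*
 * Insert 'o' unless an object with the same key is already present.
 * Returns NULL if 'o' was inserted, or the clashing object otherwise.
 * On failure (table full) returns NULL and sets *err to -ENOSPC.
 */
static struct toy_obj *toy_lookup_get_insert(struct toy_table *t,
					     struct toy_obj *o, int *err)
{
	unsigned int i;

	*err = 0;
	for (i = 0; i < TABLE_SLOTS; i++) {
		/* Linear probing starting at the key's home slot. */
		struct toy_obj **s = &t->slot[(o->key + i) % TABLE_SLOTS];

		if (*s == NULL) {
			*s = o;
			return NULL;
		}
		if ((*s)->key == o->key)
			return *s;
	}
	*err = -ENOSPC;
	return NULL;
}
```

Returning the existing object is what lets a caller compare its data part against the new element before deciding between success, EEXIST and EBUSY.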

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

5ca8cc5b

23 Aug, 2016 3 commits

netfilter: nf_tables: reject hook configuration updates on existing chains · 6133740d

Pablo Neira Ayuso authored Aug 02, 2016

Currently, if you add a base chain whose name clashes with an existing
non-base chain, nf_tables doesn't complain about this. The same happens
if you update the chain type, the hook number or the priority.

With this patch, nf_tables bails out in case any of these unsupported
operations occurs by returning EBUSY.

 # nft add table x
 # nft add chain x y
 # nft add chain x y { type nat hook input priority 0\; }
 <cmdline>:1:1-49: Error: Could not process rule: Device or resource busy
 add chain x y { type nat hook input priority 0; }
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6133740d

netfilter: nf_tables: introduce nft_chain_parse_hook() · 508f8ccd

Pablo Neira Ayuso authored Aug 02, 2016

Introduce a new function to wrap the code that parses the chain hook
configuration so we can reuse this code to validate chain updates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

508f8ccd

netfilter: nf_tables: typo in trace attribute definition · b43f9569

Pablo Neira authored Aug 22, 2016

Should be attributes, instead of attibutes, for consistency with other
definitions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b43f9569

22 Aug, 2016 4 commits

netfilter: nft_hash: fix non static symbol warning · a5e57336

Wei Yongjun authored Aug 21, 2016

Fixes the following sparse warning:

net/netfilter/nft_hash.c:40:25: warning:
 symbol 'nft_hash_policy' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a5e57336

netfilter: fix spelling mistake: "delimitter" -> "delimiter" · 8d6c0eaa

Colin Ian King authored Aug 18, 2016

trivial fix to spelling mistake in pr_debug message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8d6c0eaa

netfilter: nf_tables: add number generator expression · 91dbc6be

Laura Garcia Liebana authored Aug 18, 2016

This patch adds the numgen expression that allows us to generate
incremental and random numbers. This generator is bound to an upper limit
that is specified by userspace.

This expression is useful to distribute packets in a round-robin fashion
as well as randomly.
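
A minimal sketch of the two modes, assuming a non-zero upper limit. The names are hypothetical, the incremental variant is not atomic, and `rand()` is only a placeholder random source (modulo bias is ignored), so this illustrates the behaviour rather than the patch's implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Incremental generator state; 'modulus' is the upper limit (non-zero). */
struct numgen_inc {
	uint32_t counter;
	uint32_t modulus;
};

/* Returns 0, 1, ..., modulus - 1 and then wraps: round-robin order. */
static uint32_t numgen_inc_next(struct numgen_inc *g)
{
	uint32_t v = g->counter;

	g->counter = (v + 1 == g->modulus) ? 0 : v + 1;
	return v;
}

/* Random mode: any value below the upper limit. */
static uint32_t numgen_random_next(uint32_t modulus)
{
	return (uint32_t)rand() % modulus;
}
```

Using the returned number as an index into a list of backends gives round-robin or random packet distribution respectively.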
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

91dbc6be

netfilter: nf_tables: add quota expression · 3d2f30a1

Pablo Neira Ayuso authored Aug 18, 2016

This patch adds the quota expression. This new stateful expression
integrates easily into the dynset expression to build 'hashquota' flow
tables.

Arguably, we could use "counter bytes > 1000" instead, but this
approach has several problems:

1) We only support one single stateful expression in dynamic set
   definitions, and the expression above is a composite of two
   expressions: get counter + comparison.

2) We would need to restore the packed counter representation (that we
   used to have) based on seqlock to synchronize this, since per-cpu is
   not suitable for this.

So instead of bloating the counter expression back with the seqlock
representation and extending the existing set infrastructure to make it
more complex for the composite described above, let's follow the more
simple approach of adding a quota expression that we can plug into our
existing infrastructure.
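
The idea of a single stateful quota object can be sketched with one atomic counter. This is an illustration with invented names (`struct byte_quota`, `quota_overrun()`); it counts consumed bytes upward and compares against a limit, which is one possible design and not necessarily the one the patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct byte_quota {
	uint64_t limit;			/* allowed bytes */
	atomic_uint_fast64_t consumed;	/* bytes accounted so far */
};

/*
 * Account 'pkt_len' bytes and report whether the quota is now exceeded.
 * Accounting and comparison happen on the result of one atomic add, so
 * no separate "read counter, then compare" step is needed.
 */
static bool quota_overrun(struct byte_quota *q, uint64_t pkt_len)
{
	uint64_t before = atomic_fetch_add(&q->consumed, pkt_len);

	return before + pkt_len > q->limit;
}
```

Because update and test are a single operation, this avoids the composite "get counter + comparison" that the message lists as the first problem.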
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3d2f30a1

18 Aug, 2016 1 commit

netfilter: nf_conntrack: restore nf_conntrack_htable_size as exported symbol · 2567c4ea

Pablo Neira Ayuso authored Aug 18, 2016

This is required to iterate over the hash table in cttimeout, ctnetlink
and nf_conntrack_ipv4.

>> ERROR: "nf_conntrack_htable_size" [net/netfilter/nfnetlink_cttimeout.ko] undefined!
   ERROR: "nf_conntrack_htable_size" [net/netfilter/nf_conntrack_netlink.ko] undefined!
   ERROR: "nf_conntrack_htable_size" [net/ipv4/netfilter/nf_conntrack_ipv4.ko] undefined!

Fixes: adf05168 ("netfilter: remove ip_conntrack* sysctl compat code")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2567c4ea

17 Aug, 2016 1 commit

netfilter: conntrack: simplify the code by using nf_conntrack_get_ht · 92e47ba8

Liping Zhang authored Aug 13, 2016

Commit 64b87639 ("netfilter: conntrack: fix race between
nf_conntrack proc read and hash resize") introduced
nf_conntrack_get_ht, so there's no need to check nf_conntrack_generation
again and again to get the hash table and hash size. Also convert
nf_conntrack_get_ht to an inline function here.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

92e47ba8

13 Aug, 2016 1 commit

netfilter: remove ip_conntrack* sysctl compat code · adf05168

Pablo Neira Ayuso authored Aug 12, 2016

This backward compatibility has been around for more than ten years,
since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
the conntrack utility got adopted by many people in the user community
according to what I observed on the netfilter user mailing list.

So let's get rid of this.

Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do
not need to be exported as symbols anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

adf05168

12 Aug, 2016 1 commit

netfilter: nf_tables: add hash expression · cb1b69b0

Laura Garcia Liebana authored Aug 11, 2016

This patch adds a new hash expression. It provides jhash support, but
it can be extended to support other hash functions. The modulus
and seed already come embedded in this new expression.

Use case example:

	... meta mark set hash ip saddr mod 10
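
What "hash ... mod 10" computes can be sketched as follows. The example deliberately uses FNV-1a as a stand-in hash, not jhash, and reduces the result with a plain `%`; both choices and the function names are assumptions made to keep the block self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a stand-in for the real hash function. */
static uint32_t fnv1a(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u ^ seed;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Map 'data' to a value in [0, modulus); modulus must be non-zero. */
static uint32_t hash_mod(const void *data, size_t len, uint32_t seed,
			 uint32_t modulus)
{
	return fnv1a(data, len, seed) % modulus;
}
```

Hashing the four bytes of a source address with `modulus` set to 10 yields a stable value between 0 and 9, which is the kind of number the rule above stores in the packet mark.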
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cb1b69b0

11 Aug, 2016 23 commits

netfilter: nf_tables: rename set implementations · 0ed6389c

Pablo Neira Ayuso authored Aug 09, 2016

Use nft_set_* prefix for backend set implementations, thus we can use
nft_hash for the new hash expression.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0ed6389c

ipvs: use nf_ct_kill helper · a6c46d9b

Florian Westphal authored Aug 03, 2016

Once timer is removed from nf_conn struct we cannot open-code
the removal sequence anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a6c46d9b

netfilter: use_nf_conn_expires helper in more places · d0b35b93

Florian Westphal authored Aug 03, 2016

... so we don't need to touch all of these places when we get rid of the
timer in nf_conn.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d0b35b93

netfilter: nf_dup4: remove redundant checksum recalculation · 9f7c824a

Liping Zhang authored Jul 30, 2016

The IP header checksum will be recalculated at ip_local_out, so
there's no need to calculate it here; remove it. Also update
code comments to illustrate it, and delete the misleading
comments about checksum recalculation.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9f7c824a

netfilter: physdev: add missed blank · ceee4091

Hangbin Liu authored Jul 25, 2016

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ceee4091

netfilter: conntrack: Only need first 4 bytes to get l4proto ports · e5e693ab

Gao Feng authored Jul 23, 2016

We only need first 4 bytes instead of 8 bytes to get the ports of
tcp/udp/dccp/sctp/udplite in their pkt_to_tuple function.
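
The reason four bytes suffice is that each of these protocols starts its header with a 16-bit source port followed by a 16-bit destination port. A small sketch, with hypothetical names, that extracts both from the first four header bytes:

```c
#include <stdint.h>

struct l4_ports {
	uint16_t src;	/* host byte order */
	uint16_t dst;	/* host byte order */
};

/* 'hdr' points at the first four bytes of the transport header. */
static struct l4_ports parse_ports(const uint8_t hdr[4])
{
	struct l4_ports p;

	/* Ports are big-endian on the wire. */
	p.src = (uint16_t)((hdr[0] << 8) | hdr[1]);
	p.dst = (uint16_t)((hdr[2] << 8) | hdr[3]);
	return p;
}
```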
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e5e693ab

net: ethernet: renesas: sh_eth: use new api ethtool_{get|set}_link_ksettings · f08aff44

Philippe Reynes authored Aug 10, 2016

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f08aff44

net: ethernet: renesas: sh_eth: use phydev from struct net_device · 9fd0375a

Philippe Reynes authored Aug 10, 2016

The private structure contains a pointer to phydev, but the structure
net_device already contains such a pointer. So we can remove the pointer
phy_dev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fd0375a

samples/bpf: fix bpf_perf_event_output prototype · 05b8ad25

Adam Barth authored Aug 10, 2016

The commit 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
started using 20 of the initially reserved upper 32 bits of the 'flags' argument
in bpf_perf_event_output(). Adjust the corresponding prototype in samples/bpf/bpf_helpers.h.
Signed-off-by: Adam Barth <arb@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

05b8ad25

net: macb: Add 64 bit addressing support for GEM · fff8019a

Harini Katakam authored Aug 09, 2016

This patch adds support for 64 bit addressing and BDs.
-> Enable 64 bit addressing in DMACFG register.
-> Set DMA mask when design config register shows support for 64 bit addr.
-> Add new BD words for higher address when 64 bit DMA support is present.
-> Add and update TBQPH and RBQPH for MSB of BD pointers.
-> Change extraction and update of buffer addresses to use
64 bit address.
-> In gem_rx extract address in one place instead of two and use a
separate flag for RXUSED.
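
Splitting a 64-bit DMA address across two 32-bit descriptor words can be sketched like this. The descriptor layout and names (`struct bd64`, `bd_set_addr()`, `bd_get_addr()`) are hypothetical and do not describe the GEM register layout; flag bits that real descriptors keep in the low address word are ignored here.

```c
#include <stdint.h>

/* Hypothetical buffer descriptor with an extra word for the high bits. */
struct bd64 {
	uint32_t addr_lo;	/* lower 32 bits of the buffer address */
	uint32_t ctrl;
	uint32_t addr_hi;	/* upper 32 bits of the buffer address */
	uint32_t reserved;
};

/* Store a 64-bit address into the two address words. */
static void bd_set_addr(struct bd64 *bd, uint64_t addr)
{
	bd->addr_lo = (uint32_t)(addr & 0xffffffffu);
	bd->addr_hi = (uint32_t)(addr >> 32);
}

/* Reassemble the 64-bit address from the two address words. */
static uint64_t bd_get_addr(const struct bd64 *bd)
{
	return ((uint64_t)bd->addr_hi << 32) | bd->addr_lo;
}
```

Doing the extraction in one helper mirrors the point about reading the address in a single place instead of two.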
Signed-off-by: Harini Katakam <harinik@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fff8019a

qed*: Add support for ethtool link_ksettings callbacks. · 054c67d1

Sudarsana Reddy Kalluru authored Aug 09, 2016

This patch adds the driver implementation for ethtool link_ksettings
callbacks. qed driver now defines/uses the qed specific masks for
representing link capability values. qede driver maps these values to
new link modes defined by the kernel implementation of link_ksettings.

Please consider applying this to 'net-next' branch.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

054c67d1

Merge branch 'cpsw-refactor' · e27d6cf5

David S. Miller authored Aug 10, 2016

Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: split driver data and per ndev data

In dual_emac mode the driver can handle 2 network devices. Each of them can use
its own private data and common data/resources. This patchset splits common driver
data/resources and private per net device data.
This:
- reduces memory usage
- increases code readability
- allows a number of simplifications
- creates prerequisites for adding multi-channel support,
  when channels are shared between net devices

It has no negative impact on performance.
v2: https://lkml.org/lkml/2016/8/6/108

Since v2:
- removed patch:
  net: ethernet: ti: cpsw: fix int dbg message
- replaced patch:
  "net: ethernet: ti: cpsw: remove redundant check in napi poll"
  on "net: ethernet: ti: cpsw: remove intr dbg msg from poll handlers"
- removed macro "cpsw_get_slave_ndev"
- corrected some commits

Since v1:
- added several patch improvements
- avoided variable reordering in structures
- removed static variable for common function
- split big patch on several patches:
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e27d6cf5

net: ethernet: ti: cpsw: move ale, cpts and drivers params under cpsw_common · 2a05a622

Ivan Khoronzhuk authored Aug 10, 2016

The ale, cpts, version, rx_packet_max, bus_freq, interrupt pacing
parameters are common per net device that uses the same h/w. So,
move them to common driver structure.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a05a622

net: ethernet: ti: cpsw: move napi struct to cpsw_common · dbc4ec52

Ivan Khoronzhuk authored Aug 10, 2016

The napi structs are common for both net devices in dual_emac
mode. In order to not hold duplicate links to them, move them to
cpsw_common.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbc4ec52

net: ethernet: ti: cpsw: move platform data and slaves info to cpsw_common · 606f3993

Ivan Khoronzhuk authored Aug 10, 2016

These data are common for net devs in dual_emac mode. No need to hold
it for every priv instance, so move them under cpsw_common.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

606f3993

net; ethernet: ti: cpsw: move irq stuff under cpsw_common · e38b5a3d

Ivan Khoronzhuk authored Aug 10, 2016

The irq data are common for net devs in dual_emac mode. So no need to
hold these data in every priv struct, move them under cpsw_common.
Also delete irq_num var, as after optimization it's not needed.
Correct number of irqs to 2, as anyway, driver is using only 2,
at least for now.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e38b5a3d

net: ethernet: ti: cpsw: move cpdma resources to cpsw_common · 2c836bd9

Ivan Khoronzhuk authored Aug 10, 2016

Every net device private struct holds links to shared cpdma resources.
There is no need to save these resources per net dev and synchronize
them every time. So, move them to the common driver struct.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c836bd9

net: ethernet: ti: cpsw: move links on h/w registers to cpsw_common · 5d8d0d4d

Ivan Khoronzhuk authored Aug 10, 2016

The pointers on h/w registers are common for every cpsw_private
instance, so no need to hold them for every ndev.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d8d0d4d

net: ethernet: ti: cpsw: replace pdev on dev · 56e31bd8

Ivan Khoronzhuk authored Aug 10, 2016

No need to hold the pdev link when only dev is needed.
This allows simplifying a number of cpsw->pdev->dev uses now and later.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56e31bd8

net: ethernet: ti: cpsw: create common struct to hold shared driver data · 649a1688

Ivan Khoronzhuk authored Aug 10, 2016

This patch simply creates a holder for common data and as a start moves
the pdev var to it.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

649a1688

net: ethernet: ti: cpsw: don't check slave num in runtime · 82b52104

Ivan Khoronzhuk authored Aug 10, 2016

No need to check const slave num in runtime for every packet,
and ndev for slaves w/o ndev is anyway NULL. So remove redundant
check and macro.
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

82b52104

net: ethernet: ti: cpsw: remove clk var from priv · ef4183a1

Ivan Khoronzhuk authored Aug 10, 2016

There is no need to hold a link to clk, it's used only once
during probe.
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef4183a1

net: ethernet: ti: cpsw: remove priv from cpsw_get_slave_port() parameters list · 6f1f5836

Ivan Khoronzhuk authored Aug 10, 2016

There is no need for priv here.
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f1f5836