Commits · 36f877804c2471226b6c5cd72232c707e0e4c497 · nexedi / linux

24 Jan, 2017 15 commits

Merge branch 'packet-sampling-offload' · 36f87780

David S. Miller authored Jan 24, 2017

Jiri Pirko says:

====================
Add support for offloading packet-sampling

Yotam says:

The first patch introduces the psample module, a netlink channel dedicated
to packet sampling implemented using generic netlink. This module provides
a generic way for kernel modules to sample packets, while not being tied
to any specific subsystem like NFLOG.

The second patch adds the sample tc action, which uses psample to randomly
sample packets that match a classifier. The user can configure the psample
group number, the sampling rate and the packet's truncation (to save
kernel-user traffic).

The last two patches add the support for offloading the matchall-sample
tc command in the mlxsw driver, for ingress qdiscs.

An example for psample usage can be found in the libpsample project at:
https://github.com/Mellanox/libpsample

v1->v2:
- Reword first patch's commit message
- Fix typo in comment in second patch
- Change order of tc_sample uapi enum to match convention
- Rename act_sample action callback tcf_sample -> tcf_sample_act
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

36f87780

mlxsw: spectrum: Add packet sample offloading support · 98d0f7b9

Yotam Gigi authored Jan 23, 2017

Using the MPSC register, add the functions that configure port-based
packet sampling in hardware and the necessary datatypes in the
mlxsw_sp_port struct. In addition, add the necessary trap for sampled
packets and integrate with matchall offloading to allow offloading of the
sample tc action.

The current offload support is for the tc command:

tc filter add dev <DEV> parent ffff: \
	  matchall skip_sw \
	  action sample rate <RATE> group <GROUP> [trunc <SIZE>]

Where only ingress qdiscs are supported, and only a combination of
matchall classifier and sample action will lead to activating hardware
packet sampling.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98d0f7b9

mlxsw: reg: add the Monitoring Packet Sampling Configuration Register · 0677d682

Yotam Gigi authored Jan 23, 2017

The MPSC register allows to configure ingress packet sampling on specific
port of the mlxsw device. The sampled packets are then trapped via
PKT_SAMPLE trap.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0677d682

net/sched: Introduce sample tc action · 5c5670fa

Yotam Gigi authored Jan 23, 2017

This action allows the user to sample traffic matched by tc classifier.
The sampling consists of choosing packets randomly and sampling them using
the psample module. The user can configure the psample group number, the
sampling rate and the packet's truncation (to save kernel-user traffic).

Example:
To sample ingress traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
	   matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets on
dev eth1 to psample group 4.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c5670fa

net: Introduce psample, a new genetlink channel for packet sampling · 6ae0a628

Yotam Gigi authored Jan 23, 2017

Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc. and allow to standardize packet sampling in the kernel.

For every sampled packet, the psample module adds the following metadata
fields:

PSAMPLE_ATTR_IIFINDEX - the packets input ifindex, if applicable

PSAMPLE_ATTR_OIFINDEX - the packet output ifindex, if applicable

PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
   truncated during sampling

PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
   user who initiated the sampling. This field allows the user to
   differentiate between several samplers working simultaneously and
   filter packets relevant to him

PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
   sequence is kept for each group

PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets

PSAMPLE_ATTR_DATA - the actual packet bits

The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast
group. In addition, add the GET_GROUPS netlink command which allows the
user to see the current sample groups, their refcount and sequence number.
This command currently supports only netlink dump mode.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ae0a628

Merge branch 'mdio_module_driver-misc' · d36db83b

David S. Miller authored Jan 24, 2017

Florian Fainelli says:

====================
net: couple mdio_module_driver changes

Small patch series fixing a comment for mdio_module_driver and
finally utilizing it in b53_mdio.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d36db83b

net: dsa: b53: Utilize mdio_module_driver · 8a180cc7

Florian Fainelli authored Jan 22, 2017

Eliminate a bit of boilerplate code.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a180cc7

net: phy: Fix typo for MDIO module boilerplate comment · b70f43a1

Florian Fainelli authored Jan 22, 2017

The module boilerplate macro is named mdio_module_driver and not
module_mdio_driver, fix that.

Fixes: a9049e0c ("mdio: Add support for mdio drivers.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b70f43a1

Merge branch 'stmmac-dwmac-meson8b-configurable-RGMII-TX-delay' · dd8e01fb

David S. Miller authored Jan 24, 2017

Martin Blumenstingl says:

====================
stmmac: dwmac-meson8b: configurable RGMII TX delay

Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
cycle (= 2ns) TX clock delay. This seems to work fine for many boards
(for example Odroid-C2 or Amlogic's reference boards) but there are
some others where TX traffic is simply broken.
There are probably multiple reasons why it's working on some boards
while it's broken on others:
- some of Amlogic's reference boards are using a Micrel PHY
- hardware circuit design
- maybe more...

iperf3 results on my Mecool BB2 board (Meson GXM, RTL8211F PHY) with
TX clock delay disabled on the MAC (as it's enabled in the PHY driver).
TX throughput was virtually zero before:
$ iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  4] local 192.168.1.206 port 52828 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   108 MBytes   901 Mbits/sec
[  4]   1.00-2.00   sec  94.2 MBytes   791 Mbits/sec
[  4]   2.00-3.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   3.00-4.00   sec  96.2 MBytes   808 Mbits/sec
[  4]   4.00-5.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   5.00-6.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   6.00-7.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   7.00-8.00   sec  96.5 MBytes   809 Mbits/sec
[  4]   8.00-9.00   sec   105 MBytes   884 Mbits/sec
[  4]   9.00-10.00  sec   111 MBytes   934 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   998 MBytes   837 Mbits/sec                  receiver

iperf Done.
$ iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  4] local 192.168.1.206 port 52832 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.01   sec  99.5 MBytes   829 Mbits/sec  117    139 KBytes
[  4]   1.01-2.00   sec   105 MBytes   884 Mbits/sec  129   70.7 KBytes
[  4]   2.00-3.01   sec   107 MBytes   889 Mbits/sec  106    187 KBytes
[  4]   3.01-4.01   sec   105 MBytes   878 Mbits/sec   92    143 KBytes
[  4]   4.01-5.00   sec   105 MBytes   882 Mbits/sec  140    129 KBytes
[  4]   5.00-6.01   sec   106 MBytes   883 Mbits/sec  115    195 KBytes
[  4]   6.01-7.00   sec   102 MBytes   863 Mbits/sec  133   70.7 KBytes
[  4]   7.00-8.01   sec   106 MBytes   884 Mbits/sec  143   97.6 KBytes
[  4]   8.01-9.01   sec   104 MBytes   875 Mbits/sec  124    107 KBytes
[  4]   9.01-10.01  sec   105 MBytes   876 Mbits/sec   90    139 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.01  sec  1.02 GBytes   874 Mbits/sec  1189             sender
[  4]   0.00-10.01  sec  1.02 GBytes   873 Mbits/sec                  receiver

iperf Done.

I get similar TX throughput on my Meson GXBB "MXQ Pro+" board when I
disable the PHY's TX-delay and configure a 4ms TX-delay on the MAC.
So changes to at least the RTL8211F PHY driver are needed to get it
working properly in all situations.

Changes since v4:
- add a fallback of 2ns (the value which was previously hardcoded) for
  the TX delay so we are backwards-compatible with older .dts'
- update the documentation with the new fallback value and add a small
  note that the "amlogic,tx-delay" property is ignored when the phy-mode
  is "rmii".

Changes since v3:
- rebased to apply against current net-next branch (fixes a conflict
  with d2ed0a77 "net: ethernet: stmmac: fix of-node and
  fixed-link-phydev leaks")

Changes since v2:
- moved all .dts patches (3-7) to a separate series
- removed the default 2ns TX delay when phy-mode RGMII is specified
- (rebased against current net-next)

Changes since v1:
- renamed the devicetree property "amlogic,tx-delay" to
  "amlogic,tx-delay-ns", which makes the .dts easier to read as we can
  simply specify human-readable values instead of having "preprocessor
  defines and calculation in human brain". Thanks to Andrew Lunn for
  the suggestion!
- improved documentation to indicate when the MAC TX-delay should be
  configured and how to use the PHY's TX-delay
- changed the default TX-delay in the dwmac-meson8b driver from 2ns
  to 0ms when any of the rgmii-*id modes are used (the 2ns default
  value still applies for phy-mode "rgmii")
- added patches to properly reset the PHY on Meson GXBB devices and to
  use a similar configuration than the one we use on Meson GXL devices
  (by passing a phy-handle to stmmac and defining the PHY in the mdio0
  bus - patch 3-6)
- add the "amlogic,tx-delay-ns" property to all boards which are using
  the RGMII PHY (patch 7)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dd8e01fb

net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable · b765234e

Martin Blumenstingl authored Jan 22, 2017

Prior to this patch we were using a hardcoded RGMII TX clock delay of
2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for
many boards, but unfortunately not for all (due to the way the actual
circuit is designed, sometimes because the TX delay is enabled in the
PHY, etc.). Making the TX delay on the MAC side configurable allows us
to support all possible hardware combinations.

This allows fixing a compatibility issue on some boards, where the
RTL8211F PHY is configured to generate the TX delay. We can now turn
off the TX delay in the MAC, because otherwise we would be applying the
delay twice (which results in non-working TX traffic).
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b765234e

net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac · d5490f1f

Martin Blumenstingl authored Jan 22, 2017

This allows configuring the RGMII TX clock delay. The RGMII clock is
generated by underlying hardware of the the Meson 8b / GXBB DWMAC glue.
The configuration depends on the actual hardware (no delay may be
needed due to the design of the actual circuit, the PHY might add this
delay, etc.).
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5490f1f

net: dsa: Fix inverted test for multiple CPU interface · 23e3d618

Andrew Lunn authored Jan 22, 2017

Remove the wrong !, otherwise we get false positives about having
multiple CPU interfaces.

Fixes: b22de490 ("net: dsa: store CPU switch structure in the tree")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23e3d618

bridge: multicast to unicast · 6db6f0ea

Felix Fietkau authored Jan 21, 2017

Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.

multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.

This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).

However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.

The initial patch and idea is from Felix Fietkau.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[linus.luessing@c0d3.blue: various bug + style fixes, commit message]
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6db6f0ea

Introduce a sysctl that modifies the value of PROT_SOCK. · 4548b683

Krister Johansen authored Jan 20, 2017

Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace. To
disable all privileged ports set this to zero. It also checks for
overlap with the local port range. The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to priviliged ports, but prevent them from ever being allowed to modify
their container's network configuration. The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace. This modification was needed to allow the container manager
to disable a namespace's priviliged port restrictions without exposing
control of the network namespace to processes in the user namespace.
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4548b683

bpf, lpm: fix kfree of im_node in trie_update_elem · d140199a

Daniel Borkmann authored Jan 24, 2017

We need to initialize im_node to NULL, otherwise in case of error path
it gets passed to kfree() as uninitialized pointer.

Fixes: b95a5c4d ("bpf: add a longest prefix match trie map implementation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d140199a

23 Jan, 2017 9 commits

Merge branch 'bpf-lpm' · 2acc76cb

David S. Miller authored Jan 23, 2017

Daniel Mack says:

====================
bpf: add longest prefix match map

This patch set adds a longest prefix match algorithm that can be used
to match IP addresses to a stored set of ranges. It is exposed as a
bpf map type.

Internally, data is stored in an unbalanced tree of nodes that has a
maximum height of n, where n is the prefixlen the trie was created
with.

Note that this has nothing to do with fib or fib6 and is in no way meant
to replace or share code with it. It's rather a much simpler
implementation that is specifically written with bpf maps in mind.

Patch 1/2 adds the implementation, 2/2 an extensive test suite and 3/3
has benchmarking code for the new trie type.

Feedback is much appreciated.

Changelog:

v3 -> v4:
	* David added a 3rd patch that augments map_perf_test for
	  LPM trie benchmarks
	* Limit allocation of maps of this new type to CAP_SYS_ADMIN
	  for now, as requested by Alexei
	* Add a stub .map_delete_elem so the core does not stumble
	  over a NULL pointer when the syscall is invoked
	* Tests for non-power-of-2 prefix lengths were added
	* More comment style fixes

v2 -> v3:
	* Store both the key match data and the caller provided
	  value in the same byte array attached to a node. This
	  avoids double allocations
	* Bring back node->flags to distinguish between 'real'
	  and intermediate nodes
	* Fix comment style and some typos

v1 -> v2:
	* Turn spin lock into raw spinlock
	* Lock with irqsave options during trie_update_elem()
	* Return -ENOMEM properly from trie_alloc()
	* Force attr->flags == BPF_F_NO_PREALLOC during creation
	* Set trie->map.pages after creation to account for map memory
	* Allow arbitrary value sizes
	* Removed node->flags and denode intermediate nodes through
	  node->value == NULL instead

rfc -> v1:
	* Add __rcu pointer annotations to make sparse happy
	* Fold _lpm_trie_find_target_node() into its only caller
	* Fix some minor documentation issues
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2acc76cb

samples/bpf: add lpm-trie benchmark · b8a943e2

David Herrmann authored Jan 21, 2017

Extend the map_perf_test_{user,kern}.c infrastructure to stress test
lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
the latency depending on trie size and lookup count.

On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
bpf program takes roughly 6.5us on my system. Lookups in empty tries
take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192
entries take ~7.1us (on the first _and_ any subsequent try).
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8a943e2

bpf: Add tests for the lpm trie map · 4d3381f5

David Herrmann authored Jan 21, 2017

The first part of this program runs randomized tests against the
lpm-bpf-map. It implements a "Trivial Longest Prefix Match" (tlpm)
based on simple, linear, single linked lists. The implementation
should be pretty straightforward.

Based on tlpm, this inserts randomized data into bpf-lpm-maps and
verifies the trie-based bpf-map implementation behaves the same way
as tlpm.

The second part uses 'real world' IPv4 and IPv6 addresses and tests
the trie with those.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d3381f5

bpf: add a longest prefix match trie map implementation · b95a5c4d

Daniel Mack authored Jan 21, 2017

This trie implements a longest prefix match algorithm that can be used
to match IP addresses to a stored set of ranges.

Internally, data is stored in an unbalanced trie of nodes that has a
maximum height of n, where n is the prefixlen the trie was created
with.

Tries may be created with prefix lengths that are multiples of 8, in
the range from 8 to 2048. The key used for lookup and update operations
is a struct bpf_lpm_trie_key, and the value is a uint64_t.

The code carries more information about the internal implementation.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b95a5c4d

net: xilinx: constify net_device_ops structure · 10eeb5e6

Bhumika Goyal authored Jan 21, 2017

Declare net_device_ops structure as const as it is only stored in
the netdev_ops field of a net_device structure. This field is of type
const, so net_device_ops structures having same properties can be made
const too.
Done using Coccinelle:

@r1 disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p={...};

@ok1@
identifier r1.i;
position p;
struct net_device ndev;
@@
ndev.netdev_ops=&i@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct net_device_ops i;

File size before:
   text	   data	    bss	    dec	    hex	filename
   6201	    744	      0	   6945	   1b21 ethernet/xilinx/xilinx_emaclite.o

File size after:
   text	   data	    bss	    dec	    hex	filename
   6745	    192	      0	   6937	   1b19 ethernet/xilinx/xilinx_emaclite.o
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10eeb5e6

net: moxa: constify net_device_ops structures · 30bd2f52

Bhumika Goyal authored Jan 21, 2017

Declare net_device_ops structure as const as it is only stored in
the netdev_ops field of a net_device structure. This field is of type
const, so net_device_ops structures having same properties can be made
const too.
Done using Coccinelle:

@r1 disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p={...};

@ok1@
identifier r1.i;
position p;
struct net_device ndev;
@@
ndev.netdev_ops=&i@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct net_device_ops i;

File size before:
   text	   data	    bss	    dec	    hex	filename
   4821	    744	      0	   5565	   15bd ethernet/moxa/moxart_ether.o

File size after:
   text	   data	    bss	    dec	    hex	filename
   5373	    192	      0	   5565	   15bd ethernet/moxa/moxart_ether.o
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30bd2f52

net: qcom/emac: claim the irq only when the device is opened · 4404323c

Timur Tabi authored Jan 20, 2017

During reset, functions emac_mac_down() and emac_mac_up() are called,
so we don't want to free and claim the IRQ unnecessarily.  Move those
operations to open/close.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4404323c

net: qcom/emac: rename emac_phy to emac_sgmii and move it · 41c1093f

Timur Tabi authored Jan 20, 2017

The EMAC has an internal PHY that is often called the "SGMII". This
SGMII is also connected to an external PHY, which is managed by phylib.
These dual PHYs often cause confusion. In this case, the data structure
for managing the SGMII was mis-named and located in the wrong header file.

Structure emac_phy is renamed to emac_sgmii to clearly indicate it applies
to the internal PHY only. It also also moved from emac_phy.h (which
supports the external PHY) to emac_sgmii.h (where it belongs).

To keep the changes minimal, only the structure name is changed, not
the names of any variables of that type.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

41c1093f

bnx2x: avoid two atomic ops per page on x86 · b9032741

Eric Dumazet authored Jan 20, 2017

Commit 4cace675 ("bnx2x: Alloc 4k fragment for each rx ring buffer
element") added extra put_page() and get_page() calls on arches where
PAGE_SIZE=4K like x86

Reorder things to avoid this overhead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9032741

22 Jan, 2017 15 commits

Merge branch 'bcm7278' · 41e8c70e

David S. Miller authored Jan 22, 2017

Florian Fainelli says:

====================
net: dsa: bcm_sf2: Add support for BCM7278

This patch series adds support for the Broadcom BCM7278 integrated switch
which is a successor of the BCM7445 switch. We have a little bit of
register shuffling going on, which is why most of the functional changes
are to deal with that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

41e8c70e

net: phy: bcm7xxx: Implement EGPHY workaround for 7278 · 039a7b85

Florian Fainelli authored Jan 20, 2017

Implement the HW design team recommended workaround in for 7278. Since
the GPHY now returns its revision information in MII_PHYS_ID[23] we need
to check whether the revision provided in flags is 0 or not.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

039a7b85

net: phy: bcm7xxx: Add entry for BCM7278 · 582d0ac3

Florian Fainelli authored Jan 20, 2017

Add support for the BCM7278 28nm process Gigabit Ethernet PHY.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

582d0ac3

net: dsa: bcm_sf2: Allow non-IMP ports to have Broadcom tags enabled · 64ff2aef

Florian Fainelli authored Jan 20, 2017

Parse the "brcm,use-bcm-hdr" boolean property during ports
identification to fill a bitmask of ports that should have Broadcom tags
enabled. This is needed in some configurations where per-packet metadata
can be exchanged using Broadcom tags between the switch and an on-chip
acceleration device.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64ff2aef

net: dsa: bcm_sf2: Move code enabling Broadcom tags · ebb2ac4f

Florian Fainelli authored Jan 20, 2017

In preparation for enabling Broadcom tags on different ports based on
configuration information, dedicate a function that is responsible for
enabling Broadcom tags for a given port and update the IMP port setup to
call it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebb2ac4f

net: dsa: bcm_sf2: Add support for BCM7278 integrated switch · 0fe99338

Florian Fainelli authored Jan 20, 2017

Add support for the integrated switch found on BCM7278:

- core_reg_align is set to 1, to force a translation into the target
  address space which is 8 bytes aligned
- an alternate SWITCH_REG layout is provided since registers are largely
  bit/masks compatible but have different offsets
- conditional for all CORE_STS_OVERRIDE_{IMP,GMII_P} since those got
  moved way out of the traditional register space
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fe99338

net: dsa: bcm_sf2: Prepare for different register layouts · a78e86ed

Florian Fainelli authored Jan 20, 2017

In preparation for supporting a new device with a slightly different
register layout, affecting the SWITCH_REG and SWITCH_CORE address
spaces, perform a few preparatory steps:

- allow matching the compatible string against a data description
- convert the SWITCH_REG register accesses into an indirection table
- prepare for supporting a SWITCH_CORE register alignment requirement
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a78e86ed

net: dsa: bcm_sf2: Make SF2_IO64_MACRO() utilize 32-bit macro · 329b5c58

Florian Fainelli authored Jan 20, 2017

There is no point inlining the 32-bit direct register read/write part,
just infer it from the existing macro. This will make it easier to
centralize the address rewriting that we are going to introduce later
on.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

329b5c58

Merge branch 'systemport-lite' · b20b564b

David S. Miller authored Jan 22, 2017

Florian Fainelli says:

====================
net: systemport: Add support for SYSTEMPORT lite

This patch series adds support for SYSTEMPORT Lite which is an evolution
of the existing SYSTEMPORT adapter.

The two generations are largely identical as far as the transmit/receive
path are concerned, and there were just a few control path changes here
and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b20b564b

net: systemport: Add support for SYSTEMPORT Lite · 44a4524c

Florian Fainelli authored Jan 20, 2017

Add supporf for the SYSTEMPORT Lite Ethernet controller, this piece of hardware
is largely based on the full-blown SYSTEMPORT and differs in the following:

- no full-blown UniMAC, instead we have the MagicPacket matching from UniMAC at
  same offset, and a GMII Interface Block (GIB) for the MAC-level stuff, since
  we are always interfaced to an Ethernet switch which is fully Ethernet compliant
  shortcuts could be made

- 16 transmit queues, whose interrupts are moved into the first Level-2 interrupt
  controller bank

- slight TDMA offset change (a register was inserted after TDMA_STATUS, *sigh*)

- 256 RX descriptors (512 words) and 256 TX descriptors (not visible)

As a consequence of these two things, update the code paths accordingly to
differentiate the full-blown from the light version.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44a4524c

net: systemport: Dynamically allocate number of TX rings · 7b78be48

Florian Fainelli authored Jan 20, 2017

In preparation for adding SYSTEMPORT Lite, which has twice as less transmit
queues than SYSTEMPORT make sure we do allocate TX rings based on the
systemport,txq property to get an appropriate memory footprint.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b78be48

ipv6: add NUMA awareness to seg6_hmac_init_algo() · 9ca677b1

Eric Dumazet authored Jan 20, 2017

Since we allocate per cpu storage, let's also use NUMA hints.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ca677b1

net: stmicro: fix LS field mask in EEE configuration · f4ec6064

jpinto authored Jan 20, 2017

This patch fixes the LS mask when setting EEE timer.
LS field is 10 bits long and not 11 as currently.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Reported-By: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4ec6064

net/mlx4: use rb_entry() · 3704eb6f

Geliang Tang authored Jan 20, 2017

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3704eb6f

6lowpan: use rb_entry() · 530cef21

Geliang Tang authored Jan 20, 2017

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

530cef21

20 Jan, 2017 1 commit

Merge branch 'dsa-hwmon' · 9a549c1e

David S. Miller authored Jan 20, 2017

Andrew Lunn says:

====================
net: dsa: Move temperature sensor code into PHY.

Marvell Ethernet switches contain a temperature sensor. There appears
to be one sensor, which is shared by each of the internal PHYs. Each
PHY has independent registers to read this sensor, and to set a limit
for when an alarm should be raised.

Some Marvell discrete PHY also have the same sensor and registers.
Moving the HWMON code from DSA into the PHY makes the sensor available
in discrete PHYs, and removes the layering violation, the switch
driver poking around in PHY registers.

While moving the code into the PHY driver, it has been re-written to
use the new HWMON APIs.

v2:

Better Cover note explaining one sensor, but multiple independent
registers

Simply error checking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9a549c1e