Commits · a7975a2f9a7984de9b9b318da9d1826033db32c7 · Kirill Smelkov / linux

15 Oct, 2017 10 commits

Rahul Lakkireddy authored Oct 13, 2017

Add base to collect dump entities.  Collect register dump and
update template header accordingly.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7975a2f

cxgb4: implement ethtool dump data operations · ad75b7d3

Rahul Lakkireddy authored Oct 13, 2017

Implement operations to set/get dump data via ethtool.  Also add
template header that precedes dump data, which helps in decoding
and extracting the dump data.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad75b7d3

nfp: Explicitly include linux/bug.h · 4c7787ba

Mark Brown authored Oct 13, 2017

Today's -next build encountered an error due to a missing definition of
WARN_ON(), caused by some header reorganization removing an implicit
inclusion of linux/bug.h.  Fix this with an explicit inclusion.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c7787ba

Merge branch 'dsa-remove-set_addr' · 7f4e568d

David S. Miller authored Oct 14, 2017

Vivien Didelot says:

====================
net: dsa: remove .set_addr

An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.

If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch.

This won't make sense anymore in a multi-CPU ports system, because there
won't be a unique master device assigned to a switch tree.

Moreover this operation is confusing because it makes the user think
that it could be used to program the switch with the MAC address of the
CPU/management port such that MAC address learning can be disabled on
said port, but in fact, that's not how it is currently used.

To fix this, assign a random MAC address at setup time in the mv88e6060
and mv88e6xxx drivers before removing .set_addr completely from DSA.

Changes in v3:
  - include fix for mv88e6060 switch MAC address setter.

Changes in v2:
  - remove .set_addr implementation from drivers and use a random MAC.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7f4e568d

net: dsa: remove .set_addr · 841f4f24

Vivien Didelot authored Oct 13, 2017

Now that there is no user for the .set_addr function, remove it from
DSA. If a switch supports this feature (like mv88e6xxx), the
implementation can be done in the driver setup.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

841f4f24

net: dsa: dsa_loop: remove .set_addr · 93004a93

Vivien Didelot authored Oct 13, 2017

The .set_addr function does nothing, remove the dsa_loop implementation
before getting rid of it completely in DSA.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93004a93

net: dsa: mv88e6060: setup random mac address · 56c3ff9b

Vivien Didelot authored Oct 13, 2017

As for mv88e6xxx, setup the switch from within the mv88e6060 driver with
a random MAC address, and remove the .set_addr implementation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56c3ff9b

net: dsa: mv88e6060: fix switch MAC address · 1723ab4f

Vivien Didelot authored Oct 13, 2017

The 88E6060 Ethernet switch always transmits the multicast bit of the
switch MAC address as a zero. It re-uses the corresponding bit 8 of the
register "Switch MAC Address Register Bytes 0 & 1" for "DiffAddr".

If the "DiffAddr" bit is 0, then all ports transmit the same source
address. If it is set to 1, then bit 2:0 are used for the port number.

The mv88e6060 driver is currently wrongly shifting the MAC address byte
0 by 9. To fix this, shift it by 8 as usual and clear its bit 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1723ab4f

net: dsa: mv88e6xxx: setup random mac address · 04a69a17

Vivien Didelot authored Oct 13, 2017

An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.

If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch. This won't make sense
anymore in a multi-CPU ports system, because there won't be a unique
master device assigned to a switch tree.

Instead, setup the switch from within the Marvell driver with a random
MAC address, and remove the .set_addr implementation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04a69a17

atm: fore200e: mark expected switch fall-throughs · ec0d0987

Gustavo A. R. Silva authored Oct 12, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec0d0987

14 Oct, 2017 17 commits

Merge branch 'nfp-bpf-support-direct-packet-access' · 8bc46548

David S. Miller authored Oct 14, 2017

Jakub Kicinski says:

====================
nfp: bpf: support direct packet access

The core of this series is direct packet access support.  With a
small change to the verifier, the offloaded code can now make
use of DPA.  We need to be careful to use kernel (after initial
translation) offsets in our JIT.  Direct packet access also brings
us to the problem of eBPF endianness.  After considering the
changes necessary we decided to not support translation on both
BE and LE hosts, for now.

This series contains two fixes - one for compare instructions and
one for ineffective jne optimization.  I chose to include fixes
in this set because the code in -net works only with unreleased
PoC FW (ABI version 1) and therefore nobody outside of Netronome
can exercise it anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8bc46548

nfp: bpf: support direct packet access in TC · bfddbc8a

Jakub Kicinski authored Oct 12, 2017

Add support for direct packet access in TC, note that because
writing the packet will cause the verifier to generate a csum
fixup prologue we won't be able to offload packet writes from
TC, just yet, only the reads will work.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfddbc8a

nfp: bpf: direct packet access - write · e663fe38

Jakub Kicinski authored Oct 12, 2017

This patch adds ability to write packet contents using pre-validated
packet pointers (direct packet access).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e663fe38

nfp: bpf: add support for direct packet access - read · 2ca71441

Jakub Kicinski authored Oct 12, 2017

In direct packet access bound checks are already done, we can
simply dereference the packet pointer.

Verifier/parser logic needs to record pointer type.  Note that
although verifier does protect us from CTX vs other pointer
changes we will also want to differentiate between PACKET vs
MAP_VALUE or STACK, so we can add the check already.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ca71441

nfp: bpf: separate I/O from checks for legacy data load · 0a793977

Jakub Kicinski authored Oct 12, 2017

Move data load into a separate function and separate it from
packet length checks of legacy I/O.  This makes the code more
readable and easier to reuse.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a793977

nfp: bpf: fix context accesses · 943c57b9

Jakub Kicinski authored Oct 12, 2017

Sizes of fields in struct xdp_md/xdp_buff and some in sk_buff depend
on target architecture.  Take that into account and use struct xdp_buff,
not struct xdp_md.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

943c57b9

nfp: bpf: support BPF offload only on little endian · 0f6cf4dd

Jakub Kicinski authored Oct 12, 2017

eBPF is host-endian specific.  Translating both BE and LE eBPF
to the NFP is feasible, but would require quite a bit of indirection.
The fact that I don't have access to any BE hosts that would fit
a 25G/40G/100G NIC is also limiting my ability to test big endian.

For now restrict the offload to little endian hosts only.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f6cf4dd

nfp: bpf: implement byte swap instruction · 3119d1fd

Jakub Kicinski authored Oct 12, 2017

Implement byte swaps with rotations, shifts and byte loads.
Remember to clear upper parts of the 64 bit registers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3119d1fd

nfp: bpf: add mov helper · c000dfb5

Jakub Kicinski authored Oct 12, 2017

Register move operation is encoded as alu no op.  This means
that one has to specify number of unused/none parameters to
the emit_alu().  Add a helper.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c000dfb5

nfp: bpf: fix compare instructions · 26fa818d

Jakub Kicinski authored Oct 12, 2017

Now that we have BPF assemebler support in LLVM 6 we can easily
test all compare instructions (LLVM 4 didn't generate most of them
from C).  Fix the compare to immediates and refactor the order
of compare to regs to make sure they both follow the same pattern.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26fa818d

nfp: bpf: add missing return in jne_imm optimization · 82837370

Jakub Kicinski authored Oct 12, 2017

We optimize comparisons to immediate 0 as if (reg.lo | reg.hi).
The early return statement was missing, however, which means we
would generate two comparisons - optimized one followed by a
normal 2x 32 bit compare.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

82837370

nfp: bpf: reorder arguments to emit_ld_field_any() · bc8c80a8

Jakub Kicinski authored Oct 12, 2017

ld_field instruction has the following format in NFP assembler:

  ld_field[dst, 1000, src, <<24]

reoder parameters to emit_ld_field_any() to make it closer to
the familiar assembler order.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc8c80a8

bpf: verifier: set reg_type on context accesses in second pass · 1bdec449

Jakub Kicinski authored Oct 12, 2017

Use a simplified is_valid_access() callback when verifier
is used for program analysis by non-host JITs.  This allows
us to teach the verifier about packet start and packet end
offsets for direct packet access.

We can extend the callback as needed but for most packet
processing needs there isn't much more the offloads may
require.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bdec449

Merge branch 'stmmac-Improvements-for-multi-queuing-and-for-AVB' · 40d0af56

David S. Miller authored Oct 14, 2017

Jose Abreu says:

====================
net: stmmac: Improvements for multi-queuing and for AVB

Two improvements for stmmac: First one corrects the available fifo
size per queue, second one corrects enabling of AVB queues. More info
in commit log.

Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>

Changes from v1:
- Fix typo in second patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

40d0af56

net: stmmac: Disable flow ctrl for RX AVB queues and really enable TX AVB queues · a0daae13

Jose Abreu authored Oct 13, 2017

Flow control must be disabled for AVB enabled queues and TX
AVB queues must be enabled by setting BIT(2) of TXQEN.

Correct this by passing the queue mode to DMA callbacks
and by checking in these functions wether we are in AVB
performing the necessary adjustments.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0daae13

net: stmmac: Use correct values in TQS/RQS fields · 52a76235

Jose Abreu authored Oct 13, 2017

Currently we are using all the available fifo size in RQS and
TQS fields. This will not work correctly in multi-queues IP's
because total fifo size must be splitted to the enabled queues.

Correct this by computing the available fifo size per queue and
setting the right value in TQS and RQS fields.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52a76235

icmp: don't fail on fragment reassembly time exceeded · 258bbb1b

Matteo Croce authored Oct 12, 2017

The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).

However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).

Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets
which are now easily triggered as UFO has been removed, because, in turn,
sending large buffers triggers IP fragmentation.

The issue can be easily reproduced by running a lot of UDP streams
which is likely to trigger IP fragmentation:

  # start netserver in the test namespace
  ip netns add test
  ip netns exec test netserver

  # create a VETH pair
  ip link add name veth0 type veth peer name veth0 netns test
  ip link set veth0 up
  ip -n test link set veth0 up

  for i in $(seq 20 29); do
      # assign addresses to both ends
      ip addr add dev veth0 192.168.$i.1/24
      ip -n test addr add dev veth0 192.168.$i.2/24

      # start the traffic
      netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
  done

  # wait
  send_data: data send error: No route to host (errno 113)
  netperf: send_omni: send_data failed: No route to host

We need to differentiate instead: if fragment reassembly time exceeded
is reported, we need to silently drop the packet,
if time to live exceeded is reported, maintain the current behaviour.
In both cases increment the related error count "icmpInTimeExcds".

While at it, fix a typo in a comment, and convert the if statement
into a switch to mate it more readable.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

258bbb1b

13 Oct, 2017 13 commits

Merge branch 'tipc-comm-groups' · a00344bd

David S. Miller authored Oct 13, 2017

Jon Maloy says:

====================
tipc: Introduce Communcation Group feature

With this commit series we introduce a 'Group Communication' feature in
order to resolve the datagram and multicast flow control problem. This
new feature makes it possible for a user to instantiate multiple private
virtual brokerless message buses by just creating and joining member
sockets.

The main features are as follows:
---------------------------------
- Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
  If it is the first socket of the group this implies creation of the
  group. This call takes four parameters: 'type' serves as group
  identifier, 'instance' serves as member identifier, and 'scope'
  indicates the visibility of the group (node/cluster/zone). Finally,
  'flags' indicates different options for the socket joining the group.
  For the time being, there are only two such flags: 1) 'LOOPBACK'
  indicates if the creator of the socket wants to receive a copy of
  broadcast or multicast messages it sends to the group, 2) EVENTS
  indicates if it wants to receive membership (JOINED/LEFT) events for
  the other members of the group.

- Groups are closed, i.e., sockets which have not joined a group will
  not be able to send messages to or receive messages from members of
  the group, and vice versa. A socket can only be member of one group
  at a time.

- There are four transmission modes.
  1: Unicast. The sender transmits a message using the port identity
     (node:port tuple) of the receiving socket.
  2: Anycast. The sender transmits a message using a port name (type:
     instance:scope) of one of the receiving sockets. If more than
     one member socket matches the given address a destination is
     selected according to a round-robin algorithm, but also considering
     the destination load (advertised window size) as an additional
     criteria.
  3: Multicast. The sender transmits a message using a port name
     (type:instance:scope) of one or more of the receiving sockets.
     All sockets in the group matching the given address will receive
     a copy of the message.
  4: Broadcast. The sender transmits a message using the primtive
     send(). All members of the group, irrespective of their member
     identity (instance) number receive a copy of the message.

- TIPC broadcast is used for carrying messages in mode 3 or 4 when
  this is deemed more efficient, i.e., depending on number of actual
  destinations.

- All transmission modes are flow controlled, so that messages never
  are dropped or rejected, just like we are used to from connection
  oriented communication. A special algorithm guarantees that this is
  true even for multipoint-to-point communication, i.e., at occasions
  where many source sockets may decide to send simultaneously towards
  the same  destination socket.

- Sequence order is always guaranteed, even between the different
  transmission modes.

- Member join/leave events are received in all other member sockets
  in guaranteed order. I.e., a 'JOINED' (an empty message with the OOB
  bit set) will always be received before the first data message from
  a new member, and a 'LEAVE' (like 'JOINED', but with EOR bit set) will
  always arrive after the last data message from a leaving member.

-----
v2: Reordered variable declarations in descending length order, as per
    feedback from David Miller. This was done as far as permitted by the
    the initialization order.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a00344bd