Commits · 8ba7e7bfc39e05b57ed90a70db2b411b1204377b · Kirill Smelkov / linux

14 May, 2014 14 commits

dccp: make the request_retries minimum is 1 · 8ba7e7bf

wangweidong authored May 13, 2014

In Documentation/networking/dccp.txt points that request_retries
should be greater than 0. So make the extra1 to be &one instead
of &zero.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ba7e7bf

snmp: fix some left over of snmp stats · c9f2dba6

WANG Cong authored May 12, 2014

Fengguang reported the following sparse warning:

>> net/ipv6/proc.c:198:41: sparse: incorrect type in argument 1 (different address spaces)
   net/ipv6/proc.c:198:41:    expected void [noderef] <asn:3>*mib
   net/ipv6/proc.c:198:41:    got void [noderef] <asn:3>**pcpumib

Fixes: commit 698365fa (net: clean up snmp stats code)
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9f2dba6

ipv4: make ip_local_reserved_ports per netns · 122ff243

WANG Cong authored May 12, 2014

ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

122ff243

irda: sh_irda: Enable driver compilation with COMPILE_TEST · 9cc5e36d

Laurent Pinchart authored May 13, 2014

This helps increasing build testing coverage.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cc5e36d

Merge branch 'tipc-next' · d6cc76d3

David S. Miller authored May 14, 2014

Jon Maloy says:

====================
tipc: bug fixes and improvements

Intensive and extensive testing has revealed some rather infrequent
problems related to flow control, buffer handling and link
establishment. Commits ##1 to 4 deal with these problems.

The remaining four commits are just code improvments, aiming at
making the code more comprehensible and maintainable. There are
no functional enhancements in this series.

v2: Fixed a typo in commit log #2. Otherwise no changes from v1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d6cc76d3

tipc: merge port message reception into socket reception function · 9816f061

Jon Paul Maloy authored May 14, 2014

In order to reduce complexity and save a call level during message
reception at port/socket level, we remove the function tipc_port_rcv()
and merge its functionality into tipc_sk_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9816f061

tipc: clean up neigbor discovery message reception · c82910e2

Jon Paul Maloy authored May 14, 2014

The function tipc_disc_rcv(), which is handling received neighbor
discovery messages, is perceived as messy, and it is hard to verify
its correctness by code inspection. The fact that the task it is set
to resolve is fairly complex does not make the situation better.

In this commit we try to take a more systematic approach to the
problem. We define a decision machine which takes three state flags
as input, and produces three action flags as output. We then walk
through all permutations of the state flags, and for each of them we
describe verbally what is going on, plus that we set zero or more of
the action flags. The action flags indicate what should be done once
the decision machine has finished its job, while the last part of the
function deals with performing those actions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c82910e2

tipc: improve and extend media address conversion functions · 38504c28

Jon Paul Maloy authored May 14, 2014

TIPC currently handles two media specific addresses: Ethernet MAC
addresses and InfiniBand addresses. Those are kept in three different
formats:

1) A "raw" format as obtained from the device. This format is known
   only by the media specific adapter code in eth_media.c and
   ib_media.c.
2) A "generic" internal format, in the form of struct tipc_media_addr,
   which can be referenced and passed around by the generic media-
   unaware code.
3) A serialized version of the latter, to be conveyed in neighbor
   discovery messages.

Conversion between the three formats can only be done by the media
specific code, so we have function pointers for this purpose in
struct tipc_media. Here, the media adapters can install their own
conversion functions at startup.

We now introduce a new such function, 'raw2addr()', whose purpose
is to convert from format 1 to format 2 above. We also try to as far
as possible uniform commenting, variable names and usage of these
functions, with the purpose of making them more comprehensible.

We can now also remove the function tipc_l2_media_addr_set(), whose
job is done better by the new function.

Finally, we expand the field for serialized addresses (format 3)
in discovery messages from 20 to 32 bytes. This is permitted
according to the spec, and reduces the risk of problems when we
add new media in the future.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38504c28

tipc: rename and move message reassembly function · 37e22164

Jon Paul Maloy authored May 14, 2014

The function tipc_link_frag_rcv() is in reality a re-entrant generic
message reassemby function that has nothing in particular to do with
the link, where it is defined now. This becomes obvious when we see
the need to call the function from other places in the code.

In this commit rename it to tipc_buf_append() and move it to the file
msg.c. We also simplify its signature by moving the tail pointer to
the control block of the head buffer, hence making the head buffer
self-contained.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37e22164

tipc: mark head of reassembly buffer as non-linear · 5074ab89

Jon Paul Maloy authored May 14, 2014

The message reassembly function does not update the 'len' and 'data_len'
fields of the head skbuff correctly when fragments are chained to it.
This may sometimes lead to obsure errors, such as fragment reordering
when we receive fragments which are cloned buffers.

This commit fixes this, by ensuring that the two fields are updated
correctly.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5074ab89

tipc: don't record link RESET or ACTIVATE messages as traffic · ec37dcd3

Jon Paul Maloy authored May 14, 2014

In the current code, all incoming LINK_PROTOCOL messages, irrespective
of type, nudge the "last message received" checkpoint, informing the
link state machine that a message was received from the peer since last
supervision timeout event. This inhibits the link from starting probing
the peer unnecessarily.

However, not only STATE messages are recorded as legitimate incoming
traffic this way, but even RESET and ACTIVATE messages, which in
reality are there to inform the link that the peer endpoint has been
reset. At the same time, some RESET messages may be dropped instead
of causing a link reset. This happens when the link endpoint thinks
it is fully up and working, and the session number of the RESET is
lower than or equal to the current link session. In such cases the
RESET is perceived as a delayed remnant from an earlier session, or
the current one, and dropped.

Now, if a TIPC module is removed and then immediately reinserted, e.g.
when using a script, RESET messages may arrive at the peer link endpoint
before this one has had time to discover the failure. The RESET may be
dropped because of the session number, but only after it has been
recorded as a legitimate traffic event. Hence, the receiving link will
not start probing, and not discover that the peer endpoint is down, at
the same time ignoring the periodic RESET messages coming from that
endpoint. We have ended up in a stale state where a failed link cannot
be re-established.

In this commit, we remedy this by nudging the checkpoint only for
received STATE messages, not for RESET or ACTIVATE messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec37dcd3

tipc: compensate for double accounting in socket rcv buffer · 4f4482dc

Jon Paul Maloy authored May 14, 2014

The function net/core/sock.c::__release_sock() runs a tight loop
to move buffers from the socket backlog queue to the receive queue.

As a security measure, sk_backlog.len of the receiving socket
is not set to zero until after the loop is finished, i.e., until
the whole backlog queue has been transferred to the receive queue.
During this transfer, the data that has already been moved is counted
both in the backlog queue and the receive queue, hence giving an
incorrect picture of the available queue space for new arriving buffers.

This leads to unnecessary rejection of buffers by sk_add_backlog(),
which in TIPC leads to unnecessarily broken connections.

In this commit, we compensate for this double accounting by adding
a counter that keeps track of it. The function socket.c::backlog_rcv()
receives buffers one by one from __release_sock(), and adds them to the
socket receive queue. If the transfer is successful, it increases a new
atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
transferred buffer. If a new buffer arrives during this transfer and
finds the socket busy (owned), we attempt to add it to the backlog.
However, when sk_add_backlog() is called, we adjust the 'limit'
parameter with the value of the new counter, so that the risk of
inadvertent rejection is eliminated.

It should be noted that this change does not invalidate the original
purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
doesn't respect the send window) keeps pumping in buffers to
sk_add_backlog(), he will eventually reach an upper limit,
(2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
to the backlog, and the connection will be broken. Ordinary, well-
behaved senders will never reach this buffer limit at all.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f4482dc

tipc: decrease connection flow control window · 6163a194

Jon Paul Maloy authored May 14, 2014

Memory overhead when allocating big buffers for data transfer may
be quite significant. E.g., truesize of a 64 KB buffer turns out
to be 132 KB, 2 x the requested size.

This invalidates the "worst case" calculation we have been
using to determine the default socket receive buffer limit,
which is based on the assumption that 1024x64KB = 67MB buffers
may be queued up on a socket.

Since TIPC connections cannot survive hitting the buffer limit,
we have to compensate for this overhead.

We do that in this commit by dividing the fix connection flow
control window from 1024 (2*512) messages to 512 (2*256). Since
older version nodes send out acks at 512 message intervals,
compatibility with such nodes is guaranteed, although performance
may be non-optimal in such cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6163a194

bonding: alloc the structure ad_info dynamically in per slave · 3fdddd85

dingtianhong authored May 12, 2014

The struct ad_slave_info is very huge, and only be used for 802.3ad mode,
so alloc the structure dynamically could save 356 Bits for every slave in
non 802.3ad mode.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fdddd85

13 May, 2014 26 commits

sh_eth: replace devm_kzalloc() with devm_kmalloc_array() · 86b5d251

Sergei Shtylyov authored May 13, 2014

When I was converting the driver to the managed device API, only devm_kzalloc()
was available for memory allocation, so I had to use it, despite zeroing out the
PHY IRQ array right before initializing all its entries to PHY_POLL was quite
stupid. Now that devm_kmalloc_array() has become available, we can avoid the
needless zeroing out...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86b5d251

Merge branch 'tg3-next' · 1e1c77bf

David S. Miller authored May 13, 2014

Michael Chan says:

====================
tg3: TSO related enhancements to prevent memory allocation failure

Michael Chan (3):
  tg3: Don't modify ip header fields when doing GSO
  tg3: Prevent page allocation failure during TSO workaround
  tg3: Update copyright and version to 3.137
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1e1c77bf

tg3: Update copyright and version to 3.137 · de750e4c

Michael Chan authored May 11, 2014

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de750e4c

tg3: Prevent page allocation failure during TSO workaround · d3f6f3a1

Michael Chan authored May 11, 2014

If any TSO fragment hits hardware bug conditions (e.g. 4G boundary), the
driver will workaround by calling skb_copy() to copy to a linear SKB. Users
have reported page allocation failures as the TSO packet can be up to 64K.
Copying such a large packet is also very inefficient. We fix this by using
existing tg3_tso_bug() to transmit the packet using GSO.
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3f6f3a1

tg3: Don't modify ip header fields when doing GSO · d71c0dc4

Michael Chan authored May 11, 2014

tg3 uses GSO as workaround if the hardware cannot perform TSO on certain
packets.  We should not modify the ip header fields if we do GSO on the
packet.  It happens to work by accident because GSO recalculates the IP
checksum and IP total length.

Also fix the tg3_start_xmit comment to reflect that this is the only
xmit function for all devices.
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d71c0dc4

Merge branch 'inet_fwmark_reflect' · b6bd26c4

David S. Miller authored May 13, 2014

Lorenzo Colitti says:

====================
Make mark-based routing work better with multiple separate networks.

Mark-based routing (ip rule fwmark 17 lookup 100) combined with
either iptables marking (iptables -j MARK --set-mark 17) or
application-based marking (the SO_MARK setsockopt) are a good
way to deal with connecting simultaneously to multiple networks.

Each network can be given a routing table, and ip rules can
be configured to make different fwmarks select different
networks. Applications can select networks them by setting
appropriate socket marks, and iptables rules can be used to
handle non-aware applications, enforce policy, etc.

This patch series improves functionality when mark-based routing
is used in this way. Current behaviour has the following
limitations:

1. Kernel-originated replies that are not associated with a
   socket always use a mark of zero. This means that, for
   example, when the kernel sends a ping reply or a TCP reset,
   it does not send it on the network from which it received the
   original packet.
2. Path MTU discovery, which is triggered by incoming packets,
   does not always work correctly, because the routing lookups it
   uses to clone routes do not take the fwmark into account and
   thus can happen in the wrong routing table.
3. Application-based marking works well for outbound connections,
   but does not work well for incoming connections. Marking a
   listening socket causes that socket to only accept
   connections from a given network, and sockets that are
   returned by accept() are not marked (and are thus not routed
   correctly).

sysctl. This causes route lookups for kernel-generated replies
and PMTUD to use the fwmark of the packet that caused them.

which causes TCP sockets returned by accept() to be marked with
the same mark that sent the intial SYN packet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b6bd26c4

net: support marking accepting TCP sockets · 84f39b08

Lorenzo Colitti authored May 13, 2014

When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.

This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.

This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established.  If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.

Black-box tested using user-mode linux:

- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
  mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
  incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84f39b08

net: Use fwmark reflection in PMTU discovery. · 1b3c61dc

Lorenzo Colitti authored May 13, 2014

Currently, routing lookups used for Path PMTU Discovery in
absence of a socket or on unmarked sockets use a mark of 0.
This causes PMTUD not to work when using routing based on
netfilter fwmark mangling and fwmark ip rules, such as:

  iptables -j MARK --set-mark 17
  ip rule add fwmark 17 lookup 100

This patch causes these route lookups to use the fwmark from the
received ICMP error when the fwmark_reflect sysctl is enabled.
This allows the administrator to make PMTUD work by configuring
appropriate fwmark rules to mark the inbound ICMP packets.

Black-box tested using user-mode linux by pointing different
fwmarks at routing tables egressing on different interfaces, and
using iptables mangling to mark packets inbound on each interface
with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery
work as expected when mark reflection is enabled and fail when
it is disabled.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b3c61dc

net: add a sysctl to reflect the fwmark on replies · e110861f

Lorenzo Colitti authored May 13, 2014

Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e110861f

Merge branch 'arc_emac-next' · 87e067cd

David S. Miller authored May 13, 2014

Beniamino Galvani says:

====================
arc_emac: promiscuous/multicast mode and netpoll support

These patches add support for promiscuous mode, multicast filtering
and netpoll to the ARC EMAC driver.

They were both tested on a Radxa Rock board which uses a ARC EMAC IP
core integrated in the Rockchip RK3188 SoC.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

87e067cd

arc_emac: add netpoll support · 5a45e57a

Beniamino Galvani authored May 11, 2014

Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a45e57a

arc_emac: implement promiscuous mode and multicast filtering · 775dd682

Beniamino Galvani authored May 11, 2014

This patch implements the set_rx_mode function to enable/disable
promiscuous or all-multicast modes and to update the multicast
filtering list of the device.
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

775dd682

Merge branch 'tcp-fastopen-ipv6' · ae8b42c6

David S. Miller authored May 13, 2014

Yuchung Cheng says:

====================
tcp: IPv6 support for fastopen server

This patch series add IPv6 support for fastopen server. To minimize
code duplication in IPv4 and IPv6, the current v4 only code is
refactored and common code is moved into net/ipv4/tcp_fastopen.c.

Also the current code uses a different function from
tcp_v4_send_synack() to send the first SYN-ACK in fastopen.
The new code eliminates this separate function by refactoring the
child-socket and syn-ack creation code.  After these refactoring
in the first four patches, we can easily add the fastopen code in
IPv6 by changing corresponding IPv6 functions.

Note Fast Open client already supports IPv6. This patch is for
the server-side (passive open) IPv6 support only.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ae8b42c6

tcp: IPv6 support for fastopen server · 3a19ce0e

Daniel Lee authored May 11, 2014

After all the preparatory works, supporting IPv6 in Fast Open is now easy.
We pretty much just mirror v4 code. The only difference is how we
generate the Fast Open cookie for IPv6 sockets. Since Fast Open cookie
is 128 bits and we use AES 128, we use CBC-MAC to encrypt both the
source and destination IPv6 addresses since the cookie is a MAC tag.
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a19ce0e

tcp: improve fastopen icmp handling · 0a672f74

Yuchung Cheng authored May 11, 2014

If a fast open socket is already accepted by the user, it should
be treated like a connected socket to record the ICMP error in
sk_softerr, so the user can fetch it. Do that in both tcp_v4_err
and tcp_v6_err.

Also refactor the sequence window check to improve readability
(e.g., there were two local variables named 'req').
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a672f74

tcp: use tcp_v4_send_synack on first SYN-ACK · 843f4a55

Yuchung Cheng authored May 11, 2014

To avoid large code duplication in IPv6, we need to first simplify
the complicate SYN-ACK sending code in tcp_v4_conn_request().

To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to
initialize the mini socket's receive window before trying to
create the child socket and/or building the SYN-ACK packet. So we move
that initialization from tcp_make_synack() to tcp_v4_conn_request()
as a new function tcp_openreq_init_req_rwin().

After this refactoring the SYN-ACK sending code is simpler and easier
to implement Fast Open for IPv6.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

843f4a55

tcp: simplify fast open cookie processing · 89278c9d

Yuchung Cheng authored May 11, 2014

Consolidate various cookie checking and generation code to simplify
the fast open processing. The main goal is to reduce code duplication
in tcp_v4_conn_request() for IPv6 support.

Removes two experimental sysctl flags TFO_SERVER_ALWAYS and
TFO_SERVER_COOKIE_NOT_CHKD used primarily for developmental debugging
purposes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

89278c9d

tcp: move fastopen functions to tcp_fastopen.c · 5b7ed089

Yuchung Cheng authored May 11, 2014

Move common TFO functions that will be used by both v4 and v6
to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b7ed089

Merge branch 'cdc_mbim-next' · 4b9734e5

David S. Miller authored May 13, 2014

Bjørn Mork says:

====================
cdc_mbim: cleanups and new features

This series depends on commit 6b5eeb7f ("net: cdc_mbim: handle
unaccelerated VLAN tagged frames"), which is currently in "net" but
not yet in "net-next".

Patch 4 might have a minor context collision with the "cdc_ncm: add
buffer tuning and stats using ethtool" series I just posted for
review.  Please let me know if I should submit an adjusted version
in either direction.  These two series' are otherwise completely
independent of each other.

The major new feature here is in patch 1, which I hope will solve
some problems with the original design without changing the existing
API, optionally allowing IP session 0 to be treated like any other
MBIM session.

The rest are some minor cleanups and finally some documentation of
the current driver APIs, after this series has been applied.  I
started feeling a bit more mortal than usual lately, which probably
is healthy, and realized that I should put some of the stuff in my
head in a somewhat less volatile storage.

v2:

Fixed patch 1 so that it actually does what it claims to do. This time
it is even tested for functionality, and not just build tested...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4b9734e5

net: cdc_ncm/cdc_mbim: rework probing of NCM/MBIM functions · 50a0ffaf

Bjørn Mork authored May 11, 2014

The NCM class match in the cdc_mbim driver is confusing and
cause unexpected behaviour. The USB core guarantees that a
USB interface is in altsetting 0 when probing starts. This
means that devices implementing a NCM 1.0 backwards
compatible MBIM function (a "NCM/MBIM function") always hit
the NCM entry in the cdc_mbim driver match table. Such
functions will never match any of the MBIM entries.

This causes unexpeced behaviour for cases where the NCM and
MBIM entries are differet, which is currently the case for
all except Ericsson devices.

Improve the probing of NCM/MBIM functions by looking up the
device again in the cdc_mbim match table after switching to
the MBIM identity.

The shared altsetting selection is updated to better
accommodate the new probing logic, returning the preferred
altsetting for the control interface instead of the data
interface. The control interface altsetting update is moved
to the cdc_mbim driver. It is never necessary to change the
control interface altsetting for NCM.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Reported by: Yu-an Shih <yshih@nvidia.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

50a0ffaf

net: cdc_mbim: add driver documentation · a563babe

Bjørn Mork authored May 11, 2014

An initial attempt on describing some of the odd APIs
provided by this driver.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

a563babe

net: cdc_mbim: reject IP packets on DSS VLANs · 6e1b3095

Bjørn Mork authored May 11, 2014

DSS VLANs are pseudo network interfaces representing arbitrary
data streams, and specifically not IP. Preventing spurious IP
packets can sometimes be a hassle. The kernel will for example
send an IPv6 Router Solicit when the interface is brought up
unless the user has been careful enough to disable IPv6 first.
Such packets forwared to a MBIM DSS session will look like
spurious noise to the device, and can cause it to log an error
or even malfunction.

Drop all IP packets on the designated DSS VLANs to prevent such
unwanted spurious transmissions.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Reported-by: Arnaud Desmier <adesmier@sequans.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e1b3095

net: cdc_mbim: optionally use VLAN ID 4094 for IP session 0 · 146a08d2

Bjørn Mork authored May 11, 2014

The cdc_mbim driver maps 802.1q VLANs to MBIM IP and DSS
sessions. MBIM IP session 0 is handled as an exception and
is mapped to untagged frames.

This patch adds optional support for remapping MBIM IP
session 0 to 802.1q VLAN ID 4094 instead. The default
behaviour is not changed. The new behaviour is triggered
by adding a link for this previously unsupported VLAN.

The untagged mapping was chosen initially to support the
assumed most common use case: Most current MBIM devices only
support a single IP session (i.e. session 0 only), and using
untagged frames lets the users completely ignore the
additonal complexity of the multiplexing layer.

But when the multiplexing features of MBIM are used, then
this netdev gets a double meaning: It becomes the master
interface for all the VLAN subdevs the additional sessions
are mapped to, while still serving as the untagged IP
interface for session 0.

This can be problematic, especially when using Device Service
Streams (DSS), as have become apparent recently with the
availability of devices with real DSS support. Some use cases
need to e.g set a MTU which is higher than allowed for IP
Session 0. The dual role also leads to the situation where
the IP Session 0 interface cannot be taken down without
breaking unrelated IP or DSS sessions - a devastating side
effect which applications managing a simple IP session cannot
be expected to be aware of. A typical DHCP client will assume
that it should bring the interface down after releasing the
IP lease.

These problems can be avoided by tagging IP session 0 packets
too, making this session similar to all other multiplexed
sessions. This redefines the main netdev as an upper master
interface only.

Cc: Greg Suarez <gsuarez@smithmicro.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

146a08d2

net: get rid of SET_ETHTOOL_OPS · 7ad24ea4

Wilfried Klaebe authored May 11, 2014

net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ad24ea4

net: ptp: mark filter as __initdata · 0f49ff07

Mathias Krause authored May 10, 2014

sk_unattached_filter_create() will copy the filter's instructions so we
don't need to have the master copy hanging around after initialization.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f49ff07

net: fix test_bpf build to depend on NET · 98920ba6

Randy Dunlap authored May 13, 2014

Fix build when CONFIG_NET is not enabled.
Fixes these build errors:

WARNING: "sk_unattached_filter_destroy" [lib/test_bpf.ko] undefined!
WARNING: "kfree_skb" [lib/test_bpf.ko] undefined!
WARNING: "sk_unattached_filter_create" [lib/test_bpf.ko] undefined!
WARNING: "sk_run_filter_int_skb" [lib/test_bpf.ko] undefined!
WARNING: "__alloc_skb" [lib/test_bpf.ko] undefined!
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98920ba6