Commits · 71cea17ed39fdf1c0634f530ddc6a2c2fc601c2b · Kirill Smelkov / linux

20 May, 2013 19 commits

tcp: md5: remove spinlock usage in fast path · 71cea17e

Eric Dumazet authored May 20, 2013

TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.

[ tcp_md5sig_pool_lock is locked twice per incoming packet ]

This makes things much simpler: crypto structures are allocated once, the
first time a socket needs md5 keys, and never deallocated, as they are
really small.

A next step would be to allow crypto allocations to be done in a
NUMA-aware way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: remove 'next' member from inet6_dev · 168fc21a

Daniel Borkmann authored May 20, 2013

The next pointer within the inet6_dev structure does not seem to be used
anywhere, so just remove it. Tested with allmodconfig on x86_64.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rps: selective flow shedding during softnet overflow · 99bbc707

Willem de Bruijn authored May 20, 2013

A cpu executing the network receive path sheds packets when its input
queue grows to netdev_max_backlog. A single high-rate flow (such as a
spoofed-source DoS) can exceed a single cpu's processing rate and will
degrade the throughput of other flows hashed onto the same cpu.

This patch adds a more fine-grained hashtable. If the netdev backlog
is above a threshold, IRQ cpus track each flow's share of the total
traffic (using 4096 buckets, configurable). The share is measured by
counting the number of packets per flow over the last 256 packets
from the source cpu. Any flow that occupies a large fraction of this
window (set at 50%) sees its packets dropped while the backlog stays
above the threshold.
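A minimal userspace sketch of the per-flow accounting described above follows, assuming the 256-packet history and 4096 buckets from the text. The names `struct flow_limit` and `flow_limit_should_drop()` are hypothetical, the flow hash is passed in by the caller, and the caller is assumed to invoke it only while the backlog is above the threshold; this is an illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FL_HISTORY 256   /* packets remembered per source cpu (from the text) */
#define FL_BUCKETS 4096  /* flow buckets, configurable (from the text) */

static_assert((FL_BUCKETS & (FL_BUCKETS - 1)) == 0, "FL_BUCKETS must be a power of two");
static_assert((FL_HISTORY & (FL_HISTORY - 1)) == 0, "FL_HISTORY must be a power of two");

/* Hypothetical per-cpu state: a ring of recent bucket ids plus per-bucket counts. */
struct flow_limit {
    uint16_t history[FL_HISTORY];     /* bucket id of each of the last 256 packets */
    unsigned int head;                /* next history slot to overwrite */
    unsigned int buckets[FL_BUCKETS]; /* packets per bucket within the history */
    unsigned long dropped;            /* packets shed so far */
};

static void flow_limit_init(struct flow_limit *fl)
{
    memset(fl, 0, sizeof(*fl));
}

/* Account one packet with flow hash 'hash'. Returns true if the packet should
 * be dropped because its flow holds more than half of the recent history. */
static bool flow_limit_should_drop(struct flow_limit *fl, uint32_t hash)
{
    uint16_t new_flow = (uint16_t)(hash & (FL_BUCKETS - 1));
    uint16_t old_flow = fl->history[fl->head];

    /* The newest packet replaces the oldest one in the sliding window. */
    fl->history[fl->head] = new_flow;
    fl->head = (fl->head + 1) & (FL_HISTORY - 1);

    /* The ring starts zero-filled, so early evictions may not match a
     * counted packet; guard against underflow. */
    if (fl->buckets[old_flow])
        fl->buckets[old_flow]--;

    if (++fl->buckets[new_flow] > FL_HISTORY / 2) {  /* more than 50% share */
        fl->dropped++;
        return true;
    }
    return false;
}
```

Feeding 256 packets of a single flow into a fresh table drops the last 128 of them, while four flows sharing the cpu evenly (25% each) never cross the 50% mark.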

Tested:
Setup is a multi-threaded UDP echo server with network rx IRQ on cpu0,
kernel receive (RPS) on cpu0 and application threads on cpus 2--7
each handling 20k req/s. Throughput halves when hit with a 400 kpps
antagonist storm. With this patch applied, antagonist overload is
dropped and the server processes its complete load.

The patch is effective when kernel receive processing is the
bottleneck. The above RPS scenario is an extreme case, but the same
state is reached with RFS and sufficient kernel processing (iptables,
packet socket tap, ...).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Let device core handle pinctrl · 4a5bddf7

Fabio Estevam authored May 20, 2013

Since commit ab78029e (drivers/pinctrl: grab default handles from device core)
we can rely on the device core for handling pinctrl, so remove
devm_pinctrl_get_select_default() from the driver.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netfront: avoid leaking resources when setup_netfront fails · 1ca2983a

Wei Liu authored May 20, 2013

We should correctly free related resources (grant ref, memory page, evtchn)
when setup_netfront fails.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: velocity: Add platform device support to VIA velocity driver · 6dffbe53

Tony Prisk authored May 18, 2013

Add support for the VIA Velocity network driver to be bound to an
OF-created platform device.
Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: velocity: Convert to generic dma functions · e2c41f14

Tony Prisk authored May 18, 2013

Remove the pci_* dma functions and replace them with the more generic
versions.

In preparation for adding platform support, a new struct device *dev
is added to struct velocity_info which can be used by both the pci
and platform code.
Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: velocity: Rename vptr->dev to vptr->netdev · a9683c94

Tony Prisk authored May 18, 2013

Improve the clarity of the code in preparation for converting the
dma functions to generic versions, which require a struct device *.

This makes it possible to store a 'struct device *dev' in the
velocity_info structure.
Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c59x: remove useless VORTEX_PCI() invocations · 4fc1ad6f

Sergei Shtylyov authored May 19, 2013

It's suboptimal to invoke the rather complex VORTEX_PCI() macro every time we
want a 'struct pci_dev *' when we already have it in a variable.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ThunderLAN: remove is_eisa flag · 1e18583a

Rolf Eike Beer authored May 18, 2013

These 2 places are the only matches for is_eisa in the whole tree.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-bnx2x: dont reload on GRO change · 8802f579

Eric Dumazet authored May 18, 2013

bnx2x_set_features() forces a driver reload if GRO setting is changed.

A reload makes the ethernet port unresponsive for about 5 seconds.

This is not needed in the common case where LRO is enabled, as LRO
(TPA_ENABLE_FLAG) has precedence over GRO (GRO_ENABLE_FLAG).
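The reload decision can be sketched as a small predicate. The bit values below are hypothetical stand-ins for the driver's TPA_ENABLE_FLAG and GRO_ENABLE_FLAG, and `needs_reload()` is an illustrative helper rather than the actual bnx2x_set_features() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real flags are defined in the bnx2x driver. */
#define TPA_ENABLE_FLAG (1u << 0)  /* LRO */
#define GRO_ENABLE_FLAG (1u << 1)

static_assert((TPA_ENABLE_FLAG & GRO_ENABLE_FLAG) == 0, "flags must be distinct bits");

/* Return true if switching from old_flags to new_flags requires a reload. */
static bool needs_reload(uint32_t old_flags, uint32_t new_flags)
{
    uint32_t changes = old_flags ^ new_flags;

    /* LRO takes precedence over GRO, so toggling GRO while LRO stays
     * enabled has no effect on the datapath: ignore that change. */
    if ((changes & GRO_ENABLE_FLAG) && (new_flags & TPA_ENABLE_FLAG))
        changes &= ~GRO_ENABLE_FLAG;

    return changes != 0;
}
```

Toggling GRO with LRO on is then a no-op, while toggling GRO with LRO off, or toggling LRO itself, still forces a reload.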

Tested:
 Verified that "ethtool -K eth0 gro {on|off}" doesn't blackout
 the NIC anymore

Google-Bug-Id: 8440442
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kravkov <dmitry@broadcom.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tg3_eee' · f6abf2b1

David S. Miller authored May 20, 2013

Nithin Nayak Sujir says:

====================
This series adds support for modifying EEE settings via ethtool. Since this can
impact Link Flap Avoidance (LFA), the driver pulls the current hardware settings if
LFA is enabled. This is similar to how we handle the link settings to avoid a flap.

v2: Fixes pointed out by Ben Hutchings.
 - Use MDIO_AN_EEE_LPABLE to set the lp_advertised field.
 - Check that tx_lpi_timer is within valid range.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Implement set/get_eee handlers · 1cbf9eb8

Nithin Sujir authored May 18, 2013

Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Simplify tg3_phy_eee_config_ok() by reusing tg3_eee_pull_config() · 5b6c273a

Nithin Sujir authored May 18, 2013

eee_config_ok() was checking only for a mismatch in the advertised settings.
This patch expands the scope of eee_config_ok() to check for mismatches in
the other eee settings as well. On a mismatch, a call to tg3_setup_eee() is
required to push the configured settings to the hardware.
Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Add tg3_eee_pull_config() function · 400dfbaa

Nithin Sujir authored May 18, 2013

Add tg3_eee_pull_config() to pull the settings from the hardware and
populate the eee structure.

If Link Flap Avoidance is enabled, we pull the eee settings from the hw
so as not to cause a phy reset on an eee config mismatch later. This
requires moving tg3_setup_eee() below tg3_pull_config() so that it does
not trample the existing settings.
Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Add ethtool_eee struct and tg3_setup_eee() · 9e2ecbeb

Nithin Sujir authored May 18, 2013

Add an eee structure and update it with eee settings. This will be used
for set/get_eee operations. Add common function tg3_setup_eee() that
will be used in the subsequent patches.
Reviewed-by: Ben Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

filter: do not output bpf image address for security reason · 16495445

Eric Dumazet authored May 17, 2013

Do not leak the starting address of BPF JIT code to non-root users,
as it might help intruders to perform an attack.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: bpf_jit_comp: secure bpf jit against spraying attacks · 314beb9b

Eric Dumazet authored May 17, 2013

hpa brought some security-related issues with BPF JIT on x86 to
my attention.

This patch makes sure the bpf generated code is marked read-only,
like other kernel text sections.

It also splits the unused space (we vmalloc() and only use a fraction of
the page) into two parts, so that the generated bpf code does not start at
a known offset in the page, but at a pseudo-random one.
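The random placement can be sketched as below, assuming a 4096-byte page and using rand() as a stand-in for a kernel pseudo-random source. `random_image_offset()` and the header length are hypothetical; the sketch only computes where the code would start inside the page-rounded allocation and does not model filling the unused bytes or marking the pages read-only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Round len up to a whole number of pages. */
static size_t round_up_pages(size_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/* Layout: [header][random-sized part of the unused space][code][rest].
 * Returns the code's offset from the start of the allocation and stores
 * the allocation size in *alloc_size. */
static size_t random_image_offset(size_t proglen, size_t hdrlen, size_t *alloc_size)
{
    size_t sz = round_up_pages(proglen + hdrlen);
    size_t hole = sz - (proglen + hdrlen);           /* unused bytes */
    size_t start = hole ? (size_t)rand() % hole : 0; /* pseudo-random split */

    assert(hdrlen + start + proglen <= sz);          /* code always fits */
    *alloc_size = sz;
    return hdrlen + start;
}
```

Because the split point is drawn per allocation, an attacker can no longer assume the JIT output begins right after a fixed-size header.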

Refs:
http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove bad timeout logic in fast recovery · 3e59cb0d

Yuchung Cheng authored May 17, 2013

tcp_timeout_skb() was intended to trigger fast recovery on timeout;
unfortunately, in reality it often causes spurious retransmission
storms during fast recovery. The telltale sign is a fast retransmit
beyond the highest sacked sequence (SND.FACK).

Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
to avoid spurious timeouts: when SND.UNA advances, the sender re-arms
the RTO and extends the timeout by icsk_rto. The sender does not
deduct the time elapsed since the packet at SND.UNA was sent.

But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
tcp_fastretrans_alert(), then tcp_timeout_skb() will mark as lost any
packet sent more than icsk_rto ago, including ones above the highest
sacked sequence. Most likely a large part of the scoreboard will be
marked.

If most packets are not lost then the subsequent DUPACKs with new SACK
blocks will cause the sender to continue to retransmit packets beyond
SND.FACK spuriously. Even if only one packet is lost the sender may
falsely retransmit almost the entire window.
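A toy model makes the over-marking concrete. It is a simplification, not the kernel code: each packet is reduced to a send time and a SACKed bit, and the hypothetical `mark_timedout()` walks from SND.UNA, marking every un-SACKed packet older than the RTO as lost and stopping at the first packet that has not timed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy scoreboard entry, in sequence order starting at SND.UNA. */
struct toy_pkt {
    unsigned int sent_ms;  /* send timestamp */
    bool sacked;
    bool lost;
};

/* Mark un-SACKed packets lost while they are older than rto_ms.
 * Returns the number of packets newly marked. */
static size_t mark_timedout(struct toy_pkt *q, size_t n, unsigned int now_ms,
                            unsigned int rto_ms)
{
    size_t marked = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (now_ms - q[i].sent_ms <= rto_ms)
            break;  /* first packet that has not timed out */
        if (!q[i].sacked && !q[i].lost) {
            q[i].lost = true;
            marked++;
        }
    }
    return marked;
}
```

With ten packets sent 1 ms apart, only the first really lost, packets 1 to 5 SACKed and packets 6 to 9 still in flight, an RTO of 200 ms evaluated 250 ms after the first send also marks packets 6 to 9 lost, all of them beyond SND.FACK.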

The situation becomes common in the world of bufferbloat: the RTT
continues to grow as the queue builds up but RTTVAR remains small and
close to the minimum 200ms. If a data packet is lost and the DUPACK
triggered by the next data packet is slightly delayed, then a spurious
retransmission storm forms.

As the original comment on tcp_timeout_skb() suggests: the usefulness
of this feature is questionable. It also wastes cycles walking the
sack scoreboard and is actually harmful because of false recovery.

It's time to remove this.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 May, 2013 2 commits

Documentation/sysctl/net.txt: fix (attribute removal). · 3cc7587b

Rami Rosen authored May 17, 2013

This patch removes the mention of the sysfs net_device weight attribute
(class/net/<device>/weight) from Documentation/sysctl/net.txt, since the
net sysfs weight attribute was removed by the following patch:

[NET]: Make NAPI polling independent of struct net_device objects
 bea3348e
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add support of peer address · caeaba79

Nicolas Dichtel authored May 16, 2013

This patch adds support for peer addresses in IPv6. For example, it makes
it possible to specify the remote end of a 6inY tunnel.
This was already possible in IPv4:
 ip addr add ip1 peer ip2 dev dev1

The peer address is specified with IFA_ADDRESS and the local address with
IFA_LOCAL (as explained in include/uapi/linux/if_addr.h).
Note that the API is not changed, because before this patch, it was not
possible to specify two different addresses in IFA_LOCAL and IFA_ADDRESS.
There is a small change for the dump: if the peer is different from ::,
IFA_ADDRESS will contain the peer address instead of the local address.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 May, 2013 5 commits

sparc: bpf_jit_comp: can call module_free() from any context · 5199dfe5

Eric Dumazet authored May 17, 2013

module_free()/vfree() takes care of the details, so we no longer need a
wrapper and a work_struct.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dev: remove duplicate 'skb->dev = dev' in dev_forward_skb() · 57b354e6

Nicolas Dichtel authored May 16, 2013

This was added by commit 59b9997b (Revert "net: maintain namespace
isolation between vlan and real device").
In fact, before the initial commit (the one that was reverted), this
statement was not present.
'skb->dev = dev' is already done in eth_type_trans(), which is called just
after.
Spotted-by: Alain Ritoux <alain.ritoux@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: enable user to unload netback module · b103f358

Wei Liu authored May 16, 2013

This patch enables the user to unload the netback module, which is useful
when the user wants to upgrade to a newer netback module without rebooting
the host.

Netfront cannot handle a netback removal event. As we cannot fix all possible
frontends, we add module get / put along with vif get / put to avoid
unloading netback while it is still in use. To unload the netback module, the
user needs to shut down all VMs, migrate them to another host, or unplug all
vifs beforehand.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: remove dead code · f1db320e

Wei Liu authored May 16, 2013

The array mmap_pages is never touched in the initialization function. This is a
remnant of a mapping mechanism which does not exist upstream. In current
upstream code this array only tracks usage of pages inside netback. Those
pages are allocated when constructing an SKB and passed directly to the network
subsystem.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: bpf_jit_comp: can call module_free() from any context · 650c8496

Eric Dumazet authored May 16, 2013

It looks like we can call module_free()/vfree() from softirq context,
so we no longer need a wrapper and a work_struct.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 May, 2013 8 commits

net/usb: r8152: Use module_usb_driver() · b4236daa

Sachin Kamat authored May 16, 2013

module_usb_driver() eliminates boilerplate and simplifies the code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/usb: r8152: Remove redundant version.h header inclusion · 18cf1f12

Sachin Kamat authored May 16, 2013

Inclusion of the version.h header is not necessary, as detected by
checkversion.pl.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: listen on multiple ports · 553675fb

stephen hemminger authored May 16, 2013

The commit 823aa873
  Author: stephen hemminger <stephen@networkplumber.org>
  Date:   Sat Apr 27 11:31:57 2013 +0000

    vxlan: allow choosing destination port per vxlan

introduced per-vxlan UDP port configuration but only did half of the
necessary work. It added a per-vxlan destination port for sending, but
overlooked the handling of multiple ports for incoming traffic.

This patch changes the listening port management to handle multiple
incoming UDP ports. The earlier per-namespace structure is now a hash
list per namespace.

It is also now possible to define the same virtual network id
with different UDP port values, which can be useful for migration.
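The per-port listener management can be sketched as a hash table of shared, reference-counted entries. All names here (`struct port_sock`, `port_get()`, `port_put()`, the 256-bucket table) are hypothetical; the real driver keeps UDP sockets on per-namespace hash lists with RCU, which this single-threaded sketch omits, and the reference-counting scheme is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PORT_HASH_SIZE 256u  /* hypothetical bucket count (power of two) */

/* Hypothetical listener record; a UDP socket in the real driver. */
struct port_sock {
    uint16_t port;
    unsigned int refcnt;     /* vxlan devices using this port */
    struct port_sock *next;  /* hash chain */
};

/* Hypothetical per-namespace table of listening ports. */
struct port_table {
    struct port_sock *heads[PORT_HASH_SIZE];
};

static unsigned int port_hash(uint16_t port)
{
    return port & (PORT_HASH_SIZE - 1);
}

static struct port_sock *port_find(struct port_table *t, uint16_t port)
{
    for (struct port_sock *s = t->heads[port_hash(port)]; s; s = s->next)
        if (s->port == port)
            return s;
    return NULL;
}

/* On device open: share an existing listener or create one (NULL on OOM). */
static struct port_sock *port_get(struct port_table *t, uint16_t port)
{
    struct port_sock *s = port_find(t, port);

    if (s) {
        s->refcnt++;
        return s;
    }
    s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    s->port = port;
    s->refcnt = 1;
    s->next = t->heads[port_hash(port)];
    t->heads[port_hash(port)] = s;
    return s;
}

/* On device close: unlink and free the listener after its last user. */
static void port_put(struct port_table *t, struct port_sock *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt)
        return;
    for (struct port_sock **pp = &t->heads[port_hash(s->port)]; *pp; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            break;
        }
    }
    free(s);
}
```

Two devices on the same port share one listener, while a device on another port gets its own, so traffic arriving on either port can be received.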
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: korina: initialize variables directly · e998fd41

Emilio López authored May 17, 2013

Clean up the code a bit to initialize the variables directly when
defining them.
Signed-off-by: Emilio López <emilio@elopez.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: davicom: dm9000: initialize variables directly · 35e729ac

Emilio López authored May 17, 2013

Clean up the code a bit to initialize the variables directly when
defining them.
Signed-off-by: Emilio López <emilio@elopez.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: apple: initialize variables directly · 3b0aaef8

Emilio López authored May 17, 2013

Clean up the code a bit to initialize the variables directly when
defining them.
Signed-off-by: Emilio López <emilio@elopez.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: sun: initialize variables directly · bfd428da

Emilio López authored May 17, 2013

Clean up the code a bit to initialize the variables directly when
defining them.
Signed-off-by: Emilio López <emilio@elopez.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next · 345af953

David S. Miller authored May 16, 2013

Marc Kleine-Budde says:

====================
This is a pull request for net-next/master. It consists of 4 patches by
Jingoo Han, which remove unnecessary platform_set_drvdata() calls, and a
patch by Laurent Navet converting the grcan driver to use
devm_ioremap_resource().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

16 May, 2013 6 commits

net: 3com: 3c509: remove unnecessary code · ed8a83a1

govindarajulu.v authored May 16, 2013

This patch removes unnecessary #if 0 code from 3c509.c.
Signed-off-by: govindarajulu.v <govindarajulu90@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/renesas: don't check resource with devm_ioremap_resource · d6a98c96

Wolfram Sang authored May 16, 2013

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: speedup tcp_fixup_rcvbuf() · d2cf4367

Eric Dumazet authored May 15, 2013

tcp_fixup_rcvbuf() contains a loop to estimate initial socket
rcv space needed for a given mss. With large MTU (like 64K on lo),
we can loop ~500 times and consume a lot of cpu cycles.

perf top output with 200 concurrent netperf -t TCP_CRR:

5.62%  netperf  [kernel.kallsyms]  [k] tcp_init_buffer_space
1.71%  netperf  [kernel.kallsyms]  [k] _raw_spin_lock
1.55%  netperf  [kernel.kallsyms]  [k] kmem_cache_free
1.51%  netperf  [kernel.kallsyms]  [k] tcp_transmit_skb
1.50%  netperf  [kernel.kallsyms]  [k] tcp_ack

Let's use a 100% factor, and remove the loop.

100% is needed anyway for the default tcp_adv_win_scale=1
value, and is also the maximum factor.
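The difference between the two estimates can be sketched with a hypothetical fixed per-skb overhead standing in for SKB_TRUESIZE() plus MAX_TCP_HEADER; the real overhead depends on the kernel build, and the later multiplication by the initial receive window is omitted. `rcvmem_loop()` models the removed loop and `rcvmem_fixed()` the 100% factor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SKB_TRUESIZE() overhead plus MAX_TCP_HEADER. */
#define SKETCH_OVERHEAD 1024u

static uint32_t truesize(uint32_t mss)
{
    return mss + SKETCH_OVERHEAD;
}

/* Usable window for 'space' bytes of buffer when tcp_adv_win_scale = 1:
 * half of the buffer, rounded up. */
static uint32_t win_from_space(uint32_t space)
{
    return space - (space >> 1);
}

/* Removed scheme: grow the estimate in 128-byte steps until it covers one mss. */
static uint32_t rcvmem_loop(uint32_t mss, unsigned int *iterations)
{
    uint32_t rcvmem = truesize(mss);

    *iterations = 0;
    while (win_from_space(rcvmem) < mss) {
        rcvmem += 128;
        (*iterations)++;
    }
    return rcvmem;
}

/* New scheme: a fixed 100% factor, i.e. twice the per-segment truesize. */
static uint32_t rcvmem_fixed(uint32_t mss)
{
    uint32_t rcvmem = 2 * truesize(mss);

    assert(win_from_space(rcvmem) >= mss);  /* holds by construction */
    return rcvmem;
}
```

With this assumed overhead, an mss of 65483 bytes takes 504 loop iterations, while the fixed factor yields a slightly larger estimate with a single multiplication.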

Refs: commit b49960a0
      ("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: can: ti_hecc: remove unnecessary platform_set_drvdata() · 5727dc6b

Jingoo Han authored May 07, 2013

The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d063
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, there is no need to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: can: flexcan: remove unnecessary platform_set_drvdata() · 688a2e74

Jingoo Han authored May 07, 2013

The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d063
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, there is no need to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: can: c_can: remove unnecessary platform_set_drvdata() · 5e946e56

Jingoo Han authored May 07, 2013

The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d063
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, there is no need to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>