Commits · f576e24ffaf2c6b01af389e3bad3342681a8b84f · nexedi / linux

26 Apr, 2007 40 commits

[NET] ETHERNET: Use htons() where appropriate. · f576e24f

YOSHIFUJI Hideaki authored Mar 07, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f576e24f

[NET] CORE: Use htons() where appropriate. · 724800d6

YOSHIFUJI Hideaki authored Mar 25, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

724800d6

[NET] BLUETOOTH: Use cpu_to_le{16,32}() where appropriate. · aca3192c

YOSHIFUJI Hideaki authored Mar 25, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

aca3192c

[NET] ATM: Use htons() where appropriate. · acde4855

YOSHIFUJI Hideaki authored Mar 25, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

acde4855

[NET] 8021Q: Use htons() where appropriate. · b93b7eeb

YOSHIFUJI Hideaki authored Mar 25, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b93b7eeb

[NET] 802: Use hton{s,l}() where appropriate. · 2953fd24

YOSHIFUJI Hideaki authored Mar 25, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2953fd24

[UDP]: Clean up UDP-Lite receive checksum · 759e5d00

Herbert Xu authored Mar 25, 2007

This patch eliminates some duplicate code for the verification of
receive checksums between UDP-Lite and UDP.  It does this by
introducing __skb_checksum_complete_head, which is identical to
__skb_checksum_complete apart from the fact that it takes
a length parameter rather than checksumming the first skb->len bytes.
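
The split described above can be pictured with a minimal userspace sketch. It is not the kernel code: csum_head() and csum_full() are hypothetical names, the buffer is a plain byte array rather than an skb, and the checksum is the RFC 1071 Internet checksum. The point is only that the full-length variant becomes a thin wrapper around the variant that takes a coverage length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: Internet checksum (RFC 1071) over the first
 * len bytes of buf, where buf holds buf_len bytes in total.
 */
static uint16_t csum_head(const uint8_t *buf, size_t buf_len, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(len <= buf_len);   /* coverage cannot exceed the packet */

    /* Add up big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    /* An odd trailing byte is padded with a zero byte. */
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Full coverage is just the partial variant with len == buf_len. */
static uint16_t csum_full(const uint8_t *buf, size_t buf_len)
{
    return csum_head(buf, buf_len, buf_len);
}
```

With this shape, a caller that needs partial coverage passes the coverage length, and every other caller keeps using the full-length wrapper.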

As a result UDP-Lite will be able to use hardware checksum offload
for packets which do not use partial coverage checksums.  It also
means that UDP-Lite loopback no longer does unnecessary checksum
verification.

If any NICs start supporting UDP-Lite, this would also start working
automatically.

This patch removes the assumption that msg_flags has MSG_TRUNC clear
upon entry in recvmsg.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

759e5d00

[UDP6]: Restore sk_filter optimisation · 1ab6eb62

Herbert Xu authored Mar 06, 2007

This reverts the changeset

    [IPV6]: UDPv6 checksum.

    We always need to check UDPv6 checksum because it is mandatory.

The sk_filter optimisation has nothing to do with whether we verify the
checksum.  It simply postpones verification to the point when the user
calls recv or poll.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ab6eb62

[IPV4]: Optimize inet_getpeer() · 243bbcaa

Eric Dumazet authored Mar 06, 2007

1) Some sysctl vars are declared __read_mostly

2) We can avoid updating stack[] when doing an AVL lookup only.

The lookup() macro is extended to receive a second parameter, which may be
NULL for a pure lookup (no need to save the AVL path). This removes
unnecessary instructions, because the compiler knows whether this _stack
parameter is NULL.
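
A rough sketch of the optional-stack idea follows. It is a simplification and not the inetpeer code: it uses a plain binary search tree instead of an AVL tree, an inline function instead of a macro, and hypothetical names (struct node, MAX_DEPTH, lookup with a depth out-parameter). When the caller passes a literal NULL for stack, the compiler can drop the path-saving branch after inlining.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 32   /* assumed bound on tree height for this sketch */

struct node {
    unsigned int key;
    struct node *left, *right;
};

/*
 * Find key below root. If stack is non-NULL, the nodes visited before
 * the match are saved in stack[] and their count is stored in *depth.
 * If stack is NULL, nothing is recorded (pure lookup).
 */
static inline struct node *lookup(struct node *root, unsigned int key,
                                  struct node **stack, size_t *depth)
{
    struct node *n = root;
    size_t d = 0;

    while (n && n->key != key) {
        if (stack) {              /* removable when stack is a constant NULL */
            assert(d < MAX_DEPTH);
            stack[d++] = n;
        }
        n = key < n->key ? n->left : n->right;
    }
    if (stack && depth)
        *depth = d;
    return n;
}
```

A caller that only wants to know whether an entry exists would call lookup(root, key, NULL, NULL); a caller that is about to insert or delete would pass a path array so the tree can be rebalanced afterwards.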

The text size of net/ipv4/inetpeer.o is 2063 bytes instead of 2107 on x86_64.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

243bbcaa

[TCP] TCP Yeah: cleanup · 43e68392

Stephen Hemminger authored Mar 06, 2007

Eliminate need for full 64/64 divide to compute queue.
Variable maxqueue was really a constant.
Fix indentation.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

43e68392

[TCP] tcp_cubic: faster cube root · c5f5877c

Stephen Hemminger authored Mar 25, 2007

The Newton-Raphson method is quadratically convergent, so
only a small fixed number of steps is necessary.
Therefore it is faster to unroll the loop. Since div64_64 is no longer
inline it won't cause code explosion.
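
To illustrate the unrolling, here is a minimal userspace sketch of an integer cube root with a fixed number of Newton-Raphson steps. It is not the tcp_cubic code: cube_root_u64() and NEWTON_STEP() are hypothetical names, the initial estimate comes from the bit length of the input, and a final correction step is added so the result is exactly floor(cbrt(a)). All intermediate products are kept in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* One Newton-Raphson step for f(x) = x^3 - a: x <- (2x + a / x^2) / 3. */
#define NEWTON_STEP(x, a) ((2 * (x) + (a) / ((x) * (x))) / 3)

static uint32_t cube_root_u64(uint64_t a)
{
    uint64_t x, t;
    unsigned int bits = 0;

    if (a < 8)                      /* roots 0 and 1 are handled directly */
        return a ? 1 : 0;

    for (t = a; t; t >>= 1)         /* bit length of a */
        bits++;

    /* Power of two at or above the true root and at most twice it. */
    x = (uint64_t)1 << ((bits + 2) / 3);

    /* Fixed, unrolled iteration count instead of a convergence loop. */
    x = NEWTON_STEP(x, a);
    x = NEWTON_STEP(x, a);
    x = NEWTON_STEP(x, a);
    x = NEWTON_STEP(x, a);
    x = NEWTON_STEP(x, a);
    x = NEWTON_STEP(x, a);

    assert(x >= 2);                 /* iterates never drop below the root */

    /* Integer Newton may settle one above the root; step back if so. */
    if (x > a / (x * x))
        x--;
    return (uint32_t)x;
}
```

Because the starting point is within a factor of two of the root and the error roughly squares on every step, six steps are enough for any 64-bit input in this sketch.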

Also fixes a bug that can occur if x^2 was bigger than 32 bits.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5f5877c

[ATM] ENI: Convert struct timeval to ktime_t. · 8570419f

YOSHIFUJI Hideaki authored Mar 06, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8570419f

[NETLINK]: Limit NLMSG_GOODSIZE to 8K. · fc910a27

David S. Miller authored Mar 25, 2007

Signed-off-by: David S. Miller <davem@davemloft.net>

fc910a27

[IPV6] ADDRCONF: Fix possible inet6_ifaddr leakage with CONFIG_OPTIMISTIC_DAD. · ca043569

YOSHIFUJI Hideaki authored Feb 28, 2007

The inet6_ifaddr for source address of RS is leaked if the address
is not an optimistic address.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca043569

[IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support. · 95c385b4

Neil Horman authored Apr 25, 2007

Nominally an autoconfigured IPv6 address is added to an interface in the
Tentative state (as per RFC 2462). Addresses in this state remain in this
state while the Duplicate Address Detection process operates on them to
determine their uniqueness on the network. During this period, these
tentative addresses may not be used for communication, increasing the time
before a node may be able to communicate on a network. Using Optimistic
Duplicate Address Detection, autoconfigured addresses may be used
immediately for communication on the network, as long as certain rules are
followed to avoid conflicts with other nodes during the Duplicate Address
Detection process.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

95c385b4

[IPV6] IP6TUNNEL: Enable to control the handled inner protocol. · 502b0935

Yasuyuki Kozakai authored Nov 30, 2006

Before IPv4-over-IPv6 tunnel support was added, ip6_tunnel allowed only
IPPROTO_IPV6 in configurations from userland. This patch also allows userland
to set IPPROTO_IPIP and 0 (wildcard). ip6_tunnel handles only the allowed
inner protocols.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

502b0935

[IPV6] IP6TUNNEL: Rename functions ip6ip6_* to ip6_tnl_*. · 3144581c

Yasuyuki Kozakai authored Feb 10, 2007

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3144581c

[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel. · c4d3efaf

Yasuyuki Kozakai authored Feb 15, 2007

Some notes:
- Protocol number IPPROTO_IPIP is used for IPv4 over IPv6 packets.
- If IP6_TNL_F_USE_ORIG_TCLASS is set, TOS in IPv4 header is copied to
  Traffic Class in outer IPv6 header on xmit.
- IP6_TNL_F_USE_ORIG_FLOWLABEL is ignored on xmit of IPv4 packets, because
  IPv4 header does not have flow label.
- Kernel sends ICMP error if IPv4 packet is too big on xmit, even if
  DF flag is not set.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4d3efaf

[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_xmit(). · 61ec2aec

Yasuyuki Kozakai authored Nov 05, 2006

This makes it possible to add IPv4/IPv6-specific handling later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

61ec2aec

[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_rcv(). · 8359925b

Yasuyuki Kozakai authored Nov 03, 2006

This makes it possible to add IPv4/IPv6-specific handling later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8359925b

[IPV6] IP6TUNNEL: Split out generic routine in ip6ip6_err(). · e490d1d8

Yasuyuki Kozakai authored Oct 31, 2006

This makes it possible to add IPv4/IPv6-specific error handling later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e490d1d8

[IPV6]: Decentralize EXPORT_SYMBOLs. · 7159039a

YOSHIFUJI Hideaki authored Feb 22, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

7159039a

[NETLINK]: Mirror UDP MSG_TRUNC semantics. · b558ff79

David S. Miller authored Mar 06, 2007

If the user passes MSG_TRUNC in via msg_flags, return
the full packet size, not the truncated size.
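
The semantics being mirrored can be observed from userspace. The sketch below is an illustration under stated assumptions, not a netlink test: it assumes Linux, and it uses an AF_UNIX datagram socketpair as a stand-in so that it runs without any network setup (recv(2) documents the MSG_TRUNC flag for UNIX datagram sockets since Linux 3.4). truncated_recv_len() is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one datagram of msg_len bytes, then receive it into a buffer of
 * buf_len bytes with MSG_TRUNC set. Returns what recv() reported, or -1
 * if the socket setup or the send failed.
 */
static ssize_t truncated_recv_len(size_t msg_len, size_t buf_len)
{
    char msg[256], buf[256];
    int sv[2];
    ssize_t n;

    assert(msg_len <= sizeof(msg) && buf_len <= sizeof(buf));

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    memset(msg, 'x', msg_len);
    if (send(sv[0], msg, msg_len, 0) != (ssize_t)msg_len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* With MSG_TRUNC the return value is the real datagram length,
     * even if only buf_len bytes were copied into buf. */
    n = recv(sv[1], buf, buf_len, MSG_TRUNC);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A caller can compare the returned length with its buffer size to detect truncation and to learn how large a buffer the message would have needed.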

Idea from Herbert Xu and Thomas Graf.
Signed-off-by: David S. Miller <davem@davemloft.net>

b558ff79

[NET]: convert network timestamps to ktime_t · b7aa0bf7

Eric Dumazet authored Apr 19, 2007

We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks:
- Fixed resolution of one microsecond.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t, which is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
an 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure is adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)
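
The space and resolution argument can be sketched in userspace. The code below is not the kernel's ktime_t: stamp_t and its helpers are hypothetical stand-ins that store a timestamp as signed 64-bit nanoseconds and convert to and from struct timeval for interfaces that still expect microseconds.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

#define STAMP_NSEC_PER_SEC  1000000000LL
#define STAMP_NSEC_PER_USEC 1000LL

/* Hypothetical 8-byte timestamp: nanoseconds in a signed 64-bit value. */
typedef struct {
    int64_t ns;
} stamp_t;

static stamp_t stamp_from_timeval(struct timeval tv)
{
    stamp_t s;

    s.ns = (int64_t)tv.tv_sec * STAMP_NSEC_PER_SEC +
           (int64_t)tv.tv_usec * STAMP_NSEC_PER_USEC;
    return s;
}

/* Convert back for legacy users; sub-microsecond digits are dropped. */
static struct timeval stamp_to_timeval(stamp_t s)
{
    struct timeval tv;

    assert(s.ns >= 0);   /* this sketch handles non-negative stamps only */
    tv.tv_sec = s.ns / STAMP_NSEC_PER_SEC;
    tv.tv_usec = (s.ns % STAMP_NSEC_PER_SEC) / STAMP_NSEC_PER_USEC;
    return tv;
}
```

Storing nanoseconds internally and converting only at the boundary is what lets a microsecond interface and a future nanosecond interface share one stored value.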

Note: this patch includes a bug fix in
compat_sock_get_timestamp(), where an "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7aa0bf7

[NET]: div64_64 consolidate (rev3) · 3927f2e8

Stephen Hemminger authored Mar 25, 2007

Here is the current version of the 64 bit divide common code.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3927f2e8

[NET]: Convert xtime.tv_sec to get_seconds() · 9d729f72

James Morris authored Mar 04, 2007

Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d729f72

[PKTGEN]: fix device name handling · 39df232f

Stephen Hemminger authored Mar 04, 2007

Since devices can change name and other weirdness, don't hold onto
a copy of the device name; instead use a pointer to the output device.

Fix a couple of leaks in error handling path as well.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

39df232f

[PKTGEN]: don't use __constant_htonl() · d5f1ce9a

Stephen Hemminger authored Mar 04, 2007

The existing htonl() macro is smart enough to generate the same code as
__constant_htonl(), and it looks cleaner.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5f1ce9a

[PKTGEN]: use random32 · 5fa6fc76

Stephen Hemminger authored Mar 04, 2007

Can use random32() now.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fa6fc76

[PKTGEN]: use pr_debug · 25c4e53a

Stephen Hemminger authored Mar 04, 2007

Remove private debug macro and replace with standard version
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

25c4e53a

[NET]: Keep sk_backlog near sk_lock · fa438ccf

Eric Dumazet authored Mar 04, 2007

sk_backlog is a critical field of struct sock. (known famous words)

It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(),
tcp_v4_rcv(), sk_receive_skb().

It really makes sense to place it next to sk_lock, because sk_backlog is only
used after sk_lock is locked (and thus its memory cache line is in L1 cache).
This should reduce cache misses and sk_lock acquisition time.

(In theory, we could move only the head pointer near sk_lock and leave tail
far away, because 'tail' is normally not so hot, but keep it simple :) )
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa438ccf

[TCP]: FRTO undo response falls back to ratehalving one if ECEd · e317f6f6

Ilpo Järvinen authored Mar 02, 2007

Undoing ssthresh is disabled in fastretrans_alert whenever
FLAG_ECE is set by clearing prior_ssthresh. The clearing does
not protect FRTO because FRTO operates before fastretrans_alert.
Moving the clearing of prior_ssthresh earlier seems to be a
suboptimal solution to the FRTO case because then FLAG_ECE will
cause a second ssthresh reduction in try_to_open (the first
occurred when FRTO was entered). So instead, FRTO falls back
immediately to the rate halving response, which switches TCP to
CA_CWR state preventing the latter reduction of ssthresh.

If the first ECE arrived before the ACK after which FRTO is able
to decide RTO as spurious, prior_ssthresh is already cleared.
Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
set also in the following ACKs resulting in rate halving response
that sees TCP is already in CA_CWR, which again prevents an extra
ssthresh reduction on that round-trip.

If the first ECE arrived before RTO, ssthresh has already been
adapted and prior_ssthresh remains cleared on entry because TCP
is in CA_CWR (the same applies also to a case where FRTO is
entered more than once and ECE comes in the middle).

High_seq must not be touched after tcp_enter_cwr because CWR
round-trip calculation depends on it.

I believe that after this patch, FRTO should be ECN-safe and
even able to take advantage of synergy benefits.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

e317f6f6

[TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr) · e01f9d77

Ilpo Järvinen authored Mar 02, 2007

A local variable for icsk was created but this change was
missing. Spotted by Jarek Poplawski.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

e01f9d77

[TCP] Sysctl documentation: tcp_frto_response · 89808060

Ilpo Järvinen authored Feb 27, 2007

In addition, fixed minor things in tcp_frto sysctl.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

89808060

[TCP]: Add two new spurious RTO responses to FRTO · 3cfe3baa

Ilpo Järvinen authored Feb 27, 2007

New sysctl tcp_frto_response is added to select amongst these
responses:
	- Rate halving based; reuses CA_CWR state (default)
	- Very conservative; used to be the only one available (=1)
	- Undo cwr; undoes ssthresh and cwnd reductions (=2)

The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
doing a second reduction there has to be prevented. In addition,
to keep things nice on 80 cols screen, a local variable was
added.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

3cfe3baa

[TCP]: Correct reordering detection change (no FRTO case) · c5e7af0d

Ilpo Järvinen authored Feb 23, 2007

The reordering detection must work also when FRTO has not been
used at all which was the original intention of mine, just the
expression of the idea was flawed.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5e7af0d

[TCP]: Make snd_cwnd_clamp a u32. · e0ef57cc

David S. Miller authored Feb 22, 2007

Signed-off-by: David S. Miller <davem@davemloft.net>

e0ef57cc

[TCP]: Keep copied_seq, rcv_wup and rcv_nxt together. · 54287cc1

Eric Dumazet authored Feb 22, 2007

In an oprofile study I noticed a cache miss in tcp_rcv_established() when
reading copied_seq.

ffffffff80400a80 <tcp_rcv_established>: /* tcp_rcv_established total: 4034293  2.0400 */

 55493  0.0281 :ffffffff80400bc9:   mov    0x4c8(%r12),%eax copied_seq
543103  0.2746 :ffffffff80400bd1:   cmp    0x3e0(%r12),%eax   rcv_nxt    

if (tp->copied_seq == tp->rcv_nxt &&
        len - tcp_header_len <= tp->ucopy.len) {

In this function, the cache line 0x4c0 -> 0x500 is used only to read the
'copied_seq' field.

rcv_wup and copied_seq should be next to the rcv_nxt field, to lower the
number of active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...)
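
The layout argument can be checked mechanically. The sketch below is an illustration only: the two structs are hypothetical miniatures, not struct tcp_sock, and it assumes a 64-byte cache line and a structure that starts on a cache-line boundary. The 0xe4-byte filler reproduces the 0xe8 distance between the two offsets in the profile above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size */

/* Hypothetical "before" layout: copied_seq sits 0xe8 bytes after rcv_nxt. */
struct tp_spread {
    uint32_t rcv_nxt;
    uint8_t  other_fields[0xe4];
    uint32_t copied_seq;
    uint32_t rcv_wup;
};

/* Hypothetical "after" layout: the three hot fields are adjacent. */
struct tp_grouped {
    uint32_t rcv_nxt;
    uint32_t copied_seq;
    uint32_t rcv_wup;
    uint8_t  other_fields[0xe4];
};

/* Index of the cache line that holds a given byte offset. */
static size_t cache_line_of(size_t offset, size_t line_size)
{
    assert(line_size != 0);
    return offset / line_size;
}

/* True if both offsets fall into the same cache line. */
static int same_cache_line(size_t off_a, size_t off_b)
{
    return cache_line_of(off_a, CACHE_LINE) == cache_line_of(off_b, CACHE_LINE);
}
```

Applied to the offsets in the profile, 0x3e0 and 0x4c8 land in different 64-byte lines, which is consistent with the extra miss the commit describes.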

As you suggested, I changed tcp_create_openreq_child() so that these fields
are changed together, to avoid adding a new store buffer stall.

Patch is 64bit friendly (no new hole because of alignment constraints)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

54287cc1

[TCP]: struct *sock argument renamed: sp -> sk · cf4c6bf8

Ilpo Järvinen authored Feb 22, 2007

In general, TCP code uses "sk" for struct sock pointer.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf4c6bf8

[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. · 886236c1

John Heffner authored Mar 25, 2007

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

886236c1