Commits · 92101b3b2e3178087127709a556b091dae314e9e · Kirill Smelkov / linux

23 Jul, 2012 15 commits

ipv4: Prepare for change of rt->rt_iif encoding. · 92101b3b

David S. Miller authored Jul 23, 2012

Use inet_iif() consistently, and for TCP record the input interface of
cached RX dst in inet sock.

rt->rt_iif is going to be encoded differently, so that we can
legitimately cache input routes in the FIB info more aggressively.

When the input interface is "use SKB device index" the rt->rt_iif will
be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4
specific code, and as well it should since doing the route caching for
ipv6 is pointless at the moment since it is not inspected in the ipv6
input paths yet.

Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
obsolete set to a non-zero value to force invocation of the check
callback.
Signed-off-by: David S. Miller <davem@davemloft.net>

92101b3b

ipv4: Remove all RTCF_DIRECTSRC handliing. · fe3edf45

David S. Miller authored Jul 23, 2012

The last and final kernel user, ICMP address replies,
has been removed.
Signed-off-by: David S. Miller <davem@davemloft.net>

fe3edf45

ipv4: Really ignore ICMP address requests/replies. · 838942a5

David S. Miller authored Jul 23, 2012

Alexey removed kernel side support for requests, and the
only thing we do for replies is log a message if something
doesn't look right.

As Alexey's comment indicates, this belongs in userspace (if
anywhere), and thus we can safely just get rid of this code.
Signed-off-by: David S. Miller <davem@davemloft.net>

838942a5

decnet: Don't set RTCF_DIRECTSRC. · 8acfaa94

David S. Miller authored Jul 23, 2012

It's an ipv4 defined route flag, and only ipv4 uses it.
Signed-off-by: David S. Miller <davem@davemloft.net>

8acfaa94

net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. · e7d4b18c

Saurabh authored Jul 23, 2012

With CONFIG_SPARSE_RCU_POINTER=y sparse identified references which did not
specificy __rcu in ip_vti.c
Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7d4b18c

ipv4: Remove redundant assignment · 8fe5cb87

Lin Ming authored Jul 23, 2012

It is redundant to set no_addr and accept_local to 0 and then set them
with other values just after that.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fe5cb87

rds: set correct msg_namelen · 06b6a1cf

Weiping Pan authored Jul 23, 2012

Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.

How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.

1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2 run ./rds_server on one console

3 run ./rds_client on another console

4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

/***************** rds_client.c ********************/

int main(void)
{
	int sock_fd;
	struct sockaddr_in serverAddr;
	struct sockaddr_in toAddr;
	char recvBuffer[128] = "data from client";
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if (sock_fd < 0) {
		perror("create socket error\n");
		exit(1);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4001);

	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind() error\n");
		close(sock_fd);
		exit(1);
	}

	memset(&toAddr, 0, sizeof(toAddr));
	toAddr.sin_family = AF_INET;
	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	toAddr.sin_port = htons(4000);
	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	if (sendmsg(sock_fd, &msg, 0) == -1) {
		perror("sendto() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("client send data:%s\n", recvBuffer);

	memset(recvBuffer, '\0', 128);

	msg.msg_name = &toAddr;
	msg.msg_namelen = sizeof(toAddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;
	if (recvmsg(sock_fd, &msg, 0) == -1) {
		perror("recvmsg() error\n");
		close(sock_fd);
		exit(1);
	}

	printf("receive data from server:%s\n", recvBuffer);

	close(sock_fd);

	return 0;
}

/***************** rds_server.c ********************/

int main(void)
{
	struct sockaddr_in fromAddr;
	int sock_fd;
	struct sockaddr_in serverAddr;
	unsigned int addrLen;
	char recvBuffer[128];
	struct msghdr msg;
	struct iovec iov;

	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
	if(sock_fd < 0) {
		perror("create socket error\n");
		exit(0);
	}

	memset(&serverAddr, 0, sizeof(serverAddr));
	serverAddr.sin_family = AF_INET;
	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
	serverAddr.sin_port = htons(4000);
	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
		perror("bind error\n");
		close(sock_fd);
		exit(1);
	}

	printf("server is waiting to receive data...\n");
	msg.msg_name = &fromAddr;

	/*
	 * I add 16 to sizeof(fromAddr), ie 32,
	 * and pay attention to the definition of fromAddr,
	 * recvmsg() will overwrite sock_fd,
	 * since kernel will copy 32 bytes to userspace.
	 *
	 * If you just use sizeof(fromAddr), it works fine.
	 * */
	msg.msg_namelen = sizeof(fromAddr) + 16;
	/* msg.msg_namelen = sizeof(fromAddr); */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_iov->iov_base = recvBuffer;
	msg.msg_iov->iov_len = 128;
	msg.msg_control = 0;
	msg.msg_controllen = 0;
	msg.msg_flags = 0;

	while (1) {
		printf("old socket fd=%d\n", sock_fd);
		if (recvmsg(sock_fd, &msg, 0) == -1) {
			perror("recvmsg() error\n");
			close(sock_fd);
			exit(1);
		}
		printf("server received data from client:%s\n", recvBuffer);
		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
		printf("new socket fd=%d\n", sock_fd);
		strcat(recvBuffer, "--data from server");
		if (sendmsg(sock_fd, &msg, 0) == -1) {
			perror("sendmsg()\n");
			close(sock_fd);
			exit(1);
		}
	}

	close(sock_fd);
	return 0;
}
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06b6a1cf

openvswitch: potential NULL deref in sample() · 5b3e7e6c

Dan Carpenter authored Jul 23, 2012

If there is no OVS_SAMPLE_ATTR_ACTIONS set then "acts_list" is NULL and
it leads to a NULL dereference when we call nla_len(acts_list). This
is a static checker fix, not something I have seen in testing.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b3e7e6c

tcp: dont drop MTU reduction indications · 563d34d0

Eric Dumazet authored Jul 23, 2012

ICMP messages generated in output path if frame length is bigger than
mtu are actually lost because socket is owned by user (doing the xmit)

One example is the ipgre_tunnel_xmit() calling
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

We had a similar case fixed in commit a34a101e (ipv6: disable GSO on
sockets hitting dst_allfrag).

Problem of such fix is that it relied on retransmit timers, so short tcp
sessions paid a too big latency increase price.

This patch uses the tcp_release_cb() infrastructure so that MTU
reduction messages (ICMP messages) are not lost, and no extra delay
is added in TCP transmits.
Reported-by: Maciej Żenczykowski <maze@google.com>
Diagnosed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

563d34d0

bnx2x: Add new 57840 device IDs · c3def943

Yuval Mintz authored Jul 23, 2012

The 57840 boards come in two flavours: 2 x 20G and 4 x 10G.
To better differentiate between the two flavours, a separate device ID
was assigned to each.
The silicon default value is still the currently supported 57840 device ID
(0x168d), and since a user can damage the nvram (e.g., 'ethtool -E')
the driver will still support this device ID to allow the user to amend the
nvram back into a supported configuration.

Notice this patch contains lines longer than 80 characters (strings).
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3def943

tcp: avoid oops in tcp_metrics and reset tcpm_stamp · 9a0a9502

Julian Anastasov authored Jul 23, 2012

	In tcp_tw_remember_stamp we incorrectly checked tw
instead of tm, it can lead to oops if the cached entry is
not found.

	tcpm_stamp was not updated in tcpm_check_stamp when
tcpm_suck_dst was called, move the update into tcpm_suck_dst,
so that we do not call it infinitely on every next cache hit
after TCP_METRICS_TIMEOUT.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a0a9502

niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value · 9b70749e

Shuah Khan authored Jul 20, 2012

Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return
value to be consistent with the rest of the checks after niu_rbr_add_page()
calls in this file.
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b70749e

niu: Fix to check for dma mapping errors. · ec2deec1

Shuah Khan authored Jul 20, 2012

Fix Neptune ethernet driver to check dma mapping error after map_page()
interface returns.
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec2deec1

net: Fix references to out-of-scope variables in put_cmsg_compat() · 81881047

Jesper Juhl authored Jul 22, 2012

In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
either the 'ctv' or 'cts' local variables inside the 'if
(!COMPAT_USE_64BIT_TIME)' branch.

Those variables go out of scope at the end of the 'if' statement, so
when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
it may be refering to - not good.

Fix the problem by simply giving 'ctv' and 'cts' function scope.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

81881047

Merge branch 'kill_rtcache' · 5e9965c1

David S. Miller authored Jul 22, 2012

The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former of which is
controllable by external entitites.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.

The general flow of this patch series is that first the routing cache
is removed.  We build a completely new rtable entry every lookup
request.

Next we make some simplifications due to the fact that removing the
routing cache causes several members of struct rtable to become no
longer necessary.

Then we need to make some amends such that we can legally cache
pre-constructed routes in the FIB nexthops.  Firstly, we need to
invalidate routes which are hit with nexthop exceptions.  Secondly we
have to change the semantics of rt->rt_gateway such that zero means
that the destination is on-link and non-zero otherwise.

Now that the preparations are ready, we start caching precomputed
routes in the FIB nexthops.  Output and input routes need different
kinds of care when determining if we can legally do such caching or
not.  The details are in the commit log messages for those changes.

The patch series then winds down with some more struct rtable
simplifications and other tidy ups that remove unnecessary overhead.

On a SPARC-T3 output route lookups are ~876 cycles.  Input route
lookups are ~1169 cycles with rpfilter disabled, and about ~1468
cycles with rpfilter enabled.

These measurements were taken with the kbench_mod test module in the
net_test_tools GIT tree:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git

That GIT tree also includes a udpflood tester tool and stresses
route lookups on packet output.

For example, on the same SPARC-T3 system we can run:

	time ./udpflood -l 10000000 10.2.2.11

with routing cache:
real    1m21.955s       user    0m6.530s        sys     1m15.390s

without routing cache:
real    1m31.678s       user    0m6.520s        sys     1m25.140s

Performance undoubtedly can easily be improved further.

For example fib_table_lookup() performs a lot of excessive
computations with all the masking and shifting, some of it
conditionalized to deal with edge cases.

Also, Eric's no-ref optimization for input route lookups can be
re-instated for the FIB nexthop caching code path.  I would be really
pleased if someone would work on that.

In fact anyone suitable motivated can just fire up perf on the loading
of the test net_test_tools benchmark kernel module.  I spend much of
my time going:

bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2
bash# perf report

Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben
Hutchings, and others.
Signed-off-by: David S. Miller <davem@davemloft.net>

5e9965c1

22 Jul, 2012 22 commits

net: ethernet: davinci_emac: add pm_runtime support · 3ba97381

Mark A. Greer authored Jul 20, 2012

Add pm_runtime support to the TI Davinci EMAC driver.

CC: Sekhar Nori <nsekhar@ti.com>
CC: Kevin Hilman <khilman@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ba97381

net: ethernet: davinci_emac: Remove unnecessary #include · c6f0b4ea

Mark A. Greer authored Jul 20, 2012

The '#include <mach/mux.h>' line in davinci_emac.c
causes a compile error because that header file
isn't found.  It turns out that the #include isn't
needed because the driver isn't (and shoudn't be)
touching the mux anyway, so remove it.

CC: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6f0b4ea

net-next: minor cleanups for bonding documentation · f8b72d36

Rick Jones authored Jul 20, 2012

The section titled "Configuring Bonding for Maximum Throughput" is
actually section twelve not thirteen, and there are a couple of words
spelled incorrectly.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8b72d36

net: netprio_cgroup: rework update socket logic · 406a3c63

John Fastabend authored Jul 20, 2012

Instead of updating the sk_cgrp_prioidx struct field on every send
this only updates the field when a task is moved via cgroup
infrastructure.

This allows sockets that may be used by a kernel worker thread
to be managed. For example in the iscsi case today a user can
put iscsid in a netprio cgroup and control traffic will be sent
with the correct sk_cgrp_prioidx value set but as soon as data
is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
is updated with the kernel worker threads value which is the
default case.

It seems more correct to only update the field when the user
explicitly sets it via control group infrastructure. This allows
the users to manage sockets that may be used with other threads.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

406a3c63

tun: experimental zero copy tx support · 0690899b

Michael S. Tsirkin authored Jul 20, 2012

Let vhost-net utilize zero copy tx when used with tun.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0690899b

skbuff: export skb_copy_ubufs · dcc0fb78

Michael S. Tsirkin authored Jul 20, 2012

Export skb_copy_ubufs so that modules can orphan frags.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcc0fb78

net: orphan frags on receive · 1080e512

Michael S. Tsirkin authored Jul 20, 2012

zero copy packets are normally sent to the outside
network, but bridging, tun etc might loop them
back to host networking stack. If this happens
destructors will never be called, so orphan
the frags immediately on receive.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1080e512

tun: orphan frags on xmit · 868eefeb

Michael S. Tsirkin authored Jul 20, 2012

tun xmit is actually receive of the internal tun
socket. Orphan the frags same as we do for normal rx path.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

868eefeb

skbuff: convert to skb_orphan_frags · 70008aa5

Michael S. Tsirkin authored Jul 20, 2012

Reduce code duplication a bit using the new helper.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

70008aa5

skbuff: add an api to orphan frags · a353e0ce

Michael S. Tsirkin authored Jul 20, 2012

Many places do
       if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
		skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a353e0ce

ixgbe: Fix build with PCI_IOV enabled. · d47e12d6
David S. Miller authored Jul 22, 2012
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
d47e12d6

forcedeth: advertise transmit time stamping · 7491302d

Richard Cochran authored Jul 22, 2012

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>

Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7491302d

e1000e: advertise transmit time stamping · 7b1115e0

Richard Cochran authored Jul 22, 2012

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>

Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: David S. Miller <davem@davemloft.net>

7b1115e0

e1000: advertise transmit time stamping · e10df2c6

Richard Cochran authored Jul 22, 2012

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>

Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: David S. Miller <davem@davemloft.net>

e10df2c6

bnx2x: advertise transmit time stamping · be53ce1e

Richard Cochran authored Jul 22, 2012

This driver now offers software transmit time stamping, so it should
advertise that fact via ethtool. Compile tested only.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be53ce1e

rtnl: Add #ifdef CONFIG_RPS around num_rx_queues reference · 1d69c2b3

Mark A. Greer authored Jul 20, 2012

Commit 76ff5cc9
(rtnl: allow to specify number of rx and tx queues
on device creation) added a reference to the net_device
structure's 'num_rx_queues' member in

	net/core/rtnetlink.c:rtnl_fill_ifinfo()

However, the definition for 'num_rx_queues' is surrounded
by an '#ifdef CONFIG_RPS' while the new reference to it is
not.  This causes a compile error when CONFIG_RPS is not
defined.

Fix the compile error by surrounding the new reference to
'num_rx_queues' by an '#ifdef CONFIG_RPS'.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d69c2b3

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · fd183f6a

David S. Miller authored Jul 22, 2012

Jeff Kirsher says:

--------------------
This series contains updates to ixgbe and ixgbevf.
 ...
Akeem G. Abodunrin (1):
  igb: reset PHY in the link_up process to recover PHY setting after
    power down.

Alexander Duyck (8):
  ixgbe: Drop probe_vf and merge functionality into ixgbe_enable_sriov
  ixgbe: Change how we check for pre-existing and assigned VFs
  ixgbevf: Add lock around mailbox ops to prevent simultaneous access
  ixgbevf: Add support for PCI error handling
  ixgbe: Fix handling of FDIR_HASH flag
  ixgbe: Reduce Rx header size to what is actually used
  ixgbe: Use num_tcs.pg_tcs as upper limit for TC when checking based
    on UP
  ixgbe: Use 1TC DCB instead of disabling DCB for MSI and legacy
    interrupts

Don Skidmore (1):
  ixgbe: add support for new 82599 device

Greg Rose (1):
  ixgbevf: Fix namespace issue with ixgbe_write_eitr

John Fastabend (2):
  ixgbe: fix RAR entry counting for generic and fdb_add()
  ixgbe: remove extra unused queues in DCB + FCoE case
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fd183f6a

Merge branch 'vhost-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 5dc7c779
David S. Miller authored Jul 22, 2012

5dc7c779

wimax: fix printk format warnings · 2a304bb8

Randy Dunlap authored Jul 21, 2012

Fix printk format warnings in drivers/net/wimax/i2400m:

drivers/net/wimax/i2400m/control.c: warning: format '%zu' expects argument of type 'size_t', but argument 4 has type 'ssize_t' [-Wformat]
drivers/net/wimax/i2400m/control.c: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'ssize_t' [-Wformat]
drivers/net/wimax/i2400m/usb-fw.c: warning: format '%zu' expects argument of type 'size_t', but argument 4 has type 'ssize_t' [-Wformat]

I don't see these warnings on x86. The warnings that are quoted above
are from Geert's kernel build reports.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Cc: wimax@linuxwimax.org
Signed-off-by: David S. Miller <davem@davemloft.net>

2a304bb8

sctp: Implement quick failover draft from tsvwg · 5aa93bcf

Neil Horman authored Jul 21, 2012

I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters.  While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established.  I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5aa93bcf

net: fix race condition in several drivers when reading stats · e3906486

Kevin Groeneveld authored Jul 21, 2012

Fix race condition in several network drivers when reading stats on 32bit
UP architectures.  These drivers update their stats in a BH context and
therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
stats.
Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3906486

ipv4: tcp: set unicast_sock uc_ttl to -1 · 0980e56e

Eric Dumazet authored Jul 20, 2012

Set unicast_sock uc_ttl to -1 so that we select the right ttl,
instead of sending packets with a 0 ttl.

Bug added in commit be9f4a44 (ipv4: tcp: remove per net tcp_sock)
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0980e56e

21 Jul, 2012 3 commits

igb: reset PHY in the link_up process to recover PHY setting after power down. · 76886596

Akeem G. Abodunrin authored Jul 17, 2012

There was a previous patch to resolve issue with 82576 losing PHY setting
after PHY power down. However that previous implementation triggered speed
mismatch and occasional link lost. Now, this patch resolves both initial
PHY setting and speed mismatch issues.
Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

76886596

ixgbe: Use 1TC DCB instead of disabling DCB for MSI and legacy interrupts · b724e9f2

Alexander Duyck authored Jul 17, 2012

This change makes it so that we can use 1TC DCB in the case of MSI and
legacy interrupts.  The advantage to this is that it allows us to fully
support FCoE w/ DCB instead of having to drop to link flow control only
when using these interrupt modes.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

b724e9f2

ixgbe: add support for new 82599 device · b6dfd939

Don Skidmore authored Jul 11, 2012

This patch adds support for a new 82599 device that supports WoL.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

b6dfd939