Commits · 91c8028c95a468da9c0aafd2d91cf24e27784206 · nexedi / linux

15 Jun, 2012 3 commits

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 91c8028c
David S. Miller authored Jun 15, 2012

91c8028c

ipv6: Handle PMTU in ICMP error handlers. · 81aded24

David S. Miller authored Jun 15, 2012

One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
to handle the error pass the 32-bit info cookie in network byte order
whereas ipv4 passes it around in host byte order.

Like the ipv4 side, we have two helper functions.  One for when we
have a socket context and one for when we do not.

ip6ip6 tunnels are not handled here, because they handle PMTU events
by essentially relaying another ICMP packet-too-big message back to
the original sender.

This patch allows us to get rid of rt6_do_pmtu_disc().  It handles all
kinds of situations that simply cannot happen when we do the PMTU
update directly using a fully resolved route.

In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
likely be removed or changed into a BUG_ON() check.  We should never
have a prefixed ipv6 route when we get there.

Another piece of strange history here is that TCP and DCCP, unlike in
ipv4, never invoke the update_pmtu() method from their ICMP error
handlers.  This is incredibly astonishing since this is the context
where we have the most accurate context in which to make a PMTU
update, namely we have a fully connected socket and associated cached
socket route.
Signed-off-by: David S. Miller <davem@davemloft.net>

81aded24

ipv4: Handle PMTU in all ICMP error handlers. · 36393395

David S. Miller authored Jun 14, 2012

With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.

Create two helper functions to facilitate this.

1) ipv4_sk_update_pmtu()

   This updates the PMTU when we have a socket context to
   work with.

2) ipv4_update_pmtu()

   Raw version, used when no socket context is available.  For this
   interface, we essentially just pass in explicit arguments for
   the flow identity information we would have extracted from the
   socket.

   And you'll notice that ipv4_sk_update_pmtu() is simply implemented
   in terms of ipv4_update_pmtu()

Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key().  This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like.  Instead, we only want the route that
would get us to the node described by the outermost IP header.
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36393395

14 Jun, 2012 8 commits

ixgbe: Check PTP Rx timestamps via BPF filter · 1d1a79b5

Jacob Keller authored May 22, 2012

This patch fixes a potential Rx timestamp deadlock that causes the Rx
timestamping to stall indefinitely. The issue could occur when a PTP packet is
timestamped by hardware but never reaches the Rx queue. In order to prevent a
permanent loss of timestamping, the RXSTMP(L/H) registers have to be read to
unlock them. (This used to only occur when a packet that was timestamped
reached the software.) However the registers can't be read early otherwise
there is no way to correlate them to the packet.

This patch introduces a filter function which can be used to determine if a
packet should have been timestamped. Supplied with the filter setup by the
hwtstamp ioctl, check to make sure the PTP protocol and message type match the
expected values. If so, then read the timestamp registers (to free them.) At
this point check the descriptor bit, if the bit is set then we know this
packet correlates to the timestamp stored in the RXTSTAMP registers.
Otherwise, assume that packet was dropped by the hardware, and ignore this
timestamp value. However, we have at least unlocked the rxtstamp registers for
future timestamping.

Due to the way the driver handles skb data, it cannot be directly accessed. In
order to work around this, a copy of the skb data into a linear buffer is
made. From this buffer it becomes possible to read the data correctly
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

1d1a79b5

ixgbe: PTP Fix hwtstamp mode settings · c19197a7

Jacob Keller authored May 22, 2012

When enabling the hwtstamp mode for Rx timestamping the V2 ptp event type
specific modes (Delay Request and Sync) have been rolled into the V2 all event
packet modes, in order to more accurately represent what hardware is doing.
Hardware always timestamps the Path delay packets when a V2 mode is selected,
regardless of what type was selected (in order to always support Path delay
mode). However this means the user selected modes of timestamping only Sync or
Delay Request is not truly supported. This patch correctly sets the mode for
the hwtstamp config and returns to the user that all V2 event packets will be
timestamped.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

c19197a7

ixgbe: ptp code cleanup · 0ede4a60

Jacob Keller authored May 22, 2012

This patch fixes two minor nits from Richard Cochran. The first is a case of
ambitious line wrapping that wasn't necessary. The second is to re-order the
flag checks for PPS support. Previously, the hardware test was done first, and
the interrupt flag test was done second. Now, test the interrupt flag and use
the unlikely macro.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

0ede4a60

ixgbe: do not compile ixgbe_sysfs.c when CONFIG_IXGBE_HWMON is not set · 6cbc52ef

Emil Tantilov authored May 16, 2012

ixgbe_sysfs.c is only needed when CONFIG_IXGBE_HWMON is configured in the
kernel.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Don Skidmore <Donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6cbc52ef

ixgbe: align flow control DV macros with datasheet · 4f8a91ad

John Fastabend authored Mar 28, 2012

The flow control DV macros are used to calculate the flow control
high and low thresholds. This patch annotates these macros slightly
better and fixes the issues below.

The macro variables are renamed LINK to _max_frame_link and TC to
_max_frame_tc. This was to avoid confusion and make them more
readable. It was found that people auditing the code read TC to be
'traffic class' in the 802.1Q definition instead of the max frame
size of the tc. Hopefully it is clear now.

This audit also found the following real deviations from the
theoretical values. Fixed in this patch.

  * I multiplied the DV calculations by (36/25) which always
    evaluates to 1. This does not match the intended theoretical
    value of 1.44.

  * IXGBE_BT2KB added 1023 to account for rounding however this
    really should be 8 * 1023 - 1 to account for division by 8k.

  * x2 multiplication of max frame in DV calculations to account
    for updated hardware recommendations.

With this patch the DV values are inline with the recommendations
in the 82599 and 82598 data sheets. Its worth noting I did not
see any dropped frames with flow control on in my experiments without
this patch. However aligning with the hardware specs and
recommendations seems like a good idea here to account for worst
case scenarios.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

4f8a91ad

e1000e: use more informative logging macros when netdev not yet registered · 185095fb

Bruce Allan authored Jun 07, 2012

Based on a report from Ethan Zhao, before calling register_netdev() the
driver should be using logging macros that do not display the potentially
confusing "(unregistered net_device)" yet still display the useful driver
name and PCI bus/device/function.
Reported-by: Ethan Zhao <ethan.kernel@gmail.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

185095fb

dcbnl: Use BUG_ON() instead of BUG() · b3908e22

Thomas Graf authored Jun 13, 2012

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3908e22

dcbnl: Silence harmless gcc warning about uninitialized reply_nlh · 39912f9c
Thomas Graf authored Jun 13, 2012
```
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
39912f9c

13 Jun, 2012 19 commits

bonding: drop_monitor aware · 04502430

Eric Dumazet authored Jun 13, 2012

When packets are dropped in TX path, its better to use kfree_skb()
instead of dev_kfree_skb() to give proper drop_monitor events.

Also move the kfree_skb() call after read_unlock() in bond_alb_xmit()
and bond_xmit_activebackup()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04502430

dcbnl: Use type safe nlmsg_data() · 7a282bc3

Thomas Graf authored Jun 13, 2012

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a282bc3

dcbnl: Move dcb app allocation into dcb_app_add() · 4e4f2f69

Thomas Graf authored Jun 13, 2012

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e4f2f69

dcbnl: Move dcb app lookup code into dcb_app_lookup() · 716b31ab
Thomas Graf authored Jun 13, 2012
```
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
716b31ab

dcbnl: Return consistent error codes · 3d1f4869

Thomas Graf authored Jun 13, 2012

EMSGSIZE - ran out of space while constructing message
EOPNOTSUPP - driver/hardware does not support operation
ENODEV - network device not found
EINVAL - invalid message
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d1f4869

dcbnl: Use dcbnl_newmsg() where possible · ab6d4707

Thomas Graf authored Jun 13, 2012

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab6d4707

dcbnl: Remove now unused dcbnl_reply() · 77c6849d

Thomas Graf authored Jun 13, 2012

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

77c6849d

dcbnl: Shorten all command handling functions · 7be99413

Thomas Graf authored Jun 13, 2012

Allocating and sending the skb in dcb_doit() allows for much
shorter and cleaner command handling functions.

The huge switch statement is replaced with an array based definition
of the handling function and reply message type.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

7be99413

dcbnl: Prepare framework to shorten handling functions · 33a03aad

Thomas Graf authored Jun 13, 2012

There is no need to allocate and send the reply message in each
handling function separately. Instead, the reply skb can be allocated
and sent in dcb_doit() directly.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

33a03aad

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 43b03f1f

David S. Miller authored Jun 12, 2012

Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.
Signed-off-by: David S. Miller <davem@davemloft.net>

43b03f1f

ethtool: Make more commands available to unprivileged processes · 2da45db2

Ben Hutchings authored Jun 12, 2012

'Get' commands should generally not require CAP_NET_ADMIN, with
the exception of those that expose internal state.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2da45db2

net-next: add dev_loopback_xmit() to avoid duplicate code · 95603e22

Michel Machado authored Jun 12, 2012

Add dev_loopback_xmit() in order to deduplicate functions
ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).

I was about to reinvent the wheel when I noticed that
ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
need and are not IP-only functions, but they were not available to reuse
elsewhere.

ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
understand that this is harmless, and should be in dev_loopback_xmit().
Signed-off-by: Michel Machado <michel@digirati.com.br>
CC: "David S. Miller" <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jpirko@redhat.com>
CC: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95603e22

bonding: remove packet cloning in recv_probe() · de063b70

Eric Dumazet authored Jun 11, 2012

Cloning all packets in input path have a significant cost.

Use skb_header_pointer()/skb_copy_bits() instead of pskb_may_pull() so
that recv_probe handlers (bond_3ad_lacpdu_recv / bond_arp_rcv /
rlb_arp_recv ) dont touch input skb.

bond_handle_frame() can avoid the skb_clone()/dev_kfree_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de063b70

usbnet: don't initialize transfer buffer before submit status URB · 072c0559

tom.leiming@gmail.com authored Jun 11, 2012

The line below in intr_complete isn't needed,

	memset(urb->transfer_buffer, 0, urb->transfer_buffer_length);

so just remove it.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

072c0559

usbnet: remove declaration for intr_complete · 24ead299

tom.leiming@gmail.com authored Jun 11, 2012

Remove declaration for intr_complete so that ctags may be happy to
decrease duplicated symbols, also decrease one line code.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24ead299

usbnet: remove flag of EVENT_DEV_WAKING · 4a5a14d3

tom.leiming@gmail.com authored Jun 11, 2012

The flag of EVENT_DEV_WAKING is not used any more, so just remove it.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a5a14d3

usbnet:cdc-phonet: remove usb_get/put_dev in .probe and .disconnect · 50e7d153

tom.leiming@gmail.com authored Jun 11, 2012

usb_device is parent device of usb_interface in the view of driver
model, so its reference count is always held during .probe/.disconnect
of usb_interface instance.

This patch just removes the unnecessay usb_get/put_dev.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50e7d153

usbnet:pegasus: remove usb_get/put_dev in .probe and .disconnect · 5c2f0513

tom.leiming@gmail.com authored Jun 11, 2012

usb_device is parent device of usb_interface in the view of driver
model, so its reference count is always held during .probe/.disconnect
of usb_interface instance.

This patch just removes the unnecessay usb_get/put_dev.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c2f0513

usbnet: remove usb_get/put_dev in .probe and .disconnect · ef9d884d

tom.leiming@gmail.com authored Jun 11, 2012

usb_device is parent device of usb_interface in the view of driver
model, so its reference count is always held during .probe/.disconnect
of usb_interface instance.

This patch just removes the unnecessay usb_get/put_dev.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef9d884d

12 Jun, 2012 6 commits

bonding: Fix corrupted queue_mapping · 5ee31c68

Eric Dumazet authored Jun 12, 2012

In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping.  This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.

When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit.  In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.

Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping.  In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.

If we want to save skb->queue_mapping into skb->cb[], best place is to
add a field in struct qdisc_skb_cb, to make sure it wont conflict with
other layers (eg : Qdiscc, Infiniband...)

This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8
bytes :

netem qdisc for example assumes it can store an u64 in it, without
misalignment penalty.

Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[].
The largest user is CHOKe and it fills it.

Based on a previous patch from Tom Herbert.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Tom Herbert <therbert@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ee31c68

ipv4: Add interface option to enable routing of 127.0.0.0/8 · d0daebc3

Thomas Graf authored Jun 12, 2012

Routing of 127/8 is tradtionally forbidden, we consider
packets from that address block martian when routing and do
not process corresponding ARP requests.

This is a sane default but renders a huge address space
practically unuseable.

The RFC states that no address within the 127/8 block should
ever appear on any network anywhere but it does not forbid
the use of such addresses outside of the loopback device in
particular. For example to address a pool of virtual guests
behind a load balancer.

This patch adds a new interface option 'route_localnet'
enabling routing of the 127/8 address block and processing
of ARP requests on a specific interface.

Note that for the feature to work, the default local route
covering 127/8 dev lo needs to be removed.

Example:
  $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
  $ ip route del 127.0.0.0/8 dev lo table local
  $ ip addr add 127.1.0.1/16 dev eth0
  $ ip route flush cache

V2: Fix invalid check to auto flush cache (thanks davem)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0daebc3

bonding:record primary when modify it via sysfs · 8a93664d

Weiping Pan authored Jun 10, 2012

If we modify primary via sysfs and it is not a valid slave,
we should record it for future use, and this behavior is the same with
bond_check_params().
Signed-off-by: Weiping Pan <wpan@redhat.com>
Acked-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a93664d

Merge branch 'master' of git://1984.lsi.us.es/net · 5aa04d3a
David S. Miller authored Jun 12, 2012

5aa04d3a

Merge branch 'master' of... · 0440507b

John W. Linville authored Jun 12, 2012

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

0440507b

tilegx network driver: initial support · e3d62d7e

Chris Metcalf authored Jun 07, 2012

This change adds support for the tilegx network driver based on the
GXIO IORPC support in the tilegx software stack, using the on-chip
mPIPE packet processing engine.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3d62d7e

11 Jun, 2012 4 commits

phy: Use pr_<level> · 8d242488

Joe Perches authored Jun 09, 2012

Use a more current logging style.

Add pr_fmt and missing newlines.
Remove embedded prefixes.
Neaten phy_print_status to avoid using KERN_CONT.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d242488

tg3: Apply short DMA frag workaround to 5906 · b7abee6e

Matt Carlson authored Jun 07, 2012

5906 devices also need the short DMA fragment workaround.  This patch
makes the necessary change.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7abee6e

af_packet: use sizeof instead of constant in spkt_device · de74e92a

danborkmann@iogearbox.net authored Jun 10, 2012

This small patch removes access to the last element of the spkt_device
array through a constant. Instead, it is accessed by sizeof() to respect
possible changes in if_packet.h.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

de74e92a

net: stmmac: Fix clock en-/disable calls · 883ffd6e

Stefan Roese authored Jun 07, 2012

clk_{un}prepare is mandatory for platforms using common clock framework.
Since these drivers are used by SPEAr platform, which supports common
clock framework, add clk_{un}prepare() support for them. Otherwise
the clocks are not correctly en-/disabled and ethernet support doesn't
work.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Viresh Kumar <viresh.linux@gmail.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

883ffd6e