- 01 Apr, 2014 16 commits
-
-
Thomas Gleixner authored
There is no point in toggling the RX LED for every packet. Especially if we have a full FIFO we want to avoid everything we can. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thomas Gleixner authored
The function loads the message object from the hardware to get the payload length. The previous patch stores that information in an array, so we can avoid the hardware access. Remove the hardware access and move the LED toggle outside of the spinlocked region. Toggle the LED only once when at least one packet has been received. Binary size shrinks along with the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thomas Gleixner authored
We can avoid the HW access in the TX cleanup path for retrieving the DLC of the sent packet if we store the DLC in a private array. Ideally this should be handled in the can_echo_skb functions, but I leave that exercise to the CAN folks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
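A minimal sketch of the idea; the structure and helper names here are illustrative, not the actual c_can code:

    /* Cache the DLC at xmit time so the TX cleanup path never has to
     * read the message object back from the hardware.
     */
    struct tx_dlc_cache {
            u8 dlc[32];                     /* one slot per TX message object */
    };

    static void cache_dlc(struct tx_dlc_cache *c, unsigned int obj, u8 dlc)
    {
            c->dlc[obj] = dlc;              /* written from ndo_start_xmit() */
    }

    static u8 cached_dlc(struct tx_dlc_cache *c, unsigned int obj)
    {
            return c->dlc[obj];             /* TX cleanup: no HW access needed */
    }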
-
Thomas Gleixner authored
commit 4ce78a83 ("can: c_can: Speed up rx_poll function") hyped a performance improvement by reducing the access to the interrupt pending register from a dual 16 bit to a single 16 bit access. Wow!

Thereby it crippled the driver to cast the 16 msg objects in stone, which is completely braindead as contemporary hardware has up to 128 message objects. Supporting larger object buffers is a major surgery, but it'd be definitely worth it especially as the driver does not support HW message filtering ....

The logic of the "FIFO" implementation is to split the FIFO in half. For the lower half we read the buffers and clear the interrupt pending bit, but keep the newdat bit set, so the HW will queue above those buffers. When we read out the last low buffer then we reenable all the low half buffers by clearing the newdat bit.

The upper half buffers clear the newdat and the interrupt pending bit right away as we know that the lower half bits are clear and give us a headstart against the hardware.

Now the implementation is:

    transfer_message_object()
    read_object_and_put_into_skb();

    if (obj < END_OF_LOW_BUF)
        clear_intpending(obj)
    else if (obj > END_OF_LOW_BUF)
        clear_intpending_and_newdat(obj)
    else if (obj == END_OF_LOW_BUF)
        clear_newdat_of_all_low_objects()

The hardware allows to avoid most of the mess simply because we can tell the transfer_message_object() function to clear bits right away. So we can be clever and do:

    if (obj <= END_OF_LOW_BUF)
        ctrl = TRANSFER_MSG | CLEAR_INTPND;
    else
        ctrl = TRANSFER_MSG | CLEAR_INTPND | CLEAR_NEWDAT;

    transfer_message_object(ctrl)
    read_object_and_put_into_skb();

    if (obj == END_OF_LOW_BUF)
        clear_newdat_of_all_low_objects()

So we save a complete control operation on all message objects except the one which is the end of the low buffer. That's a few microseconds per object. I'm not adding a boasting profile to that, simply because it's self explaining.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thomas Gleixner authored
If every other line contains line breaks, that's a clear sign for indentation level madness. Split out the inner loop and move the code to a separate function. gcc creates slightly worse code for that, but we'll fix that in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [mkl: adjusted subject] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thomas Gleixner authored
The network core does not serialize the access to the hardware. The xmit related code lets the following happen:

    CPU0                        CPU1
    interrupt()
      do_poll()
        c_can_do_tx()
        Fiddle with HW and      xmit()
        internal data           Fiddle with HW and
                                internal data

due to the complete lack of serialization. Add proper locking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
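A rough sketch of the locking pattern the patch describes; the lock name and placement are illustrative (the real driver keeps it in its private data):

    static DEFINE_SPINLOCK(tx_lock);        /* illustrative; really in priv */

    static void xmit_path(void)
    {
            unsigned long flags;

            spin_lock_irqsave(&tx_lock, flags);
            /* fiddle with HW and internal TX bookkeeping */
            spin_unlock_irqrestore(&tx_lock, flags);
    }

    /* The interrupt/poll side (c_can_do_tx()) takes the same lock, so the
     * two paths can no longer interleave their hardware accesses.
     */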
-
Thomas Gleixner authored
The rx_poll code has the following gem:

    if (msg_ctrl_save & IF_MCONT_EOB)
        return num_rx_pkts;

The EOB bit is the indicator for the hardware that this is the last configured FIFO object. But this object can contain valid data, if we manage to free up objects before the overrun case hits. Now if the code exits due to the EOB bit set, then this buffer is stale and the interrupt bit and NewDat bit of the buffer are still set. Results in a nice interrupt storm unless we come into an overrun situation where the MSGLST bit gets set.

    ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val: 00008001 pend 00008001
    ksoftirqd/0-3 [000] ..s. 79.124176: c_can_poll: rx_poll: val: 00008000 pend 00008000
    ksoftirqd/0-3 [000] ..s. 79.124187: c_can_poll: rx_poll: val: 00008002 pend 00008002
    ksoftirqd/0-3 [000] ..s. 79.124256: c_can_poll: rx_poll: val: 00008000 pend 00008000
    ksoftirqd/0-3 [000] ..s. 79.124267: c_can_poll: rx_poll: val: 00008000 pend 00008000

The amazing thing is that the check of the MSGLST (aka overrun bit) used to be after the check of the EOB bit. That was "fixed" in commit 5d0f801a ("can: c_can: Fix RX message handling, handle lost message before EOB"). But the author of this "fix" did not even understand that the EOB check is broken as well. Again a simple solution: remove the check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thomas Gleixner authored
The lost message handling is broken in several ways.

1) Clearing the message lost flag is done by writing 0 to the message control register of the object.

       #define IF_MCONT_CLR_MSGLST (0 << 14)

   That clears the object buffer configuration in the worst case, which results in a loss of the EOB flag. That leaves the FIFO chain without a limit and causes a complete lockup of the HW.

2) In case that the error skb allocation fails, the code happily claims that it handed down a packet. Just an accounting bug, but ....

3) The code adds a lot of pointless overhead to that error case, where we need to get stuff done as fast as possible to avoid more packet loss:

   - printk an annoying error message
   - reread the object buffer for nothing

Fix is simple again (see the sketch below):

- Use the already known MSGCTRL content and only clear the MSGLST bit
- Fix the buffer accounting by adding a proper return code
- Remove the pointless operations

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
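A sketch of the first point of the fix; IF_MCONT_MSGLST is the real flag (BIT(14)), while the register helper is an illustrative stand-in for the driver's IF register write:

    extern void write_msg_ctrl(unsigned int obj, u32 ctrl);  /* illustrative */

    /* Clear only the "message lost" bit and keep the rest of the cached
     * MSGCTRL content, so EOB and the FIFO configuration survive.
     */
    static void clear_msglst(unsigned int obj, u32 msg_ctrl)
    {
            msg_ctrl &= ~IF_MCONT_MSGLST;
            write_msg_ctrl(obj, msg_ctrl);
    }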
-
Thomas Gleixner authored
The buffer handling of c_can has been broken forever. That leads to message reordering:

    ksoftirqd/0-3 [000] ..s. 79.123776: c_can_poll: rx_poll: val: 00007fff
    ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val: 00008001

What happens is:

    CPU                             HW
                                    queue new packet into obj 16 (0-15 are busy)
    read obj 1-15
    return because pending is 0
                                    set pending obj 16 -> pending reg 8000
                                    queue new packet into obj 1
                                    set pending obj 1 -> pending reg 8001

So the current algorithm reads the newest message first, which violates the ordering rules of CAN. Add proper handling of that situation by analyzing the contents of the pending register for gaps (see the sketch below).

This does NOT fix the message object corruption which can lead to interrupt storms. That's addressed in the next patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
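The gap analysis can be pictured roughly like this; a sketch of the ordering idea only, not the in-tree algorithm, with illustrative accessors. Pending bits above a gap belong to packets that arrived before the low objects were refilled, so they must be drained first:

    #include <linux/bitops.h>

    extern u32 read_pending(void);                  /* illustrative */
    extern void process_object(unsigned int obj);   /* illustrative */

    static void rx_in_order(void)
    {
            u32 pend = read_pending();              /* e.g. 0x8001 */
            u32 low_run = 0;
            unsigned int i;

            /* contiguous run of pending bits starting at object 1:
             * these hold the NEWEST packets after a wraparound
             */
            for (i = 0; i < 16 && (pend & BIT(i)); i++)
                    low_run |= BIT(i);

            /* bits above the gap were queued earlier: drain them first */
            for (i = 0; i < 16; i++)
                    if ((pend & ~low_run) & BIT(i))
                            process_object(i + 1);

            for (i = 0; i < 16; i++)
                    if (low_run & BIT(i))
                            process_object(i + 1);
    }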
-
Thomas Gleixner authored
The hardware has two message control interfaces, but the code only uses the first one. So on SMP the following can be observed:

    CPU0            CPU1
    rx_poll()
    write IF1       xmit()
                    write IF1
    write IF1

That results in corrupted message object configurations. The TX/RX is not serialized globally; it's only serialized on a core.

Simple solution: Let RX use IF1 and TX use IF2 and all is good.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thomas Gleixner authored
The function is broken in several ways:

- The function does not wait for the init to complete. That can take quite some microseconds.
- No protection against being called for two chips at the same time. SMP is such a new thing, right?

Clear the start and the init done bit unconditionally and wait for both bits to be clear. In the enable path set the init bit and wait for the init done bit. Add proper locking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
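As a sketch, the fixed disable path could look like this; the register accessors and bit layout are illustrative assumptions, the waiting and the locking are the point:

    #include <linux/delay.h>
    #include <linux/errno.h>
    #include <linux/spinlock.h>

    #define RAMINIT_INIT  BIT(0)            /* illustrative bit layout */
    #define RAMINIT_START BIT(1)

    static DEFINE_SPINLOCK(raminit_lock);   /* serialize multiple chips */

    extern u32 read_raminit(void);          /* illustrative accessors */
    extern void write_raminit(u32 val);

    static int raminit_disable(void)
    {
            int timeout = 1000;

            spin_lock(&raminit_lock);
            /* clear START and the init done bit unconditionally ... */
            write_raminit(read_raminit() & ~(RAMINIT_START | RAMINIT_INIT));
            /* ... and actually wait until the hardware caught up */
            while ((read_raminit() & (RAMINIT_START | RAMINIT_INIT)) && --timeout)
                    udelay(1);
            spin_unlock(&raminit_lock);

            return timeout ? 0 : -ETIMEDOUT;
    }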
-
Thomas Gleixner authored
According to the documentation the CPU must wait for CONTROL_INIT to be cleared before writing to the baudrate registers. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
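A minimal sketch of that wait; CONTROL_INIT is the real control register bit, while read_ctrl() stands in for the driver's register accessor:

    extern u16 read_ctrl(void);             /* illustrative accessor */

    static int wait_init_cleared(void)
    {
            int retries = 1000;

            while ((read_ctrl() & CONTROL_INIT) && --retries)
                    udelay(1);

            /* only now is it safe to write the baudrate registers */
            return retries ? 0 : -ETIMEDOUT;
    }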
-
Marc Kleine-Budde authored
This patch adds return value checking to all direct and indirect users of c_can_set_bittiming(). Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Marc Kleine-Budde authored
This patch adds the missing netif_napi_del() to the free_c_can_dev() function. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Robert Schwebel authored
This patch fixes the name of the parameter to configure the sample point used in iproute2's ip command. The correct spelling is "sample-point", not "sample_point". Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Bjorn Van Tilt authored
Fixed a memory leak when an error occurred in the transmit function: in the error handling the urb wasn't freed before returning. There was also a call to the usb_unanchor_urb() function although the urb wasn't anchored. Signed-off-by: Bjorn Van Tilt <bjorn.vantilt@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
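One plausible shape of the fixed error path (a sketch under the description above, with illustrative surrounding names, not the literal driver hunk):

    /* inside ndo_start_xmit(), after building the urb */
    err = usb_submit_urb(urb, GFP_ATOMIC);
    if (err) {
            /* no usb_unanchor_urb() here: this urb was never anchored */
            usb_free_urb(urb);              /* the missing free -> leak fixed */
            dev_kfree_skb(skb);
            stats->tx_dropped++;
            return NETDEV_TX_OK;
    }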
-
- 31 Mar, 2014 1 commit
-
-
Alexander Aring authored
While transmitting over an at86rf231 device and unloading the module I got:

    [   29.643073] WARNING: CPU: 0 PID: 3 at kernel/workqueue.c:1335 __queue_work+0xb4/0x224()
    [   29.651457] Modules linked in: at86rf230(-) autofs4
    [   29.656612] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W 3.14.0-rc6-01602-g902659e-dirty #294
    [   29.666490] [<c00124f0>] (unwind_backtrace) from [<c0010ad0>] (show_stack+0x10/0x14)
    [   29.674628] [<c0010ad0>] (show_stack) from [<c0032c80>] (warn_slowpath_common+0x60/0x80)
    [   29.683116] [<c0032c80>] (warn_slowpath_common) from [<c0032d30>] (warn_slowpath_null+0x18/0x20)
    [   29.692329] [<c0032d30>] (warn_slowpath_null) from [<c0045b08>] (__queue_work+0xb4/0x224)
    [   29.700906] [<c0045b08>] (__queue_work) from [<c0045cc8>] (queue_work_on+0x50/0x78)
    [   29.708944] [<c0045cc8>] (queue_work_on) from [<c05669cc>] (mac802154_tx+0x1e4/0x240)
    [   29.717164] [<c05669cc>] (mac802154_tx) from [<c0471814>] (dev_hard_start_xmit+0x2f0/0x43c)
    [   29.725926] [<c0471814>] (dev_hard_start_xmit) from [<c04878d0>] (sch_direct_xmit+0x64/0x2a0)
    [   29.734867] [<c04878d0>] (sch_direct_xmit) from [<c0487c38>] (__qdisc_run+0x12c/0x18c)
    [   29.743169] [<c0487c38>] (__qdisc_run) from [<c046e1b0>] (net_tx_action+0xe0/0x178)
    [   29.751205] [<c046e1b0>] (net_tx_action) from [<c0036690>] (__do_softirq+0x100/0x264)
    [   29.759420] [<c0036690>] (__do_softirq) from [<c0036818>] (run_ksoftirqd+0x24/0x4c)
    [   29.767453] [<c0036818>] (run_ksoftirqd) from [<c005232c>] (smpboot_thread_fn+0x128/0x13c)
    [   29.776121] [<c005232c>] (smpboot_thread_fn) from [<c004c3fc>] (kthread+0xd0/0xe4)
    [   29.784061] [<c004c3fc>] (kthread) from [<c000da88>] (ret_from_fork+0x14/0x2c)
    [   29.791628] ---[ end trace 3406ff24bd973834 ]---

The problem is that there are still interrupts after deregistering the ieee802154 device. This patch masks all interrupts in the at86rf2xx chips before deregistering the device.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
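Conceptually the fix amounts to this sketch; RG_IRQ_MASK is the real at86rf2xx register, while the write helper is an illustrative stand-in for the driver's register accessor:

    /* Mask every interrupt source before the ieee802154 device goes away,
     * so a late IRQ can no longer queue work against freed state.
     */
    static void sketch_remove(struct at86rf230_local *lp)
    {
            write_reg(lp, RG_IRQ_MASK, 0);          /* illustrative helper */
            ieee802154_unregister_device(lp->dev);
    }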
-
- 29 Mar, 2014 4 commits
-
-
David S. Miller authored
Paul Durrant says:

====================
xen-netback: fix rx slot estimation

Sander Eikelenboom reported an issue with ring overflow in netback in 3.14-rc3. This turns out to be because of a bug in the ring slot estimation code. This patch series fixes the slot estimation, fixes the BUG_ON() that was supposed to catch the issue that Sander ran into and also makes a small fix to start_new_rx_buffer().

v3:
- Added a cap of MAX_SKB_FRAGS to estimate in patch #2

v2:
- Added BUG_ON() to patch #1
- Added more explanation to patch #3
====================

Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paul Durrant authored
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption that meta_slots_used == ring slots used. This is not necessarily the case for GSO packets, because the non-prefix GSO protocol consumes one more ring slot than meta-slot for the 'extra_info'. This patch changes the test to actually check ring slots. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paul Durrant authored
The worst-case estimate for skb ring slot usage in xenvif_rx_action() fails to take fragment page_offset into account. The page_offset does, however, affect the number of times the fragmentation code calls start_new_rx_buffer() (i.e. consumes another slot) and the worst-case estimate should assume that it will always return true. This patch adds the page_offset into the DIV_ROUND_UP for each frag. Unfortunately some frontends aggressively limit the number of requests they post into the shared ring, so to avoid an estimate that is 'too' pessimal it is capped at MAX_SKB_FRAGS (see the sketch below). Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>
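In code, the corrected estimate amounts to something like this sketch (concept only, not the literal netback hunk):

    /* Worst case: every start_new_rx_buffer() consumes a slot, so the
     * fragment's offset into its page must be counted too.
     */
    unsigned int slots = 0;
    int i;

    for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
            skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

            slots += DIV_ROUND_UP(frag->page_offset + skb_frag_size(frag),
                                  PAGE_SIZE);
    }
    /* some frontends post few requests: don't be 'too' pessimal */
    slots = min_t(unsigned int, slots, MAX_SKB_FRAGS);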
-
Paul Durrant authored
This patch removes a test in start_new_rx_buffer() that checks whether a copy operation is less than MAX_BUFFER_OFFSET in length, since MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Reported-By: Sander Eikelenboom <linux@eikelenboom.it> Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 Mar, 2014 19 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller:

 1) We've discovered a common error in several networking drivers, they put VLAN offload features into ->vlan_features, which would suggest that they support offloading 2 or more levels of VLAN encapsulation. Not only do these devices not do that, but we don't have the infrastructure yet to handle that at all. Fixes from Vlad Yasevich.

 2) Fix tcpdump crash with bridging and vlans, also from Vlad.

 3) Some MAINTAINERS updates for random32 and bonding.

 4) Fix late reseeds of prandom generator, from Sasha Levin.

 5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki Makita.

 6) Fix deadlock in openvswitch, from Flavio Leitner.

 7) get_timewait4_sock() doesn't report delay times correctly, fix from Eric Dumazet.

 8) Duplicate address detection and addrconf verification need to run in contexts where RTNL can be obtained. Move them to run from a workqueue. From Hannes Frederic Sowa.

 9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar.

10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
  vlan: Warn the user if lowerdev has bad vlan features.
  veth: Turn off vlan rx acceleration in vlan_features
  ifb: Remove vlan acceleration from vlan_features
  qlge: Do not propaged vlan tag offloads to vlans
  bridge: Fix crash with vlan filtering and tcpdump
  net: Account for all vlan headers in skb_mac_gso_segment
  MAINTAINERS: bonding: change email address
  MAINTAINERS: bonding: change email address
  ipv6: move DAD and addrconf_verify processing to workqueue
  tcp: fix get_timewait4_sock() delay computation on 64bit
  openvswitch: fix a possible deadlock and lockdep warning
  bridge: Fix handling stacked vlan tags
  bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
  vhost: validate vhost_get_vq_desc return value
  vhost: fix total length when packets are too short
  random32: avoid attempt to late reseed if in the middle of seeding
  random32: assign to network folks in MAINTAINERS
  net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  vlan: Set hard_header_len according to available acceleration
  ...
-
David S. Miller authored
Vlad Yasevich says:

====================
Audit all drivers for correct vlan_features.

Some drivers set vlan acceleration features in vlan_features. This causes issues with Q-in-Q/802.1ad configurations. Audit all the drivers for correct vlan_features. Fix broken ones. Add a warning to vlan code to help catch future offenders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Some drivers incorrectly assign vlan acceleration features to vlan_features thus causing issues for Q-in-Q vlan configurations. Warn the user of such cases. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
For completeness, turn off vlan rx acceleration in vlan_features so that it doesn't show up on q-in-q setups. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Do not include vlan acceleration features in vlan_features as that precludes correct Q-in-Q operation. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
qlge driver turns off NETIF_F_HW_CTAG_FILTER, but forgets to turn off HW_CTAG_TX and HW_CTAG_RX on vlan devices. With the current settings, q-in-q will only generate a single vlan header. Remember to mask off CTAG_TX and CTAG_RX features in vlan_features. CC: Shahed Shaikh <shahed.shaikh@qlogic.com> CC: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> CC: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
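The effect of the fix, sketched with the real netdev feature flags (not the literal qlge hunk; 'ndev' is the driver's net_device):

    /* Don't let stacked vlan devices inherit the CTAG offloads: the
     * hardware only handles a single tag, so the inner tag of a q-in-q
     * setup must be inserted/stripped in software.
     */
    ndev->vlan_features &= ~(NETIF_F_HW_VLAN_CTAG_FILTER |
                             NETIF_F_HW_VLAN_CTAG_TX |
                             NETIF_F_HW_VLAN_CTAG_RX);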
-
Vlad Yasevich authored
When the vlan filtering is enabled on the bridge, but the filter is not configured on the bridge device itself, running tcpdump on the bridge device will result in an Oops with NULL pointer dereference. The reason is that br_pass_frame_up() will bypass the vlan check because the promisc flag is set. It will then try to get the table pointer and process the packet based on the table. Since the table pointer is NULL, we oops. Catch this special condition in br_handle_vlan(). Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
skb_network_protocol() already accounts for multiple vlan headers that may be present in the skb. However, skb_mac_gso_segment() doesn't know anything about it and assumes that skb->mac_len is set correctly to skip all mac headers. That may not always be the case. If we are simply forwarding the packet (via bridge or macvtap), all vlan headers may not be accounted for. A simple solution is to allow skb_network_protocol to return the vlan depth it has calculated. This way skb_mac_gso_segment will correctly skip all mac headers. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
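The interface change can be sketched like this (simplified caller side, mirroring the idea described above):

    /* skb_network_protocol() walks all vlan headers anyway; let it report
     * the depth so the GSO path can skip the full mac + vlan header area
     * instead of trusting skb->mac_len.
     */
    int vlan_depth = skb->mac_len;
    __be16 proto = skb_network_protocol(skb, &vlan_depth);

    if (unlikely(!proto))
            return ERR_PTR(-EINVAL);

    __skb_pull(skb, vlan_depth);    /* past every mac/vlan header */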
-
Veaceslav Falico authored
Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jay Vosburgh authored
Update my email address. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Torvalds authored
Merge two fixes from Andrew Morton:

 "The x86 fix should come from x86 guys but they appear to be conferencing or otherwise distracted. The ocfs2 fix is a bit of a mess - the code runs into an immediate NULL deref and we're trying to work out how this got through test and review, but we haven't heard from Goldwyn in the past few days. Sasha's patch fixes the oops, but the feature as a whole is probably broken. So this is a stopgap for 3.14 - I'll aim to get the real fixes into 3.14.x"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  x86: fix boot on uniprocessor systems
  ocfs2: check if cluster name exists before deref
-
Artem Fetishev authored
On x86 uniprocessor systems topology_physical_package_id() returns -1 which causes rapl_cpu_prepare() to leave the rapl_pmu variable uninitialized, which leads to a GPF in rapl_pmu_init(). See arch/x86/kernel/cpu/perf_event_intel_rapl.c. It turns out that physical_package_id and core_id can actually be retrieved for uniprocessor systems too. Enabling them also fixes the rapl_pmu code. Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Sasha Levin authored
Commit c74a3bdd ("ocfs2: add clustername to cluster connection") is trying to strlcpy a string which was explicitly passed as NULL in the very same patch, triggering a NULL ptr deref.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: strlcpy (lib/string.c:388 lib/string.c:151)
    CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
    RIP: strlcpy (lib/string.c:388 lib/string.c:151)
    Call Trace:
      ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
      ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
      user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
      dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
      vfs_mkdir (fs/namei.c:3467)
      SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
      tracesys (arch/x86/kernel/entry_64.S:749)

akpm: this patch probably disables the feature. A temporary thing to avoid trivial oopses.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hannes Frederic Sowa authored
addrconf_join_solict and addrconf_join_anycast may cause actions which need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers processing of the initial DAD processing into a workqueue.

To get rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing.

(v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stops the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not needed any more.

As we are not processing in bottom half anymore we need to be a bit more careful about disabling bottom halves when we take spin_locks which are also used in bh.

Relevant backtrace:

    [  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
    [  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1
    [  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [  541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
    [  541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
    [  541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
    [  541.031152] Call Trace:
    [  541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
    [  541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
    [  541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
    [  541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
    [  541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
    [  541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
    [  541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
    [  541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
    [  541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
    [  541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
    [  541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
    [  541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
    [  541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
    [  541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
    [  541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
    [  541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
    [  541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
    [  541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
    [  541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
It seems I missed one change in get_timewait4_sock() to compute the remaining time before deletion of IPV4 timewait socket. This could result in wrong output in /proc/net/tcp for tm->when field. Fixes: 96f817fe ("tcp: shrink tcp6_timewait_sock by one cache line") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Flavio Leitner authored
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get interrupted while holding the spinlock. Then it executes ovs_flow_stats_update() in softirq context which tries to get the same lock.

The second situation is that when is_percpu is true, the code correctly disables BH but only for the local CPU, so the following can happen when locking the remote CPU without disabling BH:

           CPU#0                                CPU#1
    ovs_flow_stats_get()
     stats_read()
    +->spin_lock remote CPU#1          ovs_flow_stats_get()
    |                                   stats_read()
    |  <interrupted>                   +--> spin_lock remote CPU#0
    |   ...                            |     <interrupted>
    |  ovs_flow_stats_update()         |      ...
    |   spin_lock local CPU#0 <--------+    ovs_flow_stats_update()
    +----------------------------------->    spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks (see the sketch after the lockdep report below).

Acked-by: Jesse Gross <jesse@nicira.com>

    =================================
    [ INFO: inconsistent lock state ]
    3.14.0-rc8-00007-g632b06aa #1 Tainted: G I
    ---------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
    (&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
    {SOFTIRQ-ON-W} state was registered at:
      [<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
      [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
      [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
      [<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
      [<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
      [<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
      [<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
      [<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
      [<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
      [<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
      [<ffffffff816c0798>] genl_rcv+0x28/0x40
      [<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
      [<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
      [<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
      [<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
      [<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
      [<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
      [<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
    irq event stamp: 1740726
    hardirqs last enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
    hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
    softirqs last enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
    softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

    other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&(&cpu_stats->lock)->rlock);
     <Interrupt>
       lock(&(&cpu_stats->lock)->rlock);

    *** DEADLOCK ***

    5 locks held by swapper/0/0:
     #0: (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
     #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
     #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
     #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
     #4: (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

    stack backtrace:
    CPU: 0 PID: 0 Comm: swapper/0 Tainted: G I 3.14.0-rc8-00007-g632b06aa #1
    Hardware name: /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
     0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
     ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
     ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
    Call Trace:
     <IRQ> [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
     [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
     [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
     [<ffffffff810f8963>] mark_lock+0x223/0x2b0
     [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
     [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
     [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
     [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
     [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
     [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
     [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
     [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
     [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
     [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
     [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
     [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
     [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
     [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
     [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
     [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
     [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
     [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
     [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
     [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
     [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
     [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
     [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
     [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
     [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
     [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
     [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
     [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
     [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
     [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
     [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
     [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
     [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
     [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
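The fix itself is small; in sketch form (with 'stats->lock' standing for the per-CPU flow stats lock described above):

    /* Take the stats lock with bottom halves disabled in both the
     * is_percpu and the !is_percpu case, so a softirq on this CPU can
     * never spin on a lock its own task context already holds.
     */
    spin_lock_bh(&stats->lock);     /* instead of plain spin_lock() */
    /* fold this CPU's counters into the flow statistics */
    spin_unlock_bh(&stats->lock);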
-
Toshiaki Makita authored
If a bridge with vlan_filtering enabled receives frames with stacked vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not only the outer tag but also the inner tag. br_vlan_untag() is called only from br_handle_vlan(), and in this case, it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already been set before calling br_handle_vlan(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Toshiaki Makita authored
Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci if they are tagged, but if vlan tx offload is manually disabled on bridge device and frames are sent from vlan device on the bridge device, the tags are embedded in skb->data and they break this assumption. Extract embedded vlan tags and move them to vlan_tci at ingress. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael S. Tsirkin authored
vhost fails to validate negative error code from vhost_get_vq_desc causing a crash: we are using -EFAULT, which is 0xfffffff2, as vector size, which exceeds the allocated size. The code in question was introduced in commit 8dd014ad ("vhost-net: mergeable buffers support"). CVE-2014-0055 Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
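A sketch of the check in the vhost-net RX path; the call shape follows the vhost-net source of that era, but treat the names as illustrative:

    headcount = get_rx_bufs(vq, vq->heads, vhost_len, &in,
                            vq_log, &log,
                            likely(mergeable) ? UIO_MAXIOV : 1);
    /* a negative error code must never be used as a buffer count */
    if (unlikely(headcount < 0))
            goto out;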
-