Commits · aeea64ac717a920ea655b061e37b14fbc872f7db · Kirill Smelkov / linux

25 Jun, 2013 40 commits

bonding: don't trust arp requests unless active slave really works · aeea64ac

Veaceslav Falico authored Jun 24, 2013

Currently, if we receive any arp packet on a backup slave in active-backup
mode and arp_validate enabled, we suppose that it's an arp request, swap
source/target ip and try to validate it. This optimization gives us
virtually no downtime in the most common situation (active and backup
slaves are in the same broadcast domain and the active slave failed).

However, if we can't reach the arp_ip_target(s), we end up in an endless
loop of reselecting slaves, because we receive our arp requests, sent by
the active slave, and think that backup slaves are up, thus selecting them
as active and, again, sending arp requests, which fool our backup slaves.

Fix this by not validating the swapped arp packets if the current active
slave didn't receive any arp reply after it was selected as active. This
way we will only accept arp requests if we know that the current active
slave can actually reach arp_ip_target.

v3->v4:
Obey 80 lines and make checkpatch.pl happy, per Sergei's suggestion.

v1->v3:
No change.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aeea64ac

bonding: don't validate arp if we don't have to · 2c146102

Veaceslav Falico authored Jun 24, 2013

Currently, we validate all the incoming arps if arp_validate not 0.
However, we don't have to validate backup slaves if arp_validate == active
and vice versa, so return early in bond_arp_rcv() in these cases.

It works correctly now because we verify arp_validate in slave_last_rx(),
however we're just doing useless work in bond_arp_rcv().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c146102

bonding: don't add duplicate targets to arp_ip_target · 0afee4e8

Veaceslav Falico authored Jun 24, 2013

Print a warning and skip them.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0afee4e8

bonding: add helper function bond_get_targets_ip(targets, ip) · 87a7b84b

Veaceslav Falico authored Jun 24, 2013

Add function bond_get_targets_ip(targets, ip) which searches through
targets array of ips (arp_targets) and returns the position of first
match. If ip == 0, returns the first free slot. On failure to find the
ip or free slot, return -1.

Use it to verify if the arp we've received is valid and in sysfs.

v1->v2:
Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
update 'null' arp_ip_targets' last_rx. Also, address style.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87a7b84b

net: davinci_mdio: gaurd the DT code with IS_ENABLED(CONFIG_OF) · 277e2a84

Lad, Prabhakar authored Jun 25, 2013

guard the davinci_mdio_of_mtable table and davinci_mdio_probe_dt()
with CONFIG_OF.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

277e2a84

net: davinci_emac: simplify the OF parser code · 151328c8

Lad, Prabhakar authored Jun 25, 2013

This patch cleans up the OF parser code, removes unnecessary checks
on of_property_read_*() and guards davinci_emac_of_match table with
CONFIG_OF.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

151328c8

net: davinci: emac: Convert to devm_* api · 6892b41d

Lad, Prabhakar authored Jun 25, 2013

Use devm_ioremap_resource instead of devm_request_mem_region()/devm_ioremap()
and devm_request_irq() instead of request_irq().

This ensures more consistent error values and simplifies error paths.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6892b41d

doc: fix some syntax errors in netlink mmap sample code · 76237576

Cong Wang authored Jun 24, 2013

Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76237576

macvtap: Perform GSO on forwarding path. · 3e4f8b78

Vlad Yasevich authored Jun 25, 2013

When macvtap forwards skb to its tap, it needs to check
if GSO needs to be performed.  This is sometimes necessary
when the HW device performed GRO, but the guest reading
from the tap does not support it (ex: Windows 7).
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e4f8b78

macvtap: Let TUNSETOFFLOAD actually controll offload features. · 2be5c767

Vlad Yasevich authored Jun 25, 2013

When the user issues TUNSETOFFLOAD ioctl, macvtap does not do
anything other then to verify arguments.  This patch adds
functionality to allow users to actually control offload features.
NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the
features can be controlled.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2be5c767

macvtap: Consistently use rcu functions · ac4e4af1

Vlad Yasevich authored Jun 25, 2013

Currently macvtap uses rcu_bh functions in its
user facing fuction macvtap_get_user() and macvtap_put_user().
However, its packet handlers use normal rcu as the rcu_read_lock()
is taken in netif_receive_skb().  We can safely discontinue
the usage or rcu with bh disabled.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac4e4af1

macvtap: Convert to using rtnl lock · 441ac0fc

Vlad Yasevich authored Jun 25, 2013

Macvtap uses a private lock to protect the relationship between
macvtap_queue and macvlan_dev.  The private lock is not needed
since the relationship is managed by user via open(), release(),
and dellink() calls.  dellink() already happens under rtnl, so
we can safely convert open() and release(), and use it in ioctl()
as well.

Suggested by Eric Dumazet.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

441ac0fc

net: poll/select low latency socket support · 2d48d67f

Eliezer Tamir authored Jun 24, 2013

select/poll busy-poll support.

Split sysctl value into two separate ones, one for read and one for poll.
updated Documentation/sysctl/net.txt

Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.

When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.

Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d48d67f

ethernet/arc/arc_emac - Add new driver · e4f2379d

Alexey Brodkin authored Jun 24, 2013

Driver for non-standard on-chip ethernet device ARC EMAC 10/100,
instantiated in some legacy ARC (Synopsys) FPGA Boards such as
ARCAngel4/ML50x.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>

Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Joe Perches <joe@perches.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Mischa Jonker <mjonker@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Cc: Florian Fainelli <florian@openwrt.org>
Cc: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4f2379d

net: sctp: simplify sctp_get_port · 62208f12

Daniel Borkmann authored Jun 25, 2013

No need to have an extra ret variable when we directly can return
the value of sctp_get_port_local().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62208f12

net: sctp: decouple cleaning some socket data from endpoint · 0a2fbac1

Daniel Borkmann authored Jun 25, 2013

Rather instead of having the endpoint clean the garbage from the
socket, use a sk_destruct handler sctp_destruct_sock(), that does
the job for that when there are no more references on the socket.
At least do this for our crypto transform through crypto_free_hash()
that is allocated when in listening state.

Also, perform sctp_put_port() only when sk is valid. At a later
point in time we can still determine if there's an option of
placing this into sk_prot->unhash() or sctp_endpoint_free() without
any races. For now, leave it in sctp_endpoint_destroy() though.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a2fbac1

net: sctp: minor: sctp_seq_dump_local_addrs add missing newline · b527fe69

Daniel Borkmann authored Jun 25, 2013

A trailing newline has been forgotten to add into the WARN().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b527fe69

net: sctp: migrate cookie life from timeval to ktime · 52db882f

Daniel Borkmann authored Jun 25, 2013

Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to a simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52db882f

ktime: add ms_to_ktime() and ktime_add_ms() helpers · d36f82b2

Daniel Borkmann authored Jun 25, 2013

Add two ktime helper functions that i) convert a given msec value to
a ktime structure and ii) that adds a msec value to a ktime structure.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d36f82b2

net: sctp: remove TEST_FRAME ifdef · f072d7ab

Daniel Borkmann authored Jun 25, 2013

We do neither ship a test_frame.h, nor will this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f072d7ab

net/mlx4_core: Fail device init if num_vfs is negative · 30e514a7

Jack Morgenstein authored Jun 25, 2013

Should not allow negative num_vfs
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30e514a7

net/mlx4_core: Add warning in case of command timeouts · 674925ed

Dotan Barak authored Jun 25, 2013

Warning prints when there are command timeout to help debugging future
failures.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

674925ed

net/mlx4_core: Replace sscanf() with kstrtoint() · 618fad95

Dotan Barak authored Jun 25, 2013

It is not safe to use sscanf.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

618fad95

net/mlx4_en: Remove an unnecessary test · 42f1e902

Dotan Barak authored Jun 25, 2013

Since this variable is now part of a structure and not allocated dynamically,
this test is irrelevant now.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42f1e902

net/mlx4_en: Add prints when TX timeout occurs · b944ebec

Yevgeny Petrilin authored Jun 25, 2013

Print a warning when a TX timeout is detected
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b944ebec

net/mlx4_en: Fix a race between napi poll function and RX ring cleanup · 0cc5c8bf

Eugenia Emantayev authored Jun 25, 2013

The RX rings were cleaned while there was still possible RX traffic completion
handling.
Change the sequance of events so that the port is closed and the QPs are being
stopped before RX cleanup.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cc5c8bf

net/mlx4_en: Change log level from error to debug for vlan related messages · 9e19b545

Eugenia Emantayev authored Jun 25, 2013

The port vlan table size is 126 (used for IBoE) so after 126 we will
not have space and the user need to see it only in debug print and not
error.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e19b545

net/mlx4_en: Move register_netdev() to the end of initialization function · 4801ae70

Eugenia Emantayev authored Jun 25, 2013

To avoid a race between the open function and everything that happens after
register_netdev() move it to be the last operation called.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4801ae70

net/mlx4_en: Do not query stats when device port is down · 6123db2e

Jack Morgenstein authored Jun 25, 2013

There are no counters allocated to the eth device when the port is down, so
this query is meaningless at that time.

It also leads to querying incorrect counters (since the counter_index is not
valid when the device port is down).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6123db2e

net/mlx4_en: Fix resource leak in error flow · 8850494a

Dotan Barak authored Jun 25, 2013

Wrong condition was used when calling iounmap.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8850494a

ipv6: remove old token ipv6 address as soon as possible · 2b9651d7

Hannes Frederic Sowa authored Jun 24, 2013

If the tokenized ip address is re-set on an interface we depend on the
arrival of a new router advertisment to call addrconf_verify to clean
up the old address (which valid_lft is now set to 0). Old addresses can
linger around for a longer time if e.g. the source of router advertisments
vanishes.

So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.

Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b9651d7

ipv6: don't disable interface if last ipv6 address is removed · 876fd05d

Hannes Frederic Sowa authored Jun 24, 2013

The reason behind this change is that as soon as we delete
the last ipv6 address of an interface we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems to be a
usability problem for me.

I don't see any reason why we should shutdown ipv6 on that interface in
such cases.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

876fd05d

ipv6: split duplicate address detection and router solicitation timer · b7b1bfce

Hannes Frederic Sowa authored Jun 23, 2013

This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.

The reason behind this patch is to reduce the number of unneeded router
solicitations send out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.

If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.

Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7b1bfce

mlx4: allow order-0 memory allocations in RX path · 51151a16

Eric Dumazet authored Jun 23, 2013

Signed-off-by: Eric Dumazet <edumazet@google.com>

mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.

We therefore drop frames more than needed.

This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.

By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.

Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51151a16

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next · 3bae9db9

David S. Miller authored Jun 25, 2013

Ben Hutchings says:

====================
1. Make EEH recovery work when using legacy interrupts, from Alexandre
   Rames.

2. Enable accelerated RFS for VLAN-tagged flows, from Andy Lutomirski.

3. Improve performance for non-TCP (and particularly UDP) traffic, which
   regressed in 3.10 when we switched to always allocating paged RX
   buffers.  Partly by Jon Cooper.

4. Some minor bug fixes to IOMMU detection, timestamping capabilities,
   and IRQ cleanup on the probe failure path.

I've dropped the RX skb cache, which improved some benchmarks but
perhaps needs some reworking to be more generally useful.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3bae9db9

qeth: use default napi weight · 6541aa52

Sebastian Ott authored Jun 24, 2013

Since commit 82dc3c63
"net: introduce NAPI_POLL_WEIGHT" network drivers receive a warning
when they use napi weight higher than NAPI_POLL_WEIGHT. This patch
reduces QETH_NAPI_WEIGHT from 128 to 64 (NAPI_POLL_WEIGHT).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6541aa52

qeth: Fix crash on initial MTU size change · ede88671

Stefan Raspl authored Jun 24, 2013

When the initial MTU size is changed prior to any activity on the device
(e.g. by attaching a z/VM vNIC already configured in Linux to a guestLAN),
we call dev_kfree_skb_irq(NULL) which results in a kernel panic.
Adding a proper check for NULL pointers to address this issue.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ede88671

qeth: change default standard blkt settings for OSA · a0c98523

Ursula Braun authored Jun 24, 2013

blkt settings (or LAN idle settings) for an OSA Express card
determine when and how often an OSA Express card tells the
operating system about new incoming packets. The semantic of
these settings has changed starting with OSA Express3. Currently
the qeth standard settings apply to OSA Express2 and older
generations of OSA Express cards, while new generations of OSA
Express cards require extra coding of their reasonable default.

To cover future OSA Express generations the qeth default standard
blkt setting is now the desired setting for OSA generations
starting with OSA Express3, while the fixed set of older OSA
Express cards receives its blkt settings explicitly.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0c98523

qeth: Increase default MTU for OSA devices · fe44014a

Stefan Raspl authored Jun 24, 2013

Increase the default MTU for real OSA devices in layer 2 mode
to 1500 Bytes for increased compatibility.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe44014a

netiucv: remove unused macro · 819bc78f

Andy Shevchenko authored Jun 24, 2013

If someone is interested to dump something they may consider to use
print_hex_dump() or print_hex_dump_bytes() kernel helpers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

819bc78f