- 24 Apr, 2018 14 commits
-
-
NeilBrown authored
When a walk of an rhashtable is interrupted with rhastable_walk_stop() and then rhashtable_walk_start(), the location to restart from is based on a 'skip' count in the current hash chain, and this can be incorrect if insertions or deletions have happened. This does not happen when the walk is not stopped and started as iter->p is a placeholder which is safe to use while holding the RCU read lock. In rhashtable_walk_start() we can revalidate that 'p' is still in the same hash chain. If it isn't then the current method is still used. With this patch, if a rhashtable walker ensures that the current object remains in the table over a stop/start period (possibly by elevating the reference count if that is sufficient), it can be sure that a walk will not miss objects that were in the hashtable for the whole time of the walk. rhashtable_walk_start() may not find the object even though it is still in the hashtable if a rehash has moved it to a new table. In this case it will (eventually) get -EAGAIN and will need to proceed through the whole table again to be sure to see everything at least once. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
NeilBrown authored
The documentation claims that when rhashtable_walk_start_check() detects a resize event, it will rewind back to the beginning of the table. This is not true. We need to set ->slot and ->skip to be zero for it to be true. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
NeilBrown authored
Neither rhashtable_walk_enter() or rhltable_walk_enter() sleep, though they do take a spinlock without irq protection. So revise the comments to accurately state the contexts in which these functions can be called. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
NeilBrown authored
grow_decision and shink_decision no longer exist, so remove the remaining references to them. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive, given they have no result. We can omit the calls for sockets that have no md5 keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yafang Shao authored
This reverts commit <c6849a3a> ("net: init sk_cookie for inet socket") Per discussion with Eric, when update sock_net(sk)->cookie_gen, the whole cache cache line will be invalidated, as this cache line is shared with all cpus, that may cause great performace hit. Bellow is the data form Eric. "Performance is reduced from ~5 Mpps to ~3.8 Mpps with 16 RX queues on my host" when running synflood test. Have to revert it to prevent from cache line false sharing. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Tal Gilboa says: ==================== Introduce adaptive TX interrupt moderation to net DIM Net DIM is a library designed for dynamic interrupt moderation. It was implemented and optimized with receive side interrupts in mind, since these are usually the CPU expensive ones. This patch-set introduces adaptive transmit interrupt moderation to net DIM, complete with a usage in the mlx5e driver. Using adaptive TX behavior would reduce interrupt rate for multiple scenarios. Furthermore, it is essential for increasing bandwidth on cases where payload aggregation is required. v3: Remove "inline" from functions in .c files (requested by DaveM). Revert adding "enabled" field from struct net_dim and applied mlx5e structural suggestions (suggested by SaeedM). v2: Rebase over proper tree. v1: Fix compilation issues due to missed function renaming. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tal Gilboa authored
Add support for adaptive TX moderation. This greatly reduces TX interrupt rate and increases bandwidth, mostly for TCP bandwidth over ARM architecture (below). There is a slight single stream TCP with very large message sizes degradation (x86). In this case if there's any moderation on transmitted packets the bandwidth would reduce due to hitting TCP output limit. Since this is a synthetic case, this is still worth doing. Performance improvement (ConnectX-4Lx 40GbE, ARM) TCP 64B bandwidth with 1-50 streams increased 6-35%. TCP 64B bandwidth with 100-500 streams increased 20-70%. Performance improvement (ConnectX-5 100GbE, x86) Bandwidth: increased up to 40% (1024B with 10s of streams). Interrupt rate: reduced up to 50% (1024B with 1000s of streams). Performance degradation (ConnectX-5 100GbE, x86) Bandwidth: up to 10% decrease single stream TCP (1MB message size from 51Gb/s to 47Gb/s). Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tal Gilboa authored
Interrupt moderation for TX traffic requires different profiles than RX interrupt moderation. The main goal here is to reduce interrupt rate and allow better payload aggregation by keeping SKBs in the TX queue a bit longer. Ping-pong behavior would get a profile with a short timer, so latency wouldn't increase for these scenarios. There might be a slight degradation in bandwidth for single stream with large message sizes, since net.ipv4.tcp_limit_output_bytes is limiting the allowed TX traffic, but with many streams performance is always improved. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tal Gilboa authored
Preparation for introducing adaptive TX to net DIM. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Similar to commit a2ac9990 ("vhost-net: set packet weight of tx polling to 2 * vq size"), we need a packet-based limit for handler_rx, too - elsewhere, under rx flood with small packets, tx can be delayed for a very long time, even without busypolling. The pkt limit applied to handle_rx must be the same applied by handle_tx, or we will get unfair scheduling between rx and tx. Tying such limit to the queue length makes it less effective for large queue length values and can introduce large process scheduler latencies, so a constant valued is used - likewise the existing bytes limit. The selected limit has been validated with PVP[1] performance test with different queue sizes: queue size 256 512 1024 baseline 366 354 362 weight 128 715 723 670 weight 256 740 745 733 weight 512 600 460 583 weight 1024 423 427 418 A packet weight of 256 gives peek performances in under all the tested scenarios. No measurable regression in unidirectional performance tests has been detected. [1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Fixes: b16fb418 ("net: fib_rules: add extack support") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anders Roxell authored
Fixes: 192dc405 ("selftests: net: add tcp_mmap program") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Function dca_common_get_tag is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: drivers/dca/dca-core.c:273:4: warning: symbol 'dca_common_get_tag' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Apr, 2018 15 commits
-
-
David S. Miller authored
David Ahern says: ==================== net/ipv6: couple of fixes for rcu change to from So many details... I am thankful for all the robots running the permutations and tools. Two bug fixes from the rcu change to rt->from: 1. missing rcu lock in ip6_negative_advice 2. rcu dereferences in 2 sites ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
kbuild test robot reported 2 uses of rt->from not properly accessed using rcu_dereference: 1. add rcu_dereference_protected to rt6_remove_exception_rt and make sure it is always called with rcu lock held. 2. change rt6_do_redirect to take a reference on 'from' when accessed the first time so it can be used the sceond time outside of the lock Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
syzbot reported a suspicious rcu_dereference_check: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4592 rt6_check_expired+0x38b/0x3e0 net/ipv6/route.c:410 ip6_negative_advice+0x67/0xc0 net/ipv6/route.c:2204 dst_negative_advice include/net/sock.h:1786 [inline] sock_setsockopt+0x138f/0x1fe0 net/core/sock.c:1051 __sys_setsockopt+0x2df/0x390 net/socket.c:1899 SYSC_setsockopt net/socket.c:1914 [inline] SyS_setsockopt+0x34/0x50 net/socket.c:1911 Add rcu locking around call to rt6_check_expired in ip6_negative_advice. Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected") Reported-by: syzbot+2422c9e35796659d2273@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Denis Bolotin says: ==================== Add configuration information to register dump and debug data The purpose of this patchset is to add configuration information to the debug data collection, which already contains register dump. The first patch (removing the ptt) is essential because it prevents the unnecessary ptt acquirement when calling mcp APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Denis Bolotin authored
Configuration information is added to the debug data collection, in addition to register dump. Added qed_dbg_nvm_image() that receives an image type, allocates a buffer and reads the image. The images are saved in the buffers and the dump size is updated. Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Denis Bolotin authored
Since nvm images attributes are cached during driver load, acquiring ptt is not needed when calling qed_mcp_get_nvm_image(). Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Move all the core version detection to a common place ("hwif.c") and implement a table which can be used to lookup the correct callbacks for each IP version. This simplifies the initialization flow of each IP version and eases future implementation of new IP versions. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Vitor Soares <soares@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
There's no benefit in using netif_info et al before the net_device has been registered. We get messages like r8169 0000:03:00.0 (unnamed net_device) (uninitialized): [message] Therefore use dev_info/dev_err instead. As a side effect we don't need parameter dev for function rtl8169_get_mac_version() any longer. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yafang Shao authored
With sk_cookie we can identify a socket, that is very helpful for traceing and statistic, i.e. tcp tracepiont and ebpf. So we'd better init it by default for inet socket. When using it, we just need call atomic64_read(&sk->sk_cookie). Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Roopa Prabhu says: ==================== fib rules extack support First patch refactors code to move fib rule netlink handling into a common function. This became obvious when adding duplicate extack msgs in add and del paths. Second patch adds extack msgs. v2 - Dropped the ip route get support and selftests from the series to look at the input path some more (as pointed out by ido). Will come back to that next week when i have some time. resending just the extack part for now. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This reduces code duplication in the fib rule add and del paths. Get rid of validate_rulemsg. This became obvious when adding duplicate extack support in fib newrule/delrule error paths. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Mashak authored
Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yafang Shao authored
tcp_rcv_space_adjust is called every time data is copied to user space, introducing a tcp tracepoint for which could show us when the packet is copied to user. When a tcp packet arrives, tcp_rcv_established() will be called and with the existed tracepoint tcp_probe we could get the time when this packet arrives. Then this packet will be copied to user, and tcp_rcv_space_adjust will be called and with this new introduced tracepoint we could get the time when this packet is copied to user. With these two tracepoints, we could figure out whether the user program processes this packet immediately or there's latency. Hence in the printk message, sk_cookie is printed as a key to relate tcp_rcv_space_adjust with tcp_probe. Maybe we could export sockfd in this new tracepoint as well, then we could relate this new tracepoint with epoll/read/recv* tracepoints, and finally that could show us the whole lifespan of this packet. But we could also implement that with pid as these functions are executed in process context. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
The conversion of rndis friendly name to utf8 uses a standard kernel routine which is optional in config. Therefore build would fail for some configurations. Resolve by selecting needed library. Fixes: 0fe554a4 ("hv_netvsc: propogate Hyper-V friendly name into interface alias") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Apr, 2018 10 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller authored
Conflicts were simple overlapping changes in microchip driver. Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
David Ahern says: ==================== net/ipv6: Another followup to the fib6_info change Last one - for this week. Patches 1, 2 and 7 are more cleanup patches - removing dead code, moving code from a header to near its single caller, and updating function name. Patches 3-5 do some refactoring leading up to patch 6 which fixes a NULL dereference. I have only managed to trigger a panic once, so I can not definitively confirm it addresses the problem but it seems pretty clear that it is a race on removing a 'from' reference on an rt6_info and another path using that 'from' value to do cookie checking. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Dan reported an imbalance in fib6_check on use of f6i and checking whether it is null. Since fib6_check is only called if f6i is non-null, remove the unnecessary check. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
When a dst entry is created from a fib entry, the 'from' in rt6_info is set to the fib entry. The 'from' reference is used most notably for cookie checking - making sure stale dst entries are updated if the fib entry is changed. When a fib entry is deleted, the pcpu routes on it are walked releasing the fib6_info reference. This is needed for the fib6_info cleanup to happen and to make sure all device references are released in a timely manner. There is a race window when a FIB entry is deleted and the 'from' on the pcpu route is dropped and the pcpu route hits a cookie check. Handle this race using rcu on from. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
A later patch protects 'from' in rt6_info and this simplifies the locking needed by it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
A later patch protects 'from' in rt6_info and this simplifies the locking needed by it. With the move, the fib6_info_hold for the uncached_rt is no longer needed since the rcu_lock is still held. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
rt6_get_cookie_safe takes a fib6_info and checks the sernum of the node. Update the name to reflect its purpose. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
rt6_clean_expires and rt6_set_expires are no longer used. Removed them. rt6_update_expires has 1 caller in route.c, so move it from the header. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-04-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Initial work on BPF Type Format (BTF) is added, which is a meta data format which describes the data types of BPF programs / maps. BTF has its roots from CTF (Compact C-Type format) with a number of changes to it. First use case is to provide a generic pretty print capability for BPF maps inspection, later work will also add BTF to bpftool. pahole support to convert dwarf to BTF will be upstreamed as well (https://github.com/iamkafai/pahole/tree/btf), from Martin. 2) Add a new xdp_bpf_adjust_tail() BPF helper for XDP that allows for changing the data_end pointer. Only shrinking is currently supported which helps for crafting ICMP control messages. Minor changes in drivers have been added where needed so they recalc the packet's length also when data_end was adjusted, from Nikita. 3) Improve bpftool to make it easier to feed hex bytes via cmdline for map operations, from Quentin. 4) Add support for various missing BPF prog types and attach types that have been added to kernel recently but neither to bpftool nor libbpf yet. Doc and bash completion updates have been added as well for bpftool, from Andrey. 5) Proper fix for avoiding to leak info stored in frame data on page reuse for the two bpf_xdp_adjust_{head,meta} helpers by disallowing to move the pointers into struct xdp_frame area, from Jesper. 6) Follow-up compile fix from BTF in order to include stdbool.h in libbpf, from Björn. 7) Few fixes in BPF sample code, that is, a typo on the netdevice in a comment and fixup proper dump of XDP action code in the tracepoint exception, from Wang and Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Apr, 2018 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermalLinus Torvalds authored
Pull thermal fixes from Eduardo Valentin: "A couple of fixes for the thermal subsystem" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: dt-bindings: thermal: Remove "cooling-{min|max}-level" properties dt-bindings: thermal: remove no longer needed samsung thermal properties
-