- 06 Oct, 2017 36 commits
-
-
Bjorn Helgaas authored
Use pci_ari_enabled() from the PCI core instead of the identical local copy bnx2x_ari_enabled(). No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jesper Dangaard Brouer says: ==================== Improve xdp_monitor samples/bpf Here are some improvements to the xdp_monitor tool currently located under samples/bpf/. Once the tools library libbpf become more feature complete, xdp_monitor should be converted to use it, and be moved into tools/bpf/xdp/ or tools/xdp/. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
Other concurrent running programs, like perf or the XDP program what needed to be monitored, might take up part of the max locked memory limit. Thus, the xdp_monitor tool have to set the RLIMIT_MEMLOCK to RLIM_INFINITY, as it cannot determine a more sane limit. Using the man exit(3) specified EXIT_FAILURE return exit code, and correct other users too. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
Also monitor the tracepoint xdp_exception. This tracepoint is usually invoked by the drivers. Programs themselves can activate this by returning XDP_ABORTED, which will drop the packet but also trigger the tracepoint. This is useful for distinguishing intentional (XDP_DROP) vs. ebpf-program error cases that cased a drop (XDP_ABORTED). Drivers also use this tracepoint for reporting on XDP actions that are unknown to the specific driver. This can help the user to detect if a driver e.g. doesn't implement XDP_REDIRECT yet. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
The first 8 bytes of the tracepoint context struct are not accessible by the bpf code. This is a choice that dates back to the original inclusion of this code. See explaination in: commit 98b5c2c6 ("perf, bpf: allow bpf programs attach to tracepoints") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Simon Horman says: ==================== nfp: extend match and action for flower offload Pieter says: This series extends flower offload match and action capabilities. It specifically adds offload capabilities for matching on MPLS, TTL, TOS and flow label. Furthermore offload capabilities for action have been expanded to include set ethernet, ipv4, ipv6, tcp and udp headers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously we did not have offloading support for set TCP/UDP actions. This patch enables TC flower offload of set TCP/UDP sport and dport actions. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously we did not have offloading support for set IPv6 actions. This patch enables TC flower offload of set IPv6 src and dst address actions. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously we did not have offloading support for set IPv4 actions. This patch enables TC flower offload of set IPv4 src and dst address actions. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously we did not have offloading support for set ethernet actions. This patch enables TC flower offload of set ethernet actions. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously matching on IPv6 ttl and tos fields were not offloaded. This patch enables offloading IPv6 ttl and tos as match fields. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously matching on IPv4 ttl and tos fields were not offloaded. This patch enables offloading IPv4 ttl and tos as match fields. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pieter Jansen van Vuuren authored
Previously MPLS match offloading was not supported. This patch enables MPLS match offloading support for label, bos and tc fields. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
commit cc71b7b0 ("net/ipv6: remove unused err variable on icmpv6_push_pending_frames") exposed icmpv6_push_pending_frames return value not being used. Remove now unnecessary int err declarations and uses. Miscellanea: o Remove unnecessary goto and out: labels o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tim Hansen authored
int err is unused by icmpv6_push_pending_frames(), this patch returns removes the variable and returns the function with 0. git bisect shows this variable has been around since linux has been in git in commit 1da177e4. This was found by running make coccicheck M=net/ipv6/ on linus' tree on commit 77ede3a0 (current HEAD as of this patch). Signed-off-by: Tim Hansen <devtimhansen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lin Zhang authored
Storing the left length of skb into 'len' actually has no effect so we can remove it. Signed-off-by: Lin Zhang <xiaolou4617@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Craig Gallek says: ==================== libbpf: support more map options The functional change to this series is the ability to use flags when creating maps from object files loaded by libbpf. In order to do this, the first patch updates the library to handle map definitions that differ in size from libbpf's struct bpf_map_def. For object files with a larger map definition, libbpf will continue to load if the unknown fields are all zero, otherwise the map is rejected. If the map definition in the object file is smaller than expected, libbpf will use zero as a default value in the missing fields. ==================== Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Craig Gallek authored
This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type which requires flags. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Craig Gallek authored
This library previously assumed a fixed-size map options structure. Any new options were ignored. In order to allow the options structure to grow and to support parsing older programs, this patch updates the maps section parsing to handle varying sizes. Object files with maps sections smaller than expected will have the new fields initialized to zero. Object files which have larger than expected maps sections will be rejected unless all of the unrecognized data is zero. This change still assumes that each map definition in the maps section is the same size. Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The function emac_isr is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warnings: symbol 'emac_isr' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Yuchung Cheng says: ==================== tcp: improving RACK cpu performance This patch set improves the CPU consumption of the RACK TCP loss recovery algorithm, in particular for high-speed networks. Currently, for every ACK in recovery RACK can potentially iterate over all sent packets in the write queue. On large BDP networks with non-trivial losses the RACK write queue walk CPU usage becomes unreasonably high. This patch introduces a new queue in TCP that keeps only skbs sent and not yet (s)acked or marked lost, in time order instead of sequence order. With that, RACK can examine this time-sorted list and only check packets that were sent recently, within the reordering window, per ACK. This is the fastest way without any write queue walks. The number of skbs examined per ACK is reduced by orders of magnitude. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Refactor the RACK loop to improve readability and speed up the checks. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Use the new time-ordered list to speed up RACK. The detection logic is identical. But since the list is chronologically ordered by skb_mstamp and contains only skbs not yet acked or sacked, RACK can abort the loop upon hitting skbs that were sent more recently. On YouTube servers this patch reduces the iterations on write queue by 40x. The improvement is even bigger with large BDP networks. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This patch adds a new queue (list) that tracks the sent but not yet acked or SACKed skbs for a TCP connection. The list is chronologically ordered by skb->skb_mstamp (the head is the oldest sent skb). This list will be used to optimize TCP Rack recovery, which checks an skb's timestamp to judge if it has been lost and needs to be retransmitted. Since TCP write queue is ordered by sequence instead of sent time, RACK has to scan over the write queue to catch all eligible packets to detect lost retransmission, and iterates through SACKed skbs repeatedly. Special cares for rare events: 1. TCP repair fakes skb transmission so the send queue needs adjusted 2. SACK reneging would require re-inserting SACKed skbs into the send queue. For now I believe it's not worth the complexity to make RACK work perfectly on SACK reneging, so we do nothing here. 3. Fast Open: currently for non-TFO, send-queue correctly queues the pure SYN packet. For TFO which queues a pure SYN and then a data packet, send-queue only queues the data packet but not the pure SYN due to the structure of TFO code. This is okay because the SYN receiver would never respond with a SACK on a missing SYN (i.e. SYN is never fast-retransmitted by SACK/RACK). In order to not grow sk_buff, we use an union for the new list and _skb_refdst/destructor fields. This is a bit complicated because we need to make sure _skb_refdst and destructor are properly zeroed before skb is cloned/copied at transmit, and before being freed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Avinash Repaka authored
Use max_1m_mrs/max_8k_mrs while setting max_items, as the former variables are set based on the underlying device attributes. Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Avinash Repaka authored
This patch fixes the scope of has_fr and has_fmr variables as they are needed only in rds_ib_add_one(). Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tim Hansen authored
int rc is unmodified after initalization in net/ipv4/route.c, this patch simply cleans up that variable and returns 0. This was found with coccicheck M=net/ipv4/ on linus' tree. Signed-off-by: Tim Hansen <devtimhansen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
This commit does a cleanup and moves tcp_rearm_rto() call in the TFO server case into a previous spot in tcp_rcv_state_process() to make it more compact. This is only a cosmetic change. Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
Currently in the TCP code, the initialization sequence for cached metrics, congestion control, BPF, etc, after successful connection is very inconsistent. This introduces inconsistent bevhavior and is prone to bugs. The current call sequence is as follows: (1) for active case (tcp_finish_connect() case): tcp_mtup_init(sk); icsk->icsk_af_ops->rebuild_header(sk); tcp_init_metrics(sk); tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB); tcp_init_congestion_control(sk); tcp_init_buffer_space(sk); (2) for passive case (tcp_rcv_state_process() TCP_SYN_RECV case): icsk->icsk_af_ops->rebuild_header(sk); tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); tcp_init_congestion_control(sk); tcp_mtup_init(sk); tcp_init_buffer_space(sk); tcp_init_metrics(sk); (3) for TFO passive case (tcp_fastopen_create_child()): inet_csk(child)->icsk_af_ops->rebuild_header(child); tcp_init_congestion_control(child); tcp_mtup_init(child); tcp_init_metrics(child); tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); tcp_init_buffer_space(child); This commit uniforms the above functions to have the following sequence: tcp_mtup_init(sk); icsk->icsk_af_ops->rebuild_header(sk); tcp_init_metrics(sk); tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB); tcp_init_congestion_control(sk); tcp_init_buffer_space(sk); This sequence is the same as the (1) active case. We pick this sequence because this order correctly allows BPF to override the settings including congestion control module and initial cwnd, etc from the route, and then allows the CC module to see those settings. Suggested-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Stefan Hajnoczi says: ==================== VSOCK: add sock_diag interface v3: * Rebased onto net-next/master and resolved Hyper-V transport conflict v2: * Moved tests to tools/testing/vsock/. I was unable to put them in selftests/ because they require manual setup of a VMware/KVM guest. * Moved to __vsock_in_bound/connected_table() to af_vsock.h * Fixed local variable ordering in Patch 4 There is currently no way for userspace to query open AF_VSOCK sockets. This means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets. This patch series adds the netlink sock_diag interface for AF_VSOCK. Userspace programs sent a DUMP request including an sk_state bitmap to filter sockets based on their state (connected, listening, etc). The vsock_diag.ko module replies with information about matching sockets. This userspace ABI is defined in <linux/vm_sockets_diag.h>. The final patch adds a test suite that exercises the basic cases. Jorgen and Dexuan: I have only tested the virtio transport but this should also work for VMCI and Hyper-V. Please give it a shot if you have time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Hajnoczi authored
This patch adds tests for the vsock_diag.ko module. These tests are not self-tests because they require manual set up of a KVM or VMware guest. Please see tools/testing/vsock/README for instructions. The control.h and timeout.h infrastructure can be used for additional AF_VSOCK tests in the future. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Hajnoczi authored
This patch adds the sock_diag interface for querying sockets from userspace. Tools like ss(8) and netstat(8) can use this interface to list open sockets. The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes netlink request and response structs. The request can query sockets based on their sk_state (e.g. listening sockets only) and the response contains socket information fields including the local/remote addresses, inode number, etc. This patch does not dump VMCI pending sockets because I have only tested the virtio transport, which does not use pending sockets. Support can be added later by extending vsock_diag_dump() if needed by VMCI users. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Hajnoczi authored
There are two state fields: socket->state and sock->sk_state. The socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the sock->sk_state typically uses values that match TCP state constants (TCP_CLOSE, TCP_ESTABLISHED). AF_VSOCK does not follow this convention and instead uses SS_* constants for both fields. The sk_state field will be exposed to userspace through the vsock_diag interface for ss(8), netstat(8), and other programs. This patch switches sk_state to TCP state constants so that the meaning of this field is consistent with other address families. Not just AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too. The following mapping was used to convert the code: SS_FREE -> TCP_CLOSE SS_UNCONNECTED -> TCP_CLOSE SS_CONNECTING -> TCP_SYN_SENT SS_CONNECTED -> TCP_ESTABLISHED SS_DISCONNECTING -> TCP_CLOSING VSOCK_SS_LISTEN -> TCP_LISTEN In __vsock_create() the sk_state initialization was dropped because sock_init_data() already initializes sk_state to TCP_CLOSE. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Hajnoczi authored
The vsock_diag.ko module will need to check socket table membership. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Hajnoczi authored
The socket table symbols need to be exported from vsock.ko so that the vsock_diag.ko module will be able to traverse sockets. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller authored
Just simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 Oct, 2017 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fix from Rafael Wysocki: "This fixes a code ordering issue in the main suspend-to-idle loop that causes some "low power S0 idle" conditions to be incorrectly reported as unmet with suspend/resume debug messages enabled" * tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / s2idle: Invoke the ->wake() platform callback earlier
-
Rafael J. Wysocki authored
* pm-sleep: PM / s2idle: Invoke the ->wake() platform callback earlier
-
Linus Torvalds authored
Merge tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a stable fix for the alignment of the event number reported at the end of the 'DM_LIST_DEVICES' ioctl. - a couple stable fixes for the DM crypt target. - a DM raid health status reporting fix. * tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm raid: fix incorrect status output at the end of a "recover" process dm crypt: reject sector_size feature if device length is not aligned to it dm crypt: fix memory leak in crypt_ctr_cipher_old() dm ioctl: fix alignment of event number in the device list
-
Jonathan Brassow authored
There are three important fields that indicate the overall health and status of an array: dev_health, sync_ratio, and sync_action. They tell us the condition of the devices in the array, and the degree to which the array is synchronized. This commit fixes a condition that is reported incorrectly. When a member of the array is being rebuilt or a new device is added, the "recover" process is used to synchronize it with the rest of the array. When the process is complete, but the sync thread hasn't yet been reaped, it is possible for the state of MD to be: mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ] curr_resync_completed = <max dev size> (but not MaxSector) and all rdevs to be In_sync. This causes the 'array_in_sync' output parameter that is passed to rs_get_progress() to be computed incorrectly and reported as 'false' -- or not in-sync. This in turn causes the dev_health status characters to be reported as all 'a', rather than the proper 'A'. This can cause erroneous output for several seconds at a time when tools will want to be checking the condition due to events that are raised at the end of a sync process. Fix this by properly calculating the 'array_in_sync' return parameter in rs_get_progress(). Also, remove an unnecessary intermediate 'recovery_cp' variable in rs_get_progress(). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-