- 05 Oct, 2015 12 commits
-
-
Arnd Bergmann authored
We want to deprecate the use of 'struct timespec' on 32-bit architectures, as it is will overflow in 2038. The igb driver uses it to read the current time, and can simply be changed to use ktime_get_real_ts64() instead. Because of hardware limitations, there is still an overflow in year 2106, which we cannot really avoid, but this documents the overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
We want to deprecate the use of 'struct timespec' on 32-bit architectures, as it is will overflow in 2038. The stmmac driver uses it to read the current time, and can simply be changed to use ktime_get_real_ts64() instead. Because of hardware limitations, there is still an overflow in year 2106, which we cannot really avoid, but this documents the overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Richard Cochran <richardcochran@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The fec_ptp_enable_pps uses an open-coded implementation of ns_to_timespec, which will be removed eventually as it is not y2038-safe on 32-bit architectures. Two more instances of the same code in this file were already converted to use the safe ns_to_timespec64 in commit 6630514f ("ptp: fec: use helpers for converting ns to timespec"), this changes the last one as well. The seconds portion here is actually unused and we could just remove the timespec variable, but using ns_to_timespec64 can still be better as the implementation can be hand-optimized in the future. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Fugang Duan <b38611@freescale.com> Cc: Luwei Zhou <b45643@freescale.com> Cc: Frank Li <Frank.Li@freescale.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Peter Nørlund says: ==================== ipv4: Hash-based multipath routing When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed from more or less being destination-based into being quasi-random per-packet scheduling. This increases the risk of out-of-order packets and makes it impossible to use multipath together with anycast services. This patch series replaces the old implementation with flow-based load balancing based on a hash over the source and destination addresses. Distribution of the hash is done with thresholds as described in RFC 2992. This reduces the disruption when a path is added/remove when having more than two paths. To futher the chance of successful usage in conjuction with anycast, ICMP error packets are hashed over the inner IP addresses. This ensures that PMTU will work together with anycast or load-balancers such as IPVS. Port numbers are not considered since fragments could cause problems with anycast and IPVS. Relying on the DF-flag for TCP packets is also insufficient, since ICMP inspection effectively extracts information from the opposite flow which might have a different state of the DF-flag. This is also why the RSS hash is not used. These are typically based on the NDIS RSS spec which mandates TCP support. Measurements of the additional overhead of a two-path multipath (p_mkroute_input excl. __mkroute_input) on a Xeon X3550 (4 cores, 2.66GHz): Original per-packet: ~394 cycles/packet L3 hash: ~76 cycles/packet Changes in v5: - Fixed compilation error Changes in v4: - Functions take hash directly instead of func ptr - Added inline hash function - Added dummy macros to minimize ifdefs - Use upper 31 bits of hash instead of lower Changes in v3: - Multipath algorithm is no longer configurable (always L3) - Added random seed to hash - Moved ICMP inspection to isolated function - Ignore source quench packets (deprecated as per RFC 6633) Changes in v2: - Replaced 8-bit xor hash with 31-bit jenkins hash - Don't scale weights (since 31-bit) - Avoided unnecesary renaming of variables - Rely on DF-bit instead of fragment offset when checking for fragmentation - upper_bound is now inclusive to avoid overflow - Use a callback to postpone extracting flow information until necessary - Skipped ICMP inspection entirely with L4 hashing - Handle newly added sysctl ignore_routes_with_linkdown ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Nørlund authored
ICMP packets are inspected to let them route together with the flow they belong to, minimizing the chance that a problematic path will affect flows on other paths, and so that anycast environments can work with ECMP. Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Nørlund authored
Replaces the per-packet multipath with a hash-based multipath using source and destination address. Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== tcp: lockless listener fixes and improvement This fixes issues with TCP FastOpen vs lockless listeners, and SYNACK being attached to request sockets. Then, last patch brings performance improvement for syncookies generation and validation. Tested under a 4.3 Mpps SYNFLOOD attack, new perf profile looks like : 12.11% [kernel] [k] sha_transform 5.83% [kernel] [k] tcp_conn_request 4.59% [kernel] [k] __inet_lookup_listener 4.11% [kernel] [k] ipt_do_table 3.91% [kernel] [k] tcp_make_synack 3.05% [kernel] [k] fib_table_lookup 2.74% [kernel] [k] sock_wfree 2.66% [kernel] [k] memcpy_erms 2.12% [kernel] [k] tcp_v4_rcv ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
inet_reqsk_alloc() is used to allocate a temporary request in order to generate a SYNACK with a cookie. Then later, syncookie validation also uses a temporary request. These paths already took a reference on listener refcount, we can avoid a couple of atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a sk_dst_cache pointer. Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a pinet6 pointer. Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
SYN_RECV & TIMEWAIT sockets are not full blown, do not even try to call ip_sk_use_pmtu() on them. Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
There are multiple races that need fixes : 1) skb_get() + queue skb + kfree_skb() is racy An accept() can be done on another cpu, data consumed immediately. tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in socket receive queue are private. Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb 2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen() for the same reasons. 3) We want to send the SYNACK before queueing child into accept queue, otherwise we might reintroduce the ooo issue fixed in commit 7c85af88 ("tcp: avoid reorders for TFO passive connections") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Oct, 2015 26 commits
-
-
David S. Miller authored
Nikolay Aleksandrov says: ==================== bridge: complete netlink support This set completes the bridge device's netlink support and makes it possible to view and configure everything that can be configured via sysfs. I have tested all of these (setting and getting). There're a few longer line warnings about the br_get_size() ifla comments but I think we should have them to know what has been accounted for. I have used the sysfs interface as a guide of what and how to set. As usual I'll send the corresponding iproute2 patches later. The bridge port's netlink interface will be completed after this set gets applied in some form. This patch-set is on top of my last vlan cleanups set: http://www.spinics.net/lists/netdev/msg346005.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's default_pvid via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add support to allow getting/setting netfilter tables settings. Currently these are IFLA_BR_NF_CALL_IPTABLES, IFLA_BR_NF_CALL_IP6TABLES and IFLA_BR_NF_CALL_ARPTABLES. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add support to set/get all of the igmp's configurable intervals via netlink. These currently are: IFLA_BR_MCAST_LAST_MEMBER_INTVL IFLA_BR_MCAST_MEMBERSHIP_INTVL IFLA_BR_MCAST_QUERIER_INTVL IFLA_BR_MCAST_QUERY_INTVL IFLA_BR_MCAST_QUERY_RESPONSE_INTVL IFLA_BR_MCAST_STARTUP_QUERY_INTVL Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_STARTUP_QUERY_CNT to allow setting/getting br->multicast_startup_query_count via netlink. Also align the ifla comments. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_LAST_MEMBER_CNT to allow setting/getting br->multicast_last_member_count via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_HASH_MAX to allow setting/getting br->hash_max via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_HASH_ELASTICITY to allow setting/getting br->hash_elasticity via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_QUERIER to allow setting/getting br->multicast_querier via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_QUERY_USE_IFADDR to allow setting/getting br->multicast_query_use_ifaddr via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_SNOOPING to allow enabling/disabling multicast snooping via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_MCAST_ROUTER to allow setting and retrieving br->multicast_router when igmp snooping is enabled. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Simple attribute that flushes the bridge's fdb. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_GROUP_ADDR attribute to allow setting and retrieving the group_addr via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Export the following bridge timers (also exported via sysfs): IFLA_BR_HELLO_TIMER, IFLA_BR_TCN_TIMER, IFLA_BR_TOPOLOGY_CHANGE_TIMER, IFLA_BR_GC_TIMER via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_TOPOLOGY_CHANGE and IFLA_BR_TOPOLOGY_CHANGE_DETECTED and export them via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_ROOT_PATH_COST and export it via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_ROOT_PORT and export it via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_BRIDGE_ID and export br->bridge_id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_ROOT_ID and export br->designated_root via netlink. For this purpose add struct ifla_bridge_id that would represent struct bridge_id. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Add IFLA_BR_GROUP_FWD_MASK attribute to allow setting and retrieving the group_fwd_mask via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nikolay Aleksandrov says: ==================== bridge: vlan: cleanups & fixes (part 2) This is the second follow-up set with one fix (patch 01) and more cleanups (patches 02,03 and 04). These are minor compared to the previous ones and should be the last before taking on the optimization changes on the fast-path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
The checks that lead to num_vlans change are always what br_vlan_should_use checks for, namely if the vlan is only a context or not and depending on that it's either not counted or counted as a real/used vlan respectively. Also give better explanation in br_vlan_should_use's comment. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
There's only one user now and we can include the flag directly. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Introduce br_vlan_(get|put)_master which take a reference (or create the master vlan first if it didn't exist) and drop a reference respectively. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
When I did the conversion to rhashtable I missed the required locking of one important user of the vlan list - br_get_link_af_size_filtered() which is called: br_ifinfo_notify() -> br_nlmsg_size() -> br_get_link_af_size_filtered() and the notifications can be sent without holding rtnl. Before this conversion the function relied on using rcu and since we already use rcu to destroy the vlans, we can simply migrate the list to use the rcu helpers. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Oct, 2015 2 commits
-
-
Eric Dumazet authored
Before letting request sockets being put in TCP/DCCP regular ehash table, we need to add either : - SLAB_DESTROY_BY_RCU flag to their kmem_cache - add RCU grace period before freeing them. Since we carefully respected the SLAB_DESTROY_BY_RCU protocol like ESTABLISH and TIMEWAIT sockets, use it here. req_prot_init() being only used by TCP and DCCP, I did not add a new slab_flags into their rsk_prot, but reuse prot->slab_flags Since all reqsk_alloc() users are correctly dealing with a failure, add the __GFP_NOWARN flag to avoid traces under pressure. Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-09-30 This series contains updates to i40e and i40evf only. Vasily Averin provides a couple of rtnl lock/unlock fixes for both i40e and i40evf. Shannon provides several updates and fixes, first fixes up a type clash in i40e_aq_rc_to_posix(), where the error codes are signed values, so we need to treat them as such. Then fixes up a padding issue where an extra byte is added in i40e_aqc_get_cee_dcb_cfg_v1_resp to directly acknowledge the padding. Updated i40e to keep debugfs register read and writes from accessing outside of the io-remapped space. Added support and device id for another 20 GbE device. Jesse fixes the transmit hand workaround code for ARM that was causing Tx hangs to still occur occasionally when there really was no hang. Then fixed the receive dropped counter to show up in netstat interface. Refactor the interrupt enable function since it was always making the caller add the base_vector from the VSI struct which is already passed to the function. Fix kbuild warnings found in 0day build infrastructure by adding a harmless cast to a dev_info(), also fix 32 bit build warnings found by sparse. Greg fixed a configuration error that results if a port VLAN is set for a VF before the VF driver is loaded, so that when the VF driver is loaded the port VLAN is ignored. Mitch fixes the use of QOS field consistently in i40e_ndo_set_vf_port_vlan(). Modified the init timing of the driver to increase stability on load/unload and SR-IOV enable/disable cycles. Anjali updates i40e to not collect VEB stats if they are disabled in the hardware for performance reasons. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-