- 05 Feb, 2015 10 commits
-
-
Moni Shoua authored
Supply interface functions to bond and unbond ports of a mlx4 internal interfaces. Example for such an interface is the one registered by the mlx4 IB driver under RoCE. There are 1. Functions to go in/out to/from bonded mode 2. Function to remap virtual ports to physical ports The bond_mutex prevents simultaneous access to data that keep status of the device in bonded mode. The upper mlx4 interface marks to the mlx4 core module that they want to be subject for such bonding by setting the MLX4_INTFF_BONDING flag. Interface which goes to/from bonded mode is re-created. The mlx4 Ethernet driver does not set this flag when registering the interface, the IB driver does. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moni Shoua authored
Implement the hardware interface required for port aggregation. 1. Disable RX port check on receive - don't perform a validity check that matches to QP's port and the port where the packet is received. 2. Virtual to physical port remap - configure virtual to physical port mapping. Port remap capability for virtual functions. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moni Shoua authored
Use notifier chain to dispatch an event upon a change in slave state. Event is dispatched with slave specific info. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moni Shoua authored
Move slave state changes to a helper function, this is a pre-step for adding functionality of dispatching an event when this helper is called. This commit doesn't add new functionality. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moni Shoua authored
Add event which provides an indication on a change in the state of a bonding slave. The event handler should cast the pointer to the appropriate type (struct netdev_bonding_info) in order to get the full info about the slave. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jon Maloy says: ==================== tipc: some small fixes During extensive testing and analysis of running dual links between nodes, we have encountered some issues that potentially may cause problems. We choose to fix those proactively in this series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
When a new link instance is created, it is trigged to start by sending it a TIPC_STARTING_EVT, whereafter a regular link reset is applied to it. The starting event is codewise treated as a timeout event, and prompts a link RESET message to be sent to the peer node, carrying a link session identifier. The later link_reset() call nudges this session identifier, whereafter all subsequent RESET messages will be sent out with the new identifier. The latter session number overrides the former, causing the peer to unconditionally accept it irrespective of its current working state. We don't think that this causes any problem, but it is not in accordance with the protocol spec, and may cause confusion when debugging TIPC sessions. To avoid this, we make the starting event distinct from the subsequent timeout events, by not allowing the former to send out any RESET message. This eliminates the described problem. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
Instances of struct node are created in the function tipc_disc_rcv() under the assumption that there is no race between received discovery messages arriving from the same node. This assumption is wrong. When we use more than one bearer, it is possible that discovery messages from the same node arrive at the same moment, resulting in creation of two instances of struct tipc_node. This may later cause confusion during link establishment, and may result in one of the links never becoming activated. We fix this by making lookup and potential creation of nodes atomic. Instead of first looking up the node, and in case of failure, create it, we now start with looking up the node inside node_link_create(), and return a reference to that one if found. Otherwise, we go ahead and create the node as we did before. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
During link failover it may happen that the remaining link goes down while it is still in the process of taking over traffic from a previously failed link. When this happens, we currently abort the failover procedure and reset the first failed link to non-failover mode, so that it will be ready to re-establish contact with its peer when it comes available. However, if the first link goes down because its bearer was manually disabled, it is not enough to reset it; it must also be deleted; which is supposed to happen when the failover procedure is finished. Otherwise it will remain a zombie link: attached to the owner node structure, in mode LINK_STOPPED, and permanently blocking any re- establishing of the link to the peer via the interface in question. We fix this by amending the failover abort procedure. Apart from resetting the link to non-failover state, we test if the link is also in LINK_STOPPED mode. If so, we delete it, using the conditional tipc_link_delete() function introduced in the previous commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Paul Maloy authored
When a bearer is disabled, all pertaining links will be reset and deleted. However, if there is a second active link towards a killed link's destination, the delete has to be postponed until the failover is finished. During this interval, we currently put the link in zombie mode, i.e., we take it out of traffic, delete its timer, but leave it attached to the owner node structure until all missing packets have been received. When this is done, we detach the link from its node and delete it, assuming that the synchronous timer deletion that was initiated earlier in a different thread has finished. This is unsafe, as the failover may finish before del_timer_sync() has returned in the other thread. We fix this by adding an atomic reference counter of type kref in struct tipc_link. The counter keeps track of the references kept to the link by the owner node and the timer. We then do a conditional delete, based on the reference counter, both after the failover has been finished and when the timer expires, if applicable. Whoever comes last, will actually delete the link. This approach also implies that we can make the deletion of the timer asynchronous. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Feb, 2015 14 commits
-
-
David S. Miller authored
Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Last round of updates for net-next: * revert a patch that caused a regression with mesh userspace (Bob) * fix a number of suspend/resume related races (from Emmanuel, Luca and myself - we'll look at backporting later) * add software implementations for new ciphers (Jouni) * add a new ACPI ID for Broadcom's rfkill (Mika) * allow using netns FD for wireless (Vadim) * some other cleanups (various) Signed-off-by: David S. Miller <davem@davemloft.net>
-
Praveen Madhavan authored
This patch is to use firmware version macros from t4fw_version.h and also enables 40g T5 adapter. Signed-off-by: Praveen Madhavan <praveenm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicholas Mc Guire authored
This is only an API consolidation and should make things more readable it replaces var * HZ / 1000 by msecs_to_jiffies(var). As there is a discrepancy between the code and the comments this is in a separate patch. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicholas Mc Guire authored
This is only an API consolidation and should make things more readable it replaces var * HZ / 1000 by msecs_to_jiffies(var). Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-02-03 Here's what's likely the last bluetooth-next pull request for 3.20. Notable changes include: - xHCI workaround + a new id for the ath3k driver - Several new ids for the btusb driver - Support for new Intel Bluetooth controllers - Minor cleanups to ieee802154 code - Nested sleep warning fix in socket accept() code path - Fixes for Out of Band pairing handling - Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER - Improvements to data we expose through debugfs - Proper handling of Hardware Error HCI events Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch adds skb_remcsum_process and skb_gro_remcsum_process to perform the appropriate adjustments to the skb when receiving remote checksum offload. Updated vxlan and gue to use these functions. Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did not see any change in performance. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Siva Mannem authored
bridge: Let bridge not age 'externally' learnt FDB entries, they are removed when 'external' entity notifies the aging When 'learned_sync' flag is turned on, the offloaded switch port syncs learned MAC addresses to bridge's FDB via switchdev notifier (NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are wrongly being deleted by bridge aging logic. This patch ensures that FDB entries synced from offloaded switch ports are not deleted by bridging logic. Such entries can only be deleted via switchdev notifier (NETDEV_SWITCH_FDB_DEL). Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
LEROY Christophe authored
Freescale ethernet controllers have the capability to re-assemble fragmented data into a single ethernet frame. This patch uses this capability and implements NETIP_F_SG feature into the fs_enet ethernet driver. On a MPC885, I get 53% performance improvement on a ftp transfer of a 15Mb file: * Without the patch : 2,8 Mbps * With the patch : 4,3 Mbps Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
A typical qdisc setup is the following : bond0 : bonding device, using HTB hierarchy eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc XPS allows to spread packets on specific tx queues, based on the cpu doing the send. Problem is that dequeues from bond0 qdisc can happen on random cpus, due to the fact that qdisc_run() can dequeue a batch of packets. CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1 CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0 CPUB -> dequeue packet P1 from bond0 enqueue packet on eth1/eth2 CPUC -> dequeue packet P2 from bond0 enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0) get_xps_queue() then might select wrong queue for P1, since current cpu might be different than CPUA. P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping), if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock) Effect of this bug is TCP reorders, and more generally not optimal TX queue placement. (A victim bulk flow can be migrated to the wrong TX queue for a while) To fix this, we have to record sender cpu number the first time dev_queue_xmit() is called for one tx skb. We can union napi_id (used on receive path) and sender_cpu, granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for this union idea) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Markus Elfring says: ==================== netlabel: Deletion of a few unnecessary checks Further update suggestions were taken into account after patches were applied from static source code analysis. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Markus Elfring authored
The functions "cipso_v4_doi_putdef" and "kfree" could be called in some cases by the netlbl_mgmt_add_common() function during error handling even if the passed variables contained still a null pointer. * This implementation detail could be improved by adjustments for jump labels. * Let us return immediately after the first failed function call according to the current Linux coding style convention. * Let us delete also an unnecessary check for the variable "entry" there. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Markus Elfring authored
The cipso_v4_doi_free() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Markus Elfring authored
The cipso_v4_doi_putdef() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shruti Kanetkar authored
The device tree binding(s) document has fallen out of sync with the driver code. Update the list of supported devices to reflect current driver capabilities Change-Id: I440d8de2ee2d9c3b7b23e69b3da851cab18a4c9a Signed-off-by: Shruti Kanetkar <Kanetkar.Shruti@gmail.com> Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Feb, 2015 16 commits
-
-
Mika Westerberg authored
This is yet another Broadcom bluetooth chip with ACPI ID BCM2E40. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Johan Hedberg authored
The bnep_get_device function may be triggered by an ioctl just after a connection has gone down. In such a case the respective L2CAP chan->conn pointer will get set to NULL (by l2cap_chan_del). This patch adds a missing NULL check for this case in the bnep_get_device() function. Reported-by: Patrik Flykt <patrik.flykt@linux.intel.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
-
Matej Dubovy authored
Please add support for sub BT chip on the combo card Broadcom 43142A0 (in Lenovo E145), 04ca:2007 /sys/kernel/debug/usb/devices T: Bus=05 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 3 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=04ca ProdID=2007 Rev= 1.12 S: Manufacturer=Broadcom Corp S: Product=BCM43142A0 S: SerialNumber=28E347EC73BD C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr= 0mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none) E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=84(I) Atr=02(Bulk) MxPS= 32 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 32 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none) Firmware for 04ca:2007 can be extracted from the latest Lenovo E145 Bluetooth driver for Windows (driver is however described as BCM20702 but contains also firwmare for BCM43142). Search for BCM43142A0_001.001.011.0122.0153.hex within hex files, then it must be converted using hex2hcd utility. Rename file to BCM43142A0-04ca-2007.hcd, then move to /lib/firmware/brcm/. Signed-off-by: Matej Dubovy <matej.dubovy@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org
-
Markus Elfring authored
The kfree() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-By: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladislav Yasevich says: ==================== ipv6: Add lockless UDP send path This series introduces a lockless UDPv6 send path similar to what Herbert Xu did for IPv4 a while ago. There are some difference from IPv4. IPv6 caching for flow label is a bit different, as well as it requires another cork cork structure that holds the IPv6 ancillary data. Please take a look. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Currntly, if we are not doing UFO on the packet, all UDP packets will start with CHECKSUM_NONE and thus perform full checksum computations in software even if device support IPv6 checksum offloading. Let's start start with CHECKSUM_PARTIAL if the device supports it and we are sending only a single packet at or below mtu size. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
This commit adds the same functionaliy to IPv6 that commit 903ab86d Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Tue Mar 1 02:36:48 2011 +0000 udp: Add lockless transmit path added to IPv4. UDP transmit path can now run without a socket lock, thus allowing multiple threads to send to a single socket more efficiently. This is only used when corking/MSG_MORE is not used. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Now that we can individually construct IPv6 skbs to send, add a udpv6_send_skb() function to populate the udp header and send the skb. This allows udp_v6_push_pending_frames() to re-use this function as well as enables us to add lockless sendmsg() support. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
This commit is very similar to commit 1c32c5ad Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Tue Mar 1 02:36:47 2011 +0000 inet: Add ip_make_skb and ip_finish_skb It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb. The job of ip6_make_skb is to collect messages into an ipv6 packet and poplulate ipv6 eader. The job of ip6_finish_skb is to transmit the generated skb. Together they replicated the job of ip6_push_pending_frames() while also provide the capability to be called independently. This will be needed to add lockless UDP sendmsg support. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Add the ability to append data to arbitrary queue. This will be needed later to implement lockless UDP sends. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Yasevich authored
Pull IPv6 cork initialization into its own function that can be re-used. IPv6 specific cork data did not have an explicit data structure. This patch creats eone so that just ipv6 cork data can be as arguemts. Also, since IPv6 tries to save the flow label into inet_cork_full tructure, pass the full cork. Adjust ip6_cork_release() to take cork data structures. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anish Bhatt authored
* Add support for IEEE ets & pfc api. * Fix bug that resulted in incorrect bandwidth percentage being returned for CEE peers * Convert pfc enabled info from firmware format to what dcbnl expects before returning Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
ARM has 32-byte cache lines, which according to the comment in the init registers function seems to work best with the default value of 0x4800 that is also used on sparc and parisc. This adds ARM to the same list, to use that default but no longer warn about it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The hip04 ethernet driver causes a new compile-time warning when built as a loadable module: WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/hisilicon/hip04_eth.o see include/linux/module.h for more information This adds the license as "GPL", which matches the header of the file. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
One deployment requirement of DCTCP is to be able to run in a DC setting along with TCP traffic. As Glenn Judd's NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter" [1] (tba) explains, one way to solve this on switch side is to split DCTCP and TCP traffic in two queues per switch port based on the DSCP: one queue soley intended for DCTCP traffic and one for non-DCTCP traffic. For the DCTCP queue, there's the marking threshold K as explained in commit e3118e83 ("net: tcp: add DCTCP congestion control algorithm") for RED marking ECT(0) packets with CE. For the non-DCTCP queue, there's f.e. a classic tail drop queue. As already explained in e3118e83, running DCTCP at scale when not marking SYN/SYN-ACK packets with ECT(0) has severe consequences as for non-ECT(0) packets, traversing the RED marking DCTCP queue will result in a severe reduction of connection probability. This is due to the DCTCP queue being dominated by ECT(0) traffic and switches handle non-ECT traffic in the RED marking queue after passing K as drops, where K is usually a low watermark in order to leave enough tailroom for bursts. Splitting DCTCP traffic among several queues (ECN and non-ECN queue) is being considered a terrible idea in the network community as it splits single flows across multiple network paths. Therefore, commit e3118e83 implements this on Linux as ECT(0) marked traffic, as we argue that marking all packets of a DCTCP flow is the only viable solution and also doesn't speak against the draft. However, recently, a DCTCP implementation for FreeBSD hit also their mainline kernel [2]. In order to let them play well together with Linux' DCTCP, we would need to loosen the requirement that ECT(0) has to be asserted during the 3WHS as not implemented in FreeBSD. This simplifies the ECN test and lets DCTCP work together with FreeBSD. Joint work with Daniel Borkmann. [1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd [2] https://github.com/freebsd/freebsd/commit/8ad879445281027858a7fa706d13e458095b595fSigned-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Glenn Judd <glenn.judd@morganstanley.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Willem de Bruijn says: ==================== net-timestamp: blinding Changes (v2 -> v3) - rebase only: v2 did not make it to patchwork / netdev (v1 -> v2) - fix capability check in patch 2 this could be moved into net/core/sock.c as sk_capable_nouser() (rfc -> v1) - dropped patch 4: timestamp batching due to complexity, as discussed - dropped patch 5: default mode because it does not really cover all use cases, as discussed - added documentation - minor fix, see patch 2 Two issues were raised during recent timestamping discussions: 1. looping full packets on the error queue exposes packet headers 2. TCP timestamping with retransmissions generates many timestamps This RFC patchset is an attempt at addressing both without breaking legacy behavior. Patch 1 reintroduces the "no payload" timestamp option, which loops timestamps onto an empty skb. This reduces the pressure on SO_RCVBUF from looping many timestamps. It does not reduce the number of recv() calls needed to process them. The timestamp cookie mechanism developed in http://patchwork.ozlabs.org/patch/427213/ did, but this is considerably simpler. Patch 2 then gives administrators the power to block all timestamp requests that contain data by unprivileged users. I proposed this earlier as a backward compatible workaround in the discussion of net-timestamp: pull headers for SOCK_STREAM http://patchwork.ozlabs.org/patch/414810/ Patch 3 only updates the txtimestamp example to test this option. Verified that with option '-n', length is zero in all cases and option '-I' (PKTINFO) stops working. ==================== Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-