- 24 Sep, 2012 21 commits
-
-
Eric Dumazet authored
We currently use a per-socket order-0 page cache for tcp_sendmsg() operations. This page is used to build fragments for skbs. It's done to increase the probability of coalescing small write() calls into single segments in skbs still in the write queue (not yet sent). But it wastes a lot of memory for applications handling many mostly idle sockets, since each socket holds one page in sk->sk_sndmsg_page. It's also quite inefficient for building TSO 64KB packets, because we need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the page allocator more than wanted. This patch adds a per-task frag allocator and uses bigger pages, if available (up to 32768 bytes per frag, that's order-3 pages on x86). An automatic fallback is done in case of memory pressure. This increases TCP stream performance by 20% on the loopback device, but also benefits other network devices, since 8x fewer frags are mapped on transmit and unmapped on tx completion. Alexander Duyck mentioned a probable performance win on systems with IOMMU enabled. It's possible some SG-enabled hardware can't cope with bigger fragments, but their ndo_start_xmit() should already handle this, splitting a fragment into sub-fragments, since some arches have PAGE_SIZE=65536. Successfully tested on various ethernet devices (ixgbe, igb, bnx2x, tg3, mellanox mlx4). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
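The fragment counts quoted in this message can be checked with a little arithmetic. The sketch below is only an illustration of the numbers in the text (64KB TSO packets, PAGE_SIZE = 4096, frags of up to 32768 bytes); the helper name `frags_needed` is hypothetical and is not a kernel function.

```python
# Illustrative arithmetic only; frags_needed() is a hypothetical helper.
TSO_PACKET_BYTES = 64 * 1024        # 64KB TSO packet, as in the message
PAGE_SIZE = 4096                    # order-0 page on x86
MAX_FRAG_BYTES = PAGE_SIZE << 3     # order-3 allocation = 32768 bytes


def frags_needed(payload_bytes, frag_bytes):
    """Number of fragments needed to hold payload_bytes (ceiling division)."""
    return -(-payload_bytes // frag_bytes)


order0_frags = frags_needed(TSO_PACKET_BYTES, PAGE_SIZE)
order3_frags = frags_needed(TSO_PACKET_BYTES, MAX_FRAG_BYTES)
reduction = order0_frags // order3_frags

print(order0_frags, order3_frags, reduction)
```

With these inputs the order-0 case needs 16 fragments and the order-3 case needs 2, which is where the "8x" figure for map and unmap operations comes from.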
-
Claudiu Manoil authored
This is primarily to address transmission timeout occurrences when multiple H/W Tx queues are being used concurrently. Because in the priority scheduling mode the controller does not service the Tx queues equally (but in ascending index order), Tx timeouts are being triggered right away for a basic test with multiple simultaneous connections like: iperf -c <server_ip> -n 100M -P 8 resulting in kernel trace: NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue <X> timed out ------------[ cut here ]------------ WARNING: at net/sched/sch_generic.c:255 ... and controller reset during intense traffic, and possibly further complications. This patch changes the default H/W Tx scheduling setting (TXSCHED) for multi-queue devices, from priority scheduling mode to a weighted round robin mode with equal weights for all H/W Tx queues, and addresses the issue above. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
The loopback device's current MTU of 16436 bytes allows no more than 3 MSS TCP segments per frame, or 48 Kbytes. Changing the MTU to 64K allows the TCP stack to build large frames and significantly reduces stack overhead. The performance boost on bulk TCP transfers can be up to 30%, partly because we now have one ACK message for two 64KB segments, and a lower probability of hitting the /proc/sys/net/ipv4/tcp_reordering default limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
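The "3 MSS segments, or 48 Kbytes" figure can be reproduced with a rough calculation. The sketch below assumes IPv4 without IP options, TCP with the timestamp option (12 bytes including padding), and a 65535-byte cap on one aggregated frame; none of these assumptions is stated in the message, and `segments_per_frame` is a hypothetical helper.

```python
# Rough arithmetic under stated assumptions; not taken from the patch itself.
IP_HEADER = 20              # assumption: IPv4, no IP options
TCP_HEADER = 20             # base TCP header
TCP_TIMESTAMP_OPTION = 12   # assumption: timestamps enabled (10 bytes + padding)
MAX_FRAME_BYTES = 65535     # assumption: cap on one aggregated frame


def segments_per_frame(mtu):
    """Return (mss, whole segments per frame, payload bytes per frame)."""
    mss = mtu - IP_HEADER - TCP_HEADER - TCP_TIMESTAMP_OPTION
    count = MAX_FRAME_BYTES // mss
    return mss, count, mss * count


old_mss, old_count, old_payload = segments_per_frame(16436)
print(old_mss, old_count, old_payload // 1024)
```

Under these assumptions the old MTU gives an MSS of 16384 bytes, so only three whole segments (48 Kbytes) fit below the cap; a fourth would need exactly 65536 bytes.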
-
David S. Miller authored
All callers provide a non-NULL scm argument. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Merav Sicron authored
This patch changes the definition of bnx2x_tests_str_arr from a static char pointer to a static const char two-dimensional array. Also the bnx2x_get_strings function is simplified. Reported-by: Joe Perches <joe@perches.com> Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Merav Sicron <meravs@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Senna Tschudin authored
Convert a nonnegative error return code to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Senna Tschudin authored
Removes an unnecessary semicolon. Found by Coccinelle: http://coccinelle.lip6.fr/ Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
ipaddr has been allocated in function qeth_l3_add_vipa() but is not freed before leaving in the error handling cases. The same problem also exists in function qeth_l3_add_rxip(). spatch with a semantic match was used to find this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Ott authored
Make sure that all ccws used for writing are initialized with zeros - especially since the last ccw contains a TIC for which the unused fields have to be zeros. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Ott authored
Cleanup the qeth_get_channel_path_desc function and rename it to qeth_update_from_chp_desc. No functional change. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Acked-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://1984.lsi.us.es/nf-next
David S. Miller authored
Pablo Neira Ayuso says: ==================== This patchset contains updates for your net-next tree, they are: * Mostly fixes for the recently pushed IPv6 NAT support: - Fix crash while removing nf_nat modules from Patrick McHardy. - Fix unbalanced rcu_read_unlock from Ulrich Weber. - Merge NETMAP and REDIRECT into one single xt_target module, from Jan Engelhardt. - Fix Kconfig for IPv6 NAT, which allows inconsistent configurations, from myself. * Updates for ipset, all of them from Jozsef Kadlecsik: - Add the new "nomatch" option to obtain reverse set matching. - Support for /0 CIDR in hash:net,iface set type. - One non-critical fix for a rare crash due to passing really wrong configuration parameters. - Coding style cleanups. - Sparse fixes. - Add set revision supported via modinfo. * One extension for the xt_time match, to support matching during the transition between two days with one single rule, from Florian Westphal. * Fix maximum packet length supported by nfnetlink_queue and add NFQA_CAP_LEN attribute, from myself. You can notice that this batch contains a couple of fixes that may go to 3.6-rc but I don't consider them critical enough to push them: * The ipset fix for the /0 cidr case, which is triggered with one inconsistent command line invocation of ipset. * The nfnetlink_queue maximum packet length supported since it requires the new NFQA_CAP_LEN attribute to provide a full workaround for the described problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pablo Neira Ayuso authored
This patch adds the NFQA_CAP_LEN attribute that allows us to know what is the real packet size from user-space (even if we decided to retrieve just a few bytes from the packet instead of all of it). Security software that inspects packets should always check for this new attribute to make sure that it is inspecting the entire packet. This also helps to provide a workaround for the problem described in: http://marc.info/?l=netfilter-devel&m=134519473212536&w=2 Original idea from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
The packets that we send via NFQUEUE are encapsulated in the NFQA_PAYLOAD attribute. The length of the packet in userspace is obtained via the attr->nla_len field. This field contains the size of the Netlink attribute header plus the packet length. If the maximum packet length is specified, i.e. 65535 bytes, and packets in the range of (65531,65535] are sent to userspace, attr->nla_len overflows and it reports bogus lengths to the application. To fix this, this patch limits the maximum packet length to 65531 bytes. If a larger packet length is specified, the packet that we send to user-space is truncated to 65531 bytes. To support 65535-byte packets, we have to revisit the idea of the 32-bit Netlink attribute length. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
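The overflow described here follows from nla_len being a 16-bit field that counts the 4-byte attribute header as well as the payload. The sketch below models only that arithmetic; `reported_nla_len` and `MAX_SAFE_PACKET_LEN` are hypothetical names for illustration, not kernel symbols.

```python
# Models the 16-bit wrap of nla_len; names here are illustrative only.
NLA_HDRLEN = 4        # size of the Netlink attribute header (two 16-bit fields)
U16_MAX = 0xFFFF      # nla_len is an unsigned 16-bit field


def reported_nla_len(packet_len):
    """Value a 16-bit nla_len would hold for a payload of packet_len bytes."""
    return (NLA_HDRLEN + packet_len) & U16_MAX


MAX_SAFE_PACKET_LEN = U16_MAX - NLA_HDRLEN   # largest payload that still fits

for packet_len in (65531, 65532, 65535):
    print(packet_len, reported_nla_len(packet_len))
```

A 65531-byte payload is the largest that fits (nla_len = 65535); anything from 65532 to 65535 bytes wraps around to a value between 0 and 3, which matches the (65531,65535] range named in the message.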
-
Pablo Neira Ayuso authored
This patch allows the FTP helper to pick up the sequence tracking from the first packet seen. This is useful to fix the breakage of the first FTP command after the failover while using conntrackd to synchronize states. The seq_aft_nl_num field in struct nf_ct_ftp_info has been shrunk to 16 bits (enough for what it does), so we can use the remaining 16 bits to store the flags while using the same size for the private FTP helper data. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Currently, if you want to do something like: "match Monday, starting 23:00, for two hours" you need two rules, one for Mon 23:00 to 0:00 and one for Tue 0:00-1:00. The rule: --weekdays Mo --timestart 23:00 --timestop 01:00 looks correct, but it will first match on Monday from midnight to 1 a.m. and then again for another hour from 23:00 onwards. This permits userspace to explicitly ignore the day transition and match for a single, continuous time period instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
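The difference between the two behaviours can be sketched as a small model. This is not the xt_time code; `time_matches` and its `contiguous` parameter are hypothetical names, times are minutes since midnight, and the rule for shifting the weekday is an assumption drawn from the description above.

```python
# Simplified model of weekday + time-of-day matching; not the kernel code.
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def time_matches(day, minute, weekdays, start, stop, contiguous=False):
    """Return True if (day, minute) falls inside the rule's window."""
    if start <= stop:
        in_window = start <= minute <= stop
    else:
        # Window wraps past midnight, e.g. 23:00 to 01:00.
        in_window = minute >= start or minute <= stop
    if not in_window:
        return False
    if contiguous and start > stop and minute <= stop:
        # After midnight the period belongs to the day it started on.
        day = DAYS[(DAYS.index(day) - 1) % 7]
    return day in weekdays


START, STOP = 23 * 60, 1 * 60
print(time_matches("Mo", 30, {"Mo"}, START, STOP))                   # old: Monday 00:30
print(time_matches("Tu", 30, {"Mo"}, START, STOP))                   # old: Tuesday 00:30
print(time_matches("Tu", 30, {"Mo"}, START, STOP, contiguous=True))  # new: Tuesday 00:30
```

In this model the plain rule matches Monday 00:30 but not Tuesday 00:30, while the contiguous variant does the opposite, so a single rule covers Monday 23:00 through Tuesday 01:00.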
-
Alexander Duyck authored
With recent kernel changes we can now return errors on a failure to set up a VLAN filter. This patch takes advantage of that opportunity so that we can return either an EIO error in the case of a mailbox failure, or an EACCES error in the case of being denied access to the VLAN filter table by the PF. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Robert Garrett <robertx.e.garrett@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This change fixes the ixgbevf driver so that it can correctly drop a frame should it receive a jumbo frame. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Greg Rose authored
While fixing up a patch from Alex Duyck to use q_vectors in ring containers to update the ITR I bungled it and missed actually updating the counters in the ring container q_vectors. This patch fixes my mistake and makes interrupt moderation actually work. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Narendra K authored
Remove 'rx_ring' parameter as it is not used in ixgbevf_receive_skb Signed-off-by: Narendra K <narendra_k@dell.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Greg Rose authored
The counter is not valid unless the controller is running in IOV mode. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
The VF driver was not designed to correctly handle a message timeout. As a result it is possible for one bad message to invalidate all messages following it until the part is reset. Instead we should copy the example in igbvf of how to handle a mailbox event and message timeout. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 23 Sep, 2012 2 commits
-
-
David S. Miller authored
Suggested by Jan Engelhardt. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
When recording the number of SYNACK retransmits for servers using TCP Fast Open, fix the code to ensure that we copy over the retransmit count from the request_sock after we receive the ACK that completes the 3-way handshake. The story here is similar to that of SYNACK RTT measurements. Previously we were always doing this in tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we receive the SYN. So for TFO we must copy the final SYNACK retransmit count in tcp_rcv_state_process(). Note that copying over the SYNACK retransmit count will give us the correct count since, as is mentioned in a comment in tcp_retransmit_timer(), before we receive an ACK for our SYN-ACK a TFO passive connection does not retransmit anything else (e.g., data or FIN segments). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Sep, 2012 17 commits
-
-
David S. Miller authored
We're now using isa_virt_to_bus(), and there really isn't a generic and consistent test for whether a platform provides this interface or not. This driver is also for an x86-only device. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jozsef Kadlecsik authored
Exceptions can now be matched and we can branch according to the possible cases: a. match in the set if the element is not flagged as "nomatch" b. match in the set if the element is flagged with "nomatch" c. no match i.e. iptables ... -m set --match-set ... -j ... iptables ... -m set --match-set ... --nomatch-entries -j ... ... Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Jozsef Kadlecsik authored
Now it is possible to set up a single hash:net,iface type of set and a single ip6?tables match which covers all egress/ingress filtering. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Neal Cardwell authored
A TCP Fast Open (TFO) passive connection must call both tcp_check_req() and tcp_validate_incoming() for all incoming ACKs that are attempting to complete the 3WHS. This is needed to parallel all the action that happens for a non-TFO connection, where for an ACK that is attempting to complete the 3WHS we call both tcp_check_req() and tcp_validate_incoming(). For example, upon receiving the ACK that completes the 3WHS, we need to call tcp_fast_parse_options() and update ts_recent based on the incoming timestamp value in the ACK. One symptom of the problem with the previous code was that for passive TFO connections using TCP timestamps, the outgoing TS ecr values ignored the incoming TS val value on the ACK that completed the 3WHS. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
Previously, when using TCP Fast Open a server would return from tcp_check_req() before updating snt_synack based on TCP timestamp echo replies and whether or not we've retransmitted the SYNACK. The result was that (a) for TFO connections using timestamps we used an incorrect baseline SYNACK send time (tcp_time_stamp of SYNACK send instead of rcv_tsecr), and (b) for TFO connections that do not have TCP timestamps but retransmit the SYNACK we took a SYNACK RTT sample when we should not take a sample. This fix merely moves the snt_synack update logic a bit earlier in the function, so that connections using TCP Fast Open will properly do these updates when the ACK for the SYNACK arrives. Moving this snt_synack update logic means that with TCP_DEFER_ACCEPT enabled we do a few instructions of wasted work on each bare ACK, but that seems OK. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
When taking SYNACK RTT samples for servers using TCP Fast Open, fix the code to ensure that we only call tcp_valid_rtt_meas() after we receive the ACK that completes the 3-way handshake. Previously we were always taking an RTT sample in tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we receive the SYN. So for TFO we must wait until tcp_rcv_state_process() to take the RTT sample. To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock() before we set the snt_synack timestamp, since tcp_synack_rtt_meas() already ensures that we only take a SYNACK RTT sample if snt_synack is non-zero. To be careful, we only take a snt_synack timestamp when a SYNACK transmit or retransmit succeeds. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
In preparation for adding another spot where we compute the SYNACK RTT, extract this code so that it can be shared. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
There has been some confusion among PHC driver authors about the intended purpose of the clock_name attribute. This patch expands the documentation in order to clarify how the clock_name field should be understood. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
PTP Hardware Clock devices appear as class devices in sysfs. This patch changes the registration API to use the parent device, clarifying the clock's relationship to the underlying device. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
If the timex.mode field indicates a query, then we provide the value of the current frequency adjustment. [ Get rid of extraneous empty lines -DaveM ] Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Cochran authored
This patch adds a field to the representation of a PTP hardware clock in order to remember the frequency adjustment value dialed by the user. Adding this field will let us answer queries in the manner of adjtimex in a follow on patch. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
David S. Miller authored
Jeff Kirsher says: ==================== This series contains updates to igb only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andreas Larsson authored
One-shot mode uses the TCS bit of the status register to discern whether a transmission was successful or not. On a failed transmission, the frame is not echoed back. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Alexander Duyck authored
This change is meant to improve performance on systems that do not require the DMA unmap calls. On those systems we do not need to make use of the unmap address for Tx or the unmap length, so we can drop both, thereby reducing the size of the Tx buffer info structure. In addition I have changed the logic to check for unmap length instead of unmap address when checking to see if a buffer needs to be unmapped from DMA use. The reason for this change is that on some platforms it is possible to receive a valid DMA address of 0 from an IOMMU. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
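The reason for testing the length rather than the address can be shown with a toy model. The `TxBuffer` class and both helper functions below are hypothetical stand-ins for the driver's Tx buffer info structure and are not taken from the patch.

```python
# Toy model of the "is this buffer mapped?" check; names are hypothetical.
from dataclasses import dataclass


@dataclass
class TxBuffer:
    dma: int = 0      # bus address returned by the mapping call
    length: int = 0   # mapped length; 0 means nothing is mapped


def needs_unmap_by_address(buf):
    """Old-style check: treats address 0 as 'not mapped'."""
    return buf.dma != 0


def needs_unmap_by_length(buf):
    """New-style check: a non-zero length means a mapping exists."""
    return buf.length != 0


# An IOMMU may legitimately hand back bus address 0 for a real mapping.
mapped_at_zero = TxBuffer(dma=0, length=1514)
unused = TxBuffer()

print(needs_unmap_by_address(mapped_at_zero), needs_unmap_by_length(mapped_at_zero))
print(needs_unmap_by_address(unused), needs_unmap_by_length(unused))
```

In this model the address-based check would skip the unmap for a buffer mapped at address 0, while the length-based check handles both the mapped and the unused buffer correctly.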
-