- 02 Mar, 2016 40 commits
-
-
Hariprasad Shenai authored
Add a new function t4vf_fl_pkt_align() and use it in the SGE initialization code to determine the freelist packet alignment. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
T4 and T5 hardware will not coalesce Free List PCI-E Fetch Requests if the Host Driver provides more Free List Pointers than the Fetch Burst Minimum value. So if we set FBMIN to 64 bytes and the Host Driver supplies 128 bytes of Free List Pointer data, the hardware will issue two 64-byte PCI-E Fetch Requests rather than a single coalesced 128-byte Fetch Request. T6 fixes this. So, for T4/T5 we set the FBMIN value to 128. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
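The chip-dependent choice described in this commit could be sketched roughly as follows. The enum, the function names and the use of plain byte counts (rather than register encodings) are hypothetical and not taken from the driver; keeping 64 bytes on T6 is also an assumption, since the message only states the T4/T5 value.

```c
/* Hypothetical chip generations; the real driver uses its own identifiers. */
enum chip_gen { CHIP_T4 = 4, CHIP_T5 = 5, CHIP_T6 = 6 };

/*
 * Fetch Burst Minimum, in bytes, to program for a free list.
 * T4/T5 do not coalesce fetches beyond FBMIN, so 128 bytes is used there.
 * Assumption: T6, which coalesces on its own, stays at 64 bytes.
 */
static unsigned int fl_fetch_burst_min_bytes(enum chip_gen chip)
{
	return (chip <= CHIP_T5) ? 128u : 64u;
}

/*
 * Number of PCI-E fetch requests a non-coalescing chip (T4/T5) would issue
 * for 'bytes' of free list pointer data with the given FBMIN.
 */
static unsigned int uncoalesced_fetch_requests(unsigned int bytes,
					       unsigned int fbmin)
{
	return (bytes + fbmin - 1) / fbmin;	/* round up */
}
```

With FBMIN at 64 bytes, 128 bytes of pointer data costs two requests on T4/T5; with FBMIN at 128 bytes it costs one, which is the case the commit message describes.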
-
Hariprasad Shenai authored
Use the freelist capacity instead of the freelist size when checking whether the freelist needs to be refilled. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexandre TORGUE says:
====================
stmmac: enhance driver performances and update the version

According to Giuseppe, I send the v3 series.

This is a subset of patches to rework the driver in order to improve its performance and make it more robust under stress conditions. All patches have been ported to the STi mainstream kernel branch and tested on ARM STiH4xx platforms and newer ones. This series also updates the driver version and prepares it for further development to support new chips.

In detail, these patches:
 o rework and improve the internal DMA bus settings. Fine tuning is mandatory on some platforms for both performance and stability.
 o rework and optimize the descriptor management. This helps a lot on the performance side and prepares for the inclusion of GMAC4.x.
 o add a set of optimizations for both the xmit and rx functions. These help a lot on the performance side and make the driver more robust under low memory conditions and under some stress tests, performed for example on IP-STB.

Below are some throughput figures obtained on some boxes before and after the patches.

              nuttcp (mbps)              iperf (Mbps)
        ----------------------------------------------
            tcp         udp         tcp         udp
          tx    rx    tx    rx    tx    rx    tx    rx
        ----------------------------------------------
 old     680   800   480   506   760   800   600   700
 new     830   880   540   630   840   880   700   800

V2:
 - rx_copybreak is now managed by using ethtool.
V3:
 - improve comments on PCIe detailing that there are no regressions
 - rework some APIs to properly define some params as bool as expected
 - rework the formula to get the element inside the ring. Compared with V2, patches 4 and 13 have been merged because the same formula is used. After this rework, no evident benefit has been noticed in terms of performance, so the table above is still valid.
Disassembling the code for SH4 and ARM shows that the new formula saves just one instruction (depending on compiler flags), which gives no significant gain, for example, on SH4 where some instructions are executed in the same pipeline stage. Ring sizes are now fixed; they could perhaps be reworked to be tunable without using the stmmaceth= cmdline option. Indeed, nobody changes these sizes, and the numbers selected by default respect the budget and avoid passing an invalid setup. These are the best driver default sizes for ring and chain. ====================
-
Giuseppe Cavallaro authored
This patch just updates the driver to the version fully tested on STi platforms. This version is Oct_2015. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
A threshold is now also used to limit skb allocation when zero-copy is in use. This avoids incoherence in the ring caused by skb allocation failures under very aggressive testing and low memory conditions. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
This patch allows the driver to copy tiny frames during reception. This gives more stability when stressing the driver on STi embedded systems. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fabrice Gasnier authored
phy_bus_name can be NULL when "fixed-link" property isn't used. Then, since "stmmac: do not poll phy handler when attach a switch", phy_bus_name ptr needs to be checked before strcmp is called. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
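The guard described in this fix follows a common pattern: test the pointer before handing it to strcmp(). A minimal sketch, assuming a hypothetical helper name and assuming the bus name being compared against is "fixed" (neither is stated in the commit message):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true only when a PHY bus name is present and equals
 * "fixed". The NULL test must come first, because calling strcmp() with a
 * NULL argument is undefined behaviour.
 */
static bool phy_bus_is_fixed(const char *phy_bus_name)
{
	return phy_bus_name != NULL && strcmp(phy_bus_name, "fixed") == 0;
}
```

Because `&&` short-circuits, strcmp() is never reached when the name is NULL.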
-
Giuseppe Cavallaro authored
This patch avoids calling stmmac_adjust_link when the driver is connected to a switch by using the FIXED_PHY support. Prior to this patch, phydev->irq was set to PHY_POLL, so the phy handler was invoked periodically, wasting time because the link cannot actually change. Note that stmmac_adjust_link will be called just once, which guarantees that the ST glue logic is set up according to the fixed mode and speed. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
This patch fills the first descriptor just before granting the DMA engine, that is, at the end of the xmit. The patch takes care of the algorithm adopted to mitigate the interrupts, and it fixes the last segment in case of no fragments. Moreover, this new implementation does not pass any "ter" field when preparing the descriptors because this is not necessary. The patch also details the memory barrier in the xmit. As a final result, this patch guarantees the same performance while fixing a case where small datagrams are sent. In fact, this kind of test is impacted if no coalescing is done. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
The dirty index can be updated outside the loop where all the tx resources are claimed. This helps performance too. Also, a useless debug printk has been removed from the main loop. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fabrice Gasnier authored
This patch inlines the get_tx_owner and get_ls routines. It results in a single read of tdes0, instead of three, to check the TX_OWN and LS bits and other status bits. It helps improve the driver TX path by removing two uncached read/writes inside the TX clean loop for enhanced descriptors, but not for normal ones, because des1 must be read in any case. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
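The idea of reading the descriptor word once and decoding several flags from the local copy can be illustrated as below. This is only a sketch: the bit positions, the struct and the function name are hypothetical and do not reproduce the driver's descriptor layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define TX_OWN_BIT	(UINT32_C(1) << 31)
#define TX_LS_BIT	(UINT32_C(1) << 29)

struct tx_desc_state {
	bool owned_by_dma;	/* descriptor still belongs to the DMA engine */
	bool last_segment;	/* descriptor closes a frame */
};

/*
 * Read the (uncached) descriptor word exactly once, then test every bit of
 * interest on the local copy instead of going back to memory for each flag.
 */
static struct tx_desc_state tx_desc_decode(const volatile uint32_t *tdes0)
{
	uint32_t word = *tdes0;		/* the only access to descriptor memory */
	struct tx_desc_state st = {
		.owned_by_dma = (word & TX_OWN_BIT) != 0,
		.last_segment = (word & TX_LS_BIT) != 0,
	};

	return st;
}
```

The `volatile` qualifier keeps the compiler from adding or dropping accesses, so the function performs one load per call however many flags are decoded.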
-
Giuseppe Cavallaro authored
This patch optimizes the way the TDES is managed inside the xmit function. When preparing the frame, some settings (e.g. OWN bit) can be merged. This has been reworked to improve tx performance. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fabrice Gasnier authored
The RDES0 register can be read several times while receiving a packet. This patch slightly improves RX path performance by reading rdes0 once for two operations: checking the rx owner and getting the rx status bits. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
Optimize tx_clean by avoiding a des3 read in stmmac_clean_desc3(). In ring mode, on TX, des3 seems to be used only when transmitting a jumbo frame. In case of normal descriptors, it may also be used for time stamping. Clean it in the above two cases, without reading it. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
The last_segment field is read twice from the dma descriptors in stmmac_clean(). Add last_segment to the dma data so that this flag is read from the priv structure in cache instead of from memory. This avoids reading twice from memory on each loop iteration in stmmac_clean(). Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
Currently, the code pulls out the length field when unmapping a buffer directly from the descriptor. This will result in an uncached read to a dma_alloc_coherent() region. There is no need to do this, so this patch simply puts the value directly into a data structure which will hit the cache. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
This patch reworks and optimizes the ring management. The indexes into the ring buffer are always incremented, and the entry is accessed via a modulo to find the "real" position in the ring. This is inefficient, as modulo is an expensive operation. The formula [(entry + 1) & (size - 1)] is now adopted on a ring that is a power of 2 in size. As a consequence, the number of elements can no longer be set from the command line; it is fixed. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
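The formula quoted in this commit, (entry + 1) & (size - 1), is equivalent to (entry + 1) % size only when size is a power of two. A small sketch of that technique, with a hypothetical ring size and function name that are not taken from the driver:

```c
#include <assert.h>

#define RING_SIZE 256u	/* hypothetical; any power of two works */

/*
 * Advance a ring index by one slot and wrap with a mask instead of a
 * modulo. The mask trick is only valid for power-of-two sizes, which the
 * assertion checks.
 */
static unsigned int ring_next_entry(unsigned int entry, unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (entry + 1) & (size - 1);
}
```

For example, with a size of 256 the index after 255 wraps to 0, the same result as (255 + 1) % 256, but computed with a single AND.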
-
Giuseppe Cavallaro authored
This patch completely changes the descriptor layout to improve overall performance, thanks to the single-read usage of the descriptors in critical paths. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
This patch restructures the DMA bus settings by introducing a new platform structure used for programming the AXI Bus Mode Register inside the DMA module. This structure can be populated from the device tree as documented in the binding txt file. After initializing the DMA, the AXI register can optionally be tuned for platform-based drivers. This patch also reworks some parameters to make the DMA configuration coherent now that the AXI register is introduced. For example, the burst_len is managed by using the AXI support mentioned above, so the snps,burst-len parameter has been removed. It makes sense to provide the AAL parameter from DT to set Address-Aligned Beats inside Register0 and to review the PBL settings when initializing the engine. For the PCI glue, reviewing the history of this setting shows that it was added to align a configuration, not to fix a known problem. No issue was raised after this patch. It is safe to use the default burst length instead of tuning it to the maximum value. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Giuseppe Cavallaro authored
This patch shares the same reset procedure between the dwmac100 and dwmac1000 chips. This will also help in enhancing the driver and supporting new chips. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Santosh Shilimkar says:
====================
RDS: Major clean-up with couple of new features for 4.6

v3: Re-generated the same series by omitting the "-D" option from the git format-patch command. Since the first patch has file removals, git apply/am can't deal with it when formatted with the '-D' option.

v2: Dropped module parameter from [PATCH 11/13] as suggested by David Miller

The series is generated against net-next but also applies cleanly against Linus's tip. The entire patchset is available at the git tree below:

  git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net-next/rds_v2

The diff-stat looks a bit scary since almost ~4K lines of code are getting removed.

Brief summary of the series:
- Drop the stale iWARP support: RDS iWarp support code has become stale and non testable for some time. As discussed and agreed earlier on the list, I am dropping its support for good. If new iWarp user(s) show up in the future, the plan is to adapt the existing IB RDMA with a special sink case.
- RDS gets SO_TIMESTAMP support
- Long due RDS maintainer entry gets updated
- Some RDS IB code refactoring towards new FastReg Memory registration (FRMR)
- Lastly the initial support for FRMR

RDS IB RDMA performance with FRMR is not yet as good as FMR and I do have some patches in progress to address that. But they are not ready for 4.6 so I left them out of this series.

Also I am keeping an eye on new CQ API adaptations like other ULPs are doing and will try to adapt RDS for the same, most likely in the 4.7+ timeframe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Avinash Repaka authored
Fastreg MR (FRMR) is another method with which one can register memory to the HCA. Some of the newer HCAs support only the fastreg MR mode, so we need to add support for it to have RDS functional on them. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Fastreg MR (FRMR) memory registration and invalidation make use of work request and completion queues for their operation. The patch allocates extra queue space for these operations. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Discover Fast Memory Registration support using the IB device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS. A given HCA might support just FRMR, just FMR, or both FMR and FRWR. When both MR types are supported, FMR is used by default. The default MR is still kept as FMR, contrary to what everyone else is following. The default will be changed to FRMR once the RDS performance with FRMR is comparable with FMR. Work on this is in progress. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Add MR reuse statistics to RDS IB transport. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Drop the RDS connection on RDMA_CM_EVENT_TIMEWAIT_EXIT so that it can reconnect and resume. While testing fastreg, this error happened in a couple of tests but went unnoticed. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Preparatory patch for FRMR support. From the connection info, we can retrieve the cm_id, which contains the QP handle needed for work request posting. We also need to drop the RDS connection on QP error states, where the connection handle becomes useful. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Keep FMR related fields in their own struct. The fastreg MR structure will be added to the union. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
No functional changes. This is in preparation for adding fastreg memory registration support. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
This helps to combine asynchronous fastreg MR completion handler with send completion handler. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
Acked-by: Chien Yen <chien.yen@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
santosh.shilimkar@oracle.com authored
SO_TIMESTAMP generates a time stamp for each incoming RDS message. A user app can enable it by using setsockopt() with SO_TIMESTAMP at the SOL_SOCKET level. CMSG data of cmsg type SO_TIMESTAMP contains the time stamp in struct timeval format. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
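From user space, the two steps mentioned in this commit (enabling the option, then reading the time stamp from the control message) might look like the sketch below. The helper names are hypothetical, the socket is assumed to have been created and bound by the caller, and the msghdr is assumed to have been filled in by a prior recvmsg() call with a control buffer attached.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Ask the kernel to attach a receive time stamp to each incoming message. */
static int enable_rx_timestamp(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
}

/*
 * Walk the control messages returned by recvmsg() and copy out the
 * SO_TIMESTAMP payload. Returns 0 and fills *tv when a time stamp is
 * present, -1 otherwise.
 */
static int find_rx_timestamp(struct msghdr *msg, struct timeval *tv)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SO_TIMESTAMP) {
			/* memcpy avoids unaligned access to the payload */
			memcpy(tv, CMSG_DATA(cmsg), sizeof(*tv));
			return 0;
		}
	}
	return -1;
}
```

A receiver would call enable_rx_timestamp() once after creating the socket, then call find_rx_timestamp() on the msghdr after each recvmsg().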
-
santosh.shilimkar@oracle.com authored
RDS iWarp support code has become stale and non testable. As indicated earlier, I am dropping the support for it. If new iWarp user(s) show up in the future, we can adapt the RDS IB transport for the special RDMA READ sink case. iWarp needs an MR for the RDMA READ sink. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Yuval Mintz says: ==================== qed: update series This patch series tries to improve general configuration by changing it to better suit B0 boards and allow more available resources to each physical function. In addition, it contains some small fixes and semantic changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Remove 2 unused fields from driver code. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
In case of problems when initializing the chip, the error flows aren't handled properly. Specifically, it's possible that the chip would be left in a configuration allowing it [internally] to access the host memory, causing fatal problems in the device that would require a power cycle to overcome. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Current statistics logic is meant for L2, not for all future protocols. Move this content to the proper designated file. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
BB_A0 is a development model that will not reach actual clients. In fact, future firmware would simply fail to initialize such a chip. This changes the configuration to B0 instead of A0, and adds a safeguard against the slim chance that someone would actually try this with an A0 adapter, in which case probe would fail gracefully. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-