Alex Williamson authored
I've been doing some performance tuning and adding some functionality to sba_iommu for the zx1/sx1000 chipsets. This adds:

* Long overdue consistent_dma_mask support
* Long overdue ability to do large mappings in the iommu
* Tightened spinlock usage for better performance/scalability
* Added branch prediction hints for some of the performance paths
* Added explicit data prefetching to some performance paths; perfmon shows roughly a 20% decrease in L3 misses in the bitmap search code (a sketch of the pattern appears at the end of this message)
* Increased delayed resource freeing depth and added a separate lock per ioc to avoid contention
* Added code to free up queued pdir entries should we be unable to find space for new ones (not that I've ever seen the pdir anywhere close to full)
* Finished cleaning out the hint support code; Grant is maintaining this separately for now
* Added an option to control bypass of sg mappings separately from single/coherent mappings

Much like the swiotlb, sba_iommu allows devices capable of 64-bit addressing to bypass the iommu and DMA directly to/from memory. Using a worst-case scenario test (64-bit bypass disabled, all DMA mapped through the iommu), I saw a 60% increase in sequential block input throughput with bonnie++ on a large RAID0 MD array. In fact, this patch gives its best bonnie++ numbers with bypass disabled, likely because coalescing the scatterlist allows better disk streaming. Network performance, by contrast, will likely be limited by mapping latency, so the last bullet item lets sg mappings get the benefit of coalescing while keeping a low-latency path for single and coherent mappings. If anyone is set up for network benchmarks, I'd be interested in a before and after with this patch.
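To make that split concrete, here is a minimal standalone sketch of the bypass decision. This is not the actual sba_iommu code: struct dev_info, can_bypass, and the bypass_* flags are hypothetical stand-ins (the real driver works from the device's DMA mask):

    #include <stdint.h>

    /* Hypothetical device handle; the real code works from the PCI device. */
    struct dev_info {
            uint64_t dma_mask;      /* addressing capability of the device  */
            int      bypass_single; /* allow direct DMA for single/coherent */
            int      bypass_sg;     /* allow direct DMA for sg mappings     */
    };

    /*
     * A fully 64-bit capable device can DMA straight to physical memory,
     * skipping the iommu.  With the separate option, sg mappings can still
     * be routed through the iommu (to get scatterlist coalescing) while
     * single/coherent mappings keep the low-latency direct path.
     */
    static int can_bypass(const struct dev_info *dev, int is_sg)
    {
            if (dev->dma_mask != ~0ULL)
                    return 0;   /* not 64-bit capable: must map via pdir */
            return is_sg ? dev->bypass_sg : dev->bypass_single;
    }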
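And for the branch-hint and prefetch bullets above, a minimal sketch of a first-fit bitmap search in the same spirit. In the kernel these hints are spelled likely()/unlikely() and prefetch(); this standalone version uses the underlying GCC builtins, and find_free_bit is a hypothetical name:

    #include <stddef.h>
    #include <stdint.h>

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /*
     * Hypothetical first-fit search for a free (zero) bit in a pdir
     * resource bitmap.  The next word is prefetched while the current
     * one is tested, so the walk is less likely to stall on a cache miss.
     */
    static long find_free_bit(const uint64_t *map, size_t nwords)
    {
            for (size_t i = 0; i < nwords; i++) {
                    if (likely(i + 1 < nwords))
                            __builtin_prefetch(&map[i + 1]);

                    /* a fully allocated word is the common case when busy */
                    if (unlikely(map[i] != ~0ULL))
                            return i * 64 + __builtin_ctzll(~map[i]);
            }
            return -1;  /* no space: free queued pdir entries and retry */
    }

A real implementation would presumably prefetch further ahead and keep a rotating search hint, but the placement of the hints relative to the hot loop is the point.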
6898da46