x86: optimise barriers
According to latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture. Hence, smp_rmb() may be a simple barrier. Also according to those documents, and according to existing practice in Linux (eg. spin_unlock doesn't enforce ordering), stores to cacheable memory are visible in program order too. Special string stores are safe -- their constituent stores may be out of order, but they must complete in order WRT surrounding stores. Nontemporal stores to WB memory can go out of order, and so they should be fenced explicitly to make them appear in-order WRT other stores. Hence, smp_wmb() may be a simple barrier. http://developer.intel.com/products/processor/manuals/318147.pdf http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf In userspace microbenchmarks on a core2 system, fence instructions range anywhere from around 15 cycles to 50, which may not be totally insignificant in performance critical paths (code size will go down too). However the primary motivation for this is to have the canonical barrier implementation for x86 architecture. smp_rmb on buggy pentium pros remains a locked op, which is apparently required. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Showing
Please register or sign in to comment