[PATCH] Fast path context switch - microoptimize FPU reload
This follows some changes made on x86-64.

When cpu_has_fxsr is defined to 1, as in many kernels, unlazy_fpu can collapse to three instructions, so inlining it is a very good idea. Otherwise it is 10 instructions or so, which can still be inlined.

We don't need the lock prefix to test our local thread flags state. Unfortunately test_thread_flag currently always uses test_bit, which has a LOCK on SMP, but that is unnecessary. LOCK is costly on P4, so it is a good idea to avoid it. Work around this for now by testing the flag directly.

A better fix would probably be to define __set_bit on all architectures as not guaranteeing atomicity, and then always use it for local thread_info accesses in linux/thread_info.h.
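As a rough illustration of testing a thread-local flag directly, here is a minimal userspace sketch in C. Every name in it (demo_thread_info, DEMO_USEDFPU_BIT, the demo_* helpers) is hypothetical and stands in for the kernel's real definitions; it only assumes that the flag word is written by the owning CPU alone, which is what makes the plain, unlocked accesses acceptable.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's real definitions. */
#define DEMO_USEDFPU_BIT 0

struct demo_thread_info {
	unsigned long flags;	/* assumed to be written only by the owning CPU */
};

/* Plain load and mask: no atomic read-modify-write, so no LOCK prefix. */
static inline bool demo_test_local_flag(const struct demo_thread_info *ti, int bit)
{
	return (ti->flags & (1UL << bit)) != 0;
}

/* Non-atomic set, in the spirit of __set_bit. */
static inline void demo_set_local_flag(struct demo_thread_info *ti, int bit)
{
	ti->flags |= 1UL << bit;
}

/* Non-atomic clear, the counterpart of the helper above. */
static inline void demo_clear_local_flag(struct demo_thread_info *ti, int bit)
{
	ti->flags &= ~(1UL << bit);
}

/*
 * Sketch of an inlined "unlazy" fast path: do the expensive work only if
 * the task actually used the FPU. Returns true if a save would have run.
 */
static inline bool demo_unlazy_fpu(struct demo_thread_info *ti)
{
	if (!demo_test_local_flag(ti, DEMO_USEDFPU_BIT))
		return false;	/* common case: nothing to save */

	/* A real kernel would save the FPU state here. */
	demo_clear_local_flag(ti, DEMO_USEDFPU_BIT);
	return true;
}
```

In the common case where the flag is clear, the inlined helper reduces to a load, a mask test and a branch, which is the kind of short sequence that makes inlining worthwhile.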