    x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros · dde3036d
    Dominik Brodowski authored
    Previously, error_entry() and paranoid_entry() saved the GP registers
    onto stack space already allocated by their callers. Combine these two
    steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro
    for that.
    
    This adds a significant amount of text size. However, Ingo Molnar points
    out that:
    
    	"these numbers also _very_ significantly over-represent the
    	extra footprint. The assumptions that resulted in
    	us compressing the IRQ entry code have changed very
    	significantly with the new x86 IRQ allocation code we
    	introduced in the last year:
    
    	- IRQ vectors are usually populated in tightly clustered
    	  groups.
    
    	  With our new vector allocator code the typical per CPU
    	  allocation percentage on x86 systems is ~3 device vectors
    	  and ~10 fixed vectors out of ~220 vectors - i.e. a very
    	  low ~6% utilization (!). [...]
    
    	  The days where we allocated a lot of vectors on every
    	  CPU and the compression of the IRQ entry code text
    	  mattered are over.
    
    	- Another issue is that only a small minority of vectors
    	  is frequent enough to actually matter to cache utilization
    	  in practice: 3-4 key IPIs and 1-2 device IRQs at most - and
    	  those vectors tend to be tightly clustered as well into about
    	  two groups, and are probably already on 2-3 cache lines in
    	  practice.
    
    	  For the common case of 'cache cold' IRQs it's the depth of
    	  the call chain and the fragmentation of the resulting I$
    	  that should be the main performance limit - not the overall
    	  size of it.
    
    	- The CPU side cost of IRQ delivery is still very expensive
    	  even in the best, most cached case, as in 'over a thousand
    	  cycles'. So much stuff is done that maybe contemporary x86
    	  IRQ entry microcode already prefetches the IDT entry and its
    	  expected call target address."[*]
    
    [*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com
    
    The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need
    modification. Previously, %rsp was manually decreased by 15*8; with
    this patch, %rsp is decreased by 15 pushq instructions.
    
    [jpoimboe@redhat.com: unwind hint improvements]
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: dan.j.williams@intel.com
    Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>