    tcp: add tcp_tx_skb_cache sysctl · 0b7d7f6b
    Eric Dumazet authored
    Feng Tang reported a performance regression after the introduction
    of per-TCP-socket tx/rx caches, for TCP over loopback (netperf).
    
    There is a high chance the regression is caused by a change in
    how well the 32 KB per-thread page (current->task_frag) can
    be recycled, and by the lack of pcp caches for order-3 pages.
    
    I could not reproduce the regression myself: the cpus were all
    spinning on the mm spinlocks for page allocs/freeing, regardless
    of whether the per-TCP-socket caches were enabled or disabled.
    
    It seems best to disable the feature by default, and let
    admins enable it.
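
    For illustration, enabling the feature amounts to writing 1 to the
    sysctl file. Below is a minimal C sketch, assuming the knob is exposed
    as /proc/sys/net/ipv4/tcp_tx_skb_cache (the usual mapping for a
    net.ipv4 sysctl). The helper names sysctl_read_int() and
    sysctl_write_int() are hypothetical and not part of this patch.

```c
#include <stdio.h>

/* Assumed location of the knob; adjust if the kernel exposes it elsewhere. */
#define TCP_TX_SKB_CACHE_PATH "/proc/sys/net/ipv4/tcp_tx_skb_cache"

/* Hypothetical helper: read an integer sysctl value from `path`.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int sysctl_read_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: write an integer sysctl value to `path`.
 * Returns 0 on success, -1 on any open/write/close error
 * (for example when not running as root). */
static int sysctl_write_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

    An admin tool would call sysctl_write_int(TCP_TX_SKB_CACHE_PATH, 1)
    as root to opt in, and sysctl_write_int(TCP_TX_SKB_CACHE_PATH, 0) to
    return to the default.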
    
    The MM layer either needs to provide scalable order-3 page
    allocations, or could attempt a trylock on zone->lock if
    the caller only attempts to get a high-order page and is
    able to fall back to order-0 ones in case of pressure.
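
    The trylock idea can be sketched in user space. This is only an
    analogy under stated assumptions, not kernel code: an atomic_flag
    stands in for zone->lock, malloc() stands in for the page allocator,
    and the names zone_trylock(), zone_unlock() and frag_refill() are
    hypothetical. Order 3 with 4 KB pages gives the 32 KB fragment
    mentioned above (4096 << 3).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4 KB base page */
#define HIGH_ORDER       3u     /* 4096 << 3 = 32 KB */

/* Stand-in for zone->lock. */
static atomic_flag zone_lock = ATOMIC_FLAG_INIT;

/* Take the lock only if it is free right now; never spin. */
static bool zone_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&zone_lock,
                                              memory_order_acquire);
}

static void zone_unlock(void)
{
    atomic_flag_clear_explicit(&zone_lock, memory_order_release);
}

struct frag {
    void *mem;
    unsigned int order;
};

/* Try an order-3 buffer without waiting on the contended lock.
 * If the lock is busy or the big allocation fails, fall back to
 * an order-0 buffer, modeled here as a path that needs no lock.
 * Returns 0 on success, -1 if even the fallback fails. */
static int frag_refill(struct frag *f)
{
    if (zone_trylock()) {
        f->mem = malloc((size_t)SKETCH_PAGE_SIZE << HIGH_ORDER);
        zone_unlock();
        if (f->mem) {
            f->order = HIGH_ORDER;
            return 0;
        }
    }
    f->mem = malloc(SKETCH_PAGE_SIZE);
    f->order = 0;
    return f->mem ? 0 : -1;
}
```

    The point of the design is that a caller able to live with order-0
    pages never adds to the queue of spinners on the lock; it only gets
    the larger fragment when doing so is cheap.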
    
    Tests were run on a 56-core host (112 hyperthreads):
    
    -	35.49%	netperf 		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
       - 35.49% queued_spin_lock_slowpath
    	  - 18.18% get_page_from_freelist
    		 - __alloc_pages_nodemask
    			- 18.18% alloc_pages_current
    				 skb_page_frag_refill
    				 sk_page_frag_refill
    				 tcp_sendmsg_locked
    				 tcp_sendmsg
    				 inet_sendmsg
    				 sock_sendmsg
    				 __sys_sendto
    				 __x64_sys_sendto
    				 do_syscall_64
    				 entry_SYSCALL_64_after_hwframe
    				 __libc_send
    	  + 17.31% __free_pages_ok
    +	31.43%	swapper 		 [kernel.vmlinux]	  [k] intel_idle
    +	 9.12%	netperf 		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
    +	 6.53%	netserver		 [kernel.vmlinux]	  [k] copy_user_enhanced_fast_string
    +	 0.69%	netserver		 [kernel.vmlinux]	  [k] queued_spin_lock_slowpath
    +	 0.68%	netperf 		 [kernel.vmlinux]	  [k] skb_release_data
    +	 0.52%	netperf 		 [kernel.vmlinux]	  [k] tcp_sendmsg_locked
    	 0.46%	netperf 		 [kernel.vmlinux]	  [k] _raw_spin_lock_irqsave
    
    Fixes: 472c2e07 ("tcp: add one skb cache for tx")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Feng Tang <feng.tang@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>