    sch_netem: faster rb tree removal · 3aa605f2
    Eric Dumazet authored
    While running TCP tests involving netem storing millions of packets,
    I had the idea to speed up tfifo_reset() and ran some experiments.
    
    I tried the rbtree_postorder_for_each_entry_safe() method that is
    used in skb_rbtree_purge() but discovered it was slower than the
    current tfifo_reset() method.
    
    I measured the time taken to release skbs at three occupancy levels
    (10^4, 10^5 and 10^6 skbs) with three methods:
    
    1) (current 'naive' method)
    
    	while ((p = rb_first(&q->t_root))) {
    		struct sk_buff *skb = netem_rb_to_skb(p);
    
    		rb_erase(p, &q->t_root);
    		rtnl_kfree_skbs(skb, skb);
    	}
    
    2) Use rb_next() instead of rb_first() in the loop :
    
    	p = rb_first(&q->t_root);
    	while (p) {
    		struct sk_buff *skb = netem_rb_to_skb(p);
    
    		p = rb_next(p);
    		rb_erase(&skb->rbnode, &q->t_root);
    		rtnl_kfree_skbs(skb, skb);
    	}
    
    3) "optimized" method using rbtree_postorder_for_each_entry_safe()
    
    	struct sk_buff *skb, *next;
    
    	rbtree_postorder_for_each_entry_safe(skb, next,
    					     &q->t_root, rbnode) {
    		rtnl_kfree_skbs(skb, skb);
    	}
    	q->t_root = RB_ROOT;
    
    Results :
    
    method_1:while (rb_first()) rb_erase() 10000 skbs in 690378 ns (69 ns per skb)
    method_2:rb_first; while (p) { p = rb_next(p); ...}  10000 skbs in 541846 ns (54 ns per skb)
    method_3:rbtree_postorder_for_each_entry_safe() 10000 skbs in 868307 ns (86 ns per skb)
    
    method_1:while (rb_first()) rb_erase() 99996 skbs in 7804021 ns (78 ns per skb)
    method_2:rb_first; while (p) { p = rb_next(p); ...}  100000 skbs in 5942456 ns (59 ns per skb)
    method_3:rbtree_postorder_for_each_entry_safe() 100000 skbs in 11584940 ns (115 ns per skb)
    
    method_1:while (rb_first()) rb_erase() 1000000 skbs in 108577838 ns (108 ns per skb)
    method_2:rb_first; while (p) { p = rb_next(p); ...}  1000000 skbs in 82619635 ns (82 ns per skb)
    method_3:rbtree_postorder_for_each_entry_safe() 1000000 skbs in 127328743 ns (127 ns per skb)
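
The per-skb figures above are the total time divided by the number of skbs released, truncated to whole nanoseconds. A minimal sketch of that arithmetic follows, together with a relative comparison between two methods. The helper names `ns_per_skb()` and `relative_saving()` are hypothetical and are not part of the patch or of any kernel API.

```c
#include <assert.h>

/* Hypothetical helper: average cost per released skb, truncated to whole ns. */
static unsigned long long ns_per_skb(unsigned long long total_ns,
				     unsigned long long skbs)
{
	return skbs ? total_ns / skbs : 0;
}

/*
 * Hypothetical helper: fraction of per-skb time saved by a candidate
 * method relative to a baseline (0.25 means 25% less time per skb).
 */
static double relative_saving(double base_ns_per_skb, double cand_ns_per_skb)
{
	assert(base_ns_per_skb > 0.0);
	return (base_ns_per_skb - cand_ns_per_skb) / base_ns_per_skb;
}
```

For the 10^6 run, `relative_saving(108.0, 82.0)` compares method 2 against method 1 using the per-skb values reported above.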
    
    Method 2) is simply faster, probably because it maintains a smaller
    working set.
    
    Note that this is the method we use in tcp_ofo_queue() already.
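
The traversal pattern of method 2) can be tried outside the kernel with a small stand-in tree. The sketch below is only an illustration under stated assumptions: it uses a plain unbalanced binary search tree with parent pointers instead of the kernel rbtree, and `struct node`, `tree_insert()`, `tree_first()`, `tree_next()`, `tree_erase_min()` and `tree_purge()` are hypothetical names. It shows the ordering that matters (fetch the successor, then unlink and free the current node) and does not reproduce the rebalancing cost of rb_erase().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rb_node: unbalanced BST node. */
struct node {
	struct node *parent, *left, *right;
	long key;
};

static void tree_insert(struct node **root, struct node *n)
{
	struct node **link = root, *parent = NULL;

	while (*link) {
		parent = *link;
		link = n->key < parent->key ? &parent->left : &parent->right;
	}
	n->parent = parent;
	n->left = n->right = NULL;
	*link = n;
}

/* Leftmost (smallest) node, or NULL for an empty tree. */
static struct node *tree_first(struct node *root)
{
	while (root && root->left)
		root = root->left;
	return root;
}

/* In-order successor of n, or NULL if n is the last node. */
static struct node *tree_next(struct node *n)
{
	struct node *parent;

	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	while ((parent = n->parent) && n == parent->right)
		n = parent;
	return parent;
}

/*
 * Unlink the current minimum. It never has a left child, so its right
 * child (possibly NULL) simply takes its place.
 */
static void tree_erase_min(struct node **root, struct node *n)
{
	struct node *child = n->right;

	assert(!n->left);
	if (child)
		child->parent = n->parent;
	if (!n->parent)
		*root = child;
	else if (n->parent->left == n)
		n->parent->left = child;
	else
		n->parent->right = child;
}

/* Method 2) pattern: advance first, then erase and free the old node. */
static size_t tree_purge(struct node **root)
{
	struct node *p = tree_first(*root);
	size_t freed = 0;

	while (p) {
		struct node *victim = p;

		p = tree_next(p);	/* successor fetched before unlinking */
		tree_erase_min(root, victim);
		free(victim);
		freed++;
	}
	return freed;
}
```

Callers allocate each `struct node` with malloc(), set `key`, add it with `tree_insert(&root, n)`, and later call `tree_purge(&root)`, which returns the number of nodes freed and leaves `root` set to NULL.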
    
    I will also change skb_rbtree_purge() in a second patch.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>