    net_sched: remove need_resched() from qdisc_run() · b60fa1c5
    Eric Dumazet authored
    The introduction of this schedule point was done in commit
    2ba2506c ("[NET]: Add preemption point in qdisc_run")
    at a time when the loop was not bounded.
    
    Then later in commit d5b8aa1d ("net_sched: fix dequeuer fairness")
    we added a limit on the number of packets.
    
    Now is the time to remove the schedule point, since the default
    limit of 64 packets matches the number of packets a typical NAPI
    poll can process in a row.
    
    This solves a latency problem for most TCP receivers under moderate load:
    
    1) host receives a packet.
       NET_RX_SOFTIRQ is raised by NIC hard IRQ handler
    
    2) __do_softirq() does its first loop, handling NET_RX_SOFTIRQ
       and calling the driver napi->poll() function
    
    3) TCP stores the skb in the socket receive queue.
    
    4) TCP calls sk->sk_data_ready() and wakes up a user thread
       waiting for EPOLLIN (as a result, need_resched() might now be true)
    
    5) TCP cooks an ACK and sends it.
    
    6) qdisc_run() processes one packet from the qdisc and sees that
       need_resched() is true; this raises NET_TX_SOFTIRQ (even if there
       are no more packets in the qdisc)
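
    As an illustration of step 6, here is a minimal userspace sketch of a
    bounded dequeue loop, with and without the need_resched() check. It is
    not the kernel code: struct model_qdisc, model_qdisc_run() and their
    parameters are hypothetical names invented for this example, and the
    "rescheduled" counter merely stands in for raising NET_TX_SOFTIRQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct model_qdisc {
	int backlog;     /* packets still waiting in the queue */
	int rescheduled; /* times the loop deferred work (models NET_TX_SOFTIRQ) */
};

/*
 * Dequeue packets until the queue is empty or the quota is used up.
 * With check_resched set, a pending reschedule request also ends the
 * loop after the first packet, which models the removed schedule point.
 * Returns the number of packets sent.
 */
static int model_qdisc_run(struct model_qdisc *q, int quota,
			   bool check_resched, bool resched_pending)
{
	int sent = 0;

	assert(quota > 0);

	while (q->backlog > 0) {
		q->backlog--;
		sent++;
		quota--;

		if (quota <= 0 || (check_resched && resched_pending)) {
			/* Deferred even if the queue is now empty. */
			q->rescheduled++;
			break;
		}
	}
	return sent;
}
```

    In this model, a queue holding a single ACK is deferred once when the
    check is enabled and a reschedule is pending, and not at all when the
    check is disabled. With the check disabled, only a backlog of at least
    the quota (64 by default, per the text above) causes a deferral.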
    
    Then we go back to the __do_softirq() in 2), and we see that new
    softirqs were raised. Since need_resched() is true, we end up waking
    ksoftirqd in this path:
    
        if (pending) {
                if (time_before(jiffies, end) && !need_resched() &&
                    --max_restart)
                        goto restart;
    
                wakeup_softirqd();
        }
    
    So we have many wakeups of ksoftirqd kernel threads,
    and more calls to qdisc_run() with associated lock overhead.
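
    The restart decision quoted above can be modelled in isolation. This is
    a rough sketch under stated assumptions, not the kernel implementation:
    the enum, model_softirq_decision() and its boolean parameters are
    hypothetical stand-ins for the pending mask, the time_before() test and
    need_resched().

```c
#include <stdbool.h>

/* Hypothetical outcome names for this sketch only. */
enum softirq_action {
	SOFTIRQ_DONE,           /* nothing pending, return to caller */
	SOFTIRQ_RESTART,        /* loop again in softirq context */
	SOFTIRQ_WAKE_KSOFTIRQD  /* hand the work over to ksoftirqd */
};

/*
 * Mirrors the shape of the quoted condition: restart only if there is
 * time left, no reschedule is needed, and restarts remain. As in the
 * quoted code, the restart counter is decremented only when the first
 * two tests pass (short-circuit evaluation).
 */
static enum softirq_action
model_softirq_decision(bool pending, bool time_left, bool resched_needed,
		       int *max_restart)
{
	if (!pending)
		return SOFTIRQ_DONE;

	if (time_left && !resched_needed && --(*max_restart))
		return SOFTIRQ_RESTART;

	return SOFTIRQ_WAKE_KSOFTIRQD;
}
```

    In this model, a pending softirq combined with a needed reschedule
    always ends in SOFTIRQ_WAKE_KSOFTIRQD, however much time or however
    many restarts remain. That is the path taken when step 6 raises
    NET_TX_SOFTIRQ after step 4 has woken a user thread.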
    
    Note that another way to solve the issue would be to change TCP
    to first send the ACK packet, then signal the EPOLLIN,
    but this changes P99 latencies, as sending the ACK packet
    can add a long delay.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
sch_generic.c 32.5 KB