    nfp: flower: increase cmesg reply timeout · 96439889
    Fred Lotter authored
    QA tests report occasional timeouts on REIFY message replies. Profiling
    of the two cmesg reply types under burst conditions, with a 12-core host
    under heavy CPU and I/O load (stress --cpu 12 --io 12), shows both PHY MTU
    change and REIFY replies can exceed the 10ms timeout. The maximum MTU
    reply wait under burst is 16ms, while the maximum REIFY wait under 40 VF
    burst is 12ms. Using a 4 VF REIFY burst results in an 8ms maximum wait.
    A larger VF burst does increase the delay, but not in a linear enough
    way to justify a scaled REIFY delay. The worst-case values for
    MTU and REIFY appear close enough to justify a common timeout. Pick a
    conservative 40ms to make a safer, future-proof common reply timeout. The
    delay only affects the failure case.
    
    Change the REIFY timeout mechanism to use wait_event_timeout() instead
    of wait_event_interruptible_timeout(), to match the MTU code. In the
    current implementation, theoretically, a signal could interrupt the
    REIFY waiting period, with a return code of ERESTARTSYS. However, this is
    caught under the general timeout error code EIO. I cannot see the benefit
    of exposing the REIFY waiting period to signals with such a short delay
    (40ms), while the MTU mechanism does not use the same logic. In the absence
    of any reply (wakeup() call), both reply types will wake up the task after
    the timeout period. The REIFY timeout applies to the entire representor
    group being instantiated (e.g. VFs), while the MTU timeout applies to a
    single PHY MTU change.
    Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
    Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
cmsg.c 8.01 KB