    nvme: improve handling of long keep alives · c7275ce6
    Uday Shankar authored
    Upon keep alive completion, nvme_keep_alive_work is scheduled with the
    same delay every time. If keep alive commands are completing slowly,
    this may cause a keep alive timeout. The following trace illustrates the
    issue, taking KATO = 8 and TBKAS off for simplicity:
    
    1. t = 0: run nvme_keep_alive_work, send keep alive
    2. t = ε: keep alive reaches controller, controller restarts its keep
              alive timer
    3. t = 4: host receives keep alive completion, schedules
              nvme_keep_alive_work with delay 4
    4. t = 8: run nvme_keep_alive_work, send keep alive
    
    Here, a keep alive with an RTT of 4 causes a delay of at least 8 - ε
    between the controller receiving successive keep alives. With ε small,
    this delay is close to KATO = 8, so the controller is likely to detect
    a keep alive timeout.
    
    Fix this by calculating the RTT of the keep alive command, and adjusting
    the scheduling delay of the next keep alive work accordingly.
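    
    The adjustment can be sketched in plain C as follows. This is an
    illustrative user-space sketch, not the code from the patch: the
    helper name next_keep_alive_delay and its parameters are hypothetical,
    and times are in the same abstract units as the trace above. It
    assumes the next delay is the nominal period minus the measured RTT,
    clamped at zero when the RTT exceeds the period.

```c
/*
 * Hypothetical helper (name and signature are assumptions for
 * illustration): given the nominal keep alive work period and the
 * measured round-trip time of the last keep alive, return how long
 * to wait before running the keep alive work again.
 */
static unsigned long next_keep_alive_delay(unsigned long period,
					   unsigned long rtt)
{
	/* Normal case: part of the period was already spent in flight. */
	if (rtt <= period)
		return period - rtt;

	/* The RTT consumed the whole period: send the next one at once. */
	return 0;
}
```

    With the values from the trace (period 4, RTT 4), the helper returns
    0, so the next keep alive would be sent at t = 4 rather than t = 8.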
    
    Reported-by: Costa Sapuntzakis <costa@purestorage.com>
    Reported-by: Randy Jennings <randyj@purestorage.com>
    Signed-off-by: Uday Shankar <ushankar@purestorage.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
core.c 144 KB