    RDMA/hns: Disable local invalidate operation · 9e272ed6
    Yangyang Li authored
    When function reset and local invalidate are mixed, HNS RoCEE may hang.
    Explaining the cause of the problem requires two hardware-internal
    concepts:
    
        1. Execution queue: The queue of instructions that the hardware
        executes. Function reset and local invalidate are both queued here
        for execution.
    
        2. Local queue: A queue that stores local operation instructions.
        Instructions in the local queue are sent to the execution queue for
        execution, and are not removed from the local queue until their
        execution is completed.
    
    The reason for the problem is as follows:
    
        1. There is a function reset instruction in the execution queue, which
        is currently being executed. A necessary condition for the function
        reset to succeed is that the hardware pipeline first drains all
        instructions that had not yet completed;
    
        2. A local invalidate instruction at the head of the local queue is
        sent to the execution queue. Now there are two instructions in the
        execution queue: the first is the function reset instruction, and the
        second is the local invalidate instruction. They will be executed in
        sequence;
    
        3. The user has issued many local invalidate operations, causing the
        local queue to be filled up.
    
        4. The user issues one more local operation command, which queues to
        enter the local queue. Because the local queue is full and cannot
        accept new instructions, this instruction is temporarily held in the
        hardware pipeline.
    
        5. The function reset is waiting for the hardware pipeline stage to
        drain the instructions issued before it. But the pipeline stage still
        holds a local invalidate instruction, so the function reset cannot
        complete, and the instructions after it cannot be executed.
    
    Together, these factors deadlock the execution logic of the hardware,
    and the consequence is that RoCEE stops responding entirely. Because
    local operation commands can potentially cause RoCEE to hang, this
    feature is no longer supported.
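
    The circular wait described above can be reproduced with a small
    software model. The sketch below is only an illustration under stated
    assumptions: the queue depths, the type rocee_model and every function
    name are hypothetical and are not taken from the hardware or from the
    hns driver. It models the two queues plus a single pipeline slot, and
    reports a hang when no further progress is possible while work is
    still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the commit message; not the real hardware. */
enum insn { INSN_FUNC_RESET, INSN_LOCAL_INV };

#define EXEC_Q_DEPTH  2   /* assumed depth, for illustration only */
#define LOCAL_Q_DEPTH 4   /* assumed depth, for illustration only */

struct rocee_model {
    enum insn exec_q[EXEC_Q_DEPTH]; /* execution queue, head at index 0 */
    size_t exec_len;
    size_t local_len;     /* local invalidates held in the local queue */
    bool head_dispatched; /* local queue head was sent to exec_q */
    bool pipeline_busy;   /* one instruction parked in the pipeline stage */
};

static void model_init(struct rocee_model *m)
{
    m->exec_q[0] = m->exec_q[1] = INSN_LOCAL_INV;
    m->exec_len = 0;
    m->local_len = 0;
    m->head_dispatched = false;
    m->pipeline_busy = false;
}

static bool exec_push(struct rocee_model *m, enum insn i)
{
    if (m->exec_len == EXEC_Q_DEPTH)
        return false;
    m->exec_q[m->exec_len++] = i;
    return true;
}

static bool post_func_reset(struct rocee_model *m)
{
    return exec_push(m, INSN_FUNC_RESET);
}

/* Returns false when neither the local queue nor the pipeline has room. */
static bool post_local_inv(struct rocee_model *m)
{
    if (m->local_len < LOCAL_Q_DEPTH) {
        m->local_len++;
        return true;
    }
    if (!m->pipeline_busy) {
        m->pipeline_busy = true; /* local queue full: park in pipeline */
        return true;
    }
    return false;
}

/* Perform one event; returns true if any progress was made. */
static bool model_step(struct rocee_model *m)
{
    /* A parked instruction enters the local queue once a slot is free. */
    if (m->pipeline_busy && m->local_len < LOCAL_Q_DEPTH) {
        m->pipeline_busy = false;
        m->local_len++;
        return true;
    }
    /* The local queue head is sent for execution but keeps its slot. */
    if (m->local_len > 0 && !m->head_dispatched &&
        exec_push(m, INSN_LOCAL_INV)) {
        m->head_dispatched = true;
        return true;
    }
    if (m->exec_len == 0)
        return false;
    /* Function reset completes only after the pipeline has drained. */
    if (m->exec_q[0] == INSN_FUNC_RESET && m->pipeline_busy)
        return false;
    /* Retire the head; a finished local invalidate frees its slot. */
    if (m->exec_q[0] == INSN_LOCAL_INV) {
        assert(m->local_len > 0 && m->head_dispatched);
        m->local_len--;
        m->head_dispatched = false;
    }
    m->exec_q[0] = m->exec_q[1];
    m->exec_len--;
    return true;
}

/* Run until no progress is possible; pending work then means a hang. */
static bool model_is_hung(struct rocee_model *m)
{
    while (model_step(m))
        ;
    return m->exec_len > 0 || m->local_len > 0 || m->pipeline_busy;
}
```

    In this model, posting a function reset, dispatching one local
    invalidate, filling the local queue and then posting one more local
    invalidate leaves model_is_hung() returning true. The same sequence
    without the function reset drains completely.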
    
    Fixes: e93df010 ("RDMA/hns: Support local invalidate for hip08 in kernel space")
    Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
    Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
    Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
    Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
hns_roce_hw_v2.h 47.2 KB