    IB/ipoib: Fix race condition in neigh creation · 16ba3def
    Erez Shitrit authored
    When using enhanced mode for IPoIB, two threads may execute xmit in
    parallel to two different TX queues while the target is the same.
    In this case, both of them will add the same neighbor to the path's
    neigh link list and we might see the following message:
    
      list_add double add: new=ffff88024767a348, prev=ffff88024767a348...
      WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70
      ipoib_start_xmit+0x477/0x680 [ib_ipoib]
      dev_hard_start_xmit+0xb9/0x3e0
      sch_direct_xmit+0xf9/0x250
      __qdisc_run+0x176/0x5d0
      __dev_queue_xmit+0x1f5/0xb10
      __dev_queue_xmit+0x55/0xb10
    
    Analysis:
    Two SKBs are scheduled to be transmitted from two cores.
    In ipoib_start_xmit, both get NULL when calling ipoib_neigh_get.
    Two calls to neigh_add_path are made. One thread takes the spin-lock
    and calls ipoib_neigh_alloc which creates the neigh structure,
    then (after the __path_find) the neigh is added to the path's neigh
    link list. When the second thread enters the critical section it also
    calls ipoib_neigh_alloc but in this case it gets the already allocated
    ipoib_neigh structure, which is already linked to the path's neigh
    link list and adds it again to the list. Besides triggering the
    list debug warning, this creates a loop in the linked list, which
    leads to an endless loop inside path_rec_completion.
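
    The corruption can be reproduced outside the kernel. The following is
    a minimal userspace sketch, not the driver code: struct list_head is
    re-declared locally, and init_list_head, list_add_tail_unchecked,
    walk_bounded and double_add_creates_self_loop are hypothetical names
    used only for illustration. It assumes the insert is a plain tail
    insert with no debug check, and shows that adding an entry that is
    already the tail leaves the entry pointing at itself.

```c
/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Tail insert with no sanity check on the entry being added. */
static void list_add_tail_unchecked(struct list_head *entry,
                                    struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = entry;
    entry->next = head;
    entry->prev = prev;
    prev->next = entry;   /* if prev == entry, entry->next becomes entry */
}

/*
 * Walk the list starting at head. Returns the number of entries, or -1
 * if head is not reached again within 'limit' steps (a loop).
 */
static int walk_bounded(const struct list_head *head, int limit)
{
    const struct list_head *pos = head->next;
    int n = 0;

    while (pos != head) {
        if (++n > limit)
            return -1;
        pos = pos->next;
    }
    return n;
}

/* Returns 1 if adding the same entry twice creates a self-loop. */
static int double_add_creates_self_loop(void)
{
    struct list_head path_neigh_list, neigh;

    init_list_head(&path_neigh_list);
    init_list_head(&neigh);

    /* First thread links the neigh; the list is still well formed. */
    list_add_tail_unchecked(&neigh, &path_neigh_list);
    if (walk_bounded(&path_neigh_list, 16) != 1)
        return 0;

    /* Second thread links the same neigh again. */
    list_add_tail_unchecked(&neigh, &path_neigh_list);

    return neigh.next == &neigh &&
           walk_bounded(&path_neigh_list, 16) == -1;
}
```

    Here new and prev are the same node on the second insert, matching
    the new == prev addresses in the warning above. An unbounded walk
    over such a list never returns to the head.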
    
    Solution:
    Check list_empty(&neigh->list) before adding to the list.
    Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send".
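
    The guard can be sketched in userspace as follows. This is an
    illustration under stated assumptions, not the patch itself:
    struct list_head and the list helpers are re-declared locally,
    add_neigh_once and guarded_double_add_len are hypothetical names,
    and the entry's list node is assumed to have been initialized to
    point at itself, so that list_empty() on it means "not linked yet".
    In the driver the check would have to run inside the same critical
    section as the insert.

```c
/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* A node that points at itself is not linked into any list. */
static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    head->prev = entry;
    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
}

/*
 * Hypothetical helper: link the neigh node only if it is not linked yet.
 * Returns 1 if the node was added, 0 if it was already on a list.
 */
static int add_neigh_once(struct list_head *neigh_node,
                          struct list_head *path_neigh_list)
{
    if (!list_empty(neigh_node))
        return 0;

    list_add_tail(neigh_node, path_neigh_list);
    return 1;
}

/* Number of entries on the list (assumes the list is well formed). */
static int list_len(const struct list_head *head)
{
    const struct list_head *pos;
    int n = 0;

    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}

/* Two callers try to add the same neigh; returns the resulting length. */
static int guarded_double_add_len(void)
{
    struct list_head path_neigh_list, neigh;

    init_list_head(&path_neigh_list);
    init_list_head(&neigh);

    add_neigh_once(&neigh, &path_neigh_list);   /* first thread: added */
    add_neigh_once(&neigh, &path_neigh_list);   /* second thread: skipped */

    return list_len(&path_neigh_list);
}
```

    With the guard in place the second caller leaves the list untouched,
    so the list holds one entry and a walk terminates.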
    
    Fixes: b63b70d8 ('IPoIB: Use a private hash table for path lookup in xmit path')
    Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
    Reviewed-by: Alex Vesker <valex@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ipoib_main.c 60.9 KB