    libceph: must hold mutex for reset_changed_osds() · 14d2f38d
    Alex Elder authored
    An osd client has a red-black tree describing its osds, and
    occasionally we would get crashes due to one of these trees
    becoming corrupt somehow.
    
    The problem turned out to be that reset_changed_osds() was being
    called without protection of the osd client request mutex.  That
    function would call __reset_osd() for any osd that had changed, and
    __reset_osd() would call __remove_osd() for any osd with no
    outstanding requests, and finally __remove_osd() would remove the
    corresponding entry from the red-black tree.  Thus, the tree was
    getting modified without having any lock protection, and was
    vulnerable to problems due to concurrent updates.
    
    This appears to be the only osd tree updating path that has this
    problem.  It can be fairly easily fixed by moving the call up
    a few lines, to just before the request mutex gets dropped
    in kick_requests().
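
    The ordering change can be illustrated with a minimal userspace
    sketch.  This is not the kernel code: the struct layouts, the
    mutex_held debugging flag, and the linked list standing in for the
    red-black tree are all assumptions made for illustration, and the
    reconnect that __reset_osd() performs for osds with outstanding
    requests is left out.  What it shows is only that the removal now
    runs before the request mutex is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
struct osd {
    int id;
    bool changed;       /* address changed in the new osd map */
    int nr_requests;    /* outstanding requests on this osd */
    struct osd *next;   /* a simple list stands in for the rb-tree */
};

struct osd_client {
    pthread_mutex_t request_mutex;
    bool mutex_held;    /* debugging aid, assumed for this sketch */
    struct osd *osds;
};

/* Unlink the osd that *link points at. Caller must hold request_mutex. */
static void remove_osd(struct osd_client *osdc, struct osd **link)
{
    assert(osdc->mutex_held);
    *link = (*link)->next;
}

/*
 * Drop every changed osd that has no outstanding requests.
 * Modifies the shared collection, so the caller must hold request_mutex.
 * Returns the number of osds removed.
 */
static int reset_changed_osds(struct osd_client *osdc)
{
    struct osd **link = &osdc->osds;
    int removed = 0;

    assert(osdc->mutex_held);
    while (*link) {
        struct osd *o = *link;

        if (o->changed && o->nr_requests == 0) {
            remove_osd(osdc, link);     /* *link now names the next osd */
            removed++;
        } else {
            link = &o->next;
        }
    }
    return removed;
}

static int kick_requests(struct osd_client *osdc)
{
    int removed;

    pthread_mutex_lock(&osdc->request_mutex);
    osdc->mutex_held = true;

    /* ... requeue affected requests here ... */

    /* Moved up: runs while the request mutex is still held. */
    removed = reset_changed_osds(osdc);

    osdc->mutex_held = false;
    pthread_mutex_unlock(&osdc->request_mutex);
    return removed;
}
```

    Calling reset_changed_osds() after the unlock instead would trip
    the assertion in this sketch, which corresponds to the unprotected
    tree update described above.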
    
    This resolves:
        http://tracker.ceph.com/issues/5043
    
    Cc: stable@vger.kernel.org # 3.4+
    Signed-off-by: Alex Elder <elder@inktank.com>
    Reviewed-by: Sage Weil <sage@inktank.com>
osd_client.c 67.8 KB