    MDEV-27088: Server crash on ARM (WMM architecture) due to missing barriers in lf-hash (10.5) · 4ebac0fc
    Martin Beck authored
    The MariaDB server crashes on ARM (a weak memory model architecture)
    when one thread executes l_find, which loads node->key, while another
    thread concurrently executes add_to_purgatory, which stores
    node->key = NULL. l_find then passes the NULL key to a comparison
    function.
    
    The cause is out-of-order execution, which a weak memory model
    architecture permits. Two reorderings are possible, and both must be
    prevented.
    
    a) As l_find has no barrier between the optimistic read of the key
    field (lf_hash.cc#L117) and the verification of link (lf_hash.cc#L124),
    the processor can reorder the load of key to happen after the
    while-loop that verifies link.
    
    In that case, a concurrent thread executing add_to_purgatory on the same
    node can be scheduled to store NULL at the key field lf_alloc-pin.c#L253
    before key is loaded in l_find.
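
The reader side of case a) can be sketched as follows. This is a minimal
illustration and not the code of this patch: SketchNode, kDeleted and
load_key_if_live are hypothetical names, the node layout is simplified, and
it assumes the writer publishes the reused key field with a release store
after marking the node as deleted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical mark bit kept in the low bit of link (assumption for this sketch).
constexpr std::uintptr_t kDeleted = 1;

// Simplified stand-in for a list node; not the real LF_SLIST layout.
struct SketchNode {
  std::atomic<std::uintptr_t> link{0};    // next pointer plus deleted mark
  std::atomic<const char*> key{nullptr};  // reused as purgatory link after removal
};

// Returns the key only if the node was not marked as deleted when link
// was checked, otherwise nullptr.
const char* load_key_if_live(const SketchNode* node) {
  assert(node != nullptr);
  // Optimistic read of key. The acquire ordering keeps the later load of
  // link from being reordered before this load.
  const char* key = node->key.load(std::memory_order_acquire);
  // Verification of link. If the key load above observed a value that the
  // writer stored with release ordering after marking the node, this load
  // is guaranteed to observe the mark as well.
  std::uintptr_t link = node->link.load(std::memory_order_acquire);
  return (link & kDeleted) ? nullptr : key;
}
```

With plain, non-atomic loads the compiler and the CPU would be free to
perform the key load after the link check, which is the reordering
described in a).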
    
    b) A node is marked as deleted by a CAS in l_delete (lf_hash.cc#L247)
    and taken off the list by a subsequent CAS (lf_hash.cc#L252). The key
    field is written by add_to_purgatory only if both CAS operations
    succeed. However, due to a missing barrier, the relaxed store of key
    (lf_alloc-pin.c#L253) can be moved ahead of the two CAS operations.
    The local purgatory list pointer stored by add_to_purgatory then
    becomes visible to all threads operating on the list. As the node is
    not yet marked as deleted at that point, the same error occurs in
    l_find.
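
The writer side of case b) can be sketched in the same spirit. Again this is
an illustration under assumptions and not the patch itself: PurgNode,
kDeletedBit, mark_deleted and push_to_purgatory are hypothetical names, and
the second CAS that unlinks the node is omitted for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical mark bit kept in the low bit of link (assumption for this sketch).
constexpr std::uintptr_t kDeletedBit = 1;

// Simplified stand-in for a list node; not the real LF_SLIST layout.
struct PurgNode {
  std::atomic<std::uintptr_t> link{0};  // next pointer plus deleted mark
  std::atomic<void*> key{nullptr};      // reused as purgatory link after removal
};

// Marks the node as deleted with a CAS. Returns true if this caller set
// the mark, false if the node was already marked by someone else.
bool mark_deleted(PurgNode& n) {
  std::uintptr_t expected = n.link.load(std::memory_order_relaxed);
  while (!(expected & kDeletedBit)) {
    if (n.link.compare_exchange_weak(expected, expected | kDeletedBit,
                                     std::memory_order_acq_rel,
                                     std::memory_order_relaxed))
      return true;
    // On failure, expected holds the current value and the loop re-checks it.
  }
  return false;
}

// Reuses the key field as the purgatory "next" pointer. The release store
// cannot be reordered before the earlier CAS in the same thread, so a reader
// that observes this value with an acquire load also observes the mark.
void push_to_purgatory(PurgNode& n, void* purgatory_head) {
  assert(n.link.load(std::memory_order_relaxed) & kDeletedBit);
  n.key.store(purgatory_head, std::memory_order_release);
}
```

A relaxed store in push_to_purgatory would not give this guarantee on a weak
memory model, which is the reordering described in b).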
    
    This change makes three accesses atomic:
    
    * optimistic read of key in l_find lf_hash.cc#L117
    * read of link for verification lf_hash.cc#L124
    * write of key in add_to_purgatory lf_alloc-pin.c#L253
    
    Reviewers: Sergei Vojtovich, Sergei Golubchik
    
    Fixes: MDEV-23510 / d30c1331a18d875e553f3fcf544997e4f33fb943