• Vlad Lesin's avatar
    MDEV-21136 InnoDB's records_in_range estimates can be way off · 222e800e
    Vlad Lesin authored
    Get rid of BTR_ESTIMATE and btr_cur_t::path_arr.
    
    Before the fix btr_estimate_n_rows_in_range_low() used two
    btr_cur_search_to_nth_level() calls to create two arrays of tree path,
    the array per border. And then it tried to estimate the number of rows
    diving level-by-level with the array elements. As the path pages are
    unlatched during the arrays iterating, the tree could be modified, the
    estimation function called itself until the number of attempts exceed.
    
    After the fix the estimation happens during search process. Roughly, the
    algorithm is the following. Dive in the left page, then if there are pages
    between left and right ones, read a few pages to the right, if the right
    page is reached, fetch it and count the exact number of rows, otherwise
    count the estimated number of rows, and fetch the right page.
    
    The latching order corresponds to WL#6326 rules, i.e.:
    
    (2.1) [same as (1.1)]: Page latches must be acquired in descending order
    of tree level.
    
    (2.2) When acquiring a node pointer page latch at level L, we must hold
    the left sibling page latch (at level L) or some ancestor latch
    (at level>L).
    
    When we dive to the level down, the parent page is unlatched only after
    the the current level page is latched. When we estimate the number of rows
    on some level, we latch the left border, then fetch the next page, and
    then fetch the next page unlatching the previous page after the current
    page is latched until the right border is reached. I.e. the left sibling
    is always latched when we acquire page latch on the same level. When we
    reach the right border, the current page is unlatched, and then the right
    border is latched. Following to (2.2) rule, we can do this because the
    right border's parent is latched.
    222e800e
btr0btr.h 25.1 KB