    MDEV-19514 Defer change buffer merge until pages are requested · b42294bc
    Marko Mäkelä authored
    We will remove the InnoDB background operation of merging buffered
    changes to secondary index leaf pages. Changes will only be merged as a
    result of an operation that accesses a secondary index leaf page,
    such as a SQL statement that performs a lookup via that index
    or modifies the index. ROLLBACK and some background operations,
    such as purging the history of committed transactions, or computing
    index cardinality statistics, can cause change buffer merge.
    Encryption key rotation will not perform change buffer merge.
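
    The on-demand behaviour described above can be illustrated with a
    minimal toy model. This is only a sketch: ToyPage, ToyChangeBuffer
    and their members are hypothetical names invented for illustration
    and do not correspond to any InnoDB data structure. It shows only
    that buffered changes stay pending until the page is accessed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these names exist in InnoDB.
struct ToyPage {
  std::vector<std::string> records;  // records already applied to the page
};

class ToyChangeBuffer {
 public:
  // Buffer a change for a page that is not being accessed right now.
  void buffer_insert(std::uint32_t page_no, std::string rec) {
    pending_[page_no].push_back(std::move(rec));
  }

  // Number of changes still waiting to be merged into the page.
  std::size_t pending(std::uint32_t page_no) const {
    auto it = pending_.find(page_no);
    return it == pending_.end() ? 0 : it->second.size();
  }

  // Access a page: pending changes are merged here, on demand.
  // There is no background task that would merge them earlier.
  const ToyPage& access(std::uint32_t page_no) {
    ToyPage& page = pages_[page_no];
    auto it = pending_.find(page_no);
    if (it != pending_.end()) {
      for (auto& rec : it->second) page.records.push_back(std::move(rec));
      pending_.erase(it);
    }
    return page;
  }

 private:
  std::map<std::uint32_t, ToyPage> pages_;
  std::map<std::uint32_t, std::vector<std::string>> pending_;
};
```

    In this model, two calls to buffer_insert() for the same page leave
    two pending changes, and the first access() of that page applies
    both and clears the pending list.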
    
    The motivation for this change is to simplify the I/O logic and to
    allow crash recovery to happen in the background (MDEV-14481).
    We also hope that this will reduce the number of "mystery" crashes
    due to corrupted data. Because change buffer merge will typically
    take place as a result of executing SQL statements, there should be
    a clearer connection between the crash and the SQL statements that
    were executed when the server crashed.
    
    In many cases, a slight performance improvement was observed.
    
    This is joint work with Thirunarayanan Balathandayuthapani
    and was tested by Axel Schwenke and Matthias Leich.
    
    The InnoDB monitor counter innodb_ibuf_merge_usec will be removed.
    
    On slow shutdown (innodb_fast_shutdown=0), we will continue to
    merge all buffered changes (and purge all undo log history).
    
    Two InnoDB configuration parameters will be changed as follows:
    
    innodb_disable_background_merge: Removed.
    This parameter existed only in debug builds.
    All change buffer merges will use synchronous reads.
    
    innodb_force_recovery will be changed as follows:
    * innodb_force_recovery=4 will be the same as innodb_force_recovery=3
    (the change buffer merge cannot be disabled; it can only happen as
    a result of an operation that accesses a secondary index leaf page).
    The option used to be capable of corrupting secondary index leaf pages.
    Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'.
    * innodb_force_recovery=5 (which essentially hard-wires
    SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED)
    becomes safe to use. Bogus data can be returned to SQL, but
    persistent InnoDB data files will not be corrupted further.
    * innodb_force_recovery=6 (ignore the redo log files)
    will be the only option that can potentially cause
    persistent corruption of InnoDB data files.
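
    The new innodb_force_recovery semantics listed above can be
    summarized in a small sketch. The two helper functions are
    hypothetical and exist only for illustration; they are not part of
    the InnoDB source code.

```cpp
#include <cassert>

// Hypothetical helpers summarizing the list above; not InnoDB code.

// Level 4 now behaves the same as level 3; other levels are unchanged.
constexpr unsigned toy_effective_force_recovery(unsigned level) {
  return level == 4 ? 3 : level;
}

// After this change, only level 6 (ignore the redo log files) can
// potentially cause persistent corruption of data files.
constexpr bool toy_may_corrupt_data_files(unsigned level) {
  return level == 6;
}
```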
    
    Code changes:
    
    buf_page_t::ibuf_exist: New flag, to indicate whether buffered
    changes exist for a buffer pool page. Pages with pending changes
    can be returned by buf_page_get_gen(). Previously, the changes
    were always merged inside buf_page_get_gen() if needed.
    
    ibuf_page_exists(const buf_page_t&): Check if buffered changes
    exist for an X-latched or read-fixed page.
    
    buf_page_get_gen(): Add the parameter allow_ibuf_merge=false.
    All callers that know that they may be accessing a secondary index
    leaf page must pass this parameter as allow_ibuf_merge=true,
    unless it does not matter for that caller whether all buffered
    changes have been applied. Assert that whenever allow_ibuf_merge
    holds, the page actually is a leaf page. Attempt change buffer
    merge only on secondary B-tree index leaf pages.
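
    The calling convention described above can be sketched as follows.
    ToyBlock and toy_page_get() are hypothetical stand-ins, not the
    real buf_block_t or buf_page_get_gen(), whose actual signatures are
    considerably larger. The sketch assumes only what is stated above:
    the flag defaults to false, a leaf page is asserted when the flag
    is set, and a merge is attempted only for secondary index leaf
    pages.

```cpp
#include <cassert>

// Hypothetical stand-in for a buffer pool block; not the real buf_block_t.
struct ToyBlock {
  bool is_leaf;             // the page is a B-tree leaf page
  bool is_secondary_index;  // the page belongs to a secondary index
  bool ibuf_exist;          // buffered changes are pending for this page
  int merges;               // how many merges were performed on this block
};

// Toy analogue of a page getter with a defaulted allow_ibuf_merge flag.
inline ToyBlock& toy_page_get(ToyBlock& block, bool allow_ibuf_merge = false) {
  if (!allow_ibuf_merge) {
    // The caller accepts a page that may still have pending changes.
    return block;
  }
  // Callers that request a merge must be accessing a leaf page.
  assert(block.is_leaf);
  // Merge only for secondary index leaf pages that have pending changes.
  if (block.is_secondary_index && block.ibuf_exist) {
    ++block.merges;
    block.ibuf_exist = false;
  }
  return block;
}
```

    With the default argument, a block is returned with ibuf_exist
    still set. Passing true performs the merge once and clears the
    flag, so later calls do not merge again.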
    
    btr_block_get(): Add parameter 'bool merge'.
    All callers of btr_block_get() should know whether the page could be
    a secondary index leaf page. If it is not, we should avoid consulting
    the change buffer bitmap to even consider a merge. This is the main
    interface to requesting index pages from the buffer pool.
    
    ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace
    buf_page_get_known_nowait() with much simpler logic, because
    it is now guaranteed that the block is x-latched or read-fixed.
    
    mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge().
    On crash recovery, we will no longer merge any buffered changes
    for the pages that we read into the buffer pool during the last batch
    of applying log records.
    
    buf_page_get_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove.
    
    btr_search_guess_on_hash(): Merge buf_page_get_known_nowait()
    to its only remaining caller.
    
    buf_page_make_young_if_needed(): Define as an inline function.
    Add the parameter buf_pool.
    
    buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the
    parameter buf_pool.
    
    fil_space_validate_for_mtr_commit(): Remove a bogus comment
    about background merge of the change buffer.
    
    btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(),
    btr_cur_open_at_index_side_func(): Use narrower data types and scopes.
    
    ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages().
    Merge the change buffer by invoking buf_page_get_gen().