    MDEV-12353: Change the redo log encoding · 7ae21b18
    Marko Mäkelä authored
    log_t::FORMAT_10_5: physical redo log format tag
    
    log_phys_t: Buffered records in the physical format.
    The log record bytes will follow the last data field,
    making use of alignment padding that would otherwise be wasted.
    If there are multiple records for the same page, those may also
    be appended to an existing log_phys_t object if the memory
    is available.
    
    In the physical format, the first byte of a record identifies the
    record and its length (up to 15 bytes). For longer records, the
    immediately following bytes will encode the remaining length
    in a variable-length encoding. Usually, a variable-length-encoded
    page identifier will follow, followed by optional payload, whose
    length is included in the initially encoded total record length.
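
    As an illustration only, the first byte could be split as sketched
    below. The same_page flag in the most significant bit and the
    15-byte limit of the short length come from this message; placing
    the record type in bits 6..4, treating a short length of 0 as
    "length continues in the following bytes", and the names FirstByte
    and decode_first_byte() are assumptions, not the actual code.

```cpp
#include <cstdint>

// Hypothetical decoded view of the first byte of a physical log record.
struct FirstByte {
  bool same_page;     // most significant bit (stated in the commit message)
  uint8_t type;       // assumed to occupy bits 6..4
  uint8_t short_len;  // bits 3..0: 1..15; 0 is assumed to mean "length follows"
};

// Split the first byte into its three parts (illustrative sketch only).
inline FirstByte decode_first_byte(uint8_t b) {
  FirstByte f;
  f.same_page = (b & 0x80) != 0;
  f.type = static_cast<uint8_t>((b >> 4) & 0x07);
  f.short_len = static_cast<uint8_t>(b & 0x0F);
  return f;
}
```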
    
    When a mini-transaction is updating multiple fields in a page,
    it can avoid repeating the tablespace identifier and page number
    by setting the same_page flag (most significant bit) in the first
    byte of the log record. The byte offset of the record will be
    relative to where the previous record for that page ended.
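
    A minimal sketch of the relative offset, assuming that a same_page
    record is only used when the new write starts at or after the end
    of the previous one. SamePageTracker and its members are
    hypothetical names that only mirror the role of
    mtr_t::m_last_offset; they are not part of the code base.

```cpp
#include <cstdint>

// Hypothetical tracker for the end of the latest record written to one page.
struct SamePageTracker {
  uint32_t last_end = 0;  // page offset where the previous record's data ended

  // Return the offset that a same_page record would carry for a write of
  // `len` bytes at absolute page offset `offset`, then remember the new end.
  // Assumption: offset >= last_end (otherwise a full record would be needed).
  uint32_t relative_offset(uint32_t offset, uint32_t len) {
    const uint32_t rel = offset - last_end;
    last_end = offset + len;
    return rel;
  }
};
```

    For example, after a 4-byte write at offset 38, a 2-byte write at
    offset 50 would be encoded with the relative offset 8.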
    
    Until MDEV-14425 introduces a separate file-level log for
    redo log checkpoints and file operations, we will write the
    file-level records in the page-level redo log file.
    The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
    will be removed in MDEV-14425, and one sequential scan of the
    page recovery log will suffice.
    
    Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
    If the information is needed, it can be parsed from WRITE records that
    modify FSP_SPACE_FLAGS.
    
    MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
    as part of this work, before being replaced with WRITE (along with
    MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
    
    mtr_buf_t::empty(): Check if the buffer is empty.
    
    mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
    
    mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
    for the same_page encoding.
    
    page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
    
    Valid values for last_offset during recovery should be 0 or above 8.
    (The first 8 bytes of a page are the checksum and the page number,
    and neither is ever updated directly by log records.)
    Internally, the special value 1 indicates that the same_page form
    will not be allowed for the subsequent record.
    
    mtr_t::page_create(): Take the block descriptor as parameter,
    so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
    record will always be followed by a subtype byte, because same_page
    records must be longer than 1 byte.
    
    trx_undo_page_init(): Combine the writes in a WRITE record.
    
    trx_undo_header_create(): Write 4 bytes using a special MEMSET
    record that includes 1 byte of length and 2 bytes of payload.
    
    flst_write_addr(): Define as a static function. Combine the writes.
    
    flst_zero_both(): Replaces two flst_zero_addr() calls.
    
    flst_init(): Do not inline the function.
    
    fsp_free_seg_inode(): Zerofill the whole inode.
    
    fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
    to FIL_NULL when using the physical format.
    
    btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
    must have been invoked.
    
    fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
    
    fil_names_dirty_and_write(): Remove the parameter mtr.
    Write the records using a separate mini-transaction object,
    because any FILE_ records must be at the start of a mini-transaction log.
    
    recv_recover_page(): Add a fil_space_t* parameter.
    After applying log to a ROW_FORMAT=COMPRESSED page,
    invoke buf_zip_decompress() to restore the uncompressed page.
    
    buf_page_io_complete(): Remove the temporary hack to discard the
    uncompressed page of a ROW_FORMAT=COMPRESSED page.
    
    page_zip_write_header(): Remove. Use mtr_t::write() or
    mtr_t::memset() instead, and update the compressed page frame
    separately.
    
    trx_undo_header_add_space_for_xid(): Remove.
    
    trx_undo_seg_create(): Perform the changes that were previously
    made by trx_undo_header_add_space_for_xid().
    
    btr_reset_instant(): New function: Reset the table to MariaDB 10.2
    or 10.3 format when rolling back an instant ALTER TABLE operation.
    
    page_rec_find_owner_rec(): Merge with the only callers.
    
    page_cur_insert_rec_low(): Combine writes by using a local buffer.
    MEMMOVE data from the preceding record whenever feasible
    (copying at least 3 bytes).
    
    page_cur_insert_rec_zip(): Combine writes to page header fields.
    
    PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
    part from the preceding record.
    
    PageBulk::finishPage(): Combine the writes to the page header
    and to the sparse page directory slots.
    
    mtr_t::write(): Only log the least significant (last) bytes
    of multi-byte fields that actually differ.
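
    The idea can be sketched as follows, assuming big-endian fields
    (most significant byte first), so that the unchanged bytes form a
    prefix and only the remaining tail needs to be logged. The helper
    unchanged_prefix() is a hypothetical name used for illustration
    and is not a function in the code base.

```cpp
#include <cstddef>
#include <cstdint>

// Count the leading (most significant) bytes that are identical in the old
// and new images of a big-endian field. A write could then log only the
// bytes from this position to the end of the field.
inline size_t unchanged_prefix(const uint8_t* old_val, const uint8_t* new_val,
                               size_t len) {
  size_t i = 0;
  while (i < len && old_val[i] == new_val[i]) {
    ++i;
  }
  return i;  // i == len means that nothing changed
}
```

    With the old value 00 00 01 00 and the new value 00 00 01 05, the
    prefix is 3 bytes, so a single byte would need to be logged.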
    
    For updating FSP_SIZE, we must always write all 4 bytes to the
    redo log, so that the fil_space_set_recv_size() logic in
    recv_sys_t::parse() will work.
    
    mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
    instead of a numeric offset to the page frame. Only log the
    last bytes of multi-byte fields that actually differ.
    
    In fil_space_crypt_t::write_page0(), we must also log any
    unchanged bytes, so that recovery will recognize the record
    and invoke fil_crypt_parse().
    
    Future work:
    MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
    MDEV-21725 Optimize btr_page_reorganize_low() redo logging
    MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED