1. 21 Feb, 2020 8 commits
  2. 20 Feb, 2020 2 commits
    • Cleanup: Remove dict_ind_redundant · 96901d95
      Marko Mäkelä authored
      There is no reason for the dummy index object dict_ind_redundant
      to exist any more. It was only being passed to btr_create().
      
      btr_create(): If !index, assume that a ROW_FORMAT=REDUNDANT
      table is being created.
      
      We could pass ibuf.index, dict_sys.sys_tables->indexes.start
      and so on, if those objects had been initialized before the
      function btr_create() is called.
    • MDEV-21774 Innodb, Windows : restore file sharing logic in Innodb · 6618fc29
      Eugene Kosov authored
      recv_sys_t opened redo log files along with log_sys_t. That is why I
      removed the file sharing logic from InnoDB in 9ef2d29f.
      But that logic was actually used to ensure that only one MariaDB
      instance touches the same InnoDB files.
      
      os0file.cc: revert some changes done previously
      
      mapped_file_t::map(): now has arguments read_only, nvme
      
      file_io::open(): now has argument read_only
      
      class file_os_io: make final
      
      log_file_t::open(): now has argument read_only
  3. 19 Feb, 2020 7 commits
    • MDEV-12353: Reduce log volume by an UNDO_APPEND record · 84e3f9ce
      Marko Mäkelä authored
      We introduce an EXTENDED log record for appending an undo log record
      to an undo log page. This is equivalent to the MLOG_UNDO_INSERT record
      that was removed in commit f802c989,
      only using more compact encoding.
      
      mtr_t::log_write(): Fix a bug that affects longer log
      record writes in the !same_page && !have_offset case.
      Similar code is already implemented for the have_offset code path.
      The bug was unobservable before we started to write longer
      EXTENDED records. All !have_offset records (FREE_PAGE, INIT_PAGE,
      EXTENDED) that were written so far are short, and we never write
      RESERVED or OPTION records.
      
      mtr_t::undo_append(): Write an UNDO_APPEND record.
      
      log_phys_t::undo_append(): Apply an UNDO_APPEND record.
      
      trx_undo_page_set_next_prev_and_add(),
      trx_undo_page_report_modify(),
      trx_undo_page_report_rename():
      Invoke mtr_t::undo_append() instead of emitting WRITE records.
    • MDEV-12353: Reduce log volume by an UNDO_INIT record · 86f262f1
      Marko Mäkelä authored
      We introduce an EXTENDED log record for initializing an undo log page.
      The size of the record will be 2 bytes plus the optional page identifier.
      The entire undo page will be initialized, except the space that is
      already reserved for TRX_UNDO_SEG_HDR in trx_undo_seg_create().
      
      mtr_t::undo_create(): Write the UNDO_INIT record.
      
      trx_undo_page_init(): Initialize the undo page corresponding to the
      UNDO_INIT record. Unlike the former MLOG_UNDO_INIT record, we will
      initialize almost the entire page, including initializing the
      TRX_UNDO_PAGE_NODE to an empty list node, so that the subsequent call
      to flst_init() will avoid writing log for the undo page.
    • revert accidental libmariadb change · 3ee100b0
      Eugene Kosov authored
    • fix libpmem InnoDB linking · 29bb3744
      Eugene Kosov authored
    • remove unused function · e62e285f
      Eugene Kosov authored
    • MDEV-14425 deprecate and ignore innodb_log_files_in_group · 9ef2d29f
      Eugene Kosov authored
      Now there can be only one log file instead of several that
      logically work as a single file.
      
      Possible names of redo log files: ib_logfile0,
      ib_logfile101 (for a newly created one)
      
      innodb_log_files_in_group: the value of this variable is not used
      by InnoDB. Possible values are still 1..100, so as not to break upgrades
      
      LOG_FILE_NAME: add constant of value "ib_logfile0"
      LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"
      
      get_log_file_path(): convenience function that returns full
      path of a redo log file
      
      SRV_N_LOG_FILES_MAX: removed
      
      srv_n_log_files: we can't remove this for compatibility reasons,
      but the server no longer uses this variable
      
      log_sys_t::file::fd: now just one, not std::vector
      
      log_sys_t::log_capacity: removed word 'group'
      
      find_and_check_log_file(): part of the logic from the huge
      srv_start() was moved here
      
      recv_sys_t::files: file descriptors of redo log files.
      There can be several of those in case we're upgrading
      from an older MariaDB version.
      
      recv_sys_t::remove_extra_log_files: whether to remove
      ib_logfile{1,2,3...} after a successful upgrade.
      
      recv_sys_t::read(): open if needed and read from one
      of several log files
      
      recv_sys_t::files_size(): open if needed and return the file count
      
      redo_file_sizes_are_correct(): check that the redo log file
      sizes are equal, just to log an error for the user.
      The corresponding check was moved from srv0start.cc
      
      namespace deprecated: put all deprecated variables here to
      prevent us developers from using them
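
The constants and the path helper described in this commit could look roughly like the sketch below. The two constant values come from the commit message; the function names `log_file_name_sketch()` and `log_file_path_sketch()`, the explicit `dir` parameter, and the use of `/` as the separator are assumptions made for illustration and are not the server's actual signatures.

```cpp
#include <cassert>
#include <string>

// Values as stated in the commit message.
constexpr char LOG_FILE_NAME_PREFIX[] = "ib_logfile";
constexpr char LOG_FILE_NAME[] = "ib_logfile0";

// Hypothetical helper: build names such as ib_logfile0 or ib_logfile101.
inline std::string log_file_name_sketch(unsigned n) {
  return LOG_FILE_NAME_PREFIX + std::to_string(n);
}

// Hypothetical helper: join a directory and a redo log file name.
// Assumption: the directory is passed in explicitly and '/' separates
// path components; the real function may obtain the directory elsewhere.
inline std::string log_file_path_sketch(const std::string &dir,
                                        const char *filename = LOG_FILE_NAME) {
  std::string path = dir;
  if (!path.empty() && path.back() != '/') path += '/';
  return path + filename;
}
```

With a single default file name, callers that only need the current redo log can omit the second argument, while upgrade code can still ask for the older numbered files.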
    • Update wsrep-lib submodule. · 8d7a8e45
      Jan Lindström authored
  4. 18 Feb, 2020 3 commits
  5. 17 Feb, 2020 6 commits
    • MDEV-21744 Assertion `!rec_offs_nth_sql_null(offsets, n)' failed · 41fe972d
      Marko Mäkelä authored
      commit 08ba3887 of MDEV-12353
      introduced an incorrect assumption, which was documented by
      the failing assertion.
      
      After instant ADD COLUMN, we can have a null (and in-place) UPDATE
      of NULL to NULL. No data needs to be written for such updates.
      
      For ROW_FORMAT=REDUNDANT, we reserve space for the NULL values,
      and to be compatible with existing behaviour, we will zerofill
      the unused data bytes when updating to NULL value.
    • MDEV-21174: Correct a debug assertion failure · 055ce75d
      Marko Mäkelä authored
      trx_purge_free_segment(): In some cases (observed when running
      the test innodb_zip.wl5522_debug_zip), there is no change to
      the TRX_UNDO_NEEDS_PURGE field. Add mtr_t::OPT to disable a debug check.
      
      The bogus debug check was introduced in
      commit 56f6dab1.
    • MDEV-12353: Reformat page_delete_rec_list_end() · 22f649a6
      Marko Mäkelä authored
      We add FIXME comments and some sketch code for the following cases:
      
      It is possible to write considerably less log for ROW_FORMAT=COMPRESSED
      pages. For now, we will delete the records one by one.
      
      It is also possible to treat 'deleting the last records' as a special
      case that would involve shrinking PAGE_HEAP_TOP. That should reduce
      the need of reorganizing pages.
    • MDEV-12353: Optimize page_cur_delete_rec() logging further · 09feb176
      Marko Mäkelä authored
      page_mem_free(): When deleting the very last record of the page,
      even if the record did not fully utilize all bytes in a
      former PAGE_FREE record, truncate the PAGE_HEAP_TOP and reduce
      PAGE_GARBAGE by the saved amount.
    • Ian Gilfillan's avatar
    • MDEV-12353: Write less log for BLOB pages · fc876980
      Marko Mäkelä authored
      fsp_page_create(): Always initialize the page. The logic to
      avoid initialization was made redundant and should have been removed
      in mysql/mysql-server@ce0a1e85e24e48b8171f767b44330da635a6ea0a
      (MySQL 5.7.5).
      
      btr_store_big_rec_extern_fields(): Remove the redundant initialization
      of FIL_PAGE_PREV and FIL_PAGE_NEXT. An INIT_PAGE record will have
      been written already. Only write the ROW_FORMAT=COMPRESSED page payload
      from FIL_PAGE_DATA onwards. We were unnecessarily writing from
      FIL_PAGE_TYPE onwards, which caused an assertion failure on recovery:
      
      	recv_sys_t::alloc(size_t): Assertion 'len <= srv_page_size' failed
      
      when running the following tests:
      
      	./mtr --no-reorder innodb_zip.blob,4k innodb_zip.bug56680,4k
  6. 16 Feb, 2020 4 commits
    • MDEV-12353: Fix a Galera assertion failure · 5874aac7
      Marko Mäkelä authored
      trx_rseg_write_wsrep_checkpoint(): Add missing mtr_t::OPT,
      and avoid an unnecessary call to mtr_t::memset().
      
      This addresses a debug assertion failure in wsrep_info.plugin.
    • Marko Mäkelä's avatar
      d657cd74
    • MDEV-12353: Remove bogus conditions · 5876de19
      Marko Mäkelä authored
      page_update_max_trx_id(), page_delete_rec_list_end(): Remove conditions
      on recv_recovery_is_on(). These conditions should have been removed in
      or before commit f8a9f906
      (removing the support for crash-upgrade).
      
      The physical redo log based recovery will not call such high-level code.
    • MDEV-12353: Optimize page_cur_delete_rec() logging · 3887daf8
      Marko Mäkelä authored
      page_mem_free(): When deleting the last record of a page,
      do not add it to the PAGE_FREE list, but instead truncate the
      PAGE_HEAP_TOP. Modify the page header fields by writing fewer
      records.
      
      page_cur_delete_rec(): Let page_mem_free() reset the PAGE_LAST_INSERT.
      
      page_header_reset_last_insert(): Issue memset(), not memcpy(), for
      the ROW_FORMAT=COMPRESSED page.
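
The idea behind the page_mem_free() change can be illustrated with a toy model. This is a minimal sketch, not InnoDB code: the `toy_page` type and its members are hypothetical, they only loosely mirror PAGE_HEAP_TOP, PAGE_FREE and PAGE_GARBAGE, and "last record" is interpreted here as the record that ends exactly at the heap top.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a page heap (hypothetical names, for illustration only).
struct toy_page {
  size_t heap_top = 0;            // end of the used heap area
  size_t garbage = 0;             // bytes occupied by freed records
  std::vector<size_t> free_list;  // offsets of freed records

  // Allocate a record at the end of the heap and return its offset.
  size_t insert(size_t size) {
    size_t offset = heap_top;
    heap_top += size;
    return offset;
  }

  // Free a record. If it ends at the heap top, shrink the heap instead
  // of adding the record to the free list.
  void free_rec(size_t offset, size_t size) {
    if (offset + size == heap_top) {
      heap_top = offset;
    } else {
      free_list.push_back(offset);
      garbage += size;
    }
  }
};
```

In this model, freeing the record at the end of the heap changes one field, whereas the free-list path changes two, which is in the spirit of modifying the page header "by writing fewer records".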
  7. 14 Feb, 2020 7 commits
    • bump the VERSION · 2c34315d
      Daniel Bartholomew authored
    • Marko Mäkelä's avatar
    • fix Win build · 735c6ea3
      Eugene Kosov authored
    • MDEV-17084 Optimize append only files for NVDIMM · 3daef523
      Eugene Kosov authored
      Optionally use libpmem for InnoDB redo log writing.
      
      When the server is built with -DWITH_PMEM=ON, InnoDB tries to detect
      whether the redo log is located on persistent memory storage and,
      if so, uses a faster file access method.
      
      When the server is built with -DWITH_PMEM=OFF, the preprocessor is
      used to ensure that no slowdown is introduced by allocations
      and virtual function calls. So, we don't slow down the server
      in the common case.
      
      mapped_file_t: can map a file, unmap a file, and return the mapped memory buffer
      
      file_io: abstraction around memory mapped files and file descriptors.
      Allows writing, reading and flushing to files.
      
      file_io::writes_are_durable(): a notable method of the class.
      When it returns true, writes are flushed immediately.
      
      file_os_io: file descriptor based file access. Depends on global state
      such as srv_read_only_mode
      
      file_pmem_io: file access via libpmem
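
The role of writes_are_durable() can be sketched as follows. This is a simplified illustration under stated assumptions: the class names `file_io_sketch` and `memory_io_sketch`, the method signatures, and the helper `write_and_sync()` are hypothetical, and an in-memory buffer stands in for both the file descriptor and the libpmem back ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified interface; the real class differs.
class file_io_sketch {
public:
  virtual ~file_io_sketch() = default;
  virtual void write(size_t offset, const void *buf, size_t len) = 0;
  virtual void flush() = 0;
  // True if a completed write() needs no separate flush().
  virtual bool writes_are_durable() const = 0;
};

// In-memory stand-in for a back end; counts flush requests.
class memory_io_sketch final : public file_io_sketch {
public:
  explicit memory_io_sketch(bool durable) : durable_(durable) {}
  void write(size_t offset, const void *buf, size_t len) override {
    if (len == 0) return;
    if (data.size() < offset + len) data.resize(offset + len);
    std::memcpy(data.data() + offset, buf, len);
  }
  void flush() override { ++flush_count; }
  bool writes_are_durable() const override { return durable_; }

  std::vector<unsigned char> data;
  unsigned flush_count = 0;

private:
  bool durable_;
};

// The caller skips the flush when the back end reports durable writes.
inline void write_and_sync(file_io_sketch &f, size_t offset,
                           const void *buf, size_t len) {
  f.write(offset, buf, len);
  if (!f.writes_are_durable()) f.flush();
}
```

Marking the concrete class `final` lets a compiler devirtualize calls made through it, which fits the commit's stated goal of avoiding virtual call overhead in the common case.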
      
      This is collaborative work with Sergey Vojtovich.
    • MDEV-19747: Fix a warning · d901919d
      Marko Mäkelä authored
      In commit fc2f2fa8
      we replaced FlushObserver* with bool, but forgot to
      replace one NULL with false.
    • MDEV-12353: Remove bogus comments and clean up code · 37dc087f
      Marko Mäkelä authored
      This is a fixup for commit 7ae21b18.
      
      It turns out that even if we made the LSN count
      mini-transactions instead of bytes in the future, we would need
      both the start LSN and the end LSN, which must exactly match
      between mtr_t::commit() and log_phys_t::apply().
      
      log_rec_t::lsn: Restore the const qualifier.
      
      log_phys_t::append(): Remove the lsn parameter. Both the start
      and end LSN must remain unchanged. We can only append log from
      the same mini-transaction to a single log record snippet.
      If we combined the log from mini-transactions A and B, it could
      happen that the FIL_PAGE_LSN of the page is somewhere between
      A.start_lsn and B.start_lsn. In that case, also the log of B
      would be wrongly skipped.
      
      recv_sys_t::add(): Assert that if the start LSN matches, also
      the end LSN will match.
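
The hazard of combining log from two mini-transactions can be shown with a small model. This is a sketch under an assumed skip rule (a snippet whose start LSN is older than the page's FIL_PAGE_LSN is treated as already applied); the `snippet` type and `count_applied()` are hypothetical names, not recovery code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = uint64_t;

// One buffered log snippet for a page (hypothetical model).
struct snippet {
  lsn_t start_lsn;  // start LSN of the mini-transaction that wrote it
  int changes;      // number of changes carried by the snippet
};

// Assumed rule for this sketch: skip every snippet whose start LSN is
// older than the LSN already stamped on the page.
inline int count_applied(const std::vector<snippet> &log, lsn_t page_lsn) {
  int applied = 0;
  for (const snippet &s : log)
    if (s.start_lsn >= page_lsn) applied += s.changes;
  return applied;
}
```

With A.start_lsn = 100, B.start_lsn = 200 and a page LSN of 150, keeping A and B as separate snippets applies B's change, while a merged snippet that carries A's start LSN is skipped entirely and B's change is lost.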
    • Sergei Golubchik's avatar
      dd87a8b3
  8. 13 Feb, 2020 3 commits
    • alpha -> beta · 6dc46b5a
      Sergei Golubchik authored
    • MDEV-12353: Remove support for crash-upgrade · f8a9f906
      Marko Mäkelä authored
      We tighten some assertions regarding dict_index_t::is_dummy
      and crash recovery, now that redo log processing will
      no longer create dummy objects.
    • MDEV-12353: Change the redo log encoding · 7ae21b18
      Marko Mäkelä authored
      log_t::FORMAT_10_5: physical redo log format tag
      
      log_phys_t: Buffered records in the physical format.
      The log record bytes will follow the last data field,
      making use of alignment padding that would otherwise be wasted.
      If there are multiple records for the same page, also those
      may be appended to an existing log_phys_t object if the memory
      is available.
      
      In the physical format, the first byte of a record identifies the
      record and its length (up to 15 bytes). For longer records, the
      immediately following bytes will encode the remaining length
      in a variable-length encoding. Usually, a variable-length-encoded
      page identifier will follow, followed by optional payload, whose
      length is included in the initially encoded total record length.
      
      When a mini-transaction is updating multiple fields in a page,
      it can avoid repeating the tablespace identifier and page number
      by setting the same_page flag (most significant bit) in the first
      byte of the log record. The byte offset of the record will be
      relative to where the previous record for that page ended.
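
The first-byte layout described above can be sketched as a pair of helpers. The message states only that the first byte carries the record type, a length of up to 15 bytes, and the same_page flag in the most significant bit; the exact bit positions of the type field, the use of length 0 to mean "a longer length follows", and all names below are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the first byte (illustration only):
//   bit 7      same_page flag
//   bits 6..4  record type
//   bits 3..0  record length, 1..15; 0 = a longer length follows
constexpr uint8_t SAME_PAGE_FLAG = 0x80;

struct first_byte {
  bool same_page;
  uint8_t type;    // 0..7
  uint8_t length;  // 0..15
};

// Pack the three fields into one byte.
inline uint8_t encode_first_byte(first_byte f) {
  assert(f.type < 8 && f.length < 16);
  return static_cast<uint8_t>((f.same_page ? SAME_PAGE_FLAG : 0) |
                              (f.type << 4) | f.length);
}

// Unpack the three fields from one byte.
inline first_byte decode_first_byte(uint8_t b) {
  return {(b & SAME_PAGE_FLAG) != 0,
          static_cast<uint8_t>((b >> 4) & 7),
          static_cast<uint8_t>(b & 15)};
}
```

Under this layout a short record on the same page as its predecessor needs no tablespace identifier or page number at all, which is where the saving described above comes from.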
      
      Until MDEV-14425 introduces a separate file-level log for
      redo log checkpoints and file operations, we will write the
      file-level records in the page-level redo log file.
      The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
      will be removed in MDEV-14425, and one sequential scan of the
      page recovery log will suffice.
      
      Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
      If the information is needed, it can be parsed from WRITE records that
      modify FSP_SPACE_FLAGS.
      
      MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
      as part of this work, before being replaced with WRITE (along with
      MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
      
      mtr_buf_t::empty(): Check if the buffer is empty.
      
      mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
      
      mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
      for the same_page encoding.
      
      page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
      
      Valid values for last_offset during recovery should be 0 or above 8.
      (The first 8 bytes of a page are the checksum and the page number,
      and neither is ever updated directly by log records.)
      Internally, the special value 1 indicates that the same_page form
      will not be allowed for the subsequent record.
      
      mtr_t::page_create(): Take the block descriptor as parameter,
      so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
      record will always be followed by a subtype byte, because same_page
      records must be longer than 1 byte.
      
      trx_undo_page_init(): Combine the writes in WRITE record.
      
      trx_undo_header_create(): Write 4 bytes using a special MEMSET
      record that includes 1 byte of length and 2 bytes of payload.
      
      flst_write_addr(): Define as a static function. Combine the writes.
      
      flst_zero_both(): Replaces two flst_zero_addr() calls.
      
      flst_init(): Do not inline the function.
      
      fsp_free_seg_inode(): Zerofill the whole inode.
      
      fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
      to FIL_NULL when using the physical format.
      
      btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
      must have been invoked.
      
      fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
      
      fil_names_dirty_and_write(): Remove the parameter mtr.
      Write the records using a separate mini-transaction object,
      because any FILE_ records must be at the start of a mini-transaction log.
      
      recv_recover_page(): Add a fil_space_t* parameter.
      After applying log to a ROW_FORMAT=COMPRESSED page,
      invoke buf_zip_decompress() to restore the uncompressed page.
      
      buf_page_io_complete(): Remove the temporary hack to discard the
      uncompressed page of a ROW_FORMAT=COMPRESSED page.
      
      page_zip_write_header(): Remove. Use mtr_t::write() or
      mtr_t::memset() instead, and update the compressed page frame
      separately.
      
      trx_undo_header_add_space_for_xid(): Remove.
      
      trx_undo_seg_create(): Perform the changes that were previously
      made by trx_undo_header_add_space_for_xid().
      
      btr_reset_instant(): New function: Reset the table to MariaDB 10.2
      or 10.3 format when rolling back an instant ALTER TABLE operation.
      
      page_rec_find_owner_rec(): Merge with the only callers.
      
      page_cur_insert_rec_low(): Combine writes by using a local buffer.
      MEMMOVE data from the preceding record whenever feasible
      (copying at least 3 bytes).
      
      page_cur_insert_rec_zip(): Combine writes to page header fields.
      
      PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
      part from the preceding record.
      
      PageBulk::finishPage(): Combine the writes to the page header
      and to the sparse page directory slots.
      
      mtr_t::write(): Only log the least significant (last) bytes
      of multi-byte fields that actually differ.
      
      For updating FSP_SIZE, we must always write all 4 bytes to the
      redo log, so that the fil_space_set_recv_size() logic in
      recv_sys_t::parse() will work.
      
      mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
      instead of a numeric offset to the page frame. Only log the
      last bytes of multi-byte fields that actually differ.
      
      In fil_space_crypt_t::write_page0(), we must log also any
      unchanged bytes, so that recovery will recognize the record
      and invoke fil_crypt_parse().
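
The "only log the last bytes that actually differ" rule for multi-byte fields can be sketched as a comparison that skips the unchanged leading bytes. This is a minimal illustration, assuming big-endian field storage; `write_range` and `changed_tail()` are hypothetical names, and the exceptions noted above (FSP_SIZE, fil_space_crypt_t::write_page0()) are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Which part of a field has to be written to the log.
struct write_range {
  size_t offset;  // index of the first byte to log
  size_t length;  // number of trailing bytes to log; 0 = nothing changed
};

// Skip the leading (most significant) bytes that are unchanged and
// report the remaining tail of the field.
inline write_range changed_tail(const uint8_t *old_val,
                                const uint8_t *new_val, size_t size) {
  size_t i = 0;
  while (i < size && old_val[i] == new_val[i]) i++;
  return {i, size - i};
}
```

For a 4-byte counter that changes from 0x00000110 to 0x00000111, only the final byte would be logged in this model; a field whose value does not change produces no payload at all.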
      
      Future work:
      MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
      MDEV-21725 Optimize btr_page_reorganize_low() redo logging
      MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED