1. 25 Apr, 2019 4 commits
    • MDEV-17036: BULK with replace doesn't take the first parameter in account · 3dffdee6
      Oleksandr Byelkin authored
      INSERT and REPLACE are served by the same function, so their flags (and processing) should be the same.
    • Implement --debug=d,ib_log_checkpoint_avoid · b2dbc781
      Marko Mäkelä authored
      Normally, the InnoDB master thread executes InnoDB log checkpoints
      so frequently that bugs in crash recovery or redo logging can be
      hard to reproduce. This is because crash recovery would start replaying
      the log only from the latest checkpoint. Because the InnoDB redo log
      format only allows saving information for at most 2 latest checkpoints,
      and because the log files are written in a circular fashion, it would
      be challenging to implement a debug option that would start the redo
      log apply from the very start of the redo log file.
    • MDEV-19231 make DB_SUCCESS equal to 0 · 6c5c1f0b
      Eugene Kosov authored
      This is a micro-optimization. On most platforms, CPUs have instructions that
      compare with 0 quickly. DB_SUCCESS is the most common outcome of functions,
      and this patch optimizes code like (err == DB_SUCCESS).
      
      BtrBulk::finish(): bogus assertion fixed
      
      fil_node_t::read_page0(): corrected usage of os_file_read()
      
      que_eval_sql(): bogus assertion removed. Apparently it checked that
      the field was assigned after having been zero-initialized at
      object creation.
      
      It turns out that the return type of os_file_read_func() was changed
      in mysql/mysql-server@98909cefbc37e54efc6452c7e95bccbf64ac9213 (MySQL 5.7)
      from ibool to dberr_t. The reviewer (if there was one) failed to
      point out that because of future merges, it could be a bad idea to
      change the return type of a function without changing the function name.
      
      This change was applied to MariaDB 10.2.2 in
      commit 2e814d47 but the
      MariaDB-specific code was not fully adjusted accordingly,
      e.g. in fil_node_open_file(). Essentially, code like
      !os_file_read(...) became dead code in MariaDB and later
      in Mariabackup 10.2, and we could be dealing with an uninitialized
      buffer after a failed page read.
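
The dead-code problem described above can be illustrated with a minimal sketch. The enum, its numeric values, and the function names below are hypothetical stand-ins, not the InnoDB definitions; the only assumption is that the success code is nonzero, as it was before this change.

```cpp
#include <cassert>

// Hypothetical error-code enum in which success is NOT zero.
// The numeric values are illustrative only.
enum err_t { ERR_SUCCESS = 10, ERR_IO = 11 };

// Hypothetical reader that reports failure on request.
err_t read_page(bool fail) { return fail ? ERR_IO : ERR_SUCCESS; }

// Check written for a boolean return value. Both codes are nonzero,
// so !read_page() is never true and the failure goes unnoticed.
bool failed_legacy_check(bool fail) { return !read_page(fail); }

// Explicit comparison: correct whatever the numeric values are.
bool failed_explicit_check(bool fail)
{
    return read_page(fail) != ERR_SUCCESS;
}

// Small self-check contrasting the two styles.
void check_contrast()
{
    assert(!failed_legacy_check(true));   // failure is missed
    assert(failed_explicit_check(true));  // failure is detected
}
```

Note that once the success code becomes 0, a leftover `!read_page(...)` would be true on success rather than on failure, which is why such call sites need an explicit comparison.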
    • Merge 10.1 into 10.2 · bc145193
      Marko Mäkelä authored
  2. 24 Apr, 2019 5 commits
    • Merge 5.5 into 10.1 · bfb0726f
      Marko Mäkelä authored
    • MDEV-15772 Potential list overrun during XA recovery · d5da8ae0
      Thirunarayanan Balathandayuthapani authored
      InnoDB could return the same list again and again if the buffer
      passed to trx_recover_for_mysql() is smaller than the number of
      transactions that InnoDB recovered in XA PREPARE state.
      
      We introduce the transaction state TRX_PREPARED_RECOVERED, which
      is like TRX_PREPARED, but will be set during trx_recover_for_mysql()
      so that each transaction will only be returned once.
      
      Because init_server_components() is invoking ha_recover() twice,
      we must reset the state of the transactions back to TRX_PREPARED
      after returning the complete list, so that repeated traversals
      will see the complete list again, instead of seeing an empty list.
      Without this tweak, the test main.tc_heuristic_recover would hang
      in MariaDB 10.1.
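
The batching behaviour described above might look roughly like the following sketch. The types and the function `recover_batch()` are hypothetical simplifications, not the InnoDB code; in particular, treating "fewer entries than the buffer size" as the signal that the complete list has been returned is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for the two transaction states named above.
enum trx_state { PREPARED, PREPARED_RECOVERED };

struct trx { int id; trx_state state; };

// Copy up to len not-yet-returned prepared transaction ids into out.
// Each returned transaction is marked so that it is reported only once.
std::size_t recover_batch(std::vector<trx>& trxs, int* out, std::size_t len)
{
    assert(out != nullptr || len == 0);
    std::size_t count = 0;
    for (trx& t : trxs) {
        if (count == len) break;
        if (t.state == PREPARED) {
            t.state = PREPARED_RECOVERED;
            out[count++] = t.id;
        }
    }
    if (count < len) {
        // Assumption: a short batch means the list is complete, so reset
        // the states to let a repeated traversal see everything again.
        for (trx& t : trxs) t.state = PREPARED;
    }
    return count;
}
```

With five prepared transactions and a buffer of two entries, successive calls return 2, 2 and 1 entries, and a later traversal starts again from the first transaction instead of seeing an empty list.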
    • MDEV-15837: Assertion `item1->type() == Item::FIELD_ITEM && item2->type() == Item::FIELD_ITEM' · 1f1a61cf
      Varun Gupta authored
                  failed in compare_order_elements function
      
      The function compare_order_lists() is called for the ORDER BY lists of the window functions
      so that those window functions that can be computed together are adjacent.
      In compare_order_lists() we iterate over all the elements in the order lists of the two functions and
      compare the items in their ORDER BY clauses.
      The function compare_order_elements() is called for each item in the
      ORDER BY clause. This function assumes that all the items in the order list are of the type
      Item::FIELD_ITEM.
      
      In the failing case, the ORDER BY clause contains constants. We should ignore the constants and only compare
      items of the type Item::FIELD_ITEM in compare_order_elements().
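
One way to picture "ignore the constants and only compare field items" is the sketch below. The structures and the comparison rules are hypothetical and much simpler than the server's Item classes; this is not the actual patch, only an illustration of skipping constant entries while two order lists are compared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified ORDER BY element.
enum item_type { FIELD_ITEM, CONST_ITEM };
struct order_item { item_type type; int field_index; bool descending; };

// Compare two order lists, skipping constant items in both.
// Returns a negative value, 0, or a positive value.
int compare_order_lists(const std::vector<order_item>& a,
                        const std::vector<order_item>& b)
{
    std::size_t i = 0, j = 0;
    for (;;) {
        // Constants do not affect the ordering, so step over them.
        while (i < a.size() && a[i].type != FIELD_ITEM) ++i;
        while (j < b.size() && b[j].type != FIELD_ITEM) ++j;
        if (i == a.size() || j == b.size()) break;
        if (a[i].field_index != b[j].field_index)
            return a[i].field_index < b[j].field_index ? -1 : 1;
        if (a[i].descending != b[j].descending)
            return a[i].descending ? 1 : -1;
        ++i;
        ++j;
    }
    assert(i <= a.size() && j <= b.size());
    if (i == a.size() && j == b.size()) return 0;
    // The list that ran out of field items first sorts first.
    return i == a.size() ? -1 : 1;
}
```

Under these assumptions, the lists (f1, const, f2) and (const, f1, f2) compare as equal, so the corresponding window functions could be placed next to each other.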
    • MDEV-17260: Memory leaks in mysqlbinlog · cb8d888c
      Sujatha Sivakumar authored
      Problem:
      ========
      The mysqlbinlog tool is leaking memory, causing failures in various tests when
      compiling and testing with AddressSanitizer or LeakSanitizer like this:
      
      cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_ASAN:BOOL=ON /path/to/source
      make -j$(nproc)
      cd mysql-test
      ASAN_OPTIONS=abort_on_error=1 ./mtr --parallel=auto rpl.rpl_row_mysqlbinlog
      
      CURRENT_TEST: rpl.rpl_row_mysqlbinlog
      
      Direct leak of 112 byte(s) in 1 object(s) allocated from:
      #0 0x4eff87 in __interceptor_malloc (/dev/shm/5.5/client/mysqlbinlog+0x4eff87)
      #1 0x60eaab in my_malloc /mariadb/5.5/mysys/my_malloc.c:41:10
      #2 0x5300dd in Log_event::read_log_event(char const*, unsigned int, char const**,
         Format_description_log_event const*, char) /mariadb/5.5/sql/log_event.cc:1568:
      #3 0x564a9c in dump_remote_log_entries(st_print_event_info*, char const*)
      /mariadb/5.5/client/mysqlbinlog.cc:1978:17
      
      Analysis:
      ========
      The 'mysqlbinlog' tool is being used to read binary log events from a remote server.
      While reading the binary log, if a fake rotate event is found, the following actions are
      taken.
      
      If the 'to-last-log' option is specified, the fake rotate event is processed.
      In the absence of 'to-last-log', the fake rotate event is skipped.
      
      In the skipped case, the fake rotate event object is not cleaned up,
      resulting in a memory leak.
      
      Fix:
      ===
      Clean up the fake rotate event.
      
      This issue is already fixed in MariaDB 10.0.23 and later versions as part of
      commit c3018b0f
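
The shape of the leak and of the fix can be sketched as follows. The `event` type, its live-object counter, and `dump_events()` are hypothetical and only model the control flow described in the analysis; they are not the mysqlbinlog code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event object that counts live instances, so that a
// leak shows up as a nonzero counter.
struct event {
    static int live;
    bool fake_rotate;
    explicit event(bool fake) : fake_rotate(fake) { ++live; }
    ~event() { --live; }
};
int event::live = 0;

// Read events one by one. Returns the number of processed events.
int dump_events(const std::vector<bool>& incoming, bool to_last_log)
{
    int processed = 0;
    for (bool is_fake : incoming) {
        event* ev = new event(is_fake);
        if (ev->fake_rotate && !to_last_log) {
            // Skipped path: without this delete the object would leak.
            delete ev;
            continue;
        }
        ++processed;
        delete ev;
    }
    assert(event::live == 0);  // every event created here was released
    return processed;
}
```

Removing the `delete` on the skipped path reproduces the pattern that a leak checker would report: one object left allocated per skipped fake rotate event.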
    • MDEV-17796 WHERE filter is ignored by DISTINCT IFNULL(GROUP_CONCAT(X), Y) · 5fc8dd8b
      Igor Babaev authored
                 with GROUP BY + ORDER BY
      
      The method JOIN::create_postjoin_aggr_table() should not
      call JOIN::add_sorting_to_table() unless the first non-constant join
      table is passed as the first parameter to the method.
  3. 23 Apr, 2019 5 commits
  4. 22 Apr, 2019 3 commits
  5. 19 Apr, 2019 2 commits
  6. 18 Apr, 2019 1 commit
    • MDEV-17297: stats.records=0 for a table of Archive engine when it has rows,... · 056b6fe1
      Sergei Petrunia authored
      MDEV-17297: stats.records=0 for a table of Archive engine when it has rows, when we run ANALYZE command
      
      Archive storage engine assumed that any query that attempts to read from
      the table will call ha_archive::info() beforehand. ha_archive would flush
      un-written data in that call (this would make it visible for the reads).
      
      Break this assumption. Flush the data when the table is opened for reading.
      
      This way, one can do multiple write statements without causing a flush, but
      as soon as we might need the data, we flush it.
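
The "flush when opened for reading" idea can be sketched with a small buffered table. The `archive_table` type and its members are hypothetical and unrelated to the real ha_archive class; the sketch only shows that writes accumulate without flushing and that opening for a read makes them visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table with a write buffer that is flushed lazily.
struct archive_table {
    std::vector<int> buffered;  // rows written but not yet flushed
    std::vector<int> visible;   // rows that readers can see
    int flushes = 0;            // number of flushes performed

    // Writing only buffers the row; no flush happens here.
    void write_row(int row) { buffered.push_back(row); }

    // Move buffered rows to the visible set, if there are any.
    void flush()
    {
        if (buffered.empty()) return;
        visible.insert(visible.end(), buffered.begin(), buffered.end());
        buffered.clear();
        ++flushes;
        assert(buffered.empty());
    }

    // Opening for reading flushes first, so that readers (or a
    // row-count estimate) never miss unflushed rows.
    std::size_t open_for_read()
    {
        flush();
        return visible.size();
    }
};
```

Three consecutive writes cause no flush in this sketch; the first `open_for_read()` performs exactly one flush and reports three rows.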
  7. 17 Apr, 2019 3 commits
    • Alexander Barkov's avatar
    • MDEV-12699 Improve crash recovery of corrupted data pages · 169c0099
      Marko Mäkelä authored
      InnoDB crash recovery used to read every data page for which
      redo log exists. This is unnecessary for those pages that are
      initialized by the redo log. If a newly created page is corrupted,
      recovery could unnecessarily fail. It would suffice to reinitialize
      the page based on the redo log records.
      
      To add insult to injury, InnoDB crash recovery could hang if it
      encountered a corrupted page. We will also fix that problem.
      InnoDB would normally refuse to start up if it encounters a
      corrupted page on recovery, but that can be overridden by
      setting innodb_force_recovery=1.
      
      Data pages are completely initialized by the records
      MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
      MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
      which notifies that a page has been freed and its contents
      can be discarded (filled with zeroes).
      
      The record MLOG_INDEX_LOAD notifies that redo logging has
      been re-enabled after being disabled. We can avoid loading
      the page if all buffered redo log records predate the
      MLOG_INDEX_LOAD record.
      
      For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
      records were written before commit aa3f7a10.
      Hence, we will skip these optimizations for tables whose
      name starts with FTS_.
      
      This is joint work with Thirunarayanan Balathandayuthapani.
      
      fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
      latest recovered MLOG_INDEX_LOAD record for a tablespace.
      
      mlog_init: Page initialization operations discovered during
      redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
      and should be removed in MDEV-19176.
      
      recv_addr_state: Add the new state RECV_WILL_NOT_READ to
      indicate that according to mlog_init, the page will be
      initialized based on redo log record contents.
      
      recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
      if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
      as page initialization. This works around bugs in the crash
      recovery of ROW_FORMAT=COMPRESSED tables.
      
      recv_mark_log_index_load(): Process an MLOG_INDEX_LOAD record
      by resetting the state to RECV_NOT_PROCESSED and by updating
      the file_name_t::enable_lsn.
      
      recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn
      to fil_space_t::enable_lsn.
      
      recv_recover_page(): Add the parameter init_lsn, to ignore
      any log records that precede the page initialization.
      Add DBUG output about skipped operations.
      
      buf_page_create(): Initialize FIL_PAGE_LSN, so that
      recv_recover_page() will not wrongly skip applying
      the page-initialization record due to the field containing
      some newer LSN as a leftover from a different page.
      Do not invoke ibuf_merge_or_delete_for_page() during
      crash recovery.
      
      recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
      Note if a corrupted page was found during recovery.
      After invoking buf_page_create(), do invoke
      ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
      in the last recovery batch.
      
      ibuf_merge_or_delete_for_page(): Relax a debug assertion.
      
      innobase_start_or_create_for_mysql(): Abort startup if
      a corrupted page was found during recovery. Corrupted pages
      will not be flagged if innodb_force_recovery is set.
      However, the recv_sys->found_corrupt_fs flag can be set
      regardless of innodb_force_recovery if file names are found
      to be incorrect (for example, multiple files with the same
      tablespace ID).
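
The core decision described above (skip reading a page when the buffered redo log initializes it, and ignore records that precede the initialization) can be sketched as follows. All types and function names here are hypothetical; the sketch assumes that LSNs are nonzero and that each record carries a flag telling whether it completely initializes the page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint64_t lsn_t;

// Hypothetical buffered redo log record for one page.
struct log_rec { lsn_t lsn; bool inits_page; };

// LSN of the latest page-initializing record, or 0 if there is none.
lsn_t latest_init_lsn(const std::vector<log_rec>& recs)
{
    lsn_t init = 0;
    for (const log_rec& r : recs) {
        assert(r.lsn != 0);  // assumption: 0 is never a valid LSN
        if (r.inits_page && r.lsn > init) init = r.lsn;
    }
    return init;
}

// The page must be read from the data file only if no buffered
// record completely initializes it.
bool must_read_page(const std::vector<log_rec>& recs)
{
    return latest_init_lsn(recs) == 0;
}

// LSNs of the records to apply: everything that precedes the page
// initialization is ignored.
std::vector<lsn_t> records_to_apply(const std::vector<log_rec>& recs)
{
    const lsn_t init_lsn = latest_init_lsn(recs);
    std::vector<lsn_t> out;
    for (const log_rec& r : recs)
        if (r.lsn >= init_lsn) out.push_back(r.lsn);
    return out;
}
```

In this model a corrupted on-disk copy of a freshly initialized page is never read at all, which is the property that lets recovery proceed in that case.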
    • MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD upon completing ALTER TABLE · 376bf4ed
      Marko Mäkelä authored
      Similar to what was done in commit aa3f7a10
      for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
      be written if redo logging was disabled.
      
      row_merge_build_indexes(): Invoke row_merge_write_redo() also when
      online operation is not being executed or an error occurs.
      In case of an error, invoke flush_observer->interrupted() so that
      the pages will not be flushed but merely evicted from the buffer pool.
      Before resuming redo logging, it is crucial for the correctness of
      mariabackup and InnoDB crash recovery to flush or evict all affected pages
      and to write MLOG_INDEX_LOAD records.
  8. 15 Apr, 2019 1 commit
  9. 11 Apr, 2019 1 commit
  10. 10 Apr, 2019 2 commits
  11. 09 Apr, 2019 1 commit
  12. 08 Apr, 2019 6 commits
  13. 07 Apr, 2019 5 commits
  14. 06 Apr, 2019 1 commit
    • MDEV-12699 preparation: Clean up recv_sys · 1d30b7b1
      Marko Mäkelä authored
      The recv_sys data structures are accessed not only from the thread
      that executes InnoDB plugin initialization, but also from the
      InnoDB I/O threads, which can invoke recv_recover_page().
      
      Assert that sufficient concurrency control is in place.
      Some code was accessing recv_sys data structures without
      holding recv_sys->mutex.
      
      recv_recover_page(bpage): Refactor the call from buf_page_io_complete()
      into a separate function that performs necessary steps. The
      main thread was unnecessarily releasing and reacquiring recv_sys->mutex.
      
      recv_recover_page(block,mtr,recv_addr): Pass more parameters from
      the caller. Avoid redundant lookups and computations. Eliminate some
      redundant variables.
      
      recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held.
      That was not always the case!
      
      recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration
      of the function. (While we are scanning and buffering redo log records,
      no pages can be read in.)
      
      recv_read_in_area(): Properly protect access with recv_sys->mutex.
      
      recv_apply_hashed_log_recs(): Check recv_addr->state only once,
      and continuously hold recv_sys->mutex. The mutex will be released
      and reacquired inside recv_recover_page() and recv_read_in_area(),
      allowing concurrent processing by buf_page_io_complete() in I/O threads.
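
Asserting that a mutex is held, as described above, needs a mutex that knows its owner. The wrapper below is a hypothetical sketch built on the C++ standard library, not the InnoDB mutex implementation; `recv_state` and `get_addr_count()` are likewise invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that remembers the owning thread, so
// that functions can assert that the caller holds it.
class owned_mutex {
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
public:
    void lock()
    {
        m_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void unlock()
    {
        owner_.store(std::thread::id());
        m_.unlock();
    }
    // True if the calling thread currently holds the mutex.
    bool is_owned() const
    {
        return owner_.load() == std::this_thread::get_id();
    }
};

// Hypothetical shared recovery state protected by the mutex.
struct recv_state {
    owned_mutex mutex;
    int n_addrs = 0;
};

// Accessor that documents and checks its locking precondition.
int get_addr_count(recv_state& s)
{
    assert(s.mutex.is_owned());
    return s.n_addrs;
}
```

Because `owned_mutex` provides `lock()` and `unlock()`, it can be used with `std::lock_guard`, and a caller that forgets to take the lock trips the assertion in debug builds instead of racing silently.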