1. 16 Dec, 2019 2 commits
    • MDEV-16678: Actually ignore #sql-ib tables · 89633995
      Marko Mäkelä authored
      Apparently, regular expression operations that remove entire lines
      of output do not work with list_files, and hence the adjustments in
      commit 1c282d4b were ineffective.
      
      For cat_file (preceded by list_files_write_file) the replace_regex
      does work.
      
      For some reason, for suite/parts/inc/partition_crash_exchange.inc
      some file names will be lost when using list_files_write_file
      instead of list_files.
      
      We use a precise pattern match. dict_mem_create_temporary_tablename()
      generates #sql-ib names followed by decimal digits only.
    • Merge 10.4 into 10.5 · 28c89b71
      Marko Mäkelä authored
  2. 13 Dec, 2019 8 commits
  3. 12 Dec, 2019 8 commits
    • MDEV-20950 Reduce size of record offsets · f0aa073f
      Eugene Kosov authored
      offset_t: this is a type which represents one record offset.
      It's unsigned short int.
      
      a lot of functions: replace ulint with offset_t
      
      btr_pcur_restore_position_func(),
      page_validate(),
      row_ins_scan_sec_index_for_duplicate(),
      row_upd_clust_rec_by_insert_inherit_func(),
      row_vers_impl_x_locked_low(),
      trx_undo_prev_version_build():
        allocate record offsets on the stack instead of waiting for rec_get_offsets()
        to allocate them from mem_heap_t. This reduces memory allocations.
      
      RECORD_OFFSET, INDEX_OFFSET:
        it is now less convenient to store pointers in an offset_t*
        array, because one pointer now occupies several offset_t elements. These constants
        are the start indexes in the array where the pointer values are stored.
      
      REC_OFFS_HEADER_SIZE: adjusted for the new reality
      
      REC_OFFS_NORMAL_SIZE:
        increase size from 100 to 300, which means fewer heap allocations.
        And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) is now 600 bytes, which
        is smaller than the previous 800 bytes.
      
      REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality
      
      rem0rec.h, rem0rec.ic, rem0rec.cc:
        the types of various arguments, return values and local variables were changed to
        fix numerous integer conversion issues.
      
      enum field_type_t:
        the concept of offset types was introduced, replacing the old offset flags.
        As in the earlier version, the 2 upper bits are used to store the offset type,
        and this enum represents those types.
      
      REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed
      
      get_type(), set_type(), get_value(), combine():
        these are convenience functions for working with offsets and their types
      
      rec_offs_base()[0]:
        still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL
      
      rec_offs_base()[i]:
        these have type offset_t now. The two upper bits contain the type.
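
The encoding described above can be sketched as follows. This is a minimal illustration, not the code from rem0rec.h: only the names offset_t, field_type_t, get_type(), set_type(), get_value() and combine() come from the commit message, while the enumerator names, their bit values and the mask constants are assumptions. The size comparison assumes that the old ulint elements were 8 bytes wide.

```cpp
#include <cassert>
#include <cstdint>

// One record offset, as in the commit message: an unsigned 16-bit integer.
typedef uint16_t offset_t;

// Hypothetical type tags kept in the two upper bits (names and values assumed).
enum field_type_t : offset_t {
  STORED_IN_RECORD = 0 << 14,
  STORED_OFFPAGE = 1 << 14,
  SQL_NULL = 2 << 14,
  DEFAULT = 3 << 14
};

constexpr offset_t TYPE_MASK = 3 << 14;              // two upper bits
constexpr offset_t VALUE_MASK = offset_t(~TYPE_MASK); // lower 14 bits

// Size arithmetic from the commit message (old element assumed to be 8 bytes).
static_assert(sizeof(offset_t[300]) == 600, "300 two-byte offsets");
static_assert(100 * sizeof(uint64_t) == 800, "100 eight-byte offsets");

inline field_type_t get_type(offset_t n) {
  return static_cast<field_type_t>(n & TYPE_MASK);
}

inline offset_t get_value(offset_t n) {
  return static_cast<offset_t>(n & VALUE_MASK);
}

inline offset_t combine(offset_t value, field_type_t type) {
  assert(get_value(value) == value);  // the value must fit in 14 bits
  return static_cast<offset_t>(value | type);
}

inline void set_type(offset_t &n, field_type_t type) {
  n = combine(get_value(n), type);
}
```

With these helpers, combine(100, SQL_NULL) yields an offset whose get_value() is 100 and whose get_type() is SQL_NULL, and set_type() replaces the tag without touching the value.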
    • optimize crash recovery · 014e1258
      Eugene Kosov authored
      recv_dblwr_t::list is used for prepending elements and iterating
      through them. std::deque fits that purpose better because
      it performs fewer allocations than std::forward_list and provides better memory
      locality.
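
The access pattern described above (prepend, then scan) might look like this with std::deque. The struct and its member functions are hypothetical stand-ins and do not reproduce recv_dblwr_t; pages are represented as plain byte pointers and matched on their first byte purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the container usage: pages are prepended, then scanned.
struct dblwr_pages {
  std::deque<const unsigned char *> list;

  // Prepend a page; std::deque allocates in chunks rather than per element.
  void add(const unsigned char *page) { list.push_front(page); }

  // Linear scan: return the first page whose first byte equals id, or nullptr.
  const unsigned char *find(unsigned char id) const {
    for (const unsigned char *page : list)
      if (page[0] == id)
        return page;
    return nullptr;
  }
};
```

Because both containers support front insertion and forward iteration, a change of this kind can usually be limited to the container type and the call that prepends.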
    • fix a memory leak introduced by f4b42846 · ce41a907
      Eugene Kosov authored
    • Merge 10.2 into 10.3 · 0a20e5ab
      Marko Mäkelä authored
    • MDEV-20667 Server crash on pop_cursor · e0f9540b
      Alexander Barkov authored
      When backpatching a forward GOTO label, the old code erroneously
      used the CURSOR/HANDLER difference between context frames "c" and "a" to tune
      a cpop/hpop command. So the cpop/hpop command later tried to pop
      all cursors/handlers declared between "a" and "c", but those between
      "b" and "c" had not been cpushed/hpushed yet during the execution of "GOTO x".
      
      The fix is to use the difference between frames "b" and "a" only.
      
      BEGIN     -- a
       ...
      GOTO x;   -- b
       ...
      <<x>>     -- c
       ...
      END       -- d
    • MDEV-21260 InnoDB/Windows do not report error when trying to open volumes on UNC paths · 3304004a
      Vladislav Vaintroub authored
      fil_node_t::find_metadata() tries to find out whether the file
      is on an SSD, and the disk sector size.
      On Windows, it opens the corresponding volume to find this data.
      
      This does not go well if datadir is on a network (UNC) path. The volume name
      is invalid, the CreateFile() function fails, and a cryptic (from the end user's
      perspective) error is reported, like this:
      
      [ERROR] InnoDB: File \\.\\\workpc\work: 'CreateFile()' returned OS error 203.
      
      The fix is not to report an error if opening the volume failed and the path is not
      on a fixed disk, i.e. not on an HDD or SSD. This is not a fatal error; there is
      a fallback anyway.
    • git ignore generated stuff · 71e47f34
      Vladislav Vaintroub authored
    • MDEV-21255: Deadlock of parallel slave and mariabackup (with failed log copy thread) · beec9c0e
      Vlad Lesin authored
      
      mariabackup hangs waiting until the InnoDB redo log thread has read the log up to a certain
      LSN, and it waits under FTWRL. If there is a redo log read error in the thread,
      the thread finishes and the main thread knows nothing about it, which leads to the hang.
      As it hangs under FTWRL, slave threads on the server side can be blocked due
      to an MDL lock conflict.
      
      The fix is to terminate mariabackup with an error message on an InnoDB redo log read
      failure.
  4. 11 Dec, 2019 8 commits
  5. 10 Dec, 2019 11 commits
    • Cleanup test sys_vars.innodb_buffer_pool_size_basic · f2d3b2ee
      Marko Mäkelä authored
      When using huge pages, the innodb_buffer_pool_size cannot necessarily
      be restored. Simplify things by restarting the server.
    • MDEV-14482 - Cache line contention on ut_rnd_interval() · 41e6a154
      Marko Mäkelä authored
      InnoDB RNG maintains global state, causing otherwise unnecessary bus
      traffic. Even worse, this is cross-mutex traffic. That is, different
      mutexes suffer from contention.
      
      A fixed delay of 4 was verified to give the best throughput by OLTP update
      index and read-write benchmarks on Intel Broadwell (2/20/40) and
      ARM (1/46/46).
      
      This is a backport of ce047900 from
      MariaDB Server 10.3.
    • MDEV-21256: Replace the 64-bit LCG with a 32-bit Galois LFSR · b1f2d3a8
      Marko Mäkelä authored
      We should not need anywhere near 32 bits of entropy, so we might
      just limit ourselves to a 32-bit random number generator.
      
      Also, it might be cheaper to use exclusive-or, bit shifting and
      conditional jumps, instead of multiplication and addition.
      
      We use relaxed atomic operations on the global random number generator
      state in an attempt to silence any warnings about race conditions.
      There is an obvious race condition between the load and store in
      ut_rnd_gen(), but we do not think that it matters much that the
      state of the random number generator could 'stutter'.
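
A 32-bit Galois LFSR with relaxed atomic state, as described above, could be sketched like this. It is an illustration under stated assumptions, not the code of ut_rnd_gen(): the names rnd_state and rnd_gen are made up, the initial state of 1 is arbitrary, and the tap mask 0x80200003 (the polynomial x^32 + x^22 + x^2 + x + 1) is an assumption, since the commit message does not say which polynomial is used.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Global generator state. It must never be 0, because 0 maps to 0 forever.
static std::atomic<uint32_t> rnd_state(1);

// One step of a right-shifting 32-bit Galois LFSR.
inline uint32_t rnd_gen() {
  const uint32_t taps = 0x80200003U;  // assumed tap mask, see the text above
  uint32_t rnd = rnd_state.load(std::memory_order_relaxed);
  const bool lsb = rnd & 1;
  rnd >>= 1;
  if (lsb)
    rnd ^= taps;  // exclusive-or and shift only; no multiplication or addition
  // The load and the store are separate operations, so two concurrent callers
  // may observe the same state and return the same value ("stutter").
  rnd_state.store(rnd, std::memory_order_relaxed);
  return rnd;
}
```

Using a separate relaxed load and store keeps each step cheap and free of data races in the C++ memory model sense, at the cost of occasionally repeating a value under contention.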
      
      This change seems to make the 'uncompress_ops' nondeterministic
      in innodb_zip.cmp_per_index after the restart. It looks like
      there is an inherent race condition in the test, because the
      table could be opened for InnoDB statistics recalculation
      already before innodb_cmp_per_index_enabled was set. We might
      end up having uncompress_ops anywhere between 0 and 9, or perhaps
      even more. Let us remove that part of the test.
    • MDEV-21256: Simplify ut_rnd_interval() · d146e3dc
      Marko Mäkelä authored
      ut_rnd_interval(): Remove the first parameter, which was mostly
      passed as 0. Implement as a simple wrapper around ut_rnd_gen().
      Trivially return 0 if the size of the interval is smaller than 2.
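
The wrapper described above might be sketched as follows. The generator here is a hypothetical xorshift stand-in so that the example is self-contained; it is not ut_rnd_gen(), and the names sketch_state, sketch_rnd_gen and rnd_interval are assumptions. Reducing by modulo is also an assumption about how the interval is mapped.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in generator for this sketch (a simple xorshift, not InnoDB's generator).
static uint32_t sketch_state = 2463534242U;

inline uint32_t sketch_rnd_gen() {
  sketch_state ^= sketch_state << 13;
  sketch_state ^= sketch_state >> 17;
  sketch_state ^= sketch_state << 5;
  return sketch_state;
}

// Return a number in [0, n). An interval with fewer than 2 values trivially
// yields 0, which also avoids a modulo by zero.
inline uint32_t rnd_interval(uint32_t n) {
  return n > 1 ? sketch_rnd_gen() % n : 0;
}
```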
      
      ut_rnd_ulint_counter, ut_rnd_gen_next_ulint(), ut_rnd_gen_ulint(): Remove.
    • MDEV-21256: Reduce the use of ut_rnd_gen_next_ulint() · 51fc8ab7
      Marko Mäkelä authored
      ut_rnd_set_seed(): Unused function; remove.
      
      ut_rnd_gen(): Renamed from page_cur_lcg_prng().
      
      ut_rnd_current: The internal state of ut_rnd_gen().
      
      page_cur_open_on_rnd_user_rec(): Replace linear search with
      page_rec_get_nth().
    • MDEV-16678: Fix a problem with duplicate #sql2 table names · adb117cf
      Marko Mäkelä authored
      row_drop_table_for_mysql(): If a #sql2 table is open in another
      thread (purge) while we are attempting to drop it, rename it to a #sql-ib
      name to hide it from the SQL layer.
      
      Adjust tests accordingly to hide #sql-ib tables, which can
      continue to exist until the background DROP TABLE completes.
    • MDEV-21223 innodb_fts.sync_ddl fails in buildbot, server crashed in que_thr_step · 4c0854f2
      Eugene Kosov authored
      FreeState(): replace the pointer to freed memory with NULL. This fixes a crash
      caused by a use-after-free, as reported by ASAN
      
      DbugParse(): unconditionally lock mutex because we're touching shared init_settings.keywords
    • Elena Stepanova's avatar
      7c2c420b
    • MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks · ea37b144
      Marko Mäkelä authored
      This is joint work with Thirunarayanan Balathandayuthapani.
      The MDL interface between InnoDB and the rest of the server
      (in storage/innobase/dict/dict0dict.cc and in include/)
      is my work, while most everything else is Thiru's.
      
      The collection of InnoDB persistent statistics and the
      defragmentation were not refactored to use MDL. They will
      keep relying on lower-level interlocking with
      fil_check_pending_operations().
      
      The purge of transaction history and the background operations on
      fulltext indexes will use MDL. We will revert
      commit 2c4844c9
      (MDEV-17813) because thanks to MDL, purge cannot conflict
      with DDL operations anymore. For a similar reason, we will remove
      the MDEV-16222 test case from gcol.innodb_virtual_debug_purge.
      
      Purge is essentially replacing all use of the global dict_sys.latch
      with MDL. Purge will skip the undo log records for tables whose names
      start with #sql-ib or #sql2. Theoretically, such tables might
      be renamed back to visible table names if TRUNCATE fails to
      create a new table, or the final rename in ALTER TABLE...ALGORITHM=COPY
      fails. In that case, purge could permanently leave some garbage
      in the table. Such garbage will be tolerated; the table would not
      be considered corrupted.
      
      To avoid repeated MDL releases and acquisitions,
      trx_purge_attach_undo_recs() will sort undo log records by table_id,
      and purge_node_t will keep the MDL and table handle open for multiple
      successive undo log records.
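
The effect of sorting by table_id can be sketched as follows. The record type and both functions are hypothetical and only model the idea: a lock is kept while consecutive records refer to the same table, so grouping records by table reduces the number of acquisitions. The sketch assumes that 0 is never a valid table_id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical undo log record reference; only table_id matters here.
struct undo_rec {
  uint64_t table_id;
  uint64_t roll_ptr;
};

// Count how often a lock would be (re)acquired if records are processed in
// order and the lock is kept for as long as table_id stays the same.
inline size_t count_acquisitions(const std::vector<undo_rec> &recs) {
  size_t n = 0;
  uint64_t last_table_id = 0;  // assumption: 0 is not a valid table_id
  for (const undo_rec &r : recs) {
    if (r.table_id != last_table_id) {
      n++;
      last_table_id = r.table_id;
    }
  }
  return n;
}

// Group records by table. A stable sort keeps the original order of the
// records that belong to the same table.
inline void sort_by_table(std::vector<undo_rec> &recs) {
  std::stable_sort(recs.begin(), recs.end(),
                   [](const undo_rec &a, const undo_rec &b) {
                     return a.table_id < b.table_id;
                   });
}
```

For records that alternate between two tables, four acquisitions before sorting become two afterwards.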
      
      get_purge_table(): A new accessor, used during the purge of
      history for indexed virtual columns. This interface should ideally
      not exist at all.
      
      thd_mdl_context(): Accessor of THD::mdl_context.
      Wrapped in a new thd_mdl_service.
      
      dict_get_db_name_len(): Define inline.
      
      dict_acquire_mdl_shared(): Acquire explicit shared MDL on a table name
      if needed.
      
      dict_table_open_on_id(): Return MDL_ticket, if requested.
      
      dict_table_close(): Release MDL ticket, if requested.
      
      dict_fts_index_syncing(), dict_index_t::index_fts_syncing: Remove.
      row_drop_table_for_mysql() no longer needs to check these, because
      MDL guarantees that a fulltext index sync will not be in progress
      while MDL_EXCLUSIVE is protecting a DDL operation.
      
      dict_table_t::parse_name(): Parse the table name for acquiring MDL.
      
      purge_node_t::undo_recs: Change the type to std::list<trx_purge_rec_t*>
      (different container, and storing also roll_ptr).
      
      purge_node_t: Add mdl_ticket, last_table_id, purge_thd, mdl_hold_recs
      for acquiring MDL and for keeping the table open across multiple
      undo log records.
      
      purge_vcol_info_t, row_purge_store_vsec_cur(), row_purge_restore_vsec_cur():
      Remove. We will acquire the MDL earlier.
      
      purge_sys_t::heap: Added, for reading undo log records.
      
      fts_sync_during_ddl(): Invoked during ALGORITHM=INPLACE operations
      to ensure that fts_sync_table() will not conflict with MDL_EXCLUSIVE.
      Uses fts_t::sync_message for bookkeeping.
    • MDEV-18460: Server crashed in strmake / tdc_create_key / THD::create_tmp_table_def_key · af650c76
      Oleksandr Byelkin authored
      When there is a WITH clause, we postpone the check for tables without a
      database to later stages, when the tables in WITH will be defined.
      But we should not try to open such tables as temporary tables because
      temporary tables always belong to some database.
    • MDEV-20900: IN predicate to IN subquery conversion causes performance regression · 246e2ae1
      Varun Gupta authored
      Disable the IN predicate to IN subquery conversion when the types on the left-hand and
      right-hand sides of the IN predicate are not comparable.
  6. 09 Dec, 2019 3 commits
    • MDEV-21262 MTR does not work with Windows ASAN builds · e47bd007
      Vladislav Vaintroub authored
      Do not use suppressions on Windows. ASAN_OPTIONS cannot be parsed
      when full file names are used, because file names contain ':',
      which is the ASAN_OPTIONS delimiter.
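
The problem can be illustrated with a naive ':'-delimited split. This sketch does not reproduce the sanitizer's own parser; split_options is a hypothetical helper, and the option string and file path used in the note below are made-up examples.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split an option string on ':' as a simple ':'-delimited parser would.
inline std::vector<std::string> split_options(const std::string &s) {
  std::vector<std::string> out;
  size_t start = 0;
  for (;;) {
    const size_t pos = s.find(':', start);
    if (pos == std::string::npos) {
      out.push_back(s.substr(start));  // last (or only) item
      break;
    }
    out.push_back(s.substr(start, pos - start));
    start = pos + 1;
  }
  return out;
}
```

Splitting "abort_on_error=1:suppressions=C:\build\asan.supp" this way produces three items instead of two, because the drive letter's colon cuts the file name into "suppressions=C" and "\build\asan.supp".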
    • MDEV-16264 - some improvements · 66de4fef
      Vladislav Vaintroub authored
      - wait notification, tpool_wait_begin/tpool_wait_end - to notify the
      threadpool that the current thread is going to wait
      
      Use it to wait for IOs to complete and also when purge waits for workers.
    • MDEV-21259 Assertion failed in mtr_t::write() · d3b2625b
      Marko Mäkelä authored
      btr_free_externally_stored_field(): Pass w=mtr_t::OPT to
      note that the BTR_EXTERN_LEN is not necessarily changing
      when a multi-page ROW_FORMAT=COMPRESSED off-page column
      is being freed, and to allow redundant writes to the redo
      log to be optimized away.
      
      Ever since commit 56f6dab1
      the refactored function mtr_t::write() asserts by default
      that the page contents are being changed.
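
The behaviour described above can be modelled with a small sketch. It is not the mtr_t interface: sketch_log and sketch_write are hypothetical, NORMAL is an assumed name for the default mode, and only OPT mirrors the mtr_t::OPT mentioned in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write modes for this sketch. NORMAL is an assumed name for the default.
enum write_mode { NORMAL, OPT };

// Hypothetical stand-in for a redo log: it only counts emitted records.
struct sketch_log {
  size_t records = 0;
};

// Store val into *field and log the change. Returns whether a record was logged.
inline bool sketch_write(sketch_log &redo, uint32_t *field, uint32_t val,
                         write_mode w = NORMAL) {
  if (*field == val) {
    // By default the caller promises a real change; only OPT permits a no-op,
    // in which case the redundant log record is skipped.
    assert(w == OPT);
    return false;
  }
  *field = val;
  redo.records++;
  return true;
}
```

A caller that cannot know whether the stored value differs passes OPT, and the write is then silently dropped when nothing would change.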