1. 21 Dec, 2020 1 commit
    • Marko Mäkelä's avatar
      MDEV-21452 fixup: Fix fake server hang reports · 8c68b549
      Marko Mäkelä authored
      srv_monitor_task(): Make the innodb_fatal_semaphore_wait_threshold
      watchdog tolerate a non-monotonic clock. On NUMA systems, the
      my_hrtime_coarse() values observed by different NUMA nodes are not in sync,
      and the clock could appear to run backwards. We must treat negative
      time durations as zero, just like we did in
      commit ff5d306e in
      dict_sys_t::mutex_lock_wait().
      
      The wrong logic caused occasional crashes of the test
      mariabackup.apply-log-only-incr when it was run concurrently with
      itself with a large number of instances.
      8c68b549
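
The clamping described in commit 8c68b549 above can be sketched as follows. This is a minimal illustration, not the InnoDB code: it assumes timestamps are unsigned microsecond counts, and the helper names elapsed_or_zero() and wait_exceeded() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: time elapsed between two coarse clock samples.
// If the clock appears to have run backwards (now < start), report zero
// instead of letting the unsigned subtraction wrap to a huge value.
constexpr std::uint64_t elapsed_or_zero(std::uint64_t start, std::uint64_t now)
{
  return now > start ? now - start : 0;
}

// Hypothetical watchdog check: true only if the wait really reached
// the threshold.
constexpr bool wait_exceeded(std::uint64_t start, std::uint64_t now,
                             std::uint64_t threshold)
{
  return elapsed_or_zero(start, now) >= threshold;
}
```

Without the clamp, a sample taken slightly "before" the start time would look like an enormous wait and could trigger the watchdog even though nothing is hung.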
  2. 19 Dec, 2020 1 commit
  3. 18 Dec, 2020 4 commits
    • Marko Mäkelä's avatar
      MDEV-24445 Using innodb_undo_tablespaces corrupts system tablespace · 0c23e32d
      Marko Mäkelä authored
      In the rewrite of MDEV-8139 (based on MDEV-15528), we introduced a
      wrong assumption that any persistent tablespace that is not an .ibd
      file is the system tablespace. This assumption is broken when
      innodb_undo_tablespaces (files undo001, undo002, ...) are being used.
      By default, we have innodb_undo_tablespaces=0 (the persistent undo
      log is being stored in the system tablespace).
      
      In MDEV-15528 and MDEV-8139 we rewrote the page scrubbing logic
      so that it will follow the tried-and-true write-ahead logging
      protocol, first writing FREE_PAGE records and then in the page
      flushing, zerofilling or hole-punching freed pages.
      
      Unfortunately, the implementation included a wrong assumption that
      anything that is not in an .ibd file must be the system tablespace.
      This wrong assumption would cause overwrites of valid data pages in
      the system tablespace.
      
      mtr_t::m_freed_in_system_tablespace: Remove.
      
      mtr_t::m_freed_space: The tablespace associated with m_freed_pages.
      
      buf_page_free(): Take the tablespace and page number as a parameter,
      instead of taking a page identifier.
      0c23e32d
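
The wrong assumption described in commit 0c23e32d above can be illustrated with a small classifier. This is only a sketch of the pitfall, not of the actual fix (which, per the message, tracks the tablespace in mtr_t::m_freed_space). The enum and function names are hypothetical, and classifying by file name is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class space_kind { FILE_PER_TABLE, UNDO, SYSTEM };

// Hypothetical helper: does the string end with the given suffix?
inline bool ends_with(const std::string &s, const std::string &suffix)
{
  return s.size() >= suffix.size() &&
    s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: names of the form undo001, undo002, ...
inline bool is_undo_name(const std::string &n)
{
  if (n.size() != 7 || n.compare(0, 4, "undo") != 0)
    return false;
  for (std::size_t i = 4; i < 7; i++)
    if (n[i] < '0' || n[i] > '9')
      return false;
  return true;
}

inline space_kind classify(const std::string &file_name)
{
  if (ends_with(file_name, ".ibd"))
    return space_kind::FILE_PER_TABLE;
  // The flawed reasoning amounts to skipping this check and treating
  // every non-.ibd file as the system tablespace.
  if (is_undo_name(file_name))
    return space_kind::UNDO;
  return space_kind::SYSTEM;
}
```

With innodb_undo_tablespaces=0 the middle branch is never taken, which is consistent with the problem only showing up when dedicated undo tablespaces are configured.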
    • Marko Mäkelä's avatar
      MDEV-24442 Assertion space->referenced() failed in fil_crypt_space_needs_rotation · cd093d79
      Marko Mäkelä authored
      A race condition between deleting an .ibd file and fil_crypt_thread
      marking pages dirty was introduced in
      commit 118e258a (part of MDEV-23855).
      
      fil_space_t::acquire_if_not_stopped(): Correctly return false
      if the STOPPING flag is set, indicating that any further activity
      on the tablespace must be avoided. Also, remove the constant parameter
      have_mutex=true and move the function declaration to the same
      compilation unit with the only callers.
      
      fil_crypt_flush_space(): Remove an unused variable.
      cd093d79
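
The contract of fil_space_t::acquire_if_not_stopped() described in commit cd093d79 above (refuse to hand out a reference once the STOPPING flag is set) can be sketched with a single atomic word. The class below is a hypothetical stand-in, not the InnoDB fil_space_t; the bit layout and the memory orders are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a tablespace handle.
class space_ref
{
  // Top bit: "stopping" flag. Remaining bits: number of references.
  static constexpr std::uint32_t STOPPING = 1U << 31;
  std::atomic<std::uint32_t> word{0};
public:
  // Try to take a reference. Returns false, leaving the count
  // unchanged, if the stopping flag is set.
  bool acquire_if_not_stopped()
  {
    std::uint32_t old = word.load(std::memory_order_relaxed);
    do
    {
      if (old & STOPPING)
        return false;
    }
    while (!word.compare_exchange_weak(old, old + 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed));
    return true;
  }

  void release() { word.fetch_sub(1, std::memory_order_release); }

  void set_stopping() { word.fetch_or(STOPPING, std::memory_order_relaxed); }

  std::uint32_t references() const
  {
    return word.load(std::memory_order_relaxed) & ~STOPPING;
  }
};
```

Checking the flag and incrementing the count in one compare-and-swap means a caller can never obtain a reference after the flag has been observed as set.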
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 4e0004ea
      Marko Mäkelä authored
      4e0004ea
    • Marko Mäkelä's avatar
      MDEV-24426 fixup: Assertion failure on shutdown · a1974d19
      Marko Mäkelä authored
      fil_crypt_find_space_to_rotate(): Always handle the sentinel value
      that indicates that we have run out of work, even if at the same
      time the thread should shut down for other reasons.
      
      Thanks to Matthias Leich for reproducing this bug with RQG.
      a1974d19
  4. 17 Dec, 2020 2 commits
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · c36a2a0d
      Marko Mäkelä authored
      c36a2a0d
    • Marko Mäkelä's avatar
      MDEV-24426 fil_crypt_thread keep spinning even if innodb_encryption_rotate_key_age=0 · 1fe3dd00
      Marko Mäkelä authored
      After MDEV-15528, two modes of operation in the fil_crypt_thread
      remain, depending on whether innodb_encryption_rotate_key_age=0
      (whether key rotation is disabled). If key rotation is disabled,
      the fil_crypt_thread misses the opportunity to sleep, which
      wastes a lot of CPU time.
      
      fil_crypt_return_iops(): Add a parameter to specify whether other
      fil_crypt_thread instances should be woken up.
      
      fil_system_t::keyrotate_next(): Return the special value
      fil_system.temp_space to indicate that no work is to be done.
      
      fil_space_t::next(): Propagate the special value fil_system.temp_space
      to the caller.
      
      fil_crypt_find_space_to_rotate(): If no work is to be done,
      do not wake up other threads.
      1fe3dd00
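
The sentinel idea in commit 1fe3dd00 above (a special return value meaning "no work", which also suppresses waking other threads) can be sketched as below. All names are hypothetical; the real code reuses fil_system.temp_space as the sentinel, whereas this sketch assumes a dedicated static object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct space { int id; bool needs_rotation; };

// Hypothetical sentinel: a distinguished object that is never a real
// candidate for key rotation.
inline space *no_work()
{
  static space sentinel{-1, false};
  return &sentinel;
}

// Returns the next space that needs rotation at or after index `from`,
// or the sentinel when nothing is left to do.
inline space *next_to_rotate(std::vector<space> &spaces, std::size_t from)
{
  for (std::size_t i = from; i < spaces.size(); i++)
    if (spaces[i].needs_rotation)
      return &spaces[i];
  return no_work();
}

// Wake peer threads only when there is real work to share.
inline bool should_wake_others(const space *found)
{
  return found != no_work();
}
```

Returning a distinguished non-null object lets the caller tell "end of list" apart from "nothing to do" without an extra out-parameter.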
  5. 16 Dec, 2020 2 commits
    • Marko Mäkelä's avatar
      Speed up mariabackup.xb_compressed_encrypted · af1335c2
      Marko Mäkelä authored
      With system mutexes, contention can be very expensive.
      Let us configure innodb_encryption_threads=1 to minimize contention.
      The actual work is being done in buf_flush_page_cleaner thread anyway.
      af1335c2
    • Marko Mäkelä's avatar
      MDEV-24167 fixup: Wake up all update_lock() in u_unlock() · 07e4b6b2
      Marko Mäkelä authored
      It turns out that the hang that was fixed in
      commit 43d3dad1
      for the SRW_LOCK_DUMMY implementation is also possible in the futex
      implementation. We have observed hangs of ssux_lock_low::u_unlock()
      on Windows where the undesirable value is rw_lock::UPDATER, in the
      test mariabackup.xb_compressed_encrypted.
      
      The exact sequence of events leading to the hang is not known, but
      it seems that u_unlock() had better always wake up one thread.
      Possibly, the case involves multiple blocked u_unlock().
      
      On a busy server, the hang might be 'rescued' by a subsequent
      lock acquisition and release that is executed by another thread.
      
      rw_lock::update_unlock(): Change the return type to void.
      
      ssux_lock_low::u_unlock(): Always invoke readers_wake() [sic],
      to wake up any pending update_lock() or write_lock().
      In the futex implementation, this will wake up all waiters.
      With SRW_LOCK_DUMMY, writer_wake() and readers_wake() do the same
      thing: wake up one write_lock(), or all update_lock() waiters.
      07e4b6b2
  6. 15 Dec, 2020 19 commits
    • Etienne Guesnet's avatar
      Contain AIX perror · 6bb3949e
      Etienne Guesnet authored
      6bb3949e
    • Etienne Guesnet's avatar
      Fix build on GCC 5 · 2ce48f06
      Etienne Guesnet authored
      2ce48f06
    • Etienne Guesnet's avatar
      Add LARGE_FILES flag for GCC AIX build · a6e90992
      Etienne Guesnet authored
      a6e90992
    • Etienne Guesnet's avatar
      Add -berok for head test on AIX · 4fade4da
      Etienne Guesnet authored
      4fade4da
    • Etienne Guesnet's avatar
      Parse GSSAPI flags on AIX · 2dee6a74
      Etienne Guesnet authored
      2dee6a74
    • Etienne Guesnet's avatar
      Add flags for AIX build · 1d7fc728
      Etienne Guesnet authored
      1d7fc728
    • Etienne Guesnet's avatar
      Remove -Werror for AIX · b23e5457
      Etienne Guesnet authored
      b23e5457
    • Etienne Guesnet's avatar
      AIX workaround for GCC include bug · 1a49619a
      Etienne Guesnet authored
      1a49619a
    • Etienne Guesnet's avatar
      AIX workaround for GCC TOC bug · 2c724762
      Etienne Guesnet authored
      2c724762
    • Etienne Guesnet's avatar
      Support of AIX for auth_socket plugin · 77d7de8d
      Etienne Guesnet authored
      77d7de8d
    • Etienne Guesnet's avatar
      Add build on AIX · 2f5d3724
      Etienne Guesnet authored
      2f5d3724
    • Marko Mäkelä's avatar
      MDEV-21452: Retain the watchdog only on dict_sys.mutex, for performance · cf2480dd
      Marko Mäkelä authored
      Most hangs seem to involve dict_sys.mutex. While holding lock_sys.mutex
      we rarely acquire any buffer pool page latches, which are a frequent
      source of potential hangs.
      cf2480dd
    • Marko Mäkelä's avatar
      MDEV-21452: Replace ib_mutex_t with mysql_mutex_t · ff5d306e
      Marko Mäkelä authored
      SHOW ENGINE INNODB MUTEX functionality is completely removed,
      as are the InnoDB latching order checks.
      
      We will enforce innodb_fatal_semaphore_wait_threshold
      only for dict_sys.mutex and lock_sys.mutex.
      
      dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex.
      
      lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex.
      
      FIXME: srv_sys should be removed altogether; it is duplicating tpool
      functionality.
      
      fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must
      not hold fil_system.mutex.
      
      fil_close_all_files(): To prevent SAFE_MUTEX warnings for
      fil_space_destroy_crypt_data(), we must not hold fil_system.mutex
      while invoking fil_space_free_low() on a detached tablespace.
      ff5d306e
    • Marko Mäkelä's avatar
      MDEV-21452: Remove os_event_t, MUTEX_EVENT, TTASEventMutex, sync_array · db006a9a
      Marko Mäkelä authored
      We will default to MUTEXTYPE=sys (using OSTrackMutex) for those
      ib_mutex_t that have not been replaced yet.
      
      The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS is removed.
      
      The parameter innodb_sync_array_size is removed.
      
      FIXME: innodb_fatal_semaphore_wait_threshold will no longer be enforced.
      We should enforce it for lock_sys.mutex and dict_sys.mutex somehow!
      
      innodb_sync_debug=ON might still cover ib_mutex_t.
      db006a9a
    • Marko Mäkelä's avatar
      MDEV-21452: Replace all direct use of os_event_t · 38fd7b7d
      Marko Mäkelä authored
      Let us replace os_event_t with mysql_cond_t, and replace the
      necessary ib_mutex_t with mysql_mutex_t so that they can be
      used with condition variables.
      
      Also, let us replace polling (os_thread_sleep() or timed waits)
      with plain mysql_cond_wait() wherever possible.
      
      Furthermore, we will use the lightweight srw_mutex for trx_t::mutex,
      to hopefully reduce contention on lock_sys.mutex.
      
      FIXME: Add test coverage of
      mariabackup --backup --kill-long-queries-timeout
      38fd7b7d
    • Marko Mäkelä's avatar
      Fix the SRW_LOCK_DUMMY build with PLUGIN_PERFSCHEMA=NO · 59b2848a
      Marko Mäkelä authored
      srw_lock_low: Declare the member functions public when wrapping rw_lock_t
      59b2848a
    • Marko Mäkelä's avatar
      MDEV-24410: Bug in SRW_LOCK_DUMMY rw_lock_t wrapper · 20da7b22
      Marko Mäkelä authored
      In commit 43d3dad1 we forgot to
      invert the return values of rw_tryrdlock() and rw_trywrlock(),
      causing strange failures.
      20da7b22
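
The kind of mistake described in commit 20da7b22 above is easy to make when wrapping a C-style try-lock, which by POSIX convention (for example pthread_rwlock_tryrdlock()) returns 0 on success, in a C++ method that returns true on success. The sketch below uses a hypothetical c_lock primitive rather than rw_lock_t so that it stays self-contained.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical C-style primitive following the POSIX convention:
// returns 0 on success and nonzero if the lock is busy.
struct c_lock { std::atomic<bool> held{false}; };

inline int c_trylock(c_lock *l)
{
  bool expected = false;
  return l->held.compare_exchange_strong(expected, true) ? 0 : 1;
}

inline void c_unlock(c_lock *l) { l->held.store(false); }

// C++ wrapper whose try_lock() returns true on success.
class lock_wrapper
{
  c_lock l;
public:
  // The buggy form would be `return c_trylock(&l);`, which reports
  // success exactly when the lock is busy.
  bool try_lock() { return !c_trylock(&l); }
  void unlock() { c_unlock(&l); }
};
```

Because both conventions convert silently to bool, the compiler gives no warning when the inversion is forgotten.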
    • Marko Mäkelä's avatar
      MDEV-24142/MDEV-24167 fixup: Split ssux_lock and srw_lock · 43d3dad1
      Marko Mäkelä authored
      This conceptually reverts commit 1fdc161d
      and reintroduces an option for srw_lock to wrap a native implementation.
      
      The srw_lock and srw_lock_low differ from ssux_lock and ssux_lock_low
      in that Slim SUX locks support three modes (Shared, Update, eXclusive)
      while Slim RW locks support only two (Read, Write).
      
      On Microsoft Windows, the srw_lock will be implemented by SRWLOCK.
      On Linux and OpenBSD, it will be implemented by rw_lock and the
      futex system call, just like earlier.
      On other systems, or if SRW_LOCK_DUMMY is defined on anything other
      than Microsoft Windows, rw_lock_t will be used.
      
      ssux_lock_low::read_lock(), ssux_lock_low::update_lock(): Correct
      the SRW_LOCK_DUMMY implementation to prevent hangs. The intention of
      commit 1fdc161d seems to have been
      do ... while loops, but the 'do' keyword was missing. This total
      breakage was missed in commit 260161fc
      which did reduce the probability of the hangs.
      
      ssux_lock_low::u_unlock(): In the SRW_LOCK_DUMMY implementation
      (based on a mutex and two condition variables), always invoke
      writer_wake() in order to ensure that a waiting update_lock()
      will be woken up.
      
      ssux_lock_low::writer_wait(), ssux_lock_low::readers_wait():
      In the SRW_LOCK_DUMMY implementation, keep waiting for the signal
      until the lock word has changed. The "while" had been changed to "if"
      in order to avoid hangs.
      43d3dad1
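
The "keep waiting until the lock word has changed" rule from commit 43d3dad1 above is the usual condition-variable pattern: re-check the predicate in a loop. The sketch below uses std::mutex and std::condition_variable with a hypothetical gate class; it is not the ssux_lock_low code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot gate built on a mutex and a condition variable.
class gate
{
  std::mutex m;
  std::condition_variable cv;
  bool open_ = false;
public:
  void wait()
  {
    std::unique_lock<std::mutex> lk(m);
    // "while", not "if": a wakeup may be spurious, so the state must
    // be re-checked every time the wait returns.
    while (!open_)
      cv.wait(lk);
  }

  void open()
  {
    {
      std::lock_guard<std::mutex> lk(m);
      open_ = true;
    }
    cv.notify_all();
  }

  bool is_open()
  {
    std::lock_guard<std::mutex> lk(m);
    return open_;
  }
};
```

With an "if", a spurious wakeup would let wait() return while the gate is still closed. With a loop, correctness then depends on every state change being followed by a wake-up, which is the other half of the fix described above.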
    • zhaorenhai's avatar
      MDEV-24366 Use environment variables as S3 test case variables · ee69c153
      zhaorenhai authored
      Move the S3 test case variables to suite.pm to use environment variables.
      
      Use minio credentials if a TCP connection to localhost:9000 is accepted
      so the current build works correctly.
      
      Reviewer: Daniel Black
      
      closes #1711
      ee69c153
  7. 14 Dec, 2020 7 commits
    • Stepan Patryshev's avatar
      e4c25895
    • Stepan Patryshev's avatar
      1c660211
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 9ecd7665
      Marko Mäkelä authored
      9ecd7665
    • Marko Mäkelä's avatar
      MDEV-24313 fixup: GCC 8 -Wconversion · e8217d07
      Marko Mäkelä authored
      e8217d07
    • Marko Mäkelä's avatar
      MDEV-24313 fixup: GCC -Wparentheses · 2c226e01
      Marko Mäkelä authored
      2c226e01
    • Marko Mäkelä's avatar
      MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1 · f24b7383
      Marko Mäkelä authored
      In commit 5e62b6a5 (MDEV-16264)
      the logic of os_aio_init() was changed so that it will never fail,
      but instead automatically disable innodb_use_native_aio (which is
      enabled by default) if the io_setup() system call would fail due
      to resource limits being exceeded. This is questionable, especially
      because falling back to simulated AIO may lead to significantly
      reduced performance.
      
      srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads:
      Change the data type from ulong to uint.
      
      os_aio_init(): Remove the parameters, and actually return an error code.
      
      thread_pool::configure_aio(): Do not silently fall back to simulated AIO.
      
      Reviewed by: Vladislav Vaintroub
      f24b7383
    • Marko Mäkelä's avatar
      MDEV-24313 (1 of 2): Hang with innodb_write_io_threads=1 · 17d3f856
      Marko Mäkelä authored
      After commit a5a2ef07 (part of MDEV-23855)
      implemented asynchronous doublewrite, it is possible that the server will
      hang when the following parameters are in effect:
      
          innodb_doublewrite=1 (default)
          innodb_write_io_threads=1
          innodb_use_native_aio=0
      
      Note: In commit 5e62b6a5 (MDEV-16264)
      the logic of os_aio_init() was changed so that it will never fail,
      but instead automatically disable innodb_use_native_aio (which is
      enabled by default) if the io_setup() system call would fail due
      to resource limits being exceeded.
      
      Before commit a5a2ef07, we used
      a synchronous write for the doublewrite buffer batches, always at
      most 64 pages at a time. So, upon completing a doublewrite batch,
      a single thread would submit at most 64 page writes (for the
      individual pages that were first written to the doublewrite buffer).
      With that commit, we may submit up to 128 page writes at a time.
      
      The maximum number of outstanding requests per thread is 256.
      Because the maximum number of asynchronous write submissions per
      thread was roughly doubled, it is now possible that
      buf_dblwr_t::flush_buffered_writes_completed() will hang in
      io_slots::acquire(), called via os_aio() and fil_space_t::io(),
      when submitting writes of the individual blocks.
      
      We will prevent this type of hang by increasing the minimum number
      of innodb_write_io_threads from 1 to 2, so that this type of hang
      would only become possible when 512 outstanding write requests
      are exceeded.
      17d3f856
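
The slot arithmetic in commit 17d3f856 above can be written out explicitly. The figure of 256 outstanding requests per thread and the new minimum of 2 threads come from the message; the function names are hypothetical.

```cpp
#include <cassert>

// From the commit message: at most 256 outstanding requests per thread.
constexpr unsigned SLOTS_PER_THREAD = 256;
// From the commit message: the new minimum for innodb_write_io_threads.
constexpr unsigned MIN_WRITE_IO_THREADS = 2;

// Hypothetical helper: total write slots for a given thread count.
constexpr unsigned write_slots(unsigned n_threads)
{
  return n_threads * SLOTS_PER_THREAD;
}

// Hypothetical helper: raise a configured value to the minimum.
constexpr unsigned effective_write_io_threads(unsigned configured)
{
  return configured < MIN_WRITE_IO_THREADS ? MIN_WRITE_IO_THREADS
                                           : configured;
}

static_assert(write_slots(1) == 256, "old minimum: 256 slots");
static_assert(write_slots(effective_write_io_threads(1)) == 512,
              "new minimum: 512 slots");
```

Raising the minimum thread count doubles the pool of slots, so a single batch of up to 128 page writes leaves more headroom before io_slots::acquire() has to block.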
  8. 11 Dec, 2020 2 commits
    • Varun Gupta's avatar
      MDEV-24353: Adding GROUP BY slows down a query · d79c3f32
      Varun Gupta authored
      A heuristic in best_access_path says that if, for an index,
      ref access uses at least as many key parts as range access,
      then range access should not be considered.
      The assumption made by this heuristic does not hold when
      the range optimizer opted to use the group-by min-max optimization.
      The fix is to skip the heuristic if
      the range optimizer picked the group-by min-max
      optimization.
      d79c3f32
    • Marko Mäkelä's avatar
      MDEV-24391 heap-use-after-free in fil_space_t::flush_low() · 8677c14e
      Marko Mäkelä authored
      We observed a race condition that involved two threads
      executing fil_flush_file_spaces() and one thread
      executing fil_delete_tablespace(). After one of the
      fil_flush_file_spaces() threads observed that
      space.needs_flush_not_stopping() is set and was
      releasing the fil_system.mutex, the other fil_flush_file_spaces()
      would complete the execution of fil_space_t::flush_low() on
      the same tablespace. Then, fil_delete_tablespace() would
      destroy the object, because the value of fil_space_t::n_pending
      did not prevent that. Finally, the first fil_flush_file_spaces()
      would resume execution and invoke fil_space_t::flush_low() on
      the freed object.
      
      This race condition was introduced in
      commit 118e258a of MDEV-23855.
      
      fil_space_t::flush(): Add a template parameter that indicates
      whether the caller is holding a reference to prevent the
      tablespace from being freed.
      
      buf_dblwr_t::flush_buffered_writes_completed(),
      row_quiesce_table_start(): Acquire a reference for the duration
      of the fil_space_t::flush_low() operation. It should be impossible
      for the object to be freed in these code paths, but we want to
      satisfy the debug assertions.
      
      fil_space_t::flush_low(): Do not increment or decrement the
      reference count, but instead assert that the caller is holding
      a reference.
      
      fil_space_extend_must_retry(), fil_flush_file_spaces():
      Acquire a reference before releasing fil_system.mutex.
      This is what will fix the race condition.
      8677c14e
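
The fix described in commit 8677c14e above follows a common rule: take a reference while still holding the mutex that protects the lookup, and only then release the mutex. The sketch below is a single-object toy with hypothetical names; it is not the fil_system code and ignores the lock-free reference counting that InnoDB uses.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a tablespace object.
struct tablespace { unsigned refs = 0; bool freed = false; };

// Hypothetical registry that owns one tablespace.
class registry
{
  std::mutex m;
  tablespace space;
public:
  // Pin the object while still holding the mutex, so that a concurrent
  // delete cannot free it between the lookup and the later use.
  tablespace *acquire()
  {
    std::lock_guard<std::mutex> lk(m);
    if (space.freed)
      return nullptr;
    space.refs++;
    return &space;
  }

  void release(tablespace *s)
  {
    std::lock_guard<std::mutex> lk(m);
    assert(s->refs > 0);
    s->refs--;
  }

  // Deletion succeeds only when nobody holds a reference.
  bool try_free()
  {
    std::lock_guard<std::mutex> lk(m);
    if (space.refs)
      return false;
    space.freed = true;
    return true;
  }
};
```

A caller that pins the object before dropping the mutex can safely flush it afterwards, because try_free() refuses to proceed until release() has been called.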
  9. 09 Dec, 2020 2 commits
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · be4d2665
      Marko Mäkelä authored
      be4d2665
    • Marko Mäkelä's avatar
      Remove unused DBUG_EXECUTE_IF "ignore_punch_hole" · 0c7c4492
      Marko Mäkelä authored
      Since commit ea21d630 we
      conditionally define a variable that only plays a role on
      systems that support hole-punching (explicit creation of sparse files).
      However, that broke debug builds on such systems.
      
      It turns out that the debug_dbug label "ignore_punch_hole" is
      not at all used in MariaDB server. It would be covered by
      the MySQL 5.7 test innodb.table_compress. (Note: MariaDB 10.1
      implemented page_compressed tables before something comparable
      appeared in MySQL 5.7.)
      0c7c4492