MDEV-24302: RESET MASTER hangs

Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we are observing sporadic hangs during the execution of the
RESET MASTER statement. We are hoping to fix the hangs with these
changes, but due to the rather infrequent occurrence of the hangs
and our inability to reliably reproduce the hangs, we cannot be
sure of this.

What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. That is demonstrated by the new
test case that was developed by Andrei Elkin. We fix this case by
implementing a special case for it.

This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.

handlerton::commit_checkpoint_request, commit_checkpoint_notify_ha():
Remove the unused parameter hton.

log_requests.start: Replaces pending_checkpoint_list.

log_requests.end: Replaces pending_checkpoint_list_end.

log_requests.mutex: Replaces pending_checkpoint_mutex.

log_flush_notify_and_unlock(), log_flush_notify(): Replaces
innobase_mysql_log_notify(). The new implementation should be
functionally equivalent to the old one.

innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.

log_write(): Invoke log_flush_notify() when the write is already
durable. This was missing WITH_PMEM when the log is in persistent memory.

Reviewed by: Vladislav Vaintroub
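
The request/notify scheme described in the commit message can be illustrated with a small standalone sketch. This is not the InnoDB implementation: the names current_lsn, flushed_lsn, notified, notify() and log_flush_request_add() are hypothetical stand-ins, and the re-check after enqueueing is an assumption about how such a queue could avoid a lost wakeup. It only shows the idea of a FIFO of pending requests guarded by a mutex, a fast path for requests that are already satisfied, and notification performed outside the mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical stand-ins for server state; not the real InnoDB globals.
static std::atomic<lsn_t> current_lsn{0};  // LSN of the latest log record
static std::atomic<lsn_t> flushed_lsn{0};  // LSN up to which the log is durable
static std::vector<void*> notified;        // cookies handed to notify()

// Stand-in for commit_checkpoint_notify_ha(void *cookie).
static void notify(void *cookie) { notified.push_back(cookie); }

struct log_flush_request { log_flush_request *next; void *cookie; lsn_t lsn; };

// FIFO of pending requests, ordered by non-decreasing lsn.
static struct {
  log_flush_request *start = nullptr, *end = nullptr;
  std::mutex mutex;
} log_requests;

// Notify and free every queued request whose LSN is already durable.
static void log_flush_notify(lsn_t flush_lsn)
{
  for (;;) {
    log_flush_request *req;
    {
      std::lock_guard<std::mutex> guard(log_requests.mutex);
      req = log_requests.start;
      if (!req || req->lsn > flush_lsn) return;
      log_requests.start = req->next;
      if (!req->next) log_requests.end = nullptr;
    }
    notify(req->cookie);  // invoked without holding the mutex
    delete req;
  }
}

// Ask to be notified once everything written so far is durable.
static void log_flush_request_add(void *cookie)
{
  const lsn_t lsn = current_lsn.load();
  if (lsn <= flushed_lsn.load()) {
    // Fast path: already satisfied, so nothing is queued. Older satisfied
    // requests are released first, then the current one.
    log_flush_notify(flushed_lsn.load());
    notify(cookie);
    return;
  }
  auto *req = new log_flush_request{nullptr, cookie, lsn};
  {
    std::lock_guard<std::mutex> guard(log_requests.mutex);
    if (log_requests.end) log_requests.end->next = req;
    else log_requests.start = req;
    log_requests.end = req;
  }
  // Re-check: a flush may have completed while the request was being queued.
  log_flush_notify(flushed_lsn.load());
}

static std::size_t demo()
{
  int a, b;                    // only their addresses are used as cookies
  current_lsn = 100;           // an unflushed write exists
  log_flush_request_add(&a);   // queued, because 100 > 0
  assert(notified.empty());
  flushed_lsn = 100;           // the log write becomes durable...
  log_flush_notify(100);       // ...and the writer releases the waiters
  assert(notified.size() == 1 && notified[0] == &a);
  log_flush_request_add(&b);   // fast path: notified immediately
  return notified.size();
}
```

In this sketch, a request made when nothing further will ever be flushed is still answered, because the fast path does not depend on a later log write calling log_flush_notify().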

e8b7fceb · Marko Mäkelä · 8e2d69f7
Commit e8b7fceb authored Mar 29, 2021 by Marko Mäkelä
9 changed files
--- a/mysql-test/suite/innodb/r/group_commit_force_recovery.result
+++ b/mysql-test/suite/innodb/r/group_commit_force_recovery.result
+CREATE TABLE t1(a int) ENGINE=InnoDB;
+INSERT INTO t1 SET a=1;
+RESET MASTER;
+DROP TABLE t1;
+End of the tests.
--- a/mysql-test/suite/innodb/t/group_commit_force_recovery-master.opt
+++ b/mysql-test/suite/innodb/t/group_commit_force_recovery-master.opt
+--innodb-force-recovery=2
--- a/mysql-test/suite/innodb/t/group_commit_force_recovery.test
+++ b/mysql-test/suite/innodb/t/group_commit_force_recovery.test
+# MDEV-24302 RESET MASTER hangs as InnoDB does not report on binlog checkpoint
+# Test that binlog checkpoint notification works under the stringent condition
+# set by innodb_force_recovery = 2.
+
+--source include/have_innodb.inc
+--source include/have_binlog_format_mixed.inc
+
+# Binlog checkpoint notification consumers such as RESET MASTER
+# receive one when lsn_0, the LSN at the time of the request, finally gets flushed:
+#   flush_lsn >= lsn_0
+# The bug was that when lsn_0 reflects a write of an internal InnoDB transaction
+# and RESET MASTER is not followed by any further user transaction,
+# RESET MASTER would hang.
+
+CREATE TABLE t1(a int) ENGINE=InnoDB;
+INSERT INTO t1 SET a=1;
+RESET MASTER;
+
+# final cleanup
+DROP TABLE t1;
+--echo End of the tests.
--- a/sql/handler.cc
+++ b/sql/handler.cc
 /* Copyright (c) 2000, 2016, Oracle and/or its affiliates.
-   Copyright (c) 2009, 2020, MariaDB Corporation.
+   Copyright (c) 2009, 2021, MariaDB Corporation.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
@@ -861,7 +861,7 @@ static my_bool commit_checkpoint_request_handlerton(THD *unused1, plugin_ref plu
    void *cookie= st->cookie;
    if (st->pre_hook)
      (*st->pre_hook)(cookie);
-    (*hton->commit_checkpoint_request)(hton, cookie);
+    (*hton->commit_checkpoint_request)(cookie);
  }
  return FALSE;
 }
@@ -2437,8 +2437,7 @@ int ha_recover(HASH *commit_list)
  Called by engine to notify TC that a new commit checkpoint has been reached.
  See comments on handlerton method commit_checkpoint_request() for details.
 */
-void
-commit_checkpoint_notify_ha(handlerton *hton, void *cookie)
+void commit_checkpoint_notify_ha(void *cookie)
 {
  tc_log->commit_checkpoint_notify(cookie);
 }

--- a/sql/handler.h
+++ b/sql/handler.h
@@ -1471,7 +1471,7 @@ struct handlerton
     recovery. It uses that to reduce the work needed for any subsequent XA
     recovery process.
   */
-   void (*commit_checkpoint_request)(handlerton *hton, void *cookie);
+   void (*commit_checkpoint_request)(void *cookie);
  /*
    "Disable or enable checkpointing internal to the storage engine. This is
    used for FLUSH TABLES WITH READ LOCK AND DISABLE CHECKPOINT to ensure that
@@ -5211,7 +5211,7 @@ void trans_register_ha(THD *thd, bool all, handlerton *ht,

 const char *get_canonical_filename(handler *file, const char *path,
                                   char *tmp_path);
-void commit_checkpoint_notify_ha(handlerton *hton, void *cookie);
+void commit_checkpoint_notify_ha(void *cookie);

 inline const LEX_CSTRING *table_case_name(HA_CREATE_INFO *info, const LEX_CSTRING *name)
 {
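
The effect of dropping the hton parameter on callers can be sketched in isolation. The types below (handlerton_sketch, checkpoint_state) and the two toy engines are hypothetical reductions for illustration only, not the server's definitions; the sketch assumes, as the hunk in sql/handler.cc suggests, that engines without the callback are skipped and that an optional pre_hook runs before each request.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily reduced stand-in for the server's handlerton.
struct handlerton_sketch {
  // With this change the callback takes only the opaque cookie.
  void (*commit_checkpoint_request)(void *cookie);
};

// Hypothetical stand-in for the state passed down by the caller.
struct checkpoint_state {
  void (*pre_hook)(void *cookie);
  void *cookie;
};

static int pending = 0;  // notifications still outstanding

static void count_pending(void *) { ++pending; }

// Stand-in for commit_checkpoint_notify_ha(void *cookie).
static void notify(void *) { --pending; }

// Toy engine that makes its log durable at once and notifies immediately.
static void sync_engine_request(void *cookie) { notify(cookie); }

// Toy engine that defers the notification to a later log flush.
static void lazy_engine_request(void *) {}

// Issue the request to every engine that implements the callback.
static void request_all(const std::vector<handlerton_sketch> &engines,
                        checkpoint_state *st)
{
  for (const handlerton_sketch &hton : engines) {
    if (!hton.commit_checkpoint_request)
      continue;  // this engine does not implement the callback
    if (st->pre_hook)
      st->pre_hook(st->cookie);
    hton.commit_checkpoint_request(st->cookie);  // no hton argument needed
  }
}

static int demo()
{
  pending = 0;
  std::vector<handlerton_sketch> engines{
    {sync_engine_request}, {lazy_engine_request}, {nullptr}};
  checkpoint_state st{count_pending, nullptr};
  request_all(engines, &st);
  return pending;  // only the lazy engine still owes a notification
}
```

The cookie alone identifies the request in this sketch, which is why neither the request callback nor the notification needs to know which engine is involved.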

--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
--- a/storage/innobase/include/ha_prototypes.h
+++ b/storage/innobase/include/ha_prototypes.h
@@ -145,16 +145,6 @@ innobase_mysql_print_thd(
 	uint	max_query_len);	/*!< in: max query length to print, or 0 to
 				   use the default max length */

-/*****************************************************************//**
-Log code calls this whenever log has been written and/or flushed up
-to a new position. We use this to notify upper layer of a new commit
-checkpoint when necessary.*/
-UNIV_INTERN
-void
-innobase_mysql_log_notify(
-/*======================*/
-	ib_uint64_t	flush_lsn);	/*!< in: LSN flushed to disk */
-
 /** Converts a MySQL type to an InnoDB type. Note that this function returns
 the 'mtype' of InnoDB. InnoDB differentiates between MySQL's old <= 4.1
 VARCHAR and the new true VARCHAR in >= 5.0.3 by the 'prtype'.

--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -657,6 +657,10 @@ log_buffer_switch()
 	log_sys.buf_next_to_write = log_sys.buf_free;
 }

+/** Invoke commit_checkpoint_notify_ha() to notify that outstanding
+log writes have been completed. */
+void log_flush_notify(lsn_t flush_lsn);
+
 /**
 Writes log buffer to disk
 which is the "write" part of log_write_up_to().
@@ -759,8 +763,10 @@ static void log_write(bool rotate_key)
 		start_offset - area_start);
 	srv_stats.log_padded.add(pad_size);
 	log_sys.write_lsn = write_lsn;
-	if (log_sys.log.writes_are_durable())
+	if (log_sys.log.writes_are_durable()) {
 		log_sys.set_flushed_lsn(write_lsn);
+		log_flush_notify(write_lsn);
+	}
 	return;
 }

@@ -823,7 +829,7 @@ void log_write_up_to(lsn_t lsn, bool flush_to_disk, bool rotate_key)
  log_write_flush_to_disk_low(flush_lsn);
  flush_lock.release(flush_lsn);

-  innobase_mysql_log_notify(flush_lsn);
+  log_flush_notify(flush_lsn);
 }

 /** write to the log file up to the last log entry.

--- a/storage/rocksdb/ha_rocksdb.cc
+++ b/storage/rocksdb/ha_rocksdb.cc
@@ -4111,15 +4111,14 @@ static int rocksdb_recover(handlerton* hton, XID* xid_list, uint len)
  MariaRocks just flushes everything right away ATM
 */

-static void rocksdb_checkpoint_request(handlerton *hton,
-                                       void *cookie)
+static void rocksdb_checkpoint_request(void *cookie)
 {
  const rocksdb::Status s= rdb->SyncWAL();
  //TODO: what to do on error?
  if (s.ok())
  {
    rocksdb_wal_group_syncs++;
-    commit_checkpoint_notify_ha(hton, cookie);
+    commit_checkpoint_notify_ha(cookie);
  }
 }