MDEV-20928 mtr test galera.galera_var_innodb_disallow_writes test failure
The sporadic test hangs are caused by a mutex deadlock between InnoDB background threads and the two test connections. The test sets the variable innodb_disallow_writes, which blocks all writes to the filesystem. The original test logic executed an INSERT, which should hang because filesystem writes are blocked, and verified from another session with a SELECT that the INSERT was indeed hanging. The SELECT session would then release the innodb_disallow_writes blocking.

However, filesystem write blocking also affects InnoDB background threads, and they may hang while holding other resources locked. For example, in one observed test hang, buffer pool access was blocked. When the buffer pool is blocked, the test connections are blocked as well, and the SELECT session cannot proceed to release innodb_disallow_writes.

The fix in this commit refactors the test logic. The test now first sets innodb_disallow_writes blocking and then records a hash of the data directory's filesystem contents, which serves as a checksum of the state of the data in the data directory. Next, some SQL load is started on both nodes; these sessions block because the filesystem state is frozen. The test then sleeps briefly to allow InnoDB background threads to loop and possibly run into the innodb_disallow_writes blocking as well. After the sleep, the test records the filesystem checksum a second time and then releases the innodb_disallow_writes blocking. Finally, the two checksums are compared. They should be identical, which verifies that nothing was written to the data directory during the test execution.

The checksum is implemented by computing an md5sum hash of every file that the find command finds in the data directory; all of these per-file hashes are then hashed together with one more md5sum. The test therefore depends on md5sum and find. find may behave differently on some operating systems; FreeBSD, for example, may be problematic.
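The commit message describes the checksum only in outline and does not show the exact command. As a rough illustration, the following Python sketch computes an equivalent digest, assuming the test hashes each regular file found under the data directory and then hashes the resulting `<md5>  <path>` lines once more. The function names `file_md5` and `datadir_checksum` are hypothetical, and sorting the paths is an added assumption so that directory traversal order cannot change the result.

```python
import hashlib
import os
import tempfile


def file_md5(path: str) -> str:
    """Return the hex MD5 digest of one file, reading it in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def datadir_checksum(datadir: str) -> str:
    """Hash every regular file under datadir, then hash the per-file hash lines.

    Assumption: this mimics running md5sum on each file that
    `find <datadir> -type f` lists and hashing that output with md5sum again.
    Paths are sorted so the result does not depend on traversal order.
    """
    entries = []
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):  # skip sockets, FIFOs, dangling symlinks
                entries.append(os.path.relpath(path, datadir))
    # One "<md5>  <path>" line per file, similar to md5sum's output format.
    lines = [f"{file_md5(os.path.join(datadir, rel))}  {rel}\n" for rel in sorted(entries)]
    return hashlib.md5("".join(lines).encode("utf-8", "surrogateescape")).hexdigest()


if __name__ == "__main__":
    # Hypothetical usage mirroring the test flow: checksum, blocked load + sleep, checksum.
    with tempfile.TemporaryDirectory() as datadir:
        with open(os.path.join(datadir, "ibdata1"), "wb") as f:
            f.write(b"page data")
        before = datadir_checksum(datadir)
        # ... writes are blocked here, so nothing should change on disk ...
        after = datadir_checksum(datadir)
        print("unchanged" if before == after else "datadir was modified")
```

Comparing `before` and `after` corresponds to the final step of the test: any write that slipped past innodb_disallow_writes changes at least one per-file hash, or adds or removes a path, and therefore changes the combined hash. Metadata-only changes such as timestamps are not detected, since only file contents and paths are hashed.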