MDEV-17458 Unable to start galera node
Bootstrapping a new cluster from a backup created from a MariaDB version prior to 10.3.5 may result in error "SST position can't be set in past" when attempting to join additional nodes. The problem stems from the fact that when reading the wsrep position from InnoDB, the position is looked up in two places: the TRX_SYS page, where versions prior to 10.3.5 used to store WSREP's position; and rollback segments, this is where newer versions store the position. When starting a new cluster, the starting seqno is 0 and a new cluster UUID is generated. This is persisted in rollback segments, but the old UUID and seqno are not cleared from TRX_SYS page. Subsequently, when reading back the position, trx_rseg_read_wsrep_checkpoint() is going to return the maximum seqno found in both TRX_SYS page and rollback segments. So in the case of a newly bootstrapped cluster, it's always going to return the old cluster information. The fix consists of changing trx_rseg_read_wsrep_checkpoint() so that only rollback segments are looked up. On startup, position is read from the TRX_SYS page, and if present, it is copied to rollback segments (unless a newer position is already present in the rollback segments). Finally the position stored in TRX_SYS page is cleared.
Showing
Please register or sign in to comment