    MDEV-25180 Atomic ALTER TABLE · 7762ee5d
    Monty authored
    MDEV-25604 Atomic DDL: Binlog event written upon recovery does not
               have default database
    
    The purpose of this task is to ensure that ALTER TABLE is atomic even if
    the MariaDB server is killed at any point during the ALTER TABLE.
    This means that either the ALTER TABLE succeeds (including that triggers,
    the status tables and the binary log are updated) or things should be
    reverted to their original state.
    
    If the server crashes before the new version is fully up to date and
    committed, it will revert to the original table and remove all
    temporary files and tables.
    If the new version is committed, crash recovery will use the new version,
    and update triggers, the status tables and the binary log.
    The one exception is ALTER TABLE .. RENAME .. where no changes are made
    to the table definition. This one will work as RENAME and roll back unless
    the whole statement completed, including updating the binary log (if
    enabled).
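
The recovery rule above can be sketched as a small decision function. This is a minimal model under stated assumptions: the types AlterState, AlterKind and RecoveryAction and the function decide_recovery() are hypothetical names for illustration and are not part of the ddl log code.

```cpp
// Hypothetical model of the crash-recovery decision; not MariaDB's ddl_log API.
enum class AlterKind { Normal, RenameOnly };
enum class RecoveryAction { RevertToOriginal, UseNewVersion };

struct AlterState {
  AlterKind kind;
  bool new_version_committed;  // new table version fully up to date and committed
  bool binlog_enabled;         // binary logging is enabled
  bool binlog_written;         // the statement reached the binary log
};

RecoveryAction decide_recovery(const AlterState &s) {
  if (s.kind == AlterKind::RenameOnly) {
    // ALTER TABLE .. RENAME .. without definition changes works as RENAME:
    // roll back unless the whole statement completed, including the
    // binary log update when binary logging is enabled.
    const bool completed =
        s.new_version_committed && (!s.binlog_enabled || s.binlog_written);
    return completed ? RecoveryAction::UseNewVersion
                     : RecoveryAction::RevertToOriginal;
  }
  // Normal ALTER TABLE: once the new version is committed, recovery rolls
  // forward (and would then update triggers, status tables and binary log).
  return s.new_version_committed ? RecoveryAction::UseNewVersion
                                 : RecoveryAction::RevertToOriginal;
}
```

The only asymmetry in the sketch is the RenameOnly branch, where a committed rename is still rolled back if the binary log was enabled but not yet updated.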
    
    Other changes:
    - Added handlerton->check_version() function to allow the ddl recovery
      code to check, in case of inplace alter table, if the table in the
      storage engine is of the new or old version.
    - Added handler->table_version() so that an engine can report the current
      version of the table. This should be changed each time the table
      definition changes.
    - Added ha_signal_ddl_recovery_done() and
      handlerton::signal_ddl_recovery_done() to inform all handlers when
      ddl recovery has been done. (Needed by InnoDB).
    - Added handlerton call inplace_alter_table_committed, to signal the
      engine that the ddl_log has been closed for the alter table query.
    - Added new handlerton flag
      HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we
      should call hton->notify_tabledef_changed() during
      mysql_inplace_alter_table. This was required as MyRocks and InnoDB
      needed the call at different times.
    - Added function server_uuid_value() to be able to generate a temporary
      xid when ddl recovery writes the query to the binary log. This is
      needed to be able to handle crashes during ddl log recovery.
    - Moved freeing of the frm definition to end of mysql_alter_table() to
      remove duplicate code and have a common exit strategy.
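
To illustrate how a version check lets recovery tell an old table from a new one after an inplace alter, here is a toy engine. It is a sketch only: ToyEngine and its methods are hypothetical and do not mirror the real handler or handlerton signatures, and the way the expected version reaches the ddl log (next_version() here) is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy engine, purely illustrative: not the real handler/handlerton API.
class ToyEngine {
 public:
  // Analogue of handler->table_version(): report the current table version.
  std::uint64_t table_version(const std::string &table) const {
    auto it = versions_.find(table);
    return it == versions_.end() ? 0 : it->second;
  }

  // Assumption: the version the next committed definition change will get,
  // which the server could record in its ddl log before the commit.
  std::uint64_t next_version() const { return counter_ + 1; }

  // Every table definition change must produce a new version.
  std::uint64_t commit_definition_change(const std::string &table) {
    return versions_[table] = ++counter_;
  }

  // Analogue of handlerton->check_version(): true if the table stored in
  // the engine already has the version that was logged for the ALTER.
  bool check_version(const std::string &table,
                     std::uint64_t logged_new_version) const {
    return table_version(table) == logged_new_version;
  }

 private:
  std::map<std::string, std::uint64_t> versions_;
  std::uint64_t counter_ = 0;
};
```

If the server dies before commit_definition_change() runs, check_version() reports the old version and recovery would revert; after it runs, the logged version matches and recovery would roll forward.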
    
    -------
    InnoDB part of atomic ALTER TABLE
    (Implemented by Marko Mäkelä)
    innodb_check_version(): Compare the saved dict_table_t::def_trx_id
    to determine whether an ALTER TABLE operation was committed.
    
    We must correctly recover dict_table_t::def_trx_id for this to work.
    Before purge removes any trace of DB_TRX_ID from system tables, it
    will make an effort to load the user table into the cache, so that
    the dict_table_t::def_trx_id can be recovered.
    
    ha_innobase::table_version(): return garbage, or the trx_id that would
    be used for committing an ALTER TABLE operation.
    
    In InnoDB, table names starting with #sql-ib will remain special:
    they will be dropped on startup. This may be revisited later in
    MDEV-18518 when we implement proper undo logging and rollback
    for creating or dropping multiple tables in a transaction.
    
    Table names starting with #sql will retain some special meaning:
    dict_table_t::parse_name() will not consider such names for
    MDL acquisition, and dict_table_rename_in_cache() will treat such
    names specially when handling FOREIGN KEY constraints.
    
    Simplify InnoDB DROP INDEX.
    Prevent purge wakeup
    
    To ensure that dict_table_t::def_trx_id will be recovered correctly
    in case the server is killed before ddl_log_complete(), we will block
    the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS
    between ha_innobase::commit_inplace_alter_table(commit=true)
    (purge_sys.stop_SYS()) and purge_sys.resume_SYS().
    The completion callback purge_sys.resume_SYS() must be between
    ddl_log_complete() and MDL release.
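
The ordering constraint above can be sketched as follows. ToyPurge is a hypothetical stand-in that only records the call order; it is not the real purge_sys, and the counter used to track the paused state is an assumption of this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for purge_sys; it only records the call order.
struct ToyPurge {
  int stop_count = 0;              // assumption: pausing is modelled as a counter
  std::vector<std::string> trace;  // ordered log of the steps taken

  void stop_SYS() { ++stop_count; trace.push_back("stop_SYS"); }
  void resume_SYS() { --stop_count; trace.push_back("resume_SYS"); }

  // History of SYS_TABLES, SYS_INDEXES, SYS_COLUMNS may only be purged
  // while no committed ALTER is waiting for ddl_log_complete().
  bool may_purge_sys_tables() const { return stop_count == 0; }
};

// Order of events for a committing inplace ALTER TABLE in this model.
std::vector<std::string> run_alter_commit(ToyPurge &purge) {
  purge.stop_SYS();  // done as part of commit_inplace_alter_table(commit=true)
  purge.trace.push_back("commit_inplace_alter_table");
  purge.trace.push_back("ddl_log_complete");
  purge.resume_SYS();  // must come after ddl_log_complete() ...
  purge.trace.push_back("release_MDL");  // ... and before MDL release
  return purge.trace;
}
```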
    
    --------
    
    MyRocks support for atomic ALTER TABLE
    (Implemented by Sergei Petrunia)
    
    Implement these SE API functions:
    - ha_rocksdb::table_version()
    - hton->check_version = rocksdb_check_version

    MyRocks data dictionary now stores table version for each table.
    (Absence of table version record is interpreted as table_version=0,
    which means that no upgrade changes are needed.)

    - For inplace alter table of a partitioned table, call the underlying
      handlerton when checking if the table is ok. This assumes that the
      partition engine commits all changes at once.
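
The MyRocks rule that a missing table version record reads as table_version=0 can be sketched like this. ToyDictionary is a hypothetical in-memory stand-in and does not reflect how the MyRocks data dictionary is actually stored.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for a per-table version store.
struct ToyDictionary {
  std::unordered_map<std::string, std::uint64_t> version_records;

  // A table without a version record (for example, one created before
  // versions were stored) reports version 0, so nothing needs upgrading.
  std::uint64_t table_version(const std::string &name) const {
    auto it = version_records.find(name);
    return it == version_records.end() ? 0 : it->second;
  }
};
```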