Commit d5399185 authored by unknown

- speed optimization:

minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
by an unclean mysqld stop, but recovery now corrects the effects of
an unclean stop, so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- because the data page writes above are removed, a flush has to be put
back at the start of some statements like check, repair and
delete_all. It was in fact already necessary (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (in theory, we could lose
index pages when doing a DROP TABLE under Windows).
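
The ma_delete_all.c problem mentioned above (an obsolete cached page reaching disk after the data file was truncated) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Maria code: toy_cache, toy_flush, toy_discard and toy_delete_all are hypothetical names, and a one-slot cache over a POSIX temporary file stands in for the page cache.

```c
#define _XOPEN_SOURCE 700  /* for pwrite(), ftruncate(), fileno() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_PAGE_SIZE 8192

/* Hypothetical one-slot page cache: at most one dirty page. */
typedef struct {
  int   dirty;                /* 1 while 'page' has not reached disk */
  off_t offset;               /* file offset the page belongs to */
  char  page[TOY_PAGE_SIZE];
} toy_cache;

/* Write the cached page to disk if it is dirty. */
static int toy_flush(int fd, toy_cache *c)
{
  if (c->dirty &&
      pwrite(fd, c->page, TOY_PAGE_SIZE, c->offset) != (ssize_t) TOY_PAGE_SIZE)
    return -1;
  c->dirty = 0;
  return 0;
}

/* Throw the cached page away without writing it. */
static void toy_discard(toy_cache *c)
{
  c->dirty = 0;
}

/*
  Simulate "INSERT, DELETE all rows, UNLOCK" on an empty temporary file.
  discard_first != 0 models dropping cached pages before the truncation.
  Returns the file size after the late flush, or -1 on I/O error.
*/
static off_t toy_delete_all(int discard_first)
{
  toy_cache cache;
  struct stat st;
  off_t size = (off_t) -1;
  FILE *f = tmpfile();
  int fd;

  if (f == NULL)
    return size;
  fd = fileno(f);

  /* INSERT: the second page becomes dirty but stays in the cache. */
  memset(&cache, 0, sizeof(cache));
  cache.dirty = 1;
  cache.offset = TOY_PAGE_SIZE;

  if (discard_first)
    toy_discard(&cache);

  /* DELETE truncates the file; UNLOCK then flushes what is still dirty. */
  if (ftruncate(fd, 0) == 0 && toy_flush(fd, &cache) == 0 &&
      fstat(fd, &st) == 0)
  {
    assert(!cache.dirty);
    size = st.st_size;
  }
  fclose(f);
  return size;
}
```

Assuming tmpfile() succeeds, toy_delete_all(0) leaves a 16384-byte file although the table was just emptied, because the late write re-extends the truncated file; toy_delete_all(1) leaves the file at 0 bytes.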


storage/maria/ha_maria.cc:
  disable CACHE INDEX in Maria for now (there is a single cache for now),
  it crashes and it's not a priority
storage/maria/ma_bitmap.c:
  debug message
storage/maria/ma_check.c:
  The statement before maria_repair() may not have flushed the state,
  so maria_repair() has to do it (this function uses
  maria_open(HA_OPEN_COPY), which reads the state from disk
  and so needs to find it up-to-date there).
  For safety (but normally this is not needed) we remove index blocks
  out of the cache before repairing.
  _ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
  it now additionally flushes the data file and state and syncs files.
  As a side effect, the assertion "no WRITE_CACHE_USED" from
  _ma_flush_table_files() fired so we move all end_io_cache() done
  at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
  when closing a transactional table, we fsync it. But we need to
  do this only after writing its state.
  We need to write the state at close time only for transactional
  tables (the other tables do that at last unlock).
  Putting back the O_RDONLY||crashed condition which I had
  removed earlier.
  Unmap the file before syncing it (does not matter now as Maria
  does not use mmap)
storage/maria/ma_delete_all.c:
  need to flush data pages out of the page cache before chsize-ing the
  data file. This was needed even when we flushed data pages at the end of
  each statement, because that flush was skipped under LOCK TABLES; the
  change here thus fixes this bug:
  create table t(a int) engine=maria;lock tables t write;
  insert into t values(1);delete from t;unlock tables;check table t;
  "Size of datafile is: 16384       Should be: 8192"
  (an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
  When doing share->last_version=0, we make the MARIA_SHARE-in-memory
  invisible to future openers, so need to have an up-to-date state
  on disk for them. The same way, future openers will reopen the data
  and index file, so they will not find our cached blocks, so we
  need to flush them to disk.
  In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
  tables normally get closed, we however add a safety flush.
  In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
  Windows we additionally need to close files.
  In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
  remove dirty cached blocks from memory. On Windows we need to close
  files.
  Closing files forces us to sync them first (requirement for transactional
  tables).
  For mutex reasons (don't lock intern_lock twice), we move
  maria_lock_database() and _ma_decrement_open_count() first in the list
  of operations.
  Also flush the data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
  For transactional tables:
    - don't write data pages / state at unlock time;
    as a consequence, "share->changed=0" cannot be done.
    - don't write state in _ma_writeinfo()
    - don't maintain open_count on disk (Recovery corrects the table in case of crash
    anyway, and we gain speed by not writing open_count to disk),
  For non-transactional tables, flush the state at unlock only
  if the table was changed (optimization).
  Code which reads the state from disk is relevant only with
  external locking, so we disable it (if we ever re-enable it, it should not
  run for transactional tables, as their state on disk may be obsolete: such
  tables do not flush state at unlock anymore).
  The comment "We have to flush the write cache" is now wrong because
  maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
  we are not using external locking.
storage/maria/ma_open.c:
  _ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
  set MARIA_SHARE::changed to TRUE when we are going to apply a
  REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
  Changes introduced by this patch:
  - good: the "open" (table open, not properly closed) is gone,
  it was pointless for a recovered table
  - bad: stemming from different moments of writing the index's state
  probably (_ma_writeinfo() used to write the state after every row
  write in ma_test* programs, doesn't anymore as the table is
  transactional): some differences in indexes (not relevant as we don't
  yet have recovery for them); some differences in count of records
  (changed from a wrong value to another wrong value) (not relevant
  as we don't recover this count correctly yet anyway, though
  a patch will be pushed soon).
storage/maria/ma_test_recovery:
  for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
  function renamed
storage/maria/maria_def.h:
  Function became local to ma_open.c. Function renamed.
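
The state-writing policy described above for ma_locking.c and ma_close.c can be summarized with a small model. It is a sketch under simplifying assumptions, not the actual code: toy_share, toy_unlock, toy_close and toy_run are hypothetical names, disk writes and syncs are replaced by counters, and the periodic state writes done by Checkpoint are left out.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for MARIA_SHARE. */
typedef struct {
  int born_transactional;  /* table was created transactional */
  int read_only;           /* table opened read-only */
  int crashed;             /* table is marked as crashed */
  int changed;             /* in-memory state is newer than the disk copy */
  int state_writes;        /* how many times the state was written */
  int syncs;               /* how many times the index file was synced */
} toy_share;

/* Unlock: only non-transactional tables write state, and only if changed. */
static void toy_unlock(toy_share *s)
{
  if (!s->born_transactional && s->changed)
  {
    s->state_writes++;
    s->changed = 0;
  }
}

/*
  Close: a changed transactional table (or a writable crashed table)
  writes its state first and only then syncs the file.
  Resetting 'changed' here is a simplification of this model.
*/
static void toy_close(toy_share *s)
{
  if ((s->changed && s->born_transactional) ||
      (!s->read_only && s->crashed))
  {
    s->state_writes++;  /* state first ... */
    s->syncs++;         /* ... then sync */
    s->changed = 0;
  }
}

/* Run 'statements' modifying statements, then close; return state writes. */
static int toy_run(int transactional, int statements)
{
  toy_share s = {0};
  int i;

  s.born_transactional = transactional;
  for (i = 0; i < statements; i++)
  {
    s.changed = 1;      /* the statement modified the table */
    toy_unlock(&s);
  }
  toy_close(&s);
  assert(s.changed == 0);
  return s.state_writes;
}
```

In this model, five modifying statements cost five state writes for a non-transactional table (toy_run(0, 5)) but a single one, at close, for a transactional table (toy_run(1, 5)), which is the saving the patch is after.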
parent ac4ad9bd
......@@ -1348,6 +1348,9 @@ int ha_maria::assign_to_keycache(THD * thd, HA_CHECK_OPT *check_opt)
TABLE_LIST *table_list= table->pos_in_table_list;
DBUG_ENTER("ha_maria::assign_to_keycache");
/* for now, it is disabled */
DBUG_RETURN(HA_ADMIN_NOT_IMPLEMENTED);
table->keys_in_use_for_query.clear_all();
if (table_list->process_index_hints(table))
......
......@@ -265,6 +265,7 @@ my_bool _ma_bitmap_end(MARIA_SHARE *share)
my_bool _ma_flush_bitmap(MARIA_SHARE *share)
{
my_bool res= 0;
DBUG_ENTER("_ma_flush_bitmap");
if (share->bitmap.changed)
{
pthread_mutex_lock(&share->bitmap.bitmap_lock);
......@@ -275,7 +276,7 @@ my_bool _ma_flush_bitmap(MARIA_SHARE *share)
}
pthread_mutex_unlock(&share->bitmap.bitmap_lock);
}
return res;
DBUG_RETURN(res);
}
......
......@@ -1996,9 +1996,12 @@ int maria_repair(HA_CHECK *param, register MARIA_HA *info,
/*
The physical size of the data file is sometimes used during repair (see
sort_info.filelength further below); we need to flush to have it exact.
We flush the state because our maria_open(HA_OPEN_COPY) will want to read
it from disk. Index file will be recreated.
*/
if (_ma_flush_table_files(info, MARIA_FLUSH_DATA, FLUSH_FORCE_WRITE,
FLUSH_KEEP))
if (_ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
FLUSH_FORCE_WRITE, FLUSH_IGNORE_CHANGED) ||
_ma_state_info_write(share->kfile.file, &share->state, 1|2))
goto err;
if (!rep_quick)
......@@ -2025,13 +2028,9 @@ int maria_repair(HA_CHECK *param, register MARIA_HA *info,
share->state.header.org_data_file_type == BLOCK_RECORD))
{
MARIA_HA *new_info;
/**
@todo RECOVERY it's a bit worrying to have two MARIA_SHARE on the
same index file:
- Checkpoint will see them as two tables
- are we sure that new_info never flushes an in-progress state
to the index file? And how to prevent Checkpoint from doing that?
- in the close future maria_close() will write the state...
/*
It's ok for Recovery to have two MARIA_SHARE on the same index file
because the one below is not transactional
*/
if (!(sort_info.new_info= maria_open(info->s->open_file_name, O_RDWR,
HA_OPEN_COPY | HA_OPEN_FOR_REPAIR)))
......@@ -2264,6 +2263,11 @@ int maria_repair(HA_CHECK *param, register MARIA_HA *info,
if (scan_inited)
maria_scan_end(sort_info.info);
VOID(end_io_cache(&param->read_cache));
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
/* this below could fail, shouldn't we detect error? */
VOID(end_io_cache(&info->rec_cache));
got_error|= _ma_flush_table_files_after_repair(param, info);
if (got_error)
{
if (! param->error_printed)
......@@ -2298,10 +2302,6 @@ int maria_repair(HA_CHECK *param, register MARIA_HA *info,
my_free(sort_param.rec_buff, MYF(MY_ALLOW_ZERO_PTR));
my_free(sort_param.record,MYF(MY_ALLOW_ZERO_PTR));
my_free(sort_info.buff,MYF(MY_ALLOW_ZERO_PTR));
VOID(end_io_cache(&param->read_cache));
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
VOID(end_io_cache(&info->rec_cache));
got_error|=_ma_flush_blocks(param, share->pagecache, &share->kfile);
if (!got_error && (param->testflag & T_UNPACK))
restore_data_file_type(share);
share->state.changed|= (STATE_NOT_OPTIMIZED_KEYS | STATE_NOT_SORTED_PAGES |
......@@ -2443,18 +2443,31 @@ void maria_lock_memory(HA_CHECK *param __attribute__((unused)))
} /* maria_lock_memory */
/* Flush all changed blocks to disk */
/**
Flush all changed blocks to disk so that we can say "at the end of repair,
the table is fully ok on disk".
It is a requirement for transactional tables.
We release blocks as it's unlikely that they would all be needed soon.
@param param description of the repair operation
@param info table
*/
int _ma_flush_blocks(HA_CHECK *param, PAGECACHE *pagecache,
PAGECACHE_FILE *file)
int _ma_flush_table_files_after_repair(HA_CHECK *param, MARIA_HA *info)
{
if (flush_pagecache_blocks(pagecache, file, FLUSH_RELEASE))
MARIA_SHARE *share= info->s;
if (_ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
FLUSH_RELEASE, FLUSH_RELEASE) ||
_ma_state_info_write(share->kfile.file, &share->state, 1) ||
(share->now_transactional && !share->temporary
&& _ma_sync_table_files(info)))
{
_ma_check_print_error(param,"%d when trying to write bufferts",my_errno);
return(1);
return 1;
}
return 0;
} /* _ma_flush_blocks */
} /* _ma_flush_table_files_after_repair */
/* Sort index for more efficent reads */
......@@ -3064,8 +3077,10 @@ int maria_repair_by_sort(HA_CHECK *param, register MARIA_HA *info,
memcpy( &share->state.state, info->state, sizeof(*info->state));
err:
got_error|= _ma_flush_blocks(param, share->pagecache, &share->kfile);
VOID(end_io_cache(&info->rec_cache));
VOID(end_io_cache(&param->read_cache));
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
got_error|= _ma_flush_table_files_after_repair(param, info);
if (!got_error)
{
/* Replace the actual file with the temporary file */
......@@ -3105,8 +3120,6 @@ int maria_repair_by_sort(HA_CHECK *param, register MARIA_HA *info,
my_free((uchar*) sort_info.key_block,MYF(MY_ALLOW_ZERO_PTR));
my_free((uchar*) sort_info.ft_buf, MYF(MY_ALLOW_ZERO_PTR));
my_free(sort_info.buff,MYF(MY_ALLOW_ZERO_PTR));
VOID(end_io_cache(&param->read_cache));
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
if (!got_error && (param->testflag & T_UNPACK))
restore_data_file_type(share);
DBUG_RETURN(got_error);
......@@ -3581,13 +3594,14 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
memcpy(&share->state.state, info->state, sizeof(*info->state));
err:
got_error|= _ma_flush_blocks(param, share->pagecache, &share->kfile);
/*
Destroy the write cache. The master thread did already detach from
the share by remove_io_thread() or it was not yet started (if the
error happend before creating the thread).
*/
VOID(end_io_cache(&info->rec_cache));
VOID(end_io_cache(&param->read_cache));
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
/*
Destroy the new data cache in case of non-quick repair. All slave
threads did either detach from the share by remove_io_thread()
......@@ -3596,6 +3610,7 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
*/
if (!rep_quick)
VOID(end_io_cache(&new_data_cache));
got_error|= _ma_flush_table_files_after_repair(param, info);
if (!got_error)
{
/* Replace the actual file with the temporary file */
......@@ -3637,8 +3652,6 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
my_free((uchar*) sort_info.key_block,MYF(MY_ALLOW_ZERO_PTR));
my_free((uchar*) sort_param,MYF(MY_ALLOW_ZERO_PTR));
my_free(sort_info.buff,MYF(MY_ALLOW_ZERO_PTR));
VOID(end_io_cache(&param->read_cache));
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
if (!got_error && (param->testflag & T_UNPACK))
restore_data_file_type(share);
DBUG_RETURN(got_error);
......@@ -5587,13 +5600,13 @@ static int write_log_record_for_repair(const HA_CHECK *param, MARIA_HA *info)
translog_flush(share->state.create_rename_lsn)))
return 1;
/*
But this piece is really needed, to have the new table's content durable
and to not apply old REDOs to the new table. The table's existence was
made durable earlier (MY_SYNC_DIR passed to maria_change_to_newfile()).
The table's existence was made durable earlier (MY_SYNC_DIR passed to
maria_change_to_newfile()).
_ma_flush_table_files_after_repair() is later called by maria_repair(),
and makes sure to flush the data, index and state and sync, so
create_rename_lsn reaches disk, thus we won't apply old REDOs to the new
table.
*/
DBUG_ASSERT(info->dfile.file >= 0);
return (_ma_update_create_rename_lsn_on_disk(share, FALSE) ||
_ma_sync_table_files(info));
}
return 0;
}
......@@ -75,12 +75,10 @@ int maria_close(register MARIA_HA *info)
FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE)))
error= my_errno;
/*
File must be synced as it is going out of the maria_open_list and so
becoming unknown to Checkpoint.
*/
if (share->now_transactional && my_sync(share->kfile.file, MYF(MY_WME)))
error= my_errno;
#ifdef HAVE_MMAP
if (share->file_map)
_ma_unmap_file(info);
#endif
/*
If we are crashed, we can safely flush the current state as it will
not change the crashed state.
......@@ -88,15 +86,21 @@ int maria_close(register MARIA_HA *info)
may be using the file at this point
IF using --external-locking, which does not apply to Maria.
*/
if (share->changed)
_ma_state_info_write(share->kfile.file, &share->state, 1);
if ((share->changed && share->base.born_transactional) ||
(share->mode != O_RDONLY && maria_is_crashed(info)))
{
/*
File must be synced as it is going out of the maria_open_list and so
becoming unknown to Checkpoint. State must be written to file as
it was not done at table's unlocking.
*/
if (_ma_state_info_write(share->kfile.file, &share->state, 1) ||
my_sync(share->kfile.file, MYF(MY_WME)))
error= my_errno;
}
if (my_close(share->kfile.file, MYF(0)))
error= my_errno;
}
#ifdef HAVE_MMAP
if (share->file_map)
_ma_unmap_file(info);
#endif
#ifdef THREAD
thr_lock_delete(&share->lock);
VOID(pthread_mutex_destroy(&share->intern_lock));
......
......@@ -73,11 +73,13 @@ int maria_delete_all_rows(MARIA_HA *info)
/*
If we are using delayed keys or if the user has done changes to the tables
since it was locked then there may be key blocks in the key cache
since it was locked then there may be key blocks in the page cache. Or
there may be data blocks there. We need to throw them away or they may
re-enter the emptied table later.
*/
flush_pagecache_blocks(share->pagecache, &share->kfile,
FLUSH_IGNORE_CHANGED);
if (my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
if (_ma_flush_table_files(info, MARIA_FLUSH_DATA|MARIA_FLUSH_INDEX,
FLUSH_IGNORE_CHANGED, FLUSH_IGNORE_CHANGED) ||
my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
my_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)) )
goto err;
......
......@@ -256,60 +256,107 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
}
}
share->state.state= *info->state;
/*
That state write to disk must be done, even for transactional tables;
indeed the table's share is going to be lost (there was a
HA_EXTRA_FORCE_REOPEN before, which set share->last_version to
0), and so the only way it leaves information (share->state.key_map)
for the posterity is by writing it to disk.
*/
error=_ma_state_info_write(share->kfile.file, &share->state, (1 | 2));
}
break;
case HA_EXTRA_FORCE_REOPEN:
/*
Normally MySQL uses this case when it is going to close all open
instances of the table, thus going to flush all data/index/state.
We however do a flush here for additional safety.
*/
/** @todo consider porting these flush-es to MyISAM */
error= _ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
FLUSH_FORCE_WRITE, FLUSH_FORCE_WRITE) ||
_ma_state_info_write(share->kfile.file, &share->state, 1 | 2) ||
(share->changed= 0);
pthread_mutex_lock(&THR_LOCK_maria);
/* this makes the share not be re-used next time the table is opened */
share->last_version= 0L; /* Impossible version */
pthread_mutex_unlock(&THR_LOCK_maria);
break;
case HA_EXTRA_PREPARE_FOR_DROP:
case HA_EXTRA_PREPARE_FOR_RENAME:
{
my_bool do_flush= test(function != HA_EXTRA_PREPARE_FOR_DROP);
pthread_mutex_lock(&THR_LOCK_maria);
share->last_version= 0L; /* Impossible version */
#ifdef __WIN__
/* Close the isam and data files as Win32 can't drop an open table */
pthread_mutex_lock(&share->intern_lock);
/*
If this is Windows we remove blocks from pagecache. If not Windows we
don't do it, so these pages stay in the pagecache? So they may later be
flushed to a wrong file?
Or is it that this flush_pagecache_blocks() never finds any blocks? Then
why do we do it on Windows?
Don't we wait for all instances to be closed before dropping the table?
Do we ever do something useful here?
BUG?
FLUSH_IGNORE_CHANGED: we are also throwing away unique index blocks?
Does ENABLE KEYS rebuild them too?
This share, having last_version=0, needs to save all its data/index
blocks to disk if this is not for a DROP TABLE. Otherwise they would be
invisible to future openers; and they could even go to disk late and
cancel the work of future openers.
On Windows, which cannot delete an open file (cannot drop an open table)
we have to close the table's files.
*/
if (flush_pagecache_blocks(share->pagecache, &share->kfile,
(function == HA_EXTRA_PREPARE_FOR_DROP ?
FLUSH_IGNORE_CHANGED : FLUSH_RELEASE)))
if (info->lock_type != F_UNLCK && !info->was_locked)
{
info->was_locked= info->lock_type;
if (maria_lock_database(info, F_UNLCK))
error= my_errno;
info->lock_type= F_UNLCK;
}
if (share->kfile.file >= 0)
_ma_decrement_open_count(info);
pthread_mutex_lock(&share->intern_lock);
enum flush_type type= do_flush ? FLUSH_RELEASE : FLUSH_IGNORE_CHANGED;
if (_ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
type, type))
{
error=my_errno;
share->changed=1;
maria_print_error(info->s, HA_ERR_CRASHED);
maria_mark_crashed(info); /* Fatal error found */
}
if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED))
{
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
error=end_io_cache(&info->rec_cache);
if (end_io_cache(&info->rec_cache))
error= 1;
}
if (info->lock_type != F_UNLCK && ! info->was_locked)
if (share->kfile.file >= 0)
{
info->was_locked=info->lock_type;
if (maria_lock_database(info,F_UNLCK))
error=my_errno;
info->lock_type = F_UNLCK;
if (do_flush)
{
/*
Save the state so that others can find it from disk.
We have to sync now, as on Windows we are going to close the file
(so cannot sync later).
*/
if (_ma_state_info_write(share->kfile.file, &share->state, 1 | 2) ||
my_sync(share->kfile.file, MYF(0)))
error= my_errno;
else
share->changed= 0;
}
if (share->kfile.file >= 0)
else
{
_ma_decrement_open_count(info);
if (my_close(share->kfile,MYF(0)))
/* be sure that state is not tried for write as file may be closed */
share->changed= 0;
}
#ifdef __WIN__
if (my_close(share->kfile, MYF(0)))
error=my_errno;
share->kfile.file= -1;
#endif
}
if (share->data_file_type == BLOCK_RECORD &&
share->bitmap.file.file >= 0)
{
if (do_flush && my_sync(share->bitmap.file.file, MYF(0)))
error= my_errno;
#ifdef __WIN__
if (my_close(share->bitmap.file.file, MYF(0)))
error= my_errno;
share->bitmap.file.file= -1;
#endif
}
#ifdef __WIN__
{
LIST *list_element ;
for (list_element=maria_open_list ;
......@@ -319,24 +366,23 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
MARIA_HA *tmpinfo=(MARIA_HA*) list_element->data;
if (tmpinfo->s == info->s)
{
/**
@todo RECOVERY BUG: flush of bitmap and sync of dfile are missing
*/
if (tmpinfo->dfile.file >= 0 &&
if (share->data_file_type != BLOCK_RECORD &&
tmpinfo->dfile.file >= 0 &&
my_close(tmpinfo->dfile.file, MYF(0)))
error = my_errno;
tmpinfo->dfile.file= -1;
}
}
}
share->kfile.file= -1; /* Files aren't open anymore */
pthread_mutex_unlock(&share->intern_lock);
#endif
pthread_mutex_unlock(&share->intern_lock);
pthread_mutex_unlock(&THR_LOCK_maria);
break;
}
case HA_EXTRA_FLUSH:
if (!share->temporary)
flush_pagecache_blocks(share->pagecache, &share->kfile, FLUSH_KEEP);
error= _ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
FLUSH_KEEP, FLUSH_KEEP);
#ifdef HAVE_PWRITE
_ma_decrement_open_count(info);
#endif
......@@ -489,8 +535,8 @@ int maria_reset(MARIA_HA *info)
int _ma_sync_table_files(const MARIA_HA *info)
{
return (my_sync(info->dfile.file, MYF(0)) ||
my_sync(info->s->kfile.file, MYF(0)));
return (my_sync(info->dfile.file, MYF(MY_WME)) ||
my_sync(info->s->kfile.file, MYF(MY_WME)));
}
......@@ -527,6 +573,8 @@ int _ma_flush_table_files(MARIA_HA *info, uint flush_data_or_index,
{
if (info->opt_flag & WRITE_CACHE_USED)
{
/* normally any code which creates a WRITE_CACHE destroys it later */
DBUG_ASSERT(0);
if (end_io_cache(&info->rec_cache))
goto err;
info->opt_flag&= ~WRITE_CACHE_USED;
......
......@@ -76,14 +76,11 @@ int maria_lock_database(MARIA_HA *info, int lock_type)
/* Mark that table must be checked */
maria_mark_crashed(info);
}
if (share->data_file_type == BLOCK_RECORD &&
flush_pagecache_blocks(share->pagecache, &info->dfile, FLUSH_KEEP))
{
/* pages of transactional tables get flushed at Checkpoint */
if (!share->base.born_transactional &&
_ma_flush_table_files(info, MARIA_FLUSH_DATA,
FLUSH_KEEP, FLUSH_KEEP))
error= my_errno;
maria_print_error(info->s, HA_ERR_CRASHED);
/* Mark that table must be checked */
maria_mark_crashed(info);
}
}
if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED))
{
......@@ -116,9 +113,17 @@ int maria_lock_database(MARIA_HA *info, int lock_type)
share->state.process= share->last_process=share->this_process;
share->state.unique= info->last_unique= info->this_unique;
share->state.update_count= info->last_loop= ++info->this_loop;
/* transactional tables rather flush their state at Checkpoint */
if (!share->base.born_transactional)
{
if (_ma_state_info_write(share->kfile.file, &share->state, 1))
error=my_errno;
share->changed=0;
error= my_errno;
else
{
/* A value of 0 means below means "state flushed" */
share->changed= 0;
}
}
if (maria_flush)
{
if (_ma_sync_table_files(info))
......@@ -135,6 +140,7 @@ int maria_lock_database(MARIA_HA *info, int lock_type)
}
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
info->lock_type= F_UNLCK;
/* verify that user of the table cleaned up after itself */
DBUG_ASSERT(share->now_transactional == share->base.born_transactional);
break;
case F_RDLCK:
......@@ -151,14 +157,17 @@ int maria_lock_database(MARIA_HA *info, int lock_type)
info->lock_type=lock_type;
break;
}
#ifdef MARIA_EXTERNAL_LOCKING
if (!share->r_locks && !share->w_locks)
{
/* note that a transactional table should not do this */
if (_ma_state_info_read_dsk(share->kfile.file, &share->state))
{
error=my_errno;
break;
}
}
#endif
VOID(_ma_test_if_changed(info));
share->r_locks++;
share->tot_locks++;
......@@ -175,12 +184,29 @@ int maria_lock_database(MARIA_HA *info, int lock_type)
break;
}
}
#ifdef MARIA_EXTERNAL_LOCKING
if (!(share->options & HA_OPTION_READ_ONLY_DATA))
{
if (!share->w_locks)
{
if (!share->r_locks)
{
/*
Note that transactional tables should not do this.
If we enabled this code, we should make sure to skip it if
born_transactional is true. We should not test
now_transactional to decide if we can call
_ma_state_info_read_dsk(), because it can temporarily be 0
(TRUNCATE on a partitioned table) and thus it would make a state
modification below without mutex, confusing a concurrent
checkpoint running.
Even if this code was enabled only for non-transactional tables:
in scenario LOCK TABLE t1 WRITE; INSERT INTO t1; DELETE FROM t1;
state on disk read by DELETE is obsolete as it was not flushed
at the end of INSERT. MyISAM same. It however causes no issue as
maria_delete_all_rows() calls _ma_reset_status() thus is not
influenced by the obsolete read values.
*/
if (_ma_state_info_read_dsk(share->kfile.file, &share->state))
{
error=my_errno;
......@@ -189,6 +215,7 @@ int maria_lock_database(MARIA_HA *info, int lock_type)
}
}
}
#endif /* defined(MARIA_EXTERNAL_LOCKING) */
VOID(_ma_test_if_changed(info));
info->lock_type=lock_type;
......@@ -278,24 +305,15 @@ void _ma_update_status(void* param)
(long) info->s->state.state.key_file_length,
(long) info->s->state.state.data_file_length));
#endif
/*
we are going to modify the state without lock's log, this would break
recovery if done with a transactional table.
*/
DBUG_ASSERT(!info->s->base.born_transactional);
info->s->state.state= *info->state;
info->state= &info->s->state.state;
}
info->append_insert_at_end= 0;
/*
We have to flush the write cache here as other threads may start
reading the table before maria_lock_database() is called
*/
if (info->opt_flag & WRITE_CACHE_USED)
{
if (end_io_cache(&info->rec_cache))
{
maria_print_error(info->s, HA_ERR_CRASHED);
maria_mark_crashed(info);
}
info->opt_flag&= ~WRITE_CACHE_USED;
}
}
......@@ -355,8 +373,11 @@ my_bool _ma_check_status(void *param)
** functions to read / write the state
****************************************************************************/
int _ma_readinfo(register MARIA_HA *info, int lock_type, int check_keybuffer)
int _ma_readinfo(register MARIA_HA *info __attribute__ ((unused)),
int lock_type __attribute__ ((unused)),
int check_keybuffer __attribute__ ((unused)))
{
#ifdef MARIA_EXTERNAL_LOCKING
DBUG_ENTER("_ma_readinfo");
if (info->lock_type == F_UNLCK)
......@@ -364,6 +385,7 @@ int _ma_readinfo(register MARIA_HA *info, int lock_type, int check_keybuffer)
MARIA_SHARE *share=info->s;
if (!share->tot_locks)
{
/* should not be done for transactional tables */
if (_ma_state_info_read_dsk(share->kfile.file, &share->state))
{
int error=my_errno ? my_errno : -1;
......@@ -381,6 +403,9 @@ int _ma_readinfo(register MARIA_HA *info, int lock_type, int check_keybuffer)
DBUG_RETURN(-1); /* when have read_lock() */
}
DBUG_RETURN(0);
#else
return 0;
#endif /* defined(MARIA_EXTERNAL_LOCKING) */
} /* _ma_readinfo */
......@@ -398,8 +423,9 @@ int _ma_writeinfo(register MARIA_HA *info, uint operation)
share->tot_locks));
error=0;
if (share->tot_locks == 0)
if (share->tot_locks == 0 && !share->base.born_transactional)
{
/* transactional tables flush their state at Checkpoint */
if (operation)
{ /* Two threads can't be here */
olderror= my_errno; /* Remember last error */
......@@ -459,7 +485,7 @@ int _ma_test_if_changed(register MARIA_HA *info)
state.open_count in the .MYI file is used the following way:
- For the first change of the .MYI file in this process open_count is
incremented by maria_mark_file_change(). (We have a write lock on the file
incremented by _ma_mark_file_changed(). (We have a write lock on the file
when this happens)
- In maria_close() it's decremented by _ma_decrement_open_count() if it
was incremented in the same process.
......@@ -467,6 +493,8 @@ int _ma_test_if_changed(register MARIA_HA *info)
This mean that if we are the only process using the file, the open_count
tells us if the MARIA file wasn't properly closed. (This is true if
my_disable_locking is set).
open_count is not maintained on disk for transactional or temporary tables.
*/
......@@ -485,7 +513,12 @@ int _ma_mark_file_changed(MARIA_HA *info)
share->global_changed=1;
share->state.open_count++;
}
if (!share->temporary)
/*
temp tables don't need an open_count as they are removed on crash;
transactional tables are fixed by log-based recovery, so don't need an
open_count either (and we thus avoid the disk write below).
*/
if (!(share->temporary | share->base.born_transactional))
{
mi_int2store(buff,share->state.open_count);
buff[2]=1; /* Mark that it's changed */
......@@ -517,11 +550,14 @@ int _ma_decrement_open_count(MARIA_HA *info)
if (share->state.open_count > 0)
{
share->state.open_count--;
if (!(share->temporary | share->base.born_transactional))
{
mi_int2store(buff,share->state.open_count);
write_error= my_pwrite(share->kfile.file, buff, sizeof(buff),
sizeof(share->state.header),
MYF(MY_NABP));
}
}
if (!lock_error)
lock_error=maria_lock_database(info,old_lock);
}
......
......@@ -39,6 +39,7 @@ static void maria_scan_end_dummy(MARIA_HA *info);
static my_bool maria_once_init_dummy(MARIA_SHARE *, File);
static my_bool maria_once_end_dummy(MARIA_SHARE *);
static uchar *_ma_base_info_read(uchar *ptr, MARIA_BASE_INFO *base);
static uchar *_ma_state_info_read(uchar *ptr, MARIA_STATE_INFO *state);
#define get_next_element(to,pos,size) { memcpy((char*) to,pos,(size_t) size); \
pos+=size;}
......@@ -1049,7 +1050,7 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
}
uchar *_ma_state_info_read(uchar *ptr, MARIA_STATE_INFO *state)
static uchar *_ma_state_info_read(uchar *ptr, MARIA_STATE_INFO *state)
{
uint i,keys,key_parts;
memcpy_fixed(&state->header,ptr, sizeof(state->header));
......@@ -1103,7 +1104,9 @@ uchar *_ma_state_info_read(uchar *ptr, MARIA_STATE_INFO *state)
/**
@brief Fills the state by reading its copy on disk.
@note Does nothing in single user mode.
Should not be called for transactional tables, as their state on disk is
rarely current and so is often misleading for a reader.
Does nothing in single user mode.
@param file file to read from
@param state state which will be filled
......@@ -1114,6 +1117,8 @@ uint _ma_state_info_read_dsk(File file, MARIA_STATE_INFO *state)
{
char buff[MARIA_STATE_INFO_SIZE + MARIA_STATE_EXTRA_SIZE];
/* trick to detect transactional tables */
DBUG_ASSERT(state->create_rename_lsn == LSN_IMPOSSIBLE);
if (!maria_single_user)
{
if (my_pread(file, buff, state->state_length, 0L, MYF(MY_NABP)))
......
......@@ -1456,8 +1456,7 @@ static MARIA_HA *get_MARIA_HA_from_REDO_record(const
record's we will modify the page
*/
fprintf(tracef, ", applying record\n");
/* to flush data/index pages and state on close: */
info->s->changed= 1;
_ma_writeinfo(info, WRITEINFO_UPDATE_KEYFILE); /* to flush state on close */
return info;
}
......@@ -1476,11 +1475,10 @@ static MARIA_HA *get_MARIA_HA_from_UNDO_record(const
fprintf(tracef, ", table skipped, so skipping record\n");
return NULL;
}
_ma_writeinfo(info, WRITEINFO_UPDATE_KEYFILE); /* to flush state on close */
fprintf(tracef, ", '%s'", info->s->open_file_name);
DBUG_ASSERT(info->s->last_version != 0);
fprintf(tracef, ", applying record\n");
/* to flush data/index pages and state on close: */
info->s->changed= 1;
return info;
}
......
......@@ -96,13 +96,13 @@ echo "Testing the REDO PHASE ALONE"
# identical to the saved original.
# Does not test the index file as we don't have logging for it yet.
set -- "$maria_path/ma_test1 $silent -M -T -c" "$maria_path/ma_test2 $silent -L -K -W -P -M -T -c" "$maria_path/ma_test2 $silent -M -T -c -b"
set -- "ma_test1 $silent -M -T -c" "ma_test2 $silent -L -K -W -P -M -T -c" "ma_test2 $silent -M -T -c -b"
while [ $# != 0 ]
do
prog=$1
rm maria_log.* maria_log_control
echo "TEST WITH $prog"
$prog
$maria_path/$prog
# derive table's name from program's name
table=`echo $prog | sed -e 's;.*ma_\(test[0-9]\).*;\1;' `
$maria_path/maria_chk -dvv $table | grep -v "Creation time:"> $tmp/maria_chk_message.good.txt 2>&1
......@@ -131,7 +131,7 @@ do
for test_undo in 1 2 3
do
# first iteration tests rollback of insert, second tests rollback of delete
set -- "$maria_path/ma_test1 $silent -M -T -c -N $blobs" "--testflag=1" "--testflag=2" "$maria_path/ma_test1 $silent -M -T -c -N --debug=d:t:i:o,/tmp/ma_test1.trace $blobs" "--testflag=3" "--testflag=4"
set -- "ma_test1 $silent -M -T -c -N $blobs" "--testflag=1" "--testflag=2" "ma_test1 $silent -M -T -c -N --debug=d:t:i:o,/tmp/ma_test1.trace $blobs" "--testflag=3" "--testflag=4"
# -N (create NULL fields) is needed because --test-undo adds it anyway
while [ $# != 0 ]
do
......@@ -140,7 +140,7 @@ do
abort_run_args=$3;
rm maria_log.* maria_log_control
echo "TEST WITH $prog $commit_run_args (commit at end)"
$prog $commit_run_args
$maria_path/$prog $commit_run_args
# derive table's name from program's name
table=`echo $prog | sed -e 's;.*ma_\(test[0-9]\).*;\1;' `
$maria_path/maria_chk -dvv $table | grep -v "Creation time:"> $tmp/maria_chk_message.good.txt 2>&1
......@@ -149,7 +149,7 @@ do
rm $table.MAI
rm maria_log.* maria_log_control
echo "TEST WITH $prog $abort_run_args --test-undo=$test_undo (additional aborted work)"
$prog $abort_run_args --test-undo=$test_undo
$maria_path/$prog $abort_run_args --test-undo=$test_undo
cp $table.MAD $tmp/$table.MAD.before_undo
if [ $test_undo -lt 3 ]
then
......
......@@ -1142,7 +1142,7 @@ static int maria_chk(HA_CHECK *param, char *filename)
if ((info->s->data_file_type != STATIC_RECORD) ||
(param->testflag & (T_EXTEND | T_MEDIUM)))
error|=maria_chk_data_link(param, info, param->testflag & T_EXTEND);
error|=_ma_flush_blocks(param, share->pagecache, &share->kfile);
error|= _ma_flush_table_files_after_repair(param, info);
VOID(end_io_cache(&param->read_cache));
}
if (!error)
......@@ -1658,8 +1658,7 @@ static int maria_sort_records(HA_CHECK *param,
my_free(sort_info.buff,MYF(MY_ALLOW_ZERO_PTR));
sort_info.buff=0;
share->state.sortkey=sort_key;
DBUG_RETURN(_ma_flush_blocks(param, share->pagecache, &share->kfile) |
got_error);
DBUG_RETURN(_ma_flush_table_files_after_repair(param, info) | got_error);
} /* sort_records */
......
......@@ -865,7 +865,6 @@ extern uint _ma_nommap_pwrite(MARIA_HA *info, uchar *Buffer,
uint Count, my_off_t offset, myf MyFlags);
uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite);
uchar *_ma_state_info_read(uchar *ptr, MARIA_STATE_INFO *state);
uint _ma_state_info_read_dsk(File file, MARIA_STATE_INFO *state);
uint _ma_base_info_write(File file, MARIA_BASE_INFO *base);
int _ma_keyseg_write(File file, const HA_KEYSEG *keyseg);
......@@ -927,8 +926,7 @@ int _ma_thr_write_keys(MARIA_SORT_PARAM *sort_param);
#ifdef THREAD
pthread_handler_t _ma_thr_find_all_keys(void *arg);
#endif
int _ma_flush_blocks(HA_CHECK *param, PAGECACHE *pagecache,
PAGECACHE_FILE *file);
int _ma_flush_table_files_after_repair(HA_CHECK *param, MARIA_HA *info);
int _ma_sort_write_record(MARIA_SORT_PARAM *sort_param);
int _ma_create_index_by_sort(MARIA_SORT_PARAM *info, my_bool no_messages,
......