Commit 1a962591 authored by unknown

- WL#3239 "log CREATE TABLE in Maria"

- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn and
trn->first_undo_lsn for Checkpoint and for computing the log's low-water mark.
- assertions, comments.
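A minimal sketch of why create_rename_lsn matters to Recovery, assuming the rule is "replay a REDO only if its LSN is greater than the table's create_rename_lsn" (the commit only says that REDOs logged before a CREATE/RENAME/REPAIR must be skipped). The lsn_t type and all names below are hypothetical, not Maria's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not Maria's real types. */
typedef uint64_t lsn_t;

struct redo_rec
{
  lsn_t lsn;            /* position of the REDO in the log */
  const char *what;     /* e.g. "REDO_INSERT" */
};

/*
  Hypothetical rule: a REDO written before the table was last
  created/renamed/repaired describes a file which has since been
  replaced, so Recovery must not apply it.
*/
static bool redo_should_be_applied(lsn_t redo_lsn, lsn_t create_rename_lsn)
{
  return redo_lsn > create_rename_lsn;
}

/* Counts how many REDOs of a log segment Recovery would replay. */
static size_t count_applied_redos(const struct redo_rec *recs, size_t n,
                                  lsn_t create_rename_lsn)
{
  size_t applied= 0;
  for (size_t i= 0; i < n; i++)
    if (redo_should_be_applied(recs[i].lsn, create_rename_lsn))
      applied++;
  return applied;
}
```

With illustrative LSNs, for the log CHECKPOINT - REDO_INSERT (LSN 110) - COMMIT - REPAIR (LSN 150) - REDO_INSERT (LSN 170) - crash, REPAIR sets create_rename_lsn to 150, so only the REDO at 170 is replayed.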


storage/maria/Makefile.am:
  more files to build
storage/maria/ha_maria.cc:
  - logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
  - ha_maria::data_file_type does not have to be set in every info()
  call, just do it once in open().
  - if caller said that transactionality can be disabled (like if
  caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
  temporarily disable transactionality of the table in external_lock();
  that will ensure that no REDOs/UNDOs are logged for this possibly
  massive write operation (they are not needed, as if any write fails,
  the table will be dropped). We re-enable in external_lock(F_UNLCK),
  which in ALTER TABLE happens before the tmp table replaces the original
  one (which is good, as thus the final table will have a REDO RENAME
  and a correct create_rename_lsn).
  - when we commit we also have to write a log record, so
  trnman_commit_trn() calls become ma_commit() calls
  - at the end of the engine's initialization, we are potentially entering a
  dangerous multi-threaded world (clients are about to be accepted), so
  some mutex-ownership assertions become enforceable; to enable them we
  set maria_multi_threaded=TRUE (see ma_control_file.c)
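The save/restore of transactionality described above can be sketched as follows. The structs and functions are hypothetical simplifications of MARIA_SHARE and ha_maria; only the idea (remember base.transactional in open(), clear it in external_lock() when thd->transaction.on is FALSE, restore it at F_UNLCK) comes from this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for MARIA_SHARE and ha_maria. */
struct share_sketch
{
  bool transactional;            /* plays the role of base.transactional */
};

struct handler_sketch
{
  struct share_sketch *s;
  bool save_transactional;       /* original value, remembered at open() */
};

static void sketch_open(struct handler_sketch *h, struct share_sketch *s)
{
  h->s= s;
  h->save_transactional= s->transactional;
}

/* Called when the statement locks the table. */
static void sketch_lock(struct handler_sketch *h, bool transaction_on)
{
  if (!h->save_transactional)
    return;                      /* table is never logged anyway */
  if (!transaction_on)
    h->s->transactional= false;  /* e.g. ALTER TABLE: log no REDO/UNDO */
}

/* Called at F_UNLCK: restore before the tmp table gets renamed. */
static void sketch_unlock(struct handler_sketch *h)
{
  if (h->save_transactional)
    h->s->transactional= h->save_transactional;
}
```

Keying the early return on the saved value rather than on the share's current flag is why external_lock() and start_stmt() in the diff below now test save_transactional.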
storage/maria/ha_maria.h:
  new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
  - fixing comments according to discussion with Monty
  - if a table is transactional but temporarily non-transactional
  (like in ALTER TABLE), we need to give a sensible LSN to the pages
  (and, if we give 0, pagecache asserts).
  - translog_write_record() now takes care of storing the share's
  2-byte-id in the log record
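A sketch of the LSN choice made when unpinning pages. The function name is hypothetical; it assumes, as the comment added to _ma_unpin_all_pages() says, that create_rename_lsn is 0 for a table which is not transactional at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* hypothetical LSN type */

/*
  Hypothetical helper: LSN to stamp on pages when unpinning them.
  - logging enabled: the LSN of the UNDO just written;
  - logging disabled (non-transactional table, or transactional table
    with transactionality temporarily off as in ALTER TABLE): the
    table's create_rename_lsn, which is 0 for a non-transactional table.
*/
static lsn_t page_lsn_when_unpinning(bool logging_enabled, lsn_t undo_lsn,
                                     lsn_t create_rename_lsn)
{
  if (!logging_enabled)
    return create_rename_lsn;
  assert(undo_lsn != 0);  /* a logged change always has an UNDO */
  return undo_lsn;
}
```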
storage/maria/ma_blockrec.h:
  fixing comment according to discussion with Monty
storage/maria/ma_check.c:
  When REPAIR/OPTIMIZE modify the data/index file, if this is a
  transactional table, they must sync it; if they remove files or rename
  files, they must sync the directory, so that everything is durable.
  This is just applying to REPAIR/OPTIMIZE the logic already implemented
  in CREATE/DROP/RENAME a few months ago.
  Adding a function to write a LOGREC_REPAIR_TABLE at end of
  REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
  to update the table's create_rename_lsn.
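What "sync the file, then sync the directory" amounts to can be sketched with plain POSIX calls. This is not Maria's maria_change_to_newfile() or my_redel(); the function names are hypothetical, and need_durability stands for the diff's `share->base.transactional && !share->temporary` test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* fsync() a file or a directory given its path; returns 0 on success. */
static int sync_path(const char *path)
{
  int fd= open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  int res= fsync(fd);
  if (close(fd) != 0)
    res= -1;
  return res;
}

/*
  Hypothetical sketch: replace final_path by tmp_path. When durability is
  needed, sync the new file before the rename and the directory after it,
  so that both the new content and the new directory entry survive a crash.
*/
static int replace_file_sketch(const char *tmp_path, const char *final_path,
                               const char *dir_path, bool need_durability)
{
  if (need_durability && sync_path(tmp_path) != 0)
    return -1;
  if (rename(tmp_path, final_path) != 0)
    return -1;
  if (need_durability && sync_path(dir_path) != 0)
    return -1;
  return 0;
}
```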
storage/maria/ma_close.c:
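  fix for a future bug: the table's files must be synced before it leaves
  maria_open_list, as it then becomes unknown to Checkpoint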
storage/maria/ma_control_file.c:
  ensuring that if Maria is running in multi-threaded mode, anybody
  wanting to write to the control file and update
  last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
  see ma_control_file.c
storage/maria/ma_create.c:
  when creating a table:
  - sync it and its directory only if this is a transactional table
  and there is a log (no point in syncing in maria_chk)
  - decouple the two uses of linkname/linkname_ptr (for index file and
  for data file) into more variables, as we need to know all links
  until the moment we write the LOGREC_CREATE_TABLE.
  - set share.data_file_type early so that _ma_initialize_data_file()
  knows it (Monty's bugfix so that a table always has at least a bitmap
  page when it is created; so data-file is not 0 bytes anymore).
  - log a LOGREC_CREATE_TABLE; it contains the bytes which we have
  just written to the index file's header. Update table's
  create_rename_lsn.
  - syncing of kfile had been broken in a previous merge; corrected
  - syncing of dfile is now needed as it's not empty anymore
  - in _ma_initialize_data_file(), use share's block_size and not the
  global one. This is a gratuitous change, as both variables are equal;
  I just find it more future-proof to use the share-bound variable
  rather than the global one.
storage/maria/ma_delete_all.c:
  log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
  update create_rename_lsn then.
storage/maria/ma_delete_table.c:
  - logging LOGREC_DROP_TABLE; knowing whether this is needed requires
  knowing whether the table is transactional, which requires opening the
  table.
  - we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
  questions
storage/maria/ma_init.c:
  when maria_end() is called, the engine is no longer multi-threaded
storage/maria/ma_loghandler.c:
  - translog_inited has to be visible to ma_create() (see how it is used
  in ma_create())
  - checkpoint record will be a single record, not three
  - no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
  log a REDO_CREATE)
  - adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
  truncating the files), REPAIR.
  - MY_WAIT_IF_FULL to wait and retry if a log write hits a full disk
  - in translog_write_record(), if MARIA_SHARE does not yet have a
  2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
  store this short id into log records.
  - in translog_write_record(), if transaction has not logged its
  long trid, log LOGREC_LONG_TRANSACTION_ID.
  - For Checkpoint, we need to know the current end-of-log: adding
  translog_get_horizon().
  - For Control File, adding an assertion that the thread owns the
  log's lock (control file is protected by this lock)
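A minimal sketch of the bookkeeping translog_write_record() now does on behalf of its callers. Everything here is hypothetical (an in-memory array stands in for the log, ids are never recycled, and the FILE_ID record omits the table name); it only illustrates the "log once, then reuse the short id" idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record types, named after the commit's LOGREC_* records. */
enum rec_type { REC_FILE_ID, REC_LONG_TRANSACTION_ID, REC_REDO_INSERT };

struct rec
{
  enum rec_type type;
  uint16_t short_id;            /* table's 2-byte id, 0 if none */
  uint64_t trid;                /* long transaction id */
};

#define MINI_LOG_CAPACITY 16

struct mini_log
{
  struct rec recs[MINI_LOG_CAPACITY];
  int count;
  uint16_t last_id;             /* last 2-byte id handed out */
};

struct table_share { uint16_t id; };          /* 0: no id yet */
struct transaction { uint64_t trid; bool logged_long_id; };

static void mini_log_init(struct mini_log *log)
{
  memset(log, 0, sizeof(*log));
}

static int mini_log_append(struct mini_log *log, enum rec_type type,
                           uint16_t short_id, uint64_t trid)
{
  if (log->count >= MINI_LOG_CAPACITY)
    return 1;
  log->recs[log->count].type= type;
  log->recs[log->count].short_id= short_id;
  log->recs[log->count].trid= trid;
  log->count++;
  return 0;
}

/*
  Hypothetical analogue of translog_write_record(): before the caller's
  record, log the table's id the first time the table is seen and the long
  trid the first time the transaction logs; then store the short id in the
  record itself.
*/
static int write_record_sketch(struct mini_log *log, enum rec_type type,
                               struct transaction *trn,
                               struct table_share *share)
{
  if (share != NULL && share->id == 0)
  {
    share->id= ++log->last_id;
    if (mini_log_append(log, REC_FILE_ID, share->id, 0))
      return 1;
  }
  if (!trn->logged_long_id)
  {
    if (mini_log_append(log, REC_LONG_TRANSACTION_ID, 0, trn->trid))
      return 1;
    trn->logged_long_id= true;
  }
  return mini_log_append(log, type, share != NULL ? share->id : 0, trn->trid);
}
```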
storage/maria/ma_loghandler.h:
  Changes in log records (see ma_loghandler.c).
  New prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
  adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
  where the most significant byte is used for flags.
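A sketch of packing flags into the most significant byte of a 64-bit LSN. The macro names and the flag value are hypothetical (only TRANSACTION_LOGGED_LONG_ID is named in this commit), and it assumes LSNs fit in the remaining 56 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 56 bits = LSN, most significant byte = flags. */
typedef uint64_t lsn_with_flags_t;

#define LSN_FLAGS_SHIFT 56
#define LSN_VALUE_MASK  ((UINT64_C(1) << LSN_FLAGS_SHIFT) - 1)
#define FLAG_LOGGED_LONG_ID 0x01U   /* cf. TRANSACTION_LOGGED_LONG_ID */

static uint64_t lsn_of(lsn_with_flags_t v)
{
  return v & LSN_VALUE_MASK;
}

static unsigned flags_of(lsn_with_flags_t v)
{
  return (unsigned) (v >> LSN_FLAGS_SHIFT);
}

/* Replace the LSN part, keeping the flags. */
static lsn_with_flags_t set_lsn(lsn_with_flags_t v, uint64_t lsn)
{
  assert((lsn & ~LSN_VALUE_MASK) == 0);  /* LSN must fit in 56 bits */
  return (v & ~LSN_VALUE_MASK) | lsn;
}

/* Set flag bits, keeping the LSN. */
static lsn_with_flags_t add_flags(lsn_with_flags_t v, unsigned flags)
{
  return v | ((uint64_t) (flags & 0xFFU) << LSN_FLAGS_SHIFT);
}
```

Any code comparing such values, for instance when computing a minimum first_undo_lsn, has to compare lsn_of() results rather than the raw words.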
storage/maria/ma_open.c:
  storing the create_rename_lsn in the index file's header (in the
  state, precisely) and retrieving it from there.
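A round-trip sketch of keeping create_rename_lsn in the on-disk state. The 8-byte width, little-endian byte order and offset are assumptions, not Maria's header layout; the diff itself uses lsn_store() and my_pwrite().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LSN_BYTES 8   /* hypothetical width */

/* Store an LSN in little-endian order at state + offset. */
static void state_store_lsn(unsigned char *state, size_t offset, uint64_t lsn)
{
  for (int i= 0; i < SKETCH_LSN_BYTES; i++)
    state[offset + i]= (unsigned char) (lsn >> (8 * i));
}

/* Read back an LSN stored by state_store_lsn(), e.g. at open() time. */
static uint64_t state_load_lsn(const unsigned char *state, size_t offset)
{
  uint64_t lsn= 0;
  for (int i= SKETCH_LSN_BYTES - 1; i >= 0; i--)
    lsn= (lsn << 8) | state[offset + i];
  return lsn;
}
```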
storage/maria/ma_pagecache.c:
  - my set_if_bigger was wrong, correcting it
  - if the first_in_switch list is not empty, it means that
  changed_blocks misses some dirty pages, so Checkpoint cannot run and
  needs to wait. A variable missing_blocks_in_changed_list is added to
  tell that (should it be named missing_blocks_in_changed_blocks?)
  - pagecache_collect_changed_blocks_with_lsn() now also tells the
  minimum rec_lsn (needed for low-water mark computation).
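A sketch of what "collect dirty pages, or tell Checkpoint to wait" could look like. Names are hypothetical, and treating rec_lsn == 0 as "changed without logging" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;             /* hypothetical LSN type, 0 = none */
#define LSN_MAX UINT64_MAX

struct dirty_page { uint64_t page_no; lsn_t rec_lsn; };

/*
  Hypothetical analogue of pagecache_collect_changed_blocks_with_lsn():
  returns false (Checkpoint must wait and retry) if some dirty pages are
  temporarily missing from the changed list; otherwise stores in
  *min_rec_lsn the smallest rec_lsn of all dirty pages (LSN_MAX if none).
*/
static bool collect_min_rec_lsn(const struct dirty_page *pages, size_t n,
                                unsigned missing_blocks_in_changed_list,
                                lsn_t *min_rec_lsn)
{
  if (missing_blocks_in_changed_list > 0)
    return false;
  lsn_t min= LSN_MAX;
  for (size_t i= 0; i < n; i++)
    if (pages[i].rec_lsn != 0 && pages[i].rec_lsn < min)
      min= pages[i].rec_lsn;
  *min_rec_lsn= min;
  return true;
}
```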
storage/maria/ma_pagecache.h:
  see ma_pagecache.c
storage/maria/ma_panic.c:
  comment
storage/maria/ma_range.c:
  comment
storage/maria/ma_rename.c:
  - logging LOGREC_RENAME_TABLE; knowing whether this is needed requires
  knowing whether the table is transactional, which requires opening the
  table.
  - update create_rename_lsn
  - we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
  comment
storage/maria/ma_test_all.sh:
  - tip for Valgrind-ing ma_test_all
  - do "export maria_path=somepath" before calling ma_test_all,
  if you want to run ma_test_all out of storage/maria (useful
  for parallel runs, like one normal and one under Valgrind: they
  must not use the same tables, so they need to run in different directories)
storage/maria/maria_def.h:
  - state now contains, in memory and on disk, the create_rename_lsn
  - share now contains a 2-byte-id
storage/maria/trnman.c:
  preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
  minimum first_undo_lsn needed to know log's low-water-mark
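A sketch of combining the two minima into the log's low-water mark, in the usual ARIES style: records older than both the oldest dirty page's rec_lsn and the oldest active transaction's first UNDO are needed neither for REDO nor for UNDO. How Maria actually combines them, and the use of the checkpoint's start LSN as an upper bound, are assumptions; flags stored in first_undo_lsn are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;          /* hypothetical; 0 means "none" */
#define LSN_MAX UINT64_MAX

struct trn_sketch { uint64_t trid; lsn_t first_undo_lsn; };

/* Smallest first_undo_lsn among active transactions that logged an UNDO. */
static lsn_t min_first_undo_lsn(const struct trn_sketch *trns, size_t n)
{
  lsn_t min= LSN_MAX;
  for (size_t i= 0; i < n; i++)
    if (trns[i].first_undo_lsn != 0 && trns[i].first_undo_lsn < min)
      min= trns[i].first_undo_lsn;
  return min;
}

/* Hypothetical low-water mark: the smallest of the three LSNs. */
static lsn_t log_low_water_mark(lsn_t min_rec_lsn, lsn_t min_undo,
                                lsn_t checkpoint_start)
{
  lsn_t lwm= checkpoint_start;
  if (min_rec_lsn < lwm)
    lwm= min_rec_lsn;
  if (min_undo < lwm)
    lwm= min_undo;
  return lwm;
}
```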
storage/maria/trnman.h:
  using most significant byte of first_undo_lsn to hold miscellaneous
  flags, for now TRANSACTION_LOGGED_LONG_ID.
  dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
  dummy_transaction_object was declared in all files including
  trnman_public.h, while in fact it's a single object.
  new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
  update for new prototype
storage/maria/ma_commit.c:
  function which wraps:
  - writing a LOGREC_COMMIT record (==commit on disk)
  - calling trnman_commit_trn() (=commit in memory)
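A sketch of the ordering ma_commit() wraps: commit on disk by logging a COMMIT record, then commit in memory. All names are hypothetical stand-ins for the log writer and trnman_commit_trn(); log flushing is not modelled, and stopping after a failed log write is this sketch's choice, not something the commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_log { int commit_records; bool fail_next_write; };
struct sketch_trn { uint64_t trid; bool committed_in_memory; };

/* Stand-in for writing a LOGREC_COMMIT; returns non-zero on failure. */
static int sketch_log_commit(struct sketch_log *log,
                             const struct sketch_trn *trn)
{
  (void) trn;
  if (log->fail_next_write)
    return 1;
  log->commit_records++;
  return 0;
}

/* Stand-in for trnman_commit_trn(). */
static int sketch_commit_in_memory(struct sketch_trn *trn)
{
  trn->committed_in_memory= true;
  return 0;
}

/* Hypothetical analogue of ma_commit(): on disk first, then in memory. */
static int sketch_ma_commit(struct sketch_log *log, struct sketch_trn *trn)
{
  if (sketch_log_commit(log, trn))
    return 1;
  return sketch_commit_in_memory(trn);
}
```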
storage/maria/ma_commit.h:
  new header file
.tree-is-private:
  this file is now needed to keep our tree private (don't push it
  to public trees). When 5.1 is merged into mysql-maria, we can abandon
  our maria-specific post-commit trigger; .tree-is-private will take
  care of keeping commit mails private. Don't push this file to public
  trees.
parent fd9bd580
......@@ -54,7 +54,8 @@ noinst_HEADERS = maria_def.h ma_rt_index.h ma_rt_key.h ma_rt_mbr.h \
ma_sp_defs.h ma_fulltext.h ma_ftdefs.h ma_ft_test1.h \
ma_ft_eval.h trnman.h lockman.h tablockman.h \
ma_control_file.h ha_maria.h ma_blockrec.h \
ma_loghandler.h ma_loghandler_lsn.h ma_pagecache.h
ma_loghandler.h ma_loghandler_lsn.h ma_pagecache.h \
ma_commit.h
ma_test1_DEPENDENCIES= $(LIBRARIES)
ma_test1_LDADD= @CLIENT_EXTRA_LDFLAGS@ libmaria.a \
$(top_builddir)/storage/myisam/libmyisam.a \
......@@ -112,7 +113,8 @@ libmaria_a_SOURCES = ma_init.c ma_open.c ma_extra.c ma_info.c ma_rkey.c \
ha_maria.cc trnman.c lockman.c tablockman.c \
ma_rt_index.c ma_rt_key.c ma_rt_mbr.c ma_rt_split.c \
ma_sp_key.c ma_control_file.c ma_loghandler.c \
ma_pagecache.c ma_pagecaches.c
ma_pagecache.c ma_pagecaches.c \
ma_commit.c
CLEANFILES = test?.MA? FT?.MA? isam.log ma_test_all ma_rt_test.MA? sp_test.MA?
SUFFIXES = .sh
......
......@@ -30,6 +30,7 @@
#include "maria_def.h"
#include "ma_rt_index.h"
#include "ma_blockrec.h"
#include "ma_commit.h"
#define MARIA_CANNOT_ROLLBACK HA_NO_TRANSACTIONS
#ifdef MARIA_CANNOT_ROLLBACK
......@@ -690,7 +691,8 @@ int ha_maria::open(const char *name, int mode, uint test_if_locked)
info(HA_STATUS_NO_LOCK | HA_STATUS_VARIABLE | HA_STATUS_CONST);
if (!(test_if_locked & HA_OPEN_WAIT_IF_LOCKED))
VOID(maria_extra(file, HA_EXTRA_WAIT_LOCK, 0));
if (file->s->data_file_type != STATIC_RECORD)
save_transactional= file->s->base.transactional;
if ((data_file_type= file->s->data_file_type) != STATIC_RECORD)
int_table_flags |= HA_REC_NOT_IN_SEQ;
if (file->s->options & (HA_OPTION_CHECKSUM | HA_OPTION_COMPRESS_RECORD))
int_table_flags |= HA_HAS_CHECKSUM;
......@@ -1178,6 +1180,8 @@ int ha_maria::repair(THD *thd, HA_CHECK &param, bool do_optimize)
llstr(rows, llbuff),
llstr(file->state->records, llbuff2));
}
if (!error)
error= _ma_repair_write_log_record(&param, file);
}
else
{
......@@ -1806,7 +1810,6 @@ int ha_maria::info(uint flag)
MY_APPEND_EXT | MY_UNPACK_FILENAME);
if (strcmp(name_buff, maria_info.index_file_name))
index_file_name=maria_info.index_file_name;
data_file_type= maria_info.data_file_type;
}
if (flag & HA_STATUS_ERRKEY)
{
......@@ -1860,7 +1863,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
{
TRN *trn= THD_TRN;
DBUG_ENTER("ha_maria::external_lock");
if (!file->s->base.transactional)
if (!save_transactional)
goto skip_transaction;
if (!trn && lock_type != F_UNLCK) /* no transaction yet - open it now */
{
......@@ -1884,6 +1887,19 @@ int ha_maria::external_lock(THD *thd, int lock_type)
trans_register_ha(thd, FALSE, maria_hton);
trnman_new_statement(trn);
}
if (!thd->transaction.on)
{
/*
No need to log REDOs/UNDOs. If this is an internal temporary table
which will be renamed to a permanent table (like in ALTER TABLE),
the rename happens after unlocking so will be durable (and the table
will get its create_rename_lsn).
Note: if we wanted to enable users to have an old backup and apply
tons of archived logs to roll-forward, we could then not disable
REDOs/UNDOs in this case.
*/
file->s->base.transactional= FALSE;
}
}
else
{
......@@ -1894,7 +1910,8 @@ int ha_maria::external_lock(THD *thd, int lock_type)
{
/* autocommit ? rollback a transaction */
#ifdef MARIA_CANNOT_ROLLBACK
trnman_commit_trn(trn);
if (ma_commit(trn))
DBUG_RETURN(1);
THD_TRN= 0;
#else
if (!(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)))
......@@ -1906,6 +1923,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
#endif
}
}
file->s->base.transactional= save_transactional;
}
skip_transaction:
DBUG_RETURN(maria_lock_database(file, !table->s->tmp_table ?
......@@ -1916,7 +1934,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
int ha_maria::start_stmt(THD *thd, thr_lock_type lock_type)
{
TRN *trn= THD_TRN;
if (file->s->base.transactional)
if (save_transactional)
{
DBUG_ASSERT(trn); // this may be called only after external_lock()
DBUG_ASSERT(trnman_has_locked_tables(trn));
......@@ -2186,8 +2204,7 @@ static int maria_commit(handlerton *hton __attribute__ ((unused)),
DBUG_RETURN(0); // end of statement
DBUG_PRINT("info", ("THD_TRN set to 0x0"));
THD_TRN= 0;
DBUG_RETURN(trnman_commit_trn(trn) ?
HA_ERR_OUT_OF_MEM : 0); // end of transaction
DBUG_RETURN(ma_commit(trn)); // end of transaction
}
......@@ -2212,6 +2229,7 @@ static int maria_rollback(handlerton *hton __attribute__ ((unused)),
static int ha_maria_init(void *p)
{
int res;
maria_hton= (handlerton *)p;
maria_hton->state= SHOW_OPTION_YES;
maria_hton->db_type= DB_TYPE_MARIA;
......@@ -2223,14 +2241,16 @@ static int ha_maria_init(void *p)
maria_hton->flags= HTON_CAN_RECREATE | HTON_SUPPORT_LOG_TABLES;
bzero(maria_log_pagecache, sizeof(*maria_log_pagecache));
maria_data_root= mysql_real_data_home;
return (test(maria_init() || ma_control_file_create_or_open() ||
(init_pagecache(maria_log_pagecache,
TRANSLOG_PAGECACHE_SIZE, 0, 0,
TRANSLOG_PAGE_SIZE) == 0) ||
translog_init(maria_data_root, TRANSLOG_FILE_SIZE,
MYSQL_VERSION_ID, server_id, maria_log_pagecache,
TRANSLOG_DEFAULT_FLAGS) ||
trnman_init()));
res= maria_init() || ma_control_file_create_or_open() ||
(init_pagecache(maria_log_pagecache,
TRANSLOG_PAGECACHE_SIZE, 0, 0,
TRANSLOG_PAGE_SIZE) == 0) ||
translog_init(maria_data_root, TRANSLOG_FILE_SIZE,
MYSQL_VERSION_ID, server_id, maria_log_pagecache,
TRANSLOG_DEFAULT_FLAGS) ||
trnman_init();
maria_multi_threaded= TRUE;
return res;
}
......
......@@ -39,6 +39,11 @@ class ha_maria :public handler
char *data_file_name, *index_file_name;
enum data_file_type data_file_type;
bool can_enable_indexes;
/**
@brief for temporarily disabling table's transactionality
(if THD::transaction::on is false), remember the original value here
*/
bool save_transactional;
int repair(THD * thd, HA_CHECK &param, bool optimize);
public:
......
......@@ -171,11 +171,14 @@
started and we can then delete TRANSID and VER_PTR from the row to
gain more space.
If a row is deleted in Maria, we change TRANSID to current transid and
change VER_PTR to point to the undo record for the delete. The undo
record must contain the original TRANSID, so that another transaction
can use this to check if they should use the found row or go to the
previous row pointed to by the VER_PTR in the undo row.
If a row is deleted in Maria, we change TRANSID to the deleting
transaction's id, change VER_PTR to point to the undo record for the delete,
and add DELETE_TRANSID (the id of the transaction which last
inserted/updated the row before its deletion). DELETE_TRANSID allows an old
transaction to avoid reading the log to know if it can see the last version
before delete (in other words it reduces the probability of having to follow
VER_PTR). TODO: depending on a compilation option, evaluate the performance
impact of not storing DELETE_TRANSID (which would make the row smaller).
Description of the different parts:
......@@ -391,7 +394,12 @@ my_bool _ma_once_end_block_record(MARIA_SHARE *share)
share->temporary ? FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE))
res= 1;
if (my_close(share->bitmap.file.file, MYF(MY_WME)))
/*
File must be synced as it is going out of the maria_open_list and so
becoming unknown to Checkpoint.
*/
if (my_sync(share->bitmap.file.file, MYF(MY_WME)) ||
my_close(share->bitmap.file.file, MYF(MY_WME)))
res= 1;
/*
Trivial assignment to guard against multiple invocations
......@@ -400,6 +408,8 @@ my_bool _ma_once_end_block_record(MARIA_SHARE *share)
*/
share->bitmap.file.file= -1;
}
if (share->id != 0)
translog_deassign_id_from_share(share);
return res;
}
......@@ -573,7 +583,14 @@ void _ma_unpin_all_pages(MARIA_HA *info, LSN undo_lsn)
DBUG_ASSERT(undo_lsn != 0 || !info->s->base.transactional);
if (!info->s->base.transactional)
undo_lsn= 0; /* Avoid assert in key cache */
{
/*
If this is a transactional table but with transactionality temporarily
disabled (like in ALTER TABLE) we need to give a sensible LSN to pages
and not 0. If this is not a transactional table it will reduce to 0.
*/
undo_lsn= info->s->state.create_rename_lsn;
}
while (pinned_page-- != page_link)
pagecache_unlock_by_link(info->s->pagecache, pinned_page->link,
......@@ -1133,7 +1150,6 @@ static my_bool write_tail(MARIA_HA *info,
LSN lsn;
/* Log REDO changes of tail page */
fileid_store(log_data, info->dfile.file);
page_store(log_data+ FILEID_STORE_SIZE, block->page);
dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
row_pos.rownr);
......@@ -1143,7 +1159,8 @@ static my_bool write_tail(MARIA_HA *info,
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= length;
if (translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_TAIL,
info->trn, share, sizeof(log_data) + length,
TRANSLOG_INTERNAL_PARTS + 2, log_array))
TRANSLOG_INTERNAL_PARTS + 2, log_array,
log_data))
DBUG_RETURN(1);
}
......@@ -1388,7 +1405,6 @@ static my_bool free_full_pages(MARIA_HA *info, MARIA_ROW *row)
size_t extents_length= row->extents_count * ROW_EXTENT_SIZE;
DBUG_ENTER("free_full_pages");
fileid_store(log_data, info->dfile.file);
pagerange_store(log_data + FILEID_STORE_SIZE,
row->extents_count);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
......@@ -1397,7 +1413,8 @@ static my_bool free_full_pages(MARIA_HA *info, MARIA_ROW *row)
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= extents_length;
if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS, info->trn,
info->s, sizeof(log_data) + extents_length,
TRANSLOG_INTERNAL_PARTS + 2, log_array))
TRANSLOG_INTERNAL_PARTS + 2, log_array,
log_data))
DBUG_RETURN(1);
DBUG_RETURN (_ma_bitmap_free_full_pages(info, row->extents,
......@@ -1431,7 +1448,6 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
{
LSN lsn;
DBUG_ASSERT(info->trn->rec_lsn);
fileid_store(log_data, info->dfile.file);
pagerange_store(log_data + FILEID_STORE_SIZE, 1);
int5store(log_data + FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE,
page);
......@@ -1442,7 +1458,8 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS,
info->trn, info->s, sizeof(log_data),
TRANSLOG_INTERNAL_PARTS + 1, log_array))
TRANSLOG_INTERNAL_PARTS + 1, log_array,
log_data))
res= 1;
}
......@@ -1455,24 +1472,25 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
}
/*
Write a record to a (set of) pages
/**
@brief Write a record to a (set of) pages
SYNOPSIS
write_block_record()
info Maria handler
old_record Orignal record in case of update; NULL in case of insert
record Record we should write
row Statistics about record (calculated by calc_record_size())
map_blocks On which pages the record should be stored
row_pos Position on head page where to put head part of record
@param info Maria handler
@param old_record Original record in case of update; NULL in case of
insert
@param record Record we should write
@param row Statistics about record (calculated by
calc_record_size())
@param map_blocks On which pages the record should be stored
@param row_pos Position on head page where to put head part of
record
NOTES
On return all pinned pages are released.
@note
On return all pinned pages are released.
RETURN
0 ok
1 error
@return Operation status
@retval 0 OK
@retval 1 Error
*/
static my_bool write_block_record(MARIA_HA *info,
......@@ -1940,7 +1958,6 @@ static my_bool write_block_record(MARIA_HA *info,
size_t data_length= (size_t) (data - row_pos->data);
/* Log REDO changes of head page */
fileid_store(log_data, info->dfile.file);
page_store(log_data+ FILEID_STORE_SIZE, head_block->page);
dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
row_pos->rownr);
......@@ -1950,7 +1967,8 @@ static my_bool write_block_record(MARIA_HA *info,
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= data_length;
if (translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_HEAD, info->trn,
share, sizeof(log_data) + data_length,
TRANSLOG_INTERNAL_PARTS + 2, log_array))
TRANSLOG_INTERNAL_PARTS + 2, log_array,
log_data))
goto disk_err;
}
......@@ -2010,7 +2028,6 @@ static my_bool write_block_record(MARIA_HA *info,
NullS))
goto disk_err;
}
fileid_store(log_data, info->dfile.file);
log_pos= log_data + FILEID_STORE_SIZE;
log_array_pos= log_array+ TRANSLOG_INTERNAL_PARTS+1;
......@@ -2068,7 +2085,7 @@ static my_bool write_block_record(MARIA_HA *info,
error= translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_BLOBS,
info->trn, share, log_entry_length,
(uint) (log_array_pos - log_array),
log_array);
log_array, log_data);
if (log_array != tmp_log_array)
my_free((gptr) log_array, MYF(0));
if (error)
......@@ -2084,7 +2101,6 @@ static my_bool write_block_record(MARIA_HA *info,
/* LOGREC_UNDO_ROW_INSERT & LOGREC_UNDO_ROW_INSERT share same header */
lsn_store(log_data, info->trn->undo_lsn);
fileid_store(log_data + LSN_STORE_SIZE, info->dfile.file);
page_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE,
head_block->page);
dirpos_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE +
......@@ -2099,7 +2115,8 @@ static my_bool write_block_record(MARIA_HA *info,
/* Write UNDO log record for the INSERT */
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_INSERT,
info->trn, share, sizeof(log_data),
TRANSLOG_INTERNAL_PARTS + 1, log_array))
TRANSLOG_INTERNAL_PARTS + 1, log_array,
log_data + LSN_STORE_SIZE))
goto disk_err;
}
else
......@@ -2114,7 +2131,7 @@ static my_bool write_block_record(MARIA_HA *info,
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_UPDATE, info->trn,
share, sizeof(log_data) + row_length,
TRANSLOG_INTERNAL_PARTS + 1 + row_parts_count,
log_array))
log_array, log_data + LSN_STORE_SIZE))
goto disk_err;
}
}
......@@ -2164,6 +2181,15 @@ static my_bool write_block_record(MARIA_HA *info,
my_errno= HA_ERR_WRONG_IN_RECORD;
disk_err:
/**
@todo RECOVERY we are going to let dirty pages go to disk while we have
logged UNDO, this violates WAL. If we have not written any full pages,
all dirty pages are pinned so we could just delete them from the
pagecache. Moreover, we have written some REDOs without a closing UNDO,
it's possible that a next operation by this transaction succeeds and then
Recovery would glue the "orphan REDOs" to the succeeded operation and
execute the failed REDOs.
*/
/* Unpin all pinned pages to not cause problems for disk cache */
_ma_unpin_all_pages(info, 0);
......@@ -2229,20 +2255,18 @@ my_bool _ma_write_block_record(MARIA_HA *info __attribute__ ((unused)),
}
/*
Remove row written by _ma_write_block_record
/**
@brief Remove row written by _ma_write_block_record()
SYNOPSIS
_ma_abort_write_block_record()
info Maria handler
@param info Maria handler
INFORMATION
This is called in case we got a duplicate unique key while
writing keys.
@note
This is called in case we got a duplicate unique key while
writing keys.
RETURN
0 ok
1 error
@return Operation status
@retval 0 OK
@retval 1 Error
*/
my_bool _ma_write_abort_block_record(MARIA_HA *info)
......@@ -2288,16 +2312,19 @@ my_bool _ma_write_abort_block_record(MARIA_HA *info)
really undo a failed insert. Note that this UNDO will cause recover
to ignore the LOGREC_UNDO_ROW_INSERT that is the previous entry
in the UNDO chain.
We will soon change that: we will here execute the UNDO records
generated while we were trying to write the row; this will log some CLRs
which will replace this LOGREC_UNDO_PURGE. RECOVERY TODO BUG.
*/
/**
@todo RECOVERY BUG
We will soon change that: we will here execute the UNDO records
generated while we were trying to write the row; this will log some
CLRs which will replace this LOGREC_UNDO_PURGE.
*/
lsn_store(log_data, info->trn->undo_lsn);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_PURGE,
info->trn, info->s, sizeof(log_data),
TRANSLOG_INTERNAL_PARTS + 1, log_array))
info->trn, NULL, sizeof(log_data),
TRANSLOG_INTERNAL_PARTS + 1, log_array, NULL))
res= 1;
}
_ma_unpin_all_pages(info, info->trn->undo_lsn);
......@@ -2514,7 +2541,6 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
DBUG_ASSERT(share->pagecache->block_size == block_size);
/* Log REDO data */
fileid_store(log_data, info->dfile.file);
page_store(log_data+ FILEID_STORE_SIZE, page);
dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
record_number);
......@@ -2524,7 +2550,8 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
if (translog_write_record(&lsn, (head ? LOGREC_REDO_PURGE_ROW_HEAD :
LOGREC_REDO_PURGE_ROW_TAIL),
info->trn, share, sizeof(log_data),
TRANSLOG_INTERNAL_PARTS + 1, log_array))
TRANSLOG_INTERNAL_PARTS + 1, log_array,
log_data))
DBUG_RETURN(1);
if (pagecache_write(share->pagecache,
&info->dfile, page, 0,
......@@ -2545,7 +2572,6 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
PAGE_STORE_SIZE + PAGERANGE_STORE_SIZE];
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
fileid_store(log_data, info->dfile.file);
pagerange_store(log_data + FILEID_STORE_SIZE, 1);
page_store(log_data+ FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE, page);
pagerange_store(log_data + FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE +
......@@ -2554,7 +2580,8 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS,
info->trn, share, sizeof(log_data),
TRANSLOG_INTERNAL_PARTS + 1, log_array))
TRANSLOG_INTERNAL_PARTS + 1, log_array,
log_data))
DBUG_RETURN(1);
DBUG_ASSERT(empty_space >= info->s->bitmap.sizes[0]);
}
......@@ -2631,7 +2658,6 @@ my_bool _ma_delete_block_record(MARIA_HA *info, const byte *record)
/* Write UNDO record */
lsn_store(log_data, info->trn->undo_lsn);
fileid_store(log_data+ LSN_STORE_SIZE, info->dfile.file);
page_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE, page);
dirpos_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE +
PAGE_STORE_SIZE, record_number);
......@@ -2645,7 +2671,7 @@ my_bool _ma_delete_block_record(MARIA_HA *info, const byte *record)
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_DELETE, info->trn,
info->s, sizeof(log_data) + row_length,
TRANSLOG_INTERNAL_PARTS + 1 + row_parts_count,
info->log_row_parts))
info->log_row_parts, log_data + LSN_STORE_SIZE))
goto err;
}
......
......@@ -96,7 +96,7 @@ enum en_page_type { UNALLOCATED_PAGE, HEAD_PAGE, TAIL_PAGE, BLOB_PAGE, MAX_PAGE_
/******* defines that affects allocation (density) of data *******/
/*
If the tail part (from the main block or a blob) uses more than 75 % of
If the tail part (from the main block or a blob) would use more than 75 % of
the size of page, store the tail on a full page instead of a shared
tail page.
*/
......
......@@ -53,6 +53,7 @@
#endif
#include "ma_rt_index.h"
#include "ma_blockrec.h"
#include "trnman_public.h"
/* Functions defined in this file */
......@@ -2132,11 +2133,15 @@ int maria_repair(HA_CHECK *param, register MARIA_HA *info,
/* Replace the actual file with the temporary file */
if (new_file >= 0)
{
myf sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
my_close(new_file,MYF(0));
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT, (param->testflag & T_BACKUP_DATA ?
MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
DATA_TMP_EXT,
MYF((param->testflag & T_BACKUP_DATA ?
MY_REDEL_MAKE_BACKUP : 0) |
sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
......@@ -2328,6 +2333,8 @@ int maria_sort_index(HA_CHECK *param, register MARIA_HA *info, my_string name)
int old_lock;
MARIA_SHARE *share=info->s;
MARIA_STATE_INFO old_state;
myf sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
DBUG_ENTER("maria_sort_index");
/* cannot sort index files with R-tree indexes */
......@@ -2388,7 +2395,7 @@ int maria_sort_index(HA_CHECK *param, register MARIA_HA *info, my_string name)
share->kfile.file = -1;
VOID(my_close(new_file,MYF(MY_WME)));
if (maria_change_to_newfile(share->index_file_name, MARIA_NAME_IEXT,
INDEX_TMP_EXT, MYF(0)) ||
INDEX_TMP_EXT, sync_dir) ||
_ma_open_keyfile(share))
goto err2;
info->lock_type= F_UNLCK; /* Force maria_readinfo to lock */
......@@ -2604,6 +2611,8 @@ int maria_repair_by_sort(HA_CHECK *param, register MARIA_HA *info,
char llbuff[22];
MARIA_SORT_INFO sort_info;
ulonglong key_map=share->state.key_map;
myf sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
DBUG_ENTER("maria_repair_by_sort");
start_records=info->state->records;
......@@ -2922,8 +2931,9 @@ int maria_repair_by_sort(HA_CHECK *param, register MARIA_HA *info,
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT,
(param->testflag & T_BACKUP_DATA ?
MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
MYF((param->testflag & T_BACKUP_DATA ?
MY_REDEL_MAKE_BACKUP : 0) |
sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
......@@ -3022,6 +3032,8 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
MARIA_SORT_INFO sort_info;
ulonglong key_map=share->state.key_map;
pthread_attr_t thr_attr;
myf sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
DBUG_ENTER("maria_repair_parallel");
start_records=info->state->records;
......@@ -3445,8 +3457,9 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT,
(param->testflag & T_BACKUP_DATA ?
MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
MYF((param->testflag & T_BACKUP_DATA ?
MY_REDEL_MAKE_BACKUP : 0) |
sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
......@@ -5135,3 +5148,64 @@ static void restore_data_file_type(MARIA_SHARE *share)
share->data_file_type= share->state.header.data_file_type=
share->pack.header_length= 0;
}
/**
@brief Writes a LOGREC_REPAIR_TABLE record and updates create_rename_lsn
REPAIR/OPTIMIZE have replaced the data/index file with a new file
and so, in this scenario:
@verbatim
CHECKPOINT - REDO_INSERT - COMMIT - ... - REPAIR - ... - crash
@endverbatim
we do not want Recovery to apply the REDO_INSERT to the table, as it would
then possibly wrongly extend the table. By updating create_rename_lsn at
the end of REPAIR, we know that REDO_INSERT will be skipped.
@param param description of the REPAIR operation
@param info table
@return Operation status
@retval 0 ok
@retval 1 error (disk problem)
*/
int _ma_repair_write_log_record(const HA_CHECK *param, MARIA_HA *info)
{
MARIA_SHARE *share= info->s;
/* Only called from ha_maria.cc, not maria_check, so translog is inited */
if (share->base.transactional && !share->temporary)
{
/* For now this record is only informative */
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
uchar log_data[LSN_STORE_SIZE];
compile_time_assert(LSN_STORE_SIZE >= (FILEID_STORE_SIZE + 4));
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= FILEID_STORE_SIZE + 4;
/*
testflag gives an idea of what REPAIR did (in particular T_QUICK
or not: did it touch the data file or not?).
*/
int4store(log_data + FILEID_STORE_SIZE, param->testflag);
if (unlikely(translog_write_record(&share->state.create_rename_lsn,
LOGREC_REDO_REPAIR_TABLE,
&dummy_transaction_object, share,
log_array[TRANSLOG_INTERNAL_PARTS +
0].length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, log_data)))
return 1;
/*
This part, however, is really needed: it makes the new table's content
durable and prevents old REDOs from being applied to the new table. The
table's existence was made durable earlier (MY_SYNC_DIR passed to
maria_change_to_newfile()).
*/
lsn_store(log_data, share->state.create_rename_lsn);
DBUG_ASSERT(info->dfile.file >= 0);
DBUG_ASSERT(share->kfile.file >= 0);
return (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
_ma_sync_table_files(info));
}
return 0;
}
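The create_rename_lsn mechanism described in the function comment comes down to a single comparison. The following is a minimal, hypothetical model, not Maria's Recovery code. `lsn_t`, `redo_applies()` and `count_applied_redos()` are illustrative names, and LSNs are modelled as plain 64-bit integers. The sketch assumes Recovery skips every REDO whose LSN is not newer than the create_rename_lsn found in the table's header. The `<=` boundary is inferred from the DELETE ALL comment, which says the record carrying the stored LSN can itself be ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an LSN: a monotonically increasing log address. */
typedef uint64_t lsn_t;

/*
  Assumed rule: a REDO is applied only if it was written after the last
  CREATE/REPAIR/DELETE ALL of the table, i.e. after the LSN stored in the
  table's header as create_rename_lsn.
*/
static bool redo_applies(lsn_t redo_lsn, lsn_t create_rename_lsn)
{
  return redo_lsn > create_rename_lsn;
}

/* Counts how many of the given REDOs Recovery would apply to the table. */
static size_t count_applied_redos(const lsn_t *redo_lsns, size_t n,
                                  lsn_t create_rename_lsn)
{
  size_t applied= 0;
  for (size_t i= 0; i < n; i++)
    if (redo_applies(redo_lsns[i], create_rename_lsn))
      applied++;
  return applied;
}
```

Consider the scenario from the comment with concrete LSNs: CHECKPOINT at 10, REDO_INSERT at 20, COMMIT at 30, REPAIR at 40, and later REDOs at 50 and 60. A header holding create_rename_lsn = 40 makes Recovery skip the REDO_INSERT at 20 and apply only the REDOs at 50 and 60.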
@@ -57,14 +57,6 @@ int maria_close(register MARIA_HA *info)
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
}
flag= !--share->reopen;
/*
RECOVERY TODO:
If "flag" is TRUE, in the line below we are going to make the table
unknown to future checkpoints, so it needs to have fsync'ed itself
entirely (bitmap, pages, etc) at this point.
The flushing is currently done a few lines further (which is ok, as we
still hold THR_LOCK_maria), but syncing is missing.
*/
maria_open_list=list_delete(maria_open_list,&info->open_list);
pthread_mutex_unlock(&share->intern_lock);
@@ -82,7 +74,12 @@ int maria_close(register MARIA_HA *info)
FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE)))
error= my_errno;
/*
File must be synced as it is going out of the maria_open_list and so
becoming unknown to Checkpoint.
*/
if (my_sync(share->kfile.file, MYF(MY_WME)))
error= my_errno;
/*
If we are crashed, we can safely flush the current state as it will
not change the crashed state.
......
/* Copyright (C) 2007 MySQL AB
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
#include "maria_def.h"
#include "trnman.h"
/**
@brief writes a COMMIT record to log and commits transaction in memory
@param trn transaction
@return Operation status
@retval 0 ok
@retval 1 error (disk error or out of memory)
*/
int ma_commit(TRN *trn)
{
if (trn->undo_lsn == 0) /* no work done, rollback (cheaper than commit) */
return trnman_rollback_trn(trn);
/*
- If the COMMIT record is written before trnman_commit_trn():
if a Checkpoint happens in between, it will see trn as not committed;
then, after a crash, Recovery might roll back trn (if min(rec_lsn) is
after the COMMIT record). This is not an issue, because:
* the transaction's updates were not yet visible to other transactions;
* "commit ok" was not yet sent to the client.
Alternatively, Recovery might commit trn (if min(rec_lsn) is before the
COMMIT record), which is fine too. All in all, "trn committed" is not
exactly equivalent to "COMMIT record written".
- If the COMMIT record is written after trnman_commit_trn():
if a crash happens between the two, trn will be rolled back, which is an
issue (the transaction's updates were already visible to other
transactions).
So we must write the COMMIT record first.
*/
/**
@todo RECOVERY share's state is written to disk only in
maria_lock_database(), so the COMMIT record is not the last record of the
transaction! This is probably an issue. Recovery of the state is a
problem not yet solved.
*/
LSN commit_lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS];
/*
We do not store "thd->transaction.xid_state.xid" for now, it will be
needed only when we support XA.
*/
return
translog_write_record(&commit_lsn, LOGREC_COMMIT,
trn, NULL, 0,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL) ||
translog_flush(commit_lsn) || trnman_commit_trn(trn);
/*
Note: if trnman_commit_trn() fails above, we have already
written the COMMIT record, so Checkpoint and Recovery will see the
transaction as committed.
*/
}
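The ordering argued for in the comment above can be modelled without the real log: write the COMMIT record, force the log, and only then commit in memory. The sketch below is hypothetical. `struct trn_model`, `do_step()` and `commit_model()` stand in for TRN, translog_write_record(), translog_flush() and trnman_commit_trn(). Any step can be made to fail, which shows how the `||` chain stops at the first error.

```c
#include <stdint.h>

/* Steps of the hypothetical commit model; STEP_NONE means "no failure". */
enum step { STEP_NONE, STEP_ROLLBACK, STEP_WRITE_COMMIT, STEP_FLUSH_LOG,
            STEP_COMMIT_IN_MEMORY };

struct trn_model
{
  uint64_t undo_lsn;   /* 0: the transaction logged nothing */
  enum step fail_step; /* step that should simulate an error */
  enum step done[4];   /* steps that completed, in order */
  int ndone;
};

/* Performs one step; returns 0 on success, 1 on a simulated error. */
static int do_step(struct trn_model *t, enum step s)
{
  if (s == t->fail_step)
    return 1;
  t->done[t->ndone++]= s;
  return 0;
}

/* Same shape as ma_commit(): returns 0 on success, 1 on error. */
static int commit_model(struct trn_model *t)
{
  if (t->undo_lsn == 0)                     /* no work done: rollback */
    return do_step(t, STEP_ROLLBACK);
  return do_step(t, STEP_WRITE_COMMIT) ||   /* like translog_write_record() */
         do_step(t, STEP_FLUSH_LOG) ||      /* like translog_flush() */
         do_step(t, STEP_COMMIT_IN_MEMORY); /* like trnman_commit_trn() */
}
```

If the flush fails, the transaction is never committed in memory, so its changes never become visible to others. Conversely, a failure in the last step leaves a COMMIT record already written, which is the case noted at the end of ma_commit().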
/* Copyright (C) 2007 MySQL AB
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
C_MODE_START
int ma_commit(TRN *trn);
C_MODE_END
@@ -50,6 +50,13 @@
LSN last_checkpoint_lsn;
uint32 last_logno;
/**
@brief Whether the log's lock should be asserted when writing to the
control file. Can be re-used by any function which needs to be
thread-safe, except when it is called at startup.
*/
my_bool maria_multi_threaded= FALSE;
/*
Control file is less than 512 bytes (a disk sector),
@@ -203,6 +210,8 @@ CONTROL_FILE_ERROR ma_control_file_create_or_open()
the last_checkpoint_lsn and last_logno global variables.
Called when we have created a new log (after syncing this log's creation)
and when we have written a checkpoint (after syncing this log record).
Variables last_checkpoint_lsn and last_logno must be protected by the
caller using the log's lock, unless this function is called at startup.
SYNOPSIS
ma_control_file_write_and_force()
@@ -233,12 +242,14 @@ int ma_control_file_write_and_force(const LSN checkpoint_lsn, uint32 logno,
DBUG_ENTER("ma_control_file_write_and_force");
DBUG_ASSERT(control_file_fd >= 0); /* must be open */
#ifndef DBUG_OFF
if (maria_multi_threaded)
translog_lock_assert_owner();
#endif
memcpy(buffer + CONTROL_FILE_MAGIC_STRING_OFFSET,
CONTROL_FILE_MAGIC_STRING, CONTROL_FILE_MAGIC_STRING_SIZE);
/* TODO: you need some protection to be able to read last_* global vars */
if (objs_to_write == CONTROL_FILE_UPDATE_ONLY_LSN)
update_checkpoint_lsn= TRUE;
else if (objs_to_write == CONTROL_FILE_UPDATE_ONLY_LOGNO)
@@ -270,7 +281,6 @@ int ma_control_file_write_and_force(const LSN checkpoint_lsn, uint32 logno,
my_sync(control_file_fd, MYF(MY_WME)))
DBUG_RETURN(1);
/* TODO: you need some protection to be able to write last_* global vars */
if (update_checkpoint_lsn)
last_checkpoint_lsn= checkpoint_lsn;
if (update_logno)
......
@@ -43,6 +43,8 @@ extern LSN last_checkpoint_lsn;
*/
extern uint32 last_logno;
extern my_bool maria_multi_threaded;
typedef enum enum_control_file_error {
CONTROL_FILE_OK= 0,
CONTROL_FILE_TOO_SMALL,
......
@@ -19,6 +19,7 @@
#include "ma_sp_defs.h"
#include <my_bit.h>
#include "ma_blockrec.h"
#include "trnman_public.h"
#if defined(MSDOS) || defined(__WIN__)
#ifdef __WIN__
@@ -51,7 +52,8 @@ int maria_create(const char *name, enum data_file_type datafile_type,
unique_key_parts,fulltext_keys,offset, not_block_record_extra_length;
uint max_field_lengths, extra_header_size;
ulong reclength, real_reclength,min_pack_length;
char filename[FN_REFLEN],linkname[FN_REFLEN], *linkname_ptr;
char filename[FN_REFLEN], dlinkname[FN_REFLEN], *dlinkname_ptr= NULL,
klinkname[FN_REFLEN], *klinkname_ptr= NULL;
ulong pack_reclength;
ulonglong tot_length,max_rows, tmp;
enum en_fieldtype type;
@@ -62,11 +64,12 @@ int maria_create(const char *name, enum data_file_type datafile_type,
HA_KEYSEG *keyseg,tmp_keyseg;
MARIA_COLUMNDEF *column, *end_column;
ulong *rec_per_key_part;
my_off_t key_root[HA_MAX_POSSIBLE_KEY];
my_off_t key_root[HA_MAX_POSSIBLE_KEY], kfile_size_before_extension;
MARIA_CREATE_INFO tmp_create_info;
my_bool tmp_table= FALSE; /* cache for presence of HA_OPTION_TMP_TABLE */
my_bool forced_packed;
myf sync_dir= MY_SYNC_DIR;
myf sync_dir= 0;
uchar *log_data= NULL;
DBUG_ENTER("maria_create");
DBUG_PRINT("enter", ("keys: %u columns: %u uniques: %u flags: %u",
keys, columns, uniques, flags));
@@ -250,8 +253,9 @@ int maria_create(const char *name, enum data_file_type datafile_type,
if (flags & HA_CREATE_TMP_TABLE)
{
options|= HA_OPTION_TMP_TABLE;
tmp_table= TRUE;
create_mode|= O_EXCL | O_NOFOLLOW;
/* temp tables are not crash-safe (dropped at restart) */
/* "CREATE TEMPORARY" tables are not crash-safe (dropped at restart) */
ci->transactional= FALSE;
}
share.base.null_bytes= ci->null_bytes;
@@ -624,6 +628,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,
share.state.dellink = HA_OFFSET_ERROR;
share.state.first_bitmap_with_space= 0;
share.state.create_rename_lsn= 0;
share.state.process= (ulong) getpid();
share.state.unique= (ulong) 0;
share.state.update_count=(ulong) 0;
@@ -671,11 +676,15 @@ int maria_create(const char *name, enum data_file_type datafile_type,
#endif
/* max_data_file_length and max_key_file_length are recalculated on open */
if (options & HA_OPTION_TMP_TABLE)
{
tmp_table= TRUE;
sync_dir= 0;
if (tmp_table)
share.base.max_data_file_length= (my_off_t) ci->data_file_length;
else if (ci->transactional && translog_inited)
{
/*
we have checked translog_inited above, because maria_chk may call us
(via maria_recreate_table()) and it does not have a log.
*/
sync_dir= MY_SYNC_DIR;
}
if (datafile_type == BLOCK_RECORD)
@@ -712,9 +721,9 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME | (have_iext ? MY_REPLACE_EXT :
MY_APPEND_EXT));
}
fn_format(linkname, name, "", MARIA_NAME_IEXT,
fn_format(klinkname, name, "", MARIA_NAME_IEXT,
MY_UNPACK_FILENAME|MY_APPEND_EXT);
linkname_ptr=linkname;
klinkname_ptr= klinkname;
/*
Don't create the table if the link or file exists to ensure that one
doesn't accidentally destroy another table.
@@ -730,7 +739,6 @@ int maria_create(const char *name, enum data_file_type datafile_type,
(MY_UNPACK_FILENAME |
(flags & HA_DONT_TOUCH_DATA) ? MY_RETURN_REAL_PATH : 0) |
MY_APPEND_EXT);
linkname_ptr=0;
/*
Replace the current file.
Don't sync dir now if the data file has the same path.
@@ -753,7 +761,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,
goto err;
}
if ((file= my_create_with_symlink(linkname_ptr, filename, 0, create_mode,
if ((file= my_create_with_symlink(klinkname_ptr, filename, 0, create_mode,
MYF(MY_WME|create_flag))) < 0)
goto err;
errpos=1;
@@ -780,24 +788,24 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME |
(have_dext ? MY_REPLACE_EXT : MY_APPEND_EXT));
}
fn_format(linkname, name, "",MARIA_NAME_DEXT,
fn_format(dlinkname, name, "",MARIA_NAME_DEXT,
MY_UNPACK_FILENAME | MY_APPEND_EXT);
linkname_ptr=linkname;
dlinkname_ptr= dlinkname;
create_flag=0;
}
else
{
fn_format(filename,name,"", MARIA_NAME_DEXT,
MY_UNPACK_FILENAME | MY_APPEND_EXT);
linkname_ptr=0;
create_flag=MY_DELETE_OLD;
}
if ((dfile=
my_create_with_symlink(linkname_ptr, filename, 0, create_mode,
my_create_with_symlink(dlinkname_ptr, filename, 0, create_mode,
MYF(MY_WME | create_flag | sync_dir))) < 0)
goto err;
errpos=3;
share.data_file_type= datafile_type;
if (_ma_initialize_data_file(dfile, &share))
goto err;
}
@@ -925,14 +933,82 @@ int maria_create(const char *name, enum data_file_type datafile_type,
goto err;
}
if ((kfile_size_before_extension= my_tell(file,MYF(0))) == MY_FILEPOS_ERROR)
goto err;
#ifndef DBUG_OFF
if ((uint) my_tell(file,MYF(0)) != info_length)
if (kfile_size_before_extension != info_length)
DBUG_PRINT("warning",("info_length: %u != used_length: %u",
info_length, (uint)kfile_size_before_extension));
#endif
if (sync_dir)
{
uint pos= (uint) my_tell(file,MYF(0));
DBUG_PRINT("warning",("info_length: %d != used_length: %d",
info_length, pos));
/*
we log the first bytes and then the size to which we extend; this
avoids logging 1 KB of mostly zeroes for a small table.
*/
char empty_string[]= "";
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
uint total_rec_length= 0;
uint i;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 1 + 2 +
kfile_size_before_extension;
/* we may need up to 64 kB, so don't use the stack */
log_data= my_malloc(log_array[TRANSLOG_INTERNAL_PARTS + 0].length, MYF(0));
if ((log_data == NULL) ||
my_pread(file, 1 + 2 + log_data, kfile_size_before_extension,
0, MYF(MY_NABP)))
goto err_no_lock;
/*
remember whether the data file was created, so that in the future
Recovery can know whether it may create it
*/
log_data[0]= test(flags & HA_DONT_TOUCH_DATA);
int2store(log_data + 1, kfile_size_before_extension);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= log_data;
/* symlink description is also needed for re-creation by Recovery: */
log_array[TRANSLOG_INTERNAL_PARTS + 1].str=
dlinkname_ptr ? dlinkname : empty_string;
log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
strlen(log_array[TRANSLOG_INTERNAL_PARTS + 1].str);
log_array[TRANSLOG_INTERNAL_PARTS + 2].str=
klinkname_ptr ? klinkname : empty_string;
log_array[TRANSLOG_INTERNAL_PARTS + 2].length=
strlen(log_array[TRANSLOG_INTERNAL_PARTS + 2].str);
for (i= TRANSLOG_INTERNAL_PARTS;
i < (sizeof(log_array)/sizeof(log_array[0])); i++)
total_rec_length+= log_array[i].length;
/*
For this record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe, which it is not now (that would require
work using the ddl_log of sql/sql_table.cc); when it is, we should
reconsider the moment of writing this log record (before or after op,
under THR_LOCK_maria or not...), how to use it in Recovery, and force
the log. For now this record is just informative.
Note that TRUNCATE TABLE also comes here.
In CREATE/TRUNCATE (and in DROP, RENAME or REPAIR) external_lock() has
not been called, so we have no TRN. This does not matter, as all these
operations are non-transactional and sync their files.
*/
if (unlikely(translog_write_record(&share.state.create_rename_lsn,
LOGREC_REDO_CREATE_TABLE,
&dummy_transaction_object, NULL,
total_rec_length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL)))
goto err_no_lock;
/*
store the LSN in the file; Recovery needs it so as not to be confused if
a DROP+CREATE happened (which would make it apply REDOs to the wrong
table). If such a direct my_pwrite() to a fixed offset is too "hackish",
I can call ma_state_info_write() again, but it will be less efficient.
*/
lsn_store(log_data, share.state.create_rename_lsn);
if (my_pwrite(file, log_data, LSN_STORE_SIZE,
sizeof(share.state.header) + 2, MYF(MY_NABP)))
goto err_no_lock;
my_free(log_data, MYF(0));
log_data= NULL; /* the "err:" path frees log_data again otherwise */
}
#endif
/* Enlarge files */
DBUG_PRINT("info", ("enlarge to keystart: %lu",
@@ -940,38 +1016,25 @@ int maria_create(const char *name, enum data_file_type datafile_type,
if (my_chsize(file,(ulong) share.base.keystart,0,MYF(0)))
goto err;
if (sync_dir && my_sync(file, MYF(0)))
goto err;
if (! (flags & HA_DONT_TOUCH_DATA))
{
#ifdef USE_RELOC
if (my_chsize(dfile,share.base.min_pack_length*ci->reloc_rows,0,MYF(0)))
goto err;
if (!tmp_table && my_sync(file, MYF(0)))
goto err;
#endif
/* if !USE_RELOC, there was no write to the file, no need to sync it */
errpos=2;
if (my_close(dfile,MYF(0)))
if ((sync_dir && my_sync(dfile, MYF(0))) || my_close(dfile,MYF(0)))
goto err;
}
errpos=0;
pthread_mutex_unlock(&THR_LOCK_maria);
res= 0;
my_free((char*) rec_per_key_part,MYF(0));
errpos=0;
if (my_close(file,MYF(0)))
res= my_errno;
/*
RECOVERY TODO
Write a log record describing the CREATE operation (just the file
names, link names, and the full header's content).
For this record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe, which it is not now (that would require work
using the ddl_log of sql/sql_table.cc); when it is, we should reconsider
the moment of writing this log record (before or after op, under
THR_LOCK_maria or not...), how to use it in Recovery, and force the log.
For now this record is just informative.
If operation failed earlier, we clean up in "err:" and the MySQL layer
will clean up the frm, so we needn't write anything to the log.
*/
my_free((char*) rec_per_key_part,MYF(0));
DBUG_RETURN(res);
err:
@@ -996,6 +1059,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME | MY_APPEND_EXT),
sync_dir);
}
my_free(log_data, MYF(MY_ALLOW_ZERO_PTR));
my_free((char*) rec_per_key_part, MYF(0));
DBUG_RETURN(my_errno=save_errno); /* return the fatal errno */
}
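The LOGREC_REDO_CREATE_TABLE payload built in maria_create() is laid out in this order:

- one byte telling whether HA_DONT_TOUCH_DATA was set;
- the index-file header length, stored with int2store();
- the header bytes themselves;
- the data-file and index-file link names.

This is an illustrative model only. `store_u16_le()`, `load_u16_le()`, `create_rec_prefix()` and `create_rec_length()` are hypothetical helpers, and the little-endian layout assumes int2store() stores the low byte first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical equivalent of int2store(): low byte first. */
static void store_u16_le(uint8_t *p, uint16_t v)
{
  p[0]= (uint8_t) (v & 0xff);
  p[1]= (uint8_t) (v >> 8);
}

static uint16_t load_u16_le(const uint8_t *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

/* Writes the 3-byte prefix; the header bytes go at buf + 3. */
static size_t create_rec_prefix(uint8_t *buf, bool dont_touch_data,
                                uint16_t header_len)
{
  buf[0]= dont_touch_data ? 1 : 0;
  store_u16_le(buf + 1, header_len);
  return 3;
}

/*
  Total payload length: prefix + header + both link names. As in
  maria_create(), a missing link name (NULL) counts as an empty string.
*/
static size_t create_rec_length(uint16_t header_len, const char *dlink,
                                const char *klink)
{
  return 3 + (size_t) header_len + strlen(dlink ? dlink : "") +
         strlen(klink ? klink : "");
}
```

Because the length field is two bytes, the logged header cannot exceed 65535 bytes. This is consistent with the 64 kB mentioned next to the my_malloc() call.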
@@ -1086,9 +1150,9 @@ int _ma_initialize_data_file(File dfile, MARIA_SHARE *share)
{
if (share->data_file_type == BLOCK_RECORD)
{
if (my_chsize(dfile, maria_block_size, 0, MYF(MY_WME)))
if (my_chsize(dfile, share->base.block_size, 0, MYF(MY_WME)))
return 1;
share->state.state.data_file_length= maria_block_size;
share->state.state.data_file_length= share->base.block_size;
_ma_bitmap_delete_all(share);
}
return 0;
......
@@ -17,21 +17,38 @@
/* This clears the status information and truncates files */
#include "maria_def.h"
#include "trnman_public.h"
/**
@brief deletes all rows from a table
@param info Maria handler
@return Operation status
@retval 0 ok
@retval 1 error
*/
int maria_delete_all_rows(MARIA_HA *info)
{
uint i;
MARIA_SHARE *share=info->s;
MARIA_STATE_INFO *state=&share->state;
my_bool log_record;
DBUG_ENTER("maria_delete_all_rows");
if (share->options & HA_OPTION_READ_ONLY_DATA)
{
DBUG_RETURN(my_errno=EACCES);
}
/* LOCK TODO take X-lock on table here */
/**
@todo LOCK take X-lock on table here.
When we have versioning, if some other thread is looking at this table,
we cannot shrink the file like this.
*/
if (_ma_readinfo(info,F_WRLCK,1))
DBUG_RETURN(my_errno);
log_record= share->base.transactional && !share->temporary;
if (_ma_mark_file_changed(info))
goto err;
@@ -54,27 +71,13 @@ int maria_delete_all_rows(MARIA_HA *info)
*/
flush_pagecache_blocks(share->pagecache, &share->kfile,
FLUSH_IGNORE_CHANGED);
/*
RECOVERY TODO Log the two chsize and header modifications and force the
log. So that if crash between the two chsize, we finish the work at
Recovery. For this scenario:
"TRUNCATE TABLE t1; DROP TABLE t1; RENAME TABLE t2 to t1; crash;"
Recovery mustn't truncate the new t1, so the log records of TRUNCATE
should be applied only if t1 exists and its ZeroDirtyPagesLSN is smaller
than the records'. See more comments below.
*/
if (my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
my_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)) )
goto err;
if (_ma_initialize_data_file(info->dfile.file, info->s))
if (_ma_initialize_data_file(info->dfile.file, share))
goto err;
/*
RECOVERY TODO Consider updating ZeroDirtyPagesLSN here. It is
not a necessity (it is one only in RENAME commands) but an optional
optimization which will allow some REDO skipping at Recovery.
*/
VOID(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
#ifdef HAVE_MMAP
/* Resize mmaped area */
@@ -82,24 +85,48 @@ int maria_delete_all_rows(MARIA_HA *info)
_ma_remap_file(info, (my_off_t)0);
rw_unlock(&info->s->mmap_lock);
#endif
/*
RECOVERY TODO Until we have the TRUNCATE log record and take it into
account for log-low-water-mark calculation and use it in Recovery, we need
to sync.
*/
if (_ma_sync_table_files(info))
goto err;
if (log_record)
{
/* For now this record is only informative */
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
uchar log_data[LSN_STORE_SIZE];
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= FILEID_STORE_SIZE;
if (unlikely(translog_write_record(&share->state.create_rename_lsn,
LOGREC_REDO_DELETE_ALL,
info->trn, share, 0,
sizeof(log_array)/sizeof(log_array[0]),
log_array, log_data)))
goto err;
/*
store the LSN in the file. This is an optimization that makes Recovery
ignore all older REDOs for this table (scenario: checkpoint, INSERT1s,
DELETE ALL, INSERT2s, crash: Recovery can then skip the INSERT1s). It
also allows Recovery to ignore the present record.
Note that _ma_writeinfo() above could not store the LSN, as the table is
locked at this moment, so we must do it ourselves.
*/
lsn_store(log_data, share->state.create_rename_lsn);
if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
_ma_sync_table_files(info))
goto err;
/**
@todo RECOVERY Until we take into account the log record above
for log-low-water-mark calculation and use it in Recovery, we need
to sync above.
*/
}
allow_break(); /* Allow SIGHUP & SIGINT */
DBUG_RETURN(0);
err:
{
int save_errno=my_errno;
/* RECOVERY TODO log the header modifications */
VOID(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
info->update|=HA_STATE_WRITTEN; /* Buffer changed */
/* RECOVERY TODO until we log above we have to sync */
if (_ma_sync_table_files(info) && !save_errno)
/** @todo RECOVERY until we use the log record above we have to sync */
if (log_record && _ma_sync_table_files(info) && !save_errno)
save_errno= my_errno;
allow_break(); /* Allow SIGHUP & SIGINT */
DBUG_RETURN(my_errno=save_errno);
......
@@ -13,11 +13,18 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
/*
deletes a table
*/
#include "ma_fulltext.h"
#include "trnman_public.h"
/**
@brief drops (deletes) a table
@param name table's name
@return Operation status
@retval 0 ok
@retval 1 error
*/
int maria_delete_table(const char *name)
{
@@ -25,56 +32,78 @@ int maria_delete_table(const char *name)
#ifdef USE_RAID
uint raid_type=0,raid_chunks=0;
#endif
MARIA_HA *info;
myf sync_dir;
DBUG_ENTER("maria_delete_table");
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(name,"delete");
#endif
/* LOCK TODO take X-lock on table here */
/** @todo LOCK take X-lock on table */
/*
We need to know if this table is transactional.
When built with RAID support, we also need to determine if this table
makes use of the raid feature. If yes, we need to remove all raid
chunks. This is done with my_raid_delete(). Unfortunately it is
necessary to open the table just to check this. We use
'open_for_repair' to be able to open even a crashed table. If even
this open fails, we assume no raid configuration for this table
and try to remove the normal data file only. This may however
leave the raid chunks behind.
*/
if (!(info= maria_open(name, O_RDONLY, HA_OPEN_FOR_REPAIR)))
{
#ifdef USE_RAID
raid_type= 0;
#endif
sync_dir= 0;
}
else
{
MARIA_HA *info;
/*
When built with RAID support, we need to determine if this table
makes use of the raid feature. If yes, we need to remove all raid
chunks. This is done with my_raid_delete(). Unfortunately it is
necessary to open the table just to check this. We use
'open_for_repair' to be able to open even a crashed table. If even
this open fails, we assume no raid configuration for this table
and try to remove the normal data file only. This may however
leave the raid chunks behind.
*/
if (!(info= maria_open(name, O_RDONLY, HA_OPEN_FOR_REPAIR)))
raid_type= 0;
else
{
raid_type= info->s->base.raid_type;
raid_chunks= info->s->base.raid_chunks;
maria_close(info);
}
#ifdef USE_RAID
raid_type= info->s->base.raid_type;
raid_chunks= info->s->base.raid_chunks;
#endif
sync_dir= (info->s->base.transactional && !info->s->temporary) ?
MY_SYNC_DIR : 0;
maria_close(info);
}
#ifdef USE_RAID
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(name,"delete");
#endif
#endif /* USE_RAID */
if (sync_dir)
{
/*
For this log record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe in DDLs; when it is we should reconsider
the moment of writing this log record, how to use it in Recovery, and
force the log. For now this record is only informative.
*/
LSN lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char *)name;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= strlen(name);
if (unlikely(translog_write_record(&lsn, LOGREC_REDO_DROP_TABLE,
&dummy_transaction_object, NULL,
log_array[TRANSLOG_INTERNAL_PARTS +
0].length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL)))
DBUG_RETURN(1);
}
fn_format(from,name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
/*
RECOVERY TODO log the two deletes below.
Then do the file deletions.
For this log record to be of any use for Recovery, we need the upper MySQL
layer to be crash-safe in DDLs; when it is we should reconsider the moment
of writing this log record, how to use it in Recovery, and force the log.
For now this record is only informative.
*/
if (my_delete_with_symlink(from, MYF(MY_WME | MY_SYNC_DIR)))
if (my_delete_with_symlink(from, MYF(MY_WME | sync_dir)))
DBUG_RETURN(my_errno);
fn_format(from,name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
#ifdef USE_RAID
if (raid_type)
DBUG_RETURN(my_raid_delete(from, raid_chunks, MYF(MY_WME | MY_SYNC_DIR)) ?
DBUG_RETURN(my_raid_delete(from, raid_chunks, MYF(MY_WME | sync_dir)) ?
my_errno : 0);
#endif
DBUG_RETURN(my_delete_with_symlink(from, MYF(MY_WME | MY_SYNC_DIR)) ?
DBUG_RETURN(my_delete_with_symlink(from, MYF(MY_WME | sync_dir)) ?
my_errno : 0);
}
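maria_delete_table() now uses the same rule as the REPAIR code to decide whether directory syncs, and the DROP log record, are needed. Below is a hypothetical sketch of that decision. `SKETCH_MY_SYNC_DIR` stands in for the real MY_SYNC_DIR flag. As in the code above, a table that cannot be opened is treated as non-transactional.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MY_SYNC_DIR flag of mysys. */
#define SKETCH_MY_SYNC_DIR 1

/*
  Returns the sync flag to pass to the file-deletion calls. Only a
  transactional, non-temporary table gets directory syncs; a table that
  could not be opened is assumed to need none.
*/
static int choose_sync_dir(bool opened, bool transactional, bool temporary)
{
  if (!opened)
    return 0;
  return (transactional && !temporary) ? SKETCH_MY_SYNC_DIR : 0;
}

/* In maria_delete_table(), the DROP record is written only when syncing. */
static bool writes_drop_record(int sync_dir)
{
  return sync_dir != 0;
}
```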
@@ -21,21 +21,20 @@
static void maria_extra_keyflag(MARIA_HA *info,
enum ha_extra_function function);
/**
@brief Set options and buffers to optimize table handling
/*
Set options and buffers to optimize table handling
@param info open table
@param function operation
@param extra_arg Pointer to extra argument (normally pointer to
ulong); used when function is one of:
HA_EXTRA_WRITE_CACHE
HA_EXTRA_CACHE
SYNOPSIS
maria_extra()
info open table
function operation
extra_arg Pointer to extra argument (normally pointer to ulong)
Used when function is one of:
HA_EXTRA_WRITE_CACHE
HA_EXTRA_CACHE
RETURN VALUES
0 ok
# error
@return Operation status
@retval 0 ok
@retval !=0 error
*/
int maria_extra(MARIA_HA *info, enum ha_extra_function function,
@@ -265,14 +264,24 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
pthread_mutex_unlock(&THR_LOCK_maria);
break;
case HA_EXTRA_PREPARE_FOR_DELETE:
/* QQ: suggest renaming it to "PREPARE_FOR_DROP" */
pthread_mutex_lock(&THR_LOCK_maria);
share->last_version= 0L; /* Impossible version */
#ifdef __WIN__
/* Close the isam and data files as Win32 can't drop an open table */
pthread_mutex_lock(&share->intern_lock);
/*
On Windows we remove the blocks from the pagecache. On other platforms we
don't, so do these pages stay in the pagecache, where they may later be
flushed to the wrong file?
Or does this flush_pagecache_blocks() never find any blocks? Then why do
we do it on Windows?
Don't we wait for all instances to be closed before dropping the table?
Do we ever do something useful here?
BUG?
*/
if (flush_pagecache_blocks(share->pagecache, &share->kfile,
(function == HA_EXTRA_FORCE_REOPEN ?
FLUSH_RELEASE : FLUSH_IGNORE_CHANGED)))
FLUSH_IGNORE_CHANGED))
{
error=my_errno;
share->changed=1;
@@ -292,9 +301,11 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
info->lock_type = F_UNLCK;
}
if (share->kfile.file >= 0)
{
_ma_decrement_open_count(info);
if (share->kfile.file >= 0 && my_close(share->kfile,MYF(0)))
error=my_errno;
if (my_close(share->kfile,MYF(0)))
error=my_errno;
}
{
LIST *list_element ;
for (list_element=maria_open_list ;
@@ -304,6 +315,9 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
MARIA_HA *tmpinfo=(MARIA_HA*) list_element->data;
if (tmpinfo->s == info->s)
{
/**
@todo RECOVERY BUG: flush of bitmap and sync of dfile are missing
*/
if (tmpinfo->dfile.file >= 0 &&
my_close(tmpinfo->dfile.file, MYF(0)))
error = my_errno;
......
@@ -53,7 +53,7 @@ void maria_end(void)
{
if (maria_inited)
{
maria_inited= FALSE;
maria_inited= maria_multi_threaded= FALSE;
ft_free_stopwords();
trnman_destroy();
translog_destroy();
......
@@ -17,6 +17,14 @@
#include "ma_blockrec.h"
#include "trnman.h"
/**
@file
@brief Module which writes to and reads from a transaction log
@todo LOG: in functions where the log's lock is required, a
translog_assert_owner() could be added.
*/
/* number of opened log files in the pagecache (should be at least 2) */
#define OPENED_FILES_NUM 3
@@ -166,7 +174,7 @@ static struct st_translog_descriptor log_descriptor;
/* Marker for end of log */
static byte end_of_log= 0;
static my_bool translog_inited;
my_bool translog_inited= 0;
/* record classes */
enum record_class
@@ -218,7 +226,7 @@ struct st_log_record_type_descriptor
uint16 read_header_len;
/* HOOK for writing the record called before lock */
prewrite_rec_hook prewrite_hook;
/* HOOK for writing the record called when LSN is known */
/* HOOK for writing the record called when LSN is known, inside lock */
inwrite_rec_hook inwrite_hook;
/* HOOK for reading headers */
read_rec_hook read_hook;
@@ -230,6 +238,13 @@ struct st_log_record_type_descriptor
};
#include <my_atomic.h>
/* an array that maps the 2-byte id of a MARIA_SHARE to that MARIA_SHARE */
static MARIA_SHARE **id_to_share= NULL;
#define SHARE_ID_MAX 65535 /* array's size */
/* lock for id_to_share */
static my_atomic_rwlock_t LOCK_id_to_share;
static my_bool write_hook_for_redo(enum translog_record_type type,
TRN *trn, LSN *lsn,
struct st_translog_parts *parts);
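The id_to_share array and SHARE_ID_MAX above serve the commit's goal of giving each table a 2-byte id that log records can carry. How ids are actually assigned is not shown in this hunk. The following is a hypothetical lock-free sketch that uses C11 atomics instead of my_atomic. It assumes slot 0 is reserved to mean "no id" and that a share claims the first free slot with a compare-and-swap. `sketch_id_to_share`, `assign_share_id()`, `release_share_id()` and `share_for_id()` are illustrative names.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SHARE_ID_MAX 65535 /* array's size, as SHARE_ID_MAX above */

/* Slot i holds the share owning id i; slot 0 is reserved for "no id". */
static _Atomic(void *) sketch_id_to_share[SKETCH_SHARE_ID_MAX];

/* Claims the first free id for share; returns 0 if every id is taken. */
static uint16_t assign_share_id(void *share)
{
  for (uint32_t i= 1; i < SKETCH_SHARE_ID_MAX; i++)
  {
    void *expected= NULL;
    /* Succeeds only if the slot is still empty, even with concurrent callers */
    if (atomic_compare_exchange_strong(&sketch_id_to_share[i], &expected,
                                       share))
      return (uint16_t) i;
  }
  return 0;
}

/* Frees the id so that another share can reuse it. */
static void release_share_id(uint16_t id)
{
  atomic_store(&sketch_id_to_share[id], NULL);
}

/* Looks up the share that owns an id read from a log record. */
static void *share_for_id(uint16_t id)
{
  return atomic_load(&sketch_id_to_share[id]);
}
```

A linear scan keeps the sketch short. A real allocator would probably start searching from the last assigned id to avoid rescanning occupied slots.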
@@ -291,7 +306,9 @@ static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_HEAD=
write_hook_for_redo, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_TAIL=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, NULL, NULL, 0};
{LOGRECTYPE_VARIABLE_LENGTH, 0,
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE, NULL,
write_hook_for_redo, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_BLOB=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, write_hook_for_redo, NULL, 0};
@@ -376,15 +393,9 @@ static LOG_DESC INIT_LOGREC_COMMIT=
static LOG_DESC INIT_LOGREC_COMMIT_WITH_UNDO_PURGE=
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 5, 5, NULL, NULL, NULL, 1};
static LOG_DESC INIT_LOGREC_CHECKPOINT_PAGE=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 6, NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_CHECKPOINT_TRAN=
static LOG_DESC INIT_LOGREC_CHECKPOINT=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_CHECKPOINT_TABL=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_CREATE_TABLE=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
@@ -394,8 +405,13 @@ static LOG_DESC INIT_LOGREC_REDO_RENAME_TABLE=
static LOG_DESC INIT_LOGREC_REDO_DROP_TABLE=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_TRUNCATE_TABLE=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_DELETE_ALL=
{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE, FILEID_STORE_SIZE,
NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_REPAIR_TABLE=
{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE + 4, FILEID_STORE_SIZE + 4,
NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_FILE_ID=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 4, NULL, NULL, NULL, 0};
@@ -403,6 +419,7 @@ static LOG_DESC INIT_LOGREC_FILE_ID=
static LOG_DESC INIT_LOGREC_LONG_TRANSACTION_ID=
{LOGRECTYPE_FIXEDLENGTH, 6, 6, NULL, NULL, NULL, 0};
const myf log_write_flags= MY_WME | MY_NABP | MY_WAIT_IF_FULL;
static void loghandler_init()
{
@@ -454,20 +471,18 @@ static void loghandler_init()
INIT_LOGREC_COMMIT;
log_record_type_descriptor[LOGREC_COMMIT_WITH_UNDO_PURGE]=
INIT_LOGREC_COMMIT_WITH_UNDO_PURGE;
log_record_type_descriptor[LOGREC_CHECKPOINT_PAGE]=
INIT_LOGREC_CHECKPOINT_PAGE;
log_record_type_descriptor[LOGREC_CHECKPOINT_TRAN]=
INIT_LOGREC_CHECKPOINT_TRAN;
log_record_type_descriptor[LOGREC_CHECKPOINT_TABL]=
INIT_LOGREC_CHECKPOINT_TABL;
log_record_type_descriptor[LOGREC_CHECKPOINT]=
INIT_LOGREC_CHECKPOINT;
log_record_type_descriptor[LOGREC_REDO_CREATE_TABLE]=
INIT_LOGREC_REDO_CREATE_TABLE;
log_record_type_descriptor[LOGREC_REDO_RENAME_TABLE]=
INIT_LOGREC_REDO_RENAME_TABLE;
log_record_type_descriptor[LOGREC_REDO_DROP_TABLE]=
INIT_LOGREC_REDO_DROP_TABLE;
log_record_type_descriptor[LOGREC_REDO_TRUNCATE_TABLE]=
INIT_LOGREC_REDO_TRUNCATE_TABLE;
log_record_type_descriptor[LOGREC_REDO_DELETE_ALL]=
INIT_LOGREC_REDO_DELETE_ALL;
log_record_type_descriptor[LOGREC_REDO_REPAIR_TABLE]=
INIT_LOGREC_REDO_REPAIR_TABLE;
log_record_type_descriptor[LOGREC_FILE_ID]=
INIT_LOGREC_FILE_ID;
log_record_type_descriptor[LOGREC_LONG_TRANSACTION_ID]=
......@@ -554,6 +569,7 @@ static File open_logfile_by_number_no_cache(uint32 file_no)
DBUG_ENTER("open_logfile_by_number_no_cache");
/* TODO: add O_DIRECT to open flags (when buffer is aligned) */
/* TODO: use my_create() */
if ((file= my_open(translog_filename_by_fileno(file_no, path),
O_CREAT | O_BINARY | O_RDWR,
MYF(MY_WME))) < 0)
......@@ -615,7 +631,7 @@ static my_bool translog_write_file_header()
bzero(page, sizeof(page_buff) - (page- page_buff));
DBUG_RETURN(my_pwrite(log_descriptor.log_file_num[0], page_buff,
sizeof(page_buff), 0, MYF(MY_WME | MY_NABP)) != 0);
sizeof(page_buff), 0, log_write_flags) != 0);
}
......@@ -1222,7 +1238,7 @@ static my_bool translog_buffer_next(TRANSLOG_ADDRESS *horizon,
/*
Set max LSN send to file
Set max LSN sent to file
SYNOPSIS
translog_set_sent_to_file()
......@@ -1512,7 +1528,7 @@ static my_bool translog_buffer_flush(struct st_translog_buffer *buffer)
}
if (my_pwrite(buffer->file, (char*) buffer->buffer,
buffer->size, LSN_OFFSET(buffer->offset),
MYF(MY_WME | MY_NABP)))
log_write_flags))
{
UNRECOVERABLE_ERROR(("Can't write buffer (%lu,0x%lx) size %lu "
"to the disk (%d)",
......@@ -2230,7 +2246,16 @@ my_bool translog_init(const char *directory,
*/
log_descriptor.flushed--; /* offset decreased */
log_descriptor.sent_to_file--; /* offset decreased */
/*
Log records will refer to a MARIA_SHARE by a unique 2-byte id; set up
structures for generating 2-byte ids:
*/
my_atomic_rwlock_init(&LOCK_id_to_share);
id_to_share= (MARIA_SHARE **) my_malloc(SHARE_ID_MAX*sizeof(MARIA_SHARE*),
MYF(MY_WME|MY_ZEROFILL));
if (unlikely(!id_to_share))
DBUG_RETURN(1);
id_to_share--; /* min id is 1 */
translog_inited= 1;
DBUG_RETURN(0);
}
......@@ -2303,6 +2328,8 @@ void translog_destroy()
}
pthread_mutex_destroy(&log_descriptor.sent_to_file_lock);
my_close(log_descriptor.directory_fd, MYF(MY_WME));
my_atomic_rwlock_destroy(&LOCK_id_to_share);
my_free((gptr)(id_to_share + 1), MYF(MY_ALLOW_ZERO_PTR));
translog_inited= 0;
}
DBUG_VOID_RETURN;
......@@ -2362,6 +2389,14 @@ static inline my_bool translog_unlock()
}
#define translog_buffer_lock_assert_owner(B) \
safe_mutex_assert_owner(&B->mutex);
void translog_lock_assert_owner()
{
translog_buffer_lock_assert_owner(log_descriptor.bc.buffer);
}
/*
Start new page
......@@ -4154,26 +4189,30 @@ static my_bool translog_write_fixed_record(LSN *lsn,
}
/*
Write the log record
SYNOPSIS
translog_write_record()
lsn LSN of the record will be written here
type the log record type
trn Transaction structure pointer for hooks by
record log type, for short_id
share MARIA_SHARE of table or NULL
rec_len record length or 0 (count it)
part_no number of parts or 0 (count it)
parts_data zero ended (in case of number of parts is 0)
array of LEX_STRINGs (parts), first
TRANSLOG_INTERNAL_PARTS positions in the log
should be unused (need for loghandler)
RETURN
0 OK
1 Error
/**
@brief Writes the log record
If share has no 2-byte-id yet, gives an id to the share and logs
LOGREC_FILE_ID. If transaction has not logged LOGREC_LONG_TRANSACTION_ID
yet, logs it.
@param lsn where the LSN of the written record is stored
@param type the log record type
@param trn Transaction structure pointer, passed to the hooks of
the record type and used for short_id
@param share MARIA_SHARE of table or NULL
@param rec_len record length or 0 (count it)
@param part_no number of parts or 0 (count it)
@param parts_data array of LEX_STRINGs (parts), zero-terminated if
part_no is 0; the first TRANSLOG_INTERNAL_PARTS
positions must be left unused (they are needed by
the loghandler)
@param store_share_id if share != NULL, the share's id is automatically
stored in the first two bytes this points to (so the
pointer must not be NULL in that case)
@return Operation status
@retval 0 OK
@retval 1 Error
*/
my_bool translog_write_record(LSN *lsn,
......@@ -4181,7 +4220,8 @@ my_bool translog_write_record(LSN *lsn,
TRN *trn, struct st_maria_share *share,
translog_size_t rec_len,
uint part_no,
LEX_STRING *parts_data)
LEX_STRING *parts_data,
uchar *store_share_id)
{
struct st_translog_parts parts;
LEX_STRING *part;
......@@ -4191,10 +4231,41 @@ my_bool translog_write_record(LSN *lsn,
DBUG_PRINT("enter", ("type: %u ShortTrID: %u",
(uint) type, (uint)short_trid));
if (share && !share->base.transactional)
if (share)
{
DBUG_PRINT("info", ("It is not transactional table"));
DBUG_RETURN(0);
if (!share->base.transactional)
{
DBUG_PRINT("info", ("It is not a transactional table"));
DBUG_RETURN(0);
}
if (unlikely(share->id == 0))
{
/*
First log write for this MARIA_SHARE; give it a short id.
When the lock manager is enabled and needs a short id, it should be
assigned in the lock manager (because row locks will be taken before
log records are written; for example SELECT FOR UPDATE takes locks but
writes no log record).
*/
if (unlikely(translog_assign_id_to_share(share, trn)))
DBUG_RETURN(1);
}
fileid_store(store_share_id, share->id);
}
if (unlikely(!(trn->first_undo_lsn & TRANSACTION_LOGGED_LONG_ID)))
{
LSN lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
uchar log_data[6];
int6store(log_data, trn->trid);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
trn->first_undo_lsn|= TRANSACTION_LOGGED_LONG_ID; /* no recursion */
if (unlikely(translog_write_record(&lsn, LOGREC_LONG_TRANSACTION_ID,
trn, NULL, sizeof(log_data),
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL)))
DBUG_RETURN(1);
}
parts.parts= parts_data;
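The once-per-transaction logging of LOGREC_LONG_TRANSACTION_ID above hinges on a flag bit kept in trn->first_undo_lsn, which is set before the nested translog_write_record() call so that the call cannot recurse. The sketch below models only that control flow in plain C. The flag value, the structs and the sketch_* functions are hypothetical stand-ins (the real TRANSACTION_LOGGED_LONG_ID value is not shown in this diff), and the 6-byte store assumes the least-significant-byte-first layout of MySQL's int6store().

```c
#include <stdint.h>

/* Hypothetical flag bit in the top ("flags") byte of first_undo_lsn; the
   real TRANSACTION_LOGGED_LONG_ID value is not shown in this diff. */
#define SKETCH_LOGGED_LONG_ID UINT64_C(0x8000000000000000)

typedef struct
{
  uint64_t trid;            /* long transaction id, logged in 6 bytes */
  uint64_t first_undo_lsn;  /* LSN in the low 7 bytes, flags on top */
} sketch_txn;

/* Records what would have been written, so the behaviour can be checked. */
typedef struct
{
  int long_id_records;            /* LOGREC_LONG_TRANSACTION_ID count */
  int other_records;              /* all other records */
  unsigned char last_long_id[6];  /* payload of the last long-id record */
} sketch_log;

/* Stores the low 6 bytes of v, least significant byte first. */
void sketch_int6store(unsigned char *dst, uint64_t v)
{
  for (int i= 0; i < 6; i++)
    dst[i]= (unsigned char) (v >> (8 * i));
}

/* Models the prologue of translog_write_record(): log the long id once. */
void sketch_write_record(sketch_log *lg, sketch_txn *trn)
{
  if (!(trn->first_undo_lsn & SKETCH_LOGGED_LONG_ID))
  {
    /* Set the flag first, so a nested write cannot recurse again. */
    trn->first_undo_lsn|= SKETCH_LOGGED_LONG_ID;
    sketch_int6store(lg->last_long_id, trn->trid);
    lg->long_id_records++;
  }
  lg->other_records++;   /* the record the caller actually asked for */
}
```

Calling sketch_write_record() twice for the same transaction emits the long id only once, which is the guarantee the flag bit is there to provide.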
......@@ -4375,20 +4446,19 @@ void translog_free_record_header(TRANSLOG_HEADER_BUFFER *buff)
}
/*
Set current horizon in the scanner data structure
/**
@brief Returns the current horizon at the end of the current log
SYNOPSIS
translog_scanner_set_horizon()
scanner Information about current chunk during scanning
@return Horizon
*/
static void translog_scanner_set_horizon(struct st_translog_scanner_data
*scanner)
TRANSLOG_ADDRESS translog_get_horizon()
{
TRANSLOG_ADDRESS res;
translog_lock();
scanner->horizon= log_descriptor.horizon;
res= log_descriptor.horizon;
translog_unlock();
return res;
}
......@@ -4446,7 +4516,7 @@ my_bool translog_init_scanner(LSN lsn,
scanner->fixed_horizon= fixed_horizon;
translog_scanner_set_horizon(scanner);
scanner->horizon= translog_get_horizon();
DBUG_PRINT("info", ("horizon: (%lu,0x%lx)",
(ulong) LSN_FILE_NO(scanner->horizon),
(ulong) LSN_OFFSET(scanner->horizon)));
......@@ -4499,7 +4569,7 @@ static my_bool translog_scanner_eol(TRANSLOG_SCANNER_DATA *scanner)
DBUG_PRINT("info", ("Horizon is fixed and reached"));
DBUG_RETURN(1);
}
translog_scanner_set_horizon(scanner);
scanner->horizon= translog_get_horizon();
DBUG_PRINT("info",
("Horizon is re-read, EOL: %d",
scanner->horizon <= (scanner->page_addr +
......@@ -5368,17 +5438,31 @@ static void translog_force_current_buffer_to_finish()
}
/*
Flush the log up to given LSN (included)
SYNOPSIS
translog_flush()
lsn log record serial number up to which (inclusive)
the log have to be flushed
RETURN
0 OK
1 Error
/**
@brief Flush the log up to given LSN (included)
@param lsn log record serial number up to which (inclusive)
the log has to be flushed
@return Operation status
@retval 0 OK
@retval 1 Error
@todo LOG: when a log write fails, we should not write to this log anymore
(if we add more log records to this log they will be unreadable: we will hit
the broken log record): all subsequent translog_flush() calls should be made
to fail (because translog_flush() is called when a transaction wants
something durable, and we cannot make anything durable as the log is
corrupted). For that, a "my_bool st_translog_descriptor::write_error" could
be set to 1 when a translog_write_record() or translog_flush() fails, and
translog_flush() would test this var (translog_write_record() could also
test this var if it wants, though it's not absolutely needed).
Then, either shut Maria down immediately, or switch to a new log (but if we
get write error after write error, that would create too many logs).
A popular open-source transactional engine intentionally crashes as soon as
a log flush fails (we however don't want to crash the entire mysqld, but
stopping all of the engine's operations immediately would make sense).
The same applies to translog_write_record().
*/
my_bool translog_flush(LSN lsn)
......@@ -5469,24 +5553,55 @@ my_bool translog_flush(LSN lsn)
/* We sync file when we are closing it => do nothing if file closed */
}
log_descriptor.flushed= sent_to_file;
/** @todo LOG decide if syncing of directory is needed */
rc|= my_sync(log_descriptor.directory_fd, MYF(MY_WME));
translog_unlock();
DBUG_RETURN(rc);
}
/**
@brief Sets transaction's rec_lsn if needed
A transaction sometimes writes a REDO even before the page is in the
pagecache (example: brand new head or tail pages; full pages). So, if
Checkpoint happens just after the REDO write, it needs to know that the
REDO phase must start before this REDO. Scanning the pagecache cannot
tell that as the page is not in the cache. So, transaction sets its rec_lsn
to the REDO's LSN or somewhere before, and Checkpoint reads the
transaction's rec_lsn.
@todo move it to a separate file
@return Operation status, always 0 (success)
*/
static my_bool write_hook_for_redo(enum translog_record_type type
__attribute__ ((unused)),
TRN *trn, LSN *lsn,
struct st_translog_parts *parts
__attribute__ ((unused)))
{
/*
If the hook stays so simple, it would be faster to pass
!trn->rec_lsn ? trn->rec_lsn : some_dummy_lsn
to translog_write_record(), like Monty did in his original code, and not
have a hook. For now we keep it like this.
*/
if (trn->rec_lsn == 0)
trn->rec_lsn= *lsn;
return 0;
}
/**
@brief Sets transaction's undo_lsn, first_undo_lsn if needed
@todo move it to a separate file
@return Operation status, always 0 (success)
*/
static my_bool write_hook_for_undo(enum translog_record_type type
__attribute__ ((unused)),
TRN *trn, LSN *lsn,
......@@ -5494,11 +5609,109 @@ static my_bool write_hook_for_undo(enum translog_record_type type
__attribute__ ((unused)))
{
trn->undo_lsn= *lsn;
if (trn->first_undo_lsn == 0)
trn->first_undo_lsn= *lsn;
if (unlikely(LSN_WITH_FLAGS_TO_LSN(trn->first_undo_lsn) == 0))
trn->first_undo_lsn=
trn->undo_lsn | LSN_WITH_FLAGS_TO_FLAGS(trn->first_undo_lsn);
return 0;
/*
when we implement purging, we will specialize this hook: UNDO_PURGE
records will additionally set trn->undo_purge_lsn
*/
}
/**
@brief Gives a 2-byte-id to MARIA_SHARE and logs this fact
If a MARIA_SHARE does not yet have a 2-byte-id (unique over all currently
open MARIA_SHAREs), give it one and record this assignment in the log
(LOGREC_FILE_ID log record).
@param share table
@param trn calling transaction
@return Operation status
@retval 0 OK
@retval 1 Error
@note Can be called even if share already has an id (in which case it does nothing)
*/
int translog_assign_id_to_share(MARIA_SHARE *share, TRN *trn)
{
/*
If you give an id to a non-BLOCK_RECORD table, you also need to release
this id somewhere. Then you can change the assertion.
*/
DBUG_ASSERT(share->data_file_type == BLOCK_RECORD);
/* re-check under mutex to avoid having 2 ids for the same share */
pthread_mutex_lock(&share->intern_lock);
if (likely(share->id == 0))
{
/* Inspired by set_short_trid() of trnman.c */
int i= share->kfile.file % SHARE_ID_MAX + 1;
my_atomic_rwlock_wrlock(&LOCK_id_to_share);
/**
@todo RECOVERY BUG: if all slots are used, and we're using rwlocks
above, we will never exit the loop. To be discussed with Serg.
*/
for ( ; ; i= i % SHARE_ID_MAX + 1) /* the range is [1..SHARE_ID_MAX] */
{
void *tmp= NULL;
if (id_to_share[i] == NULL &&
my_atomic_casptr((void **)&id_to_share[i], &tmp, share))
break;
}
my_atomic_rwlock_wrunlock(&LOCK_id_to_share);
share->id= (uint16)i;
DBUG_PRINT("info", ("id_to_share: 0x%lx -> %u", (ulong)share, i));
LSN lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 2];
uchar log_data[FILEID_STORE_SIZE];
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
/*
open_file_name is an unresolved name (symlinks are not resolved, datadir
is not realpath-ed, etc) which is good: the log can be moved to another
directory and continue working.
*/
log_array[TRANSLOG_INTERNAL_PARTS + 1].str= share->open_file_name;
/**
@todo if we had the name's length in MARIA_SHARE we could avoid this
strlen()
*/
log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
strlen(share->open_file_name);
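if (unlikely(translog_write_record(&lsn, LOGREC_FILE_ID, trn, share,
sizeof(log_data) +
log_array[TRANSLOG_INTERNAL_PARTS +
1].length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, log_data)))
{
/* don't leave intern_lock locked when returning an error */
pthread_mutex_unlock(&share->intern_lock);
return 1;
}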
}
pthread_mutex_unlock(&share->intern_lock);
return 0;
}
/**
@brief Recycles a MARIA_SHARE's short id.
@param share table
@note Must be called only if share has an id (i.e. id != 0)
*/
void translog_deassign_id_from_share(MARIA_SHARE *share)
{
DBUG_PRINT("info", ("id_to_share: 0x%lx id %u -> 0",
(ulong)share, share->id));
/*
We don't need any mutex as we are called only when closing the last
instance of the table: no writes can be happening.
*/
my_atomic_rwlock_rdlock(&LOCK_id_to_share);
my_atomic_storeptr((void **)&id_to_share[share->id], 0);
my_atomic_rwlock_rdunlock(&LOCK_id_to_share);
}
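translog_assign_id_to_share() starts its search at kfile.file % SHARE_ID_MAX + 1, walks the range [1..SHARE_ID_MAX] circularly, and claims the first empty slot with a compare-and-swap; translog_deassign_id_from_share() simply stores NULL back. The following C11 sketch reproduces that scheme with <stdatomic.h> instead of the my_atomic wrappers. SKETCH_SHARE_ID_MAX and the sketch_* names are hypothetical, and unlike the original loop (see the @todo about all slots being used) it gives up after one full cycle and returns 0.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical small limit; the real SHARE_ID_MAX is not shown here. */
#define SKETCH_SHARE_ID_MAX 8

/* Slot 0 is never used, so an id of 0 can mean "no id yet". Atomic objects
   with static storage start out zero-initialized (NULL). */
static _Atomic(void *) sketch_id_to_share[SKETCH_SHARE_ID_MAX + 1];

/* Returns the id given to share, or 0 if every slot is taken. */
unsigned sketch_assign_id(void *share, int file_descriptor)
{
  unsigned start= (unsigned) file_descriptor % SKETCH_SHARE_ID_MAX + 1;
  unsigned i= start;
  do
  {
    void *expected= NULL;
    /* Claim the slot only if it is still empty. */
    if (atomic_compare_exchange_strong(&sketch_id_to_share[i], &expected,
                                       share))
      return i;
    i= i % SKETCH_SHARE_ID_MAX + 1;   /* stay within [1..MAX] */
  } while (i != start);
  return 0;
}

/* Frees the slot so that the id can be reused by another share. */
void sketch_deassign_id(unsigned id)
{
  atomic_store(&sketch_id_to_share[id], NULL);
}
```

The sketch sizes its table as SKETCH_SHARE_ID_MAX + 1 and leaves slot 0 unused, rather than decrementing the allocated pointer as translog_init() does with id_to_share--; both approaches keep 0 free to mean "no id yet".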
......@@ -86,13 +86,12 @@ enum translog_record_type
LOGREC_PREPARE_WITH_UNDO_PURGE,
LOGREC_COMMIT,
LOGREC_COMMIT_WITH_UNDO_PURGE,
LOGREC_CHECKPOINT_PAGE,
LOGREC_CHECKPOINT_TRAN,
LOGREC_CHECKPOINT_TABL,
LOGREC_CHECKPOINT,
LOGREC_REDO_CREATE_TABLE,
LOGREC_REDO_RENAME_TABLE,
LOGREC_REDO_DROP_TABLE,
LOGREC_REDO_TRUNCATE_TABLE,
LOGREC_REDO_DELETE_ALL,
LOGREC_REDO_REPAIR_TABLE,
LOGREC_FILE_ID,
LOGREC_LONG_TRANSACTION_ID,
LOGREC_RESERVED_FUTURE_EXTENSION= 63
......@@ -181,9 +180,7 @@ struct st_translog_reader_data
};
struct st_transaction;
#ifdef __cplusplus
extern "C" {
#endif
C_MODE_START
/* Records types for unittests */
#define LOGREC_FIXED_RECORD_0LSN_EXAMPLE 1
......@@ -199,13 +196,12 @@ extern my_bool translog_init(const char *directory, uint32 log_file_max_size,
uint32 server_version, uint32 server_id,
PAGECACHE *pagecache, uint flags);
extern my_bool translog_write_record(LSN *lsn,
enum translog_record_type type,
struct st_transaction *trn,
struct st_maria_share *share,
translog_size_t rec_len,
uint part_no,
LEX_STRING *parts_data);
extern my_bool
translog_write_record(LSN *lsn, enum translog_record_type type,
struct st_transaction *trn,
struct st_maria_share *share,
translog_size_t rec_len, uint part_no,
LEX_STRING *parts_data, uchar *store_share_id);
extern void translog_destroy();
......@@ -232,7 +228,10 @@ extern translog_size_t translog_read_next_record_header(TRANSLOG_SCANNER_DATA
*scanner,
TRANSLOG_HEADER_BUFFER
*buff);
#ifdef __cplusplus
}
#endif
extern void translog_lock_assert_owner();
extern TRANSLOG_ADDRESS translog_get_horizon();
extern int translog_assign_id_to_share(struct st_maria_share *share,
struct st_transaction *trn);
extern void translog_deassign_id_from_share(struct st_maria_share *share);
extern my_bool translog_inited;
C_MODE_END
......@@ -35,7 +35,7 @@ typedef TRANSLOG_ADDRESS LSN;
/* checks LSN */
#define LSN_VALID(L) DBUG_ASSERT((L) >= 0 && (L) < (uint64)0xFFFFFFFFFFFFFFLL)
/* size of stored LSN on a disk */
/* size of stored LSN on a disk, don't change it! */
#define LSN_STORE_SIZE 7
/* Puts LSN into buffer (dst) */
......@@ -53,4 +53,12 @@ typedef TRANSLOG_ADDRESS LSN;
#define LSN_REPLACE_OFFSET(L, S) (LSN_FILE_NO_PART(L) | (S))
/*
an 8-byte type whose most significant byte is used for "flags"; the other
7 bytes hold an LSN.
*/
typedef LSN LSN_WITH_FLAGS;
#define LSN_WITH_FLAGS_TO_LSN(x) ((x) & ULL(0x00FFFFFFFFFFFFFF))
#define LSN_WITH_FLAGS_TO_FLAGS(x) ((x) & ULL(0xFF00000000000000))
#endif
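LSN_WITH_FLAGS packs flags into the top byte and an LSN into the low 7 bytes; write_hook_for_undo() relies on this to set first_undo_lsn once without wiping flags such as the "long id logged" bit. A minimal, self-contained sketch of the masks and of that hook follows; the sketch_* names are hypothetical, and UINT64_C() stands in for the ULL() macro.

```c
#include <stdint.h>

typedef uint64_t sketch_lsn;
typedef uint64_t sketch_lsn_with_flags;   /* flags: top byte, LSN: low 7 */

/* Same masks as LSN_WITH_FLAGS_TO_LSN()/LSN_WITH_FLAGS_TO_FLAGS(), with the
   macro argument parenthesized. */
#define SKETCH_TO_LSN(x)   ((x) & UINT64_C(0x00FFFFFFFFFFFFFF))
#define SKETCH_TO_FLAGS(x) ((x) & UINT64_C(0xFF00000000000000))

typedef struct
{
  sketch_lsn undo_lsn;                    /* last UNDO written */
  sketch_lsn_with_flags first_undo_lsn;   /* first UNDO written, plus flags */
} sketch_undo_trn;

/* Models write_hook_for_undo(): always remember the latest UNDO; remember
   the first UNDO only once, keeping whatever flags are already set. */
void sketch_undo_hook(sketch_undo_trn *trn, sketch_lsn lsn)
{
  trn->undo_lsn= lsn;
  if (SKETCH_TO_LSN(trn->first_undo_lsn) == 0)
    trn->first_undo_lsn= trn->undo_lsn | SKETCH_TO_FLAGS(trn->first_undo_lsn);
}
```

Parenthesizing the macro argument matters once an expression such as `a | b` is passed in: without the parentheses, `&` binds more tightly than `|` and the mask would apply to `b` only.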
......@@ -919,12 +919,23 @@ static void setup_key_functions(register MARIA_KEYDEF *keyinfo)
}
/*
Function to save and store the header in the index file (.MYI)
/**
@brief Function to save and store the header in the index file (.MYI)
@param file descriptor of the index file to write
@param state state information to write to the file
@param pWrite bitmap (determines the amount of information to
write, and whether my_write() or my_pwrite() should be
used)
@return Operation status
@retval 0 OK
@retval 1 Error
*/
uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
{
/** @todo RECOVERY write it only at checkpoint time */
uchar buff[MARIA_STATE_INFO_SIZE + MARIA_STATE_EXTRA_SIZE];
uchar *ptr=buff;
uint i, keys= (uint) state->header.keys;
......@@ -935,6 +946,11 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
/* open_count must be first because of _ma_mark_file_changed ! */
mi_int2store(ptr,state->open_count); ptr+= 2;
/*
if you change the offset of this LSN inside the file, fix
ma_create + ma_rename + ma_delete_all + backward-compatibility.
*/
lsn_store(ptr, state->create_rename_lsn); ptr+= LSN_STORE_SIZE;
*ptr++= (uchar)state->changed;
*ptr++= state->sortkey;
mi_rowstore(ptr,state->state.records); ptr+= 8;
......@@ -959,6 +975,7 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
{
mi_sizestore(ptr,state->key_root[i]); ptr+= 8;
}
/** @todo RECOVERY key_del is a problem for recovery */
mi_sizestore(ptr,state->key_del); ptr+= 8;
if (pWrite & 2) /* From maria_chk */
{
......@@ -994,6 +1011,7 @@ byte *_ma_state_info_read(byte *ptr, MARIA_STATE_INFO *state)
key_parts= mi_uint2korr(state->header.key_parts);
state->open_count = mi_uint2korr(ptr); ptr+= 2;
state->create_rename_lsn= lsn_korr(ptr); ptr+= LSN_STORE_SIZE;
state->changed= (my_bool) *ptr++;
state->sortkey= (uint) *ptr++;
state->state.records= mi_rowkorr(ptr); ptr+= 8;
......
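_ma_state_info_write() and _ma_state_info_read() place create_rename_lsn right after the fixed header and the 2-byte open_count, in LSN_STORE_SIZE (7) bytes. This is why maria_rename() can patch it in place at offset sizeof(share->state.header) + 2. The sketch below assumes that an LSN keeps its log file number in the high 32 bits and its offset in the low 32 bits, and that lsn_store() writes a 3-byte file number followed by a 4-byte offset, both least significant byte first; neither layout is shown in this diff, and the sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 3-byte file number, then 4-byte offset, low byte first. */
void sketch_lsn_store(unsigned char *dst, uint64_t lsn)
{
  uint32_t file_no= (uint32_t) (lsn >> 32);
  uint32_t offset= (uint32_t) lsn;
  for (int i= 0; i < 3; i++)
    dst[i]= (unsigned char) (file_no >> (8 * i));
  for (int i= 0; i < 4; i++)
    dst[3 + i]= (unsigned char) (offset >> (8 * i));
}

/* Inverse of sketch_lsn_store(). */
uint64_t sketch_lsn_korr(const unsigned char *src)
{
  uint64_t file_no= 0, offset= 0;
  for (int i= 0; i < 3; i++)
    file_no|= (uint64_t) src[i] << (8 * i);
  for (int i= 0; i < 4; i++)
    offset|= (uint64_t) src[3 + i] << (8 * i);
  return (file_no << 32) | offset;
}

/* Overwrites create_rename_lsn inside an in-memory copy of the index file
   header: it lives after the fixed header and the 2-byte open_count. */
void sketch_patch_create_rename_lsn(unsigned char *index_header,
                                    size_t fixed_header_size, uint64_t lsn)
{
  sketch_lsn_store(index_header + fixed_header_size + 2, lsn);
}
```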
......@@ -114,6 +114,11 @@
/* TODO: put it to my_static.c */
my_bool my_disable_flush_pagecache_blocks= 0;
/**
When flushing pages of a file, some dirty blocks may be taken out of
changed_blocks[]; Checkpoint must not run while that is the case.
*/
uint changed_blocks_is_incomplete= 0;
#define STRUCT_PTR(TYPE, MEMBER, a) \
(TYPE *) ((char *) (a) - offsetof(TYPE, MEMBER))
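changed_blocks_is_incomplete is a counter rather than a flag because several flushes can hide blocks at the same time. Each call to flush_pagecache_blocks_int() bumps it at most once (tracked by changed_blocks_is_incomplete_incremented) and subtracts exactly that amount when done, while Checkpoint waits until it drops back to 0. The single-threaded sketch below shows only that bookkeeping: the sketch_* names are hypothetical, and the cache_lock that protects the counter in the pagecache is assumed to be held by the caller and is not modelled.

```c
/* Hypothetical stand-in for changed_blocks_is_incomplete; the caller is
   assumed to hold the cache lock around every function below. */
static unsigned sketch_changed_blocks_is_incomplete= 0;

/* Per-flush-call state, like changed_blocks_is_incomplete_incremented. */
typedef struct
{
  unsigned char incremented;
} sketch_flush_state;

/* Called each time a still-dirty block is moved out of the dirty list. */
void sketch_flush_hide_block(sketch_flush_state *st)
{
  if (!st->incremented)
  {
    st->incremented= 1;
    sketch_changed_blocks_is_incomplete++;
  }
}

/* Called when the flush call finishes: undo only this call's increment. */
void sketch_flush_end(sketch_flush_state *st)
{
  sketch_changed_blocks_is_incomplete-= st->incremented;
  st->incremented= 0;
}

/* Checkpoint may collect dirty pages only when no flush is hiding any. */
int sketch_checkpoint_can_collect(void)
{
  return sketch_changed_blocks_is_incomplete == 0;
}
```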
......@@ -308,7 +313,7 @@ struct st_pagecache_block_link
enum pagecache_page_type type; /* type of the block */
uint hits_left; /* number of hits left until promotion */
ulonglong last_hit_time; /* timestamp of the last hit */
LSN rec_lsn; /* LSN when first became dirty */
LSN rec_lsn; /**< LSN when first became dirty */
KEYCACHE_CONDVAR *condvar; /* condition variable for 'no readers' event */
};
......@@ -2523,7 +2528,8 @@ void pagecache_unlock(PAGECACHE *pagecache,
{
DBUG_ASSERT(lock == PAGECACHE_LOCK_WRITE_UNLOCK);
DBUG_ASSERT(pin == PAGECACHE_UNPIN);
set_if_bigger(block->rec_lsn, first_REDO_LSN_for_page);
if (block->rec_lsn == 0)
block->rec_lsn= first_REDO_LSN_for_page;
}
if (lsn != 0)
{
......@@ -2685,7 +2691,8 @@ void pagecache_unlock_by_link(PAGECACHE *pagecache,
DBUG_ASSERT(lock == PAGECACHE_LOCK_WRITE_UNLOCK ||
lock == PAGECACHE_LOCK_READ_UNLOCK);
DBUG_ASSERT(pin == PAGECACHE_UNPIN);
set_if_bigger(block->rec_lsn, first_REDO_LSN_for_page);
if (block->rec_lsn == 0)
block->rec_lsn= first_REDO_LSN_for_page;
}
if (lsn != 0)
{
......@@ -3279,8 +3286,8 @@ my_bool pagecache_write_part(PAGECACHE *pagecache,
if (need_lock_change)
{
/*
RECOVERY TODO BUG We are doing an unlock here, so need to give the
page its rec_lsn
We don't set rec_lsn of the block; this is OK because we always keep
Maria block-record pages pinned here.
*/
if (make_lock_and_pin(pagecache, block,
write_lock_change_table[lock].unlock_lock,
......@@ -3500,22 +3507,21 @@ static int flush_cached_blocks(PAGECACHE *pagecache,
}
/*
flush all key blocks for a file to disk, but don't do any mutex locks
/**
@brief flush all key blocks for a file to disk but don't do any mutex locks
flush_pagecache_blocks_int()
pagecache pointer to a key cache data structure
file handler for the file to flush to
flush_type type of the flush
@param pagecache pointer to a pagecache data structure
@param file handler for the file to flush to
@param flush_type type of the flush
NOTES
This function doesn't do any mutex locks because it needs to be called
both from flush_pagecache_blocks and flush_all_key_blocks (the later one
does the mutex lock in the resize_pagecache() function).
@note
This function doesn't do any mutex locks because it needs to be called
both from flush_pagecache_blocks and flush_all_key_blocks (the latter
does the mutex lock in the resize_pagecache() function).
RETURN
0 ok
1 error
@return Operation status
@retval 0 OK
@retval 1 Error
*/
static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
......@@ -3547,6 +3553,7 @@ static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
#if defined(PAGECACHE_DEBUG)
uint cnt= 0;
#endif
uint8 changed_blocks_is_incomplete_incremented= 0;
if (type != FLUSH_IGNORE_CHANGED)
{
......@@ -3636,16 +3643,23 @@ static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
else
{
/* Link the block into a list of blocks 'in switch' */
/*
RECOVERY TODO BUG this unlink_changed() is a serious problem for
Maria's Checkpoint: it removes a page from the list of dirty
pages, while it's still dirty. A solution is to abandon
first_in_switch, just wait for this page to be
flushed by somebody else, and loop. TODO: check all places
where we remove a page from the list of dirty pages
*/
unlink_changed(block);
link_changed(block, &first_in_switch);
/*
We have just removed a page from the list of dirty pages
("changed_blocks") though it's still dirty (the flush by another
thread has not yet happened). Checkpoint will miss the page and so
must be blocked until that flush has happened.
*/
/**
@todo RECOVERY: check all places where we remove a page from the
list of dirty pages
*/
if (unlikely(!changed_blocks_is_incomplete_incremented))
{
changed_blocks_is_incomplete_incremented= 1;
changed_blocks_is_incomplete++;
}
}
}
}
......@@ -3683,6 +3697,8 @@ static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
KEYCACHE_DBUG_ASSERT(cnt <= pagecache->blocks_used);
#endif
}
changed_blocks_is_incomplete-=
changed_blocks_is_incomplete_incremented;
/* The following happens very seldom */
if (! (type == FLUSH_KEEP || type == FLUSH_FORCE_WRITE))
{
......@@ -3789,51 +3805,56 @@ int reset_pagecache_counters(const char *name, PAGECACHE *pagecache)
}
/*
Allocates a buffer and stores in it some information about all dirty pages
of type PAGECACHE_LSN_PAGE.
SYNOPSIS
pagecache_collect_changed_blocks_with_lsn()
pagecache pointer to the page cache
str (OUT) pointer to a LEX_STRING where the allocated buffer, and
its size, will be put
max_lsn (OUT) pointer to a LSN where the maximum rec_lsn of all
relevant dirty pages will be put
DESCRIPTION
Does the allocation because the caller cannot know the size itself.
Memory freeing is to be done by the caller (if the "str" member of the
LEX_STRING is not NULL).
Ignores all pages of another type than PAGECACHE_LSN_PAGE, because they
are not interesting for a checkpoint record.
The caller has the intention of doing checkpoints.
RETURN
0 on success
1 on error
/**
@brief Allocates a buffer and stores in it some info about all dirty pages
Does the allocation because the caller cannot know the size itself.
Memory freeing is to be done by the caller (if the "str" member of the
LEX_STRING is not NULL).
Ignores all pages of another type than PAGECACHE_LSN_PAGE, because they
are not interesting for a checkpoint record.
The caller intends to write a checkpoint record.
@param pagecache pointer to the page cache
@param[out] str pointer to where the allocated buffer, and
its size, will be put
@param[out] min_rec_lsn pointer to where the minimum rec_lsn of all
relevant dirty pages will be put
@param[out] max_rec_lsn pointer to where the maximum rec_lsn of all
relevant dirty pages will be put
@return Operation status
@retval 0 OK
@retval 1 Error
*/
my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
LEX_STRING *str,
LSN *max_lsn)
LSN *min_rec_lsn,
LSN *max_rec_lsn)
{
my_bool error= 0;
ulong stored_list_size= 0;
uint file_hash;
char *ptr;
LSN minimum_rec_lsn= ULONGLONG_MAX, maximum_rec_lsn= 0;
DBUG_ENTER("pagecache_collect_changed_blocks_with_LSN");
*max_lsn= 0;
DBUG_ASSERT(NULL == str->str);
/*
We lock the entire cache but will be quick, just reading/writing a few MBs
of memory at most.
When we enter here, we must be sure that no "first_in_switch" situation
is happening or will happen (either we have to get rid of
first_in_switch in the code or, first_in_switch has to increment a
"danger" counter for this function to know it has to wait). TODO.
*/
pagecache_pthread_mutex_lock(&pagecache->cache_lock);
while (changed_blocks_is_incomplete > 0)
{
/*
Some pages are more recent in memory than on disk (=dirty) and are not
in "changed_blocks" so we cannot know them. Wait.
*/
pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
sleep(1);
pagecache_pthread_mutex_lock(&pagecache->cache_lock);
}
/* Count how many dirty pages are interesting */
for (file_hash= 0; file_hash < PAGECACHE_CHANGED_BLOCKS_HASH; file_hash++)
......@@ -3851,35 +3872,15 @@ my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
DBUG_ASSERT(block->status & PCBLOCK_CHANGED);
if (block->type != PAGECACHE_LSN_PAGE)
continue; /* no need to store it */
/*
In the current pagecache, rec_lsn is not set correctly:
1) it is set on pagecache_unlock(), too late (a page is dirty
(PCBLOCK_CHANGED) since the first pagecache_write()). So in this
scenario:
thread1: thread2:
write_REDO
pagecache_write() checkpoint : reclsn not known
pagecache_unlock(sets rec_lsn)
commit
crash,
at recovery we will wrongly skip the REDO. It also affects the
low-water mark's computation.
2) sometimes the unlocking can be an implicit action of
pagecache_write(), without any call to pagecache_unlock(), then
rec_lsn is not set.
1) and 2) are critical problems.
TODO: fix this when Monty has explained how he writes BLOB pages.
*/
if (block->rec_lsn == 0)
{
DBUG_ASSERT(0);
goto err;
}
stored_list_size++;
}
}
str->length= 8+(4+4+8)*stored_list_size;
str->length= 8 + /* number of dirty pages */
(4 + /* file */
4 + /* pageno */
LSN_STORE_SIZE /* rec_lsn */
) * stored_list_size;
if (NULL == (str->str= my_malloc(str->length, MYF(MY_WME))))
goto err;
ptr= str->str;
......@@ -3896,19 +3897,27 @@ my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
{
if (block->type != PAGECACHE_LSN_PAGE)
continue; /* no need to store it in the checkpoint record */
DBUG_ASSERT((4 == sizeof(block->hash_link->file.file)));
DBUG_ASSERT((4 == sizeof(block->hash_link->pageno)));
compile_time_assert((4 == sizeof(block->hash_link->file.file)));
compile_time_assert((4 == sizeof(block->hash_link->pageno)));
int4store(ptr, block->hash_link->file.file);
ptr+= 4;
int4store(ptr, block->hash_link->pageno);
ptr+= 4;
int8store(ptr, (ulonglong) block->rec_lsn);
ptr+= 8;
set_if_bigger(*max_lsn, block->rec_lsn);
lsn_store(ptr, block->rec_lsn);
ptr+= LSN_STORE_SIZE;
if (block->rec_lsn != 0)
{
if (cmp_translog_addr(block->rec_lsn, minimum_rec_lsn) < 0)
minimum_rec_lsn= block->rec_lsn;
if (cmp_translog_addr(block->rec_lsn, maximum_rec_lsn) > 0)
maximum_rec_lsn= block->rec_lsn;
} /* otherwise, some trn->rec_lsn should hold the info */
}
}
end:
pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
*min_rec_lsn= minimum_rec_lsn;
*max_rec_lsn= maximum_rec_lsn;
DBUG_RETURN(error);
err:
......
......@@ -239,6 +239,7 @@ extern my_bool pagecache_delete_pages(PAGECACHE *pagecache,
extern void end_pagecache(PAGECACHE *keycache, my_bool cleanup);
extern my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
LEX_STRING *str,
LSN *min_rec_lsn,
LSN *max_rec_lsn);
extern int reset_pagecache_counters(const char *name, PAGECACHE *pagecache);
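pagecache_collect_changed_blocks_with_lsn() now reports both the minimum and the maximum rec_lsn of the dirty LSN pages, skipping pages whose rec_lsn is still 0 (some transaction's rec_lsn is expected to cover those), and sizes its buffer as 8 bytes plus 4 + 4 + LSN_STORE_SIZE bytes per page. rec_lsn itself records the first LSN that dirtied a page: it is set only when still 0, as in write_hook_for_redo() and the new pagecache_unlock() code. The sketch below models these rules over a plain array; the sketch_* names are hypothetical, and it compares LSNs as plain integers where the original uses cmp_translog_addr().

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LSN_STORE_SIZE 7

typedef struct
{
  uint32_t file;      /* stored in 4 bytes */
  uint32_t pageno;    /* stored in 4 bytes */
  uint64_t rec_lsn;   /* 0 = not known yet; a trn->rec_lsn covers it */
} sketch_dirty_page;

/* Keep the LSN that first dirtied the page; later REDOs do not move it. */
void sketch_set_rec_lsn_if_unset(uint64_t *rec_lsn, uint64_t lsn)
{
  if (*rec_lsn == 0)
    *rec_lsn= lsn;
}

/* Buffer size: an 8-byte page count, then file, pageno, rec_lsn per page. */
size_t sketch_dirty_pages_buffer_size(size_t n_pages)
{
  return 8 + (4 + 4 + SKETCH_LSN_STORE_SIZE) * n_pages;
}

/* Min and max of the non-zero rec_lsn values; if there are none, min stays
   UINT64_MAX and max stays 0, mirroring the original initial values. */
void sketch_rec_lsn_bounds(const sketch_dirty_page *pages, size_t n,
                           uint64_t *min_rec_lsn, uint64_t *max_rec_lsn)
{
  uint64_t lo= UINT64_MAX, hi= 0;
  for (size_t i= 0; i < n; i++)
  {
    uint64_t lsn= pages[i].rec_lsn;
    if (lsn == 0)
      continue;               /* covered by some transaction's rec_lsn */
    if (lsn < lo)
      lo= lsn;
    if (lsn > hi)
      hi= lsn;
  }
  *min_rec_lsn= lo;
  *max_rec_lsn= hi;
}
```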
......
......@@ -52,7 +52,12 @@ int maria_panic(enum ha_panic_function flag)
info=(MARIA_HA*) list_element->data;
switch (flag) {
case HA_PANIC_CLOSE:
pthread_mutex_unlock(&THR_LOCK_maria); /* Not exactly right... */
/*
With bad luck (if some tables are in use right now, which normally does
not happen in MySQL), the list may change while we have released the
mutex, and we may crash.
*/
pthread_mutex_unlock(&THR_LOCK_maria);
if (maria_close(info))
error=my_errno;
pthread_mutex_lock(&THR_LOCK_maria);
......
......@@ -29,25 +29,22 @@ static uint _ma_keynr(MARIA_HA *info, MARIA_KEYDEF *keyinfo, byte *page,
byte *keypos, uint *ret_max_key);
/*
Estimate how many records there is in a given range
/**
@brief Estimate how many records there are in a given range
SYNOPSIS
maria_records_in_range()
info MARIA handler
inx Index to use
min_key Min key. Is = 0 if no min range
max_key Max key. Is = 0 if no max range
@param info MARIA handler
@param inx Index to use
@param min_key Min key. Is = 0 if no min range
@param max_key Max key. Is = 0 if no max range
NOTES
We should ONLY return 0 if there is no rows in range
@note
We should ONLY return 0 if there are no rows in range
RETURN
HA_POS_ERROR error (or we can't estimate number of rows)
number Estimated number of rows
@return Estimated number of rows or error
@retval HA_POS_ERROR error (or we can't estimate number of rows)
@retval number Estimated number of rows
*/
ha_rows maria_records_in_range(MARIA_HA *info, int inx, key_range *min_key,
key_range *max_key)
{
......@@ -115,6 +112,13 @@ ha_rows maria_records_in_range(MARIA_HA *info, int inx, key_range *min_key,
rw_unlock(&info->s->key_root_lock[inx]);
fast_ma_writeinfo(info);
/**
@todo LOCK
If res==0 (no rows) and we need to guarantee repeatability of the search,
we will need to set a next-key lock in this statement.
The same applies to SELECT COUNT(*)...
*/
DBUG_PRINT("info",("records: %ld",(ulong) (res)));
DBUG_RETURN(res);
}
......
......@@ -18,6 +18,18 @@
*/
#include "ma_fulltext.h"
#include "trnman_public.h"
/**
@brief renames a table
@param old_name current name of table
@param new_name table should be renamed to this name
@return Operation status
@retval 0 OK
@retval !=0 Error
*/
int maria_rename(const char *old_name, const char *new_name)
{
......@@ -26,22 +38,73 @@ int maria_rename(const char *old_name, const char *new_name)
#ifdef USE_RAID
uint raid_type=0,raid_chunks=0;
#endif
MARIA_HA *info;
MARIA_SHARE *share;
myf sync_dir;
DBUG_ENTER("maria_rename");
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(old_name,"rename old_table");
_ma_check_table_is_closed(new_name,"rename new table2");
#endif
/* LOCK TODO take X-lock on table here */
/** @todo LOCK take X-lock on table */
if (!(info= maria_open(old_name, O_RDWR, HA_OPEN_FOR_REPAIR)))
DBUG_RETURN(my_errno);
share= info->s;
#ifdef USE_RAID
raid_type = share->base.raid_type;
raid_chunks = share->base.raid_chunks;
#endif
sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
if (sync_dir)
{
MARIA_HA *info;
if (!(info=maria_open(old_name, O_RDONLY, 0)))
DBUG_RETURN(my_errno);
raid_type = info->s->base.raid_type;
raid_chunks = info->s->base.raid_chunks;
maria_close(info);
uchar log_data[LSN_STORE_SIZE];
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
uint old_name_len= strlen(old_name), new_name_len= strlen(new_name);
int2store(log_data, old_name_len);
int2store(log_data + 2, new_name_len);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 2 + 2;
log_array[TRANSLOG_INTERNAL_PARTS + 1].str= (char *)old_name;
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= old_name_len;
log_array[TRANSLOG_INTERNAL_PARTS + 2].str= (char *)new_name;
log_array[TRANSLOG_INTERNAL_PARTS + 2].length= new_name_len;
/*
For this record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe, which it is not now (that would require
work using the ddl_log of sql/sql_table.cc); when it is, we should
reconsider the moment of writing this log record (before or after op,
under THR_LOCK_maria or not...), how to use it in Recovery, and force
the log. For now this record is just informative.
*/
if (unlikely(translog_write_record(&share->state.create_rename_lsn,
LOGREC_REDO_RENAME_TABLE,
&dummy_transaction_object, NULL,
2 + 2 + old_name_len + new_name_len,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL)))
{
maria_close(info);
DBUG_RETURN(1);
}
/*
store LSN into file, needed for Recovery to not be confused if a
RENAME happened (applying REDOs to the wrong table).
*/
lsn_store(log_data, share->state.create_rename_lsn);
if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
my_sync(share->kfile.file, MYF(MY_WME)))
{
maria_close(info);
DBUG_RETURN(1);
}
}
maria_close(info);
#ifdef USE_RAID
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(old_name,"rename raidcheck");
#endif
......@@ -49,29 +112,18 @@ int maria_rename(const char *old_name, const char *new_name)
fn_format(from,old_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
fn_format(to,new_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
/*
RECOVERY TODO log the two renames below. Update
ZeroDirtyPagesLSN of the table on disk (=> sync the files), this is
needed so that Recovery does not pick a wrong table.
Then do the file renames.
For this log record to be of any use for Recovery, we need the upper MySQL
layer to be crash-safe in DDLs; when it is we should reconsider the moment
of writing this log record, how to use it in Recovery, and force the log.
For now this record is only informative. But ZeroDirtyPagesLSN is
critically needed!
*/
if (my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR)))
if (my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir)))
DBUG_RETURN(my_errno);
fn_format(from,old_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
fn_format(to,new_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
#ifdef USE_RAID
if (raid_type)
data_file_rename_error= my_raid_rename(from, to, raid_chunks,
MYF(MY_WME | MY_SYNC_DIR));
MYF(MY_WME | sync_dir));
else
#endif
data_file_rename_error=
my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR));
my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir));
if (data_file_rename_error)
{
/*
......@@ -81,7 +133,7 @@ int maria_rename(const char *old_name, const char *new_name)
data_file_rename_error= my_errno;
fn_format(from, old_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
fn_format(to, new_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
my_rename_with_symlink(to, from, MYF(MY_WME | MY_SYNC_DIR));
my_rename_with_symlink(to, from, MYF(MY_WME | sync_dir));
}
DBUG_RETURN(data_file_rename_error);
......
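The REDO_RENAME_TABLE body built in maria_rename() above is a 2-byte old-name length, a 2-byte new-name length, and then the two names without terminators. Its total size, `2 + 2 + old_name_len + new_name_len`, is what gets passed to translog_write_record(). The following is a minimal sketch of that layout. It assumes little-endian 2-byte stores, as MySQL's int2store() does. The `sketch_*` helpers are hypothetical. For testability, the sketch flattens the pieces into one buffer, whereas maria_rename() hands them to the log as three separate `log_array` parts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Little-endian 2-byte store/load, standing in for int2store()/uint2korr(). */
static void sketch_int2store(unsigned char *p, size_t v)
{
  assert(v <= 0xFFFF);                    /* lengths must fit in 2 bytes */
  p[0]= (unsigned char) (v & 0xFF);
  p[1]= (unsigned char) ((v >> 8) & 0xFF);
}

static unsigned sketch_uint2korr(const unsigned char *p)
{
  return (unsigned) p[0] | ((unsigned) p[1] << 8);
}

/*
  Lays out: old_name_len(2) | new_name_len(2) | old_name | new_name.
  Returns 2 + 2 + old_name_len + new_name_len, or 0 if buf is too small.
*/
static size_t sketch_rename_payload(unsigned char *buf, size_t buf_size,
                                    const char *old_name, const char *new_name)
{
  size_t old_len= strlen(old_name), new_len= strlen(new_name);
  size_t total= 2 + 2 + old_len + new_len;

  if (total > buf_size)
    return 0;
  sketch_int2store(buf, old_len);
  sketch_int2store(buf + 2, new_len);
  memcpy(buf + 4, old_name, old_len);
  memcpy(buf + 4 + old_len, new_name, new_len);
  return total;
}
```

Because both lengths come first, a reader of the record can locate the second name without scanning for a separator.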
......@@ -47,7 +47,13 @@ PAGECACHE *maria_pagecache= &maria_pagecache_var;
PAGECACHE maria_log_pagecache_var;
PAGECACHE *maria_log_pagecache= &maria_log_pagecache_var;
/* For using maria externally */
/**
@brief when transactionality does not matter we can use this transaction
Used in external programs like ma_test*, and also internally inside
libmaria when there is no transaction around and the operation isn't
transactional (CREATE/DROP/RENAME/OPTIMIZE/REPAIR).
*/
TRN dummy_transaction_object;
/* Enough for comparing if number is zero */
......
......@@ -3,10 +3,16 @@
# Execute some simple basic tests on the MyISAM library to check if things
# work at all.
# If you want to run this in Valgrind, you should use --trace-children=yes,
# so that it detects problems in ma_test* and not in the shell script
valgrind="valgrind --alignment=8 --leak-check=yes"
silent="-s"
suffix=""
#set -x -v -e
if [ -z "$maria_path" ]
then
maria_path="."
fi
run_tests()
{
......@@ -14,139 +20,139 @@ run_tests()
#
# First some simple tests
#
./ma_test1$suffix $silent $row_type
./maria_chk$suffix -se test1
./ma_test1$suffix $silent -N $row_type
./maria_chk$suffix -se test1
./ma_test1$suffix $silent -P --checksum $row_type
./maria_chk$suffix -se test1
./ma_test1$suffix $silent -P -N $row_type
./maria_chk$suffix -se test1
./ma_test1$suffix $silent -B -N -R2 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -k 480 --unique $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -N -R1 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -p $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -p -N --unique $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -p -N --key_length=127 --checksum $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -p -N --key_length=128 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -p --key_length=480 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -B $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -B --key_length=64 --unique $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -B -k 480 --checksum $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -B -k 480 -N --unique --checksum $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -m $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -m -P --unique --checksum $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -m -P --key_length=480 --key_cache $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -m -p $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -w --unique $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -w --key_length=64 --checksum $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -w -N --key_length=480 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -w --key_length=480 --checksum $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -b -N $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -a -b --key_length=480 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent -p -B --key_length=480 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent --checksum --unique $row_type
./maria_chk$suffix -se test1
./ma_test1$suffix $silent --unique $row_type
./maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent -N $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent -P --checksum $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent -P -N $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent -B -N -R2 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -k 480 --unique $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -N -R1 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -p $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -p -N --unique $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -p -N --key_length=127 --checksum $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -p -N --key_length=128 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -p --key_length=480 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -B $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -B --key_length=64 --unique $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -B -k 480 --checksum $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -B -k 480 -N --unique --checksum $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -m $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -m -P --unique --checksum $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -m -P --key_length=480 --key_cache $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -m -p $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -w --unique $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -w --key_length=64 --checksum $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -w -N --key_length=480 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -w --key_length=480 --checksum $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -b -N $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -a -b --key_length=480 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent -p -B --key_length=480 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent --checksum --unique $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent --unique $row_type
$maria_path/maria_chk$suffix -se test1
./ma_test1$suffix $silent --key_multiple -N -S $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent --key_multiple -a -p --key_length=480 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent --key_multiple -a -B --key_length=480 $row_type
./maria_chk$suffix -sm test1
./ma_test1$suffix $silent --key_multiple -P -S $row_type
./maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent --key_multiple -N -S $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent --key_multiple -a -p --key_length=480 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent --key_multiple -a -B --key_length=480 $row_type
$maria_path/maria_chk$suffix -sm test1
$maria_path/ma_test1$suffix $silent --key_multiple -P -S $row_type
$maria_path/maria_chk$suffix -sm test1
./maria_pack$suffix --force -s test1
./maria_chk$suffix -ess test1
$maria_path/maria_pack$suffix --force -s test1
$maria_path/maria_chk$suffix -ess test1
./ma_test2$suffix $silent -L -K -W -P $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -L -K -W -P -A $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -L -K -P -R3 -m50 -b1000000 $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -L -B $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -D -B -c $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -m10000 -e4096 -K $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -m10000 -e8192 -K $row_type
./maria_chk$suffix -sm test2
./ma_test2$suffix $silent -m10000 -e16384 -E16384 -K -L $row_type
./maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -L -K -W -P $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -L -K -W -P -A $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -L -K -P -R3 -m50 -b1000000 $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -L -B $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -D -B -c $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -m10000 -e4096 -K $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -m10000 -e8192 -K $row_type
$maria_path/maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -m10000 -e16384 -E16384 -K -L $row_type
$maria_path/maria_chk$suffix -sm test2
}
run_repair_tests()
{
row_type=$1
./ma_test1$suffix $silent --checksum $row_type
./maria_chk$suffix -se test1
./maria_chk$suffix -rs test1
./maria_chk$suffix -se test1
./maria_chk$suffix -rqs test1
./maria_chk$suffix -se test1
./maria_chk$suffix -rs --correct-checksum test1
./maria_chk$suffix -se test1
./maria_chk$suffix -rqs --correct-checksum test1
./maria_chk$suffix -se test1
./maria_chk$suffix -ros --correct-checksum test1
./maria_chk$suffix -se test1
./maria_chk$suffix -rqos --correct-checksum test1
./maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent --checksum $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -rs test1
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -rqs test1
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -rs --correct-checksum test1
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -rqs --correct-checksum test1
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -ros --correct-checksum test1
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -rqos --correct-checksum test1
$maria_path/maria_chk$suffix -se test1
}
run_pack_tests()
{
row_type=$1
# check of maria_pack / maria_chk
./ma_test1$suffix $silent --checksum $row_type
./maria_pack$suffix --force -s test1
./maria_chk$suffix -ess test1
./maria_chk$suffix -rqs test1
./maria_chk$suffix -es test1
./maria_chk$suffix -rs test1
./maria_chk$suffix -es test1
./maria_chk$suffix -rus test1
./maria_chk$suffix -es test1
$maria_path/ma_test1$suffix $silent --checksum $row_type
$maria_path/maria_pack$suffix --force -s test1
$maria_path/maria_chk$suffix -ess test1
$maria_path/maria_chk$suffix -rqs test1
$maria_path/maria_chk$suffix -es test1
$maria_path/maria_chk$suffix -rs test1
$maria_path/maria_chk$suffix -es test1
$maria_path/maria_chk$suffix -rus test1
$maria_path/maria_chk$suffix -es test1
./ma_test1$suffix $silent --checksum -S $row_type
./maria_chk$suffix -se test1
./maria_chk$suffix -ros test1
./maria_chk$suffix -rqs test1
./maria_chk$suffix -se test1
$maria_path/ma_test1$suffix $silent --checksum -S $row_type
$maria_path/maria_chk$suffix -se test1
$maria_path/maria_chk$suffix -ros test1
$maria_path/maria_chk$suffix -rqs test1
$maria_path/maria_chk$suffix -se test1
./maria_pack$suffix --force -s test1
./maria_chk$suffix -rqs test1
./maria_chk$suffix -es test1
./maria_chk$suffix -rus test1
./maria_chk$suffix -es test1
$maria_path/maria_pack$suffix --force -s test1
$maria_path/maria_chk$suffix -rqs test1
$maria_path/maria_chk$suffix -es test1
$maria_path/maria_chk$suffix -rus test1
$maria_path/maria_chk$suffix -es test1
}
echo "Running tests with dynamic row format"
......@@ -169,27 +175,27 @@ run_tests "-M -T"
# Tests that give warnings
#
./ma_test2$suffix $silent -L -K -W -P -S -R1 -m500
./maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -L -K -W -P -S -R1 -m500
$maria_path/maria_chk$suffix -sm test2
echo "ma_test2$suffix $silent -L -K -R1 -m2000 ; Should give error 135"
./ma_test2$suffix $silent -L -K -R1 -m2000
echo "./maria_chk$suffix -sm test2 will warn that 'Datafile is almost full'"
./maria_chk$suffix -sm test2
./maria_chk$suffix -ssm test2
$maria_path/ma_test2$suffix $silent -L -K -R1 -m2000
echo "$maria_path/maria_chk$suffix -sm test2 will warn that 'Datafile is almost full'"
$maria_path/maria_chk$suffix -sm test2
$maria_path/maria_chk$suffix -ssm test2
#
# Some timing tests
#
time ./ma_test2$suffix $silent
time ./ma_test2$suffix $silent -S
time ./ma_test2$suffix $silent -M
time ./ma_test2$suffix $silent -B
time ./ma_test2$suffix $silent -L
time ./ma_test2$suffix $silent -K
time ./ma_test2$suffix $silent -K -B
time ./ma_test2$suffix $silent -L -B
time ./ma_test2$suffix $silent -L -K -B
time ./ma_test2$suffix $silent -L -K -W -B
time ./ma_test2$suffix $silent -L -K -W -B -S
time ./ma_test2$suffix $silent -L -K -W -B -M
time ./ma_test2$suffix $silent -D -K -W -B -S
time $maria_path/ma_test2$suffix $silent
time $maria_path/ma_test2$suffix $silent -S
time $maria_path/ma_test2$suffix $silent -M
time $maria_path/ma_test2$suffix $silent -B
time $maria_path/ma_test2$suffix $silent -L
time $maria_path/ma_test2$suffix $silent -K
time $maria_path/ma_test2$suffix $silent -K -B
time $maria_path/ma_test2$suffix $silent -L -B
time $maria_path/ma_test2$suffix $silent -L -K -B
time $maria_path/ma_test2$suffix $silent -L -K -W -B
time $maria_path/ma_test2$suffix $silent -L -K -W -B -S
time $maria_path/ma_test2$suffix $silent -L -K -W -B -M
time $maria_path/ma_test2$suffix $silent -D -K -W -B -S
......@@ -93,6 +93,7 @@ typedef struct st_maria_state_info
uint sortkey; /* sorted by this key (not used) */
uint open_count;
uint8 changed; /* Changed since mariachk */
LSN create_rename_lsn; /**< LSN when table was last created/renamed */
/* the following isn't saved on disk */
uint state_diff_length; /* Should be 0 */
......@@ -101,7 +102,8 @@ typedef struct st_maria_state_info
} MARIA_STATE_INFO;
#define MARIA_STATE_INFO_SIZE (24 + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
#define MARIA_STATE_INFO_SIZE \
(24 + LSN_STORE_SIZE + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
#define MARIA_STATE_KEY_SIZE 8
#define MARIA_STATE_KEYBLOCK_SIZE 8
#define MARIA_STATE_KEYSEG_SIZE 4
......@@ -229,6 +231,7 @@ typedef struct st_maria_share
PAGECACHE *pagecache; /* ref to the current key cache */
MARIA_DECODE_TREE *decode_trees;
uint16 *decode_tables;
uint16 id; /**< 2-byte id by which log records refer to the table */
/* Called the first time the table instance is opened */
my_bool (*once_init)(struct st_maria_share *, File);
/* Called when the last instance of the table is closed */
......@@ -889,6 +892,7 @@ volatile int *_ma_killed_ptr(HA_CHECK *param);
void _ma_check_print_error _VARARGS((HA_CHECK *param, const char *fmt, ...));
void _ma_check_print_warning _VARARGS((HA_CHECK *param, const char *fmt, ...));
void _ma_check_print_info _VARARGS((HA_CHECK *param, const char *fmt, ...));
int _ma_repair_write_log_record(const HA_CHECK *param, MARIA_HA *info);
C_MODE_END
int _ma_flush_pending_blocks(MARIA_SORT_PARAM *param);
......
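Adding `create_rename_lsn` to the saved part of MARIA_STATE_INFO grows MARIA_STATE_INFO_SIZE by LSN_STORE_SIZE. The arithmetic below checks this. It assumes LSN_STORE_SIZE is 7, meaning a 3-byte log file number plus a 4-byte offset. That value is consistent with the `ptr+= 7` in the old trnman code, but it is not stated in this diff. The `SKETCH_*` names are hypothetical copies of the two macros. The check requires a C11 compiler because it uses `_Static_assert`.

```c
#include <assert.h>

#define SKETCH_LSN_STORE_SIZE 7   /* assumption: 3-byte file number + 4-byte offset */

/* Term-by-term copies of the old and new MARIA_STATE_INFO_SIZE macros. */
#define SKETCH_OLD_STATE_INFO_SIZE (24 + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
#define SKETCH_NEW_STATE_INFO_SIZE \
  (24 + SKETCH_LSN_STORE_SIZE + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)

/* The new field must grow the stored state by exactly its on-disk width. */
_Static_assert(SKETCH_NEW_STATE_INFO_SIZE - SKETCH_OLD_STATE_INFO_SIZE ==
               SKETCH_LSN_STORE_SIZE, "state grew by one stored LSN");

/* 24 + 4 + 88 + 16 + 8 + 12 + 40 = 192 bytes before, 192 + 7 = 199 after. */
static int sketch_state_info_size(int with_create_rename_lsn)
{
  return with_create_rename_lsn ? SKETCH_NEW_STATE_INFO_SIZE
                                : SKETCH_OLD_STATE_INFO_SIZE;
}
```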
......@@ -52,6 +52,7 @@ static my_atomic_rwlock_t LOCK_short_trid_to_trn, LOCK_pool;
/*
Simple interface functions
QQ: if they stay so simple, should we make them inline?
*/
uint trnman_increment_locked_tables(TRN *trn)
......@@ -343,6 +344,9 @@ int trnman_end_trn(TRN *trn, my_bool commit)
LF_PINS *pins= trn->pins;
DBUG_ENTER("trnman_end_trn");
DBUG_ASSERT(trn->rec_lsn == 0);
/* if a rollback, all UNDO records should have been executed */
DBUG_ASSERT(commit || trn->undo_lsn == 0);
DBUG_PRINT("info", ("pthread_mutex_lock LOCK_trn_list"));
pthread_mutex_lock(&LOCK_trn_list);
......@@ -379,8 +383,6 @@ int trnman_end_trn(TRN *trn, my_bool commit)
/*
if transaction is committed and it was not the only active transaction -
add it to the committed list (which is used for read-from relation)
TODO check in the condition below that a transaction have made some
changes, was not read-only. Something like '&& UndoLSN != 0'
*/
if (commit && active_list_min.next != &active_list_max)
{
......@@ -390,6 +392,19 @@ int trnman_end_trn(TRN *trn, my_bool commit)
trnman_committed_transactions++;
res= lf_hash_insert(&trid_to_committed_trn, pins, &trn);
/*
By going on with life if res<0, we let other threads block on
our rows (because they will never see us committed in
trid_to_committed_trn) until they time out. Though correct, this is not a
good situation:
- if connection reconnects and wants to check if its rows have been
committed, it will not be able to do that (it will just lock on them) so
connection stays permanently in doubt
- internal structures trid_to_committed_trn and committed_list are
desynchronized.
So we should take Maria down immediately, the two problems being
automatically solved at restart.
*/
DBUG_ASSERT(res <= 0);
}
if (res)
......@@ -526,71 +541,133 @@ void trnman_rollback_statement(TRN *trn __attribute__ ((unused)))
}
/*
Allocates two buffers and stores in them some information about transactions
of the active list (into the first buffer) and of the committed list (into
the second buffer).
SYNOPSIS
trnman_collect_transactions()
str_act (OUT) pointer to a LEX_STRING where the allocated buffer, and
its size, will be put
str_com (OUT) pointer to a LEX_STRING where the allocated buffer, and
its size, will be put
/**
@brief Allocates buffers and stores in them some info about transactions
Does the allocation because the caller cannot know the size itself.
Memory freeing is to be done by the caller (if the "str" member of the
LEX_STRING is not NULL).
The caller has the intention of doing checkpoints.
DESCRIPTION
Does the allocation because the caller cannot know the size itself.
Memory freeing is to be done by the caller (if the "str" member of the
LEX_STRING is not NULL).
The caller has the intention of doing checkpoints.
@param[out] str_act pointer to where the allocated buffer,
and its size, will be put; buffer will be filled
with info about active transactions
@param[out] str_com pointer to where the allocated buffer,
and its size, will be put; buffer will be filled
with info about committed transactions
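@param[out] min_rec_lsn pointer to where the minimum
rec_lsn of all active transactions will be put
@param[out] min_first_undo_lsn pointer to where the minimum
first_undo_lsn of all transactions will be put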
RETURN
0 on success
1 on error
@return Operation status
@retval 0 OK
@retval 1 Error
*/
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com)
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com,
LSN *min_rec_lsn, LSN *min_first_undo_lsn)
{
my_bool error;
TRN *trn;
char *ptr;
uint stored_transactions= 0;
LSN minimum_rec_lsn= ULONGLONG_MAX, minimum_first_undo_lsn= ULONGLONG_MAX;
DBUG_ENTER("trnman_collect_transactions");
DBUG_ASSERT((NULL == str_act->str) && (NULL == str_com->str));
/* validate the use of read_non_atomic() in general: */
compile_time_assert((sizeof(LSN) == 8) && (sizeof(LSN_WITH_FLAGS) == 8));
DBUG_PRINT("info", ("pthread_mutex_lock LOCK_trn_list"));
pthread_mutex_lock(&LOCK_trn_list);
str_act->length= 8+(6+2+7+7+7)*trnman_active_transactions;
str_com->length= 8+(6+7+7)*trnman_committed_transactions;
str_act->length= 2 + /* number of active transactions */
LSN_STORE_SIZE + /* minimum of their rec_lsn */
(6 + /* long id */
2 + /* short id */
LSN_STORE_SIZE + /* undo_lsn */
#ifdef MARIA_VERSIONING /* not enabled yet */
LSN_STORE_SIZE + /* undo_purge_lsn */
#endif
LSN_STORE_SIZE /* first_undo_lsn */
) * trnman_active_transactions;
str_com->length= 8 + /* number of committed transactions */
(6 + /* long id */
#ifdef MARIA_VERSIONING /* not enabled yet */
LSN_STORE_SIZE + /* undo_purge_lsn */
#endif
LSN_STORE_SIZE /* first_undo_lsn */
) * trnman_committed_transactions;
if ((NULL == (str_act->str= my_malloc(str_act->length, MYF(MY_WME)))) ||
(NULL == (str_com->str= my_malloc(str_com->length, MYF(MY_WME)))))
goto err;
/* First, the active transactions */
ptr= str_act->str;
int8store(ptr, (ulonglong)trnman_active_transactions);
ptr+= 8;
ptr= str_act->str + 2 + LSN_STORE_SIZE;
for (trn= active_list_min.next; trn != &active_list_max; trn= trn->next)
{
/*
trns with a short trid of 0 are not initialized; Recovery will recognize
this and ignore them.
State is not needed for now (only when we supported prepared trns).
For LSNs, Sanja will soon push lsn7store.
trns with a short trid of 0 are not even initialized, we can ignore
them. trns with undo_lsn==0 have done no writes, we can ignore them
too. XID not needed now.
*/
uint sid;
LSN rec_lsn, undo_lsn, first_undo_lsn;
if ((sid= trn->short_id) == 0)
{
/*
Not even inited, has done nothing. Or it is the
dummy_transaction_object, which does only non-transactional
immediate-sync operations (CREATE/DROP/RENAME/REPAIR TABLE), and so
can be forgotten for Checkpoint.
*/
continue;
}
#ifndef MARIA_CHECKPOINT
/*
in the checkpoint patch (not yet ready) we will have a real implementation
of lsn_read_non_atomic(); for now it's not needed
*/
#define lsn_read_non_atomic(A) (A)
#endif
/* needed for low-water mark calculation */
if (((rec_lsn= lsn_read_non_atomic(trn->rec_lsn)) > 0) &&
(cmp_translog_addr(rec_lsn, minimum_rec_lsn) < 0))
minimum_rec_lsn= rec_lsn;
/*
trn may have logged REDOs but not yet UNDO, that's why we read rec_lsn
before deciding to ignore if undo_lsn==0.
*/
if ((undo_lsn= trn->undo_lsn) == 0) /* trn can be forgotten */
continue;
stored_transactions++;
int6store(ptr, trn->trid);
ptr+= 6;
int2store(ptr, trn->short_id);
int2store(ptr, sid);
ptr+= 2;
/* needed for rollback */
/* lsn7store(ptr, trn->undo_lsn); */
ptr+= 7;
/* needed for purge */
/* lsn7store(ptr, trn->undo_purge_lsn); */
ptr+= 7;
lsn_store(ptr, undo_lsn); /* needed for rollback */
ptr+= LSN_STORE_SIZE;
#ifdef MARIA_VERSIONING /* not enabled yet */
/* to know where purging should start (last delete of this trn) */
lsn_store(ptr, trn->undo_purge_lsn);
ptr+= LSN_STORE_SIZE;
#endif
/* needed for low-water mark calculation */
/* lsn7store(ptr, read_non_atomic(&trn->first_undo_lsn)); */
ptr+= 7;
if (((first_undo_lsn= lsn_read_non_atomic(trn->first_undo_lsn)) > 0) &&
(cmp_translog_addr(first_undo_lsn, minimum_first_undo_lsn) < 0))
minimum_first_undo_lsn= first_undo_lsn;
lsn_store(ptr, first_undo_lsn);
ptr+= LSN_STORE_SIZE;
/**
@todo RECOVERY: add a comment explaining why we can dirtily read some
vars, inspired by the text of "assumption 8" in WL#3072
*/
}
str_act->length= ptr - str_act->str; /* as we maybe over-estimated */
ptr= str_act->str;
int2store(ptr, stored_transactions);
ptr+= 2;
/* this LSN influences how REDOs for any page can be ignored by Recovery */
lsn_store(ptr, minimum_rec_lsn);
/* one day there will also be a list of prepared transactions */
/* do the same for committed ones */
ptr= str_com->str;
int8store(ptr, (ulonglong)trnman_committed_transactions);
......@@ -598,18 +675,26 @@ my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com)
for (trn= committed_list_min.next; trn != &committed_list_max;
trn= trn->next)
{
LSN first_undo_lsn;
int6store(ptr, trn->trid);
ptr+= 6;
/* mi_int7store(ptr, trn->undo_purge_lsn); */
ptr+= 7;
/* mi_int7store(ptr, read_non_atomic(&trn->first_undo_lsn)); */
ptr+= 7;
#ifdef MARIA_VERSIONING /* not enabled yet */
lsn_store(ptr, trn->undo_purge_lsn);
ptr+= LSN_STORE_SIZE;
#endif
first_undo_lsn= LSN_WITH_FLAGS_TO_LSN(trn->first_undo_lsn);
if (cmp_translog_addr(first_undo_lsn, minimum_first_undo_lsn) < 0)
minimum_first_undo_lsn= first_undo_lsn;
lsn_store(ptr, first_undo_lsn);
ptr+= LSN_STORE_SIZE;
}
/*
TODO: if we see there exists no transaction (active and committed) we can
tell the lock-free structures to do some freeing (my_free()).
*/
error= 0;
*min_rec_lsn= minimum_rec_lsn;
*min_first_undo_lsn= minimum_first_undo_lsn;
goto end;
err:
error= 1;
......
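The new active-list loop in trnman_collect_transactions() does the following:

- It leaves room for a 2-byte count and the minimum rec_lsn, and fills both in at the end.
- It skips transactions whose short id is 0.
- It folds every non-zero rec_lsn into the minimum *before* checking undo_lsn, because a transaction may have logged REDOs without an UNDO yet.
- For each transaction that has an undo_lsn, it stores the long id, short id, undo_lsn and first_undo_lsn.

Below is a minimal sketch of that loop, with MARIA_VERSIONING off. It makes two assumptions. First, an LSN is `file_no << 32 | offset`, so a plain `<` orders LSNs the way cmp_translog_addr() does. Second, LSN_STORE_SIZE is 7. The cut-down `struct sketch_trn` and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LSN_STORE_SIZE 7        /* assumption: 3-byte file no + 4-byte offset */

/* Hypothetical cut-down TRN: only the fields the loop reads. */
struct sketch_trn
{
  uint64_t trid;                       /* stored as 6 bytes */
  uint16_t short_id;                   /* 0: never initialized, skip */
  uint64_t rec_lsn, undo_lsn, first_undo_lsn;
};

static void store_le(unsigned char *p, uint64_t v, int bytes)
{
  for (int i= 0; i < bytes; i++)
    p[i]= (unsigned char) (v >> (8 * i));
}

/* Assumed lsn_store() layout: 3 bytes of file number, then 4 bytes of offset. */
static void sketch_lsn_store(unsigned char *p, uint64_t lsn)
{
  store_le(p, lsn >> 32, 3);
  store_le(p + 3, lsn & 0xFFFFFFFFu, 4);
}

/* Upper bound allocated up front, like str_act->length in the diff. */
static size_t sketch_active_buf_size(size_t n)
{
  return 2 + SKETCH_LSN_STORE_SIZE + (6 + 2 + 2 * SKETCH_LSN_STORE_SIZE) * n;
}

/* Returns the bytes actually used, which may be less than the upper bound. */
static size_t sketch_collect_active(const struct sketch_trn *trns, size_t n,
                                    unsigned char *buf, uint64_t *min_rec,
                                    uint64_t *min_first_undo)
{
  unsigned char *ptr= buf + 2 + SKETCH_LSN_STORE_SIZE; /* header written last */
  uint64_t minimum_rec_lsn= UINT64_MAX, minimum_first_undo_lsn= UINT64_MAX;
  unsigned stored= 0;

  for (size_t i= 0; i < n; i++)
  {
    const struct sketch_trn *trn= &trns[i];
    if (trn->short_id == 0)
      continue;
    /* rec_lsn counts even if no UNDO exists yet (REDOs may come first) */
    if (trn->rec_lsn > 0 && trn->rec_lsn < minimum_rec_lsn)
      minimum_rec_lsn= trn->rec_lsn;
    if (trn->undo_lsn == 0)
      continue;                        /* nothing to roll back: forget it */
    stored++;
    store_le(ptr, trn->trid, 6);                 ptr+= 6;
    store_le(ptr, trn->short_id, 2);             ptr+= 2;
    sketch_lsn_store(ptr, trn->undo_lsn);        ptr+= SKETCH_LSN_STORE_SIZE;
    if (trn->first_undo_lsn > 0 &&
        trn->first_undo_lsn < minimum_first_undo_lsn)
      minimum_first_undo_lsn= trn->first_undo_lsn;
    sketch_lsn_store(ptr, trn->first_undo_lsn);  ptr+= SKETCH_LSN_STORE_SIZE;
  }
  store_le(buf, stored, 2);
  sketch_lsn_store(buf + 2, minimum_rec_lsn);
  *min_rec= minimum_rec_lsn;
  *min_first_undo= minimum_first_undo_lsn;
  return (size_t) (ptr - buf);
}
```

As in the diff, the returned length can be smaller than the allocated upper bound, because skipped transactions take no space.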
......@@ -45,12 +45,13 @@ struct st_transaction
LF_PINS *pins;
TrID trid, min_read_from, commit_trid;
TRN *next, *prev;
LSN rec_lsn, undo_lsn, first_undo_lsn;
LSN rec_lsn, undo_lsn;
LSN_WITH_FLAGS first_undo_lsn;
uint locked_tables;
/* Note! if locks.loid is 0, trn is NOT initialized */
};
TRN dummy_transaction_object;
#define TRANSACTION_LOGGED_LONG_ID ULL(0x8000000000000000)
C_MODE_END
......
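`first_undo_lsn` becomes an LSN_WITH_FLAGS, and the committed-list loop strips the flags with LSN_WITH_FLAGS_TO_LSN() before comparing LSNs. The header also defines TRANSACTION_LOGGED_LONG_ID as the top bit, `0x8000000000000000`, which presumably records that a LOGREC_LONG_TRANSACTION_ID has already been written for the transaction. Below is a minimal sketch of that packing. It assumes the flags occupy the top byte and the LSN the low 56 bits. The masks and `sketch_*` helpers are hypothetical and are not the real macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: flags in the top byte, the LSN in the low 56 bits. */
#define SKETCH_FLAGS_MASK      UINT64_C(0xFF00000000000000)
#define SKETCH_LOGGED_LONG_ID  UINT64_C(0x8000000000000000) /* TRANSACTION_LOGGED_LONG_ID */

/* Strip flags before comparing LSNs, like LSN_WITH_FLAGS_TO_LSN(). */
static uint64_t sketch_lsn_with_flags_to_lsn(uint64_t v)
{
  return v & ~SKETCH_FLAGS_MASK;
}

static uint64_t sketch_mark_long_id_logged(uint64_t v)
{
  return v | SKETCH_LOGGED_LONG_ID;
}

static int sketch_long_id_logged(uint64_t v)
{
  return (v & SKETCH_LOGGED_LONG_ID) != 0;
}
```

Keeping the flag inside the same 8-byte word means a single read returns both the LSN and the flag.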
......@@ -20,6 +20,8 @@
to include my_atomic.h in C++ code.
*/
#include "ma_loghandler_lsn.h"
C_MODE_START
typedef uint64 TrID; /* our TrID is 6 bytes */
typedef struct st_transaction TRN;
......@@ -27,6 +29,7 @@ typedef struct st_transaction TRN;
#define SHORT_TRID_MAX 65535
extern uint trnman_active_transactions, trnman_allocated_transactions;
extern TRN dummy_transaction_object;
int trnman_init(void);
void trnman_destroy(void);
......@@ -39,7 +42,9 @@ void trnman_free_trn(TRN *trn);
int trnman_can_read_from(TRN *trn, TrID trid);
void trnman_new_statement(TRN *trn);
void trnman_rollback_statement(TRN *trn);
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com);
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com,
LSN *min_rec_lsn,
LSN *min_first_undo_lsn);
uint trnman_increment_locked_tables(TRN *trn);
uint trnman_decrement_locked_tables(TRN *trn);
......
......@@ -196,7 +196,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL,
6, TRANSLOG_INTERNAL_PARTS + 1, parts))
6, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
......@@ -218,7 +218,7 @@ int main(int argc __attribute__((unused)), char *argv[])
parts[TRANSLOG_INTERNAL_PARTS + 1].str= NULL;
parts[TRANSLOG_INTERNAL_PARTS + 1].length= 0;
if (translog_write_record(&lsn, LOGREC_FIXED_RECORD_1LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE, 0, parts))
trn, NULL, LSN_STORE_SIZE, 0, parts, NULL))
{
fprintf(stderr, "1 Can't write reference before record #%lu\n",
(ulong) i);
......@@ -238,7 +238,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE,
trn, NULL, 0, TRANSLOG_INTERNAL_PARTS + 2,
parts))
parts, NULL))
{
fprintf(stderr, "1 Can't write var reference before record #%lu\n",
(ulong) i);
......@@ -257,7 +257,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_2LSN_EXAMPLE,
trn, NULL,
23, TRANSLOG_INTERNAL_PARTS + 1, parts))
23, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "0 Can't write reference before record #%lu\n",
(ulong) i);
......@@ -277,7 +277,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE,
trn, NULL, 14 + rec_len,
TRANSLOG_INTERNAL_PARTS + 2, parts))
TRANSLOG_INTERNAL_PARTS + 2, parts, NULL))
{
fprintf(stderr, "0 Can't write var reference before record #%lu\n",
(ulong) i);
......@@ -294,7 +294,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) i);
translog_destroy();
......@@ -313,7 +313,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
trn, NULL, rec_len,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
......
......@@ -192,7 +192,7 @@ int main(int argc __attribute__((unused)), char *argv[])
trn->short_id= 0;
if (translog_write_record(&lsn, LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL,
6, TRANSLOG_INTERNAL_PARTS + 1, parts))
6, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
......@@ -214,7 +214,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_1LSN_EXAMPLE,
trn, NULL,
LSN_STORE_SIZE,
TRANSLOG_INTERNAL_PARTS + 1, parts))
TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "1 Can't write reference before record #%lu\n",
(ulong) i);
......@@ -234,7 +234,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE + rec_len,
TRANSLOG_INTERNAL_PARTS + 2,
parts))
parts, NULL))
{
fprintf(stderr, "1 Can't write var reference before record #%lu\n",
(ulong) i);
......@@ -255,7 +255,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_2LSN_EXAMPLE,
trn, NULL, 23,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "0 Can't write reference before record #%lu\n",
(ulong) i);
......@@ -276,7 +276,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE * 2 + rec_len,
TRANSLOG_INTERNAL_PARTS + 2,
parts))
parts, NULL))
{
fprintf(stderr, "0 Can't write var reference before record #%lu\n",
(ulong) i);
......@@ -293,7 +293,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1, parts))
TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) i);
translog_destroy();
......@@ -311,7 +311,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
trn, NULL, rec_len,
TRANSLOG_INTERNAL_PARTS + 1, parts))
TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
......
......@@ -137,7 +137,7 @@ void writer(int num)
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&trn, NULL, 6, TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write LOGREC_FIXED_RECORD_0LSN_EXAMPLE record #%lu "
"thread %i\n", (ulong) i, num);
......@@ -154,7 +154,7 @@ void writer(int num)
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
&trn, NULL,
len, TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
......@@ -303,7 +303,7 @@ int main(int argc __attribute__((unused)),
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&dummy_transaction_object, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write the first record\n");
translog_destroy();
......
......@@ -94,7 +94,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&dummy_transaction_object, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
......