Commit 7f890ab6 authored by Satya B

Applying InnoDB Plugin 1.0.5 snapshot, part 4

From revision r5703 to r5716

Detailed revision comments:

r5703 | marko | 2009-08-27 02:25:00 -0500 (Thu, 27 Aug 2009) | 41 lines
branches/zip: Replace the constant 3/8 ratio that controls the LRU_old
size with the settable global variable innodb_old_blocks_pct. The
minimum and maximum values are 5 and 95 per cent, respectively. The
default is 100*3/8, in line with the old behavior.
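
The arithmetic behind the default and the limits can be sketched as follows. This is an illustration only: the helper names and the fixed-point divisor of 1024 are assumptions, not taken from this log. The 5..95 range and the 3/8 default come from the text above; the tolerance (20) and the non-old minimum (5) mirror BUF_LRU_OLD_TOLERANCE and BUF_LRU_NON_OLD_MIN_LEN in buf/buf0lru.c.

```c
#include <assert.h>

/* Hypothetical names. OLD_RATIO_DIV is an assumed fixed-point divisor. */
#define OLD_PCT_MIN	5
#define OLD_PCT_MAX	95
#define OLD_PCT_DEFAULT	(100 * 3 / 8)	/* integer division */
#define OLD_RATIO_DIV	1024
#define OLD_TOLERANCE	20
#define NON_OLD_MIN_LEN	5

/* Clamp a requested percentage to the permitted range. */
static unsigned
old_pct_clamp(unsigned pct)
{
	if (pct < OLD_PCT_MIN) {
		return(OLD_PCT_MIN);
	}
	if (pct > OLD_PCT_MAX) {
		return(OLD_PCT_MAX);
	}
	return(pct);
}

/* Convert a percentage to a ratio expressed in units of 1/OLD_RATIO_DIV. */
static unsigned
old_pct_to_ratio(unsigned pct)
{
	return(old_pct_clamp(pct) * OLD_RATIO_DIV / 100);
}

/* Desired length of the "old" sublist for an LRU list of lru_len blocks.
The result is capped so that some non-old blocks always remain. */
static unsigned long
old_target_len(unsigned long lru_len, unsigned ratio)
{
	unsigned long	by_ratio = lru_len * ratio / OLD_RATIO_DIV;
	unsigned long	cap;

	assert(lru_len > OLD_TOLERANCE + NON_OLD_MIN_LEN);
	cap = lru_len - (OLD_TOLERANCE + NON_OLD_MIN_LEN);

	return(by_ratio < cap ? by_ratio : cap);
}
```

With integer division, 100*3/8 evaluates to 37, which is the default quoted in the ChangeLog entry.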

ut_time_ms(): New utility function, to return the current time in
milliseconds. TODO: Is there a more efficient timestamp function, such
as rdtsc divided by a power of two?
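
A millisecond timestamp of this kind could be sketched as below. The sketch uses POSIX clock_gettime(CLOCK_MONOTONIC), which is a swapped-in choice for illustration and not necessarily what ut_time_ms() does. The function names and the 32-bit width are assumptions. The second helper shows why an unsigned millisecond value can still be compared after it wraps around.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Return a millisecond timestamp truncated to 32 bits (hypothetical
helper; assumes a POSIX system with CLOCK_MONOTONIC). */
static uint32_t
time_ms_sketch(void)
{
	struct timespec	ts;
	int		ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void) ret;

	return((uint32_t) ((uint64_t) ts.tv_sec * 1000
			   + (uint64_t) ts.tv_nsec / 1000000));
}

/* Milliseconds elapsed between two truncated timestamps. Unsigned
subtraction gives the right answer across a single wraparound. */
static uint32_t
elapsed_ms_sketch(uint32_t now, uint32_t then)
{
	return(now - then);
}
```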

buf_LRU_old_threshold_ms: New variable, corresponding to
innodb_old_blocks_time. The value 0 is the default behaviour: no
timeout before making blocks 'new'.

bpage->accessed, bpage->LRU_position, buf_pool->ulint_clock: Remove.

bpage->access_time: New field, replacing bpage->accessed. Protected by
buf_pool_mutex instead of bpage->mutex. Updated when a page is created
or accessed the first time in the buffer pool.

buf_LRU_old_ratio, innobase_old_blocks_pct: New variables,
corresponding to innodb_old_blocks_pct

buf_LRU_old_ratio_update(), innobase_old_blocks_pct_update(): Update
functions for buf_LRU_old_ratio, innobase_old_blocks_pct.

buf_page_peek_if_too_old(): Compare ut_time_ms() to bpage->access_time
if buf_LRU_old_threshold_ms && bpage->old.  Else observe
buf_LRU_old_ratio and bpage->freed_page_clock.
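
The decision described above can be sketched roughly as follows. The struct, the function name and the handling of a never-accessed page (access_time == 0) are assumptions made for illustration. The freed_page_clock heuristic is not reproduced; a caller-supplied flag stands in for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified page descriptor. */
struct page_sketch {
	uint32_t	access_time;	/* ms timestamp of first access;
					0 means never accessed */
	bool		old;		/* true if in the "old" sublist */
};

/* Decide whether an accessed page should be moved to the "new" end of
the LRU list. threshold_ms models innodb_old_blocks_time. clock_too_old
stands in for the freed_page_clock based heuristic. */
static bool
page_too_old_sketch(
	const struct page_sketch*	page,
	uint32_t			now_ms,
	uint32_t			threshold_ms,
	bool				clock_too_old)
{
	if (threshold_ms != 0 && page->old) {
		/* Time-based rule: promote only if the first access
		was at least threshold_ms milliseconds ago. */
		return(page->access_time != 0
		       && (uint32_t) (now_ms - page->access_time)
		       >= threshold_ms);
	}

	return(clock_too_old);
}
```

With a nonzero threshold, a page that is touched several times in quick succession during a scan stays in the "old" sublist, because none of those accesses is far enough from the first one.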

buf_pool_t: Add n_pages_made_young, n_pages_not_made_young,
n_pages_made_young_old, n_pages_not_made_young_old, for statistics.

buf_print(): Display buf_pool->n_pages_made_young,
buf_pool->n_pages_not_made_young.  This function is only for crash
diagnostics.

buf_print_io(): Display buf_pool->LRU_old_len and quantities derived
from buf_pool->n_pages_made_young, buf_pool->n_pages_not_made_young.
This function is invoked by SHOW ENGINE INNODB STATUS.
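
The derived quantities are per-1000 ratios over the page gets since the previous printout. A minimal sketch, assuming a simplified statistics struct with hypothetical names (only the counter names echo the patch):

```c
#include <assert.h>

/* Hypothetical subset of the buffer pool counters. */
struct pool_stat_sketch {
	unsigned long	n_page_gets;
	unsigned long	n_pages_read;
	unsigned long	n_pages_made_young;
	unsigned long	n_pages_not_made_young;
};

/* Events per 1000 page gets since the previous snapshot.
Returns 0 if there were no page gets in the interval. */
static unsigned long
per_1000_sketch(unsigned long cur, unsigned long old,
		unsigned long n_gets_diff)
{
	if (n_gets_diff == 0) {
		return(0);
	}
	return(1000 * (cur - old) / n_gets_diff);
}

/* Hit rate per 1000 gets: gets that did not require a page read.
Like the printed figure, this assumes reads do not exceed gets. */
static unsigned long
hit_rate_sketch(const struct pool_stat_sketch* cur,
		const struct pool_stat_sketch* old)
{
	unsigned long	n_gets_diff = cur->n_page_gets - old->n_page_gets;

	if (n_gets_diff == 0) {
		return(0);
	}
	return(1000 - per_1000_sketch(cur->n_pages_read,
				      old->n_pages_read, n_gets_diff));
}
```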

rb://129 approved by Heikki Tuuri.  This addresses Bug #45015.
r5704 | marko | 2009-08-27 03:31:17 -0500 (Thu, 27 Aug 2009) | 32 lines
branches/zip: Fix a critical bug in fast index creation that could
corrupt the created indexes.

row_merge(): Make "half" an in/out parameter. Determine the offset of
half the output file. Copy the last blocks record-by-record instead of
block-by-block, so that the records can be counted. Check that the
input and output have matching n_rec.
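
The record-count check can be illustrated with a plain in-memory merge. This is a sketch under simplifying assumptions (integer keys in arrays rather than variable-length records in file blocks); the function names are hypothetical and do not correspond to row0merge.c.

```c
#include <assert.h>
#include <stddef.h>

/* Merge two sorted runs into out, which must have room for na + nb
entries. Returns the number of records written. */
static size_t
merge_runs_sketch(const int* a, size_t na,
		  const int* b, size_t nb, int* out)
{
	size_t	i = 0;
	size_t	j = 0;
	size_t	n = 0;

	while (i < na && j < nb) {
		out[n++] = (b[j] < a[i]) ? b[j++] : a[i++];
	}

	/* Copy the tails record by record, so that every record
	is counted. */
	while (i < na) {
		out[n++] = a[i++];
	}
	while (j < nb) {
		out[n++] = b[j++];
	}

	return(n);
}

/* A merge pass must neither lose nor duplicate records. */
static int
merge_pass_is_consistent(size_t n_rec_in, size_t n_rec_out)
{
	return(n_rec_in == n_rec_out);
}
```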

row_merge_sort(): Do not assume that two blocks of size N are merged
into a block of size 2*N. The output block can be shorter than the
input if the last page of each input block is almost empty. Use an
accurate termination condition, based on the "half" computed by
row_merge().

row_merge_read(), row_merge_write(), row_merge_blocks(): Add debug output.

merge_file_t, row_merge_file_create(): Add n_rec, the number of records
in the merge file.

row_merge_read_clustered_index(): Update n_rec.

row_merge_blocks(): Update and check n_rec.

row_merge_blocks_copy(): New function, for copying the last blocks in
row_merge().  Update and check n_rec.

This bug was discovered with a user-supplied test case that creates an
index where the initial temporary file is 249 one-megabyte blocks and
the merged files become smaller. In the test, possible merge record
sizes are 10, 18, and 26 bytes.

rb://150 approved by Sunny Bains.  This addresses Issue #320.
r5705 | marko | 2009-08-27 06:56:24 -0500 (Thu, 27 Aug 2009) | 11 lines
branches/zip: dict_index_find_cols(): On column name lookup failure,
return DB_CORRUPTION (HA_ERR_CRASHED) instead of abnormally
terminating the server.  Also, disable the previously added diagnostic
output to the error log, because mysql-test-run does not like extra
output in the error log.  (Bug #44571)

dict_index_add_to_cache(): Handle errors from dict_index_find_cols().

mysql-test/innodb_bug44571.test: A test case for triggering the bug.

rb://135 approved by Sunny Bains.
r5706 | inaam | 2009-08-27 11:00:27 -0500 (Thu, 27 Aug 2009) | 20 lines
branches/zip rb://147

Removed the following two status variables:

innodb_buffer_pool_read_ahead_rnd
innodb_buffer_pool_read_ahead_seq

Introduced two new status variables:
innodb_buffer_pool_read_ahead = number of pages read as part of
readahead since server startup
innodb_buffer_pool_read_ahead_evicted = number of pages that are read
in as readahead but were evicted before ever being accessed since
server startup i.e.: a measure of how badly our readahead is
performing

SHOW INNODB STATUS will show two extra numbers in buffer pool section:
pages read ahead/sec and pages evicted without access/sec
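
How the two counters and the per-second figures relate can be sketched as below. The struct and function names are hypothetical. The rule that a page evicted with no recorded access counts as a wasted read-ahead, and the 0.001 added to the elapsed time, follow the buf0lru.c and buf0buf.c changes in this commit.

```c
#include <assert.h>

/* Hypothetical subset of the read-ahead counters. */
struct ra_stat_sketch {
	unsigned long	n_ra_pages_read;
	unsigned long	n_ra_pages_evicted;
};

/* Record the eviction of a page. access_time == 0 means the page was
never accessed after it was read in. */
static void
note_eviction_sketch(struct ra_stat_sketch* stat, unsigned access_time)
{
	if (access_time == 0) {
		stat->n_ra_pages_evicted++;
	}
}

/* Events per second since the previous snapshot. The small constant
keeps the divisor nonzero when no time has elapsed. */
static double
rate_per_sec_sketch(unsigned long cur, unsigned long old, double seconds)
{
	return((double) (cur - old) / (0.001 + seconds));
}
```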

Approved by: Marko

r5707 | inaam | 2009-08-27 11:20:35 -0500 (Thu, 27 Aug 2009) | 6 lines
branches/zip

Remove unused macros as we erased the random readahead code in r5703.
Also fixed some comments.


r5708 | inaam | 2009-08-27 17:43:32 -0500 (Thu, 27 Aug 2009) | 4 lines
branches/zip

Remove redundant TRUE : FALSE from the return statement

r5709 | inaam | 2009-08-28 01:22:46 -0500 (Fri, 28 Aug 2009) | 5 lines
branches/zip rb://152

Disable display of deprecated parameter innodb_file_io_threads in
'show variables'.

r5714 | marko | 2009-08-31 01:10:10 -0500 (Mon, 31 Aug 2009) | 5 lines
branches/zip: buf_chunk_not_freed(): Do not acquire block->mutex unless
block->page.state == BUF_BLOCK_FILE_PAGE.  Check that block->page.state
makes sense.
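
The idea of checking the state before taking the mutex can be sketched as follows. This is a simplified model with hypothetical names: a counter stands in for mutex_enter()/mutex_exit() so that the sketch can show which blocks would be locked, and the compressed-page states that the real function treats as errors are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of block states. */
enum state_sketch {
	ST_NOT_USED,
	ST_READY_FOR_USE,
	ST_FILE_PAGE,
	ST_MEMORY,
	ST_REMOVE_HASH
};

struct block_sketch {
	enum state_sketch	state;
	int			replaceable;	/* ready for replacement? */
	int			lock_count;	/* times "locked" */
};

/* Return the first file page that is not ready for replacement, or
NULL if there is none. Only file pages are locked. */
static struct block_sketch*
chunk_not_freed_sketch(struct block_sketch* blocks, size_t n)
{
	size_t	i;

	for (i = 0; i < n; i++) {
		struct block_sketch*	block = &blocks[i];
		int			ready;

		switch (block->state) {
		case ST_NOT_USED:
		case ST_READY_FOR_USE:
		case ST_MEMORY:
		case ST_REMOVE_HASH:
			/* Not a file page: skip without locking. */
			break;
		case ST_FILE_PAGE:
			block->lock_count++;	/* stand-in for the mutex */
			ready = block->replaceable;

			if (!ready) {
				return(block);
			}
			break;
		}
	}

	return(NULL);
}
```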

Approved by Sunny Bains over the IM.
r5716 | vasil | 2009-08-31 02:47:49 -0500 (Mon, 31 Aug 2009) | 9 lines
branches/zip:

Fix Bug#46718 InnoDB plugin incompatible with gcc 4.1 (at least: on PPC): "Undefined symbol"

by implementing our own check in plug.in instead of using the result from
the check from MySQL because it is insufficient.

Approved by:	Marko (rb://154)
parent 266c7cd6
CREATE TABLE bug44571 (foo INT) ENGINE=InnoDB;
ALTER TABLE bug44571 CHANGE foo bar INT;
ALTER TABLE bug44571 ADD INDEX bug44571b (foo);
ERROR 42000: Key column 'foo' doesn't exist in table
ALTER TABLE bug44571 ADD INDEX bug44571b (bar);
ERROR HY000: Incorrect key file for table 'bug44571'; try to repair it
CREATE INDEX bug44571b ON bug44571 (bar);
ERROR HY000: Incorrect key file for table 'bug44571'; try to repair it
DROP TABLE bug44571;
#
# Bug#44571 InnoDB Plugin crashes on ADD INDEX
# http://bugs.mysql.com/44571
#
-- source include/have_innodb.inc
-- source suite/innodb/include/have_innodb_plugin.inc
CREATE TABLE bug44571 (foo INT) ENGINE=InnoDB;
ALTER TABLE bug44571 CHANGE foo bar INT;
-- error ER_KEY_COLUMN_DOES_NOT_EXITS
ALTER TABLE bug44571 ADD INDEX bug44571b (foo);
# The following will fail, because the CHANGE foo bar was
# not communicated to InnoDB.
--error ER_NOT_KEYFILE
ALTER TABLE bug44571 ADD INDEX bug44571b (bar);
--error ER_NOT_KEYFILE
CREATE INDEX bug44571b ON bug44571 (bar);
DROP TABLE bug44571;
2009-08-27 The InnoDB Team
* dict/dict0dict.c, include/dict0dict.h,
mysql-test/innodb_bug44571.result, mysql-test/innodb_bug44571.test:
Fix Bug#44571 InnoDB Plugin crashes on ADD INDEX
2009-08-27 The InnoDB Team
* row/row0merge.c:
Fix a bug in the merge sort that can corrupt indexes in fast index
creation. Add some consistency checks. Check that the number of
records remains constant in every merge sort pass.
2009-08-27 The InnoDB Team
* buf/buf0buf.c, buf/buf0lru.c, buf/buf0rea.c,
handler/ha_innodb.cc, include/buf0buf.h, include/buf0buf.ic,
include/buf0lru.h, include/ut0ut.h, ut/ut0ut.c:
Make it possible to tune the buffer pool LRU eviction policy to be
more resistant against index scans. Introduce the settable global
variables innodb_old_blocks_pct and innodb_old_blocks_time for
controlling the buffer pool eviction policy. The parameter
innodb_old_blocks_pct (5..95) controls the desired amount of "old"
blocks in the LRU list. The default is 37, corresponding to the
old fixed ratio of 3/8. Each time a block is accessed, it will be
moved to the "new" blocks if its first access was at least
innodb_old_blocks_time milliseconds ago (default 0, meaning every
block). The idea is that in index scans, blocks will be accessed
a few times within innodb_old_blocks_time, and they will remain in
the "old" section of the LRU list. Thus, when
innodb_old_blocks_time is nonzero, blocks retrieved for one-time
index scans will be more likely candidates for eviction than
blocks that are accessed in random patterns.
2009-08-26 The InnoDB Team
* handler/ha_innodb.cc, os/os0file.c:
......
@@ -957,7 +957,7 @@ btr_search_guess_on_hash(
 	/* Increment the page get statistics though we did not really
 	fix the page: for user info only */
-	buf_pool->n_page_gets++;
+	buf_pool->stat.n_page_gets++;
 	return(TRUE);
......
@@ -837,16 +837,35 @@ buf_chunk_not_freed(
 	block = chunk->blocks;
 	for (i = chunk->size; i--; block++) {
+		ibool	ready;
+		switch (buf_block_get_state(block)) {
+		case BUF_BLOCK_ZIP_FREE:
+		case BUF_BLOCK_ZIP_PAGE:
+		case BUF_BLOCK_ZIP_DIRTY:
+			/* The uncompressed buffer pool should never
+			contain compressed block descriptors. */
+			ut_error;
+			break;
+		case BUF_BLOCK_NOT_USED:
+		case BUF_BLOCK_READY_FOR_USE:
+		case BUF_BLOCK_MEMORY:
+		case BUF_BLOCK_REMOVE_HASH:
+			/* Skip blocks that are not being used for
+			file pages. */
+			break;
+		case BUF_BLOCK_FILE_PAGE:
-		mutex_enter(&block->mutex);
+			mutex_enter(&block->mutex);
+			ready = buf_flush_ready_for_replace(&block->page);
+			mutex_exit(&block->mutex);
-		if (buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE
-		    && !buf_flush_ready_for_replace(&block->page)) {
-			mutex_exit(&block->mutex);
-			return(block);
-		}
-		mutex_exit(&block->mutex);
+			if (!ready) {
+				return(block);
+			}
+			break;
+		}
 	}
 	return(NULL);
@@ -966,8 +985,6 @@ buf_pool_init(void)
 		buf_pool->no_flush[i] = os_event_create(NULL);
 	}
-	buf_pool->ulint_clock = 1;
 	/* 3. Initialize LRU fields
 	--------------------------- */
 	/* All fields are initialized by mem_zalloc(). */
@@ -1470,31 +1487,6 @@ buf_pool_resize(void)
 	buf_pool_page_hash_rebuild();
 }
-/********************************************************************//**
-Moves the block to the start of the LRU list if there is a danger
-that the block would drift out of the buffer pool. */
-UNIV_INLINE
-void
-buf_block_make_young(
-/*=================*/
-	buf_page_t*	bpage)	/*!< in: block to make younger */
-{
-	ut_ad(!buf_pool_mutex_own());
-	/* Note that we read freed_page_clock's without holding any mutex:
-	this is allowed since the result is used only in heuristics */
-	if (buf_page_peek_if_too_old(bpage)) {
-		buf_pool_mutex_enter();
-		/* There has been freeing activity in the LRU list:
-		best to move to the head of the LRU list */
-		buf_LRU_make_block_young(bpage);
-		buf_pool_mutex_exit();
-	}
-}
 /********************************************************************//**
 Moves a page to the start of the buffer pool LRU list. This high-level
 function can be used to prevent an important page from from slipping out of
@@ -1649,7 +1641,7 @@ buf_page_get_zip(
 #ifndef UNIV_LOG_DEBUG
 	ut_ad(!ibuf_inside());
 #endif
-	buf_pool->n_page_gets++;
+	buf_pool->stat.n_page_gets++;
 	for (;;) {
 		buf_pool_mutex_enter();
@@ -1713,13 +1705,15 @@ err_exit:
 got_block:
 	must_read = buf_page_get_io_fix(bpage) == BUF_IO_READ;
-	buf_pool_mutex_exit();
+	if (buf_page_peek_if_too_old(bpage)) {
+		buf_LRU_make_block_young(bpage);
+	}
-	buf_page_set_accessed(bpage, TRUE);
-	mutex_exit(block_mutex);
-	buf_block_make_young(bpage);
+	buf_page_set_accessed(bpage);
+	buf_pool_mutex_exit();
+	mutex_exit(block_mutex);
 #ifdef UNIV_DEBUG_FILE_ACCESSES
 	ut_a(!bpage->file_page_was_freed);
@@ -2000,7 +1994,7 @@ buf_page_get_gen(
 	mtr_t*		mtr)	/*!< in: mini-transaction */
 {
 	buf_block_t*	block;
-	ibool		accessed;
+	unsigned	access_time;
 	ulint		fix_type;
 	ibool		must_read;
@@ -2016,7 +2010,7 @@ buf_page_get_gen(
 #ifndef UNIV_LOG_DEBUG
 	ut_ad(!ibuf_inside() || ibuf_page(space, zip_size, offset, NULL));
 #endif
-	buf_pool->n_page_gets++;
+	buf_pool->stat.n_page_gets++;
 loop:
 	block = guess;
 	buf_pool_mutex_enter();
@@ -2243,17 +2237,20 @@ wait_until_unfixed:
 	UNIV_MEM_ASSERT_RW(&block->page, sizeof block->page);
 	buf_block_buf_fix_inc(block, file, line);
-	buf_pool_mutex_exit();
+	mutex_exit(&block->mutex);
 	/* Check if this is the first access to the page */
-	accessed = buf_page_is_accessed(&block->page);
-	buf_page_set_accessed(&block->page, TRUE);
-	mutex_exit(&block->mutex);
-	buf_block_make_young(&block->page);
+	access_time = buf_page_is_accessed(&block->page);
+	if (buf_page_peek_if_too_old(&block->page)) {
+		buf_LRU_make_block_young(&block->page);
+	}
+	buf_page_set_accessed(&block->page);
+	buf_pool_mutex_exit();
 #ifdef UNIV_DEBUG_FILE_ACCESSES
 	ut_a(!block->page.file_page_was_freed);
@@ -2306,7 +2303,7 @@ wait_until_unfixed:
 	mtr_memo_push(mtr, block, fix_type);
-	if (!accessed) {
+	if (!access_time) {
 		/* In the case of a first access, try to apply linear
 		read-ahead */
@@ -2336,7 +2333,7 @@ buf_page_optimistic_get_func(
 	ulint		line,	/*!< in: line where called */
 	mtr_t*		mtr)	/*!< in: mini-transaction */
 {
-	ibool		accessed;
+	unsigned	access_time;
 	ibool		success;
 	ulint		fix_type;
@@ -2353,14 +2350,21 @@ buf_page_optimistic_get_func(
 	}
 	buf_block_buf_fix_inc(block, file, line);
-	accessed = buf_page_is_accessed(&block->page);
-	buf_page_set_accessed(&block->page, TRUE);
 	mutex_exit(&block->mutex);
-	buf_block_make_young(&block->page);
+	buf_pool_mutex_enter();
 	/* Check if this is the first access to the page */
+	access_time = buf_page_is_accessed(&block->page);
+	if (buf_page_peek_if_too_old(&block->page)) {
+		buf_LRU_make_block_young(&block->page);
+	}
+	buf_page_set_accessed(&block->page);
+	buf_pool_mutex_exit();
 	ut_ad(!ibuf_inside()
 	      || ibuf_page(buf_block_get_space(block),
@@ -2412,7 +2416,7 @@ buf_page_optimistic_get_func(
 #ifdef UNIV_DEBUG_FILE_ACCESSES
 	ut_a(block->page.file_page_was_freed == FALSE);
 #endif
-	if (UNIV_UNLIKELY(!accessed)) {
+	if (UNIV_UNLIKELY(!access_time)) {
 		/* In the case of a first access, try to apply linear
 		read-ahead */
@@ -2425,7 +2429,7 @@ buf_page_optimistic_get_func(
 	ut_a(ibuf_count_get(buf_block_get_space(block),
 			    buf_block_get_page_no(block)) == 0);
 #endif
-	buf_pool->n_page_gets++;
+	buf_pool->stat.n_page_gets++;
 	return(TRUE);
 }
@@ -2473,10 +2477,16 @@ buf_page_get_known_nowait(
 	mutex_exit(&block->mutex);
-	if (mode == BUF_MAKE_YOUNG) {
-		buf_block_make_young(&block->page);
+	buf_pool_mutex_enter();
+	if (mode == BUF_MAKE_YOUNG && buf_page_peek_if_too_old(&block->page)) {
+		buf_LRU_make_block_young(&block->page);
 	}
+	buf_page_set_accessed(&block->page);
+	buf_pool_mutex_exit();
 	ut_ad(!ibuf_inside() || (mode == BUF_KEEP_OLD));
 	if (rw_latch == RW_S_LATCH) {
@@ -2513,7 +2523,7 @@ buf_page_get_known_nowait(
 	     || (ibuf_count_get(buf_block_get_space(block),
 				buf_block_get_page_no(block)) == 0));
 #endif
-	buf_pool->n_page_gets++;
+	buf_pool->stat.n_page_gets++;
 	return(TRUE);
 }
@@ -2589,7 +2599,7 @@ buf_page_try_get_func(
 #endif /* UNIV_DEBUG_FILE_ACCESSES */
 	buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
-	buf_pool->n_page_gets++;
+	buf_pool->stat.n_page_gets++;
 #ifdef UNIV_IBUF_COUNT_DEBUG
 	ut_a(ibuf_count_get(buf_block_get_space(block),
@@ -2608,10 +2618,10 @@ buf_page_init_low(
 	buf_page_t*	bpage)	/*!< in: block to init */
 {
 	bpage->flush_type = BUF_FLUSH_LRU;
-	bpage->accessed = FALSE;
 	bpage->io_fix = BUF_IO_NONE;
 	bpage->buf_fix_count = 0;
 	bpage->freed_page_clock = 0;
+	bpage->access_time = 0;
 	bpage->newest_modification = 0;
 	bpage->oldest_modification = 0;
 	HASH_INVALIDATE(bpage, hash);
@@ -2953,7 +2963,7 @@ buf_page_create(
 	buf_LRU_add_block(&block->page, FALSE);
 	buf_block_buf_fix_inc(block, __FILE__, __LINE__);
-	buf_pool->n_pages_created++;
+	buf_pool->stat.n_pages_created++;
 	if (zip_size) {
 		void*	data;
@@ -2990,12 +3000,12 @@ buf_page_create(
 		rw_lock_x_unlock(&block->lock);
 	}
+	buf_page_set_accessed(&block->page);
 	buf_pool_mutex_exit();
 	mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);
-	buf_page_set_accessed(&block->page, TRUE);
 	mutex_exit(&block->mutex);
 	/* Delete possible entries for the page from the insert buffer:
@@ -3201,7 +3211,7 @@ corrupt:
 		ut_ad(buf_pool->n_pend_reads > 0);
 		buf_pool->n_pend_reads--;
-		buf_pool->n_pages_read++;
+		buf_pool->stat.n_pages_read++;
 		if (uncompressed) {
 			rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,
@@ -3221,7 +3231,7 @@ corrupt:
 					     BUF_IO_WRITE);
 		}
-		buf_pool->n_pages_written++;
+		buf_pool->stat.n_pages_written++;
 		break;
@@ -3528,6 +3538,7 @@ buf_print(void)
 		"n pending decompressions %lu\n"
 		"n pending reads %lu\n"
 		"n pending flush LRU %lu list %lu single page %lu\n"
+		"pages made young %lu, not young %lu\n"
 		"pages read %lu, created %lu, written %lu\n",
 		(ulong) size,
 		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
@@ -3538,8 +3549,11 @@ buf_print(void)
 		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU],
 		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST],
 		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
-		(ulong) buf_pool->n_pages_read, buf_pool->n_pages_created,
-		(ulong) buf_pool->n_pages_written);
+		(ulong) buf_pool->stat.n_pages_made_young,
+		(ulong) buf_pool->stat.n_pages_not_made_young,
+		(ulong) buf_pool->stat.n_pages_read,
+		(ulong) buf_pool->stat.n_pages_created,
+		(ulong) buf_pool->stat.n_pages_written);
 	/* Count the number of blocks belonging to each index in the buffer */
@@ -3744,10 +3758,9 @@ buf_print_io(
 {
 	time_t	current_time;
 	double	time_elapsed;
-	ulint	size;
+	ulint	n_gets_diff;
 	ut_ad(buf_pool);
-	size = buf_pool->curr_size;
 	buf_pool_mutex_enter();
@@ -3755,12 +3768,14 @@ buf_print_io(
 		"Buffer pool size %lu\n"
 		"Free buffers %lu\n"
 		"Database pages %lu\n"
+		"Old database pages %lu\n"
 		"Modified db pages %lu\n"
 		"Pending reads %lu\n"
 		"Pending writes: LRU %lu, flush list %lu, single page %lu\n",
-		(ulong) size,
+		(ulong) buf_pool->curr_size,
 		(ulong) UT_LIST_GET_LEN(buf_pool->free),
 		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
+		(ulong) buf_pool->LRU_old_len,
 		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
 		(ulong) buf_pool->n_pend_reads,
 		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU]
@@ -3772,37 +3787,66 @@ buf_print_io(
 	current_time = time(NULL);
 	time_elapsed = 0.001 + difftime(current_time,
 					buf_pool->last_printout_time);
-	buf_pool->last_printout_time = current_time;
 	fprintf(file,
+		"Pages made young %lu, not young %lu\n"
+		"%.2f youngs/s, %.2f non-youngs/s\n"
 		"Pages read %lu, created %lu, written %lu\n"
 		"%.2f reads/s, %.2f creates/s, %.2f writes/s\n",
-		(ulong) buf_pool->n_pages_read,
-		(ulong) buf_pool->n_pages_created,
-		(ulong) buf_pool->n_pages_written,
-		(buf_pool->n_pages_read - buf_pool->n_pages_read_old)
+		(ulong) buf_pool->stat.n_pages_made_young,
+		(ulong) buf_pool->stat.n_pages_not_made_young,
+		(buf_pool->stat.n_pages_made_young
+		 - buf_pool->old_stat.n_pages_made_young)
+		/ time_elapsed,
+		(buf_pool->stat.n_pages_not_made_young
+		 - buf_pool->old_stat.n_pages_not_made_young)
+		/ time_elapsed,
+		(ulong) buf_pool->stat.n_pages_read,
+		(ulong) buf_pool->stat.n_pages_created,
+		(ulong) buf_pool->stat.n_pages_written,
+		(buf_pool->stat.n_pages_read
+		 - buf_pool->old_stat.n_pages_read)
 		/ time_elapsed,
-		(buf_pool->n_pages_created - buf_pool->n_pages_created_old)
+		(buf_pool->stat.n_pages_created
+		 - buf_pool->old_stat.n_pages_created)
 		/ time_elapsed,
-		(buf_pool->n_pages_written - buf_pool->n_pages_written_old)
+		(buf_pool->stat.n_pages_written
+		 - buf_pool->old_stat.n_pages_written)
 		/ time_elapsed);
-	if (buf_pool->n_page_gets > buf_pool->n_page_gets_old) {
-		fprintf(file, "Buffer pool hit rate %lu / 1000\n",
+	n_gets_diff = buf_pool->stat.n_page_gets - buf_pool->old_stat.n_page_gets;
+	if (n_gets_diff) {
+		fprintf(file,
+			"Buffer pool hit rate %lu / 1000,"
+			" young-making rate %lu / 1000 not %lu / 1000\n",
 			(ulong)
-			(1000 - ((1000 * (buf_pool->n_pages_read
-					  - buf_pool->n_pages_read_old))
-				 / (buf_pool->n_page_gets
-				    - buf_pool->n_page_gets_old))));
+			(1000 - ((1000 * (buf_pool->stat.n_pages_read
+					  - buf_pool->old_stat.n_pages_read))
+				 / (buf_pool->stat.n_page_gets
+				    - buf_pool->old_stat.n_page_gets))),
+			(ulong)
+			(1000 * (buf_pool->stat.n_pages_made_young
+				 - buf_pool->old_stat.n_pages_made_young)
+			 / n_gets_diff),
+			(ulong)
+			(1000 * (buf_pool->stat.n_pages_not_made_young
+				 - buf_pool->old_stat.n_pages_not_made_young)
+			 / n_gets_diff));
 	} else {
 		fputs("No buffer pool page gets since the last printout\n",
 		      file);
 	}
-	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
-	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
-	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
-	buf_pool->n_pages_written_old = buf_pool->n_pages_written;
+	/* Statistics about read ahead algorithm */
+	fprintf(file, "Pages read ahead %.2f/s,"
+		" evicted without access %.2f/s\n",
+		(buf_pool->stat.n_ra_pages_read
+		 - buf_pool->old_stat.n_ra_pages_read)
+		/ time_elapsed,
+		(buf_pool->stat.n_ra_pages_evicted
+		 - buf_pool->old_stat.n_ra_pages_evicted)
+		/ time_elapsed);
 	/* Print some values to help us with visualizing what is
 	happening with LRU eviction. */
@@ -3814,6 +3858,7 @@ buf_print_io(
 		buf_LRU_stat_sum.io, buf_LRU_stat_cur.io,
 		buf_LRU_stat_sum.unzip, buf_LRU_stat_cur.unzip);
+	buf_refresh_io_stats();
 	buf_pool_mutex_exit();
 }
@@ -3825,10 +3870,7 @@ buf_refresh_io_stats(void)
 /*======================*/
 {
 	buf_pool->last_printout_time = time(NULL);
-	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
-	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
-	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
-	buf_pool->n_pages_written_old = buf_pool->n_pages_written;
+	buf_pool->old_stat = buf_pool->stat;
 }
 /*********************************************************************//**
......
@@ -49,14 +49,23 @@ Created 11/5/1995 Heikki Tuuri
 #include "log0recv.h"
 #include "srv0srv.h"
-/** The number of blocks from the LRU_old pointer onward, including the block
-pointed to, must be 3/8 of the whole LRU list length, except that the
-tolerance defined below is allowed. Note that the tolerance must be small
-enough such that for even the BUF_LRU_OLD_MIN_LEN long LRU list, the
-LRU_old pointer is not allowed to point to either end of the LRU list. */
+/** The number of blocks from the LRU_old pointer onward, including
+the block pointed to, must be buf_LRU_old_ratio/BUF_LRU_OLD_RATIO_DIV
+of the whole LRU list length, except that the tolerance defined below
+is allowed. Note that the tolerance must be small enough such that for
+even the BUF_LRU_OLD_MIN_LEN long LRU list, the LRU_old pointer is not
+allowed to point to either end of the LRU list. */
 #define BUF_LRU_OLD_TOLERANCE 20
+/** The minimum amount of non-old blocks when the LRU_old list exists
+(that is, when there are more than BUF_LRU_OLD_MIN_LEN blocks).
+@see buf_LRU_old_adjust_len */
+#define BUF_LRU_NON_OLD_MIN_LEN 5
+#if BUF_LRU_NON_OLD_MIN_LEN >= BUF_LRU_OLD_MIN_LEN
+# error "BUF_LRU_NON_OLD_MIN_LEN >= BUF_LRU_OLD_MIN_LEN"
+#endif
 /** The whole LRU list length is divided by this number to determine an
 initial segment in buf_LRU_get_recent_limit */
@@ -107,6 +116,15 @@ UNIV_INTERN buf_LRU_stat_t buf_LRU_stat_sum;
 /* @} */
+/** @name Heuristics for detecting index scan @{ */
+/** Reserve this much/BUF_LRU_OLD_RATIO_DIV of the buffer pool for
+"old" blocks. Protected by buf_pool_mutex. */
+UNIV_INTERN uint	buf_LRU_old_ratio;
+/** Move blocks to "new" LRU list only if the first access was at
+least this many milliseconds ago. Not protected by any mutex or latch. */
+UNIV_INTERN uint	buf_LRU_old_threshold_ms;
+/* @} */
 /******************************************************************//**
 Takes a block out of the LRU list and page hash table.
 If the block is compressed-only (BUF_BLOCK_ZIP_PAGE),
@@ -428,42 +446,6 @@ next_page:
 	}
 }
-/******************************************************************//**
-Gets the minimum LRU_position field for the blocks in an initial segment
-(determined by BUF_LRU_INITIAL_RATIO) of the LRU list. The limit is not
-guaranteed to be precise, because the ulint_clock may wrap around.
-@return the limit; zero if could not determine it */
-UNIV_INTERN
-ulint
-buf_LRU_get_recent_limit(void)
-/*==========================*/
-{
-	const buf_page_t*	bpage;
-	ulint			len;
-	ulint			limit;
-	buf_pool_mutex_enter();
-	len = UT_LIST_GET_LEN(buf_pool->LRU);
-	if (len < BUF_LRU_OLD_MIN_LEN) {
-		/* The LRU list is too short to do read-ahead */
-		buf_pool_mutex_exit();
-		return(0);
-	}
-	bpage = UT_LIST_GET_FIRST(buf_pool->LRU);
-	limit = buf_page_get_LRU_position(bpage);
-	len /= BUF_LRU_INITIAL_RATIO;
-	buf_pool_mutex_exit();
-	return(limit > len ? (limit - len) : 0);
-}
 /********************************************************************//**
 Insert a compressed block into buf_pool->zip_clean in the LRU order. */
 UNIV_INTERN
@@ -594,6 +576,7 @@ buf_LRU_free_from_common_LRU_list(
 	     bpage = UT_LIST_GET_PREV(LRU, bpage), distance--) {
 		enum buf_lru_free_block_status	freed;
+		unsigned			accessed;
 		mutex_t*			block_mutex
 			= buf_page_get_mutex(bpage);
@@ -601,11 +584,18 @@ buf_LRU_free_from_common_LRU_list(
 		ut_ad(bpage->in_LRU_list);
 		mutex_enter(block_mutex);
+		accessed = buf_page_is_accessed(bpage);
 		freed = buf_LRU_free_block(bpage, TRUE, NULL);
 		mutex_exit(block_mutex);
 		switch (freed) {
 		case BUF_LRU_FREED:
+			/* Keep track of pages that are evicted without
+			ever being accessed. This gives us a measure of
+			the effectiveness of readahead */
+			if (!accessed) {
+				++buf_pool->stat.n_ra_pages_evicted;
+			}
 			return(TRUE);
 		case BUF_LRU_NOT_FREED:
@@ -953,8 +943,10 @@ buf_LRU_old_adjust_len(void)
 	ut_a(buf_pool->LRU_old);
 	ut_ad(buf_pool_mutex_own());
-#if 3 * (BUF_LRU_OLD_MIN_LEN / 8) <= BUF_LRU_OLD_TOLERANCE + 5
-# error "3 * (BUF_LRU_OLD_MIN_LEN / 8) <= BUF_LRU_OLD_TOLERANCE + 5"
+	ut_ad(buf_LRU_old_ratio >= BUF_LRU_OLD_RATIO_MIN);
+	ut_ad(buf_LRU_old_ratio <= BUF_LRU_OLD_RATIO_MAX);
+#if BUF_LRU_OLD_RATIO_MIN * BUF_LRU_OLD_MIN_LEN <= BUF_LRU_OLD_RATIO_DIV * (BUF_LRU_OLD_TOLERANCE + 5)
+# error "BUF_LRU_OLD_RATIO_MIN * BUF_LRU_OLD_MIN_LEN <= BUF_LRU_OLD_RATIO_DIV * (BUF_LRU_OLD_TOLERANCE + 5)"
 #endif
 #ifdef UNIV_LRU_DEBUG
 	/* buf_pool->LRU_old must be the first item in the LRU list
...@@ -966,34 +958,39 @@ buf_LRU_old_adjust_len(void) ...@@ -966,34 +958,39 @@ buf_LRU_old_adjust_len(void)
|| UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)->old); || UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)->old);
#endif /* UNIV_LRU_DEBUG */ #endif /* UNIV_LRU_DEBUG */
for (;;) {
old_len = buf_pool->LRU_old_len; old_len = buf_pool->LRU_old_len;
new_len = 3 * (UT_LIST_GET_LEN(buf_pool->LRU) / 8); new_len = ut_min(UT_LIST_GET_LEN(buf_pool->LRU)
* buf_LRU_old_ratio / BUF_LRU_OLD_RATIO_DIV,
UT_LIST_GET_LEN(buf_pool->LRU)
- (BUF_LRU_OLD_TOLERANCE
+ BUF_LRU_NON_OLD_MIN_LEN));
ut_ad(buf_pool->LRU_old->in_LRU_list); for (;;) {
ut_a(buf_pool->LRU_old); buf_page_t* LRU_old = buf_pool->LRU_old;
ut_a(LRU_old);
ut_ad(LRU_old->in_LRU_list);
#ifdef UNIV_LRU_DEBUG #ifdef UNIV_LRU_DEBUG
ut_a(buf_pool->LRU_old->old); ut_a(LRU_old->old);
#endif /* UNIV_LRU_DEBUG */ #endif /* UNIV_LRU_DEBUG */
/* Update the LRU_old pointer if necessary */ /* Update the LRU_old pointer if necessary */
if (old_len < new_len - BUF_LRU_OLD_TOLERANCE) { if (old_len + BUF_LRU_OLD_TOLERANCE < new_len) {
buf_pool->LRU_old = UT_LIST_GET_PREV( buf_pool->LRU_old = LRU_old = UT_LIST_GET_PREV(
LRU, buf_pool->LRU_old); LRU, LRU_old);
#ifdef UNIV_LRU_DEBUG #ifdef UNIV_LRU_DEBUG
ut_a(!buf_pool->LRU_old->old); ut_a(!LRU_old->old);
#endif /* UNIV_LRU_DEBUG */ #endif /* UNIV_LRU_DEBUG */
buf_page_set_old(buf_pool->LRU_old, TRUE); buf_page_set_old(LRU_old, TRUE);
buf_pool->LRU_old_len++; old_len = ++buf_pool->LRU_old_len;
} else if (old_len > new_len + BUF_LRU_OLD_TOLERANCE) { } else if (old_len > new_len + BUF_LRU_OLD_TOLERANCE) {
buf_page_set_old(buf_pool->LRU_old, FALSE); buf_page_set_old(LRU_old, FALSE);
buf_pool->LRU_old = UT_LIST_GET_NEXT( buf_pool->LRU_old = UT_LIST_GET_NEXT(LRU, LRU_old);
LRU, buf_pool->LRU_old); old_len = --buf_pool->LRU_old_len;
buf_pool->LRU_old_len--;
} else { } else {
return; return;
} }
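The length rule that the new buf_LRU_old_adjust_len() applies can be looked at in isolation. The following is a minimal sketch, not the InnoDB code: the constant values (OLD_RATIO_DIV = 1024, OLD_TOLERANCE = 20, NON_OLD_MIN_LEN = 5) and the helper names old_target_len() and old_adjust_step() are assumptions made for illustration, since this diff does not show the real definitions.

```c
#include <assert.h>

/* Assumed values, for illustration only; the diff does not show the
real BUF_LRU_OLD_RATIO_DIV, BUF_LRU_OLD_TOLERANCE or
BUF_LRU_NON_OLD_MIN_LEN definitions. */
enum {
	OLD_RATIO_DIV = 1024,
	OLD_TOLERANCE = 20,
	NON_OLD_MIN_LEN = 5
};

/* Hypothetical helper: target length of the "old" sublist, mirroring
the ut_min() expression in the new code. */
static unsigned long
old_target_len(unsigned long lru_len, unsigned long ratio)
{
	unsigned long	by_ratio;
	unsigned long	cap;

	/* The subtraction below must not wrap around. */
	assert(lru_len >= OLD_TOLERANCE + NON_OLD_MIN_LEN);

	by_ratio = lru_len * ratio / OLD_RATIO_DIV;
	cap = lru_len - (OLD_TOLERANCE + NON_OLD_MIN_LEN);

	return(by_ratio < cap ? by_ratio : cap);
}

/* Hypothetical helper: +1 if the old sublist should grow by one
block, -1 if it should shrink by one block, 0 if it is within the
tolerance. */
static int
old_adjust_step(unsigned long old_len, unsigned long new_len)
{
	if (old_len + OLD_TOLERANCE < new_len) {
		return(1);
	} else if (old_len > new_len + OLD_TOLERANCE) {
		return(-1);
	}

	return(0);
}
```

With unsigned operands, the form `old_len + OLD_TOLERANCE < new_len` cannot wrap around when new_len is smaller than the tolerance, whereas `new_len - OLD_TOLERANCE` could. That is presumably why the comparison was rearranged, although the commit message does not say so.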
...@@ -1021,6 +1018,7 @@ buf_LRU_old_init(void) ...@@ -1021,6 +1018,7 @@ buf_LRU_old_init(void)
while (bpage != NULL) { while (bpage != NULL) {
ut_ad(bpage->in_LRU_list); ut_ad(bpage->in_LRU_list);
ut_ad(buf_page_in_file(bpage));
buf_page_set_old(bpage, TRUE); buf_page_set_old(bpage, TRUE);
bpage = UT_LIST_GET_NEXT(LRU, bpage); bpage = UT_LIST_GET_NEXT(LRU, bpage);
} }
...@@ -1075,16 +1073,19 @@ buf_LRU_remove_block( ...@@ -1075,16 +1073,19 @@ buf_LRU_remove_block(
if (UNIV_UNLIKELY(bpage == buf_pool->LRU_old)) { if (UNIV_UNLIKELY(bpage == buf_pool->LRU_old)) {
/* Below: the previous block is guaranteed to exist, because /* Below: the previous block is guaranteed to exist,
the LRU_old pointer is only allowed to differ by the because the LRU_old pointer is only allowed to differ
tolerance value from strict 3/8 of the LRU list length. */ by BUF_LRU_OLD_TOLERANCE from strict
buf_LRU_old_ratio/BUF_LRU_OLD_RATIO_DIV of the LRU
list length. */
buf_page_t* prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
buf_pool->LRU_old = UT_LIST_GET_PREV(LRU, bpage); ut_a(prev_bpage);
ut_a(buf_pool->LRU_old);
#ifdef UNIV_LRU_DEBUG #ifdef UNIV_LRU_DEBUG
ut_a(!buf_pool->LRU_old->old); ut_a(!prev_bpage->old);
#endif /* UNIV_LRU_DEBUG */ #endif /* UNIV_LRU_DEBUG */
buf_page_set_old(buf_pool->LRU_old, TRUE); buf_pool->LRU_old = prev_bpage;
buf_page_set_old(prev_bpage, TRUE);
buf_pool->LRU_old_len++; buf_pool->LRU_old_len++;
} }
...@@ -1149,39 +1150,25 @@ buf_LRU_add_block_to_end_low( ...@@ -1149,39 +1150,25 @@ buf_LRU_add_block_to_end_low(
/*=========================*/ /*=========================*/
buf_page_t* bpage) /*!< in: control block */ buf_page_t* bpage) /*!< in: control block */
{ {
buf_page_t* last_bpage;
ut_ad(buf_pool); ut_ad(buf_pool);
ut_ad(bpage); ut_ad(bpage);
ut_ad(buf_pool_mutex_own()); ut_ad(buf_pool_mutex_own());
ut_a(buf_page_in_file(bpage)); ut_a(buf_page_in_file(bpage));
last_bpage = UT_LIST_GET_LAST(buf_pool->LRU);
if (last_bpage) {
bpage->LRU_position = last_bpage->LRU_position;
} else {
bpage->LRU_position = buf_pool_clock_tic();
}
ut_ad(!bpage->in_LRU_list); ut_ad(!bpage->in_LRU_list);
UT_LIST_ADD_LAST(LRU, buf_pool->LRU, bpage); UT_LIST_ADD_LAST(LRU, buf_pool->LRU, bpage);
ut_d(bpage->in_LRU_list = TRUE); ut_d(bpage->in_LRU_list = TRUE);
buf_page_set_old(bpage, TRUE); buf_page_set_old(bpage, TRUE);
if (UT_LIST_GET_LEN(buf_pool->LRU) >= BUF_LRU_OLD_MIN_LEN) {
buf_pool->LRU_old_len++;
}
if (UT_LIST_GET_LEN(buf_pool->LRU) > BUF_LRU_OLD_MIN_LEN) { if (UT_LIST_GET_LEN(buf_pool->LRU) > BUF_LRU_OLD_MIN_LEN) {
ut_ad(buf_pool->LRU_old); ut_ad(buf_pool->LRU_old);
/* Adjust the length of the old block list if necessary */ /* Adjust the length of the old block list if necessary */
buf_pool->LRU_old_len++;
buf_LRU_old_adjust_len(); buf_LRU_old_adjust_len();
} else if (UT_LIST_GET_LEN(buf_pool->LRU) == BUF_LRU_OLD_MIN_LEN) { } else if (UT_LIST_GET_LEN(buf_pool->LRU) == BUF_LRU_OLD_MIN_LEN) {
...@@ -1189,6 +1176,7 @@ buf_LRU_add_block_to_end_low( ...@@ -1189,6 +1176,7 @@ buf_LRU_add_block_to_end_low(
/* The LRU list is now long enough for LRU_old to become /* The LRU list is now long enough for LRU_old to become
defined: init it */ defined: init it */
buf_pool->LRU_old_len++;
buf_LRU_old_init(); buf_LRU_old_init();
} }
...@@ -1222,7 +1210,6 @@ buf_LRU_add_block_low( ...@@ -1222,7 +1210,6 @@ buf_LRU_add_block_low(
UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, bpage); UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, bpage);
bpage->LRU_position = buf_pool_clock_tic();
bpage->freed_page_clock = buf_pool->freed_page_clock; bpage->freed_page_clock = buf_pool->freed_page_clock;
} else { } else {
#ifdef UNIV_LRU_DEBUG #ifdef UNIV_LRU_DEBUG
...@@ -1237,11 +1224,6 @@ buf_LRU_add_block_low( ...@@ -1237,11 +1224,6 @@ buf_LRU_add_block_low(
UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, buf_pool->LRU_old, UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, buf_pool->LRU_old,
bpage); bpage);
buf_pool->LRU_old_len++; buf_pool->LRU_old_len++;
/* We copy the LRU position field of the previous block
to the new block */
bpage->LRU_position = (buf_pool->LRU_old)->LRU_position;
} }
ut_d(bpage->in_LRU_list = TRUE); ut_d(bpage->in_LRU_list = TRUE);
...@@ -1295,6 +1277,9 @@ buf_LRU_make_block_young( ...@@ -1295,6 +1277,9 @@ buf_LRU_make_block_young(
/*=====================*/ /*=====================*/
buf_page_t* bpage) /*!< in: control block */ buf_page_t* bpage) /*!< in: control block */
{ {
ut_ad(buf_pool_mutex_own());
buf_pool->stat.n_pages_made_young++;
buf_LRU_remove_block(bpage); buf_LRU_remove_block(bpage);
buf_LRU_add_block_low(bpage, FALSE); buf_LRU_add_block_low(bpage, FALSE);
} }
...@@ -1847,6 +1832,48 @@ buf_LRU_block_free_hashed_page( ...@@ -1847,6 +1832,48 @@ buf_LRU_block_free_hashed_page(
buf_LRU_block_free_non_file_page(block); buf_LRU_block_free_non_file_page(block);
} }
/**********************************************************************//**
Updates buf_LRU_old_ratio.
@return updated old_pct */
UNIV_INTERN
uint
buf_LRU_old_ratio_update(
/*=====================*/
uint old_pct,/*!< in: Reserve this percentage of
the buffer pool for "old" blocks. */
ibool adjust) /*!< in: TRUE=adjust the LRU list;
FALSE=just assign buf_LRU_old_ratio
during the initialization of InnoDB */
{
uint ratio;
ratio = old_pct * BUF_LRU_OLD_RATIO_DIV / 100;
if (ratio < BUF_LRU_OLD_RATIO_MIN) {
ratio = BUF_LRU_OLD_RATIO_MIN;
} else if (ratio > BUF_LRU_OLD_RATIO_MAX) {
ratio = BUF_LRU_OLD_RATIO_MAX;
}
if (adjust) {
buf_pool_mutex_enter();
if (ratio != buf_LRU_old_ratio) {
buf_LRU_old_ratio = ratio;
if (UT_LIST_GET_LEN(buf_pool->LRU)
>= BUF_LRU_OLD_MIN_LEN) {
buf_LRU_old_adjust_len();
}
}
buf_pool_mutex_exit();
} else {
buf_LRU_old_ratio = ratio;
}
return(ratio * 100 / BUF_LRU_OLD_RATIO_DIV);
}
/********************************************************************//** /********************************************************************//**
Update the historical stats that we are collecting for LRU eviction Update the historical stats that we are collecting for LRU eviction
policy at the end of each interval. */ policy at the end of each interval. */
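The percentage-to-ratio conversion in buf_LRU_old_ratio_update() above can be sketched on its own, without the buffer pool mutex or the LRU adjustment. This is only a sketch under stated assumptions: the constants (OLD_RATIO_DIV = 1024, OLD_RATIO_MIN = 51, OLD_RATIO_MAX = 1024) and the function names pct_to_ratio() and ratio_to_pct() are hypothetical, as the diff does not show the real BUF_LRU_OLD_RATIO_* values.

```c
#include <assert.h>

/* Assumed values, for illustration only. */
enum {
	OLD_RATIO_DIV = 1024,
	OLD_RATIO_MIN = 51,
	OLD_RATIO_MAX = 1024
};

/* Hypothetical helper: convert a percentage to the internal ratio,
clamping it to [OLD_RATIO_MIN, OLD_RATIO_MAX] as the diff does. */
static unsigned
pct_to_ratio(unsigned old_pct)
{
	unsigned	ratio = old_pct * OLD_RATIO_DIV / 100;

	if (ratio < OLD_RATIO_MIN) {
		ratio = OLD_RATIO_MIN;
	} else if (ratio > OLD_RATIO_MAX) {
		ratio = OLD_RATIO_MAX;
	}

	return(ratio);
}

/* Hypothetical helper: convert the internal ratio back to the
percentage that is reported to the caller. */
static unsigned
ratio_to_pct(unsigned ratio)
{
	return(ratio * 100 / OLD_RATIO_DIV);
}
```

Both divisions truncate, so under these assumed constants a request for 37 per cent maps to the ratio 378 and is reported back as 36. Whether the real constants behave the same way cannot be determined from this diff.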
...@@ -1896,7 +1923,6 @@ buf_LRU_validate(void) ...@@ -1896,7 +1923,6 @@ buf_LRU_validate(void)
buf_block_t* block; buf_block_t* block;
ulint old_len; ulint old_len;
ulint new_len; ulint new_len;
ulint LRU_pos;
ut_ad(buf_pool); ut_ad(buf_pool);
buf_pool_mutex_enter(); buf_pool_mutex_enter();
...@@ -1905,7 +1931,11 @@ buf_LRU_validate(void) ...@@ -1905,7 +1931,11 @@ buf_LRU_validate(void)
ut_a(buf_pool->LRU_old); ut_a(buf_pool->LRU_old);
old_len = buf_pool->LRU_old_len; old_len = buf_pool->LRU_old_len;
new_len = 3 * (UT_LIST_GET_LEN(buf_pool->LRU) / 8); new_len = ut_min(UT_LIST_GET_LEN(buf_pool->LRU)
* buf_LRU_old_ratio / BUF_LRU_OLD_RATIO_DIV,
UT_LIST_GET_LEN(buf_pool->LRU)
- (BUF_LRU_OLD_TOLERANCE
+ BUF_LRU_NON_OLD_MIN_LEN));
ut_a(old_len >= new_len - BUF_LRU_OLD_TOLERANCE); ut_a(old_len >= new_len - BUF_LRU_OLD_TOLERANCE);
ut_a(old_len <= new_len + BUF_LRU_OLD_TOLERANCE); ut_a(old_len <= new_len + BUF_LRU_OLD_TOLERANCE);
} }
...@@ -1943,16 +1973,7 @@ buf_LRU_validate(void) ...@@ -1943,16 +1973,7 @@ buf_LRU_validate(void)
ut_a(buf_pool->LRU_old == bpage); ut_a(buf_pool->LRU_old == bpage);
} }
LRU_pos = buf_page_get_LRU_position(bpage);
bpage = UT_LIST_GET_NEXT(LRU, bpage); bpage = UT_LIST_GET_NEXT(LRU, bpage);
if (bpage) {
/* If the following assert fails, it may
not be an error: just the buf_pool clock
has wrapped around */
ut_a(LRU_pos >= buf_page_get_LRU_position(bpage));
}
} }
if (buf_pool->LRU_old) { if (buf_pool->LRU_old) {
...@@ -2000,9 +2021,6 @@ buf_LRU_print(void) ...@@ -2000,9 +2021,6 @@ buf_LRU_print(void)
ut_ad(buf_pool); ut_ad(buf_pool);
buf_pool_mutex_enter(); buf_pool_mutex_enter();
fprintf(stderr, "Pool ulint clock %lu\n",
(ulong) buf_pool->ulint_clock);
bpage = UT_LIST_GET_FIRST(buf_pool->LRU); bpage = UT_LIST_GET_FIRST(buf_pool->LRU);
while (bpage != NULL) { while (bpage != NULL) {
...@@ -2033,18 +2051,16 @@ buf_LRU_print(void) ...@@ -2033,18 +2051,16 @@ buf_LRU_print(void)
const byte* frame; const byte* frame;
case BUF_BLOCK_FILE_PAGE: case BUF_BLOCK_FILE_PAGE:
frame = buf_block_get_frame((buf_block_t*) bpage); frame = buf_block_get_frame((buf_block_t*) bpage);
fprintf(stderr, "\nLRU pos %lu type %lu" fprintf(stderr, "\ntype %lu"
" index id %lu\n", " index id %lu\n",
(ulong) buf_page_get_LRU_position(bpage),
(ulong) fil_page_get_type(frame), (ulong) fil_page_get_type(frame),
(ulong) ut_dulint_get_low( (ulong) ut_dulint_get_low(
btr_page_get_index_id(frame))); btr_page_get_index_id(frame)));
break; break;
case BUF_BLOCK_ZIP_PAGE: case BUF_BLOCK_ZIP_PAGE:
frame = bpage->zip.data; frame = bpage->zip.data;
fprintf(stderr, "\nLRU pos %lu type %lu size %lu" fprintf(stderr, "\ntype %lu size %lu"
" index id %lu\n", " index id %lu\n",
(ulong) buf_page_get_LRU_position(bpage),
(ulong) fil_page_get_type(frame), (ulong) fil_page_get_type(frame),
(ulong) buf_page_get_zip_size(bpage), (ulong) buf_page_get_zip_size(bpage),
(ulong) ut_dulint_get_low( (ulong) ut_dulint_get_low(
...@@ -2052,8 +2068,7 @@ buf_LRU_print(void) ...@@ -2052,8 +2068,7 @@ buf_LRU_print(void)
break; break;
default: default:
fprintf(stderr, "\nLRU pos %lu !state %lu!\n", fprintf(stderr, "\n!state %lu!\n",
(ulong) buf_page_get_LRU_position(bpage),
(ulong) buf_page_get_state(bpage)); (ulong) buf_page_get_state(bpage));
break; break;
} }
......
...@@ -38,14 +38,6 @@ Created 11/5/1995 Heikki Tuuri ...@@ -38,14 +38,6 @@ Created 11/5/1995 Heikki Tuuri
#include "srv0start.h" #include "srv0start.h"
#include "srv0srv.h" #include "srv0srv.h"
/** The size in blocks of the area where the random read-ahead algorithm counts
the accessed pages when deciding whether to read-ahead */
#define BUF_READ_AHEAD_RANDOM_AREA BUF_READ_AHEAD_AREA
/** There must be at least this many pages in buf_pool in the area to start
a random read-ahead */
#define BUF_READ_AHEAD_RANDOM_THRESHOLD (1 + BUF_READ_AHEAD_RANDOM_AREA / 2)
/** The linear read-ahead area size */ /** The linear read-ahead area size */
#define BUF_READ_AHEAD_LINEAR_AREA BUF_READ_AHEAD_AREA #define BUF_READ_AHEAD_LINEAR_AREA BUF_READ_AHEAD_AREA
...@@ -62,7 +54,8 @@ flag is cleared and the x-lock released by an i/o-handler thread. ...@@ -62,7 +54,8 @@ flag is cleared and the x-lock released by an i/o-handler thread.
@return 1 if a read request was queued, 0 if the page already resided @return 1 if a read request was queued, 0 if the page already resided
in buf_pool, or if the page is in the doublewrite buffer blocks in in buf_pool, or if the page is in the doublewrite buffer blocks in
which case it is never read into the pool, or if the tablespace does which case it is never read into the pool, or if the tablespace does
not exist or is being dropped */ not exist or is being dropped
@return 1 if read request is issued. 0 if it is not */
static static
ulint ulint
buf_read_page_low( buf_read_page_low(
...@@ -164,175 +157,14 @@ buf_read_page_low( ...@@ -164,175 +157,14 @@ buf_read_page_low(
return(1); return(1);
} }
/********************************************************************//**
Applies a random read-ahead in buf_pool if there are at least a threshold
value of accessed pages from the random read-ahead area. Does not read any
page, not even the one at the position (space, offset), if the read-ahead
mechanism is not activated. NOTE 1: the calling thread may own latches on
pages: to avoid deadlocks this function must be written such that it cannot
end up waiting for these latches! NOTE 2: the calling thread must want
access to the page given: this rule is set to prevent unintended read-aheads
performed by ibuf routines, a situation which could result in a deadlock if
the OS does not support asynchronous i/o.
@return number of page read requests issued; NOTE that if we read ibuf
pages, it may happen that the page at the given page number does not
get read even if we return a positive value! */
static
ulint
buf_read_ahead_random(
/*==================*/
ulint space, /*!< in: space id */
ulint zip_size,/*!< in: compressed page size in bytes, or 0 */
ulint offset) /*!< in: page number of a page which the current thread
wants to access */
{
ib_int64_t tablespace_version;
ulint recent_blocks = 0;
ulint count;
ulint LRU_recent_limit;
ulint ibuf_mode;
ulint low, high;
ulint err;
ulint i;
ulint buf_read_ahead_random_area;
/* We have currently disabled random readahead */
return(0);
if (srv_startup_is_before_trx_rollback_phase) {
/* No read-ahead to avoid thread deadlocks */
return(0);
}
if (ibuf_bitmap_page(zip_size, offset)
|| trx_sys_hdr_page(space, offset)) {
/* If it is an ibuf bitmap page or trx sys hdr, we do
no read-ahead, as that could break the ibuf page access
order */
return(0);
}
/* Remember the tablespace version before we ask the tablespace size
below: if DISCARD + IMPORT changes the actual .ibd file meanwhile, we
do not try to read outside the bounds of the tablespace! */
tablespace_version = fil_space_get_version(space);
buf_read_ahead_random_area = BUF_READ_AHEAD_RANDOM_AREA;
low = (offset / buf_read_ahead_random_area)
* buf_read_ahead_random_area;
high = (offset / buf_read_ahead_random_area + 1)
* buf_read_ahead_random_area;
if (high > fil_space_get_size(space)) {
high = fil_space_get_size(space);
}
/* Get the minimum LRU_position field value for an initial segment
of the LRU list, to determine which blocks have recently been added
to the start of the list. */
LRU_recent_limit = buf_LRU_get_recent_limit();
buf_pool_mutex_enter();
if (buf_pool->n_pend_reads
> buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
buf_pool_mutex_exit();
return(0);
}
/* Count how many blocks in the area have been recently accessed,
that is, reside near the start of the LRU list. */
for (i = low; i < high; i++) {
const buf_page_t* bpage = buf_page_hash_get(space, i);
if (bpage
&& buf_page_is_accessed(bpage)
&& (buf_page_get_LRU_position(bpage) > LRU_recent_limit)) {
recent_blocks++;
if (recent_blocks >= BUF_READ_AHEAD_RANDOM_THRESHOLD) {
buf_pool_mutex_exit();
goto read_ahead;
}
}
}
buf_pool_mutex_exit();
/* Do nothing */
return(0);
read_ahead:
/* Read all the suitable blocks within the area */
if (ibuf_inside()) {
ibuf_mode = BUF_READ_IBUF_PAGES_ONLY;
} else {
ibuf_mode = BUF_READ_ANY_PAGE;
}
count = 0;
for (i = low; i < high; i++) {
/* It is only sensible to do read-ahead in the non-sync aio
mode: hence FALSE as the first parameter */
if (!ibuf_bitmap_page(zip_size, i)) {
count += buf_read_page_low(
&err, FALSE,
ibuf_mode | OS_AIO_SIMULATED_WAKE_LATER,
space, zip_size, FALSE,
tablespace_version, i);
if (err == DB_TABLESPACE_DELETED) {
ut_print_timestamp(stderr);
fprintf(stderr,
" InnoDB: Warning: in random"
" readahead trying to access\n"
"InnoDB: tablespace %lu page %lu,\n"
"InnoDB: but the tablespace does not"
" exist or is just being dropped.\n",
(ulong) space, (ulong) i);
}
}
}
/* In simulated aio we wake the aio handler threads only after
queuing all aio requests, in native aio the following call does
nothing: */
os_aio_simulated_wake_handler_threads();
#ifdef UNIV_DEBUG
if (buf_debug_prints && (count > 0)) {
fprintf(stderr,
"Random read-ahead space %lu offset %lu pages %lu\n",
(ulong) space, (ulong) offset,
(ulong) count);
}
#endif /* UNIV_DEBUG */
++srv_read_ahead_rnd;
return(count);
}
/********************************************************************//** /********************************************************************//**
High-level function which reads a page asynchronously from a file to the High-level function which reads a page asynchronously from a file to the
buffer buf_pool if it is not already there. Sets the io_fix flag and sets buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread. Does a random read-ahead if it seems released by the i/o-handler thread.
sensible. @return TRUE if page has been read in, FALSE in case of failure */
@return number of page read requests issued: this can be greater than
1 if read-ahead occurred */
UNIV_INTERN UNIV_INTERN
ulint ibool
buf_read_page( buf_read_page(
/*==========*/ /*==========*/
ulint space, /*!< in: space id */ ulint space, /*!< in: space id */
...@@ -341,20 +173,17 @@ buf_read_page( ...@@ -341,20 +173,17 @@ buf_read_page(
{ {
ib_int64_t tablespace_version; ib_int64_t tablespace_version;
ulint count; ulint count;
ulint count2;
ulint err; ulint err;
tablespace_version = fil_space_get_version(space); tablespace_version = fil_space_get_version(space);
count = buf_read_ahead_random(space, zip_size, offset);
/* We do the i/o in the synchronous aio mode to save thread /* We do the i/o in the synchronous aio mode to save thread
switches: hence TRUE */ switches: hence TRUE */
count2 = buf_read_page_low(&err, TRUE, BUF_READ_ANY_PAGE, space, count = buf_read_page_low(&err, TRUE, BUF_READ_ANY_PAGE, space,
zip_size, FALSE, zip_size, FALSE,
tablespace_version, offset); tablespace_version, offset);
srv_buf_pool_reads+= count2; srv_buf_pool_reads += count;
if (err == DB_TABLESPACE_DELETED) { if (err == DB_TABLESPACE_DELETED) {
ut_print_timestamp(stderr); ut_print_timestamp(stderr);
fprintf(stderr, fprintf(stderr,
...@@ -371,7 +200,7 @@ buf_read_page( ...@@ -371,7 +200,7 @@ buf_read_page(
/* Increment number of I/O operations used for LRU policy. */ /* Increment number of I/O operations used for LRU policy. */
buf_LRU_stat_inc_io(); buf_LRU_stat_inc_io();
return(count + count2); return(count > 0);
} }
/********************************************************************//** /********************************************************************//**
...@@ -498,9 +327,17 @@ buf_read_ahead_linear( ...@@ -498,9 +327,17 @@ buf_read_ahead_linear(
fail_count++; fail_count++;
} else if (pred_bpage) { } else if (pred_bpage) {
int res = (ut_ulint_cmp( /* Note that buf_page_is_accessed() returns
buf_page_get_LRU_position(bpage), the time of the first access. If some blocks
buf_page_get_LRU_position(pred_bpage))); of the extent existed in the buffer pool at
the time of a linear access pattern, the first
access times may be nonmonotonic, even though
the latest access times were linear. The
threshold (srv_read_ahead_factor) should help
a little against this. */
int res = ut_ulint_cmp(
buf_page_is_accessed(bpage),
buf_page_is_accessed(pred_bpage));
/* Accesses not in the right order */ /* Accesses not in the right order */
if (res != 0 && res != asc_or_desc) { if (res != 0 && res != asc_or_desc) {
fail_count++; fail_count++;
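The ordering test in this hunk now compares first-access timestamps instead of LRU positions. The sketch below shows that idea on a plain array. It is a simplified illustration, not the buf_read_ahead_linear() code: the function names cmp_ulong() and count_order_failures() are hypothetical, a timestamp of 0 is taken to mean "never accessed", and counting such pages as failures is an assumption, because the hunk does not show that branch's condition.

```c
#include <assert.h>
#include <stddef.h>

/* Three-way comparison in the style of ut_ulint_cmp():
1 if a > b, 0 if a == b, -1 if a < b. */
static int
cmp_ulong(unsigned long a, unsigned long b)
{
	return((a > b) - (a < b));
}

/* Hypothetical helper: count the pages whose first-access time
breaks the expected order. asc_or_desc is 1 for ascending access
times and -1 for descending ones. */
static unsigned
count_order_failures(
	const unsigned long*	first_access,
	size_t			n,
	int			asc_or_desc)
{
	unsigned	fail_count = 0;
	unsigned long	pred = 0;
	int		have_pred = 0;
	size_t		i;

	assert(asc_or_desc == 1 || asc_or_desc == -1);

	for (i = 0; i < n; i++) {
		unsigned long	t = first_access[i];

		if (t == 0) {
			/* Assumption: a page that was never accessed
			counts as a failure. */
			fail_count++;
			continue;
		}

		if (have_pred) {
			int	res = cmp_ulong(t, pred);

			/* Equal timestamps are tolerated; only the
			wrong direction is a failure. */
			if (res != 0 && res != asc_or_desc) {
				fail_count++;
			}
		}

		pred = t;
		have_pred = 1;
	}

	return(fail_count);
}
```

A caller would compare the returned count against a threshold before deciding to read ahead. As the comment in the diff notes, first-access times may be nonmonotonic even when the latest accesses were linear, so some failures have to be tolerated.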
...@@ -643,7 +480,7 @@ buf_read_ahead_linear( ...@@ -643,7 +480,7 @@ buf_read_ahead_linear(
LRU policy decision. */ LRU policy decision. */
buf_LRU_stat_inc_io(); buf_LRU_stat_inc_io();
++srv_read_ahead_seq; buf_pool->stat.n_ra_pages_read += count;
return(count); return(count);
} }
......
...@@ -82,9 +82,10 @@ static char dict_ibfk[] = "_ibfk_"; ...@@ -82,9 +82,10 @@ static char dict_ibfk[] = "_ibfk_";
/*******************************************************************//** /*******************************************************************//**
Tries to find column names for the index and sets the col field of the Tries to find column names for the index and sets the col field of the
index. */ index.
@return TRUE if the column names were found */
static static
void ibool
dict_index_find_cols( dict_index_find_cols(
/*=================*/ /*=================*/
dict_table_t* table, /*!< in: table */ dict_table_t* table, /*!< in: table */
...@@ -1431,7 +1432,7 @@ add_field_size: ...@@ -1431,7 +1432,7 @@ add_field_size:
/**********************************************************************//** /**********************************************************************//**
Adds an index to the dictionary cache. Adds an index to the dictionary cache.
@return DB_SUCCESS or DB_TOO_BIG_RECORD */ @return DB_SUCCESS, DB_TOO_BIG_RECORD, or DB_CORRUPTION */
UNIV_INTERN UNIV_INTERN
ulint ulint
dict_index_add_to_cache( dict_index_add_to_cache(
...@@ -1457,7 +1458,10 @@ dict_index_add_to_cache( ...@@ -1457,7 +1458,10 @@ dict_index_add_to_cache(
ut_a(!dict_index_is_clust(index) ut_a(!dict_index_is_clust(index)
|| UT_LIST_GET_LEN(table->indexes) == 0); || UT_LIST_GET_LEN(table->indexes) == 0);
dict_index_find_cols(table, index); if (!dict_index_find_cols(table, index)) {
return(DB_CORRUPTION);
}
/* Build the cache internal representation of the index, /* Build the cache internal representation of the index,
containing also the added system fields */ containing also the added system fields */
...@@ -1665,9 +1669,10 @@ dict_index_remove_from_cache( ...@@ -1665,9 +1669,10 @@ dict_index_remove_from_cache(
/*******************************************************************//** /*******************************************************************//**
Tries to find column names for the index and sets the col field of the Tries to find column names for the index and sets the col field of the
index. */ index.
@return TRUE if the column names were found */
static static
void ibool
dict_index_find_cols( dict_index_find_cols(
/*=================*/ /*=================*/
dict_table_t* table, /*!< in: table */ dict_table_t* table, /*!< in: table */
...@@ -1692,17 +1697,21 @@ dict_index_find_cols( ...@@ -1692,17 +1697,21 @@ dict_index_find_cols(
} }
} }
#ifdef UNIV_DEBUG
/* It is an error not to find a matching column. */ /* It is an error not to find a matching column. */
fputs("InnoDB: Error: no matching column for ", stderr); fputs("InnoDB: Error: no matching column for ", stderr);
ut_print_name(stderr, NULL, FALSE, field->name); ut_print_name(stderr, NULL, FALSE, field->name);
fputs(" in ", stderr); fputs(" in ", stderr);
dict_index_name_print(stderr, NULL, index); dict_index_name_print(stderr, NULL, index);
fputs("!\n", stderr); fputs("!\n", stderr);
ut_error; #endif /* UNIV_DEBUG */
return(FALSE);
found: found:
; ;
} }
return(TRUE);
} }
#endif /* !UNIV_HOTBACKUP */ #endif /* !UNIV_HOTBACKUP */
......
...@@ -72,6 +72,7 @@ with this program; if not, write to the Free Software Foundation, Inc., ...@@ -72,6 +72,7 @@ with this program; if not, write to the Free Software Foundation, Inc.,
/* Include necessary InnoDB headers */ /* Include necessary InnoDB headers */
extern "C" { extern "C" {
#include "univ.i" #include "univ.i"
#include "buf0lru.h"
#include "btr0sea.h" #include "btr0sea.h"
#include "os0file.h" #include "os0file.h"
#include "os0thread.h" #include "os0thread.h"
...@@ -152,6 +153,10 @@ static ulong innobase_write_io_threads; ...@@ -152,6 +153,10 @@ static ulong innobase_write_io_threads;
static long long innobase_buffer_pool_size, innobase_log_file_size; static long long innobase_buffer_pool_size, innobase_log_file_size;
/** Percentage of the buffer pool to reserve for 'old' blocks.
Connected to buf_LRU_old_ratio. */
static uint innobase_old_blocks_pct;
/* The default values for the following char* start-up parameters /* The default values for the following char* start-up parameters
are determined in innobase_init below: */ are determined in innobase_init below: */
...@@ -490,10 +495,10 @@ static SHOW_VAR innodb_status_variables[]= { ...@@ -490,10 +495,10 @@ static SHOW_VAR innodb_status_variables[]= {
(char*) &export_vars.innodb_buffer_pool_pages_misc, SHOW_LONG}, (char*) &export_vars.innodb_buffer_pool_pages_misc, SHOW_LONG},
{"buffer_pool_pages_total", {"buffer_pool_pages_total",
(char*) &export_vars.innodb_buffer_pool_pages_total, SHOW_LONG}, (char*) &export_vars.innodb_buffer_pool_pages_total, SHOW_LONG},
{"buffer_pool_read_ahead_rnd", {"buffer_pool_read_ahead",
(char*) &export_vars.innodb_buffer_pool_read_ahead_rnd, SHOW_LONG}, (char*) &export_vars.innodb_buffer_pool_read_ahead, SHOW_LONG},
{"buffer_pool_read_ahead_seq", {"buffer_pool_read_ahead_evicted",
(char*) &export_vars.innodb_buffer_pool_read_ahead_seq, SHOW_LONG}, (char*) &export_vars.innodb_buffer_pool_read_ahead_evicted, SHOW_LONG},
{"buffer_pool_read_requests", {"buffer_pool_read_requests",
(char*) &export_vars.innodb_buffer_pool_read_requests, SHOW_LONG}, (char*) &export_vars.innodb_buffer_pool_read_requests, SHOW_LONG},
{"buffer_pool_reads", {"buffer_pool_reads",
...@@ -2204,6 +2209,9 @@ innobase_change_buffering_inited_ok: ...@@ -2204,6 +2209,9 @@ innobase_change_buffering_inited_ok:
ut_a(0 == strcmp(my_charset_latin1.name, "latin1_swedish_ci")); ut_a(0 == strcmp(my_charset_latin1.name, "latin1_swedish_ci"));
srv_latin1_ordering = my_charset_latin1.sort_order; srv_latin1_ordering = my_charset_latin1.sort_order;
innobase_old_blocks_pct = buf_LRU_old_ratio_update(
innobase_old_blocks_pct, FALSE);
innobase_commit_concurrency_init_default(); innobase_commit_concurrency_init_default();
/* Since we in this module access directly the fields of a trx /* Since we in this module access directly the fields of a trx
...@@ -9610,6 +9618,25 @@ innodb_adaptive_hash_index_update( ...@@ -9610,6 +9618,25 @@ innodb_adaptive_hash_index_update(
} }
} }
/****************************************************************//**
Update the system variable innodb_old_blocks_pct using the "saved"
value. This function is registered as a callback with MySQL. */
static
void
innodb_old_blocks_pct_update(
/*=========================*/
THD* thd, /*!< in: thread handle */
struct st_mysql_sys_var* var, /*!< in: pointer to
system variable */
void* var_ptr,/*!< out: where the
formal string goes */
const void* save) /*!< in: immediate result
from check function */
{
innobase_old_blocks_pct = buf_LRU_old_ratio_update(
*static_cast<const uint*>(save), TRUE);
}
/*************************************************************//** /*************************************************************//**
Check if it is a valid value of innodb_change_buffering. This function is Check if it is a valid value of innodb_change_buffering. This function is
registered as a callback with MySQL. registered as a callback with MySQL.
...@@ -9847,7 +9874,7 @@ static MYSQL_SYSVAR_ULONG(concurrency_tickets, srv_n_free_tickets_to_enter, ...@@ -9847,7 +9874,7 @@ static MYSQL_SYSVAR_ULONG(concurrency_tickets, srv_n_free_tickets_to_enter,
NULL, NULL, 500L, 1L, ~0L, 0); NULL, NULL, 500L, 1L, ~0L, 0);
static MYSQL_SYSVAR_LONG(file_io_threads, innobase_file_io_threads, static MYSQL_SYSVAR_LONG(file_io_threads, innobase_file_io_threads,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY, PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY | PLUGIN_VAR_NOSYSVAR,
"Number of file I/O threads in InnoDB.", "Number of file I/O threads in InnoDB.",
NULL, NULL, 4, 4, 64, 0); NULL, NULL, 4, 4, 64, 0);
...@@ -9886,6 +9913,18 @@ static MYSQL_SYSVAR_LONG(mirrored_log_groups, innobase_mirrored_log_groups, ...@@ -9886,6 +9913,18 @@ static MYSQL_SYSVAR_LONG(mirrored_log_groups, innobase_mirrored_log_groups,
"Number of identical copies of log groups we keep for the database. Currently this should be set to 1.", "Number of identical copies of log groups we keep for the database. Currently this should be set to 1.",
NULL, NULL, 1, 1, 10, 0); NULL, NULL, 1, 1, 10, 0);
static MYSQL_SYSVAR_UINT(old_blocks_pct, innobase_old_blocks_pct,
PLUGIN_VAR_RQCMDARG,
"Percentage of the buffer pool to reserve for 'old' blocks.",
NULL, innodb_old_blocks_pct_update, 100 * 3 / 8, 5, 95, 0);
static MYSQL_SYSVAR_UINT(old_blocks_time, buf_LRU_old_threshold_ms,
PLUGIN_VAR_RQCMDARG,
"Move blocks to the 'new' end of the buffer pool if the first access"
" was at least this many milliseconds ago."
" The timeout is disabled if 0 (the default).",
NULL, NULL, 0, 0, UINT_MAX32, 0);
static MYSQL_SYSVAR_LONG(open_files, innobase_open_files, static MYSQL_SYSVAR_LONG(open_files, innobase_open_files,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY, PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
"How many files at the maximum InnoDB keeps open at the same time.", "How many files at the maximum InnoDB keeps open at the same time.",
...@@ -9984,6 +10023,8 @@ static struct st_mysql_sys_var* innobase_system_variables[]= { ...@@ -9984,6 +10023,8 @@ static struct st_mysql_sys_var* innobase_system_variables[]= {
MYSQL_SYSVAR(adaptive_flushing), MYSQL_SYSVAR(adaptive_flushing),
MYSQL_SYSVAR(max_purge_lag), MYSQL_SYSVAR(max_purge_lag),
MYSQL_SYSVAR(mirrored_log_groups), MYSQL_SYSVAR(mirrored_log_groups),
MYSQL_SYSVAR(old_blocks_pct),
MYSQL_SYSVAR(old_blocks_time),
MYSQL_SYSVAR(open_files), MYSQL_SYSVAR(open_files),
MYSQL_SYSVAR(rollback_on_timeout), MYSQL_SYSVAR(rollback_on_timeout),
MYSQL_SYSVAR(stats_on_metadata), MYSQL_SYSVAR(stats_on_metadata),
......
...@@ -707,15 +707,6 @@ buf_page_belongs_to_unzip_LRU( ...@@ -707,15 +707,6 @@ buf_page_belongs_to_unzip_LRU(
/*==========================*/ /*==========================*/
const buf_page_t* bpage) /*!< in: pointer to control block */ const buf_page_t* bpage) /*!< in: pointer to control block */
__attribute__((pure)); __attribute__((pure));
/*********************************************************************//**
Determine the approximate LRU list position of a block.
@return LRU list position */
UNIV_INLINE
ulint
buf_page_get_LRU_position(
/*======================*/
const buf_page_t* bpage) /*!< in: control block */
__attribute__((pure));
/*********************************************************************//** /*********************************************************************//**
Gets the mutex of a block. Gets the mutex of a block.
...@@ -816,22 +807,22 @@ buf_page_set_old( ...@@ -816,22 +807,22 @@ buf_page_set_old(
buf_page_t* bpage, /*!< in/out: control block */ buf_page_t* bpage, /*!< in/out: control block */
ibool old); /*!< in: old */ ibool old); /*!< in: old */
/*********************************************************************//** /*********************************************************************//**
Determine if a block has been accessed in the buffer pool. Determine the time of last access of a block in the buffer pool.
@return TRUE if accessed */ @return ut_time_ms() at the time of last access, 0 if not accessed */
UNIV_INLINE UNIV_INLINE
ibool unsigned
buf_page_is_accessed( buf_page_is_accessed(
/*=================*/ /*=================*/
const buf_page_t* bpage) /*!< in: control block */ const buf_page_t* bpage) /*!< in: control block */
__attribute__((pure)); __attribute__((nonnull, pure));
/*********************************************************************//** /*********************************************************************//**
Flag a block accessed. */ Flag a block accessed. */
UNIV_INLINE UNIV_INLINE
void void
buf_page_set_accessed( buf_page_set_accessed(
/*==================*/ /*==================*/
buf_page_t* bpage, /*!< in/out: control block */ buf_page_t* bpage) /*!< in/out: control block */
ibool accessed); /*!< in: accessed */ __attribute__((nonnull));
/*********************************************************************//** /*********************************************************************//**
Gets the buf_block_t handle of a buffered file block if an uncompressed Gets the buf_block_t handle of a buffered file block if an uncompressed
page frame exists, or NULL. page frame exists, or NULL.
...@@ -1017,14 +1008,6 @@ buf_block_hash_get( ...@@ -1017,14 +1008,6 @@ buf_block_hash_get(
/*===============*/ /*===============*/
ulint space, /*!< in: space id */ ulint space, /*!< in: space id */
ulint offset);/*!< in: offset of the page within space */ ulint offset);/*!< in: offset of the page within space */
/*******************************************************************//**
Increments the pool clock by one and returns its new value. Remember that
in the 32 bit version the clock wraps around at 4 billion!
@return new clock value */
UNIV_INLINE
ulint
buf_pool_clock_tic(void);
/*====================*/
/*********************************************************************//** /*********************************************************************//**
Gets the current length of the free list of buffer blocks. Gets the current length of the free list of buffer blocks.
@return length of the free list */ @return length of the free list */
...@@ -1064,16 +1047,10 @@ struct buf_page_struct{ ...@@ -1064,16 +1047,10 @@ struct buf_page_struct{
flushed to disk, this tells the flushed to disk, this tells the
flush_type. flush_type.
@see enum buf_flush */ @see enum buf_flush */
unsigned accessed:1; /*!< TRUE if the page has been accessed
while in the buffer pool: read-ahead
may read in pages which have not been
accessed yet; a thread is allowed to
read this for heuristic purposes
without holding any mutex or latch */
unsigned io_fix:2; /*!< type of pending I/O operation; unsigned io_fix:2; /*!< type of pending I/O operation;
also protected by buf_pool_mutex also protected by buf_pool_mutex
@see enum buf_io_fix */ @see enum buf_io_fix */
unsigned buf_fix_count:24;/*!< count of how manyfold this block unsigned buf_fix_count:25;/*!< count of how manyfold this block
is currently bufferfixed */ is currently bufferfixed */
/* @} */ /* @} */
#endif /* !UNIV_HOTBACKUP */ #endif /* !UNIV_HOTBACKUP */
...@@ -1152,17 +1129,7 @@ struct buf_page_struct{ ...@@ -1152,17 +1129,7 @@ struct buf_page_struct{
#endif /* UNIV_DEBUG */ #endif /* UNIV_DEBUG */
unsigned old:1; /*!< TRUE if the block is in the old unsigned old:1; /*!< TRUE if the block is in the old
blocks in the LRU list */ blocks in the LRU list */
unsigned LRU_position:31;/*!< value which monotonically unsigned freed_page_clock:31;/*!< the value of
decreases (or may stay
constant if old==TRUE) toward
the end of the LRU list, if
buf_pool->ulint_clock has not
wrapped around: NOTE that this
value can only be used in
heuristic algorithms, because
of the possibility of a
wrap-around! */
unsigned freed_page_clock:32;/*!< the value of
buf_pool->freed_page_clock buf_pool->freed_page_clock
when this block was the last when this block was the last
time put to the head of the time put to the head of the
...@@ -1170,6 +1137,9 @@ struct buf_page_struct{ ...@@ -1170,6 +1137,9 @@ struct buf_page_struct{
to read this for heuristic to read this for heuristic
purposes without holding any purposes without holding any
mutex or latch */ mutex or latch */
unsigned access_time:32; /*!< time of first access, or
0 if the block was never accessed
in the buffer pool */
/* @} */ /* @} */
# ifdef UNIV_DEBUG_FILE_ACCESSES # ifdef UNIV_DEBUG_FILE_ACCESSES
ibool file_page_was_freed; ibool file_page_was_freed;
...@@ -1314,6 +1284,31 @@ Compute the hash fold value for blocks in buf_pool->zip_hash. */ ...@@ -1314,6 +1284,31 @@ Compute the hash fold value for blocks in buf_pool->zip_hash. */
#define BUF_POOL_ZIP_FOLD_BPAGE(b) BUF_POOL_ZIP_FOLD((buf_block_t*) (b)) #define BUF_POOL_ZIP_FOLD_BPAGE(b) BUF_POOL_ZIP_FOLD((buf_block_t*) (b))
/* @} */ /* @} */
/** @brief The buffer pool statistics structure. */
struct buf_pool_stat_struct{
ulint n_page_gets; /*!< number of page gets performed;
also successful searches through
the adaptive hash index are
counted as page gets; this field
is NOT protected by the buffer
pool mutex */
ulint n_pages_read; /*!< number read operations */
ulint n_pages_written;/*!< number write operations */
ulint n_pages_created;/*!< number of pages created
in the pool with no read */
ulint n_ra_pages_read;/*!< number of pages read in
as part of read ahead */
ulint n_ra_pages_evicted;/*!< number of read ahead
pages that are evicted without
being accessed */
ulint n_pages_made_young; /*!< number of pages made young, in
calls to buf_LRU_make_block_young() */
ulint n_pages_not_made_young; /*!< number of pages not made
young because the first access
was not long enough ago, in
buf_page_peek_if_too_old() */
};
/** @brief The buffer pool structure. /** @brief The buffer pool structure.
NOTE! The definition appears here only for other modules of this NOTE! The definition appears here only for other modules of this
...@@ -1338,28 +1333,16 @@ struct buf_pool_struct{ ...@@ -1338,28 +1333,16 @@ struct buf_pool_struct{
ulint n_pend_reads; /*!< number of pending read operations */ ulint n_pend_reads; /*!< number of pending read operations */
ulint n_pend_unzip; /*!< number of pending decompressions */ ulint n_pend_unzip; /*!< number of pending decompressions */
time_t last_printout_time; /*!< when buf_print was last time time_t last_printout_time;
/*!< when buf_print_io was last time
called */ called */
ulint n_pages_read; /*!< number read operations */ buf_pool_stat_t stat; /*!< current statistics */
ulint n_pages_written;/*!< number write operations */ buf_pool_stat_t old_stat; /*!< old statistics */
ulint n_pages_created;/*!< number of pages created
in the pool with no read */
ulint n_page_gets; /*!< number of page gets performed;
also successful searches through
the adaptive hash index are
counted as page gets; this field
is NOT protected by the buffer
pool mutex */
ulint n_page_gets_old;/*!< n_page_gets when buf_print was
last time called: used to calculate
hit rate */
ulint n_pages_read_old;/*!< n_pages_read when buf_print was
last time called */
ulint n_pages_written_old;/*!< number write operations */
ulint n_pages_created_old;/*!< number of pages created in
the pool with no read */
/* @} */ /* @} */
/** @name Page flushing algorithm fields */ /** @name Page flushing algorithm fields */
/* @{ */ /* @{ */
UT_LIST_BASE_NODE_T(buf_page_t) flush_list; UT_LIST_BASE_NODE_T(buf_page_t) flush_list;
...@@ -1375,10 +1358,6 @@ struct buf_pool_struct{ ...@@ -1375,10 +1358,6 @@ struct buf_pool_struct{
/*!< this is in the set state /*!< this is in the set state
when there is no flush batch when there is no flush batch
of the given type running */ of the given type running */
ulint ulint_clock; /*!< a sequence number used to count
time. NOTE! This counter wraps
around at 4 billion (if ulint ==
32 bits)! */
ulint freed_page_clock;/*!< a sequence number used ulint freed_page_clock;/*!< a sequence number used
to count the number of buffer to count the number of buffer
blocks removed from the end of blocks removed from the end of
...@@ -1402,9 +1381,11 @@ struct buf_pool_struct{ ...@@ -1402,9 +1381,11 @@ struct buf_pool_struct{
block list */ block list */
UT_LIST_BASE_NODE_T(buf_page_t) LRU; UT_LIST_BASE_NODE_T(buf_page_t) LRU;
/*!< base node of the LRU list */ /*!< base node of the LRU list */
buf_page_t* LRU_old; /*!< pointer to the about 3/8 oldest buf_page_t* LRU_old; /*!< pointer to the about
blocks in the LRU list; NULL if LRU buf_LRU_old_ratio/BUF_LRU_OLD_RATIO_DIV
length less than BUF_LRU_OLD_MIN_LEN; oldest blocks in the LRU list;
NULL if LRU length less than
BUF_LRU_OLD_MIN_LEN;
NOTE: when LRU_old != NULL, its length NOTE: when LRU_old != NULL, its length
should always equal LRU_old_len */ should always equal LRU_old_len */
ulint LRU_old_len; /*!< length of the LRU list from ulint LRU_old_len; /*!< length of the LRU list from
......
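The new `stat`/`old_stat` pair in `buf_pool_struct` suggests that interval figures are computed as the difference between the current counters and a snapshot taken at the previous printout. A minimal sketch of that idea, assuming a cut-down statistics struct; `stat_sketch` and `young_per_1000_gets` are hypothetical names for illustration, not part of the commit:

```c
/* Hypothetical, reduced version of buf_pool_stat_struct: only the
counters this sketch needs. */
struct stat_sketch {
	unsigned long	n_page_gets;		/* page gets performed */
	unsigned long	n_pages_made_young;	/* pages moved to the "new" end */
	unsigned long	n_pages_not_made_young;	/* pages left in the "old" sublist */
};

/* Pages made young per 1000 page gets since the previous snapshot.
Returns 0 when no page gets happened in the interval. The multiplication
can overflow on very large deltas; a real implementation would need to
guard against that. */
static unsigned long
young_per_1000_gets(
	const struct stat_sketch*	cur,	/* current counters */
	const struct stat_sketch*	old)	/* snapshot from last printout */
{
	unsigned long	gets = cur->n_page_gets - old->n_page_gets;
	unsigned long	young = cur->n_pages_made_young
		- old->n_pages_made_young;

	if (gets == 0) {
		return(0);
	}

	return(1000 * young / gets);
}
```

After reporting, the caller would copy the current counters into the snapshot so that the next interval starts from zero.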
...@@ -72,9 +72,24 @@ buf_page_peek_if_too_old( ...@@ -72,9 +72,24 @@ buf_page_peek_if_too_old(
/*=====================*/ /*=====================*/
const buf_page_t* bpage) /*!< in: block to make younger */ const buf_page_t* bpage) /*!< in: block to make younger */
{ {
return(buf_pool->freed_page_clock if (buf_LRU_old_threshold_ms && bpage->old) {
>= buf_page_get_freed_page_clock(bpage) unsigned access_time = buf_page_is_accessed(bpage);
+ 1 + (buf_pool->curr_size / 4));
if (access_time && ut_time_ms() - access_time
>= buf_LRU_old_threshold_ms) {
return(TRUE);
}
buf_pool->stat.n_pages_not_made_young++;
return(FALSE);
} else {
/* FIXME: bpage->freed_page_clock is 31 bits */
return((buf_pool->freed_page_clock & ~(1 << 31))
> bpage->freed_page_clock
+ (buf_pool->curr_size
* (BUF_LRU_OLD_RATIO_DIV - buf_LRU_old_ratio)
/ (BUF_LRU_OLD_RATIO_DIV * 4)));
}
} }
/*********************************************************************//** /*********************************************************************//**
...@@ -118,22 +133,6 @@ buf_pool_get_oldest_modification(void) ...@@ -118,22 +133,6 @@ buf_pool_get_oldest_modification(void)
return(lsn); return(lsn);
} }
/*******************************************************************//**
Increments the buf_pool clock by one and returns its new value. Remember
that in the 32 bit version the clock wraps around at 4 billion!
@return new clock value */
UNIV_INLINE
ulint
buf_pool_clock_tic(void)
/*====================*/
{
ut_ad(buf_pool_mutex_own());
buf_pool->ulint_clock++;
return(buf_pool->ulint_clock);
}
#endif /* !UNIV_HOTBACKUP */ #endif /* !UNIV_HOTBACKUP */
/*********************************************************************//** /*********************************************************************//**
...@@ -279,21 +278,6 @@ buf_page_belongs_to_unzip_LRU( ...@@ -279,21 +278,6 @@ buf_page_belongs_to_unzip_LRU(
&& buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE); && buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);
} }
/*********************************************************************//**
Determine the approximate LRU list position of a block.
@return LRU list position */
UNIV_INLINE
ulint
buf_page_get_LRU_position(
/*======================*/
const buf_page_t* bpage) /*!< in: control block */
{
ut_ad(buf_page_in_file(bpage));
ut_ad(buf_pool_mutex_own());
return(bpage->LRU_position);
}
/*********************************************************************//** /*********************************************************************//**
Gets the mutex of a block. Gets the mutex of a block.
@return pointer to mutex protecting bpage */ @return pointer to mutex protecting bpage */
...@@ -487,17 +471,17 @@ buf_page_set_old( ...@@ -487,17 +471,17 @@ buf_page_set_old(
} }
/*********************************************************************//** /*********************************************************************//**
Determine if a block has been accessed in the buffer pool. Determine the time of last access of a block in the buffer pool.
@return TRUE if accessed */ @return ut_time_ms() at the time of last access, 0 if not accessed */
UNIV_INLINE UNIV_INLINE
ibool unsigned
buf_page_is_accessed( buf_page_is_accessed(
/*=================*/ /*=================*/
const buf_page_t* bpage) /*!< in: control block */ const buf_page_t* bpage) /*!< in: control block */
{ {
ut_ad(buf_page_in_file(bpage)); ut_ad(buf_page_in_file(bpage));
return(bpage->accessed); return(bpage->access_time);
} }
/*********************************************************************//** /*********************************************************************//**
...@@ -506,13 +490,15 @@ UNIV_INLINE ...@@ -506,13 +490,15 @@ UNIV_INLINE
void void
buf_page_set_accessed( buf_page_set_accessed(
/*==================*/ /*==================*/
buf_page_t* bpage, /*!< in/out: control block */ buf_page_t* bpage) /*!< in/out: control block */
ibool accessed) /*!< in: accessed */
{ {
ut_a(buf_page_in_file(bpage)); ut_a(buf_page_in_file(bpage));
ut_ad(mutex_own(buf_page_get_mutex(bpage))); ut_ad(buf_pool_mutex_own());
bpage->accessed = accessed; if (!bpage->access_time) {
/* Make this the time of the first access. */
bpage->access_time = ut_time_ms();
}
} }
/*********************************************************************//** /*********************************************************************//**
......
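The timed branch of `buf_page_peek_if_too_old()` above compares the current millisecond clock with the stored first-access time. A minimal sketch of just that comparison, assuming 32-bit millisecond timestamps and taking the current time as a parameter so the logic can be exercised in isolation; `first_access_old_enough` is a hypothetical name, and the `freed_page_clock` branch is deliberately left out:

```c
#include <stdint.h>

/* Returns nonzero if the block was first accessed at least
threshold_ms milliseconds before now_ms. An access_time of 0 means
"never accessed", which never qualifies. */
static int
first_access_old_enough(
	uint32_t	access_time,	/* ms timestamp of first access, or 0 */
	uint32_t	now_ms,		/* current ms timestamp */
	uint32_t	threshold_ms)	/* minimum age in ms */
{
	if (access_time == 0) {
		return(0);
	}

	/* Unsigned subtraction is modulo 2^32, so the elapsed time is
	still correct if the clock wrapped once since access_time. */
	return((uint32_t) (now_ms - access_time) >= threshold_ms);
}
```

Because the timestamp may wrap around, a result like this is only suitable for heuristics, which matches the caveat on `ut_time_ms()`.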
...@@ -69,7 +69,7 @@ These are low-level functions ...@@ -69,7 +69,7 @@ These are low-level functions
#########################################################################*/ #########################################################################*/
/** Minimum LRU list length for which the LRU_old pointer is defined */ /** Minimum LRU list length for which the LRU_old pointer is defined */
#define BUF_LRU_OLD_MIN_LEN 80 #define BUF_LRU_OLD_MIN_LEN 512 /* 8 megabytes of 16k pages */
/** Maximum LRU list search length in buf_flush_LRU_recommendation() */ /** Maximum LRU list search length in buf_flush_LRU_recommendation() */
#define BUF_LRU_FREE_SEARCH_LEN (5 + 2 * BUF_READ_AHEAD_AREA) #define BUF_LRU_FREE_SEARCH_LEN (5 + 2 * BUF_READ_AHEAD_AREA)
...@@ -84,15 +84,6 @@ void ...@@ -84,15 +84,6 @@ void
buf_LRU_invalidate_tablespace( buf_LRU_invalidate_tablespace(
/*==========================*/ /*==========================*/
ulint id); /*!< in: space id */ ulint id); /*!< in: space id */
/******************************************************************//**
Gets the minimum LRU_position field for the blocks in an initial segment
(determined by BUF_LRU_INITIAL_RATIO) of the LRU list. The limit is not
guaranteed to be precise, because the ulint_clock may wrap around.
@return the limit; zero if could not determine it */
UNIV_INTERN
ulint
buf_LRU_get_recent_limit(void);
/*==========================*/
/********************************************************************//** /********************************************************************//**
Insert a compressed block into buf_pool->zip_clean in the LRU order. */ Insert a compressed block into buf_pool->zip_clean in the LRU order. */
UNIV_INTERN UNIV_INTERN
...@@ -201,6 +192,18 @@ void ...@@ -201,6 +192,18 @@ void
buf_LRU_make_block_old( buf_LRU_make_block_old(
/*===================*/ /*===================*/
buf_page_t* bpage); /*!< in: control block */ buf_page_t* bpage); /*!< in: control block */
/**********************************************************************//**
Updates buf_LRU_old_ratio.
@return updated old_pct */
UNIV_INTERN
uint
buf_LRU_old_ratio_update(
/*=====================*/
uint old_pct,/*!< in: Reserve this percentage of
the buffer pool for "old" blocks. */
ibool adjust);/*!< in: TRUE=adjust the LRU list;
FALSE=just assign buf_LRU_old_ratio
during the initialization of InnoDB */
/********************************************************************//** /********************************************************************//**
Update the historical stats that we are collecting for LRU eviction Update the historical stats that we are collecting for LRU eviction
policy at the end of each interval. */ policy at the end of each interval. */
...@@ -227,6 +230,35 @@ buf_LRU_print(void); ...@@ -227,6 +230,35 @@ buf_LRU_print(void);
/*===============*/ /*===============*/
#endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG || UNIV_BUF_DEBUG */ #endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG || UNIV_BUF_DEBUG */
/** @name Heuristics for detecting index scan @{ */
/** Reserve this much/BUF_LRU_OLD_RATIO_DIV of the buffer pool for
"old" blocks. Protected by buf_pool_mutex. */
extern uint buf_LRU_old_ratio;
/** The denominator of buf_LRU_old_ratio. */
#define BUF_LRU_OLD_RATIO_DIV 1024
/** Maximum value of buf_LRU_old_ratio.
@see buf_LRU_old_adjust_len
@see buf_LRU_old_ratio_update */
#define BUF_LRU_OLD_RATIO_MAX BUF_LRU_OLD_RATIO_DIV
/** Minimum value of buf_LRU_old_ratio.
@see buf_LRU_old_adjust_len
@see buf_LRU_old_ratio_update
The minimum must exceed
(BUF_LRU_OLD_TOLERANCE + 5) * BUF_LRU_OLD_RATIO_DIV / BUF_LRU_OLD_MIN_LEN. */
#define BUF_LRU_OLD_RATIO_MIN 51
#if BUF_LRU_OLD_RATIO_MIN >= BUF_LRU_OLD_RATIO_MAX
# error "BUF_LRU_OLD_RATIO_MIN >= BUF_LRU_OLD_RATIO_MAX"
#endif
#if BUF_LRU_OLD_RATIO_MAX > BUF_LRU_OLD_RATIO_DIV
# error "BUF_LRU_OLD_RATIO_MAX > BUF_LRU_OLD_RATIO_DIV"
#endif
/** Move blocks to "new" LRU list only if the first access was at
least this many milliseconds ago. Not protected by any mutex or latch. */
extern uint buf_LRU_old_threshold_ms;
/* @} */
/** @brief Statistics for selecting the LRU list for eviction. /** @brief Statistics for selecting the LRU list for eviction.
These statistics are not 'of' LRU but 'for' LRU. We keep count of I/O These statistics are not 'of' LRU but 'for' LRU. We keep count of I/O
......
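The body of `buf_LRU_old_ratio_update()` is not shown in this part of the diff. As a rough illustration of how a percentage could map onto the fixed-point `buf_LRU_old_ratio` with the constants declared above, here is a sketch under that assumption; `old_pct_to_ratio` and `old_len_target` are hypothetical helpers, and the real code may round or clamp differently:

```c
#define BUF_LRU_OLD_RATIO_DIV	1024
#define BUF_LRU_OLD_RATIO_MAX	BUF_LRU_OLD_RATIO_DIV
#define BUF_LRU_OLD_RATIO_MIN	51

/* Convert a percentage into a ratio with denominator
BUF_LRU_OLD_RATIO_DIV, clamped to [MIN, MAX]. Hypothetical helper. */
static unsigned
old_pct_to_ratio(
	unsigned	old_pct)	/* requested percentage */
{
	unsigned	ratio = old_pct * BUF_LRU_OLD_RATIO_DIV / 100;

	if (ratio < BUF_LRU_OLD_RATIO_MIN) {
		ratio = BUF_LRU_OLD_RATIO_MIN;
	} else if (ratio > BUF_LRU_OLD_RATIO_MAX) {
		ratio = BUF_LRU_OLD_RATIO_MAX;
	}

	return(ratio);
}

/* Approximate number of "old" blocks for an LRU list of lru_len
blocks. The 64-bit intermediate avoids overflow on long lists. */
static unsigned
old_len_target(
	unsigned	lru_len,	/* length of the LRU list */
	unsigned	ratio)		/* ratio from old_pct_to_ratio() */
{
	return((unsigned) ((unsigned long long) lru_len * ratio
			   / BUF_LRU_OLD_RATIO_DIV));
}
```

With these constants, 5 per cent maps to exactly 51, which is why `BUF_LRU_OLD_RATIO_MIN` has that value, and the former 3/8 ratio corresponds to 384/1024.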
...@@ -33,12 +33,10 @@ Created 11/5/1995 Heikki Tuuri ...@@ -33,12 +33,10 @@ Created 11/5/1995 Heikki Tuuri
High-level function which reads a page asynchronously from a file to the High-level function which reads a page asynchronously from a file to the
buffer buf_pool if it is not already there. Sets the io_fix flag and sets buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread. Does a random read-ahead if it seems released by the i/o-handler thread.
sensible. @return TRUE if page has been read in, FALSE in case of failure */
@return number of page read requests issued: this can be greater than
1 if read-ahead occurred */
UNIV_INTERN UNIV_INTERN
ulint ibool
buf_read_page( buf_read_page(
/*==========*/ /*==========*/
ulint space, /*!< in: space id */ ulint space, /*!< in: space id */
......
...@@ -34,6 +34,8 @@ typedef struct buf_block_struct buf_block_t; ...@@ -34,6 +34,8 @@ typedef struct buf_block_struct buf_block_t;
typedef struct buf_chunk_struct buf_chunk_t; typedef struct buf_chunk_struct buf_chunk_t;
/** Buffer pool comprising buf_chunk_t */ /** Buffer pool comprising buf_chunk_t */
typedef struct buf_pool_struct buf_pool_t; typedef struct buf_pool_struct buf_pool_t;
/** Buffer pool statistics struct */
typedef struct buf_pool_stat_struct buf_pool_stat_t;
/** A buffer frame. @see page_t */ /** A buffer frame. @see page_t */
typedef byte buf_frame_t; typedef byte buf_frame_t;
......
...@@ -712,7 +712,7 @@ dict_index_find_on_id_low( ...@@ -712,7 +712,7 @@ dict_index_find_on_id_low(
dulint id); /*!< in: index id */ dulint id); /*!< in: index id */
/**********************************************************************//** /**********************************************************************//**
Adds an index to the dictionary cache. Adds an index to the dictionary cache.
@return DB_SUCCESS or error code */ @return DB_SUCCESS, DB_TOO_BIG_RECORD, or DB_CORRUPTION */
UNIV_INTERN UNIV_INTERN
ulint ulint
dict_index_add_to_cache( dict_index_add_to_cache(
......
...@@ -285,7 +285,7 @@ os_fast_mutex_free( ...@@ -285,7 +285,7 @@ os_fast_mutex_free(
/**********************************************************//** /**********************************************************//**
Atomic compare-and-swap and increment for InnoDB. */ Atomic compare-and-swap and increment for InnoDB. */
#ifdef HAVE_GCC_ATOMIC_BUILTINS #ifdef HAVE_IB_GCC_ATOMIC_BUILTINS
/**********************************************************//** /**********************************************************//**
Returns true if swapped, ptr is pointer to target, old_val is value to Returns true if swapped, ptr is pointer to target, old_val is value to
compare to, new_val is the value to swap in. */ compare to, new_val is the value to swap in. */
...@@ -377,7 +377,7 @@ InterlockedExchange() operates on LONG, and the LONG will be ...@@ -377,7 +377,7 @@ InterlockedExchange() operates on LONG, and the LONG will be
clobbered */ clobbered */
# define os_atomic_test_and_set_byte(ptr, new_val) \ # define os_atomic_test_and_set_byte(ptr, new_val) \
((byte) InterlockedExchange(ptr, new_val)) ((byte) InterlockedExchange(ptr, new_val))
#endif /* HAVE_GCC_ATOMIC_BUILTINS */ #endif /* HAVE_IB_GCC_ATOMIC_BUILTINS */
#ifndef UNIV_NONINL #ifndef UNIV_NONINL
#include "os0sync.ic" #include "os0sync.ic"
......
...@@ -315,10 +315,6 @@ extern ulint srv_buf_pool_flushed; ...@@ -315,10 +315,6 @@ extern ulint srv_buf_pool_flushed;
/** Number of buffer pool reads that led to the /** Number of buffer pool reads that led to the
reading of a disk page */ reading of a disk page */
extern ulint srv_buf_pool_reads; extern ulint srv_buf_pool_reads;
/** Number of sequential read-aheads */
extern ulint srv_read_ahead_seq;
/** Number of random read-aheads */
extern ulint srv_read_ahead_rnd;
/** Status variables to be passed to MySQL */ /** Status variables to be passed to MySQL */
typedef struct export_var_struct export_struc; typedef struct export_var_struct export_struc;
...@@ -605,13 +601,13 @@ struct export_var_struct{ ...@@ -605,13 +601,13 @@ struct export_var_struct{
#ifdef UNIV_DEBUG #ifdef UNIV_DEBUG
ulint innodb_buffer_pool_pages_latched; /*!< Latched pages */ ulint innodb_buffer_pool_pages_latched; /*!< Latched pages */
#endif /* UNIV_DEBUG */ #endif /* UNIV_DEBUG */
ulint innodb_buffer_pool_read_requests; /*!< buf_pool->n_page_gets */ ulint innodb_buffer_pool_read_requests; /*!< buf_pool->stat.n_page_gets */
ulint innodb_buffer_pool_reads; /*!< srv_buf_pool_reads */ ulint innodb_buffer_pool_reads; /*!< srv_buf_pool_reads */
ulint innodb_buffer_pool_wait_free; /*!< srv_buf_pool_wait_free */ ulint innodb_buffer_pool_wait_free; /*!< srv_buf_pool_wait_free */
ulint innodb_buffer_pool_pages_flushed; /*!< srv_buf_pool_flushed */ ulint innodb_buffer_pool_pages_flushed; /*!< srv_buf_pool_flushed */
ulint innodb_buffer_pool_write_requests;/*!< srv_buf_pool_write_requests */ ulint innodb_buffer_pool_write_requests;/*!< srv_buf_pool_write_requests */
ulint innodb_buffer_pool_read_ahead_seq;/*!< srv_read_ahead_seq */ ulint innodb_buffer_pool_read_ahead; /*!< srv_read_ahead */
ulint innodb_buffer_pool_read_ahead_rnd;/*!< srv_read_ahead_rnd */ ulint innodb_buffer_pool_read_ahead_evicted;/*!< srv_read_ahead evicted*/
ulint innodb_dblwr_pages_written; /*!< srv_dblwr_pages_written */ ulint innodb_dblwr_pages_written; /*!< srv_dblwr_pages_written */
ulint innodb_dblwr_writes; /*!< srv_dblwr_writes */ ulint innodb_dblwr_writes; /*!< srv_dblwr_writes */
ibool innodb_have_atomic_builtins; /*!< HAVE_ATOMIC_BUILTINS */ ibool innodb_have_atomic_builtins; /*!< HAVE_ATOMIC_BUILTINS */
...@@ -623,9 +619,9 @@ struct export_var_struct{ ...@@ -623,9 +619,9 @@ struct export_var_struct{
ulint innodb_os_log_pending_writes; /*!< srv_os_log_pending_writes */ ulint innodb_os_log_pending_writes; /*!< srv_os_log_pending_writes */
ulint innodb_os_log_pending_fsyncs; /*!< fil_n_pending_log_flushes */ ulint innodb_os_log_pending_fsyncs; /*!< fil_n_pending_log_flushes */
ulint innodb_page_size; /*!< UNIV_PAGE_SIZE */ ulint innodb_page_size; /*!< UNIV_PAGE_SIZE */
ulint innodb_pages_created; /*!< buf_pool->n_pages_created */ ulint innodb_pages_created; /*!< buf_pool->stat.n_pages_created */
ulint innodb_pages_read; /*!< buf_pool->n_pages_read */ ulint innodb_pages_read; /*!< buf_pool->stat.n_pages_read */
ulint innodb_pages_written; /*!< buf_pool->n_pages_written */ ulint innodb_pages_written; /*!< buf_pool->stat.n_pages_written */
ulint innodb_row_lock_waits; /*!< srv_n_lock_wait_count */ ulint innodb_row_lock_waits; /*!< srv_n_lock_wait_count */
ulint innodb_row_lock_current_waits; /*!< srv_n_lock_wait_current_count */ ulint innodb_row_lock_current_waits; /*!< srv_n_lock_wait_current_count */
ib_int64_t innodb_row_lock_time; /*!< srv_n_lock_wait_time ib_int64_t innodb_row_lock_time; /*!< srv_n_lock_wait_time
......
...@@ -125,11 +125,11 @@ if we are compiling on Windows. */ ...@@ -125,11 +125,11 @@ if we are compiling on Windows. */
# include <sched.h> # include <sched.h>
# endif # endif
# if defined(HAVE_GCC_ATOMIC_BUILTINS) || defined(HAVE_SOLARIS_ATOMICS) \ # if defined(HAVE_IB_GCC_ATOMIC_BUILTINS) || defined(HAVE_SOLARIS_ATOMICS) \
|| defined(HAVE_WINDOWS_ATOMICS) || defined(HAVE_WINDOWS_ATOMICS)
/* If atomics are defined we use them in InnoDB mutex implementation */ /* If atomics are defined we use them in InnoDB mutex implementation */
# define HAVE_ATOMIC_BUILTINS # define HAVE_ATOMIC_BUILTINS
# endif /* (HAVE_GCC_ATOMIC_BUILTINS) || (HAVE_SOLARIS_ATOMICS) # endif /* (HAVE_IB_GCC_ATOMIC_BUILTINS) || (HAVE_SOLARIS_ATOMICS)
|| (HAVE_WINDOWS_ATOMICS) */ || (HAVE_WINDOWS_ATOMICS) */
/* For InnoDB rw_locks to work with atomics we need the thread_id /* For InnoDB rw_locks to work with atomics we need the thread_id
......
...@@ -239,6 +239,15 @@ ullint ...@@ -239,6 +239,15 @@ ullint
ut_time_us( ut_time_us(
/*=======*/ /*=======*/
ullint* tloc); /*!< out: us since epoch, if non-NULL */ ullint* tloc); /*!< out: us since epoch, if non-NULL */
/**********************************************************//**
Returns the number of milliseconds since some epoch. The
value may wrap around. It should only be used for heuristic
purposes.
@return ms since epoch */
UNIV_INTERN
uint
ut_time_ms(void);
/*============*/
/**********************************************************//** /**********************************************************//**
Returns the difference of two times in seconds. Returns the difference of two times in seconds.
......
...@@ -63,6 +63,57 @@ MYSQL_PLUGIN_ACTIONS(innodb_plugin, [ ...@@ -63,6 +63,57 @@ MYSQL_PLUGIN_ACTIONS(innodb_plugin, [
;; ;;
esac esac
AC_SUBST(INNODB_DYNAMIC_CFLAGS) AC_SUBST(INNODB_DYNAMIC_CFLAGS)
AC_MSG_CHECKING(whether GCC atomic builtins are available)
AC_TRY_RUN(
[
int main()
{
long x;
long y;
long res;
char c;
x = 10;
y = 123;
res = __sync_bool_compare_and_swap(&x, x, y);
if (!res || x != y) {
return(1);
}
x = 10;
y = 123;
res = __sync_bool_compare_and_swap(&x, x + 1, y);
if (res || x != 10) {
return(1);
}
x = 10;
y = 123;
res = __sync_add_and_fetch(&x, y);
if (res != 123 + 10 || x != 123 + 10) {
return(1);
}
c = 10;
res = __sync_lock_test_and_set(&c, 123);
if (res != 10 || c != 123) {
return(1);
}
return(0);
}
],
[
AC_DEFINE([HAVE_IB_GCC_ATOMIC_BUILTINS], [1],
[GCC atomic builtins are available])
AC_MSG_RESULT(yes)
],
[
AC_MSG_RESULT(no)
]
)
AC_MSG_CHECKING(whether pthread_t can be used by GCC atomic builtins) AC_MSG_CHECKING(whether pthread_t can be used by GCC atomic builtins)
AC_TRY_RUN( AC_TRY_RUN(
[ [
......
...@@ -60,9 +60,19 @@ Completed by Sunny Bains and Marko Makela ...@@ -60,9 +60,19 @@ Completed by Sunny Bains and Marko Makela
#ifdef UNIV_DEBUG #ifdef UNIV_DEBUG
/** Set these in order to enable debug printout. */ /** Set these in order to enable debug printout. */
/* @{ */ /* @{ */
/** Log the outcome of each row_merge_cmp() call, comparing records. */
static ibool row_merge_print_cmp; static ibool row_merge_print_cmp;
/** Log each record read from temporary file. */
static ibool row_merge_print_read; static ibool row_merge_print_read;
/** Log each record write to temporary file. */
static ibool row_merge_print_write; static ibool row_merge_print_write;
/** Log each row_merge_blocks() call, merging two blocks of records to
a bigger one. */
static ibool row_merge_print_block;
/** Log each block read from temporary file. */
static ibool row_merge_print_block_read;
/** Log each block read from temporary file. */
static ibool row_merge_print_block_write;
/* @} */ /* @} */
#endif /* UNIV_DEBUG */ #endif /* UNIV_DEBUG */
...@@ -110,7 +120,8 @@ typedef struct row_merge_buf_struct row_merge_buf_t; ...@@ -110,7 +120,8 @@ typedef struct row_merge_buf_struct row_merge_buf_t;
/** Information about temporary files used in merge sort */ /** Information about temporary files used in merge sort */
struct merge_file_struct { struct merge_file_struct {
int fd; /*!< file descriptor */ int fd; /*!< file descriptor */
ulint offset; /*!< file offset */ ulint offset; /*!< file offset (end of file) */
ib_uint64_t n_rec; /*!< number of records in the file */
}; };
/** Information about temporary files used in merge sort */ /** Information about temporary files used in merge sort */
...@@ -682,6 +693,13 @@ row_merge_read( ...@@ -682,6 +693,13 @@ row_merge_read(
ib_uint64_t ofs = ((ib_uint64_t) offset) * sizeof *buf; ib_uint64_t ofs = ((ib_uint64_t) offset) * sizeof *buf;
ibool success; ibool success;
#ifdef UNIV_DEBUG
if (row_merge_print_block_read) {
fprintf(stderr, "row_merge_read fd=%d ofs=%lu\n",
fd, (ulong) offset);
}
#endif /* UNIV_DEBUG */
success = os_file_read_no_error_handling(OS_FILE_FROM_FD(fd), buf, success = os_file_read_no_error_handling(OS_FILE_FROM_FD(fd), buf,
(ulint) (ofs & 0xFFFFFFFF), (ulint) (ofs & 0xFFFFFFFF),
(ulint) (ofs >> 32), (ulint) (ofs >> 32),
...@@ -709,6 +727,13 @@ row_merge_write( ...@@ -709,6 +727,13 @@ row_merge_write(
ib_uint64_t ofs = ((ib_uint64_t) offset) ib_uint64_t ofs = ((ib_uint64_t) offset)
* sizeof(row_merge_block_t); * sizeof(row_merge_block_t);
#ifdef UNIV_DEBUG
if (row_merge_print_block_write) {
fprintf(stderr, "row_merge_write fd=%d ofs=%lu\n",
fd, (ulong) offset);
}
#endif /* UNIV_DEBUG */
return(UNIV_LIKELY(os_file_write("(merge)", OS_FILE_FROM_FD(fd), buf, return(UNIV_LIKELY(os_file_write("(merge)", OS_FILE_FROM_FD(fd), buf,
(ulint) (ofs & 0xFFFFFFFF), (ulint) (ofs & 0xFFFFFFFF),
(ulint) (ofs >> 32), (ulint) (ofs >> 32),
...@@ -718,7 +743,7 @@ row_merge_write( ...@@ -718,7 +743,7 @@ row_merge_write(
/********************************************************************//** /********************************************************************//**
Read a merge record. Read a merge record.
@return pointer to next record, or NULL on I/O error or end of list */ @return pointer to next record, or NULL on I/O error or end of list */
static static __attribute__((nonnull))
const byte* const byte*
row_merge_read_rec( row_merge_read_rec(
/*===============*/ /*===============*/
...@@ -1070,7 +1095,7 @@ row_merge_cmp( ...@@ -1070,7 +1095,7 @@ row_merge_cmp(
Reads clustered index of the table and create temporary files Reads clustered index of the table and create temporary files
containing the index entries for the indexes to be built. containing the index entries for the indexes to be built.
@return DB_SUCCESS or error */ @return DB_SUCCESS or error */
static static __attribute__((nonnull))
ulint ulint
row_merge_read_clustered_index( row_merge_read_clustered_index(
/*===========================*/ /*===========================*/
...@@ -1233,6 +1258,7 @@ row_merge_read_clustered_index( ...@@ -1233,6 +1258,7 @@ row_merge_read_clustered_index(
if (UNIV_LIKELY if (UNIV_LIKELY
(row && row_merge_buf_add(buf, row, ext))) { (row && row_merge_buf_add(buf, row, ext))) {
file->n_rec++;
continue; continue;
} }
...@@ -1274,15 +1300,20 @@ err_exit: ...@@ -1274,15 +1300,20 @@ err_exit:
UNIV_MEM_INVALID(block[0], sizeof block[0]); UNIV_MEM_INVALID(block[0], sizeof block[0]);
merge_buf[i] = row_merge_buf_empty(buf); merge_buf[i] = row_merge_buf_empty(buf);
/* Try writing the record again, now that if (UNIV_LIKELY(row != NULL)) {
the buffer has been written out and emptied. */ /* Try writing the record again, now
that the buffer has been written out
and emptied. */
if (UNIV_UNLIKELY if (UNIV_UNLIKELY
(row && !row_merge_buf_add(buf, row, ext))) { (!row_merge_buf_add(buf, row, ext))) {
/* An empty buffer should have enough /* An empty buffer should have enough
room for at least one record. */ room for at least one record. */
ut_error; ut_error;
} }
file->n_rec++;
}
} }
mem_heap_empty(row_heap); mem_heap_empty(row_heap);
...@@ -1320,7 +1351,7 @@ func_exit: ...@@ -1320,7 +1351,7 @@ func_exit:
b2 = row_merge_write_rec(&block[2], &buf[2], b2, \ b2 = row_merge_write_rec(&block[2], &buf[2], b2, \
of->fd, &of->offset, \ of->fd, &of->offset, \
mrec##N, offsets##N); \ mrec##N, offsets##N); \
if (UNIV_UNLIKELY(!b2)) { \ if (UNIV_UNLIKELY(!b2 || ++of->n_rec > file->n_rec)) { \
goto corrupt; \ goto corrupt; \
} \ } \
b##N = row_merge_read_rec(&block[N], &buf[N], \ b##N = row_merge_read_rec(&block[N], &buf[N], \
...@@ -1336,14 +1367,14 @@ func_exit: ...@@ -1336,14 +1367,14 @@ func_exit:
} while (0) } while (0)
/*************************************************************//** /*************************************************************//**
Merge two blocks of linked lists on disk and write a bigger block. Merge two blocks of records on disk and write a bigger block.
@return DB_SUCCESS or error code */ @return DB_SUCCESS or error code */
static static
ulint ulint
row_merge_blocks( row_merge_blocks(
/*=============*/ /*=============*/
const dict_index_t* index, /*!< in: index being created */ const dict_index_t* index, /*!< in: index being created */
merge_file_t* file, /*!< in/out: file containing const merge_file_t* file, /*!< in: file containing
index entries */ index entries */
row_merge_block_t* block, /*!< in/out: 3 buffers */ row_merge_block_t* block, /*!< in/out: 3 buffers */
ulint* foffs0, /*!< in/out: offset of first ulint* foffs0, /*!< in/out: offset of first
...@@ -1366,6 +1397,17 @@ row_merge_blocks( ...@@ -1366,6 +1397,17 @@ row_merge_blocks(
ulint* offsets0;/* offsets of mrec0 */ ulint* offsets0;/* offsets of mrec0 */
ulint* offsets1;/* offsets of mrec1 */ ulint* offsets1;/* offsets of mrec1 */
#ifdef UNIV_DEBUG
if (row_merge_print_block) {
fprintf(stderr,
"row_merge_blocks fd=%d ofs=%lu + fd=%d ofs=%lu"
" = fd=%d ofs=%lu\n",
file->fd, (ulong) *foffs0,
file->fd, (ulong) *foffs1,
of->fd, (ulong) of->offset);
}
#endif /* UNIV_DEBUG */
heap = row_merge_heap_create(index, &offsets0, &offsets1); heap = row_merge_heap_create(index, &offsets0, &offsets1);
/* Write a record and read the next record. Split the output /* Write a record and read the next record. Split the output
...@@ -1437,17 +1479,88 @@ done1: ...@@ -1437,17 +1479,88 @@ done1:
return(b2 ? DB_SUCCESS : DB_CORRUPTION); return(b2 ? DB_SUCCESS : DB_CORRUPTION);
} }
/*************************************************************//**
Copy a block of index entries.
@return TRUE on success, FALSE on failure */
static __attribute__((nonnull))
ibool
row_merge_blocks_copy(
/*==================*/
const dict_index_t* index, /*!< in: index being created */
const merge_file_t* file, /*!< in: input file */
row_merge_block_t* block, /*!< in/out: 3 buffers */
ulint* foffs0, /*!< in/out: input file offset */
merge_file_t* of) /*!< in/out: output file */
{
mem_heap_t* heap; /*!< memory heap for offsets0, offsets1 */
mrec_buf_t buf[3]; /*!< buffer for handling
split mrec in block[] */
const byte* b0; /*!< pointer to block[0] */
byte* b2; /*!< pointer to block[2] */
const mrec_t* mrec0; /*!< merge rec, points to block[0] */
ulint* offsets0;/* offsets of mrec0 */
ulint* offsets1;/* dummy offsets */
#ifdef UNIV_DEBUG
if (row_merge_print_block) {
fprintf(stderr,
"row_merge_blocks_copy fd=%d ofs=%lu"
" = fd=%d ofs=%lu\n",
file->fd, (ulong) foffs0,
of->fd, (ulong) of->offset);
}
#endif /* UNIV_DEBUG */
heap = row_merge_heap_create(index, &offsets0, &offsets1);
/* Write a record and read the next record. Split the output
file in two halves, which can be merged on the following pass. */
if (!row_merge_read(file->fd, *foffs0, &block[0])) {
corrupt:
mem_heap_free(heap);
return(FALSE);
}
b0 = block[0];
b2 = block[2];
b0 = row_merge_read_rec(&block[0], &buf[0], b0, index, file->fd,
foffs0, &mrec0, offsets0);
if (UNIV_UNLIKELY(!b0 && mrec0)) {
goto corrupt;
}
if (mrec0) {
/* append all mrec0 to output */
for (;;) {
ROW_MERGE_WRITE_GET_NEXT(0, goto done0);
}
}
done0:
/* The file offset points to the beginning of the last page
that has been read. Update it to point to the next block. */
(*foffs0)++;
mem_heap_free(heap);
return(row_merge_write_eof(&block[2], b2, of->fd, &of->offset)
!= NULL);
}
/*************************************************************//** /*************************************************************//**
Merge disk files. Merge disk files.
@return DB_SUCCESS or error code */ @return DB_SUCCESS or error code */
static static __attribute__((nonnull))
ulint ulint
row_merge( row_merge(
/*======*/ /*======*/
const dict_index_t* index, /*!< in: index being created */ const dict_index_t* index, /*!< in: index being created */
merge_file_t* file, /*!< in/out: file containing merge_file_t* file, /*!< in/out: file containing
index entries */ index entries */
ulint half, /*!< in: half the file */ ulint* half, /*!< in/out: half the file */
row_merge_block_t* block, /*!< in/out: 3 buffers */ row_merge_block_t* block, /*!< in/out: 3 buffers */
int* tmpfd, /*!< in/out: temporary file handle */ int* tmpfd, /*!< in/out: temporary file handle */
TABLE* table) /*!< in/out: MySQL table, for TABLE* table) /*!< in/out: MySQL table, for
...@@ -1458,43 +1571,76 @@ row_merge( ...@@ -1458,43 +1571,76 @@ row_merge(
ulint foffs1; /*!< second input offset */ ulint foffs1; /*!< second input offset */
ulint error; /*!< error code */ ulint error; /*!< error code */
merge_file_t of; /*!< output file */ merge_file_t of; /*!< output file */
const ulint ihalf = *half;
/*!< half the input file */
ulint ohalf; /*!< half the output file */
UNIV_MEM_ASSERT_W(block[0], 3 * sizeof block[0]); UNIV_MEM_ASSERT_W(block[0], 3 * sizeof block[0]);
ut_ad(half > 0); ut_ad(ihalf > 0);
ut_ad(ihalf < file->offset);
of.fd = *tmpfd; of.fd = *tmpfd;
of.offset = 0; of.offset = 0;
of.n_rec = 0;
/* Merge blocks to the output file. */ /* Merge blocks to the output file. */
ohalf = 0;
foffs0 = 0; foffs0 = 0;
foffs1 = half; foffs1 = ihalf;
for (; foffs0 < ihalf && foffs1 < file->offset; foffs0++, foffs1++) {
ulint ahalf; /*!< arithmetic half the input file */
for (; foffs0 < half && foffs1 < file->offset; foffs0++, foffs1++) {
error = row_merge_blocks(index, file, block, error = row_merge_blocks(index, file, block,
&foffs0, &foffs1, &of, table); &foffs0, &foffs1, &of, table);
if (error != DB_SUCCESS) { if (error != DB_SUCCESS) {
return(error); return(error);
} }
/* Record the offset of the output file when
approximately half the output has been generated. In
this way, the next invocation of row_merge() will
spend most of the time in this loop. The initial
estimate is ohalf==0. */
ahalf = file->offset / 2;
ut_ad(ohalf <= of.offset);
/* Improve the estimate until reaching half the input
file size, or we can not get any closer to it. All
comparands should be non-negative when !(ohalf < ahalf)
because ohalf <= of.offset. */
if (ohalf < ahalf || of.offset - ahalf < ohalf - ahalf) {
ohalf = of.offset;
}
} }
/* Copy the last block, if there is one. */ /* Copy the last blocks, if there are any. */
while (foffs0 < half) {
if (!row_merge_read(file->fd, foffs0++, block) while (foffs0 < ihalf) {
|| !row_merge_write(of.fd, of.offset++, block)) { if (!row_merge_blocks_copy(index, file, block, &foffs0, &of)) {
return(DB_CORRUPTION); return(DB_CORRUPTION);
} }
} }
ut_ad(foffs0 == ihalf);
while (foffs1 < file->offset) { while (foffs1 < file->offset) {
if (!row_merge_read(file->fd, foffs1++, block) if (!row_merge_blocks_copy(index, file, block, &foffs1, &of)) {
|| !row_merge_write(of.fd, of.offset++, block)) {
return(DB_CORRUPTION); return(DB_CORRUPTION);
} }
} }
ut_ad(foffs1 == file->offset);
if (UNIV_UNLIKELY(of.n_rec != file->n_rec)) {
return(DB_CORRUPTION);
}
/* Swap file descriptors for the next pass. */ /* Swap file descriptors for the next pass. */
*tmpfd = file->fd; *tmpfd = file->fd;
*file = of; *file = of;
*half = ohalf;
UNIV_MEM_INVALID(block[0], 3 * sizeof block[0]); UNIV_MEM_INVALID(block[0], 3 * sizeof block[0]);
...@@ -1517,20 +1663,17 @@ row_merge_sort( ...@@ -1517,20 +1663,17 @@ row_merge_sort(
reporting erroneous key value reporting erroneous key value
if applicable */ if applicable */
{ {
ulint blksz; /*!< block size */ ulint half = file->offset / 2;
for (blksz = 1; blksz < file->offset; blksz *= 2) { do {
ulint half;
ulint error; ulint error;
ut_ad(ut_is_2pow(blksz)); error = row_merge(index, file, &half, block, tmpfd, table);
half = ut_2pow_round((file->offset + (blksz - 1)) / 2, blksz);
error = row_merge(index, file, half, block, tmpfd, table);
if (error != DB_SUCCESS) { if (error != DB_SUCCESS) {
return(error); return(error);
} }
} } while (half < file->offset && half > 0);
return(DB_SUCCESS); return(DB_SUCCESS);
} }
...@@ -1909,6 +2052,7 @@ row_merge_file_create( ...@@ -1909,6 +2052,7 @@ row_merge_file_create(
{ {
merge_file->fd = innobase_mysql_tmpfile(); merge_file->fd = innobase_mysql_tmpfile();
merge_file->offset = 0; merge_file->offset = 0;
merge_file->n_rec = 0;
} }
/*********************************************************************//** /*********************************************************************//**
......
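The loop added to row_merge() above keeps `ohalf` at the output offset closest to half the input size, so that the next pass can split the file there. The rule can be sketched in isolation as follows. This is a minimal sketch, not InnoDB code: `sketch_update_ohalf()` and `sketch_final_ohalf()` are hypothetical names, `ulint` is assumed to be a plain `unsigned long`, and offsets are treated as simple block counts.

```c
#include <stddef.h>

typedef unsigned long	ulint;	/* assumption: stand-in for InnoDB's ulint */

/* Hypothetical helper mirroring the condition in row_merge(): advance
the estimate while it is still below the arithmetic half (ahalf) of the
input. The caller guarantees ohalf <= of_offset, so once ohalf >= ahalf
both differences are non-negative and the second test cannot hold; the
estimate then stays where it is. */
static ulint
sketch_update_ohalf(ulint ohalf, ulint of_offset, ulint ahalf)
{
	if (ohalf < ahalf || of_offset - ahalf < ohalf - ahalf) {
		ohalf = of_offset;
	}

	return(ohalf);
}

/* Hypothetical driver: feed the output offsets observed after each
merged pair of blocks and return the final estimate. */
static ulint
sketch_final_ohalf(const ulint* of_offsets, size_t n, ulint in_blocks)
{
	ulint	ohalf = 0;		/* initial estimate, as in row_merge() */
	ulint	ahalf = in_blocks / 2;	/* arithmetic half of the input */
	size_t	i;

	for (i = 0; i < n; i++) {
		ohalf = sketch_update_ohalf(ohalf, of_offsets[i], ahalf);
	}

	return(ohalf);
}
```

With a 10-block input and output offsets 2, 4, 6, 8 after successive merges, the estimate stops at 6, the first output offset at or beyond half the input.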
...@@ -292,12 +292,6 @@ UNIV_INTERN ulint srv_buf_pool_flushed = 0; ...@@ -292,12 +292,6 @@ UNIV_INTERN ulint srv_buf_pool_flushed = 0;
reading of a disk page */ reading of a disk page */
UNIV_INTERN ulint srv_buf_pool_reads = 0; UNIV_INTERN ulint srv_buf_pool_reads = 0;
/** Number of sequential read-aheads */
UNIV_INTERN ulint srv_read_ahead_seq = 0;
/** Number of random read-aheads */
UNIV_INTERN ulint srv_read_ahead_rnd = 0;
/* structure to pass status variables to MySQL */ /* structure to pass status variables to MySQL */
UNIV_INTERN export_struc export_vars; UNIV_INTERN export_struc export_vars;
...@@ -464,8 +458,6 @@ static ulint srv_main_background_loops = 0; ...@@ -464,8 +458,6 @@ static ulint srv_main_background_loops = 0;
static ulint srv_main_flush_loops = 0; static ulint srv_main_flush_loops = 0;
/* Log writes involving flush. */ /* Log writes involving flush. */
static ulint srv_log_writes_and_flush = 0; static ulint srv_log_writes_and_flush = 0;
/* Log writes not including flush. */
static ulint srv_log_buffer_writes = 0;
/* This is only ever touched by the master thread. It records the /* This is only ever touched by the master thread. It records the
time when the last flush of log file has happened. The master time when the last flush of log file has happened. The master
...@@ -714,9 +706,8 @@ srv_print_master_thread_info( ...@@ -714,9 +706,8 @@ srv_print_master_thread_info(
srv_main_1_second_loops, srv_main_sleeps, srv_main_1_second_loops, srv_main_sleeps,
srv_main_10_second_loops, srv_main_background_loops, srv_main_10_second_loops, srv_main_background_loops,
srv_main_flush_loops); srv_main_flush_loops);
fprintf(file, "srv_master_thread log flush and writes: %lu " fprintf(file, "srv_master_thread log flush and writes: %lu\n",
" log writes only: %lu\n", srv_log_writes_and_flush);
srv_log_writes_and_flush, srv_log_buffer_writes);
} }
/*********************************************************************//** /*********************************************************************//**
...@@ -1877,14 +1868,16 @@ srv_export_innodb_status(void) ...@@ -1877,14 +1868,16 @@ srv_export_innodb_status(void)
export_vars.innodb_data_reads = os_n_file_reads; export_vars.innodb_data_reads = os_n_file_reads;
export_vars.innodb_data_writes = os_n_file_writes; export_vars.innodb_data_writes = os_n_file_writes;
export_vars.innodb_data_written = srv_data_written; export_vars.innodb_data_written = srv_data_written;
export_vars.innodb_buffer_pool_read_requests = buf_pool->n_page_gets; export_vars.innodb_buffer_pool_read_requests = buf_pool->stat.n_page_gets;
export_vars.innodb_buffer_pool_write_requests export_vars.innodb_buffer_pool_write_requests
= srv_buf_pool_write_requests; = srv_buf_pool_write_requests;
export_vars.innodb_buffer_pool_wait_free = srv_buf_pool_wait_free; export_vars.innodb_buffer_pool_wait_free = srv_buf_pool_wait_free;
export_vars.innodb_buffer_pool_pages_flushed = srv_buf_pool_flushed; export_vars.innodb_buffer_pool_pages_flushed = srv_buf_pool_flushed;
export_vars.innodb_buffer_pool_reads = srv_buf_pool_reads; export_vars.innodb_buffer_pool_reads = srv_buf_pool_reads;
export_vars.innodb_buffer_pool_read_ahead_rnd = srv_read_ahead_rnd; export_vars.innodb_buffer_pool_read_ahead
export_vars.innodb_buffer_pool_read_ahead_seq = srv_read_ahead_seq; = buf_pool->stat.n_ra_pages_read;
export_vars.innodb_buffer_pool_read_ahead_evicted
= buf_pool->stat.n_ra_pages_evicted;
export_vars.innodb_buffer_pool_pages_data export_vars.innodb_buffer_pool_pages_data
= UT_LIST_GET_LEN(buf_pool->LRU); = UT_LIST_GET_LEN(buf_pool->LRU);
export_vars.innodb_buffer_pool_pages_dirty export_vars.innodb_buffer_pool_pages_dirty
...@@ -1915,9 +1908,9 @@ srv_export_innodb_status(void) ...@@ -1915,9 +1908,9 @@ srv_export_innodb_status(void)
export_vars.innodb_log_writes = srv_log_writes; export_vars.innodb_log_writes = srv_log_writes;
export_vars.innodb_dblwr_pages_written = srv_dblwr_pages_written; export_vars.innodb_dblwr_pages_written = srv_dblwr_pages_written;
export_vars.innodb_dblwr_writes = srv_dblwr_writes; export_vars.innodb_dblwr_writes = srv_dblwr_writes;
export_vars.innodb_pages_created = buf_pool->n_pages_created; export_vars.innodb_pages_created = buf_pool->stat.n_pages_created;
export_vars.innodb_pages_read = buf_pool->n_pages_read; export_vars.innodb_pages_read = buf_pool->stat.n_pages_read;
export_vars.innodb_pages_written = buf_pool->n_pages_written; export_vars.innodb_pages_written = buf_pool->stat.n_pages_written;
export_vars.innodb_row_lock_waits = srv_n_lock_wait_count; export_vars.innodb_row_lock_waits = srv_n_lock_wait_count;
export_vars.innodb_row_lock_current_waits export_vars.innodb_row_lock_current_waits
= srv_n_lock_wait_current_count; = srv_n_lock_wait_current_count;
...@@ -2284,12 +2277,6 @@ srv_sync_log_buffer_in_background(void) ...@@ -2284,12 +2277,6 @@ srv_sync_log_buffer_in_background(void)
log_buffer_sync_in_background(TRUE); log_buffer_sync_in_background(TRUE);
srv_last_log_flush_time = current_time; srv_last_log_flush_time = current_time;
srv_log_writes_and_flush++; srv_log_writes_and_flush++;
} else {
/* Actually we don't need to write logs here.
We are just being extra safe here by forcing
the log buffer to log file. */
log_buffer_sync_in_background(FALSE);
srv_log_buffer_writes++;
} }
} }
...@@ -2340,8 +2327,8 @@ loop: ...@@ -2340,8 +2327,8 @@ loop:
srv_main_thread_op_info = "reserving kernel mutex"; srv_main_thread_op_info = "reserving kernel mutex";
n_ios_very_old = log_sys->n_log_ios + buf_pool->n_pages_read n_ios_very_old = log_sys->n_log_ios + buf_pool->stat.n_pages_read
+ buf_pool->n_pages_written; + buf_pool->stat.n_pages_written;
mutex_enter(&kernel_mutex); mutex_enter(&kernel_mutex);
/* Store the user activity counter at the start of this loop */ /* Store the user activity counter at the start of this loop */
...@@ -2361,8 +2348,8 @@ loop: ...@@ -2361,8 +2348,8 @@ loop:
skip_sleep = FALSE; skip_sleep = FALSE;
for (i = 0; i < 10; i++) { for (i = 0; i < 10; i++) {
n_ios_old = log_sys->n_log_ios + buf_pool->n_pages_read n_ios_old = log_sys->n_log_ios + buf_pool->stat.n_pages_read
+ buf_pool->n_pages_written; + buf_pool->stat.n_pages_written;
srv_main_thread_op_info = "sleeping"; srv_main_thread_op_info = "sleeping";
srv_main_1_second_loops++; srv_main_1_second_loops++;
...@@ -2401,8 +2388,8 @@ loop: ...@@ -2401,8 +2388,8 @@ loop:
n_pend_ios = buf_get_n_pending_ios() n_pend_ios = buf_get_n_pending_ios()
+ log_sys->n_pending_writes; + log_sys->n_pending_writes;
n_ios = log_sys->n_log_ios + buf_pool->n_pages_read n_ios = log_sys->n_log_ios + buf_pool->stat.n_pages_read
+ buf_pool->n_pages_written; + buf_pool->stat.n_pages_written;
if (n_pend_ios < SRV_PEND_IO_THRESHOLD if (n_pend_ios < SRV_PEND_IO_THRESHOLD
&& (n_ios - n_ios_old < SRV_RECENT_IO_ACTIVITY)) { && (n_ios - n_ios_old < SRV_RECENT_IO_ACTIVITY)) {
srv_main_thread_op_info = "doing insert buffer merge"; srv_main_thread_op_info = "doing insert buffer merge";
...@@ -2418,6 +2405,8 @@ loop: ...@@ -2418,6 +2405,8 @@ loop:
/* Try to keep the number of modified pages in the /* Try to keep the number of modified pages in the
buffer pool under the limit wished by the user */ buffer pool under the limit wished by the user */
srv_main_thread_op_info =
"flushing buffer pool pages";
n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST, n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST,
PCT_IO(100), PCT_IO(100),
IB_ULONGLONG_MAX); IB_ULONGLONG_MAX);
...@@ -2436,6 +2425,8 @@ loop: ...@@ -2436,6 +2425,8 @@ loop:
ulint n_flush = buf_flush_get_desired_flush_rate(); ulint n_flush = buf_flush_get_desired_flush_rate();
if (n_flush) { if (n_flush) {
srv_main_thread_op_info =
"flushing buffer pool pages";
n_flush = ut_min(PCT_IO(100), n_flush); n_flush = ut_min(PCT_IO(100), n_flush);
n_pages_flushed = n_pages_flushed =
buf_flush_batch( buf_flush_batch(
...@@ -2473,8 +2464,8 @@ loop: ...@@ -2473,8 +2464,8 @@ loop:
are not required, and may be disabled. */ are not required, and may be disabled. */
n_pend_ios = buf_get_n_pending_ios() + log_sys->n_pending_writes; n_pend_ios = buf_get_n_pending_ios() + log_sys->n_pending_writes;
n_ios = log_sys->n_log_ios + buf_pool->n_pages_read n_ios = log_sys->n_log_ios + buf_pool->stat.n_pages_read
+ buf_pool->n_pages_written; + buf_pool->stat.n_pages_written;
srv_main_10_second_loops++; srv_main_10_second_loops++;
if (n_pend_ios < SRV_PEND_IO_THRESHOLD if (n_pend_ios < SRV_PEND_IO_THRESHOLD
......
...@@ -1106,7 +1106,7 @@ innobase_start_or_create_for_mysql(void) ...@@ -1106,7 +1106,7 @@ innobase_start_or_create_for_mysql(void)
"InnoDB: The InnoDB memory heap is disabled\n"); "InnoDB: The InnoDB memory heap is disabled\n");
} }
#ifdef HAVE_GCC_ATOMIC_BUILTINS #ifdef HAVE_IB_GCC_ATOMIC_BUILTINS
# ifdef INNODB_RW_LOCKS_USE_ATOMICS # ifdef INNODB_RW_LOCKS_USE_ATOMICS
fprintf(stderr, fprintf(stderr,
"InnoDB: Mutexes and rw_locks use GCC atomic builtins.\n"); "InnoDB: Mutexes and rw_locks use GCC atomic builtins.\n");
...@@ -1130,10 +1130,10 @@ innobase_start_or_create_for_mysql(void) ...@@ -1130,10 +1130,10 @@ innobase_start_or_create_for_mysql(void)
fprintf(stderr, fprintf(stderr,
"InnoDB: Mutexes use Windows interlocked functions.\n"); "InnoDB: Mutexes use Windows interlocked functions.\n");
# endif /* INNODB_RW_LOCKS_USE_ATOMICS */ # endif /* INNODB_RW_LOCKS_USE_ATOMICS */
#else /* HAVE_GCC_ATOMIC_BUILTINS */ #else /* HAVE_IB_GCC_ATOMIC_BUILTINS */
fprintf(stderr, fprintf(stderr,
"InnoDB: Neither mutexes nor rw_locks use GCC atomic builtins.\n"); "InnoDB: Neither mutexes nor rw_locks use GCC atomic builtins.\n");
#endif /* HAVE_GCC_ATOMIC_BUILTINS */ #endif /* HAVE_IB_GCC_ATOMIC_BUILTINS */
/* Since InnoDB does not currently clean up all its internal data /* Since InnoDB does not currently clean up all its internal data
structures in MySQL Embedded Server Library server_end(), we structures in MySQL Embedded Server Library server_end(), we
......
...@@ -199,6 +199,23 @@ ut_time_us( ...@@ -199,6 +199,23 @@ ut_time_us(
return(us); return(us);
} }
/**********************************************************//**
Returns the number of milliseconds since some epoch. The
value may wrap around. It should only be used for heuristic
purposes.
@return ms since epoch */
UNIV_INTERN
uint
ut_time_ms(void)
/*============*/
{
struct timeval tv;
ut_gettimeofday(&tv, NULL);
return((uint) tv.tv_sec * 1000 + tv.tv_usec / 1000);
}
/**********************************************************//** /**********************************************************//**
Returns the difference of two times in seconds. Returns the difference of two times in seconds.
@return time2 - time1 expressed in seconds */ @return time2 - time1 expressed in seconds */
......
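The comment on ut_time_ms() above warns that the value may wrap around and should only be used for heuristics. One way a caller could cope with the wrap is to compare timestamps by unsigned subtraction. The following is a minimal sketch under stated assumptions: it uses POSIX gettimeofday() in place of InnoDB's ut_gettimeofday() wrapper, fixes the width at 32 bits with `uint32_t`, and every `sketch_*` name is hypothetical rather than part of this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for ut_time_ms(). The result is truncated to
32 bits, so it wraps around roughly every 49.7 days. */
static uint32_t
sketch_time_ms(void)
{
	struct timeval	tv;

	gettimeofday(&tv, NULL);

	return((uint32_t) tv.tv_sec * 1000u + (uint32_t) (tv.tv_usec / 1000));
}

/* Hypothetical helper: milliseconds elapsed from start to now.
Unsigned subtraction is performed modulo 2^32, so the result stays
correct across a single wrap-around of the counter. */
static uint32_t
sketch_elapsed_ms(uint32_t start, uint32_t now)
{
	return((uint32_t) (now - start));
}

/* Hypothetical check in the spirit of a time threshold for old blocks:
returns nonzero if at least threshold_ms have passed since access_time. */
static int
sketch_is_older_than(uint32_t access_time, uint32_t now,
		     uint32_t threshold_ms)
{
	return(sketch_elapsed_ms(access_time, now) >= threshold_ms);
}
```

Comparing `now - access_time` against the threshold, rather than comparing `now` against `access_time + threshold`, is what keeps the check meaningful when the counter wraps between the two readings.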