Commit a328b913 authored by John Esmet's avatar John Esmet

fixes #46 Add dynamic-value omt clone (dmt) and use it to implement basement nodes

parent be69db1f
Notes during 2014-01-08 Leif/Yoni
-Should verify() (in dmt? omt? bndata?) crash or return an error on a failed verify?
DECISIONS:
Replace dmt_functor with an implicit interface only. Instead of requiring the name dmt_functor<x> for data type x, pass the writer's class name into the dmt's template as a new parameter.
Replace dmt_functor<default> with comments explaining the "interface".
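The decision above can be sketched as follows. This is an illustrative mock, not the real ft-index API: the class and member names (`dmt_sketch`, `get_size`, `write_to`) are hypothetical stand-ins showing how a writer class passed as a template parameter satisfies an implicit interface, with no `dmt_functor<x>` specialization required.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: the dmt takes the writer's class as an extra
// template parameter. Any class providing get_size() and write_to()
// satisfies the implicit interface -- nothing has to be named
// dmt_functor<x>.
template<typename dmtdata_t, typename dmtwriter_t>
class dmt_sketch {
public:
    void insert(const dmtwriter_t &writer) {
        size_t sz = writer.get_size();      // writer reports the value's size
        size_t off = storage_.size();
        storage_.resize(off + sz);
        writer.write_to(&storage_[off]);    // writer serializes the value
        offsets_.push_back(off);
    }
    size_t size() const { return offsets_.size(); }
private:
    std::vector<uint8_t> storage_;          // packed, dynamically sized values
    std::vector<size_t> offsets_;           // start of each value in storage_
};

// A writer for a simple fixed-size value type.
struct u32_writer {
    uint32_t v;
    size_t get_size() const { return sizeof(v); }
    void write_to(void *dest) const { std::memcpy(dest, &v, sizeof(v)); }
};
```

Because the interface is implicit, the compiler only checks that the writer class has the two members at instantiation time, the same way STL containers treat comparators.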
-==========================================-
See wiki:
https://github.com/Tokutek/ft-index/wiki/Improving-in-memory-query-performance---Design
ft/bndata.{cc,h} The basement node was heavily modified to split keys from values and to inline the keys.
bn_data::initialize_from_separate_keys_and_vals
This is effectively the deserializer.
The bn_data::omt_* functions (probably badly named) effectively treat the basement node as an omt of key+leafentry pairs.
There are many references to 'omt' that could be renamed to 'dmt' if it's worth it.
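The key/value split described above can be pictured with a minimal sketch. This is not the real bn_data layout — the struct and field names here are hypothetical — but it shows the idea: keys live inline in one buffer as length-prefixed entries carrying an offset into a separate leafentry buffer, rather than one combined key+leafentry blob.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of a split key/value basement buffer.
struct bn_data_sketch {
    std::vector<uint8_t> keys;  // [keylen][le_offset][key bytes] ...
    std::vector<uint8_t> vals;  // packed leafentries

    void append(const void *key, uint32_t keylen,
                const void *le, uint32_t lelen) {
        uint32_t le_offset = (uint32_t) vals.size();  // where this le will land
        size_t off = keys.size();
        keys.resize(off + sizeof(keylen) + sizeof(le_offset) + keylen);
        std::memcpy(&keys[off], &keylen, sizeof(keylen));
        std::memcpy(&keys[off + sizeof(keylen)], &le_offset, sizeof(le_offset));
        std::memcpy(&keys[off + sizeof(keylen) + sizeof(le_offset)], key, keylen);
        vals.insert(vals.end(),
                    (const uint8_t *) le, (const uint8_t *) le + lelen);
    }
};
```

With keys inline and contiguous, key-only scans (e.g. for pivot selection) touch far less memory than walking full key+leafentry pairs.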
util/dmt.{cc,h} The new DMT structure
Possible questions:
1-Should we merge dmt<> & omt<>? (delete omt entirely)
2-Should omt<> become a wrapper for dmt<>?
3-Should we just keep both around?
If we plan to do this for a while, should we get rid of any scaffolding that would make it easier to do 1 or 2?
The dmt is basically an omt with dynamically sized nodes/values.
There are two representations: an array of values, or a tree of nodes.
The high-level algorithm is basically the same for dmt and omt, except that in tree form the dmt tries not to move values around; instead, it moves the metadata in the nodes.
Insertion into a dmt requires a functor that can provide size information, since values are expected to be (at least potentially) dynamically sized.
The dmt does not revert to array form when rebalancing the root, but it CAN revert to array form when it prepares for serialization (if it notices everything is fixed length).
The dmt can also serialize and deserialize the set of values it represents. It saves no information about the dmt itself, just the values.
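The fixed-length observation above is worth spelling out. The helpers below are illustrative, not the real dmt code: if every stored value turns out to be the same length, array form needs no per-value metadata at all, and the i'th value sits at a computed offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when every value has the same length (vacuously true
// for an empty set) -- the condition under which the dmt can use
// array form for serialization.
static bool all_fixed_length(const std::vector<size_t> &value_sizes) {
    for (size_t s : value_sizes) {
        if (s != value_sizes[0]) return false;  // found a different length
    }
    return true;
}

// In array form with fixed-length values, indexing is pure arithmetic:
// no node metadata is needed to locate the i'th value.
static size_t array_form_offset(size_t idx, size_t fixed_len) {
    return idx * fixed_len;
}
```

This is why the serialized form carries no dmt structure: the values alone (plus one length, in the fixed case) are enough to rebuild it.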
Some comments about what's in each file.
ft/CMakeLists.txt
add dmt-wrapper (test wrapper, nearly identical to ft/omt.cc which is also a test wrapper)
ft/dmt-wrapper.cc/h
Just like ft/omt.{cc,h}: a test wrapper for the dmt, used to run a version of the old (non-templated) omt tests.
ft/ft-internal.h
Additional engine status
ft/ft-ops.cc/h
Additional engine status
in ftnode_memory_size()
Fix a minor bug where we didn't count all the memory.
comments
ft/ft_layout_version.h
Update comment describing version change.
NOTE: May need to add version 26 if 25 is sent to customers before this goes live.
Adding 26 requires additional code changes (limited to a subset of places where version 24/25 are referred to)
ft/ft_node-serialize.cc
Changes the calculation of a leaf node's size to include the basement-node header.
Adds optimized serialization for basement nodes with fixed-length keys.
Maintains the old method when not using fixed-length keys.
rebalance_ftnode_leaf()
Minor changes since key/leafentries are separated
deserialize_ftnode_partition()
Minor changes, including passing rbuf directly to child function (so ndone calculation is done by child)
ft/memarena.cc
Changes so that toku_memory_footprint is more accurate. (Not exactly related to this project.)
ft/rollback.cc
Just uses new memarena function for memory footprint
ft/tests/dmt-test.cc
"clone" of old omt-test (non templated) ported to dmt
Basically not worth looking at except to make sure it imports dmt instead of omt.
ft/tests/dmt-test2.cc
New dmt tests.
You might decide not enough new tests were implemented.
ft/tests/ft-serialize-benchmark.cc
Minor improvements so that you can take the average of a number of runs.
ft/tests/ft-serialize-test.cc
Just ported to the changed API.
ft/tests/test-pick-child-to-flush.cc
The new basement-node headers reduce available memory; reduce the max size of the test appropriately.
ft/wbuf.h
Added wbuf_nocrc_reserve_literal_bytes()
Gives you a pointer to write into the wbuf, but notes that the memory was used.
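The reserve-then-fill behavior described above can be sketched as follows. The struct here is a stand-in, not the real ft/wbuf.h definition: reserving literal bytes advances the buffer's `ndone` cursor and hands back a pointer to the reserved region, so the caller can write into it directly (and skip the usual per-field write calls).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the real wbuf.
struct wbuf_sketch {
    uint8_t *buf;
    size_t size;
    size_t ndone;  // bytes written (or reserved) so far
};

static uint8_t *reserve_literal_bytes(wbuf_sketch *wb, size_t nbytes) {
    assert(wb->ndone + nbytes <= wb->size);
    uint8_t *reserved = wb->buf + wb->ndone;
    wb->ndone += nbytes;  // note the memory as used without writing it
    return reserved;
}
```

The caller is then responsible for filling all `nbytes` before the buffer is checksummed or flushed, since the cursor has already moved past them.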
util/mempool.cc
Made mempool allocations aligned to cachelines
Minor 'const' changes to help compilation
Some utility functions to get/give offsets
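The cacheline alignment change can be sketched with one helper. A 64-byte line is assumed here and the function name is illustrative (the real util/mempool.cc may differ): each allocation offset is rounded up to the next cacheline boundary so values never straddle a line.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cacheline size; real hardware is typically 64 bytes on x86.
static const size_t CACHELINE_SIZE = 64;

// Round an allocation offset up to the next cacheline boundary.
// Relies on CACHELINE_SIZE being a power of two.
static size_t align_to_cacheline(size_t offset) {
    return (offset + CACHELINE_SIZE - 1) & ~(CACHELINE_SIZE - 1);
}
```

The trade-off is internal fragmentation (up to 63 wasted bytes per allocation) in exchange for each value starting on its own line, which avoids false sharing and partial-line reads.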
@@ -31,6 +31,7 @@ set(FT_SOURCES
   checkpoint
   compress
   dbufio
+  dmt-wrapper
   fifo
   ft
   ft-cachetable-wrappers
......
@@ -689,16 +689,16 @@ ftleaf_get_split_loc(
     switch (split_mode) {
     case SPLIT_LEFT_HEAVY: {
         *num_left_bns = node->n_children;
-        *num_left_les = BLB_DATA(node, *num_left_bns - 1)->omt_size();
+        *num_left_les = BLB_DATA(node, *num_left_bns - 1)->num_klpairs();
         if (*num_left_les == 0) {
             *num_left_bns = node->n_children - 1;
-            *num_left_les = BLB_DATA(node, *num_left_bns - 1)->omt_size();
+            *num_left_les = BLB_DATA(node, *num_left_bns - 1)->num_klpairs();
         }
         goto exit;
     }
     case SPLIT_RIGHT_HEAVY: {
         *num_left_bns = 1;
-        *num_left_les = BLB_DATA(node, 0)->omt_size() ? 1 : 0;
+        *num_left_les = BLB_DATA(node, 0)->num_klpairs() ? 1 : 0;
         goto exit;
     }
     case SPLIT_EVENLY: {
@@ -707,8 +707,8 @@ ftleaf_get_split_loc(
         uint64_t sumlesizes = ftleaf_disk_size(node);
         uint32_t size_so_far = 0;
         for (int i = 0; i < node->n_children; i++) {
-            BN_DATA bd = BLB_DATA(node, i);
-            uint32_t n_leafentries = bd->omt_size();
+            bn_data* bd = BLB_DATA(node, i);
+            uint32_t n_leafentries = bd->num_klpairs();
             for (uint32_t j=0; j < n_leafentries; j++) {
                 size_t size_this_le;
                 int rr = bd->fetch_klpair_disksize(j, &size_this_le);
@@ -725,7 +725,7 @@ ftleaf_get_split_loc(
             (*num_left_les)--;
         } else if (*num_left_bns > 1) {
             (*num_left_bns)--;
-            *num_left_les = BLB_DATA(node, *num_left_bns - 1)->omt_size();
+            *num_left_les = BLB_DATA(node, *num_left_bns - 1)->num_klpairs();
         } else {
             // we are trying to split a leaf with only one
             // leafentry in it
@@ -754,7 +754,8 @@ move_leafentries(
     )
 //Effect: move leafentries in the range [lbi, upe) from src_omt to newly created dest_omt
 {
-    src_bn->data_buffer.move_leafentries_to(&dest_bn->data_buffer, lbi, ube);
+    invariant(ube == src_bn->data_buffer.num_klpairs());
+    src_bn->data_buffer.split_klpairs(&dest_bn->data_buffer, lbi);
 }
 static void ftnode_finalize_split(FTNODE node, FTNODE B, MSN max_msn_applied_to_node) {
@@ -851,7 +852,7 @@ ftleaf_split(
     ftleaf_get_split_loc(node, split_mode, &num_left_bns, &num_left_les);
     {
         // did we split right on the boundary between basement nodes?
-        const bool split_on_boundary = (num_left_les == 0) || (num_left_les == (int) BLB_DATA(node, num_left_bns - 1)->omt_size());
+        const bool split_on_boundary = (num_left_les == 0) || (num_left_les == (int) BLB_DATA(node, num_left_bns - 1)->num_klpairs());
         // Now we know where we are going to break it
         // the two nodes will have a total of n_children+1 basement nodes
         // and n_children-1 pivots
@@ -912,7 +913,7 @@ ftleaf_split(
             move_leafentries(BLB(B, curr_dest_bn_index),
                              BLB(node, curr_src_bn_index),
                              num_left_les, // first row to be moved to B
-                             BLB_DATA(node, curr_src_bn_index)->omt_size() // number of rows in basement to be split
+                             BLB_DATA(node, curr_src_bn_index)->num_klpairs() // number of rows in basement to be split
                              );
             BLB_MAX_MSN_APPLIED(B, curr_dest_bn_index) = BLB_MAX_MSN_APPLIED(node, curr_src_bn_index);
             curr_dest_bn_index++;
@@ -954,10 +955,10 @@ ftleaf_split(
             toku_destroy_dbt(&node->childkeys[num_left_bns - 1]);
         }
     } else if (splitk) {
-        BN_DATA bd = BLB_DATA(node, num_left_bns - 1);
+        bn_data* bd = BLB_DATA(node, num_left_bns - 1);
         uint32_t keylen;
         void *key;
-        int rr = bd->fetch_le_key_and_len(bd->omt_size() - 1, &keylen, &key);
+        int rr = bd->fetch_key_and_len(bd->num_klpairs() - 1, &keylen, &key);
         invariant_zero(rr);
         toku_memdup_dbt(splitk, key, keylen);
     }
@@ -1168,11 +1169,11 @@ merge_leaf_nodes(FTNODE a, FTNODE b)
     a->dirty = 1;
     b->dirty = 1;
-    BN_DATA a_last_bd = BLB_DATA(a, a->n_children-1);
+    bn_data* a_last_bd = BLB_DATA(a, a->n_children-1);
     // this bool states if the last basement node in a has any items or not
     // If it does, then it stays in the merge. If it does not, the last basement node
     // of a gets eliminated because we do not have a pivot to store for it (because it has no elements)
-    const bool a_has_tail = a_last_bd->omt_size() > 0;
+    const bool a_has_tail = a_last_bd->num_klpairs() > 0;
     // move each basement node from b to a
     // move the pivots, adding one of what used to be max(a)
@@ -1199,7 +1200,7 @@ merge_leaf_nodes(FTNODE a, FTNODE b)
     if (a_has_tail) {
         uint32_t keylen;
         void *key;
-        int rr = a_last_bd->fetch_le_key_and_len(a_last_bd->omt_size() - 1, &keylen, &key);
+        int rr = a_last_bd->fetch_key_and_len(a_last_bd->num_klpairs() - 1, &keylen, &key);
         invariant_zero(rr);
         toku_memdup_dbt(&a->childkeys[a->n_children-1], key, keylen);
         a->totalchildkeylens += keylen;
......
@@ -1184,6 +1184,8 @@ typedef enum {
     FT_PRO_NUM_STOP_LOCK_CHILD,
     FT_PRO_NUM_STOP_CHILD_INMEM,
     FT_PRO_NUM_DIDNT_WANT_PROMOTE,
+    FT_BASEMENT_DESERIALIZE_FIXED_KEYSIZE, // how many basement nodes were deserialized with a fixed keysize
+    FT_BASEMENT_DESERIALIZE_VARIABLE_KEYSIZE, // how many basement nodes were deserialized with a variable keysize
     FT_STATUS_NUM_ROWS
 } ft_status_entry;
......
@@ -358,4 +358,6 @@ extern bool garbage_collection_debug;
 void toku_ft_set_direct_io(bool direct_io_on);
 void toku_ft_set_compress_buffers_before_eviction(bool compress_buffers);
+void toku_note_deserialized_basement_node(bool fixed_key_size);
 #endif
@@ -462,6 +462,7 @@ serialize_ft_min_size (uint32_t version) {
     size_t size = 0;
     switch(version) {
+    case FT_LAYOUT_VERSION_26:
     case FT_LAYOUT_VERSION_25:
     case FT_LAYOUT_VERSION_24:
     case FT_LAYOUT_VERSION_23:
......
@@ -152,7 +152,7 @@ verify_msg_in_child_buffer(FT_HANDLE ft_handle, enum ft_msg_type type, MSN msn,
 static DBT
 get_ith_key_dbt (BASEMENTNODE bn, int i) {
     DBT kdbt;
-    int r = bn->data_buffer.fetch_le_key_and_len(i, &kdbt.size, &kdbt.data);
+    int r = bn->data_buffer.fetch_key_and_len(i, &kdbt.size, &kdbt.data);
     invariant_zero(r); // this is a bad failure if it happens.
     return kdbt;
 }
@@ -422,7 +422,7 @@ toku_verify_ftnode_internal(FT_HANDLE ft_handle,
         }
         else {
             BASEMENTNODE bn = BLB(node, i);
-            for (uint32_t j = 0; j < bn->data_buffer.omt_size(); j++) {
+            for (uint32_t j = 0; j < bn->data_buffer.num_klpairs(); j++) {
                 VERIFY_ASSERTION((rootmsn.msn >= this_msn.msn), 0, "leaf may have latest msn, but cannot be greater than root msn");
                 DBT kdbt = get_ith_key_dbt(bn, j);
                 if (curr_less_pivot) {
......
@@ -1077,8 +1077,8 @@ garbage_helper(BLOCKNUM blocknum, int64_t UU(size), int64_t UU(address), void *e
         goto exit;
     }
     for (int i = 0; i < node->n_children; ++i) {
-        BN_DATA bd = BLB_DATA(node, i);
-        r = bd->omt_iterate<struct garbage_helper_extra, garbage_leafentry_helper>(info);
+        bn_data* bd = BLB_DATA(node, i);
+        r = bd->iterate<struct garbage_helper_extra, garbage_leafentry_helper>(info);
         if (r != 0) {
             goto exit;
         }
......
@@ -119,6 +119,7 @@ enum ft_layout_version_e {
     FT_LAYOUT_VERSION_23 = 23, // Ming: Fix upgrade path #5902
     FT_LAYOUT_VERSION_24 = 24, // Riddler: change logentries that log transactions to store TXNID_PAIRs instead of TXNIDs
     FT_LAYOUT_VERSION_25 = 25, // SecretSquirrel: ROLLBACK_LOG_NODES (on disk and in memory) now just use blocknum (instead of blocknum + hash) to point to other log nodes. same for xstillopen log entry
+    FT_LAYOUT_VERSION_26 = 26, // Hojo: basements store key/vals separately on disk for fixed klpair length BNs
     FT_NEXT_VERSION, // the version after the current version
     FT_LAYOUT_VERSION = FT_NEXT_VERSION-1, // A hack so I don't have to change this line.
     FT_LAYOUT_MIN_SUPPORTED_VERSION = FT_LAYOUT_VERSION_13, // Minimum version supported
......
@@ -284,31 +284,6 @@ serialize_node_header(FTNODE node, FTNODE_DISK_DATA ndd, struct wbuf *wbuf) {
     invariant(wbuf->ndone == wbuf->size);
 }
-static int
-wbufwriteleafentry(const void* key, const uint32_t keylen, const LEAFENTRY &le, const uint32_t UU(idx), struct wbuf * const wb) {
-    // need to pack the leafentry as it was in versions
-    // where the key was integrated into it
-    uint32_t begin_spot UU() = wb->ndone;
-    uint32_t le_disk_size = leafentry_disksize(le);
-    wbuf_nocrc_uint8_t(wb, le->type);
-    wbuf_nocrc_uint32_t(wb, keylen);
-    if (le->type == LE_CLEAN) {
-        wbuf_nocrc_uint32_t(wb, le->u.clean.vallen);
-        wbuf_nocrc_literal_bytes(wb, key, keylen);
-        wbuf_nocrc_literal_bytes(wb, le->u.clean.val, le->u.clean.vallen);
-    }
-    else {
-        paranoid_invariant(le->type == LE_MVCC);
-        wbuf_nocrc_uint32_t(wb, le->u.mvcc.num_cxrs);
-        wbuf_nocrc_uint8_t(wb, le->u.mvcc.num_pxrs);
-        wbuf_nocrc_literal_bytes(wb, key, keylen);
-        wbuf_nocrc_literal_bytes(wb, le->u.mvcc.xrs, le_disk_size - (1 + 4 + 1));
-    }
-    uint32_t end_spot UU() = wb->ndone;
-    paranoid_invariant((end_spot - begin_spot) == keylen + sizeof(keylen) + le_disk_size);
-    return 0;
-}
 static uint32_t
 serialize_ftnode_partition_size (FTNODE node, int i)
 {
@@ -320,14 +295,14 @@ serialize_ftnode_partition_size (FTNODE node, int i)
         result += toku_bnc_nbytesinbuf(BNC(node, i));
     }
     else {
-        result += 4; // n_entries in buffer table
+        result += 4 + bn_data::HEADER_LENGTH; // n_entries in buffer table + basement header
         result += BLB_NBYTESINDATA(node, i);
     }
     result += 4; // checksum
     return result;
 }
-#define FTNODE_PARTITION_OMT_LEAVES 0xaa
+#define FTNODE_PARTITION_DMT_LEAVES 0xaa
 #define FTNODE_PARTITION_FIFO_MSG 0xbb
 static void
@@ -374,16 +349,13 @@ serialize_ftnode_partition(FTNODE node, int i, struct sub_block *sb) {
         serialize_nonleaf_childinfo(BNC(node, i), &wb);
     }
     else {
-        unsigned char ch = FTNODE_PARTITION_OMT_LEAVES;
-        BN_DATA bd = BLB_DATA(node, i);
+        unsigned char ch = FTNODE_PARTITION_DMT_LEAVES;
+        bn_data* bd = BLB_DATA(node, i);
         wbuf_nocrc_char(&wb, ch);
-        wbuf_nocrc_uint(&wb, bd->omt_size());
-        //
-        // iterate over leafentries and place them into the buffer
-        //
-        bd->omt_iterate<struct wbuf, wbufwriteleafentry>(&wb);
+        wbuf_nocrc_uint(&wb, bd->num_klpairs());
+        bd->serialize_to_wbuf(&wb);
     }
     uint32_t end_to_end_checksum = x1764_memory(sb->uncompressed_ptr, wbuf_get_woffset(&wb));
     wbuf_nocrc_int(&wb, end_to_end_checksum);
@@ -546,7 +518,7 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
     // Count number of leaf entries in this leaf (num_le).
     uint32_t num_le = 0;
     for (uint32_t i = 0; i < num_orig_basements; i++) {
-        num_le += BLB_DATA(node, i)->omt_size();
+        num_le += BLB_DATA(node, i)->num_klpairs();
     }
     uint32_t num_alloc = num_le ? num_le : 1; // simplify logic below by always having at least one entry per array
@@ -571,10 +543,10 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
     uint32_t curr_le = 0;
     for (uint32_t i = 0; i < num_orig_basements; i++) {
-        BN_DATA bd = BLB_DATA(node, i);
+        bn_data* bd = BLB_DATA(node, i);
         struct array_info ai {.offset = curr_le, .le_array = leafpointers, .key_sizes_array = key_sizes, .key_ptr_array = key_pointers };
-        bd->omt_iterate<array_info, array_item>(&ai);
-        curr_le += bd->omt_size();
+        bd->iterate<array_info, array_item>(&ai);
+        curr_le += bd->num_klpairs();
     }
     // Create an array that will store indexes of new pivots.
@@ -592,9 +564,14 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
     // Create an array that will store the size of each basement.
     // This is the sum of the leaf sizes of all the leaves in that basement.
     // We don't know how many basements there will be, so we use num_le as the upper bound.
-    toku::scoped_malloc bn_sizes_buf(sizeof(size_t) * num_alloc);
-    size_t *bn_sizes = reinterpret_cast<size_t *>(bn_sizes_buf.get());
-    bn_sizes[0] = 0;
+    // Sum of all le sizes in a single basement
+    toku::scoped_calloc bn_le_sizes_buf(sizeof(size_t) * num_alloc);
+    size_t *bn_le_sizes = reinterpret_cast<size_t *>(bn_le_sizes_buf.get());
+    // Sum of all key sizes in a single basement
+    toku::scoped_calloc bn_key_sizes_buf(sizeof(size_t) * num_alloc);
+    size_t *bn_key_sizes = reinterpret_cast<size_t *>(bn_key_sizes_buf.get());
     // TODO 4050: All these arrays should be combined into a single array of some bn_info struct (pivot, msize, num_les).
     // Each entry is the number of leafentries in this basement. (Again, num_le is overkill upper baound.)
@@ -611,7 +588,7 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
     for (uint32_t i = 0; i < num_le; i++) {
         uint32_t curr_le_size = leafentry_disksize((LEAFENTRY) leafpointers[i]);
         le_sizes[i] = curr_le_size;
-        if ((bn_size_so_far + curr_le_size > basementnodesize) && (num_le_in_curr_bn != 0)) {
+        if ((bn_size_so_far + curr_le_size + sizeof(uint32_t) + key_sizes[i] > basementnodesize) && (num_le_in_curr_bn != 0)) {
             // cap off the current basement node to end with the element before i
             new_pivots[curr_pivot] = i-1;
             curr_pivot++;
@@ -620,8 +597,9 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
         }
         num_le_in_curr_bn++;
         num_les_this_bn[curr_pivot] = num_le_in_curr_bn;
+        bn_le_sizes[curr_pivot] += curr_le_size;
+        bn_key_sizes[curr_pivot] += sizeof(uint32_t) + key_sizes[i]; // uint32_t le_offset
         bn_size_so_far += curr_le_size + sizeof(uint32_t) + key_sizes[i];
-        bn_sizes[curr_pivot] = bn_size_so_far;
     }
     // curr_pivot is now the total number of pivot keys in the leaf node
     int num_pivots = curr_pivot;
@@ -688,17 +666,15 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
         uint32_t num_les_to_copy = num_les_this_bn[i];
         invariant(num_les_to_copy == num_in_bn);
-        // construct mempool for this basement
-        size_t size_this_bn = bn_sizes[i];
-        BN_DATA bd = BLB_DATA(node, i);
-        bd->replace_contents_with_clone_of_sorted_array(
+        bn_data* bd = BLB_DATA(node, i);
+        bd->set_contents_as_clone_of_sorted_array(
             num_les_to_copy,
             &key_pointers[baseindex_this_bn],
             &key_sizes[baseindex_this_bn],
             &leafpointers[baseindex_this_bn],
             &le_sizes[baseindex_this_bn],
-            size_this_bn
+            bn_key_sizes[i], // Total key sizes
+            bn_le_sizes[i]  // total le sizes
             );
         BP_STATE(node,i) = PT_AVAIL;
@@ -1541,15 +1517,14 @@ deserialize_ftnode_partition(
         BP_WORKDONE(node, childnum) = 0;
     }
     else {
-        assert(ch == FTNODE_PARTITION_OMT_LEAVES);
+        assert(ch == FTNODE_PARTITION_DMT_LEAVES);
         BLB_SEQINSERT(node, childnum) = 0;
         uint32_t num_entries = rbuf_int(&rb);
         // we are now at the first byte of first leafentry
         data_size -= rb.ndone; // remaining bytes of leafentry data
         BASEMENTNODE bn = BLB(node, childnum);
-        bn->data_buffer.initialize_from_data(num_entries, &rb.buf[rb.ndone], data_size);
-        rb.ndone += data_size;
+        bn->data_buffer.deserialize_from_rbuf(num_entries, &rb, data_size, node->layout_version_read_from_disk);
     }
     assert(rb.ndone == rb.size);
 exit:
@@ -2086,13 +2061,18 @@ deserialize_and_upgrade_leaf_node(FTNODE node,
         assert_zero(r);
         // Copy the pointer value straight into the OMT
         LEAFENTRY new_le_in_bn = nullptr;
+        void *maybe_free;
         bn->data_buffer.get_space_for_insert(
             i,
             key,
             keylen,
             new_le_size,
-            &new_le_in_bn
+            &new_le_in_bn,
+            &maybe_free
             );
+        if (maybe_free) {
+            toku_free(maybe_free);
+        }
         memcpy(new_le_in_bn, new_le, new_le_size);
         toku_free(new_le);
     }
@@ -2101,8 +2081,7 @@ deserialize_and_upgrade_leaf_node(FTNODE node,
         if (has_end_to_end_checksum) {
             data_size -= sizeof(uint32_t);
         }
-        bn->data_buffer.initialize_from_data(n_in_buf, &rb->buf[rb->ndone], data_size);
-        rb->ndone += data_size;
+        bn->data_buffer.deserialize_from_rbuf(n_in_buf, rb, data_size, node->layout_version_read_from_disk);
     }
     // Whatever this is must be less than the MSNs of every message above
......
@@ -2917,7 +2917,7 @@ static void add_pair_to_leafnode (struct leaf_buf *lbuf, unsigned char *key, int
     // #3588 TODO just make a clean ule and append it to the omt
     // #3588 TODO can do the rebalancing here and avoid a lot of work later
     FTNODE leafnode = lbuf->node;
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
     DBT thekey = { .data = key, .size = (uint32_t) keylen };
     DBT theval = { .data = val, .size = (uint32_t) vallen };
     FT_MSG_S msg = { .type = FT_INSERT,
......
@@ -230,7 +230,7 @@ typedef struct cachetable *CACHETABLE;
 typedef struct cachefile *CACHEFILE;
 typedef struct ctpair *PAIR;
 typedef class checkpointer *CHECKPOINTER;
-typedef class bn_data *BN_DATA;
+class bn_data;
 /* tree command types */
 enum ft_msg_type {
......
@@ -98,6 +98,7 @@ struct memarena {
     char *buf;
     size_t buf_used, buf_size;
     size_t size_of_other_bufs; // the buf_size of all the other bufs.
+    size_t footprint_of_other_bufs; // the footprint of all the other bufs.
     char **other_bufs;
     int n_other_bufs;
 };
@@ -108,6 +109,7 @@ MEMARENA memarena_create_presized (size_t initial_size) {
     result->buf_used = 0;
     result->other_bufs = NULL;
     result->size_of_other_bufs = 0;
+    result->footprint_of_other_bufs = 0;
     result->n_other_bufs = 0;
     XMALLOC_N(result->buf_size, result->buf);
     return result;
@@ -128,6 +130,7 @@ void memarena_clear (MEMARENA ma) {
     // But reuse the main buffer
     ma->buf_used = 0;
     ma->size_of_other_bufs = 0;
+    ma->footprint_of_other_bufs = 0;
 }
 static size_t
@@ -151,6 +154,7 @@ void* malloc_in_memarena (MEMARENA ma, size_t size) {
         ma->other_bufs[old_n]=ma->buf;
         ma->n_other_bufs = old_n+1;
         ma->size_of_other_bufs += ma->buf_size;
+        ma->footprint_of_other_bufs += toku_memory_footprint(ma->buf, ma->buf_used);
     }
     // Make a new one
     {
@@ -217,7 +221,9 @@ void memarena_move_buffers(MEMARENA dest, MEMARENA source) {
 #endif
     dest  ->size_of_other_bufs += source->size_of_other_bufs + source->buf_size;
+    dest  ->footprint_of_other_bufs += source->footprint_of_other_bufs + toku_memory_footprint(source->buf, source->buf_used);
     source->size_of_other_bufs = 0;
+    source->footprint_of_other_bufs = 0;
     assert(other_bufs);
     dest->other_bufs = other_bufs;
@@ -247,3 +253,11 @@ memarena_total_size_in_use (MEMARENA m)
 {
     return m->size_of_other_bufs + m->buf_used;
 }
+size_t
+memarena_total_footprint (MEMARENA m)
+{
+    return m->footprint_of_other_bufs + toku_memory_footprint(m->buf, m->buf_used) +
+           sizeof(*m) +
+           m->n_other_bufs * sizeof(*m->other_bufs);
+}
@@ -129,5 +129,6 @@ size_t memarena_total_memory_size (MEMARENA);
 size_t memarena_total_size_in_use (MEMARENA);
+size_t memarena_total_footprint (MEMARENA);
 #endif
@@ -146,7 +146,7 @@ PAIR_ATTR
 rollback_memory_size(ROLLBACK_LOG_NODE log) {
     size_t size = sizeof(*log);
     if (log->rollentry_arena) {
-        size += memarena_total_memory_size(log->rollentry_arena);
+        size += memarena_total_footprint(log->rollentry_arena);
     }
     return make_rollback_pair_attr(size);
 }
...
This diff is collapsed.
This diff is collapsed.
@@ -115,13 +115,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, const char *key, int keylen, const char ...
 {
     LEAFENTRY r = NULL;
     uint32_t size_needed = LE_CLEAN_MEMSIZE(vallen);
+    void *maybe_free = nullptr;
     bn->get_space_for_insert(
         idx,
         key,
         keylen,
         size_needed,
-        &r
+        &r,
+        &maybe_free
         );
+    if (maybe_free) {
+        toku_free(maybe_free);
+    }
     resource_assert(r);
     r->type = LE_CLEAN;
     r->u.clean.vallen = vallen;
...
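Every caller of `get_space_for_insert` now passes a `maybe_free` out-parameter and frees it afterward: when the basement node has to relocate its backing memory, the old allocation is handed back to the caller instead of being freed internally, so a caller still holding pointers into the old block can finish with them first. A simplified sketch of that contract (the `get_space` helper and its signature are hypothetical, not the `bn_data` API):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Grow `*buf` to at least `needed` bytes. If the data had to move, the old
// allocation is returned through *maybe_free instead of being freed here;
// the caller frees it once it is done with any pointers into it.
static char* get_space(char** buf, size_t* cap, size_t needed, void** maybe_free) {
    *maybe_free = nullptr;
    if (needed > *cap) {
        char* bigger = static_cast<char*>(malloc(needed));
        memcpy(bigger, *buf, *cap);
        *maybe_free = *buf;   // defer the free to the caller
        *buf = bigger;
        *cap = needed;
    }
    return *buf;
}
```

In the tests above the caller has no such pointers, so it frees `maybe_free` immediately after the call.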
@@ -105,13 +105,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, char *key, int keylen, char *val, int va...
 {
     LEAFENTRY r = NULL;
     uint32_t size_needed = LE_CLEAN_MEMSIZE(vallen);
+    void *maybe_free = nullptr;
     bn->get_space_for_insert(
         idx,
         key,
         keylen,
         size_needed,
-        &r
+        &r,
+        &maybe_free
         );
+    if (maybe_free) {
+        toku_free(maybe_free);
+    }
     resource_assert(r);
     r->type = LE_CLEAN;
     r->u.clean.vallen = vallen;
@@ -127,7 +132,7 @@ long_key_cmp(DB *UU(e), const DBT *a, const DBT *b)
 }
 
 static void
-test_serialize_leaf(int valsize, int nelts, double entropy) {
+test_serialize_leaf(int valsize, int nelts, double entropy, int ser_runs, int deser_runs) {
     //    struct ft_handle source_ft;
     struct ftnode *sn, *dn;
@@ -214,47 +219,76 @@ test_serialize_leaf(int valsize, int nelts, double entropy) {
         assert(size == 100);
     }
 
+    struct timeval total_start;
+    struct timeval total_end;
+    total_start.tv_sec = total_start.tv_usec = 0;
+    total_end.tv_sec = total_end.tv_usec = 0;
     struct timeval t[2];
-    gettimeofday(&t[0], NULL);
     FTNODE_DISK_DATA ndd = NULL;
+    for (int i = 0; i < ser_runs; i++) {
+        gettimeofday(&t[0], NULL);
+        ndd = NULL;
+        sn->dirty = 1;
     r = toku_serialize_ftnode_to(fd, make_blocknum(20), sn, &ndd, true, ft->ft, false);
     assert(r==0);
     gettimeofday(&t[1], NULL);
+        total_start.tv_sec += t[0].tv_sec;
+        total_start.tv_usec += t[0].tv_usec;
+        total_end.tv_sec += t[1].tv_sec;
+        total_end.tv_usec += t[1].tv_usec;
+        toku_free(ndd);
+    }
     double dt;
-    dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
-    printf("serialize leaf:      %0.05lf\n", dt);
+    dt = (total_end.tv_sec - total_start.tv_sec) + ((total_end.tv_usec - total_start.tv_usec) / USECS_PER_SEC);
+    dt *= 1000;
+    dt /= ser_runs;
+    printf("serialize leaf(ms):   %0.05lf (average of %d runs)\n", dt, ser_runs);
+
+    //reset
+    total_start.tv_sec = total_start.tv_usec = 0;
+    total_end.tv_sec = total_end.tv_usec = 0;
     struct ftnode_fetch_extra bfe;
+    for (int i = 0; i < deser_runs; i++) {
     fill_bfe_for_full_read(&bfe, ft_h);
     gettimeofday(&t[0], NULL);
     FTNODE_DISK_DATA ndd2 = NULL;
     r = toku_deserialize_ftnode_from(fd, make_blocknum(20), 0/*pass zero for hash*/, &dn, &ndd2, &bfe);
     assert(r==0);
     gettimeofday(&t[1], NULL);
-    dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
-    printf("deserialize leaf:    %0.05lf\n", dt);
-    printf("io time %lf decompress time %lf deserialize time %lf\n",
-        tokutime_to_seconds(bfe.io_time),
-        tokutime_to_seconds(bfe.decompress_time),
-        tokutime_to_seconds(bfe.deserialize_time)
-    );
+        total_start.tv_sec += t[0].tv_sec;
+        total_start.tv_usec += t[0].tv_usec;
+        total_end.tv_sec += t[1].tv_sec;
+        total_end.tv_usec += t[1].tv_usec;
     toku_ftnode_free(&dn);
+        toku_free(ndd2);
+    }
+    dt = (total_end.tv_sec - total_start.tv_sec) + ((total_end.tv_usec - total_start.tv_usec) / USECS_PER_SEC);
+    dt *= 1000;
+    dt /= deser_runs;
+    printf("deserialize leaf(ms): %0.05lf (average of %d runs)\n", dt, deser_runs);
+    printf("io time(ms) %lf decompress time(ms) %lf deserialize time(ms) %lf (average of %d runs)\n",
+        tokutime_to_seconds(bfe.io_time)*1000,
+        tokutime_to_seconds(bfe.decompress_time)*1000,
+        tokutime_to_seconds(bfe.deserialize_time)*1000,
+        deser_runs
+    );
 
     toku_ftnode_free(&sn);
 
     toku_block_free(ft_h->blocktable, BLOCK_ALLOCATOR_TOTAL_HEADER_RESERVE);
     toku_blocktable_destroy(&ft_h->blocktable);
     toku_free(ft_h->h);
     toku_free(ft_h);
-    toku_free(ft);
+    toku_free(ft_h);
-    toku_free(ndd);
-    toku_free(ndd2);
     r = close(fd); assert(r != -1);
 }
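The benchmark above accumulates the start and stop `timeval`s across all runs and converts the summed delta to an average in milliseconds; the arithmetic works because a sum of `(end - start)` terms equals `(sum of ends) - (sum of starts)`. A standalone sketch of that conversion (the helper name is mine, not the test's):

```cpp
#include <cassert>
#include <sys/time.h>

static const double USECS_PER_SEC = 1000000.0;

// Average elapsed time in milliseconds over `runs` iterations, given the
// accumulated start and end timevals (same scheme as the benchmark loop).
static double avg_elapsed_ms(const timeval& total_start, const timeval& total_end, int runs) {
    double dt = (total_end.tv_sec - total_start.tv_sec) +
                ((total_end.tv_usec - total_start.tv_usec) / USECS_PER_SEC);
    return dt * 1000.0 / runs;
}
```

Note that the summed `tv_usec` fields can exceed one second's worth of microseconds; that is fine here because the subtraction and division only ever treat them as a combined scalar, never as a normalized `timeval`.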
 static void
-test_serialize_nonleaf(int valsize, int nelts, double entropy) {
+test_serialize_nonleaf(int valsize, int nelts, double entropy, int ser_runs, int deser_runs) {
     //    struct ft_handle source_ft;
     struct ftnode sn, *dn;
@@ -353,7 +387,8 @@ test_serialize_nonleaf(int valsize, int nelts, double entropy) {
     gettimeofday(&t[1], NULL);
     double dt;
     dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
-    printf("serialize nonleaf:   %0.05lf\n", dt);
+    dt *= 1000;
+    printf("serialize nonleaf(ms):   %0.05lf (IGNORED RUNS=%d)\n", dt, ser_runs);
     struct ftnode_fetch_extra bfe;
     fill_bfe_for_full_read(&bfe, ft_h);
@@ -363,11 +398,13 @@ test_serialize_nonleaf(int valsize, int nelts, double entropy) {
     assert(r==0);
     gettimeofday(&t[1], NULL);
     dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
-    printf("deserialize nonleaf: %0.05lf\n", dt);
-    printf("io time %lf decompress time %lf deserialize time %lf\n",
-        tokutime_to_seconds(bfe.io_time),
-        tokutime_to_seconds(bfe.decompress_time),
-        tokutime_to_seconds(bfe.deserialize_time)
+    dt *= 1000;
+    printf("deserialize nonleaf(ms): %0.05lf (IGNORED RUNS=%d)\n", dt, deser_runs);
+    printf("io time(ms) %lf decompress time(ms) %lf deserialize time(ms) %lf (IGNORED RUNS=%d)\n",
+        tokutime_to_seconds(bfe.io_time)*1000,
+        tokutime_to_seconds(bfe.decompress_time)*1000,
+        tokutime_to_seconds(bfe.deserialize_time)*1000,
+        deser_runs
     );
 
     toku_ftnode_free(&dn);
@@ -394,19 +431,32 @@ test_serialize_nonleaf(int valsize, int nelts, double entropy) {
 
 int
 test_main (int argc __attribute__((__unused__)), const char *argv[] __attribute__((__unused__))) {
-    long valsize, nelts;
+    const int DEFAULT_RUNS = 5;
+    long valsize, nelts, ser_runs = DEFAULT_RUNS, deser_runs = DEFAULT_RUNS;
     double entropy = 0.3;
 
-    if (argc != 3) {
-        fprintf(stderr, "Usage: %s <valsize> <nelts>\n", argv[0]);
+    if (argc != 3 && argc != 5) {
+        fprintf(stderr, "Usage: %s <valsize> <nelts> [<serialize_runs> <deserialize_runs>]\n", argv[0]);
+        fprintf(stderr, "Default (and min) runs is %d\n", DEFAULT_RUNS);
         return 2;
     }
     valsize = strtol(argv[1], NULL, 0);
     nelts = strtol(argv[2], NULL, 0);
+    if (argc == 5) {
+        ser_runs = strtol(argv[3], NULL, 0);
+        deser_runs = strtol(argv[4], NULL, 0);
+    }
+    if (ser_runs <= 0) {
+        ser_runs = DEFAULT_RUNS;
+    }
+    if (deser_runs <= 0) {
+        deser_runs = DEFAULT_RUNS;
+    }
 
     initialize_dummymsn();
-    test_serialize_leaf(valsize, nelts, entropy);
-    test_serialize_nonleaf(valsize, nelts, entropy);
+    test_serialize_leaf(valsize, nelts, entropy, ser_runs, deser_runs);
+    test_serialize_nonleaf(valsize, nelts, entropy, ser_runs, deser_runs);
 
     return 0;
 }
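`test_main` keeps its old two-argument form but optionally accepts run counts, falling back to `DEFAULT_RUNS` for non-positive values; since `strtol` yields 0 on non-numeric input, garbage arguments also land on the default. That fallback pattern in isolation (the `parse_runs` helper is hypothetical):

```cpp
#include <cassert>
#include <cstdlib>

// Parse a run count, falling back to `dflt` when the argument is missing,
// non-numeric (strtol yields 0), or non-positive -- as in test_main above.
static long parse_runs(const char* arg, long dflt) {
    long v = arg ? strtol(arg, nullptr, 0) : 0;
    return v > 0 ? v : dflt;
}
```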
This diff is collapsed.
@@ -119,7 +119,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     MSN msn = next_dummymsn();
...
@@ -96,13 +96,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, const char *key, int keysize, const cha...
 {
     LEAFENTRY r = NULL;
     uint32_t size_needed = LE_CLEAN_MEMSIZE(valsize);
+    void *maybe_free = nullptr;
     bn->get_space_for_insert(
         idx,
         key,
         keysize,
         size_needed,
-        &r
+        &r,
+        &maybe_free
         );
+    if (maybe_free) {
+        toku_free(maybe_free);
+    }
     resource_assert(r);
     r->type = LE_CLEAN;
     r->u.clean.vallen = valsize;
@@ -113,14 +118,19 @@ static void
 le_overwrite(bn_data* bn, uint32_t idx, const char *key, int keysize, const char *val, int valsize) {
     LEAFENTRY r = NULL;
     uint32_t size_needed = LE_CLEAN_MEMSIZE(valsize);
+    void *maybe_free = nullptr;
     bn->get_space_for_overwrite(
         idx,
         key,
         keysize,
         size_needed, // old_le_size
         size_needed,
-        &r
+        &r,
+        &maybe_free
         );
+    if (maybe_free) {
+        toku_free(maybe_free);
+    }
     resource_assert(r);
     r->type = LE_CLEAN;
     r->u.clean.vallen = valsize;
...
@@ -734,7 +734,7 @@ flush_to_leaf(FT_HANDLE t, bool make_leaf_up_to_date, bool use_flush) {
 
     int total_messages = 0;
     for (i = 0; i < 8; ++i) {
-        total_messages += BLB_DATA(child, i)->omt_size();
+        total_messages += BLB_DATA(child, i)->num_klpairs();
     }
     assert(total_messages <= num_parent_messages + num_child_messages);
@@ -747,7 +747,7 @@ flush_to_leaf(FT_HANDLE t, bool make_leaf_up_to_date, bool use_flush) {
     memset(parent_messages_present, 0, sizeof parent_messages_present);
     memset(child_messages_present, 0, sizeof child_messages_present);
     for (int j = 0; j < 8; ++j) {
-        uint32_t len = BLB_DATA(child, j)->omt_size();
+        uint32_t len = BLB_DATA(child, j)->num_klpairs();
         for (uint32_t idx = 0; idx < len; ++idx) {
             LEAFENTRY le;
             DBT keydbt, valdbt;
@@ -969,7 +969,7 @@ flush_to_leaf_with_keyrange(FT_HANDLE t, bool make_leaf_up_to_date) {
 
     int total_messages = 0;
     for (i = 0; i < 8; ++i) {
-        total_messages += BLB_DATA(child, i)->omt_size();
+        total_messages += BLB_DATA(child, i)->num_klpairs();
     }
     assert(total_messages <= num_parent_messages + num_child_messages);
@@ -1145,10 +1145,10 @@ compare_apply_and_flush(FT_HANDLE t, bool make_leaf_up_to_date) {
     toku_ftnode_free(&parentnode);
 
     for (int j = 0; j < 8; ++j) {
-        BN_DATA first = BLB_DATA(child1, j);
-        BN_DATA second = BLB_DATA(child2, j);
-        uint32_t len = first->omt_size();
-        assert(len == second->omt_size());
+        bn_data* first = BLB_DATA(child1, j);
+        bn_data* second = BLB_DATA(child2, j);
+        uint32_t len = first->num_klpairs();
+        assert(len == second->num_klpairs());
         for (uint32_t idx = 0; idx < len; ++idx) {
             LEAFENTRY le1, le2;
             DBT key1dbt, val1dbt, key2dbt, val2dbt;
...
@@ -352,7 +352,7 @@ doit (int state) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 1);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 1);
 
     toku_unpin_ftnode(c_ft->ft, node);
     toku_pin_ftnode_with_dep_nodes(
@@ -369,7 +369,7 @@ doit (int state) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 1);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 1);
 
     toku_unpin_ftnode(c_ft->ft, node);
 }
 else if (state == ft_flush_aflter_merge || state == flt_flush_before_unpin_remove) {
@@ -387,7 +387,7 @@ doit (int state) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 2);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 2);
 
     toku_unpin_ftnode(c_ft->ft, node);
 }
 else {
...
@@ -355,7 +355,7 @@ doit (int state) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 2);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 2);
 
     toku_unpin_ftnode(c_ft->ft, node);
     toku_pin_ftnode(
@@ -370,10 +369,9 @@ doit (int state) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 2);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 2);
 
     toku_unpin_ftnode(c_ft->ft, node);
 
     DBT k;
     struct check_pair pair1 = {2, "a", 0, NULL, 0};
     r = toku_ft_lookup(c_ft, toku_fill_dbt(&k, "a", 2), lookup_checkf, &pair1);
...
@@ -338,7 +338,7 @@ doit (bool after_split) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 1);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 1);
 
     toku_unpin_ftnode(c_ft->ft, node);
     toku_pin_ftnode(
@@ -353,7 +353,7 @@ doit (bool after_split) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 1);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 1);
 
     toku_unpin_ftnode(c_ft->ft, node);
 }
 else {
@@ -369,7 +369,7 @@ doit (bool after_split) {
     assert(node->height == 0);
     assert(!node->dirty);
     assert(node->n_children == 1);
-    assert(BLB_DATA(node, 0)->omt_size() == 2);
+    assert(BLB_DATA(node, 0)->num_klpairs() == 2);
 
     toku_unpin_ftnode(c_ft->ft, node);
 }
...
@@ -213,7 +213,7 @@ test_le_offsets (void) {
 static void
 test_ule_packs_to_nothing (ULE ule) {
     LEAFENTRY le;
-    int r = le_pack(ule, NULL, 0, NULL, 0, 0, &le);
+    int r = le_pack(ule, NULL, 0, NULL, 0, 0, &le, nullptr);
     assert(r==0);
     assert(le==NULL);
 }
@@ -319,7 +319,7 @@ test_le_pack_committed (void) {
     size_t memsize;
     LEAFENTRY le;
-    int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le);
+    int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le, nullptr);
     assert(r==0);
     assert(le!=NULL);
     memsize = le_memsize_from_ule(&ule);
@@ -329,7 +329,7 @@ test_le_pack_committed (void) {
     verify_ule_equal(&ule, &tmp_ule);
     LEAFENTRY tmp_le;
     size_t tmp_memsize;
-    r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le);
+    r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le, nullptr);
     tmp_memsize = le_memsize_from_ule(&tmp_ule);
     assert(r==0);
     assert(tmp_memsize == memsize);
@@ -377,7 +377,7 @@ test_le_pack_uncommitted (uint8_t committed_type, uint8_t prov_type, int num_pla...
     size_t memsize;
     LEAFENTRY le;
-    int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le);
+    int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le, nullptr);
     assert(r==0);
     assert(le!=NULL);
     memsize = le_memsize_from_ule(&ule);
@@ -387,7 +387,7 @@ test_le_pack_uncommitted (uint8_t committed_type, uint8_t prov_type, int num_pla...
     verify_ule_equal(&ule, &tmp_ule);
     LEAFENTRY tmp_le;
     size_t tmp_memsize;
-    r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le);
+    r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le, nullptr);
     tmp_memsize = le_memsize_from_ule(&tmp_ule);
     assert(r==0);
     assert(tmp_memsize == memsize);
@@ -448,7 +448,7 @@ test_le_apply(ULE ule_initial, FT_MSG msg, ULE ule_expected) {
     LEAFENTRY le_expected;
     LEAFENTRY le_result;
-    r = le_pack(ule_initial, nullptr, 0, nullptr, 0, 0, &le_initial);
+    r = le_pack(ule_initial, nullptr, 0, nullptr, 0, 0, &le_initial, nullptr);
     CKERR(r);
 
     size_t result_memsize = 0;
@@ -467,7 +467,7 @@ test_le_apply(ULE ule_initial, FT_MSG msg, ULE ule_expected) {
     }
     size_t expected_memsize = 0;
-    r = le_pack(ule_expected, nullptr, 0, nullptr, 0, 0, &le_expected);
+    r = le_pack(ule_expected, nullptr, 0, nullptr, 0, 0, &le_expected, nullptr);
     CKERR(r);
 
     if (le_expected) {
         expected_memsize = leafentry_memsize(le_expected);
@@ -749,7 +749,7 @@ test_le_apply_messages(void) {
 static bool ule_worth_running_garbage_collection(ULE ule, TXNID oldest_referenced_xid_known) {
     LEAFENTRY le;
-    int r = le_pack(ule, nullptr, 0, nullptr, 0, 0, &le); CKERR(r);
+    int r = le_pack(ule, nullptr, 0, nullptr, 0, 0, &le, nullptr); CKERR(r);
     invariant_notnull(le);
 
     txn_gc_info gc_info(nullptr, oldest_referenced_xid_known, oldest_referenced_xid_known, true);
     bool worth_running = toku_le_worth_running_garbage_collection(le, &gc_info);
...
@@ -189,7 +189,7 @@ doit (void) {
     r = toku_testsetup_root(t, node_root);
     assert(r==0);
 
-    char filler[900];
+    char filler[900-2*bn_data::HEADER_LENGTH];
     memset(filler, 0, sizeof(filler));
     // now we insert filler data so that a merge does not happen
     r = toku_testsetup_insert_to_leaf (
...
@@ -119,13 +119,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, const char *key, int keysize, const cha...
 {
     LEAFENTRY r = NULL;
     uint32_t size_needed = LE_CLEAN_MEMSIZE(valsize);
+    void *maybe_free = nullptr;
     bn->get_space_for_insert(
         idx,
         key,
         keysize,
         size_needed,
-        &r
+        &r,
+        &maybe_free
         );
+    if (maybe_free) {
+        toku_free(maybe_free);
+    }
     resource_assert(r);
     r->type = LE_CLEAN;
     r->u.clean.vallen = valsize;
...
@@ -122,7 +122,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     MSN msn = next_dummymsn();
...
@@ -111,7 +111,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     // apply an insert to the leaf node
     MSN msn = next_dummymsn();
...
@@ -112,7 +112,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     // apply an insert to the leaf node
     MSN msn = next_dummymsn();
...
@@ -111,7 +111,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     // apply an insert to the leaf node
     MSN msn = next_dummymsn();
...
@@ -112,7 +112,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     // apply an insert to the leaf node
     MSN msn = next_dummymsn();
...
@@ -114,7 +114,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     // apply an insert to the leaf node
     MSN msn = next_dummymsn();
...
@@ -111,7 +111,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
     DBT theval; toku_fill_dbt(&theval, val, vallen);
 
     // get an index that we can use to create a new leaf entry
-    uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+    uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
 
     // apply an insert to the leaf node
     MSN msn = next_dummymsn();
...
@@ -315,9 +315,9 @@ dump_node (int f, BLOCKNUM blocknum, FT h) {
             }
         } else {
             printf(" n_bytes_in_buffer= %" PRIu64 "", BLB_DATA(n, i)->get_disk_size());
-            printf(" items_in_buffer=%u\n", BLB_DATA(n, i)->omt_size());
+            printf(" items_in_buffer=%u\n", BLB_DATA(n, i)->num_klpairs());
             if (dump_data) {
-                BLB_DATA(n, i)->omt_iterate<void, print_le>(NULL);
+                BLB_DATA(n, i)->iterate<void, print_le>(NULL);
             }
         }
     }
...
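The dump tool switches from `omt_iterate` to the renamed `iterate`, which takes the callback as a template parameter (a compile-time function pointer) alongside the extra-argument type. A minimal sketch of that style of iterator, with the container and callback invented for illustration:

```cpp
#include <cassert>
#include <vector>

// omt/dmt-style iteration: the callback is a non-type template parameter, so
// the call can be inlined; it receives each value, its index, and a caller
// argument, and a nonzero return stops the walk early.
template<typename T>
struct MiniOmt {
    std::vector<T> vals;

    template<typename extra_t, int (*f)(const T&, unsigned, extra_t*)>
    int iterate(extra_t* extra) const {
        for (unsigned i = 0; i < vals.size(); i++) {
            int r = f(vals[i], i, extra);
            if (r != 0) return r;
        }
        return 0;
    }
};

static int sum_cb(const int& v, unsigned, int* acc) { *acc += v; return 0; }
```

`iterate<void, print_le>(NULL)` in the diff follows the same shape: `void` is the extra-argument type and `print_le` is the statically-bound callback.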
@@ -149,7 +149,8 @@ le_pack(ULE ule, // data to be packed into new leafentry
     void* keyp,
     uint32_t keylen,
     uint32_t old_le_size,
-    LEAFENTRY * const new_leafentry_p // this is what this function creates
+    LEAFENTRY * const new_leafentry_p, // this is what this function creates
+    void **const maybe_free
     );
...
@@ -258,20 +258,21 @@ static void get_space_for_le(
     uint32_t keylen,
     uint32_t old_le_size,
     size_t size,
-    LEAFENTRY* new_le_space
+    LEAFENTRY* new_le_space,
+    void **const maybe_free
     )
 {
-    if (data_buffer == NULL) {
+    if (data_buffer == nullptr) {
         CAST_FROM_VOIDP(*new_le_space, toku_xmalloc(size));
     }
     else {
         // this means we are overwriting something
         if (old_le_size > 0) {
-            data_buffer->get_space_for_overwrite(idx, keyp, keylen, old_le_size, size, new_le_space);
+            data_buffer->get_space_for_overwrite(idx, keyp, keylen, old_le_size, size, new_le_space, maybe_free);
         }
         // this means we are inserting something new
         else {
-            data_buffer->get_space_for_insert(idx, keyp, keylen, size, new_le_space);
+            data_buffer->get_space_for_insert(idx, keyp, keylen, size, new_le_space, maybe_free);
         }
     }
 }
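`get_space_for_le` routes through three cases: no buffer means a plain heap allocation, a nonzero `old_le_size` means overwrite in place, and otherwise a fresh insert. A compressed, runnable sketch of that dispatch using a toy stand-in for `bn_data` (all names here are hypothetical):

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <vector>

// Toy stand-in for bn_data: keyed slots of raw bytes.
struct ToyBuffer {
    std::map<unsigned, std::vector<char>> slots;
    void* space_for_overwrite(unsigned idx, size_t size) {
        slots[idx].assign(size, 0);  // reuse the slot, resized
        return slots[idx].data();
    }
    void* space_for_insert(unsigned idx, size_t size) {
        slots[idx] = std::vector<char>(size, 0);  // brand-new slot
        return slots[idx].data();
    }
};

// Mirrors get_space_for_le's three-way dispatch: no buffer -> heap
// allocation, old entry present -> overwrite, otherwise -> insert.
static void* get_space(ToyBuffer* buf, unsigned idx, size_t old_size, size_t size) {
    if (buf == nullptr) return malloc(size);
    return old_size > 0 ? buf->space_for_overwrite(idx, size)
                        : buf->space_for_insert(idx, size);
}
```

The real function additionally threads `maybe_free` through both buffer paths, which the toy omits for brevity.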
@@ -505,19 +506,12 @@ toku_le_apply_msg(FT_MSG msg,
     int64_t newnumbytes = 0;
     uint64_t oldmemsize = 0;
     uint32_t keylen = ft_msg_get_keylen(msg);
-    LEAFENTRY copied_old_le = NULL;
-    size_t old_le_size = old_leafentry ? leafentry_memsize(old_leafentry) : 0;
-    toku::scoped_malloc copied_old_le_buf(old_le_size);
-    if (old_leafentry) {
-        CAST_FROM_VOIDP(copied_old_le, copied_old_le_buf.get());
-        memcpy(copied_old_le, old_leafentry, old_le_size);
-    }
 
     if (old_leafentry == NULL) {
         msg_init_empty_ule(&ule);
     } else {
         oldmemsize = leafentry_memsize(old_leafentry);
-        le_unpack(&ule, copied_old_le); // otherwise unpack leafentry
+        le_unpack(&ule, old_leafentry); // otherwise unpack leafentry
         oldnumbytes = ule_get_innermost_numbytes(&ule, keylen);
     }
     msg_modify_ule(&ule, msg); // modify unpacked leafentry
@@ -550,21 +544,28 @@ toku_le_apply_msg(FT_MSG msg,
         STATUS_INC(LE_APPLY_GC_BYTES_IN, size_before_gc);
         STATUS_INC(LE_APPLY_GC_BYTES_OUT, size_after_gc);
     }
-    int rval = le_pack(
+    void *maybe_free = nullptr;
+    int r = le_pack(
         &ule, // create packed leafentry
         data_buffer,
         idx,
         ft_msg_get_key(msg), // contract of this function is caller has this set, always
         keylen, // contract of this function is caller has this set, always
         oldmemsize,
-        new_leafentry_p
+        new_leafentry_p,
+        &maybe_free
         );
-    invariant_zero(rval);
+    invariant_zero(r);
 
     if (*new_leafentry_p) {
         newnumbytes = ule_get_innermost_numbytes(&ule, keylen);
     }
     *numbytes_delta_p = newnumbytes - oldnumbytes;
     ule_cleanup(&ule);
+    if (maybe_free != nullptr) {
+        toku_free(maybe_free);
+    }
 }
 bool toku_le_worth_running_garbage_collection(LEAFENTRY le, txn_gc_info *gc_info) {
@@ -621,15 +622,8 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
     ULE_S ule;
     int64_t oldnumbytes = 0;
     int64_t newnumbytes = 0;
-    LEAFENTRY copied_old_le = NULL;
-    size_t old_le_size = old_leaf_entry ? leafentry_memsize(old_leaf_entry) : 0;
-    toku::scoped_malloc copied_old_le_buf(old_le_size);
-    if (old_leaf_entry) {
-        CAST_FROM_VOIDP(copied_old_le, copied_old_le_buf.get());
-        memcpy(copied_old_le, old_leaf_entry, old_le_size);
-    }
-    le_unpack(&ule, copied_old_le);
+    le_unpack(&ule, old_leaf_entry);
     oldnumbytes = ule_get_innermost_numbytes(&ule, keylen);
     uint32_t old_mem_size = leafentry_memsize(old_leaf_entry);
@@ -654,6 +648,7 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
         STATUS_INC(LE_APPLY_GC_BYTES_OUT, size_after_gc);
     }
+    void *maybe_free = nullptr;
     int r = le_pack(
         &ule,
         data_buffer,
@@ -661,14 +656,19 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
         keyp,
         keylen,
         old_mem_size,
-        new_leaf_entry
+        new_leaf_entry,
+        &maybe_free
         );
-    assert(r == 0);
+    invariant_zero(r);
     if (*new_leaf_entry) {
         newnumbytes = ule_get_innermost_numbytes(&ule, keylen);
     }
     *numbytes_delta_p = newnumbytes - oldnumbytes;
     ule_cleanup(&ule);
+    if (maybe_free != nullptr) {
+        toku_free(maybe_free);
+    }
 }
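The hunks above all follow the same "maybe_free" idiom: `le_pack()` may carve the new leafentry's space out of the basement node's buffer, and the old allocation cannot be freed inside the callee because the caller may still be reading the old leafentry (e.g. via the unpacked ULE). So the callee reports the stale block through an out-parameter and the caller frees it once it is done. A minimal standalone sketch of the idiom (all names here are hypothetical, not the actual ft-index API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Grow a buffer without freeing the old block immediately. The old block is
// handed back through *maybe_free so the caller can keep reading it and free
// it only when finished; this mirrors the out-parameter le_pack() gained.
static char *grow_buffer(char *old_buf, size_t old_size, size_t new_size,
                         void **maybe_free) {
    char *new_buf = (char *) malloc(new_size);
    memcpy(new_buf, old_buf, old_size);
    *maybe_free = old_buf;  // caller frees this after it is done with old data
    return new_buf;
}
```

Contrast with plain `realloc()`, which may free and move the old block out from under any live pointer into it; the deferred-free contract is what lets `toku_le_apply_msg` drop its defensive `copied_old_le` memcpy.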
/////////////////////////////////////////////////////////////////////////////////
@@ -975,7 +975,8 @@ le_pack(ULE ule, // data to be packed into new leafentry
     void* keyp,
     uint32_t keylen,
     uint32_t old_le_size,
-    LEAFENTRY * const new_leafentry_p // this is what this function creates
+    LEAFENTRY * const new_leafentry_p, // this is what this function creates
+    void **const maybe_free
     )
 {
     invariant(ule->num_cuxrs > 0);
@@ -1001,10 +1002,10 @@ le_pack(ULE ule, // data to be packed into new leafentry
         rval = 0;
         goto cleanup;
     }
-found_insert:;
+found_insert:
     memsize = le_memsize_from_ule(ule);
     LEAFENTRY new_leafentry;
-    get_space_for_le(data_buffer, idx, keyp, keylen, old_le_size, memsize, &new_leafentry);
+    get_space_for_le(data_buffer, idx, keyp, keylen, old_le_size, memsize, &new_leafentry, maybe_free);
     //p always points to first unused byte after leafentry we are packing
     uint8_t *p;
@@ -2467,12 +2468,14 @@ toku_le_upgrade_13_14(LEAFENTRY_13 old_leafentry,
     // malloc instead of a mempool. However after supporting upgrade,
     // we need to use mempools and the OMT.
     rval = le_pack(&ule, // create packed leafentry
-                   NULL,
+                   nullptr,
                    0, //only matters if we are passing in a bn_data
-                   NULL, //only matters if we are passing in a bn_data
+                   nullptr, //only matters if we are passing in a bn_data
                    0, //only matters if we are passing in a bn_data
                    0, //only matters if we are passing in a bn_data
-                   new_leafentry_p);
+                   new_leafentry_p,
+                   nullptr //only matters if we are passing in a bn_data
+                   );
     ule_cleanup(&ule);
     *new_leafentry_memorysize = leafentry_memsize(*new_leafentry_p);
     return rval;
...
@@ -187,6 +187,13 @@ static inline void wbuf_uint (struct wbuf *w, uint32_t i) {
     wbuf_int(w, (int32_t)i);
 }
+static inline uint8_t* wbuf_nocrc_reserve_literal_bytes(struct wbuf *w, uint32_t nbytes) {
+    assert(w->ndone + nbytes <= w->size);
+    uint8_t * dest = w->buf + w->ndone;
+    w->ndone += nbytes;
+    return dest;
+}
 static inline void wbuf_nocrc_literal_bytes(struct wbuf *w, bytevec bytes_bv, uint32_t nbytes) {
     const unsigned char *bytes = (const unsigned char *) bytes_bv;
 #if 0
...
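The new `wbuf_nocrc_reserve_literal_bytes` adds a reserve-then-fill pattern to the write buffer: instead of handing wbuf a span to copy, the caller reserves a region and writes into it directly, which lets a producer (such as the dmt serializer) emit straight into the output buffer. A self-contained sketch under assumed `struct wbuf` fields (`buf`, `size`, `ndone`, matching the usage in the hunk; the struct layout here is illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct wbuf {
    uint8_t *buf;        // backing storage
    uint32_t size;       // total capacity in bytes
    uint32_t ndone;      // bytes written so far
};

// Reserve nbytes of the buffer and return a pointer to the reserved region;
// the caller fills it in. The caller must have sized the buffer up front.
static inline uint8_t *wbuf_reserve(struct wbuf *w, uint32_t nbytes) {
    assert(w->ndone + nbytes <= w->size);
    uint8_t *dest = w->buf + w->ndone;
    w->ndone += nbytes;
    return dest;
}
```

Successive reservations are contiguous, so a serializer can interleave reserved regions with ordinary `wbuf_*` writes without any extra copying.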