Commit b1c1e93d authored by Leif Walsh

Merge remote-tracking branch 'origin/ft-index/46merge-a'

parents 60be90af 886b0041
Notes from the 2014-01-08 Leif/Yoni discussion:
- Open question: should verify (dmt? omt? bndata?) crash or return an error when verification fails?
DECISIONS:
Replace dmt_functor with an implicit interface only. Instead of requiring (for data type x) a functor named dmt_functor<x>, just pass the writer's class name into the dmt's template as a new parameter.
Replace dmt_functor<default> with comments explaining the "interface".
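A minimal sketch of the implicit-interface decision, assuming a hypothetical container `tiny_dmt` and a writer interface of `get_size()` plus `write_to(dest)` (illustrative names, not the actual dmt API). The container never names a functor specialization; it only calls the members it needs, so a writer lacking them fails to compile where it is used. A flat byte buffer with an offset list stands in for the real array/tree representations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical container. The interface expected of writer_t is documented
// here, not enforced by a base class:
//   size_t get_size() const;              // bytes needed for the value
//   void   write_to(uint8_t *dest) const; // copy the value into dest
template <typename writer_t>
class tiny_dmt {
public:
    // Append a value; the writer says how much space to reserve.
    uint32_t append(const writer_t &w) {
        const size_t size = w.get_size();
        const size_t offset = buf_.size();
        buf_.resize(offset + size);
        w.write_to(buf_.data() + offset);
        offsets_.push_back(static_cast<uint32_t>(offset));
        sizes_.push_back(static_cast<uint32_t>(size));
        return static_cast<uint32_t>(offsets_.size() - 1);
    }

    // Return a pointer to value idx and its length.
    const uint8_t *fetch(uint32_t idx, uint32_t *size) const {
        *size = sizes_[idx];
        return buf_.data() + offsets_[idx];
    }

    size_t size() const { return offsets_.size(); }

private:
    std::vector<uint8_t> buf_;      // value bytes
    std::vector<uint32_t> offsets_; // start of each value in buf_
    std::vector<uint32_t> sizes_;   // length of each value
};

// A writer for variable-length strings; nothing ties it to tiny_dmt by name.
class string_writer {
public:
    explicit string_writer(const std::string &s) : s_(s) {}
    size_t get_size() const { return s_.size(); }
    void write_to(uint8_t *dest) const { std::memcpy(dest, s_.data(), s_.size()); }

private:
    std::string s_;
};
```

Usage: `tiny_dmt<string_writer> d; d.append(string_writer("hello"));`. The comment block at the top of `tiny_dmt` is the kind of documentation that would replace dmt_functor<default>.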
-==========================================-
See wiki:
https://github.com/Tokutek/ft-index/wiki/Improving-in-memory-query-performance---Design
ft/bndata.{cc,h}: The basement node was heavily modified to store keys separately from values (leafentries) and to inline the keys.
bn_data::initialize_from_separate_keys_and_vals
This is effectively the deserialization routine.
The bn_data::omt_* functions (probably badly named) effectively treat the basement node as an omt of key+leafentry pairs.
There are many references to 'omt' that could be renamed to 'dmt' if it's worth it.
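The size accounting in the rebalance_ftnode_leaf() hunk charges each entry `sizeof(uint32_t)` (a leafentry offset) plus its key bytes, which suggests a key/leafentry pair laid out as an offset followed by the inlined key, with leafentries kept in a separate buffer. A rough sketch of that idea; `basement_sketch` and its members are hypothetical, and the real bn_data/dmt layout may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical basement layout: each klpair is [le_offset (4 bytes)][key bytes],
// and le_offset points into a separate buffer holding the leafentries.
struct basement_sketch {
    std::vector<uint8_t> klpairs;        // packed klpairs
    std::vector<uint32_t> klpair_starts; // where each klpair begins
    std::vector<uint32_t> key_lens;      // key length of each klpair
    std::vector<uint8_t> les;            // opaque leafentry bytes

    void append(const std::string &k, const std::string &le) {
        // Store the leafentry and remember where it starts.
        const uint32_t le_offset = static_cast<uint32_t>(les.size());
        les.insert(les.end(), le.begin(), le.end());

        // Store the offset, then inline the key right after it.
        klpair_starts.push_back(static_cast<uint32_t>(klpairs.size()));
        key_lens.push_back(static_cast<uint32_t>(k.size()));
        uint8_t off_bytes[sizeof(uint32_t)];
        std::memcpy(off_bytes, &le_offset, sizeof le_offset);
        klpairs.insert(klpairs.end(), off_bytes, off_bytes + sizeof off_bytes);
        klpairs.insert(klpairs.end(), k.begin(), k.end());
    }

    std::string key_at(size_t i) const {
        const uint8_t *p = klpairs.data() + klpair_starts[i] + sizeof(uint32_t);
        return std::string(reinterpret_cast<const char *>(p), key_lens[i]);
    }

    uint32_t le_offset_at(size_t i) const {
        uint32_t off;
        std::memcpy(&off, klpairs.data() + klpair_starts[i], sizeof off);
        return off;
    }

    // Total key-side bytes: sum of sizeof(uint32_t) + keylen per entry.
    size_t key_bytes() const { return klpairs.size(); }
};
```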
util/dmt.{cc,h}: The new DMT structure.
Possible questions:
1- Should we merge dmt<> and omt<> (i.e., delete omt entirely)?
2- Should omt<> become a wrapper for dmt<>?
3- Should we just keep both around?
If we plan to keep both around for a while, should we get rid of any scaffolding that would make 1 or 2 easier?
The dmt is basically an omt with dynamically sized nodes/values.
There are two representations: an array of values, or a tree of nodes.
The high-level algorithm is basically the same for dmt and omt, except that the dmt tries not to move values around in tree form;
instead, it moves the node metadata around.
Insertion into a dmt requires a functor that can provide size information, since values are expected to be (at least potentially) dynamically sized.
The dmt does not revert to array form when rebalancing the root, but it CAN revert to array form when preparing to serialize (if it notices that all values are fixed length).
The dmt can also serialize and deserialize the set of values it represents. It saves no information about the dmt itself, just the values.
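A simplified sketch of the two representations described above, under stated assumptions: `dmt_sketch`, `insert_at()` and `prepare_for_serialize()` are hypothetical names, and an ordered vector of (offset, length) records stands in for the real tree of nodes. Value bytes are written once into a pool and never moved; insertion shuffles only the small records. Preparing to serialize repacks into a dense array only when every value has the same length, and serialization emits just the values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

class dmt_sketch {
public:
    // Insert a value at position idx. Only the small record moves; the value
    // bytes are appended to the pool and stay where they are.
    void insert_at(size_t idx, const std::string &value) {
        to_tree_form();
        record r{static_cast<uint32_t>(pool_.size()), static_cast<uint32_t>(value.size())};
        pool_ += value;
        records_.insert(records_.begin() + static_cast<std::ptrdiff_t>(idx), r);
    }

    // If every value has the same length, repack into a dense array where
    // value i lives at i * fixed_len_. Otherwise stay in tree form.
    void prepare_for_serialize() {
        if (is_array_ || records_.empty()) return;
        const uint32_t len = records_[0].len;
        for (const record &r : records_) {
            if (r.len != len) return;
        }
        std::string packed;
        for (const record &r : records_) packed.append(pool_, r.offset, r.len);
        pool_.swap(packed);
        fixed_len_ = len;
        count_ = records_.size();
        records_.clear();
        is_array_ = true;
    }

    // Emit only the values, in order; nothing about the structure itself.
    std::string serialize() const {
        if (is_array_) return pool_;
        std::string out;
        for (const record &r : records_) out.append(pool_, r.offset, r.len);
        return out;
    }

    bool is_array_form() const { return is_array_; }
    size_t size() const { return is_array_ ? count_ : records_.size(); }

private:
    struct record { uint32_t offset; uint32_t len; };

    // Rebuild per-value records from the dense array before a modification.
    void to_tree_form() {
        if (!is_array_) return;
        for (size_t i = 0; i < count_; ++i) {
            records_.push_back(record{static_cast<uint32_t>(i * fixed_len_), fixed_len_});
        }
        is_array_ = false;
    }

    std::string pool_;             // value bytes
    std::vector<record> records_;  // ordered metadata (tree stand-in)
    bool is_array_ = false;
    uint32_t fixed_len_ = 0;
    size_t count_ = 0;
};
```

Appending to the pool keeps insertion cheap at the cost of never reclaiming space; a real implementation would also need to handle deletions and compaction.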
Some comments about what's in each file.
ft/CMakeLists.txt
Adds dmt-wrapper (a test wrapper nearly identical to ft/omt.cc, which is also a test wrapper).
ft/dmt-wrapper.cc/h
Just like ft/omt.cc/h: a test wrapper for the dmt that implements a version of the old (non-templated) omt tests.
ft/ft-internal.h
Additional engine status
ft/ft-ops.cc/h
Additional engine status
In ftnode_memory_size():
Fixes a minor bug where not all of the memory was counted.
Comment updates.
ft/ft_layout_version.h
Updates the comment describing the version change.
NOTE: We may need to add version 26 if 25 is sent to customers before this goes live.
Adding 26 requires additional code changes (limited to a subset of the places where versions 24/25 are referenced).
ft/ft_node-serialize.cc
Changes the leaf-node size calculation to include the basement-node header.
Adds optimized serialization for basement nodes with fixed-length keys.
Keeps the old method when keys are not fixed length.
rebalance_ftnode_leaf()
Minor changes, since keys and leafentries are now separated.
deserialize_ftnode_partition()
Minor changes, including passing the rbuf directly to the child function (so the child does the ndone calculation).
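The greedy split rule from the rebalance_ftnode_leaf() hunk, sketched as a standalone function: an entry now costs its leafentry size plus a `uint32_t` offset plus its key bytes, and a new basement is started only when the current, non-empty one would overflow. `pack_basements` is a hypothetical name; the real loop also records pivots and per-basement key/leafentry totals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the number of entries placed in each basement, in order.
std::vector<uint32_t> pack_basements(const std::vector<size_t> &le_sizes,
                                     const std::vector<size_t> &key_sizes,
                                     size_t basement_size) {
    std::vector<uint32_t> counts(1, 0); // always at least one basement
    size_t size_so_far = 0;
    for (size_t i = 0; i < le_sizes.size(); ++i) {
        // Leafentry bytes + uint32_t leafentry offset + key bytes.
        const size_t cost = le_sizes[i] + sizeof(uint32_t) + key_sizes[i];
        // Cap the current basement only if it already holds something.
        if (size_so_far + cost > basement_size && counts.back() != 0) {
            counts.push_back(0);
            size_so_far = 0;
        }
        counts.back()++;
        size_so_far += cost;
    }
    return counts;
}
```

A single entry larger than the basement size still gets a basement of its own, which is what the `num_le_in_curr_bn != 0` guard ensures.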
ft/memarena.cc
Tracks the memarena's memory footprint (via toku_memory_footprint) so that it is reported more accurately, and adds memarena_total_footprint(). (Not strictly related to this project.)
ft/rollback.cc
Uses the new memarena_total_footprint() for the memory footprint.
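Why footprint and size differ: a buffer can be reserved at full size while only a few bytes of it are touched. The sketch below models just the bookkeeping. `footprint_of()` is a hypothetical stand-in for toku_memory_footprint() that assumes whole 4 KiB pages are charged for touched bytes (the real function consults the allocator), and `arena_sketch` is illustrative; `total_footprint()` follows the shape of memarena_total_footprint() in the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for toku_memory_footprint(): charge the bytes
// actually touched, rounded up to whole 4 KiB pages.
static size_t footprint_of(size_t touched) {
    const size_t page = 4096;
    return (touched + page - 1) / page * page;
}

// Simplified arena that tracks reserved size and footprint side by side.
// No real memory is allocated; only the bookkeeping is modelled.
struct arena_sketch {
    size_t buf_size;
    size_t buf_used = 0;
    size_t size_of_other_bufs = 0;
    size_t footprint_of_other_bufs = 0;
    int n_other_bufs = 0;

    explicit arena_sketch(size_t initial) : buf_size(initial) {}

    void alloc(size_t n) {
        if (buf_used + n > buf_size) {
            // Retire the current buffer: its full size counts toward the
            // reserved total, but only its touched bytes toward the footprint.
            size_of_other_bufs += buf_size;
            footprint_of_other_bufs += footprint_of(buf_used);
            n_other_bufs++;
            buf_size = (n > 2 * buf_size) ? n : 2 * buf_size;
            buf_used = 0;
        }
        buf_used += n;
    }

    size_t total_memory_size() const { return size_of_other_bufs + buf_size; }

    // Buffers' footprints + the struct itself + one pointer per retired buffer.
    size_t total_footprint() const {
        return footprint_of_other_bufs + footprint_of(buf_used) +
               sizeof(*this) + n_other_bufs * sizeof(void *);
    }
};
```

With a 1 MiB buffer holding 100 bytes, the reserved size is 1 MiB while the modelled footprint is one page plus overhead, which is the gap the rollback-node size estimate no longer overcounts.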
ft/tests/dmt-test.cc
A "clone" of the old (non-templated) omt-test, ported to dmt.
Basically not worth looking at, except to make sure it imports dmt instead of omt.
ft/tests/dmt-test2.cc
New dmt tests.
You may decide that not enough new tests were written.
ft/tests/ft-serialize-benchmark.cc
Minor improvements so that you can average over multiple runs.
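A minimal sketch of averaging a timing over several runs, as the benchmark now does with its `ser_runs`/`deser_runs` arguments. It swaps in std::chrono::steady_clock for gettimeofday() and sums per-run durations instead of summing start and end timevals separately (the two totals are equal); `average_ms` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Time `runs` invocations of `fn` and return the mean wall-clock time in ms.
double average_ms(int runs, const std::function<void()> &fn) {
    using clock = std::chrono::steady_clock;
    if (runs <= 0) runs = 1; // guard against dividing by zero
    clock::duration total = clock::duration::zero();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        fn();
        total += clock::now() - start;
    }
    const double total_ms =
        std::chrono::duration<double, std::milli>(total).count();
    return total_ms / runs;
}
```

A steady clock is not affected by wall-clock adjustments between runs, which gettimeofday() is.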
ft/tests/ft-serialize-test.cc
Ported to the changed API.
ft/tests/test-pick-child-to-flush.cc
The new basement-node headers reduce the available memory, so the test's maximum size is reduced accordingly.
ft/wbuf.h
Added wbuf_nocrc_reserve_literal_bytes().
Gives you a pointer to write into the wbuf, and records that memory as used.
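The reserve idea in a few lines, assuming a hypothetical `wbuf_sketch` with `buf`/`size`/`ndone` fields and a `reserve_bytes()` helper (the real signature of wbuf_nocrc_reserve_literal_bytes() is not shown here). Reserving advances `ndone` immediately, so a caller can write directly into the buffer, or backfill a header, afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified write buffer (not the real struct wbuf).
struct wbuf_sketch {
    unsigned char *buf;
    uint32_t size;  // capacity in bytes
    uint32_t ndone; // bytes written (or reserved) so far
};

// Reserve nbytes at the current position: return a pointer the caller fills
// in later, and advance ndone as if the bytes had already been written.
static unsigned char *reserve_bytes(wbuf_sketch *wb, uint32_t nbytes) {
    assert(wb->ndone + nbytes <= wb->size);
    unsigned char *dest = wb->buf + wb->ndone;
    wb->ndone += nbytes;
    return dest;
}
```

Typical use: reserve a 4-byte count, write the items, then backfill the count once it is known.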
util/mempool.cc
Made mempool allocations cacheline-aligned.
Minor 'const' changes to help compilation
Added utility functions to get an offset from a pointer and a pointer from an offset.
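A sketch of cacheline-aligned allocation from a bump-style pool, plus pointer/offset conversion helpers. The 64-byte cacheline size, `mempool_sketch`, and its methods are assumptions for illustration; alignment of the returned addresses also assumes the pool's base is itself 64-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cacheline size (64 bytes is common, not universal).
constexpr size_t CACHELINE = 64;

// Round n up to the next multiple of CACHELINE (a power of two).
constexpr size_t round_up_to_cacheline(size_t n) {
    return (n + CACHELINE - 1) & ~(CACHELINE - 1);
}

// Hypothetical bump-style pool; each allocation starts on a cacheline
// boundary relative to `base`.
struct mempool_sketch {
    unsigned char *base;
    size_t size;
    size_t free_offset;

    mempool_sketch(unsigned char *b, size_t s) : base(b), size(s), free_offset(0) {}

    // Returns nullptr if the aligned request does not fit.
    void *malloc_aligned(size_t n) {
        const size_t start = round_up_to_cacheline(free_offset);
        if (start > size || n > size - start) return nullptr;
        free_offset = start + n;
        return base + start;
    }

    // Pointer -> offset and offset -> pointer helpers.
    size_t offset_of(const void *p) const {
        return static_cast<size_t>(static_cast<const unsigned char *>(p) - base);
    }
    void *pointer_at(size_t offset) const { return base + offset; }
};
```

Aligning every allocation wastes up to 63 bytes per allocation in exchange for values not straddling cachelines.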
@@ -31,6 +31,7 @@ set(FT_SOURCES
checkpoint
compress
dbufio
dmt-wrapper
fifo
ft
ft-cachetable-wrappers
......
@@ -689,16 +689,16 @@ ftleaf_get_split_loc(
switch (split_mode) {
case SPLIT_LEFT_HEAVY: {
*num_left_bns = node->n_children;
*num_left_les = BLB_DATA(node, *num_left_bns - 1)->omt_size();
*num_left_les = BLB_DATA(node, *num_left_bns - 1)->num_klpairs();
if (*num_left_les == 0) {
*num_left_bns = node->n_children - 1;
*num_left_les = BLB_DATA(node, *num_left_bns - 1)->omt_size();
*num_left_les = BLB_DATA(node, *num_left_bns - 1)->num_klpairs();
}
goto exit;
}
case SPLIT_RIGHT_HEAVY: {
*num_left_bns = 1;
*num_left_les = BLB_DATA(node, 0)->omt_size() ? 1 : 0;
*num_left_les = BLB_DATA(node, 0)->num_klpairs() ? 1 : 0;
goto exit;
}
case SPLIT_EVENLY: {
@@ -707,8 +707,8 @@ ftleaf_get_split_loc(
uint64_t sumlesizes = ftleaf_disk_size(node);
uint32_t size_so_far = 0;
for (int i = 0; i < node->n_children; i++) {
BN_DATA bd = BLB_DATA(node, i);
uint32_t n_leafentries = bd->omt_size();
bn_data* bd = BLB_DATA(node, i);
uint32_t n_leafentries = bd->num_klpairs();
for (uint32_t j=0; j < n_leafentries; j++) {
size_t size_this_le;
int rr = bd->fetch_klpair_disksize(j, &size_this_le);
@@ -725,7 +725,7 @@ ftleaf_get_split_loc(
(*num_left_les)--;
} else if (*num_left_bns > 1) {
(*num_left_bns)--;
*num_left_les = BLB_DATA(node, *num_left_bns - 1)->omt_size();
*num_left_les = BLB_DATA(node, *num_left_bns - 1)->num_klpairs();
} else {
// we are trying to split a leaf with only one
// leafentry in it
@@ -754,7 +754,8 @@ move_leafentries(
)
//Effect: move leafentries in the range [lbi, upe) from src_omt to newly created dest_omt
{
src_bn->data_buffer.move_leafentries_to(&dest_bn->data_buffer, lbi, ube);
invariant(ube == src_bn->data_buffer.num_klpairs());
src_bn->data_buffer.split_klpairs(&dest_bn->data_buffer, lbi);
}
static void ftnode_finalize_split(FTNODE node, FTNODE B, MSN max_msn_applied_to_node) {
@@ -851,7 +852,7 @@ ftleaf_split(
ftleaf_get_split_loc(node, split_mode, &num_left_bns, &num_left_les);
{
// did we split right on the boundary between basement nodes?
const bool split_on_boundary = (num_left_les == 0) || (num_left_les == (int) BLB_DATA(node, num_left_bns - 1)->omt_size());
const bool split_on_boundary = (num_left_les == 0) || (num_left_les == (int) BLB_DATA(node, num_left_bns - 1)->num_klpairs());
// Now we know where we are going to break it
// the two nodes will have a total of n_children+1 basement nodes
// and n_children-1 pivots
@@ -912,7 +913,7 @@ ftleaf_split(
move_leafentries(BLB(B, curr_dest_bn_index),
BLB(node, curr_src_bn_index),
num_left_les, // first row to be moved to B
BLB_DATA(node, curr_src_bn_index)->omt_size() // number of rows in basement to be split
BLB_DATA(node, curr_src_bn_index)->num_klpairs() // number of rows in basement to be split
);
BLB_MAX_MSN_APPLIED(B, curr_dest_bn_index) = BLB_MAX_MSN_APPLIED(node, curr_src_bn_index);
curr_dest_bn_index++;
@@ -954,10 +955,10 @@ ftleaf_split(
toku_destroy_dbt(&node->childkeys[num_left_bns - 1]);
}
} else if (splitk) {
BN_DATA bd = BLB_DATA(node, num_left_bns - 1);
bn_data* bd = BLB_DATA(node, num_left_bns - 1);
uint32_t keylen;
void *key;
int rr = bd->fetch_le_key_and_len(bd->omt_size() - 1, &keylen, &key);
int rr = bd->fetch_key_and_len(bd->num_klpairs() - 1, &keylen, &key);
invariant_zero(rr);
toku_memdup_dbt(splitk, key, keylen);
}
@@ -1168,11 +1169,11 @@ merge_leaf_nodes(FTNODE a, FTNODE b)
a->dirty = 1;
b->dirty = 1;
BN_DATA a_last_bd = BLB_DATA(a, a->n_children-1);
bn_data* a_last_bd = BLB_DATA(a, a->n_children-1);
// this bool states if the last basement node in a has any items or not
// If it does, then it stays in the merge. If it does not, the last basement node
// of a gets eliminated because we do not have a pivot to store for it (because it has no elements)
const bool a_has_tail = a_last_bd->omt_size() > 0;
const bool a_has_tail = a_last_bd->num_klpairs() > 0;
// move each basement node from b to a
// move the pivots, adding one of what used to be max(a)
@@ -1199,7 +1200,7 @@ merge_leaf_nodes(FTNODE a, FTNODE b)
if (a_has_tail) {
uint32_t keylen;
void *key;
int rr = a_last_bd->fetch_le_key_and_len(a_last_bd->omt_size() - 1, &keylen, &key);
int rr = a_last_bd->fetch_key_and_len(a_last_bd->num_klpairs() - 1, &keylen, &key);
invariant_zero(rr);
toku_memdup_dbt(&a->childkeys[a->n_children-1], key, keylen);
a->totalchildkeylens += keylen;
......
@@ -1178,6 +1178,8 @@ typedef enum {
FT_PRO_NUM_STOP_LOCK_CHILD,
FT_PRO_NUM_STOP_CHILD_INMEM,
FT_PRO_NUM_DIDNT_WANT_PROMOTE,
FT_BASEMENT_DESERIALIZE_FIXED_KEYSIZE, // how many basement nodes were deserialized with a fixed keysize
FT_BASEMENT_DESERIALIZE_VARIABLE_KEYSIZE, // how many basement nodes were deserialized with a variable keysize
FT_STATUS_NUM_ROWS
} ft_status_entry;
......
@@ -351,6 +351,8 @@ int toku_ft_strerror_r(int error, char *buf, size_t buflen);
extern bool garbage_collection_debug;
void toku_note_deserialized_basement_node(bool fixed_key_size);
// This is a poor place to put global options like these.
void toku_ft_set_direct_io(bool direct_io_on);
void toku_ft_set_compress_buffers_before_eviction(bool compress_buffers);
......
@@ -462,6 +462,7 @@ serialize_ft_min_size (uint32_t version) {
size_t size = 0;
switch(version) {
case FT_LAYOUT_VERSION_26:
case FT_LAYOUT_VERSION_25:
case FT_LAYOUT_VERSION_24:
case FT_LAYOUT_VERSION_23:
......
@@ -152,7 +152,7 @@ verify_msg_in_child_buffer(FT_HANDLE brt, enum ft_msg_type type, MSN msn, byteve
static DBT
get_ith_key_dbt (BASEMENTNODE bn, int i) {
DBT kdbt;
int r = bn->data_buffer.fetch_le_key_and_len(i, &kdbt.size, &kdbt.data);
int r = bn->data_buffer.fetch_key_and_len(i, &kdbt.size, &kdbt.data);
invariant_zero(r); // this is a bad failure if it happens.
return kdbt;
}
@@ -424,7 +424,7 @@ toku_verify_ftnode_internal(FT_HANDLE brt,
}
else {
BASEMENTNODE bn = BLB(node, i);
for (uint32_t j = 0; j < bn->data_buffer.omt_size(); j++) {
for (uint32_t j = 0; j < bn->data_buffer.num_klpairs(); j++) {
VERIFY_ASSERTION((rootmsn.msn >= this_msn.msn), 0, "leaf may have latest msn, but cannot be greater than root msn");
DBT kdbt = get_ith_key_dbt(bn, j);
if (curr_less_pivot) {
......
@@ -1077,8 +1077,8 @@ garbage_helper(BLOCKNUM blocknum, int64_t UU(size), int64_t UU(address), void *e
goto exit;
}
for (int i = 0; i < node->n_children; ++i) {
BN_DATA bd = BLB_DATA(node, i);
r = bd->omt_iterate<struct garbage_helper_extra, garbage_leafentry_helper>(info);
bn_data* bd = BLB_DATA(node, i);
r = bd->iterate<struct garbage_helper_extra, garbage_leafentry_helper>(info);
if (r != 0) {
goto exit;
}
......
@@ -119,6 +119,7 @@ enum ft_layout_version_e {
FT_LAYOUT_VERSION_23 = 23, // Ming: Fix upgrade path #5902
FT_LAYOUT_VERSION_24 = 24, // Riddler: change logentries that log transactions to store TXNID_PAIRs instead of TXNIDs
FT_LAYOUT_VERSION_25 = 25, // SecretSquirrel: ROLLBACK_LOG_NODES (on disk and in memory) now just use blocknum (instead of blocknum + hash) to point to other log nodes. same for xstillopen log entry
FT_LAYOUT_VERSION_26 = 26, // Hojo: basements store key/vals separately on disk for fixed klpair length BNs
FT_NEXT_VERSION, // the version after the current version
FT_LAYOUT_VERSION = FT_NEXT_VERSION-1, // A hack so I don't have to change this line.
FT_LAYOUT_MIN_SUPPORTED_VERSION = FT_LAYOUT_VERSION_13, // Minimum version supported
......
@@ -284,31 +284,6 @@ serialize_node_header(FTNODE node, FTNODE_DISK_DATA ndd, struct wbuf *wbuf) {
invariant(wbuf->ndone == wbuf->size);
}
static int
wbufwriteleafentry(const void* key, const uint32_t keylen, const LEAFENTRY &le, const uint32_t UU(idx), struct wbuf * const wb) {
// need to pack the leafentry as it was in versions
// where the key was integrated into it
uint32_t begin_spot UU() = wb->ndone;
uint32_t le_disk_size = leafentry_disksize(le);
wbuf_nocrc_uint8_t(wb, le->type);
wbuf_nocrc_uint32_t(wb, keylen);
if (le->type == LE_CLEAN) {
wbuf_nocrc_uint32_t(wb, le->u.clean.vallen);
wbuf_nocrc_literal_bytes(wb, key, keylen);
wbuf_nocrc_literal_bytes(wb, le->u.clean.val, le->u.clean.vallen);
}
else {
paranoid_invariant(le->type == LE_MVCC);
wbuf_nocrc_uint32_t(wb, le->u.mvcc.num_cxrs);
wbuf_nocrc_uint8_t(wb, le->u.mvcc.num_pxrs);
wbuf_nocrc_literal_bytes(wb, key, keylen);
wbuf_nocrc_literal_bytes(wb, le->u.mvcc.xrs, le_disk_size - (1 + 4 + 1));
}
uint32_t end_spot UU() = wb->ndone;
paranoid_invariant((end_spot - begin_spot) == keylen + sizeof(keylen) + le_disk_size);
return 0;
}
static uint32_t
serialize_ftnode_partition_size (FTNODE node, int i)
{
@@ -320,14 +295,14 @@ serialize_ftnode_partition_size (FTNODE node, int i)
result += toku_bnc_nbytesinbuf(BNC(node, i));
}
else {
result += 4; // n_entries in buffer table
result += 4 + bn_data::HEADER_LENGTH; // n_entries in buffer table + basement header
result += BLB_NBYTESINDATA(node, i);
}
result += 4; // checksum
return result;
}
#define FTNODE_PARTITION_OMT_LEAVES 0xaa
#define FTNODE_PARTITION_DMT_LEAVES 0xaa
#define FTNODE_PARTITION_FIFO_MSG 0xbb
static void
@@ -374,16 +349,13 @@ serialize_ftnode_partition(FTNODE node, int i, struct sub_block *sb) {
serialize_nonleaf_childinfo(BNC(node, i), &wb);
}
else {
unsigned char ch = FTNODE_PARTITION_OMT_LEAVES;
BN_DATA bd = BLB_DATA(node, i);
unsigned char ch = FTNODE_PARTITION_DMT_LEAVES;
bn_data* bd = BLB_DATA(node, i);
wbuf_nocrc_char(&wb, ch);
wbuf_nocrc_uint(&wb, bd->omt_size());
wbuf_nocrc_uint(&wb, bd->num_klpairs());
//
// iterate over leafentries and place them into the buffer
//
bd->omt_iterate<struct wbuf, wbufwriteleafentry>(&wb);
bd->serialize_to_wbuf(&wb);
}
uint32_t end_to_end_checksum = x1764_memory(sb->uncompressed_ptr, wbuf_get_woffset(&wb));
wbuf_nocrc_int(&wb, end_to_end_checksum);
@@ -546,7 +518,7 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
// Count number of leaf entries in this leaf (num_le).
uint32_t num_le = 0;
for (uint32_t i = 0; i < num_orig_basements; i++) {
num_le += BLB_DATA(node, i)->omt_size();
num_le += BLB_DATA(node, i)->num_klpairs();
}
uint32_t num_alloc = num_le ? num_le : 1; // simplify logic below by always having at least one entry per array
@@ -571,10 +543,10 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
uint32_t curr_le = 0;
for (uint32_t i = 0; i < num_orig_basements; i++) {
BN_DATA bd = BLB_DATA(node, i);
bn_data* bd = BLB_DATA(node, i);
struct array_info ai {.offset = curr_le, .le_array = leafpointers, .key_sizes_array = key_sizes, .key_ptr_array = key_pointers };
bd->omt_iterate<array_info, array_item>(&ai);
curr_le += bd->omt_size();
bd->iterate<array_info, array_item>(&ai);
curr_le += bd->num_klpairs();
}
// Create an array that will store indexes of new pivots.
@@ -592,9 +564,14 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
// Create an array that will store the size of each basement.
// This is the sum of the leaf sizes of all the leaves in that basement.
// We don't know how many basements there will be, so we use num_le as the upper bound.
toku::scoped_malloc bn_sizes_buf(sizeof(size_t) * num_alloc);
size_t *bn_sizes = reinterpret_cast<size_t *>(bn_sizes_buf.get());
bn_sizes[0] = 0;
// Sum of all le sizes in a single basement
toku::scoped_calloc bn_le_sizes_buf(sizeof(size_t) * num_alloc);
size_t *bn_le_sizes = reinterpret_cast<size_t *>(bn_le_sizes_buf.get());
// Sum of all key sizes in a single basement
toku::scoped_calloc bn_key_sizes_buf(sizeof(size_t) * num_alloc);
size_t *bn_key_sizes = reinterpret_cast<size_t *>(bn_key_sizes_buf.get());
// TODO 4050: All these arrays should be combined into a single array of some bn_info struct (pivot, msize, num_les).
// Each entry is the number of leafentries in this basement. (Again, num_le is overkill upper baound.)
@@ -611,7 +588,7 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
for (uint32_t i = 0; i < num_le; i++) {
uint32_t curr_le_size = leafentry_disksize((LEAFENTRY) leafpointers[i]);
le_sizes[i] = curr_le_size;
if ((bn_size_so_far + curr_le_size > basementnodesize) && (num_le_in_curr_bn != 0)) {
if ((bn_size_so_far + curr_le_size + sizeof(uint32_t) + key_sizes[i] > basementnodesize) && (num_le_in_curr_bn != 0)) {
// cap off the current basement node to end with the element before i
new_pivots[curr_pivot] = i-1;
curr_pivot++;
@@ -620,8 +597,9 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
}
num_le_in_curr_bn++;
num_les_this_bn[curr_pivot] = num_le_in_curr_bn;
bn_le_sizes[curr_pivot] += curr_le_size;
bn_key_sizes[curr_pivot] += sizeof(uint32_t) + key_sizes[i]; // uint32_t le_offset
bn_size_so_far += curr_le_size + sizeof(uint32_t) + key_sizes[i];
bn_sizes[curr_pivot] = bn_size_so_far;
}
// curr_pivot is now the total number of pivot keys in the leaf node
int num_pivots = curr_pivot;
@@ -688,17 +666,15 @@ rebalance_ftnode_leaf(FTNODE node, unsigned int basementnodesize)
uint32_t num_les_to_copy = num_les_this_bn[i];
invariant(num_les_to_copy == num_in_bn);
// construct mempool for this basement
size_t size_this_bn = bn_sizes[i];
BN_DATA bd = BLB_DATA(node, i);
bd->replace_contents_with_clone_of_sorted_array(
bn_data* bd = BLB_DATA(node, i);
bd->set_contents_as_clone_of_sorted_array(
num_les_to_copy,
&key_pointers[baseindex_this_bn],
&key_sizes[baseindex_this_bn],
&leafpointers[baseindex_this_bn],
&le_sizes[baseindex_this_bn],
size_this_bn
bn_key_sizes[i], // Total key sizes
bn_le_sizes[i] // total le sizes
);
BP_STATE(node,i) = PT_AVAIL;
@@ -1541,15 +1517,14 @@ deserialize_ftnode_partition(
BP_WORKDONE(node, childnum) = 0;
}
else {
assert(ch == FTNODE_PARTITION_OMT_LEAVES);
assert(ch == FTNODE_PARTITION_DMT_LEAVES);
BLB_SEQINSERT(node, childnum) = 0;
uint32_t num_entries = rbuf_int(&rb);
// we are now at the first byte of first leafentry
data_size -= rb.ndone; // remaining bytes of leafentry data
BASEMENTNODE bn = BLB(node, childnum);
bn->data_buffer.initialize_from_data(num_entries, &rb.buf[rb.ndone], data_size);
rb.ndone += data_size;
bn->data_buffer.deserialize_from_rbuf(num_entries, &rb, data_size, node->layout_version_read_from_disk);
}
assert(rb.ndone == rb.size);
exit:
@@ -2086,13 +2061,18 @@ deserialize_and_upgrade_leaf_node(FTNODE node,
assert_zero(r);
// Copy the pointer value straight into the OMT
LEAFENTRY new_le_in_bn = nullptr;
void *maybe_free;
bn->data_buffer.get_space_for_insert(
i,
key,
keylen,
new_le_size,
&new_le_in_bn
&new_le_in_bn,
&maybe_free
);
if (maybe_free) {
toku_free(maybe_free);
}
memcpy(new_le_in_bn, new_le, new_le_size);
toku_free(new_le);
}
@@ -2101,8 +2081,7 @@ deserialize_and_upgrade_leaf_node(FTNODE node,
if (has_end_to_end_checksum) {
data_size -= sizeof(uint32_t);
}
bn->data_buffer.initialize_from_data(n_in_buf, &rb->buf[rb->ndone], data_size);
rb->ndone += data_size;
bn->data_buffer.deserialize_from_rbuf(n_in_buf, rb, data_size, node->layout_version_read_from_disk);
}
// Whatever this is must be less than the MSNs of every message above
......
@@ -2917,7 +2917,7 @@ static void add_pair_to_leafnode (struct leaf_buf *lbuf, unsigned char *key, int
// #3588 TODO just make a clean ule and append it to the omt
// #3588 TODO can do the rebalancing here and avoid a lot of work later
FTNODE leafnode = lbuf->node;
uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
DBT thekey = { .data = key, .size = (uint32_t) keylen };
DBT theval = { .data = val, .size = (uint32_t) vallen };
FT_MSG_S cmd = { .type = FT_INSERT,
......
@@ -234,7 +234,7 @@ typedef struct cachetable *CACHETABLE;
typedef struct cachefile *CACHEFILE;
typedef struct ctpair *PAIR;
typedef class checkpointer *CHECKPOINTER;
typedef class bn_data *BN_DATA;
class bn_data;
/* tree command types */
enum ft_msg_type {
......
@@ -98,6 +98,7 @@ struct memarena {
char *buf;
size_t buf_used, buf_size;
size_t size_of_other_bufs; // the buf_size of all the other bufs.
size_t footprint_of_other_bufs; // the footprint of all the other bufs.
char **other_bufs;
int n_other_bufs;
};
@@ -108,6 +109,7 @@ MEMARENA memarena_create_presized (size_t initial_size) {
result->buf_used = 0;
result->other_bufs = NULL;
result->size_of_other_bufs = 0;
result->footprint_of_other_bufs = 0;
result->n_other_bufs = 0;
XMALLOC_N(result->buf_size, result->buf);
return result;
@@ -128,6 +130,7 @@ void memarena_clear (MEMARENA ma) {
// But reuse the main buffer
ma->buf_used = 0;
ma->size_of_other_bufs = 0;
ma->footprint_of_other_bufs = 0;
}
static size_t
@@ -151,6 +154,7 @@ void* malloc_in_memarena (MEMARENA ma, size_t size) {
ma->other_bufs[old_n]=ma->buf;
ma->n_other_bufs = old_n+1;
ma->size_of_other_bufs += ma->buf_size;
ma->footprint_of_other_bufs += toku_memory_footprint(ma->buf, ma->buf_used);
}
// Make a new one
{
@@ -217,7 +221,9 @@ void memarena_move_buffers(MEMARENA dest, MEMARENA source) {
#endif
dest ->size_of_other_bufs += source->size_of_other_bufs + source->buf_size;
dest ->footprint_of_other_bufs += source->footprint_of_other_bufs + toku_memory_footprint(source->buf, source->buf_used);
source->size_of_other_bufs = 0;
source->footprint_of_other_bufs = 0;
assert(other_bufs);
dest->other_bufs = other_bufs;
@@ -247,3 +253,11 @@ memarena_total_size_in_use (MEMARENA m)
{
return m->size_of_other_bufs + m->buf_used;
}
size_t
memarena_total_footprint (MEMARENA m)
{
return m->footprint_of_other_bufs + toku_memory_footprint(m->buf, m->buf_used) +
sizeof(*m) +
m->n_other_bufs * sizeof(*m->other_bufs);
}
@@ -129,5 +129,6 @@ size_t memarena_total_memory_size (MEMARENA);
size_t memarena_total_size_in_use (MEMARENA);
size_t memarena_total_footprint (MEMARENA);
#endif
@@ -146,7 +146,7 @@ PAIR_ATTR
rollback_memory_size(ROLLBACK_LOG_NODE log) {
size_t size = sizeof(*log);
if (log->rollentry_arena) {
size += memarena_total_memory_size(log->rollentry_arena);
size += memarena_total_footprint(log->rollentry_arena);
}
return make_rollback_pair_attr(size);
}
......
@@ -115,13 +115,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, const char *key, int keylen, const char
{
LEAFENTRY r = NULL;
uint32_t size_needed = LE_CLEAN_MEMSIZE(vallen);
void *maybe_free = nullptr;
bn->get_space_for_insert(
idx,
key,
keylen,
size_needed,
&r
&r,
&maybe_free
);
if (maybe_free) {
toku_free(maybe_free);
}
resource_assert(r);
r->type = LE_CLEAN;
r->u.clean.vallen = vallen;
......
@@ -105,13 +105,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, char *key, int keylen, char *val, int va
{
LEAFENTRY r = NULL;
uint32_t size_needed = LE_CLEAN_MEMSIZE(vallen);
void *maybe_free = nullptr;
bn->get_space_for_insert(
idx,
key,
keylen,
size_needed,
&r
&r,
&maybe_free
);
if (maybe_free) {
toku_free(maybe_free);
}
resource_assert(r);
r->type = LE_CLEAN;
r->u.clean.vallen = vallen;
@@ -127,7 +132,7 @@ long_key_cmp(DB *UU(e), const DBT *a, const DBT *b)
}
static void
test_serialize_leaf(int valsize, int nelts, double entropy) {
test_serialize_leaf(int valsize, int nelts, double entropy, int ser_runs, int deser_runs) {
// struct ft_handle source_ft;
struct ftnode *sn, *dn;
@@ -214,32 +219,63 @@ test_serialize_leaf(int valsize, int nelts, double entropy) {
assert(size == 100);
}
struct timeval total_start;
struct timeval total_end;
total_start.tv_sec = total_start.tv_usec = 0;
total_end.tv_sec = total_end.tv_usec = 0;
struct timeval t[2];
gettimeofday(&t[0], NULL);
FTNODE_DISK_DATA ndd = NULL;
for (int i = 0; i < ser_runs; i++) {
gettimeofday(&t[0], NULL);
ndd = NULL;
sn->dirty = 1;
r = toku_serialize_ftnode_to(fd, make_blocknum(20), sn, &ndd, true, brt->ft, false);
assert(r==0);
gettimeofday(&t[1], NULL);
total_start.tv_sec += t[0].tv_sec;
total_start.tv_usec += t[0].tv_usec;
total_end.tv_sec += t[1].tv_sec;
total_end.tv_usec += t[1].tv_usec;
toku_free(ndd);
}
double dt;
dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
printf("serialize leaf: %0.05lf\n", dt);
dt = (total_end.tv_sec - total_start.tv_sec) + ((total_end.tv_usec - total_start.tv_usec) / USECS_PER_SEC);
dt *= 1000;
dt /= ser_runs;
printf("serialize leaf(ms): %0.05lf (average of %d runs)\n", dt, ser_runs);
//reset
total_start.tv_sec = total_start.tv_usec = 0;
total_end.tv_sec = total_end.tv_usec = 0;
struct ftnode_fetch_extra bfe;
for (int i = 0; i < deser_runs; i++) {
fill_bfe_for_full_read(&bfe, brt_h);
gettimeofday(&t[0], NULL);
FTNODE_DISK_DATA ndd2 = NULL;
r = toku_deserialize_ftnode_from(fd, make_blocknum(20), 0/*pass zero for hash*/, &dn, &ndd2, &bfe);
assert(r==0);
gettimeofday(&t[1], NULL);
dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
printf("deserialize leaf: %0.05lf\n", dt);
printf("io time %lf decompress time %lf deserialize time %lf\n",
tokutime_to_seconds(bfe.io_time),
tokutime_to_seconds(bfe.decompress_time),
tokutime_to_seconds(bfe.deserialize_time)
);
total_start.tv_sec += t[0].tv_sec;
total_start.tv_usec += t[0].tv_usec;
total_end.tv_sec += t[1].tv_sec;
total_end.tv_usec += t[1].tv_usec;
toku_ftnode_free(&dn);
toku_free(ndd2);
}
dt = (total_end.tv_sec - total_start.tv_sec) + ((total_end.tv_usec - total_start.tv_usec) / USECS_PER_SEC);
dt *= 1000;
dt /= deser_runs;
printf("deserialize leaf(ms): %0.05lf (average of %d runs)\n", dt, deser_runs);
printf("io time(ms) %lf decompress time(ms) %lf deserialize time(ms) %lf (average of %d runs)\n",
tokutime_to_seconds(bfe.io_time)*1000,
tokutime_to_seconds(bfe.decompress_time)*1000,
tokutime_to_seconds(bfe.deserialize_time)*1000,
deser_runs
);
toku_ftnode_free(&sn);
toku_block_free(brt_h->blocktable, BLOCK_ALLOCATOR_TOTAL_HEADER_RESERVE);
@@ -247,14 +283,12 @@ test_serialize_leaf(int valsize, int nelts, double entropy) {
toku_free(brt_h->h);
toku_free(brt_h);
toku_free(brt);
toku_free(ndd);
toku_free(ndd2);
r = close(fd); assert(r != -1);
}
static void
test_serialize_nonleaf(int valsize, int nelts, double entropy) {
test_serialize_nonleaf(int valsize, int nelts, double entropy, int ser_runs, int deser_runs) {
// struct ft_handle source_ft;
struct ftnode sn, *dn;
@@ -353,7 +387,8 @@ test_serialize_nonleaf(int valsize, int nelts, double entropy) {
gettimeofday(&t[1], NULL);
double dt;
dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
printf("serialize nonleaf: %0.05lf\n", dt);
dt *= 1000;
printf("serialize nonleaf(ms): %0.05lf (IGNORED RUNS=%d)\n", dt, ser_runs);
struct ftnode_fetch_extra bfe;
fill_bfe_for_full_read(&bfe, brt_h);
@@ -363,11 +398,13 @@ test_serialize_nonleaf(int valsize, int nelts, double entropy) {
assert(r==0);
gettimeofday(&t[1], NULL);
dt = (t[1].tv_sec - t[0].tv_sec) + ((t[1].tv_usec - t[0].tv_usec) / USECS_PER_SEC);
printf("deserialize nonleaf: %0.05lf\n", dt);
printf("io time %lf decompress time %lf deserialize time %lf\n",
tokutime_to_seconds(bfe.io_time),
tokutime_to_seconds(bfe.decompress_time),
tokutime_to_seconds(bfe.deserialize_time)
dt *= 1000;
printf("deserialize nonleaf(ms): %0.05lf (IGNORED RUNS=%d)\n", dt, deser_runs);
printf("io time(ms) %lf decompress time(ms) %lf deserialize time(ms) %lf (IGNORED RUNS=%d)\n",
tokutime_to_seconds(bfe.io_time)*1000,
tokutime_to_seconds(bfe.decompress_time)*1000,
tokutime_to_seconds(bfe.deserialize_time)*1000,
deser_runs
);
toku_ftnode_free(&dn);
@@ -394,19 +431,32 @@ test_serialize_nonleaf(int valsize, int nelts, double entropy) {
int
test_main (int argc __attribute__((__unused__)), const char *argv[] __attribute__((__unused__))) {
long valsize, nelts;
const int DEFAULT_RUNS = 5;
long valsize, nelts, ser_runs = DEFAULT_RUNS, deser_runs = DEFAULT_RUNS;
double entropy = 0.3;
if (argc != 3) {
fprintf(stderr, "Usage: %s <valsize> <nelts>\n", argv[0]);
if (argc != 3 && argc != 5) {
fprintf(stderr, "Usage: %s <valsize> <nelts> [<serialize_runs> <deserialize_runs>]\n", argv[0]);
fprintf(stderr, "Default (and min) runs is %d\n", DEFAULT_RUNS);
return 2;
}
valsize = strtol(argv[1], NULL, 0);
nelts = strtol(argv[2], NULL, 0);
if (argc == 5) {
ser_runs = strtol(argv[3], NULL, 0);
deser_runs = strtol(argv[4], NULL, 0);
}
if (ser_runs <= 0) {
ser_runs = DEFAULT_RUNS;
}
if (deser_runs <= 0) {
deser_runs = DEFAULT_RUNS;
}
initialize_dummymsn();
test_serialize_leaf(valsize, nelts, entropy);
test_serialize_nonleaf(valsize, nelts, entropy);
test_serialize_leaf(valsize, nelts, entropy, ser_runs, deser_runs);
test_serialize_nonleaf(valsize, nelts, entropy, ser_runs, deser_runs);
return 0;
}
@@ -119,7 +119,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
MSN msn = next_dummymsn();
......
@@ -96,13 +96,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, const char *key, int keysize, const cha
{
LEAFENTRY r = NULL;
uint32_t size_needed = LE_CLEAN_MEMSIZE(valsize);
void *maybe_free = nullptr;
bn->get_space_for_insert(
idx,
key,
keysize,
size_needed,
&r
&r,
&maybe_free
);
if (maybe_free) {
toku_free(maybe_free);
}
resource_assert(r);
r->type = LE_CLEAN;
r->u.clean.vallen = valsize;
@@ -113,14 +118,19 @@ static void
le_overwrite(bn_data* bn, uint32_t idx, const char *key, int keysize, const char *val, int valsize) {
LEAFENTRY r = NULL;
uint32_t size_needed = LE_CLEAN_MEMSIZE(valsize);
void *maybe_free = nullptr;
bn->get_space_for_overwrite(
idx,
key,
keysize,
size_needed, // old_le_size
size_needed,
&r
&r,
&maybe_free
);
if (maybe_free) {
toku_free(maybe_free);
}
resource_assert(r);
r->type = LE_CLEAN;
r->u.clean.vallen = valsize;
......
@@ -733,7 +733,7 @@ flush_to_leaf(FT_HANDLE t, bool make_leaf_up_to_date, bool use_flush) {
int total_messages = 0;
for (i = 0; i < 8; ++i) {
total_messages += BLB_DATA(child, i)->omt_size();
total_messages += BLB_DATA(child, i)->num_klpairs();
}
assert(total_messages <= num_parent_messages + num_child_messages);
@@ -746,7 +746,7 @@ flush_to_leaf(FT_HANDLE t, bool make_leaf_up_to_date, bool use_flush) {
memset(parent_messages_present, 0, sizeof parent_messages_present);
memset(child_messages_present, 0, sizeof child_messages_present);
for (int j = 0; j < 8; ++j) {
uint32_t len = BLB_DATA(child, j)->omt_size();
uint32_t len = BLB_DATA(child, j)->num_klpairs();
for (uint32_t idx = 0; idx < len; ++idx) {
LEAFENTRY le;
DBT keydbt, valdbt;
@@ -968,7 +968,7 @@ flush_to_leaf_with_keyrange(FT_HANDLE t, bool make_leaf_up_to_date) {
int total_messages = 0;
for (i = 0; i < 8; ++i) {
total_messages += BLB_DATA(child, i)->omt_size();
total_messages += BLB_DATA(child, i)->num_klpairs();
}
assert(total_messages <= num_parent_messages + num_child_messages);
@@ -1144,10 +1144,10 @@ compare_apply_and_flush(FT_HANDLE t, bool make_leaf_up_to_date) {
toku_ftnode_free(&parentnode);
for (int j = 0; j < 8; ++j) {
BN_DATA first = BLB_DATA(child1, j);
BN_DATA second = BLB_DATA(child2, j);
uint32_t len = first->omt_size();
assert(len == second->omt_size());
bn_data* first = BLB_DATA(child1, j);
bn_data* second = BLB_DATA(child2, j);
uint32_t len = first->num_klpairs();
assert(len == second->num_klpairs());
for (uint32_t idx = 0; idx < len; ++idx) {
LEAFENTRY le1, le2;
DBT key1dbt, val1dbt, key2dbt, val2dbt;
......
@@ -348,7 +348,7 @@ doit (int state) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
assert(BLB_DATA(node, 0)->omt_size() == 1);
assert(BLB_DATA(node, 0)->num_klpairs() == 1);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
toku_pin_ftnode_off_client_thread(
......@@ -364,7 +364,7 @@ doit (int state) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 1);
+assert(BLB_DATA(node, 0)->num_klpairs() == 1);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
}
else if (state == ft_flush_aflter_merge || state == flt_flush_before_unpin_remove) {
......@@ -381,7 +381,7 @@ doit (int state) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 2);
+assert(BLB_DATA(node, 0)->num_klpairs() == 2);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
}
else {
......
......@@ -359,7 +359,7 @@ doit (int state) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 2);
+assert(BLB_DATA(node, 0)->num_klpairs() == 2);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
toku_pin_ftnode_off_client_thread(
......@@ -375,7 +375,7 @@ doit (int state) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 2);
+assert(BLB_DATA(node, 0)->num_klpairs() == 2);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
......
......@@ -342,7 +342,7 @@ doit (bool after_split) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 1);
+assert(BLB_DATA(node, 0)->num_klpairs() == 1);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
toku_pin_ftnode_off_client_thread(
......@@ -358,7 +358,7 @@ doit (bool after_split) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 1);
+assert(BLB_DATA(node, 0)->num_klpairs() == 1);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
}
else {
......@@ -375,7 +375,7 @@ doit (bool after_split) {
assert(node->height == 0);
assert(!node->dirty);
assert(node->n_children == 1);
-assert(BLB_DATA(node, 0)->omt_size() == 2);
+assert(BLB_DATA(node, 0)->num_klpairs() == 2);
toku_unpin_ftnode_off_client_thread(c_ft->ft, node);
}
......
......@@ -213,7 +213,7 @@ test_le_offsets (void) {
static void
test_ule_packs_to_nothing (ULE ule) {
LEAFENTRY le;
-int r = le_pack(ule, NULL, 0, NULL, 0, 0, &le);
+int r = le_pack(ule, NULL, 0, NULL, 0, 0, &le, nullptr);
assert(r==0);
assert(le==NULL);
}
......@@ -319,7 +319,7 @@ test_le_pack_committed (void) {
size_t memsize;
LEAFENTRY le;
-int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le);
+int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le, nullptr);
assert(r==0);
assert(le!=NULL);
memsize = le_memsize_from_ule(&ule);
......@@ -329,7 +329,7 @@ test_le_pack_committed (void) {
verify_ule_equal(&ule, &tmp_ule);
LEAFENTRY tmp_le;
size_t tmp_memsize;
-r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le);
+r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le, nullptr);
tmp_memsize = le_memsize_from_ule(&tmp_ule);
assert(r==0);
assert(tmp_memsize == memsize);
......@@ -377,7 +377,7 @@ test_le_pack_uncommitted (uint8_t committed_type, uint8_t prov_type, int num_pla
size_t memsize;
LEAFENTRY le;
-int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le);
+int r = le_pack(&ule, nullptr, 0, nullptr, 0, 0, &le, nullptr);
assert(r==0);
assert(le!=NULL);
memsize = le_memsize_from_ule(&ule);
......@@ -387,7 +387,7 @@ test_le_pack_uncommitted (uint8_t committed_type, uint8_t prov_type, int num_pla
verify_ule_equal(&ule, &tmp_ule);
LEAFENTRY tmp_le;
size_t tmp_memsize;
-r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le);
+r = le_pack(&tmp_ule, nullptr, 0, nullptr, 0, 0, &tmp_le, nullptr);
tmp_memsize = le_memsize_from_ule(&tmp_ule);
assert(r==0);
assert(tmp_memsize == memsize);
......@@ -448,7 +448,7 @@ test_le_apply(ULE ule_initial, FT_MSG msg, ULE ule_expected) {
LEAFENTRY le_expected;
LEAFENTRY le_result;
-r = le_pack(ule_initial, nullptr, 0, nullptr, 0, 0, &le_initial);
+r = le_pack(ule_initial, nullptr, 0, nullptr, 0, 0, &le_initial, nullptr);
CKERR(r);
size_t result_memsize = 0;
......@@ -467,7 +467,7 @@ test_le_apply(ULE ule_initial, FT_MSG msg, ULE ule_expected) {
}
size_t expected_memsize = 0;
-r = le_pack(ule_expected, nullptr, 0, nullptr, 0, 0, &le_expected);
+r = le_pack(ule_expected, nullptr, 0, nullptr, 0, 0, &le_expected, nullptr);
CKERR(r);
if (le_expected) {
expected_memsize = leafentry_memsize(le_expected);
......@@ -749,7 +749,7 @@ test_le_apply_messages(void) {
static bool ule_worth_running_garbage_collection(ULE ule, TXNID oldest_referenced_xid_known) {
LEAFENTRY le;
-int r = le_pack(ule, nullptr, 0, nullptr, 0, 0, &le); CKERR(r);
+int r = le_pack(ule, nullptr, 0, nullptr, 0, 0, &le, nullptr); CKERR(r);
invariant_notnull(le);
bool worth_running = toku_le_worth_running_garbage_collection(le, oldest_referenced_xid_known);
toku_free(le);
......
......@@ -189,7 +189,7 @@ doit (void) {
r = toku_testsetup_root(t, node_root);
assert(r==0);
-char filler[900];
+char filler[900-2*bn_data::HEADER_LENGTH];
memset(filler, 0, sizeof(filler));
// now we insert filler data so that a merge does not happen
r = toku_testsetup_insert_to_leaf (
......
......@@ -119,13 +119,18 @@ le_add_to_bn(bn_data* bn, uint32_t idx, const char *key, int keysize, const cha
{
LEAFENTRY r = NULL;
uint32_t size_needed = LE_CLEAN_MEMSIZE(valsize);
+void *maybe_free = nullptr;
bn->get_space_for_insert(
idx,
key,
keysize,
size_needed,
-&r
+&r,
+&maybe_free
);
+if (maybe_free) {
+toku_free(maybe_free);
+}
resource_assert(r);
r->type = LE_CLEAN;
r->u.clean.vallen = valsize;
......
......@@ -122,7 +122,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
MSN msn = next_dummymsn();
......
......@@ -111,7 +111,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
// apply an insert to the leaf node
MSN msn = next_dummymsn();
......
......@@ -112,7 +112,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
// apply an insert to the leaf node
MSN msn = next_dummymsn();
......
......@@ -111,7 +111,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
// apply an insert to the leaf node
MSN msn = next_dummymsn();
......
......@@ -112,7 +112,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
// apply an insert to the leaf node
MSN msn = next_dummymsn();
......
......@@ -114,7 +114,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
// apply an insert to the leaf node
MSN msn = next_dummymsn();
......
......@@ -111,7 +111,7 @@ append_leaf(FTNODE leafnode, void *key, size_t keylen, void *val, size_t vallen)
DBT theval; toku_fill_dbt(&theval, val, vallen);
// get an index that we can use to create a new leaf entry
-uint32_t idx = BLB_DATA(leafnode, 0)->omt_size();
+uint32_t idx = BLB_DATA(leafnode, 0)->num_klpairs();
// apply an insert to the leaf node
MSN msn = next_dummymsn();
......
......@@ -315,9 +315,9 @@ dump_node (int f, BLOCKNUM blocknum, FT h) {
}
} else {
printf(" n_bytes_in_buffer= %" PRIu64 "", BLB_DATA(n, i)->get_disk_size());
-printf(" items_in_buffer=%u\n", BLB_DATA(n, i)->omt_size());
+printf(" items_in_buffer=%u\n", BLB_DATA(n, i)->num_klpairs());
if (dump_data) {
-BLB_DATA(n, i)->omt_iterate<void, print_le>(NULL);
+BLB_DATA(n, i)->iterate<void, print_le>(NULL);
}
}
}
......
......@@ -149,7 +149,8 @@ le_pack(ULE ule, // data to be packed into new leafentry
void* keyp,
uint32_t keylen,
uint32_t old_le_size,
-LEAFENTRY * const new_leafentry_p // this is what this function creates
+LEAFENTRY * const new_leafentry_p, // this is what this function creates
+void **const maybe_free
);
......
......@@ -242,20 +242,21 @@ static void get_space_for_le(
uint32_t keylen,
uint32_t old_le_size,
size_t size,
-LEAFENTRY* new_le_space
+LEAFENTRY* new_le_space,
+void **const maybe_free
)
{
-if (data_buffer == NULL) {
+if (data_buffer == nullptr) {
CAST_FROM_VOIDP(*new_le_space, toku_xmalloc(size));
}
else {
// this means we are overwriting something
if (old_le_size > 0) {
-data_buffer->get_space_for_overwrite(idx, keyp, keylen, old_le_size, size, new_le_space);
+data_buffer->get_space_for_overwrite(idx, keyp, keylen, old_le_size, size, new_le_space, maybe_free);
}
// this means we are inserting something new
else {
-data_buffer->get_space_for_insert(idx, keyp, keylen, size, new_le_space);
+data_buffer->get_space_for_insert(idx, keyp, keylen, size, new_le_space, maybe_free);
}
}
}
......@@ -470,23 +471,17 @@ toku_le_apply_msg(FT_MSG msg,
int64_t newnumbytes = 0;
uint64_t oldmemsize = 0;
uint32_t keylen = ft_msg_get_keylen(msg);
-LEAFENTRY copied_old_le = NULL;
-size_t old_le_size = old_leafentry ? leafentry_memsize(old_leafentry) : 0;
-toku::scoped_malloc copied_old_le_buf(old_le_size);
-if (old_leafentry) {
-CAST_FROM_VOIDP(copied_old_le, copied_old_le_buf.get());
-memcpy(copied_old_le, old_leafentry, old_le_size);
-}
if (old_leafentry == NULL) {
msg_init_empty_ule(&ule);
} else {
oldmemsize = leafentry_memsize(old_leafentry);
-le_unpack(&ule, copied_old_le); // otherwise unpack leafentry
+le_unpack(&ule, old_leafentry); // otherwise unpack leafentry
oldnumbytes = ule_get_innermost_numbytes(&ule, keylen);
}
msg_modify_ule(&ule, msg); // modify unpacked leafentry
ule_simple_garbage_collection(&ule, oldest_referenced_xid, gc_info);
+void *maybe_free = nullptr;
int rval = le_pack(
&ule, // create packed leafentry
data_buffer,
......@@ -494,7 +489,8 @@ toku_le_apply_msg(FT_MSG msg,
ft_msg_get_key(msg), // contract of this function is caller has this set, always
keylen, // contract of this function is caller has this set, always
oldmemsize,
-new_leafentry_p
+new_leafentry_p,
+&maybe_free
);
invariant_zero(rval);
if (*new_leafentry_p) {
......@@ -502,6 +498,9 @@ toku_le_apply_msg(FT_MSG msg,
}
*numbytes_delta_p = newnumbytes - oldnumbytes;
ule_cleanup(&ule);
+if (maybe_free) {
+toku_free(maybe_free);
+}
}
bool toku_le_worth_running_garbage_collection(LEAFENTRY le, TXNID oldest_referenced_xid_known) {
......@@ -557,15 +556,8 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
ULE_S ule;
int64_t oldnumbytes = 0;
int64_t newnumbytes = 0;
-LEAFENTRY copied_old_le = NULL;
-size_t old_le_size = old_leaf_entry ? leafentry_memsize(old_leaf_entry) : 0;
-toku::scoped_malloc copied_old_le_buf(old_le_size);
-if (old_leaf_entry) {
-CAST_FROM_VOIDP(copied_old_le, copied_old_le_buf.get());
-memcpy(copied_old_le, old_leaf_entry, old_le_size);
-}
-le_unpack(&ule, copied_old_le);
+le_unpack(&ule, old_leaf_entry);
oldnumbytes = ule_get_innermost_numbytes(&ule, keylen);
uint32_t old_mem_size = leafentry_memsize(old_leaf_entry);
......@@ -580,6 +572,7 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
ule_try_promote_provisional_outermost(&ule, oldest_possible_live_xid);
ule_garbage_collect(&ule, snapshot_xids, referenced_xids, live_root_txns);
+void *maybe_free = nullptr;
int r = le_pack(
&ule,
data_buffer,
......@@ -587,7 +580,8 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
keyp,
keylen,
old_mem_size,
-new_leaf_entry
+new_leaf_entry,
+&maybe_free
);
assert(r == 0);
if (*new_leaf_entry) {
......@@ -595,6 +589,9 @@ toku_le_garbage_collect(LEAFENTRY old_leaf_entry,
}
*numbytes_delta_p = newnumbytes - oldnumbytes;
ule_cleanup(&ule);
+if (maybe_free) {
+toku_free(maybe_free);
+}
}
/////////////////////////////////////////////////////////////////////////////////
......@@ -901,7 +898,8 @@ le_pack(ULE ule, // data to be packed into new leafentry
void* keyp,
uint32_t keylen,
uint32_t old_le_size,
-LEAFENTRY * const new_leafentry_p // this is what this function creates
+LEAFENTRY * const new_leafentry_p, // this is what this function creates
+void **const maybe_free
)
{
invariant(ule->num_cuxrs > 0);
......@@ -927,10 +925,10 @@ le_pack(ULE ule, // data to be packed into new leafentry
rval = 0;
goto cleanup;
}
-found_insert:;
+found_insert:
memsize = le_memsize_from_ule(ule);
LEAFENTRY new_leafentry;
-get_space_for_le(data_buffer, idx, keyp, keylen, old_le_size, memsize, &new_leafentry);
+get_space_for_le(data_buffer, idx, keyp, keylen, old_le_size, memsize, &new_leafentry, maybe_free);
//p always points to first unused byte after leafentry we are packing
uint8_t *p;
......@@ -2393,12 +2391,14 @@ toku_le_upgrade_13_14(LEAFENTRY_13 old_leafentry,
// malloc instead of a mempool. However after supporting upgrade,
// we need to use mempools and the OMT.
rval = le_pack(&ule, // create packed leafentry
-NULL,
+nullptr,
0, //only matters if we are passing in a bn_data
-NULL, //only matters if we are passing in a bn_data
+nullptr, //only matters if we are passing in a bn_data
0, //only matters if we are passing in a bn_data
0, //only matters if we are passing in a bn_data
-new_leafentry_p);
+new_leafentry_p,
+nullptr //only matters if we are passing in a bn_data
+);
ule_cleanup(&ule);
*new_leafentry_memorysize = leafentry_memsize(*new_leafentry_p);
return rval;
......
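Across these hunks, several functions gain a `void **maybe_free` out-parameter: `le_pack`, `get_space_for_le`, and the `bn_data` calls `get_space_for_insert` and `get_space_for_overwrite`. Every caller frees that pointer only after it has finished its work. At the same time, `toku_le_apply_msg` and `toku_le_garbage_collect` stop copying the old leafentry into a `toku::scoped_malloc` buffer.

One plausible reading, not confirmed by this diff, is as follows. Growing the basement node's storage can move its contents. Deferring the free of the old buffer keeps `old_leafentry` readable while `le_pack` runs, which makes the defensive copy unnecessary.

The sketch below illustrates that deferred-free pattern. It uses a hypothetical `toy_mempool` and a hypothetical `copy_entry_and_grow`; neither is part of the ft-index API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified stand-in for a basement node's storage.
// Not the real bn_data / mempool API.
struct toy_mempool {
    char *base = nullptr;
    std::size_t used = 0;
    std::size_t size = 0;

    toy_mempool() = default;
    toy_mempool(const toy_mempool &) = delete;
    toy_mempool &operator=(const toy_mempool &) = delete;
    ~toy_mempool() { std::free(base); }

    // Reserve n bytes at the end of the pool. If the pool has to grow, the old
    // buffer is NOT freed here: it is handed back through *maybe_free so that
    // pointers into it stay valid until the caller is done with them.
    void *get_space(std::size_t n, void **maybe_free) {
        *maybe_free = nullptr;
        if (used + n > size) {
            std::size_t new_size = (used + n) * 2;
            char *new_base = static_cast<char *>(std::malloc(new_size));
            assert(new_base != nullptr);
            if (used > 0) {
                std::memcpy(new_base, base, used);
            }
            *maybe_free = base;  // defer freeing the old buffer to the caller
            base = new_base;
            size = new_size;
        }
        void *p = base + used;
        used += n;
        return p;
    }
};

// Caller pattern: `old` may point into the pool itself. Reading it after
// get_space() is safe because the old buffer has not been freed yet.
std::size_t copy_entry_and_grow(toy_mempool *mp, const char *old, std::size_t len) {
    void *maybe_free = nullptr;
    char *dst = static_cast<char *>(mp->get_space(len, &maybe_free));
    std::memcpy(dst, old, len);
    if (maybe_free) {
        std::free(maybe_free);  // only now is it safe to release the old buffer
    }
    return mp->used;
}
```

The design choice being illustrated is that the allocator never frees memory a caller might still be reading. Ownership of the stale buffer passes to the caller, who releases it once no pointers into it remain. This mirrors the `if (maybe_free) { toku_free(maybe_free); }` blocks in this diff.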
......@@ -187,6 +187,13 @@ static inline void wbuf_uint (struct wbuf *w, uint32_t i) {
wbuf_int(w, (int32_t)i);
}
+static inline uint8_t* wbuf_nocrc_reserve_literal_bytes(struct wbuf *w, uint32_t nbytes) {
+assert(w->ndone + nbytes <= w->size);
+uint8_t * dest = w->buf + w->ndone;
+w->ndone += nbytes;
+return dest;
+}
static inline void wbuf_nocrc_literal_bytes(struct wbuf *w, bytevec bytes_bv, uint32_t nbytes) {
const unsigned char *bytes = (const unsigned char *) bytes_bv;
#if 0
......
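`wbuf_nocrc_reserve_literal_bytes`, added to the `wbuf` helpers here, advances `ndone` by `nbytes` and returns a pointer to the skipped region. This presumably lets a caller serialize directly into the write buffer rather than building the bytes elsewhere and copying them in. The `nocrc` prefix suggests no checksum is updated for those bytes.

The sketch below repeats the helper's logic on a hypothetical `toy_wbuf`, which keeps only the three fields the helper touches (`buf`, `size`, `ndone`). `toy_write_pair` is likewise an invented example caller, not ft-index code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical minimal write buffer with only the fields the helper uses;
// the real struct wbuf carries more state.
struct toy_wbuf {
    uint8_t *buf;
    uint32_t size;
    uint32_t ndone;
};

// Same logic as the added helper: hand out a pointer to the next nbytes of
// the buffer and advance ndone, without copying anything.
static inline uint8_t *toy_reserve_literal_bytes(toy_wbuf *w, uint32_t nbytes) {
    assert(w->ndone + nbytes <= w->size);
    uint8_t *dest = w->buf + w->ndone;
    w->ndone += nbytes;
    return dest;
}

// Example caller: reserve space once, then fill it in place.
static void toy_write_pair(toy_wbuf *w, const char *a, uint32_t alen,
                           const char *b, uint32_t blen) {
    uint8_t *dest = toy_reserve_literal_bytes(w, alen + blen);
    std::memcpy(dest, a, alen);         // first field
    std::memcpy(dest + alen, b, blen);  // second field, directly after it
}
```

Because the helper only bumps `ndone`, the reserved region holds whatever the caller writes into it afterwards. The `assert` guards against reserving past `size`.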