Commit 6413dc84 authored by Tim Peters's avatar Tim Peters

For speed, and to squash delicate code duplication, introduced a

micro-optimized BUCKET_SEARCH macro.  Change _bucket_get() to use
it (more later).
parent 2bdd3d57
......@@ -12,7 +12,41 @@
****************************************************************************/
#define BUCKETTEMPLATE_C "$Id: BucketTemplate.c,v 1.35 2002/06/08 16:16:32 tim_one Exp $\n"
#define BUCKETTEMPLATE_C "$Id: BucketTemplate.c,v 1.36 2002/06/08 19:46:18 tim_one Exp $\n"
/* Use BUCKET_SEARCH to find the index at which a key belongs.
* INDEX An int lvalue to hold the index i such that KEY belongs at
* SELF->keys[i]. Note that this will equal SELF->len if KEY
* is larger than the bucket's largest key. Else it's the
* smallest i such that SELF->keys[i] >= KEY.
* ABSENT An int lvalue to hold a Boolean result, true (!= 0) if the
* key is absent, false (== 0) if the key is at INDEX.
* SELF A pointer to a Bucket node.
* KEY The key you're looking for, of type KEY_TYPE.
* ONERROR What to do if key comparison raises an exception; for example,
* perhaps 'return NULL'.
*
* See Maintainer.txt for discussion: this is optimized in subtle ways.
* It's recommended that you call this at the start of a routine, waiting
* to check for self->len == 0 after (if an empty bucket is special in
* context; INDEX becomes 0 and ABSENT becomes true if this macro is run
* with an empty SELF, and that may be all the invoker needs to know).
*/
#define BUCKET_SEARCH(INDEX, ABSENT, SELF, KEY, ONERROR) { \
int _lo = 0; \
int _hi = (SELF)->len; \
int _i; \
int _cmp = 1; \
for (_i = _hi >> 1; _lo < _hi; _i = (_lo + _hi) >> 1) { \
TEST_KEY_SET_OR(_cmp, (SELF)->keys[_i], (KEY)) \
ONERROR; \
if (_cmp < 0) _lo = _i + 1; \
else if (_cmp == 0) break; \
else _hi = _i; \
} \
(INDEX) = _i; \
(ABSENT) = _cmp; \
}
/*
** _bucket_get
......@@ -41,45 +75,31 @@
static PyObject *
_bucket_get(Bucket *self, PyObject *keyarg, int has_key)
{
int min, max, i, l, cmp, copied=1;
PyObject *r;
int i, cmp;
KEY_TYPE key;
PyObject *r = NULL;
int copied = 1;
COPY_KEY_FROM_ARG(key, keyarg, copied);
UNLESS (copied) return NULL;
PER_USE_OR_RETURN(self, NULL);
for (min=0, max=self->len, i=max/2, l=max; i != l; l=i, i=(min+max)/2)
{
TEST_KEY_SET_OR(cmp, self->keys[i], key) goto err;
if (PyErr_Occurred()) goto err;
if (cmp < 0) min=i;
else if (cmp == 0)
{
if (has_key) r=PyInt_FromLong(has_key);
else
{
BUCKET_SEARCH(i, cmp, self, key, goto Done);
if (has_key)
r = PyInt_FromLong(cmp ? 0 : has_key);
else {
if (cmp == 0) {
COPY_VALUE_TO_OBJECT(r, self->values[i]);
}
PER_ALLOW_DEACTIVATION(self);
PER_ACCESSED(self);
return r;
}
else max=i;
}
PER_ALLOW_DEACTIVATION(self);
PER_ACCESSED(self);
if (has_key) return PyInt_FromLong(0);
else
PyErr_SetObject(PyExc_KeyError, keyarg);
return NULL;
}
err:
Done:
PER_ALLOW_DEACTIVATION(self);
PER_ACCESSED(self);
return NULL;
return r;
}
static PyObject *
......
......@@ -266,3 +266,90 @@ Optimization points:
immediate test-and-branch, and for a very unlikely case (BTree
nodes are rarely empty). It's better to get into the loop right
away so the normal case makes progress ASAP.
The BUCKET_SEARCH Macro
=======================
This has a different job than BTREE_SEARCH: the key 0 slot is
legitimate in a bucket, and we want to find the index at which the
key belongs. If the key is larger than the bucket's largest key, a
new slot at index len is where it belongs, else it belongs at the
smallest i with keys[i] >= the key we're looking for. We also need
to know whether or not the key is present (BTREE_SEARCH didn't care;
it only wanted to find the next node to search).
The mechanics of the search are quite similar, though. The primary
loop invariant changes to (say we're searching for key k):
K(lo-1) < k < K(hi)
where K(i) means keys[i], and we pretend K(-1) is minus infinity and
K(len) is plus infinity.
If the bucket is empty, lo=hi=i=0 at the start, the loop body is never
entered, and the macro sets INDEX to 0 and ABSENT to true. That's why
_cmp is initialized to 1 (_cmp becomes ABSENT).
Else the bucket is not empty, lo<hi at the start, and the loop body
is entered. The invariant is obviously satisfied then, as lo=0 and
hi=len.
If K[i]<k, lo is set to i+1, preserving that K(lo-1) = K[i] < k.
If K[i]>k, hi is set to i, preserving that K[hi] = K[i] > k.
If the loop exits after either of those, _cmp != 0, so ABSENT becomes
true.
If K[i]=k, the loop breaks, so that INDEX becomes i, and ABSENT
becomes false (_cmp=0 in this case).
The same case analysis for BTREE_SEARCH on lo and hi holds here:
a. (lo == i == hi) if and only if (lo == hi).
b. (lo == i < hi) if and only if (lo+1 == hi).
c. (lo < i < hi) if and only if (lo+1 < hi).
So long as lo+1 < hi, we're in case #c, and either break with
equality (in which case the right results are obviously computed) or
narrow the range. If equality doesn't obtain, the range eventually
narrows to cases #a or #b.
To go from #c to #a, we must have lo+2==hi at the start, and
K[i]=K[lo+1]<k. Then the new lo gets set to i+1 = lo+2 = hi, and the
loop exits with lo=hi=i and _cmp<0. This is correct, because we
know that k != K(i) (loop invariant! we actually know something
stronger, that k < K(hi); since i=hi, this implies k != K(i)).
Else #c eventually falls into case #b, lo+1==hi and i==lo. The
invariant tells us K(lo-1) < k < K(hi) = K(lo+1), so if the key
is present it must be at K(lo). i==lo in this case, so we test
K(lo) against k. As always, if equality obtains we do the right
thing, else case #b becomes case #a.
When #b becomes #a, the last comparison was non-equal, so _cmp is
non-zero, and the loop exits because lo==hi==i in case #a. The
invariant then tells us K(lo-1) < k < K(lo), so the key is in fact
not present, it's correct to exit with _cmp non-zero, and i==lo is
again the index at which k belongs.
Optimization points:
+ As for BTREE_SEARCH, shifting of signed ints is cheaper than
division.
+ Unlike as for BTREE_SEARCH, there's nothing special about searching
an empty bucket, and the macro computes thoroughly sensible results
in that case.
+ The order of _cmp comparisons differs from BTREE_SEARCH. When
searching a bucket, it's much more likely (than when searching a
BTree node) that the key is present, so testing __cmp==0 isn't a
systematic waste of cycles. At the extreme, if all searches are
successful (key present), on average this saves one comparison per
search, against leaving the determination of _cmp==0 implicit (as
BTREE_SEARCH does). But even on successful searches, __cmp != 0 is
a more popular outcome than __cmp == 0 across iterations (unless
the bucket has only a few keys), so it's important to check one
of the inequality cases first. It turns out it's better on average
to check K(i) < key (than to check K(i) > key), because when it
pays it narrows the range more (we get a little boost from setting
lo=i+1 in this case; the other case sets hi=i, which isn't as much
of a narrowing).
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment