Commit de53a4f5 authored by Tim Peters's avatar Tim Peters

Updated the BTree intro to include operations new in ZODB 3.3 (play nice

with Python's newer iteration and __contains__ protocols; optional
Boolean args to exclude endpoints on range searches).
parent 75795c57
......@@ -56,28 +56,31 @@ touching every single key).
The BTrees package provides a large collection of related data
structures. There are variants of the data structures specialized to
handle integer values, which are faster and use less memory. There
integers, which are faster and use less memory. There
are four modules that handle the different variants. The first two
letters of the module name specify the types of the keys and values in
mappings -- O for any object and I for integer. For example, the
\module{BTrees.IOBTree} module provides a mapping with integer
keys and arbitrary objects as values.
The four data structures provide by each module are a btree, a bucket,
a tree set, and a set. The btree and bucket types are mappings and
The four data structures provide by each module are a BTree, a Bucket,
a TreeSet, and a Set. The BTree and Bucket types are mappings and
support all the usual mapping methods, e.g. \function{update()} and
\function{keys()}. The tree set and set types are similar to mappings
\function{keys()}. The TreeSet and Set types are similar to mappings
but they have no values; they support the methods that make sense for
a mapping with no keys, e.g. \function{keys()} but not
\function{items()}. The bucket and set types are the individual
building blocks for btrees and tree sets, respectively. A bucket or
set can be used when you are sure that it will have few elements. If
the data structure will grow large, you should use a btree or tree
set. Like Python lists, buckets and sets are allocated in one
\function{items()}. The Bucket and Set types are the individual
building blocks for BTrees and TreeSets, respectively. A Bucket or
Set can be used when you are sure that it will have few elements. If
the data structure will grow large, you should use a BTree or TreeSet.
Like Python lists, Buckets and Sets are allocated in one
contiguous piece, and insertions and deletions can take time
proportional to the number of existing elements. Btrees and tree sets
are multi-level tree structures with much better (logarithmic) worst-case
time bounds.
proportional to the number of existing elements. Also like Python lists,
a Bucket or Set is a single object, and is pickled and unpickled in its
entirety. BTrees and TreeSets are multi-level tree structures with
much better (logarithmic) worst-case time bounds, and the tree structure
is built out of multiple objects, which ZODB can load individually
as needed.
The four modules are named \module{OOBTree}, \module{IOBTree},
\module{OIBTree}, and \module{IIBTree}. The two letter prefixes are
......@@ -87,7 +90,7 @@ defines the following types: \class{OOBTree}, \class{OOBucket},
each define their own variants of those four types.
The \function{keys()}, \function{values()}, and \function{items()}
methods on btree and tree set types do not materialize a list with all
methods on BTree and TreeSet types do not materialize a list with all
of the data. Instead, they return lazy sequences that fetch data
from the BTree as needed. They also support optional arguments to
specify the minimum and maximum values to return, often called "range
......@@ -95,24 +98,29 @@ searching". Because all these types are stored in sorted order, range
searching is very efficient.
The \function{keys()}, \function{values()}, and \function{items()}
methods on bucket and set types do return lists with all the data.
Starting in ZODB4, there are also \function{iterkeys()},
methods on Bucket and Set types do return lists with all the data.
Starting in ZODB 3.3, there are also \function{iterkeys()},
\function{itervalues()}, and \function{iteritems()} methods that
return iterators (in the Python 2.2 sense).
return iterators (in the Python 2.2 sense). Those methods also apply to
BTree and TreeSet objects.
A BTree object supports all the methods you would expect of a mapping
A BTree object supports all the methods you would expect of a mapping,
with a few extensions that exploit the fact that the keys are sorted.
The example below demonstrates how some of the methods work. The
extra methods are \function{minKey()} and \function{maxKey()}, which
find the minimum and maximum key value subject to an optional bound
argument, and \function{byValue()}, which should probably be ignored
(it's hard to explain exactly what it does, and as a result it's
almost never used -- best to consider it deprecated).
almost never used -- best to consider it deprecated). The various
methods for enumerating keys, values and items also accept minimum
and maximum key arguments ("range search"), and (new in ZODB 3.3)
optional Boolean arguments to control whether a range search is
inclusive or exclusive of the range's endpoints.
\begin{verbatim}
>>> from BTrees.OOBTree import OOBTree
>>> t = OOBTree()
>>> t.update({ 1: "red", 2: "green", 3: "blue", 4: "spades" })
>>> t.update({1: "red", 2: "green", 3: "blue", 4: "spades"})
>>> len(t)
4
>>> t[2]
......@@ -128,14 +136,38 @@ almost never used -- best to consider it deprecated).
[1, 2, 3, 4]
>>> list(t.values())
['red', 'green', 'blue', 'spades']
>>> list(t.values(1, 2))
>>> list(t.values(1, 2)) # values at keys in 1 to 2 inclusive
['red', 'green']
>>> list(t.values(2))
>>> list(t.values(2)) # values at keys >= 2
['green', 'blue', 'spades']
>>> list(t.values(min=1, max=4)) # keyword args new in ZODB 3.3
['red', 'green', 'blue', 'spades']
>>> list(t.values(min=1, max=4, excludemin=True, excludemax=True))
['green', 'blue']
>>> t.minKey() # smallest key
1
>>> t.minKey(1.5) # smallest key >= 1.5
2
>>> for k in t.keys():
... print k,
1 2 3 4
>>> for k in t: # new in ZODB 3.3
... print k,
1 2 3 4
>>> for pair in t.iteritems(): # new in ZODB 3.3
... print pair,
...
(1, 'red') (2, 'green') (3, 'blue') (4, 'spades')
>>> t.has_key(4) # returns a true value, but exactly what undefined
2
>>> t.has_key(5)
0
>>> 4 in t # new in ZODB 3.3
True
>>> 5 in t # new in ZODB 3.3
False
>>>
\end{verbatim}
% XXX I'm not sure all of the following is actually correct. The
......@@ -143,19 +175,32 @@ almost never used -- best to consider it deprecated).
Each of the modules also defines some functions that operate on
BTrees -- \function{difference()}, \function{union()}, and
\function{intersection()}. The \function{difference()} function returns
a bucket, while the other two methods return a set.
a Bucket, while the other two methods return a Set.
If the keys are integers, then the module also defines
\function{multiunion()}. If the values are integers, then the module
also defines \function{weightedIntersection()} and
\function{weightedUnion()}. The function doc strings describe each
function briefly.
\code{BTrees/Interfaces.py} defines the operations, and is the official
documentation. Note that the interfaces don't define the concrete types
returned by most operations, and you shouldn't rely on the concrete types
that happen to be returned: stick to operations guaranteed by the
interface. In particular, note that the interfaces don't specify anything
about comparison behavior, and so nothing about it is guaranteed. In ZODB
3.3, for example, two BTrees happen to use Python's default object
comparison, which amounts to comparing the (arbitrary but fixed) memory
addresses of the BTrees. This may or may not be true in future releases.
If the interfaces don't specify a behavior, then whether that behavior
appears to work, and exactly happens if it does appear to work, are
undefined and should not be relied on.
\subsubsection{Total Ordering and Persistence}
The BTree-based data structures differ from Python dicts in several
fundamental ways. One of the most important is that while dicts
require that keys support hash codes and equality comparison,
the btree-based structures don't use hash codes and require a total
the BTree-based structures don't use hash codes and require a total
ordering on keys.
Total ordering means three things:
......@@ -178,11 +223,11 @@ numbers are an example of an object whose default comparison function
does not satisfy these rules: complex numbers only support \code{==}
and \code{!=} comparisons, and raise an exception if you try to compare
them in any other way. They don't satisfy the trichotomy rule, and must
not be used as keys in btree-based data structures (although note that
not be used as keys in BTree-based data structures (although note that
complex numbers can be used as keys in Python dicts, which do not require
a total ordering).
Examples of objects that are wholly safe to use as keys in btree-based
Examples of objects that are wholly safe to use as keys in BTree-based
structures include ints, longs, floats, 8-bit strings, Unicode strings,
and tuples composed (possibly recursively) of objects of wholly safe
types.
......@@ -194,11 +239,11 @@ the two loses trichotomy; e.g., \code{'x' < chr(255)} and
\code{u'x' == 'x'}, but trying to compare \code{chr(255)} to
\code{u'x'} raises an exception. Partly for this reason (another is
given later), it can be dangerous to use keys with multiple types in
a single btree-based structure. Don't try to do that, and you don't
a single BTree-based structure. Don't try to do that, and you don't
have to worry about it.
Another potential problem is mutability: when a key is inserted in a
btree-based structure, it must retain the same order relative to the
BTree-based structure, it must retain the same order relative to the
other keys over time. This is easy to run afoul of if you use mutable
objects as keys. For example, lists supply a total ordering, and then
......@@ -228,7 +273,7 @@ Python does several kinds of comparison by comparing the memory
addresses of two objects. Because Python never moves an object in memory,
this does supply a usable (albeit arbitrary) total ordering across the
life of a program run (an object's memory address doesn't change). But
if objects compared in this way are used as keys of a btree-based
if objects compared in this way are used as keys of a BTree-based
structure that's stored in a database, when the objects are loaded from
the database again they will almost certainly wind up at different
memory addresses. There's no guarantee then that if key K1 had a memory
......@@ -244,8 +289,8 @@ nothing to worry about here if you use keys of those types. The most
common mistake is to use keys that are instances of a user-defined class
that doesn't supply its own \method{__cmp__()} method. Python compares
such instances by memory address. This is fine if such instances are
used as keys in temporary btree-based structures used only in a single
program run. It can be disastrous if that btree-based structure is
used as keys in temporary BTree-based structures used only in a single
program run. It can be disastrous if that BTree-based structure is
stored to a database, though.
\begin{verbatim}
......@@ -270,7 +315,7 @@ Another problem occurs with instances of classes that do define
\method{__cmp__()}, but define it incorrectly. It's possible but
rare for a custom \method{__cmp__()} implementation to violate one
of the three required formal properties directly. It's more common for
it to fall back" to address-based comparison by mistake.
it to "fall back" to address-based comparison by mistake.
For example,
\begin{verbatim}
......@@ -284,7 +329,7 @@ class Mine:
It's quite possible there that the \keyword{else} clause allows
a result to be computed based on memory address. The bug won't show
up until a btree-based structure uses objects of class \class{Mine} as
up until a BTree-based structure uses objects of class \class{Mine} as
keys, and also objects of other types as keys, and the structure is
loaded from a database, and a sequence of comparisons happens to execute
the \keyword{else} clause in a case where the relative order of object
......@@ -298,9 +343,9 @@ only with great care:
\begin{enumerate}
\item Use objects of simple immutable types as keys in
btree-based data structures.
BTree-based data structures.
\item Within a single btree-based data structure, use objects of
\item Within a single BTree-based data structure, use objects of
a single type as keys. Don't use multiple key types in a
single structure.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment