Commit 05729335 authored by Jim Fulton's avatar Jim Fulton

Created a tag for the ZODB 3,3 second alpha release. We will try to

sync the Zope 2 trunk with that so we can make progress on Zope 2
without waiting for ZODB 3.3 mounting/multi-database support to be
finished.

This tag was created from trunk revision 3211:

svn cp -r3211 svn+ssh://svn.zope.org/repos/main/ZODB/trunk \
              svn+ssh://svn.zope.org/repos/main/ZODB/tags/3.3.0a2
parent c394cc1a
##############################################################################
#
# Copyright (c) 2002, 2003 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
# Rules to convert the documentation to a single PDF file.
#
# PostScript, HTML, and plain text output are also supported, though
# PDF is the default.
#
# See the README.txt file for information on the mkhowto program used
# to generate the formatted versions of the documentation.
.PHONY: default all html pdf ps text
default: pdf
all: html pdf ps text
html: zconfig/zconfig.html
pdf: zconfig.pdf
ps: zconfig.ps
text: zconfig.txt
zconfig/zconfig.html: zconfig.tex schema.dtd xmlmarkup.perl
mkhowto --html $<
zconfig.pdf: zconfig.tex schema.dtd xmlmarkup.sty
mkhowto --pdf $<
zconfig.ps: zconfig.tex schema.dtd xmlmarkup.sty
mkhowto --postscript $<
zconfig.txt: zconfig.tex schema.dtd xmlmarkup.sty
mkhowto --text $<
clean:
rm -f zconfig.l2h zconfig.l2h~
clobber: clean
rm -f zconfig.pdf zconfig.ps zconfig.txt
rm -rf zconfig
The zconfig.tex document in this directory contains the reference
documentation for the ZConfig package. This documentation is written
using the Python LaTeX styles.
To format the documentation, get a copy of the Python documentation
tools (the Doc/ directory from the Python sources), and create a
symlink to the tools/mkhowto script from some convenient bin/
directory. You will need to have a fairly complete set of
documentation tools installed on your platform; see
http://www.python.org/doc/current/doc/doc.html
for more information on the tools.
<!--
*************************************************************************
Copyright (c) 2002, 2003 Zope Corporation and Contributors.
All Rights Reserved.
This software is subject to the provisions of the Zope Public License,
Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
FOR A PARTICULAR PURPOSE.
*************************************************************************
-->
<!-- DTD for ZConfig schema documents. -->
<!ELEMENT schema (description?, metadefault?, example?,
import*,
(sectiontype | abstracttype)*,
(section | key | multisection | multikey)*)>
<!ATTLIST schema
extends NMTOKEN #IMPLIED
prefix NMTOKEN #IMPLIED
handler NMTOKEN #IMPLIED
keytype NMTOKEN #IMPLIED
datatype NMTOKEN #IMPLIED>
<!ELEMENT component (description?, (sectiontype | abstracttype)*)>
<!ATTLIST component
prefix NMTOKEN #IMPLIED>
<!ELEMENT extension (description?, (sectiontype | abstracttype)*)>
<!ATTLIST extension
prefix NMTOKEN #IMPLIED>
<!ELEMENT import EMPTY>
<!ATTLIST import
file CDATA #IMPLIED
package NMTOKEN #IMPLIED
src CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)*>
<!ATTLIST description
format NMTOKEN #IMPLIED>
<!ELEMENT metadefault (#PCDATA)*>
<!ELEMENT example (#PCDATA)*>
<!ELEMENT sectiontype (description?,
(section | key | multisection | multikey)*)>
<!ATTLIST sectiontype
name NMTOKEN #REQUIRED
prefix NMTOKEN #IMPLIED
keytype NMTOKEN #IMPLIED
datatype NMTOKEN #IMPLIED
implements NMTOKEN #IMPLIED
extends NMTOKEN #IMPLIED>
<!ELEMENT abstracttype (description?)>
<!ATTLIST abstracttype
name NMTOKEN #REQUIRED
prefix NMTOKEN #IMPLIED>
<!ELEMENT key (description?, metadefault?, example?)>
<!ATTLIST key
name CDATA #REQUIRED
attribute NMTOKEN #IMPLIED
datatype NMTOKEN #IMPLIED
handler NMTOKEN #IMPLIED
required (yes|no) "no"
default CDATA #IMPLIED>
<!ELEMENT multikey (description?, metadefault?, example?, default*)>
<!ATTLIST multikey
name CDATA #REQUIRED
attribute NMTOKEN #IMPLIED
datatype NMTOKEN #IMPLIED
handler NMTOKEN #IMPLIED
required (yes|no) "no">
<!ELEMENT section (description?)>
<!ATTLIST section
name CDATA #REQUIRED
attribute NMTOKEN #IMPLIED
type NMTOKEN #REQUIRED
handler NMTOKEN #IMPLIED
required (yes|no) "no">
<!ELEMENT multisection (description?)>
<!ATTLIST multisection
name CDATA #REQUIRED
attribute NMTOKEN #IMPLIED
type NMTOKEN #REQUIRED
handler NMTOKEN #IMPLIED
required (yes|no) "no">
##############################################################################
#
# Copyright (c) 2003 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
# LaTeX2HTML support for the xmlmarkup package. Doesn't do indexing.
package main;
sub do_cmd_element{
local($_) = @_;
my $name = next_argument();
return "<tt class='element'>$name</tt>" . $_;
}
sub do_cmd_attribute{
local($_) = @_;
my $name = next_argument();
return "<tt class='attribute'>$name</tt>" . $_;
}
sub do_env_attributedesc{
local($_) = @_;
my $name = next_argument();
my $valuetype = next_argument();
return ("\n<dl class='macrodesc'>"
. "\n<dt><b><tt class='macro'>$name</tt></b>"
. "&nbsp;&nbsp;&nbsp;($valuetype)"
. "\n<dd>"
. $_
. "</dl>");
}
sub do_env_elementdesc{
local($_) = @_;
my $name = next_argument();
my $contentmodel = next_argument();
return ("\n<dl class='elementdesc'>"
. "\n<dt class='start-tag'><tt>&lt;"
. "<b class='element'>$name</b>&gt;</tt>"
. "\n<dd class='content-model'>$contentmodel"
. "\n<dt class='endtag'><tt>&lt;/"
. "<b class='element'>$name</b>&gt;</tt>"
. "\n<dd class='descrition'>"
. $_
. "</dl>");
}
1; # Must end with this, because Perl is bogus.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Copyright (c) 2003 Zope Corporation and Contributors.
% All Rights Reserved.
%
% This software is subject to the provisions of the Zope Public License,
% Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
% THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
% WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
% WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
% FOR A PARTICULAR PURPOSE.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Define some simple markup for the LaTeX command documentation:
\ProvidesPackage{xmlmarkup}
\RequirePackage{python} % fulllineitems environment
\newcommand{\element}[1]{\code{#1}}
\newcommand{\attribute}[1]{\code{#1}}
% \begin{elementdesc}{type}{content-model}
\newenvironment{elementdesc}[2]{
\begin{fulllineitems}
\item[\code{\textless{\bfseries #1}\textgreater}]
\code{#2}
\item[\code{\textless/{\bfseries #1}\textgreater}]
\index{#1 element@\py@idxcode{#1} element}
\index{elements!#1@\py@idxcode{#1}}
}{\end{fulllineitems}}
% \begin{attributedesc}{name}{content-type}
\newenvironment{attributedesc}[2]{
\begin{fulllineitems}
\item[\code{\bfseries#1}{\quad(#2)}]
\index{#1@\py@idxcode{#1}}
}{\end{fulllineitems}}
This diff is collapsed.
ZEO Documentation
=================
This directory contains ZEO documentation.
howto.txt
The first place to look.
It provides a high-level overview of ZEO features and details on
how to install and configure clients and servers. Includes a
configuration reference.
cache.txt
Explains how the client cache works.
trace.txt
Describe cache trace tool used to determine ideal cache size.
ZopeREADME.txt
A somewhat dated description of how to integrate ZEO with Zope.
It provides a few hints that are not yet in howto.txt.
Zope Enterprise Objects (ZEO)
Installation
ZEO 2.0 requires Zope 2.4 or higher and Python 2.1 or higher.
If you use Python 2.1, we recommend the latest minor release
(2.1.3 as of this writing) because it includes a few bug fixes
that affect ZEO.
Put the package (the ZEO directory, without any wrapping directory
included in a distribution) in your Zope lib/python.
The setup.py script in the top-level ZEO directory can also be
used. Run "python setup.py install --home=ZOPE" where ZOPE is the
top-level Zope directory.
You can test ZEO before installing it with the test script::
python test.py -v
Run the script with the -h option for a full list of options. The
ZEO 2.0b2 release contains 122 unit tests on Unix.
Starting (and configuring) the ZEO Server
To start the storage server, go to your Zope install directory and
run::
python lib/python/ZEO/start.py -p port_number
This run the storage sever under zdaemon. zdaemon automatically
restarts programs that exit unexpectedly.
The server and the client don't have to be on the same machine.
If they are on the same machine, then you can use a Unix domain
socket::
python lib/python/ZEO/start.py -U filename
The start script provides a number of options not documented here.
See doc/start.txt for more information.
Running Zope as a ZEO client
To get Zope to use the server, create a custom_zodb module,
custom_zodb.py, in your Zope install directory, so that Zope uses a
ClientStorage::
from ZEO.ClientStorage import ClientStorage
Storage = ClientStorage(('', port_number))
You can specify a host name (rather than '') if you want. The port
number is, of course, the port number used to start the storage
server.
You can also give the name of a Unix domain socket file::
from ZEO.ClientStorage import ClientStorage
Storage = ClientStorage(filename)
There are a number of configuration options available for the
ClientStorage. See doc/ClientStorage.txt for details.
If you want a persistent client cache which retains cache contents
across ClientStorage restarts, you need to define the environment
variable, ZEO_CLIENT, or set the client keyword argument to the
constructor to a unique name for the client. This is needed so
that unique cache name files can be computed. Otherwise, the
client cache is stored in temporary files which are removed when
the ClientStorage shuts down. For example, to start two Zope
processes with unique caches, use something like::
python z2.py -P8700 ZEO_CLIENT=8700
python z2.py -P8800 ZEO_CLIENT=8800
Zope product installation
Normally, Zope updates the Zope database during startup to reflect
product changes or new products found. It makes no sense for
multiple ZEO clients to do the same installation. Further, if
different clients have different software installed, the correct
state of the database is ambiguous.
Zope will not modify the Zope database during product installation
if the environment variable ZEO_CLIENT is set.
Normally, Zope ZEO clients should be run with ZEO_CLIENT set so
that product initialization is not performed.
If you do install new Zope products, then you need to take a
special step to cause the new products to be properly registered
in the database. The easiest way to do this is to start Zope
once with the environment variable FORCE_PRODUCT_LOAD set.
The interaction between ZEO and Zope product installation is
unfortunate.
ZEO Client Cache
The Client cache provides a disk based cache for each ZEO client.
The client cache allows reads to be done from local disk rather than
by remote access to the storage server.
The cache may be persistent or transient. If the cache is
persistent, then the cache files are retained for use after process
restarts. A non-persistent cache uses temporary files that are
removed when the client storage is closed.
The client cache is managed as two files. The cache manager
endeavors to maintain the two files at sizes less than or equal to
one half the cache size. One of the cache files is designated the
"current" cache file. The other cache file is designated the "old"
cache file, if it exists. All writes are done to the current cache
files. When transactions are committed on the client, transactions
are not split between cache files. Large transactions may cause
cache files to be larger than one half the target cache size.
The life of the cache is as follows:
- When the cache is created, the first of the two cache files is
created and designated the "current" cache file.
- Cache records are written to the cache file, either as
transactions commit locally, or as data are loaded from the
server.
- When the cache file size exceeds one half the cache size, the
second cache file is created and designated the "current" cache
file. The first cache file becomes the "old" cache file.
- Cache records are written to the new current cache file, either as
transactions commit locally, or as data are loaded from the
server.
- When a cache hit is found in the old cache file, it is copied to
the current cache file.
- When the current cache file size exceeds one half the cache size, the
first cache file is recreated and designated the "current" cache
file. The second cache file becomes the "old" cache file.
and so on.
Persistent cache files are created in the directory named in the
'var' argument to the ClientStorage (see ClientStorage.txt) or in
the 'var' subdirectory of the directory given by the INSTANCE_HOME
builtin (created by Zope), or in the current working directory.
Persistent cache files have names of the form::
cstorage-client-n.zec
where:
storage -- the storage name
client -- the client name, as given by the 'ZEO_CLIENT' environment
variable or the 'client' argument provided when creating a client
storage.
n -- '0' for the first cache file and '1' for the second.
For example, the second cache file for storage 'spam' and client 8881
would be named 'cspam-8881-1.zec'.
This diff is collapsed.
ZEO Client Cache Tracing
========================
An important question for ZEO users is: how large should the ZEO
client cache be? ZEO 2 (as of ZEO 2.0b2) has a new feature that lets
you collect a trace of cache activity and tools to analyze this trace,
enabling you to make an informed decision about the cache size.
Don't confuse the ZEO client cache with the Zope object cache. The
ZEO client cache is only used when an object is not in the Zope object
cache; the ZEO client cache avoids roundtrips to the ZEO server.
Enabling Cache Tracing
----------------------
To enable cache tracing, set the environment variable ZEO_CACHE_TRACE
to the name of a file to which the ZEO client process can write. ZEO
will append a hyphen and the storage name to the filename, to
distinguish different storages. If the file doesn't exist, the ZEO
will try to create it. If there are problems with the file, a log
message is written to the standard Zope log file. To start or stop
tracing, the ZEO client process (typically a Zope application server)
must be restarted.
The trace file can grow pretty quickly; on a moderately loaded server,
we observed it growing by 5 MB per hour. The file consists of binary
records, each 24 bytes long; a detailed description of the record
lay-out is given in stats.py. No sensitive data is logged.
Analyzing a Cache Trace
-----------------------
The stats.py command-line tool is the first-line tool to analyze a
cache trace. Its default output consists of two parts: a one-line
summary of essential statistics for each segment of 15 minutes,
interspersed with lines indicating client restarts and "cache flip
events" (more about those later), followed by a more detailed summary
of overall statistics.
The most important statistic is probably the "hit rate", a percentage
indicating how many requests to load an object could be satisfied from
the cache. Hit rates around 70% are good. 90% is probably close to
the theoretical maximum. If you see a hit rate under 60% you can
probably improve the cache performance (and hence your Zope
application server's performance) by increasing the ZEO cache size.
This is normally configured using the cache_size keyword argument to
the ClientStorage() constructor in your custom_zodb.py file. The
default cache size is 20 MB.
The stats.py tool shows its command line syntax when invoked without
arguments. The tracefile argument can be a gzipped file if it has a
.gz extension. It will read from stdin (assuming uncompressed data)
if the tracefile argument is '-'.
Simulating Different Cache Sizes
--------------------------------
Based on a cache trace file, you can make a prediction of how well the
cache might do with a different cache size. The simul.py tool runs an
accurate simulation of the ZEO client cache implementation based upon
the events read from a trace file. A new simulation is started each
time the trace file records a client restart event; if a trace file
contains more than one restart event, a separate line is printed for
each simulation, and line with overall statistics is added at the end.
Example, assuming the trace file is in /tmp/cachetrace.log::
$ python simul.py -s 100 /tmp/cachetrace.log
START TIME DURATION LOADS HITS INVALS WRITES FLIPS HITRATE
Sep 4 11:59 38:01 59833 40473 257 20 2 67.6%
$
This shows that with a 100 MB cache size, the cache hit rate is
67.6%. So let's try this again with a 200 MB cache size::
$ python simul.py -s 200 /tmp/cachetrace.log
START TIME DURATION LOADS HITS INVALS WRITES FLIPS HITRATE
Sep 4 11:59 38:01 59833 40921 258 20 1 68.4%
$
This showed hardly any improvement. So let's try a 300 MB cache
size::
$ python2.0 simul.py -s 300 /tmp/cachetrace.log
ZEOCacheSimulation, cache size 300,000,000 bytes
START TIME DURATION LOADS HITS INVALS WRITES FLIPS HITRATE
Sep 4 11:59 38:01 59833 40921 258 20 0 68.4%
$
This shows that for this particular trace file, the maximum attainable
hit rate is 68.4%. This is probably caused by the fact that nearly a
third of the objects mentioned in the trace were loaded only once --
the cache only helps if an object is loaded more than once.
The simul.py tool also supports simulating different cache
strategies. Since none of these are implemented, these are not
further documented here.
Cache Flips
-----------
The cache uses two files, which are managed as follows:
- Data are written to file 0 until file 0 exceeds limit/2 in size.
- Data are written to file 1 until file 1 exceeds limit/2 in size.
- File 0 is truncated to size 0 (or deleted and recreated).
- Data are written to file 0 until file 0 exceeds limit/2 in size.
- File 1 is truncated to size 0 (or deleted and recreated).
- Data are written to file 1 until file 1 exceeds limit/2 in size.
and so on.
A switch from file 0 to file 1 is called a "cache flip". At all cache
flips except the first, half of the cache contents is wiped out. This
affects cache performance. How badly this impact is can be seen from
the per-15-minutes summaries printed by stats.py. The -i option lets
you choose a smaller summary interval which shows the impact more
acutely.
The simul.py tool shows the number of cache flips in the FLIPS column.
If you see more than one flip per hour the cache may be too small.
This directory contains Andrew Kuchling's programmer's guide to ZODB
and ZEO. It was originally taken from Andrew's zodb.sf.net project on
SourceForge. Because the original version is no longer updated, this
version is best viewed as an independent fork now.
Write section on __setstate__
Continue working on it
Suppress the full GFDL text in the PDF/PS versions
% Administration
% Importing and exporting data
% Disaster recovery/avoidance
% Security
import sys, time, os, random
from ZEO import ClientStorage
import ZODB
from ZODB.POSException import ConflictError
from Persistence import Persistent
import BTree
class ChatSession(Persistent):
"""Class for a chat session.
Messages are stored in a B-tree, indexed by the time the message
was created. (Eventually we'd want to throw messages out,
add_message(message) -- add a message to the channel
new_messages() -- return new messages since the last call to
this method
"""
def __init__(self, name):
"""Initialize new chat session.
name -- the channel's name
"""
self.name = name
# Internal attribute: _messages holds all the chat messages.
self._messages = BTree.BTree()
def new_messages(self):
"Return new messages."
# self._v_last_time is the time of the most recent message
# returned to the user of this class.
if not hasattr(self, '_v_last_time'):
self._v_last_time = 0
new = []
T = self._v_last_time
for T2, message in self._messages.items():
if T2 > T:
new.append( message )
self._v_last_time = T2
return new
def add_message(self, message):
"""Add a message to the channel.
message -- text of the message to be added
"""
while 1:
try:
now = time.time()
self._messages[ now ] = message
get_transaction().commit()
except ConflictError:
# Conflict occurred; this process should pause and
# wait for a little bit, then try again.
time.sleep(.2)
pass
else:
# No ConflictError exception raised, so break
# out of the enclosing while loop.
break
# end while
def get_chat_session(conn, channelname):
"""Return the chat session for a given channel, creating the session
if required."""
# We'll keep a B-tree of sessions, mapping channel names to
# session objects. The B-tree is stored at the ZODB's root under
# the key 'chat_sessions'.
root = conn.root()
if not root.has_key('chat_sessions'):
print 'Creating chat_sessions B-tree'
root['chat_sessions'] = BTree.BTree()
get_transaction().commit()
sessions = root['chat_sessions']
# Get a session object corresponding to the channel name, creating
# it if necessary.
if not sessions.has_key( channelname ):
print 'Creating new session:', channelname
sessions[ channelname ] = ChatSession(channelname)
get_transaction().commit()
session = sessions[ channelname ]
return session
if __name__ == '__main__':
if len(sys.argv) != 2:
print 'Usage: %s <channelname>' % sys.argv[0]
sys.exit(0)
storage = ClientStorage.ClientStorage( ('localhost', 9672) )
db = ZODB.DB( storage )
conn = db.open()
s = session = get_chat_session(conn, sys.argv[1])
messages = ['Hi.', 'Hello', 'Me too', "I'M 3L33T!!!!"]
while 1:
# Send a random message
msg = random.choice(messages)
session.add_message( '%s: pid %i' % (msg,os.getpid() ))
# Display new messages
for msg in session.new_messages():
print msg
# Wait for a few seconds
pause = random.randint( 1, 4 )
time.sleep( pause )
This diff is collapsed.
% Indexing Data
% BTrees
% Full-text indexing
%Introduction
% What is ZODB?
% What is ZEO?
% OODBs vs. Relational DBs
% Other OODBs
\section{Introduction}
This guide explains how to write Python programs that use the Z Object
Database (ZODB) and Zope Enterprise Objects (ZEO). The latest version
of the guide is always available at
\url{http://www.zope.org/Wikis/ZODB/guide/index.html}.
\subsection{What is the ZODB?}
The ZODB is a persistence system for Python objects. Persistent
programming languages provide facilities that automatically write
objects to disk and read them in again when they're required by a
running program. By installing the ZODB, you add such facilities to
Python.
It's certainly possible to build your own system for making Python
objects persistent. The usual starting points are the \module{pickle}
module, for converting objects into a string representation, and
various database modules, such as the \module{gdbm} or \module{bsddb}
modules, that provide ways to write strings to disk and read them
back. It's straightforward to combine the \module{pickle} module and
a database module to store and retrieve objects, and in fact the
\module{shelve} module, included in Python's standard library, does
this.
The downside is that the programmer has to explicitly manage objects,
reading an object when it's needed and writing it out to disk when the
object is no longer required. The ZODB manages objects for you,
keeping them in a cache, writing them out to disk when they are
modified, and dropping them from the cache if they haven't been used
in a while.
\subsection{OODBs vs. Relational DBs}
Another way to look at it is that the ZODB is a Python-specific
object-oriented database (OODB). Commercial object databases for C++
or Java often require that you jump through some hoops, such as using
a special preprocessor or avoiding certain data types. As we'll see,
the ZODB has some hoops of its own to jump through, but in comparison
the naturalness of the ZODB is astonishing.
Relational databases (RDBs) are far more common than OODBs.
Relational databases store information in tables; a table consists of
any number of rows, each row containing several columns of
information. (Rows are more formally called relations, which is where
the term ``relational database'' originates.)
Let's look at a concrete example. The example comes from my day job
working for the MEMS Exchange, in a greatly simplified version. The
job is to track process runs, which are lists of manufacturing steps
to be performed in a semiconductor fab. A run is owned by a
particular user, and has a name and assigned ID number. Runs consist
of a number of operations; an operation is a single step to be
performed, such as depositing something on a wafer or etching
something off it.
Operations may have parameters, which are additional information
required to perform an operation. For example, if you're depositing
something on a wafer, you need to know two things: 1) what you're
depositing, and 2) how much should be deposited. You might deposit
100 microns of silicon oxide, or 1 micron of copper.
Mapping these structures to a relational database is straightforward:
\begin{verbatim}
CREATE TABLE runs (
int run_id,
varchar owner,
varchar title,
int acct_num,
primary key(run_id)
);
CREATE TABLE operations (
int run_id,
int step_num,
varchar process_id,
PRIMARY KEY(run_id, step_num),
FOREIGN KEY(run_id) REFERENCES runs(run_id),
);
CREATE TABLE parameters (
int run_id,
int step_num,
varchar param_name,
varchar param_value,
PRIMARY KEY(run_id, step_num, param_name)
FOREIGN KEY(run_id, step_num)
REFERENCES operations(run_id, step_num),
);
\end{verbatim}
In Python, you would write three classes named \class{Run},
\class{Operation}, and \class{Parameter}. I won't present code for
defining these classes, since that code is uninteresting at this
point. Each class would contain a single method to begin with, an
\method{__init__} method that assigns default values, such as 0 or
\code{None}, to each attribute of the class.
It's not difficult to write Python code that will create a \class{Run}
instance and populate it with the data from the relational tables;
with a little more effort, you can build a straightforward tool,
usually called an object-relational mapper, to do this automatically.
(See
\url{http://www.amk.ca/python/unmaintained/ordb.html} for a quick hack
at a Python object-relational mapper, and
\url{http://www.python.org/workshops/1997-10/proceedings/shprentz.html}
for Joel Shprentz's more successful implementation of the same idea;
Unlike mine, Shprentz's system has been used for actual work.)
However, it is difficult to make an object-relational mapper
reasonably quick; a simple-minded implementation like mine is quite
slow because it has to do several queries to access all of an object's
data. Higher performance object-relational mappers cache objects to
improve performance, only performing SQL queries when they actually
need to.
That helps if you want to access run number 123 all of a sudden. But
what if you want to find all runs where a step has a parameter named
'thickness' with a value of 2.0? In the relational version, you have
two unappealing choices:
\begin{enumerate}
\item Write a specialized SQL query for this case: \code{SELECT run_id
FROM operations WHERE param_name = 'thickness' AND param_value = 2.0}
If such queries are common, you can end up with lots of specialized
queries. When the database tables get rearranged, all these queries
will need to be modified.
\item An object-relational mapper doesn't help much. Scanning
through the runs means that the the mapper will perform the required
SQL queries to read run \#1, and then a simple Python loop can check
whether any of its steps have the parameter you're looking for.
Repeat for run \#2, 3, and so forth. This does a vast
number of SQL queries, and therefore is incredibly slow.
\end{enumerate}
An object database such as ZODB simply stores internal pointers from
object to object, so reading in a single object is much faster than
doing a bunch of SQL queries and assembling the results. Scanning all
runs, therefore, is still inefficient, but not grossly inefficient.
\subsection{What is ZEO?}
The ZODB comes with a few different classes that implement the
\class{Storage} interface. Such classes handle the job of
writing out Python objects to a physical storage medium, which can be
a disk file (the \class{FileStorage} class), a BerkeleyDB file
(\class{BDBFullStorage}), a relational database
(\class{DCOracleStorage}), or some other medium. ZEO adds
\class{ClientStorage}, a new \class{Storage} that doesn't write to
physical media but just forwards all requests across a network to a
server. The server, which is running an instance of the
\class{StorageServer} class, simply acts as a front-end for some
physical \class{Storage} class. It's a fairly simple idea, but as
we'll see later on in this document, it opens up many possibilities.
\subsection{About this guide}
The primary author of this guide works on a project which uses the
ZODB and ZEO as its primary storage technology. We use the ZODB to
store process runs and operations, a catalog of available processes,
user information, accounting information, and other data. Part of the
goal of writing this document is to make our experience more widely
available. A few times we've spent hours or even days trying to
figure out a problem, and this guide is an attempt to gather up the
knowledge we've gained so that others don't have to make the same
mistakes we did while learning.
The author's ZODB project is described in a paper available here,
\url{http://www.amk.ca/python/writing/mx-architecture/}
This document will always be a work in progress. If you wish to
suggest clarifications or additional topics, please send your comments to
\email{zodb-dev@zope.org}.
\subsection{Acknowledgements}
Andrew Kuchling wrote the original version of this guide, which
provided some of the first ZODB documentation for Python programmers.
His initial version has been updated over time by Jeremy Hylton and
Tim Peters.
I'd like to thank the people who've pointed out inaccuracies and bugs,
offered suggestions on the text, or proposed new topics that should be
covered: Jeff Bauer, Willem Broekema, Thomas Guettler,
Chris McDonough, George Runyan.
% links.tex
% Collection of relevant links
\section{Resources}
Introduction to the Zope Object Database, by Jim Fulton:
\\
Goes into much greater detail, explaining advanced uses of the ZODB and
how it's actually implemented. A definitive reference, and highly recommended.
\\
\url{http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html}
Persistent Programing with ZODB, by Jeremy Hylton and Barry Warsaw:
\\
Slides for a tutorial presented at the 10th Python conference. Covers
much of the same ground as this guide, with more details in some areas
and less in others.
\\
\url{http://www.zope.org/Members/bwarsaw/ipc10-slides}
This diff is collapsed.
This diff is collapsed.
% Storages
% FileStorage
% BerkeleyStorage
% OracleStorage
\section{Storages}
This chapter will examine the different \class{Storage} subclasses
that are considered stable, discuss their varying characteristics, and
explain how to administer them.
\subsection{Using Multiple Storages}
XXX explain mounting substorages
\subsection{FileStorage}
\subsection{BDBFullStorage}
\subsection{OracleStorage}
%Transactions and Versioning
% Committing and Aborting
% Subtransactions
% Undoing
% Versions
% Multithreaded ZODB Programs
\section{Transactions and Versioning}
%\subsection{Committing and Aborting}
% XXX There seems very little to say about commit/abort...
\subsection{Subtransactions}
Subtransactions can be created within a transaction. Each
subtransaction can be individually committed and aborted, but the
changes within a subtransaction are not truly committed until the
containing transaction is committed.
The primary purpose of subtransactions is to decrease the memory usage
of transactions that touch a very large number of objects. Consider a
transaction during which 200,000 objects are modified. All the
objects that are modified in a single transaction have to remain in
memory until the transaction is committed, because the ZODB can't
discard them from the object cache. This can potentially make the
memory usage quite large. With subtransactions, a commit can be be
performed at intervals, say, every 10,000 objects. Those 10,000
objects are then written to permanent storage and can be purged from
the cache to free more space.
To commit a subtransaction instead of a full transaction,
pass a true value to the \method{commit()}
or \method{abort()} method of the \class{Transaction} object.
\begin{verbatim}
# Commit a subtransaction
get_transaction().commit(1)
# Abort a subtransaction
get_transaction().abort(1)
\end{verbatim}
A new subtransaction is automatically started on committing or
aborting the previous subtransaction.
\subsection{Undoing Changes}
Some types of \class{Storage} support undoing a transaction even after
it's been committed. You can tell if this is the case by calling the
\method{supportsUndo()} method of the \class{DB} instance, which
returns true if the underlying storage supports undo. Alternatively
you can call the \method{supportsUndo()} method on the underlying
storage instance.
If a database supports undo, then the \method{undoLog(\var{start},
\var{end}\optional{, func})} method on the \class{DB} instance returns
the log of past transactions, returning transactions between the times
\var{start} and \var{end}, measured in seconds from the epoch.
If present, \var{func} is a function that acts as a filter on the
transactions to be returned; it's passed a dictionary representing
each transaction, and only transactions for which \var{func} returns
true will be included in the list of transactions returned to the
caller of \method{undoLog()}. The dictionary contains keys for
various properties of the transaction. The most important keys are
\samp{id}, for the transaction ID, and \samp{time}, for the time at
which the transaction was committed.
\begin{verbatim}
>>> print storage.undoLog(0, sys.maxint)
[{'description': '',
'id': 'AzpGEGqU/0QAAAAAAAAGMA',
'time': 981126744.98,
'user_name': ''},
{'description': '',
'id': 'AzpGC/hUOKoAAAAAAAAFDQ',
'time': 981126478.202,
'user_name': ''}
...
\end{verbatim}
To store a description and a user name on a commit, get the current
transaction and call the \method{note(\var{text})} method to store a
description, and the
\method{setUser(\var{user_name})} method to store the user name.
While \method{setUser()} overwrites the current user name and replaces
it with the new value, the \method{note()} method always adds the text
to the transaction's description, so it can be called several times to
log several different changes made in the course of a single
transaction.
\begin{verbatim}
get_transaction().setUser('amk')
get_transaction().note('Change ownership')
\end{verbatim}
To undo a transaction, call the \method{DB.undo(\var{id})} method,
passing it the ID of the transaction to undo. If the transaction
can't be undone, a \exception{ZODB.POSException.UndoError} exception
will be raised, with the message ``non-undoable
transaction''. Usually this will happen because later transactions
modified the objects affected by the transaction you're trying to
undo.
After you call \method{undo()} you must commit the transaction for the
undo to actually be applied.
\footnote{There are actually two different ways a storage can
implement the undo feature. Most of the storages that ship with ZODB
use the transactional form of undo described in the main text. Some
storages may use a non-transactional undo makes changes visible
immediately.} There is one glitch in the undo process. The thread
that calls undo may not see the changes to the object until it calls
\method{Connection.sync()} or commits another transaction.
\subsection{Versions}
While many subtransactions can be contained within a single regular
transaction, it's also possible to contain many regular transactions
within a long-running transaction, called a version in ZODB
terminology. Inside a version, any number of transactions can be
created and committed or rolled back, but the changes within a version
are not made visible to other connections to the same ZODB.
Not all storages support versions, but you can test for versioning
ability by calling \method{supportsVersions()} method of the
\class{DB} instance, which returns true if the underlying storage
supports versioning.
A version can be selected when creating the \class{Connection}
instance using the \method{DB.open(\optional{\var{version}})} method.
The \var{version} argument must be a string that will be used as the
name of the version.
\begin{verbatim}
vers_conn = db.open(version='Working version')
\end{verbatim}
Transactions can then be committed and aborted using this versioned
connection. Other connections that don't specify a version, or
provide a different version name, will not see changes committed
within the version named \samp{Working~version}. To commit or abort a
version, which will either make the changes visible to all clients or
roll them back, call the \method{DB.commitVersion()} or
\method{DB.abortVersion()} methods.
XXX what are the source and dest arguments for?
The ZODB makes no attempt to reconcile changes between different
versions. Instead, the first version which modifies an object will
gain a lock on that object. Attempting to modify the object from a
different version or from an unversioned connection will cause a
\exception{ZODB.POSException.VersionLockError} to be raised:
\begin{verbatim}
from ZODB.POSException import VersionLockError
try:
get_transaction().commit()
except VersionLockError, (obj_id, version):
print ('Cannot commit; object %s '
'locked by version %s' % (obj_id, version))
\end{verbatim}
The exception provides the ID of the locked object, and the name of
the version having a lock on it.
\subsection{Multithreaded ZODB Programs}
ZODB databases can be accessed from multithreaded Python programs.
The \class{Storage} and \class{DB} instances can be shared among
several threads, as long as individual \class{Connection} instances
are created for each thread.
This diff is collapsed.
\documentclass{howto}
\title{ZODB/ZEO Programming Guide}
\release{0.3a2}
\date{\today}
\author{A.M.\ Kuchling}
\authoraddress{\email{amk@amk.ca}}
\begin{document}
\maketitle
\tableofcontents
\copyright{Copyright 2002 A.M. Kuchling.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the appendix entitled ``GNU
Free Documentation License''.}
\input{introduction}
\input{prog-zodb}
\input{zeo}
\input{transactions}
\input{modules}
\appendix
\input links.tex
\input gfdl.tex
\end{document}
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
##############################################################################
#
# Copyright (c) 2001, 2002 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################
# hack to overcome dynamic-linking headache.
from _IIBTree import *
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment