Commit 6b0e720d authored by Tim Peters's avatar Tim Peters

A large number of changes so that cache tracing isn't

100% broken -- I can get a trace file now, and stats.py
doesn't blow up anymore.  Don't yet know whether it's
reporting sensible things, but it's not obviously insane.
simul.py doesn't know about the current cache implementation
either yet.
parent fdb78a64
......@@ -13,29 +13,32 @@ cache; the ZEO client cache avoids roundtrips to the ZEO server.
Enabling Cache Tracing
----------------------
To enable cache tracing, set the environment variable ZEO_CACHE_TRACE
to the name of a file to which the ZEO client process can write. ZEO
will append a hyphen and the storage name to the filename, to
distinguish different storages. If the file doesn't exist, the ZEO
will try to create it. If there are problems with the file, a log
message is written to the standard Zope log file. To start or stop
tracing, the ZEO client process (typically a Zope application server)
must be restarted.
The trace file can grow pretty quickly; on a moderately loaded server,
we observed it growing by 5 MB per hour. The file consists of binary
records, each 24 bytes long; a detailed description of the record
lay-out is given in stats.py. No sensitive data is logged.
To enable cache tracing, you must use a persistent cache (specify a ``client``
name), and set the environment variable ZEO_CACHE_TRACE. The path to the
trace file is derived from the path to the persistent cache file by appending
".trace". If the file doesn't exist, ZEO will try to create it. If the file
does exist, it's opened for appending (previous trace information is not
overwritten). If there are problems with the file, a warning message is
logged. To start or stop tracing, the ZEO client process (typically a Zope
application server) must be restarted.
The trace file can grow pretty quickly; on a moderately loaded server, we
observed it growing by 5 MB per hour. The file consists of binary records,
each 26 bytes long if 8-byte oids are in use; a detailed description of the
record lay-out is given in stats.py. No sensitive data is logged: data
record sizes and binary object and transaction ids are logged, but no
information about object types or names, user names, version names,
transaction comments, access paths, or machine information such as machine
name or IP address.
Analyzing a Cache Trace
-----------------------
The stats.py command-line tool is the first-line tool to analyze a
cache trace. Its default output consists of two parts: a one-line
summary of essential statistics for each segment of 15 minutes,
interspersed with lines indicating client restarts and "cache flip
events" (more about those later), followed by a more detailed summary
of overall statistics.
The stats.py command-line tool is the first-line tool to analyze a cache
trace. Its default output consists of two parts: a one-line summary of
essential statistics for each segment of 15 minutes, interspersed with lines
indicating client restarts, followed by a more detailed summary of overall
statistics.
The most important statistic is probably the "hit rate", a percentage
indicating how many requests to load an object could be satisfied from
......@@ -43,9 +46,9 @@ the cache. Hit rates around 70% are good. 90% is probably close to
the theoretical maximum. If you see a hit rate under 60% you can
probably improve the cache performance (and hence your Zope
application server's performance) by increasing the ZEO cache size.
This is normally configured using the cache_size keyword argument to
the ClientStorage() constructor in your custom_zodb.py file. The
default cache size is 20 MB.
This is normally configured using cache_size key in the ``zeoclient``
section of your configuration file. The default cache size is 20 MB, which
is very small.
The stats.py tool shows its command line syntax when invoked without
arguments. The tracefile argument can be a gzipped file if it has a
......@@ -85,7 +88,7 @@ size::
ZEOCacheSimulation, cache size 300,000,000 bytes
START TIME DURATION LOADS HITS INVALS WRITES FLIPS HITRATE
Sep 4 11:59 38:01 59833 40921 258 20 0 68.4%
$
$
This shows that for this particular trace file, the maximum attainable
hit rate is 68.4%. This is probably caused by the fact that nearly a
......@@ -95,32 +98,3 @@ the cache only helps if an object is loaded more than once.
The simul.py tool also supports simulating different cache
strategies. Since none of these are implemented, these are not
further documented here.
Cache Flips
-----------
The cache uses two files, which are managed as follows:
- Data are written to file 0 until file 0 exceeds limit/2 in size.
- Data are written to file 1 until file 1 exceeds limit/2 in size.
- File 0 is truncated to size 0 (or deleted and recreated).
- Data are written to file 0 until file 0 exceeds limit/2 in size.
- File 1 is truncated to size 0 (or deleted and recreated).
- Data are written to file 1 until file 1 exceeds limit/2 in size.
and so on.
A switch from file 0 to file 1 is called a "cache flip". At all cache
flips except the first, half of the cache contents is wiped out. This
affects cache performance. How badly this impact is can be seen from
the per-15-minutes summaries printed by stats.py. The -i option lets
you choose a smaller summary interval which shows the impact more
acutely.
The simul.py tool shows the number of cache flips in the FLIPS column.
If you see more than one flip per hour the cache may be too small.
......@@ -76,15 +76,10 @@ class ClientCache(object):
# default of 20MB. The default here is misleading, though, since
# ClientStorage is the only user of ClientCache, and it always passes an
# explicit size of its own choosing.
def __init__(self, path=None, size=200*1024**2, trace=False):
def __init__(self, path=None, size=200*1024**2):
self.path = path
self.size = size
if trace and path:
self._setup_trace()
else:
self._trace = self._notrace
# The cache stores objects in a dict mapping (oid, tid) pairs
# to Object() records (see below). The tid is the transaction
# id that wrote the object. An object record includes data,
......@@ -111,6 +106,8 @@ class ClientCache(object):
# and retrieving objects to/from the cache file.
self.fc = FileCache(size, self.path, self)
self._setup_trace(self.path)
def open(self):
self.fc.scan(self.install)
......@@ -140,6 +137,10 @@ class ClientCache(object):
def close(self):
self.fc.close()
if self._tracefile:
sync(self._tracefile)
self._tracefile.close()
self._tracefile = None
##
# Set the last transaction seen by the cache.
......@@ -430,19 +431,29 @@ class ClientCache(object):
L = self.noncurrent[oid]
L.remove((o.start_tid, o.end_tid))
def _setup_trace(self):
tfn = self.path + ".trace"
self.tracefile = None
try:
self.tracefile = open(tfn, "ab")
self._trace(0x00)
except IOError, msg:
self.tracefile = None
logger.warning("Could not write to trace file %s: %s",
tfn, msg)
# If `path` isn't None (== we're using a persistent cache file), and
# envar ZEO_CACHE_TRACE is set to a non-empty value, try to open
# path+'.trace' as a trace file, and store the file object in
# self._tracefile. If not, or we can't write to the trace file, disable
# tracing by setting self._trace to a dummy function, and set
# self._tracefile to None.
def _setup_trace(self, path):
self._tracefile = None
if path and os.environ.get("ZEO_CACHE_TRACE"):
tfn = path + ".trace"
try:
self._tracefile = open(tfn, "ab")
self._trace(0x00)
except IOError, msg:
self._tracefile = None
logger.warning("cannot write tracefile %r (%s)", tfn, msg)
else:
logger.info("opened tracefile %r", tfn)
def _notrace(self, *arg, **kwargs):
pass
if self._tracefile is None:
def notrace(*args, **kws):
pass
self._trace = notrace
def _trace(self,
code, oid="", version="", tid="", end_tid=z64, dlen=0,
......@@ -462,7 +473,7 @@ class ClientCache(object):
if end_tid is None:
end_tid = z64
try:
self.tracefile.write(
self._tracefile.write(
struct_pack(">iiH8s8s",
time_time(),
encoded,
......
......@@ -26,8 +26,8 @@ Usage: stats.py [-h] [-i interval] [-q] [-s] [-S] [-v] [-X] tracefile
"""File format:
Each record is 24 bytes, with the following layout. Numbers are
big-endian integers.
Each record is 18 bytes, plus a variable number of bytes to store an oid,
with the following layout. Numbers are big-endian integers.
Offset Size Contents
......@@ -46,6 +46,9 @@ Mask bits Contents
0x7e 6 function and outcome code
0x01 1 current cache file (0 or 1)
The "current cache file" bit is no longer used; it refers to a 2-file
cache scheme used before ZODB 3.3.
The function and outcome codes are documented in detail at the end of
this file in the 'explain' dictionary. Note that the keys there (and
also the arguments to _trace() in ClientStorage.py) are 'code & 0x7e',
......@@ -59,18 +62,18 @@ import struct
from types import StringType
def usage(msg):
print >>sys.stderr, msg
print >>sys.stderr, __doc__
print >> sys.stderr, msg
print >> sys.stderr, __doc__
def main():
# Parse options
verbose = 0
quiet = 0
dostats = 1
print_size_histogram = 0
print_histogram = 0
interval = 900 # Every 15 minutes
heuristic = 0
verbose = False
quiet = False
dostats = True
print_size_histogram = False
print_histogram = False
interval = 15*60 # Every 15 minutes
heuristic = False
try:
opts, args = getopt.getopt(sys.argv[1:], "hi:qsSvX")
except getopt.error, msg:
......@@ -78,24 +81,27 @@ def main():
return 2
for o, a in opts:
if o == '-h':
print_histogram = 1
if o == "-i":
print_histogram = True
elif o == "-i":
interval = int(60 * float(a))
if interval <= 0:
interval = 60
elif interval > 3600:
interval = 3600
if o == "-q":
quiet = 1
verbose = 0
if o == "-s":
print_size_histogram = 1
if o == "-S":
dostats = 0
if o == "-v":
verbose = 1
if o == '-X':
heuristic = 1
elif o == "-q":
quiet = True
verbose = False
elif o == "-s":
print_size_histogram = True
elif o == "-S":
dostats = False
elif o == "-v":
verbose = True
elif o == '-X':
heuristic = True
else:
assert False, (o, opt)
if len(args) != 1:
usage("exactly one file argument required")
return 2
......@@ -107,12 +113,12 @@ def main():
try:
import gzip
except ImportError:
print >>sys.stderr, "can't read gzipped files (no module gzip)"
print >> sys.stderr, "can't read gzipped files (no module gzip)"
return 1
try:
f = gzip.open(filename, "rb")
except IOError, msg:
print >>sys.stderr, "can't open %s: %s" % (filename, msg)
print >> sys.stderr, "can't open %s: %s" % (filename, msg)
return 1
elif filename == '-':
# Read from stdin
......@@ -122,7 +128,7 @@ def main():
try:
f = open(filename, "rb")
except IOError, msg:
print >>sys.stderr, "can't open %s: %s" % (filename, msg)
print >> sys.stderr, "can't open %s: %s" % (filename, msg)
return 1
# Read file, gathering statistics, and printing each record if verbose
......@@ -203,9 +209,8 @@ def main():
bysizew[dlen] = d = bysizew.get(dlen) or {}
d[oid] = d.get(oid, 0) + 1
if verbose:
print "%s %d %02x %s %016x %016x %1s %s" % (
print "%s %02x %s %016x %016x %c %s" % (
time.ctime(ts)[4:-5],
current,
code,
oid_repr(oid),
U64(start_tid),
......@@ -234,7 +239,7 @@ def main():
# Error if nothing was read
if not records:
print >>sys.stderr, "No records processed"
print >> sys.stderr, "No records processed"
return 1
# Print statistics
......@@ -261,7 +266,7 @@ def main():
code,
explain.get(code) or "*** unknown code ***")
# Print histogram
# Print histogram.
if print_histogram:
print
print "Histogram of object load frequency"
......@@ -281,7 +286,7 @@ def main():
print fmt % (binsize, addcommas(count),
obj_percent, load_percent, cum)
# Print size histogram
# Print size histogram.
if print_size_histogram:
print
print "Histograms of object sizes"
......@@ -325,7 +330,7 @@ def dumpbyinterval(byinterval, h0, he):
def hitrate(bycode):
loads = 0
hits = 0
for code in bycode.keys():
for code in bycode:
if code & 0x70 == 0x20:
n = bycode[code]
loads += n
......@@ -345,8 +350,7 @@ def histogram(d):
return L
def U64(s):
h, v = struct.unpack(">II", s)
return (long(h) << 32) + v
return struct.unpack(">Q", s)[0]
def oid_repr(oid):
if isinstance(oid, StringType) and len(oid) == 8:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment