1. 08 Oct, 2018 2 commits
  2. 02 Oct, 2018 1 commit
  3. 01 Oct, 2018 1 commit
    • go/zodb: Don't truncate Tid time precision to 1µs · 9112f21e
      Kirill Smelkov authored
      The format of tid assumes ~ ns precision, and it is only formatted to µs
      precision by default. So don't truncate TimeStamp value when computing
      it from Tid, and perform the µs-rounding only on formatting.
      
      The resulting float numbers are not always exactly the same as in Python.
      For example, the following program
      
      	tidv = [
      	    0x0000000000000000,
      	    0x0285cbac258bf266,
      	    0x0285cbad27ae14e6,
      	    0x037969f722a53488,
      	    0x03b84285d71c57dd,
      	    0x03caa84275fc1166,
      	]
      
      	for tid in tidv:
      	    t = TimeStamp.TimeStamp(p64(tid))
      	    print '0x%016x %s %.9f\t%.9f' % (tid, t, t.timeTime(), t.second())
      
      prints:
      
      	0x0000000000000000 1900-01-01 00:00:00.000000 -2208988800.000000000     0.000000000
      	0x0285cbac258bf266 1979-01-03 21:00:08.800000 284245208.800000191       8.800000185
      	0x0285cbad27ae14e6 1979-01-03 21:01:09.300001 284245269.300001621       9.300001496	<-- ex here
      	0x037969f722a53488 2008-10-24 05:11:08.120000 1224825068.119999886      8.119999878
      	0x03b84285d71c57dd 2016-07-01 09:41:50.416574 1467366110.416574001      50.416573989
      	0x03caa84275fc1166 2018-10-01 16:34:27.652650 1538411667.652649879      27.652650112
      
      The difference is due to the ordering of floating-point operations, because
      TimeStamp.timeTime() loses precision, e.g. for the marked case:
      
      	In [8]: '%.10f' % (281566860.000000000 + 9.300001496)
      	Out[8]: '281566869.3000015020'
      
      We don't try to mimic Python's float64 behaviour exactly, because it even
      differs between the PURE_PYTHON=y and C TimeStamp implementations. However,
      we don't let that limit our timestamp precision to only 1µs.

      In other words, we keep exact compatibility with Python when printing, but
      the timestamp values themselves now have ~ns precision.
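      A minimal Go sketch of the idea, assuming the usual ZODB tid layout
      (minutes since 1900-01-01 in the high 32 bits, the seconds within that
      minute in units of 60/2^32 s in the low 32 bits). tidTime and tidString
      are illustrative names, not the zodb package API: the decoded time keeps
      nanoseconds, and µs rounding is applied only when formatting.

```go
package main

import (
	"fmt"
	"math/bits"
	"time"
)

// tidTime decodes a ZODB transaction id into a time.Time without truncating
// it to 1µs.
//
// Assumed layout (as in ZODB/py TimeStamp): the high 32 bits count minutes as
//
//	((((year-1900)*12 + month-1)*31 + day-1)*24 + hour)*60 + minute
//
// and the low 32 bits hold the seconds within that minute in units of 60/2^32 s.
func tidTime(tid uint64) time.Time {
	hi, lo := tid>>32, tid&0xffffffff

	minute := hi % 60
	hi /= 60
	hour := hi % 24
	hi /= 24
	day := hi%31 + 1
	hi /= 31
	month := hi%12 + 1
	year := hi/12 + 1900

	// ns = lo * 60e9 / 2^32, with the product kept in 128 bits so it cannot
	// overflow; the result is floored to whole nanoseconds.
	phi, plo := bits.Mul64(lo, 60_000_000_000)
	ns := phi<<32 | plo>>32

	t := time.Date(int(year), time.Month(month), int(day), int(hour), int(minute), 0, 0, time.UTC)
	return t.Add(time.Duration(ns))
}

// tidString renders tid with µs precision; rounding happens only here.
func tidString(tid uint64) string {
	return tidTime(tid).Round(time.Microsecond).Format("2006-01-02 15:04:05.000000")
}

func main() {
	for _, tid := range []uint64{0x0285cbac258bf266, 0x0285cbad27ae14e6} {
		fmt.Printf("0x%016x  %s  ns=%d\n", tid, tidString(tid), tidTime(tid).Nanosecond())
	}
}
```

      For 0x0285cbac258bf266 this keeps 800000185 ns within the second
      (matching the 8.800000185 from t.second() above), while tidString still
      prints 1979-01-03 21:00:08.800000.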
  4. 28 Sep, 2018 1 commit
    • go/zodb/fs1/index: Don't rely on []byte being pickled as string · c72aaa0d
      Kirill Smelkov authored
      As https://github.com/kisielk/og-rek/pull/57 suggests, []byte was
      pickled as string only unintentionally, and that might change.

      We already explicitly check for string in the corresponding place where
      the index is loaded:

      	https://lab.nexedi.com/kirr/neo/blob/2dba8607/go/zodb/storage/fs1/index.go#L282

      so it is better to also explicitly save the bits as string.

      If we don't, and https://github.com/kisielk/og-rek/pull/57 gets accepted,
      the tests will fail:
      
      	--- FAIL: TestIndexSaveLoad (0.00s)
      	    index_test.go:176: index load: /tmp/t-index893650059/458967662/1.fs.index: pickle @6: invalid oidPrefix: type []uint8
      	Traceback (most recent call last):
      	  File "./py/indexcmp", line 41, in <module>
      	    main()
      	  File "./py/indexcmp", line 29, in main
      	    d2 = fsIndex.load(path2)
      	  File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 138, in load
      	    data[ensure_bytes(k)] = fsBucket().fromString(ensure_bytes(v))
      	  File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 71, in ensure_bytes
      	    return s.encode('ascii') if not isinstance(s, bytes) else s
      	AttributeError: 'bytearray' object has no attribute 'encode'
      	--- FAIL: TestIndexSaveToPy (0.04s)
      	    index_test.go:218: zodb/py read/compare index: exit status 1
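      The reasoning can be illustrated with a small hypothetical sketch:
      checkOidPrefix is not the actual index.go code, and the pickle
      encode/decode round trip is elided. It only shows that a load-side
      string check rejects []byte, so the saving side should convert to
      string explicitly.

```go
package main

import "fmt"

// checkOidPrefix is a hypothetical stand-in for the load-side check in
// fs1/index.go: after unpickling, the oid prefix must be a string.
func checkOidPrefix(v interface{}) (string, error) {
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("invalid oidPrefix: type %T", v)
	}
	return s, nil
}

func main() {
	oidPrefix := []byte{0, 0, 0, 0, 0, 1}

	// If the bytes come back as []byte, the check rejects them.
	if _, err := checkOidPrefix(oidPrefix); err != nil {
		fmt.Println(err)
	}

	// Saving explicitly as string makes the result independent of how the
	// pickler happens to treat []byte.
	s, err := checkOidPrefix(string(oidPrefix))
	fmt.Printf("%q %v\n", s, err)
}
```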
  5. 09 Aug, 2018 14 commits
    • go/zodb/btree: New package to work with ZODB BTrees (draft) · 2dba8607
      Kirill Smelkov authored
      Provide minimal support for BTrees.LOBTree Get for now.
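      A rough sketch of what a LOBTree Get does, under the assumption that the
      whole tree is already in RAM: interior nodes hold separator keys and
      children, buckets hold sorted int64 keys. Node, Bucket and get are
      hypothetical; a real implementation would also have to load and activate
      persistent nodes on the way down.

```go
package main

import (
	"fmt"
	"sort"
)

// Bucket is a leaf: sorted int64 keys with their values.
type Bucket struct {
	Keys   []int64
	Values []interface{}
}

// Node is an interior node: Keys[i] separates Children[i] and Children[i+1],
// so len(Children) == len(Keys)+1 and Children[i+1] holds keys >= Keys[i].
type Node struct {
	Keys     []int64
	Children []interface{} // each is *Node or *Bucket
}

// get looks key up by descending from root to the bucket that may hold it.
func get(root interface{}, key int64) (interface{}, bool) {
	for {
		switch n := root.(type) {
		case *Node:
			// the first separator greater than key selects the child
			i := sort.Search(len(n.Keys), func(j int) bool { return key < n.Keys[j] })
			root = n.Children[i]
		case *Bucket:
			i := sort.Search(len(n.Keys), func(j int) bool { return n.Keys[j] >= key })
			if i < len(n.Keys) && n.Keys[i] == key {
				return n.Values[i], true
			}
			return nil, false
		default:
			return nil, false
		}
	}
}

func main() {
	tree := &Node{
		Keys: []int64{10},
		Children: []interface{}{
			&Bucket{Keys: []int64{1, 5}, Values: []interface{}{"a", "b"}},
			&Bucket{Keys: []int64{10, 20}, Values: []interface{}{"c", "d"}},
		},
	}
	v, ok := get(tree, 20)
	fmt.Println(v, ok)
}
```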
    • go/zodb: DB - application-level handle to database (very draft) · 533f0c73
      Kirill Smelkov authored
      DB represents a handle to the database at the application level and
      contains a pool of connections. DB.Open opens a database connection. The
      connection is automatically put back into the DB pool for future reuse
      after the corresponding transaction is complete. DB thus provides the
      service of maintaining a live objects cache and reusing live objects from
      transaction to transaction.
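      The pooling behaviour can be sketched as follows. DB, Conn, Open and
      txnDone here are simplified hypothetical stand-ins (no context, no
      options, no invalidation handling); the point is only that a connection
      returned to the pool keeps its live cache for the next Open.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for zodb.Connection with its live cache.
type Conn struct {
	db    *DB
	cache map[uint64]interface{} // oid -> live object
}

// DB keeps idle connections so their live caches can be reused.
type DB struct {
	mu   sync.Mutex
	pool []*Conn
}

// Open takes an idle connection from the pool, or creates a new one.
func (db *DB) Open() *Conn {
	db.mu.Lock()
	defer db.mu.Unlock()
	if n := len(db.pool); n > 0 {
		conn := db.pool[n-1]
		db.pool = db.pool[:n-1]
		return conn
	}
	return &Conn{db: db, cache: make(map[uint64]interface{})}
}

// txnDone is what a transaction-completion hook would call: the connection
// goes back to the pool together with its live objects.
func (conn *Conn) txnDone() {
	conn.db.mu.Lock()
	defer conn.db.mu.Unlock()
	conn.db.pool = append(conn.db.pool, conn)
}

func main() {
	db := &DB{}

	c1 := db.Open()
	c1.cache[1] = "object loaded in txn 1"
	c1.txnDone()

	c2 := db.Open() // same connection, live cache still warm
	fmt.Println(c2 == c1, c2.cache[1])
}
```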
      
      Note that it is possible to have several DB handles to the same database.
      This might be useful if an application accesses distinctly different sets
      of objects in different transactions and knows beforehand which set it
      will need next time. Then, to avoid huge cache misses, it makes sense to
      keep DB handles open for every possible case of application access.
      
      TODO handle invalidations.
    • go/zodb: Connection: Allow applications to tune live-cache eviction policy · c67ff9ea
      Kirill Smelkov authored
      For example, Wendelin.core wcfs will need to keep some types of objects
      (e.g. the BigFile index) always in RAM for efficiency.

      Provide a corresponding interface that applications can use to tune
      live-cache eviction decisions.
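      One possible shape for such a hook, as a hypothetical sketch:
      EvictionControl, pinIndex and liveCache are illustrative names, not the
      interface this patch adds. The application supplies a predicate, and the
      cache consults it before evicting an object.

```go
package main

import "fmt"

// EvictionControl is a hypothetical hook an application installs to
// influence live-cache eviction decisions.
type EvictionControl interface {
	// Keep reports whether obj must stay in RAM even under eviction.
	Keep(obj interface{}) bool
}

// BigFileIndex and Blk are hypothetical object types.
type BigFileIndex struct{ name string }
type Blk struct{ n int }

// pinIndex keeps every *BigFileIndex in RAM and lets everything else go.
type pinIndex struct{}

func (pinIndex) Keep(obj interface{}) bool {
	_, isIndex := obj.(*BigFileIndex)
	return isIndex
}

// liveCache is a toy oid -> object cache consulting an optional control.
type liveCache struct {
	objs    map[uint64]interface{}
	control EvictionControl // nil: default policy, everything is evictable
}

// evict drops every object the control does not ask to keep.
func (c *liveCache) evict() {
	for oid, obj := range c.objs {
		if c.control != nil && c.control.Keep(obj) {
			continue
		}
		delete(c.objs, oid) // deleting during range is allowed in Go
	}
}

func main() {
	c := &liveCache{
		objs: map[uint64]interface{}{
			1: &BigFileIndex{name: "f"},
			2: &Blk{n: 0},
			3: &Blk{n: 1},
		},
		control: pinIndex{},
	}
	c.evict()
	fmt.Println(len(c.objs)) // only the BigFileIndex survives
}
```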
    • go/zodb: Implement Connection · fb343a6f
      Kirill Smelkov authored
      Connection represents an application-level view of a ZODB database.
      It has a group of in-RAM application-level objects associated with it.
      The objects are isolated from both changes in further database
      transactions and from changes to in-RAM objects in other connections.

      Connection, as the manager of this objects group, is responsible for
      handling object -> object database references. For this to work it keeps

      	{} oid -> obj

      dict and uses it to find an already loaded object when another object
      persistently references a particular oid. Since it is related, pydata
      handling of persistent references is implemented in this patch as well.

      The dict must keep weak references to objects. The following text
      explains the rationale:
      
      	if Connection keeps a strong link to obj, just
      	obj.PDeactivate will not fully release obj if there are no
      	references to it from other objects:
      
      	     - deactivate will release obj state (ok)
      	     - but there will be still reference from connection `oid -> obj` map to this object,
      	       which means the object won't be garbage-collected.
      
      	-> we can solve it by using "weak" pointers in the map.
      
      	NOTE we cannot use a regular map and manually "gc" entries
      	there periodically: since for an obj we don't know whether other
      	objects are referencing it, we can't just remove obj's oid from
      	the map - if we do so and there are other live objects that
      	reference obj, user code can still reach obj via those
      	references. On the other hand, if another, not yet loaded, object
      	also references obj and gets loaded, traversing the reference from
      	that loaded object will load a second copy of obj, thus breaking the
      	1 object in db <-> 1 live object invariant:
      
      	     A  →  B  →  C
      	     ↓           |
      	     D <--------- - - -> D2 (wrong)
      
      	- A activate
      	- D activate
      	- B activate
      	- D gc, A still keeps link on D
      	- C activate -> it needs to get to D, but D was removed from objtab
      	  -> new D2 is wrongly created
      
      	that's why we have to depend on Go's GC to know whether there are
      	still live references left or not. And that in turn means finalizers
      	and thus weak references.
      
      	some link on the subject:
      	https://groups.google.com/forum/#!topic/golang-nuts/PYWxjT2v6ps
    • go/zodb/internal/weak: New package to handle weak references · 79e28f3c
      Kirill Smelkov authored
      We will need weak references to handle {} oid -> obj inside zodb.Connection.

      In the Go world it is often said that weak references are not needed at
      all. Please see, however, the next patch for a detailed rationale for why
      weak references (in other words, finalizers and cooperation from Go's GC)
      are _required_ in that case.
    • go/zodb: PyStateful persistency support · 532d014f
      Kirill Smelkov authored
      As promised in 354e0e51 (go/zodb: Persistent - the base type to
      implement IPersistent objects), add support to the persistency machinery
      for setting object state from Python pickles serialized by ZODB/py.

      Persistent references are not yet handled.

      Also, as promised, add some very minimal persistency tests.
    • go/zodb: pydata: Factor out class extraction logic into xpyclass · abc11031
      Kirill Smelkov authored
      PyData.ClassName currently handles the many ways ZODB can serialize a
      Python class. Since we'll soon be using this functionality in other
      places, extract it into a dedicated function.

      Since we'll also be frequently using

      	class.__module__ + "." + class.__name__

      don't inline it in ClassName and instead put it into pyclassPath() right
      away.
    • go/zodb: Persistent - the base type to implement IPersistent objects · 354e0e51
      Kirill Smelkov authored
      Add the base type that types wanting to implement persistency can embed,
      thereby inheriting persistent functionality. For example
      
      	type MyObject struct {
      		Persistent
      		...
      	}
      
      	type myObjectState MyObject
      
      	func (o *myObjectState) DropState() { ... }
      	func (o *myObjectState) SetState(state *mem.Buf) error { ... }
      
      Here the state management methods (DropState and SetState) will be
      automatically used by the persistency machinery on activation and
      deactivation.

      For this to work, the MyObject class has to be registered with ZODB

      	func init() {
      		t := reflect.TypeOf
      		zodb.RegisterClass("mymodule.MyObject", t(MyObject{}), t(myObjectState{}))
      	}

      and new instances of MyObject have to be created via zodb.NewPersistent:

      	obj := zodb.NewPersistent(reflect.TypeOf(MyObject{}), jar).(*MyObject)
      
      SetState corresponds to __setstate__ in Python. However, in the Go version
      it is explicitly separated from the class's public API, as it is the
      contract between a class and the persistency machinery, not between the
      class and its user. Notice that SetState takes a raw buffer as its
      argument. In the following patch we'll add a SetState cousin (PySetState)
      that will take unpickled objects as the state, exactly how __setstate__
      operates in Python. Classes will be able to choose whether to accept
      state as raw bytes or as a Python object.
      
      The activation/deactivation is implemented via reference counting.
      
      Tests are pending (for PySetState).
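      The embedding pattern together with reference-counted activation can be
      sketched like this. It is a simplified, hypothetical model: []byte
      replaces *mem.Buf, a load callback replaces the jar, and the real
      Persistent has more states and API than shown.

```go
package main

import (
	"fmt"
	"sync"
)

// Stateful mirrors the DropState/SetState contract; []byte stands in for *mem.Buf.
type Stateful interface {
	DropState()
	SetState(state []byte) error
}

// Persistent is a simplified base type: state is loaded on the first
// activation and dropped when the last activation is released.
type Persistent struct {
	mu       sync.Mutex
	refcnt   int
	instance Stateful               // state manager of the embedding object
	load     func() ([]byte, error) // hypothetical loader standing in for the jar
}

func (p *Persistent) PActivate() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.refcnt++
	if p.refcnt > 1 {
		return nil // state is already in RAM
	}
	state, err := p.load()
	if err == nil {
		err = p.instance.SetState(state)
	}
	if err != nil {
		p.refcnt-- // a failed activation does not count
	}
	return err
}

func (p *Persistent) PDeactivate() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.refcnt == 0 {
		panic("PDeactivate without matching PActivate")
	}
	p.refcnt--
	if p.refcnt == 0 {
		p.instance.DropState()
	}
}

// MyObject embeds Persistent, as in the example above.
type MyObject struct {
	Persistent
	value string
}

type myObjectState MyObject

func (o *myObjectState) DropState()                  { o.value = "" }
func (o *myObjectState) SetState(state []byte) error { o.value = string(state); return nil }

func newMyObject(load func() ([]byte, error)) *MyObject {
	obj := &MyObject{}
	obj.instance = (*myObjectState)(obj)
	obj.load = load
	return obj
}

func main() {
	nload := 0
	obj := newMyObject(func() ([]byte, error) { nload++; return []byte("hello"), nil })

	_ = obj.PActivate()
	_ = obj.PActivate() // already active: no second load
	fmt.Println(obj.value, nload)
	obj.PDeactivate()
	obj.PDeactivate() // last release drops the state
	fmt.Printf("%q\n", obj.value)
}
```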
    • go/zodb: IPersistent + Connection stub · f6a27a1e
      Kirill Smelkov authored
      Add to ZODB/go IPersistent, the interface that is used to represent
      in-RAM application-level objects that mirror objects in the database.
      
      The interface is modelled after Python's IPersistent
      
      	https://github.com/zopefoundation/ZODB/blob/3.10.7-4-gb8d7a8567/src/persistent/interfaces.py#L22
      
      but is not exactly equal to it. In particular we support concurrent
      access to an object from multiple goroutines simultaneously.
      
      Due to concurrency support there is no STICKY state, because in the
      CPython version STICKY is used to briefly pin an object in RAM and is not
      safe to use from multiple threads there. Correspondingly, the semantics of
      PActivate are a bit different from _p_activate: in Go, after an object
      has been activated, it is guaranteed to remain present in RAM until it
      is explicitly deactivated by the user.
      
      Please see details of the activation protocol in IPersistent
      documentation.
      
      ZODB/py uses an interface (IDataManager) for a persistent object's jar,
      but in Go I decided, at least for now, to go without an explicit interface
      at that level. For this reason a concrete type, Connection, will be used,
      and so its stub is also introduced in this patch, since IPersistent wants
      to return the connection via PJar.
    • go/zodb: pkgdoc: Add section overviewing storage drivers · 9b751272
      Kirill Smelkov authored
      We already have the functionality; just add an overview of how to
      implement drivers and how to use the most common ones.
    • go/zodb: pkgdoc: Put zodbtools reference into "Miscellaneous" section · 498606b4
      Kirill Smelkov authored
      A lot of text will be added to pkgdoc with new sections and
      per-section footnotes, so it is better to use a dedicated section
      for references instead of a global footnote whose context might
      become unclear.
    • go/zodb: pkgdoc: "Operations" -> "Storage layer" · 25170e24
      Kirill Smelkov authored
      As we are going to add another section, "Application layer", to the zodb
      package, turn the previous text overviewing IStorage & friends into a
      "Storage layer" section.
    • go/zodb: pkgdoc: Stress that objects can reference each other in the database · a5ecb24b
      Kirill Smelkov authored
      We are too used to taking this for granted with ZODB, but this property
      of object databases is not universally available in other databases.
    • go/zodb: Pkgdoc cosmetics · 02c0a3d2
      Kirill Smelkov authored
      - change "types, interfaces and errors" to API in the header.
      - it is not only the data model, but also the API that tries to be
        reasonably compatible with ZODB/py.
      - an article before "transaction" ("the transaction") reads better.
  6. 08 Aug, 2018 2 commits
  7. 07 Aug, 2018 1 commit
  8. 25 Jul, 2018 1 commit
  9. 20 Jul, 2018 4 commits
  10. 11 Jul, 2018 13 commits
    • go/zodb: Tweak documentation a bit so it renders more well in godoc · 8685b742
      Kirill Smelkov authored
      While at it, add a draft overview of the data model & friends to the
      package documentation.
    • go/neo/t/nxd/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally · f67c147d
      Kirill Smelkov authored
      We send output from the tested process to master. We also print it to
      stdout,stderr so it appears in testnode logs.

      However, until now the whole output was first read and only then
      emitted to the log as a whole, which made it impossible to follow the
      current test progress by watching the tail of the testnode log.

      Fix it by implementing the teeing process manually.
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/aa370ca3        fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
      	lab.nexedi.com/kirr/neo/commit/096550b1        fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
      	lab.nexedi.com/kirr/neo/commit/63956f43        fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
      	lab.nexedi.com/kirr/neo/commit/b9819d0e        X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
    • go/neo/t/benchplot: New program to visualise neotest benchmarks (draft) · 0fb3d795
      Kirill Smelkov authored
      Add the program that reads results from either bench-local or bench-cluster
      neotest output and visualizes them. It uses the benchlib.py module to read
      data in Go benchmark format(*), processes it and plots scalability and other
      graphs via matplotlib.

      There are lots of hacks and rough edges, and in particular callout coordinate
      calculation is completely wrong. However, even in this state benchplot was used
      to prepare the graphs in http://navytux.spb.ru/~kirr/neo.html and
      http://navytux.spb.ru/~kirr/misc/neo·P4.html .
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/078c9ac3        X move benchlib to -> https://lab.nexedi.com/kirr/pygolang
      	lab.nexedi.com/kirr/neo/commit/0edd5129        X benchplot: Teach it to understand benchmark names for partitioned NEO clusters
      	lab.nexedi.com/kirr/neo/commit/a1dde3c9        X deco-rio timings
      	lab.nexedi.com/kirr/neo/commit/916782b6        X normalize/convert units, so that disk and ping/tcp latencies could be plotted too
      	lab.nexedi.com/kirr/neo/commit/f5fec740        X switch node info to labels; start adding that to plot
      	lab.nexedi.com/kirr/neo/commit/906462a3        X neotest: Move cluster / node out fro benchmark name to label in environment
      	lab.nexedi.com/kirr/neo/commit/cceca65f        X benchplot: Start of automated plotting for neotest benchmark data
      	lab.nexedi.com/kirr/neo/commit/a9b10a45        X benchlib/benchstat: Emit label:value info for several labels on one line, similary to go version
      	lab.nexedi.com/kirr/neo/commit/502d9477        X benchlib: Python module to read & work with data in Go benchmark format
      
      (*) benchlib.py is now part of pygolang: https://pypi.org/project/pygolang .
    • go/neo/t/neotest: Add bench-local and bench-cluster commands · 92a30ef1
      Kirill Smelkov authored
      These commands do full benchmarking for the localhost and networked cases:

      - show system info
      - do server & client cpu benchmarks
      - do server disk benchmarks
      - for the networked case: do network benchmarks
      - finally, run either zbench-local or zbench-cluster

      A full `neotest bench-local` run was used to prepare the benchmarks
      for http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html
    • go/internal/xzlib: Switch to github.com/DataDog/czlib to zlib Decompression · 7385209f
      Kirill Smelkov authored
      	name                 old time/op    new time/op    delta
      	unzlib/py/wczdata      20.8µs ± 2%    20.7µs ± 1%     ~     (p=0.421 n=5+5)
      	unzlib/go/wczdata      64.4µs ± 1%    21.3µs ± 0%  -66.89%  (p=0.008 n=5+5)
      	unzlib/py/prod1-avg    4.00µs ± 1%    4.02µs ± 1%     ~     (p=0.421 n=5+5)
      	unzlib/go/prod1-avg    10.4µs ± 1%     4.3µs ± 1%  -58.72%  (p=0.008 n=5+5)

      There is also an unsafe interface with czlib.UnsafeDecompress & friends,
      which I have not tried, because even the safe interface already brings a
      ~3x speedup.
    • go/internal/xzlib: Try to reuse zlib decoders · fc44cbd7
      Kirill Smelkov authored
      	name                 old time/op    new time/op    delta
      	unzlib/py/wczdata      20.7µs ± 2%    20.8µs ± 2%     ~     (p=0.548 n=5+5)
      	unzlib/go/wczdata      70.6µs ± 0%    64.4µs ± 1%   -8.85%  (p=0.008 n=5+5)
      	unzlib/py/prod1-avg    4.02µs ± 1%    4.00µs ± 1%     ~     (p=0.167 n=5+5)
      	unzlib/go/prod1-avg    15.2µs ± 0%    10.4µs ± 1%  -31.59%  (p=0.008 n=5+5)

      Still, on wczdata and prod1 Go is much slower than py/C zlib.
    • go/neo/t/neotest: bench-cpu += unzlib for wczblk1 and prod1 objects · 91a8afa8
      Kirill Smelkov authored
      NEO uses zlib compression for data, so the client has to spend
      time decompressing it. Benchmark how much time zlib decompression takes.
      With the stdlib zlib decompressor out of the box it looks like:
      
      	name                 time/op
      	unzlib/py/wczdata    20.7µs ± 2%
      	unzlib/go/wczdata    70.6µs ± 0%
      	unzlib/py/prod1-avg  4.02µs ± 1%
      	unzlib/go/prod1-avg  15.2µs ± 0%
      
      i.e. clearly not in favour of Go.
      
      We'll be fixing that in the following patches.
    • go/neo/t/neotest: Switch to zwrk to simulate parallel load from multiple clients · 646a94b5
      Kirill Smelkov authored
      zwrk is to ZODB what wrk is to HTTP.

      Rationale: simulating multiple clients with many client OS processes is:

      1. noisy - the timings sometimes change by up to 50% from run to run
      2. with significant additional overhead - there are constant OS-level
         process switches between client processes, and this prevents
         actually creating the load.
      3. the load from "2" actually takes resources from the server in
         the localhost case.

      So let's switch to simulating many requests in a lightweight way, similarly
      to how it is done in wrk: in one process with few threads (it can be
      just 1), with many connections opened to the server, loading it
      epoll-style, and with Go providing the epoll-goroutine matching.
      
      Example summarized zbench-local output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.txt
      	name                             time/object
      	cluster:rio dataset:wczblk1-8
      	fs1-zhash.py                             23.7µs ± 5%
      	fs1-zhash.go                             5.68µs ± 8%
      	fs1-zhash.go+prefetch128                 6.44µs ±16%
      	zeo/py/fs1-zhash.py                       376µs ± 4%
      	zeo/py/fs1-zhash.go                       130µs ± 3%
      	zeo/py/fs1-zhash.go+prefetch128          72.3µs ± 4%
      	neo/py(!log)/sqlite·P1-zhash.py           565µs ± 4%
      	neo/py(!log)/sql·P1-zhash.py              491µs ± 8%
      	cluster:rio dataset:prod1-1024
      	fs1-zhash.py                             19.5µs ± 2%
      	fs1-zhash.go                             3.92µs ±12%
      	fs1-zhash.go+prefetch128                 4.42µs ± 6%
      	zeo/py/fs1-zhash.py                       365µs ± 9%
      	zeo/py/fs1-zhash.go                       120µs ± 1%
      	zeo/py/fs1-zhash.go+prefetch128          68.4µs ± 3%
      	neo/py(!log)/sqlite·P1-zhash.py           560µs ± 5%
      	neo/py(!log)/sql·P1-zhash.py              482µs ± 8%
      
      	name                             req/s
      	cluster:rio dataset:wczblk1-8
      	fs1-zwrk.go·1                              380k ± 2%
      	fs1-zwrk.go·2                              666k ± 3%
      	fs1-zwrk.go·3                              948k ± 1%
      	fs1-zwrk.go·4                             1.24M ± 1%
      	fs1-zwrk.go·8                             1.62M ± 0%
      	fs1-zwrk.go·12                            1.70M ± 0%
      	fs1-zwrk.go·16                            1.71M ± 0%
      	zeo/py/fs1-zwrk.go·1                      8.29k ± 1%
      	zeo/py/fs1-zwrk.go·2                      10.4k ± 2%
      	zeo/py/fs1-zwrk.go·3                      11.2k ± 1%
      	zeo/py/fs1-zwrk.go·4                      11.7k ± 1%
      	zeo/py/fs1-zwrk.go·8                      12.1k ± 2%
      	zeo/py/fs1-zwrk.go·12                     12.3k ± 1%
      	zeo/py/fs1-zwrk.go·16                     12.3k ± 2%
      	cluster:rio dataset:prod1-1024
      	fs1-zwrk.go·1                              594k ± 7%
      	fs1-zwrk.go·2                             1.14M ± 4%
      	fs1-zwrk.go·3                             1.60M ± 2%
      	fs1-zwrk.go·4                             2.09M ± 1%
      	fs1-zwrk.go·8                             2.74M ± 1%
      	fs1-zwrk.go·12                            2.76M ± 0%
      	fs1-zwrk.go·16                            2.76M ± 1%
      	zeo/py/fs1-zwrk.go·1                      9.42k ± 9%
      	zeo/py/fs1-zwrk.go·2                      10.4k ± 1%
      	zeo/py/fs1-zwrk.go·3                      11.4k ± 1%
      	zeo/py/fs1-zwrk.go·4                      11.7k ± 2%
      	zeo/py/fs1-zwrk.go·8                      12.4k ± 1%
      	zeo/py/fs1-zwrk.go·12                     12.5k ± 1%
      	zeo/py/fs1-zwrk.go·16                     13.4k ±11%
      
      	name                             latency-time/object
      	cluster:rio dataset:wczblk1-8
      	fs1-zwrk.go·1                            2.63µs ± 2%
      	fs1-zwrk.go·2                            3.00µs ± 3%
      	fs1-zwrk.go·3                            3.16µs ± 1%
      	fs1-zwrk.go·4                            3.23µs ± 1%
      	fs1-zwrk.go·8                            4.94µs ± 0%
      	fs1-zwrk.go·12                           7.06µs ± 0%
      	fs1-zwrk.go·16                           9.36µs ± 0%
      	zeo/py/fs1-zwrk.go·1                      121µs ± 1%
      	zeo/py/fs1-zwrk.go·2                      192µs ± 2%
      	zeo/py/fs1-zwrk.go·3                      267µs ± 1%
      	zeo/py/fs1-zwrk.go·4                      343µs ± 1%
      	zeo/py/fs1-zwrk.go·8                      660µs ± 2%
      	zeo/py/fs1-zwrk.go·12                     977µs ± 1%
      	zeo/py/fs1-zwrk.go·16                    1.30ms ± 2%
      	cluster:rio dataset:prod1-1024
      	fs1-zwrk.go·1                            1.69µs ± 7%
      	fs1-zwrk.go·2                            1.76µs ± 4%
      	fs1-zwrk.go·3                            1.88µs ± 2%
      	fs1-zwrk.go·4                            1.91µs ± 1%
      	fs1-zwrk.go·8                            2.92µs ± 1%
      	fs1-zwrk.go·12                           4.34µs ± 0%
      	fs1-zwrk.go·16                           5.80µs ± 1%
      	zeo/py/fs1-zwrk.go·1                      107µs ± 9%
      	zeo/py/fs1-zwrk.go·2                      192µs ± 1%
      	zeo/py/fs1-zwrk.go·3                      263µs ± 1%
      	zeo/py/fs1-zwrk.go·4                      342µs ± 2%
      	zeo/py/fs1-zwrk.go·8                      648µs ± 1%
      	zeo/py/fs1-zwrk.go·12                     957µs ± 1%
      	zeo/py/fs1-zwrk.go·16                    1.20ms ±10%
      
      The scalability graphs in http://navytux.spb.ru/~kirr/neo.html were
      made by simulating client load with zwrk, not with many client OS processes.
      http://navytux.spb.ru/~kirr/neo.html#performance-tests has some
      additional notes on zwrk.
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/ca0d828b	X neotest: Tzwrk1 - place to control running time of 1 zwrk iteration
      	lab.nexedi.com/kirr/neo/commit/bbfb5006	X zwrk: Make sure we warm up connections to all NEO storages when cluster is partitioned
      	lab.nexedi.com/kirr/neo/commit/7f22bba6	X zwrk: New tool to simulate paralell load from multiple clients
    • go/zodb: Allow to open a storage in "direct" mode - without local cache · 1f92a4e2
      Kirill Smelkov authored
      zodb/go provides a generic cache (see 7233b4c0 "zodb/go: In-RAM client
      cache") primarily in order for prefetch to work. However, if we need to
      benchmark a storage by loading some objects several times, this cache
      can hide the actual time it takes for an object to load.

      For such use cases add the NoCache open option, so that opening does not
      create a cache and load operations are always conveyed directly to the
      storage driver. The option will be used by the zwrk tool (see next patch).
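      A hypothetical sketch of the option's effect: OpenOptions, Loader and
      open here are illustrative, not the zodb API. With NoCache set, every
      Load reaches the driver; otherwise repeated loads are served from the
      in-RAM cache and the driver sees only the first one.

```go
package main

import "fmt"

// Loader is a hypothetical minimal storage-driver interface.
type Loader interface {
	Load(oid uint64) ([]byte, error)
}

// OpenOptions sketches the open-time knob (hypothetical).
type OpenOptions struct {
	NoCache bool // don't create a cache; send every load to the driver
}

// cachingLoader is a toy in-RAM cache in front of a driver.
type cachingLoader struct {
	driver Loader
	cache  map[uint64][]byte
}

func (c *cachingLoader) Load(oid uint64) ([]byte, error) {
	if data, ok := c.cache[oid]; ok {
		return data, nil
	}
	data, err := c.driver.Load(oid)
	if err == nil {
		c.cache[oid] = data
	}
	return data, err
}

// open wraps the driver in a cache unless NoCache is set.
func open(driver Loader, opt OpenOptions) Loader {
	if opt.NoCache {
		return driver
	}
	return &cachingLoader{driver: driver, cache: make(map[uint64][]byte)}
}

// countingDriver records how many loads actually reach it.
type countingDriver struct{ n int }

func (d *countingDriver) Load(oid uint64) ([]byte, error) {
	d.n++
	return []byte(fmt.Sprintf("obj%d", oid)), nil
}

func main() {
	for _, noCache := range []bool{false, true} {
		drv := &countingDriver{}
		stor := open(drv, OpenOptions{NoCache: noCache})
		for i := 0; i < 3; i++ {
			stor.Load(1)
		}
		fmt.Printf("NoCache=%v: driver saw %d loads\n", noCache, drv.n)
	}
}
```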
    • go/neo/t/neotest: ZODB benchmarks · 3f578560
      Kirill Smelkov authored
      Add to neotest zbench-local and zbench-cluster commands that perform
      ZODB benchmarks on FileStorage, ZEO and NEO with Python and Go clients
      either locally, or with a server and client running on 2 different nodes.
      
      There are 2 client programs, tzodb.py and tzodb.go, which for now compute
      the hash of the whole stream of latest objects in a ZODB database. On the
      server side neotest is taught to launch ZEO and various NEO clusters and
      to execute client load on them.
      
      Two test datasets are used: wczblk1-8 - the dataset with wendelin.core ZBlk1
      objects covering an 8M array, and prod1-1024 - a synthetic dataset that tries
      to represent a regular ERP5 instance. Both datasets are very small, so we can
      assume they reside completely in the server disk cache while running benchmarks.
      Benchmark timings will thus give pure storage software processing latency, as
      a pagecache hit takes on the order of 1µs or less.
      
      Example output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest zbench-local
      	dataset:	wczblk1-8
      	node:
      	cluster:	deco
      
      	*** generating fs1 data...
      	I: RAM:  7.47GB
      	I: WORK: 0.01GB
      	gen signal t=0...1.05e+06  float64  (= 0.01GB)
      	gen signal blk [0:1048576]  (100.0%)
      	VIRT: 297 MB	RSS: 48MB
      
      	*** generating sqlite data...
      	I: RAM:  7.47GB
      	I: WORK: 0.01GB
      	gen signal t=0...1.05e+06  float64  (= 0.01GB)
      	gen signal blk [0:1048576]  (100.0%)
      	VIRT: 386 MB	RSS: 58MB
      	2018-07-10 19:57:35.7065 ERROR     NEO        [           app: 91] primary master is down
      	Cluster state changed
      
      	*** generating sql data...
      	2018-07-10 19:57:35 140115116649600 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27574 ...
      	2018-07-10 19:57:39 140205509999744 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27603 ...
      	2018-07-10 19:57:42 139692109810816 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27633 ...
      	2018-07-10 19:57:45 139759221546112 [Note] mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27662 ...
      	I: RAM:  7.47GB
      	I: WORK: 0.01GB
      	gen signal t=0...1.05e+06  float64  (= 0.01GB)
      	gen signal blk [0:1048576]  (100.0%)
      	VIRT: 387 MB	RSS: 59MB
      	2018-07-10 19:57:48.2565 ERROR     NEO        [           app: 91] primary master is down
      	Cluster state changed
      
      	*** FileStorage
      	Benchmarkfs1-zhash.py 2127 16.3 µs/object	# crc32:14640593  nread=8540363  t=0.035s	# POLL·2 C1·73 C1E·38 C3·12 C6·36 C7s·0 C8·112 C9·0 C10·62
      	Benchmarkfs1-zhash.py 2127 16.6 µs/object	# crc32:14640593  nread=8540363  t=0.035s	# POLL·0 C1·113 C1E·21 C3·16 C6·56 C7s·0 C8·136 C9·0 C10·41
      	Benchmarkfs1-zhash.py 2127 15.9 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·71 C1E·36 C3·22 C6·50 C7s·0 C8·167 C9·0 C10·47
      	Benchmarkfs1-zhash.py 2127 15.9 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·77 C1E·32 C3·11 C6·55 C7s·0 C8·184 C9·0 C10·31
      	Benchmarkfs1-zhash.py 2127 16.0 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·78 C1E·15 C3·12 C6·51 C7s·0 C8·140 C9·0 C10·44
      
      	# 16 clients in parallel
      	Benchmarkfs1-zhash.py·P16 2127 129.0 µs/object	# crc32:14640593  nread=8540363  t=0.274s
      	Benchmarkfs1-zhash.py·P16 2127 132.6 µs/object	# crc32:14640593  nread=8540363  t=0.282s
      	Benchmarkfs1-zhash.py·P16 2127 135.0 µs/object	# crc32:14640593  nread=8540363  t=0.287s
      	Benchmarkfs1-zhash.py·P16 2127 135.3 µs/object	# crc32:14640593  nread=8540363  t=0.288s
      	Benchmarkfs1-zhash.py·P16 2127 136.6 µs/object	# crc32:14640593  nread=8540363  t=0.291s
      	Benchmarkfs1-zhash.py·P16 2127 122.8 µs/object	# crc32:14640593  nread=8540363  t=0.261s
      	Benchmarkfs1-zhash.py·P16 2127 130.9 µs/object	# crc32:14640593  nread=8540363  t=0.279s
      	Benchmarkfs1-zhash.py·P16 2127 126.4 µs/object	# crc32:14640593  nread=8540363  t=0.269s
      	Benchmarkfs1-zhash.py·P16 2127 125.8 µs/object	# crc32:14640593  nread=8540363  t=0.268s
      	Benchmarkfs1-zhash.py·P16 2127 108.3 µs/object	# crc32:14640593  nread=8540363  t=0.230s
      	Benchmarkfs1-zhash.py·P16 2127 131.0 µs/object	# crc32:14640593  nread=8540363  t=0.279s
      	Benchmarkfs1-zhash.py·P16 2127 124.1 µs/object	# crc32:14640593  nread=8540363  t=0.264s
      	Benchmarkfs1-zhash.py·P16 2127 129.3 µs/object	# crc32:14640593  nread=8540363  t=0.275s
      	Benchmarkfs1-zhash.py·P16 2127 125.0 µs/object	# crc32:14640593  nread=8540363  t=0.266s
      	Benchmarkfs1-zhash.py·P16 2127 131.5 µs/object	# crc32:14640593  nread=8540363  t=0.280s
      	Benchmarkfs1-zhash.py·P16 2127 131.4 µs/object	# crc32:14640593  nread=8540363  t=0.280s
      	# POLL·0 C1·4 C1E·13 C3·11 C6·79 C7s·0 C8·14 C9·0 C10·0
      
      	...
      
      And its summary via benchstat:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.log
      	name                                 time/object
      	cluster:deco dataset:wczblk1-8
      	fs1-zhash.py                         16.1µs ± 3%
      	fs1-zhash.py·P16                      130µs ± 5%
      	fs1-zhash.go                         3.00µs ±10%
      	fs1-zhash.go+prefetch128             3.40µs ±18%
      	fs1-zhash.go·P16                     10.2µs ±71%
      	zeo/py/fs1-zhash.py                   336µs ± 3%
      	zeo/py/fs1-zhash.py·P16              3.22ms ± 6%
      	zeo/py/fs1-zhash.go                   112µs ± 2%
      	zeo/py/fs1-zhash.go+prefetch128      60.9µs ± 1%
      	zeo/py/fs1-zhash.go·P16              1.07ms ± 5%
      	neo/py(!log)/sqlite·P1-zhash.py       291µs ± 2%
      	neo/py(!log)/sqlite·P1-zhash.py·P16  2.86ms ± 1%
      	neo/py(!log)/sql·P1-zhash.py          318µs ± 4%
      	neo/py(!log)/sql·P1-zhash.py·P16     3.99ms ± 0%
      	cluster:deco dataset:prod1-1024
      	fs1-zhash.py                         12.3µs ± 1%
      	fs1-zhash.py·P16                      106µs ±10%
      	fs1-zhash.go                         2.56µs ±10%
      	fs1-zhash.go+prefetch128             2.68µs ± 8%
      	fs1-zhash.go·P16                     9.48µs ±43%
      	zeo/py/fs1-zhash.py                   319µs ± 3%
      	zeo/py/fs1-zhash.py·P16              3.13ms ± 3%
      	zeo/py/fs1-zhash.go                   101µs ± 5%
      	zeo/py/fs1-zhash.go+prefetch128      56.9µs ± 1%
      	zeo/py/fs1-zhash.go·P16              1.19ms ± 4%
      	neo/py(!log)/sqlite·P1-zhash.py       281µs ± 3%
      	neo/py(!log)/sqlite·P1-zhash.py·P16  2.80ms ± 1%
      	neo/py(!log)/sql·P1-zhash.py          316µs ± 1%
      	neo/py(!log)/sql·P1-zhash.py·P16     3.91ms ± 1%
      
      Since there is no NEO/go support yet, the corresponding neotest parts are
      merged but commented out, with an appropriate remark.
      
      For now, parallel access is simulated by spawning many OS processes.
      This will change in an upcoming follow-up patch that switches to zwrk.
      
      Results of ZODB benchmarking were discussed in
      
      	http://navytux.spb.ru/~kirr/neo.html#performance-tests		, and
      	http://navytux.spb.ru/~kirr/neo.html#results-and-discussion
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/e0d875bc	X neotest: Teach it to benchmark NEO with storage partitioned to several nodes
      	lab.nexedi.com/kirr/neo/commit/590f0a46	X neo/py uses n(replica) as n(real-replica) - 1
      	lab.nexedi.com/kirr/neo/commit/b655da26	X save time not benchmarking things we do not show
      	lab.nexedi.com/kirr/neo/commit/f834f40d	X zhash: Show N(obj) read, not 1, in place of N(iter)
      	lab.nexedi.com/kirr/neo/commit/a16e8d52	X teach golang to access ZEO
      	lab.nexedi.com/kirr/neo/commit/b9827725	X switch to using no compression, because this way it is more fair for comparing storage latencies
      	lab.nexedi.com/kirr/neo/commit/c0067335	X neotest: Don't depend on killall
      	lab.nexedi.com/kirr/neo/commit/2bcd6ebb	X neotest: add zbench-local & zbench-cluster subcomands
      	lab.nexedi.com/kirr/neo/commit/fb165ad9	X neotest: Also benchmark NEO/py with logging disabled
      	lab.nexedi.com/kirr/neo/commit/2118ba38	X neotest: Help mysqlk_install_db find its basedir under SlapOS
      	lab.nexedi.com/kirr/neo/commit/80eaa05e	X zgenprod1 tool
      	lab.nexedi.com/kirr/neo/commit/eb0e516f	X check hash result and error if mismatch (zhash.* part); neotest part pending
      	lab.nexedi.com/kirr/neo/commit/046370db	X benchify rest of bench-cluster
      	lab.nexedi.com/kirr/neo/commit/2d13818e	X bench-local + zhash: Add output in std bench format
      	lab.nexedi.com/kirr/neo/commit/1d692a3b	X add NEO/go with SHA1 disabled (both Sgo and Cgo to regular benchmarks)
    • go/neo/t/neotest: Network information & benchmarks · 26006d7e
      Kirill Smelkov authored
      Add a bench-net command to neotest that performs latency measurements at
      the ping and TCP levels. Example output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9
      	node:
      	cluster:        deco-rio
      
      	*** link latency:
      
      	# deco ⇄ rio (ping 16B)
      	PING rio (192.168.0.8) 16(44) bytes of data.
      
      	--- rio ping statistics ---
      	25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms
      	rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms
      	Benchmarkpingrtt-/16B-min 1 0.080 ms/op
      	Benchmarkpingrtt-/16B-avg 1 0.097 ms/op
      	# POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141
      
      	...
      
      	*** TCP latency:
      
      	# deco ⇄ rio (lat_tcp.c 1B  -> lat_tcp.c -s)
      	Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op       # TCP latency using rio: 116.1743 microseconds  # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444
      	Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op       # TCP latency using rio: 117.2896 microseconds  # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209
      	Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op       # TCP latency using rio: 117.5331 microseconds  # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246
      	Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op       # TCP latency using rio: 117.6509 microseconds  # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187
      	Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op       # TCP latency using rio: 116.8125 microseconds  # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475
      
      	...
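neotest reports ping and lat_tcp results as lines in the Go benchmark format (`Benchmark<name> <N> <value> <unit>`) so that benchstat can summarize them. Below is a minimal Go sketch of that conversion. It assumes the ping and lat_tcp output formats shown above. `pingBench` and `tcpBench` are hypothetical helpers, and neotest's own implementation may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// pingRTT matches the summary line of iputils ping, for example
// "rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms".
var pingRTT = regexp.MustCompile(`rtt min/avg/max/mdev = ([0-9.]+)/([0-9.]+)/[0-9.]+/[0-9.]+ ms`)

// latTCP matches the result line of lat_tcp, for example
// "TCP latency using rio: 116.1743 microseconds".
var latTCP = regexp.MustCompile(`TCP latency using \S+: ([0-9.]+) microseconds`)

// pingBench turns ping output into "-min" and "-avg" benchmark lines for the
// benchmark called name. Numbers are copied as text, so no precision is lost.
func pingBench(name, out string) []string {
	m := pingRTT.FindStringSubmatch(out)
	if m == nil {
		return nil
	}
	return []string{
		fmt.Sprintf("Benchmark%s-min 1 %s ms/op", name, m[1]),
		fmt.Sprintf("Benchmark%s-avg 1 %s ms/op", name, m[2]),
	}
}

// tcpBench turns one lat_tcp result into a benchmark line.
func tcpBench(name, out string) (string, bool) {
	m := latTCP.FindStringSubmatch(out)
	if m == nil {
		return "", false
	}
	return fmt.Sprintf("Benchmark%s 1 %s µs/op", name, m[1]), true
}

func main() {
	ping := "rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms"
	for _, line := range pingBench("pingrtt-/16B", ping) {
		fmt.Println(line)
	}
	if line, ok := tcpBench("tcprtt(c_c)-/1B", "TCP latency using rio: 116.1743 microseconds"); ok {
		fmt.Println(line)
	}
}
```

Copying the matched numbers verbatim, instead of parsing and reformatting them, keeps exactly the digits the tools printed.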
      
      And its summary via benchstat:
      
      	name                 time/op
      	pingrtt-/16B-min     80.0µs ± 0%
      	pingrtt-/16B-avg     97.0µs ± 0%
      	-pingrtt/16B-min     79.0µs ± 0%
      	-pingrtt/16B-avg      112µs ± 0%
      	pingrtt-/1452B-min    241µs ± 0%
      	pingrtt-/1452B-avg    303µs ± 0%
      	-pingrtt/1452B-min    266µs ± 0%
      	-pingrtt/1452B-avg    303µs ± 0%
      	tcprtt(c_c)-/1B       117µs ± 1%
      	tcprtt(c_go)-/1B      122µs ± 2%
      	-tcprtt(c_c)/1B       117µs ± 1%
      	-tcprtt(c_go)/1B      121µs ± 5%
      	tcprtt(c_c)-/1400B    392µs ± 4%
      	tcprtt(c_go)-/1400B   363µs ±18%
      	-tcprtt(c_c)/1400B    412µs ±21%
      	-tcprtt(c_go)/1400B   391µs ±38%
      	tcprtt(c_c)-/1500B    271µs ±18%
      	tcprtt(c_go)-/1500B   290µs ±21%
      	-tcprtt(c_c)/1500B    282µs ±16%
      	-tcprtt(c_go)/1500B   334µs ±24%
      	tcprtt(c_c)-/4096B    711µs ± 5%
      	tcprtt(c_go)-/4096B   737µs ± 5%
      	-tcprtt(c_c)/4096B    740µs ± 2%
      	-tcprtt(c_go)/4096B   711µs ± 7%
      
      The latencies here are poor because interrupt mitigation on rio was not
      tuned for this run (see below). Incidentally, analyzing ping RTT latencies
      on our shuttle machines (similar to rio) led to the following kernel patch
      
      	https://git.kernel.org/linus/509708310c (released with Linux 4.15)
      
      which fixes interrupt mitigation on Realtek NICs and makes it adjustable.
      
      While on the networking topic, teach info/info-local to also show
      information about the node's NICs. Example output lines for deco:
      
      	nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21
      	nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ...
      	nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq,  txc: 0μs/0f/0μs-irq/0f-irq
      	nic/eth0/status:   up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	nic/wlan0: Intel Corporation Wireless 8260 rev 3a
      	nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ...
      	nic/wlan0/coalesce: rxc: ?,  txc: ?
      	nic/wlan0/status:   down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor
      
      and for rio:
      
      	nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06
      	nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ...
      	nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq,  txc: 200μs/4f/0μs-irq/0f-irq
      	nic/eth0/status:   up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor
      	WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency
      	nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06
      	nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ...
      	nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq,  txc: 0μs/1f/0μs-irq/0f-irq
      	nic/eth1/status:   down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor
      
      The "RX coalesce latency is max 200μs ..." warning means that on the
      receive path eth0 will coalesce incoming frames for up to 200μs, and this
      delay will be added to the overall latency. (For small frames Realtek NICs
      do not coalesce interrupts; see the kernel patch for details.)
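The `coalesce` fields mirror the `rx-usecs`, `rx-frames`, `rx-usecs-irq` and `rx-frames-irq` parameters (and their `tx-` counterparts) reported by `ethtool -c`. The Go sketch below parses that output and emits the RX coalescing warning. The parser, the sample input and the 10μs warning threshold are illustrative assumptions. The rule neotest uses to decide when to warn is not stated here: deco's 3μs produces no warning above, while rio's 200μs does.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCoalesce collects the "key: value" lines of `ethtool -c <nic>` output.
func parseCoalesce(out string) map[string]string {
	kv := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		if k, v, ok := strings.Cut(sc.Text(), ":"); ok {
			kv[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return kv
}

// coalesceSummary formats direction dir ("rx" or "tx") in the
// usecs/frames/usecs-irq/frames-irq form shown above, or "?" if unknown.
func coalesceSummary(kv map[string]string, dir string) string {
	get := func(key string) string {
		if v, ok := kv[key]; ok && v != "n/a" {
			return v
		}
		return "?"
	}
	if get(dir+"-usecs") == "?" {
		return "?"
	}
	return fmt.Sprintf("%sμs/%sf/%sμs-irq/%sf-irq",
		get(dir+"-usecs"), get(dir+"-frames"), get(dir+"-usecs-irq"), get(dir+"-frames-irq"))
}

// rxCoalesceWarning warns when received frames may wait more than maxUsecs
// before an interrupt fires. maxUsecs is a hypothetical threshold.
func rxCoalesceWarning(nic string, kv map[string]string, maxUsecs int) string {
	us, err := strconv.Atoi(kv["rx-usecs"])
	if err != nil || us <= maxUsecs {
		return ""
	}
	return fmt.Sprintf("WARNING: nic/%s: RX coalesce latency is max %dμs"+
		" - that will add to networked request-reply latency", nic, us)
}

func main() {
	// Illustrative `ethtool -c eth0` output using the values reported for rio.
	out := `Coalesce parameters for eth0:
rx-usecs: 200
rx-frames: 4
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 200
tx-frames: 4
tx-usecs-irq: 0
tx-frames-irq: 0
`
	kv := parseCoalesce(out)
	fmt.Printf("nic/eth0/coalesce: rxc: %s,  txc: %s\n",
		coalesceSummary(kv, "rx"), coalesceSummary(kv, "tx"))
	if w := rxCoalesceWarning("eth0", kv, 10); w != "" {
		fmt.Println(w)
	}
}
```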
      
      Networked performance (raw and NEO) was not discussed in
      http://navytux.spb.ru/~kirr/neo.html at all, but for reference, the
      importance of C-states for performance was first discovered via these
      networking latency benchmarks. Links on the C-states topic:
      
      	http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states
      	http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/e8e395ae	X neotest: Move network benchmarking into separate function + add `neotest bench-net`
      	lab.nexedi.com/kirr/neo/commit/a971231c	X neotest/info: Handle USB NICs
      	lab.nexedi.com/kirr/neo/commit/5dd3d1ab	X neotest: sort NIC names
      	lab.nexedi.com/kirr/neo/commit/9888f047	X neotest: Do not crash if kernel is too old to support gro_flush_timeout
      	lab.nexedi.com/kirr/neo/commit/3a1bdf4a	X bench-remote / tcp : std benchmark output
      	lab.nexedi.com/kirr/neo/commit/9450b6db	X bench-remote / ping += std bench output
      	lab.nexedi.com/kirr/neo/commit/68d5b015	X show gro_flush_timeout + friends
      	lab.nexedi.com/kirr/neo/commit/4c815af9	X neotest: Show NIC features and emit warning if !TSO
      	lab.nexedi.com/kirr/neo/commit/659ce938	X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc...
      	lab.nexedi.com/kirr/neo/commit/ded384cb	X neotest += `lat_tcp.go -s`
      	lab.nexedi.com/kirr/neo/commit/59d46504	X neotest += lat_tcp
      	lab.nexedi.com/kirr/neo/commit/67fc3440	X show small (56B) and full-packet (1472B) ping link latencies