- 08 Oct, 2018 1 commit
-
-
Kirill Smelkov authored
2dba8607 (go/zodb/btree: New package to work with ZODB BTrees (draft)) added btree module with btree.BTree essentially being LOBTree (int64 key -> object). Since for wendelin.core we also need IOBTree (int32 key -> object), which is used in ZBlk1 https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.12-6-g318efce/bigfile/file_zodb.py#L267 https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.12-6-g318efce/bigfile/file_zodb.py#L374 let's turn btree module into template internally and generate code for both LOBTree and IOBTree. For the reference BTree/py takes similar approach with respect to templating.
-
- 02 Oct, 2018 1 commit
-
-
Kirill Smelkov authored
To know database state corresponding to the connection.
-
- 01 Oct, 2018 1 commit
-
-
Kirill Smelkov authored
The format of tid assumes ~ ns precision, and it is only formatted to µs precision by default. So don't truncate TimeStamp value when computing it from Tid, and perform the µs-rounding only on formatting. The float numbers are not always exactly as in python. For example the following program tidv = [ 0x0000000000000000, 0x0285cbac258bf266, 0x0285cbad27ae14e6, 0x037969f722a53488, 0x03b84285d71c57dd, 0x03caa84275fc1166, ] for tid in tidv: t = TimeStamp.TimeStamp(p64(tid)) print '0x%016x %s %.9f\t%.9f' % (tid, t, t.timeTime(), t.second()) prints: 0x0000000000000000 1900-01-01 00:00:00.000000 -2208988800.000000000 0.000000000 0x0285cbac258bf266 1979-01-03 21:00:08.800000 284245208.800000191 8.800000185 0x0285cbad27ae14e6 1979-01-03 21:01:09.300001 284245269.300001621 9.300001496 <-- ex here 0x037969f722a53488 2008-10-24 05:11:08.120000 1224825068.119999886 8.119999878 0x03b84285d71c57dd 2016-07-01 09:41:50.416574 1467366110.416574001 50.416573989 0x03caa84275fc1166 2018-10-01 16:34:27.652650 1538411667.652649879 27.652650112 the difference is due to floating point operation ordering, because TimeStamp.timeTime() looses precision - e.g. for marked case: In [8]: '%.10f' % (281566860.000000000 + 9.300001496) Out[8]: '281566869.3000015020' We don't try to mimic float64 behaviour to Python exactly - because it is even different for PURE_PYTHON=y or C TimeStamp implementations. However we don't limit due to that our timestamp precision to only 1µs. In other words we keep on maintaining exact compatibility with Python on printing, but timestamp values itself are now ~ ns precision.
-
- 28 Sep, 2018 1 commit
-
-
Kirill Smelkov authored
As https://github.com/kisielk/og-rek/pull/57 maybe shows []byte was pickling as string only unintentionally and that might change. We are already explicitly checking for string in corresponding index load place: https://lab.nexedi.com/kirr/neo/blob/2dba8607/go/zodb/storage/fs1/index.go#L282 so it is better we also explicitly save the bits as string. If we don't and https://github.com/kisielk/og-rek/pull/57 gets accepted, tests will fail: --- FAIL: TestIndexSaveLoad (0.00s) index_test.go:176: index load: /tmp/t-index893650059/458967662/1.fs.index: pickle @6: invalid oidPrefix: type []uint8 Traceback (most recent call last): File "./py/indexcmp", line 41, in <module> main() File "./py/indexcmp", line 29, in main d2 = fsIndex.load(path2) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 138, in load data[ensure_bytes(k)] = fsBucket().fromString(ensure_bytes(v)) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 71, in ensure_bytes return s.encode('ascii') if not isinstance(s, bytes) else s AttributeError: 'bytearray' object has no attribute 'encode' --- FAIL: TestIndexSaveToPy (0.04s) index_test.go:218: zodb/py read/compare index: exit status 1
-
- 09 Aug, 2018 14 commits
-
-
Kirill Smelkov authored
Provide minimal support for BTrees.LOBTree Get for now.
-
Kirill Smelkov authored
DB represents a handle to database at application level and contains pool of connections. DB.Open opens database connection. The connection will be automatically put back into DB pool for future reuse after corresponding transaction is complete. DB thus provides service to maintain live objects cache and reuse live objects from transaction to transaction. Note that it is possible to have several DB handles to the same database. This might be useful if application accesses distinctly different sets of objects in different transactions and knows beforehand which set it will be next time. Then, to avoid huge cache misses, it makes sense to keep DB handles opened for every possible case of application access. TODO handle invalidations.
-
Kirill Smelkov authored
For example Wendelin.core wcfs will need to keep some types of objects (e.g. BigFile index) always in RAM for efficiency. Provide corresponding interface that application could use to install such live-cache eviction decision-making tuning.
-
Kirill Smelkov authored
Connection represents an application-level view of a ZODB database. It has groups of in-RAM application-level objects associated with it. The objects are isolated from both changes in further database transactions and from changes to in-RAM objects in other connections. Connection, as objects group manager, is responsible for handling object -> object database references. For this to work it keeps {} oid -> obj dict and uses it to find already loaded object when another object persistently references particular oid. Since it related pydata handling of persistent references is correspondingly implemented in this patch. The dict must keep weak references on objects. The following text explains the rationale: if Connection keeps strong link to obj, just obj.PDeactivate will not fully release obj if there are no references to it from other objects: - deactivate will release obj state (ok) - but there will be still reference from connection `oid -> obj` map to this object, which means the object won't be garbage-collected. -> we can solve it by using "weak" pointers in the map. NOTE we cannot use regular map and arbitrarily manually "gc" entries there periodically: since for an obj we don't know whether other objects are referencing it, we can't just remove obj's oid from the map - if we do so and there are other live objects that reference obj, user code can still reach obj via those references. On the other hand, if another, not yet loaded, object also references obj and gets loaded, traversing reference from that loaded object will load second copy of obj, thus breaking 1 object in db <-> 1 live object invariant: A → B → C ↓ | D <--------- - - -> D2 (wrong) - A activate - D activate - B activate - D gc, A still keeps link on D - C activate -> it needs to get to D, but D was removed from objtab -> new D2 is wrongly created that's why we have to depend on Go's GC to know whether there are still live references left or not. And that in turn means finalizers and thus weak references. some link on the subject: https://groups.google.com/forum/#!topic/golang-nuts/PYWxjT2v6ps
-
Kirill Smelkov authored
We will need weak references to handle {} oid -> obj inside zodb.Connection . In Go world they often say that weak references are not needed at all. Please see however the next patch for detailed rationale for why weak references (finalizers and cooperation from Go's GC in other words) are _required_ in that case.
-
Kirill Smelkov authored
As promised in 354e0e51 (go/zodb: Persistent - the base type to implement IPersistent objects) add support to persistency machinery to set object state from python pickles serialized by ZODB/py. Persistent references are not yet handled. As promised add some very minimal persistent tests.
-
Kirill Smelkov authored
Currently we handle many ways ZODB could serialize a Python class in PyData.ClassName. Since we'll be using this functionality in other places soon - extract it into dedicated function. Since will be also frequently using class.__module__ + "." + class.__name__ don't inline it in ClassName and instead put it into pyclassPath() right away.
-
Kirill Smelkov authored
Add the base type, that types which want to implement persistency could embed, and this way inherit persistent functionality. For example type MyObject struct { Persistent ... } type myObjectState MyObject func (o *myObjectState) DropState() { ... } func (o *myObjectState) SetState(state *mem.Buf) error { ... } Here state management methods (DropState and SetState) will be automatically used by persistency machinery on activation and deactivation. For this to work MyObject class has to be registered to ZODB func init() { t := reflect.TypeOf zodb.RegisterClass("mymodule.MyObject", t(MyObject{}), t(myObjectState)) } and new instances of MyObject has to be created via zodb.NewPersistent: obj := zodb.NewPersistent(reflect.TypeOf(MyObject{}), jar).(*MyObject) SetState corresponds to __setstate__ in Python. However in Go version it is explicitly separated from class's public API - as it is the contract between a class and persistency machinery, not between the class and its user. Notice that SetState takes raw buffer as its argument. In the following patch we'll add SetState cousing (PySetState) that will be taking unpickled objects as the state - exactly how __setstate__ operates in Python. Classes will be able to choose whether to accept state as raw bytes or as a python object. The activation/deactivation is implemented via reference counting. Tests are pending (for PySetState).
-
Kirill Smelkov authored
Add to ZODB/go IPersistent - the interface that is used to represent in-RAM application-level objects that are mirroring objects in database. The interface is modelled after Python's IPersistent https://github.com/zopefoundation/ZODB/blob/3.10.7-4-gb8d7a8567/src/persistent/interfaces.py#L22 but is not exactly equal to it. In particular we support concurrent access to an object from multiple goroutines simultaneously. Due to concurrency support there is no STICKY state, because STICKY is used in CPython version to temporarily pin object in RAM briefly and is not safe to use from multiple threads there. Correspondingly the semantic of PActivate is a bit different from _p_activate - in Go, after an object has been activated, it is guaranteed that it will remain present in RAM until it is explicitly deactivated by user. Please see details of the activation protocol in IPersistent documentation. ZODB/py uses interface (IDataManager) for a persistent-object's jar, but in Go I decided, at least for now, to go without explicit interface at that level. For this reason a concrete type - Connection - will be used, and so its stub is also introduced in the patch, since IPersistent wants to return the connection via PJar.
-
Kirill Smelkov authored
We already have the functionality, just add an overview on how to implement drivers and use the most common ones.
-
Kirill Smelkov authored
There will be many text added to pkgdoc with new sections and per-section footnotes, and this way it is better to use a dedicated section for references instead of global footnote whose context might become unclear.
-
Kirill Smelkov authored
As we are going to add another - "Application layer" to zodb package, turn previous text overviewing IStorage & friends into "Storage layer" section.
-
Kirill Smelkov authored
We are too used to have this for granted with ZODB, but this property of object databases is not generally universally available in other databases.
-
Kirill Smelkov authored
- change "types, interfaces and errors" to API in the header. - it is not only data model, but also API that is tried to be reasonable compatible with ZODB/py. - an article before "the" transaction is better.
-
- 08 Aug, 2018 2 commits
-
-
Kirill Smelkov authored
Since 0 is valid Oid that is used for root database object, zero Oid value cannot be used as invalid oid. -> Provide explicit invalid Oid constant. For symmetry do similarly to Tid, even though zero Tid value is invalid Tid.
-
Kirill Smelkov authored
This should be in 1f92a4e2 (go/zodb: Allow to open a storage in "direct" mode - without local cache).
-
- 07 Aug, 2018 1 commit
-
-
Kirill Smelkov authored
Add Go counterpart of Python transaction package[1,2]. The Go version is not complete - in particular Transaction.Commit is not yet implemented. However even in this state it is useful to have transaction around for read-only transaction cases. The synchronization logic is more well-thought - in particular there is no dance around "new transaction", because in Go, contrary to ZODB/py, ZODB connections will be always opened under already started transaction. See "Synchronization" section in package documentation for details on this topic. [1] http://transaction.readthedocs.org [2] https://github.com/zopefoundation/transaction
-
- 25 Jul, 2018 1 commit
-
-
Kirill Smelkov authored
Previously TxnComplete was typed, but TxnPacked and TxnInprogress were untyped constants. Make all constants have TxnStatus type.
-
- 20 Jul, 2018 4 commits
-
-
Kirill Smelkov authored
The reason for this is that the buffer can be shared with other loaders via cache.
-
Kirill Smelkov authored
There are situations when we want to work with, or emulate, only subset of whole IStorage functionality, with one already established example being Cache. For this reason split IStorage interface into finer-grained ones. For now: - Loader - Prefetcher - Iterator - Committer - Notifier
-
Kirill Smelkov authored
Oid(0) represents root database object. We cannot change this.
-
Kirill Smelkov authored
Missed few places in 8685b742.
-
- 11 Jul, 2018 14 commits
-
-
Kirill Smelkov authored
While at it add draft overview of data model & friends to package documentation.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
We send output from tested process to master. We also print it to stdout,stderr so it appears in testnode logs. However till now it was like, whole output first read, and only then emitted to log as a whole, thus not allowing to oversee current test progress by watching testnode log tail. Fix it by implementing the teeing process manually. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/aa370ca3 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally lab.nexedi.com/kirr/neo/commit/096550b1 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally lab.nexedi.com/kirr/neo/commit/63956f43 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally lab.nexedi.com/kirr/neo/commit/b9819d0e X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
-
Kirill Smelkov authored
The wrapper runs `neotest bench-local` in neotest SlapOS instance: https://lab.nexedi.com/nexedi/slapos/blob/ab8705f4/software/neotest/instance.cfg.in Extracted from nexedi/slapos!282
-
Kirill Smelkov authored
Add the program that reads results from either bench-local or bench-cluster neotest output and visualizes it. It uses benchlib.py module to read data in Go benchmark format(*), processes them and plots scalability and other graphs via matplotlib. There are lots of hacks and rough edges, and in particular callout coordinate calculation is completely wrong. However even in this state benchplot was used to prepare the graphs in http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html . Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/078c9ac3 X move benchlib to -> https://lab.nexedi.com/kirr/pygolang lab.nexedi.com/kirr/neo/commit/0edd5129 X benchplot: Teach it to understand benchmark names for partitioned NEO clusters lab.nexedi.com/kirr/neo/commit/a1dde3c9 X deco-rio timings lab.nexedi.com/kirr/neo/commit/916782b6 X normalize/convert units, so that disk and ping/tcp latencies could be plotted too lab.nexedi.com/kirr/neo/commit/f5fec740 X switch node info to labels; start adding that to plot lab.nexedi.com/kirr/neo/commit/906462a3 X neotest: Move cluster / node out fro benchmark name to label in environment lab.nexedi.com/kirr/neo/commit/cceca65f X benchplot: Start of automated plotting for neotest benchmark data lab.nexedi.com/kirr/neo/commit/a9b10a45 X benchlib/benchstat: Emit label:value info for several labels on one line, similary to go version lab.nexedi.com/kirr/neo/commit/502d9477 X benchlib: Python module to read & work with data in Go benchmark format (*) benchlib.py is now part of pygolang: https://pypi.org/project/pygolang .
-
Kirill Smelkov authored
These commands do full benchmarking for localhost and networked cases: - show system info - do server & client cpu benchmarks - do server disk benchmarks - for networked case: do network benchmarks - tail to either zbench-local or zbench-cluster It was full `neotest bench-local` that was used to prepare benchmarks for http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html
-
Kirill Smelkov authored
name old time/op new time/op delta unzlib/py/wczdata 20.8µs ± 2% 20.7µs ± 1% ~ (p=0.421 n=5+5) unzlib/go/wczdata 64.4µs ± 1% 21.3µs ± 0% -66.89% (p=0.008 n=5+5) unzlib/py/prod1-avg 4.00µs ± 1% 4.02µs ± 1% ~ (p=0.421 n=5+5) unzlib/go/prod1-avg 10.4µs ± 1% 4.3µs ± 1% -58.72% (p=0.008 n=5+5) There is also unsafe interface with czlib.UnsafeDecompress & friends which I had not tried because even using safe interface brings ~ 3x speedup.
-
Kirill Smelkov authored
name old time/op new time/op delta unzlib/py/wczdata 20.7µs ± 2% 20.8µs ± 2% ~ (p=0.548 n=5+5) unzlib/go/wczdata 70.6µs ± 0% 64.4µs ± 1% -8.85% (p=0.008 n=5+5) unzlib/py/prod1-avg 4.02µs ± 1% 4.00µs ± 1% ~ (p=0.167 n=5+5) unzlib/go/prod1-avg 15.2µs ± 0% 10.4µs ± 1% -31.59% (p=0.008 n=5+5) still on wczdata and prod1 much slower compared to py/c zlib.
-
Kirill Smelkov authored
NEO uses zlib compression for data, and this way client has to spend time decompressing it. Benchmark how much time zlib decompression takes. With stdlib zlib decompressor out of the box it looks like: name time/op unzlib/py/wczdata 20.7µs ± 2% unzlib/go/wczdata 70.6µs ± 0% unzlib/py/prod1-avg 4.02µs ± 1% unzlib/go/prod1-avg 15.2µs ± 0% i.e. much not in favour of Go. We'll be fixing that in the following patches.
-
Kirill Smelkov authored
With zwrk for ZODB being similar to what wrk is for HTTP. Rationale: simulating multiple clients is: 1. noisy - the timings from run to run are changing sometimes up to 50% 2. with significant additional overhead - there are constant OS-level process switches in between client processes and this prevents to actually create the load. 3. the above load from "2" actually takes resources from the server in localhost case. So let's switch to simulating many requests in lightweight way similarly to how it is done in wrk - in one process and not so many threads (it can be just 1) with many connections opened to server and epolly way to load it with Go providing epoll-goroutine matching. Example summarized zbench-local output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.txt name time/object cluster:rio dataset:wczblk1-8 fs1-zhash.py 23.7µs ± 5% fs1-zhash.go 5.68µs ± 8% fs1-zhash.go+prefetch128 6.44µs ±16% zeo/py/fs1-zhash.py 376µs ± 4% zeo/py/fs1-zhash.go 130µs ± 3% zeo/py/fs1-zhash.go+prefetch128 72.3µs ± 4% neo/py(!log)/sqlite·P1-zhash.py 565µs ± 4% neo/py(!log)/sql·P1-zhash.py 491µs ± 8% cluster:rio dataset:prod1-1024 fs1-zhash.py 19.5µs ± 2% fs1-zhash.go 3.92µs ±12% fs1-zhash.go+prefetch128 4.42µs ± 6% zeo/py/fs1-zhash.py 365µs ± 9% zeo/py/fs1-zhash.go 120µs ± 1% zeo/py/fs1-zhash.go+prefetch128 68.4µs ± 3% neo/py(!log)/sqlite·P1-zhash.py 560µs ± 5% neo/py(!log)/sql·P1-zhash.py 482µs ± 8% name req/s cluster:rio dataset:wczblk1-8 fs1-zwrk.go·1 380k ± 2% fs1-zwrk.go·2 666k ± 3% fs1-zwrk.go·3 948k ± 1% fs1-zwrk.go·4 1.24M ± 1% fs1-zwrk.go·8 1.62M ± 0% fs1-zwrk.go·12 1.70M ± 0% fs1-zwrk.go·16 1.71M ± 0% zeo/py/fs1-zwrk.go·1 8.29k ± 1% zeo/py/fs1-zwrk.go·2 10.4k ± 2% zeo/py/fs1-zwrk.go·3 11.2k ± 1% zeo/py/fs1-zwrk.go·4 11.7k ± 1% zeo/py/fs1-zwrk.go·8 12.1k ± 2% zeo/py/fs1-zwrk.go·12 12.3k ± 1% zeo/py/fs1-zwrk.go·16 12.3k ± 2% cluster:rio dataset:prod1-1024 fs1-zwrk.go·1 594k ± 7% fs1-zwrk.go·2 1.14M ± 4% fs1-zwrk.go·3 1.60M ± 2% fs1-zwrk.go·4 2.09M ± 1% fs1-zwrk.go·8 2.74M ± 1% fs1-zwrk.go·12 2.76M ± 0% fs1-zwrk.go·16 2.76M ± 1% zeo/py/fs1-zwrk.go·1 9.42k ± 9% zeo/py/fs1-zwrk.go·2 10.4k ± 1% zeo/py/fs1-zwrk.go·3 11.4k ± 1% zeo/py/fs1-zwrk.go·4 11.7k ± 2% zeo/py/fs1-zwrk.go·8 12.4k ± 1% zeo/py/fs1-zwrk.go·12 12.5k ± 1% zeo/py/fs1-zwrk.go·16 13.4k ±11% name latency-time/object cluster:rio dataset:wczblk1-8 fs1-zwrk.go·1 2.63µs ± 2% fs1-zwrk.go·2 3.00µs ± 3% fs1-zwrk.go·3 3.16µs ± 1% fs1-zwrk.go·4 3.23µs ± 1% fs1-zwrk.go·8 4.94µs ± 0% fs1-zwrk.go·12 7.06µs ± 0% fs1-zwrk.go·16 9.36µs ± 0% zeo/py/fs1-zwrk.go·1 121µs ± 1% zeo/py/fs1-zwrk.go·2 192µs ± 2% zeo/py/fs1-zwrk.go·3 267µs ± 1% zeo/py/fs1-zwrk.go·4 343µs ± 1% zeo/py/fs1-zwrk.go·8 660µs ± 2% zeo/py/fs1-zwrk.go·12 977µs ± 1% zeo/py/fs1-zwrk.go·16 1.30ms ± 2% cluster:rio dataset:prod1-1024 fs1-zwrk.go·1 1.69µs ± 7% fs1-zwrk.go·2 1.76µs ± 4% fs1-zwrk.go·3 1.88µs ± 2% fs1-zwrk.go·4 1.91µs ± 1% fs1-zwrk.go·8 2.92µs ± 1% fs1-zwrk.go·12 4.34µs ± 0% fs1-zwrk.go·16 5.80µs ± 1% zeo/py/fs1-zwrk.go·1 107µs ± 9% zeo/py/fs1-zwrk.go·2 192µs ± 1% zeo/py/fs1-zwrk.go·3 263µs ± 1% zeo/py/fs1-zwrk.go·4 342µs ± 2% zeo/py/fs1-zwrk.go·8 648µs ± 1% zeo/py/fs1-zwrk.go·12 957µs ± 1% zeo/py/fs1-zwrk.go·16 1.20ms ±10% The scalability graphs in http://navytux.spb.ru/~kirr/neo.html were made with simulating client load by zwrk, not many client OS processes. http://navytux.spb.ru/~kirr/neo.html#performance-tests has some additional notes on zwrk. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/ca0d828b X neotest: Tzwrk1 - place to control running time of 1 zwrk iteration lab.nexedi.com/kirr/neo/commit/bbfb5006 X zwrk: Make sure we warm up connections to all NEO storages when cluster is partitioned lab.nexedi.com/kirr/neo/commit/7f22bba6 X zwrk: New tool to simulate paralell load from multiple clients
-
Kirill Smelkov authored
zodb/go provides generic cache (see 7233b4c0 "zodb/go: In-RAM client cache") primarily in order for prefetch to work. However if we need to benchmark a storage with loading some objects several times, this cache can hide the actual time it takes for an object to load. For such use cases add NoCache open option so that opening does not create a cache and always conveys load operations directly to storage driver. The option will be used by zwrk tool (see next patch).
-
Kirill Smelkov authored
Add to neotest zbench-local and zbench-cluster commands that perform ZODB benchmarks on FileStorage, ZEO and NEO with Python and Go clients either locally, or with a server and client running on 2 different nodes. There are 2 client programs: tzodb.py and tzodb.go which for now compute hash of whole latest objects stream in a ZODB database. On server side neotest is taught to launch ZEO and various NEO clusters and to execute client load on them. Two test datasets are used: wczblk1-8 - the dataset with wendelin.core ZBlk1 objects covering 8M array, and prod1-1024 - synthethic dataset that tries to represent regular ERP5 instance. Both datasets are very small and so we can assume they reside completely in server disk cache while running benchmarks. Benchmark timings will thus give pure storage software processing latency, as pagecache hit time is on par, or less, to 1µs. Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest zbench-local dataset: wczblk1-8 node: cluster: deco *** generating fs1 data... I: RAM: 7.47GB I: WORK: 0.01GB gen signal t=0...1.05e+06 float64 (= 0.01GB) gen signal blk [0:1048576] (100.0%) VIRT: 297 MB RSS: 48MB *** generating sqlite data... I: RAM: 7.47GB I: WORK: 0.01GB gen signal t=0...1.05e+06 float64 (= 0.01GB) gen signal blk [0:1048576] (100.0%) VIRT: 386 MB RSS: 58MB 2018-07-10 19:57:35.7065 ERROR NEO [ app: 91] primary master is down Cluster state changed *** generating sql data... 2018-07-10 19:57:35 140115116649600 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27574 ... 2018-07-10 19:57:39 140205509999744 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27603 ... 2018-07-10 19:57:42 139692109810816 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27633 ... 2018-07-10 19:57:45 139759221546112 [Note] mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27662 ... I: RAM: 7.47GB I: WORK: 0.01GB gen signal t=0...1.05e+06 float64 (= 0.01GB) gen signal blk [0:1048576] (100.0%) VIRT: 387 MB RSS: 59MB 2018-07-10 19:57:48.2565 ERROR NEO [ app: 91] primary master is down Cluster state changed *** FileStorage Benchmarkfs1-zhash.py 2127 16.3 µs/object # crc32:14640593 nread=8540363 t=0.035s # POLL·2 C1·73 C1E·38 C3·12 C6·36 C7s·0 C8·112 C9·0 C10·62 Benchmarkfs1-zhash.py 2127 16.6 µs/object # crc32:14640593 nread=8540363 t=0.035s # POLL·0 C1·113 C1E·21 C3·16 C6·56 C7s·0 C8·136 C9·0 C10·41 Benchmarkfs1-zhash.py 2127 15.9 µs/object # crc32:14640593 nread=8540363 t=0.034s # POLL·0 C1·71 C1E·36 C3·22 C6·50 C7s·0 C8·167 C9·0 C10·47 Benchmarkfs1-zhash.py 2127 15.9 µs/object # crc32:14640593 nread=8540363 t=0.034s # POLL·0 C1·77 C1E·32 C3·11 C6·55 C7s·0 C8·184 C9·0 C10·31 Benchmarkfs1-zhash.py 2127 16.0 µs/object # crc32:14640593 nread=8540363 t=0.034s # POLL·0 C1·78 C1E·15 C3·12 C6·51 C7s·0 C8·140 C9·0 C10·44 # 16 clients in parallel Benchmarkfs1-zhash.py·P16 2127 129.0 µs/object # crc32:14640593 nread=8540363 t=0.274s Benchmarkfs1-zhash.py·P16 2127 132.6 µs/object # crc32:14640593 nread=8540363 t=0.282s Benchmarkfs1-zhash.py·P16 2127 135.0 µs/object # crc32:14640593 nread=8540363 t=0.287s Benchmarkfs1-zhash.py·P16 2127 135.3 µs/object # crc32:14640593 nread=8540363 t=0.288s Benchmarkfs1-zhash.py·P16 2127 136.6 µs/object # crc32:14640593 nread=8540363 t=0.291s Benchmarkfs1-zhash.py·P16 2127 122.8 µs/object # crc32:14640593 nread=8540363 t=0.261s Benchmarkfs1-zhash.py·P16 2127 130.9 µs/object # crc32:14640593 nread=8540363 t=0.279s Benchmarkfs1-zhash.py·P16 2127 126.4 µs/object # crc32:14640593 nread=8540363 t=0.269s Benchmarkfs1-zhash.py·P16 2127 125.8 µs/object # crc32:14640593 nread=8540363 t=0.268s Benchmarkfs1-zhash.py·P16 2127 108.3 µs/object # crc32:14640593 nread=8540363 t=0.230s Benchmarkfs1-zhash.py·P16 2127 131.0 µs/object # crc32:14640593 nread=8540363 t=0.279s Benchmarkfs1-zhash.py·P16 2127 124.1 µs/object # crc32:14640593 nread=8540363 t=0.264s Benchmarkfs1-zhash.py·P16 2127 129.3 µs/object # crc32:14640593 nread=8540363 t=0.275s Benchmarkfs1-zhash.py·P16 2127 125.0 µs/object # crc32:14640593 nread=8540363 t=0.266s Benchmarkfs1-zhash.py·P16 2127 131.5 µs/object # crc32:14640593 nread=8540363 t=0.280s Benchmarkfs1-zhash.py·P16 2127 131.4 µs/object # crc32:14640593 nread=8540363 t=0.280s # POLL·0 C1·4 C1E·13 C3·11 C6·79 C7s·0 C8·14 C9·0 C10·0 ... And its summary via benchstat: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.log name time/object cluster:deco dataset:wczblk1-8 fs1-zhash.py 16.1µs ± 3% fs1-zhash.py·P16 130µs ± 5% fs1-zhash.go 3.00µs ±10% fs1-zhash.go+prefetch128 3.40µs ±18% fs1-zhash.go·P16 10.2µs ±71% zeo/py/fs1-zhash.py 336µs ± 3% zeo/py/fs1-zhash.py·P16 3.22ms ± 6% zeo/py/fs1-zhash.go 112µs ± 2% zeo/py/fs1-zhash.go+prefetch128 60.9µs ± 1% zeo/py/fs1-zhash.go·P16 1.07ms ± 5% neo/py(!log)/sqlite·P1-zhash.py 291µs ± 2% neo/py(!log)/sqlite·P1-zhash.py·P16 2.86ms ± 1% neo/py(!log)/sql·P1-zhash.py 318µs ± 4% neo/py(!log)/sql·P1-zhash.py·P16 3.99ms ± 0% cluster:deco dataset:prod1-1024 fs1-zhash.py 12.3µs ± 1% fs1-zhash.py·P16 106µs ±10% fs1-zhash.go 2.56µs ±10% fs1-zhash.go+prefetch128 2.68µs ± 8% fs1-zhash.go·P16 9.48µs ±43% zeo/py/fs1-zhash.py 319µs ± 3% zeo/py/fs1-zhash.py·P16 3.13ms ± 3% zeo/py/fs1-zhash.go 101µs ± 5% zeo/py/fs1-zhash.go+prefetch128 56.9µs ± 1% zeo/py/fs1-zhash.go·P16 1.19ms ± 4% neo/py(!log)/sqlite·P1-zhash.py 281µs ± 3% neo/py(!log)/sqlite·P1-zhash.py·P16 2.80ms ± 1% neo/py(!log)/sql·P1-zhash.py 316µs ± 1% neo/py(!log)/sql·P1-zhash.py·P16 3.91ms ± 1% Since there is no NEO/go support yet, corresponding neotest parts are merged, but commented-out with appropriate remark. Parallel access is simulated with spawning many OS processes for now. This will change in the nearby followup patch to zwrk. Results of ZODB benchmarking were discussed in http://navytux.spb.ru/~kirr/neo.html#performance-tests , and http://navytux.spb.ru/~kirr/neo.html#results-and-discussion Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/e0d875bc X neotest: Teach it to benchmark NEO with storage partitioned to several nodes lab.nexedi.com/kirr/neo/commit/590f0a46 X neo/py uses n(replica) as n(real-replica) - 1 lab.nexedi.com/kirr/neo/commit/b655da26 X save time not benchmarking things we do not show lab.nexedi.com/kirr/neo/commit/f834f40d X zhash: Show N(obj) read, not 1, in place of N(iter) lab.nexedi.com/kirr/neo/commit/a16e8d52 X teach golang to access ZEO lab.nexedi.com/kirr/neo/commit/b9827725 X switch to using no compression, because this way it is more fair for comparing storage latencies lab.nexedi.com/kirr/neo/commit/c0067335 X neotest: Don't depend on killall lab.nexedi.com/kirr/neo/commit/2bcd6ebb X neotest: add zbench-local & zbench-cluster subcomands lab.nexedi.com/kirr/neo/commit/fb165ad9 X neotest: Also benchmark NEO/py with logging disabled lab.nexedi.com/kirr/neo/commit/2118ba38 X neotest: Help mysqlk_install_db find its basedir under SlapOS lab.nexedi.com/kirr/neo/commit/80eaa05e X zgenprod1 tool lab.nexedi.com/kirr/neo/commit/eb0e516f X check hash result and error if mismatch (zhash.* part); neotest part pending lab.nexedi.com/kirr/neo/commit/046370db X benchify rest of bench-cluster lab.nexedi.com/kirr/neo/commit/2d13818e X bench-local + zhash: Add output in std bench format lab.nexedi.com/kirr/neo/commit/1d692a3b X add NEO/go with SHA1 disabled (both Sgo and Cgo to regular benchmarks)
-
Kirill Smelkov authored
Add to neotest bench-net command that performs latency measurments at ping and TCP levels. Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9 node: cluster: deco-rio *** link latency: # deco ⇄ rio (ping 16B) PING rio (192.168.0.8) 16(44) bytes of data. --- rio ping statistics --- 25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms Benchmarkpingrtt-/16B-min 1 0.080 ms/op Benchmarkpingrtt-/16B-avg 1 0.097 ms/op # POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141 ... *** TCP latency: # deco ⇄ rio (lat_tcp.c 1B -> lat_tcp.c -s) Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op # TCP latency using rio: 116.1743 microseconds # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444 Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op # TCP latency using rio: 117.2896 microseconds # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209 Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op # TCP latency using rio: 117.5331 microseconds # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246 Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op # TCP latency using rio: 117.6509 microseconds # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187 Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op # TCP latency using rio: 116.8125 microseconds # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475 ... And its summary via benchstat: name time/op pingrtt-/16B-min 80.0µs ± 0% pingrtt-/16B-avg 97.0µs ± 0% -pingrtt/16B-min 79.0µs ± 0% -pingrtt/16B-avg 112µs ± 0% pingrtt-/1452B-min 241µs ± 0% pingrtt-/1452B-avg 303µs ± 0% -pingrtt/1452B-min 266µs ± 0% -pingrtt/1452B-avg 303µs ± 0% tcprtt(c_c)-/1B 117µs ± 1% tcprtt(c_go)-/1B 122µs ± 2% -tcprtt(c_c)/1B 117µs ± 1% -tcprtt(c_go)/1B 121µs ± 5% tcprtt(c_c)-/1400B 392µs ± 4% tcprtt(c_go)-/1400B 363µs ±18% -tcprtt(c_c)/1400B 412µs ±21% -tcprtt(c_go)/1400B 391µs ±38% tcprtt(c_c)-/1500B 271µs ±18% tcprtt(c_go)-/1500B 290µs ±21% -tcprtt(c_c)/1500B 282µs ±16% -tcprtt(c_go)/1500B 334µs ±24% tcprtt(c_c)-/4096B 711µs ± 5% tcprtt(c_go)-/4096B 737µs ± 5% -tcprtt(c_c)/4096B 740µs ± 2% -tcprtt(c_go)/4096B 711µs ± 7% Latencies here are not good because for this run on rio interrupt mitigation was not tuned (see below). By the way, analyzing ping RTT latencies on our shuttle machines (similar to rio) resulted in the following kernel patch https://git.kernel.org/linus/509708310c (released with Linux 4.15) to fix/being able to adjust interrupt mitigation on Realtek NICs. While at networking topic, teach info/info-local to show related information about node's NICs. Example lines output for deco: nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21 nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ... nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq, txc: 0μs/0f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs nic/wlan0: Intel Corporation Wireless 8260 rev 3a nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ... nic/wlan0/coalesce: rxc: ?, txc: ? nic/wlan0/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor for rio: nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq, txc: 200μs/4f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq, txc: 0μs/1f/0μs-irq/0f-irq nic/eth1/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor The warning about "RX coalesce latency is max 200μs ..." says that on receive path eth0 will be coalescing incoming frames for up to 200μs and this way this delay will be added to overal latency. (for small frames Realtek NICs do not coalesce interrupts - see details in the kernel patch). Networked performance (raw and NEO) was not discussed in http://navytux.spb.ru/~kirr/neo.html at all, but for the reference the importance of C-states for performance was first found via this networking latency benchmarks. Links on C-states topic: http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/e8e395ae X neotest: Move network benchmarking into separate function + add `neotest bench-net` lab.nexedi.com/kirr/neo/commit/a971231c X neotest/info: Handle USB NICs lab.nexedi.com/kirr/neo/commit/5dd3d1ab X neotest: sort NIC names lab.nexedi.com/kirr/neo/commit/9888f047 X neotest: Do not crash if kernel is too old to support gro_flush_timeout lab.nexedi.com/kirr/neo/commit/3a1bdf4a X bench-remote / tcp : std benchmark output lab.nexedi.com/kirr/neo/commit/9450b6db X bench-remote / ping += std bench output lab.nexedi.com/kirr/neo/commit/68d5b015 X show gro_flush_timeout + friends lab.nexedi.com/kirr/neo/commit/4c815af9 X neotest: Show NIC features and emit warning if !TSO lab.nexedi.com/kirr/neo/commit/659ce938 X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc... lab.nexedi.com/kirr/neo/commit/ded384cb X neotest += `lat_tcp.go -s` lab.nexedi.com/kirr/neo/commit/59d46504 X neotest += lat_tcp lab.nexedi.com/kirr/neo/commit/67fc3440 X show small (56B) and full-packet (1472B) ping link latencies
-
Kirill Smelkov authored
Add to neotest bench-disk command that performs random-read disk benchmarks via ioping. Example output: (venv) (8) neotest@rio:~/8/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-disk node: rio cluster: *** disk: random direct (no kernel cache) 4K-read latency --- . (ext4 /dev/sda1) ioping statistics --- 29.1 k requests completed in 2.95 s, 113.6 MiB read, 9.85 k iops, 38.5 MiB/s generated 29.1 k requests in 3.00 s, 113.6 MiB, 9.69 k iops, 37.9 MiB/s min/avg/max/mdev = 43.2 us / 101.5 us / 250.0 us / 7.48 us Benchmarkdisk/randread/direct/4K-min 1 43.2 us/op Benchmarkdisk/randread/direct/4K-avg 1 101.5 us/op < 59.2 us 458 | < 63.0 us 0 | < 66.7 us 0 | < 70.5 us 0 | < 74.2 us 1 | < 78.0 us 1 | < 81.7 us 0 | < 85.5 us 1 | < 89.2 us 0 | < 93.0 us 0 | < 96.7 us 0 | < 100.5 us 333 | < 104.2 us 27793 | *********************************************** < 108.0 us 259 | < 111.7 us 21 | < 115.5 us 8 | < 119.2 us 18 | < 123.0 us 1 | < 126.7 us 0 | < 130.5 us 7 | < 134.2 us 59 | < +∞ 18 | # POLL·186 C1·291360 C1E·290802 C3·31 C6·1218 ... *** disk: random cached 4K-read latency --- . (ext4 /dev/sda1) ioping statistics --- 3.15 M requests completed in 2.82 s, 12.0 GiB read, 1.12 M iops, 4.26 GiB/s generated 3.15 M requests in 3.00 s, 12.0 GiB, 1.05 M iops, 4.00 GiB/s min/avg/max/mdev = 465 ns / 896 ns / 37.4 us / 183 ns Benchmarkdisk/randread/pagecache/4K-min 1 465 ns/op Benchmarkdisk/randread/pagecache/4K-avg 1 896 ns/op < 839 ns 771375 | ************ < 872 ns 609361 | ********* < 905 ns 660635 | ********** < 938 ns 505305 | ******** < 971 ns 189182 | *** < 1.00 us 93655 | * < 1.04 us 70811 | * < 1.07 us 57650 | < 1.10 us 51587 | < 1.14 us 44648 | < 1.17 us 40868 | < 1.20 us 27301 | < 1.24 us 12503 | < 1.27 us 5580 | < 1.30 us 2517 | < 1.34 us 1404 | < 1.37 us 698 | < 1.40 us 378 | < 1.44 us 208 | < 1.47 us 119 | < 1.50 us 50 | < +∞ 978 | # POLL·1 C1·57 C1E·11 C3·2 C6·257 The benchmarks are so done so that output conforms to Go benchmarking format. This way it is possible to process the output with benchstat to summarize / compare results. For above run summarization gives: name time/op disk/randread/direct/4K-min 39.0µs ±12% disk/randread/direct/4K-avg 117µs ± 3% disk/randread/pagecache/4K-min 461ns ± 3% disk/randread/pagecache/4K-avg 880ns ± 0% While at disk topic, teach info/info-local to show related information about node's disk. Example line output for rio: disk/sda: Samsung SSD 840 rev BB0Q 931.5G Please see http://navytux.spb.ru/~kirr/neo.html#the-need-for-faster-storage http://navytux.spb.ru/~kirr/neo.html#appendix-i-ssd-latency for some discussion about SSD performance. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/d35a2fdf X neotest/info-local: Fix disk display in presence of bind-mounts lab.nexedi.com/kirr/neo/commit/44529dbf X neotest/bench-disk: Deduplicate code; change 1M -> 2M lab.nexedi.com/kirr/neo/commit/8bac3dba X neotest/bench-disk: Also benchmark randomly reading 1M blocks lab.nexedi.com/kirr/neo/commit/e795c6ed X neotest: Fix disk display in case of DM lab.nexedi.com/kirr/neo/commit/352cd100 X neotest: Fix disk display in case of MD lab.nexedi.com/kirr/neo/commit/cd2cd093 X bench_disk: Add std bench format lab.nexedi.com/kirr/neo/commit/9f86eb40 X bench += ioping
-