- 20 Jul, 2018 11 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* master: go/zodb: Loader: Document that loaded buffer must not be modified go/zodb: Split Loader & co. interfaces from IStorage go/zodb: Document that Tid(0) is invalid while Oid(0) is valid. fixup! go/zodb: Tweak documentation a bit so it renders more well in godoc
-
Kirill Smelkov authored
The reason for this is that the buffer can be shared with other loaders via cache.
-
Kirill Smelkov authored
There are situations when we want to work with, or emulate, only subset of whole IStorage functionality, with one already established example being Cache. For this reason split IStorage interface into finer-grained ones. For now: - Loader - Prefetcher - Iterator - Committer - Notifier
-
Kirill Smelkov authored
Oid(0) represents root database object. We cannot change this.
-
Kirill Smelkov authored
Missed few places in 8685b742.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 13 Jul, 2018 3 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 12 Jul, 2018 3 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* master: (29 commits) go/zodb: Tweak documentation a bit so it renders more well in godoc go/neo/t/nxd/runTestSuite: Also run go & py unit tests in addition to bench-local go/neo/t/nxd/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally go/neo/t/nxd/runTestSuite: New wrapper to run neotest as part of Nexedi testing infrastructure go/neo/t/benchplot: New program to visualise neotest benchmarks (draft) go/neo/t/neotest: Add bench-local and bench-cluster commands go/internal/xzlib: Switch to github.com/DataDog/czlib to zlib Decompression go/internal/xzlib: Try to reuse zlib decoders go/neo/t/neotest: bench-cpu += unzlib for wczblk1 and prod1 objects go/neo/t/neotest: Switch to zwrk to simulate parallel load from multiple clients go/zodb: Allow to open a storage in "direct" mode - without local cache go/neo/t/neotest: ZODB benchmarks go/neo/t/neotest: Network information & benchmarks go/neo/t/neotest: Disk information & benchmarks go/neo/t/neotest: CPU information & benchmarks go/neo/t: Neotest start go/zodb/zeo: Add support for protocols preceding ZEO5 go/zodb: Teach ZODB/go to access ZEO (draft) go/zodb: Start putting pickle-related utilities into internal/pickletools package go/neo/neonet: Lightweight mode ...
-
Kirill Smelkov authored
-
- 11 Jul, 2018 21 commits
-
-
Kirill Smelkov authored
While at it add draft overview of data model & friends to package documentation.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
We send output from tested process to master. We also print it to stdout,stderr so it appears in testnode logs. However till now it was like, whole output first read, and only then emitted to log as a whole, thus not allowing to oversee current test progress by watching testnode log tail. Fix it by implementing the teeing process manually. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/aa370ca3 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally lab.nexedi.com/kirr/neo/commit/096550b1 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally lab.nexedi.com/kirr/neo/commit/63956f43 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally lab.nexedi.com/kirr/neo/commit/b9819d0e X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
-
Kirill Smelkov authored
The wrapper runs `neotest bench-local` in neotest SlapOS instance: https://lab.nexedi.com/nexedi/slapos/blob/ab8705f4/software/neotest/instance.cfg.in Extracted from nexedi/slapos!282
-
Kirill Smelkov authored
Add the program that reads results from either bench-local or bench-cluster neotest output and visualizes it. It uses benchlib.py module to read data in Go benchmark format(*), processes them and plots scalability and other graphs via matplotlib. There are lots of hacks and rough edges, and in particular callout coordinate calculation is completely wrong. However even in this state benchplot was used to prepare the graphs in http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html . Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/078c9ac3 X move benchlib to -> https://lab.nexedi.com/kirr/pygolang lab.nexedi.com/kirr/neo/commit/0edd5129 X benchplot: Teach it to understand benchmark names for partitioned NEO clusters lab.nexedi.com/kirr/neo/commit/a1dde3c9 X deco-rio timings lab.nexedi.com/kirr/neo/commit/916782b6 X normalize/convert units, so that disk and ping/tcp latencies could be plotted too lab.nexedi.com/kirr/neo/commit/f5fec740 X switch node info to labels; start adding that to plot lab.nexedi.com/kirr/neo/commit/906462a3 X neotest: Move cluster / node out fro benchmark name to label in environment lab.nexedi.com/kirr/neo/commit/cceca65f X benchplot: Start of automated plotting for neotest benchmark data lab.nexedi.com/kirr/neo/commit/a9b10a45 X benchlib/benchstat: Emit label:value info for several labels on one line, similary to go version lab.nexedi.com/kirr/neo/commit/502d9477 X benchlib: Python module to read & work with data in Go benchmark format (*) benchlib.py is now part of pygolang: https://pypi.org/project/pygolang .
-
Kirill Smelkov authored
These commands do full benchmarking for localhost and networked cases: - show system info - do server & client cpu benchmarks - do server disk benchmarks - for networked case: do network benchmarks - tail to either zbench-local or zbench-cluster It was full `neotest bench-local` that was used to prepare benchmarks for http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html
-
Kirill Smelkov authored
name old time/op new time/op delta unzlib/py/wczdata 20.8µs ± 2% 20.7µs ± 1% ~ (p=0.421 n=5+5) unzlib/go/wczdata 64.4µs ± 1% 21.3µs ± 0% -66.89% (p=0.008 n=5+5) unzlib/py/prod1-avg 4.00µs ± 1% 4.02µs ± 1% ~ (p=0.421 n=5+5) unzlib/go/prod1-avg 10.4µs ± 1% 4.3µs ± 1% -58.72% (p=0.008 n=5+5) There is also unsafe interface with czlib.UnsafeDecompress & friends which I had not tried because even using safe interface brings ~ 3x speedup.
-
Kirill Smelkov authored
name old time/op new time/op delta unzlib/py/wczdata 20.7µs ± 2% 20.8µs ± 2% ~ (p=0.548 n=5+5) unzlib/go/wczdata 70.6µs ± 0% 64.4µs ± 1% -8.85% (p=0.008 n=5+5) unzlib/py/prod1-avg 4.02µs ± 1% 4.00µs ± 1% ~ (p=0.167 n=5+5) unzlib/go/prod1-avg 15.2µs ± 0% 10.4µs ± 1% -31.59% (p=0.008 n=5+5) still on wczdata and prod1 much slower compared to py/c zlib.
-
Kirill Smelkov authored
NEO uses zlib compression for data, and this way client has to spend time decompressing it. Benchmark how much time zlib decompression takes. With stdlib zlib decompressor out of the box it looks like: name time/op unzlib/py/wczdata 20.7µs ± 2% unzlib/go/wczdata 70.6µs ± 0% unzlib/py/prod1-avg 4.02µs ± 1% unzlib/go/prod1-avg 15.2µs ± 0% i.e. much not in favour of Go. We'll be fixing that in the following patches.
-
Kirill Smelkov authored
With zwrk for ZODB being similar to what wrk is for HTTP. Rationale: simulating multiple clients is: 1. noisy - the timings from run to run are changing sometimes up to 50% 2. with significant additional overhead - there are constant OS-level process switches in between client processes and this prevents to actually create the load. 3. the above load from "2" actually takes resources from the server in localhost case. So let's switch to simulating many requests in lightweight way similarly to how it is done in wrk - in one process and not so many threads (it can be just 1) with many connections opened to server and epolly way to load it with Go providing epoll-goroutine matching. Example summarized zbench-local output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.txt name time/object cluster:rio dataset:wczblk1-8 fs1-zhash.py 23.7µs ± 5% fs1-zhash.go 5.68µs ± 8% fs1-zhash.go+prefetch128 6.44µs ±16% zeo/py/fs1-zhash.py 376µs ± 4% zeo/py/fs1-zhash.go 130µs ± 3% zeo/py/fs1-zhash.go+prefetch128 72.3µs ± 4% neo/py(!log)/sqlite·P1-zhash.py 565µs ± 4% neo/py(!log)/sql·P1-zhash.py 491µs ± 8% cluster:rio dataset:prod1-1024 fs1-zhash.py 19.5µs ± 2% fs1-zhash.go 3.92µs ±12% fs1-zhash.go+prefetch128 4.42µs ± 6% zeo/py/fs1-zhash.py 365µs ± 9% zeo/py/fs1-zhash.go 120µs ± 1% zeo/py/fs1-zhash.go+prefetch128 68.4µs ± 3% neo/py(!log)/sqlite·P1-zhash.py 560µs ± 5% neo/py(!log)/sql·P1-zhash.py 482µs ± 8% name req/s cluster:rio dataset:wczblk1-8 fs1-zwrk.go·1 380k ± 2% fs1-zwrk.go·2 666k ± 3% fs1-zwrk.go·3 948k ± 1% fs1-zwrk.go·4 1.24M ± 1% fs1-zwrk.go·8 1.62M ± 0% fs1-zwrk.go·12 1.70M ± 0% fs1-zwrk.go·16 1.71M ± 0% zeo/py/fs1-zwrk.go·1 8.29k ± 1% zeo/py/fs1-zwrk.go·2 10.4k ± 2% zeo/py/fs1-zwrk.go·3 11.2k ± 1% zeo/py/fs1-zwrk.go·4 11.7k ± 1% zeo/py/fs1-zwrk.go·8 12.1k ± 2% zeo/py/fs1-zwrk.go·12 12.3k ± 1% zeo/py/fs1-zwrk.go·16 12.3k ± 2% cluster:rio dataset:prod1-1024 fs1-zwrk.go·1 594k ± 7% fs1-zwrk.go·2 1.14M ± 4% fs1-zwrk.go·3 1.60M ± 2% fs1-zwrk.go·4 2.09M ± 1% fs1-zwrk.go·8 2.74M ± 1% fs1-zwrk.go·12 2.76M ± 0% fs1-zwrk.go·16 2.76M ± 1% zeo/py/fs1-zwrk.go·1 9.42k ± 9% zeo/py/fs1-zwrk.go·2 10.4k ± 1% zeo/py/fs1-zwrk.go·3 11.4k ± 1% zeo/py/fs1-zwrk.go·4 11.7k ± 2% zeo/py/fs1-zwrk.go·8 12.4k ± 1% zeo/py/fs1-zwrk.go·12 12.5k ± 1% zeo/py/fs1-zwrk.go·16 13.4k ±11% name latency-time/object cluster:rio dataset:wczblk1-8 fs1-zwrk.go·1 2.63µs ± 2% fs1-zwrk.go·2 3.00µs ± 3% fs1-zwrk.go·3 3.16µs ± 1% fs1-zwrk.go·4 3.23µs ± 1% fs1-zwrk.go·8 4.94µs ± 0% fs1-zwrk.go·12 7.06µs ± 0% fs1-zwrk.go·16 9.36µs ± 0% zeo/py/fs1-zwrk.go·1 121µs ± 1% zeo/py/fs1-zwrk.go·2 192µs ± 2% zeo/py/fs1-zwrk.go·3 267µs ± 1% zeo/py/fs1-zwrk.go·4 343µs ± 1% zeo/py/fs1-zwrk.go·8 660µs ± 2% zeo/py/fs1-zwrk.go·12 977µs ± 1% zeo/py/fs1-zwrk.go·16 1.30ms ± 2% cluster:rio dataset:prod1-1024 fs1-zwrk.go·1 1.69µs ± 7% fs1-zwrk.go·2 1.76µs ± 4% fs1-zwrk.go·3 1.88µs ± 2% fs1-zwrk.go·4 1.91µs ± 1% fs1-zwrk.go·8 2.92µs ± 1% fs1-zwrk.go·12 4.34µs ± 0% fs1-zwrk.go·16 5.80µs ± 1% zeo/py/fs1-zwrk.go·1 107µs ± 9% zeo/py/fs1-zwrk.go·2 192µs ± 1% zeo/py/fs1-zwrk.go·3 263µs ± 1% zeo/py/fs1-zwrk.go·4 342µs ± 2% zeo/py/fs1-zwrk.go·8 648µs ± 1% zeo/py/fs1-zwrk.go·12 957µs ± 1% zeo/py/fs1-zwrk.go·16 1.20ms ±10% The scalability graphs in http://navytux.spb.ru/~kirr/neo.html were made with simulating client load by zwrk, not many client OS processes. http://navytux.spb.ru/~kirr/neo.html#performance-tests has some additional notes on zwrk. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/ca0d828b X neotest: Tzwrk1 - place to control running time of 1 zwrk iteration lab.nexedi.com/kirr/neo/commit/bbfb5006 X zwrk: Make sure we warm up connections to all NEO storages when cluster is partitioned lab.nexedi.com/kirr/neo/commit/7f22bba6 X zwrk: New tool to simulate paralell load from multiple clients
-
Kirill Smelkov authored
zodb/go provides generic cache (see 7233b4c0 "zodb/go: In-RAM client cache") primarily in order for prefetch to work. However if we need to benchmark a storage with loading some objects several times, this cache can hide the actual time it takes for an object to load. For such use cases add NoCache open option so that opening does not create a cache and always conveys load operations directly to storage driver. The option will be used by zwrk tool (see next patch).
-
Kirill Smelkov authored
Add to neotest zbench-local and zbench-cluster commands that perform ZODB benchmarks on FileStorage, ZEO and NEO with Python and Go clients either locally, or with a server and client running on 2 different nodes. There are 2 client programs: tzodb.py and tzodb.go which for now compute hash of whole latest objects stream in a ZODB database. On server side neotest is taught to launch ZEO and various NEO clusters and to execute client load on them. Two test datasets are used: wczblk1-8 - the dataset with wendelin.core ZBlk1 objects covering 8M array, and prod1-1024 - synthethic dataset that tries to represent regular ERP5 instance. Both datasets are very small and so we can assume they reside completely in server disk cache while running benchmarks. Benchmark timings will thus give pure storage software processing latency, as pagecache hit time is on par, or less, to 1µs. Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest zbench-local dataset: wczblk1-8 node: cluster: deco *** generating fs1 data... I: RAM: 7.47GB I: WORK: 0.01GB gen signal t=0...1.05e+06 float64 (= 0.01GB) gen signal blk [0:1048576] (100.0%) VIRT: 297 MB RSS: 48MB *** generating sqlite data... I: RAM: 7.47GB I: WORK: 0.01GB gen signal t=0...1.05e+06 float64 (= 0.01GB) gen signal blk [0:1048576] (100.0%) VIRT: 386 MB RSS: 58MB 2018-07-10 19:57:35.7065 ERROR NEO [ app: 91] primary master is down Cluster state changed *** generating sql data... 2018-07-10 19:57:35 140115116649600 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27574 ... 2018-07-10 19:57:39 140205509999744 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27603 ... 2018-07-10 19:57:42 139692109810816 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27633 ... 2018-07-10 19:57:45 139759221546112 [Note] mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27662 ... I: RAM: 7.47GB I: WORK: 0.01GB gen signal t=0...1.05e+06 float64 (= 0.01GB) gen signal blk [0:1048576] (100.0%) VIRT: 387 MB RSS: 59MB 2018-07-10 19:57:48.2565 ERROR NEO [ app: 91] primary master is down Cluster state changed *** FileStorage Benchmarkfs1-zhash.py 2127 16.3 µs/object # crc32:14640593 nread=8540363 t=0.035s # POLL·2 C1·73 C1E·38 C3·12 C6·36 C7s·0 C8·112 C9·0 C10·62 Benchmarkfs1-zhash.py 2127 16.6 µs/object # crc32:14640593 nread=8540363 t=0.035s # POLL·0 C1·113 C1E·21 C3·16 C6·56 C7s·0 C8·136 C9·0 C10·41 Benchmarkfs1-zhash.py 2127 15.9 µs/object # crc32:14640593 nread=8540363 t=0.034s # POLL·0 C1·71 C1E·36 C3·22 C6·50 C7s·0 C8·167 C9·0 C10·47 Benchmarkfs1-zhash.py 2127 15.9 µs/object # crc32:14640593 nread=8540363 t=0.034s # POLL·0 C1·77 C1E·32 C3·11 C6·55 C7s·0 C8·184 C9·0 C10·31 Benchmarkfs1-zhash.py 2127 16.0 µs/object # crc32:14640593 nread=8540363 t=0.034s # POLL·0 C1·78 C1E·15 C3·12 C6·51 C7s·0 C8·140 C9·0 C10·44 # 16 clients in parallel Benchmarkfs1-zhash.py·P16 2127 129.0 µs/object # crc32:14640593 nread=8540363 t=0.274s Benchmarkfs1-zhash.py·P16 2127 132.6 µs/object # crc32:14640593 nread=8540363 t=0.282s Benchmarkfs1-zhash.py·P16 2127 135.0 µs/object # crc32:14640593 nread=8540363 t=0.287s Benchmarkfs1-zhash.py·P16 2127 135.3 µs/object # crc32:14640593 nread=8540363 t=0.288s Benchmarkfs1-zhash.py·P16 2127 136.6 µs/object # crc32:14640593 nread=8540363 t=0.291s Benchmarkfs1-zhash.py·P16 2127 122.8 µs/object # crc32:14640593 nread=8540363 t=0.261s Benchmarkfs1-zhash.py·P16 2127 130.9 µs/object # crc32:14640593 nread=8540363 t=0.279s Benchmarkfs1-zhash.py·P16 2127 126.4 µs/object # crc32:14640593 nread=8540363 t=0.269s Benchmarkfs1-zhash.py·P16 2127 125.8 µs/object # crc32:14640593 nread=8540363 t=0.268s Benchmarkfs1-zhash.py·P16 2127 108.3 µs/object # crc32:14640593 nread=8540363 t=0.230s Benchmarkfs1-zhash.py·P16 2127 131.0 µs/object # crc32:14640593 nread=8540363 t=0.279s Benchmarkfs1-zhash.py·P16 2127 124.1 µs/object # crc32:14640593 nread=8540363 t=0.264s Benchmarkfs1-zhash.py·P16 2127 129.3 µs/object # crc32:14640593 nread=8540363 t=0.275s Benchmarkfs1-zhash.py·P16 2127 125.0 µs/object # crc32:14640593 nread=8540363 t=0.266s Benchmarkfs1-zhash.py·P16 2127 131.5 µs/object # crc32:14640593 nread=8540363 t=0.280s Benchmarkfs1-zhash.py·P16 2127 131.4 µs/object # crc32:14640593 nread=8540363 t=0.280s # POLL·0 C1·4 C1E·13 C3·11 C6·79 C7s·0 C8·14 C9·0 C10·0 ... And its summary via benchstat: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.log name time/object cluster:deco dataset:wczblk1-8 fs1-zhash.py 16.1µs ± 3% fs1-zhash.py·P16 130µs ± 5% fs1-zhash.go 3.00µs ±10% fs1-zhash.go+prefetch128 3.40µs ±18% fs1-zhash.go·P16 10.2µs ±71% zeo/py/fs1-zhash.py 336µs ± 3% zeo/py/fs1-zhash.py·P16 3.22ms ± 6% zeo/py/fs1-zhash.go 112µs ± 2% zeo/py/fs1-zhash.go+prefetch128 60.9µs ± 1% zeo/py/fs1-zhash.go·P16 1.07ms ± 5% neo/py(!log)/sqlite·P1-zhash.py 291µs ± 2% neo/py(!log)/sqlite·P1-zhash.py·P16 2.86ms ± 1% neo/py(!log)/sql·P1-zhash.py 318µs ± 4% neo/py(!log)/sql·P1-zhash.py·P16 3.99ms ± 0% cluster:deco dataset:prod1-1024 fs1-zhash.py 12.3µs ± 1% fs1-zhash.py·P16 106µs ±10% fs1-zhash.go 2.56µs ±10% fs1-zhash.go+prefetch128 2.68µs ± 8% fs1-zhash.go·P16 9.48µs ±43% zeo/py/fs1-zhash.py 319µs ± 3% zeo/py/fs1-zhash.py·P16 3.13ms ± 3% zeo/py/fs1-zhash.go 101µs ± 5% zeo/py/fs1-zhash.go+prefetch128 56.9µs ± 1% zeo/py/fs1-zhash.go·P16 1.19ms ± 4% neo/py(!log)/sqlite·P1-zhash.py 281µs ± 3% neo/py(!log)/sqlite·P1-zhash.py·P16 2.80ms ± 1% neo/py(!log)/sql·P1-zhash.py 316µs ± 1% neo/py(!log)/sql·P1-zhash.py·P16 3.91ms ± 1% Since there is no NEO/go support yet, corresponding neotest parts are merged, but commented-out with appropriate remark. Parallel access is simulated with spawning many OS processes for now. This will change in the nearby followup patch to zwrk. Results of ZODB benchmarking were discussed in http://navytux.spb.ru/~kirr/neo.html#performance-tests , and http://navytux.spb.ru/~kirr/neo.html#results-and-discussion Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/e0d875bc X neotest: Teach it to benchmark NEO with storage partitioned to several nodes lab.nexedi.com/kirr/neo/commit/590f0a46 X neo/py uses n(replica) as n(real-replica) - 1 lab.nexedi.com/kirr/neo/commit/b655da26 X save time not benchmarking things we do not show lab.nexedi.com/kirr/neo/commit/f834f40d X zhash: Show N(obj) read, not 1, in place of N(iter) lab.nexedi.com/kirr/neo/commit/a16e8d52 X teach golang to access ZEO lab.nexedi.com/kirr/neo/commit/b9827725 X switch to using no compression, because this way it is more fair for comparing storage latencies lab.nexedi.com/kirr/neo/commit/c0067335 X neotest: Don't depend on killall lab.nexedi.com/kirr/neo/commit/2bcd6ebb X neotest: add zbench-local & zbench-cluster subcomands lab.nexedi.com/kirr/neo/commit/fb165ad9 X neotest: Also benchmark NEO/py with logging disabled lab.nexedi.com/kirr/neo/commit/2118ba38 X neotest: Help mysqlk_install_db find its basedir under SlapOS lab.nexedi.com/kirr/neo/commit/80eaa05e X zgenprod1 tool lab.nexedi.com/kirr/neo/commit/eb0e516f X check hash result and error if mismatch (zhash.* part); neotest part pending lab.nexedi.com/kirr/neo/commit/046370db X benchify rest of bench-cluster lab.nexedi.com/kirr/neo/commit/2d13818e X bench-local + zhash: Add output in std bench format lab.nexedi.com/kirr/neo/commit/1d692a3b X add NEO/go with SHA1 disabled (both Sgo and Cgo to regular benchmarks)
-
Kirill Smelkov authored
Add to neotest bench-net command that performs latency measurments at ping and TCP levels. Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9 node: cluster: deco-rio *** link latency: # deco ⇄ rio (ping 16B) PING rio (192.168.0.8) 16(44) bytes of data. --- rio ping statistics --- 25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms Benchmarkpingrtt-/16B-min 1 0.080 ms/op Benchmarkpingrtt-/16B-avg 1 0.097 ms/op # POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141 ... *** TCP latency: # deco ⇄ rio (lat_tcp.c 1B -> lat_tcp.c -s) Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op # TCP latency using rio: 116.1743 microseconds # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444 Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op # TCP latency using rio: 117.2896 microseconds # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209 Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op # TCP latency using rio: 117.5331 microseconds # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246 Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op # TCP latency using rio: 117.6509 microseconds # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187 Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op # TCP latency using rio: 116.8125 microseconds # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475 ... And its summary via benchstat: name time/op pingrtt-/16B-min 80.0µs ± 0% pingrtt-/16B-avg 97.0µs ± 0% -pingrtt/16B-min 79.0µs ± 0% -pingrtt/16B-avg 112µs ± 0% pingrtt-/1452B-min 241µs ± 0% pingrtt-/1452B-avg 303µs ± 0% -pingrtt/1452B-min 266µs ± 0% -pingrtt/1452B-avg 303µs ± 0% tcprtt(c_c)-/1B 117µs ± 1% tcprtt(c_go)-/1B 122µs ± 2% -tcprtt(c_c)/1B 117µs ± 1% -tcprtt(c_go)/1B 121µs ± 5% tcprtt(c_c)-/1400B 392µs ± 4% tcprtt(c_go)-/1400B 363µs ±18% -tcprtt(c_c)/1400B 412µs ±21% -tcprtt(c_go)/1400B 391µs ±38% tcprtt(c_c)-/1500B 271µs ±18% tcprtt(c_go)-/1500B 290µs ±21% -tcprtt(c_c)/1500B 282µs ±16% -tcprtt(c_go)/1500B 334µs ±24% tcprtt(c_c)-/4096B 711µs ± 5% tcprtt(c_go)-/4096B 737µs ± 5% -tcprtt(c_c)/4096B 740µs ± 2% -tcprtt(c_go)/4096B 711µs ± 7% Latencies here are not good because for this run on rio interrupt mitigation was not tuned (see below). By the way, analyzing ping RTT latencies on our shuttle machines (similar to rio) resulted in the following kernel patch https://git.kernel.org/linus/509708310c (released with Linux 4.15) to fix/being able to adjust interrupt mitigation on Realtek NICs. While at networking topic, teach info/info-local to show related information about node's NICs. Example lines output for deco: nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21 nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ... nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq, txc: 0μs/0f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs nic/wlan0: Intel Corporation Wireless 8260 rev 3a nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ... nic/wlan0/coalesce: rxc: ?, txc: ? nic/wlan0/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor for rio: nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq, txc: 200μs/4f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq, txc: 0μs/1f/0μs-irq/0f-irq nic/eth1/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor The warning about "RX coalesce latency is max 200μs ..." says that on receive path eth0 will be coalescing incoming frames for up to 200μs and this way this delay will be added to overal latency. (for small frames Realtek NICs do not coalesce interrupts - see details in the kernel patch). Networked performance (raw and NEO) was not discussed in http://navytux.spb.ru/~kirr/neo.html at all, but for the reference the importance of C-states for performance was first found via this networking latency benchmarks. Links on C-states topic: http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/e8e395ae X neotest: Move network benchmarking into separate function + add `neotest bench-net` lab.nexedi.com/kirr/neo/commit/a971231c X neotest/info: Handle USB NICs lab.nexedi.com/kirr/neo/commit/5dd3d1ab X neotest: sort NIC names lab.nexedi.com/kirr/neo/commit/9888f047 X neotest: Do not crash if kernel is too old to support gro_flush_timeout lab.nexedi.com/kirr/neo/commit/3a1bdf4a X bench-remote / tcp : std benchmark output lab.nexedi.com/kirr/neo/commit/9450b6db X bench-remote / ping += std bench output lab.nexedi.com/kirr/neo/commit/68d5b015 X show gro_flush_timeout + friends lab.nexedi.com/kirr/neo/commit/4c815af9 X neotest: Show NIC features and emit warning if !TSO lab.nexedi.com/kirr/neo/commit/659ce938 X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc... lab.nexedi.com/kirr/neo/commit/ded384cb X neotest += `lat_tcp.go -s` lab.nexedi.com/kirr/neo/commit/59d46504 X neotest += lat_tcp lab.nexedi.com/kirr/neo/commit/67fc3440 X show small (56B) and full-packet (1472B) ping link latencies
-
Kirill Smelkov authored
Add to neotest bench-disk command that performs random-read disk benchmarks via ioping. Example output: (venv) (8) neotest@rio:~/8/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-disk node: rio cluster: *** disk: random direct (no kernel cache) 4K-read latency --- . (ext4 /dev/sda1) ioping statistics --- 29.1 k requests completed in 2.95 s, 113.6 MiB read, 9.85 k iops, 38.5 MiB/s generated 29.1 k requests in 3.00 s, 113.6 MiB, 9.69 k iops, 37.9 MiB/s min/avg/max/mdev = 43.2 us / 101.5 us / 250.0 us / 7.48 us Benchmarkdisk/randread/direct/4K-min 1 43.2 us/op Benchmarkdisk/randread/direct/4K-avg 1 101.5 us/op < 59.2 us 458 | < 63.0 us 0 | < 66.7 us 0 | < 70.5 us 0 | < 74.2 us 1 | < 78.0 us 1 | < 81.7 us 0 | < 85.5 us 1 | < 89.2 us 0 | < 93.0 us 0 | < 96.7 us 0 | < 100.5 us 333 | < 104.2 us 27793 | *********************************************** < 108.0 us 259 | < 111.7 us 21 | < 115.5 us 8 | < 119.2 us 18 | < 123.0 us 1 | < 126.7 us 0 | < 130.5 us 7 | < 134.2 us 59 | < +∞ 18 | # POLL·186 C1·291360 C1E·290802 C3·31 C6·1218 ... *** disk: random cached 4K-read latency --- . (ext4 /dev/sda1) ioping statistics --- 3.15 M requests completed in 2.82 s, 12.0 GiB read, 1.12 M iops, 4.26 GiB/s generated 3.15 M requests in 3.00 s, 12.0 GiB, 1.05 M iops, 4.00 GiB/s min/avg/max/mdev = 465 ns / 896 ns / 37.4 us / 183 ns Benchmarkdisk/randread/pagecache/4K-min 1 465 ns/op Benchmarkdisk/randread/pagecache/4K-avg 1 896 ns/op < 839 ns 771375 | ************ < 872 ns 609361 | ********* < 905 ns 660635 | ********** < 938 ns 505305 | ******** < 971 ns 189182 | *** < 1.00 us 93655 | * < 1.04 us 70811 | * < 1.07 us 57650 | < 1.10 us 51587 | < 1.14 us 44648 | < 1.17 us 40868 | < 1.20 us 27301 | < 1.24 us 12503 | < 1.27 us 5580 | < 1.30 us 2517 | < 1.34 us 1404 | < 1.37 us 698 | < 1.40 us 378 | < 1.44 us 208 | < 1.47 us 119 | < 1.50 us 50 | < +∞ 978 | # POLL·1 C1·57 C1E·11 C3·2 C6·257 The benchmarks are so done so that output conforms to Go benchmarking format. This way it is possible to process the output with benchstat to summarize / compare results. For above run summarization gives: name time/op disk/randread/direct/4K-min 39.0µs ±12% disk/randread/direct/4K-avg 117µs ± 3% disk/randread/pagecache/4K-min 461ns ± 3% disk/randread/pagecache/4K-avg 880ns ± 0% While at disk topic, teach info/info-local to show related information about node's disk. Example line output for rio: disk/sda: Samsung SSD 840 rev BB0Q 931.5G Please see http://navytux.spb.ru/~kirr/neo.html#the-need-for-faster-storage http://navytux.spb.ru/~kirr/neo.html#appendix-i-ssd-latency for some discussion about SSD performance. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/d35a2fdf X neotest/info-local: Fix disk display in presence of bind-mounts lab.nexedi.com/kirr/neo/commit/44529dbf X neotest/bench-disk: Deduplicate code; change 1M -> 2M lab.nexedi.com/kirr/neo/commit/8bac3dba X neotest/bench-disk: Also benchmark randomly reading 1M blocks lab.nexedi.com/kirr/neo/commit/e795c6ed X neotest: Fix disk display in case of DM lab.nexedi.com/kirr/neo/commit/352cd100 X neotest: Fix disk display in case of MD lab.nexedi.com/kirr/neo/commit/cd2cd093 X bench_disk: Add std bench format lab.nexedi.com/kirr/neo/commit/9f86eb40 X bench += ioping
-
Kirill Smelkov authored
Add to neotest bench-cpu command that performs basic CPU benchmarks: pystone and CRC32/SHA1 for now. While every benchmark is run additionally C-states profile is collected(*). Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu node: deco cluster: Benchmarkpystone 1 283297 pystone/s # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6 Benchmarkpystone 1 289788 pystone/s # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7 Benchmarkpystone 1 286329 pystone/s # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6 Benchmarkpystone 1 292087 pystone/s # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3 Benchmarkpystone 1 290119 pystone/s # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5 Benchmarkcrc32/py/4K 300000 3.415 µs/op # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71 Benchmarkcrc32/py/4K 300000 3.402 µs/op # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77 Benchmarkcrc32/py/4K 300000 3.396 µs/op # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36 Benchmarkcrc32/py/4K 300000 3.435 µs/op # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79 Benchmarkcrc32/py/4K 300000 3.434 µs/op # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55 Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295 Benchmarkcrc32/go/4K 10000000 0.216 µs/op # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330 Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301 Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309 Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328 Benchmarksha1/py/4K 300000 4.553 µs/op # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94 Benchmarksha1/py/4K 300000 4.459 µs/op # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92 Benchmarksha1/py/4K 300000 4.492 µs/op # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62 Benchmarksha1/py/4K 300000 4.550 µs/op # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93 Benchmarksha1/py/4K 300000 4.518 µs/op # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78 Benchmarksha1/go/4K 300000 4.312 µs/op # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190 Benchmarksha1/go/4K 300000 4.383 µs/op # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182 Benchmarksha1/go/4K 300000 4.387 µs/op # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186 Benchmarksha1/go/4K 300000 4.328 µs/op # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179 Benchmarksha1/go/4K 300000 4.337 µs/op # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191 Such raw output can be summarized with the help of benchstat - either with Go[1] or Python[2] implementations: $ benchstat x.txt name pystone/s pystone 288k ± 2% name time/op crc32/py/4K 3.42µs ± 1% crc32/go/4K 218ns ± 1% sha1/py/4K 4.51µs ± 1% sha1/go/4K 4.35µs ± 1% See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some discussion on SHA1 vs CRC32. While at CPU topic, teach info/info-local to show related information about node's CPU: available processors, frequency and idle governors. Example of lines added: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6 ... cpu: Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz] cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs WARNING: cpu: frequency not fixed - benchmark timings won't be stable WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency) See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to understand why there are warnings in above example. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/cf1f7c24 X tcpu: Don't depend on running tests with cwd = .../go/neo/t/ lab.nexedi.com/kirr/neo/commit/1e438610 fixup! X neotest: Also show target-latency for C-states lab.nexedi.com/kirr/neo/commit/4af48245 X neotest: Also show target-latency for C-states lab.nexedi.com/kirr/neo/commit/2910cf56 X neotest: Prefer first part of FQDN for hostname lab.nexedi.com/kirr/neo/commit/c86ba1b0 X bench-cpu += crc32, adler32 lab.nexedi.com/kirr/neo/commit/4ac3a550 X neotest: Don't use bc lab.nexedi.com/kirr/neo/commit/3918a997 X neotest: Don't assume we are invoked from the directory where neotest is lab.nexedi.com/kirr/neo/commit/9a266d11 X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B lab.nexedi.com/kirr/neo/commit/b6a830d8 X switch cpu benchmarks to go format lab.nexedi.com/kirr/neo/commit/4436b983 X neotest: Provide cpustat command so it is possible to cpustat something external lab.nexedi.com/kirr/neo/commit/b062b349 X microbenchmark CPU first lab.nexedi.com/kirr/neo/commit/a4a18b55 X first cut on C-state profiling lab.nexedi.com/kirr/neo/commit/ea1e0835 X found that cpuidle can be affecting latency a lot! (*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for why this is important. Since being able to profile C-states can be generally useful, we expose such profiling with externally-visible `neotest cpustat` utility. [1] https://godoc.org/golang.org/x/perf/cmd/benchstat [2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py
-
Kirill Smelkov authored
For testing and benchmarking how NEO/py & NEO/go interact with each other we need corresponding test driver. Neotest will be that driver and present patch begins it: - neotest can deploy NEO/{go,py} checkout either locally or on remote node; - it can run NEO/{go,py} unit tests either locally or on remote node; - it can also show information about a system - either local or remote. Examples in action. Banner: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest Neotest is a tool to test and benchmark NEO. Usage: neotest command [arguments] The commands are: test run all tests on a remote host test-local run all tests locally test-go run NEO/go unit tests (part of test-local) test-py run NEO/py unit tests (part of test-local) deploy deploy NEO & needed software for tests to remote host deploy-local deploy NEO & needed software for tests locally info print information about a node info-local print information about local deployment Deploy to another node: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest deploy neotest@rio.kirr.nexedi.com:3 *** deploying to neotest@rio.kirr.nexedi.com:3 ... ... # deployed ok Print information about that node: x/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:3 date: Sun, 08 Jul 2018 21:30:43 +0300 xnode: neotest@rio.kirr.nexedi.com (2401:5180:0:2a::1 192.168.0.8) uname: Linux rio 4.16.0-2-amd64 #1 SMP Debian 4.16.16-2 (2018-06-22) x86_64 GNU/Linux sw/python: Python 2.7.15 sw/go: go version go1.10.3 linux/amd64 sw/sqlite: sqlite 3.24.0 (py mod 2.6.0) sw/mysqld: mysqld Ver 10.1.29-MariaDB-6+b1 for debian-linux-gnu on x86_64 (Debian buildd-unstable) sw/neo: v1.9-42-g972ff5f9 sw/zodb: 5.4.0 sw/zeo: 5.2.0 sw/mysqlclient: 1.3.13 Run NEO/{py,go} unit tests there: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest test neotest@rio.kirr.nexedi.com:4 ? lab.nexedi.com/kirr/neo/go/internal/packed [no test files] ok lab.nexedi.com/kirr/neo/go/neo/neonet (cached) ok lab.nexedi.com/kirr/neo/go/neo/proto (cached) ok lab.nexedi.com/kirr/neo/go/zodb (cached) ? lab.nexedi.com/kirr/neo/go/zodb/cmd/zodb [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/internal/pickletools [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/storage [no test files] ok lab.nexedi.com/kirr/neo/go/zodb/storage/fs1 (cached) ? lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/cmd/fs1 [no test files] ok lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fs1tools (cached) ? lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fsb [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/storage/zeo [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/wks [no test files] ok lab.nexedi.com/kirr/neo/go/zodb/zodbtools (cached) ...........................................................................E.EE....c....^C Neotest will be the driver that was used to prepare benchmarks in http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html . Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/d46afb3e X neotest: Teach it to also run go & py unit tests; hook it into nxd/runTestSuite lab.nexedi.com/kirr/neo/commit/f694d643 X neotest: Don't fail silently if network address detection fails lab.nexedi.com/kirr/neo/commit/fa78290a X neotest: FQDN host name might be not configured lab.nexedi.com/kirr/neo/commit/56faccad X neotest/info-local: Don't crash if a prog could not be found lab.nexedi.com/kirr/neo/commit/f2932247 X neotest/info-local: Don't crash if an egg could not be found lab.nexedi.com/kirr/neo/commit/f06b7302 X neotest: Determine machine IP addresses via `ip ...` directly, not `getent ...` lab.nexedi.com/kirr/neo/commit/42e5fe71 X neotest info
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 10 Jul, 2018 2 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-