Commits · e1b05139f53b31731f86860c270cf3c0b3b6c933 · Kirill Smelkov / neo

20 Jul, 2018 4 commits

go/zodb: Loader: Document that loaded buffer must not be modified · e1b05139
Kirill Smelkov authored Jul 20, 2018
```
The reason for this is that the buffer can be shared with other loaders
via cache.
```
e1b05139

go/zodb: Split Loader & co. interfaces from IStorage · 1d8f62c2

Kirill Smelkov authored Jul 20, 2018

There are situations when we want to work with, or emulate, only subset
of whole IStorage functionality, with one already established example
being Cache. For this reason split IStorage interface into finer-grained
ones. For now:

	- Loader
	- Prefetcher
	- Iterator
	- Committer
	- Notifier

1d8f62c2

go/zodb: Document that Tid(0) is invalid while Oid(0) is valid. · 65c42681
Kirill Smelkov authored Jul 20, 2018
```
Oid(0) represents root database object. We cannot change this.
```
65c42681
fixup! go/zodb: Tweak documentation a bit so it renders more well in godoc · 97214cb9
Kirill Smelkov authored Jul 20, 2018
```
Missed few places in 8685b742.
```
97214cb9

11 Jul, 2018 16 commits

go/zodb: Tweak documentation a bit so it renders more well in godoc · 8685b742
Kirill Smelkov authored Jul 11, 2018
```
While at it add draft overview of data model & friends to package
documentation.
```
8685b742
go/neo/t/nxd/runTestSuite: Also run go & py unit tests in addition to bench-local · 0e28fbb8
Kirill Smelkov authored Feb 06, 2018

0e28fbb8

go/neo/t/nxd/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally · f67c147d

Kirill Smelkov authored Feb 05, 2018

We send output from tested process to master. We also print it to
stdout,stderr so it appears in testnode logs.

However till now it was like, whole output first read, and only then
emitted to log as a whole, thus not allowing to oversee current test
progress by watching testnode log tail.

Fix it by implementing the teeing process manually.

Some draft history related to this patch:

lab.nexedi.com/kirr/neo/commit/aa370ca3 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
lab.nexedi.com/kirr/neo/commit/096550b1 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
lab.nexedi.com/kirr/neo/commit/63956f43 fixup! X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally
lab.nexedi.com/kirr/neo/commit/b9819d0e X neotest/runTestSuite: Tee tested process stdout,stderr to testnode logs incrementally

f67c147d

go/neo/t/nxd/runTestSuite: New wrapper to run neotest as part of Nexedi testing infrastructure · 51b18490

Kirill Smelkov authored Feb 02, 2018

The wrapper runs `neotest bench-local` in neotest SlapOS instance:

	https://lab.nexedi.com/nexedi/slapos/blob/ab8705f4/software/neotest/instance.cfg.in

Extracted from nexedi/slapos!282

51b18490

go/neo/t/benchplot: New program to visualise neotest benchmarks (draft) · 0fb3d795

Kirill Smelkov authored Jul 11, 2018

Add the program that reads results from either bench-local or bench-cluster
neotest output and visualizes it. It uses benchlib.py module to read data
in Go benchmark format(*), processes them and plots scalability and other
graphs via matplotlib.

There are lots of hacks and rough edges, and in particular callout coordinate
calculation is completely wrong. However even in this state benchplot was used
to prepare the graphs in http://navytux.spb.ru/~kirr/neo.html and
http://navytux.spb.ru/~kirr/misc/neo·P4.html .

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/078c9ac3        X move benchlib to -> https://lab.nexedi.com/kirr/pygolang
	lab.nexedi.com/kirr/neo/commit/0edd5129        X benchplot: Teach it to understand benchmark names for partitioned NEO clusters
	lab.nexedi.com/kirr/neo/commit/a1dde3c9        X deco-rio timings
	lab.nexedi.com/kirr/neo/commit/916782b6        X normalize/convert units, so that disk and ping/tcp latencies could be plotted too
	lab.nexedi.com/kirr/neo/commit/f5fec740        X switch node info to labels; start adding that to plot
	lab.nexedi.com/kirr/neo/commit/906462a3        X neotest: Move cluster / node out fro benchmark name to label in environment
	lab.nexedi.com/kirr/neo/commit/cceca65f        X benchplot: Start of automated plotting for neotest benchmark data
	lab.nexedi.com/kirr/neo/commit/a9b10a45        X benchlib/benchstat: Emit label:value info for several labels on one line, similary to go version
	lab.nexedi.com/kirr/neo/commit/502d9477        X benchlib: Python module to read & work with data in Go benchmark format

(*) benchlib.py is now part of pygolang: https://pypi.org/project/pygolang .

0fb3d795

go/neo/t/neotest: Add bench-local and bench-cluster commands · 92a30ef1

Kirill Smelkov authored Jul 11, 2018

These commands do full benchmarking for localhost and networked cases:

- show system info
- do server & client cpu benchmarks
- do server disk benchmarks
- for networked case: do network benchmarks
- tail to either zbench-local or zbench-cluster

It was full `neotest bench-local` that was used to prepare benchmarks
for http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html

92a30ef1

go/internal/xzlib: Switch to github.com/DataDog/czlib to zlib Decompression · 7385209f

Kirill Smelkov authored Feb 21, 2018

name old time/op new time/op delta
unzlib/py/wczdata 20.8µs ± 2% 20.7µs ± 1% ~ (p=0.421 n=5+5)
unzlib/go/wczdata 64.4µs ± 1% 21.3µs ± 0% -66.89% (p=0.008 n=5+5)
unzlib/py/prod1-avg 4.00µs ± 1% 4.02µs ± 1% ~ (p=0.421 n=5+5)
unzlib/go/prod1-avg 10.4µs ± 1% 4.3µs ± 1% -58.72% (p=0.008 n=5+5)

There is also unsafe interface with czlib.UnsafeDecompress & friends which I
had not tried because even using safe interface brings ~ 3x speedup.

7385209f

go/internal/xzlib: Try to reuse zlib decoders · fc44cbd7

Kirill Smelkov authored Feb 21, 2018

name old time/op new time/op delta
unzlib/py/wczdata 20.7µs ± 2% 20.8µs ± 2% ~ (p=0.548 n=5+5)
unzlib/go/wczdata 70.6µs ± 0% 64.4µs ± 1% -8.85% (p=0.008 n=5+5)
unzlib/py/prod1-avg 4.02µs ± 1% 4.00µs ± 1% ~ (p=0.167 n=5+5)
unzlib/go/prod1-avg 15.2µs ± 0% 10.4µs ± 1% -31.59% (p=0.008 n=5+5)

still on wczdata and prod1 much slower compared to py/c zlib.

fc44cbd7

go/neo/t/neotest: bench-cpu += unzlib for wczblk1 and prod1 objects · 91a8afa8

Kirill Smelkov authored Jul 11, 2018

NEO uses zlib compression for data, and this way client has to spend
time decompressing it. Benchmark how much time zlib decompression takes.
With stdlib zlib decompressor out of the box it looks like:

	name                 time/op
	unzlib/py/wczdata    20.7µs ± 2%
	unzlib/go/wczdata    70.6µs ± 0%
	unzlib/py/prod1-avg  4.02µs ± 1%
	unzlib/go/prod1-avg  15.2µs ± 0%

i.e. much not in favour of Go.

We'll be fixing that in the following patches.

91a8afa8

go/neo/t/neotest: Switch to zwrk to simulate parallel load from multiple clients · 646a94b5

Kirill Smelkov authored Feb 26, 2018

With zwrk for ZODB being similar to what wrk is for HTTP.

Rationale: simulating multiple clients is:

1. noisy - the timings from run to run are changing sometimes up to 50%
2. with significant additional overhead - there are constant OS-level
   process switches in between client processes and this prevents to
   actually create the load.
3. the above load from "2" actually takes resources from the server in
   localhost case.

So let's switch to simulating many requests in lightweight way similarly
to how it is done in wrk - in one process and not so many threads (it
can be just 1) with many connections opened to server and epolly way to
load it with Go providing epoll-goroutine matching.

Example summarized zbench-local output:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.txt
	name                             time/object
	cluster:rio dataset:wczblk1-8
	fs1-zhash.py                             23.7µs ± 5%
	fs1-zhash.go                             5.68µs ± 8%
	fs1-zhash.go+prefetch128                 6.44µs ±16%
	zeo/py/fs1-zhash.py                       376µs ± 4%
	zeo/py/fs1-zhash.go                       130µs ± 3%
	zeo/py/fs1-zhash.go+prefetch128          72.3µs ± 4%
	neo/py(!log)/sqlite·P1-zhash.py           565µs ± 4%
	neo/py(!log)/sql·P1-zhash.py              491µs ± 8%
	cluster:rio dataset:prod1-1024
	fs1-zhash.py                             19.5µs ± 2%
	fs1-zhash.go                             3.92µs ±12%
	fs1-zhash.go+prefetch128                 4.42µs ± 6%
	zeo/py/fs1-zhash.py                       365µs ± 9%
	zeo/py/fs1-zhash.go                       120µs ± 1%
	zeo/py/fs1-zhash.go+prefetch128          68.4µs ± 3%
	neo/py(!log)/sqlite·P1-zhash.py           560µs ± 5%
	neo/py(!log)/sql·P1-zhash.py              482µs ± 8%

	name                             req/s
	cluster:rio dataset:wczblk1-8
	fs1-zwrk.go·1                              380k ± 2%
	fs1-zwrk.go·2                              666k ± 3%
	fs1-zwrk.go·3                              948k ± 1%
	fs1-zwrk.go·4                             1.24M ± 1%
	fs1-zwrk.go·8                             1.62M ± 0%
	fs1-zwrk.go·12                            1.70M ± 0%
	fs1-zwrk.go·16                            1.71M ± 0%
	zeo/py/fs1-zwrk.go·1                      8.29k ± 1%
	zeo/py/fs1-zwrk.go·2                      10.4k ± 2%
	zeo/py/fs1-zwrk.go·3                      11.2k ± 1%
	zeo/py/fs1-zwrk.go·4                      11.7k ± 1%
	zeo/py/fs1-zwrk.go·8                      12.1k ± 2%
	zeo/py/fs1-zwrk.go·12                     12.3k ± 1%
	zeo/py/fs1-zwrk.go·16                     12.3k ± 2%
	cluster:rio dataset:prod1-1024
	fs1-zwrk.go·1                              594k ± 7%
	fs1-zwrk.go·2                             1.14M ± 4%
	fs1-zwrk.go·3                             1.60M ± 2%
	fs1-zwrk.go·4                             2.09M ± 1%
	fs1-zwrk.go·8                             2.74M ± 1%
	fs1-zwrk.go·12                            2.76M ± 0%
	fs1-zwrk.go·16                            2.76M ± 1%
	zeo/py/fs1-zwrk.go·1                      9.42k ± 9%
	zeo/py/fs1-zwrk.go·2                      10.4k ± 1%
	zeo/py/fs1-zwrk.go·3                      11.4k ± 1%
	zeo/py/fs1-zwrk.go·4                      11.7k ± 2%
	zeo/py/fs1-zwrk.go·8                      12.4k ± 1%
	zeo/py/fs1-zwrk.go·12                     12.5k ± 1%
	zeo/py/fs1-zwrk.go·16                     13.4k ±11%

	name                             latency-time/object
	cluster:rio dataset:wczblk1-8
	fs1-zwrk.go·1                            2.63µs ± 2%
	fs1-zwrk.go·2                            3.00µs ± 3%
	fs1-zwrk.go·3                            3.16µs ± 1%
	fs1-zwrk.go·4                            3.23µs ± 1%
	fs1-zwrk.go·8                            4.94µs ± 0%
	fs1-zwrk.go·12                           7.06µs ± 0%
	fs1-zwrk.go·16                           9.36µs ± 0%
	zeo/py/fs1-zwrk.go·1                      121µs ± 1%
	zeo/py/fs1-zwrk.go·2                      192µs ± 2%
	zeo/py/fs1-zwrk.go·3                      267µs ± 1%
	zeo/py/fs1-zwrk.go·4                      343µs ± 1%
	zeo/py/fs1-zwrk.go·8                      660µs ± 2%
	zeo/py/fs1-zwrk.go·12                     977µs ± 1%
	zeo/py/fs1-zwrk.go·16                    1.30ms ± 2%
	cluster:rio dataset:prod1-1024
	fs1-zwrk.go·1                            1.69µs ± 7%
	fs1-zwrk.go·2                            1.76µs ± 4%
	fs1-zwrk.go·3                            1.88µs ± 2%
	fs1-zwrk.go·4                            1.91µs ± 1%
	fs1-zwrk.go·8                            2.92µs ± 1%
	fs1-zwrk.go·12                           4.34µs ± 0%
	fs1-zwrk.go·16                           5.80µs ± 1%
	zeo/py/fs1-zwrk.go·1                      107µs ± 9%
	zeo/py/fs1-zwrk.go·2                      192µs ± 1%
	zeo/py/fs1-zwrk.go·3                      263µs ± 1%
	zeo/py/fs1-zwrk.go·4                      342µs ± 2%
	zeo/py/fs1-zwrk.go·8                      648µs ± 1%
	zeo/py/fs1-zwrk.go·12                     957µs ± 1%
	zeo/py/fs1-zwrk.go·16                    1.20ms ±10%

The scalability graphs in http://navytux.spb.ru/~kirr/neo.html were
made with simulating client load by zwrk, not many client OS processes.
http://navytux.spb.ru/~kirr/neo.html#performance-tests has some
additional notes on zwrk.

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/ca0d828b	X neotest: Tzwrk1 - place to control running time of 1 zwrk iteration
	lab.nexedi.com/kirr/neo/commit/bbfb5006	X zwrk: Make sure we warm up connections to all NEO storages when cluster is partitioned
	lab.nexedi.com/kirr/neo/commit/7f22bba6	X zwrk: New tool to simulate paralell load from multiple clients

646a94b5

go/zodb: Allow to open a storage in "direct" mode - without local cache · 1f92a4e2

Kirill Smelkov authored Jul 10, 2018

zodb/go provides generic cache (see 7233b4c0 "zodb/go: In-RAM client
cache") primarily in order for prefetch to work. However if we need to
benchmark a storage with loading some objects several times, this cache
can hide the actual time it takes for an object to load.

For such use cases add NoCache open option so that opening does not
create a cache and always conveys load operations directly to storage
driver. The option will be used by zwrk tool (see next patch).

1f92a4e2

go/neo/t/neotest: ZODB benchmarks · 3f578560

Kirill Smelkov authored Jul 10, 2018

Add to neotest zbench-local and zbench-cluster commands that perform
ZODB benchmarks on FileStorage, ZEO and NEO with Python and Go clients
either locally, or with a server and client running on 2 different nodes.

There are 2 client programs: tzodb.py and tzodb.go which for now compute
hash of whole latest objects stream in a ZODB database. On server side
neotest is taught to launch ZEO and various NEO clusters and to execute
client load on them.

Two test datasets are used: wczblk1-8 - the dataset with wendelin.core ZBlk1
objects covering 8M array, and prod1-1024 - synthethic dataset that tries to
represent regular ERP5 instance. Both datasets are very small and so we can
assume they reside completely in server disk cache while running benchmarks.
Benchmark timings will thus give pure storage software processing latency, as
pagecache hit time is on par, or less, to 1µs.

Example output:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest zbench-local
	dataset:	wczblk1-8
	node:
	cluster:	deco

	*** generating fs1 data...
	I: RAM:  7.47GB
	I: WORK: 0.01GB
	gen signal t=0...1.05e+06  float64  (= 0.01GB)
	gen signal blk [0:1048576]  (100.0%)
	VIRT: 297 MB	RSS: 48MB

	*** generating sqlite data...
	I: RAM:  7.47GB
	I: WORK: 0.01GB
	gen signal t=0...1.05e+06  float64  (= 0.01GB)
	gen signal blk [0:1048576]  (100.0%)
	VIRT: 386 MB	RSS: 58MB
	2018-07-10 19:57:35.7065 ERROR     NEO        [           app: 91] primary master is down
	Cluster state changed

	*** generating sql data...
	2018-07-10 19:57:35 140115116649600 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27574 ...
	2018-07-10 19:57:39 140205509999744 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27603 ...
	2018-07-10 19:57:42 139692109810816 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27633 ...
	2018-07-10 19:57:45 139759221546112 [Note] mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27662 ...
	I: RAM:  7.47GB
	I: WORK: 0.01GB
	gen signal t=0...1.05e+06  float64  (= 0.01GB)
	gen signal blk [0:1048576]  (100.0%)
	VIRT: 387 MB	RSS: 59MB
	2018-07-10 19:57:48.2565 ERROR     NEO        [           app: 91] primary master is down
	Cluster state changed

	*** FileStorage
	Benchmarkfs1-zhash.py 2127 16.3 µs/object	# crc32:14640593  nread=8540363  t=0.035s	# POLL·2 C1·73 C1E·38 C3·12 C6·36 C7s·0 C8·112 C9·0 C10·62
	Benchmarkfs1-zhash.py 2127 16.6 µs/object	# crc32:14640593  nread=8540363  t=0.035s	# POLL·0 C1·113 C1E·21 C3·16 C6·56 C7s·0 C8·136 C9·0 C10·41
	Benchmarkfs1-zhash.py 2127 15.9 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·71 C1E·36 C3·22 C6·50 C7s·0 C8·167 C9·0 C10·47
	Benchmarkfs1-zhash.py 2127 15.9 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·77 C1E·32 C3·11 C6·55 C7s·0 C8·184 C9·0 C10·31
	Benchmarkfs1-zhash.py 2127 16.0 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·78 C1E·15 C3·12 C6·51 C7s·0 C8·140 C9·0 C10·44

	# 16 clients in parallel
	Benchmarkfs1-zhash.py·P16 2127 129.0 µs/object	# crc32:14640593  nread=8540363  t=0.274s
	Benchmarkfs1-zhash.py·P16 2127 132.6 µs/object	# crc32:14640593  nread=8540363  t=0.282s
	Benchmarkfs1-zhash.py·P16 2127 135.0 µs/object	# crc32:14640593  nread=8540363  t=0.287s
	Benchmarkfs1-zhash.py·P16 2127 135.3 µs/object	# crc32:14640593  nread=8540363  t=0.288s
	Benchmarkfs1-zhash.py·P16 2127 136.6 µs/object	# crc32:14640593  nread=8540363  t=0.291s
	Benchmarkfs1-zhash.py·P16 2127 122.8 µs/object	# crc32:14640593  nread=8540363  t=0.261s
	Benchmarkfs1-zhash.py·P16 2127 130.9 µs/object	# crc32:14640593  nread=8540363  t=0.279s
	Benchmarkfs1-zhash.py·P16 2127 126.4 µs/object	# crc32:14640593  nread=8540363  t=0.269s
	Benchmarkfs1-zhash.py·P16 2127 125.8 µs/object	# crc32:14640593  nread=8540363  t=0.268s
	Benchmarkfs1-zhash.py·P16 2127 108.3 µs/object	# crc32:14640593  nread=8540363  t=0.230s
	Benchmarkfs1-zhash.py·P16 2127 131.0 µs/object	# crc32:14640593  nread=8540363  t=0.279s
	Benchmarkfs1-zhash.py·P16 2127 124.1 µs/object	# crc32:14640593  nread=8540363  t=0.264s
	Benchmarkfs1-zhash.py·P16 2127 129.3 µs/object	# crc32:14640593  nread=8540363  t=0.275s
	Benchmarkfs1-zhash.py·P16 2127 125.0 µs/object	# crc32:14640593  nread=8540363  t=0.266s
	Benchmarkfs1-zhash.py·P16 2127 131.5 µs/object	# crc32:14640593  nread=8540363  t=0.280s
	Benchmarkfs1-zhash.py·P16 2127 131.4 µs/object	# crc32:14640593  nread=8540363  t=0.280s
	# POLL·0 C1·4 C1E·13 C3·11 C6·79 C7s·0 C8·14 C9·0 C10·0

	...

And its summary via benchstat:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.log
	name                                 time/object
	cluster:deco dataset:wczblk1-8
	fs1-zhash.py                         16.1µs ± 3%
	fs1-zhash.py·P16                      130µs ± 5%
	fs1-zhash.go                         3.00µs ±10%
	fs1-zhash.go+prefetch128             3.40µs ±18%
	fs1-zhash.go·P16                     10.2µs ±71%
	zeo/py/fs1-zhash.py                   336µs ± 3%
	zeo/py/fs1-zhash.py·P16              3.22ms ± 6%
	zeo/py/fs1-zhash.go                   112µs ± 2%
	zeo/py/fs1-zhash.go+prefetch128      60.9µs ± 1%
	zeo/py/fs1-zhash.go·P16              1.07ms ± 5%
	neo/py(!log)/sqlite·P1-zhash.py       291µs ± 2%
	neo/py(!log)/sqlite·P1-zhash.py·P16  2.86ms ± 1%
	neo/py(!log)/sql·P1-zhash.py          318µs ± 4%
	neo/py(!log)/sql·P1-zhash.py·P16     3.99ms ± 0%
	cluster:deco dataset:prod1-1024
	fs1-zhash.py                         12.3µs ± 1%
	fs1-zhash.py·P16                      106µs ±10%
	fs1-zhash.go                         2.56µs ±10%
	fs1-zhash.go+prefetch128             2.68µs ± 8%
	fs1-zhash.go·P16                     9.48µs ±43%
	zeo/py/fs1-zhash.py                   319µs ± 3%
	zeo/py/fs1-zhash.py·P16              3.13ms ± 3%
	zeo/py/fs1-zhash.go                   101µs ± 5%
	zeo/py/fs1-zhash.go+prefetch128      56.9µs ± 1%
	zeo/py/fs1-zhash.go·P16              1.19ms ± 4%
	neo/py(!log)/sqlite·P1-zhash.py       281µs ± 3%
	neo/py(!log)/sqlite·P1-zhash.py·P16  2.80ms ± 1%
	neo/py(!log)/sql·P1-zhash.py          316µs ± 1%
	neo/py(!log)/sql·P1-zhash.py·P16     3.91ms ± 1%

Since there is no NEO/go support yet, corresponding neotest parts are merged,
but commented-out with appropriate remark.

Parallel access is simulated with spawning many OS processes for now.
This will change in the nearby followup patch to zwrk.

Results of ZODB benchmarking were discussed in

	http://navytux.spb.ru/~kirr/neo.html#performance-tests		, and
	http://navytux.spb.ru/~kirr/neo.html#results-and-discussion

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/e0d875bc	X neotest: Teach it to benchmark NEO with storage partitioned to several nodes
	lab.nexedi.com/kirr/neo/commit/590f0a46	X neo/py uses n(replica) as n(real-replica) - 1
	lab.nexedi.com/kirr/neo/commit/b655da26	X save time not benchmarking things we do not show
	lab.nexedi.com/kirr/neo/commit/f834f40d	X zhash: Show N(obj) read, not 1, in place of N(iter)
	lab.nexedi.com/kirr/neo/commit/a16e8d52	X teach golang to access ZEO
	lab.nexedi.com/kirr/neo/commit/b9827725	X switch to using no compression, because this way it is more fair for comparing storage latencies
	lab.nexedi.com/kirr/neo/commit/c0067335	X neotest: Don't depend on killall
	lab.nexedi.com/kirr/neo/commit/2bcd6ebb	X neotest: add zbench-local & zbench-cluster subcomands
	lab.nexedi.com/kirr/neo/commit/fb165ad9	X neotest: Also benchmark NEO/py with logging disabled
	lab.nexedi.com/kirr/neo/commit/2118ba38	X neotest: Help mysqlk_install_db find its basedir under SlapOS
	lab.nexedi.com/kirr/neo/commit/80eaa05e	X zgenprod1 tool
	lab.nexedi.com/kirr/neo/commit/eb0e516f	X check hash result and error if mismatch (zhash.* part); neotest part pending
	lab.nexedi.com/kirr/neo/commit/046370db	X benchify rest of bench-cluster
	lab.nexedi.com/kirr/neo/commit/2d13818e	X bench-local + zhash: Add output in std bench format
	lab.nexedi.com/kirr/neo/commit/1d692a3b	X add NEO/go with SHA1 disabled (both Sgo and Cgo to regular benchmarks)

3f578560

go/neo/t/neotest: Network information & benchmarks · 26006d7e

Kirill Smelkov authored Jul 09, 2018

Add to neotest bench-net command that performs latency measurments at
ping and TCP levels. Example output:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9
	node:
	cluster:        deco-rio

	*** link latency:

	# deco ⇄ rio (ping 16B)
	PING rio (192.168.0.8) 16(44) bytes of data.

	--- rio ping statistics ---
	25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms
	rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms
	Benchmarkpingrtt-/16B-min 1 0.080 ms/op
	Benchmarkpingrtt-/16B-avg 1 0.097 ms/op
	# POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141

	...

	*** TCP latency:

	# deco ⇄ rio (lat_tcp.c 1B  -> lat_tcp.c -s)
	Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op       # TCP latency using rio: 116.1743 microseconds  # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444
	Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op       # TCP latency using rio: 117.2896 microseconds  # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209
	Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op       # TCP latency using rio: 117.5331 microseconds  # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246
	Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op       # TCP latency using rio: 117.6509 microseconds  # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187
	Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op       # TCP latency using rio: 116.8125 microseconds  # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475

	...

And its summary via benchstat:

	name                 time/op
	pingrtt-/16B-min     80.0µs ± 0%
	pingrtt-/16B-avg     97.0µs ± 0%
	-pingrtt/16B-min     79.0µs ± 0%
	-pingrtt/16B-avg      112µs ± 0%
	pingrtt-/1452B-min    241µs ± 0%
	pingrtt-/1452B-avg    303µs ± 0%
	-pingrtt/1452B-min    266µs ± 0%
	-pingrtt/1452B-avg    303µs ± 0%
	tcprtt(c_c)-/1B       117µs ± 1%
	tcprtt(c_go)-/1B      122µs ± 2%
	-tcprtt(c_c)/1B       117µs ± 1%
	-tcprtt(c_go)/1B      121µs ± 5%
	tcprtt(c_c)-/1400B    392µs ± 4%
	tcprtt(c_go)-/1400B   363µs ±18%
	-tcprtt(c_c)/1400B    412µs ±21%
	-tcprtt(c_go)/1400B   391µs ±38%
	tcprtt(c_c)-/1500B    271µs ±18%
	tcprtt(c_go)-/1500B   290µs ±21%
	-tcprtt(c_c)/1500B    282µs ±16%
	-tcprtt(c_go)/1500B   334µs ±24%
	tcprtt(c_c)-/4096B    711µs ± 5%
	tcprtt(c_go)-/4096B   737µs ± 5%
	-tcprtt(c_c)/4096B    740µs ± 2%
	-tcprtt(c_go)/4096B   711µs ± 7%

Latencies here are not good because for this run on rio interrupt mitigation
was not tuned (see below). By the way, analyzing ping RTT latencies on our
shuttle machines (similar to rio) resulted in the following kernel patch

	https://git.kernel.org/linus/509708310c (released with Linux 4.15)

to fix/being able to adjust interrupt mitigation on Realtek NICs.

While at networking topic, teach info/info-local to show related
information about node's NICs. Example lines output for deco:

	nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21
	nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ...
	nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq,  txc: 0μs/0f/0μs-irq/0f-irq
	nic/eth0/status:   up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
	nic/wlan0: Intel Corporation Wireless 8260 rev 3a
	nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ...
	nic/wlan0/coalesce: rxc: ?,  txc: ?
	nic/wlan0/status:   down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
	WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor

for rio:

	nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06
	nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ...
	nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq,  txc: 200μs/4f/0μs-irq/0f-irq
	nic/eth0/status:   up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
	WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor
	WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency
	nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06
	nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ...
	nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq,  txc: 0μs/1f/0μs-irq/0f-irq
	nic/eth1/status:   down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
	WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor

The warning about "RX coalesce latency is max 200μs ..." says that on
receive path eth0 will be coalescing incoming frames for up to 200μs and
this way this delay will be added to overal latency. (for small frames
Realtek NICs do not coalesce interrupts - see details in the kernel patch).

Networked performance (raw and NEO) was not discussed in
http://navytux.spb.ru/~kirr/neo.html at all, but for the reference the
importance of C-states for performance was first found via this
networking latency benchmarks. Links on C-states topic:

	http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states
	http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/e8e395ae	X neotest: Move network benchmarking into separate function + add `neotest bench-net`
	lab.nexedi.com/kirr/neo/commit/a971231c	X neotest/info: Handle USB NICs
	lab.nexedi.com/kirr/neo/commit/5dd3d1ab	X neotest: sort NIC names
	lab.nexedi.com/kirr/neo/commit/9888f047	X neotest: Do not crash if kernel is too old to support gro_flush_timeout
	lab.nexedi.com/kirr/neo/commit/3a1bdf4a	X bench-remote / tcp : std benchmark output
	lab.nexedi.com/kirr/neo/commit/9450b6db	X bench-remote / ping += std bench output
	lab.nexedi.com/kirr/neo/commit/68d5b015	X show gro_flush_timeout + friends
	lab.nexedi.com/kirr/neo/commit/4c815af9	X neotest: Show NIC features and emit warning if !TSO
	lab.nexedi.com/kirr/neo/commit/659ce938	X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc...
	lab.nexedi.com/kirr/neo/commit/ded384cb	X neotest += `lat_tcp.go -s`
	lab.nexedi.com/kirr/neo/commit/59d46504	X neotest += lat_tcp
	lab.nexedi.com/kirr/neo/commit/67fc3440	X show small (56B) and full-packet (1472B) ping link latencies

26006d7e

go/neo/t/neotest: Disk information & benchmarks · ee6c2796

Kirill Smelkov authored Jul 09, 2018

Add to neotest bench-disk command that performs random-read disk
benchmarks via ioping. Example output:

	(venv) (8) neotest@rio:~/8/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-disk
	node:   rio
	cluster:

	*** disk: random direct (no kernel cache) 4K-read latency

	--- . (ext4 /dev/sda1) ioping statistics ---
	29.1 k requests completed in 2.95 s, 113.6 MiB read, 9.85 k iops, 38.5 MiB/s
	generated 29.1 k requests in 3.00 s, 113.6 MiB, 9.69 k iops, 37.9 MiB/s
	min/avg/max/mdev = 43.2 us / 101.5 us / 250.0 us / 7.48 us
	Benchmarkdisk/randread/direct/4K-min 1 43.2 us/op
	Benchmarkdisk/randread/direct/4K-avg 1 101.5 us/op
	< 59.2 us       458     |
	< 63.0 us       0       |
	< 66.7 us       0       |
	< 70.5 us       0       |
	< 74.2 us       1       |
	< 78.0 us       1       |
	< 81.7 us       0       |
	< 85.5 us       1       |
	< 89.2 us       0       |
	< 93.0 us       0       |
	< 96.7 us       0       |
	< 100.5 us      333     |
	< 104.2 us      27793   | ***********************************************
	< 108.0 us      259     |
	< 111.7 us      21      |
	< 115.5 us      8       |
	< 119.2 us      18      |
	< 123.0 us      1       |
	< 126.7 us      0       |
	< 130.5 us      7       |
	< 134.2 us      59      |
	<       +∞      18      |
	# POLL·186 C1·291360 C1E·290802 C3·31 C6·1218

	...

	*** disk: random cached 4K-read latency

	--- . (ext4 /dev/sda1) ioping statistics ---
	3.15 M requests completed in 2.82 s, 12.0 GiB read, 1.12 M iops, 4.26 GiB/s
	generated 3.15 M requests in 3.00 s, 12.0 GiB, 1.05 M iops, 4.00 GiB/s
	min/avg/max/mdev = 465 ns / 896 ns / 37.4 us / 183 ns
	Benchmarkdisk/randread/pagecache/4K-min 1 465 ns/op
	Benchmarkdisk/randread/pagecache/4K-avg 1 896 ns/op
	< 839 ns        771375  | ************
	< 872 ns        609361  | *********
	< 905 ns        660635  | **********
	< 938 ns        505305  | ********
	< 971 ns        189182  | ***
	< 1.00 us       93655   | *
	< 1.04 us       70811   | *
	< 1.07 us       57650   |
	< 1.10 us       51587   |
	< 1.14 us       44648   |
	< 1.17 us       40868   |
	< 1.20 us       27301   |
	< 1.24 us       12503   |
	< 1.27 us       5580    |
	< 1.30 us       2517    |
	< 1.34 us       1404    |
	< 1.37 us       698     |
	< 1.40 us       378     |
	< 1.44 us       208     |
	< 1.47 us       119     |
	< 1.50 us       50      |
	<       +∞      978     |
	# POLL·1 C1·57 C1E·11 C3·2 C6·257

The benchmarks are so done so that output conforms to Go benchmarking
format. This way it is possible to process the output with benchstat to
summarize / compare results. For above run summarization gives:

	name                            time/op
	disk/randread/direct/4K-min     39.0µs ±12%
	disk/randread/direct/4K-avg      117µs ± 3%
	disk/randread/pagecache/4K-min   461ns ± 3%
	disk/randread/pagecache/4K-avg   880ns ± 0%

While at disk topic, teach info/info-local to show related information
about node's disk. Example line output for rio:

	disk/sda: Samsung SSD 840   rev BB0Q 931.5G

Please see

	http://navytux.spb.ru/~kirr/neo.html#the-need-for-faster-storage
	http://navytux.spb.ru/~kirr/neo.html#appendix-i-ssd-latency

for some discussion about SSD performance.

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/d35a2fdf	X neotest/info-local: Fix disk display in presence of bind-mounts
	lab.nexedi.com/kirr/neo/commit/44529dbf	X neotest/bench-disk: Deduplicate code; change 1M -> 2M
	lab.nexedi.com/kirr/neo/commit/8bac3dba	X neotest/bench-disk: Also benchmark randomly reading 1M blocks
	lab.nexedi.com/kirr/neo/commit/e795c6ed	X neotest: Fix disk display in case of DM
	lab.nexedi.com/kirr/neo/commit/352cd100	X neotest: Fix disk display in case of MD
	lab.nexedi.com/kirr/neo/commit/cd2cd093	X bench_disk: Add std bench format
	lab.nexedi.com/kirr/neo/commit/9f86eb40	X bench += ioping

ee6c2796

go/neo/t/neotest: CPU information & benchmarks · a60c472c

Kirill Smelkov authored Jul 09, 2018

Add to neotest bench-cpu command that performs basic CPU benchmarks:
pystone and CRC32/SHA1 for now. While every benchmark is run
additionally C-states profile is collected(*). Example output:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
	node:   deco
	cluster:
	Benchmarkpystone 1 283297 pystone/s     # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
	Benchmarkpystone 1 289788 pystone/s     # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
	Benchmarkpystone 1 286329 pystone/s     # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
	Benchmarkpystone 1 292087 pystone/s     # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
	Benchmarkpystone 1 290119 pystone/s     # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
	Benchmarkcrc32/py/4K 300000     3.415 µs/op     # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
	Benchmarkcrc32/py/4K 300000     3.402 µs/op     # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
	Benchmarkcrc32/py/4K 300000     3.396 µs/op     # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
	Benchmarkcrc32/py/4K 300000     3.435 µs/op     # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
	Benchmarkcrc32/py/4K 300000     3.434 µs/op     # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
	Benchmarkcrc32/go/4K 10000000   0.216 µs/op     # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
	Benchmarksha1/py/4K 300000      4.553 µs/op     # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
	Benchmarksha1/py/4K 300000      4.459 µs/op     # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
	Benchmarksha1/py/4K 300000      4.492 µs/op     # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
	Benchmarksha1/py/4K 300000      4.550 µs/op     # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
	Benchmarksha1/py/4K 300000      4.518 µs/op     # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
	Benchmarksha1/go/4K 300000      4.312 µs/op     # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
	Benchmarksha1/go/4K 300000      4.383 µs/op     # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
	Benchmarksha1/go/4K 300000      4.387 µs/op     # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
	Benchmarksha1/go/4K 300000      4.328 µs/op     # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
	Benchmarksha1/go/4K 300000      4.337 µs/op     # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191

Such raw output can be summarized with the help of benchstat - either
with Go[1] or Python[2] implementations:

	$ benchstat x.txt
	name         pystone/s
	pystone        288k ± 2%

	name         time/op
	crc32/py/4K  3.42µs ± 1%
	crc32/go/4K   218ns ± 1%
	sha1/py/4K   4.51µs ± 1%
	sha1/go/4K   4.35µs ± 1%

See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some
discussion on SHA1 vs CRC32.

While at CPU topic, teach info/info-local to show related information
about node's CPU: available processors, frequency and idle governors.
Example of lines added:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6
	...
	cpu:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
	cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
	cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
	WARNING: cpu: frequency not fixed - benchmark timings won't be stable
	WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
	WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)

See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to
understand why there are warnings in above example.

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/cf1f7c24	X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
	lab.nexedi.com/kirr/neo/commit/1e438610	fixup! X neotest: Also show target-latency for C-states
	lab.nexedi.com/kirr/neo/commit/4af48245	X neotest: Also show target-latency for C-states
	lab.nexedi.com/kirr/neo/commit/2910cf56	X neotest: Prefer first part of FQDN for hostname
	lab.nexedi.com/kirr/neo/commit/c86ba1b0	X bench-cpu += crc32, adler32
	lab.nexedi.com/kirr/neo/commit/4ac3a550	X neotest: Don't use bc
	lab.nexedi.com/kirr/neo/commit/3918a997	X neotest: Don't assume we are invoked from the directory where neotest is
	lab.nexedi.com/kirr/neo/commit/9a266d11	X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
	lab.nexedi.com/kirr/neo/commit/b6a830d8	X switch cpu benchmarks to go format
	lab.nexedi.com/kirr/neo/commit/4436b983	X neotest: Provide cpustat command so it is possible to cpustat something external
	lab.nexedi.com/kirr/neo/commit/b062b349	X microbenchmark CPU first
	lab.nexedi.com/kirr/neo/commit/a4a18b55	X first cut on C-state profiling
	lab.nexedi.com/kirr/neo/commit/ea1e0835	X found that cpuidle can be affecting latency a lot!

(*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and
    http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for
    why this is important.

    Since being able to profile C-states can be generally useful, we
    expose such profiling with externally-visible `neotest cpustat` utility.

[1] https://godoc.org/golang.org/x/perf/cmd/benchstat
[2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py

a60c472c

go/neo/t: Neotest start · c12f2991

Kirill Smelkov authored Jul 08, 2018

For testing and benchmarking how NEO/py & NEO/go interact with each
other we need corresponding test driver. Neotest will be that driver and
present patch begins it:

- neotest can deploy NEO/{go,py} checkout either locally or on remote node;
- it can run NEO/{go,py} unit tests either locally or on remote node;
- it can also show information about a system - either local or remote.

Examples in action. Banner:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest
    Neotest is a tool to test and benchmark NEO.

    Usage:

            neotest command [arguments]

    The commands are:

            test            run all tests on a remote host
            test-local      run all tests locally

            test-go         run NEO/go unit tests   (part of test-local)
            test-py         run NEO/py unit tests   (part of test-local)

            deploy          deploy NEO & needed software for tests to remote host
            deploy-local    deploy NEO & needed software for tests locally

            info            print information about a node
            info-local      print information about local deployment

Deploy to another node:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest deploy neotest@rio.kirr.nexedi.com:3
    *** deploying to neotest@rio.kirr.nexedi.com:3 ...
    ...
    # deployed ok

Print information about that node:

    x/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:3
    date:   Sun, 08 Jul 2018 21:30:43 +0300
    xnode:  neotest@rio.kirr.nexedi.com (2401:5180:0:2a::1 192.168.0.8)
    uname:  Linux rio 4.16.0-2-amd64 #1 SMP Debian 4.16.16-2 (2018-06-22) x86_64 GNU/Linux
    sw/python:          Python 2.7.15
    sw/go:              go version go1.10.3 linux/amd64
    sw/sqlite:          sqlite 3.24.0 (py mod 2.6.0)
    sw/mysqld:          mysqld  Ver 10.1.29-MariaDB-6+b1 for debian-linux-gnu on x86_64 (Debian buildd-unstable)
    sw/neo:             v1.9-42-g972ff5f9
    sw/zodb:            5.4.0
    sw/zeo:             5.2.0
    sw/mysqlclient:     1.3.13

Run NEO/{py,go} unit tests there:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest test neotest@rio.kirr.nexedi.com:4
    ?       lab.nexedi.com/kirr/neo/go/internal/packed      [no test files]
    ok      lab.nexedi.com/kirr/neo/go/neo/neonet   (cached)
    ok      lab.nexedi.com/kirr/neo/go/neo/proto    (cached)
    ok      lab.nexedi.com/kirr/neo/go/zodb (cached)
    ?       lab.nexedi.com/kirr/neo/go/zodb/cmd/zodb        [no test files]
    ?       lab.nexedi.com/kirr/neo/go/zodb/internal/pickletools    [no test files]
    ?       lab.nexedi.com/kirr/neo/go/zodb/storage [no test files]
    ok      lab.nexedi.com/kirr/neo/go/zodb/storage/fs1     (cached)
    ?       lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/cmd/fs1     [no test files]
    ok      lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fs1tools    (cached)
    ?       lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fsb [no test files]
    ?       lab.nexedi.com/kirr/neo/go/zodb/storage/zeo     [no test files]
    ?       lab.nexedi.com/kirr/neo/go/zodb/wks     [no test files]
    ok      lab.nexedi.com/kirr/neo/go/zodb/zodbtools       (cached)
    ...........................................................................E.EE....c....^C

Neotest will be the driver that was used to prepare benchmarks in
http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html .

Some draft history related to this patch:

lab.nexedi.com/kirr/neo/commit/d46afb3e		X neotest: Teach it to also run go & py unit tests; hook it into nxd/runTestSuite
lab.nexedi.com/kirr/neo/commit/f694d643		X neotest: Don't fail silently if network address detection fails
lab.nexedi.com/kirr/neo/commit/fa78290a		X neotest: FQDN host name might be not configured
lab.nexedi.com/kirr/neo/commit/56faccad		X neotest/info-local: Don't crash if a prog could not be found
lab.nexedi.com/kirr/neo/commit/f2932247		X neotest/info-local: Don't crash if an egg could not be found
lab.nexedi.com/kirr/neo/commit/f06b7302		X neotest: Determine machine IP addresses via `ip ...` directly, not `getent ...`
lab.nexedi.com/kirr/neo/commit/42e5fe71		X neotest info

c12f2991

08 Jul, 2018 11 commits

go/zodb/zeo: Add support for protocols preceding ZEO5 · 5f30b4c0

Kirill Smelkov authored Mar 26, 2018

In particular try to support ZEO4:

- during handshake we now first wait for remote server to announce its
  preferred protocol, and only then send the version we select to use.
  This is the procedure original ZEO server-client do.

- teach rpc.call to decode exceptions not only for how ZEO5 encodes them
  (marking via 2 flag in "async" field), but also on how ZEO4 and
  earlier encode them: via replying with (exc_type, exc_inst) and
  expecting client to dynamically check exc_type is a subtype of
  Exception.

- handle other protocol differences - e.g. ZEO5 returns last_tid on
  register(), while earlier versions return nothing there.

Tests pending.

5f30b4c0

go/zodb: Teach ZODB/go to access ZEO (draft) · b65f6d0f

Kirill Smelkov authored Feb 23, 2018

For the reference on deco (performance, frequency not fixed):

	name                           time/object
	deco/fs1/zhash.py              15.8µs ± 2%
	deco/fs1/zhash.py-P16           116µs ±12%
	deco/fs1/zhash.go              2.60µs ± 0%
	deco/fs1/zhash.go+prefetch128  3.70µs ±11%
	deco/fs1/zhash.go-P16          13.4µs ±43%
	deco/zeo/zhash.py               316µs ± 7%
	deco/zeo/zhash.py-P16          2.68ms ± 7%
	deco/zeo/zhash.go               111µs ± 2%
	deco/zeo/zhash.go+prefetch128  57.7µs ± 2%
	deco/zeo/zhash.go-P16          1.23ms ± 5%

and in particular it shows that with the same ZEO/py server, the latency
to load an object via py client is ~ 3x worse compared to the latency to
load the same object via hereby Go client.

The performance was obtained via forthcoming neotest, and in particular
ZEO/go client will be also used in forthcoming zwrk (no analog on python side).

See http://navytux.spb.ru/~kirr/neo.html#performance-tests for details.

Tests: pending.

b65f6d0f

go/zodb: Start putting pickle-related utilities into internal/pickletools package · 2ee495ce

Kirill Smelkov authored Jul 08, 2018

As the first step factor-out int64 Xint64 checker from zodb/storagefs1/index.go
into there.  We'll need the checker in the next patch.

2ee495ce

go/neo/neonet: Lightweight mode · ec4b3ce0

Kirill Smelkov authored Jul 08, 2018

In situations when created connections are used to only send/receive 1
request/response, the overhead to create/shutdown full connections could be
too much. Unfortunately this is exactly the mode that is currently
primarily used for compatibility with NEO/py. To help mitigate the overhead
in such scenarios, lightweight connections mode is provided:

At requester side, one message can be sent over node link with link.Send1 .
Inside a connection will be created and then shut down, but since the
code manages whole process internally and does not show the connection to
user, it can optimize those operations significantly. Similarly link.Ask1
sends 1 request, receives 1 response, and then puts the connection back into
pool for later reuse.

At receiver side, link.Recv1 accepts a connection with the first message
remote peer sent us when establishing it, and wraps the result into Request
object. The Request contains the message received and internally the
connection. A response can be sent back via Request.Reply. Then once
Request.Close is called the connection object that was accepted is
immediately put back into pool for later reuse.

Some history of lightweight mode:

lab.nexedi.com/kirr/neo/commit/0fa96338 X Clarified Request.Close semantics - tests working again
lab.nexedi.com/kirr/neo/commit/a5ac1652 X Ask1: switch to sending directly over link
lab.nexedi.com/kirr/neo/commit/755e3654 X Request.Reply: switch to replying directly over link
lab.nexedi.com/kirr/neo/commit/c643ba53 X Send1: switch to sending directly over link
lab.nexedi.com/kirr/neo/commit/7dcbc9c5 X Send1: switch to lightClose
lab.nexedi.com/kirr/neo/commit/851864a9 X chan RTT benchmark which simulates Recv1 = Accept + Recv
lab.nexedi.com/kirr/neo/commit/099bfc29 X freelist for PktBuf
lab.nexedi.com/kirr/neo/commit/58c2e39a X Benchmark for link Ask1/Recv1 over TCP loopback

ec4b3ce0

go/neo/neonet: User-visible Send/Recv + Ask/Expect · 39982595

Kirill Smelkov authored Jul 08, 2018

Provide Conn.Send and Conn.Recv which work on NEO messages and
automatically encode/decode them into packets on the fly.

Similarly to NEO/py also provide Ask to send a request and receive
expected reply and Expect which does only the latter half of Ask.

39982595

go/neo/neonet: Link establishment · e8396003

Kirill Smelkov authored Jul 08, 2018

Implement NEO protocol handshaking and use it in newly provided DialLink
and ListenLink which correspondingly first do regular network dial or
listen and than perform the handshake on just established TCP
connection. If handshake goes ok, the result is wrapped into NodeLink.

Some history:

	lab.nexedi.com/kirr/neo/commit/8d0a1469	X Handshake draftly done

See also http://navytux.spb.ru/~kirr/neo.html#development-overview
(starting from "The neonet module also provides DialLink and ListenLink
...")

e8396003

go/neo/neonet: Start (first draft) · 64513925

Kirill Smelkov authored Jul 06, 2018

Continue NEO/go with neonet - the layer to exchange messages in between
NEO nodes.

NEO/go shifts from thinking about NEO protocol logic as RPC to thinking
of it as more general network protocol and so settles to provide general
connection-oriented message exchange service. This way neonet provides
generic connection multiplexing on top of a single TCP node-node link.

Neonet compatibility with NEO/py depends on the following small NEO/py patch:

    dd3bb8b4

which adjusts message ID a bit so it behaves like stream_id in HTTP/2:

    - always even for server initiated streams
    - always odd  for client initiated streams

and is incremented by += 2, instead of += 1 to maintain above invariant.

See http://navytux.spb.ru/~kirr/neo.html#development-overview (starting from
"Then comes the link layer which provides service to exchange messages over
network...") for the rationale.

Unfortunately current NEO/py maintainer is very much against merging that patch.

This patch brings in the core of neonet. Next patches will add initial
handshaking, user-level Send/Recv + Ask/Expect and "lightweight mode".

Some neonet core history:

lab.nexedi.com/kirr/neo/commit/6b9ed46d	X neonet: Avoid integer overflow on max packet length check
lab.nexedi.com/kirr/neo/commit/8eac771c	X neo/connection: Fix race between link.shutdown() and conn.lightClose()
lab.nexedi.com/kirr/neo/commit/8021a1d5	X rxghandoff
lab.nexedi.com/kirr/neo/commit/68738036	X ... but negative impact on separate client / server processes, strange ...
lab.nexedi.com/kirr/neo/commit/b0dda9d2	X serveRecv: help Go scheduler to switch to receiving G sooner
lab.nexedi.com/kirr/neo/commit/4989918a	X remove defer from rx/tx hot paths
lab.nexedi.com/kirr/neo/commit/e055406a	X no select for acceptq - similarly for rxq path
lab.nexedi.com/kirr/neo/commit/c28ad4d0	X Conn.Recv: receive without select
lab.nexedi.com/kirr/neo/commit/496bd425	X add benchmark RTT over plain net.Conn with serveRecv-style RX handler
lab.nexedi.com/kirr/neo/commit/9fa79958	X draft how to mark RX down without reallocating .rxdown
lab.nexedi.com/kirr/neo/commit/4324c812	X restore all Conn functionality
lab.nexedi.com/kirr/neo/commit/a8e61d2f	X serveSend is not needed
lab.nexedi.com/kirr/neo/commit/9d047b36	X recvPkt via only 1 syscall
lab.nexedi.com/kirr/neo/commit/b555a507	X baseline net RTT benchmark
lab.nexedi.com/kirr/neo/commit/91be5cdd	X everyone is listening from start; CloseAccept to disable listening - works
lab.nexedi.com/kirr/neo/commit/c2a1b63a	X naming: Packet = raw data; Message = meaningful object
lab.nexedi.com/kirr/neo/commit/6fd0c9be	X connection: Adding context to errors from NodeLink and Conn operations
lab.nexedi.com/kirr/neo/commit/65b17bdc	X rework Conn acceptance to be explicit via NodeLink.Accept

64513925

go/neo/proto: Test that message codes are the same in between Go and Python NEO versions · 5beab048

Kirill Smelkov authored Jul 05, 2018

This brings some go/py compatibility checks that verify go and python
treat a message code equally. Although messages encoding are tested in
the previous patch there is no explicit tests for go/py compatibility on
messages encoding.

5beab048

go/neo/proto: Serialization support · ea5f7d61

Kirill Smelkov authored Jul 05, 2018

Provide a way for every message to be encoded/decoded to/from NEO wire
encoding. For this introduce Msg interface with wire coding methods and
provide such methods for all message types.

For selected types the methods are implemented manually.
For most of the types the methods are generated automatically by protogen.go program.

protogen.go was mentioned in http://navytux.spb.ru/~kirr/neo.html#development-overview
in "On server-side NEO/go work started by first implementing messages
serialization in exactly the same wire format as NEO/py does ..." paragraph.

A bit of late protogen fixups history:

	lab.nexedi.com/kirr/neo/commit/c884bfd5
	lab.nexedi.com/kirr/neo/commit/385d813a
	lab.nexedi.com/kirr/neo/commit/0f7e0b00
	lab.nexedi.com/kirr/neo/commit/de3ef2c0

Also a message type can be reverse-looked up by message code via MsgType().
This will be later used in network receive code path.

ea5f7d61

go/neo/proto: String/Error/Address/UUID conversion · fcd6f9f6

Kirill Smelkov authored Jul 05, 2018

Provide routines to convert selected types to string and also for
UUID and Address <-> string  encoding/decoding.

fcd6f9f6

go/neo/proto: Start (first draft) · 6301c23f

Kirill Smelkov authored Jul 05, 2018

Start NEO/go code with protocol package that defines message that NEO
nodes exchange in between each other. The definition is based on
neo/lib/protocol.py and is kept in sync with that file.

This commit brings only messages definition. Messages serialization will
come in the follow-up patch.

6301c23f

05 Jul, 2018 1 commit
- go/internal/packed: Small package that provides types to be used in packed structures. · 49a9cdcc
  Kirill Smelkov authored Jul 04, 2018
```
We will need to use BE16 and BE32 in the next patch.
```
  49a9cdcc
04 Jul, 2018 1 commit
- go/zodb/*.py: qq moved from zodbtools -> pygolang · cdd7234b
  Kirill Smelkov authored Jul 04, 2018
```
See nexedi/zodbtools@b1163449 for details.
```
  cdd7234b
03 Jul, 2018 1 commit
- Sync with NEO/py v1.9 · 4a2407dc
  Kirill Smelkov authored Jul 03, 2018
  
  4a2407dc
09 Apr, 2018 1 commit

go/zodb/zodbtools/dump: Don't use goto · 5bf40022

Kirill Smelkov authored Apr 09, 2018

Tomáš Peterka noticed that gotos in dump.go are not actually needed because
the same functionality could be achieved with defer in more clean and
structured way.

Do it.

This brings ~ 5% performance hit

	name        old time/op  new time/op  delta
	ZodbDump-4   148µs ± 1%   155µs ± 2%  +4.69%  (p=0.000 n=9+10)

because defer implementation is currently not great
(https://github.com/golang/go/issues/14939)

If we absolutely need those 5% back it could be worked around similar to
e.g. FileStorage.Load:

https://lab.nexedi.com/kirr/neo/blob/6faed528/go/zodb/storage/fs1/filestorage.go#L133
https://lab.nexedi.com/kirr/neo/blob/6faed528/go/zodb/storage/fs1/filestorage.go#L141

/suggested-by @katomaso

5bf40022

13 Mar, 2018 1 commit
- Release version 1.9 · 1b57a7ae
  Julien Muchembled authored Mar 13, 2018
  
  1b57a7ae
02 Mar, 2018 3 commits

master: fix resumption of backup replication (internal or not) · 27229793

Julien Muchembled authored Feb 27, 2018

Before, it waited for upstream activity until all partitions are touched.
However, when upstream is idle the backup cluster could remain stuck forever
if it was interrupted whereas some cells were still late.

27229793

master: fix/simplify generation of TID · 7b2e6752
Julien Muchembled authored Feb 27, 2018
```
The 'min_tid < new_tid' assertion failed when jumping to the past.
```
7b2e6752

master: fix possible failure when reading data in a backup cluster with replicas · ca2f7061

Julien Muchembled authored Feb 14, 2018

Given that:
- read locks are only taken by transactions (not replication)
- in backup mode, storage nodes stay in UP_TO_DATE state, even if partitions
  are synchronized up to different tids

there was a race condition with the master node replying to LastTransaction
with a TID that may not be replicated yet by all replicas, potentially causing
such replicas to reply OidDoesNotExist or OidNotFound if a client asks it data
too early.

IOW, even if the cluster does contain the data up to `getBackupTid(max)`,
it is only readable by NEO clients up to `getBackupTid(min)` as long as the
cluster is in BACKINGUP state.

ca2f7061

17 Jan, 2018 1 commit

client: kill .supportsTransactionalUndo() · f95f336a

Kirill Smelkov authored Jan 16, 2018

Usage of supportsTransactionalUndo() was removed from ZODB in 2007 - see
e.g. the following commits:

https://github.com/zopefoundation/ZODB/commit/a06bfc03
https://github.com/zopefoundation/ZODB/commit/e667b022
https://github.com/zopefoundation/ZODB/commit/f595f7e7
...

/reviewed-by @vpelletier
/reviewed-on !8

f95f336a