1. 13 Oct, 2017 8 commits
  2. 12 Oct, 2017 4 commits
    • Test's avatar
      X neo1 - neo2 timings; delta tgroX, txc · 9cab8f1f
      Test authored
      compared to tgro5:
      
      tgro0   py -> noise (ZEO, NEO/pylite, NEO/pysql, partly neo/go)
      tgro10  py=no-noise, tpy~=tgro5; tgo serial~same, +prefetch more noise (?) 40|65
      tgro15  lat_tcp 1B 49->45 sometime (consistently) otherwise more-or-less same
      tgro20  lat_tcp 4K 170µs->167µs (c)
      tgro25  lat_tcp 4K 170µs->167µs (c), lat_tcp 4K 175 -> 171µs (go)
              ZEOv a bit
              go-go + prefetch !sha1 -> noise + higher (~65µs -> 70 - 110µs)
      
      txc40µs4f seems to be worse than txc200µs4f
      9cab8f1f
    • Test's avatar
      X time neo1 - neo2; δ: gro_flush_timeout=5µs, txc=20μs/0f/0μs-irq/0f-irq · 16f249f1
      Test authored
      Link and TCP* latencies are ~ same and stable.
      
      Compared to b288e62f py timings stabilize:
      
      ZEO		~580-1045μs	-> 585µs
      NEO/pylite	~600-700μs	-> ~550-570µs	(Cpy)
      NEO/pylite	~525-580μs	-> ~450µs	(Cgo)
      NEO/pysql	~820-930μs	-> ~840µs	(Cpy)
      NEO/pysql	~740-800μs	-> ~740µs	(Cgo)
      
      Go timings get a bit worse:
      
      NEO/go		~360µs		-> ~380µs	(Cpy)
      NEO/go		~160µs		-> ~165-170µs	(Cgo)
      NEO/go-nosha1	~140µs		-> ~150µs
      
      Compared to 15a9ccef go+pretech128 timings stabilize:
      
      go-go+prefetch128		~65-160µs	-> ~40-45µs, 60µs(x1)
      go-go+prefetch128 (!sha1)	~60-150µs	-> ~60µs,    40µs(x1)
      16f249f1
    • Kirill Smelkov's avatar
      X show gro_flush_timeout + friends · 68d5b015
      Kirill Smelkov authored
      68d5b015
    • Kirill Smelkov's avatar
      . · 86ab45eb
      Kirill Smelkov authored
      86ab45eb
  3. 11 Oct, 2017 1 commit
    • Test's avatar
      X reran neo1 - neo2 after reloading r8169 driver on both machines · 15a9ccef
      Test authored
      Compared to run from Oct10 txc settings were not changed and they are now
      200μs/4f/0μs-irq/0f-irq on both machines instead of 200μs/4f/0μs-irq/0f-irq on
      neo1 and 200μs/0f/0μs-irq/0f-irq on neo2.
      
      The noise about py runs remains.
      
      Go-go serial time remains the same ~155μs without much noise, but Go-go time
      with +prefetch128 gets worse:
      
      NEO/go Cgo +prefetch128
      
              ~45μs   -> 65 - 160μs
      
      Probably it makes sense to tune txc too...
      15a9ccef
  4. 10 Oct, 2017 8 commits
    • Test's avatar
      X neo1 - neo2 timings with turned on TSO, SG, TX · b288e62f
      Test authored
      See previous commit for details about why latency is bad for TCP payload > MSS
      without TSO.
      
      Compared to latest neo1-neo2 timings from Oct05 (C-states disabled, no
      rx-delay) it improves:
      
      TCP1             ~45μs  ->   ~45μs
      TCP1472         ~430μs			# TCP lat. anaomaly
      TCP1400                 ->  ~120μs	# finally
      TCP1500                 ->  ~130μs	# fixed
      TCP4096         ~285μs  ->  ~170μs	# !
      
      ZEO             ~670μs  ->  ~580-1045μs (?)
      NEO/pylite      ~605μs  ->  ~600-700μs  (?)     (Cpy)
      NEO/pylite      ~505μs  ->  ~525-580μs  (?)     (Cgo)
      NEO/pysql       ~900μs  ->  ~820-930μs  (?)     (Cpy)
      NEO/pysql       ~780μs  ->  ~740-800μs  (?)     (Cgo)
      NEO/go          ~430μs  ->  ~360μs              (Cpy)	# <-- NOTE
      NEO/go          ~210μs  ->  ~160μs              (Cgo)	# <-- NOTE
      NEO/go-nosha1   ~190μs  ->  ~140μs			# <-- NOTE
      
      not sure about noise in pure py runs but given raw tcp latency absolutely
      improves this should be a good change to make.
      b288e62f
    • Kirill Smelkov's avatar
      X neotest: Show NIC features and emit warning if !TSO · 4c815af9
      Kirill Smelkov authored
      On neo1 - neo2 without TSO latency becomes very poor in lat_tcp when
      payload size becomes greater TCP MSS (lat_tcp -m 1448 ~ 130μs; lat_tcp
      -m 1449 ~ 500μs and more)
      4c815af9
    • Kirill Smelkov's avatar
      4436b983
    • Kirill Smelkov's avatar
      . · 474c17bc
      Kirill Smelkov authored
      474c17bc
    • Kirill Smelkov's avatar
      X time for z6001 (localhost) and z6001-z600 · b6d6344d
      Kirill Smelkov authored
      Same a before with added information.
      In particular z6001-z600 shows there is no TCP RR step-wise 400μs increase
      while going 1400B -> 1500B as it is currently the case on RTL.
      b6d6344d
    • Kirill Smelkov's avatar
      . · 50b0d130
      Kirill Smelkov authored
      50b0d130
    • Test's avatar
      X neo1 - neo2 timings · c5df0f47
      Test authored
      Same as before but with more information and adjusted ping and TCP RR.
      
      In particular TCP RR shows:
      
      TCPRR1400	~ 120μs
      TCPRR1500	~ 430μs
      
      The time when this increases step-wise is when TCP packet crosses fitting 1 ethernet frame.
      It does not happen on z600* with their current settings.
      c5df0f47
    • Kirill Smelkov's avatar
  5. 09 Oct, 2017 6 commits
    • Test's avatar
      X neo1 localhost timings Oct09 · af41382e
      Test authored
      Same as Oct05 5μs C-latency with added:
      
      - pystones
      - t(sha1) py & go
      - zhash.go working in prefetch mode
      - 16 clients loading 1 server at the same time
      af41382e
    • Kirill Smelkov's avatar
      X deco localhost timings Oct09 · 1e9f23cb
      Kirill Smelkov authored
      Same as Oct05 5μs C-latency with added:
      
      - pystones
      - t(sha1) py & go
      - zhash.go working in prefetch mode
      - 16 clients loading 1 server at the same time
      1e9f23cb
    • Kirill Smelkov's avatar
      . · 0d1206b9
      Kirill Smelkov authored
      0d1206b9
    • Kirill Smelkov's avatar
      X neo/connection: Fix race between link.shutdown() and conn.lightClose() · 8eac771c
      Kirill Smelkov authored
      The scenario could be: nl.shutdown sets nl.connTab=nil and
      then iterates connTab snapshot taken under nl.connMu lock. If so
      this activity (which calls e.g. Conn.shutdown) could be running with
      conn.lightClose -> conn.release() in parallel.
      
      Bug triggers under neotest in zhash.go +prefetch128:
      
      	crc32:bf9deea9   ; oid=0..8499  nread=34134938  t=9.010696705s (1.060081ms / object)  x=zhash.go +prefetch128
      	==================
      	WARNING: DATA RACE
      	Write at 0x00c424b55ed0 by goroutine 147:
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).reinit()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:258 +0x102
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).release()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:240 +0x38
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).lightClose()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1633 +0xa4
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).Ask1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1719 +0xde
      	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client)._Load()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:466 +0x5d3
      	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client).Load()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:392 +0xa8
      	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).loadRCE()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:297 +0xfb
      
      	Previous write at 0x00c424b55ed0 by goroutine 75:
      	  sync/atomic.AddInt32()
      	      /home/kirr/src/tools/go/go/src/runtime/race_amd64.s:269 +0xb
      	  sync.(*Mutex).Unlock()
      	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
      	  sync.(*Once).Do()
      	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
      
      	Goroutine 147 (running) created at:
      	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).Prefetch()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:198 +0xbb
      	  main.zhash.func2()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:134 +0x86
      	  main.zhash.func4()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:157 +0x8f
      	  main.zhash()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:193 +0xc7b
      	  main.main()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:107 +0x48d
      
      	Goroutine 75 (finished) created at:
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:391 +0x116
      	  sync.(*Once).Do()
      	      /home/kirr/src/tools/go/go/src/sync/once.go:44 +0xe1
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:383 +0x99
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).serveRecv()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:748 +0x651
      	==================
      
      and without -race zhash.go instrumentation it was showing itself as
      "panic: unlock of unlocked mutex" in
      
      	  sync.(*Mutex).Unlock()
      	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
      	  sync.(*Once).Do()
      	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
      	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
      	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
      	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f
      8eac771c
    • Kirill Smelkov's avatar
      X zodb: Disable Buf tests under race · e4a100de
      Kirill Smelkov authored
      It was failing in a strange way, e.g.
      
      panic: Buf: refcnt < 0
      fatal error: panic holding locks
      
      goroutine 19 [running]:
      runtime.throw(0x5f9d65, 0x13)
              /home/kirr/src/tools/go/go/src/runtime/panic.go:605 +0x95 fp=0xc42003cc20 sp=0xc42003cc00 pc=0x452935
      panic(0x5c21c0, 0x611f60)
              /home/kirr/src/tools/go/go/src/runtime/panic.go:451 +0x585 fp=0xc42003ccc8 sp=0xc42003cc20 pc=0x4524c5
      lab.nexedi.com/kirr/neo/go/zodb.(*Buf).Release(0xc4200900e0)
              /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer.go:112 +0x161 fp=0xc42003ccf8 sp=0xc42003ccc8 pc=0x59ebc1
      lab.nexedi.com/kirr/neo/go/zodb.TestBufAllocFree(0xc4200a80f0)
              /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer_test.go:111 +0x6ae fp=0xc42003cf78 sp=0xc42003ccf8 pc=0x5a137e
      testing.tRunner(0xc4200a80f0, 0x600930)
              /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d fp=0xc42003cfd0 sp=0xc42003cf78 pc=0x51688d
      runtime.goexit()
              /home/kirr/src/tools/go/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc42003cfd8 sp=0xc42003cfd0 pc=0x4835f1
      created by testing.(*T).Run
              /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x569
      
      goroutine 1 [chan receive]:
      testing.(*T).Run(0xc4200a8000, 0x5f9180, 0x10, 0x600930, 0xc420051c50)
              /home/kirr/src/tools/go/go/src/testing/testing.go:790 +0x59b
      testing.runTests.func1(0xc4200a8000)
              /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa8
      testing.tRunner(0xc4200a8000, 0xc420051d90)
              /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d
      testing.runTests(0xc420090040, 0x6ed4a0, 0x7, 0x7, 0x4159b0)
              /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x522
      testing.(*M).Run(0xc420051f20, 0x1d7ab28)
              /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x207
      main.main()
              lab.nexedi.com/kirr/neo/go/zodb/_test/_testmain.go:56 +0x1d4
      exit status 2
      FAIL    lab.nexedi.com/kirr/neo/go/zodb 0.006s
      
      becuase upon release object was not put back into pool, then we were
      getting another object from pool, complainig but still doing
      buf.Release.
      e4a100de
    • Kirill Smelkov's avatar
      X tracing: Silence race-detector about probe.Detach · bab79665
      Kirill Smelkov authored
      Race-detector does not know Probe.Detach works under world stopped and
      that this way it cannot break consistency of probes list attached to a
      trace event - on event signalling either a probe will be run or not run
      at all.
      
      And we do not mind that e.g. while Detach was in progress a probe was
      was read from traceevent list and deceded to be run and the probe
      function was actually called just after Detach finished.
      
      For this reason tell race-detector to not take into account all memory
      read/write that are performed while the world is stopped.
      
      If we do not it complains e.g. this way:
      
          ==================
          WARNING: DATA RACE
          Read at 0x00c42000d760 by goroutine 7:
            lab.nexedi.com/kirr/neo/go/zodb/storage._traceCacheGCFinish_run()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:265 +0x81
            lab.nexedi.com/kirr/neo/go/zodb/storage.traceCacheGCFinish()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/ztrace.go:22 +0x63
            lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gc()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:497 +0x62c
            lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gcmain()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:478 +0x4c
      
          Previous write at 0x00c42000d760 by goroutine 6:
            lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*Probe).Detach()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:319 +0x103
            lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*ProbeGroup).Done()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:344 +0xa5
            lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:576 +0x7f94
            testing.tRunner()
                /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
      
          Goroutine 7 (running) created at:
            lab.nexedi.com/kirr/neo/go/zodb/storage.NewCache()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:129 +0x227
            lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
                /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:165 +0x7b1
            testing.tRunner()
                /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
      
          Goroutine 6 (finished) created at:
            testing.(*T).Run()
                /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x568
            testing.runTests.func1()
                /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa7
            testing.tRunner()
                /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
            testing.runTests()
                /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x521
            testing.(*M).Run()
                /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x206
            main.main()
                lab.nexedi.com/kirr/neo/go/zodb/storage/_test/_testmain.go:44 +0x1d3
          ==================
      bab79665
  6. 06 Oct, 2017 5 commits
  7. 05 Oct, 2017 8 commits
    • Kirill Smelkov's avatar
      X time z6001 - z600 + prefetch · b0cb5454
      Kirill Smelkov authored
      Max < 3x (nprefetch=8) in go-go case.
      b0cb5454
    • Kirill Smelkov's avatar
      X time z6001 (localhost) +prefetch · 6be37f4e
      Kirill Smelkov authored
      improves max 2x becase probably of go scheduler.
      6be37f4e
    • Kirill Smelkov's avatar
      . · 41d2b5b4
      Kirill Smelkov authored
      41d2b5b4
    • Test's avatar
      X neo1-neo2 time with C-states >C1 (and RX delay) disabled on both machines · 4f8313c1
      Test authored
      Compared to no C-states disabled (but RX delay = 0 as before) it improves:
      
      ping56            36μs  ->    33μs
      ping1472        ~150μs  ->  ~107μs
      
      TCP1             ~65μs  ->   ~45μs
      TCP1472     ~440-515μs  ->  ~420μs    (still very bad)
      TCP4096         ~370μs  ->  ~280μs    (still bad)
      
      ZEO:            ~900μs  ->  ~670μs
      NEO/pylite:    ~1050μs  ->  ~600μs    (Cpy)
      NEO/pylite:     ~800μs  ->  ~505μs    (Cgo)
      NEO/pysql:     ~1700μs  ->  ~900μs    (Cpy)
      NEO/pysql:     ~1300μs  ->  ~790μs    (Cgo)
      NEO/go:         ~730μs  ->  ~430μs    (Cpy)
      NEO/go:         ~370μs  ->  ~210μs    (Cgo)
      NEO/go-nosha1:  ~280μs  ->  ~190μs
      
      Raw TCP timings are still bad.
      4f8313c1
    • Test's avatar
      X neo1 localhost time with C-states >C1 disabled · 8e7527d0
      Test authored
      It improves:
      
      ZEO:             ~500μs ->  ~470μs
      NEO/pylite:      ~640μs ->  ~460μs      (Cpy)
      NEO/pylite:      ~415μs ->  ~350μs      (Cgo)
      NEO/pysql:      ~1300μs ->  ~790μs      (Cpy)
      NEO/pysql:       ~850μs ->  ~650μs      (Cgo)
      NEO/go:          ~350μs ->  ~275μs      (Cpy)
      NEO/go:           ~78μs ->   ~78μs      (Cgo)
      NEO/go-nosha1:    ~45μs ->   ~45μs
      8e7527d0
    • Kirill Smelkov's avatar
      . · 119d2908
      Kirill Smelkov authored
      119d2908
    • Test's avatar
      X time neo1-neo2 · a4e95d4e
      Test authored
      Same as on oct4 - we just add C-states profiling. In particular TCP1472 and TCP4096 are awful.
      C-states are not yet disabled.
      a4e95d4e
    • Test's avatar
      X time on neo1 (localhost) · de5d9c06
      Test authored
      Same as oct04 - we just add C-states profiling. C-states are not yet disabled.
      de5d9c06