  1. 15 Sep, 2024 1 commit
    • wcfs: Switch filesystem to EIO mode on zwatcher failure · e64f0e0b
      Kirill Smelkov authored
      Currently zwatcher failure leads to wcfs starting to provide stale data
      instead of up-to-date data. Fix that by detecting zwatcher failures and
      explicitly switching the filesystem to a mode where any access to
      anything returns "input/output error".
      
      Zwatcher can fail on e.g. failure to retrieve transactions from ZODB
      storage or any other failure. With this patch we make sure this does not
      go unnoticed.
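
The behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not wcfs code (wcfs itself is written in Go): `ToyFS`, `run_zwatcher` and `fetch_transactions` are hypothetical names introduced here. The point is that once the watcher fails, every read reports EIO instead of silently returning possibly stale data.

```python
import errno
import os


class ToyFS:
    """Toy model of a filesystem that is kept up to date by a watcher."""

    def __init__(self, data):
        self._data = dict(data)
        self._eio = False          # set once the watcher has failed

    def run_zwatcher(self, fetch_transactions):
        """Apply updates from the storage; on any failure switch to EIO mode."""
        try:
            for name, value in fetch_transactions():
                self._data[name] = value
        except Exception:
            # Do not keep serving data that may be stale from now on.
            self._eio = True

    def read(self, name):
        if self._eio:
            raise OSError(errno.EIO, os.strerror(errno.EIO))
        return self._data[name]
```

In this model the switch is one-way: nothing resets `_eio`, so the failure cannot go unnoticed by later readers.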
  2. 25 Jun, 2024 1 commit
    • wcfs: tests: Adapt changed modules/methods to Python 3. · 594ff3fa
      Carlos Ramos Carreño authored
      Some modules and methods have changed names in Python 3.
      The `thread` module has been renamed to `_thread` and the old name
      gives an error when run on Python 3:
      
      ```python
      Traceback:
      /opt/slapgrid/b0df76c24a1d2728ccf3e276f07c1790/parts/python3/lib/python3.9/importlib/__init__.py:127: in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
      wcfs/client/client_test.py:32: in <module>
          from wendelin.wcfs.wcfs_test import tDB, tAt, timeout, eprint
      wcfs/wcfs_test.py:44: in <module>
          from thread import get_ident as gettid
      E   ModuleNotFoundError: No module named 'thread'
      ```
      
      In a similar vein, the `items` method of dictionaries plays the same
      role as the old `iteritems`.
      
      We use the `six` module to paper over these differences.
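
A minimal sketch of what such `six`-based imports can look like. The `except ImportError` fallback is an assumption added only so that the snippet also runs where `six` is not installed; it is not part of the change described here.

```python
try:
    # six.moves resolves to `thread` on Python 2 and `_thread` on Python 3.
    from six.moves._thread import get_ident as gettid
    from six import iteritems
except ImportError:
    # Assumption for this sketch only: fall back to the Python 3 names
    # when six is not installed.
    from _thread import get_ident as gettid

    def iteritems(d):
        return iter(d.items())


d = {"a": 1, "b": 2}
pairs = sorted(iteritems(d))   # works the same on Python 2 and Python 3
tid = gettid()                 # identifier of the current thread
```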
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!27
  3. 21 Dec, 2022 1 commit
  4. 21 Jan, 2022 7 commits
    • wcfs: Fix crash if on watch request setupWatch needs to access ZODB · 38dde766
      Kirill Smelkov authored
      The problem is similar to a7bf0311 (wcfs: Fix crash if on invalidation
      handledδZ needs to access ZODB) - I forgot to put zhead's transaction into
      context.
      
      Without the fix, the added test fails as:
      
          wcfs_test.py::test_wcfs_crash_old_data
          ---------------- live log call -----------------
          WARNING  ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.OV0rS6/1.fs
      
          M: commit -> @at0 (03e5a3342bc5ab22)
      
          M: commit -> @at1 (03e5a3342bc88899)
          M:      f<0000000000000002>     [0]
          INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.OV0rS6/1.fs ...
          I0120 17:12:10.274379  704327 wcfs.go:2393] start "/dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59" "file:///tmp/testdb_fs.OV0rS6/1.fs"
          I0120 17:12:10.274409  704327 wcfs.go:2399] (built with go1.17.6)
          W0120 17:12:10.274560  704327 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.OV0rS6/1.fs: raw cache is not ready for invalidations -> NoCache forced
          INFO     wcfs:__init__.py:334 started pid704327 @ /dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59
      
          C: setup watch f<0000000000000002> @at1 (03e5a3342bc88899)
          #  pinok: {}
      
          M: commit -> @at2 (03e5a3342c895777)
          M:      f<0000000000000002>     [1]
      
          M: commit -> @at3 (03e5a3342ca5ef55)
          M:      f<0000000000000002>     [0]
      
          C: setup watch f<0000000000000002> @at2 (03e5a3342c895777)
          #  pinok: {0: @at1 (03e5a3342bc88899)}
          panic: transaction: no current transaction
      
          goroutine 88 [running]:
          lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x969718, 0xc0000b6240})
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
          lab.nexedi.com/kirr/neo/go/transaction.Current(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc0000c25a0, {0x969718, 0xc0000b6240}, 0x4)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc00009dd40, {0x969718, 0xc0000b6240}, 0xc000100540, 0x30, 0x3e5a3342c895777)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
          main.(*WatchLink).setupWatch(0xc0000120a0, {0x969718, 0xc0000b6240}, 0x2, 0x3e5a3342c895777)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
          main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc0000b6240}, {0xc0000a0122, 0x0})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
          main.(*WatchLink).handleWatch(0x0, {0x969718, 0xc0000b6240}, 0x0, {0xc0000a0122, 0x28})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
          main.(*WatchLink)._serve.func3({0x969718, 0xc0000b6240})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e5a3342bc5ab22)
                  @at1 (03e5a3342bc88899) 0
                  @at2 (03e5a3342c895777)   1
                  @at3 (03e5a3342ca5ef55) 0
      
          ----------------------------------------
      
                  # wcfs was crashing in setting up watch because of "1" and "2" from above, and
                  # 3. setupWatch was calling ΔFtail.BlkRevAt without putting zhead's transaction into ctx.
                  wl2 = t.openwatch()
          >       wl2.watch(zf, at2, {0:at1})
    • wcfs: tests: Exercise watching @at0 · 769b1c06
      Kirill Smelkov authored
      Watching with at=tail is inevitable as explained in the previous patch.
    • wcfs: Adjust ΔFtail/ΔBtail to allow point-queries with at=tail · ef10f820
      Kirill Smelkov authored
      This is needed because when e.g. wcfs is just started the coverage of
      ΔFtail is (head,head] i.e. empty, and if user wants to setup a watch
      with at=head, it becomes watch with at=tail. Then that at is used in a
      query and if point-queries with at=tail are disallowed it panics with
      "at out of bounds".
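
The bounds check at issue can be sketched as follows. This is a simplified Python model, not the Go code of ΔFtail/ΔBtail: `check_at` and its `allow_tail` flag are hypothetical names used only to contrast the behaviour before and after this change.

```python
def check_at(at, tail, head, allow_tail):
    """Raise if a point query at `at` falls outside the covered range."""
    lo_ok = (tail <= at) if allow_tail else (tail < at)
    if not (lo_ok and at <= head):
        raise ValueError(
            "at out of bounds: at: @%016x,  (tail, head] = (@%016x, @%016x]"
            % (at, tail, head))


at1 = 0x03e5a31e5e63fa77
tail = head = at1          # coverage right after startup: (head, head]
```

With `allow_tail=False` the query `check_at(at1, tail, head, False)` is rejected because `at1 == tail`; with `allow_tail=True` the same query is accepted.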
      
      This fixes crashes in test_wcfs_watch_setup (see 339f1884 "wcfs: tests:
      Always start tDB with ZBigFile pre-created before WCFS startup") and in
      test_wcfs_crash_old_data (see 97ce5105 "wcfs: tests: Add test do
      demonstrate "at out of bounds" crash on readPinWatchers ->
      ΔFtail.BlkRevAt")
      
      For the reference zodb.ΔTail already allows point queries with at=tail:
      
      https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L202-206
      https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L225-228
    • wcfs: tests: Add test do demonstrate "at out of bounds" crash on readPinWatchers -> ΔFtail.BlkRevAt · 97ce5105
      Kirill Smelkov authored
      The codepath that sends pin messages to watchers on FUSE READ, similarly
      to what was shown in 339f1884, is also vulnerable to an "at out of bounds"
      panic if at=ΔFtail.tail:
      
          wcfs_test.py::test_wcfs_crash_old_data
          ---------------- live log call -----------------
          WARNING  ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.nbSKXu/1.fs
      
          M: commit -> @at0 (03e5a31e5e5ef6bb)
      
          M: commit -> @at1 (03e5a31e5e63fa77)
          M:      f<0000000000000002>     [0]
          INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.nbSKXu/1.fs ...
          I0120 16:50:22.136098  697106 wcfs.go:2393] start "/dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652" "file:///tmp/testdb_fs.nbSKXu/1.fs"
          I0120 16:50:22.136127  697106 wcfs.go:2399] (built with go1.17.6)
          W0120 16:50:22.136233  697106 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.nbSKXu/1.fs: raw cache is not ready for invalidations -> NoCache forced
          INFO     wcfs:__init__.py:334 started pid697106 @ /dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652
      
          C: setup watch f<0000000000000002> @at1 (03e5a31e5e63fa77)
          #  pinok: {}
          panic: at out of bounds: at: @03e5a31e5e63fa77,  (tail, head] = (@03e5a31e5e63fa77, @03e5a31e5e63fa77]
      
          goroutine 7 [running]:
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc0000a5d40, {0x969718, 0xc000076140}, 0xc0001a22a0, 0xc0001c0200, 0x3e5a31e5e63fa77)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
          main.(*BigFile).readPinWatchers(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, 0xffffffffffffffff)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1559 +0x2a5
          main.(*BigFile).readBlk(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, {0xc000320000, 0x200000, 0x0})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1281 +0x4d2
          main.(*BigFile).Read.func1({0x969718, 0xc000076140})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1223 +0x71
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e5a31e5e5ef6bb)
                  @at1 (03e5a31e5e63fa77) 0
      
          ...
      
              @func
              def test_wcfs_crash_old_data():
                  # start wcfs with ΔFtail/ΔBtail not covering that initial data.
                  t = tDB(old_data=[{0:'a'}]); zf = t.zfile; at1 = t.head
                  defer(t.close)
      
                  f = t.open(zf)
      
                  # ΔFtail coverage is currently (at1,at1]
                  wl = t.openwatch()
                  wl.watch(zf, at1, {})
      
                  # wcfs is crashing on readPinWatcher -> ΔFtail.BlkRevAt with
                  #   "at out of bounds: at: @at1,  (tail,head] = (@at1,@at1]
                  # because BlkRevAt(at=tail) query was disallowed.
          >       f.assertBlk(0, 'a')          # [0] becomes tracked
      
      Still also crashing in test_wcfs_watch_setup.
    • wcfs: tests: Move tests for crashing WCFS due to old data to dedicated section · 67519be7
      Kirill Smelkov authored
      Soon this test will also exercise functionality from the isolation protocol,
      and so it will stop being basic.
      
      Move plus rename test_wcfs_basic_invalidation_wo_dFtail_coverage ->
      test_wcfs_crash_old_data.
      
      Still crashing in test_wcfs_watch_setup.
    • wcfs: tests: Teach tDB to create database with initial ZBigFile changes before WCFS is started · 1da89b57
      Kirill Smelkov authored
      This semantically moves initialization code from
      test_wcfs_basic_invalidation_wo_dFtail_coverage (see a7bf0311 "wcfs: Fix
      crash if on invalidation handledδZ needs to access ZODB") to tDB itself,
      and will be useful to exercise similar scenarios in other tests.
      
      Still crashing in test_wcfs_watch_setup.
    • wcfs: tests: Always start tDB with ZBigFile pre-created before WCFS startup · 339f1884
      Kirill Smelkov authored
      This should hopefully exercise codepaths in wcfs.go a bit more for
      mistakes similar to a7bf0311 (wcfs: Fix crash if on invalidation
      handledδZ needs to access ZODB) where the code on server side forgets to
      put zhead's transaction into context.
      
      Currently, because watching @tail is disallowed, this leads to a panic triggered by test_wcfs_watch_setup:
      
          @at0 (03e59e3e606b89bb) -> @at1 (03e59e3e610692bb) -> @at2 (03e59e3e612a5811) -> @at3 (03e59e3e614fa9cc) -> @at4 (03e59e3e6189c3ee) -> @at5 (03e59e3e61af0baa)
      
          C: setup watch f<0000000000000002> @at0 (03e59e3e606b89bb)
          #  pinok: {0: @at0 (03e59e3e606b89bb), 2: @at0 (03e59e3e606b89bb), 3: @at0 (03e59e3e606b89bb), 5: @at0 (03e59e3e606b89bb)}
          panic: at out of bounds: at: @03e59e3e606b89bb,  (tail, head] = (@03e59e3e606b89bb, @03e59e3e61af0baa]
      
          goroutine 187 [running]:
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x969718, 0xc000062940}, 0xc0003060c0, 0x4174f4, 0x3e59e3e606b89bb)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
          main.(*WatchLink).setupWatch(0xc000108050, {0x969718, 0xc000062940}, 0x2, 0x3e59e3e606b89bb)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
          main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc000062940}, {0xc00001c812, 0xa00000})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
          main.(*WatchLink).handleWatch(0x74039b, {0x969718, 0xc000062940}, 0xc0000a4280, {0xc00001c812, 0x28})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
          main.(*WatchLink)._serve.func3({0x969718, 0xc000062940})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e59e3e606b89bb)
                  @at1 (03e59e3e610692bb)     2
                  @at2 (03e59e3e612a5811)     2 3 4 5
                  @at3 (03e59e3e614fa9cc) 0   2     5
                  @at4 (03e59e3e6189c3ee)     2   4 5
                  @at5 (03e59e3e61af0baa)       3   5
      
      However, we will need to allow setting up watches @tail anyway, and so
      we will fix this and other errors in follow-up commits.
      
      NOTE: we don't lose coverage for the case when ZBigFile is created after wcfs
      startup due to test_wcfs_watch_2files, where that scenario is tested.
      
      ΔFtail/ΔBtail tests also exercise ZBigFile/BTree epochs
      (creation/deletion) well.
  5. 19 Jan, 2022 3 commits
  6. 18 Jan, 2022 1 commit
    • wcfs: Fix crash if on invalidation handledδZ needs to access ZODB · a7bf0311
      Kirill Smelkov authored
      The invalidation logic is generally right, but invalidateBlk -> ΔFtail.BlkRevAt
      was being called with ctx without transaction. As a result it was
      panicking with
      
          panic: transaction: no current transaction
      
          goroutine 41 [running]:
          lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x9696d8, 0xc0000d8080})
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
          lab.nexedi.com/kirr/neo/go/transaction.Current(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc00010c640, {0x9696d8, 0xc0000d8080}, 0x4)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x9696d8, 0xc0000d8080}, 0xc000064f60, 0x0, 0x3e5983329bbd100)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
          main.(*BigFile).invalidateBlk.func1(0xc000164400, {0x9696d8, 0xc0000d8080}, 0xc0005a0000, 0x200000, 0x200000, {0xc0005a0000, 0x200000, 0x200000})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1089 +0xb8
          main.(*BigFile).invalidateBlk(0xc000164400, {0x9696d8, 0xc0000d8080}, 0x0)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1105 +0x3bb
          main.(*Root).handleδZ.func3({0x9696d8, 0xc0000d8080})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:898 +0x34
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
      
      on any new change to tracked file block whose previous history is not covered by ΔFtail/ΔBtail.
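
The failure mode can be modelled in Python as below. This is a hedged sketch and not the Go API of the transaction or zodb packages: `Ctx`, `current_txn`, `with_txn` and `blk_rev_at` are hypothetical stand-ins that only mirror the idea that a database query looks up the current transaction in the context it is given.

```python
class Ctx:
    """Minimal stand-in for a context that may carry a transaction."""

    def __init__(self, txn=None):
        self.txn = txn


def current_txn(ctx):
    """Return the transaction carried by ctx, or fail if there is none."""
    if ctx.txn is None:
        raise RuntimeError("transaction: no current transaction")
    return ctx.txn


def with_txn(ctx, txn):
    """Return a new context that carries txn; the original ctx is unchanged."""
    return Ctx(txn=txn)


def blk_rev_at(ctx, blk):
    # Stand-in for a query that has to load objects from the database,
    # and therefore needs the transaction from ctx.
    txn = current_txn(ctx)
    return (txn, blk)
```

Calling `blk_rev_at(Ctx(), 0)` fails with "no current transaction", while `blk_rev_at(with_txn(Ctx(), "zhead.txn"), 0)` succeeds; in this model the fix corresponds to wrapping the context before making the query.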
      
      Problem reported by @Francois.
  7. 12 Nov, 2021 1 commit
    • wcfs: Make sure to remove mountpoint directory on Server.stop · d2fd8b77
      Kirill Smelkov authored
      Otherwise, every time test.py/wcfs is run, several empty directories are left
      in /dev/shm/wcfs, each corresponding to a WCFS server that was
      automatically spawned and stopped at the end of the test. Over time this
      can accumulate to a large number: e.g. ~20000 such directories
      were left on the testnode during the last 6 months.
  8. 28 Oct, 2021 7 commits
    • bigfile/zodb: Teach ZBigFile backend to use WCFS · c5e18c74
      Kirill Smelkov authored
      This is done by using WCFS as mmap-overlay for base data(*). WCFS mode is still
      opt-in; the default remains the old full user-space virtual memory manager
      mode initially introduced in 2015.
      
      Wendelin.core should now be usable in WCFS mode, at draft level.
      
      This patch is organized as follows:
      
      - file_zodb.cpp provides mmap-overlay operations for WCFS implemented via
        WCFS client library.
      - file_zodb.py is adjusted accordingly to use WCFS if requested.
        Low-level things specific to gluing to file_zodb.cpp are moved to _file_zodb.pyx.
      - the rest of the changes are driven by the main ones.
      
      (*) see the following patches for what is mmap-overlay:
      
      - fae045cc  (bigfile/virtmem: Introduce "mmap overlay" mode)
      - 23362204  (bigfile/py: Allow PyBigFile backend to expose "mmap overlay" functionality)
      
      Some preliminary history:
      
      kirr/wendelin.core@01916f09    X Draft demo that reading data through wcfs works
      kirr/wendelin.core@fd58082a    X Fix build on old GCC
      kirr/wendelin.core@f622e751    X tests: Stop wcfs spawned during tests
      kirr/wendelin.core@f118617b    X tests: Don't try to stop wcfs that is already exited
    • wcfs: client: Provide client package to care about isolation protocol details · 10f7153a
      Kirill Smelkov authored
      This patch follows up on the previous patch, which added the server-side part of
      isolation protocol handling, and adds a client package that takes care of
      WCFS isolation protocol details and provides clients with a simple interface to an
      isolated view of bigfile data on WCFS, similar to regular files: given a
      particular revision of database @at, it provides synthetic read-only bigfile
      memory mappings with data corresponding to @at state, but using /head/bigfile/*
      most of the time to build and maintain the mappings.
      
      The patch is organized as follows:
      
      - wcfs.h and wcfs.cpp bring in usage documentation, an internal overview and the
        main part of the implementation.
      
      - wcfs/client/client_test.py contains the tests.
      
      - The rest of the changes in wcfs/client/ are to support the implementation and tests.
      
      Quoting package documentation for reference:
      
      ---- 8< ----
      
      Package wcfs provides WCFS client.
      
      This client package takes care about WCFS isolation protocol details and
      provides to clients simple interface to isolated view of bigfile data on
      WCFS similar to regular files: given a particular revision of database @at,
      it provides synthetic read-only bigfile memory mappings with data
      corresponding to @at state, but using /head/bigfile/* most of the time to
      build and maintain the mappings.
      
      For its data a mapping to bigfile X mostly reuses kernel cache for
      /head/bigfile/X with amount of data not associated with kernel cache for
      /head/bigfile/X being proportional to δ(bigfile/X, at..head). In the usual
      case where many client workers simultaneously serve requests, their database
      views are a bit outdated, but close to head, which means that in practice
      the kernel cache for /head/bigfile/* is being used almost 100% of the time.
      
      A mapping for bigfile X@at is built from OS-level memory mappings of
      on-WCFS files as follows:
      
                                                ___        /@revA/bigfile/X
              __                                           /@revB/bigfile/X
                     _                                     /@revC/bigfile/X
                                 +                         ...
           ───  ───── ──────────────────────────   ─────   /head/bigfile/X
      
      where @revR mmaps are being dynamically added/removed by this client package
      to maintain X@at data view according to WCFS isolation protocol(*).
      
      API overview
      
       - `WCFS` represents filesystem-level connection to wcfs server.
       - `Conn` represents logical connection that provides view of data on wcfs
         filesystem as of particular database state.
       - `FileH` represents isolated file view under Conn.
       - `Mapping` represents one memory mapping of FileH.
      
      A path from WCFS to Mapping is as follows:
      
       WCFS.connect(at)                    -> Conn
       Conn.open(foid)                     -> FileH
       FileH.mmap([blk_start +blk_len))    -> Mapping
      
      A connection can be resynced to another database view via Conn.resync(at').
      
      Documentation for classes provides more thorough overview and API details.
      
      --------
      
      (*) see wcfs.go documentation for WCFS isolation protocol overview and details.
      
      .
      
      Wcfs client organization
      ~~~~~~~~~~~~~~~~~~~~~~~~
      
      Wcfs client provides to its users isolated bigfile views backed by data on
      WCFS filesystem. In the absence of Isolation property, wcfs client would
      reduce to just directly using OS-level file wcfs/head/f for a bigfile f. On
      the other hand there is a simple, but inefficient, way to support isolation:
      for @at database view of bigfile f - directly use OS-level file wcfs/@at/f.
      The latter works, but is very inefficient because OS-cache for f data is not
      shared in between two connections with @at1 and @at2 views. The cache is
      also lost when connection view of the database is resynced on transaction
      boundary. To support isolation efficiently, wcfs client uses wcfs/head/f
      most of the time, but injects wcfs/@revX/f parts into mappings to maintain
      f@at view driven by pin messages that wcfs server sends to client in
      accordance to WCFS isolation protocol(*).
      
      Wcfs server sends pin messages synchronously triggered by access to mmaped
      memory. That means that a client thread, that is accessing wcfs/head/f mmap,
      is completely blocked while wcfs server sends pins and waits to receive acks
      from all clients. In other words on-client handling of pins has to be done
      in separate thread, because wcfs server can also send pins to client that
      triggered the access.
      
      Wcfs client implements pins handling in so-called "pinner" thread(+). The
      pinner thread receives pin requests from wcfs server via watchlink handle
      opened through wcfs/head/watch. For every pin request the pinner finds
      corresponding Mappings and injects wcfs/@revX/f parts via Mapping._remmapblk
      appropriately.
      
      The same watchlink handle is used to send client-originated requests to wcfs
      server. The requests are sent to tell wcfs that client wants to observe a
      particular bigfile as of particular revision, or to stop watching it.
      Such requests originate from regular client threads - not pinner - via entry
      points like Conn.open, Conn.resync and FileH.close.
      
      Every FileH maintains fileh._pinned {} with currently pinned blk -> rev. This
      dict is updated by pinner driven by pin messages, and is used when
      new fileh Mapping is created (FileH.mmap).
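
The role of the pinned dict can be illustrated with a small Python model. This is a sketch under stated assumptions, not the C++ client: `ToyFileH` and `ToyMapping` are hypothetical classes, blocks are plain dict entries instead of memory pages, and unpinning back to head is not modelled. Only the names `_pinned`, `mmap` and `_remmapblk` come from the description above.

```python
ZERO = b"\0"


class ToyMapping:
    def __init__(self, fileh, blk_start, blk_len):
        self.fileh = fileh
        self.blk_start = blk_start
        self.blk_len = blk_len
        self._src = {}   # blk -> rev for blocks remapped to @rev; others use head

    def covers(self, blk):
        return self.blk_start <= blk < self.blk_start + self.blk_len

    def _remmapblk(self, blk, rev):
        # Model of injecting the wcfs/@rev/f part for one block.
        self._src[blk] = rev

    def read(self, blk):
        assert self.covers(blk)
        rev = self._src.get(blk)
        blocks = self.fileh.head if rev is None else self.fileh.revs[rev]
        return blocks.get(blk, ZERO)


class ToyFileH:
    def __init__(self, head, revs):
        self.head = head        # blk -> data, models wcfs/head/f
        self.revs = revs        # rev -> {blk -> data}, models wcfs/@rev/f
        self._pinned = {}       # blk -> rev, currently pinned blocks
        self._mmaps = []

    def pin(self, blk, rev):
        """What the pinner does for one pin message in this model."""
        self._pinned[blk] = rev
        for m in self._mmaps:
            if m.covers(blk):
                m._remmapblk(blk, rev)

    def mmap(self, blk_start, blk_len):
        """Create a mapping and apply pins that are already known."""
        m = ToyMapping(self, blk_start, blk_len)
        for blk, rev in self._pinned.items():
            if m.covers(blk):
                m._remmapblk(blk, rev)
        self._mmaps.append(m)
        return m
```

A mapping created after a pin was received sees the same pinned revision as mappings that existed when the pin arrived, which is why the dict has to be consulted in `mmap`.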
      
      In wendelin.core a bigfile has semantic that it is infinite in size and
      reads as all zeros beyond region initialized with data. Memory-mapping of
      OS-level files can also go beyond file size, however accessing memory
      corresponding to file region after file.size triggers SIGBUS. To preserve
      wendelin.core semantic wcfs client mmaps-in zeros for Mapping regions after
      wcfs/head/f.size. For simplicity it is assumed that bigfiles only grow and
      never shrink. It is indeed currently so, but will have to be revisited
      if/when wendelin.core adds bigfile truncation. Wcfs client restats
      wcfs/head/f at every transaction boundary (Conn.resync) and remembers f.size
      in FileH._headfsize for use during one transaction(%).
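
The zero-fill rule can be stated as a short Python function. This is a model of the semantics only; `read_mapping` is a hypothetical helper that works on a bytes object instead of real memory mappings, with `headfsize` playing the role of the remembered f.size.

```python
def read_mapping(head_data, headfsize, off, n):
    """Return n bytes starting at off; bytes at or after headfsize read as zeros."""
    if off < headfsize:
        in_file = head_data[off:min(off + n, headfsize)]
    else:
        in_file = b""
    # Pad with zeros instead of failing, which models mmapping-in zeros
    # for the region after the file size.
    return in_file + b"\0" * (n - len(in_file))
```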
      
      --------
      
      (*) see wcfs.go documentation for WCFS isolation protocol overview and details.
      (+) currently, for simplicity, there is one pinner thread for each connection.
          In the future, for efficiency, it might be reworked to be one pinner thread
          that serves all connections simultaneously.
      (%) see _headWait comments on how this has to be reworked.
      
      Wcfs client locking organization
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Wcfs client needs to synchronize regular user threads vs each other and vs
      pinner. A major lock Conn.atMu protects changes to Conn's view of
      the database. Whenever atMu.W is taken - Conn.at is changing (Conn.resync),
      and contrary whenever atMu.R is taken - Conn.at is stable (roughly speaking
      Conn.resync is not running).
      
      Similarly to wcfs.go(*) several locks that protect internal data structures
      are minor to Conn.atMu - they need to be taken only under atMu.R (to
      synchronize e.g. multiple fileh open running simultaneously), but do not
      need to be taken at all if atMu.W is taken. In data structures such locks
      are noted as follows
      
           sync::Mutex xMu;    // atMu.W  |  atMu.R + xMu
      
      After atMu, Conn.filehMu protects registry of opened file handles
      (Conn._filehTab), and FileH.mmapMu protects registry of created Mappings
      (FileH.mmaps) and FileH.pinned.
      
      Several locks are RWMutex instead of just Mutex not only to allow more
      concurrency, but, in the first place for correctness: pinner thread being
      core element in handling WCFS isolation protocol, is effectively invoked
      synchronously from other threads via messages coming through wcfs server.
      For example Conn.resync sends watch request to wcfs server and waits for the
      answer. Wcfs server, in turn, might send corresponding pin messages to the
      pinner and _wait_ for the answer before answering to resync:
      
             - - - - - -
            |       .···|·····.        ---->   = request
               pinner <------.↓        <····   = response
            |           |   wcfs
               resync -------^↓
            |      `····|·····
             - - - - - -
            client process
      
      This creates the necessity to use RWMutex for locks that pinner and other
      parts of the code could be using at the same time in synchronous scenarios
      similar to the above. These locks are:
      
           - Conn.atMu
           - Conn.filehMu
      
      Note that FileH.mmapMu is regular - not RW - mutex, since nothing in wcfs
      client calls into wcfs server via watchlink with mmapMu held.
      
      The ordering of locks is:
      
           Conn.atMu > Conn.filehMu > FileH.mmapMu
      
      The pinner takes the following locks:
      
           - wconn.atMu.R
           - wconn.filehMu.R
           - fileh.mmapMu (to read .mmaps  +  write .pinned)
      
      (*) see "Wcfs locking organization" in wcfs.go
      
      Handling of fork
      ~~~~~~~~~~~~~~~~
      
      When a process calls fork, OS copies its memory and creates child process
      with only 1 thread. That child inherits file descriptors and memory mappings
      from parent. To correctly continue using Conn, FileH and Mappings, the child
      must recreate pinner thread and reconnect to wcfs via reopened watchlink.
      The reason here is that without reconnection - by using watchlink file
      descriptor inherited from parent - the child would interfere into
      parent-wcfs exchange and neither parent nor child could continue normal
      protocol communication with WCFS.
      
      For simplicity, since fork is seldom used for things besides followup
      exec, wcfs client currently takes straightforward approach by disabling
      mappings and detaching from WCFS server in the child right after fork. This
      ensures that there is no interference into parent-wcfs exchange should child
      decide not to exec and to continue running in the forked thread. Without
      this protection the interference might come even automatically via e.g.
      Python GC -> PyFileH.__del__ -> FileH.close -> message to WCFS.
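
The "detach in the child right after fork" approach can be sketched in Python with `os.register_at_fork`. This is an illustration only: the wcfs client is implemented in C++, and `ToyConn`, `attached` and `send` are hypothetical names. The sketch assumes a platform where `os.register_at_fork` is available (Python 3.7+ on Unix).

```python
import os


class ToyConn:
    def __init__(self):
        self.attached = True
        # Assumption: os.register_at_fork exists on this platform.
        # Note that the registration keeps a reference to this object.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self._after_fork_child)

    def _after_fork_child(self):
        # Runs in the child after fork: stop using the inherited
        # connection so the child cannot interfere with the parent.
        self.attached = False

    def send(self, msg):
        if not self.attached:
            raise RuntimeError("connection detached after fork")
        return "sent: " + msg
```

In the parent `send` keeps working, while in a forked child every later `send` fails early, including ones triggered implicitly, for example from a finalizer.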
      
      ----------------------------------------
      
      Some preliminary history:
      
      a8fa9178    X wcfs: move client tests into client/
      990afac1    X wcfs/client: Package overview (draft)
      3f83469c    X wcfs: client: Handle fork
      0ed6b8b6    fixup! X wcfs: client: Handle fork
      24378c46    X wcfs: client: Provide Conn.at()
      10f7153a
    • Kirill Smelkov's avatar
      wcfs: Provide isolation to clients · 6f0cdaff
      Kirill Smelkov authored
      Via a custom isolation protocol that both server and clients must cooperatively
      follow. This is the core change that enables the file cache to be shared in
      practice while each client is still provided with an isolated view of the database.
      
      This patch brings only server changes, tests + the minimum client bits to support the tests.
      The client library, which will implement the isolation protocol on the client side, will come next.
      
      This patch is organized as follows:
      
      - wcfs.go brings in description of the protocol, overview of how server
        implements that protocol and the implementation itself.
        See also notes.txt
      
      - wcfs_test.py brings in tests for server implementation.
        tWCFS._abort_ontimeout had to be moved into nogil mode into wcfs_test.pyx
        to avoid deadlock on the GIL (see comments in wcfs_test.pyx for details).
      
      - files added in wcfs/client/ are needed to provide client-side
        implementation of WatchLink - the message exchange protocol over
        opened head/watch file - for tests. Client-side watchlink implementation
        lives in wcfs/client/wcfs_watchlink.{h,cpp}. The other additions in
        wcfs/client/ are to support that and to expose the WatchLink to Python.
      
        Client-side bits are done right in C++ because upcoming WCFS client
        library will be implemented in C++ to work in nogil mode in order to
        avoid deadlock on the GIL because client-side pinner thread might be
        woken-up synchronously by WCFS server at any moment, including when
        another client thread already holds the GIL and is paused by WCFS.
      
      Some preliminary history:
      
      kirr/wendelin.core@9b4a42a3    X invalidation design draftly settled
      kirr/wendelin.core@27d91d47    X δFtail settled
      kirr/wendelin.core@c27c1940    X mmap over under pagefault to this mmapping works
      kirr/wendelin.core@d36b171f    X ptrace when client is under pagefault or syscall won't work
      kirr/wendelin.core@c1f5bb19    X notes on why lazy-invalidate approach was taken
      kirr/wendelin.core@4fbdd270    X Proof that that it is possible to change mmapping while under pagefault to it
      kirr/wendelin.core@33e0dfce    X ΔTail draftly done
      kirr/wendelin.core@12628943    X make sure "bye" is always processed immediately - even if a handleWatch is currently blocked
      kirr/wendelin.core@af0a64cb    X test for "bye" canceling blocked handlers
      kirr/wendelin.core@996dc6a8    X Fix race in test
      kirr/wendelin.core@43915fe9    X wcfs: Don't forbid simultaneous watch requests
      kirr/wendelin.core@941dc54b    X wcfs: threading.Lock -> sync.Mutex
      kirr/wendelin.core@d75b2304    X wcfs: Move _abort_ontimeout to pyx/nogil
      kirr/wendelin.core@79234659    X Notes on why eagier invalidation was rejected
      kirr/wendelin.core@f05271b1    X Test that sysread(/head/watch) can be interrupted
      kirr/wendelin.core@5ba816da    X restore test_wcfs_watch_robust after f05271b1.
      kirr/wendelin.core@4bd88564    X "Invalidation protocol" -> "Isolation protocol"
      kirr/wendelin.core@f7b54ca4    X avoid fmt::vsprintf  (now compils again with latest pygolang@master)
      kirr/wendelin.core@0a8fcd9d    X wcfs/client: Move EOF -> pygolang
      kirr/wendelin.core@153e02e6    X test_wcfs_watch_setup and test_wcfs_watch_setup_ahead work again
      kirr/wendelin.core@17f98edc    X wcfs: client: os: Factor syserr -> string into _sysErrString
      kirr/wendelin.core@7b0c301c    X wcfs: tests: Fix tFile.assertBlk not to segfault on a test failure
      kirr/wendelin.core@b74dda09    X Start switching Track from Track(key) to Track(keycov)
      kirr/wendelin.core@8b5d8523    X Move tracking of which blocks were accessed from wcfs to ΔFtail
      6f0cdaff
    • Kirill Smelkov's avatar
      wcfs: Handle ZODB invalidations · 4430de41
      Kirill Smelkov authored
      Use ΔFtail.Track on every READ, and upon receiving a ZODB invalidation
      query the accumulated ΔFtail about which blocks of which files have
      been changed. Then invalidate those blocks in the OS file cache.
      
      See added documentation to wcfs.go and notes.txt for details.
      
      Now the filesystem is no longer stale: it provides a view of data
      that is up to date with respect to changes on ZODB storage.
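
The track-then-invalidate flow described above can be sketched as a toy model.
It is a simplification under stated assumptions and does not reflect how
ΔFtail is actually implemented: `TrackerSketch` and `FSSketch` are
hypothetical names, each block is assumed to be backed by exactly one ZODB
object, and a dictionary stands in for the OS file cache.

```python
from collections import defaultdict


class TrackerSketch:
    """Toy stand-in for ΔFtail: remembers which objects back which blocks."""

    def __init__(self):
        self._by_oid = defaultdict(set)  # oid -> {(file, blk)}

    def track(self, file, blk, oids):
        for oid in oids:
            self._by_oid[oid].add((file, blk))

    def changed_blocks(self, changed_oids):
        out = set()
        for oid in changed_oids:
            out |= self._by_oid.get(oid, set())
        return out


class FSSketch:
    def __init__(self, storage):
        self.storage = storage  # {(file, blk): (oid, data)}, stand-in for ZODB
        self.cache = {}         # stand-in for the OS file cache
        self.tracker = TrackerSketch()

    def read(self, file, blk):
        # A READ that reaches the filesystem loads the block and tracks it.
        if (file, blk) not in self.cache:
            oid, data = self.storage[(file, blk)]
            self.tracker.track(file, blk, [oid])
            self.cache[(file, blk)] = data
        return self.cache[(file, blk)]

    def on_invalidation(self, changed_oids):
        # Drop only the blocks whose backing objects changed.
        for key in self.tracker.changed_blocks(changed_oids):
            self.cache.pop(key, None)
```

Only blocks that were actually read are tracked, so an invalidation touches
just the cached blocks affected by the change and leaves the rest of the
cache intact.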
      
      Some preliminary history:
      
      kirr/wendelin.core@9b4a42a3    X invalidation design draftly settled
      kirr/wendelin.core@27d91d47    X δFtail settled
      kirr/wendelin.core@33e0dfce    X ΔTail draftly done
      kirr/wendelin.core@822366a7    X keeping fd to root opened prevents the filesystem from being unmounted
      kirr/wendelin.core@89ad3a79    X Don't keep ZBigFile activated during whole current transaction
      kirr/wendelin.core@245511ac    X Give pointer on from where to get nxd-fuse.ko
      kirr/wendelin.core@d1cd128c    X Hit FUSE-related deadlock
      kirr/wendelin.core@d134ee44    X FUSE lookup deadlock should be hopefully fixed
      kirr/wendelin.core@0e60e9ff    X wcfs: Don't noise ZWatcher trace logs with "select ..."
      kirr/wendelin.core@bf9a7405    X No longer rely on ZODB cache invariant for invalidations
      4430de41
    • Kirill Smelkov's avatar
      wcfs: tests: Start verifying state of OS file cache · d81d2cbb
      Kirill Smelkov authored
      For WCFS to be efficient it will have to carefully preserve the OS cache on
      file invalidations. As a preparatory step, establish infrastructure for
      verifying the state of the OS file cache and start asserting on OS cache state
      in a couple of places.
      
      See comments added to the tFile constructor that describe how OS cache state
      verification is set up.
      
      Some preliminary history:
      
      kirr/wendelin.core@8293025b    X Thoughts on how to avoid readahead touching pages of neighbour block
      kirr/wendelin.core@3054e4a3    X not touching neighbour block works via setting MADV_RANDOM in last 1/4 of every block
      kirr/wendelin.core@18362227    X #5 access still triggers read to #4 ?
      kirr/wendelin.core@17dbf94e    X Provide mlock2 fallback for Ubuntu
      kirr/wendelin.core@d134c0b9    X wcfs: test: try to live with only hard memlock limit adjusted
      kirr/wendelin.core@c2423296    X Fix mlock2 build on Debian 8
      d81d2cbb
    • Kirill Smelkov's avatar
      wcfs: Initial implementation of basic filesystem · e3f2ee2d
      Kirill Smelkov authored
      Provide a filesystem view of in-ZODB ZBigFiles, but do not yet implement support
      for invalidations or the isolation protocol. In particular, because ZODB
      invalidations are not yet handled, the filesystem does not update its data in
      accordance with ZODB updates, and instead provides a stale data view that
      corresponds to the state of ZODB at the time when wcfs was mounted.
      
      The main parts of this patch are:
      
      - wcfs/wcfs.go is the filesystem implementation itself together with an overview.
      - wcfs/__init__.py is the Python wrapper to spawn and interoperate with that filesystem.
      - wcfs/wcfs_test.py contains the tests.
      
      Some preliminary history:
      
      kirr/wendelin.core@fe7efb94    X start of wcfs
      kirr/wendelin.core@878b2787    X draft loading
      kirr/wendelin.core@d58c71e8    X don't overalign end by 1 blksize if end is already aligned
      kirr/wendelin.core@29c9f13d    X readBlk: Fix thinko in already case
      kirr/wendelin.core@59552328    X wcfs: Care to disable OS polling on us
      kirr/wendelin.core@c00d94c7    X workaround lack of exception chaining on Python2 with xdefer
      kirr/wendelin.core@0398e23d    X bytearray turned out to be copying data
      kirr/wendelin.core@7a837040    X print wcfs.py py-level traceback on SIGBUS (e.g. wcfs.go aborting due to bug/panic)
      kirr/wendelin.core@661b871f    X make sure tests don't get stuck even if wcfs gets killed -9 ...
      kirr/wendelin.core@2c043d29    X More effort to unmount failed wcfs.go
      kirr/wendelin.core@1ccc4478    X Use `with gil` + regular py code instead of PyGILState_Ensure/PyGILState_Release/PyRun_SimpleString
      kirr/wendelin.core@5dc9c791    X wcfs: Kill xdefer
      kirr/wendelin.core@91e9eba8    X wcfs: test: Register tFile to tDB early
      kirr/wendelin.core@a7138fef    X wcfs: mkdir /tmp/wcfs with sticky bit
      kirr/wendelin.core@1eec76d0    X wcfs: try to set sticky for /tmp/wcfs even if the directory already exists
      kirr/wendelin.core@c2c35851    X wcfs: tests: Factor-out waiting for a general condition to become true into waitfor
      kirr/wendelin.core@78f36993    X wcfs: test: Fix thinko in getting /sys/fs/fuse/connection/<X> for wcfs
      kirr/wendelin.core@bc9eb16f    X wcfs: tests: Don't use testmntpt everywhere
      kirr/wendelin.core@6dec74e7    X wcfs: tests: Split tDB into -> tDB + tWCFS
      kirr/wendelin.core@3a6bd764    X wcfs: tests: Run `fusermount -u` the second time if we had to kill wcfs
      kirr/wendelin.core@112720f3    X wcfs: tests: Print which files are still opened on wcfs if `fusermount -u` fails
      kirr/wendelin.core@bb40185b    X wcfs: Take $WENDELIN_CORE_WCFS_OPTIONS into account not only from under join
      kirr/wendelin.core@03a9ef33    X wcfs: Remove credentials from zurl when computing wcfs mountpoint
      kirr/wendelin.core@68ee5bdc    X wcfs: lsof tweaks
      kirr/wendelin.core@21671879    X wcfs: Teach entrypoint frontend to handle subcommands: serve, status, stop
      kirr/wendelin.core@b0642b80    X wcfs: Switch mountpoints from /tmp/wcfs/* to /dev/shm/*
      kirr/wendelin.core@b0ca031f    X wcfs: Teach join/serve to start successfully even after unclean wcfs shutdown
      kirr/wendelin.core@5bfa8cf8    X wcfs: Add start to spawn a Server that can be later stopped  (draft)
      kirr/wendelin.core@5fcec261    X wcfs: Run fusermount and friends with /bin:/usr/bin always on path
      kirr/wendelin.core@669d7a20    fixup! X wcfs: Run fusermount and friends with /bin:/usr/bin always on path
      kirr/wendelin.core@6b22f8c4    X wcfs: Teach start to start successfully even after unclean wcfs shutdown
      kirr/wendelin.core@15389db0    X wcfs: Tune _fuse_unmount to include `fusermount -u` error message into raised exception
      kirr/wendelin.core@153c002a    X wcfs: _fuse_unmount: Try first `kill -TERM` before `kill -QUIT` wcfs
      kirr/wendelin.core@3244f3a6    X wcfs: lsof +D misbehaves - don't use it
      kirr/wendelin.core@a126e709    X wcfs: Put client log into its own logger
      kirr/wendelin.core@ac303d1e    X wcfs: tests: -v  ->  show only wcfs.py logs verbosely
      kirr/wendelin.core@d671a9e9    X wcfs: Give more time to stop wcfs server
      e3f2ee2d
    • Kirill Smelkov's avatar
      wcfs: Initial stub · 2163fcaf
      Kirill Smelkov authored
      Add initial stub for WCFS program and tests.
      WCFS functionality will be added step-by-step in follow-up commits.
      
      Some preliminary history:
      
      kirr/wendelin.core@0ae88a32       X .nxdtest: Verify Go bits with GOMAXPROCS=1,2,`nproc`
      kirr/wendelin.core@23528eb4       X wcfs: make it to use go modules for dependencies
      2163fcaf
  9. 24 Oct, 2017 1 commit
    • Kirill Smelkov's avatar
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source... · f11386a4
      Kirill Smelkov authored
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.
      
      Nexedi stack is licensed under Free Software licenses with various exceptions
      that cover three business cases:
      
      - Free Software
      - Proprietary Software
      - Rebranding
      
      As long as one intends to develop Free Software based on Nexedi stack, no
      license cost is involved. Developing proprietary software based on Nexedi stack
      may require a proprietary exception license. Rebranding Nexedi stack is
      prohibited unless rebranding license is acquired.
      
      Through this licensing approach, Nexedi expects to encourage Free Software
      development without restrictions and at the same time create a framework for
      proprietary software to contribute to the long term sustainability of the
      Nexedi stack.
      
      Please see https://www.nexedi.com/licensing for details, rationale and options.
      f11386a4
  10. 03 Apr, 2015 2 commits