- 15 Sep, 2024 1 commit
-
-
Kirill Smelkov authored
Currently a zwatcher failure leads to wcfs serving stale data instead of up-to-date data. Fix that by detecting zwatcher failures and explicitly switching the filesystem into a mode where any access to anything returns "input/output error". The zwatcher can fail, for example, on a failure to retrieve transactions from the ZODB storage, or for any other reason. With this patch we make sure such failures do not go unnoticed.
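As a rough client-side illustration (a hedged sketch, not code from this patch; the mountpoint path and /head/bigfile/* layout are assumptions based on the file naming used elsewhere in this history), after the switch every access is expected to fail with EIO:

```python
import errno

def wcfs_failed(mnt):
    """Return True if wcfs mounted at `mnt` has switched to the EIO failure mode."""
    try:
        # hypothetical path of one bigfile under the wcfs mountpoint
        with open(mnt + "/head/bigfile/0000000000000002", "rb") as f:
            f.read(512)
    except IOError as e:
        # once zwatcher failure is detected, any access returns "input/output error"
        return e.errno == errno.EIO
    return False
```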
-
- 25 Jun, 2024 1 commit
-
-
Carlos Ramos Carreño authored
Some modules and methods have changed names in Python 3. The `thread` module has been renamed to `_thread` and the old name gives error when run on Python 3: ```python Traceback: /opt/slapgrid/b0df76c24a1d2728ccf3e276f07c1790/parts/python3/lib/python3.9/importlib/__init__.py:127: in import_module return _bootstrap._gcd_import(name[level:], package, level) wcfs/client/client_test.py:32: in <module> from wendelin.wcfs.wcfs_test import tDB, tAt, timeout, eprint wcfs/wcfs_test.py:44: in <module> from thread import get_ident as gettid E ModuleNotFoundError: No module named 'thread' ``` In a similar vein, the `items` method of dictionaries plays the same role as the old `iteritems`. We use the `six` module to paper over these differences. /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!27
-
- 21 Dec, 2022 1 commit
-
-
Kirill Smelkov authored
The function to do it was there, but I missed adding the corresponding defer. Fixes 6f0cdaff (wcfs: Provide isolation to clients).
-
- 21 Jan, 2022 7 commits
-
-
Kirill Smelkov authored
The problem is similar to a7bf0311 (wcfs: Fix crash if on invalidation handledδZ needs to access ZODB) - I forgot to put zhead's transaction into context. Without the fix the added test fails as:

    wcfs_test.py::test_wcfs_crash_old_data
    ---------------- live log call -----------------
    WARNING  ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.OV0rS6/1.fs
    M: commit -> @at0 (03e5a3342bc5ab22)
    M: commit -> @at1 (03e5a3342bc88899)
    M:      f<0000000000000002>     [0]
    INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.OV0rS6/1.fs ...
    I0120 17:12:10.274379  704327 wcfs.go:2393] start "/dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59" "file:///tmp/testdb_fs.OV0rS6/1.fs"
    I0120 17:12:10.274409  704327 wcfs.go:2399] (built with go1.17.6)
    W0120 17:12:10.274560  704327 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.OV0rS6/1.fs: raw cache is not ready for invalidations -> NoCache forced
    INFO     wcfs:__init__.py:334 started pid704327 @ /dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59
    C: setup watch f<0000000000000002> @at1 (03e5a3342bc88899)  # pinok: {}
    M: commit -> @at2 (03e5a3342c895777)
    M:      f<0000000000000002>     [1]
    M: commit -> @at3 (03e5a3342ca5ef55)
    M:      f<0000000000000002>     [0]
    C: setup watch f<0000000000000002> @at2 (03e5a3342c895777)  # pinok: {0: @at1 (03e5a3342bc88899)}

    panic: transaction: no current transaction

    goroutine 88 [running]:
    lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x969718, 0xc0000b6240})
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
    lab.nexedi.com/kirr/neo/go/transaction.Current(...)
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
    lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
    lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc0000c25a0, {0x969718, 0xc0000b6240}, 0x4)
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
    lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc00009dd40, {0x969718, 0xc0000b6240}, 0xc000100540, 0x30, 0x3e5a3342c895777)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
    main.(*WatchLink).setupWatch(0xc0000120a0, {0x969718, 0xc0000b6240}, 0x2, 0x3e5a3342c895777)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
    main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc0000b6240}, {0xc0000a0122, 0x0})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
    main.(*WatchLink).handleWatch(0x0, {0x969718, 0xc0000b6240}, 0x0, {0xc0000a0122, 0x28})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
    main.(*WatchLink)._serve.func3({0x969718, 0xc0000b6240})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
    lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
    created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

    >>> Change history by file:

    f<0000000000000002>:
                                            0 1 2 3 4 5 6 7     a b c d e f g h
            @at0 (03e5a3342bc5ab22)
            @at1 (03e5a3342bc88899)         0
            @at2 (03e5a3342c895777)           1
            @at3 (03e5a3342ca5ef55)         0

    ----------------------------------------

    # wcfs was crashing in setting up watch because of "1" and "2" from above, and
    # 3. setupWatch was calling ΔFtail.BlkRevAt without putting zhead's transaction into ctx.
            wl2 = t.openwatch()
    >       wl2.watch(zf, at2, {0:at1})
-
Kirill Smelkov authored
Watching with at=tail is inevitable as explained in the previous patch.
-
Kirill Smelkov authored
This is needed because when e.g. wcfs has just started, the coverage of ΔFtail is (head,head], i.e. empty, and if a user wants to set up a watch with at=head, that becomes a watch with at=tail. That at is then used in a query, and if point queries with at=tail are disallowed it panics with "at out of bounds". This fixes crashes in test_wcfs_watch_setup (see 339f1884 "wcfs: tests: Always start tDB with ZBigFile pre-created before WCFS startup") and in test_wcfs_crash_old_data (see 97ce5105 "wcfs: tests: Add test do demonstrate "at out of bounds" crash on readPinWatchers -> ΔFtail.BlkRevAt"). For reference, zodb.ΔTail already allows point queries with at=tail:
https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L202-206
https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L225-228
-
Kirill Smelkov authored
The codepath that sends pin messages to watchers on FUSE READ, similarly to what was shown in 339f1884, is also vulnerable to an "at out of bounds" panic if at=ΔFtail.tail:

    wcfs_test.py::test_wcfs_crash_old_data
    ---------------- live log call -----------------
    WARNING  ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.nbSKXu/1.fs
    M: commit -> @at0 (03e5a31e5e5ef6bb)
    M: commit -> @at1 (03e5a31e5e63fa77)
    M:      f<0000000000000002>     [0]
    INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.nbSKXu/1.fs ...
    I0120 16:50:22.136098  697106 wcfs.go:2393] start "/dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652" "file:///tmp/testdb_fs.nbSKXu/1.fs"
    I0120 16:50:22.136127  697106 wcfs.go:2399] (built with go1.17.6)
    W0120 16:50:22.136233  697106 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.nbSKXu/1.fs: raw cache is not ready for invalidations -> NoCache forced
    INFO     wcfs:__init__.py:334 started pid697106 @ /dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652
    C: setup watch f<0000000000000002> @at1 (03e5a31e5e63fa77)  # pinok: {}

    panic: at out of bounds: at: @03e5a31e5e63fa77,  (tail, head] = (@03e5a31e5e63fa77, @03e5a31e5e63fa77]

    goroutine 7 [running]:
    lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
    lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc0000a5d40, {0x969718, 0xc000076140}, 0xc0001a22a0, 0xc0001c0200, 0x3e5a31e5e63fa77)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
    main.(*BigFile).readPinWatchers(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, 0xffffffffffffffff)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1559 +0x2a5
    main.(*BigFile).readBlk(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, {0xc000320000, 0x200000, 0x0})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1281 +0x4d2
    main.(*BigFile).Read.func1({0x969718, 0xc000076140})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1223 +0x71
    lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
    created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

    >>> Change history by file:

    f<0000000000000002>:
                                            0 1 2 3 4 5 6 7     a b c d e f g h
            @at0 (03e5a31e5e5ef6bb)
            @at1 (03e5a31e5e63fa77)         0
    ...

    @func
    def test_wcfs_crash_old_data():
        # start wcfs with ΔFtail/ΔBtail not covering that initial data.
        t = tDB(old_data=[{0:'a'}]); zf = t.zfile; at1 = t.head
        defer(t.close)

        f = t.open(zf)

        # ΔFtail coverage is currently (at1,at1]
        wl = t.openwatch()
        wl.watch(zf, at1, {})

        # wcfs is crashing on readPinWatcher -> ΔFtail.BlkRevAt with
        # "at out of bounds: at: @at1,  (tail,head] = (@at1,@at1]
        # because BlkRevAt(at=tail) query was disallowed.
    >   f.assertBlk(0, 'a')     # [0] becomes tracked

Still also crashing in test_wcfs_watch_setup.
-
Kirill Smelkov authored
Soon this test will also exercise functionality from the isolation protocol, and so it will no longer be basic. Move and rename test_wcfs_basic_invalidation_wo_dFtail_coverage -> test_wcfs_crash_old_data. Still crashing in test_wcfs_watch_setup.
-
Kirill Smelkov authored
This semantically moves initialization code from test_wcfs_basic_invalidation_wo_dFtail_coverage (see a7bf0311 "wcfs: Fix crash if on invalidation handledδZ needs to access ZODB") to tDB itself, and will be useful to exercise similar scenarios in other tests. Still crashing in test_wcfs_watch_setup.
-
Kirill Smelkov authored
This should hopefully exercise codepaths in wcfs.go a bit more for mistakes similar to a7bf0311 (wcfs: Fix crash if on invalidation handledδZ needs to access ZODB), where the code on the server side forgets to put zhead's transaction into context.

Currently, because watching @tail is disallowed, this leads to a panic triggered by test_wcfs_watch_setup:

    @at0 (03e59e3e606b89bb) -> @at1 (03e59e3e610692bb) -> @at2 (03e59e3e612a5811) -> @at3 (03e59e3e614fa9cc) -> @at4 (03e59e3e6189c3ee) -> @at5 (03e59e3e61af0baa)

    C: setup watch f<0000000000000002> @at0 (03e59e3e606b89bb)  # pinok: {0: @at0 (03e59e3e606b89bb), 2: @at0 (03e59e3e606b89bb), 3: @at0 (03e59e3e606b89bb), 5: @at0 (03e59e3e606b89bb)}

    panic: at out of bounds: at: @03e59e3e606b89bb,  (tail, head] = (@03e59e3e606b89bb, @03e59e3e61af0baa]

    goroutine 187 [running]:
    lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
    lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x969718, 0xc000062940}, 0xc0003060c0, 0x4174f4, 0x3e59e3e606b89bb)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
    main.(*WatchLink).setupWatch(0xc000108050, {0x969718, 0xc000062940}, 0x2, 0x3e59e3e606b89bb)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
    main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc000062940}, {0xc00001c812, 0xa00000})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
    main.(*WatchLink).handleWatch(0x74039b, {0x969718, 0xc000062940}, 0xc0000a4280, {0xc00001c812, 0x28})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
    main.(*WatchLink)._serve.func3({0x969718, 0xc000062940})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
    lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
    created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

    >>> Change history by file:

    f<0000000000000002>:
                                            0 1 2 3 4 5 6 7     a b c d e f g h
            @at0 (03e59e3e606b89bb)
            @at1 (03e59e3e610692bb)             2
            @at2 (03e59e3e612a5811)             2 3 4 5
            @at3 (03e59e3e614fa9cc)         0   2     5
            @at4 (03e59e3e6189c3ee)             2   4 5
            @at5 (03e59e3e61af0baa)               3   5

However, we will anyway need to allow setting up watches @tail next, so we will be fixing this and other errors in followup commits.

NOTE: we don't lose coverage for the case when ZBigFile is created after wcfs startup, because test_wcfs_watch_2files tests that scenario. ΔFtail/ΔBtail tests also exercise ZBigFile/BTree epochs (creation/deletion) well.
-
- 19 Jan, 2022 3 commits
-
-
Kirill Smelkov authored
tDB.commit always creates exactly one transaction, so wcfs should be expected to catch up with only that single transaction -> no need to loop. There is also no need to keep tDB._wc_zheadv, as we have information about all committed transactions in t.dFtail.
-
Kirill Smelkov authored
.commit is the only caller of ._wcsync. .commit is also the only place via which tests are intended to modify ZODB.
-
Kirill Smelkov authored
- .commit performs ZODB commit and synchronizes WCFS to database changes;
- ._commit performs ZODB commit without WCFS synchronization.

We will soon need ._commit to create initial revisions for ZBigFile while WCFS is not yet started.
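A rough sketch of how such a split could look inside the test harness (a hypothetical simplification for illustration; the real tDB in wcfs_test.py has more bookkeeping, and the zstor/wait_for helpers below are assumptions):

```python
import transaction

class tDB(object):
    """Hypothetical, simplified sketch of the test-harness commit helpers."""

    def __init__(self, zstor, wc):
        self.zstor = zstor   # ZODB storage under test (assumed attribute)
        self.wc    = wc      # handle to the WCFS server, or None if not started

    def _commit(self):
        # Plain ZODB commit without WCFS synchronization;
        # usable even before the WCFS server is started.
        transaction.commit()
        return self.zstor.lastTransaction()

    def _wcsync(self, head):
        # Wait until WCFS reports it has caught up to `head`
        # (placeholder for the real synchronization logic).
        self.wc.wait_for(head)

    def commit(self):
        # ZODB commit + synchronize WCFS to the committed state.
        head = self._commit()
        self._wcsync(head)
        return head
```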
-
- 18 Jan, 2022 1 commit
-
-
Kirill Smelkov authored
The invalidation logic is generally right, but invalidateBlk -> ΔFtail.BlkRevAt was being called with a ctx without transaction. As a result it was panicking as

    panic: transaction: no current transaction

    goroutine 41 [running]:
    lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x9696d8, 0xc0000d8080})
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
    lab.nexedi.com/kirr/neo/go/transaction.Current(...)
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
    lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
    lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc00010c640, {0x9696d8, 0xc0000d8080}, 0x4)
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
    lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x9696d8, 0xc0000d8080}, 0xc000064f60, 0x0, 0x3e5983329bbd100)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
    main.(*BigFile).invalidateBlk.func1(0xc000164400, {0x9696d8, 0xc0000d8080}, 0xc0005a0000, 0x200000, 0x200000, {0xc0005a0000, 0x200000, 0x200000})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1089 +0xb8
    main.(*BigFile).invalidateBlk(0xc000164400, {0x9696d8, 0xc0000d8080}, 0x0)
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1105 +0x3bb
    main.(*Root).handleδZ.func3({0x9696d8, 0xc0000d8080})
            /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:898 +0x34
    lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
    created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
            /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

on any new change to a tracked file block whose previous history is not covered by ΔFtail/ΔBtail.

Problem reported by @Francois.
-
- 12 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
Otherwise, every time test.py/wcfs is run, several empty directories are left in /dev/shm/wcfs - each corresponding to a WCFS server that was automatically spawned and stopped at the end of the test. Over time this can accumulate to a large number; e.g. ~20000 such directories were left on the testnode during the last 6 months.
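A minimal sketch of the cleanup idea from the Python side (illustrative only; the actual commit may implement this elsewhere, e.g. in the server shutdown path, and the stop_and_cleanup helper is hypothetical): after the filesystem is unmounted, remove the now-empty per-server mountpoint directory.

```python
import errno
import os
import subprocess

def stop_and_cleanup(mntpt):
    """Unmount wcfs at mntpt and remove the leftover mountpoint directory."""
    subprocess.check_call(["fusermount", "-u", mntpt])
    try:
        os.rmdir(mntpt)            # the mountpoint is empty once unmounted
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise                  # already gone is fine; anything else is not
```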
-
- 28 Oct, 2021 7 commits
-
-
Kirill Smelkov authored
By using WCFS as mmap-overlay for base data(*). WCFS-mode is still opt-in, with the default remaining the old full user-space virtual memory manager mode as initially introduced in 2015. Wendelin.core should be draftly usable in WCFS mode now.

This patch is organized as follows:

- file_zodb.cpp provides mmap-overlay operations for WCFS implemented via the WCFS client library.
- file_zodb.py is adjusted accordingly to use WCFS if requested. Low-level things specific to gluing to file_zodb.cpp are moved to _file_zodb.pyx.
- the rest of the changes are drive-by adjustments accompanying the main ones.

(*) see the following patches for what mmap-overlay is:

- fae045cc (bigfile/virtmem: Introduce "mmap overlay" mode)
- 23362204 (bigfile/py: Allow PyBigFile backend to expose "mmap overlay" functionality)

Some preliminary history:

kirr/wendelin.core@01916f09    X Draft demo that reading data through wcfs works
kirr/wendelin.core@fd58082a    X Fix build on old GCC
kirr/wendelin.core@f622e751    X tests: Stop wcfs spawned during tests
kirr/wendelin.core@f118617b    X tests: Don't try to stop wcfs that is already exited
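For illustration, opting in could look roughly like this from a user program (a hedged sketch: the environment variable name WENDELIN_CORE_VIRTMEM and its value are assumptions to verify against wendelin.core's documentation, not something stated in this commit message):

```python
import os

# Assumed opt-in switch: serve bigfile reads through WCFS (mmap-overlay),
# keep writes going through the old user-space virtual memory manager.
os.environ.setdefault("WENDELIN_CORE_VIRTMEM", "r:wcfs+w:uvmm")

from wendelin.bigfile.file_zodb import ZBigFile  # noqa: E402

# ... open the database and use ZBigFile as usual; when the option is
#     honored, reads are served via WCFS mmap-overlay.
```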
-
Kirill Smelkov authored
This patch follows up on the previous patch, which added the server-side part of isolation protocol handling, and adds a client package that takes care of the WCFS isolation protocol details and provides to clients a simple interface to an isolated view of bigfile data on WCFS similar to regular files: given a particular revision of database @at, it provides synthetic read-only bigfile memory mappings with data corresponding to @at state, but using /head/bigfile/* most of the time to build and maintain the mappings.

The patch is organized as follows:

- wcfs.h and wcfs.cpp bring in usage documentation, internal overview and the main part of the implementation.
- wcfs/client/client_test.py is tests.
- The rest of the changes in wcfs/client/ are to support the implementation and tests.

Quoting package documentation for the reference:

---- 8< ----

Package wcfs provides WCFS client.

This client package takes care about WCFS isolation protocol details and provides to clients simple interface to isolated view of bigfile data on WCFS similar to regular files: given a particular revision of database @at, it provides synthetic read-only bigfile memory mappings with data corresponding to @at state, but using /head/bigfile/* most of the time to build and maintain the mappings.

For its data a mapping to bigfile X mostly reuses kernel cache for /head/bigfile/X with amount of data not associated with kernel cache for /head/bigfile/X being proportional to δ(bigfile/X, at..head). In the usual case where many client workers simultaneously serve requests, their database views are a bit outdated, but close to head, which means that in practice the kernel cache for /head/bigfile/* is being used almost 100% of the time.

A mapping for bigfile X@at is built from OS-level memory mappings of on-WCFS files as follows:

      ___          /@revA/bigfile/X
           __      /@revB/bigfile/X
      _            /@revC/bigfile/X
                   + ...
    ───  ─────  ──────────────────────────  ─────
                   /head/bigfile/X

where @revR mmaps are being dynamically added/removed by this client package to maintain X@at data view according to WCFS isolation protocol(*).

API overview

- `WCFS` represents filesystem-level connection to wcfs server.
- `Conn` represents logical connection that provides view of data on wcfs filesystem as of particular database state.
- `FileH` represent isolated file view under Conn.
- `Mapping` represents one memory mapping of FileH.

A path from WCFS to Mapping is as follows:

    WCFS.connect(at)                   -> Conn
    Conn.open(foid)                    -> FileH
    FileH.mmap([blk_start +blk_len))   -> Mapping

A connection can be resynced to another database view via Conn.resync(at').

Documentation for classes provides more thorough overview and API details.

--------
(*) see wcfs.go documentation for WCFS isolation protocol overview and details.

Wcfs client organization
~~~~~~~~~~~~~~~~~~~~~~~~

Wcfs client provides to its users isolated bigfile views backed by data on WCFS filesystem. In the absence of Isolation property, wcfs client would reduce to just directly using OS-level file wcfs/head/f for a bigfile f.

On the other hand there is a simple, but inefficient, way to support isolation: for @at database view of bigfile f - directly use OS-level file wcfs/@at/f. The latter works, but is very inefficient because OS-cache for f data is not shared in between two connections with @at1 and @at2 views. The cache is also lost when connection view of the database is resynced on transaction boundary.
To support isolation efficiently, wcfs client uses wcfs/head/f most of the time, but injects wcfs/@revX/f parts into mappings to maintain f@at view driven by pin messages that wcfs server sends to client in accordance to WCFS isolation protocol(*).

Wcfs server sends pin messages synchronously triggered by access to mmaped memory. That means that a client thread, that is accessing wcfs/head/f mmap, is completely blocked while wcfs server sends pins and waits to receive acks from all clients. In other words on-client handling of pins has to be done in separate thread, because wcfs server can also send pins to client that triggered the access.

Wcfs client implements pins handling in so-called "pinner" thread(+). The pinner thread receives pin requests from wcfs server via watchlink handle opened through wcfs/head/watch. For every pin request the pinner finds corresponding Mappings and injects wcfs/@revX/f parts via Mapping._remmapblk appropriately.

The same watchlink handle is used to send client-originated requests to wcfs server. The requests are sent to tell wcfs that client wants to observe a particular bigfile as of particular revision, or to stop watching it. Such requests originate from regular client threads - not pinner - via entry points like Conn.open, Conn.resync and FileH.close.

Every FileH maintains fileh._pinned {} with currently pinned blk -> rev. This dict is updated by pinner driven by pin messages, and is used when new fileh Mapping is created (FileH.mmap).

In wendelin.core a bigfile has semantic that it is infinite in size and reads as all zeros beyond region initialized with data. Memory-mapping of OS-level files can also go beyond file size, however accessing memory corresponding to file region after file.size triggers SIGBUS. To preserve wendelin.core semantic wcfs client mmaps-in zeros for Mapping regions after wcfs/head/f.size. For simplicity it is assumed that bigfiles only grow and never shrink. It is indeed currently so, but will have to be revisited if/when wendelin.core adds bigfile truncation. Wcfs client restats wcfs/head/f at every transaction boundary (Conn.resync) and remembers f.size in FileH._headfsize for use during one transaction(%).

--------
(*) see wcfs.go documentation for WCFS isolation protocol overview and details.
(+) currently, for simplicity, there is one pinner thread for each connection. In the future, for efficiency, it might be reworked to be one pinner thread that serves all connections simultaneously.
(%) see _headWait comments on how this has to be reworked.

Wcfs client locking organization

Wcfs client needs to synchronize regular user threads vs each other and vs pinner. A major lock Conn.atMu protects updates to changes to Conn's view of the database. Whenever atMu.W is taken - Conn.at is changing (Conn.resync), and contrary whenever atMu.R is taken - Conn.at is stable (roughly speaking Conn.resync is not running).

Similarly to wcfs.go(*) several locks that protect internal data structures are minor to Conn.atMu - they need to be taken only under atMu.R (to synchronize e.g. multiple fileh open running simultaneously), but do not need to be taken at all if atMu.W is taken. In data structures such locks are noted as follows

    sync::Mutex xMu;    // atMu.W | atMu.R + xMu

After atMu, Conn.filehMu protects registry of opened file handles (Conn._filehTab), and FileH.mmapMu protects registry of created Mappings (FileH.mmaps) and FileH.pinned.
Several locks are RWMutex instead of just Mutex not only to allow more concurrency, but, in the first place, for correctness: the pinner thread, being the core element in handling the WCFS isolation protocol, is effectively invoked synchronously from other threads via messages coming through the wcfs server. For example Conn.resync sends a watch request to the wcfs server and waits for the answer. The wcfs server, in turn, might send corresponding pin messages to the pinner and _wait_ for the answer before answering to resync:

     - - - - - -
    |   .···|·····.        ---->  = request
    |   pinner <------.↓   <····  = response
    |       |         |
    |  resync -------^↓    wcfs
    |   `····|·····
     - - - - - -
    client process

This creates the necessity to use RWMutex for locks that the pinner and other parts of the code could be using at the same time in synchronous scenarios similar to the above. These locks are:

- Conn.atMu
- Conn.filehMu

Note that FileH.mmapMu is a regular - not RW - mutex, since nothing in wcfs client calls into wcfs server via watchlink with mmapMu held.

The ordering of locks is:

    Conn.atMu > Conn.filehMu > FileH.mmapMu

The pinner takes the following locks:

- wconn.atMu.R
- wconn.filehMu.R
- fileh.mmapMu (to read .mmaps + write .pinned)

(*) see "Wcfs locking organization" in wcfs.go

Handling of fork

When a process calls fork, the OS copies its memory and creates a child process with only 1 thread. That child inherits file descriptors and memory mappings from the parent. To correctly continue using Conn, FileH and Mappings, the child must recreate the pinner thread and reconnect to wcfs via a reopened watchlink. The reason here is that without reconnection - by using the watchlink file descriptor inherited from the parent - the child would interfere in the parent-wcfs exchange and neither parent nor child could continue normal protocol communication with WCFS.

For simplicity, since fork is seldom used for things besides a followup exec, wcfs client currently takes the straightforward approach of disabling mappings and detaching from the WCFS server in the child right after fork. This ensures that there is no interference into the parent-wcfs exchange should the child decide not to exec and to continue running in the forked thread. Without this protection the interference might come even automatically via e.g. Python GC -> PyFileH.__del__ -> FileH.close -> message to WCFS.

----------------------------------------

Some preliminary history:

a8fa9178    X wcfs: move client tests into client/
990afac1    X wcfs/client: Package overview (draft)
3f83469c    X wcfs: client: Handle fork
0ed6b8b6    fixup! X wcfs: client: Handle fork
24378c46    X wcfs: client: Provide Conn.at()
-
Kirill Smelkov authored
Via custom isolation protocol that both server and clients must cooperatively follow. This is the core change that enables file cache to be practically shared while each client can still be provided with isolated view of the database.

This patch brings only server changes, tests + the minimum client bits to support the tests. The client library, that will implement isolation protocol on client side, will come next.

This patch is organized as follows:

- wcfs.go brings in description of the protocol, overview of how server implements that protocol and the implementation itself. See also notes.txt
- wcfs_test.py brings in tests for server implementation. tWCFS._abort_ontimeout had to be moved into nogil mode into wcfs_test.pyx to avoid deadlock on the GIL (see comments in wcfs_test.pyx for details).
- files added in wcfs/client/ are needed to provide client-side implementation of WatchLink - the message exchange protocol over opened head/watch file - for tests. Client-side watchlink implementation lives in wcfs/client/wcfs_watchlink.{h,cpp}. The other additions in wcfs/client/ are to support that and to expose the WatchLink to Python. Client-side bits are done right in C++ because the upcoming WCFS client library will be implemented in C++ to work in nogil mode in order to avoid deadlock on the GIL, because the client-side pinner thread might be woken-up synchronously by the WCFS server at any moment, including when another client thread already holds the GIL and is paused by WCFS.

Some preliminary history:

kirr/wendelin.core@9b4a42a3    X invalidation design draftly settled
kirr/wendelin.core@27d91d47    X δFtail settled
kirr/wendelin.core@c27c1940    X mmap over under pagefault to this mmapping works
kirr/wendelin.core@d36b171f    X ptrace when client is under pagefault or syscall won't work
kirr/wendelin.core@c1f5bb19    X notes on why lazy-invalidate approach was taken
kirr/wendelin.core@4fbdd270    X Proof that that it is possible to change mmapping while under pagefault to it
kirr/wendelin.core@33e0dfce    X ΔTail draftly done
kirr/wendelin.core@12628943    X make sure "bye" is always processed immediately - even if a handleWatch is currently blocked
kirr/wendelin.core@af0a64cb    X test for "bye" canceling blocked handlers
kirr/wendelin.core@996dc6a8    X Fix race in test
kirr/wendelin.core@43915fe9    X wcfs: Don't forbid simultaneous watch requests
kirr/wendelin.core@941dc54b    X wcfs: threading.Lock -> sync.Mutex
kirr/wendelin.core@d75b2304    X wcfs: Move _abort_ontimeout to pyx/nogil
kirr/wendelin.core@79234659    X Notes on why eagier invalidation was rejected
kirr/wendelin.core@f05271b1    X Test that sysread(/head/watch) can be interrupted
kirr/wendelin.core@5ba816da    X restore test_wcfs_watch_robust after f05271b1.
kirr/wendelin.core@4bd88564    X "Invalidation protocol" -> "Isolation protocol"
kirr/wendelin.core@f7b54ca4    X avoid fmt::vsprintf (now compils again with latest pygolang@master)
kirr/wendelin.core@0a8fcd9d    X wcfs/client: Move EOF -> pygolang
kirr/wendelin.core@153e02e6    X test_wcfs_watch_setup and test_wcfs_watch_setup_ahead work again
kirr/wendelin.core@17f98edc    X wcfs: client: os: Factor syserr -> string into _sysErrString
kirr/wendelin.core@7b0c301c    X wcfs: tests: Fix tFile.assertBlk not to segfault on a test failure
kirr/wendelin.core@b74dda09    X Start switching Track from Track(key) to Track(keycov)
kirr/wendelin.core@8b5d8523    X Move tracking of which blocks were accessed from wcfs to ΔFtail
-
Kirill Smelkov authored
Use ΔFtail.Track on every READ, and, upon receiving a ZODB invalidation, query the accumulated ΔFtail about which blocks of which files have been changed. Then invalidate those blocks in the OS file cache. See the documentation added to wcfs.go and notes.txt for details.

Now the filesystem is no longer stale: it provides a view of data that is up-to-date wrt changes on ZODB storage.

Some preliminary history:

kirr/wendelin.core@9b4a42a3    X invalidation design draftly settled
kirr/wendelin.core@27d91d47    X δFtail settled
kirr/wendelin.core@33e0dfce    X ΔTail draftly done
kirr/wendelin.core@822366a7    X keeping fd to root opened prevents the filesystem from being unmounted
kirr/wendelin.core@89ad3a79    X Don't keep ZBigFile activated during whole current transaction
kirr/wendelin.core@245511ac    X Give pointer on from where to get nxd-fuse.ko
kirr/wendelin.core@d1cd128c    X Hit FUSE-related deadlock
kirr/wendelin.core@d134ee44    X FUSE lookup deadlock should be hopefully fixed
kirr/wendelin.core@0e60e9ff    X wcfs: Don't noise ZWatcher trace logs with "select ..."
kirr/wendelin.core@bf9a7405    X No longer rely on ZODB cache invariant for invalidations
-
Kirill Smelkov authored
For WCFS to be efficient it will have to carefully preserve the OS cache on file invalidations. As a preparatory step, establish infrastructure for verifying the state of the OS file cache and start asserting on OS cache state in a couple of places. See the comments added to the tFile constructor that describe how OS cache state verification is set up.

Some preliminary history:

kirr/wendelin.core@8293025b    X Thoughts on how to avoid readahead touching pages of neighbour block
kirr/wendelin.core@3054e4a3    X not touching neighbour block works via setting MADV_RANDOM in last 1/4 of every block
kirr/wendelin.core@18362227    X #5 access still triggers read to #4 ?
kirr/wendelin.core@17dbf94e    X Provide mlock2 fallback for Ubuntu
kirr/wendelin.core@d134c0b9    X wcfs: test: try to live with only hard memlock limit adjusted
kirr/wendelin.core@c2423296    X Fix mlock2 build on Debian 8
-
Kirill Smelkov authored
Provide a filesystem view of in-ZODB ZBigFiles, but do not implement support for invalidations nor the isolation protocol yet. In particular, because ZODB invalidations are not yet handled, the filesystem does not update its data in accordance with ZODB updates, and instead provides a stale data view that corresponds to the state of ZODB at the time when wcfs was mounted.

The main parts of this patch are:

- wcfs/wcfs.go is the filesystem implementation itself together with an overview.
- wcfs/__init__.py is a python wrapper to spawn and interoperate with that filesystem.
- wcfs/wcfs_test.py is tests.

Some preliminary history:

kirr/wendelin.core@fe7efb94    X start of wcfs
kirr/wendelin.core@878b2787    X draft loading
kirr/wendelin.core@d58c71e8    X don't overalign end by 1 blksize if end is already aligned
kirr/wendelin.core@29c9f13d    X readBlk: Fix thinko in already case
kirr/wendelin.core@59552328    X wcfs: Care to disable OS polling on us
kirr/wendelin.core@c00d94c7    X workaround lack of exception chaining on Python2 with xdefer
kirr/wendelin.core@0398e23d    X bytearray turned out to be copying data
kirr/wendelin.core@7a837040    X print wcfs.py py-level traceback on SIGBUS (e.g. wcfs.go aborting due to bug/panic)
kirr/wendelin.core@661b871f    X make sure tests don't get stuck even if wcfs gets killed -9 ...
kirr/wendelin.core@2c043d29    X More effort to unmount failed wcfs.go
kirr/wendelin.core@1ccc4478    X Use `with gil` + regular py code instead of PyGILState_Ensure/PyGILState_Release/PyRun_SimpleString
kirr/wendelin.core@5dc9c791    X wcfs: Kill xdefer
kirr/wendelin.core@91e9eba8    X wcfs: test: Register tFile to tDB early
kirr/wendelin.core@a7138fef    X wcfs: mkdir /tmp/wcfs with sticky bit
kirr/wendelin.core@1eec76d0    X wcfs: try to set sticky for /tmp/wcfs even if the directory already exists
kirr/wendelin.core@c2c35851    X wcfs: tests: Factor-out waiting for a general condition to become true into waitfor
kirr/wendelin.core@78f36993    X wcfs: test: Fix thinko in getting /sys/fs/fuse/connection/<X> for wcfs
kirr/wendelin.core@bc9eb16f    X wcfs: tests: Don't use testmntpt everywhere
kirr/wendelin.core@6dec74e7    X wcfs: tests: Split tDB into -> tDB + tWCFS
kirr/wendelin.core@3a6bd764    X wcfs: tests: Run `fusermount -u` the second time if we had to kill wcfs
kirr/wendelin.core@112720f3    X wcfs: tests: Print which files are still opened on wcfs if `fusermount -u` fails
kirr/wendelin.core@bb40185b    X wcfs: Take $WENDELIN_CORE_WCFS_OPTIONS into account not only from under join
kirr/wendelin.core@03a9ef33    X wcfs: Remove credentials from zurl when computing wcfs mountpoint
kirr/wendelin.core@68ee5bdc    X wcfs: lsof tweaks
kirr/wendelin.core@21671879    X wcfs: Teach entrypoint frontend to handle subcommands: serve, status, stop
kirr/wendelin.core@b0642b80    X wcfs: Switch mountpoints from /tmp/wcfs/* to /dev/shm/*
kirr/wendelin.core@b0ca031f    X wcfs: Teach join/serve to start successfully even after unclean wcfs shutdown
kirr/wendelin.core@5bfa8cf8    X wcfs: Add start to spawn a Server that can be later stopped (draft)
kirr/wendelin.core@5fcec261    X wcfs: Run fusermount and friends with /bin:/usr/bin always on path
kirr/wendelin.core@669d7a20    fixup! X wcfs: Run fusermount and friends with /bin:/usr/bin always on path
kirr/wendelin.core@6b22f8c4    X wcfs: Teach start to start successfully even after unclean wcfs shutdown
kirr/wendelin.core@15389db0    X wcfs: Tune _fuse_unmount to include `fusermount -u` error message into raised exception
kirr/wendelin.core@153c002a    X wcfs: _fuse_unmount: Try first `kill -TERM` before `kill -QUIT` wcfs
kirr/wendelin.core@3244f3a6    X wcfs: lsof +D misbehaves - don't use it
kirr/wendelin.core@a126e709    X wcfs: Put client log into its own logger
kirr/wendelin.core@ac303d1e    X wcfs: tests: -v -> show only wcfs.py logs verbosely
kirr/wendelin.core@d671a9e9    X wcfs: Give more time to stop wcfs server
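A rough usage sketch from the Python side (hedged: wcfs.join comes from the wcfs/__init__.py wrapper mentioned above, but its exact signature and the mountpoint/close attribute names are assumptions for illustration):

```python
from wendelin import wcfs

# Mount (or attach to an already running) wcfs serving the given ZODB storage.
# The zurl identifies the storage, e.g. a FileStorage data file.
wc = wcfs.join("file:///path/to/1.fs")

# Files are exposed under the mountpoint; head/ gives the current view
# (stale at this stage of the history, since invalidations are not handled yet).
path = wc.mountpoint + "/head/bigfile/0000000000000002"
with open(path, "rb") as f:
    blk0 = f.read(2*1024*1024)   # read the first 2MB block

wc.close()
```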
-
Kirill Smelkov authored
Add initial stub for WCFS program and tests. WCFS functionality will be added step-by-step in follow-up commits.

Some preliminary history:

kirr/wendelin.core@0ae88a32    X .nxdtest: Verify Go bits with GOMAXPROCS=1,2,`nproc`
kirr/wendelin.core@23528eb4    X wcfs: make it to use go modules for dependencies
-
- 24 Oct, 2017 1 commit
-
-
Kirill Smelkov authored
Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.

Nexedi stack is licensed under Free Software licenses with various exceptions that cover three business cases:

- Free Software
- Proprietary Software
- Rebranding

As long as one intends to develop Free Software based on Nexedi stack, no license cost is involved. Developing proprietary software based on Nexedi stack may require a proprietary exception license. Rebranding Nexedi stack is prohibited unless rebranding license is acquired.

Through this licensing approach, Nexedi expects to encourage Free Software development without restrictions and at the same time create a framework for proprietary software to contribute to the long term sustainability of the Nexedi stack.

Please see https://www.nexedi.com/licensing for details, rationale and options.
-
- 03 Apr, 2015 2 commits
-
-
Kirill Smelkov authored
Exposes BigFile - this way users can define a BigFile backend in Python. Also exposed are BigFile handles, and VMA objects which are the results of mmapping.
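A minimal sketch of what defining a backend in Python looks like (hedged: the class/method names - BigFile with loadblk/storeblk overrides, fileh_open() and VMA via mmap - follow the wendelin.bigfile API as I recall it and should be checked against the module's documentation):

```python
from wendelin.bigfile import BigFile

class ZeroFile(BigFile):
    """Toy backend: every block reads as zeros; writes are discarded."""

    def __new__(cls, blksize):
        return BigFile.__new__(cls, blksize)

    def loadblk(self, blk, buf):
        # fill buf with the content of block #blk
        mv = memoryview(buf)
        mv[:] = b'\0' * len(mv)

    def storeblk(self, blk, buf):
        # persist block #blk from buf (no-op for this toy backend)
        pass

f   = ZeroFile(blksize=2*1024*1024)
fh  = f.fileh_open()        # BigFile handle
vma = fh.mmap(0, 4)         # map pages [0, 4) of the file
mem = memoryview(vma)       # VMA exposes its memory via the buffer interface
assert mem[0] == 0          # page fault -> loadblk fills the block with zeros
```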
-
Kirill Smelkov authored
This will be a module to allow users to program custom file-like backends and later memory-map the content of files from the backend. For now just start the module structure.
-