- 27 Jan, 2022 1 commit
-
-
Kirill Smelkov authored
* master: wcfs: client: Adjust to be forward-compatible with upcoming pygolang changes wcfs: client: Adjust naming for os::File to match upcoming pygolang
-
- 26 Jan, 2022 2 commits
-
-
Kirill Smelkov authored
Don't use golang::* namespaces, to avoid clashes and compilation errors when pygolang adds something in there, e.g. when pygolang!17 lands. -> Use xgolang:: as the top-level namespace for what previously lived in golang:: shipped in wendelin.core. Inside that xgolang:: namespace don't use the same package names that might be used inside golang:: in pygolang. This avoids ambiguity and compile errors in the future on e.g. os::AfterFork if `using namespace golang` and `using namespace xgolang` are both active. -> Also prefix all namespaces inside xgolang:: with "x".
-
Kirill Smelkov authored
See nexedi/pygolang!17
-
- 21 Jan, 2022 9 commits
-
-
Kirill Smelkov authored
* master: wcfs: Fix crash if on watch request setupWatch needs to access ZODB wcfs: zdata: ΔFtail tests: Fix/Adjust debug dump for computed blkRevAt wcfs: tests: Exercise watching @at0 wcfs: Adjust ΔFtail/ΔBtail to allow point-queries with at=tail wcfs: tests: Add test do demonstrate "at out of bounds" crash on readPinWatchers -> ΔFtail.BlkRevAt wcfs: tests: Move tests for crashing WCFS due to old data to dedicated section wcfs: tests: Teach tDB to create database with initial ZBigFile changes before WCFS is started wcfs: tests: Always start tDB with ZBigFile pre-created before WCFS startup wcfs: tests: Simplify syncing WCFS to database in tDB.commit wcfs: tests: Inline tDB._wcsync into tDB.commit wcfs: tests: Split tDB.commit into .commit and ._commit
-
Kirill Smelkov authored
The problem is similar to a7bf0311 (wcfs: Fix crash if on invalidation handledδZ needs to access ZODB) - I forgot to put zhead's transaction into context. Without the fix, the added test fails as:

wcfs_test.py::test_wcfs_crash_old_data
---------------- live log call -----------------
WARNING ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.OV0rS6/1.fs
M: commit -> @at0 (03e5a3342bc5ab22)
M: commit -> @at1 (03e5a3342bc88899)
M: f<0000000000000002> [0]
INFO wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.OV0rS6/1.fs ...
I0120 17:12:10.274379 704327 wcfs.go:2393] start "/dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59" "file:///tmp/testdb_fs.OV0rS6/1.fs"
I0120 17:12:10.274409 704327 wcfs.go:2399] (built with go1.17.6)
W0120 17:12:10.274560 704327 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.OV0rS6/1.fs: raw cache is not ready for invalidations -> NoCache forced
INFO wcfs:__init__.py:334 started pid704327 @ /dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59
C: setup watch f<0000000000000002> @at1 (03e5a3342bc88899)
# pinok: {}
M: commit -> @at2 (03e5a3342c895777)
M: f<0000000000000002> [1]
M: commit -> @at3 (03e5a3342ca5ef55)
M: f<0000000000000002> [0]
C: setup watch f<0000000000000002> @at2 (03e5a3342c895777)
# pinok: {0: @at1 (03e5a3342bc88899)}
panic: transaction: no current transaction

goroutine 88 [running]:
lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x969718, 0xc0000b6240})
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
lab.nexedi.com/kirr/neo/go/transaction.Current(...)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc0000c25a0, {0x969718, 0xc0000b6240}, 0x4)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc00009dd40, {0x969718, 0xc0000b6240}, 0xc000100540, 0x30, 0x3e5a3342c895777)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
main.(*WatchLink).setupWatch(0xc0000120a0, {0x969718, 0xc0000b6240}, 0x2, 0x3e5a3342c895777)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc0000b6240}, {0xc0000a0122, 0x0})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
main.(*WatchLink).handleWatch(0x0, {0x969718, 0xc0000b6240}, 0x0, {0xc0000a0122, 0x28})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
main.(*WatchLink)._serve.func3({0x969718, 0xc0000b6240})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

>>> Change history by file:

f<0000000000000002>:
                          0 1 2 3 4 5 6 7
                          a b c d e f g h
  @at0 (03e5a3342bc5ab22)
  @at1 (03e5a3342bc88899) 0
  @at2 (03e5a3342c895777)   1
  @at3 (03e5a3342ca5ef55) 0

----------------------------------------

    # wcfs was crashing in setting up watch because of "1" and "2" from above, and
    # 3. setupWatch was calling ΔFtail.BlkRevAt without putting zhead's transaction into ctx.
    wl2 = t.openwatch()
>   wl2.watch(zf, at2, {0:at1})
-
Kirill Smelkov authored
- put into if block to avoid collision with already-defined-elsewhere blkv
- show revisions in symbolic form

Noticed while working on recent change to allow ΔFtail/ΔBtail point-queries with at=tail.
-
Kirill Smelkov authored
Watching with at=tail is inevitable as explained in the previous patch.
-
Kirill Smelkov authored
This is needed because, e.g., when wcfs has just started, the coverage of ΔFtail is (head,head], i.e. empty, and if a user wants to set up a watch with at=head, it becomes a watch with at=tail. That at is then used in a query, and if point-queries with at=tail are disallowed, the query panics with "at out of bounds". This fixes crashes in test_wcfs_watch_setup (see 339f1884 "wcfs: tests: Always start tDB with ZBigFile pre-created before WCFS startup") and in test_wcfs_crash_old_data (see 97ce5105 "wcfs: tests: Add test do demonstrate "at out of bounds" crash on readPinWatchers -> ΔFtail.BlkRevAt"). For reference, zodb.ΔTail already allows point queries with at=tail: https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L202-206 https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L225-228
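The bounds rule discussed in this message can be illustrated with a minimal Python sketch. It is not the actual ΔFtail.BlkRevAt implementation: the function name `blk_rev_at`, the `allow_tail` flag and the `(rev, exact)` return shape are hypothetical and chosen only for illustration, and revisions are modelled as plain integers.

```python
def blk_rev_at(changes, tail, head, at, allow_tail=True):
    """Return (rev, exact) for the last revision <= at at which a block changed.

    Hypothetical model: `changes` is an ascending list of revisions in
    (tail, head] at which the block changed. If no covered change is <= at,
    the result is (tail, False), i.e. only "changed at or before tail" is known.
    """
    # Before the fix the lower bound was strict (tail < at); allowing
    # at == tail is what makes a point-query with at=tail legal.
    lower_ok = (tail <= at) if allow_tail else (tail < at)
    if not (lower_ok and at <= head):
        raise ValueError(
            "at out of bounds: at: @%016x, (tail, head] = (@%016x, @%016x]"
            % (at, tail, head))
    for rev in reversed(changes):
        if rev <= at:
            return rev, True
    return tail, False


# wcfs has just started: coverage is (head, head], so a watch with at=head
# is effectively a watch with at=tail.
at1 = 0x03e5a31e5e63fa77
print(blk_rev_at([], tail=at1, head=at1, at=at1))
```

With `allow_tail=False` the same call raises with a message of the same shape as the "at out of bounds" panic quoted in this log.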
-
Kirill Smelkov authored
The codepath that sends pin messages to watchers on FUSE READ is, similarly to what was shown in 339f1884, also vulnerable to the "at out of bounds" panic if at=ΔFtail.tail:

wcfs_test.py::test_wcfs_crash_old_data
---------------- live log call -----------------
WARNING ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.nbSKXu/1.fs
M: commit -> @at0 (03e5a31e5e5ef6bb)
M: commit -> @at1 (03e5a31e5e63fa77)
M: f<0000000000000002> [0]
INFO wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.nbSKXu/1.fs ...
I0120 16:50:22.136098 697106 wcfs.go:2393] start "/dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652" "file:///tmp/testdb_fs.nbSKXu/1.fs"
I0120 16:50:22.136127 697106 wcfs.go:2399] (built with go1.17.6)
W0120 16:50:22.136233 697106 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.nbSKXu/1.fs: raw cache is not ready for invalidations -> NoCache forced
INFO wcfs:__init__.py:334 started pid697106 @ /dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652
C: setup watch f<0000000000000002> @at1 (03e5a31e5e63fa77)
# pinok: {}
panic: at out of bounds: at: @03e5a31e5e63fa77, (tail, head] = (@03e5a31e5e63fa77, @03e5a31e5e63fa77]

goroutine 7 [running]:
lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc0000a5d40, {0x969718, 0xc000076140}, 0xc0001a22a0, 0xc0001c0200, 0x3e5a31e5e63fa77)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
main.(*BigFile).readPinWatchers(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, 0xffffffffffffffff)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1559 +0x2a5
main.(*BigFile).readBlk(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, {0xc000320000, 0x200000, 0x0})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1281 +0x4d2
main.(*BigFile).Read.func1({0x969718, 0xc000076140})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1223 +0x71
lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

>>> Change history by file:

f<0000000000000002>:
                          0 1 2 3 4 5 6 7
                          a b c d e f g h
  @at0 (03e5a31e5e5ef6bb)
  @at1 (03e5a31e5e63fa77) 0

...

    @func
    def test_wcfs_crash_old_data():
        # start wcfs with ΔFtail/ΔBtail not covering that initial data.
        t = tDB(old_data=[{0:'a'}]); zf = t.zfile; at1 = t.head
        defer(t.close)

        f = t.open(zf)

        # ΔFtail coverage is currently (at1,at1]
        wl = t.openwatch()
        wl.watch(zf, at1, {})

        # wcfs is crashing on readPinWatcher -> ΔFtail.BlkRevAt with
        # "at out of bounds: at: @at1, (tail,head] = (@at1,@at1]
        # because BlkRevAt(at=tail) query was disallowed.
>       f.assertBlk(0, 'a') # [0] becomes tracked

Still also crashing in test_wcfs_watch_setup.
-
Kirill Smelkov authored
Soon this test will also exercise functionality from the isolation protocol, and so it will stop being basic. Move and rename test_wcfs_basic_invalidation_wo_dFtail_coverage -> test_wcfs_crash_old_data. Still crashing in test_wcfs_watch_setup.
-
Kirill Smelkov authored
This semantically moves initialization code from test_wcfs_basic_invalidation_wo_dFtail_coverage (see a7bf0311 "wcfs: Fix crash if on invalidation handledδZ needs to access ZODB") to tDB itself, and will be useful to exercise similar scenarios in other tests. Still crashing in test_wcfs_watch_setup.
-
Kirill Smelkov authored
This should hopefully exercise codepaths in wcfs.go a bit more for mistakes similar to a7bf0311 (wcfs: Fix crash if on invalidation handledδZ needs to access ZODB) where the code on server side forgets to put zhead's transaction into context. Currently, because watching @tail is disallowed, this leads to a panic triggered by test_wcfs_watch_setup:

@at0 (03e59e3e606b89bb) -> @at1 (03e59e3e610692bb) -> @at2 (03e59e3e612a5811) -> @at3 (03e59e3e614fa9cc) -> @at4 (03e59e3e6189c3ee) -> @at5 (03e59e3e61af0baa)
C: setup watch f<0000000000000002> @at0 (03e59e3e606b89bb)
# pinok: {0: @at0 (03e59e3e606b89bb), 2: @at0 (03e59e3e606b89bb), 3: @at0 (03e59e3e606b89bb), 5: @at0 (03e59e3e606b89bb)}
panic: at out of bounds: at: @03e59e3e606b89bb, (tail, head] = (@03e59e3e606b89bb, @03e59e3e61af0baa]

goroutine 187 [running]:
lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x969718, 0xc000062940}, 0xc0003060c0, 0x4174f4, 0x3e59e3e606b89bb)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
main.(*WatchLink).setupWatch(0xc000108050, {0x969718, 0xc000062940}, 0x2, 0x3e59e3e606b89bb)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc000062940}, {0xc00001c812, 0xa00000})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
main.(*WatchLink).handleWatch(0x74039b, {0x969718, 0xc000062940}, 0xc0000a4280, {0xc00001c812, 0x28})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
main.(*WatchLink)._serve.func3({0x969718, 0xc000062940})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

>>> Change history by file:

f<0000000000000002>:
                          0 1 2 3 4 5 6 7
                          a b c d e f g h
  @at0 (03e59e3e606b89bb)
  @at1 (03e59e3e610692bb)     2
  @at2 (03e59e3e612a5811)     2 3 4 5
  @at3 (03e59e3e614fa9cc) 0   2     5
  @at4 (03e59e3e6189c3ee)     2   4 5
  @at5 (03e59e3e61af0baa)       3   5

However, we will need to allow setting up watches @tail anyway, and so we will be fixing this and other errors in follow-up commits.

NOTE: we don't lose coverage for the case when ZBigFile is created after wcfs startup, because test_wcfs_watch_2files tests that scenario. ΔFtail/ΔBtail tests also exercise ZBigFile/BTree epochs (creation/deletion) well.
-
- 19 Jan, 2022 5 commits
-
-
Kirill Smelkov authored
tDB.commit always creates only one transaction and so wcfs should be expected to catch up with only that single one -> no need to loop. No need to keep tDB._wc_zheadv as we have information about all committed transactions in t.dFtail.
-
Kirill Smelkov authored
.commit is the only caller of ._wcsync. .commit is also the only place via which tests are intended to modify ZODB.
-
Kirill Smelkov authored
- .commit performs ZODB commit and synchronizes WCFS to database changes;
- ._commit performs ZODB commit without WCFS synchronization.

We will soon need ._commit to create initial revisions for ZBigFile while WCFS is not yet started.
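The split between the two methods can be sketched as follows. This is a toy model, not the real tDB: the class `TDBSketch`, its attributes and `start_wcfs` are hypothetical, revisions are small integers, and "synchronizing WCFS" is reduced to updating one field.

```python
class TDBSketch:
    """Hypothetical stand-in for tDB; revisions are small integers."""

    def __init__(self):
        self.revisions = []        # every revision committed to the database
        self.wcfs_started = False
        self.wcfs_head = None      # revision WCFS is synchronized to, once started

    def _commit(self, changes):
        """Commit to the database only, without WCFS synchronization."""
        at = len(self.revisions) + 1
        self.revisions.append((at, dict(changes)))
        return at

    def start_wcfs(self):
        """Model WCFS startup: it sees the database as of the latest revision."""
        self.wcfs_started = True
        self.wcfs_head = self.revisions[-1][0] if self.revisions else 0

    def commit(self, changes):
        """Commit to the database and synchronize WCFS to that change."""
        assert self.wcfs_started, "commit() expects WCFS to be running"
        at = self._commit(changes)
        self.wcfs_head = at        # stands in for waiting until WCFS catches up
        return at


t = TDBSketch()
at1 = t._commit({0: "a"})   # initial ZBigFile revision, WCFS not yet started
t.start_wcfs()
at2 = t.commit({0: "b"})    # regular test commit, WCFS follows
```

Keeping the unsynchronized variant private mirrors the intent stated above: tests are meant to go through `.commit`, while `._commit` only serves to prepare data before WCFS exists.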
-
Kirill Smelkov authored
* master: wcfs: Setup basic logging for warnings/errors to go to stderr when invoked as e.g. `wcfs serve` wcfs: Fix crash if on invalidation handledδZ needs to access ZODB t/qemu-runlinux: Use multidevs=remaps for 9P setup fixup! *: Use defer for dbclose & friends wcfs: client: tests: Turn SIGSEGV in tMapping.assertBlk into exception wcfs: Server.stop: Don't log "after SIGTERM" when first wait for wcfs.go exit failed wcfs: Server.stop: Don't report first unmount failure to outside
-
Kirill Smelkov authored
If we do not set up logging explicitly, it will print only No handlers could be found for logger "wcfs" instead of emitting useful details before e.g. RuntimeError: fuse_unmount /dev/shm/wcfs/<X>: failed: fusermount: failed to unmount /dev/shm/wcfs/<X>: Device or resource busy (more details logged) -> Fix it.
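A minimal sketch of such a setup with Python's standard `logging` module is shown below. The helper name `setup_logging` and the message format are assumptions made for illustration; this is not the code used by `wcfs serve`.

```python
import logging
import sys


def setup_logging(stream=sys.stderr):
    """Send warnings and errors from all loggers to `stream` (stderr by default).

    Hypothetical helper: attaches one handler to the root logger so that
    records from named loggers such as "wcfs" propagate to it.
    """
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.WARNING)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logging.getLogger().addHandler(handler)
    return handler


if __name__ == "__main__":
    setup_logging()
    logging.getLogger("wcfs").warning("details that would otherwise be lost")
```

The "No handlers could be found" message quoted above is what Python 2's logging emits when a record is logged and no handler is configured; Python 3 falls back to a last-resort stderr handler instead, so explicit setup matters most on Python 2.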
-
- 18 Jan, 2022 1 commit
-
-
Kirill Smelkov authored
The invalidation logic is generally right, but invalidateBlk -> ΔFtail.BlkRevAt was being called with a ctx without a transaction. As a result it was panicking as

panic: transaction: no current transaction

goroutine 41 [running]:
lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x9696d8, 0xc0000d8080})
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
lab.nexedi.com/kirr/neo/go/transaction.Current(...)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc00010c640, {0x9696d8, 0xc0000d8080}, 0x4)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x9696d8, 0xc0000d8080}, 0xc000064f60, 0x0, 0x3e5983329bbd100)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
main.(*BigFile).invalidateBlk.func1(0xc000164400, {0x9696d8, 0xc0000d8080}, 0xc0005a0000, 0x200000, 0x200000, {0xc0005a0000, 0x200000, 0x200000})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1089 +0xb8
main.(*BigFile).invalidateBlk(0xc000164400, {0x9696d8, 0xc0000d8080}, 0x0)
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1105 +0x3bb
main.(*Root).handleδZ.func3({0x9696d8, 0xc0000d8080})
        /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:898 +0x34
lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92

on any new change to a tracked file block whose previous history is not covered by ΔFtail/ΔBtail. Problem reported by @Francois.
-
- 26 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
Fixes the following warning that started to appear: kirr@deca:~/src/wendelin/wendelin.core/t$ ./qemu-runlinux -g /home/kirr/src/linux/obj-qemu_debug/arch/x86/boot/bzImage /bin/bash qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'! See https://wiki.qemu.org/Documentation/9psetup for documentation of multidevs option.
-
- 23 Nov, 2021 4 commits
-
-
Kirill Smelkov authored
In 5c8340d2 we said: dbclose now uses defer almost everywhere - there are still few places in tests, where one test function is opening/closing test database multiple times - those were not (yet ?) converted. Let's convert those remaining places now, because when wendelin.core tests are run wrt plain ZODB4 (contrary to ZODB4-wc2), many tests fail at fileh_open time, e.g. @func def test_bigfile_filezodb_fileh_gc(): root1= dbopen() conn1= root1._p_jar db = conn1.db() defer(db.close) root1['zfile4'] = f1 = ZBigFile(blksize) transaction.commit() > fh1 = f1.fileh_open() bigfile/tests/test_filezodb.py:588: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ bigfile/file_zodb.py:603: in fileh_open fileh = _ZBigFileH(self, _use_wcfs) bigfile/file_zodb.py:664: in __init__ self.zfileh = zfile._v_file.fileh_open(use_wcfs) bigfile/_file_zodb.pyx:112: in wendelin.bigfile._file_zodb._ZBigFile.fileh_open pywconn = wczsync.pywconnOf(zconn) wcfs/client/_wczsync.pyx:56: in wendelin.wcfs.client._wczsync.pywconnOf wconn = wc.connect(zconn_at(zconn)) lib/zodb.py:163: in zconn_at "nexedi/ZODB!1") _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ patch = 'conn:MVCC-via-loadBefore-only', details_link = 'nexedi/ZODB!1' def _zassertHasNXDPatch(patch, details_link): if not _zhasNXDPatch(patch): raise AssertionError( "ZODB%s is not patched with required Nexedi patch %r\n\tSee %s for details" % > (zmajor, patch, details_link)) E AssertionError: ZODB4 is not patched with required Nexedi patch 'conn:MVCC-via-loadBefore-only' E See nexedi/ZODB!1 for details and DB is left unclosed. 
This change should reduce, if not completely fix, the number of leaked /tmp/testdb_* directories for Wendelin.core.UnitTest-ZODB4(xfail) testsuite.
-
Kirill Smelkov authored
When WCFS-mmapped memory is accessed, it can get SIGBUS on IO error (and automatically on WCFS crash), and SIGSEGV when the accessed client mapping is closed. tFile.assertBlk in wcfs_test.py already converts SIGSEGV into a python exception when accessing an on-wcfs file's block. However tMapping.assertBlk was not doing so, which, instead of providing proper details, leads to test crashes if something goes wrong. For example, when wendelin.core tests are run wrt plain ZODB4 (contrary to ZODB4-wc2, see nexedi/ZODB!1 and nexedi/slapos@e256ed97), it first fails in pinner and then gets SIGSEGV on data access, because, to mimic SIGBUS on EIO, pinner shuts down all mappings on its failure:

https://lab.nexedi.com/nexedi/wendelin.core/blob/49f826b1/wcfs/client/wcfs.cpp#L477-501
https://nexedijs.erp5.net/#/test_result_module/20211118-7C45220A/25

-> Fix it by wrapping test block access with the appropriate read_exfault variant.

Before this patch:

.../wendelin.core$ WENDELIN_CORE_TEST_DB='<zeo>' WENDELIN_CORE_VIRTMEM='r:wcfs+w:uvmm' python -m pytest -vsx wcfs/ -k test_wcfs_client
...
wcfs/client/client_test.py::test_wcfs_client
-------------------- live log call ---------------------
INFO wcfs:__init__.py:293 starting for zeo://localhost:28866 ...
I1122 19:17:14.376182 110032 wcfs.go:2384] start "/dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2" "zeo://localhost:28866" I1122 19:17:14.376291 110032 wcfs.go:2390] (built with go1.17.3) W1122 19:17:14.380882 110032 storage.go:152] zodb: FIXME: open zeo://localhost:28866: raw cache is not ready for invalidations -> NoCache forced INFO wcfs:__init__.py:334 started pid110032 @ /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2 M: commit -> @at0 (03e452313dddbc00) M: commit -> @at1 (03e452313e0f3b99) M: f<0000000000000002> [2, 3] M: commit -> @at2 (03e452313e1adb55) M: f<0000000000000002> [2] M: commit -> @at3 (03e452313e3be500) M: f<0000000000000002> [3, 4] W1122 19:17:14.597654 110032 wcfs.go:2050] /@03e452313d343c88/bigfile: lookup "0000000000000002": bigfopen 0000000000000002 @03e452313d343c88: invalid argument: Get 0000000000000002: Get 03e452313d343c88:0000000000000002: zeo://localhost:28866: load 03e452313d343c88:0000000000000002: 0000000000000002: no such object E1122 19:17:14.597759 110032 wcfs.go:1220] /head/bigfile/0000000000000002: readblk #4: pin watchers: wlink1: f<0000000000000002>: pin #4 @03e452313d343c88: expect "ack"; got "nak: _remmapblk #4 @03e452313d343c88: open /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2/@03e452313d343c88/bigfile/0000000000000002: Invalid argument" F1122 19:17:14.597803 110050 wcfs/client/wcfs.cpp:487] CRITICAL: pinner: pin f<0000000000000002> #4 @03e452313d343c88: _remmapblk #4 @03e452313d343c88: open /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2/@03e452313d343c88/bigfile/0000000000000002: Invalid argument F1122 19:17:14.597835 110050 wcfs/client/wcfs.cpp:488] CRITICAL: wcfs server will likely kill us soon. CRITICAL: pinner: pin f<0000000000000002> #4 @03e452313d343c88: _remmapblk #4 @03e452313d343c88: open /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2/@03e452313d343c88/bigfile/0000000000000002: Invalid argument CRITICAL: wcfs server will likely kill us soon. 
Segmentation fault: read @00007ff7b9534000
/home/kirr/src/wendelin/wendelin.core/wcfs/client/./../../bigfile/liblibvirtmem.so(dump_traceback+0x34)[0x7ff7d6b5c279]
/home/kirr/src/wendelin/wendelin.core/wcfs/client/./../../bigfile/liblibvirtmem.so(+0x27b0)[0x7ff7d6b577b0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7ff7da078140]
python(PyString_FromStringAndSize+0x228)[0x5627feb96b58]
python(PyEval_EvalFrameEx+0x603e)[0x5627febb7a4e]
python(PyEval_EvalCodeEx+0x57c)[0x5627febb03cc]
...
python(PyObject_Call+0x43)[0x5627feb9d903]
python(+0x18a7e1)[0x5627fec5d7e1]
python(Py_Main+0x3ad)[0x5627fec4b8ed]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff7d9d59d0a]
python(_start+0x2a)[0x5627fec4b46a]
Segmentation fault (core dumped)

After this patch:

.../wendelin.core$ WENDELIN_CORE_TEST_DB='<zeo>' WENDELIN_CORE_VIRTMEM='r:wcfs+w:uvmm' python -m pytest -vsx wcfs/ -k test_wcfs_client
...
wcfs/client/client_test.py::test_wcfs_client
-------------------- live log call ---------------------
INFO wcfs:__init__.py:293 starting for zeo://localhost:22854 ...
I1122 18:17:22.486445 102541 wcfs.go:2384] start "/dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98" "zeo://localhost:22854" I1122 18:17:22.486525 102541 wcfs.go:2390] (built with go1.17.3) W1122 18:17:22.489908 102541 storage.go:152] zodb: FIXME: open zeo://localhost:22854: raw cache is not ready for invalidations -> NoCache forced INFO wcfs:__init__.py:334 started pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98 M: commit -> @at0 (03e451f560834477) M: commit -> @at1 (03e451f560a2aa77) M: f<0000000000000002> [2, 3] M: commit -> @at2 (03e451f560adafcc) M: f<0000000000000002> [2] M: commit -> @at3 (03e451f560d02111) M: f<0000000000000002> [3, 4] W1122 18:17:22.703710 102541 wcfs.go:2050] /@03e451f55fcc4c77/bigfile: lookup "0000000000000002": bigfopen 0000000000000002 @03e451f55fcc4c77: invalid argument: Get 0000000000000002: Get 03e451f55fcc4c77:0000000000000002: zeo://localhost:22854: load 03e451f55fcc4c77:0000000000000002: 0000000000000002: no such object E1122 18:17:22.703840 102541 wcfs.go:1220] /head/bigfile/0000000000000002: readblk #4: pin watchers: wlink1: f<0000000000000002>: pin #4 @03e451f55fcc4c77: expect "ack"; got "nak: _remmapblk #4 @03e451f55fcc4c77: open /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98/@03e451f55fcc4c77/bigfile/0000000000000002: Invalid argument" F1122 18:17:22.704380 102558 wcfs/client/wcfs.cpp:487] CRITICAL: pinner: pin f<0000000000000002> #4 @03e451f55fcc4c77: _remmapblk #4 @03e451f55fcc4c77: open /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98/@03e451f55fcc4c77/bigfile/0000000000000002: Invalid argument F1122 18:17:22.704639 102558 wcfs/client/wcfs.cpp:488] CRITICAL: wcfs server will likely kill us soon. CRITICAL: pinner: pin f<0000000000000002> #4 @03e451f55fcc4c77: _remmapblk #4 @03e451f55fcc4c77: open /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98/@03e451f55fcc4c77/bigfile/0000000000000002: Invalid argument CRITICAL: wcfs server will likely kill us soon. 
>>> Change history by file: f<0000000000000002>: 0 1 2 3 4 5 6 7 a b c d e f g h @at0 (03e451f560834477) @at1 (03e451f560a2aa77) 2 3 @at2 (03e451f560adafcc) 2 @at3 (03e451f560d02111) 3 4 INFO wcfs:__init__.py:400 unmount/stop wcfs pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98 I1122 18:17:22.728452 102541 wcfs.go:2560] stop "/dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98" "zeo://localhost:22854" FAILED ======================= FAILURES ======================= ___________________ test_wcfs_client ___________________ @func def test_wcfs_client(): t = tDB(); zf = t.zfile; at0=t.at0 defer(t.close) pinned = lambda fh: fhpinned(t, fh) at1 = t.commit(zf, {2:'c1', 3:'d1'}) at2 = t.commit(zf, {2:'c2'}) wconn = t.wc.connect(at1) defer(wconn.close) fh = wconn.open(zf._p_oid) defer(fh.close) # create mmap with 1 block beyond file size m1 = fh.mmap(2, 3) defer(m1.unmap) assert m1.blk_start == 2 assert m1.blk_stop == 5 assert len(m1.mem) == 3*zf.blksize tm1 = tMapping(t, m1) assert pinned(fh) == {} # verify initial data reads tm1.assertBlk(2, 'c1', {2:at1}) tm1.assertBlk(3, 'd1', {2:at1}) tm1.assertBlk(4, '', {2:at1}) # commit with growing file size -> verify data read as the same, #3 pinned. # (#4 is not yet pinned because it was not accessed) at3 = t.commit(zf, {3:'d3', 4:'e3'}) assert pinned(fh) == {2:at1} tm1.assertBlk(2, 'c1', {2:at1}) tm1.assertBlk(3, 'd1', {2:at1, 3:at1}) tm1.assertBlk(4, '', {2:at1, 3:at1}) # resync at1 -> at2: #2 must unpin to @head; #4 must stay as zero wconn.resync(at2) assert pinned(fh) == {3:at1} tm1.assertBlk(2, 'c2', { 3:at1}) tm1.assertBlk(3, 'd1', { 3:at1}) > tm1.assertBlk(4, '', { 3:at1, 4:at0}) # XXX at0->ø ? 
wcfs/client/client_test.py:158: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ wcfs/client/client_test.py:86: in assertBlk _ = read_exfault_withgil(blkview[0:1]) wcfs/internal/wcfs_test.pyx:90: in wendelin.wcfs.internal.wcfs_test.read_exfault_withgil return _read_exfault(mem, withgil=True) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > raise SegmentationFault() E SegmentationFault wcfs/internal/wcfs_test.pyx:120: SegmentationFault ------------------ Captured log call ------------------- INFO wcfs:__init__.py:293 starting for zeo://localhost:22854 ... INFO wcfs:__init__.py:334 started pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98 INFO wcfs:__init__.py:400 unmount/stop wcfs pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98
-
Kirill Smelkov authored
Here wcfs.go should have exited due to either unmount request, _or_ SIGTERM.
-
Kirill Smelkov authored
If the first unmount fails, e.g. due to "device or resource is busy", we try to unmount the filesystem a second time after force kill/FUSE-abort (see 5f684a49 "wcfs: Server.stop: Make sure to remove mount entry even if we had to use FUSE abort"). This way the caller of Server.stop should get an error only if that second unmount fails, not on the unmount-1 error, which should be considered internal to the Server.stop implementation. If we don't hide that unmount-1 error and raise it to the caller, from outside it can confusingly look like "the server is successfully stopped, but an error is raised nevertheless".
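The error-reporting rule described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real Server.stop: `stop_server` and its three callables (`unmount`, `abort_and_kill`, `rmdir`) are hypothetical, and the `lazy` flag follows the lazy second unmount described in the 15 Nov entry of this log.

```python
def stop_server(unmount, abort_and_kill, rmdir):
    """Stop a mounted server, reporting only the error of the final unmount.

    Hypothetical signature: `unmount(lazy=...)` raises OSError on failure,
    `abort_and_kill()` aborts the FUSE connection and kills the server,
    `rmdir()` removes the mountpoint directory.
    """
    try:
        unmount(lazy=False)
    except OSError:
        # unmount-1 failed (e.g. "Device or resource busy"): treat this as
        # internal, force the server down and retry. Only an error from the
        # second unmount reaches the caller.
        abort_and_kill()
        unmount(lazy=True)
    rmdir()
```

A caller therefore sees an exception only when the filesystem really could not be unmounted, which avoids the confusing "stopped, but failed" outcome.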
-
- 17 Nov, 2021 3 commits
-
-
Kirill Smelkov authored
* t2: . wendelin.core v2.0.alpha1
-
Kirill Smelkov authored
* master: wendelin.core v2.0.alpha1
-
Kirill Smelkov authored
-
- 16 Nov, 2021 13 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* t2: *: Cosmetics . . lib/zodb: Mark test_zconn_at as xfail on plain ZODB4 . . wcfs: Server.stop: Make sure to remove mount entry even if we had to use FUSE abort tests: Don't leak WCFS log files tests: Remove test NEO database after test run is over nxdtest: Don't run test.go for multiple GOMAXPROCS wcfs: Make sure to remove mountpoint directory on Server.stop nxdtest: Run WCFS-related tests in verbose mode on testnodes setup: Fix egg_info after addition of δbtail.go
-
Kirill Smelkov authored
* master: *: Cosmetics
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* master: lib/zodb: Mark test_zconn_at as xfail on plain ZODB4
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
This way, on plain ZODB4 the following non-wcfs tests will continue to pass: test.py/fs-!wcfs test.py/zeo-!wcfs test.py/neo-!wcfs instead of failing as e.g. here: https://nexedijs.erp5.net/#/test_result_module/20211116-123A66706 On plain ZODB4, WCFS-related functionality - which uses zconn_at - will continue to raise the corresponding assertion in WCFS-related tests, as e.g. in https://nexedijs.erp5.net/#/test_result_module/20211116-123A66706/6
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* master: wcfs: Server.stop: Make sure to remove mount entry even if we had to use FUSE abort tests: Don't leak WCFS log files tests: Remove test NEO database after test run is over nxdtest: Don't run test.go for multiple GOMAXPROCS wcfs: Make sure to remove mountpoint directory on Server.stop nxdtest: Run WCFS-related tests in verbose mode on testnodes setup: Fix egg_info after addition of δbtail.go
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* t2: . fixup! wcfs: Handle ZODB invalidations . . wcfs/internal/mm: Complete the package fixup! wcfs: client: Provide client package to care about isolation protocol details lib/zodb: zconn_at: Fix how ZODB4 is asserted to be patched . . . . .
-
Kirill Smelkov authored
-
- 15 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
Server.stop currently tries to unmount, and if that fails, invokes FUSE abort and kills wcfs.go. However, it does not call unmount a second time after such an abort, so the filesystem remains mounted (in ENOTCONN state) and rmdir(mountpoint) fails. -> Fix it by calling unmount a second time if we had to abort the FUSE connection. In that second try use lazy unmounting, because a regular unmount can still fail with "Device or resource busy", since there could still be client file descriptors left pointing to the mounted filesystem. With lazy mode, unmounting + follow-up rmdir hopefully always succeeds. Here is an example test run where one test timed out and the FUSE connection was aborted, but the filesystem was not unmounted and the mountpoint directory was not deleted, which led to all follow-up tests failing in the setup assert that testmountpoint does not exist: https://nexedijs.erp5.net/#/test_result_module/20211112-1ACEA62D/22 This patch should fix those follow-up failures and also fix another leakage of WCFS mounts in real services.
-