1. 30 Jul, 2023 3 commits
    • qa: lib/tests/zodb += zurl_normalize_main · 6032b274
      Levin Zimmermann authored
      After moving zurl filtering to a dedicated function, we can
      now test this function for correctness. It's important that different
      clients which point to the same storage always result in the same
      zodburi, even if their initial user-specified zodburi slightly differs
      (e.g. due to different client-side parameters or different paths of encryption).
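
To illustrate the property being tested, here is a minimal sketch. It is not the actual zurl_normalize_main implementation: it assumes that normalization sorts NEO master addresses and drops a hypothetical set of client-only query parameters (`ca`, `cert`, `key`). The function name and that parameter set are assumptions made for this example only.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption for illustration: query parameters that only describe the client.
CLIENT_ONLY_PARAMS = {"ca", "cert", "key"}

def normalize_zurl_sketch(zurl):
    """Return a zurl that is the same for all clients of one storage (sketch)."""
    u = urlsplit(zurl)
    netloc = u.netloc
    if u.scheme in ("neo", "neos"):
        # "cluster@m2:2051,m1:2051" -> sort the master addresses
        cluster, sep, masters = netloc.rpartition("@")
        netloc = cluster + sep + ",".join(sorted(masters.split(",")))
    # keep only storage-level parameters, in a stable order
    params = sorted((k, v) for k, v in parse_qsl(u.query)
                    if k not in CLIENT_ONLY_PARAMS)
    return urlunsplit((u.scheme, netloc, u.path, urlencode(params), ""))

a = "neo://cluster@m2:2051,m1:2051?ca=/a/ca.crt&cert=/a/c.crt&key=/a/k.key"
b = "neo://cluster@m1:2051,m2:2051?ca=/b/ca.crt&cert=/b/c.crt&key=/b/k.key"
assert normalize_zurl_sketch(a) == normalize_zurl_sketch(b)
```

Both inputs point at the same cluster and differ only in master order and local file paths, so a normalization of this kind maps them to one zurl.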
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!17
    • wcfs: Move zuri filter to lib/zodb · ae54c563
      Levin Zimmermann authored
      The WCFS mountpoint of any ZODB storage must be a unique, persistent,
      repeatable hash. This means any client which uses the same storage must
      always calculate the same WCFS mountpoint (independent from
      client-only parameters etc.). Therefore the WCFS mountpoint calculation
      must be robust for all supported ZODB storage types (at least NEO, ZEO,
      filestorage).
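
A minimal sketch of the "repeatable hash" idea, assuming the mountpoint is a SHA-1 hex digest of the (already normalized) zurl placed under a base directory. The hash function, the base directory default and the helper name are assumptions for illustration, not the actual wcfs code.

```python
import hashlib

def mountpoint_sketch(zurl_normalized, base="/dev/shm/wcfs"):
    """Derive a mountpoint that depends only on the normalized zurl (sketch)."""
    digest = hashlib.sha1(zurl_normalized.encode("utf-8")).hexdigest()
    return "%s/%s" % (base, digest)

# Two clients that agree on the normalized zurl get the same mountpoint.
m1 = mountpoint_sketch("neo://cluster@m1:2051,m2:2051")
m2 = mountpoint_sketch("neo://cluster@m1:2051,m2:2051")
assert m1 == m2
```

Because the digest is computed from the zurl alone, any client-specific detail that survives normalization would change the mountpoint, which is why the filtering step matters.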
      
      It was recently decided [1] that, to provide this robustness, WCFS
      mountpoint calculation should filter the parsed URI and drop the
      parts that prevent repeatability/persistence across different
      clients (e.g. parts which can differ between clients although the same
      storage is accessed). To make this filtering implementation a
      bit easier to read and wcfs/__init__.py less dense, the first step
      is to move the zurl filtering ("normalization") into lib/zodb.py.
      This also makes sense since this normalization can be regarded as a
      general zodb tool which may be useful for other solutions which use
      zodburi.
      
      [1] neoppod!18 (comment 184671)
      
      /reviewed-by @kirr
      /reviewed-on !17
    • lib/tests: Fix flaky zstor_2zurl test · cc33d610
      Levin Zimmermann authored
      Kirill noted that nexedi/wendelin.core@fb620301 introduced a regression [1]:
      'test_zstor_2zurl' sometimes passes and sometimes fails. The reason for
      this is that there is no deterministic order of master nodes in
      'NodeManager.getMasterList()', which is why there is no specified
      order of master node addresses in a zurl [2]. We don't want to normalize
      a zurl returned by 'zstor_2zurl' because we need some of the client-specific
      parameters, such as SSL file paths, so we instead fix the test to allow any
      possible order of NEO master nodes in the zurl.
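
One way to write such an order-insensitive check is sketched below. The helper name and the zurl layout (`neo://<cluster>@<master>,<master>`) are assumptions for illustration; the real test may be structured differently.

```python
from itertools import permutations

def acceptable_zurls_sketch(cluster, masters):
    """Return every zurl that differs only in the order of master addresses."""
    return {"neo://%s@%s" % (cluster, ",".join(p))
            for p in permutations(masters)}

ok = acceptable_zurls_sketch("cluster", ["m1:2051", "m2:2051"])

# Whatever order the storage reports, the test accepts it.
assert "neo://cluster@m1:2051,m2:2051" in ok
assert "neo://cluster@m2:2051,m1:2051" in ok
```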
      
      [1] nexedi/wendelin.core!17 (comment 188102)
      [2] https://lab.nexedi.com/nexedi/wendelin.core/blob/fb620301/lib/zodb.py#L414
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!17
  2. 19 Jun, 2023 1 commit
    • lib/zodb/zstor_2zurl/NEO: support > 1 master nodes · fb620301
      Levin Zimmermann authored
      The old code raised an explicit exception when converting a NEO storage
      with > 1 master nodes into a URI. Perhaps the rationale for this exception
      was that there isn't any agreed on order of master nodes in a NEO URI,
      which means that building a URI from such a storage could potentially
      break the invariant that any client which points to the same storage
      should result in the same WCFS mountpoint.
      With 6f5196fa we can now rely on
      WCFS mountpoint calculation to always return the same mountpoint even if
      the order of master node addresses differs. Therefore we can drop this
      exception and allow WCFS to support NEO clusters with more than one master.
      
      --------
      
      kirr: support for multiple masters was simply not implemented because in
      a05db040 (lib/zodb: Teach zstor_2zurl about ZEO, NEO and Demo storages)
      I thought that we do not yet actually need it and wanted to have
      something minimal first.
      
      I agree that in WCFS context it is ok and makes sense to normalize zurl
      to have masters coming in particular order. But at zstor_2zurl level we
      rely on the order of masters that app.nm.getMasterList gives us. The
      normalization is a separate function.
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!17
  3. 21 Dec, 2022 3 commits
  4. 26 Nov, 2022 3 commits
    • zstor_2zurl: Fix ipv6 host for NEO/ZEO + test fix · 20498b2f
      Levin Zimmermann authored
      This patch allows using WCFS with a NEO or ZEO storage which is
      reachable by a URL which contains an ipv6 host.
      
      Without this patch the following example doesn't work:
      
      >>> from wendelin.lib.zodb import dbopen
      >>> root = dbopen("neo://cluster-name@[::1]:2051")
      >>> # "abc" points to a ZBigArray
      >>> root["abc"][0]
      
      It doesn't work because the parser did not add square brackets around
      ipv6 hosts, so unparsing the resulting URL led to a wrong
      interpretation of where the port starts.
      
      This patch furthermore amends 'test_zstor_2zurl' to test ZEO and NEO
      storages with ipv6 hosts.
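
The bracket rule described above can be sketched as follows. The helper name is hypothetical and this is not the code from zstor_2zurl; it only shows why brackets are needed to keep host and port unambiguous.

```python
def format_addr_sketch(host, port):
    """Format host:port so that the port is unambiguous for ipv6 hosts."""
    if ":" in host:
        # An ipv6 literal contains colons itself, so it must be bracketed.
        host = "[%s]" % host
    return "%s:%d" % (host, port)

assert format_addr_sketch("::1", 2051) == "[::1]:2051"
assert format_addr_sketch("127.0.0.1", 2051) == "127.0.0.1:2051"
```

Without the brackets the first case would read `::1:2051`, where it is no longer clear which colon separates the port.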
      
      ---
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!13
    • lib/zodb/zstor_2zurl: Add comprehensive tests · 0a09d51e
      Levin Zimmermann authored
      This patch adds comprehensive tests for 'wendelin.lib.zodb.zstor_2zurl'.
      Before this patch only one related test existed ('test_zurlstable').
      This test only lightly checked correct functionality of 'zstor_2zurl'.
      Therefore we added the new tests 'test_zstor_2zurl' and
      'test_zurlsamedb'.
      
      The new tests only cover existing functionality.
      
      ---
      
      Co-authored-by: kirr
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!13
    • test_zodb/zsync: Fix ZEO storage synchronization · 28a7db7f
      Levin Zimmermann authored
      Before this patch 'zsync(storage)' had no effect for ZEO storages: it
      didn't synchronize the client with the server. This patch fixes 'zsync'
      so that it also synchronizes ZEO clients.
      
      Background information:
      =======================
      
      In 2006 the sync mode of ZEO was removed:
      
        nexedi/ZEO@629b0667
      
      and only async mode has been supported since then. This means that the "sync"
      method of ZEO.ClientStorage in fact had no effect. In ZEO 5 the
      "server-sync" option was added:
      
        https://github.com/zopefoundation/ZEO/pull/63
      
      Setting this option to 'True' makes the 'sync' method perform a
      "server round trip, thus causing client to wait for outstanding
      invalidations" [1]. In this patch we imitate the effect of this flag
      for both ZEO 4 and ZEO 5.
      
      [1] https://github.com/zopefoundation/ZEO/blob/423cb8563be3e1ee0bb4297ee980d9b74f09c710/src/ZEO/ClientStorage.py#L225-L226
      
      ---
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!13
  5. 10 Nov, 2022 1 commit
    • BigArray: Fix API deviation with ndarray (shape) · adffe247
      Levin Zimmermann authored
      The 'shape' argument of 'numpy.ndarray's initialization method accepts
      an integer or a sequence of integers. But the 'shape' property of
      'numpy.ndarray' always returns tuple[int, ...], so numpy internally
      casts any legal argument into tuple[int, ...].
      
      In 'BigArray' and 'ZBigArray' this internal casting didn't exist yet.
      This patch adds the casting.
      
      Before:
      
        ZBigArray(shape=[1, 2, 3], dtype=float).shape == [1, 2, 3]
      
      After:
      
        ZBigArray(shape=[1, 2, 3], dtype=float).shape == (1, 2, 3)
      
      In this way the BigArray and ZBigArray API behaves closer to numpy.ndarray,
      which should help avoid confusion when people are using BigArray /
      ZBigArray.
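
The kind of casting described above can be sketched with the standard library only. The helper name is hypothetical and the actual BigArray code may differ; the sketch only shows an int-or-sequence argument being turned into tuple[int, ...].

```python
import operator

def normalize_shape_sketch(shape):
    """Cast an integer or a sequence of integers into tuple[int, ...]."""
    try:
        # a single integer means a 1-d array of that length
        return (operator.index(shape),)
    except TypeError:
        # otherwise treat it as a sequence of integers
        return tuple(operator.index(n) for n in shape)

assert normalize_shape_sketch([1, 2, 3]) == (1, 2, 3)
assert normalize_shape_sketch(5) == (5,)
```

`operator.index` is used here so that non-integer entries such as `1.5` are rejected instead of being silently truncated.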
      
      -----
      
      See issue nexedi/wendelin.core#9 and
      MR nexedi/wendelin.core!14
      for additional context.
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!14
  6. 18 May, 2022 1 commit
    • demo_zbigarray: Fix it for Python3 · 61dc1ff2
      Kirill Smelkov authored
      Wendelin.core already supports Python3 relatively well, but demo_zbigarray.py,
      that is invoked only manually, was missing compatibility bits for xrange:
      
          (neo) (py3.venv) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/nexedi/wendelin.core$ ./demo/demo_zbigarray.py gen 1.fs
          I: RAM:  15.29GB
          I: WORK: 30.57GB
          gen signal t=0...4.10e+09  float64  (= 30.57GB)
          Traceback (most recent call last):
            File "/home/kirr/src/wendelin/wendelin.core/./demo/demo_zbigarray.py", line 154, in <module>
              main()
            File "/home/kirr/src/wendelin/venv/py3.venv/lib/python3.9/site-packages/decorator.py", line 232, in fun
              return caller(func, *(extras + args), **kw)
            File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 103, in _
              return f(*argv, **kw)
            File "/home/kirr/src/wendelin/wendelin.core/./demo/demo_zbigarray.py", line 142, in main
              gen(sig)
            File "/home/kirr/src/wendelin/wendelin.core/./demo/demo_zbigarray.py", line 74, in gen
              for t0 in xrange(0, len(a), blocksize):
          NameError: name 'xrange' is not defined
      
      -> Fix it.
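
A common compatibility idiom for this kind of NameError is sketched below. Whether the commit uses this exact approach is an assumption; the loop bounds are made-up values standing in for `len(a)` and `blocksize`.

```python
# On Python 2 the name xrange exists; on Python 3 fall back to range,
# which is already lazy there.
try:
    xrange
except NameError:
    xrange = range

# Iterate over block start offsets, as the demo does in gen().
starts = [t0 for t0 in xrange(0, 10, 4)]
assert starts == [0, 4, 8]
```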
  7. 02 Feb, 2022 1 commit
    • setup: Fix sdist/egg_info/... on Python3 · 3d0f134c
      Kirill Smelkov authored
      Arnaud reports that wendelin.core currently cannot be installed on
      Python3:
      
          /opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/python3/bin/python3 /tmp/tmp1fuxchsb -q develop -mN -d /opt/slapgrid/3f9add9291086dee302fc478df4b3130/develop-eggs/tmps5jr7ymsbuild
          /opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/dist.py:476: UserWarning: Normalizing '2.0.alpha2.post1' to '2.0a2.post1'
          package init file '__init__.py' not found (or not a regular file)
          Traceback (most recent call last):
           File "/tmp/tmp1fuxchsb", line 19, in <module>
             exec(compile(f.read(), '/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/wendelin.core/setup.py', 'exec'))
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/wendelin.core/setup.py", line 426, in <module>
             """.splitlines()]
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/develop-eggs/pygolang-0.1-py3.7-linux-x86_64.egg/golang/pyx/build.py", line 118, in setup
             setuptools_dso.setup(**kw)
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools_dso-1.7-py3.7.egg/setuptools_dso/__init__.py", line 37, in setup
             _setup(**kws)
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/__init__.py", line 162, in setup
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/python3/lib/python3.7/distutils/core.py", line 148, in setup
             dist.run_commands()
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/python3/lib/python3.7/distutils/dist.py", line 966, in run_commands
             self.run_command(cmd)
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/python3/lib/python3.7/distutils/dist.py", line 985, in run_command
             cmd_obj.run()
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/develop.py", line 38, in run
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/develop.py", line 136, in install_for_development
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/python3/lib/python3.7/distutils/cmd.py", line 313, in run_command
             self.distribution.run_command(command)
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/parts/python3/lib/python3.7/distutils/dist.py", line 985, in run_command
             cmd_obj.run()
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/egg_info.py", line 296, in run
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/egg_info.py", line 303, in find_sources
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/egg_info.py", line 537, in run
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/egg_info.py", line 591, in prune_file_list
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/egg_info.py", line 452, in prune
           File "/opt/slapgrid/3f9add9291086dee302fc478df4b3130/eggs/setuptools-44.1.1-py3.7.egg/setuptools/command/egg_info.py", line 405, in _remove_files
          TypeError: cannot use a string pattern on a bytes-like object
          While:
           Installing wendelin.core.
      
      The problem turned out to be that `git ls-files` output, which we add to
      the list of source files, is bytes, and things break when those bytes get
      mixed with strings.
      
      -> Fix it by always returning from runcmd the str type of current python.
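
A minimal sketch of that approach, assuming the helper wraps `subprocess.check_output`. The function name and the UTF-8 decoding are assumptions for illustration, not the actual runcmd from setup.py.

```python
import subprocess
import sys

def runcmd_sketch(argv):
    """Run a command and return its output as the native str type."""
    out = subprocess.check_output(argv)
    if not isinstance(out, str):
        # Python 3: check_output returns bytes, so decode before mixing
        # the result with ordinary strings.
        out = out.decode("utf-8")
    return out

# Stand-in for `git ls-files`: any command that prints file names.
out = runcmd_sketch([sys.executable, "-c", "print('setup.py')"])
files = out.splitlines()
assert files == ["setup.py"]
```

With the result being `str`, the file names can be matched against string patterns without the "cannot use a string pattern on a bytes-like object" error.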
      
      /reported-by @arnau
  8. 27 Jan, 2022 3 commits
    • Fix build_dso on clean checkout · ad6305c0
      Kirill Smelkov authored
      Similarly to build_ext we need ccan/config.h to be present for dso to
      build. It was not the case and so pip install wendelin.core was failing:
      
          $ pip install wendelin.core-2.0a2.tar.gz
          Processing ./wendelin.core-2.0a2.tar.gz
            Installing build dependencies ... done
            Getting requirements to build wheel ... done
              Preparing wheel metadata ... done
          Collecting ZODB>=4
          ...
          Building wheels for collected packages: wendelin.core
            Building wheel for wendelin.core (PEP 517) ... error
            ERROR: Command errored out with exit status 1:
            ...
            running build_dso
            Building DSOs
            building 'wendelin.bigfile.libvirtmem' DSO as build/lib.linux-x86_64-2.7/wendelin/bigfile/liblibvirtmem.so
            creating build/temp.linux-x86_64-2.7
            creating build/temp.linux-x86_64-2.7/bigfile
            creating build/temp.linux-x86_64-2.7/lib
            x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -ffile-prefix-map=/build/python2.7-vgIf7a/python2.7-2.7.18=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -D_GNU_SOURCE -I/tmp/pip-build-env-lfVr7E/overlay/lib/python2.7/site-packages -I. -I./include -I./3rdparty/ccan -I./3rdparty/include -Ibuild/lib.linux-x86_64-2.7/. -c bigfile/pagefault.c -o build/temp.linux-x86_64-2.7/bigfile/pagefault.o -fno-strict-aliasing -std=gnu99 -fplan9-extensions -Wno-declaration-after-statement -Wno-error=declaration-after-statement
            In file included from ./include/wendelin/list.h:11,
                             from ./include/wendelin/bigfile/virtmem.h:50,
                             from bigfile/pagefault.c:29:
            ./3rdparty/ccan/ccan/array_size/array_size.h:4:10: fatal error: config.h: No such file or directory
                4 | #include "config.h"
                  |          ^~~~~~~~~~
            compilation terminated.
            error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
            ----------------------------------------
            ERROR: Failed building wheel for wendelin.core
          Failed to build wendelin.core
          ERROR: Could not build wheels for wendelin.core which use PEP 517 and cannot be installed directly
      
      -> Fix it by making build_dso also first come through `make all`.
      
      NOTE we cannot fix it in exactly the same way as for build_ext: if we split
      build_dso into build_dso and ll_build_dso, `make all` will still go to infinite
      recursion: build_dso -> ll_build_dso -> build_dso (not ll_build_dso, this is controlled by setuptools_dso) -> oops.
    • wendelin.core v2.0.alpha2 · 5e5ad598
      Kirill Smelkov authored
    • wcfs: client: Switch to File IO provided by Pygolang · a36cdcc3
      Kirill Smelkov authored
      Starting from version 0.1 pygolang provides File out of the box:
      
      nexedi/pygolang@4690460b
      https://pypi.org/project/pygolang/#pygolang-change-history
      
      -> Use it and remove our custom File implementation that originally
      served as POC for that pygolang functionality.
  9. 26 Jan, 2022 2 commits
  10. 21 Jan, 2022 8 commits
    • wcfs: Fix crash if on watch request setupWatch needs to access ZODB · 38dde766
      Kirill Smelkov authored
      The problem is similar to a7bf0311 (wcfs: Fix crash if on invalidation
      handledδZ needs to access ZODB) - I forgot to put zhead's transaction into
      context.
      
      Without the fix added test fails as:
      
          wcfs_test.py::test_wcfs_crash_old_data
          ---------------- live log call -----------------
          WARNING  ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.OV0rS6/1.fs
      
          M: commit -> @at0 (03e5a3342bc5ab22)
      
          M: commit -> @at1 (03e5a3342bc88899)
          M:      f<0000000000000002>     [0]
          INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.OV0rS6/1.fs ...
          I0120 17:12:10.274379  704327 wcfs.go:2393] start "/dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59" "file:///tmp/testdb_fs.OV0rS6/1.fs"
          I0120 17:12:10.274409  704327 wcfs.go:2399] (built with go1.17.6)
          W0120 17:12:10.274560  704327 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.OV0rS6/1.fs: raw cache is not ready for invalidations -> NoCache forced
          INFO     wcfs:__init__.py:334 started pid704327 @ /dev/shm/wcfs/556fa61a9f9675f34c6b44e1f978842c37176c59
      
          C: setup watch f<0000000000000002> @at1 (03e5a3342bc88899)
          #  pinok: {}
      
          M: commit -> @at2 (03e5a3342c895777)
          M:      f<0000000000000002>     [1]
      
          M: commit -> @at3 (03e5a3342ca5ef55)
          M:      f<0000000000000002>     [0]
      
          C: setup watch f<0000000000000002> @at2 (03e5a3342c895777)
          #  pinok: {0: @at1 (03e5a3342bc88899)}
          panic: transaction: no current transaction
      
          goroutine 88 [running]:
          lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x969718, 0xc0000b6240})
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
          lab.nexedi.com/kirr/neo/go/transaction.Current(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc0000c25a0, {0x969718, 0xc0000b6240}, 0x4)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc00009dd40, {0x969718, 0xc0000b6240}, 0xc000100540, 0x30, 0x3e5a3342c895777)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
          main.(*WatchLink).setupWatch(0xc0000120a0, {0x969718, 0xc0000b6240}, 0x2, 0x3e5a3342c895777)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
          main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc0000b6240}, {0xc0000a0122, 0x0})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
          main.(*WatchLink).handleWatch(0x0, {0x969718, 0xc0000b6240}, 0x0, {0xc0000a0122, 0x28})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
          main.(*WatchLink)._serve.func3({0x969718, 0xc0000b6240})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e5a3342bc5ab22)
                  @at1 (03e5a3342bc88899) 0
                  @at2 (03e5a3342c895777)   1
                  @at3 (03e5a3342ca5ef55) 0
      
          ----------------------------------------
      
                  # wcfs was crashing in setting up watch because of "1" and "2" from above, and
                  # 3. setupWatch was calling ΔFtail.BlkRevAt without putting zhead's transaction into ctx.
                  wl2 = t.openwatch()
          >       wl2.watch(zf, at2, {0:at1})
    • wcfs: zdata: ΔFtail tests: Fix/Adjust debug dump for computed blkRevAt · ca3e54e2
      Kirill Smelkov authored
      - put into if block to avoid collision with already-defined-elsewhere blkv
      - show revisions in symbolic form
      
      Noticed while working on recent change to allow ΔFtail/ΔBtail
      point-queries with at=tail.
    • wcfs: tests: Exercise watching @at0 · 769b1c06
      Kirill Smelkov authored
      Watching with at=tail is inevitable as explained in the previous patch.
    • wcfs: Adjust ΔFtail/ΔBtail to allow point-queries with at=tail · ef10f820
      Kirill Smelkov authored
      This is needed because when e.g. wcfs is just started the coverage of
      ΔFtail is (head,head] i.e. empty, and if a user wants to set up a watch
      with at=head, it becomes watch with at=tail. Then that at is used in a
      query and if point-queries with at=tail are disallowed it panics with
      "at out of bounds".
      
      This fixes crashes in test_wcfs_watch_setup (see 339f1884 "wcfs: tests:
      Always start tDB with ZBigFile pre-created before WCFS startup") and in
      test_wcfs_crash_old_data (see 97ce5105 "wcfs: tests: Add test do
      demonstrate "at out of bounds" crash on readPinWatchers ->
      ΔFtail.BlkRevAt")
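
The boundary condition can be illustrated with a small sketch. The real code is Go and lives in ΔFtail/ΔBtail; the Python function below, its name and the `allow_tail` switch are assumptions used only to contrast the old and the adjusted check.

```python
def check_point_query_sketch(at, tail, head, allow_tail=True):
    """Validate `at` for a point query against coverage (tail, head]."""
    # Old behaviour: tail < at <= head.  Adjusted: tail <= at <= head.
    lower_ok = (tail <= at) if allow_tail else (tail < at)
    if not (lower_ok and at <= head):
        raise ValueError("at out of bounds: at: @%x,  (tail, head] = (@%x, @%x]"
                         % (at, tail, head))
    return at

# wcfs has just started: coverage is (head, head], i.e. tail == head.
head = 0x03e5a31e5e63fa77
assert check_point_query_sketch(head, tail=head, head=head) == head
```

With `allow_tail=False` the same call raises, which corresponds to the "at out of bounds" panic when a watch is set up with at=head right after startup.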
      
      For the reference zodb.ΔTail already allows point queries with at=tail:
      
      https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L202-206
      https://lab.nexedi.com/kirr/neo/blob/1193c44e/go/zodb/δtail.go#L225-228
    • wcfs: tests: Add test do demonstrate "at out of bounds" crash on readPinWatchers -> ΔFtail.BlkRevAt · 97ce5105
      Kirill Smelkov authored
      The codepath that sends pin messages to watchers on FUSE READ, similarly
      to what was shown in 339f1884, is also vulnerable to "at out of bounds"
      panic if at=ΔFtail.tail:
      
          wcfs_test.py::test_wcfs_crash_old_data
          ---------------- live log call -----------------
          WARNING  ZODB.FileStorage:FileStorage.py:413 Ignoring index for /tmp/testdb_fs.nbSKXu/1.fs
      
          M: commit -> @at0 (03e5a31e5e5ef6bb)
      
          M: commit -> @at1 (03e5a31e5e63fa77)
          M:      f<0000000000000002>     [0]
          INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.nbSKXu/1.fs ...
          I0120 16:50:22.136098  697106 wcfs.go:2393] start "/dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652" "file:///tmp/testdb_fs.nbSKXu/1.fs"
          I0120 16:50:22.136127  697106 wcfs.go:2399] (built with go1.17.6)
          W0120 16:50:22.136233  697106 storage.go:152] zodb: FIXME: open file:///tmp/testdb_fs.nbSKXu/1.fs: raw cache is not ready for invalidations -> NoCache forced
          INFO     wcfs:__init__.py:334 started pid697106 @ /dev/shm/wcfs/93026d44ef96f87df2cc0e2e451c5aabee91b652
      
          C: setup watch f<0000000000000002> @at1 (03e5a31e5e63fa77)
          #  pinok: {}
          panic: at out of bounds: at: @03e5a31e5e63fa77,  (tail, head] = (@03e5a31e5e63fa77, @03e5a31e5e63fa77]
      
          goroutine 7 [running]:
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc0000a5d40, {0x969718, 0xc000076140}, 0xc0001a22a0, 0xc0001c0200, 0x3e5a31e5e63fa77)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
          main.(*BigFile).readPinWatchers(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, 0xffffffffffffffff)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1559 +0x2a5
          main.(*BigFile).readBlk(0xc0001d0200, {0x969718, 0xc000076140}, 0x0, {0xc000320000, 0x200000, 0x0})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1281 +0x4d2
          main.(*BigFile).Read.func1({0x969718, 0xc000076140})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1223 +0x71
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e5a31e5e5ef6bb)
                  @at1 (03e5a31e5e63fa77) 0
      
          ...
      
              @func
              def test_wcfs_crash_old_data():
                  # start wcfs with ΔFtail/ΔBtail not covering that initial data.
                  t = tDB(old_data=[{0:'a'}]); zf = t.zfile; at1 = t.head
                  defer(t.close)
      
                  f = t.open(zf)
      
                  # ΔFtail coverage is currently (at1,at1]
                  wl = t.openwatch()
                  wl.watch(zf, at1, {})
      
                  # wcfs is crashing on readPinWatcher -> ΔFtail.BlkRevAt with
                  #   "at out of bounds: at: @at1,  (tail,head] = (@at1,@at1]
                  # because BlkRevAt(at=tail) query was disallowed.
          >       f.assertBlk(0, 'a')          # [0] becomes tracked
      
      Still also crashing in test_wcfs_watch_setup.
    • wcfs: tests: Move tests for crashing WCFS due to old data to dedicated section · 67519be7
      Kirill Smelkov authored
      Soon this test will also exercise functionality from the isolation
      protocol, and so it will stop being basic.
      
      Move plus rename test_wcfs_basic_invalidation_wo_dFtail_coverage ->
      test_wcfs_crash_old_data.
      
      Still crashing in test_wcfs_watch_setup.
    • wcfs: tests: Teach tDB to create database with initial ZBigFile changes before WCFS is started · 1da89b57
      Kirill Smelkov authored
      This semantically moves initialization code from
      test_wcfs_basic_invalidation_wo_dFtail_coverage (see a7bf0311 "wcfs: Fix
      crash if on invalidation handledδZ needs to access ZODB") to tDB itself,
      and will be useful to exercise similar scenarios in other tests.
      
      Still crashing in test_wcfs_watch_setup.
    • wcfs: tests: Always start tDB with ZBigFile pre-created before WCFS startup · 339f1884
      Kirill Smelkov authored
      This should hopefully exercise codepaths in wcfs.go a bit more for
      mistakes similar to a7bf0311 (wcfs: Fix crash if on invalidation
      handledδZ needs to access ZODB) where the code on server side forgets to
      put zhead's transaction into context.
      
      Currently, because watching @tail is disallowed, this leads to panic triggered by test_wcfs_watch_setup:
      
          @at0 (03e59e3e606b89bb) -> @at1 (03e59e3e610692bb) -> @at2 (03e59e3e612a5811) -> @at3 (03e59e3e614fa9cc) -> @at4 (03e59e3e6189c3ee) -> @at5 (03e59e3e61af0baa)
      
          C: setup watch f<0000000000000002> @at0 (03e59e3e606b89bb)
          #  pinok: {0: @at0 (03e59e3e606b89bb), 2: @at0 (03e59e3e606b89bb), 3: @at0 (03e59e3e606b89bb), 5: @at0 (03e59e3e606b89bb)}
          panic: at out of bounds: at: @03e59e3e606b89bb,  (tail, head] = (@03e59e3e606b89bb, @03e59e3e61af0baa]
      
          goroutine 187 [running]:
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.panicf(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/misc.go:47
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x969718, 0xc000062940}, 0xc0003060c0, 0x4174f4, 0x3e59e3e606b89bb)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1077 +0xa45
          main.(*WatchLink).setupWatch(0xc000108050, {0x969718, 0xc000062940}, 0x2, 0x3e59e3e606b89bb)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1754 +0xe3f
          main.(*WatchLink)._handleWatch(0x0, {0x969718, 0xc000062940}, {0xc00001c812, 0xa00000})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1973 +0x65
          main.(*WatchLink).handleWatch(0x74039b, {0x969718, 0xc000062940}, 0xc0000a4280, {0xc00001c812, 0x28})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1955 +0x10c
          main.(*WatchLink)._serve.func3({0x969718, 0xc000062940})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1944 +0x3c
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e59e3e606b89bb)
                  @at1 (03e59e3e610692bb)     2
                  @at2 (03e59e3e612a5811)     2 3 4 5
                  @at3 (03e59e3e614fa9cc) 0   2     5
                  @at4 (03e59e3e6189c3ee)     2   4 5
                  @at5 (03e59e3e61af0baa)       3   5
      
      However, we will need to allow setting up watches @tail next anyway, and so
      we will fix this and other errors in followup commits.
      
      NOTE: we don't lose coverage for the case when ZBigFile is created after wcfs
      startup due to test_wcfs_watch_2files, where that scenario is tested.
      
      ΔFtail/ΔBtail tests also exercise ZBigFile/BTree epochs
      (creation/deletion) well.
      339f1884
  11. 19 Jan, 2022 4 commits
  12. 18 Jan, 2022 1 commit
    • Kirill Smelkov's avatar
      wcfs: Fix crash if on invalidation handleδZ needs to access ZODB · a7bf0311
      Kirill Smelkov authored
      The invalidation logic is generally right, but invalidateBlk -> ΔFtail.BlkRevAt
      was being called with a ctx that carries no transaction. As a result it
      panicked with
      
          panic: transaction: no current transaction
      
          goroutine 41 [running]:
          lab.nexedi.com/kirr/neo/go/transaction.currentTxn({0x9696d8, 0xc0000d8080})
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/transaction.go:59 +0x77
          lab.nexedi.com/kirr/neo/go/transaction.Current(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/transaction/api.go:206
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).checkTxnCtx(...)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:374
          lab.nexedi.com/kirr/neo/go/zodb.(*Connection).Get(0xc00010c640, {0x9696d8, 0xc0000d8080}, 0x4)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/connection.go:331 +0x73
          lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.(*ΔFtail).BlkRevAt(0xc000077d40, {0x9696d8, 0xc0000d8080}, 0xc000064f60, 0x0, 0x3e5983329bbd100)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata/δftail.go:1140 +0x39d
          main.(*BigFile).invalidateBlk.func1(0xc000164400, {0x9696d8, 0xc0000d8080}, 0xc0005a0000, 0x200000, 0x200000, {0xc0005a0000, 0x200000, 0x200000})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1089 +0xb8
          main.(*BigFile).invalidateBlk(0xc000164400, {0x9696d8, 0xc0000d8080}, 0x0)
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:1105 +0x3bb
          main.(*Root).handleδZ.func3({0x9696d8, 0xc0000d8080})
                  /home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core/wcfs/wcfs.go:898 +0x34
          lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go.func1()
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:86 +0x68
          created by lab.nexedi.com/kirr/go123/xsync.(*WorkGroup).Go
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/go123/xsync/xsync.go:83 +0x92
      
      on any new change to a tracked file block whose previous history is not covered by ΔFtail/ΔBtail.
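
      A toy Python model of this failure mode, for illustration only. The
      names current_txn, with_txn, blk_rev_at and invalidate_blk are
      hypothetical stand-ins, and ctx is modeled as a plain dict. The real code
      is Go, and the actual fix may be structured differently:

```python
class NoCurrentTransaction(RuntimeError):
    """Raised when a context carries no transaction (hypothetical name)."""


def current_txn(ctx):
    # Models the failing check: database access is refused without a transaction in ctx.
    txn = ctx.get("txn")
    if txn is None:
        raise NoCurrentTransaction("transaction: no current transaction")
    return txn


def with_txn(ctx, txn_id):
    # Return a derived context that carries a transaction; the parent ctx is left untouched.
    child = dict(ctx)
    child["txn"] = txn_id
    return child


def blk_rev_at(ctx, history, blk):
    # Stand-in for a revision lookup that may need to load objects from the database.
    current_txn(ctx)
    return history.get(blk, "at0")


def invalidate_blk(ctx, history, blk):
    # One possible direction for a fix: make sure the lookup runs under a transaction.
    if ctx.get("txn") is None:
        ctx = with_txn(ctx, "txn-for-invalidation")
    return blk_rev_at(ctx, history, blk)
```

      In this model, calling blk_rev_at with an empty ctx raises, while
      invalidate_blk succeeds because it derives a ctx with a transaction first.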
      
      Problem reported by @Francois.
      a7bf0311
  13. 26 Nov, 2021 1 commit
    • Kirill Smelkov's avatar
      t/qemu-runlinux: Use multidevs=remap for 9P setup · c9f64495
      Kirill Smelkov authored
      Fixes the following warning that started to appear:
      
          kirr@deca:~/src/wendelin/wendelin.core/t$ ./qemu-runlinux -g  /home/kirr/src/linux/obj-qemu_debug/arch/x86/boot/bzImage /bin/bash
          qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
      
      See https://wiki.qemu.org/Documentation/9psetup for documentation of
      the multidevs option.
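
      As a rough illustration of where the option goes, here is a small Python
      helper that assembles a -virtfs argument. The helper name virtfs_args,
      the path, the mount tag and the security_model value are assumptions made
      for this sketch; they are not taken from qemu-runlinux itself:

```python
def virtfs_args(path, mount_tag, multidevs="remap"):
    """Build the QEMU command-line arguments for one 9P export (hypothetical helper)."""
    # QEMU accepts remap, forbid and warn as values of multidevs.
    if multidevs not in ("remap", "forbid", "warn"):
        raise ValueError("unsupported multidevs value: %r" % (multidevs,))
    opts = [
        "local",
        "path=%s" % path,
        "mount_tag=%s" % mount_tag,
        "security_model=none",        # assumption: chosen only for this example
        "multidevs=%s" % multidevs,   # remap avoids file ID collisions across host devices
    ]
    return ["-virtfs", ",".join(opts)]


if __name__ == "__main__":
    print(" ".join(virtfs_args("/", "rootfs")))
```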
      c9f64495
  14. 23 Nov, 2021 4 commits
    • Kirill Smelkov's avatar
      fixup! *: Use defer for dbclose & friends · b6916ca8
      Kirill Smelkov authored
      In 5c8340d2 we said:
      
          dbclose now uses defer almost everywhere - there are still few places in
          tests, where one test function is opening/closing test database multiple
          times - those were not (yet ?) converted.
      
      Let's convert those remaining places now, because when wendelin.core
      tests are run against plain ZODB4 (as opposed to ZODB4-wc2), many tests fail
      at fileh_open time, e.g.
      
              @func
              def test_bigfile_filezodb_fileh_gc():
                  root1= dbopen()
                  conn1= root1._p_jar
                  db   = conn1.db()
                  defer(db.close)
                  root1['zfile4'] = f1 = ZBigFile(blksize)
                  transaction.commit()
      
          >       fh1  = f1.fileh_open()
      
          bigfile/tests/test_filezodb.py:588:
          _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
          bigfile/file_zodb.py:603: in fileh_open
              fileh = _ZBigFileH(self, _use_wcfs)
          bigfile/file_zodb.py:664: in __init__
              self.zfileh = zfile._v_file.fileh_open(use_wcfs)
          bigfile/_file_zodb.pyx:112: in wendelin.bigfile._file_zodb._ZBigFile.fileh_open
              pywconn   = wczsync.pywconnOf(zconn)
          wcfs/client/_wczsync.pyx:56: in wendelin.wcfs.client._wczsync.pywconnOf
              wconn = wc.connect(zconn_at(zconn))
          lib/zodb.py:163: in zconn_at
              "nexedi/ZODB!1")
          _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
          patch = 'conn:MVCC-via-loadBefore-only', details_link = 'nexedi/ZODB!1'
      
              def _zassertHasNXDPatch(patch, details_link):
                  if not _zhasNXDPatch(patch):
                      raise AssertionError(
                          "ZODB%s is not patched with required Nexedi patch %r\n\tSee %s for details" %
          >               (zmajor, patch, details_link))
          E           AssertionError: ZODB4 is not patched with required Nexedi patch 'conn:MVCC-via-loadBefore-only'
          E               See nexedi/ZODB!1 for details
      
      and the DB is left unclosed.

      This change should reduce, if not completely eliminate, the number of
      leaked /tmp/testdb_* directories for Wendelin.core.UnitTest-ZODB4(xfail) testsuite.
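
      The effect of registering close right after each open can be sketched
      with the standard library alone. Here contextlib.ExitStack plays the role
      of the @func/defer pair used in the tests, and FakeDB and run_test_body
      are hypothetical names introduced only for this example:

```python
import contextlib


class FakeDB:
    """Stand-in for a test database; only tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_test_body(fail):
    """Open two databases in one test body and report which of them got closed."""
    opened = []

    def dbopen():
        db = FakeDB()
        opened.append(db)
        return db

    try:
        with contextlib.ExitStack() as cleanup:
            db1 = dbopen()
            cleanup.callback(db1.close)  # plays the role of defer(db.close)
            db2 = dbopen()
            cleanup.callback(db2.close)
            if fail:
                # Simulates an assertion raised in the middle of the test body.
                raise AssertionError("ZODB is not patched")
    except AssertionError:
        pass  # the test failed, but the registered cleanups have already run
    return [db.closed for db in opened]
```

      Both databases end up closed whether or not the body fails, which is the
      property that prevents leaked test database directories.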
      b6916ca8
    • Kirill Smelkov's avatar
      wcfs: client: tests: Turn SIGSEGV in tMapping.assertBlk into exception · c5624fa9
      Kirill Smelkov authored
      Accessing WCFS-mmapped memory can raise SIGBUS on an IO error (and
      automatically on a WCFS crash), and SIGSEGV when the accessed client mapping is closed.

      tFile.assertBlk in wcfs_test.py already converts SIGSEGV into a Python
      exception when accessing a block of an on-wcfs file. However,
      tMapping.assertBlk did not do so, and so, instead of providing proper
      details, it crashed the test run if something went wrong.

      For example, when wendelin.core tests are run against plain ZODB4 (as opposed to
      ZODB4-wc2, see ZODB!1 and
      slapos@e256ed97), the test first fails in
      pinner and then gets SIGSEGV on data access because, to mimic SIGBUS on
      EIO, pinner shuts down all mappings on its failure:
      
      https://lab.nexedi.com/nexedi/wendelin.core/blob/49f826b1/wcfs/client/wcfs.cpp#L477-501
      https://nexedijs.erp5.net/#/test_result_module/20211118-7C45220A/25
      
      -> Fix it by wrapping test block access with appropriate read_exfault
      variant.
      
      Before this patch:
      
          .../wendelin.core$ WENDELIN_CORE_TEST_DB='<zeo>' WENDELIN_CORE_VIRTMEM='r:wcfs+w:uvmm' python -m pytest -vsx wcfs/ -k test_wcfs_client
          ...
          wcfs/client/client_test.py::test_wcfs_client
          -------------------- live log call ---------------------
          INFO     wcfs:__init__.py:293 starting for zeo://localhost:28866 ...
          I1122 19:17:14.376182  110032 wcfs.go:2384] start "/dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2" "zeo://localhost:28866"
          I1122 19:17:14.376291  110032 wcfs.go:2390] (built with go1.17.3)
          W1122 19:17:14.380882  110032 storage.go:152] zodb: FIXME: open zeo://localhost:28866: raw cache is not ready for invalidations -> NoCache forced
          INFO     wcfs:__init__.py:334 started pid110032 @ /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2
      
          M: commit -> @at0 (03e452313dddbc00)
      
          M: commit -> @at1 (03e452313e0f3b99)
          M:      f<0000000000000002>     [2, 3]
      
          M: commit -> @at2 (03e452313e1adb55)
          M:      f<0000000000000002>     [2]
      
          M: commit -> @at3 (03e452313e3be500)
          M:      f<0000000000000002>     [3, 4]
          W1122 19:17:14.597654  110032 wcfs.go:2050] /@03e452313d343c88/bigfile: lookup "0000000000000002": bigfopen 0000000000000002 @03e452313d343c88: invalid argument: Get 0000000000000002: Get 03e452313d343c88:0000000000000002: zeo://localhost:28866: load 03e452313d343c88:0000000000000002: 0000000000000002: no such object
          E1122 19:17:14.597759  110032 wcfs.go:1220] /head/bigfile/0000000000000002: readblk #4: pin watchers: wlink1: f<0000000000000002>: pin #4 @03e452313d343c88: expect "ack"; got "nak: _remmapblk #4 @03e452313d343c88: open /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2/@03e452313d343c88/bigfile/0000000000000002: Invalid argument"
          F1122 19:17:14.597803  110050 wcfs/client/wcfs.cpp:487] CRITICAL: pinner: pin f<0000000000000002> #4 @03e452313d343c88: _remmapblk #4 @03e452313d343c88: open /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2/@03e452313d343c88/bigfile/0000000000000002: Invalid argument
          F1122 19:17:14.597835  110050 wcfs/client/wcfs.cpp:488] CRITICAL: wcfs server will likely kill us soon.
          CRITICAL: pinner: pin f<0000000000000002> #4 @03e452313d343c88: _remmapblk #4 @03e452313d343c88: open /dev/shm/wcfs/ef87339c054c3e0e48d494fa584bb209518844b2/@03e452313d343c88/bigfile/0000000000000002: Invalid argument
          CRITICAL: wcfs server will likely kill us soon.
          Segmentation fault: read @00007ff7b9534000
          /home/kirr/src/wendelin/wendelin.core/wcfs/client/./../../bigfile/liblibvirtmem.so(dump_traceback+0x34)[0x7ff7d6b5c279]
          /home/kirr/src/wendelin/wendelin.core/wcfs/client/./../../bigfile/liblibvirtmem.so(+0x27b0)[0x7ff7d6b577b0]
          /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7ff7da078140]
          python(PyString_FromStringAndSize+0x228)[0x5627feb96b58]
          python(PyEval_EvalFrameEx+0x603e)[0x5627febb7a4e]
          python(PyEval_EvalCodeEx+0x57c)[0x5627febb03cc]
          ...
          python(PyObject_Call+0x43)[0x5627feb9d903]
          python(+0x18a7e1)[0x5627fec5d7e1]
          python(Py_Main+0x3ad)[0x5627fec4b8ed]
          /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff7d9d59d0a]
          python(_start+0x2a)[0x5627fec4b46a]
          Segmentation fault (core dumped)
      
      After this patch:
      
          .../wendelin.core$ WENDELIN_CORE_TEST_DB='<zeo>' WENDELIN_CORE_VIRTMEM='r:wcfs+w:uvmm' python -m pytest -vsx wcfs/ -k test_wcfs_client
          ...
          wcfs/client/client_test.py::test_wcfs_client
          -------------------- live log call ---------------------
          INFO     wcfs:__init__.py:293 starting for zeo://localhost:22854 ...
          I1122 18:17:22.486445  102541 wcfs.go:2384] start "/dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98" "zeo://localhost:22854"
          I1122 18:17:22.486525  102541 wcfs.go:2390] (built with go1.17.3)
          W1122 18:17:22.489908  102541 storage.go:152] zodb: FIXME: open zeo://localhost:22854: raw cache is not ready for invalidations -> NoCache forced
          INFO     wcfs:__init__.py:334 started pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98
      
          M: commit -> @at0 (03e451f560834477)
      
          M: commit -> @at1 (03e451f560a2aa77)
          M:      f<0000000000000002>     [2, 3]
      
          M: commit -> @at2 (03e451f560adafcc)
          M:      f<0000000000000002>     [2]
      
          M: commit -> @at3 (03e451f560d02111)
          M:      f<0000000000000002>     [3, 4]
          W1122 18:17:22.703710  102541 wcfs.go:2050] /@03e451f55fcc4c77/bigfile: lookup "0000000000000002": bigfopen 0000000000000002 @03e451f55fcc4c77: invalid argument: Get 0000000000000002: Get 03e451f55fcc4c77:0000000000000002: zeo://localhost:22854: load 03e451f55fcc4c77:0000000000000002: 0000000000000002: no such object
          E1122 18:17:22.703840  102541 wcfs.go:1220] /head/bigfile/0000000000000002: readblk #4: pin watchers: wlink1: f<0000000000000002>: pin #4 @03e451f55fcc4c77: expect "ack"; got "nak: _remmapblk #4 @03e451f55fcc4c77: open /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98/@03e451f55fcc4c77/bigfile/0000000000000002: Invalid argument"
          F1122 18:17:22.704380  102558 wcfs/client/wcfs.cpp:487] CRITICAL: pinner: pin f<0000000000000002> #4 @03e451f55fcc4c77: _remmapblk #4 @03e451f55fcc4c77: open /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98/@03e451f55fcc4c77/bigfile/0000000000000002: Invalid argument
          F1122 18:17:22.704639  102558 wcfs/client/wcfs.cpp:488] CRITICAL: wcfs server will likely kill us soon.
          CRITICAL: pinner: pin f<0000000000000002> #4 @03e451f55fcc4c77: _remmapblk #4 @03e451f55fcc4c77: open /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98/@03e451f55fcc4c77/bigfile/0000000000000002: Invalid argument
          CRITICAL: wcfs server will likely kill us soon.
          >>> Change history by file:
      
          f<0000000000000002>:
                                          0 1 2 3 4 5 6 7
                                          a b c d e f g h
                  @at0 (03e451f560834477)
                  @at1 (03e451f560a2aa77)     2 3
                  @at2 (03e451f560adafcc)     2
                  @at3 (03e451f560d02111)       3 4
      
          INFO     wcfs:__init__.py:400 unmount/stop wcfs pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98
          I1122 18:17:22.728452  102541 wcfs.go:2560] stop "/dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98" "zeo://localhost:22854"
          FAILED
      
          ======================= FAILURES =======================
          ___________________ test_wcfs_client ___________________
      
              @func
              def test_wcfs_client():
                  t = tDB(); zf = t.zfile; at0=t.at0
                  defer(t.close)
                  pinned = lambda fh: fhpinned(t, fh)
      
                  at1 = t.commit(zf, {2:'c1', 3:'d1'})
                  at2 = t.commit(zf, {2:'c2'})
      
                  wconn = t.wc.connect(at1)
                  defer(wconn.close)
      
                  fh = wconn.open(zf._p_oid)
                  defer(fh.close)
      
                  # create mmap with 1 block beyond file size
                  m1 = fh.mmap(2, 3)
                  defer(m1.unmap)
      
                  assert m1.blk_start == 2
                  assert m1.blk_stop  == 5
                  assert len(m1.mem)  == 3*zf.blksize
      
                  tm1 = tMapping(t, m1)
      
                  assert pinned(fh) == {}
      
                  # verify initial data reads
                  tm1.assertBlk(2, 'c1',  {2:at1})
                  tm1.assertBlk(3, 'd1',  {2:at1})
                  tm1.assertBlk(4, '',    {2:at1})
      
                  # commit with growing file size -> verify data read as the same, #3 pinned.
                  # (#4 is not yet pinned because it was not accessed)
                  at3 = t.commit(zf, {3:'d3', 4:'e3'})
                  assert pinned(fh) == {2:at1}
                  tm1.assertBlk(2, 'c1',  {2:at1})
                  tm1.assertBlk(3, 'd1',  {2:at1, 3:at1})
                  tm1.assertBlk(4, '',    {2:at1, 3:at1})
      
                  # resync at1 -> at2:    #2 must unpin to @head; #4 must stay as zero
                  wconn.resync(at2)
                  assert pinned(fh) == {3:at1}
                  tm1.assertBlk(2, 'c2',  {       3:at1})
                  tm1.assertBlk(3, 'd1',  {       3:at1})
          >       tm1.assertBlk(4, '',    {       3:at1,  4:at0})     # XXX at0->ø ?
      
          wcfs/client/client_test.py:158:
          _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
          wcfs/client/client_test.py:86: in assertBlk
              _ = read_exfault_withgil(blkview[0:1])
          wcfs/internal/wcfs_test.pyx:90: in wendelin.wcfs.internal.wcfs_test.read_exfault_withgil
              return _read_exfault(mem, withgil=True)
          _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
          >   raise SegmentationFault()
          E   SegmentationFault
      
          wcfs/internal/wcfs_test.pyx:120: SegmentationFault
          ------------------ Captured log call -------------------
          INFO     wcfs:__init__.py:293 starting for zeo://localhost:22854 ...
          INFO     wcfs:__init__.py:334 started pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98
          INFO     wcfs:__init__.py:400 unmount/stop wcfs pid102541 @ /dev/shm/wcfs/c818c147676f8d6f3b408b02f727aca5e3229e98
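
      The general idea of guarding the test's memory read can be shown with a
      pure-Python analogy. This is not how read_exfault works: a real SIGSEGV
      cannot be caught this way from Python. The sketch relies only on the fact
      that reading a closed mmap object raises ValueError; MappingFault,
      read_guarded and assert_blk are hypothetical names:

```python
import mmap


class MappingFault(Exception):
    """Test-level error raised instead of letting a bad read abort the run."""


def read_guarded(mem, start, stop):
    # Convert a failed read of the mapping into an exception that carries details.
    try:
        return bytes(mem[start:stop])
    except ValueError as e:
        raise MappingFault("read [%d:%d]: %s" % (start, stop, e))


def assert_blk(mem, blksize, blk, expected):
    # Read one block through the guard and compare it, ignoring zero padding.
    data = read_guarded(mem, blk * blksize, (blk + 1) * blksize)
    assert data.rstrip(b"\0") == expected


if __name__ == "__main__":
    m = mmap.mmap(-1, 2 * 4096)      # anonymous mapping standing in for a file mapping
    m[4096:4098] = b"c1"
    assert_blk(m, 4096, 1, b"c1")    # passes while the mapping is open
    m.close()
    try:
        assert_blk(m, 4096, 1, b"c1")
    except MappingFault as e:
        print("fault reported as exception:", e)
```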
      c5624fa9
    • Kirill Smelkov's avatar
      wcfs: Server.stop: Don't log "after SIGTERM" when first wait for wcfs.go exit failed · 81274eb7
      Kirill Smelkov authored
      Here wcfs.go should have exited due to either the unmount request _or_
      SIGTERM.
      81274eb7
    • Kirill Smelkov's avatar
      wcfs: Server.stop: Don't report first unmount failure to outside · d0c4469a
      Kirill Smelkov authored
      If the first unmount fails, e.g. with "Device or resource busy", we
      try to unmount the filesystem a second time after force
      kill/FUSE abort (see 5f684a49 "wcfs: Server.stop: Make sure to remove
      mount entry even if we had to use FUSE abort").

      The caller of Server.stop should therefore get an error only if that
      second unmount fails, not on the unmount-1 error, which should be considered
      internal to the Server.stop implementation.

      If we don't hide the unmount-1 error and instead raise it to the caller, from
      the outside it can confusingly look like "the server was successfully
      stopped, but an error was raised nevertheless".
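
      A minimal Python sketch of this error-reporting policy, assuming the
      unmount, abort/kill and rmdir steps are passed in as callables. The
      function name stop and its parameters are hypothetical and do not mirror
      the real Server.stop signature:

```python
import logging

log = logging.getLogger("wcfs-stop-sketch")


def stop(unmount, abort_and_kill, rmdir):
    """Stop sequence where only a failure of the second unmount reaches the caller."""
    try:
        unmount(lazy=False)
    except OSError as e:
        # unmount-1 failure is an internal detail: log it and fall back.
        log.warning("unmount failed: %s; aborting FUSE connection", e)
        abort_and_kill()
        # Second try uses lazy unmount; an OSError raised here propagates to the caller.
        unmount(lazy=True)
    rmdir()
```

      With this shape the caller sees an exception only when the fallback
      unmount itself fails, in which case the mountpoint is also not removed.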
      d0c4469a
  15. 16 Nov, 2021 3 commits
  16. 15 Nov, 2021 1 commit
    • Kirill Smelkov's avatar
      wcfs: Server.stop: Make sure to remove mount entry even if we had to use FUSE abort · 5f684a49
      Kirill Smelkov authored
      Server.stop currently tries to unmount, and if that fails it invokes FUSE
      abort and kills wcfs.go. However, it does not call unmount a second
      time after such an abort, so the filesystem remains mounted (in
      ENOTCONN state) and rmdir(mountpoint) fails.

      -> Fix it by calling unmount a second time if we had to abort the FUSE
      connection. In that second try use lazy unmounting, because a regular
      unmount can still fail with "Device or resource busy" since there
      could still be client file descriptors left pointing to the mounted
      filesystem. With lazy unmounting, the unmount and the follow-up rmdir
      should, hopefully, always succeed.
      
      Here is an example test run where one test timed out and the FUSE connection was
      aborted, but the filesystem was not unmounted and the mountpoint
      directory was not deleted, which made all follow-up tests fail in the setup
      assert that testmountpoint does not exist:
      
      https://nexedijs.erp5.net/#/test_result_module/20211112-1ACEA62D/22
      
      This patch should fix those followup failures + fix another leakage of
      WCFS mounts in real services.
      5f684a49