1. 04 Dec, 2024 8 commits
    • Kirill Smelkov's avatar
      Y wcfs: Fix and enhance `wcfs stop` to be more reliable · 4095241e
      Kirill Smelkov authored
      Since wcfs beginning - since e3f2ee2d (wcfs: Initial implementation of
      basic filesystem) `wcfs stop` was implemented as just `fusermount -u`.
      That, however, turned out to be not robust because if wcfs is
      deadlocked, unmounting hangs, and if wcfs server is crashed, but there
      are still running client processes, unmount will fail with "Device or
      resource busy" error.
      
      For the deadlocked case we often see a situation where both wcfs and
      client zope processes are hung, kill -9 does not work on them (they
      still remain hung) and there is no easy way to do the unmount and
      restart wcfs.
      
      -> Fix `wcfs stop` to do that by first breaking the deadlock via
      /sys/fs/fuse/connection/<X>/abort and making sure that:
      
      1) wcfs.go is not running,
      2) all left clients are terminated, and
      3) the mount is also gone
      
      In many ways this coincides with what Server.stop was already doing, so
      here we teach `wcfs stop` to work via that Server.stop codepath and
      adjust the latter to work ok if Server._proc is not only
      subprocess.Popen that current process spawned, but also an xos.Proc,
      that `wcfs stop` discovered. Which can be also None if wcfs.go crashed
      by itself.
      
      As explained in the comments I took the decision to kill client
      processes instead of doing the final unmount try lazily because
      
          # NOTE if we do `fusermount -uz` (lazy unmount = MNT_DETACH), we will
          #      remove the mount from filesystem tree and /proc/mounts, but the
          #      clients will be left alive and using the old filesystem which is
          #      left in a bad ENOTCONN state. From this point of view restart of
          #      the clients is more preferred compared to leaving them running
          #      but actually disconnected from the data.
          #
          # TODO try to teach wcfs clients to detect filesystem in ENOTCONN state
          #      and reconnect automatically instead of being killed. Then we could
          #      use MNT_DETACH.
      4095241e
    • Kirill Smelkov's avatar
      Y wcfs: Server.stop: Dump kernel traceback of wcfs + lsof + ... if wcfs is stuck · e45fd39f
      Kirill Smelkov authored
      Use Sever._stuckdump we just added for `wcfs status` in the previous
      patch to dump that useful information about where wcfs is stuck and
      which processes have relation to it.
      e45fd39f
    • Kirill Smelkov's avatar
      Y wcfs: Fix and enhance `wcfs status` to be reliable · e542211d
      Kirill Smelkov authored
      Since wcfs beginning - since e3f2ee2d (wcfs: Initial implementation of
      basic filesystem) `wcfs status` was implemented as just join and
      reporting ok if that worked. That, however, turned out to be not robust
      because if wcfs is deadlocked, accessing any file on the filesystem,
      even a simple file as .wcfs/zurl might hang and so the status could
      hang as well. We see lots of such hung `wcfs status` processes on
      current deployment.
      
      More, it might be the case that wcfs is deadlocked in another way - e.g.
      on zheadMu, and then accessing .wcfs/zurl will work ok, but the system
      is not in a good shape while `wcfs status` missed to report that.
      
      -> Rework `wcfs status` completely to try accessing different files on
      the filesystem and doing so in cautious way so that if wcfs is in
      problematic state `wcfs status` won't get hung and will report the
      details about wcfs server and also about filesystem clients: which files
      are kept open, and what is in-kernel traceback of the server and the
      clients in case wcfs is hung.
      
      Please see comments in the added status function for details.
      
      An example of "good" status output when everything is ok:
      
          (neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/nexedi/wendelin.core$ wcfs status  file://D/1.fs
          INFO 1204 17:04:40.154  325432 __init__.py:506] wcfs: status file://D/1.fs ...
          ok - mount entry: /dev/shm/wcfs/fccdb94842958d09c69261970b8037b0e5510fb8  (0:39)
          ok - wcfs server: pid325414 kirr wcfs
          ok - stat mountpoint
          ok - read .wcfs/zurl
          ok - read .wcfs/stats
      
      And example of "bad" status output when wcfs was simulated to be
      seen in deadlocked state by trying to read from .wcfs/debug/zhead
      instead of .wcfs/stats:
      
          root@deca:/home/kirr/src/neo/src/lab.nexedi.com/nexedi/wendelin.core# wcfs status file://D/1.fs
          INFO 1204 17:21:04.145  325658 __init__.py:506] wcfs: status file://D/1.fs ...
          ok - mount entry: /dev/shm/wcfs/fccdb94842958d09c69261970b8037b0e5510fb8  (0:39)
          ok - wcfs server: pid325414 kirr wcfs
          ok - stat mountpoint
          ok - read .wcfs/zurl
          fail - read .wcfs/stats: timed out (wcfs might be stuck)
      
          wcfs ktraceback:
          pid325414 kirr wcfs
          tid325414 kirr wcfs
          [<0>] fuse_dev_do_read+0xa29/0xa50 [fuse]
          [<0>] fuse_dev_read+0x79/0xb0 [fuse]
          [<0>] vfs_read+0x239/0x310
          [<0>] ksys_read+0x6b/0xf0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325418 kirr wcfs
          [<0>] hrtimer_nanosleep+0xc7/0x1b0
          [<0>] __x64_sys_nanosleep+0xbe/0xf0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325419 kirr wcfs
          [<0>] fuse_dev_do_read+0xa29/0xa50 [fuse]
          [<0>] fuse_dev_read+0x79/0xb0 [fuse]
          [<0>] vfs_read+0x239/0x310
          [<0>] ksys_read+0x6b/0xf0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325420 kirr wcfs
          [<0>] futex_wait_queue+0x60/0x90
          [<0>] futex_wait+0x185/0x270
          [<0>] do_futex+0x106/0x1b0
          [<0>] __x64_sys_futex+0x8e/0x1d0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325421 kirr wcfs
          [<0>] do_epoll_wait+0x698/0x7d0
          [<0>] do_compat_epoll_pwait.part.0+0xb/0x70
          [<0>] __x64_sys_epoll_pwait+0x91/0x140
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325422 kirr wcfs
          [<0>] futex_wait_queue+0x60/0x90
          [<0>] futex_wait+0x185/0x270
          [<0>] do_futex+0x106/0x1b0
          [<0>] __x64_sys_futex+0x8e/0x1d0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325423 kirr wcfs
          [<0>] futex_wait_queue+0x60/0x90
          [<0>] futex_wait+0x185/0x270
          [<0>] do_futex+0x106/0x1b0
          [<0>] __x64_sys_futex+0x8e/0x1d0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325426 kirr wcfs
          [<0>] fuse_dev_do_read+0xa29/0xa50 [fuse]
          [<0>] fuse_dev_read+0x79/0xb0 [fuse]
          [<0>] vfs_read+0x239/0x310
          [<0>] ksys_read+0x6b/0xf0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325427 kirr wcfs
          [<0>] futex_wait_queue+0x60/0x90
          [<0>] futex_wait+0x185/0x270
          [<0>] do_futex+0x106/0x1b0
          [<0>] __x64_sys_futex+0x8e/0x1d0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325428 kirr wcfs
          [<0>] fuse_dev_do_read+0xa29/0xa50 [fuse]
          [<0>] fuse_dev_read+0x79/0xb0 [fuse]
          [<0>] vfs_read+0x239/0x310
          [<0>] ksys_read+0x6b/0xf0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325429 kirr wcfs
          [<0>] futex_wait_queue+0x60/0x90
          [<0>] futex_wait+0x185/0x270
          [<0>] do_futex+0x106/0x1b0
          [<0>] __x64_sys_futex+0x8e/0x1d0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          tid325646 kirr wcfs
          [<0>] fuse_dev_do_read+0xa29/0xa50 [fuse]
          [<0>] fuse_dev_read+0x79/0xb0 [fuse]
          [<0>] vfs_read+0x239/0x310
          [<0>] ksys_read+0x6b/0xf0
          [<0>] do_syscall_64+0x58/0xc0
          [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          wcfs clients:
            pid325430 kirr bash ('bash',)
                  cwd     -> /dev/shm/wcfs/fccdb94842958d09c69261970b8037b0e5510fb8
      
                  pid325430 kirr bash
                  tid325430 kirr bash
                  [<0>] do_select+0x661/0x830
                  [<0>] core_sys_select+0x1ba/0x3a0
                  [<0>] do_pselect.constprop.0+0xe9/0x180
                  [<0>] __x64_sys_pselect6+0x53/0x80
                  [<0>] do_syscall_64+0x58/0xc0
                  [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
            pid325637 kirr ipython3 ('/usr/bin/python3', '/usr/bin/ipython3')
                  fd/12   -> /dev/shm/wcfs/fccdb94842958d09c69261970b8037b0e5510fb8/.wcfs/zurl
      
                  pid325637 kirr ipython3
                  tid325637 kirr ipython3
                  [<0>] do_epoll_wait+0x698/0x7d0
                  [<0>] __x64_sys_epoll_wait+0x6f/0x110
                  [<0>] do_syscall_64+0x58/0xc0
                  [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
                  tid325638 kirr ipython3
                  [<0>] futex_wait_queue+0x60/0x90
                  [<0>] futex_wait+0x185/0x270
                  [<0>] do_futex+0x106/0x1b0
                  [<0>] __x64_sys_futex+0x8e/0x1d0
                  [<0>] do_syscall_64+0x58/0xc0
                  [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
                  tid325640 kirr ipython3
                  [<0>] futex_wait_queue+0x60/0x90
                  [<0>] futex_wait+0x185/0x270
                  [<0>] do_futex+0x106/0x1b0
                  [<0>] __x64_sys_futex+0x8e/0x1d0
                  [<0>] do_syscall_64+0x58/0xc0
                  [<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
      
          Traceback (most recent call last):
            File "/home/kirr/src/wendelin/venv/z-dev/bin/wcfs", line 11, in <module>
              load_entry_point('wendelin.core', 'console_scripts', 'wcfs')()
            File "<decorator-gen-42>", line 2, in main
            File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 165, in _goframe
              return f(*argv, **kw)
            File "/home/kirr/src/wendelin/wendelin.core/wcfs/__init__.py", line 997, in main
              status(zurl)
            File "/home/kirr/src/wendelin/wendelin.core/wcfs/__init__.py", line 592, in status
              verify("read .wcfs/stats", xos.readfile, "%s/.wcfs/debug/zhead" % mnt.point)
            File "<decorator-gen-43>", line 2, in verify
            File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 165, in _goframe
              return f(*argv, **kw)
            File "/home/kirr/src/wendelin/wendelin.core/wcfs/__init__.py", line 570, in verify
              fail("%s: timed out (wcfs might be stuck)" % subj)
            File "/home/kirr/src/wendelin/wendelin.core/wcfs/__init__.py", line 512, in fail
              raise RuntimeError('(failed)')
          RuntimeError: (failed)
      e542211d
    • Kirill Smelkov's avatar
      Y wcfs: Switch loglevel from WARNING -> INFO for wcfs.py commands · b0c8b32a
      Kirill Smelkov authored
      As it turned out managing and monitoring wcfs server turned out to be
      not very robust. For example if wcfs.go is dadlocked, then `wcfs.py
      status` will also hang and things like that.
      
      From this point of view for better tracibility it makes more sense to
      see the informative messages that wcfs.py commands are trying to emit.
      b0c8b32a
    • Kirill Smelkov's avatar
      Y wcfs: Switch to custom lsof · 4da6a16f
      Kirill Smelkov authored
      Previously when error on umount we were invoking lsof(8) to show list of
      files that are still opened on the filesystem. But lsof(8) turned out to
      be unreliable because it stats the filesystem and if e.g. wcfs server
      process is stopped lsof only prints
      
          WARNING:wcfs:# lsof /dev/shm/wcfs/1439df02dfcc41ab9dfb68e7ac4ad615f3b7d46e
          WARNING:wcfs:lsof: status error on /dev/shm/wcfs/1439df02dfcc41ab9dfb68e7ac4ad615f3b7d46e: Transport endpoint is not connected
          ...
          WARNING:wcfs:(lsof failed)
      
      fuser(1) from psmisc works a bit better: it can show list of still
      opened files on the mounted tree even if filesystem server is crashed.
      
      However with some version of fuser I still saw "Transport endpoint is
      not connected" once, and in the next patches we will also need to
      inspect "using" processes more, so if we are to use fuser we will need
      to parse its output which might get fragile.
      
      -> Do our own lsof utility instead.
      
      We have all the infrastructure in place to do so in the form of MountDB
      and ProcDB, and as implemented Mount.lsof() emits Proc'esses which can
      be inspected further conveniently. For now we do not do such inspection,
      but for `wcfs status` and `wcfs stop` we will want to poke with kernel
      tracebacks of those processes.
      4da6a16f
    • Kirill Smelkov's avatar
      Y wcfs: os: Add ProcDB & co · a50e2ce9
      Kirill Smelkov authored
      Add ProcDB that represents database of processes with code to query it
      in several ways. We will need this functionality for `wcfs status`,
      `wcfs stop` and probably for more.
      a50e2ce9
    • Kirill Smelkov's avatar
      Y wcfs: Switch to work with mount entries instead of mountpoints · 1335412f
      Kirill Smelkov authored
      Because mount entries provide more information compared to just one
      string mountpoint. For example later for `wcfs status` and `wcfs stop`
      we will need to use the ID of a "device" that is attached to the mount,
      and also the type of the filesystem that is serving the mount.
      
      -> Introduce internal.os.MountDB to retrieve information from OS
         registry of mounted filesystems and use its entries instead of plain
         mountpoint string.
      
      wcfs_test.py already had some rudimentary code to parse /proc/mounts
      which we also replace with querying MountDB.
      
      The API of MountDB might be viewed as a bit of overkill but it will
      align with API of upcoming ProcDB for which it will be reasonable.
      1335412f
    • Kirill Smelkov's avatar
      Y: wcfs: Factor wcfs_test.{readfile,writefile} into wcfs.internal.os · f02bad18
      Kirill Smelkov authored
      We will soon need to use this functions from several places.
      Plain code movement.
      f02bad18
  2. 29 Nov, 2024 2 commits
    • Kirill Smelkov's avatar
      Y wcfs: Make .wcfs/zhead to be debug-only feature and move it to .wcfs/debug/zhead · ae2d0f53
      Kirill Smelkov authored
      Currently it is easy to deadlock wcfs by opening .wcfs/zhead and not
      reading from opened zhead handle due to the nature of what this zhead
      file is designed to do. For this reason we want .wcfs/zhead not to be present
      on production deployments and be available only where it is needed -
      during test runs.
      
      -> Introduce -debug flag for that to "enable debugging features" and put
         .wcfs/zhead to be that debug-only feature, and move it to be
         .wcfs/debug/zhead for clarity.
      ae2d0f53
    • Kirill Smelkov's avatar
      Y wcfs: Rename -d to -trace.fuse · d4dd89c9
      Kirill Smelkov authored
      Three reasons:
      
      1) -d is not very descriptive, it's help "debug" is also too vague;
      2) currently it only activates trace of FUSE exchange
      3) in the next patch we are going to introduce -debug that will activate
         debugging-only features which will remain disabled on production runs
         for safety.
      
      -> Rename -d to -trace.fuse to clarify its meaning and prepare for
         upcoming -debug.
      d4dd89c9
  3. 26 Nov, 2024 5 commits
    • Kirill Smelkov's avatar
      XY wcfs: multiprocessing: Teach join to return what invoked function returned · 0c412a5d
      Kirill Smelkov authored
      For upcoming `wcfs status` we will need to be able to invoke functions
      from stdlib and similar and retrieve their returned values.
      
      XXX not sure the API is right. Maybe join should just join and return
      value retrieved via .retvalue() method.
      0c412a5d
    • Kirill Smelkov's avatar
      Y wcfs: multiprocessing: Allow SubProcess to be used with plain functions directly · f4cf5aff
      Kirill Smelkov authored
      Sometimes it is useful to run just f(*argv, **kw) instead of f(cin,
      cout, *argv, **kw). For example for upcoming `wcfs status` we will need
      to run os.stat(".wcfs/zurl") and similar things. Add low-level
      flag-parameter to omit passing cin and cout to the called function.
      f4cf5aff
    • Kirill Smelkov's avatar
      Y wcfs: Factor wcfs_faultyprot_test.tSubProcess into wcfs.internal.SubProcess · cd2ee503
      Kirill Smelkov authored
      We will soon need to use this utility not only in tests.
      Practically plain code movement with minimal adaptation.
      cd2ee503
    • Kirill Smelkov's avatar
      Y wcfs: tests: Teach tSubProcess to run function from any module · 989c7105
      Kirill Smelkov authored
      We will soon need to use tSubProcess not only for tests and so will move
      it out to another place. It will need to be able to run functions from
      both wcfs/wcfs_faultyprot_test.py and from other modules. Prepare to
      that.
      989c7105
    • Kirill Smelkov's avatar
      Y wcfs: tests: Fix double close of pipes in tSubProcess · 5571e203
      Kirill Smelkov authored
      Running tests in wcfs/wcfs_faultyprot_test.py was yielding errors about
      invalid closes, e.g.
      
          wcfs/wcfs_faultyprot_test.py::test_wcfs_pinhfaulty_kill_on_watch[_bad_watch_no_pin_read]
          ------------------------------------ live log call -------------------------------------
          M: commit -> @at0 (03fcee2b22c56155)
          INFO     wcfs:__init__.py:293 starting for file:///tmp/testdb_fs.v5XBEf/1.fs ...
          ...
          INFO     wcfs:__init__.py:400 unmount/stop wcfs pid875517 @ /dev/shm/wcfs/1060296857b6e9e0134ffb708787deb4039140ae
          I1126 19:11:13.746387  875517 wcfs.go:3100] stop "/dev/shm/wcfs/1060296857b6e9e0134ffb708787deb4039140ae" "file:///tmp/testdb_fs.v5XBEf/1.fs"
          close failed in file object destructor:             <-- NOTE
          IOError: [Errno 9] Bad file descriptor              <-- NOTE
          PASSED
      
      Since soon we are going to use tSubProcess not only in tests this will
      become less innocent with more incentive to fix.
      
      Looking at the strace output I saw that the same file descriptor was
      being closed twice in a row. Looking at strace further revealed that
      this fd was created via pip syscall and finally I could connect it to
      the pipe created by subprocess.Popen.
      
      So what happens here is that we were taking fd number from
      subprocess.Popen and passing it to multiprocessing.Connection with both
      thinking they own the fd, and closing it on GC.
      
      -> Fix that by really passing the fd to MPConnection as a copy first, and
         then redirecting Popen fds to devnull - to make sure in robust way
         that stdin/out IO does not interfere with pickle exchange going on in
         between child and parent via multiprocessing connections.
      
      Fixes 33ea7769 (wcfs: tests: Move client to be pinkill'ed into separate process)
      5571e203
  4. 18 Nov, 2024 1 commit
  5. 16 Nov, 2024 5 commits
  6. 01 Nov, 2024 1 commit
  7. 29 Oct, 2024 1 commit
  8. 27 Oct, 2024 2 commits
  9. 25 Oct, 2024 1 commit
  10. 07 Oct, 2024 1 commit
    • Levin Zimmermann's avatar
      wcfs/fuse: Allow other users to access file system · d3ae0c6c
      Levin Zimmermann authored
      On Theia all Slappartitions have the same user, however on bare-metal
      SlapOS each partition has a different user. In this case zope nodes are
      not allowed to read WCFS, because they belong to a different user (even
      when every user read permission). This is a feature of FUSE and can be
      disabled with 'allow_other'. This is unconditionally done with this
      patch to allow deployment of WCFS on bare-metal SlapOS.
      d3ae0c6c
  11. 19 Sep, 2024 2 commits
  12. 18 Sep, 2024 1 commit
    • Kirill Smelkov's avatar
      wcfs: Clarify error context when WatchLink.sendReq is waiting for reply · 39d53cbb
      Kirill Smelkov authored
      sendReq has two phases: a) send request, and b) read reply. When there
      is an error on the first phase, e.g. client does not read what wcfs is
      trying to send, it returns an error like
      
          pin #2 @03fb63abd6d65b33: sendReq: send .2: context deadline exceeded
      
      however when there is an error on the second phase, e.g. client does not
      reply to wcfs request, it currently returns an error like
      
          pin #2 @03fb63abd6d65b33: sendReq: context deadline exceeded
      
      which is not clear to interpret about which part was problematic.
      
      After this patch the error for the second case becomes
      
          pin #2 @03fb63abd6d65b33: sendReq: waiting for reply: context deadline exceeded
      
      which is easier to interpret.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!31
      39d53cbb
  13. 17 Sep, 2024 10 commits
    • Levin Zimmermann's avatar
      wcfs: v↑ go dependencies · 764d4da8
      Levin Zimmermann authored
      This patch updates:
      
      - github.com/golang/glog: we already wanted to do so in
          !23,
          but we deferred it to keep go 1.18 support. However in recent patches
          we already dropped go 1.18 support and we can therefore update glog now.
      - lab.nexedi.com/kirr/neo/go: add fix in handshake, see here for more information:
          kirr/neo@d75f4ac2 and
          kirr/neo@03db1d8a
      
      This patch doesn't update:
      
      - github.com/hanwen/go-fuse: This was updated upstream and Kirill already
          reviewed and integrated patches in custom branch. However when updating
          go-fuse to v2.4.3-0.20240904154523-9546fc238dc6 (this is
          kirr/go-fuse@9546fc23),
          WCFS tests fail on my machine [1] => let's defer update
      - github.com/kisielk/og-rek: there are new patches that will be needed
          in the future, but we didn't update NEO/go og-rek dependency yet,
          so let's defer the update in wendelin.core until we updated og-rek
          in NEO/go
      - github.com/johncgriffin/overflow: no update on upstream
      - github.com/pkg/errors: no update on upstream
      - github.com/stretchr/testify: This was already updated with
          c559ec1a
          'testify' seems to have a major release in the future which may break
          some of our test code, but for now major version 1 is still the
          stable release.
      
      ----
      kirr: I confirm that
      kirr/go-fuse@9546fc23 brings in
      regression to WCFS tests. It seems I missed some error in that go-fuse
      update and it will need to be bisected and debugged.
      
      ---
      
      [1] Test failure log:
      
      ========================================== FAILURES ==========================================
      ______________________________________ test_wcfs_basic _______________________________________
      
          @func
          def test_wcfs_basic():
              t = tDB(); zf = t.zfile
              defer(t.close)
      
              # >>> lookup non-BigFile -> must be rejected
              with raises(OSError) as exc:
                  t.wc._stat("head/bigfile/%s" % h(t.nonzfile._p_oid))
              assert exc.value.errno == EINVAL
      
              # >>> file initially empty
              f = t.open(zf)
              f.assertCache([])
              f.assertData ([], mtime=t.at0)
      
              # >>> (@at1) commit data -> we can see it on wcfs
              at1 = t.commit(zf, {2:'c1'})
      
              f.assertCache([0,0,0])  # initially not cached
              f.assertData (['','','c1'], mtime=t.head)
      
              # >>> (@at2) commit again -> we can see both latest and snapshotted states
              # NOTE blocks e(4) and f(5) will be accessed only in the end
              at2 = t.commit(zf, {2:'c2', 3:'d2', 5:'f2'})
      
              # f @head
      >       f.assertCache([1,1,0,0,0,0])
      
      wcfs/wcfs_test.py:1341:
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
      t = <wcfs.wcfs_test.tFile instance at 0x7ff61457b960>, incorev = [1, 1, 0, 0, 0, 0]
      
          def assertCache(t, incorev):
      >       assert t.cached() == incorev
      E       assert [0, 0, 0, 0, 0, 0] == [1, 1, 0, 0, 0, 0]
      E         At index 0 diff: 0 != 1
      E         Use -v to get the full diff
      
      wcfs/wcfs_test.py:791: AssertionError
      ------------------------------------ Captured stdout call ------------------------------------
      
      M: commit -> @at0 (03fb5dfbe3c1cd55)
      
      M: commit -> @at1 (03fb5dfbe4936a66)
      M:      f<0000000000000002>     [2]
      
      M: commit -> @at2 (03fb5dfbe4d01166)
      M:      f<0000000000000002>     [2, 3, 5]
      >>> Change history by file:
      
      f<0000000000000002>:
                                      0 1 2 3 4 5 6 7
                                      a b c d e f g h
              @at0 (03fb5dfbe3c1cd55)
              @at1 (03fb5dfbe4936a66)     2
              @at2 (03fb5dfbe4d01166)     2 3   5
      
      ------------------------------------ Captured stderr call ------------------------------------
      I0917 12:43:53.392222  124283 wcfs.go:2752] start "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      I0917 12:43:53.392282  124283 wcfs.go:2758] (built with go1.21.13)
      W0917 12:43:53.392404  124283 storage.go:232] zodb: FIXME: open file:///tmp/testdb_fs.z5ZoMH/1.fs: raw cache is not ready for invalidations -> NoCache forced
      W0917 12:43:53.567807  124283 wcfs.go:2331] /head/bigfile: lookup "0000000000000001": bigfopen 0000000000000001 @03fb5dfbe3c1cd55: invalid argument: ZODB.Broken("persistent.Persistent") is not a ZBigFile
      I0917 12:43:53.710208  124283 wcfs.go:2933] stop "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      ------------------------------------- Captured log call --------------------------------------
      WARNING  ZODB.FileStorage:FileStorage.py:412 Ignoring index for /tmp/testdb_fs.z5ZoMH/1.fs
      _________________________________ test_wcfs_watch_vs_access __________________________________
      
          @func
          def test_wcfs_watch_vs_access():
              t = tDB(); zf = t.zfile; at0=t.at0
              defer(t.close)
      
              f = t.open(zf)
              at1 = t.commit(zf, {2:'c1'})
              at2 = t.commit(zf, {2:'c2', 3:'d2', 5:'f2'})
              at3 = t.commit(zf, {0:'a3', 2:'c3', 5:'f3'})
      
              f.assertData(['a3','','c3','d2','x','x'])
              f.assertCache([1,1,1,1,0,0])
      
              # watched + commit -> read -> receive pin messages.
              # read vs pin ordering is checked by assertBlk.
              #
              # f(5) is kept not accessed to check later how wcfs.go handles δFtail
              # rebuild after it sees not yet accessed ZBlk that has change history.
              wl3  = t.openwatch();  w3 = wl3.watch(zf, at3);  assert at3 == t.head
              assert w3.at     == at3
              assert w3.pinned == {}
      
              wl3_ = t.openwatch();  w3_ = wl3_.watch(zf, at3)
              assert w3_.at     == at3
              assert w3_.pinned == {}
      
              wl2  = t.openwatch();  w2 = wl2.watch(zf, at2)
              assert w2.at     == at2
              assert w2.pinned == {0:at0, 2:at2}
      
              # w_assertPin asserts on state of .pinned for {w3,w3_,w2}
              def w_assertPin(pinw3, pinw3_, pinw2):
                  assert w3.pinned   == pinw3
                  assert w3_.pinned  == pinw3_
                  assert w2.pinned   == pinw2
      
              f.assertCache([1,1,1,1,0,0])
              at4 = t.commit(zf, {1:'b4', 2:'c4', 5:'f4', 6:'g4'})
      >       f.assertCache([1,0,0,1,0,0,0])
      
      wcfs/wcfs_test.py:1702:
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
      t = <wcfs.wcfs_test.tFile instance at 0x7ff614512050>, incorev = [1, 0, 0, 1, 0, 0, ...]
      
          def assertCache(t, incorev):
      >       assert t.cached() == incorev
      E       assert [0, 0, 0, 0, 0, 0, ...] == [1, 0, 0, 1, 0, 0, ...]
      E         At index 0 diff: 0 != 1
      E         Use -v to get the full diff
      
      wcfs/wcfs_test.py:791: AssertionError
      ------------------------------------ Captured stdout call ------------------------------------
      
      M: commit -> @at0 (03fb5dfc0fd82300)
      
      M: commit -> @at1 (03fb5dfc10b92ecc)
      M:      f<0000000000000049>     [2]
      
      M: commit -> @at2 (03fb5dfc10cee9dd)
      M:      f<0000000000000049>     [2, 3, 5]
      
      M: commit -> @at3 (03fb5dfc1100c999)
      M:      f<0000000000000049>     [0, 2, 5]
      
      C: setup watch f<0000000000000049> @at3 (03fb5dfc1100c999)
      
      C: setup watch f<0000000000000049> @at3 (03fb5dfc1100c999)
      
      C: setup watch f<0000000000000049> @at2 (03fb5dfc10cee9dd)
      
      M: commit -> @at4 (03fb5dfc120ed611)
      M:      f<0000000000000049>     [1, 2, 5, 6]
      >>> Change history by file:
      
      f<0000000000000049>:
                                      0 1 2 3 4 5 6 7
                                      a b c d e f g h
              @at0 (03fb5dfc0fd82300)
              @at1 (03fb5dfc10b92ecc)     2
              @at2 (03fb5dfc10cee9dd)     2 3   5
              @at3 (03fb5dfc1100c999) 0   2     5
              @at4 (03fb5dfc120ed611)   1 2     5 6
      
      ------------------------------------ Captured stderr call ------------------------------------
      I0917 12:44:03.733037  125217 wcfs.go:2752] start "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      I0917 12:44:03.733126  125217 wcfs.go:2758] (built with go1.21.13)
      W0917 12:44:03.733418  125217 storage.go:232] zodb: FIXME: open file:///tmp/testdb_fs.z5ZoMH/1.fs: raw cache is not ready for invalidations -> NoCache forced
      I0917 12:44:04.475273  125217 wcfs.go:2933] stop "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      ============================ 2 failed, 42 passed in 55.81 seconds ============================
      I0917 12:44:17.882140  125540 wcfs.go:2933] stop "/dev/shm/wcfs/c4d833a0bdea4c51decf5425b8ad2cc4d017280f" "file:///tmp/testdb_fs.bvHBy9/1.fs"
      make: *** [Makefile:174: test.wcfs] Error 1
      
      /reviewed-by @kirr
      /reviewed-on !30
      764d4da8
    • Kirill Smelkov's avatar
      wcfs: Implement protection against faulty client + related fixes and improvements · 89d653c0
      Kirill Smelkov authored
      The WCFS documentation specifies [1]:
      
      - - - 8> - - - 8> - - -
      
      If a client, on purpose or due to a bug or being stopped, is slow to respond
      with ack to file invalidation notification, it creates a problem because the
      server will become blocked waiting for pin acknowledgments, and thus all
      other clients, that try to work with the same file, will get stuck.
      
      [...]
      
      Lacking OS primitives to change address space of another process and not
      being able to work it around with ptrace in userspace, wcfs takes approach
      to kill a slow client on 30 seconds timeout by default.
      
      - - - <8 - - - <8 - - -
      
      But before, this protection wasn't implemented yet: one
      faulty client could therefore freeze the whole system. With this work
      this protection is implemented now: faulty clients are killed after the
      timeout or any other misbehaviour in their pin handlers.
      
      Working on this topic also resulted in several fixes and improvements
      around isolation protocol implementation on the server side.
      
      See individual patches for details.
      
      [1] https://lab.nexedi.com/nexedi/wendelin.core/blob/38dde766/wcfs/wcfs.go#L186-208Co-authored-by: Levin Zimmermann's avatarLevin Zimmermann <levin.zimmermann@nexedi.com>
      
      /reviewed-on !18
      89d653c0
    • Levin Zimmermann's avatar
      wcfs: Require Go 1.19 + go mod tidy · 1fcef9c9
      Levin Zimmermann authored
      I would only suggest one very tiny change. In go.mod we have:
      
          module lab.nexedi.com/nexedi/wendelin.core/wcfs
      
          go 1.14
      
      I think this needs to be updated to go 1.19 due to atomic.Int64.
      
      And maybe we just need general go mod tidy update.
      
      /reviewed-by @kirr
      /reviewed-on !18
      1fcef9c9
    • Kirill Smelkov's avatar
      wcfs: Implement protection against faulty client · c559ec1a
      Kirill Smelkov authored
      The WCFS documentation specifies [1]:
      
      - - - 8> - - - 8> - - -
      
      If a client, on purpose or due to a bug or being stopped, is slow to respond
      with ack to file invalidation notification, it creates a problem because the
      server will become blocked waiting for pin acknowledgments, and thus all
      other clients, that try to work with the same file, will get stuck.
      
      [...]
      
      Lacking OS primitives to change address space of another process and not
      being able to work it around with ptrace in userspace, wcfs takes approach
      to kill a slow client on 30 seconds timeout by default.
      
      - - - <8 - - - <8 - - -
      
      But before this patch, this protection wasn't implemented yet: one
      faulty client could therefore freeze the whole system. With this patch
      this protection is implemented now: faulty clients are killed after the
      timeout or any other misbehaviour in their pin handlers.
      
      [1] https://lab.nexedi.com/nexedi/wendelin.core/blob/38dde766/wcfs/wcfs.go#L186-208
      
      Preliminary history:
      
          levin.zimmermann/wendelin.core@24904e82
          levin.zimmermann/wendelin.core@b02dcadcCo-authored-by: Levin Zimmermann's avatarLevin Zimmermann <levin.zimmermann@nexedi.com>
      
      /discussed-on !18
      c559ec1a
    • Kirill Smelkov's avatar
      wcfs: Shutdown WatchLink on any pin error · 007d53db
      Kirill Smelkov authored
      If a pin misbehaves or there is IO error or anything else, we want to
      stop all communication on the watchlink, cancel on in-flight pin
      handlers, and (TODO) kill the client with SIGBUS.
      
      This patch organizes WatchLink shutdown on any pin error.
      This functionality is indirectly tested by test_Wcfs_watch_robust and
      will be also indirectly tested by faultyprotection tests.
      
      It would be good to have dedicated tests probably, but that is,
      hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      007d53db
    • Kirill Smelkov's avatar
      wcfs: Do not cancel pin handler by a READ interrupt · 7a79bdd6
      Kirill Smelkov authored
      Pinning is critical operation whose failure will soon lead to client
      being killed with SIGBUS. WCFS correctness also depend fundamentally on
      pin operation, if started, to be handled by the client.
      
      -> rework the READ handler not to cancel pin if a READ interrupt comes
         in from the OS client.
      
      Do this via organizing WatchLink.serveCtx and running pins under this
      context instead of under READ context. Later we will adjust pins to also
      cancel this context on any error.
      
      Test is, hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !18
      7a79bdd6
    • Kirill Smelkov's avatar
      wcfs: Fix potential stuck in WatchLink.serve exit codepath · a6dd7806
      Kirill Smelkov authored
      When serve is completing and going to exit, it sends an error message to
      the client without any timeout. So if the client is not reading from the
      channel, wcfs will get stuck waiting for the message to be consumed.
      
      -> Fix that by trying to send that last error only during 1 second and
         ignoring errors if any
      
      Test is, hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !18
      a6dd7806
    • Kirill Smelkov's avatar
      wcfs: Rework WatchLink.serve exit codepath for better clarity · 08b011f5
      Kirill Smelkov authored
      Bring in more structure:
      
      - final watchlink cleanup is done in its own block
      - cancelling spawned handlers is done in another block
      - add more comments explaining things
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !18
      08b011f5
    • Kirill Smelkov's avatar
      wcfs: Rework WatchLink.serve to rely on context cancellation to stop reading · c7c3b82a
      Kirill Smelkov authored
      Previously we were using .sk.CloseRead() to interrupt sk.Read(), but
      that is not necessary since .sk, relying on xio.Pipe, implements
      xio.Reader natively with full support for cancellation.
      
      The original code to cancel via CloseRead comes from mid 2019 and predates
      
      kirr/go123@7ad867a3
      kirr/go123@0e368363
      kirr/go123@0bdac628
      kirr/go123@9db4dfac
      kirr/go123@d2dc6c09
      
      And in kirr/wendelin.core@b17aeb8c and
      6f0cdaff (wcfs: Provide isolation to clients), it seems, I missed to
      update WatchLink.serve code to that.
      
      Do that now because it simplifies code flow organization a bit.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !18
      c7c3b82a
    • Kirill Smelkov's avatar
      wcfs: tests: Extend faulty protection tests with more kinds of faulty clients · c91fb14e
      Kirill Smelkov authored
      So far we were testing only against faulty client that reads pin
      notification ok, but does not reply to the notification. But there could
      be more problems:
      
      1) a client does not read pin notification at all
      2) a client closes watchlink abruptly after reading pin notification
      3) a client replies to pin notification but the reply is not "ack"
      
      The first problem, if not handled leads to whole set of clients to
      become stuck on reading the same block as the faulty client. The other
      problems also indicate breakage of the isolation protocol from the client
      side and that wcfs can no longer be sure that it provides good
      uncorrupted data to the client.
      
      In the first case, similarly to "no reply" situation we need to kill the
      client to make progress while maintaining safety as well. In the cases 2
      and 3 we cannot maintain safety if the faulty client remains in the set
      of live and served clients, so it is also logical to send SIGBUS/SIGKILL
      to it.
      
      Killing a client with SIGBUS is similar to how OS kernel sends SIGBUS when
      a memory-mapped file is accessed and loading file data results in EIO. It is
      also similar to wendelin.core 1 where SIGBUS is raised if loading file block
      results in an error.
      
      Extend tests to cover all explained scenarios.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      c91fb14e