• Kirill Smelkov's avatar
    Switch tee from threading.Thread to sync.WorkGroup · 1e6a1cc6
    Kirill Smelkov authored
    The reason is that with threading.Thread if exception happens in that
    spawned thread, this error is not propagated to main driver, while with
    sync.WorkGroup an exception from any spawned worker is propagated back
    to main. For example with the following injected error
    
        --- a/nxdtest/__init__.py
        +++ b/nxdtest/__init__.py
        @@ -267,6 +267,7 @@ def main():
    
         # tee, similar to tee(1) utility, copies data from fin to fout appending them to buf.
         def tee(ctx, fin, fout, buf):
        +    1/0
             while 1:
    
    before this patch nxdtest behaves like ...
    
        (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ nxdtest
        date:   Tue, 24 Nov 2020 14:55:08 MSK
        xnode:  kirr@deco.navytux.spb.ru
        uname:  Linux deco 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64
        cpu:    Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
    
        >>> pytest
        $ python -m pytest
        Exception in thread Thread-2:
        Traceback (most recent call last):
          File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.7/threading.py", line 754, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee
            1/0
        ZeroDivisionError: integer division or modulo by zero
    
        Exception in thread Thread-1:
        Traceback (most recent call last):
          File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.7/threading.py", line 754, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee
            1/0
        ZeroDivisionError: integer division or modulo by zero
    
        error   pytest  0.583s  # 1t 1e 0f 0s
        (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ echo $?
        0
    
    Here the error in another thread is only printed, but nxdtest is not aborted.
    Above it reported "error", but e.g. when testing pygolang/py3 and raising an
    error in tee it even reported it was succeeding
    ( !6 (comment 121393) ):
    
        slapuser34@vifibcloud-rapidspace-hosting-007:~/srv/runner/instance/slappart0$ ./bin/runTestSuite
        date:   Tue, 24 Nov 2020 12:51:23 MSK
        xnode:  slapuser34@vifibcloud-rapidspace-hosting-007
        uname:  Linux vifibcloud-rapidspace-hosting-007 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
        cpu:    Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
    
        >>> thread
        $ python -m pytest
        Exception in thread Thread-1:
        Traceback (most recent call last):
          File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 926, in _bootstrap_inner
            self.run()
          File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 870, in run
            self._target(*self._args, **self._kwargs)
          File "/srv/slapgrid/slappart34/srv/runner/software/44fe7dd3f13ecd100894c6368a35c055/parts/nxdtest/nxdtest/__init__.py", line 268, in tee
            fout.write(data)
        TypeError: write() argument must be str, not bytes
    
        ok      thread  9.145s  # 1t 0e 0f 0s
    
        >>> gevent
        $ gpython -m pytest
        Exception in thread Thread-3:
        Traceback (most recent call last):
          File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 926, in _bootstrap_inner
            self.run()
          File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 870, in run
            self._target(*self._args, **self._kwargs)
          File "/srv/slapgrid/slappart34/srv/runner/software/44fe7dd3f13ecd100894c6368a35c055/parts/nxdtest/nxdtest/__init__.py", line 268, in tee
            fout.write(data)
        TypeError: write() argument must be str, not bytes
    
        ok      gevent  21.980s # 1t 0e 0f 0s
    
    After this patch nxdtest correctly handles and propagates an error originating
    in spawned thread back to main driver:
    
        (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ nxdtest
        date:   Tue, 24 Nov 2020 14:54:19 MSK
        xnode:  kirr@deco.navytux.spb.ru
        uname:  Linux deco 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64
        cpu:    Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
    
        >>> pytest
        $ python -m pytest
        Traceback (most recent call last):
          File "/home/kirr/src/wendelin/venv/z4-dev/bin/nxdtest", line 11, in <module>
            load_entry_point('nxdtest', 'console_scripts', 'nxdtest')()
          File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 230, in main
            wg.wait()
          File "golang/_sync.pyx", line 237, in golang._sync.PyWorkGroup.wait
            pyerr_reraise(pyerr)
          File "golang/_sync.pyx", line 217, in golang._sync.PyWorkGroup.go.pyrunf
            f(pywg._pyctx, *argv, **kw)
          File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee
            1/0
        ZeroDivisionError: integer division or modulo by zero
        (neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ echo $?
        1
    
    NOTE sync.WorkGroup requires every worker to handle context cancellation, so
    that whenever there is an error, all other workers are canceled. We add such
    cancellation handling to tee but only lightly: before going to block in
    read/write syscalls we check for whether ctx is canceled or not. However the
    proper handling would be to switch file descriptors into non-block mode and to
    select at every IO point on both potential IO events and potential
    cancellation. This is left as TODO for the future.
    
    /reviewed-on !7
    1e6a1cc6
__init__.py 15.9 KB