nxdtest:master commits
https://lab.node.vifib.com/nexedi/nxdtest/-/commits/master
2022-06-20T19:51:57+09:00

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/39464b65ab09266ef94a5e084bf69cb3f9dc1020
fixup! PyTest.summaryf: better detection of summary line
2022-06-20T19:51:57+09:00 Jérome Perrin <jerome@nexedi.com>
Fix a wrong construct that was a syntax error on Python 2.

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/f89c3b55c0ae87ac89e4e87020906217981c86e1
PyTest.summaryf: better detection of summary line
2022-06-20T19:34:11+09:00 Jérome Perrin <jerome@nexedi.com>
If the program outputs something after pytest's summary line, our summary
function would be confused and detect 0 tests run.
Make this matching more robust by expecting the actual status of tests
reported by pytest: (x)passed, (x)failed, skipped, errors or "no tests ran".
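A minimal sketch of the idea, assuming pytest's usual final line such as `=== 1 failed, 2 passed in 0.12s ===`: only an `=`-framed line whose text carries one of the known status words counts as the summary, so output printed afterwards is skipped. `parse_pytest_summary` is a hypothetical name; this is not nxdtest's actual summaryf.

```python
import re

# Status words pytest prints in its final summary line.
_STATUSES = ('passed', 'failed', 'skipped', 'xfailed', 'xpassed',
             'error', 'errors', 'warning', 'warnings', 'deselected')
_FRAMED = re.compile(r'^=+ (.+) =+$')   # "=== ... ===" lines
_ITEM = re.compile(r'(\d+) (\w+)')      # "2 passed", "1 failed", ...

def parse_pytest_summary(output):
    """Return {status: count} from the last real summary line, or None."""
    for line in reversed(output.splitlines()):
        m = _FRAMED.match(line.strip())
        if m is None:
            continue  # trailing program output, not a pytest banner
        text = m.group(1)
        if 'no tests ran' in text:
            return {}
        counts = {word: int(n) for n, word in _ITEM.findall(text)
                  if word in _STATUSES}
        if counts:
            return counts  # banners like "FAILURES" carry no counts
    return None
```

Scanning from the end and requiring a status word is what makes the match robust against both later output and other `===` banners.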
From Kirill's suggestion on !17 (comment 162070): https://lab.nexedi.com/nexedi/nxdtest/merge_requests/17#note_162070
/reviewed-by @kirr
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/17

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/d9f8a34588ff40453a1179be2ea923543099435f
PyTest.summaryf: always report count of failures and errors
2022-06-20T19:33:23+09:00 Jérome Perrin <jerome@nexedi.com>
summaryf needs to always return the count of failures, errors and skips,
for two reasons:
- nxdtest does error_count += 1 when the test program's returncode is
  not zero, so if we don't set error_count explicitly, the number of
  errors will be 1 when there are only failures;
- without error_count / failure_count, _test_result_summary does not
  display the global test result as pass or fail, but as "?".
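To make the first point concrete, here is a toy model (hypothetical names, not nxdtest's code) of a driver that defaults `error_count` from the exit status and then overlays whatever the summary function returns. Only a summary that always sets both counts avoids the spurious error and the "?" verdict.

```python
def driver_status(returncode, summary):
    """Toy model: default derived from the exit code, overridden by summary."""
    status = {'error_count': 1 if returncode != 0 else 0}
    status.update(summary)
    return status

def verdict(status):
    """'pass'/'fail' only when both counts are known, '?' otherwise."""
    if 'error_count' not in status or 'failure_count' not in status:
        return '?'
    ok = status['error_count'] == 0 and status['failure_count'] == 0
    return 'pass' if ok else 'fail'
```

With failures only and no explicit `error_count`, the model reports one error that never happened; setting `error_count: 0` explicitly fixes that.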
/reviewed-by @kirr
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/17

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/d30b73f05fa53e392086fffa6174e4138659f0a7
test: fix flaky test_cancel_from_master
2022-06-17T23:19:09+09:00 Jérome Perrin <jerome@nexedi.com>
We did not wait long enough for the tested process to set up its signal
handler. As the test history [1] shows, this test often fails when
running on a test node.
It always passes when I run it locally, but if I set the patched
_tmasterpoll value to 0.01 second, it fails the same way.
[1] https://erp5.nexedi.net/test_result_module/20220616-13A460D9B/2/TestResultLine_viewResultHistory
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/17

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/56e52da6cc9b1da30aaaf159af3a7f952e00a4bf
trun: Fix returncode when test run is canceled
2022-01-27T12:49:17+03:00 Kirill Smelkov <kirr@nexedi.com>
To detect leaked processes at the end of the run, we first wait for the
remaining test processes via procps. If the main waiting loop was
canceled before the main test process completed (p.poll calls never
returned non-None), then the waitpid(pid=p.pid) system call is done
via procps. This makes the subsequent waitpid(pid=p.pid) system call
invoked by subprocess get an -ECHILD error and artificially report a 0
exit status:
https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Lib/subprocess.py#L1094-L1107
-> Fix it by propagating .returncode from procps to the Popen instance so
that it does not get lost.
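The failure mode can be reproduced with the standard library alone (Python 3, POSIX). In this sketch a direct `os.waitpid` call stands in for the reaping done via procps; `Popen.wait()` then hits ECHILD and reports 0, unless the real status is copied into `.returncode` first.

```python
import os
import signal
import subprocess

def exit_code(status):
    """Convert a raw wait status into a Popen-style returncode."""
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)

def reap_outside_popen(propagate):
    p = subprocess.Popen(['sleep', '10'])
    p.send_signal(signal.SIGTERM)
    # Reap the child behind Popen's back (stand-in for waiting via procps).
    _, status = os.waitpid(p.pid, 0)
    real = exit_code(status)
    if propagate:
        p.returncode = real  # the fix: hand the real status over to Popen
    # Without the fix, Popen's own waitpid() fails with ECHILD and 0 is assumed.
    return real, p.wait()
```

`reap_outside_popen(False)` gives `(-15, 0)`: the kill is lost. With `propagate=True` both values are `-15`.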
On sample .nxdtest with
TestCase('sleep', ['sleep', '10'])
Before the patch the output with CTRL+C was:
$ nxdtest
...
>>> sleep
$ sleep 10
^C# Interrupt
ok sleep 0.604s # 1t 0e 0f 0s <-- NOTE
# test run canceled
# ran 1 test case: 1·ok <-- NOTE
After the patch the output becomes:
$ nxdtest
...
>>> sleep
$ sleep 100
^C# Interrupt
error sleep 1.006s # 1t 1e 0f 0s <-- NOTE
# test run canceled
# ran 1 test case: 1·error <-- NOTE
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/16

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/cf3001848685d1458bb2a97f07a74bddbfc7016b
Move detection of leaked processes to trun
2022-01-27T12:49:10+03:00 Kirill Smelkov <kirr@nexedi.com>
When the test run is canceled via a signal, we want to send SIGTERM to
trun and let it kill/wait the tested processes. However, until now we
were sending SIGTERM to trun and immediately checking whether some
spawned tested processes remained, without giving trun any time to
complete first.
Fix this by moving the code that detects/cleans up leaked processes to
trun itself, so that the main nxdtest driver only sends SIGTERM to trun
and lets trun do the cleanup and leaked-process detection.
This fixes the erroneous "leaked process detected" noticed by @jerome in
the following example
(!16, comment 150835: https://lab.nexedi.com/nexedi/nxdtest/merge_requests/16#note_150835):
$ nxdtest
...
>>> sleep
$ sleep 10
^C# Interrupt
# stopping due to cancel
# leaked pid=188877 'sleep' ['sleep', '10'] <-- "leaked" is a bit misleading here
error sleep 1.030s # 1t 1e 0f 0s
# test run canceled
# ran 1 test case: 1·error
After the patch the output in this case becomes:
$ nxdtest
...
>>> sleep
$ sleep 10
^C# Interrupt
ok sleep 0.604s # 1t 0e 0f 0s
# test run canceled
# ran 1 test case: 1·ok
There is no more "leaked pid=...." emitted, but the change from "error" to "ok" status is wrong.
We will fix it in the next patch.
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/16

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/548a5325b4722b4c4349de1501ff76d9de14bda9
Fix thinko when finally killing processes
2022-01-27T12:49:04+03:00 Kirill Smelkov <kirr@nexedi.com>
First we send SIGTERM to leaked processes and then, after a timeout,
intend to send SIGKILL to the leaked processes and to the main test
process if it is still alive.
However, there is a thinko in the code: we were sending SIGKILL only to
the main test process, not to all the leaked ones.
-> Fix it.
The bug was there since 0ad45a9c (Detect if a test leaks processes and
terminate them).
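A stdlib sketch of the intended sequence for a list of `subprocess.Popen` objects (the function name and the 0.05 s poll step are assumptions, not nxdtest's code): SIGTERM to everybody, a grace period, then SIGKILL to every survivor rather than only to the main process.

```python
import signal
import subprocess
import time

def shutdown(procs, timeout=1.0):
    """Terminate all procs; SIGKILL every one still alive after timeout."""
    # 1. Politely ask every process to stop.
    for p in procs:
        if p.poll() is None:
            p.send_signal(signal.SIGTERM)
    # 2. Give them up to `timeout` seconds to exit.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(0.05)
    # 3. SIGKILL *all* survivors, not only the main test process.
    killed = []
    for p in procs:
        if p.poll() is None:
            p.kill()
            killed.append(p)
    for p in procs:
        p.wait()  # reap everything
    return killed
```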
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/16

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/b0cf277d68d40df9d9bf2a75b7e49a6741980948
Cancel test run on SIGINT/SIGTERM
2022-01-27T12:48:36+03:00 Kirill Smelkov <kirr@nexedi.com>
In addition to canceling the test run if master tells us to do so, also
cancel the run if interrupted or terminated.
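nxdtest builds on pygolang's context and signal support for this. As a rough stdlib-only analogue (an assumption, not the actual implementation), SIGINT and SIGTERM handlers can simply set a shared `threading.Event` that the rest of the run treats as its cancellation flag. `cancel_on_signals` and `restore` are hypothetical names, and must be called from the main thread.

```python
import signal

def cancel_on_signals(cancel, signums=(signal.SIGINT, signal.SIGTERM)):
    """Make the given signals set `cancel`; return previous handlers."""
    def handler(signo, frame):
        cancel.set()  # no KeyboardInterrupt: the run winds down on its own
    return {s: signal.signal(s, handler) for s in signums}

def restore(previous):
    """Reinstall the handlers returned by cancel_on_signals()."""
    for s, h in previous.items():
        signal.signal(s, h)
```

Turning the signal into a flag, instead of an exception raised at an arbitrary point, is what lets the waiting code stop the spawned test cleanly.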
With the following sample .nxdtest
TestCase('sleep', ['sleep', '10'])
before the patch it does not react to CTRL+C:
$ nxdtest
...
>>> sleep
$ sleep 10
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C# ran 0 test cases. <-- no reaction to CTRL+C, finishes after 10 seconds
Traceback (most recent call last):
File "/home/kirr/src/tools/go/py3.venv2/bin/nxdtest", line 33, in <module>
sys.exit(load_entry_point('nxdtest', 'console_scripts', 'nxdtest')())
File "/home/kirr/src/tools/go/py3.venv2/lib/python3.9/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/kirr/src/tools/go/pygolang-master/golang/__init__.py", line 103, in _
return f(*argv, **kw)
File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 339, in main
wg.wait()
KeyboardInterrupt
after the patch:
$ nxdtest
...
>>> sleep
$ sleep 10
^C# Interrupt <-- prompt reaction to CTRL+C
# stopping due to cancel
# leaked pid=188877 'sleep' ['sleep', '10']
error sleep 1.030s # 1t 1e 0f 0s
# test run canceled
# ran 1 test case: 1·error
Needs pygolang!17 (https://lab.nexedi.com/nexedi/pygolang/merge_requests/17) to work.
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/16

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/6f75fa906b193f272e5f468c8decb2ddefdd865d
fixup! trun: Spawn user test with sole regular uid/gid in /etc/{passwd,group}...
2021-12-23T23:59:20+03:00 Kirill Smelkov <kirr@nexedi.com>
Even though slapos.core tests seem to mock getgrnam calls [1], the disk
group is looked up in /etc/group for real, which fails if there is no
such group, e.g.:
ERROR: test_not_existing (slapos.tests.test_slapgrid.TestSlapgridWithDevPermManagerDevPermEmptyLsblk)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/srv/slapgrid/slappart3/t/bvi/i/0/parts/slapos.core/slapos/tests/test_slapgrid.py", line 3246, in setUp
self.setUpExpected()
File "/srv/slapgrid/slappart3/t/bvi/i/0/parts/slapos.core/slapos/tests/test_slapgrid.py", line 3230, in setUpExpected
gid = grp.getgrnam("disk").gr_gid
KeyError: 'getgrnam(): name not found: disk'
-> Fix it up by also creating the "disk" group in our namespace environment.
I'm not sure, but maybe the better long-term fix would be for
slapos.core tests not to access /etc/group for real and to instead mock
access to this database completely.
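The group database written for the namespace then needs a `disk` entry besides the user's own group. A sketch of generating such content in the standard `name:password:GID:members` format; the function name and the GID numbers are hypothetical (6 is merely what Debian-like systems commonly use for `disk`).

```python
def group_db(user_group, user_gid, extra=(('disk', 6),)):
    """Render /etc/group lines: the user's own group plus extra required groups."""
    entries = [(user_group, user_gid)] + list(extra)
    # Format: name:password:GID:comma-separated-members (members left empty).
    return ''.join('%s:x:%d:\n' % (name, gid) for name, gid in entries)
```

For example, `group_db('slapuser', 1000)` yields the two lines `slapuser:x:1000:` and `disk:x:6:`.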
Amends commits e6b7993c and b42ccfa5.
/reported-by @tomo
/reported-at https://lab.nexedi.com/nexedi/slapos/merge_requests/1107#note_148758
[1] https://lab.nexedi.com/nexedi/slapos.core/blob/1.7.1-28-g0b6bf2af4/slapos/tests/test_slapformat.py#L160-166

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/81b1907a28c9205234cea039ff4efae3ce8806c1
Propagate cancellation to spawned test jobs
2021-12-20T12:26:57+03:00 Kirill Smelkov <kirr@nexedi.com>
So that if a test run is canceled in the ERP5 UI, nxdtest stops its run
soon, instead of after several hours in the case of
SlapOS.SoftwareReleases.IntegrationTest-* tests.
See the main patch (938b5455) and the other patches for details.
/helped-and-reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/14
* y/cancel:
Add test for cancel propagation
tests: Run nxdtest.main for each test in a separate thread, so that pytest.timeout generally works
Propagate cancellation to spawned test jobs
Raise ctx.err() if test run was cancelled
Stop spawned process softly on ctx cancel

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/5d656ccf256d78f4ead9fccc24e74cf402437334
Add test for cancel propagation
2021-12-20T12:21:45+03:00 Jérome Perrin <jerome@nexedi.com>
/reviewed-by @kirr
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/14

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/8f59b689253b2809d6d19a8be84e0b0a08ca2e32
tests: Run nxdtest.main for each test in a separate thread, so that pytest.ti...
2021-12-20T12:21:16+03:00 Kirill Smelkov <kirr@nexedi.com>
Factor out the code that runs the nxdtest driver in a separate thread,
added to test_run_procleak in 0ad45a9c (Detect if a test leaks processes
and terminate them), into run_nxdtest, so that all tests are run this
way and pytest.timeout, if requested, works universally for all tests.
The logic of how we run nxdtest under tests and handle timeouts will
hopefully be reworked further soon, but it is good anyway to keep this
logic in only one place.
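The general pattern, sketched with the standard library under the assumption that the caller only needs a bounded wait (the helper name and the exception choice are hypothetical, and this is not how pytest-timeout itself works): the code under test runs in a daemon worker thread, so the calling thread stays free to give up when time runs out.

```python
import threading

def run_in_thread(func, timeout, *args):
    """Run func(*args) in a daemon thread; raise TimeoutError if it overruns."""
    result = {}

    def worker():
        try:
            result['value'] = func(*args)
        except BaseException as e:  # re-raised in the calling thread below
            result['error'] = e

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise TimeoutError('%r did not finish in %ss' % (func, timeout))
    if 'error' in result:
        raise result['error']
    return result['value']
```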
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/14

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/938b545504706033e0188a0976eb0b45686124a3
Propagate cancellation to spawned test jobs
2021-12-20T12:17:58+03:00 Kirill Smelkov <kirr@nexedi.com>
A user might cancel a test result in the ERP5 UI if e.g. some
misbehaviour is detected and a new revision is ready to be tested. This
works by test_result.start() returning None, indicating that there are no
more test_result_lines to exercise. Master also indicates this
cancellation via test_result.isAlive() returning False, but until now we
were not using that information and were always waiting for completion
of the current test job that is already spawned.
This works well in practice if individual tests are not long, but e.g.
for SlapOS.SoftwareReleases.IntegrationTest-* it is not good, because
there an individual test might take _hours_ to execute.
-> Fix it by first setting up a global context into which we propagate
cancellation from test_result.isAlive, and by using that context as the
base for all other activities. This should terminate the spawned test
process if test_result is canceled.
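A stdlib sketch of the polling side, assuming a callable `is_alive` that wraps the master's `test_result.isAlive()` query and a `threading.Event` standing in for the cancellable context (both assumptions; nxdtest itself uses pygolang contexts). In production `interval` would be the 5 minutes chosen below.

```python
import threading

def watch_master(is_alive, cancel, interval):
    """Poll is_alive() every `interval` seconds; set `cancel` once it is False."""
    def loop():
        # cancel.wait() returns True as soon as the run is canceled for any
        # reason, which also stops this watcher.
        while not cancel.wait(interval):
            if not is_alive():
                cancel.set()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```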
The check interval is set to 5 minutes so as not to overload master.
@jerome says that
    We now have 341 active test nodes, but sometimes we are using
    more, we did in the past to stress test some new machines.
    For the developer, reducing the waiting time from a few hours to 1
    minute or to 5 minutes seems more or less equivalent.
For 350 test nodes, each nxdtest checking its test_result status via an
isAlive query to master every 5 minutes results in ~1 isAlive
request/second to master on average.
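The back-of-envelope estimate above, spelled out:

```python
TEST_NODES = 350
INTERVAL_S = 5 * 60  # each nxdtest polls isAlive every 5 minutes

requests_per_second = TEST_NODES / INTERVAL_S
print('%.2f isAlive requests/s' % requests_per_second)  # prints 1.17 isAlive requests/s
```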
Had to switch from the time module to golang.time to use time.after().
Because of that, time() and sleep() are changed to time.now() and
time.sleep() respectively.
/helped-and-reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/14

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/bdd183fb7b74d01797738975943efbe69e6ef22d
Raise ctx.err() if test run was cancelled
2021-12-20T12:17:33+03:00 Kirill Smelkov <kirr@nexedi.com>
It is a normal rule to return an error from a task if the task has to
abort due to cancellation. We already do this in tee, but not in the
function that is waiting for the spawned process to complete(*).
-> Fix that. Wrap the corresponding wg.wait() into try/except and check
whether it fails due to cancellation, in which case we should not raise
but should instead continue to finish the current test_result_line and
only afterwards stop the test run normally (again without raising, but
with a regular one-line log entry).
(*) added in 0ad45a9c (Detect if a test leaks processes and terminate them)
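A stdlib sketch of that control flow. `Canceled`, `wait_job` and `run_line` are hypothetical names, and a `threading.Event` per job plus one for the run stand in for the wait group and the context.

```python
class Canceled(Exception):
    """Stand-in for the error reported by a canceled context."""

def wait_job(job_done, cancel, poll=0.05):
    """Wait for job_done; raise Canceled if the run is canceled first."""
    while not job_done.wait(poll):
        if cancel.is_set():
            raise Canceled()

def run_line(job_done, cancel, log):
    """Cancellation becomes a one-line log entry, not a traceback."""
    try:
        wait_job(job_done, cancel)
    except Canceled:
        log.append('# test run canceled')
        return 'canceled'
    return 'done'
```

The waiting task still reports the cancellation as an error; it is the caller that decides this particular error is an expected way to finish.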
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/14

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/34cb7879e18db16a1ee4623a7d30ff35461276e5
Stop spawned process softly on ctx cancel
2021-12-20T12:15:10+03:00 Kirill Smelkov <kirr@nexedi.com>
In 0ad45a9c (Detect if a test leaks processes and terminate them) we
organized waiting for spawned processes with handling ctx, sending
SIGKILL to the main spawned process on ctx cancel, even though other
leaked processes are always first sent SIGTERM and, only after the
shutdown timeout, SIGKILL.
This is too brutal. Rework the code to first send SIGTERM to the main
spawned test process too, and to resort to SIGKILL only after the
shutdown timeout.
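For a single `subprocess.Popen`, the soft stop boils down to the sketch below. The function name is hypothetical and the timeout is a parameter here; the actual shutdown timeout value lives in nxdtest.

```python
import subprocess

def stop_softly(p, shutdown_timeout):
    """SIGTERM first; SIGKILL only if the process outlives shutdown_timeout."""
    p.terminate()  # SIGTERM on POSIX
    try:
        return p.wait(shutdown_timeout)
    except subprocess.TimeoutExpired:
        p.kill()   # SIGKILL as a last resort
        return p.wait()
```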
This will be tested in a later patch which exercises how cancel from
master is propagated.
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/14

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/5acd13595947fa4a492352a6b8a441b2892924a0
Run each testcase with its own /tmp and /dev/shm
2021-12-20T12:06:22+03:00 Kirill Smelkov <kirr@nexedi.com>
to detect leaked temporary files and leaked mount entries after each
test run, to isolate different test runs from each other, and to provide
tmpfs on /tmp for every test.
The main change and description are in patch1 (a191468f); the other
patches fix it up step by step to work for real for all our tests.
/helped-by @tomo
/helped-and-reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13
* y/unshare:
trun: Require FUSE to be working inside user-namespaces to activate them
Factor checking whether user-namespaces are available into trun.userns_available()
trun: Add test for how /etc/{passwd,group} is setup for spawned job
trun: Spawn user test with sole regular uid/gid in /etc/{passwd,group} database
trun: Deactivate most capabilities before spawning user test
Run each testcase with its own /tmp and /dev/shm

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/f5f5243446a1cfcc266721afea8595cf408e9de4
trun: Require FUSE to be working inside user-namespaces to activate them
2021-12-20T12:04:28+03:00 Kirill Smelkov <kirr@nexedi.com>
FUSE is needed for wendelin.core, and if we activate user namespaces
without checking that FUSE works inside them, e.g. on a Linux 4.9
kernel, wendelin.core will fail to function. Since wendelin.core is
nowadays included into ERP5 as its base component(*), it is practical to
require FUSE-in-userns support unconditionally.
FUSE in user-namespaces started to work in Linux 4.18(+). Detect whether
it should work inside by checking whether the running kernel is new
enough. I chose this way for simplicity: it avoids unrolling a test
filesystem to try to mount it inside, and does not slow down startup.
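A sketch of such a version check (the function name is hypothetical; nxdtest's own helper may parse the release string differently):

```python
import platform
import re

def fuse_in_userns_supported(release=None):
    """Guess from the kernel release whether FUSE works in user namespaces (>= 4.18)."""
    if release is None:
        release = platform.release()  # e.g. '5.10.0-21-amd64'
    m = re.match(r'(\d+)\.(\d+)', release)
    if m is None:
        return False  # unknown format: be conservative
    return (int(m.group(1)), int(m.group(2))) >= (4, 18)
```

Comparing `(major, minor)` tuples avoids the classic string-comparison trap where `'4.9'` sorts after `'4.18'`.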
(*) see slapos!874 (https://lab.nexedi.com/nexedi/slapos/merge_requests/874)
(+) see https://git.kernel.org/linus/da315f6e0398 and https://git.kernel.org/linus/8cb08329b080.
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13

https://lab.node.vifib.com/nexedi/nxdtest/-/commit/ef41d9601b95fd1db2dd70be0919584a4534dc65
Factor checking whether user-namespaces are available into trun.userns_availa...
2021-12-20T12:03:24+03:00 Kirill Smelkov <kirr@nexedi.com>
In the next patch we'll also add detection of whether FUSE works inside
user namespaces. Before doing that, stop duplicating the related code
between trun.py and nxdtest_test.py.
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/b42ccfa536bb5d2e07ccea8df0cd9eff9ac8595e
trun: Add test for how /etc/{passwd,group} is setup for spawned job
2021-12-20T12:02:44+03:00 Jérome Perrin <jerome@nexedi.com>
Even if we don't want to spawn a full sshd to verify that ptys work, we can
still minimally test that the user/group database inside is set up as expected.
/reviewed-by @kirr
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/e6b7993c12c79feab7097b14e8d4fdc3896b8c7f
trun: Spawn user test with sole regular uid/gid in /etc/{passwd,group} database
2021-12-20T12:00:15+03:00 Kirill Smelkov <kirr@nexedi.com>
Even though libc.openpty stopped insisting on chown(group=tty) for
/dev/pts/*, openssh still wants to do it and fails, which prevents sshd from
working. Fix it by spawning the test workload with only the current user and
group present in the password database.
We don't have IDs for users/groups other than the current uid/gid mapped
from the current namespace anyway. For existing files owned by such IDs the
kernel shows them as "nobody/nogroup", and it rejects with EINVAL any chown
to those original IDs as obtained from the parent namespace's
/etc/{passwd,group}. For the same reason we don't try to mount our own
/dev/pts instance: only the current uid/gid is mapped to the parent
namespace, and gid=5 maps to nogroup in the parent. With the existing
/dev/pts mount, entries are only listed as having group nogroup, while from
outside they _are_ owned by the parent's tty group. If we mounted /dev/pts
anew, the parent would not see our /dev/pts/* at all, which would move us a
bit further from the desired behaviour.
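To illustrate the mapping rule above, here is a minimal Python sketch that, given the text of a uid_map or gid_map file (one "inside outside count" triple per line, as documented in user_namespaces(7)), computes under which ID a parent-namespace ID shows up inside. The helper names are hypothetical; 65534 is the kernel's default overflow ID, which is what usually appears as nobody/nogroup.

```python
# Default of /proc/sys/kernel/overflow{u,g}id, usually shown as nobody/nogroup.
OVERFLOW_ID = 65534

def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(x) for x in line.split())
            entries.append((inside, outside, count))
    return entries

def id_seen_inside(id_map_text, parent_id):
    """Return the ID under which parent_id appears inside the namespace,
    or OVERFLOW_ID if parent_id is not covered by any mapping range."""
    for inside, outside, count in parse_id_map(id_map_text):
        if outside <= parent_id < outside + count:
            return inside + (parent_id - outside)
    return OVERFLOW_ID
```

For example, if the gid map contains the single line `1000 1000 1` (only the current group mapped), the parent's tty group (gid 5) is not covered and shows up as 65534.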
We still keep root and nobody/nogroup, as @jerome reports that without those
users Go tests fail on Debian 9:
https://lab.nexedi.com/nexedi/slapos/merge_requests/1095#note_147177
https://lab.nexedi.com/nexedi/slapos/merge_requests/1095#note_147201
See added comment about all this for more details.
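A reduced user/group database of this shape (root, nobody/nogroup, plus the current user and group) could be generated with a sketch like the one below. The function name and the home/shell defaults are assumptions for illustration, and how trun actually puts such files in place inside the namespace is not shown.

```python
def minimal_userdb(user, uid, group, gid, home="/", shell="/bin/sh"):
    """Return (passwd_text, group_text) listing only root, nobody/nogroup,
    and the given user and group."""
    passwd = [
        "root:x:0:0:root:/root:/bin/sh",
        "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin",
    ]
    groups = [
        "root:x:0:",
        "nogroup:x:65534:",
    ]
    # Avoid duplicate entries if we already run as root or nobody.
    if uid not in (0, 65534):
        passwd.append("%s:x:%d:%d::%s:%s" % (user, uid, gid, home, shell))
    if gid not in (0, 65534):
        groups.append("%s:x:%d:" % (group, gid))
    # Deliberately no other groups (such as tty): they are not mapped
    # inside the namespace anyway.
    return "\n".join(passwd) + "\n", "\n".join(groups) + "\n"
```

In real use the arguments would come from the current process, e.g. via `pwd.getpwuid(os.getuid())` and `grp.getgrgid(os.getgid())`.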
This patch makes sshd work under trun.py(*).
There is no test because libc.openpty works both with and without this patch,
and verifying this would require spawning a whole sshd under test.
(*) below is the diff between sshd output when 1) running successfully as a
regular user, and 2) previously failing under trun.py without this patch:
diff --git a/regular-nopam.txt b/trun-nopam.txt
index 378ccb6..5b96c08 100644
--- a/regular-nopam.txt
+++ b/trun-nopam.txt
@@ -1,4 +1,4 @@
-(neo) (z-dev) (g.env) kirr@deca:~/tmp/trashme/sshd$ /sbin/sshd -d -p 2222 -h `pwd`/ssh_host_rsa_key -o UsePAM=no
+kirr@deca:~/tmp/trashme/sshd$ /sbin/sshd -d -p 2222 -h `pwd`/ssh_host_rsa_key -o UsePAM=no
debug1: sshd version OpenSSH_8.4, OpenSSL 1.1.1k 25 Mar 2021
debug1: private host key #0: ssh-rsa SHA256:y+ujVDqqFBXTclDM2NLy4GME7wReutLcUYOWAeriXdc
debug1: setgroups() failed: Operation not permitted
@@ -91,35 +91,13 @@ debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug1: session_new: session 0
debug1: SELinux support disabled
-Attempt to write login records by non-root user (aborting)
-debug1: session_pty_req: session 0 alloc /dev/pts/2
-debug1: server_input_channel_req: channel 0 request env reply 0
-debug1: session_by_channel: session 0 channel 0
-debug1: session_input_channel_req: session 0 req env
-debug1: server_input_channel_req: channel 0 request shell reply 1
-debug1: session_by_channel: session 0 channel 0
-debug1: session_input_channel_req: session 0 req shell
-Starting session: shell on pts/2 for kirr from 127.0.0.1 port 44106 id 0
-debug1: Setting controlling tty using TIOCSCTTY.
-
-debug1: Received SIGCHLD.
-debug1: session_by_pid: pid 693948
-debug1: session_exit_message: session 0 channel 0 pid 693948
-debug1: session_exit_message: release channel 0
-debug1: session_by_tty: session 0 tty /dev/pts/2
-debug1: session_pty_cleanup2: session 0 release /dev/pts/2
-Attempt to write login records by non-root user (aborting)
-debug1: session_by_channel: session 0 channel 0
-debug1: session_close_by_channel: channel 0 child 0
-Close session: user kirr from 127.0.0.1 port 44106 id 0
-debug1: channel 0: free: server-session, nchannels 1
-Received disconnect from 127.0.0.1 port 44106:11: disconnected by user
-Disconnected from user kirr 127.0.0.1 port 44106
+chown(/dev/pts/2, 1000, 5) failed: Invalid argument
debug1: do_cleanup
debug1: temporarily_use_uid: 1000/1000 (e=1000/1000)
debug1: restore_uid: (unprivileged)
+debug1: session_pty_cleanup2: session 0 release /dev/pts/2
+Attempt to write login records by non-root user (aborting)
+debug1: audit_event: unhandled event 12
debug1: do_cleanup
debug1: temporarily_use_uid: 1000/1000 (e=1000/1000)
debug1: restore_uid: (unprivileged)
-debug1: audit_event: unhandled event 12
(see https://lab.nexedi.com/nexedi/slapos/merge_requests/1095#note_147018)
/helped-and-reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/a9af3a8a6a95a2ed7b7702ce67a839997cb9f511
trun: Deactivate most capabilities before spawning user test
2021-12-20T11:59:03+03:00 Kirill Smelkov <kirr@nexedi.com>
In the previous patch we asked unshare to keep capabilities so that FUSE
mounting works as a regular user. However, the full set of capabilities
is too much: in particular, if cap_dac_override is present(*), writes to
files that have read-only permissions are not rejected by the kernel.
-> Adjust trun to retain only the capabilities that we actually need:
CAP_SYS_ADMIN to mount things.
This should fix the following Go build failure:
--- FAIL: TestReadOnlyWriteFile (0.00s)
ioutil_test.go:90: Expected an error when writing to read-only file /tmp/TestReadOnlyWriteFile3940340549/blurp.txt
FAIL
FAIL io/ioutil 0.053s
P.S. Even if we unshared to root instead (unshare -Umr), it would still be
a good idea to drop extra capabilities, as we still want writes to read-only
files to be rejected.
(*) see https://man7.org/linux/man-pages/man7/capabilities.7.html
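One way to check the result from inside a spawned test is to decode the capability bitmasks that Linux exposes in /proc/self/status. The Python sketch below is an illustration, not part of trun; the constants are the standard capability numbers from capabilities(7).

```python
CAP_DAC_OVERRIDE = 1
CAP_SYS_ADMIN = 21

def caps_from_status(status_text, field="CapEff"):
    """Return the set of capability numbers in a /proc/<pid>/status field
    (CapInh, CapPrm, CapEff, CapBnd or CapAmb), each a hex bitmask."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name == field:
            mask = int(value.strip(), 16)
            return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}
    raise KeyError(field)
```

Inside the test namespace one would expect, under the setup described above, `caps_from_status(open("/proc/self/status").read())` to contain CAP_SYS_ADMIN but not CAP_DAC_OVERRIDE.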
/helped-and-reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/a191468f098358461dec880aebf5b0f14f24bbd5
Run each testcase with its own /tmp and /dev/shm
2021-12-20T11:57:42+03:00 Kirill Smelkov <kirr@nexedi.com>
and detect leaked temporary files and mount entries after each test run.
Background
Currently we have several testing-related problems that are
all connected to /tmp and similar directories:
Problem 1: many tests create temporary files on each run. Usually tests
are careful to remove them on teardown, but due to bugs, the many kinds
of tests, test processes being hard-killed (SIGKILL or SIGSEGV) and
other reasons, in practice this cleanup is not 100% reliable, and there
is a steady growth of files leaked in /tmp on testnodes.
Problem 2: because /tmp and /dev/shm are shared, the isolation between
different test runs of potentially different users is weak. For example,
@jerome reports that, due to leaked faketime shared segments, separate
test runs affect each other and fail:
https://erp5.nexedi.net/bug_module/20211125-1C8FE17
Problem 3: many tests depend on /tmp being a tmpfs instance. These are,
for example, wendelin.core tests, which write intensively to a database
and, if /tmp resides on disk, time out due to disk IO stalls in fsync on
every commit. The stalls can exceed 30s and lead to an overall ~2.5x
slowdown of test runs. However, the main problem is the latency spikes,
which, with close to 100% probability, make some test miss its deadline.
This topic is covered in
https://erp5.com/group_section/forum/Using-tmpfs-for--tmp-on-testnodes-JTocCtJjOd
--------
There are many ways to try to address each problem separately, but they
all come with limitations and drawbacks. We discussed things with @tomo
and @jerome, and it looks like all those problems can be addressed
in one go if we run tests under user namespaces with private mounts for
/tmp and /dev/shm.
Even though namespaces are generally a no-go at Nexedi, they seem to be OK
to use in tests. For example, they are already used via the private_tmpfs
option in SlapOS:
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/slapos/recipe/librecipe/execute.py#L87-103
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo-input-schema.json#L121-124
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L11-16
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L30-34
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/software/neoppod/instance-neo.cfg.in#L170-177
...
https://lab.nexedi.com/nexedi/slapos/blob/1876c150/stack/erp5/instance-zope.cfg.in#L227-230
Thomas says that using a private tmpfs for each test would be a better
solution than implementing tmpfs for the whole /tmp on testnodes. He also
reports that @jp is OK with using namespaces for tests as long as there is a
fallback if namespaces aren't available.
-> So let's do that: teach nxdtest to run each test case in its own
private environment with privately-mounted /tmp and /dev/shm if we can
detect that user namespaces are available. In an environment where user
namespaces are indeed available this addresses all 3 problems, because
isolation and being-tmpfs are there by design, and even if some files
leak, the kernel frees everything when the test terminates and the
filesystem is automatically unmounted. We also detect such leakage and
report a warning, so that such problems do not go completely unnoticed.
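The leakage check could be sketched as comparing the mount table before and after a test, and listing what remains in the private temporary directory. This is a minimal Python sketch with hypothetical helper names, and nxdtest's actual check may differ; it relies on the documented /proc/<pid>/mountinfo layout, where the 5th field is the mount point.

```python
import os

def mount_points(mountinfo_text):
    """Return the set of mount points listed in /proc/<pid>/mountinfo text.
    The 5th whitespace-separated field is the mount point (the kernel
    escapes spaces in paths in octal, so splitting on whitespace is safe)."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.add(fields[4])
    return points

def leaked_mounts(before, after):
    """Mount points present after the test run but not before it."""
    return sorted(mount_points(after) - mount_points(before))

def leaked_files(tmpdir):
    """Entries left behind in a (private) temporary directory."""
    return sorted(os.listdir(tmpdir))
```

A non-empty result from either helper would then be reported as a warning rather than as a test failure, matching the intent described above.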
Implementation
We leverage unshare(1) for simplicity. I decided to preserve uid/gid
instead of becoming uid=0 (= `unshare -Umr`) for better traceability, so
that it is clear from test output under which real slapuser a test is
run(*). Not changing uid requires activating ambient capabilities so
that mounting filesystems, including the FUSE-based ones needed by
wendelin.core, keeps working under a regular non-zero uid. Please see
https://git.kernel.org/linus/58319057b784 for details on this topic, and
refer to the added trun.py for details on how the per-test namespace is set up.
Using FUSE inside user namespaces requires Linux >= 4.18 (see
https://git.kernel.org/linus/da315f6e0398 and
https://git.kernel.org/linus/8cb08329b080), so to really use this patch
we'll have to upgrade the kernel on our testnodes, at least where
wendelin.core is used in tests.
The "no namespaces" case is detected by first running `unshare ... true`
with the same unshare options that are going to be used to create and
enter the new user namespace for real. If that fails, we fall back to
"no namespaces" mode, where no private /tmp and /dev/shm are mounted(%).
(*) for example nxdtest logs information about the system on startup:
date: Mon, 29 Nov 2021 17:27:04 MSK
xnode: slapuserX@test.node
...
(%) Here is how nxdtest runs in fallback mode on my Debian 11 with
user namespaces disabled via `sysctl kernel.unprivileged_userns_clone=0`:
(neo) (z-dev) (g.env) kirr@deca:~/src/wendelin/nxdtest$ nxdtest
date: Thu, 02 Dec 2021 14:04:30 MSK
xnode: kirr@deca.navytux.spb.ru
uname: Linux deca 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>>> pytest
$ python -m pytest
# user namespaces not available. isolation and many checks will be deactivated. <--- NOTE
===================== test session starts ======================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.10.0, pluggy-0.13.1
rootdir: /home/kirr/src/wendelin/nxdtest
plugins: timeout-1.4.2
collected 23 items
nxdtest/nxdtest_pylint_test.py .... [ 17%]
nxdtest/nxdtest_pytest_test.py ... [ 30%]
nxdtest/nxdtest_test.py ......xx [ 65%]
nxdtest/nxdtest_unittest_test.py ........ [100%]
============= 21 passed, 2 xfailed in 2.67 seconds =============
ok pytest 3.062s # 23t 0e 0f 0s
# ran 1 test case: 1·ok
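The "no namespaces" probe described under Implementation can be sketched in Python as below. This is a minimal sketch, assuming the probe simply reuses the real unshare command prefix; the function name and the example flags are illustrative, not taken from trun.py.

```python
import os
import subprocess

def unshare_works(unshare_argv):
    """Return True if `unshare_argv + ["true"]` exits successfully.

    unshare_argv is the same command prefix that would later be used to
    run the test for real, e.g. ["unshare", "--user", "--mount"].
    """
    with open(os.devnull, "wb") as devnull:
        try:
            rc = subprocess.call(unshare_argv + ["true"],
                                 stdout=devnull, stderr=devnull)
        except OSError:  # e.g. unshare is not installed
            return False
    return rc == 0
```

Using the same argv prefix for the probe and for the real run is what makes the probe meaningful: any option that the local kernel or util-linux does not support fails here first, and the run can then fall back to "no namespaces" mode.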
/helped-by @tomo
/helped-and-reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/13
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/4fe9ee16dda5d2f67b52532a5ce9b3175c7bcb4d
Log that master is connected and for which test_result this run is
2021-12-09T06:41:59+03:00 Kirill Smelkov <kirr@nexedi.com>
Also log whether the master told us that we have nothing to do, and whether the run mode is local.
This should make it a bit clearer what is going on just by looking at the
nxdtest log. See the previous patch for more details and context.
For reference, here is how the updated output looks in the normal case:
date: Thu, 09 Dec 2021 04:20:37 CET
xnode: slapuser7@rapidspace-testnode-001
uname: Linux rapidspace-testnode-001 4.9.0-16-amd64 #1 SMP Debian 4.9.272-1 (2021-06-21) x86_64
cpu: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
# running for test_result_module/20211209-170FD3998
>>> pytest
$ python -m pytest
============================= test session starts ==============================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.9.0, pluggy-0.13.1
rootdir: /srv/slapgrid/slappart7/t/dfp/soft/47cc86af27d234f0464630f2a0d22a6f/parts/zodbtools-dev
collected 46 items
zodbtools/test/test_analyze.py . [ 2%]
zodbtools/test/test_commit.py .. [ 6%]
zodbtools/test/test_dump.py ... [ 13%]
zodbtools/test/test_restore.py .. [ 17%]
zodbtools/test/test_tidrange.py ............................. [ 80%]
zodbtools/test/test_zodb.py ......... [100%]
========================== 46 passed in 9.15 seconds ===========================
ok pytest 12.433s # 46t 0e 0f 0s
# ran 1 test case: 1·ok
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/15
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/f8ec57870ebd9e3c62034ad609fb3f169add44d3
Always log system info and run summary, even if master tells us to do nothing
2021-12-09T06:41:06+03:00 Kirill Smelkov <kirr@nexedi.com>
Nxdtest logs system information (bd91f6f1 "Include system information
into log output") and the run summary at the end (9f413221 "Emit run summary
at the end"). However, all that information is currently printed only if
the master is successfully connected and actually tells us to run the tests.
This behaviour is not very useful, because if the nxdtest log output on a
testnode is empty, it is not clear whether it was "we have nothing to
do", nxdtest got stuck somewhere, or something else.
For example,
https://nexedijs.erp5.net/#/test_result_module/20211208-47D165B7/12 has
been marked as Running for many hours already, and the testnode log
regarding the nxdtest run is just:
2021-12-08 15:42:17,314 INFO $ PATH=/srv/slapgrid/slappart13/srv/slapos/soft/2956f419073cb2249ed953507fa6b173/bin:/opt/slapos/parts/bison/bin:/opt/slapos/parts/bzip2/bin:/opt/slapos/parts/gettext/bin:/opt/slapos/parts/glib/bin:/opt/slapos/parts/libxml2/bin:/opt/slapos/parts/libxslt/bin:/opt/slapos/parts/m4/bin:/opt/slapos/parts/ncurses/bin:/opt/slapos/parts/openssl/bin:/opt/slapos/parts/pkgconfig/bin:/opt/slapos/parts/python2.7/bin:/opt/slapos/parts/readline/bin:/opt/slapos/parts/sqlite3/bin:/opt/slapos/parts/swig/bin:/opt/slapos/bin:/opt/slapos/parts/patch/bin:/opt/slapos/parts/socat/bin:/usr/bin:/usr/sbin:/sbin:/bin SLAPOS_TEST_LOG_DIRECTORY=/srv/slapgrid/slappart13/var/log/testnode/dgd-xStX9safSG SLAPOS_TEST_SHARED_PART_LIST=/srv/slapgrid/slappart13/srv/shared:/srv/slapgrid/slappart13/t/dgd/shared /bin/sh /srv/slapgrid/slappart13/t/dgd/i/0/bin/runTestSuite --master_url $DISTRIBUTOR_URL --revision slapos=13977-ec686a708633f689382426063c21efbe3b2eab04,slapos.core=8698-91edab77ed36c160da8017cfdc1673fe7a8e10de --test_node_title rapidspace-testnode-008-3Nodes-DEPLOYTASK0 --test_suite SLAPOS-SR-TEST --test_suite_title SlapOS.SoftwareReleases.IntegrationTest-kirr.Python2 --project_title 'Rapid.Space Project'
without anything else.
With this patch nxdtest prints system information and reports how many
tests it has run, provided its invocation did not get stuck.
In this patch we only move the code that calls system_info and defers the
summary log to before the code that connects to master. In the following
patch we'll add more logging around connecting to master.
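The reordering can be illustrated with a small Python sketch: system information is logged first, and the summary is emitted from a `finally` clause, so it appears even when the master says there is nothing to do or when connecting fails. All names and the "nothing to do" message are hypothetical, and the summary wording is simplified; nxdtest's own code may structure this differently.

```python
def run(system_info, connect_master, run_one, log):
    """Log system info, run the tasks given by the master, always log a summary."""
    log(system_info())
    results = []
    try:
        tasks = connect_master()
        if not tasks:
            log("# master told us to do nothing")
            return results
        for task in tasks:
            results.append(run_one(task))
        return results
    finally:
        # Runs on normal return, on "nothing to do", and on exceptions.
        log("# ran %d test cases." % len(results))
```

Because the summary sits in `finally`, an empty log can then only mean that nxdtest got stuck, which is exactly the ambiguity described above.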
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/15
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/9f413221fb9fd7785b23c39616f6c7066660bc0e
Emit run summary at the end
2021-11-11T11:35:49+03:00 Kirill Smelkov <kirr@nexedi.com>
Sometimes there are zero test cases to be executed on a testnode, and the
log output from nxdtest is just
date: Wed, 10 Nov 2021 12:31:50 MSK
xnode: kirr@deca.navytux.spb.ru
uname: Linux deca 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
From such output it is not clear whether the run ended or got
stuck. After this patch it becomes
date: Wed, 10 Nov 2021 12:31:50 MSK
xnode: kirr@deca.navytux.spb.ru
uname: Linux deca 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
# ran 0 test cases.
In general, when there are several test cases to run, it is helpful to
indicate the end of the run and to print a brief summary of the result status
of all test cases that ran. Example output:
wendelin.core$ nxdtest -k test.wcfs
date: Wed, 10 Nov 2021 12:35:34 MSK
xnode: kirr@deca.navytux.spb.ru
uname: Linux deca 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>>> test.wcfs/fs:1
...
ok test.wcfs/fs:1 25.035s # 35t 0e 0f 0s
>>> test.wcfs/fs:2
...
ok test.wcfs/fs:2 21.033s # 35t 0e 0f 0s
>>> test.wcfs/fs:
...
ok test.wcfs/fs: 21.056s # 35t 0e 0f 0s
# ran 3 test cases: 3·ok
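A summary line in the format shown above could be produced by a sketch like the following. It reproduces the "0 test cases", "1 test case" and "N test cases: N·ok" forms from the examples; how several different statuses are joined (space-separated, sorted by name) is an assumption, since the examples only show all-ok runs.

```python
from collections import Counter

def run_summary(statuses):
    """Format a final line such as "# ran 3 test cases: 3·ok".

    statuses is a list of per-test-case status words, e.g. ["ok", "ok"].
    """
    n = len(statuses)
    if n == 0:
        return "# ran 0 test cases."
    noun = "test case" if n == 1 else "test cases"
    counts = Counter(statuses)
    # Assumed layout for mixed results: "<count>·<status>" items joined by spaces.
    parts = ["%d·%s" % (counts[s], s) for s in sorted(counts)]
    return "# ran %d %s: %s" % (n, noun, " ".join(parts))
```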
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/12
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/72e3608893359c79460bb2e634fdaecccf7d33a3
support parsing pylint output
2021-08-13T12:13:27+02:00 Jérome Perrin <jerome@nexedi.com>
This parses pylint output with a simple regexp and counts one failure per
message reported.
This has not been tested yet, but we decided to apply this commit already.
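A simple regexp-based count could look like the following Python sketch. The regular expression, which assumes pylint's default `path:line:column: MSGID: message (symbol)` text format, and the helper name are illustrations, not the exact code of this commit.

```python
import re

# One pylint message per line in the default text format, e.g.
#   foo.py:3:4: W0612: Unused variable 'x' (unused-variable)
_PYLINT_MSG = re.compile(r"^.+:\d+:\d+: [CRWEFI]\d{4}: ", re.MULTILINE)

def pylint_failures(output):
    """Count one failure per reported pylint message (hypothetical helper)."""
    return len(_PYLINT_MSG.findall(output))
```

Module headers (`************* Module foo`) and the final score line do not match the pattern, so only actual messages are counted.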
/acked-by @kirr
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/11
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/7b5add47d60227a9733b17c0ffaebe06f456548e
loadNXDTestFile: use `compile` for better tracebacks on errors
2021-08-13T12:07:08+02:00 Jérome Perrin <jerome@nexedi.com>
By using compile with the actual file path, we get better tracebacks
in case of errors.
before:
Traceback (most recent call last):
File "/srv/slapgrid/slappart3/srv/runner/software/9544feb19475590d240ba2d32743c0a0/bin/nxdtest", line 22, in <module>
sys.exit(nxdtest.main())
File "/srv/slapgrid/slappart3/srv/runner/software/9544feb19475590d240ba2d32743c0a0/parts/nxdtest/nxdtest/__init__.py", line 142, in main
tenv = loadNXDTestFile('.nxdtest')
File "/srv/slapgrid/slappart3/srv/runner/software/9544feb19475590d240ba2d32743c0a0/parts/nxdtest/nxdtest/__init__.py", line 75, in loadNXDTestFile
six.exec_(src, g)
File "<string>", line 77, in <module>
NameError: name 'Pylint' is not defined
after:
Traceback (most recent call last):
File "/srv/slapgrid/slappart3/srv/runner/software/9544feb19475590d240ba2d32743c0a0/bin/nxdtest", line 22, in <module>
sys.exit(nxdtest.main())
File "/srv/slapgrid/slappart3/srv/runner/software/9544feb19475590d240ba2d32743c0a0/parts/nxdtest/nxdtest/__init__.py", line 142, in main
tenv = loadNXDTestFile('.nxdtest')
File "/srv/slapgrid/slappart3/srv/runner/software/9544feb19475590d240ba2d32743c0a0/parts/nxdtest/nxdtest/__init__.py", line 75, in loadNXDTestFile
six.exec_(compile(src, os.path.realpath(path), 'exec'), g)
File "/srv/slapgrid/slappart3/srv/runner/instance/slappart8/var/nxdtest/.nxdtest", line 77, in <module>
summaryf=Pylint.summary,
NameError: name 'Pylint' is not defined
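The effect can be reproduced with a minimal Python 3 sketch using the builtin `compile` and `exec`; the commit itself uses `six.exec_` together with `os.path.realpath(path)`. The helper name and file path below are illustrative.

```python
import sys
import traceback

def exec_file_source(src, path, globals_):
    """Execute src as if it came from path, so tracebacks name the real file."""
    code = compile(src, path, "exec")  # the filename is recorded in the code object
    exec(code, globals_)

# Illustration: the failing frame is attributed to ".nxdtest", not "<string>".
try:
    exec_file_source("x = 1\nsummaryf = Pylint.summary\n", "/tmp/demo/.nxdtest", {})
except NameError:
    frame = traceback.extract_tb(sys.exc_info()[2])[-1]
    print(frame.filename, frame.lineno)  # /tmp/demo/.nxdtest 2
```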
/reviewed-by @kirr
/reviewed-in https://lab.nexedi.com/nexedi/nxdtest/merge_requests/10
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/0ad45a9cafdf7704071f77a41a58c4393e75f192
Detect if a test leaks processes and terminate them
2021-08-12T11:14:52+03:00 Kirill Smelkov <kirr@nexedi.com>
For every TestCase nxdtest spawns a test process with stdout/stderr
redirected to pipes that nxdtest reads. Nxdtest, in turn, tees those
pipes to its own stdout/stderr until the pipes reach EOF. If the test
process spawns other processes, those processes inherit the opened
pipes, and so the pipes won't reach EOF until _all_ spawned test
processes (the main test process plus any other processes it spawns)
exit. Thus, if there is any process that the main test process spawned
but did not terminate upon its own exit, nxdtest gets stuck waiting for
the pipes to reach EOF, which never happens if such a spawned test
subprocess keeps running.
I hit this problem for real on a Wendelin.core 2 test: there the main
test process was segfaulting and so did not instruct the other spawned
processes (ZEO, WCFS, ...) to terminate. As a result, the whole test
got stuck instead of being promptly reported as failed:
runTestSuite: Makefile:175: recipe for target 'test.wcfs' failed
runTestSuite: make: *** [test.wcfs] Segmentation fault
runTestSuite: wcfs: 2021/08/09 17:32:09 zlink [::1]:52052 - [::1]:23386: recvPkt: EOF
runTestSuite: E0809 17:32:09.376800 38082 wcfs.go:2574] zwatch zeo://localhost:23386: zlink [::1]:52052 - [::1]:23386: recvPkt: EOF
runTestSuite: E0809 17:32:09.377431 38082 wcfs.go:2575] zwatcher failed -> switching filesystem to EIO mode (TODO)
<LONG WAIT>
runTestSuite: PROCESS TOO LONG OR DEAD, GOING TO BE TERMINATED
-> Fix it.
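The mechanism can be illustrated with a minimal sketch. It is not nxdtest's actual implementation; it assumes a POSIX system and uses a process group as the way to find leftovers: the test runs in its own session, and once the main test process exits, anything still alive in its process group (and therefore possibly still holding the pipe open) is sent SIGTERM, so the reader reaches EOF. The function name `run_and_reap` is hypothetical.

```python
import os
import signal
import subprocess
import threading


def run_and_reap(argv):
    """Run argv with stdout piped; return (output, leaked).

    Hypothetical sketch: the child starts a new session, so its process
    group id equals its pid, and its descendants stay in that group unless
    they move themselves out of it."""
    p = subprocess.Popen(argv, stdout=subprocess.PIPE, start_new_session=True)
    chunks = []
    # Read concurrently, as a tee would, so the pipe never fills up.
    reader = threading.Thread(target=lambda: chunks.append(p.stdout.read()))
    reader.start()
    p.wait()
    # The main process is gone, but leftover children that inherited stdout
    # keep the pipe open, so read() would not see EOF until they exit.
    try:
        os.killpg(p.pid, signal.SIGTERM)
        leaked = True
    except ProcessLookupError:  # nobody is left in the group
        leaked = False
    reader.join()
    p.stdout.close()
    return b"".join(chunks), leaked
```

For example, `run_and_reap(["sh", "-c", "sleep 30 & echo hi"])` returns `(b"hi\n", True)` right away instead of waiting 30 seconds. A process that calls `setsid()` escapes this net, and one that ignores SIGTERM would still need escalation to SIGKILL.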
/reviewed-by @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/9
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/b5a74214e1ef34fb4d917a072fa9efc6ef42d1cc
use re.search to filter tests in --run
2020-12-01T06:08:46+01:00 Jérome Perrin (jerome@nexedi.com)
re.match only finds matches where the pattern appears at the beginning of
the string, whereas re.search matches if the pattern appears anywhere in
the string. This behavior is consistent with pytest, go test and ERP5's
runUnitTest.
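A minimal illustration of the difference, with hypothetical test names and a hypothetical helper `select_tests` (not nxdtest's code): a pattern that occurs in the middle of a name is only selected with `re.search`, while anchoring to the start remains possible with `^`.

```python
import re


def select_tests(names, pattern):
    """Keep the test names selected by a --run style pattern.

    Uses re.search, so the pattern may match anywhere in the name."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]
```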
For more details, see the discussion in https://lab.nexedi.com/nexedi/nxdtest/merge_requests/6#note_121409
/reviewed-on: https://lab.nexedi.com/nexedi/nxdtest/merge_requests/8
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/1e6a1cc62d8d9e78900e1e2206959f147d7bcc0e
Switch tee from threading.Thread to sync.WorkGroup
2020-11-26T10:36:43+03:00 Kirill Smelkov (kirr@nexedi.com)
The reason is that with threading.Thread, if an exception happens in a
spawned thread, the error is not propagated to the main driver, while with
sync.WorkGroup an exception from any spawned worker is propagated back
to main. For example, with the following injected error
--- a/nxdtest/__init__.py
+++ b/nxdtest/__init__.py
@@ -267,6 +267,7 @@ def main():
 # tee, similar to tee(1) utility, copies data from fin to fout appending them to buf.
 def tee(ctx, fin, fout, buf):
+    1/0
     while 1:
before this patch, nxdtest behaves like this:
(neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ nxdtest
date: Tue, 24 Nov 2020 14:55:08 MSK
xnode: kirr@deco.navytux.spb.ru
uname: Linux deco 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64
cpu: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
>>> pytest
$ python -m pytest
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee
    1/0
ZeroDivisionError: integer division or modulo by zero
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee
    1/0
ZeroDivisionError: integer division or modulo by zero
error pytest 0.583s # 1t 1e 0f 0s
(neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ echo $?
0
Here the error in another thread is only printed, but nxdtest is not aborted.
Above, it reported "error", but, e.g., when testing pygolang/py3 with an
error raised in tee, it even reported success
(https://lab.nexedi.com/nexedi/nxdtest/merge_requests/6#note_121393):
slapuser34@vifibcloud-rapidspace-hosting-007:~/srv/runner/instance/slappart0$ ./bin/runTestSuite
date: Tue, 24 Nov 2020 12:51:23 MSK
xnode: slapuser34@vifibcloud-rapidspace-hosting-007
uname: Linux vifibcloud-rapidspace-hosting-007 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
cpu: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
>>> thread
$ python -m pytest
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/srv/slapgrid/slappart34/srv/runner/software/44fe7dd3f13ecd100894c6368a35c055/parts/nxdtest/nxdtest/__init__.py", line 268, in tee
    fout.write(data)
TypeError: write() argument must be str, not bytes
ok thread 9.145s # 1t 0e 0f 0s
>>> gevent
$ gpython -m pytest
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/srv/slapgrid/slappart34/srv/runner/shared/python3/5497998c60d97cbbf748337ccce21db2/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/srv/slapgrid/slappart34/srv/runner/software/44fe7dd3f13ecd100894c6368a35c055/parts/nxdtest/nxdtest/__init__.py", line 268, in tee
    fout.write(data)
TypeError: write() argument must be str, not bytes
ok gevent 21.980s # 1t 0e 0f 0s
After this patch, nxdtest correctly handles and propagates an error originating
in a spawned thread back to the main driver:
(neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ nxdtest
date: Tue, 24 Nov 2020 14:54:19 MSK
xnode: kirr@deco.navytux.spb.ru
uname: Linux deco 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64
cpu: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
>>> pytest
$ python -m pytest
Traceback (most recent call last):
  File "/home/kirr/src/wendelin/venv/z4-dev/bin/nxdtest", line 11, in <module>
    load_entry_point('nxdtest', 'console_scripts', 'nxdtest')()
  File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 230, in main
    wg.wait()
  File "golang/_sync.pyx", line 237, in golang._sync.PyWorkGroup.wait
    pyerr_reraise(pyerr)
  File "golang/_sync.pyx", line 217, in golang._sync.PyWorkGroup.go.pyrunf
    f(pywg._pyctx, *argv, **kw)
  File "/home/kirr/src/wendelin/nxdtest/nxdtest/__init__.py", line 270, in tee
    1/0
ZeroDivisionError: integer division or modulo by zero
(neo) (z4-dev) (g.env) kirr@deco:~/src/wendelin/nxdtest$ echo $?
1
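The difference in error propagation can be illustrated without pygolang. The class below is a hypothetical, stdlib-only stand-in: it is not pygolang's `sync.WorkGroup` and does not mirror its API exactly. Each worker receives a `threading.Event` in place of a context, the first exception cancels the other workers, and `wait()` re-raises that exception in the caller, which is the behavior nxdtest gets from `sync.WorkGroup`.

```python
import threading


class MiniWorkGroup:
    """Hypothetical illustration of WorkGroup semantics with plain threads:
    the first worker error cancels the others and is re-raised by wait()
    in the calling ("main driver") thread."""

    def __init__(self):
        self.cancel = threading.Event()  # stands in for ctx cancellation
        self._lock = threading.Lock()
        self._err = None
        self._threads = []

    def go(self, f, *argv):
        def run():
            try:
                f(self.cancel, *argv)
            except Exception as e:
                with self._lock:
                    if self._err is None:
                        self._err = e  # remember only the first error
                self.cancel.set()  # ask the remaining workers to stop
        t = threading.Thread(target=run)
        t.start()
        self._threads.append(t)

    def wait(self):
        for t in self._threads:
            t.join()
        if self._err is not None:
            raise self._err
```

With bare `threading.Thread` objects, `join()` would return normally and the traceback would only be printed, as in the "before" output above.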
NOTE: sync.WorkGroup requires every worker to handle context cancellation, so
that whenever there is an error, all other workers are canceled. We add such
cancellation handling to tee, but only lightly: before blocking in read/write
syscalls, we check whether ctx is canceled. However, proper handling would be
to switch the file descriptors into non-blocking mode and, at every IO point,
select on both potential IO events and potential cancellation. This is left
as a TODO for the future.
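A minimal sketch of the "light" cancellation described in the NOTE, assuming raw file descriptors and a `threading.Event` in place of ctx. The names `tee_fd` and `Canceled` are hypothetical and this is not nxdtest's code: cancellation is checked before each potentially blocking `os.read`/`os.write`, but a call that is already blocked is not interrupted.

```python
import os
import threading


class Canceled(Exception):
    pass


def tee_fd(cancel, fin, fout, buf):
    """Copy bytes from file descriptor fin to fout, appending each chunk to buf.

    Cancellation (a threading.Event) is checked only before each blocking
    syscall; a read or write that is already blocked is not interrupted."""
    while True:
        if cancel.is_set():
            raise Canceled()
        data = os.read(fin, 4096)
        if not data:  # EOF
            return
        buf.append(data)
        view = memoryview(data)
        while view:  # os.write may write only part of the data
            if cancel.is_set():
                raise Canceled()
            n = os.write(fout, view)
            view = view[n:]
```

The TODO variant would instead put the descriptors into non-blocking mode and wait, e.g. with `select.select`, on both the data descriptor and a descriptor that becomes readable on cancellation.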
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/7
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/40e2c4abb793aed2e18235079cf6365382a09c8b
Unittest and Python3 support
2020-11-24T12:17:51+01:00 Jérome Perrin (jerome@nexedi.com)
These are the necessary changes to run `SlapOS.Eggs.UnitTest-*` and `SlapOS.SoftwareReleases.IntegrationTest-*` using nxdtest.
See merge request nexedi/nxdtest!6
/reviewed-by @kirr
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/a129b560d947a9916818fb8263f589817fb1c453
Flush output right after printing running test name
2020-11-24T06:30:19+01:00 Jérome Perrin (jerome@nexedi.com)
If the test program writes to stderr (which is unbuffered), its output
appears before nxdtest's own output announcing the program that is about
to be executed, because nxdtest's stdout is buffered (testnode does not set
PYTHONUNBUFFERED, and even though nxdtest sets PYTHONUNBUFFERED in its own
environ, this only applies to subprocesses).
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/bca50060fe522f7ca3c33afb75dc44ae7ed0cccd
Support output from unittest module from python standard library
2020-11-24T06:30:19+01:00 Jérome Perrin (jerome@nexedi.com)
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/53064e712b3521996418e12839442540b724219d
Also pass stderr output to summary method
2020-11-24T06:30:19+01:00 Jérome Perrin (jerome@nexedi.com)
While pytest sends everything to stdout, some other programs write to stderr.
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/beb9d47e6180d8dd697f12fb218caa2eff6b8103
Treat program output as binary for python3 support
2020-11-24T06:28:23+01:00 Jérome Perrin (jerome@nexedi.com)
While treating output as text would not be impossible, treating it
as bytes seems the better choice because:
- we don't have to make assumptions about which encoding the test
  program uses for its output
- `tee` can just read the output stream byte by byte without having to
  worry about multi-byte characters
- the testnode protocol uses xmlrpc.client.Binary, which holds bytes.
Because using bufsize=1 implies reading subprocess output as text, we use
bufsize=0 instead in the subprocess.Popen call, to prevent buffering.
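A sketch of the binary approach using only the standard library. The function names `run_binary` and `unittest_summary` and the result keys are hypothetical; the keys merely mirror the `# 1t 1e 0f 0s` counters nxdtest prints. With `bufsize=0` the pipes are raw binary streams, so a summary function can parse bytes directly without guessing an encoding. The example parses unittest's report, which `unittest.TextTestRunner` writes to stderr by default, which is also why the summary function needs stderr and not only stdout.

```python
import re
import subprocess


def run_binary(argv):
    """Run argv; return (returncode, stdout, stderr) with output as bytes.

    bufsize=0 gives raw binary pipes: no decoding, no Python-side buffering."""
    p = subprocess.Popen(argv, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, bufsize=0)
    out, err = p.communicate()
    return p.returncode, out, err


# Hypothetical mapping from unittest's "FAILED (...)" keys to result keys.
_COUNTS = {b"failures": "failure_count", b"errors": "error_count",
           b"skipped": "skip_count"}


def unittest_summary(stdout, stderr):
    """Parse unittest's final report ("Ran N tests in ..." + OK/FAILED)."""
    m = re.search(br"^Ran (\d+) tests? in ", stderr, re.M)
    if m is None:
        return {}
    stat = {"test_count": int(m.group(1)), "failure_count": 0,
            "error_count": 0, "skip_count": 0}
    m = re.search(br"^(?:OK|FAILED)(?: \((.*)\))?\s*$", stderr[m.end():], re.M)
    if m is not None and m.group(1):
        for item in m.group(1).split(b", "):
            key, _, value = item.partition(b"=")
            if key in _COUNTS:
                stat[_COUNTS[key]] = int(value)
    return stat
```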
To make manipulation of strings and bytes easier, we add a dependency on
pygolang, so that we can use its string utility functions.
Also add a few tests to verify general functionality.
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/d829e9caac428405d1a76976a8df95a31867c698
PyTest: summary: Skip trail text after test summary
2020-11-09T11:18:20+03:00 Kirill Smelkov (kirr@nexedi.com)
For example, wendelin.core 2 uses pytest_unconfigure to unmount the wcfs
servers that it spawned:
https://lab.nexedi.com/kirr/wendelin.core/blob/1be4d730/conftest.py#L27-53
This usually leads to some text being printed after the summary line, as
shown in the added test.
-> Skip that text backwards and actually detect the pytest summary line
instead of returning an empty dict, because an empty dict forces Nexedi ERP5
to treat such a test run as having UNKNOWN status, e.g.
https://erp5.nexedi.net/test_result_module/20201108-C170850/11
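A minimal sketch of the backwards scan, assuming pytest's usual summary line format such as `===== 2 passed, 1 failed in 0.12s =====` (older pytest versions print `seconds` instead of `s`). The function name and result keys are hypothetical, and counting errors into `test_count` is an assumption, not necessarily what nxdtest's own PyTest summary function does.

```python
import re

# Hypothetical patterns for lines like "==== 2 passed, 1 failed in 0.12s ====".
_SUMMARY = re.compile(br"^=+ (.*) in [\d.]+ ?(?:s|seconds)(?: \([^)]*\))? =+$")
_ITEM = re.compile(br"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed)")


def pytest_summary(stdout):
    """Scan pytest output from the end, skipping trailing text (e.g. printed
    by pytest_unconfigure) until the summary line is found."""
    for line in reversed(stdout.splitlines()):
        m = _SUMMARY.match(line.strip())
        if m is None:
            continue  # not the summary line; keep going backwards
        n = {}
        for count, kind in _ITEM.findall(m.group(1)):
            kind = b"error" if kind.startswith(b"error") else kind
            n[kind] = n.get(kind, 0) + int(count)
        return {
            "test_count": sum(n.values()),
            "failure_count": n.get(b"failed", 0),
            "error_count": n.get(b"error", 0),
            "skip_count": n.get(b"skipped", 0),
        }
    return {}  # no summary line found at all
```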
/cc @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/5
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/549cc2f66c1906917c6c42b7d6e1697f91cdedf7
Test nxdtest via nxdtest
2020-11-02T14:58:30+03:00 Kirill Smelkov (kirr@nexedi.com)
This allows establishing tests for Nxdtest itself under the Nexedi testing
infrastructure.
/cc @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/4
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/7417e7cf29afb3fa989dd3fb70b0fb12276cc6f4
Move to new home
2020-10-30T12:38:00+03:00 Kirill Smelkov (kirr@nexedi.com)
lab.nexedi.com/kirr/nxdtest -> lab.nexedi.com/nexedi/nxdtest
https://lab.nexedi.com/nexedi/slapos/merge_requests/839#note_119120
https://lab.nexedi.com/nexedi/slapos/merge_requests/839#note_119142
/cc @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/3
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/8b32fd7a0f991165fe8156269437c06a4ef84c98
Modularize + Start of tests
2020-10-30T10:37:04+01:00 Kirill Smelkov (kirr@nexedi.com)
/cc @jerome
/reviewed-on https://lab.nexedi.com/nexedi/nxdtest/merge_requests/1
https://lab.node.vifib.com/nexedi/nxdtest/-/commit/f91a050d8bf97645f110e24689a935a7a1f3e1f5
Start of tests
2020-10-28T14:13:01+03:00 Kirill Smelkov (kirr@nexedi.com)
Verify pytest summary-parsing functionality.