nxdtest/__init__.py · 938b545504706033e0188a0976eb0b45686124a3 · nexedi / nxdtest

Propagate cancellation to spawned test jobs · 938b5455

Kirill Smelkov authored Dec 07, 2021

A user might cancel a test result in the ERP5 UI if e.g. some misbehaviour is
detected and a new revision is ready to be tested. This works by
test_result.start() returning None - indicating that there are no more
test_result_lines to exercise. Master also indicates this cancellation
via test_result.isAlive() returning False, but until now we were not
using that information and were always waiting for completion of the
current test job that is already spawned.

This works well in practice if individual tests are not long, but e.g.
for SlapOS.SoftwareReleases.IntegrationTest-* it is not good, because
there an individual test might take _hours_ to execute.

-> Fix it by first setting up a global context to which we propagate
cancellation from test_result.isAlive, and by using that context as the
base for all other activities. This should terminate the spawned test
process if test_result is canceled.
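
The idea above can be sketched roughly as follows. This is not the code from
this commit: the commit uses a pygolang context, while the sketch substitutes
a plain threading.Event from the standard library as the cancellation signal.
The names watch_cancel, run_job, is_alive, cancelled and stop are hypothetical
and exist only for illustration.

```python
import subprocess
import sys
import threading


def watch_cancel(is_alive, cancelled, stop, interval):
    """Poll is_alive() every `interval` seconds.

    Sets `cancelled` once is_alive() returns False.
    Returns early if `stop` is set (e.g. the job finished on its own).
    """
    while not stop.wait(interval):
        if not is_alive():
            cancelled.set()
            return


def run_job(argv, cancelled, poll=0.05):
    """Run argv as a subprocess; terminate it if `cancelled` becomes set.

    Returns (returncode, was_cancelled).
    """
    proc = subprocess.Popen(argv)
    while proc.poll() is None:
        # Wait up to `poll` seconds for a cancellation signal.
        if cancelled.wait(poll):
            proc.terminate()
            proc.wait()
            return proc.returncode, True
    return proc.returncode, False
```

In such a setup the watcher would run in a background thread for the whole
lifetime of the test run, with `interval` set to the check interval, and every
spawned job would observe the same `cancelled` signal.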

The check interval is chosen as 5 minutes so as not to overload master.
@jerome says that

    We now have 341 active test nodes, but sometimes we are using
    more, we did in the past to stress test some new machines.

    For the developer, if we reduce the waiting time from a few hours to 1
    minutes or 5 minutes seems more or less equivalent.

For 350 testnodes and each nxdtest checking its test_result status via
isAlive query to master every 5 minutes, it results in ~ 1 isAlive
request/second to master on average.
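
The estimate above is simple arithmetic, spelled out here as a quick check.
The variable names are illustrative only; the figures are the ones quoted in
this message.

```python
testnodes = 350        # number of test nodes, one nxdtest each
interval_s = 5 * 60    # each one queries isAlive once per 5 minutes

# Average isAlive request rate seen by master, in requests per second.
rate = testnodes / interval_s
print(round(rate, 2))  # 1.17
```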

Had to change time to golang.time to use time.after().
Due to that, time() and sleep() are changed to time.now() and
time.sleep() respectively.

/helped-and-reviewed-by @jerome
/reviewed-on nexedi/nxdtest!14
