Commit 8a7bf8bb authored by Guillaume Hervier's avatar Guillaume Hervier

software/slaprunner: Fix resiliency (runner-sshd)

/reviewed-on nexedi/slapos!465
parent dd9b9623
...@@ -26,11 +26,11 @@ md5sum = ed2e08c07a6727b2012f15da67c0705d ...@@ -26,11 +26,11 @@ md5sum = ed2e08c07a6727b2012f15da67c0705d
[instance-runner-import] [instance-runner-import]
filename = instance-runner-import.cfg.in filename = instance-runner-import.cfg.in
md5sum = 2d516c6f1337efb7def32771c2eabc2b md5sum = 1ed9526a6e5ac9a80e5b3add2d0a88fe
[instance-runner-export] [instance-runner-export]
filename = instance-runner-export.cfg.in filename = instance-runner-export.cfg.in
md5sum = 01395b98534066ec23a5ad5c96d5f1e7 md5sum = 8132130c89896c2d30a08f3b3a7000ff
[template-resilient] [template-resilient]
filename = instance-resilient.cfg.jinja2 filename = instance-resilient.cfg.jinja2
......
...@@ -23,6 +23,7 @@ parts += ...@@ -23,6 +23,7 @@ parts +=
runner-sshkeys-authority runner-sshkeys-authority
runner-sshkeys-authority-service runner-sshkeys-authority-service
runner-sshkeys-sshd runner-sshkeys-sshd
runner-sshkeys-sshd-service
runtestsuite runtestsuite
symlinks symlinks
shellinabox shellinabox
......
...@@ -19,6 +19,7 @@ parts += ...@@ -19,6 +19,7 @@ parts +=
runner-sshkeys-authority runner-sshkeys-authority
runner-sshkeys-authority-service runner-sshkeys-authority-service
runner-sshkeys-sshd runner-sshkeys-sshd
runner-sshkeys-sshd-service
runtestsuite runtestsuite
shellinabox shellinabox
shellinabox-service shellinabox-service
......
  • @guillaume.hervier Good to see that this issue is now fixed 👍 The commit message is not good however, as runner-sshd is a base service of the runner itself, and has nothing to do with the resiliency (the resilient stack deploys its own ssh server). Did you have time to check also the 2nd bug I've reported about the duplicate certificate authority service ?

  • Indeed, i should have named the commit something like "Fix resilient webrunner"

    About the duplicate certificate-authority bug, the problem is that this part is defined at multiples locations because of stack/monitor also defining this part, which create some conflicts on what part takes precedence, which makes the presence of two certificate-authority services, one from the slaprunner SR, one from the monitor stack...

    I don't actually have a good fix for that as the problem is mostly about parts conflicts between SRs and how we should name them, which I think could lead to bigger problems

  • which I think could lead to bigger problems

    Indeed, for the moment we have something that "happens to work", but by using a bug in the Software Release.

    which create some conflicts on what part takes precedence,

    I don't know if you already digged in buildout's code (if not it will happen some day for sure), but in reality it is really simple (and stupid). You can overwrite options by re-defining its section in a higher-level template. For a same section name, the highest/latest section wins.

    Which means that the latest certificate-authority declared wins. Now, if I look to a resilient webrunner .installed-switch-software-type, I have :

    • a section [certificate-authority] installing the service ~/etc/service/certificate_authority
    • a section [certificate-authority-service] installing the service ~/etc/service/certificate_authority-f570d062fd7fa79dc5303876e0c4575e
    • certificate_authority-f570d062fd7fa79dc5303876e0c4575e wrapping a ~/srv/runner/instance/slappart2/bin/certificate_authority that doesn't exist

    In your commit 142d35d8 , I think you intended to make the section [certificate-authority] create a script in ~/bin, and you wanted to install the service with a new section [certificate-authority-service] .

    As there is no ~/bin/certificate_authority but a ~/etc/service/certificate_authority, it seems that the [certificate-authority] section installed is the one from monitor, which means that the monitor buildout is overwriting the common instance-runner.cfg buildout.

    Because I was interested in the issue, I played a bit to understand what was happening. The result is here : Nicolas/slapos@0c519a52 . It seems fixing this issue of duplicated [certificate-authority]. Please give me your opinion on this commit. If it is positive, I'll run tests on it then merge.

    In conclusion, the bugs introduced by !409 (merged) could have been spotted before or right after merging into master, thanks to test results. At that time tests were broken, this is why we found it out so long after. So if someone breaks tests again, please crush them.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment