Commit 8c83f2e7 authored by Nicolas Wavrant's avatar Nicolas Wavrant

pbs: open device for output in write mode

As found by jm@nexedi.com :

>>> with open('/dev/null') as f:
...   subprocess.check_call(('echo', 'foo'), stdout=f)

echo: write error: Bad file descriptor
parent 7ddefe29
Pipeline #8613 failed with stage
in 0 seconds
...@@ -43,7 +43,7 @@ from slapos.recipe.librecipe import shlex ...@@ -43,7 +43,7 @@ from slapos.recipe.librecipe import shlex
def promise(ssh_client, user, host, port): def promise(ssh_client, user, host, port):
# Redirect output to /dev/null # Redirect output to /dev/null
with open(os.devnull) as _dev_null: with open(os.devnull, 'wb') as _dev_null:
ssh = subprocess.Popen( ssh = subprocess.Popen(
(ssh_client, '%s@%s' % (user, host), '-p', str(port)), (ssh_client, '%s@%s' % (user, host), '-p', str(port)),
stdin=subprocess.PIPE, stdout=_dev_null, universal_newlines=True) stdin=subprocess.PIPE, stdout=_dev_null, universal_newlines=True)
......
  • mentioned in commit 40fb8650

    Toggle commit list
  • Sorry but this is a bad commit message. On one side, it's useless because what it states is obvious by reading the diff. On the other side, it does not explain why it is the correct change.

    I didn't fix it myself because the intention is not clear. For me, there are 2 possibilities:

    1. Either the command outputs nothing and there's no point redirecting to /dev/null
    2. Or the command can output something and we had a bug. If so, what is it? What is its severity and its urgency to fix on production?

    The only case where this commit is valid is if the command has finished its work when it outputs things for the first time, in which case we had a harmless bug. But it has to be proven. Things can actually be more complicated because ssh is sometimes tolerant: with invalid stdout, set -e; echo foo; touch /tmp/foo creates /tmp/foo but set -e; echo foo; sleep 1; echo bar; touch /tmp/foo doesn't.

    /cc @nexedi to remind about the importance of commit messages

  • Hi @jm,

    I agree that this commit message is a bit sparse, but I'm not sure what is not obvious in this commit. More especially with the exemple showing a villain traceback without this change.

    The context here is of a promise, so we don't care about stdout (= all is fine), any error will be printed on stderr. If something's wrong, the promise raises because of the return code. If I remember correctly, stderr is reported by failed promises.

    Things can actually be more complicated because ssh is sometimes tolerant: with invalid stdout, set -e; echo foo; touch /tmp/foo creates /tmp/foo but set -e; echo foo; sleep 1; echo bar; touch /tmp/foo

    This I don't understand, I'll be glad you explain me this strange behavior :) But I'm wondering if this exemple is not a bit out of scope, because this Python code is using the execvp API, when the commands you're giving are shell commands.

  • This I don't understand, I'll be glad you explain me this strange behavior :)

    Strange as you said. That's all you need to know, i.e. you must not assume anything about the behaviour of a program when run in a buggy environment.

    But I'm wondering if this example is not a bit out of scope, because this Python code is using the execvp API, when the commands you're giving are shell commands.

    When you pass a command to ssh, this command is executed by a remote shell. I misread and wrongly thought that's what our code does, whereas we only send a rdiff-backup quit command (BTW, I should have kept the comment explaining this). But my example is still relevant to show how ssh behaves with respect to its stdout. What if rdiff-backup does something similar to my shell example?

    The context here is of a promise, so we don't care about stdout (= all is fine), any error will be printed on stderr.

    1. Any part of some software can bug in a critical way. We must always analyze an issue, not necessarily completely but at least identify what could be the worse and then, if we can leave with it for a while, stop.
    2. Software don't always print to the correct fd.

    I'm not sure what is not obvious in this commit.

    How can I write the 2 different possibilities I listed in my previous comment?

    What I wanted to read for such a diff is a confirmation that:

    1. the executed command may really write something to stdout (justifying the need to keep a redirection to /dev/null)
    2. writing to bad fd was always done in a way that did not alter the behaviour of ssh

    But the whole problem may be elsewhere, in the commit that added the redirection. There's also no explanation why it was done:

    • whether it was a precaution against any possible change of behaviour of an external program, even if no stdout was actually seen
    • or some unwanted stdout was seen
  • What if rdiff-backup does something similar to my shell example?

    My guess is that it is indeed not shell-specific.

    The issue timing-dependent behaviour may be related to the delay added to small packets when TCP_NODELAY is not enabled, plus network latency, plus any intermediate buffers: server side will take a while to realise that its stdout/stderr errors out on client side, and only get EPIPE errors (converted into a SIGPIPE ?) well after the corresponding write.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment