• Eric W. Biederman's avatar
    signal: Requeue ptrace signals · b171f667
    Eric W. Biederman authored
    Kyle Huey <me@kylehuey.com> writes:
    
    > rr, a userspace record and replay debugger[0], uses the recorded register
    > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
    > executing the program during replay.
    >
    > If a SIGKILL races with processing another signal in get_signal, it is
    > possible for the kernel to decline to notify the tracer of the original
    > signal. But if the original signal had a handler, the kernel proceeds
    > with setting up a signal handler frame as if the tracer had chosen to
    > deliver the signal unmodified to the tracee. When the kernel goes to
    > execute the signal handler that it has now modified the stack and registers
    > for, it will discover the pending SIGKILL, and terminate the tracee
    > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
    > the tracer, however, the effects of handler setup will be visible to
    > the tracer.
    >
    > Because rr (the tracer) was never notified of the signal, it is not aware
    > that a signal handler frame was set up and expects the state of the program
    > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
    > by allowing the program to execute from the last event. When that fails
    > to happen during replay, rr will assert and die.
    >
    > The following patches add an explicit check for a newly pending SIGKILL
    > after the ptracer has been notified and the siglock has been reacquired.
    > If this happens, we stop processing the current signal and proceed
    > immediately to handling the SIGKILL. This makes the state reported at
    > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
    > work to set up a signal handler frame that will never be used.
    >
    > [0] https://rr-project.org/
    
    The problem is that while the traced process makes it into ptrace_stop,
    the tracee is killed before the tracer manages to wait for the
    tracee and discover which signal was about to be delivered.
    
    More generally the problem is that while siglock was dropped a signal
    with process wide effect is short cirucit delivered to the entire
    process killing it, but the process continues to try and deliver another
    signal.
    
    In general it impossible to avoid all cases where work is performed
    after the process has been killed.  In particular if the process is
    killed after get_signal returns the code will simply not know it has
    been killed until after delivering the signal frame to userspace.
    
    On the other hand when the code has already discovered the process
    has been killed and taken user space visible action that shows
    the kernel knows the process has been killed, it is just silly
    to then write the signal frame to the user space stack.
    
    Instead of being silly detect the process has been killed
    in ptrace_signal and requeue the signal so the code can pretend
    it was simply never dequeued for delivery.
    
    To test the process has been killed I use fatal_signal_pending rather
    than signal_group_exit to match the test in signal_pending_state which
    is used in schedule which is where ptrace_stop detects the process has
    been killed.
    
    Requeuing the signal so the code can pretend it was simply never
    dequeued improves the user space visible behavior that has been
    present since ebf5ebe3 ("[PATCH] signal-fixes-2.5.59-A4").
    
    Kyle Huey verified that this change in behavior and makes rr happy.
    Reported-by: default avatarKyle Huey <khuey@kylehuey.com>
    Reported-by: default avatarMarko Mäkelä <marko.makela@mariadb.com>
    History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.giReviewed-by: default avatarKees Cook <keescook@chromium.org>
    Link: https://lkml.kernel.org/r/87tugcd5p2.fsf_-_@email.froward.int.ebiederm.orgSigned-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
    b171f667
signal.c 122 KB