    arm64: signal: Ensure si_code is valid for all fault signals · af40ff68
    Dave Martin authored
    Currently, as reported by Eric, an invalid si_code value 0 is
    passed in many signals delivered to userspace in response to faults
    and other kernel errors.  Typically 0 is passed when the fault is
    insufficiently diagnosable or when there does not appear to be any
    sensible alternative value to choose.
    
    This appears to violate POSIX, and is intuitively wrong for at
    least two reasons arising from the fact that 0 == SI_USER:
    
     1) si_code is a union selector, and SI_USER (and si_code <= 0 in
        general) implies the existence of a different set of fields
        (siginfo._kill) from that which exists for a fault signal
        (siginfo._sigfault).  However, the code raising the signal
        typically writes only the _sigfault fields, and the _kill
        fields make no sense in this case.
    
        Thus when userspace sees si_code == 0 (SI_USER) it may
        legitimately inspect fields in the inactive union member _kill
        and obtain garbage as a result.
    
        There appears to be software in the wild relying on this,
        albeit generally only for printing diagnostic messages.
    
     2) Software that wants to be robust against spurious signals may
        discard signals where si_code == SI_USER (or <= 0), or may
        filter such signals based on the si_uid and si_pid fields of
        siginfo._kill.  In the case of fault signals, this means
        that important (and usually fatal) error conditions may be
        silently ignored.
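    
    The two points above can be illustrated from the userspace side
    with a minimal sketch.  The names pick_view() and
    naive_filter_drops() are hypothetical and appear nowhere in the
    patch; they only model how a handler might choose which siginfo
    fields to read, and how a "robust" filter would treat a fault
    signal that arrives with si_code == 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum: which group of siginfo_t fields a handler would read. */
enum siginfo_view {
    VIEW_KILL,      /* si_pid / si_uid: signal sent by a process */
    VIEW_SIGFAULT,  /* si_addr: signal raised for a fault        */
    VIEW_OTHER
};

/* Model of a handler that treats si_code as the union selector. */
static enum siginfo_view pick_view(const siginfo_t *info)
{
    assert(info != NULL);

    /* si_code <= 0 (SI_USER is 0 on Linux) is taken to mean "sent by a process". */
    if (info->si_code <= 0)
        return VIEW_KILL;

    switch (info->si_signo) {
    case SIGSEGV:
    case SIGBUS:
    case SIGILL:
    case SIGFPE:
        return VIEW_SIGFAULT;
    default:
        return VIEW_OTHER;
    }
}

/* Model of a filter that discards anything not generated by the kernel. */
static bool naive_filter_drops(const siginfo_t *info)
{
    return info->si_code <= 0;
}
```

    With a valid fault code such as SEGV_MAPERR, pick_view() selects
    the fault fields and the filter keeps the signal.  With
    si_code == 0 on the same SIGSEGV, the handler reads si_pid and
    si_uid, which the kernel never wrote, and the filter drops the
    fault entirely.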
    
    In practice, many of the faults for which arm64 passes si_code == 0
    are undiagnosable conditions such as exceptions with syndrome
    values in ESR_ELx to which the architecture does not yet assign any
    meaning, or conditions indicative of a bug or error in the kernel
    or system and thus that are unrecoverable and should never occur in
    normal operation.
    
    The approach taken in this patch is to translate all such
    undiagnosable or "impossible" synchronous fault conditions to
    SIGKILL, since these are at least probably localisable to a single
    process.  Some of these conditions should really result in a kernel
    panic, but due to the lack of diagnostic information it is
    difficult to be certain: this patch does not add any calls to
    panic(), but this could change later if justified.
    
    Although si_code will not reach userspace in the case of SIGKILL,
    it is still desirable to pass a nonzero value so that the common
    siginfo handling code can detect incorrect use of si_code == 0
    without false positives.  In this case the si_code dependent
    siginfo fields will not be correctly initialised, but since they
    are not passed to userspace I deem this not to matter.
    
    A few faults can reasonably occur in realistic userspace scenarios,
    and _should_ raise a regular, handleable (but perhaps not
    ignorable/blockable) signal: for these, this patch attempts to
    choose a suitable standard si_code value for the raised signal in
    each case instead of 0.
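    
    As a rough userspace model of the resulting policy (not the
    kernel code itself), consider a table that maps each fault class
    to a signal and an si_code.  The struct, the table entries and
    count_zero_si_code() are all assumptions made for illustration;
    in particular, the choice of SI_KERNEL for the SIGKILL case and
    the specific fault-to-code pairings are examples, not a record of
    what this patch assigns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#ifndef SI_KERNEL
#define SI_KERNEL 0x80  /* Linux value; fallback if the libc hides it */
#endif

/* Hypothetical table entry: what to raise for one class of fault. */
struct fault_entry {
    int sig;
    int code;
    const char *name;
};

/* Illustrative entries only; not copied from the arm64 fault table. */
static const struct fault_entry fault_table[] = {
    /* Undiagnosable: kill the task, but still with a nonzero si_code. */
    { SIGKILL, SI_KERNEL,   "syndrome with no assigned meaning" },
    /* Plausible in real userspace: handleable signal, standard si_code. */
    { SIGBUS,  BUS_ADRALN,  "alignment fault" },
    { SIGSEGV, SEGV_MAPERR, "access to an unmapped address" },
};

#define FAULT_TABLE_LEN (sizeof(fault_table) / sizeof(fault_table[0]))

/* Count entries that would deliver si_code == 0 (that is, SI_USER). */
static size_t count_zero_si_code(const struct fault_entry *tbl, size_t n)
{
    size_t bad = 0;

    assert(tbl != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].code == 0)
            bad++;
    }
    return bad;
}
```

    Because even the SIGKILL entry carries a nonzero code, a check of
    this kind can flag any remaining si_code == 0 as a bug without
    false positives.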
    
    arm64 was the only arch to define a BUS_FIXME code, so after this
    patch nobody defines it.  This patch therefore also removes the
    relevant code from siginfo_layout().
    
    Cc: James Morse <james.morse@arm.com>
    Reported-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: Dave Martin <Dave.Martin@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>