• Thomas Gleixner's avatar
    x86/irq: Unbreak interrupt affinity setting · e027ffff
    Thomas Gleixner authored
    Several people reported that 5.8 broke the interrupt affinity setting
    mechanism.
    
    The consolidation of the entry code reused the regular exception entry code
    for device interrupts and changed the way how the vector number is conveyed
    from ptregs->orig_ax to a function argument.
    
    The low level entry uses the hardware error code slot to push the vector
    number onto the stack which is retrieved from there into a function
    argument and the slot on stack is set to -1.
    
    The reason for setting it to -1 is that the error code slot is at the
    position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
    indicates that the entry came via a syscall. If it's not set to a negative
    value then a signal delivery on return to userspace would try to restart a
    syscall. But there are other places which rely on pt_regs::orig_ax being a
    valid indicator for syscall entry.
    
    But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
    interrupt affinity setting mechanism, which was overlooked when this change
    was made.
    
    Moving interrupts on x86 happens in several steps. A new vector on a
    different CPU is allocated and the relevant interrupt source is
    reprogrammed to that. But that's racy and there might be an interrupt
    already in flight to the old vector. So the old vector is preserved until
    the first interrupt arrives on the new vector and the new target CPU. Once
    that happens the old vector is cleaned up, but this cleanup still depends
    on the vector number being stored in pt_regs::orig_ax, which is now -1.
    
    That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
    always false. As a consequence the interrupt is moved once, but then it
    cannot be moved anymore because the cleanup of the old vector never
    happens.
    
    There would be several ways to convey the vector information to that place
    in the guts of the interrupt handling, but on deeper inspection it turned
    out that this check is pointless and a leftover from the old affinity model
    of X86 which supported multi-CPU affinities. Under this model it was
    possible that an interrupt had an old and a new vector on the same CPU, so
    the vector match was required.
    
    Under the new model the effective affinity of an interrupt is always a
    single CPU from the requested affinity mask. If the affinity mask changes
    then either the interrupt stays on the CPU and on the same vector when that
    CPU is still in the new affinity mask or it is moved to a different CPU, but
    it is never moved to a different vector on the same CPU.
    
    Ergo the cleanup check for the matching vector number is not required and
    can be removed which makes the dependency on pt_regs:orig_ax go away.
    
    The remaining check for new_cpu == smp_processsor_id() is completely
    sufficient. If it matches then the interrupt was successfully migrated and
    the cleanup can proceed.
    
    For paranoia sake add a warning into the vector assignment code to
    validate that the assumption of never moving to a different vector on
    the same CPU holds.
    
    Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs")
    Reported-by: default avatarAlex bykov <alex.bykov@scylladb.com>
    Reported-by: default avatarAvi Kivity <avi@scylladb.com>
    Reported-by: default avatarAlexander Graf <graf@amazon.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Tested-by: default avatarAlexander Graf <graf@amazon.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
    e027ffff
vector.c 33 KB