• Paul Mackerras's avatar
    KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code · 19ccb76a
    Paul Mackerras authored
    With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
    core), whenever a CPU goes idle, we have to pull all the other
    hardware threads in the core out of the guest, because the H_CEDE
    hcall is handled in the kernel.  This is inefficient.
    
    This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
    in real mode.  When a guest vcpu does an H_CEDE hcall, we now only
    exit to the kernel if all the other vcpus in the same core are also
    idle.  Otherwise we mark this vcpu as napping, save state that could
    be lost in nap mode (mainly GPRs and FPRs), and execute the nap
    instruction.  When the thread wakes up, because of a decrementer or
    external interrupt, we come back in at kvm_start_guest (from the
    system reset interrupt vector), find the `napping' flag set in the
    paca, and go to the resume path.
    
    This has some other ramifications.  First, when starting a core, we
    now start all the threads, both those that are immediately runnable and
    those that are idle.  This is so that we don't have to pull all the
    threads out of the guest when an idle thread gets a decrementer interrupt
    and wants to start running.  In fact the idle threads will all start
    with the H_CEDE hcall returning; being idle they will just do another
    H_CEDE immediately and go to nap mode.
    
    This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
    These functions have been restructured to make them simpler and clearer.
    We introduce a level of indirection in the wait queue that gets woken
    when external and decrementer interrupts get generated for a vcpu, so
    that we can have the 4 vcpus in a vcore using the same wait queue.
    We need this because the 4 vcpus are being handled by one thread.
    
    Secondly, when we need to exit from the guest to the kernel, we now
    have to generate an IPI for any napping threads, because an HDEC
    interrupt doesn't wake up a napping thread.
    
    Thirdly, we now need to be able to handle virtual external interrupts
    and decrementer interrupts becoming pending while a thread is napping,
    and deliver those interrupts to the guest when the thread wakes.
    This is done in kvmppc_cede_reentry, just before fast_guest_return.
    
    Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
    and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
    from kvm_arch_vcpu_runnable.
    Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
    Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
    19ccb76a
book3s_hv_rmhandlers.S 35.1 KB