• Michael Ellerman's avatar
    powerpc: Fix stack overflow crash in resume_kernel when ftracing · 0e37739b
    Michael Ellerman authored
    It's possible for us to crash when running with ftrace enabled, eg:
    
      Bad kernel stack pointer bffffd12 at c00000000000a454
      cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
          pc: c00000000000a454: resume_kernel+0x34/0x60
          lr: c00000000000335c: performance_monitor_common+0x15c/0x180
          sp: bffffd12
         msr: 8000000000001032
         dar: bffffd12
       dsisr: 42000000
    
    If we look at current's stack (paca->__current->stack) we see it is
    equal to c0000002ecab0000. Our stack is 16K, and comparing to
    paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
    kernel stack. This leads to us writing over our struct thread_info, and
    in this case we have corrupted thread_info->flags and set
    _TIF_EMULATE_STACK_STORE.
    
    Dumping the stack we see:
    
      3:mon> t c0000002ecab0000
      [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
      [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
      --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
      [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
      [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
      [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
      [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
      [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
      [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
      [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
      --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
      [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
      [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
      [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
      [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
      [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
      [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
      --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
    
      ... and so on
    
    __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
    path. At that point the irq state is not consistent, ie. interrupts are
    hard disabled (by the exception entry), but the paca soft-enabled flag
    may be out of sync.
    
    This leads to the local_irq_restore() in trace_graph_entry() actually
    enabling interrupts, which we do not want. Because we have not yet
    reprogrammed the decrementer we immediately take another decrementer
    exception, and recurse.
    
    The fix is twofold. Firstly make sure we call DISABLE_INTS before
    calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
    the irq state in the paca with the hardware, making it safe again to
    call local_irq_save/restore().
    
    Although that should be sufficient to fix the bug, we also mark the
    runlatch routines as notrace. They are called very early in the
    exception entry and we are asking for trouble tracing them. They are
    also fairly uninteresting and tracing them just adds unnecessary
    overhead.
    
    [ This regression was introduced by fe1952fc
      "powerpc: Rework runlatch code" by myself --BenH
    ]
    
    CC: <stable@vger.kernel.org> [v3.4+]
    Signed-off-by: default avatarMichael Ellerman <michael@ellerman.id.au>
    Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
    0e37739b
exception-64s.h 17.5 KB