• Michael Ellerman's avatar
    powerpc/mm: Fix set_memory_*() against concurrent accesses · 9f7853d7
    Michael Ellerman authored
    Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
    on one of his systems:
    
      kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0)
      BUG: Unable to handle kernel instruction fetch
      Faulting instruction address: 0xc008000004073278
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
      CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
      Workqueue: events control_work_handler [virtio_console]
      NIP:  c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
      REGS: c00000002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
      MSR:  800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002822 XER: 200400cf
      ...
      NIP fill_queue+0xf0/0x210 [virtio_console]
      LR  fill_queue+0xf0/0x210 [virtio_console]
      Call Trace:
        fill_queue+0xb4/0x210 [virtio_console] (unreliable)
        add_port+0x1a8/0x470 [virtio_console]
        control_work_handler+0xbc/0x1e8 [virtio_console]
        process_one_work+0x290/0x590
        worker_thread+0x88/0x620
        kthread+0x194/0x1a0
        ret_from_kernel_thread+0x5c/0x64
    
    Jordan, Fabiano & Murilo were able to reproduce and identify that the
    problem is caused by the call to module_enable_ro() in do_init_module(),
    which happens after the module's init function has already been called.
    
    Our current implementation of change_page_attr() is not safe against
    concurrent accesses, because it invalidates the PTE before flushing the
    TLB and then installing the new PTE. That leaves a window in time where
    there is no valid PTE for the page, if another CPU tries to access the
    page at that time we see something like the fault above.
    
    We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
    code doesn't handle a set_pte_at() of a valid PTE. See [1].
    
    But we do have pte_update(), which replaces the old PTE with the new,
    meaning there's no window where the PTE is invalid. And the hash MMU
    version hash__pte_update() deals with synchronising the hash page table
    correctly.
    
    [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r.fsf@linux.ibm.com/
    
    Fixes: 1f9ad21c ("powerpc/mm: Implement set_memory() routines")
    Reported-by: default avatarLaurent Vivier <lvivier@redhat.com>
    Reviewed-by: default avatarChristophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: default avatarMurilo Opsfelder Araújo <muriloo@linux.ibm.com>
    Tested-by: default avatarLaurent Vivier <lvivier@redhat.com>
    Signed-off-by: default avatarFabiano Rosas <farosas@linux.ibm.com>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210818120518.3603172-1-mpe@ellerman.id.au
    9f7853d7
pageattr.c 3.07 KB