• Li Li's avatar
    binder: fix freeze race · b564171a
    Li Li authored
    Currently cgroup freezer is used to freeze the application threads, and
    BINDER_FREEZE is used to freeze the corresponding binder interface.
    There's already a mechanism in ioctl(BINDER_FREEZE) to wait for any
    existing transactions to drain out before actually freezing the binder
    interface.
    
    But freezing an app requires 2 steps, freezing the binder interface with
    ioctl(BINDER_FREEZE) and then freezing the application main threads with
    cgroupfs. This is not an atomic operation. The following race issue
    might happen.
    
    1) Binder interface is frozen by ioctl(BINDER_FREEZE);
    2) Main thread A initiates a new sync binder transaction to process B;
    3) Main thread A is frozen by "echo 1 > cgroup.freeze";
    4) The response from process B reaches the frozen thread, which will
    unexpectedly fail.
    
    This patch provides a mechanism to check if there's any new pending
    transaction happening between ioctl(BINDER_FREEZE) and freezing the
    main thread. If there's any, the main thread freezing operation can
    be rolled back to finish the pending transaction.
    
    Furthermore, the response might reach the binder driver before the
    rollback actually happens. That will still cause failed transaction.
    
    As the other process doesn't wait for another response of the response,
    the response transaction failure can be fixed by treating the response
    transaction like an oneway/async one, allowing it to reach the frozen
    thread. And it will be consumed when the thread gets unfrozen later.
    
    NOTE: This patch reuses the existing definition of struct
    binder_frozen_status_info but expands the bit assignments of __u32
    member sync_recv.
    
    To ensure backward compatibility, bit 0 of sync_recv still indicates
    there's an outstanding sync binder transaction. This patch adds new
    information to bit 1 of sync_recv, indicating the binder transaction
    happens exactly when there's a race.
    
    If an existing userspace app runs on a new kernel, a sync binder call
    will set bit 0 of sync_recv so ioctl(BINDER_GET_FROZEN_INFO) still
    return the expected value (true). The app just doesn't check bit 1
    intentionally so it doesn't have the ability to tell if there's a race.
    This behavior is aligned with what happens on an old kernel which
    doesn't set bit 1 at all.
    
    A new userspace app can 1) check bit 0 to know if there's a sync binder
    transaction happened when being frozen - same as before; and 2) check
    bit 1 to know if that sync binder transaction happened exactly when
    there's a race - a new information for rollback decision.
    
    the same time, confirmed the pending transactions succeeded.
    
    Fixes: 432ff1e9 ("binder: BINDER_FREEZE ioctl")
    Acked-by: default avatarTodd Kjos <tkjos@google.com>
    Cc: stable <stable@vger.kernel.org>
    Signed-off-by: default avatarLi Li <dualli@google.com>
    Test: stress test with apps being frozen and initiating binder calls at
    Link: https://lore.kernel.org/r/20210910164210.2282716-2-dualli@chromium.orgSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    b564171a
binder_internal.h 19.9 KB