• Ross Lagerwall's avatar
    xenbus: Avoid deadlock during suspend due to open transactions · d10e0cc1
    Ross Lagerwall authored
    During a suspend/resume, the xenwatch thread waits for all outstanding
    xenstore requests and transactions to complete. This does not work
    correctly for transactions started by userspace because it waits for
    them to complete after freezing userspace threads which means the
    transactions have no way of completing, resulting in a deadlock. This is
    trivial to reproduce by running this script and then suspending the VM:
    
        import pyxs, time
        c = pyxs.client.Client(xen_bus_path="/dev/xen/xenbus")
        c.connect()
        c.transaction()
        time.sleep(3600)
    
    Even if this deadlock were resolved, misbehaving userspace should not
    prevent a VM from being migrated. So, instead of waiting for these
    transactions to complete before suspending, store the current generation
    id for each transaction when it is started. The global generation id is
    incremented during resume. If the caller commits the transaction and the
    generation id does not match the current generation id, return EAGAIN so
    that they try again. If the transaction was instead discarded, return OK
    since no changes were made anyway.
    
    This only affects users of the xenbus file interface. In-kernel users of
    xenbus are assumed to be well-behaved and complete all transactions
    before freezing.
    Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
    Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
    Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
    d10e0cc1
xenbus.h 4.17 KB