    ocxl: Fix concurrent AFU open and device removal · a58d37bc
    Frederic Barrat authored
    If an ocxl device is unbound through sysfs at the same time its AFU is
    being opened by a user process, the open code may dereference freed
    structures, which can lead to a kernel oops. The time window is tiny,
    but it can be hit, and the bug is fairly easy to reproduce by
    artificially widening the window.
    
    Fix it with a combination of 2 changes:
      - when an AFU device is found in the IDR by looking up the device
        minor number, we should hold a reference on the device until the
        context is allocated. Allocating the context takes a reference on
        the AFU structure, so the device reference can be released once
        the context allocation is done.
      - with the fix above, there's still another, even tinier window
        between the time the AFU device is found in the IDR and the time
        the reference on the device is taken. We can close this one by
        removing the IDR entry earlier, when the device setup is removed,
        instead of waiting for the 'release' device callback, and by
        adding proper locking around the IDR.
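    The two changes above can be illustrated with a small userspace C sketch. This is a minimal model under stated assumptions and is not the ocxl driver code. A fixed array indexed by minor number stands in for the IDR, a pthread mutex stands in for the lock around it, and a plain counter stands in for the device reference count. All names (`afu_dev_get_by_minor()`, `afu_dev_unregister()`, `afu_open()`, and so on) are hypothetical. For simplicity, the context keeps its reference on the same object rather than on a separate AFU structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model, not the ocxl API: minors[] stands in for the
 * IDR keyed by minor number, minors_lock for the lock around every IDR access,
 * and refcount for the device reference count. */
#define MAX_MINORS 16

struct afu_dev {
	int minor;
	int refcount; /* protected by minors_lock */
};

struct context {
	struct afu_dev *dev; /* reference owned by the context */
};

static struct afu_dev *minors[MAX_MINORS];
static pthread_mutex_t minors_lock = PTHREAD_MUTEX_INITIALIZER;

static void afu_dev_get(struct afu_dev *dev)
{
	pthread_mutex_lock(&minors_lock);
	assert(dev->refcount > 0); /* caller must already hold a reference */
	dev->refcount++;
	pthread_mutex_unlock(&minors_lock);
}

static void afu_dev_put(struct afu_dev *dev)
{
	pthread_mutex_lock(&minors_lock);
	int last = (--dev->refcount == 0);
	pthread_mutex_unlock(&minors_lock);
	if (last)
		free(dev); /* plays the role of the 'release' callback */
}

/* Register a device in a free slot; the caller owns the initial reference. */
static struct afu_dev *afu_dev_register(int minor)
{
	if (minor < 0 || minor >= MAX_MINORS)
		return NULL;
	struct afu_dev *dev = malloc(sizeof(*dev));
	if (!dev)
		return NULL;
	dev->minor = minor;
	dev->refcount = 1;
	pthread_mutex_lock(&minors_lock);
	assert(minors[minor] == NULL); /* sketch assumes the minor is free */
	minors[minor] = dev;
	pthread_mutex_unlock(&minors_lock);
	return dev;
}

/* Change 2: clear the table entry at removal time, under the lock, before
 * dropping the owner reference, instead of doing it in the release path. */
static void afu_dev_unregister(struct afu_dev *dev)
{
	pthread_mutex_lock(&minors_lock);
	minors[dev->minor] = NULL;
	pthread_mutex_unlock(&minors_lock);
	afu_dev_put(dev);
}

/* Take the reference while the lock is held, so removal cannot free the
 * device between the lookup and the reference being taken. */
static struct afu_dev *afu_dev_get_by_minor(int minor)
{
	if (minor < 0 || minor >= MAX_MINORS)
		return NULL;
	pthread_mutex_lock(&minors_lock);
	struct afu_dev *dev = minors[minor];
	if (dev)
		dev->refcount++;
	pthread_mutex_unlock(&minors_lock);
	return dev;
}

/* Change 1: hold the lookup reference until the context owns its own one. */
static struct context *afu_open(int minor)
{
	struct afu_dev *dev = afu_dev_get_by_minor(minor);
	if (!dev)
		return NULL; /* device already removed */
	struct context *ctx = malloc(sizeof(*ctx));
	if (ctx) {
		afu_dev_get(dev); /* the context keeps its own reference */
		ctx->dev = dev;
	}
	afu_dev_put(dev); /* lookup reference no longer needed */
	return ctx;
}

static void afu_close(struct context *ctx)
{
	afu_dev_put(ctx->dev);
	free(ctx);
}
```

    The lookup takes its reference under the same lock that `afu_dev_unregister()` holds while clearing the slot. An open that races with removal therefore has only two possible outcomes. It either gets a referenced device, or it finds the slot empty and fails. It can never see an entry whose memory is about to be freed.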
    
    Fixes: 75ca758a ("ocxl: Create a clear delineation between ocxl backend & frontend")
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
    Reviewed-by: Greg Kurz <groug@kaod.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20190624144148.32022-1-fbarrat@linux.ibm.com