• Andrea Arcangeli's avatar
    userfaultfd: document _IOR/_IOW · e067eba5
    Andrea Arcangeli authored
    Patch series "userfaultfd tmpfs/hugetlbfs/non-cooperative", v2
    
    These userfaultfd features are finished and are ready for larger
    exposure in -mm and upstream merging.
    
    1) tmpfs non present userfault
    2) hugetlbfs non present userfault
    3) non cooperative userfault for fork/madvise/mremap
    
    qemu development code is already exercising 2) and container postcopy
    live migration needs 3).
    
    1) is not currently used but there's a self test and we know some qemu
    user for various reasons uses tmpfs as backing for KVM so it'll need it
    too to use postcopy live migration with tmpfs memory.
    
    All review feedback from the previous submit has been handled and the
    fixes are included.  There's no outstanding issue AFIK.
    
    Upstream code just did a s/fe/vmf/ conversion in the page faults and
    this has been converted as well incrementally.
    
    In addition to the previous submits, this also wakes up stuck userfaults
    during UFFDIO_UNREGISTER.  The non cooperative testcase actually
    reproduced this problem by getting stuck instead of quitting clean in
    some rare case as it could call UFFDIO_UNREGISTER while some userfault
    could be still in flight.  The other option would have been to keep
    leaving it up to userland to serialize itself and to patch the testcase
    instead but the wakeup during unregister I think is preferable.
    
    David also asked the UFFD_FEATURE_MISSING_HUGETLBFS and
    UFFD_FEATURE_MISSING_SHMEM feature flags to be added so QEMU can avoid
    to probe if the hugetlbfs/shmem missing support is available by calling
    UFFDIO_REGISTER.  QEMU already checks HUGETLBFS_MAGIC with fstatfs so if
    UFFD_FEATURE_MISSING_HUGETLBFS is also set, it knows UFFDIO_REGISTER
    will succeed (or if it fails, it's for some other more concerning
    reason).  There's no reason to worry about adding too many feature
    flags.  There are 64 available and worst case we've to bump the API if
    someday we're really going to run out of them.
    
    The round-trip network latency of hugetlbfs userfaults during postcopy
    live migration is still of the order of dozen milliseconds on 10GBit if
    at 2MB hugepage granularity so it's working perfectly and it should
    provide for higher bandwidth or lower CPU usage (which makes it
    interesting to add an option in the future to support THP granularity
    too for anonymous memory, UFFDIO_COPY would then have to create THP if
    alignment/len allows for it).  1GB hugetlbfs granularity will require
    big changes in hugetlbfs to work so it's deferred for later.
    
    This patch (of 42):
    
    This adds proper documentation (inline) to avoid the risk of further
    misunderstandings about the semantics of _IOW/_IOR and it also reminds
    whoever will bump the UFFDIO_API in the future, to change the two ioctl
    to _IOW.
    
    This was found while implementing strace support for those ioctl,
    otherwise we could have never found it by just reviewing kernel code and
    testing it.
    
    _IOC_READ or _IOC_WRITE alters nothing but the ioctl number itself, so
    it's only worth fixing if the UFFDIO_API is bumped someday.
    
    Link: http://lkml.kernel.org/r/20161216144821.5183-2-aarcange@redhat.comSigned-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
    Reported-by: default avatar"Dmitry V. Levin" <ldv@altlinux.org>
    Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
    Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Pavel Emelyanov <xemul@parallels.com>
    Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    e067eba5
ioctl.h 3.38 KB