• Eric W. Biederman's avatar
    mnt: Make propagate_umount less slow for overlapping mount propagation trees · 295f2370
    Eric W. Biederman authored
    [ Upstream commit 296990de ]
    
    Andrei Vagin pointed out that time to executue propagate_umount can go
    non-linear (and take a ludicrious amount of time) when the mount
    propogation trees of the mounts to be unmunted by a lazy unmount
    overlap.
    
    Make the walk of the mount propagation trees nearly linear by
    remembering which mounts have already been visited, allowing
    subsequent walks to detect when walking a mount propgation tree or a
    subtree of a mount propgation tree would be duplicate work and to skip
    them entirely.
    
    Walk the list of mounts whose propgatation trees need to be traversed
    from the mount highest in the mount tree to mounts lower in the mount
    tree so that odds are higher that the code will walk the largest trees
    first, allowing later tree walks to be skipped entirely.
    
    Add cleanup_umount_visitation to remover the code's memory of which
    mounts have been visited.
    
    Add the functions last_slave and skip_propagation_subtree to allow
    skipping appropriate parts of the mount propagation tree without
    needing to change the logic of the rest of the code.
    
    A script to generate overlapping mount propagation trees:
    
    $ cat runs.h
    set -e
    mount -t tmpfs zdtm /mnt
    mkdir -p /mnt/1 /mnt/2
    mount -t tmpfs zdtm /mnt/1
    mount --make-shared /mnt/1
    mkdir /mnt/1/1
    
    iteration=10
    if [ -n "$1" ] ; then
    	iteration=$1
    fi
    
    for i in $(seq $iteration); do
    	mount --bind /mnt/1/1 /mnt/1/1
    done
    
    mount --rbind /mnt/1 /mnt/2
    
    TIMEFORMAT='%Rs'
    nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))
    echo -n "umount -l /mnt/1 -> $nr        "
    time umount -l /mnt/1
    
    nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l )
    time umount -l /mnt/2
    
    $ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done
    
    Here are the performance numbers with and without the patch:
    
         mhash |  8192   |  8192  | 1048576 | 1048576
        mounts | before  | after  |  before | after
        ------------------------------------------------
          1025 |  0.040s | 0.016s |  0.038s | 0.019s
          2049 |  0.094s | 0.017s |  0.080s | 0.018s
          4097 |  0.243s | 0.019s |  0.206s | 0.023s
          8193 |  1.202s | 0.028s |  1.562s | 0.032s
         16385 |  9.635s | 0.036s |  9.952s | 0.041s
         32769 | 60.928s | 0.063s | 44.321s | 0.064s
         65537 |         | 0.097s |         | 0.097s
        131073 |         | 0.233s |         | 0.176s
        262145 |         | 0.653s |         | 0.344s
        524289 |         | 2.305s |         | 0.735s
       1048577 |         | 7.107s |         | 2.603s
    
    Andrei Vagin reports fixing the performance problem is part of the
    work to fix CVE-2016-6213.
    
    Cc: stable@vger.kernel.org
    Fixes: a05964f3 ("[PATCH] shared mounts handling: umount")
    Reported-by: default avatarAndrei Vagin <avagin@openvz.org>
    Reviewed-by: default avatarAndrei Vagin <avagin@virtuozzo.com>
    Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
    295f2370
pnode.c 15.3 KB