    net: fix pos incrementment in ipv6_route_seq_next · 6617dfd4
    Yonghong Song authored
    Commit 4fc427e0 ("ipv6_route_seq_next should increase position index")
    tried to fix the issue where seq_file pos is not increased
    if a NULL element is returned with seq_ops->next(). See bug
      https://bugzilla.kernel.org/show_bug.cgi?id=206283
    The commit effectively does:
      - increase pos for all seq_ops->start()
      - increase pos for all seq_ops->next()
    
    For ipv6_route, increasing pos for all seq_ops->next() is correct.
    But increasing pos for seq_ops->start() is not correct
    since pos is used to determine how many items to skip during
    seq_ops->start():
      iter->skip = *pos;
    seq_ops->start() just fetches the *current* pos item.
    The item can be skipped only after seq_ops->show() which essentially
    is the beginning of seq_ops->next().
    
    For example, I have 7 ipv6 route entries,
      root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
      00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
      fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
      00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
      00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
      fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
      ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001     eth0
      00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
      0+1 records in
      0+1 records out
      1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
      root@arch-fb-vm1:~/net-next
    
    In the above, I specify buffer size 4096, so all records can be returned
    to user space with a single trip to the kernel.
    
    If I use buffer size 128, since each record size is 149, the kernel's
    seq_read() will format one record into its internal buffer and return it
    to user space across two read() syscalls. The following read() syscall
    then triggers the next seq_ops->start(). Since the current implementation
    increases pos even in seq_ops->start(), records #2, #4 and #6 are skipped,
    counting the first record as #1.
    
      root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
      00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
      00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
      fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
      00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
      4+1 records in
      4+1 records out
      600 bytes copied, 0.00127758 s, 470 kB/s
    
    To fix the problem, create a fake pos pointer so that seq_ops->start()
    does not actually increase the seq_file pos. With this fix, the
    above `dd` command with `bs=128` shows the correct result.
    
    Fixes: 4fc427e0 ("ipv6_route_seq_next should increase position index")
    Cc: Alexei Starovoitov <ast@kernel.org>
    Suggested-by: Vasily Averin <vvs@virtuozzo.com>
    Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>