• Christophe Leroy's avatar
    splice: sendfile() at once fails for big files · 32e87ff7
    Christophe Leroy authored
    commit 0ff28d9f upstream.
    
    Using sendfile with below small program to get MD5 sums of some files,
    it appear that big files (over 64kbytes with 4k pages system) get a
    wrong MD5 sum while small files get the correct sum.
    This program uses sendfile() to send a file to an AF_ALG socket
    for hashing.
    
    /* md5sum2.c */
    
    int main(int argc, char **argv)
    {
    	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
    	struct stat st;
    	struct sockaddr_alg sa = {
    		.salg_family = AF_ALG,
    		.salg_type = "hash",
    		.salg_name = "md5",
    	};
    	int n;
    
    	bind(sk, (struct sockaddr*)&sa, sizeof(sa));
    
    	for (n = 1; n < argc; n++) {
    		int size;
    		int offset = 0;
    		char buf[4096];
    		int fd;
    		int sko;
    		int i;
    
    		fd = open(argv[n], O_RDONLY);
    		sko = accept(sk, NULL, 0);
    		fstat(fd, &st);
    		size = st.st_size;
    		sendfile(sko, fd, &offset, size);
    		size = read(sko, buf, sizeof(buf));
    		for (i = 0; i < size; i++)
    			printf("%2.2x", buf[i]);
    		printf("  %s\n", argv[n]);
    		close(fd);
    		close(sko);
    	}
    	exit(0);
    }
    
    Test below is done using official linux patch files. First result is
    with a software based md5sum. Second result is with the program above.
    
    root@vgoip:~# ls -l patch-3.6.*
    -rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
    -rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz
    
    root@vgoip:~# md5sum patch-3.6.*
    b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
    c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
    
    root@vgoip:~# ./md5sum2 patch-3.6.*
    b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
    5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz
    
    After investivation, it appears that sendfile() sends the files by blocks
    of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each
    block, the SPLICE_F_MORE flag is missing, therefore the hashing operation
    is reset as if it was the end of the file.
    
    This patch adds SPLICE_F_MORE to the flags when more data is pending.
    
    With the patch applied, we get the correct sums:
    
    root@vgoip:~# md5sum patch-3.6.*
    b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
    c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
    
    root@vgoip:~# ./md5sum2 patch-3.6.*
    b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
    c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
    Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: default avatarJens Axboe <axboe@fb.com>
    Cc: Ben Hutchings <ben@decadent.org.uk>
    Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
    32e87ff7
splice.c 46.4 KB