1. 06 May, 2015 1 commit
    • splice: sendfile() at once fails for big files · 0ff28d9f
      Christophe Leroy authored
      Using sendfile() with the small program below to get the MD5 sums of
      some files, it appears that big files (over 64 kbytes on a 4k-page
      system) get a wrong MD5 sum while small files get the correct one.
      The program uses sendfile() to send a file to an AF_ALG socket
      for hashing.
      
      /* md5sum2.c */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <string.h>
      #include <fcntl.h>
      #include <sys/socket.h>
      #include <sys/sendfile.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <linux/if_alg.h>
      
      int main(int argc, char **argv)
      {
      	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
      	struct stat st;
      	struct sockaddr_alg sa = {
      		.salg_family = AF_ALG,
      		.salg_type = "hash",
      		.salg_name = "md5",
      	};
      	int n;
      
      	bind(sk, (struct sockaddr*)&sa, sizeof(sa));
      
      	for (n = 1; n < argc; n++) {
      		int size;
      		off_t offset = 0;	/* sendfile() takes an off_t * */
      		char buf[4096];
      		int fd;
      		int sko;
      		int i;
      
      		fd = open(argv[n], O_RDONLY);
      		sko = accept(sk, NULL, 0);
      		fstat(fd, &st);
      		size = st.st_size;
      		sendfile(sko, fd, &offset, size);
      		size = read(sko, buf, sizeof(buf));
      		for (i = 0; i < size; i++)
      			printf("%2.2x", buf[i]);
      		printf("  %s\n", argv[n]);
      		close(fd);
      		close(sko);
      	}
      	exit(0);
      }
      
      The test below is done using official Linux patch files. The first
      result is from the stock md5sum utility; the second is from the
      program above.
      
      root@vgoip:~# ls -l patch-3.6.*
      -rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
      -rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz
      
      root@vgoip:~# md5sum patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      
      root@vgoip:~# ./md5sum2 patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz
      
      After investigation, it appears that sendfile() sends the file in
      blocks of 64 kbytes (16 times PAGE_SIZE). The problem is that at the
      end of each block the SPLICE_F_MORE flag is missing, so the hashing
      operation is reset as if the end of the file had been reached.
      
      This patch adds SPLICE_F_MORE to the flags when more data is pending.
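
      A sketch of how such a fix can look inside the copy loop of
      splice_direct_to_actor() in fs/splice.c (surrounding code elided;
      details may differ from the final patch):

      	more = sd->flags & SPLICE_F_MORE;

      	while (len) {
      		...
      		read_len = ret;
      		/*
      		 * If more data is pending, set SPLICE_F_MORE.
      		 * If this is the last chunk and SPLICE_F_MORE was
      		 * not set by the caller, clear it again.
      		 */
      		if (read_len < len)
      			sd->flags |= SPLICE_F_MORE;
      		else if (!more)
      			sd->flags &= ~SPLICE_F_MORE;
      		...
      	}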
      
      With the patch applied, we get the correct sums:
      
      root@vgoip:~# md5sum patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      
      root@vgoip:~# ./md5sum2 patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      0ff28d9f
  2. 04 May, 2015 2 commits
    • blk-mq: don't lose requests if a stopped queue restarts · 9ba52e58
      Shaohua Li authored
      Normally, if the driver is busy and cannot dispatch a request, the logic is as below:
      block layer:					driver:
      	__blk_mq_run_hw_queue
      a.						blk_mq_stop_hw_queue
      b.	rq add to ctx->dispatch
      
      later:
      1.						blk_mq_start_hw_queue
      2.	__blk_mq_run_hw_queue
      
      But it's possible that steps 1-2 run between a and b. Since rq isn't
      on ctx->dispatch yet, step 2 will not run it, and the rq may be lost
      if no subsequent requests come in.
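      One way to close that window, and roughly what the patch does
      (sketch, not the verbatim diff): after publishing the request on the
      dispatch list, kick the queue once more, so a restart that raced with
      step b still sees it:

      	if (!list_empty(&rq_list)) {
      		spin_lock(&hctx->lock);
      		list_splice(&rq_list, &hctx->dispatch);	/* step b */
      		spin_unlock(&hctx->lock);
      		/*
      		 * The queue may have been stopped and restarted
      		 * between a and b. Re-run it once: this is a no-op
      		 * if the queue is still stopped, since
      		 * blk_mq_run_hw_queue() checks the STOPPED bit.
      		 */
      		blk_mq_run_hw_queue(hctx, true);
      	}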
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      9ba52e58
    • blk-mq: fix FUA request hang · b2387ddc
      Shaohua Li authored
      When a FUA request enters the DATA stage of the flush pipeline, it is
      added to the mq requeue list and will then be added to ctx->rq_list.
      blk_mq_attempt_merge() might merge a bio into the request. Later,
      when the request has finished the flush pipeline,
      request->__data_len is 0. I then saw only the bio get its endio
      called; the original request never finished.
      
      Adding REQ_FLUSH_SEQ into REQ_NOMERGE_FLAGS looks like an easy fix.
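
      A sketch of that change in include/linux/blk_types.h (hedged; the
      exact flag list depends on the kernel version):

      	/* requests in the flush state machine must not be merge
      	 * targets, so REQ_FLUSH_SEQ joins the no-merge set */
      	#define REQ_NOMERGE_FLAGS \
      		(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | \
      		 REQ_FLUSH | REQ_FUA | REQ_FLUSH_SEQ)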
      
      stable: 3.15+
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b2387ddc
  3. 27 Apr, 2015 2 commits
    • block: destroy bdi before blockdev is unregistered. · 6cd18e71
      NeilBrown authored
      Because of the peculiar way that md devices are created (automatically
      when the device node is opened), a new device can be created and
      registered immediately after the
      	blk_unregister_region(disk_devt(disk), disk->minors);
      call in del_gendisk().
      
      Therefore it is important that all visible artifacts of the previous
      device are removed before this call.  In particular, the 'bdi'.
      
      Since:
      commit c4db59d3
      Author: Christoph Hellwig <hch@lst.de>
          fs: don't reassign dirty inodes to default_backing_dev_info
      
      moved the
         device_unregister(bdi->dev);
      call from bdi_unregister() to bdi_destroy(), it has been quite easy to
      lose a race and have a new (e.g.) "md127" be created after the
      blk_unregister_region() call and before bdi_destroy() is ultimately
      called by the final 'put_disk', which must come after del_gendisk().
      
      The new device finds that the bdi name is already registered in sysfs
      and complains
      
      > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
      > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
      
      We can fix this by moving the bdi_destroy() call out of
      blk_release_queue() (which can happen very late when a refcount
      reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
      device driver calls it.
      
      Then it is only necessary for md to call blk_cleanup_queue() before
      del_gendisk().  As loop.c devices are also created on demand by
      opening the device node, we make the same change there.
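
      In outline, the change looks like this (sketch; error paths and the
      loop.c hunk elided, and the exact layout of request_queue varies by
      version):

      	void blk_cleanup_queue(struct request_queue *q)
      	{
      		...
      		/* moved here from blk_release_queue(): this runs when
      		 * the driver tears the device down, before
      		 * del_gendisk(), so the bdi name in sysfs is gone
      		 * before a new md/loop device can claim it */
      		bdi_destroy(&q->backing_dev_info);
      		...
      	}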
      
      Fixes: c4db59d3 ("fs: don't reassign dirty inodes to default_backing_dev_info")
      Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org (v4.0)
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      6cd18e71
    • block:bounce: fix call inc_|dec_zone_page_state on different pages confuse value of NR_BOUNCE · 393a3397
      Wang YanQing authored
      Commit d2c5e30c
      ("[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter")
      converted the nr_bounce statistic to a per-zone counter plus one
      global value in vm_stat, but it calls inc_/dec_zone_page_state on
      different pages, and therefore different zones, causing us to get
      unexpected values of NR_BOUNCE.
      
      Below is the result on my machine:
      Mar  2 09:26:08 udknight kernel: [144766.778265] Mem-Info:
      Mar  2 09:26:08 udknight kernel: [144766.778266] DMA per-cpu:
      Mar  2 09:26:08 udknight kernel: [144766.778268] CPU    0: hi:    0, btch:   1 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778269] CPU    1: hi:    0, btch:   1 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778270] Normal per-cpu:
      Mar  2 09:26:08 udknight kernel: [144766.778271] CPU    0: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778273] CPU    1: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778274] HighMem per-cpu:
      Mar  2 09:26:08 udknight kernel: [144766.778275] CPU    0: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778276] CPU    1: hi:  186, btch:  31 usd:   0
      Mar  2 09:26:08 udknight kernel: [144766.778279] active_anon:46926 inactive_anon:287406 isolated_anon:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  active_file:105085 inactive_file:139432 isolated_file:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  unevictable:653 dirty:0 writeback:0 unstable:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  free:178957 slab_reclaimable:6419 slab_unreclaimable:9966
      Mar  2 09:26:08 udknight kernel: [144766.778279]  mapped:4426 shmem:305277 pagetables:784 bounce:0
      Mar  2 09:26:08 udknight kernel: [144766.778279]  free_cma:0
      Mar  2 09:26:08 udknight kernel: [144766.778286] DMA free:3324kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Mar  2 09:26:08 udknight kernel: [144766.778287] lowmem_reserve[]: 0 822 3754 3754
      Mar  2 09:26:08 udknight kernel: [144766.778293] Normal free:26828kB min:3632kB low:4540kB high:5448kB active_anon:4872kB inactive_anon:68kB active_file:1796kB inactive_file:1796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:842560kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:4144kB slab_reclaimable:25676kB slab_unreclaimable:39864kB kernel_stack:1944kB pagetables:3136kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2412612 all_unreclaimable? yes
      Mar  2 09:26:08 udknight kernel: [144766.778294] lowmem_reserve[]: 0 0 23451 23451
      Mar  2 09:26:08 udknight kernel: [144766.778299] HighMem free:685676kB min:512kB low:3748kB high:6984kB active_anon:182832kB inactive_anon:1149556kB active_file:418544kB inactive_file:555932kB unevictable:2612kB isolated(anon):0kB isolated(file):0kB present:3001732kB managed:3001732kB mlocked:0kB dirty:0kB writeback:0kB mapped:17704kB shmem:1216964kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:75771152kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Mar  2 09:26:08 udknight kernel: [144766.778300] lowmem_reserve[]: 0 0 0 0
      
      You can see bounce:75771152kB for HighMem, but bounce:0 for lowmem and global.
      
      This patch fixes it.
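      The invariant the fix restores (conceptual sketch, not the verbatim
      diff; bounce_page here is illustrative): the page passed to
      dec_zone_page_state() must be the same bounce page that was handed to
      inc_zone_page_state(), so both hit the same zone's counter:

      	/* in __blk_queue_bounce(): account the bounce page's zone */
      	inc_zone_page_state(bounce_page, NR_BOUNCE);
      	...
      	/* in bounce_end_io(): decrement the same page (hence the
      	 * same zone), not the original bio's page */
      	dec_zone_page_state(bounce_page, NR_BOUNCE);
      	mempool_free(bounce_page, pool);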
      Signed-off-by: Wang YanQing <udknight@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      393a3397
  4. 23 Apr, 2015 5 commits
  5. 17 Apr, 2015 30 commits