1. 18 Aug, 2017 18 commits
  2. 17 Aug, 2017 2 commits
  3. 15 Aug, 2017 1 commit
  4. 11 Aug, 2017 3 commits
    • cfq: Give a chance for arming slice idle timer in case of group_idle · b3193bc0
      Ritesh Harjani authored
      In the scenario below, blkio cgroups do not get disk time according to
      their assigned weights:
      1. The underlying device is nonrotational with a single HW queue
      of depth >= CFQ_HW_QUEUE_MIN.
      2. The use case creates two blkio cgroups, cg1 (weight 1000) and
      cg2 (weight 100), with two processes (file1 and file2) doing sync IO
      in their respective blkio cgroups.
      
      For the above use case, fio reports (without this patch):
      file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan  1 19:41:49 1970
        write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
      <...>
      file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan  1 19:41:49 1970
        write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
      <...>
      // Both processes get equal BW even though they belong to different
      cgroups with weights of 1000 (cg1) and 100 (cg2).
      
      In this case (nonrotational NCQ devices), as soon as a request from
      cg1 completes, CFQ expires the group when the driver tries to fetch
      the next request, even though cg1 has the larger slice (set_slice=10).
      It grants neither idle time nor weight priority, and schedules
      another cfq group (here cg2) instead. Both cfq groups (cg1 and cg2)
      therefore alternate for disk time, and the cgroup weight-based
      scheduling is lost.
      
      This patch gives CFQ (cfq_arm_slice_timer) a chance to arm the slice
      idle timer when group_idle is enabled. If group idling is not wanted
      either (including on nonrotational NCQ drives), group_idle must be
      set to 0 explicitly via sysfs.
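      A simplified user-space model of the decision described above. It assumes
      that cfq_arm_slice_timer() previously bailed out for every nonrotational
      device with hardware queueing, and that the fix adds a group_idle check to
      that early return. The struct and its fields are hypothetical stand-ins for
      CFQ's internal state, and the other conditions the real function checks are
      deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for CFQ's per-device state. */
struct cfq_model {
	bool nonrot;              /* device is nonrotational (e.g. flash) */
	bool hw_tag;              /* device queues commands internally (NCQ) */
	unsigned int group_idle;  /* sysfs group_idle; 0 disables group idling */
};

/* Old behaviour (sketch): never arm the idle timer on nonrotational NCQ
 * devices, so a group that just emptied is expired immediately and the
 * next group gets the disk. */
static bool may_arm_idle_timer_old(const struct cfq_model *d)
{
	if (d->nonrot && d->hw_tag)
		return false;
	return true; /* remaining checks of the real function omitted */
}

/* New behaviour (sketch): on such devices, still arm the timer while
 * group_idle is enabled, so the higher-weight group keeps its slice. */
static bool may_arm_idle_timer_new(const struct cfq_model *d)
{
	if (d->nonrot && d->hw_tag && d->group_idle == 0)
		return false;
	return true; /* remaining checks of the real function omitted */
}
```

      With group_idle set to 0, the new predicate behaves exactly like the old one,
      which matches the sysfs opt-out the commit message describes.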
      
      With this patch, fio reports (for the above use case):
      file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan  1 00:06:08 1970
        write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
      <..>
      file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan  1 00:06:08 1970
        write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
      <..>
      // Now each process's BW reflects its cgroup's weight.
      Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: boost throughput with flash-based non-queueing devices · edaf9428
      Paolo Valente authored
      When a queue associated with a process remains empty, there are cases
      where throughput gets boosted if the device is idled to await the
      arrival of a new I/O request for that queue. Currently, BFQ assumes
      that one of these cases is when the device has no internal queueing
      (regardless of the properties of the I/O being served). Unfortunately,
      this condition has proved to be too general. So, this commit refines it
      as "the device has no internal queueing and is rotational".
      
      This refinement provides a significant throughput boost with random
      I/O, on flash-based storage without internal queueing. For example, on
      a HiKey board, throughput increases by up to 125%, growing, e.g., from
      6.9MB/s to 15.6MB/s with two or three random readers in parallel.
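      The refinement can be modelled as two predicates over the device properties
      the message names. This is a sketch only: the struct is hypothetical, and the
      real BFQ decision combines this condition with per-queue factors that are
      not modelled here.

```c
#include <stdbool.h>

/* Hypothetical summary of the two device properties in the commit. */
struct bfq_dev_model {
	bool rotational;        /* spinning disk rather than flash */
	bool internal_queueing; /* device queues commands internally */
};

/* Before (sketch): idling is assumed to boost throughput on any device
 * without internal queueing. */
static bool idling_boosts_thr_old(const struct bfq_dev_model *d)
{
	return !d->internal_queueing;
}

/* After (sketch): only rotational devices without internal queueing
 * qualify, so non-queueing flash devices no longer idle for this reason. */
static bool idling_boosts_thr_new(const struct bfq_dev_model *d)
{
	return !d->internal_queueing && d->rotational;
}
```

      Only the non-queueing flash case changes; rotational devices without
      internal queueing keep idling as before.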
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block,bfq: refactor device-idling logic · d5be3fef
      Paolo Valente authored
      The logic that decides whether to idle the device is scattered across
      three functions. Almost all of the logic is in the function
      bfq_bfqq_may_idle, but (1) part of the decision is made in
      bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
      switch off idling regardless of the output of bfq_bfqq_may_idle. In
      addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
      their decisions based on parameters that bfq_bfqq_may_idle also uses
      for similar purposes. This commit addresses these
      issues by moving all the logic into bfq_bfqq_may_idle.
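      A sketch of the shape this refactoring aims for: one predicate owns every
      reason to refuse idling, and the caller only adds the "queue is empty" test.
      The commit message does not list which conditions moved, so the fields below
      (slice_idle, sync, idle_class, other_reasons) are hypothetical illustrations,
      not BFQ's actual parameters.

```c
#include <stdbool.h>

/* Hypothetical per-queue inputs; names are illustrative only. */
struct idle_inputs {
	unsigned int slice_idle; /* idle-time tunable; 0 disables idling */
	bool sync;               /* queue issues synchronous I/O */
	bool idle_class;         /* queue belongs to the idle I/O class */
	bool queue_empty;        /* no pending requests in the queue */
	bool other_reasons;      /* stands in for the remaining heuristics */
};

/* All idling policy lives here, instead of being split across an
 * "update idle window" step and a "must idle" override. */
static bool may_idle(const struct idle_inputs *in)
{
	if (in->slice_idle == 0 || !in->sync || in->idle_class)
		return false;
	return in->other_reasons;
}

/* The caller no longer re-checks policy; it only adds emptiness. */
static bool must_idle(const struct idle_inputs *in)
{
	return in->queue_empty && may_idle(in);
}
```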
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 10 Aug, 2017 1 commit
  6. 09 Aug, 2017 7 commits
  7. 07 Aug, 2017 2 commits
  8. 02 Aug, 2017 1 commit
  9. 01 Aug, 2017 1 commit
    • blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled · b7a71e66
      Jens Axboe authored
      We recently had a bug in the IPR SCSI driver, where it would end up
      making the SCSI mid layer run the mq hardware queue with interrupts
      disabled. This isn't legal, since the software queue locking relies
      on never being grabbed from interrupt context. Additionally, drivers
      that set BLK_MQ_F_BLOCKING may schedule from this context.
      
      Add a WARN_ON_ONCE() to catch bad users up front.
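      The "warn once, up front" pattern can be imitated in user space to show how
      it behaves. This is a sketch: MODEL_WARN_ON_ONCE, model_irqs_disabled and
      model_run_hw_queue are hypothetical names; the kernel's WARN_ON_ONCE()
      additionally returns the condition and dumps a backtrace, and the exact
      condition the commit checks is not quoted in the message.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "are interrupts disabled right now?". In the
 * kernel this is CPU state, not a global variable. */
static bool model_irqs_disabled;
static int warn_count; /* number of warnings actually emitted */

/* Report the first time cond is true at this call site, then stay quiet.
 * The block-scope static gives each call site its own "warned" flag. */
#define MODEL_WARN_ON_ONCE(cond)                                  \
	do {                                                      \
		static bool warned_;                              \
		if ((cond) && !warned_) {                         \
			warned_ = true;                           \
			warn_count++;                             \
			fprintf(stderr, "WARNING: %s at %s:%d\n", \
				#cond, __FILE__, __LINE__);       \
		}                                                 \
	} while (0)

/* Hypothetical queue runner: catch bad callers before taking any locks
 * that must never be grabbed from this context. */
static void model_run_hw_queue(void)
{
	MODEL_WARN_ON_ONCE(model_irqs_disabled);
	/* ... dispatch requests ... */
}
```

      Warning only once keeps a buggy driver that calls in repeatedly from flooding
      the log, while still flagging the first bad call.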
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 29 Jul, 2017 4 commits
    • blk-mq: blk_mq_requeue_work() doesn't need to save IRQ flags · 18e9781d
      Jens Axboe authored
      We know we're in process context, so don't bother using the
      IRQ safe versions of the spin lock.
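      Why process context makes the saved flags unnecessary can be shown with a
      tiny model of one CPU's interrupt-enable state. This is a user-space sketch:
      irqs_on, irq_flags_t and the model_* functions are hypothetical and only
      imitate the interrupt side of spin_lock_irqsave()/spin_unlock_irqrestore()
      and spin_lock_irq()/spin_unlock_irq(); no real locking is done.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable flag. */
static bool irqs_on = true;

/* Hypothetical type; the kernel passes an unsigned long flags. */
typedef bool irq_flags_t;

/* Like spin_lock_irqsave(): remember the current state, then disable. */
static void model_lock_irqsave(irq_flags_t *flags)
{
	*flags = irqs_on;
	irqs_on = false;
}

/* Like spin_unlock_irqrestore(): restore whatever state was saved. */
static void model_unlock_irqrestore(irq_flags_t flags)
{
	irqs_on = flags;
}

/* Like spin_lock_irq(): disable without saving anything. */
static void model_lock_irq(void)
{
	irqs_on = false;
}

/* Like spin_unlock_irq(): unconditionally re-enable. Correct only when
 * interrupts were enabled on entry, which holds in process context. */
static void model_unlock_irq(void)
{
	irqs_on = true;
}
```

      The _irq pair is cheaper but would wrongly re-enable interrupts for a caller
      that entered with them disabled; knowing the work item always runs in
      process context rules that case out.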
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: DAC960: shut up format-overflow warning · 33027c2b
      Arnd Bergmann authored
      gcc-7 points out that a large controller number would overflow the
      string length for the procfs name and the firmware version string:
      
      drivers/block/DAC960.c: In function 'DAC960_Probe':
      drivers/block/DAC960.c:6591:38: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
      drivers/block/DAC960.c: In function 'DAC960_V1_ReadControllerConfiguration':
      drivers/block/DAC960.c:1681:40: error: '%02d' directive writing between 2 and 3 bytes into a region of size between 2 and 5 [-Werror=format-overflow=]
      drivers/block/DAC960.c:1681:40: note: directive argument in the range [0, 255]
      drivers/block/DAC960.c:1681:3: note: 'sprintf' output between 10 and 14 bytes into a destination of size 12
      
      Both of these seem appropriately sized, and using snprintf()
      instead of sprintf() improves this by ensuring that even
      incorrect data won't cause undefined behavior here.
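      The bounded-write behaviour can be shown with the sizes from the warning
      above (a 12-byte destination, output of 10 to 14 bytes). The format string
      and argument names below are assumptions chosen to reproduce that size
      range, not code copied from DAC960.c.

```c
#include <stdio.h>

/* Destination size taken from the gcc note above. */
#define FW_VERSION_LEN 12

/* Illustrative formatter: with byte-sized arguments the full string is
 * 9 to 13 characters (10 to 14 bytes with the NUL), so it can overflow
 * a 12-byte buffer when every field has three digits. */
static int format_fw_version(char buf[FW_VERSION_LEN],
			     unsigned char major, unsigned char minor,
			     char type, unsigned char turn)
{
	/* snprintf() writes at most FW_VERSION_LEN bytes including the NUL,
	 * truncating if needed, and returns the length the untruncated
	 * string would have had, so callers can detect truncation. */
	return snprintf(buf, FW_VERSION_LEN, "%d.%02d-%c-%02d",
			major, minor, type, turn);
}
```

      A return value of FW_VERSION_LEN or more means the output was truncated;
      with sprintf() the same input would have written past the buffer.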
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: use standard blktrace API to output cgroup info for debug notes · 35fe6d76
      Shaohua Li authored
      Currently cfq, bfq and blk-throttle each output cgroup info in trace
      notes in their own way. Now that there is a standard blktrace API for
      this, convert them to use it.
      
      Note that this changes the behavior a little. cgroup info isn't output
      by default; it is only output with the 'blk_cgroup' option enabled.
      Nor is cgroup info output as a string by default; that only happens
      with the 'blk_cgname' option enabled. Also, cgroup info appears at a
      different position in the note string. I think these behavior
      changes aren't a big issue (the trace data actually gets shorter,
      which is good), since the blktrace note is solely for debugging.
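      How the two options combine can be sketched as a small formatter. This
      assumes, as the message implies, that 'blk_cgname' only matters once
      'blk_cgroup' is on. The struct, the function and the "prefix + space +
      message" layout are hypothetical; the real note layout is not reproduced.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the two trace options named above. */
struct trace_opts {
	bool blk_cgroup; /* include cgroup info at all */
	bool blk_cgname; /* print it as a path rather than a numeric id */
};

/* Build a debug note, adding cgroup info only when enabled. Returns the
 * snprintf() result (length of the full, untruncated note). */
static int format_note(char *buf, size_t len, const struct trace_opts *o,
		       unsigned long long cgid, const char *cgpath,
		       const char *msg)
{
	if (!o->blk_cgroup)
		return snprintf(buf, len, "%s", msg);          /* default */
	if (o->blk_cgname)
		return snprintf(buf, len, "%s %s", cgpath, msg); /* path */
	return snprintf(buf, len, "%llu %s", cgid, msg);     /* numeric id */
}
```

      Leaving both options off yields the shortest notes, which is the
      trade-off the message accepts for debug-only output.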
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: add an option to allow displaying cgroup path · 69fd5c39
      Shaohua Li authored
      By default, blktrace outputs the cgroup id. This adds an option to
      display the cgroup path instead. Since getting the cgroup path is a
      relatively heavy operation, it isn't enabled by default.
      
      With the option enabled, blktrace outputs something like this:
      dd-1353  [007] d..2   293.015252:   8,0   /test/level  D   R 24 + 8 [dd]
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>