  1. 07 Apr, 2017 1 commit
    • block: simple improvements for bio->flags · dbde775c
      NeilBrown authored
      The comment for the 'flags' field of 'bio' mentions
      "command" which is no longer stored there, and doesn't
      mention the bvec pool number, which is.
      
      BIO_RESET_BITS is set in such a way that it would need to be
      updated if new bits were added, which is easy to miss.
      
      BVEC_POOL_BITS is larger than needed.  The BVEC_POOL_IDX()
      ranges from 0 to 6, so 3 bits are sufficient.
      
      This patch makes improvements in each of these areas (a sketch of
      the resulting layout follows this entry).
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
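      A minimal standalone sketch of the layout described above (macro
      names follow the kernel's blk_types.h, but the values and the
      demo program are illustrative, not the patch itself):

        #include <stdio.h>

        /* Model of bio->bi_flags after the patch: the bvec pool index
         * (0..6) sits in the top 3 bits of a 16-bit flags word, and the
         * reset boundary is derived from the pool offset, so adding a
         * new BIO_* flag no longer requires updating BIO_RESET_BITS by
         * hand. */
        #define BVEC_POOL_BITS    3
        #define BVEC_POOL_OFFSET  (16 - BVEC_POOL_BITS)
        #define BIO_RESET_BITS    BVEC_POOL_OFFSET /* bits >= this survive a reset */
        #define BVEC_POOL_IDX(f)  ((f) >> BVEC_POOL_OFFSET)

        int main(void)
        {
                unsigned int flags = (6u << BVEC_POOL_OFFSET) | 0x5;
                unsigned int preserved = flags & (~0u << BIO_RESET_BITS);

                printf("pool=%u preserved=0x%x\n",
                       BVEC_POOL_IDX(flags), preserved);
                return 0;
        }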
  2. 05 Apr, 2017 9 commits
    • block: move timeout field in struct request to pack better · 1dd5198b
      Jens Axboe authored
      After commit 64c7f1d1, we went from 1 to 2 holes in my test
      setup.  If we move the timeout field a bit, we remove both of
      those holes and shrink struct request by 8 bytes (a toy
      illustration of such a hole follows this entry).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
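      A toy illustration of the kind of hole the commit is talking
      about (hypothetical fields, not the real struct request):

        #include <stdio.h>

        /* On a 64-bit ABI, a lone 4-byte field between two pointers
         * forces 4 bytes of padding; pairing it with another 4-byte
         * field lets the two share one 8-byte slot. */
        struct padded {
                void *a;
                unsigned int timeout;  /* 4 bytes + 4 bytes padding */
                void *b;
                unsigned int deadline; /* 4 bytes + 4 bytes padding */
        };

        struct packed {
                void *a;
                void *b;
                unsigned int timeout;  /* packs with 'deadline' */
                unsigned int deadline;
        };

        int main(void)
        {
                printf("padded=%zu packed=%zu\n", /* 32 vs 24 on x86-64 */
                       sizeof(struct padded), sizeof(struct packed));
                return 0;
        }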
    • block, scsi: move the retries field to struct scsi_request · 64c7f1d1
      Christoph Hellwig authored
      Move the retries field into struct scsi_request instead of
      bloating the generic struct request with it (the sketch after
      this entry shows the shape of the move).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
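      The shape of the move, as I read it (field names and neighbors
      are a paraphrase of the 4.11-era headers, not a verbatim copy):

        /* Before: every block request carried a SCSI-only counter. */
        struct request {
                /* ... */
                int retries;    /* read only by SCSI/BLOCK_PC users */
        };

        /* After: the counter lives with the other SCSI passthrough
         * state instead. */
        struct scsi_request {
                unsigned char *cmd;
                unsigned short cmd_len;
                unsigned int sense_len;
                int retries;    /* moved here from struct request */
                void *sense;
        };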
    • nvme: move the retries count to struct nvme_request · 44e44b29
      Christoph Hellwig authored
      The way NVMe uses this field is entirely different from the older
      SCSI/BLOCK_PC usage, so move it into struct nvme_request.
      
      Also reduce the size of the field to an unsigned char so that we
      leave space for additional smaller fields that will appear soon
      (see the sketch after this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
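      A sketch of where the counter ends up (the surrounding members
      are a simplified guess at the 4.11-era struct, not a verbatim
      copy):

        typedef unsigned char u8;  /* stand-in for <linux/types.h> */

        struct nvme_command;       /* opaque in this sketch */
        union nvme_result { unsigned long simplified; };

        struct nvme_request {
                struct nvme_command *cmd;
                union nvme_result result;
                u8 retries;     /* moved from struct request; a u8
                                 * leaves room for further small fields
                                 * to pack into the same word */
        };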
    • 83f3aeb3
    • nvme: cleanup nvme_req_needs_retry · f6324b1b
      Christoph Hellwig authored
      Don't pass the status explicitly but derive it from the request,
      and unwind the complex condition to be more readable (see the
      sketch after this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
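      One plausible shape of the unwound check (reconstructed, not the
      literal diff; blk_noretry_request() and NVME_SC_DNR are real
      kernel names, but the exact set of conditions here is a guess):

        static bool nvme_req_needs_retry(struct request *req)
        {
                if (blk_noretry_request(req))
                        return false;   /* caller opted out of retries */
                if (nvme_req(req)->status & NVME_SC_DNR)
                        return false;   /* device said do-not-retry */
                if (nvme_req(req)->retries >= nvme_max_retries)
                        return false;   /* retry budget exhausted */
                return true;
        }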
    • nvme: move ->retries setup to nvme_setup_cmd · 987f699a
      Christoph Hellwig authored
      ->retries counts the number of times a command is resubmitted,
      and should be cleared the first time we see the command.  We
      currently don't do that for non-PCIe commands, which is easily
      fixed by moving the setup to common code (see the sketch after
      this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
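      A sketch of the common-code setup (RQF_DONTPREP is the real flag
      that marks a request as already prepared; the body here is an
      approximation of the idea, not the actual function):

        int nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
                           struct nvme_command *cmd)
        {
                /* Runs only on the first pass, not on requeues:
                 * RQF_DONTPREP stays set across resubmissions, so the
                 * retry counter is cleared exactly once per command,
                 * for PCIe and fabrics transports alike. */
                if (!(req->rq_flags & RQF_DONTPREP)) {
                        nvme_req(req)->retries = 0;
                        req->rq_flags |= RQF_DONTPREP;
                }

                /* ... per-opcode command setup continues here ... */
                return 0;
        }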
    • remove the obsolete hd driver · 8e14be53
      Christoph Hellwig authored
      This driver is for pre-IDE hard disks that are only found in PCs
      from the stone age of personal computing, and which we don't
      support elsewhere in the kernel these days.
      
      It's also been marked broken forever.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: Remove blk_mq_queue_data.list · f2fbc9dd
      Bart Van Assche authored
      The block layer core sets blk_mq_queue_data.list but no block
      driver reads that member.  Hence remove it, along with the code
      that sets it (see the sketch after this entry).
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
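      The struct before and after, as I read the change (the real
      definition lives in include/linux/blk-mq.h; treat this as a
      paraphrase):

        /* Before */
        struct blk_mq_queue_data {
                struct request *rq;
                struct list_head *list; /* set by the core, read by
                                         * no driver */
                bool last;
        };

        /* After */
        struct blk_mq_queue_data {
                struct request *rq;
                bool last;
        };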
    • cfq: Disable writeback throttling by default · 142bbdfc
      Jan Kara authored
      Writeback throttling does not play well with CFQ, since CFQ also
      tries to throttle async writes.  As a result, async writeback can
      get starved in the presence of readers.  As an example, take a
      benchmark simulating a PostgreSQL database running over a
      standard rotating SATA drive.  There are 16 processes doing
      random reads from a huge file (2x machine memory), 1 process
      doing random writes to the huge file and calling fsync once per
      50000 writes, and 1 process doing sequential 8k writes to a
      relatively small file, wrapping around at the end of the file and
      calling fsync every 5 writes.  Under this load, read latency
      easily exceeds the target latency of 75 ms (just because there
      are so many reads happening against a relatively slow disk) and
      thus writeback is throttled to a point where only 1 write request
      is allowed at a time.  The blktrace data then looks like:
      
        8,0    1        0     8.347751764     0  m   N cfq workload slice:40000000
        8,0    1        0     8.347755256     0  m   N cfq293A  / set_active wl_class: 0 wl_type:0
        8,0    1        0     8.347784100     0  m   N cfq293A  / Not idling.  st->count:1
        8,0    1     3814     8.347763916  5839 UT   N [kworker/u9:2] 1
        8,0    0        0     8.347777605     0  m   N cfq293A  / Not idling.  st->count:1
        8,0    1        0     8.347784100     0  m   N cfq293A  / Not idling.  st->count:1
        8,0    3     1596     8.354364057     0  C   R 156109528 + 8 (6906954) [0]
        8,0    3        0     8.354383193     0  m   N cfq6196SN / complete rqnoidle 0
        8,0    3        0     8.354386476     0  m   N cfq schedule dispatch
        8,0    3        0     8.354399397     0  m   N cfq293A  / Not idling.  st->count:1
        8,0    3        0     8.354404705     0  m   N cfq293A  / dispatch_insert
        8,0    3        0     8.354409454     0  m   N cfq293A  / dispatched a request
        8,0    3        0     8.354412527     0  m   N cfq293A  / activate rq, drv=1
        8,0    3     1597     8.354414692     0  D   W 145961400 + 24 (6718452) [swapper/0]
        8,0    3        0     8.354484184     0  m   N cfq293A  / Not idling.  st->count:1
        8,0    3        0     8.354487536     0  m   N cfq293A  / slice expired t=0
        8,0    3        0     8.354498013     0  m   N / served: vt=5888102466265088 min_vt=5888074869387264
        8,0    3        0     8.354502692     0  m   N cfq293A  / sl_used=6737519 disp=1 charge=6737519 iops=0 sect=24
        8,0    3        0     8.354505695     0  m   N cfq293A  / del_from_rr
      ...
        8,0    0     1810     8.354728768     0  C   W 145961400 + 24 (314076) [0]
        8,0    0        0     8.354746927     0  m   N cfq293A  / complete rqnoidle 0
      ...
        8,0    1     3829     8.389886102  5839  G   W 145962968 + 24 [kworker/u9:2]
        8,0    1     3830     8.389888127  5839  P   N [kworker/u9:2]
        8,0    1     3831     8.389908102  5839  A   W 145978336 + 24 <- (8,4) 44000
        8,0    1     3832     8.389910477  5839  Q   W 145978336 + 24 [kworker/u9:2]
        8,0    1     3833     8.389914248  5839  I   W 145962968 + 24 (28146) [kworker/u9:2]
        8,0    1        0     8.389919137     0  m   N cfq293A  / insert_request
        8,0    1        0     8.389924305     0  m   N cfq293A  / add_to_rr
        8,0    1     3834     8.389933175  5839 UT   N [kworker/u9:2] 1
      ...
        8,0    0        0     9.455290997     0  m   N cfq workload slice:40000000
        8,0    0        0     9.455294769     0  m   N cfq293A  / set_active wl_class:0 wl_type:0
        8,0    0        0     9.455303499     0  m   N cfq293A  / fifo=ffff880003166090
        8,0    0        0     9.455306851     0  m   N cfq293A  / dispatch_insert
        8,0    0        0     9.455311251     0  m   N cfq293A  / dispatched a request
        8,0    0        0     9.455314324     0  m   N cfq293A  / activate rq, drv=1
        8,0    0     2043     9.455316210  6204  D   W 145962968 + 24 (1065401962) [pgioperf]
        8,0    0        0     9.455392407     0  m   N cfq293A  / Not idling.  st->count:1
        8,0    0        0     9.455395969     0  m   N cfq293A  / slice expired t=0
        8,0    0        0     9.455404210     0  m   N / served: vt=5888958194597888 min_vt=5888941810597888
        8,0    0        0     9.455410077     0  m   N cfq293A  / sl_used=4000000 disp=1 charge=4000000 iops=0 sect=24
        8,0    0        0     9.455416851     0  m   N cfq293A  / del_from_rr
      ...
        8,0    0     2045     9.455648515     0  C   W 145962968 + 24 (332305) [0]
        8,0    0        0     9.455668350     0  m   N cfq293A  / complete rqnoidle 0
      ...
        8,0    1     4371     9.455710115  5839  G   W 145978336 + 24 [kworker/u9:2]
        8,0    1     4372     9.455712350  5839  P   N [kworker/u9:2]
        8,0    1     4373     9.455730159  5839  A   W 145986616 + 24 <- (8,4) 52280
        8,0    1     4374     9.455732674  5839  Q   W 145986616 + 24 [kworker/u9:2]
        8,0    1     4375     9.455737563  5839  I   W 145978336 + 24 (27448) [kworker/u9:2]
        8,0    1        0     9.455742871     0  m   N cfq293A  / insert_request
        8,0    1        0     9.455747550     0  m   N cfq293A  / add_to_rr
        8,0    1     4376     9.455756629  5839 UT   N [kworker/u9:2] 1
      
      So we can see a Q event for a write request, then IO is blocked
      by writeback throttling, and the G and I events for the request
      happen only once other writeback IO has completed.  Thus CFQ
      always sees only one write request.  When it sees it, it queues
      the async queue behind all the read queues, and the async queue
      gets scheduled after about one second.  When it is scheduled,
      that one request gets dispatched and the async queue is expired
      as it has no more requests to submit.  Overall we submit about
      one write request per second.
      
      Although this scheduling is beneficial for read latency, writes
      are heavily starved, and this causes large delays all over the
      system (due to processes blocking on page lock, transaction
      starts, etc.).  With writeback throttling disabled, write
      throughput is about one fifth of the read throughput, which
      roughly matches the readers/writers ratio, and overall the system
      stalls are much shorter.

      Mixing writeback throttling logic with CFQ's own throttling logic
      is always a recipe for surprises, as CFQ assumes it sees the
      whole picture, which is not necessarily true when writeback
      throttling is blocking requests.  So disable writeback throttling
      by default when CFQ is used as the IO scheduler (a sketch of the
      shape of the change follows this entry).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
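      A rough sketch of the shape of the change (the hook point and the
      wbt_disable_default() helper are assumptions on my part; the
      actual patch may wire this up differently):

        /* In CFQ's queue initialization: opt out of the default
         * writeback-throttling instance, since CFQ throttles async
         * writes itself. */
        static int cfq_init_queue(struct elevator_type *e,
                                  struct request_queue *q)
        {
                /* ... existing CFQ setup ... */

                wbt_disable_default(q); /* assumed helper */
                return 0;
        }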
  3. 04 Apr, 2017 30 commits