1. 09 Feb, 2015 8 commits
    • dm table: train hybrid target type detection to select blk-mq if appropriate · 65803c20
      Mike Snitzer authored
      Otherwise replacing the multipath target with the error target fails:
        device-mapper: ioctl: can't change device type after initial table load.
      
      The error target was mistakenly considered to be target type
      DM_TYPE_REQUEST_BASED rather than DM_TYPE_MQ_REQUEST_BASED even if the
      target it was to replace was of type DM_TYPE_MQ_REQUEST_BASED.
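
      A sketch of the shape of the fix in dm_table_set_type() (reconstructed
      from the description above; the exact hunk is an assumption):

        if (hybrid && !bio_based && !request_based) {
                /* hybrid targets (e.g. "error") inherit the live device's type */
                unsigned live_md_type = dm_get_md_type(t->md);

                if (live_md_type == DM_TYPE_REQUEST_BASED ||
                    live_md_type == DM_TYPE_MQ_REQUEST_BASED)
                        request_based = 1;
                else
                        bio_based = 1;
        }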
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: allocate requests in target when stacking on blk-mq devices · e5863d9a
      Mike Snitzer authored
      For blk-mq request-based DM, the responsibility of allocating a cloned
      request is transferred from DM core to the target type.  Doing so
      enables the cloned request to be allocated from the appropriate
      blk-mq request_queue's pool (only the DM target, e.g. multipath, can
      know which block device to send a given cloned request to).
      
      Care was taken to preserve compatibility with old-style block request
      completion, which requires that request-based DM _not_ acquire the clone
      request's queue lock in the completion path.  As such, there are now 2
      different request-based DM target_type interfaces:
      1) the original .map_rq() interface will continue to be used for
         non-blk-mq devices -- the preallocated clone request is passed in
         from DM core.
      2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
         blk-mq devices -- blk_get_request() and blk_put_request() are used
         from these hooks, respectively (sketched below).
      
      dm_table_set_type() was updated to detect if the request-based target is
      being stacked on blk-mq devices; if so, DM_TYPE_MQ_REQUEST_BASED is set.
      DM core disallows switching the DM table's type after it is set.  This
      means that there is no mixing of non-blk-mq and blk-mq devices within
      the same request-based DM table.
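
      As a rough illustration of the second interface (a sketch only: the hook
      names come from the text above, but the exact signatures and the
      choose_path() helper are assumptions), a blk-mq-capable target might do:

        static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
                                           union map_info *map_context,
                                           struct request **clone)
        {
                struct block_device *bdev = choose_path(ti);    /* hypothetical */

                /* allocate the clone from the underlying blk-mq queue's pool */
                *clone = blk_get_request(bdev_get_queue(bdev), rq_data_dir(rq),
                                         GFP_ATOMIC);
                if (IS_ERR(*clone))
                        return PTR_ERR(*clone);
                return DM_MAPIO_REMAPPED;
        }

        static void multipath_release_clone(struct request *clone)
        {
                blk_put_request(clone);
        }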
      
      [This patch was started by Keith and later heavily modified by Mike]
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: prepare for allocating blk-mq clone requests in target · 466d89a6
      Keith Busch authored
      For blk-mq request-based DM, the responsibility of allocating a cloned
      request will be transferred from DM core to the target type.
      
      To prepare for conditionally using this new model, the original
      request's 'special' now points to the dm_rq_target_io, because the
      clone is allocated later in the block layer rather than in DM core.
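
      A minimal sketch of the idea (the prep_tio() helper and the exact
      function body are assumptions; 'special' is the request field named
      above): with no clone existing at prep time, the per-request state is
      hung off the original request instead:

        static int dm_prep_fn(struct request_queue *q, struct request *rq)
        {
                struct dm_rq_target_io *tio = prep_tio(q, rq);  /* hypothetical */

                if (!tio)
                        return BLKPREP_DEFER;

                /* the clone doesn't exist yet, so stash the tio itself */
                rq->special = tio;
                rq->cmd_flags |= REQ_DONTPREP;
                return BLKPREP_OK;
        }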
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: submit stacked requests in irq enabled context · 2eb6e1e3
      Keith Busch authored
      Switch to having request-based DM enqueue all prepped requests into work
      processed by another thread.  This allows request-based DM to invoke
      block APIs that assume interrupt enabled context (e.g. blk_get_request)
      and is a prerequisite for adding blk-mq support to request-based DM.
      
      The new kernel thread is only initialized for request-based DM devices.
      
      multipath_map() is now always called in irq-enabled context, so the
      multipath spinlock (m->lock) locking is changed to always disable interrupts.
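
      A sketch of the two pieces (identifiers like md, tio and map_tio_request
      follow the commit's naming but are assumptions; the 3.19-era kthread
      worker API is assumed):

        /* one worker kthread per request-based DM device */
        init_kthread_worker(&md->kworker);
        md->kworker_task = kthread_run(kthread_worker_fn, &md->kworker,
                                       "kdmwork-%s", dm_device_name(md));

        /* hand each prepped request to the worker for mapping */
        init_kthread_work(&tio->work, map_tio_request);
        queue_kthread_work(&md->kworker, &tio->work);

      and in multipath, since the caller may now have interrupts enabled:

        unsigned long flags;

        spin_lock_irqsave(&m->lock, flags);
        /* ... path selection ... */
        spin_unlock_irqrestore(&m->lock, flags);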
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: split request structure out from dm_rq_target_io structure · 1ae49ea2
      Mike Snitzer authored
      Request-based DM support for blk-mq devices requires that
      dm_rq_target_io structures not be allocated with an embedded request
      structure.  The request-based DM target (e.g. dm-multipath) must
      allocate the request from the blk-mq devices' request_queue using
      blk_get_request().
      
      The unfortunate side-effect of this change is that old-style request-based
      DM support will no longer use contiguous memory for the dm_rq_target_io and
      request structures for each clone.
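
      Schematically (the surrounding fields are elided and the layout is an
      approximation, not the verbatim definition):

        /* before: clone embedded, one contiguous allocation per request */
        struct dm_rq_target_io {
                struct mapped_device *md;
                struct request *orig;
                struct request clone;
                /* ... */
        };

        /* after: only a pointer; for blk-mq the clone comes from
         * blk_get_request() on the underlying device's request_queue */
        struct dm_rq_target_io {
                struct mapped_device *md;
                struct request *orig;
                struct request *clone;
                /* ... */
        };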
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove exports for request-based interfaces without external callers · dbf9782c
      Mike Snitzer authored
      Remove exports for dm_dispatch_request, dm_requeue_unmapped_request,
      and dm_kill_unmapped_request.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix multipath regression due to initializing wrong request · db507b3f
      Mike Snitzer authored
      Commit febf7158 ("block: require blk_rq_prep_clone() be given an
      initialized clone request") introduced a regression by calling
      blk_rq_init() on the original request rather than the clone
      request that is passed to setup_clone().
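
      In other words, the fix is essentially a one-line change of which request
      gets initialized (a sketch; the surrounding code is elided):

        -       blk_rq_init(NULL, rq);          /* buggy: re-inits the original */
        +       blk_rq_init(NULL, clone);       /* fixed: initialize the clone */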
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Fixes: febf7158 ("block: require blk_rq_prep_clone() be given an initialized clone request")
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • cfq-iosched: handle failure of cfq group allocation · 69abaffe
      Konstantin Khlebnikov authored
      cfq_lookup_create_cfqg() allocates struct blkcg_gq using GFP_ATOMIC.
      In cfq_find_alloc_queue() the possible allocation failure is not handled.
      As a result, the kernel oopses on a NULL pointer dereference when
      cfq_link_cfqq_cfqg() calls cfqg_get() on a NULL pointer.
      
      The bug was introduced in v3.5 by commit cd1604fa ("blkcg: factor
      out blkio_group creation"). Prior to that commit, cfq group lookup
      returned a pointer to the root group as a fallback.
      
      This patch handles the error using the existing fallback oom_cfqq.
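
      The shape of the fix in cfq_find_alloc_queue() is roughly (a sketch, not
      the verbatim hunk):

        cfqg = cfq_lookup_create_cfqg(cfqd, bio_blkcg(bio));
        if (!cfqg) {
                /* allocation failed; fall back to the statically
                 * allocated out-of-memory queue */
                cfqq = &cfqd->oom_cfqq;
                goto out;
        }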
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Fixes: cd1604fa ("blkcg: factor out blkio_group creation")
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 05 Feb, 2015 8 commits
  3. 28 Jan, 2015 4 commits
  4. 23 Jan, 2015 2 commits
    • blk-mq: add tag allocation policy · 24391c0d
      Shaohua Li authored
      This is the blk-mq part of tag allocation policy support. The default
      allocation policy isn't changed (though it's not a strict FIFO). The new
      policy is round-robin, for libata. But it's a best-effort implementation:
      if multiple tasks are competing, the tags returned will be mixed (which is
      unavoidable even with !mq, as requests from different tasks can be mixed
      in the queue).
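
      Conceptually, round-robin just means resuming the search after the last
      tag handed out instead of always starting at zero. A free-standing
      illustration (hypothetical code, not the kernel implementation, and
      ignoring the atomicity the real tag map needs):

        static int pick_tag(unsigned long *map, unsigned int depth,
                            unsigned int *last_tag, bool round_robin)
        {
                unsigned int start = round_robin ? *last_tag : 0;
                unsigned int tag;

                tag = find_next_zero_bit(map, depth, start);
                if (tag >= depth && start)
                        tag = find_next_zero_bit(map, depth, 0); /* wrap once */
                if (tag >= depth)
                        return -1;              /* no free tags */

                set_bit(tag, map);
                *last_tag = (tag + 1) % depth;
                return tag;
        }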
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: support different tag allocation policy · ee1b6f7a
      Shaohua Li authored
      The libata tag allocation is using a round-robin policy. The next patch
      will make libata use the block layer's generic tag allocation, so let's
      add a notion of policy to tag allocation.
      
      There are currently two policies: FIFO (the default) and round-robin, as
      sketched below.
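
      The two policies end up as a small enum in the block layer (the names
      below are reconstructed from memory of the patch; treat them as an
      assumption):

        enum {
                BLK_TAG_ALLOC_FIFO,     /* always search from tag 0 */
                BLK_TAG_ALLOC_RR,       /* search from the last allocated tag */
        };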
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 22 Jan, 2015 1 commit
  6. 21 Jan, 2015 2 commits
    • block: Add discard flag to blkdev_issue_zeroout() function · d93ba7a5
      Martin K. Petersen authored
      blkdev_issue_zeroout() will zero a given block range. This is done by
      way of explicit writing, thus provisioning or allocating the blocks on
      disk.
      
      There are use cases where the desired behavior is to zero the blocks but
      unprovision them if possible. The blocks must deterministically contain
      zeroes when they are subsequently read back.
      
      This patch adds a flag to blkdev_issue_zeroout() that provides this
      variant. If the discard flag is set and a block device guarantees
      discard_zeroes_data we will use REQ_DISCARD to clear the block range. If
      the device does not support discard_zeroes_data, or if the discard
      request fails, we will fall back first to REQ_WRITE_SAME and then to a
      regular REQ_WRITE.
      
      Also update the callers of blkdev_issue_zeroout() to reflect the new flag
      and make sb_issue_zeroout() prefer the discard approach.
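
      The resulting convention looks roughly like this (the exact prototype is
      reconstructed from the description and should be treated as an
      assumption):

        int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
                                 sector_t nr_sects, gfp_t gfp_mask, bool discard);

        /* zero a range, allowing the device to unprovision the blocks if it
         * guarantees they will read back as zeroes */
        ret = blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL, true);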
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • cfq-iosched: fix incorrect filing of rt async cfqq · c6ce1943
      Jeff Moyer authored
      If you can manage to submit an async write as the first async I/O from
      the context of a process with realtime scheduling priority, then a
      cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
      ends up in the best-effort array, but actually has realtime I/O
      scheduling priority set in cfqq->ioprio.
      
      The reason is that cfq_get_queue assumes the default scheduling class and
      priority when there is no information present (i.e. when the async cfqq
      is created):
      
      static struct cfq_queue *
      cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
      	      struct bio *bio, gfp_t gfp_mask)
      {
      	const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
      	const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
      
      cic->ioprio starts out as 0, which is "invalid".  So, class of 0
      (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:
      
      		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
      
      static struct cfq_queue **
      cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
      {
              switch (ioprio_class) {
              case IOPRIO_CLASS_RT:
                      return &cfqd->async_cfqq[0][ioprio];
              case IOPRIO_CLASS_NONE:
                      ioprio = IOPRIO_NORM;
                      /* fall through */
              case IOPRIO_CLASS_BE:
                      return &cfqd->async_cfqq[1][ioprio];
              case IOPRIO_CLASS_IDLE:
                      return &cfqd->async_idle_cfqq;
              default:
                      BUG();
              }
      }
      
      Here, instead of returning a class mapped from the process' scheduling
      priority, we get back the bucket associated with IOPRIO_CLASS_BE.
      
      Now, there is no queue allocated there yet, so we create it:
      
      		cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
      
      That function ends up doing this:
      
      			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
      			cfq_init_prio_data(cfqq, cic);
      
      cfq_init_cfqq() marks the priority as having changed.  Then,
      cfq_init_prio_data() does this:
      
      	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
      	switch (ioprio_class) {
      	default:
      		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
      	case IOPRIO_CLASS_NONE:
      		/*
      		 * no prio set, inherit CPU scheduling settings
      		 */
      		cfqq->ioprio = task_nice_ioprio(tsk);
      		cfqq->ioprio_class = task_nice_ioclass(tsk);
      		break;
      
      So we basically have two code paths that treat IOPRIO_CLASS_NONE
      differently, which results in an RT async cfqq filed into a best effort
      bucket.
      
      This patch fixes the problem, as sketched below.
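
      The shape of the fix in cfq_get_queue() (a sketch, not the verbatim
      hunk): when the cic's ioprio is invalid, derive the class and priority
      from the task's CPU scheduling settings, mirroring cfq_init_prio_data(),
      before picking the async bucket:

        if (!is_sync) {
                if (!ioprio_valid(cic->ioprio)) {
                        struct task_struct *tsk = current;

                        ioprio = task_nice_ioprio(tsk);
                        ioprio_class = task_nice_ioclass(tsk);
                }
                async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
                cfqq = *async_cfqq;
        }

      Note that the const qualifiers on ioprio and ioprio_class in the snippet
      quoted earlier would have to be dropped for this to compile.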
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  7. 14 Jan, 2015 2 commits
    • blk-mq: fix false negative out-of-tags condition · 0bf36498
      Jens Axboe authored
      The blk-mq tagging tries to maintain some locality between CPUs and
      the tags issued. The tags are split into groups of words, and the
      words may not be fully populated. When searching for a new free tag,
      blk-mq may look at partial words, hence it passes in an offset/size
      to find_next_zero_bit(). However, it did that wrong: the size must
      always be the full length of the number of tags in that word;
      otherwise we'll potentially miss some near the end.
      
      Another issue is when __bt_get() goes from one word set to the next.
      It bumps the index, but not the last_tag associated with the
      previous index. Bump that to be in the range of the new word.
      
      Finally, clean up __bt_get() and __bt_get_word() a bit and get
      rid of the goto in there, and the unnecessary 'wrap' variable.
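
      Illustrating the first bug (a sketch, not the actual diff; clipped_size
      stands in for whatever too-small bound was being passed): when starting
      the search at an offset within a word, the size argument must still
      cover the word's full depth, and on failure the search must restart from
      0 to cover the part that was skipped:

        /* buggy pattern: a size below the word's full depth means free
         * tags near the end of the word are never found */
        tag = find_next_zero_bit(&bm->word, clipped_size, last_tag);

        /* fixed pattern: search the full depth from the offset... */
        tag = find_next_zero_bit(&bm->word, bm->depth, last_tag);
        if (tag >= bm->depth && last_tag) {
                /* ...and wrap to the start to exhaust the word */
                tag = find_next_zero_bit(&bm->word, bm->depth, 0);
        }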
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Change direct_access calling convention · dd22f551
      Matthew Wilcox authored
      In order to support accesses to larger chunks of memory, pass in a
      'size' parameter (counted in bytes), and return the amount available at
      that address.
      
      Add a new helper function, bdev_direct_access(), to handle common
      functionality, including partition handling, checking that the requested
      length is positive, that the sector is page-aligned, and that the request
      does not run past the end of the partition.
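
      The resulting convention looks roughly like this (prototypes
      reconstructed from the description; treat the details as an assumption):

        /* in struct block_device_operations: returns the number of bytes
         * accessible at *addr, or a negative errno */
        long (*direct_access)(struct block_device *bdev, sector_t sector,
                              void **addr, unsigned long *pfn, long size);

        /* wrapper that validates the range before calling ->direct_access() */
        long bdev_direct_access(struct block_device *bdev, sector_t sector,
                                void **addr, unsigned long *pfn, long size);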
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 02 Jan, 2015 3 commits
  9. 31 Dec, 2014 1 commit
  10. 29 Dec, 2014 1 commit
  11. 28 Dec, 2014 4 commits
  12. 27 Dec, 2014 4 commits