1. 17 Apr, 2017 8 commits
    • nbd: only clear the queue on device teardown · 2516ab15
      Josef Bacik authored
      When running a disconnect torture test I noticed that sometimes we would
      crash with a negative ref count on our queue.  This was because we were
      ending the same request twice.  Turns out we were racing with
      NBD_CLEAR_SOCK clearing the requests as well as the teardown of the
      device clearing the requests.  So instead make the ioctl only shut down
      the sockets and make it so that we only ever run nbd_clear_que from the
      device teardown.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: multicast dead link notifications · 799f9a38
      Josef Bacik authored
      Provide a mechanism to notify userspace that there's been a link problem
      on an NBD device.  This will allow userspace to re-establish a connection
      and provide the new socket to the device without disrupting the device.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: add a reconfigure netlink command · b7aa3d39
      Josef Bacik authored
      We want to be able to reconnect dead connections to existing block
      devices, so add a reconfigure netlink command.  We will also allow users
      to change their timeout on the fly, but everything else will require a
      disconnect and reconnect.  You won't be able to add more connections
      either; you can only replace dead connections with new, more lively
      connections.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: add a basic netlink interface · e46c7287
      Josef Bacik authored
      The existing ioctl interface for configuring NBD devices is a bit
      cumbersome and hard to extend.  The other problem is we leave a
      userspace app sitting in its syscall until the device disconnects,
      which is less than ideal.
      
      This patch introduces a netlink interface for adding and disconnecting
      nbd devices.  This has the benefits of being easily extendable without
      breaking older userspace applications, and allows us to configure an nbd
      device without leaving a userspace app sitting waiting for the device to
      disconnect.
      
      With this interface we also gain the ability to configure more devices
      than are preallocated at insmod time.  We also have gained the ability
      to not specify a particular device and be provided one for us so that
      userspace doesn't need to find a free device to configure.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: stop using the bdev everywhere · 29eaadc0
      Josef Bacik authored
      In preparation for the upcoming netlink interface we need to not rely on
      already having the bdev for the NBD device we are doing operations on.
      Instead of passing the bdev around, just use it in places where we know
      we already have the bdev.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: separate out the config information · 5ea8d108
      Josef Bacik authored
      In order to properly refcount the various aspects of an NBD device we
      need to separate out the configuration elements of the nbd device.  The
      configuration of an NBD device has a different lifetime from the actual
      device, so it doesn't make sense to bundle these two concepts.  Add a
      config_refs to keep track of the configuration structure, that way we
      can be sure that we never access it when we've torn down the device.
      Add a new nbd_config structure to hold all of the transient
      configuration information.  Finally create this when we open the device
      so that it is in place when we start to configure the device.  This has
      a nice side-effect of fixing a long-standing problem where you could end
      up with a half-configured nbd device that needed to be "disconnected" in
      order to be usable again.  Now once we close our device the
      configuration will be discarded.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: handle single path failures gracefully · f3733247
      Josef Bacik authored
      Currently if we have multiple connections and one of them goes down we will tear
      down the whole device.  However, there's no reason we need to do this as we
      could have other connections that are working fine.  Deal with this by keeping
      track of the state of the different connections, and if we lose one we mark it
      as dead and send all IO destined for that socket to one of the other healthy
      sockets.  Any outstanding requests that were on the dead socket will time out and
      be re-submitted properly.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
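The failover behaviour described in this commit can be modelled in a few lines. This is only an illustrative sketch, not the driver code: the `Connection` and `Device` classes, the `dead` flag, and the "try the preferred socket first, then the next live one" selection order are all assumptions made for the example.

```python
class Connection:
    """Hypothetical stand-in for one socket attached to an NBD device."""

    def __init__(self, name):
        self.name = name
        self.dead = False   # set once the link is known to have failed
        self.sent = []      # requests handed to this connection


class Device:
    """Hypothetical device that tracks per-connection state."""

    def __init__(self, connections):
        self.connections = list(connections)

    def mark_dead(self, index):
        # Losing one connection only marks that connection; the device stays up.
        self.connections[index].dead = True

    def pick_connection(self, preferred):
        # Start at the preferred connection and fall through to the next live one.
        count = len(self.connections)
        for offset in range(count):
            conn = self.connections[(preferred + offset) % count]
            if not conn.dead:
                return conn
        return None

    def submit(self, request, preferred):
        conn = self.pick_connection(preferred)
        if conn is None:
            # Only when every connection is dead does the request fail.
            raise ConnectionError("no live connections")
        conn.sent.append(request)
        return conn.name
```

Requests that were already outstanding on the dead socket are not modelled here; the commit message says they time out and are re-submitted.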
    • nbd: put socket in error cases · 9b1355d5
      Josef Bacik authored
      When adding a new socket we look it up and then try to add it to our
      configuration.  If any of those steps fail we need to make sure we put
      the socket so we don't leak them.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 16 Apr, 2017 19 commits
  3. 14 Apr, 2017 7 commits
    • net: off by one in inet6_pton() · a88086e0
      Dan Carpenter authored
      If "scope_len" is sizeof(scope_id) then we would put the NUL terminator
      one space beyond the end of the buffer.
      
      Fixes: b1a951fe ("net/utils: generic inet_pton_with_scope helper")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
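The bounds condition behind this fix can be illustrated with a small model. This is a sketch under assumptions, not the kernel function: `copy_scope_id` and `buf_size` are hypothetical names, and a `bytearray` stands in for the fixed-size C buffer. The point is that the terminator needs its own slot, so the length check must reject a scope that is exactly as long as the buffer.

```python
def copy_scope_id(scope: bytes, buf_size: int) -> bytearray:
    """Copy `scope` into a fixed-size buffer and NUL-terminate it (hypothetical helper)."""
    # Use ">=" rather than ">": a scope of exactly buf_size bytes would leave
    # no room for the terminator, which is the off-by-one the commit describes.
    if len(scope) >= buf_size:
        raise ValueError("scope id too long for buffer")
    buf = bytearray(buf_size)
    buf[:len(scope)] = scope
    buf[len(scope)] = 0   # terminator lands at index <= buf_size - 1
    return buf
```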
    • blk-mq: introduce Kyber multiqueue I/O scheduler · 00e04393
      Omar Sandoval authored
      The Kyber I/O scheduler is an I/O scheduler for fast devices designed to
      scale to multiple queues. Users configure only two knobs, the target
      read and synchronous write latencies, and the scheduler tunes itself to
      achieve that latency goal.
      
      The implementation is based on "tokens", built on top of the scalable
      bitmap library. Tokens serve as a mechanism for limiting requests. There
      are two tiers of tokens: queueing tokens and dispatch tokens.
      
      A queueing token is required to allocate a request. In fact, these
      tokens are actually the blk-mq internal scheduler tags, but the
      scheduler manages the allocation directly in order to implement its
      policy.
      
      Dispatch tokens are device-wide and split up into two scheduling
      domains: reads vs. writes. Each hardware queue dispatches batches
      round-robin between the scheduling domains as long as tokens are
      available for that domain.
      
      These tokens can be used as the mechanism to enable various policies.
      The policy Kyber uses is inspired by active queue management techniques
      for network routing, similar to blk-wbt. The scheduler monitors
      latencies and scales the number of dispatch tokens accordingly. Queueing
      tokens are used to prevent starvation of synchronous requests by
      asynchronous requests.
      
      Various extensions are possible, including better heuristics and ionice
      support. The new scheduler isn't set as the default yet.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
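As a rough illustration of the dispatch-token idea described in this commit, the sketch below limits in-flight requests per scheduling domain and dispatches batches round-robin between domains. Everything here is an assumption for the example: the `Domain` class, the batch size, and in particular the `adjust` rule (halve on a latency miss, grow by one otherwise), which is a generic feedback heuristic and not Kyber's actual scaling policy. Queueing tokens are not modelled.

```python
from collections import deque


class Domain:
    """Hypothetical scheduling domain (e.g. reads or writes) with its own token pool."""

    def __init__(self, name, tokens, target_latency_us):
        self.name = name
        self.max_tokens = tokens
        self.in_flight = 0
        self.target_latency_us = target_latency_us
        self.queue = deque()

    def try_get_token(self):
        # A request may be dispatched only while a dispatch token is free.
        if self.in_flight < self.max_tokens:
            self.in_flight += 1
            return True
        return False

    def put_token(self):
        # Called when a dispatched request completes.
        self.in_flight -= 1

    def adjust(self, observed_latency_us, floor=1, ceiling=64):
        # Assumed heuristic: shrink the pool when latency misses the target,
        # grow it slowly when the target is met.
        if observed_latency_us > self.target_latency_us:
            self.max_tokens = max(floor, self.max_tokens // 2)
        else:
            self.max_tokens = min(ceiling, self.max_tokens + 1)


def dispatch_round_robin(domains, batch=2):
    """Dispatch up to `batch` requests per domain per pass until no domain makes progress."""
    dispatched = []
    progress = True
    while progress:
        progress = False
        for domain in domains:
            for _ in range(batch):
                if not domain.queue or not domain.try_get_token():
                    break
                dispatched.append((domain.name, domain.queue.popleft()))
                progress = True
    return dispatched
```

With three read tokens and one write token, a backlog of four reads and two writes dispatches three reads and one write; the rest wait until `put_token()` frees capacity.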
    • blk-mq-sched: make completed_request() callback more useful · c05f8525
      Omar Sandoval authored
      Currently, this callback is called right after put_request() and has no
      distinguishable purpose. Instead, let's call it before put_request() as
      soon as I/O has completed on the request, before we account it in
      blk-stat. With this, Kyber can enable stats when it sees a latency
      outlier and make sure the outlier gets accounted.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: export helpers · 5b727272
      Omar Sandoval authored
      blk_mq_finish_request() is required for schedulers that define their own
      put_request(). blk_mq_run_hw_queue() is required for schedulers that
      hold back requests to be run later.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: add shallow depth option for blk_mq_get_tag() · 229a9287
      Omar Sandoval authored
      Wire up the sbitmap_get_shallow() operation to the tag code so that a
      caller can limit the number of tags available to it.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • sbitmap: add sbitmap_get_shallow() operation · c05e6673
      Omar Sandoval authored
      This operation supports the use case of limiting the number of bits that
      can be allocated for a given operation. Rather than setting aside some
      bits at the end of the bitmap, we can set aside bits in each word of the
      bitmap. This means we can keep the allocation hints spread out and
      support sbitmap_resize() nicely at the cost of lower granularity for the
      allowed depth.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
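The "set aside bits in each word" idea can be shown with a toy multi-word bitmap. This is a simplified model, not the sbitmap implementation: the `ShallowBitmap` class, the 8-bit word size, and the linear scan are assumptions chosen to keep the example short. A shallow allocation may only use the first `shallow_depth` bits of each word, so the effective limit is `shallow_depth` times the number of words, which is the coarser granularity the commit message mentions.

```python
class ShallowBitmap:
    """Toy bitmap split into fixed-size words (hypothetical, for illustration only)."""

    def __init__(self, depth, bits_per_word=8):
        self.depth = depth
        self.bits_per_word = bits_per_word
        nwords = (depth + bits_per_word - 1) // bits_per_word
        self.words = [0] * nwords

    def _word_depth(self, index):
        # The last word may hold fewer valid bits than the others.
        return min(self.bits_per_word, self.depth - index * self.bits_per_word)

    def get_shallow(self, shallow_depth, hint=0):
        """Set and return a free bit among the first `shallow_depth` bits of a word, or -1."""
        nwords = len(self.words)
        start = (hint // self.bits_per_word) % nwords
        for i in range(nwords):
            index = (start + i) % nwords
            limit = min(self._word_depth(index), shallow_depth)
            for bit in range(limit):
                if not self.words[index] & (1 << bit):
                    self.words[index] |= 1 << bit
                    return index * self.bits_per_word + bit
        return -1

    def clear(self, bitnr):
        index, bit = divmod(bitnr, self.bits_per_word)
        self.words[index] &= ~(1 << bit)
```

A caller that passes `shallow_depth` equal to the word size gets the ordinary, unrestricted allocation from the same bitmap.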
    • remove the mg_disk driver · 84253394
      Christoph Hellwig authored
      This driver was added in 2008, but as far as I can tell we never had a
      single platform that actually registered resources for the platform driver.
      
      It's also been unmaintained for a long time and apparently has an ATA mode
      that can be driven using the IDE/libata subsystem.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  4. 11 Apr, 2017 1 commit
    • block: Fix list corruption of blk stats callback list · 3f19cd23
      Jan Kara authored
      When CFQ calls wbt_disable_default(), it will call
      blk_stat_remove_callback() to stop gathering IO statistics for the
      purposes of writeback throttling. Later, when request_queue is
      unregistered, wbt_exit() will call blk_stat_remove_callback() again
      which will try to delete callback from the list again and possibly cause
      list corruption.
      
      Fix the problem by making wbt_disable_default() call wbt_exit(), which
      is properly guarded against being called multiple times.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
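The shape of this fix, routing both teardown paths through one guarded function, can be sketched as follows. This is a toy model rather than the kernel code: the function names are borrowed from the commit message, while the `RequestQueue` class, its `rwb` field, and the use of a Python list for the callback list are assumptions for illustration.

```python
class RequestQueue:
    """Hypothetical request queue holding a stats callback list."""

    def __init__(self):
        self.stat_callbacks = []
        self.rwb = None   # writeback-throttling state; None once torn down


def wbt_init(queue):
    queue.rwb = object()
    queue.stat_callbacks.append(queue.rwb)


def wbt_exit(queue):
    # Guard: a second call finds rwb already cleared and does nothing,
    # so the callback is never removed from the list twice.
    rwb = queue.rwb
    if rwb is None:
        return
    queue.stat_callbacks.remove(rwb)
    queue.rwb = None


def wbt_disable_default(queue):
    # Delegate to the guarded teardown instead of removing the callback directly.
    wbt_exit(queue)
```

Calling `wbt_disable_default()` and later `wbt_exit()` on the same queue is then harmless, whereas two unguarded `list.remove()` calls would raise on the second one.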
  5. 10 Apr, 2017 2 commits
  6. 08 Apr, 2017 3 commits