1. 17 Jan, 2013 7 commits
  2. 28 Dec, 2012 4 commits
    • libceph: fix protocol feature mismatch failure path · 0fa6ebc6
      Sage Weil authored
      We should not set con->state to CLOSED here; that happens in
      ceph_fault() in the caller, where it first asserts that the state
      is not yet CLOSED.  Avoids a BUG when the features don't match.
      
      Since fail_protocol() has become a trivial wrapper, replace
      calls to it with direct calls to reset_connection().
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
    • libceph: WARN, don't BUG on unexpected connection states · 122070a2
      Alex Elder authored
      A number of assertions in the ceph messenger are implemented with
      BUG_ON(), killing the system if a connection's state doesn't match
      what's expected.  At this point our state model is (evidently) not
      well understood enough for these assertions to trigger a BUG().
      Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...)
      so we learn about these issues without killing the machine.
      
      We now recognize that a connection fault can occur due to a socket
      closure at any time, regardless of the state of the connection.  So
      there is really nothing we can assert about the state of the
      connection at that point, so eliminate that assertion.
      Reported-by: Ugis <ugis22@gmail.com>
      Tested-by: Ugis <ugis22@gmail.com>
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
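The difference between the two assertion styles can be illustrated in user space. The following is a minimal sketch, not the kernel's implementation: `WARN_ON_SKETCH`, `sketch_warn_on`, `check_state` and the `SKETCH_CON_*` states are hypothetical names invented for this example. A BUG_ON()-style check would stop the machine at the mismatch, while the WARN_ON()-style check records it and lets execution continue.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical connection states, for illustration only. */
enum sketch_con_state { SKETCH_CON_CLOSED, SKETCH_CON_OPEN };

static int warnings; /* number of warnings reported so far */

/* Report a failed expectation and return whether it failed. */
static int sketch_warn_on(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        warnings++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}

/* User-space stand-in for a WARN_ON()-style check: logs, never aborts. */
#define WARN_ON_SKETCH(cond) sketch_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Returns 1 if the state was unexpected, 0 otherwise; the caller carries on
 * either way, so the mismatch is learned about without halting. */
static int check_state(enum sketch_con_state actual, enum sketch_con_state expected)
{
    return WARN_ON_SKETCH(actual != expected);
}
```

Calling `check_state(SKETCH_CON_CLOSED, SKETCH_CON_OPEN)` prints one warning to stderr and returns 1, and the caller keeps running.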
    • libceph: always reset osds when kicking · e6d50f67
      Alex Elder authored
      When ceph_osdc_handle_map() is called to process a new osd map,
      kick_requests() is called to ensure all affected requests are
      updated if necessary to reflect changes in the osd map.  This
      happens in two cases:  whenever an incremental map update is
      processed; and when a full map update (or the last one if there is
      more than one) gets processed.
      
      In the former case, the kick_requests() call is followed immediately
      by a call to reset_changed_osds() to ensure any connections to osds
      affected by the map change are reset.  But for full map updates
      this isn't done.
      
      Both cases should be doing this osd reset.
      
      Rather than duplicating the reset_changed_osds() call, move it into
      the end of kick_requests().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: move linger requests sooner in kick_requests() · ab60b16d
      Alex Elder authored
      The kick_requests() function is called by ceph_osdc_handle_map()
      when an osd map change has been indicated.  Its purpose is to
      re-queue any request whose target osd is different from what it
      was when it was originally sent.
      
      It is structured as two loops, one for incomplete but registered
      requests, and a second for handling completed linger requests.
      As a special case, in the first loop if a request marked to linger
      has not yet completed, it is moved from the request list to the
      linger list.  This is a quick and dirty way to have the second
      loop handle sending the request along with all the other linger
      requests.
      
      Because of the way it's done now, however, this quick and dirty
      solution can result in these incomplete linger requests never
      getting re-sent as desired.  The problem lies in the fact that
      the second loop only arranges for a linger request to be sent
      if it appears its target osd has changed.  This is the proper
      handling for *completed* linger requests (it avoids issuing
      the same linger request twice to the same osd).
      
      But although the linger requests added to the list in the first loop
      may have been sent, they have not yet completed, so they need to be
      re-sent regardless of whether their target osd has changed.
      
      The first required fix is to avoid calling __map_request()
      on any incomplete linger request.  Otherwise the subsequent
      __map_request() call in the second loop will find the target osd
      has not changed and will therefore not re-send the request.
      
      Second, we need to be sure that a sent but incomplete linger request
      gets re-sent.  If the target osd is the same with the new osd map as
      it was when the request was originally sent, this won't happen.
      This can be fixed through careful handling when we move these
      requests from the request list to the linger list, by unregistering
      the request *before* it is registered as a linger request.  This
      works because a side-effect of unregistering the request is to make
      the request's r_osd pointer be NULL, and *that* will ensure the
      second loop actually re-sends the linger request.
      
      Processing of such a request is done at that point, so continue with
      the next one once it's been moved.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
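The two-loop behaviour described in this commit can be modelled with a heavily simplified sketch. Everything here is an assumption made for illustration: the `sketch_req` structure, the use of `NO_OSD` in place of a NULL r_osd pointer, and a single `target` osd shared by all requests. The kernel code uses real lists and per-request mapping. The point shown is that unregistering an incomplete linger request before moving it clears its osd, so the second loop sees a change and re-sends it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OSD (-1) /* stands in for r_osd == NULL */

/* Hypothetical, simplified request; not the kernel's ceph_osd_request. */
struct sketch_req {
    int osd;             /* osd the request was last sent to */
    bool wants_linger;   /* request is marked to linger */
    bool registered;     /* on the (incomplete) request list */
    bool on_linger_list; /* on the linger list */
    int sends;           /* how many times kick_requests() re-sent it */
};

/* Returns true only when the target osd differs, i.e. a re-send is needed. */
static bool map_request(struct sketch_req *r, int target)
{
    if (r->osd == target)
        return false;
    r->osd = target;
    return true;
}

/* Side effect relied on by the fix: the osd association is cleared. */
static void unregister_request(struct sketch_req *r)
{
    r->registered = false;
    r->osd = NO_OSD;
}

static void register_linger_request(struct sketch_req *r)
{
    r->on_linger_list = true;
}

static void kick_requests(struct sketch_req *reqs, size_t n, int target)
{
    size_t i;

    /* First loop: incomplete but registered requests. */
    for (i = 0; i < n; i++) {
        struct sketch_req *r = &reqs[i];

        if (!r->registered)
            continue;
        if (r->wants_linger) {
            /* Do not map it here; unregister first, then move it. */
            unregister_request(r);
            register_linger_request(r);
            continue;
        }
        if (map_request(r, target))
            r->sends++;
    }

    /* Second loop: linger requests are sent only if their osd changed. */
    for (i = 0; i < n; i++) {
        struct sketch_req *r = &reqs[i];

        if (r->on_linger_list && map_request(r, target))
            r->sends++;
    }
}
```

In this model an incomplete linger request whose target osd is unchanged is still re-sent once, while a completed linger request with an unchanged target is not sent again.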
  3. 20 Dec, 2012 6 commits
  4. 17 Dec, 2012 7 commits
    • libceph: socket can close in any connection state · 7bb21d68
      Alex Elder authored
      A connection's socket can close for any reason, independent of the
      state of the connection (and irrespective of the connection
      mutex).  As a result, the connection can be in pretty much any state
      at the time its socket is closed.
      
      Handle those other cases at the top of con_work().  Pull this whole
      block of code into a separate function to reduce the clutter.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • rbd: don't use ENOTSUPP · b8f5c6ed
      Alex Elder authored
      ENOTSUPP is not a standard errno (it shows up as "Unknown error 524"
      in an error message).  This is what was getting produced when
      the local rbd code does not implement features required by a
      discovered rbd image.
      
      Change the error code returned in this case to ENXIO.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
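The user-visible effect can be reproduced from user space with strerror(). This is a small sketch under one assumption: the value 524 is taken from the "Unknown error 524" text quoted above, and `SKETCH_ENOTSUPP` and `describe_errno` are names made up for the example. The exact wording returned for an unknown code depends on the C library.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Assumed numeric value, taken from the message quoted in the commit text. */
#define SKETCH_ENOTSUPP 524

/* Text a user-space tool would print for a given error code. */
static const char *describe_errno(int err)
{
    return strerror(err);
}
```

`describe_errno(ENXIO)` yields the C library's regular description for that standard code, whereas `describe_errno(SKETCH_ENOTSUPP)` falls back to a generic "unknown" style message.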
    • rbd: remove linger unconditionally · 61c74035
      Alex Elder authored
      In __unregister_linger_request(), the request is being removed
      from the osd client's req_linger list only when the request
      has a non-null osd pointer.  It should be done whether or not
      the request currently has an osd.
      
      This is most likely a non-issue because I believe the request
      will always have an osd when this function is called.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • rbd: get rid of RBD_MAX_SEG_NAME_LEN · 2fd82b9e
      Alex Elder authored
      RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object
      name (i.e., one of the objects providing storage backing an rbd
      image).
      
      Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to
      define the maximum length of any object name in an osd request.
      
      Right now they disagree, with RBD_MAX_SEG_NAME_LEN being too big.
      
      There's no real benefit at this point to defining the rbd object
      name length limit separate from any other object name, so just
      get rid of RBD_MAX_SEG_NAME_LEN and use MAX_OBJ_NAME_SIZE in its
      place.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: avoid using freed osd in __kick_osd_requests() · 685a7555
      Alex Elder authored
      If an osd has no requests and no linger requests, __reset_osd()
      will just remove it with a call to __remove_osd().  That drops
      a reference to the osd, and therefore the osd may have been freed
      by the time __reset_osd() returns.  That function offers no
      indication this may have occurred, and as a result the osd will
      continue to be used even when it's no longer valid.
      
      Change __reset_osd() so it returns an error (ENODEV) when it
      deletes the osd being reset.  And change __kick_osd_requests() so it
      returns immediately (before referencing osd again) if __reset_osd()
      returns *any* error.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
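The pattern described here (signal to the caller that the object was deleted, so it is not touched again) might look like the sketch below. The `sketch_osd` structure, its counters, and the integer return value of `kick_osd_requests` are assumptions made so the example can be run in user space. Only the -ENODEV convention comes from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified osd; not the kernel's struct ceph_osd. */
struct sketch_osd {
    int nr_requests; /* outstanding requests */
    int nr_linger;   /* outstanding linger requests */
    int resets;      /* times the osd connection was reset */
};

static int osds_freed; /* lets callers observe that a free happened */

static void remove_osd(struct sketch_osd *osd)
{
    osds_freed++;
    free(osd);
}

/* Returns -ENODEV when the osd was removed, so the caller must not use it. */
static int reset_osd(struct sketch_osd *osd)
{
    if (osd->nr_requests == 0 && osd->nr_linger == 0) {
        remove_osd(osd);
        return -ENODEV;
    }
    osd->resets++;
    return 0;
}

/* Returns a negative errno if the osd is gone, else the number of requests
 * that would be re-queued (an invented return value for this sketch). */
static int kick_osd_requests(struct sketch_osd *osd)
{
    int err = reset_osd(osd);

    if (err)
        return err; /* osd may already be freed: do not dereference it */
    return osd->nr_requests + osd->nr_linger;
}
```

Returning on any error, not only -ENODEV, keeps the caller safe even if later changes add other failure modes to the reset path.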
    • ceph: don't reference req after put · 7d5f2481
      Alex Elder authored
      In __unregister_request(), there is a call to list_del_init()
      referencing a request that was the subject of a call to
      ceph_osdc_put_request() on the previous line.  This is not
      safe, because the request structure could have been freed
      by the time we reach the list_del_init().
      
      Fix this by reversing the order of these lines.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
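The ordering issue is a general one for reference-counted objects that also sit on a list. A minimal user-space sketch follows, using hypothetical stand-ins (`sketch_request`, `request_put`, and a hand-rolled circular list) for the kernel structures and helpers: the object is unlinked first, and the reference is dropped last.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct list_node {
    struct list_node *prev, *next;
};

/* Hypothetical, simplified request with a reference count. */
struct sketch_request {
    int refcount;
    struct list_node item;
};

static int requests_freed; /* lets callers observe that a free happened */

static void list_init(struct list_node *n)
{
    n->prev = n->next = n;
}

static void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

static struct sketch_request *request_alloc(void)
{
    struct sketch_request *req = calloc(1, sizeof(*req));

    if (!req)
        return NULL;
    req->refcount = 1;
    list_init(&req->item);
    return req;
}

/* Drops one reference; frees the request when the last one goes away. */
static void request_put(struct sketch_request *req)
{
    if (--req->refcount == 0) {
        requests_freed++;
        free(req);
    }
}

/* Safe order: touch the request (unlink it) before the put that may free it. */
static void unregister_request(struct sketch_request *req)
{
    list_del_init(&req->item);
    request_put(req);
}
```

With the two calls in `unregister_request()` swapped, `list_del_init()` would read and write memory that `request_put()` may already have freed.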
    • rbd: do not allow remove of mounted-on image · 42382b70
      Alex Elder authored
      There is no check in rbd_remove() to see if anybody holds open the
      image being removed.  That's not cool.
      
      Add a simple open count that goes up and down with opens and closes
      (releases) of the device, and don't allow an rbd image to be removed
      if the count is non-zero.
      
      Protect the updates of the open count value with ctl_mutex to ensure
      the underlying rbd device doesn't get removed while concurrently
      being opened.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
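An open count guarded by a single mutex can be sketched in user space as below. This is an illustration under stated assumptions, not the rbd driver code: `sketch_rbd_dev`, the `removed` flag, and the -EBUSY and -ENOENT return values are choices made for this example, and a pthread mutex stands in for ctl_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified device; not the kernel's struct rbd_device. */
struct sketch_rbd_dev {
    unsigned long open_count; /* goes up on open, down on release */
    bool removed;             /* set once the device has been removed */
};

/* Stand-in for ctl_mutex: serializes open, release and remove. */
static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER;

static int dev_open(struct sketch_rbd_dev *dev)
{
    int ret = 0;

    pthread_mutex_lock(&ctl_mutex);
    if (dev->removed)
        ret = -ENOENT; /* assumed error code for this sketch */
    else
        dev->open_count++;
    pthread_mutex_unlock(&ctl_mutex);
    return ret;
}

static void dev_release(struct sketch_rbd_dev *dev)
{
    pthread_mutex_lock(&ctl_mutex);
    if (dev->open_count > 0)
        dev->open_count--;
    pthread_mutex_unlock(&ctl_mutex);
}

/* Refuses removal while anyone holds the device open. */
static int dev_remove(struct sketch_rbd_dev *dev)
{
    int ret = 0;

    pthread_mutex_lock(&ctl_mutex);
    if (dev->open_count > 0)
        ret = -EBUSY; /* assumed error code for this sketch */
    else
        dev->removed = true;
    pthread_mutex_unlock(&ctl_mutex);
    return ret;
}
```

Because the count check and the removal happen under the same lock as the increment in `dev_open()`, an open cannot slip in between the check and the removal.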
  5. 13 Dec, 2012 9 commits
  6. 05 Nov, 2012 1 commit
    • ceph: Fix i_size update race · 22cddde1
      Sage Weil authored
      ceph_aio_write() has an optimization that marks cap CEPH_CAP_FILE_WR
      dirty before data is copied to page cache and inode size is updated.
      If ceph_check_caps() flushes the dirty cap before the inode size is
      updated, MDS can miss the new inode size. The fix is to move
      ceph_{get,put}_cap_refs() into ceph_write_{begin,end}() and call
      __ceph_mark_dirty_caps() after inode size is updated.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
  7. 04 Nov, 2012 1 commit
  8. 01 Nov, 2012 5 commits
    • rbd: get additional info in parent spec · 9e15b77d
      Alex Elder authored
      When a layered rbd image has a parent, that parent is identified
      only by its pool id, image id, and snapshot id.  Images that have
      been mapped also record *names* for those three id's.
      
      Add code to look up these names for parent images so they match
      mapped images more closely.  Skip doing this for an image if it
      already has its pool name defined (this will be the case for images
      mapped by the user).
      
      It is possible that the name of a parent image can't be
      determined, even if the image id is valid.  If this occurs it
      does not preclude correct operation, so don't treat this as
      an error.
      
      On the other hand, defined pools will always have both an id and a
      name.  And any snapshot of an image identified as a parent for a
      clone image will exist, and will have a name (if not it indicates
      some other internal error).  So treat failure to get these bits
      of information as errors.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: define ceph_pg_pool_name_by_id() · 72afc71f
      Alex Elder authored
      Define and export function ceph_pg_pool_name_by_id() to supply
      the name of a pg pool whose id is given.  This will be used by
      the next patch.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
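A lookup of this shape (id in, name out, NULL when the id is unknown) can be sketched as follows. The `sketch_pool` table, the linear scan, and the function name `pool_name_by_id` are assumptions for illustration only. The commit text does not describe how the kernel function stores or searches pools.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pool table entry; not the kernel's pg pool structure. */
struct sketch_pool {
    long long id;
    const char *name;
};

/* Returns the name of the pool with the given id, or NULL if none matches. */
static const char *pool_name_by_id(const struct sketch_pool *pools,
                                   size_t count, long long id)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (pools[i].id == id)
            return pools[i].name;
    }
    return NULL;
}
```

Returning NULL for an unknown id leaves it to the caller to decide whether a missing name is an error.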
    • rbd: get parent spec for version 2 images · 86b00e0d
      Alex Elder authored
      Add support for getting the information identifying the parent
      image for rbd images that have them.  The child image holds a
      reference to its parent image specification structure.  Create a new
      entry "parent" in /sys/bus/rbd/image/N/ to report the identifying
      information for the parent image, if any.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: allow null image name · a92ffdf8
      Alex Elder authored
      Format 2 parent images are partially identified by their image id,
      but it may not be possible to determine their image name.  The name
      is not strictly needed for correct operation, so we won't be
      treating it as an error if we don't know it.  Handle this case
      gracefully in rbd_name_show().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: allow null image name · 2c0d0a10
      Alex Elder authored
      We will know the image id for format 2 parent images, but won't
      initially know its image name.  Avoid making the query for an image
      id in rbd_dev_image_id() if it's already known.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>