1. 25 Jan, 2013 3 commits
  2. 17 Jan, 2013 37 commits
    • Alex Elder's avatar
      rbd: fix type of snap_id in rbd_dev_v2_snap_info() · e0b49868
      Alex Elder authored
      The type of the snap_id local variable is defined with the
      wrong byte order.  Fix that.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      e0b49868
    • Alex Elder's avatar
      rbd: assign watch request more directly · 8b84de79
      Alex Elder authored
      Both rbd_req_sync_op() and rbd_do_request() have a "linger"
      parameter, which is the address of a pointer that should refer to
      the osd request structure used to issue a request to an osd.
      
      Only one case ever supplies a non-null "linger" argument: an
      CEPH_OSD_OP_WATCH start.  And in that one case it is assigned
      &rbd_dev->watch_request.
      
      Within rbd_do_request() (where the assignment ultimately gets made)
      we know the rbd_dev and therefore its watch_request field.  We
      also know whether the op being sent is CEPH_OSD_OP_WATCH start.
      
      Stop opaquely passing down the "linger" pointer, and instead just
      assign the value directly inside rbd_do_request() when it's needed.
      
      This makes it unnecessary for rbd_req_sync_watch() to make
      arrangements to hold a value that's not available until a
      bit later.  This more clearly separates setting up a watch
      request from submitting it.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      8b84de79
    • Alex Elder's avatar
      rbd: move remaining osd op setup into rbd_osd_req_op_create() · 5efea49a
      Alex Elder authored
      The two remaining osd ops used by rbd are CEPH_OSD_OP_WATCH and
      CEPH_OSD_OP_NOTIFY_ACK.  Move the setup of those operations into
      rbd_osd_req_op_create(), and get rid of rbd_create_rw_op() and
      rbd_destroy_op().
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      5efea49a
    • Alex Elder's avatar
      rbd: move call osd op setup into rbd_osd_req_op_create() · 2647ba38
      Alex Elder authored
      Move the initialization of the CEPH_OSD_OP_CALL operation into
      rbd_osd_req_op_create().
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      2647ba38
    • Alex Elder's avatar
      rbd: don't assign extent info in rbd_req_sync_op() · 8d23bf29
      Alex Elder authored
      Move the assignment of the extent offset and length and payload
      length out of rbd_req_sync_op() and into its caller in the one spot
      where a read (and note--no write) operation might be initiated.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      8d23bf29
    • Alex Elder's avatar
      rbd: don't assign extent info in rbd_do_request() · c5611918
      Alex Elder authored
      In rbd_do_request() there's a sort of last-minute assignment of the
      extent offset and length and payload length for read and write
      operations.  Move those assignments into the caller (in those spots
      that might initiate read or write operations)
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      c5611918
    • Alex Elder's avatar
      rbd: don't leak rbd_req for rbd_req_sync_notify_ack() · 18216657
      Alex Elder authored
      When rbd_req_sync_notify_ack() calls rbd_do_request() it supplies
      rbd_simple_req_cb() as its callback function.  Because the callback
      is supplied, an rbd_req structure gets allocated and populated so it
      can be used by the callback.  However rbd_simple_req_cb() is not
      freeing (or even using) the rbd_req structure, so it's getting
      leaked.
      
      Since rbd_simple_req_cb() has no need for the rbd_req structure,
      just avoid allocating one for this case.  Of the three calls to
      rbd_do_request(), only the one from rbd_do_op() needs the rbd_req
      structure, and that call can be distinguished from the other two
      because it supplies a non-null rbd_collection pointer.
      
      So fix this leak by only allocating the rbd_req structure if a
      non-null "coll" value is provided to rbd_do_request().
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      18216657
    • Alex Elder's avatar
      rbd: don't leak rbd_req on synchronous requests · 2e53c6c3
      Alex Elder authored
      When rbd_do_request() is called it allocates and populates an
      rbd_req structure to hold information about the osd request to be
      sent.  This is done for the benefit of the callback function (in
      particular, rbd_req_cb()), which uses this in processing when
      the request completes.
      
      Synchronous requests provide no callback function, in which case
      rbd_do_request() waits for the request to complete before returning.
      This case is not handling the needed free of the rbd_req structure
      like it should, so it is getting leaked.
      
      Note however that the synchronous case has no need for the rbd_req
      structure at all.  So rather than simply freeing this structure for
      synchronous requests, just don't allocate it to begin with.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      2e53c6c3
    • Alex Elder's avatar
      rbd: combine rbd sync watch/unwatch functions · 907703d0
      Alex Elder authored
      The rbd_req_sync_watch() and rbd_req_sync_unwatch() functions are
      nearly identical.  Combine them into a single function with a flag
      indicating whether a watch is to be initiated or torn down.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      907703d0
    • Alex Elder's avatar
      rbd: use a common layout for each device · 0903e875
      Alex Elder authored
      Each osd message includes a layout structure, and for rbd it is
      always the same (at least for osd's in a given pool).
      
      Initialize a layout structure when an rbd_dev gets created and just
      copy that into osd requests for the rbd image.
      
      Replace an assertion that was done when initializing the layout
      structures with code that catches and handles anything that would
      trigger the assertion as soon as it is identified.  This precludes
      that (bad) condition from ever occurring.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      0903e875
    • Alex Elder's avatar
      rbd: don't bother calculating file mapping · 47dba7ba
      Alex Elder authored
      When rbd_do_request() has a request to process it initializes a ceph
      file layout structure and uses it to compute offsets and limits for
      the range of the request using ceph_calc_file_object_mapping().
      
      The layout used is fixed, and is based on RBD_MAX_OBJ_ORDER (30).
      It sets the layout's object size and stripe unit to be 1 GB (2^30),
      and sets the stripe count to be 1.
      
      The job of ceph_calc_file_object_mapping() is to determine which
      of a sequence of objects will contain data covered by range, and
      within that object, at what offset the range starts.  It also
      truncates the length of the range at the end of the selected object
      if necessary.
      
      This is needed for ceph fs, but for rbd it really serves no purpose.
      It does its own blocking of images into objects, echo of which is
      (1 << obj_order) in size, and as a result it ignores the "bno"
      value returned by ceph_calc_file_object_mapping().  In addition,
      by the point a request has reached this function, it is already
      destined for a single rbd object, and its length will not exceed
      that object's extent.  Because of this, and because the mapping will
      result in blocking up the range using an integer multiple of the
      image's object order, ceph_calc_file_object_mapping() will never
      change the offset or length values defined by the request.
      
      In other words, this call is a big no-op for rbd data requests.
      
      There is one exception.  We read the header object using this
      function, and in that case we will not have already limited the
      request size.  However, the header is a single object (not a file or
      rbd image), and should not be broken into pieces anyway.  So in fact
      we should *not* be calling ceph_calc_file_object_mapping() when
      operating on the header object.
      
      So...
      
      Don't call ceph_calc_file_object_mapping() in rbd_do_request(),
      because useless for image data and incorrect to do sofor the image
      header.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      47dba7ba
    • Alex Elder's avatar
      rbd: open code rbd_calc_raw_layout() · e01e7927
      Alex Elder authored
      This patch gets rid of rbd_calc_raw_layout() by simply open coding
      it in its one caller.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      e01e7927
    • Alex Elder's avatar
      rbd: pull in ceph_calc_raw_layout() · 08296618
      Alex Elder authored
      This is the first in a series of patches aimed at eliminating
      the use of ceph_calc_raw_layout() by rbd.
      
      It simply pulls in a copy of that function and renames it
      rbd_calc_raw_layout().
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      08296618
    • Alex Elder's avatar
      rbd: kill ceph_osd_req_op->flags · 2b5fc648
      Alex Elder authored
      The flags field of struct ceph_osd_req_op is never used, so just get
      rid of it.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      2b5fc648
    • Alex Elder's avatar
      rbd: assume single op in a request · 30573d68
      Alex Elder authored
      We now know that every of rbd_req_sync_op() passes an array of
      exactly one operation, as evidenced by all callers passing 1 as its
      num_op argument.  So get rid of that argument, assuming a single op.
      
      Similarly, we now know that all callers of rbd_do_request() pass 1
      as the num_op value, so that parameter can be eliminated as well.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      30573d68
    • Alex Elder's avatar
      rbd: there is really only one op · 139b4318
      Alex Elder authored
      Throughout the rbd code there are spots where it appears we can
      handle an osd request containing more than one osd request op.
      
      But that is only the way it appears.  In fact, currently only one
      operation at a time can be supported, and supporting more than
      one will require much more than fleshing out the support that's
      there now.
      
      This patch changes names to make it perfectly clear that anywhere
      we're dealing with a block of ops, we're in fact dealing with
      exactly one of them.  We'll be able to simplify some things as
      a result.
      
      When multiple op support is implemented, we can update things again
      accordingly.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      139b4318
    • Alex Elder's avatar
      libceph: pass num_op with ops · ae7ca4a3
      Alex Elder authored
      Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are
      provided an array of ceph osd request operations.  Rather than just
      passing the number of operations in the array, the caller is
      required append an additional zeroed operation structure to signal
      the end of the array.
      
      All callers know the number of operations at the time these
      functions are called, so drop the silly zero entry and supply that
      number directly.  As a result, get_num_ops() is no longer needed.
      This also means that ceph_osdc_alloc_request() never uses its ops
      argument, so that can be dropped.
      
      Also rbd_create_rw_ops() no longer needs to add one to reserve room
      for the additional op.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      ae7ca4a3
    • Alex Elder's avatar
      rbd: pass num_op with ops array · d07c0958
      Alex Elder authored
      Add a num_op parameter to rbd_do_request() and rbd_req_sync_op() to
      indicate the number of entries in the array.  The callers of these
      functions always know how many entries are in the array, so just
      pass that information down.
      
      This is in anticipation of eliminating the extra zero-filled entry
      in these ops arrays.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      d07c0958
    • Alex Elder's avatar
      libceph: don't set pages or bio in ceph_osdc_alloc_request() · 54a54007
      Alex Elder authored
      Only one of the two callers of ceph_osdc_alloc_request() provides
      page or bio data for its payload.  And essentially all that function
      was doing with those arguments was assigning them to fields in the
      osd request structure.
      
      Simplify ceph_osdc_alloc_request() by having the caller take care of
      making those assignments
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      54a54007
    • Alex Elder's avatar
      libceph: don't set flags in ceph_osdc_alloc_request() · d178a9e7
      Alex Elder authored
      The only thing ceph_osdc_alloc_request() really does with the
      flags value it is passed is assign it to the newly-created
      osd request structure.  Do that in the caller instead.
      
      Both callers subsequently call ceph_osdc_build_request(), so have
      that function (instead of ceph_osdc_alloc_request()) issue a warning
      if a request comes through with neither the read nor write flags set.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      d178a9e7
    • Alex Elder's avatar
      libceph: drop osdc from ceph_calc_raw_layout() · e75b45cf
      Alex Elder authored
      The osdc parameter to ceph_calc_raw_layout() is not used, so get rid
      of it.  Consequently, the corresponding parameter in calc_layout()
      becomes unused, so get rid of that as well.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      e75b45cf
    • Alex Elder's avatar
      libceph: drop snapid in ceph_calc_raw_layout() · 4d6b250b
      Alex Elder authored
      A snapshot id must be provided to ceph_calc_raw_layout() even though
      it is not needed at all for calculating the layout.
      
      Where the snapshot id *is* needed is when building the request
      message for an osd operation.
      
      Drop the snapid parameter from ceph_calc_raw_layout() and pass
      that value instead in ceph_osdc_build_request().
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      4d6b250b
    • Alex Elder's avatar
      libceph: pass length to ceph_calc_file_object_mapping() · e8afad65
      Alex Elder authored
      ceph_calc_file_object_mapping() takes (among other things) a "file"
      offset and length, and based on the layout, determines the object
      number ("bno") backing the affected portion of the file's data and
      the offset into that object where the desired range begins.  It also
      computes the size that should be used for the request--either the
      amount requested or something less if that would exceed the end of
      the object.
      
      This patch changes the input length parameter in this function so it
      is used only for input.  That is, the argument will be passed by
      value rather than by address, so the value provided won't get
      updated by the function.
      
      The value would only get updated if the length would surpass the
      current object, and in that case the value it got updated to would
      be exactly that returned in *oxlen.
      
      Only one of the two callers is affected by this change.  Update
      ceph_calc_raw_layout() so it records any updated value.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      e8afad65
    • Alex Elder's avatar
      libceph: pass length to ceph_osdc_build_request() · 0120be3c
      Alex Elder authored
      The len argument to ceph_osdc_build_request() is set up to be
      passed by address, but that function never updates its value
      so there's no need to do this.  Tighten up the interface by
      passing the length directly.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      0120be3c
    • Alex Elder's avatar
      libceph: kill op_needs_trail() · 5b9d1b1c
      Alex Elder authored
      Since every osd message is now prepared to include trailing data,
      there's no need to check ahead of time whether any operations will
      make use of the trail portion of the message.
      
      We can drop the second argument to get_num_ops(), and as a result we
      can also get rid of op_needs_trail() which is no longer used.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      5b9d1b1c
    • Alex Elder's avatar
      libceph: always allow trail in osd request · c885837f
      Alex Elder authored
      An osd request structure contains an optional trail portion, which
      if present will contain data to be passed in the payload portion of
      the message containing the request.  The trail field is a
      ceph_pagelist pointer, and if null it indicates there is no trail.
      
      A ceph_pagelist structure contains a length field, and it can
      legitimately hold value 0.  Make use of this to change the
      interpretation of the "trail" of an osd request so that every osd
      request has trailing data, it just might have length 0.
      
      This means we change the r_trail field in a ceph_osd_request
      structure from a pointer to a structure that is always initialized.
      
      Note that in ceph_osdc_start_request(), the trail pointer (or now
      address of that structure) is assigned to a ceph message's trail
      field.  Here's why that's still OK (looking at net/ceph/messenger.c):
          - What would have resulted in a null pointer previously will now
            refer to a 0-length page list.  That message trail pointer
            is used in two functions, write_partial_msg_pages() and
            out_msg_pos_next().
          - In write_partial_msg_pages(), a null page list pointer is
            handled the same as a message with 0-length trail, and both
            result in a "in_trail" variable set to false.  The trail
            pointer is only used if in_trail is true.
          - The only other place the message trail pointer is used is
            out_msg_pos_next().  That function is only called by
            write_partial_msg_pages() and only touches the trail pointer
            if the in_trail value it is passed is true.
      Therefore a null ceph_msg->trail pointer is equivalent to a non-null
      pointer referring to a 0-length page list structure.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      c885837f
    • Alex Elder's avatar
      rbd: don't bother setting snapid in rbd_do_request() · 7c3d22cf
      Alex Elder authored
      For some reason, the snapid field of the osd request header is
      explicitly set to CEPH_NOSNAP in rbd_do_request().  Just a few lines
      later--with no code that would access this field in between--a call
      is made to ceph_calc_raw_layout() passing the snapid provided to
      rbd_do_request(), which encodes the snapid value it is provided into
      that field instead.
      
      In other words, there is no need to fill in CEPH_NOSNAP, and doing
      so suggests it might be necessary.  Don't do that any more.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      7c3d22cf
    • Alex Elder's avatar
      rbd: kill rbd_req_sync_op() snapc and snapid parameters · 25704ac9
      Alex Elder authored
      The snapc and snapid parameters to rbd_req_sync_op() always take
      the values NULL and CEPH_NOSNAP, respectively.  So just get rid
      of them and use those values where needed.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      25704ac9
    • Alex Elder's avatar
      rbd: drop flags parameter from rbd_req_sync_exec() · 07b2391f
      Alex Elder authored
      All callers of rbd_req_sync_exec() pass CEPH_OSD_FLAG_READ as their
      flags argument.  Delete that parameter and use CEPH_OSD_FLAG_READ
      within the function.  If we find a need to support write operations
      we can add it back again.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      07b2391f
    • Alex Elder's avatar
      rbd: drop snapid parameter from rbd_req_sync_read() · 4775618d
      Alex Elder authored
      There is only one caller of rbd_req_sync_read(), and it passes
      CEPH_NOSNAP as the snapshot id argument.  Delete that parameter
      and just use CEPH_NOSNAP within the function.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      4775618d
    • Alex Elder's avatar
      rbd: drop oid parameters from ceph_osdc_build_request() · af77f26c
      Alex Elder authored
      The last two parameters to ceph_osd_build_request() describe the
      object id, but the values passed always come from the osd request
      structure whose address is also provided.  Get rid of those last
      two parameters.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      af77f26c
    • Alex Elder's avatar
      rbd: separate layout init · 0ec8ce87
      Alex Elder authored
      Pull a block of code that initializes the layout structure in an osd
      request into its own function so it can be reused.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarDan Mick <dan.mick@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      0ec8ce87
    • Alex Elder's avatar
      rbd: only get snap context for write requests · a7b4c65f
      Alex Elder authored
      Right now we get the snapshot context for an rbd image (under
      protection of the header semaphore) for every request processed.
      
      There's no need to get the snap context if we're doing a read,
      so avoid doing so in that case.
      
      Note that we no longer need to hold the header semaphore to
      check the rbd_dev's existence flag.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      a7b4c65f
    • Alex Elder's avatar
      rbd: make exists flag atomic · d78b650a
      Alex Elder authored
      The rbd_device->exists field can be updated asynchronously, changing
      from set to clear if a mapped snapshot disappears from the base
      image's snapshot context.
      
      Currently, value of the "exists" flag is only read and modified
      under protection of the header semaphore, but that will change with
      the next patch.  Making it atomic ensures this won't be a problem
      because the a the non-existence of device will be immediately known.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      d78b650a
    • Alex Elder's avatar
      rbd: a little more cleanup of rbd_rq_fn() · b395e8b5
      Alex Elder authored
      Now that a big hunk in the middle of rbd_rq_fn() has been moved
      into its own routine we can simplify it a little more.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      b395e8b5
    • Alex Elder's avatar
      rbd: end request on error in rbd_do_request() caller · cd323ac0
      Alex Elder authored
      Only one of the three callers of rbd_do_request() provide a
      collection structure to aggregate status.
      
      If an error occurs in rbd_do_request(), have the caller
      take care of calling rbd_coll_end_req() if necessary in
      that one spot.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      cd323ac0
    • Alex Elder's avatar
      rbd: encapsulate handling for a single request · 8295cda7
      Alex Elder authored
      In rbd_rq_fn(), requests are fetched from the block layer and each
      request is processed, looping through the request's list of bio's
      until they've all been consumed.
      
      Separate the handling for a single request into its own function to
      make it a bit easier to see what's going on.
      Signed-off-by: default avatarAlex Elder <elder@inktank.com>
      Reviewed-by: default avatarJosh Durgin <josh.durgin@inktank.com>
      8295cda7