1. 23 Feb, 2021 2 commits
  2. 22 Feb, 2021 3 commits
    • gfs2: Per-revoke accounting in transactions · 2129b428
      Andreas Gruenbacher authored
      In the log, revokes are stored as a revoke descriptor (struct
      gfs2_log_descriptor), followed by zero or more additional revoke blocks
      (struct gfs2_meta_header).  On filesystems with a blocksize of 4k, the
      revoke descriptor contains up to 503 revokes, and the metadata blocks
      contain up to 509 revokes each.  Until now, space for revokes in
      transactions has been reserved at block granularity, so a lot more space
      than necessary was being allocated and then released again.
      
      This patch switches to assigning revokes to transactions individually
      instead.  Initially, space for the revoke descriptor is reserved and
      handed out to transactions.  When more revokes than that are reserved,
      additional revoke blocks are added.  When the log is flushed, the space
      for the additional revoke blocks is released, but we keep the space for
      the revoke descriptor block allocated.
      
      Transactions may still reserve more revokes than they will actually need
      in the end, but now we won't overshoot the target as much, and by only
      returning the space for excess revokes at log flush time, we further
      reduce the amount of contention between processes.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
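The block arithmetic in this commit message can be illustrated with a small user-space sketch. It is not the kernel code: the function name `revoke_blocks` is hypothetical, and the two capacities are the 4k-blocksize figures quoted above.

```python
# Capacities for a 4k block size, as quoted in the commit message.
REVOKES_PER_DESCRIPTOR = 503   # revokes that fit in the revoke descriptor block
REVOKES_PER_EXTRA_BLOCK = 509  # revokes that fit in each additional revoke block


def revoke_blocks(revokes: int) -> int:
    """Return how many log blocks are needed to store `revokes` revokes.

    Hypothetical helper for illustration only.
    """
    if revokes < 0:
        raise ValueError("revokes must be non-negative")
    if revokes == 0:
        return 0
    if revokes <= REVOKES_PER_DESCRIPTOR:
        return 1
    extra = revokes - REVOKES_PER_DESCRIPTOR
    # Ceiling division: every started additional block counts as a whole block.
    return 1 + -(-extra // REVOKES_PER_EXTRA_BLOCK)
```

With per-block reservations, a transaction that needs a handful of revokes would tie up a whole block; with per-revoke accounting, up to 503 such revokes from different transactions can share the descriptor block.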
    • gfs2: Rework the log space allocation logic · fe3e3976
      Andreas Gruenbacher authored
      The current log space allocation logic is hard to understand or extend.
      The principle is that when the log is flushed, we may or may not have a
      transaction active that has space allocated in the log.  To deal with
      that, we set aside a magical number of blocks to be used in case we
      don't have an active transaction.  It isn't clear that the pool will
      always be big enough.  In addition, we can't return unused log space at
      the end of a transaction, so the number of blocks allocated must exactly
      match the number of blocks used.
      
      Simplify this as follows:
       * When transactions are allocated or merged, always reserve enough
         blocks to flush the transaction (err on the safe side).
       * In gfs2_log_flush, return any allocated blocks that haven't been used.
       * Maintain a pool of spare blocks big enough to do one log flush, as
         before.
       * In gfs2_log_flush, when we have no active transaction, allocate a
         suitable number of blocks.  For that, use the spare pool when
         called from logd, and leave the pool alone otherwise.  This means
         that when the log is almost full, logd will still be able to do one
         more log flush, which will result in more log space becoming
         available.
      
      This will make the log space allocator code easier to work with in
      the future.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
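The four rules listed in this commit message can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class `LogSpace` and all of its methods are hypothetical, the log is reduced to two counters, and `release` stands in for log space becoming available again after a flush.

```python
class LogSpace:
    """Toy model: free log blocks plus a spare pool sized for one flush."""

    def __init__(self, total_blocks, spare_blocks):
        self.spare_target = spare_blocks
        self.spare = spare_blocks
        self.free = total_blocks - spare_blocks

    def reserve(self, blocks):
        """Reserve worst-case space for a transaction; False if the log is too full."""
        if blocks > self.free:
            return False
        self.free -= blocks
        return True

    def flush(self, reserved, used, from_logd=False):
        """Flush the log, consuming `used` blocks out of `reserved`."""
        if reserved == 0:
            # No active transaction: allocate blocks for this flush now.
            if from_logd:
                # Only logd may draw on the spare pool.
                if used > self.spare:
                    return False
                self.spare -= used
                return True
            return self.reserve(used)
        if used > reserved:
            raise ValueError("transaction reserved too few blocks")
        # Return the blocks that were reserved but not needed.
        self.free += reserved - used
        return True

    def release(self, blocks):
        """Blocks become available again; refill the spare pool first."""
        refill = min(blocks, self.spare_target - self.spare)
        self.spare += refill
        self.free += blocks - refill
```

For example, with `LogSpace(100, 10)`, a transaction that reserves 20 blocks and uses 5 returns 15 at flush time, and once the log is full only a flush called with `from_logd=True` can still proceed.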
    • gfs2: Minor calc_reserved cleanup · 71b219f4
      Andreas Gruenbacher authored
      No functional change.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  3. 17 Feb, 2021 9 commits
    • gfs2: Use resource group glock sharing · 4fc7ec31
      Bob Peterson authored
      This patch takes advantage of the new glock holder sharing feature for
      resource groups.  We have already introduced local resource group
      locking in a previous patch, so competing accesses of local processes
      are already under control.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Allow node-wide exclusive glock sharing · 06e908cd
      Bob Peterson authored
      Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
      glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
      set, the exclusive lock is shared among all local processes that are
      holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
      From the point of view of other nodes, the lock is still held
      exclusively.
      
      A future patch will start using this flag to improve performance with
      rgrp sharing.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
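The sharing rule described in this commit message can be written as a small compatibility check. This is a simplified sketch, not the kernel's grant logic: `Holder` and `can_share_locally` are hypothetical names, and only shared and exclusive modes are modeled.

```python
from dataclasses import dataclass
from typing import Sequence

LM_ST_SHARED = "SH"
LM_ST_EXCLUSIVE = "EX"


@dataclass(frozen=True)
class Holder:
    """A local glock holder: requested mode plus the node-scope flag."""
    state: str
    node_scope: bool = False  # stands in for LM_FLAG_NODE_SCOPE


def can_share_locally(held: Sequence[Holder], new: Holder) -> bool:
    """Return True if `new` may hold the glock alongside the holders in `held`."""
    for h in held:
        if LM_ST_EXCLUSIVE in (h.state, new.state):
            # An EX holder only coexists with other EX holders, and only
            # when both sides have opted in with the node-scope flag.
            if not (h.state == new.state and h.node_scope and new.node_scope):
                return False
    return True
```

Other nodes are not part of this model; from their point of view the glock is simply held in EX mode.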
    • gfs2: Add local resource group locking · 9e514605
      Andreas Gruenbacher authored
      Prepare for treating resource group glocks as exclusive among nodes but
      shared among all tasks running on a node: introduce another layer of
      node-specific locking that the local tasks can use to coordinate their
      accesses.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Add per-reservation reserved block accounting · 725d0e9d
      Andreas Gruenbacher authored
      Add a rs_reserved field to struct gfs2_blkreserv to keep track of the number of
      blocks reserved by this particular reservation, and a rd_reserved field to
      struct gfs2_rgrpd to keep track of the total number of reserved blocks in the
      resource group.  Those blocks are exclusively reserved, as opposed to the
      rs_requested / rd_requested blocks which are tracked in the reservation tree
      (rd_rstree) and which can be stolen if necessary.
      
      When making a reservation with gfs2_inplace_reserve, rs_reserved is set to
      somewhere between ap->min_target and ap->target depending on the number of free
      blocks in the resource group.  When allocating blocks with gfs2_alloc_blocks,
      rs_reserved is decremented accordingly.  Eventually, any reserved but not
      consumed blocks are returned to the resource group by gfs2_inplace_release.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
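The life cycle of rs_reserved and rd_reserved described above can be sketched as follows. The field names come from the commit message; the classes, the function signatures, and the exact clamping rule are assumptions made for illustration, and the rs_requested / rd_requested hints are left out.

```python
class ResourceGroup:
    def __init__(self, free):
        self.rd_free = free      # free blocks in the resource group
        self.rd_reserved = 0     # blocks exclusively reserved by all reservations


class Reservation:
    def __init__(self):
        self.rs_reserved = 0     # blocks exclusively reserved by this reservation


def inplace_reserve(rgd, rs, min_target, target):
    """Reserve between min_target and target blocks, depending on what is free."""
    available = rgd.rd_free - rgd.rd_reserved
    if available < min_target:
        return False
    rs.rs_reserved = min(target, available)
    rgd.rd_reserved += rs.rs_reserved
    return True


def alloc_blocks(rgd, rs, n):
    """Consume n blocks of the reservation."""
    if n > rs.rs_reserved:
        raise ValueError("allocation exceeds reservation")
    rs.rs_reserved -= n
    rgd.rd_reserved -= n
    rgd.rd_free -= n


def inplace_release(rgd, rs):
    """Return reserved but unconsumed blocks to the resource group."""
    rgd.rd_reserved -= rs.rs_reserved
    rs.rs_reserved = 0
```

Because reserved blocks are subtracted from what later reservations can see, a second reservation in the same resource group gets at most the remainder, which is the property that resource group glock sharing relies on.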
    • gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} · 07974d2a
      Andreas Gruenbacher authored
      We keep track of what we've so far been referring to as reservations in
      rd_rstree: the nodes in that tree indicate where in a resource group we'd
      like to allocate the next couple of blocks for a particular inode.  Local
      processes take those as hints, but they may still "steal" blocks from those
      extents, so when actually allocating a block, we must double check in the
      bitmap whether that block is actually still free.  Likewise, other cluster
      nodes may "steal" such blocks as well.
      
      One of the following patches introduces resource group glock sharing, i.e.,
      sharing of an exclusively locked resource group glock among local processes to
      speed up allocations.  To make that work, we'll need to keep track of how many
      blocks we've actually reserved for each inode, so we end up with two different
      kinds of reservations.
      
      Distinguish these two kinds by referring to blocks which are reserved but may
      still be "stolen" as "requested".  This rename also makes it more obvious that
      rs_requested and rd_requested are strongly related.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Check for active reservation in gfs2_release · 0ec9b9ea
      Andreas Gruenbacher authored
      In gfs2_release, check if the inode has an active reservation to avoid
      unnecessary lock taking.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Don't search for unreserved space twice · b2598965
      Andreas Gruenbacher authored
      If gfs2_inplace_reserve has chosen a resource group but it couldn't make a
      reservation there, there are too many other reservations in that resource
      group.  In that case, don't even try to respect existing reservations in
      gfs2_alloc_blocks.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Only pass reservation down to gfs2_rbm_find · 3d39fcd1
      Andreas Gruenbacher authored
      Only pass the current reservation down to gfs2_rbm_find rather than the entire
      inode; we don't need any of the other information.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt · f38e998f
      Andreas Gruenbacher authored
      Pass a non-NULL minext to gfs2_rbm_find even for single-block allocations.  In
      gfs2_rbm_find, also set rgd->rd_extfail_pt when a single-block allocation
      fails in a resource group: there is no reason for treating that case
      differently.  In gfs2_reservation_check_and_update, only check how many free
      blocks we have if more than one block is requested; we already know there's at
      least one free block.
      
      In addition, when allocating N blocks fails in gfs2_rbm_find, we need to set
      rd_extfail_pt to N - 1 rather than N:  rd_extfail_pt defines the biggest
      allocation that might still succeed.
      
      Finally, reset rd_extfail_pt when updating the resource group statistics in
      update_rgrp_lvb, as we already do in gfs2_rgrp_bh_get.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
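The meaning of rd_extfail_pt after this change ("the biggest allocation that might still succeed") can be illustrated with a short sketch. The helper names are hypothetical, and two details are assumptions rather than statements from the commit message: that resetting means setting rd_extfail_pt to the number of free blocks, and that a failure only ever lowers the value.

```python
class Rgrp:
    def __init__(self, free):
        self.rd_free = free
        # Assumption: a reset sets the failure point to the free block count.
        self.rd_extfail_pt = free


def record_failed_extent(rgd, n):
    """Allocating n blocks failed, so at most n - 1 might still succeed."""
    if n >= 1 and n - 1 < rgd.rd_extfail_pt:
        rgd.rd_extfail_pt = n - 1


def worth_trying(rgd, n):
    """A request for n blocks is only worth trying if n <= rd_extfail_pt."""
    return n <= rgd.rd_extfail_pt


def update_stats(rgd, free):
    """New statistics arrived: forget what we learned about failed extents."""
    rgd.rd_free = free
    rgd.rd_extfail_pt = free
```

Setting the value to N rather than N - 1 would let the next request for N blocks retry a resource group where exactly that request has just failed.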
  4. 10 Feb, 2021 1 commit
    • gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end · 7009fa9c
      Andreas Gruenbacher authored
      When starting an iomap write, gfs2_quota_lock_check -> gfs2_quota_lock
      -> gfs2_quota_hold is called from gfs2_iomap_begin.  At the end of the
      write, before unlocking the quotas, punch_hole -> gfs2_quota_hold can be
      called again in gfs2_iomap_end, which is incorrect and leads to a failed
      assertion.  Instead, move the call to gfs2_quota_unlock before the call
      to punch_hole to fix that.
      
      Fixes: 64bc06bb ("gfs2: iomap buffered write support")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  5. 08 Feb, 2021 2 commits
    • gfs2: Add trusted xattr support · 866eef48
      Andreas Gruenbacher authored
      Add support for an additional filesystem version (sb_fs_format = 1802).
      When a filesystem with the new version is mounted, the filesystem
      supports "trusted.*" xattrs.
      
      In addition, version 1802 filesystems implement a form of forward
      compatibility for xattrs: when xattrs with an unknown prefix (ea_type)
      are found on a version 1802 filesystem, those attributes are not shown
      by listxattr, and they are not accessible by getxattr, setxattr, or
      removexattr.
      
      This mechanism might turn out to be what we need in the future, but if
      not, we can always bump the filesystem version and break compatibility
      instead.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Andrew Price <anprice@redhat.com>
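The forward-compatibility rule for version 1802 filesystems can be sketched as a filter over the stored attributes. This is an illustration only: the function name is hypothetical, and the numeric ea_type values in the table are assumptions chosen for the example, not taken from the commit message.

```python
# Assumed mapping from ea_type to name prefix; the numbers are illustrative.
KNOWN_PREFIXES = {
    1: "user.",
    2: "system.",
    3: "security.",
    4: "trusted.",
}


def listxattr_1802(entries):
    """List attribute names on a version 1802 filesystem.

    `entries` is an iterable of (ea_type, name) pairs.  Entries with an
    unknown ea_type are silently skipped instead of causing an error.
    """
    names = []
    for ea_type, name in entries:
        prefix = KNOWN_PREFIXES.get(ea_type)
        if prefix is None:
            continue  # unknown prefix: hidden from userspace
        names.append(prefix + name)
    return names
```

The same lookup would guard getxattr, setxattr, and removexattr, so an attribute written by a newer implementation stays intact but invisible.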
    • gfs2: Enable rgrplvb for sb_fs_format 1802 · 47b7ec1d
      Andrew Price authored
      Turn on rgrplvb by default for sb_fs_format > 1801.
      
      Mount options still have to override this, so a new args field that
      differentiates between 'off' and 'not specified' is added, and the new
      default is applied only when the option is not specified.
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
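The three-way distinction between 'on', 'off', and 'not specified' can be sketched like this. The function name is hypothetical, and representing 'not specified' as `None` is an assumption of this example.

```python
from typing import Optional


def rgrplvb_enabled(sb_fs_format: int, mount_opt: Optional[bool]) -> bool:
    """Decide whether rgrplvb is active.

    `mount_opt` is True or False when the mount option was given explicitly,
    and None when it was not specified at all.
    """
    if mount_opt is not None:
        # An explicit mount option always wins over the default.
        return mount_opt
    # New default: on for filesystem versions newer than 1801.
    return sb_fs_format > 1801
```

A plain boolean could not express this, because 'off' and 'not specified' would both look like false and the new default would override an explicit 'off'.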
  6. 05 Feb, 2021 2 commits
    • gfs2: Don't skip dlm unlock if glock has an lvb · 78178ca8
      Bob Peterson authored
      Patch fb6791d1 was designed to allow gfs2 to unmount quicker by
      skipping the step where it tells dlm to unlock glocks in EX with lvbs.
      This was done because when gfs2 unmounts a file system, it destroys the
      dlm lockspace shortly after it destroys the glocks so it doesn't need to
      unlock them all: the unlock is implied when the lockspace is destroyed
      by dlm.
      
      However, that patch introduced a use-after-free in dlm: as part of its
      normal dlm_recoverd process, it can call ls_recovery to recover dead
      locks. In so doing, it can call recover_rsbs which calls recover_lvb for
      any mastered rsbs. The function recover_lvb runs through the list of lkbs queued
      to the given rsb (if the glock is cached but unlocked, it will still be
      queued to the lkb, but in NL--Unlocked--mode) and if it has an lvb,
      copies it to the rsb, thus trying to preserve the lkb. However, when
      gfs2 skips the dlm unlock step, it frees the glock and its lvb, which
      means dlm's function recover_lvb references the now freed lvb pointer,
      copying the freed lvb memory to the rsb.
      
      This patch changes the check in gdlm_put_lock so that it calls
      dlm_unlock for all glocks that contain an lvb pointer.
      
      Fixes: fb6791d1 ("GFS2: skip dlm_unlock calls in unmount")
      Cc: stable@vger.kernel.org # v3.8+
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
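The changed check can be summarized as a predicate. This is a sketch of the rule stated in the commit message, not the code in gdlm_put_lock; the function and parameter names are hypothetical.

```python
def must_call_dlm_unlock(skipping_unlocks_for_unmount: bool, has_lvb: bool) -> bool:
    """Return True if dlm_unlock must be called before the glock is freed.

    The unmount shortcut (let lockspace destruction imply the unlock) is
    only safe when the glock has no lvb that dlm could still read during
    recovery after the glock's memory has been freed.
    """
    if skipping_unlocks_for_unmount and not has_lvb:
        return False
    return True
```

Only glocks without an lvb pointer keep the fast unmount path; everything else is unlocked explicitly.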
    • gfs2: Lock imbalance on error path in gfs2_recover_one · 834ec3e1
      Andreas Gruenbacher authored
      In gfs2_recover_one, fix a sd_log_flush_lock imbalance when a recovery
      pass fails.
      
      Fixes: c9ebc4b7 ("gfs2: allow journal replay to hold sd_log_flush_lock")
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  7. 03 Feb, 2021 9 commits
  8. 25 Jan, 2021 4 commits
  9. 22 Jan, 2021 1 commit
  10. 19 Jan, 2021 7 commits