lib/gitlab/checks/changes_access.rb · 0f226e8760e75b9d9e8e3cae6d861b89838b2e99 · nexedi / gitlab-ce · GitLab


checks: Fix revalidation of preexisting commits · d86514f1

Patrick Steinhardt authored Dec 14, 2021

When pushing commits into a repository, the client and server negotiate
a packfile containing all objects which are necessary to end up with a
fully connected graph on the server side. In general, this contains at
least all objects which have been newly introduced between the set of
all old and new references. But in some cases, Git will send packfiles
which knowingly contain objects that already exist on the server side,
e.g. when Git decides to reuse deltas from an existing packfile where the
delta base is a preexisting commit. As a result, any well-formed packfile
contains a superset of the objects required to satisfy the update.
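The superset property can be illustrated with a small Ruby sketch. It is a toy model, not Git's object model: the commit graph is a plain hash of hypothetical commit IDs to parent IDs (real packfiles also carry trees, blobs and tags), a packfile is treated as a set of object IDs, and `reachable` is a helper introduced only for this illustration.

```ruby
require 'set'

# Hypothetical commit graph: commit ID => parent IDs.
# "a" already exists on the server; "b" and "c" are new.
GRAPH = {
  'a' => [],
  'b' => ['a'],
  'c' => ['b']
}.freeze

# Returns every commit reachable from the given tips.
def reachable(tips, graph)
  seen = Set.new
  stack = tips.dup
  until stack.empty?
    oid = stack.pop
    next if seen.include?(oid)
    seen << oid
    stack.concat(graph.fetch(oid, []))
  end
  seen
end

# Objects the server strictly needs: reachable from the new tip,
# but not from the old tip.
required = reachable(['c'], GRAPH) - reachable(['a'], GRAPH)

# The client may additionally ship "a", e.g. as a reused delta base.
packfile = required | Set['a']
extra = packfile - required

puts packfile.superset?(required) # prints "true"
p extra.to_a                      # prints ["a"]
```

Anything in `extra` is exactly what an enumeration of the quarantine directory would wrongly report as "new".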

In v14.3 we refactored access checks to use the quarantine directory to
enumerate new commits directly. Because of the above property, we may
get too many objects from the quarantine directory: if the client
decided to include preexisting commits in the pack, we may perform
access checks on commits which in fact aren't new. While this is not a
problem for most access checks (a preexisting object which we re-check
will still pass them), other checks are more sensitive. Most
importantly, push rules may require a commit to be created by the author
who is currently performing the change. If we include preexisting
commits of a different author in such a check, then the access check is
expected to fail. As a result, we must never include preexisting commits
in push rule access checks.
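A minimal Ruby sketch of why re-checking preexisting commits breaks an author-based push rule. `Commit`, `author_rule_passes?` and the sample data are hypothetical stand-ins rather than GitLab's push rule implementation, and the set of preexisting commit IDs is simply assumed to be known here.

```ruby
require 'set'

# Hypothetical commit record; GitLab's real commit class differs.
Commit = Struct.new(:id, :author_email)

# Hypothetical push rule: every checked commit must be authored by
# the user performing the push.
def author_rule_passes?(commits, pusher_email)
  commits.all? { |commit| commit.author_email == pusher_email }
end

pusher = 'alice@example.com'

# What the quarantine directory might yield: the new commit plus a
# preexisting commit by someone else that the client chose to include.
quarantined = [
  Commit.new('new1', 'alice@example.com'),
  Commit.new('old1', 'bob@example.com')
]

# IDs already present in the main repository (assumed known here).
preexisting = Set['old1']

naive = author_rule_passes?(quarantined, pusher)
truly_new = quarantined.reject { |commit| preexisting.include?(commit.id) }
fixed = author_rule_passes?(truly_new, pusher)

puts naive # prints "false": the push is wrongly rejected
puts fixed # prints "true"
```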

To determine new commits in push rules, we do an in-memory walk of the
commits returned from the quarantine directory, starting at the tip of
each change and walking until a commit's parents can no longer be found
among those returned commits. In doing so, we happily traverse past
commits which are already known in case those were returned from the
quarantine directory, too. To fix this, we need to abort the walk as
soon as we hit an already known object.
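The walk and its fix might look roughly like this in Ruby. `new_commits` is a hypothetical helper, not GitLab's method; it assumes the quarantined commits arrive as a hash of commit ID to parent IDs and that a set of already-known IDs is available, which is not always easy to obtain.

```ruby
require 'set'

# Hypothetical in-memory walk over commits returned from the quarantine
# directory. `quarantined` maps commit ID => parent IDs; `known` holds IDs
# that already exist in the main repository.
def new_commits(tip, quarantined, known)
  result = []
  seen = Set.new
  stack = [tip]
  until stack.empty?
    oid = stack.pop
    next if seen.include?(oid)
    seen << oid
    # Stop once a commit isn't among the quarantined commits...
    next unless quarantined.key?(oid)
    # ...and, with the fix, also at any commit that is already known.
    next if known.include?(oid)
    result << oid
    stack.concat(quarantined[oid])
  end
  result
end

# "old" is preexisting, but the client shipped it anyway; its parent
# "base" was not shipped.
quarantined = { 'new' => ['old'], 'old' => ['base'] }

p new_commits('new', quarantined, Set.new)    # prints ["new", "old"]
p new_commits('new', quarantined, Set['old']) # prints ["new"]
```

Without a known set, the walk only stops where the quarantine directory runs out of commits, so `old` leaks into the result.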

The problem, though, is that we have no easy way to determine the
already-known objects in the general case. But we can do so in limited
cases: when the change we're processing has both an old and a new
revision (that is, it is an "update"), we simply skip adding oldrev to
the result set. This doesn't work for branch creations, though, where
there is no oldrev. In that case, we thus fall back to enumerating
commits not via the quarantine directory, but via a revision walk with
`--not --all`. This walk will not contain any objects which are
reachable from any existing reference, and thus we can be sure that the
in-memory walk will not traverse past any preexisting object.
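The resulting strategy can be sketched in Ruby as follows. `Change`, `reachable`, `quarantine_walk`, `rev_list_not_all` and `new_commits_for` are hypothetical names; the creation path only emulates the semantics of `git rev-list <newrev> --not --all` in memory over a toy graph instead of invoking Git or Gitaly, and the all-zero `BLANK_SHA` mirrors how Git reports the old revision of a newly created branch.

```ruby
require 'set'

# Git reports a branch creation with an all-zero old revision.
BLANK_SHA = '0' * 40

# Hypothetical change record, not GitLab's actual type.
Change = Struct.new(:oldrev, :newrev)

# Commits reachable from `tips` in a commit ID => parent IDs graph.
def reachable(tips, graph)
  seen = Set.new
  stack = tips.dup
  until stack.empty?
    oid = stack.pop
    next if seen.include?(oid) || !graph.key?(oid)
    seen << oid
    stack.concat(graph[oid])
  end
  seen
end

# Updates: walk quarantined commits from newrev, never adding oldrev.
def quarantine_walk(tip, quarantined, oldrev)
  result = []
  seen = Set.new
  stack = [tip]
  until stack.empty?
    oid = stack.pop
    # Stop at oldrev, at visited commits and at commits not quarantined.
    next if oid == oldrev || seen.include?(oid) || !quarantined.key?(oid)
    seen << oid
    result << oid
    stack.concat(quarantined[oid])
  end
  result
end

# Creations: emulate `git rev-list <newrev> --not --all` over the full
# graph, given the tips of all existing references.
def rev_list_not_all(tip, graph, ref_tips)
  (reachable([tip], graph) - reachable(ref_tips, graph)).to_a
end

def new_commits_for(change, quarantined, graph, ref_tips)
  if change.oldrev == BLANK_SHA
    rev_list_not_all(change.newrev, graph, ref_tips)
  else
    quarantine_walk(change.newrev, quarantined, change.oldrev)
  end
end

# The only existing reference points at m1; the client ships m1 again.
graph = { 'm0' => [], 'm1' => ['m0'], 'n1' => ['m1'], 'f1' => ['m1'] }
quarantined = { 'n1' => ['m1'], 'f1' => ['m1'], 'm1' => ['m0'] }
ref_tips = ['m1']

p new_commits_for(Change.new('m1', 'n1'), quarantined, graph, ref_tips)      # prints ["n1"]
p new_commits_for(Change.new(BLANK_SHA, 'f1'), quarantined, graph, ref_tips) # prints ["f1"]
```

For the creation, walking the quarantine directory instead (`quarantine_walk('f1', quarantined, BLANK_SHA)`) would return `["f1", "m1"]`, wrongly including the preexisting `m1` that the client chose to ship.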

Implement this scheme. Unfortunately, this is going to be a lot less
performant than using the quarantine directory in all cases. But it is
better to be less performant than wrong.

Changelog: fixed


