Commit d86514f1 authored by Patrick Steinhardt's avatar Patrick Steinhardt

checks: Fix revalidation of preexisting commits

When pushing commits into a repository, then client and server negotiate
a packfile containing all objects which are necessary to end up with a
fully connected graph on the server-side. In general, this contains at
least all objects which have been newly introduced between the set of
all old and new references. But in some cases, it can be that Git will
send packfiles which knowingly contain objects which existed on the
server side already, e.g. when Git decides to reuse deltas from an
existing packfile where the delta base is a preexisting commit. As a
result, any well-formed packfile is a superset of objects required to
satisfy the update.

In v14.3 we have refactored access checks to use the quarantine
directory to enumerate new commits directly. Because of above property,
we may get too many objects from the object quarantine directory, which
means that as a result we may perform access checks on commits which in
fact aren't new in case the client decided to include these in the pack.
While this is not a problem in most access checks (an object which is in
the main repository but which we re-check is going to still pass the
checks), other checks are more sensitive. Most importantly, push rules
may require a commit to be created by the author who is currently
performing the change. If we include preexisting commits of a different
author in such a check, then it is totally expected that the access
check will now fail. As a result, we must never include preexisting
commits in push rule access checks.

To determine new commits in push rules, we do an in-memory walk of
commits returned from the quarantine directory, where we walk from the
tip of each change until we are not able to satisfy the commit's parents
anymore. And in this case, we happily traverse past commits which are
known already inc ase those were returned from the quarantine directory.
To fix this, we need to abort the walk as soon as we hit an already
known object.

The problem is though that we have no easy way to determine the already
known object in the general case. But we can do so in limited cases:
when the change we're processing has both an old and a new revision
(that is, it is an "update"), then we simply skip adding oldrev to the
result set. This doesn't work though for branch creations, where we
ain't got no oldrev. We thus fall back to enumerating commits not via
the quarantine directory in that case, but instead by using a revision
walk with `--not --all`. This walk will not contain any objects which
are referenced by any reference, and thus we can be sure that the
in-memory walk will not traverse past any preexisting object.

Implement this schema. Unfortunately this is going to be a lot less
performant compared to using the quarantine directory in all cases. But
better be less performant than wrong.

Changelog: fixed
parent eec2e4b9
......@@ -33,18 +33,33 @@ module Gitlab
# changes. This set may also contain commits which are not referenced by
# any of the new revisions.
def commits
allow_quarantine = true
newrevs = @changes.map do |change|
oldrev = change[:oldrev]
newrev = change[:newrev]
newrev unless blank_rev?(newrev)
next if blank_rev?(newrev)
# In case any of the old revisions is blank, then we cannot reliably
# detect which commits are new for a given change when enumerating
# objects via the object quarantine directory given that the client
# may have pushed too many commits, and we don't know when to
# terminate the walk. We thus fall back to using `git rev-list --not
# --all`, which is a lot less efficient but at least can only ever
# returns commits which really are new.
allow_quarantine = false if allow_quarantine && blank_rev?(oldrev)
newrev
end.compact
return [] if newrevs.empty?
@commits ||= project.repository.new_commits(newrevs, allow_quarantine: true)
@commits ||= project.repository.new_commits(newrevs, allow_quarantine: allow_quarantine)
end
# All commits which have been newly introduced via the given revision.
def commits_for(newrev)
def commits_for(oldrev, newrev)
commits_by_id = commits.index_by(&:id)
result = []
......@@ -65,9 +80,11 @@ module Gitlab
# Only add the parent ID to the pending set if we actually know its
# commit to guards us against readding an ID which we have already
# queued up before.
# queued up before. Furthermore, we stop walking as soon as we hit
# `oldrev` such that we do not include any commits in our checks
# which have been "over-pushed" by the client.
commit.parent_ids.each do |parent_id|
pending.add(parent_id) if commits_by_id.has_key?(parent_id)
pending.add(parent_id) if commits_by_id.has_key?(parent_id) && parent_id != oldrev
end
result << commit
......@@ -83,7 +100,7 @@ module Gitlab
if blank_rev?(change[:newrev])
[]
else
Gitlab::Lazy.new { commits_for(change[:newrev]) }
Gitlab::Lazy.new { commits_for(change[:oldrev], change[:newrev]) }
end
Checks::SingleChangeAccess.new(
......
......@@ -44,16 +44,30 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
it 'calls #new_commits' do
expect(project.repository).to receive(:new_commits).and_call_original
expect(subject.commits).to eq([])
expect(subject.commits).to match_array([])
end
context 'when changes contain empty revisions' do
let(:changes) { [{ newrev: newrev }, { newrev: '' }, { newrev: Gitlab::Git::BLANK_SHA }] }
let(:expected_commit) { instance_double(Commit) }
it 'returns only commits with non empty revisions' do
expect(project.repository).to receive(:new_commits).with([newrev], { allow_quarantine: true }) { [expected_commit] }
expect(subject.commits).to eq([expected_commit])
shared_examples 'returns only commits with non empty revisions' do
specify do
expect(project.repository).to receive(:new_commits).with([newrev], { allow_quarantine: allow_quarantine }) { [expected_commit] }
expect(subject.commits).to match_array([expected_commit])
end
end
it_behaves_like 'returns only commits with non empty revisions' do
let(:changes) { [{ oldrev: oldrev, newrev: newrev }, { newrev: '' }, { newrev: Gitlab::Git::BLANK_SHA }] }
let(:allow_quarantine) { true }
end
context 'without oldrev' do
it_behaves_like 'returns only commits with non empty revisions' do
let(:changes) { [{ newrev: newrev }, { newrev: '' }, { newrev: Gitlab::Git::BLANK_SHA }] }
# The quarantine directory should not be used because we're lacking oldrev.
let(:allow_quarantine) { false }
end
end
end
end
......@@ -61,12 +75,13 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
describe '#commits_for' do
let(:new_commits) { [] }
let(:expected_commits) { [] }
let(:oldrev) { Gitlab::Git::BLANK_SHA }
shared_examples 'a listing of new commits' do
it 'returns expected commits' do
expect(subject).to receive(:commits).and_return(new_commits)
expect(subject.commits_for(newrev)).to eq(expected_commits)
expect(subject.commits_for(oldrev, newrev)).to eq(expected_commits)
end
end
......@@ -172,6 +187,31 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
it_behaves_like 'a listing of new commits'
end
context 'with over-push' do
let(:newrev) { '1' }
let(:oldrev) { '3' }
# `#new_commits` returns too many commits, where some commits are not
# part of the current change.
let(:new_commits) do
[
create_commit('1', %w[2]),
create_commit('2', %w[3]),
create_commit('3', %w[4]),
create_commit('4', %w[])
]
end
let(:expected_commits) do
[
create_commit('1', %w[2]),
create_commit('2', %w[3])
]
end
it_behaves_like 'a listing of new commits'
end
end
describe '#single_change_accesses' do
......@@ -180,10 +220,10 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
shared_examples '#single_change_access' do
before do
commits_for.each do |id, commits|
commits_for.each do |oldrev, newrev, commits|
expect(subject)
.to receive(:commits_for)
.with(id)
.with(oldrev, newrev)
.and_return(commits)
end
end
......@@ -205,7 +245,12 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
end
context 'with a single change and no new commits' do
let(:commits_for) { { 'new' => [] } }
let(:commits_for) do
[
['old', 'new', []]
]
end
let(:changes) do
[
{ oldrev: 'old', newrev: 'new', ref: 'refs/heads/branch' }
......@@ -222,7 +267,12 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
end
context 'with a single change and new commits' do
let(:commits_for) { { 'new' => [create_commit('new', [])] } }
let(:commits_for) do
[
['old', 'new', [create_commit('new', [])]]
]
end
let(:changes) do
[
{ oldrev: 'old', newrev: 'new', ref: 'refs/heads/branch' }
......@@ -240,11 +290,11 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
context 'with multiple changes' do
let(:commits_for) do
{
'a' => [create_commit('a', [])],
'c' => [create_commit('c', [])],
'd' => []
}
[
[nil, 'a', [create_commit('a', [])]],
['a', 'c', [create_commit('c', [])]],
[nil, 'd', []]
]
end
let(:changes) do
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment