Commit 0a90028f authored by Patrick Steinhardt's avatar Patrick Steinhardt

checks: Speed up retrieving commits via quarantine directory

The Git access checks need access to objects which have been newly
introduced for a set of given changes. These new objects used to be
retrieved via a revision walk, where preexisting references have been
excluded and new references included. The resulting set of objects is
then the set of objects which haven't previously been referenced by
anything else. Depending on the number of references, walking all
preexisting ones can be quite expensive and easily take dozens of
seconds.

In the past, we have thus converted our LFS pointer checks to use the
quarantine directory to list new LFS pointers. The quarantine directory
hosts all objects which have been uploaded by a git-push(1), and can
thus directly tell us about which objects are new and which aren't. This
has sped up LFS pointer checks in case we have a quarantine directory by
3 orders of magnitude in biggish repos with many refs, going from up to
20 seconds down to 20 milliseconds.

Given that commits can be front-loaded in a single batched RPC call
since 156ce433 (checks: Implement infrastructure for bulk diff
checks, 2021-07-29), we are now in a position to do the same for
commits: If we have an object quarantine directory, then we use the new
`ListAllCommits()` RPC with all alternate paths unset. This will thus
effectively only list commits of the object directory, which in a
quarantined environment points to the quarantine object directory.

Note that we still have to retain the old code to list commits via
a revision walk, given that non-push RPCs currently don't have an object
quarantine directory available. While this is slated to change soon by
creating manual quarantines, we must continue to support enumeration via
revwalks.

Changelog: performance
parent deae8d1e
...@@ -40,7 +40,7 @@ module Gitlab ...@@ -40,7 +40,7 @@ module Gitlab
return [] if newrevs.empty? return [] if newrevs.empty?
@commits ||= project.repository.new_commits(newrevs) @commits ||= project.repository.new_commits(newrevs, allow_quarantine: true)
end end
# All commits which have been newly introduced via the given revision. # All commits which have been newly introduced via the given revision.
......
...@@ -354,9 +354,9 @@ module Gitlab ...@@ -354,9 +354,9 @@ module Gitlab
end end
end end
def new_commits(newrevs) def new_commits(newrevs, allow_quarantine: false)
wrapped_gitaly_errors do wrapped_gitaly_errors do
gitaly_commit_client.list_commits(Array.wrap(newrevs) + %w[--not --all]) gitaly_commit_client.list_new_commits(Array.wrap(newrevs), allow_quarantine: allow_quarantine)
end end
end end
......
...@@ -265,6 +265,31 @@ module Gitlab ...@@ -265,6 +265,31 @@ module Gitlab
consume_commits_response(response) consume_commits_response(response)
end end
# List all commits which are new in the repository. If commits have been pushed into the repo
def list_new_commits(revisions, allow_quarantine: false)
git_env = Gitlab::Git::HookEnv.all(@gitaly_repo.gl_repository)
if allow_quarantine && git_env['GIT_OBJECT_DIRECTORY_RELATIVE'].present?
# If we have a quarantine environment, then we can optimize the check
# by doing a ListAllCommitsRequest. Instead of walking through
# references, we just walk through all quarantined objects, which is
# a lot more efficient. To do so, we throw away any alternate object
# directories, which point to the main object directory of the
# repository, and only keep the object directory which points into
# the quarantine object directory.
quarantined_repo = @gitaly_repo.dup
quarantined_repo.git_alternate_object_directories = Google::Protobuf::RepeatedField.new(:string)
request = Gitaly::ListAllCommitsRequest.new(
repository: quarantined_repo
)
response = GitalyClient.call(@repository.storage, :commit_service, :list_all_commits, request, timeout: GitalyClient.medium_timeout)
consume_commits_response(response)
else
list_commits(Array.wrap(revisions) + %w[--not --all])
end
end
def list_commits_by_oid(oids) def list_commits_by_oid(oids)
return [] if oids.empty? return [] if oids.empty?
......
...@@ -70,7 +70,7 @@ RSpec.describe Gitlab::Checks::ChangesAccess do ...@@ -70,7 +70,7 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
let(:expected_commit) { instance_double(Commit) } let(:expected_commit) { instance_double(Commit) }
it 'returns only commits with non empty revisions' do it 'returns only commits with non empty revisions' do
expect(project.repository).to receive(:new_commits).with([newrev]) { [expected_commit] } expect(project.repository).to receive(:new_commits).with([newrev], { allow_quarantine: true }) { [expected_commit] }
expect(subject.commits).to eq([expected_commit]) expect(subject.commits).to eq([expected_commit])
end end
end end
......
...@@ -343,6 +343,92 @@ RSpec.describe Gitlab::GitalyClient::CommitService do ...@@ -343,6 +343,92 @@ RSpec.describe Gitlab::GitalyClient::CommitService do
end end
end end
describe '#list_new_commits' do
let(:revisions) { [revision] }
let(:gitaly_commits) { create_list(:gitaly_commit, 3) }
let(:commits) { gitaly_commits.map { |c| Gitlab::Git::Commit.new(repository, c) }}
subject { client.list_new_commits(revisions, allow_quarantine: allow_quarantine) }
shared_examples 'a #list_all_commits message' do
it 'sends a list_all_commits message' do
expected_repository = repository.gitaly_repository.dup
expected_repository.git_alternate_object_directories = Google::Protobuf::RepeatedField.new(:string)
expect_next_instance_of(Gitaly::CommitService::Stub) do |service|
expect(service).to receive(:list_all_commits)
.with(gitaly_request_with_params(repository: expected_repository), kind_of(Hash))
.and_return([Gitaly::ListAllCommitsResponse.new(commits: gitaly_commits)])
end
expect(subject).to eq(commits)
end
end
shared_examples 'a #list_commits message' do
it 'sends a list_commits message' do
expect_next_instance_of(Gitaly::CommitService::Stub) do |service|
expect(service).to receive(:list_commits)
.with(gitaly_request_with_params(revisions: revisions + %w[--not --all]), kind_of(Hash))
.and_return([Gitaly::ListCommitsResponse.new(commits: gitaly_commits)])
end
expect(subject).to eq(commits)
end
end
before do
::Gitlab::GitalyClient.clear_stubs!
allow(Gitlab::Git::HookEnv)
.to receive(:all)
.with(repository.gl_repository)
.and_return(git_env)
end
context 'with hook environment' do
let(:git_env) do
{
'GIT_OBJECT_DIRECTORY_RELATIVE' => '.git/objects',
'GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE' => ['/dir/one', '/dir/two']
}
end
context 'with allowed quarantine' do
let(:allow_quarantine) { true }
it_behaves_like 'a #list_all_commits message'
end
context 'with disallowed quarantine' do
let(:allow_quarantine) { false }
it_behaves_like 'a #list_commits message'
end
end
context 'without hook environment' do
let(:git_env) do
{
'GIT_OBJECT_DIRECTORY_RELATIVE' => '',
'GIT_ALTERNATE_OBJECT_DIRECTORIES_RELATIVE' => []
}
end
context 'with allowed quarantine' do
let(:allow_quarantine) { true }
it_behaves_like 'a #list_commits message'
end
context 'with disallowed quarantine' do
let(:allow_quarantine) { false }
it_behaves_like 'a #list_commits message'
end
end
end
describe '#commit_stats' do describe '#commit_stats' do
let(:request) do let(:request) do
Gitaly::CommitStatsRequest.new( Gitaly::CommitStatsRequest.new(
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment