Commit 2aef8c6e authored by Sean McGivern

Only fetch repo once on secondary after push

Currently, after a push we trigger the 'push' system hook on the
secondary for every ref. This hook performs `git fetch primary
--force`. When a repository with a large number of local refs is pushed
for the first time, git will then try to fetch the entire repository
once for every ref, generating a large amount of network and disk usage
for the temporary packfiles used during the fetch process.

In addition to this, if a fetch fails in the
`GeoRepositoryUpdateWorker`, its temporary packfile will remain present
until the next GC run.

To work around this, define a new 'fetch' hook type. Fake the system
hooks so that this is only called for secondary Geo nodes. Then remove
the fetching from the per-ref update step, and explicitly call the hook
once per push instead.

This means that there is a risk the update step will happen before the
fetch is finished, but it does stop the disk usage problem.
parent 9e5f3496
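
The difference described in the commit message can be sketched in plain Ruby. This is a rough illustration only, not code from the commit: `FakeSecondary`, `notify_per_ref` and `notify_per_push` are hypothetical names, and the fetch is reduced to a counter so that only the number of fetches per push is visible.

```ruby
# Hypothetical stand-in for a secondary Geo node; it only counts work done.
class FakeSecondary
  attr_reader :fetch_count, :updated_refs

  def initialize
    @fetch_count = 0
    @updated_refs = []
  end

  # Stands in for `git fetch primary --force`.
  def fetch
    @fetch_count += 1
  end

  # Stands in for the per-ref update step, which no longer fetches.
  def update_ref(ref)
    @updated_refs << ref
  end
end

# Behaviour before the change: every per-ref 'push' hook triggers a fetch.
def notify_per_ref(secondary, refs)
  refs.each do |ref|
    secondary.fetch
    secondary.update_ref(ref)
  end
end

# Behaviour after the change: one 'fetch' hook per push, then per-ref updates.
def notify_per_push(secondary, refs)
  secondary.fetch
  refs.each { |ref| secondary.update_ref(ref) }
end

refs = (1..500).map { |i| "refs/heads/branch-#{i}" }

old_node = FakeSecondary.new
notify_per_ref(old_node, refs)

new_node = FakeSecondary.new
notify_per_push(new_node, refs)

puts "per-ref fetches:  #{old_node.fetch_count}"
puts "per-push fetches: #{new_node.fetch_count}"
```

The sketch runs synchronously, so it does not reproduce the ordering risk the commit message mentions: in the real setup the fetch and the per-ref updates are separate background jobs, and an update may run before the fetch has finished.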
@@ -2,4 +2,8 @@ class SystemHook < WebHook
   def async_execute(data, hook_name)
     Sidekiq::Client.enqueue(SystemHookWorker, id, data, hook_name)
   end
+  def self.fetch_hooks
+    GeoNode.where(primary: false).map(&:system_hook)
+  end
 end
module Geo
  class ScheduleRepoFetchService
    def initialize(params)
      @project_id = params[:project_id]
      @remote_url = params[:remote_url]
    end
    def execute
      GeoRepositoryFetchWorker.perform_async(@project_id, @remote_url)
    end
  end
end
class GeoRepositoryFetchWorker
  include Sidekiq::Worker
  include Gitlab::ShellAdapter
  sidekiq_options queue: 'geo_repository_update'
  def perform(project_id, clone_url)
    project = Project.find(project_id)
    project.create_repository unless project.repository_exists?
    project.repository.after_create if project.empty_repo?
    project.repository.fetch_geo_mirror(clone_url)
  end
end
@@ -5,22 +5,15 @@ class GeoRepositoryUpdateWorker
   attr_accessor :project
-  def perform(project_id, clone_url, push_data = nil)
+  def perform(project_id, _clone_url, push_data = nil)
     @project = Project.find(project_id)
     @push_data = push_data
-    fetch_repository(clone_url)
     process_hooks if push_data # we should be compatible with old unprocessed data
   end
   private
-  def fetch_repository(remote_url)
-    @project.create_repository unless @project.repository_exists?
-    @project.repository.after_create if @project.empty_repo?
-    @project.repository.fetch_geo_mirror(remote_url)
-  end
   def process_hooks
     if @push_data['type'] == 'push'
       branch = Gitlab::Git.ref_name(@push_data['ref'])
......
@@ -27,6 +27,16 @@ class PostReceive
       # Triggers repository update on secondary nodes when Geo is enabled
       Gitlab::Geo.notify_wiki_update(post_received.project) if Gitlab::Geo.enabled?
     elsif post_received.regular_project?
+      if Gitlab::Geo.enabled?
+        hook_data = {
+          project_id: post_received.project.id,
+          event_name: 'trigger_fetch',
+          remote_url: post_received.project.ssh_url_to_repo
+        }
+        SystemHooksService.new.execute_hooks(hook_data, :fetch_hooks)
+      end
       process_project_changes(post_received)
     else
       log("Triggered hook for unidentifiable repository type with full path \"#{repo_path}\"")
......
@@ -24,6 +24,9 @@ module API
         when 'key_create', 'key_destroy'
           required_attributes! %w(key id)
           ::Geo::ScheduleKeyChangeService.new(params).execute
+        when 'trigger_fetch'
+          required_attributes! %w(event_name project_id remote_url)
+          ::Geo::ScheduleRepoFetchService.new(params).execute
         when 'push'
           required_attributes! %w(event_name project_id project)
           ::Geo::ScheduleRepoUpdateService.new(params).execute
......
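
To make the payload contract of the new `trigger_fetch` event easier to see, here is a minimal sketch of the attribute check in plain Ruby. It is an illustration under assumptions, not the API code: `REQUIRED_ATTRIBUTES` and `validate_hook_payload` are hypothetical names, the required keys are copied from the `required_attributes!` calls in the hunk above, and the payload is assumed to arrive with string keys.

```ruby
# Required keys per event, copied from the `required_attributes!` calls above.
# The constant and the method below are hypothetical, for illustration only.
REQUIRED_ATTRIBUTES = {
  'trigger_fetch' => %w[event_name project_id remote_url],
  'push'          => %w[event_name project_id project]
}.freeze

# Returns [:ok, []], [:missing, [keys...]] or [:unknown_event, []].
def validate_hook_payload(params)
  required = REQUIRED_ATTRIBUTES[params['event_name']]
  return [:unknown_event, []] if required.nil?

  missing = required.reject { |key| params.key?(key) }
  missing.empty? ? [:ok, []] : [:missing, missing]
end

# Payload shaped like the hook data built in PostReceive (values are made up).
payload = {
  'event_name' => 'trigger_fetch',
  'project_id' => 42,
  'remote_url' => 'git@primary.example.com:group/project.git'
}

p validate_hook_payload(payload)
p validate_hook_payload(payload.reject { |key, _| key == 'remote_url' })
```

A `trigger_fetch` payload carries only the project ID and the remote URL, with no per-ref data, which is what lets the secondary schedule a single fetch for the whole push.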