Commit 3a1c2978 authored by Dylan Griffith's avatar Dylan Griffith

Speed up Elastic AddNewDataToMergeRequestsDocuments migration

This migration was just introduced in
https://gitlab.com/gitlab-org/gitlab/-/merge_requests/59836 .

After watching this migration running in production we can see that we
can make it run faster. The numbers here also match what we used in
`AddPermissionsDataToNotesDocuments` so should be safe.

I've also changed it to use the `ProcessInitialBookkeepingService` which
is basically just moving the work into a different queue. We have 2
queues for indexing work in Elasticsearch and the "initial" queue is for
newly created projects/imported projects. It gets much less traffic than
the so called "incremental" queue and therefore it's safer to put
backfill migrations into this queue. We actually should have used this
queue to begin with but switching it now is perfectly fine.

Changelog: changed
parent 0a448da2
---
title: Speed up Elastic AddNewDataToMergeRequestsDocuments migration
merge_request: 60563
author:
type: changed
......@@ -2,9 +2,9 @@
class AddNewDataToMergeRequestsDocuments < Elastic::Migration
batched!
throttle_delay 5.minutes
throttle_delay 3.minutes
QUERY_BATCH_SIZE = 5000
QUERY_BATCH_SIZE = 6000
UPDATE_BATCH_SIZE = 100
def migrate
......@@ -38,7 +38,7 @@ class AddNewDataToMergeRequestsDocuments < Elastic::Migration
end
document_references.each_slice(UPDATE_BATCH_SIZE) do |refs|
Elastic::ProcessBookkeepingService.track!(*refs)
Elastic::ProcessInitialBookkeepingService.track!(*refs)
end
log "Adding visibility_level field to merge_requests documents is completed for batch of #{document_references.size} documents"
......
......@@ -20,7 +20,7 @@ RSpec.describe AddNewDataToMergeRequestsDocuments, :elastic, :sidekiq_inline do
describe 'migration_options' do
it 'has migration options set', :aggregate_failures do
expect(migration.batched?).to be_truthy
expect(migration.throttle_delay).to eq(5.minutes)
expect(migration.throttle_delay).to eq(3.minutes)
end
end
......@@ -33,7 +33,7 @@ RSpec.describe AddNewDataToMergeRequestsDocuments, :elastic, :sidekiq_inline do
end
it 'does not modify data', :aggregate_failures do
expect(::Elastic::ProcessBookkeepingService).not_to receive(:track!)
expect(::Elastic::ProcessInitialBookkeepingService).not_to receive(:track!)
subject
end
......@@ -46,7 +46,7 @@ RSpec.describe AddNewDataToMergeRequestsDocuments, :elastic, :sidekiq_inline do
it 'updates all merge_request documents' do
# track calls are batched in groups of 100
expect(::Elastic::ProcessBookkeepingService).to receive(:track!).once do |*tracked_refs|
expect(::Elastic::ProcessInitialBookkeepingService).to receive(:track!).once do |*tracked_refs|
expect(tracked_refs.count).to eq(3)
end
......@@ -58,7 +58,7 @@ RSpec.describe AddNewDataToMergeRequestsDocuments, :elastic, :sidekiq_inline do
add_visibility_level_for_merge_requests(merge_requests[1..-1])
expected = [Gitlab::Elastic::DocumentReference.new(MergeRequest, merge_request.id, merge_request.es_id, merge_request.es_parent)]
expect(::Elastic::ProcessBookkeepingService).to receive(:track!).with(*expected).once
expect(::Elastic::ProcessInitialBookkeepingService).to receive(:track!).with(*expected).once
subject
end
......@@ -67,7 +67,7 @@ RSpec.describe AddNewDataToMergeRequestsDocuments, :elastic, :sidekiq_inline do
stub_const("#{described_class}::QUERY_BATCH_SIZE", 2)
stub_const("#{described_class}::UPDATE_BATCH_SIZE", 1)
expect(::Elastic::ProcessBookkeepingService).to receive(:track!).exactly(3).times.and_call_original
expect(::Elastic::ProcessInitialBookkeepingService).to receive(:track!).exactly(3).times.and_call_original
# cannot use subject in spec because it is memoized
migration.migrate
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment