Commit fdd1fda8 authored by pbair

Update BG migration docs for removed methods

Updated BG migration docs to remove examples that refer to migration
helper methods that were recently removed: `migrate_async`,
`bulk_migrate_async`, and `bulk_migrate_in`.
parent 68d1ea85
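
For readers unfamiliar with the range-based helper that the updated docs recommend in place of the removed methods, here is a simplified, self-contained sketch of the idea: split a set of IDs into inclusive ranges and give each range a progressively longer delay. `plan_batches` and its keyword arguments are hypothetical names invented for this illustration. This is not GitLab's implementation of `queue_background_migration_jobs_by_range_at_intervals`, and the delay arithmetic shown is an assumption.

```ruby
# Hypothetical helper for illustration only (not part of GitLab).
# Splits IDs into batches and pairs each batch's inclusive [min, max] range
# with a delay that grows by one interval per batch.
def plan_batches(ids, batch_size:, delay_interval:)
  ids.sort.each_slice(batch_size).each_with_index.map do |slice, index|
    { delay: delay_interval * (index + 1), args: [slice.first, slice.last] }
  end
end

# Print the schedule for a small, made-up set of IDs (delays in seconds).
plan_batches([1, 2, 3, 5, 8, 13, 21], batch_size: 3, delay_interval: 120).each do |job|
  puts "run in #{job[:delay]}s with range #{job[:args].inspect}"
end
```

Each planned job carries a `[start_id, end_id]` pair, which is the argument shape the range-based `perform(start_id, end_id)` in the diff below expects.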
@@ -83,23 +83,11 @@ replacing the class name and arguments with whatever values are necessary for
 your migration:

 ```ruby
-migrate_async('BackgroundMigrationClassName', [arg1, arg2, ...])
+migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
 ```

-Usually it's better to enqueue jobs in bulk, for this you can use
-`bulk_migrate_async`:
-
-```ruby
-bulk_migrate_async(
-  [['BackgroundMigrationClassName', [1]],
-   ['BackgroundMigrationClassName', [2]]]
-)
-```
-
-Note that this will queue a Sidekiq job immediately: if you have a large number
-of records, this may not be what you want. You can use the function
-`queue_background_migration_jobs_by_range_at_intervals` to split the job into
-batches:
+You can use the function `queue_background_migration_jobs_by_range_at_intervals`
+to automatically split the job into batches:

 ```ruby
 queue_background_migration_jobs_by_range_at_intervals(
@@ -117,16 +105,6 @@ consuming migrations it's best to schedule a background job using an
 updates. Removals in turn can be handled by simply defining foreign keys with
 cascading deletes.

-If you would like to schedule jobs in bulk with a delay, you can use
-`BackgroundMigrationWorker.bulk_perform_in`:
-
-```ruby
-jobs = [['BackgroundMigrationClassName', [1]],
-        ['BackgroundMigrationClassName', [2]]]
-
-bulk_migrate_in(5.minutes, jobs)
-```
-
 ### Rescheduling background migrations

 If one of the background migrations contains a bug that is fixed in a patch
@@ -197,53 +175,47 @@ the new format.

 ## Example

-To explain all this, let's use the following example: the table `services` has a
+To explain all this, let's use the following example: the table `integrations` has a
 field called `properties` which is stored in JSON. For all rows you want to
-extract the `url` key from this JSON object and store it in the `services.url`
-column. There are millions of services and parsing JSON is slow, thus you can't
+extract the `url` key from this JSON object and store it in the `integrations.url`
+column. There are millions of integrations and parsing JSON is slow, thus you can't
 do this in a regular migration.

 To do this using a background migration we'll start with defining our migration
 class:

 ```ruby
-class Gitlab::BackgroundMigration::ExtractServicesUrl
-  class Service < ActiveRecord::Base
-    self.table_name = 'services'
+class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
+  class Integration < ActiveRecord::Base
+    self.table_name = 'integrations'
   end

-  def perform(service_id)
-    # A row may be removed between scheduling and starting of a job, thus we
-    # need to make sure the data is still present before doing any work.
-    service = Service.select(:properties).find_by(id: service_id)
-
-    return unless service
-
-    begin
-      json = JSON.load(service.properties)
+  def perform(start_id, end_id)
+    Integration.where(id: start_id..end_id).each do |integration|
+      json = JSON.load(integration.properties)
+
+      integration.update(url: json['url']) if json['url']
     rescue JSON::ParserError
       # If the JSON is invalid we don't want to keep the job around forever,
       # instead we'll just leave the "url" field to whatever the default value
       # is.
-      return
+      next
     end
-
-    service.update(url: json['url']) if json['url']
   end
 end
 ```

 Next we'll need to adjust our code so we schedule the above migration for newly
-created and updated services. We can do this using something along the lines of
+created and updated integrations. We can do this using something along the lines of
 the following:

 ```ruby
-class Service < ActiveRecord::Base
-  after_commit :schedule_service_migration, on: :update
-  after_commit :schedule_service_migration, on: :create
+class Integration < ActiveRecord::Base
+  after_commit :schedule_integration_migration, on: :update
+  after_commit :schedule_integration_migration, on: :create

-  def schedule_service_migration
-    BackgroundMigrationWorker.perform_async('ExtractServicesUrl', [id])
+  def schedule_integration_migration
+    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
   end
 end
 ```
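
The hunk above changes `perform` from a single-ID signature to a range-based one, and the model callback now passes `[id, id]` to cover exactly one row. The following is a minimal sketch of that pattern using plain Ruby objects instead of ActiveRecord, so it can be run on its own. `Row` and `extract_urls` are hypothetical stand-ins invented for this illustration, and `JSON.parse` is used here in place of the `JSON.load` call in the documented example.

```ruby
require 'json'

# Hypothetical in-memory stand-in for a table row (the docs use ActiveRecord).
Row = Struct.new(:id, :properties, :url)

# Processes every row whose ID falls in the inclusive range start_id..end_id.
# A row with invalid JSON is skipped so one bad record does not fail the batch.
def extract_urls(rows, start_id, end_id)
  rows.select { |row| (start_id..end_id).cover?(row.id) }.each do |row|
    json = JSON.parse(row.properties)
    row.url = json['url'] if json['url']
  rescue JSON::ParserError
    # Leave the url at its default value and move on to the next row.
    next
  end
end

rows = [
  Row.new(1, '{"url": "https://a.example"}'),
  Row.new(2, 'not json'),
  Row.new(3, '{"url": "https://c.example"}')
]

extract_urls(rows, 1, 2) # batch covering IDs 1 and 2
extract_urls(rows, 3, 3) # single-row range, analogous to passing [id, id]
```

Because the range is inclusive, using the same ID for both bounds lets one method serve both the batched post-deployment scheduling and the per-record callback.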
@@ -253,21 +225,20 @@ before the transaction completes as doing so can lead to race conditions where
 the changes are not yet visible to the worker.

 Next we'll need a post-deployment migration that schedules the migration for
-existing data. Since we're dealing with a lot of rows we'll schedule jobs in
-batches instead of doing this one by one:
+existing data.

 ```ruby
-class ScheduleExtractServicesUrl < Gitlab::Database::Migration[1.0]
+class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
   disable_ddl_transaction!

-  def up
-    define_batchable_model('services').select(:id).in_batches do |relation|
-      jobs = relation.pluck(:id).map do |id|
-        ['ExtractServicesUrl', [id]]
-      end
-
-      BackgroundMigrationWorker.bulk_perform_async(jobs)
-    end
+  MIGRATION = 'ExtractIntegrationsUrl'
+  DELAY_INTERVAL = 2.minutes
+
+  def up
+    queue_background_migration_jobs_by_range_at_intervals(
+      define_batchable_model('integrations'),
+      MIGRATION,
+      DELAY_INTERVAL)
   end

   def down
@@ -284,18 +255,18 @@ jobs and manually run on any un-migrated rows. Such a migration would look like
 this:

 ```ruby
-class ConsumeRemainingExtractServicesUrlJobs < Gitlab::Database::Migration[1.0]
+class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
   disable_ddl_transaction!

   def up
     # This must be included
-    Gitlab::BackgroundMigration.steal('ExtractServicesUrl')
+    Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')

     # This should be included, but can be skipped - see below
-    define_batchable_model('services').where(url: nil).each_batch(of: 50) do |batch|
+    define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
       range = batch.pluck('MIN(id)', 'MAX(id)').first

-      Gitlab::BackgroundMigration::ExtractServicesUrl.new.perform(*range)
+      Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
     end
   end
@@ -313,9 +284,9 @@ If the application does not depend on the data being 100% migrated (for
 instance, the data is advisory, and not mission-critical), then this final step
 can be skipped.

-This migration will then process any jobs for the ExtractServicesUrl migration
+This migration will then process any jobs for the ExtractIntegrationsUrl migration
 and continue once all jobs have been processed. Once done you can safely remove
-the `services.properties` column.
+the `integrations.properties` column.

 ## Testing
......