Commit 7de59ad6 authored by Bob Van Landuyt's avatar Bob Van Landuyt

Document adding context to jobs

This adds the documentation on how to add context to cron-workers or
jobs scheduled in batch.
parent 86996ee3
......@@ -13,7 +13,7 @@ module MergeRequests
diffs.each_batch(of: BATCH_SIZE) do |relation, index|
ids = relation.pluck_primary_key.map { |id| [id] }
DeleteDiffFilesWorker.bulk_perform_in(index * 5.minutes, ids)
DeleteDiffFilesWorker.bulk_perform_in(index * 5.minutes, ids) # rubocop:disable Scalability/BulkPerformWithContext
end
end
end
......
......@@ -276,6 +276,125 @@ class SomeCrossCuttingConcernWorker
end
```
## Worker context
To have some more information about workers in the logs, we add
[metadata to the jobs in the form of an
`ApplicationContext`](logging.md#logging-context-metadata-through-rails-or-grape-requests).
In most cases, when scheduling a job from a request, this context will
already be deducted from the request and added to the scheduled
job.
When a job runs, the context that was active when it was scheduled
will be restored. This causes the context to be propagated to any job
scheduled from within the running job.
All this means that in most cases, to add context to jobs, we don't
need to do anything.
There are however some instances when there would be no context
present when the job is scheduled, or the context that is present is
likely to be incorrect. For these instances we've added rubocop-rules
to draw attention and avoid incorrect metadata in our logs.
As with most our cops, there are perfectly valid reasons for disabling
them. In this case it could be that the context from the request is
correct. Or maybe you've specified a context already in a way that
isn't picked up by the cops. In any case, please leave a code-comment
pointing to which context will be used when disabling the cops.
When you do provide objects to the context, please make sure that the
route for namespaces and projects is preloaded. This can be done using
the `.with_route` scope defined on all `Routable`s.
### Cron-Workers
The context is automatically cleared for workers in the cronjob-queue
(which `include CronjobQueue`), even when scheduling them from
requests. We do this to avoid incorrect metadata when other jobs are
scheduled from the cron-worker.
Cron-Workers themselves run instance wide, so they aren't scoped to
users, namespaces, projects or other resources that should be added to
the context.
However, they often schedule other jobs that _do_ require context.
That is why there needs to be an indication of context somewhere in
the worker. This can be done by using one of the following methods
somewhere within the worker:
1. Wrap the code that schedules jobs in the `with_context` helper:
```ruby
def perform
deletion_cutoff = Gitlab::CurrentSettings
.deletion_adjourned_period.days.ago.to_date
projects = Project.with_route.with_namespace
.aimed_for_deletion(deletion_cutoff)
projects.find_each(batch_size: 100).with_index do |project, index|
delay = index * INTERVAL
with_context(project: project) do
AdjournedProjectDeletionWorker.perform_in(delay, project.id)
end
end
end
```
1. Use the a batch scheduling method that provides context:
```ruby
def schedule_projects_in_batch(projects)
ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
projects,
arguments_proc: -> (project) { project.id },
context_proc: -> (project) { { project: project } }
)
end
```
or when scheduling with delays:
```ruby
diffs.each_batch(of: BATCH_SIZE) do |diffs, index|
DeleteDiffFilesWorker
.bulk_perform_in_with_contexts(index * 5.minutes,
diffs,
arguments_proc: -> (diff) { diff.id },
context_proc: -> (diff) { { project: diff.merge_request.target_project } })
end
```
### Jobs scheduled in bulk
Often, when scheduling jobs in bulk, these jobs should have a separate
context rather than the overarching context.
If that is the case, `bulk_perform_async` can be replaced by the
`bulk_perform_async_with_context` helper, and instead of
`bulk_perform_in` use `bulk_perform_in_with_context`.
For example:
```ruby
ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
projects,
arguments_proc: -> (project) { project.id },
context_proc: -> (project) { { project: project } }
)
```
Each object from the enumerable in the first argument is yielded into 2
blocks:
The `arguments_proc` which needs to return the list of arguments the
job needs to be scheduled with.
The `context_proc` which needs to return a hash with the context
information for the job.
## Tests
Each Sidekiq worker must be tested using RSpec, just like any other class. These
......
......@@ -24,16 +24,22 @@ module RuboCop
MSG
def_node_matcher :schedules_in_batch_without_context?, <<~PATTERN
(send (...) {:bulk_perform_async :bulk_perform_in} (...))
(send (...) {:bulk_perform_async :bulk_perform_in} _*)
PATTERN
def on_send(node)
return if in_migration?(node)
return if in_migration?(node) || in_spec?(node)
return unless schedules_in_batch_without_context?(node)
return if name_of_receiver(node) == "BackgroundMigrationWorker"
add_offense(node, location: :expression)
end
private
def in_spec?(node)
file_path_for_node(node).end_with?("_spec.rb")
end
end
end
end
......
......@@ -21,7 +21,10 @@ describe RuboCop::Cop::Scalability::BulkPerformWithContext do
it "adds an offense when calling bulk_perform_in" do
inspect_source(<<~CODE.strip_indent)
Worker.bulk_perform_in(args)
diffs.each_batch(of: BATCH_SIZE) do |relation, index|
ids = relation.pluck_primary_key.map { |id| [id] }
DeleteDiffFilesWorker.bulk_perform_in(index * 5.minutes, ids)
end
CODE
expect(cop.offenses.size).to eq(1)
......@@ -37,6 +40,16 @@ describe RuboCop::Cop::Scalability::BulkPerformWithContext do
expect(cop.offenses.size).to eq(0)
end
it "does not add an offence for specs" do
allow(cop).to receive(:in_spec?).and_return(true)
inspect_source(<<~CODE.strip_indent)
Worker.bulk_perform_in(args)
CODE
expect(cop.offenses.size).to eq(0)
end
it "does not add an offense for scheduling BackgroundMigrations" do
inspect_source(<<~CODE.strip_indent)
BackgroundMigrationWorker.bulk_perform_in(args)
......
......@@ -38,7 +38,7 @@ describe ApplicationWorker do
describe '.bulk_perform_async' do
it 'enqueues jobs in bulk' do
Sidekiq::Testing.fake! do
worker.bulk_perform_async([['Foo', [1]], ['Foo', [2]]]) # rubocop:disable Scalability/BulkPerformWithContext
worker.bulk_perform_async([['Foo', [1]], ['Foo', [2]]])
expect(worker.jobs.count).to eq 2
expect(worker.jobs).to all(include('enqueued_at'))
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment