Document adding context to jobs

This adds the documentation on how to add context to cron-workers or jobs scheduled in batch.

Commit 7de59ad6 authored Jan 30, 2020 by Bob Van Landuyt
5 changed files
--- a/app/services/merge_requests/delete_non_latest_diffs_service.rb
+++ b/app/services/merge_requests/delete_non_latest_diffs_service.rb
@@ -13,7 +13,7 @@ module MergeRequests

      diffs.each_batch(of: BATCH_SIZE) do |relation, index|
        ids = relation.pluck_primary_key.map { |id| [id] }
-        DeleteDiffFilesWorker.bulk_perform_in(index * 5.minutes, ids)
+        DeleteDiffFilesWorker.bulk_perform_in(index * 5.minutes, ids) # rubocop:disable Scalability/BulkPerformWithContext
      end
    end
  end

--- a/doc/development/sidekiq_style_guide.md
+++ b/doc/development/sidekiq_style_guide.md
@@ -276,6 +276,125 @@ class SomeCrossCuttingConcernWorker
 end
 ```

+## Worker context
+
+To get more information about workers in the logs, we add
+[metadata to the jobs in the form of an
+`ApplicationContext`](logging.md#logging-context-metadata-through-rails-or-grape-requests).
+In most cases, when a job is scheduled from a request, this context is
+already deduced from the request and added to the scheduled
+job.
+
+When a job runs, the context that was active when it was scheduled
+will be restored. This causes the context to be propagated to any job
+scheduled from within the running job.
+
+All this means that in most cases, to add context to jobs, we don't
+need to do anything.
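+
+The following plain-Ruby sketch models this behavior. The active context
+is captured when a job is scheduled and restored when the job runs, so a
+job scheduled from inside that job inherits it. `ContextSketch` and
+`FakeQueue` are hypothetical names used only for illustration. They are
+not GitLab or Sidekiq APIs, and the real implementation may differ.
+
+```ruby
+# Hypothetical model of context propagation; not GitLab's ApplicationContext.
+module ContextSketch
+  # The context active on the current thread (empty by default).
+  def self.current
+    Thread.current[:sketch_context] ||= {}
+  end
+
+  # Runs the block with `attrs` merged into the current context and
+  # restores the previous context afterwards, even if the block raises.
+  def self.with_context(attrs)
+    previous = current
+    Thread.current[:sketch_context] = previous.merge(attrs)
+    yield
+  ensure
+    Thread.current[:sketch_context] = previous
+  end
+end
+
+# Hypothetical in-memory queue standing in for Sidekiq.
+module FakeQueue
+  JOBS = []
+
+  # Scheduling stores a snapshot of the context that is active right now.
+  def self.schedule(worker, *args)
+    JOBS << { 'class' => worker, 'args' => args, 'context' => ContextSketch.current.dup }
+  end
+
+  # Running a job restores its stored context around the job body.
+  def self.run_next
+    job = JOBS.shift
+    ContextSketch.with_context(job['context']) { yield job }
+  end
+end
+
+# A request-like scope schedules the parent job.
+ContextSketch.with_context(user: 'alice', project: 'group/app') do
+  FakeQueue.schedule('ParentWorker', 1)
+end
+
+# The parent job schedules a child without passing any context explicitly.
+FakeQueue.run_next do |job|
+  FakeQueue.schedule('ChildWorker', job['args'].first + 1)
+end
+
+child = FakeQueue::JOBS.first
+puts child['context'][:user] # prints "alice"
+```
+
+Because the child job's context comes from the snapshot stored with the
+parent job, the child does not need to know about the request that
+started the chain.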
+
+There are, however, some instances where no context is present when
+the job is scheduled, or where the context that is present is likely
+to be incorrect. For these instances, we've added RuboCop rules to
+draw attention to the problem and avoid incorrect metadata in our logs.
+
+As with most of our cops, there are perfectly valid reasons for
+disabling them. In this case, the context from the request might be
+correct, or you might have specified a context in a way that isn't
+picked up by the cops. In any case, when disabling the cops, leave a
+code comment explaining which context will be used.
+
+When you do provide objects to the context, please make sure that the
+route for namespaces and projects is preloaded. This can be done using
+the `.with_route` scope defined on all `Routable`s.
+
+### Cron-Workers
+
+The context is automatically cleared for workers in the cronjob queue
+(those that `include CronjobQueue`), even when they are scheduled from
+requests. We do this to avoid incorrect metadata when other jobs are
+scheduled from the cron worker.
+
+Cron workers themselves run instance-wide, so they aren't scoped to
+users, namespaces, projects, or other resources that should be added to
+the context.
+
+However, they often schedule other jobs that _do_ require context.
+
+That is why there needs to be an indication of context somewhere in
+the worker. This can be done by using one of the following methods
+within the worker:
+
+1. Wrap the code that schedules jobs in the `with_context` helper:
+
+   ```ruby
+     def perform
+       deletion_cutoff = Gitlab::CurrentSettings
+                           .deletion_adjourned_period.days.ago.to_date
+       projects = Project.with_route.with_namespace
+                    .aimed_for_deletion(deletion_cutoff)
+
+       projects.find_each(batch_size: 100).with_index do |project, index|
+         delay = index * INTERVAL
+
+         with_context(project: project) do
+           AdjournedProjectDeletionWorker.perform_in(delay, project.id)
+         end
+       end
+     end
+   ```
+
+1. Use a batch scheduling method that provides context:
+
+   ```ruby
+     def schedule_projects_in_batch(projects)
+       ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
+         projects,
+         arguments_proc: -> (project) { project.id },
+         context_proc: -> (project) { { project: project } }
+       )
+     end
+   ```
+
+   Or, when scheduling with delays:
+
+   ```ruby
+     diffs.each_batch(of: BATCH_SIZE) do |relation, index|
+       DeleteDiffFilesWorker
+         .bulk_perform_in_with_contexts(index * 5.minutes,
+                                        relation,
+                                        arguments_proc: -> (diff) { diff.id },
+                                        context_proc: -> (diff) { { project: diff.merge_request.target_project } })
+     end
+   ```
+
+### Jobs scheduled in bulk
+
+Often, when scheduling jobs in bulk, each job should have its own
+context rather than the overarching context.
+
+If that is the case, `bulk_perform_async` can be replaced by the
+`bulk_perform_async_with_contexts` helper, and `bulk_perform_in` by
+`bulk_perform_in_with_contexts`.
+
+For example:
+
+```ruby
+ProjectImportScheduleWorker.bulk_perform_async_with_contexts(
+  projects,
+  arguments_proc: -> (project) { project.id },
+  context_proc: -> (project) { { project: project } }
+)
+```
+
+Each object from the enumerable in the first argument is passed to two
+lambdas:
+
+1. `arguments_proc`, which needs to return the list of arguments the
+   job needs to be scheduled with.
+1. `context_proc`, which needs to return a hash with the context
+   information for the job.
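+
+As a rough model of what these helpers do with the two lambdas, the
+sketch below builds one job per object, each carrying its own context.
+The method `bulk_jobs_with_contexts` and the `FakeProject` struct are
+hypothetical, and wrapping a single returned value in an array is an
+assumption made for the sketch. The real helpers enqueue the jobs
+rather than returning them.
+
+```ruby
+# Hypothetical stand-in for a project record.
+FakeProject = Struct.new(:id, :full_path)
+
+# Hypothetical model: each object is passed to both lambdas, producing one
+# job hash per object with its own arguments and context.
+def bulk_jobs_with_contexts(objects, arguments_proc:, context_proc:)
+  objects.map do |object|
+    args = arguments_proc.call(object)
+    {
+      # Assumption: a single returned value is wrapped into an argument list.
+      'args' => args.is_a?(Array) ? args : [args],
+      'context' => context_proc.call(object)
+    }
+  end
+end
+
+projects = [FakeProject.new(1, 'group/a'), FakeProject.new(2, 'group/b')]
+
+jobs = bulk_jobs_with_contexts(
+  projects,
+  arguments_proc: ->(project) { project.id },
+  context_proc: ->(project) { { project: project.full_path } }
+)
+
+# prints "[1] group/a" and then "[2] group/b"
+jobs.each { |job| puts "#{job['args'].inspect} #{job['context'][:project]}" }
+```
+
+Keeping arguments and context in separate lambdas lets each job get a
+context that matches its own object instead of inheriting the single
+context that was active when the batch was scheduled.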
+
 ## Tests

 Each Sidekiq worker must be tested using RSpec, just like any other class. These

--- a/rubocop/cop/scalability/bulk_perform_with_context.rb
+++ b/rubocop/cop/scalability/bulk_perform_with_context.rb
@@ -24,16 +24,22 @@ module RuboCop
        MSG

        def_node_matcher :schedules_in_batch_without_context?, <<~PATTERN
-          (send (...) {:bulk_perform_async :bulk_perform_in} (...))
+          (send (...) {:bulk_perform_async :bulk_perform_in} _*)
        PATTERN

        def on_send(node)
-          return if in_migration?(node)
+          return if in_migration?(node) || in_spec?(node)
          return unless schedules_in_batch_without_context?(node)
          return if name_of_receiver(node) == "BackgroundMigrationWorker"

          add_offense(node, location: :expression)
        end
+
+        private
+
+        def in_spec?(node)
+          file_path_for_node(node).end_with?("_spec.rb")
+        end
      end
    end
  end

--- a/spec/rubocop/cop/scalability/bulk_perform_with_context_spec.rb
+++ b/spec/rubocop/cop/scalability/bulk_perform_with_context_spec.rb
@@ -21,7 +21,10 @@ describe RuboCop::Cop::Scalability::BulkPerformWithContext do

  it "adds an offense when calling bulk_perform_in" do
    inspect_source(<<~CODE.strip_indent)
-      Worker.bulk_perform_in(args)
+      diffs.each_batch(of: BATCH_SIZE) do |relation, index|
+        ids = relation.pluck_primary_key.map { |id| [id] }
+        DeleteDiffFilesWorker.bulk_perform_in(index * 5.minutes, ids)
+      end
    CODE

    expect(cop.offenses.size).to eq(1)
@@ -37,6 +40,16 @@ describe RuboCop::Cop::Scalability::BulkPerformWithContext do
    expect(cop.offenses.size).to eq(0)
  end

+  it "does not add an offense for specs" do
+    allow(cop).to receive(:in_spec?).and_return(true)
+
+    inspect_source(<<~CODE.strip_indent)
+      Worker.bulk_perform_in(args)
+    CODE
+
+    expect(cop.offenses.size).to eq(0)
+  end
+
  it "does not add an offense for scheduling BackgroundMigrations" do
    inspect_source(<<~CODE.strip_indent)
      BackgroundMigrationWorker.bulk_perform_in(args)

--- a/spec/workers/concerns/application_worker_spec.rb
+++ b/spec/workers/concerns/application_worker_spec.rb
@@ -38,7 +38,7 @@ describe ApplicationWorker do
  describe '.bulk_perform_async' do
    it 'enqueues jobs in bulk' do
      Sidekiq::Testing.fake! do
-        worker.bulk_perform_async([['Foo', [1]], ['Foo', [2]]]) # rubocop:disable Scalability/BulkPerformWithContext
+        worker.bulk_perform_async([['Foo', [1]], ['Foo', [2]]])

        expect(worker.jobs.count).to eq 2
        expect(worker.jobs).to all(include('enqueued_at'))