Optimize query for loading artifacts in pipeline

Previously, to query the latest artifacts for a given pipeline, we included the project ID in the query as an extra safeguard. However, including the project ID may cause the PostgreSQL query planner to use the wrong index. For example, it may use `index_ci_job_artifacts_on_project_id`, which is slow because there may be thousands of artifacts for a given project.

To optimize this, we can omit the project ID, since we only care about job artifacts for a given build.

Relates to https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4674

Changelog: performance
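To illustrate why dropping the filter should not change the result, here is a minimal plain-Ruby sketch of the "latest artifact ID per file type" selection. It does not use ActiveRecord; the `Artifact` struct, the sample rows, and the `latest_ids_by_file_type` helper are all hypothetical. It assumes every job in a pipeline belongs to the pipeline's own project, which is what makes the project ID filter redundant.

```ruby
# Hypothetical in-memory stand-in for rows of ci_job_artifacts.
# The real code uses ActiveRecord; these values are invented for illustration.
Artifact = Struct.new(:id, :project_id, :job_id, :file_type)

ARTIFACTS = [
  Artifact.new(1, 10, 100, :junit),
  Artifact.new(2, 10, 100, :coverage),
  Artifact.new(3, 10, 101, :junit),
  Artifact.new(4, 10, 200, :junit), # same project, job from another pipeline
  Artifact.new(5, 11, 300, :junit)  # different project
].freeze

# Mimics max(ci_job_artifacts.id) ... GROUP BY file_type over artifacts
# that are already restricted to the pipeline's jobs. The optional
# project_id argument models the filter that this commit removes.
def latest_ids_by_file_type(artifacts, job_ids, project_id: nil)
  scoped = artifacts.select { |a| job_ids.include?(a.job_id) }
  scoped = scoped.select { |a| a.project_id == project_id } if project_id
  scoped.group_by(&:file_type).transform_values { |rows| rows.map(&:id).max }
end

# Assumption: jobs 100 and 101 make up the pipeline, all in project 10.
pipeline_job_ids = [100, 101]

with_project    = latest_ids_by_file_type(ARTIFACTS, pipeline_job_ids, project_id: 10)
without_project = latest_ids_by_file_type(ARTIFACTS, pipeline_job_ids)

# Both variants select the same artifact IDs under the stated assumption.
puts with_project == without_project
```

In the sketch, restricting by job already excludes artifacts 4 and 5, so the extra project condition only adds a predicate that the planner might choose to satisfy with the project index.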

Commit e386fabe authored May 20, 2021 by Stan Hu
Showing with 5 additions and 6 deletions

app/models/ci/pipeline.rb +0 -6

changelogs/unreleased/sh-optimize-artifact-loading-mr.yml +5 -0

--- a/app/models/ci/pipeline.rb
+++ b/app/models/ci/pipeline.rb
@@ -660,15 +660,9 @@ module Ci
    # Return a hash of file type => array of 1 job artifact
    def latest_report_artifacts
      ::Gitlab::SafeRequestStore.fetch("pipeline:#{self.id}:latest_report_artifacts") do
-        # Note we use read_attribute(:project_id) to read the project
-        # ID instead of self.project_id. The latter appears to load
-        # the Project model. This extra filter doesn't appear to
-        # affect query plan but included to ensure we don't leak the
-        # wrong informaiton.
        ::Ci::JobArtifact.where(
          id: job_artifacts.with_reports
            .select('max(ci_job_artifacts.id) as id')
-            .where(project_id: self.read_attribute(:project_id))
            .group(:file_type)
        )
          .preload(:job)

--- a/changelogs/unreleased/sh-optimize-artifact-loading-mr.yml
+++ b/changelogs/unreleased/sh-optimize-artifact-loading-mr.yml
+---
+title: Optimize query for loading artifacts in pipeline
+merge_request: 62249
+author:
+type: performance