Commit e386fabe authored by Stan Hu

Optimize query for loading artifacts in pipeline

Previously, to query the latest artifacts for a given pipeline, we
included the project ID in the query as an extra safeguard. However,
including the project ID may cause the PostgreSQL query planner to
choose the wrong index. For example, it may use
`index_ci_job_artifacts_on_project_id`, which is slow because there may
be thousands of artifacts for a given project.

To optimize this, we can omit the project ID, since we only care about
the job artifacts for a given build.

Relates to
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/4674

Changelog: performance
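
The reasoning in the commit message can be illustrated with a small plain-Ruby sketch. It is not the GitLab implementation: the `Artifact` struct, the sample rows, and the `latest_report_artifact_ids` helper are all hypothetical stand-ins for the `ci_job_artifacts` table and the `max(id) ... GROUP BY file_type` subquery, and the `with_reports` scope is left out. Under the assumption that every build in a pipeline belongs to the pipeline's project, filtering by project ID on top of the build scope selects the same rows, so dropping it should not change the result.

```ruby
# Hypothetical in-memory stand-in for rows of ci_job_artifacts.
Artifact = Struct.new(:id, :project_id, :job_id, :file_type)

ARTIFACTS = [
  Artifact.new(1, 10, 100, "junit"),
  Artifact.new(2, 10, 100, "codequality"),
  Artifact.new(3, 10, 101, "junit"),
  Artifact.new(4, 10, 200, "junit"), # same project, build from another pipeline
  Artifact.new(5, 20, 300, "junit")  # different project
].freeze

# Assumption: the pipeline owns builds 100 and 101, both in project 10.
PIPELINE_JOB_IDS = [100, 101].freeze
PIPELINE_PROJECT_ID = 10

# Mimics "max(id) ... GROUP BY file_type" over the pipeline's artifacts.
# When project_id is given, it is applied as an additional filter,
# like the removed `.where(project_id: ...)` clause.
def latest_report_artifact_ids(rows, job_ids, project_id: nil)
  scoped = rows.select { |a| job_ids.include?(a.job_id) }
  scoped = scoped.select { |a| a.project_id == project_id } if project_id
  scoped.group_by(&:file_type).map { |_type, group| group.map(&:id).max }.sort
end

with_filter = latest_report_artifact_ids(
  ARTIFACTS, PIPELINE_JOB_IDS, project_id: PIPELINE_PROJECT_ID
)
without_filter = latest_report_artifact_ids(ARTIFACTS, PIPELINE_JOB_IDS)

puts "with project filter:    #{with_filter.inspect}"
puts "without project filter: #{without_filter.inspect}"
```

In this sketch both calls pick the same artifact IDs. The difference the commit targets is in how PostgreSQL plans the real query, which an in-memory example cannot show; comparing `EXPLAIN` output for both query shapes would be the way to check that.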
parent 7f9cd804
@@ -660,15 +660,9 @@ module Ci
     # Return a hash of file type => array of 1 job artifact
     def latest_report_artifacts
       ::Gitlab::SafeRequestStore.fetch("pipeline:#{self.id}:latest_report_artifacts") do
-        # Note we use read_attribute(:project_id) to read the project
-        # ID instead of self.project_id. The latter appears to load
-        # the Project model. This extra filter doesn't appear to
-        # affect query plan but included to ensure we don't leak the
-        # wrong informaiton.
         ::Ci::JobArtifact.where(
           id: job_artifacts.with_reports
             .select('max(ci_job_artifacts.id) as id')
-            .where(project_id: self.read_attribute(:project_id))
             .group(:file_type)
         )
           .preload(:job)
......
---
title: Optimize query for loading artifacts in pipeline
merge_request: 62249
author:
type: performance