Update composite pipelines index to include "id"

This updates the composite index on ci_pipelines (project_id, ref, status) to also include the "id" column at the end. Adding this column to the index drastically improves the performance of queries used for getting the latest pipeline for a particular branch. For example, on project dashboards we'll run a query like the following: SELECT ci_pipelines.* FROM ci_pipelines WHERE ci_pipelines.project_id = 13083 AND ci_pipelines.ref = 'master' AND ci_pipelines.status = 'success' ORDER BY ci_pipelines.id DESC LIMIT 1; Limit (cost=0.43..58.88 rows=1 width=224) (actual time=26.956..26.956 rows=1 loops=1) Buffers: shared hit=6544 dirtied=16 -> Index Scan Backward using ci_pipelines_pkey on ci_pipelines (cost=0.43..830922.89 rows=14216 width=224) (actual time=26.954..26.954 rows=1 loops=1) Filter: ((project_id = 13083) AND ((ref)::text = 'master'::text) AND ((status)::text = 'success'::text)) Rows Removed by Filter: 6476 Buffers: shared hit=6544 dirtied=16 Planning time: 1.484 ms Execution time: 27.000 ms Because of the lack of "id" in the index we end up scanning over the primary key index, then applying a filter to filter out any remaining rows. The more pipelines a GitLab instance has the slower this will get. By adding "id" to the mentioned composite index we can change the above plan into the following: Limit (cost=0.56..2.01 rows=1 width=224) (actual time=0.034..0.034 rows=1 loops=1) Buffers: shared hit=5 -> Index Scan Backward using yorick_test on ci_pipelines (cost=0.56..16326.37 rows=11243 width=224) (actual time=0.033..0.033 rows=1 loops=1) Index Cond: ((project_id = 13083) AND ((ref)::text = 'master'::text) AND ((status)::text = 'success'::text)) Buffers: shared hit=5 Planning time: 0.695 ms Execution time: 0.061 ms This in turn leads to a best-case improvement of roughly 25 milliseconds, give or take a millisecond or two.

Update composite pipelines index to include "id"
This updates the composite index on ci_pipelines (project_id, ref, status) to also include the "id" column at the end. Adding this column to the index drastically improves the performance of queries used for getting the latest pipeline for a particular branch. For example, on project dashboards we'll run a query like the following: SELECT ci_pipelines.* FROM ci_pipelines WHERE ci_pipelines.project_id = 13083 AND ci_pipelines.ref = 'master' AND ci_pipelines.status = 'success' ORDER BY ci_pipelines.id DESC LIMIT 1; Limit (cost=0.43..58.88 rows=1 width=224) (actual time=26.956..26.956 rows=1 loops=1) Buffers: shared hit=6544 dirtied=16 -> Index Scan Backward using ci_pipelines_pkey on ci_pipelines (cost=0.43..830922.89 rows=14216 width=224) (actual time=26.954..26.954 rows=1 loops=1) Filter: ((project_id = 13083) AND ((ref)::text = 'master'::text) AND ((status)::text = 'success'::text)) Rows Removed by Filter: 6476 Buffers: shared hit=6544 dirtied=16 Planning time: 1.484 ms Execution time: 27.000 ms Because of the lack of "id" in the index we end up scanning over the primary key index, then applying a filter to filter out any remaining rows. The more pipelines a GitLab instance has the slower this will get. By adding "id" to the mentioned composite index we can change the above plan into the following: Limit (cost=0.56..2.01 rows=1 width=224) (actual time=0.034..0.034 rows=1 loops=1) Buffers: shared hit=5 -> Index Scan Backward using yorick_test on ci_pipelines (cost=0.56..16326.37 rows=11243 width=224) (actual time=0.033..0.033 rows=1 loops=1) Index Cond: ((project_id = 13083) AND ((ref)::text = 'master'::text) AND ((status)::text = 'success'::text)) Buffers: shared hit=5 Planning time: 0.695 ms Execution time: 0.061 ms This in turn leads to a best-case improvement of roughly 25 milliseconds, give or take a millisecond or two.
aafe5c12 · Yorick Peterse · 9024875e · aafe5c12 · aafe5c12 · aafe5c12
Commit aafe5c12 authored Nov 21, 2017 by Yorick Peterse
3 changed files
--- a/changelogs/unreleased/fix-ci-pipelines-index.yml
+++ b/changelogs/unreleased/fix-ci-pipelines-index.yml
+---
+title: Update composite pipelines index to include "id"
+merge_request:
+author:
+type: performance
--- a/db/migrate/20171121144800_ci_pipelines_index_on_project_id_ref_status_id.rb
+++ b/db/migrate/20171121144800_ci_pipelines_index_on_project_id_ref_status_id.rb
+# See http://doc.gitlab.com/ce/development/migration_style_guide.html
+# for more information on how to write migrations for GitLab.
+
+class CiPipelinesIndexOnProjectIdRefStatusId < ActiveRecord::Migration
+  include Gitlab::Database::MigrationHelpers
+
+  # Set this constant to true if this migration requires downtime.
+  DOWNTIME = false
+
+  disable_ddl_transaction!
+
+  TABLE = :ci_pipelines
+  OLD_COLUMNS = %i[project_id ref status].freeze
+  NEW_COLUMNS = %i[project_id ref status id].freeze
+
+  def up
+    unless index_exists?(TABLE, NEW_COLUMNS)
+      add_concurrent_index(TABLE, NEW_COLUMNS)
+    end
+
+    if index_exists?(TABLE, OLD_COLUMNS)
+      remove_concurrent_index(TABLE, OLD_COLUMNS)
+    end
+  end
+
+  def down
+    unless index_exists?(TABLE, OLD_COLUMNS)
+      add_concurrent_index(TABLE, OLD_COLUMNS)
+    end
+
+    if index_exists?(TABLE, NEW_COLUMNS)
+      remove_concurrent_index(TABLE, NEW_COLUMNS)
+    end
+  end
+end
--- a/db/schema.rb
+++ b/db/schema.rb
@@ -11,7 +11,7 @@
 #
 # It's strongly recommended that you check this file into your version control system.

-ActiveRecord::Schema.define(version: 20171114162227) do
+ActiveRecord::Schema.define(version: 20171121144800) do

  # These are extensions that must be enabled in order to support this database
  enable_extension "plpgsql"
@@ -382,7 +382,7 @@ ActiveRecord::Schema.define(version: 20171114162227) do

  add_index "ci_pipelines", ["auto_canceled_by_id"], name: "index_ci_pipelines_on_auto_canceled_by_id", using: :btree
  add_index "ci_pipelines", ["pipeline_schedule_id"], name: "index_ci_pipelines_on_pipeline_schedule_id", using: :btree
-  add_index "ci_pipelines", ["project_id", "ref", "status"], name: "index_ci_pipelines_on_project_id_and_ref_and_status", using: :btree
+  add_index "ci_pipelines", ["project_id", "ref", "status", "id"], name: "index_ci_pipelines_on_project_id_and_ref_and_status_and_id", using: :btree
  add_index "ci_pipelines", ["project_id", "sha"], name: "index_ci_pipelines_on_project_id_and_sha", using: :btree
  add_index "ci_pipelines", ["project_id"], name: "index_ci_pipelines_on_project_id", using: :btree
  add_index "ci_pipelines", ["status"], name: "index_ci_pipelines_on_status", using: :btree