Commit 9189bee5 authored by Sean McGivern's avatar Sean McGivern

Merge branch 'in-optimization-with-sql-expressions' into 'master'

Extend in-operator optimziation ORDER BY support

See merge request gitlab-org/gitlab!76931
parents 3719e8ca 5a640061
......@@ -589,6 +589,87 @@ LIMIT 20
NOTE:
To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
#### Using calculated ORDER BY expression
The following example orders epic records by the duration between the creation time and closed
time. It is calculated with the following formula:
```sql
SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at) FROM epics
```
The query above returns the duration in seconds (`double precision`) between the two timestamp
columns in seconds. To order the records by this expression, you must reference it
in the `ORDER BY` clause:
```sql
SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at)
FROM epics
ORDER BY EXTRACT('epoch' FROM epics.closed_at - epics.created_at) DESC
```
To make this ordering efficient on the group-level with the in-operator optimization, use a
custom `ORDER BY` configuration. Since the duration is not a distinct value (no unique index
present), you must add a tie-breaker column (`id`).
The following example shows the final `ORDER BY` clause:
```sql
ORDER BY extract('epoch' FROM epics.closed_at - epics.created_at) DESC, epics.id DESC
```
Snippet for loading records ordered by the calcualted duration:
```ruby
arel_table = Epic.arel_table
order = Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'duration_in_seconds',
order_expression: Arel.sql('EXTRACT(EPOCH FROM epics.closed_at - epics.created_at)').desc,
distinct: false,
sql_type: 'double precision' # important for calculated SQL expressions
),
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id',
order_expression: arel_table[:id].desc
)
])
records = Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
scope: Epic.where.not(closed_at: nil).reorder(order), # filter out NULL values
array_scope: Group.find(9970).self_and_descendants.select(:id),
array_mapping_scope: -> (id_expression) { Epic.where(Epic.arel_table[:group_id].eq(id_expression)) }
).execute.limit(20)
puts records.pluck(:duration_in_seconds, :id) # other columnns are not available
```
Building the query requires quite a bit of configuration. For the order configuration you
can find more information within the
[complex order configuration](keyset_pagination.md#complex-order-configuration)
section for keyset paginated database queries.
The query requires a specialized database index:
```sql
CREATE INDEX index_epics_on_duration ON epics USING btree (group_id, EXTRACT(EPOCH FROM epics.closed_at - epics.created_at) DESC, id DESC) WHERE (closed_at IS NOT NULL);
```
Notice that the `finder_query` parameter is not used. The query only returns the `ORDER BY` columns
which are the `duration_in_seconds` (calculated column) and the `id` columns. This is a limitation
of the feature, defining the `finder_query` with calculated `ORDER BY` expressions is not supported.
To get the complete database records, an extra query can be invoked by the returned `id` column:
```ruby
records_by_id = records.index_by(&:id)
complete_records = Epic.where(id: records_by_id.keys).index_by(&:id)
# Printing the complete records according to the `ORDER BY` clause
records_by_id.each do |id, _|
puts complete_records[id].attributes
end
```
#### Batch iteration
Batch iteration over the records is possible via the keyset `Iterator` class.
......
......@@ -114,6 +114,20 @@ module Gitlab
# - When the order is a calculated expression or the column is in another table (JOIN-ed)
#
# If the add_to_projections is true, the query builder will automatically add the column to the SELECT values
#
# **sql_type**
#
# The SQL type of the column or SQL expression. This is an optional field which is only required when using the
# column with the InOperatorOptimization class.
#
# Example: When the order expression is a calculated SQL expression.
#
# {
# attribute_name: 'id_times_count',
# order_expression: Arel.sql('(id * count)').asc,
# sql_type: 'integer' # the SQL type here must match with the type of the produced data by the order_expression. Putting 'text' here would be incorrect.
# }
#
class ColumnOrderDefinition
REVERSED_ORDER_DIRECTIONS = { asc: :desc, desc: :asc }.freeze
REVERSED_NULL_POSITIONS = { nulls_first: :nulls_last, nulls_last: :nulls_first }.freeze
......@@ -122,7 +136,8 @@ module Gitlab
attr_reader :attribute_name, :column_expression, :order_expression, :add_to_projections, :order_direction
def initialize(attribute_name:, order_expression:, column_expression: nil, reversed_order_expression: nil, nullable: :not_nullable, distinct: true, order_direction: nil, add_to_projections: false)
# rubocop: disable Metrics/ParameterLists
def initialize(attribute_name:, order_expression:, column_expression: nil, reversed_order_expression: nil, nullable: :not_nullable, distinct: true, order_direction: nil, sql_type: nil, add_to_projections: false)
@attribute_name = attribute_name
@order_expression = order_expression
@column_expression = column_expression || calculate_column_expression(order_expression)
......@@ -130,8 +145,10 @@ module Gitlab
@reversed_order_expression = reversed_order_expression || calculate_reversed_order(order_expression)
@nullable = parse_nullable(nullable, distinct)
@order_direction = parse_order_direction(order_expression, order_direction)
@sql_type = sql_type
@add_to_projections = add_to_projections
end
# rubocop: enable Metrics/ParameterLists
def reverse
self.class.new(
......@@ -185,6 +202,12 @@ module Gitlab
sql_string
end
def sql_type
raise Gitlab::Pagination::Keyset::SqlTypeMissingError.for_column(self) if @sql_type.nil?
@sql_type
end
private
attr_reader :reversed_order_expression, :nullable, :distinct
......
......@@ -4,23 +4,35 @@ module Gitlab
module Pagination
module Keyset
module InOperatorOptimization
# This class is used for wrapping an Arel column with
# convenient helper methods in order to make the query
# building for the InOperatorOptimization a bit cleaner.
class ColumnData
attr_reader :original_column_name, :as, :arel_table
def initialize(original_column_name, as, arel_table)
@original_column_name = original_column_name.to_s
# column - name of the DB column
# as - custom alias for the column
# arel_table - relation where the column is located
def initialize(column, as, arel_table)
@original_column_name = column
@as = as.to_s
@arel_table = arel_table
end
# Generates: `issues.name AS my_alias`
def projection
arel_column.as(as)
end
# Generates: issues.name`
def arel_column
arel_table[original_column_name]
end
# overridden in OrderByColumnData class
alias_method :column_expression, :arel_column
# Generates: `issues.my_alias`
def arel_column_as
arel_table[as]
end
......@@ -29,8 +41,9 @@ module Gitlab
"#{arel_table.name}_#{original_column_name}_array"
end
# Generates: SELECT ARRAY_AGG(...) AS issues_name_array
def array_aggregated_column
Arel::Nodes::NamedFunction.new('ARRAY_AGG', [arel_column]).as(array_aggregated_column_name)
Arel::Nodes::NamedFunction.new('ARRAY_AGG', [column_expression]).as(array_aggregated_column_name)
end
end
end
......
# frozen_string_literal: true
module Gitlab
module Pagination
module Keyset
module InOperatorOptimization
class OrderByColumnData < ColumnData
extend ::Gitlab::Utils::Override
attr_reader :column
# column - a ColumnOrderDefinition object
# as - custom alias for the column
# arel_table - relation where the column is located
def initialize(column, as, arel_table)
super(column.attribute_name.to_s, as, arel_table)
@column = column
end
override :arel_column
def arel_column
column.column_expression
end
override :column_expression
def column_expression
arel_table[original_column_name]
end
def column_for_projection
column.column_expression.as(original_column_name)
end
end
end
end
end
end
......@@ -9,16 +9,16 @@ module Gitlab
# This class exposes collection methods for the order by columns
#
# Example: by modelling the `issues.created_at ASC, issues.id ASC` ORDER BY
# Example: by modeling the `issues.created_at ASC, issues.id ASC` ORDER BY
# SQL clause, this class will receive two ColumnOrderDefinition objects
def initialize(columns, arel_table)
@columns = columns.map do |column|
ColumnData.new(column.attribute_name, "order_by_columns_#{column.attribute_name}", arel_table)
OrderByColumnData.new(column, "order_by_columns_#{column.attribute_name}", arel_table)
end
end
def arel_columns
columns.map(&:arel_column)
columns.map(&:column_for_projection)
end
def array_aggregated_columns
......
......@@ -120,7 +120,7 @@ module Gitlab
.from(array_cte)
.join(Arel.sql("LEFT JOIN LATERAL (#{initial_keyset_query.to_sql}) #{table_name} ON TRUE"))
order_by_columns.each { |column| q.where(column.arel_column.not_eq(nil)) }
order_by_columns.each { |column| q.where(column.column_expression.not_eq(nil)) }
q.as('array_scope_lateral_query')
end
......@@ -231,7 +231,7 @@ module Gitlab
order
.apply_cursor_conditions(keyset_scope, cursor_values, use_union_optimization: true)
.reselect(*order_by_columns.arel_columns)
.reselect(*order_by_columns.map(&:column_for_projection))
.limit(1)
end
......
......@@ -12,11 +12,7 @@ module Gitlab
end
def initializer_columns
order_by_columns.map do |column|
column_name = column.original_column_name.to_s
type = model.columns_hash[column_name].sql_type
"NULL::#{type} AS #{column_name}"
end
order_by_columns.map { |column_data| null_with_type_cast(column_data) }
end
def columns
......@@ -30,6 +26,15 @@ module Gitlab
private
attr_reader :model, :order_by_columns
def null_with_type_cast(column_data)
column_name = column_data.original_column_name.to_s
active_record_column = model.columns_hash[column_name]
type = active_record_column ? active_record_column.sql_type : column_data.column.sql_type
"NULL::#{type} AS #{column_name}"
end
end
end
end
......
......@@ -9,6 +9,8 @@ module Gitlab
RECORDS_COLUMN = 'records'
def initialize(finder_query, model, order_by_columns)
verify_order_by_attributes_on_model!(model, order_by_columns)
@finder_query = finder_query
@order_by_columns = order_by_columns
@table_name = model.table_name
......@@ -34,6 +36,20 @@ module Gitlab
private
attr_reader :finder_query, :order_by_columns, :table_name
def verify_order_by_attributes_on_model!(model, order_by_columns)
order_by_columns.map(&:column).each do |column|
unless model.columns_hash[column.attribute_name.to_s]
text = <<~TEXT
The "RecordLoaderStrategy" does not support the following ORDER BY column because
it's not available on the \"#{model.table_name}\" table: #{column.attribute_name}
Omit the "finder_query" parameter to use the "OrderValuesLoaderStrategy".
TEXT
raise text
end
end
end
end
end
end
......
# frozen_string_literal: true
module Gitlab
module Pagination
module Keyset
class SqlTypeMissingError < StandardError
def self.for_column(column)
message = <<~TEXT
The "sql_type" attribute is not set for the following column definition:
#{column.attribute_name}
See the ColumnOrderDefinition class for more context.
TEXT
new(message)
end
end
end
end
end
# frozen_string_literal: true
require 'spec_helper'
RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::OrderByColumnData do
let(:arel_table) { Issue.arel_table }
let(:column) do
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: :id,
column_expression: arel_table[:id],
order_expression: arel_table[:id].desc
)
end
subject(:column_data) { described_class.new(column, 'column_alias', arel_table) }
describe '#arel_column' do
it 'delegates to column_expression' do
expect(column_data.arel_column).to eq(column.column_expression)
end
end
describe '#column_for_projection' do
it 'returns the expression with AS using the original column name' do
expect(column_data.column_for_projection.to_sql).to eq('"issues"."id" AS id')
end
end
describe '#projection' do
it 'returns the expression with AS using the specified column lias' do
expect(column_data.projection.to_sql).to eq('"issues"."id" AS column_alias')
end
end
end
......@@ -33,14 +33,14 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
]
end
shared_examples 'correct ordering examples' do
let(:iterator) do
Gitlab::Pagination::Keyset::Iterator.new(
scope: scope.limit(batch_size),
in_operator_optimization_options: in_operator_optimization_options
)
end
let(:iterator) do
Gitlab::Pagination::Keyset::Iterator.new(
scope: scope.limit(batch_size),
in_operator_optimization_options: in_operator_optimization_options
)
end
shared_examples 'correct ordering examples' do |opts = {}|
let(:all_records) do
all_records = []
iterator.each_batch(of: batch_size) do |records|
......@@ -49,8 +49,10 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
all_records
end
it 'returns records in correct order' do
expect(all_records).to eq(expected_order)
unless opts[:skip_finder_query_test]
it 'returns records in correct order' do
expect(all_records).to eq(expected_order)
end
end
context 'when not passing the finder query' do
......@@ -248,4 +250,57 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
expect { described_class.new(**options).execute }.to raise_error(/The order on the scope does not support keyset pagination/)
end
context 'when ordering by SQL expression' do
let(:order) do
# ORDER BY (id * 10), id
Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id_multiplied_by_ten',
order_expression: Arel.sql('(id * 10)').asc,
sql_type: 'integer'
),
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: :id,
order_expression: Issue.arel_table[:id].asc
)
])
end
let(:scope) { Issue.reorder(order) }
let(:expected_order) { issues.sort_by(&:id) }
let(:in_operator_optimization_options) do
{
array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
}
end
context 'when iterating records one by one' do
let(:batch_size) { 1 }
it_behaves_like 'correct ordering examples', skip_finder_query_test: true
end
context 'when iterating records with LIMIT 3' do
let(:batch_size) { 3 }
it_behaves_like 'correct ordering examples', skip_finder_query_test: true
end
context 'when passing finder query' do
let(:batch_size) { 3 }
it 'raises error, loading complete rows are not supported with SQL expressions' do
in_operator_optimization_options[:finder_query] = -> (_, _) { Issue.select(:id, '(id * 10)').where(id: -1) }
expect(in_operator_optimization_options[:finder_query]).not_to receive(:call)
expect do
iterator.each_batch(of: batch_size) { |records| records.to_a }
end.to raise_error /The "RecordLoaderStrategy" does not support/
end
end
end
end
......@@ -31,4 +31,41 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::Strategies::O
])
end
end
context 'when an SQL expression is given' do
context 'when the sql_type attribute is missing' do
let(:order) do
Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id_times_ten',
order_expression: Arel.sql('id * 10').asc
)
])
end
let(:keyset_scope) { Project.order(order) }
it 'raises error' do
expect { strategy.initializer_columns }.to raise_error(Gitlab::Pagination::Keyset::SqlTypeMissingError)
end
end
context 'when the sql_type_attribute is present' do
let(:order) do
Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id_times_ten',
order_expression: Arel.sql('id * 10').asc,
sql_type: 'integer'
)
])
end
let(:keyset_scope) { Project.order(order) }
it 'returns the initializer columns' do
expect(strategy.initializer_columns).to eq(['NULL::integer AS id_times_ten'])
end
end
end
end
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment