Commit 855a4095 authored by Adam Hegyi

Merge branch 'migrations-for-multiple-databases' into 'master'

Add documentation for migrations for multiple databases

See merge request gitlab-org/gitlab!84764
parents da10c4a7 f14c7af7
...@@ -23,6 +23,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -23,6 +23,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Migrations ## Migrations
- [Migrations for multiple databases](migrations_for_multiple_databases.md)
- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md) - [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md)
- [SQL guidelines](../sql.md) for working with SQL queries - [SQL guidelines](../sql.md) for working with SQL queries
- [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations - [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations
......
This diff is collapsed.
...@@ -9,6 +9,63 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -9,6 +9,63 @@ info: To determine the technical writer assigned to the Stage/Group associated w
To scale GitLab, we are To scale GitLab, we are
[decomposing the GitLab application database into multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168). [decomposing the GitLab application database into multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
## GitLab Schema
To properly discover allowed patterns between different databases,
the GitLab application uses the `lib/gitlab/database/gitlab_schemas.yml` YAML file.
This file provides a virtual classification of tables into a `gitlab_schema`,
which is conceptually similar to a [PostgreSQL schema](https://www.postgresql.org/docs/current/ddl-schemas.html).
As part of [using database schemas to better isolate CI decomposed features](https://gitlab.com/gitlab-org/gitlab/-/issues/333415),
we decided that we cannot use PostgreSQL schemas because of the complex migration procedures involved. Instead, we implemented
the concept of application-level classification.
Each table of GitLab needs to have a `gitlab_schema` assigned:
- `gitlab_main`: describes all tables that are stored in the `main:` database (for example, `projects`, `users`).
- `gitlab_ci`: describes all CI tables that are stored in the `ci:` database (for example, `ci_pipelines`, `ci_builds`).
- `gitlab_shared`: describes all application tables that contain data across all decomposed databases (for example, `loose_foreign_keys_deleted_records`).
- `...`: more schemas to be introduced with additional decomposed databases
The schema of a table determines the base class its model must use:
- `ApplicationRecord` for `gitlab_main`
- `Ci::ApplicationRecord` for `gitlab_ci`
- `Gitlab::Database::SharedModel` for `gitlab_shared`
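As a rough illustration of how the classification ties tables to base classes, the following sketch maps a table name to its `gitlab_schema` and then to the expected base class. The YAML excerpt, its flat `table: schema` layout, and the `expected_base_class` helper are assumptions made for this example, not the actual contents of `gitlab_schemas.yml` or a GitLab API.

```ruby
require 'yaml'

# Hypothetical excerpt. The real classification lives in
# lib/gitlab/database/gitlab_schemas.yml; its exact layout is assumed here.
SCHEMAS_YAML = <<~YAML
  projects: gitlab_main
  users: gitlab_main
  ci_pipelines: gitlab_ci
  ci_builds: gitlab_ci
  loose_foreign_keys_deleted_records: gitlab_shared
YAML

# Table name (String) => gitlab_schema (Symbol)
TABLE_TO_SCHEMA = YAML.safe_load(SCHEMAS_YAML).transform_values(&:to_sym).freeze

# Base class names taken from the list above.
BASE_CLASS_FOR_SCHEMA = {
  gitlab_main: 'ApplicationRecord',
  gitlab_ci: 'Ci::ApplicationRecord',
  gitlab_shared: 'Gitlab::Database::SharedModel'
}.freeze

# Returns the name of the base class a model for `table_name` should use.
# Raises KeyError when the table has no gitlab_schema assigned.
def expected_base_class(table_name)
  schema = TABLE_TO_SCHEMA.fetch(table_name) do
    raise KeyError, "no gitlab_schema assigned to #{table_name}"
  end
  BASE_CLASS_FOR_SCHEMA.fetch(schema)
end
```

For example, `expected_base_class('ci_builds')` resolves through `gitlab_ci`, while an unclassified table raises an error instead of silently defaulting to a schema.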
### The impact of `gitlab_schema`
The usage of `gitlab_schema` has a significant impact on the application.
The primary purpose of `gitlab_schema` is to introduce a barrier between different data access patterns.
It is used as the primary source of classification for:
- [Discovering cross-joins across tables from different schemas](#removing-joins-between-ci_-and-non-ci_-tables)
- [Discovering cross-database transactions across tables from different schemas](#removing-cross-database-transactions)
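The idea behind both checks can be sketched as a lookup of each table's schema followed by a test for more than one distinct schema. The table list and the `schemas_for` and `crosses_schemas?` helpers are hypothetical, and the sketch deliberately ignores any special handling of `gitlab_shared`; the actual checks in GitLab are more involved.

```ruby
require 'set'

# Hypothetical classification, mirroring the examples given above.
TABLE_SCHEMAS = {
  'projects' => :gitlab_main,
  'users' => :gitlab_main,
  'ci_pipelines' => :gitlab_ci,
  'ci_builds' => :gitlab_ci
}.freeze

# Returns the set of gitlab_schemas touched by the given tables.
def schemas_for(tables)
  tables.map { |table| TABLE_SCHEMAS.fetch(table) }.to_set
end

# True when the tables belong to more than one gitlab_schema, which is the
# situation a cross-join or cross-database transaction check would flag.
def crosses_schemas?(tables)
  schemas_for(tables).size > 1
end
```

With this classification, a query touching `projects` and `ci_builds` crosses the barrier, while one touching `projects` and `users` does not.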
### The special purpose of `gitlab_shared`
`gitlab_shared` is a special case describing tables or views that by design contain data across
all decomposed databases. This includes application-defined tables (like `loose_foreign_keys_deleted_records`),
Rails-defined tables (like `schema_migrations` or `ar_internal_metadata`), and internal PostgreSQL tables
(for example, `pg_attribute`).
**Be careful** when using `gitlab_shared`, as it requires special handling when accessing data.
Because `gitlab_shared` shares not only structure but also data, the application needs to be written in a way
that traverses all data from all databases in a sequential manner.
```ruby
Gitlab::Database::EachDatabase.each_model_connection([MySharedModel]) do |connection, connection_name|
  MySharedModel.select_all_data...
end
```
As such, migrations modifying data of `gitlab_shared` tables are expected to run across
all decomposed databases.
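To illustrate the sequential traversal without a Rails environment, the sketch below replaces database connections with in-memory arrays. `DATABASES`, `each_database`, and `all_shared_rows` are hypothetical stand-ins for this example, not GitLab helpers.

```ruby
# Stand-in for per-database storage: every database holds its own rows of the
# same shared table, so no single database has the complete picture.
DATABASES = {
  main: [{ id: 1, table: 'projects' }],
  ci: [{ id: 1, table: 'ci_builds' }, { id: 2, table: 'ci_pipelines' }]
}.freeze

# Yields the rows and the name of each database, one database after another.
def each_database
  DATABASES.each { |name, rows| yield rows, name }
end

# Collects the shared rows from all databases, tagging each row with the
# database it came from.
def all_shared_rows
  collected = []
  each_database do |rows, name|
    rows.each { |row| collected << row.merge(database: name) }
  end
  collected
end
```

Reading from only one database here would miss rows, which is why code and data migrations that touch shared tables need to visit every database.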
## Migrations
Read [Migrations for Multiple Databases](migrations_for_multiple_databases.md).
## CI/CD Database ## CI/CD Database
> Support for configuring the GitLab Rails application to use a distinct > Support for configuring the GitLab Rails application to use a distinct
......
...@@ -240,7 +240,7 @@ of migration helpers. ...@@ -240,7 +240,7 @@ of migration helpers.
In this example, we use version 1.0 of the migration class: In this example, we use version 1.0 of the migration class:
```ruby ```ruby
class TestMigration < Gitlab::Database::Migration[1.0] class TestMigration < Gitlab::Database::Migration[2.0]
def change def change
end end
end end
...@@ -253,7 +253,7 @@ version of migration helpers automatically. ...@@ -253,7 +253,7 @@ version of migration helpers automatically.
Migration helpers and versioning were [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68986) Migration helpers and versioning were [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68986)
in GitLab 14.3. in GitLab 14.3.
For merge requests targeting previous stable branches, use the old format and still inherit from For merge requests targeting previous stable branches, use the old format and still inherit from
`ActiveRecord::Migration[6.1]` instead of `Gitlab::Database::Migration[1.0]`. `ActiveRecord::Migration[6.1]` instead of `Gitlab::Database::Migration[2.0]`.
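The `Migration[2.0]` syntax relies on a class-level `[]` method that looks up a versioned class. The following is a minimal sketch of that pattern in plain Ruby; the `SketchMigration` module and its empty classes are assumptions for illustration and omit the helpers the real classes include.

```ruby
module SketchMigration
  # Each version is a class; a newer version inherits from the previous one.
  class V1_0; end
  class V2_0 < V1_0; end

  MIGRATION_CLASSES = { 1.0 => V1_0, 2.0 => V2_0 }.freeze

  # Enables the `SketchMigration[2.0]` lookup syntax.
  def self.[](version)
    MIGRATION_CLASSES.fetch(version.to_f) do
      raise ArgumentError, "Unknown migration version: #{version}"
    end
  end

  # The version to be used in new migrations.
  def self.current_version
    2.0
  end
end
```

Because `V2_0` inherits from `V1_0`, a check on the current version has to use a descendant comparison (`<`) rather than an exact `superclass` match.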
## Retry mechanism when acquiring database locks ## Retry mechanism when acquiring database locks
...@@ -535,7 +535,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration ...@@ -535,7 +535,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration
class like so: class like so:
```ruby ```ruby
class MyMigration < Gitlab::Database::Migration[1.0] class MyMigration < Gitlab::Database::Migration[2.0]
disable_ddl_transaction! disable_ddl_transaction!
INDEX_NAME = 'index_name' INDEX_NAME = 'index_name'
...@@ -586,7 +586,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration ...@@ -586,7 +586,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration
class like so: class like so:
```ruby ```ruby
class MyMigration < Gitlab::Database::Migration[1.0] class MyMigration < Gitlab::Database::Migration[2.0]
disable_ddl_transaction! disable_ddl_transaction!
INDEX_NAME = 'index_name' INDEX_NAME = 'index_name'
...@@ -629,7 +629,7 @@ The easiest way to test for existence of an index by name is to use the ...@@ -629,7 +629,7 @@ The easiest way to test for existence of an index by name is to use the
be used with a name option. For example: be used with a name option. For example:
```ruby ```ruby
class MyMigration < Gitlab::Database::Migration[1.0] class MyMigration < Gitlab::Database::Migration[2.0]
INDEX_NAME = 'index_name' INDEX_NAME = 'index_name'
def up def up
...@@ -664,7 +664,7 @@ Here's an example where we add a new column with a foreign key ...@@ -664,7 +664,7 @@ Here's an example where we add a new column with a foreign key
constraint. Note it includes `index: true` to create an index for it. constraint. Note it includes `index: true` to create an index for it.
```ruby ```ruby
class Migration < Gitlab::Database::Migration[1.0] class Migration < Gitlab::Database::Migration[2.0]
def change def change
add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade } add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
...@@ -710,7 +710,7 @@ expensive and disruptive operation for larger tables, but in reality it's not. ...@@ -710,7 +710,7 @@ expensive and disruptive operation for larger tables, but in reality it's not.
Take the following migration as an example: Take the following migration as an example:
```ruby ```ruby
class DefaultRequestAccessGroups < Gitlab::Database::Migration[1.0] class DefaultRequestAccessGroups < Gitlab::Database::Migration[2.0]
def change def change
change_column_default(:namespaces, :request_access_enabled, from: false, to: true) change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
end end
...@@ -943,7 +943,7 @@ The Rails 5 natively supports `JSONB` (binary JSON) column type. ...@@ -943,7 +943,7 @@ The Rails 5 natively supports `JSONB` (binary JSON) column type.
Example migration adding this column: Example migration adding this column:
```ruby ```ruby
class AddOptionsToBuildMetadata < Gitlab::Database::Migration[1.0] class AddOptionsToBuildMetadata < Gitlab::Database::Migration[2.0]
def change def change
add_column :ci_builds_metadata, :config_options, :jsonb add_column :ci_builds_metadata, :config_options, :jsonb
end end
...@@ -975,7 +975,7 @@ Do not store `attr_encrypted` attributes as `:text` in the database; use ...@@ -975,7 +975,7 @@ Do not store `attr_encrypted` attributes as `:text` in the database; use
efficient: efficient:
```ruby ```ruby
class AddSecretToSomething < Gitlab::Database::Migration[1.0] class AddSecretToSomething < Gitlab::Database::Migration[2.0]
def change def change
add_column :something, :encrypted_secret, :binary add_column :something, :encrypted_secret, :binary
add_column :something, :encrypted_secret_iv, :binary add_column :something, :encrypted_secret_iv, :binary
...@@ -1033,8 +1033,8 @@ If you need more complex logic, you can define and use models local to a ...@@ -1033,8 +1033,8 @@ If you need more complex logic, you can define and use models local to a
migration. For example: migration. For example:
```ruby ```ruby
class MyMigration < Gitlab::Database::Migration[1.0] class MyMigration < Gitlab::Database::Migration[2.0]
class Project < ActiveRecord::Base class Project < MigrationRecord
self.table_name = 'projects' self.table_name = 'projects'
end end
...@@ -1132,8 +1132,8 @@ in a previous migration. ...@@ -1132,8 +1132,8 @@ in a previous migration.
It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information. It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information.
```ruby ```ruby
class AddAndSeedMyColumn < Gitlab::Database::Migration[1.0] class AddAndSeedMyColumn < Gitlab::Database::Migration[2.0]
class User < ActiveRecord::Base class User < MigrationRecord
self.table_name = 'users' self.table_name = 'users'
end end
......
...@@ -31,8 +31,8 @@ could result in loading unexpected code or associations which may cause unintend ...@@ -31,8 +31,8 @@ could result in loading unexpected code or associations which may cause unintend
side effects or failures during upgrades. side effects or failures during upgrades.
```ruby ```ruby
class SomeMigration < Gitlab::Database::Migration[1.0] class SomeMigration < Gitlab::Database::Migration[2.0]
class Services < ActiveRecord::Base class Services < MigrationRecord
self.table_name = 'services' self.table_name = 'services'
self.inheritance_column = :_type_disabled self.inheritance_column = :_type_disabled
end end
......
...@@ -254,13 +254,13 @@ of records plucked. `MAX_PLUCK` defaults to `1_000` in `ApplicationRecord`. ...@@ -254,13 +254,13 @@ of records plucked. `MAX_PLUCK` defaults to `1_000` in `ApplicationRecord`.
## Inherit from ApplicationRecord ## Inherit from ApplicationRecord
Most models in the GitLab codebase should inherit from `ApplicationRecord`, Most models in the GitLab codebase should inherit from `ApplicationRecord`
rather than from `ActiveRecord::Base`. This allows helper methods to be easily or `Ci::ApplicationRecord` rather than from `ActiveRecord::Base`. This allows
added. helper methods to be easily added.
An exception to this rule exists for models created in database migrations. As An exception to this rule exists for models created in database migrations. As
these should be isolated from application code, they should continue to subclass these should be isolated from application code, they should continue to subclass
from `ActiveRecord::Base`. from `MigrationRecord` which is available only in migration context.
## Use UNIONs ## Use UNIONs
......
...@@ -16,6 +16,10 @@ class <%= migration_class_name %> < Gitlab::Database::Migration[<%= Gitlab::Data ...@@ -16,6 +16,10 @@ class <%= migration_class_name %> < Gitlab::Database::Migration[<%= Gitlab::Data
# To disable transactions uncomment the following line and remove these # To disable transactions uncomment the following line and remove these
# comments: # comments:
# disable_ddl_transaction! # disable_ddl_transaction!
#
# Configure the `gitlab_schema` to perform data manipulation (DML).
# Visit: https://docs.gitlab.com/ee/development/database/migrations_for_multiple_databases.html
# restrict_gitlab_migration gitlab_schema: :gitlab_main
<%- if migration_action == 'add' -%> <%- if migration_action == 'add' -%>
def change def change
......
...@@ -16,6 +16,10 @@ class <%= migration_class_name %> < Gitlab::Database::Migration[<%= Gitlab::Data ...@@ -16,6 +16,10 @@ class <%= migration_class_name %> < Gitlab::Database::Migration[<%= Gitlab::Data
# To disable transactions uncomment the following line and remove these # To disable transactions uncomment the following line and remove these
# comments: # comments:
# disable_ddl_transaction! # disable_ddl_transaction!
#
# Configure the `gitlab_schema` to perform data manipulation (DML).
# Visit: https://docs.gitlab.com/ee/development/database/migrations_for_multiple_databases.html
# restrict_gitlab_migration gitlab_schema: :gitlab_main
def up def up
end end
......
...@@ -41,6 +41,12 @@ module Gitlab ...@@ -41,6 +41,12 @@ module Gitlab
class V2_0 < V1_0 # rubocop:disable Naming/ClassAndModuleCamelCase class V2_0 < V1_0 # rubocop:disable Naming/ClassAndModuleCamelCase
include Gitlab::Database::MigrationHelpers::RestrictGitlabSchema include Gitlab::Database::MigrationHelpers::RestrictGitlabSchema
# When running migrations, `db:migrate` switches the connection of
# ActiveRecord::Base depending on where the migration runs.
# This helper class is provided to avoid confusion when using `ActiveRecord::Base`
class MigrationRecord < ActiveRecord::Base
end
end end
def self.[](version) def self.[](version)
...@@ -53,7 +59,7 @@ module Gitlab ...@@ -53,7 +59,7 @@ module Gitlab
# The current version to be used in new migrations # The current version to be used in new migrations
def self.current_version def self.current_version
1.0 2.0
end end
end end
end end
......
...@@ -69,8 +69,10 @@ module Gitlab ...@@ -69,8 +69,10 @@ module Gitlab
schemas = self.dml_schemas(tables) schemas = self.dml_schemas(tables)
if (schemas - self.allowed_gitlab_schemas).any? if (schemas - self.allowed_gitlab_schemas).any?
raise DMLAccessDeniedError, "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \ raise DMLAccessDeniedError, \
"which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'." "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
"which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
"#{documentation_url}"
end end
end end
...@@ -93,11 +95,19 @@ module Gitlab ...@@ -93,11 +95,19 @@ module Gitlab
end end
def raise_dml_not_allowed_error(message) def raise_dml_not_allowed_error(message)
raise DMLNotAllowedError, "Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode. #{message}" raise DMLNotAllowedError, \
"Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode. " \
"#{message}. #{documentation_url}"
end end
def raise_ddl_not_allowed_error(message) def raise_ddl_not_allowed_error(message)
raise DDLNotAllowedError, "DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode. #{message}" raise DDLNotAllowedError, \
"DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode. " \
"#{message}. #{documentation_url}"
end
def documentation_url
"For more information visit: https://docs.gitlab.com/ee/development/database/migrations_for_multiple_databases.html"
end end
end end
end end
......
...@@ -240,7 +240,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a ...@@ -240,7 +240,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a
end end
def software_license_class def software_license_class
Class.new(ActiveRecord::Base) do Class.new(Gitlab::Database::Migration[2.0]::MigrationRecord) do
self.table_name = 'software_licenses' self.table_name = 'software_licenses'
end end
end end
...@@ -272,7 +272,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a ...@@ -272,7 +272,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a
end end
def ci_instance_variables_class def ci_instance_variables_class
Class.new(ActiveRecord::Base) do Class.new(Gitlab::Database::Migration[2.0]::MigrationRecord) do
self.table_name = 'ci_instance_variables' self.table_name = 'ci_instance_variables'
end end
end end
...@@ -303,7 +303,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a ...@@ -303,7 +303,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a
end end
def detached_partitions_class def detached_partitions_class
Class.new(ActiveRecord::Base) do Class.new(Gitlab::Database::Migration[2.0]::MigrationRecord) do
self.table_name = 'detached_partitions' self.table_name = 'detached_partitions'
end end
end end
......
...@@ -32,7 +32,7 @@ RSpec.describe Gitlab::Database::Migration do ...@@ -32,7 +32,7 @@ RSpec.describe Gitlab::Database::Migration do
# This breaks upon Rails upgrade. In that case, we'll add a new version in Gitlab::Database::Migration::MIGRATION_CLASSES, # This breaks upon Rails upgrade. In that case, we'll add a new version in Gitlab::Database::Migration::MIGRATION_CLASSES,
# bump .current_version and leave existing migrations and already defined versions of Gitlab::Database::Migration # bump .current_version and leave existing migrations and already defined versions of Gitlab::Database::Migration
# untouched. # untouched.
expect(described_class[described_class.current_version].superclass).to eq(ActiveRecord::Migration::Current) expect(described_class[described_class.current_version]).to be < ActiveRecord::Migration::Current
end end
end end
......