Commit 855a4095 authored by Adam Hegyi's avatar Adam Hegyi

Merge branch 'migrations-for-multiple-databases' into 'master'

Add documentation for migrations for multiple databases

See merge request gitlab-org/gitlab!84764
parents da10c4a7 f14c7af7
......@@ -23,6 +23,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Migrations
- [Migrations for multiple databases](migrations_for_multiple_databases.md)
- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md)
- [SQL guidelines](../sql.md) for working with SQL queries
- [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations
......
This diff is collapsed.
......@@ -9,6 +9,63 @@ info: To determine the technical writer assigned to the Stage/Group associated w
To scale GitLab, the we are
[decomposing the GitLab application database into multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
## GitLab Schema
For properly discovering allowed patterns between different databases
the GitLab application implements the `lib/gitlab/database/gitlab_schemas.yml` YAML file.
This file provides a virtual classification of tables into a `gitlab_schema`
which conceptually is similar to [PostgreSQL Schema](https://www.postgresql.org/docs/current/ddl-schemas.html).
We decided as part of [using database schemas to better isolated CI decomposed features](https://gitlab.com/gitlab-org/gitlab/-/issues/333415)
that we cannot use PostgreSQL schema due to complex migration procedures. Instead we implemented
the concept of application-level classification.
Each table of GitLab needs to have a `gitlab_schema` assigned:
- `gitlab_main`: describes all tables that are being stored in the `main:` database (for example, like `projects`, `users`).
- `gitlab_ci`: describes all CI tables that are being stored in the `ci:` database (for example, `ci_pipelines`, `ci_builds`).
- `gitlab_shared`: describe all application tables that contain data across all decomposed databases (for example, `loose_foreign_keys_deleted_records`).
- `...`: more schemas to be introduced with additional decomposed databases
The usage of schema enforces the base class to be used:
- `ApplicationRecord` for `gitlab_main`
- `Ci::ApplicationRecord` for `gitlab_ci`
- `Gitlab::Database::SharedModel` for `gitlab_shared`
### The impact of `gitlab_schema`
The usage of `gitlab_schema` has a significant impact on the application.
The `gitlab_schema` primary purpose is to introduce a barrier between different data access patterns.
This is used as a primary source of classification for:
- [Discovering cross-joins across tables from different schemas](#removing-joins-between-ci_-and-non-ci_-tables)
- [Discovering cross-database transactions across tables from different schemas](#removing-cross-database-transactions)
### The special purpose of `gitlab_shared`
`gitlab_shared` is a special case describing tables or views that by design contain data across
all decomposed databases. This does describe application-defined tables (like `loose_foreign_keys_deleted_records`),
Rails-defined tables (like `schema_migrations` or `ar_internal_metadata` as well as internal PostgreSQL tables
(for example, `pg_attribute`).
**Be careful** to use `gitlab_shared` as it requires special handling while accessing data.
Since `gitlab_shared` shares not only structure but also data, the application needs to be written in a way
that traverses all data from all databases in sequential manner.
```ruby
Gitlab::Database::EachDatabase.each_model_connection([MySharedModel]) do |connection, connection_name|
MySharedModel.select_all_data...
end
```
As such, migrations modifying data of `gitlab_shared` tables are expected to run across
all decomposed databases.
## Migrations
Read [Migrations for Multiple Databases](migrations_for_multiple_databases.md).
## CI/CD Database
> Support for configuring the GitLab Rails application to use a distinct
......
......@@ -240,7 +240,7 @@ of migration helpers.
In this example, we use version 1.0 of the migration class:
```ruby
class TestMigration < Gitlab::Database::Migration[1.0]
class TestMigration < Gitlab::Database::Migration[2.0]
def change
end
end
......@@ -253,7 +253,7 @@ version of migration helpers automatically.
Migration helpers and versioning were [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68986)
in GitLab 14.3.
For merge requests targeting previous stable branches, use the old format and still inherit from
`ActiveRecord::Migration[6.1]` instead of `Gitlab::Database::Migration[1.0]`.
`ActiveRecord::Migration[6.1]` instead of `Gitlab::Database::Migration[2.0]`.
## Retry mechanism when acquiring database locks
......@@ -535,7 +535,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration
class like so:
```ruby
class MyMigration < Gitlab::Database::Migration[1.0]
class MyMigration < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
INDEX_NAME = 'index_name'
......@@ -586,7 +586,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration
class like so:
```ruby
class MyMigration < Gitlab::Database::Migration[1.0]
class MyMigration < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
INDEX_NAME = 'index_name'
......@@ -629,7 +629,7 @@ The easiest way to test for existence of an index by name is to use the
be used with a name option. For example:
```ruby
class MyMigration < Gitlab::Database::Migration[1.0]
class MyMigration < Gitlab::Database::Migration[2.0]
INDEX_NAME = 'index_name'
def up
......@@ -664,7 +664,7 @@ Here's an example where we add a new column with a foreign key
constraint. Note it includes `index: true` to create an index for it.
```ruby
class Migration < Gitlab::Database::Migration[1.0]
class Migration < Gitlab::Database::Migration[2.0]
def change
add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
......@@ -710,7 +710,7 @@ expensive and disruptive operation for larger tables, but in reality it's not.
Take the following migration as an example:
```ruby
class DefaultRequestAccessGroups < Gitlab::Database::Migration[1.0]
class DefaultRequestAccessGroups < Gitlab::Database::Migration[2.0]
def change
change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
end
......@@ -943,7 +943,7 @@ The Rails 5 natively supports `JSONB` (binary JSON) column type.
Example migration adding this column:
```ruby
class AddOptionsToBuildMetadata < Gitlab::Database::Migration[1.0]
class AddOptionsToBuildMetadata < Gitlab::Database::Migration[2.0]
def change
add_column :ci_builds_metadata, :config_options, :jsonb
end
......@@ -975,7 +975,7 @@ Do not store `attr_encrypted` attributes as `:text` in the database; use
efficient:
```ruby
class AddSecretToSomething < Gitlab::Database::Migration[1.0]
class AddSecretToSomething < Gitlab::Database::Migration[2.0]
def change
add_column :something, :encrypted_secret, :binary
add_column :something, :encrypted_secret_iv, :binary
......@@ -1033,8 +1033,8 @@ If you need more complex logic, you can define and use models local to a
migration. For example:
```ruby
class MyMigration < Gitlab::Database::Migration[1.0]
class Project < ActiveRecord::Base
class MyMigration < Gitlab::Database::Migration[2.0]
class Project < MigrationRecord
self.table_name = 'projects'
end
......@@ -1132,8 +1132,8 @@ in a previous migration.
It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information.
```ruby
class AddAndSeedMyColumn < Gitlab::Database::Migration[1.0]
class User < ActiveRecord::Base
class AddAndSeedMyColumn < Gitlab::Database::Migration[2.0]
class User < MigrationRecord
self.table_name = 'users'
end
......
......@@ -31,8 +31,8 @@ could result in loading unexpected code or associations which may cause unintend
side effects or failures during upgrades.
```ruby
class SomeMigration < Gitlab::Database::Migration[1.0]
class Services < ActiveRecord::Base
class SomeMigration < Gitlab::Database::Migration[2.0]
class Services < MigrationRecord
self.table_name = 'services'
self.inheritance_column = :_type_disabled
end
......
......@@ -254,13 +254,13 @@ of records plucked. `MAX_PLUCK` defaults to `1_000` in `ApplicationRecord`.
## Inherit from ApplicationRecord
Most models in the GitLab codebase should inherit from `ApplicationRecord`,
rather than from `ActiveRecord::Base`. This allows helper methods to be easily
added.
Most models in the GitLab codebase should inherit from `ApplicationRecord`
or `Ci::ApplicationRecord` rather than from `ActiveRecord::Base`. This allows
helper methods to be easily added.
An exception to this rule exists for models created in database migrations. As
these should be isolated from application code, they should continue to subclass
from `ActiveRecord::Base`.
from `MigrationRecord` which is available only in migration context.
## Use UNIONs
......
......@@ -16,6 +16,10 @@ class <%= migration_class_name %> < Gitlab::Database::Migration[<%= Gitlab::Data
# To disable transactions uncomment the following line and remove these
# comments:
# disable_ddl_transaction!
#
# Configure the `gitlab_schema` to perform data manipulation (DML).
# Visit: https://docs.gitlab.com/ee/development/database/migrations_for_multiple_databases.html
# restrict_gitlab_migration gitlab_schema: :gitlab_main
<%- if migration_action == 'add' -%>
def change
......
......@@ -16,6 +16,10 @@ class <%= migration_class_name %> < Gitlab::Database::Migration[<%= Gitlab::Data
# To disable transactions uncomment the following line and remove these
# comments:
# disable_ddl_transaction!
#
# Configure the `gitlab_schema` to perform data manipulation (DML).
# Visit: https://docs.gitlab.com/ee/development/database/migrations_for_multiple_databases.html
# restrict_gitlab_migration gitlab_schema: :gitlab_main
def up
end
......
......@@ -41,6 +41,12 @@ module Gitlab
class V2_0 < V1_0 # rubocop:disable Naming/ClassAndModuleCamelCase
include Gitlab::Database::MigrationHelpers::RestrictGitlabSchema
# When running migrations, the `db:migrate` switches connection of
# ActiveRecord::Base depending where the migration runs.
# This helper class is provided to avoid confusion using `ActiveRecord::Base`
class MigrationRecord < ActiveRecord::Base
end
end
def self.[](version)
......@@ -53,7 +59,7 @@ module Gitlab
# The current version to be used in new migrations
def self.current_version
1.0
2.0
end
end
end
......
......@@ -69,8 +69,10 @@ module Gitlab
schemas = self.dml_schemas(tables)
if (schemas - self.allowed_gitlab_schemas).any?
raise DMLAccessDeniedError, "Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
"which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'."
raise DMLAccessDeniedError, \
"Select/DML queries (SELECT/UPDATE/DELETE) do access '#{tables}' (#{schemas.to_a}) " \
"which is outside of list of allowed schemas: '#{self.allowed_gitlab_schemas}'. " \
"#{documentation_url}"
end
end
......@@ -93,11 +95,19 @@ module Gitlab
end
def raise_dml_not_allowed_error(message)
raise DMLNotAllowedError, "Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode. #{message}"
raise DMLNotAllowedError, \
"Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode. " \
"#{message}. #{documentation_url}" \
end
def raise_ddl_not_allowed_error(message)
raise DDLNotAllowedError, "DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode. #{message}"
raise DDLNotAllowedError, \
"DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode. " \
"#{message}. #{documentation_url}"
end
def documentation_url
"For more information visit: https://docs.gitlab.com/ee/development/database/migrations_for_multiple_databases.html"
end
end
end
......
......@@ -240,7 +240,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a
end
def software_license_class
Class.new(ActiveRecord::Base) do
Class.new(Gitlab::Database::Migration[2.0]::MigrationRecord) do
self.table_name = 'software_licenses'
end
end
......@@ -272,7 +272,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a
end
def ci_instance_variables_class
Class.new(ActiveRecord::Base) do
Class.new(Gitlab::Database::Migration[2.0]::MigrationRecord) do
self.table_name = 'ci_instance_variables'
end
end
......@@ -303,7 +303,7 @@ RSpec.describe Gitlab::Database::MigrationHelpers::RestrictGitlabSchema, query_a
end
def detached_partitions_class
Class.new(ActiveRecord::Base) do
Class.new(Gitlab::Database::Migration[2.0]::MigrationRecord) do
self.table_name = 'detached_partitions'
end
end
......
......@@ -32,7 +32,7 @@ RSpec.describe Gitlab::Database::Migration do
# This breaks upon Rails upgrade. In that case, we'll add a new version in Gitlab::Database::Migration::MIGRATION_CLASSES,
# bump .current_version and leave existing migrations and already defined versions of Gitlab::Database::Migration
# untouched.
expect(described_class[described_class.current_version].superclass).to eq(ActiveRecord::Migration::Current)
expect(described_class[described_class.current_version]).to be < ActiveRecord::Migration::Current
end
end
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment