Merge branch 'nik-db-migrations-prefer-atomic-docs' into 'master'

Update DB migrations style guide: prefer atomic migrations when possible See merge request gitlab-org/gitlab-ce!31915

Merge branch 'nik-db-migrations-prefer-atomic-docs' into 'master'
Update DB migrations style guide: prefer atomic migrations when possible See merge request gitlab-org/gitlab-ce!31915
190dba6d · Evan Read · 0839fb0c · 9d5ea420 · 190dba6d · 190dba6d
Commit 190dba6d authored Aug 27, 2019 by Evan Read
Show whitespace changes
Inline Side-by-side

Showing with 99 additions and 44 deletions

doc/development/database_review.md doc/development/database_review.md +1 -1

doc/development/migration_style_guide.md doc/development/migration_style_guide.md +98 -43

No files found.
--- a/doc/development/database_review.md
+++ b/doc/development/database_review.md
@@ -91,7 +91,7 @@ and details for a database reviewer:
    concurrent index/foreign key helpers (with transactions disabled)
  - Check consistency with `db/schema.rb` and that migrations are [reversible](migration_style_guide.md#reversibility)
  - Check queries timing (If any): Queries executed in a migration
-    need to fit comfortable within `15s` - preferably much less than that - on GitLab.com.
+    need to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
 - Check [background migrations](background_migrations.md):
  - For data migrations, establish a time estimate for execution
  - They should only be used when migrating data in larger tables.

--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
 # Migration Style Guide
 When writing migrations for GitLab, you have to take into account that
-these will be ran by hundreds of thousands of organizations of all sizes, some with
+these will be run by hundreds of thousands of organizations of all sizes, some with
 many years of data in their database.
 In addition, having to take a server offline for an upgrade small or big is a
-big burden for most organizations. For this reason it is important that your
+big burden for most organizations. For this reason, it is important that your
-migrations are written carefully, can be applied online and adhere to the style
+migrations are written carefully, can be applied online, and adhere to the style
 guide below.
 Migrations are **not** allowed to require GitLab installations to be taken
@@ -85,7 +85,38 @@ be possible to downgrade in case of a vulnerability or bugs.
 In your migration, add a comment describing how the reversibility of the
 migration was tested.
-## Multi Threading
+## Atomicity
+By default, migrations are single transaction. That is, a transaction is opened
+at the beginning of the migration, and committed after all steps are processed.
+Running migrations in a single transaction makes sure that if one of the steps fails,
+none of the steps will be executed, leaving the database in valid state.
+Therefore, either:
+- Put all migrations in one single-transaction migration.
+- If necessary, put most actions in one migration and create a separate migration 
+  for the steps that cannot be done in a single transaction.
+For example, if you create an empty table and need to build an index for it,
+it is recommended to use a regular single-transaction migration and the default
+rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
+This is a blocking operation, but it won't cause problems because the table is not yet used,
+and therefore it does not have any records yet.
+## Heavy operations in a single transaction
+When using a single-transaction migration, a transaction will hold on a database connection
+for the duration of the migration, so you must make sure the actions in the migration
+do not take too much time: In general, queries executed in a migration need to fit comfortably
+within `15s` on GitLab.com.
+In case you need to insert, update, or delete a significant amount of data, you:
+- Must disable the single transaction with `disable_ddl_transaction!`.
+- Should consider doing it in a [Background Migration](background_migrations.md).
+## Multi-Threading
 Sometimes a migration might need to use multiple Ruby threads to speed up a
 migration. For this to work your migration needs to include the module
@@ -122,16 +153,16 @@ pool.  This ensures each thread has its own connection object, and won't time
 out when trying to obtain one.
 **NOTE:** PostgreSQL has a maximum amount of connections that it allows. This
-limit can vary from installation to installation. As a result it's recommended
+limit can vary from installation to installation. As a result, it's recommended
-you do not use more than 32 threads in a single migration. Usually 4-8 threads
+you do not use more than 32 threads in a single migration. Usually, 4-8 threads
 should be more than enough.
 ## Removing indexes
-When removing an index make sure to use the method `remove_concurrent_index` instead
+If the table is not empty when removing an index, make sure to use the method
-of the regular `remove_index` method. The `remove_concurrent_index` method
+`remove_concurrent_index` instead of the regular `remove_index` method.
-automatically drops concurrent indexes when using PostgreSQL, removing the
+The `remove_concurrent_index` method drops indexes concurrently, so no locking is required,
-need for downtime. To use this method you must disable single-transaction mode
+and there is no need for downtime. To use this method, you must disable single-transaction mode
 by calling the method `disable_ddl_transaction!` in the body of your migration
 class like so:
@@ -149,19 +180,25 @@ end
 Note that it is not necessary to check if the index exists prior to
 removing it.
+For a small table (such as an empty one or one with less than `1,000` records),
+it is recommended to use `remove_index` in a single-transaction migration,
+combining it with other operations that don't require `disable_ddl_transaction!`.
 ## Adding indexes
-If you need to add a unique index please keep in mind there is the possibility
+If you need to add a unique index, please keep in mind there is the possibility
 of existing duplicates being present in the database. This means that should
 always _first_ add a migration that removes any duplicates, before adding the
 unique index.
-When adding an index make sure to use the method `add_concurrent_index` instead
+When adding an index to a non-empty table make sure to use the method
-of the regular `add_index` method. The `add_concurrent_index` method
+`add_concurrent_index` instead of the regular `add_index` method. 
-automatically creates concurrent indexes when using PostgreSQL, removing the
+The `add_concurrent_index` method automatically creates concurrent indexes
-need for downtime. To use this method you must disable transactions by calling
+when using PostgreSQL, removing the need for downtime. 
-the method `disable_ddl_transaction!` in the body of your migration class like
-so:
+To use this method, you must disable single-transactions mode
+by calling the method `disable_ddl_transaction!` in the body of your migration
+class like so:
 ```ruby
 class MyMigration < ActiveRecord::Migration[4.2]
@@ -179,16 +216,20 @@ class MyMigration < ActiveRecord::Migration[4.2]
 end
 ```
+For a small table (such as an empty one or one with less than `1,000` records), 
+it is recommended to use `add_index` in a single-transaction migration, combining it with other
+operations that don't require `disable_ddl_transaction!`.
 ## Adding foreign-key constraints
-When adding a foreign-key constraint to either an existing or new
+When adding a foreign-key constraint to either an existing or a new column also
-column remember to also add a index on the column.
+remember to add an index on the column.
 This is **required** for all foreign-keys, e.g., to support efficient cascading
 deleting: when a lot of rows in a table get deleted, the referenced records need
 to be deleted too. The database has to look for corresponding records in the
 referenced table. Without an index, this will result in a sequential scan on the
-table which can take a long time.
+table, which can take a long time.
 Here's an example where we add a new column with a foreign key
 constraint. Note it includes `index: true` to create an index for it.
@@ -202,13 +243,17 @@ class Migration < ActiveRecord::Migration[4.2]
 end
 ```
-When adding a foreign-key constraint to an existing column, we
+When adding a foreign-key constraint to an existing column in a non-empty table,
-have to employ `add_concurrent_foreign_key` and `add_concurrent_index`
+we have to employ `add_concurrent_foreign_key` and `add_concurrent_index`
 instead of `add_reference`.
+For an empty table (such as a fresh one), it is recommended to use
+`add_reference` in a single-transaction migration, combining it with other
+operations that don't require `disable_ddl_transaction!`.
 ## Adding Columns With Default Values
-When adding columns with default values you must use the method
+When adding columns with default values to non-empty tables, you must use
 `add_column_with_default`. This method ensures the table is updated without
 requiring downtime. This method is not reversible so you must manually define
 the `up` and `down` methods in your migration class.
@@ -232,10 +277,14 @@ end
 ```
 Keep in mind that this operation can easily take 10-15 minutes to complete on
-larger installations (e.g. GitLab.com). As a result you should only add default
+larger installations (e.g. GitLab.com). As a result, you should only add
-values if absolutely necessary. There is a RuboCop cop that will fail if this
+default values if absolutely necessary. There is a RuboCop cop that will fail if
-method is used on some tables that are very large on GitLab.com, which would
+this method is used on some tables that are very large on GitLab.com, which
-cause other issues.
+would cause other issues.
+For a small table (such as an empty one or one with less than `1,000` records),
+use `add_column` and `change_column_default` in a single-transaction migration,
+combining it with other operations that don't require `disable_ddl_transaction!`.
 ## Updating an existing column
@@ -253,8 +302,10 @@ update_column_in_batches(:projects, :foo, 10) do |table, query|
 end
 ```
-To perform a computed update, the value can be wrapped in `Arel.sql`, so Arel
+If a computed update is needed, the value can be wrapped in `Arel.sql`, so Arel
-treats it as an SQL literal. The below example is the same as the one above, but
+treats it as an SQL literal. It's also a required deprecation for [Rails 6](https://gitlab.com/gitlab-org/gitlab-ce/issues/61451).
+The below example is the same as the one above, but
 the value is set to the product of the `bar` and `baz` columns:
 ```ruby
@@ -275,12 +326,12 @@ staging environment - or asking someone else to do so for you - beforehand.
 By default, an integer column can hold up to a 4-byte (32-bit) number. That is
 a max value of 2,147,483,647. Be aware of this when creating a column that will
-hold file sizes in byte units. If you are tracking file size in bytes this
+hold file sizes in byte units. If you are tracking file size in bytes, this
 restricts the maximum file size to just over 2GB.
 To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
 set the limit to 8-bytes. This will allow the column to hold a value up to
-9,223,372,036,854,775,807.
+`9,223,372,036,854,775,807`.
 Rails migration example:
@@ -294,9 +345,11 @@ add_column(:projects, :foo, :integer, default: 10, limit: 8)
 ## Timestamp column type
-By default, Rails uses the `timestamp` data type that stores timestamp data without timezone information.
+By default, Rails uses the `timestamp` data type that stores timestamp data
-The `timestamp` data type is used by calling either the `add_timestamps` or the `timestamps` method.
+without timezone information. The `timestamp` data type is used by calling
-Also Rails converts the `:datetime` data type to the `timestamp` one.
+either the `add_timestamps` or the `timestamps` method.
+Also, Rails converts the `:datetime` data type to the `timestamp` one.
 Example:
@@ -317,14 +370,16 @@ def up
 end
 ```
-Instead of using these methods one should use the following methods to store timestamps with timezones:
+Instead of using these methods, one should use the following methods to store
+timestamps with timezones:
 - `add_timestamps_with_timezone`
 - `timestamps_with_timezone`
-This ensures all timestamps have a time zone specified. This in turn means existing timestamps won't
+This ensures all timestamps have a time zone specified. This, in turn, means
-suddenly use a different timezone when the system's timezone changes. It also makes it very clear which
+existing timestamps won't suddenly use a different timezone when the system's
-timezone was used in the first place.
+timezone changes. It also makes it very clear which timezone was used in the
+first place.
 ## Storing JSON in database
@@ -359,7 +414,7 @@ Make sure your migration can be reversed.
 ## Data migration
 Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of
-using plain SQL you need to quote all input manually with `quote_string` helper.
+using plain SQL, you need to quote all input manually with `quote_string` helper.
 Example with Arel:
@@ -384,7 +439,7 @@ select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(i
 end
 ```
-If you need more complex logic you can define and use models local to a
+If you need more complex logic, you can define and use models local to a
 migration. For example:
 ```ruby
@@ -395,13 +450,13 @@ class MyMigration < ActiveRecord::Migration[4.2]
 end
 ```
-When doing so be sure to explicitly set the model's table name so it's not
+When doing so be sure to explicitly set the model's table name, so it's not
 derived from the class name or namespace.
 ### Renaming reserved paths
-When a new route for projects is introduced that could conflict with any
+When a new route for projects is introduced, it could conflict with any
-existing records. The path for this records should be renamed, and the
+existing records. The path for these records should be renamed, and the
 related data should be moved on disk.
 Since we had to do this a few times already, there are now some helpers to help