Commit abbd960f, authored Mar 11, 2020 by Matthias Käppler, committed by Adam Hegyi on Mar 11, 2020
Add developer docs for bulk-inserts

First iteration.

parent 320abfd7

Showing 2 changed files with 157 additions and 0 deletions (+157, -0):
- doc/development/README.md (+1, -0)
- doc/development/insert_into_tables_in_batches.md (+156, -0)
doc/development/README.md
@@ -145,6 +145,7 @@ Complementary reads:
- [Hash indexes](hash_indexes.md)
- [Storing SHA1 hashes as binary](sha1_as_binary.md)
- [Iterating tables in batches](iterating_tables_in_batches.md)
- [Insert into tables in batches](insert_into_tables_in_batches.md)
- [Ordering table columns](ordering_table_columns.md)
- [Verifying database capabilities](verifying_database_capabilities.md)
- [Database Debugging and Troubleshooting](database_debugging.md)
doc/development/insert_into_tables_in_batches.md (new file, mode 100644)
---
description: "Sometimes it is necessary to store large numbers of records at once, which
  can be inefficient when iterating collections and performing individual `save`s. With
  the arrival of `insert_all` in Rails 6, which operates at the row level (that is, using
  `Hash`es), GitLab has added a set of APIs that make it safe and simple to insert
  ActiveRecord objects in bulk."
---
# Insert into tables in batches
Sometimes it is necessary to store large numbers of records at once, which can be inefficient
when iterating collections and saving each record individually. With the arrival of
[`insert_all`](https://apidock.com/rails/ActiveRecord/Persistence/ClassMethods/insert_all)
in Rails 6, which operates at the row level (that is, using `Hash` objects), GitLab has added
a set of APIs that make it safe and simple to insert `ActiveRecord` objects in bulk.
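
For context, this is roughly what row-level `insert_all` usage looks like. A minimal sketch: the model and its `name` attribute are illustrative, and note that in Rails 6 `insert_all` does not populate timestamp columns automatically, so they are set explicitly here.

```ruby
# `MyModel` and `name` are hypothetical; adjust to your schema.
MyModel.insert_all([
  { name: 'record 1', created_at: Time.current, updated_at: Time.current },
  { name: 'record 2', created_at: Time.current, updated_at: Time.current }
])
```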
## Prepare `ApplicationRecord`s for bulk insertion
In order for a model class to take advantage of the bulk insertion API, it has to include the
`BulkInsertSafe` concern first:
```ruby
class MyModel < ApplicationRecord
  # other includes here
  # ...
  include BulkInsertSafe # include this last

  # ...
end
```
The `BulkInsertSafe` concern has two functions:

- It performs checks against your model class to ensure that it does not use ActiveRecord
  APIs that are not safe to use with respect to bulk insertions (more on that below).
- It adds a new class method `bulk_insert!`, which you can use to insert many records at once.
## Insert records via `bulk_insert!`
If the target class passes the checks performed by `BulkInsertSafe`, you can proceed to use
the `bulk_insert!` class method as follows:
```ruby
records = [MyModel.new, ...]

MyModel.bulk_insert!(records)
```
### Record validation
The `bulk_insert!` method guarantees that `records` will be inserted transactionally, and
will run validations on each record prior to insertion. If any record fails to validate,
an error is raised and the transaction is rolled back. You can turn off validations via
the `:validate` option:
```ruby
MyModel.bulk_insert!(records, validate: false)
```
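
Because a failed validation raises and rolls back the entire transaction, callers may want to handle that case explicitly. A minimal sketch, assuming validation failures surface as `ActiveRecord::RecordInvalid` (the error ActiveRecord validations normally raise):

```ruby
begin
  MyModel.bulk_insert!(records)
rescue ActiveRecord::RecordInvalid => e
  # Nothing was inserted: the transaction rolled back on the first invalid record.
  Rails.logger.warn("Bulk insert failed validation: #{e.message}")
end
```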
### Batch size configuration
In those cases where the number of `records` is above a given threshold, insertions will
occur in multiple batches. The default batch size is defined in
[`BulkInsertSafe::DEFAULT_BATCH_SIZE`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
Assuming a default threshold of 500, inserting 950 records would result in two batches being
written sequentially (of size 500 and 450 respectively). You can override the default batch
size via the `:batch_size` option:
```ruby
MyModel.bulk_insert!(records, batch_size: 100)
```
Assuming the same number of 950 records, this would result in 10 batches being written instead.
Since this will also affect the number of `INSERT`s that occur, make sure you measure the
performance impact this might have on your code. There is a trade-off between the number of
`INSERT` statements the database has to process and the size and cost of each `INSERT`.
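
One rough way to measure is to time a few candidate batch sizes in the console. A sketch under assumptions: `build_records` is a hypothetical helper returning a fresh array of unsaved instances on each call, so every run inserts distinct rows.

```ruby
require 'benchmark'

[100, 500, 1_000].each do |size|
  records = build_records.call # hypothetical: fresh, unsaved model instances
  seconds = Benchmark.realtime { MyModel.bulk_insert!(records, batch_size: size) }
  puts "batch_size=#{size}: #{seconds.round(2)}s"
end
```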
### Requirements for safe bulk insertions
Large parts of ActiveRecord's persistence API are built around the notion of callbacks. Many
of these callbacks fire in response to model life cycle events such as `save` or `create`.
These callbacks cannot be used with bulk insertions, since they are meant to be called for
every instance that is saved or created. Since these events do not fire when records are
inserted in bulk, we currently disallow their use.

The specifics around which callbacks are disallowed are defined in
[`BulkInsertSafe`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/bulk_insert_safe.rb).
Consult the module source code for details. If your class uses any of the blacklisted
functionality, and you `include BulkInsertSafe`, the application will fail with an error.
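
As a hypothetical illustration of that failure mode (the callback name is made up, and the exact checks are defined by the concern itself):

```ruby
class UnsafeModel < ApplicationRecord
  include BulkInsertSafe

  # Declaring a disallowed life cycle callback such as `after_save` now
  # raises an error when this class is loaded, instead of the callback
  # being silently skipped during bulk inserts.
  after_save :send_notification
end
```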
### `BulkInsertSafe` versus `InsertAll`
Internally, `BulkInsertSafe` is based on `InsertAll`, and you may wonder when to choose
the former over the latter. To help you make the decision, the key differences between
these classes are listed in the table below.
|                | Input type           | Validates input | Specify batch size | Can bypass callbacks              | Transactional |
| -------------- | -------------------- | --------------- | ------------------ | --------------------------------- | ------------- |
| `bulk_insert!` | ActiveRecord objects | Yes (optional)  | Yes (optional)     | No (prevents unsafe callback use) | Yes           |
| `insert_all!`  | Attribute hashes     | No              | No                 | Yes                               | Yes           |
To summarize, `BulkInsertSafe` moves bulk inserts closer to how ActiveRecord objects
and inserts would normally behave. However, if all you need is to insert raw data in bulk,
then `insert_all` is more efficient.
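
To make the contrast concrete, a short sketch (the model and its `name` attribute are illustrative; timestamps are omitted for brevity, but see the note on `insert_all` above):

```ruby
# Validates each instance, batches, and guards against unsafe callbacks:
MyModel.bulk_insert!([MyModel.new(name: 'a'), MyModel.new(name: 'b')])

# Writes raw rows; no model instantiation, no validations:
MyModel.insert_all!([{ name: 'a' }, { name: 'b' }])
```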
## Insert `has_many` associations in bulk
A common use case is to save collections of associated relations through the owner side of the
relation, where the owned relation is associated to the owner through the `has_many` class method:
```ruby
owner = OwnerModel.new(owned_relations: array_of_owned_relations)
# saves all `owned_relations` one-by-one
owner.save!
```
This will issue a single `INSERT`, and transaction, for every record in `owned_relations`, which
is inefficient if `array_of_owned_relations` is large. To remedy this, the
`BulkInsertableAssociations` concern can be used to declare that the owner defines associations
that are safe for bulk insertion:
```ruby
class OwnerModel < ApplicationRecord
  # other includes here
  # ...
  include BulkInsertableAssociations # include this last

  has_many :my_models
end
```
Here `my_models` must be declared `BulkInsertSafe` (as described previously) for bulk insertions
to happen. You can now insert any yet unsaved records as follows:
```ruby
BulkInsertableAssociations.with_bulk_insert do
  owner = OwnerModel.new(my_models: array_of_my_model_instances)
  # saves `my_models` using a single bulk insert (possibly via multiple batches)
  owner.save!
end
```
Note that you can still save relations that are not `BulkInsertSafe` in this block; they will
simply be treated as if you had invoked `save` from outside the block.
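
For instance, in a hypothetical owner with one bulk-insert-safe relation and one ordinary relation (`other_models` is made up here), both can be saved in the same block:

```ruby
BulkInsertableAssociations.with_bulk_insert do
  owner = OwnerModel.new(
    my_models: array_of_my_model_instances, # BulkInsertSafe: inserted in bulk
    other_models: array_of_other_instances  # not BulkInsertSafe: saved one-by-one
  )
  owner.save!
end
```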
## Known limitations
There are a few restrictions to how these APIs can be used:

- Bulk inserts only work for new records; `UPDATE`s or "upserts" are not supported yet.
- `ON CONFLICT` behavior cannot currently be configured; an error will be raised on primary key
  conflicts.
- `BulkInsertableAssociations` furthermore has the following restrictions:
  - only compatible with `has_many` relations.
  - does not support `has_many through: ...` relations.
Moreover, input data should either be limited to around 1000 records at most, or already batched
prior to calling bulk insert. The `INSERT` statement will run in a single transaction, so for
large numbers of records it may negatively affect database stability.
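
If the input can exceed that rough threshold, one approach is to slice it up front so each call runs in its own, smaller transaction. A sketch, where `large_record_set` is a hypothetical enumerable of unsaved instances:

```ruby
large_record_set.each_slice(1_000) do |slice|
  # Each slice is inserted in its own transaction, keeping the size
  # (and potential rollback cost) of any single transaction bounded.
  MyModel.bulk_insert!(slice)
end
```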