Commit da9e5fcd authored by Robert Speicher's avatar Robert Speicher

Merge branch '42396-pristine-test-environments-docs' into 'master'

Document how GitLab keeps its tests pristine

Closes #42396

See merge request gitlab-org/gitlab-ce!22481
parents e90adb67 a5ee4e0d
......@@ -209,6 +209,130 @@ it 'is overdue' do
end
```
### Pristine test environments
The code exercised by a single GitLab test may access and modify many items of
data. Without careful preparation before a test runs, and cleanup afterward,
data can be changed by a test in such a way that it affects the behaviour of
following tests. This should be avoided at all costs! Fortunately, the existing
test framework handles most cases already.
When the test environment does get polluted, a common outcome is
[flaky tests](flaky_tests.md). Pollution will often manifest as an order
dependency: running spec A followed by spec B will reliably fail, but running
spec B followed by spec A will reliably succeed. In these cases, you can use
`rspec --bisect` (or a manual pairwise bisect of spec files) to determine which
spec is at fault. Fixing the problem requires some understanding of how the test
suite ensures the environment is pristine. Read on to discover more about each
data store!
#### SQL database
This is managed for us by the `database_cleaner` gem. Each spec is surrounded in
a transaction, which is rolled back once the test completes. Certain specs will
instead issue `DELETE FROM` queries against every table after completion; this
allows the created rows to be viewed from multiple database connections, which
is important for specs that run in a browser, or migration specs, among others.
One consequence of using these strategies, instead of the well-known
`TRUNCATE TABLES` approach, is that primary keys and other sequences are **not**
reset across specs. So if you create a project in spec A, then create a project
in spec B, the first will have `id=1`, while the second will have `id=2`.
This means that specs should **never** rely on the value of an ID, or any other
sequence-generated column. To avoid accidental conflicts, specs should also
avoid manually specifying any values in these kinds of columns. Instead, leave
them unspecified, and look up the value after the row is created.
#### Redis
GitLab stores two main categories of data in Redis: cached items, and sidekiq
jobs.
In most specs, the Rails cache is actually an in-memory store. This is replaced
between specs, so calls to `Rails.cache.read` and `Rails.cache.write` are safe.
However, if a spec makes direct Redis calls, it should mark itself with the
`:clean_gitlab_redis_cache`, `:clean_gitlab_redis_shared_state` or
`:clean_gitlab_redis_queues` traits as appropriate.
Sidekiq jobs are typically not run in specs, but this behaviour can be altered
in each spec through the use of `Sidekiq::Testing.inline!` blocks. Any spec that
causes Sidekiq jobs to be pushed to Redis should use the `:sidekiq` trait, to
ensure that they are removed once the spec completes.
#### Filesystem
Filesystem data can be roughly split into "repositories", and "everything else".
Repositories are stored in `tmp/tests/repositories`. This directory is emptied
before a test run starts, and after the test run ends. It is not emptied between
specs, so created repositories accumulate within this directory over the
lifetime of the process. Deleting them is expensive, but this could lead to
pollution unless carefully managed.
To avoid this, [hashed storage](../../administration/repository_storage_types.md)
is enabled in the test suite. This means that repositories are given a unique
path that depends on their project's ID. Since the project IDs are not reset
between specs, this guarantees that each spec gets its own repository on disk,
and prevents changes from being visible between specs.
If a spec manually specifies a project ID, or inspects the state of the
`tmp/tests/repositories/` directory directly, then it should clean up the
directory both before and after it runs. In general, these patterns should be
completely avoided.
Other classes of file linked to database objects, such as uploads, are generally
managed in the same way. With hashed storage enabled in the specs, they are
written to disk in locations determined by ID, so conflicts should not occur.
Some specs disable hashed storage by passing the `:legacy_storage` trait to the
`projects` factory. Specs that do this must **never** override the `path` of the
project, or any of its groups. The default path includes the project ID, so will
not conflict; but if two specs create a `:legacy_storage` project with the same
path, they will use the same repository on disk and lead to test environment
pollution.
Other files must be managed manually by the spec. If you run code that creates a
`tmp/test-file.csv` file, for instance, the spec must ensure that the file is
removed as part of cleanup.
#### Persistent in-memory application state
All the specs in a given `rspec` run share the same Ruby process, which means
they can affect each other by modifying Ruby objects that are accessible between
specs. In practice, this means global variables, and constants (which includes
Ruby classes, modules, etc).
Global variables should generally not be modified. If absolutely necessary, a
block like this can be used to ensure the change is rolled back afterwards:
```ruby
around(:each) do |example|
old_value = $0
begin
$0 = "new-value"
example.run
ensure
$0 = old_value
end
end
```
If a spec needs to modify a constant, it should use the `stub_const` helper to
ensure the change is rolled back.
If you need to modify the contents of the `ENV` constant, you can use the
`stub_env` helper method instead.
While most Ruby **instances** are not shared between specs, **classes**
and **modules** generally are. Class and module instance variables, accessors,
class variables, and other stateful idioms, should be treated in the same way as
global variables - don't modify them unless you have to! In particular, prefer
using expectations, or dependency injection along with stubs, to avoid the need
for modifications. If you have no other choice, an `around` block similar to the
example for global variables, above, can be used, but this should be avoided if
at all possible.
### Table-based / Parameterized tests
This style of testing is used to exercise one piece of code with a comprehensive
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment