Document how GitLab keeps its tests pristine

a5ee4e0d · Nick Thomas · c094bdb8 · a5ee4e0d
Commit a5ee4e0d authored Oct 19, 2018 by Nick Thomas
Hide whitespace changes
Inline Side-by-side

Showing with 124 additions and 0 deletions

doc/development/testing_guide/best_practices.md doc/development/testing_guide/best_practices.md +124 -0

No files found.
--- a/doc/development/testing_guide/best_practices.md
+++ b/doc/development/testing_guide/best_practices.md
@@ -209,6 +209,130 @@ it 'is overdue' do
 end
 ```

+### Pristine test environments
+
+The code exercised by a single GitLab test may access and modify many items of
+data. Without careful preparation before a test runs, and cleanup afterward,
+data can be changed by a test in such a way that it affects the behaviour of
+following tests. This should be avoided at all costs! Fortunately, the existing
+test framework handles most cases already.
+
+When the test environment does get polluted, a common outcome is
+[flaky tests](flaky_tests.md). Pollution will often manifest as an order
+dependency: running spec A followed by spec B will reliably fail, but running
+spec B followed by spec A will reliably succeed. In these cases, you can use
+`rspec --bisect` (or a manual pairwise bisect of spec files) to determine which
+spec is at fault. Fixing the problem requires some understanding of how the test
+suite ensures the environment is pristine. Read on to discover more about each
+data store!
+
+#### SQL database
+
+This is managed for us by the `database_cleaner` gem. Each spec is surrounded in
+a transaction, which is rolled back once the test completes. Certain specs will
+instead issue `DELETE FROM` queries against every table after completion; this
+allows the created rows to be viewed from multiple database connections, which
+is important for specs that run in a browser, or migration specs, among others.
+
+One consequence of using these strategies, instead of the well-known
+`TRUNCATE TABLES` approach, is that primary keys and other sequences are **not**
+reset across specs. So if you create a project in spec A, then create a project
+in spec B, the first will have `id=1`, while the second will have `id=2`.
+
+This means that specs should **never** rely on the value of an ID, or any other
+sequence-generated column. To avoid accidental conflicts, specs should also
+avoid manually specifying any values in these kinds of columns. Instead, leave
+them unspecified, and look up the value after the row is created.
+
+#### Redis
+
+GitLab stores two main categories of data in Redis: cached items, and sidekiq
+jobs.
+
+In most specs, the Rails cache is actually an in-memory store. This is replaced
+between specs, so calls to `Rails.cache.read` and `Rails.cache.write` are safe.
+However, if a spec makes direct Redis calls, it should mark itself with the
+`:clean_gitlab_redis_cache`, `:clean_gitlab_redis_shared_state` or
+`:clean_gitlab_redis_queues` traits as appropriate.
+
+Sidekiq jobs are typically not run in specs, but this behaviour can be altered
+in each spec through the use of `Sidekiq::Testing.inline!` blocks. Any spec that
+causes Sidekiq jobs to be pushed to Redis should use the `:sidekiq` trait, to
+ensure that they are removed once the spec completes.
+
+#### Filesystem
+
+Filesystem data can be roughly split into "repositories", and "everything else".
+Repositories are stored in `tmp/tests/repositories`. This directory is emptied
+before a test run starts, and after the test run ends. It is not emptied between
+specs, so created repositories accumulate within this directory over the
+lifetime of the process. Deleting them is expensive, but this could lead to
+pollution unless carefully managed.
+
+To avoid this, [hashed storage](../../administration/repository_storage_types.md)
+is enabled in the test suite. This means that repositories are given a unique
+path that depends on their project's ID. Since the project IDs are not reset
+between specs, this guarantees that each spec gets its own repository on disk,
+and prevents changes from being visible between specs.
+
+If a spec manually specifies a project ID, or inspects the state of the
+`tmp/tests/repositories/` directory directly, then it should clean up the
+directory both before and after it runs. In general, these patterns should be
+completely avoided.
+
+Other classes of file linked to database objects, such as uploads, are generally
+managed in the same way. With hashed storage enabled in the specs, they are
+written to disk in locations determined by ID, so conflicts should not occur.
+
+Some specs disable hashed storage by passing the `:legacy_storage` trait to the
+`projects` factory. Specs that do this must **never** override the `path` of the
+project, or any of its groups. The default path includes the project ID, so will
+not conflict; but if two specs create a `:legacy_storage` project with the same
+path, they will use the same repository on disk and lead to test environment
+pollution.
+
+Other files must be managed manually by the spec. If you run code that creates a
+`tmp/test-file.csv` file, for instance, the spec must ensure that the file is 
+removed as part of cleanup.
+
+#### Persistent in-memory application state
+
+All the specs in a given `rspec` run share the same Ruby process, which means
+they can affect each other by modifying Ruby objects that are accessible between
+specs. In practice, this means global variables, and constants (which includes
+Ruby classes, modules, etc).
+
+Global variables should generally not be modified. If absolutely necessary, a
+block like this can be used to ensure the change is rolled back afterwards:
+
+```ruby
+around(:each) do |example|
+  old_value = $0
+
+  begin
+    $0 = "new-value"
+    example.run
+  ensure
+    $0 = old_value
+  end
+end
+```
+
+If a spec needs to modify a constant, it should use the `stub_const` helper to
+ensure the change is rolled back.
+
+If you need to modify the contents of the `ENV` constant, you can use the
+`stub_env` helper method instead.
+
+While most Ruby **instances** are not shared between specs, **classes**
+and **modules** generally are. Class and module instance variables, accessors,
+class variables, and other stateful idioms, should be treated in the same way as
+global variables - don't modify them unless you have to! In particular, prefer
+using expectations, or dependency injection along with stubs, to avoid the need
+for modifications. If you have no other choice, an `around` block similar to the
+example for global variables, above, can be used, but this should be avoided if
+at all possible.
+
 ### Table-based / Parameterized tests

 This style of testing is used to exercise one piece of code with a comprehensive