Commit d59d4deb authored by Sean McGivern's avatar Sean McGivern Committed by Marcel Amirault

Add documentation on adding new Redis instances

parent 0ec328ab
...@@ -215,6 +215,7 @@ the [reviewer values](https://about.gitlab.com/handbook/engineering/workflow/rev ...@@ -215,6 +215,7 @@ the [reviewer values](https://about.gitlab.com/handbook/engineering/workflow/rev
- [How to dump production data to staging](db_dump.md) - [How to dump production data to staging](db_dump.md)
- [Geo development](geo.md) - [Geo development](geo.md)
- [Redis guidelines](redis.md) - [Redis guidelines](redis.md)
- [Adding a new Redis instance](redis/new_redis_instance.md)
- [Sidekiq guidelines](sidekiq_style_guide.md) for working with Sidekiq workers - [Sidekiq guidelines](sidekiq_style_guide.md) for working with Sidekiq workers
- [Working with Gitaly](gitaly.md) - [Working with Gitaly](gitaly.md)
- [Elasticsearch integration docs](elasticsearch.md) - [Elasticsearch integration docs](elasticsearch.md)
......
...@@ -6,11 +6,14 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -6,11 +6,14 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Redis guidelines # Redis guidelines
## Redis instances
GitLab uses [Redis](https://redis.io) for the following distinct purposes: GitLab uses [Redis](https://redis.io) for the following distinct purposes:
- Caching (mostly via `Rails.cache`). - Caching (mostly via `Rails.cache`).
- As a job processing queue with [Sidekiq](sidekiq_style_guide.md). - As a job processing queue with [Sidekiq](sidekiq_style_guide.md).
- To manage the shared application state. - To manage the shared application state.
- To store CI trace chunks.
- As a Pub/Sub queue backend for ActionCable. - As a Pub/Sub queue backend for ActionCable.
In most environments (including the GDK), all of these point to the same In most environments (including the GDK), all of these point to the same
...@@ -29,6 +32,8 @@ more often than it is read. ...@@ -29,6 +32,8 @@ more often than it is read.
If [Geo](geo.md) is enabled, each Geo node gets its own, independent Redis If [Geo](geo.md) is enabled, each Geo node gets its own, independent Redis
database. database.
We have [development documentation on adding a new Redis instance](redis/new_redis_instance.md).
## Key naming ## Key naming
Redis is a flat namespace with no hierarchy, which means we must pay attention Redis is a flat namespace with no hierarchy, which means we must pay attention
......
---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Add a new Redis instance
GitLab can make use of multiple [Redis instances](../redis.md#redis-instances).
These instances are functionally partitioned so that, for example, we
can store [CI trace chunks](../../administration/job_logs.md#incremental-logging-architecture)
from one Redis instance while storing sessions in another.
From time to time we might want to add a new Redis instance. Typically this will
be a functional partition split from one of the existing instances such as the
cache or shared state. This document describes an approach
for adding a new Redis instance that handles existing data, based on
prior examples:
- [Dedicated Redis instance for Trace Chunk storage](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/462).
This document does not cover the operational side of preparing and configuring
the new Redis instance in detail, but the example epics do contain information
on previous approaches to this.
## Step 1: Support configuring the new instance
Before we can switch any features to using the new instance, we have to support
configuring it and referring to it in the codebase. We must support the
main installation types:
- Source installs (including development environments) - [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62767)
- Omnibus - [example MR](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/5316)
- Helm charts - [example MR](https://gitlab.com/gitlab-org/charts/gitlab/-/merge_requests/2031)
### Fallback instance
In the application code, we need to define a fallback instance in case the new
instance is not configured. For example, if a GitLab instance has already
configured a separate shared state Redis, and we are partitioning data from the
shared state Redis, our new instance's configuration should default to that of
the shared state Redis when it's not present. Otherwise we could break instances
that don't configure the new Redis instance as soon as it's available.
You can [define a `.config_fallback` method](https://gitlab.com/gitlab-org/gitlab/-/blob/a75471dd744678f1a59eeb99f71fca577b155acd/lib/gitlab/redis/wrapper.rb#L69-87)
in `Gitlab::Redis::Wrapper` (the base class for all Redis instances)
that defines the instance to be used if this one is not configured. If we were
adding a `Foo` instance that should fall back to `SharedState`, we can do that
like this:
```ruby
module Gitlab
module Redis
class Foo < ::Gitlab::Redis::Wrapper
# The data we store on Foo used to be stored on SharedState.
def self.config_fallback
SharedState
end
end
end
end
```
We should also add specs like those in
[`trace_chunks_spec.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/lib/gitlab/redis/trace_chunks_spec.rb)
to ensure that this fallback works correctly.
## Step 2: Support writing to and reading from the new instance
When migrating to the new instance, we must account for cases where data is
either on:
- The 'old' (original) instance.
- The new one that we have just added support for.
As a result we may need to support reading from and writing to both
instances, depending on some condition.
The exact condition to use varies depending on the data to be migrated. For
the trace chunks case above, there was already a database column indicating where the
data was stored (as there are other storage options than Redis).
This step may not apply if the data has a very short lifetime (a few minutes at most)
and is not critical. In that case, we
may decide that it is OK to incur a small amount of data loss and switch
over through configuration only.
If there is not a more natural way to mark where the data is stored, using a
[feature flag](../feature_flags/index.md) may be convenient:
- It does not require an application restart to take effect.
- It applies to all application instances (Sidekiq, API, web, etc.) at
the same time.
- It supports incremental rollout - ideally by actor (project, group,
user, etc.) - so that we can monitor for errors and roll back easily.
## Step 3: Migrate the data
We then need to configure the new instance for GitLab.com's production and
staging environments. Hopefully it will be possible to test this change
effectively on staging, to at least make sure that basic usage continues to
work.
After that is done, we can roll out the change to production. Ideally this would
be in an incremental fashion, following the
[standard incremental rollout](../feature_flags/controls.md#rolling-out-changes)
documentation for feature flags.
When we have been using the new instance 100% of the time in production for a
while and there are no issues, we can proceed.
## Step 4: clean up after the migration
<!-- markdownlint-disable MD044 -->
We may choose to keep the migration paths or remove them, depending on whether
or not we expect self-managed instances to perform this migration.
[gitlab-com/gl-infra/scalability#1131](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1131#note_603354746)
contains a discussion on this topic for the trace chunks feature flag. It may
be - as in that case - that we decide that the maintenance costs of supporting
the migration code are higher than the benefits of allowing self-managed
instances to perform this migration seamlessly, if we expect self-managed
instances to cope without this functional partition.
<!-- markdownlint-enable MD044 -->
If we decide to keep the migration code:
- We should document the migration steps.
- If we used a feature flag, we should ensure it's an [ops type feature
flag](../feature_flags/index.md#ops-type), as these are long-lived flags.
Otherwise, we can remove the flags and conclude the project.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment