Merge branch 'redis-production-documentation' into 'master'

Update Redis guidelines

See merge request gitlab-org/gitlab!41889

Commit ee7e7ac4 authored Sep 16, 2020 by Mike Jang

Showing with 141 additions and 7 deletions

.markdownlint.json +1 -0

doc/development/redis.md +140 -7

--- a/.markdownlint.json
+++ b/.markdownlint.json
@@ -49,6 +49,7 @@
      "Elasticsearch",
      "Facebook",
      "fastlane",
+      "fluent-plugin-redis-slowlog",
      "GDK",
      "Geo",
      "Git LFS",

--- a/doc/development/redis.md
+++ b/doc/development/redis.md
 # Redis guidelines
-GitLab uses [Redis](https://redis.io) for three distinct purposes:
+GitLab uses [Redis](https://redis.io) for the following distinct purposes:
-- Caching via `Rails.cache`.
+- Caching (mostly via `Rails.cache`).
 - As a job processing queue with [Sidekiq](sidekiq_style_guide.md).
 - To manage the shared application state.
+- As a Pub/Sub queue backend for ActionCable.
+In most environments (including the GDK), all of these point to the same
+Redis instance.
+On GitLab.com, we use [separate Redis
+instances](../administration/redis/replication_and_failover.md#running-multiple-redis-clusters).
+(We do not currently use [ActionCable on
+GitLab.com](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/228)).
 Every application process is configured to use the same Redis servers, so they
 can be used for inter-process communication in cases where [PostgreSQL](sql.md)
@@ -21,11 +30,11 @@ to key names to avoid collisions. Typically we use colon-separated elements to
 provide a semblance of structure at application level. An example might be
 `projects:1:somekey`.
-Although we split our Redis usage into three separate purposes, and those may
-map to separate Redis servers in a [Highly Available](../administration/high_availability/redis.md)
-configuration, the default Omnibus and GDK setups share a single Redis server.
-This means that keys should **always** be globally unique across the three
-purposes.
+Although we split our Redis usage by purpose into distinct categories, and
+those may map to separate Redis servers in a Highly Available
+configuration like GitLab.com, the default Omnibus and GDK setups share
+a single Redis server. This means that keys should **always** be
+globally unique across all categories.
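To make the colon-separated convention concrete, here is a minimal Ruby sketch of a key builder. The `build_redis_key` helper is hypothetical (it is not a GitLab method): it joins parts with colons and rejects empty parts or parts that themselves contain a colon, so two different part lists can never produce the same key.

```ruby
# Hypothetical helper (not a GitLab API): builds a colon-separated Redis key.
# Rejecting empty parts and parts containing ":" keeps the mapping from
# part lists to key strings unambiguous.
def build_redis_key(*parts)
  raise ArgumentError, "at least one key part is required" if parts.empty?

  strings = parts.map(&:to_s)
  bad = strings.find { |s| s.empty? || s.include?(":") }
  raise ArgumentError, "invalid key part: #{bad.inspect}" if bad

  strings.join(":")
end

project_id = 1 # prefer an immutable ID over a mutable full path
puts build_redis_key("projects", project_id, "somekey") # prints projects:1:somekey
```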
 It is usually better to use immutable identifiers - project ID rather than
 full path, for instance - in Redis key names. If full path is used, the key will
@@ -56,3 +65,127 @@ Currently, we validate this in the development and test environments
 with the [`RedisClusterValidator`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/instrumentation/redis_cluster_validator.rb),
 which is enabled for the `cache` and `shared_state`
 [Redis instances](https://docs.gitlab.com/omnibus/settings/redis.html#running-with-multiple-redis-instances).
+## Redis in structured logging
+Our [structured logging](logging.md#use-structured-json-logging) for web
+requests and Sidekiq jobs contains fields for the duration, call count,
+bytes written, and bytes read per Redis instance, along with a total for
+all Redis instances. For a particular request, this might look like:
+| Field | Value |
+| --- | --- |
+| `json.queue_duration_s` | 0.01 |
+| `json.redis_cache_calls` | 1 |
+| `json.redis_cache_duration_s` | 0 |
+| `json.redis_cache_read_bytes` | 109 |
+| `json.redis_cache_write_bytes` | 49 |
+| `json.redis_calls` | 2 |
+| `json.redis_duration_s` | 0.001 |
+| `json.redis_read_bytes` | 111 |
+| `json.redis_shared_state_calls` | 1 |
+| `json.redis_shared_state_duration_s` | 0 |
+| `json.redis_shared_state_read_bytes` | 2 |
+| `json.redis_shared_state_write_bytes` | 206 |
+| `json.redis_write_bytes` | 255 |
+Because all of these fields are indexed, it is straightforward to
+investigate Redis usage in production. For instance, to find the
+requests that read the most data from the cache, we can sort by
+`json.redis_cache_read_bytes` in descending order.
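The totals in the table are the sums of the per-instance fields: `redis_read_bytes` is 109 + 2 = 111, `redis_write_bytes` is 49 + 206 = 255, and `redis_calls` is 1 + 1 = 2. (The duration totals do not add up exactly, presumably because the values are rounded, so the sketch below only checks calls and bytes.) This minimal Ruby sketch recomputes those totals and sorts a batch of entries by cache read bytes. It assumes log lines have already been reduced to JSON objects keyed by field names without the `json.` prefix, that only the `cache` and `shared_state` instances are present, and the second entry is made up for illustration.

```ruby
require "json"

# Redis instances that appear in the example entry above; other deployments
# may log additional instances (assumption for this sketch).
INSTANCES = %w[cache shared_state].freeze

# Sums one per-instance metric (for example "read_bytes") across INSTANCES.
def redis_total(entry, metric)
  INSTANCES.sum { |instance| entry.fetch("redis_#{instance}_#{metric}", 0) }
end

# Returns up to +limit+ entries, ordered by cache read bytes, largest first.
def top_cache_readers(entries, limit = 10)
  entries.sort_by { |entry| -entry.fetch("redis_cache_read_bytes", 0) }.first(limit)
end

log_lines = [
  # The example request from the table above (duration fields omitted).
  '{"redis_cache_calls":1,"redis_cache_read_bytes":109,"redis_cache_write_bytes":49,' \
    '"redis_shared_state_calls":1,"redis_shared_state_read_bytes":2,' \
    '"redis_shared_state_write_bytes":206}',
  # A made-up second request, for illustration only.
  '{"redis_cache_calls":3,"redis_cache_read_bytes":4096,"redis_cache_write_bytes":0}'
]
entries = log_lines.map { |line| JSON.parse(line) }

puts redis_total(entries.first, "read_bytes")  # prints 111
puts redis_total(entries.first, "write_bytes") # prints 255
puts redis_total(entries.first, "calls")       # prints 2
puts top_cache_readers(entries).map { |e| e["redis_cache_read_bytes"] }.inspect # prints [4096, 109]
```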
+### The slow log
+On GitLab.com, entries from the [Redis
+slow log](https://redis.io/commands/slowlog) are available in the
+`pubsub-redis-inf-gprd*` index with the [`redis.slowlog`
+tag](https://log.gprd.gitlab.net/app/kibana#/discover?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-1d,to:now))&_a=(columns:!(json.type,json.command,json.exec_time),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:AWSQX_Vf93rHTYrsexmk,key:json.tag,negate:!f,params:(query:redis.slowlog),type:phrase),query:(match:(json.tag:(query:redis.slowlog,type:phrase))))),index:AWSQX_Vf93rHTYrsexmk)).
+This shows commands that have taken a long time and may be a performance
+concern.
+The
+[fluent-plugin-redis-slowlog](https://gitlab.com/gitlab-org/fluent-plugin-redis-slowlog)
+project is responsible for taking the slowlog entries from Redis and
+passing them to fluentd (and ultimately Elasticsearch).
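Each entry returned by `SLOWLOG GET` is an array of a unique ID, a Unix timestamp, the execution time in microseconds, and the command arguments; Redis 4.0 and later also append the client address and client name. The sketch below flattens such an entry into a hash, roughly the kind of record a log shipper would forward. The `type`, `command`, and `exec_time` names mirror the columns in the Kibana link above, but they are assumptions about the plugin's output, not its actual schema, and the sample entry is hypothetical. With the `redis` gem, the raw entries would come from a call such as `redis.slowlog("get", 10)`.

```ruby
require "time"

# Converts one raw SLOWLOG GET entry into a flat hash.
# Field names are hypothetical; exec_time stays in microseconds, as Redis reports it.
def slowlog_entry_to_record(entry)
  id, timestamp, exec_time_us, args, client_addr, client_name = entry
  {
    "type" => "slowlog",
    "id" => id,
    "time" => Time.at(timestamp).utc.iso8601,
    "exec_time" => exec_time_us,
    "command" => args.join(" "),
    "client" => client_addr, # nil on Redis versions older than 4.0
    "client_name" => client_name
  }
end

# Hypothetical raw entry, shaped like one item of a SLOWLOG GET reply.
raw = [14, 1_600_000_000, 15_000, ["KEYS", "cache:*"], "10.0.0.5:51234", ""]
record = slowlog_entry_to_record(raw)
puts record["command"]   # prints KEYS cache:*
puts record["exec_time"] # prints 15000
```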
+## Analyzing the entire keyspace
+The [Redis Keyspace
+Analyzer](https://gitlab.com/gitlab-com/gl-infra/redis-keyspace-analyzer)
+project contains tools for dumping the full key list and memory usage of a Redis
+instance, and then analyzing those lists while eliminating potentially sensitive
+data from the results. It can be used to find the most frequent key patterns, or
+those that use the most memory.
+Currently this is not run automatically for the GitLab.com Redis instances, but
+is run manually on an as-needed basis.
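One way to find the most frequent key patterns without exposing concrete values is to normalize each key, replacing segments that look like identifiers with placeholders, and then count (or sum memory) per pattern. This is a minimal sketch of that idea, not the Redis Keyspace Analyzer's actual implementation; the normalization rules and sample keys are hypothetical. Against a live instance, the key list would normally be collected incrementally with `SCAN` (for example `redis.scan_each(match: "*")` in the `redis` gem) rather than with `KEYS`.

```ruby
# Hypothetical normalization rules: purely numeric segments and long hex strings
# (such as SHAs) become placeholders, so keys that differ only by an ID or hash
# collapse into one pattern and the concrete values never reach the report.
def key_pattern(key)
  segments = key.split(":", -1).map do |segment|
    case segment
    when /\A\d+\z/ then "<id>"
    when /\A\h{16,}\z/ then "<hex>"
    else segment
    end
  end
  segments.join(":")
end

# Counts keys per pattern, most frequent first (ties broken alphabetically).
def pattern_counts(keys)
  counts = Hash.new(0)
  keys.each { |key| counts[key_pattern(key)] += 1 }
  counts.sort_by { |pattern, count| [-count, pattern] }
end

# Sums memory per pattern from a { key => bytes } hash, largest first.
def pattern_memory(sizes)
  totals = Hash.new(0)
  sizes.each { |key, bytes| totals[key_pattern(key)] += bytes }
  totals.sort_by { |pattern, bytes| [-bytes, pattern] }
end

keys = %w[
  projects:1:somekey
  projects:2:somekey
  projects:3:other
  session:gitlab:0123456789abcdef0123
]
pattern_counts(keys).each { |pattern, count| puts "#{count} #{pattern}" }
# prints:
# 2 projects:<id>:somekey
# 1 projects:<id>:other
# 1 session:gitlab:<hex>
```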
+## Utility classes
+We have some extra classes to help with specific use cases. These are
+mostly for fine-grained control of Redis usage, so they wouldn't be used
+in combination with the `Rails.cache` wrapper: we'd either use
+`Rails.cache` or these classes and literal Redis commands.
+We prefer using `Rails.cache` so we can reap the benefits of future
+optimizations done to Rails. It is worth noting that Ruby objects are
+[marshalled](https://github.com/rails/rails/blob/v6.0.3.1/activesupport/lib/active_support/cache/redis_cache_store.rb#L447)
+when written to Redis, so we need to take care not to store huge
+objects or untrusted user input.
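Because cached objects are marshalled, the size of `Marshal.dump` of an object is a rough proxy for what ends up in Redis (the cache store adds some entry metadata and may compress larger values). While developing, measuring that directly is a quick way to spot oversized values. The helper names and the 1 MiB threshold below are hypothetical, not GitLab or Rails settings.

```ruby
# Hypothetical limit used only for this sketch.
MAX_MARSHALLED_BYTES = 1024 * 1024

# Size in bytes of the Marshal encoding of +value+.
def marshalled_size(value)
  Marshal.dump(value).bytesize
end

# True if +value+ is within the hypothetical limit above.
def reasonable_cache_value?(value, limit: MAX_MARSHALLED_BYTES)
  marshalled_size(value) <= limit
end

small = { id: 1, name: "gitlab" }
large = Array.new(200_000) { |i| "branch-#{i}" }

puts reasonable_cache_value?(small) # prints true
puts reasonable_cache_value?(large) # prints false
```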
+Typically we would only use these classes when at least one of the
+following is true:
+1. We want to manipulate data on a non-cache Redis instance.
+1. `Rails.cache` does not support the operations we want to perform.
+### `Gitlab::Redis::{Cache,SharedState,Queues}`
+These classes wrap the Redis instances (using
+[`Gitlab::Redis::Wrapper`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/redis/wrapper.rb))
+to make it convenient to work with them directly. The typical use is to
+call `.with` on the class, which takes a block that yields the Redis
+connection. For example:
+```ruby
+# Get the value of `key` from the shared state (persistent) Redis
+Gitlab::Redis::SharedState.with { |redis| redis.get(key) }
+# Check if `value` is a member of the set `key`
+Gitlab::Redis::Cache.with { |redis| redis.sismember(key, value) }
+```
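The block form means a connection is held only for the duration of the block and handed back afterwards, even if the block raises. The sketch below illustrates that checkout pattern with an in-memory pool and a fake client; `TinyPool` and `FakeRedis` are hypothetical stand-ins, not GitLab classes, and the real wrappers talk to actual Redis servers.

```ruby
# Hypothetical fake client: keeps string values in a shared Hash.
class FakeRedis
  def initialize(store)
    @store = store
  end

  def set(key, value)
    @store[key] = value.to_s # Redis stores values as strings
    "OK"
  end

  def get(key)
    @store[key]
  end
end

# Hypothetical minimal pool: idle connections wait in a thread-safe Queue.
class TinyPool
  def initialize(size, &factory)
    @available = Queue.new
    size.times { @available << factory.call }
  end

  # Checks out a connection, yields it, and always returns it to the pool.
  def with
    conn = @available.pop # blocks if every connection is in use
    yield conn
  ensure
    @available << conn if conn
  end

  def available_count
    @available.size
  end
end

store = {}
pool = TinyPool.new(2) { FakeRedis.new(store) }

pool.with { |redis| redis.set("projects:1:somekey", 42) }
puts pool.with { |redis| redis.get("projects:1:somekey") } # prints 42
```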
+### `Gitlab::Redis::Boolean`
+In Redis, every value is a string.
+[`Gitlab::Redis::Boolean`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/redis/boolean.rb)
+makes sure that booleans are encoded and decoded consistently.
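Since values come back as strings, a typical client reads a stored `false` back as the string `"false"`, which is truthy in Ruby. A strict encode/decode pair avoids that trap. In this minimal sketch the `RedisBooleanCodec` name and the `"1"`/`"0"` encoding are hypothetical and do not necessarily match what `Gitlab::Redis::Boolean` writes.

```ruby
# Hypothetical codec: maps booleans to fixed strings and back, strictly.
module RedisBooleanCodec
  TRUE_STR = "1"
  FALSE_STR = "0"

  module_function

  def encode(value)
    case value
    when true then TRUE_STR
    when false then FALSE_STR
    else raise ArgumentError, "not a boolean: #{value.inspect}"
    end
  end

  # Returns nil for a missing key so callers can tell "unset" from false.
  def decode(str)
    case str
    when nil then nil
    when TRUE_STR then true
    when FALSE_STR then false
    else raise ArgumentError, "not an encoded boolean: #{str.inspect}"
    end
  end
end

stored = RedisBooleanCodec.encode(false)      # the string that would be written with SET
puts RedisBooleanCodec.decode(stored).inspect # prints false
```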
+### `Gitlab::Redis::HLL`
+The Redis [`PFCOUNT`](https://redis.io/commands/pfcount),
+[`PFADD`](https://redis.io/commands/pfadd), and
+[`PFMERGE`](https://redis.io/commands/pfmerge) commands operate on
+HyperLogLogs, a data structure that allows estimating the number of unique
+elements with low memory usage. (In addition to the `PFCOUNT` documentation,
+Thoughtbot's article on [HyperLogLogs in
+Redis](https://thoughtbot.com/blog/hyperloglogs-in-redis) provides a good
+background here.)
+[`Gitlab::Redis::HLL`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/redis/hll.rb)
+provides a convenient interface for adding and counting values in HyperLogLogs.
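For intuition, the following is a minimal HyperLogLog in plain Ruby: each element is hashed, the top bits of the hash pick a register, and each register keeps the largest "position of the first 1-bit" seen in the remaining bits. `add`, `count`, and `merge!` loosely correspond to `PFADD`, `PFCOUNT`, and `PFMERGE`. This `MiniHLL` class is hypothetical and only a sketch of the technique; Redis's own implementation uses 16,384 registers, a different hash function, a sparse encoding, and different estimation details.

```ruby
require "digest/md5"

# Hypothetical minimal HyperLogLog (not Gitlab::Redis::HLL, not Redis's implementation).
class MiniHLL
  P = 10            # 2**10 = 1024 registers
  M = 1 << P
  HASH_BITS = 64
  ALPHA = 0.7213 / (1 + 1.079 / M)

  attr_reader :registers

  def initialize
    @registers = Array.new(M, 0)
  end

  # Like PFADD: record +element+ in the sketch.
  def add(element)
    hashed = Digest::MD5.digest(element.to_s).unpack1("Q>") # first 64 bits of MD5
    index = hashed >> (HASH_BITS - P)                         # top P bits pick a register
    rest = hashed & ((1 << (HASH_BITS - P)) - 1)              # remaining 54 bits
    rank = (HASH_BITS - P) - rest.bit_length + 1              # position of the first 1-bit
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # Like PFCOUNT: estimated number of distinct elements added.
  def count
    estimate = ALPHA * M * M / @registers.sum { |r| 2.0**-r }
    zeros = @registers.count(0)
    # Small-range correction (linear counting), as in the original HyperLogLog paper.
    estimate = M * Math.log(M.to_f / zeros) if estimate <= 2.5 * M && zeros.positive?
    estimate.round
  end

  # Like PFMERGE: the union of two sketches is the register-wise maximum.
  def merge!(other)
    @registers = @registers.zip(other.registers).map(&:max)
    self
  end
end

a = MiniHLL.new
b = MiniHLL.new
6_000.times { |i| a.add("user-#{i}") }
(4_000...10_000).each { |i| b.add("user-#{i}") }
puts a.count           # roughly 6000
puts a.merge!(b).count # roughly 10000: the union, not 6000 + 6000
```

Whatever the number of elements added, the sketch only keeps 1,024 small counters, which is the memory trade-off that makes HyperLogLogs attractive for unique counts.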
+### `Gitlab::SetCache`
+For cases where we need to efficiently check whether an item is in a group
+of items, we can use a Redis set.
+[`Gitlab::SetCache`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/set_cache.rb)
+provides an `#include?` method that uses the
+[`SISMEMBER`](https://redis.io/commands/sismember) command, as well as `#read`
+to fetch all entries in the set.
+This is used by the
+[`RepositorySetCache`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/repository_set_cache.rb)
+to provide a convenient way to use sets to cache repository data like branch
+names.
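The set-based approach can be illustrated with an in-memory stand-in: writing stores all members at once (as `SADD` would), a membership check is a single lookup (as `SISMEMBER` would be) rather than fetching and scanning the whole list, and `read` returns every member (as `SMEMBERS` would). `InMemorySetCache`, its method signatures, and the key name are hypothetical and only loosely modelled on the `#include?` and `#read` methods described above; the real class stores its data in Redis.

```ruby
require "set"

# Hypothetical in-memory stand-in for a Redis-backed set cache.
class InMemorySetCache
  def initialize
    @sets = {}
  end

  # Replaces the members stored under +key+ (like DEL followed by SADD).
  def write(key, values)
    @sets[key] = Set.new(values.map(&:to_s)) # Redis set members are strings
    values
  end

  # Like SISMEMBER: checks one member without fetching the whole set.
  def include?(key, value)
    @sets.key?(key) && @sets[key].include?(value.to_s)
  end

  # Like SMEMBERS: returns all members (Redis gives no ordering guarantee).
  def read(key)
    @sets.key?(key) ? @sets[key].to_a : []
  end
end

cache = InMemorySetCache.new
cache.write("projects:1:branch_names", %w[master feature-a feature-b])

puts cache.include?("projects:1:branch_names", "feature-a") # prints true
puts cache.include?("projects:1:branch_names", "missing")   # prints false
puts cache.read("projects:1:branch_names").sort.inspect     # prints ["feature-a", "feature-b", "master"]
```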