Commit 99b76451 authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch 'jc-docs-praefect-failover' into 'master'

Add docs to describe praefect failover

Closes gitaly#2472

See merge request gitlab-org/gitlab!27242
parents d1c62865 40eab35d
...@@ -573,6 +573,127 @@ Particular attention should be shown to: ...@@ -573,6 +573,127 @@ Particular attention should be shown to:
Congratulations! You have configured a highly available Praefect cluster. Congratulations! You have configured a highly available Praefect cluster.
### Failover
There are two ways to do a failover from one internal Gitaly node to another as the primary. Manually, or automatically.
As an example, in this `config.toml` we have one virtual storage named "default" with two internal Gitaly nodes behind it.
One is deemed the "primary". This means that read and write traffic will go to `internal_storage_0`, and writes
will get replicated to `internal_storage_1`:
```toml
socket_path = "/path/to/Praefect.socket"
# failover_enabled will enable automatic failover
failover_enabled = false
[logging]
format = "json"
level = "info"
[[virtual_storage]]
name = "default"
[[virtual_storage.node]]
name = "internal_storage_0"
address = "tcp://localhost:9999"
primary = true
token = "supersecret"
[[virtual_storage.node]]
name = "internal_storage_1"
address = "tcp://localhost:9998"
token = "supersecret"
```
#### Manual failover
In order to failover from using one internal Gitaly node to using another, a manual failover step can be used. Unless `failover_enabled` is set to `true`
in the `config.toml`, the only way to fail over from one primary to using another node as the primary is to do a manual failover.
1. Move `primary = true` from the current `[[virtual_storage.node]]` to another node in `/etc/gitlab/gitlab.rb`:
```ruby
praefect['virtual_storages'] = {
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
# no longer the primary
},
'gitaly-2' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
# this is the new primary
'primary' => true
},
'gitaly-3' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
}
}
}
```
1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
On a restart, Praefect will send write traffic to `internal_storage_1`. `internal_storage_0` is the new secondary now,
and replication jobs will be created to replicate repository data to `internal_storage_0` **from** `internal_storage_1`
#### Automatic failover
When automatic failover is enabled, Praefect will do automatic detection of the health of internal Gitaly nodes. If the
primary has a certain amount of healthchecks fail, it will decide to promote one of the secondaries to be primary, and
demote the primary to be a secondary.
1. To enable automatic failover, edit `/etc/gitlab/gitlab.rb`:
```ruby
# failover_enabled turns on automatic failover
praefect['failover_enabled'] = true
praefect['virtual_storages'] = {
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
'primary' => true
},
'gitaly-2' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
},
'gitaly-3' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
}
}
}
```
1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
Below is the picture when Praefect starts up with the config.toml above:
```mermaid
graph TD
A[Praefect] -->|Mutator RPC| B(internal_storage_0)
B --> |Replication|C[internal_storage_1]
```
Let's say suddenly `internal_storage_0` goes down. Praefect will detect this and
automatically switch over to `internal_storage_1`, and `internal_storage_0` will serve as a secondary:
```mermaid
graph TD
A[Praefect] -->|Mutator RPC| B(internal_storage_1)
B --> |Replication|C[internal_storage_0]
```
NOTE: **Note:**: Currently this feature is supported for setups that only have 1 Praefect instance. Praefect instances running,
for example behind a load balancer, `failover_enabled` should be disabled. The reason is The reason is because there
is no coordination that currently happens across different Praefect instances, so there could be a situation where
two Praefect instances think two different Gitaly nodes are the primary.
## Migrating existing repositories to Praefect ## Migrating existing repositories to Praefect
If your GitLab instance already has repositories, these won't be migrated If your GitLab instance already has repositories, these won't be migrated
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment