Commit 0d452261 authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch 'gitaly-docs-improvements' into 'master'

Simplify failover documentation

Closes gitaly#2584

See merge request gitlab-org/gitlab!28405
parents 643567db c5925bca
......@@ -256,9 +256,9 @@ application server, or a Gitaly node.
```ruby
# Name of storage hash must match storage name in git_data_dirs on GitLab
# server ('praefect') and in git_data_dirs on Gitaly nodes ('gitaly-1')
# server ('storage_1') and in git_data_dirs on Gitaly nodes ('gitaly-1')
praefect['virtual_storages'] = {
'praefect' => {
'storage_1' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
......@@ -430,7 +430,7 @@ documentation](index.md#3-gitaly-server-configuration).
gitlab-ctl restart gitaly
```
**Complete these steps for each Gitaly node!**
**The steps above must be completed for each Gitaly node!**
After all Gitaly nodes are configured, you can run the Praefect connection
checker to verify Praefect can connect to all Gitaly servers in the Praefect
......@@ -442,6 +442,34 @@ config.
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dial-nodes
```
1. Enable automatic failover by editing `/etc/gitlab/gitlab.rb`:
```ruby
praefect['failover_enabled'] = true
```
When automatic failover is enabled, Praefect checks the health of internal
Gitaly nodes. If the primary has a certain amount of health checks fail, it
will promote one of the secondaries to be primary, and demote the primary to
be a secondary.
Manual failover is possible by updating `praefect['virtual_storages']` and
nominating a new primary node.
NOTE: **Note:**: Automatic failover is not yet supported for setups with
multiple Praefect nodes. There is currently no coordination between Praefect
nodes, which could result in two Praefect instances thinking two different
Gitaly nodes are the primary. Follow issue
[#2547](https://gitlab.com/gitlab-org/gitaly/-/issues/2547) for
updates.
1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure):
```shell
gitlab-ctl reconfigure
```
### GitLab
To complete this section you will need:
......@@ -560,133 +588,80 @@ Particular attention should be shown to:
- Deselect the **default** storage location
- Select the **praefect** storage location
![Update repository storage](img/praefect_storage_v12_10.png)
1. Verify everything is still working by creating a new project. Check the
"Initialize repository with a README" box so that there is content in the
repository that viewed. If the project is created, and you can see the
README file, it works!
Congratulations! You have configured a highly available Praefect cluster.
### Failover
### Grafana
There are two ways to do a failover from one internal Gitaly node to another as the primary. Manually, or automatically.
As an example, in this `config.toml` we have one virtual storage named "default" with two internal Gitaly nodes behind it.
One is deemed the "primary". This means that read and write traffic will go to `internal_storage_0`, and writes
will get replicated to `internal_storage_1`:
```toml
socket_path = "/path/to/Praefect.socket"
# failover_enabled will enable automatic failover
failover_enabled = false
[logging]
format = "json"
level = "info"
[[virtual_storage]]
name = "default"
[[virtual_storage.node]]
name = "internal_storage_0"
address = "tcp://localhost:9999"
primary = true
token = "supersecret"
Grafana is included with GitLab, and can be used to monitor your Praefect
cluster. See [Grafana Dashboard
Service](https://docs.gitlab.com/omnibus/settings/grafana.html)
for detailed documentation.
[[virtual_storage.node]]
name = "internal_storage_1"
address = "tcp://localhost:9998"
token = "supersecret"
```
To get started quickly:
#### Manual failover
1. SSH into the **GitLab** node and login as root:
In order to failover from using one internal Gitaly node to using another, a manual failover step can be used. Unless `failover_enabled` is set to `true`
in the `config.toml`, the only way to fail over from one primary to using another node as the primary is to do a manual failover.
```shell
sudo -i
```
1. Move `primary = true` from the current `[[virtual_storage.node]]` to another node in `/etc/gitlab/gitlab.rb`:
1. Enable the Grafana login form by editing `/etc/gitlab/gitlab.rb`.
```ruby
praefect['virtual_storages'] = {
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
# no longer the primary
},
'gitaly-2' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
# this is the new primary
'primary' => true
},
'gitaly-3' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
}
}
}
grafana['disable_login_form'] = false
```
1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure):
On a restart, Praefect will send write traffic to `internal_storage_1`. `internal_storage_0` is the new secondary now,
and replication jobs will be created to replicate repository data to `internal_storage_0` **from** `internal_storage_1`
```shell
gitlab-ctl reconfigure
```
#### Automatic failover
1. Set the Grafana admin password. This command will prompt you to enter a new
password:
When automatic failover is enabled, Praefect will do automatic detection of the health of internal Gitaly nodes. If the
primary has a certain amount of health checks fail, it will decide to promote one of the secondaries to be primary, and
demote the primary to be a secondary.
```shell
gitlab-ctl set-grafana-password
```
1. To enable automatic failover, edit `/etc/gitlab/gitlab.rb`:
1. In your web browser, open `/-/grafana` (e.g.
`https://gitlab.example.com/-/grafana`) on your GitLab server.
```ruby
# failover_enabled turns on automatic failover
praefect['failover_enabled'] = true
praefect['virtual_storages'] = {
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
'primary' => true
},
'gitaly-2' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
},
'gitaly-3' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
}
}
}
```
Login using the password you set, and the username `admin`.
1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
1. Go to **Explore** and query `gitlab_build_info` to verify that you are
getting metrics from all your machines.
Below is the picture when Praefect starts up with the config.toml above:
Congratulations! You've configured an observable highly available Praefect
cluster.
```mermaid
graph TD
A[Praefect] -->|Mutator RPC| B(internal_storage_0)
B --> |Replication|C[internal_storage_1]
```
## Automatic failover and leader election
Let's say suddenly `internal_storage_0` goes down. Praefect will detect this and
automatically switch over to `internal_storage_1`, and `internal_storage_0` will serve as a secondary:
Praefect regularly checks the health of each backend Gitaly node. This
information can be used to automatically failover to a new primary node if the
current primary node is found to be unhealthy.
```mermaid
graph TD
A[Praefect] -->|Mutator RPC| B(internal_storage_1)
B --> |Replication|C[internal_storage_0]
```
- **Manual:** Automatic failover is disabled. The primary node can be
reconfigured in `/etc/gitlab/gitlab.rb` on the Praefect node. Modify the
`praefect['virtual_storages']` field by moving the `primary = true` to promote
a different Gitaly node to primary. In the steps above, `gitaly-1` was set to
the primary.
- **Memory:** Enabled by setting `praefect['failover_enabled'] = true` in
`/etc/gitlab/gitlab.rb` on the Praefect node. If a sufficient number of health
checks fail for the current primary backend Gitaly node, and new primary will
be elected. **Do not use with multiple Praefect nodes!** Using with multiple
Praefect nodes is likely to result in a split brain.
- **PostgreSQL:** Coming soon. See isse
[#2547](https://gitlab.com/gitlab-org/gitaly/-/issues/2547) for updates.
NOTE: **Note:**: Currently this feature is supported for setups that only have 1 Praefect instance. Praefect instances running,
for example behind a load balancer, `failover_enabled` should be disabled. The reason is The reason is because there
is no coordination that currently happens across different Praefect instances, so there could be a situation where
two Praefect instances think two different Gitaly nodes are the primary.
It is likely that we will implement support for Consul, and a cloud native
strategy in the future.
## Backend Node Recovery
......@@ -711,49 +686,6 @@ The command will return a list of repositories that were found to be
inconsistent against the current primary. Each of these inconsistencies will
also be logged with an accompanying replication job ID.
## Grafana
Grafana is included with GitLab, and can be used to monitor your Praefect
cluster. See [Grafana Dashboard
Service](https://docs.gitlab.com/omnibus/settings/grafana.html)
for detailed documentation.
To get started quickly:
1. SSH into the **GitLab** node and login as root:
```shell
sudo -i
```
1. Enable the Grafana login form by editing `/etc/gitlab/gitlab.rb`.
```ruby
grafana['disable_login_form'] = false
```
1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure):
```shell
gitlab-ctl reconfigure
```
1. Set the Grafana admin password. This command will prompt you to enter a new
password:
```shell
gitlab-ctl set-grafana-password
```
1. In your web browser, open `/-/grafana` (e.g.
`https://gitlab.example.com/-/grafana`) on your GitLab server.
Login using the password you set, and the username `admin`.
1. Go to **Explore** and query `gitlab_build_info` to verify that you are
getting metrics from all your machines.
## Migrating existing repositories to Praefect
If your GitLab instance already has repositories, these won't be migrated
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment