Commit 6fc5c01e authored by Achilleas Pipinellis

Merge branch 'omnibus-5116-patroni-docs' into 'master'

Add documentation for using Patroni with Omnibus

Closes omnibus-gitlab#5116

See merge request gitlab-org/gitlab!33363
parents 119b6318 5248c77a
@@ -1127,3 +1127,208 @@ If you're running into an issue with a component not outlined here, be sure to c
- [Consul](../high_availability/consul.md#troubleshooting)
- [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting)
- [GitLab application](../high_availability/gitlab.md#troubleshooting)
## Patroni
NOTE: **Note:** Starting from GitLab 13.1, Patroni is available for **experimental** use to replace repmgr. Due to its
experimental nature, Patroni support is **subject to change without notice.**
Patroni is an opinionated solution for PostgreSQL high-availability. It takes control of PostgreSQL, overrides its
configuration, and manages its lifecycle (start, stop, restart). This is a more active approach compared to repmgr.
Both repmgr and Patroni are supported and available, but Patroni will be the default (and perhaps the only) option
for PostgreSQL 12 clustering and cascading replication for Geo deployments.
The [architecture](#example-recommended-setup-manual-steps) mentioned above does not change for Patroni.
You do not need any special consideration for Patroni while provisioning your database nodes. Patroni heavily relies on
Consul to store the state of the cluster and to elect a leader. Any failure in the Consul cluster and its leader election
propagates to the Patroni cluster as well.
Similar to repmgr, Patroni monitors the cluster and handles failover. When the primary node fails, it works with Consul
to notify PgBouncer. However, as opposed to repmgr, on failure Patroni handles the transitioning of the old primary to
a replica and rejoins it to the cluster automatically, so you do not need any manual operation to recover the
cluster as you do with repmgr.
With Patroni, the connection flow is slightly different. Patroni on each node connects to the Consul agent to join the
cluster. Only after this point does it decide whether the node is the primary or a replica. Based on this decision, it
configures and starts PostgreSQL, which it communicates with directly over a Unix socket. This implies that if the Consul
cluster is not functional or does not have a leader, Patroni and, by extension, PostgreSQL will not start. Patroni also
exposes a REST API which can be accessed via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#patroni)
on each node.
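For example, you can query this API from a database node to inspect the node's role and state. The following is only a
sketch; it assumes the default REST API port of `8008` and that `curl` is installed on the node:

```shell
# Query the local Patroni REST API (assumes the default port 8008).
# The JSON response reports, among other things, the node's role and state.
curl http://localhost:8008
```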
### Configuring Patroni cluster
You must enable Patroni explicitly to be able to use it (with `patroni['enable'] = true`). When Patroni is enabled,
repmgr is disabled automatically.
Any PostgreSQL configuration item that controls replication, for example `wal_level` or `max_wal_senders`, is strictly
controlled by Patroni and overrides the original settings that you make with the `postgresql[...]` configuration key.
Hence, these settings are all separated and placed under `patroni['postgresql'][...]`. This behavior is limited to replication:
Patroni honors any other PostgreSQL configuration made with the `postgresql[...]` configuration key. For example,
`max_wal_senders` is set to `5` by default. If you want to change this, you must set it with the `patroni['postgresql']['max_wal_senders']`
configuration key.
The configuration of a Patroni node is very similar to that of a repmgr node, but shorter. When Patroni is enabled, you
can ignore any PostgreSQL replication setting (it will be overwritten anyway) and remove any `repmgr[...]` or
repmgr-specific configuration. In particular, make sure that you remove `postgresql['shared_preload_libraries'] = 'repmgr_funcs'`.
Here is an example similar to [the one that was done with repmgr](#configuring-the-database-nodes):
```ruby
# Disable all components except PostgreSQL, Patroni, and Consul
roles ['postgres_role']
# Enable Patroni
patroni['enable'] = true
# PostgreSQL configuration
postgresql['listen_address'] = '0.0.0.0'
# Disable automatic database migrations
gitlab_rails['auto_migrate'] = false
# Configure the Consul agent
consul['services'] = %w(postgresql)
# START user configuration
# Please set the real values as explained in Required Information section
#
# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
# Replace X with the number of DB nodes + 1 (OPTIONAL: the default value is 5)
patroni['postgresql']['max_wal_senders'] = X
patroni['postgresql']['max_replication_slots'] = X
# Replace XXX.XXX.XXX.XXX/YY with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
# Replace placeholders:
#
# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
# with the addresses gathered for CONSUL_SERVER_NODES
consul['configuration'] = {
  retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
}
#
# END user configuration
```
You do not need any additional or different configuration for replica nodes. In fact, you do not have to have a
predetermined primary node, so all database nodes use the same configuration.
Once the configuration of a node is done, you must [reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure)
on each node for the changes to take effect.
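For example, after editing `/etc/gitlab/gitlab.rb` on a database node, run:

```shell
# Apply the configuration; with patroni['enable'] = true this also
# enables and starts the Patroni service on the node.
sudo gitlab-ctl reconfigure
```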
Generally, when the Consul cluster is ready, the first node that [reconfigures](../restart_gitlab.md#omnibus-gitlab-reconfigure)
becomes the leader. You do not need to sequence the nodes' reconfiguration; you can run them in parallel or in any order.
If you choose an arbitrary order, you do not have any predetermined leader.
As opposed to repmgr, once the nodes are reconfigured you do not need any further action or additional command to join
the replicas.
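To verify that the cluster formed and that all replicas joined, you can list the members from any database node
(the `gitlab-ctl patroni members` sub-command is described in the next section):

```shell
# List the members of the Patroni cluster with their roles and states.
sudo gitlab-ctl patroni members
```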
#### Database authorization for Patroni
Patroni uses a Unix socket to manage the PostgreSQL instance. Therefore, the connection from the `local` socket must be trusted.
Also, replicas use the replication user (`gitlab_replicator` by default) to communicate with the leader. For this user,
you can choose between `trust` and `md5` authentication. If you set `postgresql['sql_replication_password']`,
Patroni uses `md5` authentication, otherwise it falls back to `trust`. You must specify the cluster CIDR in
`postgresql['md5_auth_cidr_addresses']` or `postgresql['trust_auth_cidr_addresses']` respectively.
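If you want to confirm which authentication method was generated for the replication user, you can inspect the rendered
`pg_hba.conf`. This is a minimal sketch that assumes the default Omnibus GitLab data directory:

```shell
# Show the authentication entries for replication connections
# (assumes the default Omnibus PostgreSQL data directory).
sudo grep -E 'replication|gitlab_replicator' /var/opt/gitlab/postgresql/data/pg_hba.conf
```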
### Interacting with Patroni cluster
You can use `gitlab-ctl patroni members` to check the status of the cluster members. To check the status of each node,
`gitlab-ctl patroni` provides two additional sub-commands, `check-leader` and `check-replica`, which indicate whether a node
is the primary or a replica.
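For example, to determine the current role of a given database node:

```shell
# Indicates whether this node is the current leader (primary).
sudo gitlab-ctl patroni check-leader

# Indicates whether this node is a replica.
sudo gitlab-ctl patroni check-replica
```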
When Patroni is enabled, you don't have direct control over the `postgresql` service. Patroni signals PostgreSQL's startup,
shutdown, and restart. For example, to shut down PostgreSQL on a node, you must shut down Patroni on the same node
with:
```shell
sudo gitlab-ctl stop patroni
```
### Failover procedure for Patroni
With Patroni, you have two slightly different options: failover and switchover. Essentially, failover allows you to
perform a manual failover when there are no healthy nodes, while switchover only works when the cluster is healthy and
allows you to schedule a switchover (it can happen immediately). For further details, see
[Patroni documentation on this subject](https://patroni.readthedocs.io/en/latest/rest_api.html#switchover-and-failover-endpoints).
To schedule a switchover:
```shell
sudo gitlab-ctl patroni switchover
```
For manual failover:
```shell
sudo gitlab-ctl patroni failover
```
### Recovering the Patroni cluster
To recover the old primary and rejoin it to the cluster as a replica, you can simply start Patroni with:
```shell
sudo gitlab-ctl start patroni
```
No further configuration or intervention is needed.
### Maintenance procedure for Patroni
With Patroni enabled, you can run planned maintenance. If you want to do some maintenance work on one node and you
don't want Patroni to manage it, you can put it into maintenance mode:
```shell
sudo gitlab-ctl patroni pause
```
When Patroni runs in paused mode, it does not change the state of PostgreSQL. Once you are done, you can resume Patroni:
```shell
sudo gitlab-ctl patroni resume
```
For further details, see [Patroni documentation on this subject](https://patroni.readthedocs.io/en/latest/pause.html).
### Switching from repmgr to Patroni
CAUTION: **Warning:**
Although switching from repmgr to Patroni is fairly straightforward, the other way around is not. Rolling back from
Patroni to repmgr can be complicated and may involve deletion of the data directory. If you need to do that, please contact
GitLab support.
You can switch an existing database cluster to use Patroni instead of repmgr with the following steps:
1. Stop repmgr on all replica nodes and, lastly, on the primary node:

   ```shell
   sudo gitlab-ctl stop repmgrd
   ```

1. Stop PostgreSQL on all replica nodes:

   ```shell
   sudo gitlab-ctl stop postgresql
   ```

   NOTE: **Note:** Ensure that there is no `walsender` process running on the primary node.
   `ps aux | grep walsender` must not show any running process.

1. On the primary node, [configure Patroni](#configuring-patroni-cluster). Remove `repmgr[...]` and any other
   repmgr-specific configuration. Also remove any configuration that is related to PostgreSQL replication.
1. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) on the primary node. It will become
   the leader. You can check this with:

   ```shell
   sudo gitlab-ctl tail patroni
   ```

1. Repeat the last two steps for all replica nodes. `gitlab.rb` should look the same on all nodes.
1. Optional: You can remove the `gitlab_repmgr` database and role on the primary node (see the sketch below).
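If you choose to do that cleanup, the sketch below shows one way to drop them. It assumes the default `gitlab_repmgr`
database and role names and that `gitlab-psql` connects with sufficient (superuser) privileges; adjust it for your
environment:

```shell
# Drop the repmgr metadata database first, then the role that owned it.
# Both names are the Omnibus defaults; adjust them if you customized repmgr.
sudo gitlab-psql -c "DROP DATABASE gitlab_repmgr;"
sudo gitlab-psql -c "DROP ROLE gitlab_repmgr;"
```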
@@ -120,13 +120,13 @@ As long as at least one of each component is online and capable of handling the
### Automated database failover **(PREMIUM ONLY)**
> - Level of complexity: **High**
> - Required domain knowledge: PgBouncer, Repmgr or Patroni, shared storage, distributed systems
> - Supported tiers: [GitLab Premium and Ultimate](https://about.gitlab.com/pricing/)
By adding automatic failover for database systems, you can enable higher uptime
with additional database nodes. This extends the default database with
cluster management and failover policies.
[PgBouncer in conjunction with Repmgr or Patroni](../postgresql/replication_and_failover.md)
is recommended.
### Instance level replication with GitLab Geo **(PREMIUM ONLY)**
@@ -164,6 +164,7 @@ column.
| [PostgreSQL](../../development/architecture.md#postgresql) | Database | [PostgreSQL configuration](https://docs.gitlab.com/omnibus/settings/database.html) | Yes |
| [PgBouncer](../../development/architecture.md#pgbouncer) | Database connection pooler | [PgBouncer configuration](../high_availability/pgbouncer.md#running-pgbouncer-as-part-of-a-non-ha-gitlab-installation) **(PREMIUM ONLY)** | Yes |
| Repmgr | PostgreSQL cluster management and failover | [PostgreSQL and Repmgr configuration](../postgresql/replication_and_failover.md) | Yes |
| Patroni | An alternative PostgreSQL cluster management and failover | [PostgreSQL and Patroni configuration](../postgresql/replication_and_failover.md#patroni) | Yes |
| [Redis](../../development/architecture.md#redis) ([3](#footnotes)) | Key/value store for fast data lookup and caching | [Redis configuration](../high_availability/redis.md) | Yes |
| Redis Sentinel | Redis | [Redis Sentinel configuration](../high_availability/redis.md) | Yes |
| [Gitaly](../../development/architecture.md#gitaly) ([2](#footnotes)) ([7](#footnotes)) | Provides access to Git repositories | [Gitaly configuration](../gitaly/index.md#run-gitaly-on-its-own-server) | Yes |
@@ -46,7 +46,7 @@ This section is for links to information elsewhere in the GitLab documentation.
- Managing Omnibus PostgreSQL versions [from the development docs](https://docs.gitlab.com/omnibus/development/managing-postgresql-versions.html)
- [PostgreSQL scaling](../postgresql/replication_and_failover.md)
- including [troubleshooting](../postgresql/replication_and_failover.md#troubleshooting) `gitlab-ctl repmgr-check-master` (or `gitlab-ctl patroni check-leader` if you are using Patroni) and PgBouncer errors
- [Developer database documentation](../../development/README.md#database-guides) - some of which is absolutely not for production use. Including:
- understanding EXPLAIN plans
@@ -115,8 +115,7 @@ that backup, the database can apply the WAL logs in order until the
database has reached the target time.
On GitLab.com, Consul and Patroni work together to coordinate failovers with
the read replicas. [Omnibus ships with both repmgr and Patroni](../administration/postgresql/replication_and_failover.md).
#### Load-balancing