Commit 92ee313c authored by Douglas Barbosa Alexandre, committed by Marcel Amirault

Update docs to promote a Geo secondary site with the new single command

parent ac9b0aa0
......@@ -5,27 +5,27 @@ info: To determine the technical writer assigned to the Stage/Group associated w
type: howto
---
# Bring a demoted primary node back online **(PREMIUM SELF)**
# Bring a demoted primary site back online **(PREMIUM SELF)**
After a failover, it is possible to fail back to the demoted **primary** node to
After a failover, it is possible to fail back to the demoted **primary** site to
restore your original configuration. This process consists of two steps:
1. Making the old **primary** node a **secondary** node.
1. Promoting a **secondary** node to a **primary** node.
1. Making the old **primary** site a **secondary** site.
1. Promoting a **secondary** site to a **primary** site.
WARNING:
If you have any doubts about the consistency of the data on this node, we recommend setting it up from scratch.
If you have any doubts about the consistency of the data on this site, we recommend setting it up from scratch.
## Configure the former **primary** node to be a **secondary** node
## Configure the former **primary** site to be a **secondary** site
Since the former **primary** node will be out of sync with the current **primary** node, the first step is to bring the former **primary** node up to date. Note, deletion of data stored on disk like
repositories and uploads will not be replayed when bringing the former **primary** node back
Since the former **primary** site will be out of sync with the current **primary** site, the first step is to bring the former **primary** site up to date. Note that the deletion of data stored on disk, such as
repositories and uploads, is not replayed when bringing the former **primary** site back
into sync, which may result in increased disk usage.
Alternatively, you can [set up a new **secondary** GitLab instance](../setup/index.md) to avoid this.
To bring the former **primary** node up to date:
To bring the former **primary** site up to date:
1. SSH into the former **primary** node that has fallen behind.
1. SSH into the former **primary** site that has fallen behind.
1. Make sure all the services are up:
```shell
......@@ -33,36 +33,36 @@ To bring the former **primary** node up to date:
```
NOTE:
If you [disabled the **primary** node permanently](index.md#step-2-permanently-disable-the-primary-node),
If you [disabled the **primary** site permanently](index.md#step-2-permanently-disable-the-primary-site),
you need to undo those steps now. For Debian/Ubuntu you just need to run
`sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install
the GitLab instance from scratch and set it up as a **secondary** node by
the GitLab instance from scratch and set it up as a **secondary** site by
following [Setup instructions](../setup/index.md). In this case, you don't need to follow the next step.
NOTE:
If you [changed the DNS records](index.md#step-4-optional-updating-the-primary-domain-dns-record)
for this node during disaster recovery procedure you may need to [block
all the writes to this node](planned_failover.md#prevent-updates-to-the-primary-node)
for this site during the disaster recovery procedure, you may need to [block
all the writes to this site](planned_failover.md#prevent-updates-to-the-primary-node)
during this procedure.
1. [Set up database replication](../setup/database.md). In this case, the **secondary** node
refers to the former **primary** node.
1. If [PgBouncer](../../postgresql/pgbouncer.md) was enabled on the **current secondary** node
(when it was a primary node) disable it by editing `/etc/gitlab/gitlab.rb`
1. [Set up database replication](../setup/database.md). In this case, the **secondary** site
refers to the former **primary** site.
1. If [PgBouncer](../../postgresql/pgbouncer.md) was enabled on the **current secondary** site
(when it was a primary site), disable it by editing `/etc/gitlab/gitlab.rb`
and running `sudo gitlab-ctl reconfigure` (a sketch follows this list).
1. You can then set up database replication on the **secondary** node.
1. You can then set up database replication on the **secondary** site.
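A minimal sketch of the PgBouncer change mentioned above (assuming PgBouncer was enabled through the `pgbouncer['enable']` setting; your configuration may differ):
```shell
# In /etc/gitlab/gitlab.rb on the site where PgBouncer was enabled, disable it, for example:
#   pgbouncer['enable'] = false
# Then apply the change:
sudo gitlab-ctl reconfigure
```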
If you have lost your original **primary** node, follow the
[setup instructions](../setup/index.md) to set up a new **secondary** node.
If you have lost your original **primary** site, follow the
[setup instructions](../setup/index.md) to set up a new **secondary** site.
## Promote the **secondary** node to **primary** node
## Promote the **secondary** site to **primary** site
When the initial replication is complete and the **primary** node and **secondary** node are
When the initial replication is complete and the **primary** site and **secondary** site are
closely in sync, you can do a [planned failover](planned_failover.md).
## Restore the **secondary** node
## Restore the **secondary** site
If your objective is to have two nodes again, you need to bring your **secondary**
node back online as well by repeating the first step
([configure the former **primary** node to be a **secondary** node](#configure-the-former-primary-node-to-be-a-secondary-node))
for the **secondary** node.
If your objective is to have two sites again, you need to bring your **secondary**
site back online as well by repeating the first step
([configure the former **primary** site to be a **secondary** site](#configure-the-former-primary-site-to-be-a-secondary-site))
for the **secondary** site.
......@@ -16,36 +16,36 @@ For the latest updates, check the [Disaster Recovery epic for complete maturity]
Multi-secondary configurations require the complete re-synchronization and re-configuration of all non-promoted secondaries and
cause downtime.
## Promoting a **secondary** Geo node in single-secondary configurations
## Promoting a **secondary** Geo site in single-secondary configurations
We don't currently provide an automated way to promote a Geo replica and do a
failover, but you can do it manually if you have `root` access to the machine.
This process promotes a **secondary** Geo node to a **primary** node. To regain
geographic redundancy as quickly as possible, you should add a new **secondary** node
This process promotes a **secondary** Geo site to a **primary** site. To regain
geographic redundancy as quickly as possible, you should add a new **secondary** site
immediately after following these instructions.
### Step 1. Allow replication to finish if possible
If the **secondary** node is still replicating data from the **primary** node, follow
If the **secondary** site is still replicating data from the **primary** site, follow
[the planned failover docs](planned_failover.md) as closely as possible in
order to avoid unnecessary data loss.
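One way to check how far replication has progressed is the Geo status Rake task, run on a Rails node of the **secondary** site:
```shell
# Summarizes replication and verification progress of the secondary site
sudo gitlab-rake geo:status
```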
### Step 2. Permanently disable the **primary** node
### Step 2. Permanently disable the **primary** site
WARNING:
If the **primary** node goes offline, there may be data saved on the **primary** node
that have not been replicated to the **secondary** node. This data should be treated
If the **primary** site goes offline, there may be data saved on the **primary** site
that have not been replicated to the **secondary** site. This data should be treated
as lost if you proceed.
If an outage on the **primary** node happens, you should do everything possible to
If an outage on the **primary** site happens, you should do everything possible to
avoid a split-brain situation where writes can occur in two different GitLab
instances, complicating recovery efforts. So to prepare for the failover, we
must disable the **primary** node.
must disable the **primary** site.
- If you have SSH access:
1. SSH into the **primary** node to stop and disable GitLab:
1. SSH into the **primary** site to stop and disable GitLab:
```shell
sudo gitlab-ctl stop
......@@ -57,35 +57,35 @@ must disable the **primary** node.
sudo systemctl disable gitlab-runsvdir
```
- If you do not have SSH access to the **primary** node, take the machine offline and
- If you do not have SSH access to the **primary** site, take the machine offline and
prevent it from rebooting by any means at your disposal.
You might need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the primary DNS record to the
**secondary** node to stop usage of the **primary** node).
**secondary** site to stop usage of the **primary** site).
- Stop the virtual servers.
- Block traffic through a firewall (see the example after this list).
- Revoke object storage permissions from the **primary** node.
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
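For the firewall option above, a minimal sketch using `iptables` (the ports and rule placement are assumptions; adapt them to your environment, and note the rules do not persist across reboots unless saved):
```shell
# Reject inbound HTTP, HTTPS, and Git-over-SSH traffic on the primary site
sudo iptables -A INPUT -p tcp -m multiport --dports 80,443,22 -j REJECT
```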
If you plan to [update the primary domain DNS record](#step-4-optional-updating-the-primary-domain-dns-record),
you may wish to lower the TTL now to speed up propagation.
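For example, you can check the remaining TTL of the record with `dig` (`gitlab.example.com` is a placeholder for your primary domain):
```shell
# The second field of each answer line is the TTL in seconds
dig +noall +answer gitlab.example.com
```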
### Step 3. Promoting a **secondary** node
### Step 3. Promoting a **secondary** site
WARNING:
In GitLab 13.2 and 13.3, promoting a secondary node to a primary while the
In GitLab 13.2 and 13.3, promoting a secondary site to a primary while the
secondary is paused fails. Do not pause replication before promoting a
secondary. If the node is paused, be sure to resume before promoting.
secondary. If the secondary site is paused, be sure to resume before promoting.
This issue has been fixed in GitLab 13.4 and later.
Note the following when promoting a secondary:
- If replication was paused on the secondary node (for example as a part of
- If replication was paused on the secondary site (for example as a part of
upgrading, while you were running a version of GitLab earlier than 13.4), you
_must_ [enable the node by using the database](../replication/troubleshooting.md#message-activerecordrecordinvalid-validation-failed-enabled-geo-primary-node-cannot-be-disabled)
before proceeding. If the secondary node
_must_ [enable the site by using the database](../replication/troubleshooting.md#message-activerecordrecordinvalid-validation-failed-enabled-geo-primary-node-cannot-be-disabled)
before proceeding. If the secondary site
[has been paused](../../geo/index.md#pausing-and-resuming-replication), the promotion
performs a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
......@@ -99,7 +99,32 @@ Note the following when promoting a secondary:
for more information, see this
[troubleshooting advice](../replication/troubleshooting.md#errors-when-using---skip-preflight-checks-or---force).
#### Promoting a **secondary** node running on a single machine
#### Promoting a **secondary** site running on a single node running GitLab 14.5 and later
1. SSH in to your **secondary** node and execute:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
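For example, a quick connectivity check from a workstation (a sketch; `gitlab-secondary.example.com` stands in for the URL previously used by the **secondary** site):
```shell
# Expect an HTTP 200 response from the sign-in page of the newly promoted primary
curl -I https://gitlab-secondary.example.com/users/sign_in
```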
#### Promoting a **secondary** site running on a single node running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promoted-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
1. SSH in to your **secondary** node and log in as root:
......@@ -116,7 +141,7 @@ Note the following when promoting a secondary:
roles ['geo_secondary_role']
```
1. Promote the **secondary** node to the **primary** node:
1. Promote the **secondary** site to the **primary** site:
- To promote the secondary node to primary along with [preflight checks](planned_failover.md#preflight-checks):
......@@ -146,18 +171,57 @@ Note the following when promoting a secondary:
gitlab-ctl promote-to-primary-node --force
```
1. Verify you can connect to the newly-promoted **primary** node using the URL used
previously for the **secondary** node.
1. If successful, the **secondary** node is now promoted to the **primary** node.
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** node with multiple servers
#### Promoting a **secondary** site with multiple nodes running GitLab 14.5 and later
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with multiple nodes running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promoted-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
The `gitlab-ctl promote-to-primary-node` command cannot be used yet in
conjunction with multiple servers, as it can only
perform changes on a **secondary** with only a single machine. Instead, you must
conjunction with multiple nodes, as it can only perform changes on
a **secondary** with only a single node. Instead, you must
do this manually.
1. SSH in to the database node in the **secondary** and trigger PostgreSQL to
1. SSH in to the database node in the **secondary** site and trigger PostgreSQL to
promote to read-write:
```shell
......@@ -187,16 +251,54 @@ do this manually.
1. Verify you can connect to the newly-promoted **primary** using the URL used
previously for the **secondary**.
1. If successful, the **secondary** node is now promoted to the **primary** node.
1. If successful, the **secondary** site is now promoted to the **primary** site.
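To confirm that PostgreSQL accepted the promotion, you can also check on the database node that it is no longer in recovery (a sketch using the Omnibus `gitlab-psql` wrapper):
```shell
# Returns "f" once the database has been promoted to read-write
sudo gitlab-psql -c 'SELECT pg_is_in_recovery();'
```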
#### Promoting a **secondary** site with a Patroni standby cluster running GitLab 14.5 and later
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
#### Promoting a **secondary** node with a Patroni standby cluster
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
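If you want to confirm the state of the Patroni standby cluster before or after the promotion, one quick check (assuming Omnibus-managed Patroni) is:
```shell
# Lists cluster members and their roles (Leader, Replica, Standby Leader)
sudo gitlab-ctl patroni members
```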
#### Promoting a **secondary** site with a Patroni standby cluster running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promoted-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
The `gitlab-ctl promote-to-primary-node` command cannot be used yet in
conjunction with a Patroni standby cluster, as it can only
perform changes on a **secondary** with only a single machine. Instead, you must
do this manually.
conjunction with a Patroni standby cluster, as it can only perform changes on
a **secondary** with only a single node. Instead, you must do this manually.
1. SSH in to the Standby Leader database node in the **secondary** and trigger PostgreSQL to
1. SSH in to the Standby Leader database node in the **secondary** site and trigger PostgreSQL to
promote to read-write:
```shell
......@@ -230,9 +332,81 @@ do this manually.
1. Verify you can connect to the newly-promoted **primary** using the URL used
previously for the **secondary**.
1. If successful, the **secondary** node is now promoted to the **primary** node.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with an external PostgreSQL database running GitLab 14.5 and later
The `gitlab-ctl geo promote` command can be used in conjunction with
an external PostgreSQL database, but it can only perform changes on
a **secondary** PostgreSQL database managed by Omnibus.
You must promote the replica database associated with the **secondary**
site first.
1. Promote the replica database associated with the **secondary** site. This
sets the database to read-write. The instructions vary depending on where your database is hosted:
- [Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote)
- [Azure PostgreSQL](https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal#stop-replication)
- [Google Cloud SQL](https://cloud.google.com/sql/docs/mysql/replication/manage-replicas#promote-replica)
- For other external PostgreSQL databases, save the following script in your
secondary node, for example `/tmp/geo_promote.sh`, and modify the connection
parameters to match your environment. Then, execute it to promote the replica:
```shell
#!/bin/bash
#### Promoting a **secondary** node with an external PostgreSQL database
PG_SUPERUSER=postgres
# The path to your pg_ctl binary. You may need to adjust this path to match
# your PostgreSQL installation
PG_CTL_BINARY=/usr/lib/postgresql/10/bin/pg_ctl
# The path to your PostgreSQL data directory. You may need to adjust this
# path to match your PostgreSQL installation. You can also run
# `SHOW data_directory;` from PostgreSQL to find your data directory
PG_DATA_DIRECTORY=/etc/postgresql/10/main
# Promote the PostgreSQL database and allow read/write operations
sudo -u $PG_SUPERUSER $PG_CTL_BINARY -D $PG_DATA_DIRECTORY promote
```
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with an external PostgreSQL database running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promoted-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
The `gitlab-ctl promote-to-primary-node` command cannot be used in conjunction with
an external PostgreSQL database, as it can only perform changes on a **secondary**
......@@ -287,23 +461,23 @@ required:
1. Verify you can connect to the newly-promoted **primary** using the URL used
previously for the **secondary**.
1. If successful, the **secondary** node is now promoted to the **primary** node.
1. If successful, the **secondary** site is now promoted to the **primary** site.
### Step 4. (Optional) Updating the primary domain DNS record
Updating the DNS records for the primary domain to point to the **secondary** node
Updating the DNS records for the primary domain to point to the **secondary** site
prevents the need to update all references to the primary domain, such as
changing Git remotes and API URLs.
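For example, without the DNS update, each user would otherwise have to repoint their Git remotes manually (the hostnames below are placeholders):
```shell
# Point an existing clone at the promoted site's domain
git remote set-url origin git@gitlab-secondary.example.com:group/project.git
```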
1. SSH into the **secondary** node and login as root:
1. SSH into the **secondary** site and log in as root:
```shell
sudo -i
```
1. Update the primary domain's DNS record. After updating the primary domain's
DNS records to point to the **secondary** node, edit `/etc/gitlab/gitlab.rb` on the
**secondary** node to reflect the new URL:
DNS records to point to the **secondary** site, edit `/etc/gitlab/gitlab.rb` on the
**secondary** site to reflect the new URL:
```ruby
# Change the existing external_url configuration
......@@ -314,13 +488,13 @@ secondary domain, like changing Git remotes and API URLs.
Changing `external_url` does not prevent access via the old secondary URL, as
long as the secondary DNS records are still intact.
1. Reconfigure the **secondary** node for the change to take effect:
1. Reconfigure the **secondary** site for the change to take effect:
```shell
gitlab-ctl reconfigure
```
1. Execute the command below to update the newly promoted **primary** node URL:
1. Execute the command below to update the newly promoted **primary** site URL:
```shell
gitlab-rake geo:update_primary_node_url
......@@ -335,14 +509,14 @@ secondary domain, like changing Git remotes and API URLs.
To determine if you need to do this, search for the
`gitlab_rails["geo_node_name"]` setting in your `/etc/gitlab/gitlab.rb`
file. If it is commented out with `#` or not found at all, then you
need to update the **primary** node's name in the database. You can search for it
need to update the **primary** site's name in the database. You can search for it
like so:
```shell
grep "geo_node_name" /etc/gitlab/gitlab.rb
```
To update the **primary** node's name in the database:
To update the **primary** site's name in the database:
```shell
gitlab-rails runner 'Gitlab::Geo.primary_node.update!(name: GeoNode.current_node_name)'
......@@ -352,12 +526,12 @@ secondary domain, like changing Git remotes and API URLs.
If you updated the DNS records for the primary domain, these changes may
not have propagated yet, depending on the previous DNS records' TTL.
### Step 5. (Optional) Add **secondary** Geo node to a promoted **primary** node
### Step 5. (Optional) Add **secondary** Geo site to a promoted **primary** site
Promoting a **secondary** node to **primary** node using the process above does not enable
Geo on the new **primary** node.
Promoting a **secondary** site to **primary** site using the process above does not enable
Geo on the new **primary** site.
To bring a new **secondary** node online, follow the [Geo setup instructions](../index.md#setup-instructions).
To bring a new **secondary** site online, follow the [Geo setup instructions](../index.md#setup-instructions).
### Step 6. (Optional) Removing the secondary's tracking database
......@@ -376,13 +550,13 @@ for the changes to take effect.
## Promoting secondary Geo replica in multi-secondary configurations
If you have more than one **secondary** node and you need to promote one of them, we suggest you follow
[Promoting a **secondary** Geo node in single-secondary configurations](#promoting-a-secondary-geo-node-in-single-secondary-configurations)
If you have more than one **secondary** site and you need to promote one of them, we suggest you follow
[Promoting a **secondary** Geo site in single-secondary configurations](#promoting-a-secondary-geo-site-in-single-secondary-configurations)
and after that you also need to complete two extra steps.
### Step 1. Prepare the new **primary** node to serve one or more **secondary** nodes
### Step 1. Prepare the new **primary** site to serve one or more **secondary** sites
1. SSH into the new **primary** node and login as root:
1. SSH into the new **primary** site and log in as root:
```shell
sudo -i
......@@ -442,13 +616,13 @@ and after that you also need two extra steps.
### Step 2. Initiate the replication process
Now we need to make each **secondary** node listen to changes on the new **primary** node. To do that you need
Now we need to make each **secondary** site listen to changes on the new **primary** site. To do that, you need
to [initiate the replication process](../setup/database.md#step-3-initiate-the-replication-process) again but this time
for another **primary** node. All the old replication settings are overwritten.
for another **primary** site. All the old replication settings are overwritten.
## Promoting a secondary Geo cluster in GitLab Cloud Native Helm Charts
When updating a Cloud Native Geo deployment, the process for updating any node that is external to the secondary Kubernetes cluster does not differ from the non Cloud Native approach. As such, you can always defer to [Promoting a secondary Geo node in single-secondary configurations](#promoting-a-secondary-geo-node-in-single-secondary-configurations) for more information.
When updating a Cloud Native Geo deployment, the process for updating any node that is external to the secondary Kubernetes cluster does not differ from the non-Cloud Native approach. As such, you can always refer to [Promoting a secondary Geo site in single-secondary configurations](#promoting-a-secondary-geo-site-in-single-secondary-configurations) for more information.
The following sections assume you are using the `gitlab` namespace. If you used a different namespace when setting up your cluster, you should also replace `--namespace gitlab` with your namespace.
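For example, to confirm you are targeting the correct namespace before running the commands in the next steps:
```shell
# List the GitLab pods in the namespace used by your Geo deployment
kubectl --namespace gitlab get pods
```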
......@@ -489,13 +663,45 @@ must disable the **primary** site:
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
### Step 2. Promote all **secondary** nodes external to the cluster
### Step 2. Promote all **secondary** sites external to the cluster
WARNING:
If the secondary site [has been paused](../../geo/index.md#pausing-and-resuming-replication), this performs
a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
If you are running GitLab 14.5 and later:
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
If you are running GitLab 14.4 and earlier:
1. SSH in to the database node in the **secondary** and trigger PostgreSQL to
promote to read-write:
......@@ -522,8 +728,6 @@ Data that was created on the primary while the secondary was paused is lost.
After making these changes, [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) on the database node.
### Step 3. Promote the **secondary** cluster
1. Find the task runner pod:
```shell
......@@ -536,6 +740,8 @@ Data that was created on the primary while the secondary was paused is lost.
kubectl --namespace gitlab exec -ti gitlab-geo-task-runner-XXX -- gitlab-rake geo:set_secondary_as_primary
```
### Step 3. Promote the **secondary** cluster
1. Update the existing cluster configuration.
You can retrieve the existing configuration with Helm:
......@@ -204,4 +204,4 @@ in the loss of any data uploaded to the new **primary** in the meantime.
Don't forget to remove the broadcast message after the failover is complete.
Finally, you can bring the [old site back as a secondary](bring_primary_back.md#configure-the-former-primary-node-to-be-a-secondary-node).
Finally, you can bring the [old site back as a secondary](bring_primary_back.md#configure-the-former-primary-site-to-be-a-secondary-site).
......@@ -66,13 +66,13 @@ promote a Geo replica and perform a failover.
NOTE:
GitLab 13.9 through GitLab 14.3 are affected by a bug in which the Geo secondary site statuses will appear to stop updating and become unhealthy. For more information, see [Geo Admin Area shows 'Unhealthy' after enabling Maintenance Mode](../../replication/troubleshooting.md#geo-admin-area-shows-unhealthy-after-enabling-maintenance-mode).
On the **secondary** node:
On the **secondary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Geo > Nodes** to see its status.
Replicated objects (shown in green) should be close to 100%,
and there should be no failures (shown in red). If a large proportion of
objects aren't yet replicated (shown in gray), consider giving the node more
objects aren't yet replicated (shown in gray), consider giving the site more
time to complete.
![Replication status](../../replication/img/geo_dashboard_v14_0.png)
......@@ -85,20 +85,20 @@ You can use the
[Geo status API](../../../../api/geo_nodes.md#retrieve-project-sync-or-verification-failures-that-occurred-on-the-current-node)
to review failed objects and the reasons for failure.
A common cause of replication failures is the data being missing on the
**primary** node - you can resolve these failures by restoring the data from backup,
**primary** site. You can resolve these failures by restoring the data from backup,
or by removing references to the missing data.
The maintenance window won't end until Geo replication and verification are
completely finished. To keep the window as short as possible, you should
ensure these processes are as close to 100% as possible during active use.
If the **secondary** node is still replicating data from the **primary** node,
If the **secondary** site is still replicating data from the **primary** site,
follow these steps to avoid unnecessary data loss:
1. Until a [read-only mode](https://gitlab.com/gitlab-org/gitlab/-/issues/14609)
is implemented, updates must be manually prevented from happening on the
**primary**. Your **secondary** node still needs read-only
access to the **primary** node during the maintenance window:
**primary**. Your **secondary** site still needs read-only
access to the **primary** site during the maintenance window:
1. At the scheduled time, using your cloud provider or your node's firewall, block
all HTTP, HTTPS and SSH traffic to/from the **primary** node, **except** for your IP and
......@@ -121,18 +121,18 @@ follow these steps to avoid unnecessary data loss:
```
From this point, users are unable to view their data or make changes on the
**primary** node. They are also unable to log in to the **secondary** node.
**primary** site. They are also unable to log in to the **secondary** site.
However, existing sessions need to work for the remainder of the maintenance period, and
so public data is accessible throughout.
1. Verify the **primary** node is blocked to HTTP traffic by visiting it in browser via
1. Verify the **primary** site is blocked to HTTP traffic by visiting it in a browser via
another IP. The server should refuse the connection.
1. Verify the **primary** node is blocked to Git over SSH traffic by attempting to pull an
1. Verify the **primary** site is blocked to Git over SSH traffic by attempting to pull an
existing Git repository with an SSH remote URL. The server should refuse
connection.
1. On the **primary** node:
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Cron**.
......@@ -150,7 +150,7 @@ follow these steps to avoid unnecessary data loss:
1. If you are manually replicating any
[data not managed by Geo](../../replication/datatypes.md#limitations-on-replicationverification),
trigger the final replication process now.
1. On the **primary** node:
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all queues except
......@@ -165,7 +165,7 @@ follow these steps to avoid unnecessary data loss:
- Database replication lag is 0ms.
- The Geo log cursor is up to date (0 events behind).
1. On the **secondary** node:
1. On the **secondary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all the `geo`
......@@ -173,14 +173,14 @@ follow these steps to avoid unnecessary data loss:
1. [Run an integrity check](../../../raketasks/check.md) to verify the integrity
of CI artifacts, LFS objects, and uploads in file storage.
At this point, your **secondary** node contains an up-to-date copy of everything the
**primary** node has, meaning nothing is lost when you fail over.
At this point, your **secondary** site contains an up-to-date copy of everything the
**primary** site has, meaning nothing is lost when you fail over.
1. In this final step, you need to permanently disable the **primary** node.
1. In this final step, you need to permanently disable the **primary** site.
WARNING:
When the **primary** node goes offline, there may be data saved on the **primary** node
that has not been replicated to the **secondary** node. This data should be treated
When the **primary** site goes offline, there may be data saved on the **primary** site
that has not been replicated to the **secondary** site. This data should be treated
as lost if you proceed.
NOTE:
......@@ -189,9 +189,9 @@ follow these steps to avoid unnecessary data loss:
When performing a failover, we want to avoid a split-brain situation where
writes can occur in two different GitLab instances. So to prepare for the
failover, you must disable the **primary** node:
failover, you must disable the **primary** site:
- If you have SSH access to the **primary** node, stop and disable GitLab:
- If you have SSH access to the **primary** site, stop and disable GitLab:
```shell
sudo gitlab-ctl stop
......@@ -214,19 +214,58 @@ follow these steps to avoid unnecessary data loss:
from starting if the machine reboots as `root` with
`initctl stop gitlab-runsvdir && echo 'manual' > /etc/init/gitlab-runsvdir.override && initctl reload-configuration`.
- If you do not have SSH access to the **primary** node, take the machine offline and
- If you do not have SSH access to the **primary** site, take the machine offline and
prevent it from rebooting. Since there are many ways you may prefer to accomplish
this, we avoid a single recommendation. You may need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the **primary** DNS record to the
**secondary** node to stop using the **primary** node).
**secondary** site to stop using the **primary** site).
- Stop the virtual servers.
- Block traffic through a firewall.
- Revoke object storage permissions from the **primary** node.
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
### Promoting the **secondary** node
### Promoting the **secondary** site running GitLab 14.5 and later
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
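As an additional sanity check after the promotion, you can run the Geo health checks on a Rails node of the newly promoted site (the exact output varies by configuration):
```shell
# Runs GitLab Geo health checks on this node
sudo gitlab-rake gitlab:geo:check
```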
### Promoting the **secondary** site running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promoted-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
NOTE:
A new **secondary** should not be added at this time. If you want to add a new
......@@ -243,13 +282,13 @@ perform changes on a **secondary** with only a single machine. Instead, you must
do this manually.
WARNING:
In GitLab 13.2 and 13.3, promoting a secondary node to a primary while the
In GitLab 13.2 and 13.3, promoting a secondary site to a primary while the
secondary is paused fails. Do not pause replication before promoting a
secondary. If the node is paused, be sure to resume before promoting. This
secondary. If the site is paused, be sure to resume before promoting. This
issue has been fixed in GitLab 13.4 and later.
WARNING:
If the secondary node [has been paused](../../../geo/index.md#pausing-and-resuming-replication), this performs
If the secondary site [has been paused](../../../geo/index.md#pausing-and-resuming-replication), this performs
a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
......@@ -291,6 +330,6 @@ Data that was created on the primary while the secondary was paused is lost.
### Next steps
To regain geographic redundancy as quickly as possible, you should
[add a new **secondary** node](../../setup/index.md). To
[add a new **secondary** site](../../setup/index.md). To
do that, you can re-add the old **primary** as a new secondary and bring it back
online.
......@@ -54,10 +54,10 @@ promote a Geo replica and perform a failover.
NOTE:
GitLab 13.9 through GitLab 14.3 are affected by a bug in which the Geo secondary site statuses will appear to stop updating and become unhealthy. For more information, see [Geo Admin Area shows 'Unhealthy' after enabling Maintenance Mode](../../replication/troubleshooting.md#geo-admin-area-shows-unhealthy-after-enabling-maintenance-mode).
On the **secondary** node, navigate to the **Admin Area > Geo** dashboard to
On the **secondary** site, navigate to the **Admin Area > Geo** dashboard to
review its status. Replicated objects (shown in green) should be close to 100%,
and there should be no failures (shown in red). If a large proportion of
objects aren't yet replicated (shown in gray), consider giving the node more
objects aren't yet replicated (shown in gray), consider giving the site more
time to complete.
![Replication status](../../replication/img/geo_dashboard_v14_0.png)
......@@ -70,20 +70,20 @@ You can use the
[Geo status API](../../../../api/geo_nodes.md#retrieve-project-sync-or-verification-failures-that-occurred-on-the-current-node)
to review failed objects and the reasons for failure.
A common cause of replication failures is the data being missing on the
**primary** node - you can resolve these failures by restoring the data from backup,
**primary** site. You can resolve these failures by restoring the data from backup,
or by removing references to the missing data.
The maintenance window won't end until Geo replication and verification are
completely finished. To keep the window as short as possible, you should
ensure these processes are as close to 100% as possible during active use.
If the **secondary** node is still replicating data from the **primary** node,
If the **secondary** site is still replicating data from the **primary** site,
follow these steps to avoid unnecessary data loss:
1. Until a [read-only mode](https://gitlab.com/gitlab-org/gitlab/-/issues/14609)
is implemented, updates must be manually prevented from happening on the
**primary**. Your **secondary** node still needs read-only
access to the **primary** node during the maintenance window:
**primary**. Your **secondary** site still needs read-only
access to the **primary** site during the maintenance window:
1. At the scheduled time, using your cloud provider or your node's firewall, block
all HTTP, HTTPS and SSH traffic to/from the **primary** node, **except** for your IP and
......@@ -106,18 +106,18 @@ follow these steps to avoid unnecessary data loss:
```
From this point, users are unable to view their data or make changes on the
**primary** node. They are also unable to log in to the **secondary** node.
**primary** site. They are also unable to log in to the **secondary** site.
However, existing sessions need to work for the remainder of the maintenance period, and
so public data is accessible throughout.
1. Verify the **primary** node is blocked to HTTP traffic by visiting it in browser via
1. Verify the **primary** site is blocked to HTTP traffic by visiting it in a browser via
another IP. The server should refuse the connection.
1. Verify the **primary** node is blocked to Git over SSH traffic by attempting to pull an
1. Verify the **primary** site is blocked to Git over SSH traffic by attempting to pull an
existing Git repository with an SSH remote URL. The server should refuse
connection.
1. On the **primary** node:
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Cron**.
......@@ -135,7 +135,7 @@ follow these steps to avoid unnecessary data loss:
1. If you are manually replicating any
[data not managed by Geo](../../replication/datatypes.md#limitations-on-replicationverification),
trigger the final replication process now.
1. On the **primary** node:
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all queues except
......@@ -143,14 +143,14 @@ follow these steps to avoid unnecessary data loss:
These queues contain work that has been submitted by your users; failing over
before it is completed causes the work to be lost.
1. On the left sidebar, select **Geo > Nodes** and wait for the
following conditions to be true of the **secondary** node you are failing over to:
following conditions to be true of the **secondary** site you are failing over to:
- All replication meters reach 100% replicated, 0% failures.
- All verification meters reach 100% verified, 0% failures.
- Database replication lag is 0ms.
- The Geo log cursor is up to date (0 events behind).
1. On the **secondary** node:
1. On the **secondary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all the `geo`
......@@ -158,14 +158,14 @@ follow these steps to avoid unnecessary data loss:
1. [Run an integrity check](../../../raketasks/check.md) to verify the integrity
of CI artifacts, LFS objects, and uploads in file storage.
At this point, your **secondary** node contains an up-to-date copy of everything the
**primary** node has, meaning nothing is lost when you fail over.
At this point, your **secondary** site contains an up-to-date copy of everything the
**primary** site has, meaning nothing is lost when you fail over.
1. In this final step, you need to permanently disable the **primary** node.
1. In this final step, you need to permanently disable the **primary** site.
WARNING:
When the **primary** node goes offline, there may be data saved on the **primary** node
that has not been replicated to the **secondary** node. This data should be treated
When the **primary** site goes offline, there may be data saved on the **primary** site
that has not been replicated to the **secondary** site. This data should be treated
as lost if you proceed.
NOTE:
......@@ -174,9 +174,9 @@ follow these steps to avoid unnecessary data loss:
When performing a failover, we want to avoid a split-brain situation where
writes can occur in two different GitLab instances. So to prepare for the
failover, you must disable the **primary** node:
failover, you must disable the **primary** site:
- If you have SSH access to the **primary** node, stop and disable GitLab:
- If you have SSH access to the **primary** site, stop and disable GitLab:
```shell
sudo gitlab-ctl stop
......@@ -199,19 +199,19 @@ follow these steps to avoid unnecessary data loss:
from starting if the machine reboots as `root` with
`initctl stop gitlab-runsvdir && echo 'manual' > /etc/init/gitlab-runsvdir.override && initctl reload-configuration`.
- If you do not have SSH access to the **primary** node, take the machine offline and
- If you do not have SSH access to the **primary** site, take the machine offline and
prevent it from rebooting. Since there are many ways you may prefer to accomplish
this, we avoid a single recommendation. You may need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the **primary** DNS record to the
**secondary** node to stop using the **primary** node).
**secondary** site to stop using the **primary** site).
- Stop the virtual servers.
- Block traffic through a firewall.
- Revoke object storage permissions from the **primary** node.
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
### Promoting the **secondary** node
### Promoting the **secondary** site
Note the following when promoting a secondary:
......@@ -222,9 +222,35 @@ Note the following when promoting a secondary:
error during this process, read
[the troubleshooting advice](../../replication/troubleshooting.md#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node).
To promote the secondary node:
To promote the secondary site running GitLab 14.5 and later:
1. SSH in to your **secondary** node and login as root:
1. SSH in to your **secondary** node and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly promoted **primary** site using the URL used
previously for the **secondary** site.
If successful, the **secondary** site is now promoted to the **primary** site.
To promote the secondary site running GitLab 14.4 and earlier:
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promoted-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
1. SSH in to your **secondary** site and log in as root:
```shell
sudo -i
......@@ -275,20 +301,20 @@ To promote the secondary node:
gitlab-ctl promote-to-primary-node --skip-preflight-checks
```
You can also promote the secondary node to primary **without any further confirmation**, even when preflight checks fail:
You can also promote the secondary site to primary **without any further confirmation**, even when preflight checks fail:
```shell
sudo gitlab-ctl promote-to-primary-node --force
```
1. Verify you can connect to the newly promoted **primary** node using the URL used
previously for the **secondary** node.
1. Verify you can connect to the newly promoted **primary** site using the URL used
previously for the **secondary** site.
If successful, the **secondary** node has now been promoted to the **primary** node.
If successful, the **secondary** site is now promoted to the **primary** site.
### Next steps
To regain geographic redundancy as quickly as possible, you should
[add a new **secondary** node](../../setup/index.md). To
[add a new **secondary** site](../../setup/index.md). To
do that, you can re-add the old **primary** as a new secondary and bring it back
online.
......@@ -28,7 +28,7 @@ To disable Geo, you need to first remove all your secondary Geo sites, which mea
anymore on these sites. You can follow our docs to [remove your secondary Geo sites](remove_geo_site.md).
If the current site that you want to keep using is a secondary site, you need to first promote it to primary.
You can use our steps on [how to promote a secondary site](../disaster_recovery/#step-3-promoting-a-secondary-node)
You can use our steps on [how to promote a secondary site](../disaster_recovery/#step-3-promoting-a-secondary-site)
to do that.
## Remove the primary site from the UI
......@@ -683,7 +683,7 @@ when promoting a secondary to a primary node with strategies to resolve them.
### Message: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken
When [promoting a **secondary** node](../disaster_recovery/index.md#step-3-promoting-a-secondary-node),
When [promoting a **secondary** site](../disaster_recovery/index.md#step-3-promoting-a-secondary-site),
you might encounter the following error:
```plaintext
......@@ -751,7 +751,7 @@ This can be fixed in the database.
### Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass``
When [promoting a **secondary** node](../disaster_recovery/index.md#step-3-promoting-a-secondary-node),
When [promoting a **secondary** site](../disaster_recovery/index.md#step-3-promoting-a-secondary-site),
you might encounter the following error:
```plaintext
......@@ -767,13 +767,13 @@ Tasks: TOP => geo:set_secondary_as_primary
(See full trace by running task with --trace)
```
This command is intended to be executed on a secondary node only, and this error
is displayed if you attempt to run this command on a primary node.
This command is intended to be executed on a secondary site only, and this error
is displayed if you attempt to run this command on a primary site.
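To confirm which role a site currently has before running the task, one quick check (a sketch using the Rails runner) is:
```shell
# Prints "true" on a secondary site, "false" on a primary site
sudo gitlab-rails runner 'puts Gitlab::Geo.secondary?'
```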
### Message: `sudo: gitlab-pg-ctl: command not found`
When
[promoting a **secondary** node with multiple servers](../disaster_recovery/index.md#promoting-a-secondary-node-with-multiple-servers),
[promoting a **secondary** site with multiple nodes](../disaster_recovery/index.md#promoting-a-secondary-site-with-multiple-nodes-running-gitlab-144-and-earlier),
you need to run the `gitlab-pg-ctl` command to promote the PostgreSQL
read-replica database.