Commit 004ff6f5 authored by Fabian Zimmer's avatar Fabian Zimmer Committed by Achilleas Pipinellis

Moved DR troubleshooting section

We have two troubleshooting sections.
This merges the two.
parent eacdaa70
......@@ -100,7 +100,7 @@ Note the following when promoting a secondary:
- If replication was paused on the secondary node, for example as a part of upgrading,
while you were running a version of GitLab lower than 13.4, you _must_
[enable the node via the database](#while-promoting-the-secondary-i-got-an-error-activerecordrecordinvalid)
[enable the node via the database](../replication/troubleshooting.md#while-promoting-the-secondary-i-got-an-error-activerecordrecordinvalid)
before proceeding.
- A new **secondary** should not be added at this time. If you want to add a new
**secondary**, do this after you have completed the entire process of promoting
......@@ -421,33 +421,4 @@ for another **primary** node. All the old replication settings will be overwritt
## Troubleshooting
### I followed the disaster recovery instructions and now two-factor auth is broken
The setup instructions for Geo prior to 10.5 failed to replicate the
`otp_key_base` secret, which is used to encrypt the two-factor authentication
secrets stored in the database. If it differs between **primary** and **secondary**
nodes, users with two-factor authentication enabled won't be able to log in
after a failover.
If you still have access to the old **primary** node, you can follow the
instructions in the
[Upgrading to GitLab 10.5](../replication/version_specific_updates.md#updating-to-gitlab-105)
section to resolve the error. Otherwise, the secret is lost and you'll need to
[reset two-factor authentication for all users](../../../security/two_factor_authentication.md#disabling-2fa-for-everyone).
### While Promoting the secondary, I got an error `ActiveRecord::RecordInvalid`
If you disabled a secondary node, either with the [replication pause task](../index.md#pausing-and-resuming-replication)
(13.2) or via the UI (13.1 and earlier), you must first re-enable the
node before you can continue. This is fixed in 13.4.
From `gitlab-psql`, execute the following, replacing `<your secondary url>`
with the URL for your secondary server starting with `http` or `https` and ending with a `/`.
```shell
SECONDARY_URL="https://<secondary url>/"
DATABASE_NAME="gitlabhq_production"
sudo gitlab-psql -d "$DATABASE_NAME" -c "UPDATE geo_nodes SET enabled = true WHERE url = '$SECONDARY_URL';"
```
This should update 1 row.
This section was moved to [another location](../replication/troubleshooting.md#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node).
......@@ -632,6 +632,23 @@ To double check this, you can do the following:
UPDATE geo_nodes SET enabled = 't' WHERE id = ID_FROM_ABOVE;
```
### While Promoting the secondary, I got an error `ActiveRecord::RecordInvalid`
If you disabled a secondary node, either with the [replication pause task](../index.md#pausing-and-resuming-replication)
(13.2) or via the UI (13.1 and earlier), you must first re-enable the
node before you can continue. This is fixed in 13.4.
From `gitlab-psql`, execute the following, replacing `<your secondary url>`
with the URL for your secondary server starting with `http` or `https` and ending with a `/`.
```shell
SECONDARY_URL="https://<secondary url>/"
DATABASE_NAME="gitlabhq_production"
sudo gitlab-psql -d "$DATABASE_NAME" -c "UPDATE geo_nodes SET enabled = true WHERE url = '$SECONDARY_URL';"
```
This should update 1 row.
### Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass``
When [promoting a **secondary** node](../disaster_recovery/index.md#step-3-promoting-a-secondary-node),
......@@ -674,6 +691,20 @@ sudo /opt/gitlab/embedded/bin/gitlab-pg-ctl promote
GitLab 12.9 and later are [unaffected by this error](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5147).
### Two-factor authentication is broken after a failover
The setup instructions for Geo prior to 10.5 failed to replicate the
`otp_key_base` secret, which is used to encrypt the two-factor authentication
secrets stored in the database. If it differs between **primary** and **secondary**
nodes, users with two-factor authentication enabled won't be able to log in
after a failover.
If you still have access to the old **primary** node, you can follow the
instructions in the
[Upgrading to GitLab 10.5](../replication/version_specific_updates.md#updating-to-gitlab-105)
section to resolve the error. Otherwise, the secret is lost and you'll need to
[reset two-factor authentication for all users](../../../security/two_factor_authentication.md#disabling-2fa-for-everyone).
## Expired artifacts
If you notice for some reason there are more artifacts on the Geo
......
......@@ -314,7 +314,7 @@ sudo gitlab-ctl reconfigure
```
If you do not perform this step, you may find that two-factor authentication
[is broken following DR](../disaster_recovery/index.md#i-followed-the-disaster-recovery-instructions-and-now-two-factor-auth-is-broken).
[is broken following DR](troubleshooting.md#two-factor-authentication-is-broken-after-a-failover).
To prevent SSH requests to the newly promoted **primary** node from failing
due to SSH host key mismatch when updating the **primary** node domain's DNS record
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment