Commit 5d302102 authored by Evan Read, committed by Achilleas Pipinellis

Standardise style and nomenclature for Geo documentation

parent ab6e5b9d
@@ -4,18 +4,18 @@ NOTE: **Note:**
Automatic background verification of repositories and wikis was added in
GitLab EE 10.6 but is enabled by default only on GitLab EE 11.1. You can
disable or enable this feature manually by following
[these instructions](#disabling-or-enabling-the-automatic-background-verification).

Automatic background verification ensures that the transferred data matches a
calculated checksum. If the checksum of the data on the **primary** node matches the checksum of the
data on the **secondary** node, the data transferred successfully. Following a planned failover,
any corrupted data may be **lost**, depending on the extent of the corruption.
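
For intuition about what a matching checksum means here, the following is an
illustrative sketch only; it is not Geo's actual verification code, which runs
automatically in the background. It assumes shell access to the same bare
repository path on both nodes, and the path is a placeholder:

```sh
# On each node, hash the full list of refs (including HEAD) for one repository.
# If the two hashes match, both nodes hold the same references.
cd /path/to/repository.git   # placeholder path
git show-ref --head | sha256sum
```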

If verification fails on the **primary** node, this indicates that Geo is
successfully replicating a corrupted object; restore it from backup or remove it
from the **primary** node to resolve the issue.
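
If you need to locate corrupted repositories on the **primary** node, one option
is to run the foreground integrity checks described in the
[foreground verification][foreground-verification] Rake task documentation; for
example, on an Omnibus installation:

```sh
# Runs `git fsck` against every repository managed by this GitLab instance.
sudo gitlab-rake gitlab:git:fsck
```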

If verification succeeds on the **primary** node but fails on the **secondary** node,
this indicates that the object was corrupted during the replication process.
Geo actively tries to correct verification failures by marking the repository to
be resynced with a backoff period. If you want to reset the verification for

@@ -65,7 +65,7 @@ in grey, and failures in red.

![Verification status](img/verification-status-primary.png)

Navigate to the **Admin Area > Geo** dashboard on the **secondary** node and expand
the **Verification information** tab for that node to view automatic verification
status for repositories and wikis. As with checksumming, successes are shown in
green, pending work in grey, and failures in red.

@@ -73,7 +73,7 @@ green, pending work in grey, and failures in red.

## Using checksums to compare Geo nodes

To check the health of Geo **secondary** nodes, we use a checksum over the list of
Git references and their values. The checksum includes `HEAD`, `heads`, `tags`,
`notes`, and GitLab-specific references to ensure true consistency. If two nodes
have the same checksum, then they definitely hold the same references. We compute

@@ -129,33 +129,33 @@ be resynced with a backoff period. If you want to reset them manually, this
rake task marks projects where verification has failed or the checksum does not match
to be resynced without the backoff period:

For repositories:

- Omnibus Installation

  ```sh
  sudo gitlab-rake geo:verification:repository:reset
  ```

- Source Installation

  ```sh
  sudo -u git -H bundle exec rake geo:verification:repository:reset RAILS_ENV=production
  ```

For wikis:

- Omnibus Installation

  ```sh
  sudo gitlab-rake geo:verification:wiki:reset
  ```

- Source Installation

  ```sh
  sudo -u git -H bundle exec rake geo:verification:wiki:reset RAILS_ENV=production
  ```

## Current limitations

@@ -167,7 +167,6 @@ on both nodes, and comparing the output between them.

Data in object storage is **not verified**, as the object store is responsible
for ensuring the integrity of the data.

[reset-verification]: background_verification.md#reset-verification-for-projects-where-verification-has-failed
[foreground-verification]: ../../raketasks/check.md
[ee-5064]: https://gitlab.com/gitlab-org/gitlab-ee/issues/5064

# Bring a demoted primary node back online

After a failover, it is possible to fail back to the demoted **primary** node to
restore your original configuration. This process consists of two steps:

1. Making the old **primary** node a **secondary** node.
1. Promoting a **secondary** node to a **primary** node.

CAUTION: **Caution:**
If you have any doubts about the consistency of the data on this node, we recommend setting it up from scratch.

## Configure the former **primary** node to be a **secondary** node

Since the former **primary** node will be out of sync with the current **primary** node,
the first step is to bring the former **primary** node up to date. Note that deletion of
data stored on disk, such as repositories and uploads, will not be replayed when bringing
the former **primary** node back into sync, which may result in increased disk usage.

Alternatively, you can [set up a new **secondary** GitLab instance][setup-geo] to avoid this.

To bring the former **primary** node up to date:

1. SSH into the former **primary** node that has fallen behind.
1. Make sure all the services are up:

   ```sh
   sudo gitlab-ctl start
   ```

> **Note 1:** If you [disabled the **primary** node permanently][disaster-recovery-disable-primary],
> you need to undo those steps now. For Debian/Ubuntu you just need to run
> `sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install
> the GitLab instance from scratch and set it up as a **secondary** node by
> following [Setup instructions][setup-geo]. In this case, you don't need to follow the next step.
>
> **Note 2:** If you [changed the DNS records](index.md#step-4-optional-updating-the-primary-domains-dns-record)
> for this node during the disaster recovery procedure you may need to [block

@@ -37,25 +37,25 @@ To bring the former primary up to date:
> during this procedure.

1. [Set up database replication][database-replication]. Note that in this
   case, **primary** node refers to the current **primary** node, and **secondary** node refers to the
   former **primary** node.

If you have lost your original **primary** node, follow the
[setup instructions][setup-geo] to set up a new **secondary** node.

## Promote the **secondary** node to **primary** node

When the initial replication is complete and the **primary** node and **secondary** node are
closely in sync, you can do a [planned failover].

## Restore the **secondary** node

If your objective is to have two nodes again, you need to bring your **secondary**
node back online as well by repeating the first step
([configure the former **primary** node to be a **secondary** node](#configure-the-former-primary-node-to-be-a-secondary-node))
for the **secondary** node.

[setup-geo]: ../replication/index.md#setup-instructions
[database-replication]: ../replication/database.md
[disaster-recovery-disable-primary]: index.md#step-2-permanently-disable-the-primary-node
[planned failover]: planned_failover.md

@@ -6,37 +6,38 @@ Please consider [migrating to GitLab Omnibus install](https://docs.gitlab.com/om
using the Omnibus GitLab packages, follow the
[**Omnibus Geo nodes configuration**][configuration] guide.

## Configuring a new **secondary** node

NOTE: **Note:**
This is the final step in setting up a **secondary** node. Stages of the setup
process must be completed in the documented order. Before attempting the steps
in this stage, [complete all prior stages][setup-geo-source].

The basic steps of configuring a **secondary** node are to:

- Replicate required configurations between the **primary** and **secondary** nodes.
- Configure a tracking database on each **secondary** node.
- Start GitLab on the **secondary** node.

You are encouraged to first read through all the steps before executing them
in your testing/production environment.

NOTE: **Note:**
**Do not** set up any custom authentication on **secondary** nodes; this will be handled by the **primary** node.

NOTE: **Note:**
**Do not** add anything in the **secondary** node's admin area (**Admin Area > Geo**). This is handled solely by the **primary** node.

### Step 1. Manually replicate secret GitLab values

GitLab stores a number of secret values in the `/home/git/gitlab/config/secrets.yml`
file which *must* match between the **primary** and **secondary** nodes. Until there is
a means of automatically replicating these between nodes (see [gitlab-org/gitlab-ee#3789]), they must
be manually replicated to **secondary** nodes.

1. SSH into the **primary** node, and execute the command below:

   ```sh
   sudo cat /home/git/gitlab/config/secrets.yml
   ```

@@ -44,20 +45,20 @@ be manually replicated to the secondary.

1. SSH into the **secondary** node and log in as the `git` user:

   ```sh
   sudo -i -u git
   ```

1. Make a backup of any existing secrets:

   ```sh
   mv /home/git/gitlab/config/secrets.yml /home/git/gitlab/config/secrets.yml.`date +%F`
   ```

1. Copy `/home/git/gitlab/config/secrets.yml` from the **primary** node to the **secondary** node, or
   copy-and-paste the file contents between nodes:

   ```sh
   sudo editor /home/git/gitlab/config/secrets.yml
   # paste the output of the `cat` command you ran on the primary
   ```

@@ -66,65 +67,65 @@ be manually replicated to the secondary.

1. Ensure the file permissions are correct:

   ```sh
   chown git:git /home/git/gitlab/config/secrets.yml
   chmod 0600 /home/git/gitlab/config/secrets.yml
   ```

1. Restart GitLab:

   ```sh
   service gitlab restart
   ```

Once restarted, the **secondary** node will automatically start replicating missing data
from the **primary** node in a process known as backfill. Meanwhile, the **primary** node
will start to notify the **secondary** node of any changes, so that the **secondary** node can
act on those notifications immediately.

Make sure the **secondary** node is running and accessible. You can log in to
the **secondary** node with the same credentials as used for the **primary** node.
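
One quick way to confirm the **secondary** node is reachable is a plain HTTP check
from any machine that should have access. This is only an illustrative sketch;
the hostname is a placeholder and the sign-in page is simply a convenient
unauthenticated endpoint:

```sh
# Expect an HTTP 200 response from the secondary node's sign-in page.
curl -I https://secondary.example.com/users/sign_in
```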

### Step 2. Manually replicate the **primary** node's SSH host keys

Read [Manually replicate the **primary** node's SSH host keys][configuration-replicate-ssh].

### Step 3. Add the **secondary** GitLab node

1. Navigate to the **primary** node's **Admin Area > Geo**
   (`/admin/geo/nodes`) in your browser.
1. Add the **secondary** node by providing its full URL. **Do NOT** check the
   **This is a primary node** checkbox.
1. Optionally, choose which namespaces should be replicated by the
   **secondary** node. Leave blank to replicate all. Read more in
   [selective synchronization](#selective-synchronization).
1. Click the **Add node** button.
1. SSH into your GitLab **secondary** server and restart the services:

   ```sh
   service gitlab restart
   ```

   Check if there are any common issues with your Geo setup by running:

   ```sh
   bundle exec rake gitlab:geo:check
   ```
1. SSH into your GitLab **primary** server and log in as root to verify that the
   **secondary** node is reachable and that there are no common issues with your Geo setup:

   ```sh
   bundle exec rake gitlab:geo:check
   ```

Once reconfigured, the **secondary** node will automatically start
replicating missing data from the **primary** node in a process known as backfill.
Meanwhile, the **primary** node will start to notify the **secondary** node of any changes, so
that the **secondary** node can act on those notifications immediately.

Make sure the **secondary** node is running and accessible.
You can log in to the **secondary** node with the same credentials as used for the **primary** node.

### Step 4. Enabling Hashed Storage

@@ -132,15 +133,15 @@ Read [Enabling Hashed Storage](configuration.md#step-4-enabling-hashed-storage)

### Step 5. (Optional) Configuring the secondary to trust the primary

You can safely skip this step if your **primary** node uses a CA-issued HTTPS certificate.

If your **primary** node is using a self-signed certificate for *HTTPS* support, you will
need to add that certificate to the **secondary** node's trust store. Retrieve the
certificate from the **primary** node and follow your distribution's instructions for
adding it to the **secondary** node's trust store. In Debian/Ubuntu, for example, with a
certificate file of `primary.geo.example.com.crt`, you would follow these steps:

```sh
sudo -i
cp primary.geo.example.com.crt /usr/local/share/ca-certificates
update-ca-certificates
```

@@ -150,7 +151,7 @@ update-ca-certificates

Geo synchronizes repositories over HTTP/HTTPS, and therefore requires this clone
method to be enabled. Navigate to **Admin Area > Settings**
(`/admin/application_settings`) on the **primary** node, and set
`Enabled Git access protocols` to `Both SSH and HTTP(S)` or `Only HTTP(S)`.
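
As a quick sanity check after changing this setting, an HTTPS clone of any small
project from the **primary** node should succeed from a machine that can reach it.
The URL below is a placeholder, not a real project:

```sh
# An HTTPS clone exercises the same transport Geo uses to sync repositories.
git clone https://gitlab.example.com/group/project.git
```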

### Step 7. Verify proper functioning of the secondary node
......

# Docker Registry for a secondary node

You can set up a [Docker Registry] on your
**secondary** Geo node that mirrors the one on the **primary** Geo node.

## Storage support

CAUTION: **Warning:**
If you use [local storage][registry-storage]
for the Container Registry, you **cannot** replicate it to a **secondary** node.

Docker Registry currently supports a few types of storage. If you choose a
distributed storage (`azure`, `gcs`, `s3`, `swift`, or `oss`) for your Docker
Registry on the **primary** node, you can use the same storage for a **secondary**
Docker Registry as well. For more information, read the
[Load balancing considerations][registry-load-balancing]
when deploying the Registry, and how to set up the storage driver for GitLab's
......

@@ -7,22 +7,22 @@ The requirements are listed [on the index page](index.md#requirements-for-runnin

## Can I use Geo in a disaster recovery situation?

Yes, but there are limitations to what we replicate (see
[What data is replicated to a **secondary** node?](#what-data-is-replicated-to-a-secondary-node)).

Read the documentation for [Disaster Recovery](../disaster_recovery/index.md).

## What data is replicated to a **secondary** node?

We currently replicate project repositories, LFS objects, generated
attachments / avatars and the whole database. This means user accounts,
issues, merge requests, groups, project data, etc., will be available for
query.

## Can I git push to a **secondary** node?

Yes! Pushing directly to a **secondary** node (for both HTTP and SSH, including git-lfs) was [introduced](https://about.gitlab.com/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.

## How long does it take to have a commit replicated to a **secondary** node?

All replication operations are asynchronous and are queued to be dispatched. Therefore, it depends on a lot of
factors, including the amount of traffic, how big your commit is, the

@@ -30,8 +30,8 @@ connectivity between your nodes, your hardware, etc.

## What if the SSH server runs at a different port?

That's totally fine. We use HTTP(s) to fetch repository changes from the **primary** node to all **secondary** nodes.

## Is it possible to set up a Docker Registry for a **secondary** node that mirrors the one on the **primary** node?

Yes. See [Docker Registry for a **secondary** node](docker_registry.md).

@@ -10,7 +10,7 @@ described, it is possible to adapt these instructions to your needs.

_[diagram source - gitlab employees only][diagram-source]_

The topology above assumes that the **primary** and **secondary** Geo clusters
are located in two separate locations, on their own virtual network
with private IP addresses. The network is configured such that all machines within
one geographic location can communicate with each other using their private IP addresses.

@@ -20,14 +20,14 @@ network topology of your deployment.

The only external way to access the two Geo deployments is by HTTPS at
`gitlab.us.example.com` and `gitlab.eu.example.com` in the example above.

NOTE: **Note:**
The **primary** and **secondary** Geo deployments must be able to communicate to each other over HTTPS.

## Redis and PostgreSQL High Availability

The **primary** and **secondary** Redis and PostgreSQL should be configured
for high availability. Because of the additional complexity involved
in setting up this configuration for PostgreSQL and Redis,
it is not covered by this Geo HA documentation.

For more information about setting up a highly available PostgreSQL cluster and Redis cluster using the omnibus package, see the high availability documentation for

@@ -35,7 +35,7 @@ For more information about setting up a highly available PostgreSQL cluster and
[Redis](../../high_availability/redis.md), respectively.

NOTE: **Note:**
It is possible to use cloud-hosted services for PostgreSQL and Redis, but this is beyond the scope of this document.

## Prerequisites: A working GitLab HA cluster

@@ -189,7 +189,6 @@ following modifications:

```
registry['uid'] = 9002
registry['gid'] = 9002
```

NOTE: **Note:**
If you had set up a PostgreSQL cluster using the omnibus package and you had set
the `postgresql['sql_user_password'] = 'md5 digest of secret'` setting, keep in
......

@@ -30,7 +30,7 @@ Implementing Geo provides the following benefits:

- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the load between your **primary** and **secondary** nodes, or offload your automated tests to a **secondary** node.

In addition, it:

@@ -42,8 +42,8 @@ In addition, it:

Geo provides:

- Read-only **secondary** nodes: Maintain one **primary** GitLab node while still enabling read-only **secondary** nodes for each of your distributed teams.
- Authentication system hooks: **Secondary** nodes receive all authentication data (like user accounts and logins) from the **primary** instance.
- An intuitive UI: **Secondary** nodes utilize the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** node.

## How it works

@@ -75,12 +75,12 @@ The following diagram illustrates the underlying architecture of Geo.

In this diagram:

- There is the **primary** node and the details of one **secondary** node.
- Writes to the database can only be performed on the **primary** node. A **secondary** node receives database
  updates via PostgreSQL streaming replication.
- If present, the [LDAP server](#ldap) should be configured to replicate for [Disaster Recovery](../disaster_recovery/index.md) scenarios.
- A **secondary** node performs different types of synchronization against the **primary** node, using a special
  authorization protected by JWT:
  - Repositories are cloned/updated via Git over HTTPS.
  - Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint.

From the perspective of a user performing Git operations:

@@ -89,6 +89,7 @@ From the perspective of a user performing Git operations:

- **Secondary** nodes are read-only but proxy Git push operations to the **primary** node. This makes **secondary** nodes appear to support push operations themselves.

To simplify the diagram, some necessary components are omitted. Note that:

- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over HTTPS requires [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).

@@ -97,7 +98,7 @@ Note that a **secondary** node needs two different PostgreSQL databases:

- A read-only database instance that streams data from the main GitLab database.
- [Another database instance](#geo-tracking-database) used internally by the **secondary** node to record what data has been replicated.

In **secondary** nodes, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).

## Requirements for running Geo

@@ -115,12 +116,12 @@ The following are required to run Geo:

The following table lists basic ports that must be open between the **primary** and **secondary** nodes for Geo.

| **Primary** node | **Secondary** node | Protocol     |
|:-----------------|:-------------------|:-------------|
| 80               | 80                 | HTTP         |
| 443              | 443                | TCP or HTTPS |
| 22               | 22                 | TCP          |
| 5432             |                    | PostgreSQL   |

See the full list of ports used by GitLab in [Package defaults](https://docs.gitlab.com/omnibus/package-information/defaults.html)
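
To confirm these ports are actually reachable between your nodes, a simple TCP
check from the **secondary** node toward the **primary** node can help. This is an
illustrative sketch only; the hostname is a placeholder and `nc` may need to be
installed separately:

```sh
# Check HTTP, HTTPS, SSH, and PostgreSQL connectivity to the primary node.
for port in 80 443 22 5432; do
  nc -zv -w 5 primary.example.com "$port"
done
```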

@@ -134,9 +135,12 @@ If you wish to terminate SSL at the GitLab application server instead, use TCP p

### LDAP

We recommend that if you use LDAP on your **primary** node, you also set up secondary LDAP servers on each **secondary** node. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.

NOTE: **Note:**
It is possible for all **secondary** nodes to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](../disaster_recovery/index.md) scenario if a **secondary** node is promoted to be a **primary** node.

Check for instructions on how to set up replication in your LDAP service. Instructions will be different depending on the software or service used. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html).

### Geo Tracking Database

@@ -153,7 +157,7 @@ The tracking database requires the `postgres_fdw` extension.

This daemon:

- Reads a log of events replicated by the **primary** node to the **secondary** database instance.
- Updates the Geo Tracking Database instance with changes that need to be executed.

When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** node will execute the required operations and update the state.
......

@@ -6,9 +6,9 @@ other compatible object storage).

## Configuration

At this time, it is required that if object storage is enabled on the
**primary** node, it must also be enabled on each **secondary** node.

**Secondary** nodes can use the same storage bucket as the **primary** node, or
they can use a replicated storage bucket. At this time, GitLab does not
take care of content replication in object storage.

@@ -22,15 +22,15 @@ For user uploads, there is similar documentation to configure [upload object sto

You should enable and configure object storage on both **primary** and **secondary**
nodes. Migrating existing data to object storage should be performed on the
**primary** node only. **Secondary** nodes will automatically notice that the migrated
files are now in object storage.
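
For example, LFS objects that already exist on local disk can be moved to object
storage with the standard migration Rake task. On an Omnibus installation this
looks like the following; other data types have their own migration tasks, which
are documented separately:

```sh
# Run on the primary node only; secondary nodes pick up the new locations
# through replication.
sudo gitlab-rake gitlab:lfs:migrate
```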

## Replication

When using Amazon S3, you can use
[CRR](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) to
have automatic replication between the bucket used by the **primary** node and
the bucket used by **secondary** nodes.
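
As a rough illustration of what enabling CRR involves (this is AWS tooling, not
GitLab-specific; the bucket names are placeholders and `replication.json` is a
file you would write by following the AWS CRR documentation):

```sh
# CRR requires versioning on both buckets before a replication rule is applied.
aws s3api put-bucket-versioning --bucket gitlab-primary-storage \
  --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket gitlab-secondary-storage \
  --versioning-configuration Status=Enabled

# replication.json defines the destination bucket and the IAM role S3 assumes.
aws s3api put-bucket-replication --bucket gitlab-primary-storage \
  --replication-configuration file://replication.json
```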

If you are using Google Cloud Storage, consider using
[Multi-Regional Storage](https://cloud.google.com/storage/docs/storage-classes#multi-regional).
......

@@ -2,14 +2,14 @@

## Changing the sync capacity values

In the Geo admin page (`/admin/geo/nodes`), there are several variables that
can be tuned to improve the performance of Geo:

- Repository sync capacity.
- File sync capacity.

Increasing these values will increase the number of jobs that are scheduled.
However, this may not lead to more downloads in parallel unless the number of
available Sidekiq threads is also increased. For example, if repository sync
capacity is increased from 25 to 50, you may also want to increase the number
of Sidekiq threads from 25 to 50. See the [Sidekiq concurrency
......

@@ -3,7 +3,7 @@

For more information about setting up GitLab Geo, read the
[Geo documentation](../../gitlab-geo/README.md).

When you're done, you can navigate to **Admin area > Geo** (`/admin/geo/nodes`).

## Common settings
......