Commit 402a2afe authored by Fabian Zimmer, committed by Achilleas Pipinellis

Replaced node with site to align with our defined terms on setup page

parent 32474423
...@@ -22,12 +22,14 @@ Geo undergoes significant changes from release to release. Upgrades **are** supp ...@@ -22,12 +22,14 @@ Geo undergoes significant changes from release to release. Upgrades **are** supp
Fetching large repositories can take a long time for teams located far from a single GitLab instance. Fetching large repositories can take a long time for teams located far from a single GitLab instance.
Geo provides local, read-only instances of your GitLab instances. This can reduce the time it takes Geo provides local, read-only sites of your GitLab instances. This can reduce the time it takes
to clone and fetch large repositories, speeding up development. to clone and fetch large repositories, speeding up development.
For a video introduction to Geo, see [Introduction to GitLab Geo - GitLab Features](https://www.youtube.com/watch?v=-HDLxSjEh6w). For a video introduction to Geo, see [Introduction to GitLab Geo - GitLab Features](https://www.youtube.com/watch?v=-HDLxSjEh6w).
To make sure you're using the right version of the documentation, navigate to [the Geo page on GitLab.com](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/geo/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v11.2.3-ee`](https://gitlab.com/gitlab-org/gitlab/blob/v11.2.3-ee/doc/administration/geo/index.md). To make sure you're using the right version of the documentation, navigate to [the Geo page on GitLab.com](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/geo/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v13.7.6-ee`](https://gitlab.com/gitlab-org/gitlab/-/blob/v13.7.6-ee/doc/administration/geo/index.md).
Geo uses a set of defined terms that are described in the [Geo Glossary](glossary.md). Please familiarize yourself with those terms.
## Use cases ## Use cases
...@@ -35,21 +37,21 @@ Implementing Geo provides the following benefits: ...@@ -35,21 +37,21 @@ Implementing Geo provides the following benefits:
- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects. - Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are. - Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the read-only load between your **primary** and **secondary** nodes. - Balance the read-only load between your **primary** and **secondary** sites.
In addition, it: In addition, it:
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see [limitations](#limitations)). - Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see [limitations](#limitations)).
- Overcomes slow connections between distant offices, saving time by improving speed for distributed teams. - Overcomes slow connections between distant offices, saving time by improving speed for distributed teams.
- Helps reducing the loading time for automated tasks, custom integrations, and internal workflows. - Helps reducing the loading time for automated tasks, custom integrations, and internal workflows.
- Can quickly fail over to a **secondary** node in a [disaster recovery](disaster_recovery/index.md) scenario. - Can quickly fail over to a **secondary** site in a [disaster recovery](disaster_recovery/index.md) scenario.
- Allows [planned failover](disaster_recovery/planned_failover.md) to a **secondary** node. - Allows [planned failover](disaster_recovery/planned_failover.md) to a **secondary** site.
Geo provides: Geo provides:
- Read-only **secondary** nodes: Maintain one **primary** GitLab node while still enabling read-only **secondary** nodes for each of your distributed teams. - Read-only **secondary** sites: Maintain one **primary** GitLab site while still enabling read-only **secondary** sites for each of your distributed teams.
- Authentication system hooks: **Secondary** nodes receives all authentication data (like user accounts and logins) from the **primary** instance. - Authentication system hooks: **Secondary** sites receive all authentication data (like user accounts and logins) from the **primary** instance.
- An intuitive UI: **Secondary** nodes use the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** node. - An intuitive UI: **Secondary** sites use the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** site.
### Gitaly Cluster ### Gitaly Cluster
...@@ -64,16 +66,16 @@ Your Geo instance can be used for cloning and fetching projects, in addition to ...@@ -64,16 +66,16 @@ Your Geo instance can be used for cloning and fetching projects, in addition to
When Geo is enabled, the: When Geo is enabled, the:
- Original instance is known as the **primary** node. - Original instance is known as the **primary** site.
- Replicated read-only nodes are known as **secondary** nodes. - Replicated read-only sites are known as **secondary** sites.
Keep in mind that: Keep in mind that:
- **Secondary** nodes talk to the **primary** node to: - **Secondary** sites talk to the **primary** site to:
- Get user data for logins (API). - Get user data for logins (API).
- Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT). - Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
- In GitLab Premium 10.0 and later, the **primary** node no longer talks to **secondary** nodes to notify for changes (API). - In GitLab Premium 10.0 and later, the **primary** site no longer talks to **secondary** sites to notify for changes (API).
- Pushing directly to a **secondary** node (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3. - Pushing directly to a **secondary** site (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
- There are [limitations](#limitations) when using Geo. - There are [limitations](#limitations) when using Geo.
### Architecture ### Architecture
...@@ -84,31 +86,31 @@ The following diagram illustrates the underlying architecture of Geo. ...@@ -84,31 +86,31 @@ The following diagram illustrates the underlying architecture of Geo.
In this diagram: In this diagram:
- There is the **primary** node and the details of one **secondary** node. - There is the **primary** site and the details of one **secondary** site.
- Writes to the database can only be performed on the **primary** node. A **secondary** node receives database - Writes to the database can only be performed on the **primary** site. A **secondary** site receives database
updates via PostgreSQL streaming replication. updates via PostgreSQL streaming replication.
- If present, the [LDAP server](#ldap) should be configured to replicate for [Disaster Recovery](disaster_recovery/index.md) scenarios. - If present, the [LDAP server](#ldap) should be configured to replicate for [Disaster Recovery](disaster_recovery/index.md) scenarios.
- A **secondary** node performs different type of synchronizations against the **primary** node, using a special - A **secondary** site performs different type of synchronizations against the **primary** site, using a special
authorization protected by JWT: authorization protected by JWT:
- Repositories are cloned/updated via Git over HTTPS. - Repositories are cloned/updated via Git over HTTPS.
- Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint. - Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint.
From the perspective of a user performing Git operations: From the perspective of a user performing Git operations:
- The **primary** node behaves as a full read-write GitLab instance. - The **primary** site behaves as a full read-write GitLab instance.
- **Secondary** nodes are read-only but proxy Git push operations to the **primary** node. This makes **secondary** nodes appear to support push operations themselves. - **Secondary** sites are read-only but proxy Git push operations to the **primary** site. This makes **secondary** sites appear to support push operations themselves.
To simplify the diagram, some necessary components are omitted. Note that: To simplify the diagram, some necessary components are omitted. Note that:
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH. - Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over HTTPS required [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse). - Git over HTTPS required [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).
Note that a **secondary** node needs two different PostgreSQL databases: Note that a **secondary** site needs two different PostgreSQL databases:
- A read-only database instance that streams data from the main GitLab database. - A read-only database instance that streams data from the main GitLab database.
- [Another database instance](#geo-tracking-database) used internally by the **secondary** node to record what data has been replicated. - [Another database instance](#geo-tracking-database) used internally by the **secondary** site to record what data has been replicated.
In **secondary** nodes, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor). In **secondary** sites, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).
## Requirements for running Geo ## Requirements for running Geo
...@@ -122,7 +124,7 @@ The following are required to run Geo: ...@@ -122,7 +124,7 @@ The following are required to run Geo:
- PostgreSQL 11+ with [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication) - PostgreSQL 11+ with [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication)
- Git 2.9+ - Git 2.9+
- Git-lfs 2.4.2+ on the user side when using LFS - Git-lfs 2.4.2+ on the user side when using LFS
- All nodes must run the same GitLab version. - All sites must run the same GitLab version.
Additionally, check the GitLab [minimum requirements](../../install/requirements.md), Additionally, check the GitLab [minimum requirements](../../install/requirements.md),
and we recommend you use: and we recommend you use:
...@@ -132,9 +134,9 @@ and we recommend you use: ...@@ -132,9 +134,9 @@ and we recommend you use:
### Firewall rules ### Firewall rules
The following table lists basic ports that must be open between the **primary** and **secondary** nodes for Geo. The following table lists basic ports that must be open between the **primary** and **secondary** sites for Geo.
| **Primary** node | **Secondary** node | Protocol | | **Primary** site | **Secondary** site | Protocol |
|:-----------------|:-------------------|:-------------| |:-----------------|:-------------------|:-------------|
| 80 | 80 | HTTP | | 80 | 80 | HTTP |
| 443 | 443 | TCP or HTTPS | | 443 | 443 | TCP or HTTPS |
...@@ -153,10 +155,10 @@ If you wish to terminate SSL at the GitLab application server instead, use TCP p ...@@ -153,10 +155,10 @@ If you wish to terminate SSL at the GitLab application server instead, use TCP p
### LDAP ### LDAP
We recommend that if you use LDAP on your **primary** node, you also set up secondary LDAP servers on each **secondary** node. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work. We recommend that if you use LDAP on your **primary** site, you also set up secondary LDAP servers on each **secondary** site. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** site using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
NOTE: NOTE:
It is possible for all **secondary** nodes to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](disaster_recovery/index.md) scenario if a **secondary** node is promoted to be a **primary** node. It is possible for all **secondary** sites to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](disaster_recovery/index.md) scenario if a **secondary** site is promoted to be a **primary** site.
Check for instructions on how to set up replication in your LDAP service. Instructions will be different depending on the software or service used. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html). Check for instructions on how to set up replication in your LDAP service. Instructions will be different depending on the software or service used. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html).
...@@ -168,32 +170,32 @@ The tracking database instance is used as metadata to control what needs to be u ...@@ -168,32 +170,32 @@ The tracking database instance is used as metadata to control what needs to be u
- Fetch new LFS Objects. - Fetch new LFS Objects.
- Fetch changes from a repository that has recently been updated. - Fetch changes from a repository that has recently been updated.
Because the replicated database instance is read-only, we need this additional database instance for each **secondary** node. Because the replicated database instance is read-only, we need this additional database instance for each **secondary** site.
### Geo Log Cursor ### Geo Log Cursor
This daemon: This daemon:
- Reads a log of events replicated by the **primary** node to the **secondary** database instance. - Reads a log of events replicated by the **primary** site to the **secondary** database instance.
- Updates the Geo Tracking Database instance with changes that need to be executed. - Updates the Geo Tracking Database instance with changes that need to be executed.
When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** node will execute the required operations and update the state. When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** site will execute the required operations and update the state.
This new architecture allows GitLab to be resilient to connectivity issues between the nodes. It doesn't matter how long the **secondary** node is disconnected from the **primary** node as it will be able to replay all the events in the correct order and become synchronized with the **primary** node again. This new architecture allows GitLab to be resilient to connectivity issues between the sites. It doesn't matter how long the **secondary** site is disconnected from the **primary** site as it will be able to replay all the events in the correct order and become synchronized with the **primary** site again.
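The replay behavior described above can be illustrated with a minimal Python sketch. It is not Geo's implementation: the `Event` and `LogCursor` names, the `pending` list standing in for the tracking database, and the event labels are all hypothetical. It only shows the general idea of a cursor that remembers the last processed event ID and, after any period of disconnection, marks the unseen events as pending in log order.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    id: int      # monotonically increasing position in the event log
    kind: str    # hypothetical label, e.g. "repository_updated"
    target: str  # what needs to be re-synced


@dataclass
class LogCursor:
    last_processed_id: int = 0
    # Stand-in for the tracking database: work marked as needing an update.
    pending: list = field(default_factory=list)

    def catch_up(self, event_log):
        """Mark every unseen event as pending, in log order."""
        for event in sorted(event_log, key=lambda e: e.id):
            if event.id <= self.last_processed_id:
                continue  # already handled before the disconnect
            self.pending.append((event.kind, event.target))
            self.last_processed_id = event.id
        return self.last_processed_id
```

Because the cursor stores only the ID of the last processed event, calling `catch_up` again after a long outage picks up exactly where it stopped, without reprocessing earlier events.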
## Limitations ## Limitations
WARNING: WARNING:
This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place. This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place.
- Pushing directly to a **secondary** node redirects (for HTTP) or proxies (for SSH) the request to the **primary** node instead of [handling it directly](https://gitlab.com/gitlab-org/gitlab/-/issues/1381), except when using Git over HTTP with credentials embedded within the URI. For example, `https://user:password@secondary.tld`. - Pushing directly to a **secondary** site redirects (for HTTP) or proxies (for SSH) the request to the **primary** site instead of [handling it directly](https://gitlab.com/gitlab-org/gitlab/-/issues/1381), except when using Git over HTTP with credentials embedded within the URI. For example, `https://user:password@secondary.tld`.
- The **primary** node has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the **secondary** node to use an OAuth provider independent from the primary is [being planned](https://gitlab.com/gitlab-org/gitlab/-/issues/208465). - The **primary** site has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the **secondary** site to use an OAuth provider independent from the primary is [being planned](https://gitlab.com/gitlab-org/gitlab/-/issues/208465).
- The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [Omnibus GitLab issue #2978](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/2978) for details. - The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [Omnibus GitLab issue #2978](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/2978) for details.
- Real-time updates of issues/merge requests (for example, via long polling) doesn't work on the **secondary** node. - Real-time updates of issues/merge requests (for example, via long polling) doesn't work on the **secondary** site.
- [Selective synchronization](replication/configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the **secondary** node in full, making it inappropriate for use as an access control mechanism. - [Selective synchronization](replication/configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the **secondary** site in full, making it inappropriate for use as an access control mechanism.
- Object pools for forked project deduplication work only on the **primary** node, and are duplicated on the **secondary** node. - Object pools for forked project deduplication work only on the **primary** site, and are duplicated on the **secondary** site.
- GitLab Runners cannot register with a **secondary** node. Support for this is [planned for the future](https://gitlab.com/gitlab-org/gitlab/-/issues/3294). - GitLab Runners cannot register with a **secondary** site. Support for this is [planned for the future](https://gitlab.com/gitlab-org/gitlab/-/issues/3294).
- Geo **secondary** nodes can not be configured to [use high-availability configurations of PostgreSQL](https://gitlab.com/groups/gitlab-org/-/epics/2536). - Configuring Geo **secondary** sites to [use high-availability configurations of PostgreSQL](https://gitlab.com/groups/gitlab-org/-/epics/2536) is currently in **alpha** support.
- [Selective synchronization](replication/configuration.md#selective-synchronization) only limits what repositories are replicated. The entire PostgreSQL data is still replicated. Selective synchronization is not built to accommodate compliance / export control use cases. - [Selective synchronization](replication/configuration.md#selective-synchronization) only limits what repositories are replicated. The entire PostgreSQL data is still replicated. Selective synchronization is not built to accommodate compliance / export control use cases.
### Limitations on replication/verification ### Limitations on replication/verification
...@@ -206,7 +208,7 @@ For setup instructions, see [Setting up Geo](setup/index.md). ...@@ -206,7 +208,7 @@ For setup instructions, see [Setting up Geo](setup/index.md).
## Post-installation documentation ## Post-installation documentation
After installing GitLab on the **secondary** nodes and performing the initial configuration, see the following documentation for post-installation information. After installing GitLab on the **secondary** site(s) and performing the initial configuration, see the following documentation for post-installation information.
### Configuring Geo ### Configuring Geo
...@@ -214,16 +216,16 @@ For information on configuring Geo, see [Geo configuration](replication/configur ...@@ -214,16 +216,16 @@ For information on configuring Geo, see [Geo configuration](replication/configur
### Updating Geo ### Updating Geo
For information on how to update your Geo nodes to the latest GitLab version, see [Updating the Geo nodes](replication/updating_the_geo_nodes.md). For information on how to update your Geo site(s) to the latest GitLab version, see [Updating the Geo sites](replication/updating_the_geo_nodes.md).
### Pausing and resuming replication ### Pausing and resuming replication
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/35913) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2. > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/35913) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2.
WARNING: WARNING:
In GitLab 13.2 and 13.3, promoting a secondary node to a primary while the In GitLab 13.2 and 13.3, promoting a secondary site to a primary while the
secondary is paused fails. Do not pause replication before promoting a secondary is paused fails. Do not pause replication before promoting a
secondary. If the node is paused, be sure to resume before promoting. This secondary. If the site is paused, be sure to resume before promoting. This
issue has been fixed in GitLab 13.4 and later. issue has been fixed in GitLab 13.4 and later.
WARNING: WARNING:
...@@ -232,7 +234,7 @@ Omnibus GitLab-managed database. External databases are currently not supported. ...@@ -232,7 +234,7 @@ Omnibus GitLab-managed database. External databases are currently not supported.
In some circumstances, like during [upgrades](replication/updating_the_geo_nodes.md) or a [planned failover](disaster_recovery/planned_failover.md), it is desirable to pause replication between the primary and secondary. In some circumstances, like during [upgrades](replication/updating_the_geo_nodes.md) or a [planned failover](disaster_recovery/planned_failover.md), it is desirable to pause replication between the primary and secondary.
Pausing and resuming replication is done via a command line tool from the secondary node where the `postgresql` service is enabled. Pausing and resuming replication is done via a command line tool from a node in the secondary site where the `postgresql` service is enabled.
If `postgresql` is on a standalone database node, ensure that `gitlab.rb` on that node contains the configuration line `gitlab_rails['geo_node_name'] = 'node_name'`, where `node_name` is the same as the `geo_node_name` on the application node. If `postgresql` is on a standalone database node, ensure that `gitlab.rb` on that node contains the configuration line `gitlab_rails['geo_node_name'] = 'node_name'`, where `node_name` is the same as the `geo_node_name` on the application node.
...@@ -262,7 +264,7 @@ For information on using Geo in disaster recovery situations to mitigate data-lo ...@@ -262,7 +264,7 @@ For information on using Geo in disaster recovery situations to mitigate data-lo
### Replicating the Container Registry ### Replicating the Container Registry
For more information on how to replicate the Container Registry, see [Docker Registry for a **secondary** node](replication/docker_registry.md). For more information on how to replicate the Container Registry, see [Docker Registry for a **secondary** site](replication/docker_registry.md).
### Security Review ### Security Review
...@@ -278,9 +280,9 @@ For an example of how to set up a location-aware Git remote URL with AWS Route53 ...@@ -278,9 +280,9 @@ For an example of how to set up a location-aware Git remote URL with AWS Route53
### Backfill ### Backfill
Once a **secondary** node is set up, it will start replicating missing data from Once a **secondary** site is set up, it will start replicating missing data from
the **primary** node in a process known as **backfill**. You can monitor the the **primary** site in a process known as **backfill**. You can monitor the
synchronization process on each Geo node from the **primary** node's **Geo Nodes** synchronization process on each Geo site from the **primary** site's **Geo Nodes**
dashboard in your browser. dashboard in your browser.
Failures that happen during a backfill are scheduled to be retried at the end Failures that happen during a backfill are scheduled to be retried at the end
...@@ -288,7 +290,7 @@ of the backfill. ...@@ -288,7 +290,7 @@ of the backfill.
## Remove Geo site ## Remove Geo site
For more information on removing a Geo node, see [Removing **secondary** Geo nodes](replication/remove_geo_site.md). For more information on removing a Geo site, see [Removing **secondary** Geo sites](replication/remove_geo_site.md).
## Disable Geo ## Disable Geo
......