The following tables are intended to guide you to choose the right combination of capabilties based on your requirements. It is common to want the most
available, quickly recoverable, highly performant and fully data resilient solution. However, there are tradeoffs.
The tables lists features on the left and provides their capabilities to the right along with known trade-offs.
|Gitaly Cluster | Very high - tolerant of node failures | RTO for a single node of 10s with no manual intervention | Data is stored on multiple nodes | Good - While writes may take slightly longer due to voting, read distribution improves read speeds | **Trade-off** - Slight decrease in write speed for redundant, strongly-consistent storage solution. **Risks** - [Does not currently support snapshot backups](gitaly/index.md#snapshot-backup-and-recovery-limitations), GitLab backup task can be slow for large data sets |
|Gitaly Shards | Single storage location is a single point of failure | Would need to restore only shards which failed | Single point of failure | Good - can allocate repositories to shards to spread load | **Trade-off** - Need to manually configure repositories into different shards to balance loads / storage space **Risks** - Single point of failure relies on recovery process when single-node failure occurs |
|Gitaly + NFS | Single storage location is a single point of failure | Single node failure requires restoration from backup | Single point of failure | Average - NFS is not ideally suited to large quantities of small reads / writes which can have a detrimental impact on performance | **Trade-off** - Easy and familiar administration though NFS is not ideally suited to Git demands **Risks** - Many instances of NFS compatibility issues which provide very poor customer experiences |
## Geo Capabilities
If your availabity needs to span multiple zones or multiple locations, please read about [Geo](geo/index.md).
|Geo| Depends on the architecture of the Geo site. It is possible to deploy secondaries in single and multiple node configurations. | Eventually consistent. Recovery point depends on replication lag, which depends on a number of factors such as network speeds. Geo supports failover from a primary to secondary site using manual commands that are scriptable. | Geo currently replicates 100% of planned data types and verifies 50%. See [limitations table](geo/replication/datatypes.md#limitations-on-replicationverification) for more detail. | Improves read/clone times for users of a secondary. | Geo is not intended to replace other backup/restore solutions. Because of replication lag and the possibility of replicating bad data from a primary, we recommend that customers also take regular backups of their primary site and test the restore process. |
## Scenarios for failure modes and available mitigation paths
The following table outlines failure modes and mitigation paths for the product offerings detailed in the tables above. Note - Gitaly Cluster install assumes an odd number replication factor of 3 or greater
| Gitaly Mode | Loss of Single Gitaly Node | Application / Data Corruption | Regional Outage (Loss of Instance) | Notes |
| Single Gitaly Node | Downtime - Must restore from backup | Downtime - Must restore from Backup | Downtime - Must wait for outage to end | |
| Single Gitaly Node + Geo Secondary | Downtime - Must restore from backup, can perform a manual failover to secondary | Downtime - Must restore from Backup, errors could have propagated to secondary | Manual intervention - failover to Geo secondary | |
| Sharded Gitaly Install | Partial Downtime - Only repos on impacted node affected, must restore from backup | Partial Downtime - Only repos on impacted node affected, must restore from backup | Downtime - Must wait for outage to end | |
| Sharded Gitaly Install + Geo Secondary | Partial Downtime - Only repos on impacted node affected, must restore from backup, could perform manual failover to secondary for impacted repos | Partial Downtime - Only repos on impacted node affected, must restore from backup, errors could have propagated to secondary | Manual intervention - failover to Geo secondary | |
| Gitaly Cluster Install* | No Downtime - will swap repository primary to another node after 10 seconds | N/A - All writes are voted on by multiple Gitaly Cluster nodes | Downtime - Must wait for outage to end | Snapshot backups for Gitaly Cluster nodes not supported at this time |
| Gitaly Cluster Install* + Geo Secondary | No Downtime - will swap repository primary to another node after 10 seconds | N/A - All writes are voted on by multiple Gitaly Cluster nodes | Manual intervention - failover to Geo secondary | Snapshot backups for Gitaly Cluster nodes not supported at this time |
- Read requests are distributed between multiple Gitaly nodes, which can improve performance.
- Read requests are distributed between multiple Gitaly nodes, which can improve performance.
- Write requests are broadcast to repository replicas.
- Write requests are broadcast to repository replicas.
WARNING:
## Guidance regarding Gitaly Cluster
Engineering support for NFS for Git repositories is deprecated. Read the
[deprecation notice](#nfs-deprecation-notice).
## Recommendations
Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Please review existing technical limitations and considerations prior to deploying Gitaly Cluster.
The following table provides recommendations based on your requirements. Users means concurrent
| User scaling | Reference architecture to use | GitLab instance configuration | Gitaly configuration | Git repository storage | Instances dedicated to Gitaly services |
Please also review the [configuration guidance](configure_gitaly.md) and [Repository storage options](../repository_storage_paths.md) to make sure that Gitaly Cluster is the best set-up for you. Finally, refer to the following guidance:
| Up to 1000 users | [1K](../reference_architectures/1k_users.md) | Single instance for all of GitLab | Already integrated as a service | Local disk | 0 |
| Up to 2999 users | [2K](../reference_architectures/2k_users.md) | Horizontally-scaled GitLab instance (non-HA) | Single Gitaly server | Local disk of Gitaly instance | 1 |
| 3000 users and over | [3K](../reference_architectures/3k_users.md) | Horizontally-scaled GitLab instance with HA | Gitaly Cluster | Local disk of each Gitaly instance | 8 |
| RTO/RPO requirements for AZ failure tolerance regardless of user scale | [3K (with downscaling)](../reference_architectures/3k_users.md) | Custom (1) | Gitaly Cluster | Local disk of each Gitaly instance | 8 |
1. If you need AZ failure tolerance for user scaling lower than 3K, please contact Customer Success
- If you have not yet migrated to Gitaly Cluster and want to continue using NFS, remain on the
to discuss your RTO and RPO needs and what options exist to meet those objectives.
service you are using. NFS is supported in 14.x releases.
- If you have not yet migrated to Gitaly Cluster but want to migrate away from NFS, you have two options - a sharded Gitaly instance or Gitaly Cluster.
- If you have migrated to Gitaly Cluster and the limitations and tradeoffs are not suitable for your environment, your options are:
1.[Migrate off Gitaly Cluster](#migrate-off-gitaly-cluster) back to your NFS solution
1.[Migrate off Gitaly Cluster](#migrate-off-gitaly-cluster) to NFS solution or to a sharded Gitaly instance.
GitLab [reference architectures](../reference_architectures/index.md) provide guidance for what
Reach out to your Technical Account Manager or customer support if you have any questions.
Gitaly configuration is recommended at each scale. The Gitaly configuration is noted by the
architecture diagrams and the table of required resources.
WARNING:
### Known issues
Some [known database inconsistency issues](#known-issues) exist in Gitaly Cluster. We recommend you
remain on your current service for now. We can adjust the date for
The following table outlines current known issues impacting the use of Gitaly Cluster. For
[NFS support removal](#nfs-deprecation-notice) if this applies to you.
the current status of these issues, please refer to the referenced issues and epics.
| Gitaly Cluster + Geo - Issues retrying failed syncs | If Gitaly Cluster is used on a Geo secondary site, repositories that have failed to sync could continue to fail when Geo tries to resync them. Recovering from this state requires assistance from support to run manual steps. Work is in-progress to update Gitaly Cluster to [identify repositories with a unique and persistent identifier](https://gitlab.com/gitlab-org/gitaly/-/issues/3485), which is expected to resolve the issue. | No known solution at this time. |
| Database inconsistencies due to repository access outside of Gitaly Cluster's control | Operations that write to the repository storage that do not go through normal Gitaly Cluster methods can cause database inconsistencies. These can include (but are not limited to) snapshot restoration for cluster node disks, node upgrades which modify files under Git control, or any other disk operation that may touch repository storage external to GitLab. The Gitaly team is actively working to provide manual commands to [reconcile the Praefect database with the repository storage](https://gitlab.com/groups/gitlab-org/-/epics/6723). | Don't directly change repositories on any Gitaly Cluster node at this time. |
| Praefect unable to insert data into the database due to migrations not being applied after an upgrade | If the database is not kept up to date with completed migrations, then the Praefect node is unable to perform normal operation. | Make sure the Praefect database is up and running with all migrations completed (For example: `/opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate-status` should show a list of all applied migrations). Consider [requesting live upgrade assistance](https://about.gitlab.com/support/scheduling-live-upgrade-assistance.html) so your upgrade plan can be reviewed by support. |
| Restoring a Gitaly Cluster node from a snapshot in a running cluster | Because the Gitaly Cluster runs with consistent state, introducing a single node that is behind will result in the cluster not being able to reconcile the nodes data and other nodes data | Don't restore a single Gitaly Cluster node from a backup snapshot. If you need to restore from backup, it's best to snapshot all Gitaly Cluster nodes at the same time and take a database dump of the Praefect database. |
### Snapshot backup and recovery limitations
Gitaly Cluster does not support snapshot backups because these can cause issues where the Praefect
database becomes out of sync with the disk storage. Because of how Praefect rebuilds the replication
metadata of Gitaly disk information during a restore, we recommend using the
[official backup and restore Rake tasks](../../raketasks/backup_restore.md). If you are unable to use this method, please contact customer support for restoration help.
To track progress on work on a solution for manually re-synchronizing the Praefect database with
disk storage, see [this epic](https://gitlab.com/groups/gitlab-org/-/epics/6575).
### What to do if you are on Gitaly Cluster experiencing an issue or limitation
Please contact customer support for immediate help in restoration or recovery.
## Gitaly
## Gitaly
...
@@ -188,26 +204,6 @@ WARNING:
...
@@ -188,26 +204,6 @@ WARNING:
If complete cluster failure occurs, disaster recovery plans should be executed. These can affect the
If complete cluster failure occurs, disaster recovery plans should be executed. These can affect the
RPO and RTO discussed above.
RPO and RTO discussed above.
### Known issues
The following table outlines current known issues impacting the use of Gitaly Cluster. For
the current status of these issues, please refer to the referenced issues and epics.
| Gitaly Cluster + Geo can cause database inconsistencies | There are some conditions during Geo replication that can cause database inconsistencies with Gitaly Cluster. These have been identified and are being resolved by updating Gitaly Cluster to [identify repositories with a unique and persistent identifier](https://gitlab.com/gitlab-org/gitaly/-/issues/3485). |
| Database inconsistencies due to repository access outside of Gitaly Cluster's control | Operations that write to the repository storage that do not go through normal Gitaly Cluster methods can cause database inconsistencies. These can include (but are not limited to) snapshot restoration for cluster node disks, node upgrades which modify files under Git control, or any other disk operation that may touch repository storage external to GitLab. The Gitaly team is actively working to provide manual commands to [reconcile the Praefect database with the repository storage](https://gitlab.com/groups/gitlab-org/-/epics/6723). |
### Snapshot backup and recovery limitations
Gitaly Cluster does not support snapshot backups because these can cause issues where the Praefect
database becomes out of sync with the disk storage. Because of how Praefect rebuilds the replication
metadata of Gitaly disk information during a restore, we recommend using the
[official backup and restore Rake tasks](../../raketasks/backup_restore.md).
To track progress on work on a solution for manually re-synchronizing the Praefect database with
disk storage, see [this epic](https://gitlab.com/groups/gitlab-org/-/epics/6575).
### Virtual storage
### Virtual storage
Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository
Virtual storage makes it viable to have a single repository storage in GitLab to simplify repository
...
@@ -236,9 +232,7 @@ As with normal Gitaly storages, virtual storages can be sharded.
...
@@ -236,9 +232,7 @@ As with normal Gitaly storages, virtual storages can be sharded.
### Moving beyond NFS
### Moving beyond NFS
WARNING:
Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be unavailable starting GitLab 15.0. Please see our [statement of support](https://about.gitlab.com/support/statement-of-support.html#gitaly-and-nfs) for more details.
Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
unavailable from GitLab 15.0. No further enhancements are planned for this feature.
[Network File System (NFS)](https://en.wikipedia.org/wiki/Network_File_System)
[Network File System (NFS)](https://en.wikipedia.org/wiki/Network_File_System)
is not well suited to Git workloads which are CPU and IOPS sensitive.
is not well suited to Git workloads which are CPU and IOPS sensitive.
...
@@ -359,13 +353,9 @@ For configuration information, see [Configure replication factor](praefect.md#co
...
@@ -359,13 +353,9 @@ For configuration information, see [Configure replication factor](praefect.md#co
For more information on configuring Gitaly Cluster, see [Configure Gitaly Cluster](praefect.md).
For more information on configuring Gitaly Cluster, see [Configure Gitaly Cluster](praefect.md).
## Migrate to Gitaly Cluster
## Migrating to Gitaly Cluster
We recommend you migrate to Gitaly Cluster if your [requirements recommend](#recommendations) Gitaly
Please see [current guidance on Gitaly Cluster](#guidance-regarding-gitaly-cluster). The basic process for migrating to Gitaly Cluster involves:
Cluster.
Whether migrating to Gitaly Cluster because of [NFS support deprecation](index.md#nfs-deprecation-notice)
or to move from single Gitaly nodes, the basic process involves:
@@ -376,8 +366,7 @@ or to move from single Gitaly nodes, the basic process involves:
...
@@ -376,8 +366,7 @@ or to move from single Gitaly nodes, the basic process involves:
WARNING:
WARNING:
Some [known database inconsistency issues](#known-issues) exist in Gitaly Cluster. We recommend you
Some [known database inconsistency issues](#known-issues) exist in Gitaly Cluster. We recommend you
remain on your current service for now. We can adjust the date for
remain on your current service for now.
[NFS support removal](#nfs-deprecation-notice) if this applies to you.
### Migrate off Gitaly Cluster
### Migrate off Gitaly Cluster
...
@@ -631,20 +620,4 @@ The second facet presents the only real solution. For this, we developed
...
@@ -631,20 +620,4 @@ The second facet presents the only real solution. For this, we developed
## NFS deprecation notice
## NFS deprecation notice
Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
unavailable from GitLab 15.0. No further enhancements are planned for this feature.
unavailable from GitLab 15.0. For further information, please see our [NFS Deprecation](../nfs.md#gitaly-and-nfs-deprecation) documentation.
Additional information:
-[Recommended NFS mount options and known issues with Gitaly and NFS](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).
-[GitLab statement of support](https://about.gitlab.com/support/statement-of-support.html#gitaly-and-nfs).
GitLab recommends:
- Creating a [Gitaly Cluster](#gitaly-cluster) as soon as possible.
-[Moving your repositories](#migrate-to-gitaly-cluster) from NFS-based storage to Gitaly
Cluster.
We welcome your feedback on this process. You can:
- Raise a support ticket.
-[Comment on the epic](https://gitlab.com/groups/gitlab-org/-/epics/4916).
Starting with GitLab version 14.0, support for NFS to store Git repository data will be deprecated. Technical customer support and engineering support will be available for the 14.x releases. Engineering will fix bugs and security vulnerabilities consistent with our [release and maintenance policy](../policy/maintenance.md#security-releases).
At the end of the 14.12 milestone (tenatively June 22nd, 2022) technical and engineering support for using NFS to store Git repository data will be officially at end-of-life. There will be no product changes or troubleshooting provided via Engineering, Security or Paid Support channels.
For those customers still running earlier versions of GitLab, [our support eligibility and maintenance policy applies](https://about.gitlab.com/support/statement-of-support.html#version-support).
For the 14.x releases, we will continue to help with Git related tickets from customers running one or more Gitaly servers with its data stored on NFS. Examples may include:
- Performance issues or timeouts accessing Git data
- Commits or branches vanish
- GitLab intermittently returns the wrong Git data (such as reporting that a repository has no branches)
Assistance will be limited to activities like:
- Verifying developers' workflow uses features like protected branches
- Reviewing GitLab event data from the database to advise if it looks like a force push over-wrote branches
- Verifying that NFS client mount options match our [documented recommendations](#mount-options)
- Analyzing the GitLab Workhorse and Rails logs, and determining that `500` errors being seen in the environment are caused by slow responses from Gitaly
GitLab support will be unable to continue with the investigation if:
- The date of the request is on or after the release of GitLab version 15.0, and
- Support Engineers and Management determine that all reasonable non-NFS root causes have been exhausted
If the issue is reproducible, or if it happens intermittently but regularly, GitLab Support will investigate providing the issue reproduces without the use of NFS. In order to reproduce without NFS, the affected repositories should be migrated to a different Gitaly shard, such as Gitaly cluster or a standalone Gitaly VM, backed with block storage.
### Why remove NFS for Git repository data
{:.no-toc}
NFS is not well-suited to a workload consisting of many small files, like Git repositories. NFS does provide a number of configuration options designed to improve performance. However, over time, a number of these mount options have proven to result in inconsistencies across multiple nodes mounting the NFS volume, up to and including data loss. Addressing these inconsistencies consume extraordinary development and support engineer time that hamper our ability to develop [Gitaly Cluster](gitaly/praefect.md), our purpose-built solution to addressing the deficiencies of NFS in this environment.
Please note that Gitaly Cluster provides highly-available Git repository storage. If this is not a requirement, single-node Gitaly backed by block storage is a suitable substitute.
Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
Engineering support for NFS for Git repositories is deprecated. Technical support is planned to be
unavailable from GitLab 15.0. No further enhancements are planned for this feature.
unavailable from GitLab 15.0. No further enhancements are planned for this feature.
Read:
Read:
-The [Gitaly and NFS deprecation notice](gitaly/index.md#nfs-deprecation-notice).