Commit d7f55b23 authored by Marcel Amirault's avatar Marcel Amirault Committed by Russell Dickenson

Copy edit geo data types page

parent 456dc198
......@@ -5,7 +5,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
type: howto
---
# Geo data types support **(PREMIUM SELF)**
# Supported Geo data types **(PREMIUM SELF)**
A Geo data type is a specific class of data that is required by one or more GitLab features to
store relevant information.
......@@ -14,7 +14,7 @@ To replicate data produced by these features with Geo, we use several strategies
## Data types
We currently distinguish between three different data types:
We distinguish between three different data types:
- [Git repositories](#git-repositories)
- [Blobs](#blobs)
......@@ -51,7 +51,7 @@ verification methods:
| Blobs | Infrastructure registry _(object storage)_ | Geo with API/Managed (*2*) | _Not implemented_ |
| Blobs | Versioned Terraform State _(file system)_ | Geo with API | SHA256 checksum |
| Blobs | Versioned Terraform State _(object storage)_ | Geo with API/Managed (*2*) | _Not implemented_ |
| Blobs | External Merge Request Diffs _(file system)_ | Geo with API | SHA256 checksum |
| Blobs | External Merge Request Diffs _(file system)_ | Geo with API | SHA256 checksum |
| Blobs | External Merge Request Diffs _(object storage)_ | Geo with API/Managed (*2*) | _Not implemented_ |
| Blobs | Pipeline artifacts _(file system)_ | Geo with API | SHA256 checksum |
| Blobs | Pipeline artifacts _(object storage)_ | Geo with API/Managed (*2*) | SHA256 checksum |
......@@ -66,15 +66,20 @@ verification methods:
A GitLab instance can have one or more repository shards. Each shard has a Gitaly instance that
is responsible for allowing access and operations on the locally stored Git repositories. It can run
on a machine with a single disk, multiple disks mounted as a single mount-point (like with a RAID array),
or using LVM.
on a machine:
It requires no special file system and can work with NFS or a mounted Storage Appliance (there may be
performance limitations when using a remote file system).
- With a single disk.
- With multiple disks mounted as a single mount-point (like with a RAID array).
- Using LVM.
Geo will trigger garbage collection in Gitaly to [deduplicate forked repositories](../../../development/git_object_deduplication.md#git-object-deduplication-and-gitlab-geo) on Geo secondary sites.
GitLab does not require a special file system and can work with:
Communication is done via Gitaly's own gRPC API. There are three possible ways of synchronization:
- NFS.
- A mounted Storage Appliance (there may be performance limitations when using a remote file system).
Geo triggers garbage collection in Gitaly to [deduplicate forked repositories](../../../development/git_object_deduplication.md#git-object-deduplication-and-gitlab-geo) on Geo secondary sites.
Communication is done via Gitaly's own gRPC API, with three possible ways of synchronization:
- Using regular Git clone/fetch from one Geo site to another (with special authentication).
- Using repository snapshots (for when the first method fails or repository is corrupt).
......@@ -90,7 +95,7 @@ They all live in the same shard and share the same base name with a `-wiki` and
for Wiki and Design Repository cases.
Besides that, there are snippet repositories. They can be connected to a project or to some specific user.
Both types will be synced to a secondary site.
Both types are synced to a secondary site.
### Blobs
......@@ -102,7 +107,7 @@ GitLab stores files and blobs such as Issue attachments or LFS objects into eith
- Hosted by you (like MinIO).
- A Storage Appliance that exposes an Object Storage-compatible API.
When using the file system store instead of Object Storage, you need to use network mounted file systems
When using the file system store instead of Object Storage, use network mounted file systems
to run GitLab when using more than one node.
With respect to replication and verification:
......@@ -118,17 +123,17 @@ GitLab relies on data stored in multiple databases, for different use-cases.
PostgreSQL is the single point of truth for user-generated content in the Web interface, like issues content, comments
as well as permissions and credentials.
PostgreSQL can also hold some level of cached data like HTML rendered Markdown, cached merge-requests diff (this can
also be configured to be offloaded to object storage).
PostgreSQL can also hold some level of cached data like HTML-rendered Markdown and cached merge-requests diff.
This can also be configured to be offloaded to object storage.
We use PostgreSQL's own replication functionality to replicate data from the **primary** to **secondary** sites.
We use Redis both as a cache store and to hold persistent data for our background jobs system. Because both
use-cases have data that are exclusive to the same Geo site, we don't replicate it between sites.
Elasticsearch is an optional database, that can enable advanced searching capabilities, like improved Advanced Search
in both source-code level and user generated content in Issues / Merge-Requests and discussions. Currently it's not
supported in Geo.
Elasticsearch is an optional database that for advanced searching capabilities. It can improve search
in both source-code level, and user generated content in issues, merge requests, and discussions.
Elasticsearch is not supported in Geo.
## Limitations on replication/verification
......@@ -174,35 +179,35 @@ Feature.enable(:geo_package_file_replication)
WARNING:
Features not on this list, or with **No** in the **Replicated** column,
are not replicated to a **secondary** site. Failing over without manually
replicating data from those features will cause the data to be **lost**.
If you wish to use those features on a **secondary** site, or to execute a failover
replicating data from those features causes the data to be **lost**.
To use those features on a **secondary** site, or to execute a failover
successfully, you must replicate their data using some other means.
|Feature | Replicated (added in GitLab version) | Verified (added in GitLab version) | Object Storage replication (see [Geo with Object Storage](object_storage.md)) | Notes |
|:--------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------|:------------------------------------------------------------------------|:------------------------------------------------------------------------------|:------|
|[Application data in PostgreSQL](../../postgresql/index.md) | **Yes** (10.2) | **Yes** (10.2) | No | |
|[Project repository](../../../user/project/repository/) | **Yes** (10.2) | **Yes** (10.7) | No | |
|[Project wiki repository](../../../user/project/wiki/) | **Yes** (10.2) | **Yes** (10.7) | No | |
|[Group wiki repository](../../../user/project/wiki/group.md) | [**Yes** (13.10)](https://gitlab.com/gitlab-org/gitlab/-/issues/208147) | No | No | Behind feature flag `geo_group_wiki_repository_replication`, enabled by default. |
|[Uploads](../../uploads.md) | **Yes** (10.2) | [No](https://gitlab.com/groups/gitlab-org/-/epics/1817) | No | Verified only on transfer or manually using [Integrity Check Rake Task](../../raketasks/check.md) on both sites and comparing the output between them. |
|[LFS objects](../../lfs/index.md) | **Yes** (10.2) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/8922) | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only on transfer or manually using [Integrity Check Rake Task](../../raketasks/check.md) on both sites and comparing the output between them. GitLab versions 11.11.x and 12.0.x are affected by [a bug that prevents any new LFS objects from replicating](https://gitlab.com/gitlab-org/gitlab/-/issues/32696).<br /><br />Behind feature flag `geo_lfs_object_replication`, enabled by default. |
|[Personal snippets](../../../user/snippets.md) | **Yes** (10.2) | **Yes** (10.2) | No | |
|[Project snippets](../../../user/snippets.md) | **Yes** (10.2) | **Yes** (10.2) | No | |
|[CI job artifacts](../../../ci/pipelines/job_artifacts.md) | **Yes** (10.4) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/8923) | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only manually using [Integrity Check Rake Task](../../raketasks/check.md) on both sites and comparing the output between them. Job logs also verified on transfer. |
|[CI Pipeline Artifacts](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/ci/pipeline_artifact.rb) | [**Yes** (13.11)](https://gitlab.com/gitlab-org/gitlab/-/issues/238464) | [**Yes** (13.11)](https://gitlab.com/gitlab-org/gitlab/-/issues/238464) | Via Object Storage provider if supported. Native Geo support (Beta). | Persists additional artifacts after a pipeline completes |
|[Container Registry](../../packages/container_registry.md) | **Yes** (12.3) | No | No | Disabled by default. See [instructions](docker_registry.md) to enable. |
|[Content in object storage (beta)](object_storage.md) | **Yes** (12.4) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/13845) | No | |
|[Infrastructure Registry](../../../user/packages/infrastructure_registry/index.md) | **Yes** (14.0) | [**Yes**](#limitation-of-verification-for-files-in-object-storage) (14.0) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
|[Project designs repository](../../../user/project/issues/design_management.md) | **Yes** (12.7) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/32467) | No | Designs also require replication of LFS objects and Uploads. |
|[Package Registry](../../../user/packages/package_registry/index.md) | **Yes** (13.2) | [**Yes**](#limitation-of-verification-for-files-in-object-storage) (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
|[Versioned Terraform State](../../terraform_state.md) | **Yes** (13.5) | [**Yes**](#limitation-of-verification-for-files-in-object-storage) (13.12) | Via Object Storage provider if supported. Native Geo support (Beta). | Replication is behind the feature flag `geo_terraform_state_version_replication`, enabled by default. Verification was behind the feature flag `geo_terraform_state_version_verification`, which was removed in 14.0|
|[External merge request diffs](../../merge_request_diffs.md) | **Yes** (13.5) | **Yes** (14.6) | Via Object Storage provider if supported. Native Geo support (Beta). | Replication is behind the feature flag `geo_merge_request_diff_replication`, enabled by default. Verification is behind the feature flag `geo_merge_request_diff_verification`, enabled by default in 14.6.|
|[Versioned snippets](../../../user/snippets.md#versioned-snippets) | [**Yes** (13.7)](https://gitlab.com/groups/gitlab-org/-/epics/2809) | [**Yes** (14.2)](https://gitlab.com/groups/gitlab-org/-/epics/2810) | No | Verification was implemented behind the feature flag `geo_snippet_repository_verification` in 13.11, and the feature flag was removed in 14.2. |
|[GitLab Pages](../../pages/index.md) | [**Yes** (14.3)](https://gitlab.com/groups/gitlab-org/-/epics/589) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_pages_deployment_replication`, enabled by default. |
|[Server-side Git hooks](../../server_hooks.md) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/1867) | No | No | Not planned because of current implementation complexity, low customer interest, and availability of alternatives to hooks. |
|[Elasticsearch integration](../../../integration/elasticsearch.md) | [Not planned](https://gitlab.com/gitlab-org/gitlab/-/issues/1186) | No | No | Not planned because further product discovery is required and Elasticsearch (ES) clusters can be rebuilt. Secondaries currently use the same ES cluster as the primary. |
|[Dependency proxy images](../../../user/packages/dependency_proxy/index.md) | [Not planned](https://gitlab.com/gitlab-org/gitlab/-/issues/259694) | No | No | Blocked by [Geo: Secondary Mimicry](https://gitlab.com/groups/gitlab-org/-/epics/1528). Replication of this cache is not needed for disaster recovery purposes because it can be recreated from external sources. |
|[Vulnerability Export](../../../user/application_security/vulnerability_report/#export-vulnerability-details) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/3111) | No | No | Not planned because they are ephemeral and sensitive information. They can be regenerated on demand. |
|Feature | Replicated (added in GitLab version) | Verified (added in GitLab version) | Object Storage replication (see [Geo with Object Storage](object_storage.md)) | Notes |
|:--------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------|:---------------------------------------------------------------------------|:------------------------------------------------------------------------------|:------|
|[Application data in PostgreSQL](../../postgresql/index.md) | **Yes** (10.2) | **Yes** (10.2) | No | |
|[Project repository](../../../user/project/repository/) | **Yes** (10.2) | **Yes** (10.7) | No | |
|[Project wiki repository](../../../user/project/wiki/) | **Yes** (10.2) | **Yes** (10.7) | No | |
|[Group wiki repository](../../../user/project/wiki/group.md) | [**Yes** (13.10)](https://gitlab.com/gitlab-org/gitlab/-/issues/208147) | No | No | Behind feature flag `geo_group_wiki_repository_replication`, enabled by default. |
|[Uploads](../../uploads.md) | **Yes** (10.2) | [No](https://gitlab.com/groups/gitlab-org/-/epics/1817) | No | Verified only on transfer or manually using [Integrity Check Rake Task](../../raketasks/check.md) on both sites and comparing the output between them. |
|[LFS objects](../../lfs/index.md) | **Yes** (10.2) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/8922) | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only on transfer or manually using [Integrity Check Rake Task](../../raketasks/check.md) on both sites and comparing the output between them. GitLab versions 11.11.x and 12.0.x are affected by [a bug that prevents any new LFS objects from replicating](https://gitlab.com/gitlab-org/gitlab/-/issues/32696).<br><br>Behind feature flag `geo_lfs_object_replication`, enabled by default. |
|[Personal snippets](../../../user/snippets.md) | **Yes** (10.2) | **Yes** (10.2) | No | |
|[Project snippets](../../../user/snippets.md) | **Yes** (10.2) | **Yes** (10.2) | No | |
|[CI job artifacts](../../../ci/pipelines/job_artifacts.md) | **Yes** (10.4) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/8923) | Via Object Storage provider if supported. Native Geo support (Beta). | Verified only manually using [Integrity Check Rake Task](../../raketasks/check.md) on both sites and comparing the output between them. Job logs also verified on transfer. |
|[CI Pipeline Artifacts](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/ci/pipeline_artifact.rb) | [**Yes** (13.11)](https://gitlab.com/gitlab-org/gitlab/-/issues/238464) | [**Yes** (13.11)](https://gitlab.com/gitlab-org/gitlab/-/issues/238464) | Via Object Storage provider if supported. Native Geo support (Beta). | Persists additional artifacts after a pipeline completes. |
|[Container Registry](../../packages/container_registry.md) | **Yes** (12.3) | No | No | Disabled by default. See [instructions](docker_registry.md) to enable. |
|[Content in object storage (beta)](object_storage.md) | **Yes** (12.4) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/13845) | No | |
|[Infrastructure Registry](../../../user/packages/infrastructure_registry/index.md) | **Yes** (14.0) | [**Yes**](#limitation-of-verification-for-files-in-object-storage) (14.0) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
|[Project designs repository](../../../user/project/issues/design_management.md) | **Yes** (12.7) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/32467) | No | Designs also require replication of LFS objects and Uploads. |
|[Package Registry](../../../user/packages/package_registry/index.md) | **Yes** (13.2) | [**Yes**](#limitation-of-verification-for-files-in-object-storage) (13.10) | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_package_file_replication`, enabled by default. |
|[Versioned Terraform State](../../terraform_state.md) | **Yes** (13.5) | [**Yes**](#limitation-of-verification-for-files-in-object-storage) (13.12) | Via Object Storage provider if supported. Native Geo support (Beta). | Replication is behind the feature flag `geo_terraform_state_version_replication`, enabled by default. Verification was behind the feature flag `geo_terraform_state_version_verification`, which was removed in 14.0. |
|[External merge request diffs](../../merge_request_diffs.md) | **Yes** (13.5) | **Yes** (14.6) | Via Object Storage provider if supported. Native Geo support (Beta). | Replication is behind the feature flag `geo_merge_request_diff_replication`, enabled by default. Verification is behind the feature flag `geo_merge_request_diff_verification`, enabled by default in 14.6.|
|[Versioned snippets](../../../user/snippets.md#versioned-snippets) | [**Yes** (13.7)](https://gitlab.com/groups/gitlab-org/-/epics/2809) | [**Yes** (14.2)](https://gitlab.com/groups/gitlab-org/-/epics/2810) | No | Verification was implemented behind the feature flag `geo_snippet_repository_verification` in 13.11, and the feature flag was removed in 14.2. |
|[GitLab Pages](../../pages/index.md) | [**Yes** (14.3)](https://gitlab.com/groups/gitlab-org/-/epics/589) | No | Via Object Storage provider if supported. Native Geo support (Beta). | Behind feature flag `geo_pages_deployment_replication`, enabled by default. |
|[Server-side Git hooks](../../server_hooks.md) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/1867) | No | No | Not planned because of current implementation complexity, low customer interest, and availability of alternatives to hooks. |
|[Elasticsearch integration](../../../integration/elasticsearch.md) | [Not planned](https://gitlab.com/gitlab-org/gitlab/-/issues/1186) | No | No | Not planned because further product discovery is required and Elasticsearch (ES) clusters can be rebuilt. Secondaries use the same ES cluster as the primary. |
|[Dependency proxy images](../../../user/packages/dependency_proxy/index.md) | [Not planned](https://gitlab.com/gitlab-org/gitlab/-/issues/259694) | No | No | Blocked by [Geo: Secondary Mimicry](https://gitlab.com/groups/gitlab-org/-/epics/1528). Replication of this cache is not needed for disaster recovery purposes because it can be recreated from external sources. |
|[Vulnerability Export](../../../user/application_security/vulnerability_report/#export-vulnerability-details) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/3111) | No | No | Not planned because they are ephemeral and sensitive information. They can be regenerated on demand. |
#### Limitation of verification for files in Object Storage
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment