nexedi / gitlab-ce
Commit aedabc6d authored Jan 29, 2021 by Evan Read
Committed by Nick Gaskill on Jan 29, 2021
Explain Gitaly Cluster components
parent 2d7b448d
Showing 1 changed file with 58 additions and 16 deletions
doc/administration/gitaly/praefect.md
...
...
@@ -10,7 +10,7 @@ type: reference
 [Gitaly](index.md), the service that provides storage for Git repositories, can
 be run in a clustered configuration to increase fault tolerance. In this
 configuration, every Git repository is stored on every Gitaly node in the
-cluster. Multiple clusters (or shards) can be configured.
+cluster. Multiple clusters (or storage shards) can be configured.
 
 NOTE:
 Technical support for Gitaly clusters is limited to GitLab Premium and Ultimate
...
...
@@ -21,7 +21,7 @@ component for running a Gitaly Cluster.
 
 ![Architecture diagram](img/praefect_architecture_v12_10.png)
 
-Using a Gitaly Cluster increase fault tolerance by:
+Using a Gitaly Cluster increases fault tolerance by:
 
 - Replicating write operations to warm standby Gitaly nodes.
 - Detecting Gitaly node failures.
...
...
@@ -53,7 +53,7 @@ Gitaly Cluster supports:
 - Reporting of possible data loss if replication queue is non-empty.
 - Marking repositories as [read only](#read-only-mode) if data loss is detected to prevent data inconsistencies.
 
-Follow the [HA Gitaly epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
+Follow the [Gitaly Cluster epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
 for improvements including
 [horizontally distributing reads](https://gitlab.com/groups/gitlab-org/-/epics/2013).
...
...
@@ -80,23 +80,65 @@ For more information, see:
 - [Gitaly architecture](index.md#architecture).
 - Geo [use cases](../geo/index.md#use-cases) and [architecture](../geo/index.md#architecture).
 
-## Cluster or shard
+## Where Gitaly Cluster fits
+
+GitLab accesses [repositories](../../user/project/repository/index.md) through the configured
+[repository storages](../repository_storage_paths.md). Each new repository is stored on one of the
+repository storages based on their configured weights. Each repository storage is either:
+
+- A Gitaly storage served directly by Gitaly. These map to a directory on the file system of a
+  Gitaly node.
+- A [virtual storage](#virtual-storage-or-direct-gitaly-storage) served by Praefect. A virtual
+  storage is a cluster of Gitaly storages that appear as a single repository storage.
+
+Virtual storages are a feature of Gitaly Cluster. They support replicating the repositories to
+multiple storages for fault tolerance. Virtual storages can improve performance by distributing
+requests across Gitaly nodes. Their distributed nature makes it viable to have a single repository
+storage in GitLab to simplify repository management.
+
+## Components of Gitaly Cluster
+
+Gitaly Cluster consists of multiple components:
+
+- [Load balancer](#load-balancer) for distributing requests and providing fault-tolerant access to
+  Praefect nodes.
+- [Praefect](#praefect) nodes for managing the cluster and routing requests to Gitaly nodes.
+- [PostgreSQL database](#postgresql) for persisting cluster metadata and [PgBouncer](#pgbouncer),
+  recommended for pooling Praefect's database connections.
+- [Gitaly](index.md) nodes to provide repository storage and Git access.
+
+![Cluster example](img/cluster_example_v13_3.png)
+
+In this example:
+
+- Repositories are stored on a virtual storage called `storage-1`.
+- Three Gitaly nodes provide `storage-1` access: `gitaly-1`, `gitaly-2`, and `gitaly-3`.
+- The three Gitaly nodes store data on their filesystems.
+
+### Virtual storage or direct Gitaly storage
 
 Gitaly supports multiple models of scaling:
 
 - Clustering using Gitaly Cluster, where each repository is stored on multiple Gitaly nodes in the
   cluster. Read requests are distributed between repository replicas and write requests are
-  broadcast to repository replicas.
-- Sharding using [repository storage paths](../repository_storage_paths.md), where each repository
-  is stored on the assigned Gitaly node. All requests are routed to this node.
+  broadcast to repository replicas. GitLab accesses virtual storage.
+- Direct access to Gitaly storage using [repository storage paths](../repository_storage_paths.md),
+  where each repository is stored on the assigned Gitaly node. All requests are routed to this node.
+
+The following is Gitaly set up to use direct access to Gitaly instead of Gitaly Cluster:
+
+![Shard example](img/shard_example_v13_3.png)
+
+In this example:
 
-| Cluster | Shard |
-|:--------------------------------------------------|:----------------------------------------------|
-| ![Cluster example](img/cluster_example_v13_3.png) | ![Shard example](img/shard_example_v13_3.png) |
+- Each repository is stored on one of three Gitaly storages: `storage-1`, `storage-2`,
+  or `storage-3`.
+- Each storage is serviced by a Gitaly node.
+- The three Gitaly nodes store data in three separate hashed storage locations.
 
-Generally, Gitaly Cluster can replace sharded configurations, at the expense of additional storage
-needed to store each repository on multiple Gitaly nodes. The benefit of using Gitaly Cluster over
-sharding is:
+Generally, virtual storage with Gitaly Cluster can replace direct Gitaly storage configurations, at
+the expense of additional storage needed to store each repository on multiple Gitaly nodes. The
+benefit of using Gitaly Cluster over direct Gitaly storage is:
 
 - Improved fault tolerance, because each Gitaly node has a copy of every repository.
 - Improved resource utilization, reducing the need for over-provisioning for shard-specific peak
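Note on the added "Where Gitaly Cluster fits" text: it says each new repository is stored on one of the repository storages "based on their configured weights". The following is a minimal sketch of what weighted placement could look like, assuming a plain weighted random choice. The commit does not describe the selection algorithm, and the storage names, weights, and the `pick_storage` helper are hypothetical.

```python
import random
from collections import Counter


def pick_storage(weights, rng):
    """Pick a storage name with probability proportional to its weight.

    Assumption: weighted random selection. Storages with weight 0 are
    never chosen.
    """
    names = [name for name, weight in weights.items() if weight > 0]
    if not names:
        raise ValueError("at least one storage needs a positive weight")
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical storages and weights, for illustration only.
weights = {"default": 0, "storage-1": 75, "storage-2": 25}

rng = random.Random(0)  # seeded so the run is repeatable
placements = Counter(pick_storage(weights, rng) for _ in range(1000))
print(dict(placements))
```

With these weights, roughly three quarters of new repositories would land on `storage-1`, and none on `default`.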
...
...
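Note on the two scaling models in the hunk above: the difference between a virtual storage (reads distributed between replicas, writes broadcast to all replicas) and direct Gitaly storage (all requests routed to one node) can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: it ignores replication lag, node failures, and how Praefect actually coordinates writes. The class names `GitalyNode`, `VirtualStorage`, and `DirectStorage` are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GitalyNode:
    """Toy stand-in for a Gitaly node: maps a repository path to written refs."""
    name: str
    repos: dict = field(default_factory=dict)


class VirtualStorage:
    """Cluster model: every node holds a replica of every repository."""

    def __init__(self, nodes, rng=None):
        if not nodes:
            raise ValueError("a virtual storage needs at least one node")
        self.nodes = list(nodes)
        self.rng = rng or random.Random()

    def write(self, repo, ref):
        # Writes are broadcast to all replicas.
        for node in self.nodes:
            node.repos.setdefault(repo, []).append(ref)

    def read(self, repo):
        # Reads are distributed: any replica may serve the request.
        node = self.rng.choice(self.nodes)
        return node.name, list(node.repos.get(repo, []))


class DirectStorage:
    """Direct model: one assigned node serves every request for its repositories."""

    def __init__(self, node):
        self.node = node

    def write(self, repo, ref):
        self.node.repos.setdefault(repo, []).append(ref)

    def read(self, repo):
        return self.node.name, list(self.node.repos.get(repo, []))


cluster_nodes = [GitalyNode(f"gitaly-{i}") for i in (1, 2, 3)]
virtual = VirtualStorage(cluster_nodes, rng=random.Random(1))
virtual.write("group/project.git", "refs/heads/main")
readers = {virtual.read("group/project.git")[0] for _ in range(50)}

direct = DirectStorage(GitalyNode("gitaly-1"))
direct.write("group/project.git", "refs/heads/main")
```

In the cluster model, losing one node still leaves two copies of `group/project.git`. In the direct model, the single assigned node is the only copy, which is the fault-tolerance trade-off the changed text describes.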
@@ -773,7 +815,7 @@ configuration.
 
 ### Load Balancer
 
-In a highly available Gitaly configuration, a load balancer is needed to route
+In a fault-tolerant Gitaly configuration, a load balancer is needed to route
 internal traffic from the GitLab application to the Praefect nodes. The
 specifics on which load balancer to use or the exact configuration is beyond the
 scope of the GitLab documentation.
...
...
@@ -785,7 +827,7 @@ addition to the GitLab nodes. Some requests handled by
 process. `gitaly-ruby` uses the Gitaly address set in the GitLab server's
 `git_data_dirs` setting to make this connection.
 
-We hope that if you’re managing HA systems like GitLab, you have a load balancer
+We hope that if you’re managing fault-tolerant systems like GitLab, you have a load balancer
 of choice already. Some examples include [HAProxy](https://www.haproxy.org/) (open-source),
 [Google Internal Load Balancer](https://cloud.google.com/load-balancing/docs/internal/),
 [AWS Elastic Load Balancer](https://aws.amazon.com/elasticloadbalancing/), F5
...
...
@@ -974,7 +1016,7 @@ To get started quickly:
 1. Go to **Explore** and query `gitlab_build_info` to verify that you are
    getting metrics from all your machines.
 
-Congratulations! You've configured an observable highly available Praefect
+Congratulations! You've configured an observable fault-tolerant Praefect
 cluster.
 
 ## Distributed reads
...
...
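Note on the last hunk: the context line asks the reader to query `gitlab_build_info` and verify that metrics arrive from all machines. As a rough sketch, the same check could be scripted against the Prometheus HTTP API (`/api/v1/query`). The base URL, instance labels, and the sample response below are hypothetical placeholders. The sketch only builds the query URL and parses a response body, so it makes no network call.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, expr="gitlab_build_info"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def reporting_instances(response_body):
    """Return the set of `instance` labels present in a query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("instance", "")
        for sample in payload["data"]["result"]
    }


def missing_instances(expected, response_body):
    """Return the expected instances that did not report the metric."""
    return set(expected) - reporting_instances(response_body)


# Hypothetical response body in the Prometheus instant-query format.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "gitlab_build_info", "instance": "gitaly-1.example:9100"},
             "value": [1611878400, "1"]},
            {"metric": {"__name__": "gitlab_build_info", "instance": "gitaly-2.example:9100"},
             "value": [1611878400, "1"]},
        ],
    },
})

expected = ["gitaly-1.example:9100", "gitaly-2.example:9100", "gitaly-3.example:9100"]
url = build_query_url("http://prometheus.example:9090/")
missing = missing_instances(expected, sample_response)
print(url)
print(sorted(missing))
```

In a real setup, the response body would come from an HTTP GET on the built URL, and a non-empty `missing` set would point at machines whose metrics are not being scraped.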