Commit ff5730a9 authored by Drew Blessing

Add scaling options to HA documentation

Full HA is not required in many cases. There are several options
for scaling GitLab without going full HA. This documents those
options so customers don't dive into HA when it's not really
required.
parent ecb6c46c
-# High Availability
+# Scaling and High Availability
 GitLab supports several different types of clustering and high-availability.
 The solution you choose will be based on the level of scalability and
@@ -13,7 +13,7 @@ of Git, developers can still commit code locally even when GitLab is not
 available. However, some GitLab features such as the issue tracker and
 Continuous Integration are not available when GitLab is down.
-**Keep in mind that all Highly Available solutions come with a trade-off between
+**Keep in mind that all highly-available solutions come with a trade-off between
 cost/complexity and uptime**. The more uptime you want, the more complex the
 solution. And the more complex the solution, the more work is involved in
 setting up and maintaining it. High availability is not free and every HA
@@ -31,8 +31,9 @@ engineers, and live questions coming in from some of our customers.
 ## GitLab Components
-The following components need to be considered for an HA environment. In many
-cases components can be combined on the same nodes to reduce complexity.
+The following components need to be considered for a scaled or highly-available
+environment. In many cases components can be combined on the same nodes to reduce
+complexity.
 - Unicorn/Workhorse - Web-requests (UI, API, Git over HTTP)
 - Sidekiq - Asynchronous/Background jobs
@@ -41,15 +42,64 @@ cases components can be combined on the same nodes to reduce complexity.
 - [PGBouncer](pgbouncer.md) - Database pool manager
 - [Redis](redis.md) - Key/Value store (User sessions, cache, queue for Sidekiq)
 - Sentinel - Redis health check/failover manager
+- Gitaly - Provides high-level RPC access to Git repositories
-## Architecture Examples
+## Scalable Architecture Examples
+When an organization reaches a certain threshold it will be necessary to scale
+the GitLab instance. Still, true high availability may not be necessary. There
+are options for scaling GitLab instances relatively easily without incurring the
+infrastructure and maintenance costs of full high availability.
+GitLab recommends that an organization begin to explore scaling when they have
+around 1,000 active users. At this point increasing CPU cores and memory is
+not recommended as there are some components that may not handle increased
+load well on a single host.
+### Basic Scaling
+This is the simplest form of scaling and will work for the majority of
+cases. Backend components such as PostgreSQL, Redis and storage are offloaded
+to their own nodes while the remaining GitLab components all run on 2 or more
+application nodes.
+This form of scaling also works well in a cloud environment when it is more
+cost-effective to deploy several small nodes rather than a single
+larger one.
+- 1 PostgreSQL node
+- 1 Redis node
+- 2 or more GitLab application nodes (Unicorn, Workhorse, Sidekiq)
+- 1 NFS/Gitaly storage server
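As a rough illustration, the Basic Scaling layout listed above can be written down as data, for example to sanity-check a planned inventory before provisioning. This is only a sketch: the role names and the `missing_nodes` helper are hypothetical and not part of GitLab. Only the minimum counts come from the list above.

```python
# Minimum node counts for Basic Scaling, taken from the list above.
# The role keys are illustrative names, not GitLab configuration keys.
BASIC_SCALING_MINIMUMS = {
    "postgresql": 1,
    "redis": 1,
    "application": 2,  # Unicorn, Workhorse, Sidekiq
    "storage": 1,      # NFS/Gitaly
}


def missing_nodes(inventory, minimums=BASIC_SCALING_MINIMUMS):
    """Return {role: shortfall} for every role below its minimum count."""
    shortfalls = {}
    for role, required in minimums.items():
        have = inventory.get(role, 0)
        if have < required:
            shortfalls[role] = required - have
    return shortfalls


if __name__ == "__main__":
    # A plan with a single application node falls one node short.
    planned = {"postgresql": 1, "redis": 1, "application": 1, "storage": 1}
    print(missing_nodes(planned))
```

An empty result means the plan meets every minimum in the list. It says nothing about whether the nodes are sized appropriately.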
+### Full Scaling
+For very large installations it may be necessary to further split components
+for maximum scalability. In a fully-scaled architecture the application node
+is split into separate Sidekiq and Unicorn/Workhorse nodes. One indication that
+this architecture is required is if Sidekiq queues begin to periodically increase
+in size, indicating that there is contention or not enough resources.
+- 1 PostgreSQL node
+- 1 Redis node
+- 2 or more GitLab application nodes (Unicorn, Workhorse)
+- 2 or more Sidekiq nodes
+- 2 or more NFS/Gitaly storage servers
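The Sidekiq indicator mentioned above (queues that keep increasing in size) could be checked with a simple heuristic such as the one below. This is a sketch under assumptions: the `queue_is_growing` function and the `min_rises` threshold are hypothetical, and the samples are assumed to be queue sizes collected at regular intervals by whatever monitoring is already in place.

```python
def queue_is_growing(samples, min_rises=3):
    """Return True if the queue size increased `min_rises` times in a row.

    `samples` is a chronological list of queue sizes. The default
    threshold is an arbitrary illustration, not a GitLab recommendation.
    """
    rises = 0
    for previous, current in zip(samples, samples[1:]):
        if current > previous:
            rises += 1
            if rises >= min_rises:
                return True
        else:
            # The queue drained or held steady, so restart the count.
            rises = 0
    return False


if __name__ == "__main__":
    print(queue_is_growing([10, 12, 15, 21]))     # sustained growth
    print(queue_is_growing([10, 12, 9, 11, 10]))  # queue keeps draining
```

A queue that spikes and then drains is usually healthy. Sustained growth across many samples is the pattern that suggests Sidekiq needs its own nodes.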
+## High Availability Architecture Examples
+When organizations require scaling *and* high availability the following
+architectures can be utilized. As the introduction section at the top of this
+page mentions, there is a trade-off between cost/complexity and uptime. Be sure
+this complexity is absolutely required before taking the step into full
+high availability.
 For all examples below, we recommend running Consul and Redis Sentinel on
 dedicated nodes. If Consul is running on PostgreSQL nodes or Sentinel on
 Redis nodes there is a potential that high resource usage by PostgreSQL or
 Redis could prevent communication between the other Consul and Sentinel nodes.
 This may lead to the other nodes believing a failure has occurred and automated
-failover is necessary. Isolating them away from the services they monitor reduces
+failover is necessary. Isolating them from the services they monitor reduces
 the chances of split-brain.
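The examples in this section use 3 Consul/Sentinel nodes. Assuming failover decisions need a strict majority of those nodes to agree, the arithmetic below shows why an odd count such as 3 is a common minimum. This is a simplified sketch: the helper names are hypothetical, and Sentinel's own `quorum` setting is configured separately from this majority rule.

```python
def quorum(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can be lost while a majority can still be formed."""
    return nodes - quorum(nodes)


if __name__ == "__main__":
    for count in (2, 3, 5):
        print(count, quorum(count), tolerated_failures(count))
```

With 2 nodes the majority is 2, so losing either node blocks agreement. With 3 nodes the majority is still 2, so one node can fail or become isolated without the remaining pair losing the ability to decide.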
 The examples below do not really address high availability of NFS. Some enterprises
@@ -77,7 +127,7 @@ the contention.
 - 2 Redis nodes
 - 3 Consul/Sentinel nodes
 - 2 or more GitLab application nodes (Unicorn, Workhorse, Sidekiq, PGBouncer)
-- 1 NFS server/appliance
+- 1 NFS/Gitaly server
 ![Horizontal architecture diagram](../img/high_availability/horizontal.png)
@@ -93,7 +143,7 @@ contention due to certain workloads.
 - 3 Consul/Sentinel nodes
 - 2 or more Sidekiq nodes
 - 2 or more Web nodes (Unicorn, Workhorse, PGBouncer)
-- 1 or more NFS servers/appliances
+- 1 or more NFS/Gitaly servers
 ![Hybrid architecture diagram](../img/high_availability/hybrid.png)
@@ -112,7 +162,7 @@ with the added complexity of many more nodes to configure, manage and monitor.
 - 2 or more Git nodes (Git over SSH/Git over HTTP)
 - 2 or more API nodes (All requests to `/api`)
 - 2 or more Web nodes (All other web requests)
-- 2 or more NFS servers/appliances
+- 2 or more NFS/Gitaly servers
 ![Fully Distributed architecture diagram](../img/high_availability/fully-distributed.png)