@@ -101,12 +101,12 @@ From the perspective of a user performing Git operations:
- The **primary** site behaves as a full read-write GitLab instance.
- **Secondary** sites are read-only but proxy Git push operations to the **primary** site. This makes **secondary** sites appear to support push operations themselves.
To simplify the diagram, some necessary components are omitted.
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over HTTPS requires [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).
A **secondary** site needs two different PostgreSQL databases:
- A read-only database instance that streams data from the main GitLab database.
- [Another database instance](#geo-tracking-database) used internally by the **secondary** site to record what data has been replicated.
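For orientation, here is a minimal sketch of how a **secondary** site's two databases might be declared in `/etc/gitlab/gitlab.rb` on an Omnibus installation. The IP address is an example and the snippet is not a complete Geo configuration:

```ruby
# Sketch only: a Geo secondary on Omnibus (example IP, incomplete config).

# The secondary role provisions both database instances.
roles(['geo_secondary_role'])

# Read-only instance that streams from the primary's main database.
postgresql['listen_address'] = '10.0.0.2'

# Geo tracking database, used to record what data has been replicated.
geo_postgresql['listen_address'] = '10.0.0.2'
```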
@@ -204,7 +204,7 @@ successfully, you must replicate their data using some other means.
|[Server-side Git hooks](../../server_hooks.md) | [No](https://gitlab.com/groups/gitlab-org/-/epics/1867) | No | No | |
|[Elasticsearch integration](../../../integration/elasticsearch.md) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/1186) | No | No | |
|[GitLab Pages](../../pages/index.md) | [No](https://gitlab.com/groups/gitlab-org/-/epics/589) | No | Via Object Storage provider if supported. **No** native Geo support (Beta). | |
|[Dependency proxy images](../../../user/packages/dependency_proxy/index.md) | [No](https://gitlab.com/gitlab-org/gitlab/-/issues/259694) | No | No | Blocked on [Geo: Secondary Mimicry](https://gitlab.com/groups/gitlab-org/-/epics/1528). Replication of this cache is not needed for Disaster Recovery purposes because it can be recreated from external sources. |
|[Vulnerability Export](../../../user/application_security/vulnerability_report/#export-vulnerability-details) | [Not planned](https://gitlab.com/groups/gitlab-org/-/epics/3111) | No | | Not planned because they are ephemeral and sensitive. They can be regenerated on demand. |
#### Limitation of verification for files in Object Storage
@@ -242,7 +242,7 @@ the tracking database on port 5432.
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
1. The reconfigure should automatically create the database. If needed, you can perform this task manually. This task (whether run by itself or during reconfigure) requires the database user to be a superuser.
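As a hedged illustration of these steps, the tracking-database connection settings in `/etc/gitlab/gitlab.rb` might look like the following before you reconfigure; the host value is an example:

```ruby
# Sketch only: point the secondary's Rails nodes at the tracking
# database listening on port 5432 (host is an example value).
geo_secondary['db_host'] = '10.0.0.3'
geo_secondary['db_port'] = 5432
```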
@@ -346,7 +346,7 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`.
The Internal Load Balancer is used to balance any internal connections the GitLab environment requires,
such as connections to [PgBouncer](#configure-pgbouncer) and [Praefect](#configure-praefect) (Gitaly Cluster).
It's a separate node from the External Load Balancer and shouldn't have any access externally.
The following IP will be used as an example:
...
...
@@ -2086,7 +2086,7 @@ but with smaller performance requirements, several modifications can be consider
- Combining select nodes: Some nodes can be combined to reduce complexity at the cost of some performance:
- GitLab Rails and Sidekiq: Sidekiq nodes can be removed and the component instead enabled on the GitLab Rails nodes.
- PostgreSQL and PgBouncer: PgBouncer nodes can be removed and the component instead enabled on PostgreSQL with the Internal Load Balancer pointing to them instead.
- Reducing the node counts: Some node types do not need consensus and can run with fewer nodes (but more than one for redundancy). This will also lead to reduced performance.
- GitLab Rails and Sidekiq: Stateless services don't have a minimum node count. Two are enough for redundancy.
- Gitaly and Praefect: A quorum is not strictly necessary. Two Gitaly nodes and two Praefect nodes are enough for redundancy.
- Running select components in reputable Cloud PaaS solutions: Select components of the GitLab setup can instead be run on Cloud Provider PaaS solutions. By doing this, additional dependent components can also be removed:
This document is a proposal to work towards reducing and limiting table sizes on GitLab.com. We establish a **measurable target** by limiting table size to a certain threshold. This will be used as an indicator to drive database focus and decision making. With GitLab.com growing, we continuously re-evaluate which tables need to be worked on to prevent or otherwise fix violations.
This is not meant to be a hard rule but rather a strong indication that work needs to be done to break a table apart or otherwise reduce its size.
This is meant to be read in context with the [Database Sharding blueprint](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64115),
which paints the bigger picture. This proposal is intended to be part of the "debloating step" below, as we aim to reduce storage requirements and improve data modeling. Partitioning is part of the standard tool belt: where possible, we can already use it to cut physical table sizes significantly. Both will help to prepare efforts like decomposition (database usage is already optimized) and sharding (database is already partitioned along an identified data access dimension).
...
...
@@ -124,7 +124,7 @@ In order to maintain and improve operational stability and lessen development burden
1. Indexes are smaller, can be maintained more efficiently and fit better into memory
1. Data migrations are easier to reason about, take less time to implement and execute
This target is *pragmatic*: We understand table sizes depend on feature usage, code changes and other factors - which all change over time. We may not always find solutions where we can tightly limit the size of physical tables once and for all. That is acceptable, though, and we primarily aim to keep the situation on GitLab.com under control. We adapt our efforts to the situation present on GitLab.com and will re-evaluate frequently.
While there are changes we can make that lead to a constant maximum physical table size over time, this doesn't necessarily need to be the case. Consider for example hash partitioning, which breaks a table down into a static number of partitions. With data growth over time, individual partitions will also grow in size and may eventually reach the threshold size again. We strive for constant table sizes, but it is acceptable to ship easier solutions that don't have this characteristic but improve the situation for a considerable amount of time.
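To make the hash partitioning example concrete, here is a small illustrative sketch (not GitLab code): with a static partition count, each partition's share of the data grows in step with the table, which is why a partition can eventually cross the threshold again.

```ruby
# Illustrative only: hash partitioning with a fixed partition count.
NUM_PARTITIONS = 64

# Stable mapping of a row id to one of the partitions.
def partition_for(id)
  id % NUM_PARTITIONS
end

# As the table grows, so does every partition: 6.4 billion rows still
# mean ~100 million rows per partition on average.
total_rows = 6_400_000_000
puts total_rows / NUM_PARTITIONS
```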
@@ -64,7 +64,7 @@ We already use Database Lab from [postgres.ai](https://postgres.ai/), which is a
Internally, this is based on ZFS and implements a "thin-cloning technology": ZFS snapshots are used to clone the data and expose a full read/write PostgreSQL cluster based on the cloned data. This is called a *thin clone*. It is rather short-lived and is destroyed shortly after we are finished using it.
A thin clone is fully read/write. This allows us to execute migrations on top of it.
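As a sketch of what this enables (the connection details below are assumptions, not actual Database Lab endpoints), a migration-style DDL statement can be executed directly against a clone without touching production data:

```ruby
# Sketch only: a thin clone behaves like a normal read/write
# PostgreSQL database, so DDL statements succeed against it.
require 'active_record'

ActiveRecord::Base.establish_connection(
  adapter: 'postgresql',
  host: 'database-lab.example.com', # thin-clone endpoint (assumed)
  port: 6000,                       # clone-specific port (assumed)
  database: 'gitlabhq_production',
  username: 'gitlab',
  password: ENV['THIN_CLONE_PASSWORD']
)

# The kind of statement a migration would issue.
ActiveRecord::Base.connection.execute(
  'ALTER TABLE projects ADD COLUMN IF NOT EXISTS demo_flag boolean'
)
```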
Database Lab provides an API we can interact with to manage thin clones. To automate the migration and query testing, we add steps to the `gitlab-org/gitlab` CI pipeline. This triggers automation that performs the following steps for a given merge request:
You can [disable comments](#disable-comments-on-jira-issues) on issues.
### Require associated Jira issue for merge requests to be merged **(ULTIMATE)**
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/280766) in [GitLab Ultimate](https://about.gitlab.com/pricing/) 13.12 behind a feature flag, disabled by default.
> - [Deployed behind a feature flag](../../user/feature_flags.md), disabled by default.
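For self-managed instances where the flag is still disabled, an administrator can toggle it from the Rails console with `Feature.enable`; the flag name below is our assumption based on the linked issue, so verify it before use:

```ruby
# Run in the GitLab Rails console (`sudo gitlab-rails console`).
# NOTE: the flag name is an assumption; verify against the linked issue.
Feature.enable(:jira_issue_association_on_merge_request)
```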