Commit 3314933e authored by Grzegorz Bizon

Refine the proposal paragraph in CI scale blueprint

parent 55fd14f1
@@ -25,12 +25,12 @@ builds is growing exponentially. We will run out of the available primary keys
 for builds before December 2021 unless we improve the database model used to
 store CI/CD data.

-We expect to see 20M builds created daily on Gitlab.com in the first half of
+We expect to see 20M builds created daily on GitLab.com in the first half of
 2024.

 ![ci_builds cumulative with forecast](ci_builds_cumulative_forecast.png)

-## Goal
+## Goals

 **Enable future growth by making processing 20M builds in a day possible.**
@@ -55,7 +55,8 @@ We will run out of the capacity of the integer type to store primary keys in
 workaround or an emergency plan, GitLab.com will go down.

 `ci_builds` is just one of the tables that are running out of the primary keys
-available for Int4 sequence.
+available in Int4 sequence. There are multiple other tables storing CI/CD data
+that have the same problem.

 Primary keys problem will be tackled by our Database Team.
@@ -78,37 +79,39 @@ on.
 The size of the table also hinders development velocity because queries that
 seem fine in the development environment may not work on GitLab.com. The
 difference in the dataset size between the environments makes it difficult to
-predict the performance of event the most simple queries.
+predict the performance of even the most simple queries.

 We also expect a significant, exponential growth in the upcoming years.

 One of the forecasts done using [Facebook's
 Prophet](https://facebook.github.io/prophet/) shows that in the first half of
-2024 we expect seeing 20M builds created on Gitlab.com each day. In comparison
+2024 we expect seeing 20M builds created on GitLab.com each day. In comparison
 to around 2M we see created today, this is 10x growth our product might need to
 sustain in upcoming years.

 ![ci_builds daily forecast](ci_builds_daily_forecast.png)

-### Queuing mechanisms using the large table
+### Queuing mechanisms are using the large table

 Because of how large the table is, mechanisms that we use to build queues of
-pending builds, are not very efficient. Pending builds represent a small
-fraction of what we store in the `ci_builds` table, yet we need to find them in
-this big dataset to determine an order in which we want to process them.
+pending builds (there is more than one queue), are not very efficient. Pending
+builds represent a small fraction of what we store in the `ci_builds` table,
+yet we need to find them in this big dataset to determine an order in which we
+want to process them.

 This mechanism is very inefficient, and it has been causing problems on the
 production environment frequently. This usually results in a significant drop
-of the CI/CD processing apdex score, and sometimes even causes a
-production-environment-wide performance degradation.
+of the CI/CD apdex score, and sometimes even causes a significant performance
+degradation in the production environment.

 There are multiple other strategies that can improve performance and
 reliability. We can use [Redis
 queuing](https://gitlab.com/gitlab-org/gitlab/-/issues/322972), or [a separate
 table that will accelerate SQL queries used to build
-queues](https://gitlab.com/gitlab-org/gitlab/-/issues/322766).
+queues](https://gitlab.com/gitlab-org/gitlab/-/issues/322766) and we want to
+explore them.

-### Moving this amount of data is challenging
+### Moving big amounts of data is challenging

 We store a significant amount of data in `ci_builds` table. Some of the columns
 in that table store a serialized user-provided data. Column `ci_builds.options`
@@ -136,30 +139,32 @@ environment.

 ## Proposal

-Making GitLab CI/CD product more suitable for the scale we expect to see in the
+Making GitLab CI/CD product ready for the scale we expect to see in the
 upcoming years is a multi-phase effort.

-First, we want to focus on extending metrics we already have, to get a better
-sense of how the system performs and what is the growth trajectory. This will
-make it easier for us to identify bottlenecks and perform more advanced
-capacity planning.
+First, we want to focus on things that are urgently needed right now. We need
+to fix primary keys overflow risk and unblock other teams that are working on
+database partitioning and sharding.

-We want to also improve situation around bottlenecks that are known already,
-like queuing mechanisms using the large table.
+We want to improve situation around bottlenecks that are known already, like
+queuing mechanisms using the large table and things that are holding other
+teams back.

-Migrating primary keys to Int8 is something that needs to happen in parallel,
-because although we might be able to resolve this problem for `ci_builds` using
-partitioning or sharding, there are other CI/CD tables that need to be updated.
+Extending CI/CD metrics is important to get a better sense of how the system
+performs and to what growth should we expect. This will make it easier for us
+to identify bottlenecks and perform more advanced capacity planning.

-As we work on queuing and metrics we expect our Database Sharding team and
+As we work on first iterations we expect our Database Sharding team and
 Database Scalability Working Group to make progress on patterns we will be able
-to use to partition the large CI/CD dataset. We consider the strong time-decay,
-related to he diminishing importance of pipelines with time, as an opportunity.
+to use to partition the large CI/CD dataset. We consider the strong time-decay
+effect, related to the diminishing importance of pipelines with time, as an
+opportunity we might want to seize.

 ## Iterations

 Work required to achieve our next CI/CD scaling target is tracked in the
-following epic: https://gitlab.com/groups/gitlab-org/-/epics/5745.
+[GitLab CI/CD 20M builds per day scaling
+target](https://gitlab.com/groups/gitlab-org/-/epics/5745) epic.

 ## Status