Large tables on GitLab.com are a major problem - for both operations and development. They cause a variety of problems:
1. **Query timings** and hence overall application performance suffers
1. **Table maintenance** becomes much more costly. Vacuum activity has become a significant concern on GitLab.com - with large tables only seeing infrequent (e.g. once per day) and vacuum runs taking many hours to complete. This has various negative consequences and a very large table has potential to impact seemingly unrelated parts of the database and hence overall application performance suffers.
1. **Table maintenance** becomes much more costly. Vacuum activity has become a significant concern on GitLab.com - with large tables only seeing infrequent (once per day) processing and vacuum runs taking many hours to complete. This has various negative consequences and a very large table has potential to impact seemingly unrelated parts of the database and hence overall application performance suffers.
1. **Data migrations** on large tables are significantly more complex to implement and incur development overhead. They have potential to cause stability problems on GitLab.com and take a long time to execute on large datasets.
1. **Indexes size** is significant. This directly impacts performance as smaller parts of the index are kept in memory and also makes the indexes harder to maintain (think repacking).
1. **Index creation times** go up significantly - in 2021, we see btree creation take up to 6 hours for a single btree index. This impacts our ability to deploy frequently and leads to vacuum-related problems (delayed cleanup).
...
@@ -141,7 +141,7 @@ There is no standard solution to reduce table sizes - there are many!
1. **Partitioning**: Apply a partitioning scheme if there is a common access dimension.
1. **Normalization**: Review relational modeling and apply normalization techniques to remove duplicate data
1. **Vertical table splits**: Review column usage and split table vertically.
1. **Externalize**: Move large data types out of the database entirely. For example, JSON documents, especially when not used for filtering, may be better stored outside the database, e.g. in object storage.
1. **Externalize**: Move large data types out of the database entirely. For example, JSON documents, especially when not used for filtering, may be better stored outside the database, for example, in object storage.
NOTE:
While we're targeting to limit physical table sizes, we consider retaining or improving performance a goal, too.
@@ -40,7 +40,7 @@ can't be terminated and its memory usage grows over time.
## Select a version to install
Make sure you view [this installation guide](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/install/installation.md) from the branch (version) of GitLab you would like to install (e.g., `11-7-stable`).
Make sure you view [this installation guide](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/install/installation.md) from the branch (version) of GitLab you would like to install (for example, `11-7-stable`).
You can select the branch in the version dropdown in the top left corner of GitLab (below the menu bar).
If the highest number stable branch is unclear, check the [GitLab blog](https://about.gitlab.com/blog/) for installation guide links by version.
@@ -139,7 +139,7 @@ If you're using Cloudflare, check
> - **Do not** use a CNAME record if you want to point your
`domain.com` to your GitLab Pages site. Use an `A` record instead.
> - **Do not** add any special chars after the default Pages
domain. E.g., don't point `subdomain.domain.com` to
domain. For example, don't point `subdomain.domain.com` to
or `namespace.gitlab.io/`. Some domain hosting providers may request a trailing dot (`namespace.gitlab.io.`), though.
> - GitLab Pages IP on GitLab.com [was changed](https://about.gitlab.com/releases/2017/03/06/we-are-changing-the-ip-of-gitlab-pages-on-gitlab-com/) in 2017.
> - GitLab Pages IP on GitLab.com [has changed](https://about.gitlab.com/blog/2018/07/19/gcp-move-update/#gitlab-pages-and-custom-domains)
...
@@ -315,6 +315,6 @@ important to describe those, too. Think of things that may go wrong and include
This is important to minimize requests for support, and to avoid doc comments with
questions that you know someone might ask.
Each scenario can be a third-level heading, e.g. `### Getting error message X`.
Each scenario can be a third-level heading, for example, `### Getting error message X`.
If you have none to add when creating a doc, leave this section in place
but commented out to help encourage others to add to it in the future. -->