Document that gitlab-org/gitlab no longer uses CI_PRE_CLONE_SCRIPT

This updates the developer documentation to reflect the fact that we stopped using CI_PRE_CLONE_SCRIPT to make the gitlab-org/gitlab CI Git fetch workload manageable.

Commit 3c9766d6 authored Nov 03, 2021 by Jacob Vosmaer

Showing with 30 additions and 16 deletions

doc/ci/large_repositories/index.md +12 -9

doc/development/pipelines.md +18 -7
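The updated `doc/development/pipelines.md` partly credits shallow clones for keeping the CI fetch load manageable. The following is a minimal local sketch of what a shallow clone skips, assuming only a `git` binary and POSIX `sh`. The repository, paths, and committer identity are hypothetical, and this is not how GitLab Runner performs its fetch. It only compares a full clone with a `--depth 1` clone over a `file://` URL.

```shell
#!/bin/sh
# Hypothetical demo (not GitLab Runner code): compare a full clone with a
# shallow (--depth 1) clone of a throwaway local repository.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway "origin" repository with a local identity for committing.
git init -q "$work/origin"
git -C "$work/origin" config user.email "demo@example.com"
git -C "$work/origin" config user.name "Demo"

# Create 10 commits so there is history for the shallow clone to skip.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "change $i" > "$work/origin/file.txt"
  git -C "$work/origin" add file.txt
  git -C "$work/origin" commit -q -m "commit $i"
done

# A file:// URL is required: Git ignores --depth for plain local paths.
git clone -q "file://$work/origin" "$work/full"
git clone -q --depth 1 "file://$work/origin" "$work/shallow"

# Count the commits reachable from HEAD in each clone.
full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone commits:    $full_count"
echo "shallow clone commits: $shallow_count"
```

Running it should report 10 commits for the full clone and 1 for the shallow clone. In a CI job, the equivalent saving is that each job downloads only the tip of the pipeline ref rather than the full history.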

--- a/doc/ci/large_repositories/index.md
+++ b/doc/ci/large_repositories/index.md
@@ -250,12 +250,15 @@ concurrent = 4
 This makes the cloning configuration to be part of the given runner
 and does not require us to update each `.gitlab-ci.yml`.
-## Pre-clone step
-> [An issue exists](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/463) to remove the need for this optimization.
-For very active repositories with a large number of references and files, you can also
-optimize your CI jobs by seeding repository data with GitLab Runner's [`pre_clone_script`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section).
-See [our development documentation](../../development/pipelines.md#pre-clone-step) for
-an overview of how we implemented this approach on GitLab.com for the main GitLab repository.
+## Git fetch caching or pre-clone step
+For very active repositories with a large number of references and files, you can either (or both):
+- Consider using the [Gitaly pack-objects cache](../../administration/gitaly/configure_gitaly.md#pack-objects-cache) instead of a
+  pre-clone step. This is easier to set up and it benefits all repositories on your GitLab server, unlike the pre-clone step that
+  must be configured per-repository. The pack-objects cache also automatically works for forks. For `gitlab-org/gitlab` development
+  on GitLab.com, we stopped using a pre-clone step.
+- Optimize your CI/CD jobs by seeding repository data in a pre-clone step with the
+  [`pre_clone_script`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section) of GitLab Runner. See our
+  [development documentation](../../development/pipelines.md#pre-clone-step) for an overview of how we used to implement this approach on
+  GitLab.com for the main GitLab repository.
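The pre-clone option described in this file seeds repository data before the regular Git fetch runs. The following sketch illustrates that idea locally, assuming only `git`, `tar`, and POSIX `sh`. The archive, directory names, and the "periodic job" are hypothetical stand-ins, and this is not the `CI_PRE_CLONE_SCRIPT` that GitLab.com used.

```shell
#!/bin/sh
# Hypothetical sketch of the pre-clone idea (not GitLab.com's script):
# seed a job directory from an archive, then fetch only what changed.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway "origin" repository standing in for the real project.
git init -q "$work/origin"
git -C "$work/origin" config user.email "demo@example.com"
git -C "$work/origin" config user.name "Demo"
echo one > "$work/origin/file.txt"
git -C "$work/origin" add file.txt
git -C "$work/origin" commit -q -m "first"

# Hypothetical periodic job: clone once and archive the result (.git included).
git clone -q "file://$work/origin" "$work/seed"
tar -czf "$work/seed.tar.gz" -C "$work/seed" .

# Meanwhile, the origin gets a new commit.
echo two > "$work/origin/file.txt"
git -C "$work/origin" commit -q -am "second"

# Hypothetical pre-clone step: extract the archive instead of cloning.
mkdir -p "$work/project"
tar -xzf "$work/seed.tar.gz" -C "$work/project"

# The regular fetch only has to transfer objects added after the archive.
git -C "$work/project" fetch -q origin
fetched_tip=$(git -C "$work/project" rev-parse origin/HEAD)
origin_tip=$(git -C "$work/origin" rev-parse HEAD)
echo "fetched tip: $fetched_tip"
echo "origin tip:  $origin_tip"
```

The two printed commit IDs should match. The trade-off the updated text describes is visible here: the archive has to be produced and kept fresh per repository, whereas the pack-objects cache works on the server side without any per-repository setup.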
--- a/doc/development/pipelines.md
+++ b/doc/development/pipelines.md
@@ -791,19 +791,30 @@ request, be sure to start the `dont-interrupt-me` job before pushing.
 We limit the artifacts that are saved and retrieved by jobs to the minimum in order to reduce the upload/download time and costs, as well as the artifacts storage.
-### Pre-clone step
-The `gitlab-org/gitlab` project on GitLab.com uses a [pre-clone step](https://gitlab.com/gitlab-org/gitlab/-/issues/39134)
-to seed the project with a recent archive of the repository. This is done for
-several reasons:
-- It speeds up builds because a 800 MB download only takes seconds, as opposed to a full Git clone.
-- It significantly reduces load on the file server, as smaller deltas mean less time spent in `git pack-objects`.
+### Git fetch caching
+Because GitLab.com uses the [pack-objects cache](../administration/gitaly/configure_gitaly.md#pack-objects-cache),
+concurrent Git fetches of the same pipeline ref are deduplicated on
+the Gitaly server (always) and served from cache (when available).
+This works well for the following reasons:
+- The pack-objects cache is enabled on all Gitaly servers on GitLab.com.
+- The CI/CD [Git strategy setting](../ci/pipelines/settings.md#choose-the-default-git-strategy) for `gitlab-org/gitlab` is **Git clone**,
+  causing all jobs to fetch the same data, which maximizes the cache hit ratio.
+- We use [shallow clone](../ci/pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone) to avoid downloading the full Git
+  history for every job.
+### Pre-clone step
+NOTE:
+We no longer use this optimization for `gitlab-org/gitlab` because the [pack-objects cache](../administration/gitaly/configure_gitaly.md#pack-objects-cache)
+allows Gitaly to serve the full CI/CD fetch traffic now. See [Git fetch caching](#git-fetch-caching).
 The pre-clone step works by using the `CI_PRE_CLONE_SCRIPT` variable
 [defined by GitLab.com shared runners](../ci/runners/build_cloud/linux_build_cloud.md#pre-clone-script).
-The `CI_PRE_CLONE_SCRIPT` is currently defined as a project CI/CD variable:
+The `CI_PRE_CLONE_SCRIPT` is defined as a project CI/CD variable:
 ```shell
 (