Commit 046a35cc authored by Russell Dickenson

Merge branch 'eread/refactor-repository-storage-types-docs' into 'master'

Refactor part of repository storage types documentation

See merge request gitlab-org/gitlab!54655
parents e1d5c208 ef423e92
...@@ -428,7 +428,7 @@ To solve this: ...@@ -428,7 +428,7 @@ To solve this:
1. Log into the secondary Geo node. 1. Log into the secondary Geo node.
1. Back up [the `.git` folder](../../repository_storage_types.md#translating-hashed-storage-paths). 1. Back up [the `.git` folder](../../repository_storage_types.md#translate-hashed-storage-paths).
1. Optional: [Spot-check](../../troubleshooting/log_parsing.md#find-all-projects-affected-by-a-fatal-git-problem) 1. Optional: [Spot-check](../../troubleshooting/log_parsing.md#find-all-projects-affected-by-a-fatal-git-problem)
a few of those IDs whether they indeed correspond a few of those IDs whether they indeed correspond
......
...@@ -13,7 +13,9 @@ GitLab stores [repositories](../user/project/repository/index.md) on repository ...@@ -13,7 +13,9 @@ GitLab stores [repositories](../user/project/repository/index.md) on repository
storage is either: storage is either:
- A `gitaly_address`, which points to a [Gitaly node](gitaly/index.md). - A `gitaly_address`, which points to a [Gitaly node](gitaly/index.md).
- A `path`, which points directly to a directory where the repository is stored. - A `path`, which points directly to a directory where the repositories are stored. This method is
deprecated and [scheduled to be removed](https://gitlab.com/gitlab-org/gitaly/-/issues/1690) in
GitLab 14.0.
GitLab allows you to define multiple repository storages to distribute the storage load between GitLab allows you to define multiple repository storages to distribute the storage load between
several mount points. For example: several mount points. For example:
......
...@@ -7,51 +7,53 @@ type: reference, howto ...@@ -7,51 +7,53 @@ type: reference, howto
# Repository storage types **(FREE SELF)** # Repository storage types **(FREE SELF)**
> - [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28283) in GitLab 10.0. GitLab can be configured to use one or multiple repository storages. These storages can be:
> - Hashed storage became the default for new installations in GitLab 12.0
> - Hashed storage is enabled by default for new and renamed projects in GitLab 13.0.
GitLab can be configured to use one or multiple repository storage paths/shard - Accessed via [Gitaly](gitaly/index.md), optionally on
locations that can be: [its own server](gitaly/index.md#run-gitaly-on-its-own-server).
- Mounted to the local disk. This [method](repository_storage_paths.md#configure-repository-storage-paths)
is deprecated and [scheduled to be removed](https://gitlab.com/groups/gitlab-org/-/epics/2320) in
GitLab 14.0.
- Exposed as an NFS shared volume. This method is deprecated and
[scheduled to be removed](https://gitlab.com/groups/gitlab-org/-/epics/3371) in GitLab 14.0.
- Mounted to the local disk In GitLab:
- Exposed as an NFS shared volume
- Accessed via [Gitaly](gitaly/index.md) on its own machine.
In GitLab, this is configured in `/etc/gitlab/gitlab.rb` by the `git_data_dirs({})` - Repository storages are configured in:
configuration hash. The storage layouts discussed here apply to any shard - `/etc/gitlab/gitlab.rb` by the `git_data_dirs({})` configuration hash for Omnibus GitLab
defined in it. installations.
- `gitlab.yml` by the `repositories.storages` key for installations from source.
- The `default` repository storage is available in any installation that hasn't customized it. By
default, it points to a Gitaly node.
The `default` repository shard that is available in any installations The repository storage types documented here apply to any repository storage defined in
that haven't customized it, points to the local folder: `/var/opt/gitlab/git-data`. `git_data_dirs({})` or `repositories.storages`.
Anything discussed below is expected to be part of that folder.
## Hashed storage ## Hashed storage
NOTE: > - [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/28283) in GitLab 10.0.
In GitLab 13.0, hashed storage is enabled by default and the legacy storage is > - Made the default for new installations in GitLab 12.0.
deprecated. Support for legacy storage is scheduled to be removed in GitLab 14.0. > - Enabled by default for new and renamed projects in GitLab 13.0.
If you haven't migrated yet, check the
[migration instructions](raketasks/storage.md#migrate-to-hashed-storage). Hashed storage stores projects on disk in a location based on a hash of the project's ID. Hashed
The option to choose between hashed and legacy storage in the admin area has storage is different to [legacy storage](#legacy-storage) where a project is stored based on:
been disabled.
- The project's URL.
- The folder structure where the repository is stored on disk.
This makes the folder structure immutable and eliminates the need to synchronize state from URLs to
disk structure. This means that renaming a group, user, or project:
Hashed storage is the storage behavior we rolled out with 10.0. Instead - Costs only the database transaction.
of coupling project URL and the folder structure where the repository is - Takes effect immediately.
stored on disk, we couple a hash based on the project's ID. This makes
the folder structure immutable, and therefore eliminates any requirement to
synchronize state from URLs to disk structure. This means that renaming a group,
user, or project costs only the database transaction, and takes effect
immediately.
The hash also helps spread the repositories more evenly on the disk. The The hash also helps spread the repositories more evenly on the disk. The top-level directory
top-level directory contains fewer folders than the total number of top-level contains fewer folders than the total number of top-level namespaces.
namespaces.
The hash format is based on the hexadecimal representation of SHA256: The hash format is based on the hexadecimal representation of a SHA256 hash, calculated with
`SHA256(project.id)`. The top-level folder uses the first 2 characters, followed `SHA256(project.id)`. The top-level folder uses the first two characters, followed by another folder
by another folder with the next 2 characters. They are both stored in a special with the next two characters. They are both stored in a special `@hashed` folder so they can
`@hashed` folder, to be able to co-exist with existing Legacy Storage projects: co-exist with existing legacy storage projects. For example:
```ruby ```ruby
# Project's repository: # Project's repository:
...@@ -61,53 +63,59 @@ by another folder with the next 2 characters. They are both stored in a special ...@@ -61,53 +63,59 @@ by another folder with the next 2 characters. They are both stored in a special
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git" "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
``` ```
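The path layout above can be reproduced outside GitLab as a rough check. The following is a minimal sketch, not GitLab's own implementation: the helper names are hypothetical, and it assumes the project ID is hashed as its decimal string.

```ruby
require 'digest'

# Hypothetical helper (not part of GitLab): builds the hashed storage base
# path for a numeric project ID, following the SHA256(project.id) description.
# Assumption: the ID is hashed as its decimal string representation.
def hashed_disk_path(project_id)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  # The first two characters, then the next two, then the full hash.
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}"
end

# Path of the project's repository, relative to the repository storage root.
def hashed_repository_path(project_id)
  "#{hashed_disk_path(project_id)}.git"
end

# Path of the project's wiki repository, relative to the same root.
def hashed_wiki_path(project_id)
  "#{hashed_disk_path(project_id)}.wiki.git"
end

puts hashed_repository_path(16)
puts hashed_wiki_path(16)
```

On a real instance, prefer `Project.find(<id>).disk_path` in a Rails console, because that returns the value GitLab actually stores.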
### Translating hashed storage paths ### Translate hashed storage paths
Troubleshooting problems with the Git repositories, adding hooks, and other Troubleshooting problems with the Git repositories, adding hooks, and other tasks require you
tasks require you to translate between the human readable project name to translate between the human-readable project name and the hashed storage path. You can translate:
and the hashed storage path.
- From a [project's name to its hashed path](#from-project-name-to-hashed-path).
- From a [hashed path to a project's name](#from-hashed-path-to-project-name).
#### From project name to hashed path #### From project name to hashed path
The hashed path is shown on the project's page in the [admin area](../user/admin_area/index.md#administering-projects). Administrators can look up a project's hashed path from its name or ID using:
- The [Admin area](../user/admin_area/index.md#administering-projects).
- A Rails console.
To access the Projects page, go to **Admin Area > Overview > Projects** and then To look up a project's hashed path in the Admin Area:
open up the page for the project.
The "Gitaly relative path" is shown there, for example: 1. Go to the **Admin Area** (**{admin}**).
1. Go to **Overview > Projects** and select the project.
The **Gitaly relative path** is displayed there and looks similar to:
```plaintext ```plaintext
"@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git" "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
``` ```
This is the path under `/var/opt/gitlab/git-data/repositories/` on a To look up a project's hashed path using a Rails console:
default Omnibus installation.
In a [Rails console](operations/rails_console.md#starting-a-rails-console-session), 1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
get this information using either the numeric project ID or the full path: 1. Run a command similar to this example (use either the project's ID or its name):
```ruby ```ruby
Project.find(16).disk_path Project.find(16).disk_path
Project.find_by_full_path('group/project').disk_path Project.find_by_full_path('group/project').disk_path
``` ```
#### From hashed path to project name #### From hashed path to project name
To translate from a hashed storage path to a project name: Administrators can look up a project's name from its hashed storage path using a Rails console. To
look up a project's name from its hashed storage path:
1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session). 1. Start a [Rails console](operations/rails_console.md#starting-a-rails-console-session).
1. Run the following: 1. Run a command similar to this example:
```ruby ```ruby
ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
``` ```
The quoted string in that command is the directory tree you can find on your The quoted string in that command is the directory tree you can find on your GitLab server. For
GitLab server. For example, on a default Omnibus installation this would be example, on a default Omnibus installation this would be `/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`
`/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`
with `.git` from the end of the directory name removed. with `.git` from the end of the directory name removed.
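The path handling described above can be sketched in plain Ruby. This is an illustrative helper only: `disk_path_from_directory` and `DEFAULT_REPOSITORIES_ROOT` are hypothetical names, and the root shown assumes a default Omnibus installation.

```ruby
# Assumed repository root for a default Omnibus installation.
DEFAULT_REPOSITORIES_ROOT = '/var/opt/gitlab/git-data/repositories'

# Hypothetical helper (not part of GitLab): turns a repository directory found
# on disk into the string to pass as `disk_path:` in the Rails console lookup.
def disk_path_from_directory(directory, root: DEFAULT_REPOSITORIES_ROOT)
  # Drop the storage root so that only the relative path remains.
  relative = directory.delete_prefix("#{root.chomp('/')}/")
  # Drop the trailing `.git` from the directory name.
  relative.delete_suffix('.git')
end

dir = '/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/' \
      'b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git'
puts disk_path_from_directory(dir)
```

The printed value is the string to use in `ProjectRepository.find_by(disk_path: ...)`.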
The output includes the project ID and the project name: The output includes the project ID and the project name. For example:
```plaintext ```plaintext
=> #<Project id:16 it/supportteam/ticketsystem> => #<Project id:16 it/supportteam/ticketsystem>
......
...@@ -54,7 +54,7 @@ Follow the steps below to set up a server-side hook for a repository: ...@@ -54,7 +54,7 @@ Follow the steps below to set up a server-side hook for a repository:
1. Navigate to **Admin area > Projects** and click on the project you want to add a server hook to. 1. Navigate to **Admin area > Projects** and click on the project you want to add a server hook to.
1. Locate the **Gitaly relative path** on the page that appears. This is where the server hook 1. Locate the **Gitaly relative path** on the page that appears. This is where the server hook
must be implemented. For information on interpreting the relative path, see must be implemented. For information on interpreting the relative path, see
[Translating hashed storage paths](repository_storage_types.md#translating-hashed-storage-paths). [Translate hashed storage paths](repository_storage_types.md#translate-hashed-storage-paths).
1. On the file system, create a new directory in this location called `custom_hooks`. 1. On the file system, create a new directory in this location called `custom_hooks`.
1. Inside the new `custom_hooks` directory, create a file with a name matching the hook type. For 1. Inside the new `custom_hooks` directory, create a file with a name matching the hook type. For
example, for a pre-receive hook the filename should be `pre-receive` with no extension. example, for a pre-receive hook the filename should be `pre-receive` with no extension.
...@@ -128,7 +128,7 @@ Any other names are ignored. ...@@ -128,7 +128,7 @@ Any other names are ignored.
Files in `.d` directories must be executable and not match the backup file pattern (`*~`). Files in `.d` directories must be executable and not match the backup file pattern (`*~`).
For `<project>.git` you need to [translate](repository_storage_types.md#translating-hashed-storage-paths) For `<project>.git` you need to [translate](repository_storage_types.md#translate-hashed-storage-paths)
your project name into the hashed storage format that GitLab uses. your project name into the hashed storage format that GitLab uses.
## Environment Variables ## Environment Variables
......