Commit f512dbe6 authored by Achilleas Pipinellis

Clean-up git-annex documentation

- Rearrange sections
- Limit lines to 80 chars
- Fix typos, grammar, etc.

[ci skip]
parent fd7eb717
# Git annex
The biggest limitation of git compared to some older centralized version control systems has been the maximum size of the repositories.
The general recommendation is to not have git repositories larger than 1GB to preserve performance.
Although GitLab has no limit (some repositories in GitLab are over 50GB!), we subscribe to the advice to keep repositories as small as you can.
The biggest limitation of Git, compared to some older centralized version
control systems, has been the maximum size of the repositories.
The general recommendation is to not have Git repositories larger than 1GB to
preserve performance. Although GitLab has no limit (some repositories in GitLab
are over 50GB!), we subscribe to the advice to keep repositories as small as
you can.
Not being able to version control large binaries is a big problem for many
larger organizations.
Videos, photos, audio, compiled binaries and many other types of files are too
large. As a workaround, people keep artwork-in-progress in a Dropbox folder and
only check in the final result. This results in using outdated files, not
having a complete history and the risk of losing work.
This problem is solved in GitLab Enterprise Edition by integrating the
[git-annex] application.
`git-annex` allows managing large binaries with Git without checking the
contents into Git.
You check in only a symlink that contains the SHA-1 of the large binary. If you
need the large binary, you can sync it from the GitLab server over `rsync`, a
very fast file copying tool.
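As a rough illustration of the symlink mechanism described above, the following
sketch checks whether a path is an annexed file. It assumes the file is in
`git-annex`'s symlink form, where the link target points into
`.git/annex/objects/`. The function name `is_annexed` is hypothetical and is not
part of `git-annex` itself.

```bash
# Hypothetical helper (not part of git-annex): succeeds if "$1" is a symlink
# whose target points into .git/annex/objects/, where annexed content lives.
is_annexed() {
  local target
  [ -L "$1" ] || return 1              # regular files are not annexed symlinks
  target=$(readlink "$1") || return 1  # read the link target without following it
  case "$target" in
    *.git/annex/objects/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_annexed debian.iso && readlink debian.iso` would print the
link target, which encodes the checksum-based key of the file.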
Not being able to version control large binaries is a big problem for many larger organizations.
Videos, photos, audio, compiled binaries and many other types of files are too large.
As a workaround, people keep artwork-in-progress in a Dropbox folder and only check in the final result.
This results in using outdated files, not having a complete history and the risk of losing work.
## GitLab git-annex Configuration
This problem is solved by integrating the awesome [git-annex](https://git-annex.branchable.com/).
Git-annex allows managing large binaries with git, without checking the contents into git.
You check in only a symlink that contains the SHA-1 of the large binary.
If you need the large binary you can sync it from the GitLab server over rsync, a very fast file copying tool.
`git-annex` is disabled by default in GitLab. Below you will find the
configuration options required to enable it.
<!-- more -->
### Requirements
## Using GitLab git-annex
`git-annex` needs to be installed both on the server and the client side.
For example, if you want to upload a very large file and check it into your Git repository:
For Debian-like systems (e.g., Debian, Ubuntu) this can be achieved by running:
```bash
git clone git@gitlab.example.com:group/project.git
git annex init 'My Laptop' # initialize the annex project
cp ~/tmp/debian.iso ./ # copy a large file into the current directory
git annex add . # add the large file to git annex
git commit -am"Added Debian iso" # commit the file meta data
git annex sync --content # sync the git repo and large file to the GitLab server
```
Downloading a single large file is also very simple:
```bash
git clone git@gitlab.example.com:group/project.git
git annex sync # sync git branches but not the large file
git annex get debian.iso # download the large file
sudo apt-get update && sudo apt-get install git-annex
```
To download all files:
For RedHat-like systems (e.g., CentOS, RHEL) this can be achieved by running:
```bash
git clone git@gitlab.example.com:group/project.git
git annex sync --content # sync git branches and download all the large files
```
sudo yum install epel-release && sudo yum install git-annex
```
You don't have to set up git-annex on a separate server or add annex remotes to the repository.
Git-annex without GitLab gives everyone who can access the server access to the files of all projects.
GitLab annex ensures you can only access files of projects you work on (developer, master or owner role).
### Configuration for Omnibus packages
## GitLab git-annex Configuration
For omnibus-gitlab packages, only one configuration setting is needed.
The Omnibus package will internally set the correct options in all locations.
### Requirements
1. In `/etc/gitlab/gitlab.rb` add the following line:
Git-annex needs to be installed both on the server and the client side.
```ruby
gitlab_shell['git_annex_enabled'] = true
```
For Debian-like systems (e.g., Debian, Ubuntu) this can be achieved by running: `sudo apt-get update && sudo apt-get install git-annex`.
1. Save the file and
[reconfigure GitLab](../administration/restart_gitlab.md#omnibus-gitlab-reconfigure)
for the changes to take effect.
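Before reconfiguring, it can help to confirm that the setting is actually
present and not commented out. The following is a minimal sketch; the function
name `annex_enabled_in_config` is hypothetical, and the check only looks for
the exact setting shown above.

```bash
# Hypothetical helper: succeeds if the file "$1" contains an uncommented
# line that sets gitlab_shell['git_annex_enabled'] to true.
annex_enabled_in_config() {
  grep -Eq "^[[:space:]]*gitlab_shell\['git_annex_enabled'\][[:space:]]*=[[:space:]]*true" "$1"
}
```

A possible use is `annex_enabled_in_config /etc/gitlab/gitlab.rb && echo "enabled"`.
On Omnibus installations, reconfiguring is typically done with
`sudo gitlab-ctl reconfigure`.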
For RedHat-like systems (e.g., CentOS, RHEL) this can be achieved by running: `sudo yum install epel-release && sudo yum install git-annex`.
### Configuration for installations from source
### Configuration
There are 2 settings to enable git-annex on your GitLab server.
By default, git-annex is disabled in GitLab.
One is located in `config/gitlab.yml` of the GitLab repository and the other
one is located in `config.yml` of gitlab-shell.
There are two configuration options required to enable git-annex.
1. In `config/gitlab.yml` add or edit the following lines:
### Omnibus packages
```yaml
gitlab_shell:
git_annex_enabled: true
```
For omnibus-gitlab packages, only one configuration setting is needed.
The package will internally set the correct options in all locations.
1. In `config.yml` of gitlab-shell add or edit the following lines:
In `/etc/gitlab/gitlab.rb`:
```yaml
git_annex_enabled: true
```
```ruby
gitlab_shell['git_annex_enabled'] = true
```
1. Save the files and
[restart GitLab](administration/restart_gitlab.md#installations-from-source)
for the changes to take effect.
save the file and [reconfigure GitLab](administration/restart_gitlab.md#omnibus-gitlab-reconfigure)
for the changes to take effect.
## Using GitLab git-annex
### Installations from source
_**Important note:** Your Git remotes must use the SSH protocol, not HTTP._
There are 2 settings to enable git-annex on your GitLab server.
One is located in `config/gitlab.yml` of GitLab repository and the other one
is located in `config.yml` of gitlab-shell.
Here is an example workflow of uploading a very large file and then checking it
into your Git repository:
In `config/gitlab.yml`:
```bash
git clone git@gitlab.example.com:group/project.git
git annex init 'My Laptop' # initialize the annex project
cp ~/tmp/debian.iso ./ # copy a large file into the current directory
git annex add . # add the large file to git annex
git commit -am "Add Debian iso" # commit the file meta data
git annex sync --content # sync the git repo and large file to the GitLab server
```
```yaml
gitlab_shell:
git_annex_enabled: true
Downloading a single large file is also very simple:
```bash
git clone git@gitlab.example.com:group/project.git
git annex sync # sync git branches but not the large file
git annex get debian.iso # download the large file
```
and in `config.yml` in gitlab-shell:
To download all files:
```yaml
git_annex_enabled: true
```bash
git clone git@gitlab.example.com:group/project.git
git annex sync --content # sync git branches and download all the large files
```
save the files and [restart GitLab](administration/restart_gitlab.md#installations-from-source)
for the changes to take effect.
You don't have to set up `git-annex` on a separate server or add annex remotes
to the repository.
By using `git-annex` without GitLab, anyone who can access the server can also
access the files of all projects.
GitLab annex ensures that you can only access files of projects you have access
to (developer, master or owner role).
## How it works
Internally GitLab uses [GitLab Shell](https://gitlab.com/gitlab-org/gitlab-shell) to handle ssh access and this was a great integration point for git-annex.
We've added a setting to GitLab Shell so you can disable GitLab Annex support if you don't want it.
Internally GitLab uses [GitLab Shell] to handle SSH access and this was a great
integration point for `git-annex`.
There is a setting in gitlab-shell so you can disable GitLab Annex support
if you want to.
You'll have to use SSH-style links for the Git remote to your GitLab server instead of HTTPS-style links.
_**Important note:** Your Git remotes must use the SSH protocol, not HTTP._
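Since the integration goes through SSH, a quick check of the remote URL can
save some confusion. The sketch below classifies a URL as SSH or not; the
function name `is_ssh_remote` is hypothetical, and it only recognizes the
`ssh://` scheme and the scp-like `user@host:path` form.

```bash
# Hypothetical helper: succeeds if the URL in "$1" uses SSH, either as
# ssh://... or in the scp-like user@host:path form.
is_ssh_remote() {
  case "$1" in
    ssh://*) return 0 ;;
    *://*) return 1 ;;   # any other explicit scheme, such as http:// or https://
    *@*:*) return 0 ;;   # scp-like form, e.g. git@gitlab.example.com:group/project.git
    *) return 1 ;;
  esac
}
```

For example, `is_ssh_remote "$(git config --get remote.origin.url)"` checks the
`origin` remote of the current repository. If it fails, the remote can be
switched with
`git remote set-url origin git@gitlab.example.com:group/project.git`.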
## Troubleshooting tips
Differences in the version of `git-annex` on the GitLab server and on the local machine can cause `git-annex` to raise unexpected warnings and errors.
Although there is no general guide for `git-annex` errors, there are a few tips on how to work around the warnings.
Version differences between `git-annex` on the GitLab server and on local
machines can cause `git-annex` to raise unexpected warnings and errors.
Although there is no general guide for `git-annex` errors, there are a few tips
on how to work around the warnings.
### git-annex-shell: Not a git-annex or gcrypt repository.
This warning can appear on the initial `git annex sync --content`. It is caused by differences in `git-annex-shell`; read more about it in [this git-annex issue](https://git-annex.branchable.com/forum/Error_from_git-annex-shell_on_creation_of_gcrypt_special_remote/).
This warning can appear on the initial `git annex sync --content` and is caused
by differences in `git-annex-shell`. You can read more about it
[in this git-annex issue][issue].
One important thing to note is that, despite the warning, the `sync` succeeds
and the files are pushed to the GitLab repository.
An important thing to note is that the `sync` succeeds and the files are pushed to the GitLab repository. After this warning, you need to run:
If you run into this, you can run the following command inside the repository
where the warning was raised:
```
git config remote.origin.annex-ignore false
```
in the repository that was pushed.
Consecutive `git annex sync --content` **should not** produce this warning and the output should look like this:
Consecutive runs of `git annex sync --content` **should not** produce this
warning and the output should look like this:
```
commit ok
......@@ -134,3 +168,7 @@ pull origin
ok
push origin
```
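To see whether a clone is affected before applying the fix, the setting can be
inspected first. This is a minimal sketch, assuming the warning left
`remote.origin.annex-ignore` set to `true`, as the fix above suggests; the
function name `annex_ignore_set` is hypothetical.

```bash
# Hypothetical helper: succeeds if remote.origin.annex-ignore is "true"
# in the Git repository located at "$1".
annex_ignore_set() {
  [ "$(git -C "$1" config --get remote.origin.annex-ignore 2>/dev/null)" = "true" ]
}
```

For example,
`annex_ignore_set . && git config remote.origin.annex-ignore false` applies the
fix only when the flag is set in the current repository.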
[gitlab shell]: https://gitlab.com/gitlab-org/gitlab-shell "GitLab Shell repository"
[issue]: https://git-annex.branchable.com/forum/Error_from_git-annex-shell_on_creation_of_gcrypt_special_remote/ "git-annex issue"
[git-annex]: https://git-annex.branchable.com/ "git-annex website"