Commit 9006f046 authored by Gabriel Mazetto's avatar Gabriel Mazetto

Merge branch 'gitlab-geo-documentation' into 'master'

Geo documentation

documentation for GitLab Geo (#76)

Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/356

See merge request !236
parents 0dba8e20 41933694
...@@ -86,6 +86,8 @@ be linked with your base image. Below is a list of examples you may use: ...@@ -86,6 +86,8 @@ be linked with your base image. Below is a list of examples you may use:
- [GitLab Pages configuration](pages/administration.md) - [GitLab Pages configuration](pages/administration.md)
- [Elasticsearch](integration/elasticsearch.md) Enable Elasticsearch - [Elasticsearch](integration/elasticsearch.md) Enable Elasticsearch
- [GitLab Performance Monitoring](monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics - [GitLab Performance Monitoring](monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics
- [GitLab GEO](administration/gitlab-geo/README.md) Configure GitLab GEO, a
secondary read-only GitLab instance
## Contributor documentation ## Contributor documentation
......
# GitLab Geo
> **Note:**
This feature was introduced in GitLab 8.5 EE.
GitLab Geo allows you to replicate your GitLab instance to other geographical
locations as a read-only fully operational version.
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Overview](#overview)
- [Setup instructions](#setup-instructions)
- [Current limitations](#current-limitations)
- [Frequently Asked Questions](#frequently-asked-questions)
- [Can I use Geo in a disaster recovery situation?](#can-i-use-geo-in-a-disaster-recovery-situation)
- [What data is replicated to a secondary node?](#what-data-is-replicated-to-a-secondary-node)
- [Can I git push to a secondary node?](#can-i-git-push-to-a-secondary-node)
- [How long does it take to have a commit replicated to a secondary node?](#how-long-does-it-take-to-have-a-commit-replicated-to-a-secondary-node)
- [What happens if the SSH server runs at a different port?](#what-happens-if-the-ssh-server-runs-at-a-different-port)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Overview
If you have two or more teams geographically spread out, but your GitLab
instance is in a single location, fetching large repositories can take a long
time.
Your Geo instance can be used for cloning and fetching projects, in addition to
reading any data. This will make working with large repositories over large
distances much faster.
![GitLab Geo overview](img/geo-overview.png)
When Geo is enabled, we refer to your original instance as a **primary** node
and the replicated read-only ones as **secondaries**.
Keep in mind that:
- Secondaries talk to primary to get user data for logins (API), and to
clone/pull from repositories (HTTP(S)/SSH).
- Primary talks to secondaries to notify for changes (API).
## Setup instructions
GitLab Geo requires some additional work installing and configuring your
instance, than a normal setup.
There are a couple of things you need to do in order to have one or more GitLab
Geo instances. Follow the steps below in the order that they appear:
1. Install GitLab Enterprise Edition on the server that will serve as the
secondary Geo node
1. [Setup a database replication](./database.md) in `master <-> slave` topology
1. [Configure GitLab](configuration.md) and set the primary and secondary nodes
After you set up the database replication and configure the GitLab Geo nodes,
there are a few things to consider:
1. When you create a new project in the primary node, the Git repository will
appear in the secondary only _after_ the first `git push`
1. To fetch from the secondary node, a separate remote URL must be set in your
Git repository locally
## Current limitations
- The secondary node cannot be used for browsing
- You cannot push code to secondary nodes
- Git LFS is not supported yet
- Git Annex is not supported yet
- Wiki's are not being replicated yet
- Git clone from secondaries by HTTP/HTTPS only (ssh-keys aren't being
replicated yet)
## Frequently Asked Questions
### Can I use Geo in a disaster recovery situation?
There are limitations to what we replicate (see Current limitations).
In an extreme data-loss situation you can make a secondary Geo into your
primary, but this is not officially supported yet.
### What data is replicated to a secondary node?
We currently replicate project repositories and the whole database. This
means user accounts, issues, merge requests, groups, project data, etc.,
will be available for query.
We currently don't replicate user generated attachments / avatars or any
other file in `public/upload`. We also don't replicate LFS / Annex or
artifacts data (`shared/folder`).
### Can I git push to a secondary node?
No. All writing operations (this includes `git push`) must be done in your
primary node.
### How long does it take to have a commit replicated to a secondary node?
All replication operations are asynchronous and are queued to be dispatched in
a batched request every 10 seconds. Besides that, it depends on a lot of other
factors including the amount of traffic, how big your commit is, the
connectivity between your nodes, your hardware, etc.
### What happens if the SSH server runs at a different port?
We send the clone url from the primary server to any secondaries, so it
doesn't matter. If primary is running on port `2200` clone url will reflect
that.
# GitLab Geo configuration
By now, you should have an [idea of GitLab Geo](README.md) and already set up
the [database replication](./database.md). There are a few more steps needed to
complete the process.
---
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
- [Repositories data replication](#repositories-data-replication)
- [Create SSH key pairs for Geo nodes](#create-ssh-key-pairs-for-geo-nodes)
- [Primary Node GitLab setup](#primary-node-gitlab-setup)
- [Secondary Node GitLab setup](#secondary-node-gitlab-setup)
- [Authorized keys regeneration](#authorized-keys-regeneration)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Repositories data replication
Getting a new secondary Geo node up and running, will also require the
repositories directory to be synced from the primary node. You can use `rsync`
for that. From the secondary node run:
```bash
# For Omnibus installations
rsync -avrP root@1.2.3.4:/var/opt/gitlab/git-data/repositories/ /var/opt/gitlab/git-data/repositories/
gitlab-ctl reconfigure # to fix directory permissions
# For installations from source
rsync -avrP root@1.2.3.4:/home/git/repositories/ /home/git/repositories/
chown -R git:git /home/git/repositories
chmod ug+rwX,o-rwx /home/git/repositories
```
where `1.2.3.4` is the IP of the primary node.
If this step is not followed, the secondary node will eventually clone and
fetch every missing repository as they are updated with new commits on the
primary node, so syncing the repositories beforehand will buy you some time.
## Create SSH key pairs for Geo nodes
When adding a Geo node you must provide an SSH public key of the user that your
GitLab instance runs on (unless changed, should be the user `git`). This user
will act as a "normal user" who fetches from the primary Geo node.
Run the command below on each server that will be a Geo node (primary or
secondary), and paste the contents of `id_rsa.pub` to the admin area of the
primary node (**Admin Area > Geo Nodes**) when adding a new one:
```bash
sudo -u git -H ssh-keygen
```
The public key for Omnibus installations will be at `/var/opt/gitlab/.ssh/id_rsa.pub`,
whereas for installation from source it will be at `/home/git/.ssh/id_rsa.pub`.
If for any reason you generate the key using a different name from the default
`id_rsa`, or you want to generate an extra key only for the repository
synchronization feature, you can do so, but you have to create/modify your
`~/.ssh/config` (for the `git` user).
This is an example on how to change the default key for all remote hosts:
```bash
Host * # Match all remote hosts
IdentityFile ~/.ssh/mycustom.key # The location of your private key
```
This is how to change it for an specific host:
```bash
Host example.com # The FQDN of the primary Geo node
HostName example.com # The FQDN of the primary Geo node
IdentityFile ~/.ssh/mycustom.key # The location of your private key
```
## Primary Node GitLab setup
>**Note:**
You will need to setup your database into a **Master <-> Slave** replication
topology, and your Primary node should always point to a database's Master
instance. If you haven't done that already, read [database replication](./database.md).
Go to the server that you chose to be your primary, and visit
**Admin Area > Geo Nodes** (`/admin/geo_nodes`) in order to add the Geo nodes.
Although we are looking at the primary Geo node setup, **this is where you also
add any secondary servers as well**.
In the following table you can see what all these settings mean:
| Setting | Description |
| ------- | ----------- |
| Primary | This marks a Geo Node as primary. There can be only one primary, make sure that you first add the primary node and then all the others.
| URL | Your instance's full URL, in the same way it is configured in `gitlab.yml` (source based installations) or `/etc/gitlab/gitlab.rb` (omnibus installations). |
|Public Key | The SSH public key of the user that your GitLab instance runs on (unless changed, should be the user `git`). That means that you have to go in each Geo Node separately and create an SSH key pair. See the [SSH key creation](#create-ssh-key-pairs-for-geo-nodes) section.
First, add your primary node by providing its full URL and the public SSH key
you created previously. Make sure to check the box 'This is a primary node'
when adding it. Continue with all secondaries.
![Geo Nodes Screen](img/geo-nodes-screen.png)
---
## Secondary Node GitLab setup
>**Note:**
The Geo nodes admin area (**Admin Area > Geo Nodes**) is not used when setting
up the secondary servers. This is handled by the primary server setup.
To install a secondary node, you must follow the normal GitLab Enterprise
Edition installation, with some extra requirements:
- You should point your database connection to a [replicated instance](./database.md).
- Your secondary node should be allowed to communicate via HTTP/HTTPS and
SSH with your primary node (make sure your firewall is not blocking that).
### Authorized keys regeneration
The final step will be to regenerate the keys for `.ssh/authorized_keys` using
the following command (HTTPS clone will still work without this extra step).
On the secondary node where the database is [already replicated](./database.md),
run the following:
```
# For Omnibus installations
gitlab-rake gitlab:shell:setup
# For source installations
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
```
# GitLab Geo database replication
This document describes the minimal steps you have to take in order to
replicate your GitLab database into another server. You may have to change
some values according to your database setup, how big it is, etc.
The GitLab primary node where the write operations happen will act as `master`,
and the secondary ones which are read-only will act as `slaves`.
>**Note:**
To be on par with GitLab's notation, we will use `primary` to denote the `master`
server, and `secondary` for the `slave`.
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
- [PostgreSQL replication](#postgresql-replication)
- [PostgreSQL - Configure the primary server](#postgresql-configure-the-primary-server)
- [PostgreSQL - Configure the secondary server](#postgresql-configure-the-secondary-server)
- [PostgreSQL - Initiate the replication process](#postgresql-initiate-the-replication-process)
- [MySQL replication](#mysql-replication)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## PostgreSQL replication
The following guide assumes that:
- You are using PostgreSQL 9.1 or later which includes the
[`pg_basebackup` tool][pgback]. As of this writing, the latest Omnibus
packages (8.5) have version 9.2.
- You have a primary server already set up, running PostgreSQL 9.2.x, and you
have a new secondary server set up on the same OS and PostgreSQL version. If
you are using Omnibus, make sure the GitLab version is the same on all nodes.
- The IP of the primary server for our examples will be `1.2.3.4`, whereas the
secondary's IP will be `5.6.7.8`.
[pgback]: http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
### PostgreSQL - Configure the primary server
**For installations from source**
1. Login as root and create a replication user:
```bash
sudo -u postgres psql -c "CREATE USER gitlab_replicator REPLICATION ENCRYPTED PASSWORD 'thepassword';"
```
1. Edit `postgresql.conf` to configure the primary server for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.2/main/postgresql.conf`):
```bash
listen_address = '1.2.3.4'
wal_level = hot_standby
max_wal_senders = 5
checkpoint_segments = 10
wal_keep_segments = 10
hot_standby = on
```
Edit the `wal` values as you see fit.
1. Set the access control on the primary to allow TCP connections using the
server's public IP and set the connection from the secondary to require a
password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be
`/etc/postgresql/9.2/main/pg_hba.conf`):
```bash
host all all 127.0.0.1/32 trust
host all all 1.2.3.4/32 trust
host replication gitlab_replicator 5.6.7.8/32 md5
```
Where `1.2.3.4` is the public IP address of the primary server, and `5.6.7.8`
the public IP address of the secondary one.
1. Restart PostgreSQL for the changes to take effect
---
**For Omnibus installations**
1. Omnibus GitLab has already a replicator user called `gitlab_replicator`.
You must set its password manually:
```bash
sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql \
-d template1 \
-c "ALTER USER gitlab_replicator WITH ENCRYPTED PASSWORD 'thepassword'"
```
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
postgresql['listen_address'] = "1.2.3.4"
postgresql['trust_auth_cidr_addresses'] = ['127.0.0.1/32','1.2.3.4/32']
postgresql['md5_auth_cidr_addresses'] = ['5.6.7.8/32']
postgresql['sql_replication_user'] = "gitlab_replicator"
postgresql['wal_level'] = "hot_standby"
postgresql['max_wal_senders'] = 10
postgresql['wal_keep_segments'] = 10
postgresql['hot_standby'] = "on"
```
Where `1.2.3.4` is the public IP address of the primary server, and `5.6.7.8`
the public IP address of the secondary one.
Edit the `wal` values as you see fit.
1. [Reconfigure GitLab][] for the changes to take effect.
---
Now that the PostgreSQL server is set up to accept remote connections, run
`netstat -plnt` to make sure that PostgreSQL is listening to the server's
public IP.
Test that the remote connection works by going to the secondary server and
running:
```
# For Omnibus installations
sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h 1.2.3.4 -U gitlab_replicator -d gitlabhq_production -W
# For source installations
sudo -u postgres psql -h 1.2.3.4 -U gitlab_replicator -d gitlabhq_production -W
```
When prompted enter the password you set in the first step for the
`gitlab_replicator` user. If all worked correctly, you should see the database
prompt.
### PostgreSQL - Configure the secondary server
**For installations from source**
1. Edit `postgresql.conf` to configure the secondary for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.2/main/postgresql.conf`):
```bash
wal_level = hot_standby
max_wal_senders = 5
checkpoint_segments = 10
wal_keep_segments = 10
hot_standby = on
```
1. Restart PostgreSQL for the changes to take effect
---
**For Omnibus installations**
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
postgresql['wal_level'] = "hot_standby"
postgresql['max_wal_senders'] = 10
postgresql['wal_keep_segments'] = 10
postgresql['hot_standby'] = "on"
```
1. [Reconfigure GitLab][] for the changes to take effect.
### PostgreSQL - Initiate the replication process
Below we provide a script that connects to the primary server, replicates the
database and creates the needed files for replication.
The directories used are the defaults that are set up in Omnibus. Configure it
as you see fit replacing the directories and paths.
>**Warning:**
Make sure to run this on the _**secondary**_ server as it removes all PostgreSQL's
data before running `pg_basebackup`.
```bash
#!/bin/bash
PORT="5432"
USER="gitlab_replicator"
echo Enter ip of primary postgresql server
read HOST
echo Enter password for $USER@$HOST
read -s PASSWORD
echo Stopping PostgreSQL
gitlab-ctl stop
echo Backup postgresql.conf
sudo -u gitlab-psql mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/
echo Cleaning up old cluster directory
sudo -u gitlab-psql rm -rf /var/opt/gitlab/postgresql/data
rm -f /tmp/postgresql.trigger
echo Starting base backup as replicator
echo Enter password for $USER@$HOST
sudo -u gitlab-psql /opt/gitlab/embedded/bin/pg_basebackup -h $HOST -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -x -P
echo Writing recovery.conf file
sudo -u gitlab-psql bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_
standby_mode = 'on'
primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD'
trigger_file = '/tmp/postgresql.trigger'
_EOF1_
"
echo Restore postgresql.conf
sudo -u gitlab-psql mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/
echo Starting PostgreSQL
gitlab-ctl start
```
When prompted, enter the password you set up for the `gitlab_replicator` user.
## MySQL replication
TODO
[reconfigure gitlab]: ../restart_gitlab.md#omnibus-gitlab-reconfigure
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment