Commit 9273de85 authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch 'geo-docs-8-13x' into 'master'

Improve Geo documentation for 8.13

This merge-request was initiated after feedback from costumers about our documentation.

It's a bunch of small fixes to add extra troubleshooting and improve the understanding of the required steps.

# Documentation changes/added:

- [x] Add "Host key verification" troubleshooting doc for Geo
- [x] Update rsync instructions to preserve repository user/group ownership
- [x] Communicate better the setup steps and the correct order they must happen

cc @dewetblomerus @marin

See merge request !766
parents 9e37d1e8 cae6e93e
......@@ -20,8 +20,6 @@ locations as a read-only fully operational version.
- [How long does it take to have a commit replicated to a secondary node?](#how-long-does-it-take-to-have-a-commit-replicated-to-a-secondary-node)
- [What happens if the SSH server runs at a different port?](#what-happens-if-the-ssh-server-runs-at-a-different-port)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Overview
If you have two or more teams geographically spread out, but your GitLab
......@@ -49,19 +47,35 @@ GitLab Geo requires some additional work installing and configuring your
instance, than a normal setup.
There are a couple of things you need to do in order to have one or more GitLab
Geo instances. Follow the steps below in the order that they appear:
Geo instances. Follow the steps below in the **exact order** that they appear:
1. Follow the instructions to [install GitLab Enterprise Edition][install-ee]
on the server that will serve as the secondary Geo node, but don't further
configure GitLab as authentication will be handled by the primary node (more
on this in the configuration step).
1. [Setup a database replication](database.md) in `primary <-> secondary (read-only)` topology.
1. [Configure GitLab](configuration.md) and set the primary and secondary nodes.
## After setup
1. Install GitLab Enterprise Edition on the server that will serve as the
secondary Geo node
1. [Setup a database replication](database.md) in `primary <-> secondary (read-only)` topology
1. [Configure GitLab](configuration.md) and set the primary and secondary nodes
After you set up the database replication and configure the GitLab Geo nodes,
there are a few things to consider:
1. When you create a new project in the primary node, the Git repository will
appear in the secondary only _after_ the first `git push`
1. To fetch from the secondary node, a separate remote URL must be set in your
Git repository locally
appear in the secondary only _after_ the first `git push`.
1. You need an extra step to be able to fetch code from the `secondary` and push
to `primary`:
1. Clone your repository as you would normally do from the `secondary` node
1. Change the remote push URL following this example:
```bash
git remote set-url --push origin git@primary.gitlab.example.com:user/repo.git
```
> **Important**: The initialization of a new Geo secondary node requires data
to be copied from the primary, as there is no backfill feature bundled with it.
See more details in the [Configure GitLab](configuration.md) step.
## Current limitations
......@@ -107,3 +121,5 @@ connectivity between your nodes, your hardware, etc.
We send the clone url from the primary server to any secondaries, so it
doesn't matter. If primary is running on port `2200` clone url will reflect
that.
[install-ee]: https://about.gitlab.com/downloads-ee/
# GitLab Geo configuration
By now, you should have an [idea of GitLab Geo](README.md) and already set up
the [database replication](./database.md). There are a few more steps needed to
complete the process.
> **Important:**
Make sure you have followed the first two steps of the
[Setup instructions](README.md#setup-instructions).
After having installed GitLab Enterprise Edition in the instance that will serve
as a Geo node and set up the database replication, the next steps can be summed
up to:
1. Configure the primary node
1. Replicate some required configurations between the primary and the secondaries
1. Start GitLab in the secondary node's machine
1. Configure every secondary node in the primary's Admin screen
After GitLab's instance is online and defined in **Geo Nodes** admin screen,
new data will start to be automatically replicated, but you still need to copy
old data from the primary machine (more information below).
## Primary node GitLab setup
>**Notes:**
- You will need to setup your database into a **Primary <-> Secondary (read-only)** replication
topology, and your Primary node should always point to a database's Primary
instance. If you haven't done that already, read [database replication](./database.md).
- Only in the Geo nodes admin area of the primary node, will you be adding all
nodes' information (secondary and primary). Do not add anything in the Geo
nodes admin area of the secondaries.
To setup the primary node:
1. [Create the SSH key pair][ssh-pair] for the primary node.
1. Visit the primary node's **Admin Area > Geo Nodes** (`/admin/geo_nodes`).
1. Add your primary node by providing its full URL and the public SSH key
you created previously. Make sure to check the box 'This is a primary node'
when adding it.
![Add new primary Geo node](img/geo_nodes_add_new.png)
---
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
>**Note:**
Don't set anything up for the `secondary` node yet, make sure to follow the
[Secondary node GitLab setup](#secondary-node-gitlab-setup) first.
In the following table you can see what all these settings mean:
| Setting | Description |
| --------- | ----------- |
| Primary | This marks a Geo Node as primary. There can be only one primary, make sure that you first add the primary node and then all the others. |
| URL | Your instance's full URL, in the same way it is configured in `gitlab.yml` (source based installations) or `/etc/gitlab/gitlab.rb` (omnibus installations). |
|Public Key | The SSH public key of the user that your GitLab instance runs on (unless changed, should be the user `git`). That means that you have to go in each Geo Node separately and create an SSH key pair. See the [SSH key creation][ssh-pair] section. |
## Secondary node GitLab setup
>**Note:**
The Geo nodes admin area (**Admin Area > Geo Nodes**) is not used when setting
up the secondary nodes. This is handled at the primary one.
To install a secondary node, you must follow the normal GitLab Enterprise
Edition installation, with some extra requirements:
- You should point your database connection to a [replicated instance](./database.md).
- Your secondary node should be allowed to [communicate via HTTP/HTTPS and
SSH with your primary node (make sure your firewall is not blocking that).
- Don't make any extra steps you would do for a normal new installation
- Don't setup any custom authentication (this will be handled by the `primary` node)
You need to make sure you restored the database backup (that is part of setting
up replication) and that the primary node PostgreSQL instance is ready to
replicate data.
### Database Encryption Key
- [Repositories data replication](#repositories-data-replication)
- [Create SSH key pairs for Geo nodes](#create-ssh-key-pairs-for-geo-nodes)
- [Primary Node GitLab setup](#primary-node-gitlab-setup)
- [Secondary Node GitLab setup](#secondary-node-gitlab-setup)
- [Database Encryptation Key](#database-encryptation-key)
- [Authorized keys regeneration](#authorized-keys-regeneration)
GitLab stores a unique encryption key in disk that we use to safely store
sensitive data in the database.
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
Any secondary node must have the **exact same value** for `db_key_base` as
defined in the primary one.
## Repositories data replication
- For Omnibus installations it is stored at `/etc/gitlab/gitlab-secrets.json`.
- For installations from source it is stored at `/home/git/gitlab/config/secrets.yml`.
Find that key in the primary node and copy paste its value in the secondaries.
### Enable the secondary GitLab instance
Your new GitLab secondary node can now be safely started.
1. [Create the SSH key pair][ssh-pair] for the secondary node.
1. Visit the primary node's **Admin Area > Geo Nodes** (`/admin/geo_nodes`).
1. Add your secondary node by providing its full URL and the public SSH key
you created previously.
1. Hit the **Add node** button.
---
After the **Add Node** button is pressed, the primary node will start to notify
changes to the secondary. Make sure the secondary instance is running and
accessible.
The two most obvious issues that replication can have here are:
- Database replication not working well
- Instance to instance notification not working. In that case, it can be
something of the following:
- You are using a custom certificate or custom CA (see the
[Troubleshooting](#troubleshooting) section)
- Instance is firewalled (check your firewall rules)
### Repositories data replication
Getting a new secondary Geo node up and running, will also require the
repositories directory to be synced from the primary node. You can use `rsync`
for that. From the secondary node run:
for that. Assuming `1.2.3.4` is the IP of the primary node, SSH into the
secondary and run:
```bash
# For Omnibus installations
rsync -avrP root@1.2.3.4:/var/opt/gitlab/git-data/repositories/ /var/opt/gitlab/git-data/repositories/
rsync -guavrP root@1.2.3.4:/var/opt/gitlab/git-data/repositories/ /var/opt/gitlab/git-data/repositories/
gitlab-ctl reconfigure # to fix directory permissions
# For installations from source
rsync -avrP root@1.2.3.4:/home/git/repositories/ /home/git/repositories/
chown -R git:git /home/git/repositories
rsync -guavrP root@1.2.3.4:/home/git/repositories/ /home/git/repositories/
chmod ug+rwX,o-rwx /home/git/repositories
```
where `1.2.3.4` is the IP of the primary node.
If this step is not followed, the secondary node will eventually clone and
fetch every missing repository as they are updated with new commits on the
primary node, so syncing the repositories beforehand will buy you some time.
## Create SSH key pairs for Geo nodes
While active repositories will be eventually replicated, if you don't rsync,
the files, any archived/inactive repositories will not get in the secondary node
as Geo doesn't run any routine task to look for missing repositories.
When adding a Geo node you must provide an SSH public key of the user that your
GitLab instance runs on (unless changed, should be the user `git`). This user
will act as a "normal user" who fetches from the primary Geo node.
### Authorized keys regeneration
Run the command below on each server that will be a Geo node (primary or
secondary), and paste the contents of `id_rsa.pub` to the admin area of the
primary node (**Admin Area > Geo Nodes**) when adding a new one:
The final step will be to regenerate the keys for `~/.ssh/authorized_keys` using
the command below (HTTPS clone will still work without this extra step).
```bash
sudo -u git -H ssh-keygen
On the secondary node where the database is [already replicated](./database.md),
run:
```
# For Omnibus installations
gitlab-rake gitlab:shell:setup
# For source installations
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
```
Remember to add your primary node to the `known_hosts` file of your `git` user.
This will enable `git` operations to authorize against your existing users.
New users and SSH keys updated after this step, will be replicated automatically.
You can find ssh key files and `know_hosts` at `/var/opt/gitlab/.ssh/` in
Omnibus installations or at `/home/git/.ssh/` when following the source
installation guide.
### Ready to use
Your instance should be ready to use. You can visit the Admin area in the
secondary node to check if it's correctly identified as a secondary Geo node and
if Geo is enabled.
If your installation isn't working properly, check the
[troubleshooting](#troubleshooting) section.
## Create SSH key pairs for new Geo nodes
>**Note:**
These are general instructions to create a new SSH key pair for a new Geo node,
either primary or secondary.
---
When adding a new Geo node, you must provide an SSH public key of the user that
your GitLab instance runs on (unless changed, should be the user `git`). This
user will act as a "normal user" who fetches from the primary Geo node.
1. Run the command below on each server that will be a Geo node:
```bash
sudo -u git -H ssh-keygen
```
1. Get the contents of `id_rsa.pub` the was just created:
```
# Omnibus installations
sudo -u git cat /var/opt/gitlab/.ssh/id_rsa.pub
# Installations from source
sudo -u git cat /home/git/.ssh/id_rsa.pub
```
1. Copy them to the admin area of the **primary** node (**Admin Area > Geo Nodes**).
---
If for any reason you generate the key using a different name from the default
`id_rsa`, or you want to generate an extra key only for the repository
......@@ -84,105 +213,86 @@ Host example.com # The FQDN of the primary Geo node
IdentityFile ~/.ssh/mycustom.key # The location of your private key
```
## Primary Node GitLab setup
### Add the primary node to the `known_hosts` file of the secondary nodes
>**Note:**
You will need to setup your database into a **Primary <-> Secondary (read-only)** replication
topology, and your Primary node should always point to a database's Primary
instance. If you haven't done that already, read [database replication](./database.md).
Go to the server that you chose to be your primary, and visit
**Admin Area > Geo Nodes** (`/admin/geo_nodes`) in order to add the Geo nodes.
Although we are looking at the primary Geo node setup, **this is where you also
add any secondary servers as well**.
In the following table you can see what all these settings mean:
| Setting | Description |
| --------- | ----------- |
| Primary | This marks a Geo Node as primary. There can be only one primary, make sure that you first add the primary node and then all the others. |
| URL | Your instance's full URL, in the same way it is configured in `gitlab.yml` (source based installations) or `/etc/gitlab/gitlab.rb` (omnibus installations). |
|Public Key | The SSH public key of the user that your GitLab instance runs on (unless changed, should be the user `git`). That means that you have to go in each Geo Node separately and create an SSH key pair. See the [SSH key creation](#create-ssh-key-pairs-for-geo-nodes) section. |
First, add your primary node by providing its full URL and the public SSH key
you created previously. Make sure to check the box 'This is a primary node'
when adding it. Continue with all secondaries.
![Geo Nodes Screen](img/geo-nodes-screen.png)
This operation is only needed for the secondary nodes.
---
## Secondary Node GitLab setup
>**Note:**
The Geo nodes admin area (**Admin Area > Geo Nodes**) is not used when setting
up the secondary servers. This is handled by the primary server setup.
To install a secondary node, you must follow the normal GitLab Enterprise
Edition installation, with some extra requirements:
- You should point your database connection to a [replicated instance](./database.md).
- Your secondary node should be allowed to communicate via HTTP/HTTPS and
SSH with your primary node (make sure your firewall is not blocking that).
### Database Encryption Key
GitLab stores a unique encryption key in disk that we use to safely store sensitive
data in the database.
Any secondary node must have the exact same value for `db_key_base` as defined in the primary one.
For Omnibus installations it is stored at `/etc/gitlab/gitlab-secrets.json`.
For Source installations it is stored at `/home/git/gitlab/config/secrets.yml`.
### Authorized keys regeneration
The final step will be to regenerate the keys for `.ssh/authorized_keys` using
the following command (HTTPS clone will still work without this extra step).
On the secondary node where the database is [already replicated](./database.md),
run the following:
The secondary nodes need to know the SSH fingerprint of the primary node that
will be used for the Git clone/fetch operations. In order to add it to the
`known_hosts` file, while in the terminal of a secondary node, run the
following command and type `yes` when asked:
```
# For Omnibus installations
gitlab-rake gitlab:shell:setup
# For source installations
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
sudo -u git -H ssh git@<primary-node-url>
```
Replace `<primary-node-url>` with the FQDN of the primary node. You can verify
that the fingerprint was added by checking:
- `/var/opt/gitlab/.ssh/known_hosts` for Omnibus installations or
- `/home/git/.ssh/known_hosts` for installations from source
## Troubleshooting
Setting up Geo requires careful attention to details and sometimes it's easy to
miss a step.
Here is a checklist of questions you should ask to try to detect where you have
to fix (all commands and path locations are for Omnibus installs):
miss a step. Here is a checklist of questions you should ask to try to detect
where you have to fix (all commands and path locations are for Omnibus installs):
- Is Postgres replication working?
- Are my nodes pointing to the correct database instance?
- You should make sure your primary Geo node points to the instance with
writting permissions.
writing permissions.
- Any secondary nodes should point only to read-only instances.
- Can Geo detect my current node correctly?
- Geo uses your defined node from `Admin > Geo` screen, and tries to match
with the value defined in `/etc/gitlab/gitlab.rb` configuration file.
The relevant line looks like: `external_url "http://gitlab.example.com"`.
- To check if node on current machine is correctly detected type:
`sudo gitlab-rails runner "Gitlab::Geo.current_node"`,
expect something like: `#<GeoNode id: 2, schema: "https", host: "gitlab.example.com", port: 443, relative_url_root: "", primary: false, ...>`
```
sudo gitlab-rails runner "Gitlab::Geo.current_node"
```
and expect something like:
```
#<GeoNode id: 2, schema: "https", host: "gitlab.example.com", port: 443, relative_url_root: "", primary: false, ...>
```
- By running the command above, `primary` should be `true` when executed in
the primary node, and `false` on any secondary
- Did I defined the correct SSH Key for the node?
- Did I define the correct SSH Key for the node?
- You must create an SSH Key for `git` user
- This key is the one you have to inform at `Admin > Geo`
- Can I SSH from secondary to primary node using `git` user account?
- This is the most obvious cause of problems with repository replication issues.
If you haven't added the primary node's key to `known_hosts`, you will end up with
a lot of failed sidekiq jobs with an error similar to:
```
Gitlab::Shell::Error: Host key verification failed. fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
```
An easy way to fix is by logging in as the `git` user in the secondary node and run:
```
# remove old entries to your primary gitlab in known_hosts
ssh-keyscan -R your-primary-gitlab.example.com
# add a new entry in known_hosts
ssh-keyscan -t rsa your-primary-gitlab.example.com >> ~/.ssh/known_hosts
```
- Can primary node communicate with secondary node by HTTP/HTTPS ports?
- Can secondary nodes communicate with primary node by HTTP/HTTPS/SSH ports?
- Can secondary nodes execute a succesfull git clone using git user's own
- Can secondary nodes execute a successful git clone using git user's own
SSH Key to primary node repository?
> This list is an atempt to document all the moving parts that can go wrong.
>**Note:**
This list is an attempt to document all the moving parts that can go wrong.
We are working into getting all this steps verified automatically in a
rake task in the future. :)
rake task in the future.
[ssh-pair]: #create-ssh-key-pairs-for-new-geo-nodes
......@@ -14,7 +14,7 @@ and `secondary` as either `slave` or `standby` server (read-only).
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
**Table of Contents**
- [PostgreSQL replication](#postgresql-replication)
- [PostgreSQL - Configure the primary server](#postgresql-configure-the-primary-server)
......
doc/gitlab-geo/img/geo-nodes-screen.png

52.8 KB | W: | H:

doc/gitlab-geo/img/geo-nodes-screen.png

39.5 KB | W: | H:

doc/gitlab-geo/img/geo-nodes-screen.png
doc/gitlab-geo/img/geo-nodes-screen.png
doc/gitlab-geo/img/geo-nodes-screen.png
doc/gitlab-geo/img/geo-nodes-screen.png
  • 2-up
  • Swipe
  • Onion skin
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment