external_database.md 7.34 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13
# Geo with external PostgreSQL instances

This document is relevant if you are using a PostgreSQL instance that is *not
managed by Omnibus*. This includes cloud-managed instances like AWS RDS, or
manually installed and configured PostgreSQL instances.

NOTE: **Note**:
We strongly recommend running Omnibus-managed instances as they are actively
developed and tested. We aim to be compatible with most external
(not managed by Omnibus) databases but we do not guarantee compatibility.

## **Primary** node

14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
1. SSH into a GitLab **primary** application server and login as root:

    ```sh
    sudo -i
    ```

1. Execute the command below to define the node as **primary** node:

    ```sh
    gitlab-ctl set-geo-primary-node
    ```

    This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.


29 30 31 32 33 34 35
### Configure the external database to be replicated

To set up an external database, you can either:

- Set up streaming replication yourself (for example, in AWS RDS).
- Perform the Omnibus configuration manually as follows.

36 37 38 39 40 41 42 43 44 45 46
#### Leverage your cloud provider's tools to replicate the primary database

Given you have a primary node set up on AWS EC2 that uses RDS.
You can now just create a read-only replica in a different region and the
replication process will be managed by AWS. Make sure you've set Network ACL, Subnet, and
Security Group according to your needs, so the secondary application node can access the database.
Skip to the [Configure secondary application node](#configure-secondary-application-node) section below.

#### Manually configure the primary database for replication

The [geo_primary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
configures the **primary** node's database to be replicated by making changes to
`pg_hba.conf` and `postgresql.conf`. Make the following configuration changes
manually to your external database configuration:

```
##
## Geo Primary Role
## - pg_hba.conf
##
host    replication gitlab_replicator <trusted secondary IP>/32     md5
```

```
##
## Geo Primary Role
## - postgresql.conf
##
sql_replication_user = gitlab_replicator
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 50
max_replication_slots = 1 # number of secondary instances
hot_standby = on
```

## **Secondary** nodes

74
### Manually configure the replica database
75

76 77
Make the following configuration changes manually to your `postgresql.conf`
of external replica database:
78 79 80 81 82 83 84 85 86 87 88 89

```
##
## Geo Secondary Role
## - postgresql.conf
##
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 10
hot_standby = on
```

90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
### Configure **secondary** application nodes to use the external read-replica

With Omnibus, the
[geo_secondary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
has three main functions:

1. Configure the replica database.
1. Configure the tracking database.
1. Enable the [Geo Log Cursor](index.md#geo-log-cursor) (not covered in this section).

To configure the connection to the external read-replica database and enable Log Cursor:

1. SSH into a GitLab **secondary** application server and login as root:

    ```bash
    sudo -i
    ```

1. Edit `/etc/gitlab/gitlab.rb` and add the following

    ```ruby
    ##
    ## Geo Secondary role
    ## - configure dependent flags automatically to enable Geo
    ##
    roles ['geo_secondary_role']

    # note this is shared between both databases,
    # make sure you define the same password in both
    gitlab_rails['db_password'] = 'mypassword'

    gitlab_rails['db_username'] = 'gitlab'
    gitlab_rails['db_host'] = 'my-database-read-replica.dbs.com'
    ```
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)

126 127 128 129
### Configure the tracking database

**Secondary** nodes use a separate PostgreSQL installation as a tracking
database to keep track of replication status and automatically recover from
130 131 132 133
potential replication issues. Omnibus automatically configures a tracking database
when `roles ['geo_secondary_role']` is set. For high availability,
refer to [Geo High Availability](https://docs.gitlab.com/ee/administration/high_availability).
If you want to run this database external to Omnibus, please follow the instructions below.
134

135
The tracking database requires an [FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
136 137 138 139 140
connection with the **secondary** replica database for improved performance.

If you have an external database ready to be used as the tracking database,
follow the instructions below to use it:

141 142 143 144 145 146 147
NOTE: **Note:**
If you want to use AWS RDS as a tracking database, make sure it has access to
the secondary database. Unfortunately, just assigning the same security group is not enough as
outbound rules do not apply to RDS PostgreSQL databases. Therefore, you need to explicitly add an inbound
rule to the read-replica's security group allowing any TCP traffic from
the tracking database on port 5432.

148 149 150 151 152 153 154 155 156 157
1. SSH into a GitLab **secondary** server and login as root:

    ```bash
    sudo -i
    ```

1. Edit `/etc/gitlab/gitlab.rb` with the connection params and credentials for
    the machine with the PostgreSQL instance:

    ```ruby
158 159
    geo_secondary['db_username'] = 'gitlab_geo'
    geo_secondary['db_password'] = 'my password'
160

161 162
    geo_secondary['db_host'] = '<change to the tracking DB host>'
    geo_secondary['db_port'] = 5432      # change to the correct port
163 164 165 166
    geo_secondary['db_fdw'] = true       # enable FDW
    geo_postgresql['enable'] = false     # don't use internal managed instance
    ```

167
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
168 169 170 171

1. Run the tracking database migrations:

    ```bash
172
    gitlab-rake geo:db:create
173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
    gitlab-rake geo:db:migrate
    ```

1. Configure the
    [PostgreSQL FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
    connection and credentials:

    Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection
    params to match your environment. Execute it to set up the FDW connection.

    ```bash
    #!/bin/bash

    # Secondary Database connection params:
    DB_HOST="<change to the public IP or VPC private IP>"
    DB_NAME="gitlabhq_production"
    DB_USER="gitlab"
190
    DB_PASS="my password"
191 192 193 194 195 196 197 198 199 200 201 202 203 204
    DB_PORT="5432"

    # Tracking Database connection params:
    GEO_DB_HOST="<change to the public IP or VPC private IP>"
    GEO_DB_NAME="gitlabhq_geo_production"
    GEO_DB_USER="gitlab_geo"
    GEO_DB_PORT="5432"

    query_exec () {
      gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
    }

    query_exec "CREATE EXTENSION postgres_fdw;"
    query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
205
    query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}', password '${DB_PASS}');"
206 207 208 209 210
    query_exec "CREATE SCHEMA gitlab_secondary;"
    query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
    ```

    NOTE: **Note:** The script template above uses `gitlab-psql` as it's intended to be executed from the Geo machine,
211 212
    but you can change it to `psql` and run it from any machine that has access to the database. We also recommend using
    `psql` for AWS RDS.
213

214
1. Save the file and [restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart)
215 216 217 218 219
1. Populate the FDW tables:

    ```bash
    gitlab-rake geo:db:refresh_foreign_tables
    ```