database_source.md 16.2 KB
Newer Older
1
# Geo database replication (source)
2

3
NOTE: **Note:**
4 5
This documentation applies to GitLab source installations. In GitLab 11.5, this documentation was deprecated and will be removed in a future release.
Please consider [migrating to GitLab Omnibus install](https://docs.gitlab.com/omnibus/update/convert_to_omnibus.html). For installations
6
using the Omnibus GitLab packages, follow the
7
[**database replication for Omnibus GitLab**][database] guide.
8

9
NOTE: **Note:**
10
The stages of the setup process must be completed in the documented order.
11
Before attempting the steps in this stage, [complete all prior stages](index.md#using-gitlab-installed-from-source-deprecated).
12 13

This document describes the minimal steps you have to take in order to
14
replicate your **primary** GitLab database to a **secondary** node's database. You may
15 16 17 18 19 20 21
have to change some values according to your database setup, how big it is, etc.

You are encouraged to first read through all the steps before executing them
in your testing/production environment.

## PostgreSQL replication

22 23 24
The GitLab **primary** node where the write operations happen will connect to
**primary** database server, and the **secondary** ones which are read-only will
connect to **secondary** database servers (which are read-only too).
25

26
NOTE: **Note:**
27 28
In many databases' documentation, you will see "**primary**" being referenced as "master"
and "**secondary**" as either "slave" or "standby" server (read-only).
29

30
We recommend using [PostgreSQL replication slots][replication-slots-article]
31
to ensure the **primary** node retains all the data necessary for the secondaries to
32 33 34 35 36 37
recover. See below for more details.

The following guide assumes that:

- You are using PostgreSQL 9.6 or later which includes the
  [`pg_basebackup` tool][pgback] and improved [Foreign Data Wrapper][FDW] support.
38
- You have a **primary** node already set up (the GitLab server you are
39
  replicating from), running PostgreSQL 9.6 or later, and
40
  you have a new **secondary** server set up with the same versions of the OS,
41
  PostgreSQL, and GitLab on all nodes.
Merve's avatar
Merve committed
42 43
- The IP of the **primary** server for our examples is `198.51.100.1`, whereas the
  **secondary** node's IP is `198.51.100.2`. Note that the **primary** and **secondary** servers
44 45
  **must** be able to communicate over these addresses. These IP addresses can either
  be public or private.
46 47 48 49
  
CAUTION: **Warning:**
Geo works with streaming replication. Logical replication is not supported at this time. 
There is an [issue where support is being discussed](https://gitlab.com/gitlab-org/gitlab-ee/issues/7420).
50

51
### Step 1. Configure the **primary** server
52 53 54

1. SSH into your GitLab **primary** server and login as root:

55
    ```sh
56 57 58
    sudo -i
    ```

59
1. Add this node as the Geo **primary** by running:
60

61
    ```sh
62 63 64 65 66
    bundle exec rake geo:set_primary_node
    ```

1. Create a [replication user] named `gitlab_replicator`:

67 68 69
    ```sql
    --- Create a new user 'replicator'
    CREATE USER gitlab_replicator;
70 71

    --- Set/change a password and grants replication privilege
72
    ALTER USER gitlab_replicator WITH REPLICATION ENCRYPTED PASSWORD '<replication_password>';
73 74
    ```

75 76 77
1. Make sure your the `gitlab` database user has a password defined:

    ```sh
78 79 80 81
    sudo \
       -u postgres psql \
       -d template1 \
       -c "ALTER USER gitlab WITH ENCRYPTED PASSWORD '<database_password>';"
82
    ```
83 84

1. Edit the content of `database.yml` in `production:` and add the password like the example below:
85 86 87 88 89 90 91 92 93 94 95

    ```yaml
    #
    # PRODUCTION
    #
    production:
      adapter: postgresql
      encoding: unicode
      database: gitlabhq_production
      pool: 10
      username: gitlab
96
      password: <database_password>
97 98 99
      host: /var/opt/gitlab/geo-postgresql
    ```

100
1. Set up TLS support for the PostgreSQL **primary** server:
101

102
    CAUTION: **Warning**:
103
    Only skip this step if you **know** that PostgreSQL traffic
104
    between the **primary** and **secondary** nodes will be secured through some other
105 106
    means, e.g., a known-safe physical network path or a site-to-site VPN that
    you have configured.
107 108 109 110 111 112 113 114

    If you are replicating your database across the open Internet, it is
    **essential** that the connection is TLS-secured. Correctly configured, this
    provides protection against both passive eavesdroppers and active
    "man-in-the-middle" attackers.

    To generate a self-signed certificate and key, run this command:

115
    ```sh
116 117 118 119 120 121 122 123
    openssl req \
       -nodes \
       -batch \
       -x509 \
       -newkey rsa:4096 \
       -keyout server.key \
       -out server.crt \
       -days 3650
124 125 126 127 128 129 130
    ```

    This will create two files - `server.key` and `server.crt` - that you can
    use for authentication.

    Copy them to the correct location for your PostgreSQL installation:

131
    ```sh
132 133 134 135 136 137 138 139 140 141 142 143 144 145
    # Copying a self-signed certificate and key
    install -o postgres -g postgres -m 0400 -T server.crt ~postgres/9.x/main/data/server.crt
    install -o postgres -g postgres -m 0400 -T server.key ~postgres/9.x/main/data/server.key
    ```

    Add this configuration to `postgresql.conf`, removing any existing
    configuration for `ssl_cert_file` or `ssl_key_file`:

    ```
    ssl = on
    ssl_cert_file='server.crt'
    ssl_key_file='server.key'
    ```

146
1. Edit `postgresql.conf` to configure the **primary** server for streaming replication
147 148 149
   (for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):

    ```
150
    listen_address = '<primary_node_ip>'
151 152 153 154 155 156 157 158 159
    wal_level = hot_standby
    max_wal_senders = 5
    min_wal_size = 80MB
    max_wal_size = 1GB
    max_replicaton_slots = 1 # Number of Geo secondary nodes
    wal_keep_segments = 10
    hot_standby = on
    ```

160
    NOTE: **Note**:
161
    Be sure to set `max_replication_slots` to the number of Geo **secondary**
162 163 164 165
    nodes that you may potentially have (at least 1).

    For security reasons, PostgreSQL by default only listens on the local
    interface (e.g. 127.0.0.1). However, Geo needs to communicate
166
    between the **primary** and **secondary** nodes over a common network, such as a
167 168 169 170 171
    corporate LAN or the public Internet. For this reason, we need to
    configure PostgreSQL to listen on more interfaces.

    The `listen_address` option opens PostgreSQL up to external connections
    with the interface corresponding to the given IP. See [the PostgreSQL
172
    documentation][pg-docs-runtime-conn] for more details.
173 174

    You may also want to edit the `wal_keep_segments` and `max_wal_senders` to
175
    match your database replication requirements. Consult the
176
    [PostgreSQL - Replication documentation][pg-docs-runtime-replication] for more information.
177

178 179 180
1. Set the access control on the **primary** node to allow TCP connections using the
   server's public IP and set the connection from the **secondary** node to require a
   password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be
181 182
   `/etc/postgresql/9.x/main/pg_hba.conf`):

183
    ```sh
184 185
    host    all             all                      <primary_node_ip>/32      md5
    host    replication     gitlab_replicator        <secondary_node_ip>/32      md5
186 187
    ```

188 189
    If you want to add another secondary, add one more row like the replication 
    one and change the IP address:
190

191
    ```sh
192 193 194
    host    all             all                      <primary_node_ip>/32      md5
    host    replication     gitlab_replicator        <secondary_node_ip>/32      md5
    host    replication     gitlab_replicator        <another_secondary_node_ip>/32  md5
195 196 197 198 199 200 201 202 203
    ```

1. Restart PostgreSQL for the changes to take effect.

1. Choose a database-friendly name to use for your secondary to use as the
   replication slot name. For example, if your domain is
   `secondary.geo.example.com`, you may use `secondary_example` as the slot
   name.

204
1. Create the replication slot on the **primary** node:
205

206
    ```sh
207 208 209 210 211 212 213 214 215 216 217
    $ sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('secondary_example');"
      slot_name         | xlog_position
      ------------------+---------------
      secondary_example |
      (1 row)
    ```

1. Now that the PostgreSQL server is set up to accept remote connections, run
   `netstat -plnt` to make sure that PostgreSQL is listening to the server's
   public IP.

218
### Step 2. Configure the secondary server
219

220
Follow the first steps in ["configure the secondary server"][database-replication] and note that since you are installing from source, the username and
221 222
group listed as `gitlab-psql` in those steps should be replaced by `postgres`
instead. After completing the "Test that the `gitlab-psql` user can connect to
223
the **primary** node's database" step, continue here:
224 225 226 227

1. Edit `postgresql.conf` to configure the secondary for streaming replication
   (for Debian/Ubuntu that would be `/etc/postgresql/9.*/main/postgresql.conf`):

228
    ```sh
229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244
    wal_level = hot_standby
    max_wal_senders = 5
    checkpoint_segments = 10
    wal_keep_segments = 10
    hot_standby = on
    ```

1. Restart PostgreSQL for the changes to take effect.

#### Enable tracking database on the secondary server

Geo secondary nodes use a tracking database to keep track of replication status
and recover automatically from some replication issues. Follow the steps below to create
the tracking database.

1. On the secondary node, run the following command to create `database_geo.yml` with the
245
   information of your secondary PostgreSQL instance:
246

247
    ```sh
248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266
    sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml
    ```

1. Edit the content of `database_geo.yml` in `production:` as in the example below:

    ```yaml
    #
    # PRODUCTION
    #
    production:
      adapter: postgresql
      encoding: unicode
      database: gitlabhq_geo_production
      pool: 10
      username: gitlab_geo
      # password:
      host: /var/opt/gitlab/geo-postgresql
    ```

267
1. Create the database `gitlabhq_geo_production` on the PostgreSQL instance of the **secondary** node.
268 269 270

1. Set up the Geo tracking database:

271
    ```sh
272 273 274 275 276 277
    bundle exec rake geo:db:migrate
    ```

1. Configure the [PostgreSQL FDW][FDW] connection and credentials:

    Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection
278
    params to match your environment. Execute it to set up the FDW connection.
279 280

    ```sh
281
    #!/bin/bash
282

283
    # Secondary Database connection params:
284
    DB_HOST="/var/opt/gitlab/postgresql" # change to the public IP or VPC private IP if its an external server
285 286 287
    DB_NAME="gitlabhq_production"
    DB_USER="gitlab"
    DB_PORT="5432"
288

289
    # Tracking Database connection params:
290
    GEO_DB_HOST="/var/opt/gitlab/geo-postgresql" # change to the public IP or VPC private IP if its an external server
291 292 293
    GEO_DB_NAME="gitlabhq_geo_production"
    GEO_DB_USER="gitlab_geo"
    GEO_DB_PORT="5432"
294

295 296 297
    query_exec () {
      gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
    }
298

299 300 301 302 303
    query_exec "CREATE EXTENSION postgres_fdw;"
    query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
    query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}');"
    query_exec "CREATE SCHEMA gitlab_secondary;"
    query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
304 305 306 307 308
    ```

    And edit the content of `database_geo.yml` and to add `fdw: true` to
    the  `production:` block.

309
### Step 3. Initiate the replication process
310

311 312
Below we provide a script that connects the database on the **secondary** node to
the database on the **primary** node, replicates the database, and creates the
313 314 315 316 317
needed files for streaming replication.

The directories used are the defaults for Debian/Ubuntu. If you have changed
any defaults, configure it as you see fit replacing the directories and paths.

318
CAUTION: **Warning:**
319 320 321 322 323
Make sure to run this on the **secondary** server as it removes all PostgreSQL's
data before running `pg_basebackup`.

1. SSH into your GitLab **secondary** server and login as root:

324
    ```sh
325 326 327 328 329 330
    sudo -i
    ```

1. Save the snippet below in a file, let's say `/tmp/replica.sh`. Modify the
   embedded paths if necessary:

331
    ```
332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347
    #!/bin/bash

    PORT="5432"
    USER="gitlab_replicator"
    echo ---------------------------------------------------------------
    echo WARNING: Make sure this script is run from the secondary server
    echo ---------------------------------------------------------------
    echo
    echo Enter the IP or FQDN of the primary PostgreSQL server
    read HOST
    echo Enter the password for $USER@$HOST
    read -s PASSWORD
    echo Enter the required sslmode
    read SSLMODE

    echo Stopping PostgreSQL and all GitLab services
348 349
    sudo service gitlab stop
    sudo service postgresql stop
350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370

    echo Backing up postgresql.conf
    sudo -u postgres mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/

    echo Cleaning up old cluster directory
    sudo -u postgres rm -rf /var/opt/gitlab/postgresql/data

    echo Starting base backup as the replicator user
    echo Enter the password for $USER@$HOST
    sudo -u postgres /opt/gitlab/embedded/bin/pg_basebackup -h $HOST -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -x -P

    echo Writing recovery.conf file
    sudo -u postgres bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_
      standby_mode = 'on'
      primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD sslmode=$SSLMODE'
    _EOF1_
    "

    echo Restoring postgresql.conf
    sudo -u postgres mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/

371 372
    echo Starting PostgreSQL
    sudo service postgresql start
373 374 375 376
    ```

1. Run it with:

377
    ```sh
378 379 380
    bash /tmp/replica.sh
    ```

381
    When prompted, enter the IP/FQDN of the **primary** node, and the password you set up
382 383 384 385 386 387 388 389
    for the `gitlab_replicator` user in the first step.

    You should use `verify-ca` for the `sslmode`. You can use `disable` if you
    are happy to skip PostgreSQL TLS authentication altogether (e.g., you know
    the network path is secure, or you are using a site-to-site VPN). This is
    **not** safe over the public Internet!

    You can read more details about each `sslmode` in the
390
    [PostgreSQL documentation][pg-docs-ssl];
391 392 393 394 395
    the instructions above are carefully written to ensure protection against
    both passive eavesdroppers and active "man-in-the-middle" attackers.

The replication process is now over.

396 397 398 399
## PGBouncer support (optional)

1. First, enter the PostgreSQL console as an admin user.

400
1. Then create the read-only user:
401 402 403

    ```sql
    -- NOTE: Use the password defined earlier
404
    CREATE USER gitlab_geo_fdw WITH password '<your_password_here>';
405 406 407 408 409 410 411 412 413 414 415
    GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw;
    GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw;
    GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw;

    -- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw"
    -- automatically.
    ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw;
    ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw;
    ```

416
1. Enter the PostgreSQL console on the **secondary** tracking database and change the user mapping to this new user:
417 418 419 420 421

    ```
    ALTER USER MAPPING FOR gitlab_geo SERVER gitlab_secondary OPTIONS (SET user 'gitlab_geo_fdw')
    ```

422 423 424 425 426 427 428 429
## MySQL replication

MySQL replication is not supported for Geo.

## Troubleshooting

Read the [troubleshooting document](troubleshooting.md).

430
[replication-slots-article]: https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75
431 432 433
[pgback]: http://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html
[replication user]:https://wiki.postgresql.org/wiki/Streaming_Replication
[FDW]: https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
434 435 436 437 438 439
[database]: database.md
[add-geo-node]: configuration.md#step-3-add-the-secondary-gitlab-node
[database-replication]: database.md#step-2-configure-the-secondary-server
[pg-docs-ssl]: https://www.postgresql.org/docs/9.6/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION
[pg-docs-runtime-conn]: https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html
[pg-docs-runtime-replication]: https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html