Commit c79ecf08 authored by Douglas Barbosa Alexandre's avatar Douglas Barbosa Alexandre

Merge branch '223256-geo-update-installation-docs-updated-to-reflect-fdw-removal' into 'master'

Geo - Update docs to reflect FDW removal

Closes #223256

See merge request gitlab-org/gitlab!39485
parents 937ee14c e1856d80
......@@ -37,8 +37,7 @@ recover. See below for more details.
The following guide assumes that:
- You are using Omnibus and therefore you are using PostgreSQL 11 or later
which includes the [`pg_basebackup` tool](https://www.postgresql.org/docs/11/app-pgbasebackup.html) and improved
[Foreign Data Wrapper](https://www.postgresql.org/docs/11/postgres-fdw.html) support.
which includes the [`pg_basebackup` tool](https://www.postgresql.org/docs/11/app-pgbasebackup.html).
- You have a **primary** node already set up (the GitLab server you are
replicating from), running Omnibus' PostgreSQL (or equivalent version), and
you have a new **secondary** server set up with the same versions of the OS,
......@@ -346,10 +345,10 @@ There is an [issue where support is being discussed](https://gitlab.com/gitlab-o
Ensure that the contents of `~gitlab-psql/data/server.crt` on the **primary** node
match the contents of `~gitlab-psql/.postgresql/root.crt` on the **secondary** node.
1. Configure PostgreSQL to enable FDW support:
1. Configure PostgreSQL:
This step is similar to how we configured the **primary** instance.
We need to enable this, to enable FDW support, even if using a single node.
We need to enable this, even if using a single node.
Edit `/etc/gitlab/gitlab.rb` and add the following, replacing the IP
addresses with addresses appropriate to your network configuration:
......@@ -375,11 +374,6 @@ There is an [issue where support is being discussed](https://gitlab.com/gitlab-o
postgresql['sql_user_password'] = '<md5_hash_of_your_password>'
gitlab_rails['db_password'] = '<your_password_here>'
##
## Enable FDW support for the Geo Tracking Database (improves performance)
##
geo_secondary['db_fdw'] = true
```
For external PostgreSQL instances, see [additional instructions](external_database.md).
If you bring a former **primary** node back online to serve as a **secondary** node, then you also need to remove `roles ['geo_primary_role']` or `geo_primary_role['enable'] = true`.
......@@ -390,15 +384,12 @@ There is an [issue where support is being discussed](https://gitlab.com/gitlab-o
gitlab-ctl reconfigure
```
1. Restart PostgreSQL for the IP change to take effect and reconfigure again:
1. Restart PostgreSQL for the IP change to take effect:
```shell
gitlab-ctl restart postgresql
gitlab-ctl reconfigure
```
This last reconfigure will provision the FDW configuration and enable it.
### Step 3. Initiate the replication process
Below we provide a script that connects the database on the **secondary** node to
......@@ -473,48 +464,6 @@ high-availability configuration with a cluster of nodes supporting a Geo
**primary** node and another cluster of nodes supporting a Geo **secondary** node. For more
information, see [High Availability with Omnibus GitLab](../../postgresql/replication_and_failover.md).
For a Geo **secondary** node to work properly with PgBouncer in front of the database,
it will need a separate read-only user to make [PostgreSQL FDW queries](https://www.postgresql.org/docs/11/postgres-fdw.html)
work:
1. On the **primary** Geo database, enter the PostgreSQL on the console as an
admin user. If you are using an Omnibus-managed database, log onto the **primary**
node that is running the PostgreSQL database (the default Omnibus database name is `gitlabhq_production`):
```shell
sudo \
-u gitlab-psql /opt/gitlab/embedded/bin/psql \
-h /var/opt/gitlab/postgresql gitlabhq_production
```
1. Then create the read-only user:
```sql
-- NOTE: Use the password defined earlier
CREATE USER gitlab_geo_fdw WITH password 'mypassword';
GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw;
GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw;
-- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw"
-- automatically.
ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw;
ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw;
```
1. On the **secondary** nodes, change `/etc/gitlab/gitlab.rb`:
```ruby
geo_postgresql['fdw_external_user'] = 'gitlab_geo_fdw'
```
1. Save the file and reconfigure GitLab for the changes to be applied:
```shell
gitlab-ctl reconfigure
```
## Troubleshooting
Read the [troubleshooting document](troubleshooting.md).
......@@ -183,9 +183,6 @@ to grant additional roles to your tracking database user (by default, this is
- Amazon RDS requires the [`rds_superuser`](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.Roles) role.
- Azure Database for PostgreSQL requires the [`azure_pg_admin`](https://docs.microsoft.com/en-us/azure/postgresql/howto-create-users#how-to-create-additional-admin-users-in-azure-database-for-postgresql) role.
The tracking database requires an [FDW](https://www.postgresql.org/docs/11/postgres-fdw.html)
connection with the **secondary** replica database for improved performance.
If you have an external database ready to be used as the tracking database,
follow the instructions below to use it:
......@@ -224,7 +221,6 @@ the tracking database on port 5432.
geo_secondary['db_host'] = '<tracking_database_host>'
geo_secondary['db_port'] = <tracking_database_port> # change to the correct port
geo_secondary['db_fdw'] = true # enable FDW
geo_postgresql['enable'] = false # don't use internal managed instance
```
......@@ -236,48 +232,3 @@ the tracking database on port 5432.
gitlab-rake geo:db:create
gitlab-rake geo:db:migrate
```
1. Configure the [PostgreSQL FDW](https://www.postgresql.org/docs/11/postgres-fdw.html)
connection and credentials:
Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection
parameters to match your environment. Execute it to set up the FDW connection.
```shell
#!/bin/bash
# Secondary Database connection params:
DB_HOST="<public_ip_or_vpc_private_ip>"
DB_NAME="gitlabhq_production"
DB_USER="gitlab"
DB_PASS="<your_password_here>"
DB_PORT="5432"
# Tracking Database connection params:
GEO_DB_HOST="<public_ip_or_vpc_private_ip>"
GEO_DB_NAME="gitlabhq_geo_production"
GEO_DB_USER="gitlab_geo"
GEO_DB_PORT="5432"
query_exec () {
gitlab-psql -h $GEO_DB_HOST -U $GEO_DB_USER -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
}
query_exec "CREATE EXTENSION postgres_fdw;"
query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}', password '${DB_PASS}');"
query_exec "CREATE SCHEMA gitlab_secondary;"
query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
```
NOTE: **Note:**
The script template above uses `gitlab-psql` as it's intended to be executed from the Geo machine,
but you can change it to `psql` and run it from any machine that has access to the database. We also recommend using
`psql` for AWS RDS.
1. Save the file and [restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart)
1. Populate the FDW tables:
```shell
gitlab-rake geo:db:refresh_foreign_tables
```
......@@ -117,7 +117,7 @@ The following are required to run Geo:
The following operating systems are known to ship with a current version of OpenSSH:
- [CentOS](https://www.centos.org) 7.4+
- [Ubuntu](https://ubuntu.com) 16.04+
- PostgreSQL 11+ with [FDW](https://www.postgresql.org/docs/11/postgres-fdw.html) support and [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication)
- PostgreSQL 11+ with [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication)
- Git 2.9+
- All nodes must run the same GitLab version.
......@@ -166,7 +166,6 @@ The tracking database instance is used as metadata to control what needs to be u
- Fetch changes from a repository that has recently been updated.
Because the replicated database instance is read-only, we need this additional database instance for each **secondary** node.
The tracking database requires the `postgres_fdw` extension.
### Geo Log Cursor
......
......@@ -260,10 +260,8 @@ Configure the tracking database.
geo_postgresql['sql_user_password'] = '<tracking_database_password_md5_hash>'
##
## Configure FDW connection to the replica database
## Configure PostgreSQL connection to the replica database
##
geo_secondary['db_fdw'] = true
geo_postgresql['fdw_external_password'] = '<replica_database_password_plaintext>'
geo_postgresql['md5_auth_cidr_addresses'] = ['<replica_database_ip>/32']
gitlab_rails['db_host'] = '<replica_database_ip>'
......
......@@ -11,6 +11,19 @@ Check this document if it includes instructions for the version you are updating
These steps go together with the [general steps](updating_the_geo_nodes.md#general-update-steps)
for updating Geo nodes.
## Updating to GitLab 13.3
In GitLab 13.3, Geo removed the PostgreSQL [Foreign Data Wrapper](https://www.postgresql.org/docs/11/postgres-fdw.html) dependency for the tracking database.
The FDW server, user, and the extension will be removed during the upgrade process on each **secondary** node. The GitLab settings related to the FDW in the `/etc/gitlab/gitlab.rb` have been deprecated and can be safely removed.
There are some scenarios like using an external PostgreSQL instance for the tracking database where the FDW settings must be removed manually. Enter the PostgreSQL console of that instance and remove them:
```shell
DROP SERVER gitlab_secondary CASCADE;
DROP EXTENSION IF EXISTS postgres_fdw;
```
## Updating to GitLab 13.0
Upgrading to GitLab 13.0 requires GitLab 12.10 to already be using PostgreSQL
......
......@@ -58,7 +58,6 @@ This section is for links to information elsewhere in the GitLab documentation.
- Support for MySQL was removed in GitLab 12.1; [migrate to PostgreSQL](../../update/mysql_to_postgresql.md)
- required extension `pg_trgm`
- required extension `btree_gist`
- required extension `postgres_fdw` for Geo
- Errors like this in the `production/sidekiq` log; see: [Set default_transaction_isolation into read committed](https://docs.gitlab.com/omnibus/settings/database.html#set-default_transaction_isolation-into-read-committed):
......@@ -85,7 +84,7 @@ This section is for links to information elsewhere in the GitLab documentation.
PANIC: could not write to file ‘pg_xlog/xlogtemp.123’: No space left on device
```
- [Checking Geo configuration](../geo/replication/troubleshooting.md#checking-configuration) including
- [Checking Geo configuration](../geo/replication/troubleshooting.md) including
- reconfiguring hosts/ports
- checking and fixing user/password mappings
......
......@@ -209,121 +209,12 @@ To migrate the tracking database, run:
bundle exec rake geo:db:migrate
```
### Foreign Data Wrapper
> Introduced in GitLab 10.1.
Foreign Data Wrapper ([FDW](#fdw)) is used by the [Geo Log Cursor](#geo-log-cursor) and improves
the performance of many synchronization operations.
FDW is a PostgreSQL extension ([`postgres_fdw`](https://www.postgresql.org/docs/11/postgres-fdw.html)) that is enabled within
the Geo Tracking Database (on a **secondary** node), which allows it
to connect to the read-only database replica and perform queries and filter
data from both instances.
This persistent connection is configured as an FDW server
named `gitlab_secondary`. This configuration exists within the database's user
context only. To access the `gitlab_secondary`, GitLab needs to use the
same database user that had previously been configured.
The Geo Tracking Database accesses the read-only database replica via FDW as a regular user,
limited by its own restrictions. The credentials are configured as a
`USER MAPPING` associated with the `SERVER` mapped previously
(`gitlab_secondary`).
FDW configuration and credentials definition are managed automatically by the
Omnibus GitLab `gitlab-ctl reconfigure` command.
#### Refreshing the Foreign Tables
Whenever a new Geo node is configured or the database schema changes on the
**primary** node, you must refresh the foreign tables on the **secondary** node
by running the following:
```shell
bundle exec rake geo:db:refresh_foreign_tables
```
Failure to do this will prevent the **secondary** node from
functioning properly. The **secondary** node will generate error
messages, as the following PostgreSQL error:
```sql
ERROR: relation "gitlab_secondary.ci_job_artifacts" does not exist at character 323
STATEMENT: SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
FROM pg_attribute a LEFT JOIN pg_attrdef d
ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE a.attrelid = '"gitlab_secondary"."ci_job_artifacts"'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
```
#### Accessing data from a Foreign Table
At the SQL level, all you have to do is `SELECT` data from `gitlab_secondary.*`.
Here's an example of how to access all projects from the Geo Tracking Database's FDW:
```sql
SELECT * FROM gitlab_secondary.projects;
```
As a more real-world example, this is how you filter for unarchived projects
on the Tracking Database:
```sql
SELECT project_registry.*
FROM project_registry
JOIN gitlab_secondary.projects
ON (project_registry.project_id = gitlab_secondary.projects.id
AND gitlab_secondary.projects.archived IS FALSE)
```
At the ActiveRecord level, we have additional Models that represent the
foreign tables. They must be mapped in a slightly different way, and they are read-only.
Check the existing FDW models in `ee/app/models/geo/fdw` for reference.
From a developer's perspective, it's no different than creating a model that
represents a Database View.
With the examples above, you can access the projects with:
```ruby
Geo::Fdw::Project.all
```
and to access the `ProjectRegistry` filtering by unarchived projects:
```ruby
# We have to use Arel here:
project_registry_table = Geo::ProjectRegistry.arel_table
fdw_project_table = Geo::Fdw::Project.arel_table
project_registry_table.join(fdw_project_table)
.on(project_registry_table[:project_id].eq(fdw_project_table[:id]))
.where((fdw_project_table[:archived]).eq(true)) # if you append `.to_sql` you can check generated query
```
## Finders
Geo uses [Finders](https://gitlab.com/gitlab-org/gitlab/tree/master/app/finders),
which are classes take care of the heavy lifting of looking up
projects/attachments/etc. in the tracking database and main database.
### Finders Performance
The Finders need to compare data from the main database with data in
the tracking database. For example, counting the number of synced
projects normally involves retrieving the project IDs from one
database and checking their state in the other database. This is slow
and requires a lot of memory.
To overcome this, the Finders use [FDW](#fdw), or Foreign Data
Wrappers. This allows a regular `JOIN` between the main database and
the tracking database.
## Redis
Redis on the **secondary** node works the same as on the **primary**
......@@ -397,12 +288,6 @@ migration do not need to run on the secondary nodes.
A database on each Geo **secondary** node that keeps state for the node
on which it resides. Read more in [Using the Tracking database](#using-the-tracking-database).
### FDW
Foreign Data Wrapper, or FDW, is a feature built-in in PostgreSQL. It
allows data to be queried from different data sources. In Geo, it's
used to query data from different PostgreSQL instances.
## Geo Event Log
The Geo **primary** stores events in the `geo_event_log` table. Each
......
......@@ -156,10 +156,6 @@ If you're using [GitLab Geo](../administration/geo/replication/index.md):
- We strongly recommend running Omnibus-managed instances as they are actively
developed and tested. We aim to be compatible with most external (not managed
by Omnibus) databases (for example, [AWS Relational Database Service (RDS)](https://aws.amazon.com/rds/)) but we don't guarantee compatibility.
- You must also ensure the `postgres_fdw` extension is loaded into every
GitLab database. This extension
[can be enabled](https://www.postgresql.org/docs/11/sql-createextension.html)
using a PostgreSQL super user.
## Puma settings
......
---
title: Remove Foreign Data Wrapper support from Geo docs
merge_request: 39485
author:
type: removed
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment