Commit bed5ae42 authored by Gabriel Mazetto's avatar Gabriel Mazetto

Added FDW development documentation

Improve existing FDW documentation in `doc/development` to include
additional information about the architecture and how to use FDW as
a developer.
parent 785e55c2
...@@ -10,6 +10,7 @@ the diagram below and are described in more detail within this document. ...@@ -10,6 +10,7 @@ the diagram below and are described in more detail within this document.
## Replication layer ## Replication layer
Geo handles replication for different components: Geo handles replication for different components:
- [Database](#database-replication): includes the entire application, except cache and jobs. - [Database](#database-replication): includes the entire application, except cache and jobs.
- [Git repositories](#repository-replication): includes both projects and wikis. - [Git repositories](#repository-replication): includes both projects and wikis.
- [Uploaded blobs](#uploads-replication): includes anything from images attached on issues - [Uploaded blobs](#uploads-replication): includes anything from images attached on issues
...@@ -209,20 +210,38 @@ bundle exec rake geo:db:migrate ...@@ -209,20 +210,38 @@ bundle exec rake geo:db:migrate
### Foreign Data Wrapper ### Foreign Data Wrapper
The use of [FDW](#fdw) was introduced in GitLab 10.1. > Introduced in GitLab 10.1.
Foreign Data Wrapper ([FDW](#fdw)) is used by the [Geo Log Cursor](#geo-log-cursor) and improves
the performance of many synchronization operations.
This is useful for the [Geo Log Cursor](#geo-log-cursor) and improves FDW is a PostgreSQL extension ([`postgres_fdw`](https://www.postgresql.org/docs/current/postgres-fdw.html)) that is enabled within
the performance of some synchronization operations. the Geo Tracking Database (on a **secondary** node), which allows it
to connect to the readonly database replica and perform queries and filter
data from both instances.
While FDW is available in older versions of PostgreSQL, we needed to While FDW is available in older versions of PostgreSQL, we needed to
raise the minimum required version to 9.6 as this includes many raise the minimum required version to 9.6 as this includes many
performance improvements to the FDW implementation. performance improvements to the FDW implementation.
This persistent connection is configured as an FDW server
named `gitlab_secondary`. This configuration exists within the database's user
context only. To access the `gitlab_secondary`, GitLab needs to use the
same database user that had previously been configured.
The Geo Tracking Database accesses the readonly database replica via FDW as a regular user,
limited by its own restrictions. The credentials are configured as a
`USER MAPPING` associated with the `SERVER` mapped previously
(`gitlab_secondary`).
FDW configuration and credentials definition are managed automatically by the
Omnibus GitLab `gitlab-ctl reconfigure` command.
#### Refeshing the Foreign Tables #### Refeshing the Foreign Tables
Whenever the database schema changes on the **primary** node, the Whenever a new Geo node is configured or the database schema changes on the
**secondary** node will need to refresh its foreign tables by running **primary** node, you must refresh the foreign tables on the **secondary** node
the following: by running the following:
```sh ```sh
bundle exec rake geo:db:refresh_foreign_tables bundle exec rake geo:db:refresh_foreign_tables
...@@ -243,6 +262,53 @@ STATEMENT: SELECT a.attname, format_type(a.atttypid, a.atttypmod) ...@@ -243,6 +262,53 @@ STATEMENT: SELECT a.attname, format_type(a.atttypid, a.atttypmod)
ORDER BY a.attnum ORDER BY a.attnum
``` ```
#### Accessing data from a Foreign Table
At the SQL level, all you have to do is `SELECT` data from `gitlab_secondary.*`.
Here's an example of how to access all projects from the Geo Tracking Database's FDW:
```sql
SELECT * FROM gitlab_secondary.projects;
```
As a more real-world example, this is how you filter for unarchived projects
on the Tracking Database:
```sql
SELECT project_registry.*
FROM project_registry
JOIN gitlab_secondary.projects
ON (project_registry.project_id = gitlab_secondary.projects.id
AND gitlab_secondary.projects.archived IS FALSE)
```
At the ActiveRecord level, we have additional Models that represent the
foreign tables. They must be mapped in a slightly different way, and they are read-only.
Check the existing FDW models in `ee/app/models/geo/fdw` for reference.
From a developer's perspective, it's no different than creating a model that
represents a Database View.
With the examples above, you can access the projects with:
```ruby
Geo::Fdw::Project.all
```
and to access the `ProjectRegistry` filtering by unarchived projects:
```ruby
# We have to use Arel here:
project_registry_table = Geo::ProjectRegistry.arel_table
fdw_project_table = Geo::Fdw::Project.arel_table
project_registry_table.join(fdw_project_table)
.on(project_registry_table[:project_id].eq(fdw_project_table[:id]))
.where((fdw_project_table[:archived]).eq(true)) # if you append `.to_sql` you can check generated query
```
## Finders ## Finders
Geo uses [Finders](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/app/finders), Geo uses [Finders](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/app/finders),
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment