Commit 6c172a07 authored by Marcel Amirault

Merge branch '330218-aqualls-pseudonymizer' into 'master'

Tidy and polish the Pseudonymizer page

See merge request gitlab-org/gitlab!72549
parents 3334b89d 6478b448
@@ -557,7 +557,7 @@ supported by consolidated configuration form, refer to the following guides:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| **{dotted-circle}** No |
 | [Packages](packages/index.md#using-object-storage) (optional feature) | **{check-circle}** Yes |
 | [Dependency Proxy](packages/dependency_proxy.md#using-object-storage) (optional feature) **(PREMIUM SELF)** | **{check-circle}** Yes |
-| [Pseudonymizer](pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | **{dotted-circle}** No |
+| [Pseudonymizer](pseudonymizer.md) (optional feature) | **{dotted-circle}** No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | **{dotted-circle}** No |
 | [Terraform state files](terraform_state.md#using-object-storage) | **{check-circle}** Yes |
 | [GitLab Pages content](pages/index.md#using-object-storage) | **{check-circle}** Yes |
......
@@ -6,33 +6,38 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 # Pseudonymizer **(ULTIMATE)**
-> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/5532) in GitLab 11.1.
-As the GitLab database hosts sensitive information, using it unfiltered for analytics
-implies high security requirements. To help alleviate this constraint, the Pseudonymizer
-service is used to export GitLab data in a pseudonymized way.
+Your GitLab database contains sensitive information. To protect sensitive information
+when you run analytics on your database, you can use the Pseudonymizer service, which:
+1. Uses `HMAC(SHA256)` to mutate fields containing sensitive information.
+1. Preserves references (referential integrity) between fields.
+1. Exports your GitLab data, scrubbed of sensitive material.
 WARNING:
-This process is not impervious. If the source data is available, it's possible for
-a user to correlate data to the pseudonymized version.
+If the source data is available, users can compare and correlate the scrubbed data
+with the original.
-The Pseudonymizer currently uses `HMAC(SHA256)` to mutate fields that shouldn't
-be textually exported. This ensures that:
-- the end-user of the data source cannot infer/revert the pseudonymized fields
-- the referential integrity is maintained
+To generate a pseudonymized data set:
+1. [Configure Pseudonymizer](#configure-pseudonymizer) fields and output location.
+1. [Enable Pseudonymizer data collection](#enable-pseudonymizer-data-collection).
+1. Optional. [Generate a data set manually](#generate-data-set-manually).
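To illustrate the `HMAC(SHA256)` behavior described above, here is a minimal Ruby sketch. It is not GitLab's actual implementation: the secret (`SECRET`), the `pseudonymize` helper, and the `users` and `issues` records are all hypothetical. It shows that identical inputs map to identical digests, so references between records survive scrubbing:

```ruby
require 'openssl'

# Hypothetical secret for illustration only; not how GitLab derives its key.
SECRET = 'example-secret-key'

# Hypothetical helper: replaces a value with its keyed HMAC-SHA256 hex digest.
def pseudonymize(value, key = SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, value.to_s)
end

# Hypothetical records that reference each other through the email field.
users  = [{ id: 1, email: 'alice@example.com' }, { id: 2, email: 'bob@example.com' }]
issues = [{ id: 10, author_email: 'alice@example.com' }]

scrubbed_users  = users.map  { |u| u.merge(email: pseudonymize(u[:email])) }
scrubbed_issues = issues.map { |i| i.merge(author_email: pseudonymize(i[:author_email])) }

# The same input always yields the same digest, so the issue still points
# at the same (now pseudonymized) user.
same_author = scrubbed_issues[0][:author_email] == scrubbed_users[0][:email]
```

Because the digest is keyed, the scrubbed values stay stable only while the same key is used. As the warning notes, anyone who also holds the source data can still line up scrubbed rows with the originals.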
-## Configuration
+## Configure Pseudonymizer
-To configure the Pseudonymizer, you need to:
+To use the Pseudonymizer, configure both the fields you want to anonymize, and the location to
+store the scrubbed data:
-- Provide a manifest file that describes which fields should be included or
-  pseudonymized ([example `manifest.yml` file](https://gitlab.com/gitlab-org/gitlab/-/tree/master/config/pseudonymizer.yml)).
-  A default manifest is provided with the GitLab installation, using a relative file path that resolves from the Rails root.
-  Alternatively, you can use an absolute file path.
-- Use an object storage and specify the connection parameters in the `pseudonymizer.upload.connection` configuration option.
-[Read more about using object storage with GitLab](object_storage.md).
+1. **Create a manifest file**: This file describes the fields to include or pseudonymize.
+   - **Default manifest** - GitLab provides a default manifest in your GitLab installation
+     ([example `manifest.yml` file](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/pseudonymizer.yml)).
+     To use the example manifest file, use the `config/pseudonymizer.yml` relative path
+     when you configure connection parameters.
+   - **Custom manifest** - To use a custom manifest file, use the absolute path to
+     the file when you configure the connection parameters.
+1. **Configure connection parameters**: In the configuration method appropriate for
+   your version of GitLab, specify the [object storage](object_storage.md)
+   connection parameters (`pseudonymizer.upload.connection`).
 **For Omnibus installations:**
@@ -50,7 +55,7 @@ To configure the Pseudonymizer, you need to:
 }
 ```
-If you are using AWS IAM profiles, be sure to omit the AWS access key and secret access key/value pairs.
+If you are using AWS IAM profiles, omit the AWS access key and secret access key/value pairs.
 ```ruby
 gitlab_rails['pseudonymizer_upload_connection'] = {
@@ -85,24 +90,34 @@ To configure the Pseudonymizer, you need to:
 1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source)
    for the changes to take effect.
-## Usage
+## Enable Pseudonymizer data collection
+To enable data collection:
+1. On the top bar, select **Menu > Admin**.
+1. On the left sidebar, select **Settings > Metrics and Profiling**, then expand
+   **Pseudonymizer data collection**.
+1. Select **Enable Pseudonymizer data collection**.
+1. Select **Save changes**.
-You can optionally run the Pseudonymizer using the following environment variables:
-- `PSEUDONYMIZER_OUTPUT_DIR` - where to store the output CSV files (defaults to `/tmp`)
-- `PSEUDONYMIZER_BATCH` - the batch size when querying the DB (defaults to `100000`)
+## Generate data set manually
+You can also run the Pseudonymizer manually:
-```shell
-## Omnibus
-sudo gitlab-rake gitlab:db:pseudonymizer
-## Source
-sudo -u git -H bundle exec rake gitlab:db:pseudonymizer RAILS_ENV=production
-```
+1. Set these environment variables:
+   - `PSEUDONYMIZER_OUTPUT_DIR` - Where to store the output CSV files. Defaults to `/tmp`.
+     These commands produce CSV files that can be quite large. Make sure the directory
+     can store a file at least 10% of the size of your database.
+   - `PSEUDONYMIZER_BATCH` - The batch size when querying the database. Defaults to `100000`.
+1. Run the command appropriate for your application:
+   - **Omnibus GitLab**:
+     `sudo gitlab-rake gitlab:db:pseudonymizer`
+   - **Installations from source**:
+     `sudo -u git -H bundle exec rake gitlab:db:pseudonymizer RAILS_ENV=production`
-This produces some CSV files that might be very large, so make sure the
-`PSEUDONYMIZER_OUTPUT_DIR` has sufficient space. As a rule of thumb, at least
-10% of the database size is recommended.
-After the pseudonymizer has run, the output CSV files should be uploaded to the
-configured object storage and deleted from the local disk.
+After you run the command, upload the output CSV files to your configured object
+storage. After the upload completes, delete the output file from the local disk.
+## Related topics
+- [Using object storage with GitLab](object_storage.md).
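The environment variables and the 10% rule of thumb described above can be sketched in Ruby as a pre-flight check before a manual run. This is an illustration only, not part of GitLab: the helper names (`pseudonymizer_options`, `minimum_output_space`, `enough_space?`) are hypothetical, and the caller is assumed to supply the database size and free disk space in bytes.

```ruby
# Hypothetical helper: resolves the two documented variables, falling back
# to the documented defaults (`/tmp` and `100000`).
def pseudonymizer_options(env = ENV)
  {
    output_dir: env.fetch('PSEUDONYMIZER_OUTPUT_DIR', '/tmp'),
    batch_size: Integer(env.fetch('PSEUDONYMIZER_BATCH', '100000'))
  }
end

# Rule of thumb from the text: the output directory should be able to hold
# at least 10% of the database size. Integer ceiling division avoids
# floating-point rounding.
def minimum_output_space(db_size_bytes)
  (db_size_bytes + 9) / 10
end

# True when the free space covers the rule-of-thumb minimum.
def enough_space?(db_size_bytes, free_bytes)
  free_bytes >= minimum_output_space(db_size_bytes)
end

options = pseudonymizer_options({ 'PSEUDONYMIZER_OUTPUT_DIR' => '/var/tmp/pseudo' })
needed  = minimum_output_space(50_000_000_000) # a 50 GB database
```

Passing a plain hash instead of `ENV` keeps the sketch testable. In a real run, the variables would be set in the environment of the rake command.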
@@ -2085,7 +2085,7 @@ on what features you intend to use:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
 | [Packages](../packages/index.md#using-object-storage) (optional feature) | Yes |
 | [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (optional feature) | Yes |
-| [Pseudonymizer](../pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
+| [Pseudonymizer](../pseudonymizer.md) (optional feature) | No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
 | [Terraform state files](../terraform_state.md#using-object-storage) | Yes |
......
@@ -2091,7 +2091,7 @@ on what features you intend to use:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
 | [Packages](../packages/index.md#using-object-storage) (optional feature) | Yes |
 | [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (optional feature) | Yes |
-| [Pseudonymizer](../pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
+| [Pseudonymizer](../pseudonymizer.md) (optional feature) | No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
 | [Terraform state files](../terraform_state.md#using-object-storage) | Yes |
......
...@@ -352,7 +352,7 @@ Omnibus: ...@@ -352,7 +352,7 @@ Omnibus:
```ruby ```ruby
## Enable Redis ## Enable Redis
redis['enable'] = true redis['enable'] = true
# Avoid running unnecessary services on the Redis server # Avoid running unnecessary services on the Redis server
gitaly['enable'] = false gitaly['enable'] = false
postgresql['enable'] = false postgresql['enable'] = false
@@ -922,7 +922,7 @@ on what features you intend to use:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
 | [Packages](../packages/index.md#using-object-storage) (optional feature) | Yes |
 | [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (optional feature) | Yes |
-| [Pseudonymizer](../pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
+| [Pseudonymizer](../pseudonymizer.md) (optional feature) | No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
 | [Terraform state files](../terraform_state.md#using-object-storage) | Yes |
......
@@ -2039,7 +2039,7 @@ on what features you intend to use:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
 | [Packages](../packages/index.md#using-object-storage) (optional feature) | Yes |
 | [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (optional feature) | Yes |
-| [Pseudonymizer](../pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
+| [Pseudonymizer](../pseudonymizer.md) (optional feature) | No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
 | [Terraform state files](../terraform_state.md#using-object-storage) | Yes |
......
@@ -2105,7 +2105,7 @@ on what features you intend to use:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
 | [Packages](../packages/index.md#using-object-storage) (optional feature) | Yes |
 | [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (optional feature) | Yes |
-| [Pseudonymizer](../pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
+| [Pseudonymizer](../pseudonymizer.md) (optional feature) | No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
 | [Terraform state files](../terraform_state.md#using-object-storage) | Yes |
......
@@ -2033,7 +2033,7 @@ on what features you intend to use:
 | [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
 | [Packages](../packages/index.md#using-object-storage) (optional feature) | Yes |
 | [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (optional feature) | Yes |
-| [Pseudonymizer](../pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
+| [Pseudonymizer](../pseudonymizer.md) (optional feature) | No |
 | [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
 | [Terraform state files](../terraform_state.md#using-object-storage) | Yes |
......