Commit ae3ce141 authored by Amy Qualls's avatar Amy Qualls

Merge branch 'kpaizee-extract-develop-test-service-ping' into 'master'

Move 'Develop and test Service Ping' to its own page

See merge request gitlab-org/gitlab!68309
parents d71f1b02 19220cea
......@@ -21,7 +21,7 @@ When you are optimizing your SQL queries, there are two dimensions to pay attent
| Queries in a migration | `100ms` | This is different than the total [migration time](migration_style_guide.md#how-long-a-migration-should-take). |
| Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. |
| Background migrations | `1s` | |
| Service Ping | `1s` | See the [Service Ping docs](service_ping/index.md#develop-and-test-service-ping) for more details. |
| Service Ping | `1s` | See the [Service Ping docs](service_ping/implement.md) for more details. |
- When analyzing your query's performance, pay attention to if the time you are seeing is on a [cold or warm cache](#cold-and-warm-cache). These guidelines apply for both cache types.
- When working with batched queries, change the range and batch size to see how it effects the query timing and caching.
......
---
stage: Growth
group: Product Intelligence
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Develop and test Service Ping
To add a new metric and test Service Ping:
1. [Name and place the metric](#name-and-place-the-metric)
1. [Test counters manually using your Rails console](#test-counters-manually-using-your-rails-console)
1. [Generate the SQL query](#generate-the-sql-query)
1. [Optimize queries with `#database-lab`](#optimize-queries-with-database-lab)
1. [Add the metric definition](#add-the-metric-definition)
1. [Add the metric to the Versions Application](#add-the-metric-to-the-versions-application)
1. [Create a merge request](#create-a-merge-request)
1. [Verify your metric](#verify-your-metric)
1. [Set up and test Service Ping locally](#set-up-and-test-service-ping-locally)
## Name and place the metric
Add the metric in one of the top-level keys:
- `settings`: for settings related metrics.
- `counts_weekly`: for counters that have data for the most recent 7 days.
- `counts_monthly`: for counters that have data for the most recent 28 days.
- `counts`: for counters that have data for all time.
### How to get a metric name suggestion
The metric YAML generator can suggest a metric name for you.
To generate a metric name suggestion, first instrument the metric at the provided `key_path`.
Then, generate the metric's YAML definition and
return to the instrumentation and update it.
1. Add the metric instrumentation to `lib/gitlab/usage_data.rb` inside one
of the [top-level keys](#name-and-place-the-metric), using any name you choose.
1. Run the [metrics YAML generator](metrics_dictionary.md#metrics-definition-and-validation).
1. Use the metric name suggestion to select a suitable metric name.
1. Update the instrumentation you created in the first step and change the metric name to the suggested name.
1. Update the metric's YAML definition with the correct `key_path`.
## Test counters manually using your Rails console
```ruby
# count
Gitlab::UsageData.count(User.active)
Gitlab::UsageData.count(::Clusters::Cluster.aws_installed.enabled, :cluster_id)
# count distinct
Gitlab::UsageData.distinct_count(::Project, :creator_id)
Gitlab::UsageData.distinct_count(::Note.with_suggestions.where(time_period), :author_id, start: ::User.minimum(:id), finish: ::User.maximum(:id))
```
## Generate the SQL query
Your Rails console returns the generated SQL queries. For example:
```ruby
pry(main)> Gitlab::UsageData.count(User.active)
(2.6ms) SELECT "features"."key" FROM "features"
(15.3ms) SELECT MIN("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
(2.4ms) SELECT MAX("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
(1.9ms) SELECT COUNT("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4)) AND "users"."id" BETWEEN 1 AND 100000
```
## Optimize queries with `#database-lab`
`#database-lab` is a Slack channel that uses a production-sized environment to test your queries.
Paste the SQL query into `#database-lab` to see how the query performs at scale.
- GitLab.com's production database has a 15 second timeout.
- Any single query must stay below the [1 second execution time](../query_performance.md#timing-guidelines-for-queries) with cold caches.
- Add a specialized index on columns involved to reduce the execution time.
To understand the query's execution, we add the following information
to a merge request description:
- For counters that have a `time_period` test, we add information for both:
- `time_period = {}` for all time periods.
- `time_period = { created_at: 28.days.ago..Time.current }` for the last 28 days.
- Execution plan and query time before and after optimization.
- Query generated for the index and time.
- Migration output for up and down execution.
We also use `#database-lab` and [explain.depesz.com](https://explain.depesz.com/). For more details, see the [database review guide](../database_review.md#preparation-when-adding-or-modifying-queries).
### Optimization recommendations and examples
- Use specialized indexes. For examples, see these merge requests:
- [Example 1](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26871)
- [Example 2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26445)
- Use defined `start` and `finish`, and simple queries.
These values can be memoized and reused, as in this [example merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/37155).
- Avoid joins and write the queries as simply as possible,
as in this [example merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/36316).
- Set a custom `batch_size` for `distinct_count`, as in this [example merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/38000).
## Add the metric definition
See the [Metrics Dictionary guide](metrics_dictionary.md) for more information.
## Add the metric to the Versions Application
Check if the new metric must be added to the Versions Application. See the `usage_data` [schema](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/master/db/schema.rb#L147) and Service Data [parameters accepted](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/master/app/services/usage_ping.rb). Any metrics added under the `counts` key are saved in the `stats` column.
## Create a merge request
Create a merge request for the new Service Ping metric, and do the following:
- Add the `feature` label to the merge request. A metric is a user-facing change and is part of expanding the Service Ping feature.
- Add a changelog entry that complies with the [changelog entries guide](../changelog.md).
- Ask for a Product Intelligence review.
On GitLab.com, we have DangerBot set up to monitor Product Intelligence related files and recommend a [Product Intelligence review](review_guidelines.md).
## Verify your metric
On GitLab.com, the Product Intelligence team regularly [monitors Service Ping](https://gitlab.com/groups/gitlab-org/-/epics/6000).
They may alert you that your metrics need further optimization to run quicker and with greater success.
The Service Ping JSON payload for GitLab.com is shared in the
[#g_product_intelligence](https://gitlab.slack.com/archives/CL3A7GFPF) Slack channel every week.
You may also use the [Service Ping QA dashboard](https://app.periscopedata.com/app/gitlab/632033/Usage-Ping-QA) to check how well your metric performs.
The dashboard allows filtering by GitLab version, by "Self-managed" and "SaaS", and shows you how many failures have occurred for each metric. Whenever you notice a high failure rate, you can re-optimize your metric.
## Set up and test Service Ping locally
To set up Service Ping locally, you must:
1. [Set up local repositories](#set-up-local-repositories).
1. [Test local setup](#test-local-setup).
1. (Optional) [Test Prometheus-based Service Ping](#test-prometheus-based-service-ping).
### Set up local repositories
1. Clone and start [GitLab](https://gitlab.com/gitlab-org/gitlab-development-kit).
1. Clone and start [Versions Application](https://gitlab.com/gitlab-services/version-gitlab-com).
Make sure you run `docker-compose up` to start a PostgreSQL and Redis instance.
1. Point GitLab to the Versions Application endpoint instead of the default endpoint:
1. Open [service_ping/submit_service.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/service_ping/submit_service.rb#L5) in your local and modified `PRODUCTION_URL`.
1. Set it to the local Versions Application URL: `http://localhost:3000/usage_data`.
### Test local setup
1. Using the `gitlab` Rails console, manually trigger Service Ping:
```ruby
ServicePing::SubmitService.new.execute
```
1. Use the `versions` Rails console to check the Service Ping was successfully received,
parsed, and stored in the Versions database:
```ruby
UsageData.last
```
## Test Prometheus-based Service Ping
If the data submitted includes metrics [queried from Prometheus](index.md#prometheus-queries)
you want to inspect and verify, you must:
- Ensure that a Prometheus server is running locally.
- Ensure the respective GitLab components are exporting metrics to the Prometheus server.
If you do not need to test data coming from Prometheus, no further action
is necessary. Service Ping should degrade gracefully in the absence of a running Prometheus server.
Three kinds of components may export data to Prometheus, and are included in Service Ping:
- [`node_exporter`](https://github.com/prometheus/node_exporter): Exports node metrics
from the host machine.
- [`gitlab-exporter`](https://gitlab.com/gitlab-org/gitlab-exporter): Exports process metrics
from various GitLab components.
- Other various GitLab services, such as Sidekiq and the Rails server, which export their own metrics.
### Test with an Omnibus container
This is the recommended approach to test Prometheus-based Service Ping.
To verify your change, build a new Omnibus image from your code branch using CI/CD, download the image,
and run a local container instance:
1. From your merge request, select the `qa` stage, then trigger the `package-and-qa` job. This job triggers an Omnibus
build in a [downstream pipeline of the `omnibus-gitlab-mirror` project](https://gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/-/pipelines).
1. In the downstream pipeline, wait for the `gitlab-docker` job to finish.
1. Open the job logs and locate the full container name including the version. It takes the following form: `registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>`.
1. On your local machine, make sure you are signed in to the GitLab Docker registry. You can find the instructions for this in
[Authenticate to the GitLab Container Registry](../../user/packages/container_registry/index.md#authenticate-with-the-container-registry).
1. Once signed in, download the new image by using `docker pull registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>`
1. For more information about working with and running Omnibus GitLab containers in Docker, refer to [GitLab Docker images](https://docs.gitlab.com/omnibus/docker/README.html) in the Omnibus documentation.
### Test with GitLab development toolkits
This is the less recommended approach, because it comes with a number of difficulties when emulating a real GitLab deployment.
The [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) is not set up to run a Prometheus server or `node_exporter` alongside other GitLab components. If you would
like to do so, [Monitoring the GDK with Prometheus](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/prometheus/index.md#monitoring-the-gdk-with-prometheus) is a good start.
The [GCK](https://gitlab.com/gitlab-org/gitlab-compose-kit) has limited support for testing Prometheus based Service Ping.
By default, it comes with a fully configured Prometheus service that is set up to scrape a number of components.
However, it has the following limitations:
- It does not run a `gitlab-exporter` instance, so several `process_*` metrics from services such as Gitaly may be missing.
- While it runs a `node_exporter`, `docker-compose` services emulate hosts, meaning that it normally reports itself as not associated
with any of the other running services. That is not how node metrics are reported in a production setup, where `node_exporter`
always runs as a process alongside other GitLab components on any given node. For Service Ping, none of the node data would therefore
appear to be associated to any of the services running, because they all appear to be running on different hosts. To alleviate this problem, the `node_exporter` in GCK was arbitrarily "assigned" to the `web` service, meaning only for this service `node_*` metrics appears in Service Ping.
......@@ -867,197 +867,6 @@ We return fallback values in these cases:
| Timeouts, general failures | -1 |
| Standard errors in counters | -2 |
## Develop and test Service Ping
### 1. Name and place the metric
Add the metric in one of the top level keys:
- `settings`: for settings related metrics.
- `counts_weekly`: for counters that have data for the most recent 7 days.
- `counts_monthly`: for counters that have data for the most recent 28 days.
- `counts`: for counters that have data for all time.
#### How to get a metric name suggestion
The metric YAML generator can suggest a metric name for you. To generate a metric name suggestion,
first instrument the metric at the provided `key_path`, generate the metrics YAML definition, then
return to the instrumentation and update it.
1. Add the metric instrumentation to `lib/gitlab/usage_data.rb` inside one
of the [top level keys](index.md#1-name-and-place-the-metric), using any name you choose.
1. Run the [metrics YAML generator](metrics_dictionary.md#metrics-definition-and-validation).
1. Use the metric name suggestion to select a suitable metric name.
1. Update the instrumentation you created in the first step and change the metric name to the suggested name.
1. Update the metric's YAML definition with the correct `key_path`.
### 2. Use your Rails console to manually test counters
```ruby
# count
Gitlab::UsageData.count(User.active)
Gitlab::UsageData.count(::Clusters::Cluster.aws_installed.enabled, :cluster_id)
# count distinct
Gitlab::UsageData.distinct_count(::Project, :creator_id)
Gitlab::UsageData.distinct_count(::Note.with_suggestions.where(time_period), :author_id, start: ::User.minimum(:id), finish: ::User.maximum(:id))
```
### 3. Generate the SQL query
Your Rails console returns the generated SQL queries.
Example:
```ruby
pry(main)> Gitlab::UsageData.count(User.active)
(2.6ms) SELECT "features"."key" FROM "features"
(15.3ms) SELECT MIN("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
(2.4ms) SELECT MAX("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
(1.9ms) SELECT COUNT("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4)) AND "users"."id" BETWEEN 1 AND 100000
```
### 4. Optimize queries with #database-lab
Paste the SQL query into `#database-lab` to see how the query performs at scale.
- `#database-lab` is a Slack channel which uses a production-sized environment to test your queries.
- GitLab.com's production database has a 15 second timeout.
- Any single query must stay below [1 second execution time](../query_performance.md#timing-guidelines-for-queries) with cold caches.
- Add a specialized index on columns involved to reduce the execution time.
To have an understanding of the query's execution we add in the MR description the following information:
- For counters that have a `time_period` test we add information for both cases:
- `time_period = {}` for all time periods
- `time_period = { created_at: 28.days.ago..Time.current }` for last 28 days period
- Execution plan and query time before and after optimization
- Query generated for the index and time
- Migration output for up and down execution
We also use `#database-lab` and [explain.depesz.com](https://explain.depesz.com/). For more details, see the [database review guide](../database_review.md#preparation-when-adding-or-modifying-queries).
#### Optimization recommendations and examples
- Use specialized indexes [example 1](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26871), [example 2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26445).
- Use defined `start` and `finish`, and simple queries. These values can be memoized and reused, [example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/37155).
- Avoid joins and write the queries as simply as possible, [example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/36316).
- Set a custom `batch_size` for `distinct_count`, [example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/38000).
### 5. Add the metric definition
[Check Metrics Dictionary Guide](metrics_dictionary.md)
### 6. Add new metric to Versions Application
Check if the new metric must be added to the Versions Application. See `usage_data` [schema](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/master/db/schema.rb#L147) and Service Data [parameters accepted](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/master/app/services/usage_ping.rb). Any metrics added under the `counts` key are saved in the `stats` column.
### 7. Add the feature label
Add the `feature` label to the Merge Request for new Service Ping metrics. These are user-facing changes and are part of expanding the Service Ping feature.
### 8. Add a changelog entry
Ensure you comply with the [Changelog entries guide](../changelog.md).
### 9. Ask for a Product Intelligence review
On GitLab.com, we have DangerBot set up to monitor Product Intelligence related files and recommend a [Product Intelligence review](review_guidelines.md).
### 10. Verify your metric
On GitLab.com, the Product Intelligence team regularly [monitors Service Ping](https://gitlab.com/groups/gitlab-org/-/epics/6000).
They may alert you that your metrics need further optimization to run quicker and with greater success.
The Service Ping JSON payload for GitLab.com is shared in the
[#g_product_intelligence](https://gitlab.slack.com/archives/CL3A7GFPF) Slack channel every week.
You may also use the [Service Ping QA dashboard](https://app.periscopedata.com/app/gitlab/632033/Usage-Ping-QA) to check how well your metric performs. The dashboard allows filtering by GitLab version, by "Self-managed" & "SaaS" and shows you how many failures have occurred for each metric. Whenever you notice a high failure rate, you may re-optimize your metric.
### Service Ping local setup
To set up Service Ping locally, you must:
1. [Set up local repositories](#set-up-local-repositories).
1. [Test local setup](#test-local-setup).
1. (Optional) [Test Prometheus-based Service Ping](#test-prometheus-based-service-ping).
#### Set up local repositories
1. Clone and start [GitLab](https://gitlab.com/gitlab-org/gitlab-development-kit).
1. Clone and start [Versions Application](https://gitlab.com/gitlab-services/version-gitlab-com).
Make sure to run `docker-compose up` to start a PostgreSQL and Redis instance.
1. Point GitLab to the Versions Application endpoint instead of the default endpoint:
1. Open [service_ping/submit_service.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/service_ping/submit_service.rb#L5) in your local and modified `PRODUCTION_URL`.
1. Set it to the local Versions Application URL `http://localhost:3000/usage_data`.
#### Test local setup
1. Using the `gitlab` Rails console, manually trigger Service Ping:
```ruby
ServicePing::SubmitService.new.execute
```
1. Use the `versions` Rails console to check the Service Ping was successfully received,
parsed, and stored in the Versions database:
```ruby
UsageData.last
```
### Test Prometheus-based Service Ping
If the data submitted includes metrics [queried from Prometheus](#prometheus-queries)
you want to inspect and verify, you must:
- Ensure that a Prometheus server is running locally.
- Ensure the respective GitLab components are exporting metrics to the Prometheus server.
If you do not need to test data coming from Prometheus, no further action
is necessary. Service Ping should degrade gracefully in the absence of a running Prometheus server.
Three kinds of components may export data to Prometheus, and are included in Service Ping:
- [`node_exporter`](https://github.com/prometheus/node_exporter): Exports node metrics
from the host machine.
- [`gitlab-exporter`](https://gitlab.com/gitlab-org/gitlab-exporter): Exports process metrics
from various GitLab components.
- Other various GitLab services, such as Sidekiq and the Rails server, which export their own metrics.
#### Test with an Omnibus container
This is the recommended approach to test Prometheus based Service Ping.
The easiest way to verify your changes is to build a new Omnibus image from your code branch using CI/CD, download the image,
and run a local container instance:
1. From your merge request, select the `qa` stage, then trigger the `package-and-qa` job. This job triggers an Omnibus
build in a [downstream pipeline of the `omnibus-gitlab-mirror` project](https://gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/-/pipelines).
1. In the downstream pipeline, wait for the `gitlab-docker` job to finish.
1. Open the job logs and locate the full container name including the version. It takes the following form: `registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>`.
1. On your local machine, make sure you are signed in to the GitLab Docker registry. You can find the instructions for this in
[Authenticate to the GitLab Container Registry](../../user/packages/container_registry/index.md#authenticate-with-the-container-registry).
1. Once signed in, download the new image by using `docker pull registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>`
1. For more information about working with and running Omnibus GitLab containers in Docker, please refer to [GitLab Docker images](https://docs.gitlab.com/omnibus/docker/README.html) in the Omnibus documentation.
#### Test with GitLab development toolkits
This is the less recommended approach, because it comes with a number of difficulties when emulating a real GitLab deployment.
The [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) is not set up to run a Prometheus server or `node_exporter` alongside other GitLab components. If you would
like to do so, [Monitoring the GDK with Prometheus](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/prometheus/index.md#monitoring-the-gdk-with-prometheus) is a good start.
The [GCK](https://gitlab.com/gitlab-org/gitlab-compose-kit) has limited support for testing Prometheus based Service Ping.
By default, it already comes with a fully configured Prometheus service that is set up to scrape a number of components,
but with the following limitations:
- It does not run a `gitlab-exporter` instance, so several `process_*` metrics from services such as Gitaly may be missing.
- While it runs a `node_exporter`, `docker-compose` services emulate hosts, meaning that it normally reports itself as not associated
with any of the other running services. That is not how node metrics are reported in a production setup, where `node_exporter`
always runs as a process alongside other GitLab components on any given node. For Service Ping, none of the node data would therefore
appear to be associated to any of the services running, because they all appear to be running on different hosts. To alleviate this problem, the `node_exporter` in GCK was arbitrarily "assigned" to the `web` service, meaning only for this service `node_*` metrics appears in Service Ping.
## Aggregated metrics
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45979) in GitLab 13.6.
......
......@@ -51,9 +51,9 @@ are regular backend changes.
#### The Product Intelligence **reviewer** should
- Perform a first-pass review on the merge request and suggest improvements to the author.
- Check the [metrics location](index.md#1-name-and-place-the-metric) in
- Check the [metrics location](implement.md#name-and-place-the-metric) in
the Service Ping JSON payload.
- Suggest that the author checks the [naming suggestion](index.md#how-to-get-a-metric-name-suggestion) while
- Suggest that the author checks the [naming suggestion](implement.md#how-to-get-a-metric-name-suggestion) while
generating the metric's YAML definition.
- Add the `~database` label and ask for a [database review](../database_review.md) for
metrics that are based on Database.
......@@ -68,8 +68,7 @@ are regular backend changes.
- Check the file location. Consider the time frame, and if the file should be under `ee`.
- Check the tiers.
- Metrics instrumentations
- Recommend to use metrics instrumentation for new metrics addded to service with
[limitations](metrics_instrumentation.md#support-for-instrumentation-classes)
- Recommend using metrics instrumentation for new metrics, [if possible](metrics_instrumentation.md#support-for-instrumentation-classes).
- Approve the MR, and relabel the MR with `~"product intelligence::approved"`.
## Review workload distribution
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment