Commit 0cf15bff authored by Russell Dickenson

Merge branch 'eread/consolidate-gitaly-related-monitoring-information' into 'master'

Consolidate Gitaly and Gitaly Cluster-related monitoring information

See merge request gitlab-org/gitlab!67088
parents 625dd03e d145c9e5
......@@ -113,14 +113,5 @@ URL to use SSH.
### Observe Git protocol version of connections
To observe what Git protocol versions are being used in a
production environment, you can use the following Prometheus query:
```prometheus
sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service)
```
<!-- This link sporadically returns a 503 during automated link checking but is correct -->
You can view what Git protocol versions are being used on GitLab.com at
<https://dashboards.gitlab.com/d/pqlQq0xik/git-protocol-versions>.
For information on observing the Git protocol versions being used in a production environment,
see the [relevant documentation](gitaly/index.md#useful-queries).
......@@ -684,12 +684,8 @@ To configure Gitaly with TLS:
### Observe type of Gitaly connections
[Prometheus](../monitoring/prometheus/index.md) can be used to observe what type of connections Gitaly
is serving in a production environment. Use the following Prometheus query:
```prometheus
sum(rate(gitaly_connections_total[5m])) by (type)
```
For information on observing the type of Gitaly connections being served, see the
[relevant documentation](index.md#useful-queries).
## `gitaly-ruby`
......@@ -781,26 +777,8 @@ repository. In the example above:
- If another request comes in for a repository that has used up its 20 slots, that request gets
queued.
You can observe the behavior of this queue using the Gitaly logs and Prometheus:
- In the Gitaly logs, look for the string (or structured log field) `acquire_ms`. Messages that have
this field are reporting about the concurrency limiter.
- In Prometheus, look for the following metrics:
- `gitaly_rate_limiting_in_progress`.
- `gitaly_rate_limiting_queued`.
- `gitaly_rate_limiting_seconds`.
The metric definitions are available:
- Directly from Prometheus `/metrics` endpoint configured for Gitaly.
- Using [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/) on a
Grafana instance configured against Prometheus.
NOTE:
Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter, not
a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency doesn't
exceed 1, and the concurrency limiter has no effect.
You can observe the behavior of this queue using the Gitaly logs and Prometheus. For more
information, see the [relevant documentation](index.md#monitor-gitaly).
## Background Repository Optimization
......@@ -854,30 +832,11 @@ server" and "Gitaly client" refers to the same machine.
### Verify authentication monitoring
Before rotating a Gitaly authentication token, verify that you can monitor the authentication
behavior of your GitLab installation using Prometheus. Use the following Prometheus query:
```prometheus
sum(rate(gitaly_authentications_total[5m])) by (enforced, status)
```
Before rotating a Gitaly authentication token, verify that you can
[monitor the authentication behavior](index.md#useful-queries) of your GitLab installation using
Prometheus.
In a system where authentication is configured correctly and where you have live traffic, you
see something like this:
```prometheus
{enforced="true",status="ok"} 4424.985419441742
```
There may also be other numbers with rate 0. We care only about the non-zero numbers.
The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero
numbers, something is wrong in your configuration.
The `status="ok"` number reflects your current request rate. In the example above, Gitaly is
handling about 4000 requests per second.
Now that you have established that you can monitor the Gitaly authentication behavior of your GitLab
installation, you can begin the rest of the procedure.
You can then continue with the rest of the procedure.
### Enable "auth transitioning" mode
......@@ -1084,9 +1043,8 @@ closed it.
### Observe the cache
The cache can be observed in logs and using metrics.
#### Logs
The cache can be observed [using metrics](index.md#monitor-gitaly) and in the following logged
information:
|Message|Fields|Description|
|:---|:---|:---|
......@@ -1146,33 +1104,3 @@ Example:
"time":"2021-03-25T14:57:53.543Z"
}
```
#### Metrics
The following cache metrics are available.
|Metric|Type|Labels|Description|
|:---|:---|:---|:---|
|`gitaly_pack_objects_cache_enabled`|gauge|`dir`,`max_age`|Set to `1` when the cache is enabled via the Gitaly configuration file|
|`gitaly_pack_objects_cache_lookups_total`|counter|`result`|Hit/miss counter for cache lookups|
|`gitaly_pack_objects_generated_bytes_total`|counter||Number of bytes written into the cache|
|`gitaly_pack_objects_served_bytes_total`|counter||Number of bytes read from the cache|
|`gitaly_streamcache_filestore_disk_usage_bytes`|gauge|`dir`|Total size of cache files|
|`gitaly_streamcache_index_entries`|gauge|`dir`|Number of entries in the cache|
Some of these metrics start with `gitaly_streamcache`
because they are generated by the "streamcache" internal library
package in Gitaly.
Example:
```plaintext
gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1
gitaly_pack_objects_cache_lookups_total{result="hit"} 2
gitaly_pack_objects_cache_lookups_total{result="miss"} 1
gitaly_pack_objects_generated_bytes_total 2.618649e+07
gitaly_pack_objects_served_bytes_total 7.855947e+07
gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07
gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
```
......@@ -267,13 +267,7 @@ The primary node is chosen to serve the request if:
- There are no up-to-date nodes.
- Any other error occurs during node selection.
To track distribution of read operations, you can use the `gitaly_praefect_read_distribution`
Prometheus counter metric. It has two labels:
- `virtual_storage`.
- `storage`.
They reflect configuration defined for this instance of Praefect.
You can [monitor distribution of reads](#monitor-gitaly-cluster) using Prometheus.
#### Strong consistency
......@@ -312,6 +306,137 @@ For configuration information, see [Configure replication factor](praefect.md#co
For more information on configuring Gitaly Cluster, see [Configure Gitaly Cluster](praefect.md).
## Monitor Gitaly and Gitaly Cluster
You can use the available logs and [Prometheus metrics](../monitoring/prometheus/index.md) to
monitor Gitaly and Gitaly Cluster (Praefect).
Metric definitions are available:
- Directly from the Prometheus `/metrics` endpoint configured for Gitaly.
- Using [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/) on a
Grafana instance configured against Prometheus.
### Monitor Gitaly
You can observe the behavior of [queued requests](configure_gitaly.md#limit-rpc-concurrency) using
the Gitaly logs and Prometheus:
- In the [Gitaly logs](../logs.md#gitaly-logs), look for the string (or structured log field)
`acquire_ms`. Messages that have this field are reporting about the concurrency limiter.
- In Prometheus, look for the following metrics (see the example queries below):
- `gitaly_rate_limiting_in_progress`.
- `gitaly_rate_limiting_queued`.
- `gitaly_rate_limiting_seconds`.
Although the name of the Prometheus metric contains `rate_limiting`, it's a concurrency limiter,
not a rate limiter. If a Gitaly client makes 1,000 requests in a row very quickly, concurrency
doesn't exceed 1, and the concurrency limiter has no effect.
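For example, with a Prometheus instance scraping Gitaly, queries along the following lines give a
starting point for watching the limiter. This is a sketch rather than an authoritative dashboard:
the five-minute window is arbitrary, the first two metrics are assumed to be gauges, and the
quantile query assumes `gitaly_rate_limiting_seconds` is exposed as a histogram (check your
`/metrics` output).

```prometheus
# Requests currently waiting in the concurrency limiter queue (assuming a gauge)
sum(gitaly_rate_limiting_queued)

# Requests currently holding a concurrency slot (assuming a gauge)
sum(gitaly_rate_limiting_in_progress)

# Approximate 95th percentile time spent waiting to acquire a slot, assuming
# the metric is exposed as a histogram with *_bucket series
histogram_quantile(0.95, sum(rate(gitaly_rate_limiting_seconds_bucket[5m])) by (le))
```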
The following [pack-objects cache](configure_gitaly.md#pack-objects-cache) metrics are available:
- `gitaly_pack_objects_cache_enabled`, a gauge set to `1` when the cache is enabled. Available
labels: `dir` and `max_age`.
- `gitaly_pack_objects_cache_lookups_total`, a counter for cache lookups. Available label: `result`.
- `gitaly_pack_objects_generated_bytes_total`, a counter for the number of bytes written into the
cache.
- `gitaly_pack_objects_served_bytes_total`, a counter for the number of bytes read from the cache.
- `gitaly_streamcache_filestore_disk_usage_bytes`, a gauge for the total size of cache files.
Available label: `dir`.
- `gitaly_streamcache_index_entries`, a gauge for the number of entries in the cache. Available
label: `dir`.
Some of these metrics start with `gitaly_streamcache` because they are generated by the
`streamcache` internal library package in Gitaly.
Example:
```plaintext
gitaly_pack_objects_cache_enabled{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache",max_age="300"} 1
gitaly_pack_objects_cache_lookups_total{result="hit"} 2
gitaly_pack_objects_cache_lookups_total{result="miss"} 1
gitaly_pack_objects_generated_bytes_total 2.618649e+07
gitaly_pack_objects_served_bytes_total 7.855947e+07
gitaly_streamcache_filestore_disk_usage_bytes{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 2.6200152e+07
gitaly_streamcache_filestore_removed_total{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
gitaly_streamcache_index_entries{dir="/var/opt/gitlab/git-data/repositories/+gitaly/PackObjectsCache"} 1
```
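Building on the lookup counter above, a cache hit ratio can be derived with a query like the
following. This is a sketch only; the five-minute window is arbitrary:

```prometheus
# Fraction of pack-objects cache lookups that were hits over the last 5 minutes
sum(rate(gitaly_pack_objects_cache_lookups_total{result="hit"}[5m]))
/
sum(rate(gitaly_pack_objects_cache_lookups_total[5m]))
```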
#### Useful queries
The following are useful queries for monitoring Gitaly:
- Use the following Prometheus query to observe the
  [type of connections](configure_gitaly.md#enable-tls-support) Gitaly is serving in a production
  environment:
```prometheus
sum(rate(gitaly_connections_total[5m])) by (type)
```
- Use the following Prometheus query to monitor the
[authentication behavior](configure_gitaly.md#observe-type-of-gitaly-connections) of your GitLab
installation:
```prometheus
sum(rate(gitaly_authentications_total[5m])) by (enforced, status)
```
In a system where authentication is configured correctly and where you have live traffic, you
see something like this:
```prometheus
{enforced="true",status="ok"} 4424.985419441742
```
There may also be other numbers with rate 0, but you only need to take note of the non-zero numbers.
The only non-zero number should have `enforced="true",status="ok"`. If you have other non-zero
numbers, something is wrong in your configuration.
The `status="ok"` number reflects your current request rate. In the example above, Gitaly is
handling about 4000 requests per second.
- Use the following Prometheus query to observe the [Git protocol versions](../git_protocol.md)
being used in a production environment:
```prometheus
sum(rate(gitaly_git_protocol_requests_total[1m])) by (grpc_method,git_protocol,grpc_service)
```
### Monitor Gitaly Cluster
To monitor Gitaly Cluster (Praefect), you can use these Prometheus metrics:
- `gitaly_praefect_read_distribution`, a counter to track [distribution of reads](#distributed-reads).
  Its two labels, `virtual_storage` and `storage`, reflect the configuration defined for this
  instance of Praefect. See the example queries after this list.
- `gitaly_praefect_replication_latency_bucket`, a histogram measuring the amount of time it takes
for replication to complete once the replication job starts. Available in GitLab 12.10 and later.
- `gitaly_praefect_replication_delay_bucket`, a histogram measuring how much time passes between
when the replication job is created and when it starts. Available in GitLab 12.10 and later.
- `gitaly_praefect_node_latency_bucket`, a histogram measuring the latency in Gitaly returning
health check information to Praefect. This indicates Praefect connection saturation. Available in
GitLab 12.10 and later.
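For example, the following queries show how reads are spread across Gitaly nodes and how long
replication takes once it starts. They are a sketch; adjust the rate window and quantile to your
needs:

```prometheus
# Read requests per second served by each Gitaly node, per virtual storage
sum(rate(gitaly_praefect_read_distribution[5m])) by (virtual_storage, storage)

# Approximate 95th percentile time for replication to complete once a
# replication job starts
histogram_quantile(0.95, sum(rate(gitaly_praefect_replication_latency_bucket[5m])) by (le))
```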
To monitor [strong consistency](#strong-consistency), you can use the following Prometheus metrics
(example queries follow the list):
- `gitaly_praefect_transactions_total`, the number of transactions created and voted on.
- `gitaly_praefect_subtransactions_per_transaction_total`, the number of times nodes cast a vote for
a single transaction. This can happen multiple times if multiple references are getting updated in
a single transaction.
- `gitaly_praefect_voters_per_transaction_total`, the number of Gitaly nodes taking part in a
  transaction.
- `gitaly_praefect_transactions_delay_seconds`, the server-side delay introduced by waiting for the
transaction to be committed.
- `gitaly_hook_transaction_voting_delay_seconds`, the client-side delay introduced by waiting for
the transaction to be committed.
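For example, the transaction counters above can be watched with queries like the following. This is
a sketch: it assumes the delay metric is exposed as a histogram with `_bucket` series, so verify
against your `/metrics` output.

```prometheus
# Transactions created and voted on per second
sum(rate(gitaly_praefect_transactions_total[5m]))

# Approximate 95th percentile server-side delay waiting for transactions to be
# committed, assuming the delay metric is exposed as a histogram
histogram_quantile(0.95, sum(rate(gitaly_praefect_transactions_delay_seconds_bucket[5m])) by (le))
```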
## Do not bypass Gitaly
GitLab doesn't advise directly accessing Gitaly repositories stored on disk with a Git client,
......
......@@ -1094,19 +1094,8 @@ Feature.enable(:gitaly_reference_transactions)
Feature.disable(:gitaly_reference_transactions_primary_wins)
```
To monitor strong consistency, you can use the following Prometheus metrics:
- `gitaly_praefect_transactions_total`: Number of transactions created and
voted on.
- `gitaly_praefect_subtransactions_per_transaction_total`: Number of times
nodes cast a vote for a single transaction. This can happen multiple times if
multiple references are getting updated in a single transaction.
- `gitaly_praefect_voters_per_transaction_total`: Number of Gitaly nodes taking
part in a transaction.
- `gitaly_praefect_transactions_delay_seconds`: Server-side delay introduced by
waiting for the transaction to be committed.
- `gitaly_hook_transaction_voting_delay_seconds`: Client-side delay introduced
by waiting for the transaction to be committed.
For information on monitoring strong consistency, see the
[relevant documentation](index.md#monitor-gitaly-cluster).
## Configure replication factor
......
......@@ -71,7 +71,7 @@ Remember to disable `transitioning` when you are done
changing your token settings.
All authentication attempts are counted in Prometheus under
the `gitaly_authentications_total` metric.
the [`gitaly_authentications_total` metric](index.md#useful-queries).
### TLS
......
......@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
To enable the GitLab Prometheus metrics:
1. Log in to GitLab as a user with [administrator permissions](../../../user/permissions.md).
1. Log in to GitLab as a user with the Administrator [role](../../../user/permissions.md).
1. On the top bar, select **Menu >** **{admin}** **Admin**.
1. On the left sidebar, select **Settings > Metrics and profiling**.
1. Find the **Metrics - Prometheus** section, and select **Add link to Prometheus**.
......@@ -153,15 +153,8 @@ The following metrics can be controlled by feature flags:
## Praefect metrics
You can [configure Praefect to report metrics](../../gitaly/praefect.md#praefect).
These are some of the Praefect metrics served from the `/metrics` path on the [configured port](index.md#changing-the-port-and-address-prometheus-listens-on)
(9652 by default).
| Metric | Type | Since | Description | Labels |
| :----- | :--- | ----: | :---------- | :----- |
| `gitaly_praefect_replication_latency_bucket` | Histogram | 12.10 | The amount of time it takes for replication to complete once the replication job starts. | |
| `gitaly_praefect_replication_delay_bucket` | Histogram | 12.10 | A measure of how much time passes between when the replication job is created and when it starts. | |
| `gitaly_praefect_node_latency_bucket` | Histogram | 12.10 | The latency in Gitaly returning health check information to Praefect. This indicates Praefect connection saturation. | |
You can [configure Praefect](../../gitaly/praefect.md#praefect) to report metrics. For information
on available metrics, see the [relevant documentation](../../gitaly/index.md#monitor-gitaly-cluster).
## Sidekiq metrics
......
......@@ -8,18 +8,19 @@ info: To determine the technical writer assigned to the Stage/Group associated w
[Prometheus](https://prometheus.io) is a powerful time-series monitoring service, providing a flexible
platform for monitoring GitLab and other software products.
GitLab provides out-of-the-box monitoring with Prometheus, offering easy
access to high-quality time-series monitoring of GitLab services.
> **Notes:**
>
> - Prometheus and the various exporters listed in this page are bundled in the
> Omnibus GitLab package. Check each exporter's documentation for the timeline
> they got added. For installations from source you must install them
> yourself. Over subsequent releases additional GitLab metrics are captured.
> - Prometheus services are on by default with GitLab 9.0.
> - Prometheus and its exporters don't authenticate users, and are available
> to anyone who can access them.
Prometheus and the various exporters listed on this page are bundled in the
Omnibus GitLab package. Check each exporter's documentation for when it was
added. For installations from source, you must install them yourself. Over
subsequent releases, additional GitLab metrics are captured.
Prometheus services are on by default.
Prometheus and its exporters don't authenticate users, and are available to anyone who can access
them.
## Overview
......@@ -33,7 +34,7 @@ dashboard tool like [Grafana](https://grafana.com).
For installations from source, you must install and configure it yourself.
Prometheus and its exporters are on by default, starting with GitLab 9.0.
Prometheus and its exporters are on by default.
Prometheus runs as the `gitlab-prometheus` user and listens on
`http://localhost:9090`. By default, Prometheus is only accessible from the GitLab server itself.
Each exporter is automatically set up as a
......