Commit dd3a3624 authored by Amy Qualls, committed by Bob Van Landuyt

Revise to bring closer to GitLab tone / style

Various fixes to bring the page closer to GitLab tone and style,
including:

- fix spelling
- bring each link onto one line, for better syntax highlighting
- revise for passive voice
- revise linking text to describe the page being linked to, rather
  than using words like 'here' and 'this issue' (semantic linking
  words)
- take out a forward-looking statement - we need to keep those out
  of our documentation for legal reasons
parent 9903e51b
@@ -52,7 +52,7 @@ be higher than those defined above.
 For example: for the web-service, we want at least 99.8% of requests
 to be faster than their target duration.
-These are the targets we use for alerting and service montoring. So
+These are the targets we use for alerting and service monitoring. So
 durations should be set keeping those into account. So we would not
 cause alerts. But the goal would be to set the urgency to a target
 that users would be satisfied with.
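As a rough illustration of the 99.8% target described in this hunk, the following Ruby sketch computes the share of requests that finish faster than a target duration and compares it with the objective. The helper names `apdex_ratio` and `meets_slo?` and the sample durations are hypothetical; they are not part of the GitLab codebase or of this commit.

```ruby
# Hypothetical helper: fraction of requests that were faster than the target.
def apdex_ratio(durations, target_seconds)
  return 1.0 if durations.empty?

  satisfied = durations.count { |duration| duration < target_seconds }
  satisfied.to_f / durations.size
end

# Hypothetical helper: true when the ratio meets the objective.
# The 0.998 default mirrors the 99.8% figure quoted in the page.
def meets_slo?(durations, target_seconds, slo: 0.998)
  apdex_ratio(durations, target_seconds) >= slo
end

# Illustrative data only: 999 fast requests and one slow one, 1s target.
sample = Array.new(999, 0.2) + [3.0]
puts apdex_ratio(sample, 1.0)
puts meets_slo?(sample, 1.0)
```

With this sample, one slow request in a thousand still satisfies the objective, while two slow requests in a thousand would not.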
@@ -63,7 +63,7 @@ error budget for stage groups.
 ## Adjusting request urgency
 Not all endpoints perform the same type of work, so it is possible to
-define different urgencies for different endpoints. An endpoint with a
+define different urgency levels for different endpoints. An endpoint with a
 lower urgency can have a longer request duration than endpoints that
 are high urgency.
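The idea of per-endpoint urgency levels can be sketched as a simple lookup from endpoint to target duration. This is only an illustration: the constant names, the `:default` and `:low` level names, the example endpoint, and the helper `target_duration_for` are all assumptions, and the 1s and 5s values are borrowed from the targets mentioned elsewhere on the page rather than from the real configuration.

```ruby
# Assumed urgency levels mapped to target durations in seconds.
TARGET_DURATIONS = { default: 1.0, low: 5.0 }.freeze

# Assumed per-endpoint overrides, keyed by a caller ID string.
# The endpoint below is made up for the example.
ENDPOINT_URGENCIES = {
  "Example::ReportsController#show" => :low
}.freeze

# Returns the target duration for an endpoint, falling back to the
# default urgency when no override is declared.
def target_duration_for(caller_id)
  urgency = ENDPOINT_URGENCIES.fetch(caller_id, :default)
  TARGET_DURATIONS.fetch(urgency)
end

puts target_duration_for("Example::ReportsController#show")
puts target_duration_for("Example::OtherController#index")
```

An endpoint with a lower urgency resolves to the longer target, and every other endpoint keeps the default.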
@@ -90,7 +90,7 @@ a case-by-case basis. Please take the following into account:
 1. The workload for some endpoints can sometimes differ greatly
 depending on the parameters specified by the caller. The urgency
-needs to accomodate that. In some cases, it might be interesting to
+needs to accommodate that. In some cases, it might be interesting to
 define a separate [application SLI](index.md#defining-a-new-sli)
 for what the endpoint is doing.
@@ -99,7 +99,7 @@ a case-by-case basis. Please take the following into account:
 target. For example, if the `MergeRequests::DraftsController` is
 hit for every merge request being viewed, but doesn't need to
 render anything in most cases, then we should pick the target that
-would still accomodate the endpoint performing work.
+would still accommodate the endpoint performing work.
 1. Consider the dependent resources consumed by the endpoint. If the endpoint
 loads a lot of data from Gitaly or the database and this is causing
@@ -117,10 +117,10 @@ a case-by-case basis. Please take the following into account:
 should try to keep as short as possible.
 1. Traffic characteristics should also be taken into account: if the
-trafic to the endpoint is bursty, like CI traffic spinning up a
+traffic to the endpoint is bursty, like CI traffic spinning up a
 big batch of jobs hitting the same endpoint, then having these
 endpoints take 5s is not acceptable from an infrastructure point of
-view. We cannot scale up the fleet fast enough to accomodate for
+view. We cannot scale up the fleet fast enough to accommodate for
 the incoming slow requests alongside the regular traffic.
 When lowering the urgency for an existing endpoint, please involve a
@@ -146,14 +146,14 @@ information in the logs to determine this:
 1. The table loads information for the busiest endpoints by
 default. You can speed things up by adding a filter for
-`json.caller_id.keyword` and adding the identifier you're intersted
+`json.caller_id.keyword` and adding the identifier you're interested
 in (for example: `Projects::RawController#show`).
 1. Check the [appropriate percentile duration](#request-apdex-slo) for
 the service the endpoint is handled by. The overall duration should
 be lower than the target you intend to set.
-1. Assess if the overall duration is below the intended target. Please also
+1. If the overall duration is below the intended target. Please also
 check the peaks over time in [this
 graph](https://log.gprd.gitlab.net/goto/9319c4a402461d204d13f3a4924a89fc)
 in Kibana. Here, the percentile in question should not peak above
@@ -235,21 +235,20 @@ end
 ### Error budget attribution and ownership
-This SLI is used for service level monitoring and will feed into
-the [error budget for stage
-groups](../stage_group_dashboards.md#error-budget) when opting in (see [this
-project](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525)). The
-endpoints for the SLI feed into a group's error budget based on the [feature
-category declared on it](../feature_categorization/index.md).
+This SLI is used for service level monitoring. It feeds into the
+[error budget for stage groups](../stage_group_dashboards.md#error-budget) when
+opting in. For more information, read the epic for
+[defining custom SLIs and incorporating them into error budgets](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525)).
+The endpoints for the SLI feed into a group's error budget based on the
+[feature category declared on it](../feature_categorization/index.md).
 To know which endpoints are included for your group, you can see the
-request rates on the [group
-dashboard for your
-group](https://dashboards.gitlab.net/dashboards/f/stage-groups/stage-groups). The
-"Puma apdex" log-link in the "Budget Attribution" row will show you
+request rates on the
+[group dashboard for your group](https://dashboards.gitlab.net/dashboards/f/stage-groups/stage-groups).
+In the **Budget Attribution** row, the **Puma apdex** log link shows you
 how many requests are not meeting a 1s or 5s target.
-Learn more about the content of the dashboard in [the
-documentation](../stage_group_dashboards.md). We intend on iterating
-on the exploration of the error budget itself in [this
-issue](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1365).
+Learn more about the content of the dashboard in the documentation for
+[Dashboards for stage groups](../stage_group_dashboards.md). For more information
+on our exploration of the error budget itself, read the infrastructure issue
+[Stage group error budget exploration dashboard](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1365).
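To make the attribution described in this hunk concrete, here is a minimal Ruby sketch that counts requests missing their target, grouped by the feature category declared on the endpoint. The `LoggedRequest` struct, its fields, the category names, and the sample data are hypothetical; the real attribution happens in the monitoring stack, not in code like this.

```ruby
# Hypothetical record of one request as it might appear in the logs.
LoggedRequest = Struct.new(:caller_id, :feature_category, :duration, :target)

# Counts requests that exceeded their target, per feature category.
def slow_requests_by_category(requests)
  counts = Hash.new(0)
  requests.each do |request|
    counts[request.feature_category] += 1 if request.duration > request.target
  end
  counts
end

# Illustrative data only.
requests = [
  LoggedRequest.new("Projects::RawController#show", :source_code_management, 6.2, 5.0),
  LoggedRequest.new("Projects::RawController#show", :source_code_management, 0.4, 5.0),
  LoggedRequest.new("Example::ReportsController#show", :example_category, 1.5, 1.0)
]

slow_requests_by_category(requests).each do |category, count|
  puts "#{category}: #{count}"
end
```

Each slow request counts against the budget of whichever group owns the endpoint's feature category, which is why the declared category matters.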