Commit 05a563b9 authored by allison.browne's avatar allison.browne Committed by Russell Dickenson

Add links to monitoring GitLab.com to perf docs

parent 8389f86e
...@@ -33,7 +33,7 @@ to original issue and epic. ...@@ -33,7 +33,7 @@ to original issue and epic.
and those maintaining a GitLab setup. and those maintaining a GitLab setup.
Any change submitted can have an impact not only on the application itself but Any change submitted can have an impact not only on the application itself but
also those maintaining it and those keeping it up and running (e.g. production also those maintaining it and those keeping it up and running (for example, production
engineers). As a result you should think carefully about the impact of your engineers). As a result you should think carefully about the impact of your
merge request on not only the application but also on the people keeping it up merge request on not only the application but also on the people keeping it up
and running. and running.
...@@ -85,34 +85,34 @@ the following: ...@@ -85,34 +85,34 @@ the following:
1. Is there something that we can do differently to not process such a 1. Is there something that we can do differently to not process such a
big data set? big data set?
1. Should we build some fail-safe mechanism to contain 1. Should we build some fail-safe mechanism to contain
computational complexity? Usually it is better to degrade computational complexity? Usually it's better to degrade
the service for a single user instead of all users. the service for a single user instead of all users.
## Query plans and database structure ## Query plans and database structure
The query plan can answer the questions whether we need additional The query plan can tell us if we will need additional
indexes, or whether we perform expensive filtering (i.e. using sequential scans). indexes, or expensive filtering (such as using sequential scans).
Each query plan should be run against substantial size of data set. Each query plan should be run against substantial size of data set.
For example if you look for issues with specific conditions, For example, if you look for issues with specific conditions,
you should consider validating the query against you should consider validating a query against
a small number (a few hundred) and a big number (100_000) of issues. a small number (a few hundred) and a big number (100_000) of issues.
See how the query will behave if the result will be a few See how the query will behave if the result will be a few
and a few thousand. and a few thousand.
This is needed as we have users using GitLab for very big projects and This is needed as we have users using GitLab for very big projects and
in a very unconventional way. Even, if it seems that it is unlikely in a very unconventional way. Even if it seems that it's unlikely
that such big data set will be used, it is still plausible that one that such a big data set will be used, it's still plausible that one
of our customers will have the problem with the feature. of our customers will encounter a problem with the feature.
Understanding ahead of time how it is going to behave at scale even if we accept it, Understanding ahead of time how it's going to behave at scale, even if we accept it,
is the desired outcome. We should always have a plan or understanding what it takes is the desired outcome. We should always have a plan or understanding of what it will take
to optimise feature to the magnitude of higher usage patterns. to optimize the feature for higher usage patterns.
Every database structure should be optimised and sometimes even over-described Every database structure should be optimized and sometimes even over-described
to be prepared to be easily extended. The hardest part after some point is in preparation for easy extension. The hardest part after some point is
data migration. Migrating millions of rows will always be troublesome and data migration. Migrating millions of rows will always be troublesome and
can have negative impact on application. can have a negative impact on the application.
To better understand how to get help with the query plan reviews To better understand how to get help with the query plan reviews
read this section on [how to prepare the merge request for a database review](https://docs.gitlab.com/ee/development/database_review.html#how-to-prepare-the-merge-request-for-a-database-review). read this section on [how to prepare the merge request for a database review](https://docs.gitlab.com/ee/development/database_review.html#how-to-prepare-the-merge-request-for-a-database-review).
...@@ -167,14 +167,14 @@ be clearly mentioned in the merge request description. ...@@ -167,14 +167,14 @@ be clearly mentioned in the merge request description.
## Batch process ## Batch process
**Summary:** Iterating a single process to external services (e.g. PostgreSQL, Redis, Object Storage, etc) **Summary:** Iterating a single process to external services (for example, PostgreSQL, Redis, Object Storage)
should be executed in a **batch-style** in order to reduce connection overheads. should be executed in a **batch-style** in order to reduce connection overheads.
For fetching rows from various tables in a batch-style, please see [Eager Loading](#eager-loading) section. For fetching rows from various tables in a batch-style, please see [Eager Loading](#eager-loading) section.
### Example: Delete multiple files from Object Storage ### Example: Delete multiple files from Object Storage
When you delete multiple files from object storage (e.g. GCS), When you delete multiple files from object storage, like GCS,
executing a single REST API call multiple times is a quite expensive executing a single REST API call multiple times is a quite expensive
process. Ideally, this should be done in a batch-style, for example, S3 provides process. Ideally, this should be done in a batch-style, for example, S3 provides
[batch deletion API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html), [batch deletion API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html),
...@@ -187,12 +187,12 @@ in a batch style. ...@@ -187,12 +187,12 @@ in a batch style.
## Timeout ## Timeout
**Summary:** You should set a reasonable timeout when the system invokes HTTP calls **Summary:** You should set a reasonable timeout when the system invokes HTTP calls
to external services (e.g. Kubernetes), and it should be executed in Sidekiq, not to external services (such as Kubernetes), and it should be executed in Sidekiq, not
in Puma/Unicorn threads. in Puma/Unicorn threads.
Often, GitLab needs to communicate with an external service such as Kubernetes Often, GitLab needs to communicate with an external service such as Kubernetes
clusters. In this case, it's hard to estimate when the external service finishes clusters. In this case, it's hard to estimate when the external service finishes
the requested process, for example, if it's a user-owned cluster that is inactive for some reason, the requested process, for example, if it's a user-owned cluster that's inactive for some reason,
GitLab might wait for the response forever ([Example](https://gitlab.com/gitlab-org/gitlab/issues/31475)). GitLab might wait for the response forever ([Example](https://gitlab.com/gitlab-org/gitlab/issues/31475)).
This could result in Puma/Unicorn timeout and should be avoided at all cost. This could result in Puma/Unicorn timeout and should be avoided at all cost.
...@@ -203,7 +203,7 @@ Using [`ReactiveCaching`](https://docs.gitlab.com/ee/development/utilities.html# ...@@ -203,7 +203,7 @@ Using [`ReactiveCaching`](https://docs.gitlab.com/ee/development/utilities.html#
## Keep database transaction minimal ## Keep database transaction minimal
**Summary:** You should avoid accessing to external services (e.g. Gitaly) during database **Summary:** You should avoid accessing to external services like Gitaly during database
transactions, otherwise it leads to severe contention problems transactions, otherwise it leads to severe contention problems
as an open transaction basically blocks the release of a Postgres backend connection. as an open transaction basically blocks the release of a Postgres backend connection.
...@@ -247,14 +247,14 @@ necessary. ...@@ -247,14 +247,14 @@ necessary.
A merge request must not increase the memory usage of GitLab by more than the A merge request must not increase the memory usage of GitLab by more than the
absolute bare minimum required by the code. This means that if you have to parse absolute bare minimum required by the code. This means that if you have to parse
some large document (e.g. an HTML document) it's best to parse it as a stream some large document (for example, an HTML document) it's best to parse it as a stream
whenever possible, instead of loading the entire input into memory. Sometimes whenever possible, instead of loading the entire input into memory. Sometimes
this isn't possible, in that case this should be stated explicitly in the merge this isn't possible, in that case this should be stated explicitly in the merge
request. request.
## Lazy Rendering of UI Elements ## Lazy Rendering of UI Elements
**Summary:** only render UI elements when they're actually needed. **Summary:** only render UI elements when they are actually needed.
Certain UI elements may not always be needed. For example, when hovering over a Certain UI elements may not always be needed. For example, when hovering over a
diff line there's a small icon displayed that can be used to create a new diff line there's a small icon displayed that can be used to create a new
...@@ -284,7 +284,7 @@ data should be cached for a certain time period instead of the duration of the ...@@ -284,7 +284,7 @@ data should be cached for a certain time period instead of the duration of the
transaction. transaction.
For example, say you process multiple snippets of text containing username For example, say you process multiple snippets of text containing username
mentions (e.g. `Hello @alice` and `How are you doing @alice?`). By caching the mentions (for example, `Hello @alice` and `How are you doing @alice?`). By caching the
user objects for every username we can remove the need for running the same user objects for every username we can remove the need for running the same
query for every mention of `@alice`. query for every mention of `@alice`.
...@@ -304,7 +304,7 @@ The main styles of pagination are: ...@@ -304,7 +304,7 @@ The main styles of pagination are:
and the total number of pages. This style is well supported by all components of GitLab. and the total number of pages. This style is well supported by all components of GitLab.
1. Offset-based pagination, but without the count: user goes to a specific page, like 1. 1. Offset-based pagination, but without the count: user goes to a specific page, like 1.
User sees only the next page number, but does not see the total amount of pages. User sees only the next page number, but does not see the total amount of pages.
1. Next page using keyset-based pagination: user can only go to next page, as we do not know how many pages 1. Next page using keyset-based pagination: user can only go to next page, as we don't know how many pages
are available. are available.
1. Infinite scrolling pagination: user scrolls the page and next items are loaded asynchronously. This is ideal, 1. Infinite scrolling pagination: user scrolls the page and next items are loaded asynchronously. This is ideal,
as it has exact same benefits as the previous one. as it has exact same benefits as the previous one.
...@@ -316,20 +316,20 @@ can follow the progress looking at [API: Keyset Pagination ...@@ -316,20 +316,20 @@ can follow the progress looking at [API: Keyset Pagination
Take into consideration the following when choosing a pagination strategy: Take into consideration the following when choosing a pagination strategy:
1. It is very inefficient to calculate amount of objects that pass the filtering, 1. It's very inefficient to calculate amount of objects that pass the filtering,
this operation usually can take seconds, and can time out, this operation usually can take seconds, and can time out,
1. It is very inefficient to get entries for page at higher ordinals, like 1000. 1. It's very inefficient to get entries for page at higher ordinals, like 1000.
The database has to sort and iterate all previous items, and this operation usually The database has to sort and iterate all previous items, and this operation usually
can result in substantial load put on database. can result in substantial load put on database.
## Badge counters ## Badge counters
Counters should always be truncated. It means that we do not want to present Counters should always be truncated. It means that we don't want to present
the exact number over some threshold. The reason for that is for the cases where we want the exact number over some threshold. The reason for that is for the cases where we want
to calculate exact number of items, we effectively need to filter each of them for to calculate exact number of items, we effectively need to filter each of them for
the purpose of knowing the exact number of items matching. the purpose of knowing the exact number of items matching.
From ~UX perspective it is often acceptable to see that you have over 1000+ pipelines, From ~UX perspective it's often acceptable to see that you have over 1000+ pipelines,
instead of that you have 40000+ pipelines, but at a tradeoff of loading page for 2s longer. instead of that you have 40000+ pipelines, but at a tradeoff of loading page for 2s longer.
An example of this pattern is the list of pipelines and jobs. We truncate numbers to `1000+`, An example of this pattern is the list of pipelines and jobs. We truncate numbers to `1000+`,
...@@ -338,7 +338,7 @@ but we show an accurate number of running pipelines, which is the most interesti ...@@ -338,7 +338,7 @@ but we show an accurate number of running pipelines, which is the most interesti
There's a helper method that can be used for that purpose - `NumbersHelper.limited_counter_with_delimiter` - There's a helper method that can be used for that purpose - `NumbersHelper.limited_counter_with_delimiter` -
that accepts an upper limit of counting rows. that accepts an upper limit of counting rows.
In some cases it is desired that badge counters are loaded asynchronously. In some cases it's desired that badge counters are loaded asynchronously.
This can speed up the initial page load and give a better user experience overall. This can speed up the initial page load and give a better user experience overall.
## Application/misuse limits ## Application/misuse limits
...@@ -349,9 +349,9 @@ be performant and usable for the user, but **not limiting**. ...@@ -349,9 +349,9 @@ be performant and usable for the user, but **not limiting**.
**We want the features to be fully usable for the users.** **We want the features to be fully usable for the users.**
**However, we want to ensure that the feature will continue to perform well if used at its limit** **However, we want to ensure that the feature will continue to perform well if used at its limit**
**and it will not cause availability issues.** **and it won't cause availability issues.**
Consider that it is always better to start with some kind of limitation, Consider that it's always better to start with some kind of limitation,
instead of later introducing a breaking change that would result in some instead of later introducing a breaking change that would result in some
workflows breaking. workflows breaking.
...@@ -370,9 +370,9 @@ The intent of quotas could be different: ...@@ -370,9 +370,9 @@ The intent of quotas could be different:
Examples: Examples:
1. Pipeline Schedules: It is very unlikely that user will want to create 1. Pipeline Schedules: It's very unlikely that user will want to create
more than 50 schedules. more than 50 schedules.
In such cases it is rather expected that this is either misuse In such cases it's rather expected that this is either misuse
or abuse of the feature. Lack of the upper limit can result or abuse of the feature. Lack of the upper limit can result
in service degradation as the system will try to process all schedules in service degradation as the system will try to process all schedules
assigned the the project. assigned the the project.
......
...@@ -7,16 +7,15 @@ consistent performance of GitLab. ...@@ -7,16 +7,15 @@ consistent performance of GitLab.
The process of solving performance problems is roughly as follows: The process of solving performance problems is roughly as follows:
1. Make sure there's an issue open somewhere (e.g., on the GitLab CE issue 1. Make sure there's an issue open somewhere (for example, on the GitLab CE issue
tracker), create one if there isn't. See [#15607][#15607] for an example. tracker), and create one if there is not. See [#15607][#15607] for an example.
1. Measure the performance of the code in a production environment such as 1. Measure the performance of the code in a production environment such as
GitLab.com (see the [Tooling](#tooling) section below). Performance should be GitLab.com (see the [Tooling](#tooling) section below). Performance should be
measured over a period of _at least_ 24 hours. measured over a period of _at least_ 24 hours.
1. Add your findings based on the measurement period (screenshots of graphs, 1. Add your findings based on the measurement period (screenshots of graphs,
timings, etc) to the issue mentioned in step 1. timings, etc) to the issue mentioned in step 1.
1. Solve the problem. 1. Solve the problem.
1. Create a merge request, assign the "Performance" label and assign it to 1. Create a merge request, assign the "Performance" label and follow the [performance review process](merge_request_performance_guidelines.md).
[@yorickpeterse][yorickpeterse] for reviewing.
1. Once a change has been deployed make sure to _again_ measure for at least 24 1. Once a change has been deployed make sure to _again_ measure for at least 24
hours to see if your changes have any impact on the production environment. hours to see if your changes have any impact on the production environment.
1. Repeat until you're done. 1. Repeat until you're done.
...@@ -44,16 +43,16 @@ GitLab provides built-in tools to help improve performance and availability: ...@@ -44,16 +43,16 @@ GitLab provides built-in tools to help improve performance and availability:
- [QueryRecoder](query_recorder.md) for preventing `N+1` regressions. - [QueryRecoder](query_recorder.md) for preventing `N+1` regressions.
- [Chaos endpoints](chaos_endpoints.md) for testing failure scenarios. Intended mainly for testing availability. - [Chaos endpoints](chaos_endpoints.md) for testing failure scenarios. Intended mainly for testing availability.
GitLab employees can use GitLab.com's performance monitoring systems located at GitLab team members can use [GitLab.com's performance monitoring systems](https://about.gitlab.com/handbook/engineering/monitoring/) located at
<https://dashboards.gitlab.net>, this requires you to log in using your <https://dashboards.gitlab.net>, this requires you to log in using your
`@gitlab.com` Email address. Non-GitLab employees are advised to set up their `@gitlab.com` email address. Non-GitLab team-members are advised to set up their
own InfluxDB + Grafana stack. own InfluxDB and Grafana stack.
## Benchmarks ## Benchmarks
Benchmarks are almost always useless. Benchmarks usually only test small bits of Benchmarks are almost always useless. Benchmarks usually only test small bits of
code in isolation and often only measure the best case scenario. On top of that, code in isolation and often only measure the best case scenario. On top of that,
benchmarks for libraries (e.g., a Gem) tend to be biased in favour of the benchmarks for libraries (such as a Gem) tend to be biased in favour of the
library. After all there's little benefit to an author publishing a benchmark library. After all there's little benefit to an author publishing a benchmark
that shows they perform worse than their competitors. that shows they perform worse than their competitors.
...@@ -68,8 +67,8 @@ When writing benchmarks you should almost always use ...@@ -68,8 +67,8 @@ When writing benchmarks you should almost always use
[benchmark-ips](https://github.com/evanphx/benchmark-ips). Ruby's `Benchmark` [benchmark-ips](https://github.com/evanphx/benchmark-ips). Ruby's `Benchmark`
module that comes with the standard library is rarely useful as it runs either a module that comes with the standard library is rarely useful as it runs either a
single iteration (when using `Benchmark.bm`) or two iterations (when using single iteration (when using `Benchmark.bm`) or two iterations (when using
`Benchmark.bmbm`). Running this few iterations means external factors (e.g. a `Benchmark.bmbm`). Running this few iterations means external factors, such as a
video streaming in the background) can very easily skew the benchmark video streaming in the background, can very easily skew the benchmark
statistics. statistics.
Another problem with the `Benchmark` module is that it displays timings, not Another problem with the `Benchmark` module is that it displays timings, not
...@@ -114,17 +113,18 @@ the behaviour of suspect code in detail. ...@@ -114,17 +113,18 @@ the behaviour of suspect code in detail.
It's important to note that profiling an application *alters its performance*, It's important to note that profiling an application *alters its performance*,
and will generally be done *in an unrepresentative environment*. In particular, and will generally be done *in an unrepresentative environment*. In particular,
a method is not necessarily troublesome just because it is executed many times, a method is not necessarily troublesome just because it's executed many times,
or takes a long time to execute. Profiles are tools you can use to better or takes a long time to execute. Profiles are tools you can use to better
understand what is happening in an application - using that information wisely understand what is happening in an application - using that information wisely
is up to you! is up to you!
Keeping that in mind, to create a profile, identify (or create) a spec that Keeping that in mind, to create a profile, identify (or create) a spec that
exercises the troublesome code path, then run it using the `bin/rspec-stackprof` exercises the troublesome code path, then run it using the `bin/rspec-stackprof`
helper, e.g.: helper, for example:
```shell ```shell
$ LIMIT=10 bin/rspec-stackprof spec/policies/project_policy_spec.rb $ LIMIT=10 bin/rspec-stackprof spec/policies/project_policy_spec.rb
8/8 |====== 100 ======>| Time: 00:00:18 8/8 |====== 100 ======>| Time: 00:00:18
Finished in 18.19 seconds (files took 4.8 seconds to load) Finished in 18.19 seconds (files took 4.8 seconds to load)
...@@ -170,10 +170,11 @@ kcachegrind project_policy_spec.callgrind # Linux ...@@ -170,10 +170,11 @@ kcachegrind project_policy_spec.callgrind # Linux
qcachegrind project_policy_spec.callgrind # Mac qcachegrind project_policy_spec.callgrind # Mac
``` ```
It may be useful to zoom in on a specific method, e.g.: It may be useful to zoom in on a specific method, for example:
```shell ```shell
$ stackprof tmp/project_policy_spec.rb.dump --method warm_asset_cache $ stackprof tmp/project_policy_spec.rb.dump --method warm_asset_cache
TestEnv#warm_asset_cache (/Users/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/spec/support/test_env.rb:164) TestEnv#warm_asset_cache (/Users/lupine/dev/gitlab.com/gitlab-org/gitlab-development-kit/gitlab/spec/support/test_env.rb:164)
samples: 0 self (0.0%) / 6288 total (36.9%) samples: 0 self (0.0%) / 6288 total (36.9%)
callers: callers:
...@@ -239,6 +240,7 @@ shell: ...@@ -239,6 +240,7 @@ shell:
```shell ```shell
$ rake rspec_profiling:console $ rake rspec_profiling:console
irb(main):001:0> results.count irb(main):001:0> results.count
=> 231 => 231
irb(main):002:0> results.last.attributes.keys irb(main):002:0> results.last.attributes.keys
...@@ -257,9 +259,9 @@ One of the reasons of the increased memory footprint could be Ruby memory fragme ...@@ -257,9 +259,9 @@ One of the reasons of the increased memory footprint could be Ruby memory fragme
To diagnose it, you can visualize Ruby heap as described in [this post by Aaron Patterson](https://tenderlovemaking.com/2017/09/27/visualizing-your-ruby-heap.html). To diagnose it, you can visualize Ruby heap as described in [this post by Aaron Patterson](https://tenderlovemaking.com/2017/09/27/visualizing-your-ruby-heap.html).
To start, you want to dump the heap of the process you are investigating to a JSON file. To start, you want to dump the heap of the process you're investigating to a JSON file.
You need to run the command inside the process you are exploring, you may do that with `rbtrace`. You need to run the command inside the process you're exploring, you may do that with `rbtrace`.
`rbtrace` is already present in GitLab `Gemfile`, you just need to require it. `rbtrace` is already present in GitLab `Gemfile`, you just need to require it.
It could be achieved running webserver or Sidekiq with the environment variable set to `ENABLE_RBTRACE=1`. It could be achieved running webserver or Sidekiq with the environment variable set to `ENABLE_RBTRACE=1`.
...@@ -295,11 +297,11 @@ There is no clear set of steps that you can follow to determine if a certain ...@@ -295,11 +297,11 @@ There is no clear set of steps that you can follow to determine if a certain
piece of code is worth optimizing. The only two things you can do are: piece of code is worth optimizing. The only two things you can do are:
1. Think about what the code does, how it's used, how many times it's called and 1. Think about what the code does, how it's used, how many times it's called and
how much time is spent in it relative to the total execution time (e.g., the how much time is spent in it relative to the total execution time (for example, the
total time spent in a web request). total time spent in a web request).
1. Ask others (preferably in the form of an issue). 1. Ask others (preferably in the form of an issue).
Some examples of changes that aren't really important/worth the effort: Some examples of changes that are not really important/worth the effort:
- Replacing double quotes with single quotes. - Replacing double quotes with single quotes.
- Replacing usage of Array with Set when the list of values is very small. - Replacing usage of Array with Set when the list of values is very small.
...@@ -309,7 +311,7 @@ Some examples of changes that aren't really important/worth the effort: ...@@ -309,7 +311,7 @@ Some examples of changes that aren't really important/worth the effort:
## Slow Operations & Sidekiq ## Slow Operations & Sidekiq
Slow operations (e.g. merging branches) or operations that are prone to errors Slow operations, like merging branches, or operations that are prone to errors
(using external APIs) should be performed in a Sidekiq worker instead of (using external APIs) should be performed in a Sidekiq worker instead of
directly in a web request as much as possible. This has numerous benefits such directly in a web request as much as possible. This has numerous benefits such
as: as:
...@@ -416,7 +418,7 @@ as omitting it may lead to style check failures. ...@@ -416,7 +418,7 @@ as omitting it may lead to style check failures.
## Anti-Patterns ## Anti-Patterns
This is a collection of [anti-patterns][anti-pattern] that should be avoided This is a collection of [anti-patterns][anti-pattern] that should be avoided
unless these changes have a measurable, significant and positive impact on unless these changes have a measurable, significant, and positive impact on
production environments. production environments.
### Moving Allocations to Constants ### Moving Allocations to Constants
...@@ -458,5 +460,4 @@ You may find some useful examples in this snippet: ...@@ -458,5 +460,4 @@ You may find some useful examples in this snippet:
<https://gitlab.com/gitlab-org/gitlab-foss/snippets/33946> <https://gitlab.com/gitlab-org/gitlab-foss/snippets/33946>
[#15607]: https://gitlab.com/gitlab-org/gitlab-foss/issues/15607 [#15607]: https://gitlab.com/gitlab-org/gitlab-foss/issues/15607
[yorickpeterse]: https://gitlab.com/yorickpeterse
[anti-pattern]: https://en.wikipedia.org/wiki/Anti-pattern [anti-pattern]: https://en.wikipedia.org/wiki/Anti-pattern
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment