Commit 14126d38 authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch '254735-document-guidelines-for-cached-sql-calls' into 'master'

Update documentation guidelines

See merge request gitlab-org/gitlab!46902
parents f274d667 0e0ab66d
......@@ -139,13 +139,15 @@ from:
## Performance guides
- [Instrumentation](instrumentation.md) for Ruby code running in production
  environments.
- [Performance guidelines](performance.md) for writing code, benchmarks, and
  certain patterns to avoid.
- [Merge request performance guidelines](merge_request_performance_guidelines.md)
  for ensuring merge requests do not negatively impact GitLab performance.
- [Profiling](profiling.md) a URL, measuring performance using Sherlock, or
  tracking down N+1 queries using Bullet.
- [Cached queries guidelines](cached_queries.md) for tracking down N+1 queries
  masked by query caching, profiling memory use, and understanding why cached
  queries should be avoided.
## Database guides
......
# Cached queries guidelines
Rails provides an [SQL query cache](https://guides.rubyonrails.org/caching_with_rails.html#sql-caching),
used to cache the results of database queries for the duration of the request.
If Rails encounters the same query again for that request,
it will use the cached result set as opposed to running the query against the database again.
The query results are only cached for the duration of that single request; they do not persist across requests.
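As a minimal illustration (using the `User` model only as an example), the same behavior can be reproduced outside of a request with `ActiveRecord::Base.cache`, which enables the query cache for the duration of a block:

```ruby
# Within a request, Rails enables the query cache automatically; this block
# reproduces that behavior manually so the effect is easy to observe.
ActiveRecord::Base.cache do
  User.first # executes the SQL and stores the result set in the query cache
  User.first # logged as CACHE: served from the query cache, no database round trip
end
# After the block (or, in production, the request) finishes, the query cache is cleared.
```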
## Why cached queries are considered bad
Cached queries help with reducing database load, but they still:

- Consume memory.
- Require us to re-instantiate each `ActiveRecord` object.
- Require us to re-instantiate each relation of the object.
- Make us spend additional CPU cycles to look into a list of cached queries.
Cached SQL queries are cheaper than uncached ones, but they are not cheap at all from a memory perspective.
They can mask an [N+1 query problem](https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations),
so we should treat them the same way we treat regular N+1 queries.
With an N+1 problem masked by cached queries, we still execute the same query N times.
It does not hit the database N times, but returns the cached results instead.
This is still expensive, because we re-initialize the objects on each call, which costs CPU and memory.
Instead, we should reuse the same in-memory objects whenever possible.
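As a hedged illustration of the difference (the helper names here are made up for this example; `group.has_owner?` is the method used in the real example further below):

```ruby
# Masked N+1: the same query runs once per member. After the first execution
# the result set comes from the SQL query cache, but every call still
# re-instantiates the ActiveRecord objects behind it.
def member_badge(member)
  member.group.has_owner?(current_user) ? 'Owner' : 'Member'
end

# Reusing the in-memory result: execute the query once and memoize the answer.
def current_user_owns_group?
  return @current_user_owns_group if defined?(@current_user_owns_group)

  @current_user_owns_group = group.has_owner?(current_user)
end
```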
When we introduce a new feature, we should avoid N+1 problems,
minimize the [query count](merge_request_performance_guidelines.md#query-counts), and pay special attention to ensure that
[cached queries](merge_request_performance_guidelines.md#cached-queries) are not masking N+1 problems.
## How to detect
### Detect potential offenders by using Kibana
On GitLab.com, we log the number of executed cached queries for each request in the
`pubsub-redis-inf-gprd*` index, as the [`db_cached_count`](https://log.gprd.gitlab.net/goto/77d18d80ad84c5df1bf1da5c2cd35b82) field.
We can filter for endpoints that execute a large number of cached queries. For example, an endpoint
with a `db_cached_count` of 100 or more could indicate an N+1 problem masked by cached queries.
We should investigate such an endpoint further, to check whether it executes duplicated cached queries.

For more Kibana visualizations related to cached queries, see [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/259007).
### Inspect suspicious endpoint using Performance Bar
When building features, you can use the [performance bar](../administration/monitoring/performance/performance_bar.md)
to list database queries, which includes cached queries as well. The performance bar shows a warning
when the total number of executed queries (including cached ones) exceeds 100.
## What to look for
Using [Kibana](cached_queries.md#detect-potential-offenders-by-using-kibana), you can look for a large number
of executed cached queries. Endpoints with a large `db_cached_count` probably execute a lot of duplicated
cached queries, which often indicates a masked N+1 problem.

When you investigate a specific endpoint, you can use
the [performance bar](cached_queries.md#inspect-suspicious-endpoint-using-performance-bar).
If you see a lot of similar queries, this often indicates an N+1 query issue (or a similar kind of query batching problem).
If you see the same cached query executed multiple times, this often indicates a masked N+1 query problem.
For example, let's say you want to debug the `GroupMembers` page.
In the left corner of the performance bar, **Database queries** shows the total number of database queries
and the number of executed cached queries:
![Performance Bar Database Queries](img/performance_bar_members_page.png)
We can see that there are 55 cached queries. Clicking the number opens a modal window with more details.
Cached queries are marked with the `cached` label, so they are easy to spot. We can see that there are multiple
duplicated cached queries:
![Performance Bar Cached Queries Modal](img/performance_bar_cached_queries.png)
If we click on `...` for one of them, it will expand the actual stack trace:
```shell
[
"app/models/group.rb:305:in `has_owner?'",
"ee/app/views/shared/members/ee/_license_badge.html.haml:1",
"app/helpers/application_helper.rb:19:in `render_if_exists'",
"app/views/shared/members/_member.html.haml:31",
"app/views/groups/group_members/index.html.haml:75",
"app/controllers/application_controller.rb:134:in `render'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/app/controllers/ee/application_controller.rb:44:in `set_current_ip_address'",
"app/controllers/application_controller.rb:493:in `set_current_admin'",
"lib/gitlab/session.rb:11:in `with_session'",
"app/controllers/application_controller.rb:484:in `set_session_storage'",
"app/controllers/application_controller.rb:478:in `set_locale'",
"lib/gitlab/error_tracking.rb:52:in `with_context'",
"app/controllers/application_controller.rb:543:in `sentry_context'",
"app/controllers/application_controller.rb:471:in `block in set_current_context'",
"lib/gitlab/application_context.rb:54:in `block in use'",
"lib/gitlab/application_context.rb:54:in `use'",
"lib/gitlab/application_context.rb:21:in `with_context'",
"app/controllers/application_controller.rb:463:in `set_current_context'",
"lib/gitlab/jira/middleware.rb:19:in `call'"
]
```
The stack trace shows that we clearly have an N+1 problem, because for each group member we repeatedly execute:
```ruby
group.has_owner?(current_user)
```
This is easily solved by extracting this check above the loop.
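A sketch of that kind of change (not necessarily the actual fix; `render_member_row` is a made-up helper standing in for the member partial):

```ruby
# Before: each member row re-runs the same ownership check, so the query is
# (cache-)executed once per group member.
members.each do |member|
  render_member_row(member, show_license_badge: group.has_owner?(current_user))
end

# After: run the check a single time, outside the loop, and reuse the
# in-memory result for every row.
current_user_is_owner = group.has_owner?(current_user)

members.each do |member|
  render_member_row(member, show_license_badge: current_user_is_owner)
end
```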
After [the fix](https://gitlab.com/gitlab-org/gitlab/-/issues/231468), we now have:
![Performance Bar Fixed Cached Queries](img/performance_bar_fixed_cached_queries.png)
## How to measure the impact of the change
We can use the [memory profiler](performance.md#using-memory-profiler) to profile our code.
For the previous example, we could wrap the profiler around the `Groups::GroupMembersController#index` action.
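For instance, a minimal sketch of what that wrapping could look like while debugging locally (the `around_action` hook and the output path are illustrative, not GitLab's actual tooling):

```ruby
class Groups::GroupMembersController < Groups::ApplicationController
  # Temporary instrumentation: wrap only the index action in MemoryProfiler
  # while collecting measurements, then remove it.
  around_action :profile_index_memory, only: [:index]

  private

  def profile_index_memory
    report = MemoryProfiler.report { yield } # `yield` runs the index action

    File.open('/tmp/group_members_index_profile.txt', 'w') do |file|
      report.pretty_print(file)
    end
  end
end
```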
We had:
- Total allocated: 7133601 bytes (84858 objects)
- Total retained: 757595 bytes (6070 objects)
- `db_count`: 144
- `db_cached_count`: 55
- `db_duration`: 303ms
After the fix, we can see that we have reduced the allocated memory as well as the number of cached queries and improved execution time:
- Total allocated: 5313899 bytes (65290 objects), 1810KB (25%) less
- Total retained: 685593 bytes (5278 objects), 72KB (9%) less
- `db_count`: 95 (34% less)
- `db_cached_count`: 6 (89% less)
- `db_duration`: 162ms (87% faster)
## See also
- [Metrics that would help us detect the potential N+1 Cached SQL calls](https://gitlab.com/gitlab-org/gitlab/-/issues/259007)
- [Merge Request performance guidelines for cached queries](merge_request_performance_guidelines.md#cached-queries)
- [Improvements for biggest offenders](https://gitlab.com/groups/gitlab-org/-/epics/4508)
......@@ -162,30 +162,11 @@ query. This in turn makes it much harder for this code to overload a database.
**Summary:** a merge request **should not** execute duplicated cached queries.
Rails provides an [SQL Query Cache](cached_queries.md#cached-queries-guidelines),
used to cache the results of database queries for the duration of the request.
If Rails encounters the same query again for that request,
it will use the cached result set as opposed to running the query against the database again.
The query results are only cached for the duration of that single request; they do not persist across requests.
See [why cached queries are considered bad](cached_queries.md#why-cached-queries-are-considered-bad) and
[how to detect them](cached_queries.md#how-to-detect).
The code introduced by a merge request should not execute multiple duplicated cached queries.
......
......@@ -349,6 +349,40 @@ issues in our code.
## Memory profiling
We can use two approaches, often in combination, to track down memory issues:
- Leaving the code intact and wrapping a profiler around it.
- Monitoring memory usage of the process while disabling/enabling different parts of the code we suspect could be problematic (see the sketch after this list).
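For the second approach, a minimal sketch of what the comparison can look like (Linux-specific; `suspect_code_path` and the flag name are hypothetical placeholders):

```ruby
# Read the current RSS of this process, in kilobytes (Linux-specific).
def current_rss_kb
  File.read('/proc/self/status')[/VmRSS:\s*(\d+)/, 1].to_i
end

rss_before = current_rss_kb

# Toggle the suspect code path on or off between runs, for example behind a
# hypothetical feature flag, and compare how much RSS each run adds.
suspect_code_path if Feature.enabled?(:suspect_code_path)

puts "RSS grew by #{current_rss_kb - rss_before}KB"
```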
### Using Memory Profiler
We can use `memory_profiler` for profiling.
The [`memory_profiler`](https://github.com/SamSaffron/memory_profiler) gem is already present in GitLab's `Gemfile`;
you just need to require it:
```ruby
require 'memory_profiler'
report = MemoryProfiler.report do
# Code you want to profile
end
output = File.open('/tmp/profile.txt','w')
report.pretty_print(output)
```
The report breaks down two key concepts:

- Retained: long-lived memory use and object count retained due to the execution of the code block.
- Allocated: all objects and memory allocated during the code block.

As a general rule, **retained** is always smaller than or equal to **allocated**.
The actual RSS cost is always slightly higher, because MRI heaps are not squashed to size and memory fragments.
### Rbtrace
One of the reasons for an increased memory footprint could be Ruby memory fragmentation.
To diagnose it, you can visualize Ruby heap as described in [this post by Aaron Patterson](https://tenderlovemaking.com/2017/09/27/visualizing-your-ruby-heap.html).
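A minimal sketch of producing the heap dump that the linked post visualizes (run it inside the process you want to inspect; the output path is illustrative):

```ruby
require 'objspace'

# Record allocation call sites, so every dumped object carries the file and
# line where it was allocated.
ObjectSpace.trace_object_allocations_start

# ... exercise the code you suspect ...

GC.start

# Dump every live object, one JSON document per line. The object addresses in
# the dump are what the linked visualization plots to reveal fragmented heap pages.
File.open('/tmp/heap.json', 'w') do |file|
  ObjectSpace.dump_all(output: file)
end
```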
......