Commit 14126d38 authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch '254735-document-guidelines-for-cached-sql-calls' into 'master'

Update documentation guidelines

See merge request gitlab-org/gitlab!46902
parents f274d667 0e0ab66d
......@@ -139,13 +139,15 @@ from:
## Performance guides
- [Instrumentation](instrumentation.md) for Ruby code running in production
  environments.
- [Performance guidelines](performance.md) for writing code, benchmarks, and
  certain patterns to avoid.
- [Merge request performance guidelines](merge_request_performance_guidelines.md)
  for ensuring merge requests do not negatively impact GitLab performance.
- [Profiling](profiling.md) a URL, measuring performance using Sherlock, or
  tracking down N+1 queries using Bullet.
- [Cached queries guidelines](cached_queries.md) for tracking down N+1 queries
  masked by query caching, profiling memory use, and understanding why cached
  queries should be avoided.
## Database guides
......
# Cached queries guidelines
Rails provides an [SQL query cache](https://guides.rubyonrails.org/caching_with_rails.html#sql-caching),
used to cache the results of database queries for the duration of the request.
If Rails encounters the same query again for that request,
it will use the cached result set as opposed to running the query against the database again.
The query results are only cached for the duration of that single request; they do not persist across requests.
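As a minimal illustration (using the `User` model only as an example), the same behavior can be reproduced outside of a request with `ActiveRecord::Base.cache`, which enables the query cache for the duration of a block:

```ruby
# Within a request, Rails enables the query cache automatically; this block
# reproduces that behavior manually so the effect is easy to observe.
ActiveRecord::Base.cache do
  User.first # executes the SQL and stores the result set in the query cache
  User.first # logged as CACHE: served from the query cache, no database round trip
end
# After the block (or, in production, the request) finishes, the query cache is cleared.
```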
## Why cached queries are considered bad
Cached queries help with reducing database load, but they still:

- Consume memory.
- Require us to re-instantiate each `ActiveRecord` object.
- Require us to re-instantiate each relation of the object.
- Make us spend additional CPU cycles to look into a list of cached queries.
Cached SQL queries are cheaper than uncached ones, but they are not cheap at all from a memory perspective.
They can mask an [N+1 query problem](https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations),
so we should treat them the same way we treat regular N+1 queries.
With an N+1 problem masked by cached queries, we still execute the same query N times.
It does not hit the database N times, but returns the cached results instead.
This is still expensive, because we re-initialize the objects on each call, which costs CPU and memory.
Instead, we should reuse the same in-memory objects whenever possible.
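As a hedged illustration of the difference (the helper names here are made up for this example; `group.has_owner?` is the method used in the real example further below):

```ruby
# Masked N+1: the same query runs once per member. After the first execution
# the result set comes from the SQL query cache, but every call still
# re-instantiates the ActiveRecord objects behind it.
def member_badge(member)
  member.group.has_owner?(current_user) ? 'Owner' : 'Member'
end

# Reusing the in-memory result: execute the query once and memoize the answer.
def current_user_owns_group?
  return @current_user_owns_group if defined?(@current_user_owns_group)

  @current_user_owns_group = group.has_owner?(current_user)
end
```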
When we introduce a new feature, we should avoid N+1 problems,
minimize the [query count](merge_request_performance_guidelines.md#query-counts), and pay special attention to ensure that
[cached queries](merge_request_performance_guidelines.md#cached-queries) are not masking N+1 problems.
## How to detect
### Detect potential offenders by using Kibana
On GitLab.com, we log the number of executed cached queries for each request in the
`pubsub-redis-inf-gprd*` index, as the [`db_cached_count`](https://log.gprd.gitlab.net/goto/77d18d80ad84c5df1bf1da5c2cd35b82) field.
We can filter for endpoints that execute a large number of cached queries. For example, an endpoint
with a `db_cached_count` of 100 or more could indicate an N+1 problem masked by cached queries.
We should investigate such an endpoint further, to check whether it executes duplicated cached queries.

For more Kibana visualizations related to cached queries, see [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/259007).
### Inspect suspicious endpoint using Performance Bar
When building features, you can use the [performance bar](../administration/monitoring/performance/performance_bar.md)
to list database queries, which includes cached queries as well. The performance bar shows a warning
when the total number of executed queries (including cached ones) exceeds 100.
## What to look for
Using [Kibana](cached_queries.md#detect-potential-offenders-by-using-kibana), you can look for a large number
of executed cached queries. Endpoints with a large `db_cached_count` probably execute a lot of duplicated
cached queries, which often indicates a masked N+1 problem.

When you investigate a specific endpoint, you can use
the [performance bar](cached_queries.md#inspect-suspicious-endpoint-using-performance-bar).
If you see a lot of similar queries, this often indicates an N+1 query issue (or a similar kind of query batching problem).
If you see the same cached query executed multiple times, this often indicates a masked N+1 query problem.
For example, let's say you want to debug the `GroupMembers` page.
In the left corner of the performance bar, **Database queries** shows the total number of database queries
and the number of executed cached queries:
![Performance Bar Database Queries](img/performance_bar_members_page.png)
We can see that there are 55 cached queries. Clicking the number opens a modal window with more details.
Cached queries are marked with the `cached` label, so they are easy to spot. We can see that there are multiple
duplicated cached queries:
![Performance Bar Cached Queries Modal](img/performance_bar_cached_queries.png)
If we click on `...` for one of them, it will expand the actual stack trace:
```shell
[
"app/models/group.rb:305:in `has_owner?'",
"ee/app/views/shared/members/ee/_license_badge.html.haml:1",
"app/helpers/application_helper.rb:19:in `render_if_exists'",
"app/views/shared/members/_member.html.haml:31",
"app/views/groups/group_members/index.html.haml:75",
"app/controllers/application_controller.rb:134:in `render'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/app/controllers/ee/application_controller.rb:44:in `set_current_ip_address'",
"app/controllers/application_controller.rb:493:in `set_current_admin'",
"lib/gitlab/session.rb:11:in `with_session'",
"app/controllers/application_controller.rb:484:in `set_session_storage'",
"app/controllers/application_controller.rb:478:in `set_locale'",
"lib/gitlab/error_tracking.rb:52:in `with_context'",
"app/controllers/application_controller.rb:543:in `sentry_context'",
"app/controllers/application_controller.rb:471:in `block in set_current_context'",
"lib/gitlab/application_context.rb:54:in `block in use'",
"lib/gitlab/application_context.rb:54:in `use'",
"lib/gitlab/application_context.rb:21:in `with_context'",
"app/controllers/application_controller.rb:463:in `set_current_context'",
"lib/gitlab/jira/middleware.rb:19:in `call'"
]
```
The stack trace shows that we clearly have an N+1 problem, because for each group member we repeatedly execute:
```ruby
group.has_owner?(current_user)
```
This is easily solved by extracting this check above the loop.
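A sketch of that kind of change (not necessarily the actual fix; `render_member_row` is a made-up helper standing in for the member partial):

```ruby
# Before: each member row re-runs the same ownership check, so the query is
# (cache-)executed once per group member.
members.each do |member|
  render_member_row(member, show_license_badge: group.has_owner?(current_user))
end

# After: run the check a single time, outside the loop, and reuse the
# in-memory result for every row.
current_user_is_owner = group.has_owner?(current_user)

members.each do |member|
  render_member_row(member, show_license_badge: current_user_is_owner)
end
```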
After [the fix](https://gitlab.com/gitlab-org/gitlab/-/issues/231468), we now have:
![Performance Bar Fixed Cached Queries](img/performance_bar_fixed_cached_queries.png)
## How to measure the impact of the change
We can use the [memory profiler](performance.md#using-memory-profiler) to profile our code.
For the previous example, we could wrap the profiler around the `Groups::GroupMembersController#index` action.
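For instance, a minimal sketch of what that wrapping could look like while debugging locally (the `around_action` hook and the output path are illustrative, not GitLab's actual tooling):

```ruby
class Groups::GroupMembersController < Groups::ApplicationController
  # Temporary instrumentation: wrap only the index action in MemoryProfiler
  # while collecting measurements, then remove it.
  around_action :profile_index_memory, only: [:index]

  private

  def profile_index_memory
    report = MemoryProfiler.report { yield } # `yield` runs the index action

    File.open('/tmp/group_members_index_profile.txt', 'w') do |file|
      report.pretty_print(file)
    end
  end
end
```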
We had:
- Total allocated: 7133601 bytes (84858 objects)
- Total retained: 757595 bytes (6070 objects)
- `db_count`: 144
- `db_cached_count`: 55
- `db_duration`: 303ms
After the fix, we can see that we have reduced the allocated memory as well as the number of cached queries and improved execution time:
- Total allocated: 5313899 bytes (65290 objects), 1810KB (25%) less
- Total retained: 685593 bytes (5278 objects), 72KB (9%) less
- `db_count`: 95 (34% less)
- `db_cached_count`: 6 (89% less)
- `db_duration`: 162ms (87% faster)
## See also
- [Metrics that would help us detect the potential N+1 Cached SQL calls](https://gitlab.com/gitlab-org/gitlab/-/issues/259007)
- [Merge Request performance guidelines for cached queries](merge_request_performance_guidelines.md#cached-queries)
- [Improvements for biggest offenders](https://gitlab.com/groups/gitlab-org/-/epics/4508)
......@@ -162,30 +162,11 @@ query. This in turn makes it much harder for this code to overload a database.
**Summary:** a merge request **should not** execute duplicated cached queries.
Rails provides an [SQL Query Cache](cached_queries.md#cached-queries-guidelines),
used to cache the results of database queries for the duration of the request.
If Rails encounters the same query again for that request,
it will use the cached result set as opposed to running the query against the database again.
The query results are only cached for the duration of that single request; they do not persist across requests.
See [why cached queries are considered bad](cached_queries.md#why-cached-queries-are-considered-bad) and
[how to detect them](cached_queries.md#how-to-detect).
The code introduced by a merge request should not execute multiple duplicated cached queries.
......
......@@ -349,6 +349,40 @@ issues in our code.
## Memory profiling
We can use two approaches, often in combination, to track down memory issues:
- Leaving the code intact and wrapping a profiler around it.
- Monitoring memory usage of the process while disabling/enabling different parts of the code we suspect could be problematic (see the sketch after this list).
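For the second approach, a minimal sketch of what the comparison can look like (Linux-specific; `suspect_code_path` and the flag name are hypothetical placeholders):

```ruby
# Read the current RSS of this process, in kilobytes (Linux-specific).
def current_rss_kb
  File.read('/proc/self/status')[/VmRSS:\s*(\d+)/, 1].to_i
end

rss_before = current_rss_kb

# Toggle the suspect code path on or off between runs, for example behind a
# hypothetical feature flag, and compare how much RSS each run adds.
suspect_code_path if Feature.enabled?(:suspect_code_path)

puts "RSS grew by #{current_rss_kb - rss_before}KB"
```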
### Using Memory Profiler
We can use `memory_profiler` for profiling.
The [`memory_profiler`](https://github.com/SamSaffron/memory_profiler) gem is already present in GitLab's `Gemfile`;
you just need to require it:
```ruby
require 'memory_profiler'
report = MemoryProfiler.report do
# Code you want to profile
end
output = File.open('/tmp/profile.txt','w')
report.pretty_print(output)
```
The report breaks down two key concepts:

- Retained: long-lived memory use and object count retained due to the execution of the code block.
- Allocated: all objects and memory allocated during the code block.

As a general rule, **retained** is always smaller than or equal to **allocated**.
The actual RSS cost is always slightly higher, because MRI heaps are not squashed to size and memory fragments.
### Rbtrace
One of the reasons for an increased memory footprint could be Ruby memory fragmentation.
To diagnose it, you can visualize Ruby heap as described in [this post by Aaron Patterson](https://tenderlovemaking.com/2017/09/27/visualizing-your-ruby-heap.html).
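A minimal sketch of producing the heap dump that the linked post visualizes (run it inside the process you want to inspect; the output path is illustrative):

```ruby
require 'objspace'

# Record allocation call sites, so every dumped object carries the file and
# line where it was allocated.
ObjectSpace.trace_object_allocations_start

# ... exercise the code you suspect ...

GC.start

# Dump every live object, one JSON document per line. The object addresses in
# the dump are what the linked visualization plots to reveal fragmented heap pages.
File.open('/tmp/heap.json', 'w') do |file|
  ObjectSpace.dump_all(output: file)
end
```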
......