Commit 3777996b authored by Krasimir Angelov's avatar Krasimir Angelov

Update "Understanding EXPLAIN plans" docs

Mostly about node statistics being per-loop averages.
parent 045c11d1
......@@ -52,7 +52,7 @@ that require a more in-depth discussion between the database reviewers and maint
- [Database Office Hours Agenda](https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit).
- <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [YouTube playlist with past recordings](https://www.youtube.com/playlist?list=PL05JrBw4t0Kp-kqXeiF7fF7cFYaKtdqXM).
You should also join the [#database-labs](../understanding_explain_plans.md#database-lab)
You should also join the [#database-lab](../understanding_explain_plans.md#database-lab-engine)
Slack channel and get familiar with how to use Joe, the Slackbot that provides developers
with their own clone of the production database.
......
......@@ -38,8 +38,8 @@ cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query.
When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see
the difference not only in the timing, but by looking at the output for `Buffers`
by running your explain with `EXPLAIN(analyze, buffers)`. The [#database-lab](understanding_explain_plans.md#database-lab)
tool will automatically include these options.
by running your explain with `EXPLAIN(analyze, buffers)`. [Database Lab](understanding_explain_plans.md#database-lab-engine)
will automatically include these options.
If you are making a warm cache query, you will only see the `shared hits`.
......
......@@ -198,13 +198,39 @@ Here we can see that our filter has to remove 65,677 rows, and that we use
208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our
above node uses *1.6 GB of buffers*. That's a lot!
Keep in mind that some statistics are per-loop averages, while others are total values:
| Field name | Value type |
| --- | --- |
| Actual Total Time | per-loop average |
| Actual Rows | per-loop average |
| Buffers Shared Hit | total value |
| Buffers Shared Read | total value |
| Buffers Shared Dirtied | total value |
| Buffers Shared Written | total value |
| I/O Read Time | total value |
| I/O Read Write | total value |
For example:
```sql
-> Index Scan using users_pkey on public.users (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888)
Index Cond: (users.id = issues.author_id)
Buffers: shared hit=3543 read=9
I/O Timings: read=17.760 write=0.000
```
Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`).
17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache.
## Node types
There are quite a few different types of nodes, so we only cover some of the
more common ones here.
A full list of all the available nodes and their descriptions can be found in
the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h)
the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h).
pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer detailed look into nodes and their fields.
### Seq Scan
......@@ -441,7 +467,7 @@ When optimizing a query, we usually need to reduce the amount of data we're
dealing with. Indexes are the way to work with fewer pages (buffers) to get the
result, so, during optimization, look at the number of buffers used (read and hit),
and work on reducing these numbers. Reduced timing will be the consequence of reduced
buffer numbers. [#database-lab](#database-lab) guarantees that the plan is structurally
buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
identical to production (and overall number of buffers is the same as on production),
but difference in cache state and I/O speed may lead to different timings.
......@@ -617,7 +643,7 @@ If we look at the plan we also see our costs are very low:
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
```
Here our cost is only 3.45, and it only takes us 0.050 milliseconds to do so.
Here our cost is only 3.45, and it takes us 7.25 milliseconds to do so (0.05 * 145).
The next index scan is a bit more expensive:
```sql
......@@ -681,64 +707,26 @@ There are a few ways to get the output of a query plan. Of course you
can directly run the `EXPLAIN` query in the `psql` console, or you can
follow one of the other options below.
### Rails console
### Database Lab Engine
Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
you can directly generate the query plan from the Rails console:
GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab), and the companion
SQL optimization tool - [Joe Bot](https://gitlab.com/postgres-ai/joe).
```ruby
pry(main)> require 'activerecord-explain-analyze'
=> true
pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
(pry):12
=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
Filter: (projects.build_timeout > 3600)
Rows Removed by Filter: 14
Buffers: shared hit=2
Planning time: 0.411 ms
Execution time: 0.113 ms
```
Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
### ChatOps
Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
[GitLab team members can also use our ChatOps solution, available in Slack using the
`/chatops` slash command](chatops_on_gitlabcom.md).
You can use ChatOps to get a query plan by running the following:
With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements.
```sql
/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
For example, in order to test new index on a column that is not existing on production yet, you can do the following:
Visualising the plan using <https://explain.depesz.com/> is also supported:
Create the column:
```sql
/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
```
Quoting the query is not necessary.
For more information about the available options, run:
```sql
/chatops run explain --help
```
### `#database-lab`
Another tool GitLab team members can use is a chatbot powered by [Joe](https://gitlab.com/postgres-ai/joe)
which uses [Database Lab](https://gitlab.com/postgres-ai/database-lab) to instantly provide developers
with their own clone of the production database.
Joe is available in the
[`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack.
Unlike ChatOps, it gives you a way to execute DDL statements (like creating indexes and tables) and get query plan not only for `SELECT` but also `UPDATE` and `DELETE`.
For example, in order to test new index you can do the following:
Create the index:
```sql
......@@ -769,18 +757,67 @@ For more information about the available options, run:
help
```
The web interface comes with the following execution plan visualizers included:
- [Depesz](https://explain.depesz.com/)
- [PEV2](https://github.com/dalibo/pev2)
- [FlameGraph](https://github.com/mgartner/pg_flame)
#### Tips & Tricks
The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings will be applied to all subsequent commands until you reset them.
The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings will be applied to all subsequent commands until you reset them. For example you can disable parallel queries with
```sql
exec SET max_parallel_workers_per_gather = 0
```
### Rails console
Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
you can directly generate the query plan from the Rails console:
```ruby
pry(main)> require 'activerecord-explain-analyze'
=> true
pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
(pry):12
=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
Filter: (projects.build_timeout > 3600)
Rows Removed by Filter: 14
Buffers: shared hit=2
Planning time: 0.411 ms
Execution time: 0.113 ms
```
### ChatOps
[GitLab team members can also use our ChatOps solution, available in Slack using the
`/chatops` slash command](chatops_on_gitlabcom.md).
NOTE:
While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
It is also possible to use transactions. This may be useful when you are working on statements that modify the data, for example INSERT, UPDATE, and DELETE. The `explain` command will perform `EXPLAIN ANALYZE`, which executes the statement. In order to run each `explain` starting from a clean state you can wrap it in a transaction, for example:
You can use ChatOps to get a query plan by running the following:
```sql
exec BEGIN
/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
explain UPDATE some_table SET some_column = TRUE
Visualising the plan using <https://explain.depesz.com/> is also supported:
```sql
/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
exec ROLLBACK
Quoting the query is not necessary.
For more information about the available options, run:
```sql
/chatops run explain --help
```
## Further reading
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment