Commit 93608001, authored Sep 23, 2021 by Heinrich Lee Yu

Update label filtering case study

parent ab669cf5
Showing 1 changed file with 17 additions and 5 deletions

doc/development/filtering_by_label.md (+17 -5)
@@ -82,6 +82,19 @@ AND (EXISTS (
While this worked without schema changes, and did improve readability somewhat,
it did not improve query performance.
+### Attempt A2: use label IDs in the WHERE EXISTS clause
+
+In [merge request #34503](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34503), we followed a similar approach to A1. But this time, we did a separate query to fetch the IDs of the labels used in the filter so that we avoid the `JOIN` in the `EXISTS` clause and filter directly by `label_links.label_id`. We also added a new index on `label_links` for the `target_id`, `label_id`, and `target_type` columns to speed up this query.
+
+Finding the label IDs wasn't straightforward because there could be multiple labels with the same title within a single root namespace. We solved this by grouping the label IDs by title and then using the array of IDs in the `EXISTS` clauses.
+
+This resulted in a significant performance improvement. However, this optimization could not be applied to the dashboard pages where we do not have a project or group context. We could not easily search for the label IDs here because that would mean searching across all projects and groups that the user has access to.
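The two-step lookup described in Attempt A2 can be sketched as follows. This is an illustration only, not the code from the merge request: it uses SQLite through Python's `sqlite3` module rather than PostgreSQL, the table layout is reduced to the columns named in the text, and the index name, the helper names `label_ids_by_title` and `issues_with_all_labels`, and the sample rows are all made up. Scoping the label lookup to a project or group is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE labels (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE label_links (
    id INTEGER PRIMARY KEY, label_id INTEGER, target_id INTEGER, target_type TEXT);
-- Index on the three columns named in the text; the index name is hypothetical.
CREATE INDEX idx_label_links_target_label
    ON label_links (target_id, label_id, target_type);

INSERT INTO issues VALUES (1, 'crash on save'), (2, 'slow search'), (3, 'typo in docs');
-- Two distinct labels share the title 'bug' (ids 10 and 11).
INSERT INTO labels VALUES (10, 'bug'), (11, 'bug'), (12, 'backend'), (13, 'docs');
INSERT INTO label_links (label_id, target_id, target_type) VALUES
    (10, 1, 'Issue'), (12, 1, 'Issue'),
    (11, 2, 'Issue'),
    (13, 3, 'Issue'), (12, 3, 'Issue');
""")


def label_ids_by_title(conn, titles):
    """Step 1: a separate query that groups matching label IDs by title."""
    marks = ", ".join("?" for _ in titles)
    rows = conn.execute(
        f"SELECT title, id FROM labels WHERE title IN ({marks}) ORDER BY id",
        list(titles),
    )
    grouped = {title: [] for title in titles}
    for title, label_id in rows:
        grouped[title].append(label_id)
    return grouped


def issues_with_all_labels(conn, titles):
    """Step 2: one EXISTS clause per title, filtering on label_links.label_id only."""
    if not titles:
        raise ValueError("at least one label title is required")
    grouped = label_ids_by_title(conn, titles)
    if any(not ids for ids in grouped.values()):
        return []  # a title that matches no label cannot be satisfied
    clauses, params = [], []
    for ids in grouped.values():
        marks = ", ".join("?" for _ in ids)
        clauses.append(
            "EXISTS (SELECT 1 FROM label_links"
            " WHERE label_links.target_type = 'Issue'"
            " AND label_links.target_id = issues.id"
            f" AND label_links.label_id IN ({marks}))"
        )
        params.extend(ids)
    sql = ("SELECT issues.id FROM issues WHERE "
           + " AND ".join(clauses) + " ORDER BY issues.id")
    return [row[0] for row in conn.execute(sql, params)]
```

Calling `issues_with_all_labels(conn, ["bug", "backend"])` builds two `EXISTS` clauses, one per title, and neither of them joins `labels`. Because the ID list for `bug` contains both labels with that title, an issue linked to either one satisfies that clause.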
## Attempt B: Denormalize using an array column

Having [removed MySQL support in GitLab 12.1](https://about.gitlab.com/blog/2019/06/27/removing-mysql-support/),
@@ -159,9 +172,8 @@ However, at present, the disadvantages outweigh the advantages.
## Conclusion

-We have yet to find a method that is demonstrably better than the current
-method, when considering:
-
-1. Query performance.
-1. Readability.
-1. Ease of maintaining schema consistency.
+We found a method A2 that does not need denormalization and improves the query performance significantly. This did not apply to all cases, but we were able to apply method A1 to the rest of the cases so that we remove the `GROUP BY` and `HAVING` clauses in all scenarios.
+
+This simplified the query and improved the performance in the most common cases.
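To make the conclusion's point about dropping `GROUP BY` and `HAVING` concrete, here is a small comparison of the two query shapes. It is a sketch under assumptions: the exact SQL of the original queries is not shown in this diff, so both statements below are simplified reconstructions, run on SQLite via Python's `sqlite3` with made-up sample data. The helper name `ids` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY);
CREATE TABLE labels (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE label_links (label_id INTEGER, target_id INTEGER, target_type TEXT);
INSERT INTO issues VALUES (1), (2), (3);
INSERT INTO labels VALUES (10, 'bug'), (12, 'backend'), (13, 'docs');
INSERT INTO label_links VALUES
    (10, 1, 'Issue'), (12, 1, 'Issue'),
    (10, 2, 'Issue'),
    (13, 3, 'Issue'), (12, 3, 'Issue');
""")

# Assumed shape of the aggregate-based filter: join, group, count distinct titles.
GROUP_BY_HAVING = """
SELECT issues.id
FROM issues
JOIN label_links ON label_links.target_id = issues.id
                AND label_links.target_type = 'Issue'
JOIN labels ON labels.id = label_links.label_id
WHERE labels.title IN ('bug', 'backend')
GROUP BY issues.id
HAVING COUNT(DISTINCT labels.title) = 2
ORDER BY issues.id
"""

# Assumed shape of method A1: one EXISTS subquery per label title, no aggregation.
EXISTS_PER_LABEL = """
SELECT issues.id
FROM issues
WHERE EXISTS (
    SELECT 1 FROM label_links
    JOIN labels ON labels.id = label_links.label_id
    WHERE labels.title = 'bug'
      AND label_links.target_type = 'Issue'
      AND label_links.target_id = issues.id)
  AND EXISTS (
    SELECT 1 FROM label_links
    JOIN labels ON labels.id = label_links.label_id
    WHERE labels.title = 'backend'
      AND label_links.target_type = 'Issue'
      AND label_links.target_id = issues.id)
ORDER BY issues.id
"""


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]
```

On this sample data both statements select the same issues, so `ids(GROUP_BY_HAVING) == ids(EXISTS_PER_LABEL)` holds. The second form avoids the aggregate step entirely. Method A2 goes one step further by also removing the `labels` join inside each `EXISTS`.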