@@ -20,8 +20,8 @@ every time the project is saved.
The summary of those statistics per namespace is then retrieved
by [`Namespaces#with_statistics`](https://gitlab.com/gitlab-org/gitlab-ce/blob/v12.2.0.pre/app/models/namespace.rb#L70) scope. Analyzing this query we noticed that:
* It takes up to `1.2` seconds for namespaces with over `15k` projects.
- It takes up to `1.2` seconds for namespaces with over `15k` projects.
* It can't be analyzed with [ChatOps](chatops_on_gitlabcom.md), as it times out.
- It can't be analyzed with [ChatOps](chatops_on_gitlabcom.md), as it times out.
Additionally, the pattern that is currently used to update the project statistics
(the callback) doesn't scale adequately. It is currently one of the largest
While this implied a single query update (and probably a fast one), it has some downsides:
* Materialized views syntax varies between PostgreSQL and MySQL. While this feature was worked on, MySQL was still supported by GitLab.
- Materialized views syntax varies between PostgreSQL and MySQL. While this feature was worked on, MySQL was still supported by GitLab.
* Rails does not have native support for materialized views. We'd need to use a specialized gem to take care of the management of the database views, which implies additional work.
- Rails does not have native support for materialized views. We'd need to use a specialized gem to take care of the management of the database views, which implies additional work.
### Attempt B: An update through a CTE
...
@@ -131,8 +131,8 @@ WHERE namespace_id IN (
Even though this approach would make aggregating much easier, it has some major downsides:
* We'd have to migrate **all namespaces** by adding and filling a new column. Because of the size of the table, the time and cost involved would be significant. The background migration would take approximately `153h`, see <https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29772>.
- We'd have to migrate **all namespaces** by adding and filling a new column. Because of the size of the table, the time and cost involved would be significant. The background migration would take approximately `153h`, see <https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/29772>.
* Background migration has to be shipped one release before, delaying the functionality by another milestone.
- Background migration has to be shipped one release before, delaying the functionality by another milestone.
### Attempt E (final): Update the namespace storage statistics in async way
...
@@ -155,10 +155,10 @@ but we refresh them through Sidekiq jobs and in different transactions:
This implementation has the following benefits:
* All the updates are done async, so we're not increasing the length of the transactions for `project_statistics`.
- All the updates are done async, so we're not increasing the length of the transactions for `project_statistics`.
* We're doing the update in a single SQL query.
- We're doing the update in a single SQL query.
* It is compatible with PostgreSQL and MySQL.
- It is compatible with PostgreSQL and MySQL.
* No background migration required.
- No background migration required.
The only downside of this approach is that namespaces' statistics are updated up to `1.5` hours after the change is done,
which means there's a time window in which the statistics are inaccurate. Because we're still not
...
@@ -171,8 +171,8 @@ performant approach of aggregating the root namespaces.
All the details regarding this use case can be found on:
w->>+s: save the incoming file on a temporary location
s-->>-w:
w->>+r: POST /some/url/upload
Note over w,r: file was replaced with its location<br>and other metadata
opt requires async processing
r->>+redis: schedule a job
redis-->>-r:
end
r-->>-c: request result
...
@@ -230,17 +230,17 @@ sequenceDiagram
w->>+os: PUT file
Note over w,os: file is stored on a temporary location. Rails selects the destination
os-->>-w:
w->>+r: POST /some/url/upload
Note over w,r: file was replaced with its location<br>and other metadata
r->>+os: move object to final destination
os-->>-r:
opt requires async processing
r->>+redis: schedule a job
redis-->>-r:
end
r-->>-c: request result
...
@@ -268,4 +268,3 @@ sequenceDiagram
This option affects the response to the `/authorize` call. When not enabled, the API response will not contain presigned URLs, and Workhorse will write the file to the shared disk, on the path provided by Rails, acting as if object storage was disabled.
Once the request reaches Rails, it will schedule an object storage upload as a Sidekiq job.
@@ -102,7 +102,7 @@ files to your local computer, automatically preserving the Git connection with t
remote repository.
You can either clone it via HTTPS or [SSH](../ssh/README.md). If you choose to clone
it via HTTPS, you'll have to enter your credentials every time you pull and push. You can read more about credential storage in the [Git Credentials documentation](https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage). With SSH, you enter your credentials only once.
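If you prefer HTTPS but don't want to retype your credentials on every pull and push, Git's credential helpers can keep them for a while. The following is a minimal sketch, assuming a platform where Git's in-memory `cache` helper is available (it is typically not included in Git for Windows). The one-hour timeout and the throwaway repository are illustrative choices, not recommendations from this document:

```shell
# Throwaway repository so this does not touch your real Git configuration.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Cache HTTPS credentials in memory for one hour (3600 seconds).
# Add --global to apply the setting to all of your repositories instead.
git config credential.helper 'cache --timeout=3600'

# Confirm the setting.
git config --get credential.helper
```

In a real project, run only the `git config` command from inside your cloned repository.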
You can find both paths (HTTPS and SSH) by navigating to your project's landing page
and clicking **Clone**. GitLab will prompt you with both paths, from which you can copy
When you clone a repository, `REMOTE` is typically `origin`. This is where the
repository was cloned from, and it indicates the SSH or HTTPS URL of the repository
on the remote server. `<name-of-branch>` is usually `master`, but it may be any existing
branch. You can create additional named remotes and branches as necessary.
You can learn more on how Git manages remote repositories in the [Git Remote documentation](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes).
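As an illustration of named remotes, the sketch below registers two remotes in a throwaway repository and inspects them. The URLs, the `gitlab.example.com` host, and the `upstream` remote name are hypothetical; substitute the paths shown under **Clone** for your own project:

```shell
# Create a throwaway repository so the commands are safe to run anywhere.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Hypothetical URLs: one HTTPS remote and one SSH remote.
git remote add origin https://gitlab.example.com/my-group/my-project.git
git remote add upstream git@gitlab.example.com:upstream-group/my-project.git

# List every remote together with its fetch and push URLs.
git remote -v

# Show the URL configured for a single remote.
git remote get-url upstream
```

A repository created with `git clone` already has `origin` configured, so in that case only the second `git remote add` would be needed.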
...
@@ -169,7 +169,7 @@ To view your remote repositories, type: