GithubImporter: Optimize Pull Request Review Importer

= Problem

The GitHub API does not provide a way to fetch all the pull request reviews of a project (repository) at once, as it does for comments; instead, reviews have to be fetched per pull request. For this reason, Gitlab::GithubImport::Importer::PullRequestsReviewsImporter¹ has to iterate over the imported pull requests and, for each one, request its reviews, which may span more than one page.

If the importer hits a rate limit, the process restarts. The already imported pull requests are skipped², but the importer requests all the review pages again. In other words, for projects with a large number of pull requests and a large number of reviews per pull request, the importer may create duplicated reviews and make unnecessary API requests, which leads to longer import times.

= Proposed solution

- To avoid duplicated reviews, cache the review IDs in addition to the pull request IDs, and skip the reviews that were already processed.
- To avoid unnecessary API requests, use the PageCounter to request only the pages that have not been imported yet.

Related to: https://gitlab.com/gitlab-org/gitlab/-/issues/331315

Changelog: changed
MR: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62036
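The two proposed changes can be sketched together in Ruby. This is a minimal, self-contained sketch under stated assumptions, not the code from the MR: `FakeClient`, `RateLimitError`, `ReviewsImporterSketch`, and the `cache` hash (with its `:done_prs`, `:pages` and `:review_ids` keys) are hypothetical stand-ins. GitLab's real importer uses Redis-backed caching and its own `PageCounter` class, whose API is not reproduced here. The sketch records the page in progress for each pull request plus a set of imported review IDs, then simulates a rate-limit error on the third API request followed by a restart that shares the same cache.

```ruby
require 'set'

# Hypothetical error raised by the fake client to simulate a GitHub rate limit.
class RateLimitError < StandardError; end

# Hypothetical paginated client: returns one page of reviews per call and
# fails once with a rate-limit error on the configured request number.
class FakeClient
  attr_reader :requests

  def initialize(reviews_by_pr, per_page:, fail_at_request: nil)
    @reviews_by_pr = reviews_by_pr
    @per_page = per_page
    @fail_at_request = fail_at_request
    @requests = 0
  end

  def reviews_page(pr_number, page)
    @requests += 1
    raise RateLimitError, 'rate limited' if @requests == @fail_at_request

    @reviews_by_pr.fetch(pr_number, []).each_slice(@per_page).to_a.fetch(page - 1, [])
  end
end

# Hypothetical importer loop; `cache` stands in for a cache that survives restarts.
class ReviewsImporterSketch
  def initialize(client, cache, store)
    @client = client
    @cache = cache
    @store = store # stands in for the database table of imported reviews
  end

  def import_all(pr_numbers)
    pr_numbers.each do |pr_number|
      next if @cache[:done_prs].include?(pr_number) # existing PR-level skip
      import_reviews(pr_number)
      @cache[:done_prs] << pr_number
    end
  end

  private

  def import_reviews(pr_number)
    seen = @cache[:review_ids][pr_number]
    page = [@cache[:pages][pr_number], 1].max # resume at the page in progress

    loop do
      reviews = @client.reviews_page(pr_number, page)
      break if reviews.empty?

      # Record the page before processing it; the counter only moves forward.
      @cache[:pages][pr_number] = [@cache[:pages][pr_number], page].max
      reviews.each do |review|
        next if seen.include?(review[:id]) # imported before the restart
        @store << review[:id]
        seen << review[:id]
      end
      page += 1
    end
  end
end

reviews = {
  1 => (101..105).map { |id| { id: id } },
  2 => [{ id: 201 }, { id: 202 }]
}
client = FakeClient.new(reviews, per_page: 2, fail_at_request: 3)
cache = {
  done_prs: Set.new,
  pages: Hash.new(0),
  review_ids: Hash.new { |hash, pr| hash[pr] = Set.new }
}
store = []

begin
  ReviewsImporterSketch.new(client, cache, store).import_all([1, 2])
rescue RateLimitError
  # Simulated restart: a new importer object, but the same cache and database.
  ReviewsImporterSketch.new(client, cache, store).import_all([1, 2])
end

p store           # => [101, 102, 103, 104, 105, 201, 202]
p client.requests # => 8
```

After the restart, the sketch resumes at page 2 of pull request 1 instead of page 1, and the two reviews already imported from that page are skipped by the review-ID set, so no review is stored twice. Because the page is recorded before it is processed, an interrupted job re-requests at most the page it was working on; that overlap is why the page counter and the review-ID cache are needed together.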