    GithubImporter: Optimize Pull Request Review Importer · f60fd4e3
    Kassio Borges authored
    = Problem
    
    The GitHub API does not provide a way to fetch all the pull request
    reviews of a project (repo), as it does for comments. Instead, we
    have to fetch the reviews per pull request.
    
    For this reason, the
    Gitlab::GithubImport::Importer::PullRequestsReviewsImporter¹ has to
    iterate over the imported pull requests and, for each one, request the
    reviews, which might span more than one page.
    
    If the importer hits a rate limit, the process restarts, and the
    imported pull requests are skipped², but the importer goes over all the
    review pages again.
    
    In other words, for projects with a large number of pull requests and
    a large number of reviews per pull request, we might end up with
    duplicated reviews and unnecessary API requests, which lead to
    longer import times.
    
    = Proposed solution
    
    - To avoid duplicated comments, besides caching the pull request ids,
      also cache the review ids and skip the ones already processed.
    
    - To avoid unnecessary API requests, use the PageCounter to request
      only the pages that haven't been imported yet.
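
The two points above can be illustrated with a minimal, self-contained sketch. It is not the actual importer code: `FakeReviewsApi`, `ReviewImportSketch`, and `RateLimited` are hypothetical names, and in-memory `Set`/`Hash` objects stand in for the real cache and for PageCounter. It only shows the idea of resuming from the last imported page and skipping review ids that were already processed.

```ruby
require 'set'

# Hypothetical stand-in for the GitHub API: review ids per pull request, paginated.
class FakeReviewsApi
  RateLimited = Class.new(StandardError)

  attr_reader :requests

  def initialize(pages_by_pull_request, fail_on_request: nil)
    @pages = pages_by_pull_request
    @fail_on_request = fail_on_request
    @requests = 0
  end

  # Returns the review ids on the given 1-based page, or [] past the last page.
  # Simulates a rate limit by raising on the configured request number.
  def reviews(pull_request_id, page:)
    @requests += 1
    raise RateLimited if @requests == @fail_on_request

    @pages.fetch(pull_request_id).fetch(page - 1, [])
  end
end

# Hypothetical importer that survives restarts without redoing work.
class ReviewImportSketch
  attr_reader :imported

  def initialize(api)
    @api = api
    @done_pull_requests = Set.new # stand-in for the cached pull request ids
    @done_reviews = Set.new       # stand-in for the cached review ids
    @next_page = Hash.new(1)      # stand-in for a per-pull-request page counter
    @imported = []
  end

  def run(pull_request_ids)
    pull_request_ids.each do |pr_id|
      next if @done_pull_requests.include?(pr_id)

      loop do
        page = @next_page[pr_id]
        reviews = @api.reviews(pr_id, page: page)
        break if reviews.empty?

        reviews.each do |review_id|
          # Set#add? returns nil when the id is already present, so a
          # review that was processed before is skipped.
          @imported << review_id if @done_reviews.add?(review_id)
        end
        # Only advance the counter once the whole page has been processed.
        @next_page[pr_id] = page + 1
      end

      @done_pull_requests << pr_id
    end
  end
end

api = FakeReviewsApi.new({ 1 => [[10, 11], [12]], 2 => [[20]] }, fail_on_request: 2)
importer = ReviewImportSketch.new(api)

begin
  importer.run([1, 2])
rescue FakeReviewsApi::RateLimited
  # Restart after the simulated rate limit: resumes at page 2 of pull request 1.
  importer.run([1, 2])
end

p importer.imported # prints [10, 11, 12, 20]
```

In this sketch the restart does not request page 1 of pull request 1 again, and each review id ends up in `imported` exactly once.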
    
    Related to: https://gitlab.com/gitlab-org/gitlab/-/issues/331315
    Changelog: changed
    MR: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62036
caching.rb 5.72 KB