    Fix merge pre-receive errors when load balancing in use · 821e1a83
    Stan Hu authored
    When a user merges a merge request, the following sequence happens:
    
    1. Sidekiq: `MergeService` locks the state of the merge request in the
       DB.
    2. Gitaly: UserMergeBranch RPC runs and creates a merge commit.
    3. Sidekiq: `Repository#merge` and `Repository#ff_merge` update the
       `in_progress_merge_commit_sha` database column with this merge commit
       SHA.
    4. Gitaly: UserMergeBranch RPC runs again and applies this merge commit
       to the target branch.
    5. Gitaly (gitaly-ruby): This RPC calls the pre-receive hook.
    6. Rails: This hook makes an API request to `/api/v4/internal/allowed`.
    7. Rails: This API check makes a SQL query for locked merge requests
       with a matching SHA.
    
    Since steps 1 and 7 will happen in different database sessions,
    replication lag could erroneously cause step 7 to report no matching
    merge requests. To avoid this, we have a few options:
    
    1. Wrap step 7 in a transaction. The EE load balancer will always
       direct these queries to the primary.
    2. Always force the load balancer session to use the primary for this
       query.
    3. Use the load balancing sticking mechanism to use the primary until
       the secondaries have caught up to the right write location.
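
    The stale read behind these options can be reproduced with a toy
    model. This is only a sketch: `ToyPrimary`, `ToyReplica`, and
    `matching_merge_request?` are hypothetical names invented for
    illustration, not GitLab classes, and the write location is modelled
    as a plain integer counter.

```ruby
# Toy model of the race between the SHA write on the primary (step 3) and
# the read in step 7. All names here are hypothetical.
class ToyPrimary
  attr_reader :rows, :write_location

  def initialize
    @rows = {}
    @write_location = 0
  end

  # Every write advances the write location (assumed to be a counter here).
  def write(merge_request_id, sha)
    @rows[merge_request_id] = sha
    @write_location += 1
  end
end

class ToyReplica
  attr_reader :rows, :replayed_location

  def initialize
    @rows = {}
    @replayed_location = 0
  end

  # Replication copies the primary's state at some later point in time.
  def catch_up(primary)
    @rows = primary.rows.dup
    @replayed_location = primary.write_location
  end
end

# Step 7: does the given host know a merge request with this SHA?
def matching_merge_request?(host, sha)
  host.rows.value?(sha)
end

primary = ToyPrimary.new
replica = ToyReplica.new

primary.write(1, 'abc123')                                   # step 3

# Step 7 served by a lagging replica: the row has not arrived yet.
stale_result = matching_merge_request?(replica, 'abc123')

# Options 1 and 2 both amount to reading from the primary instead.
primary_result = matching_merge_request?(primary, 'abc123')

# Once replication has caught up, the replica answers correctly.
replica.catch_up(primary)
caught_up_result = matching_merge_request?(replica, 'abc123')
```

    In this model the replica is only wrong during the window between
    the write and `catch_up`, which is the window option 3 targets.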
    
    Option 1 isn't great because on GitLab.com, this query can take 500 ms
    to run, and long-running read transactions are not desirable.
    
    Option 2 is simple and guarantees that we will always have a consistent
    read. However, none of these queries will ever be routed to a secondary,
    and this may cause undue load on the primary.
    
    We go with option 3. Whenever the `in_progress_merge_commit_sha` is
    updated, we mark the write location of the primary. Then in
    `MatchingMergeRequest#match?`, we stick to the primary if the replica
    has not caught up to that location.
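
    A minimal sketch of the sticking idea, assuming an in-memory store
    and integer write locations. `StickingStore`, `Host`, `pick_host`,
    and the `'project:1'` key are hypothetical and chosen for
    illustration; they are not the load balancer's actual API, and the
    key the real code sticks on is not stated here.

```ruby
# Hypothetical stand-in for wherever the marked write location is kept.
class StickingStore
  def initialize
    @locations = {}
  end

  # Remember the primary's write location at the time of the write.
  def mark(key, location)
    @locations[key] = location
  end

  def location_for(key)
    @locations[key]
  end

  def unstick(key)
    @locations.delete(key)
  end
end

# A database host with a write/replay location and the SHAs it knows about.
Host = Struct.new(:name, :location, :shas)

# Use the replica unless a marked location exists that it has not reached.
def pick_host(store, key, primary, replica)
  wanted = store.location_for(key)
  return replica if wanted.nil?

  if replica.location >= wanted
    store.unstick(key) # replica has caught up, so stop sticking
    replica
  else
    primary
  end
end

# Rough analogue of the check in step 7, routed through pick_host.
def match?(store, key, primary, replica, sha)
  pick_host(store, key, primary, replica).shas.include?(sha)
end

store = StickingStore.new
primary = Host.new('primary', 0, [])
replica = Host.new('replica', 0, [])

# The SHA column is updated on the primary and the location is marked.
primary.shas << 'abc123'
primary.location += 1
store.mark('project:1', primary.location)

# The replica is behind, so the read sticks to the primary and still matches.
lagging_host = pick_host(store, 'project:1', primary, replica).name
lagging_match = match?(store, 'project:1', primary, replica, 'abc123')

# After the replica catches up, reads go back to it and the mark is cleared.
replica.shas = primary.shas.dup
replica.location = primary.location
caught_up_host = pick_host(store, 'project:1', primary, replica).name
caught_up_match = match?(store, 'project:1', primary, replica, 'abc123')
```

    Compared with option 2, the primary only serves the query while a
    mark is outstanding and the replica is behind it.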
    
    Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/247857