checks: Speed up retrieving commits via quarantine directory
The Git access checks need access to objects which have been newly introduced for a set of given changes. These new objects used to be retrieved via a revision walk, where preexisting references have been excluded and new references included. The resulting set of objects is then the set of objects which haven't previously been referenced by anything else. Depending on the number of references, walking all preexisting ones can be quite expensive and easily take dozens of seconds. In the past, we have thus converted our LFS pointer checks to use the quarantine directory to list new LFS pointers. The quarantine directory hosts all objects which have been uploaded by a git-push(1), and can thus directly tell us about which objects are new and which aren't. This has sped up LFS pointer checks in case we have a quarantine directory by 3 orders of magnitude in biggish repos with many refs, going from up to 20 seconds down to 20 milliseconds. Given that commits can be front-loaded in a single batched RPC call since 156ce433 (checks: Implement infrastructure for bulk diff checks, 2021-07-29), we are now in a position to do the same for commits: If we have an object quarantine directory, then we use the new `ListAllCommits()` RPC with all alternate paths unset. This will thus effectively only list commits of the object directory, which in a quarantined environment points to the quarantine object directory. Note that we still have to retain the old code to list commits via a revision walk, given that non-push RPCs currently don't have an object quarantine directory available. While this is slated to change soon by creating manual quarantines, we must continue to support enumeration via revwalks. Changelog: performance
Showing
Please register or sign in to comment