    Use object quarantine directory to enumerate new LFS pointers · e07a1c70
    Patrick Steinhardt authored
    When accepting pushes, we will check whether pushes contain any new LFS
    pointers and, if so, verify that we've got the corresponding LFS object
    for each of the pointers in order to ensure consistency. Determining new
    LFS pointers is expensive though: we need to perform a complete graph
    walk in order to determine which blobs are new and which aren't. The
    integrity check's runtime thus scales with repository size and is
    frequently seen to take multiple seconds or even time out after 30
    seconds. As a result, the push either appears to hang for quite some
    time without doing anything, or it is refused altogether in the case
    of a timeout.
    
    We can do better though: instead of doing a graph walk, we can just
    enumerate all pushed objects and inspect them directly.
    This is quite trivial to do: when git-receive-pack(1) receives a push,
    all pushed objects will be put into a quarantine directory which is then
    made available to git hooks via the GIT_OBJECT_DIRECTORY variable, while
    the real repository's objects are referenced via
    GIT_ALTERNATE_OBJECT_DIRECTORIES.
    Instead of doing the graph walk, we can just use git-cat-file(1) with
    the `--batch-all-objects` flag and the alternate object directories
    unset. The result is a direct enumeration of all pushed objects, which
    scales linearly with push size and not with repository size. Doing some
    benchmarks for gitlab-org/gitlab showed that these computations are
    around 100-200x faster than doing the graph walk, reducing the time from
    around 200ms to 2-4ms.
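
    The enumeration described above can be sketched roughly as follows.
    This is only an illustration and not the code added by this commit:
    the helper names `quarantine_env`, `list_quarantined_objects` and
    `candidate_blobs` are hypothetical, and so is the 200-byte size limit
    used to pre-filter blobs that could plausibly be LFS pointers.

```ruby
require 'open3'

# Assumed upper bound for the size of an LFS pointer blob (illustrative).
LFS_POINTER_MAX_SIZE = 200

# Build the environment for the child process: keep the quarantine
# directory as the object directory and unset the alternates, so that
# objects of the real repository are not visible.
def quarantine_env(hook_env)
  {
    'GIT_OBJECT_DIRECTORY' => hook_env.fetch('GIT_OBJECT_DIRECTORY'),
    # A nil value makes Ruby unset the variable in the spawned process.
    'GIT_ALTERNATE_OBJECT_DIRECTORIES' => nil
  }
end

# Enumerate every object in the quarantine directory. The output of
# `--batch-check` has one "<oid> <type> <size>" line per object.
def list_quarantined_objects(hook_env, repo_path)
  cmd = %w[git cat-file --batch-all-objects --batch-check]
  out, status = Open3.capture2(quarantine_env(hook_env), *cmd, chdir: repo_path)
  raise 'git cat-file failed' unless status.success?

  out
end

# Keep only blobs that are small enough to be LFS pointers. Their
# contents would still need to be read and parsed afterwards.
def candidate_blobs(batch_check_output, max_size: LFS_POINTER_MAX_SIZE)
  batch_check_output.each_line.filter_map do |line|
    oid, type, size = line.split(' ', 3)
    oid if type == 'blob' && size.to_i <= max_size
  end
end
```

    Because only the quarantine directory is visible to git-cat-file(1),
    the amount of work depends on the number of pushed objects and not
    on the size of the target repository.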
    
    This functionality has recently been implemented in Gitaly via the new
    `ListAllLFSPointers()` RPC: given a repository, it will simply list all
    LFS pointers, regardless of whether they are reachable. We can now use
    the above semantics when a pre-receive hook environment is active and
    strip the repository's alternate object directories, which as a result
    will only list newly pushed objects.
blob_service.rb 5.46 KB