checks: Convert FileSizeCheck push rule to allow for batching
The FileSizeCheck push rule loads all new blobs for a given change and checks whether any of these blobs is bigger than a certain threshold. This push rule is quite expensive: loading new blobs requires a revwalk of all preexisting references and thus scales with the number of refs in a repository. This may easily take a few dozen seconds to compute, and repeating this computation for each change doesn't help either.

Refactor the FileSizeCheck to be implemented atop the BaseBulkChecker such that we can fix this shortcoming in the future: instead of loading blobs for each change, we may load them once for all changes. While this would already amortize the costs because we have to perform the walk of negative refs once only, we can eventually tweak this even further and use the quarantine directory to enumerate all pushed objects directly without doing a revwalk at all. This optimization will bring down the time to load blobs from dozens of seconds to a few milliseconds.

Note that this commit doesn't yet change behaviour, but instead only prepares the infrastructure to allow for batch loading. Batch loading will require some additional changes to our infrastructure first, which will be done in a separate patch series.
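
To illustrate why batching amortizes the cost, here is a minimal sketch that contrasts a per-change check with a bulk check. It is not the actual FileSizeCheck or BaseBulkChecker code: `Blob`, `Change`, `BlobLoader`, `check_per_change` and `check_in_bulk` are hypothetical names, and the loader only counts how often the expensive walk would run instead of performing one.

```ruby
# Hypothetical sketch; none of these names come from the actual codebase.
Blob = Struct.new(:path, :size_bytes)
Change = Struct.new(:ref, :blobs)

# Stand-in for the expensive step. A real loader would walk all
# preexisting refs; this one only counts how many walks were requested.
class BlobLoader
  attr_reader :walks

  def initialize
    @walks = 0
  end

  # Returns the new blobs introduced by the given changes.
  def new_blobs(changes)
    @walks += 1
    changes.flat_map(&:blobs)
  end
end

# Selects the blobs that exceed the size limit.
def oversized(blobs, limit)
  blobs.select { |blob| blob.size_bytes > limit }
end

# Per-change variant: one load (and thus one walk) per change.
def check_per_change(changes, loader, limit)
  changes.flat_map { |change| oversized(loader.new_blobs([change]), limit) }
end

# Bulk variant: a single load covers all changes of the push.
def check_in_bulk(changes, loader, limit)
  oversized(loader.new_blobs(changes), limit)
end
```

Both variants report the same oversized blobs; the difference is that the per-change variant asks the loader once per change, while the bulk variant asks once per push.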