Limit updates to Web Hook backoff interval
If a web hook times out, this is treated as an error and `WebHook#backoff!` is executed. However, if the hook fires repeatedly, which is common for a system hook or a group hook, this backoff can update the same row repeatedly via `WebHooks::LogExecutionWorker` jobs. This not only generates unnecessary table bloat, but can also cause significant performance degradation once a long-running transaction has started.

These concurrent row updates can cause PostgreSQL to allocate multixact transaction IDs. A SELECT call prunes tuples opportunistically, but this pruning can slow down significantly if the window of multixact tuples grows over time. Once PostgreSQL's simple LRU cache can no longer hold the multixact XIDs in memory, we see slowdowns when accessing the `web_hooks` table.

To avoid this, cap the number of backoffs at 100 (`MAX_FAILURES`) and only update the row once the `disabled_until` time has elapsed. This ensures the hook fires at most once every 24 hours and updates the row only once during that window.

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/340272

Changelog: performance
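A minimal sketch of the guarded backoff described above. This is not the exact GitLab implementation; the column name `backoff_count` and the helpers `temporarily_disabled?` and `next_backoff` are illustrative assumptions:

```ruby
class WebHook < ApplicationRecord
  # Cap the number of recorded backoffs so repeated failures stop
  # generating new row versions (assumed constant per the description).
  MAX_FAILURES = 100

  def backoff!
    # Once the cap is reached and a previous backoff window is still
    # active, skip the UPDATE entirely: at the cap, the row is written
    # at most once per 24-hour window.
    return if backoff_count >= MAX_FAILURES && temporarily_disabled?

    assign_attributes(
      disabled_until: next_backoff.from_now,
      backoff_count: backoff_count.succ.clamp(1, MAX_FAILURES)
    )
    save(validate: false)
  end

  def temporarily_disabled?
    disabled_until.present? && disabled_until > Time.current
  end

  def next_backoff
    # Illustrative exponential backoff, capped at 24 hours.
    (1.minute * (2**backoff_count)).clamp(1.minute, 24.hours)
  end
end
```

The key design point is the early `return`: the guard turns a hot, repeated UPDATE into a no-op read while the backoff window is active, avoiding the multixact allocation described above.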