Commit 95ab2922 authored by Stan Hu's avatar Stan Hu

Limit updates to Web Hook backoff interval

If a Web hook times out, this is treated as an error, and
`Webhook#backoff!` is executed. However, if the hook fires repeatedly,
which is common for a system hook or group hook, it's possible for this
backoff to update the same row repeatedly via
`WebHooks::LogExecutionWorker` jobs. This not only generates unnecessary
table bloat, but it can cause a significant performance degradation when
a long transaction has started.

These concurrent row updates can cause PostgreSQL to allocate multixact
transaction IDs. A SELECT call will cause PostgreSQL to prune tuples in
an opportunistic way, but this pruning may be significantly slowed if
the window of multiexact tuples grows over time. Once the simple LRU
cache can no longer fit the multixact XIDs in-memory cache, we will see
slowdowns when accessing the `web_hooks` table.

To avoid this, we cap the number of backoffs to 100 (`MAX_FAILURES`) and
only update the row if the `disabled_until` time has elapsed. This
should ensure the hook only fires once every 24 hours and only updates
the row once during that time.

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/340272

Changelog: performance
parent 75da7f45
......@@ -80,6 +80,8 @@ class WebHook < ApplicationRecord
end
def backoff!
return if backoff_count >= MAX_FAILURES && disabled_until && disabled_until > Time.current
assign_attributes(disabled_until: next_backoff.from_now, backoff_count: backoff_count.succ.clamp(0, MAX_FAILURES))
save(validate: false)
end
......
......@@ -10,7 +10,11 @@ RSpec.describe WebHook do
let(:hook) { build(:project_hook, project: project) }
around do |example|
freeze_time { example.run }
if example.metadata[:skip_freeze_time]
example.run
else
freeze_time { example.run }
end
end
describe 'associations' do
......@@ -326,10 +330,28 @@ RSpec.describe WebHook do
expect { hook.backoff! }.to change(hook, :backoff_count).by(1)
end
it 'does not let the backoff count exceed the maximum failure count' do
hook.backoff_count = described_class::MAX_FAILURES
context 'when we have backed off MAX_FAILURES times' do
before do
stub_const("#{described_class}::MAX_FAILURES", 5)
5.times { hook.backoff! }
end
it 'does not let the backoff count exceed the maximum failure count' do
expect { hook.backoff! }.not_to change(hook, :backoff_count)
end
it 'does not change disabled_until', :skip_freeze_time do
travel_to(hook.disabled_until - 1.minute) do
expect { hook.backoff! }.not_to change(hook, :disabled_until)
end
end
expect { hook.backoff! }.not_to change(hook, :backoff_count)
it 'changes disabled_until when it has elapsed', :skip_freeze_time do
travel_to(hook.disabled_until + 1.minute) do
expect { hook.backoff! }.to change { hook.disabled_until }
expect(hook.backoff_count).to eq(described_class::MAX_FAILURES)
end
end
end
include_examples 'is tolerant of invalid records' do
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment