Commit d31730c3 authored by Craig Miskell

Respect sidekiq timeout when hard-killing workers

As discovered in
https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10930,
the 5-second timeout can be too short, because during normal shutdowns
getppid returns 1 sooner than expected. But even in a "real" failure
case, where the sidekiq-cluster process is terminated hard, we still
need to respect the Sidekiq timeout so that Sidekiq can wait for
running jobs to complete (or terminate them and push them back onto the
queue) before being killed. Otherwise we end up with orphaned jobs that
are only picked up by the reliable fetcher cleanup, up to an hour
later.
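The trigger condition is that `Process.ppid` no longer matches the parent PID recorded at startup. When a parent exits, the kernel reparents its children, typically to PID 1 (on Linux, a subreaper such as a container init may adopt them instead), so getppid changes as soon as the parent is gone. The sketch below demonstrates this with a double fork. `wait_until_orphaned` is a hypothetical helper, not GitLab code, and the example assumes a platform that supports `fork` (Linux or macOS).

```ruby
# Hypothetical helper (not from GitLab): poll Process.ppid until it differs
# from original_ppid. Returns the new parent PID, or nil after max_wait seconds.
def wait_until_orphaned(original_ppid, poll: 0.05, max_wait: 5.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + max_wait
  loop do
    current = Process.ppid
    return current if current != original_ppid
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(poll)
  end
end

reader, writer = IO.pipe

middle = fork do
  # Record our own PID before forking, so the grandchild compares against
  # the right value even if this process has already exited by then.
  middle_pid = Process.pid
  fork do
    reader.close
    new_ppid = wait_until_orphaned(middle_pid)
    writer.puts(new_ppid.to_s)
    writer.close
    exit!(0)
  end
  exit!(0) # the middle process exits at once, orphaning the grandchild
end

Process.wait(middle)
writer.close # so reader.gets only waits on the grandchild's copy
new_parent = reader.gets.to_s.strip
reader.close
puts "grandchild was reparented to PID #{new_parent}"
```

The reported PID depends on the environment, which is why the watchdog compares against the recorded parent rather than testing for 1 specifically.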
parent dbbdcf45
@@ -14,10 +14,10 @@ if ENV['ENABLE_SIDEKIQ_CLUSTER']
 if Process.ppid != parent
   Process.kill(:TERM, Process.pid)
-  # Wait for just a few extra seconds for a final attempt to
-  # gracefully terminate. Considering the parent (cluster) process
-  # have changed (SIGKILL'd), it shouldn't take long to shutdown.
-  sleep(5)
+  # Allow sidekiq to cleanly terminate and push any running jobs back
+  # into the queue. We use the configured timeout and add a small
+  # grace period
+  sleep(Sidekiq.options[:timeout] + 5)
   # Signaling the Sidekiq Pgroup as KILL is not forwarded to
   # a possible child process. In Sidekiq Cluster, all child Sidekiq
......
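The patched shutdown sequence can be sketched in plain Ruby as follows. This is a hypothetical reconstruction, not the GitLab source. `shutdown_orphaned_worker` and `GRACE_PERIOD` are illustrative names, and `timeout:` stands in for `Sidekiq.options[:timeout]`. The final `Process.kill(:KILL, 0)` is an assumption that the elided lines signal the whole process group, as the truncated comment suggests. The signal and sleep calls are injectable so the ordering can be checked without killing anything.

```ruby
# Hypothetical sketch of the patched sequence, NOT the GitLab source.
GRACE_PERIOD = 5 # extra seconds on top of the Sidekiq timeout, as in the diff

# `timeout` stands in for Sidekiq.options[:timeout]. `signal` and `sleeper`
# default to the real Process.kill and Kernel#sleep but can be replaced.
def shutdown_orphaned_worker(timeout:,
                             signal: ->(sig, pid) { Process.kill(sig, pid) },
                             sleeper: ->(secs) { sleep(secs) })
  # 1. Ask this Sidekiq process to shut down gracefully: it stops fetching,
  #    lets running jobs finish, and requeues any still running at the timeout.
  signal.call(:TERM, Process.pid)
  # 2. Wait the full configured timeout plus a grace period, rather than a
  #    fixed 5 seconds, so step 1 has time to complete.
  sleeper.call(timeout + GRACE_PERIOD)
  # 3. Assumed from the truncated comment: hard-kill the whole process group
  #    (pid 0 means "every process in our own process group").
  signal.call(:KILL, 0)
end

# Record the calls instead of sending real signals or sleeping.
calls = []
shutdown_orphaned_worker(
  timeout: 25,
  signal: ->(sig, pid) { calls << [:signal, sig, pid] },
  sleeper: ->(secs) { calls << [:sleep, secs] }
)
p calls
```

With a timeout of 25 (Sidekiq's default), the recorded sequence is SIGTERM to the current PID, a 30-second sleep, then SIGKILL to the process group. Before this commit the middle step was always 5 seconds, regardless of how long running jobs were allowed to take.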