Respect sidekiq timeout when hard-killing workers
As discovered in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10930, the 5-second timeout can be too short, because during normal shutdowns getppid returns "1" sooner than expected. But even in a "real" failure case, where the sidekiq-cluster process is terminated hard, we still need to respect the sidekiq timeout so that sidekiq can wait for running jobs to complete (or terminate them and push them back into the queue) before it is killed off. Otherwise we end up with orphaned jobs that are only picked up by the reliable fetcher cleanup, up to an hour later.
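The intended behaviour can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `stop_workers`, `process_alive?`, `send_signal`, and the `grace` and `interval` parameters are hypothetical names and values chosen for the example. It sends TERM first, waits for the sidekiq timeout plus a small margin, and only then sends KILL to workers that are still running.

```ruby
# Hypothetical helper: true if a process with this pid still exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
end

# Hypothetical helper: send a signal, ignoring workers that already exited.
def send_signal(sig, pid)
  Process.kill(sig, pid)
rescue Errno::ESRCH
  nil
end

# Sends TERM to every worker, waits up to `sidekiq_timeout + grace` seconds
# for them to exit, then sends KILL to whatever is left.
# `grace` and `interval` are assumed values, not taken from the real code.
# The callables are injectable so the timing logic can be exercised
# without spawning real processes.
def stop_workers(pids, sidekiq_timeout:, grace: 5, interval: 1,
                 signal: ->(sig, pid) { send_signal(sig, pid) },
                 alive: ->(pid) { process_alive?(pid) },
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 pause: ->(seconds) { sleep(seconds) })
  pids.each { |pid| signal.call(:TERM, pid) }

  # Give sidekiq its full shutdown timeout before considering a hard kill.
  deadline = clock.call + sidekiq_timeout + grace
  remaining = pids.select { |pid| alive.call(pid) }

  until remaining.empty? || clock.call >= deadline
    pause.call(interval)
    remaining = remaining.select { |pid| alive.call(pid) }
  end

  # Only workers that outlived the deadline are killed hard.
  remaining.each { |pid| signal.call(:KILL, pid) }
  remaining
end
```

With a sidekiq timeout of 25 seconds and the assumed 5-second margin, a worker would get 30 seconds to finish or requeue its jobs before receiving KILL, rather than a fixed 5 seconds.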