Commit d14dc814 authored by Kirill Smelkov

gitlab: watcher should take care of sidekiq killed by SIGTERM

The watcher should also handle signals such as SIGTERM, which sidekiq traps
and then exits successfully (with exit code 0).

To achieve this we rework our watcher-sigkill into a generic watcher that
is given a set of restart exit codes, which may include signal names, and
restarts the child process whenever it terminates with a matching exit
code.

Example usage:

	watcher 0,SIGKILL prog ...
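
How a restart set such as 0,SIGKILL could be matched against a child's exit status can be sketched as follows. This is only an illustration under stated assumptions, not the watcher script itself: sig_exit_status and should_restart are hypothetical helper names, and only SIGKILL (9) and SIGTERM (15) are hardcoded instead of being looked up via `kill -l`.

```sh
#!/bin/sh
# Illustrative sketch only; sig_exit_status and should_restart are hypothetical
# helpers, not functions from the watcher script.

# sig_exit_status <signame> -> exit status a shell reports for a child killed
# by that signal (128 + signal number). Only two signals are hardcoded here.
sig_exit_status() {
    case "$1" in
        SIGKILL) echo $((128 + 9)) ;;
        SIGTERM) echo $((128 + 15)) ;;
        *) echo "E: $1 is not handled in this sketch" 1>&2; return 1 ;;
    esac
}

# should_restart <status> <codes> -> succeeds if <status> is listed in the
# comma-separated, already numeric <codes>.
should_restart() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# "0,SIGKILL" translated to numeric form
codes="0,$(sig_exit_status SIGKILL)"

# run a child that kills itself with SIGKILL and capture its exit status
status=0
sh -c 'kill -s KILL $$' 2>/dev/null || status=$?

if should_restart "$status" "$codes"; then
    echo "status $status: restart"
else
    echo "status $status: exit"
fi
```

On a typical Linux shell the child above is expected to be reported with status 137 (128 + 9), so it falls into the restart set, as would a child that exits with 0.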

Based on patch by @iv.
Discussion: https://lab.nexedi.com/lab.nexedi.com/lab.nexedi.com/issues/25#note_22085
parent e7e37398
......@@ -632,11 +632,11 @@ log = ${sidekiq-dir:log}
recipe = slapos.cookbook:wrapper
wrapper-path = ${directory:service}/sidekiq
command-line =
-# NOTE Sidekiq memory killer just makes sidekiq processes to be SIGKILL
-# terminated and relies on managing service to restart it. In slapos we don't
-# have mechanism to set autorestart=true, nor bang/watchdog currently work with
-# slapproxy, so we do the monitoring ourselves.
-{{ watcher_sigkill }}
+# NOTE Sidekiq memory killer makes sidekiq processes to exit, or if exit request
+# not handled in time, to be SIGKILL terminated, and relies on managing service
+# to restart it. In slapos we don't have mechanism to set autorestart=true, nor
+# bang/watchdog currently work with slapproxy, so we do the monitoring ourselves.
+{{ watcher }} 0,SIGKILL
${gitlab-sidekiq:wrapper-path}
# XXX -q runner ? (present in gitlab-ce/Procfile but not in omnibus)
......
......@@ -55,7 +55,7 @@ context =
raw redis_binprefix ${redis28:location}/bin
raw ruby_location ${bundler-4gitlab:ruby-location}
raw tar_location ${tar:location}
-raw watcher_sigkill ${watcher-sigkill:rendered}
+raw watcher ${watcher:rendered}
raw xnice_repository_location ${xnice-repository:location}
# config files
......
......@@ -53,7 +53,7 @@ parts =
bash
curl
-watcher-sigkill
+watcher
gitlab-export
gzip
dcron-output
......@@ -256,7 +256,7 @@ eggs =
recipe = slapos.recipe.template
url = ${:_profile_base_location_}/instance.cfg.in
output = ${buildout:directory}/instance.cfg
-md5sum = b99a99b161c0b292845002fc3fee50cd
+md5sum = 2329ddc4934e900785aa669adc214c23
# macro: download a shell script and put it rendered into <software>/bin/
[binsh]
......@@ -267,9 +267,9 @@ mode = 0755
context =
section bash bash
-[watcher-sigkill]
+[watcher]
<= binsh
-md5sum = 2986dcb006dc9e8508ff81f646656131
+md5sum = 90690e1351637f20ff2df57a6c3e85b4
[gitlab-export]
<= binsh
......@@ -319,7 +319,7 @@ md5sum = 176939a6428a7aca4767a36421b0af2b
[instance-gitlab.cfg.in]
<= download-file
-md5sum = 89914e4a225f6cdebfa196d46359f6f2
+md5sum = b05fad928ffbb689b4415837525c62d1
[instance-gitlab-export.cfg.in]
<= download-file
......
#!{{ bash.location }}/bin/bash
-# run program under SIGKILL watchdog
-# watcher-sigkill <prog> [<progargs> ...]
+# run program under watchdog
+# watcher <restart-codes> <prog> [<progargs> ...]
#
-# if the program terminates with SIGKILL - it is restarted after grace period.
+# <restart-codes> = code1,code2,...
+#
+# if the program terminates with status in <restart-codes> - it is restarted after grace period.
# if the program terminates otherwise - whole process terminates.
+#
+# code can be numeric or symbolic - referring to a signal name. example:
+#
+# watcher 0,SIGKILL <prog> ...
-if [ "$#" -lt 1 ]; then
-echo "Usage: watcher-sigkill <prog> [<progargs> ...]" 1>&2
+die() {
+echo "$@" 1>&2
+exit 1
+}
+if [ "$#" -lt 2 ]; then
+die "Usage: watcher <restart-codes> <prog> [<progargs> ...]"
fi
+restart_codes="$1"; shift
prog="$@"
+# signumber <signame> -> #sig
+signumber() {
+signame=$1
+# "11) SIGSEGV "
+sigentry=`kill -l |grep -o "[0-9]\+) $signame\(\s\|$\)"` ||
+die "E: $signame is not a signal"
+echo "$sigentry" | grep -o "[0-9]\+"
+}
+# restart codes as set
+declare -A restarts
+for code in `echo "$restart_codes" |sed 's/,/ /g'`; do
+case $code in
+*[!0-9]*)
+# non-number - treat it as signal name
+signo=`signumber $code` || exit 1
+code=$((128 + $signo)) # exit code of process terminated by signal #signo
+;;
+*)
+# already number
+;;
+esac
+restarts[$code]=y
+done
progpid=""
-killexit="137" # = 128 + 9 (exit code of process terminated by SIGKILL)
# make sure to terminate children, when we exit.
# needed for e.g. when `slapos node stop ...` kills us.
......@@ -32,8 +68,8 @@ while true; do
status=$?
echo "-> $status"
-# if program terminated not by SIGKILL - exit
-if [ "$status" != "$killexit" ] ; then
+# if program terminated not with expected status - exit
+if [ "${restarts[$status]}" != y ] ; then
echo "exit $status"
exit "$status"
fi
......