Willem Riede authored
I reported earlier that the error handler for ide-scsi exits prematurely if modprobed from rc.sysinit. I put in some debug prints to find the culprit responsible for sending the SIGHUP signal that causes the exit. This is what my log captured:

Jan 1 12:20:13 fallguy kernel: Process 223 [modprobe] starting scsi error handler
Jan 1 12:20:13 fallguy kernel: Wake up parent of scsi_eh_2, pid 224
Jan 1 12:20:13 fallguy kernel: Signals pending for scsi_eh_2: 00000000 00000000
Jan 1 12:20:13 fallguy kernel: Error handler scsi_eh_2 sleeping
Jan 1 12:20:13 fallguy kernel: scsi2 : SCSI host adapter emulation for IDE ATAPI devices
[detected devices skipped]
Jan 1 12:20:14 fallguy kernel: Signal 15 sent from 181 [rc.sysinit] to 182 [getkey]
Jan 1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 22 [init]
Jan 1 12:20:14 fallguy kernel: Signal 18 sent from 22 [init] to 22 [init]
Jan 1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 22 [init]
Jan 1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 24 [initlog]
Jan 1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 78 [khubd]
Jan 1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 224 [scsi_eh_2]
Jan 1 12:20:14 fallguy kernel: Signals pending for scsi_eh_2: 00000001 00000000
Jan 1 12:20:14 fallguy kernel: Error handler scsi_eh_2 exiting

Here is a snapshot of some processes made during rc.sysinit:

  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
100     0     1     0  15   0  1332  420 schedu S    ?          0:05 init
...
040     0    22     1  16   0  1332  388 wait4  S    tty1       0:00 init
000     0    23    22  15   0  4116 1316 wait4  S    tty1       0:00 /bin/bash /
040     0    24    23  16   0  2160 1364 schedu S    tty1       0:00 /sbin/initl
...

Init must have forked to exec bash, which execs rc.sysinit, which then gets re-executed through initlog. The last thing rc.sysinit does before it ends is send a TERM signal from sub-process 181 to getkey (process 182) -- the 'Signal 15 ...' line above.

As the forked init (process 22) exits, it sends a flurry of signals to all surviving processes created from it. That looks like standard "if I am to die I need to take all my offspring down with me" behavior -- do you agree?

Since we want error handlers to survive, IMHO that means the choice of SIGHUP as the signal for error handler exit is unfortunate. The source of scsi_error suggests SIGPWR might be a worthy alternative, and I think that is true. From inspecting the init source, I see that init is not capable of sending SIGPWR. SIGPWR should never be sent by dying processes (its sole use should be from a power daemon _to_ init to shut the system down when the juice is running out).

So I suggest the following changes to hosts.c and scsi_error.c:
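
The reasoning above can be illustrated outside the kernel with a small userspace sketch. This is not the change to hosts.c and scsi_error.c, and it makes no claim about how the kernel error handler is written. The names worker_wait_for_sigpwr and worker_survives_hup_until_pwr are hypothetical. It assumes Linux, where SIGPWR is defined. A child process stands in for the error handler: it treats only SIGPWR as its exit request, so a SIGHUP from a dying ancestor leaves it running.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for an error handler: sleeps until SIGPWR arrives.
 * Every other blockable signal, including SIGHUP, stays blocked. */
static void worker_wait_for_sigpwr(void)
{
    sigset_t all, exit_sigs;
    int sig = 0;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, NULL);

    sigemptyset(&exit_sigs);
    sigaddset(&exit_sigs, SIGPWR);

    /* sigwait() returns only for a signal in exit_sigs. */
    if (sigwait(&exit_sigs, &sig) != 0)
        _exit(2);
    _exit(sig == SIGPWR ? 0 : 1);
}

/* Returns 1 if the worker survives SIGHUP and exits cleanly on SIGPWR,
 * 0 if SIGHUP ended it, -1 on an unexpected error. */
int worker_survives_hup_until_pwr(void)
{
    sigset_t all, old;
    struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    int status = 0;
    pid_t pid;

    /* Block signals before fork() so the child never runs unprotected. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    pid = fork();
    if (pid == 0)
        worker_wait_for_sigpwr();          /* does not return */
    sigprocmask(SIG_SETMASK, &old, NULL);  /* parent restores its mask */
    if (pid < 0)
        return -1;

    /* Mimic the dying init: send SIGHUP, then check the worker is alive. */
    kill(pid, SIGHUP);
    nanosleep(&delay, NULL);
    if (waitpid(pid, &status, WNOHANG) != 0)
        return 0;

    /* Only the dedicated signal ends the worker. */
    kill(pid, SIGPWR);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Called from a test program on Linux, worker_survives_hup_until_pwr() is expected to return 1: the SIGHUP stays pending and blocked in the child, and only the SIGPWR wakes it. Signals are blocked before fork() so that the child inherits the mask and there is no window in which an early SIGHUP could kill it.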
e18106d2