ruby 1.8.5 not handling SIGTERM

I’ve deployed some applications using the nifty runit project because it offers very nice mechanisms for process supervision. I’ve been using it flawlessly in several environments for at least a year… until just the other day when I was sure I had found a problem with it. My investigation of sv term handling with a slow child process is available on the runit mailing list.

The problem I was experiencing was with a Ruby application running under runit that was not honoring SIGTERM. Looking at the runit 1.8.0 source (via `apt-get source runit` on Debian sid), I learned more about how runit’s pieces work together to manage and supervise child processes. After riddling the code with debug output, I traced the behavior to the control() function in sv.c. A snippet of code there was disallowing the writing of another command to a pipe for later processing because the process was already in a particular state. In my case, runit thought the process was already down because I had previously sent it a TERM signal. Unfortunately, my process never handled the original TERM because of delayed registration of the handler, and so it would never receive another from runit.

This repro’d 100% of the time on ruby 1.8.5 with this snippet of code:

puts "doing initial sleep for 10..."
sleep(10)

puts "registering term handler..."
trap("TERM") do
  puts "got term"
  exit
end

while(true) do
  puts "in loop, sleeping for 2..."
  sleep 2
end
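
One way to narrow the window in the script above is to install the handler before any slow startup work, and to have it only record the signal. This is a minimal sketch, not the original application: `$got_term`, `slow_startup`, and `run` are hypothetical names, and on the Ruby 1.8.5 build discussed in this post it may well not help.

```ruby
# Hypothetical sketch: trap TERM first, then do the slow startup.
$got_term = false

# Installed before anything slow happens, so an early TERM is recorded
# instead of arriving while no handler is registered.
trap("TERM") do
  $got_term = true   # just set a flag; the main loop decides when to exit
end

# Stand-in for whatever slow initialization the real application does.
def slow_startup(seconds)
  sleep(seconds)
end

# Runs startup, then loops until a TERM has been seen.
def run(startup_seconds, loop_delay)
  slow_startup(startup_seconds)
  until $got_term
    sleep(loop_delay)
  end
  :terminated
end
```

Calling `run(10, 2)` would mirror the timings of the script above. A TERM sent during the initial sleep is then acted on once startup has finished, rather than being lost.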

Here’s where it gets a little messy…

As I outline in my post to the mailing list, runit does indeed refrain from sending a TERM signal again to the process, but the real offender was ruby v1.8.5. The Google led me to a post on ruby-forum about a signal handling behavior change on an upgrade to ruby v1.8.5. Matsumoto chimes in and indicates that this seemingly related(?) issue is indeed a bug.

I tried the most basic case: a Ruby script consisting of 'sleep 30'. Running this under:
rwoodrum@fs1sea:~/tmp$ ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]

and subsequently sending the process a TERM does nothing. A la this annotated strace:
..... snip .....
..... we received a TERM .....
) = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn() = ? (mask now [])
select(0, NULL, NULL, NULL, {12, 476000}

..... riding out the sleep .....

) = 0 (Timeout)
time(NULL) = 1200560526
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f180d0, [], 0}, 8) = 0
exit_group(0) = ?

I believe at this point that the SIGTERM handler is going to be SIG_DFL. I don’t understand why this doesn’t perk up and terminate the process however, so that is still somewhat unsolved. This doesn’t repro with a more recent version of ruby.
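
One way to see what Ruby itself has recorded for TERM is to use the return value of `trap`, which is the previously registered handler. The helper below is a sketch under that assumption: `current_handler` is a hypothetical name, it briefly swaps the handler while looking, and it reports Ruby’s own bookkeeping rather than the kernel’s signal disposition. (The `sigreturn()` in the trace above suggests that some handler did run inside the process when the TERM arrived, so the disposition may not have been SIG_DFL at that moment.)

```ruby
# Hypothetical helper: report the handler Ruby has recorded for a signal.
def current_handler(signal)
  # trap returns the previous handler: a Proc, or a string such as
  # "DEFAULT" or "IGNORE". It may be nil if Ruby did not set the handler.
  previous = Signal.trap(signal, "DEFAULT")
  # Put the original handler back; skip nil, which trap would treat as "ignore".
  Signal.trap(signal, previous) unless previous.nil?
  previous
end
```

Something like `p current_handler("TERM")` at the top of a script, and again after the `trap` call, would show whether the registration took effect from Ruby’s point of view.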

Lessons learned? The version of Ruby 1.8 in Debian etch (package ruby1.8, version 1.8.5-4etch1) does not appear to handle SIGTERM correctly, even in cases where a signal handler is registered.
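
To check whether a particular interpreter shows the same behavior, the 'sleep 30' base case can be wrapped in a small harness. This is a sketch only: `term_check` and its parameters are hypothetical names, and it relies on `Process.spawn` and `RbConfig.ruby`, which are not available on Ruby 1.8, so the harness itself would run on a newer Ruby with the path to the interpreter under test passed in.

```ruby
require "rbconfig"

# Hypothetical harness: spawn `ruby -e 'sleep N'`, send it TERM after a
# short delay, and report how the child ended and how long that took.
def term_check(ruby = RbConfig.ruby, child_sleep = 30, delay = 1)
  pid = Process.spawn(ruby, "-e", "sleep #{child_sleep}")
  sleep(delay)                       # give the child time to start up
  started = Time.now
  Process.kill("TERM", pid)
  _, status = Process.wait2(pid)     # blocks until the child exits
  {
    :signaled => status.signaled?,   # true if a signal ended the child
    :termsig  => status.termsig,     # signal number, or nil on a normal exit
    :waited   => Time.now - started  # seconds between TERM and exit
  }
end
```

On an interpreter that handles the signal normally I would expect the child to end shortly after the TERM. An affected one should instead ride out the full sleep and exit on its own, as in the strace above.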
