In /proc/sys/kernel you'll find the file "panic" and, starting with 2.6 kernels, "panic_on_oops".
The "panic" file controls whether the kernel will attempt to reboot after a panic. If "panic" is zero, it will just sit forever waiting for you to do something. Obviously that's not good for an unattended machine, a machine that is difficult to get to, or perhaps a machine with no monitor. In those cases, you probably want to set "panic" to some non-zero value. For example, setting it to "30" means the kernel will reboot 30 seconds after a panic. Assuming the problem was transitory, you are back up and running. If not, well, when you eventually get in front of the thing, you'll be able to see the panic messages for 30 seconds before each reboot. Remember that the value does not survive a reboot, so you need to write it again on each boot, either from a boot script or with a "kernel.panic = 30" line in /etc/sysctl.conf.
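Here is a minimal sketch of reading and setting that value from Python, assuming it runs as root on a Linux box. The helper names read_panic_timeout and set_panic_timeout are made up for this example, and the path is a parameter so you can try the functions against an ordinary file before pointing them at /proc.

```python
from pathlib import Path

PANIC_FILE = "/proc/sys/kernel/panic"


def read_panic_timeout(path=PANIC_FILE):
    """Return the current value: seconds to wait before rebooting, 0 = wait forever."""
    return int(Path(path).read_text().strip())


def set_panic_timeout(seconds, path=PANIC_FILE):
    """Write a new timeout. Writing to the real /proc file needs root."""
    if seconds < 0:
        # Negative values are deliberately left out of this sketch.
        raise ValueError("this sketch only handles zero or positive timeouts")
    Path(path).write_text(f"{seconds}\n")
```

Calling set_panic_timeout(30) from something that runs at every boot gives you the 30 second example described above.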
It might be interesting to write a little script that checks the time it last wrote to "panic" and increases the value if that was recent, thereby increasing the time between reboots in the event the problem does not go away.
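One way that script might look, as a rough sketch only. Everything specific here is an assumption of mine: the state file /var/lib/panic-backoff is a hypothetical location, "recent" is taken to mean within ten minutes (plus the time the box sat at the panic screen), and doubling up to an hour is just one possible way to increment.

```python
import time
from pathlib import Path

BASE_TIMEOUT = 30     # seconds: the value used when there is no recent trouble
MAX_TIMEOUT = 3600    # never make the box wait longer than an hour
RECENT_WINDOW = 600   # an earlier write within 10 minutes counts as "recent"


def next_timeout(previous, age, base=BASE_TIMEOUT,
                 ceiling=MAX_TIMEOUT, window=RECENT_WINDOW):
    """Choose the next timeout from the previous value and its age in seconds."""
    # The machine sat at the panic screen for `previous` seconds before it
    # rebooted, so allow for that when deciding whether the write was recent.
    if previous is None or age > previous + window:
        return base
    return min(max(previous * 2, base), ceiling)


def update_panic(panic_file="/proc/sys/kernel/panic",
                 state_file="/var/lib/panic-backoff",  # hypothetical path
                 now=None):
    """Run once per boot: pick a timeout, write it, and remember it."""
    now = time.time() if now is None else now
    state = Path(state_file)
    previous, age = None, 0.0
    if state.exists():
        previous = int(state.read_text().strip())
        age = now - state.stat().st_mtime
    timeout = next_timeout(previous, age)
    Path(panic_file).write_text(f"{timeout}\n")
    state.write_text(f"{timeout}\n")  # rewriting also refreshes the mtime
    return timeout
```

Note that this cannot tell a panic reboot from an ordinary one, so a couple of deliberate reboots in quick succession would also bump the value.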
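The "panic_on_oops" file is normally 0, which means the kernel tries to keep running after an oops. If it is set to 1, the kernel panics on an oops instead; 2.6 kernels delay a few seconds first, thereby hopefully giving klogd a chance to write out what it knows about the problem. If "panic" is also non-zero, the machine then reboots.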
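To see how the two files combine on a given machine, something like the following could be used. It is only a sketch: describe and read_settings are names I made up, and the wording of the summary is mine, not the kernel's.

```python
from pathlib import Path


def describe(panic, panic_on_oops):
    """Summarise what the two settings mean together."""
    on_oops = "panic" if panic_on_oops else "try to keep running"
    if panic > 0:
        on_panic = f"reboot after {panic} seconds"
    elif panic == 0:
        on_panic = "sit and wait"
    else:
        on_panic = "not covered by this sketch"  # negative values left out
    return f"on oops: {on_oops}; on panic: {on_panic}"


def read_settings(directory="/proc/sys/kernel"):
    """Read both values as integers from the given directory."""
    d = Path(directory)
    return tuple(int((d / name).read_text().strip())
                 for name in ("panic", "panic_on_oops"))
```

On a live system, print(describe(*read_settings())) reports the current combination.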
© 2009-11-07 Tony Lawrence