I picked up a new customer recently. It was just a tape drive installation, and the machine was soon back in service.
But the next day I got a phone call from the woman who uses the machine. "It says 'Disk boot failure, Insert system disk and press ENTER'."
Sheesh. Just what every computer fixer wants to hear. You touched my machine, and you broke it. Oh boy.
This woman was honest though: "It's my fault, I forgot to tell you. It's been doing that for a while. I was just so upset about the tape drive stuff I totally forgot."
Hmm. That's better, but I didn't see anything like that when I was working on the tape drive. I said so. "It doesn't happen always", she explained. OK, so she can bring it back and I'll look more closely.
Sure enough, when she brought it back, it failed to boot. I reset it and it booted fine. Did that five more times; it failed on the fifth.
All righty, there is an intermittent problem here. Maybe the hard drive, but I looked in the syslog and there was nothing there. So I put the machine under heavy load: reading and writing the tape drive, continuous "ls -lR /" loops running, copying big files to /tmp in other loops, all with random sleeps. Sure enough, twenty minutes in, the disk stopped responding. Still nothing in the logs, but the machine was locked up.
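For anyone who wants to try the same kind of load, here is a rough sketch of those loops. It is not the script I actually ran: the function names, the defaults, and the time limit are made up for illustration, `date +%s` is assumed to be available, and the tape drive part is left out because the device name depends on your system.

```shell
#!/bin/sh
# Rough disk stress sketch. All names and defaults here are illustrative
# assumptions, not the original commands used on the customer's machine.

# Sleep a random whole number of seconds between 0 and $1 (default 5).
random_sleep() {
    max=${1:-5}
    sleep "$(awk -v max="$max" 'BEGIN { srand(); print int(rand() * (max + 1)) }')"
}

# walk_loop DIR END_EPOCH MAX_SLEEP
# Recursively list DIR over and over until the clock passes END_EPOCH.
walk_loop() {
    while [ "$(date +%s)" -lt "$2" ]; do
        ls -lR "$1" > /dev/null 2>&1
        random_sleep "$3"
    done
}

# copy_loop DIR SIZE_MB END_EPOCH MAX_SLEEP
# Write a big file into DIR, copy it, delete both, and repeat.
copy_loop() {
    while [ "$(date +%s)" -lt "$3" ]; do
        dd if=/dev/zero of="$1/big.$$" bs=1048576 count="$2" 2> /dev/null
        cp "$1/big.$$" "$1/big.$$.copy"
        rm -f "$1/big.$$" "$1/big.$$.copy"
        random_sleep "$4"
    done
}

# run_stress SECONDS [WALK_DIR] [SCRATCH_DIR] [SIZE_MB] [MAX_SLEEP]
# Run both loops in the background for SECONDS, then wait for them.
run_stress() {
    end=$(( $(date +%s) + $1 ))
    walk_loop "${2:-/}" "$end" "${5:-5}" &
    copy_loop "${3:-/tmp}" "${4:-100}" "$end" "${5:-5}" &
    wait
    echo "stress run finished"
}
```

Something like `run_stress 1200` would give roughly twenty minutes of load with 100 MB files in /tmp. Start several copies in separate shells if one pair of loops isn't enough to make the machine sweat.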
My first thought for this is the power supply. There are a few reasons for that: first, there's no direct indication of the disk actually being bad; second, a "bad" hard drive doesn't usually just stop without errors; and third, changing out a power supply is a quick and simple test. Cheap, too, unless you have some strange proprietary thingy. Even then, you can sometimes still run the test even if your test power supply won't actually fit in the machine.
Ah, but I couldn't find a spare power supply. There are one or two around here somewhere, but we still have not completely unpacked. So... let's just take a peek under the hood.
I hadn't really paid a lot of attention to the machine before because the tape drives were already installed and it wasn't unusually dusty inside. But this time I did blow it out with "computer air".
I recently had another conversation with someone who killed a computer dead by blowing it out with "shop air" - air from a garage air compressor. Those compressors usually throw out large quantities of oil along with the air, and are also often much more powerful than you really want. Do not do that. Use the little air cans designed for cameras or computers. Amazingly he eventually got the system working, but I'm not sure I'd ever trust that machine again.
At https://groups.google.com/ .. 63b240e9aeb12d9b someone wanted to try that with a leaf blower..
I also pulled off the cables and the power connectors. The hard drive cable looked new and wasn't crimped or bent. The power connector didn't seem quite tight enough to me, so I used one of the spares when I plugged it back in. I put everything back and buttoned the machine back up.
I spent the next three days trying to make it fail again. I had so many processes running it was swapping. I covered it with a blanket (not blocking the fan) to make it hotter. I put it near an A/C vent to make it colder. I ran it on its side and upright. I power cycled heavy appliances on the same circuit. I could not make it fail again.
I'm going to return it to the owners. If it does fail for them, maybe we'll try a new power supply, but I suspect it was just something loose.
Got something to add? Send me email.
© 2011-05-02 Anthony Lawrence