That's a snippet from email I got yesterday. The owner of the machines is a local doctor, and the procedure has been that the backup tape from the "working" server has been restored to the "spare" server, thus providing safety in the event of failure. Good idea, but the weak part was that they used Travan tape drives. These failed, and a local consultant decided to replace them with DDS4 DAT drives. Excellent idea, but he knew nothing about Unix, so he went looking for help. He's the one who sent the email.
He'd found some nearby college student who professed knowledge, but had instead "broken" the spare machine. Backup was not working. This was a SCO Unix system, so it was easy enough to find me on the web and he was probably pretty happy to see that I was one town away. I wondered if the college kid really knew anything about SCO or was just a Linux type.. either way, he should have been able to do this.
I offered to come look at the working box, but no, he wanted me to fix the "spare" that the student had broken. I hesitated: I'd much rather just go install the tape drive in the untouched machine than stick my fingers in somebody else's mess. But the consultant and the doctor were now worried about losing everything: if their working machine broke, they'd have nothing. So, I reluctantly agreed to look at the spare. The consultant offered to drop it off at my house; I don't normally like to do that either, but I agreed and last night the machine arrived here.
I was amused by the bag of stuff that accompanied it: complete printouts from the online manual for the tape drive, 3 extra Adaptec 2940 SCSI cards, and screwdrivers. Yes, screwdrivers. There was also a book: "Essential Unix Administration, a Beginner's Guide".
I guess he wanted to be sure I'd have what I needed.
The hard drive is IDE, so the SCSI cards were for the tape drive, which was already installed. I took the cover off just to check. Ayup, no termination: straight from the card to the tape, no terminator. He can probably get away with that, but it's not done right. These drives can't provide termination, so we need to do it at the end of the cable. He'll need to fix that.
I booted the machine and noticed that it hung for half a minute at "wdinit". Well, that's easy to get rid of: just add "wd.delay=1" to the end of the "defbootstr" line in /etc/default/boot. But right after the "tape" line showed up during boot, an ominous-looking warning appeared:
WARNING: idistributed: cannot handle interrupts from PCI device
(handler F007282C, device 0/10/0)
I checked with "hw -r pci" and yep, that's the 2940 card. The %adapter line showed it claiming interrupt 255. No wonder idistributed is complaining. But what's the real problem? Bad driver? Nope. Bad card? Nope. Simpler than that. The card just could not acquire an interrupt from the BIOS. I looked at the PNP/PCI configuration in the machine's BIOS and every interrupt was reserved for Legacy/ISA. Obviously someone had gone a little too far with that.. I reset 9-12 to PCI and a reboot eliminated that error message.
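Here's a rough way to picture why the card ended up at 255. This is a deliberately simplified model, not how any particular BIOS really does it: the function name, the candidate list and the "first free wins" rule are all my own assumptions for illustration. The one solid piece is that 255 (0xFF) is the PCI "no interrupt assigned" value.

```python
# Simplified, hypothetical model of BIOS PCI interrupt assignment.
# Real firmware is more involved; this only shows the failure mode.

NO_IRQ = 0xFF  # 255: PCI Interrupt Line value meaning "nothing assigned"


def assign_pci_irq(candidates, reserved_for_isa):
    """Return the first candidate IRQ not reserved for Legacy/ISA, else NO_IRQ."""
    for irq in candidates:
        if irq not in reserved_for_isa:
            return irq
    return NO_IRQ


# Assumed candidate list; the IRQs a slot can really use are board-specific.
pci_candidates = [5, 9, 10, 11, 12]

# Every interrupt reserved for Legacy/ISA: the card gets nothing.
all_reserved = set(range(16))
before = assign_pci_irq(pci_candidates, all_reserved)

# After handing 9-12 back to PCI in the BIOS setup screen.
released = all_reserved - {9, 10, 11, 12}
after = assign_pci_irq(pci_candidates, released)

print(before, after)
```

With everything reserved the model hands back 255, which is what the driver then chokes on; free up a few interrupts and the card gets a real one.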
Next was the tape configuration. It was set to "Generic SCSI-1 / SCSI-2 tape drive", which is wrong. I deleted that tape and started over, this time setting it to "DAT drive (Compressing and non-Compressing)". The kernel relink failed, complaining of problems in the "blad" driver. Blad? We aren't using any blad. I edited /etc/conf/sdevice.d/blad and changed all the Y's to N, and tried the relink again. No problems now, so I rebooted and tested the drive. It worked.
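For anyone who'd rather script that Y-to-N change than do it in vi, here's a minimal sketch. I'm assuming the usual layout for these files: driver name first, a Y/N "configured" flag second, numeric fields after that, and comment lines starting with "*". The sample text is made up for the example, not copied from this machine, and disable_driver is just a name I picked.

```python
import re

# Assumed line layout: <driver name> <Y|N flag> <numeric fields...>
# Lines beginning with "*" are treated as comments and left alone.
_FLAG = re.compile(r"^([^*\s]\S*[ \t]+)Y(?=\s|$)", re.MULTILINE)


def disable_driver(sdevice_text):
    """Flip the configured flag from Y to N on every non-comment line."""
    return _FLAG.sub(r"\1N", sdevice_text)


# Hypothetical sample content, invented for illustration only.
sample = (
    "* made-up sample entries\n"
    "blad\tY\t1\t0\t0\t0\t0\t0\t0\t0\n"
    "blad\tY\t2\t0\t0\t0\t0\t0\t0\t0\n"
)

patched = disable_driver(sample)
print(patched)
```

Keep a copy of the original file before writing anything back, and relink the kernel afterwards just as you would after a manual edit.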
I looked back in /var/adm/messages and noticed that at one point the 2940 had claimed interrupt 5, so obviously the BIOS hadn't been set this way when the student first installed the card. But he would have had troubles because of not choosing DAT anyway, and maybe the "blad" was him trying to get it working - I don't know. He got off course somewhere, but not knowing to choose DAT would have messed him up - SCO's "mkdev tape" can be confusing to the uninitiated. People quit out too soon before they even see the question.
So now the question is: will they let me touch the working machine? We'll see..
© 2011-03-10 Anthony Lawrence