Wed Oct 6 12:38:29 2004 Switches Search Keys: hardware|problems
Yesterday I had an emergency call from a customer whose network was down. As I was familiar with the physical setup (old equipment, badly made cat-5 cables), I suspected cable failure at first, and yes, when I arrived on site and started testing, there were some cables that wouldn't pass a "wiggle test" - in other words, they probably worked most of the time, but if you put a little wiggle on the connector, they were apt to fail.
However, that wasn't the real problem. One of the users gave me my first clue when he said that around 3:00 PM he had been copying files from one server to another. He heard a longish "beep" and everything stopped. A long beep makes me think UPS, and that makes me suspicious of power surges, which makes me start thinking about blown switches.
Other evidence supported this: a few people were still able to connect to some servers, though only intermittently. One minute they'd check mail and Outlook would complain that it couldn't reach the server; a few minutes later a bunch of mail would come through. Internet access was just as flaky.
Testing is not too difficult. First, I used a cross-over cable to connect two machines together. I chose two Linux boxes, and verified that they could ping each other and transmit some files without error. The purpose of this was to be sure no damage had been done to their NICs. Next, I took everything off the nearest switch and plugged these two servers into that. Moving wires from port to port, I quickly found that most of the ports were dead or very close to it. The only good ports were those that had not had any wires plugged into them.
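The port-by-port check can be partly scripted. What follows is only a rough sketch, not what I ran on site: it assumes a Linux box with the usual iputils `ping` (the `-c` and `-W` options and the "packets transmitted, ... received" summary line), and the function names, the 20-packet count and the 5% loss threshold for calling a port "weak" are all made up for illustration. One machine stays on a known-good port, the other gets moved from port to port, and you run the check after each move.

```python
import re
import subprocess


def parse_ping_summary(output):
    """Return (sent, received) from the summary line of Linux ping output."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", output
    )
    if match is None:
        # ping printed no summary at all (for example, the interface is down)
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2))


def classify_port(sent, received, weak_threshold=0.05):
    """Label a port 'dead', 'weak' or 'good' from ping counts.

    weak_threshold is an arbitrary illustrative cutoff, not a standard.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if received == 0:
        return "dead"
    loss = (sent - received) / sent
    return "weak" if loss > weak_threshold else "good"


def check_port(peer_ip, count=20):
    """Ping the peer through the port under test and classify the result."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", "1", peer_ip],
        capture_output=True,
        text=True,
    )
    sent, received = parse_ping_summary(result.stdout)
    return classify_port(sent, received)
```

After moving the cable to the next port, something like `check_port("192.168.1.20")` (the address is a placeholder) gives a quick verdict. A ping test alone can miss a port that only fails under load, so copying a large file across is still worth doing on any port that comes back "good".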
That switch was not directly connected to the UPS, but it was daisy chained to another switch that was. Like so many places, these folks had outgrown their switch capacity and had daisy chained rather than buying larger units. Physically, a 24 port switch in another part of the building fed an 8 port (which was the one plugged into the UPS), which then went to the 5 port that I first tested, and to another one in the next office. I repeated my test with the 8 port and found that several ports there were dead or weak.
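With a chain like this it helps to write the layout down before deciding what to test next. The sketch below is only an illustration: the switch labels are made up, the links are just the cable runs described above, and the idea that switches closer to the UPS-powered unit are the most suspect is my assumption, not a rule. It counts cable hops from the 8 port outward.

```python
from collections import deque

# Hypothetical labels for the switches described above; each entry lists
# the switches it is cabled to.
LINKS = {
    "24-port": ["8-port (on UPS)"],
    "8-port (on UPS)": [
        "24-port",
        "5-port (first tested)",
        "5-port (next office)",
    ],
    "5-port (first tested)": ["8-port (on UPS)"],
    "5-port (next office)": ["8-port (on UPS)"],
}


def hops_from(start, links):
    """Breadth-first search: cable hops from start to every reachable switch."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in links[current]:
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops


if __name__ == "__main__":
    distances = hops_from("8-port (on UPS)", LINKS)
    # List the nearest switches first, as a suggested testing order.
    for name, distance in sorted(distances.items(), key=lambda kv: kv[1]):
        print(distance, name)
```

In this small layout everything is one hop from the 8 port, which is exactly why a problem there could reach the whole chain.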
It was obvious now that we'd need some hardware. We sent someone out and they soon returned with a 16 port and a 5 port 10/100 switch. At this point I didn't know if the damage might extend to the 24 port across the building, but we couldn't buy that off the shelf anyway and I knew I could get at least part of the building running with the new switches. There were actually two 24 ports chained together, but they were too far away for me to conveniently test, so I just installed the new equipment and crossed my fingers. I still needed the 5 port just because of physical location, but we did eliminate one extra 5 port switch.
Fortunately, everything came up. Well, almost. One machine in a remote location wasn't seeing the network. I took a quick look and tried finding where its wire went to but lost it under heavy desks and file cabinets. It was getting late, people were leaving to go home, so we decided to track that one down the next day - it apparently wasn't a critical machine anyway. It could be a weak port, bad wiring, who knows? The customer will find it.
My suspicion is that the cheap UPS blew ports on two switches. I've seen UPSes do things like that, and I recommended that it be replaced ASAP. We also talked about network surge protectors, though I'm not sure those would have stopped this.
More Articles by Tony Lawrence © 2012-06-17 Tony Lawrence