A few days ago I had an email from a longtime customer telling me that she had been trying to get a new Dell configured, but it just couldn't seem to see the network. Her email said the machine was getting an IP address but wasn't accessing network resources. She said she'd had enough for the day, but would call me in the morning. I assumed this would be some variation of a Windows authentication problem and didn't give it any more thought.
The next day's email brought a new message: mysteriously the machine had fixed itself overnight. Everything was fine, have a nice day, and so on. OK, great: a lot of problems do fix themselves, though I thought this was a little bit odd considering the symptoms described. Oh well, I had plenty of other work to do, including a programming project that has been incomplete for several weeks. I had just logged into that customer's machine to get reoriented with that code when the phone rang.
It was the customer with the misbehaving Dell. A new problem, she said: all the remote desktop clients are down. She had rebooted the Terminal Server, but clients still could not connect. I confirmed that by trying to connect from my Mac - no dice. I could ssh into their Linux box though, so it wasn't their internet connection. Time to dig deeper.
After logging into Linux, I tried pinging the Terminal Server. No response. Unreachable. Dead. I told my customer that. "But it thinks everything is fine", she said, "except that no packets are flowing."
OK, maybe we have a bad switch port. I had her unplug the cable. The server immediately noticed that the cable was unplugged, but plugging it into a different port didn't help. I had her try a different switch entirely; no change. Hmmm.
Well, this server has another NIC that we don't currently use. I had her unconfigure the current card and transfer the IP address to the other card. We switched the cable, but no change. Unreachable. Dead. Maybe a bad cable? Unlikely since it noticed plug/unplug events, but worth a try. I was about to suggest that when the customer said "It must have something to do with the new Dell."
Honestly, that seemed unlikely unless she had tried to configure it with the same IP address or the same network name. But I knew she hadn't. I asked her if that machine was running. She said no, but it was still plugged into the network. What the heck, unplug it, I said, not expecting any change from that action. To my surprise, the moment she unplugged it, the server responded to a ping. Plug it back in, no response. Unplug, all was fine. I tried the Remote Desktop; it came right up. Consistently repeatable, no question about it: the problem was this new Dell - the machine that wasn't even running! She unplugged it again because users needed to get work done. Since our job was still to solve this problem, we booted the new Dell (leaving it unplugged from the network) to see what we could see.
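A plug/unplug test like this one is easier to follow if something reports the moment reachability changes, rather than re-running ping by hand. What follows is only a minimal sketch of that idea, not what I actually ran: the names `ping_once`, `state_changes` and `watch` are made up for illustration, and the `ping` flags assume the Linux iputils version (`-c` for count, `-W` for timeout in seconds).

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo is answered (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def state_changes(samples):
    """Yield (index, reachable) whenever a sample differs from the previous one."""
    previous = None
    for index, reachable in enumerate(samples):
        if reachable != previous:
            yield index, reachable
            previous = reachable


def watch(host, interval_s=1.0):
    """Print a timestamped line each time the host goes up or down. Runs until interrupted."""
    previous = None
    while True:
        reachable = ping_once(host)
        if reachable != previous:
            stamp = time.strftime("%H:%M:%S")
            print(stamp, host, "UP" if reachable else "DOWN", flush=True)
            previous = reachable
        time.sleep(interval_s)
```

Calling `watch()` with the server's address from the Linux box and then having someone plug and unplug the suspect machine would give a timestamped record of each change, which makes "consistently repeatable" easy to demonstrate.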
My immediate suspicion was that this card had a one-in-a-million duplicate MAC address. Hardware addresses are supposed to be unique, but screwups can happen, so I wanted to know what that new machine thought its NIC hardware address was. I knew what the Terminal Server's address was from "arp -an", so I just needed to get it from the new box.
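Comparing hardware addresses by eye works for two machines, but a script can scan the whole ARP cache. The sketch below is an illustration under assumptions, not a tool I used: it assumes "arp -an" prints lines containing "(ip) at mac" as Linux and BSD-style systems do, and the function names are hypothetical. It reports any MAC address that appears against more than one IP address. That can be perfectly legitimate (IP aliases, for example), so treat the result as a list of things to look at, not a verdict. A machine that is powered off or unplugged won't be in the cache at all.

```python
import re
import subprocess
from collections import defaultdict

# Matches e.g. "? (192.168.1.10) at 00:11:22:33:44:55 [ether] on eth0".
# Incomplete entries ("at <incomplete>") do not match and are skipped.
ARP_LINE = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\) at "
    r"([0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})"
)


def normalize_mac(mac):
    """Lower-case and zero-pad each octet so 0:1a:... equals 00:1A:..."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def macs_with_multiple_ips(arp_output):
    """Map each MAC seen with more than one IP to the sorted list of those IPs."""
    ips_by_mac = defaultdict(set)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match:
            ip, mac = match.groups()
            ips_by_mac[normalize_mac(mac)].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


def read_arp_table():
    """Return the text printed by 'arp -an' (the command must be on the PATH)."""
    return subprocess.run(
        ["arp", "-an"], capture_output=True, text=True, check=True
    ).stdout
```

Feeding the result of `read_arp_table()` to `macs_with_multiple_ips()` gives a dictionary that is empty when every cached MAC address maps to a single IP address.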
Stupid #$!@% Windows! If the cable is unplugged, you can't get XP to give you the status of the connection. Device Manager doesn't bother to show you the hardware address at all, so that's unhelpful. Fortunately you can still get to a command line, and "ipconfig /all" will give you the physical address. Idiots.
Anyway, that wasn't the problem. This machine really does have a unique and proper MAC address. So that's not it. I suppose it could be putting out incorrect voltage on the line, and that could be leaking over to disrupt the server if its wiring is close by, but experimenting with that by moving the machine would just have interrupted more work, so we decided to let it be. I told her she could go buy another NIC, but that this could be a motherboard problem that might manifest itself somewhere else later, so my best advice was to get Dell to replace it. She agreed, though the employee who had been suffering with an old Windows 95 machine for years wasn't happy to see her new toy disappear so suddenly. But her old machine regained its place, and the network remained happy.
When all else fails, start unplugging. After last week's bad storm here in the Northeast, I had a similar case where a server wouldn't come up because it insisted that it saw a duplicate name on the network. The customer checked every machine; there were no conflicts. I then had her unplug all network cables except those of the three servers. Rebooting the troubled server still gave the same message. We unplugged the other two servers. No change. In desperation, I had her unplug the router also. Still no change. At this point, there was nothing connected to the switch but this server. I had her move the cable to another switch, but the reboot still complained. Obviously there was something wrong with the card: it was seeing itself! We swapped in a new NIC, and the problem went away.
Bad NICs can do very strange things.
© 2011-06-29 Anthony Lawrence