There is an IP address conflict with another system on the network.
I only needed to be run over by four or five cluetrains before realizing what was causing this message on a Windows 2003 server. Follow along and see when you solve it.
Here's what was going on:
A Windows 2003 server would sometimes complain that its IP was in use on the network. This would often happen after a power failure. Yes, they had a UPS, but it died and they never got around to replacing it, as power failures seem to be very rare in this particular area and the server isn't all that critical. It's a mail and file server; if the power is out, the lights are out and nobody cares too much about mail or files right then. So..
It would not always complain that its IP was in use. That was actually an important clue, so if you are trying to solve this before I tell you what it was, chew on that while I blabber on.
Of course the very first thing we did was shut the server off and go looking for that IP. We found nothing. We looked on the DHCP servers to see if they were perhaps sometimes handing that IP out to someone else; they were not.
Being suspicious of switches, I swapped those out with some spares. No change.
Sometimes we could clear it by disabling the NIC, counting to twenty and re-enabling it. That didn't always work, though.
What did work was disconnecting it from the network, booting it up, counting to the magic twenty once more and then plugging it back in. That became the official procedure for handling this. As I said, power failures are rare, so this was a once or twice a year thing.
That's a good clue also, by the way.
The really big clue was this: if you'd patiently wait an hour or so, the problem would fix itself. That's a BIG clue.
Also, if you happened to shut down the machine midday, it would never see this problem. That's the same clue as the last, if you think about it.
After the fourth or fifth time that this happened, I did catch the clues and was able to cure it in a whopping three or four seconds. Have you guessed it yet?
Actually, it was when I happened to be editing that article for a typo that the clues all came together in my head and I knew what was freaking out that Windows server.
Stop reading now if you want to think about this more before I tell you.
I'll add a little white space so you don't just slide on into it against your will..
It was manually set ARP entries, hardcoded into an old SCO box on the network.
At one time, there was a network print server that used the IP address we were looking for. The printer was long gone, but the manual setting was still in the SCO startup scripts.
So.. if the SCO machine rebooted before the Windows server, it had an ARP entry for that IP, and when the slower Windows box did its "who has?" broadcast, the SCO box would helpfully answer on behalf of that non-existent printer.
If the SCO machine did not restart for whatever reason, the Windows machine saw no response to its inquiry and happily used the IP we wanted it to use.
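That boot-order race can be sketched as a toy model in Python. This is only an illustration, not how either operating system implements ARP: the `Host` class, `probe_conflict`, `simulate` and every address in it are made up, and it assumes the leftover SCO entry was one the box would answer for on the printer's behalf.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ip: str
    mac: str
    up: bool = False
    # Assumption: manual ARP entries this host will answer for (ip -> mac).
    published: dict = field(default_factory=dict)

    def answers_for(self, ip):
        """Return the MAC this host would reply with for `ip`, or None."""
        if not self.up:
            return None
        if ip == self.ip:
            return self.mac
        return self.published.get(ip)


def probe_conflict(booting, network):
    """Model the "who has my IP?" check a host makes while starting up."""
    for other in network:
        if other is booting:
            continue
        mac = other.answers_for(booting.ip)
        if mac is not None and mac != booting.mac:
            return True  # somebody else claims the address
    return False


def simulate(sco_up_first):
    """Return True if the Windows box would see an IP conflict."""
    # Hypothetical addresses; the stale entry points at the long-gone printer.
    sco = Host("sco", "192.0.2.10", "00:00:5e:00:53:01",
               published={"192.0.2.20": "00:00:5e:00:53:99"})
    win = Host("win2003", "192.0.2.20", "00:00:5e:00:53:02")
    sco.up = sco_up_first
    return probe_conflict(win, [sco, win])


if __name__ == "__main__":
    print("SCO up first:", simulate(True))
    print("SCO still down:", simulate(False))
```

The only thing that changes between the two runs is whether the SCO box is already up when the Windows box asks, which is why the complaint came and went.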
ARP caches are supposed to age out stale entries: if the non-existent host stays non-existent long enough (which it obviously would in this case), the machine holding the cache should forget about it.
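Here is a minimal sketch of that aging behavior, again as a toy model in Python. The `ArpCache` class is hypothetical, and the one-hour lifetime is only a guess taken from the "wait an hour or so" clue; real timeouts vary by system.

```python
class ArpCache:
    """Toy ARP cache whose entries expire when they are not refreshed."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # ip -> (mac, time the entry was last confirmed)

    def learn(self, ip, mac, now):
        """Record or refresh an entry at time `now` (in seconds)."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip, now):
        """Return the cached MAC, or None if unknown or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, seen = entry
        if now - seen >= self.ttl:
            # Nobody confirmed this host in time, so forget it.
            del self._entries[ip]
            return None
        return mac


if __name__ == "__main__":
    cache = ArpCache(ttl_seconds=3600)  # assumed lifetime, not a measured one
    cache.learn("192.0.2.20", "00:00:5e:00:53:99", now=0)
    print(cache.lookup("192.0.2.20", now=600))
    print(cache.lookup("192.0.2.20", now=3600))
```

Since the printer never shows up to refresh its entry, the entry eventually falls out of the cache and nothing is left to contradict the Windows server.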
You can see now how all the clues made perfect sense. I logged into the SCO box, found the rc2.d script that caused this problem, deleted it and that was that. Problem solved.
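If you need to hunt for something similar, a small script can do the looking. The sketch below is not what I ran; it assumes the entries are set with an `arp -s` command somewhere in a directory of startup scripts, and the function name `find_static_arp_lines` is made up for the example.

```python
import re
from pathlib import Path

# Matches an "arp -s" invocation, with or without a leading path such as /etc/arp.
ARP_SET = re.compile(r"\barp\s+-s\b")


def find_static_arp_lines(directory):
    """Return (file name, line number, line) for each uncommented "arp -s" line."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line.lstrip().startswith("#"):
                continue  # ignore commented-out commands
            if ARP_SET.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits
```

Point it at wherever your system keeps its startup scripts (for example `find_static_arp_lines("/etc/rc2.d")`, if that is where yours live) and check whether each IP it turns up still belongs to a machine that exists.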
© 2011-11-14 Anthony Lawrence