A "race condition" is a situation where you have two things that really need to happen in sequence, but nothing guarantees that they do. Usually, if the programmer didn't provide any method of enforcing the sequence, it's because he never noticed the problem under "ordinary" circumstances. It then becomes a "bug" where something unexpected happens sometimes, but not always. It comes up a lot in system level programming, but also well above that.
A good example of that came up with regard to "How can I make a device that will print to a network printer?". That article suggests a method of creating a "device" that transfers data to a network printer, which is useful for ancient software that can't work with spooled printers.
The solution presented works, but does have a potential race condition. I'd never noticed it, but this comment pointed it out:
I needed to migrate from serial printers to network printers using (D-Link and Netgear) print server devices. Unlike a Unix system, those devices have no understanding of printer capabilities. I therefore needed to retain the interface mechanisms used with local printers. I fell upon your suggestion of diverting printing to a remote printer using a named pipe and a perpetual script with enthusiasm, and at first it appeared to work brilliantly. However, disappointment set in when garbled print came out of busy printers. This is, I think, caused by a rapid succession of lp jobs for the same printer that write concurrently to the same named pipe. It seems that the lp system relies on the ability of a process to lock a tty for exclusive writing, and to wait if it finds a locked tty; I am not aware of a similar feature for named pipes. The remedy seems to be to amend the interface scripts to invoke the remote print instead, as I believe you and others have suggested elsewhere, and abandon the neat but, it seems, flawed approach of using a named pipe.
Actually, print jobs would never write concurrently to the same pipe; they would always be sequential, so that's not where the problem lies. It's the process that is reading the pipe that has the problem.
Here's what happens: the reader process is reading data from the pipe. When it hits EOF, it takes the data it has collected and sends it to the print server. Normally that's all very quick, but if the jobs are large and the system is busy, a new print job may start writing to the pipe before the reader is quite through. The new job's data gets mixed in with the old, and the printed output is garbled.
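To make the reader's behavior concrete, here's a minimal sketch of that read-to-EOF cycle. The pipe path, the "forwarding" message, and the single simulated job are all illustrative assumptions; the real reader runs in a perpetual loop and pipes what it collects to the print server.

```shell
#!/bin/sh
# Illustrative demo of the reader side of the named-pipe scheme.
PIPE=/tmp/demo_pipe.$$
mkfifo "$PIPE"

# Stand-in for an lp interface script writing one job to the pipe.
printf 'job one\n' > "$PIPE" &

# The reader: cat blocks until a writer opens the pipe, then reads
# until EOF. In the real setup this is where the race lives: a busy
# system can let a second job open the pipe and start writing before
# this collect-and-forward cycle has finished.
DATA=$(cat "$PIPE")
echo "forwarding: $DATA"

rm -f "$PIPE"
```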
A simple solution is to add a sleep at the end of the writer interface script. If the sleep is longer than the time it takes to transfer data to the print server, that solves the problem, but of course the race condition still exists: you've simply handicapped one of the runners. However, that may be "good enough" for most circumstances.
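Here's what that crude fix might look like at the tail of the writer's interface script. The pipe path, the stand-in reader, and the one-second sleep are assumptions for the demo; a real script would tune the sleep to how long the transfer to the print server actually takes.

```shell
#!/bin/sh
# Illustrative: writer-side sleep after handing a job to the pipe.
PIPE=/tmp/demo_pipe.$$
mkfifo "$PIPE"

# Stand-in for the perpetual reader: drain one job from the pipe.
cat "$PIPE" > /tmp/demo_got.$$ &

# The interface script writes the job to the pipe...
printf 'print job data\n' > "$PIPE"

# ...then sleeps, so the reader can finish forwarding to the print
# server before lp starts the next job. The race still exists; the
# sleep just handicaps one of the runners.
sleep 1

GOT=$(cat /tmp/demo_got.$$)
echo "$GOT"
rm -f "$PIPE" /tmp/demo_got.$$
```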
If it isn't, you need to institute some form of cooperative locking. The writer would not send new data until the reader had cleared the lock. That's very simple to do in Perl. In a shell script it's not as neat, but it can still be accomplished with "mkdir": creating a directory is atomic, so if you can't create it (because it already exists), the other process holds the lock. That scheme does have the problem of stale locks (the leftover directory) after a crash, so I'd rather use Perl's "flock".
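A shell sketch of the mkdir locking idea might look like this. The lock path and messages are illustrative; in real use every writer script would share one fixed lock path rather than the per-process name used here to keep the demo self-contained.

```shell
#!/bin/sh
# Cooperative lock using mkdir, which is atomic: only one process
# can succeed in creating the directory.
LOCK=/tmp/lp_lock.$$   # real scripts would use one fixed, shared path

acquire() {
    until mkdir "$LOCK" 2>/dev/null; do
        sleep 1        # another process holds the lock; wait our turn
    done
}

release() {
    rmdir "$LOCK"
}

acquire
echo "have lock, sending job"
# ... write the job to the named pipe here ...
release
echo "lock released"
```

Note the weakness the article mentions: if a script crashes between acquire and release, the directory is left behind and everyone waits forever on a stale lock, which is why flock-style kernel locks (released automatically when the process exits) are preferable.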
Got something to add? Send me email.
More Articles by Anthony Lawrence © 2011-03-18 Anthony Lawrence