Threads are often described as lightweight processes. That's a useful description, but unless you understand ordinary Unix processes, it doesn't really tell you much.
The traditional Unix model was fork and exec, which is "expensive" in terms of CPU time. However (and this is the part that often gets left out), that doesn't mean that fork and exec is "bad" and threads are "good": it depends upon what you need to accomplish. Even at that, there are other considerations, as we shall see.
The idea of threads comes from observation of typical programming tasks. Suppose, for example, that you have a web server. Obviously, a web server needs to be able to handle multiple requests at the same time. We'll conveniently ignore the fact that unless there are multiple CPUs, multiple hard drives and multiple network cards, there really is no such possibility, and work with the appearance of simultaneous access as though it really were. There are a number of ways the programmer can approach that problem. First, the web server could be one program that does the timesharing itself, dividing up its time between whatever number of requests it gets, interleaving responses so as to give everyone something. Such servers are clumsy to write, and generally don't scale well unless they are written very, very cleverly. Another way is to skip that hard work and just start up some number of standalone servers, either ahead of time or as some network daemon (like inetd) requests them. That lets the OS take care of the timesharing issues. Or, you can have one server that creates copies of itself as needed, which is very similar. The copies would come from fork (followed by exec when a different program has to be run); such a server is often called a "forking server" for that reason.
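To make the "forking server" idea concrete, here is a minimal sketch in Python, whose os.fork is a thin wrapper over the Unix call. It is not a real web server: the names handle_request and serve_by_forking are made up for illustration, the "requests" are just strings, and each child hands its reply back through a pipe instead of a network socket.

```python
import os


def handle_request(request):
    # Hypothetical stand-in for building a page; it reports which process did the work.
    return f"response to {request} from pid {os.getpid()}"


def serve_by_forking(requests):
    """Fork one copy of the 'server' per request and collect the replies."""
    pending = []
    for request in requests:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: a full copy of the parent that handles exactly one request.
            os.close(read_fd)
            try:
                os.write(write_fd, handle_request(request).encode())
            finally:
                os._exit(0)  # leave without running the parent's cleanup code
        # Parent: keep only the read end and go on to the next request.
        os.close(write_fd)
        pending.append((pid, read_fd))

    replies = []
    for pid, read_fd in pending:
        with os.fdopen(read_fd, "rb") as pipe:
            replies.append(pipe.read().decode())
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return replies
```

Calling serve_by_forking(["/index.html", "/about.html"]) should give back two replies, each stamped with a different process ID. The OS scheduler, not our code, decides when each child runs.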
There may be some inefficiencies there. It can be useful to fork off something that handles the output you are returning to the requesting user, but it hardly makes sense to fire off another whole web server just to handle that part. What you would like is just to have a small section of code off running by itself, under the control of the OS scheduler, without the expense of a whole program. That's what threads give you: you designate parts of your code effectively as standalone processes, except they aren't processes. They are simply separate threads of execution: separately running as far as the OS is concerned, but not separate processes.
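The same toy server can be sketched with threads instead. Again, serve_with_threads and send_page are hypothetical names and the requests are plain strings. In CPython, threading.Thread objects are backed by native OS threads, so the OS does the scheduling, although the interpreter's global lock limits how much Python code actually runs in parallel.

```python
import os
import threading


def serve_with_threads(requests):
    """Handle each request in its own thread inside a single process."""
    replies = {}
    lock = threading.Lock()

    def send_page(request):
        # Only this small piece of code runs separately; no new process is created.
        body = f"response to {request} from pid {os.getpid()}"
        with lock:  # threads share memory, so guard the shared dictionary
            replies[request] = body

    workers = [threading.Thread(target=send_page, args=(r,)) for r in requests]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return replies
```

Every reply carries the same process ID, and the results come back through an ordinary shared dictionary rather than a pipe. That sharing is where the savings come from, and it is also why the lock is there.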
Unix didn't always have threads; in fact they were a fairly late addition. The kernel has to be very different to incorporate this idea, so before there were real threads, there were (and still are) pseudo threads, or user threads. This is simply something that makes it easier for the programmer by allowing the code to be written as though kernel threads were available, but it's all really just managed by a controlling task (you use a threads library with your code) that does the timesharing for you. So within the process's time slice, it does its own slicing and scheduling. It's no different from the very first model we described; it just hides a lot of the nasty details from the programmer.
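The flavor of user threads can be shown with a toy round-robin scheduler built on Python generators. This is only a cooperative sketch under my own assumptions: task and run_round_robin are made-up names, and each task has to give up control voluntarily with yield, whereas a real user-threads library has more machinery for switching between tasks.

```python
from collections import deque


def task(name, steps, log):
    """A 'user thread': does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """The controlling task: slices up the process's own time among the tasks."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # run the task until its next yield
        except StopIteration:
            continue  # task finished, so drop it
        ready.append(current)  # otherwise send it to the back of the queue


if __name__ == "__main__":
    log = []
    run_round_robin([task("A", 2, log), task("B", 1, log)])
    print(log)  # ['A:0', 'B:0', 'A:1']
```

The kernel sees one ordinary process here; all of the interleaving happens inside it.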
Sounds good so far, and it is. A program written for threads can be much more efficient and can scale up much more easily, especially if it uses real kernel threads. Your program automatically gets all the benefits of kernel scheduling without the expense of running lots of big processes. Sounds like there's no downside, so why not just abandon the fork and exec model and go with threads? That's what Windows programmers do almost exclusively. But the downside is that this encourages, even demands, large monolithic programs that try to "do it all".
A different approach is taken if we design for fork and exec. Our code that sends back the requested page could be a standalone program: something that actually can execute by itself. The downside to this is the expense (time) it takes to either fork and exec or otherwise get our data flowing to and from it. The upside is that if we come up with a better version, it's a drop-in replacement. The little program might also be useful to other programs that have similar needs. That is, in fact, the whole idea of Unix utility tools. While most of us are used to calling them from shell scripts, they can also be (and often are) the target of fork/execs in C or other programs.
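Here is a rough sketch of a program using a standalone utility through fork and exec, with pipes carrying the data in both directions. The helper name run_tool is made up, and the sketch assumes the input is small enough to fit in a pipe buffer; a real program would have to interleave its reads and writes for larger data.

```python
import os


def run_tool(argv, data):
    """Fork, exec a standalone utility, feed it data, and return its output."""
    to_child_r, to_child_w = os.pipe()
    from_child_r, from_child_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: wire the pipes to stdin/stdout, then become the utility.
        try:
            os.dup2(to_child_r, 0)
            os.dup2(from_child_w, 1)
            for fd in (to_child_r, to_child_w, from_child_r, from_child_w):
                os.close(fd)
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)
    # Parent: keep only its own ends of the two pipes.
    os.close(to_child_r)
    os.close(from_child_w)
    with os.fdopen(to_child_w, "wb") as pipe:
        pipe.write(data)  # assumes small input; closing the pipe signals end of input
    with os.fdopen(from_child_r, "rb") as pipe:
        output = pipe.read()
    os.waitpid(pid, 0)
    return output
```

For example, run_tool(["tr", "a-z", "A-Z"], b"hello\n") should hand back b"HELLO\n". Swapping in a better utility means changing only the argument list, which is the drop-in benefit described above. In everyday Python you would normally reach for the subprocess module, which wraps these same steps.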
So neither is "better". If you don't need speed, the fork/exec model has real benefits, and Unix does work hard at making this stuff work as efficiently as possible. Otherwise, it's great to have threads available. Use what's appropriate for the task at hand.
See Google Chrome for a real-life example of threads vs. fork/exec.
© 2010-10-27 Tony Lawrence