Girish Venkatachalam is a UNIX hacker with more than a decade of
networking and crypto programming experience.
His hobbies include yoga, cycling and cooking, and he runs his own
business. Details here:
The moment you hear the term p2p, the first thing that comes to mind is probably BitTorrent. It is a fantastic file sharing protocol, and in a way it rewrote the way we have always understood file transfers on the Internet. Curious geeks can refer to this detailed specification to understand how the protocol works. It is advanced technology and a brilliant way to solve the age-old problem of scalability and the web traffic overloads that have crashed the best of web servers.
The key concepts used in BitTorrent are SHA1 checksum computations, peer to peer networking (of course), randomization of the downloading process and the concept of Pareto efficiency.
But hang on for a minute. Let us first understand p2p and how it differs from client server protocols like HTTP, FTP and SMTP.
A peer to peer network builds a layer of redundancy and creates interrelationships between the participants of a p2p protocol. In a client server protocol, by contrast, the same server interacts with each client in a separate two-way relationship. It does not share any information with the other clients downloading at the same time.
In other words, if you are downloading a page from Slashdot and 1000 other people are downloading the very same page, the web server at Slashdot has to serve every one of them. The only interaction between the clients is a negative one: they compete for the same server, and every additional client adds to its load. This limits the scalability of web servers in a big way, since one cannot provision for rare instances of heavy load.
This problem dogged the evolution of Internet file sharing mechanisms for a long time, until BitTorrent came along and rewrote the rules of the game. In BitTorrent, the file being downloaded does not come from one single server. Instead, portions of the file are shared, downloaded and uploaded between the participating "peers".
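To put rough numbers on this, here is a small Python sketch of the usual lower bounds on the time needed to get one file to N machines under each model. The function names, the bandwidth figures and the assumption that every peer uploads at the same rate are mine, chosen only for illustration. Real swarms are far less tidy.

```python
def client_server_time(file_bits, n_clients, server_up, client_down):
    """Lower bound (seconds) when one server sends the whole file to every client."""
    # The server must push n_clients copies through its own uplink.
    return max(n_clients * file_bits / server_up, file_bits / client_down)


def p2p_time(file_bits, n_peers, server_up, peer_down, peer_up):
    """Lower bound (seconds) when every peer also uploads at peer_up bits/s."""
    # The total upload capacity grows with the number of peers.
    total_up = server_up + n_peers * peer_up
    return max(
        file_bits / server_up,           # the server must send each bit at least once
        file_bits / peer_down,           # no peer can finish faster than its downlink
        n_peers * file_bits / total_up,  # n_peers copies over the combined uplinks
    )


if __name__ == "__main__":
    FILE_BITS = 8e9     # a 1 GB file (assumed)
    SERVER_UP = 1e8     # 100 Mbit/s server uplink (assumed)
    DOWN = 2e7          # 20 Mbit/s downlink per machine (assumed)
    PEER_UP = 5e6       # 5 Mbit/s uplink per peer (assumed)
    for n in (10, 100, 1000):
        cs = client_server_time(FILE_BITS, n, SERVER_UP, DOWN)
        pp = p2p_time(FILE_BITS, n, SERVER_UP, DOWN, PEER_UP)
        print(f"{n:5d} machines: client server >= {cs:8.0f} s, p2p >= {pp:8.0f} s")
```

The client server bound grows in step with the number of clients, while the p2p bound flattens out because every new peer brings its own upload capacity along.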
What is a BitTorrent peer?
It is a client or a server that has a portion of the file available. It also has to be connected to other peers. In other words, if 40 people are downloading a file at any given point of time, then each of the participants, that is, each of the computers interested in obtaining the file, is also simultaneously uploading portions of the file to other computers.
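A toy simulation may make this easier to picture. The sketch below is not the BitTorrent piece selection algorithm. It is a deliberately simplified model, with names and rules of my own, in which peer 0 starts with the whole file and every other peer fetches one randomly chosen missing piece per round from any peer that already has it.

```python
import random


def simulate_swarm(n_peers, n_pieces, seed=0):
    """Return (rounds, uploads), where uploads[i] counts pieces sent by peer i."""
    rng = random.Random(seed)
    # Peer 0 is the original source; everyone else starts with nothing.
    have = [set(range(n_pieces))] + [set() for _ in range(n_peers - 1)]
    uploads = [0] * n_peers
    rounds = 0
    while any(len(pieces) < n_pieces for pieces in have):
        rounds += 1
        # Only pieces held at the start of the round can be served in it.
        snapshot = [set(pieces) for pieces in have]
        for peer in range(n_peers):
            missing = set(range(n_pieces)) - have[peer]
            if not missing:
                continue
            # Every (source, piece) pair that would give this peer something new.
            offers = [
                (src, piece)
                for src in range(n_peers)
                if src != peer
                for piece in snapshot[src] & missing
            ]
            src, piece = rng.choice(offers)  # random source and random piece
            have[peer].add(piece)
            uploads[src] += 1
    return rounds, uploads


if __name__ == "__main__":
    rounds, uploads = simulate_swarm(n_peers=40, n_pieces=20)
    print("rounds:", rounds)
    print("pieces sent by the original source:", uploads[0])
    print("pieces sent by the other peers:", sum(uploads[1:]))
```

Even in this crude model, most of the uploading ends up being done by machines that are themselves still downloading, which is the whole point of a peer.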
The notion of client and server blurs when we enter the p2p world. Every node acts as a server as well as a client. But there is a significant problem here. It is called network address translation or NAT.
If you wish to know NAT in graphic detail, read the article on NAT in the reference section of this article. I will only give a brief introduction. Due to the shortage of IPv4 addresses on the Internet, many devices (mostly modems and routers) allocate private IP addresses to the machines on a company intranet, or in a house that uses ADSL.
These private IP addresses belong to fixed netblocks and are technically known as RFC1918 addresses. Typically the 192.168.0.0/16, 172.16.0.0/12 and 10.0.0.0/8 networks are used. These addresses are not unique across the Internet. Consequently they are not routable on the public Internet, and on their own they cannot be participants in BitTorrent or any other p2p protocol.
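If you want to check whether an address falls in one of these netblocks, Python's standard ipaddress module makes it a few lines. The helper name is_rfc1918 is my own. I test against the three RFC1918 networks explicitly rather than relying on the module's is_private attribute, which also covers other special ranges such as loopback.

```python
import ipaddress

# The three private netblocks mentioned above.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address lies inside one of the RFC1918 netblocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    for addr in ("192.168.1.10", "172.31.255.255", "172.32.0.1", "10.1.2.3", "8.8.8.8"):
        print(addr, "private" if is_rfc1918(addr) else "public")
```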
Forget p2p. On their own they cannot even participate in any Internet client server protocol. Then how can machines behind NAT access the Internet? They use the globally unique public IP address allocated to the router or modem that does the NAT. Since many machines share a single (or a few) public IP addresses, an ephemeral public port (TCP or UDP) is assigned on the NAT device for communicating with the Internet.
This works without hassle when we are the client, since the client uses ephemeral ports anyway. It does not work when we are the server, because a server has to be reachable on a fixed port that others know in advance. You can emulate a fixed port mapping for a machine behind NAT by using a technique known as port forwarding. This involves configuring the router or modem to redirect requests arriving at a certain public port to a certain port on a certain private IP address.
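The bookkeeping a NAT device does can be sketched as two small tables. This is a simplified model of my own, not the behaviour of any particular router: the class and method names are hypothetical, the addresses come from documentation ranges, and a real NAT also tracks the protocol, the remote endpoint and timeouts.

```python
import itertools


class Nat:
    """Toy NAT: ephemeral mappings for outbound traffic plus static port forwards."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """A client behind the NAT sends a packet: assign or reuse a public port."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            port = next(self._ports)
            while port in self.inbound:  # skip ports taken by a forwarding rule
                port = next(self._ports)
            self.outbound[key] = port
            self.inbound[port] = key
        return (self.public_ip, self.outbound[key])

    def forward_port(self, public_port, private_ip, private_port):
        """Port forwarding: a fixed rule configured by hand on the router."""
        self.inbound[public_port] = (private_ip, private_port)

    def translate_inbound(self, public_port):
        """A packet arrives from the Internet: None means it is dropped."""
        return self.inbound.get(public_port)


if __name__ == "__main__":
    nat = Nat("203.0.113.7")
    print(nat.translate_outbound("192.168.1.10", 51000))  # client traffic just works
    print(nat.translate_inbound(6881))                    # nobody asked for this port
    nat.forward_port(6881, "192.168.1.10", 6881)
    print(nat.translate_inbound(6881))                    # now it reaches the server
```

The outbound table fills itself in as clients talk, which is why being a client is painless. The inbound entry for a server exists only if somebody configures it.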
Things have got really complicated now, so let us step back. If this is the case, then how does BitTorrent work?
How can BitTorrent nodes act as servers? Obviously it is a problem, and you have similar issues with many other protocols, including the Session Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP).
One of the ways Skype solved this problem is by using a technique called UDP hole punching. UDP hole punching does not require you to configure port forwarding, and things work seamlessly. But it is a complex technique, and so are all p2p protocols.
P2P is used not only for file sharing, but also for instant messaging, VoIP and for many other applications with future growth prospects. Multimedia, whiteboarding, Video on Demand and so on come to mind.
In many p2p protocols, there is a concept of a tracker or supernode which tracks the various peers/participants of the protocol. It is the job of the tracker to keep the state of the individual nodes participating in the protocol.
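The state a tracker keeps can be as simple as a table of who was last seen in which swarm. The sketch below is an in-memory toy under my own assumptions. The Tracker class, its announce method and the timeout are hypothetical and do not follow the wire format of any real tracker.

```python
import time


class Tracker:
    """Toy tracker: remembers which peers recently announced for each file."""

    def __init__(self, peer_timeout=1800):
        self.peer_timeout = peer_timeout  # seconds before a silent peer is forgotten
        self.swarms = {}                  # file id -> {peer address: last seen}

    def announce(self, file_id, peer, now=None):
        """Record that peer is active and return the other peers in the swarm."""
        now = time.time() if now is None else now
        swarm = self.swarms.setdefault(file_id, {})
        # Forget peers that have not announced within the timeout.
        for stale in [p for p, seen in swarm.items() if now - seen > self.peer_timeout]:
            del swarm[stale]
        swarm[peer] = now
        return [p for p in swarm if p != peer]


if __name__ == "__main__":
    tracker = Tracker(peer_timeout=60)
    print(tracker.announce("some-file", ("203.0.113.7", 6881), now=0))
    print(tracker.announce("some-file", ("198.51.100.2", 6881), now=10))
    print(tracker.announce("some-file", ("192.0.2.9", 6881), now=100))
```

The tracker never touches the file itself. It only introduces peers to each other, and silently drops the ones that have gone away.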
UDP hole punching is an effective technique to traverse firewalls, NAT devices and routers and to enable p2p applications to work without any configuration. This is achieved through the concept of a rendezvous server or mediator node that tracks the ephemeral port mapping at each NAT device and tells each peer the public address and port of the other.
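The sequence of events is easier to follow in a simulation than in prose. The sketch below uses no real sockets. It models two NAT devices and a rendezvous server with classes of my own invention, and it assumes that each NAT reuses the same public port for every destination. NAT devices that pick a new port per destination defeat this simple scheme.

```python
class ConeNat:
    """Toy NAT that admits inbound packets only from endpoints already contacted."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000
        self.mappings = {}  # private endpoint -> public port
        self.allowed = {}   # public port -> set of remote endpoints contacted

    def send(self, private_ep, remote_ep):
        """An outbound packet creates the mapping and opens a hole for remote_ep."""
        if private_ep not in self.mappings:
            self.mappings[private_ep] = self.next_port
            self.next_port += 1
        port = self.mappings[private_ep]
        self.allowed.setdefault(port, set()).add(remote_ep)
        return (self.public_ip, port)  # the source address the outside world sees

    def accepts(self, public_port, remote_ep):
        return remote_ep in self.allowed.get(public_port, set())


class Rendezvous:
    """Mediator that remembers the public endpoint it observed for each peer."""

    def __init__(self):
        self.endpoints = {}

    def register(self, name, observed_ep):
        self.endpoints[name] = observed_ep

    def lookup(self, name):
        return self.endpoints[name]


def punch():
    server_ep = ("198.51.100.1", 3478)  # hypothetical rendezvous server address
    nat_a, nat_b = ConeNat("203.0.113.7"), ConeNat("203.0.113.99")
    a_priv, b_priv = ("192.168.1.10", 5000), ("10.0.0.5", 5000)
    server = Rendezvous()

    # Step 1: both peers contact the server, which sees their public endpoints.
    a_pub = nat_a.send(a_priv, server_ep)
    b_pub = nat_b.send(b_priv, server_ep)
    server.register("a", a_pub)
    server.register("b", b_pub)
    blocked_before = not nat_b.accepts(b_pub[1], a_pub)

    # Step 2: each peer learns the other's endpoint and sends a packet to it.
    # The first packet may be dropped at the far side, but it opens the local hole.
    nat_a.send(a_priv, server.lookup("b"))
    nat_b.send(b_priv, server.lookup("a"))

    # Step 3: both NAT devices now let the other peer's packets through.
    return blocked_before, nat_a.accepts(a_pub[1], b_pub), nat_b.accepts(b_pub[1], a_pub)


if __name__ == "__main__":
    print(punch())
```

Notice that the rendezvous server carries no file data at all. Once the two holes are open, the peers talk to each other directly.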
All this complexity adds a great deal of resilience, fault tolerance and scalability by managing the churn effectively. Any number of participants can come and go, and the file transfer may get interrupted any number of times, but we will still get the file correctly thanks to the SHA1 checksum mechanism.
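The checksum idea itself is simple enough to show with Python's hashlib. In this sketch the file is cut into fixed-size pieces, a SHA1 digest is recorded for each piece up front, and every piece received from an untrusted peer is checked against the recorded digest before it is kept. The function names and the tiny piece length are my own choices for the demonstration.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Split data into pieces and return the SHA1 digest of each piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(piece, index, hashes):
    """Return True if the received piece matches the digest recorded for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    original = b"some file contents shared in a swarm"
    hashes = piece_hashes(original, piece_length=8)

    good = original[8:16]  # piece 1 as it should arrive
    bad = b"XXXXXXXX"      # piece 1 corrupted or forged on the way
    print(verify_piece(good, 1, hashes))
    print(verify_piece(bad, 1, hashes))
```

A piece that fails the check is simply thrown away and fetched again from some other peer, so a flaky or malicious participant cannot corrupt the final file.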
A lot of real life applications lend themselves easily to the P2P model, as we will see below.
I have already mentioned many of the key applications that use P2P and that will continue to use P2P in today's Internet. The future holds bright prospects for P2P computing as more and more media rich applications move to the Internet for content delivery. Even television streaming and telephony applications will get integrated into instant messaging and presence networks. All this needs a solid P2P backbone.
Got something to add? Send me email.
© 2011-03-09 Girish Venkatachalam