The Evolution of Peer-to-Peer File Sharing Systems
The idea of peer-to-peer computing can be traced back to the late 1960s, when the ARPANET network was formed with the aim of sharing resources among US research facilities. The network comprised several hosts, each considered equally important, and it proved successful as a network in which every node could both request and serve content. However, ARPANET used only a simple routing mechanism: it was not self-organizing, and it provided no means for context- or content-based routing.
With the introduction of TCP/IP in 1973, direct host-to-host communication was cemented as the network's core concept. TCP/IP laid down a mechanism for routing packets to their destination, and most of the protocols built thereafter, such as HTTP, SMTP, and DNS, rest on the same idea. The techniques used by peer-to-peer file-sharing systems can be seen as an evolution of these principles.
To overcome the challenges faced in ARPANET, a distributed messaging system called USENET was established in 1979. From the user's perspective it followed a client-server model, but it was based on a decentralized model of control: the servers communicated with one another as peers, propagating messages across the entire group of servers in the network. The same concept is still used by mail transfer agents speaking SMTP.
Thereafter, as more and more people joined the Internet, a music and file-sharing application named Napster appeared in 1999. It was just the beginning of a revolution in peer-to-peer networks: as we can see today, any participating user can set up a virtual network independent of the physical network, without any requirement to follow administrative rules. Napster relied on a set of central servers to locate requested files, supported by an index of the users and the content they shared. In this way it linked people to the peers that held the files they wanted.
When a user searched for a file, the server looked up all known copies and presented them to the user; the requested file was then transferred directly between the two computers. One limitation was that only music files could be shared. Because its central servers indexed all the shared music files, Napster was held responsible for copyright infringement and was shut down in July 2001.
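The division of labor described above can be sketched in a few lines. This is a hypothetical illustration, not Napster's actual protocol: the central server keeps only an index mapping filenames to peers, search queries hit the server, and the file transfer itself happens directly between peers.

```python
# Sketch of a Napster-style central index (all names are illustrative).
# The server never stores files, only who shares what.

class CentralIndex:
    def __init__(self):
        self.index = {}  # filename -> set of peer addresses

    def register(self, peer, filenames):
        # A peer announces the files it is willing to serve.
        for name in filenames:
            self.index.setdefault(name, set()).add(peer)

    def search(self, filename):
        # Return every peer known to hold a copy; the requester
        # then downloads directly from one of these peers.
        return sorted(self.index.get(filename, set()))

index = CentralIndex()
index.register("peer-a:6699", ["song.mp3", "live.mp3"])
index.register("peer-b:6699", ["song.mp3"])
print(index.search("song.mp3"))  # → ['peer-a:6699', 'peer-b:6699']
```

The single index is what made search fast, and also what made the system a single point of legal and technical failure.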
With the shutdown of Napster, other peer-to-peer services, notably Gnutella and Kazaa, came into the limelight. These became even more popular, as they allowed users to download movies and games as well as music. The peer-to-peer model had been used in many applications over the years, but it is believed to have gained popularity only through the Napster file-sharing system, which supported sharing of music through a centralized server. The basic model of peer-to-peer computing laid down in Napster had already appeared in earlier software systems. With that model in place, the peer-to-peer revolution allowed millions of users to connect directly to one another, forming groups and collaborating through search engines, virtual supercomputers, and file systems.
Later, with the growing use of and dependency on the World Wide Web, Tim Berners-Lee related the peer-to-peer idea to the Web itself: he envisioned each Web user as an active editor and contributor, creating and linking content to form an interconnected web of links. In the early days it was easier to connect machines for communication, since there were no firewalls or other security measures; the Internet was more open, flawlessly sending packets from one system to another. Such a setup contrasts with the broadcasting model that has been used over the years.
Over the years, with advances in technology, researchers and developers have placed more emphasis on routing algorithms to attain better performance and efficiency. Today's research focus still lies in file-sharing systems and data distribution, where the issues, techniques, and solutions are well understood and no longer cause major concern. In the coming years the focus will shift toward security for shared information. In addition, the coming generation of peer-to-peer systems will see more advanced interaction and collaboration among peers through commercially built and deployed software. Much work has been done in this direction; significant early examples include Groove and JXTA.
Researchers have drawn many conclusions over the years. Widely distributed search methods may benefit numerous applications. A peer-to-peer network works well with minimal configuration, has the ability to evolve, and reduces the reliance on high-performance servers.
A similar argument was made for using many off-the-shelf PC processors rather than a single large mainframe. This motivated the deployment of distributed databases that aggregate storage and processing capability for better performance.
The concern for reliability was put forward by Gedik and Liu. According to them, reliability needs to be addressed before peer-to-peer networks are used in applications.
Later, it was suggested that peer-to-peer networks have three characteristics: self-organization, symmetric communication, and distributed control. With symmetric communication there is no centralized directory, and peers act as both clients and servers.
It was also suggested that peers are inherently unreliable. DePaoli and Mariani more recently reviewed the early peer-to-peer systems, and their reliability, at a higher level. As critical surveys of peer-to-peer communication became necessary, the research literature was subdivided along four lines: search, security, storage, and applications.
Search and security measures were already classified, but the remaining research has drawn on numerous distributed-systems disciplines. Researchers addressed routing and security issues across network boundaries, recognizing problems related to naming, routing, and congestion control. Over the years, however, peer-to-peer research has remained host-centric.
Some stated that peer-to-peer research stands to benefit from database research. Further, it was recommended that data indexes be decoupled from the applications that use the data. Database indexes such as B+ trees have an analog in peer-to-peer distributed hash tables (DHTs).
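The analogy can be made concrete with a toy sketch of the DHT idea: keys and node identifiers share one hash space, and a key is stored on the first node whose identifier follows the key's hash around a ring, as in consistent-hashing designs such as Chord. This is an illustrative sketch under those assumptions, not the API of any particular system; all names are hypothetical.

```python
import hashlib
from bisect import bisect_right

def h(value: str) -> int:
    # Map any string into a small shared hash space (2^16 points).
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 16)

class ToyDHT:
    def __init__(self, nodes):
        # Place each node on the ring at the hash of its name.
        self.ring = sorted((h(n), n) for n in nodes)

    def responsible_node(self, key: str) -> str:
        # The first node clockwise from the key's position owns the key
        # (wrapping around the ring at the end).
        points = [p for p, _ in self.ring]
        i = bisect_right(points, h(key)) % len(self.ring)
        return self.ring[i][1]

dht = ToyDHT(["node-a", "node-b", "node-c"])
owner = dht.responsible_node("song.mp3")  # deterministic for a fixed ring
```

Like a B+ tree, the structure answers "where does this key live?" without scanning everything; unlike a B+ tree, the index itself is spread across the participating peers.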
Most major research has been done on the basic routing structure, and work on routing algorithms has become more visible. Structured approaches, such as hypercubes, rings, and skip graphs, weigh the amount of routing state and the number of links per peer against overlay hop counts. Unstructured approaches use blind flooding and random walks, whose overheads usually motivate introducing some structure. Applications have included file sharing, directories, content-delivery networks, email, and more. Another aspect is the ability to accommodate variation in outcome, which one could call adaptability.
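The blind-flooding approach mentioned above can be sketched as a breadth-first broadcast with a time-to-live (TTL). This is an illustrative sketch of the general technique, not any specific protocol; the overlay, peers, and filenames are made up.

```python
from collections import deque

def flood_search(graph, start, target_file, files, ttl):
    """Return peers holding target_file within `ttl` hops of `start`.

    Each query carries a TTL; a peer forwards it to all neighbours
    until the TTL expires. Forwarding to everyone is what makes
    flooding expensive, which motivates adding structure.
    """
    hits, seen = [], {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, t = queue.popleft()
        if target_file in files.get(peer, set()):
            hits.append(peer)
        if t == 0:
            continue  # TTL exhausted: stop forwarding from here
        for neighbour in graph.get(peer, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, t - 1))
    return hits

overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
shared = {"c": {"song.mp3"}, "d": {"song.mp3"}}
print(flood_search(overlay, "a", "song.mp3", shared, ttl=2))  # → ['c', 'd']
```

A random walk replaces "forward to all neighbours" with "forward to one neighbour chosen at random", trading lower message overhead for a longer, probabilistic search.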
Handling different kinds of queries and accommodating changing application requirements with minimal intervention was recently acknowledged as a first-class requirement, termed "organic scaling": the system can grow gracefully, without any architectural breakpoints.
Recently, peer-to-peer systems have been classified by the presence or absence of structure in their routing tables and network topology, treating unstructured and structured algorithms as competing alternatives. The unstructured approaches have been called the "first generation", and the structured algorithms the "second generation".
In spite of the various advantages associated with structured peer-to-peer networks, several research groups are still pursuing unstructured P2P because of the various criticisms leveled at structured systems.