The World of Peer-to-Peer (P2P)/All Chapters

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search

Contents

Foreword

Guide to Readers

This is a wikibook (en.wikibooks.org), as such you should learn a bit about what it is and how it does its magic.

The book is organized into different parts, but as this is a work that is always evolving, things may be missing or just not where they should be, you are free to become a writer and contribute to fix things up...

This book intends to explain to you the overall utilization that P2P (Peer-to-Peer) technologies have in todays world, it goes deeper into as many implementations as it can and compares the benefits, problems even legal implications and changes to social behaviors and economic infrastructures. We explain in detail about the technology and how works and try to bring you a vision on what to expect in the future.

Reader Comments

If you have comments about the technical accuracy, content, or organization of this document, please tell us (e.g. by using the "discussion" pages or by email). Be sure to include the section or the part title of the document with your comments and the date of your copy of the book. If you are really convinced of your point, information or correction then become a writer (at Wikibooks) and do it, it can always be rolled back if someone disagrees...

Guide to Writers

Authors/Contributors (at Wikibooks) should register if intending to make non-anonymous contributions to the book (this will give more value and relevance to your opinions and views on the evolution of the work and enable others to talk to you) and try to follow the structure. If you have major ideas or big changes use the discussion area; as a rule just go with the flow...

Conventions 
A set of conventions have been adopted on the creation of this book please read about them before you contribute any content on the book's talk page.

Authors

The following people are authors to this book
Panic

There are many other contributors/editors to the book; a verifiable list of all contributions exist as History Logs at Wikibooks (http://en.wikibooks.org/).

Acknowledgment is given for using some contents from other works like Wikipedia, theinfobox:Peer to Peer and Internet Technologies

What is P2P ?

This is a diagram of a Peer-to-Peer computer network.
A diagram of a server-based computer network.

Generally, a peer-to-peer (or P2P) computer network refers to any network that does not have fixed clients and servers, but a number of autonomous peer nodes that function as both clients and servers to the other nodes on the network. This model of network arrangement is contrasted with the client-server model as any node is able to initiate or complete any supported transaction. Peer nodes may differ in local configuration, processing speed, network bandwidth, and storage quantity.

Although the term has been applied to Usenet and IRC in all their incarnations and is even applicable to the network of IP hosts known as the Internet, it is most often used restricted to the networks of peers developed starting in the late 1990s characterized by transmission of data upon the receiver's request instead of the sender's. Such of the early networks included Gnutella, FastTrack, and the now-defunct Napster which all provide facilities for free (and somewhat anonymous) file transfer between personal computers connected in a dynamic and unreliable way to a network in order to work collectively towards a shared objective.

Even those early Networks did work around the same concept or implementation. In some Networks, such as Napster, OpenNap or IRC, the client-server structure is used for some tasks (e.g. searching) and a peer-to-peer structure for others, and even that is not consistent in each. Networks such as Gnutella or Freenet, use a peer-to-peer structure for all purposes and are sometimes referred to as true peer-to-peer networks, even though some of the last evolution are now making them into a hybrid approach were each peer is not equal in its functions.

When the term peer-to-peer was used to describe the Napster network, it implied that the peer protocol nature was important, but in reality the great achievement of Napster was the empowerment of the peers (ie, the fringes of the network). The peer protocol was just a common way to achieve this.

So the best approach will be to define peer-to-peer, not as a set of strict definitions but to extend it to a definition of a technical/social/cultural movement, that attempts to provide a decentralized, dynamic and self regulated structure (in direct opposition to the old model o central control or server-client model) with the objective of providing content and services. In this way a computer programs/protocol that attempts to escape the need to use a central servers/repository and aims to empower or provide a similar level of service/access to a collection of similar computers can be referred to as being a P2P implementation, and it will be in fact enabling everyone to be a creator/provider, not only a consumer.

From a Computer Science Perspective

Technically, a true peer-to-peer application must implement only peering protocols that do not recognize the concepts of "server" and "client". Such pure peer applications and networks are rare. Most networks and applications described as peer-to-peer actually contain or rely on some non-peer elements, such as DNS. Also, real world applications often use multiple protocols and act as client, server, and peer simultaneously, or over time.

P2P under a computer science perspective creates new interesting fields for research not on to the not so recent switch of roles on the networks components, but due to unforeseen benefits and resource optimizations it enables, on network efficiency and stability.

Peer-to-peer systems and applications have attracted a great deal of attention from computer science research; some prominent research projects include the Chord lookup service, the PAST storage utility, and the CoopNet content distribution system (see below for external links related to these projects).

Distributed Systems

Distributed systems are becoming key components of IT companies for data centric computing. A general example of these systems is the Google infrastructure or any similar system. Today most of the evolution of these systems if being focused on how to analyze and improve performance. A P2P system is also a distributed systems and share, depending on the implementation, the characteristics and problems of distributed systems (error/failure detection, aligning machine time, etc...).

Ganglia

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.
Ganglia has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes. ( http://ganglia.info/ )

Distributed Computation

The basic premise behind distributed computation is to spread computational tasks between several machines distributed in space, most of the new projects focus on harnessing the idle processing power of "personal" distributed machines, the normal home user PC. This current trends is an exciting technology area that has to do with a sub set of distributed systems (client/server communication, protocols, server design, databases, and testing).

This new implementation of an old concept has it's roots in the realization that there is now a staggering number of computers in our homes that are vastly underutilized, not only home computers but there are few businesses that utilizes their computers the full 24 hours of any day. In fact seemingly active computers can be using only a small part of it processing power. Using a word processing, email, and web browsing, require very few CPU resources. So the "new" concept is to tap on this underutilized resource (CPU cycles) that can surpass several supercomputers at substantially lower costs since machines that individually owned and operated by the general public.

SETI@Home

One of the most famous distributed computation project, , hosted by the Space Sciences Laboratory, at the University of California, Berkeley, in the United States. SETI is an acronym for the Search for Extra-Terrestrial Intelligence. SETI@home was released to the public on May 17, 1999.

In average it used hundreds of thousands of home Internet-connected computers in the search for extraterrestrial intelligence. The whole point of the programs is to run your free CPU cycles when it would be otherwise idle, the original project is now deprecated to be included into BOIC.

BOINC

BOINC has been developed by a team based at the Space Sciences Laboratory at the University of California, Berkeley led by David Anderson, who also leads SETI@home.

Boinc stands for Berkeley Open Infrastructure for Network Computing, a non-commercial (free/w:open source software), released under the LGPL, middleware system for volunteer computing, originally developed to support the SETI@home project and still hosted at ( http://boinc.berkeley.edu/ ), but intended to be useful for other applications in areas as diverse as mathematics, medicine, molecular biology, climatology, and astrophysics. an open-source software platform for computing using volunteered resources that extends the original concept and lets you donate computing power to other scientific research projects such as:

  • Climateprediction.net: study climate change.
  • Einstein@home: search for gravitational signals emitted by pulsars.
  • LHC@home: improve the design of the CERN LHC particle accelerator.
  • Predictor@home: investigate protein-related diseases.
  • Rosetta@home: help researchers develop cures for human diseases.
  • SETI@home: Look for radio evidence of extraterrestrial life.
  • Folding@Home ( http://www.stanford.edu/group/pandegroup/folding/ ): to understand protein folding, misfolding, and related diseases.
  • Cell Computing biomedical research. (Japanese; requires nonstandard client software)
  • World Community Grid: advance our knowledge of human disease. (Requires 5.2.1 or greater)

As a "quasi-supercomputing" platform, BOINC has over 435,000 active computers (hosts) worldwide. BOINC is funded by the National Science Foundation through awards SCI/0221529, SCI/0438443, and SCI/0506411.

It is also used for commercial usages, as there are some private companies that are beginning to use the platform to assist in their own research. The framework is supported by various operating systems: Windows (XP/2K/2003/NT/98/ME), Unix (GNU/Linux, FreeBSD) and Mac OS X.

World Community Grid (WCG)

Created by IBM, World Community Grid ( http://www.worldcommunitygrid.org/ ) is similar to the above systems. Fourteen IBM servers serve as "command central" for WCG. When they receive a research assignment from an organization, they will scour it for security bugs, parse it into data units, encrypt them, run them through a scheduler and dispatch them out in triplicate to the army of volunteer PCs.

To be a volunteer one only needs to download a free, small software agent (similar to a screensaver).

Projects get selected based on the potential to benefit from WCG technology and address humanitarian concerns, and chosen by an independent, external board of philanthropists, scientists and officials.

The software is OpenSource (LGPL), C/C++ and wxWidgets and is available for Windows, Mac, or Linux.

Grid Networks

Grids first emerged in the use of supercomputers in the U.S. , as scientists and engineers sought access to scarce high-performance computing resources that were concentrated at a few sites.

Open Science Grid

The Open Science Grid ( http://www.opensciencegrid.org/ ) was built and is operated by the OSG Consortium, it is a U.S. grid computing infrastructure that supports scientific computing via an open collaboration of science researchers and software developers from universities and national laboratories, storage and network providers.

Globus Alliance

The Globus Alliance ( http://www.globus.org/ ) is a community of organizations and individuals developing fundamental technologies behind the "Grid," which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy.

The Globus Alliance also provides the Globus Toolkit, an open source software toolkit used for building robust, secure, grid systems (peer-to-peer distributed computing on supercomputers, clusters, and other high-performance systems) and applications. A Wiki is available to the Globus developer community ( http://dev.globus.org/wiki/Welcome ).

High Throughput Computing (HTC)

As some scientists try extract more floating point operation per second (FLOPS) or minute from their computing environment, others concentrate on the same goal for larger time scales, like months or years, we refer these environments as High Performance Computing (HPC) environments.

The term HTC was coined in a seminar at the NASA Goddard Flight Center in July of 1996 as a distinction between High Performance Computing (HPC) and High Throughput Computing (HTC).

HTC focus is on the processing power and not on the network, but the systems can also be created over a network and so be seen as a Grid network optimized for processing power.

Condor Project

The goal of the Condor Project ( http://www.cs.wisc.edu/condor/ ) is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput.

IBM Grid Computing

IBM among other big fishes in the IT pond, spends some resources investigating Grid Computing. Their attempts around grid computing are listed in on the projects portal page ( http://www.ibm.com/grid ). All seem to attempt to leverage the enterprise position on server machines to provide grid services to costumers. The most active project is Grid Medical Archive Solution a scalable virtual storage solution for healthcare, research and pharmaceutical clients.

Content distribution/Hosting

The traditional method of distributing large files is to put them on a central server. The server and the client can then share the content across the network using agreed upon protocols (from HTTP, FTP to an infinite number of variations) when using IP connections the data can be sent over TCP or UDP connection or a mix of the two, this all depends mostly on the requirements on the service, machines, network and many security considerations.

P2P evolved in it time to solve a simple problem, central servers don't scale well (bandwidth, space and CPU) and they constitute a point of failure, that can bring the function of the system to an end (as any centralization of services).

TODO

TODO
Complete

From a Economics Perspective

For a P2P system to be viable there must be a one to one share of work between peers, the goal should be a balance between consumption and production of resources and maintaining a singe class of participant on the network. Since most P2P systems have an hard time creating incentives for users to produce, most P2P system have a pyramidal scheme as users interact with it and do depend on the network effect it creates, the more users the system has the more attractive it (and the more value it has) as any system that depends on the network effect, it's success is based on compatibility and conformity issues.

In 2008, Michael Heller, an academic at Columbia University in New York, in an interview to technology law podcast OUT-LAW Radio ( Listen to: Are patents and copyrights making innovation impossible? OUT-LAW Radio, 28/08/2008 ) presented his theory, saying:
- "I discovered a paradox in the free market and it is this: usually private ownership creates wealth, but too much ownership has the opposite effect - it creates gridlock,". "When too many owners control a single resource – it can be a patent, a copyright, land – when too many people control a single resource, co-operation breaks down and wealth disappears. Everybody ends up losing.".
- "Imagine a drug developer walking into an auditorium and seeing 50 or 100 or several hundred patent owners, each with their essential patent on their lap, and the drug developer knows that unless he's able to negotiate successfully with every single one of those patent owners, his drug can't come to market,".
- "A standard, when it works, can solve the problem of gridlock. But to create the standard you need to get, in the context of a DVD, seven or eight hundred separate patents pooled together into a single patent pool; but there are many areas that we don't have because entrepreneurs can't get the pools together or can't get the standard negotiated.".
Heller has also published a book, The Gridlock Economy in it he expanded on his theory how too much ownership wrecks markets, stops innovation and costs lives.

Content for money

The privatization of the production and distribution of Cultural goods.

Content is virtual, made only of information. This information can be any type of non material object that is made from ideas (text, multimedia). In this way content is also the myriad ways those ideas can be expressed. It may consist of music, movies, books or any one single aspect of each.

Music

In todays interconnected world the distribution channels are so diversified that creating artificial control schemes will only degrade the level of satisfaction of consumers without increasing product value but incrementing the costs to the sanctioned distributors. If costumers are faced with a product with DRM, unauthorized copies if made publicly available, will create a competing product without limitations, thus creating a better product with a better price tag. In fact the use of DRM promotes the creation of a parallel market (if one can call it that because most offerings are gratis), this results from the consumers wishes are not being satisfied by the primary offer.

Video

Movies
TV

Recently some television networks are rethinking their approach to audiences, this has resulted from the level acceptance and interest that DVD show collections were having and several online attempts to improve distribution. Since now anyone can easily illegally download their favorite shows, a problem similar to the fragmentation of the distribution channels as seen in the music recording industry with the rise of alternative delivery technologies will have a similar result if television industry fails adapt and fill the audiences expectations of quick and easy accessibility to new fresh content.

TODO

TODO
extend, address the rise of independent productions, compare the p2p meme of decentralization with the static settings of the industry, we are all producers now, real time interaction

ISPs

ISPs have been shaping/throttling P2P traffic, especially the more popular networks for years, resulting on an ongoing cat and mouse game between ISPs and P2P developers. In the US the network neutrality discussion and recently the evidence of this actions by ISPs against P2P traffic has turned this matter into a political issue.

In November 2007, Vuze, creators of Azureus (a Bittorrent application), petitioned the FCC, resulting in a FCC hearing held in December 2007. One of the issues raised there, was the level of data available on BitTorrent throttling. This lead to a statement by the General Counsel at Vuze, Jay Monahan; “We created a simple software “plug-in” that works with your Vuze application to gather information about potential interference with your Internet traffic.”

This plugin has been gathering more hard data on the actions of ISPs, resulting in a growing list of ISPs that interfere with P2P protocols is maintained on the Azureus WIKI ( http://www.azureuswiki.com/index.php/Bad_ISPs ).

From a Sociological Perspective

From person to person or user to user, a new world is being born in that all are at the same time producers and consumers. The information will be free since the costs of distribution will continue to fall and the power for creative participation is at anyones hands.

Is it morally wrong?

As discussed previously there is no common ground to answer this question, views differ wildly, even states degrees with the interpretation or legality of restricting/implementing intellectual property rights.

TODO

TODO
Disintermediation... Price Fixing... Copyright Extension... Radio Consolidation... DRM...

  • paedophiles and terrorists
For every action there is a reaction

It is today an evidence that there is a social movement against what is generally perceived as the corruption of copyright over public goods, that is, legally a minority is attempting to impose extensions and reductions of liberties to defend economical interests of mostly sizable international corporations that in it's vast majority aren't even the direct creators of the goods. In this particular case virtual goods, mostly digital that have a approaching 0 cost of replication and aren't eroded by time or use.

DRM (Digital Rights Management)

In late 2005, market-based rationales influenced Sony BMG's deployment of DRM systems on millions of Compact Discs that threatened the security of its customers computers and compromised the integrity of the information infrastructure more broadly. This became known as the Sony BMG Rootkit debacle (see the paper Mulligan, Deirdre and Perzanowski, Aaron K., "The Magnificence of the Disaster: Reconstructing the Sony BMG Rootkit, for detailed information).

In February 6, 2007, Steve Jobs, wrote an open letter addressing DRM since it was impacting Apples business on the iTunes/iPod store ( http://www.apple.com/hotnews/thoughtsonmusic/ ).

On a presentation made by David Hughes of the RIAA at Arizona State University (2007), David Hughes, senior vice president of technology for the RIAA, dubbed the spiritual leader of Apple Steve Jobs as a "hypocrite" over his attitude to DRM on iTunes. "While Steve has been banging on about the music companies dropping DRM he has been unwilling to sell his Pixar movies through iTunes without DRM and DVDs without CSS encryption."

a danger for historical records

TODO

TODO
Libraries fear digital lockdown By Ian Youngs BBC News

P2P United

A now disbanded organization formed by six of the biggest P2P groups (those behind eDonkey, Grokster, Morpheus, Blubster, Limewire and BearShare), with Adam Eisgrau as executive director. It was started in mid-July 2003 to provide a way to lobby for the P2P on U.S. Congress and WIPO, the UN organization that administers intellectual property treaties since the file-sharing industry (as an industry) had no identifiable name and face in Washington or in the media.

This attempt was a bust and since then most of the members of the group has lost court cases or have settled and closed operations.

TODO

TODO
Complete

Peer-to-Peer working group

The Peer-to-Peer WG (P2Pwg).

A great article about problems with the creation of the working group is available at (www.openp2p.com) by Tim O'Reilly 10/13/2000 is available (http://www.openp2p.com/pub/a/p2p/2000/10/13/working_grp.html).

P2P in non technological fields

There are also several movements attempting to establish how to apply the concept of P2P to non technological fields like politics, economics, ecology etc...
One of such attempts is the The Foundation for P2P Alternatives ( http://p2pfoundation.net ), that function as a clearinghouse for such open/free, participatory/p2p and commons-oriented initiatives and aims to be a pluralist network to document, research, and promote peer to peer alternatives.

From a Legal Perspective

The most commonly shared files on such networks are mp3 files of popular music and DivX movie files. This has led many observers, including most media companies and some peer-to-peer advocates, to conclude that these networks pose grave threats to the business models of established media companies. Consequently, peer-to-peer networks have been targeted by industry trade organizations such as the RIAA and MPAA as a potential threat. The Napster service was shut down by an RIAA lawsuit; both groups the RIAA and MPAA spend large amounts of money attempting to lobby lawmakers for legal restrictions. The most extreme manifestation of these efforts to date (as of January, 2003) has been a bill introduced by California Representative Berman, which would grant copyright holders the legal right to break into computer systems believed to be illegally distributing copyrighted material, and to subvert the operation of peer-to-peer networks. The bill was defeated in committee in 2002, but Rep. Berman has indicated that he will reintroduce it during the 2003 sessions.

As attacks from Media companies expand the networks have seemed to adapt at a quick pace and have become technologically more difficult to dismantle. This has caused the users of such systems to become targets . Some have predicted that open networks may give way to closed, encrypted ones where the identity of the sharing party is not known by the requesting party. Other trends towards immunity from media companies seem to be in wireless adhoc networks where each device is connected in a true peer-to-peer sense to those in the immediate vicinity.

While historically P2P file sharing has been used to illegally distribute copyrighted materials (like music, movies, and software), future P2P technologies will certainly evolve and be used to improve the legal distribution of materials.

TODO

TODO
..."IP addresses are an identifier used to locate a particular network interface on the Internet. Be this a router, a PC, Mac, PDA, mobile phone or otherwise (with modules capable of utilizing one ranging to the size of a finger nail). IP addresses are not proof that a particular TYPE (PC running Windows, Linux or other free software, PDA, mobile phone, etc.) of computer hardware was used in the transmission. Nothing about this hardware can be *assumed*, and also nothing about the users IF ANY, of this hardware. So, I define my second point, which is that these electronic devices (of the types I listed above) may be operated without regard to physical location or the actual OWNER of the IP address." in "Patricia Santangelo files Answer, Demands Trial by Jury"...

As it should be obvious by now the problem P2P technologies create to the owner of the content, to the control of the distribution channels and to the limitation of users (consumers) rights is huge, the technology is making holes in the standard ideology that controls the relations between producers and consumers some new models have been proposed (see for example Towards solutions to “the p2p problem” - http://groups.sims.berkeley.edu/pam-p2p/ ).

In 2007 a handful of the wealthiest countries (United States, the European Commission, Japan, Switzerland, Australia, New Zealand, South Korea, Canada, and Mexico) started secretive negotiations toward a treaty-making process to create a new global standard for intellectual property rights enforcement, the Anti-Counterfeiting Trade Agreement (ACTA) due to be adopted at the 34th G8 summit in July 2008.

It has been argued that the main purpose of the treaty is to provide safe harbor for service providers so that they may not hesitate to provide information about infringers; this may be used, for instance, to quickly identify and stop infringers once their identities are confirmed by their providers.

Similarly, it provides for criminalization of copyright infringement, granting law enforcement the powers to perform criminal investigation, arrests and pursue criminal citations or prosecution of suspects who may have infringed on copyright.

More pressingly, being an international treaty, it allows for these provisions—usually administered through public legislation and subject to judiciary oversight—to be pushed through via closed negotiations among members of the executive bodies of the signatories, and once it is ratified, using trade incentives and the like to persuade other nations to adopt its terms without much scope for negotiation.

Is it Illegal?

Peer-to-Peer in itself in nothing particularly new. We can say that an FTP transfer or any other one on one transfer is P2P, like an IRC user sending a DCC file to another, or even eMail, the only thing that can be illegal is the use one can give to a particular tool.

Legal uses of P2P include distributing open or public content, like movies, software distributions (Linux, updates) and even Wikipedia DVDs are found on P2P Networks. It can also be used to bypass censorship, like for instance the way Michael Moore's new film 'Sicko' leaked via P2P or as publicity machine to promote products and ideas or even used as a market annalists tool.

However trading copyrighted information without permission is illegal in most countries. You are free to distribute your favorite Linux distribution, videos or pictures you have taken yourself, MP3 files of a local band that gave you permission to post their songs online, maybe even a copy an open source software or book. The view of legality lies foremost on cultural and moral ground and in a globally networked world there is no fixed line you should avoid crossing, one thing is certain most people don't produce restricted content, most view their creations as giving to the global community, so it's mathematically evident that a minority is "protected" by the restrictions imposed on the use and free flow of ideas, concepts and culture in general.

P2P as we will see is not only about files sharing, it is more generally about content/services distribution.

Sharing is not theft and theft is not the same as piracy, this is true under any law.
is sharing theft? and is theft piracy? surly not...

Sharing contents that you have no right to is not theft. It has never been theft any were in the world. Anyone who says it is theft is wrong. Sharing content that you don't own or have the rights to is copyright infringement. Duplicating digital good doesn't reduce the value of the original good nor does it signify subtraction of the same to the owner and making that same digital good available to others without a license has a well understood effect of augmenting the visibility of said product and increase the demand for it, this has been validated in tests done with digital books, music and video releases.

The legal battles we are now accustomed to hear about deals mostly on control and also on lesser degree in rights preservation. Control over the way distribution is archived (who gets what in what way), this deals with money, as there is added value to controlling and restricting access by format, time and space.

TODO

TODO

  • the Copyright Term Extension Act of 1998—alternatively known as the Sonny Bono Copyright Term Extension Act or the Mickey Mouse Protection Act.
  • Encryption
  • paedophiles and terrorists

WIPO (World Intellectual Property Organization)

The World Intellectual Property Organization is one of the specialized agencies of the United Nations. WIPO was created in 1967 with the stated purpose "to encourage creative activity, [and] to promote the protection of intellectual property throughout the world". The convention establishing the World Intellectual Property Organization, was signed at Stockholm on July 14, 1967.

TODO

TODO
Add more info on the WIPO and relevant treaties.

European Union

An italian manifest saying "to share is not steal", referring to P2P legal status in Italy.

In August 2007, the Music industry was rebuffed in Europe on file-sharing identifications, as a court in Offenburg, Germany refused to order ISPs to identify subscribers when asked to by Music Industry who suspected specified accounts were being used for copyright-infringing file-sharing, the refusal was based in the courts understanding that ordering the ISPs to handover the details would be "disproportionate", since the Music Industry representatives had not adequately explained how the actions of the subscribers would constitute "criminally relevant damage" that could be a basis to request access to the data.

This was not an insulate incident in Germany, as also in 2007, Celle chief prosecutor's office used the justification that substantial damage had not been shown to refuse the data request, and does follows the opinion of a European Court of Justice (ECJ) Advocate-General, Juliane Kokott who had published an advice two weeks earlier, backing this stance, as it states that countries whose law restricted the handing over of identifying data to criminal cases were compliant with EU Directives. The produced advice was directed to a Spanish case in which a copyright holders' group wanted subscriber details from ISP Telefonica. The ECJ isn't obliged to follow an Advocate-General's advice, but does so in over three-quarters of cases.

In most European countries, copyright infringement is only a criminal offense when conducted on a commercial scale (for profit).

TODO

TODO

  • EU Proposing to Make P2P Piracy A Criminal Offense
    "The EP excluded patent rights from the scope of the directive, and decided that criminal sanctions should apply only to infringements deliberately carried out to obtain a commercial advantage. Piracy committed by private users for personal, non-profit purposes is therefore also excluded."

France

On June 12 2007 the Société des Producteurs de Phonogrammes en France (SPPF - http://www.sppf.com/ ), an entity that represents the legal interests and collects copyright revenue in behalf of independent French audio creations, have publicly announced that they had launched a civil action on the Paris Court of First Instance requesting a court order to terminate the distribution and function of Morpheus (published by Streamcast), Azureus and demanding compensation for monetary losses. In 18 September 2007 a similar action was made against Shareaza and in 20 December 2007 the SPPF announced a new action this time against Limewire. All of this legal actions seem to have as a base an amendment done to the national copyright law that stipulates that civil action can taken against software creators/publishers that do not take steps in preventing users from accessing illegal content.

TODO

TODO
Also mention ALPA (Association against audiovisual piracy)

Norway

The Norway’s Personal Data Act (PDA) makes it mandatory for ISPs in the country to delete all IP address logs on their customers more than three weeks old as it is considered personal information. This is a huge step forward in personal data protection laws but it also will make the work of "pirate-hunters" more dificult. The Simonsen law firm, is an example since it is known by the lawyer Espen Tøndel, figure head on this matters, and for having since 2006 (now terminated), a temporary license from Norway’s data protection office to monitor suspected IP addresses without legal supervision.

USA

Under US law "the Betamax decision" (Sony Corp. of America v. Universal City Studios, Inc.), case holds that copying "technologies" are not inherently illegal, if substantial non-infringing use can be made of them. This decision, predating the widespread use of the Internet applies to most data networks, including peer-to-peer networks, since legal distribution of some files can be performed. These non-infringing uses include sending open source software, creative commons works and works in the public domain. Other jurisdictions tend to view the situation in somewhat similar ways.

The US is also a signatory of the WIPO treaties, treaties that were partially responsible for the creation and adoption of the Digital Millennium Copyright Act (DMCA).

As stated in US Copyright Law, one must be keep in mind the provisions for fair use, licensing, copyright misuses and the statute of limitations.

MGM v. Grokster

TODO

TODO
http://www.eff.org/IP/P2P/MGM_v_Grokster/

RIAA

The RIAA and the labels took an aggressive stance as soon as online music file sharing became popular. They won an early victory in 2001 by shutting down the seminal music-sharing service Napster.

The site was an easy target because Napster physically maintained the computer servers where illegal music files, typically in high-fidelity, compressed, download-friendly MP3 format, were stored. [With P2P networks, the files are stored on individual user computers; special software lets consumers "see" the files and download them onto their own hard drives.]
—Daphne Eviatar, "Record industry, music fans out of tune," The Recorder, August 20, 2003

The Recording Industry Association of America (RIAA) ( http://www.riaa.com/ ) is the trade group that represents the U.S. recording industry. The RIAA receives funding from the four of the major music groups EMI, Warner, Sony BMG and Universal and hundreds of small independent labels.

Motion Picture Association of America

TODO

TODO
Warner Bros. to Try File Sharing in Germany

MPAA sues newsgroup, P2P search sites By John Borland, Published on ZDNet News, February 23, 2006

Canada

Canada has a levy on blank audio recording media, created on March 19, 1998, by the adoption of the new federal copyright legislation. Canada introduced this levy regarding the private copying of sound recordings, other states that share a similar copyright regime include most of the G-7 and European Union members. In depth information regarding the levy may be found in the Canadian copyright levy on blank audio recording media FAQ ( http://neil.eton.ca/copylevy.shtml ).

With borders and close ties to it neighbor, Canada as historically been less prone to serve corporations interests and has a policy that contrasts in its social aspects with any other country in the American Continent. The reality is that Canada has been highly influenced and even pressured (economically and politically) by its strongest neighbor, the USA, to comply with its legal, social and economic evolution. In recent time (November 2007) the government of Canada has attempted to push for the adoption of a DMCA-modeled copyright law, so to to comply with the WIPO treaties the country signed in 1997 in a similar move to the USA, this has resulted in a popular outcry against the legislation and will probably result in it's alteration. The visibility of this last attempt was due to efforts of Dr. Michael Geist, a law professor at the University of Ottawa considered an expert in copyright and the Internet, that was afraid that law would copy the worst aspects of the U.S. Digital Millennium Copyright Act.

Darknets vs Brightnets

Due the refusal to legislate in accordance with the public needs and wants. By adding extensions of copyrights (US, UK) and by actively promoting the promulgating laws similar to the DMCA in other countries a monoculture is created where a virtual monopoly on cultural goods is created, generating something of a cultural imperialism.

This reality promotes the population to move its support for transparent distribution systems (brightnets) to more closed system (darknets), that will increasingly depend on social connections to get into, like the hold speakeasy bars that popped up during the prohibition, legislating against the people once more will prove to be a failure.

A P2P brightnet for content distribution,where no one breaks the law, so no one need hide in the dark, can only be feasible if built around owner-free media/system or by being as heavily controlled, owned media/system, as the old centralized system. This probably around a centralized entity that should guarantee control and manage the content, maybe even requiring the use of some DRM sheme, an overall failed system.

This types of networks were already tested and failed, since content is also information, the need for privacy or lack of it on an open system will always create generate a more layered system that will ultimately degenerate in a darknet to survive legal actions.

Shadow Play

Some actions are not intended to see the light of day, this section is dedicated to bring out some of the subjects/actions in an attempt to help the reader to fully appreciate some of the less publicized information that has some kind of baring on the evolution of P2P.

OS

Since P2P (and P2P related technologies) started to pop up, the security of the user OS started to be placed above user freedom, probably due to most people being, technologically challenged, some organization (or groups with invested interest) are still free to think for the masses in place of just try to push the information out. There is an organized attempt to hide this fact from the public, it is funny to see that after this security enhancements are done they tend to be hidden so not to cause any brain damage or confusion (read foment rebellion) into users.

Well not all is lost, some people can't seem to be made to comply with this state of things and some information can be found and actions reversed.

about MS Windows
  • TCPIP.SYS - [fix],[info] for Windows XP.

ISPs

TODO

TODO
Add missing information about ISPs interactions with P2P technologies.

Internet providers (ISPs) aren't very pleased with P2P technologies due to the load the bring into to their networks, although they sell their Internet connections as unlimited usage, if people actually take on their offer, ISPs will eventually be unable to cope with the demand at the same price/profit level. This has made clients increasingly worried over some ISPs actions, from traffic shaping (protocol/packet prioritization) to traffic tampering.

San Francisco-based branch of the Electronic Frontier Foundation (EFF) a digital rights group have successfully verified that this type of efforts by Internet providers to disrupt some uses of their services and evidences seem to indicate that it is an increasing trend other as reports have reached the EFF and verified by an investigation by The Associated Press.

EFF Releases Report Interference with Internet Traffic on ComCast ( http://www.eff.org/wp/packet-forgery-isps-report-comcast-affair ), other information is available about this subject on the EFF site.

Traffic shaping

TODO

TODO
Add missing information.

Traffic tampering

Traffic tampering is more worrying then Traffic shaping and harder to be noticed or verified. It can also be defined as spoofing, consisting in the injection of adulterated/fake information into communication by gaming a given protocol. It like the post office taking the identity of one of your friends and sending mail to you in it's name.

Pcapdiff ( http://www.eff.org/testyourisp/pcapdiff/ ) is a free Python tool developed by the EFF to compare two packet captures and identify potentially forged, dropped, or mangled packets.

Localization/Acceleration (Cache)

Network neutrality

Network Neutrality deals with the need to prevent ISPs from double dipping on charges/fees for both the clients paying for their broadband connections and WEB sites/Organizations having also to pay for prioritization of traffic according to origination and destination or protocol used.

P2P Networks and Protocols

This chapter will try to provide an overview of what is Peer-to-Peer, it's historical evolution, technologies and uses.

P2P and the Internet: A "bit" of History

P2P is not a new technology, P2P is almost as old as the Internet it started with the email and the next generation were called "metacomputing" or classed as "middleware". The concept of it took the Internet by storm only because of a general decentralization of the P2P protocols, that not only gave power to the simple user but also made possible savings on information distribution resources, a very different approach from the old centralization concept.

This can be a problem for security or control of that shared information, or in other words a "democratization" of the information (the well known use of P2P for downloading copies of MP3s, programs, and even movies from file sharing networks), and due to it's decentralizing nature the traffic patterns, are hard to predict, so, providing infrastructures to support it is a major problem most ISPs are now aware.

P2P has also been heralded as the solution to index the deep Web since most implantations of P2P technologies are based and oriented to wired networks running TCP/IP. Some are even being transfered to wireless uses (sensors, phones and robotic applications), you probably have already heard of some military implementation of intelligent mines or robotic insect hordes.

Ultimately what made P2P popular was that it created a leveled play field, due to the easy access to computers and networking infrastructures we have today in most parts of the world. We are free to easily become producers, replacing the old centralized models where most of the population remained as consumers dependent on a single entity (monopoly, brand, visibility) for the distribution or creation of services or digital goods. This shift will undoubtedly reduce the costs of production and distribution in general as the price of services and products that can be digital transfered, the cost is now also becoming evident the quality will also be downgraded until a new system for classification emerges, this can be seen today in relation to the written media after the Internet impact.

FIDO net

TODO

TODO
Add missing information

eMail

Electronic mail (often abbreviated as e-mail or email), started as a centralized service for creating, transmitting, or storing primarily text-based human communications with digital communications systems with the first standardization efforts resulting in the adoption of the Simple Mail Transfer Protocol (SMTP), first published as Internet Standard 10 (RFC 821) in 1982.

Modern e-mail systems are based on a store-and-forward model in which e-mail computer server systems, accept, forward, or store messages on behalf of users, who only connect to the e-mail infrastructure with their personal computer or other network-enabled device for the duration of message transmission or retrieval to or from their designated server.

Originally, e-mail consisted only of text messages composed in the ASCII character set, today, virtually any media format can be sent today, including attachments of audio and video clips.

TODO

TODO
Complete

Peer2Mail

Peer to Mail ( http://www.peer2mail.com/ ) is a FreeWare application for Windows that lets you store and share files on any web-mail account, you can use Web-mail providers such as Gmail (Google Mail), Walla!, Yahoo and others, it will split the shared files into segments that will be compressed and encrypted and then sends the file segments one by one to an account you have administration access. To Download the files the process is reversed.

Security

The ecryptation was broken in Peer2Mail v1.4 (prior versions are also affected) - Peer2Mail Encrypt PassDumper Exploit.

Usenet

Usenet is the original peer to peer file-sharing application. It was originally developed to make use of UUCP (Unix to Unix Copy) to synchronize two computers' message queues. Usenet stores each article in an individual file and each newsgroup in its own directory. Synchronizing two peers is as simple as synchronizing selected directories in two disparate filesystems.

Usenet was created with the assumption that everyone would receive, store and forward the same news. This assumption greatly simplified development to the point where a peer was able to connect to any other peer in order to get news. The fragmentation of Usenet into myriad newsgroups allowed it to scale while preserving its basic architecture. 'Every node stores all news' became 'every node stores all news in newsgroups it subscribes to'.

Of all other peer-to-peer protocols, Usenet is closest to Freenet since all nodes are absolutely equal and global maps of the network are not kept by any subset of nodes. Unlike Freenet, which works by recursive pulling of a requested object along a linear chain of peers, Usenet works by recursive pushing of all news to their immediate neighbors into a tree.


FTP

The File Transfer Protocol (FTP), can be seen as a primordial P2P protocol. Even if it depends on a client/server structure the limitation is only on the type of application (client/server) one run since the roles are flexible.

File eXchange Protocol (FXP)

TODO

TODO
Add missing information

Other Software Implementations

JXTA

JXTA™ technology, created by Sun™ ( http://www.jxta.org ), is a set of open protocols that allow any connected device on the network ranging from cell phones and wireless PDAs to PCs and servers to communicate and collaborate in a P2P manner. JXTA peers create a virtual network where any peer may interact with other and their resources directly even when some of the peers and resources are behind firewalls and NATs or are on different network transports. The project goals are interoperability across different peer-to-peer systems and communities, platform independence, multiple/diverse languages, systems, and networks, and ubiquity: every device with a digital heartbeat. The technology is is licensed using the Apache Software License (similar to the BSD license).

Most of the implementation is done in Java (with some minor examples in C).

iFolder

iFolder ( http://www.ifolder.com ) is an still in early development open source application, developed by Novell, Inc., intended to allow cross-platform file sharing across computer networks by using the Mono/.Net framework.

iFolder operates on the concept of shared folders, where a folder is marked as shared and the contents of the folder are then synchronized to other computers over a network, either directly between computers in a peer-to-peer fashion or through a server. This is intended to allow a single user to synchronize their files between different computers (for example between a work computer and a home computer) or share files with other users (for example a group of people who are collaborating on a project).

The core of the iFolder is actually a project called Simias. It is Simias which actually monitors files for changes, synchronizes these changes and controls the access permissions on folders. The actual iFolder clients (including a graphical desktop client and a web client) are developed as separate programs that communicate with the Simias back-end.

The iFolder client runs in two operating modes, enterprise sharing (with a server) and workgroup sharing (peer-to-peer, or without a server).

Freenet

The Freenet Project ( http://freenetproject.org ), designed to allow the free exchange of information over the Internet without fear of censorship, or reprisal. To achieve this Freenet makes it very difficult for adversaries to reveal the identity, either of the person publishing, or downloading content. The Freenet project started in 1999, released Freenet 0.1 in March 2000, and has been under active development ever since.

Freenet is unique in that it handles the storage of content, meaning that if necessary users can upload content to Freenet and then disconnect. We've discovered that this is a key requirement for many Freenet users. Once uploaded, content is mirrored and moved around the Freenet network, making it very difficult to trace, or to destroy. Content will remain in Freenet for as long as people are retrieving it, although Freenet makes no guarantee that content will be stored indefinitely.

The journey towards Freenet 0.7 began in 2005 with the realization that some of Freenet's most vulnerable users needed to hide the fact that they were using Freenet, not just what they were doing with it. The result of this realization was a ground-up redesign and rewrite of Freenet, adding a "darknet" capability, allowing users to limit who their Freenet software would communicate with to trusted friends. This would make it far more difficult for a third-party to determine who is using Freenet.

Freenet 0.7 also embodies significant improvements to almost every other aspect of Freenet, including efficiency, security, and usability. Freenet is available for Windows, Linux, and OSX. It can be downloaded from:

Software Implementations

All software is available on The Freenet Project page.

Frost an application for Freenet that provides usenet-like message boards and file uploading/downloading/sharing functionalities. It should get installed with Freenet 0.7 automatically if you used the standard Freenet installers.

jSite is a graphical application that you can use to create, insert and manage your own Freenet sites. It was written in Java by Bombe.

Thaw is a filesharing utility and upload/download manager. It is used as a graphical interface for Freenet filesharing.

KaZaa

KaZaa ( http://www.kazaa.com )

TODO

TODO
Add missing info

Software (FastTrack) Implementations

  • Kazaa
  • Kazaa Lite
  • Diet Kaza
  • giFT
  • Grokster
  • iMesh

GNUnet

GNUnet ( http://gnunet.org/ ), was started in late 2001, as a framework for secure peer-to-peer networking that does not use any centralized or otherwise trusted services. A service implemented on top of the networking layer allows anonymous censorship-resistant file-sharing. GNUnet uses a simple, excess-based economic model to allocate resources. Peers in GNUnet monitor each others behavior with respect to resource usage; peers that contribute to the network are rewarded with better service.

GNUnet is part of the GNU project. Our official GNU website can be found at ( http://www.gnu.org/software/gnunet/ ), there is only an existing client, OpenSource, GPL, written in C, that shares the same name as the network. GNUnet can be downloaded from this site or the GNU mirrors.

MANOLITO (MP2P)

MANOLITO or MP2P is the internal protocol name for the proprietary peer-to-peer file sharing network developed by Pablo Soto. MANOLITO uses UDP connections on port 41170 for search routing and is based on Gnutella. In addition file transfers use a proprietary protocol based on TCP.

MANOLITO hosts obtain an entry into the network by contacting an HTTP network gateway, which returns a list of approximately one-hundred MANOLITO hosts. Hosts can also be manually connected to. Servents maintain contact with a fixed number of peers (depending on the Internet connection) that are sent search queries and results.

Software Implementations

Mute File Sharing

MUTE File Sharing ( http://mute-net.sourceforge.net ) is an anonymous, decentralized search-and-download file sharing system. MUTE uses algorithms inspired by ant behavior to route all messages, include file transfers, through a mesh network of neighbor connections.
Author Jason Rohrer - jcr13 (at) cornell (dot) edu Created using C++ and Crypto++ Library, support is provided for multiple OSs there is a frontend for Windows created with MFC, Mute is Open Source and released under the GPL License.

iMesh

iMesh ( http://www.imesh.com ), a free but closed source P2P network (IM2Net) operating on ports 80, 443 and 1863, for Widows. iMesh is owned by an American company iMesh, Inc. and maintains a development center in Israel. An agreement with the MPAA had also been reached. Video files more than 50mb in size and 15 minutes in length can no longer be shared on the iMesh network, guaranteeing feature-length releases cannot be transferred across the network.

BitCoop

BitCoop (http://bitcoop.sourceforge.net/) created by Philippe Marchesseault is a console (Text Based) peer to peer backup system that enables the storage of files on remote computers with cryto and compression support. The size of files depends on the quantity you wish to share with the other peers. It is intended for server farms that wish to backup data among themselves. Supports various Operating Systems including Windows, Linux and Mac OS X, it's implemented in Java (Open Source under the GPL).

CSpace

CSpace (http://cspace.in/) provides a platform for secure, decentralized, user-to-user communication over the Internet. The driving idea behind the CSpace platform is to provide a connect(user,service) primitive, similar to the sockets API connect(ip,port). Applications built on top of CSpace can simply invoke connect(user,service) to establish a connection. The CSpace platform will take care of locating the user and creating a secure, nat/firewall friendly connection. Thus the application developers are relieved of the burden of connection establishment, and can focus on the application-level logic! CSpace is developed in Python. It uses OpenSSL for crypto, and Qt for the GUI. CSpace is licensed under the GPL.

I2P

I2P is a generic anonymous and secure peer to peer communication layer. It is a network that sits on top of another network (in this case, it sits on top of the Internet). It is responsible for delivering a message anonymously and securely to another location.

p300

p300 (http://p300.eu/) is a P2P application created in Java with the intention of provide a just-works-single-download solution for a multitude of Operating Systems without the need to deal with user accounts or specific protocol and security configurations (ie. samba). Another aspect is that p300 is primarily meant to be used in LANs or over VPNs. This application is OpenSource released under the GNU GPL v3.

Netsukuku

Netsukuku (http://netsukuku.freaknet.org/) is a p2p (in mesh) network system, originally developed by FreakNet MediaLab, that can generate and sustains itself autonomously. It is designed to handle an unlimited number of nodes with minimal CPU and memory resources. It seems that it can be easily used to build a worldwide distributed, anonymous decentralized network, above the Internet, without the support of any servers, ISPs or authority controls. Netsukuku replaces the network level 3 of the OSI model with another routing protocol. An open source Python implementation was finished in October 2007.

Netsukuku is based on a very simple idea: extending the concept of Wi-Fi mesh networks to a global scale, although not necessarily using that medium. With the use of specialized routing protocols and algorithms, the current Wi-Fi technologies can be exploited to allow the formation of a global P2P wireless network, where every peer (node) is connected to its neighbors.

Other media will be equally functional to interconnect nodes, as the interaction is independent of that which it is transmitted through, but it is believed that Wi-Fi will be the most practical for ordinary users to take advantage of. Once greater proliferation has been achieved, it may become common to see some nodes establishing high-speed land-line connections between each other in the interest of increasing network bandwidth for connections over it and lowering latencies.

Other

  • Thunderbolt (aka Thunder) (http://www.xunlei.com/) was created by Xunlei Network Technology Ltd. Thunderbolt proprietary P2P network supports Multi-protocol P2P resources (supports BitTorrent, eDonkey, Kad, and FTP) and also HTTP downloads (a download accelerator) as it web caches to aid in accelerating of downloads. It is mainly used in the Mainland China, recently an English translation has been released. Of particularly interest is that on January 5, 2007, Google acquired a 4% stake on the company.
  • XNap ( http://xnap.sourceforge.net/ ) OpenSource (GPL), written in Java. The client features a modern Swing based user interface and console support. Able to work in several P2P Networks OpenNap, Gnutella, Overnet and OpenFT (and other networks supported by giFT like FastTrack). It also supports ICQ and IRC, viewers for MP3 tags, images, PDF, ZIP files and Text-To-Speech.
  • Filetopia ( http://www.filetopia.com ), a free but closed source server/clients P2P application for Windows. It includes, instant messaging, chat and file sharing system with a search engine, online friends list and message boards. It also supports the use of a bouncer (open source,Java) as an anonymity layer, that enables indirect connections.
  • Omemo ( http://www.omemo.com ), a free and open source (Visual Basic) P2P application under the GPL, from Pablo Soto creator of the MANOLITO protocol and Blubster. Omemo takes a different approach and uses a ring-shaped DHT based on Chord. It is meant to support key based routing while keeping query source obscurity due to randomization. Available for Windows.
  • Napster network
    • WinMX
    • Napigator
    • FileNavigator
  • WPNP network
    • WinMX
  • other networks
    • MojoNation
    • Carracho
    • Hotwire
    • Chord peer-to-peer lookup service|Chord
    • Dexter
    • Swarmcast
    • Alpine program|Alpine
    • Scribe
    • Groove
    • Squid_Soft|Squid
    • Akamai
    • Evernet
    • Overnet network
    • Audiogalaxy network
    • SongSpy network
    • The Circle
    • OpenFT
  • Acquisition
  • Cabos
  • Swapper
  • SoulSeek

Building a P2P System

Developer

Selecting the Programming Language

TODO

TODO

RAD (prototype) vs resource use optimization

Selecting the License

Open vs Closed Source

How can P2P generate revenue

TODO

TODO

Pushing content
Marketing(ads,...)
Monitoring and Control (know what your customers are searching for,control what they see)
Premium User list

Donations

Donations is a model that is open for all types of software, open source or closed source, the objective is to let users to freely contribute to a project they like, most probably you will not get a fixed income of this revenue source but it may not be the only way you use to get a payoff or even profit from the project. If you use this method attempt to be clear on how the donations will be used (to further development, etc...), immediate needs you may have (hosting, services and equipment to develop and test ).

Depending were your project is located and how it is structured and focused there are ways to maximize the revenue, a common way to run a donation based project is to set up a non-profit corporation (you can even issue receipts for tax purposes).

Most people my not like it but providing a donors/supporters page list does incentive participation, and may event show users that small amount are a help, if you decide to list donors do offer an option to be excluded from it.

For an in depth look on the most of donation problems, you may read the article, When Do Users Donate? Experiments with Donationware: Ethical Software, Work Equalization, Temporary Licenses, Collective Bargaining, and Microdonations ( http://www.donationcoder.com/Articles/One/index.html ). You may even try to join their project or support similar ones, like for instance microPledge ( http://micropledge.com/ ) or even create a similar offering around your own product.

Examples:

Money
TODO

TODO
micro-payment - PayPal, Amazon Gift Certificate.

Hardware

Is is also common programmers or projects to accept Hardware donations by request or to incentive the project to add support to exclusive features or specific setups. If you adopt to support this feature do provide and maintain a list of hardware that is wanted and how it would help you.

Shareware / for Pay

This is the most problematic setup due to the legal hot-waters it can get you into and the formalisms and obligations that you need to comply to.

Also restricting the participation on the network will be intentionally reducing its usefulness, this is why most P2P services are free or at least support some level of free access.

Variations

There are several models that are variations of the simple donation/pay model, they give specific goals to the users or to the project in relation to the values collected.

Ransom
Put features or the code of the application up for a ransom payment, if people do contribute and fill that goal you accept to comply with your proposal (ie: opening the source code. fix or implement a feature).
Ransom has been shown to works in practice, it is used in several open source initiatives and even writers have tested this scheme, an example of the later is the test the writer Lawrence Watt-Evans has done on several titles, all successful reached his monetary and production goal.
Pay for features
In a variation of the ransom model, in this particular case you should be extra careful to inform users on what they are paying for, and the legality of what you are providing for that payment. Extra feature may be better services or even a better quality for the existing ones.
Paid support
Paid support include providing users access to a paid prioritized service for technical support, this is very commonly use on Open Source projects. You should restrain yourself for over complicating the software so you can profit from it, as the users will be the network. One solution is to provide a default dumb down version for public consumption and enable a very high degree of tweaking of the software, protocol or network and then attempt to profit for it.

License new technology

In case you came up with a new technology or a way new interconnect existing ones that to can be made into revenue source.

TODO

TODO
Complete & Examples

Venture Capital

TODO

TODO
add know list of VC firms that support P2P development

Level of Control

The Peer

The Peer and the user running it, is the corner stone of all P2P systems, without peers you will not be able to create the Network, this seems obvious but it is very common to disregard the users needs and focus on the final objective the Network itself, kind of looking to a florets and not seeing the trees.

TODO

TODO
...personal computer...

Building communities

There is a benefit in incorporating the social element into the software. Enabling people to come together not only to share content or services but around a common goal were cooperation for an optimal state should remain the final objective.

Communities built around the shared goal of a functional peer to peer network will help not only to improve the system but to establish a relation of trust that is fallowed by the emergence of reputation latter. Across the participants and in the network and software itself, this personalization will ultimately enable to extend this trust and reputation to the multitude of relations that are possible in a peer to peer network.

TODO

TODO
Extend and clarify, also link to security and leeching behaviors

A user oriented GUI

As you start to project your P2P application the GUI is what the users will have to interact with to use your creation, you should attempt to define not only what OS you will support but within what framework you can design the application to be used from a WEB browser or select a portable framework so you can port it to other systems.

Aside form the technical decisions the functionally you offer should also be considered, the best approach is to be consistent and offer similar options to existing implementations, other applications or even how it is normally done on the environment/OS you are using. There are several guidelines you may opt to fallow for instance Apple provides a guideline for OSX ( http://developer.apple.com/documentation/UserExperience/Conceptual/OSXHIGuidelines/ ).

Overwhelming a user with options is always a bad solution and will only be enticing to highly experienced users, even if it done based on what you like, you should keep in mind that you aren't creating it for your own use.

TODO

TODO
Complete

Topology

The topology of a P2P network can be very diverse, it depends on the medium it is run (Hardware), the size of the network (LAN,WAN) or even on the software/protocol that can impose or enable a specific network organization to emerge.

Below we can see the most common used topologies (there can be mixed topologies or even layered on the same network), most if not all peer to peer networks are classified as overlay networks, since they are created over an already existing network with its own topology.

Ring topology - Mesh topology - Star topology - Total Mesh or Full Mesh - Line topology - Tree topology - Bus topology - Hybrid topology

Since most P2P networks are also a type of overlay networks, the resulting topology of a P2P system depends also on the protocol, the infrastructure (medium/base network) or by the interaction of peers. This "variables" add more layers to the complexity of the normal networks, were the basic characteristics are bandwidth, latency and robustness, making most P2P networks into self-organising overlay networks.

When performing studies of P2P networks the resulting topology (structural properties) is of primary importance, there are several papers on the optimization or characteristics of P2P networks.

The paper Effective networks for real-time distributed processing ( http://arxiv.org/abs/physics/0612134 ) by Gonzalo Travieso and Luciano da Fontoura Costa, seems to indicate that uniformly random interconnectivity scheme, is specific Erdős-Rényi (ER) random network model with fixed number of edges, as being largely more efficient than the scale-free counterpart, the Barabási-Albert(BA) scale-free model.

Testing/Debugging the system

TODO

TODO
Add missing info

Virtual Machines

TODO

TODO
Add missing info

Ethernet crossover cable network

A simple, two-computer network between two computers can be created using an Ethernet crossover cable. As in any other TCP/IP network, each computer needs to be assigned a unique IP address. In this network configuration, a default gateway is not used and can be unspecified.

Example crossover network configuration using arbitrary unique IP addresses
Machine 1 Machine 2
IP address 192.168.0.1 192.168.0.2
Subnet Mask 255.255.255.0

Information on how to deploy such network on Windows systems can be found here

Software Tools

TODO

TODO
Add missing info

Self-Organizing Systems (SOS)

P2P systems fall by default in this category, taking in account that most try to remove the centralization or Control and Command (C&C) structure that exists in most centralized systems/networks.

The study of self-organizing systems is relatively new, and it applies to a huge variety of systems or structures from organizations to natural occurring events). Mostly done with math and depending on models and simulations, its accuracy depends on the complexity of the structure, number of intervenients and initial options.

This book will not cover this aspect of the P2P systems in great detail but some references and structural characteristics do run in parallel with the SOS concepts. For more information on SOS check out the Self-Organizing Systems (SOS) FAQ for USENET Newsgroup comp.theory.self-org-sys (http://www.calresco.org/sos/sosfaq.htm).

Synergy
One of the emerging characteristic of SOS is synergy, where members aggregate around a common goal each working to the benefit of all. This is also present in most P2P systems were peers work to optimally share resources and improve the network.

Swarm
The effect generated of participants in a SOS, to aggregate and do complementary work so to share knowledge and behavior in a way it improves coordination. This is observable in the natural world on flocks of birds or social insects. We will covert this aspect of P2P later on, but keep it in mind that it related to SOS.

Bootstrap

Most P2P systems don't have (or need) a central server but need to know a entry point into the network, this is what is called bootstrapping the P2P application, it deals with the connectivity of peers, to be able find and connect other P2P peers (the network) even without having a concrete idea who and what is where...

TODO

TODO
Complete... Avahi ( http://avahi.org/ )

Hybrid vs real-Peer systems

One of the main objectives of the P2P system is to make sure no single part of it critical to the collective objective. By introducing any type of centralization to a peer-to-peer Network one is creating points of failure, as some Peers will be more than others this can even lead to security or stability problems, as with the old server-client model, were a single user could crash the server and deny its use to others.

TODO

TODO
Complete

Availability

Due to the instability of a P2P network, were nodes are always joining and leaving, some efforts must be made to guarantee not only that the network is available and enabling new peers to join, but that the resources shared continue to be recognized or at least indexed while temporarily not available.

TODO

TODO
Complete

Integrity

TODO

TODO
Complete

Due to the open nature of peer-to-peer networks, most are under constant attack by people with a variety of motives. Most attacks can be defeated or controlled by careful design of the peer-to-peer network and through the use of encryption. P2P network defense is in fact closely related to the "Byzantine Generals Problem". However, almost any network will fail when the majority of the peers are trying to damage it, and many protocols may be rendered impotent by far fewer numbers.

Clustering

Computer science defines a computer cluster in general terms as a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer.

The components of a cluster are commonly, connected to each other through fast networks and usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

As we have seen before, this concept if applied to distributed networks or WANs (in place of LANs), generates distributed computations, grids and other systems. All of those applications are part of the P2P concept.

We loosely define clusters as a physical, social or even economical/statistical event. That is is defined by the aggregation of entities due to sharing a property in common, that property may be a shared purpose or a characteristic, or any other communality.

As we look at the topologies generated by P2P networks we can observe that most protocols generate some kind of clustering around networks structures, resources and they can even emerge as a result of the status of network conditions. Clustering is then an unsupervised learning problem, an automatic emerging event that results on the creation of ad hoc collection of unlabeled objects (data/items or events). For more information on clusters you may check A Tutorial on Clustering Algorithms ( http://home.dei.polimi.it//matteucc/Clustering/tutorial_html/ ).

A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters using the same set of characteristics.

This concept is very important on not only the form of P2P networks but has also implications on the social structure/relations that can be build upon the use of P2P applications.

TODO

TODO

Efficient Algorithms for K-Means Clustering Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu

Flashcrowds

Flashcrowd is a behavioral model in that participants will tend to aggregate/crowd around an event, in P2P terms this can be a scarce resource, for instance as we will see later BitTorrent promotes big files over small ones and new over old, this is a results of a connection to the network based on single items, the speeds on BitTorrent does depend uppermost on generating flashcrowds around files (more peers, more speed that will snow ball in more seeders).

This can also result of DoS (Denial of Service) or flood, for instance if a P2P system is poorly designed attempting to connect to the network by a significant number of peers may disturb the bootstrap method used.

You should educate users, more connections doesn't equate to more speed, at most it will result in more responses to queries but that may depend on how the protocol is structured, but enabling them will cost bandwidth. On the other side more peers will result on more resources to be shared that will also result on overlapping of shares and so better speeds, as a P2P network gets bigger the better it can provide its users.

Optimizations

There are simple optimization that should be done in any P2P Protocol that could bring a benefit to both peers and the network in general based on system metrics or profile like IP (ISP or range), content, physical location, share history, ratios, searches and many other variables.

Most of the logic/characteristics of P2P networks and topologies (in a WAN environment) will result in aggregation of peers and so this clusters will share the same properties of Distributed Behavioral Models, like Flocks, Herds and Schools this results in a easier to study environment and to establish correlation about the peers relations and extrapolate ways to improve efficiency.

  • Alignment
To get information/characteristics from the local system/environment in order to optimize the peer "location" on the network by selecting and optimize the separation e cohesion functions to improve the local neighborhood.
  • Separation
To implement a way to avoid crowding with other peers based on unwanted alignment.
  • Cohesion
To select peers based on their own alignment.

P2P Networks Traffic

The P2P traffic detected on the Internet due to the nature of the protocols and topology used some times can only be done by estimation based on the perceived use (users on-line, number of downloads of a given implementation) or by doing point checks on the networks itself. It is even possible to access the information if the implementation on the protocol or application was done with this objective in mind, several implementations of Gnutella a for example have that option and Bearshare did even reports some of the users system parameters, like type of firewall etc...

Examples of services that provide such traffic information over P2P networks are for instance Cachelogic ( http://www.cachelogic.com/research/2005_slide16.php# ).

TODO

TODO
Complete...

Communication

One of your most important considerations is how you project the way peers will communicate, even if we discard the use of central servers like SuperNodes and multiple distributed clients as Peer/Nodes there will be several questions to consider:

  • Will communication need to go across firewalls and proxy servers?
  • Is the network transmission speed important? Can it be configured by the user?
  • Will communications be synchronous or asynchronous?
  • Will it need/use only a single port? what port should we use?
  • What resources will you need to support? Is there size limits? Should compression be used ?
  • Will the data have to be encrypted?
  • etc...

One must carefully consider your project's specific goals and requirements, this will help you evaluate the use of toolkits and frameworks. Try not to reinvent the wheel if you can't came up with a better solution or have the time,capacity or disposition to. You can also use open standards (ie: use the HTTP protocol) to, but are also free to explore other approaches.

Security Considerations

By using a P2P system users will broadcast their existence to others, this in contrast to a centralized service were they may interact with others but their anonymity can be protected.

Violating the security of a network can be a crime, for instance the 2008 case of the research project from University of Colorado and University of Washington, the researchers engaged in the motorization of users across the Tor anonymous proxy network and could have faced legal risks for the snooping.

This vulnerability of distributed communications can result in identity attacks (e.g. tracking down the users of the network and harassing or legally attacking them), DoS, Spamming, eavedropping and other threats or abuses. All this actions are generally targeted to a single user and some may even be automated, there are several actions the creator can take to make it more difficult but ultimately they can't be stopped and should be expected and dealt with, one of the first steps is to provide information to the user so they can locally implement hardware or software actions and even a social behavior to counteract this abuse.

DoS (denial of service), Spamming

Since each user is a "server" they are also prone to denial of service attacks (attacks that may, if optimized, make the network run very slowly or break completely), the result may depend on the attacker resources and how the decentralized is the P2P protocol on the other hand to be the target of spamm (e.g. sending unsolicited information across the network- not necessarily as a denial of service attack) does only depend how visible and contactable you are, if for instance other users can send messages to you. Most P2P applications support some kind of chat system and this type of abuse is very hold on such system, they can address the problem but will complete solve it, what can lead to social engineering attacks were users can be lead to perform actions that will compromise them or their system, on this last point only giving information to users that enables them to be aware of the risk will work.

ARP Attack
TODO

TODO
Complete this section

Eavesdropping

Hardware traffic control

TODO

TODO

ISPs and Net Neutrality, filtering (network operators may attempt to prevent peer-to-peer network data from being carried)

Software traffic control

Since most Network applications and in specific P2P tools are prone to be a source of security problems (they will bypass some of the default security measures from inside), when using or creating such a tool one must take care on granting the possibility or configuring the system to be as safe as possible.

tools for security

There are several tools and options that can be used for this effect, be it configuring a firewall, adding a IP blocker or making sure some restrictions are turned on by default as you deploy your application.

  • PeerGuardian 2 ( http://phoenixlabs.org/pg2/ ) a OpenSource tool produced by Phoenix Labs’, consisting in a IP blocker for Windows OS that supports multiple lists, list editing, automatic updates, and blocking all of IPv4 (TCP, UDP, ICMP, etc)
  • PeerGuardian Lite ( http://phoenixlabs.org/pglite/ ) a version of the PeerGuardian 2 that is aimed at having a low system footprint.

blocklists

Blocklists are text files containing the IP addresses of organizations opposed and actively working against file-sharing (such as the RIAA), any enterprise that mines the networks or attempts to use resources without participating in the actual sharing of files. It is basically a spam filter like the ones that exist for eMail systems.

TODO

TODO

Add examples and resources

Firewalls

As computers attempt to be more secure for the user, todays OS will provide by default some form of external communication restriction that will permit the user to define different levels of trust, this is called a Firewall. A Firewall may have hardware or software implementation and is configured to permit, deny, or proxy data through a computer network. Most recent OSs will come with a software implementation running, since a connection to the Internet are becoming common and the lack or even the default configuration of the Firewall can cause some difficulties to the use of P2P applications.

TODO

TODO
Complete

on Windows

Microsoft in the last OS releases has taken the chance to provide some security by default to it's users, by including and enabling a simple firewall solutions.

Routers

NAT

NAT (network address translation)

TODO

TODO
ICS (Internet Connection Sharing) in Windows 2000+

NAT Traversal

Users behind NAT should be able to connect with each other, there are some solutions available that try to enable it.

STUN

STUN (Simple Traversal of UDP over NATs) is a network protocol which helps many types of software and hardware receive UDP data properly through home broadband routers that use NAT ).

Quoted from its standard document, RFC 3489:

"Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (STUN) is a lightweight protocol that allows applications to discover the presence and types of NATs and firewalls between them and the public Internet.
It also provides the ability for applications to determine the public IP address allocated to them by the NAT.
"STUN works with many existing NATs, and does not require any special behavior from them. As a result, it allows a wide variety of applications to work through existing NAT infrastructure."

As STUN RFC states this protocol is not a cure-all for the problems associated with NAT but it is particularly helpful for getting voice over IP working through home routers. VoIP signaling protocols like SIP use UDP packets for the transfer of sound data over the Internet, but these UDP packets often have trouble getting through NATs in home routers.

STUN is a client-server protocol. A VoIP phone or software package may include a STUN client, which will send a request to a STUN server. The server then reports back to the STUN client what the public IP address of the NAT router is, and what port was opened by the NAT to allow incoming traffic back in to the network.

The response also allows the STUN client to determine what type of NAT is in use, as different types of NATs handle incoming UDP packets differently. It will work with three of four main types: full cone NAT, restricted cone NAT, and port restricted cone NAT. It will not work with symmetric NAT (also known as bi-directional NAT) which is often found in the networks of large companies.

Port Forwarding

TODO

TODO
Complete

UPnP

The Universal Plug and Play (UPnP) architecture consists in a set of open standards and technologies promulgated by the UPnP Forum ( http://www.upnp.org/ ), with the goal of extending the Plug and Play concept to support networks and peer-to-peer discovery, configuration and control and so enable that appliances, PCs, and services be able to connect transparently.

As UPnP is offered in most modern routers and network devices and also supported by Microsoft since Windows XP, it is a necessity that a P2P application should support this architecture and avoid the requirements that users deal with the necessary changes when UPnP is enabled (by default is should be disable due to security risks).

Unique ID

Numbers control your life. Over time anyone identified by multitude of numbers. Like a phone number, your credit card numbers, the driver's license number, the social security number, even zip codes or your car license plate. They way this unique numbers are created, attributed and verified is fascinating. The Unique ID site (http://www.highprogrammer.com/alan/numbers/) by Alan De Smet provides information on the general subject. To our particular subject what is relevant is the algorithms behind it all. How to establish a unique identifiers for users and resources.

Universally Unique Identifiers (UUID) / Globally Unique Identifier (GUID)

For a P2P protocol/application to be able to manage user identification, authentication, build a routing protocol, identify resources etc... there is a need for (a set of) Unique Identifiers. While a true peer to peer protocol doesn't intend to establish a centralized service, it can frequently make use of an already established such service. So for instance, Usenet makes use of domain names to create globally unique identifier for articles. If the protocol refrains from even make use of such a service then uniqueness can only be based on random numbers and mathematical probabilities.

The problem of generating unique IDs can be broken down as uniqueness over space and uniqueness over time which, when combined, aim to produce a globally unique sequence. This leads to a problem detected over some P2P networks using Open Protocols/Multiple vendors implementations, due to the use of different algorithms on the generation of the GUIDs the uniqueness over space is broken leading to sporadic collisions.

UUIDs are officially and specifically defined as part of the ISO-11578 standard other specifications also exist, like RFC 4122, ITU-T Rec. X.667.

Examples of uses

  1. Usenet article IDs.
  2. In Microsoft's Component Object Model (COM) morass, an object oriented programming model that incorporates MFC (Microsoft Foundation Classes), OLE (Object Linking Embedding), ActiveX, ActiveMovie and everything else Microsoft is hawking lately, a GUID is a 16 byte or 128 bit number used to uniquely identify objects, data formats, everything.
  3. The identifiers in the windows registry.
  4. The identifiers used in used in RPC (remote procedure calls).
  5. Within ActiveMovie, there are GUID's for video formats, corresponding to the FOURCC's or Four Character Codes used in Video for Windows. These are specified in the file uuids.h in the Active Movie Software Developer Kit (SDK). ActiveMovie needs to pass around GUID's that correspond to the FOURCC for the video in an AVI file.

Security There is a know fragility on UUIDs of version 1 (time and node based), as they broadcast the node's ID.

Software implementation

Programmers needing to implement UUID could take a look on these examples:

  • OSSP uuid ( http://www.ossp.org/pkg/lib/uuid/ ) is an API for ISO C, ISO C++, Perl and PHP and a corresponding CLI for the generation of DCE 1.1, ISO/IEC 11578:1996, and RFC4122 compliant Universally Unique Identifiers (UUIDs). It supports DCE 1.1 variant UUIDs of version 1 (time and node based), version 3 (name based, MD5), version 4 (random number based), and version 5 (name based, SHA-1). UUIDs are 128-bit numbers that are intended to have a high likelihood of uniqueness over space and time and are computationally difficult to guess. They are globally unique identifiers that can be locally generated without contacting a global registration authority. It is Open Sourced under the MIT/X Consortium License.

Hashes, Cryptography and Compression

Most P2P systems have to implement at least one algorithms for Hashing, D/Encryption and De/Compression, this section will try to provide some ideas of this actions in relation to the P2P subject as we will address this issues later in other sections.

Detailed information on the subject can be found on the Cryptography Wikibook ( http://en.wikibooks.org/wiki/Cryptography ).

One way of creating structured P2P networks is by maintaining a Distributed Hash Table (DHT), that will server as a distributed index of the resources on the network.

Another need for cryptography is in the protection of the integrity of the distributed resources themselves, to make them able to survive an attack most implementations of P2P some kind of Hash function (MD5, SHA1) and may even implement a Hash tree designed to detect corruption of the resource content as a hole or of the parts a user gets (for instance using Tiger Tree Hash).

Hash function

A hash function is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm substitutes or transposes the data to create such fingerprints. The fingerprints are called hash sums, hash values, hash codes or simply hashes. When you referring to hash or hashes some attention must be given since it can also mean the hash functions.

A group of hashes (the result of applying an hash function to data) will sometimes be referred as a bucked or more properly a hash bucket. Most hash buckets if generated by a non colliding hash function will generate distinct hashes for a give data input, there are other characteristics that should be considered when selecting an hash function but it goes beyond the scope of this book, just remember to check if the hash function/algorithm you are implementing satisfies your requirements. Wikibooks has several books that covers hashes in some way, you can check more about the subject in Algorithm implementation or Cryptography

A typical hash function at work

Hash sums are commonly used as indices into hash tables or hash files. Cryptographic hash functions are used for various purposes in information security applications.

Choosing a good hash function

A good hash function is essential for good hash table performance. A poor choice of a hash function is likely to lead to clustering, in which probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function. A nonzero collision probability is inevitable in any hash implementation, but usually the number of operations required to resolve a collision scales linearly with the number of keys mapping to the same bucket, so excess collisions will degrade performance significantly. In addition, some hash functions are computationally expensive, so the amount of time (and, in some cases, memory) taken to compute the hash may be burdensome.

Simplicity and speed are readily measured objectively (by number of lines of code and CPU benchmarks, for example), but strength is a more slippery concept. Obviously, a cryptographic hash function such as SHA-1 would satisfy the relatively lax strength requirements needed for hash tables, but their slowness and complexity makes them unappealing. However, using cryptographic hash functions can protect against collision attacks when the hash table modulus and its factors can be kept secret from the attacker, or alternatively, by applying a secret salt. However, for these specialized cases, a universal hash function can be used instead of one static hash.

In the absence of a standard measure for hash function strength, the current state of the art is to employ a battery of statistical tests to measure whether the hash function can be readily distinguished from a random function. Arguably the most important test is to determine whether the hash function displays the avalanche effect, which essentially states that any single-bit change in the input key should affect on average half the bits in the output. Bret Mulvey advocates testing the strict avalanche condition in particular, which states that, for any single-bit change, each of the output bits should change with probability one-half, independent of the other bits in the key. Purely additive hash functions such as CRC fail this stronger condition miserably.

NOTE:
CRC is often used to denote either the function or the function's output. A CRC can be used in the same way as a checksum to detect accidental alteration of data during transmission or storage. CRCs are popular because they are simple to implement in binary hardware, are easy to analyze mathematically, and are particularly good at detecting common errors caused by noise in transmission channels. Historically CRCs have been given an ample used as a error detection/correction in telecommunications.

For additional information on Hashing:

Collision avoidance

TODO

TODO
Complete

Implementing a Hash algorithm

Most Hash algorithms are have an high degree of complexity and are designed for a specific target use and may not apply with the same level of guarantees in each task. The algorithms or raw descriptions are freely accessible so you can implement you own version or select to use an already existing and tested implementation.

NOTE:
Some Hash functions may be subject to export restrictions.

  • Mhash ( http://mhash.sourceforge.net/ ) is an OpenSource (under GNU Lesser GPL) C library which provides a uniform interface to a large number of hash algorithms (SHA1, SHA160, SHA192, SHA224, SHA384, SHA512, HAVAL128, HAVAL160, HAVAL192, HAVAL224, HAVAL256, RIPEMD128, RIPEMD256, RIPEMD320, MD4, MD5, TIGER, TIGER128, TIGER160, ALDER32, CRC32, CRC32b, WHIRLPOOL, GOST, SNEFRU128, SNEFRU256), for Windows support you need to use cygwin to compile. A Python interface exists.

Hash tree (Merkle trees)

In cryptography, hash trees' (also known as Merkle trees, invented in 1979 by Ralph Merkle) are an extension of the simpler concept of hash list, which in turn is an extension of the old concept of hashing. It is a hash construct that exhibits desirable properties for verifying the integrity of files and file subranges in an incremental or out-of-order fashion.

Hash trees where the underlying hash function is Tiger ( http://www.cs.technion.ac.il/~biham/Reports/Tiger/ ) are often called Tiger trees or Tiger tree hashes.

The main use of hash trees is to make sure that data blocks received from other peers in a peer-to-peer network are received undamaged and unaltered, and even to check that the other peers do not send adulterated blocks of data. This will optimize the use of the Network and permit to quickly exclude adulterated content in place of waiting for the download of the hole file to complete to check with a single hash, an partial or complete hash tree can be downloaded and the integrity of each branch can be checked immediately (since they consist in "hashed" blocks or leaves of the Hash tree), even though the whole tree/content is not available yet, making also possible for the downloading peer to upload blocks of an unfinished files.

Usually, a cryptographic hash function such as SHA-1, Whirlpool, or Tiger is used for the hashing. If the hash tree only needs to protect against unintentional damage, the much less secure checksums such as CRCs can be used.

In the top of a hash tree there is a top hash (or root hash or master hash). Before downloading a file on a P2P network, in most cases the top hash is acquired from a trusted source (a Peer or a central server that has elevated trust ratio). When the top hash is available, the hash tree can then be received from any source. The received hash tree is then checked against the trusted top hash, and if the hash tree is damaged or corrupted, another hash tree from another source will be tried until the program finds one that matches the top hash.

This requires several considerations:

  1. What is a trusted source for the root hash.
  2. A consistent implementation of the hashing algorithm (for example the size of the blocks to be transfered must be known and constant on every file transfer).

Tiger Tree Hash (TTH)

The Tiger tree hash is one of most widely used form of hash tree on P2P Networks. It uses a binary hash tree (two child nodes under each node), usually has a data block size of 1024-bytes and uses the cryptographically secure Tiger hash.

Tiger hash is used because it's fast (and the tree requires the computations of a lot of hashes), with recent implementations and architectures, TTH is as fast as SHA1, with more optimization and the use of 64-bit processors, it will become faster, even though it generates larger hash values (192 bits vs. 160 for SHA1).

Tiger tree hashes are used in the Gnutella, Gnutella2, and Direct Connect and many other P2P file sharing protocols and in file sharing in general.

A step by step introduction to the TTH is available as part of the Tree Hash Exchange (THEX) format ( http://open-content.net/specs/draft-jchapweske-thex-02.html ) page.

Hash table

In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key, find the corresponding value. It works by transforming the key using a hash function into a hash, a number that is used to index into an array to locate the desired location ("bucket") where the values should be.

A small phone book as a hash table.

Hash tables support the efficient addition of new entries, and the time spent searching for the required data is independent of the number of items stored (i.e. O(1).)

In P2P system Hash tables are used locally on every client/server application to perform the routing of data or the local indexing of files, this concept is taken further as we try to use the same system in a distributed way, in that case distributed hash tables are used to solve the problem.

Distributed Hash Table (DHT)

The Distributed hash tables (DHTs) concept was made public in 2001 but very few did publicly-release robust implementations.

Protocols
  • Content addressable network (CAN)
  • Chord ( http://pdos.csail.mit.edu/chord/ ) - aims to build scalable, robust distributed systems using peer-to-peer ideas. It is completely decentralized and symmetric, and can find data using only log(N) messages, where N is the number of nodes in the system. Chord's lookup mechanism is provably robust in the face of frequent node failures and re-joins. A single research implementation is available in C but there are other implementations in C++, Java and Python.
  • Tulip
  • Tapestry
  • Pastry
    • Bamboo (http://bamboo-dht.org/) - based on Pastry, a re-engineering of the Pastry protocols written in Java and licensed under the BSD license.
TODO

TODO
Tulip seems to have a C++ implementation couldn't find info about it...

Encryption

A part of the security of any P2P Network, encryption is needed to make make sure only the "allowed" parties have access to sensitive data. Examples are the encryption of the data on a server/client setup (even on P2P) were clients could share data without fear of it being accessed on the server (a mix of this is if the Network would in itself enable a distributed cache mechanism for transfers, server-less), encryption of transfers, to prevent man-in-the-middle attacks, or monitor of data (see FreeNet) and many other applications with the intent of protecting the privacy and enable an extended level of security to Networks.

There are several algorithm that can be used to implement encryption most used by P2P project include: BlowFish.

TODO

TODO
Extend and concrete provide examples

Compression

TODO

TODO
Complete

Resources (Content, other)

TODO

TODO
...digital assets...

Metadata

Metadata (meta data, or sometimes metainformation) is "data about data". An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.

Data is the lowest level of abstraction for knowledge, information is the next, and finally, we have knowledge highest level among all three. Metadata consists on direct and indirect data that help define the knowledge about the target item.

The hierarchy of metadata descriptions can go on forever, but usually context or semantic understanding makes extensively detailed explanations unnecessary.

The role played by any particular datum depends on the context. For example, when considering the geography of London, "E83BJ" would be a datum and "Post Code" would be metadatum. But, when considering the data management of an automated system that manages geographical data, "Post Code" might be a datum and then "data item name" and "5 characters, starting with A – Z" would be metadata.

In any particular context, metadata characterizes the data it describes, not the entity described by that data. So, in relation to "E83BJ", the datum "is in London" is a further description of the place in the real world which has the post code "E83BJ", not of the code itself. Therefore, although it is providing information connected to "E83BJ" (telling us that this is the post code of a place in London), this would not normally be considered metadata, as it is describing "E83BJ" qua place in the real world and not qua data.

Difference between data and metadata

Usually it is not possible to distinguish between (plain) data and metadata because:

  • Something can be data and metadata at the same time. The headline of an article is both its title (metadata) and part of its text (data).
  • Data and metadata can change their roles. A poem, as such, would be regarded as data, but if there were a song that used it as lyrics, the whole poem could be attached to an audio file of the song as metadata. Thus, the labeling depends on the point of view.

These considerations apply no matter which of the above definitions is considered, except where explicit markup is used to denote what is data and what is metadata.

Hierarchies of metadata

When structured into a hierarchical arrangement, metadata is more properly called an ontology or schema. Both terms describe "what exists" for some purpose or to enable some action. For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to what subjects "exist" in the library's own ontology and how more specialized topics are related to or derived from the more general subject headings.

Metadata is frequently stored in a central location and used to help organizations standardize their data. This information is typically stored in a Metadata registry.

Schema examples

A good example on work being done on Metadata and how to make it accessible and useful can be examined in the W3C's old page Metadata Activity Statement (http://www.w3.org/Metadata/Activity.html) and the Resource Description Framework (RDF), that by the use of a declarative language, provides a standard way for using XML to represent metadata in the form of statements about properties and relationships of items on the Web. The RDF has it's own and more up to data page at http://www.w3.org/RDF/.

Metadata registries

Free Services
  • MusicBrainz ( http://musicbrainz.org/ ) is a community music metadatabase (nonprofit service) that attempts to create a comprehensive music information site. You can use the MusicBrainz data either by browsing this web site, or you can access the data from a client program — for example, a CD player program can use MusicBrainz to identify CDs and provide information about the CD, about the artist or about related information. MusicBrainz is also supporting MusicIP's Open FingerprintTM Architecture, which identifies the sounds in an audio file, regardless of variations in the digital-file details. The community provides a REST styled XML based Web Service.

Database & Index

TODO

TODO
...type... ...location... ...route... ...tracking... ...scalable... ...flexible...

using a DHT

TODO

TODO
Complete

Searches

TODO

TODO
Mention Boyer-Moore algorithm

Services

TODO

TODO
...tracking...

Seeds

Leechers

In Nature, cooperation is widespread but so too are leechers (cheats, mutants). In evolutionary terms, cheats should indeed prosper, since they don´t contribute to the collective good but simply reap the benefits of others’ cooperative efforts, but they don't. Both compete for the same goal using different strategies, cooperation is the path of less cost to all (even for leechers on the long run), cooperation provides stability and previsibility and on the other hand if cheats are not kept in some sort of equilibrium they generate a degradation of the system that can lead to its global failure.

In computer science and especially on the Internet, being a leech or leecher refers to the practice of benefiting, usually deliberately, from others' information or effort but not offering anything in return, or only token offerings in an attempt to avoid being called a leech. They are universally derided.

The name derives from the leech, an animal which sucks blood and then tries to leave unnoticed. Other terms are used, such as freeloader, but leech is the most common.

Examples

  • On peer to peer networks, a leecher shares nothing (or very little of little worth) for upload. Many applications have options for dealing with leeches, such as uploading at reduced rates to those who share nothing, or simply not allowing uploads to them at all. Some file-sharing forums have an anti-leech policy to protect the download content, where it will require users to expend more energy or patience than most leechers are willing to before they can access the "download area".
  • Most BitTorrent sites refer to leeches as clients who are downloading a file, but can't seed it because they don't have a complete copy of it. They are by default configured to allow a certain client to download more when they upload more.
  • When on a shared network (Such as a school or office LAN), any deliberate overuse of bandwidth (To the point at which normal use of the network would be noticeably degraded) can be called leeching.
  • In online computer games (especially role-playing games), leeching refers to the practice of a player joining a group for the explicit purpose of gaining rewards without contributing anything to the efforts necessary to acquire those rewards. Sometimes this is allowed in an effort to power-level a player. Usually it is considered poor behavior to do this without permission from the group. In first person shooters the term refers to a person who benefits by having his team mates carry him to a win.
  • Direct linking is a form of bandwidth leeching that occurs when placing an unauthorized linked object, often an image, from one site in a web page belonging to a second site (the leech). This constitutes an unauthorized use of the host site's bandwidth and content.

In some cases, leeching is used synonymously with freeloading rather than being restricted to computer contexts.

Possible solutions

TODO

TODO
Complete

Trust & Reputation

Due the volatility and modularity of peer to peer networks, peer trust as mentioned in the reference to building communities around the network objective and user participation is one of the few motivations or lubricant for the online transactions of services or resources that falls ultimately to the creator. Having an average good reputation in participants will also boost the trust on the network and the systems built on it.

TODO

TODO
Complete

File sharing

Traditionally, file transfers involve two computers, often designated as a client and a server and most operations are for the copying files from one machine to another.

Most WEB and FTP servers are punished for being popular. Since all uploading is done from one central place, a popular site needs more resources (CPU and bandwidth) to be able to cope. With the use of P2P, the clients automatically mirror the files they download, easing the publisher's burden.

One limitation of most P2P protocols is that they don't provide a complex file-system emulation or a user-right system (permissions), so complex file operations like NFS or FTP protocols provide are very rare, this also has a reason to be, since the networks is decentralized a system for the authentication of users is hard to implement and most are easy to break.

Another concept about P2P transfers is the use of bandwidth, things will not be linear, transfers will depend on the availability of the resource, the load of the seeding peers, size of the network and the local user connection and load on its bandwidth.

As seen before, Downloading files with a restrictive copyright, license or under a given country law, may increase the risk of being sued. Some of the files available on these networks may be copyrighted or protected under the law. You must be aware that there is a risk involved.

Receiving

TODO

TODO
...authentication... ...digital signature...

from multiple sources (segmented downloading, swarm)

Multiple source download, (segmented download, swarming download), can be a more efficient way of downloading files from many peers at once. The one single file is downloaded, in parallel, from several distinct sources or uploaders of the file. This can help a group of users with asymmetric connections, such as ADSL to provide a high total bandwidth to one downloader, and to handle peaks in download demand.

This technique can not magically solve the problem, in a group of users that has insufficient upload-bandwidth, with demand higher than supply. It can however very nicely handle peaks, and it can also to some degree let uploaders upload "more often" to better utilize their connection. However, naive implementations can often result in file corruption, as there is no way of knowing if all sources are actually uploading segments of the same file. This has led to most programs using segmented downloading using some sort of checksum or hash algorithm to ensure file integrity.

Resuming

TODO

TODO
Complete

Preview while Downloading

Is may help to let users preview files before the downloading process finishes and as soon as possible, this will improve the quality of shares on the network increasing users confidence and reducing lost time and bandwidth.

Security

TODO

TODO
Complete

Poisioning and Pollution

Examples include:

  • poisoning attacks (e.g. providing files whose contents are different from the description)
  • polluting attacks (e.g. inserting "bad" chunks/packets into an otherwise valid file on the network)
  • insertion of:
    • viruses to carried data (e.g. downloaded or carried files may be infected with viruses or other malware)
    • malware in the peer-to-peer network software itself (e.g. distributed software may contain spyware)
TODO

TODO
Complete

Distributed Proxy

TODO

TODO
Complete (Tor,Squid)

VoIP

TODO

TODO
Complete

Distributed Streaming

TODO

TODO
Complete (Freecast)

Priority settings

Enabling the dynamic set of priorities on transfers will not only keep users happy but provide an easy way to boost transfers on highly sicked content increasing the speed of replication of the same on the network.

One can even go a step further and permit a by resource configuration, enabling the removal or configuration access rights to each resource, like on a file system or even enable a way to permit a market for free trading of data letting users set a specific ratio for that resource.

Bandwidth Scheduler

Managing local resources is important not only to the local user but to the global network. Managing and enabling control of the application use of bandwidth will serve as incentive for users to improve how they manage that resource (what and how it is being used) and if taken in consideration by the application as a dynamic resource it can have positive effects on the global network by reducing wasting.

Many of the actual P2P applications enable users such control of their bandwidth, but not only P2P benefits from this strategy, today with a significant part of most computers connected to the Internet, managing this scarce resource is of top most importance. One example is Microsoft's Background Intelligent Transfer Service (BITS) aimed at enabling system updates or even the MS IM service to transfer data whenever there is bandwidth which is not being used by other applications, of note is also the ability to use the BITS technology since it is exposed through Component Object Model (COM).

"New" models

Fault-Tolerant Web Sites

Many people have speculated that peer-to-peer file sharing technology could be used to improve wiki and other kinds of Internet services.

High quality video or large files distribution

The Internet infrastructure was not designed to support broadcasting. P2P partially solves this infrastructural bottleneck by switching the server or content provider from a single point to a decentralized infrastructure, that depends not on the specific network limitations but on the protocol that optimizes the distribution and its popularity.

TODO

TODO
Complete, cover “multicasting”

In February 2008 the European Union announced its commitment into a four-year project that aims to create an open source, peer-to-peer BitTorrent-like client called P2P-Next, based on an improvement of the Delft University of Technology python project Tribler. The EU will contribute 14 million euros (£10.5 million, $22 million) into this project and another 5 million euros (£3.7 million, $7.4 million) will be added by another 21 partners that includes the European Broadcasting Union, Lancaster University, BBC, Markenfilm, VTT Technical Research Center and Pioneer Digital Design Center Limited.

Real Time Video

Transmission of live events to millions of people using the actual infrastructure imposes limits on the quality of the output and high expectations on the hardware resources, not only on network resources but on the encoding and playback capabilities on each side of the transfer.

This has lead to the use of P2P network, as an attempt to save server bandwidth. An example of this new aproach is the MSR Asia Peer-to-peer Video Broadcast System ( http://research.microsoft.com/en-us/projects/p2pbroadcast/ ) from Microsoft Research Asia that was used in the 2008 Beijing Olympic Games claiming the use of more than two million Internet peers.

TODO

TODO
Complete, cover data streaming

Hardware

Traffic Shapers

Set Top Boxes

P2P technologies can also be used to provide a means to at low cost distribute content in an automated way.

Using a peer to peer architecture directly connected to a broadband line, a set top box (a stripped down PC of sorts), with an operating software and some storage space can for instance provide a service similar to video on demand.

VUDU

VUDU ( http://www.vudulabs.com/ ), thousands of movies delivered directly to your TV, it doesn't require a PC and is independent of your cable or satellite TV service.

TODO

TODO
TiVo, WebTV, Openwave, 2Wire, Slim Devices, OpenTV, and Danger

Mobile Peer-to-Peer Computing

PEPERS Project

The PEPERS project (http://www.pepers.org/) focus is to design, implement and validate a reliable platform with high-level support for the design, development and operational deployment of secure mobile peer-to-peer applications for future Ambient Intelligent (AmI) environments. The platform will greatly assist the work and benefit both mobile application and service providers and service users. The project will address issues related to security, privacy, trust and access control in mobile peer-to-peer (p2p) systems by proposing a relevant framework architecture. The framework will include support for policy-based security management for mobile systems. The specific thematic focus of the project (that reflects also in the selection of the pilot scenarios that will be implemented) is the collaboration among teams dispersed over a geographical area. The consortium partners come from 4 EU Member states (Greece, UK, Italy and Cyprus) and include key technology providers and industry players, and leading academic institutions, as well as users partners from diverse business domains (media and journalism, security services) with increased needs for secure collaboration through advanced technological solutions that bring significant expertise and knowledge in PEPERS related technologies as well as the underlying operating business models.

From a technical point of view the project will focus on:

  • the definition of appropriate security services for mobile peer-to-peer applications over suitable protocols
  • the analysis and design of possible platforms and interfaces in mobile devices that can provide security services to support peer-to-peer applications
  • the definition of interfaces based on open standards that will allow secure mobile access to application servers

The secure use of mobile peer-to-peer applications based on the project technology will be validated in real-life pilot applications supporting collaborative work in two operational domains:

  • media and journalism: reporters working on assignment need to be able to record, edit and exchange information in a way that will protect and monitor intellectual property rights and the interests of the organization that employs them. This is particularly relevant as it covers both general issues of a mobile workforce and more specific issues related to the media application domain.
  • physical security: guards and mobile patrols need to be able to receive and transmit sensitive customer information in a dynamic environment when they are called to respond and co-operate in case of ad-hoc exceptional situations.

External Links

Personal tools
Create a book