Peer-to-peer and the future of distributed applications

© 2001 Mixter

Index

What is P2P

Goals and applications for P2P

How P2P works

Security issues and solutions

The Hacktivismo project

What is P2P

Definitions

Peers: Computers that communicate with each other while playing identical roles. Also called nodes.

Peer-to-peer: A decentralized network technology in which peers communicate with each other while playing identical roles.

Client: Part of a P2P application which lets the user direct the information exchange and which initiates connections.

Server: Part of a P2P application that does not direct the information exchange, but provides access to data or resources and accepts incoming connections.

Neighboring Node: A node that is directly connected to the node in question.

Metadata: Data describing other data, e.g. filenames, indexes, content labels.

Characteristics and features

Decentralization - leads to increased performance, scalability and disruptiveness.

As a network grows, every centralized point becomes a weak point in its infrastructure. Examples: the DNS system of the internet, or transatlantic peering points.

As opposed to the traditional client/server model of standalone FTP, HTTP, mail and similar servers, peers are constantly connected to the network and constantly exchange information.

Peer-to-peer...

can utilize the unused resources of machines "at the edges of the internet" (PCs, dialups with dynamic addresses)
is designed to handle unpredictable routing topologies and data flows
often creates addresses and metadata for things other than machines. Example: Freenet creates a single address for content that is intentionally copied and spread across multiple machines ("superdistribution").
often means information will be exchanged over an untrusted public network.

Goals and applications for P2P

Current applications of distributed and P2P technology

Area                        Since ca.   Examples
File-Sharing                3 years     Gnutella, Napster*
Distributed Computing       4+ years    Seti@Home**, Distributed.net**, DistributedScience**
P2P Search Engine           1 year      OpenCOLA, some bots/agents (see botspot.com)
P2P Communication           4+ years    ICQ*, IRC*, Eggdrop, Aimster
Edge Services**             2 years     Intel's upcoming edge services
Device Intercommunication   1 year      Jini, Bluetooth
Anonymity/Anti-Censorship   1 year      Freenet, Onion Routing, Hacktivismo, Red Rover

* = Requires central or hub servers
** = Distributed but not "real" P2P: peers don't communicate directly with each other

Note: DDoS tools would also get a ** here, because a truly decentralized DDoS tool doesn't exist yet. It would probably be difficult to build a stealthy and reliable P2P DDoS network, because of the low but constant traffic of P2P nodes.

Possible future trends affecting P2P

Future internet content might require much more bandwidth
Migration from dialup to broadband
Migration to IPv6 could mean fewer dynamic addresses
More censorship laws or even anti-P2P laws

Future goals of P2P

Anonymity, anti-censorship services, and decentralized information hosting. Such services are increasingly deployed in totalitarian countries that censor the internet excessively.

Easier ways to find shared content and data, and a reduced risk of losing data (through hard disk crashes, viruses, intrusions, etc.) thanks to multiple copies and search indexes.

Open business-to-business and logistics networks, working through "supply" and "demand" messages of various types, offering fast exchange of computer resources and materials.

Transferring previously signed cybercash with true anonymity, using anonymous aliases for identifying transaction partners. Providers of anonymous cybercash might develop such systems to avoid liability for their customers' actions and to attract more people.

General access to storage space, CPU cycles, content, even rendering capabilities of video cards or other hardware. Distributed computing requires better virtual languages for executing untrusted code and offering access to resources securely.

Net infrastructure. Today, if 1000 people in Europe request the same web page from a server in the US at the same time, it travels across the Atlantic 1000 times. In the future, transparent collaborative P2P routing might combine technologies like multicasting, edge services and transparent caching to use the bandwidth of the net more efficiently.
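The bandwidth argument can be illustrated with a minimal sketch of a shared transparent cache, assuming a single hypothetical edge node in Europe that sits between the 1000 users and the US server. The class name `EdgeCache` and the stand-in fetch function `fake_origin` are illustrative only, and cache expiry and invalidation are ignored.

```python
class EdgeCache:
    """Hypothetical shared cache at a European edge node."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> bytes (the transatlantic fetch)
        self._store = {}                 # url -> cached content
        self.origin_fetches = 0          # how many times the Atlantic was crossed

    def get(self, url):
        if url not in self._store:       # only a cache miss goes to the origin server
            self._store[url] = self._fetch(url)
            self.origin_fetches += 1
        return self._store[url]


def fake_origin(url):
    # Stand-in for an HTTP request to a US server.
    return b"<html>" + url.encode() + b"</html>"


cache = EdgeCache(fake_origin)
for _ in range(1000):                    # 1000 European requests for the same page
    cache.get("http://example.com/index.html")
print(cache.origin_fetches)  # 1
```

Only the first request crosses the Atlantic. Requests that arrive truly simultaneously would additionally need request coalescing, so that concurrent cache misses for the same URL share one origin fetch.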

Conclusions

The key problems of public distributed P2P systems are: mutual trust, application security and data integrity.

Things you can't or shouldn't implement as P2P are systems where users have to receive the same data simultaneously, and systems in which the user conceptually has to communicate with a central party.

Peer-to-peer and decentralization can make old protocols and formats, such as HTTP, HTML and XML, interesting again and put them to new uses. A P2P-extended web could help end users actively contribute their own content.

Decentralization and peer-to-peer models probably have more applications than we can imagine. In any area where scalability and capacity play an increasing role, decentralized systems will eventually be implemented.

How P2P works

Different types of decentralized P2P applications and protocols have different structures of user data, and different ways of using that data, but they all share an underlying infrastructure.

Components of a typical P2P application

The client component

provides a user interface
sends requests and active messages
connects to and contacts other peers

The server component

processes incoming requests and sends responses
provides access to resources (e.g. files, information, data processing)
accepts incoming connections

The data component

stores data or metadata
exchanges data between server and client components
gathers and manages the address list of other peers

The routing component

sends and receives messages from and to the client and server parts
detects duplicate messages, manages TTLs, and tries to optimize network performance
optionally handles P2P key exchange and cryptography
ensures propagation of locally and remotely originating messages

Propagation of messages in a decentralized network is usually based on the source/destination fields of the application protocol.

Routing and P2P topology

Messages can be broadcast, which means that each peer sends them to all of its directly connected hosts. Examples are pings and file search requests.

Messages can also be routed, which means they will be sent only to a particular location, after traveling through a chain of other peers.

Since the topology of a P2P network is semi-random and cyclic, two mechanisms in the P2P protocol can keep redundant messages from circulating: a cache of recently seen identifiers, unique for each message, lets a peer drop duplicates, and a TTL value that is decreased at every hop limits how far a message travels.
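The duplicate cache and the TTL can be sketched as an in-memory simulation of broadcast flooding over a ring of six peers, a deliberately cyclic topology. This is a minimal illustration rather than any specific client's protocol: `Peer`, `connect` and `broadcast` are hypothetical names, a breadth-first queue stands in for real network delivery, and the set of seen IDs grows without bound where a real node would expire old entries.

```python
import uuid
from collections import deque


class Peer:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # directly connected (neighboring) peers
        self.seen = set()     # IDs of messages already handled by this peer
        self.received = []    # payloads delivered to this peer


def connect(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


def broadcast(origin, payload, ttl):
    """Flood a message from origin; return how many transmissions were made."""
    msg_id = uuid.uuid4().hex               # identifier unique to this message
    queue = deque([(origin, None, ttl)])    # (receiving peer, sending peer, TTL left)
    transmissions = 0
    while queue:
        peer, sender, ttl_left = queue.popleft()
        if msg_id in peer.seen:
            continue                        # duplicate that came around a cycle: drop it
        peer.seen.add(msg_id)
        peer.received.append(payload)
        if ttl_left <= 0:
            continue                        # TTL used up: deliver, but do not forward
        for neighbor in peer.neighbors:
            if neighbor is not sender:      # never send straight back to the sender
                queue.append((neighbor, peer, ttl_left - 1))
                transmissions += 1
    return transmissions


# Six peers in a ring: without duplicate detection and TTL, flooding would loop forever.
peers = [Peer(name) for name in "abcdef"]
for i in range(len(peers)):
    connect(peers[i], peers[(i + 1) % len(peers)])
broadcast(peers[0], "search: report.txt", ttl=7)
print([len(p.received) for p in peers])  # [1, 1, 1, 1, 1, 1]
```

Every peer handles the search exactly once, even though the message reaches the peer opposite the origin over two paths. With `ttl=1`, only the origin and its two direct neighbors would receive it.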

Security issues and solutions

General security recommendations

The authenticity of peers in a distributed network cannot be trusted unless reputation can be assured from a source outside of the conventional communication of the distributed network. Problem: this leads to some centralization.

A secure distributed protocol should generally not let a remote peer make you:

execute or pass data from the network to system or library calls unchecked
establish active connections and send data originating from a remote peer
interact with hosts outside of the distributed network, or only within strict limits on bandwidth, connections and data
use techniques that breach the local firewall (e.g. push routing) unless explicitly configured by the user
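The first rule can be illustrated for one common case: a filename requested by a remote peer must never be handed to file system calls unchecked, or a request such as `../../etc/passwd` could read files outside the shared directory. The sketch below assumes a hypothetical shared directory `/srv/p2p/shared` and a hypothetical function name; it resolves the requested name and rejects anything that escapes that directory. It covers only this case, not the other rules in the list.

```python
import os

SHARED_DIR = "/srv/p2p/shared"  # hypothetical directory the user chose to share


def resolve_request(requested_name, shared_dir=SHARED_DIR):
    """Map a filename received from a remote peer to a local path,
    refusing names that would escape the shared directory."""
    if not requested_name or "\x00" in requested_name or os.path.isabs(requested_name):
        raise ValueError("rejected filename")
    base = os.path.realpath(shared_dir)
    # realpath also resolves symlinks, so a link pointing outside is caught too
    target = os.path.realpath(os.path.join(base, requested_name))
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the shared directory")
    return target
```

For example, `resolve_request("song.mp3")` returns a path inside the shared directory, while `resolve_request("../../etc/passwd")` raises `ValueError` before any file is opened.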

As additional protection for users and their anonymity, the following general policies should be configurable in distributed applications:

ability to block the uploading of one's own data
ability to use a virtual IP address or an alias
ability to block IP addresses and network ranges from being neighboring peers
ability to ignore and drop request messages from certain addresses and aliases
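These policies amount to a small amount of per-user configuration. A minimal sketch using Python's standard `ipaddress` module follows; the `PeerPolicy` class and its field names are hypothetical, and how a real client would store or expose these settings is left open.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class PeerPolicy:
    allow_upload: bool = True       # False blocks the uploading of one's own data
    alias: str = "anonymous"        # alias presented instead of a real identity
    blocked_neighbors: list = field(default_factory=list)  # networks refused as neighbors
    ignored_sources: set = field(default_factory=set)      # addresses/aliases whose requests are dropped

    def block_range(self, cidr):
        """Refuse neighbors from an address range, e.g. "10.0.0.0/8"."""
        self.blocked_neighbors.append(ipaddress.ip_network(cidr))

    def may_connect(self, ip):
        """Check a candidate neighbor's address against the blocked ranges."""
        addr = ipaddress.ip_address(ip)
        return not any(addr in net for net in self.blocked_neighbors)

    def accept_request(self, source):
        """Drop request messages from ignored addresses or aliases."""
        return source not in self.ignored_sources


policy = PeerPolicy(alias="nightowl")
policy.block_range("10.0.0.0/8")
policy.ignored_sources.add("spy-alias")
print(policy.may_connect("10.1.2.3"))      # False
print(policy.may_connect("192.0.2.7"))     # True
print(policy.accept_request("spy-alias"))  # False
```

Checking `may_connect` before accepting or opening a neighbor connection keeps blocked ranges out of the set of directly connected peers, which are the only nodes that learn the user's IP address.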

Stealth and anonymity of peers

Spying and malicious parties have two basic ways of monitoring traffic of a decentralized P2P network:

Traffic analysis is used to analyze content and addresses on a public network, and to determine that a particular protocol or form of communication is actually taking place.

Peer-to-peer SSL connections can help obscure the content from outside parties (each node exchanges P2P headers and payloads with its direct neighbors through SSL-encrypted channels).

Traffic analysis can go beyond analyzing the content. For example, it can recognize an encrypted P2P protocol if that protocol often sends data packets of the same size. Padding data packets to a random size can prevent this kind of analysis.
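Random padding can be sketched as follows, assuming a hypothetical framing in which a 4-byte length prefix precedes the real payload and random filler follows it. The padding has to be applied before the packet enters the encrypted channel, so that the length prefix stays hidden and only the padded size is visible on the wire. `MAX_PAD` is an arbitrary illustrative value.

```python
import secrets
import struct

MAX_PAD = 255  # hypothetical upper bound on random filler, in bytes


def pad(payload: bytes) -> bytes:
    """Length-prefix the payload and append a random amount of random filler."""
    filler = secrets.token_bytes(secrets.randbelow(MAX_PAD + 1))
    return struct.pack("!I", len(payload)) + payload + filler


def unpad(packet: bytes) -> bytes:
    """Recover the original payload, discarding the filler."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    (length,) = struct.unpack("!I", packet[:4])
    if length > len(packet) - 4:
        raise ValueError("length field exceeds packet size")
    return packet[4:4 + length]
```

Two identical requests now produce packets of different sizes. A larger `MAX_PAD` blurs the size distribution more, at the cost of extra bandwidth.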

Eavesdropping means determining who is talking to whom, and what data is exchanged. It comprises methods for subverting existing mechanisms of anonymity.

Anonymity means that two parties can communicate while one or both cannot be identified by the other. With the common internet transport protocols this is impossible; peer-to-peer, however, can make it possible.

The TCP/IP headers are like an envelope which bears the addresses of source and destination and contains the user data. In a peer-to-peer system, we can discard one (when routing) or both (when broadcasting) of those addresses by routing our messages through our own P2P application level protocol, replacing source and destination addresses with virtual addresses or aliases, thus delivering only the now anonymized content.
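A hypothetical application-level header illustrates the idea: it carries a random message ID, a TTL and a destination alias, but no source address at all, and a broadcast leaves even the destination empty. The field layout, the 8-byte alias size and the function names are assumptions made for this sketch; on the wire, each hop's own TCP/IP envelope only ever shows the addresses of two neighboring peers.

```python
import secrets
import struct

# Hypothetical header: 16-byte message ID, 1-byte TTL, 8-byte destination alias.
# There is deliberately no source field; an all-zero alias marks a broadcast.
HEADER = struct.Struct("!16sB8s")
BROADCAST = bytes(8)


def new_alias() -> bytes:
    """Random virtual address that has no relation to any IP address."""
    return secrets.token_bytes(8)


def build_message(payload: bytes, ttl: int, dst_alias: bytes = BROADCAST) -> bytes:
    return HEADER.pack(secrets.token_bytes(16), ttl, dst_alias) + payload


def parse_message(data: bytes):
    msg_id, ttl, dst_alias = HEADER.unpack_from(data)
    return msg_id, ttl, dst_alias, data[HEADER.size:]


def relay(data: bytes):
    """What an intermediate peer forwards: same ID, destination and payload,
    TTL decremented. The originator's IP address never appears in the message."""
    msg_id, ttl, dst_alias, payload = parse_message(data)
    if ttl == 0:
        return None  # expired: drop instead of forwarding
    return HEADER.pack(msg_id, ttl - 1, dst_alias) + payload
```

A routed message would set `dst_alias` to the alias of the intended recipient, while a broadcast such as a search request keeps the default `BROADCAST` value and therefore names neither party.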

For senders and requesters of data to remain anonymous, the remaining problem is that each node always knows the IP addresses of its neighboring nodes, the nodes which are directly connected to it.

Therefore, if your neighboring nodes want to, they can still monitor which files you request or send and correlate this with your IP address. A spy (a government agent, your boss, the RIAA, or whoever) could add their own hub servers to the P2P net, ensure that many nodes connect directly to them, and then keep a log of who sends and requests what.

There are a few workable, but imperfect solutions to this dilemma:

only making outgoing connections through an HTTP/SOCKS/SSL proxy (Problem: a proxy cannot be fully trusted; proxy owners can be subpoenaed)
only making direct connections to "friendly peers" that you know and trust (Problem: your origin can probably not be identified, but it is hard to find friendly peers in the first place, and those peers might then be at risk instead)
exchanging sensitive data with a trusted peer over "SSL-through-SSL" (see the next section)

Confidentiality and integrity of distributed data

A weak point of open P2P networks in which anyone can contribute is the integrity of the data exchanged. Even with cryptographic or other protection, as in current distributed computing projects, it is questionable whether users can be prevented from reverse engineering their client or its traffic and manipulating it.

Providing data through a distributed network is not hard, but offering authentic data requires cryptographic signatures or other ways of guaranteeing authenticity.
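One way of guaranteeing authenticity that needs no signature code can be sketched with a content hash: if a trusted source publishes the SHA-256 digest of a file, a download from any untrusted peer can be checked against it. This swaps in hashing for the signatures mentioned above, and it only moves the trust problem to the channel that delivers the digest, which must itself be authentic. The function name is hypothetical.

```python
import hashlib


def verify_download(data: bytes, trusted_sha256_hex: str) -> bool:
    """Return True only if data received from untrusted peers matches the
    digest obtained from a trusted source (e.g. a secure web site)."""
    return hashlib.sha256(data).hexdigest() == trusted_sha256_hex.strip().lower()


published = hashlib.sha256(b"original document").hexdigest()  # obtained from the trusted source
print(verify_download(b"original document", published))   # True
print(verify_download(b"tampered document", published))   # False
```

Because any peer can serve the bytes, this lets a client accept a file from whichever peer is fastest while still rejecting manipulated copies.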

SSL-through-SSL using trusted peers, a technology developed by Hacktivismo and first implemented by our team member Paul B., is one possible solution to the problem of data integrity and authenticity. It also solves the problem of protecting secrets while still being able to send them over untrusted peers.

Public keys or certificates must be exchanged between Peer 1 (Requesting Peer) and Peer 5 (Trusted Peer). While the protocol header of the P2P packet can and must be read by each peer, the payload part is encrypted with the trusted peer's key and stays encrypted while being routed by middlemen peers.
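The layering can be sketched with the Python standard library only, assuming that Peer 1 and Peer 5 already share two secret keys derived from the exchanged public keys (the key agreement itself is not shown). Because the standard library has no block cipher, the sketch uses HMAC-SHA256 in counter mode as a keystream plus an HMAC tag over the ciphertext (encrypt-then-MAC). This is a toy construction for illustration, not Hacktivismo's implementation, and a real client should use a vetted cryptographic library. The header layout (8-byte destination alias, 1-byte TTL) and all function names are hypothetical, and the hop-by-hop SSL between neighbors is omitted.

```python
import hashlib
import hmac
import secrets
import struct

# Toy end-to-end payload protection for illustration only: use a vetted library in practice.
NONCE_LEN, TAG_LEN = 16, 32
HEADER = struct.Struct("!8sB")  # hypothetical plaintext header: destination alias, TTL


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256(key, nonce || counter) blocks, concatenated and truncated
    blocks = [hmac.new(key, nonce + struct.pack("!Q", c), hashlib.sha256).digest()
              for c in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Requesting peer: encrypt the payload, then authenticate the ciphertext."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Trusted peer: check the tag, then decrypt."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("payload too short")
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("payload was modified in transit")
    return bytes(c ^ k for c, k in zip(ct, _keystream(enc_key, nonce, len(ct))))


def make_packet(dst_alias: bytes, ttl: int, sealed: bytes) -> bytes:
    return HEADER.pack(dst_alias, ttl) + sealed


def middleman_forward(packet: bytes):
    """Intermediate peer: reads and updates the header, cannot read the payload."""
    dst_alias, ttl = HEADER.unpack_from(packet)
    if ttl == 0:
        return None
    return HEADER.pack(dst_alias, ttl - 1) + packet[HEADER.size:]


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)  # shared by Peer 1 and Peer 5
packet = make_packet(b"peer-5", 4, seal(enc_key, mac_key, b"the secret request"))
for _ in range(3):                                      # three middleman hops
    packet = middleman_forward(packet)
print(unseal(enc_key, mac_key, packet[HEADER.size:]))  # b'the secret request'
```

The middlemen can route the packet because the header is in the clear, but any change they make to the sealed payload is detected by the trusted peer.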

In an SSL-through-SSL transmission, privacy is provided beneath the application layer and is limited to the payload, which makes it independent of the application.

Trusting SSL in a distributed environment

To prevent man-in-the-middle attacks against SSL, the trusted peer's key or certificate must itself be obtained through a channel that ensures authenticity, for example by downloading it from a secure, certified web site.

If certificates or public keys in a P2P network are neither distributed out of band nor at least signed by a trusted CA, the identity of the keyholding peer cannot be trusted. The keys can then at best be used to encrypt traffic against monitoring by third parties *outside* of the peer-to-peer network.
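Where a key or certificate has been obtained out of band, a client can pin it: it compares the SHA-256 fingerprint of the certificate a peer presents with the fingerprint obtained through the authentic channel. The sketch below uses Python's `ssl` module; `fingerprint`, `matches_pin` and `connect_pinned` are hypothetical names, and CA validation is switched off deliberately because the pin replaces it. That is only safe if the pinned fingerprint itself is authentic.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare a presented certificate with the fingerprint obtained out of band."""
    return hmac.compare_digest(fingerprint(der_cert), pinned_hex.strip().lower())


def connect_pinned(host: str, port: int, pinned_hex: str) -> ssl.SSLSocket:
    """Connect to a peer over SSL and keep the connection only if the peer's
    certificate matches the pinned fingerprint. CA and hostname checks are
    disabled on purpose: the pin replaces them."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be turned off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection((host, port))
    try:
        sock = ctx.wrap_socket(raw)
    except Exception:
        raw.close()
        raise
    der = sock.getpeercert(binary_form=True)  # DER bytes, or None if no certificate
    if der is None or not matches_pin(der, pinned_hex):
        sock.close()
        raise ssl.SSLError("peer certificate does not match the pinned fingerprint")
    return sock
```

A self-signed peer certificate is then acceptable, but a substitute certificate presented by a man in the middle fails the comparison and the connection is closed.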

The Hacktivismo project

The Hacktivismo team was brought together by the cDc (Cult of the Dead Cow). We are developing a distributed application with the goal of defeating censorship and surveillance technologies.

We have decided to do this because of growing international censorship: to give people access to free information, and to prevent legal action against people in unfree countries who try to access politically uncomfortable material.

At a certain point of development, Hacktivismo will go open source and also publish detailed information and documentation.

In general terms, our program provides a decentralized peer-to-peer network for doing fully anonymous proxy downloads of files and web sites, instead of locally storing and sharing files.

Our goal is to make all traffic anonymous and stealthy enough to bypass any firewall or censorship system from the "inside", i.e. from totalitarian countries.

We are working with a handful of well-known developers and other people, as well as human rights groups, on the goal of finishing and safely distributing this application to the people who need it.

Questions or comments about the application
Ideas for improvement about evading traffic detection and tampering
General questions