\chapter{Peer-to-peer Networks and Darknets}
\section{Peer-to-peer (P2P) Networks}
This section gives a short history of the Internet leading to the development and spread of the peer-to-peer network architecture. The key concepts and properties of peer-to-peer systems are explained, providing a basic understanding of the differences and advantages of the peer-to-peer network design.
\subsection{From classic client-server architecture to distributed networks}
The Internet emerged from several military and science research networks, with ARPANET being the most commonly known. It was planned and designed as a decentralized telecommunication network resilient to outages. Although it is resilient on the network level, most of its services have a centralized structure. On failure the whole network will not go down, but a single service, such as a website like \emph{www.tu-darmstadt.de} or its email system, can easily become unavailable.

This originates from the client-server architecture used by most protocols and services. The client tries to find a single server to send its request to. The server processes the request and sends the response back. If the server fails after the client has sent its request and before a response can be sent, the request fails too. There are several methods in protocol and network design to prevent such failures.

The basic approach is to divide the responsibilities into different independent domains. For example, an email server is not responsible for all the email in the world but only for that of one (or more) domains. Note that such a domain does not necessarily mean a domain name in the DNS system but can be an arbitrary domain such as a company, a university, or just a group of people.

The next step is to reduce what a single component is responsible for. The most important approaches are the avoidance of single points of failure and the introduction of redundancy and load balancing, which distributes requests over different servers. However, all approaches on this level scale very badly, so achieving fault tolerance or handling a large number of clients or requests is expensive.

Taking the splitting of responsibility to the maximum, every entity is responsible for everything. This is basically the idea behind the peer-to-peer architecture, in which every client is simultaneously a server. Naturally not every client can be responsible for everything, but each of them participates in the service the network provides, serving a fraction of it, while for each fraction there are multiple, redundant ``servers''. This both reduces the load of every single server and simultaneously adds redundancy. Since there is commonly no distinction between clients and servers in a P2P network, a participant is called a node or peer.
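As an illustration, the following minimal Python sketch shows one way such a split of responsibility could look: peers and data items are mapped into a common ID space, and the few peers whose IDs are closest to an item act as its redundant ``servers''. All names and parameters are made up for this example and not taken from a concrete system.
\begin{verbatim}
import hashlib

ID_BITS = 32  # size of the example ID space, chosen arbitrarily for this sketch


def to_id(name):
    """Map an arbitrary name (peer or data item) into the ID space."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def responsible_peers(item, peer_ids, redundancy=3):
    """Pick the peers whose IDs are closest to the item's ID.

    Every peer only serves the fraction of items mapped near its own ID,
    and every item is served by several peers at once.
    """
    item_id = to_id(item)
    return sorted(peer_ids, key=lambda p: abs(p - item_id))[:redundancy]


peers = [to_id("peer-%d" % i) for i in range(20)]
print(responsible_peers("report.pdf", peers))
\end{verbatim}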
\subsection{The rise of P2P systems and their usage today}
With the Domain Name System (DNS) and the Simple Mail Transfer Protocol (SMTP), some parts of the peer-to-peer design have been in use since the early days of the Internet. Beyond that, P2P ideas were researched, developed, and used on a smaller scale for a long time. But it took until the late '90s and file sharing networks like Napster to popularize the P2P concept.

P2P networks experienced a huge increase in participation and usage when copyright-protected material started to be shared over them, which gave them an infamous image. The term P2P suffered from this for quite some time. Only in the past few years have several larger companies begun to utilize peer-to-peer systems, e.g. for delivering large sets of data to a huge number of customers. This has improved the acceptance and the image of P2P technology.

Today P2P systems are increasingly used as a resilient and scalable basis for communication, content delivery, and distributed storage. These benefits are exploited both in the open and in legal twilight zones or even outright illegality.
\subsection{What a peer-to-peer overlay is}
The Internet connects many devices worldwide over different, commonly multiple, connections. In this global network nodes can take part in ``virtual'' networks on top of the underlying network. In such an overlay network, nodes are connected by logical links, each of which has a corresponding path in the underlying network. Therefore the P2P network graph is a subset of the underlying network's topology graph. \todo{Figure on overlays, possibly from Wikipedia? License?}

Commonly a separate addressing scheme is used for communication in such a network. In most cases a node is identified by just a sequence of bits, its ID. It is usually represented as a number in decimal or hexadecimal form. All possible IDs together are called the address or ID space; its size varies from network to network but is constant within one network.

Both these properties together form a more or less independent network put on top of the underlying one. This is called a peer-to-peer overlay.
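A minimal Python sketch of these two ingredients, assuming node IDs derived by hashing and plain strings standing in for addresses in the underlying network (both choices are made up for this example), could look as follows:
\begin{verbatim}
import hashlib

ID_BITS = 160  # size of the ID space; constant within one network


def make_id(seed):
    """An ID is just a sequence of bits, here derived from a seed string."""
    return int.from_bytes(hashlib.sha1(seed.encode()).digest(), "big")


class OverlayNode:
    def __init__(self, underlay_address):
        # Identity in the underlying network, e.g. an IP address.
        self.underlay_address = underlay_address
        # Identity in the overlay's own address space.
        self.node_id = make_id(underlay_address)
        # Logical links of the overlay: node ID -> underlay address.
        self.neighbors = {}

    def connect(self, other):
        """Establish a logical overlay link; the underlying network
        provides the actual path between the two addresses."""
        self.neighbors[other.node_id] = other.underlay_address
        other.neighbors[self.node_id] = self.underlay_address


a = OverlayNode("192.0.2.10")
b = OverlayNode("198.51.100.7")
a.connect(b)
print(hex(a.node_id), a.neighbors)
\end{verbatim}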
\section{Darknets: privacy preserving P2P overlays}
In the last section we gave a brief overview of the key aspects of peer-to-peer systems and their evolution over time. Like any tool or technology, P2P systems have advantages and disadvantages and can be used for good or evil. Now we will discuss how the demand for privacy led to a more specialized class of peer-to-peer networks, the darknets.
\subsection{Consequences of decentralisation}
As discussed in section 2.1, the essence of P2P systems is their decentralisation. Not a single server but virtually every participant of the system is responsible for serving requests, which results in a high failure resilience. While this is a valuable property for its users, it is problematic for anyone trying to stop the service of such a network. Single nodes can clearly be filtered or blocked by their ISP or a firewall. However, this does not prevent the unauthorized distribution of copyright-protected material or repressed communication, as there will still be many nodes left.
\subsection{The demand for privacy preservation}
The distribution of objectionable information can therefore not be effectively prevented, but the operator of a node could still be prosecuted if their membership and identity are revealed. In the file sharing networks in use, the communication between two nodes is indeed encrypted and not readable by outsiders, but in order to exchange information the nodes have to connect to each other. Therefore their identities, on the Internet commonly the IP addresses used at a given time, are revealed to each other.

This is not limited to file sharing networks. In contrast to content delivery, any communication a regime deems undesirable can be punishable. This applies to many cases, for example a whistleblower in a misbehaving company or regime critics under a dictatorship. For a free society, freedom of speech and the ability to inform oneself are important. Therefore it is essential to be able to communicate without surveillance and obstruction.
\subsection{Key concepts of darknets}
For such oppressive environments a more privacy-preserving class of peer-to-peer networks has evolved, the darknets. In the following, the main differences to classical P2P systems are briefly noted; they are explained in detail in chapter 3.

Exchanging any kind of information on the Internet requires directly connecting at least two participants. As explained, this reveals their identities in the underlying network to each other. To limit the impact of this privacy-relevant information disclosure, connections are only established between mutually known and trusted parties.

In addition to communicating only with known and trusted peers, no information about which peers a node is connected to is passed on. If a node is compromised, only the identities of its directly connected peers are affected. Moreover, no node can be held responsible for contact with other nodes, since not even participants know to which peers a node is connected.

Since communicating only with trusted peers would result in a very limited reachability, communication between nodes that are not directly connected has to be forwarded in some way. When doing so, the identity in the underlying network and information about the topology, i.e. who is connected to whom, have to be concealed.

To achieve this, forwarded messages are modified to appear to originate from the forwarding node itself. Returning answers are modified accordingly and are passed back to the source of the original request. \todo{Figure illustrating this?} As this is done on every node, the identity of the origin of a message is protected and no information about the topology is revealed.
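The following Python sketch illustrates this forwarding scheme in a strongly simplified form. It is an illustrative model only, not the protocol of any particular darknet; in particular, the choice of the next hop and the handling of the actual destination are omitted or trivialized.
\begin{verbatim}
class DarknetNode:
    def __init__(self, name):
        self.name = name
        self.peers = {}    # trusted peers only: name -> node object
        self.pending = {}  # request id -> peer the request came from

    def add_peer(self, other):
        self.peers[other.name] = other
        other.peers[self.name] = self

    def send_request(self, request_id, payload, via):
        # The next hop only ever sees this node as the source.
        self.peers[via].receive_request(request_id, payload, source=self.name)

    def receive_request(self, request_id, payload, source):
        if request_id in self.pending:
            return  # already seen: drop to avoid forwarding loops
        # Remember which peer to pass the answer back to ...
        self.pending[request_id] = source
        # ... then forward the request as if it originated here.
        # (A real system would decide where to forward; here we simply
        # pick any trusted peer other than the one we got it from.)
        next_hop = next((p for p in self.peers if p != source), None)
        if next_hop is not None:
            self.peers[next_hop].receive_request(request_id, payload,
                                                 source=self.name)

    def receive_answer(self, request_id, payload):
        # Answers travel back hop by hop along the stored reverse path,
        # so every node again only talks to its direct, trusted peers.
        previous_hop = self.pending.pop(request_id, None)
        if previous_hop is not None:
            self.peers[previous_hop].receive_answer(request_id, payload)
\end{verbatim}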
\section{Darknet characteristics and resulting challenges}
The previous section explained the methodical differences between darknets and classical P2P networks. Now follows, although already briefly touched upon, a discussion of the resulting characteristics and the challenges they pose for the practical use of darknets.
\subsection{Hidden Topology}
In summary, the membership of a node in a darknet is only known to its trusted peers. Conversely, all a node knows about the topology is the list of its online peers. If the destination node of a message is not contained in this list, the node has, at least in the basic darknet model, no clue where to send it.

A similar situation can be found in wireless sensor networks (WSNs), where the topology of the network changes and is unclear to the individual nodes. This is caused by the limited range of the nodes and their constraints on energy consumption. To overcome this problem, WSNs often use the information they can extract from forwarded messages: the rough direction of the source node of a message. In darknets this is prevented by the hop-by-hop anonymity.
\subsection{Difficulties for routing ...}
This high degree of protection of privacy-relevant information comes with numerous difficulties in designing and evaluating darknets that should be simultaneously resilient and scalable. When sending a message to a node that is not a directly connected neighbor, a routing decision has to be made. In conventional networks the next node can be chosen on the basis of topology information about the network, e.g. in the form of a classical routing table or structured overlays in P2P systems.

Because any topology information about the network is confidential, it is not distributed. Therefore the topology of a darknet can in general not be used to make routing decisions. The same holds for meta-topology information such as the origin of a message, from which the approximate direction of a node could be estimated.
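In the basic model a node can therefore do little better than hand a message to one of its own trusted neighbors without knowing whether this brings it closer to the destination. The following sketch of a bounded random walk only illustrates this baseline and its limits; it is not the routing algorithm of any specific darknet.
\begin{verbatim}
import random


def random_walk_route(graph, start, destination, max_hops=20):
    """Forward a message along trusted links by a bounded random walk.

    `graph` maps every node to the list of its trusted neighbors; no node
    uses any knowledge beyond its own neighbor list.
    """
    current = start
    for hop in range(max_hops + 1):
        if current == destination:
            return hop          # delivered after `hop` forwards
        neighbors = graph.get(current, [])
        if not neighbors:
            return None         # dead end
        current = random.choice(neighbors)
    return None                 # hop limit reached, give up


# Tiny example topology (names are arbitrary):
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
print(random_walk_route(graph, "A", "D"))
\end{verbatim}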
\subsection{... and evaluation}
Comparable difficulties arise when measuring any kind of quality while evaluating darknets. For example, it is important to be able to compare the path a message travels along with the best possible path from its source to its destination, however the quality of a path is measured. This contradicts the protection of privacy so fundamentally that it is virtually impossible to observe and analyze a real-world darknet without disrupting its actual use case. So unlike other networks, both ``normal'' and P2P ones, some kind of extra study environment is required to develop and improve darknets.