
TAZ Servers and the Rewebber Network: Enabling Anonymous Publishing on the World Wide Web, by Ian Goldberg and David Wagner

My apologies for sending HTML. It suffices to follow this URL:


In connection with WebBlock, this is the death blow for this kind of
control technology: with encrypted URLs, it is impossible to tell
which information is being retrieved...

Nice idea,

Kristian Koehntopp, SH Online Dienst GmbH, Siemenswall, 24107 Kiel
"The problem with most computer languages is that all they do is lengthen
 your resume by a word and a comma, but Perl really has made a lot of people's
 resumes more valuable." -- Larry Wall, Techweb Interview


The World Wide Web has recently matured enough to provide everyday users with an extremely cheap publishing mechanism. However, the current WWW architecture makes it fundamentally difficult to provide content without identifying yourself. We examine the problem of anonymous publication on the WWW, propose a design suitable for practical deployment, and describe our implementation. Some key features of our design include universal accessibility by preexisting clients, short persistent names, security against social, legal, and political pressure, protection against abuse, and good performance.



Existing Technology
The Problem at Hand
Related Work
Future Work
Notes



Recently the Internet has seen tremendous growth, with the ranks of new users swelling at ever-increasing rates. This expansion has catapulted it from the realm of academic research towards new-found mainstream acceptance and increased social relevance for the everyday individual. Yet with this suddenly increased reliance on the Internet comes the threat that freedoms previously enjoyed may be overlooked in the online realm.

The World Wide Web (WWW) has recently grown in importance to become perhaps the most important Internet application today. The WWW enables the ordinary citizen to reach an audience potentially as large as millions. In short, this technology gives us an extremely cheap publishing mechanism: printing presses for the masses.

Yet we must not forget that anonymous publishers have played an important role throughout the history of publication. Freedom of anonymous speech is an essential component of free speech, and freedom of speech is a critical part of any healthy democracy. For instance, the United States Supreme Court has consistently upheld protection for anonymous publication of political speech. As Justice Stevens wrote in the McIntyre v. Ohio Elections Commission majority opinion,


Under our Constitution, anonymous pamphleteering is not a pernicious, fraudulent practice, but an honorable tradition of advocacy and of dissent. Anonymity is a shield from the tyranny of the majority.

We should be careful not to forget the importance of anonymous speech as we move to an increasingly WWW-centric paradigm of communication.

Today the WWW includes little support for anonymous publishing. The current WWW architecture fundamentally includes identity information in the URL that is used to locate published documents, and it is very hard for a WWW publisher to avoid revealing this information. To address these limitations, we would like to build technology to enable anonymous publishing on the World Wide Web.




We briefly motivate this work with a few potential applications for anonymous WWW publishing. We hope these will help provide some of the general flavor for the requirements and illustrate the goals of such a publication service.

First, we note that anonymous publication is a time-honored tradition. The Founding Fathers of the United States published the Federalist Papers (arguing for the adoption of the Constitution) anonymously under the pseudonym Publius. Before the American Revolution, many had to publish secretly because of the very real danger of English prosecution.

Across the world, from medieval times to the current age, governments have tried to suppress unwanted speech. Political dissidents have often turned to secret printing and distribution of anonymous leaflets, books, and other publications. Moreover, dissidents have frequently taken advantage of technology for expression of controversial ideas: the Protestant Reformation was greatly aided by the invention of the printing press, which enabled widespread distribution of many copies of the Bible; the French "Voice of the Resistance" used nightly radio broadcasts from constantly-changing temporary locations to reach the people during the German occupation; the United States used high-power radio stations such as Radio Free Europe during the cold war to combat censorship behind the Iron Curtain; in past years, banned information was copied through underground channels from person to person in the Soviet Union, in a process known as samizdat (which is Russian for "self-publishing"); and when the Serbian government began jamming the Belgrade independent radio station during the Serb-Croat war, Serbian students used the World Wide Web to mirror broadcasts and combat the government's censorship [1].

Our vision combines many key aspects of these historically-successful approaches. First, we want to take advantage of the highly-favorable economics and vast potential audience available to WWW publishers. Second, we wish to use anonymity to avoid suppression of controversial ideas: when the targeted party cannot be identified, it is much harder to bring legal and other pressure on the author or printer. Third, we aim to gain resilience through the grass-roots nature of samizdat channels. Fourth, because government bureaucracy often limits the speed with which full legal and other pressure can be brought to bear on targets to time periods measured in months or years, we wish to enable temporary stations to pop up and disappear just as rapidly. Through a combination of these techniques, we hope to build a robust infrastructure for anonymous WWW publishing.



A few definitions are in order. Anonymity refers to the ability of an individual to prevent others from linking his identity to his actions. We can divide anonymity into two cases: persistent anonymity (or pseudonymity), where the user maintains a persistent online persona ("nym") which is not connected with the user's physical identity ("true name"); and one-time anonymity, where an online persona lasts for just one use. The key concept here is that of linkability: with a nym, one may publish a number of documents that are all linked together but cannot be linked to the author's true name; by using one-time anonymity for each work, none of the documents can be linked to each other or to the user's physical identity. Forward secrecy refers to the inability of an adversary to recover security-critical information (such as the true name of the writer of a controversial article) "after the fact" (e.g. after the writing is published); where possible, providers of anonymity services should take care to provide forward secrecy, which entails (for instance) keeping no logs.

We can identify a number of goals for a potential anonymous WWW publishing infrastructure. First, we should be clear that we only concern ourselves with the problem of anonymity for the publisher. In contrast, some existing services, such as the Anonymizer proxy [2], aim to provide privacy for the user who is viewing WWW pages; we explicitly do not attempt to solve that problem. We note that the client privacy problem is orthogonal to the anonymous publishing problem, and that the two problems compose well: if full anonymity is needed (to protect the identity of both endpoints), techniques to anonymize the client will work well in tandem with an infrastructure for anonymous WWW publishing.

We should also point out another problem we explicitly avoid solving. Our infrastructure may reveal the fact that a particular server holds anonymous published data, so long as it does not reveal the data itself or allow the document to be linked to the author. In short, documents and authors may each be individually recognized as anonymous, but we aim to prevent the possibility of linking any one document with its author. As long as there are sufficiently many anonymous authors (some of whom are respected citizens) and sufficiently many anonymous documents (some of which are non-controversial and clearly harmless), there will be "safety in numbers": even if Alice is identified as an author who has published something anonymously, Alice will retain very strong plausible deniability protections - the infrastructure will give authorities no reason to suspect that she is involved in writing anything other than harmless documents.

We view anonymous WWW publishing as a problem which can benefit from the experience, approaches, and perspectives of computer security researchers. In particular, we attempt to identify the relevant threats, design robust countermeasures, take advantage of defense in depth, and apply other standard security engineering techniques.

One important goal is to avoid the existence of a central single point of failure. Any such single point of failure becomes a very attractive target - a sitting duck for the motivated adversary. Instead, we strongly prefer to distribute trust to obtain resilience in the event that a few network nodes are compromised or colluding. We wish to see a means by which users can protect their privacy, preferably by putting privacy-enhancing technology directly into their own hands inasmuch as is possible. Where the cooperation of others is necessary to ensure personal privacy, the system should not be easily subverted by the mere collusion or compromise of a few participants.

A corollary is that we should attempt to minimize the length of time which a node possesses security-critical information. By limiting the exposure of information that would reveal the identity of publishers, we reduce the severity of the risk of compromise. In other words, supporting forward secrecy is prudent security practice.

We must recognize that in practice there will be tradeoffs between security and other factors such as performance, availability, stability, etc. Different publishers will likely require different levels of security, and the only party who can soundly evaluate the appropriate degree of security is the publisher himself. Therefore, an important design goal is a flexible security strength "knob" under control of the individual publishers: publishers should be able to select the level of security that is appropriate for themselves.

At this point, it is probably appropriate to offer a quick summary of some possible threats to an infrastructure for anonymous WWW publishing. A minimal threat model might concern itself primarily with political, legal, and social attacks, and give less attention to traditional computer security attack techniques. For instance, an adversary might obtain a subpoena from a judge of relevant jurisdiction to compromise a targeted node in the distributed infrastructure. Even an adversary with few resources can recruit the power of the government to his advantage with such a tactic. Social pressure is a concern as well; carefully-fabricated complaints to a site administrator with power over the targeted node may also suffice.

A more cautious threat model will need to take into account the threat of computer intrusion and attacks on the underlying network. Typically, individual hosts are rather hard to secure, and in practice motivated penetration attempts will often succeed in compromising a significant number of nodes. Furthermore, unprotected network traffic is well-known to be easy to monitor, redirect, forge, and modify. In cryptographic circles, the adversary is often presumed to have complete control over the network; while that may seem a rather paranoid assumption, we know that any system that can stand up to that level of attack will provide very strong security. However, in practice such threat models may lead to unrealistic designs. Therefore, we ought to carefully consider the resources available to likely adversaries, and tailor the countermeasures to the anticipated threat level. It is probably realistic to assume that a limited number of nodes will be compromised, dishonest, or colluding; that the opponent may occasionally be able to successfully penetrate targeted nodes of his choosing; that the adversary may be eavesdropping on certain segments of the network; and that the opponent can perform traffic analysis (statistical analysis using only unencrypted header data such as source and destination addresses as well as timing coincidences) using the eavesdropped information. We will attempt to design an infrastructure that is flexible enough to admit variations that will work under a wide array of threat models.

Lest we get caught up in the details of obscure security threats, it is important to remember that security systems can only provide security if they are used. Therefore, we must take care to ensure that usability is not sacrificed without careful consideration. In particular, we want to preserve the vast potential audience provided by the World Wide Web, and so we require that WWW pages published anonymously using our infrastructure be accessible from standard WWW browsers without needing any special client software or anonymity tools. The anonymity services ought to be transparent to the browser software and WWW page reader. In short, enabling widespread deployment and use of this infrastructure is a central goal of this work.


Existing Technology

In this section, we describe some existing privacy-enhancing technologies for the Internet [3], in order to see what parts of the solutions they offer can be carried over to the problem of protecting the identity of WWW publishers.

In past years e-mail was the most important distributed application, so it should not be surprising that early efforts at bringing privacy to the Internet primarily concentrated on e-mail protection. Today the lessons learned from e-mail privacy provide a foundation of practical experience that is critically relevant to the design of new privacy-enhancing technologies.

The most primitive way to send e-mail anonymously involves sending the message to a trusted friend, who deletes the identifying headers and resends the message body under his identity. Another old technique for anonymous e-mail takes advantage of the lack of authentication for e-mail headers: one connects to a mail server and forges fake headers (with falsified identity information) attached to the message body. Both approaches could also be used for anonymous posting to newsgroups. Of course, these techniques don't scale well, and they offer only very minimal assurance of protection.

The technology for e-mail anonymity took a step forward with the introduction of anonymous remailers. An anonymous remailer can be thought of as a mail server which combines the previous two techniques, but using a computer to automate the header-stripping and resending process [4]. There are basically three styles of remailers; we classify remailer technology into "types" which indicate the level of sophistication and security.

The anon.penet.fi ("type 0") remailer was perhaps the most famous. It supported anonymous e-mail senders by stripping identifying headers from outbound remailed messages. It also supported recipient anonymity: the user was assigned a random pseudonym at anon.penet.fi, the remailer maintained a secret identity table matching up the user's real e-mail address with his anon.penet.fi nym, and incoming e-mail to the nym at anon.penet.fi was retransmitted to the user's real e-mail address. Due to its simplicity and ease of use, the anon.penet.fi remailer was the most widely used remailer; sadly, it was recently shut down after being harassed by legal pressure [5].

The disadvantage of an anon.penet.fi-style (type 0) remailer is that it provides rather weak security. Users must trust it not to reveal their identity when they send e-mail through it. Worse still, pseudonymous users must rely on the confidentiality of the secret identity table - their anonymity would be compromised if it were disclosed, subpoenaed, or bought - and they must rely on the security of the anon.penet.fi site to resist intruders who would steal the identity table. Furthermore, more powerful attackers who could eavesdrop on Internet traffic traversing the anon.penet.fi site could match up incoming and outgoing messages to learn the identity of the nyms. Worst of all, anon.penet.fi becomes a central single point of failure, and it is very dangerous to have such an attractive target.

Cypherpunk-style (type I) remailers were designed to address these types of threats. First of all, support for pseudonyms is dropped; no secret identity table is maintained, and remailer operators take great care to avoid keeping mail logs that might identify their users. This diminishes the risk of "after-the-fact" tracing. Second, type I remailers will accept encrypted e-mail, decrypt it, and remail the resulting message. This prevents the simple eavesdropping attack where the adversary matches up incoming and outgoing messages. Third, they take advantage of chaining to achieve more robust security. Chaining is a technique that involves sending a message through several anonymous remailers; we discuss chaining more completely below in "The Problem at Hand". This allows us to take advantage of a distributed collection of remailers; diversity gives one a better assurance that at least some of the remailers are trustworthy, and chaining ensures that one honest remailer (even if we don't know which it is) is all we need. Type I remailers can also randomly reorder outgoing messages to prevent correlations of ciphertexts by an eavesdropper. Note that this feature imposes substantial delays; such increased latency would of course be unacceptable for Web applications. In short, type I remailers offer greatly improved security over type 0, though they do have some limitations which we will discuss next.

The newest and most sophisticated remailer technology is the Mixmaster, or type II, remailer [6]. These remailers extend the techniques used in a type I remailer to provide enhanced protection against eavesdropping attacks. First, one always uses chaining and encryption at each link of the chain. Second, type II remailers use constant-length messages, to prevent passive correlation attacks where the eavesdropper matches up incoming and outgoing messages by size. Third, type II remailers include defenses against sophisticated replay attacks. Finally, these remailers offer improved message reordering code to stop passive correlation attacks based on timing coincidences (which adds still more latency). Because their security against eavesdropping relies on "safety in numbers" (where the target message cannot be distinguished from any of the other messages in the remailer net), the architecture also calls for continuously-generated random cover traffic to hide the real messages among the random noise.

Another new technology is that of the "newnym"-style nymservers. These nymservers are essentially a melding of the recipient anonymity features of an anon.penet.fi-style remailer with the chaining, encryption, and other security features of a cypherpunk-style remailer. A user obtains a pseudonym (e.g. joeblow@nym.alias.net) from a nymserver, and mail to that pseudonym will be delivered to him. However, unlike anon.penet.fi, where the nymserver operator maintained a list matching pseudonyms to real e-mail addresses, newnym-style nymservers only match pseudonyms to "reply blocks": the nymserver operator does not have the real e-mail address of the user, but rather sends the e-mail through a fixed chain that eventually ends up at the real user. We will use a similar technique in the section "The Problem at Hand".

With the increasing sophistication in remailer technology, we find that modern remailers have been burdened with a correspondingly complicated and obscure interface. To deal with this unfriendly mess, client programs have sprung up to provide a nicer interface to the remailers. Raph Levien's premail [7] is the archetypical example. Even so, using remailers still requires some knowledge; for even greater user-friendliness, we need this support to be integrated into popular mail handling applications. The need for specialized clients has made it difficult to widely deploy support for anonymous e-mail; from this experience, we conclude that an anonymous WWW publishing infrastructure should not require special tools to browse WWW pages created by anonymous authors.

The "strip identifying headers and resend" approach used by remailers has recently been applied to provide anonymity protection for Web browsing as well. Community ConneXion has sponsored the Anonymizer [8], an HTTP proxy that filters out identifying headers and source addresses from the WWW browser, allowing users to surf the WWW anonymously without revealing their identity to WWW servers. However, the Anonymizer offers rather weak security - no chaining, encryption, log safeguarding, or forward secrecy - so its security properties are roughly analogous to those of type 0 remailers. Other implementations have since appeared based on the same approach [9].


The Problem at Hand

We now turn our attention to adapting the above-mentioned technologies to the problem of anonymous WWW publishing. There are two important distinctions that must be made between the e-mail and WWW realms:


  • The WWW is an interactive medium, while e-mail is store-and-forward;


  • More notably, e-mail is a "push" technology; the source of the data initiates the transfer, possibly without even the knowledge or consent of the receiver. By contrast, HTTP is a "pull" technology; the intended receiver of the data explicitly requests it from the sender.

Some of the security in remailer technology is obtained by randomly reordering and delaying messages (on time scales ranging from minutes to days). The first point observes that these delays are unacceptable for WWW requests. The second point, however, offers us some extra benefit. As we shall see, the security of a system such as ours or of the remailer network increases as the number of publicly-accessible cooperating nodes increases. In the realm of e-mail, operators of remailers often come under fire when their services are abused by people sending threatening letters or untraceable spam; the undesirability of handling irate end-users causes the number of remailers to stay low, potentially impacting the security of the overall system. By contrast, an HTTP server cannot initiate a connection with an unwilling client and send it data when no request was made. This "consensual" nature of the WWW should cause fewer potential node operators to become discouraged, and lead to corresponding increases in security.





Here, then, is the plan. We situate around the world a number of nodes which we call rewebbers (collectively, these nodes are known as the rewebber network). Each node is essentially an HTTP proxy which understands "nested" URLs (i.e. URLs of the form http://proxy.com/http://realsite.com/); most existing HTTP proxies have this behavior. The basic idea is then to publish a URL such as the one above, which points to proxy.com instead of realsite.com.



Figure 1: Schematic diagram of a locator: boxes with a subscript indicate encryption with a public key; the K's are DESX keys to use to decrypt the data that is returned. A, B, and C are sites running rewebbers.

An obvious objection at this point is that this does not hide the real server name. We solve this problem by using public-key cryptography to encrypt the real server name so that only the rewebber can read it. In this case, the URL would look something like http://proxy.com/!RFkK4J...d_4w/. The rewebber at proxy.com, upon receiving this URL, would first decrypt it (the fact that it is encrypted is indicated by the leading ! in lieu of the expected http://), and then proceed to retrieve the nested URL in the normal fashion. We call such encrypted URLs locators.
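As a minimal sketch of how a rewebber might tell the two request forms apart, consider the following helper. This is hypothetical code, not from the paper: the function name is our own, the ciphertext string is a placeholder, and real locator handling would of course decrypt the blob with the rewebber's private key.

```python
from urllib.parse import unquote

def classify_rewebber_path(path: str):
    """Classify a request path seen by a rewebber.

    "/http://realsite.com/" -> an ordinary nested URL to proxy;
    "/!<ciphertext>/"       -> an encrypted locator, openable only
                               with this rewebber's private key.
    (Hypothetical helper, not code from the paper.)
    """
    inner = unquote(path.lstrip("/"))
    if inner.startswith("http://"):
        return ("nested-url", inner)
    if inner.startswith("!"):
        return ("encrypted-locator", inner[1:].rstrip("/"))
    raise ValueError("not a rewebber request")

print(classify_rewebber_path("/http://realsite.com/"))
print(classify_rewebber_path("/!EXAMPLECIPHERTEXT/"))
```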

This mechanism hides the real location of the server from the client, but has other flaws. First of all, once the client has retrieved the data from the rewebber, he could use one of the more aggressive WWW search engines to try to find where the document originally came from. We solve this problem by encrypting the document before storing it on the server. Thus, if the document is accessed directly (through a search engine, for example), it will be random data (but see the section "Future Work" below for an interesting extension). The encryption used here ought not be public-key; it should be a symmetric-key algorithm.

We used DESX [10] in our implementation, but any strong symmetric-key algorithm would do. Further references to DESX should be taken as referring to whatever symmetric-key algorithm is in use. The DESX key is given to the rewebber in the encrypted part of the locator; that is, when the rewebber decrypts the locator, it finds not only a URL to retrieve, but a DESX key with which to decrypt the document thus retrieved. It then passes the decrypted document back to the client.
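The per-hop behavior just described can be sketched as follows. This is a hypothetical illustration: since DESX is not available in the Python standard library, a SHA-256-based XOR keystream stands in for it, and `fetch` is a placeholder for the actual HTTP retrieval.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher standing in for DESX: XOR with a
    SHA-256-derived keystream.  Encrypting and decrypting are the
    same operation."""
    out, ctr = bytearray(), 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + ctr.to_bytes(8, "big")).digest())
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, out))

def rewebber_handle(decrypted_locator, fetch):
    """One rewebber hop: the decrypted locator yields a symmetric
    key and an inner URL; fetch the (still encrypted) document and
    decrypt it before passing it back toward the client."""
    key, inner_url = decrypted_locator
    return keystream_xor(key, fetch(inner_url))

# Hypothetical usage: the publisher stores only ciphertext on the server.
key = b"per-document-symmetric-key"
plaintext = b"the published document"
stored_ciphertext = keystream_xor(key, plaintext)
recovered = rewebber_handle((key, "http://realsite.com/doc"),
                            fetch=lambda url: stored_ciphertext)
assert recovered == plaintext
```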

This technique of encrypting the document stored at the server has another benefit as well: the document can be padded in size before being encrypted, and the rewebber will truncate the document to its original size before passing it back to the client. The reason for doing this is similar to that for encrypting the document in the first place; if the retrieved document is 2843 bytes long, for example, a WWW search for encrypted documents near that size would quickly narrow down the possible choices. To thwart this, we always add random padding to the end of encrypted documents so that their total length is one of a handful of fixed sizes. We chose sizes of 10000, 20000, 40000, 80000, etc. bytes, as data in [11] indicate that the median size of files on the WWW is a little below 10000 bytes.
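A minimal sketch of the padding step, under the assumption (ours, not the paper's) that the original length travels in a 4-byte prefix inside the payload so the rewebber knows where to truncate:

```python
import os

# Fixed target sizes from the paper: 10000, 20000, 40000, 80000, ... bytes.
SIZES = [10000 * 2 ** i for i in range(6)]

def pad_document(doc: bytes) -> bytes:
    """Frame the document with a 4-byte length prefix (our own
    hypothetical framing), then pad with random bytes up to the next
    fixed size, so sizes reveal little about the original document."""
    framed = len(doc).to_bytes(4, "big") + doc
    target = next(s for s in SIZES if s >= len(framed))
    return framed + os.urandom(target - len(framed))

def unpad_document(padded: bytes) -> bytes:
    """Truncate back to the original length, as the rewebber does
    before returning the document to the client."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]

doc = os.urandom(2843)            # the 2843-byte example from the text
padded = pad_document(doc)
assert len(padded) == 10000       # hidden among all sub-10000-byte documents
assert unpad_document(padded) == doc
```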

We do not use latency-intensive techniques such as full Chaumian mixes, message reordering, or injecting random delays; interactive WWW browsing is too delay-sensitive to allow those techniques. This is unfortunate, but we feel that the resulting decrease in security is somewhat mitigated by some other features of our system.

If the example above were used as it stood, the rewebber would see both the real URL of the document and the decrypted document itself. Thus, if the rewebber is compromised (if its private key leaks out, if it is forced to reveal information, or even if it had been operated by an adversary all along), it could reveal the mapping between the real site and the document. The solution is to implement chaining, in the same manner as the type I remailers mentioned earlier.

To implement chaining, we simply replace the URL in the encrypted portion of the locator with another complete locator, one which points to a rewebber at a different site (and preferably, in a different legal jurisdiction). We can repeat this process, thus making the rewebber chain listed in the locator (in encrypted form) as long as we like. A schematic diagram of a locator containing a rewebber chain of length 3 appears in Figure 1. In the next subsection, we identify tradeoffs associated with choosing a length for a rewebber chain.

When using chaining, the document will need to be multiply encrypted on the server. To do this, the publisher randomly selects a DESX key for each rewebber in the chain, encodes them into the locator as depicted in Figure 1, and iteratively encrypts the document. In this way the publisher forms a locator and announces it to the world (by using an anonymous remailer to post it to a newsgroup, for example). Note that all of the security parameters, including the number of rewebbers in the chain, are under the control of the publisher; individual publishers may adjust this "knob" to suit their anonymity needs.
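The publisher-side construction can be sketched end-to-end. Everything here is illustrative: XOR keystreams stand in for both DESX and the rewebbers' public-key encryption, and pickle stands in for a real wire encoding. The point is only the layering, in which each hop can read just its own DESX key and the next destination.

```python
import hashlib
import os
import pickle

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy self-inverse cipher: XOR with a SHA-256 keystream.  Stands
    in for DESX and for the rewebbers' public-key encryption."""
    out, ctr = bytearray(), 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + ctr.to_bytes(8, "big")).digest())
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, out))

def build(document: bytes, final_url: str, node_keys):
    """Publisher side.  node_keys lists the chain with the
    client-facing rewebber first.  Returns (locator, stored_doc):
    the locator to announce and the multiply encrypted document
    to place on the WWW server."""
    desx_keys = [os.urandom(16) for _ in node_keys]
    # Document layers: the client-facing hop's key goes innermost,
    # so each hop on the return path strips exactly one layer.
    stored = document
    for dk in desx_keys:
        stored = xor_cipher(dk, stored)
    # Locator layers: built innermost-first, so the outermost layer
    # is readable only by the first hop.
    inner = ("url", final_url)
    for nk, dk in zip(reversed(node_keys), reversed(desx_keys)):
        inner = ("loc", xor_cipher(nk, pickle.dumps((dk, inner))))
    return inner, stored

def retrieve(locator, node_keys, server):
    """Simulate the chain: each hop opens its locator layer, forwards
    the request, and strips one DESX layer from the returned data."""
    _, blob = locator
    dk, nxt = pickle.loads(xor_cipher(node_keys[0], blob))
    if nxt[0] == "url":
        data = server(nxt[1])
    else:
        data = retrieve(nxt, node_keys[1:], server)
    return xor_cipher(dk, data)

# A 3-hop chain A -> B -> C, as in Figure 1.
node_keys = [b"private-key-A", b"private-key-B", b"private-key-C"]
doc = b"anonymously published document"
locator, stored = build(doc, "http://realsite.com/doc", node_keys)
# The server only ever holds ciphertext; the chain recovers the plaintext.
assert stored != doc
assert retrieve(locator, node_keys, lambda url: stored) == doc
```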

The main benefit of chaining is that only the rewebber closest to the client ever sees the decrypted data, and only the rewebber closest to the server knows where the data really came from. In order to link the two, the cooperation of every rewebber in the chain would be necessary. We note that this avoids the existence of a single point of failure, and allows us to distribute trust throughout the network. The use of chaining and locators here is the rewebber's counterpart of the use of "reply blocks" by nymservers (mentioned above). Remember that there will likely be many opaque WWW requests flowing through the network at any one point; there is "safety in numbers".

Finally, we point out a serendipitous aspect of using HTTP proxies to implement rewebbers. These proxies tend to have caches, in order to improve network performance, and this caching turns out to give us an extra benefit. The fact that data is being cached at the intermediate rewebbers in a chain makes it less likely that requests will actually get forwarded all the way back to the publisher. This makes traffic analysis somewhat harder to accomplish. Note that traffic analysis relies on having a large sample of network data to perform statistical analysis; by reducing the amount of traffic on the network, less data is available to a traffic analyst, and so statistical patterns will be harder to discern. Also note that this form of caching gives us the political benefit that rewebber nodes never cache unencrypted data, and so it is less likely they can be held accountable for controversial material they happen to be caching.



In this section, we make some computations regarding the security and the reliability of the rewebber network described above. Here, a node is a site running a rewebber proxy; a compromised node is a node for which logs are available to an adversary (in order that the adversary may learn which decrypted URL (or locator) corresponds to which encrypted locator), or simply one for which the private key is so available (whether because of host compromise, or because the operator of the node was an adversary in the first place). Let n be the number of nodes in the rewebber network, and let c be the number of compromised nodes. Suppose a publisher wishes to pick k nodes through which to chain. In the worst case, the publisher has no clue which nodes are compromised (or even which are more likely to be compromised), and chooses the k nodes uniformly at random with replacement (as we assume k is small compared to n, sampling with or without replacement makes little difference, but sampling with replacement gives a worst-case bound).



Figure 2: Probability of compromised path


We now calculate the probability of a compromised path. Here our threat model is that all of the nodes in the chain collude to decrypt the publisher's published locator, in order to determine the publisher's WWW site (and presumably learn from that information something about his identity). In this model, the worst-case probability of a compromised path is (c/n)^k. A plot of this value for various values of k appears in Figure 2. Looking at this result, it would seem that choosing a larger value of k would always be preferable.

However, we now consider reliability. We call a node broken if for some reason it is unable (or unwilling) to decrypt locators and forward the resulting URLs (or locators). If b is the number of broken nodes in the network, then a chain of k rewebbers will suffer path failure if any of the nodes on it is broken. Assuming that the broken nodes are randomly distributed (at least with respect to the nodes in the chain), this occurs with probability 1 - (1 - b/n)^k. This value is plotted for various values of k in Figure 3.
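The two quantities — a worst-case compromise probability of (c/n)^k and a path-failure probability of 1 - (1 - b/n)^k, both under sampling with replacement — are easy to tabulate. A small sketch, with purely illustrative network sizes:

```python
def p_compromise(n: int, c: int, k: int) -> float:
    # Worst case: all k chain nodes (chosen uniformly at random with
    # replacement) must be compromised for the path to be linkable.
    return (c / n) ** k

def p_failure(n: int, b: int, k: int) -> float:
    # The path fails if ANY of the k nodes is broken.
    return 1.0 - (1.0 - b / n) ** k

# Hypothetical network: 20 nodes, 2 compromised, 2 broken.
for k in (1, 2, 4, 8):
    print(k, p_compromise(20, 2, k), p_failure(20, 2, k))
```

As k grows, compromise probability falls geometrically while failure probability climbs, which is exactly the tradeoff discussed next.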



Figure 3: Probability of path failure


When choosing k, then, a tradeoff needs to be made between the probability of path compromise and the probability of path failure. The smart choice is usually to give higher priority to preventing path compromise; path compromise is catastrophic, but path failure is "fail-safe": requests will not succeed until a new locator is created and published, but the identity of the publisher will not be revealed.

Some additional heuristics can be used to further decrease the probability of path compromise. First of all, a slight decrease can be gained by choosing the rewebbers to form a chain without replacement (so that no rewebber appears twice in a chain). More interestingly, we could use knowledge about the independence (or lack thereof) of rewebbers. It seems reasonable to assume that the closer two rewebbers are in terms of administrative control, the more likely it is that their compromises are correlated. For example, two rewebbers run by the same person are likely to have highly correlated compromises; rewebbers run in the same administrative domain would have somewhat high correlation; and rewebbers in the same legal jurisdiction would have higher correlation than rewebbers in different countries. Therefore, a clever choice of rewebber chain would tend to include rewebbers run by different people, preferably in different countries.
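The jurisdiction-diversity heuristic can be sketched as a chain-selection routine. The rewebber directory, names, and country codes below are hypothetical:

```python
import random

# Hypothetical directory: rewebber name -> country of legal jurisdiction.
REWEBBERS = {"alpha": "DE", "bravo": "US", "charlie": "US",
             "delta": "FI", "echo": "NL", "foxtrot": "DE"}

def pick_chain(k: int, directory=REWEBBERS, rng=random):
    # Choose k distinct rewebbers, preferring at most one per country,
    # so that correlated compromises (same operator, same jurisdiction)
    # are less likely to break the whole chain.
    by_country = {}
    for name, country in directory.items():
        by_country.setdefault(country, []).append(name)
    countries = list(by_country)
    rng.shuffle(countries)
    chain = [rng.choice(by_country[c]) for c in countries[:k]]
    # If there are fewer countries than k, fall back to any remaining
    # distinct nodes (still sampling without replacement).
    leftovers = [n for n in directory if n not in chain]
    while len(chain) < k and leftovers:
        chain.append(leftovers.pop(rng.randrange(len(leftovers))))
    return chain
```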



We implemented a rewebber on top of the scalable TACC server architecture described in [11]. This architecture provided us with a high-performance scalable HTTP proxy and caching system, on top of which it was straightforward to implement the extra functionality necessary for a rewebber. We observe excellent performance with our prototype implementation; CPU costs are minimized with the scalable TACC server, so that network costs dominate, and network costs are themselves reduced by the caching provided by the TACC architecture.

We also implemented the handful of tools necessary for publishers to pick random DESX keys, create locators, and encrypt the files they wish to place on their server.

All of the crypto portions of the code will likely be recoded from scratch the next time the first author goes to a country with somewhat less ridiculous laws about publishing cryptography, so that they can be released.


TAZ Servers

Once a rewebber network is deployed, the stage will be set for the ability to publish anonymously on the WWW. One major drawback, however, is that a locator that contains just a simple chain of rewebbers looks something like this:



It is evident that there is a naming problem, and that it needs to be solved.

We therefore create a virtual namespace called the .taz namespace, and we create new servers called TAZ servers to resolve this namespace. TAZ stands for Temporary Autonomous Zone [12]; the emphasis here is that .taz sites can relocate quickly, utilizing the level of indirection offered by the TAZ server.

The function of a TAZ server is to offer publishers an easy way to point potential readers at their material, as well as offering readers an easy way to access it. A TAZ server consists essentially of a public database mapping virtual hostnames ending in .taz to rewebber locators. The emphasis on "public" is to stress that nothing in this database is a secret; unlike anon.penet.fi-style (type 0) remailers (which associate a virtual e-mail address with a real one), TAZ servers merely associate .taz addresses with locators. In practice, the TAZ server database would also usually contain a hash of a password, so as to effect access control for updates to the database, but this need not be secret information. Most importantly, the TAZ server operator cannot decrypt the locators that are in its database.
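A minimal sketch of such a public database follows. The class and method names, and the use of a salted SHA-256 password hash, are our own illustrative choices, not the actual TAZ server implementation:

```python
import hashlib
import os

class TazServer:
    # Public mapping of .taz virtual hostnames to opaque rewebber
    # locators. Nothing stored here is secret: the salted password hash
    # only gates updates, and the operator cannot decrypt locators.
    def __init__(self):
        self.db = {}  # name -> (locator, salt, password_hash)

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.sha256(salt + password.encode()).digest()

    def register(self, name: str, locator: str, password: str) -> None:
        salt = os.urandom(16)
        self.db[name] = (locator, salt, self._hash(password, salt))

    def resolve(self, name: str) -> str:
        # Public lookup: no authentication needed to read.
        return self.db[name][0]

    def update(self, name: str, new_locator: str, password: str) -> bool:
        locator, salt, pw_hash = self.db[name]
        if self._hash(password, salt) != pw_hash:
            return False  # wrong password: refuse the update
        self.db[name] = (new_locator, salt, pw_hash)
        return True
```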

The public nature of the TAZ database brings several benefits. First of all, it provides protection for TAZ server operators: there is no incentive to threaten them with legal, social, or political pressure, because any information the operator can access is also publicly available. It would be prudent to widely publicize this fact, in order to reduce the inevitable flood of requests for identity information. Second, it separates the document location problem from the rewebber network, so we get extensibility: TAZ servers can be readily upgraded without needing to change existing rewebber code. Finally, the purely public nature of TAZ databases gives very strong security benefits, because it ensures that the TAZ servers cannot reduce the security of the system. TAZ servers may very well become extremely complicated in the future, if some type of coherent dynamic distributed database is implemented. In such a future, we would be very glad that the security of our system does not rely on the security of the TAZ servers.

We are not overly concerned with the authenticity of the TAZ database, because the best solution is to rely on end-to-end authentication [13]. Anonymous publishers concerned about forgeries should establish a public key for their pseudonym, sign all anonymous documents with this key, and distribute their nym's public key widely along with their .taz hostname.

One very important feature to note about TAZ servers is that the mechanism they provide (resolving virtual names to locators) is completely orthogonal to the rewebber mechanism; there is no need for the operator of a TAZ server to also run a rewebber, nor vice versa. A TAZ server is merely a convenient central repository for a public database.

Currently, we have two interfaces to a working TAZ server available; each interface offers a slightly different way of sending a rewebber locator to any standard WWW client. At http://www.isaac.cs.berkeley.edu/taz/ is a list of all currently registered .taz hosts. From this page, one can click on a host's name and be taken to that page (the HREF in the link points to the locator for the host). Also, to access a specific host (cellspec.taz, for example), one can just use the URL http://www.isaac.cs.berkeley.edu/taz.cgi/cellspec.taz/ (the CGI on this page issues an HTTP Redirect which points to the locator).
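The taz.cgi redirect behavior can be sketched as a small function that maps a request path to an HTTP 302 response. The database contents here are hypothetical, and this is a simplification of the CGI described above, not its source:

```python
def taz_redirect(path: str, db: dict) -> str:
    # Mimic the described CGI: a request for /taz.cgi/cellspec.taz/
    # redirects the client to the registered locator for that .taz host.
    parts = path.split("/")
    host = next((p for p in parts if p.endswith(".taz")), None)
    if host is None or host not in db:
        return "HTTP/1.0 404 Not Found\r\n\r\n"
    # An HTTP redirect: the client re-requests the opaque locator URL.
    return f"HTTP/1.0 302 Found\r\nLocation: {db[host]}\r\n\r\n"
```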

These centralized approaches will work fine until the namespace gets too big or the central .taz server becomes a bottleneck; then we will probably prefer to move to a decentralized distributed solution for better scalability and availability. This issue is addressed further in the section "Future Work".


Related Work

Ross Anderson proposes the Eternity Service [14], which is intended to provide very long-term digital storage and publication, offering strong availability guarantees even in the presence of adversarial attacks. While the Eternity Service shares several features with anonymous WWW publication, it is especially geared towards highly available publication; anonymity is not a direct goal. The contrast is perhaps best appreciated with a direct quote from the paper:


The net effect will be that your file, once posted on the eternity service, cannot be deleted. As you cannot delete it yourself, you cannot be forced to delete it, either by abuse of process or by a gun at your wife's head.

Note that our design would attempt to stop threats like "a gun at your wife's head" by protecting your anonymity, so that the thugs will not know whose wife to point a gun at.

Anderson discusses a number of engineering tradeoffs involved in building such a service. One interesting observation is that anonymous communication models can stand as a countermeasure against denial-of-service attacks; thus, in the context of the Eternity Service, anonymity is viewed as a useful technique to increase availability, rather than a goal in and of itself. No implementation or full design of Anderson's functional specification is available, probably because of the very aggressive design goals, but we can learn some interesting lessons about the engineering tradeoffs in highly available publication from this paper.

There has also been some recent work on building infrastructures for general-purpose interactive anonymous communication on the Internet, from which we can learn a great deal about the challenges of anonymity for low-latency communications. One important lesson to take away from these efforts is that there are a great many clever and powerful attacks against low-latency anonymized traffic when adversaries have control over many segments of the network.

Wei Dai's PipeNet architecture [15] is based on a distributed network of packet forwarders, connected by long-term encrypted communications links that are padded to constant bandwidth when not in use to prevent traffic analysis; Chaumian mixes are used at each router node. The result is much like a low-latency IP-level cousin of the type II remailer network, with the addition of a lot of cover traffic to make up for the increased potential for timing attacks that arises from the low-latency requirements. The requirement for constant-bandwidth long-lived encrypted links incurs serious performance costs to provide security against this very cautious threat model of pervasive eavesdropping on the network. No complete PipeNet design, much less an implementation, is available, and PipeNet does not appear to be a serious candidate for practical deployment in the short term; but PipeNet was an influential early proposal for anonymous communication that established the possibility of a high level of security and illuminated a number of subtle possible attacks on such a system.

A more recent and mature research effort to build a practical infrastructure for interactive anonymous Internet traffic is the Onion Routing project [16]. The Onion Routing proposal embodies slightly more practically oriented engineering decisions than vanilla PipeNet. For instance, links are not padded to constant bandwidth; instead, Chaumian mixes and "safety in numbers" are used to protect against traffic analysis. The authors do not attempt to protect against attackers who can control (or monitor) many diverse portions of the network simultaneously. By taking on slightly less ambitious goals, the Onion Routing researchers have managed to design a full system, and preliminary implementations are available.

Our rewebber system bears a certain resemblance to PipeNet and Onion Routing; after all, it is essentially attempting to anonymize low-latency IP traffic. However, for our sanity, we have simplified the problem in a number of ways. First, our rewebbers operate at the application layer, rather than the network layer, which enables us to make several optimizations and simplifications: we take advantage of the request-reply nature of HTTP, and of the weak semantics of HTTP as well as the relatively static nature of WWW pages, to cache data throughout the system. Second, to achieve very rapid implementation and deployment, we have weakened the threat model somewhat; protection against network attacks is deemed a lower priority (and a much harder problem) than immediate deployment with protection against political, legal, and social attacks. We expect to build upon the rewebber system to improve its security against network attacks as experience with Onion Routing and the general low-latency communications anonymity problem improves. Third, we note that infrastructures for routing general IP packets anonymously are subject to potentially unprecedented levels of abuse: malicious hackers could use such a service to break into vast numbers of Internet-connected hosts without fear of discovery. In contrast, by restricting ourselves to the problem of anonymous Web publication, we can avoid worrying about abuse. Finally, we note that once the Onion Routing infrastructure becomes widely available, we could easily add support for anonymous publishers connected via Onion Routing links, if we expected that publishers were sophisticated enough to take advantage of the Onion Routing infrastructure; that allows us to take advantage of the best of both worlds.

URNs (Uniform Resource Names) [17] attempt to address nearly the same naming problem as TAZ servers do. Both attempt to provide short persistent names for resources via a simple resolving mechanism, using the standard computer science trick of a level of indirection. The major difference is that, when a URN ultimately points to a resource described by a URL, it is a design goal that one be able to learn the pointed-to URL from the URN; in contrast, the whole point of the anonymous WWW publishing infrastructure is that, given a TAZ name, one cannot obtain the URL for the author's real site. These differences notwithstanding, future experience with URN resolving mechanisms is likely to lead to improved name resolution service from the TAZ servers.


Future Work

There are a number of possible improvements to the rewebber system. Currently we implement some but not all of the features of type I and type II remailers; we do not provide strong security against wide-area network attacks at present. It would be interesting to explore techniques for better security. For instance, we could apply "fan-out" - where one request to a first-hop rewebber spawns several parallel requests to unrelated sites (whose responses are ignored) - to attempt to generate some cover traffic and disguise the location of the publisher's endpoint. A variation on "fan-out" can be used to attempt to increase availability in the presence of network faults: include several independent paths to the content in the locator, and then if one fails because of a network error, the content can still be retrieved. These "fan-out" schemes can be generalized using k-out-of-n secret sharing, where a single client request fans out into n requests inside the rewebber network, and any k responses suffice to recover the content. All of these schemes need to be analyzed to examine just how much benefit they provide, and what cost they impose upon the system.
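The k-out-of-n secret sharing mentioned above is classically realized with Shamir's scheme. A compact sketch, sharing an integer-encoded secret over a fixed prime field (the prime and parameters are our own illustrative choices, not part of the rewebber design):

```python
import random

P = 2**127 - 1  # a Mersenne prime, used as the field modulus

def make_shares(secret: int, k: int, n: int, rng=random):
    # Shamir k-out-of-n sharing: pick a random polynomial of degree k-1
    # with constant term `secret`; each share is a point (x, f(x)).
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 recovers the constant term
    # (the secret) from any k distinct shares.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

In the fan-out setting, each of the n rewebber paths would carry one share of the content key, and any k surviving responses suffice.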

It would be valuable to carefully analyze the security benefit that application-level caching affords us. We have not worked this out in detail yet, because at this point we consider eavesdropping and other network attacks a second priority. However, it appears that very large benefits may potentially be available, especially if caching is combined with a special high-security mechanism using considerable reordering and random delays for non-interactive pre-fetching to warm the caches.

Another improvement to the rewebber system would take advantage of steganography [18] to hide the fact that a particular author has published some (unspecified) works anonymously. Currently an anonymous author must store encrypted data on his WWW server, but ciphertext usually stands out; by disguising the encrypted data file within normal-looking data (e.g. within the low bits of a large image or sound file), we can avoid this problem. That would allow an anonymous publisher to camouflage himself and blend in among all the other noise on the World Wide Web.
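The low-bits embedding described above can be sketched for a raw byte buffer; a real system would target the sample data of an image or sound file, and would of course embed ciphertext rather than plaintext:

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    # Hide each payload bit in the least-significant bit of one cover
    # byte, so each cover byte changes by at most 1.
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for payload")
    out = bytearray(cover)
    for pos, bit in enumerate(bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)

def extract_lsb(stego: bytes, length: int) -> bytes:
    # Reassemble `length` payload bytes from the low bits, in the same
    # bit order used by embed_lsb.
    out = bytearray()
    for i in range(length):
        byte = 0
        for j in range(8):
            byte |= (stego[i * 8 + j] & 1) << j
        out.append(byte)
    return bytes(out)
```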

There is the potential for a great deal of future work in the TAZ servers. Centralized solutions such as the ones we have implemented may work well for a while, but in the future decentralized solutions may be preferable, for both scalability and availability reasons. Of course, a distributed naming database would incur all the coherency, performance, and other problems found in a dynamic distributed database; we expect that a great deal can be learned from DNS. Two possible decentralized methods worthy of further study are multicast-capable servers, and Usenet-based updates (a form of slow multicast). If the .taz namespace starts becoming too large, we will have all the problems of a static /etc/hosts file with our current implementation, and a transition to a more scalable solution will be of paramount importance.

In any case, more implementation and real-life deployment experience is needed to understand the engineering tradeoffs better. Fortunately, by separating the TAZ database from the rewebber system, and by ensuring that the TAZ database contains only public data (and so is not security-critical), we have reserved great flexibility for upgrading this portion of the system.



We have designed a practical infrastructure that enables anonymous publishing on the World Wide Web. Some key features include universal accessibility by standard clients, short persistent names, security against social, legal, and political pressure, protection against abuse, and good performance.

We have completed a prototype implementation, and we expect to start deploying more mature code for practical use this summer. Implementation was eased by using the scalable TACC server architecture [11]. Extensibility is provided by separating the naming and document-location problem from the core document anonymization services.

One novel contribution was the observation that caching can be used to improve not just performance but also security against network eavesdroppers. More generally, by providing anonymity services at the application layer, we were able to take advantage of the semantics of the World Wide Web to simplify and optimize our system.

In short, we have demonstrated that anonymous publishing for the World Wide Web is well within practical reach today.



About the Authors

Ian Goldberg and David Wagner are Graduate Student Researchers and founding members of the ISAAC research group at the University of California, Berkeley. The ISAAC group studies, among other things, computer security, privacy, and anonymity.

Ian holds a B.Math. degree from the University of Waterloo; David holds an A.B. degree from Princeton University. Both are currently in their third year of a Ph.D. program at the University of California, Berkeley.

ISAAC Group Home Page: http://www.isaac.cs.berkeley.edu/


We gratefully acknowledge Raph Levien for suggesting many of the general directions this work has taken.



Notes

1. Veran Matic, 1996. "Another 'Jamming Device' Activated Against Radio B92", Press release (1 December), at http://www.dds.nl/~pressnow/news/9611300.htm; Rebecca Vesely, 1996. "Banned on Radio, Belgrade Dissidents Take to Net", Wired News (3 December), at http://www.wired.com/news/politics/story/777.html; and, Rebecca Vesely, 1996. "Net Presence Widens as Serbia Shuts Down Media", Wired News (4 December), at http://www.wired.com/news/politics/story/806.html

2. Community ConneXion, 1996. "Anonymous Surfing", http://www.anonymizer.com/

3. Ian Goldberg, David Wagner, and Eric Brewer, 1997. "Privacy-enhancing Technologies for the Internet", Proceedings of IEEE COMPCON '97.

4. Andre Bacard, 1996. "Anonymous Remailer FAQ", at http://www.well.com/user/abacard/remail.html; Arnoud Engelfriet, 1996. "Anonymity and Privacy on the Internet", (19 December), at http://www.stack.nl/~galactus/remailers/index.html; C. Gulcu and G. Tsudik, 1996. "Mixing E-mail with Babel", Proceedings Symp. Network and Distributed System Security, pp. 2-16; and, Andreas Pfitzmann and Michael Waidner, 1986. "Networks without user observability - design options", EUROCRYPT 85, LNCS 219. New York: Springer-Verlag, pp. 245-253, and at http://www.informatik.uni-hildesheim.de/~sirene/publ/PfWa86anonyNetze.html

5. Johan Helsingius, 1996. Press release (30 August), at http://www.cyberpass.net/security/penet.press-release.html

6. David Chaum, 1981. "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms", Communications of the ACM, Volume 24, Number 2 (February), at http://www.eskimo.com/~weidai/mix-net.txt; Lance Cottrell, 1995. "Mixmaster & Remailer Attacks", at http://www.obscura.com/~loki/remailer/remailer-essay.html

7. Raph Levien, "premail", at http://www.c2.net/~raph/premail.html

8. Community ConneXion, 1996. "Anonymous Surfing", http://www.anonymizer.com/

9. Ray Cromwell, 1996. "Welcome to the Decense Project", http://www.clark.net/pub/rjc/decense.html; Laurent Demailly, "Announce: Anonymous Http Proxy (preliminary release)", Usenet post at http://www.lyot.obspm.fr/~dl/anonproxy.txt

10. Bruce Schneier, 1996. Applied Cryptography. 2nd edition, N.Y.: John Wiley & Sons.

11. Armando Fox et al., 1997. "Scalable Network Services", Submitted under blind review.

12. Hakim Bey, 1991. T.A.Z.: The Temporary Autonomous Zone, Ontological Anarchy, Poetic Terrorism. N.Y.: Autonomedia.

13. J. Saltzer, D. Reed, and D. Clark, 1984. "End-to-end Arguments in System Design", ACM Transactions on Computer Systems (TOCS), Volume 2, Number 4, pp. 195-206.

14. Ross Anderson, 1996. "The Eternity Service", PRAGOCRYPT 96, at ftp://ftp.cl.cam.ac.uk/users/rja14/eternity.ps.Z

15. Wei Dai, 1995. "PipeNet", February 1995 post to the cypherpunks mailing list.

16. Paul Syverson, David Goldschlag, and Michael Reed. "Anonymous Connections and Onion Routing", draft manuscript, available at http://www.itd.nrl.navy.mil/ITD/5540/projects/onion-routing/overview.html

17. K. Sollins and L. Masinter. "Functional Requirements for Uniform Resource Names", RFC 1737.

18. Ross Anderson, 1995. "Information Hiding - An Annotated Bibliography", at http://www.cl.cam.ac.uk/users/fapp2/steganography/bibliography/index.html


Copyright © 1998, First Monday.