Frontier Nerds | An ITP Blog

Spring Thesis Plans

Eric Mika

THE POST-APOCALYPTIC PIRATE INTERNET

For background on the basic idea of the post-apocalyptic pirate internet, please read an earlier post on the subject

Problem: The centrally-distributed internet is fragile and politically fickle

The web’s current implementation is built from millions of geographically dispersed clients communicating with a handful of extremely high-density data centers. Despite the many ⇔ many ideals of the web, the infrastructure looks more like many ⇒ one ⇒ many.

This topology means that there are points in the network of significant vulnerability: Backbone fiber, ISP central offices, data centers, etc. all represent potential choke points in the web. The destruction of physical infrastructure or installation of firewalls to screen and censor data at one of these points could snuff access to the web. That would be a shame, since the web is arguably the most significant aggregation of knowledge and culture humanity has ever assembled.

How could this knowledge be protected, and how could the current freedom of expression and exchange enjoyed on the centralized web reemerge under a distributed model that is technically immune to data loss and censorship?

Solution: Distributed, mesh-networked backups of the entire web

I propose a distributed backup system for the web to ensure the survival of data and continuation of the platform’s ideals in the face of a political or infrastructural apocalypse.

The basic unit of the post-apocalyptic pirate internet is the “backup node”. These are relatively small, suitcase-sized computers with lots of storage space. Servers, basically. They’re designed for use by consumers of average technical aptitude. Backup nodes would sit in the corner of a room and sip data from the internet to build a backup of some portion of the web. If and when the centralized web infrastructure falls apart, the backup nodes would be poised to respond by automatically transforming from data aggregators to data distributors. Requests for web data in the absence of centralized infrastructure (post-apocalypse) would instead be fulfilled by the backup nodes — at least to the extent that backups are available.

The technical infrastructure of the post-apocalyptic pirate internet has two basic components. The first is physical: local storage nodes — hard disks, flash memory, etc. — on which fragments of the web will be backed up and paired with a supporting computer and interface (most likely a browser). The second is ethereal: wireless communication which will enable the formation of mesh network between physically proximate nodes. This would give apocalypse survivors access to more than just the data stored on their local node. In this sense, a new internet would take shape as the backup nodes enmeshed — an internet that was not vulnerable to centralized oversight or obstruction.

Execution: Research demand and feasibility, then build a backup node

First I’ll have to figure out how / why, exactly, such a system could / should be built. How would the content of the backups backups curated? By some distributed democratic means? By the usage patterns of the backup node’s owner? There’s a judgment to be made in deciding between saving the data people actually interact with on a daily basis (say, Twitter), and the data that actually carries forward knowledge essential to civilization (OpenCourseWare comes to mind).

What role will the backup nodes play before the apocalypse? Will they be seemingly dormant black boxes going about their work without human intervention, or will they become distribution points for content censored from the centralized web (Wikileaks would be the example of the day).

Marina has encouraged me to focus on the conceptual justifications for the system instead of technical implementation. However I’m personally interested in creating at least one actual node to demonstrate the concept. I understand the futility of the gesture, since the pirate internet would require thousands of backup nodes to be built, sold, and operated if it was going to actually protect (and eventually distribute) an appreciable amount of data. A single node is not particularly useful. Nevertheless, I’d like to end the semester with more than an exhaustive string of justifications / marketing material for something that doesn’t actually exist.

NIME is Coming

Eric Mika

NIME 2010 Poster

Signs of the Apocalypse

Eric Mika

A glut of headlines relevant to the post-apocalyptic pirate internet have popped up over the last few weeks. Here’s a quick review with commentary.

This first batch is regarding the temporary loss of major online repositories for “user generated content” (to invoke the cliché).

Another post discussing the Wikileaks saga is forthcoming in the context of the post-apocalyptic pirate internet is forthcoming.

Tumblr, the celebrated blogging platform, was down for about 24 hours on December 5th. This was their longest outage to date.

Tumblr outage screenshot

Users’ trust is shaken by this sort of thing, and a day after the outage they released a backup application that lets users save all of their Tumblr posts to their hard disks.

Here’s the official line:

Unlike other publishing sites’ approach to backups, our goal was to create a useful copy of your blog’s content that can be viewed on any computer, burned to a CD, or hosted as an archive of static HTML files.
Wherever possible, we use simple file formats. Our backup structure is optimized for Mac OS X’s Spotlight for searching and Quick Look for browsing, and we’ll try to use the same structure and achieve the same benefits on other platforms.

To me this reads more like, “Keep uploading! If we implode, we won’t take your data with us.”

The backup app strikes me as Hail Mary decision executed in the interest of damage control (with the side effect of actually being good news for the survivability of the 2+ billion posts Tumblr hosts on their servers). There’s a tension on social media websites between giving users access to their own data (in the form of database dumps) and maximizing “lock in” — since giving users downloadable access to their data can provide an easy means of egress from one service and migration to a competitor. (cf. Facebook’s recent decision to let users dump their data in one step.)

Of course, like most prophylactics, the download tool would only be useful in the context of the post-apocalyptic pirate internet if it 100% of Tumblr publishers used it 100% of the time. Nevertheless, the fact that this piece of preservationist infrastructure was officially released suggests that some portion of the Tumblr staff / users are paranoid enough to prepare for a data or infrastructure related disaster. The app also implicitly migrates the worst-case backup burden from the host to the client. (e.g. “Oops, we lost everything… what, you didn’t back up your posts?”) This represents a significant shift in one of the basic contracts of Web 2.0, which is the idea that “files” as we know them on our PCs don’t exist, you don’t have to worry about which directory things go in, you don’t plan for a day when you’ll need to open Word 3.0 files, and you certainly don’t have to back up. The understanding between consumer and provider is that once something’s uploaded, it’s safe from loss due to technical failure — where every bit is tucked away in multi-million-dollar data centers and placed under the careful watch of bespectacled geeks pacing up and down miles of server racks.

Of course, that’s not how things work out, but the cloud = safe truism is one that will need to be proven catastrophically false before the basic tenet of the post-apocalyptic pirate internet — that local bits are safe bits — can take hold.

Another outage of reasonably high profile (although certainly not on the scale of Tumblr) struck GitHub on November 14th. A botched command by a systems administrator wiped out a database and destroyed some data along the way. The site was unusable for about three hours.

GitHub is much more esoteric than Tumblr, but for the uninitiated it’s basically a web site layering social-networking tools on top of Git. Git, in turn, is a piece of software that runs locally on your computer to keep track of collaborations around / revisions to source code written in the course of developing software.

Anyway, here’s what bad news looked like, as delivered by GitHub’s mascot, the Octocat:

GitHub Octocat looking sad

The nature of Git (the version-control system) means that even a total loss of GitHub (the community build on Git) would be inconvenient, but not catastrophic. When you’re working with a Git repository, you have a local copy on your hard disk that is periodically updated and synced to the GitHub server.

If 50 people are working on a particular project, then 50 copies of that project exist on local hard disks in one corner of the world or another. Thus the degree to which a projects is insured against disaster rises proportionally to a project’s popularity / number of collaborators.

So there are two particularly great things about the Git + GitHub combination that should be kept in mind as plans for the post-apocalyptic pirate internet are drawn up:

The same basic software (Git) is running on both your own computer and GitHub’s servers. In this sense, GitHub makes the most of the web when it’s available (by adding a social layer to Git), but Git itself doesn’t completely melt down in the absence of GitHub. In short, Git’s use of the centralized web is value added, not mission critical.
Local backups are generated automatically in the course of using GitHub — unlike Tumblr’s proposed solution, which calls on users to make a conscious decision to back up at regular intervals if they want the safety of their data.

It Talks: Text to Speech in Processing

Eric Mika

The Mac has a really great text-so-speech (TTS) engine built right in, but at first glance it’s only available at Apple’s whim in specific contexts — e.g. via a menu command in TextEdit, or system-wide through the accessibility settings. Seems grim, but we’re in luck — Apple, in their infinite generosity, have given us a command line program called “say”, which lets us invoke the TTS engine through the terminal. It’s super simple to use, just type the command and then the text you want, e.g.

say cosmic manifold

So that’s great, now what if we wanted to make a Processing sketch talk to us? In Java, as in most languages, there are ways to send commands to the terminal programmatically. By calling Runtime.getRuntime().exec("some command"); we can run any code we want on the terminal from within Processing. So to invoke the TTS engine from a Processing sketch, we can just create the say ... command line instruction in a string object, pass that into the runtime execution thing, which in turn handles the TTS conversion.

I’ve put together a small Processing class that makes it easy to add speech to your Processing sketches. It only works on Mac OS, won’t work in a web applet, and has only been tested in Mac OS 10.6. (I think the list of voices has changed since 10.5.)

Note that the since the class is quite simple and really just wraps up a few functions. I’ve set it up for static access, which means that you should never need to instantiate the class by calling something like TextToSpeech tts = new TextToSpeech() — and in fact that would be a Bad Idea. Instead, you can access the methods any time without any prior instantiation using static style syntax, e.g. TextToSpeech.say("cosmic manifold");.

Here’s the class and a sample sketch:

// Processing Text to Speech
// Eric Mika, Winter 2010
// Tested on Mac OS 10.6 only, possibly compatible with 10.5 (with modification)
// Adapted from code by Denis Meyer (CallToPower)
// Thanks to Mark Triant for the inspiring sample text

String script = "cosmic manifold";
int voiceIndex;
int voiceSpeed;

void setup() {
  size(500, 500);
}

void draw() {
  background(0);

  // set the voice based on mouse y
  voiceIndex = round(map(mouseY, 0, height, 0, TextToSpeech.voices.length - 1));

  //set the voice speed based on mouse X
  voiceSpeed = mouseX;

  // help text
  fill(255);
  text("Click to hear " + TextToSpeech.voices[voiceIndex] + "\nsay \"" + script + "\"\nat speed " + mouseX, 10, 20);

  fill(128);
  text("Mouse X sets voice speed.\nMouse Y sets voice.", 10, 65);
}

void mousePressed() {
  // say something
  TextToSpeech.say(script, TextToSpeech.voices[voiceIndex], voiceSpeed);
}


// the text to speech class
import java.io.IOException;

static class TextToSpeech extends Object {

  // Store the voices, makes for nice auto-complete in Eclipse

  // male voices
  static final String ALEX = "Alex";
  static final String BRUCE = "Bruce";
  static final String FRED = "Fred";
  static final String JUNIOR = "Junior";
  static final String RALPH = "Ralph";

  // female voices
  static final String AGNES = "Agnes";
  static final String KATHY = "Kathy";
  static final String PRINCESS = "Princess";
  static final String VICKI = "Vicki";
  static final String VICTORIA = "Victoria";

  // novelty voices
  static final String ALBERT = "Albert";
  static final String BAD_NEWS = "Bad News";
  static final String BAHH = "Bahh";
  static final String BELLS = "Bells";
  static final String BOING = "Boing";
  static final String BUBBLES = "Bubbles";
  static final String CELLOS = "Cellos";
  static final String DERANGED = "Deranged";
  static final String GOOD_NEWS = "Good News";
  static final String HYSTERICAL = "Hysterical";
  static final String PIPE_ORGAN = "Pipe Organ";
  static final String TRINOIDS = "Trinoids";
  static final String WHISPER = "Whisper";
  static final String ZARVOX = "Zarvox";

  // throw them in an array so we can iterate over them / pick at random
  static String[] voices = {
    ALEX, BRUCE, FRED, JUNIOR, RALPH, AGNES, KATHY,
    PRINCESS, VICKI, VICTORIA, ALBERT, BAD_NEWS, BAHH,
    BELLS, BOING, BUBBLES, CELLOS, DERANGED, GOOD_NEWS,
    HYSTERICAL, PIPE_ORGAN, TRINOIDS, WHISPER, ZARVOX
  };

  // this sends the "say" command to the terminal with the appropriate args
  static void say(String script, String voice, int speed) {
    try {
      Runtime.getRuntime().exec(new String[] {"say", "-v", voice, "[[rate " + speed + "]]" + script});
    }
    catch (IOException e) {
      System.err.println("IOException");
    }
  }

  // Overload the say method so we can call it with fewer arguments and basic defaults
  static void say(String script) {
    // 200 seems like a resonable default speed
    say(script, ALEX, 200);
  }

}

Rough Thesis Proposal

Eric Mika

Let’s suppose the internet stops working tomorrow. Not the hard disks, just the wires. Government firewall, tiered service, cut fiber, ISP meltdown — pick a scenario.

How much would be lost? How much could be recovered? What would you need to rebuild?

We have the infrastructure — computing power, abundant storage, networking — but not the know-how, organization, will, or sense of impending disaster required to start building a decentralized web.

We need a way to easily rearrange our consumer electronics (which are optimized for consumption) into a network that can’t be centrally controlled or destroyed (and is therefore optimized for creation and distribution). Most importantly, the ubiquity and overlap of consumer-created wireless networks in urban areas means that mesh-based networks with thousands of nodes should be feasible without any reliance on centralized network infrastructure.

Hence the post-apocalyptic pirate web kit: everything you need to bootstrap a decentralized web in a single package. This could take several forms, but my initial thinking suggest a suitcase full of hard disks, wireless connections, and processing power designed to restore fragments of the web to nearby users and act as a node in a broader network of data survivalists.

The hard-disks would be pre-loaded with an archival version of the web. The whole of Wikipedia’s english text content, for example, is readily available for download and amounts to abut 5 terabytes. This could fit on three hard disks, which cost about $100 each, and together displace about as much physical space as a loaf of bread.

In its dormant form, the post-apocalyptic pirate web kit is something you might leave plugged in at the corner of the room — it could sit there indefinitely, like a fire extinguisher. The kit could automatically crawl the web and keep its archival mirror as fresh as possible.

When and if disaster strikes, the kit would be ready to switch into server node and thus preserve our way of internet life. (So that we might continue with a spirit of bold curiosity for the adventure ahead!)