Concept

Haiku Laureate generates haiku about a particular geographic location.

For example, the address “Washington D.C.” yields the following haiku:

the white house jonas
of washington president
and obama tree

Much of the work we’ve created in Electronic Text has resulted in output that’s interesting but very obviously of robotic origin. English language haiku has a very simple set of rules, and its formal practice favors ambiguous and unlikely word combinations. These conventions and constraints give haiku a particularly shallow uncanny valley: low-hanging fruit for algorithmic mimicry.

Haiku Laureate takes a street address, a city name, etc. (anything you could drop into Google maps), and then asks Flickr to find images near that location. It skims through the titles of those images, building a list of words associated with the location. Finally, it spits them back out using the familiar three-line 5-7-5 syllable scheme (and a few other basic rules).

The (intended) result is a haiku specifically for and about the location used to seed the algorithm: the code is supposed to become an on-demand, all-occasion, minimally talented poet laureate to the world.


Execution

The script breaks down into three major parts: Geocoding, title collection, and finally haiku generation.

Geocoding:

Geocoding takes a street address and returns latitude and longitude coordinates. Google makes this easy: their Maps API exposes a geocoder that returns XML, and it works disturbingly well. (e.g. a query as vague as “DC” returns a viable lat / lon.)

This step leaves us with something like this:

721 Broadway, New York NY is at lat: 40.7292910 lon: -73.9936710
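Here’s a minimal sketch of this step, assuming the current JSON flavor of Google’s geocoding endpoint (the script itself used the XML one) and a placeholder API key:

import json
import urllib.parse
import urllib.request

GOOGLE_API_KEY = "YOUR_GOOGLE_MAPS_KEY"  # placeholder, bring your own key

def geocode(address):
    """Return (lat, lon) for anything you could drop into Google Maps."""
    params = urllib.parse.urlencode({"address": address, "key": GOOGLE_API_KEY})
    url = "https://maps.googleapis.com/maps/api/geocode/json?" + params
    with urllib.request.urlopen(url) as response:
        result = json.load(response)
    location = result["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

print(geocode("721 Broadway, New York NY"))
# roughly (40.729291, -73.993671)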

Title Collection:

Flickr provides a real glut of geocoded data through their API, and much of it is textual — tags, comments, descriptions, titles, notes, camera metadata, etc. I initially intended to use tag data for this project, but it turned out that harvesting words from photo titles was more interesting and resulted in more natural haiku. The script passes the lat / lon coordinates from Google to Flickr’s photo search function, specifying an initial search radius of 1 mile around that point. It reads through a bunch of photo data, storing all the title words it finds along the way, and counting the number of times each word turns up.

If we can’t get enough unique words within a mile of the original search location, the algorithm tries again with a progressively larger search radius until we have enough words to work with. Asking for around 100–200 unique words works well. (However, for rural locations, the search radius sometimes has to grow significantly before enough words are found.)

The result of this step is a dictionary of title words, sorted by frequency. For example, here are the first few entries of the list for ITP’s address:

{"the": 23, "of": 16, "and": 14, "washington": 12, "village": 11, "square": 10, "park": 10, "nyu": 9, "a": 9, "new": 8, "in": 8, "greenwich": 8, "street": 6, "webster": 6, "philosophy": 6, "hall": 6, "york": 6, [...] }

Haiku Generation:

This list of words is passed to the haiku generator, which assembles the words into three-line 5-7-5 syllable poems.

Programmatic syllable counting is a real problem: the dictionary-based lookup approach doesn’t work particularly well in this context due to the prevalence of bizarre words and misspellings on the web. I ended up using a function from the nltk_contrib library that uses phoneme-based tricks to give a best-guess syllable count for non-dictionary words. It works reasonably well, but isn’t perfect.
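As a stand-in for that nltk_contrib function (not the code actually used), here is the general dictionary-plus-heuristic idea: look the word up in the CMU pronouncing dictionary via NLTK’s cmudict corpus, and fall back to counting vowel groups for anything the dictionary doesn’t know:

import re
from nltk.corpus import cmudict  # needs a one-time nltk.download("cmudict")

PRONUNCIATIONS = cmudict.dict()

def count_syllables(word):
    """Best-guess syllable count: CMU dictionary first, vowel-group heuristic otherwise."""
    word = word.lower()
    if word in PRONUNCIATIONS:
        # phonemes ending in a digit are vowel sounds, one per syllable
        return sum(1 for phone in PRONUNCIATIONS[word][0] if phone[-1].isdigit())
    # fallback for misspellings and web-isms: count runs of vowels
    return max(1, len(re.findall(r"[aeiouy]+", word)))

print(count_syllables("washington"))  # 3
print(count_syllables("flickrgram"))  # heuristic guess for a made-up word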

Words are then picked from the top of the list to assemble each line, taking care to produce a line of the specified syllable count. This technique alone created mediocre output: it wasn’t uncommon to get lines ending with “the” or a line with a string of uninspired conjunctions. So I isolated these problematic words into a boring_words list, consisting mostly of prepositions and conjunctions, which was used to enforce a few basic rules: First, each line is allowed to contain only one word from the boring word list. Second, a line may not end in a boring word. This improved readability dramatically. Here’s the output:

the washington square
of village park nyu new street
and greenwich webster
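For the curious, here is a sketch of those two rules in action. The boring_words contents and the count_syllables helper (from the earlier sketch) are stand-ins rather than the script’s actual code:

boring_words = {"the", "of", "and", "a", "in", "with", "from", "at", "to", "on"}  # illustrative

def build_line(words, target):
    """Greedily fill one line to the target syllable count from a frequency-sorted word list."""
    line, total, boring = [], 0, 0
    for word in words:
        count = count_syllables(word)  # syllable counter sketched earlier
        if total + count > target:
            continue
        if word in boring_words:
            # rule 1: at most one boring word per line
            # rule 2: a boring word may not close out the line
            if boring >= 1 or total + count == target:
                continue
            boring += 1
        line.append(word)
        total += count
        if total == target:
            break
    return line

def build_haiku(word_counts):
    """Assemble one 5-7-5 haiku from a word-frequency dictionary."""
    words = sorted(word_counts, key=word_counts.get, reverse=True)  # most frequent first
    lines = []
    for target in (5, 7, 5):
        line = build_line(words, target)
        words = [w for w in words if w not in line]  # don't reuse words across lines
        lines.append(" ".join(line))
    return "\n".join(lines)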

More Sample Output

A few more works by the Haiku Laureate:

Chicago, IL
chicago lucy
trip birthday with balloons fun
gift unwraps her night

Gettysburg
the gettysburg view
monument and from devils
of den sign jess square

Dubai
Dubai Museum Bur
in Hotel The Ramada
with Dancing Room Tour

Tokyo
tokyo shinjuku
metropolitan the night
from government view

Canton, KS
jul thu self me day
and any first baptist cloud
the canton more up

Las Vegas, NV
and eiffel tower
in flamingo from view glass
at caesars palace

eve revolution
trails fabulous heralds blue
emptiness elton

monorail hide new
above bird never jasmine
path boy cleopatra

I’ve also attached a list of 150 haiku about New York generated by the Haiku Laureate.

Note that the Haiku Laureate isn’t limited to major cities… just about any first-world address will work. Differences in output can be seen at distances of just a few blocks in densely populated areas.

Source Code

The code is intended for use on the command line. You’ll need your own API keys for Google Maps and Flickr.

The script takes one or two arguments. The first is the address (in quotes), and the second is the number of haiku you would like to receive about the particular location.

For example: $ python geo_haiku.py "central park, ny" 5

will return five three-line haiku about Central Park.

The source is too long to embed here, but it’s available for download.