Thursday, July 30, 2015

Hannah Lee – Linking the Humanities with Data Sets and Algorithms


Hannah Lee recently moved back to St. Paul after working for the past two years as a Program Manager for the Blue Mountain Center in New York's Adirondack Mountains. A graduate of Oberlin College with a degree in English, she is working as a Communications Assistant with the Humanities Center's Communications Team.

Have you ever used a random Facebook post generator or noticed someone else using one? Social media users plug their post histories into apps like What Would I Say? or Random Status Generator, which spit out nonsense versions of their typical posts, using their syntax and frequently used words or phrases. It’s a goofy way to learn what you are most likely to share, or just look at your online persona in a funhouse mirror, but it’s not really what you’d call useful.

Imagine the reaction of Brown University professor Elias Muhanna when one of his students handed in a randomly generated paper. In his recent article “Hacking the Humanities,” Muhanna describes a seminar he taught on encyclopedic writing. Assigned to write in the style of Roman encyclopedist Pliny the Elder, the student wrote an algorithm instead. Using an English version of Pliny’s “The Natural History,” this algorithm produced sentences like, “Also great creatures resembling sheep come out on to the land for an unascertained reason, and they bud best under those circumstances, as otherwise it would make only leaves.” Muhanna understood that this was a great way to make new observations about Pliny’s expository style. Since Muhanna was also interested in coding, he ended up hiring the student to help him with an analysis of Arabic poetry.
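For the curious, here is a rough idea of how a generator like this can work. Muhanna's article does not say which method the student used, so this is only a sketch of one common approach, a word-level Markov chain, and not a reconstruction of the student's program. The function names and the sample text are invented for illustration; the sample is a stand-in, not a quotation from Pliny.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def generate(chain, length=30, seed=None):
    """Walk the chain, choosing each next word at random from the recorded followers."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random word pair in the source
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this word run only appears at the very end
            break
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)


# Invented stand-in text (an assumption for this sketch, not Pliny's own words).
sample = (
    "the sheep come out on to the land in the spring and "
    "the vines bud best in the spring and the sheep graze on to the hills"
)
print(generate(build_chain(sample), length=15))
```

Because every word is chosen only from words that actually followed the same pair in the source, the output keeps the author's local phrasing while wandering off in sense, which is what makes the results both recognizable and absurd.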

Algorithms like this one offer students of the humanities a tool for analyzing texts that is unprecedented in terms of efficiency and scale. A literary critic, for instance, can strengthen an argument by examining exactly where and how often a symbol appears in an author’s complete works, by analyzing characters’ word choices based on gender, or by mapping the evolution of a cliché.
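The first of those tasks, finding where and how often a word appears, takes only a few lines of code. The sketch below is a minimal illustration under simple assumptions: the function names are made up, the "works" are tiny invented strings, and the word matching is deliberately naive (exact words only, case ignored).

```python
import re


def find_occurrences(text, term):
    """Return the 0-based word positions at which `term` appears, ignoring case."""
    words = re.findall(r"[a-z'’]+", text.lower())
    return [i for i, word in enumerate(words) if word == term.lower()]


def count_by_work(works, term):
    """Count how often `term` appears in each titled text."""
    return {title: len(find_occurrences(text, term)) for title, text in works.items()}


# Invented placeholder texts (assumptions for this sketch, not real works).
works = {
    "First Novel": "The green light. A Green door; the light.",
    "Second Novel": "No colors here at all.",
}
print(find_occurrences(works["First Novel"], "green"))
print(count_by_work(works, "green"))
```

A real study would need more care, such as handling plurals, synonyms, and the difference between a word and the symbol it stands for, but the counting itself is the easy part.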

It seems strange at first to apply number-crunching to novels, but it’s refreshing to remember that we can use numbers and formulas to make subjective observations, not just draw objective conclusions. In the words of Stephen Ramsay, author of Reading Machines: Toward an Algorithmic Criticism, “The scientist is right to say that the plural of anecdote is not data, but in literary criticism, an abundance of anecdote is precisely what allows discussion and debate to move forward.”

Plus, it turns out it works both ways; data (or anecdote) visualization is useful for artists as well as critics. The New York Times even had a Data Artist in Residence—Jer Thorp—who says he wants to humanize data. “Each data set has its own unique character,” he explains, whether it’s the UK’s National DNA Database or the names on the 9/11 Memorial in Manhattan. The traditional disciplines of the humanities have always shown us our own reflection; now, data sets and algorithms can help make its outlines much more precise.
