Columbia's Group for Experimental Methods in the Humanities

We are a group of literary scholars, sociologists of knowledge, and information designers interested in experimenting with distant reading, lateral reading, cultural analytics, digital philology, micro- and macro-analysis, and textual trans-mediation. We believe that the humanities are not just a set of values or theories; they offer a way of enacting change in the world, from the creation of taxonomies to knowledge design, material history, and large-scale text analysis. This research cluster therefore showcases new research methods that complement traditional practices of deep thought and contextual reflection with a robust sense of making and doing.


» Middlemarch Critical Histories

  • Jonathan Reeve
  • Milan Terlunen
  • Sierra Eckert

(The main project page may be found here, via the XPMethod GitHub Organization.)

Text reuse detection technology and approximate text matching have made possible the large-scale computational identification of intertextuality. These technologies have often been used in plagiarism detection and in studies of journalistic text reuse. Fewer studies, however, have applied these methods to humanities research. We present a method for quantifying the critical reception history of a source text by analyzing the precise location, density and chronology of its citations.
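The basic step of locating where a critic quotes the source text can be sketched with approximate matching. This is a minimal sketch under stated assumptions, not the project's actual pipeline. It assumes plain-text inputs and uses Python's standard-library `difflib` in place of whatever matcher the project uses. The helper names `find_quotations` and `citation_density` are hypothetical. `SequenceMatcher` only reports shared word runs that appear in the same order in both texts, so a real study would need extra passes to catch quotations cited out of order.

```python
from difflib import SequenceMatcher


def find_quotations(source: str, essay: str, min_words: int = 5):
    """Return (source_word_index, essay_word_index, length) for shared word runs.

    Hypothetical helper: tokenizes naively on whitespace and lowercases,
    so punctuation stays attached to words.
    """
    src = source.lower().split()
    ess = essay.lower().split()
    # autojunk=False keeps frequent words such as "the" from being ignored.
    matcher = SequenceMatcher(None, src, ess, autojunk=False)
    return [
        (m.a, m.b, m.size)
        for m in matcher.get_matching_blocks()
        if m.size >= min_words  # drop short, probably accidental overlaps
    ]


def citation_density(source_length: int, matches, n_bins: int = 10):
    """Count quoted source words falling into each of n_bins equal slices."""
    bins = [0] * n_bins
    for start, _, size in matches:
        for position in range(start, start + size):
            bins[position * n_bins // source_length] += 1
    return bins


source = "the quick brown fox jumps over the lazy dog and runs away into the forest"
essay = "as the author writes the quick brown fox jumps over the lazy dog in a famous line"
matches = find_quotations(source, essay)
print(matches)  # [(0, 4, 9)]
print(citation_density(len(source.split()), matches, n_bins=3))  # [5, 4, 0]
```

Binning quoted positions across the source shows which stretches of the text attract citations. Collecting these bins for essays grouped by publication date would give the chronology.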


» Macro-Etymological Textual Analysis

  • Jonathan Reeve

Word histories correlate strongly with the tone and genre of a text. When a writer chooses the word “enchantment” instead of “spell,” or “inquire” instead of “ask,” this decision may indicate, to some degree, the speaker’s mode, dialect, or level of formality. These modes may then be measured by quantifying the aggregated etymology of an entire text. The Macro-Etymological Analyzer is a command-line utility, written in Python, that quantifies the etymologies of a text using the Etymological Wordnet.
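The aggregation step can be illustrated with a small sketch. This is not the Analyzer's code. The lookup table below is hand-made and hypothetical, standing in for origins that would actually be derived from the Etymological Wordnet. Collapsing origins into "Latinate" and "Germanic" is also an assumption made for brevity. Words missing from the table are skipped, so the profile describes only the covered vocabulary.

```python
import re
from collections import Counter

# Hypothetical, hand-made lookup table. A real run would derive word origins
# from the Etymological Wordnet; grouping them into "Latinate" and "Germanic"
# is a simplifying assumption for this sketch.
ORIGIN = {
    "enchantment": "Latinate",
    "inquire": "Latinate",
    "spell": "Germanic",
    "ask": "Germanic",
}


def etymology_profile(text: str, lookup: dict = ORIGIN) -> dict:
    """Return the share of each origin among the words found in `lookup`."""
    words = re.findall(r"[a-z]+", text.lower())
    origins = Counter(lookup[w] for w in words if w in lookup)
    total = sum(origins.values())
    if total == 0:
        return {}
    return {origin: count / total for origin, count in origins.items()}


print(etymology_profile("I ask about the spell and inquire further."))
# Germanic ≈ 0.67, Latinate ≈ 0.33
```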


» Jail of Mountjoy (graphic)

  • Casey Michael Henry

Finnegans Wake is a text marked by music, yet this simple fact has long been obscured by the forbidding layering of Joyce’s prose. A book that should be appreciated for its lyrical assonance and rhythms has become a notorious symbol for impenetrability and displeasure. Anyone who has heard Joyce read the “Anna Livia Plurabelle” section can appreciate the Wake’s integral aural component. Anyone appreciating Joyce’s penchant for bawdy limericks and coarse double-entendre (such as naming a book of poetry, Chamber Music, after the “music” of a chamber pot) can understand his link to unconventional music.

Roland McHugh’s Annotations to Finnegans Wake suggests a structural relationship between Joyce’s musicality and referentiality through its spatial arrangement, in which each page resembles sheet music. My project, Jail of Mountjoy, takes this sonic understanding one step further by “playing” representative sections of these annotations as music. This is accomplished by assigning each category of reference to its own octave (e.g., “Professional Jargon,” “Linguistic [Taxonomic],” “Linguistic [Categorical],” and so on). The sections are then processed through a synthesizer so that users can hear both the level of linguistic depth and the type of reference indicated, while synchronized visual animations keep them situated in the text.
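The category-to-octave idea can be sketched as a pitch mapping. Everything specific here is hypothetical: the octave numbers, the use of a major scale, and turning an annotation's "depth" into a scale degree are assumptions for illustration, not the project's actual encoding. The MIDI and equal-temperament conventions are standard: middle C is note 60 and A4 (note 69) is 440 Hz.

```python
# Hypothetical octave assignments; the project's actual mapping is not given here.
CATEGORY_OCTAVE = {
    "Professional Jargon": 3,
    "Linguistic [Taxonomic]": 4,
    "Linguistic [Categorical]": 5,
}

# Semitone offsets of a major scale, used (hypothetically) to turn depth into pitch.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]


def annotation_to_midi(category: str, depth: int) -> int:
    """Map an annotation's category to an octave and its depth to a scale degree."""
    octave = CATEGORY_OCTAVE[category]
    degree = MAJOR_SCALE[depth % len(MAJOR_SCALE)]
    # MIDI convention: middle C (C4) is note 60, i.e. 12 * (4 + 1).
    return 12 * (octave + 1) + degree


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency, with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


note = annotation_to_midi("Linguistic [Taxonomic]", depth=5)
print(note, midi_to_hz(note))  # 69 440.0
```

Under this mapping, the register of a note tells the listener what kind of reference it is, and its position within the octave carries the depth.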

See the entire project here.

» Mapping Fabula and Sjužet in “Wandering Rocks” (geocoding)

  • Moacir P. de Sá Pereira

Read the README at GitHub. The project, Mapping Fabula and Sjužet in “Wandering Rocks,” is hosted on GitHub.

“Fabula and Sjužet in ‘Wandering Rocks’” is a data visualization of the events in the tenth episode of James Joyce’s Ulysses. It maps the events both by the fabula (historical spacetime) and sjužet (narrative spacetime) to show how both provide different points of entry into the episode.

The animation can be played both as the fabula, where events occur in chronological order, and as the sjužet, where events occur in narrative order.
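The two playback modes amount to sorting the same events by different keys. This minimal sketch uses placeholder events, not data from the episode. The `Event` fields and the mode names are hypothetical and are not taken from the project's dataset.

```python
from dataclasses import dataclass
from datetime import time
from operator import attrgetter


@dataclass
class Event:
    label: str
    clock_time: time    # fabula: when the event happens in story time
    text_position: int  # sjužet: where the narrative presents it


def play_order(events, mode: str):
    """Return event labels in fabula (chronological) or sjužet (narrative) order."""
    if mode == "fabula":
        key = attrgetter("clock_time")
    elif mode == "sjuzet":
        key = attrgetter("text_position")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [e.label for e in sorted(events, key=key)]


events = [
    Event("Event A", time(15, 10), 0),
    Event("Event B", time(14, 55), 1),
    Event("Event C", time(15, 0), 2),
]
print(play_order(events, "fabula"))  # ['Event B', 'Event C', 'Event A']
print(play_order(events, "sjuzet"))  # ['Event A', 'Event B', 'Event C']
```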

The data are available to serve as a model for rebuilding the project with one’s own dataset.

» Text Divider: Quick Markup for Chapter and Dialogue Splitting (markup)

  • Moacir P. de Sá Pereira

Read the README at GitHub.

This Python script breaks up a text into its internal sections. It uses a light markup scheme to signal where chapters and sections begin, and it can also keep track of dialogue by speaker. For example, once an electronic version of The Great Gatsby has been marked up, it is possible to extract only Tom Buchanan’s lines.

The markup that breaks out the sections and dialogue was created by David Hoover, though not all of Prof. Hoover’s markup scheme has been implemented...
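The splitting logic can be sketched with regular expressions. Note that the tags used here are invented for illustration and are not Hoover's actual markup. A line `#CHAPTER <id>` starts a chapter, and `<Speaker>...</Speaker>` wraps a line of dialogue. The function names are likewise hypothetical.

```python
import re
from collections import defaultdict

# Invented markup for this sketch (not Hoover's actual tags):
#   a line "#CHAPTER <id>" starts a chapter;
#   <Speaker>...</Speaker> wraps a line of dialogue.
CHAPTER = re.compile(r"^#CHAPTER\s+(\S+)\s*$")
SPEECH = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def split_chapters(marked_up: str) -> dict:
    """Map chapter ids to their text; anything before the first marker is dropped."""
    chapters, current, buffer = {}, None, []
    for line in marked_up.splitlines():
        match = CHAPTER.match(line)
        if match:
            if current is not None:
                chapters[current] = "\n".join(buffer)
            current, buffer = match.group(1), []
        else:
            buffer.append(line)
    if current is not None:
        chapters[current] = "\n".join(buffer)
    return chapters


def dialogue_by_speaker(marked_up: str) -> dict:
    """Collect each speaker's utterances in order of appearance."""
    speech = defaultdict(list)
    for speaker, utterance in SPEECH.findall(marked_up):
        speech[speaker].append(utterance.strip())
    return dict(speech)


sample = (
    "#CHAPTER 1\n"
    "Narration goes here.\n"
    "<Tom>First line.</Tom>\n"
    "#CHAPTER 2\n"
    "<Daisy>Hello.</Daisy> <Tom>Second line.</Tom>"
)
print(list(split_chapters(sample)))  # ['1', '2']
print(dialogue_by_speaker(sample)["Tom"])  # ['First line.', 'Second line.']
```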

» Epigraphing the 19th Century (experiment)

  • Aaron Plasek


Frequently ignored and occasionally made up, the epigraph is a textual genre defined both by its physical placement on the page and by the absence of the textual object being signposted. An epigraph attribution situates the text it prefaces within a larger constellation of texts and authors, and in this manner has an indexical function rather similar to scanning the spines of books on a shelf, flipping through a card catalog, or examining a record in a digital relational database. The affordances of citation networks cannot replace other critical methods, but a comparative approach to the different kinds of citation practices made visible by different networks of attribution provides an opportunity to reconsider how shared concepts that constitute a (disciplinary) field...
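One simple way to turn epigraph attributions into a network suitable for comparison is to count how often each author is invoked and which authors are epigraphed together. This is a minimal sketch, not the project's method. The records are placeholders and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def attribution_counts(records):
    """How often each source author is named in an epigraph attribution."""
    return Counter(author for _, author in records)


def co_attributions(records):
    """Count pairs of authors whose epigraphs appear in the same work."""
    authors_by_work = defaultdict(set)
    for work, author in records:
        authors_by_work[work].add(author)
    pairs = Counter()
    for authors in authors_by_work.values():
        for pair in combinations(sorted(authors), 2):
            pairs[pair] += 1
    return pairs


# Placeholder records: (work carrying the epigraph, author it is attributed to).
records = [
    ("Novel 1", "Author A"), ("Novel 1", "Author B"),
    ("Novel 2", "Author A"), ("Novel 2", "Author B"),
    ("Novel 3", "Author C"),
]
print(attribution_counts(records))
print(co_attributions(records))  # Counter({('Author A', 'Author B'): 2})
```

The co-attribution pairs form weighted edges. Building the same structure from another citation practice, such as footnotes or bibliographies, would allow the two networks to be compared.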

» Semantic Analysis of One Million #GamerGate Tweets (semantic analysis)

  • Phillip R. Polefrone

This paper develops a methodology for describing the contents of a controversy on a microblogging platform (Twitter) by measuring correlations among broad semantic categories. Over one million tweets were collected in daily batches from November 2015 to June 2016 using Tweepy and the Twitter API; over 280,000 of these were not retweets and thus contained unique data. Using a Python implementation of Roget’s hierarchy of semantic categories, these tweets were grouped into bins of one thousand and analyzed using a “bag of categories” model, that is, a categorized bag of words. The linear correlation of each category with the “WOMAN” category was measured and compared with a control group. The categories concomitant with “WOMAN” in the test corpus include some noise, but as a whole they present a meaningful description of the conversation that adheres to its known qualities. This result suggests that a more developed version of this methodology could be used...
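The "bag of categories" correlation can be sketched as follows. This is not the paper's code. The tiny lexicon is a hypothetical stand-in for Roget's hierarchy, and the comparison with a control group is not shown. Each bin's words are mapped to categories and counted. Pearson's r is then computed between each category's per-bin counts and the target category's counts.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon standing in for Roget's category hierarchy.
LEXICON = {"she": "WOMAN", "her": "WOMAN", "woman": "WOMAN",
           "game": "GAMES", "games": "GAMES", "said": "SPEECH"}


def category_bins(tweets, bin_size=1000, lexicon=LEXICON):
    """Count category occurrences in consecutive bins of `bin_size` tweets."""
    bins = []
    for start in range(0, len(tweets), bin_size):
        counts = Counter()
        for tweet in tweets[start:start + bin_size]:
            for word in re.findall(r"[a-z']+", tweet.lower()):
                if word in lexicon:
                    counts[lexicon[word]] += 1
        bins.append(counts)
    return bins


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def correlations_with(bins, target="WOMAN"):
    """Correlate every other category's per-bin counts with the target's."""
    target_counts = [b[target] for b in bins]
    others = set().union(*bins) - {target}
    return {c: pearson([b[c] for b in bins], target_counts) for c in others}


tweets = ["she said", "game", "her game", "game game"]
print(correlations_with(category_bins(tweets, bin_size=1)))
# roughly: GAMES -0.71, SPEECH 0.58
```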

» Shape of Time (visualization)

  • Sierra Eckert
  • Allison Chaney

In a novel, time often does not move evenly or linearly: a single paragraph in García Márquez’s One Hundred Years of Solitude jumps several decades, while in Proust’s In Search of Lost Time, 15 pages are devoted to the single moment of eating a madeleine. In this project, we are interested in the kind of language that is used to talk about time, and in the shape and tempo of this language in a given text. Tracking what we call the “time signature” of a text, we use explicit references to time passing in order to divide up a text, and then use...
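Dividing a text at explicit references to time passing can be sketched with a regular expression. This is a minimal sketch, not the project's method. The marker pattern is a small hypothetical inventory, and using words per segment as a proxy for tempo is an assumption made for illustration.

```python
import re

# Hypothetical patterns for explicit references to time passing; a real study
# would need a far richer inventory.
TIME_MARKER = re.compile(
    r"\b(?:\d+|one|two|three|several|many)\s+"
    r"(?:minutes?|hours?|days?|weeks?|months?|years?|decades?)\s+"
    r"(?:later|passed|went by)\b"
    r"|\bthe\s+(?:next|following)\s+(?:morning|day|week|year)\b",
    re.IGNORECASE,
)


def split_on_time_markers(text: str):
    """Start a new segment at each explicit time reference."""
    segments, start = [], 0
    for match in TIME_MARKER.finditer(text):
        segments.append(text[start:match.start()].strip())
        start = match.start()
    segments.append(text[start:].strip())
    return [s for s in segments if s]


def segment_lengths(segments):
    """Words per segment: a crude proxy for how much text each stretch receives."""
    return [len(s.split()) for s in segments]


text = "He woke. Three years later she returned. The next morning it rained."
segments = split_on_time_markers(text)
print(segments)
print(segment_lengths(segments))  # [2, 5, 5]
```

Each segment's length can be set against the amount of story time its marker names, such as years or a single morning. Doing so shows where a text speeds up or slows down.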

» Science Surveyor (network analysis)

  • William Leif Hamilton (Stanford)
  • Raine Hoover (Stanford)
  • Marguerite Y. Holloway
  • Dan Jurafsky (Stanford)
  • David Jurgens (Stanford)
  • Laura Kurgan (Center for Spatial Research)
  • Minkyoung Kim (Stanford)
  • Eli Bennett Levin
  • Dan McFarland (Stanford)
  • Vinodkumar Prabhakaran (Stanford)
  • Phillip R. Polefrone
  • Juan Francisco Saldarriaga (Center for Spatial Research)
  • Dennis Yi Tenen

One of the biggest challenges facing science journalists is quickly contextualizing the journal articles they report on while on deadline. A science reporter must rapidly get a sense of what has come before a new paper in the field, understand whether the paper represents a significant advance or not, and establish whether this finding is an outlier or part of the field’s consensus. Doing all that within a matter of hours or a few days is often impossible. The consequences of these limitations are serious and well documented. Science journalists are often overly dependent on expert sources, which encourages investigative complacency; they become vulnerable to presenting false balance and to covering articles that will be retracted; they sensationalize. As a consequence, the public often receives a mistaken view of science. Many people see science as a series of great new “discoveries” accompanied by a lot of hype; few understand...

» Character Networks for Narrative Generation (article)

  • Graham Alexander Sack

This paper models narrative as a complex adaptive system in which the temporal sequence of events constituting a story emerges out of cascading local interactions between nodes in a social network. The approach is not intended as a general theory of narrative, but rather as a particular generative mechanism relevant to several academic communities: (1) literary critics and narrative theorists interested in new models for narrative analysis, (2) artificial intelligence researchers and video game designers interested in new mechanisms for narrative generation, and (3) complex systems theorists interested in novel applications of agent-based modeling and network theory. The paper is divided into two parts. The first part offers examples of research by literary critics on the relationship between social networks of fictional characters and the structure of long-form narratives, particularly novels. The second part provides an example of schematic story generation based on a simulation of the structural balance network model. I will argue that if literary critics can better understand sophisticated narratives by extracting networks from them, then narrative intelligence researchers can benefit by inverting the process, that is, by generating narratives from...
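A toy version of story generation from structural balance is sketched below. In this model a triad of characters is balanced when the product of its three relationship signs is positive, with friendship as +1 and enmity as -1. The update rule is an assumption, not necessarily the one used in the paper: at each step, one edge of a randomly chosen unbalanced triad is flipped, and each flip is narrated as an event. The character names and function names are hypothetical.

```python
import itertools
import random


def unbalanced_triads(signs):
    """Triads whose three edge signs multiply to -1 (structurally unbalanced)."""
    def sign(x, y):
        return signs[frozenset((x, y))]

    nodes = sorted(set().union(*signs))
    return [
        (a, b, c)
        for a, b, c in itertools.combinations(nodes, 3)
        if sign(a, b) * sign(b, c) * sign(a, c) < 0
    ]


def generate_story(characters, steps=10, seed=0):
    """Flip one edge of a random unbalanced triad per step; each flip is an event."""
    rng = random.Random(seed)
    # Every pair of characters starts as friends (+1) or enemies (-1) at random.
    signs = {frozenset(pair): rng.choice((1, -1))
             for pair in itertools.combinations(characters, 2)}
    events = []
    for _ in range(steps):
        triads = unbalanced_triads(signs)
        if not triads:
            break  # the network is balanced, so the story has nowhere left to go
        a, b, c = rng.choice(triads)
        u, v = rng.choice([(a, b), (b, c), (a, c)])
        edge = frozenset((u, v))
        signs[edge] = -signs[edge]
        verb = "befriends" if signs[edge] > 0 else "turns against"
        events.append(f"{u} {verb} {v}")
    return events, signs


events, final_signs = generate_story(["Ann", "Ben", "Cal", "Dee"], steps=20, seed=42)
for event in events:
    print(event)
```

The step cap matters because, with more than three characters, a flip that resolves one triad can unbalance another. The cascade may therefore run for a while before the network settles.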

» Visualizing Joyce (graphic)

  • Emily Fuhrman

In reference to schemas for Ulysses, Joyce describes the compositional technique behind the “Sirens” episode as a “fugue with all musical notations,”1 and as including the “eight regular parts of a fuga per canonem.”2 Joyce uses the first 63 lines of the chapter to introduce 99 words and syllables that reappear in different forms throughout the rest of the text. The sounds ultimately act as leitmotifs, evoking the sensory presence of different characters at different times.

This visualization is constructed as a line-by-line annotation of each sound that recurs at least four times following its initial introduction. Within each...
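The selection rule (keep sounds that recur at least four times after they are introduced) can be sketched as a line-by-line scan. This is not the visualization's code. The sketch assumes that the motif list is supplied by hand and that the first `overture_len` lines (63 for "Sirens," per the description above) form the introduction. It also counts case-insensitive substring hits, so a motif embedded in a longer word is counted too. That simplification suits syllable-level motifs. The sample lines are placeholders.

```python
def recurring_motifs(lines, overture_len, motifs, min_recurrences=4):
    """Map each motif to the 1-based line numbers where it recurs after the overture."""
    annotations = {}
    for motif in motifs:
        needle = motif.lower()
        hits = []
        for number, line in enumerate(lines[overture_len:], start=overture_len + 1):
            # One entry per occurrence, so a line can appear more than once.
            hits.extend([number] * line.lower().count(needle))
        if len(hits) >= min_recurrences:
            annotations[motif] = hits
    return annotations


lines = ["bronze by gold", "x", "bronze", "gold gold", "bronze bronze", "bronze"]
print(recurring_motifs(lines, overture_len=1, motifs=["bronze", "gold"]))
# {'bronze': [3, 5, 5, 6]}
```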

» LITclock (twitter-bot)

  • Dennis Yi Tenen
  • Susana Zialcita

This project was inspired by Christian Marclay’s The Clock.1 Each minute, the LITclock Twitter account tweets a quotation from a novel or narrative non-fiction book (occasionally, a travel guide chimes in) that describes what is happening at that very minute.

For example, LITclock started with a quote from Christopher Marlowe at precisely 12:00 am on 3/13/14:

The clock striketh twelve O it strikes, it strikes! Now body, turn to air

and then, thirteen hours and nine minutes later, LITclock told us that Miriam Wu from Stieg Larsson’s Millennium Trilogy was being interviewed:

“The time is 1:09 pm.” She turned off tape recorder.

Our goal was to create, as Zadie Smith said of Marclay’s clock, “thousands of fictional interpretations of time repurposed to express time...
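The lookup behind such a clock can be sketched as an index from a 24-hour "HH:MM" key to quotations. This is not the bot's code. The index format and the function name `quote_for` are hypothetical, and the sketch leaves out posting to Twitter. The two entries reuse the quotations above, and the time arithmetic reproduces the 13-hour-9-minute gap between them.

```python
import random
from datetime import datetime, timedelta


def quote_for(moment: datetime, quotes_by_minute: dict, rng=random):
    """Pick a quotation filed under the moment's 24-hour "HH:MM" key, if any."""
    candidates = quotes_by_minute.get(moment.strftime("%H:%M"), [])
    return rng.choice(candidates) if candidates else None


# Hypothetical index built from the two quotations above.
QUOTES = {
    "00:00": ["The clock striketh twelve O it strikes, it strikes! Now body, turn to air"],
    "13:09": ["“The time is 1:09 pm.” She turned off tape recorder."],
}

start = datetime(2014, 3, 13, 0, 0)
later = start + timedelta(hours=13, minutes=9)
print(later.strftime("%H:%M"), quote_for(later, QUOTES))
```

Minutes with no indexed quotation return `None`, and a real bot would simply stay silent for them.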