The Computational Case against Computational Literary Studies

Da, Nan Z.

doi:10.1086/702594

Cited by 119 publications

(48 citation statements)

References 12 publications

“…That's roughly twenty tokens for each type, a number that's really high because the book is so big. The ten most frequently appearing types in A43998, after all the words have been converted to lowercase, are the (14,849), of (10,850), and (7,305), to (7,236), is (4,864), that (4,786), in (4,194), a (3,122), by (2,636), and for (2,539). Those are just the top words, of course, and the list goes on from there.…”

Section: From Texts To Tokens (mentioning)

confidence: 99%

Is there a text in my data? (Part 1): on counting words

Gavin¹

2020

Journal of Cultural Analytics

Abstract: This essay is the first in a two-part series. This first installment invites readers to consider a few very basic questions: what does it mean to count words in a text? What happens to the text, and to our understanding of it, when we decompose it into a series of word counts? What relation exists between the textual domain and its numerical image? Or, to restate this question with a nod to literary critic Stanley Fish, "is there a text in my data?" Following one document through a series of typical transformations (first into a simple list of words and their frequencies, then to a vector of elements in a matrix, and from there through the processes of normalization, dimensionality reduction, and analysis), this essay argues against the commonly held notion that counting words reduces complexity, suggesting instead that semantic models embed textual objects in highly complex structures that are extremely sensitive to historical context and subtle nuances in meaning. Word frequencies aren't static, given things that simply exist in a text. They're produced through the act of modeling, and the mathematical structures they imply dissolve both words and texts into elaborate systems of mutual interrelation.
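The counting step described in the excerpt above (lowercasing a text, tallying each word type, and comparing tokens to types) can be illustrated with a minimal sketch. The regular-expression tokenizer, the function name `count_types`, and the sample sentence are assumptions made for illustration only; they are not taken from Gavin's essay, and a different tokenizer would give different counts.

```python
import re
from collections import Counter


def count_types(text):
    """Lowercase the text, split it into word tokens, and count each type.

    The tokenizer (runs of letters and apostrophes) is an assumption made
    for this sketch, not the method used in the cited essay.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, standing in for a full document.
sample = "The cat and the dog saw the bird, and the bird saw the cat."

counts = count_types(sample)
n_tokens = sum(counts.values())   # total number of word occurrences
n_types = len(counts)             # number of distinct words
tokens_per_type = n_tokens / n_types

print(n_tokens, n_types, round(tokens_per_type, 2))
print(counts.most_common(1))      # the single most frequent type
```

Even in this toy case the counts depend on modeling choices such as case folding and what counts as a word boundary, which is the point the abstract makes about word frequencies being produced by the act of modeling.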

“…Most previous work within the NLP community applies distant reading (Jockers, 2013) to large collections of books, focusing on modeling different aspects of narratives such as plots and event sequences (Chambers and Jurafsky, 2009; McIntyre and Lapata, 2010; Goyal et al, 2010; Eisenberg and Finlayson, 2017), characters (Bamman et al, 2014; Iyyer et al, 2016; Chaturvedi et al, 2017), and narrative similarity (Chaturvedi et al, 2018). In the same vein, researchers in computational literary analysis have combined statistical techniques and linguistics theories to perform quantitative analysis on large narrative texts (Michel et al, 2011; Franzosi, 2010; Underwood, 2016; Jockers and Kirilloff, 2016; Long and So, 2016), but these attempts largely rely on techniques such as word counting, topic modeling, and naive Bayes classifiers and are therefore not able to capture the meaning of sentences or paragraphs (Da, 2019). While these works discover general patterns from multiple literary works, we are the first to use cutting-edge NLP techniques to engage with specific literary criticism about a single narrative.…”

Section: Related Work (mentioning)

confidence: 99%

Untitled

Wang

Iyyer

2019

Proceedings of the 2019 Conference of the North

Literary critics often attempt to uncover meaning in a single work of literature through careful reading and analysis. Applying natural language processing methods to aid in such literary analyses remains a challenge in digital humanities. While most previous work focuses on "distant reading" by algorithmically discovering high-level patterns from large collections of literary works, here we sharpen the focus of our methods to a single literary theory about Italo Calvino's postmodern novel Invisible Cities, which consists of 55 short descriptions of imaginary cities. Calvino has provided a classification of these cities into eleven thematic groups, but literary scholars disagree as to how trustworthy his categorization is. Due to the unique structure of this novel, we can computationally weigh in on this debate: we leverage pretrained contextualized representations to embed each city's description and use unsupervised methods to cluster these embeddings. Additionally, we compare results of our computational approach to similarity judgments generated by human readers. Our work is a first step towards incorporating natural language processing into literary criticism.

“…In his work on scholarly hypertext, David Kolb noted that informational hypertext and literary hypertext are different from hypertexts featuring scholarly inquiry, and asked "how in hypertext we might allow not just connection but assertion, self-representation, and debate about criteria" [14]. The growth of the digital humanities in the last decade has seen an increase in databases and other digital tools for developing research in the humanities, but distant reading [17] and quantitative methods in the humanities have been criticised for their potential disregard for the qualitative interpretation that is at the core of humanities methodologies [8], and for a sometimes poor use of quantitative methods [5].…”

Section: Hypertextual Structure For Scholarly Inquiry (mentioning)

confidence: 99%