AsteXT: Data Networks of Asian American Literature (2025-2026)
Background
Asian American literature is expanding in both volume and prominence. Sixty-two percent of Asian American literature was produced in the last three decades. The acclaim of recent novels like Yellowface (2023) and Chlorine (2024) corroborates the growing cultural impact of this body of American storytelling.
Yet these works are understudied. While short-form American literature has received academic attention, the transnational complexity of Asian American short stories makes it hard to assess their most community-bound aspects: prose style, structure, theme and circulation. Tracing publication trends and authorial agency in short-form Asian American works across time will deepen understanding of how literary identity and subjectivity have developed among the fastest-growing ethnic community in the U.S.
Project Description
This project team will create a unique dataset and computational toolkit to study Asian American literature and culture in a new way, combining empirical analysis with the scalability of data science. Building on the work of Ted Underwood and Andrew Piper, it aims to advance Digital Humanities (DH) research and address the lack of computational workflows for parsing minority literature. The team will use cutting-edge statistical Natural Language Processing (NLP) tools to study large corpora of texts and identify patterns in meaning, syntax, grammar and themes that will help create a new literary metadata profiling system.
This project will include three branches, each focused on one of three interrelated goals that will advance the data-driven analysis of Asian American narratives and create versatile tools that scholars can use in their work. The statistics branch will aim to create a reusable DH toolkit for future research in Asian American studies and American literary studies. The database branch will build open-access literary and cultural datasets for teachers, students and scholars to use in their work. The theory/history branch will apply statistical methods to capture minority creative histories in our modern digital epoch and will study copyright implications and permission solicitation in text analysis research.
Anticipated Outputs
NLP toolkit; public digital short story database; data collection; website; copyright/fair use research guidebook; public event with live database demonstration
Student Opportunities
Ideally, this project team will include 3 graduate students and 10 undergraduate students. Each graduate student will mentor one subteam of undergraduate students and will preferably specialize in computer science, English literature or dataset creation. Interested undergraduate students may come from such fields as computer science, modern literature, creative writing, archival and database research, history and law.
This project will enhance the skill sets of students in the computer sciences, humanities and social sciences. Students will gain an understanding of the logical processes behind computational tools and the datasets that influence humanistic inquiry. They will learn from current legal battles regarding copyright, fair use and computational methods. Students will also have opportunities to practice public speaking, real-life troubleshooting, project management and group collaboration.
See the related Data+ project for Summer 2025; there is a separate application process for students who are interested in this optional component.
Timing
Summer 2025 – Spring 2026
- Summer 2025 (optional): Build a statistical NLP research prototype; define the academic subject of the Asian American short story; determine the research pipeline for the database branch; start building the analysis architecture
- Fall 2025: Perform historical research; refine statistical toolkit and database; perform copyright and legal research; perform toolkit test cases; finalize cluster method; pair toolkit with stories
- Spring 2026: Apply toolkit to entire database; finalize network visualization method; create historical context visualization; create network series; deploy toolkit; research literary theory
Crediting
Academic credit available for fall and spring semesters; summer funding available
See related Data+ summer project, Data Networks of Asian American Literature, 1974-2024 (2025).
Image: Cover of Jade Song's Chlorine, courtesy of HarperCollins