Historical Shifts and Geographical Drifts: An Exploration of AI Embedding (2025-2026)

Background

AI systems use “embeddings” – numerical representations of human concepts learned through neural networks – to encode semantic and syntactic information in high-dimensional spaces. These embeddings, created from enormous amounts of human-generated data, are used across various fields from healthcare to fashion.
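As a rough illustration of what "encoding semantic information" means in practice, the sketch below compares toy vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors and the words attached to them are invented for this example and do not come from any real model; actual embeddings typically have hundreds or thousands of dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy embeddings, made up for illustration only.
toy_embeddings = {
    "river": [0.90, 0.80, 0.10],
    "stream": [0.85, 0.82, 0.15],
    "invoice": [0.10, 0.05, 0.95],
}

related = cosine_similarity(toy_embeddings["river"], toy_embeddings["stream"])
unrelated = cosine_similarity(toy_embeddings["river"], toy_embeddings["invoice"])
print(f"river vs stream:  {related:.3f}")
print(f"river vs invoice: {unrelated:.3f}")
```

In this toy setup the two water-related words point in nearly the same direction, so their similarity is higher than that of the unrelated pair.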

By studying these AI embedding spaces, researchers can analyze the human data that created them. The original data’s size makes direct analysis challenging, but embeddings provide a compressed version that allows researchers to explore this data for the first time. This research could reveal new patterns in how human knowledge and cultural understanding have evolved, while also improving our ability to detect and address biases in AI systems.

Project Description

This project will explore embeddings across time, geography and languages through three interconnected components, each of which will have a corresponding subteam:

  1. The temporal subteam will develop small transformer-based contextual embedding models trained specifically on text from multiple time periods and fine-tune existing models using period-specific text. They will analyze how concept representations have changed, examining shifts in the meaning of terms and tracking the evolution of AI terminology across different research eras.
  2. The geo-lingual subteam will investigate how large embedding models encode different languages and regional variations of the same language. They will create a specialized dataset for probing pre-trained models, incorporating text that spans multiple languages and geographical locations in order to compare how the same concept is represented when embedded using models trained on different languages.
  3. The representation subteam will focus on developing innovative approaches to visualize and interact with high-dimensional embedding spaces. They will conduct a comprehensive literature review examining visualization techniques from diverse fields, including neuroscientific brain mapping, geospatial analysis and string theory visualizations. The project will leverage VR, sonification and haptic feedback to help users better conceptualize multidimensional spaces.
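Comparing how the same concept is represented across time periods or languages usually requires putting two separately trained embedding spaces into a common frame first. The sketch below shows one widely used option, orthogonal Procrustes alignment, followed by a per-word cosine distance. It is an illustration under stated assumptions, not the project's chosen method: the vocabulary, the random "early" and "late" matrices, and the decision to treat all but one word as stable anchors are all invented for the example.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def cosine_distance(a, b):
    """1 - cosine similarity; near 0 means the two vectors point the same way."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic data (assumption): 6 words embedded in 3 dimensions.
rng = np.random.default_rng(0)
vocab = ["network", "model", "learning", "agent", "kernel", "memory"]
early = rng.normal(size=(len(vocab), 3))

# The "late" space is a rotated copy of the early one, with a single word
# ("network") deliberately moved to simulate a change in meaning.
true_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
late = early @ true_rotation
late[0] += rng.normal(scale=2.0, size=3)

# Fit the alignment on anchor words assumed to be stable (all but the first).
rotation = procrustes_rotation(early[1:], late[1:])
aligned_early = early @ rotation

# After alignment, a large distance suggests the word's representation moved.
shift = {
    word: cosine_distance(aligned_early[i], late[i])
    for i, word in enumerate(vocab)
}
for word, dist in sorted(shift.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {dist:.4f}")
```

With real models, the choice of anchor words matters: if the anchors themselves have drifted, the alignment absorbs part of the change being measured.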

The project aims to advance both technical understanding of embedding spaces and public engagement with these complex concepts. The project’s findings will be shared through two main channels: an interactive website built with modern JavaScript frameworks and an experiential art exhibit.

Anticipated Outputs

Interactive website; experiential art exhibit; academic articles

Student Opportunities

Ideally, this project team will include 3 graduate students and 3 undergraduate students. Interested students will likely come from diverse backgrounds and majors, but share an interest in media arts, AI and software development. Applicants competent in using Python are preferred; experience building websites with JavaScript is a plus.

Undergraduate students will learn the fundamentals of machine learning/artificial intelligence embeddings, best practices for working with high-dimensional data and dimensionality reduction techniques, and ways to connect AI with research in the arts and humanities. They will have the opportunity to practice their Python programming skills and participate in the creation of interactive media. Graduate students will have the opportunity to gain leadership skills through two paid group leader roles.

In Fall 2025, the team will meet on Mondays and Wednesdays from 10:05-11:20 a.m.

Timing

Summer 2025 – Spring 2026

  • Summer 2025 (optional): Dataset curation; literature review; automated analysis pipeline development; visualization of language/geography encodings
  • Fall 2025: Create interactive website visualizing embedding spaces; analyze geo-lingual data on algorithmic bias; build temporal transformer models using historical texts; develop art exhibition concept/timeline and secure venue
  • Spring 2026: Fine-tune embeddings by time period; explore temporal patterns; develop questions/analyses; create interactive website and art exhibit; prepare manuscripts; publicize findings

Crediting

Academic credit available for fall and spring semesters; summer funding available

Team Leaders

  • Brinnae Bent, Pratt School of Engineering
  • William Seaman, Arts & Sciences: Art, Art History, and Visual Studies

Team Contributors

  • Gregory Baker, Arts & Sciences: Art, Art History, and Visual Studies