Data+ Now Accepting Applications for Summer Research Experience

Data+ is a ten-week summer research experience that gives Duke undergraduate and master’s students an opportunity to explore data-driven approaches to interdisciplinary challenges. 

Students join small project teams of approximately four students, working alongside other teams in a communal environment. They learn how to marshal, analyze and visualize data, while gaining broad exposure to the modern world of data science.

The teams together form a group of about 20 students working in different subject areas. Teams meet regularly for lunch to discuss the progress of their work. This lets students learn about data science in broad strokes while digging deep into one project with their teammates.

Program details and how to apply

Participants will receive a $5,000 stipend, out of which they must arrange their own housing and travel. Please note that participants may not accept employment or take classes during the program.

The program runs from May 23 until July 29, 2016. The application deadline is February 25, 2016, but applications are accepted and evaluated on a rolling basis.

Projects available for 2016

The projects planned for summer 2016 are listed below. While projects may appeal to certain majors, like public policy or economics, students from all backgrounds are encouraged to apply. For some projects, IRB training may be required and will be provided in advance. 

Please indicate the number of the project you choose when you apply.

1) Election Polling This project will explore new possibilities in political polling methodology, using data provided by a leading polling firm. Sponsored by POLIS: The Center for Politics, Leadership, Innovation, and Service. 

2) Drugs and Gluttony How do individuals change their eating habits when given a new prescription? Working with the Sanford School and the BECR Center, students will analyze and visualize detailed food purchase data and prescription records to classify drugs, examine the effect of new prescriptions on food purchases and measure the nutritional impact of any changes in food purchase behavior. 

3) Data-Driven Parking Sponsored by the Parking and Transportation Office at Duke. Students will work with a dataset that carries de-identified information about parking behavior in the university and medical system. In consultation with professionals at that office, they will build visualization and analytical tools to assist with strategic inventory management. 

4) Transgender Discrimination Survey Students will work with data from the National Transgender Discrimination Survey, collected as a joint project of the National Gay and Lesbian Task Force and the National Center for Transgender Equality. They will evaluate the consequences of anti-transgender bias on housing, employment, health, education, family life and criminal justice systems. 

5) Smoking and Activity Space Students will use signal processing and network analysis to understand how the manner in which people allocate their lives across space and time is related to variables that affect their health. They will use GPS activity tracks acquired from smokers and nonsmokers in Durham County, as well as demographic and purchase data about each subject. 

6) Data-driven Development Sponsored by Alumni Affairs and Development at Duke. Students will investigate commonalities and distinctions in alumni gifts, and attempt to understand and predict motivations for gifts of different types. They will also construct mathematical models to evaluate different strategies for alumni engagement. Students will have the opportunity to consult with the Prospect Research, Management and Analytics team in the development office. 

7) Geometry and Topology for Data Sponsored by Geometric Data Analytics, Inc. Students will use methods inspired by geometry and topology on data relevant to vehicle tracking and/or cyber defense.

8) Energy Resource Assessment How can satellite imagery be used to understand and analyze building energy efficiency, distributed solar generation and the consumption of oil and coal? Using data from the U.S. Geological Survey, students will investigate methods and applications for using satellite imagery and aerial photography for energy resource assessment. 

9) EMR and Clinical Trials This project involves the Triad Health Network and UNC Greensboro. Students will utilize electronic medical records data including medical codes, medications, demographics, lab values and vital signs to develop a model for predicting future disease in patients with diabetes, build a scheme for randomizing high risk patients to either health coaching or control groups and implement an approach to track cost and comorbidity outcomes in trial patients. 

10) NC Budget Data and Policy This project is sponsored by the Budget and Tax Center, part of the North Carolina Justice Center. Students will help the BTC build a keystone tool for analysis of the North Carolina state budget, use historical budget data to run scenarios in order to illustrate the impact of proposed state-level policies and help create a budget data visualization tool that allows the public to explore and learn about public investments in an interactive way. 

11) Black Queen Hypothesis This project involves some travel to the Smithsonian National Museum of Natural History. Students will use image analysis techniques to test predictions about the relationship between body size and social complexity, in the context of the evolution of highly cooperative colonies of eusocial insects. 

12) Fruit Fly Morphogenesis Students will analyze 4D images (3D + time) of cell movements and shape changes in developing fruit fly embryos to assess the effect of genetic mutations on morphogenesis. They will implement machine learning methods to detect and track cell boundaries in image sequences, and create a graphical user interface in which biologists may edit the results. 

13) Eye Movements and Food Choice Students will develop automated image processing algorithms to analyze mobile eye-tracking data, in an attempt to identify and code what subjects are looking at while they choose between different items in a mock “convenience-store” setting. 

14) Durham Neighborhoods Sponsored by the Neighborhood Compass, a neighborhood indicators project designed to empower the communities of Durham County by leading and tracking services and action with data. Students will build visualization tools to explore and analyze datasets related to chronic ambient stress, population change and energy consumption. There will be opportunities to present findings to city and county leadership. 

15) Health Networks and Disparities Sponsored by the Duke Network Analysis Center. Students will build an interactive “co-treatment” network visualization tool, where each presented medical condition is linked to the other conditions most commonly associated with it in similar patients. This will help medical professionals see beyond the immediate condition that patients present and show how medical problems co-occur. Disparities by race, gender and poverty status will be analyzed. 

16) Night Vision  Students will explore the genetics of night vision and darkness adaptation. The team will perform time series analysis of visual acuity and contrast sensitivity—night vision score measurements taken over a period of 20 minutes as participants adapt to the dark. Students will then evaluate the association of these temporal trends of darkness adaptation with genetic variant data. 

17) Team Science Sponsored by a Clinical and Translational Science Award from NIH. Students will develop metrics for evaluating the amount of team-based science that occurs at Duke, and will produce a visualization map of team science in the Duke Schools of Medicine and Nursing. 

18) Predicting Pancreatic Cancer Students will use electronic medical records (EMR) data to search for precursors of pancreatic cancer. They will assess whether patients with type II diabetes are at higher risk for pancreatic cancer, and develop a statistical model of demographic and medical record data. 

19) LungMAP The LungMAP project seeks to improve lung health by providing the research community with a comprehensive web-based atlas to support investigations into the processes that regulate lung development. Students will develop a statistical and machine-learning pipeline to automatically classify immunofluorescent images of developing mouse lungs in the LungMAP database. 

20) National Asset Scorecard for Communities of Color The NASCC is an ongoing survey project that gathers information about asset and debt positions of households at a detailed racial and national origin level. Students will work with Duke’s Samuel DuBois Cook Center on Social Equity to use the survey data to examine various dimensions of social inequality including labor market discrimination, health outcomes, family structure, differential reliance on predatory lending sources, exposure to poverty, immigrant remittances and equity in homes versus other types of assets. 

21) Smart(er) Routing at Theme Parks Students will have access to historical queue length and realized wait time data from several major theme parks. The team will investigate the behavior of customers to determine the strategies and degree of sophistication that customers employ when deciding which rides to ride and when. Students will also seek to identify opportunities to shift customer behavior in a socially desirable way, with the goal of reducing congestion and improving the customer experience.

Funding sources

Funding comes from part of a Research Training Grant issued by the NSF to the Departments of Mathematics and Statistical Science at Duke. Additional funding and infrastructure support are provided by the Information Initiative at Duke (iiD), the Social Science Research Institute (SSRI), Bass Connections, MEDx, and the Vice-Provost for Research.

Cross-posted from the Duke Social Science Research Institute website.

Read about some of last summer's projects.