Data-driven Approaches to Interdisciplinary Challenges

December 16, 2015

Originally published in the Fall 2015 issue of GIST by the Duke Social Science Research Institute (SSRI)

Data+ is a ten-week summer research experience that immerses undergraduate students in client-based big data projects.

With approximately forty undergraduate participants making up fourteen project teams, the program offers a structured setting for students to engage with big data while encouraging peer-to-peer interaction. The teams themselves are small: each consists of three or four undergraduates, a supervising mentor, and a faculty sponsor.

Participation in the program is open to Duke undergraduates interested in exploring new data-driven approaches to interdisciplinary challenges. While fields like mathematics, economics, and statistics are typically well represented in data science exploration, participants this past summer represented a wide range of fields including biophysics, earth and ocean sciences, evolutionary anthropology, and public policy.

Creating an Engaged Research Community

The program is housed in the Information Initiative at Duke (iiD) in Gross Hall, where teams meet biweekly to discuss their progress and the challenges they encounter during their work. Students can exchange ideas, ask one another questions, and practice relationship building and communication skills.

The program’s design is meant to maximize student engagement both with the data and with others, according to Paul Bendich, director of Data+ and associate director of undergraduate research at iiD.

“If you give them one project to do, you give them a deep understanding of some of it—so it’s a really good idea to have them immersed in thirteen or fourteen other teams doing similar but different projects. That way they see the whole range of what data science can be.”

Drawing on both quantitative and qualitative skillsets, students learn to marshal, analyze, visualize, and explain their data to individuals with different comprehension levels, from knowledgeable peers with a detailed understanding of data science to clients with only the broadest knowledge.

The goal, Bendich said, is not just for participants to understand the research, but also for them to communicate that understanding clearly with others. The program’s design provides students with opportunities to practice communicating prior to high-pressure situations, practice Bendich described as necessary for success inside and outside the academy.

A Client-based Approach

At the end of the ten weeks, participants present their findings to the client that sponsored the project team’s work, either a Duke faculty member or a company.

“The idea is that in the ‘real world’ you come in for a period of time and you have a client that has a problem involving data and you have to figure out how to solve it. Far more importantly, you have to present your solution to them and give them advice, interact with them, dialogue with them. We want to recreate an example of that for students. Some of the clients were actual clients, companies, and corporations,” Bendich explained.

Data+ is funded by a research training grant issued by the National Science Foundation, with matching funds from the Information Initiative at Duke (iiD) and the Social Science Research Institute (SSRI). Infrastructure, staff, and faculty from both centers contribute to the program, furthering its interdisciplinary nature.

Constructing Challenges from Duke MOOC Data

As one Data+ team found, massive open online courses (MOOCs) are a rich source of data. Using Duke’s Coursera data, team members analyzed six courses for elements like student intentions versus student behaviors and student retention across time in a single course.

“[The team] provided interesting proofs of concept that showed that the Coursera data is a fruitful arena for research projects. One of the students, Yijun (Jenny) Li, continues to work on a project to better predict who will complete and who will not complete Coursera classes and what factors impact completion,” said Lorrie Schmid, the team’s mentor and Research Data Infrastructure Manager of EHDi.

“The key assumption by the client and other Coursera users was that the Coursera data might be used for a Datafest,” Schmid explained. While confidentiality issues around the data complicated plans for Datafest, Schmid said students learned a valuable research lesson in the process—namely, that research outcomes cannot always be predicted. And while outcomes may not fall in line with expectations, other key components of the project, like learning how to prepare a proposal for IRB approval, offered valuable skills for further research endeavors.

“The project was a great learning experience for me,” said Andy Cooper, a student on the Data+ MOOC team.

“Since I had never completed a research project like this before, I didn’t have any clear expectations for the summer. I did expect to learn about statistical research and analysis, and that was certainly the case for me. I learned about researching in the social sciences, I became much more comfortable with the program SAS, and I gained useful experience in writing a formal research paper.”

Food Choices and Behavioral Economics

The Duke-UNC USDA Center for Behavioral Economics and Healthy Food Choice Research (BECR) exemplifies interinstitutional collaboration, and its Data+-sponsored team continued that collaborative spirit with interdisciplinary research into food choices within the WIC program.

Undergraduates Alex Hong, Kehan Zhang, and Kang Ni worked with their client Matthew Harding, director of BECR, to investigate grocery store transaction data as a first step toward developing a customer preference index for the products available through WIC.

By bringing together students from statistics, economics, and public health, the project was able to gain a deeper and more complex understanding of the data than a single approach would have allowed.

“That’s the magic of putting [students] together,” Bendich said. “The analytical questions are different but the skills that are needed to develop are the same.”

Workforce Incentives

Luke Raskopf, a political science major, and Xinyi (Lucy) Lu, a statistics and computer science major, teamed up to investigate the effectiveness of policies to combat wage stagnation and unemployment in working- and middle-class families in North Carolina.

Their client, Director Allan Freyer of the North Carolina Justice Center, worked closely with them throughout the ten weeks as they studied the effects of different types of incentive programs in certain counties in the state.

At the end of the program, Raskopf and Lu presented their findings, along with data-driven policy recommendations, to the Justice Center in Raleigh.

Excited about his partnership with Data+, Freyer said, “[t]he students provided a significant amount of highly sophisticated data analysis in support of an important project on the ways in which economic development professionals can fight wage stagnation at the state and local levels. I strongly recommend professional organizations to consider Data+ for statistical analyses they don’t have the capacity to do in house; the students are highly adaptable, learn quickly, and perform high quality work.”

Learn more: bigdata.duke.edu