Data+ Projects Showcase Applications of Data Science to Real-world Challenges

October 23, 2018

Data+ Showcase.

Over the summer, Duke students on 24 project teams worked on a wide variety of real-world challenges using health data, text, voting records, wireless mapping data, economic and financial analytics and more. They were participants in Data+, a 10-week summer research experience that allows students to explore new data-driven approaches to interdisciplinary challenges. The program is offered through the Information Initiative at Duke (iiD) and is part of the Bass Connections Information, Society & Culture theme.

Four Data+ project teams are continuing as Bass Connections projects during the 2018-2019 academic year: Gerrymandering and the Extent of Democracy in America; Vaccine Misinformation and Its Link to Hesitancy and Uptake in Durham; Big Data for Reproductive Health; and Data and Technology for Fact-checking. The Data+ teams laid important groundwork for their affiliated Bass Connections teams, and many students are continuing to work on these projects throughout the academic year.

Vaccine Hesitancy Data+ team in front of their poster.

The Data+ program showcases many ways to apply data science in the real world and gives its student participants a new way to explore data.

Here’s what some of this year’s Data+ students had to say about the program:

I got hands-on experience with buzzwords and ideas that I did not understand before (and am still working on). I developed my intuition for when to use computers to advance humanities work and when not to, and learned how to overcome challenges in teamwork and group organizing. I also landed a research position with a philosophy professor doing more text mining and citation analysis, skills I would not have without Data+. –Sandra Luksic ’20 (Philosophy and Political Science), Women’s Spaces

Data+ gave me an opportunity to apply some of the methods and skills that I have learned in the first year of my Master's program to an interesting and very relevant project. It can be hard to see the real-world value of concepts learned in school, even in an applied field like statistics; through Data+, I was able to run MCMC sampling, learn how to use QGIS to manipulate spatial data and improve my knowledge of (mostly) Python and (a little) R. I learned not just how to answer questions, but also how to ask questions. –Lisa Lebovici ’19 (MS in Statistical Science), Gerrymandering and the Extent of Democracy in America

Data+ made me realize that data science research is surprisingly pragmatic. I came in thinking that success in data science came down to statistical brilliance, but in reality, the key to success is flexibility. Ninety-nine percent of the methods we tried did not work, and the only way we moved forward was changing our perspective of the problem. –Grant Kim ’21 (Computer Science and Electrical & Computer Engineering), Complex Decisions, Real Numbers: Medical Decision Making

Data+ student Sean Holt describes his poster.

Data+ projects are sponsored by faculty or staff members or an industry representative with a data problem or question that the project team tackles over the summer. Often, project teams are able to accomplish so much that team sponsors continue working with students after the program is over on papers, prototype applications and further analysis.

Data+ sponsors reported that they were thrilled with this year’s results:

I am particularly impressed by how students with different (particularly non-computing) backgrounds were able to contribute to various projects in very meaningful ways. –Jun Yang, Professor of Computer Science, Duke University, Data and Technology for Fact-checking

We greatly enjoyed working with our Data+ team and were very pleased with the end results of their work. I highly recommend this program for anyone who is looking to dip their toes into applying data analytics in their world. –Richard Biever, Senior Director, Duke Office of Information Technology, Duke Wireless Data and Co-curricular Pathways E-Advisor

The team of students had excellent technical and communications skills. I was impressed at how self-driven they were and their ability to come up with solutions to some inevitable data issues that they encountered. Their final output was both better presented, and more directly applicable, than I expected at the outset. –Emma Rasiel, Professor of Economics, Data and the Global Corporate Bond Market

Duke Forge, the center for health data science at Duke, sponsored the Improving the Machine Learning Pipeline at Duke team. The team developed a tool to operationalize the application of distributed computing methodologies in the analysis of electronic medical records at Duke. As a case study, the team applied these systems to a natural language processing project on clinical narratives about growth failure in premature babies.

Team members presented their work at The Forge’s Neonatal Intensive Care Unit prediction stakeholder collective meeting on July 31, 2018. The presentation prompted a lively discussion about how this data could be used to predict growth failure and other health markers in babies.

Data+ student team members and members of the Durham Crisis Intervention Team smile in front of their poster.

This was the second year that a Data+ team partnered with the Durham Crisis Intervention Team (CIT) to investigate how mental health training for law enforcement officers affects key outcomes such as incarceration, recidivism and referral for treatment. At the last CIT collaborative meeting in August 2018, members of the sponsoring organization were so pleased with the Mental Health Interventions by the Durham Police team’s two years of data analysis that they decided to name the Data+ program their “Community Partner Agency of the Year” at this year’s end-of-year celebration banquet on December 7, 2018.

Evaluation and research are sustaining core elements of the Crisis Intervention Team. We are very fortunate to be in a position to collaborate with Duke’s Data+ Program. The team has done a phenomenal job with evaluation data from 2002 to 2017. We are looking forward to this continued relationship. –Major Elijah Bazemore, Durham Sheriff’s Department and Crisis Intervention Team, Mental Health Interventions by the Durham Police

The Big Data for Reproductive Health team, led by Amy Finnegan and Megan Huchko, sought to build a web-based application that will allow users to visualize and analyze data from the Demographic and Health Surveys (DHS) contraceptive calendar. To ground their project, they surveyed currently available tools, noting the core elements they liked and the key areas where a new tool could improve.

Using DHS data and user feedback from various stakeholders in the field, they created a website that hosts four different data visualization methods for interpreting trends in contraceptive use from the DHS contraceptive calendar. The site currently uses data from Kenya to demonstrate the tool, but additional datasets will be added soon.

Saumya Sao and Melanie Lai Wai stand in front of their poster.

Although Big Data for Reproductive Health team member Melanie Lai Wai (MS in Statistical Science ’19) had worked on coding projects in the past, teammate Saumya Sao ’20 (Gender, Sexuality & Feminist Studies and Global Health) came into the summer with very little coding experience. Sao said she enjoyed learning how to work with the programming language R, and both students are continuing their research through the related Bass Connections team this fall. The team is continuing to improve the website, working to gain a deeper understanding of machine learning and big data analytics and engaging with key stakeholders to ensure maximal usability for the tool.

The Gerrymandering and the Extent of Democracy in America team has already had a major success this year, seeing its work applied to the Common Cause v. Rucho federal court case in North Carolina. On August 27, 2018, a three-judge panel ruled that North Carolina’s redrawn districts had been unconstitutionally gerrymandered.

Data+ is currently accepting project proposals for Summer 2019!

We are especially interested in proposals that involve a partner from outside the academy, or a faculty member from a different discipline. We also encourage proposals that involve previously untested ideas or unanalyzed datasets, and we hope that the Data+ team can make a contribution with important proof-of-principle work that may lead to more substantial faculty work and/or connections in the future. We also welcome proposals that will lead to the undergraduates creating tools that might be used in the classroom or that might facilitate community engagement with data and data-driven questions.

Faculty can also submit a joint proposal for Data+ alongside a year-long Bass Connections team.

For more information or to discuss a potential proposal, please contact Paul Bendich.


Images courtesy of Ariel Dawn: Data+ Poster Session, Energy Hub Atrium, Gross Hall, August 3, 2018; Members of the Vaccine Hesitancy and Uptake team, John Madden ’19, Lucy St. Charles ’20 and Alexandra Putka ’21; Machine Learning Pipeline at Duke team member Sean Holt ’20; Members and affiliates of the Mental Health Interventions by the Durham Police Team, including Nicole Schramm-Sapyta (Duke Institute for Brain Sciences), two members of the Durham Police Department, Joyce Yoo ’20, Tatyana Bidopia ’19, Matt Rose ’21 and Simon Brauer (Ph.D. in Sociology); Members of the Big Data for Reproductive Health team, Saumya Sao ’20 and Melanie Lai Wai (MS in Statistical Science ’19)