Addressing the Challenges of Missing Data in Digital Health Studies
Project Team
Team profile by Hayoung Jeong, Leeor Hershkovich, Bill Chen, Md Mobashir Hasan Shandhi and Jessilyn Dunn
Digital biomarkers transform data from wearables into indicators of health outcomes, allowing rapid detection, prevention and management of many diseases. The Digital Biomarker Discovery Project (DBDP), initiated at Duke University’s BIG IDEAs lab, aims to set community standards in digital biomarker development by disseminating toolkits and learning resources. Despite the project’s potential, strategies for getting DBDP resources to the researchers who could use them remain limited. Our Bass Connections team set out to understand the specific needs of our users and uncover why researchers are not utilizing our resources.
Our team consisted of two subteams: the User Study team, skilled in conducting market research and structured interviews, and the Computational team, focused on coding, machine learning, and software development.
In Fall 2023, the User Study team conducted a product search to identify existing organizations similar to the DBDP and resources that support digital health studies. Team members found that few resources focus on analyzing data from commercial devices. To fill this gap, the Computational team began developing dbdPy.
Consider, for instance, a researcher who previously spent countless hours manually cleaning and organizing data to make it usable. With dbdPy, this process is dramatically simplified and automated. Researchers can now save significant time and energy while ensuring consistency and reliability in their data processing steps.
In Spring 2024, our team sought to address analytical challenges in digital health research. The User Study team conducted qualitative interviews with Ph.D. students using commercial wearables in their work. Common themes emerged around data analysis, resource utilization and the challenges of data processing.
In particular, we identified data with missing observations – arising from device non-wear, improper wear or malfunction – as a prevalent issue in observational studies. The problem is compounded by the lack of standard definitions or assessments of missing data in studies conducted with commercial wearable devices.
To address the challenge of “missingness,” our team created FOMO (Functions for Optimizing Missing Observations), a Python library that helps researchers understand and manage missing data.
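To make the idea of quantifying missingness concrete, here is a minimal sketch in plain Python. It is not FOMO's API: the function name `missingness_summary` and its parameters are hypothetical, and it assumes readings are expected on a regular grid (for example, one per minute) with recorded timestamps that fall exactly on that grid.

```python
from datetime import datetime, timedelta


def missingness_summary(timestamps, start, end, interval):
    """Summarize missing observations on a regular sampling grid.

    Hypothetical helper for illustration only (not part of FOMO).
    timestamps: iterable of datetime values at which a reading was recorded
    start, end: inclusive bounds of the expected recording window
    interval:   expected spacing between readings (a timedelta)
    """
    observed = set(timestamps)
    expected = 0
    missing = 0
    longest_gap = 0   # longest run of consecutive missing grid points
    current_gap = 0

    t = start
    while t <= end:
        expected += 1
        if t in observed:
            current_gap = 0
        else:
            missing += 1
            current_gap += 1
            longest_gap = max(longest_gap, current_gap)
        t += interval

    return {
        "expected": expected,
        "missing": missing,
        "missing_fraction": missing / expected if expected else 0.0,
        "longest_gap": longest_gap * interval,
    }


# Toy example: a 10-minute window sampled once per minute,
# with readings recorded only at minutes 0, 1, 2, 6, 7 and 9.
start = datetime(2024, 1, 1, 0, 0)
minute = timedelta(minutes=1)
recorded = [start + k * minute for k in (0, 1, 2, 6, 7, 9)]
summary = missingness_summary(recorded, start, start + 9 * minute, minute)
print(summary["missing"], summary["missing_fraction"], summary["longest_gap"])
# prints: 4 0.4 0:03:00
```

Reporting the longest gap alongside the overall fraction is a deliberate choice in this sketch: a few long stretches of non-wear and many scattered dropouts can produce the same missing fraction while calling for different handling.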
Further, our subsequent analysis revealed that certain patterns in missing data are associated with demographic characteristics. This implies that missingness can potentially lead to biased algorithms or conclusions if addressed inappropriately. With FOMO, we aim to standardize methods for handling missing data.
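One simple way to check whether missingness differs across demographic groups is to average each participant's missing fraction within each group. The sketch below is illustrative only and is not the analysis our team ran: the function name `missingness_by_group` is hypothetical, and the group labels and values are made-up toy inputs, not study results.

```python
from collections import defaultdict
from statistics import mean


def missingness_by_group(records):
    """Average per-participant missing fraction within each group.

    Hypothetical helper for illustration only.
    records: iterable of (group_label, missing_fraction) pairs,
             one pair per participant.
    """
    by_group = defaultdict(list)
    for group, fraction in records:
        by_group[group].append(fraction)
    return {group: mean(values) for group, values in by_group.items()}


# Toy inputs (made up): two groups with two participants each.
toy_records = [
    ("group_a", 0.25),
    ("group_a", 0.75),
    ("group_b", 0.125),
    ("group_b", 0.375),
]
for group, avg in missingness_by_group(toy_records).items():
    print(group, avg)
```

A gap between group averages is only a starting point. Whether it reflects a real association would still need a proper statistical test and attention to confounders such as device type or study duration.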
The year-long project with Bass Connections taught us the importance of data-driven communication, understanding available resources and considering the sociological and scientific impact of our work. The collaboration between our subteams has been especially fruitful, combining diverse skills to innovate and create new solutions. Exciting updates are on the horizon, so please check out our website!