Data Science for Cardiovascular and Brain Health Research (2026–2027)
Background
Cardiovascular diseases remain the leading cause of death and long-term disability worldwide. Many risk factors, such as smoking, diabetes and high cholesterol, are well established. However, numerous other medical conditions, medications and environmental exposures likely influence cardiovascular outcomes but remain poorly studied. Some medications are protective while others may increase risk, and access to these therapies is often inequitable.
Electronic health records (EHRs) offer an unprecedented opportunity to deepen our understanding of these complex relationships. EHR data include detailed information on diagnoses, medications, laboratory values and free-text clinical notes, enabling large-scale analyses that can uncover new risk factors and evaluate how treatments perform in real-world settings. By identifying which patients are at highest risk and which treatments work best, researchers can support preventive strategies and improve clinical decision-making.
Project Description
Building on the 2025–2026 project, this team will extend its work across three research threads — predictive analytics, pharmacoepidemiology and causal inference — using data from Truveta (over 100 million patients), the National Institutes of Health All of Us Research Program, the UK Biobank and the American Heart Association Get With The Guidelines registries.
Predictive analytics
Team members will explore new risk factors for cardiovascular disease. They will use hierarchical clustering to identify combinations of conditions that confer elevated risk. Candidate clusters and individual factors will then be evaluated using propensity score matching. This work builds on a risk score model that was developed in the prior year and validated using external datasets.
Pharmacoepidemiology
The team will study how medication combinations influence cardiovascular risk, especially among patients with existing disease. Analyses will assess whether certain medications interact synergistically or antagonistically and will apply appropriate causal inference methods to evaluate risk or protective effects.
Causal inference for cardiovascular disease management
Using advanced econometric tools — such as regression discontinuity and instrumental variables — the team will emulate trial designs and assess the real-world effectiveness of care delivery strategies and policy changes. These insights will help guide system-level improvements in cardiovascular care.
All research threads will proceed in parallel, and team members may move between threads as their projects evolve. In 2026–2027, the team will integrate closely with the newly funded Observational Research Building Interdisciplinary Therapeutic Advances (ORBIT) Hub. Each participant will be paired 1:1 with a full-time analyst for intensive mentoring and hands-on experience in rigorous observational research.
Anticipated Outputs
- Four to six peer-reviewed publications with student co-authorship
- Abstract submissions to the American Heart Association Scientific Sessions (2026 or 2027) and the International Stroke Conference (2027)
- Grant applications, including those supporting undergraduate and graduate research training
- Manuscripts, statistical analysis plans and curated datasets for further research
Student Opportunities
This project will include 2-4 graduate or professional students and 2-4 undergraduate students. Participants with interests in data science, epidemiology, causal inference, public health, biostatistics, computer science, economics, population health or clinical medicine are encouraged to apply.
Students will gain experience in:
- Statistical modeling and predictive analytics
- Causal inference approaches including regression discontinuity, instrumental variables and trial emulation
- Working with large, multimodal datasets (EHRs, cohort studies and disease registries)
- Coding in R, cleaning data and constructing analysis-ready datasets
- Manuscript development and poster or oral presentations
- Participation in seminars, journal clubs and research days through Duke’s Department of Neurology
- Optional clinical shadowing as relevant and permitted
Graduate students will serve as near-peer mentors, guiding analysis plans and collaborating closely with ORBIT Hub analysts to support project execution.
Timing
Summer 2026 – Summer 2027
Summer 2026 (optional):
- Prepare abstracts for AHA Scientific Sessions 2026
- Conduct literature review and exploratory data analysis
Fall 2026:
- Develop statistical analysis plans
- Begin full-scale analyses
- Participate in research methodology seminar series
Spring 2027:
- Complete analyses
- Prepare early drafts of manuscripts
Summer 2027 (optional):
- Finalize manuscripts
- Prepare abstracts for AHA Scientific Sessions 2027
Crediting
Academic credit is available for the fall and spring semesters.
See the earlier related team, Data Science to Optimize Cardiovascular and Brain Health Promotion (2025-2026).