
Data Science For Cardiovascular and Brain Health Research (2026-2027)

Background

Cardiovascular diseases remain the leading causes of death and long-term disability worldwide. Although many risk factors, such as smoking, diabetes and high cholesterol, are well established, numerous other medical conditions, medications and environmental exposures likely influence cardiovascular outcomes but remain poorly studied. Some medications are protective while others may increase risk, and access to protective therapies is often inequitable.

Electronic health records (EHRs) offer an unprecedented opportunity to deepen our understanding of these complex relationships. EHR data include detailed information on diagnoses, medications, laboratory values and free-text clinical notes, enabling large-scale analyses that can uncover new risk factors and evaluate how treatments perform in real-world settings. By identifying which patients are at highest risk and which treatments work best, researchers can support preventive strategies and improve clinical decision-making.

Project Description

Building on the 2025–2026 project, this team will extend its work across three research threads — predictive analytics, pharmacoepidemiology and causal inference — using data from Truveta (over 100 million patients), the National Institutes of Health All of Us Research Program, the UK Biobank and the American Heart Association Get With The Guidelines registries.

Predictive analytics
Team members will explore new risk factors for cardiovascular disease, using hierarchical clustering to identify combinations of conditions that confer elevated risk. These clusters, or individual factors, will then be evaluated using propensity score matching. This work builds on a risk score model developed in the prior year and validated in external datasets.
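As a rough illustration of the matching step, the sketch below pairs each exposed patient with the unexposed patient whose propensity score is closest, within a caliper and without replacement, then compares event risk between the matched groups. It assumes propensity scores have already been estimated (for example, by logistic regression of exposure on covariates). The `Patient` record, the function names, the 0.05 caliper and the toy cohort are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    pid: str        # hypothetical patient identifier
    exposed: bool   # True if the patient has the risk factor or cluster of interest
    score: float    # precomputed propensity score, P(exposed | covariates)
    event: bool     # True if a cardiovascular event occurred during follow-up


def match_nearest(patients, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement; candidate pairs farther apart than `caliper` are dropped."""
    treated = sorted((p for p in patients if p.exposed), key=lambda p: p.score)
    controls = [p for p in patients if not p.exposed]
    pairs = []
    for t in treated:
        if not controls:
            break
        best = min(controls, key=lambda c: abs(c.score - t.score))
        if abs(best.score - t.score) <= caliper:
            pairs.append((t, best))
            controls.remove(best)  # each control is used at most once
    return pairs


def risk_difference(pairs):
    """Event risk among matched exposed minus matched unexposed patients."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no matched pairs")
    exposed_risk = sum(t.event for t, _ in pairs) / n
    control_risk = sum(c.event for _, c in pairs) / n
    return exposed_risk - control_risk


# Toy cohort (hypothetical values, for illustration only).
cohort = [
    Patient("a", True, 0.30, True),
    Patient("b", True, 0.60, True),
    Patient("c", True, 0.90, False),   # no control within the caliper: left unmatched
    Patient("d", False, 0.32, False),
    Patient("e", False, 0.58, True),
    Patient("f", False, 0.10, False),
]
pairs = match_nearest(cohort)
print([(t.pid, c.pid) for t, c in pairs])
print(risk_difference(pairs))
```

Greedy matching is simple but depends on the order in which exposed patients are processed; in an R workflow, a dedicated package such as MatchIt offers nearest-neighbor matching with a caliper as well as optimal matching, and balance diagnostics would normally be checked before estimating any effect.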

Pharmacoepidemiology
The team will study how medication combinations influence cardiovascular risk, especially among patients with existing disease. Analyses will assess whether certain medications interact synergistically or antagonistically and will apply appropriate causal inference methods to evaluate risk or protective effects.
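One common way to quantify whether two drugs act synergistically or antagonistically is to compare the risk in four groups (neither drug, drug A only, drug B only, both) on an additive scale using the relative excess risk due to interaction, RERI = RR11 − RR10 − RR01 + 1, alongside the multiplicative ratio RR11 / (RR10 × RR01). The sketch below assumes the four group risks are already available and, ideally, confounder-adjusted; the function names and the example risks are hypothetical, and the project may use other interaction measures. As written, RERI assumes both exposures increase risk; protective drugs are commonly recoded so that the lowest-risk group is the reference before it is computed.

```python
def interaction_measures(risk_none, risk_a, risk_b, risk_both):
    """Return (RERI, multiplicative interaction ratio) for two binary exposures.

    RERI = RR11 - RR10 - RR01 + 1 measures departure from additivity;
    RR11 / (RR10 * RR01) measures departure from multiplicativity.
    Each argument is the proportion of patients with an event in that group.
    """
    if risk_none <= 0:
        raise ValueError("reference-group risk must be positive")
    rr_a = risk_a / risk_none        # drug A alone vs neither
    rr_b = risk_b / risk_none        # drug B alone vs neither
    rr_both = risk_both / risk_none  # both drugs vs neither
    reri = rr_both - rr_a - rr_b + 1
    multiplicative = rr_both / (rr_a * rr_b)
    return reri, multiplicative


def describe(reri, tol=1e-9):
    """Label the additive interaction for risk-increasing exposures."""
    if reri > tol:
        return "synergistic (super-additive)"
    if reri < -tol:
        return "antagonistic (sub-additive)"
    return "additive (no interaction)"


# Hypothetical event risks: 5% on neither, 8% on A only, 7% on B only, 15% on both.
reri, mult = interaction_measures(0.05, 0.08, 0.07, 0.15)
print(round(reri, 2), round(mult, 3), describe(reri))
```

Reporting both scales is useful because a combination can look interactive on one scale and not the other; in practice each measure would come with a confidence interval from the fitted model rather than from raw proportions.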

Causal inference for cardiovascular disease management
Using advanced econometric tools — such as regression discontinuity and instrumental variables — the team will emulate trial designs and assess the real-world effectiveness of care delivery strategies and policy changes. These insights will help guide system-level improvements in cardiovascular care.
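To make the instrumental-variable idea concrete, the sketch below computes the classic Wald estimator for a binary instrument: the difference in outcomes between instrument groups divided by the difference in treatment uptake. Under the usual IV assumptions (the instrument shifts treatment, affects the outcome only through treatment, is as good as randomly assigned, and monotonicity holds), this estimates the average effect among patients whose treatment is changed by the instrument. The function name, the idea of a facility-level policy indicator as the instrument, and the toy data are hypothetical.

```python
from statistics import mean


def wald_iv(z, d, y):
    """Wald instrumental-variable estimate with a binary instrument.

    z: instrument (0/1), e.g. a hypothetical facility-level policy indicator
    d: treatment actually received (0/1)
    y: outcome (e.g. 1 if a cardiovascular event occurred, else 0)
    Estimate = (E[y | z=1] - E[y | z=0]) / (E[d | z=1] - E[d | z=0]).
    """
    if not (len(z) == len(d) == len(y)):
        raise ValueError("z, d and y must have the same length")
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0  # how much the instrument shifts treatment uptake
    if abs(first_stage) < 1e-12:
        raise ValueError("instrument does not shift treatment")
    return (y1 - y0) / first_stage


# Toy data (hypothetical): z = instrument, d = treatment received, y = event.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [0, 0, 1, 1, 0, 1, 1, 1]
print(wald_iv(z, d, y))
```

With covariates or a continuous instrument, two-stage least squares generalizes this ratio; a weak first stage (a small denominator) makes the estimate unstable, which is why first-stage strength is usually reported.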

All research threads will proceed in parallel, and team members may cross between areas as their projects evolve. In 2026–2027, the team will integrate closely with the newly funded Observational Research Building Interdisciplinary Therapeutic Advances (ORBIT) Hub, pairing each participant 1:1 with a full-time analyst to provide intensive mentoring and hands-on experience in rigorous observational research.

Anticipated Outputs

  • Four to six peer-reviewed publications with student co-authorship
  • Abstract submissions to the American Heart Association Scientific Sessions (2026 or 2027) and the International Stroke Conference (2027)
  • Grant applications, including those supporting undergraduate and graduate research training
  • Manuscripts, statistical analysis plans and curated datasets for further research

Student Opportunities

This project will include 2–4 graduate or professional students and 2–4 undergraduate students. Participants with interests in data science, epidemiology, causal inference, public health, biostatistics, computer science, economics, population health or clinical medicine are encouraged to apply.

Students will gain experience in:

  • Statistical modeling and predictive analytics
  • Causal inference approaches including regression discontinuity, instrumental variables and trial emulation
  • Working with large, multimodal datasets (EHRs, cohort studies and disease registries)
  • Coding in R, data cleaning, and constructing analysis-ready datasets
  • Manuscript development and poster/oral presentation
  • Participation in seminars, journal clubs and research days through Duke’s Department of Neurology
  • Optional clinical shadowing as relevant and permitted

Graduate students will serve as near-peer mentors, guiding analysis plans and collaborating closely with ORBIT Hub analysts to support project execution.

Timing

Summer 2026 – Summer 2027

Summer 2026 (optional):

  • Prepare abstracts for AHA Scientific Sessions 2026
  • Conduct literature review and exploratory data analysis

Fall 2026:

  • Develop statistical analysis plans
  • Begin full-scale analyses
  • Participate in research methodology seminar series

Spring 2027:

  • Complete analyses
  • Prepare early drafts of manuscripts

Summer 2027 (optional):

  • Finalize manuscripts
  • Prepare abstracts for AHA Scientific Sessions 2027

Crediting

Academic credit available for fall and spring semesters

Team Leaders

  • Fan Li, Arts & Sciences: Statistical Science, School of Medicine: Biostatistics and Bioinformatics
  • Jay Lusk, School of Medicine: Neurology
  • Brian Mac Grory, School of Medicine: Neurology, School of Medicine: Ophthalmology