Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2018-2019)


The worldwide economic burden associated with caring for patients with schizophrenia has more than doubled over roughly a decade, from $62.7 billion in 2002 to $155.7 billion in recent years. Direct healthcare costs (inpatient care, emergency visits, medication costs and long-term care) account for 22-24% of all healthcare expenditures.

Patients with schizophrenia are high utilizers of emergency department (ED) services because of relapse, which may be caused by psychoactive substance use, not taking medications as prescribed and/or lack of efficacy of interventions. These patients frequently need inpatient care, but insufficient resources lead to a situation in which patients are often kept in the ED until the crisis resolves, the substances dissipate from their system, interventions take effect or an inpatient bed becomes available.

The average number of patients with schizophrenia presenting to the Duke Hospital ED is 400 to 500 per year; half of these patients remain in the ED for up to two weeks awaiting placement at an inpatient psychiatric facility. Assuming a direct cost of $716/day per patient in the ED, direct care for patients with schizophrenia can be estimated to cost approximately $3.75 million per year.
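The paragraph above does not state the length-of-stay assumption behind the $3.75 million figure, so the following is only a minimal sketch of how such an estimate depends on its inputs. The function names and the example scenario (225 patients boarding for the full 14 days) are illustrative assumptions, not the team's actual calculation.

```python
# Illustrative sketch only: the scenario values below are assumptions,
# not figures taken from the project description (except $716/day).

COST_PER_DAY = 716.0  # direct ED cost per patient per day, from the text


def ed_boarding_cost(patients: int, days_per_patient: float,
                     cost_per_day: float = COST_PER_DAY) -> float:
    """Direct cost of keeping `patients` in the ED for `days_per_patient` days each."""
    return patients * days_per_patient * cost_per_day


def implied_patient_days(total_cost: float,
                         cost_per_day: float = COST_PER_DAY) -> float:
    """Number of ED patient-days that a given total cost corresponds to."""
    return total_cost / cost_per_day


if __name__ == "__main__":
    # Hypothetical upper-bound scenario: half of 450 patients, each boarding 14 days.
    print(ed_boarding_cost(patients=225, days_per_patient=14))
    # Patient-days implied by the quoted $3.75 million estimate.
    print(round(implied_patient_days(3_750_000)))
```

Working backward from the quoted total in this way makes explicit how many ED patient-days per year the estimate corresponds to, whatever mix of patients and stay lengths produced it.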

Project Description

This Bass Connections project aims to foster effective allocation of resources by assessing relapse risk and applying community supports and priority inpatient beds according to risk.

No tools in psychiatry can currently predict a patient’s illness course reliably. The project team will use machine learning to develop a clinical prediction tool that estimates the risk of relapse and the prognosis for each patient diagnosed with schizophrenia. Resources can then be allocated according to predicted relapse risk, with the aim of reducing relapse rates and, ultimately, the frequency of ED visits.

The 2017-18 Bass Connections team extracted clinical data on 1,350 patients with schizophrenia from Duke electronic health records (EHR) along with all provider notes. Data in these notes will add features to the team’s machine learning prediction tool, which is currently modeled on diagnoses, medications, problem lists, insurance and other demographics. Adding substance use history, episodes of aggression, suicidal and self-injurious behavior, homelessness, medication compliance, responses to different medications and details about other treatments will enhance the tool.

Outcome measures for the current machine learning model are the frequency of outpatient and ED visits and the length of inpatient stays. By analyzing provider notes, the team will extract additional outcome measures such as loss to follow-up, medication noncompliance and death by suicide or homicide. Team members will analyze these text notes with natural language processing (NLP) packages, compiling clinically relevant terms from SNOMED-CT and taking negations into account (for example, distinguishing “denies suicidal ideation” from a positive mention). The extracted text will then be screened for protected health information (PHI). Once free of PHI, the extracts will serve as features for the machine learning model behind the clinical prediction tool.
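As a rough illustration of negation-aware term extraction, the sketch below uses only Python's standard `re` module and a simple rule: a term counts as negated if a trigger word appears within a few words before it in the same sentence. The term list, the trigger list and the function names are hypothetical placeholders. The project description does not name the NLP packages the team uses, and a real pipeline would draw its terms from SNOMED-CT and likely use a more robust negation method.

```python
import re

# Hypothetical examples; the project would compile its terms from SNOMED-CT.
TERMS = ["suicidal ideation", "homelessness", "aggression", "substance use"]
# Hypothetical, deliberately small list of negation triggers.
NEGATION_TRIGGERS = ["no", "not", "denies", "denied", "without", "negative for"]


def split_sentences(note: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def extract_terms(note: str, terms=TERMS, window: int = 5) -> list[tuple[str, bool]]:
    """Return (term, present) pairs; present is False when the mention is negated.

    A mention is treated as negated if any trigger occurs within `window`
    words before the term in the same sentence.
    """
    results = []
    for sentence in split_sentences(note.lower()):
        for term in terms:
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if match is None:
                continue
            preceding = " ".join(sentence[:match.start()].split()[-window:])
            negated = any(
                re.search(r"\b" + re.escape(trigger) + r"\b", preceding)
                for trigger in NEGATION_TRIGGERS
            )
            results.append((term, not negated))
    return results


if __name__ == "__main__":
    example = ("Patient denies suicidal ideation. "
               "Reports ongoing substance use and homelessness.")
    for term, present in extract_terms(example):
        print(term, "present" if present else "negated")
```

A fixed word window is a crude heuristic: it misses negations that follow the term and can over-apply a trigger across clauses, which is one reason extracted output would still need manual checking.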

Anticipated Outcomes

Two publications, multiple posters, pilot data for future grant applications

Student Opportunities

Master’s or Ph.D. students in computer science, statistics or math are encouraged to apply, along with undergraduates in computer science or math. Coding, analytic and problem-solving skills are sought.

IRB rules prevent undergraduate students from accessing raw patient notes because they contain protected health information (PHI), and there is no reliable technique for removing PHI that does not risk compromising confidentiality. A graduate student will therefore work with two team leaders to extract clinically meaningful data from patient notes, which number 50 to 100 per patient on average. The data extracted from these notes will be checked and screened to ensure that no PHI is present. Three undergraduate students will then build a feature matrix model and apply machine learning to these extracted features.
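One simple way to picture the feature matrix step is a binary patient-by-term table built from the screened extracts. The sketch below assumes a hypothetical input format (a mapping from a de-identified patient key to a list of term and presence pairs). The actual layout of the team's feature matrix is not described here.

```python
def build_feature_matrix(extractions: dict[str, list[tuple[str, bool]]]):
    """Build a binary patient-by-term matrix from extracted, PHI-free terms.

    `extractions` maps a de-identified patient key (hypothetical) to a list of
    (term, present) pairs. A cell is 1 if the term was affirmed at least once
    for that patient and 0 otherwise (never mentioned or only negated).
    """
    # Fixed, sorted column order so that rows are comparable across patients.
    columns = sorted({term for pairs in extractions.values() for term, _ in pairs})
    column_index = {term: i for i, term in enumerate(columns)}

    patient_ids = sorted(extractions)
    matrix = []
    for patient_id in patient_ids:
        row = [0] * len(columns)
        for term, present in extractions[patient_id]:
            if present:
                row[column_index[term]] = 1
        matrix.append(row)
    return patient_ids, columns, matrix


if __name__ == "__main__":
    # Hypothetical extracts for two de-identified patients.
    demo = {
        "p1": [("suicidal ideation", False), ("substance use", True)],
        "p2": [("homelessness", True)],
    }
    ids, cols, rows = build_feature_matrix(demo)
    print(cols)
    for patient_id, row in zip(ids, rows):
        print(patient_id, row)
```

Collapsing "negated" and "never mentioned" into 0 is a simplifying choice made for this sketch. A model might benefit from keeping them as separate values.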

Duke undergraduates and graduate students can apply for this project team beginning on January 24. The priority deadline is February 16 at 5:00 p.m.


Summer 2018 – Fall 2018  

  • Summer 2018: Graduate students perform tasks on 500 patients (extraction of clinically relevant details from notes and screening to ensure absence of PHI); undergraduates analyze these extracted details to enrich the feature matrix model
  • Fall 2018: Complete next 500 patients by December 2018


Independent study credit available for fall semester; summer funding

See earlier related team, Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2017-2018).

Faculty/Staff Team Members

Jane Gagliardi, School of Medicine-Psychiatry*
Katherine Heller, Arts & Sciences-Statistical Science*
Gopalkumar Rakesh, School of Medicine-Psychiatry*
Jessica Tenenbaum, School of Medicine-Biostatistics and Bioinformatics*

* denotes team leader

