Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2018-2019)


The worldwide economic burden associated with caring for patients with schizophrenia has more than doubled, from $62.7 billion in 2002 to $155.7 billion in recent years. Direct healthcare costs (inpatient care, emergency visits, medication costs and long-term care) account for 22-24% of all healthcare expenditures.

Patients with schizophrenia are high utilizers of emergency department (ED) services because of relapse, which may be caused by psychoactive substance use, not taking medications as prescribed and/or lack of efficacy of interventions. These patients frequently need inpatient care, but because resources are insufficient they are often kept in the ED until the crisis resolves, the substances dissipate from their system, interventions take effect or an inpatient bed becomes available.

Project Description

This Bass Connections project aims to foster effective allocation of resources by assessing relapse risk and applying community supports and priority inpatient beds according to risk.

There are no tools in psychiatry that can reliably predict patients’ illness course. The project team will use machine learning to develop a clinical prediction tool that estimates the risk of relapse and the prognosis for every patient diagnosed with schizophrenia. Based on predicted relapse risk, resources can be allocated to reduce relapse rates and, ultimately, the frequency of ED visits.

The 2017-18 Bass Connections team extracted clinical data on 1,350 patients with schizophrenia from Duke electronic health records (EHR) along with all provider notes. Data in these notes will add features to the team’s machine learning prediction tool, which is currently based on diagnosis, medications, problem lists, insurance and other demographics. Adding substance use history, episodes of aggression, suicidal and self-injurious behavior, homelessness, medication adherence, responses to different medications and details about other treatments will enhance the tool.

Outcome measures for the current machine learning model are the frequency of outpatient and ED visits and the length of inpatient stays. By analyzing provider notes, the team will extract other outcome measures such as loss to follow-up, medication nonadherence, and death by suicide or homicide. Team members will use natural language processing (NLP) term extraction packages to extract biomedical concepts from free text, including whether each concept is negated. Clinically relevant terms will be compiled from SNOMED-CT. The resulting concept vectors will not contain any protected health information (PHI), so they can serve as features for the machine learning models behind the clinical prediction tool.
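To make the idea of concept extraction with negation concrete, here is a minimal sketch in Python. It is not the team's pipeline: the project description does not name the NLP packages used, so this is a simplified, NegEx-style stand-in. The term list, the negation triggers and the five-token window are all hypothetical choices for illustration; a real system would draw its terms from SNOMED-CT and use a validated tool.

```python
import re

# Hypothetical mini-lexicon mapping phrases to concept labels (assumption;
# a real pipeline would compile clinically relevant terms from SNOMED-CT).
CONCEPTS = {
    "suicidal ideation": "suicidal_ideation",
    "homeless": "homelessness",
    "cocaine": "substance_use",
    "aggression": "aggression",
}
# Illustrative negation triggers (assumption; real tools use far larger lists).
NEGATION_TRIGGERS = ["no", "denies", "denied", "without", "negative for"]
WINDOW = 5  # tokens after a trigger that are treated as negated (assumption)


def find_phrase(tokens, phrase_tokens):
    """Yield every start index at which phrase_tokens occurs in tokens."""
    n = len(phrase_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase_tokens:
            yield i


def extract_concepts(note):
    """Return {concept: 1 if affirmed, -1 if only negated} for one note."""
    found = {}
    # Negation scope is limited to one sentence-like chunk at a time.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        negated = set()
        for trigger in NEGATION_TRIGGERS:
            trigger_tokens = trigger.split()
            for i in find_phrase(tokens, trigger_tokens):
                start = i + len(trigger_tokens)
                negated.update(range(start, start + WINDOW))
        for phrase, concept in CONCEPTS.items():
            for i in find_phrase(tokens, phrase.split()):
                value = -1 if i in negated else 1
                # An affirmed mention anywhere in the note outranks a negated one.
                found[concept] = max(found.get(concept, -1), value)
    return found


# Made-up example text, not taken from any patient record.
example_note = (
    "Patient denies suicidal ideation. "
    "Reports cocaine use last week; currently homeless."
)
example_vector = extract_concepts(example_note)
```

The output holds only concept labels and affirmed/negated flags rather than note text, which is the property the concept vectors rely on. A sketch like this does not by itself guarantee that extracts are free of PHI; the screening step described in the project plan would still be needed.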

Anticipated Outcomes

Two publications, multiple posters, pilot data for future grant applications

Student Opportunities

Undergraduates and graduate students in computer science, statistics or math are encouraged to apply. Coding, analytic and problem-solving skills are sought. In addition, students from psychology or those with a general interest in the topic are encouraged to apply to participate in a project management role.

IRB rules prevent undergraduate students from accessing unprocessed patient notes because they contain protected health information (PHI). Graduate students will therefore work with team leaders to extract clinically meaningful biomedical terms from patient notes, including contextual information such as negation, using existing natural language processing tools. Students will then work together to build a feature matrix from these extracted (non-PHI) features and apply machine learning to it.
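As a rough illustration of that last step, the sketch below turns per-patient concept dictionaries into a feature matrix and fits a small logistic regression by stochastic gradient descent. Everything here is an assumption made for illustration: the project description does not say which model the team uses, the function names are hypothetical, and the four "patients" and their labels are invented toy data, not study results.

```python
import math


def build_feature_matrix(patient_concepts):
    """Convert a list of {concept: value} dicts into (columns, rows).

    Concepts that never appear for a patient are encoded as 0.
    """
    columns = sorted({c for concepts in patient_concepts for c in concepts})
    rows = [[concepts.get(c, 0) for c in columns] for concepts in patient_concepts]
    return columns, rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, x):
    """Predicted probability of the positive label for one feature row."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict_risk(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Invented toy data: 1 = concept affirmed, -1 = concept negated.
toy_patients = [
    {"substance_use": 1, "homelessness": 1},
    {"substance_use": -1},
    {"substance_use": 1, "aggression": 1},
    {"substance_use": -1, "homelessness": -1},
]
toy_labels = [1, 0, 1, 0]  # hypothetical "relapsed within follow-up" flags

columns, rows = build_feature_matrix(toy_patients)
weights, bias = fit_logistic(rows, toy_labels)
toy_risks = [predict_risk(weights, bias, x) for x in rows]
```

With real data the team would hold out patients for validation and would likely use an established library instead of hand-written gradient descent; the point here is only the shape of the data flow from extracted concepts to a per-patient risk estimate.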


Summer 2018 – Fall 2018  

  • Summer 2018: Graduate students process notes for 500 patients (extracting clinically relevant details and screening the extracts to ensure they contain no PHI); undergraduates analyze these extracted details to enrich the feature matrix
  • Fall 2018: Complete next 500 patients by December 2018


Independent study credit available for fall semester; summer funding available

See earlier related team, Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2017-2018).

Faculty/Staff Team Members

Jane Gagliardi, School of Medicine-Psychiatry*
Katherine Heller, Arts & Sciences-Statistical Science*
Gopalkumar Rakesh, School of Medicine-Psychiatry*
Jessica Tenenbaum, School of Medicine-Biostatistics and Bioinformatics*

Community Team Members

Benedetto Benedetti, Scuola Normale Superiore

* denotes team leader


Active, New