Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2018-2019)


The worldwide economic burden associated with caring for patients with schizophrenia has more than doubled, from $62.7 billion in 2002 to $155.7 billion in recent years. Direct healthcare costs (inpatient care, emergency visits, medication costs and long-term care) account for 22-24% of all healthcare expenditures.

Patients with schizophrenia are high utilizers of emergency department (ED) services because of relapse, which may be caused by psychoactive substance use, not taking medications as prescribed and/or lack of efficacy of interventions. These patients frequently need inpatient care, but insufficient resources lead to a situation in which patients are often kept in the ED until the crisis resolves, the substances dissipate from their system, interventions take effect or an inpatient bed becomes available.

Project Description

This Bass Connections project aims to foster effective allocation of resources by assessing relapse risk and applying community supports and priority inpatient beds according to risk.

There are no tools in psychiatry that can reliably predict patients’ illness course. The project team will use machine learning to develop a clinical prediction tool that estimates the risk of relapse and the prognosis for every patient diagnosed with schizophrenia. Based on predicted risk of relapse, resource allocation can be optimized to reduce relapse rates and, ultimately, the frequency of ED visits.

The 2017-18 Bass Connections team extracted clinical data on 1,350 patients with schizophrenia from Duke electronic health records (EHR) along with all provider notes. Data in these notes will add features to the team’s machine learning prediction tool, which is currently modeled on diagnoses, medications, problem lists, insurance and other demographics. Adding substance use history, episodes of aggression, suicidal and self-injurious behavior, homelessness, medication compliance, responses to different medications and details about other treatments will enhance the tool.
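As an illustration of how structured EHR fields such as medications and insurance could become model features, the following is a minimal sketch of one-hot encoding in plain Python. It is not the team's actual pipeline: the function name build_feature_matrix, the field names and the two synthetic records are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_feature_matrix(
    records: List[Dict[str, List[str]]],
) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode categorical values per patient into a binary matrix."""
    # Every distinct "field=value" pair seen across patients becomes a column.
    columns = sorted(
        {
            f"{field}={value}"
            for rec in records
            for field, values in rec.items()
            for value in values
        }
    )
    index = {name: i for i, name in enumerate(columns)}

    matrix = []
    for rec in records:
        row = [0] * len(columns)
        for field, values in rec.items():
            for value in values:
                row[index[f"{field}={value}"]] = 1
        matrix.append(row)
    return columns, matrix


# Hypothetical, synthetic records (not real patient data).
records = [
    {"medication": ["clozapine"], "insurance": ["medicaid"]},
    {"medication": ["risperidone", "lorazepam"], "insurance": ["private"]},
]
columns, matrix = build_feature_matrix(records)
```

Each row corresponds to one patient and each column to one observed field value, so features later derived from provider notes could be appended as additional columns.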

Outcome measures for the current machine learning model are the frequency of outpatient and ED visits and the length of inpatient stays. By analyzing provider notes, the team will extract additional outcome measures such as being lost to follow-up, lack of compliance with medications, or death by suicide or homicide. Team members will use natural language processing (NLP) term extraction packages to extract biomedical concepts from free text, including negation. Clinically relevant terms will be compiled from SNOMED-CT. The resulting concept vectors will not contain any protected health information (PHI). Once devoid of PHI, these extracts will serve as features for machine learning and help develop the clinical prediction tool.
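To make the idea of negation-aware concept extraction concrete, here is a minimal sketch using only the Python standard library. It does not reproduce the NLP packages the team used: the lexicon, the concept labels, the negation cues and the five-token window are all assumptions chosen for illustration, and a real lexicon would be compiled from SNOMED-CT.

```python
import re
from typing import Dict

# Hypothetical term-to-concept lexicon (illustrative labels, not SNOMED-CT codes).
LEXICON = {
    "suicidal ideation": "SUICIDAL_IDEATION",
    "homeless": "HOMELESSNESS",
    "cocaine use": "COCAINE_USE",
}
# Simplified negation cues (assumption); real rule sets are much larger.
NEGATION_CUES = ("denies", "no", "without", "negative for")
WINDOW = 5  # number of preceding tokens searched for a negation cue


def extract_concepts(note: str) -> Dict[str, int]:
    """Map a note to {concept: 1 (affirmed) or -1 (negated)}.

    Concepts that are never mentioned are omitted from the result.
    """
    vector: Dict[str, int] = {}
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for term, concept in LEXICON.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if not match:
                continue
            # Look for a negation cue in the few tokens before the term.
            preceding = sentence[: match.start()].split()[-WINDOW:]
            context = " " + " ".join(preceding) + " "
            negated = any(f" {cue} " in context for cue in NEGATION_CUES)
            # An affirmed mention anywhere in the note overrides a negated one.
            if vector.get(concept) != 1:
                vector[concept] = -1 if negated else 1
    return vector


# Synthetic example note (not real patient text).
note = "Patient denies suicidal ideation. Currently homeless; reports cocaine use."
vector = extract_concepts(note)
```

In this sketch the output holds only concept labels and affirmed/negated flags rather than the note text itself, which is the property that lets such vectors be screened for PHI before analysis.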

Anticipated Outcomes

Two publications, multiple posters, pilot data for future grant applications


Summer 2018 – Fall 2018  

  • Summer 2018: Graduate students perform tasks on 500 patients (extraction of clinically relevant details from notes and screening to ensure lack of PHI); undergraduates analyze these extracted details to enrich the feature matrix model
  • Fall 2018: Complete next 500 patients by December 2018

Team Outcomes to Date

Machine Learning on Electronic Health Record Data in Schizophrenia (poster by Kamyar Yazdani, Abhi Jadhav, Sanya Kochhar, Aakash Thumaty, Pranav Warman, Sam Lusk, Xue Zou, Dylan Qi Liu, Myung Woo, Colette Blach, Jane P. Gagliardi, Jessica D. Tenenbaum, presented at Bass Connections Showcase, Duke University, April 17, 2019)

Machine Learning on Structured EHR Data for Prediction in Schizophrenia: Feature Engineering and Pipeline Construction (poster by Kamyar Yazdani, Abhi Jadhav, Aakash Thumaty, Sanya Kochhar, Pranav Warman, Jane P. Gagliardi, Jessica Tenenbaum) presented at the 2019 Duke Research Computing Symposium, Durham, NC, January 16, 2019

Preliminary Findings in Natural Language Processing to Stratify Patients with Mental Illness (poster by Myung Woo, Dylan Liu, Stephen Evans, Jane P. Gagliardi, Jessica Tenenbaum) presented at the 2019 Duke Research Computing Symposium, Durham, NC, January 16, 2019

See earlier related team, Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2017-2018).

Team Leaders

  • Jane Gagliardi, School of Medicine-Psychiatry and Behavioral Sciences
  • Gopalkumar Rakesh, School of Medicine-Psychiatry and Behavioral Sciences
  • Jessica Tenenbaum, School of Medicine-Biostatistics and Bioinformatics

Graduate Team Members

  • Qi Liu, Statistical Science - MS
  • Casey Riffel, Masters of Public Policy
  • Allison Young, Interdisciplinary Data Science - Masters
  • Xue Zou, Comp Biology and Bioinfo-PHD

Undergraduate Team Members

  • Abhishek Jadhav, Biomedical Engineering (BSE), Computer Science (BS2)
  • Sanya Kochhar, Computer Science (BS)
  • Aakash Thumaty, Computer Science (BS), History (AB2)
  • Pranav Warman, Computer Science (BS), Biology (BS2)
  • Kamyar Yazdani, Biology (BS), Computer Science (AB2)

Faculty/Staff Team Members

  • Myung Woo, School of Medicine-Medicine: General Internal Medicine