Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2018-2019)
The worldwide economic burden associated with caring for patients with schizophrenia has more than doubled, from $62.7 billion in 2002 to $155.7 billion in recent years. Direct healthcare costs (inpatient care, emergency visits, medication costs and long-term care) account for 22-24% of all healthcare expenditures.
Patients with schizophrenia are high utilizers of emergency department (ED) services because of relapse, which may be caused by psychoactive substance use, not taking medications as prescribed and/or lack of efficacy of interventions. These patients frequently need inpatient care, but insufficient resources lead to a situation in which patients are often kept in the ED until the crisis resolves, the substances dissipate from their system, interventions take effect or an inpatient bed becomes available.
This Bass Connections project aims to foster effective allocation of resources by assessing relapse risk and applying community supports and priority inpatient beds according to risk.
There are no tools in psychiatry that can reliably predict patients’ illness course. The project team will use machine learning to develop a clinical prediction tool that estimates the risk of relapse and the prognosis for every patient diagnosed with schizophrenia. Based on the predicted risk of relapse, resource allocation can be optimized to reduce relapse rates and, ultimately, the frequency of ED visits.
The 2017-18 Bass Connections team extracted clinical data on 1,350 patients with schizophrenia from Duke electronic health records (EHR) along with all provider notes. Data in these notes will add features for the team’s machine learning prediction tool, which is currently modeled on diagnosis, medications, problem lists, insurance and other demographics. Adding substance use history, episodes of aggression, suicidal and self-injurious behavior, homelessness, compliance with medications, responses to different medications and details about other treatments will enhance the tool.
Outcome measures for the current machine learning model are the frequency of outpatient and ED visits and the duration of stay for inpatient admissions. By analyzing provider notes, the team will extract other outcome measures such as being lost to follow-up, lack of compliance with medications, or death by suicide or homicide. Team members will use natural language processing (NLP) term extraction packages to extract biomedical concepts from free text, including negation. Clinically relevant terms will be compiled from SNOMED-CT. The resulting concept vectors will not contain any protected health information (PHI). Once devoid of PHI, these extracts will be analyzed to help develop the clinical prediction tool. The goal is to use the extracted language as features for machine learning.
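The concept extraction step described above can be illustrated with a minimal Python sketch. It is not the team's pipeline or any specific NLP package: the lexicon, the concept labels, the negation cue list and the four-token window are all hypothetical choices made for illustration, and a real lexicon would be compiled from SNOMED-CT. The sketch shows how a free-text note might be reduced to a vector of coded concepts (1 for affirmed, -1 for negated, 0 for not mentioned) using a simple NegEx-style heuristic.

```python
import re

# Hypothetical term-to-concept lexicon (assumption, for illustration only).
LEXICON = {
    "homeless": "HOMELESSNESS",
    "suicidal ideation": "SUICIDAL_IDEATION",
    "cocaine": "SUBSTANCE_USE",
    "aggression": "AGGRESSION",
}

# Hypothetical negation cues, checked in a short window before each term.
NEGATION_CUES = ("no", "not", "denies", "denied", "without")
WINDOW = 4  # number of preceding tokens inspected for a negation cue

# Fixed concept order so every note maps to a vector of the same length.
CONCEPTS = sorted(set(LEXICON.values()))


def extract_concepts(note):
    """Return {concept: 'present' | 'negated'} for one free-text note."""
    found = {}
    for sentence in re.split(r"[.;\n]+", note.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for term, concept in LEXICON.items():
            term_tokens = term.split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != term_tokens:
                    continue
                preceding = tokens[max(0, i - WINDOW):i]
                negated = any(cue in preceding for cue in NEGATION_CUES)
                # An affirmed mention anywhere in the note overrides a negated one.
                if not negated or concept not in found:
                    found[concept] = "negated" if negated else "present"
    return found


def concept_vector(note):
    """Encode a note as 1 (present), -1 (negated) or 0 (absent) per concept."""
    status = extract_concepts(note)
    coding = {"present": 1, "negated": -1}
    return [coding.get(status.get(concept), 0) for concept in CONCEPTS]


note = "Patient is currently homeless. Denies suicidal ideation. Positive for cocaine."
print(CONCEPTS)
print(concept_vector(note))
```

The output vector holds only concept codes and no text from the note itself, which is the property the project relies on when it describes the concept vectors as free of PHI. A window-based negation rule like this one is deliberately crude; it will miss constructions such as negation that follows the term.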
Two publications, multiple posters, pilot data for future grant applications
Summer 2018 – Fall 2018
- Summer 2018: Graduate students perform tasks on 500 patients (extraction of clinically relevant details from notes and screening to ensure lack of PHI); undergraduates analyze these extracted details to enrich the model's feature matrix
- Fall 2018: Complete next 500 patients by December 2018
Faculty/Staff Team Members
Jane Gagliardi, School of Medicine-Psychiatry and Behavioral Sciences*
Katherine Heller, Arts & Sciences-Statistical Science*
Gopalkumar Rakesh, School of Medicine-Psychiatry and Behavioral Sciences*
Jessica Tenenbaum, School of Medicine-Biostatistics and Bioinformatics*
Graduate Team Members
Qi Liu, Statistical Science - MS
Xue Zou, Comp Biology and Bioinfo-PHD
Undergraduate Team Members
Aakash Thumaty, Computer Science (BS), History (AB2)
Community Team Members
Benedetto Benedetti, Scuola Normale Superiore