Using Machine Learning to Generate Clinical Prediction Rules for Clinical Outcomes in Schizophrenia (2017-2018)


Schizophrenia is a mental illness that affects 1.1% of the U.S. population. The disease is characterized by a global deterioration in functioning and by delusions, hallucinations and cognitive deficits.

The burden of care, in terms of both caregiver stress and economic cost, is high. Patients are usually started on medications called antipsychotics for symptom control and will need lifelong treatment in most cases. Although medications are not 100% effective, compliance with psychosocial interventions and medications is an important moderator of illness course and prognosis. When patients stop taking their medications, they have a higher risk of relapse, which leads to care in the emergency department (ED) or inpatient unit.

At Duke, nearly 500 patients with schizophrenia visit the ED in a year and stay for an average of one to two weeks before they can get an inpatient bed. In addition to occupying space in the ED, patients with schizophrenia cost the system money. They are frequently under-resourced, uninsured or under-insured.

The ability to prospectively and accurately predict high risk of relapse could facilitate the allocation of scarce resources to patients most likely to benefit, with an end result of decreasing the length of stay in the ED and, ultimately, decreasing the need to refer patients for inpatient care. 

Project Description

This Bass Connections project will tackle two problems associated with schizophrenia: the high frequency of relapse and the high economic and health system burden.

The current absence of a clinical prediction tool makes it difficult for a clinical provider to know prospectively which patients would benefit from more intensive resources, including community support or clozapine. Such a tool would be of greatest relevance to large-scale providers such as the Department of Defense and accountable care organizations, as it would help them decrease the economic burden of mental healthcare and allocate resources appropriately.

Therefore, to lay the groundwork for a clinical prediction tool for use in inpatient and outpatient settings, this project team will apply machine learning to the Duke clinical data set, which contains clinical and demographic details of patients with schizophrenia. The goal is to pinpoint the optimal clinical and demographic predictor variables.

Ultimately, taking this work forward beyond the 2017-2018 Bass Connections team, researchers will develop a software interface wherein input of a few patient-specific demographic, illness and comorbidity variables would result in a score having prognostic implications. The prediction score could be utilized to create algorithms to facilitate appropriate advocacy for resource allocation to patients based on risk of relapse.
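To make the idea of such a score concrete, the following is a minimal sketch, not the project's actual tool. It assumes a logistic model over four hypothetical input variables; the variable names, weights, intercept and threshold are all invented for illustration and are not derived from any data. In practice the predictors and their weights would come from the machine learning analysis described above.

```python
import math
from typing import Mapping

# Hypothetical predictor weights, for illustration only.
# They are NOT estimated from Duke data or any other data set.
ILLUSTRATIVE_WEIGHTS = {
    "prior_ed_visits_past_year": 0.40,   # count of ED visits
    "medication_nonadherence": 1.10,     # 1 if nonadherent, else 0
    "substance_use_comorbidity": 0.70,   # 1 if present, else 0
    "has_community_support": -0.80,      # 1 if present, else 0
}
ILLUSTRATIVE_INTERCEPT = -2.0


def relapse_risk_score(patient: Mapping[str, float]) -> float:
    """Map patient variables to a score in (0, 1) with a logistic function."""
    missing = set(ILLUSTRATIVE_WEIGHTS) - set(patient)
    if missing:
        raise ValueError(f"missing variables: {sorted(missing)}")
    # Weighted sum of the inputs plus the intercept.
    z = ILLUSTRATIVE_INTERCEPT + sum(
        weight * patient[name] for name, weight in ILLUSTRATIVE_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def risk_tier(score: float, threshold: float = 0.5) -> str:
    """Label a score as 'higher' or 'lower' risk (threshold is an assumption)."""
    return "higher" if score >= threshold else "lower"


if __name__ == "__main__":
    example_patient = {
        "prior_ed_visits_past_year": 3,
        "medication_nonadherence": 1,
        "substance_use_comorbidity": 1,
        "has_community_support": 0,
    }
    score = relapse_risk_score(example_patient)
    print(f"score={score:.2f}, tier={risk_tier(score)}")
```

A logistic form is used here only because it turns a weighted sum into a value between 0 and 1 that is easy to threshold; the actual model could take a different form.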

Anticipated Outcomes

Extraction of data sets and application of machine learning to pinpoint optimum predictor variables; poster and platform presentations at local and national meetings; submission to nationally distributed journals

Student Opportunities

Student team members will contribute to manuscript writing, extract data from the clinical database—Epic (Maestro)—and apply machine learning to extracted data sets.

Work done by undergraduates will be supervised by Ph.D. scholars and clinicians. Collaboration and in-person discussions are also vital. There will be extensive discussions between clinicians and statisticians to choose optimum predictor variables.

Through this collaboration, clinicians and clinical learners will gain a greater understanding of statistical methodology and the limitations of data sets, while statistical and mathematical experts and learners will gain a greater appreciation for the symptoms involved in schizophrenia and, possibly, the unnecessary additional burden of stigma. We plan to meet weekly for the first 10 meetings and then move to monthly meetings.

Team progress and performance will be evaluated by the number of patients whose data has been extracted from the clinical data set and the number of patients for whom machine learning has been applied.

The team will consist of five undergraduate students working part time during the summer and two master’s students or Ph.D. scholars working full time over the course of the year after the summer. Two clinicians will supervise the team.


Summer 2017 – Spring 2018

  • Summer 2017 (June 19 – August 13): Extraction of clinical data from Epic (Maestro); groundwork for applying machine learning to the extracted data
  • Fall 2017: Estimation of optimum predictors, discussions between students and clinicians about variables
  • Spring 2018: Drafting of posters, manuscripts and other abstracts for presentation/submission


Independent study credit available for fall and spring semesters; summer funding

Faculty/Staff Team Members

Jane Gagliardi, School of Medicine - Psychiatry
Katherine Heller, Trinity - Statistical Science*
Gopalkumar Rakesh, School of Medicine - Psychiatry*

Graduate Team Members

Joseph Futoma, Graduate School - PhD in Statistics*

* denotes team leader