The study evaluated how the introduction of predictive models affects the subsequent performance of both the existing models and others. Pietro Jeng/Pexels

A groundbreaking study by researchers from the Icahn School of Medicine at Mount Sinai and the University of Michigan has put the impact of deploying predictive models in healthcare under scrutiny.

The research findings, detailed in the October 9th online issue of Annals of Internal Medicine, reveal that machine learning models, which have shown remarkable promise in healthcare, can sometimes become victims of their own success. The study assessed how the use of these models to influence clinical decisions can alter the very assumptions they were trained on, often with unintended consequences.

Lead author Akhil Vaid, M.D., Clinical Instructor of Data-Driven and Digital Medicine (D3M) at Icahn Mount Sinai, explains: "We wanted to explore what happens when a machine learning model is deployed in a hospital and allowed to influence physician decisions for the overall benefit of patients. For example, we sought to understand the broader consequences when a patient is spared from adverse outcomes like kidney damage or mortality."

"AI models possess the capacity to learn and establish correlations between incoming patient data and corresponding outcomes, but use of these models, by definition, can alter these relationships. Problems arise when these altered relationships are captured back into medical records."

The study conducted extensive simulations of critical care scenarios at two major healthcare institutions: the Mount Sinai Health System in New York and Beth Israel Deaconess Medical Center in Boston. The analysis involved 130,000 critical care admissions and focused on three key scenarios:

  1. Model Retraining After Initial Use: Current practice suggests retraining machine learning models to address performance degradation over time. While retraining can initially improve performance by adapting to changing conditions, the study found that it can paradoxically lead to further degradation. This occurs because retraining on records already shaped by the model's use disrupts the learned relationships between patient presentation and outcome, which were the foundation of the model's effectiveness.
  2. Creating a New Model After One Has Already Been in Use: Using a machine learning model to predict an outcome such as sepsis can spare patients from adverse events. In doing so, however, the model also works to prevent downstream adverse outcomes, including death. Any new model developed to predict death will therefore inherit the same disrupted relationships. Because the exact relationships between the various outcomes are uncertain, it is inappropriate to train new models on data from patients whose care was influenced by machine learning models.
  3. Concurrent Use of Two Predictive Models: When two models make simultaneous predictions, relying on one set of predictions renders the other obsolete. This means that predictions should be based on freshly gathered data, which can be costly or impractical. The concurrent use of models further complicates the already complex landscape of healthcare decision-making.
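
The first scenario can be illustrated with a toy simulation. The sketch below is not the study's code or data: the single risk feature, the binned event-rate "model", the 0.5 alert threshold, and the assumption that an intervention cuts a flagged patient's event probability to 20% are all invented for illustration. It trains a model on historical records, deploys it, retrains on the records it influenced, and compares how well each version ranks fresh patients by their risk without intervention.

```python
import random

N_BINS = 10        # assumed: the "model" is just an event rate per risk bin
THRESHOLD = 0.5    # assumed alert threshold
EFFECT = 0.2       # assumed: intervention cuts event probability to 20%


def bin_of(x):
    return min(int(x * N_BINS), N_BINS - 1)


def fit(rows):
    """Estimate the recorded event rate in each bin of the risk feature."""
    events, counts = [0] * N_BINS, [0] * N_BINS
    for x, recorded, _ in rows:
        events[bin_of(x)] += recorded
        counts[bin_of(x)] += 1
    return [e / c if c else 0.0 for e, c in zip(events, counts)]


def predict(model, x):
    return model[bin_of(x)]


def simulate(n, rng, model=None):
    """Return (x, recorded outcome, outcome without intervention) per patient."""
    rows = []
    for _ in range(n):
        x, u = rng.random(), rng.random()
        untreated = int(u < x)  # true risk equals x when nobody intervenes
        flagged = model is not None and predict(model, x) >= THRESHOLD
        recorded = int(u < x * EFFECT) if flagged else untreated
        rows.append((x, recorded, untreated))
    return rows


def auc(scores, labels):
    """Chance a random positive outscores a random negative (ties count half)."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    wins, neg_seen, i = 0.0, 0, 0
    while i < len(pairs):
        j, pos_tied, neg_tied = i, 0, 0
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            pos_tied += pairs[j][1]
            neg_tied += 1 - pairs[j][1]
            j += 1
        wins += pos_tied * (neg_seen + 0.5 * neg_tied)
        neg_seen += neg_tied
        i = j
    return wins / (n_pos * n_neg)


rng = random.Random(0)
model_v1 = fit(simulate(20000, rng))                  # trained before deployment
model_v2 = fit(simulate(20000, rng, model=model_v1))  # retrained on influenced records

test = simulate(20000, rng)                           # fresh, uninfluenced patients
labels = [untreated for _, _, untreated in test]
auc_v1 = auc([predict(model_v1, x) for x, _, _ in test], labels)
auc_v2 = auc([predict(model_v2, x) for x, _, _ in test], labels)
print(f"original model AUC:  {auc_v1:.2f}")
print(f"retrained model AUC: {auc_v2:.2f}")
```

Under these assumptions the retrained model ranks fresh patients noticeably worse than the original, because the high-risk patients it flagged look low-risk in the records it was retrained on.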

Co-senior author Karandeep Singh, M.D., Associate Professor of Learning Health Sciences, Internal Medicine, Urology and Information at the University of Michigan, explains: "Our findings reinforce the complexities and challenges of maintaining predictive model performance in active clinical use."

"Model performance can fall dramatically if patient populations change in their makeup. However, agreed-upon corrective measures may fall apart completely if we do not pay attention to what the models are doing — or more properly, what they are learning from."

Co-senior author Girish Nadkarni, M.D., MPH, Irene and Dr. Arthur M. Fishberg Professor of Medicine at Icahn Mount Sinai, Director of The Charles Bronfman Institute for Personalized Medicine and System Chief of Data-Driven and Digital Medicine, emphasises the need for a thoughtful and measured approach to using machine learning models in healthcare.

He states: "We should not view predictive models as unreliable. Instead, it's about recognising that these tools require regular maintenance, understanding and contextualisation. Neglecting their performance and impact monitoring can undermine their effectiveness."

"We must use predictive models thoughtfully, just like any other medical tool. Learning health systems must pay heed to the fact that indiscriminate use of, and updates to, such models will cause false alarms, unnecessary testing and increased costs."

Dr. Vaid suggests practical steps to mitigate the unintended consequences of machine learning models in healthcare. "We recommend that health systems promptly implement a system to track individuals impacted by machine learning predictions, and that the relevant governmental agencies issue guidelines."
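
One way such tracking might look in practice is sketched below. Everything in it is hypothetical: the `PredictionEvent` record, its field names, the model names and the encounter IDs are invented for illustration, and it assumes that only predictions actually shown to a clinician can influence care. The idea is simply to log each prediction against the encounter it touched, so that influenced records can be identified before any model is trained or retrained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionEvent:
    """One model prediction made during an encounter (hypothetical schema)."""
    encounter_id: str
    model_name: str
    risk_score: float
    alert_shown: bool  # True if the prediction was surfaced to a clinician


def influenced_encounters(events):
    """IDs of encounters where any model's prediction reached a clinician."""
    return {e.encounter_id for e in events if e.alert_shown}


def split_for_training(encounter_ids, events):
    """Separate encounters untouched by model output from influenced ones."""
    influenced = influenced_encounters(events)
    clean = [i for i in encounter_ids if i not in influenced]
    flagged = [i for i in encounter_ids if i in influenced]
    return clean, flagged


events = [
    PredictionEvent("enc-001", "sepsis-v1", 0.82, alert_shown=True),
    PredictionEvent("enc-002", "sepsis-v1", 0.10, alert_shown=False),
    PredictionEvent("enc-003", "aki-v2", 0.67, alert_shown=True),
]
clean, flagged = split_for_training(
    ["enc-001", "enc-002", "enc-003", "enc-004"], events
)
print(clean)    # encounters with no surfaced prediction
print(flagged)  # encounters whose care may have been influenced
```

Keeping the log per encounter rather than per model means a record influenced by one model is also flagged when a different model is trained later.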

"These findings are equally applicable outside of healthcare settings and extend to predictive models in general. As such, we live in a model-eat-model world where any naively deployed model can disrupt the function of current and future models and eventually render itself useless."