A Reality Check – When Lab-Trained AI Meets the Real World, “Mistakes Can Happen”

A study reveals AI’s struggle with tissue contamination in medical diagnostics, a problem easily managed by human pathologists, underscoring the importance of human expertise in healthcare despite advancements in AI technology.

Contamination of tissue samples can mislead AI models, preventing them from making accurate diagnoses in real-world situations.

Human pathologists undergo rigorous training to identify instances where tissue samples from one patient are accidentally placed on microscope slides meant for another patient, a mistake referred to as tissue contamination. However, this type of contamination poses a significant challenge for artificial intelligence (AI) models, which are typically developed in clean, controlled settings, according to a recent study by Northwestern Medicine.

“We train AIs to tell ‘A’ versus ‘B’ in a very clean, artificial environment, but, in real life, the AI will see a variety of materials that it hasn’t trained on. When it does, mistakes can happen,” said corresponding author Dr. Jeffery Goldstein, director of perinatal pathology and an assistant professor of perinatal pathology and autopsy at Northwestern University Feinberg School of Medicine.

“Our findings serve as a reminder that AI that works incredibly well in the lab may fall on its face in the real world. Patients should continue to expect that a human expert is the final decider on diagnoses made on biopsies and other tissue samples. Pathologists fear — and AI companies hope — that the computers are coming for our jobs. Not yet.”

In the new study, scientists trained three AI models to scan microscope slides of placenta tissue to (1) detect blood vessel damage; (2) estimate gestational age; and (3) classify macroscopic lesions. They trained a fourth AI model to detect prostate cancer in tissues collected from needle biopsies. When the models were ready, the scientists exposed each one to small portions of contaminant tissue (e.g., bladder or blood) that were randomly sampled from other slides. Finally, they tested the AIs’ reactions.

Each of the four AI models paid too much attention to the tissue contamination, which resulted in errors when diagnosing or detecting vessel damage, gestational age, lesions, and prostate cancer, the study found.

The findings were recently published in the journal Modern Pathology. The study is the first to examine how tissue contamination affects machine-learning models.

‘For a human, we’d call it a distraction, like a bright, shiny object’

Tissue contamination is a well-known problem for pathologists, but it often comes as a surprise to non-pathologist researchers or doctors, the study points out. A pathologist examining 80 to 100 slides per day can expect to see two to three with contaminants, but they’ve been trained to ignore them.

When humans examine tissue on slides, they can only look at a limited field within the microscope, then move to a new field, and so on. After examining the entire sample, they combine all the information they’ve gathered to make a diagnosis. An AI model works in much the same way, but the study found that AI was easily misled by contaminants.

“The AI model has to decide which pieces to pay attention to and which ones not to, and that’s zero-sum,” Goldstein said. “If it’s paying attention to tissue contaminants, then it’s paying less attention to the tissue from the patient that is being examined. For a human, we’d call it a distraction, like a bright, shiny object.”
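The zero-sum trade-off Goldstein describes can be illustrated with a toy calculation. The sketch below is not the study’s code. It assumes the model pools patches with softmax attention weights, a common design that this article does not confirm, and every score in it is a hypothetical number chosen for illustration.

```python
import math

def softmax(scores):
    """Convert raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores for four patches of the patient's own tissue.
patient_scores = [1.0, 1.2, 0.8, 1.1]
# Hypothetical score for one contaminant patch the model finds "shiny".
contaminant_score = 3.0

clean_weights = softmax(patient_scores)
mixed_weights = softmax(patient_scores + [contaminant_score])

# Share of total attention that lands on patient tissue in each case.
patient_share_clean = sum(clean_weights)
patient_share_mixed = sum(mixed_weights[:len(patient_scores)])
contaminant_share = mixed_weights[-1]

print(f"patient tissue share without contaminant: {patient_share_clean:.2f}")
print(f"patient tissue share with contaminant:    {patient_share_mixed:.2f}")
print(f"contaminant share:                        {contaminant_share:.2f}")
```

Because the weights must always sum to 1, any attention the contaminant patch receives is taken directly from the patient’s tissue.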

The AI models gave a high level of attention to contaminants, indicating an inability to encode biological impurities. Practitioners should work to quantify and improve upon this problem, the study authors said.

AI researchers in pathology have previously studied other kinds of image artifacts, such as blurriness, debris on the slide, folds, or bubbles, but this is the first study to examine tissue contamination.

‘Confident that AI for placenta is doable’

Perinatal pathologists, such as Goldstein, are incredibly rare. In fact, there are only 50 to 100 in the entire U.S., mostly located in big academic centers, Goldstein said. This means only 5% of placentas in the U.S. are examined by human experts. Worldwide, that number is even lower. Embedding this type of expertise into AI models can help pathologists across the country do their jobs better and faster, Goldstein said.

“I’m actually very excited about how well we were able to build the models and how well they performed before we deliberately broke them for the study,” Goldstein said. “Our results make me confident that AI evaluations of placenta are doable. We ran into a real-world problem, but hitting that speedbump means we’re on the road to better integrating the use of machine learning in pathology.”

Reference: “Tissue contamination challenges the credibility of machine learning models in real world digital pathology” by Ismail Irmakci, Ramin Nateghi, Rujoi Zhou, Mariavittoria Vescovo, Madeline Saft, Ashley E. Ross, Ximing J. Yang, Lee A.D. Cooper and Jeffery A. Goldstein, 6 January 2024, Modern Pathology.
DOI: 10.1016/j.modpat.2024.100422

The study was funded by the National Institute of Biomedical Imaging and Bioengineering, the National Center for Advancing Translational Sciences (NCATS), the Walder Foundation Fund to Retain Clinician Scientists, and the Department of Health and Human Services.