Tackling AI Bias in Medical Imaging to Improve Diagnoses

Summary: Researchers have identified 29 distinct sources of potential bias that can affect artificial intelligence and machine learning (AI/ML) models used in medical imaging, spanning every stage from data collection to model deployment.

The report outlines the limitations of current AI/ML approaches in medical imaging and offers practical mitigation strategies to help enable fairer, more reliable clinical implementations.

Key Points:

AI and machine learning are increasingly applied to medical imaging for diagnosis, prognosis, risk assessment, and evaluating treatment response.
The study catalogs 29 potential sources of bias that can arise during the development and deployment of imaging AI/ML systems and recommends mitigation measures.
Unchecked bias can produce unequal benefits across patient groups and may deepen existing disparities in healthcare access and outcomes.

Source: SPIE

Artificial intelligence and machine learning technologies are rapidly expanding their role in medicine, especially in interpreting medical images such as X-rays, CT scans, and MRI scans. When applied correctly, these tools can assist clinicians with detection, diagnosis, prognosis, and monitoring treatment response. However, achieving trustworthy, generalizable AI/ML requires careful attention to design, training, validation, and real-world use.

In practice, building models that perform robustly across diverse patient populations and clinical settings is difficult. Like human decision-makers, AI/ML systems can reflect and amplify biases present in their data, development processes, or deployment environments. Identifying where bias can occur and implementing strategies to reduce it are essential steps toward equitable and reliable AI for medical imaging.

A collaborative, multi-institutional team affiliated with the Medical Imaging and Data Resource Center (MIDRC) — including medical physicists, AI/ML researchers, statisticians, physicians, and regulatory scientists — documented these concerns in a comprehensive review published in the Journal of Medical Imaging.

The authors mapped 29 potential sources of bias to five main stages of the AI/ML lifecycle: data collection; data preparation and annotation; model development; model evaluation; and model deployment. Many bias sources can affect more than one stage, and a single bias can cascade through the pipeline if it is not recognized and corrected.

The report also discusses mitigation strategies and points readers to bias-awareness resources and tools available on the MIDRC website.

Data collection is a frequent origin of bias. For example, datasets drawn primarily from a single hospital, a narrow geographic area, or a single type of imaging device may not represent the wider patient population. Differences in how social groups are treated by the healthcare system or selected into studies can also skew datasets. In addition, temporal bias can arise when models are trained on historical data that no longer reflect current clinical practices or population health patterns.
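As a rough illustration of how a team might check a dataset for this kind of skew, the sketch below compares each group's share of the collected cases with a reference share for the intended population. It is not taken from the review: the function name, the "site" field, the reference proportions, and the 10-percentage-point tolerance are all assumptions chosen for the example.

```python
from collections import Counter


def composition_gaps(records, key, reference, tolerance=0.10):
    """Flag groups whose dataset share differs from a reference share.

    records   -- list of dicts, one per imaging study (hypothetical schema)
    key       -- field to audit, e.g. "site" or "scanner"
    reference -- assumed target share for each group, summing to 1.0
    tolerance -- assumed absolute difference that triggers a flag
    """
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Illustrative data only: 8 studies from site A, 2 from site B, none from C.
records = [{"site": "A"}] * 8 + [{"site": "B"}] * 2
reference = {"A": 0.50, "B": 0.25, "C": 0.25}

gaps = composition_gaps(records, "site", reference)
for group, (observed, expected) in sorted(gaps.items()):
    print(f"{group}: observed {observed:.2f}, expected {expected:.2f}")
```

In this toy example, site A is overrepresented and site C is missing entirely, while site B falls within the tolerance. The same check could be repeated for scanner model, acquisition year, or demographic fields, where those are available.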

Biases in data preparation and annotation are closely linked to data collection. Labeling decisions and annotation practices can introduce systematic errors when labels reflect subjective judgments, inconsistent instructions, or annotator-specific patterns. How information is presented to annotators — including which samples they see and in what order — can also influence labels and downstream model behavior.
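One common way to quantify how consistently two annotators label the same images is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal illustration with made-up labels; the review does not prescribe this particular statistic, and the function name is an assumption.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of cases where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only (1 = finding present, 0 = finding absent).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 1]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Here the annotators agree on 6 of 8 cases, but half of that agreement would be expected by chance, so kappa is 0.50. A low kappa on a pilot batch can signal that annotation instructions need tightening before large-scale labeling begins.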

During model development, biases can be introduced through algorithmic choices, training objectives, and data engineering practices. One notable example is inherited bias: when outputs from a biased model are used to train subsequent models, the bias propagates. Underrepresentation of particular subgroups in training data, or historical and institutionalized factors reflected in the data, are other common sources of model-level bias.

Image caption: Much like humans, AI/ML models can be biased, which may result in differential treatment of medically similar cases. Credit: Neuroscience News

Evaluation is another critical point where bias can appear. Benchmarking models on biased or non-representative test sets produces misleading performance estimates. Choosing inappropriate metrics or statistical tests can further conceal disparities in model performance across subgroups.
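To see how a pooled metric can hide a subgroup disparity, consider the following sketch, which reports sensitivity and specificity separately for each subgroup. The labels, predictions, and group names are invented for illustration, and the function is a hypothetical helper rather than a method from the review.

```python
from collections import defaultdict


def subgroup_rates(y_true, y_pred, groups):
    """Sensitivity and specificity per subgroup for binary labels (1 = positive)."""
    tallies = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            outcome = "tp" if pred == 1 else "fn"
        else:
            outcome = "tn" if pred == 0 else "fp"
        tallies[group][outcome] += 1

    rates = {}
    for group, c in tallies.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        rates[group] = {
            # None marks a rate that cannot be computed for this subgroup.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
            "n": positives + negatives,
        }
    return rates


# Illustrative data only: four cases from each of two subgroups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["X", "X", "X", "X", "Y", "Y", "Y", "Y"]

rates = subgroup_rates(y_true, y_pred, groups)
for group in sorted(rates):
    print(group, rates[group])
```

Pooled across all eight cases, sensitivity is 0.75, which conceals the gap between subgroup X (1.0) and subgroup Y (0.5). With real data, subgroup estimates also need confidence intervals, since small subgroups produce noisy rates.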

Finally, deployment introduces human and system-level sources of bias. Models may be used outside their intended scope — for image types or device settings they were not validated on — or users may over-rely on automated outputs, a phenomenon known as automation bias. Workflow integration, training, and monitoring practices all influence how a model performs in real clinical environments.

Beyond identification, the team proposes mitigation strategies and best practices for each stage of the pipeline. Recommendations include building diverse, well-documented datasets; standardizing annotation protocols and audit trails; assessing models for subgroup performance; using robust statistical evaluations; and establishing post-deployment monitoring with feedback loops to catch drift or misuse.
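Post-deployment monitoring can take many forms. One simple option, sketched below, is the population stability index (PSI), which compares the distribution of model output scores in recent clinical use against a baseline period. This is an illustrative choice rather than a recommendation from the review; the bin edges and scores are assumptions, and any alert threshold would need to be set and validated locally.

```python
import math


def population_stability_index(baseline, recent, edges, eps=1e-6):
    """PSI between two samples of model scores, binned at the given edges."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")

    def bin_shares(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            # Bin index = number of edges the value meets or exceeds.
            counts[sum(value >= edge for edge in edges)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(count / len(values), eps) for count in counts]

    base = bin_shares(baseline)
    new = bin_shares(recent)
    return sum((n - b) * math.log(n / b) for b, n in zip(base, new))


# Illustrative scores only: the recent scores are shifted toward 1.0.
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
recent_scores = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]
edges = [0.25, 0.5, 0.75]

psi_same = population_stability_index(baseline_scores, baseline_scores, edges)
psi_shifted = population_stability_index(baseline_scores, recent_scores, edges)
print(f"PSI (no change): {psi_same:.3f}")
print(f"PSI (shifted):   {psi_shifted:.3f}")
```

A PSI of zero means the two score distributions match across bins, and larger values indicate a larger shift. A shift does not by itself prove the model has degraded, but it is a reasonable trigger for closer review, for example after a scanner upgrade or a change in the referred patient population.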

This analysis offers researchers, clinicians, and policymakers a practical framework to recognize, measure, and reduce bias in medical imaging AI/ML. By following these guidelines and leveraging available tools, the field can move toward fairer and more trustworthy AI systems that benefit a wider range of patients.

About this artificial intelligence and machine learning research news

Author: Daneet Steffens
Source: SPIE
Contact: Daneet Steffens – SPIE
Image: The image is credited to Neuroscience News

Original Research: Open access. “Toward fairness in artificial intelligence for medical image analysis: identification and mitigation of potential biases in the roadmap from data collection to model deployment” by K. Drukker et al., Journal of Medical Imaging.

Abstract

Toward fairness in artificial intelligence for medical image analysis: identification and mitigation of potential biases in the roadmap from data collection to model deployment

Purpose

The goal of this work is to identify and address sources of bias that threaten algorithmic fairness, reliability, and trust in medical imaging AI. As AI/ML tools move toward clinical use for detection, diagnosis, prognosis, and risk assessment, biases introduced at any stage of development can limit their benefits and may worsen inequities by producing systematically different outcomes for different groups.

Approach

A multi-institutional team of experts — including medical physicists, AI/ML researchers, bias specialists, statisticians, clinicians, and regulatory scientists — reviewed the AI/ML development pipeline, identified potential bias sources, and proposed mitigation strategies and best-practice recommendations tailored to medical imaging.

Results

The review defines five principal stages of the imaging AI/ML roadmap: (1) data collection, (2) data preparation and annotation, (3) model development, (4) model evaluation, and (5) model deployment. Within these stages the team cataloged 29 potential sources of bias, many of which can influence multiple stages, and described corresponding mitigation approaches.

Conclusions

These findings offer a practical resource for scientists, clinicians, and the public to better understand the limitations of current imaging AI/ML systems and to guide the design, evaluation, and deployment of fairer, more equitable models.