
In September 2022, researchers from Stanford and Harvard published an article in the journal Nature Biomedical Engineering describing how a new self-supervised machine learning algorithm could diagnose conditions from X-ray images. The new model outperformed past fully supervised machine learning models; in fact, it diagnosed patients at a level equal to or above that of medical experts. The researchers' model, CheXzero, was able to learn faster and at a lower cost, as its self-supervised design meant it did not need to rely on the expert labeling that earlier models had been burdened by. While this could be a huge step forward in medicine, potential bias in the model's diagnoses remains a concern.
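For readers curious what "self-supervised" means in practice, the toy sketch below illustrates the general idea behind contrastive image-text pretraining, the kind of approach CheXzero is built on: each X-ray is paired with its own radiology report, and that natural pairing, rather than an expert-assigned diagnosis label, supplies the training signal. This is a minimal illustration with random numbers, not the authors' model or code.

```python
import numpy as np

# Toy illustration of contrastive image-text self-supervision (the general
# idea behind models like CheXzero; not the authors' actual implementation).
# Each X-ray embedding should score highest against its own report embedding,
# so the only "label" needed is the pairing already present in the data.

rng = np.random.default_rng(0)
n_pairs, dim = 4, 8                           # tiny batch of image/report pairs

image_emb = rng.normal(size=(n_pairs, dim))   # stand-ins for encoder outputs
report_emb = rng.normal(size=(n_pairs, dim))

# L2-normalize so dot products behave like cosine similarities
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
report_emb /= np.linalg.norm(report_emb, axis=1, keepdims=True)

logits = image_emb @ report_emb.T             # similarity of every image to every report

# Cross-entropy where the "correct class" for image i is report i:
# no radiologist labels, just the image-report pairing.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(np.diag(probs)).mean()
print(f"contrastive loss on this toy batch: {loss:.3f}")
```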
Researchers have known for several years that medical AI models can absorb biases simply through their training data, which reflects biases from people and society as a whole. A paper published in the journal Nature Medicine in late 2021 by researchers at the University of Toronto and MIT demonstrated that AI systems tend to underdiagnose historically underserved groups, such as women, racial minorities, and people with lower socioeconomic status. That is to say, people within these groups were more likely to be falsely marked as healthy by these AI models, even when they had the disease in question.
The researchers also found that people at the intersections of these underserved identities, such as Hispanic women, had an even higher underdiagnosis rate than other populations; a worked example of how such rates are computed follows below. Although the researchers could not say exactly what was causing these models to underdiagnose people in these ways, they emphasized that the ethical concerns raised by their results mean medical professionals should be very careful about using these technologies in their work, if they choose to use them at all.
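To make the metric concrete, here is a minimal, self-contained sketch of how an underdiagnosis rate can be computed per group and per intersectional subgroup: it is simply the share of truly sick patients the model marks as healthy. The data and column names here are made up for illustration and are not the studies' actual datasets or code.

```python
import pandas as pd

# Hypothetical evaluation table: one row per patient, with the model's
# prediction, the ground-truth label, and demographic attributes.
df = pd.DataFrame({
    "has_finding":  [1, 1, 0, 1, 1, 0, 1, 1],   # ground truth: 1 = disease present
    "pred_finding": [1, 0, 0, 0, 1, 1, 0, 1],   # model output: 1 = disease flagged
    "sex":  ["F", "F", "M", "F", "M", "M", "F", "M"],
    "race": ["Hispanic", "Hispanic", "White", "Black",
             "White", "Black", "Hispanic", "White"],
})

# Underdiagnosis rate = share of truly sick patients the model calls healthy
# (a false-negative rate), computed per group and per intersectional subgroup.
sick = df[df["has_finding"] == 1].copy()
sick["missed"] = (sick["pred_finding"] == 0).astype(int)

print(sick.groupby("sex")["missed"].mean())            # e.g. rate for women vs. men
print(sick.groupby(["sex", "race"])["missed"].mean())  # e.g. rate for Hispanic women
```

If the intersectional rows show higher rates than either group alone, that is the pattern the Nature Medicine study reported.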
In a June 2022 paper in the journal The Lancet Digital Health, another team of researchers, led by a professor at Emory University, found that AI models could detect a patient's race from their medical images. In their study, the team found that across several types of medical images, including X-rays, CT scans, and mammograms, most standard deep learning models could be trained to predict a patient's race.
Moreover, the models could do this without access to other details about a patient, such as BMI, and across images from throughout the body. This study expanded on findings from a June 2021 paper in Emergency Radiology by researchers at the University of Maryland and Johns Hopkins showing that advanced neural networks could automatically classify patients by race and age based on their chest X-rays.
On March 26, a new paper was published in Science Advances by a team from MIT, the University of California San Diego, and the University of Washington, showing that even CheXzero, one of the most advanced models available (if not the most advanced), is affected by these biases. The paper found that, once again, the model underdiagnosed historically marginalized groups, and underdiagnosed patients in intersectional subgroups at an even higher rate. So even though CheXzero was not trained on the kind of expert-labeled data one would expect to produce these biases, it appears to have absorbed them anyway. This once again raises ethical concerns about the use of these models to diagnose people in the real world, concerns the medical community continues to grapple with.