> The dataset they used to train the model is chest x-rays of known diseases. I'm having trouble understanding how that's relevant here.
For example, if you include no (or few enough) black women in the dataset of x-rays, the model may very well miss signs of disease in black women.
The biases and mistakes of those who created the data set leak into the model.
Early image recognition models had some very… culturally insensitive classes baked in.
I am confused. I'm not a doctor, but why would a model perform poorly at detecting diseases in X-rays across different genders and races, unless the diseases present themselves differently in X-rays for different races? Shouldn't the model not have the race and gender information to begin with? Like a model trained on detecting lesions should perform equally well on ANY X-ray unless lesions show up differently in different demographics.
You and the article are both correct. The disease does present itself differently as a function of these other characteristics, and since the training dataset doesn't contain enough samples of those different presentations, the model is unable to diagnose them effectively.
> [...] unless lesions show up differently in different demographics.
Well, first, the model looks at the entire X-ray, and lesions probably do show up differently. Maybe it's genetic/sex-based, or it's due to how lesions develop under environmental factors that are correlated with race or gender. Maybe there's also a smaller segment of white people with the same type of lesion and similarly poor detection.
> Like a model trained on detecting lesions should perform equally well on ANY X-ray unless lesions show up differently in different demographics.
This is not true in practice.
For a model to perform well looking at ANY X-ray, it would need examples of every kind of X-ray.
That includes variation along race, gender, amputee status, etc.
The point of classification models is to discover differentiating features.
We don't know those features beforehand, so we give the model as much relevant information as we can and have it discover them.
There may very well be differences between X-rays of black women and other X-rays; we don't know for sure.
We can't make that assumption when building a dataset.
Even believing that there are no possible differences between X-rays of different races is a bias that would be reflected by the dataset.
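To make that concrete, here's a minimal sketch (synthetic numbers standing in for X-ray features, nothing from the article's actual model): train a single classifier on a dataset where one group is underrepresented and its features sit in a slightly shifted region, then score each group separately.

```python
# Minimal sketch with synthetic data: one group dominates training, the
# other is rare and its features sit in a shifted region of feature space.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    X = rng.normal(loc=shift, scale=1.0, size=(n, 20))
    y = rng.integers(0, 2, size=n)
    X[:, 0] += 2.0 * y  # feature 0 carries the same "lesion" signal in both groups
    return X, y

X_a, y_a = make_group(5000, shift=0.0)  # well-represented group
X_b, y_b = make_group(50, shift=3.0)    # underrepresented group

model = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_a, X_b]), np.concatenate([y_a, y_b])
)

# Score each group on its own held-out data.
for name, shift in [("group A", 0.0), ("group B", 3.0)]:
    X_test, y_test = make_group(2000, shift)
    print(name, "accuracy:", round(model.score(X_test, y_test), 3))
```

On a typical run the well-represented group scores far above the underrepresented one, even though the "lesion" signal is constructed identically for both.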
For a start, women have a different body shape, and you can (unreliably) tell a woman from a man from an X-ray. The model can be picking up on those signs as a side effect and end up less accurate for demographics it was not trained on.
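That side effect is easy to reproduce with toy data. In this hypothetical sketch, a proxy feature (think "body shape") correlates with the diagnosis in the training cohort but not in deployment:

```python
# Hypothetical sketch of shortcut learning: a proxy feature (say, body shape)
# correlates with the diagnosis in the training data but not in deployment.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_data(n, proxy_corr):
    y = rng.integers(0, 2, size=n)
    signal = y + rng.normal(scale=1.5, size=n)  # weak genuine disease signal
    # The proxy agrees with the label with probability proxy_corr.
    proxy = np.where(rng.random(n) < proxy_corr, y, 1 - y).astype(float)
    return np.column_stack([signal, proxy]), y

X_tr, y_tr = make_data(5000, proxy_corr=0.95)  # skewed training cohort
X_te, y_te = make_data(5000, proxy_corr=0.50)  # deployment: proxy uninformative

model = LogisticRegression().fit(X_tr, y_tr)
print("accuracy on training-like data:", model.score(X_tr, y_tr))
print("accuracy in deployment:        ", model.score(X_te, y_te))
```

Because the proxy is the easier signal, the model leans on it and loses accuracy the moment the correlation breaks.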
If diseases manifest differently for different races and genders, the obvious solution is to train multiple models on separate datasets for those different groups. Not to mutter darkly about bias and discrimination.
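For what it's worth, the per-group idea is simple to sketch (hypothetical code; LogisticRegression is just a stand-in for whatever imaging model would actually be used, and the `group` labels are assumed to be available):

```python
# Hedged sketch of "one model per group"; LogisticRegression is a stand-in
# for a real imaging model, and the group labels are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_per_group(X, y, group):
    """Train a separate classifier for each subgroup label."""
    return {
        g: LogisticRegression(max_iter=1000).fit(X[group == g], y[group == g])
        for g in np.unique(group)
    }

def predict_per_group(models, X, group):
    """Route each sample to the model trained on its subgroup."""
    out = np.empty(len(X), dtype=int)
    for g, m in models.items():
        mask = group == g
        out[mask] = m.predict(X[mask])
    return out

# Tiny synthetic usage: the disease threshold differs by group.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
group = rng.integers(0, 2, size=400)
y = (X[:, 0] + group > 0.5).astype(int)

models = fit_per_group(X, y, group)
print("accuracy:", (predict_per_group(models, X, group) == y).mean())
```

The catch is that this needs enough labeled data in every group, which is exactly the shortage the rest of the thread is describing; a pooled model at least shares what it learns across groups.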
X-rays by definition don't look at skin color. Do chest x-rays of black women reveal that there's something different about their chests compared to white or Asian women? That doesn't pass my non-doctor sniff test, but someone can correct me (no sarcasm intended).
But they do look at bones and near-bone tissues, which can still vary based on ethnicity and gender. For a really brute-force example, just think about how we use the shape of the pelvis and some other bones to identify the gender of a person's skeletal remains. If you had a data set of pelvic x-rays that only included males, your data set would imply that female pelvic bones are massively malformed despite being perfectly normal for that gender.
This is the whole point of the article. Did you read it? Does the whole thing fail your sniff test?
Their results seem solid, and clear, to me.
Breast density affects the imaging you get from x-rays. It is well-known that denser breast tissue results in x-rays that are "whiter" (I'm talking about the image of the tissue, in white, on a black background, as x-rays are commonly read by radiologists). Denser breasts are associated with less effective screening for breast cancer via mammogram. A mammogram is a low-dose x-ray.
When using a chest x-ray to look for pulmonary edema, for instance, I would be unsurprised if breast tissue (of any quantity) and in particular denser breast tissue would make the diagnosis of pulmonary edema more difficult from the image alone.
Also, you seem to have conflated a few things in your second sentence. Deep in the article, they did have radiologists try to guess demographic attributes by looking at the x-ray images. They were pretty good at guessing female/male (unsurprising) and were not really able to guess age or race. So I'm super interested in how the AI model was able to be better at that than the human radiologists.
There can be differences that statistical models pick up which we humans don't.
For example, a couple of years ago there was a statistical model which could fairly accurately predict (IIRC >80%) the gender of a person based on a picture of their iris. At the time we didn't know there was a visible iris difference between genders, but a statistical model found one.
That’s kind of the whole point of statistical classification models. Feed in a ton of data and the model will discover the differentiating features.
Put another way, if we knew all the possible differences between someone with cancer and someone without, we wouldn't need statistical models at all; we could just automate the diagnosis.
We don't know the indicators that we don't know, so we can't tell whether some possible indicator shows up, or doesn't show up, in a given group of people.
That is the danger of wholly relying on statistical models.
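A toy illustration of that discovery process (synthetic data, nothing to do with irises or X-rays): fit a model, then use permutation importance to see which feature it found, without telling it up front.

```python
# Toy sketch: synthetic data where only one feature separates the classes,
# and we let the fitted model reveal which one.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 10))
# Only feature 7 carries real signal; we pretend we "don't know" this.
y = (X[:, 7] + 0.5 * rng.normal(size=3000) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# The discovered differentiating feature should stand out at the top.
for i in np.argsort(result.importances_mean)[::-1][:3]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```

The flip side is the caveat above: a differentiating feature that never appears in the training data can't be discovered at all.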
What groups have the financial means to get chest x-rays, and what groups do not? What historical events could create the circumstances where different groups have different health outcomes?
You ain't gonna like the truth, but there are differences between the races. During med school they try to say it ain't so, but once you start seeing patients there's differences in musculature, skin, all sorts. And if you have a good attending, they tactfully tell you, and you go 'was it in a study?' and nope, nobody wants to publish it. And no, I'm not talking just stuff like scabies or diabetes.
Cancer progresses differently depending on ethnicity and sex. As does treatment and likelihood of receiving treatment at early stages.
Black women experience worse outcomes and are diagnosed with more severe forms of breast cancer than white women.
Cancer is not just one disease. Its progression will vary depending on type. If the AI is trained on only some types of cancer, e.g. those traditionally found in white women in early-detection scenarios, it might not generalize to other cancer types.
So yes, to your genuine question, medical imaging of cancer can vary depending on ethnicity because different cancers can vary between genetic backgrounds. Ideally there would be sufficient training data across the populations, but there isn't because of historical race bias. (Among other reasons.)