"AIs want the future to be like the past, and AIs make the future like the past. If the training data is full of human bias, then the predictions will also be full of human bias, and then the outcomes will be full of human bias, and when those outcomes are copraphagically fed back into the training data, you get new, highly concentrated human/machine bias.”
https://pluralistic.net/2025/03/18/asbestos-in-the-walls/#go...
The dataset they used to train the model consists of chest x-rays of known diseases. I'm having trouble understanding how that's relevant here. The key takeaway is that you can't treat all humans as a single group in this context, and variations in the biology across different groups of people may need to be taken into account within the training process. In other words, the model will need to be trained on this racial/gender data too in order to get better results when predicting the targeted diseases within these groups.
I think it's interesting to think about instead attaching genetic information rather than group data, which would be blind to human bias and the messiness of our rough categorizations of subgroups.
One of the things that people I know in the medical field have mentioned is that there's racial and gender bias that goes through all levels and has a sort of feedback loop. A lot of medical knowledge is gained empirically, and historically that has meant that minorities and women tended to be underrepresented in western medical literature. That leads to new medical practitioners being less exposed to presentations of various ailments that may have variance due to gender or ethnicity. Basically, if most data is gathered from those who have the most access to medicine, there will be an inherent bias towards how various ailments present in those populations. So your base data set might be skewed from the very beginning.
(This is mostly just to offer some food for thought, I haven't read the article in full so I don't want to comment on it specifically.)
>women tended to be underrepresented in western medical literature.
Is there some evidence of this? It's hard for me to picture that women receive less medical attention than men: completely inconsistent with my culture and every doctor's office I've ever been to. It's more believable (still not very) that they disproportionately avoid studies.
There is indeed a lot of evidence of this but you've got the direction backwards- it's not that women avoid studies, it's that for a long time studies specifically excluded women. Ditto for people of different races. This is why these days (well, as of today, at least) the NIH has a whole set of very well-established policies around inclusion in clinical trials that include sex, race, and age: https://grants.nih.gov/policy-and-compliance/policy-topics/i...
And this isn't for "DEI" reasons, it's literally because for decades there used to be drug trials that excluded women and as a result ended up releasing drugs that gave half the population weird side effects that didn't get caught during the trials, or just plain didn't work as well on one group or another in ways that were really hard to debug once the drug was on the market. That was legit bad science, and the medical research world has worked very hard over the last thirty years to do better. We are admittedly not there yet, but things are a lot better than they used to be.
For a really interesting take on the history of racial exclusion and bias in medicine, I recommend Uché Blackstock's recent book "Legacy: A Black Physician Reckons With Racism In Medicine" which gave a great overview.
Oh! And also everybody should read Abby Norman's "Ask Me About My Uterus," it gives a fabulous history of issues around women's health.
Also, lots of medical studies have been done on drafted/conscripted soldiers, who were all men. As well as lessons learned from treating injured and sick soldiers.
European medical studies had few non-white members because their populations had few such people until recent decades.
Lots of workplace accidents and exposures have led to medical knowledge, and those are massively disproportionately male.
> It's more believable (still not very) that they disproportionately avoid studies.
Women are definitely strongly underrepresented in medical texts, and it's not typically by choice: https://www.aamc.org/news/why-we-know-so-little-about-women-...
A lot of "the consensus" in medical literature predates the inclusion of women in medical research, and even still there things are not tested on women (often because of ethical risks around fertility and birth defects).
> It's hard for me to picture that women receive less medical attention than men: completely inconsistent with my culture and every doctor's office I've ever been to
“Medical attention” and “coverage in medical literature” aren't even remotely the same thing, so dismissing a claim about the first based on your anecdotal experience of the second is completely bonkers.
There are a few factors here:
1. We're talking about a span of 200 or so years. There is plenty of modern medicine that is still based on now century+ old knowledge.
2. The feedback loop. If you were learning medicine in the 1950's, you were probably learning from medical texts written in the 50 or so years before that, when it's not unreasonable to think women would have been less represented. Those same doctors from the 1950's would then have been teaching the next generation of doctors, and they carried those (intentional or not) biases forward. Of course there was new information, but you don't tend to have much time to explore novel medicine when you're in medical school or residency, so by the time you can integrate the new knowledge, some biases have already set in. Repeat for a few generations, and you tend to only get a dilution of those old ideas, not a wholesale replacement of them.
3. If you've been affected by such biases as a patient, you're less likely to trust and be willing to participate with medicine, once more reinforcing the feedback loop.
I don't have any specific numbers or studies for you, but you could probably find more than a few that attest to this phenomenon. I hate to go with 'trust me bro' here, but my knowledge on this topic largely comes from knowing people that are either studying or practicing medicine currently, so it's anecdotal, but the anecdotes are from those in the field currently.
Your location seems to be in Cox, Virginia, not sure how widespread beyond that your experience is?
Of course lots of people have already noted that being represented in medical studies is not related to doctor's visits, but I would like to talk about the doctor's visits observation.
At any rate, one thing might cause you to think that women are receiving lots of medical attention based on your anecdotal evidence from visits to doctors' offices: there is one type of medical attention that is of course almost all women, and that is the care that revolves around pregnancy. That might skew your perception.
Furthermore, if AI models and doctors have a tendency to miss disease among women, it would seem reasonable to assume that women would be in doctors' offices more often.
Example of why this is:
You go to your doctor, there is a man there, doctor says you have this rare disease you need to go to this specialist - you will not see that man in the doctor's office again dealing with his rare disease.
You go to your doctor, there is a woman there that has the same rare disease, the doctor says I think it will clear up, just relax you have some anxiety. That woman will probably be showing up to that doctor's office to deal with that disease multiple times, and you might end up seeing her.
on edit: there was another example of why women might be in doctors' offices more often than men that I forgot: women tend, even nowadays, to be the primary caregiver and errand runner for the family. If your children or your husband have had an appointment, need to drop a sample off, etc., it may be the woman who goes to the doctor's office and takes care of these errands around the medical needs of the rest of the family. So you might go to a doctor, see a couple of women sitting around, and wonder damn, why are all these women always sick, when the appointment isn't even about them.
Part of it is that women are less likely to join studies (especially risky ones that might impact their fertility or the health of their future children).
Part of it is that men are seen as disposable and it's more socially acceptable to exploit and experiment on men. It was also much easier to deal with men historically since once women got involved everything got a lot more complicated. This was especially true in the past where women were so infantilized that their husbands/fathers were put in charge of their medical care/choices. Those backwards attitudes had some strange consequences. On one hand women were seen as the property of men who could get their wives/daughters institutionalized or even lobotomized for not conforming, but at the same time women were also seen as delicate over-emotional creatures who had to be protected and whose modesty had to be preserved in ways that just weren't a consideration when men were involved. Basically for a large part of our history both men and women have been treated like crap by society and while things have improved in a lot of ways, our records and knowledge have been tainted by those old stupid biases and so we're stuck dealing with the fallout.
Here is an academic medicine perspective: https://www.aamc.org/news/why-we-know-so-little-about-women-...
To give you some TL;DR from personal-ish experience, women have historically been excluded from medical trials because:
* why include them? people are people, right?
* except when they're pregnant or could be pregnant -- a trial by definition has risks, and so "of course" one would want to exclude anyone who is or could get pregnant (it's the clinical trial version of "she's just going to get married and leave the job anyway")
* and cyclical fluctuations in hormones are annoying.
The first one is wrong (though it is an oversight that many had for years, assuming for instance that heart attacks and autism would present with the same symptoms in all adult humans).
The second is an un-nuanced approach to risk. Pregnant ladies also need medical treatment for things, and it's pretty annoying to be pregnant and be told that you need to decide among unstudied treatments for some non-pregnancy-related problem.
The third is just a difficult fact of life. I know researchers studying elite performance in women athletes, for instance. At an elite level, it would be useful to understand if there are different effects of training (strength, speed, endurance) at different times in the menstrual cycle. To do this, you need to measure hormone levels in the blood to establish on a scientific basis where in the cycle a study participant is. Turns out there is significant heterogeneity in how this process works.
So some scientists in the field are arguing that studies should only be conducted on women who are experiencing "normal menstrual cycles", which is defined by them as three continuous months of a cycle between 28-35 days. So to establish that, you've got to get these ladies in for three months before the study can even start, getting these hormone levels measured to establish that the cycle is "normal", before you can even start your intervention. (Ain't no one got $$ for that...) And that's before we bring in the fact that many women performing on an elite level in sport don't have a normal menstrual cycle. But from the sports side, they'd still like to know what training is most effective... so that's a very current debate in the field.
And I haven't even started on hormonal birth control! Birth control provides a base level of hormone circulating in the blood, but if it's from a pill it's varying on a daily basis, while if it's a patch or ring it's on a monthly basis (or longer). There's some question of whether that hormonal load from the birth control is then suppressing natural production of some hormones. And why does this matter? Because estrogen for instance has significant effects on cardiovascular health, being cardioprotective from puberty up to menopause. (Yeah, I didn't even get started on perimenopause or menopause.)
Fine, fine, it's just data analysis & logistics. If you get the ladies (only between 21-35) into the lab for blood samples frequently enough and measure at the same time of day every time to avoid daily effects and find a large enough group that you can dump all the ladies who don't fit some definition of normal & anyone who gets pregnant but still get the power for your study, it's all fine, right? You've just expanded medical research to incorporate, like, 10% more of the population....!
I am just tired of skeptics asking innocently. Yes, I wish I could take time to look for sources to educate people like you, but I don't. So take my word for it or not. But yes, women's medical issues are disproportionately underrepresented, misrepresented, and understudied.
It's pretty well understood that there's an unfortunate bias towards white men in their early 20s. This is a pervasive sampling problem across all human studies because most researchers have historically been at universities. So their pool of subjects has naturally been nearby college students.
Just as those are the people who have historically been doing that research, the people they have studied have been drawn from the same population. Over and over we find problems stemming from the assumption that the young, white, male college student is a model of "normal" for all of humanity.
Honestly, it's such a pervasive finding in medicine, psychology, and sociology that I think it says more about your relative inexperience in those areas than anything else.
Women use far more medical care than men. Men's insurance premiums subsidize women's.
Has this been consistently true for the past 200 or so years? Many medical texts are pretty old.
And how much medical care they use does not necessarily correlate with how represented they are in the training data sets for AI.
The burden of proof is on you.
Proof that health insurance premiums for men have been consistently subsidizing women's health insurance premiums for the last 200 or so years? Perhaps the practical non-existence of health insurance until the latter half of the 20th century? Pretty tough to subsidize something that doesn't exist.
You also offered no evidence for your assertion in the first place.
The ACA bans health insurance companies from charging men and women different rates for the same coverage. Before this, women would have higher premiums because, on average, they use their coverage more. This is very easy to look up.
I can cite the ACA, but you cannot cite anything that says AI training sets are biased against women.
A few questions for you to think about, then -- or rather a few things I think you should consider with your statements:
1. How does ACA affect the corpus of knowledge and medical practice gathered prior to the ACA being in effect? How does it affect late 19th, and early and mid 20th century medical knowledge and practice, which occurred prior to health insurance of any kind, nevermind ACA-compliant, being widespread? This corpus of knowledge and practice continues to propagate even now. I've read a handful of recently published medical textbooks and there are definitely parts that are pretty much the same as the textbooks of the early 20th century, just with slightly updated language.
2. What are the possible confounding factors in the use of health insurance by men vs women? For example, could men just be more hesitant to see a doctor, and thus less likely to make use of health insurance? Does the longer average life expectancy of women result in more use of health insurance later in life than for men? Are there medical procedures specific to women, such as mammograms, pap smears, etc., that add to the cost of their care? Seeing as in the US health insurance is a practical requirement for getting medical care, and lack of it is punished financially in various ways, from taxes to medical care simply being more expensive when you truly need it, most people will try to have _some_ kind of health insurance, even if they don't think they need it for actual health reasons. So despite a perception of not needing health insurance, men are incentivized to have health insurance they don't use?
3. Does the ACA guarantee in any way that medical professionals no longer hold any bias due their previous training, especially if such training occurred prior to the introduction of the ACA? Does the ACA similarly guarantee that women and men are not only able, but choose to pursue medical care and participate in medical studies at percentages matching the general population?
Your point about men subsidizing women with regard to health insurance premiums may be perfectly valid, I am not disputing you on that point. I am disputing that it is salient to the tradition and practice of medicine in the western world in the modern era, until very recently historically, and that these traditions and biases will affect data sets gathered from people who are directly affected by these biases and traditions to this day. We haven't eliminated them, because as I said in another comment, every generation just dilutes the old issues, it doesn't solve them. And while I could spend my evening finding studies from various countries that attest to my view on this, I have spent about as much time as I desire to on this, so I will grant you that my evidence is on the level of 'trust me bro' -- with the slight caveat that many people within just my family and close circle of friends are involved in the medical field and all largely agree to this, and they are not all based in the US (which by the way, your point is very specific to. ACA is a US thing, western medicine spans a bit more than that.) It is entirely fair for you to call out that I have offered no real peer-reviewed evidence for my statements. I intend to offer a viewpoint of someone who has had extensive peripheral experience with medical professionals and has discussed this topic with them, and to offer some avenues of thought on how and why the data sets might be biased.
Women using more health care didn't start with the ACA. The ACA just banned the practice of charging women more because they use more health care.
Ask a doctor what gender goes to them more for gender neutral health care like "flu-like symptoms".
Now you provide evidence that AI models discriminate against women instead of DDoSing me with "how can you know it's not true" written in 10 ways.
Funny how you never read a headline about how Latinos or Asians are discriminated against in medical science. That's a pretty clear give away that this is politically motivated.
Are you going to hold the same standard to them? Were Asians and Latinos represented in 200 year old medical texts?
> Funny how you never read a headline about how Latinos or Asians are discriminated against in medical science.
This happens all the time? Maybe you're just not reading a diverse set of media?
> Funny how you never read a headline about how Latinos or Asians are discriminated against in medical science. That's a pretty clear give away that this is politically motivated.
I have read multiple such articles in mainstream media, also about Black patients having issues. Admittedly, I did not see them in conservative journals.
> Are you going to hold the same standard to them? Were Asians and Latinos represented in 200 year old medical texts?
Yes, if their diseases get badly diagnosed, it is an issue.
> Ask a doctor what gender goes to them more for gender neutral health care like "flu-like symptoms".
That has about zero to do with who is in the studies. Plus, women in fact do have more trouble getting their issues taken seriously.
> Now you provide evidence that AI models discriminate against Women instead of DDoSing
Literally here: https://www.science.org/content/article/ai-models-miss-disea...
Okay, frankly, the fuck are you on about?
I specifically mentioned both minorities and women in my original post, you're the one who specified men vs women. At this point, it seems you're the one who has some political if not potentially misogynist agenda.
It is very true that a lot of medical knowledge is gained empirically, and there is also an additional aspect to it. Medical research has historically been conducted on the demographics where such testing is culturally acceptable, and where the gains of such research have been most sought: young men drafted into wars. The second common demographic is medical students, which historically skewed toward men but today skews toward women.
So while access to medicine is indeed one factor, I would say that studies are more likely to target demographics which are convenient to test on.
> The history of Medical research is generally studied on the demographics where such testing is cultural acceptable, and where the gains of such research has been mostly sought, which is young men drafted into wars.
Though in this study, the AI models were also biased against people under the age of 40.
It is interesting that we're also seeing a lot of bias in the reporting and discussion of these results. The study tested three groups for bias, and found a bias against all three. Yet the headline only mentions the bias against two of the groups, and almost the entirety of the discussion here only talks about bias against two of the groups while ignoring the third.
If I test a system for bias, select three different groups to test for, and all three have a bias against them, my first reaction would be "there's a good chance that it's also biased against many other groups, I should test for those as well." It wouldn't be to pretend that there's only bias against the three groups I actually bothered checking for. It definitely wouldn't be to ignore one of those groups and pretend that there's only a bias against the other two.
I think we're really talking about different aspects of the same issue. Everything you've described basically agrees with "those who have more access to medicine" because those are also the ones inherently more convenient to test/observe.
Like how the ones with the most access to medicine are mice, because they're convenient to experiment on.
And this is absolutely something one needs to consider when reading medical studies -- if they only use animal (usually mice) models, there's a decent chance the conclusions are not directly transferable to humans.
The key takeaway from the article is that the race etc. of the subjects wasn't disclosed to the AI, yet it was able to predict it with 80% accuracy while the human experts managed 50%, suggesting that there was something else encoded in the imagery that the AI was picking up on.
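For anyone curious what that kind of check looks like mechanically, here's a minimal sketch of a "probe": train a simple classifier to recover a demographic attribute from the model's image features. Everything below (embeddings, labels, variable names) is a synthetic stand-in, not the study's actual data or code:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(1000, 512))    # stand-in for the model's image features
    attr_labels = rng.integers(0, 2, size=1000)  # stand-in for a recorded demographic attribute

    X_tr, X_te, y_tr, y_te = train_test_split(embeddings, attr_labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Accuracy well above chance means the attribute is encoded in the features,
    # even though it was never given to the model explicitly.
    print("probe accuracy:", probe.score(X_te, y_te))

With real embeddings you'd expect numbers like the ones the article reports; with this random placeholder data it should hover around chance (50% here).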
The AI might just have better subjective / analytical weighting of detection criteria. Humans are likely more willing to see what they expect to see (or not see what they don't).
> The dataset they used to train the model are chest xrays of known diseases. I'm having trouble understanding how that's relevant here.
For example, if you include no (or too few) Black women in the dataset of x-rays, the model may very well miss signs of disease in Black women.
The biases and mistakes of those who created the data set leak into the model.
Early image recognition models had some very… culturally insensitive classes baked in.
I am confused. I’m not a doctor, but why would a model perform poorly at detecting diseases in X-rays in different genders and races unless the diseases present themselves differently in X-Rays for different races? Shouldn’t the model not have the race and gender information to begin with? Like a model trained on detecting lesions should perform equally well on ANY X-Ray unless lesions show up differently in different demographics.
You and the article are both correct. The disease does present itself differently as a function of these other characteristics, so since the training dataset doesn't contain enough samples of these different presentations, it is unable to effectively diagnose.
> [...] unless lesions show up differently in different demographics.
Well, first the model looks at the entire X-ray, and lesions probably do show up differently. Maybe it's genetic/sex-based, or it's due to how lesions develop because of environmental factors that are correlated with race or gender. Maybe there's a smaller segment of white people that has the same type of lesion and poor detection.
> Like a model trained on detecting lesions should perform equally well on ANY X-Ray unless lesions show up differently in different demographics.
This is not true in practice.
For a model to perform well looking at ANY X-ray, it would need examples of every kind of X-ray.
That includes along race, gender, amputee status, etc.
The point of classification models is to discover differentiating features.
We don’t know those features before hand, so we give the model as much relevant information as we can and have it discover those features.
There very well may be differences between black woman X-rays and other X-rays, we don’t know for sure.
We can’t have that assumption when building a dataset.
Even believing that there are no possible differences between X-rays of different races is a bias that would be reflected by the dataset.
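One way to surface this in practice, sketched below with made-up numbers: don't report a single overall metric, break the error rate down per group. Nothing here is from the paper, it's just the shape of the check:

    import numpy as np

    rng = np.random.default_rng(1)
    y_true = rng.integers(0, 2, size=2000)            # 1 = disease actually present
    y_pred = rng.integers(0, 2, size=2000)            # model's prediction
    groups = rng.choice(["A", "B", "C"], size=2000)   # demographic label per patient

    for g in np.unique(groups):
        sick = (groups == g) & (y_true == 1)
        fnr = np.mean(y_pred[sick] == 0)              # missed diagnoses within group g
        print(f"group {g}: false-negative rate = {fnr:.2f}")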
For a start, women have a different body shape and you can (unreliably) tell a woman from a man from an X-ray. The model can be picking up on those signs as a side effect and end up less correct for demographics it was not trained on.
If diseases manifest differently for different races and genders, the obvious solution is to train multiple LLMs, based on separate datasets for those different groups. Not to mutter darkly about bias and discrimination.
X-rays by definition don't look at skin color. Do chest x-rays of black women reveal that there's something different about their chests than white or asian women? That doesn't pass my non-doctor sniff test, but someone can correct me (no sarcasm intended).
But they do look at bones and near-bone tissues, which can still have variance based on ethnicity and gender. For a really brute-force example, just think about how we use the shape of the pelvis and some other bones to identify the gender of skeletal remains of a person. If you had a data set of pelvic xrays that only included males, your data set would imply that female pelvic bones are massively malformed despite being perfectly normal for that gender.
This is the whole point of the article. Did you read it? Does the whole thing fail your sniff test?
Their results seem solid, and clear, to me.
Breast density affects the imaging you get from x-rays. It is well-known that denser breast tissue results in x-rays that are "whiter" (I'm talking about the image of the tissue, in white, on a black background, as x-rays are commonly read by radiologists). Denser breasts are associated with less effective screening for breast cancer via mammogram. A mammogram is a low-dose x-ray.
When using a chest x-ray to look for pulmonary edema, for instance, I would be unsurprised if breast tissue (of any quantity) and in particular denser breast tissue would make the diagnosis of pulmonary edema more difficult from the image alone.
Also, you seem to have conflated a few things in your second sentence. Deep in the article, they did have radiologists try to guess demographic attributes by looking at the x-ray images. They were pretty good at guessing female/male (unsurprising) and were not really able to guess age or race. So I'm super interested in how the AI model was able to be better at that than the human radiologists.
There can be differences which statistical models pick up which we humans don’t.
For example, a couple years ago there was a statistical model made which could fairly accurately predict (iirc >80%) the gender of a person based on a picture of their iris. At the time we didn’t know there was a visible iris difference between genders, but a statistical model found one.
That’s kind of the whole point of statistical classification models. Feed in a ton of data and the model will discover the differentiating features.
Put another way, If we knew all the possible differences between someone with cancer and without, we wouldn’t need statistical models at all, we could just automate the diagnosis.
We don’t know the indicators that we don’t know, so we don’t know if some possible indicators show up or don’t show up in a given group of people.
That is the danger of wholly relying on statistical models.
What groups have the financial means to get chest x-rays, and what groups do not? What historical events could create the circumstances where different groups have different health outcomes?
you ain't gonna like the truth but there are differences between the races and during med school they try to say it ain't so but once you start seeing patients there's differences in musculature/skin, all sorts. and if you have a good attending they tactfully tell you and you go 'was it in a study?' and nope nobody wants to publish it. and no i'm talking just stuff like scabies or diabetes.
Cancer progresses differently depending on ethnicity and sex. As does treatment and likelihood of receiving treatment at early stages.
Black women experience worse outcomes and are diagnosed with more severe forms of breast cancer than white women.
Cancer is not just one disease. Its progression will vary depending on type. If the AI is trained on only some strains of cancer, eg those traditionally found in white women in early detection scenarios, it might not generalize to other cancer types.
So yes, to your genuine question, medical imaging of cancer can vary depending on ethnicity because different cancers can vary between genetic backgrounds. Ideally there would be sufficient training data across the populations, but there isn't because of historical race bias. (Among other reasons.)
It disappoints me how easily we are collectively falling for what effectively is "Oh, our model is biased, but the only way to fix it is that everyone needs to give us all their data, so that we can eliminate that bias. If you think the model shouldn't be biased, you're morally obligated to give us everything you have for free. Oh but then we'll charge you for the outputs."
How convenient.
It's increasingly looking like the AI business model is "rent extracting middleman", just like the Elseviers et al of the academic publishing world - wedging themselves into a position where they get to take everything for free, but charge others at every opportunity.
We have to invent more ways to pay rich people for being rich, and AI looks like a promising one.
Do you think there is a middle ground for a progressive 'detailization' of the data -- you form a model based on the minimal data set that allows you to draw useful conclusions, and refine that with additional data to where you're capturing the vast majority of the problem space with minimal bias?
X-rays are ordered only after doctor decides it's recommended. If there's dismissal bias in the decision tree at that point, many ill chests are missing from training data.
Apparently providing this messy rough categorization appeared to help in some cases. From the article:
> To force CheXzero to avoid shortcuts and therefore try to mitigate this bias, the team repeated the experiment but deliberately gave the race, sex, or age of patients to the model together with the images. The model’s rate of “missed” diagnoses decreased by half—but only for some conditions.
In the end though I think you're right and we're just at the phases of hand-coding attributes. The bitter lesson always prevails
https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
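For what it's worth, the "hand-coding attributes" fix the article describes can be as mundane as concatenating the metadata onto the image features before the classification head. A rough sketch with made-up dimensions; this illustrates the idea, it is not how the CheXzero experiment was actually wired:

    import torch
    import torch.nn as nn

    class ImagePlusMetadataHead(nn.Module):
        def __init__(self, img_dim=512, meta_dim=3, n_conditions=14):
            super().__init__()
            self.head = nn.Linear(img_dim + meta_dim, n_conditions)

        def forward(self, img_features, metadata):
            # img_features: (batch, img_dim) from a frozen image encoder
            # metadata: (batch, meta_dim), e.g. encoded age, sex, race
            return self.head(torch.cat([img_features, metadata], dim=1))

    model = ImagePlusMetadataHead()
    logits = model(torch.randn(4, 512), torch.randn(4, 3))
    print(logits.shape)  # torch.Size([4, 14])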
> Also important was the use [in Go] of learning by self play to learn a value function
I thought the self-play was the value function that made progress in Go. That is, it wasn't the case that we played through a lot of games and used that data to create a function that would assign a value to a Go board. Instead, the function to assign a value to a Go board would do some self-play on the board and assign value based on the outcome.
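To make the distinction concrete, here's a toy sketch of the two ideas (not real Go code): estimating a position's value by playing games out from it, versus a function trained on self-play outcomes that maps a position straight to a value. My understanding is that AlphaGo-style systems did both: self-play generated the games the value network was trained on, and earlier versions also mixed rollouts into the evaluation:

    import random

    def rollout_value(position, simulate_game, n=100):
        # Monte Carlo estimate: average the outcome of n playouts from this position.
        return sum(simulate_game(position) for _ in range(n)) / n

    def learned_value(position, value_net):
        # A trained function mapping a position directly to an estimated outcome,
        # with no playouts at inference time.
        return value_net(position)

    # Toy stand-ins so the sketch runs; a real system would have a Go engine here.
    dummy_game = lambda pos: random.choice([0, 1])
    dummy_net = lambda pos: 0.5
    print(rollout_value("some position", dummy_game))
    print(learned_value("some position", dummy_net))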
I think the model needs to be taught about human anatomy, not just fed a bunch of scans. It needs to understand what ribs and organs are.
I don't think LLMs can achieve "understanding" in that sense.
These aren't LLM. Most of the neat things in science, involving AI, aren't LLM. Next word prediction has extremely limited use with non-text data.
People seem to have started to use "LLM" to refer to any suite of software that includes an LLM somewhere within it; you can see them talking about LLM-generated art, for example.
Was it ascii art? ;)
https://hamatti.org/posts/art-forgery-llms-and-why-it-feels-...
People will just believe whatever they hear.
Computer vision models are not large language models; LLM does not mean generative AI or even AI in general, it stands for something specific: large language model.
As Sara Hooker discussed in her paper https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1..., bias goes way beyond data.
I like how the author used neo-Greek words to sneak in graphic imagery that would normally be taboo in this register of writing
I really can’t help but think of the simulation hypothesis. What are the chances this copy-cat technology was developed when I was alive, given that it keeps going?
We may be in a simulation, but your odds of being alive to see this (conditioned on being born as a human at some point) aren't that low. Around 7% of all humans ever born are alive today!
In order to address the chances of a human being alive to witness the creation of this tech, you'd have to factor in the humans who have yet to be born. If you're a doomer, 7% is probably still fine. If we just maintain the current population for another century, it'll be much lower.
I don't believe that percentage. Especially considering how spread out the homo branch already was more than 100,000 years ago. And from which point do you start counting? Homo erectus?
It kinda doesn't matter where you start counting. Exponential curves put almost everything at the end. Adding to the left side doesn't change it much.
You could go back to Lucy and add only a few million. Compared to the billions at this specific instant, it just doesn't make a difference.
I would imagine this is probably the source, which benchmarks using the last 200,000 years. https://www.prb.org/articles/how-many-people-have-ever-lived...
Given that we only hit the first billion people in 1804 and the second billion in 1927 it's not all that shocking.
That argument works both ways, it might be significantly higher depending on how you count.
But this is also just the non-intuitiveness of exponential growth, which is only now tapering off.
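The arithmetic behind the ~7% figure is quick to check against the PRB estimate linked above (roughly 117 billion humans ever born over the last ~200,000 years); the numbers are approximate:

    ever_born = 117e9   # PRB estimate of humans ever born (~last 200,000 years)
    alive_now = 8.0e9   # current world population, roughly
    print(f"{alive_now / ever_born:.1%}")  # ~6.8%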
"The model used in the new study, called CheXzero, was developed in 2022 by a team at Stanford University using a data set of almost 400,000 chest x-rays of people from Boston with conditions such as pulmonary edema, an accumulation of fluids in the lungs. Researchers fed their model the x-ray images without any of the associated radiologist reports, which contained information about diagnoses. "
... very interesting that the inputs to the model had nothing related to race or gender, but somehow it still was able to miss diagnoses in Black and female patients? I am curious about the mechanism for this. Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis? I do remember when it came out that AI could predict race from medical images with no other information[1], so that part seems possible. But where would it get the idea to do a worse diagnosis, even if it determines this? Surely there is no medical literature that recommends this!
[1]https://news.mit.edu/2022/artificial-intelligence-predicts-p...
The non-tinfoil hat approach is to simply Google "Boston demographics", and think of how training data distribution impacts model performance.
> The data set used to train CheXzero included more men, more people between 40 and 80 years old, and more white patients, which Yang says underscores the need for larger, more diverse data sets.
I'm not a doctor so I cannot tell you how x-rays differ across genders / ethnicities, but these models aren't magic (especially computer vision ones, which are usually much smaller). If there are meaningful differences and they don't see those specific cases in training data, they will always fail to recognize them at inference.
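A quick, boring sanity check along those lines (column names and values made up, a real dataset's demographic metadata would go here) is just to look at how the training set breaks down before training anything:

    import pandas as pd

    metadata = pd.DataFrame({
        "sex":  ["M", "M", "M", "F", "M", "F", "M", "M"],
        "race": ["White", "White", "Black", "White", "Asian", "White", "White", "Black"],
    })

    print(metadata["sex"].value_counts(normalize=True))  # share of each sex
    print(metadata.groupby(["race", "sex"]).size())      # joint counts reveal thin cells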
Non-technical suggestion: if AI represents an aspect of the collective unconscious, as it were, then a racist society would produce latently racist training data that manifests in racist output, without anyone at any step being overtly racist. Same as an image model having a preference for red apples (even though there are many colors of apple, and even red ones are not uniformly cherry red).
The training data has a preponderance of examples where doctors missed a clear diagnosis because of their unconscious bias? Then this outcome would be unsurprising.
An interesting test would be to see if a similar issue pops up for obese patients. A common complaint, IIUC, is that doctors will chalk up a complaint to their obesity rather than investigating further for a more specific (perhaps pathological) cause.
I'm going to wager an uneducated guess. Black people are less likely to go to the doctor for both economic and historical reasons so images from them are going to be underrepresented. So in some way I guess you could say that yes, latent racism caused people to go to the doctor less which made them appear less in the data.
Where the data comes from also matters. Data is collected based on what's available to the researcher. Data from a particular city or time period may have a very different distribution than the general population.
Men are also way less likely to go to the doctor vs women. Yet this claims a bias against women as well.
> Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis?
The opposite. The dataset is for the standard model "white male", and the diagnoses generated pattern-matched on that. Because there's no gender or racial information, the model produced the statistically most likely result for white male, a result less likely to be correct for a patient that doesn't fit the standard model.
The better question is just "are you actually just selecting for symptom occurrence by socioeconomic group?"
Like you could modify the question to ask "is the model better at diagnosing people who went to a certain school?" and simplistically the answer would likely seem to be yes.
Then why is the headline not "AI models miss disease in Asian patients" or even "AI models miss disease in Latino patients"?
It just so happens to align with what maximizes political capital in today's world.
You really just have to understand one thing: AI is not intelligent. It's pattern matching without wisdom. If fewer people in the dataset are a particular race or gender it will do a shittier job predicting and won't even "understand" why or that it has bias, because it doesn't understand anything at a human level or even a dog level. At least most humans can learn their biases.
Isn't it kind of clear that it would have to be that the data they chose was influenced somehow by bias?
Machines don't spontaneously do this stuff. But the humans that train the machines definitely do it all the time. Mostly without even thinking about it.
I'm positive the issue is in the data selection and vetting. I would have been shocked if it was anything else.
LLMs don't and cannot want things. Human beings also like it when the future is mostly like the past. They just call that "predictability."
Human data is bias. You literally cannot remove one from the other.
There are some people who want to erase humanity's will and replace it with an anthropomorphized algorithm. These people concern me.
Can humans want things? Our reward structures sure seem aligned in a manner that encourages anthropomorphization.
Biases are symptoms of imperfect data, but that's hardly a human-specific problem.
> Can humans want things?
Yes. Do I have to prompt you? Or do you exist on your own?
> Our reward structures sure seem aligned in a manner that encourages anthropomorphization.
You do understand what that word /means/?
> are symptoms of imperfect data
Which means humans cannot generate perfect data. So good luck with all that high priced "training" you're doing. Mathematically errors compound.
> Yes. Do I have to prompt you? Or do you exist on your own?
I've gone through a significant amount of prompting and training, much of which has been explicitly tailored to understanding and addressing my biases. We all do; we certainly don't exist in isolation!
> You do understand what that word /means/?
Yes, what's the confusion? Analogy is a very powerful tool.
> Which means humans cannot generate perfect data.
Totally agree, nothing can possibly access perfect data, but surely that makes training all the more important?
The most concerning people are -- as ever -- those who only think that they are thinking. Those who keep trying to fit square pegs into triangular holes without, you know, stopping to reflect: who gave them those pegs in the first place, and to what end?
Why be obtuse? There is no "anthropomorphic fallacy" here to dispel. You know very well that "LLMs want" is simply a way of speaking about teleology without antagonizing people who are taught that they should be afraid of precise notions ("big words"). But accepting that bias can lead to some pretty funny conflations.
For example, humanity as a whole doesn't have this "will" you speak of any more than LLMs can "want"; will is an aspect of the consciousness of the individual. So you seem to be uncritically anthropomorphizing social processes!
If we assume those to be chaotic, in that sense any sort of algorithm is slightly more anthropomorphic: at least it works towards a human-given and therefore human-comprehensible purpose -- on the other hand, whether there is some particular "destination of history" towards which humanity is moving, is a question that can only ever be speculated upon, but not definitively perceived.
> Why be obtuse?
In the context of the quote precision is called for. You cite fear but that's attempting to have it both ways.
> humanity as a whole doesn't have this "will" you speak of
Why not?
> will is an aspect of the consciousness of the individual.
I can't measure your will. I can measure the impact of your will through your actions in reality. See the problem? See why we can say "the will of humanity?"
> So you seem to be be uncritically anthropomorphizing social processes!
It's called "an aggregate."
> is a question that can only ever be speculated upon, but not definitively perceived.
The original point was that LLMs want the future to be like the past. You've way overshot the mark here.
> You've way overshot the mark here.
Nah, I'm just having fun.
>You cite fear but that's attempting to have it both ways.
Huh?
>In the context of the quote precision is called for.
Because we must make it explicit that AI is not conscious? But why?
Since you can only ever measure impacts on reality -- what difference does it make to you if there's a consciousness that's causing them or not?
>It's called "an aggregate."
An individual is conscious. Does it follow from this that the set of all individuals is itself conscious? I.e. do you say that it's appropriate to model humanity as sort of one giant human?
Humans anthropomorphize all sorts of things but there are way bigger consequences for treating current AI like a human than someone anthropomorphizing their dog.
I know plenty of people that believe LLMs think and reason the same way as humans do and it leads them to make bad choices. I'm really careful about the language I use around such people because we understand expressions like, "the AI thought this" very differently.
>Humans anthropomorphize all sorts of things but there are way bigger consequences for treating current AI like a human than someone anthropomorphizing their dog.
AI is less human-like than a dog, in the sense that an AI (hopefully!) is not capable of experiencing suffering.
AI is also more human-like than a dog; in the sense that, unlike a dog, an AI can apply political power.
I agree that there are considerable consequences for misconstruing the nature of things, especially when there's power involved.
>I know plenty of people that believe LLMs think and reason the same way as humans do and it leads them to make bad choices.
They're not completely wrong in their belief. It's just that you are able, thanks to your specialized training, to automatically make a particular distinction, for which most people simply have no basis for comparison. I agree that it's a very important distinction; I could also guess that even when you do your best to explain it to people, often they prove unable to grasp its nature, or its importance. Right?
See, everyone's trying to make sense of what's going on in their lives on the basis of whatever knowledge and conditioning they might have. Everyone gets it right some of the time and wrong most of the time. For example, humans also make bad choices as a result of misinterpreting other humans. Or by correctly interpreting and trusting other humans who happen to be wrong. There's nothing new about that. Nor is there a particular difference between suffering the consequences of AI-driven bad choice vs those of human-driven bad choice. In both cases, you're a human experiencing negative consequences.
AI stupidity is simply human stupidity distilled. If humans were to only ever speak logically correct statements in an unambiguous language, that's what an LLM's training data would contain, and in turn the acceptance criterion ("Turing test") for LLMs would be outputting other unambiguously correct statements.
However, it's 2025 and most humans don't actually reason, they vibe with the pulsations of the information medium. Give us something that looks remotely plausible and authoritative, and we'll readily consider it more valid than our own immediate thoughts and perceptions - or those of another human being.
That's what media did to us, not AI. It's been working its magic for at least a century, because humans aren't anywhere near rational creatures; we're sloppy. We don't have to be; we are able to teach ourselves a tiny bit of pure thought. Thankfully, we have a tool for when we want to constrain ourselves to only thinking in logically correct statements, and only expressing those things which unambiguously make sense: it's called programming.
Up to this point, learning how to reason was economically necessary, in order to be able to command computers. With LLMs becoming better, I fear thinking might be relegated to an entirely academic pursuit.
> If we assume those to be chaotic, in that sense any sort of algorithm is slightly more anthropomorphic: at least it works towards a human-given and therefore human-comprehensible purpose -- on the other hand, whether there is some particular "destination of history" towards which humanity is moving, is a question that can only ever be speculated upon, but not definitively perceived.
Do you not think that if you anthropomorphise things that aren't actually anthropic, you then insert a bias towards those things? The bias will actually discriminate at the expense of people.
If that is so, the destination of history will inevitably be misanthropic.
Misplaced anthropomorphism is a genuine, present concern.
I'd say anthropomorphizing humans is already deeply misplaced!
Each one of us is totally unlike any other -- that's what's so cool about us! Long ago, my neighbor Diogenes proved, by means of a certain piece of poultry, that no universal Platonic ideal of human-ness can be reasonably established. (We've largely got the toxic fandom of my colleague Jesus to thank for having to even explain this nearly 2500 years after the fact.)
There is no universal "human shape" which we all fit, or are obliged to aspire to fit. It's precisely the mass delusions of there ever being such a thing which are fundamentally misanthropic. All they ever do is invoke a local Maxwellian process which heats shit up until it all blows the fuck up out of the orbit of the local attractor.
Look at history. Consider the epic fails that are fascism, communism, capitalism. Though they define it differently, they are all about this pernicious idea of "the correct way to human"; which implicitly requires the complementary category of "subhuman" for all featherless bipeds whose existence happens to defy the dominant delusion. In practice, all this can ever accomplish is to collapse under the weight of its own idiocy. But not without destroying innumerable individual humans first -- in the name of "all that is human", you see.
Materialists say the universe doesn't care about us puny humans anyway. But one only ever perceives the universe through one's own human senses, and ascribes meanings to it through one's own cogitations! Both are tragicomically imperfect, but they're all we've ever got to work with. Therefore, rather than try to convince myself I'm able to grasp the destination of the history of my species, I prefer to seek knowledge of those things which enable me to do right by myself and others in the present.
But one's gotta believe in something! Metaphysics is not only entertaining, it's also a primary source of motivation! So my belief is that if each one of us trusted one's own senses more -- and gave up on trying to delegate the answer of "how should I be?" to unaccountable authorities which are themselves not a human (but mere concepts, or else machinic assemblages of human behaviors which we can only ever grasp through concepts: such as "society", "morality", "humanity") -- then it'd all turn out fine!
It simplifies things considerably. Lets me focus on figuring out how they work. Were I to believe in the existence of some universal definition of what constitutes a human, I'd just end up not noticing that I was paying for a faulty dataset.
Suppose you have a system that saves 90% of lives on group A but only 80% of lives in group B.
This is due to the fact that you have considerably more training data on group A.
You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.
So the obvious thing to do is to have the technology intentionally kill ~1 out of every 10 patients from group A so the efficacy rate is ~80% for both groups. Problem solved
From the article:
> “What is clear is that it’s going to be really difficult to mitigate these biases,” says Judy Gichoya, an interventional radiologist and informatician at Emory University who was not involved in the study. Instead, she advocates for smaller, but more diverse data sets that test these AI models to identify their flaws and correct them on a small scale first. Even so, “Humans have to be in the loop,” she says. “AI can’t be left on its own.”
Quiz: What impact would smaller data sets have on efficacy for group A? How about group B? Explain your reasoning
> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.
Who is preventing you in this imagined scenario?
There are drugs that are more effective on certain groups of people than others. BiDil, for example, is an FDA approved drug marketed to a single racial-ethnic group, African Americans, in the treatment of congestive heart failure. As long as the risks are understood there can be accommodations made ("this AI tool is for males only" etc). However such limitations and restrictions are rarely mentioned or understood by AI hype people.
What does this have to do with FDA or drugs? Re-read the comment I was replying to. It's complaining that a technology could serve one group of people better than another, and I would argue that this should not be our goal.
A technology should be judged by "does it provide value to any group or harm any other group". But endlessly dividing people into groups and saying how everything is unfair because it benefits group A over group B due to the nature of the problem, just results in endless hand-wringing and conservatism and delays useful technology from being released due to the fear of mean headlines like this.
No. That's not how it works.
It's contraindication. So you're in a race to the bottom in a busy hospital or clinic. Where people throw group A in a line to look at what the AI says, and doctors and nurses actually look at people in group B. Because you're trying to move patients through the enterprise.
The AI is never even given a chance to fail group B. But now you've got another problem with the optics.
> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A
I think the point is you need to let group B know this tech works less well on them.
Imagine if you had a strawman so full of straw, it was the most strawfilled man that ever existed.
From the article:
> “What is clear is that it’s going to be really difficult to mitigate these biases,” says Judy Gichoya, an interventional radiologist and informatician at Emory University who was not involved in the study. Instead, she advocates for smaller, but more diverse data sets that test these AI models to identify their flaws and correct them on a small scale first. Even so, “Humans have to be in the loop,” she says. “AI can’t be left on its own.”
What do you think smaller data sets would do to a model? It'll get rid of the disparity, sure.