This isn’t an AI problem but a general medical-field problem. It’s a big issue with basically any population-centric analysis: the people running the study don’t have a perfect sample of the world’s population to model human health; they have, say, a couple hundred blood samples from patients at a Boise hospital over the past 10 years. Then they validate against some other available cohort that is similarly constrained by what’s practical to sample and catalog, and that cohort might not even show the same markers separating disease from healthy.
A couple of populations end up really overrepresented because of which datasets are available. Utah populations on one hand, because they’re genetically bottlenecked and therefore have better signal-to-noise in theory. And on the other, the Yoruba people of West Africa, used as a model of the most diverse and ancestral human population in studies concerned with how populations evolved.
There are other projects amassing population data too. About two-thirds of the population of Iceland has been sequenced, and that dataset is also frequently used.
It becomes a generative AI / LLM hype issue because it follows the confidence-game playbook: feed someone correct ideas and answers that fit their biases until they trust you, then, when the time is right, suggest things that fit their biases but give incorrect (and exploitative) results.