Item 43497952

dekhn • 5 days ago

I use the term genetic history, rather than race, as race is only weakly correlated with body-level phenotypes.

If your question is truly in good faith (rather than an "I want to get in an argument"), then my answer is: it's complicated. Machine learning models that work on images learn extremely complicated correlations between pixels and labels. If, on average, people with a specific genetic history had slightly larger ribcages (due to their genetics, or even socioeconomic status that correlated with genetic history), that would show up in a number of ways in the pixels of a radiograph: larger bones spread across more pixels, density of bones slightly higher or lower, organ size differences, etc.

It is true that Africa has more genetic diversity than anywhere else; the current explanation is that after humans arose in Africa, they spread and evolved extensively, but only a small number of genetically limited groups left Africa and reproduced/evolved elsewhere in the world.

KittenInABox • 5 days ago

I am genuinely asking because it makes no sense to me that a genetically diverse group are distinctly identifiable by their ribcage bones in an x-ray. If it's something more specific like AI sucks at statistically larger ribcages, statistically noticeable bone densities, or similar, okay. But something like so-small-humans-cannot-tell-but-is-simultaneously-widely-applicable-to-a-large-genetic-population is utterly baffling to me.

dekhn • 4 days ago

I dunno. My perspective is that I've worked in ML for 30+ years now and over time, unsupervised clustering and direct featurization (i.e., treating the image pixels as the features, rather than extracting features) have shown great utility in uncovering subtle correlations that humans don't notice. Sometimes, with careful analysis, you can sort of explain these ("it turns out the unlabelled images had the name of the hospital embedded in them, and hospital 1 had more cancer patients than hospital 2 because it was a regional cancer center, so the predictor learned to predict cancer more often for images that came from hospital 1"), while in other cases, no human, even a genius, could possibly understand the combination of variables that contributed to an output (pretty much anything in cellular biology, where billions of instances of millions of different factors act along with feedback loops and other regulation to produce systems that are robust to perturbations).

I concluded long ago I wasn't smart enough to understand some things, but by using ML, simulations, and statistics, I could augment my native intelligence and make sense of complex systems in biology. With mixed results, though: I don't think we're anywhere close to solving the generalized genotype-to-phenotype problem.
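To make the hospital example concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: each "image" is reduced to two numbers (a hospital flag standing in for the embedded hospital name, and a weak `tissue` signal), the per-hospital cancer rates of 0.6 and 0.1 are invented, and `make_dataset` and `train_logistic` are hypothetical helpers, not anything from a real study.

```python
import math
import random

def make_dataset(n, rng):
    """Hypothetical data: each 'image' is reduced to (hospital flag, tissue signal)."""
    rows = []
    for _ in range(n):
        hospital_1 = rng.random() < 0.5            # which site produced the image
        p_cancer = 0.6 if hospital_1 else 0.1      # assumed prevalence per site
        cancer = rng.random() < p_cancer
        # Weak genuine signal: the mean shifts slightly for cancer cases.
        tissue = rng.gauss(0.3 if cancer else 0.0, 1.0)
        rows.append(((1.0 if hospital_1 else 0.0, tissue), 1.0 if cancer else 0.0))
    return rows

def train_logistic(rows, epochs=500, lr=1.0):
    """Full-batch gradient descent on the logistic loss for two features plus a bias."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in rows:
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y                            # derivative of the loss w.r.t. z
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

rng = random.Random(0)                             # fixed seed so the run is repeatable
rows = make_dataset(1000, rng)
weights, bias = train_logistic(rows)
print("weight on hospital flag:", round(weights[0], 2))
print("weight on tissue signal:", round(weights[1], 2))
```

Under these assumptions the fitted weight on the hospital flag should come out much larger than the weight on the tissue signal, i.e. the model leans on where the image came from rather than on what is in it.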

bflesch • 4 days ago

Sounds like "GeoGuessr" players who learn to recognize Google Street View pictures from a specific country by looking at the color of the Google Street View car or a specific piece of dirt on the camera lens.

dekhn • 4 days ago

Yeah, there's also a likely apocryphal story about tanks and machine learning: https://gwern.net/tank

The more you work with large-scale ML systems, the more you develop an intuition for these kinds of properties. If you work a lot with debugging models and training data, or even just dimensionality reduction and matrix factorization, you begin to realize that many features are highly correlated with each other, often close to being scaled linear functions of one another.
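A rough sketch of what "close to scaled linear" looks like in practice, using only the Python standard library. The latent `size` factor, the two feature names, and the scale and noise values are all made up for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical latent factor (say, overall body size) driving two measured features.
size = [rng.gauss(0.0, 1.0) for _ in range(500)]
rib_width = [2.0 * s + rng.gauss(0.0, 0.2) for s in size]
bone_area = [5.0 * s + rng.gauss(0.0, 0.5) for s in size]

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Pearson correlation between the two features.
r = covariance(rib_width, bone_area) / math.sqrt(
    covariance(rib_width, rib_width) * covariance(bone_area, bone_area)
)

# Least-squares slope of bone_area on rib_width: the "scale" in "scaled linear".
slope = covariance(rib_width, bone_area) / covariance(rib_width, rib_width)

# For two standardized features the correlation matrix is [[1, r], [r, 1]],
# whose largest eigenvalue is 1 + |r|, so the first principal component
# carries (1 + |r|) / 2 of the total variance.
explained = (1 + abs(r)) / 2

print("correlation:", round(r, 3))
print("slope:", round(slope, 3))
print("variance explained by first component:", round(explained, 3))
```

With these invented numbers the two features are nearly interchangeable up to a scale factor, which is why dimensionality reduction collapses them into roughly one component.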

echoangle • 4 days ago

> it makes no sense to me that a genetically diverse group are distinctly identifiable by their ribcage bones in an x-ray

I don't see how diversity would prevent identification. Butterflies are very diverse, but I still recognize one and don't think it's a bird. As long as the diversity is constrained to specific features, it can still be discriminated (and even if it's not, it technically still could be by just excluding everything else).

stevenhuang • 4 days ago

If differences exist then statistical methods will have a better chance at finding them than human intuition, yes. I'm not sure why this is baffling to you.
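A toy illustration of that point, as a sketch under invented assumptions: 400 "pixels" per image, each shifted by only 0.1 standard deviations between two groups, with independent noise. The constants and the `fake_image` and `accuracy` helpers are hypothetical, and the threshold uses the known shift, which a real model would have to learn.

```python
import random

N_PIXELS = 400   # hypothetical number of weakly informative pixels
SHIFT = 0.1      # assumed per-pixel mean difference, in standard deviations
N_IMAGES = 1000

rng = random.Random(0)  # fixed seed so the run is repeatable

def fake_image(group):
    """Return N_PIXELS noisy values; group 1 is shifted by SHIFT on every pixel."""
    return [rng.gauss(SHIFT * group, 1.0) for _ in range(N_PIXELS)]

images = []
for _ in range(N_IMAGES):
    group = rng.randrange(2)
    images.append((group, fake_image(group)))

def accuracy(score):
    """Fraction of images whose group is recovered by thresholding score at SHIFT / 2."""
    hits = sum((score(pixels) > SHIFT / 2) == bool(group) for group, pixels in images)
    return hits / len(images)

# Looking at one pixel is roughly what an eyeball check amounts to here.
single_pixel = accuracy(lambda px: px[0])
# Averaging all pixels pools many tiny differences into one usable signal.
all_pixels = accuracy(lambda px: sum(px) / len(px))

print("accuracy from one pixel:", single_pixel)
print("accuracy from all pixels:", all_pixels)
```

Any single pixel is close to a coin flip, while the average over all of them separates the groups far better, even though no individual difference is large enough to see.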