Start with two populations undergoing neutral drift but with no frequency differences on the alleles that influence the trait (i.e. no genetically causal population differences).
Generate a phenotype that differs slightly between the populations for entirely non-genetic reasons (i.e. a difference in the environmental means). Drift + environmental differences = population stratification.
Run a GWAS that doesn't control for population stratification. The GWAS will assign every drifted allele an effect that points in the direction of the environmental difference, and the resulting polygenic score will massively exaggerate population genetic differences that do not actually exist.
The GWAS will still learn the causal genetic effects too, so the score will be an accurate predictor within the populations. Everything looks like it's working correctly. No way to tell the population differences are an artifact.
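The four steps above can be sketched as a small simulation. This is a minimal illustration, not the analysis behind the thread: the sample sizes, SNP counts, `fst`, `h2`, and `env_shift` values are all assumptions chosen for speed, SNPs are treated as independent, and drift is approximated with a normal perturbation of allele frequencies. The polygenic score is evaluated in a held-out sample so that within-population accuracy is not just overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# All settings below are illustrative assumptions.
n_per_pop = 1000   # individuals per population in each sample
n_snps = 1000      # independent SNPs
n_causal = 100     # causal SNPs (same frequency in both populations)
fst = 0.01         # amount of drift at non-causal SNPs
h2 = 0.5           # within-population heritability
env_shift = 0.5    # purely environmental mean difference (phenotype SDs)

# Step 1: non-causal SNPs drift apart, causal SNPs keep identical frequencies.
p_anc = rng.uniform(0.1, 0.9, n_snps)
causal = np.zeros(n_snps, dtype=bool)
causal[:n_causal] = True

def drifted(p):
    """Normal approximation to drift; causal SNPs are left undrifted."""
    p_new = np.clip(p + rng.normal(0.0, np.sqrt(fst * p * (1 - p))), 0.01, 0.99)
    return np.where(causal, p, p_new)

freqs = [drifted(p_anc), drifted(p_anc)]

# True causal effects, scaled so the genetic variance is h2.
beta = np.zeros(n_snps)
beta[causal] = rng.normal(size=n_causal)
beta *= np.sqrt(h2 / np.sum(beta**2 * 2 * p_anc * (1 - p_anc)))

def sample():
    """Step 2: genotypes plus a phenotype with an environmental mean shift."""
    pop = np.repeat([0, 1], n_per_pop)
    geno = np.vstack(
        [rng.binomial(2, f, (n_per_pop, n_snps)) for f in freqs]
    ).astype(float)
    genetic = geno @ beta
    noise = rng.normal(0.0, np.sqrt(1 - h2), pop.size)
    return pop, geno, genetic, genetic + env_shift * pop + noise

def gwas(geno, pheno):
    """Step 3: marginal per-SNP regression with no stratification control."""
    gc = geno - geno.mean(axis=0)
    yc = pheno - pheno.mean()
    return gc.T @ yc / np.sum(gc**2, axis=0)

def gap(x, pop):
    """Mean of x in population 1 minus mean in population 0."""
    return x[pop == 1].mean() - x[pop == 0].mean()

pop_tr, geno_tr, _, pheno_tr = sample()
pop_te, geno_te, genetic_te, pheno_te = sample()

beta_hat = gwas(geno_tr, pheno_tr)
pgs = geno_te @ beta_hat  # polygenic score in the held-out sample

true_gap = gap(genetic_te, pop_te)  # real genetic difference (near zero)
pgs_gap = gap(pgs, pop_te)          # difference implied by the score
# Step 4: the score still predicts the phenotype within each population.
within_r = [
    np.corrcoef(pgs[pop_te == k], pheno_te[pop_te == k])[0, 1] for k in (0, 1)
]

print(f"true genetic gap: {true_gap:.3f}")
print(f"polygenic score gap: {pgs_gap:.3f}")
print(f"within-population r: {within_r[0]:.3f}, {within_r[1]:.3f}")
```

Under these assumed settings the true genetic gap should sit near zero, the score gap should be large and positive, and the within-population correlations should be clearly positive, which is the pattern the steps describe. Adding the population label as a covariate in `gwas` would be one simple way to see how much of the score gap disappears.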
Is population stratification a problem in real GWAS data? A recent family GWAS (Tan et al. https://medrxiv.org/content/10.1101/2024.10.01.24314703v1.full-text) estimated that 50% of the GWAS effects for Education and 65-75% for IQ and Income were not direct, with population stratification likely the major cause.
Comments
Lots of subtle data landmines out there making environments look like "straight biology". /x