Hello everyone,
Before I begin, let me introduce myself. My name is Thiago Peixoto Leal, and I am the self-proclaimed leader of bioinformatics analysis at Mata Lab (Cleveland Clinic) and the best bioinformatician in my household (though my daughter might surpass me in a few years).
As I considered what to write for the blog, I decided to share my experiences with genetic analysis, specifically focusing on admixed populations. I plan to write several interconnected posts. Each post should be readable on its own, so you can read about phasing without having read the others. However, later posts will build on earlier ones, since, for example, good quality control (QC) is crucial for effective phasing. My plan is to share my experience with QC, phasing, ancestry analysis, and genotype-phenotype studies without the academic formality.
Before we dive in, let’s cover the basics of Population Genetics, specifically the Hardy-Weinberg Equilibrium (HWE). HWE is a principle stating that allele and genotype frequencies in a population will remain constant from one generation to the next in the absence of disturbing factors. The Hardy-Weinberg principle relies on several key assumptions: (i) random mating (there is no population structure, and matings occur in proportion to genotype frequencies), (ii) absence of natural selection, (iii) very large (infinite) population size, so that genetic drift is negligible, (iv) no gene flow or migration, (v) no mutation, and (vi) the locus is autosomal (HWE on chromosome X is calculated differently). When these assumptions are violated, frequencies change over generations, and that is evolution: a process that results in changes in the genetic material of a population over time [1] (yes, Pokémon lied to us).
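To make the principle concrete, here is a minimal sketch for a single autosomal locus with two alleles (A and a). The allele frequency 0.3 and the function names are illustrative choices of mine, not values from any dataset. It computes the expected genotype frequencies (p², 2pq, q²) and checks that one round of random mating leaves the allele frequency unchanged.

```python
def hwe_genotype_freqs(p):
    """Expected genotype frequencies (AA, Aa, aa) under HWE,
    where p is the frequency of allele A and q = 1 - p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_freq_from_genotypes(f_AA, f_Aa, f_aa):
    """Frequency of allele A in the gametes produced by these genotypes.
    AA individuals pass on A always, Aa individuals half of the time."""
    return f_AA + 0.5 * f_Aa


# Illustrative (hypothetical) allele frequency
p = 0.3
f_AA, f_Aa, f_aa = hwe_genotype_freqs(p)
p_next = allele_freq_from_genotypes(f_AA, f_Aa, f_aa)

print(f"AA={f_AA:.2f} Aa={f_Aa:.2f} aa={f_aa:.2f}")
print(f"p now={p:.2f}, p next generation={p_next:.2f}")
```

If any of the assumptions above is broken (for example, selection against one genotype), the genotype frequencies fed into the second function change, and so does the allele frequency in the next generation.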
Now, let’s talk about the main topic of this post: why is it so important to study non-European populations?
The human population has a long history of migrations, beginning with the first Homo sapiens in Africa. These early humans spread to the Middle East, Asia, Europe, and the Americas over thousands of years (Figure 1). More recently, there have been other significant migrations, such as the transatlantic slave trade, which forcibly brought millions of Africans to the Americas during one of humanity’s darkest periods (Figure 2). Additionally, ongoing migrations include people fleeing war and other tragedies.
Figure 1: The human migration. Figure from [2]
Figure 2: Overview of the slave trade out of Africa, 1500-1900. Figure from [3]
Throughout this time, the human genome has been shaped by various evolutionary forces. Natural selection has acted on populations due to factors such as diseases or environmental conditions, like the low oxygen levels at high altitudes in the Andes or Himalayas. Gene flow has resulted in the creation of admixed populations from different parental groups. Additionally, sexual selection can introduce strong biases in certain populations. Genetic drift, the most potent evolutionary force in human populations, also plays a significant role. Essentially, the Hardy-Weinberg Equilibrium (HWE) represents an idealized scenario, as no population fully meets its assumptions in reality.
Genetics provides a powerful tool for understanding the past of populations, especially for events that were not recorded or whose records have been lost. It can also help elucidate the pathophysiology of diseases, leading to better medications and treatments. Genetics is truly fascinating, and everyone should have access to their DNA information.
However, in the realm of genetic research, there is a notable imbalance: 94.4% of participants in Genome-Wide Association Studies (GWAS) are of European descent. While we have extensive knowledge about genetic susceptibility in European populations, these findings may not fully transfer to other populations. Research suggests that relying on them can be problematic, because genotype-phenotype associations do not always replicate across populations (Figure 3).
Figure 3: Factors Affecting Ability to Replicate Genotype-Phenotype Associations across Populations (Transferability). Figure from [4]
Figure 3 (B) illustrates two hypotheses: (i) Shared Causative Variants: All populations may share the same causative variant, but the tagging SNPs (tagSNPs) differ between populations. A tagSNP is a variant that represents a group of variants inherited together (called a haplotype). In this case, we can miss the causative variant in some populations if we assume that the same tagSNP works in all of them. (ii) Population-Specific Variants: Each population might have its own unique causative variant and tagSNP.
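One common way to quantify how well a tagSNP represents a causative variant is the r² measure of linkage disequilibrium. The sketch below is only an illustration of hypothesis (i): the two populations and their haplotype frequencies are made-up numbers, with A/a standing for the tagSNP alleles and B/b for the causative variant alleles.

```python
def r_squared(hap):
    """r^2 between two biallelic loci from haplotype frequencies.
    hap maps 'AB', 'Ab', 'aB', 'ab' to frequencies that sum to 1."""
    p_a = hap["AB"] + hap["Ab"]   # frequency of allele A (tagSNP)
    p_b = hap["AB"] + hap["aB"]   # frequency of allele B (causative variant)
    d = hap["AB"] - p_a * p_b     # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical population 1: A and B always travel together
pop1 = {"AB": 0.30, "Ab": 0.00, "aB": 0.00, "ab": 0.70}
# Hypothetical population 2: same allele frequencies, but the
# alleles have been shuffled onto different haplotypes
pop2 = {"AB": 0.15, "Ab": 0.15, "aB": 0.15, "ab": 0.55}

r2_pop1 = r_squared(pop1)
r2_pop2 = r_squared(pop2)
print(f"population 1: r2 = {r2_pop1:.2f}")
print(f"population 2: r2 = {r2_pop2:.2f}")
```

In this toy setup the tagSNP is a perfect proxy in the first population and a poor one in the second, even though the causative variant is the same and equally common in both.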
Indeed, both scenarios occur. For example, in an Admixture Mapping study with Brazilian populations, a variant (rs114066381) showed a strong female-specific effect on body mass index (BMI) (beta = 3.99 ± 0.84 kg/m² per A allele, 95% CI: 2.32 – 5.65) [5]. This variant is rare in Europeans but has a frequency of about 3% in West Africa. To put this into perspective, if you are a female with the AA genotype, this variant could add about 8 points to your BMI (2 × 3.99 kg/m²).
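The roughly 8-point figure above is simple arithmetic on the reported per-allele effect. The sketch below assumes a purely additive model (each copy of the A allele contributes the same amount), which is my reading of "per A allele"; scaling the confidence interval the same way is also an assumption made only for illustration.

```python
# Values reported in [5] for females, in kg/m^2 per A allele
beta_per_allele = 3.99
ci_low, ci_high = 2.32, 5.65

# Assumption: additive model, so an AA carrier has two copies of the effect
copies = 2
effect_aa = copies * beta_per_allele
effect_aa_low = copies * ci_low
effect_aa_high = copies * ci_high

print(f"Expected BMI shift for AA: {effect_aa:.2f} kg/m^2")
print(f"Scaled interval (illustrative): {effect_aa_low:.2f} to {effect_aa_high:.2f} kg/m^2")
```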
Another example is a recent study on Parkinson’s Disease titled “Identification of Genetic Risk Loci and Causal Insights Associated with Parkinson’s Disease in African and African-Admixed Populations: A Genome-Wide Association Study” [6]. This study identified an ancestry-specific risk locus associated with African ancestry, highlighting how genetic risks can vary across different populations.
To conclude, it is well established that drugs supported by human genetic evidence have a higher approval rate [7]. Consider how many potential new drugs, treatments, and insights into human history we might be missing out on by not including diverse populations in our studies.
Thank you for reading. In the next post, I will focus more on practical aspects rather than philosophical considerations.
P.S.: I prefer not to use the term “race” to describe ancestry or populations (white, African, etc.). According to the NIH Curriculum Supplement Series, the genetic variation between any two humans is about 0.1% [8], so there is not enough genetic difference to justify more than one “race” in Homo sapiens. In my view, using the term “race” can be harmful and may be misused by some to claim superiority.
P.S.2: This is my first post on a forum/blog, so I have edited it three times so far. If I edit it again, I will update this note.
[1] J. Lachance. Hardy–Weinberg Equilibrium and Random Mating. Encyclopedia of Evolutionary Biology, Academic Press, 2016, Pages 208-211, ISBN 9780128004265.
[2] Mendes M, Alvim I, Borda V, Tarazona-Santos E. The history behind the mosaic of the Americas. Curr Opin Genet Dev. 2020 Jun;62:72-77. doi: 10.1016/j.gde.2020.06.007. Epub 2020 Jul 10. PMID: 32659643.
[3] David Eltis, ‘A Brief Overview of the Trans-Atlantic Slave Trade,’ Slave Voyages: The Trans-Atlantic Slave Trade Database, Trans-Atlantic Slave Trade - Understanding the Database (accessed September 05, 2024).
[4] Sirugo G, Williams SM, Tishkoff SA. The Missing Diversity in Human Genetic Studies. Cell. 2019 Mar 21;177(1):26-31. doi: 10.1016/j.cell.2019.02.048. Erratum in: Cell. 2019 May 2;177(4):1080. doi: 10.1016/j.cell.2019.04.032. PMID: 30901543; PMCID: PMC7380073.
[5] Scliar MO, Sant’Anna HP, Santolalla ML, Leal TP et al. Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass and fat mass indexes. Int J Obes (Lond). 2021 May;45(5):1017-1029. doi: 10.1038/s41366-021-00761-1. Epub 2021 Feb 26. PMID: 33633342; PMCID: PMC9952852.
[6] Rizig M, Bandres-Ciga S, Makarious MB, Ojo OO et al. Identification of genetic risk loci and causal insights associated with Parkinson’s disease in African and African admixed populations: a genome-wide association study. Lancet Neurol. 2023 Nov;22(11):1015-1025. doi: 10.1016/S1474-4422(23)00283-1. Epub 2023 Aug 23. PMID: 37633302; PMCID: PMC10593199.
[7] Minikel EV, Painter JL, Dong CC, Nelson MR. Refining the impact of genetic evidence on clinical success. Nature. 2024 May;629(8012):624-629. doi: 10.1038/s41586-024-07316-0. Epub 2024 Apr 17. PMID: 38632401; PMCID: PMC11096124.
[8] National Institutes of Health (U.S.). NIH Curriculum Supplement Series. [Bethesda, Md.] :The Institutes, 2007.