Should we keep asking for self-reported ancestry in highly admixed populations like Latin Americans?

Last week at the third LARGE-PD meeting in Querétaro, México, hosted by @paularp and @anajimenahdz, we realized something important while reviewing the patient questionnaire:

  1. We are asking about race rather than ancestry.

  2. The questions are modeled on surveys for North American populations, not Latin Americans.

This encouraged me to do a quick PubMed search and think about how we collect this information. I realized that:

  1. “Definitions matter” (PMID: 36962630):
  • Race = a social construct, often reflecting inequities.

  • Ethnicity = cultural affiliation, usually self-reported.

  • Ancestry = genealogical or genetic descent.

  2. “How we ask matters”

This article (PMID: 29179688) shows that responses depend on how the question is asked:

  • The same individuals gave different responses depending on the method (checkbox vs. family history consults).

  • For some groups (African, European, East Asian), answers matched across methods.

  • For others (Mediterranean, Native American), family history gave more accurate results.

  3. “One size doesn’t fit all”

The same study also showed that in admixed populations, like Latin Americans (≈52% European, 24% Native American, 12% African), genetic ancestry correlates poorly with self-reported categories.

So, summarizing this quick PubMed search:

  • In less admixed populations → self-report ≈ genetics.

  • In admixed populations → self-report ≠ genetics, though still valuable for social context.

  • Genetics can complement, but definitions and methods matter.
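One way to check how closely self-report tracks genetics in a given cohort is to compare each participant's self-reported label with their estimated ancestry proportions. Below is a minimal sketch of that comparison. The records, field names (`self_report`, `genetic`), and proportions are all hypothetical toy values, not LARGE-PD data, and "largest component matches the label" is only one possible definition of concordance.

```python
from collections import defaultdict

# Hypothetical toy records: labels and proportions are made up for illustration.
participants = [
    {"self_report": "European",
     "genetic": {"European": 0.95, "Native American": 0.03, "African": 0.02}},
    {"self_report": "European",
     "genetic": {"European": 0.40, "Native American": 0.45, "African": 0.15}},
    {"self_report": "Native American",
     "genetic": {"European": 0.55, "Native American": 0.35, "African": 0.10}},
    {"self_report": "African",
     "genetic": {"European": 0.20, "Native American": 0.10, "African": 0.70}},
]


def summarize_concordance(records):
    """For each self-reported label, return the number of participants,
    the share whose largest genetic component matches the label, and the
    mean genetic proportion of the self-reported component."""
    matches = defaultdict(list)
    proportions = defaultdict(list)
    for rec in records:
        label = rec["self_report"]
        genetic = rec["genetic"]
        # Component with the highest estimated proportion for this person.
        top_component = max(genetic, key=genetic.get)
        matches[label].append(top_component == label)
        proportions[label].append(genetic.get(label, 0.0))
    return {
        label: {
            "n": len(matches[label]),
            "match_rate": sum(matches[label]) / len(matches[label]),
            "mean_proportion": sum(proportions[label]) / len(proportions[label]),
        }
        for label in matches
    }


summary = summarize_concordance(participants)
for label, stats in summary.items():
    print(label, stats)
```

A low `match_rate` or a low `mean_proportion` for a label would suggest that the category, as asked, is a weak stand-in for genetic ancestry in that cohort, while saying nothing about its value for social context.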

With this evidence in mind, I’d like to open the discussion: @peixott @waldoe @hirotaka

Should we continue asking for self-reported ancestry in admixed populations — and if so, how should we frame the question?


Hi Paula! Thanks for bringing up this topic, it’s very important to discuss. In terms of the LARGE-PD questionnaire, we ask for both ethnicity (Hispanic or Non-Hispanic) and self-reported ancestry (African, Amerindian, Asian, European, other, or mixed), but not race, specifically for the reasons you brought up in this post. Since race is a social construct, I agree that it should not be used as a proxy for genetic ancestry, and as Fig. 3 mentions, it can be detrimental to include in health research and can be co-opted in a harmful manner. In terms of incorporating self-reported ancestry into analyses, I think Gouveia et al., 2025 (PMID: 40480197) does a great job assessing the correlation between self-reported and genetic ancestry in the All of Us cohort (I find Fig. 7 particularly helpful). This quote seems to address some of the applicability of self-reported ethnicity within genetic studies:

“Therefore, we do not recommend using race and ethnicity as proxies for ancestry in genetic studies, including association models. Rather, we support the use of race and ethnicity as proxies of social, environmental, and historical factors that influence health outcomes only as an exception, rather than as a default practice, when more direct measurements of these factors are not available. Specifically, race/ethnicity may be helpful in the absence of relevant socio-environmental covariates and if empirical evidence supports its predictive value after accounting for all available covariates. Although race/ethnicity may serve as a proxy for environmental effects, directly adjusting for more specific environmental factors affecting the outcome is preferable when such data are available.”

The paper also notes that, specifically in Latin American participants, country of origin is associated with genetic ancestry (for example, Mexicans in the US tended to have high Native American ancestry, while Puerto Ricans tended to have higher African ancestry, both of which agree with the ancestries observed in LARGE-PD). Perhaps this indicates that for LARGE-PD, and potentially other admixed populations, country/site/geographic region would be a more useful proxy for genetic ancestry than self-reported ethnicity/ancestry. I still think there is a use for collecting self-reported ancestry information; as you said, it can be valuable for assessing social context.

You also mentioned family history consults, and I wanted to point out that the LARGE-PD questionnaire also includes maternal and paternal ancestry (although in the context of this question, ancestry refers to country of origin), so this could be a potential area for comparing correlation with genetic ancestry.
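A comparison like this could start by averaging estimated ancestry proportions within each reported country of origin. The sketch below assumes hypothetical field names (`maternal_country`, `paternal_country`, `genetic`) and uses placeholder countries with made-up proportions; it is not the LARGE-PD schema or data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; countries and proportions are illustrative only.
records = [
    {"maternal_country": "Country A", "paternal_country": "Country A",
     "genetic": {"European": 0.40, "Native American": 0.55, "African": 0.05}},
    {"maternal_country": "Country A", "paternal_country": "Country B",
     "genetic": {"European": 0.30, "Native American": 0.65, "African": 0.05}},
    {"maternal_country": "Country B", "paternal_country": "Country B",
     "genetic": {"European": 0.65, "Native American": 0.15, "African": 0.20}},
    {"maternal_country": "Country B", "paternal_country": "Country B",
     "genetic": {"European": 0.60, "Native American": 0.10, "African": 0.30}},
]


def mean_ancestry_by_group(rows, group_key):
    """Average each genetic ancestry component within each value of group_key."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for component, proportion in row["genetic"].items():
            grouped[row[group_key]][component].append(proportion)
    return {
        group: {component: mean(values) for component, values in components.items()}
        for group, components in grouped.items()
    }


by_maternal = mean_ancestry_by_group(records, "maternal_country")
by_paternal = mean_ancestry_by_group(records, "paternal_country")
for country, profile in by_maternal.items():
    print(country, profile)
```

If the per-country profiles differ clearly from one another, that would support using country of origin as a coarse proxy; if they overlap heavily, the parental questions add little beyond the genetic estimates themselves.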

This was a really long comment, so apologies for that, but I think this is a really important discussion to bring up and I’m happy to hear additional thoughts from you!

TLDR: I agree that race should not be included in genetic studies, ethnicity should be collected but not used as a proxy for genetic ancestry, and that the LARGE-PD questionnaire specifically could potentially benefit from increased specificity in the wording of the questions about ancestry and ethnicity (especially since the general ancestry question and the parental ancestry questions don’t use ancestry to refer to the same metric).


Thank you @psaffie and @waldoe for sharing your thoughts and experiences about this important topic. Wondering if others who have worked with ancestry and ethnicity in their research would like to weigh in? @peixott @danieltds @kathrynstep @paularp @felipe_duartez @schuh.afs @mario.cornejo.o

Hello,
I’d like to share my thoughts on this topic. In my view, the first step is to remove the word “race” because of how it’s interpreted by the general public. For example, when I say “football,” you probably imagine a sport with 11 players on each side whose main objective is to score more goals than the opponent (yes—the real football, not the “hand-prolate-spheroid”). But when you say “race,” people often imagine separate species, like sunfish and barracuda. Yet the human species is a single race. Using the term “race” to categorize anything outside of phylogeny, in my humble postdoc opinion (PI > PhD Student > Master Student > General Public > Postdoc), is inaccurate and risks fueling some of the worst ideologies in human history.

That said, the concept often labeled as “race” can still provide indirect insights into other factors that are frequently overlooked. Many genetic studies fail to incorporate key demographic variables such as income, education level, and lifestyle factors, and in such cases this variable is sometimes used as a rough proxy. However, even this approach is problematic, because the challenges faced by members of any group differ from one individual to another, which can lead to misleading or distorted conclusions.

One example is a paper that I worked on in 2015 (https://www.ahajournals.org/doi/full/10.1161/HYPERTENSIONAHA.115.06609). I will quote a few short pieces of it:

“Previous Brazilian studies, based on ethnoracial self-classification, reported greater prevalence of high blood pressure or hypertension among self-reported black adults and among blacks who reported having discrimination.”

What was the conclusion?
“Among those with African ancestry, 59.4% came from East and 40.6% from West Africa. Baseline systolic and diastolic blood pressure, controlled hypertension, and their respective trajectories, were not significantly (P>0.05) associated with level (in quintiles) of African genomic ancestry. Lower schooling level (<4 years versus higher) showed a significant and positive association with systolic blood pressure (Adjusted β=2.92; 95% confidence interval, 0.85–4.99). Lower monthly household income per capita (<USD 180.00 versus higher) showed an inverse association with hypertension control (β=−0.35; 95% confidence interval, −0.63 to −0.08, respectively)”.

In this work, you can see how using the term “African race” was harmful. It’s possible that “non-African” individuals with low income or lower education levels were overlooked because the association was misattributed to the wrong category.
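The misattribution described here can be illustrated with a small stratified comparison. This is a toy simulation with invented values, not a reproduction of the paper's analysis (which used regression models), and the group labels are placeholders. It shows how a crude difference between two groups can vanish once you compare people with the same schooling level.

```python
from statistics import mean

# Simulated toy values, not from any study.
# Each row: (group label, low schooling?, systolic blood pressure in mmHg)
rows = [
    ("group_1", True, 140), ("group_1", True, 142), ("group_1", True, 138),
    ("group_1", False, 130),
    ("group_2", True, 140),
    ("group_2", False, 128), ("group_2", False, 130), ("group_2", False, 132),
]


def mean_sbp(data, group, low_schooling=None):
    """Mean blood pressure for a group, optionally within one schooling stratum."""
    values = [
        sbp
        for g, low, sbp in data
        if g == group and (low_schooling is None or low == low_schooling)
    ]
    return mean(values)


# Crude comparison ignores schooling entirely.
crude_diff = mean_sbp(rows, "group_1") - mean_sbp(rows, "group_2")

# Stratified comparison looks at the group difference within each schooling level.
stratified_diffs = {
    low: mean_sbp(rows, "group_1", low) - mean_sbp(rows, "group_2", low)
    for low in (True, False)
}

print("crude difference:", crude_diff)
print("within-stratum differences:", stratified_diffs)
```

In this toy setup the crude difference comes only from the groups having different shares of low-schooling participants, which is why measuring schooling and income directly is more informative than relying on a group label.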

For this reason, I advocate for improving data collection so that we gather the actual demographic and environmental information rather than relying on a proxy that is noisy and imprecise. For example, when I had to fill out a form to enroll my daughter in school, “race” was a mandatory field. I ended up having to select both “white” and “black”, and believe me, my daughter is not a Dalmatian. I honestly don’t know what kind of study they plan to conduct with such data, but my daughter is statistical noise.

If you have genetic data (and a good team—I’m fortunate to have both), you can infer genetic ancestry. Ancestry is a much better term because it doesn’t divide humans into different races; rather, it recognizes a single human species with diverse geographic origins.

However, this leads to another challenge: how should ancestry-specific effects be reported? That’s a complex topic, and I will address it in another post. :ghost:


I haven’t worked specifically with this race/ethnicity/ancestry dilemma but I wanted to thank you @gginnan for tagging me as I learned a lot from reading all the comments here! This was a great discussion!

Hi guys,

Great dialogue so far! First, as alluded to, but I’ll be explicit: race and ethnicity are social constructs. They are regionally and culturally dependent, oftentimes driven by official policies (e.g., the US Census in the USA). I encourage everyone to continue collecting self-reported race/ethnicity/identity, as it captures invaluable cultural/environmental differences that are generally ignored in genetic studies. While we hold findings from GWAS to be independent genetic findings, they also at times reflect GxE relationships, with the E being wide-ranging environmental factors. For example, in studies of isolated populations, the genetic findings can reflect GxE relationships, since everyone in the isolated population might be exposed to specific non-genetic/exogenous factors (e.g., specific diet, microbial strains, UV levels). Similarly, race/ethnicity captures the culturally/socially connected exposures that many within an ethnoracial group may share.
