Healthy people with genetic risks in PPMI?

Hello everyone,

I am analysing the different subgroups available in PPMI’s curated data, and I found 6 people in the Healthy cohort who are assigned a genetic subgroup: 2 PRKN, 3 GBA, and 1 PINK1. All of them have a primary diagnosis (PRIMDIAG) of “No PD nor other neurological disorder” (17) at baseline.

Looking at the description of the Prodromal cohort, it seems these genetic risk factors would put these people in the Prodromal cohort rather than the Healthy cohort. I understand the PPMI inclusion criteria changed in the past, but I thought everything was harmonised after those changes, so I don’t understand why these people were assigned to the Healthy cohort.

However, I’m wondering whether there was an error in the generation of the curated dataset. When I look these people up in the “Participant_Status” CSV, the corresponding `ENRL…` variables are all either zero or empty, which would mean no mutation was recorded and the subgroup assigned in the curated dataset is wrong. In addition, the “Data Dictionary” tab of the curated dataset lists `CON…` variables among the original variables used for the subgroup, and these don’t seem to be available anywhere in LONI. This makes me think the subgroup came from an older derivation that is no longer correct or consistent with the CSVs currently available in PPMI (so either I’m missing something or this needs to be corrected).
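For anyone who wants to reproduce the check, here is a minimal sketch of what I did, run on toy data rather than the real CSV. The ID column name `PATNO`, the placeholder columns `ENRL_A` and `ENRL_B`, and the function name are my own assumptions for illustration. The only thing taken from the actual file is that the relevant columns start with `ENRL`.

```python
import pandas as pd

def summarize_enrl_flags(status, ids, id_col="PATNO", prefix="ENRL"):
    """Return the ENRL-prefixed columns for the given participants,
    plus a column saying whether any of them is non-zero."""
    # Select every column whose name starts with the prefix (assumed "ENRL").
    enrl_cols = [c for c in status.columns if c.startswith(prefix)]
    subset = status.loc[status[id_col].isin(ids), [id_col] + enrl_cols].copy()
    # Treat empty cells as 0, so "zero or empty" both mean "no flag recorded".
    flags = subset[enrl_cols].apply(pd.to_numeric, errors="coerce").fillna(0)
    subset["any_enrl_flag"] = flags.ne(0).any(axis=1)
    return subset.reset_index(drop=True)

# Toy stand-in for the Participant_Status CSV (all values are made up).
status = pd.DataFrame({
    "PATNO": [1, 2, 3],
    "ENRL_A": [0, None, 1],
    "ENRL_B": [None, 0, 0],
    "OTHER": ["x", "y", "z"],
})

# Participants 1 and 2 play the role of the healthy controls in question.
result = summarize_enrl_flags(status, ids=[1, 2])
print(result)
```

With the real file, `status` would come from `pd.read_csv(...)` on the downloaded Participant_Status CSV, and `ids` would be the participants flagged in the curated dataset.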

Would anyone be able to help me understand what might have happened here? Thanks!


This is a great question! I don’t have an answer, unfortunately, but will be tracking this post as I’m really interested in the outcome.


Hi, just a quick update on this topic. I was looking at the new curated dataset from a few weeks ago (2025-11-12), and I can see the entry for “subgroup” was updated. The “Original Variables” no longer include `CON…`, the original datasets used now include “iu_genetic_consensus”, and the Derivation Notes now clarify: “Genetic consensus variant data from the “iu_genetic_consensus” file are used when available. If genetic consensus data is missing, use enrollment (ENRL) indicator.”
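To make sure I was reading that derivation note correctly, I sketched the fallback rule on toy data. This is only my interpretation, not the code PPMI uses. The column names `PATNO`, `consensus_flag` and `enrl_flag` are hypothetical placeholders, and I collapsed everything to a single 0/1 flag per participant for simplicity.

```python
import pandas as pd

def derive_flag(consensus, enrollment, id_col="PATNO",
                consensus_col="consensus_flag", enrl_col="enrl_flag"):
    """Prefer the consensus value when present, otherwise fall back to ENRL."""
    merged = enrollment[[id_col, enrl_col]].merge(
        consensus[[id_col, consensus_col]], on=id_col, how="left"
    )
    has_consensus = merged[consensus_col].notna()
    # Consensus wins when available; ENRL fills the gaps.
    merged["derived"] = merged[consensus_col].fillna(merged[enrl_col])
    merged["source"] = has_consensus.map({True: "consensus", False: "ENRL"})
    # Flag participants where the two sources contradict each other.
    merged["disagrees"] = has_consensus & merged[consensus_col].ne(merged[enrl_col])
    return merged

# Toy data: participant 1 has ENRL = 0 but a positive consensus call,
# participant 2 has no consensus record, participant 3 agrees in both.
enrollment = pd.DataFrame({"PATNO": [1, 2, 3], "enrl_flag": [0, 0, 1]})
consensus = pd.DataFrame({"PATNO": [1, 3], "consensus_flag": [1, 1]})

derived = derive_flag(consensus, enrollment)
print(derived)
```

Under this reading, participant 1 ends up with a genetic subgroup even though the enrollment indicator is zero, which matches the pattern I see for the 6 healthy controls.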

After checking the “iu_genetic_consensus” CSV, I can now see where the genetic subgroups for these healthy people came from, so at least this part is clearer (I could swear I didn’t see this file in LONI before, but its version date is about a week before I created this topic, so I guess I just missed it).

However, I believe some things are still unclear. How is it possible that this “genetic consensus” data differs from the “Participant_Status” CSV? And if these people did carry these genetic risks at baseline, shouldn’t they have gone into the Prodromal cohort instead? :thinking: I feel I’m missing some key methodological information.
