Healthy people with genetic risks in PPMI?

tiago.azevedo · October 31, 2025, 12:45pm

Hello everyone,

I am analysing the different subgroups available in PPMI’s curated data, and I found 6 people in the healthy cohort who are assigned a genetic subgroup: 2 in the PRKN subgroup, 3 in the GBA subgroup, and 1 in the PINK1 subgroup. All of them have a primary diagnosis (PRIMDIAG) of “No PD nor other neurological disorder” (17) at baseline.

Looking at the description of the Prodromal cohort, it seems these genetic risk factors would put these people in the Prodromal cohort rather than the Healthy cohort. I understand the PPMI inclusion criteria changed in the past, but I thought everything was harmonised after those changes, so I don’t understand why these people were assigned to the Healthy cohort.

However, I’m actually wondering whether there was an error in the generation of the curated dataset. When I look up these people in the “Participant_Status” CSV, the corresponding `ENRL…` variables are all either zero or empty, which would mean no mutation was recorded and thus the subgroup definition in the curated dataset is wrong. Also, in the “Data Dictionary” tab of the curated dataset, the original variables used for the subgroup are listed as including `CON…`, which don’t seem to be available anywhere in LONI. This makes me believe the subgroup came from an older generation of the data that is no longer correct or consistent with the CSVs available in PPMI (so either I’m missing something or this needs to be corrected).
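For anyone who wants to reproduce this kind of cross-check, here is a minimal sketch in plain Python. It assumes both files have already been loaded as lists of dicts (for example with `csv.DictReader`). The field names `id`, `cohort`, `subgroup`, the cohort label `"Healthy Control"`, and the flag column names passed in are all hypothetical placeholders, not the real PPMI column names, so they need to be replaced with whatever the actual files use.

```python
def is_unset(value):
    """True when a flag is empty or zero, i.e. no mutation was recorded."""
    if value is None:
        return True
    text = str(value).strip()
    if text == "":
        return True
    try:
        return float(text) == 0.0
    except ValueError:
        return False


def find_unsupported_subgroups(curated_rows, status_rows, flag_columns,
                               healthy_label="Healthy Control"):
    """Return IDs of healthy participants that have a genetic subgroup in the
    curated data but no enrollment flag set in the status data.

    All field names used here ("id", "cohort", "subgroup") are hypothetical.
    A participant with no row in the status data is also reported.
    """
    status_by_id = {row["id"]: row for row in status_rows}
    flagged = []
    for row in curated_rows:
        if row["cohort"] != healthy_label or not row["subgroup"]:
            continue
        status = status_by_id.get(row["id"], {})
        if all(is_unset(status.get(col)) for col in flag_columns):
            flagged.append(row["id"])
    return flagged
```

Any ID returned by `find_unsupported_subgroups` is a participant whose curated subgroup cannot be explained by the enrollment flags alone, which is the situation described above.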

Would anyone be able to help me understand what might have happened here? Thanks!

vcatterson · October 31, 2025, 8:36pm

This is a great question! I don’t have an answer, unfortunately, but will be tracking this post as I’m really interested in the outcome.

tiago.azevedo · December 5, 2025, 2:52pm

Hi, just a quick update on this topic. I was looking at the new curated dataset from a few weeks ago (2025-11-12). I can see there was an update for “subgroup”: the “Original Variables” no longer include `CON…`, the original datasets used now include “iu_genetic_consensus”, and the Derivation Notes now have a clarification saying: “Genetic consensus variant data from the ‘iu_genetic_consensus’ file are used when available. If genetic consensus data is missing, use enrollment (ENRL) indicator.”
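The fallback rule quoted from the Derivation Notes can be sketched roughly as below. This is only an illustration of the stated rule, not the actual curation code: it assumes the fallback is applied per gene, and it represents each source as a hypothetical dict keyed by gene name (the real files use their own column names, which are not reproduced here).

```python
GENES = ("GBA", "PRKN", "PINK1")  # genes mentioned in this thread


def is_missing(value):
    """True when a value is absent or an empty string."""
    return value is None or str(value).strip() == ""


def resolve_flags(consensus_row, enrollment_row, genes=GENES):
    """Prefer the consensus value for each gene; fall back to the enrollment
    value when the consensus value is missing.

    Both rows are hypothetical dicts keyed by gene name. consensus_row may be
    None when the participant has no consensus record at all.
    Returns {gene: (value, source)}.
    """
    resolved = {}
    for gene in genes:
        value = (consensus_row or {}).get(gene)
        source = "consensus"
        if is_missing(value):
            value = enrollment_row.get(gene)
            source = "enrollment"
        resolved[gene] = (value, source)
    return resolved
```

Under this reading, a participant whose consensus record carries a variant gets a genetic subgroup even when every enrollment flag is zero or empty, because the enrollment value is only consulted when the consensus value is missing.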

After checking the “iu_genetic_consensus” CSV, I can now see where the genetic subgroups for these healthy people came from, so at least this part is clearer now (I could swear I didn’t see this file in LONI before, but its version date is around a week before I created this topic, so I guess I just missed it).

However, I believe some things are still not clear: how is it possible that this “genetic consensus” data is different from the “Participant_Status” CSV? And, in this case, shouldn’t this still mean that if these people have these genetic risks at baseline, they should have gone into the Prodromal cohort instead? I feel I’m missing some key methodological information.
