How much curation goes into the Curated Data Cut in PPMI?

Hi folks! I have a small question about the Curated Data Cut in PPMI: how much curation goes into this table? If I downloaded the source data tables and joined them together, would I create exactly the same dataset?

My reason for asking is that I learned from this post (https://rcop.michaeljfox.org/t/new-ppmi-cohort-designation-methodology/775) that the Consensus Committee has been retired. My understanding was that the Consensus Committee previously made adjustments to the dataset where appropriate, such as moving patients from one COHORT to another as needed, and I definitely saw differences, for example between some patients’ enrollment COHORT and the committee-agreed CONCOHORT. Since the committee no longer meets, does this mean that there is less curation of the raw data?
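
For anyone who wants to see those reassignments in their own download, here is a minimal pandas sketch. It assumes a table with `PATNO` (patient ID), `COHORT` and `CONCOHORT` columns; `PATNO` as the ID column is an assumption on my part, so check the names in your own files. The function name is just for illustration.

```python
import pandas as pd


def cohort_reassignments(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per patient whose enrollment COHORT differs from CONCOHORT.

    Assumed layout: PATNO (patient ID), COHORT, CONCOHORT; adjust to your files.
    """
    # Keep one row per patient; cohort labels are assumed constant across visits.
    per_patient = df[["PATNO", "COHORT", "CONCOHORT"]].drop_duplicates(subset="PATNO")
    # Patients without a consensus label cannot be compared, so drop them.
    labelled = per_patient.dropna(subset=["CONCOHORT"])
    return labelled[labelled["COHORT"] != labelled["CONCOHORT"]].reset_index(drop=True)
```

Running `pd.crosstab` on the same one-row-per-patient table gives the full COHORT-by-CONCOHORT count matrix if you want the overall picture rather than the individual patients.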

Ultimately, the Curated Data Cut is very convenient for holding lots of commonly needed attributes in one table, regardless of how much curation goes into it. But I was curious whether the curation now amounts largely to a convenient join, or whether there is still some further updating between the raw tables and the joined table.
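
One way to test the "would I create exactly the same dataset?" question empirically is to build the join yourself and diff it against the curated cut. Below is a rough pandas sketch, assuming the raw tables and the curated cut share visit-level keys `PATNO` and `EVENT_ID` (an assumption; confirm against your files) and that each compared column comes from exactly one raw table. A flagged cell points to a value that was derived or adjusted, or to keys that did not line up; it does not by itself say which.

```python
import pandas as pd

# Assumed visit-level keys; confirm these against your own download.
KEYS = ["PATNO", "EVENT_ID"]


def diff_against_curated(raw_tables, curated, columns):
    """Join raw tables on KEYS and report where they disagree with the curated cut.

    Assumes each name in `columns` appears in exactly one raw table and in the
    curated table. Returns (row presence counts, {column: mismatching rows}).
    """
    # Build the "simple join" version of the dataset.
    joined = raw_tables[0]
    for table in raw_tables[1:]:
        joined = joined.merge(table, on=KEYS, how="outer")

    # Line up the self-built join and the curated cut visit by visit.
    merged = joined[KEYS + columns].merge(
        curated[KEYS + columns],
        on=KEYS,
        how="outer",
        suffixes=("_raw", "_curated"),
        indicator=True,  # adds "_merge": both / left_only / right_only
    )
    presence = merged["_merge"].value_counts().to_dict()

    diffs = {}
    for col in columns:
        raw, cur = merged[f"{col}_raw"], merged[f"{col}_curated"]
        # Two missing values count as a match; any other difference is flagged.
        mismatch = ~((raw == cur) | (raw.isna() & cur.isna()))
        diffs[col] = merged.loc[mismatch, KEYS + [f"{col}_raw", f"{col}_curated"]]
    return presence, diffs
```

Nonzero `left_only` or `right_only` counts mean the two versions don't even contain the same visits, which is worth sorting out before comparing individual values.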

Thanks for your insight!

@vcatterson The curated data cut actually has supplemental derivation information in the “Data Dictionary” worksheet; it is quite a bit more than a simple join. The consensus cohort designations aren’t used anymore, given that there is now so much more activity going into the biological definitions and the NSD staging. It is a little bit of a “choose your own adventure” on cohorts, because there are so many different cohort splits that are driven by the research question (e.g., does genetic status matter)!

Ahh, this is so helpful, thank you! I see now that “cohort” could mean different things depending on whether you are interested in NSD staging, genetic predisposition, etc., and that it is therefore less meaningful to define a single “consensus cohort”. I will explore the new potential cohorts further and add these new labels to my analysis.

Also, I see now that in the Data Dictionary sheet of the Curated Data Cut, there is a column named “Derivation Notes”, which gives details of the transformations performed in the case of the more complex derived terms. Thanks, this is exactly what I was looking for!
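
If you only want the derived terms, a small pandas sketch like the one below filters the Data Dictionary sheet down to rows with a non-empty “Derivation Notes” entry. The sheet and column names come from this thread; the workbook file name in the commented-out line is a placeholder, and reading `.xlsx` files with pandas needs an Excel engine such as openpyxl installed.

```python
import pandas as pd


def derived_variables(dictionary: pd.DataFrame, notes_col: str = "Derivation Notes") -> pd.DataFrame:
    """Keep only Data Dictionary rows whose derivation notes are filled in."""
    # Treat missing and whitespace-only notes as "no derivation documented".
    notes = dictionary[notes_col].fillna("").astype(str).str.strip()
    return dictionary[notes != ""].reset_index(drop=True)


# Placeholder file name; the sheet name is the one mentioned in this thread.
# dictionary = pd.read_excel("curated_data_cut.xlsx", sheet_name="Data Dictionary")
# print(derived_variables(dictionary))
```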

Yes, and thank you for bringing this up, and thanks @lkirsch for answering. These points will all be helpful as we look at updates to the data wrangling guide.