An Introduction to the PPMI Dataset

Hello everyone! Today, I’m sharing a fairly simple post for those who have little or no familiarity with one of the main open datasets on Parkinson’s disease: the Parkinson’s Progression Markers Initiative (PPMI). My goal is not to cover everything about the study, which is documented in detail on its own website, but simply to introduce it to those who are unfamiliar with it and to highlight its relevance.

The post is organized as follows:

1 - What is PPMI?
2 - What is PPMI’s Relevance?
3 - Which Types of Patients Participate in PPMI?
4 - How to Gain Access to PPMI?
5 - Useful Links and Documents Related to PPMI

1 - What is PPMI?

“The Study That Could Change Everything” is how the Michael J. Fox Foundation describes PPMI on its website. PPMI is a longitudinal cohort study that collects a wide range of clinical, genetic, biological, and neuroimaging data from patients with Parkinson’s disease, individuals with prodromal symptoms of the disease, and healthy individuals. It is a rich source of data and has become a benchmark in Parkinson’s disease research.

PPMI’s primary goals include comparing progression biomarkers across cohorts: individuals diagnosed with Parkinson’s disease (PD), carriers of PD-related genetic variants, individuals with prodromal symptoms, and healthy participants. It combines clinical, digital, imaging, biological, and genetic data and applies a variety of analysis techniques to understand changes in outcomes and their variability. The focus is on established measures of disease progression and on differences between subgroups, with the aim of better characterizing how Parkinson’s disease progresses.
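To make “comparing progression between cohorts” a little more concrete, here is a minimal sketch of one of the simplest such comparisons: the mean change in a score from baseline to a follow-up visit, grouped by cohort. It is not PPMI’s own analysis method. The record layout `(participant_id, cohort, visit_month, score)` and the function name are hypothetical; real PPMI tables use their own column names and visit codes, which are documented in the Data Dictionary.

```python
from collections import defaultdict
from statistics import mean

# Each record is a hypothetical (participant_id, cohort, visit_month, score) tuple.
# Real PPMI tables use their own column names and visit codes (see the Data Dictionary).
def mean_change_from_baseline(records, followup_month):
    """Mean change in a score from baseline (month 0) to followup_month, per cohort.

    Participants missing either the baseline or the follow-up visit are skipped.
    """
    baseline, followup, cohort_of = {}, {}, {}
    for pid, cohort, month, score in records:
        cohort_of[pid] = cohort
        if month == 0:
            baseline[pid] = score
        elif month == followup_month:
            followup[pid] = score

    # Only participants with both visits contribute a change value.
    changes = defaultdict(list)
    for pid in baseline.keys() & followup.keys():
        changes[cohort_of[pid]].append(followup[pid] - baseline[pid])
    return {cohort: mean(deltas) for cohort, deltas in changes.items()}


# Toy records for illustration only (not PPMI data).
toy_records = [
    (1, "PD", 0, 20), (1, "PD", 12, 26),
    (2, "PD", 0, 30), (2, "PD", 12, 32),
    (3, "Control", 0, 5), (3, "Control", 12, 5),
    (4, "Control", 0, 4),  # no 12-month visit, so excluded
]
print(mean_change_from_baseline(toy_records, 12))
```

A real analysis would also need to handle missing visits, medication state, and covariates; this only shows the basic shape of a between-cohort comparison.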

PPMI actually comprises several studies: PPMI Clinical (intensive, in-person longitudinal evaluations of multiple cohorts), PPMI Remote (remote study activities using smell tests, genotyping, and digital sensor technologies), and PPMI Online (online assessments based on patient-reported outcomes). In this post, I will focus on PPMI Clinical.

2 - What is PPMI’s Relevance?

PPMI is a pioneering study in Parkinson’s disease research, one of the first to make an abundance of longitudinal patient data freely available. The academic community has increasingly recognized its value, as shown by the growing number of publications that use its data each year. Below are some results from a PubMed search that illustrate this recognition.

Number of publications with PPMI data over the years on the PubMed platform (as of August 2023)

Percentage of Parkinson’s disease publications using PPMI data relative to total publications on the disease over the years on the PubMed platform (as of August 2023)

Number of publications with PPMI data over the years on the PubMed platform and projection for 2030
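The figures above rely on two simple calculations: the yearly share of PD publications that use PPMI data, and an extrapolation of the yearly counts to 2030. The post does not say how the 2030 projection was produced, so the sketch below assumes an ordinary least-squares straight-line trend; an exponential or other model would give different numbers. The function names are hypothetical, and the yearly counts themselves would come from PubMed searches (none are reproduced here).

```python
def share_percent(ppmi_counts, total_counts):
    """Percentage of PD publications per year that use PPMI data."""
    return {
        year: 100 * ppmi_counts[year] / total_counts[year]
        for year in ppmi_counts
        if total_counts.get(year)  # skip years with no (or zero) total count
    }


def linear_projection(counts, target_year):
    """Extrapolate yearly counts to target_year with an ordinary least-squares line.

    Requires counts for at least two distinct years.
    """
    years = sorted(counts)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (counts[y] - mean_y) for y in years)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


# Toy counts for illustration only (not real PubMed numbers).
print(linear_projection({2020: 10, 2021: 20, 2022: 30}, 2030))  # → 110.0
```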

Additionally, PPMI recently reached a significant milestone by helping establish one of the first highly accurate biomarkers for Parkinson’s disease (PD): the α-synuclein seed amplification assay (SAA). The α-synuclein SAA is a valuable tool for the biochemical diagnosis of PD. The results show that the assay classifies individuals with PD with high sensitivity and specificity. Moreover, it reveals molecular heterogeneity among participants and identifies individuals in the prodromal phase, before a formal diagnosis. For more information, see this article: Assessment of heterogeneity among participants in the Parkinson’s Progression Markers Initiative cohort using α-synuclein seed amplification: a cross-sectional study
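For readers less familiar with the terms, “sensitivity” and “specificity” have standard definitions: sensitivity = TP / (TP + FN), the share of people with the disease whom the test flags as positive, and specificity = TN / (TN + FP), the share of people without the disease whom the test flags as negative. The sketch below computes both from paired reference labels and test results; the function name and the toy labels are illustrative only and are not PPMI or SAA results.

```python
def sensitivity_specificity(has_pd, assay_positive):
    """Return (sensitivity, specificity) of a binary test against a reference diagnosis.

    has_pd[i] is True if participant i has PD by the reference diagnosis;
    assay_positive[i] is True if the test result for participant i is positive.
    """
    pairs = list(zip(has_pd, assay_positive))
    tp = sum(1 for truth, pos in pairs if truth and pos)          # true positives
    fn = sum(1 for truth, pos in pairs if truth and not pos)      # false negatives
    tn = sum(1 for truth, pos in pairs if not truth and not pos)  # true negatives
    fp = sum(1 for truth, pos in pairs if not truth and pos)      # false positives
    sensitivity = tp / (tp + fn)  # share of PD cases the test detects
    specificity = tn / (tn + fp)  # share of non-PD participants the test rules out
    return sensitivity, specificity


# Toy labels only: 2 of 3 cases detected, 1 of 2 non-cases ruled out.
print(sensitivity_specificity([True, True, True, False, False],
                              [True, True, False, False, True]))
```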

3 - Which Types of Patients Participate in PPMI?

PPMI currently consists of several distinct cohorts (for more information, see the PPMI website):

1 - Parkinson’s disease: patients diagnosed with Parkinson’s disease, either sporadic or associated with pathogenic variants (LRRK2, GBA, SNCA, and others).
2 - Prodromal: individuals at increased risk of developing Parkinson’s disease based on clinical data, genetic variants, or other biomarkers.
3 - Controls: participants with no neurological disorder, no first-degree relative with PD, and a normal dopamine transporter (DAT) SPECT scan.
4 - Legacy cohorts: two cohorts enrolled in the original PPMI will not continue into the current expansion phase of the study: the genetic registry and Scans Without Evidence of Dopaminergic Deficit (SWEDD).

4 - How to Gain Access to PPMI?

Gaining access to PPMI data is a straightforward process. Visit the main study site and click the “Access data” option at the top left. There you can apply for data access, which requires some personal information, a detailed description of your intended analyses, and agreement to the data use terms. The study committee will review your request and respond within a week. Once access is granted, simply log in to the LONI data platform to access the data!

5 - Useful Links and Documents Related to PPMI

Available on PPMI’s own site

The PPMI website contains a lot of useful information, including study documents such as the study protocol, schedule of activities, operations manual, biological data acquisition manual, pathology data acquisition manual, genetic data processing manual, and acquisition protocols for MRI, DTI, and SPECT.

The Ongoing Specimen Analysis page lists ongoing and completed projects and indicates whether their results are publicly available.

Available on PPMI’s online data access platform (LONI, login required)

After logging in, select “Download” at the top of the home screen and choose “Study Data”. Here you’ll find files to guide you in using PPMI data, including:

Overview/Quick Start Guide: describes how the study has expanded since its inception and the changes made to the original cohorts.

Consensus Committee Analytic Dataset: an essential dataset that must be used to determine the official cohort of each participant. I repeat: every participant’s cohort assignment must come from this dataset.

PPMI Analytic Dataset Guide: a document that explains why the Consensus Committee Analytic Dataset was created to define patient cohorts and how to use it.
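Since every cohort assignment should come from the Consensus Committee Analytic Dataset, a typical first step is to build a participant-to-cohort lookup from it and attach that label to rows from other tables. The sketch below does this with Python’s standard `csv` module. The column names `PATNO` and `COHORT`, the output column `CONSENSUS_COHORT`, and both function names are assumptions for illustration; check the PPMI Analytic Dataset Guide for the real file layout.

```python
import csv
import io


def load_cohort_assignments(csv_file, id_col="PATNO", cohort_col="COHORT"):
    """Map each participant ID to the cohort given in the consensus analytic dataset.

    The column names are assumptions; check the PPMI Analytic Dataset Guide.
    """
    return {row[id_col]: row[cohort_col] for row in csv.DictReader(csv_file)}


def attach_cohort(rows, assignments, id_col="PATNO", out_col="CONSENSUS_COHORT"):
    """Label rows from another PPMI table with the consensus cohort.

    Rows whose participant has no consensus assignment are dropped.
    """
    labeled = []
    for row in rows:
        cohort = assignments.get(row[id_col])
        if cohort is not None:
            labeled.append({**row, out_col: cohort})
    return labeled


# Toy CSV content for illustration; the real file and its columns may differ.
consensus_csv = io.StringIO("PATNO,COHORT\n1001,PD\n1002,Control\n")
assignments = load_cohort_assignments(consensus_csv)
other_rows = [{"PATNO": "1001", "SCORE": "12"}, {"PATNO": "9999", "SCORE": "3"}]
print(attach_cohort(other_rows, assignments))
# → [{'PATNO': '1001', 'SCORE': '12', 'CONSENSUS_COHORT': 'PD'}]
```

Dropping unassigned participants, rather than guessing a cohort from another table, keeps the analysis consistent with the rule that the consensus dataset is the only source of cohort definitions.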

PPMI Data User Guide: a document that introduces the data and serves as a reference for interpreting certain variables and conducting some analyses. I recommend reading it.

In the “Study Docs” tab, you’ll also find the Code List and Data Dictionary, which explain the meaning of each column and its coded values.
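Because many columns store numeric codes, a common task is replacing those codes with the labels from the Code List. Below is a minimal sketch using the standard `csv` module. The Code List header names assumed here (`ITM_NAME`, `CODE`, `DECODE`), the item `EXAMPLE_ITEM`, and the function names are hypothetical; check the actual file before relying on them. If the same column name appears in more than one table, the key would also need to include the table name.

```python
import csv
import io


def load_code_list(csv_file, item_col="ITM_NAME", code_col="CODE", label_col="DECODE"):
    """Build {(column name, code): label} from a code-list CSV.

    The column names used here are assumptions; check the actual Code List header.
    """
    return {
        (row[item_col], row[code_col]): row[label_col]
        for row in csv.DictReader(csv_file)
    }


def decode_row(row, code_map):
    """Replace coded values with their labels; values without a code entry are kept as-is."""
    return {col: code_map.get((col, value), value) for col, value in row.items()}


# Toy code list with a hypothetical item; not taken from the real PPMI Code List.
code_list_csv = io.StringIO("ITM_NAME,CODE,DECODE\nEXAMPLE_ITEM,0,No\nEXAMPLE_ITEM,1,Yes\n")
code_map = load_code_list(code_list_csv)
print(decode_row({"PATNO": "1001", "EXAMPLE_ITEM": "1"}, code_map))
# → {'PATNO': '1001', 'EXAMPLE_ITEM': 'Yes'}
```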

Conclusion
Well, that concludes this brief introduction to this important dataset! I hope you find the information presented here useful! If you have any suggestions or corrections for this post, please send me a message or reply here! Thanks for your attention!


Wow, what a thoughtful and thorough post! This sort of documentation will, I’m sure, be very useful for anyone coming to PPMI for the first time.

In thinking about future users, I’ve added some tags to this Topic and many others on Discourse so that folks coming to the DCoP can find information more easily. @DCoP_Innovators, it would be great if you could use tags on your posts and let us know on this Topic whether you need a new tag.

Thank you for the post, Daniel!


This is great! Thanks for all this info. Through LONI, are there any analysis resources or is LONI just meant for accessing/downloading data?

I’m glad you liked it! I think LONI shows some basic cohort information but doesn’t support any kind of analysis, though I may be wrong.


@vdardov and @danieltds, I think I can shed a little light on your question. LONI is a repository but not a research gateway - there are no embedded analysis tools or data explorers. However, in the PPMI Data User Guide that Daniel mentioned, there is quite a lot of sample code and technical support to give users a head start with their own analyses.
