Navigating PDBP studies

As many of you know, the Parkinson’s Disease Biomarkers Program (PDBP) is an invaluable resource, providing a wealth of open data for our field. Personally, I’ve been accessing PDBP data mainly through the AMP-PD platform since its introduction. However, I recently had the opportunity to download data directly from the DMR for this post, and I was surprised to see how much the PDBP data has expanded in recent years. I’ve taken a closer look at the structure of PDBP study data and I’m excited to share these insights with you, hoping they will enhance and expedite your research endeavors.

Data Platform

Access to PDBP data is available through the Data Management Resource (DMR). Researchers can apply for data access and, once approved, log in via the button at the top right of the PDBP website.

Data Selection and Download

After logging in, use the “Query” module to select specific studies and forms. For comprehensive data, click “Add All” on the page’s right-hand side. The selected data will be moved to the “Data Cart”. From there, click “Download Data Cart to Queue”. The data isn’t immediately downloadable, but you’ll receive an email notification once it’s ready.

Study Structure

PDBP encompasses multiple studies, not just one. When I last downloaded the data, I found 31 studies in the data. Please note that some of the studies, although they are on DMR, do not have their study data publicly available.

Each study within PDBP has a distinct design. Finding comprehensive study descriptions online is challenging, but I’ve visited most descriptions in the DMR and compiled them in this table’s first tab. We can see that some studies focus on specific conditions like GBA-PD, LRRK2-PD, and non-PD conditions such as PSP and MSA. The second tab cross-tabulates the unique GUIDs per VisitType, indicating which studies are cross-sectional.
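For anyone who wants to reproduce that kind of cross-tabulation on their own download, here is a minimal sketch in plain Python. It runs on toy rows rather than real PDBP data, and the `study_id` column name and the study labels are placeholders I made up for illustration; only `GUID` and `VisitType` follow the terms used above.

```python
from collections import defaultdict

# Toy rows standing in for a DMR form export. "study_id" and the study
# labels are hypothetical; "GUID" and "VisitType" follow the post's terms.
rows = [
    {"study_id": "STUDY_A", "GUID": "G001", "VisitType": "Baseline"},
    {"study_id": "STUDY_A", "GUID": "G001", "VisitType": "12 months"},
    {"study_id": "STUDY_A", "GUID": "G002", "VisitType": "Baseline"},
    {"study_id": "STUDY_B", "GUID": "G003", "VisitType": "Baseline"},
    {"study_id": "STUDY_B", "GUID": "G003", "VisitType": "Baseline"},  # repeated form row
]


def unique_guids_per_visit(rows):
    """Count distinct GUIDs for each (study, visit type) pair."""
    seen = defaultdict(set)
    for r in rows:
        seen[(r["study_id"], r["VisitType"])].add(r["GUID"])
    return {key: len(guids) for key, guids in seen.items()}


def single_visit_studies(counts):
    """Studies with only one visit type, i.e. candidates for cross-sectional."""
    visits = defaultdict(set)
    for study, visit in counts:
        visits[study].add(visit)
    return sorted(s for s, v in visits.items() if len(v) == 1)


counts = unique_guids_per_visit(rows)
cross_sectional = single_visit_studies(counts)
```

A study that shows a single visit type in the download is only a candidate for being cross-sectional, so it is still worth checking the study description in the DMR.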

Overlap with AMP-PD

PDBP studies are also accessible on the AMP-PD platform. This is advantageous if you’re interested in a study featured in both databases, as it allows for seamless analysis integration with other AMP-PD studies. However, it’s important to note that AMP-PD currently does not provide PDBP study IDs. To address this, you might need to combine the study ID with the GUID in AMP-PD, especially if you’re concerned about varying study designs within PDBP. The spreadsheet’s third tab displays the GUID correlations between PDBP and AMP-PD studies.
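One way to attach study IDs to AMP-PD records is a simple GUID lookup built from the DMR download. This is only a sketch under assumptions: the field names (`study_id`, `GUID`, `visit_month`) and all values are invented for illustration, and I am assuming the GUID is available as a field on the AMP-PD side.

```python
from collections import defaultdict

# Hypothetical DMR rows: one (study_id, GUID) pair per enrollment.
dmr_rows = [
    {"study_id": "STUDY_A", "GUID": "G001"},
    {"study_id": "STUDY_B", "GUID": "G001"},  # same participant, second study
    {"study_id": "STUDY_A", "GUID": "G002"},
]

# Hypothetical AMP-PD rows; the field names here are assumptions.
amp_rows = [
    {"GUID": "G001", "visit_month": 0},
    {"GUID": "G002", "visit_month": 0},
    {"GUID": "G999", "visit_month": 0},  # not present in the DMR download
]


def studies_by_guid(dmr_rows):
    """Map each GUID to the set of PDBP study IDs it appears in."""
    lookup = defaultdict(set)
    for r in dmr_rows:
        lookup[r["GUID"]].add(r["study_id"])
    return lookup


def annotate(amp_rows, lookup):
    """Return copies of the AMP-PD rows with a sorted list of PDBP study IDs."""
    return [
        {**r, "pdbp_studies": sorted(lookup.get(r["GUID"], ()))}
        for r in amp_rows
    ]


annotated = annotate(amp_rows, studies_by_guid(dmr_rows))
```

Keeping the study IDs as a list rather than a single value matters because one GUID can map to more than one study.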

Multiple Enrollment

Many PDBP participants are enrolled in more than one study. The spreadsheet’s fourth tab provides a detailed account of these overlapping patterns and counts. When merging data across different sources, it’s safer to merge on both study ID and GUID to avoid unintentional duplications.
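To illustrate why the merge key matters, here is a small sketch with made-up tables. The column names (`study_id`, `score`, `sample`) and the helper `inner_join` are hypothetical and written just for this example; they are not part of any PDBP or AMP-PD tooling.

```python
from collections import Counter

# Two toy tables. Participant G001 is enrolled in two studies.
clinical = [
    {"study_id": "STUDY_A", "GUID": "G001", "score": 20},
    {"study_id": "STUDY_B", "GUID": "G001", "score": 24},
    {"study_id": "STUDY_A", "GUID": "G002", "score": 31},
]
samples = [
    {"study_id": "STUDY_A", "GUID": "G001", "sample": "S1"},
    {"study_id": "STUDY_B", "GUID": "G001", "sample": "S2"},
    {"study_id": "STUDY_A", "GUID": "G002", "sample": "S3"},
]


def inner_join(left, right, keys):
    """Naive inner join of two lists of dicts on the given key columns."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[k] for k in keys), []).append(r)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[k] for k in keys), [])
    ]


# Joining on GUID alone pairs every G001 row with every other G001 row.
by_guid = inner_join(clinical, samples, ["GUID"])
# Joining on study ID plus GUID keeps one row per enrollment.
by_both = inner_join(clinical, samples, ["study_id", "GUID"])

# GUIDs that appear in more than one study.
pairs = {(r["study_id"], r["GUID"]) for r in clinical}
per_guid = Counter(guid for _, guid in pairs)
multi_enrolled = sorted(g for g, n in per_guid.items() if n > 1)
```

In this toy case the GUID-only join returns five rows from three-row inputs, while the join on both keys returns three.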

Summary

The use of GUIDs for participant identification across various sub-studies in PDBP is effective. Researchers can integrate data from multiple studies, facilitating more comprehensive analyses. Additionally, the support and assistance from the PDBP team have been invaluable in navigating this resource. I am immensely grateful for the open science resources and the collaborative community that makes them available.

Have you had the chance to work with PDBP DMR data, or are you considering it? I’d love to hear about your experiences and insights.


Thanks a lot, Hirotaka, for this post! PDBP was a little difficult for me to understand when I first accessed it, and having this type of post will certainly help a lot of new people get into that database!


This is super helpful! I have accessed PDBP data before but it’s been a few years so I find this update incredibly useful. Also good to know about the genetic PD and some of the subtypes - this is another bit of information I try to find when looking at cohorts. BioFIND for example is idiopathic PD as far as I can tell, while PPMI has genetic PD and idiopathic PD - all good things to know when you want to validate across cohorts.

I especially appreciate the practical bits - linking up with AMP PD IDs, finding study information, etc. I think all of this would be very useful to include in the Data Wrangling Guide. @lmackenzie can we add this to the agenda for the next meeting? A peek at this post for some useful information?

@danieltds @psaffie These are some great examples of “quirks” - the study structure/not all study data being publicly available, AMP-PD not providing PDBP study IDs (which @hirotaka addressed by sharing how to match them up if need be), enrollment in multiple studies. All great things to know when you are accessing the data.


Agreed @ehutchins! Since @hirotaka has already done a great job on this post, I think it could be reformatted and added to the data wrangling guide. I’ll make sure to add it to the agenda.
