Accessing Data: LONI vs AMP PD

What experiences do others have with accessing data from LONI vs AMP PD? I have primarily done analysis with the PPMI and BioFIND cohorts, which have data available at both LONI and AMP PD. The AMP PD data harmonization team has put a lot of work into harmonizing data across multiple cohorts, which is incredibly useful, especially when comparing data across cohorts.

However, when looking at only one cohort, whether it’s PPMI or BioFIND, I often go directly to LONI and get the clinical data there, as I’ve found the occasional field missing from the harmonized data (though this may have been fixed since I last checked). In practice, I’ve found myself:

  • using AMP PD for raw transcriptomics data
  • using LONI for clinical data when analyzing one cohort only

I’m curious how others approach data access. What are your preferences?

  1. How easy/difficult do you find it to use the Terra system within AMP PD?
  2. Do you keep your data in the cloud when using AMP PD, or do you prefer to download it and work locally?
  3. Have you found any major differences between the data releases at LONI vs AMP PD?

This is interesting to see – this is also ignorance on my end, but @ehutchins do you happen to know what the process looks like for determining what makes it into the harmonized data versus what is excluded? Is there a process to advocate for adding/removing or is it pretty set?

I’d be curious also to hear if @hirotaka @danieltds @paularp @vdardov have thoughts/experiences with your questions!

Hi @jgottesman and @ehutchins ,
My workflow is to check AMP-PD first, and if the data is not available there, I go to LONI or PDBP.

  1. There was a learning curve but I feel comfortable with Terra now.
  2. I generally avoid downloading data from AMP-PD. I do analysis on Terra as quickly as possible and then kill the machine to save some money.
  3. Indeed, the harmonized data was not exactly the same as the data I processed from LONI. But I believe the AMP-PD team probably did a better job than I did :sunglasses:. On the other hand, my instinct as a data wrangler is to start from data that is as close to the original as possible and to control all of the processing myself. So if I am focusing on just one cohort, I may want to go to LONI. Additionally, PPMI’s curated data are useful to me because they populate many important variables and are ready to analyze.
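
For anyone who wants to check point 3 on their own tables, here is a minimal sketch of one way to list fields that exist in an original export but not in the harmonized one. It assumes both tables are available as CSV text and only compares header names. The column names, the `rename` mapping, and the `column_diff` helper are made up for illustration and do not reflect the real LONI or AMP PD schemas.

```python
import csv
import io


def column_diff(original_csv, harmonized_csv, rename=None):
    """Compare the header rows of two CSV texts.

    `rename` is an optional (hypothetical) mapping from original column
    names to their harmonized equivalents, so renamed fields are not
    reported as missing.
    Returns (missing_from_harmonized, only_in_harmonized) as sorted lists.
    """
    rename = rename or {}

    def header(text):
        # Read only the first row; an empty file yields an empty set.
        reader = csv.reader(io.StringIO(text))
        return {name.strip() for name in next(reader, [])}

    original_cols = {rename.get(c, c) for c in header(original_csv)}
    harmonized_cols = header(harmonized_csv)
    return (
        sorted(original_cols - harmonized_cols),
        sorted(harmonized_cols - original_cols),
    )


# Made-up example tables, not real cohort data.
original_export = "subject_id,visit,score_a,score_b\n1001,BL,12,28\n"
harmonized_export = "participant_id,visit,score_a\nX-1001,M0,12\n"

missing, extra = column_diff(
    original_export,
    harmonized_export,
    rename={"subject_id": "participant_id"},
)
print("Missing from harmonized:", missing)
print("Only in harmonized:", extra)
```

This only catches missing columns. Differences in values or visit coding would need a row-level comparison after the two tables are aligned on participant and visit.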

I currently do not have a lot of experience with the AMP-PD dataset, even though my research will use both the PPMI and PDBP datasets (which are covered in AMP-PD). There are two reasons for this:

1 - I think I need to learn the original datasets first in order to use the harmonized dataset, and there is some useful information in the individual datasets that will not be present in the harmonized version.
2 - From what I’ve heard, I need to do my analysis on the Terra platform in AMP-PD. That probably isn’t too difficult, but it is one more thing I have to learn. Can I download data from Terra and use it in Google Colab? From the previous answers in this post it seems to be possible, which I did not know.

In general, I know that I need to learn more about this powerful tool, which is AMP-PD. If someone is willing and knowledgeable about this tool, I would love to read more about it in an introductory post hehehe!


@hirotaka That’s a great way to distill it - I tend to think similarly as a data wrangler. But I also know that the harmonization team put in a great effort to get the data together.

@danieltds You can download data from Terra and then use it however you like. I’ve done this with some smaller data tables and things that I want to integrate more easily with what I already have set up locally (the egress fees are much smaller if the data is small).
