What are the intended/best ways to select MR images for download, matched to subject data?

I can’t really understand what’s the best way (or the intended way) to search and download MR images, matching them to subject information.

An example: I want MRs that are T1, isotropic 3D, and I want each image (each timepoint) to be matched to my DB with demographics and neuropsychological scores.

I’d expect something like: I get a PATNO and an EVENT_ID, and from those a unique image ID.

I can’t find a way to do this that doesn’t feel like hacking. What I do now is: go to Advanced Image Search on the website, filter for “T1”, hope the data have been flagged correctly, “select all”, and download a CSV of all image IDs. That CSV is keyed on “Subject” and “Visit” instead of “PATNO” and “EVENT_ID”, as one would expect from the dictionary, so I do some pandas magic in Python and finally download the images.

Is this “select all” and “let’s hope that T1/T2 flags have been flagged” hack the intended way? Isn’t there any CSV that lists all the images acquired for a subject with their properties?

Hi @Luca, and thanks for the question. Which dataset(s) are you downloading T1 images from? Are you referring to PPMI?

Tagging community members experienced with MR images and/or PPMI data to see if they have insight into this.
@vcatterson @whiter @AmgadDroby @tiago.azevedo @awiederhold @rickhelmich @SidKarthik @braah @Nesrine20262003 @dr.mltm.muller @apinstein @namburin @Sneha_Kugunavar @nisha @alberto.imarisio @juanbot @blin @bmarebwa @dkflin

Yes, I am indeed referring to PPMI data!

@Luca, I’d suggest taking a look at the PPMI Data User Guide if you haven’t already: https://www.ppmi-info.org/sites/default/files/docs/PPMI%20Data%20User%20Guide.pdf.

The imaging data section also points to this guide, which may be of use? [link].

Thank you @jgottesman, I was aware of the guide but I really don’t understand how to translate it into practice. The guide mentions a metadata file available… The best I can find is two files, neither of which contains the Image ID that I can enter in the IDA search for download! One of the two does contain a “REC_ID” field, but its format does not even match that of the Image IDs under which images are recorded!

The 2nd link you provide suggests downloading all image IDs from the IDA search and then doing some ad-hoc matching… Exactly what I’m doing! Like… isn’t there any documented way to do this? I’m managing to (most likely) get what I want but it just feels weird that there’s no documented way to do this in a reproducible way!

Hi @Luca, a few thoughts!

  1. Would suggest taking a look at the webinar/tool that @glatard developed: Webinar - Conducting Novel and Replicative Analyses of PPMI Imaging Data with the LivingPark Tool
  2. I spoke with a colleague on our imaging portfolio, who suggested referring to the curated data cut rather than attempting to pull down all the files. This will give you the latest data, which you can then stratify to meet the needs of your analysis. It only includes DATSCAN collection dates, but these should be a good proxy for other imaging data collected.

Hope this helps, let us know if you still have questions! :slight_smile:

I’ve watched the whole webinar and… I’m still confused. What they’re doing is the same “hack” I’m doing myself. But this can’t be the intended way to match subjects to their images! BTW, if I filter by “T2” in LONI I get both T2 and FLAIR sequences! And, from what I understand from the LivingPark tutorials, if I want to download T1s I also need to select the “PD” (proton density, not Parkinson’s) flag and manually guess from the sequence name.

Concerning the curated data cuts… They don’t include image IDs, so I still wouldn’t know how to match individual images to their diagnoses and demographics (other than by some manual scripting and string-matching).

Hi @Luca

The CSV exported from the Advanced Image Search is the authoritative file linking images to subjects and visits. The column naming is just inconsistent with the clinical data dictionary: Subject maps to PATNO, and Visit uses human-readable labels (“Baseline”, “Month 12”) instead of the EVENT_ID codes (BL, V04) used in the clinical CSVs. PPMI publishes the full mapping in the Code List file included with the Study Data download, and once you build that lookup table in pandas it’s straightforward to merge on PATNO + EVENT_ID.
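A minimal sketch of that merge, in case it helps. Everything here is a stand-in: the two small tables are made-up data, the "Image Data ID" column name and the AGE column are my assumptions about what your files look like, and the lookup only contains the two Visit/EVENT_ID pairs mentioned above. The full lookup should come from the Code List, not from this snippet.

```python
import pandas as pd

# Stand-in for the CSV exported from Advanced Image Search.
# Column names and values are assumed for illustration; check your own export.
images = pd.DataFrame({
    "Image Data ID": ["I0001", "I0002", "I0003"],
    "Subject": [3001, 3001, 3002],
    "Visit": ["Baseline", "Month 12", "Baseline"],
    "Description": ["MPRAGE", "MPRAGE", "SAG 3D FSPGR"],
})

# Stand-in for a clinical table keyed on PATNO + EVENT_ID (hypothetical values).
clinical = pd.DataFrame({
    "PATNO": [3001, 3001, 3002],
    "EVENT_ID": ["BL", "V04", "BL"],
    "AGE": [61.2, 62.2, 70.5],
})

# Lookup from Visit label to EVENT_ID. Only the two pairs named in this
# thread are listed; build the complete table from the Code List.
VISIT_TO_EVENT_ID = {"Baseline": "BL", "Month 12": "V04"}

images = images.rename(columns={"Subject": "PATNO"})
images["EVENT_ID"] = images["Visit"].map(VISIT_TO_EVENT_ID)

# Visit labels missing from the lookup become NaN; inspect them rather than
# letting them drop out of the analysis silently.
unmapped = images[images["EVENT_ID"].isna()]

merged = images.merge(
    clinical,
    on=["PATNO", "EVENT_ID"],
    how="left",              # keep every image, matched or not
    validate="many_to_one",  # several images may share one clinical row
    indicator=True,          # "_merge" column shows which images found a match
)
print(merged[["Image Data ID", "PATNO", "EVENT_ID", "AGE", "_merge"]])
```

With real files, make sure PATNO has the same dtype on both sides before merging (for example, both read as strings), otherwise the merge will either raise or silently match nothing.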
The MR modality flags are populated from DICOM headers and are inconsistently applied across sites, so it’s more robust to filter directly on the Description field using known sequence name patterns (e.g. MPRAGE, SPGR, MP-RAGE, IR-FSPGR). You can also use the “3D” format filter in Advanced Image Search to reduce noise before you even export.
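As a rough sketch of the Description filter: the regular expression below covers only the sequence names listed above, the sample Description strings are made up, and `looks_like_t1` is just a helper name I chose. Treat it as a heuristic to extend for your sites, not an official rule.

```python
import re

# Covers MPRAGE / MP-RAGE and SPGR / IR-FSPGR, case-insensitively.
# Extend this pattern with whatever sequence names appear in your export.
T1_PATTERN = re.compile(r"mp-?rage|spgr", re.IGNORECASE)

def looks_like_t1(description: str) -> bool:
    """Return True if a Description string matches a known T1 sequence name."""
    return bool(T1_PATTERN.search(description))

# Made-up Description values for illustration.
descriptions = [
    "MPRAGE GRAPPA",
    "SAG 3D IR-FSPGR",
    "Ax T2 FLAIR",
    "sag mp-rage",
    "AX PD + T2",
]
t1_only = [d for d in descriptions if looks_like_t1(d)]
print(t1_only)  # ['MPRAGE GRAPPA', 'SAG 3D IR-FSPGR', 'sag mp-rage']
```

On the exported table the same helper can be applied with something like `images[images["Description"].fillna("").map(looks_like_t1)]`. It is worth eyeballing the rows that were rejected at least once, since any sequence name not in the pattern is dropped without warning.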

I hope this helps.