Understanding the DaTScan data in the PPMI Curated Data Cut

Hi all! I have a question about the difference in DaTScan data in the raw PPMI data tables versus the Curated Data Cut, and I hope someone can explain or point me to the relevant documentation.

The Curated Data Cut has fields with names such as MIA_CAUDATE_L, which is described as “Left caudate, MIAKAT pipeline with cerebral white matter reference region”, and my understanding is that MIAKAT is a toolbox in MATLAB for processing imaging files.

On LONI there is a data section called Imaging→DaTSCAN which contains 6 files, most of which are documentation or metadata. The only file that appears to contain raw data is DaTScan_SBR_Analysis, which contains fields with names such as DATSCAN_CAUDATE_L.

For a given patient-visit, the raw numbers in MIA_CAUDATE_L look very different from the raw numbers in DATSCAN_CAUDATE_L. This makes sense if there is a pipeline applied to derive the MIA_ version from the DATSCAN_ version, but I cannot find the documentation that explains this transformation step.
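A minimal pandas sketch of this side-by-side comparison, assuming both tables are loaded as DataFrames that share `PATNO` and `EVENT_ID` columns. The key columns, the helper name `compare_pipelines`, and all numbers below are illustrative assumptions, not real PPMI values:

```python
import pandas as pd

def compare_pipelines(curated: pd.DataFrame, sbr: pd.DataFrame) -> pd.DataFrame:
    """Pair MIA_ and DATSCAN_ left-caudate values for patient-visits found in both tables."""
    keys = ["PATNO", "EVENT_ID"]  # assumed join keys
    merged = curated[keys + ["MIA_CAUDATE_L"]].merge(
        sbr[keys + ["DATSCAN_CAUDATE_L"]], on=keys, how="inner"
    )
    # A ratio that varies from row to row suggests no single scaling links the two fields.
    merged["ratio"] = merged["MIA_CAUDATE_L"] / merged["DATSCAN_CAUDATE_L"]
    return merged

# Synthetic, made-up rows purely for illustration.
curated = pd.DataFrame({
    "PATNO": [1, 1, 2],
    "EVENT_ID": ["BL", "V04", "BL"],
    "MIA_CAUDATE_L": [3.0, 2.4, 4.0],
})
sbr = pd.DataFrame({
    "PATNO": [1, 1, 2],
    "EVENT_ID": ["SC", "V04", "BL"],
    "DATSCAN_CAUDATE_L": [2.0, 1.2, 2.0],
})

paired = compare_pipelines(curated, sbr)
print(paired)
```

The inner join silently drops any patient-visit whose `EVENT_ID` label differs between the two files, so the row count of `paired` is worth checking against both inputs.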

The Curated Data Cut Data Dictionary says that the MIA_ location values are NOT derived data, which implies to me that they are in one of the PPMI tables. The source is given as “XING_Core_Lab_-_Quant_SBR”, but when I search for this name or parts of the name on LONI, I’m drawing a blank.

Can anyone point me to the source of the MIA_ data, or some documentation to explain the MIAKAT pipeline? Thanks!

2 Likes

I don’t have an answer - posting because I too would like to know. Thanks!

1 Like

I’d like to report that I have exactly the same problem. I hope someone is able to answer, thanks!

2 Likes

Hi all:

To the best of my understanding, the difference comes from two separate processing pipelines. You won’t find the MIAKAT documentation in LONI’s DaTSCAN section.

Both sets come directly from the core labs, but they aren’t transformable into each other. You just need to pick one pipeline and use it consistently. Lately, PPMI has been reanalyzing all DAT scans with WM as the reference region (instead of occipital), so choosing the MIA_ values might be recommended.

Hope this helps!

4 Likes

Thank you Amgad! This is very helpful information, and I think I understand better now why there are two separate sets of values.

Since I am performing a reanalysis, I will change to using the MIA_* values, to help future-proof my new results by referencing to WM.

Thanks again for your perspective!

2 Likes

Thank you so much Amgad, but if you don’t mind I’d like to come back with two follow-up questions:

  1. Could you please clarify why you recommend using the MIA_ values instead of the DATSCAN_ ones? Apologies, as my background is computational and I’m still learning the ins and outs of this type of scan, but I was wondering whether there’s any material online that explains why MIAKAT is being used by PPMI to reanalyse the DAT scans (and why the reference shifted from the occipital cortex to the cerebral white matter)?
  2. I’ve noticed some unexpected inconsistencies between the files, and I don’t see any documentation on how they might arise. Do you happen to know?
    1. There are a lot of people identified with EVENT_ID="SC" in the DaTScan_SBR_Analysis file, and this “screening” time is not identified anywhere in the curated file. Also, sometimes the curated file has a visit_date for the baseline that is earlier than the screening date in DaTScan_SBR_Analysis, and I thought this would be the other way around?
    2. Beyond the “screening” time not being present in the curated file, there are also some “unscheduled” times not present in there. Just to show a specific example, patient 3001 has EVENT_IDs U01, U02, V04, V06, and V10 in the DaTScan_SBR_Analysis file, whereas in the curated file this patient has MIA_ values for BL, V04, V06, V08, and V10. For this patient ID, the DATSCAN_DATE values of U01 and U02 do not match any of the visit_date values. This is a bit confusing because it’s not clear whether some DaTscan values are simply being ignored in the curated dataset and, if so, why PPMI decided to do that. Also, I’m wondering whether PPMI assigned “BL” in the curated data to dates that were “SC” in DaTScan_SBR_Analysis without reporting that it did so?
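The mismatch described for patient 3001 can be checked mechanically. This is a rough sketch, assuming both files are loaded as pandas DataFrames with `PATNO` and `EVENT_ID` columns; the helper name `visit_coverage` is hypothetical, and the frames below contain only the event labels quoted above, no measurement values:

```python
import pandas as pd

def visit_coverage(sbr: pd.DataFrame, curated: pd.DataFrame, patno: int) -> dict:
    """List which EVENT_IDs for one patient appear in only one file or in both."""
    sbr_events = set(sbr.loc[sbr["PATNO"] == patno, "EVENT_ID"])
    cur_events = set(curated.loc[curated["PATNO"] == patno, "EVENT_ID"])
    return {
        "only_in_sbr": sorted(sbr_events - cur_events),
        "only_in_curated": sorted(cur_events - sbr_events),
        "in_both": sorted(sbr_events & cur_events),
    }

# Event labels for patient 3001 as listed in the post above.
sbr = pd.DataFrame({"PATNO": [3001] * 5,
                    "EVENT_ID": ["U01", "U02", "V04", "V06", "V10"]})
curated = pd.DataFrame({"PATNO": [3001] * 5,
                        "EVENT_ID": ["BL", "V04", "V06", "V08", "V10"]})

coverage = visit_coverage(sbr, curated, 3001)
print(coverage)
```

Running the same helper over every `PATNO` would show whether the gaps are limited to SC and Uxx labels or are more widespread.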

I hope my points make sense. I’d appreciate any help in understanding what might be going on here :)

1 Like

Hi Tiago! I believe I can provide some answers, at least partially.

  1. The SC screening visit and the BL baseline visit have slightly different protocols for what data is collected at each. The intention is that the data collected at SC determines whether someone is a good fit for PPMI, and if they join the study, then the BL data is also collected. As such, SC is notionally before BL in time. The SC visit is specified to be no more than 3 months before BL, but in many cases SC and BL were performed at the same visit, i.e. the dates for SC and BL are identical for a given patient. I believe this simply reflects what was most convenient for different centres and for different patients (for example, if a patient has to travel a long distance, then performing SC and BL together is more convenient).

1a. When it comes to the Curated Data Cut, there are no SC timepoints recorded. I believe all the data from SC and BL has been merged into a single label of BL simply for convenience. Since the BL data is meant to be additive to what was originally collected at SC, there is relatively little overlap between SC and BL data, and since they occur very close together in time (max 3 months difference), they can be considered the same timepoint.

  2. The Curated Data also contains no unscheduled visits, which is why you’re not seeing the Uxx event IDs for the given patient. The goal with the Curated Data is to regularize the data to simplify analysis, and we don’t know a priori where an unscheduled visit fits within the timeline.
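To line the raw SBR file up with the Curated Data Cut before merging, the two conventions described above can be imitated in pandas. This is only a sketch of my understanding, not a documented PPMI procedure: it assumes unscheduled visits are exactly those whose `EVENT_ID` starts with "U", and the helper name `harmonise_events` is hypothetical:

```python
import pandas as pd

def harmonise_events(sbr: pd.DataFrame) -> pd.DataFrame:
    """Drop unscheduled visits and relabel SC as BL, imitating the curated labels."""
    # Assumption: unscheduled visits are the rows whose EVENT_ID starts with "U".
    out = sbr[~sbr["EVENT_ID"].str.startswith("U")].copy()
    # Assumption: SC and BL are treated as a single baseline timepoint.
    out["EVENT_ID"] = out["EVENT_ID"].replace({"SC": "BL"})
    return out.reset_index(drop=True)

# Made-up rows for a hypothetical patient.
sbr = pd.DataFrame({
    "PATNO": [1, 1, 1, 1],
    "EVENT_ID": ["SC", "U01", "V04", "V06"],
})

harmonised = harmonise_events(sbr)
print(harmonised)
```

If a patient has both an SC and a BL row in the raw file, this relabelling produces two BL rows, so duplicates on (`PATNO`, `EVENT_ID`) should be checked before joining.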

As for the rationale behind the reanalysis of the DAT scan data, I would also love to hear any more information Amgad or others could provide!

Thank you so much Victoria!

  1. That’s what I had understood from PPMI, which was confusing because I’ve seen some SC dates after BL dates, not at the same time. See for example patient 3006: their visit_date for BL in the curated dataset is 05/2011, whereas the DATSCAN_DATE for SC is 08/2011, 2 months after. Other patients I have checked also have this behaviour of SC in the datscan file being 2/3 months after the BL dates in the curated dataset.

1a. I see, thanks. Indeed, the Data Dictionary tab of the curated dataset says “(Note: Collection dates of DATSCAN, lumbar puncture, or other imaging/lab data may be different.)” for the visit_date field.

  2. Thanks for the clarification on the Uxx visits not being included in the Curated Data, it does make sense. I guess my confusion is that in my mind these scans are not exactly “cheap” (both in terms of money and time), so if PPMI decided to reanalyse the DaT scans, why was the reanalysis not also run on the unscheduled visits? Even more so as the values are marked as non-derived data in the Curated Data, yet there’s no other table on LONI for these reanalysed DaT scans alone.
1 Like

@tiago.azevedo @vcatterson MIAKAT is designed to reduce variability and improve reproducibility by applying uniform preprocessing, registration, and region definition.

As for the reference region, the shift from occipital cortex to cerebral white matter was made because white matter ROIs can provide a more stable, less noisy reference region for DaT quantification, especially when comparing across multi-site data. Several studies have shown that occipital brain regions do in fact show some level of DaT uptake, and this can vary with acquisition or pathology. Hence, white matter offers a more consistent non-specific binding signal, which can increase SNR and possibly make the measure more sensitive for quantifying subtle striatal DaT changes, especially in the early stages.
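To make the role of the reference region concrete, here is a small sketch using the commonly cited definition of the specific binding ratio, SBR = (target − reference) / reference. Whether each PPMI pipeline computes its values in exactly this way is an assumption on my part, and the counts below are invented:

```python
def specific_binding_ratio(target_counts: float, reference_counts: float) -> float:
    """Commonly cited SBR definition: (target - reference) / reference."""
    return (target_counts - reference_counts) / reference_counts

# Invented mean counts for two hypothetical scans with the same caudate and
# occipital signal but different white matter signal.
caudate, occipital = 30.0, 10.0
white_matter_scan_a, white_matter_scan_b = 8.0, 12.0

sbr_occipital = specific_binding_ratio(caudate, occipital)        # same for both scans
sbr_wm_a = specific_binding_ratio(caudate, white_matter_scan_a)
sbr_wm_b = specific_binding_ratio(caudate, white_matter_scan_b)

print(sbr_occipital, sbr_wm_a, sbr_wm_b)
```

Under this definition the two scans share one occipital-referenced value but get different white-matter-referenced values, which is one way to see why a single set of values cannot simply be rescaled into the other. Differences in registration and region definition between pipelines would add further variation on top of this.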

2 Likes

@AmgadDroby thank you so much for the detailed explanation! Hopefully we’ll have those MIAKAT scans also for the datscan unscheduled visits in the future.

Any idea where I could find more information about the potential reasons for Screening dates being after Baseline dates in the DatScan files? Thanks!

2 Likes

Hi Tiago! This is indeed quite odd! I’m looking at the Age_at_visit file, which shows PATNO 3006 was 57.4 years old at SC and 57.5 at BL (suggesting that SC was before BL, as expected). Note that Age_at_visit also shows that 3006 attended SC, BL, V01, V02, V03, and V04 before withdrawing from PPMI, and the Curated Cut shows only two visits: BL and V04. This is because the Curated Cut only shows a subset of the full visits (see tab 2 of the Curated Cut’s description of EVENT_ID for the list).

I wonder if the DaT scan was scheduled at the time of the SC visit, but actually completed later (due to hospital department schedules or the non-urgent nature of the scan)? This would explain why the SC date occurs before BL (according to Age), but the actual scan date comes after BL. This is speculation, though: I wonder if @lkirsch or someone else could give better insight?
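One way to see how widespread this pattern is would be to flag every patient whose SC scan date falls after their BL visit date. This is a sketch under several assumptions: both tables are pandas DataFrames, dates are stored as MM/YYYY strings as in the examples quoted in this thread, and the helper name `sc_scans_after_bl` is hypothetical. The row for 3006 uses the dates quoted earlier; the second patient is invented:

```python
import pandas as pd

def sc_scans_after_bl(sbr: pd.DataFrame, curated: pd.DataFrame) -> pd.DataFrame:
    """Flag patients whose SC DaTscan date is later than their curated BL visit_date."""
    sc = sbr.loc[sbr["EVENT_ID"] == "SC", ["PATNO", "DATSCAN_DATE"]]
    bl = curated.loc[curated["EVENT_ID"] == "BL", ["PATNO", "visit_date"]]
    merged = sc.merge(bl, on="PATNO", how="inner")
    # Assumption: month/year strings; each parses to the first day of that month.
    scan = pd.to_datetime(merged["DATSCAN_DATE"], format="%m/%Y")
    visit = pd.to_datetime(merged["visit_date"], format="%m/%Y")
    merged["scan_after_bl"] = scan > visit
    return merged

sbr = pd.DataFrame({
    "PATNO": [3006, 9999],               # 9999 is an invented patient
    "EVENT_ID": ["SC", "SC"],
    "DATSCAN_DATE": ["08/2011", "01/2012"],
})
curated = pd.DataFrame({
    "PATNO": [3006, 9999],
    "EVENT_ID": ["BL", "BL"],
    "visit_date": ["05/2011", "02/2012"],
})

flags = sc_scans_after_bl(sbr, curated)
print(flags)
```

Because the dates carry only month precision, scans and visits in the same month compare as equal and are not flagged.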

1 Like

My best advice would be that this is one for the Schedule of Activities and the Imaging Manual in the protocol and procedure documents. Scheduling scans is specifically noted as a participant burden issue where there are permissible windows for collection, particularly for the screening and baseline visits. Since the protocol has changed over the years, you may also see some differences depending on the applicable recruitment and screening protocols. In general, it’s not a cross-validation failure given the protocol flexibility. You can find all the protocol info here if you need it: https://www.ppmi-info.org/study-design/research-documents-and-sops

1 Like