Microbiome PDMB subset Q&A

mdeleeuw · August 19, 2024, 5:58am

Hello everyone, I wanted to share some first impressions regarding the PDMB Microbiome sub-study data and check with you if these are correct. First, regarding the available (Cutadapt pre-processed) raw data, these are the sample numbers we see:

phenotype	origin	fecal	saliva	both (%)
NT	23&ME participant	295	294	99.0
NT	PDMB, main	23	22	95.7
NT	total	318	316
PD	PDMB, main	350	349	98.0
PD	PDMB, no response	17	15	88.2
PD	PDMB, pilot	37	37	94.7
PD	total	404	401

It looks non-PDMB controls were sourced from 23&Me, because indeed within the PDMB sub-cohort, there are not many neurotypical (NT) controls. This indeed balances the two phenotypes nicely, but it seems no subject metadata is available at all for these NT controls? Would it be possible to source gender, age, eventually BMI? As mentioned in the table, a handful of PDMB participants do not have PDMB specific metadata associated, so presamably, these subjects have not responded to the PDMB questionnaire.

Next, it seems we can confirm the raw data was indeed pre-processed as mentioned in the FoxInsight Nature Scientific Data paper, because we do not detect adapters. We can still infer library insert size, by looking at the resulting mean read length per sample. Shorter (trimmed back) read length with Illumina sequences is generally indicative of detected adapter much more than of read quality, especially on the forward read - which is the only read available here, since this is 1x100bp single-end sequencing data. Another quality control aspect we look at is PCR duplicates, which we remove prior to read classification. We obtain the following QC plot:

This confirms the microbiome data is indeed shallow but not dramatically - for the fecal samples we are at about a factor 10 of what we usually sequence, with ~3M reads/sample. PCR duplicates seem well controlled overall, but perhaps surprisingly comparable between fecal and saliva, whereas only saliva represents low biomass. The more extreme duplicate read samples are indeed situated on the side of the saliva. The indicated read number thresholds, 20k reads for saliva and 1M for fecal, is what we intend to use for downstream rarefaction on a dataset like this, to perform alpha diversity analysis. And thus eliminate the corresponding outliers. A few saliva samples below the threshold also have low mean read length, indicative of degraded DNA input to the library construction, but overall the DNA quality seems to have been very usable or perhaps a good small fragment cleanup was used.

Voilà, our first impressions with the microbiome dataset, I hope this is useful for the community and that we can get some additional metadata for the NT controls as mentioned - presuming our interpretations are correct of course.

gginnan · March 21, 2025, 6:53pm

Hi @mdeleeuw , thanks for sharing your impressions after looking into the PDMB Microbiome sub-study! Wanted to point out this recent conversation on PDMB metadata & resources, where some other members were asking about / discussing additional metadata, data, and documentation for the PDMB cohort.

Topic		Replies	Views
Greetings - Marcel de Leeuw, MRM Health - Microbiome Introductions	3	35	August 24, 2024
Hi, I'm Keaton Stagaman Introductions	3	36	April 11, 2024
Prior experience with PDBP biological and omics data Accessing and Understanding Data how-to , data-interpretation , pdbp , omics-data	6	63	August 18, 2023
Fox Insight PD Microbiome (PDMB) Metadata & Resources Accessing and Understanding Data fox-insight , meta , data-access , featured-content , microbiome , pdmb	16	158	November 17, 2025
Hi there. this is Haydeh Payami Introductions	3	29	March 4, 2024

Microbiome PDMB subset Q&A

Related topics