Has anyone looked into the concordance between WGS and genotyping-array SNP calls in PPMI? I’m mostly wondering about WGS and NeuroX concordance. If the calls disagree, would you trust one technology over the other?
Hi @amclean, thanks for your question!
Wondering if @hirotaka @ehutchins @mattk @hamptonl @vdardov or other community members with experience using NeuroX and WGS in PPMI data have any insight into this?
I have not personally looked at the concordance between NeuroX and WGS for PPMI specifically (maybe @hirotaka has?), but for GP2 we often look at concordance between our NBA array and WGS. For many variants, concordance is actually quite high. However, there will always be rare variants, and variants in complicated regions, where we have found the array probes do not perform as well. When variant carrier counts differ vastly between array and WGS, we look at the variant cluster plots (theta vs. R) to check whether specific probes are typing these variants poorly, and we often see blobs of poorly called genotypes rather than the neatly separated clusters you would expect from a good probe. Typically we end up choosing WGS in these cases, especially for rare variants, where WGS has an advantage over array genotyping.
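As a rough illustration of the comparison described above, here is a minimal Python sketch that computes per-variant concordance between array and WGS calls for the same samples and flags variants whose carrier counts differ enough to warrant a look at the cluster plots. It assumes genotypes are already matched by sample and by allele, and coded as alternate-allele counts (0/1/2, `None` for a missing call). The function names and the 2-fold `max_ratio` threshold are hypothetical choices for illustration, not GP2's actual pipeline.

```python
from typing import Optional, Sequence

# Genotypes coded as alternate-allele counts (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def _called_pairs(array_calls: Sequence[Genotype], wgs_calls: Sequence[Genotype]):
    """Pair up samples that have a non-missing call in both datasets."""
    return [(a, w) for a, w in zip(array_calls, wgs_calls) if a is not None and w is not None]


def concordance(array_calls: Sequence[Genotype], wgs_calls: Sequence[Genotype]) -> float:
    """Fraction of jointly called samples whose genotypes agree (NaN if none)."""
    pairs = _called_pairs(array_calls, wgs_calls)
    if not pairs:
        return float("nan")
    return sum(a == w for a, w in pairs) / len(pairs)


def nonref_concordance(array_calls: Sequence[Genotype], wgs_calls: Sequence[Genotype]) -> float:
    """Agreement restricted to samples where at least one platform calls a non-reference genotype."""
    pairs = [(a, w) for a, w in _called_pairs(array_calls, wgs_calls) if a > 0 or w > 0]
    if not pairs:
        return float("nan")
    return sum(a == w for a, w in pairs) / len(pairs)


def carrier_count(calls: Sequence[Genotype]) -> int:
    """Number of samples carrying at least one alternate allele."""
    return sum(1 for g in calls if g is not None and g > 0)


def flag_for_cluster_review(
    array_calls: Sequence[Genotype], wgs_calls: Sequence[Genotype], max_ratio: float = 2.0
) -> bool:
    """Flag a variant whose carrier counts differ by more than max_ratio-fold (hypothetical threshold)."""
    a, w = carrier_count(array_calls), carrier_count(wgs_calls)
    if a == w:
        return False
    low, high = min(a, w), max(a, w)
    # Carriers seen on one platform but none on the other always get flagged.
    return low == 0 or high / low > max_ratio


array = [0, 1, 0, 2, None, 0]
wgs = [0, 1, 1, 2, 0, 0]
print(concordance(array, wgs))              # 0.8
print(nonref_concordance(array, wgs))       # ~0.667
print(flag_for_cluster_review(array, wgs))  # False (2 vs 3 carriers)
```

For rare variants, overall concordance is dominated by homozygous-reference calls and can look high even when the carriers disagree, which is why the sketch also reports non-reference concordance and compares carrier counts directly before anyone opens a cluster plot.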
Just to add to Hampton’s reply: I’m not as knowledgeable about bioinformatics as some, and what follows comes from what the bioinformatician I work with found out, so take it with a grain of salt; I may not have gotten every detail right.
Nonetheless, my PhD project initially involved obtaining the 90 risk SNPs from Nalls et al. from the NeuroX array data available in PPMI. However, when we looked into the data, we found that only somewhere between 50 and 60 of those SNPs could be identified in the array data. For that reason, we migrated to the WGS data, and that worked out fine.
We discussed this a few years back, so I’m not sure what the most recent recommendation is, but in my case we decided to use the WGS data over the NeuroX data. This was my decision-tree SOP back then, with the caveat that it predates the data release by the Consensus Committee.
I also wanted to add to @hamptonl’s point about rare variants: GBA variants, for instance, can be tricky, and she makes a very good point about looking more closely at specific probes in the NeuroX data.