Issues scoring Neurobooster genotyped samples with Nalls et al PGS

Hi all, first post here. My name is Valentin Selvaggi; I’m currently a PhD candidate and GP2 trainee based in Buenos Aires, Argentina, working on implementing long-read sequencing for patients with PD, parkinsonism, and other movement disorders.

I’m working on a side project that involves scoring a cohort genotyped on the Neurobooster array, and I’m having some trouble with the match rate of variants in the weights files.

I’m using two Nalls et al. PGS, the 90-variant score (PGS Catalog - PGS000902 / Parkinson's disease (Polygenic Score)) and the 1805-variant score (PGS Catalog - PGS000903 / Parkinson's disease (Polygenic Score)), to score an imputed NBA dataset of ~250 samples. Imputation was run using the minimac/TOPMed imputation server, and the filtered dataset contains approximately 8M variants.

The issue I’m having is that, despite using the hg38 build weights file and matching by rsID OR chr:pos:ref:alt, I’m getting what I consider a low match rate: 76% for both scores.

I would expect the 90 variants in the first PGS to definitely be among those 8 million variants, so I suspect I’m doing something wrong, but I can’t seem to figure out what.

Thank you in advance to anyone who can help!

Hi @vselvaggi, thanks for your question, it’s always great to hear from our GP2 members! Tagging community members who work with (or have stated interest in) genetic data who may be able to help.

@ehutchins @mike @betamaro @LauraIbanez @VesnavM @jaeyoon.chung @kathrynstep @klposton @rochet071369 @Miriam @Bradford @elahif01

Hi @vselvaggi. One thing worth double-checking is whether your REF/ALT alleles are aligned correctly. This can be tricky if you processed the data yourself (tools like PLINK may swap alleles based on frequency rather than the reference genome), which can lead to mismatches with the PGS weights.

I’d also make sure that the effect allele in the weights file matches the coded allele in your dataset, and check for strand issues (especially palindromic SNPs).
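
As a rough illustration of the allele checks above, here is a minimal Python sketch that compares the allele pair from a weights file with the REF/ALT pair in a dataset. The function names and the labels it returns are made up for this example (they are not from any particular tool), and it only handles plain A/C/G/T alleles.

```python
# Hypothetical helper for illustration: classify how a weights-file allele pair
# lines up with a dataset REF/ALT pair. Only A/C/G/T alleles are handled.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(allele):
    """Return the reverse complement, or None if the allele has other symbols."""
    try:
        return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))
    except KeyError:
        return None


def is_palindromic(a1, a2):
    """True for single-base A/T or C/G pairs, where strand is ambiguous."""
    a1, a2 = a1.upper(), a2.upper()
    return len(a1) == 1 and len(a2) == 1 and COMPLEMENT.get(a1) == a2


def classify(effect, other, ref, alt):
    """Return one of: match, swapped, strand_flip, strand_flip_swapped,
    palindromic, mismatch."""
    effect, other, ref, alt = (x.upper() for x in (effect, other, ref, alt))
    if is_palindromic(effect, other):
        # Alleles alone cannot tell a strand flip from a REF/ALT swap here.
        return "palindromic" if {effect, other} == {ref, alt} else "mismatch"
    if (effect, other) == (alt, ref):
        return "match"  # effect allele is ALT: ALT dosage can be used as-is
    if (effect, other) == (ref, alt):
        return "swapped"  # effect allele is REF: use 2 - ALT dosage
    flipped = (reverse_complement(effect), reverse_complement(other))
    if flipped == (alt, ref):
        return "strand_flip"
    if flipped == (ref, alt):
        return "strand_flip_swapped"
    return "mismatch"


if __name__ == "__main__":
    print(classify("A", "G", "G", "A"))
    print(classify("T", "C", "G", "A"))
```

Variants that come back as "palindromic" usually need allele frequencies (or exclusion) to resolve, since the alleles alone are not enough.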

In addition, it’s worth verifying that variant IDs are consistent between your dataset and the summary statistics (rsIDs vs chr:pos), and that no build or normalization differences are creeping in.

If those look fine, I’d next look at imputation quality filters and whether any variants were dropped or represented differently (e.g., multiallelic sites).

Let me know if this helps at all. Some of the other GP2 members might have additional or alternative methods to try.

Thanks for the suggestion @kathrynstep!

@hirotaka/@hamptonl/@makariousmb, is this something you worked on/have thoughts on?

Hi @kathrynstep, thank you for the reply!

I think my code was only matching variant IDs by chr:pos:ref:alt. I added rsID matching as an alternative, and the match rate rose to 95% and 91%.
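
For anyone hitting the same problem, here is a minimal Python sketch of that kind of two-step matching: try chr:pos:ref:alt first (in both allele orders, with or without a "chr" prefix), then fall back to rsID. It is not my actual pipeline. The dict keys (`rsid`, `chrom`, `pos`, `effect_allele`, `other_allele`) and function names are assumptions for the example, so adapt them to however your weights file and variant IDs are laid out.

```python
# Illustrative sketch: match PGS weights to dataset variants by position-based
# ID first, then by rsID. Field names are assumptions, not a fixed format.

def norm_chrom(chrom):
    """Strip an optional 'chr' prefix so '1' and 'chr1' compare equal."""
    chrom = str(chrom)
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def norm_variant_id(variant_id):
    """Normalize 'chr1:100:g:a' to '1:100:G:A'; leave other IDs untouched."""
    parts = variant_id.split(":")
    if len(parts) != 4:
        return variant_id
    chrom, pos, ref, alt = parts
    return f"{norm_chrom(chrom)}:{pos}:{ref.upper()}:{alt.upper()}"


def candidate_keys(weight):
    """Both allele orders, since the effect allele may be REF or ALT."""
    chrom = norm_chrom(weight["chrom"])
    pos = weight["pos"]
    a1 = weight["effect_allele"].upper()
    a2 = weight["other_allele"].upper()
    return [f"{chrom}:{pos}:{a2}:{a1}", f"{chrom}:{pos}:{a1}:{a2}"]


def match_weights(weights, dataset_ids, dataset_rsids=()):
    """Return (matches, unmatched, match_rate).

    matches is a list of (weight, dataset_key, method) tuples, where method
    is 'position' or 'rsid'.
    """
    pos_ids = {norm_variant_id(v) for v in dataset_ids}
    rsids = set(dataset_rsids)
    matches, unmatched = [], []
    for weight in weights:
        key = next((k for k in candidate_keys(weight) if k in pos_ids), None)
        if key is not None:
            matches.append((weight, key, "position"))
        elif weight.get("rsid") in rsids:
            matches.append((weight, weight["rsid"], "rsid"))
        else:
            unmatched.append(weight)
    rate = len(matches) / len(weights) if weights else 0.0
    return matches, unmatched, rate


if __name__ == "__main__":
    # Toy data, invented purely to exercise the function.
    toy_weights = [
        {"rsid": "rs1", "chrom": "1", "pos": 100,
         "effect_allele": "A", "other_allele": "G"},
        {"rsid": "rs2", "chrom": "2", "pos": 200,
         "effect_allele": "C", "other_allele": "T"},
        {"rsid": "rs3", "chrom": "3", "pos": 300,
         "effect_allele": "G", "other_allele": "A"},
    ]
    _, missing, rate = match_weights(toy_weights, ["chr1:100:G:A"], ["rs2"])
    print(f"match rate: {rate:.0%}; unmatched: {[w['rsid'] for w in missing]}")
```

Keeping the list of unmatched variants around makes it easier to see whether the remaining misses share a cause, such as multiallelic sites or variants dropped by an imputation quality filter.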