Issues scoring Neurobooster genotyped samples with Nalls et al PGS

vselvaggi · April 15, 2026, 9:32pm

Hi all, first post here. My name is Valentin Selvaggi, and I'm currently a PhD candidate and GP2 trainee based in Buenos Aires, Argentina, working on implementing long-read sequencing in PD/parkinsonism/movement disorder patients.

I’m working on a side project that involves scoring a cohort genotyped on the Neurobooster array (NBA), and I’m having some trouble with the match rate of variants in the weights files.

I’m using two Nalls et al PGS, the 90-variant score (PGS Catalog - PGS000902 / Parkinson's disease (Polygenic Score)) and the 1805-variant score (PGS Catalog - PGS000903 / Parkinson's disease (Polygenic Score)), to score an imputed NBA dataset of ~250 samples. Imputation was run with minimac on the TOPMed imputation server, and the filtered dataset contains approximately 8M variants.

The issue I’m having is that, despite using the hg38 build weights file and matching by rsID OR chr:pos:ref:alt, I’m getting what I consider a low match rate: 76% for both scores.

I would think that the 90 variants in the first PGS would definitely be in those 8 million variants and I’m doing something wrong, but I can’t seem to figure it out.

Thank you in advance to anyone who can help!

gginnan · April 16, 2026, 1:19pm

Hi @vselvaggi, thanks for your question, it’s always great to hear from our GP2 members! Tagging community members who work with (or have stated interest in) genetic data who may be able to help.

@ehutchins @mike @betamaro @LauraIbanez @VesnavM @jaeyoon.chung @kathrynstep @klposton @rochet071369 @Miriam @Bradford @elahif01

kathrynstep · April 17, 2026, 7:00am

Hi @vselvaggi. One thing worth double-checking is whether your REF/ALT alleles are aligned correctly. This can be tricky if you processed the data yourself (tools like PLINK may swap alleles based on frequency rather than the reference genome), which can lead to mismatches with the PGS weights.

I’d also make sure that the effect allele in the weights file matches the coded allele in your dataset, and check for strand issues (especially palindromic SNPs).

In addition, it’s worth verifying that variant IDs are consistent between your dataset and the summary statistics (rsIDs vs chr:pos), and that no build or normalization differences are creeping in.

If those look fine, I’d next look at imputation quality filters and whether any variants were dropped or represented differently (e.g., multiallelic sites).
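The allele checks above could be sketched roughly as follows. This is only an illustration, not anyone's actual pipeline: the function names are hypothetical, and it assumes the dataset's dosage counts the ALT allele, so a "swapped" result means the weight's sign (or the dosage) would need flipping.

```python
# Hypothetical helpers; alleles are plain strings such as "A" or "CT".
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_strand(allele: str) -> str:
    """Reverse-complement an allele (for a SNP this is just the complement)."""
    return allele.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(a1: str, a2: str) -> bool:
    """True for A/T and C/G SNPs, where a strand flip is invisible from alleles alone."""
    return len(a1) == 1 and len(a2) == 1 and flip_strand(a1) == a2.upper()


def classify(effect: str, other: str, ref: str, alt: str):
    """Compare weight-file alleles (effect, other) with dataset alleles (ref, alt).

    Returns (status, ambiguous). Assumes the dataset dosage counts ALT.
    """
    effect, other, ref, alt = (a.upper() for a in (effect, other, ref, alt))
    ambiguous = is_palindromic(effect, other)
    if (effect, other) == (alt, ref):
        status = "same"                 # effect allele is ALT: use dosage as-is
    elif (effect, other) == (ref, alt):
        status = "swapped"              # effect allele is REF: use 2 - dosage
    elif (flip_strand(effect), flip_strand(other)) == (alt, ref):
        status = "strand_flip_same"
    elif (flip_strand(effect), flip_strand(other)) == (ref, alt):
        status = "strand_flip_swapped"
    else:
        status = "mismatch"             # e.g. a different allele at a multiallelic site
    return status, ambiguous
```

Variants flagged as ambiguous (palindromic) cannot be resolved from the alleles alone, so they usually need an allele-frequency check or exclusion.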

Let me know if this helps at all. Some of the other GP2 members might have additional or alternative methods to try.

jgottesman · April 17, 2026, 9:22pm

Thanks for the suggestion @kathrynstep!

@hirotaka/@hamptonl/@makariousmb, is this something you worked on/have thoughts on?

vselvaggi · April 17, 2026, 10:35pm

Hi @kathrynstep , thank you for the reply!

I think my code was only trying to match variant IDs by chr:pos:ref:alt. I added rsID matching as an alternative, and the match rate rose to 95% and 91%.
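For anyone hitting the same problem, a minimal sketch of matching by chr:pos:ref:alt with an rsID fallback might look like the following. It is not the code used above: the dictionary keys (`chrom`, `pos`, `ref`, `alt`, `rsid`) and function names are assumptions made for illustration, and it only counts matches without checking allele orientation or strand.

```python
def norm_chrom(chrom) -> str:
    """Strip a leading 'chr' so '4' and 'chr4' produce the same key."""
    c = str(chrom)
    return c[3:] if c.lower().startswith("chr") else c


def cpra_keys(chrom, pos, a1, a2) -> set:
    """Both allele orders, so a REF/ALT swap still matches on position + alleles."""
    c = norm_chrom(chrom)
    a1, a2 = str(a1).upper(), str(a2).upper()
    return {f"{c}:{pos}:{a1}:{a2}", f"{c}:{pos}:{a2}:{a1}"}


def match_rate(weights, dataset) -> float:
    """Fraction of weight-file variants found by chr:pos:ref:alt OR by rsID."""
    by_key, by_rsid = set(), set()
    for v in dataset:
        by_key |= cpra_keys(v["chrom"], v["pos"], v["ref"], v["alt"])
        if v.get("rsid"):
            by_rsid.add(v["rsid"])

    matched = 0
    for w in weights:
        key_hit = bool(cpra_keys(w["chrom"], w["pos"], w["ref"], w["alt"]) & by_key)
        rsid_hit = w.get("rsid") in by_rsid
        if key_hit or rsid_hit:
            matched += 1
    return matched / len(weights) if weights else 0.0
```

Listing which weight-file variants hit neither route is usually the quickest way to see whether the remainder are dropped by imputation filters, multiallelic, or simply absent.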
