So you've done a GWAS - what's next? Part 1 - SNP annotation & function

fbbriggs · January 23, 2024, 5:33pm

For many, depending on the platform or imputation server used, or when working with historical summary statistics from various sources (e.g. the GWAS catalog), the variant information may be incomplete or out of date. At the least, chromosome and base-pair location will be available, but it might not be for the most recent hg build. So there are a few best practices to follow:

Confirm the build of the genetic data. If it is not in the build of choice, then use LiftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). This is particularly important if you will be conducting a meta-analysis of summary statistics generated on different builds.
Annotate SNPs once the build is confirmed. There are several tools that can map a lot of very cool information to each variant, e.g. if it’s genic and its function in the gene; if it’s in a regulatory region (e.g. CpG island, miRNA target site); if it’s associated with other phenotypes (e.g. ClinVar, GWAS catalog), and various other traits. A few examples of tools: a. SNP-nexus (SNP Annotation Tool) - a web application; b. ANNOVAR (ANNOVAR Documentation) - a perl program; and c. AnnoGen (GitHub - shengqh/annogen) - a python program. I was a huge fan of SeattleSeq, a web application, but it seems it is no longer available/supported as of a few months ago.
Regulatory potential can be succinctly evaluated using RegulomeDB, which scores a variant based on evidence from ChIP-seq data (if the SNP is in a transcription factor [TF] binding site), impact on chromatin states, findings from DNase-seq experiments, location within TF motifs, expression QTLs (eQTL - if the SNP is associated with the expression of a gene in a specific tissue), and chromatin accessibility QTLs (caQTL - if the SNP is associated with nucleosome packing/positioning in a specific tissue). The RegulomeDB score ranges from 1a to 7, with 1a having the strongest evidence and 7 having no regulatory evidence.
Gene expression potential can be more extensively investigated using various user-friendly web-application tools. While SNP annotation reports on whether that one SNP is a possible eQTL (expression quantitative trait locus), it is quite likely that the SNP is in linkage disequilibrium (LD) with other nearby variants that are eQTLs for similar or different genes across various tissues. Two of my favorite tools are:
a. FIVEx (https://fivex.sph.umich.edu/), which can easily search for a SNP of interest and all SNPs in LD up to a distance of +/- 1Mb. It includes a lot of different data sets (e.g. ROSMAP, TwinsUK, GTEx, FUSION, etc.) - hence, it has been my go-to eQTL tool. There is also an option to look for splicing QTLs - fewer data sets, but something novel.
b. LDlink (https://ldlink.nih.gov/) is hosted by NIH and has a lot of features that can be explored. LDexpress is similar to FIVEx but only for GTEx. LDhap evaluates population-specific haplotype frequencies - this is an under-utilized approach in SNP interpretation, as a haplotype may well be the causal structure rather than a single variant. LDassoc examines LD structure in specific populations, which can be converted into a heatmap with LDmatrix. LDpair can look for correlated pairs of variants. LDtrait can look for a list of variants in LD with variants of interest and extract their results from the GWAS catalog. There are also a few other tools, e.g. LDproxy - but as you can see, it is a great one-stop interface from which a lot of functionality for a SNP or SNPs could be hypothesized.
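
For the build-conversion step above, one practical detail is getting variant positions into a format LiftOver accepts. Below is a minimal Python sketch, not a definitive workflow, that turns 1-based SNP positions into BED lines (0-based start, exclusive end), which is one of the input formats the UCSC LiftOver page takes. The column names `chr`, `pos`, and `snp`, the helper names, and the example rows are all assumptions made up for illustration; adapt them to whatever your summary statistics actually contain.

```python
def snp_to_bed(chrom, pos, name):
    """Convert one 1-based SNP position into a tab-separated BED line.

    BED intervals are 0-based and half-open, so a SNP at 1-based
    position p becomes start = p - 1, end = p.
    """
    chrom = str(chrom)
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom  # UCSC-style chromosome names
    pos = int(pos)
    if pos < 1:
        raise ValueError(f"position must be >= 1, got {pos}")
    # The 4th column carries the variant ID so results can be joined back.
    return f"{chrom}\t{pos - 1}\t{pos}\t{name}"


def sumstats_to_bed(rows):
    """Convert an iterable of dict-like rows into a list of BED lines.

    Assumes (hypothetical) keys 'chr', 'pos', and 'snp' in each row.
    """
    return [snp_to_bed(r["chr"], r["pos"], r["snp"]) for r in rows]


# Made-up example rows, only to show the shape of the input.
example_rows = [
    {"snp": "variant_A", "chr": "1", "pos": 1000},
    {"snp": "variant_B", "chr": "chrX", "pos": 5},
]

bed_lines = sumstats_to_bed(example_rows)
print("\n".join(bed_lines))
```

After conversion, the end column of each lifted interval is the 1-based position in the new build, and the fourth column lets you merge it back onto the summary statistics. Variants that fail to lift over should be checked rather than silently dropped, especially before a meta-analysis.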

There are always new tools popping up - do you have any suggestions? @paularp @ehutchins @vdardov @gdp22 @Vidash @danieltds @psaffie

fbbriggs · January 23, 2024, 5:33pm

What about you? @hirotaka @malosco @rooparajan @johanna.junker

malosco · January 24, 2024, 6:34pm

Hi @fbbriggs, thanks for the interesting post and question. Unfortunately, I don’t have much to add or advise, as this falls a bit outside of my experience. I do work with several genetics folks though and can ask them for you.

hirotaka · February 13, 2024, 2:07pm

Thank you for sharing the tools you are using. I didn’t know many of the tools you mentioned here, and they look very useful! I am looking forward to trying them out next time! I also would like to share two tools I use after GWAS.

LocusZoom: Makes Manhattan plots interactive and provides a qualitative assessment of QQ plots. +1 point for accepting summary stats without rsID, annotating them, and allowing users to download the file with rsID.
FUMA: A one-stop screening solution for various enrichment analyses.

Regarding additional analyses, I often conduct PRS analyses. I also like to conduct Mendelian randomization and LD score regression, provided there is enough power.
