So you've done a GWAS - what's next? Part 2 - Gene-based & enrichment analyses

Following up on my last post, and suggestions from others, here are a few more recommendations on what to do after your GWAS.

  1. Gene-based tests of association. While a GWAS examines variants one at a time, a gene-based test of association examines the associations for all variants within the physical boundaries of a gene and estimates an overall significance for the gene. Alas, it doesn't report a direction of effect, as some variants might be tagging haplotypes that confer increased risk and others decreased risk. There are several tools that rely only on the summary statistics from the GWAS. One such tool is MAGMA, which is a stand-alone program and is also readily available via FUMA (a very easy-to-use web interface). In one of its options, MAGMA creates a gene-based test statistic by converting the p-values from the SNP-level summary statistics to χ² values, which are averaged across the SNPs mapped to a gene range. The mean χ² is then converted to a p-value to determine a gene-based level of association. Another tool is VEGAS2, which has both a command-line and an online version, and it incorporates LD structure in calculating the gene-based association test.

  2. Pathway enrichment analyses. Once you have identified genes with individual SNP associations, or genes from the gene-based test of association, it is common to explore which biological pathways might be overrepresented amongst your genic hits. I have used various significance thresholds, e.g. genes with SNP hits at p < 1×10⁻⁵ and gene-based p < 0.001 - it's context dependent. A word of caution for any enrichment analysis: the usefulness of the findings is a function of how complete the reference database is (e.g. enrichment analyses done a decade ago were based on databases with limited information, and results might differ if repeated today because the underlying database is likely to be more complete). There are several tools that I like: DAVID and Enrichr are both online tools that only require you to copy and paste your gene list into a window, and enrichment can be examined against many cool databases. Another tool that I like (command line though) is PARIS, because it adds a novel layer of statistical robustness by first grouping the SNPs in the GWAS results into LD features, with single SNPs in linkage equilibrium (LE) forming their own features. Features are then grouped by pathway (e.g. from the KEGG database or other sources) and the significance of a pathway is determined by permutation testing. In each permutation, the features in a pathway are replaced by a randomly selected set of features of similar size from across the genome. The number of features with a significant p-value in the actual pathway is then compared with the number of significant features in each permuted pathway.
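
To make the mean χ² idea from point 1 concrete, here is a minimal Python sketch. It is not MAGMA's implementation: the function names are hypothetical, and the null distribution is simulated under the assumption that the SNPs in a gene are independent (no LD), which real tools do not assume. It only illustrates the p-value to χ² to mean χ² to gene p-value chain.

```python
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()  # standard normal, used to map p-values to z-scores


def p_to_chi2(p):
    """Convert a two-sided SNP p-value to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; sign is irrelevant after squaring
    return z * z


def gene_mean_chi2(snp_pvalues):
    """Average the per-SNP chi-square values across the SNPs mapped to one gene."""
    return fmean(p_to_chi2(p) for p in snp_pvalues)


def gene_pvalue(snp_pvalues, n_sim=20000, seed=1):
    """Empirical gene-level p-value for the mean chi-square.

    ASSUMPTION: SNPs are treated as independent, so the null is simulated
    as the mean of n squared standard normals. LD between SNPs is ignored.
    """
    rng = random.Random(seed)
    n = len(snp_pvalues)
    observed = gene_mean_chi2(snp_pvalues)
    hits = 0
    for _ in range(n_sim):
        null_mean = fmean(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        if null_mean >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_sim + 1)
```

Calling `gene_pvalue([1e-4, 1e-3, 0.01])` on a gene with three fairly strong SNP signals gives a small empirical p-value, bounded below by 1/(n_sim + 1). With correlated SNPs this independence shortcut would be anti-conservative, which is why the actual tools model LD.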
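
The permutation logic described for PARIS in point 2 can be sketched roughly as follows. This is a simplified illustration, not PARIS's actual code: the data layout (a dict of feature id to size bin and p-value), the function name, and the choice to draw replacement features with replacement from the same size bin are all my assumptions.

```python
import random
from collections import defaultdict


def pathway_permutation_p(features, pathway_ids, alpha=0.05, n_perm=1000, seed=1):
    """Empirical pathway p-value by size-matched permutation.

    features    : dict mapping feature id -> (size_bin, p_value)  (hypothetical layout)
    pathway_ids : ids of the features that fall in the pathway
    Returns (observed significant count, empirical p-value).
    """
    rng = random.Random(seed)

    # Pool the genome-wide feature p-values by size bin
    by_bin = defaultdict(list)
    for size_bin, p in features.values():
        by_bin[size_bin].append(p)

    pathway = [features[fid] for fid in pathway_ids]
    observed = sum(p < alpha for _, p in pathway)

    at_least = 0
    for _ in range(n_perm):
        # Replace each pathway feature with a random feature of similar size
        permuted = sum(rng.choice(by_bin[size_bin]) < alpha for size_bin, _ in pathway)
        if permuted >= observed:
            at_least += 1

    # Add-one correction so the estimate is never exactly zero
    return observed, (at_least + 1) / (n_perm + 1)
```

The empirical p-value is the fraction of permuted pathways with at least as many significant features as the real one, so its resolution is limited by `n_perm`.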

Again, these are complementary analyses to help synthesize the many SNP findings from a GWAS. They also allow us to not just focus on the sparse hits that might reach genome-wide significance. I firmly believe in maximizing data/results - and these two approaches help expand the results that can be discussed.
