Making sense of variants: Annotation with VEP and ANNOVAR

paularp · March 19, 2025, 7:40pm

The human genome has over 150 million known variants cataloged in population databases. Single nucleotide variants (SNVs) are the most common; the 1000 Genomes Project alone identified about 84.7 million of them.

These variants shape traits, disease risk, and how people respond to drugs. To understand them, scientists use a process called annotation: determining the functional consequences of genetic variants, such as their impact on gene function or their association with disease.

This process adds information about how variants link to genes and how they affect proteins. Such data helps predict whether a variant might cause a disease or change how the body works.

Handling this volume of genetic data requires dedicated tools, and several exist. Personally, I have only used two:

• Ensembl Variant Effect Predictor (VEP): provides annotation and filtering of genomic variants. It predicts molecular consequences using gene sets and reports phenotype associations, allele frequencies, and deleteriousness predictions. VEP is accessible via command-line, API, and a web interface.

• ANNOVAR: facilitates fast and easy variant annotation, including gene-based, region-based, and filter-based annotations. ANNOVAR is a command-line tool.

As a very short example, let’s annotate variant rs41432647 (chr17:4631740, GRCh38.p14).

1. First, let’s try VEP

VEP returns a table of predicted consequences for the variant, and you can always use its filters and options to get more or less information.
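If you prefer scripting over the web interface, here is a minimal Python sketch of the same lookup through the Ensembl REST API (the `vep/human/id/` endpoint). The helper names `vep_url`, `fetch_vep`, and `summarize` are my own, and the response fields I read (`most_severe_consequence`, `transcript_consequences`, `gene_symbol`, `consequence_terms`) are what I would expect in the JSON, so treat them as assumptions and check them against a real response.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"  # public Ensembl REST server


def vep_url(rsid, species="human"):
    """Build the VEP-by-identifier URL for a dbSNP rsID."""
    return f"{SERVER}/vep/{species}/id/{rsid}"


def fetch_vep(rsid):
    """Query VEP for one rsID and return the decoded JSON (needs internet access)."""
    request = urllib.request.Request(
        vep_url(rsid), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(record):
    """Reduce one VEP record to the most severe consequence and per-gene terms."""
    genes = {}
    for tc in record.get("transcript_consequences", []):
        symbol = tc.get("gene_symbol", "unknown")  # assumed field name
        genes.setdefault(symbol, set()).update(tc.get("consequence_terms", []))
    return {
        "most_severe": record.get("most_severe_consequence"),
        "genes": {gene: sorted(terms) for gene, terms in genes.items()},
    }
```

With network access, `summarize(fetch_vep("rs41432647")[0])` should give a small dictionary for our variant. I have not pasted any output here because it depends on the Ensembl release you hit.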

2. Now, ANNOVAR
To run ANNOVAR as shown here, we first format our variant as a VCF file, following something similar to this format:

##fileformat=VCFv4.2
##source=Variant-to-VCF
#CHROM POS ID REF ALT QUAL FILTER INFO
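In case it helps, this is a small Python sketch for producing such a file. VCF columns have to be tab-separated, which is easy to lose when copying from a post. The function names are hypothetical, and I am assuming that plain gzip compression is fine for the `.vcf.gz` input and that `.` is acceptable for QUAL, FILTER, and INFO. The REF and ALT alleles are left as parameters on purpose: look them up in dbSNP rather than trusting a guess.

```python
import gzip

VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def minimal_vcf_text(chrom, pos, var_id, ref, alt):
    """Return a one-variant VCF as text, with tab-separated columns."""
    header = [
        "##fileformat=VCFv4.2",
        "##source=Variant-to-VCF",
        "\t".join(VCF_COLUMNS),
    ]
    # "." marks a missing value for QUAL, FILTER and INFO.
    record = "\t".join([chrom, str(pos), var_id, ref, alt, ".", ".", "."])
    return "\n".join(header + [record]) + "\n"


def write_vcf_gz(path, text):
    """Write the VCF text as a gzipped file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write(text)
```

A call would look like `write_vcf_gz("data/rs41432647.vcf.gz", minimal_vcf_text("chr17", 4631740, "rs41432647", ref, alt))`, with `ref` and `alt` filled in from the database record.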

# Positional arguments: input VCF file (gzipped), then the directory containing ANNOVAR databases.
# -buildver: genome build version; -out: output prefix and directory
# -remove: remove intermediate files after annotation
# -protocol / -operation: databases used, and the operation type for each (g=gene-based, f=filter-based)
# --nopolish: skip variant polishing
# -nastring: string for missing values in the output; -vcfinput: the input is a VCF file
# Note: a comment after a trailing backslash breaks the line continuation, so the comments sit up here.
perl annovar/table_annovar.pl \
    data/rs41432647.vcf.gz \
    annovar/humandb/ \
    -buildver hg38 \
    -out Annovar_results/rs41432647.annovar \
    -remove \
    -protocol refGene,clinvar_20140902,dbnsfp47a \
    -operation g,f,f \
    --nopolish \
    -nastring . \
    -vcfinput
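One thing that is easy to get wrong in this command is that `-protocol` and `-operation` are paired by position, so each database needs exactly one operation. If you launch ANNOVAR from a script, a small wrapper can check that before running anything. This is only a sketch: `build_table_annovar_cmd` is a hypothetical helper, and the paths are the ones from my example.

```python
import subprocess


def build_table_annovar_cmd(vcf, humandb, out_prefix, protocols, operations,
                            buildver="hg38", script="annovar/table_annovar.pl"):
    """Return the table_annovar.pl call as an argument list."""
    if len(protocols) != len(operations):
        raise ValueError("each protocol needs exactly one operation")
    return [
        "perl", script, vcf, humandb,
        "-buildver", buildver,
        "-out", out_prefix,
        "-remove",
        "-protocol", ",".join(protocols),
        "-operation", ",".join(operations),
        "--nopolish",
        "-nastring", ".",
        "-vcfinput",
    ]


cmd = build_table_annovar_cmd(
    "data/rs41432647.vcf.gz", "annovar/humandb/",
    "Annovar_results/rs41432647.annovar",
    protocols=["refGene", "clinvar_20140902", "dbnsfp47a"],
    operations=["g", "f", "f"],
)
# Uncomment on a machine where ANNOVAR and its databases are installed:
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and the length check fails early instead of letting ANNOVAR stop halfway.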

And once again, the output can be suuuper long, so you filter it down to the columns of your interest.
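For the filtering step, here is a rough Python sketch that keeps only a handful of columns from the tab-delimited table. I am assuming the table lands in something like `Annovar_results/rs41432647.annovar.hg38_multianno.txt` and that the columns are named as in `COLUMNS_OF_INTEREST`; those names are my assumption, so compare them with the header of your own file. The in-memory example only reuses values quoted in this post.

```python
import csv
import io

# Assumed column names; check them against the header of your own output file.
COLUMNS_OF_INTEREST = [
    "Func.refGene", "Gene.refGene", "ExonicFunc.refGene",
    "clinvar_20140902",
    "Polyphen2_HDIV_pred", "ClinPred_pred", "ClinPred_score",
    "CADD_raw", "CADD_phred",
]


def select_columns(handle, wanted=COLUMNS_OF_INTEREST):
    """Yield one dict per variant, keeping only the wanted columns that exist."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = [name for name in wanted if name in (reader.fieldnames or [])]
    for row in reader:
        yield {name: row[name] for name in present}


# Tiny in-memory stand-in for the real file.
example = io.StringIO(
    "Chr\tStart\tFunc.refGene\tClinPred_pred\tClinPred_score\tCADD_phred\n"
    "chr17\t4631740\texonic\tD\t0.998\t24.2\n"
)
for row in select_columns(example):
    print(row)
```

For the real file, open it with `open(path, newline="")` and pass the handle to `select_columns`. Columns that are missing from the file are silently skipped, which is convenient when protocols change between runs.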

So in summary, what have we learned about our variant?

• Impact: The variant occurs in an exonic or protein-coding region.
• ClinVar status: No associated clinical significance reported.
• Predictions:
    • PolyPhen-2 predicts the variant to be damaging (D) under the HDIV model.
    • ClinPred also predicts the variant to be damaging (D).
    • PolyPhen prediction: 0.958, indicating the variant is likely damaging to the protein function.
    • ClinPred score: 0.998, further supporting the likelihood of a deleterious effect.
• CADD scores:
    • PHRED: 24.2 (higher scores suggest greater potential deleteriousness).
    • RAW: 4.074235.
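To put the CADD PHRED value in context: as far as I understand it, the scaled score is rank-based, with PHRED = -10 * log10(fraction of variants ranked at least as deleterious), so 20 corresponds to the top 1% and 30 to the top 0.1%. A quick sketch of that conversion (the function name is hypothetical):

```python
def cadd_top_fraction(phred):
    """Fraction of scored variants ranked at least as deleterious as this score."""
    return 10 ** (-phred / 10)


fraction = cadd_top_fraction(24.2)
print(f"PHRED 24.2 is roughly the top {fraction:.2%} of scored variants")
# prints: PHRED 24.2 is roughly the top 0.38% of scored variants
```

So our variant sits within the top 1% of CADD-ranked variants, which agrees with the PolyPhen and ClinPred predictions above.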

Please let me know what other tools you have used, and what annotation has been useful for in your work!

gginnan · March 25, 2025, 6:09pm

Hi @paularp, thanks for this post! You mentioned you’ve only used VEP and ANNOVAR so far. Is that because that’s what your lab has conventionally been using, or were there other considerations? There are a couple of other tools I’m aware of, such as SnpEff and GATK (Genome Analysis Toolkit), but I’m not sure how they stack up against VEP and ANNOVAR.

Do any of our community members use these or other tools for their genetic variant annotation? @samantha.schaffner @kraty @Bradford @saneckaa @jbmchls @Synuclein

jbmchls · March 26, 2025, 2:41pm

I use Annovar because that’s what my lab has conventionally used. I will be interested to see what our community members have to say about other tools.

paularp · March 28, 2025, 2:27pm

Yes! We have mainly used those too, but I’ll definitely take a look at the tools you shared. Thanks!
