David A. Parry edited this page Jul 9, 2020 · 13 revisions

VASE

THIS WIKI IS UNDER CONSTRUCTION

What is VASE?

VASE stands for Variant Annotation, Segregation and Exclusion. It provides a program to filter and annotate variant data (in VCF/BCF format) according to user-specified criteria. Example use cases include:

  • Annotating/filtering variants on frequency data from public databases (e.g. gnomAD, dbSNP)
  • Selecting variants based on VEP consequence annotations
  • Filtering variants based on presence in cases vs control samples
  • Filtering variants based on familial segregation/inheritance patterns
  • Filtering variants/genotypes based on depth data and quality metrics

Installation

VASE requires python3 and can be installed using pip or the setup.py script included in the repo. Assuming your system has git and pip installed, the easiest way to install VASE with full functionality is as follows:

pip3 install "git+https://github.com/david-a-parry/vase.git#egg=project[BGZIP,REPORTER,MYGENE]" --user

For more detailed installation instructions see the README.
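Because the install command above assumes that git and pip are already available, a quick check beforehand can save a confusing failure. The following is only a minimal sketch: `missing_tools` is a hypothetical helper name used for illustration and is not part of VASE.

```shell
# Hypothetical helper (not part of VASE): print each named command that
# cannot be found on PATH, one per line. No output means all were found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the prerequisites mentioned above.
missing_tools git python3 pip3
```

If the call prints nothing, the prerequisites named in this section are present; any name it prints needs installing first.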

Examples

Frequency filtering

Use exome data from gnomAD to remove variants with an allele frequency of 1% or higher (in any gnomAD population).

vase -i input.bcf --freq 0.01 -g gnomad.exomes.r2.1.1.sites.vcf.bgz -o rare.bcf
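To see how many variants a filter removed, it can help to compare record counts before and after filtering. The following is a minimal sketch that assumes bcftools is installed; `count_records` is a hypothetical helper (not part of VASE) that counts the non-header lines of VCF text read from standard input.

```shell
# Hypothetical helper (not part of VASE): count VCF records, i.e. lines
# that do not start with '#', read from standard input.
count_records() {
    awk '!/^#/ { n++ } END { print n + 0 }'
}

# Usage (assumes bcftools is on PATH and the files from the example exist):
#   bcftools view input.bcf | count_records
#   bcftools view rare.bcf  | count_records
```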

Annotate gnomAD and dbSNP frequencies but do not filter.

vase -i input.bcf \
-g gnomad.exomes.r2.1.1.sites.vcf.bgz gnomad.genomes.r2.1.1.sites.vcf.bgz \
-d dbSNP151.vcf.gz \
-o annotated.bcf

Annotating against large data files such as gnomAD can be slow. Once a VCF has been annotated, the frequency information it carries can be used for filtering without reading the gnomAD or dbSNP files again. This is the faster option if you expect to apply several different filters to the same VCF.

vase -i annotated.bcf --freq 0.01 -o one_pc_filter.bcf
vase -i annotated.bcf --freq 0.05 -o five_pc_filter.bcf
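When several thresholds are needed, the repeated commands can be wrapped in a loop. The following is a sketch only, using just the `-i`, `--freq` and `-o` options shown above. The function name `filter_at_thresholds`, the `VASE_CMD` override and the `freq_<threshold>.bcf` output naming are all assumptions made for illustration, not VASE features.

```shell
# Hypothetical wrapper (not part of VASE): filter one annotated file at
# several frequency thresholds, using only the -i, --freq and -o options
# shown above. VASE_CMD is an assumed override, handy for dry runs.
filter_at_thresholds() {
    input=$1
    shift
    for freq in "$@"; do
        # Stop at the first failing run rather than continuing silently.
        "${VASE_CMD:-vase}" -i "$input" --freq "$freq" -o "freq_${freq}.bcf" || return 1
    done
}

# Dry run: print the commands instead of running them.
#   VASE_CMD=echo filter_at_thresholds annotated.bcf 0.01 0.05
# Real run:
#   filter_at_thresholds annotated.bcf 0.01 0.05
```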

Filtering with VEP annotations

Several features of VASE rely on VEP annotations. It is recommended to run VEP with the '--everything' flag (and optionally LoF, dbNSFP and dbscSNV plugins) for best use of VASE features.

To output only variants with a HIGH impact consequence for at least one overlapping transcript:

vase -i annotated.vep.bcf --impact HIGH

As above but only if the HIGH impact variant is in a canonical transcript:

vase -i annotated.vep.bcf --impact HIGH --canonical

Familial segregation filtering

Output variants occurring de novo in the affected child(ren) of parent-child trio(s):

vase -i input.bcf --ped trios.ped --de_novo -o naive_dnms.bcf

As above, but using some sensible filters to reduce false positives:

vase -i input.bcf \
--ped trios.ped \
--de_novo \
--het_ab 0.27 \
--control_het_ab 0.05 \
--dp 10 \
--gq 20 \
--freq 1e-5 \
-o dnms.bcf

Output rare variants that have HIGH or MODERATE VEP impact in canonical transcripts and that match a recessive inheritance pattern in families:

vase -i annotated.vep.bcf \
--freq 0.005 \
--impact HIGH MODERATE \
--canonical \
--ped trios.ped \
--recessive \
-o recessives.bcf

Optionally combine the variants and write a report in either XLSX or JSON format:

#concat variants to save running the reporter on both recessive and de novo outputs separately
bcftools concat -O b -o dnms_and_recessives.bcf dnms.bcf recessives.bcf

#write report in Excel format
vase_reporter dnms_and_recessives.bcf dnms_and_recessives.report.xlsx --ped trios.ped

#alternatively, write report in JSON format
vase_reporter dnms_and_recessives.bcf dnms_and_recessives.report.json -o json --ped trios.ped
