Home
VASE stands for Variant Annotation, Segregation and Exclusion. It provides a program to filter and annotate variant data (in VCF/BCF format) according to user-specified criteria. Example use cases include:
- Annotating/filtering variants on frequency data from public databases (e.g. gnomAD, dbSNP)
- Selecting variants based on VEP consequence annotations
- Filtering variants based on presence in cases vs control samples
- Filtering variants based on familial segregation/inheritance patterns
- Filtering variants/genotypes based on depth data and quality metrics
VASE requires python3 and can be installed using pip or the setup.py script included in the repo. Assuming your system has git and pip installed, the easiest way to install VASE with full functionality is as follows:
pip3 install git+https://github.com/david-a-parry/vase.git#egg=project[BGZIP,REPORTER,MYGENE] --user
For more detailed installation instructions see the README.
Use exome data from gnomAD to remove variants with an allele frequency of 1% or higher (in any gnomAD population).
vase -i input.bcf --freq 0.01 -g gnomad.exomes.r2.1.1.sites.vcf.bgz -o rare.bcf
Annotate gnomAD and dbSNP frequencies but do not filter.
vase -i input.bcf \
-g gnomad.exomes.r2.1.1.sites.vcf.bgz gnomad.genomes.r2.1.1.sites.vcf.bgz \
-d dbSNP151.vcf.gz \
-o annotated.bcf
Annotation with large data files such as gnomAD can be slow. If you expect to apply several different filters to the same VCF, it is faster to annotate once and then filter on the frequency information already present in the annotated file:
vase -i annotated.bcf --freq 0.01 -o one_pc_filter.bcf
vase -i annotated.bcf --freq 0.05 -o five_pc_filter.bcf
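When more than a couple of thresholds are needed, the same pattern can be generated in a loop. The sketch below is a dry run: it only writes the commands to a script for review and does not call vase itself. The extra 0.001 threshold and the output file names are illustrative assumptions, not part of VASE.

```shell
# Dry run: write one vase command per frequency threshold to a script.
# Thresholds and output names are illustrative; adjust them to your analysis.
for freq in 0.001 0.01 0.05; do
    echo "vase -i annotated.bcf --freq ${freq} -o freq_${freq}.bcf"
done > filter_commands.sh

# Show what would be run
cat filter_commands.sh
```

Once the commands look right, they can be run with `sh filter_commands.sh`.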
Several features of VASE rely on VEP annotations. It is recommended to run VEP with the '--everything' flag (and optionally LoF, dbNSFP and dbscSNV plugins) for best use of VASE features.
To output only variants with a HIGH impact consequence for at least one overlapping transcript:
vase -i annotated.vep.bcf --impact HIGH
As above but only if the HIGH impact variant is in a canonical transcript:
vase -i annotated.vep.bcf --impact HIGH --canonical
Output variants occurring de novo in the affected child(ren) of one or more parent-child trios:
vase -i input.bcf --ped trios.ped --de_novo -o naive_dnms.bcf
As above, but using some sensible filters to reduce false-positives:
vase -i input.bcf \
--ped trios.ped \
--de_novo \
--het_ab 0.27 \
--control_het_ab 0.05 \
--dp 10 \
--gq 20 \
--freq 1e-5 \
-o dnms.bcf
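The allele balance thresholds are easier to judge with a worked number. The sketch below assumes that allele balance means the fraction of reads supporting the alternate allele (alt reads divided by ref plus alt reads, as in a VCF AD field); the read counts are made up, and the exact way VASE applies --het_ab and --control_het_ab should be checked against `vase --help`.

```shell
# ab <ref_reads> <alt_reads>: print the fraction of reads supporting the alt allele.
# Assumption: this mirrors what "allele balance" means for the options above.
ab() {
    awk -v ref="$1" -v alt="$2" 'BEGIN { printf "%.3f\n", alt / (ref + alt) }'
}

child_ab=$(ab 18 12)   # hypothetical child genotype with AD=18,12
parent_ab=$(ab 39 1)   # hypothetical parent genotype with AD=39,1

echo "child AB:  ${child_ab}"    # 0.400, above the 0.27 passed to --het_ab
echo "parent AB: ${parent_ab}"   # 0.025, below the 0.05 passed to --control_het_ab
```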
Output rare variants that have a HIGH or MODERATE VEP impact in a canonical transcript and match a recessive inheritance pattern in families:
vase -i annotated.vep.bcf \
--freq 0.005 \
--impact HIGH MODERATE \
--canonical \
--ped trios.ped \
--recessive \
-o recessives.bcf
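The --ped option used in the de novo and recessive examples takes a pedigree file. Assuming the standard six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype), a minimal trios.ped for a single trio could be written as follows. All IDs here are hypothetical, and the individual IDs would need to match the sample names in your VCF.

```shell
# Write a minimal pedigree file for one trio (all IDs are hypothetical).
# Columns: family, individual, father, mother, sex (1=male, 2=female),
# phenotype (1=unaffected, 2=affected). A parent ID of 0 means "not in this file".
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    FAM1 child1  father1 mother1 1 2 \
    FAM1 father1 0       0       1 1 \
    FAM1 mother1 0       0       2 1 > trios.ped

# Sanity check: every line should have exactly six fields
awk 'NF != 6 { bad = 1 } END { exit bad }' trios.ped && echo "trios.ped has six columns per line"
```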
Optionally combine variants and write report in either XLSX or JSON format:
#concat variants to save running the reporter on both recessive and de novo outputs separately
bcftools concat -O b -o dnms_and_recessives.bcf dnms.bcf recessives.bcf
#write report in Excel format
vase_reporter dnms_and_recessives.bcf dnms_and_recessives.report.xlsx --ped trios.ped
#alternatively, write report in JSON format
vase_reporter dnms_and_recessives.bcf dnms_and_recessives.report.json -o json --ped trios.ped