Use GATK tool SVAnnotate for functional consequence annotation (#342)
epiercehoffman authored May 4, 2022
1 parent 8aa85e5 commit daf14a9
Showing 21 changed files with 140 additions and 945 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -57,7 +57,7 @@ We still encourage members of the community to adapt GATK-SV for non-GCP backend
* Refer to Cromwell's [documentation](https://cromwell.readthedocs.io/en/stable/backends/Backends/) for configuration instructions.
* The handling and ordering of `glob` commands may differ between platforms.
* Shell commands that are potentially destructive to input files (e.g. `rm`, `mv`, `tabix`) can cause unexpected behavior on shared filesystems. Enabling [copy localization](https://cromwell.readthedocs.io/en/stable/Configuring/#local-filesystem-options) may help to more closely replicate the behavior on GCP.
-* For clusters that do not support Docker, Singularity is an alternative. See [Cromwell documentation on Singularity(https://cromwell.readthedocs.io/en/stable/tutorials/Containers/#singularity).
+* For clusters that do not support Docker, Singularity is an alternative. See [Cromwell documentation on Singularity](https://cromwell.readthedocs.io/en/stable/tutorials/Containers/#singularity).
* The GATK-SV pipeline takes advantage of the massive parallelization possible in the cloud. Local backends may not have the resources to execute all of the workflows. Workflows that use fewer resources or that are less parallelized may be more successful. For instance, some users have been able to run [GatherSampleEvidence](#gather-sample-evidence) on a SLURM cluster.

### Data:
@@ -475,9 +475,9 @@ Combines variants across multiple batches, resolves complex variants, re-genotyp
* Finalized "cleaned" VCF and QC plots

## <a name="module07">Module 07</a> (in development)
-Apply downstream filtering steps to the cleaned vcf to further control the false discovery rate; all steps are optional and users should decide based on the specific purpose of their projects.
+Apply downstream filtering steps to the cleaned VCF to further control the false discovery rate; all steps are optional and users should decide based on the specific purpose of their projects.

-Filterings methods include:
+Filtering methods include:
* minGQ - remove variants based on the genotype quality across populations.
Note: Trio families are required to build the minGQ filtering model in this step. We provide tables pre-trained with the 1000 genomes samples at different FDR thresholds for projects that lack family structures, and they can be found at the paths below. These tables assume that GQ has a scale of [0,999], so they will not work with newer VCFs where GQ has a scale of [0,99].
```
@@ -493,10 +493,10 @@ gs://gatk-sv-resources-public/hg38/v0/sv-resources/ref-panel/1KG/v2/mingq/1KGP_2
## <a name="annotate-vcf">AnnotateVcf</a> (in development)
*Formerly Module08Annotation*

-Add annotations, such as the inferred function and allele frequencies of variants, to final vcf.
+Add annotations, such as the inferred function and allele frequencies of variants, to final VCF.

Annotations methods include:
-* Functional annotation - annotate SVs with inferred function on protein coding regions, regulatory regions such as UTR and Promoters and other non coding elements;
+* Functional annotation - annotate SVs with inferred functional consequence on protein-coding regions, regulatory regions such as UTR and promoters, and other non-coding elements.
* Allele Frequency annotation - annotate SVs with their allele frequencies across all samples, and samples of specific sex, as well as specific sub-populations.
* Allele Frequency annotation with external callset - annotate SVs with the allele frequencies of their overlapping SVs in another callset, eg. gnomad SV callset.

@@ -3,8 +3,6 @@
"AnnotateVcf.vcf_idx" : "${this.vcf_index}",

"AnnotateVcf.protein_coding_gtf" : "${workspace.protein_coding_gtf}",
-"AnnotateVcf.linc_rna_gtf" : "${workspace.linc_rna_gtf}",
-"AnnotateVcf.promoter_bed" : "${workspace.promoter_bed}",
"AnnotateVcf.noncoding_bed" : "${workspace.noncoding_bed}",
"AnnotateVcf.ref_bed" : "${workspace.external_af_ref_bed}",
"AnnotateVcf.ref_prefix" : "${workspace.external_af_ref_bed_prefix}",
@@ -19,6 +17,7 @@

"AnnotateVcf.prefix" : "${this.sample_set_id}",

+"AnnotateVcf.gatk_docker" : "${workspace.gatk_docker}",
"AnnotateVcf.sv_base_mini_docker" : "${workspace.sv_base_mini_docker}",
"AnnotateVcf.sv_pipeline_docker" : "${workspace.sv_pipeline_docker}"
}
@@ -3,8 +3,6 @@
"AnnotateVcf.vcf_idx" : "${this.vcf_index}",

"AnnotateVcf.protein_coding_gtf" : "${workspace.protein_coding_gtf}",
-"AnnotateVcf.linc_rna_gtf" : "${workspace.linc_rna_gtf}",
-"AnnotateVcf.promoter_bed" : "${workspace.promoter_bed}",
"AnnotateVcf.noncoding_bed" : "${workspace.noncoding_bed}",
"AnnotateVcf.ref_bed" : "${workspace.external_af_ref_bed}",
"AnnotateVcf.ref_prefix" : "${workspace.external_af_ref_bed_prefix}",
@@ -19,6 +17,7 @@

"AnnotateVcf.prefix" : "${this.sample_set_set_id}",

+"AnnotateVcf.gatk_docker" : "${workspace.gatk_docker}",
"AnnotateVcf.sv_base_mini_docker" : "${workspace.sv_base_mini_docker}",
"AnnotateVcf.sv_pipeline_docker" : "${workspace.sv_pipeline_docker}"
}
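
The input changes in the two AnnotateVcf JSON files above can be checked mechanically. The following is a minimal Python sketch, not part of the commit: the helper name `check_annotate_vcf_inputs` and the abbreviated sample inputs are hypothetical, and the sets of removed and added keys are inferred from the hunk headers (8 lines down to 6, and 6 lines up to 7).

```python
import json

# Keys inferred from the hunks above: two inputs dropped, one input added.
REMOVED_KEYS = {"AnnotateVcf.linc_rna_gtf", "AnnotateVcf.promoter_bed"}
ADDED_KEYS = {"AnnotateVcf.gatk_docker"}


def check_annotate_vcf_inputs(inputs):
    """Return a list of problems; an empty list means the inputs match the new layout."""
    problems = []
    # Inputs that the updated workflow no longer accepts.
    for key in sorted(REMOVED_KEYS.intersection(inputs)):
        problems.append(f"stale input: {key}")
    # Inputs that the updated workflow now expects.
    for key in sorted(ADDED_KEYS.difference(inputs)):
        problems.append(f"missing input: {key}")
    return problems


# Hypothetical pre-commit inputs, abbreviated for illustration.
old_inputs = json.loads("""{
  "AnnotateVcf.protein_coding_gtf": "${workspace.protein_coding_gtf}",
  "AnnotateVcf.linc_rna_gtf": "${workspace.linc_rna_gtf}",
  "AnnotateVcf.promoter_bed": "${workspace.promoter_bed}",
  "AnnotateVcf.noncoding_bed": "${workspace.noncoding_bed}"
}""")

for problem in check_annotate_vcf_inputs(old_inputs):
    print(problem)
```

Run against an inputs file written for the previous workflow version, a check like this would flag the two stale annotation inputs and the missing `gatk_docker` entry.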
@@ -1,2 +1,2 @@
workspace:cloud_sdk_docker cnmops_docker condense_counts_docker gatk_docker gatk_docker_pesr_override gcnv_gatk_docker genomes_in_the_cloud_docker linux_docker manta_docker samtools_cloud_docker sv_base_docker sv_base_mini_docker sv_pipeline_base_docker sv_pipeline_docker sv_pipeline_hail_docker sv_pipeline_updates_docker sv_pipeline_qc_docker sv_pipeline_rdtest_docker wham_docker allosome_file autosome_file bin_exclude cnmops_exclude_list cohort_ped_file contig_ploidy_priors copy_number_autosomal_contigs cytobands dbsnp_vcf delly_exclude_intervals_file depth_exclude_list empty_file exclude_intervals_for_gcnv_filter_intervals external_af_ref_bed external_af_ref_bed_prefix genome_file inclusion_bed linc_rna_gtf manta_region_bed mei_bed melt_standard_vcf_header noncoding_bed pesr_exclude_list preprocessed_intervals primary_contigs_fai primary_contigs_list promoter_bed protein_coding_gtf reference_build reference_dict reference_fasta reference_index reference_version rmsk segdups seed_cutoffs unpadded_intervals_file wgd_scoring_mask wham_include_list_bed_file chr_x chr_y google_project_id
{{ dockers.cloud_sdk_docker }} {{ dockers.cnmops_docker }} {{ dockers.condense_counts_docker }} {{ dockers.gatk_docker }} {{ dockers.gatk_docker_pesr_override }} {{ dockers.gatk_docker }} {{ dockers.genomes_in_the_cloud_docker }} {{ dockers.linux_docker }} {{ dockers.manta_docker }} {{ dockers.samtools_cloud_docker }} {{ dockers.sv_base_docker }} {{ dockers.sv_base_mini_docker }} {{ dockers.sv_pipeline_base_docker }} {{ dockers.sv_pipeline_docker }} {{ dockers.sv_pipeline_hail_docker }} {{ dockers.sv_pipeline_updates_docker }} {{ dockers.sv_pipeline_qc_docker }} {{ dockers.sv_pipeline_rdtest_docker }} {{ dockers.wham_docker }} {{ reference_resources.allosome_file }} {{ reference_resources.autosome_file }} {{ reference_resources.bin_exclude }} {{ reference_resources.cnmops_exclude_list }} gs://broad-dsde-methods-eph/ped_1kgp_all.ped {{ reference_resources.contig_ploidy_priors }} {{ reference_resources.copy_number_autosomal_contigs }} {{ reference_resources.cytobands }} {{ reference_resources.dbsnp_vcf }} {{ reference_resources.delly_exclude_intervals_file }} {{ reference_resources.depth_exclude_list }} {{ reference_resources.empty_file }} {{ reference_resources.exclude_intervals_for_gcnv_filter_intervals }} {{ reference_resources.external_af_ref_bed | tojson }} {{ reference_resources.external_af_ref_bed_prefix | tojson }} {{ reference_resources.genome_file }} {{ reference_resources.inclusion_bed }} {{ reference_resources.linc_rna_gtf | tojson }} {{ reference_resources.manta_region_bed }} {{ reference_resources.mei_bed }} {{ reference_resources.melt_std_vcf_header }} {{ reference_resources.noncoding_bed | tojson }} {{ reference_resources.pesr_exclude_list }} {{ reference_resources.preprocessed_intervals }} {{ reference_resources.primary_contigs_fai }} {{ reference_resources.primary_contigs_list }} {{ reference_resources.promoter_bed | tojson }} {{ reference_resources.protein_coding_gtf | tojson }} {{ reference_resources.reference_build }} {{ 
reference_resources.reference_dict }} {{ reference_resources.reference_fasta }} {{ reference_resources.reference_index }} {{ reference_resources.reference_version }} {{ reference_resources.rmsk }} {{ reference_resources.segdups }} {{ reference_resources.seed_cutoffs }} {{ reference_resources.unpadded_intervals_file }} {{ reference_resources.wgd_scoring_mask }} {{ reference_resources.wham_include_list_bed_file }} {{ reference_resources.chr_x }} {{ reference_resources.chr_y }} {{ cloud_env.terra_billing_project_id }}
workspace:cloud_sdk_docker cnmops_docker condense_counts_docker gatk_docker gatk_docker_pesr_override gcnv_gatk_docker genomes_in_the_cloud_docker linux_docker manta_docker samtools_cloud_docker sv_base_docker sv_base_mini_docker sv_pipeline_base_docker sv_pipeline_docker sv_pipeline_hail_docker sv_pipeline_updates_docker sv_pipeline_qc_docker sv_pipeline_rdtest_docker wham_docker allosome_file autosome_file bin_exclude cnmops_exclude_list cohort_ped_file contig_ploidy_priors copy_number_autosomal_contigs cytobands dbsnp_vcf delly_exclude_intervals_file depth_exclude_list empty_file exclude_intervals_for_gcnv_filter_intervals external_af_ref_bed external_af_ref_bed_prefix genome_file inclusion_bed manta_region_bed mei_bed melt_standard_vcf_header noncoding_bed pesr_exclude_list preprocessed_intervals primary_contigs_fai primary_contigs_list protein_coding_gtf reference_build reference_dict reference_fasta reference_index reference_version rmsk segdups seed_cutoffs unpadded_intervals_file wgd_scoring_mask wham_include_list_bed_file chr_x chr_y google_project_id
{{ dockers.cloud_sdk_docker }} {{ dockers.cnmops_docker }} {{ dockers.condense_counts_docker }} {{ dockers.gatk_docker }} {{ dockers.gatk_docker_pesr_override }} {{ dockers.gatk_docker }} {{ dockers.genomes_in_the_cloud_docker }} {{ dockers.linux_docker }} {{ dockers.manta_docker }} {{ dockers.samtools_cloud_docker }} {{ dockers.sv_base_docker }} {{ dockers.sv_base_mini_docker }} {{ dockers.sv_pipeline_base_docker }} {{ dockers.sv_pipeline_docker }} {{ dockers.sv_pipeline_hail_docker }} {{ dockers.sv_pipeline_updates_docker }} {{ dockers.sv_pipeline_qc_docker }} {{ dockers.sv_pipeline_rdtest_docker }} {{ dockers.wham_docker }} {{ reference_resources.allosome_file }} {{ reference_resources.autosome_file }} {{ reference_resources.bin_exclude }} {{ reference_resources.cnmops_exclude_list }} gs://broad-dsde-methods-eph/ped_1kgp_all.ped {{ reference_resources.contig_ploidy_priors }} {{ reference_resources.copy_number_autosomal_contigs }} {{ reference_resources.cytobands }} {{ reference_resources.dbsnp_vcf }} {{ reference_resources.delly_exclude_intervals_file }} {{ reference_resources.depth_exclude_list }} {{ reference_resources.empty_file }} {{ reference_resources.exclude_intervals_for_gcnv_filter_intervals }} {{ reference_resources.external_af_ref_bed }} {{ reference_resources.external_af_ref_bed_prefix }} {{ reference_resources.genome_file }} {{ reference_resources.inclusion_bed }} {{ reference_resources.manta_region_bed }} {{ reference_resources.mei_bed }} {{ reference_resources.melt_std_vcf_header }} {{ reference_resources.noncoding_bed }} {{ reference_resources.pesr_exclude_list }} {{ reference_resources.preprocessed_intervals }} {{ reference_resources.primary_contigs_fai }} {{ reference_resources.primary_contigs_list }} {{ reference_resources.protein_coding_gtf }} {{ reference_resources.reference_build }} {{ reference_resources.reference_dict }} {{ reference_resources.reference_fasta }} {{ reference_resources.reference_index }} {{ 
reference_resources.reference_version }} {{ reference_resources.rmsk }} {{ reference_resources.segdups }} {{ reference_resources.seed_cutoffs }} {{ reference_resources.unpadded_intervals_file }} {{ reference_resources.wgd_scoring_mask }} {{ reference_resources.wham_include_list_bed_file }} {{ reference_resources.chr_x }} {{ reference_resources.chr_y }} {{ cloud_env.terra_billing_project_id }}
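
The workspace template pairs a row of attribute names with a row of values, so a column removed from one row (here `linc_rna_gtf` and `promoter_bed`) has to be removed from the other as well. The following is a minimal Python sketch of a column-count check, assuming whitespace-separated columns as rendered on this page (the real file is presumably tab-separated); `pair_columns` is a hypothetical helper, and the three-column example is a shortened excerpt of the rows above.

```python
import re

# A value is either a Jinja expression "{{ ... }}" (which may contain spaces
# or a line wrap) or a bare token such as a gs:// path.
VALUE_PATTERN = re.compile(r"\{\{.*?\}\}|\S+", re.DOTALL)


def pair_columns(header_line, values_line):
    """Map each header name to its value, failing if the column counts differ."""
    names = header_line.split()
    values = VALUE_PATTERN.findall(values_line)
    if len(names) != len(values):
        raise ValueError(
            f"{len(names)} header columns but {len(values)} values"
        )
    return dict(zip(names, values))


# Shortened excerpt of the template rows, for illustration only.
header = "workspace:gatk_docker cohort_ped_file protein_coding_gtf"
values = (
    "{{ dockers.gatk_docker }} "
    "gs://broad-dsde-methods-eph/ped_1kgp_all.ped "
    "{{ reference_resources.protein_coding_gtf }}"
)

for name, value in pair_columns(header, values).items():
    print(name, "->", value)
```

Treating each `{{ ... }}` expression as a single token is what keeps the spaces inside the Jinja delimiters from being counted as column breaks.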
@@ -94,8 +94,6 @@
"GATKSVPipelineSingleSample.run_vcf_qc" : false,

"GATKSVPipelineSingleSample.protein_coding_gtf" : "${workspace.reference_protein_coding_gtf}",
-"GATKSVPipelineSingleSample.linc_rna_gtf" : "${workspace.reference_linc_rna_gtf}",
-"GATKSVPipelineSingleSample.promoter_bed" : "${workspace.reference_promoter_bed}",
"GATKSVPipelineSingleSample.noncoding_bed" : "${workspace.reference_noncoding_bed}",
"GATKSVPipelineSingleSample.external_af_ref_bed" : "${workspace.reference_external_af_ref_bed}",
"GATKSVPipelineSingleSample.external_af_ref_bed_prefix" : "${workspace.reference_external_af_ref_bed_prefix}",