This tutorial follows the GATK4 best practices workflow: GATK_Germline_Short_Variant_Discovery
- PREREQUISITE PREPARATION
Please download and install bcftools from here; it is used to manipulate VCF files.
Please also refer to this htslib_Guide or the latest release guide for how to build and install it properly.
Please download and install htslib from here.
Please have a look at this htslib_latest guide for how to install it correctly.
This variant discovery practice uses GATK v4.4.0.0.
The zip file for this version can be downloaded here.
Please have a look at the in-depth tutorial on the GATK GitHub: broadinstitute/gatk.
Please download the reference genome and the databases required for this variant calling practice. Always download both the VCF file and its .tbi index file.
Navigate to the UCSC sequence data by chromosome using this link: UCSC_hg38_sequence_data_by_chromosome, then download the "chr21.fa.gz" reference FASTA file.
OR
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz
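After the download finishes, it can be worth confirming that the archive is intact before using it as a reference. The sketch below assumes the file was saved as chr21.fa.gz in the current directory; the helper name check_fasta_gz is hypothetical and not part of any tool used in this tutorial.

```shell
# Hypothetical helper: verify a gzipped FASTA and print its first header line.
check_fasta_gz() {
    fasta_gz=$1
    # gzip -t tests archive integrity without writing any output file.
    gzip -t "$fasta_gz" || { echo "corrupt archive: $fasta_gz" >&2; return 1; }
    # The first line of a FASTA file is a header starting with '>'.
    gzip -dc "$fasta_gz" | head -n 1
}

# Run the check only if the reference has already been downloaded.
if [ -f chr21.fa.gz ]; then
    check_fasta_gz chr21.fa.gz
fi
```

For the UCSC file, the printed header is expected to name the chromosome (chr21). A non-zero exit status suggests the download was truncated and should be repeated.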
- VCF file
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz
- INDEX file
wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz.tbi
- VCF file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz
- INDEX file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi
- VCF file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz
- INDEX file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
- VCF file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz
- INDEX file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
- VCF file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
- INDEX file
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
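The four resources hosted under the same storage.googleapis.com path can also be fetched in a loop, so that no index file is forgotten. This is only a sketch: the base URL and file names are copied from the commands above, while the function names list_resource_urls and download_resources are hypothetical. The dbSNP file lives on a different server and is not covered here.

```shell
# Base URL copied from the wget commands above.
BASE_URL="https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0"

# Resource VCFs hosted under BASE_URL (names taken from the commands above).
RESOURCES="1000G_omni2.5.hg38.vcf.gz
1000G_phase1.snps.high_confidence.hg38.vcf.gz
Homo_sapiens_assembly38.known_indels.vcf.gz
Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"

# Print the URL of each VCF, followed by the URL of its .tbi index.
list_resource_urls() {
    for name in $RESOURCES; do
        printf '%s/%s\n' "$BASE_URL" "$name"
        printf '%s/%s.tbi\n' "$BASE_URL" "$name"
    done
}

# Download every listed URL; -c resumes a partially downloaded file.
download_resources() {
    list_resource_urls | while read -r url; do
        wget -c "$url"
    done
}

# Dry run: show what would be fetched without downloading anything.
list_resource_urls
```

Calling download_resources afterwards performs the actual downloads. Keeping the URL listing separate from the download step makes it easy to review the list first.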
Please download both the BAM file and its BAI index file to follow this practice. Both files can be found in the test_data folder.
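Since every VCF and BAM file needs its index next to it, a quick check before starting can catch a forgotten download. The sketch below is an assumption about how one might do this; the helper name missing_indexes is hypothetical, and it assumes BAM indexes are named either <name>.bam.bai or <name>.bai.

```shell
# Hypothetical helper: list data files in a directory that lack an index.
missing_indexes() {
    dir=$1
    for f in "$dir"/*.vcf.gz "$dir"/*.bam; do
        [ -e "$f" ] || continue          # skip glob patterns that matched nothing
        case $f in
            *.vcf.gz)
                # tabix indexes sit next to the VCF as <name>.vcf.gz.tbi
                [ -e "$f.tbi" ] || echo "$f"
                ;;
            *.bam)
                # accept either <name>.bam.bai or <name>.bai
                [ -e "$f.bai" ] || [ -e "${f%.bam}.bai" ] || echo "$f"
                ;;
        esac
    done
}

# Check the current directory; no output means nothing is missing.
missing_indexes .
```

To check the sample data instead, pass its folder, for example missing_indexes test_data.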