PREREQUISITE PREPARATION

This guide follows the GATK4 best practices workflow: GATK_Germline_Short_Variant_Discovery

Tools

1. bcftools

Please download and install bcftools from here; it is used to manipulate VCF files.

For instructions on building and installing it correctly, please refer to this htslib_Guide or the latest release guide.

2. htslib

Please download and install htslib from here.

Please see this htslib_latest guide for instructions on installing it correctly.

3. GATK

This variant discovery practice uses GATK v4.4.0.0.

The zip file for this version can be downloaded here.

Please see the in-depth tutorial in the GATK GitHub repository (broadinstitute/gatk), in particular these sections:

Requirements
Downloading GATK
Building GATK
Running GATK
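
Once the three tools are installed, a quick check that they are reachable from the shell can save time later. The following is a minimal sketch, not part of the original workflow: the function name `missing_tools` is hypothetical, and it assumes that `bgzip` and `tabix` (installed with htslib) and the `gatk` wrapper script (shipped with GATK) have been placed on your PATH.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper name used only for this sketch.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: bgzip and tabix come from htslib; gatk is the GATK wrapper script.
missing=$(missing_tools bcftools bgzip tabix gatk)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All tools found."
fi
```

If a tool is reported as missing, revisit its installation guide above or add its install directory to PATH.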

Reference genome for human chromosome 21 and necessary databases

Please download the reference genome and the databases required for this variant calling practice. For each database, always download both the VCF file and its .tbi index file.

1. Reference genome

Navigate to the UCSC sequence data by chromosome using this link: UCSC_hg38_sequence_data_by_chromosome, then download the "chr21.fa.gz" reference FASTA file.

OR

wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz
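
After the download, it may be worth confirming that the archive is intact and really contains chromosome 21. The sketch below is only an illustration: `first_fasta_header` is a hypothetical helper name, and the script assumes the file was saved as `chr21.fa.gz` in the current directory.

```shell
# Print the first line of a gzip-compressed FASTA file, i.e. its first sequence header.
# (first_fasta_header is a hypothetical helper name used only for this sketch.)
first_fasta_header() {
    # gzip may complain when head closes the pipe early, so its stderr is silenced.
    gzip -cd "$1" 2>/dev/null | head -n 1
}

ref=chr21.fa.gz   # assumption: the file name produced by the wget command above
if [ -f "$ref" ]; then
    # gzip -t tests the compressed file for corruption without extracting it.
    gzip -t "$ref" && echo "gzip integrity OK"
    first_fasta_header "$ref"
fi
```

The printed header line should identify chromosome 21; a truncated download typically fails the `gzip -t` step.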

2. dbsnp_146 database

VCF file

wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz

INDEX file

wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz.tbi

3. 1000G_omni2.5.hg38 known snps database

VCF file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz

INDEX file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi

4. 1000G_phase1 snps with high confidence hg38 database

VCF file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz

INDEX file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi

5. Homo sapiens assembly38 known indels database

VCF file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz

INDEX file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi

6. Mills and 1000G gold standard for known indels hg38 database

VCF file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz

INDEX file

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
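
Because every database above comes as a VCF file plus a .tbi index, a short check for unpaired files can catch an incomplete download. This is a minimal sketch under the assumption that all files were saved into one directory; `vcfs_missing_index` is a hypothetical helper name.

```shell
# List every .vcf.gz file in a directory that has no matching .tbi index beside it.
# (vcfs_missing_index is a hypothetical helper name used only for this sketch.)
vcfs_missing_index() {
    local dir="${1:-.}" vcf
    for vcf in "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue          # the glob matched nothing
        [ -e "$vcf.tbi" ] || echo "$vcf"   # report a VCF without its index
    done
}

# Check the current directory; no output means every VCF has its index.
vcfs_missing_index .
```

For any file reported, either download its index again or rebuild it locally, for example with `tabix -p vcf <file>.vcf.gz` from htslib.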

Dataset for practice

Please download both the BAM file and its BAI index file to follow this practice. Both files can be found in the test_data folder.
