Introduction to Variant Calling

Learning Objectives

  • Evaluate QC metrics for variant calling
  • Call variants using GATK
  • Filter variants to retain only high-quality variant calls
  • Annotate variants using SnpEff and dbSNP
  • Prioritize variants by their impact
  • Visualize variants in IGV

Installations

On your desktop

  1. FileZilla Client (make sure you get 'FileZilla Client')
  2. Integrative Genomics Viewer (IGV)

On your HPCC (if not using Harvard's O2 cluster)

Required

  1. FastQC version 0.11.9
  2. bwa version 0.7.17
  3. Picard version 2.27.5
  4. MultiQC version 1.12
  5. GATK version 4.1.9.0
  6. SnpEff and SnpSift suite version 4.3g
  7. bcftools version 1.14

Optional

  1. samtools version 1.15.1
  2. bedtools version 2.30.0

NOTE: If you are not working on the O2 cluster and are using different versions of these programs, the provided commands may still work. However, this workshop was designed around these specific versions, so you may need to adjust some of the commands if your versions differ.
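If you are setting up on a different cluster, it can help to record which of your installed versions differ from the ones listed above. The following is a minimal sketch, not part of the workshop materials: the version numbers come from the Required list, while the function name `find_mismatches` and the example `my_versions` values are hypothetical. You would fill in `my_versions` by hand after checking each tool on your own system.

```python
# Versions this workshop was designed on (copied from the Required list above).
REQUIRED_VERSIONS = {
    "FastQC": "0.11.9",
    "bwa": "0.7.17",
    "Picard": "2.27.5",
    "MultiQC": "1.12",
    "GATK": "4.1.9.0",
    "SnpEff/SnpSift": "4.3g",
    "bcftools": "1.14",
}


def find_mismatches(installed):
    """Return {tool: (required, installed_or_None)} for each tool whose
    installed version is missing or differs from the workshop version."""
    mismatches = {}
    for tool, required in REQUIRED_VERSIONS.items():
        found = installed.get(tool)  # None if the tool was not recorded
        if found != required:
            mismatches[tool] = (required, found)
    return mismatches


# Hypothetical example values: replace with what you find on your own HPCC.
my_versions = {
    "FastQC": "0.11.9",
    "bwa": "0.7.17",
    "Picard": "2.27.5",
    "MultiQC": "1.12",
    "GATK": "4.2.0.0",
    "bcftools": "1.14",
}

for tool, (required, found) in find_mismatches(my_versions).items():
    status = "not installed" if found is None else f"installed {found}"
    print(f"{tool}: workshop uses {required}, {status}")
```

Any tool reported here is a candidate for the command adjustments mentioned in the note above; a mismatch does not necessarily mean the commands will fail.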

Lessons

  1. ICGC-TCGA DREAM Mutation Calling Challenge Synthetic Dataset
  2. Project Organization
  3. File Formats
  4. Evaluating Read Quality with FastQC
  5. Sequence Read Alignment
  6. Alignment File Processing
  7. Alignment File Quality Control
  8. Evaluating Quality Control Metrics
  9. Variant Calling
  10. Variant Filtering
  11. Variant Annotation with SnpEff
  12. Automation of Variant Calling Pipeline
  13. Variant Prioritization with SnpSift
  14. Visualization in IGV

NOTE: If you aren't working on Harvard's O2 cluster, the directory structure of the HPCC that you are using is likely different, and you will need to modify paths to work within your HPCC's directory structure.
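As a rough illustration of what modifying paths can look like, the sketch below swaps one root directory for another while keeping the rest of the path intact. Everything in it is an assumption for illustration: the helper name `rebase_path`, both root directories, and the `variant_calling/raw_data` subdirectory are hypothetical and do not come from the lessons.

```python
from pathlib import PurePosixPath


def rebase_path(path, old_root, new_root):
    """Replace the leading old_root of path with new_root.
    Paths that do not start with old_root are returned unchanged."""
    p = PurePosixPath(path)
    try:
        relative = p.relative_to(old_root)
    except ValueError:
        # path is not located under old_root, so leave it alone
        return str(p)
    return str(PurePosixPath(new_root) / relative)


# Hypothetical roots: substitute the path used in a lesson and your own directory.
WORKSHOP_ROOT = "/workshop/root"
MY_ROOT = "/my/cluster/home"

print(rebase_path("/workshop/root/variant_calling/raw_data", WORKSHOP_ROOT, MY_ROOT))
```

In practice you may simply edit the paths by hand in each command; the point is that only the cluster-specific prefix should need to change, not the project layout beneath it.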

Answer key


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.