
RNA-seq pipeline: HISAT2, StringTie, Ballgown

Ricky Woo edited this page Dec 1, 2017 · 1 revision

1. Download the GRCh38 genome sequence and gene annotation (GTF) from Ensembl

ENSEMBL_RELEASE=80
ENSEMBL_GRCh38_BASE=ftp://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}/fasta/homo_sapiens/dna
ENSEMBL_GRCh38_GTF_BASE=ftp://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}/gtf/homo_sapiens
F=Homo_sapiens.GRCh38.dna.primary_assembly.fa
GTF_FILE=Homo_sapiens.GRCh38.${ENSEMBL_RELEASE}.gtf

## download the genome file
curl -o $F.gz ${ENSEMBL_GRCh38_BASE}/$F.gz
gunzip $F.gz
mv $F genome.fa

## download the gene annotation (GTF) file
curl -o ${GTF_FILE}.gz ${ENSEMBL_GRCh38_GTF_BASE}/${GTF_FILE}.gz
gunzip ${GTF_FILE}.gz
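
Before starting a download of this size, it can help to see exactly which URLs the variables expand to. The following is a minimal sketch that only prints them and does not touch the network. The names `ENSEMBL_BASE`, `GENOME_URL` and `GTF_URL` are introduced here for illustration and are not part of the commands above.

```shell
# Sketch: print the URLs that the download step would fetch.
# ENSEMBL_BASE, GENOME_URL and GTF_URL are hypothetical helper names.
ENSEMBL_RELEASE=80
ENSEMBL_BASE=ftp://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}
F=Homo_sapiens.GRCh38.dna.primary_assembly.fa
GTF_FILE=Homo_sapiens.GRCh38.${ENSEMBL_RELEASE}.gtf

# Same paths as in the commands above, assembled into full URLs.
GENOME_URL=${ENSEMBL_BASE}/fasta/homo_sapiens/dna/${F}.gz
GTF_URL=${ENSEMBL_BASE}/gtf/homo_sapiens/${GTF_FILE}.gz

echo "$GENOME_URL"
echo "$GTF_URL"
```

Changing `ENSEMBL_RELEASE` in one place updates both URLs, since the release number appears in the directory path and in the GTF file name.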

2. Extract the splice site and exon information

hisat2_extract_splice_sites.py ${GTF_FILE} > genome.ss
hisat2_extract_exons.py ${GTF_FILE} > genome.exon
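
A redirect such as `> genome.ss` creates the output file even when the script fails (for example, when the input path is wrong), so an empty file can slip through to the index build unnoticed. A minimal sketch of a guard, assuming a hypothetical helper named `check_nonempty` that is not part of HISAT2:

```shell
# Hypothetical helper: fail if any listed file is missing or empty.
# Usage: check_nonempty genome.ss genome.exon
check_nonempty() {
  for f in "$@"; do
    # -s is true only when the file exists and has a size greater than zero
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  return 0
}
```

Running `check_nonempty genome.ss genome.exon` after the extraction step returns a non-zero status and names the offending file if either output is unusable.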

3. Build the index

mkdir -p hisat2
hisat2-build -p 4 --ss genome.ss --exon genome.exon genome.fa hisat2/genome_tran
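
The last argument, `hisat2/genome_tran`, is a file name prefix rather than a single file: a standard (small) HISAT2 index is written as eight files, `genome_tran.1.ht2` through `genome_tran.8.ht2`. A minimal sketch for checking that the build finished, assuming a hypothetical helper named `count_index_files`. Large indexes use the `.ht2l` extension instead and would not be counted by this version.

```shell
# Hypothetical helper: count the *.ht2 files present for an index prefix.
# Usage: count_index_files hisat2/genome_tran
count_index_files() {
  prefix=$1
  n=0
  for f in "$prefix".*.ht2; do
    # When nothing matches, the glob stays literal, so test for existence.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

A count below 8 for `hisat2/genome_tran` suggests the build was interrupted or ran out of memory, in which case rerunning `hisat2-build` is the simplest fix.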

A bioinformatics wiki for the course BI462.
