This repository holds all the code that was used in the Acromyrmex inquiline comparative genomics project.
These files hold the workflows to analyse the basic quality of the genomes used in the comparison.
# mac
macbase=/Users/lukas/sciebo/inquilineGenomics18/
# pallas
base=/corefac/cse/lukas/inqGen18/
The latest assemblies and annotations delivered from BGI are located on Pallas. The raw files are located at /usr/local/home/lschrader/data/genomes/inquilines_2.0/
Acromyrmex charruanus: ./Acha.files.txt
Acromyrmex insinuator: ./Ains.files.txt
Acromyrmex heyeri: ./Ahey.files.txt
Pseudoatta argentina: ./Parg.files.txt
Clean genomes and annotations of fungal and mitochondrial scaffolds.
./QC/filterContaminants/inquilines2.0-to-2.1.sh
# location on pallas
$base/inquilines_v2.1/
$base/inquilines_v2.1/Acromyrmex_charruanus.2.1/Acromyrmex_charruanus.v2.1.files.tree
$base/inquilines_v2.1/Acromyrmex_heyeri.2.1/Acromyrmex_heyeri.v2.1.files.tree
$base/inquilines_v2.1/Acromyrmex_insinuator.2.1/Acromyrmex_insinuator.v2.1.files.tree
$base/inquilines_v2.1/Pseudoatta_argentina.2.1/Pseudoatta_argentina.v2.1.files.tree
Code, programs and scripts used for quality control of assemblies and annotations are provided below.
BUSCO (v3) was run on v2.1 assembly and annotation of the new genomes and on the reannotated previously published attine genomes.
# newly sequenced genomes
./QC/busco/runBUSCO_inq.v2.1.sh
# previously published attines
./QC/busco/runBUSCO_7ants.sh
Quast (v5.0.2) was run to assess N50 contig and scaffold sizes and other assembly parameters.
./QC/N50/runQuast.sh
TransposonPSI (v.08222010) was run to identify putative TE-encoded proteins.
./QC/transposonPSI/runTransposonPsi.sh
GAG v2.0.1 was run on all newly sequenced genomes (see ./QC/filterContaminants/inquilines2.0-to-2.1.sh
) and also on reannoated attines.
./QC/gag/runGAG_7attines.sh
I used orthofinder (v.2.2.6) to identify orthologs and orthogroups between different species.
./orthology/runOrthofinder.sh
Alignment of single copy orthologs, testing for recombination and extraction of 4d sites.
./singleCopyOrthologs/runSCOalignment.sh
Phylogenetic inference with PAUP, PhyML, MrBayes, and RaxML
./phylogeny/runPhylogeny.sh
Divergence estimates with mcmcTree
./TreeDating/treeDating.workflow.sh
The complete workflow for analysing gene family evolution with CAFE is described here:
./geneFamilies/README.md
# This readme refers to the following scripts:
run.geneFamilies.mcl.sh
run.CAFError.sh
run.CAFE.sh
analyseCAFEmodels.Rmd
CAFE.lhtest.R
analyseCAFEoutput.Rmd
Demographic history for A. insinuator and A. echinatior with MSMC2:
./Ne/MSMC2.refAech.md ./Ne/plotMSMC2.Rmd
Evolutionary rate analyses and signatures of selection (SOS) on single copy orthologs with HyPhy (v2.3.14)
./SOS/AnalyseDNDS.hyphy.sh
./SOS/RELAX.summary.Rmd
./SOS/absREL.summary.Rmd
./SOS/analyseSOS.Rmd
GO enrichment analyses were run on genes under relaxed, intensified, and positive selection and on shrinking gene families.
./GOenrichment/RELAX.GOenrichment.Rmd
./GOenrichment/absREL.GOenrichment.Rmd
./GOenrichment/CAFE.GOenrichment.Rmd
# scripts for revigo plots
./GOenrichment/revigoPlots/
Whole genome alignment with progressiveCactus v0.1
./WGA/run.progressiveCactus.sh
Synteny was analyzed with i-adhore v3.0.1
./synteny/run.iadhore.sh
./synteny/analyse.synteny.Rmd
# analysis of annotated OR genes only
./synteny/run.iadhore.ORs.sh
./synteny/plotOR.synteny.Rmd
The full annotation pipeline is contained in
./ORevolution/clean/ORannotation.190408.sh
The following scipts are called in the above bash script:
ORannotation.190408.sh
filterForDomain.sh
runGeMoMa.pipe.sh
runExonerate.sh
runEVM.sh
prepareEVM.sh
addIncompleteGenes.sh
rename.R
runExonBlast.sh
runGeMoMa.sh
runExonerate.pl
processExonerate2Fa.pl
cleanExonerateGFF.pl
translateCDS2AA.pl
processExonerate2GFF.pl
addGeneGFF.pl
identify_geneLS.pl
hsp2bed.pl
# final annotations appropriate for v2.1 and published attine genomes
./ORevolution/results/annotations/
Scripts for OR alignments, alignment QC, tree inference, tree plotting, and analyses.
./ORphylogeny/ORphylogeny.sh
./ORphylogeny/ORplotTreeClean.Rmd
./ORphylogeny/analyseTree.Rmd
Scripts for clade-wide OR alignments, alignment QC, tree inference, tree plotting, and analyses.
./ORevolution/reconciliation/allClades.Rmd
./ORevolution/reconciliation/DLCpar.md
./ORevolution/reconciliation/plotReconTreeDLCpar.Rmd
./ORevolution/reconciliation/runAllDLCpar.sh
./ORevolution/OR.SOS/hyphy.ORs.sh
./ORevolution/OR.SOS/summary.RELAX.OR.Rmd
./ORevolution/OR.SOS/summary.absREL.OR.Rmd
# Detailed analysis of clade 285
./ORevolution/OR.SOS/analyseClade285.Rmd
# MRJPs
./genePhylogenies/MRJPevolution/MRJP.annotation.sh
./genePhylogenies/MRJPevolution/MRJP.phylogeny.sh
./genePhylogenies/MRJPevolution/plotMRJP.tree.Rmd
./genePhylogenies/MRJPevolution/mrjp.analyseTree.Rmd
# Gustatory Receptors (GR)
./genePhylogenies/GRevolution/annotateGRs.sh
./genePhylogenies/GRevolution/phylogenyGRs.sh
./genePhylogenies/GRevolution/plotGRtree.Rmd
# Elongases
./genePhylogenies/ELONGASEevolution/annotate.ELNG.sh
./genePhylogenies/ELONGASEevolution/phylogeny.ELNG.sh
./genePhylogenies/ELONGASEevolution/plotELGtree.Rmd
# Cuticular Proteins (CPRs)
./genePhylogenies/CPRevolution/annotateCPR.sh
./genePhylogenies/CPRevolution/phylogenyCPR.sh
./genePhylogenies/CPRevolution/plotCPRtree.Rmd
./genePhylogenies/barplot.Rmd