Skip to content

This repository was created to upload the scripts used for population genetics analysis of the XX paper

Notifications You must be signed in to change notification settings

humphreyaddy/Popgen_bacteria

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 

Repository files navigation

Estimating adaptive evolution among bacterial species

This repository was created to upload the scripts used for the population genetics analyses of the paper: "Recombination facilitates adaptive evolution in rhizobial soil bacteria", Cavassim et al., 2021.

Primary steps (previously published in Cavassim et al., 2020)

SNP-calling

python script: https://github.com/izabelcavassim/Rhizobium_analysis/#snp-calling

Classifying synonymous and non-synonymous sites

python script: hdf5_call_variants.py

Estimating transition transversion bias and non-synonysmous and synonymous sites

python script: ts_tv_calc.py

Computing the DFE and alpha

We used the software GRAPES to estimate the DFE across pairs of species. We first computed the folded Site frequency spectrum of synonysmous and non-synonymous sites using the customized python script:

Grapes_files_pairwise_quantile.py

To produce a text file ({species1}_{species2}_grapes.txt) that looks like this one:

gsE+gsD (548.000000 genes)
'all_genes'	11	547966.6905329467	365	257	354	107	259	149830.30946705345	2698	2220	2469	1062	1952	547966.6905329467	963	149830.30946705345	8810

In which species_1 (this case gsE) is the focal group (polymorphism) and species 2 (in this case gsD) is the outgroup (divergence). A description of each entry is found here.

And then for each input file generated we ran GRAPES as follow:

/home/mica16/grapes-izabel-work/bin/grapes -in {inputs_dir}{species1}_{species2}_grapes.txt -out {results_dir}{species1}_{species2}_grapes_output.txt -nb_rand_start 20 -model all

The same procedure is done for computing alpha across recombination classes.

Computing LD per gene

python script: LD_per_gene.py

Workflow

For my own sake I created a python workflow that combines all the described analyses above. I used the software gwf to build and run this workflow.

Workflow pipeline (workflow.py), scripts and files connected to it are found here.

Feel free to contact me if you want to apply these scripts in your own dataset or if you have any questions.

About

This repository was created to upload the scripts used for population genetics analysis of the XX paper

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%