Disclaimer: this is a general-purpose repository without a fixed structure. It is a work in progress!
This repository contains a collection of bioinformatics functions used to analyse bulk/single-cell transcriptomics data.
Some of the scripts require rpy2 v2.9.4 to run.
The default_env.yml file contains all the libraries needed to create the environment.
The src folder contains all the relevant scripts and functions:
- network contains network analysis methods to study co-expression networks (motif detection, clustering coefficient, etc.).
- bioinfo contains general methods for differential expression, pathway enrichment analysis, and other analyses.
  - The run_pathifier function is a Python implementation of the Pathifier method developed by Drier et al. This method compresses gene-level information into pathway-level information. The implementation lets you choose between the original method and the PathTracer method. It is also a modified version of both: the optimal number of PCs to use for the principal curve computation is selected by a permutation test on the components (see my_PCA.select_components_above_background for reference).
  - An example of the pipeline is given in the module pathifier_example_structure.py. It is specific to one dataset but can be generalised; the plan is to do so over time.
- data_viz contains custom visualization tools to attach heatmaps to dendrograms, explore pathway enrichment, and draw UMAP plots.
- statistics contains standard statistical tests, for both statistics and machine learning, together with fitting functions, etc.
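
The permutation test on principal components mentioned for run_pathifier can be illustrated with a minimal sketch. This is not the code in my_PCA.select_components_above_background; the function names, parameters, and the rule of keeping only the leading run of significant components are assumptions made for illustration, and only NumPy is used.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component.

    X is a (samples x features) array; components are returned in
    descending order of variance.
    """
    Xc = X - X.mean(axis=0)                    # centre every feature
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    var = s ** 2
    return var / var.sum()


def n_components_above_background(X, n_permutations=100, alpha=0.05, seed=0):
    """Hypothetical helper: count the leading PCs that beat a permuted background.

    Each feature (column) is shuffled independently, which removes the
    correlations between features while keeping each marginal distribution.
    A component is kept when its observed variance ratio is rarely matched
    by the same-ranked component of the shuffled data.
    """
    rng = np.random.default_rng(seed)
    observed = explained_variance_ratio(X)
    exceed = np.zeros_like(observed)

    for _ in range(n_permutations):
        X_perm = np.column_stack([rng.permutation(col) for col in X.T])
        exceed += explained_variance_ratio(X_perm) >= observed

    # Empirical p-value with the usual +1 correction.
    p_values = (exceed + 1) / (n_permutations + 1)
    significant = p_values <= alpha

    # Assumption: keep only the leading run of significant components.
    if significant.all():
        return len(significant)
    return int(np.argmin(significant))  # index of the first non-significant PC
```

On an expression matrix with samples in rows and genes in columns, the returned integer could then be passed as the number of PCs to whatever principal curve routine is in use.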
The example directory contains some output files generated with the scripts contained in this repository.
The repos directory contains some repositories with standard bioinformatics annotation databases (disclaimer: they could be outdated):