Code repository for the paper "Genetic-based patient stratification in Alzheimer’s disease". Its contents follow the methodology and results presented in that paper.
The results obtained for the manuscript are organized in the following notebooks:
- 00_make_datasets - generate the edge score datasets used as input for clustering.
- 01_clustering - perform clustering on the edge score data (Similarity Network Fusion + spectral clustering).
- 02_bio_analysis - characterize the clusters in terms of significant differences, important genetic variants, etc.
- 03_clusters_description - describe the clusters in terms of sociodemographic, clinical, and biomarker data.
- 04_neurocognitive_analysis - obtain neurocognitive networks for each cluster.
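
As a rough illustration of the clustering step in 01_clustering, the sketch below runs spectral clustering on a fused patient-similarity matrix. It is not the code used in the notebooks: the data are synthetic, the two "views" are hypothetical, and the fusion is a plain average of affinity matrices standing in for Similarity Network Fusion, which instead updates each network iteratively using the others. It assumes numpy and scikit-learn are installed.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 20 "patients" in two separated groups,
# each described by two hypothetical views, e.g. two sets of edge scores.
n_per_group = 10
view_a = np.vstack([rng.normal(0.0, 0.5, (n_per_group, 5)),
                    rng.normal(3.0, 0.5, (n_per_group, 5))])
view_b = np.vstack([rng.normal(0.0, 0.5, (n_per_group, 8)),
                    rng.normal(3.0, 0.5, (n_per_group, 8))])

# One patient-by-patient affinity matrix per view (RBF kernel, default gamma).
affinities = [rbf_kernel(view_a), rbf_kernel(view_b)]

# Simplified fusion (assumption): element-wise mean of the affinity matrices.
# Real SNF replaces this step with an iterative cross-network diffusion.
fused = np.mean(affinities, axis=0)

# Spectral clustering on the fused, precomputed affinity matrix.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(fused)

print(labels)
```

Passing `affinity="precomputed"` tells scikit-learn to treat the input as a similarity matrix rather than as raw features, which is what allows any fused network to be clustered this way.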
These notebooks call several scripts or take their results as input. The scripts are:
- extract_variants.sh - bash script for extracting genetic variants from the original VCF files.
- bio_networks.py - script for obtaining the biological network employed.
- obtain_edges_scores.py - script for obtaining the individual edge scores.
- utils_description.py - script containing several functions for data and statistical analyses.
- neurocognitive_analysis_utils.py - script containing several functions for obtaining neurocognitive networks and performing statistical analyses. It is based on the original work by Ana Solbas at asolbas/AD-CogNet.
Other directories in this repository:
- data contains several data files used in this work.
- figures contains the figures obtained for this work and presented in the paper.
Please note that several files are unavailable in this repository for privacy reasons: the raw genetic data, the datasets built from it, and the other data files used to describe the clusters (sociodemographic, clinical, neuroimaging, biomarker, and neurocognitive data). ADNI data can be accessed upon formal request at adni.loni.usc.edu/data-samples/access-data.