This repository contains the data, scripts, and analyses used in the research titled "Unravelling the Co-Morbidity between COVID-19 and Neurodegenerative Diseases Through Multi-scale Graph Analysis: A Systematic investigation of Biological Databases and Text Mining". The project uses the Neo4j platform for graph-based analysis and integrates natural language processing to explore relationships between COVID-19 and neurodegenerative diseases (NDDs).
- Overview
- Data
- Sources
- Notebooks
- Getting Started
- Exploring the Covid-NDD Comorbidity Database
- Contact
This project explores the connections between COVID-19 and neurodegenerative diseases by:
- Integrating database information about COVID-19 and NDDs and storing them in a graph structure.
- Extracting textual data from scientific literature and using natural language processing pipelines for information extraction and knowledge graph (KG) construction.
- Loading all KGs into Neo4j to identify and analyse relationships and pathways between entities such as genes, diseases, and chemicals.
- Constructing a hypothesis database for comorbidity between COVID-19 and NDDs to explore, analyse, and visualise testable comorbidity hypotheses.
The repository includes the following directories:
- Expert-curated-publications: Contains manually curated publications relevant to the study, ensuring high-quality and accurate information.
- PubTator3-results: Includes results from PubTator3, a web-based system that offers a comprehensive set of features and tools for exploring biomedical literature using advanced text mining and AI techniques.
- Sherpa-results: Houses outputs from Sherpa, a tool designed to assist in the curation of biomedical literature by providing automated annotations and insights.
- Textual-corpora-for-textmining: Comprises textual corpora prepared for text mining, used to extract patterns and relationships regarding COVID-19 and NDDs.
- Purpose: Automatically opens the Neo4j Browser with prefilled credentials to connect to the AuraDB instance for comorbidity hypothesis exploration.
- Key Features:
- Simplifies connection to Neo4j by generating a pre-configured URL.
- Useful for direct interaction with the knowledge graph.
- Usage:
Run the script, and the Neo4j Browser will open in your default web browser:
python comorbidity-hypothesis-db.py
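As an illustration of how such a pre-configured URL could be generated, here is a minimal sketch. It is not the repository's script: the function names are hypothetical, and the `connectURL` query parameter is an assumption about how the hosted Neo4j Browser accepts a prefilled connection. The password is deliberately left out of the URL in this sketch.

```python
import webbrowser
from urllib.parse import quote, urlencode

# Hosted Neo4j Browser, as referenced elsewhere in this README.
BROWSER_BASE = "https://browser.neo4j.io/"


def build_browser_url(uri: str, username: str) -> str:
    """Return a Neo4j Browser URL with URI and username prefilled.

    Assumption: Neo4j Browser reads a `connectURL` query parameter of the
    form scheme://user@host.
    """
    scheme, _, host = uri.partition("://")
    connect_url = f"{scheme}://{quote(username, safe='')}@{host}"
    return BROWSER_BASE + "?" + urlencode({"connectURL": connect_url})


def open_neo4j_browser(uri: str, username: str) -> bool:
    """Open the prefilled URL in the default web browser."""
    return webbrowser.open(build_browser_url(uri, username))
```

Calling `open_neo4j_browser("neo4j+s://09f8d4e9.databases.neo4j.io", "neo4j")` would then open the browser tab; the password still has to be typed in.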
- Purpose: Uploads curated comorbidity hypothesis paths to the Neo4j AuraDB instance.
- Key Features:
- Simplifies uploading comorbidity hypothesis candidates.
- Standardizes and normalizes graph entities for compatibility.
- Usage:
Run the script to upload the data:
python comorbidity_database_neo4j_upload.py
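The normalization and upload steps might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Entity` label, the `RELATED_TO` relationship with a `type` property, the `name` property, and the function names are hypothetical and may not match the schema used by the actual script.

```python
import re

# Hypothetical schema: one generic node label and one relationship type,
# with the original relation kept as a property.
MERGE_TRIPLES = """
UNWIND $rows AS row
MERGE (s:Entity {name: row.subject})
MERGE (o:Entity {name: row.object})
MERGE (s)-[:RELATED_TO {type: row.relation}]->(o)
"""


def normalize_name(name: str) -> str:
    """Collapse whitespace and lowercase so spelling variants merge."""
    return re.sub(r"\s+", " ", name).strip().lower()


def prepare_rows(triples):
    """Turn (subject, relation, object) tuples into normalized dict rows."""
    return [
        {
            "subject": normalize_name(s),
            "relation": r.strip().upper().replace(" ", "_"),
            "object": normalize_name(o),
        }
        for s, r, o in triples
    ]


def upload(triples, uri, user, password):
    """Send all rows to Neo4j in a single UNWIND/MERGE statement."""
    from neo4j import GraphDatabase  # imported lazily; needs `pip install neo4j`

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            session.run(MERGE_TRIPLES, rows=prepare_rows(triples)).consume()
    finally:
        driver.close()
```

Using `MERGE` rather than `CREATE` keeps repeated uploads from duplicating nodes, which is why normalizing names before the upload matters.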
- Neo4j AuraDB: Ensure you have access to a Neo4j AuraDB instance. Use the provided connection details or set up your own.
- Python Environment: Install the required libraries:
pip install neo4j pandas
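Once the libraries are installed, a quick connectivity check can confirm that the instance is reachable. The sketch below is one possible way to do it; the environment variable names (`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`) and the function names are assumptions introduced for this example, not something the repository defines.

```python
import os


def load_config(env=os.environ):
    """Read connection settings from a mapping, with README defaults."""
    return {
        "uri": env.get("NEO4J_URI", "neo4j+s://09f8d4e9.databases.neo4j.io"),
        "user": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }


def check_connection(cfg):
    """Raise an exception if the Neo4j instance cannot be reached."""
    from neo4j import GraphDatabase  # imported lazily; needs `pip install neo4j`

    driver = GraphDatabase.driver(cfg["uri"], auth=(cfg["user"], cfg["password"]))
    try:
        driver.verify_connectivity()
    finally:
        driver.close()
```

Reading the password from the environment keeps it out of scripts and notebooks; `check_connection(load_config())` either returns silently or raises a driver error.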
- Purpose: Analyzes the knowledge graph loaded into Neo4j to extract insights.
- Key Features:
- Counts nodes and edges in the graph.
- Executes community detection algorithms like Louvain using Neo4j's Graph Data Science (GDS) library.
- Retrieves and visualizes properties of detected clusters.
- Usage:
Open the Jupyter Notebook and follow the instructions to:
- Query the Neo4j database.
- Get general statistics about nodes, triples, and pathways, and analyze them.
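The kind of queries involved can be sketched as follows. This is not the notebook's code: it assumes the GDS plugin is available on the target instance, and the projected graph name `kg` and the helper names are hypothetical.

```python
from collections import Counter

COUNT_NODES = "MATCH (n) RETURN count(n) AS nodes"
COUNT_EDGES = "MATCH ()-[r]->() RETURN count(r) AS edges"
# Project every node label and relationship type into an in-memory graph.
PROJECT_GRAPH = "CALL gds.graph.project('kg', '*', '*')"
LOUVAIN_STREAM = """
CALL gds.louvain.stream('kg')
YIELD nodeId, communityId
RETURN nodeId, communityId
"""


def community_sizes(rows):
    """Count members per community from rows with a 'communityId' key."""
    return Counter(row["communityId"] for row in rows)


def graph_statistics(session):
    """Return node count, edge count, and Louvain community sizes.

    `session` is an open neo4j driver session.
    """
    nodes = session.run(COUNT_NODES).single()["nodes"]
    edges = session.run(COUNT_EDGES).single()["edges"]
    session.run(PROJECT_GRAPH).consume()
    rows = [record.data() for record in session.run(LOUVAIN_STREAM)]
    return {"nodes": nodes, "edges": edges, "communities": community_sizes(rows)}
```

The `Counter` returned under `communities` supports `most_common()`, which gives the largest clusters first for plotting or inspection.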
- Purpose: These scripts are designed to upload multiple databases into Neo4j, providing a streamlined workflow for graph-based data integration and analysis.
- Prerequisites:
- bel_json_import package for BEL data conversion to eBEL format
- Properly formatted database extracts
- Key Features:
- Efficiently import graph data into Neo4j using a common schema
- Seamless integration of complex biological networks
- Privacy-aware data handling
- Usage:
- Open the notebook in Jupyter Notebook or JupyterLab
- Place data in required locations
- Run cells specific to each source
To manually explore the comorbidity graph database:
- Open the Neo4j Browser: Navigate to https://browser.neo4j.io.
- Enter the Connection Details:
- URI: neo4j+s://09f8d4e9.databases.neo4j.io
- Username: neo4j
- Password: Refer to the credentials provided in src/comorbidity-hypothesis-db.py.
- Run Cypher Queries: Once connected, you can execute Cypher queries to explore the graph. For example, to retrieve a sample of nodes:
MATCH (n) RETURN n LIMIT 10;
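The same sample query can also be run from Python instead of the Neo4j Browser. The following is a minimal sketch with a hypothetical helper name; it returns labels and properties rather than whole node objects so the result is plain Python data.

```python
# Variant of the sample query that returns plain lists and maps.
SAMPLE_QUERY = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, properties(n) AS props "
    "LIMIT $limit"
)


def sample_nodes(session, limit=10):
    """Return up to `limit` nodes as dicts with 'labels' and 'props'.

    `session` is an open neo4j driver session (anything with a compatible
    `run(query, **params)` method works).
    """
    result = session.run(SAMPLE_QUERY, limit=limit)
    return [
        {"labels": list(record["labels"]), "props": dict(record["props"])}
        for record in result
    ]
```

With the `neo4j` package installed, a session can be obtained from `GraphDatabase.driver(uri, auth=(user, password)).session()` using the connection details listed above.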
For any questions, suggestions, or collaborations, please contact:
Negin Babaiha
Email: [email protected]
Google Scholar Profile
Feel free to reach out for discussions regarding the project!