Awesome-HealthCare-KnowledgeBase

A curated list of awesome healthcare taxonomies and knowledge graphs. We may have missed important software or literature, so please feel free to create an issue with any suggestions.

HealthCare Ontology/Taxonomies

What are the differences between ontology and taxonomy? See discussion 🔗 here>>.

| Name | Paper | Misc. |
| --- | --- | --- |
| Mondo Disease Ontology | [Website] | A semi-automatically constructed ontology that merges multiple disease resources into one coherent ontology. |
| MeSH Ontology | [Website] | MeSH includes the subject headings appearing in MEDLINE/PubMed, the NLM Catalog, and other NLM databases. |
| UMLS Semantic Network | [Website] | Broad categories (semantic types) and their relationships (semantic relations) for the UMLS Metathesaurus. |
| SNOMED CT | [Website] | A multilingual, hierarchically organized collection of medical terms providing codes, terms, synonyms, and definitions used in clinical documentation and reporting. |
| Disease Ontology | The Human Disease Ontology 2022 update (Nucleic Acids Research'22) [Website] | 10,862 disease terms, 22,137 new SubClassOf axioms |
| Gene Ontology | [Citation Policy] [Website] | Three ontologies: Molecular Function, Cellular Component, Biological Process |
| Cell Taxonomy | Cell Taxonomy: a curated repository of cell types with multifaceted characterization (Nucleic Acids Research'22) [Website] | 3,143 cell types, 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. |

HealthCare Knowledge Graphs

| Name | Paper | Domain | Scale | Data Sources |
| --- | --- | --- | --- | --- |
| General KGs | | | | |
| ClinicalKG (CKG) | Clinical Knowledge Graph Integrates Proteomics Data into Clinical Decision-Making (bioRxiv) [Website] [Code] | Clinical, laboratory, and imaging data, multiomics data, and EHRs | 16 million nodes and 220 million relationships | Integrates 25 KGs and 10 ontologies (taxonomies) |
| MSI | Identification of disease treatment mechanisms through the multiscale interactome (Nature Communications'21) [Code] | Drugs, Proteins, Diseases, Biological Functions, Genes | 1,661 drugs, 840 diseases, 17,660 proteins, 9,798 biological functions | |
| Hetionet | Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes (PLOS Computational Biology'15) [Website] | 11 types of nodes and 24 types of edges; combines information from 29 public databases. | 47,031 nodes (11 types) and 2,250,197 relationships (24 types) | Entrez Gene, DrugBank, Uberon, Disease Ontology, MeSH ontology, SIDER, UMLS, Gene Ontology, WikiPathways, Reactome, Pathway Interaction Database, DrugCentral |
| iBKH | iBKH: The integrative Biomedical Knowledge Hub (iScience'23) [Code] | Anatomy, Disease, Drug, Gene, Molecule, Symptom, Dietary Supplement Ingredient/Product, Therapeutic Class, Pathway, Side-Effect | 2M entities, 48M relations | Integrates 18 public data sources |
| PrimeKG | Building a knowledge graph to enable precision medicine (Scientific Data'23) | Biological process, Protein, Disease, Phenotype, Anatomy, Molecular function, Drug, Cellular component, Pathway, Exposure | 129,375 nodes, 4,050,249 edges | Integrates 20 high-quality resources |
| Disease-specific KGs | | | | |
| DRKG | [Blog article'22] [Code] | Genes, compounds, diseases, biological processes, side effects, and symptoms, focusing on drug repurposing for COVID-19. | 97,238 entities belonging to 13 entity types and 5,874,261 triplets belonging to 107 edge types. | DrugBank, Hetionet, GNBR, String, IntAct, DGIdb, and COVID-19 literature. |
| COVID-KG | COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation (NAACL'21) [Website] | Gene, Disease, Chemical, Organism, focusing on COVID-19 | 50,752 Gene nodes, 10,781 Disease nodes, 5,738 Chemical nodes, and 535 Organism nodes; 133 relation types | COVID scientific literature, plus existing CTD and MeSH |
| KGHC | KGHC: a knowledge graph for hepatocellular carcinoma (BMC Medical Informatics and Decision Making'20) | Focusing on hepatocellular carcinoma | 5,028 entities and 13,296 triples | SemMedDB, Literature, Clinical Trials |
| Drug-specific KGs | | | | |
| repoDB | A Standard Database for Drug Repositioning (Scientific Data'17) [Website] | Drug, Disease | 1,571 drugs, 2,051 diseases | N/A |
| DrugBank | DrugBank: a comprehensive resource for in silico drug discovery and exploration (Nucleic Acids Research'06) [Website] | Drug and drug target (i.e. sequence, structure, pathway) | 15,686 drugs, 5,296 non-redundant protein sequences | N/A |
| DrugCentral | DrugCentral: online drug compendium (Nucleic Acids Research'16) [Website] | Drug, Target, Disease, Pharmacologic action, Active Ingredients | 112,359 FDA drug labels, 4,927 Active Ingredients, 137,693 Pharmaceutical formulations | N/A |
| Protein-specific KGs | | | | |
| The Human Protein Atlas | [Website] | Proteins, Genes, Tissues, Cell, Pathology, Disease | 27,520 antibodies targeting 17,288 unique proteins | N/A |
| Proteinarium | Proteinarium: Multi-sample protein-protein interaction analysis and visualization tool (Genomics'20) [Website] | Multi-sample protein-protein interaction | TB Release | |

KB Construction

Interested in the interaction between Large Language Models and KB? See this amazing resource 🔗 here>>.

| Name | Paper | Used Resources |
| --- | --- | --- |
| LM-Bio-KGC | Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (AKBC'21) [Code] | repoDB, MSI, Hetionet |
| AutoRD | AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models (ArXiv'24) [Code] | RareDis-v1 |

KB Fusion: including ontology matching, entity alignment.

Ontology Matching

Problem Definition: Given two ontologies $\mathcal{O}_1$ and $\mathcal{O}_2$, generate a set of 4-tuples $(e, e', r, c)$, where $e$ and $e'$ are entities of $\mathcal{O}_1$ and $\mathcal{O}_2$, respectively; $r$ is the semantic equivalence relation (which can be extended to $\{ \sqsubseteq, \sqsupseteq, \equiv \}$); and $c$ is a confidence value within $[0, 1]$.
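
To make the 4-tuple output concrete, here is a minimal Python sketch. It is not how LogMap, AML, or any system listed here works: it is a naive label-similarity baseline, and the `Mapping` class, the `match_by_label` function, the 0.8 threshold, and the toy entity IDs are all hypothetical names and values chosen for illustration. Each ontology is assumed to be a plain dictionary from entity ID to label.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Mapping:
    """One 4-tuple (e, e', r, c) from the problem definition."""
    source: str        # e, an entity of O_1
    target: str        # e', an entity of O_2
    relation: str      # r, only "equivalent" in this sketch
    confidence: float  # c, a value in [0, 1]


def match_by_label(onto1, onto2, threshold=0.8):
    """Naive baseline: compare every pair of labels by string similarity."""
    mappings = []
    for e1, label1 in onto1.items():
        for e2, label2 in onto2.items():
            # ratio() returns a similarity score in [0, 1]
            score = SequenceMatcher(None, label1.lower(), label2.lower()).ratio()
            if score >= threshold:
                mappings.append(Mapping(e1, e2, "equivalent", score))
    # Highest-confidence mappings first
    return sorted(mappings, key=lambda m: -m.confidence)


# Toy ontologies with made-up identifiers (assumption, not real ontology IDs)
onto1 = {"A:1": "Hepatocellular carcinoma", "A:2": "Diabetes mellitus"}
onto2 = {"B:7": "hepatocellular carcinoma", "B:9": "Asthma"}

result = match_by_label(onto1, onto2)
for m in result:
    print(m)
```

Real matchers add structural and logical evidence (hierarchy, axioms, repair of inconsistent mappings) on top of lexical signals like this one; the sketch only shows the shape of the output.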

| Name | Paper | Datasets | Notes |
| --- | --- | --- | --- |
| LogMap | LogMap: Logic-based and Scalable Ontology Matching (ISWC'11) [Github] | FMA-NCI, FMA-SNOMED, SNOMED-NCI, Mouse-NCIAnat | A highly scalable ontology matching system in Java. |
| PARIS | PARIS: Probabilistic Alignment of Relations, Instances, and Schema (VLDB'12) [Github] | YAGO-DBpedia | A non-neural entity, relation, and ontology alignment system in Java. |
| AML | The AgreementMakerLight Ontology Matching System (OTM'13) [Github] | OAEI'2012: FMA-NCI, FMA-SNOMED, SNOMED-NCI | An element-level ontology alignment system in Java. |
| OAEI | [Website] | Mondo: OMIM-ORDO, NCIT-DOID; UMLS: SNOMED-FMA, SNOMED-NCIT | Ontology Alignment Evaluation Initiative since 2011 |
| MEDTO | MEDTO: Medical Data to Ontology Matching Using Hybrid Graph Neural Networks (KDD'21) | Databases: MIMIC-III, MDX; ontology: OAEI | Database-to-ontology matching task |

Entity Alignment (or KG Alignment)

Problem Definition: Given two KGs $\mathcal{G}_1=\{ \mathcal{E}_1, \mathcal{R}_1, \mathcal{TP}_1 \}$ and $\mathcal{G}_2= \{ \mathcal{E}_2, \mathcal{R}_2, \mathcal{TP}_2 \}$, where $\mathcal{E}$ and $\mathcal{R}$ denote the sets of entities and relations, and $\mathcal{TP} \subseteq \mathcal{E}\times \mathcal{R}\times \mathcal{E}$ is the set of relational triplets, entity alignment aims to identify entities in $\mathcal{G}_1$ and $\mathcal{G}_2$ that refer to the same real-world object, i.e., it seeks a set of alignments $\mathcal{A}=\{(e_1,e_2)\in \mathcal{E}_1\times\mathcal{E}_2 \mid e_1 \equiv e_2 \}$. A small set of seed entity alignments $\mathcal{A}_s \subset \mathcal{A}$ is usually provided beforehand as anchors (training data) to help align the remaining entities.
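
The role of the seed alignment can be illustrated with a small Python sketch. It is a toy neighbourhood-overlap heuristic, not the method of any system listed here (those mostly learn embeddings). It assumes, as a simplification, that both KGs use the same relation names; the functions `neighbors` and `align` and all entity IDs are hypothetical.

```python
from collections import defaultdict


def neighbors(triples):
    """Map each entity to the (relation, direction, other entity) facts it appears in."""
    out = defaultdict(set)
    for head, rel, tail in triples:
        out[head].add((rel, "out", tail))
        out[tail].add((rel, "in", head))
    return out


def align(triples1, triples2, seeds):
    """Align non-seed entities of G1 to G2 by counting neighbours shared via the seeds."""
    n1, n2 = neighbors(triples1), neighbors(triples2)
    seed_map = dict(seeds)                 # A_s as a lookup from G1 to G2
    used_targets = set(seed_map.values())
    predicted = {}
    for e1 in n1:
        if e1 in seed_map:
            continue
        # Rewrite e1's neighbourhood into G2 identifiers using the seed pairs
        translated = {(r, d, seed_map[o]) for r, d, o in n1[e1] if o in seed_map}
        best, best_score = None, 0
        for e2 in n2:
            if e2 in used_targets:
                continue
            score = len(translated & n2[e2])
            if score > best_score:
                best, best_score = e2, score
        if best is not None:
            predicted[e1] = best
    return predicted


# Toy triples TP_1 and TP_2 with made-up identifiers (assumption)
g1 = [("g1:aspirin", "treats", "g1:headache"), ("g1:aspirin", "targets", "g1:PTGS2"),
      ("g1:ibuprofen", "treats", "g1:headache"), ("g1:ibuprofen", "targets", "g1:PTGS1")]
g2 = [("g2:drug_a", "treats", "g2:symptom_x"), ("g2:drug_a", "targets", "g2:protein_p"),
      ("g2:drug_b", "treats", "g2:symptom_x"), ("g2:drug_b", "targets", "g2:protein_q")]
seeds = [("g1:headache", "g2:symptom_x"), ("g1:PTGS2", "g2:protein_p"),
         ("g1:PTGS1", "g2:protein_q")]

predicted = align(g1, g2, seeds)
print(predicted)
```

Here the three seed pairs act as anchors: each drug in the first KG is matched to the drug in the second KG whose neighbourhood overlaps most with its own once translated through the seeds. The sketch does not enforce a one-to-one alignment or handle dangling entities.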

| Name | Paper | Baselines | Datasets |
| --- | --- | --- | --- |
| industry eval EA | An Industry Evaluation of Embedding-based Entity Alignment (COLING'20) | BootEA, MultiKE, RDGCN, RSN4EA, PARIS | Cross-lingual EA: DBP15K, WK3160K; cross-KG (DBpedia and Wikidata) EA: DWY15K, DWY100K; MED-BBK-9K: contains two Chinese medical KGs. |
| OpenEA | [Code] | 20+ methods | Cross-lingual DBpedia: EN-FR, EN-DE; cross-KG: D-W(ikidata), D-Y(AGO) |
| UED | Semi-constraint Optimal Transport for Entity Alignment with Dangling Cases (arXiv'22) [Code] | MTransE, JAPE, BootEA, RDGCN, RNM, RAGA, EchoEA, SelfKG, SoTead, UEA, SEU | Cross-lingual EA: DBP15K, DBP2.0; cross-lingual medical-KG EA: MedED |
| OntoEA | OntoEA: Ontology-guided Entity Alignment via Joint Knowledge Graph Embedding (Findings of ACL 2021) [Code] | OpenEA methods | Shared ontology: EN-FR, EN-DE, MED-BBK; non-shared ontology: D-W |
| SapBERT | Self-Alignment Pretraining for Biomedical Entity Representations (NAACL'21) [Code] | BioBERT, BlueBERT, ClinicalBERT, SciBERT, UMLSBERT, PubMedBERT | NCBI, BC5CDR, MedMentions |
| HiPrompt | HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting (SIGIR'23) | LogMap, PARIS, AML, SapBERT, SelfKG, MTransE; HiPrompt: LLM-based entity alignment method | SDKG-DO, repoDB-DO |