This document surveys state-of-the-art models for generating medical image reports. It compares two main approaches: end-to-end large language models (LLMs), which generate the report directly from the image, and image-segmentation report approaches, which build the report from segmentation results.
Pros of the end-to-end LLM approach:
- Direct generation of reports from images.
- Handles raw data with rich context.
- Flexible and scalable for various medical imaging tasks.
Cons:
- Requires extensive training data.
- Prone to hallucinated findings and to overfitting.
- Needs robust filtering and augmentation.
Additional Context:
"Using LLMs like GPT-4 can help with normalization detection. CheXagent provides a solid baseline for report generation with a large dataset. Fine-tuning on private data (e.g., with LLaVA) can yield good results. However, current models often face issues like hallucinations and overfitting."
— Senior AI Researcher in AI University
"At our organization, we generate reports using fixed logic and templates, not LLMs, due to their unreliability and limited added value in this context."
— CTO at MedAI Startup
"End-to-end models that combine segmentation and text generation are being developed but often have poor performance in practice."
— Senior Research Engineer in Startup
"We anchor our models to a set of AI validated outputs to ensure reliability and accuracy."
— Annalise.ai Representative (YouTube Video)
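
The anchoring idea in the last quote can be illustrated with a small post-generation check. This is only a sketch under stated assumptions, not Annalise.ai's actual method: `FINDING_TERMS`, `filter_report`, and the keyword and negation matching are hypothetical, and the validated findings are assumed to come from a separate, already-validated classifier.

```python
import re

# Hypothetical vocabulary: finding label -> phrases that mention it in a report.
FINDING_TERMS = {
    "pneumothorax": ["pneumothorax"],
    "pleural_effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}

# Very rough negation cue; real reports need a proper negation detector.
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def split_sentences(report):
    """Split a report into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def filter_report(report, validated_findings):
    """Separate sentences into kept and flagged.

    A sentence is flagged when it asserts (mentions without negation) a
    finding that is not in the validated set.
    """
    validated = set(validated_findings)
    kept, flagged = [], []
    for sentence in split_sentences(report):
        lowered = sentence.lower()
        mentioned = {
            label
            for label, terms in FINDING_TERMS.items()
            if any(term in lowered for term in terms)
        }
        asserted = set() if NEGATION.search(sentence) else mentioned
        if asserted - validated:
            flagged.append(sentence)
        else:
            kept.append(sentence)
    return kept, flagged


if __name__ == "__main__":
    generated = (
        "There is a small left pleural effusion. "
        "A right pneumothorax is present. "
        "No cardiomegaly."
    )
    kept, flagged = filter_report(generated, {"pleural_effusion"})
    print("kept:", kept)
    print("flagged:", flagged)
```

In this example the pneumothorax sentence is flagged because the validated set contains only `pleural_effusion`. Flagged sentences could then be dropped or sent for review. Keyword matching is used here only to keep the sketch short.
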
Model Name | # stars | Unique Features | Performance Highlights | Source | Code Link |
---|---|---|---|---|---|
PromptMRG | ⭐⭐⭐⭐ | Uses diagnosis-driven prompts (DDP), cross-modal feature enhancement | Higher diagnostic accuracy, improved clinical relevance of reports | arXiv | GitHub |
KERP | ⭐⭐⭐⭐ | Combines abnormality graph learning with template retrieval and paraphrasing | Structured and accurate reports, state-of-the-art results in classification | AAAI | GitHub |
IIHT | ⭐⭐⭐⭐ | Classifier, indicator expansion, and generator modules mimicking radiologists' workflow | Effective modeling of hierarchical report generation | SpringerLink | GitHub |
MedRAT | ⭐⭐⭐⭐ | Does not require paired image-report data, uses auxiliary tasks | Detailed, contextually relevant reports, surpasses previous methods | Papers With Code | GitHub |
CheXagent | ⭐⭐⭐⭐ | Trained on the largest publicly available dataset of image and text pairs | Solid baseline for medical report generation | Hugging Face | GitHub |
LLaVA | ⭐⭐⭐⭐ | Fine-tuned on private datasets for flexible and customizable results | Comparable to other top models, flexible influence on results | BioNLP Workshop | GitHub |
Pros of the image-segmentation report approach:
- Reliable and interpretable results.
- Facilitates precise measurements and visualizations.
- Easier management of segmentation tasks.
Cons:
- Requires detailed segmentation models for each pathology.
- Time-consuming development and re-training when templates change.
Additional Context:
"We use fixed logic and templates for report generation instead of LLMs due to their unreliability."
— CTO at MedAI Startup
"Segmentation is often not used for modalities like chest X-rays due to their limited detail. However, end-to-end segmentation and text generation can be useful for other imaging modalities."
— Senior Research Engineer in Startup
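
The "fixed logic and templates" approach described above can be sketched in a few lines: measurements are computed from a segmentation mask and inserted into a fixed sentence. This is a minimal illustration under assumptions, not any vendor's pipeline. The function names, the template wording, and the plain nested-list mask are hypothetical; a real system would typically read masks and pixel spacing from the imaging data.

```python
def mask_area_mm2(mask, pixel_spacing_mm):
    """Area of a binary 2D mask: foreground pixel count times one pixel's area."""
    row_mm, col_mm = pixel_spacing_mm
    foreground = sum(1 for row in mask for value in row if value)
    return foreground * row_mm * col_mm


def bounding_box_extent_mm(mask, pixel_spacing_mm):
    """Height and width (in mm) of the tightest box around the foreground."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return 0.0, 0.0
    row_mm, col_mm = pixel_spacing_mm
    height = (max(rows) - min(rows) + 1) * row_mm
    width = (max(cols) - min(cols) + 1) * col_mm
    return height, width


# Hypothetical report templates; the wording is fixed and only the
# measured values change.
TEMPLATE_FOUND = (
    "{structure}: lesion present, area {area:.1f} mm^2, "
    "extent {height:.1f} x {width:.1f} mm."
)
TEMPLATE_NONE = "{structure}: no lesion detected."


def render_finding(structure, mask, pixel_spacing_mm):
    """Fill the matching template from measurements of one segmentation mask."""
    area = mask_area_mm2(mask, pixel_spacing_mm)
    if area == 0:
        return TEMPLATE_NONE.format(structure=structure)
    height, width = bounding_box_extent_mm(mask, pixel_spacing_mm)
    return TEMPLATE_FOUND.format(
        structure=structure, area=area, height=height, width=width
    )


if __name__ == "__main__":
    example_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
    ]
    print(render_finding("Liver", example_mask, (0.5, 0.5)))
    # prints: Liver: lesion present, area 1.5 mm^2, extent 1.5 x 1.0 mm.
```

Because every sentence comes from a template, the output is predictable and each number traces back to a mask. That is the reliability argument made above, and it is also why changing a template or adding a pathology requires development work.
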
Project Name | # stars | Description | Scenario | Source |
---|---|---|---|---|
Raidionics | ⭐⭐⭐⭐ | Provides a complete pipeline for medical image segmentation and report generation using templates | Detection, Segmentation, Reporting | GitHub |
MONAI | ⭐⭐⭐⭐ | PyTorch-based framework for deep learning in healthcare imaging | Preprocessing, Classification, Segmentation | GitHub |
Medical Detection Toolkit | ⭐⭐⭐ | Contains 2D + 3D implementations of prevalent object detectors for medical images | Detection, Segmentation | GitHub |
TransUnet | ⭐⭐⭐ | Transformers for medical image segmentation | Segmentation | GitHub |
End-to-End LLM Approach:
- Pros: Direct generation of reports from images, handles raw data with rich context, flexible and scalable for various medical imaging tasks.
- Cons: Requires extensive training data, is prone to hallucinated findings and to overfitting, needs robust filtering and augmentation.
Image-Segmentation Report Approach:
- Pros: Reliable and interpretable results, facilitates precise measurements and visualizations, easier management of segmentation tasks.
- Cons: Requires detailed segmentation models for each pathology, time-consuming development and re-training when templates change.
Both approaches have their strengths and are suited to different aspects of medical imaging and report generation. End-to-end LLM approaches are more flexible and scalable, while image-segmentation report approaches offer precision and reliability.
- PromptMRG: arXiv
- KERP: AAAI
- IIHT: SpringerLink
- MedRAT: Papers With Code
- CheXagent: Hugging Face
- LLaVA: BioNLP Workshop
- Raidionics: GitHub
- MONAI: GitHub
- Medical Detection Toolkit: GitHub
- TransUnet: GitHub
- Annalise.ai: YouTube Video