Medical Image Report Generation Models

This document provides an overview of state-of-the-art models for generating medical image reports. It compares two main approaches: the end-to-end large language model (LLM) approach and the image-segmentation report approach.

End-to-End LLM Approach

Pros:

Direct generation of reports from images.
Handles raw data with rich context.
Flexible and scalable for various medical imaging tasks.

Cons:

Requires extensive training data.
May hallucinate findings and overfit to the training data.
Needs robust filtering and augmentation.

Additional Context:

"Using LLMs like GPT-4 can help with normalization detection. CheXagent provides a solid baseline for report generation with a large dataset. Fine-tuning on private data (e.g., with LLaVA) can yield good results. However, current models often face issues like hallucinations and overfitting."
— Senior AI Researcher in AI University

"At our organization, we generate reports using fixed logic and templates, not LLMs, due to their unreliability and limited added value in this context."
— CTO at MedAI Startup

"End-to-end models that combine segmentation and text generation are being developed but often have poor performance in practice."
— Senior Research Engineer in Startup

"We anchor our models to a set of AI validated outputs to ensure reliability and accuracy."
— Annalise.ai Representative (YouTube Video)
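
One way to picture the idea of anchoring generated text to validated outputs is a post-hoc filter that only keeps report sentences whose findings were also confirmed by a separate, validated model. The sketch below is an illustration under stated assumptions, not the method used by any vendor quoted here: the `FINDING_KEYWORDS` vocabulary, the label names, and the `filter_report` helper are all hypothetical. It uses plain keyword matching and does not handle negation ("no pneumothorax"), which a real system would need.

```python
import re

# Hypothetical vocabulary: finding label -> phrases that signal it in text.
FINDING_KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "pleural_effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}


def mentioned_findings(sentence):
    """Return the set of finding labels whose keywords appear in the sentence."""
    text = sentence.lower()
    return {
        label
        for label, keywords in FINDING_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def filter_report(report, validated):
    """Split a generated report into sentences and separate supported from
    unsupported ones.

    A sentence is flagged when it mentions a finding that is not in the
    `validated` set (assumed to come from a separately validated classifier).
    Negation is NOT handled in this sketch.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    kept, flagged = [], []
    for sentence in sentences:
        unsupported = mentioned_findings(sentence) - validated
        (flagged if unsupported else kept).append(sentence)
    return kept, flagged


report = (
    "There is a large pleural effusion. "
    "Cardiomegaly is present. "
    "The lungs are otherwise clear."
)
kept, flagged = filter_report(report, validated={"pleural_effusion"})
print(kept)     # sentences consistent with the validated findings
print(flagged)  # sentences mentioning findings the validator did not confirm
```

Flagged sentences could be dropped, or routed to a human reviewer; which is appropriate depends on the clinical workflow.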

Model Name	Rating	Unique Features	Performance Highlights	Source	Code Link
PromptMRG	⭐⭐⭐⭐	Uses diagnosis-driven prompts (DDP), cross-modal feature enhancement	Higher diagnostic accuracy, improved clinical relevance of reports	arXiv	GitHub
KERP	⭐⭐⭐⭐	Combines abnormality graph learning with template retrieval and paraphrasing	Structured and accurate reports, state-of-the-art results in classification	AAAI	GitHub
IIHT	⭐⭐⭐⭐	Classifier, indicator expansion, and generator modules mimicking radiologists' workflow	Effective modeling of hierarchical report generation	SpringerLink	GitHub
MedRAT	⭐⭐⭐⭐	Does not require paired image-report data, uses auxiliary tasks	Detailed, contextually relevant reports, surpasses previous methods	Papers With Code	GitHub
CheXagent	⭐⭐⭐⭐	Trained on the largest publicly available dataset of image and text pairs	Solid baseline for medical report generation	Hugging Face	GitHub
LLaVA	⭐⭐⭐⭐	Fine-tuned on private datasets for flexible and customizable results	Comparable to other top models, flexible influence on results	BioNLP Workshop	GitHub

Image-Segmentation Report Approach

Pros:

Reliable and interpretable results.
Facilitates precise measurements and visualizations.
Easier management of segmentation tasks.
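
As an illustration of how a segmentation mask leads to a measurement, the sketch below computes a lesion volume by counting foreground voxels and multiplying by the physical voxel size. It is a minimal example under assumptions: the mask is a nested list of 0/1 values ordered as slices, rows, columns, the spacing is given in millimetres in the same order, and the function name `lesion_volume_ml` is hypothetical. Production pipelines would normally operate on array types from an imaging library rather than Python lists.

```python
def lesion_volume_ml(mask, spacing_mm):
    """Volume of the foreground region of a binary 3D mask, in millilitres.

    mask:       nested list [slice][row][col] with values 0 or 1 (assumed layout)
    spacing_mm: (dz, dy, dx) voxel spacing in millimetres
    """
    voxel_count = sum(value for slice_ in mask for row in slice_ for value in row)
    dz, dy, dx = spacing_mm
    voxel_volume_mm3 = dz * dy * dx
    # 1 mL equals 1000 cubic millimetres.
    return voxel_count * voxel_volume_mm3 / 1000.0


# Toy mask: 2 slices of 2x2 pixels, 5 foreground voxels in total.
toy_mask = [
    [[1, 1],
     [0, 1]],
    [[1, 0],
     [0, 1]],
]
volume = lesion_volume_ml(toy_mask, spacing_mm=(5.0, 1.0, 1.0))
print(f"{volume:.3f} mL")
```

Because the measurement is derived directly from the mask, it can be checked visually against the overlay, which is part of what makes this approach interpretable.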

Cons:

Requires detailed segmentation models for each pathology.
Time-consuming development and re-training when templates change.

Additional Context:

"We use fixed logic and templates for report generation instead of LLMs due to their unreliability."
— CTO at MedAI Startup
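
The "fixed logic and templates" style of report generation can be sketched as a mapping from structured findings to sentence templates. The example below is a hypothetical illustration, not the quoted organization's system: the `TEMPLATES` dictionary, the finding fields (`size_mm`, `location`), and the `render_report` helper are assumptions made for the sketch.

```python
from string import Template

# Hypothetical sentence templates keyed by the situation they describe.
TEMPLATES = {
    "lesion_present": Template("A $size_mm mm lesion is seen in the $location."),
    "no_lesion": Template("No focal lesion is identified."),
}


def render_report(findings):
    """Render structured findings into report text using fixed templates.

    findings: list of dicts with 'size_mm' (float) and 'location' (str),
              assumed to come from an upstream segmentation step.
    """
    if not findings:
        return TEMPLATES["no_lesion"].substitute()
    sentences = [
        TEMPLATES["lesion_present"].substitute(
            size_mm=f"{finding['size_mm']:.0f}",  # round to whole millimetres
            location=finding["location"],
        )
        for finding in findings
    ]
    return " ".join(sentences)


print(render_report([{"size_mm": 12.4, "location": "left frontal lobe"}]))
print(render_report([]))
```

Every sentence in the output traces back to one template and one measured value, so the text cannot mention a finding that the upstream step did not produce. The cost, as noted in the cons above, is that changing the report format means changing the templates and possibly the models that feed them.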

"Segmentation is often not used for modalities like chest X-rays due to their limited detail. However, end-to-end segmentation and text generation can be useful for other imaging modalities."
— Senior Research Engineer in Startup

Project Name	Rating	Description	Scenario	Source
Raidionics	⭐⭐⭐⭐	Provides a complete pipeline for medical image segmentation and report generation using templates	Detection, Segmentation, Reporting	GitHub
MONAI	⭐⭐⭐⭐	PyTorch-based framework for deep learning in healthcare imaging	Preprocessing, Classification, Segmentation	GitHub
Medical Detection Toolkit	⭐⭐⭐	Contains 2D + 3D implementations of prevalent object detectors for medical images	Detection, Segmentation	GitHub
TransUnet	⭐⭐⭐	Transformers for medical image segmentation	Segmentation	GitHub

Comparison of Approaches

End-to-End LLM Approach:

Pros: Direct generation of reports from images, handles raw data with rich context, flexible and scalable for various medical imaging tasks.
Cons: Requires extensive training data, may hallucinate findings and overfit to the training data, needs robust filtering and augmentation.

Image-Segmentation Report Approach:

Pros: Reliable and interpretable results, facilitates precise measurements and visualizations, easier management of segmentation tasks.
Cons: Requires detailed segmentation models for each pathology, time-consuming development and re-training when templates change.

Both approaches have their strengths and are suited to different aspects of medical imaging and report generation. End-to-end LLM approaches are more flexible and scalable, while image-segmentation report approaches offer precision and reliability.

References

PromptMRG: arXiv
KERP: AAAI
IIHT: SpringerLink
MedRAT: Papers With Code
CheXagent: Hugging Face
LLaVA: BioNLP Workshop
Raidionics: GitHub
MONAI: GitHub
Medical Detection Toolkit: GitHub
TransUnet: GitHub
Annalise.ai: YouTube Video
