Violet is a vision-language model designed to generate high-quality Arabic image captions. Built around a Gemini Decoder and a pretrained transformer, Violet bridges the gap between computer vision and natural language processing (NLP) for Arabic. The repository provides a simple, effective, and streamlined pipeline that can handle a variety of image formats and produce descriptive captions in Arabic with minimal effort.
This repository is built on the model proposed in the paper *Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder*.
- Arabic Image Captioning: Generate high-quality captions for images in Arabic.
- Visual Feature Extraction: Extract image features for integration into vision-language models or downstream tasks.
- Mixed Input Support: Handle batches of images in various formats, such as URLs, file paths, NumPy arrays, PyTorch tensors, and PIL Image objects.
- Pretrained Model: Ships with robust pretrained weights, so no additional training is required.
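Mixed inputs ultimately all need to be normalized to RGB images before feature extraction. The sketch below shows how such normalization could look; `to_pil` is a hypothetical helper for illustration, not part of the Violet API:

```python
import io

import numpy as np
from PIL import Image


def to_pil(image):
    """Normalize one input (URL/path string, NumPy array, or PIL image) to an RGB PIL.Image.

    A PyTorch tensor (CHW) could be handled by permuting to HWC and calling .numpy() first.
    """
    if isinstance(image, Image.Image):
        return image.convert("RGB")
    if isinstance(image, np.ndarray):
        arr = image
        if arr.dtype != np.uint8:  # assume floats in [0, 1]
            arr = (arr * 255).clip(0, 255).astype(np.uint8)
        return Image.fromarray(arr).convert("RGB")
    if isinstance(image, str):
        if image.startswith(("http://", "https://")):
            import requests  # imported lazily, only needed for URL inputs
            return Image.open(io.BytesIO(requests.get(image).content)).convert("RGB")
        return Image.open(image).convert("RGB")
    raise TypeError(f"Unsupported image type: {type(image)!r}")
```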
Install directly from GitHub:

```bash
pip install git+https://github.com/Mahmood-Anaam/violet.git --quiet
```

Or clone the repository and install it in editable mode:

```bash
git clone https://github.com/Mahmood-Anaam/violet.git
cd violet
pip install -e .
```

Or set up a conda environment:

```bash
git clone https://github.com/Mahmood-Anaam/violet.git
cd violet
conda env create -f environment.yml
conda activate violet
pip install -e .
```
```python
import numpy as np
import torch
from PIL import Image

from violet.pipeline import VioletImageCaptioningPipeline
from violet.configuration import VioletConfig

# Initialize the pipeline
pipeline = VioletImageCaptioningPipeline(VioletConfig)

# Caption a single image
caption = pipeline("http://images.cocodataset.org/val2017/000000039769.jpg")
print(caption)

# Caption a batch of images in mixed formats
images = [
    "http://images.cocodataset.org/val2017/000000039769.jpg",  # URL
    "/path/to/local/image.jpg",                                # file path
    np.random.rand(224, 224, 3),                               # NumPy array (HWC)
    torch.randn(3, 224, 224),                                  # PyTorch tensor (CHW)
    Image.open("/path/to/pil/image.jpg"),                      # PIL image
]
captions = pipeline(images)
for caption in captions:
    print(caption)
```
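For large collections, it can help to caption images in fixed-size chunks rather than one giant batch. A simple, generic helper (the chunk size and the commented `pipeline` usage are illustrative, not part of the Violet API):

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Illustrative usage with an initialized pipeline:
# for batch in chunked(images, 8):
#     for caption in pipeline(batch):
#         print(caption)
```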
If needed, extract visual features for further processing:
```python
# Extract features from a single image
features = pipeline.generate_features("http://images.cocodataset.org/val2017/000000039769.jpg")
print(features.shape)

# Extract features for a batch of images
batch_features = pipeline.generate_features(images)
print(batch_features.shape)
```
Generate captions from extracted visual features:
```python
captions = pipeline.generate_captions_from_features(features)
for caption in captions:
    print(caption)
```
Interactive Jupyter notebooks are provided to demonstrate Violet's capabilities. You can open these notebooks in Google Colab: