Maximum Context Length Exceeded Due to Base64-Encoded Image in Prompt #24109
Unanswered
tusharraskar asked this question in Q&A
Replies: 1 comment
To handle the base64-encoded image in the prompt without exceeding the model's context length, you can split the image and text parts and process the image separately. Here is a relevant code snippet to help you achieve this:

```python
import base64
import io

from PIL import Image
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI


def resize_base64_image(base64_string, size=(128, 128)):
    """
    Resize an image encoded as a Base64 string.

    :param base64_string: A Base64 encoded string of the image to be resized.
    :param size: A tuple representing the new size (width, height) for the image.
    :return: A Base64 encoded string of the resized image.
    """
    img_data = base64.b64decode(base64_string)
    img = Image.open(io.BytesIO(img_data))
    resized_img = img.resize(size, Image.LANCZOS)
    buffered = io.BytesIO()
    resized_img.save(buffered, format=img.format)
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def get_resized_images(docs):
    """
    Resize images from base64-encoded strings.

    :param docs: A list of base64-encoded images to be resized.
    :return: Dict containing a list of resized base64-encoded strings.
    """
    b64_images = []
    for doc in docs:
        if isinstance(doc, Document):
            doc = doc.page_content
        resized_image = resize_base64_image(doc, size=(1280, 720))
        b64_images.append(resized_image)
    return {"images": b64_images}


def img_prompt_func(data_dict, num_images=2):
    """
    GPT-4V prompt for image analysis.

    :param data_dict: A dict with images and a user-provided question.
    :param num_images: Number of images to include in the prompt.
    :return: A list containing message objects for each image and the text prompt.
    """
    messages = []
    if data_dict["context"]["images"]:
        for image in data_dict["context"]["images"][:num_images]:
            image_message = {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image}"},
            }
            messages.append(image_message)
    text_message = {
        "type": "text",
        "text": (
            "You are an analyst tasked with answering questions about visual content.\n"
            "You will be given a set of image(s) from a slide deck / presentation.\n"
            "Use this information to answer the user question. \n"
            f"User-provided question: {data_dict['question']}\n\n"
        ),
    }
    messages.append(text_message)
    return [HumanMessage(content=messages)]


def multi_modal_rag_chain(retriever):
    """
    Multi-modal RAG chain.

    :param retriever: A function that retrieves the necessary context for the model.
    :return: A chain of functions representing the multi-modal RAG process.
    """
    # Initialize the multi-modal Large Language Model with specific parameters
    model = ChatOpenAI(temperature=0, model="gpt-4-vision-preview", max_tokens=1024)

    # Define the RAG pipeline
    chain = (
        {
            "context": retriever | RunnableLambda(get_resized_images),
            "question": RunnablePassthrough(),
        }
        | RunnableLambda(img_prompt_func)
        | model
        | StrOutputParser()
    )
    return chain
```

This code splits the base64-encoded images and texts, processes them separately, and then combines them in a way that keeps the context length within the model's limit [1][2][3].
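If it helps, here is a minimal usage sketch of the chain above. The retriever name and the question text are placeholders; it assumes you already have a retriever whose results are base64-encoded images (e.g. as `Document` page_content):

```python
# Hypothetical usage sketch: `image_retriever` is a placeholder for any retriever
# (e.g. a multi-vector retriever) whose documents contain base64-encoded images.
chain = multi_modal_rag_chain(image_retriever)

# The question string is passed through unchanged, while the retrieved images
# are resized before being attached to the prompt.
answer = chain.invoke("What growth figure is shown on the revenue slide?")
print(answer)
```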
Description
I hosted vLLM on an EC2 instance to extract text data from images using the Phi-3 Vision model. The model is served with the following command:

```bash
python3 -m vllm.entrypoints.openai.api_server --port 8000 --model microsoft/Phi-3-vision-128k-instruct --trust-remote-code --dtype=half --max_model_len=8192
```
When running my code, I encounter a BadRequestError because the maximum context length is exceeded. The error message indicates that the request asked for 254457 tokens, far beyond the model's limit of 8192 tokens.
The base64-encoded image is being counted as part of the prompt, which inflates the token count dramatically and causes the context-length error.
Even if the model's context length were 128k tokens, a sufficiently large base64-encoded image in the prompt could still exceed the limit.
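For context, a minimal sketch of the kind of call that produces this error, assuming a standard ChatOpenAI client pointed at the vLLM OpenAI-compatible endpoint (the URL, image path, and question below are placeholders, not my exact code):

```python
import base64

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# Placeholder endpoint for the vLLM OpenAI-compatible server started above.
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # assumption: server reachable locally
    api_key="EMPTY",                      # vLLM does not require a real key
    model="microsoft/Phi-3-vision-128k-instruct",
    max_tokens=512,
)

with open("slide.png", "rb") as f:  # hypothetical image file
    b64_image = base64.b64encode(f.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Extract the text from this image."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
    ]
)

# If the server treats the base64 string as plain prompt text, a large image
# easily produces hundreds of thousands of tokens, triggering the error above.
response = llm.invoke([message])
print(response.content)
```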
System Info
langchain==0.2.7
langchain-community==0.2.7
langchain-core==0.2.12
langchain-text-splitters==0.2.2