Image reduction middleware for Microsoft.Extensions.AI #5747

Open
SteveSandersonMS opened this issue Dec 17, 2024 · 0 comments
Labels
area-AI, bug, untriaged

SteveSandersonMS (Member) commented Dec 17, 2024

This would require design, but I think there's a need for something here.

Right now you can add ImageContent (or possibly DataContent) to a ChatMessage to send it to the LLM and get back a response about it. That's great for one-shot usage but doesn't seem like a viable pattern for stateful conversations. What would typically happen:

  • User picks an image from their camera roll on their phone, adding a ~4 MiB file to the chat history, which we base64-encode (inflating it by roughly a third)
  • Now this gets re-sent to the backend on every subsequent interaction with this chat history

I don't think this would ever be what the app developer wants to happen. It would be very slow and expensive. And the user might continue to supply more images, making things ever slower and more expensive.
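To put rough numbers on that cost, here is a back-of-the-envelope sketch in Python. It assumes standard padded base64 and counts only the image payload, ignoring JSON framing and any provider-side token accounting; the helper names are made up for illustration.

```python
import base64

MIB = 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length of standard, padded base64 output for `raw_bytes` bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def cumulative_upload(raw_bytes: int, turns: int) -> int:
    """Total base64 bytes uploaded if one image is re-sent on each of `turns` requests."""
    return base64_size(raw_bytes) * turns


# Sanity check of the formula against the real encoder on a small input.
assert base64_size(10) == len(base64.b64encode(bytes(10)))

image = 4 * MIB
print(base64_size(image) / MIB)            # about 5.33 MiB per request
print(cumulative_upload(image, 10) / MIB)  # about 53.3 MiB over ten turns
```

Each additional image in the history adds its own encoded size to every later request, so the total grows with both the number of images and the number of turns.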

Possible solutions

History reduction

An obvious and fairly cheap option would be some middleware that strips images from chat history if they are not part of the last message. Then each image only gets sent once, giving the LLM one chance to do something with it.

This has some serious drawbacks:

  • You can't send a message with some image and then in a subsequent message ask "what's in that image?", or at least not reliably. If the LLM responded to the original message with a description then it would likely work, but if it did not, the information will have been lost already.
  • We either have to mutate the original ChatMessage to remove the image (which might make it disappear unexpectedly from the UI), or clone the entire List<ChatMessage> to replace all image-containing non-final ChatMessage instances with instances that don't contain images. This could confuse later middleware, so you really want this image-reduction middleware to go last.
    • Arguably that's just a special case of the drawback innate to any chat history reduction middleware, so maybe we have to deal with that anyway.
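As a rough illustration of the non-mutating variant, here is a minimal Python sketch. `TextPart`, `ImagePart`, `Message`, and `strip_old_images` are hypothetical stand-ins, not the Microsoft.Extensions.AI types, and this is one possible shape for the idea rather than a proposed design.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextPart:  # hypothetical stand-in for a text content item
    text: str


@dataclass(frozen=True)
class ImagePart:  # hypothetical stand-in for an inline image content item
    data: bytes
    media_type: str = "image/jpeg"


@dataclass(frozen=True)
class Message:  # hypothetical stand-in for a chat message
    role: str
    parts: tuple


def strip_old_images(history):
    """Return a new list in which only the final message keeps its images.

    The caller's list and message objects are never mutated, so a UI bound
    to the original history still shows every image.
    """
    if not history:
        return []
    reduced = []
    for msg in history[:-1]:
        if any(isinstance(p, ImagePart) for p in msg.parts):
            kept = tuple(p for p in msg.parts if not isinstance(p, ImagePart))
            reduced.append(replace(msg, parts=kept))  # shallow copy without images
        else:
            reduced.append(msg)  # unchanged messages are reused as-is
    reduced.append(history[-1])  # the last message is sent untouched
    return reduced
```

Only image-bearing, non-final messages are copied, so the extra allocation is small. The first drawback above still applies: once an image has been stripped, the model can no longer see it.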

Passing URIs

Another option would be some kind of helpers or guidance for avoiding putting base64-encoded images in chat history in the first place, and instead making them available as publicly reachable web resources with some URI. For example, OpenAI accepts an image_url. This is complicated in other ways:

  • It's completely impossible if you're just building a standalone client app and don't even have a web server / blob storage account / etc.
  • Even if you do have a web server this is complicated, because now you have to manage storage and lifetimes and cleanup - we can't provide any simple helper that would do it in general (at minimum, it requires storage shared by all your web servers, such as blob storage).
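For a sense of what such a helper might look like, here is a small Python sketch. Everything in it is hypothetical: `InlineImage`, `ImageRef`, `externalize`, and the `upload` callback are illustration-only names, and the in-memory `fake_upload` merely stands in for real shared storage. It deliberately leaves out the hard parts listed above (lifetimes, cleanup, access control).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InlineImage:  # hypothetical: image bytes carried inside the chat history
    data: bytes
    media_type: str


@dataclass(frozen=True)
class ImageRef:  # hypothetical: a reference to an externally hosted image
    uri: str


def externalize(parts, upload: Callable[[bytes, str], str]):
    """Replace each inline image with a URI returned by the caller's `upload`.

    `upload` is supplied by the application; it must store the bytes somewhere
    the model provider can reach and return that location as a URI.
    """
    return [
        ImageRef(upload(p.data, p.media_type)) if isinstance(p, InlineImage) else p
        for p in parts
    ]


# Demo with an in-memory store; a real app would write to blob storage instead.
store = {}


def fake_upload(data: bytes, media_type: str) -> str:
    key = f"img-{len(store)}"
    store[key] = (data, media_type)
    return f"https://example.invalid/{key}"


parts = ["What is in this picture?", InlineImage(b"\xff\xd8", "image/jpeg")]
print(externalize(parts, fake_upload))
```

Keeping the upload step as a callback leaves the storage decision with the application, which fits the observation that no general-purpose helper can make that choice.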

Persistent threads

If you're working with something like OpenAI Assistants, I suspect the problem can be skipped entirely since each chat message only needs to be sent once within a persistent thread.

SteveSandersonMS added the bug, untriaged, and area-AI labels on Dec 17, 2024