This would require design, but I think there's a need for something here.
Right now you can add ImageContent (or possibly DataContent) to a ChatMessage to send it to the LLM and get back a response about it. That's great for one-shot usage but doesn't seem like a viable pattern for stateful conversations. What would typically happen:
The user picks an image from their phone's camera roll, adding a ~4 MiB file to the chat history, which we base64-encode.
That image then gets re-sent to the backend on every subsequent interaction with this chat history.
I don't think this would ever be what the app developer wants to happen. It would be very slow and expensive. And the user might continue to supply more images, making things ever slower and more expensive.
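To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes the whole history is re-sent on every turn and counts only the image payload (text and JSON overhead are ignored). The function names are made up for illustration. Base64 turns every 3 raw bytes into 4 characters, so a 4 MiB image becomes about 5.33 MiB per request, and about 53.3 MiB in total over ten turns.

```python
MIB = 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def cumulative_image_bytes(image_sizes_by_turn: list[int]) -> int:
    """Total image payload sent over a conversation when the full history
    is re-sent on every turn.

    image_sizes_by_turn[i] is the raw size of the image the user added on
    turn i, or 0 if that turn had no image.
    """
    total = 0
    in_history = 0  # encoded image bytes currently sitting in the history
    for raw in image_sizes_by_turn:
        in_history += base64_size(raw)
        total += in_history  # everything in the history goes out again
    return total


# One 4 MiB image on the first turn, followed by nine text-only turns.
turns = [4 * MIB] + [0] * 9
print(f"per request: {base64_size(4 * MIB) / MIB:.2f} MiB")
print(f"over {len(turns)} turns: {cumulative_image_bytes(turns) / MIB:.2f} MiB")
```

With more than one image in the history the total grows faster still, since every earlier image rides along with every later request.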
Possible solutions
History reduction
An obvious and fairly cheap option would be some middleware that strips images from the chat history unless they are part of the last message. Then each image only gets sent once, giving the LLM one chance to do something with it.
This has some serious drawbacks:
You can't send a message with some image and then in a subsequent message ask "what's in that image?", or at least not reliably. If the LLM responded to the original message with a description then it would likely work, but if it did not, the information will have been lost already.
We either have to mutate the original ChatMessage to remove the image (which might make it disappear unexpectedly from the UI), or clone the entire List<ChatMessage> to replace all image-containing non-final ChatMessage instances with instances that don't contain images. This could confuse later middleware, so you really want this image-reduction middleware to go last.
Arguably that's just a special case of the drawback innate to any chat history reduction middleware, so maybe we have to deal with that anyway.
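As a rough illustration of the cloning variant, here is a minimal, language-neutral sketch. The `ChatMessage`, `TextContent` and `ImageContent` classes are simplified stand-ins defined here for the example, not the real library types, and `strip_stale_images` is a hypothetical name. It builds a new list and leaves the caller's messages untouched, so nothing disappears from the UI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextContent:  # stand-in type, for illustration only
    text: str


@dataclass(frozen=True)
class ImageContent:  # stand-in type, for illustration only
    data: bytes
    media_type: str = "image/jpeg"


@dataclass(frozen=True)
class ChatMessage:  # stand-in type, for illustration only
    role: str
    contents: tuple = ()


def strip_stale_images(history: list) -> list:
    """Return a new history in which only the final message keeps its images.

    The input list and its messages are not modified. Messages without
    images are reused as-is, messages with images are replaced by copies.
    """
    if not history:
        return []
    reduced = []
    for message in history[:-1]:
        kept = tuple(c for c in message.contents
                     if not isinstance(c, ImageContent))
        if len(kept) == len(message.contents):
            reduced.append(message)  # nothing stripped, reuse the instance
        else:
            reduced.append(replace(message, contents=kept))
    reduced.append(history[-1])  # the last message is sent unchanged
    return reduced


history = [
    ChatMessage("user", (TextContent("Look at this"), ImageContent(b"\xff\xd8"))),
    ChatMessage("assistant", (TextContent("Nice photo."),)),
    ChatMessage("user", (TextContent("What's in that image?"),)),
]
to_send = strip_stale_images(history)
```

Note that in this example the final question can only be answered from whatever the assistant said earlier, which is exactly the first drawback above.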
Passing URIs
Another option would be some kind of helpers or guidance for avoiding putting base64-encoded images in chat history in the first place, and instead making them available as publicly reachable web resources with some URI. For example, OpenAI accepts an image_url. This is complicated in other ways:
It's completely impossible if you're just building a standalone client app and don't even have a web server / blob storage account / etc.
Even if you do have a web server this is complicated, because now you have to manage storage, lifetimes, and cleanup. We can't provide any simple helper that would do it in general (at minimum, it requires storage shared by all your web servers, such as blob storage).
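To make the shape of such a helper concrete, here is a hedged sketch. Everything in it is hypothetical: `ImageContent` and `ImageUri` are stand-in types, `externalize_images` is an invented name, and the `upload` callback is supplied by the app because storage is exactly the part that cannot be solved generically. The in-memory uploader and the `example.invalid` host exist only so the example runs, and it does nothing about lifetimes or cleanup.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ImageContent:  # stand-in for inline image bytes
    data: bytes
    media_type: str = "image/jpeg"


@dataclass(frozen=True)
class ImageUri:  # hypothetical: an image referenced by URI instead
    uri: str


def externalize_images(contents: list,
                       upload: Callable[[bytes, str], str]) -> list:
    """Return a copy of `contents` where each inline image is replaced by a
    URI reference. `upload(data, media_type)` must store the bytes somewhere
    the LLM provider can reach and return that URI.
    """
    return [
        ImageUri(upload(c.data, c.media_type)) if isinstance(c, ImageContent) else c
        for c in contents
    ]


# Fake uploader for demonstration only: keeps bytes in a dict and names them
# by content hash, so uploading the same image twice yields the same URI.
fake_store: dict[str, bytes] = {}


def fake_upload(data: bytes, media_type: str) -> str:
    key = hashlib.sha256(data).hexdigest()
    fake_store[key] = data
    return f"https://example.invalid/images/{key}"


contents = ["Look at this", ImageContent(b"\xff\xd8\xff")]
externalized = externalize_images(contents, fake_upload)
```

After this step the chat history carries only a short URI per image, so re-sending the history is cheap. The cost moves to the storage side, which is where the drawbacks above come in.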
Persistent threads
If you're working with something like OpenAI Assistants, I suspect the problem can be skipped entirely since each chat message only needs to be sent once within a persistent thread.