Hugging Face Transformers 4.34, which is quite new, adds support for "chat templates" and can also report the size of a chat in tokens.

However, many models don't ship the required chat template (yet?), and obtaining the chat template for some models (e.g., Llama 2) requires special permission, even when a derived quantized model is not behind a sign-up wall.

Use this tech, or something like it, to replace the hard-coded query formatting in the llama_cpp backend and to size the query by its token count instead of the current hard-coded limit of 5 messages. A sketch of the idea follows below.
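A minimal sketch of what that could look like, assuming a tokenizer that ships a chat template. The model name `HuggingFaceH4/zephyr-7b-beta` and the 512-token budget are illustrative placeholders, not anything specified in this issue:

```python
from transformers import AutoTokenizer

# Illustrative model choice: any tokenizer that ships a chat template works.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
    {"role": "assistant", "content": "A Jinja template that renders a chat into a prompt."},
    {"role": "user", "content": "How many tokens does this conversation use?"},
]

def chat_token_count(msgs):
    # With tokenize=True, apply_chat_template returns the token IDs of the
    # rendered prompt, so len() gives the chat size in tokens.
    return len(tokenizer.apply_chat_template(
        msgs, tokenize=True, add_generation_prompt=True))

# Trim the oldest non-system messages until the prompt fits a token budget
# (512 is an arbitrary example), instead of keeping a fixed 5 messages.
TOKEN_BUDGET = 512
while len(messages) > 1 and chat_token_count(messages) > TOKEN_BUDGET:
    messages.pop(1)  # index 1 preserves the system message at index 0

# Render the chat into the model's expected prompt format as a string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
print(f"{chat_token_count(messages)} tokens:\n{prompt}")
```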