Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve query formatting & sizeing #23

Open
jepler opened this issue Oct 9, 2023 · 0 comments
Open

Improve query formatting & sizeing #23

jepler opened this issue Oct 9, 2023 · 0 comments

Comments

@jepler
Copy link
Owner

jepler commented Oct 9, 2023

huggingface transformers 4.34, which is quite new, has support for "chat templates" and can also tell you the size of a chat in tokens.

however, a lot of models don't have the required chat templates (yet?) and getting chat templates for some models (e.g., llama2) requires special permission even if a derived quantized model was not behind a signup wall.

Use this tech, or something like it, to replace the hard-coded query formatting of the llama_cpp backend and to improve the length of the query itself instead of having the hard-coded limit of 5 messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant