Automatically apply chat template in non-chat scenarios #1533
base: master
Conversation
src/README.md (outdated)

@@ -73,6 +73,8 @@ output:
```
'it is made up of carbon atoms. The carbon atoms are arranged in a linear pattern, which gives the yellow color. The arrangement of carbon atoms in'
```

>**Note**: The chat_template from tokenizer_config.json will be automatically applied to the prompt at the generation stage. If you want to disable it, you can do it by calling pipe.get_tokenizer().set_chat_template("").
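A minimal sketch of the behaviour that note describes, assuming the openvino_genai Python API shown elsewhere in this README; the model directory name is only an illustrative placeholder:

```python
import openvino_genai

# Illustrative model directory; any exported OpenVINO GenAI LLM works here.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")

# With this change, the chat_template from tokenizer_config.json is applied to the prompt by default.
print(pipe.generate("Why is the Sun yellow?", max_new_tokens=40))

# Disable the automatic templating by setting an empty chat template.
pipe.get_tokenizer().set_chat_template("")
print(pipe.generate("Why is the Sun yellow?", max_new_tokens=40))
```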
Maybe we can also add this to the generate() method docs, for the overloads which operate with strings, for both VLM and LLM?
Do you mean like this?
* chat_template will be applied to the prompt; run pipe.get_tokenizer().set_chat_template(custom_chat_template) to update it.
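For illustration, a sketch of that wording in code form; the Jinja template here is purely an example, not any real model's template:

```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")  # illustrative path

# Example custom Jinja chat template (for illustration only).
custom_chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}\n"
    "{% endfor %}"
    "<|assistant|>"
)

# Override the template that would otherwise be taken from tokenizer_config.json.
pipe.get_tokenizer().set_chat_template(custom_chat_template)
print(pipe.generate("Why is the Sun yellow?", max_new_tokens=40))
```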
try {
    // Wrap the plain string prompt into a single-turn chat history with the "user" role.
    ChatHistory history({{{"role", "user"}, {"content", prompt}}});
    constexpr bool add_generation_prompt = true;
    // Apply the tokenizer's chat template so string-based generate() calls get the same formatting as chat mode.
    auto templated_prompt = m_tokenizer.apply_chat_template(history, add_generation_prompt);
Looks like you need to fix our test framework to do the same, since the chat template is currently applied only for chat cases.
The same goes for the tests in the .github folder.
If applying a chat template is supposed to be the default behavior of the .generate() method, that is not aligned with the HF Transformers library. We should turn it off at least in the tools (both WWB and LLM-Bench).
What about the HF e2e pipeline?
What if it's an instruction model?
Double-checked, and it seems like HF changed the behaviour at some point for the text-generation pipeline (see Details). But the input has to be formatted appropriately to trigger chat template usage: if the user just passes string data, no chat template is applied.
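A hedged illustration of that HF Transformers behaviour; the model name is only an example, and the exact behaviour depends on the transformers version:

```python
from transformers import pipeline

# Example chat model; any model whose tokenizer_config.json defines a chat_template behaves the same way.
generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Plain string input: no chat template is applied, the text is tokenized as-is.
print(generator("Why is the Sun yellow?", max_new_tokens=40)[0]["generated_text"])

# Message-format input: the tokenizer's chat_template is applied before generation.
messages = [{"role": "user", "content": "Why is the Sun yellow?"}]
print(generator(messages, max_new_tokens=40)[0]["generated_text"])
```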
Do you think it's better to add an explicit flag, then? E.g. pipe.generate(prompt, apply_chat_template=True, max_new_tokens=40)
This option looks good to me, but for a drop-in replacement of the HF API with OV GenAI it is better to follow the HF approach with the message format. In any case, they should have more experience and user feedback.
Should both ways be added - possibility to put
CVS-157276