Automatically apply chat template in non-chat scenarios #1533

Open

sbalandi wants to merge 3 commits into master from chat_templ
Conversation

@sbalandi (Contributor) commented Jan 13, 2025

@github-actions bot added the category: visual language, category: LLM, and no-match-files labels on Jan 13, 2025
src/README.md Outdated
@@ -73,6 +73,8 @@ output:
'it is made up of carbon atoms. The carbon atoms are arranged in a linear pattern, which gives the yellow color. The arrangement of carbon atoms in'
```

>**Note**: The chat_template from tokenizer_config.json will be automatically applied to the prompt at the generation stage. If you want to disable it, you can do it by calling pipe.get_tokenizer().set_chat_template("").
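As an illustration of the documented behaviour, here is a minimal Python sketch (the model directory and prompts are placeholders; `set_chat_template("")` is the opt-out named in the note above):

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")  # placeholder model dir

# The chat_template from tokenizer_config.json is applied to the plain-string
# prompt automatically at the generation stage.
print(pipe.generate("Why is the crystal yellow?", max_new_tokens=40))

# Opt out by clearing the template, as described in the note.
pipe.get_tokenizer().set_chat_template("")
print(pipe.generate("Why is the crystal yellow?", max_new_tokens=40))
```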
Reviewer (Contributor):
Maybe we can also add this to the generate() method docs, for the overloads that operate on strings, for both VLM and LLM?

@sbalandi (Contributor, Author):
Do you mean something like this?

* chat_template will be applied to the prompt; run pipe.get_tokenizer().set_chat_template(custom_chat_template) to update it.

try {
    // Wrap the plain-string prompt into a single-turn chat history.
    ChatHistory history({{{"role", "user"}, {"content", prompt}}});
    constexpr bool add_generation_prompt = true;
    // Apply the model's chat_template from tokenizer_config.json before tokenization.
    auto templated_prompt = m_tokenizer.apply_chat_template(history, add_generation_prompt);
Reviewer (Contributor):
Looks like you need to fix our test framework to do the same, since the chat template is currently applied only for chat cases.

The same applies to the tests in the .github folder.

src/cpp/src/icontinuous_batching.cpp (outdated, resolved)
README.md (outdated, resolved)
@github-actions bot added the category: GHA, category: tokenizers, and category: GenAI C++ API labels on Jan 13, 2025
@sbalandi force-pushed the chat_templ branch 2 times, most recently from f1ece12 to e5fa889 on January 13, 2025, 21:42
@github-actions bot added the category: samples label on Jan 13, 2025
@AlexKoff88 (Collaborator):

If using a chat template is supposed to be the default behavior of the .generate() method, that is not aligned with the HF Transformers library. We should turn it off at least in the tools (both WWB and LLM-Bench).

@ilya-lavrenov (Contributor) commented Jan 14, 2025:

What about the HF e2e pipeline? Do they apply chat_template by default?

@eaidova

@AlexKoff88 (Collaborator):

The text2text-generation pipeline does not use a chat template by default, as far as I know.

@ilya-lavrenov (Contributor):

What if it's an instruction model?

@AlexKoff88 (Collaborator):

Double-checked, and it seems like HF changed the behaviour at some point for the text-generation pipeline (see Details). But the input has to be formatted appropriately to trigger chat template usage: if the user just passes plain string data, no chat template is applied.
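A minimal sketch of the Transformers behaviour described above; the model id is a placeholder assumption, and the point is only how the input format decides whether the chat template is applied:

```python
from transformers import pipeline

# Placeholder instruct model; any model shipping a chat_template behaves the same way.
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# Plain string input: passed to the model as-is, no chat template is applied.
plain = pipe("What is the Sun made of?", max_new_tokens=40)

# Message-formatted input: the tokenizer's chat_template is applied automatically.
chat = pipe([{"role": "user", "content": "What is the Sun made of?"}], max_new_tokens=40)
```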

@ilya-lavrenov (Contributor):

Do you think it's better to add an explicit flag, then?

pipe.generate(prompt, apply_chat_template=True, max_new_tokens=40)

@AlexKoff88 (Collaborator):

This option looks good to me, but for a drop-in replacement of the HF API with OV GenAI it is better to follow the HF approach with the message format. In any case, they have more experience and user feedback.

@sbalandi (Contributor, Author):

Should both ways be added: accept messages in the generate() function (apply the chat_template if the input is a list of messages, leave it as-is if it's a string), and also add apply_chat_template as an input parameter for generate()?
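For illustration, a sketch of how the two options discussed in this thread might look from the user side; neither the apply_chat_template parameter nor message-style input to generate() is merged, so both signatures are hypothetical:

```python
import openvino_genai as ov_genai

models_path = "model_dir"  # placeholder
pipe = ov_genai.LLMPipeline(models_path, "CPU")
prompt = "What is in a yellow crystal?"

# Option 1 (hypothetical): an explicit flag on generate(), as proposed above.
out = pipe.generate(prompt, apply_chat_template=True, max_new_tokens=40)

# Option 2 (hypothetical): HF-style message input triggers the chat template,
# while a plain string prompt would be left untouched.
messages = [{"role": "user", "content": prompt}]
out = pipe.generate(messages, max_new_tokens=40)
```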
