From 75a5f14446e73497f5e891f7981ded3b7d530ecb Mon Sep 17 00:00:00 2001
From: Oleg Pipikin
Date: Tue, 14 Jan 2025 15:04:38 +0000
Subject: [PATCH 1/2] Update samples readme

---
 samples/cpp/text_generation/README.md    | 69 ++++++++++++++++---------
 samples/python/text_generation/README.md | 69 ++++++++++++++++---------
 2 files changed, 86 insertions(+), 52 deletions(-)

diff --git a/samples/cpp/text_generation/README.md b/samples/cpp/text_generation/README.md
index d9e5bd8d22..9eb91ee649 100644
--- a/samples/cpp/text_generation/README.md
+++ b/samples/cpp/text_generation/README.md
@@ -2,7 +2,7 @@
 
 These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
 The applications don't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU.
-There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descritions.
+There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descriptions.
 
 ## Table of Contents
 1. [Download and Convert the Model and Tokenizers](#download-and-convert-the-model-and-tokenizers)
@@ -11,25 +11,59 @@ There are also Jupyter notebooks for some samples. You can find links to them in
 4. [Support and Contribution](#support-and-contribution)
 
 ## Download and convert the model and tokenizers
-
 The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.
-
-It's not required to install [../../export-requirements.txt](../../export-requirements.txt) for deployment if the model has already been exported.
-
+Install [../../export-requirements.txt](../../export-requirements.txt) if model conversion is required.
 ```sh
-pip install --upgrade-strategy eager -r ../../requirements.txt
+pip install --upgrade-strategy eager -r ../../export-requirements.txt
+optimum-cli export openvino --model <model> <output_folder>
 ```
+If a converted model is already available on the Hugging Face Hub (for example, [OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov)), it can be downloaded directly via `huggingface-cli`:
+```sh
+huggingface-cli download <model> --local-dir <output_folder>
+```
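+
+For example, the following commands export TinyLlama/TinyLlama-1.1B-Chat-v1.0 to a local folder and download the pre-converted model mentioned above (the output folder names are illustrative):
+```sh
+optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
+huggingface-cli download OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov --local-dir TinyLlama-1.1B-Chat-v1.0-int8-ov
+```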
 
 ## Sample Descriptions
 ### Common information
 Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
+Follow the [build instructions](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/BUILD.md) to build the GenAI samples.
 
-Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.
+GPUs usually provide better performance than CPUs. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a GPU. Modify the source code to change the device for inference to the GPU.
 
 See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.
 
-### 1. Greedy Causal LM (`greedy_causal_lm`)
+Install [../../deployment-requirements.txt](../../deployment-requirements.txt) to run the samples:
+```sh
+pip install --upgrade-strategy eager -r ../../deployment-requirements.txt
+```
+
+### 1. Chat Sample (`chat_sample`)
+- **Description:**
+Interactive chat interface powered by OpenVINO.
+Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
+Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc.
+- **Main Feature:** Real-time chat-like text generation.
+- **Run Command:**
+  ```bash
+  ./chat_sample <MODEL_DIR>
+  ```
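+For example, assuming the model was downloaded into a local `TinyLlama-1.1B-Chat-v1.0-int8-ov` folder as shown in the download step above (the folder name is illustrative):
+```bash
+./chat_sample ./TinyLlama-1.1B-Chat-v1.0-int8-ov
+```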
+#### Missing chat template
+If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` file of your model.
+The following template can be used as a default, but it may not work properly with every model:
+```
+"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
+```
+
+### 2. Greedy Causal LM (`greedy_causal_lm`)
 - **Description:**
 Basic text generation using a causal language model.
 Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -40,7 +74,7 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc
   ./greedy_causal_lm <MODEL_DIR> "<PROMPT>"
   ```
 
-### 2. Beam Search Causal LM (`beam_search_causal_lm`)
+### 3. Beam Search Causal LM (`beam_search_causal_lm`)
 - **Description:**
 Uses beam search for more coherent text generation.
 Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -51,23 +85,6 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc
   ./beam_search_causal_lm <MODEL_DIR> "<PROMPT 1>" ["<PROMPT 2>" ...]
   ```
 
-### 3. Chat Sample (`chat_sample`)
-- **Description:**
-Interactive chat interface powered by OpenVINO.
-Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
-Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc
-- **Main Feature:** Real-time chat-like text generation.
-- **Run Command:**
-  ```bash
-  ./chat_sample <MODEL_DIR>
-  ```
-#### Missing chat template
-If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work this around, manually add the chat template to tokenizer_config.json of your model.
-The following template can be used as a default, but it may not work properly with every model:
-```
-"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
-```
-
 ### 4. Multinomial Causal LM (`multinomial_causal_lm`)
 - **Description:**
 Text generation with multinomial sampling for diversity.
 Recommended models: meta-llama/Llama-2-7b-hf, etc
diff --git a/samples/python/text_generation/README.md b/samples/python/text_generation/README.md
index 9940904cfb..d0ddc8c219 100644
--- a/samples/python/text_generation/README.md
+++ b/samples/python/text_generation/README.md
@@ -2,7 +2,7 @@
 
 These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
 The applications don't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU.
-There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descritions.
+There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descriptions.
 
 ## Table of Contents
 1. [Download and Convert the Model and Tokenizers](#download-and-convert-the-model-and-tokenizers)
@@ -11,25 +11,59 @@ There are also Jupyter notebooks for some samples. You can find links to them in
 4. [Support and Contribution](#support-and-contribution)
 
 ## Download and convert the model and tokenizers
-
 The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.
-
-It's not required to install [../../export-requirements.txt](../../export-requirements.txt) for deployment if the model has already been exported.
-
+Install [../../export-requirements.txt](../../export-requirements.txt) if model conversion is required.
 ```sh
-pip install --upgrade-strategy eager -r ../../requirements.txt
+pip install --upgrade-strategy eager -r ../../export-requirements.txt
+optimum-cli export openvino --model <model> <output_folder>
 ```
+If a converted model is already available on the Hugging Face Hub (for example, [OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov)), it can be downloaded directly via `huggingface-cli`:
+```sh
+huggingface-cli download <model> --local-dir <output_folder>
+```
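+
+For example, the following commands export TinyLlama/TinyLlama-1.1B-Chat-v1.0 to a local folder and download the pre-converted model mentioned above (the output folder names are illustrative):
+```sh
+optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
+huggingface-cli download OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov --local-dir TinyLlama-1.1B-Chat-v1.0-int8-ov
+```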
 
 ## Sample Descriptions
 ### Common information
 Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
+Follow the [build instructions](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/BUILD.md) to build the GenAI samples.
 
-Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.
+GPUs usually provide better performance than CPUs. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a GPU. Modify the source code to change the device for inference to the GPU.
 
 See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.
 
-### 1. Greedy Causal LM (`greedy_causal_lm`)
+Install [../../deployment-requirements.txt](../../deployment-requirements.txt) to run the samples:
+```sh
+pip install --upgrade-strategy eager -r ../../deployment-requirements.txt
+```
+
+### 1. Chat Sample (`chat_sample`)
+- **Description:**
+Interactive chat interface powered by OpenVINO.
+Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
+Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc.
+- **Main Feature:** Real-time chat-like text generation.
+- **Run Command:**
+  ```bash
+  python chat_sample.py model_dir
+  ```
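+For example, assuming the model was downloaded into a local `TinyLlama-1.1B-Chat-v1.0-int8-ov` folder as shown in the download step above (the folder name is illustrative):
+```bash
+python chat_sample.py TinyLlama-1.1B-Chat-v1.0-int8-ov
+```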
+#### Missing chat template
+If you encounter an exception indicating a missing "chat template" when launching the `openvino_genai.LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work around this, manually add the chat template to the `tokenizer_config.json` file of your model.
+The following template can be used as a default, but it may not work properly with every model:
+```
+"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
+```
+
+### 2. Greedy Causal LM (`greedy_causal_lm`)
 - **Description:**
 Basic text generation using a causal language model.
 Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -40,7 +74,7 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc
   python greedy_causal_lm.py [-h] model_dir prompt
   ```
 
-### 2. Beam Search Causal LM (`beam_search_causal_lm`)
+### 3. Beam Search Causal LM (`beam_search_causal_lm`)
 - **Description:**
 Uses beam search for more coherent text generation.
 Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-question-answering) that provides an example of LLM-powered text generation in Python.
@@ -51,23 +85,6 @@ Recommended models: meta-llama/Llama-2-7b-hf, etc
   python beam_search_causal_lm.py model_dir prompt [prompts ...]
   ```
 
-### 3. Chat Sample (`chat_sample`)
-- **Description:**
-Interactive chat interface powered by OpenVINO.
-Here is a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) that provides an example of LLM-powered text generation in Python.
-Recommended models: meta-llama/Llama-2-7b-chat-hf, TinyLlama/TinyLlama-1.1B-Chat-v1.0, etc
-- **Main Feature:** Real-time chat-like text generation.
-- **Run Command:**
-  ```bash
-  python chat_sample.py model_dir
-  ```
-#### Missing chat template
-If you encounter an exception indicating a missing "chat template" when launching the `ov::genai::LLMPipeline` in chat mode, it likely means the model was not tuned for chat functionality. To work this around, manually add the chat template to tokenizer_config.json of your model.
-The following template can be used as a default, but it may not work properly with every model:
-```
-"chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
-```
-
 ### 4. Multinomial Causal LM (`multinomial_causal_lm`)
 - **Description:**
 Text generation with multinomial sampling for diversity.
 Recommended models: meta-llama/Llama-2-7b-hf, etc

From 1b3d849bb8a1b2dff84307aa89da7676ebf438d2 Mon Sep 17 00:00:00 2001
From: Oleg Pipikin
Date: Tue, 14 Jan 2025 17:42:02 +0000
Subject: [PATCH 2/2] Apply comments

---
 samples/cpp/text_generation/README.md    | 1 +
 samples/python/text_generation/README.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/samples/cpp/text_generation/README.md b/samples/cpp/text_generation/README.md
index 9eb91ee649..de9d770446 100644
--- a/samples/cpp/text_generation/README.md
+++ b/samples/cpp/text_generation/README.md
@@ -19,6 +19,7 @@ optimum-cli export openvino --model <model> <output_folder>
 ```
 If a converted model is already available on the Hugging Face Hub (for example, [OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov)), it can be downloaded directly via `huggingface-cli`:
 ```sh
+pip install --upgrade-strategy eager -r ../../export-requirements.txt
 huggingface-cli download <model> --local-dir <output_folder>
 ```
 
diff --git a/samples/python/text_generation/README.md b/samples/python/text_generation/README.md
index d0ddc8c219..64395506b4 100644
--- a/samples/python/text_generation/README.md
+++ b/samples/python/text_generation/README.md
@@ -19,6 +19,7 @@ optimum-cli export openvino --model <model> <output_folder>
 ```
 If a converted model is already available on the Hugging Face Hub (for example, [OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov)), it can be downloaded directly via `huggingface-cli`:
 ```sh
+pip install --upgrade-strategy eager -r ../../export-requirements.txt
 huggingface-cli download <model> --local-dir <output_folder>
 ```
 