[Samples] merge LLM samples to "text_generation" folder #1411
Conversation
port: #28248 connected to: openvinotoolkit/openvino.genai#1411
connected to: openvinotoolkit/openvino.genai#1411 Co-authored-by: Andrzej Kopytko <[email protected]>
If there are no more major comments, I will make similar changes to the Python samples.
You have a merge conflict
@ilya-lavrenov re-review please. The PR cannot be merged without a +1 from you.
This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `openvino_genai.LLMPipeline` and configures it to run the simplest deterministic greedy sampling algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of LLM-powered Chatbot in Python.
These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
The applications don't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU.
There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descritions.
Typo: descritions -> descriptions
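For context, here is a minimal Python sketch of the greedy-decoding pipeline the readme excerpt above describes. It assumes the `openvino-genai` package is installed and a model has already been exported as shown in the next section; the model directory name is illustrative.

```python
import openvino_genai

# LLMPipeline defaults to deterministic greedy decoding when no
# sampling parameters are set in the generation config.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```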
## Download and convert the model and tokenizers
The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.
Install [../../export-requirements.txt](../../export-requirements.txt) to convert a model. |
I propose to have a clear message in the readme that export-requirements.txt is needed for conversion/optimization and deployment-requirements.txt is needed to run the sample, instead of installing both and mentioning that export-requirements.txt isn't needed if the model is already exported.
It will be easier for developers to understand which dependencies are necessary for which stage (model preparation vs. model deployment).
"Install ../../export-requirements.txt to convert a model" looks more appropriate in this case.
```sh
pip install --upgrade-strategy eager -r ../../export-requirements.txt
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```
What about also adding two other options to prepare the model here:
- Download the converted model from HF: `huggingface-cli download "OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov" --local-dir TinyLlama-1.1B-Chat-v1.0-int8-ov`
- Convert the model and compress weights to int4 precision: `optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 TinyLlama-1.1B-Chat-v1.0-int4`

It will help developers see that there are multiple options to prepare the model.
We can also emphasize that "Download the converted model from HF" can be the preferred option for this sample (no need to spend time on conversion).
Is huggingface-cli installed by export-requirements.txt as a dependency?
No, it isn't. Please add huggingface_hub to export-requirements.txt
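For reference, a Python sketch of the two preparation options discussed above, assuming `huggingface_hub` and `optimum-intel` are available (per the comment above, `huggingface_hub` may need to be added to export-requirements.txt first). `snapshot_download` is the programmatic counterpart of `huggingface-cli download`.

```python
# Option 1: download a pre-converted model from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    "OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov",
    local_dir="TinyLlama-1.1B-Chat-v1.0-int8-ov",
)

# Option 2: convert locally and compress weights to int4 by shelling
# out to optimum-cli (equivalent to the command in the comment above).
import subprocess

subprocess.run(
    [
        "optimum-cli", "export", "openvino",
        "--trust-remote-code",
        "--model", "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "--weight-format", "int4",
        "TinyLlama-1.1B-Chat-v1.0-int4",
    ],
    check=True,
)
```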
## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.
We can recommend using GPU without mentioning discrete GPU, because iGPU works perfectly with LLMs. The recommendation can be based on performance, not memory.
Which dGPU with 32GB+ RAM is meant here exactly?
Intel Arc has up to 16 GB of memory.
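As a sketch of the device change the readme asks the reader to make (not the sample's actual source), the target device is the second argument of `openvino_genai.LLMPipeline`; the model directory name is illustrative.

```python
import openvino_genai

# "GPU" targets the default GPU device (integrated or discrete);
# use "GPU.0", "GPU.1", ... to pick a specific one.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "GPU")
```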
```sh
./beam_search_causal_lm <MODEL_DIR> "<PROMPT 1>" ["<PROMPT 2>" ...]
```
### 3. Chat Sample (`chat_sample`) |
Please consider putting the chat sample in first place in the list, as it is the most popular sample.
## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
Clear instructions on how to build the samples should be provided. For example, https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/BUILD.md could be extended with a sample build section, and those instructions linked from this readme.
Details: Update links to genai samples; related to openvinotoolkit/openvino.genai#1411. Tickets: *ticket-id*
Details: Update links to genai samples to the 2024.6 branch; related to openvinotoolkit/openvino.genai#1411. Tickets: *ticket-id*
port: #28248 connected to: openvinotoolkit/openvino.genai#1411
@DimaPastushenkov I addressed your comments in #1545. Please review.