
[Samples] merge LLM samples to "text_generation" folder #1411

Merged: 9 commits into openvinotoolkit:master, Jan 13, 2025

Conversation

@olpipi (Collaborator) commented Dec 19, 2024

No description provided.

@github-actions bot added labels: category: GHA (CI based on Github actions), category: cmake / build (Cmake scripts), category: samples (GenAI samples) — Dec 19, 2024
@olpipi force-pushed the samples_movement branch 2 times, most recently from 9e7f861 to 3901fbb, December 19, 2024 11:22
Resolved review threads on samples/cpp/text_generation/CMakeLists.txt and samples/cpp/text_generation/README.md (most marked outdated).
@ilya-lavrenov changed the title from "Samples movement" to "[Samples] merge LLM samples to 'text_generation' folder", Dec 20, 2024
Further resolved review threads on samples/cpp/text_generation/CMakeLists.txt, samples/CMakeLists.txt, and samples/cpp/text_generation/README.md (all marked outdated).
github-merge-queue bot pushed two commits to openvinotoolkit/openvino that referenced this pull request, Jan 3, 2025
@olpipi (Collaborator, Author) commented Jan 3, 2025

If there are no more major comments, I will make similar changes to the Python samples.

@Wovchena (Collaborator) left a comment

You have a merge conflict.

@olpipi enabled auto-merge January 9, 2025 12:26
@olpipi disabled auto-merge January 9, 2025 12:27
@olpipi (Collaborator, Author) commented Jan 9, 2025

@ilya-lavrenov, please re-review. The PR cannot be merged without a +1 from you.

Resolved review threads on src/README.md and samples/cpp/text_generation/README.md (the latter outdated).
@ilya-lavrenov added this to the 2025.0 milestone Jan 10, 2025
@olpipi added this pull request to the merge queue Jan 13, 2025
This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen`, and other models with the same signature. The application doesn't have many configuration options, to encourage the reader to explore and modify the source code; for example, change the device for inference to GPU. The sample features `openvino_genai.LLMPipeline` and configures it to run the simplest deterministic greedy sampling algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
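For orientation, a minimal Python sketch of the greedy `LLMPipeline` flow described above (the model directory and prompt are illustrative, not taken from the samples):

```python
import openvino_genai

# Load an exported OpenVINO model; "CPU" can be swapped for "GPU".
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100  # greedy decoding is the default strategy

print(pipe.generate("Why is the sky blue?", config))
```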
These samples showcase the use of OpenVINO's inference capabilities for text generation tasks, including different decoding strategies such as beam search, multinomial sampling, and speculative decoding. Each sample has a specific focus and demonstrates a unique aspect of text generation.
The applications don't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU.
There are also Jupyter notebooks for some samples. You can find links to them in the appropriate sample descritions.
A contributor left a comment

Typo: descritions -> descriptions
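To make the decoding strategies named in the sample description above concrete, a hedged Python sketch of switching strategies via `openvino_genai.GenerationConfig` (field values are illustrative):

```python
import openvino_genai

config = openvino_genai.GenerationConfig()

# Multinomial sampling: draw from the token distribution instead of argmax.
config.do_sample = True
config.top_p = 0.9
config.temperature = 0.8

# Beam search instead: disable sampling and track several hypotheses.
# config.do_sample = False
# config.num_beams = 4
```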


## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

Install [../../export-requirements.txt](../../export-requirements.txt) to convert a model.
A contributor left a comment

I propose putting a clear message in the readme that export-requirements.txt is needed for conversion/optimization and deployment-requirements.txt is needed to run the sample, instead of installing both and mentioning that export-requirements.txt isn't needed if the model is already exported.
It will be easier for developers to understand which dependencies are necessary for which stage (model preparation vs. model deployment).
"Install ../../export-requirements.txt to convert a model" looks more appropriate in this case.


```sh
pip install --upgrade-strategy eager -r ../../export-requirements.txt
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```
A contributor left a comment

What about also adding two other options to prepare the model here:

1. Download the converted model from HF:
   `huggingface-cli download "OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov" --local-dir TinyLlama-1.1B-Chat-v1.0-int8-ov`
2. Convert the model and compress weights to int4 precision:
   `optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 TinyLlama-1.1B-Chat-v1.0-int4`

It will help developers see that there are multiple options to prepare the model.
We can also emphasize that "Download the converted model from HF" may be the preferred option for the sample (no need to spend time on conversion).
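For illustration, the `huggingface-cli` tool above is provided by the `huggingface_hub` package, and the same download can be scripted in Python. A minimal sketch, assuming `huggingface_hub` is installed:

```python
from huggingface_hub import snapshot_download

# Equivalent to the huggingface-cli command above: fetch the pre-converted
# int8 model into a local directory, skipping local conversion entirely.
snapshot_download(
    repo_id="OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov",
    local_dir="TinyLlama-1.1B-Chat-v1.0-int8-ov",
)
```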

@olpipi (Collaborator, Author) replied

Is huggingface-cli installed by export-requirements.txt as a dependency?

A contributor replied

No, it isn't. Please add huggingface_hub to export-requirements.txt.


## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.
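As a hedged illustration of "modify the source code to change the device": in the Python flavor of a sample, the device is the second `LLMPipeline` argument (model directory illustrative):

```python
import openvino_genai

# "GPU" selects the default GPU plugin; an index such as "GPU.1" can
# target a specific (e.g. discrete) card when several are present.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "GPU")
```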
A contributor left a comment

We can recommend using a GPU without singling out discrete GPUs, because iGPUs work well with LLMs; the recommendation can be based on performance rather than memory.
Which dGPU with 32GB+ RAM is meant here, exactly? Intel Arc has up to 16 GB of memory.

```sh
./beam_search_causal_lm <MODEL_DIR> "<PROMPT 1>" ["<PROMPT 2>" ...]
```

### 3. Chat Sample (`chat_sample`)
A contributor left a comment

Please consider putting the chat sample first in the list, as it is the most popular sample.
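For context on what `chat_sample` does, a minimal Python sketch of its chat loop (simplified; the model directory is illustrative):

```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")

pipe.start_chat()  # keep conversation history (KV cache) across turns
try:
    while True:
        prompt = input("question:\n")
        print(pipe.generate(prompt, max_new_tokens=256))
finally:
    pipe.finish_chat()  # reset the accumulated history
```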


## Sample Descriptions
### Common information
Follow [Get Started with Samples](https://docs.openvino.ai/2024/learn-openvino/openvino-samples/get-started-demos.html) to get common information about OpenVINO samples.
A contributor left a comment

Clear instructions on how to build the samples should be provided. For example, https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/BUILD.md could be extended with a sample-build section, and those instructions linked from this readme.

Merged via the queue into openvinotoolkit:master with commit 8b62451 Jan 13, 2025
59 checks passed
github-merge-queue bot pushed a commit to openvinotoolkit/openvino that referenced this pull request Jan 13, 2025
### Details:
- Update links to genai samples
- related to openvinotoolkit/openvino.genai#1411

### Tickets:
- *ticket-id*
github-merge-queue bot pushed a commit to openvinotoolkit/openvino that referenced this pull request Jan 13, 2025
### Details:
- Update links to genai samples to 2024.6 branch
- related to openvinotoolkit/openvino.genai#1411

### Tickets:
- *ticket-id*
github-merge-queue bot pushed a commit to openvinotoolkit/openvino that referenced this pull request Jan 14, 2025
@olpipi mentioned this pull request Jan 14, 2025
@olpipi (Collaborator, Author) commented Jan 14, 2025

@DimaPastushenkov I addressed your comments in #1545. Please review.

Labels: category: cmake / build (Cmake scripts), category: GHA (CI based on Github actions), category: samples (GenAI samples), no-match-files
4 participants