[Feature] [Optimum] [Intel] [OpenVINO] Add OpenVINO backend support through Optimum-Intel #454
base: main
Conversation
Codecov Report
Attention: Patch coverage is …
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##              main     #454      +/-   ##
==========================================
- Coverage   79.18%   78.20%   -0.99%
==========================================
  Files          41       41
  Lines        3248     3308      +60
==========================================
+ Hits         2572     2587      +15
- Misses        676      721      +45
☔ View full report in Codecov by Sentry.
PR Summary
This PR adds OpenVINO backend support through Optimum-Intel integration, enabling optimized inference on Intel hardware with BF16 precision.
- Added OpenVINO execution provider in primitives.py and integrated model loading/optimization in utils_optimum.py with hardcoded BF16 precision
- Implemented OpenVINO model file handling and caching in utils_optimum.py with a get_openvino_files() function
- Added CHECK_OPTIMUM_INTEL dependency check in _optional_imports.py for graceful fallback
- Modified Docker.template.yaml and added Dockerfile.intel_auto to support OpenVINO builds
- Updated the OptimumEmbedder class to handle OpenVINO model loading while maintaining compatibility with the existing ONNX runtime (a hedged loading sketch follows this list)
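For orientation, here is a minimal, hedged sketch of the loading path this summary describes, written against optimum-intel's public API; the model id is only an example and the exact wiring inside utils_optimum.py may differ:

```python
# Hedged sketch, not this PR's code: load an embedding model through
# optimum-intel's OpenVINO backend with a BF16 inference precision hint.
from optimum.intel import OVModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "BAAI/bge-small-en-v1.5"  # example model id only

model = OVModelForFeatureExtraction.from_pretrained(
    model_id,
    export=True,  # convert to OpenVINO IR on the fly if no IR files are present
    ov_config={"INFERENCE_PRECISION_HINT": "bf16"},
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(["hello world"], return_tensors="pt")
outputs = model(**inputs)  # last_hidden_state contains the token embeddings
print(outputs.last_hidden_state.shape)
```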
7 file(s) reviewed, 13 comment(s)
@@ -16,6 +16,7 @@ cpu:
# "RUN poetry install --no-interaction --no-ansi --no-root --extras \"${EXTRAS}\" --without lint,test && poetry cache clear pypi --all"
COPY requirements_install_from_poetry.sh requirements_install_from_poetry.sh
RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu"
RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
style: Consider pinning the optimum[openvino] version to ensure reproducible builds
RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu"
RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Installing optimum[openvino] after requirements_install_from_poetry.sh may override dependency versions. Consider integrating this into the requirements script
RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu"
RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Installing optimum[openvino] after poetry install could override poetry-managed dependencies. Consider adding optimum[openvino] to pyproject.toml instead.
RUN ./requirements_install_from_poetry.sh --without lint,test "https://download.pytorch.org/whl/cpu"
RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Redundant installation of optimum[openvino] - this package was already installed in the previous stage
RUN ./requirements_install_from_poetry.sh --with lint,test "https://download.pytorch.org/whl/cpu"
RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
logic: Third redundant installation of optimum[openvino] - consider consolidating into a single installation in base image
except Exception as e:  # show error then let the optimum intel compress on the fly
    print(str(e))
logic: printing error to stdout could mask critical failures. Consider proper error logging or propagating the exception
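A minimal sketch of the suggested change, with the loaders passed in as placeholders since the surrounding helper names in this PR are not shown here: log the failure with its traceback (or re-raise) instead of printing to stdout.

```python
import logging

logger = logging.getLogger(__name__)

def load_with_fallback(model_id, load_precompressed, load_on_the_fly):
    """Try pre-compressed OpenVINO files first; on failure, log the full
    traceback (instead of printing to stdout) and fall back to on-the-fly
    compression."""
    try:
        return load_precompressed(model_id)
    except Exception:
        logger.exception(
            "Loading pre-compressed OpenVINO files for %s failed; "
            "falling back to on-the-fly compression.",
            model_id,
        )
        return load_on_the_fly(model_id)
```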
if provider == "OpenVINOExecutionProvider":
    CHECK_OPTIMUM_INTEL.mark_required()
    filename = ""
logic: empty filename could cause issues if get_openvino_files fails. Consider setting a default model path or handling this case explicitly
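A hedged sketch of one explicit way to handle it, assuming get_openvino_files() returns a (possibly empty) list of IR paths; the helper and parameter names here are illustrative, not the PR's exact ones:

```python
from pathlib import Path

def resolve_openvino_filename(openvino_files: list[Path]) -> str:
    """Pick the OpenVINO IR file to load, failing loudly rather than silently
    continuing with an empty filename."""
    if not openvino_files:
        raise FileNotFoundError(
            "No OpenVINO model files found in the repository; export the model "
            "first or fall back to another execution provider."
        )
    return openvino_files[-1].name
```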
    use_auth_token=True,
    prefer_quantized="cpu" in provider.lower(),
)
elif provider == "CPUExecutionProvider":
logic: missing else clause for unsupported providers could lead to undefined model state
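A minimal sketch of the suggested guard; the provider names match the ones visible in this diff, while the loader bodies are placeholders rather than this PR's code:

```python
def load_model_for_provider(provider: str):
    """Every branch either assigns a model or raises, so `model` can never be
    left undefined for an unexpected provider string."""
    if provider == "OpenVINOExecutionProvider":
        model = "load via optimum-intel"        # placeholder for the OpenVINO path
    elif provider == "CPUExecutionProvider":
        model = "load via onnxruntime on CPU"   # placeholder for the ONNX path
    else:
        # Fail fast instead of falling through with `model` unset.
        raise ValueError(f"Unsupported execution provider: {provider!r}")
    return model
```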
if files_optimized:
    file_optimized = files_optimized[-1]
if file_name:
    file_optimized = file_name
logic: file_name overrides files_optimized[-1] without checking if file_name exists, could cause errors if file_name is invalid
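A hedged sketch of the suggested validation, with hypothetical parameter names rather than the PR's exact ones:

```python
from pathlib import Path
from typing import Optional

def pick_optimized_file(files_optimized: list[Path], file_name: Optional[str]) -> Path:
    """Honor an explicit file_name only if it matches a file that is actually
    present in the listing; otherwise fall back to the last optimized file."""
    if file_name:
        matches = [p for p in files_optimized if p.name == file_name or str(p) == file_name]
        if not matches:
            raise FileNotFoundError(
                f"Requested file {file_name!r} not found among optimized files: {files_optimized}"
            )
        return matches[-1]
    if not files_optimized:
        raise FileNotFoundError("No optimized model files found in the repository.")
    return files_optimized[-1]
```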
openvino_files = [p for p in repo_files if p.match(pattern)]

if len(openvino_files) > 1:
    logger.info(f"Found {len(openvino_files)} onnx files: {openvino_files}")
syntax: log message incorrectly refers to 'onnx files' when listing OpenVINO files
Description
This PR integrates the OpenVINO backend into Infinity's OptimumEmbedder class through the optimum-intel library.
Related Issue
If applicable, link the issue this PR addresses.
Types of Change
Checklist
Additional Notes
There are multiple inference precisions that can be specified in libs/infinity_emb/infinity_emb/transformer/utils_optimum.py. The inference precision hint is hardcoded to bf16 because it offers the fastest inference speed. We have also performed an MTEB evaluation (bank classification dataset) on the INT4 weight-only quantized model with BF16 inference precision; the drop in accuracy is just 0.71%. Based on the speed/accuracy tradeoff as well as ease of use, we think that settling on a single effective configuration could enhance the user experience of infinity_emb. A hedged sketch of this configuration is shown below.
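As an illustration of that configuration, here is a minimal sketch (assuming a recent optimum-intel release; the model id and exact keyword arguments are examples, not this PR's code) of combining INT4 weight-only quantization with the BF16 inference precision hint:

```python
# Hedged sketch: INT4 weight-only quantization plus a BF16 inference hint via
# optimum-intel. OVWeightQuantizationConfig exists in recent optimum-intel
# releases, but parameter details may vary by version.
from optimum.intel import OVModelForFeatureExtraction, OVWeightQuantizationConfig

model = OVModelForFeatureExtraction.from_pretrained(
    "BAAI/bge-small-en-v1.5",                        # example model id only
    export=True,                                     # export to OpenVINO IR on the fly
    quantization_config=OVWeightQuantizationConfig(bits=4),
    ov_config={"INFERENCE_PRECISION_HINT": "bf16"},
)
```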
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.