refactor -> select_model(functional) #468
base: main
Conversation
PR Summary
This PR refactors the model selection and batch handling system to improve multiprocessing capabilities and support for new models like nomic-embed-text-v1.5.
- Replaced direct model instantiation with factory functions in /libs/infinity_emb/infinity_emb/inference/batch_handler.py for better multiprocessing support
- Added a CallableReturningBaseTypeHint Protocol in /libs/infinity_emb/infinity_emb/transformer/abstract.py to improve type safety
- Simplified select_model() in /libs/infinity_emb/infinity_emb/inference/select_model.py to return callable engine functions instead of tuples with timing info
- Added tiktoken as a required dependency in /libs/infinity_emb/pyproject.toml for nomic-embed-text-v1.5 support
- Modified telemetry in /libs/infinity_emb/infinity_emb/infinity_server.py to use empty dicts instead of engine capabilities
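The factory-function change can be illustrated with a minimal sketch (DummyModel and make_model_factory are hypothetical names, not the repo's code): instead of pickling an already-loaded model into worker processes, each worker receives a small picklable callable and builds the model itself.

```python
import functools
import pickle


class DummyModel:
    """Hypothetical stand-in for a heavyweight engine."""

    def __init__(self, name: str, device: str):
        self.name, self.device = name, device

    def encode(self, text: str) -> int:
        return len(text)


def make_model_factory(name: str, device: str):
    # functools.partial over a top-level class is picklable, so it can be
    # shipped to a spawned worker process; the heavy model is only built
    # once the worker calls the factory.
    return functools.partial(DummyModel, name, device)


factory = make_model_factory("nomic-embed-text-v1.5", "cpu")
payload = pickle.dumps(factory)  # tiny payload: no model weights inside
model = pickle.loads(payload)()  # worker side: build the model locally
```

The same idea extends to a process pool: the factories are mapped across workers, and each worker materializes its own replica.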
9 file(s) reviewed, 8 comment(s)
Review by Greptile.
# TODO: Can be parallelized
for device_map in engine_args._loading_strategy.device_mapping:  # type: ignore
style: the type: ignore on the device_mapping access should be replaced with a proper type annotation
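A sketch of the suggested fix (LoadingStrategy and the _loading_strategy field are assumptions about the repo's types, not taken from it): narrow the Optional once behind a typed accessor, so call sites no longer need type: ignore.

```python
from dataclasses import dataclass, field


@dataclass
class LoadingStrategy:  # hypothetical stand-in for the real strategy type
    device_mapping: list[str] = field(default_factory=lambda: ["cpu"])


@dataclass
class EngineArgs:  # hypothetical minimal stand-in
    _loading_strategy: "LoadingStrategy | None" = None

    @property
    def loading_strategy(self) -> LoadingStrategy:
        # Narrow the Optional in exactly one place; type checkers then
        # see a non-Optional LoadingStrategy at every call site.
        if self._loading_strategy is None:
            raise ValueError("loading strategy was never resolved")
        return self._loading_strategy


args = EngineArgs(_loading_strategy=LoadingStrategy())
devices = list(args.loading_strategy.device_mapping)  # no type: ignore needed
```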
assert len(engine_replicas) > 0, "No engine replicas were loaded"

- return engine_replicas, min_inference_t, max_inference_t
+ return engine_replicas  # type: ignore
style: type: ignore on return is unnecessary since return type matches annotation
@@ -57,55 +61,51 @@ def get_engine_type_from_config(
    return EmbedderEngine.from_inference_engine(engine_args.engine)


def _get_engine_replica(unloaded_engine, engine_args, device_map) -> "BaseTypeHint":
style: function lacks type hints for unloaded_engine and device_map parameters
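A possible fully annotated shape for the function (a sketch under stated assumptions: the stand-in classes below are hypothetical, and the real parameter types in the repo may differ), also showing how the CallableReturningBaseTypeHint Protocol from this PR could be declared:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


class BaseTypeHint:
    """Stand-in for the engine base class (assumption)."""


@dataclass
class EngineArgs:  # hypothetical minimal stand-in
    model_name_or_path: str = "dummy"


@runtime_checkable
class CallableReturningBaseTypeHint(Protocol):
    """Zero-argument factory returning a loaded engine replica."""

    def __call__(self) -> BaseTypeHint: ...


def _get_engine_replica(
    unloaded_engine: type[BaseTypeHint],
    engine_args: EngineArgs,
    device_map: str,
) -> BaseTypeHint:
    # The real implementation would load weights onto the given device;
    # here we only instantiate the stand-in class.
    return unloaded_engine()


replica = _get_engine_replica(BaseTypeHint, EngineArgs(), "cpu")
```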
)
telemetry_log_info()
app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
th = threading.Thread(
    target=send_telemetry_start,
-   args=(engine_args_list, [e.capabilities for e in app.engine_array]),  # type: ignore
+   args=(engine_args_list, [{} for e in app.engine_array]),  # type: ignore
logic: Passing empty dictionaries instead of actual engine capabilities will result in loss of telemetry data about model capabilities
@@ -56,6 +56,7 @@ diskcache = {version = "*", optional=true}
onnxruntime-gpu = {version = "1.19.*", optional=true}
tensorrt = {version = "^10.6.0", optional=true}
soundfile = {version="^0.12.1", optional=true}
+ tiktoken = "^0.8.0"
logic: tiktoken should be marked as optional since it's only needed for specific models. Add optional=true to the dependency.
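Following the reviewer's suggestion, the entry could mirror the neighboring optional dependencies; a sketch (how this pyproject.toml wires optional dependencies into extras groups is an assumption, not taken from the file):

```toml
tiktoken = {version = "^0.8.0", optional=true}
```

With Poetry, an optional dependency must also be listed in an extras group before users can pull it in via pip's extras syntax.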
@@ -16,3 +16,4 @@ def test_engine(engine):
        model_warmup=False,
    )
)
+ [model_func() for model_func in model_funcs]
style: Consider catching potential exceptions when calling model functions - initialization could fail for various reasons
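One way to follow this suggestion (load_all is a hypothetical helper, not part of the PR): collect failures instead of letting the bare list comprehension raise anonymously on the first one.

```python
def load_all(model_funcs):
    """Call every model factory, reporting which ones failed and why."""
    models, errors = [], []
    for i, model_func in enumerate(model_funcs):
        try:
            models.append(model_func())
        except Exception as exc:  # model loading can fail for many reasons
            errors.append((i, exc))
    if errors:
        raise RuntimeError(f"{len(errors)} model(s) failed to load: {errors}")
    return models
```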
Codecov Report
Attention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #468      +/-   ##
==========================================
+ Coverage   79.51%   79.56%   +0.05%
==========================================
  Files          41       41
  Lines        3417     3441      +24
==========================================
+ Hits         2717     2738      +21
- Misses        700      703       +3

☔ View full report in Codecov by Sentry.
Related Issue
Checklist
Additional Notes