
refactor -> select_model(functional) #468

Open · michaelfeil wants to merge 2 commits into main

Conversation

michaelfeil (Owner)

Checklist

  • I have read the CONTRIBUTING guidelines.
  • I have added tests to cover my changes.
  • I have updated the documentation (docs folder) accordingly.


@greptile-apps bot (Contributor) left a comment:

PR Summary

This PR refactors the model selection and batch handling system to improve multiprocessing capabilities and support for new models like nomic-embed-text-v1.5.

  • Replaced direct model instantiation with factory functions in /libs/infinity_emb/infinity_emb/inference/batch_handler.py for better multiprocessing support
  • Added CallableReturningBaseTypeHint Protocol in /libs/infinity_emb/infinity_emb/transformer/abstract.py to improve type safety
  • Simplified select_model() in /libs/infinity_emb/infinity_emb/inference/select_model.py to return callable engine functions instead of tuples with timing info (see the sketch after this list)
  • Added tiktoken as required dependency in /libs/infinity_emb/pyproject.toml for nomic-embed-text-v1.5 support
  • Modified telemetry in /libs/infinity_emb/infinity_emb/infinity_server.py to use empty dicts instead of engine capabilities
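
A minimal sketch of the factory pattern these bullets describe (all names below are illustrative stand-ins, not the PR's actual identifiers): select_model hands back a cheap, picklable factory instead of a loaded model, so each worker process can instantiate its own replica after spawn/fork.

import functools
from dataclasses import dataclass

@dataclass
class DummyEngine:
    model_name: str
    device: str

def load_engine(model_name: str, device: str) -> DummyEngine:
    # The heavy work (loading weights, initializing CUDA) happens here,
    # once per worker process rather than once in the parent.
    return DummyEngine(model_name=model_name, device=device)

# functools.partial of a module-level function is picklable, so the
# "unloaded engine" can cross a process boundary; each worker calls it
# to create its own replica.
engine_factory = functools.partial(
    load_engine, "nomic-ai/nomic-embed-text-v1.5", device="cuda:0"
)
engine = engine_factory()  # deferred, per-process instantiation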

9 file(s) reviewed, 8 comment(s)


# TODO: Can be parallelized
for device_map in engine_args._loading_strategy.device_mapping: # type: ignore

style: type: ignore on device_mapping access should be replaced with proper type annotation
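
One way the suggested fix could look (a sketch, assuming the ignore exists because _loading_strategy is Optional; the infinity_emb.args import path is an assumption, not shown in this diff): narrow the attribute once so every later access type-checks without a blanket "type: ignore".

from typing import TYPE_CHECKING, Any, Iterable

if TYPE_CHECKING:
    from infinity_emb.args import EngineArgs

def _device_maps(engine_args: "EngineArgs") -> Iterable[Any]:
    # Narrow the Optional attribute in one place instead of silencing
    # the type checker at each access site.
    strategy = engine_args._loading_strategy
    assert strategy is not None, "loading strategy must be resolved before loading"
    return strategy.device_mapping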

assert len(engine_replicas) > 0, "No engine replicas were loaded"

-    return engine_replicas, min_inference_t, max_inference_t
+    return engine_replicas  # type: ignore

style: type: ignore on return is unnecessary since return type matches annotation

@@ -57,55 +61,51 @@ def get_engine_type_from_config(
return EmbedderEngine.from_inference_engine(engine_args.engine)


def _get_engine_replica(unloaded_engine, engine_args, device_map) -> "BaseTypeHint":

style: function lacks type hints for unloaded_engine and device_map parameters
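
A possible shape for those hints (a sketch: BaseTypeHint is taken from the return annotation above; unloaded_engine is assumed to be the engine class picked by get_engine_type_from_config, and the concrete type of device_map is not visible in this diff, so Any stands in):

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from infinity_emb.args import EngineArgs
    from infinity_emb.transformer.abstract import BaseTypeHint

def _get_engine_replica(
    unloaded_engine: "type[BaseTypeHint]",  # assumed: class from get_engine_type_from_config
    engine_args: "EngineArgs",
    device_map: Any,  # TODO: replace Any with the real device-map type
) -> "BaseTypeHint":
    ...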

)
telemetry_log_info()
app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
th = threading.Thread(
    target=send_telemetry_start,
-    args=(engine_args_list, [e.capabilities for e in app.engine_array]),  # type: ignore
+    args=(engine_args_list, [{} for e in app.engine_array]),  # type: ignore

logic: Passing empty dictionaries instead of actual engine capabilities will result in loss of telemetry data about model capabilities
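
If capabilities simply aren't available before the engines load, one way to avoid losing the data entirely (a sketch reusing the names from the diff above, not code from the PR):

# Fall back to an empty dict only for engines that expose no capabilities
# yet, instead of sending {} for every engine unconditionally.
capabilities = [getattr(e, "capabilities", {}) for e in app.engine_array]
th = threading.Thread(
    target=send_telemetry_start,
    args=(engine_args_list, capabilities),
)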

@@ -56,6 +56,7 @@ diskcache = {version = "*", optional=true}
onnxruntime-gpu = {version = "1.19.*", optional=true}
tensorrt = {version = "^10.6.0", optional=true}
soundfile = {version="^0.12.1", optional=true}
tiktoken = "^0.8.0"

logic: tiktoken should be marked as optional since it's only needed for specific models. Add optional=true to the dependency.
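
If tiktoken does become optional, its import site would need a guard; a minimal sketch (the error message is illustrative, and any real optional-import helper in infinity_emb may differ):

try:
    import tiktoken  # only needed for models such as nomic-embed-text-v1.5
except ImportError as e:
    raise ImportError(
        "tiktoken is required for this model; "
        "install it with `pip install tiktoken`."
    ) from e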

@@ -16,3 +16,4 @@ def test_engine(engine):
model_warmup=False,
)
)
+    [model_func() for model_func in model_funcs]

style: Consider catching potential exceptions when calling model functions - initialization could fail for various reasons
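
A sketch of the suggested guard (pytest and the names from the diff above are reused; this is not code from the PR): report which factory failed instead of aborting on the first unexplained exception.

import pytest

for model_func in model_funcs:
    try:
        model_func()
    except Exception as exc:  # broad on purpose: this is a load smoke test
        pytest.fail(f"model factory {model_func!r} failed to initialize: {exc}")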

@codecov-commenter commented Nov 16, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 89.18919% with 8 lines in your changes missing coverage. Please review.

Project coverage is 79.56%. Comparing base (8ac0b3c) to head (80d65be).
Report is 33 commits behind head on main.

Files with missing lines                                 Patch %   Missing lines
...finity_emb/infinity_emb/inference/batch_handler.py    90.90%    3 ⚠️
...nfinity_emb/infinity_emb/inference/select_model.py    90.00%    2 ⚠️
libs/infinity_emb/infinity_emb/engine.py                 66.66%    1 ⚠️
.../infinity_emb/infinity_emb/transformer/abstract.py    75.00%    1 ⚠️
...ty_emb/infinity_emb/transformer/embedder/neuron.py    50.00%    1 ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #468      +/-   ##
==========================================
+ Coverage   79.51%   79.56%   +0.05%     
==========================================
  Files          41       41              
  Lines        3417     3441      +24     
==========================================
+ Hits         2717     2738      +21     
- Misses        700      703       +3     

☔ View full report in Codecov by Sentry.
