As I'm working with AOTInductor and TorchScript for exporting models, I'm realizing that model publishers will sometimes want to reference runtime details for multiple model artifacts without duplicating all of the model extension info.

AOTInductor (.pt2) exports a model with hardware-specific optimizations, so the artifact is tied to a particular accelerator (CPU, GPU, TPU, etc.).

TorchScript tracing (.pt) is hardware-agnostic: the loaded model and model inputs just need to be moved to the target hardware before inference. Because the optimizations are not hardware-specific, accelerator utilization is lower than with models compiled by AOTInductor.
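For concreteness, here is a minimal sketch of the two export paths. The model, tensor shapes, and file names are placeholders, and the AOTInductor packaging API has changed across PyTorch releases (the form below follows torch >= 2.5), so treat this as illustrative rather than canonical:

```python
import torch

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.T)

model = TinyModel().eval()
example = torch.randn(4, 4)

# TorchScript tracing: a hardware-agnostic .pt artifact; the consumer
# moves the loaded module and its inputs to the target device at runtime.
traced = torch.jit.trace(model, example)
traced.save("model.pt")

# AOTInductor: a .pt2 package compiled with hardware-specific
# optimizations, tied to the device the model was on at export time.
# NOTE: API location/signature varies by PyTorch version (assumption:
# torch >= 2.5).
exported = torch.export.export(model, (example,))
torch._inductor.aoti_compile_and_package(exported, package_path="model.pt2")
```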
Model publishers might want to provide any combination of a hardware-agnostic model artifact, multiple hardware-optimized models, and the original weights.
I think we should probably accept an array of Runtime Objects instead of a single Runtime Object.
Various model artifacts should be provided as distinct Assets with the mlm:model role.
Each Asset can also provide mlm:artifact_type to be more explicit about the specific artifact content.
Other fields, such as mlm:framework, can also be applied to individual Assets, allowing multiple equivalent definitions of the model by various implementations.
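A rough sketch of what that could look like in an Item's assets, written as a JSON-equivalent Python dict. The asset keys, hrefs, media types, and mlm:artifact_type values are illustrative assumptions, not taken from the spec; only the mlm:model role and the mlm:artifact_type / mlm:framework field names come from the comment above:

```python
# Hypothetical "assets" fragment of a STAC Item: one hardware-agnostic
# TorchScript artifact and one AOTInductor-compiled artifact, each a
# distinct Asset with the mlm:model role.
assets = {
    "model-torchscript": {
        "href": "https://example.com/model.pt",        # placeholder URL
        "type": "application/octet-stream",
        "roles": ["mlm:model"],
        "mlm:artifact_type": "torch.jit.trace",        # illustrative value
        "mlm:framework": "pytorch",
    },
    "model-aotinductor-gpu": {
        "href": "https://example.com/model.pt2",       # placeholder URL
        "type": "application/octet-stream",
        "roles": ["mlm:model"],
        "mlm:artifact_type": "torch.export",           # illustrative value
        "mlm:framework": "pytorch",
    },
}
```

This keeps the extension-level metadata defined once on the Item while letting each artifact carry only the fields that differ, which addresses the duplication concern without introducing an array of Runtime Objects.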