There's a useless DQ node in matmul_model_quant_io.onnx.

I also have some questions:
1. The model has 2 inputs and 1 output, all with large data sizes, which means a huge IO cost for the NPU. You could try something different, e.g. make the 2nd input an initializer, or change the inputs to [1, 6, 256, 1500] * [1, 6, 1500, 256] so the output is [1, 6, 256, 256] (see the first sketch after this list).
2. In your benchmark script, the measured time includes the 1st inference run. Normally we would skip the 1st inference run as warmup (see the second sketch after this list).
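If it helps, here is a minimal sketch of turning the 2nd input into an initializer with the onnx Python API. The output file name, the input index, the shape, and the random weight values are illustrative assumptions, not taken from the actual model (which is quantized, so its real weight tensor may be int8 feeding the DQ node):

```python
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("matmul_model_quant_io.onnx")

# Assume the 2nd graph input is the weight-like tensor we want to freeze.
weight_input = model.graph.input[1]
weight_name = weight_input.name

# Bake a constant tensor into the graph as an initializer so the runtime
# no longer has to transfer it over the NPU's IO path on every run.
# Shape and values are placeholders; for a benchmark the values don't matter.
weights = np.random.rand(1, 6, 1500, 256).astype(np.float32)
model.graph.initializer.append(numpy_helper.from_array(weights, name=weight_name))

# Remove the now-redundant graph input; the initializer supplies it instead.
del model.graph.input[1]

onnx.save(model, "matmul_model_init.onnx")
```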
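And a minimal sketch of the warmup pattern with onnxruntime; the input names, shapes, and float32 dtype are assumptions for illustration (the quantized model's real inputs may differ):

```python
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("matmul_model_quant_io.onnx")

# Placeholder input names and shapes; match them to the actual model.
feeds = {
    "A": np.random.rand(1, 6, 256, 1500).astype(np.float32),
    "B": np.random.rand(1, 6, 1500, 256).astype(np.float32),
}

# Warmup run: the first inference pays one-time costs (memory allocation,
# graph optimization, NPU compilation), so run it once outside the timer.
session.run(None, feeds)

# Timed runs then measure only steady-state latency.
n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, feeds)
elapsed = time.perf_counter() - start
print(f"average latency: {elapsed / n_runs * 1e3:.2f} ms")
```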
This benchmark is designed to resemble some real-world models we depend on.
Regarding #2, Whisper (and most other models) doesn't run the same matrix multiplication over and over again. Instead, it runs a bunch of different (large) multiplications in a row. This tends to push weights out of cache, and as such I'd argue that cold-cache performance for a single layer's operations is, if anything, more important than warm-cache performance.
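To make the cache argument concrete, here's a rough NumPy-only sketch contrasting reuse of one weight matrix (warm cache) with rotating through many (cold cache); the matrix sizes, pool size, and run counts are made up for illustration:

```python
import time
import numpy as np

# One activation matrix, and a pool of weight matrices large enough in
# total (~144 MB of float32) to evict each other from the cache hierarchy.
x = np.random.rand(256, 1500).astype(np.float32)
weights = [np.random.rand(1500, 1500).astype(np.float32) for _ in range(16)]

def bench(ws, runs=64):
    start = time.perf_counter()
    for i in range(runs):
        _ = x @ ws[i % len(ws)]  # rotate through the given weight pool
    return (time.perf_counter() - start) / runs

warm = bench(weights[:1])  # same weights every run: stays cached
cold = bench(weights)      # different weights each run: cache misses
print(f"warm: {warm * 1e3:.2f} ms, cold: {cold * 1e3:.2f} ms")
```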
Do your real-world models have the same IO size? It doesn't make sense to extract just part of the model and test it separately; it makes more sense to test the full model instead.