cpu: x64: enable matmul-based IP for forward inference #2341

Open · wants to merge 2 commits into main from dsamoylo/main/matmul-ip-fwd-i
Conversation

densamoilov (Contributor) commented Jan 6, 2025

This PR introduces a matmul-based inner product implementation for inference that will be used instead of the current brgemm-based one. The latter will be disabled for inference.

This PR is targeting oneDNN v3.8 and will be merged after the oneDNN v3.7 code freeze date.

The implementation allows using blocked weights layouts, either directly or via the special tag `any`.
The inner product weights must meet ONE of the following requirements for the blocked layouts to be used:

  • Weights have no spatial dimensions.
  • Weights have unit spatial dimensions.
  • Weights have non-unit spatial dimensions, but the number of input channels is a multiple of the K block.

If none of the above requirements is met, a plain layout is used instead (see the sketch below).
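For context, here is a minimal user-side sketch (not part of this PR; the shapes are made up and the standard oneDNN C++ API is assumed) showing how the special tag `any` lets the library pick a blocked weights layout when the restrictions above are met, or a plain one otherwise:

```cpp
#include <iostream>
#include "dnnl.hpp"

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);

    // Illustrative shapes only: non-unit (7x7) weights spatial, f32.
    const memory::dim N = 32, IC = 512, IH = 7, IW = 7, OC = 1024;

    // format_tag::any leaves the layout decision to the implementation.
    memory::desc src_md({N, IC, IH, IW}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc wei_md({OC, IC, IH, IW}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc dst_md({N, OC}, memory::data_type::f32,
            memory::format_tag::any);

    // Forward inference: with this PR, dispatch should reach the
    // matmul-based inner product rather than the brgemm-based one.
    auto pd = inner_product_forward::primitive_desc(
            eng, prop_kind::forward_inference, src_md, wei_md, dst_md);

    std::cout << "chosen impl: " << pd.impl_info_str() << "\n";
    return 0;
}
```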

The new matmul-based implementation is able to leverage blocked weights layouts in all cases covered by the performance testing and therefore provides the best possible performance.
Below is a comparison of the matmul-based vs. brgemm-based inner product; > 100% means matmul-based is better.

[Two performance-comparison charts (images in the original PR).]

densamoilov requested review from a team as code owners, January 6, 2025 10:44
github-actions bot added the platform:cpu-x64 label (Intel64/AMD64 processors; codeowner: @oneapi-src/onednn-cpu-x64), Jan 6, 2025
densamoilov (Contributor, Author) commented:

make test
enable device_cpu
disable device_gpu

```diff
-    VDISPATCH_INNER_PRODUCT(is_fwd(), VERBOSE_BAD_PROPKIND);
+    VDISPATCH_INNER_PRODUCT(
+            get_prop_kind() == prop_kind::forward_training,
+            VERBOSE_BAD_PROPKIND);
```
A reviewer (Contributor) commented:
Could you clarify why jit_brgemm_ip gets restricted to forward training?

IIUC, jit_brgemm_ip comes after the new matmul_ip in the dispatch list, so there is a chance that a forward inference case not handled by matmul_ip (per the documented restrictions) will also be skipped by jit_brgemm_ip and will fall through to a lower-performance implementation.
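To make the concern concrete, here is a schematic of first-fit dispatch (names and predicates are simplified stand-ins, not oneDNN's actual implementation list):

```cpp
// Schematic only: illustrative names, not oneDNN's real dispatch code.
// The first entry whose predicate accepts the problem wins, so if
// matmul_ip rejects a forward-inference case and brgemm_ip is now
// training-only, dispatch falls through to a slower implementation.
struct impl_entry_t {
    const char *name;
    bool (*accepts)(bool is_inference);
};

static bool matmul_ip_accepts(bool) { return false; } // e.g. shape rejected
static bool brgemm_ip_accepts(bool is_inference) { return !is_inference; }
static bool gemm_ip_accepts(bool) { return true; } // lower-perf fallback

static const impl_entry_t impl_list[] = {
        {"matmul_ip", matmul_ip_accepts},
        {"brgemm_ip", brgemm_ip_accepts},
        {"gemm_ip", gemm_ip_accepts},
};

const char *dispatch(bool is_inference) {
    for (const auto &e : impl_list)
        if (e.accepts(is_inference)) return e.name;
    return "none";
}
```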

densamoilov (Contributor, Author) replied:

> so there is a chance that if a forward inference case is not handled by matmul_ip (per the documented restrictions)

If a blocked layout cannot be used for the weights, the implementation falls back to a plain layout, so every case that brgemm-ip can handle can also be handled by the matmul-based ip.

The goal is to remove brgemm-ip completely, so we can't use it as a complement.
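A hedged way to observe that fallback from the user side, continuing the earlier sketch (`pd` and the dims are as defined there; `oihw` is assumed to be the plain tag for the 4D weights):

```cpp
// Hypothetical check, not from the PR: if no blocked layout applies,
// the queried weights desc should equal an explicit plain (oihw) desc
// with the same dimensions and data type.
dnnl::memory::desc plain_wei_md({OC, IC, IH, IW},
        dnnl::memory::data_type::f32, dnnl::memory::format_tag::oihw);
const bool fell_back_to_plain = (pd.weights_desc() == plain_wei_md);
```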

densamoilov force-pushed the dsamoylo/main/matmul-ip-fwd-i branch from e908f44 to ee4229f, January 6, 2025 21:58
Five review threads on src/cpu/x64/matmul_inner_product.cpp (resolved; several marked outdated).
densamoilov force-pushed the dsamoylo/main/matmul-ip-fwd-i branch from ee4229f to dbc17df, January 10, 2025 00:05
densamoilov (Contributor, Author) commented:

make test
enable device_cpu
disable device_gpu

densamoilov force-pushed the dsamoylo/main/matmul-ip-fwd-i branch from dbc17df to 84d5a94, January 10, 2025 01:10
densamoilov force-pushed the dsamoylo/main/matmul-ip-fwd-i branch from 84d5a94 to fbe9c28, January 10, 2025 01:17
Labels: platform:cpu-x64 (Intel64/AMD64 processors; codeowner: @oneapi-src/onednn-cpu-x64)