cpu: x64: enable matmul-based IP for forward inference #2341
Conversation
```diff
-    VDISPATCH_INNER_PRODUCT(is_fwd(), VERBOSE_BAD_PROPKIND);
+    VDISPATCH_INNER_PRODUCT(
+            get_prop_kind() == prop_kind::forward_training,
+            VERBOSE_BAD_PROPKIND);
```
Could you clarify why jit_brgemm_ip gets restricted to forward training?
IIUC, jit_brgemm_ip comes after the new matmul_ip in the dispatch list, so there is a chance that a forward inference case not handled by matmul_ip (per the documented restrictions) will also be skipped by jit_brgemm_ip and will fall through to a lower-performance implementation.
> so there is a chance that if a forward inference case is not handled by matmul_ip (per the documented restrictions)
If a blocked layout cannot be used for the weights, it'll fall back to a plain layout, so all cases that can be handled by brgemm-ip can also be handled by the matmul-based ip.
The goal is to remove brgemm-ip completely so we can't use it as a complement.
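For what it's worth, the dispatching can be verified from the user side. Below is a minimal sketch using the public oneDNN C++ API to see which implementation picks up a forward-inference inner product; the shapes and data type are illustrative assumptions, not taken from this PR.

```cpp
#include <iostream>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);

    // Hypothetical shapes, chosen only for illustration.
    const memory::dim MB = 32, IC = 512, OC = 1024;

    // Plain src/dst; weights use format_tag::any so the library may pick
    // a blocked layout when the dispatched implementation supports it,
    // or fall back to a plain one otherwise.
    memory::desc src_md({MB, IC}, memory::data_type::f32, memory::format_tag::nc);
    memory::desc wei_md({OC, IC}, memory::data_type::f32, memory::format_tag::any);
    memory::desc dst_md({MB, OC}, memory::data_type::f32, memory::format_tag::nc);

    auto ip_pd = inner_product_forward::primitive_desc(
            eng, prop_kind::forward_inference, src_md, wei_md, dst_md);

    // Reports which implementation was dispatched for this case
    // (e.g. matmul-based vs. brgemm-based vs. a reference fallback).
    std::cout << "impl: " << ip_pd.impl_info_str() << "\n";
    return 0;
}
```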
This PR introduces a matmul-based inner product implementation for inference that will be used instead of the current brgemm-based one. The latter will be disabled for inference.
This PR is targeting oneDNN v3.8 and will be merged after the oneDNN v3.7 code freeze date.
The implementation allows using blocked weights layouts directly or via the special tag `any`. The Inner Product weights must meet ONE of the following requirements to enable using the blocked layouts:
If none of the above requirements are met then a plain layout will be used.
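To make the `any` behavior concrete, here is a minimal sketch of the usual query-and-reorder pattern on the user side; the helper name, the plain `oi` starting layout, and the surrounding setup are assumptions for illustration, not code from this PR.

```cpp
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

// Returns weights in the layout chosen by the inner product primitive:
// a blocked layout when the requirements above are met, a plain one otherwise.
memory prepare_ip_weights(const engine &eng, stream &strm,
        memory user_wei, // user weights in a plain "oi" layout
        const inner_product_forward::primitive_desc &ip_pd) {
    // weights_desc() reflects the layout the implementation picked
    // for format_tag::any.
    if (ip_pd.weights_desc() == user_wei.get_desc()) return user_wei;

    // Reorder the plain user weights into the chosen (possibly blocked) layout.
    memory opt_wei(ip_pd.weights_desc(), eng);
    reorder(user_wei, opt_wei).execute(strm, user_wei, opt_wei);
    strm.wait();
    return opt_wei;
}
```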
The new matmul-based implementation is able to leverage blocked weights layouts for all cases used in the performance testing and therefore provides the best possible performance.
Below is a comparison of the matmul-based vs brgemm-based inner product. A value above 100% means the matmul-based implementation is better.