Add support for non-power of 2 head size with quantization #2352

Open: umar456 wants to merge 3 commits into main from uarshad/sdpa_non_pow2_quan_support

Conversation

umar456
Contributor

@umar456 umar456 commented Jan 7, 2025

Description

This pull request adds support for quantization when the head size is not a power of two. The vs ugemm previously had this limitation, and lifting it is required by the phi3-mini-4k-instruct model.

The failing shape was Q [1x32xSEQ_LENx96], KV [1x32xSEQ_LENx96]; it now passes with this change.
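
For readers unfamiliar with the constraint, the sketch below only illustrates why a head size of 96 trips a power-of-two assumption. The helper names `is_pow2` and `next_pow2` are hypothetical and are not taken from the oneDNN sources; the description does not say how the PR handles the non-power-of-two case, so this should not be read as the actual implementation.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; not part of oneDNN.

// True when n has exactly one bit set, i.e. n is a power of two.
constexpr bool is_pow2(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two that is greater than or equal to n (n >= 1).
constexpr std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// The head size from the failing shape, Q [1x32xSEQ_LENx96].
constexpr std::size_t head_size = 96;

static_assert(!is_pow2(head_size), "96 is not a power of two");
static_assert(next_pow2(head_size) == 128, "96 rounds up to 128");
```

A kernel that assumes a power-of-two head size would have to treat 96 as 128 or reject it, which is the kind of limitation this PR removes for the quantized path.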

Additionally, this PR fixes a build failure that occurred when the vs work group was configured with a 16x16 work group tile.

@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Jan 7, 2025
@umar456 umar456 marked this pull request as ready for review January 7, 2025 22:50
@umar456 umar456 requested a review from a team as a code owner January 7, 2025 22:50
@umar456 umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from a278031 to 9db21cb Compare January 8, 2025 03:19
@umar456 umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from 9db21cb to bdf9a78 Compare January 8, 2025 16:04
@umar456
Contributor Author

umar456 commented Jan 8, 2025

make test
disable device_cpu
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_graph

@umar456 umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from bdf9a78 to 563bb62 Compare January 9, 2025 07:02
@umar456 umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from 563bb62 to 50c6cc1 Compare January 9, 2025 21:02
Labels
platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel
3 participants