Add support for non-power of 2 head size with quantization #2352

umar456 · 2025-01-07T22:50:17Z

Description

This pull request adds support for quantization when the head size is not a power of two. This was a limitation of the VS ugemm, and the phi3-mini-4k-instruct model requires it.

The failing shape was Q [1x32xSEQ_LENx96], KV [1x32xSEQ_LENx96]; it now passes with this change.
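For readers unfamiliar with the constraint, the head size in the failing shape is 96, which lies between the powers of two 64 and 128. The sketch below only illustrates that arithmetic. The helper names (`is_pow2`, `next_pow2`, `pow2_padding`) are hypothetical and are not taken from oneDNN, and the PR description does not say how the kernel actually handles the remainder, so the padding helper is shown purely to quantify the gap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; not part of oneDNN.

// True when n is a power of two (exactly one bit set).
constexpr bool is_pow2(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two that is >= n, for n >= 1.
constexpr std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Number of extra elements between head_size and the next power of two.
// Assumption: shown only to quantify the gap, not to describe the PR's fix.
inline std::size_t pow2_padding(std::size_t head_size) {
    assert(head_size > 0 && "head size must be positive");
    return next_pow2(head_size) - head_size;
}

// Head size from the failing shape in the description.
static_assert(!is_pow2(96), "96 is not a power of two");
static_assert(next_pow2(96) == 128, "96 rounds up to 128");
```

With a head size of 96, `pow2_padding(96)` is 32, so a path that assumes power-of-two sizes would have to account for a quarter of a 128-wide tile being out of range.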

Additionally, this PR fixes a build failure that occurred when the VS work group was configured with a 16x16 work group tile.

src/gpu/intel/ocl/micro_sdpa.cpp

src/gpu/intel/ocl/micro_sdpa.hpp

umar456 · 2025-01-08T16:07:43Z

make test
disable device_cpu
disable benchdnn_all
enable benchdnn_nightly
enable benchdnn_graph

github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Jan 7, 2025

umar456 marked this pull request as ready for review January 7, 2025 22:50

umar456 requested a review from a team as a code owner January 7, 2025 22:50

petercad reviewed Jan 7, 2025

src/gpu/intel/ocl/micro_sdpa.cpp (resolved)

umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from a278031 to 9db21cb Compare January 8, 2025 03:19

syurkevi approved these changes Jan 8, 2025

petercad reviewed Jan 8, 2025

src/gpu/intel/ocl/micro_sdpa.hpp (outdated, resolved)

petercad approved these changes Jan 8, 2025

umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from 9db21cb to bdf9a78 Compare January 8, 2025 16:04

umar456 added 2 commits January 8, 2025 12:05

xe: sdpa: Add support for non-power of 2 head_size with quantization

f37d824

xe: sdpa: Add support for 16,16,8,2 vs work group configuration

c71728b

umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from bdf9a78 to 563bb62 Compare January 9, 2025 07:02

xe: sdpa: Add checks for VS ugemm group size when not power of 2

50c6cc1

umar456 force-pushed the uarshad/sdpa_non_pow2_quan_support branch from 563bb62 to 50c6cc1 Compare January 9, 2025 21:02
