doc: graph: add document for sdpa with compressed key and value #2301
base: main
(open question) Do we want separate documents for the non-quantized and quantized patterns?
My understanding of the existing SDPA page was that it would group the different data types, given that it currently has a subsection for the floating-point pattern.
Thank you @mgouicem! That's also my question. Initially, when I added the floating-point section, I was thinking of including the fp, int8 quantized, and KV-only quantized patterns together on a single page. But with this PR, it seems there would be too much information. Maybe we need @ranukund's input on which format is better.
51d1971 to 9c91ae9
Thanks for the question! From my perspective, it's better to put the quantized SDPA patterns in a separate document, since they require much more information than the pure floating-point patterns, such as the fpmath mode setting, group quantization, and the extra dynamic quantization ops. It's also worth noting that this PR only documents the quantized SDPA patterns with compressed KV, not the pure quantized SDPA. We may need to think about combining them in the future.
The documentation folder has changed on the main branch. Please rebase the PR. Thanks.
9c91ae9 to 2f307af
@ranukund, please help with the review.
I've left a few comments, please incorporate as you see fit.
2f307af to 05e789e
Thank you @ranukund! The suggestions are incorporated now.
05e789e to dbb71a0
dbb71a0 to 3d5e987
A few more minor edits suggested; please incorporate. Thanks!
3d5e987 to 7036823
Thanks Ranu. I've incorporated the suggestions.
7036823 to bc13d21
General