
skip assisted decoding unit test for models using paged attention #998

Merged
merged 2 commits into huggingface:paged_attn on Nov 22, 2024

Conversation

kaixuanliu (Contributor)

No description provided.

@kaixuanliu kaixuanliu marked this pull request as ready for review November 22, 2024 00:58
@sywangyi sywangyi merged commit 039c72d into huggingface:paged_attn Nov 22, 2024
1 check passed
IlyasMoutawwakil added a commit that referenced this pull request Dec 5, 2024
* add paged attention implementation, remove jit logic

Signed-off-by: Wang, Yi A <[email protected]>

* add support in transformers 4.45

Signed-off-by: Wang, Yi A <[email protected]>

* fix config (#935)

* move patch model to init

Signed-off-by: Wang, Yi A <[email protected]>

* refine class IPEXPagedCache's update method (#945)

* refine class IPEXPagedCache's update method

Signed-off-by: Liu, Kaixuan <[email protected]>

* replace tensor on xpu with a list to avoid memory copy

Signed-off-by: Liu, Kaixuan <[email protected]>

* split IPEXPagedCache's update function into `update_for_prefill` and `update_for_decode`

Signed-off-by: Liu, Kaixuan <[email protected]>

---------

Signed-off-by: Liu, Kaixuan <[email protected]>
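
As an aside, a minimal sketch of what such a prefill/decode split can look like; only the class name `IPEXPagedCache` and the two method names come from the commit message, everything else (storage layout, signatures) is an illustrative assumption:

```python
import torch


class IPEXPagedCache:
    """Toy stand-in: stores K/V per layer and dispatches prefill vs. decode."""

    def __init__(self):
        self.key_cache = {}    # layer_idx -> list of cached key tensors
        self.value_cache = {}  # layer_idx -> list of cached value tensors

    def update(self, key, value, layer_idx):
        # Prefill handles the whole prompt at once; decode appends one token.
        # Tensors are assumed shaped [batch, heads, seq_len, head_dim].
        if key.shape[-2] > 1:
            return self.update_for_prefill(key, value, layer_idx)
        return self.update_for_decode(key, value, layer_idx)

    def update_for_prefill(self, key, value, layer_idx):
        self.key_cache[layer_idx] = [key]
        self.value_cache[layer_idx] = [value]
        return key, value

    def update_for_decode(self, key, value, layer_idx):
        self.key_cache[layer_idx].append(key)
        self.value_cache[layer_idx].append(value)
        k = torch.cat(self.key_cache[layer_idx], dim=-2)
        v = torch.cat(self.value_cache[layer_idx], dim=-2)
        return k, v
```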

* fix bug when doing beam search (#954)

Signed-off-by: Liu, Kaixuan <[email protected]>

* enable qkv concat layer (#958)

* enable qkv

* split key value into 2 lists

* add xpu cache optimization

Signed-off-by: Wang, Yi A <[email protected]>

* xpu mlp optimization

Signed-off-by: Wang, Yi A <[email protected]>

* optimize cache ops in xpu, improve for beam search

Signed-off-by: Wang, Yi A <[email protected]>
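
For context, the general idea behind a qkv concat layer, shown as a hedged sketch (none of these names are from the repository): three separate projections become one matmul whose output is split back into q, k, and v.

```python
import torch
import torch.nn as nn


class FusedQKV(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        # one matmul instead of three separate q/k/v projections
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.hidden = hidden

    def forward(self, x: torch.Tensor):
        q, k, v = self.qkv(x).split(self.hidden, dim=-1)
        return q, k, v


x = torch.randn(2, 16, 64)
q, k, v = FusedQKV(64)(x)
print(q.shape, k.shape, v.shape)  # each torch.Size([2, 16, 64])
```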

* enable gpt2; falcon has core dump error in PagedAttention.single_query_cached_kv_attention (#979)

* enable gpt2, falcon has core dump error in PagedAttention.single_query_cached_kv_attention

* enable new_decoder_arch falcon

* only keep 1 config

* rm autocast

* fix unit test case, CPU part is OK; enable Falcon7b for XPU (#992)

* fix bug when running IPEXCausalModel forward directly; fix bug when using `save_pretrained`

Signed-off-by: Liu, Kaixuan <[email protected]>

* add LinearGelu Op support for XPU

Signed-off-by: Liu, Kaixuan <[email protected]>

* fix unit test error

Signed-off-by: Liu, Kaixuan <[email protected]>

* adjust unit test case

Signed-off-by: Liu, Kaixuan <[email protected]>

* fix bug

Signed-off-by: Liu, Kaixuan <[email protected]>

---------

Signed-off-by: Liu, Kaixuan <[email protected]>

* skip assisted decoding unit test for models using paged attention (#998)

* skip assisted decoding unit test for models using paged attention

Signed-off-by: Liu, Kaixuan <[email protected]>

* almost all XPU CI tests now pass

Signed-off-by: Liu, Kaixuan <[email protected]>

---------

Signed-off-by: Liu, Kaixuan <[email protected]>
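
The skip itself plausibly looks something like the following unittest-style sketch; the model set and test names here are placeholders, not the actual test suite code.

```python
import unittest

# illustrative set; the real list lives in the optimum-intel test suite
MODELS_USING_PAGED_ATTENTION = {"llama", "falcon", "gpt2"}


class IPEXCausalLMTest(unittest.TestCase):
    model_arch = "llama"

    def test_assisted_decoding(self):
        if self.model_arch in MODELS_USING_PAGED_ATTENTION:
            self.skipTest("assisted decoding is not supported with paged attention")
        ...  # the original assisted-decoding assertions would run here


if __name__ == "__main__":
    unittest.main()
```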

* fix ci config (#1010)

Signed-off-by: jiqing-feng <[email protected]>

* Fix tests versions (#1011)

* fix ci config

* fix test versions

* fix ipex version

Signed-off-by: jiqing-feng <[email protected]>

* fix torch test version (#1012)

Signed-off-by: jiqing-feng <[email protected]>

* use python3.9 test (#1013)

* use python3.9 test

Signed-off-by: jiqing-feng <[email protected]>

* change ipex transformers version limit in setup (#1015)

* change ipex transformers version limit in setup
* fix inc tests

Signed-off-by: jiqing-feng <[email protected]>

* add XPU LinearAddAdd op (#1017)

Signed-off-by: Liu, Kaixuan <[email protected]>

* fix bert and vit patch (#1022)

* fix bert and vit patch
* fix vit and bert save


Signed-off-by: jiqing-feng <[email protected]>

* Paged attn (#1024)

* fix reorder cache for non-patch models

Signed-off-by: jiqing-feng <[email protected]>

* disable torch < 2.3 tests, we won't use torch < 2.4

Signed-off-by: jiqing-feng <[email protected]>

* fix beam search test

Signed-off-by: jiqing-feng <[email protected]>

* fix cache selection

Signed-off-by: jiqing-feng <[email protected]>

* upgrade to transformers 4.46

Signed-off-by: jiqing-feng <[email protected]>

* change ipex test yaml transformers version to 4.46

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
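
As an illustration of the reorder-cache fix for non-patched models, a hedged sketch of standard beam-search cache reordering over the tuple-of-tensors `past_key_values` layout (the function name and signature are assumptions):

```python
import torch


def reorder_cache(past_key_values, beam_idx: torch.Tensor):
    # For every layer, keep only the key/value rows of the beams that survived
    # this step of beam search (batch dimension 0, indexed by beam_idx).
    return tuple(
        (k.index_select(0, beam_idx), v.index_select(0, beam_idx))
        for k, v in past_key_values
    )
```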

* set device to the same as the original model (#1031)

* set device to the same as the original model
* fix device

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* Simplify IPEXModel (#1032)

* simplify forward and save_pretrained since there is no jit support

* fix format

* rm warmup because no jit mode anymore

* simplify forward for causal lm model

* fix paged pkv forward

* disable use_cache when just running forward

---------

Signed-off-by: jiqing-feng <[email protected]>
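
A small hedged example of the `use_cache` point above, using a tiny randomly initialized GPT-2 rather than the repository's own models:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# tiny randomly initialized model, just to show the switch
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64)).eval()
input_ids = torch.randint(0, model.config.vocab_size, (1, 8))

with torch.no_grad():
    out = model(input_ids, use_cache=False)  # plain forward, no KV cache kept
print(out.past_key_values)  # None: nothing was cached
```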

* nice code (#1035)

Signed-off-by: Liu, Kaixuan <[email protected]>

* Paged attn (#1036)

* nice code
* device type adjustment

Signed-off-by: Liu, Kaixuan <[email protected]>

* Enable torch.compile for non-generation tasks in CPU (#1037)

* enable compile for non-generation tasks
* add no_grad in forward
* warmup compiled model
* disable compile for models that are not ready
* set system level optimize for torch.compile
* fix typo
* add comments
* set torch minimum version for compiling

Signed-off-by: jiqing-feng <[email protected]>
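
The compile-and-warmup pattern described above, as a minimal sketch (a toy module stands in for the real models; `torch.compile` requires a recent torch, which matches the minimum-version item):

```python
import torch
import torch.nn as nn

# a toy module stands in for the non-generation models mentioned above
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 2)).eval()
compiled = torch.compile(model)

x = torch.randn(4, 16)
with torch.no_grad():
    compiled(x)        # first call triggers compilation (the warmup)
    out = compiled(x)  # later calls reuse the compiled graph
print(out.shape)
```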

* Fix ipex upload and update readme. (#1045)

* fix readme and push to hub support

Signed-off-by: jiqing-feng <[email protected]>

* rm export in tests

Signed-off-by: jiqing-feng <[email protected]>

* test with torch 2.5.*

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* Fix tests (#1047)

* fix tests
* fix typo
* add patched tests

* change forward to generate

* fix tests

* fix test model name


---------

Signed-off-by: jiqing-feng <[email protected]>

* Patch gpt2 block forward for passing input_lens. (#1050)

* fix forward without pkv
* patch gpt2 block forward
* fix typo
* revert causal lm tests

Signed-off-by: jiqing-feng <[email protected]>
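
A hedged sketch of what patching a block's forward to thread `input_lens` through can look like; only gpt2 and `input_lens` come from the commit title, the patching mechanics here are generic and illustrative:

```python
import types

import torch
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config(n_layer=1, n_head=2, n_embd=64))
block = model.transformer.h[0]
block._orig_forward = block.forward  # keep the original bound method


def patched_block_forward(self, hidden_states, *args, input_lens=None, **kwargs):
    # a patched forward could consume input_lens here before delegating
    return self._orig_forward(hidden_states, *args, **kwargs)


block.forward = types.MethodType(patched_block_forward, block)

hidden = torch.randn(1, 8, 64)
out = block(hidden, input_lens=torch.tensor([8]))  # extra kwarg now accepted
```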

---------

Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Liu, Kaixuan <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: kaixuanliu <[email protected]>
Co-authored-by: Ilyas Moutawwakil <[email protected]>