
fix gpt bigcode model loading with fp16 weights precision #1098

Merged: 2 commits from ea/fix_gpt_bigcode_fp16 into huggingface:main, Jan 9, 2025

Conversation

@eaidova (Collaborator) commented Jan 6, 2025

What does this PR do?

Fixes an issue found while enabling https://huggingface.co/WizardLMTeam/WizardCoder-15B-V1.0:

RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: c10::Half and  query.dtype: float instead.

The gpt-bigcode model casts the attention mask to transformer.wte.weight.dtype, which is float16 when the model is loaded with its weights preserved in the original dtype (https://github.com/huggingface/transformers/blob/v4.46.3/src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py#L916), while the query is provided as float32. As a result, a dtype mismatch can occur between the inputs of the scaled_dot_product_attention operation.
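Below is a minimal, self-contained sketch of the mismatch and of the straightforward remedy of aligning the mask with the query dtype; the tensor shapes are arbitrary and this is not the exact code path touched by the PR.

```python
# Minimal repro sketch (illustrative shapes; not the PR's actual change).
import torch
import torch.nn.functional as F

query = torch.randn(1, 8, 16, 64, dtype=torch.float32)     # fp32 activations
key = torch.randn(1, 8, 16, 64, dtype=torch.float32)
value = torch.randn(1, 8, 16, 64, dtype=torch.float32)
attn_mask = torch.zeros(1, 1, 16, 16, dtype=torch.float16)  # mask cast to transformer.wte.weight.dtype

try:
    F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)
except RuntimeError as e:
    # On affected backends this raises the error quoted above:
    # "Expected attn_mask dtype to be bool or float or to match query dtype ..."
    print(e)

# Casting the mask to the query dtype avoids the mismatch:
out = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask.to(query.dtype))
```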

@eaidova eaidova requested a review from AlexKoff88 January 6, 2025 11:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eaidova eaidova changed the title from "fix gpt bigcode model laoding with fp16 weights precision" to "fix gpt bigcode model loading with fp16 weights precision" Jan 6, 2025
@IlyasMoutawwakil (Member) commented:

Does this happen in transformers as well? If so, can you please open an issue/PR there as well.

@eaidova (Collaborator, Author) commented Jan 7, 2025

> Does this happen in transformers as well? If so, can you please open an issue/PR there as well.

This is related to how OpenVINO model conversion works. To reduce memory consumption, we load the model with torch_dtype="auto" to avoid the extra memory overhead of converting fp16 to fp32. Model export usually happens on CPU, which does not always support bf16/fp16 at the PyTorch level; to work around that, we patch some operations to run in fp32 and skip memory-consuming ops like linear or embedding layers during tracing, replacing them directly with OpenVINO ops. As a result, on our side model.transformer.wte is skipped and its output is passed as float16, while the query is calculated as float32. I do not think this matches plain torch behavior, but I'll check.
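For context, a rough sketch of the export-side loading; it assumes the fp16 WizardCoder checkpoint from the PR description and omits the OpenVINO-specific tracing patches themselves.

```python
# Export-side loading sketch (assumes the fp16 checkpoint named in the PR
# description; the OpenVINO tracing patches are not shown here).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "WizardLMTeam/WizardCoder-15B-V1.0",
    torch_dtype="auto",  # keep the checkpoint's fp16 weights instead of upcasting to fp32
)
print(model.transformer.wte.weight.dtype)  # torch.float16 for this checkpoint

# During OV export, memory-heavy ops (embeddings, linears) are replaced with OV
# ops and some ops are patched to run in fp32, so the query is computed in fp32
# while the attention mask, cast to wte.weight.dtype, stays fp16.
```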

@echarlaix (Collaborator) commented:

@eaidova would you mind taking a look at the conflicts? Will merge once resolved.

@eaidova eaidova force-pushed the ea/fix_gpt_bigcode_fp16 branch from de0777e to 2ce21bf on January 8, 2025 13:21
@eaidova (Collaborator, Author) commented Jan 8, 2025

> @eaidova would you mind taking a look at the conflicts? Will merge once resolved.

@echarlaix, I resolved merge conflicts

@IlyasMoutawwakil IlyasMoutawwakil merged commit 49441bc into huggingface:main Jan 9, 2025
22 checks passed