
fix gpt bigcode model loading with fp16 weights precision #1098

Merged: 2 commits from ea/fix_gpt_bigcode_fp16 into huggingface:main, Jan 9, 2025

Conversation

@eaidova (Collaborator) commented Jan 6, 2025

What does this PR do?

Fixes an issue found while enabling https://huggingface.co/WizardLMTeam/WizardCoder-15B-V1.0:

RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: c10::Half and  query.dtype: float instead.

The gpt-bigcode model casts the attention mask to transformer.wte.weight.dtype, which is float16 when the model is loaded with its weights preserved in the original dtype (https://github.com/huggingface/transformers/blob/v4.46.3/src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py#L916), while the query is provided as float32. As a result, a dtype mismatch can occur between the inputs of the scaled_dot_product_attention operation.
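Below is a minimal, self-contained sketch of the mismatch and of the straightforward remedy of aligning the mask with the query dtype; the tensor shapes are arbitrary and this is not the exact code path touched by the PR.

```python
# Minimal repro sketch (illustrative shapes; not the PR's actual change).
import torch
import torch.nn.functional as F

query = torch.randn(1, 8, 16, 64, dtype=torch.float32)     # fp32 activations
key = torch.randn(1, 8, 16, 64, dtype=torch.float32)
value = torch.randn(1, 8, 16, 64, dtype=torch.float32)
attn_mask = torch.zeros(1, 1, 16, 16, dtype=torch.float16)  # mask cast to transformer.wte.weight.dtype

try:
    F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)
except RuntimeError as e:
    # On affected backends this raises the error quoted above:
    # "Expected attn_mask dtype to be bool or float or to match query dtype ..."
    print(e)

# Casting the mask to the query dtype avoids the mismatch:
out = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask.to(query.dtype))
```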

@eaidova eaidova requested a review from AlexKoff88 January 6, 2025 11:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eaidova eaidova changed the title from "fix gpt bigcode model laoding with fp16 weights precision" to "fix gpt bigcode model loading with fp16 weights precision" Jan 6, 2025
@IlyasMoutawwakil (Member) commented:

Does this happen in transformers as well? If so, can you please open an issue/PR there as well.

@eaidova (Collaborator, Author) commented Jan 7, 2025

> Does this happen in transformers as well? If so, can you please open an issue/PR there as well.

This is related to how OpenVINO model conversion works. To reduce memory consumption, we load the model with torch_dtype="auto" to avoid the extra memory overhead of converting fp16 to fp32. Model export usually happens on CPU, which does not always support bf16/fp16 at the PyTorch level; to work around that, we patch some operations to run in fp32 and skip memory-consuming ops like linear or embedding layers during tracing, replacing them directly with OpenVINO ops. As a result, on our side model.transformer.wte is skipped and its output is passed as float16, while the query is calculated as float32. I do not think this matches plain torch behavior, but I'll check.
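For context, a rough sketch of the export-side loading; it assumes the fp16 WizardCoder checkpoint from the PR description and omits the OpenVINO-specific tracing patches themselves.

```python
# Export-side loading sketch (assumes the fp16 checkpoint named in the PR
# description; the OpenVINO tracing patches are not shown here).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "WizardLMTeam/WizardCoder-15B-V1.0",
    torch_dtype="auto",  # keep the checkpoint's fp16 weights instead of upcasting to fp32
)
print(model.transformer.wte.weight.dtype)  # torch.float16 for this checkpoint

# During OV export, memory-heavy ops (embeddings, linears) are replaced with OV
# ops and some ops are patched to run in fp32, so the query is computed in fp32
# while the attention mask, cast to wte.weight.dtype, stays fp16.
```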

@echarlaix (Collaborator) commented:

@eaidova would you mind taking a look at the conflicts? Will merge once resolved.

@eaidova eaidova force-pushed the ea/fix_gpt_bigcode_fp16 branch from de0777e to 2ce21bf on January 8, 2025 13:21
@eaidova (Collaborator, Author) commented Jan 8, 2025

> @eaidova would you mind taking a look at the conflicts? Will merge once resolved.

@echarlaix, I resolved merge conflicts

@IlyasMoutawwakil IlyasMoutawwakil merged commit 49441bc into huggingface:main Jan 9, 2025
22 checks passed