fix convert_weights not working for Qwen2.5 HF checkpoints #2233

zhangtemplar · 2025-01-06T23:11:08Z

Summary: In Qwen2.5, the attention linear projection layers use bias=True, but torchtune.convert_weights does not yet support bias=True. This diff adds support for that.

Differential Revision: D67880222
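To illustrate the failure mode described in the summary, here is a minimal sketch of template-based key conversion. It is not the code from this diff: the mapping tables, `map_key`, and the key names are assumptions chosen for illustration. It only shows why a mapping that lists weights but no bias entries rejects a checkpoint whose attention projections carry biases.

```python
import re
from typing import Dict

# Hypothetical key templates ("{}" stands for the layer index).
# These names are illustrative assumptions, not torchtune's actual tables.
WEIGHT_ONLY: Dict[str, str] = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.layers.{}.self_attn.q_proj.weight": "layers.{}.attn.q_proj.weight",
}

# Same table, extended with a bias entry for the projection.
WITH_BIAS: Dict[str, str] = {
    **WEIGHT_ONLY,
    "model.layers.{}.self_attn.q_proj.bias": "layers.{}.attn.q_proj.bias",
}


def map_key(hf_key: str, mapping: Dict[str, str]) -> str:
    """Translate one checkpoint key using a table of templates."""
    match = re.search(r"\.(\d+)\.", hf_key)
    if match is None:
        # No layer index in the key: look it up directly.
        template, index = hf_key, None
    else:
        # Swap the first layer index for "{}" to recover the template.
        index = match.group(1)
        template = hf_key.replace(f".{index}.", ".{}.", 1)
    if template not in mapping:
        raise KeyError(f"no conversion rule for checkpoint key: {hf_key}")
    new_key = mapping[template]
    return new_key if index is None else new_key.format(index)


def convert_keys(state_dict: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename every key in a state dict; values are passed through untouched."""
    return {map_key(key, mapping): value for key, value in state_dict.items()}
```

With the weight-only table, a key such as `model.layers.3.self_attn.q_proj.bias` raises `KeyError`; with the extended table it maps to `layers.3.attn.q_proj.bias`.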


pytorch-bot · 2025-01-06T23:11:12Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2233

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 0881133 with merge base 213f386:

NEW FAILURE - The following job has failed:

Lint / lint (3.10) (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-01-06T23:11:23Z

This pull request was exported from Phabricator. Differential Revision: D67880222

ebsmothers · 2025-01-07T01:52:39Z

@calvinpelletier can you review?

calvinpelletier · 2025-01-08T03:15:36Z

Hi @zhangtemplar , you're changing the generic convert_weights function. Qwen2.5 already has a specific convert weights function here which handles the biases of the linear projections.

In our Qwen2.5 configs, we specify the model type as "QWEN2" here which causes the checkpointer to call the Qwen-specific convert weights function here

joecummings · 2025-01-09T21:30:15Z

> Hi @zhangtemplar , you're changing the generic convert_weights function. Qwen2.5 already has a specific convert weights function here which handles the biases of the linear projections.
>
> In our Qwen2.5 configs, we specify the model type as "QWEN2" here which causes the checkpointer to call the Qwen-specific convert weights function here

This raises a good point tho that we don't tell the user if their model type is wrong. We should probably allow either QWEN2_5 or QWEN2 to point to the same conversion function.
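A rough sketch of that suggestion, assuming a simple lookup from model type to conversion function. The converter functions, the `_CONVERTERS` table, and `select_converter` are hypothetical stand-ins rather than torchtune's actual checkpointer code; only the "QWEN2" and "QWEN2_5" names come from this thread.

```python
from typing import Callable, Dict

StateDict = Dict[str, object]
Converter = Callable[[StateDict], StateDict]


def generic_hf_to_tune(state_dict: StateDict) -> StateDict:
    """Placeholder for a generic conversion path (hypothetical)."""
    return dict(state_dict)


def qwen2_hf_to_tune(state_dict: StateDict) -> StateDict:
    """Placeholder for a Qwen-specific path that also handles biases (hypothetical)."""
    return dict(state_dict)


# Both spellings resolve to the same function, so a config that says
# "QWEN2_5" does not silently fall through to the generic path.
_CONVERTERS: Dict[str, Converter] = {
    "QWEN2": qwen2_hf_to_tune,
    "QWEN2_5": qwen2_hf_to_tune,
}


def select_converter(model_type: str) -> Converter:
    """Pick the conversion function for a model type, defaulting to generic."""
    return _CONVERTERS.get(model_type.upper(), generic_hf_to_tune)
```

Whether an unrecognized model type should fall back to the generic function, as it does here, or raise an error is a separate design choice that this sketch does not settle.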

fix convert_weights not working for Qwen2.5 HF checkpoints

0881133


facebook-github-bot added the CLA Signed label Jan 6, 2025

facebook-github-bot added the fb-exported label Jan 6, 2025

ebsmothers requested a review from calvinpelletier January 7, 2025 01:52
