From NVIDIA Megatron-LM for visibility #18

Open

wants to merge 3,638 commits into base: multi-query-attention

Conversation

RaymondLi0
Collaborator

No description provided.

@RaymondLi0 RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12
@RaymondLi0 RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12
huvunvidia and others added 28 commits November 14, 2024 21:20
Updating all T5 attention masks (encoder, decoder, encoder-decoder) to be compatible with all 3 TE backends

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>

See merge request ADLR/megatron-lm!2273
Co-authored-by: root <[email protected]>
Add hierarchical cp comm group

See merge request ADLR/megatron-lm!2279
Add missing arg to save_checkpoint call

See merge request ADLR/megatron-lm!2351
NVLM example scripts

See merge request ADLR/megatron-lm!2306
ci: Re-enable llava tests

See merge request ADLR/megatron-lm!2348
ci: Retry download assets

See merge request ADLR/megatron-lm!2357
Support etp==tp when epp==0 and enforce torch ckpt-format when epp>1

See merge request ADLR/megatron-lm!2260
QKNorm to work with TENorm

See merge request ADLR/megatron-lm!2347
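For reference, QK-norm applies a normalization to the query and key projections before the attention dot product so the logits stay in a bounded range; this merge request wires that through Transformer Engine's TENorm. Below is a rough sketch of the general idea in plain PyTorch, not the Megatron-LM code; the module and function names are illustrative.

```python
import torch

head_dim = 64  # illustrative head dimension
q_layernorm = torch.nn.LayerNorm(head_dim)
k_layernorm = torch.nn.LayerNorm(head_dim)


def qk_normed_scores(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    # query, key: [batch, heads, seq, head_dim]
    # Normalize over the head dimension before the dot product so the
    # attention logits remain well-scaled during training.
    query = q_layernorm(query)
    key = k_layernorm(key)
    scale = query.shape[-1] ** -0.5
    return torch.matmul(query, key.transpose(-2, -1)) * scale
```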
Support RMSNorm when TE and Apex are not installed

See merge request ADLR/megatron-lm!2015
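For context, RMSNorm scales activations by their root-mean-square over the hidden dimension and skips LayerNorm's mean subtraction, so a pure-PyTorch fallback is straightforward when the fused TE/Apex kernels are unavailable. A minimal sketch of such a fallback (illustrative only, not the actual Megatron-LM implementation):

```python
import torch


class RMSNorm(torch.nn.Module):
    """Root-mean-square layer norm: x / rms(x) * weight, with no mean subtraction."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the RMS statistic over the last (hidden) dimension in fp32 for stability.
        variance = x.float().pow(2).mean(dim=-1, keepdim=True)
        x_normed = x.float() * torch.rsqrt(variance + self.eps)
        return (self.weight * x_normed).to(x.dtype)
```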
Clarifications for batch x pipeline parallel logic

See merge request ADLR/megatron-lm!2343
Add attention bias arg in MCore transformer for TE cuDNN FusedAttention

See merge request ADLR/megatron-lm!2293
chore: Add mypy optionally

See merge request ADLR/megatron-lm!2360
ci: JET improvements

See merge request ADLR/megatron-lm!2365
… 'main'

update golden values for nightly test

See merge request ADLR/megatron-lm!2364
ko3n1g and others added 30 commits January 7, 2025 01:35
ci: Update golden values of nightlies

See merge request ADLR/megatron-lm!2511
Make generate function only return results for newly added requests

See merge request ADLR/megatron-lm!2370
ci: Use torchrun

See merge request ADLR/megatron-lm!2507
chore: Fix local generator script

See merge request ADLR/megatron-lm!2519
Co-authored-by: William Dykas <[email protected]>
Co-authored-by: Mcore Bot <[email protected]>
Co-authored-by: William Dykas <[email protected]>
Co-authored-by: root <[email protected]>
Fix log probs output for inference

See merge request ADLR/megatron-lm!2430
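As background, per-token log probabilities at inference time are typically obtained by taking a log-softmax over the vocabulary and gathering the entries for the prompt or sampled token ids. A small sketch of that computation (a hypothetical helper, not the Megatron-LM code):

```python
import torch


def token_log_probs(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Return the log probability of each selected token.

    logits:    [batch, seq, vocab] model outputs
    token_ids: [batch, seq] token ids whose log probs we want
    """
    # log_softmax in fp32 for numerical stability, then pick out the chosen ids.
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    return log_probs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)
```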
Add tests for MoE models with average_in_collective=True

See merge request ADLR/megatron-lm!2489
ci: Allow running nemo-ci

See merge request ADLR/megatron-lm!2509
ci: Fail-fast on unit tests

See merge request ADLR/megatron-lm!2520
remove tensorstore pin

See merge request ADLR/megatron-lm!2516
ci: nemo-ci inputs

See merge request ADLR/megatron-lm!2522
Adding (bias-based) relative position embedding to T5

See merge request ADLR/megatron-lm!2428
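For reference, T5-style relative position embeddings map the distance between query and key positions to a small set of buckets (exact for short distances, logarithmic for long ones) and add a learned per-head bias for each bucket to the attention logits. A sketch of the standard bidirectional bucketing scheme from the T5 paper (illustrative, not the Megatron-LM code):

```python
import math

import torch


def relative_position_bucket(relative_position: torch.Tensor,
                             num_buckets: int = 32,
                             max_distance: int = 128) -> torch.Tensor:
    """Map signed key-minus-query distances to bucket ids (bidirectional case)."""
    num_buckets //= 2
    ret = (relative_position > 0).long() * num_buckets  # separate buckets for +/- offsets
    n = relative_position.abs()
    max_exact = num_buckets // 2
    # Small distances get exact buckets; larger ones are binned logarithmically up to max_distance.
    val_if_large = max_exact + (
        torch.log(n.float() / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    val_if_large = torch.clamp(val_if_large, max=num_buckets - 1)
    return ret + torch.where(n < max_exact, n, val_if_large)
```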
Inference CUDA graphs (MCore version)

See merge request ADLR/megatron-lm!2429
Fix bug when loading pp>1 model with frozen layers

Co-authored-by: Jon Barker <[email protected]>

See merge request ADLR/megatron-lm!2523
Make MoE token dispatcher cuda graph-able if token-drop and padding

See merge request ADLR/megatron-lm!2426
ci: Implement `frozen-ckpt` tests

See merge request ADLR/megatron-lm!2514