From NVIDIA Megatron-LM for visibility #18

RaymondLi0 · 2023-01-24T20:01:13Z

No description provided.

RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12

RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12

huvunvidia and others added 28 commits November 14, 2024 21:20

ADLR/megatron-lm!2273 - Updating all T5 attention masks (encoder, decoder, encoder-decoder) to be compatible with all 3 TE backends

c1728c1

Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: root <[email protected]>

Merge branch 'huvu/update_t5_attentionmasktype' into 'main'

2163865

Updating all T5 attention masks (encoder, decoder, encoder-decoder) to be compatible with all 3 TE backends See merge request ADLR/megatron-lm!2273

ADLR/megatron-lm!2279 - Add hierarchical cp comm group

645c329

Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]>

Merge branch 'add_hierarchical_cp_comm_group' into 'main'

2bdc60c

Add hierarchical cp comm group See merge request ADLR/megatron-lm!2279

ADLR/megatron-lm!2351 - Add missing arg to save_checkpoint call

8b72751

Merge branch 'jbarker-main-patch-72619' into 'main'

63b8520

Add missing arg to save_checkpoint call See merge request ADLR/megatron-lm!2351

ADLR/megatron-lm!2306 - NVLM example scripts

4131b07

Merge branch 'trintamaki/nvlm-example-scripts' into 'main'

ce507ee

NVLM example scripts See merge request ADLR/megatron-lm!2306

ADLR/megatron-lm!2348 - ci: Re-enable llava tests

9e9d4f5

Merge branch 'ko3n1g/ci/re-enable-mm-tests' into 'main'

6c88bfc

ci: Re-enable llava tests See merge request ADLR/megatron-lm!2348

ADLR/megatron-lm!2357 - ci: Retry download assets

06c67b4

Merge branch 'ko3n1g/ci/retry-download' into 'main'

5438d15

ci: Retry download assets See merge request ADLR/megatron-lm!2357

ADLR/megatron-lm!2260 - Support etp==tp when epp==0 and enforce torch ckpt-format when epp>1

57ed924

Co-authored-by: Jon Barker <[email protected]>

Merge branch 'jbarker/etp_equals_tp' into 'main'

0f389f2

Support etp==tp when epp==0 and enforce torch ckpt-format when epp>1 See merge request ADLR/megatron-lm!2260

ADLR/megatron-lm!2347 - QKNorm to work with TENorm

62e2e33

Co-authored-by: Shanmugam Ramasamy <[email protected]>

Merge branch 'qknorm' into 'main'

68e11fb

QKNorm to work with TENorm See merge request ADLR/megatron-lm!2347

ADLR/megatron-lm!2015 - Support RMSNorm when TE and Apex are not installed

693ae86

Merge branch 'torch-rms-norm' into 'main'

c4c9057

Support RMSNorm when TE and Apex are not installed See merge request ADLR/megatron-lm!2015

ADLR/megatron-lm!2343 - Clarifications for batch x pipeline parallel logic

2e975f0

Merge branch 'helenn-fix-batch-pipeline-logic' into 'main'

2138248

Clarifications for batch x pipeline parallel logic See merge request ADLR/megatron-lm!2343

ADLR/megatron-lm!2293 - Add attention bias arg in MCore transformer for TE cuDNN FusedAttention

cd1d30b

Co-authored-by: yaoyu-33 <[email protected]>

Merge branch 'yuya/add_attn_bias' into 'main'

6033e95

Add attention bias arg in MCore transformer for TE cuDNN FusedAttention See merge request ADLR/megatron-lm!2293

ADLR/megatron-lm!2360 - chore: Add mypy optionally

4f5aa6d

Merge branch 'ko3n1g/chore/add-mypy' into 'main'

f214627

chore: Add mypy optionally See merge request ADLR/megatron-lm!2360

ADLR/megatron-lm!2365 - ci: JET improvements

a231b87

Merge branch 'ko3n1g/ci/jet-fleet' into 'main'

b6866ae

ci: JET improvements See merge request ADLR/megatron-lm!2365

ADLR/megatron-lm!2364 - update golden values for nightly test

886fd12

Co-authored-by: Huy Vu2 <[email protected]>

Merge branch 'huvu/update_t5_attentionmask_nightly_goldenvalues' into 'main'

7b79d5b

update golden values for nightly test See merge request ADLR/megatron-lm!2364

ko3n1g and others added 30 commits January 7, 2025 01:35

ADLR/megatron-lm!2511 - ci: Update golden values of nightlies

c383fe9

Merge branch 'ko3n1g/ci/update-nightlies' into 'main'

15517f6

ci: Update golden values of nightlies See merge request ADLR/megatron-lm!2511

ADLR/megatron-lm!2370 - Make generate function only return results for newly added requests

342e359

Merge branch 'generate_fix' into 'main'

df28200

Make generate function only return results for newly added requests See merge request ADLR/megatron-lm!2370

ADLR/megatron-lm!2507 - ci: Use torchrun

6e09dd4

Merge branch 'ko3n1g/ci/use-torchrun' into 'main'

ab171c5

ci: Use torchrun See merge request ADLR/megatron-lm!2507

ADLR/megatron-lm!2519 - chore: Fix local generator script

c8d12e6

Merge branch 'ko3n1g/chore/fix-local-generator-script' into 'main'

65720c8

chore: Fix local generator script See merge request ADLR/megatron-lm!2519

ADLR/megatron-lm!2430 - Fix log probs output for inference

5ff34d0

Co-authored-by: William Dykas <[email protected]> Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: William Dykas <[email protected]> Co-authored-by: root <[email protected]>

Merge branch 'wdykas/fix-logprobs' into 'main'

4dc8977

Fix log probs output for inference See merge request ADLR/megatron-lm!2430

ADLR/megatron-lm!2489 - Add tests for MoE models with average_in_collective=True

c99a5fe

Co-authored-by: Oliver Koenig <[email protected]>

Merge branch 'add_test_for_average_in_collective_ddp' into 'main'

ad41174

Add tests for MoE models with average_in_collective=True See merge request ADLR/megatron-lm!2489

ADLR/megatron-lm!2509 - ci: Allow running nemo-ci

6ce0da5

Merge branch 'ko3n1g/ci/run-nemo-ci' into 'main'

05780f3

ci: Allow running nemo-ci See merge request ADLR/megatron-lm!2509

ADLR/megatron-lm!2520 - ci: Fail-fast on unit tests

9220838

Merge branch 'ko3n1g/ci/fail-fast-unit-tests' into 'main'

1ce944c

ci: Fail-fast on unit tests See merge request ADLR/megatron-lm!2520

ADLR/megatron-lm!2516 - remove tensorstore pin

72e86a6

Merge branch 'pstjohn/remove-tensorstore-pin' into 'main'

a26b93d

remove tensorstore pin See merge request ADLR/megatron-lm!2516

ADLR/megatron-lm!2522 - ci: nemo-ci inputs

67130c9

Merge branch 'ko3n1g/ci/fix-inputs-to-nemo-ci' into 'main'

bafab5a

ci: nemo-ci inputs See merge request ADLR/megatron-lm!2522

ADLR/megatron-lm!2428 - Adding (bias-based) relative position embedding to T5

a852cb9

Co-authored-by: Huy Vu2 <[email protected]>

Merge branch 'huvu/relative_posemd_attention_bias' into 'main'

93cb1c1

Adding (bias-based) relative position embedding to T5 See merge request ADLR/megatron-lm!2428

ADLR/megatron-lm!2429 - Inference CUDA graphs (MCore version)

fa93a05

Co-authored-by: Jimmy Zhang <[email protected]>

Merge branch 'hn-inference-cudagraphs-mcore' into 'main'

8fba594

Inference CUDA graphs (MCore version) See merge request ADLR/megatron-lm!2429

ADLR/megatron-lm!2523 - Fix bug when loading pp>1 model with frozen layers

458bfc9

Co-authored-by: Jon Barker <[email protected]> Co-authored-by: Jon Barker <[email protected]> Co-authored-by: Jon Barker <[email protected]>

Merge branch 'jbarker/debug_pp_convert' into 'main'

3046e33

Fix bug when loading pp>1 model with frozen layers See merge request ADLR/megatron-lm!2523

ADLR/megatron-lm!2426 - Make MoE token dispatcher cuda graph-able if token-drop and padding

f27a04f

Merge branch 'graphable_token_dispatch' into 'main'

726da58

Make MoE token dispatcher cuda graph-able if token-drop and padding See merge request ADLR/megatron-lm!2426

ADLR/megatron-lm!2514 - ci: Implement frozen-ckpt tests

b41bcba

Merge branch 'ko3n1g/ci/frozen-ckpt' into 'main'

c76410a

ci: Implement `frozen-ckpt` tests See merge request ADLR/megatron-lm!2514
