2.6 backport PR request list #8455

Open
tengyifei opened this issue Dec 5, 2024 · 12 comments

Comments

@tengyifei
Collaborator

tengyifei commented Dec 5, 2024

This is a tracker for backports/cherry-picks into 2.6. For any PR you want to backport to 2.6, please reply with the following:

  • Original PR link (this PR should merge into master)
  • Reason to backport
  • 2.6 backport PR link (a separate PR should be created, and that PR should merge into r2.6)

This process is similar to the backport request thread for the 2.5 release: #7977

Please note the criteria for cherry-picking into 2.6: #7203

@tengyifei
Collaborator Author

tengyifei commented Dec 11, 2024

@lsy323 libtpu pin update to 0.0.6: #8480.
Reason: pick stable libtpu for Trillium
Backport PR link: manually pushed to r2.6 since there's no delta between these branches

@yaochengji
Collaborator

Fix a DDP graph capture issue #8489
Reason: The DDP result before this patch is wrong
Backport PR link: #8500

@mcuiaws
Contributor

mcuiaws commented Dec 18, 2024

Original PR: Compute and hash buffer_donor_indices for step marker #8467
Reason: Fixes tensor corruption issue
Backport PR: #8503

@mcuiaws
Contributor

mcuiaws commented Dec 18, 2024

Original PR: xm.save() should not set sync_xla_data=True when sync'ing. #8484
Reason: Fixes tensor corruption issues, easily reproducible by running huggingface tutorials.
Backport PR: #8504

@mcuiaws
Contributor

mcuiaws commented Dec 18, 2024

Original PR: Add xm.xla_device_kind() to return XLA device kind string. #8493
Reason: Requested by Neuron customers. Feature available in JAX but not PyTorch/XLA. Should be very low risk.
Backport PR: #8506

@jeffhataws
Collaborator

Original PR: When modifying IR node, make sure to not lose the read_only bit #8505
Reason: Fixed a bug where 0-dimensional tensors result in aliasing errors
Backport PR: TBD

@mcuiaws
Contributor

mcuiaws commented Dec 20, 2024

Original PR: When modifying IR node, make sure to not lose the read_only bit #8505
Reason: Fixed a bug where 0-dimensional tensors result in aliasing errors
Backport PR: TBD

Backport PR: #8508

@avizon-aws
Collaborator

avizon-aws commented Dec 20, 2024

Cherry-Pick PR for softmax autocast:
Reason: This PR fixes precision issues caused by softmax running in BF16. Softmax was not part of the autocast policy, which led to convergence issues.
Original PR: #8509
Cherry-pick PR: #8511

@savitha-aws
Contributor

Original PR: Add xla autocast support, update autocast APIs in checkpointing #8523
Reason: The PR adds missing XLA autocast support in gradient checkpointing and updates deprecated APIs for CUDA and CPU autocast.
Backport PR: #8527

@rpsilva-aws
Contributor

Original PR: Introduce deterministic hash for user computations #8539
Original issue: #8537
Reason: Fixes a day-one bug with the user computation hash
Backport PR: #8554

@rpsilva-aws
Contributor

Original PR: Metadata agnostic hash for user computations #8550
Original feature: #8538
Reason: Makes the user computation cache key agnostic to OpMetadata in the HLO module proto, since OpMetadata does not influence the execution semantics of the computation.
Backport PR: TBD (dependency on #8554)

@rpsilva-aws
Contributor

Original PR: Metadata agnostic hash for user computations #8550
Original feature: #8538
Reason: Makes the user computation cache key agnostic to OpMetadata in the HLO module proto, since OpMetadata does not influence the execution semantics of the computation.
Backport PR: TBD (dependency on #8554)

Backport PR: #8557
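
For readers unfamiliar with the idea behind the two user computation hash requests above (#8539 and #8550), here is a minimal sketch of a deterministic, metadata-agnostic hash. It is not the PyTorch/XLA implementation: the toy computation is a plain nested dict standing in for an HLO module, and the `metadata` key, `strip_metadata`, and `computation_hash` are hypothetical names chosen only for illustration.

```python
import hashlib
import json

# Hypothetical key name; a real HLO module proto stores OpMetadata differently.
METADATA_KEY = "metadata"


def strip_metadata(node):
    """Return a copy of a nested dict/list structure without metadata entries."""
    if isinstance(node, dict):
        return {k: strip_metadata(v) for k, v in node.items() if k != METADATA_KEY}
    if isinstance(node, list):
        return [strip_metadata(v) for v in node]
    return node


def computation_hash(computation):
    """Hash a toy computation deterministically, ignoring metadata."""
    # Sorted keys and fixed separators give a canonical serialization, so the
    # digest does not depend on dict insertion order or on the process.
    canonical = json.dumps(
        strip_metadata(computation), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same operation, different debug metadata.
a = {"instructions": [{"opcode": "add", "operands": [0, 1],
                       "metadata": {"source_line": 10}}]}
b = {"instructions": [{"opcode": "add", "operands": [0, 1],
                       "metadata": {"source_line": 99}}]}
# Different operation, same metadata as `a`.
c = {"instructions": [{"opcode": "multiply", "operands": [0, 1],
                       "metadata": {"source_line": 10}}]}

same_semantics = computation_hash(a) == computation_hash(b)
different_semantics = computation_hash(a) != computation_hash(c)
```

In this sketch `a` and `b` map to the same cache key because they differ only in metadata, while `c` gets a different key because its opcode differs.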

7 participants