
v1.30.0 - Cloud HPC Toolkit A3 VM + NeMo Framework Solution

@harshthakkar01 released this 18 Mar 21:51
08ae77e

What's Changed

Key New Features 🎉

  • Introduction of the Cloud HPC Toolkit A3 VM family blueprint featuring
    • A Slurm cluster of A3 VMs, each with 8 NVIDIA H100 GPUs
    • An example for running the NVIDIA NeMo framework
    • An example for running the widely used nccl-tests benchmark

Module Improvements 🔨

Improvements 🛠

  • Add a TPU v4 blueprint and tutorial to demonstrate running TPU workloads by @harshthakkar01 in #2287
  • Update parameters for the TPU nodeset module, add precondition checks, and bump TPU to v3 by @harshthakkar01 in #2293
  • Add a Slurm v6 version of the image builder blueprint by @harshthakkar01 in #2297
  • Allow running ghpc deploy directly on a blueprint (ghpc deploy blueprint.yaml) by @mr0re1 in #2323
  • Update the Slurm GCP version; orphaned nodes are now deleted only after a cooldown period by @nick-stroud in #2322
  • Add a SlurmGCP v6 example of a Slurm cluster compatible with startup scripts, plus an integration test by @harshthakkar01 in #2346

Version Updates ⏫

Bug fixes 🐞

  • Add enable_devel to the Packer build to fix an issue with the blueprint by @cdunbar13 in #2334

New Contributors

Full Changelog: v1.29.0...v1.30.0