AWS HPC Blog

Tag: ML

AWS Batch enables near-real-time energy production forecasts using NVIDIA Earth-2

Using AWS Batch and NVIDIA Earth-2, we built a scalable workflow that explores millions of scenarios at a fraction of the cost of traditional methods. This approach provides rapid energy calculations and shows the potential of AI-driven meteorology.

Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances

Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.