AWS HPC Blog
Tag: Machine Learning
Deploying generative AI applications with NVIDIA NIMs on Amazon EKS
Learn how to deploy AI models at scale on AWS using NVIDIA NIM and Amazon EKS! This step-by-step guide shows you how to create a GPU cluster for inference. Don’t miss part 1 of this 2-part blog series!
Implementing e-mail and SMS notifications in AWS ParallelCluster with Slurm
Learn how to configure email and SMS alerts for job events to stay on top of your HPC workloads with AWS ParallelCluster using Slurm.
Gang scheduling pods on Amazon EKS using AWS Batch multi-node processing jobs
AWS Batch multi-node parallel jobs can now run on Amazon EKS, providing gang scheduling of pods across nodes for large-scale distributed computing such as ML model training.
Improve HPC workloads on AWS for environmental sustainability
Need to cut your carbon footprint without sacrificing productivity? Migrating HPC workloads to the cloud allowed Baker Hughes to reduce emissions by 99%! Get tips for optimizing compute, storage, and networking to reduce the footprint of your own workloads.
Simulating autonomous mining operations using Robotec.ai on AWS
Big changes are underway in mining. See how the Boliden Group uses AWS Batch to simulate fleets of autonomous trucks for safety and efficiency.
Call for participation: HPC tutorial series from the HPCIC
Interested in getting hands-on experience with cutting-edge HPC tools? Check out this blog post on an upcoming virtual training series from LLNL and AWS. Learn emerging technologies from the experts this August.
Securing HPC on AWS: implementing STIGs in AWS ParallelCluster
Want to accelerate creating compliant Amazon EC2 images? Learn how HPC users can leverage cloud-native methods for applying STIG security standards.
Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances
Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.
Building an AI simulation assistant with agentic workflows
Simulations provide critical insights, but running them takes specialized people, which can slow everyone down. We show how a Simulation Assistant can use LLMs and agents to start these workflows via chat so you can get results sooner.
Using machine learning to drive faster automotive design cycles
Aerospace and automotive companies are speeding up their product design using AI. In this post, we’ll discuss how they’re using surrogate models built with machine learning to shift design cycles from hours to seconds.