Compute | AWS HPC Blog

Optimizing your AWS Batch architecture for scale with observability dashboards

AWS Batch customers often ask for guidance on optimizing their architectures so that their workloads scale rapidly. Here we describe an observability solution that provides insights into your AWS Batch architectures, so you can optimize them for scale and quickly identify potential throughput bottlenecks for jobs and instances.

DCV in 2022: a year in review

In this post we recap the most significant features released in DCV during 2022 that delighted our customers. Of course, we’re still not done, so expect more in 2023.

AWS ParallelCluster 3.3.0 now supports On-Demand Capacity Reservations

With AWS ParallelCluster 3.3, you can now easily take advantage of EC2 On-Demand Capacity Reservations to help ensure your jobs have the capacity they need when they need it. This post describes the new feature and how you can benefit from it.

Building deep learning models for geoscience using MATLAB and NVIDIA GPUs on Amazon EC2 (Part 2 of 2)

This is the second of a two-part post. Part 1 discussed the workflow for developing AI models using MATLAB for seismic interpretation. Today, we will discuss the various compute resources leveraged from AWS and NVIDIA for developing the models.

Building deep learning models for geoscience using MATLAB and NVIDIA GPUs on Amazon EC2 (Part 1 of 2)

In this blog post, we discuss how geoscientists can use shallow RNN-based algorithms with MATLAB to automatically recognize distinct geologic features in seismic images. We discuss the workflow for developing the AI models using MATLAB for seismic interpretation. A second post will introduce the various compute resources leveraged from AWS and NVIDIA for developing the models.

Second generation EFA: improving HPC and ML application performance in the cloud

Since launch, EFA has seen continuous improvements in performance. In this post, we talk about our 2nd generation of EFA, which takes another step in improving Machine Learning and High Performance Computing in the Cloud.

Launch self-supervised training jobs in the cloud with AWS ParallelCluster

In this post we describe the process to launch large, self-supervised training jobs using AWS ParallelCluster and Facebook’s Vision Self-Supervised Learning (VISSL) library.

Avoid overspending with AWS Batch using a serverless cost guardian monitoring architecture

Pay-as-you-go resources are compelling, but budget-limited researchers running HPC workloads need help working within the bounds of their grants. In this post, we show how to build a real-time cost guardian for AWS Batch to help enforce those limits.

Support for Instance Allocation Flexibility in AWS ParallelCluster 3.3

AWS ParallelCluster 3.3.0 now lets you define a list of Amazon EC2 instance types for resourcing a compute queue. This gives you more flexibility to optimize the cost and total time to solution of your HPC jobs, especially when capacity is limited or you’re using Spot Instances.

How AWS Batch developed support for Amazon Elastic Kubernetes Service

Today, we discuss AWS Batch on Amazon EKS: the initial motivation, the design choices the team made when we developed the service, and some of the challenges we had to overcome.
