AWS HPC Blog

Supporting climate model simulations to accelerate climate science

Through the Amazon Sustainability Data Initiative (ASDI), AWS is donating cloud resources, technical support, and access to scalable infrastructure and fast networking to provide high performance computing solutions supporting simulations of near-term climate with the National Center for Atmospheric Research (NCAR) Community Earth System Model Version 2 (CESM2) and its Whole Atmosphere Community Climate Model (WACCM). In collaboration with ASDI, AWS, and SilverLining, a nonprofit dedicated to ensuring a safe climate, NCAR will run an ensemble of 30 climate-model simulations on AWS. The runs will simulate the Earth system over the years 2022-2070 under a median warming scenario, and the results will be made available through the AWS Open Data Program. The simulation work will demonstrate the ability of cloud infrastructure to advance climate models in support of robust scientific studies by researchers around the world, and aims to accelerate and democratize climate science.

High Burst CPU Compute for Monte Carlo Simulations on AWS

Playtech mathematicians and game designers need accurate, detailed game play simulation results to create fun experiences for players. While software developers have been able to iterate on code in an agile manner for many years, mathematicians working on non-analytical solutions have had to rely on slow, CPU-bound Monte Carlo simulations, waiting (as software engineers once did) many hours or overnight for the results of their latest changes. These statistics are also required as evidence of game fairness in the highly regulated online gaming business. Playtech has developed a serverless solution based on AWS Lambda that provides massive burst compute performance, allowing game simulations to complete in minutes rather than hours. This post goes into the details of the architecture, as well as some examples of using the system in our development and operations.
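To illustrate the general idea, here is a minimal sketch of fanning Monte Carlo batches out across many concurrent AWS Lambda invocations from a client script. The function name, payload fields, and batch sizes are illustrative assumptions, not Playtech's actual implementation; it presumes a worker function that runs a batch of simulated game rounds and returns summary statistics.

```python
# Sketch: burst Monte Carlo fan-out over AWS Lambda (assumed worker function).
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

TOTAL_SPINS = 10_000_000          # total simulated game rounds
SPINS_PER_INVOCATION = 100_000    # rounds handled by each Lambda worker


def invoke_batch(batch_id: int) -> dict:
    """Invoke one worker synchronously and return its summary statistics."""
    response = lambda_client.invoke(
        FunctionName="game-montecarlo-worker",  # hypothetical function name
        InvocationType="RequestResponse",
        Payload=json.dumps({"batch_id": batch_id, "spins": SPINS_PER_INVOCATION}),
    )
    return json.loads(response["Payload"].read())


def main() -> None:
    batches = TOTAL_SPINS // SPINS_PER_INVOCATION
    # Many concurrent invocations provide the burst: Lambda scales out far
    # faster than a fixed CPU farm, then scales back to zero when done.
    with ThreadPoolExecutor(max_workers=100) as pool:
        results = list(pool.map(invoke_batch, range(batches)))

    total_payout = sum(r["total_payout"] for r in results)
    total_bet = sum(r["total_bet"] for r in results)
    print(f"RTP estimate: {total_payout / total_bet:.4%}")


if __name__ == "__main__":
    main()
```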

Stion – a Software as a Service for Cryo-EM data processing on AWS

This post was written by Swapnil Bhatkar, Cloud Engineer, NREL, in collaboration with Edward Eng, Ph.D., and Micah Rapp, Ph.D., both of SEMC/NYSBC, and Evan Bollig, Ph.D., and Aniket Deshpande, both of AWS. Introduction Cryo-electron microscopy (Cryo-EM) technology allows biomedical researchers to image frozen biological molecules, such as proteins, viruses and nucleic acids, and obtain structures of […]

Price-Performance Analysis of Amazon EC2 GPU Instance Types using NVIDIA’s GPU optimized seismic code

Seismic imaging is the process of positioning the Earth’s subsurface reflectors. It transforms the seismic data recorded in time at the Earth’s surface into an image of the Earth’s subsurface, by back-propagating data from time to space in a given velocity model. Kirchhoff depth migration is a well-known technique used in geophysics for seismic imaging. Kirchhoff time and depth migration produce higher-resolution images of the subsurface for a subset class of the data, providing valuable information about the petrophysical properties of the rocks and helping to determine how accurate the velocity model is. This blog post looks at the price-performance characteristics of computing Kirchhoff migration methods on GPUs using NVIDIA’s GPU-optimized code.
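For readers unfamiliar with the method, the sketch below shows the core of Kirchhoff time migration for a zero-offset section with a constant velocity: each image point accumulates amplitudes summed along its diffraction hyperbola. This is a deliberately simplified NumPy illustration of the principle, not NVIDIA's GPU-optimized kernel; the image-point loops are embarrassingly parallel, which is exactly why GPUs pay off here.

```python
# Simplified constant-velocity Kirchhoff time migration (illustrative only).
import numpy as np


def kirchhoff_migrate(data: np.ndarray, dt: float, dx: float, velocity: float) -> np.ndarray:
    """data has shape (n_traces, n_samples): a zero-offset section sampled at dt seconds."""
    n_traces, n_samples = data.shape
    image = np.zeros_like(data)

    for ix in range(n_traces):              # image-point lateral position
        for it in range(n_samples):         # image-point two-way time
            t0 = it * dt
            total = 0.0
            for jx in range(n_traces):      # sum contributions from every trace
                offset = (jx - ix) * dx
                # Diffraction travel time: t = sqrt(t0^2 + (2*offset/v)^2)
                t = np.sqrt(t0 ** 2 + (2.0 * offset / velocity) ** 2)
                js = int(round(t / dt))
                if js < n_samples:
                    total += data[jx, js]
            image[ix, it] = total
    return image
```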

Bare metal performance with the AWS Nitro System

High Performance Computing (HPC) is known as a domain where applications are well-optimized to get the highest performance possible on a platform. Unsurprisingly, a common question when moving a workload to AWS is what performance difference there may be from an existing on-premises “bare metal” platform. This blog post will show that the performance differential between “bare metal” instances and instances using the AWS Nitro hypervisor is negligible for the evaluated HPC workloads.

Pushing pixels with NICE DCV

NICE DCV, our high-performance, low-latency remote-display protocol, was originally created for scientists and engineers who ran large workloads on far-away supercomputers, but needed to visualize data without moving it. Pushing pixels over limited bandwidth across the globe has been the goal of the DCV team since 2007. DCV was able to make very frugal use of very scarce bandwidth because it was super lean, used data-compression techniques, and quickly adopted cutting-edge GPU technologies of the time (this is HPC, after all: we left nothing on the table when it came to exploiting new gadgets). This allowed the team to create a super light-weight visualization package that could stream pixels over almost any network. Fast forward to the 2020s, and a generation of gamers, artists, and film-makers all want to do the same thing as HPC researchers, only this time there are way more pixels, because we now have HD and 4K (and some people have multiple displays), and for most of them, it’s 60 frames per second or it’s not worth having. Today we have around 12x the number of pixels, and around 3x the frame rate, compared to TV of circa 2007. Fortunately, networking improved a lot in that time: a high-end user’s broadband connection grew around 60x in bandwidth, but the 120x growth in computing power really tipped the balance in favor of bringing remote streaming to the masses. Still, physics remains: the latency forced on us by the curvature of the Earth and the speed of light is still a challenge. We still haven’t beaten physics, but we’re making up for it by building our own global fiber network and adding more machinery (including in Local Zones and Wavelength Zones) to get closer to more customers as soon as we can.

Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx

Batch processing is a common need across varied machine learning use cases such as video production, financial modeling, drug discovery, or genomic research. The elasticity of the cloud provides efficient ways to scale and simplify batch processing workloads while cutting costs. In this post, you’ll learn a scalable and cost-effective approach to configuring AWS Batch array jobs to process datasets that are stored on Amazon S3 and presented to compute instances with Amazon FSx for Lustre.
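As a flavor of the pattern, here is a minimal sketch of submitting an AWS Batch array job whose child tasks each process one shard of a dataset exposed through an FSx for Lustre mount. The job queue name, job definition name, and mount path are assumptions for illustration; the full configuration is covered in the post itself.

```python
# Sketch: one array job fans out into many child jobs, each picking its
# data shard from an FSx for Lustre mount using AWS_BATCH_JOB_ARRAY_INDEX.
import boto3

batch = boto3.client("batch")

NUM_SHARDS = 500  # one child job per data shard

response = batch.submit_job(
    jobName="ml-batch-inference",
    jobQueue="fsx-lustre-queue",            # hypothetical job queue
    jobDefinition="ml-inference-job-def",   # hypothetical job definition
    arrayProperties={"size": NUM_SHARDS},   # fan out into NUM_SHARDS child jobs
    containerOverrides={
        "environment": [
            # Each child reads AWS_BATCH_JOB_ARRAY_INDEX (0..NUM_SHARDS-1)
            # at runtime and selects its shard under the FSx mount path.
            {"name": "DATASET_ROOT", "value": "/fsx/dataset"},
        ]
    },
)
print("Submitted array job:", response["jobId"])
```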

In the search for performance, there’s more than one way to build a network

AWS worked backwards from an essential problem in HPC networking (MPI ranks need to exchange lots of data quickly) and found a different solution for our unique circumstances, without trading off the things customers love the most about cloud: that you can run virtually any application, at scale, and right away. Find out more about how Elastic Fabric Adapter (EFA) can help your HPC workloads scale on AWS.
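For context on what using EFA looks like in practice, the sketch below requests an EFA network interface when launching an EFA-capable instance with boto3. The AMI, subnet, security group, and placement group names are placeholder assumptions; tools like AWS ParallelCluster handle this wiring for you.

```python
# Sketch: launching an EFA-enabled instance for tightly coupled MPI work
# (all resource identifiers below are placeholders).
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder HPC AMI
    InstanceType="c5n.18xlarge",                # an EFA-capable instance type
    MinCount=1,
    MaxCount=1,
    Placement={"GroupName": "hpc-cluster-pg"},  # cluster placement group (assumed)
    NetworkInterfaces=[
        {
            "DeviceIndex": 0,
            "InterfaceType": "efa",             # request an Elastic Fabric Adapter
            "SubnetId": "subnet-0abc1234",
            "Groups": ["sg-0abc1234"],
        }
    ],
)
print("Launched:", response["Instances"][0]["InstanceId"])
```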

Getting started with containers in HPC at ISC’21

Containers are rapidly maturing within the high performance computing (HPC) community and we’re excited to be part of the movement: listening to what customers have to say and feeding this back to both the community and our own product and service teams. Containerization has the potential to unblock HPC environments, so AWS ParallelCluster and container-native schedulers like AWS Batch are moving quickly to reflect the best practices developed by the community and our customers. This year is the seventh consecutive year we are hosting the ‘High Performance Container Workshop’ at the ISC High Performance 2021 conference (ISC’21). The workshop will be taking place on July 2nd at 2PM CEST (7AM CST). The full program for the workshop is available on the High Performance Container Workshop page at https://hpcw.github.io/

Building highly-available HPC infrastructure on AWS

In this blog post, we will explain how to launch highly available HPC clusters across an AWS Region. The solution is deployed using the AWS Cloud Development Kit (AWS CDK), a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation, hiding the complexity of integration between the components.
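As a taste of the approach, here is a minimal AWS CDK (Python, v2) sketch of the networking layer such a solution might start from: a VPC whose subnets span multiple Availability Zones in the Region. The construct IDs and AZ count are assumptions for illustration; the full solution in the post also wires in the cluster and storage components.

```python
# Sketch: multi-AZ VPC as the foundation for a highly available HPC cluster.
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from constructs import Construct


class HpcNetworkStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Spreading subnets across AZs is what gives the HPC cluster
        # resilience to a single-AZ disruption.
        self.vpc = ec2.Vpc(
            self,
            "HpcVpc",
            max_azs=3,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="hpc-private",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    name="hpc-public",
                    subnet_type=ec2.SubnetType.PUBLIC,
                    cidr_mask=24,
                ),
            ],
        )


app = App()
HpcNetworkStack(app, "HpcNetworkStack")
app.synth()
```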