AWS HPC Blog
How SeatGeek simulates massive load with AWS Batch to prepare for big events
In this post, we explore SeatGeek's load-testing system, which simulates 50,000 simultaneous users. Originally built to prepare SeatGeek for large-event traffic spikes, it now runs weekly to help them harden their code.
Customize Slurm settings with AWS ParallelCluster 3.6
With AWS ParallelCluster 3.6, you can specify Slurm settings directly in the cluster configuration file – improving reproducibility and taking another step toward self-documenting HPC infrastructure.
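As an illustration (not taken from the linked post), a minimal sketch of what custom Slurm settings might look like in a ParallelCluster 3.6 cluster configuration; the queue name, instance types, subnet ID, and parameter values below are hypothetical:

```yaml
# Sketch of a ParallelCluster 3.6 Scheduling section using CustomSlurmSettings.
# All names and values here are illustrative only.
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    CustomSlurmSettings:
      # Cluster-wide entries are written into slurm.conf
      - MaxJobCount: 100000
  SlurmQueues:
    - Name: compute
      CustomSlurmSettings:
        # Queue-level entries are applied to this queue's partition definition
        MaxTime: "24:00:00"
      ComputeResources:
        - Name: c6i
          InstanceType: c6i.32xlarge
          MinCount: 0
          MaxCount: 64
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
```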
Protein Structure Prediction at Scale using AWS Batch
In this post, we discuss how Novo Nordisk approached the deployment of a scale-out HPC platform for running AlphaFold, while meeting their enterprise IT requirements and keeping the user experience simple.
HTC-Grid – examining the operational characteristics of the high throughput compute grid blueprint
The HTC-Grid blueprint addresses the challenges that financial services industry (FSI) organizations face when running high throughput computing on AWS. This post goes into detail on the operational characteristics (latency, throughput, and scalability) of HTC-Grid to help you understand whether this solution meets your needs.
Deploying predictive models and simulations at scale using TwinFlow on AWS
TwinFlow is an open-source framework for building and deploying predictive models using heterogeneous compute pipelines on AWS. In this post, we show the versatility of the framework with examples of engineering design, scenario analysis, systems analysis, and digital twins.
Rigor and flexibility: the benefits of agent-based computational economics
In this post, we describe Agent-Based Computational Economics (ACE), and how extreme scale computing makes it beneficial for policy design.
Streamlining distributed ML workflow orchestration using Covalent with AWS Batch
Complicated multi-step workflows can be challenging to deploy, especially when they span a variety of high-performance compute resources. Covalent is an open-source orchestration tool that streamlines the deployment of distributed workloads on AWS resources. In this post, we outline key concepts in Covalent and develop a machine learning workflow for AWS Batch in just a handful of steps.
Introducing GPU health checks in AWS ParallelCluster 3.6
AWS ParallelCluster 3.6.0 can now detect GPU failures in HPC and AI/ML workloads. Health checks run at the start of Slurm jobs, and if a check fails, the job is requeued on another instance. This can increase reliability and prevent wasted spend.
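For illustration (again, not drawn from the linked post), a minimal sketch of how the queue-level GPU health check might be enabled in a cluster configuration; the queue name, compute resource name, instance type, and subnet ID are hypothetical:

```yaml
# Sketch of enabling the GPU health check on a ParallelCluster 3.6 Slurm queue.
# All names and values here are illustrative only.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: gpu
      HealthChecks:
        Gpu:
          Enabled: true   # run the GPU health check at job start on this queue's nodes
      ComputeResources:
        - Name: p4d
          InstanceType: p4d.24xlarge
          MinCount: 0
          MaxCount: 8
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
```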
Benchmarking the Oxford Nanopore Technologies basecallers on AWS
Oxford Nanopore sequencers enable direct, real-time analysis of long DNA or RNA fragments. They work by monitoring changes to an electrical current as nucleic acids pass through a protein nanopore. The resulting signal is decoded into the specific DNA or RNA sequence by compute-intensive algorithms called basecallers. This blog post presents benchmarking results for two Oxford Nanopore basecallers, Guppy and Dorado, on AWS. The benchmarking project was conducted in collaboration between G42 Healthcare, Oxford Nanopore Technologies, and AWS.
Deploying a Level 3 Digital Twin Virtual Sensor with Ansys on AWS
AWS is developing new tools to enable easier and faster deployment of Level 3 (L3) digital twin virtual sensors. In this post, we show why L3 digital twins are needed for virtual sensors and how to elastically deploy one in the cloud at scale.