AWS HPC Blog

Getting Started with NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation

In this blog post, we’ll show how you can run NVIDIA Parabricks on AWS Batch leveraging AWS CloudFormation templates. Parabricks is a GPU-accelerated tool for secondary genomic analysis. It reduces the runtime of variant calling on a 30x human genome from 30 hours to just 30 minutes, and leverages AWS Batch to provide an interface that scales compute jobs across multiple instances in the cloud.

Understanding the AWS Batch termination process

In this blog post, we explain the AWS Batch job termination process and show how you can terminate a job gracefully by capturing the SIGTERM signal inside your application, which gives your Batch jobs an efficient way to exit. You will also learn how job timeouts occur, and how the retry operation works with both traditional AWS Batch jobs and array jobs.
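As a rough illustration of what capturing SIGTERM inside an application can look like, here is a minimal Python sketch. It is not taken from the post: the names `run_job`, `process`, and `save_progress` are hypothetical, and it assumes the job's work can be split into small units so the loop can stop between them once the signal arrives.

```python
import signal
import threading

# Set when SIGTERM arrives; the work loop checks it between units of work.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the termination request instead of exiting immediately."""
    stop_requested.set()


def run_job(work_items, process, save_progress):
    """Process items until finished or SIGTERM is seen.

    `process` and `save_progress` are hypothetical callables supplied by the
    application. Returns the number of items completed.
    """
    # Register the handler (must be called from the main thread).
    signal.signal(signal.SIGTERM, handle_sigterm)
    done = 0
    for item in work_items:
        if stop_requested.is_set():
            break  # stop cleanly between units of work
        process(item)
        done += 1
    # Persist progress so a retried job could resume from here.
    save_progress(done)
    return done
```

The handler only sets a flag, so cleanup runs in the normal control flow rather than inside the signal handler, which keeps the shutdown path easy to reason about.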

Bayesian ML Models at Scale with AWS Batch

Ampersand is a data-driven TV advertising technology company that provides aggregated TV audience impression insights and planning on 42 million households, in every media market, across more than 165 networks and apps and in all dayparts (broadcast day segments). The Ampersand Data Science team estimated that building their statistical models would require up to 600,000 physical CPU hours to run, which would not be feasible without using a massively parallel and large-scale architecture in the cloud. AWS Batch enabled Ampersand to compress their time of computation over 500x through massive scaling while optimizing their costs using Amazon EC2 Spot. In this blog post, we will provide an overview of how Ampersand built their TV audience impressions (“impressions”) models at scale on AWS, review the architecture they have been using, and discuss optimizations they conducted to run their workload efficiently on AWS Batch.

Running cost-effective GROMACS simulations using Amazon EC2 Spot Instances with AWS ParallelCluster

In this blog post, we cover how to run GROMACS, a popular open source software package designed for simulations of proteins, lipids, and nucleic acids, cost-effectively by leveraging EC2 Spot Instances within AWS ParallelCluster. We also show how to checkpoint GROMACS to recover gracefully from possible Spot Instance interruptions.

Introducing the Spack Rolling Binary Cache hosted on AWS

Today we’re excited to announce the availability of a new public Spack Binary Cache. Through a collaboration between AWS, E4S, Kitware, and the Lawrence Livermore National Laboratory (LLNL), Spack users now have access to a public build cache hosted on Amazon S3. Using this Binary Cache will result in up to 20x faster install times for common Spack packages.

Benchmarking NVIDIA Clara Parabricks Somatic Variant Calling Pipeline on AWS

Somatic variants are genetic alterations which are not inherited but acquired during one’s lifespan, for example those that are present in cancer tumors. In this post, we will demonstrate how to perform somatic variant calling from matched tumor and normal genome sequence data, as well as tumor-only whole genome and whole exome datasets using an NVIDIA GPU-accelerated Parabricks pipeline, and compare the results with baseline CPU-based workflows.

AI-based drug discovery with Atomwise and WEKA Data Platform

Drug discovery is an expensive proposition, with a $2.6 billion cost over 10 years and just a 12% success rate. AI promises to significantly improve the success rate by finding small molecule hits for undruggable targets. At the forefront of using AI in drug discovery is Atomwise, with its AtomNet® platform. In this blog post, we will lay out the challenges of the drug discovery process, and show how AI/ML startups are solving these challenges using solutions from Atomwise, AWS, and WEKA.

Figure 1: Comparison of simulation performance for the Le Mans test case run with Open MPI and Intel MPI. Intel MPI offers better performance compared to Open MPI.

Simcenter STAR-CCM+ price-performance on AWS

Organizations such as Amazon Prime Air and Joby Aviation use Simcenter STAR-CCM+ to run CFD simulations on AWS so they can reduce product manufacturing cycles and achieve faster times to market. In this post, we describe the performance and price analysis of running Computational Fluid Dynamics (CFD) simulations using Siemens Simcenter™ STAR-CCM+™ software on AWS HPC clusters.