AWS HPC Blog

Category: Storage

Jumpstart your HPC journey with the AWS Parallel Computing Service getting started kit

If designing next-gen vehicles or chasing cures is part of your job, don’t miss this. We just launched a major new HPC service to help you scale complex workloads rapidly. Get the details on how AWS Parallel Computing Service can speed up your R&D.

Strategies for distributing executable binaries across grids in financial services

You can boost the performance of your compute grids by strategically distributing your binaries. Our experts evaluated a range of strategies for fast, efficient compute grid operations – to save you some work.

Building a 4x faster and more scalable algorithm using AWS Batch for Amazon Logistics

In this post, AWS Professional Services highlights how they helped data scientists from Amazon Logistics rearchitect an algorithm that improves supply-chain efficiency by making better planning decisions. Leveraging best practices for deploying scalable HPC applications on AWS, the teams saw a 4x improvement in run time.

Expanded filesystems support in AWS ParallelCluster 3.2

AWS ParallelCluster version 3.2 introduces support for two new Amazon FSx filesystem types (NetApp ONTAP and OpenZFS). It also lifts the limit on the number of filesystem mounts you can have on your cluster. We’ll show you how, and give you the details to get going right away.
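To give a flavor of what this looks like, here is a rough sketch of `SharedStorage` entries in a ParallelCluster 3.2 cluster configuration that mount existing FSx for NetApp ONTAP and FSx for OpenZFS volumes. The mount directories and volume IDs below are illustrative placeholders, not values from the post:

```yaml
# Sketch: mounting existing FSx for NetApp ONTAP and FSx for OpenZFS
# volumes in a ParallelCluster 3.2 cluster config (fragment only).
SharedStorage:
  - MountDir: /ontap              # illustrative mount point
    Name: existing-ontap
    StorageType: FsxOntap
    FsxOntapSettings:
      VolumeId: fsvol-0123456789abcdef0   # placeholder: existing ONTAP volume ID
  - MountDir: /zfs                # illustrative mount point
    Name: existing-openzfs
    StorageType: FsxOpenZfs
    FsxOpenZfsSettings:
      VolumeId: fsvol-0123456789abcdef1   # placeholder: existing OpenZFS volume ID
```

Because 3.2 lifts the single-mount limit, you can list multiple entries like these alongside your existing EFS or FSx for Lustre mounts.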

Using Spot Instances with AWS ParallelCluster and Amazon FSx for Lustre

Processing large amounts of complex data often requires leveraging a mix of different Amazon EC2 instance types. These computations also benefit from shared, high-performance, scalable storage like Amazon FSx for Lustre. One way to reduce the cost of your analysis is to use Amazon EC2 Spot Instances, which can cut EC2 costs by up to 90% compared to On-Demand Instance pricing. This post will guide you through creating a fault-tolerant cluster using AWS ParallelCluster. We will explain how to configure ParallelCluster to automatically unmount the Amazon FSx for Lustre filesystem and resubmit interrupted jobs back into the queue when Spot interruptions occur.
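As a rough sketch of the starting point for a setup like this, the fragment below shows a ParallelCluster 3 configuration with a Spot-backed Slurm queue and an FSx for Lustre mount. The queue, resource, and subnet names are illustrative assumptions, not values from the post, and the interruption-handling scripts the post describes would be layered on top of this:

```yaml
# Sketch: ParallelCluster 3 config fragment with Spot compute and FSx for Lustre.
# Subnet, key, and sizing values are placeholders.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: spot                  # illustrative queue name
      CapacityType: SPOT          # compute nodes launch as Spot Instances
      ComputeResources:
        - Name: c5-4xl
          InstanceType: c5.4xlarge
          MinCount: 0             # scale to zero when idle
          MaxCount: 16
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0   # placeholder
SharedStorage:
  - MountDir: /fsx
    Name: fsx-scratch
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200       # GiB
      DeploymentType: SCRATCH_2
```

Jobs submitted to the `spot` queue then run on interruptible capacity, which is what makes the unmount-and-resubmit handling described above necessary.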

Scaling a read-intensive, low-latency file system to 10M+ IOPs

Many shared file systems support read-intensive applications, like financial backtesting. These applications typically work against copies of datasets whose authoritative source resides somewhere else. For small datasets, in-memory databases and caching techniques can yield impressive results. However, low-latency, flash-based, scalable shared file systems can provide both massive IOPS and bandwidth. They’re also easy to adopt because they expose a file-level abstraction. In this post, I’ll share how to easily create and scale a shared, distributed, POSIX-compatible file system that performs at local NVMe speeds for files opened read-only.