Fred Hutch Microbiome Researchers Use AWS to Perform Seven Years of Compute Time in Seven Days
2019
At Seattle’s Fred Hutch Microbiome Research Initiative (MRI), a team of researchers analyzes the microbiome, the collection of microbes on and inside the human body. But these researchers aren’t just studying the microbiome: they’re striving to manipulate microbiomes to make therapeutic cancer drugs more effective.
To support their efforts, researchers must analyze and process an immense number of whole genome datasets. “There are hundreds of thousands of microbes in each person’s microbiome, and they’re different for each individual,” says Sam Minot, PhD and staff scientist at Fred Hutch MRI. “Translating gigabytes of raw microbiome genomic data into insights about which specific microbes are present in a person is a computationally intensive task requiring highly scalable technology.”
“Our goal is to accelerate our research processes on AWS so we can get closer to developing therapeutics to fight cancer.”
Sam Minot, PhD and Staff Scientist, Fred Hutch Microbiome Research Initiative
Running Scientific Research Workloads on AWS
Dr. Minot chose Amazon Web Services (AWS) to power the high-performance computing (HPC) platform that runs microbiome analysis. He says, “Additional research groups within the organization had been using AWS, and we saw how they benefited from the scalability of the cloud.”
The researchers run their computational analyses with the Nextflow workflow framework, which orchestrates AWS Batch jobs and scales the HPC platform to shorten processing time. “AWS Batch integrates well with Nextflow, so it was easy for us to get Nextflow up and running without having to reinvent the wheel,” says Dr. Minot. The organization runs its HPC workloads on Amazon Elastic Compute Cloud (Amazon EC2) instances, powered by Intel Xeon Platinum 8000 Series processors, and it stores research data in Amazon Simple Storage Service (Amazon S3) buckets.
Recently, Dr. Minot’s group began using Amazon EC2 Spot Instances to access compute resources for its HPC environment. Amazon EC2 Spot Instances are unused Amazon EC2 capacity that is available at up to a 90 percent discount compared to On-Demand Instance prices. He says, “We can use the same budget to access more compute resources using Amazon EC2 Spot Instances.”
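The budget arithmetic behind that statement can be sketched in a few lines of Python. This is only an illustration: the function name compute_multiplier is hypothetical, the 50 and 70 percent figures are made-up comparison points, and 90 percent is the maximum discount cited above, not a rate the group is known to have paid. Actual Spot prices vary by instance type, Region, and time.

```python
def compute_multiplier(discount: float) -> float:
    """Return how many times more compute a fixed budget buys at a given discount.

    `discount` is the fraction taken off the On-Demand price (0.9 = 90 percent).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    # Paying (1 - discount) of the On-Demand price per instance-hour means the
    # same budget covers 1 / (1 - discount) times as many instance-hours.
    return 1 / (1 - discount)


if __name__ == "__main__":
    # 0.9 is the "up to" figure from the text; the others are illustrative.
    for discount in (0.5, 0.7, 0.9):
        print(f"{discount:.0%} discount -> {compute_multiplier(discount):.1f}x compute")
```

At the maximum quoted discount, a fixed budget would stretch to roughly ten times the instance-hours it buys at On-Demand prices. Smaller discounts still help, but the gain falls off quickly.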
Seven Years of Compute Time in Seven Days
Using AWS, Dr. Minot’s group has the scalability to analyze publicly available datasets that contain data on more than 15,000 biological samples, each representing a gigabyte of storage. As a result, the group has performed seven years of aggregate compute time in seven days, so researchers get results faster and can ultimately speed research aimed at finding therapeutics for cancer treatment. “Running our microbiome research on Amazon EC2 Spot Instances, we spend less money and less time to get scientific answers from the analysis,” says Dr. Minot. “Our goal is to accelerate our research processes on AWS so we can get closer to developing therapeutics to fight cancer.”
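To put the headline figure in perspective, the short Python sketch below estimates the average parallelism implied by seven years of aggregate compute finishing in seven days. It is a back-of-the-envelope calculation, not a description of the group's actual cluster: the function name average_concurrency is hypothetical, and the source does not say whether "compute time" is measured in cores, vCPUs, or whole instances, so the result is in whatever unit the seven years were counted in.

```python
DAYS_PER_YEAR = 365.25  # average calendar year, including leap years


def average_concurrency(compute_years: float, wall_clock_days: float) -> float:
    """Return the average number of compute units busy at once.

    Aggregate compute time divided by elapsed time gives the mean number of
    units (cores or instances, depending on how compute time was counted)
    that must have been running in parallel.
    """
    if wall_clock_days <= 0:
        raise ValueError("wall_clock_days must be positive")
    return compute_years * DAYS_PER_YEAR / wall_clock_days


if __name__ == "__main__":
    # Figures taken from the text: seven years of compute in seven days.
    units = average_concurrency(compute_years=7, wall_clock_days=7)
    print(f"About {units:.0f} compute units busy on average")
```

In other words, the work of one compute unit running for a year was done, on average, every day of the week-long run.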
“Zooming In” on Genes
Dr. Minot is using the AWS-based research computing environment to increase the resolution of analysis for large collections of microbiome samples. “We can use the scale of AWS to zoom in from the species level to the genes present inside those species, which requires an extremely high level of computational detail,” he says. By increasing analytical resolution, researchers can more easily find links to health outcomes. “As an example, we can perform global analysis on AWS across thousands of published datasets and study groups of people with inflammatory bowel disease. At the same resolution, we can also analyze microbiomes in people with colorectal cancer, and then identify how inflammation may relate to the development of cancer. Using the scale and resolution we get on AWS, we can make better comparisons across different disease states, many of which interact with the microbiome.”
Additionally, Dr. Minot can share his methods with other scientific researchers, who can conduct their own research using these same methods to further extend scientific discovery. “The simplicity of integrating Amazon EC2 Spot Instances with AWS Batch and Nextflow to scale gives us an element of reproducibility,” says Vijay Sureshkumar, director of technology partnerships at Fred Hutch MRI. “We can extend this model to other research labs within Fred Hutch or at other institutions. This can lead to more research collaboration, so we can work together to find potential cures for cancer.”
To learn more, visit thinkwithwp.com/hpc.
About Fred Hutch Microbiome Research Initiative
The Fred Hutch Microbiome Research Initiative, funded by Seattle’s Fred Hutchinson Cancer Research Center, includes microbiome investigators with expertise in study design, laboratory methods, animal models, human intervention studies, data analysis, and visualization. These researchers are working to predict health outcomes, understand the pathogenesis of disease, and manipulate the microbiota to promote health.
Benefits of AWS
- Processes data from more than 15,000 biological samples
- Performs 7 years of aggregate compute time in 7 days, so researchers can get results faster
- Increases resolution on microbiome samples to find links to improve health outcomes
- Enables collaboration with other scientific researchers
AWS Services Used
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
AWS Batch
AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.
Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.