AWS HPC Blog
Run protein folding on AWS with Quantori
This post was contributed by Marissa Powers, Konstantinos Tzouvanas, and Pavlos Kaimakis from AWS, and Mikhail Serkov, Senior Director, HPC Services at Quantori
Proteins are large biomolecules and the target for the majority of pharmaceutical drugs on the market today. Researchers are increasingly using machine-learning models to predict the three-dimensional structure and properties of proteins. With the proliferation of these models, scientists are looking for easy-to-use interfaces for testing with the most efficient price and performance possible.
Quantori is a scientific informatics data science company, and they’ve been working closely with biopharmaceutical scientists to build a solution for running generative AI protein models on AWS. Their solution allows users to easily run their protein engineering analyses and visualizations in their own AWS accounts.
In this post, we’ll show the user interface for this solution, the architecture, and some example price and performance benchmarks so you can get a feel for how this might work for you.
User interface
One of the key features of the Quantori solution is the user interface, which makes it easier for users to predict protein structures. Let’s walk through some key components of the solution and the user experience – including the login interface, running an analysis, and viewing results.
Login
The application uses username and password authentication by default and stores the credentials in the solution’s database. To sign up, you enter an email and manually create a password. Integration with existing identity managers is possible for production use.
Running an analysis
Once logged in, you will first see a list of proteins registered in the system (Figure 2). The list includes the number of prior runs and user-selected tags for each protein. You can easily filter the list by tag to show only protein runs associated with specific projects.
To add a new protein, you can explicitly type the amino acid sequence or upload a FASTA file (a text-based format for storing DNA and protein sequences).
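For readers unfamiliar with the format, here is a minimal sketch of how a FASTA file can be read and sanity-checked before a sequence is registered. This is not Quantori's implementation: the function name `parse_fasta`, the accepted residue alphabet, and the example record are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional

# Assumption: accept the 20 standard amino acids plus X (unknown residue).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the id is the first token after '>'.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            if current_id in records:
                raise ValueError(f"duplicate record id: {current_id!r}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line.upper()
    for record_id, sequence in records.items():
        unexpected = set(sequence) - VALID_RESIDUES
        if unexpected:
            raise ValueError(
                f"{record_id}: unexpected characters {sorted(unexpected)}"
            )
    return records


# Hypothetical two-line record, used only to exercise the parser.
example = """>demo_protein hypothetical example
MKTAYIAKQR
QISFVKSHFS
""".splitlines()

sequences = parse_fasta(example)
print(sequences)
```

Validating the alphabet up front is a design choice: it catches a DNA file or a stray character before any compute is spent on a prediction run.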
Selecting a specific protein shows a summary of the execution results, including whether the runs were successful and the method used for prediction (e.g. AlphaFold2), as shown in Figure 3.
To run an analysis, select Add run from the protein landing page (also shown in Figure 3). This opens a page where you can select the machine-learning (ML) algorithm to run, the instance type from Amazon Elastic Compute Cloud (Amazon EC2) for each stage, and the database configuration (Figure 4).
Once you select your preferred settings, you just hit Run Folding! to begin the analysis.
View results
Once the analysis is complete, you can visualize your results (which we’ve shown in Figure 5). The interface includes confidence score metrics for the models run, along with the three-dimensional structure of the protein.
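The post does not say which confidence metrics the interface reports. As one illustration, AlphaFold2 writes its per-residue confidence score (pLDDT, on a 0 to 100 scale) into the B-factor column of the PDB files it produces, so a summary can be computed directly from a result file. The sketch below assumes that convention; the helper names and the three-atom sample structure are hypothetical.

```python
from statistics import mean
from typing import List


def plddt_per_residue(pdb_text: str) -> List[float]:
    """Read one pLDDT value per residue from the C-alpha ATOM records."""
    scores: List[float] = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16,
        # B-factor (pLDDT in AlphaFold2 output) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def atom_line(serial: int, name: str, res: str, chain: str, resseq: int,
              x: float, y: float, z: float, bfactor: float) -> str:
    """Build a fixed-width ATOM record (helper for the demo data only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


# Hypothetical sample: two residues, with one non-CA atom that is skipped.
sample_pdb = "\n".join([
    atom_line(1, " N", "MET", "A", 1, 0.0, 0.0, 0.0, 90.0),
    atom_line(2, " CA", "MET", "A", 1, 1.5, 0.0, 0.0, 91.5),
    atom_line(3, " CA", "LYS", "A", 2, 3.0, 1.2, 0.0, 62.25),
])

scores = plddt_per_residue(sample_pdb)
mean_plddt = mean(scores)
# Residues below 70 are conventionally treated as low confidence.
low_confidence = [i + 1 for i, s in enumerate(scores) if s < 70]
print(mean_plddt, low_confidence)
```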
Architecture
The Quantori solution uses Amazon EC2 instances to host the user interface and AWS Batch to run protein folding jobs. We’ve shown an overview of the architecture in Figure 6.
The solution executes the user interface and AWS Batch jobs in separate Amazon Virtual Private Clouds (Amazon VPCs). The solution reserves one VPC for the user frontend, and a second for the backend.
Based on the selected configuration during a prediction run, AWS Batch downloads containers from Amazon Elastic Container Registry (ECR) and deploys them to appropriate instances in the backend VPC.
Instances in the backend VPC share an Amazon FSx for Lustre file system, which can scale to hundreds of GB/s of throughput. The container images the solution uses reside in Amazon ECR and are built by an AWS CodePipeline pipeline.
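To make the backend flow described above more concrete, here is a rough sketch of what a job submission to AWS Batch could look like for one stage of a prediction run. The request fields (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`) are parameters of the AWS Batch `SubmitJob` API, but everything else is an assumption: the queue and job definition names, the `INPUT_FASTA` environment variable, and the `/fsx` mount path are hypothetical and are not taken from the Quantori solution.

```python
from typing import Any, Dict


def build_batch_job_request(
    protein_id: str,
    stage: str,
    job_queue: str,
    job_definition: str,
    fasta_path: str,
) -> Dict[str, Any]:
    """Build keyword arguments for the AWS Batch SubmitJob API call."""
    return {
        # Job names may contain letters, numbers, hyphens, and underscores.
        "jobName": f"{stage}-{protein_id}",
        "jobQueue": job_queue,            # hypothetical queue name
        "jobDefinition": job_definition,  # hypothetical job definition
        "containerOverrides": {
            "environment": [
                # Hypothetical variable telling the container where to
                # find its input on the shared file system.
                {"name": "INPUT_FASTA", "value": fasta_path},
            ]
        },
    }


request = build_batch_job_request(
    protein_id="7FCC",
    stage="alphafold2",
    job_queue="gpu-queue-example",
    job_definition="alphafold2-example",
    fasta_path="/fsx/inputs/7FCC.fasta",
)
print(request)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   response = boto3.client("batch").submit_job(**request)
#   print(response["jobId"])
```

Building the request separately from the API call keeps the sketch runnable without an AWS account. In a setup like the one described, each stage (for example, jackhmmer on CPU instances and AlphaFold2 on GPU instances) would typically target a different job queue.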
Benchmarking
Let’s look at some results of testing Quantori performed across a range of proteins, instance types, and instance sizes.
The goal of these runs was to: (1) show expected performance; (2) demonstrate exemplar testing that you can easily run within the solution; and (3) provide guidance on best instance types for jackhmmer (a tool used to identify the evolutionary relationships and common patterns between genes) and AlphaFold2 (an AI system that can predict 3D structures of proteins from amino acid sequences with atomic-level accuracy).
Figure 7 shows the scalability across a range of protein lengths. With the Quantori solution, you can easily run models across multiple proteins.
Quantori chose a single protein (PDB ID 7FCC) for their next set of tests and ran this on multiple instance types and sizes. Figure 8 shows jackhmmer runtime across two Amazon EC2 instance families (c6g and m7g). The m7g.4xlarge provided faster runtimes due to higher memory compared to c6g.4xlarge, at a similar cost.
Finally, Quantori compared AlphaFold2 runtime across three sizes of g4dn instances. As shown in Figure 9, while the g4dn.4xlarge provides slightly better performance, the run costs more than twice as much as on the smaller g4dn.xlarge. This indicates that AlphaFold2 does not scale linearly when we add vCPUs without changing the number of GPUs.
For this specific dataset (UniRef90 with the 7FCC protein), the m7g.4xlarge instance type for jackhmmer and the g4dn.xlarge for AlphaFold2 provide the best price-performance of the instances Quantori tested. You can easily configure multiple architectures (e.g. x86, aarch64, or GPUs) for a single workflow and compare multiple instance types, and sizes, with the Quantori solution.
While these specific instances provide a good starting point, Quantori recommends using the solution to test and optimize for your specific workflow and datasets.
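The price-performance comparison in this section comes down to simple arithmetic: the cost of a run is its runtime multiplied by the hourly price of the instance. The sketch below shows that calculation for your own measurements. The instance names, runtimes, and prices are placeholders chosen for illustration; they are not Quantori's results and not real AWS prices.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    instance_type: str
    runtime_hours: float   # measured wall-clock time for the run
    price_per_hour: float  # on-demand price in your Region

    @property
    def cost(self) -> float:
        """Cost of a single run: runtime multiplied by hourly price."""
        return self.runtime_hours * self.price_per_hour


# Placeholder values only; substitute your own measurements and prices.
runs: List[Run] = [
    Run("small-placeholder", runtime_hours=2.0, price_per_hour=1.0),
    Run("large-placeholder", runtime_hours=1.8, price_per_hour=2.5),
]

baseline = runs[0]
for run in runs:
    speedup = baseline.runtime_hours / run.runtime_hours
    cost_ratio = run.cost / baseline.cost
    print(f"{run.instance_type}: cost={run.cost:.2f}, "
          f"speedup={speedup:.2f}x, cost ratio={cost_ratio:.2f}x")

cheapest = min(runs, key=lambda r: r.cost)
print("Lowest cost per run:", cheapest.instance_type)
```

With these placeholder inputs, the larger instance finishes slightly sooner but costs more per run. This is the same pattern the AlphaFold2 comparison describes: a small speedup that does not offset a higher hourly price.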
Conclusion
Scientists are using protein engineering models to accelerate drug discovery research.
The Quantori solution allows you to easily submit jobs with different input datasets and parameters. You can monitor jobs and visualize the results, all from a single easy-to-use interface. This solution allows scientists to focus more on discovery and less on maintaining software stacks and environments. And that’s an important way the cloud can contribute immediately to medical science.
For more information, contact Quantori at contact@quantori.com.