AWS for Industries
Executive Conversations: Evolving R&D with Siping “Spin” Wang, President and CTO of TetraScience
Groundbreaking new therapies and scientific advances in technology are driving an exponential rate of innovation in the pharmaceutical industry. With the explosion of big data and the complexity that comes with it, leaders in life sciences research and development (R&D) are embracing digital transformation to speed up discovery and maximize their investments.
Siping “Spin” Wang, President and CTO of TetraScience, joins Jared Saul, Head of Healthcare and Life Sciences Startup Business at AWS, for a discussion on industry trends shaping the R&D process at pharmaceutical and biotech companies, the complex data lifecycle inherent to R&D, and a vision for the lab of the future. TetraScience provides an open, data-centric, cloud-native platform that’s purpose-built for life sciences R&D: the Tetra R&D Data Cloud.
This Executive Conversation is one of a series of discussions with leaders advancing disruptive technologies in their industries, in which we seek to learn more about their discoveries, ingenuity, and contributions to healthcare and life sciences.
—
Jared Saul: What major industry trends are shaping the pharma R&D ecosystem today?
Siping Wang: There are four overarching trends in pharma that have led organizations to rethink their R&D. First and foremost, pharma companies are increasingly treating experimental data as core intellectual property (IP). These experimental data include design parameters, metadata, raw data, analysis results, and many other previously ignored datasets. Next is the automation of data flows. Robots have automated work at the physical level, and now digital tools are enabling the automation of data-related tasks. The third trend is aggressive re-platforming to the cloud for more agility, elasticity, and access to cloud services. And finally, pharma companies are using artificial intelligence (AI) and machine learning (ML) to gain a predictive understanding of their experiments at scale, ultimately helping them reduce the number of experiments they need to run.
JS: What hurdles stand in the way of transforming life science R&D?
SW: Life science R&D is uniquely complex compared to other industries. A typical workflow can include more than twenty different steps before reaching clinical trials, such as hypothesis generation, disease pathophysiology, assay development, hits and validation, safety screening, and cGMP manufacturing, to name a few. Each step also involves a multitude of sub-steps and workflows. As you can see, the lift associated with R&D is incredibly multidimensional, requiring considerable resources and time. The complexity is compounded when multiple departments and teams contribute throughout the drug discovery process, in addition to contract research organizations (CROs) and contract development and manufacturing organizations (CDMOs). In terms of tooling and data, things get even more disjointed. Different analysis jobs require different tools that produce different results, while most instrument manufacturers and informatics vendors lock pharma companies into vendor-specific, proprietary data formats. The result is a fragmented web of point-to-point, arduously manual, and brittle connections across siloed data systems. This leads to a highly inconsistent architecture that is expensive to maintain, where every change is like a heart transplant. The situation is further cemented by compliance requirements, which encourage a general attitude of leaving a process alone if it isn’t broken.
JS: How can pharma companies foster R&D innovation?
SW: With so many heterogeneous and incompatible data silos, pharma companies face significant bottlenecks in keeping experimental and R&D data in an accessible, centralized repository. Additionally, they need to ensure the data is consistent, harmonized, labeled, and ready for use by any downstream application. When this challenge is solved, innovation will thrive: analytics tools will be able to freely access R&D data that has been unlocked from vendor-proprietary formats and is readily available for advanced data science. At TetraScience, we’ve invested heavily in this direction and are already working with 12 of the top 30 pharma companies to achieve it.
JS: How does TetraScience enable innovation in pharma R&D?
SW: Our success is based on three pillars: data-centric solutions, a cloud-native approach, and a focus on R&D. In terms of being data-centric, it is fundamentally imperative to treat data as a core asset and to provide the tooling and stewardship pharma companies need to manage data throughout its entire lifecycle: data collection, harmonization, integration with third-party data systems, ETL, and preparation for analytics. Being data-centric means we do not just collect and store the data, or just do ETL; these traditionally disjointed data operations are brought together in the Tetra R&D Data Cloud. Our data-centric approach is also vendor agnostic, meaning we integrate with instruments and systems from many different vendors. We’ve turned the tables a bit. We say to organizations, “Continue doing what you’re doing, and leverage an architecture that enables any data system to be plugged in.” This changes the paradigm of how pharma companies use their data. Before, they had to ask a vendor for a particular feature, which, because of regulations, could take years to develop and deploy. Now, within days, they can build new applications using state-of-the-art data science tools on top of data that TetraScience has harmonized. Next, being cloud-native means we can deliver high scalability, full traceability, and transparency for customers, which is very difficult to achieve with on-premises solutions. And lastly, because our solution is tailored to R&D rather than being a generic data cloud, it is equipped to handle challenges that are unique to pharma R&D.
JS: What’s the current state of data liquidity across the R&D ecosystem?
SW: Data liquidity will enable faster, more effective therapeutic and drug development that will also positively impact human health. Those benefits are quickly becoming recognized in R&D, but there are still many barriers to achieving it, including accessing actionable data without sacrificing data integrity, as well as security and compliance concerns. However, we see the move toward data liquidity as inevitable and aim to bring everyone into the mix. We’re not competing with R&D informatics software vendors or instrument manufacturers; our task is to make data more accessible, so that pharma companies can easily build on top of it with data analytics tools and the data can flow freely. Historically, pharma companies had to manually create Excel spreadsheets or write proprietary code to analyze hundreds or thousands of experiments across different instruments. Through the Tetra R&D Data Cloud, we hope to spark the development of a new layer of applications based on harmonized data and productized integrations.
JS: What security, regulatory, and compliance challenges does pharma R&D face?
SW: With the increased need for data liquidity in pharma R&D, keeping data secure while adhering to a continuously evolving compliance and regulatory landscape is a serious concern. For example, consider qualification and validation under GxP guidelines. Qualification ensures the software provides accurate, precise, and trustworthy results. Validation establishes that the software meets the requirements of the intended use case. Traditionally, with on-premises infrastructure, qualification and validation have been very difficult to achieve, because infrastructure as code is hard to implement in those environments. TetraScience, on the other hand, is built on cloud-native infrastructure, which means full traceability is baked in. Through AWS Config, AWS CloudFormation, Amazon CloudWatch, AWS CloudTrail, and other services, our customers can easily see what happened to their data, as well as to the infrastructure that processed it.
JS: What will the lab of the future look like, and how far in the future are we talking?
SW: The lab of the future will be fully automated, connected, and, most importantly, data driven. Every scientist will be more like a data scientist and will use data to uncover new insights and discoveries. At TetraScience, we’re already working with pharma companies to build abstraction layers for their most important systems. This will dramatically lower the barrier to extracting data across sources such as experimental design, raw data, and analysis results. That’s the first step. Once data is accessible by any application, R&D organizations will quickly begin to benefit from data liquidity. It won’t matter if scientists don’t understand vendor-proprietary software or datasets, because they can apply any tool to data in JSON and Parquet formats. The democratization of R&D data will introduce brand-new data-driven applications, dramatically lowering the barrier for scientists to gain valuable insights. From there, we believe the rate of innovation will pick up. We’ve seen the same thing happen with AWS: when AWS became available, it lowered the barrier to entry for developers because they no longer had to focus on infrastructure.
JS: How do TetraScience and AWS work together to enable a better data experience?
SW: The cloud-native approach is fundamental to our strategy. Working with AWS has allowed us to focus on the data itself, instead of spending all our energy building and maintaining on-premises data centers. We don’t have to worry about storage scalability or about how to deploy across multiple pharma customers and the variations within their IT setups. If that were our focus, we wouldn’t be solving life science problems. Our unique contribution to this industry is creating data models, data pipelines, data integrations, and data applications that are tied to R&D workflows and datasets, not infrastructure. The robust suite of AWS services and solutions lets us avoid building everything in-house and adopt best practices around compliance and security. Another benefit is that pharma companies do not need to adopt new methods or processes. Because we are cloud-native on AWS, we can offer private cloud deployments, giving pharma companies the level of ownership they’re familiar with. This empowers R&D data teams to build more applications within their own environments, which in turn facilitates data liquidity and allows them to gain much more value.
JS: What currently excites you about the pharma R&D space?
SW: The most exciting thing is asking, “What if?” Historically, scientists and data scientists were hesitant to ask that question. It was intimidating because finding the answer meant wading through massive numbers of Excel spreadsheets, proprietary formats, and siloed data systems, which was daunting and required specialized skills. Our solution unlocks that possibility by bringing together previously disjointed datasets from different vendors, sites, and experiments. With the freedom to ask questions, curiosity spreads across departments and collaboration picks up. We are enabling a community-driven data application ecosystem that drives innovation across the industry.
To learn more about how AWS works with biopharma customers, see Pharma and Biotech on AWS. For more Executive Conversations, head to the AWS for Industries blog.
—
Spin Wang is the Co-founder and CEO of TetraScience, the Boston-based R&D Data Cloud technology company that makes life sciences R&D data actionable and accessible. He earned an MS in Electrical Engineering and Computer Science from MIT, where he was awarded the Lockheed Martin Energy Fellowship. He also holds a BS in Applied Physics and Electrical Engineering from Cornell University. In 2017, Spin was selected for Forbes Magazine’s 30 Under 30 in Science. He is currently on the board of the Pistoia Alliance, a global, pre-competitive collaboration organization for the life sciences.