AWS Public Sector Blog
Alzheimer’s disease research portal enables data sharing and scientific discovery at scale
According to the World Health Organization (WHO), more than 55 million people globally have dementia, with the most common form, Alzheimer’s disease, accounting for approximately 60-70% of cases. The annual financial impact of dementia is estimated to be $1.3 trillion USD. Identifying causes of Alzheimer’s disease and developing diagnostics and potential cures for the condition require multi-modal, multi-omics analysis. This is possible through public and private partnerships that enable access to large genetic, genomic, and neuroimaging datasets and to expertise in big data processing, informatics, and algorithm development.
Unifying genomics data for Alzheimer’s disease research using AWS
Li San Wang, PhD, the Peter C. Nowell, M.D. Professor and Vice Chair for Research in the Department of Pathology and Laboratory Medicine at the University of Pennsylvania’s Perelman School of Medicine, is co-director for the Penn Neurodegeneration Genomics Center (PNGC) and directs multiple National Institutes of Health (NIH) funded projects on Alzheimer’s disease genetics.
In 2011, PNGC Director Dr. Gerard Schellenberg met with Dr. Wang and his research team to begin exploring cloud technology for large-scale genomic computations. They began by using Amazon Web Services (AWS) to build a small-scale proof of concept with 36 exomes, or 1.1 terabytes (TB) of data. Since then, their work has evolved into one of the largest databases of genomic data for Alzheimer’s disease and related conditions: the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS DSS), powered by AWS.
Pictured: Li San Wang, PhD, the Peter C. Nowell, M.D. Professor and Vice Chair for Research in the Department of Pathology and Laboratory Medicine at the University of Pennsylvania’s Perelman School of Medicine.
The NIAGADS genomic database on AWS is a searchable annotation resource that provides access to publicly available datasets for Alzheimer’s disease and related neuropathologies. Created to make knowledge of Alzheimer’s disease genetics more accessible to researchers, NIAGADS holds genomics data on 172,701 samples from 98 datasets and is now 1.3 petabytes (PB) in total size. Data types include whole-genome/exome sequencing; genome-wide association studies (GWAS) and imputation; RNASeq; single-neuron whole-genome sequencing (WGS); proteomics; and metabolomics. The database’s interface is designed to guide users unfamiliar with genetic data in not only exploring, but also interpreting this ever-growing volume of data.
Researchers can identify and interpret genomic regions of interest compiled from harmonized datasets via interactive search and the NIAGADS genome browser. The data is curated along with variant and gene annotations, as well as their functional significance based on public or Alzheimer’s disease-related experimental data sources.
Enabling data sharing to accelerate Alzheimer’s disease research
NIAGADS is creating a system that promotes scientific discovery through data sharing with a large cadre of institutions. The NIAGADS Data Sharing Service facilitates the deposition and sharing of genomic data from the Alzheimer’s Disease Sequencing Project (ADSP) and other National Institute on Aging (NIA)-funded dementia genomic studies with approved researchers from the broader community. Identifying the genetic variants that increase the risk of Alzheimer’s disease or protect against it requires sequencing and analyzing the genomes of many individuals—something that’s impossible with data from a single institution alone.
To date, more than 90 genome-wide significant locations (loci) associated with Alzheimer’s disease risk have been discovered (Kunkle AD GWAS NG2019, Bellenguez AD GWAS NG2022, Bis Mol Psychiatry and Holstege WES 2022). The data housed in NIAGADS represents some of these major advances and findings in the field. Associations with other related clinical outcomes, such as age at onset and cerebrospinal fluid biomarker levels, have led to the discovery of hundreds of loci and associations with the potential to help researchers better understand the biology of dementia, test new hypotheses, and develop novel therapeutic strategies. The Alzheimer’s Disease Variant Portal (ADVP) at NIAGADS maintains a collection of such genetic findings with links to publications and annotations of genes and variants.
As a result of the vision and hard work of many individuals from academia, industry, and federal government, principal investigators can request available data through a data access request management system by logging in using their eRA Commons ID. Each data access request is reviewed by the NIAGADS data access committee.
AWS infrastructure supporting the NIAGADS DSS
NIAGADS uses AWS for the transfer, processing, storage, and archival of genomics data, as well as monitoring of data access patterns. For the data sharing infrastructure, NIAGADS uses Amazon Simple Storage Service (Amazon S3), Amazon S3 Glacier Deep Archive, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic File System (Amazon EFS), Amazon Elastic Block Store (Amazon EBS), and the AWS Transfer Family. For security and compliance, the team leverages services such as AWS CloudTrail, Amazon GuardDuty, AWS Config, AWS Security Hub, and Amazon CloudWatch.
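To make the storage and archival pairing above more concrete, here is a minimal sketch of how an Amazon S3 lifecycle rule can move objects to S3 Glacier Deep Archive after a set number of days. This is not the NIAGADS configuration, which the post does not describe: the function name `build_archive_lifecycle`, the `raw-sequencing/` prefix, the 180-day threshold, and the bucket name in the comment are all hypothetical, chosen only for illustration.

```python
import json


def build_archive_lifecycle(prefix: str, days_until_archive: int) -> dict:
    """Build an S3 lifecycle configuration (hypothetical helper).

    Objects whose keys start with `prefix` are transitioned to the
    S3 Glacier Deep Archive storage class after `days_until_archive` days.
    """
    if days_until_archive < 0:
        raise ValueError("days_until_archive must be non-negative")
    return {
        "Rules": [
            {
                # Rule ID derived from the prefix, e.g. "archive-raw-sequencing".
                "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


# Illustrative values only: the prefix and threshold are assumptions.
config = build_archive_lifecycle("raw-sequencing/", 180)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Building the rule as plain data keeps the sketch runnable without AWS credentials; the right threshold in practice depends on how often a dataset is accessed and on retrieval cost and latency requirements.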
Diversifying the datasets and more next steps for NIAGADS
With the WHO reporting that over 60% of individuals with a diagnosis of dementia live in low- and middle-income countries, expanding the pool of researchers, including international collaborators, is a key goal for the program. NIAGADS is excited to continue to build on AWS to further expand its global reach, its ability to support collaborative analysis of all types of Alzheimer’s disease data, and the data sharing ecosystem.
It will take a village to help identify protective gene variants and pathways for therapy and prevention. Researchers from qualifying institutions are encouraged to visit the NIAGADS website and work with the NIAGADS team on contributing to and analyzing data.
Read more about open science models on AWS:
- Largest metastatic cancer dataset now available at no cost to researchers worldwide
- Creating access control mechanisms for highly distributed datasets
- Pediatric cancer researchers use AWS to accelerate Cancer Moonshot
- How researchers can meet new open data policies for federally-funded research with AWS
- Accelerating and democratizing research with the AWS Cloud
- Introducing 10 minute cloud tutorials for research