AWS Public Sector Blog
Calling All Data Scientists to Help Improve Cancer Screening Technology
Two out of every five people in the U.S. will be diagnosed with cancer during their lifetimes and the number of new cancer cases will rise to 22 million globally within the next two decades, according to the National Cancer Institute (NCI). And as research organizations work to find a cure, the same technology behind improved voice assistants and credit card fraud detection—artificial intelligence—could also help improve cancer screening and save lives.
Through machine learning and artificial intelligence, participants in the third annual Data Science Bowl have the chance to improve lung cancer screening technology that can reduce lung cancer deaths by 20 percent.
The Data Science Bowl competition was created by Booz Allen Hamilton in partnership with Kaggle. Amazon Web Services is proud to sponsor the 2017 Data Science Bowl, which aims to inspire everyday citizens, data scientists, and medical communities around the world to work together and improve the success rate of low-dose CT scanning, using training and test datasets directly provided or facilitated through the National Cancer Institute.
This year, the 90-day Data Science Bowl competition will award winners over $1 million in prizes, including AWS cloud computing credits. The funds for the prize purse will be provided by the Laura and John Arnold Foundation. To learn more and participate in the Data Science Bowl, visit DataScienceBowl.com.
Last year’s Data Science Bowl was related to heart health. Learn more about it here.
AWS Public Datasets
Today, qualified researchers can access two of the world’s largest collections of cancer genome data as AWS Public Datasets:
- The Cancer Genome Atlas (TCGA) corpus of raw and processed genomic, transcriptomic, and epigenomic data from thousands of cancer patients is now freely available on Amazon S3 for registered users of the Cancer Genomics Cloud, one of the funded cancer cloud pilots of the National Cancer Institute.
- Over 2,400 whole genomes from 1,100 unique donors in the International Cancer Genome Consortium (ICGC) PanCancer Analysis of Whole Genomes (PCAWG) dataset are also freely available on Amazon S3 for credentialed researchers, subject to the ICGC data sharing policies.
And, in order to help data scientists work with unique datasets, we built the AWS Research Cloud Program. The program was built by researchers, for researchers, in order to enable easy use of AWS resources by the scientific community around the globe. It’s free to join the program, and you can download the guide here to get started.
Key Resources for Researchers and Scientists
Below are some key resources to help researchers taking part in the Data Science Bowl:
- Big Compute (HPC) on AWS (particularly C5, F1, Elastic GPUs)
- Artificial Intelligence on AWS
- Machine Learning on AWS
- AWS Research & Technical Computing Tools
How Does the Cloud Help Cure Cancer?
The cloud can fuel cancer breakthroughs at a rapid speed, and we look forward to seeing what Data Science Bowl participants achieve using the cloud. For example, the Algorithms, Machines, and People (AMP) Lab at the University of California, Berkeley builds scalable machine learning and data analysis technologies that turn raw data into actionable research insights, shared globally.
Among the many experiments run by the AMP Lab, one area of concentration is genomics and cancer research. Because genome sequencing produces vast amounts of data, the AMP Lab uses AWS cloud-based compute power to quickly scale the resources needed to analyze the algorithms used in genomics work. As a result, researchers can use many machines in the cloud simultaneously to process genome data faster and more cost-effectively than ever before.
Learn how more customers, like the American Heart Association, the National Institutes of Health, and Harvard Medical School, use the AWS Cloud to revolutionize our understanding of disease and develop novel approaches to diagnosis and treatment.
Good luck to all participants!