AWS Public Sector Blog
22 new or updated open datasets on AWS: New polar satellite data, blockchain data, and more
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making over 100PB of high-value, cloud-optimized data available for public use.
The full list of publicly available datasets is on the Registry of Open Data on AWS, and the datasets are now also discoverable on AWS Data Exchange. This quarter, AWS released 22 new or updated datasets, including Amazonia-1 imagery, Bitcoin and Ethereum data, and elevation data over the Arctic and Antarctica. Check out some highlights:
ArcticDEM and the Reference Elevation Model of Antarctica (REMA)
Launched this quarter, ArcticDEM and REMA are the result of an initiative between the National Geospatial-Intelligence Agency (NGA) and the National Science Foundation (NSF) to create a high-resolution, high-quality 3D digital representation of the polar regions using satellite imagery acquired by Maxar. Led by researchers at the University of Minnesota’s Polar Geospatial Center (PGC), REMA and ArcticDEM provide essential high-resolution terrain maps that help inform climate policy and national security decision making. Both datasets include large-area topographic mosaics and smaller, time-dependent elevation data scenes that researchers can use to map the physical environment and observe change through time.
AWS Public Blockchain Data
Bitcoin and Ethereum blockchain datasets are now available in open, cloud-optimized Parquet format via Amazon Simple Storage Service (Amazon S3), managed by AWS. These two datasets simplify cross-chain analytics for researchers and others by providing blockchain data in a consistent format and schema. Additionally, these datasets provide and document a fully open source pipeline for customers wishing to reproduce these datasets or manage their own blockchain data. Read more about these datasets on the AWS Database Blog.
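As a rough illustration of what a consistent format and schema can mean in practice, the sketch below builds the Amazon S3 location of one day of data and reads it as Parquet. The bucket name, the `v1.0/<chain>/<table>/date=YYYY-MM-DD/` key layout, and the helper names `partition_uri` and `load_partition` are assumptions made for this example and are not taken from this post; check the dataset's entry on the Registry of Open Data on AWS for the actual location and schema.

```python
from datetime import date

# Assumption: bucket name and key layout are illustrative, not confirmed by this post.
BUCKET = "aws-public-blockchain"
CHAINS = {"btc", "eth"}


def partition_uri(chain: str, table: str, day: date, version: str = "v1.0") -> str:
    """Build the S3 prefix for one day's partition of a table (hypothetical layout)."""
    if chain not in CHAINS:
        raise ValueError(f"unknown chain {chain!r}; expected one of {sorted(CHAINS)}")
    return f"s3://{BUCKET}/{version}/{chain}/{table}/date={day.isoformat()}/"


def load_partition(uri: str):
    """Read every Parquet file under the prefix into a pandas DataFrame.

    Requires pandas, pyarrow, and s3fs. Anonymous access is requested,
    so no AWS credentials are needed for a public bucket.
    """
    import pandas as pd  # imported lazily so the URI helper works without pandas

    return pd.read_parquet(uri, storage_options={"anon": True})
```

Because both chains would share the same layout under these assumptions, switching from Bitcoin to Ethereum is a one-argument change, for example `load_partition(partition_uri("eth", "blocks", date(2022, 10, 1)))`.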
Co-Produced Climate Data to Support California’s Resilience Investments
The Co-Produced Climate Data to Support California’s Resilience Investments dataset includes next-generation climate projections to support energy sector resilience and adaptation planning. It also supports a suite of climate-related research datasets produced for and by California’s Fifth Climate Change Assessment. Regionally downscaled Global Circulation Models (GCMs) that contribute to the Coupled Model Intercomparison Project Phase 6 (CMIP6) form the foundation of the advanced metrics and analytics datasets generated to make climate data actionable. This generation of regionalized data takes a major step forward in spatial resolution (from 6 km to 2 km) and temporal resolution (from daily to hourly), providing climate data at the temporal and spatial scales relevant to infrastructure and community requirements for adaptation planning. Beyond temperature and precipitation, the dataset includes regionally downscaled variables of relevance to stakeholders, such as wind speed and direction, surface solar radiation, relative humidity, and a suite of hydrological variables.
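To put the resolution change in perspective, here is a back-of-the-envelope calculation of how many more values a single variable holds at the finer resolution. It assumes a uniform square grid over the same geographic area and the same time span, which is a simplification; the function name `scale_factor` is made up for this example.

```python
def scale_factor(old_km: float, new_km: float,
                 old_steps_per_day: int, new_steps_per_day: int):
    """Return (spatial, temporal, combined) growth in values per variable."""
    # Cell count grows with the square of the linear refinement.
    spatial = (old_km / new_km) ** 2
    # Hourly output has 24 time steps where daily output has 1.
    temporal = new_steps_per_day / old_steps_per_day
    return spatial, temporal, spatial * temporal


spatial, temporal, total = scale_factor(6, 2, 1, 24)
print(f"{spatial:.0f}x cells, {temporal:.0f}x time steps, {total:.0f}x values")
# prints "9x cells, 24x time steps, 216x values"
```

Under these assumptions, each variable carries roughly 216 times as many values as before, which is one reason cloud-optimized formats and analysis next to the data matter for datasets like this one.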
Here is the full list of datasets released or significantly updated this quarter, joining more than 350 datasets already available:
Blockchain:
- AWS Public Blockchain Data managed by AWS
Climate and weather:
- Co-Produced Climate Data to Support California’s Resilience Investments from Cal-Adapt Analytics Engine
- Global Real-Time Ocean Forecast System from US National Oceanic and Atmospheric Administration (NOAA)
- S-102 Bathymetric Surface Data from NOAA
- Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6) from the National Aeronautics and Space Administration (NASA)
Geospatial:
- Amazonia-1 satellite imagery managed by AMS Kepler
- Homeland Security and Infrastructure US Cities managed by Hobu, Inc
- ArcticDEM and Reference Elevation Model of Antarctica (REMA) managed by Polar Geospatial Center
- Updated: Maxar Open Data Program from Maxar, now with SpatioTemporal Asset Catalog (STAC) metadata and in Cloud-Optimized GeoTIFF format. The updates include Hurricane Ian imagery.
- Updated: Capella Space Synthetic Aperture Radar (SAR) managed by Capella Space, now also available in TileDB format.
Life sciences:
- Mouse Brain Anatomy: MouseLight Imagery from the Howard Hughes Medical Institute (HHMI) Janelia Farm Research Campus
- Cell Painting Gallery from the Broad Institute
- The Singapore Nanopore Expression Data Set from the Genome Institute of Singapore
- Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) from the Allen Institute
- CAncer MEtastases in LYmph nOdes challeNge (CAMELYON) dataset from Radboud University Medical Center
- Updated: 1000 Genomes Reanalysis with DRAGEN 3.5 and 3.7 from Illumina, Inc., now including a reanalysis with DRAGEN 3.7.6 in the AWS us-east-1 Region
- Updated: UniProt managed by the Swiss Institute of Bioinformatics, now including the 2022_03 release of UniProt
Machine learning:
- 3DCoMPaT: Composition of Materials on Parts of 3D Things from King Abdullah University of Science and Technology
- Wizard of Tasks from Amazon
- PersonPath22 from AWS
- BodyM from Amazon
AWS Open Data on re:Post
Questions on AWS re:Post can now be tagged with AWS Open Data. With AWS re:Post, you and your data users can communicate with a community of peers and AWS experts about a specific dataset, and can also contribute to topics relevant to AWS Open Data at large. You do not need an AWS account to browse re:Post questions, but to post or respond to a query, you need to be logged in to your AWS account. By adding AWS Open Data as a skill to your re:Post profile, you will receive a notification if a new question is tagged with AWS Open Data.
Learn more about AWS for open data
We’re excited to see how you can put these great datasets to work. If you have examples of tutorials, applications, tools, or publications that use these datasets, make sure to list them on the Registry of Open Data on AWS so the community can find them. Learn how to propose your dataset to the AWS Open Data Sponsorship Program and learn more about open data on AWS.
Read more about AWS for open data:
- OpenFold, OpenAlex catalog of scholarly publications, and Capella Space satellite data: The latest open data on AWS
- Creating access control mechanisms for highly distributed datasets
- Downscaled CMIP5, 1950 US Census, and open genomics data for Galaxy: The latest open data on AWS
- From open data to machine learning, making 1950 Census data available with AWS
- Bringing world-class satellite imagery to smallholder farmers with open data
- Preventing the next pandemic: How researchers analyze millions of genomic datasets with AWS