AWS Public Sector Blog
21 new or updated datasets available on the Registry of Open Data on AWS
The Amazon Web Services (AWS) Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on AWS. We work with data providers to:
- Democratize access to data by making it available to the public for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Through this program, customers are making more than 100 petabytes (PB) of high-value, cloud-optimized data available for public use. The full list of publicly available datasets is on the Registry of Open Data on AWS, and these datasets are also discoverable on AWS Data Exchange. This past quarter, AWS released 21 new or updated datasets. What will you build with these datasets?
ECMWF Reanalysis 5 (ERA5) returns to the Registry of Open Data
The National Science Foundation (NSF) National Center for Atmospheric Research (NCAR) is providing a NetCDF-4 structured version of the 0.25-degree atmospheric European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 5 (ERA5) to the Registry of Open Data on AWS. ERA5 is produced using high-resolution forecasts (HRES) at 31-kilometer resolution (one fourth the spatial resolution of the operational model) and a 62-kilometer resolution, ten-member 4D-Var ensemble of data assimilation (EDA) in CY41r2 of ECMWF’s Integrated Forecast System (IFS). The model has 137 hybrid sigma-pressure (model) levels in the vertical, up to a top level of 0.01 hPa. Atmospheric data on these levels are interpolated to 37 pressure levels (the same levels as in ERA-Interim).
Surface or single-level data are also available, containing 2D parameters such as precipitation, 2-meter temperature, top-of-atmosphere radiation, and vertical integrals over the entire atmosphere. The IFS is coupled to a soil model, the parameters of which are also designated as surface parameters, and to an ocean wave model. ERA5 products are used to train ML/AI-based weather forecast models and to support retrospective climate research use cases, including where to locate solar and wind farms.
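To give a rough sense of scale for the 0.25-degree grid and the 37 pressure levels described above, here is a minimal back-of-the-envelope sketch. It assumes a global regular latitude-longitude grid that includes both poles and uncompressed 4-byte values; neither assumption comes from the dataset documentation, and the function names are purely illustrative.

```python
# Rough size of one ERA5 field on a regular 0.25-degree grid.
# Assumptions (not stated in the post): global coverage with both poles
# included, longitudes from 0 up to but excluding 360 degrees, and
# uncompressed 4-byte (float32) values.

GRID_SPACING_DEG = 0.25
PRESSURE_LEVELS = 37      # number of pressure levels quoted in the post
BYTES_PER_VALUE = 4       # assumed float32 storage


def grid_shape(spacing_deg: float) -> tuple[int, int]:
    """Return (n_lat, n_lon) for a global regular lat-lon grid."""
    n_lat = round(180 / spacing_deg) + 1  # -90..90, both poles included
    n_lon = round(360 / spacing_deg)      # 0..360, end point excluded
    return n_lat, n_lon


def field_bytes(spacing_deg: float, levels: int = 1,
                bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Uncompressed size in bytes of one time step of one parameter."""
    n_lat, n_lon = grid_shape(spacing_deg)
    return n_lat * n_lon * levels * bytes_per_value


if __name__ == "__main__":
    n_lat, n_lon = grid_shape(GRID_SPACING_DEG)
    single = field_bytes(GRID_SPACING_DEG)
    multi = field_bytes(GRID_SPACING_DEG, levels=PRESSURE_LEVELS)
    print(f"grid points per level: {n_lat} x {n_lon} = {n_lat * n_lon}")
    print(f"single-level field: {single / 1e6:.1f} MB per time step")
    print(f"37-level field:     {multi / 1e6:.1f} MB per time step")
```

Under these assumptions, a single-level parameter is about 4 MB per time step and a parameter on all 37 pressure levels is about 154 MB per time step, before any compression. The actual files may be smaller or laid out differently.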
Full list of new or updated datasets
ECMWF ERA5 joins 20 other new or updated datasets on the Registry of Open Data in the following categories.
Climate and weather
- NOAA Analysis of Record for Calibration (AORC) Dataset from National Oceanic and Atmospheric Administration (NOAA)
- OAQPS 2022 Modeling Platform from U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Air Quality Assessment Division, Air Quality Modeling Group
- NOAA’s Coastal Ocean Reanalysis (CORA) Dataset from NOAA
- NOAA Unified Forecast System (UFS) Hierarchical Testing Framework (HTF) from NOAA
- CESM-HR from TAMU
- NSF NCAR Curated ECMWF Reanalysis 5 (ERA5) from NSF National Center for Atmospheric Research
- ERA5-for-WRF Open Data on AWS from Veer Renewables
- Digital Earth Africa Normalised Difference Vegetation Index (NDVI) Climatology from Digital Earth Africa
- Digital Earth Africa Monthly Normalised Difference Vegetation Index (NDVI) Anomaly from Digital Earth Africa
- NOAA Global Data Assimilation (DA) Test Data from NOAA
- Demand-Side Grid (dsgrid) Toolkit from National Renewable Energy Laboratory
- NOAA Cloud Optimized Zarr Reference Files (Kerchunk) from NOAA’s National Ocean Service, the Integrated Ocean Observing System (IOOS)
Geospatial
- Sub-Meter Canopy Tree Height of California in 2020 from CTrees.org
- Collection of open nation-scale LiDAR datasets from Flai
- EarthDEM from Polar Geospatial Center
- OpenUniverse 2024 Matched Rubin and Roman Simulations: Preview from NASA/IPAC Infrared Science Archive (IRSA) at Caltech
- Vermont Open Geospatial on AWS from Vermont Center for Geographic Information
Life sciences
- Baby Open Brains (BOBs) Repository on AWS from Masonic Institute for the Developing Brain (MIDB)
- Reference data for HiFi human WGS from Pacific Biosciences of California, Inc
Machine learning
- Estimating Confidence Intervals for 2020 Census Statistics Using an Approximate Monte Carlo Simulation from United States Census Bureau
- MegaScenes from Cornell University
What are people doing with open data?
- Amazon SageMaker uses datasets from the Registry of Open Data in SageMaker Geospatial.
- The COVID Moonshot Consortium crowdsourced the discovery of potent COVID-19 therapeutics.
- Scion Research applies artificial intelligence to open aerial data to assess the impact of Cyclone Gabrielle on New Zealand’s forests.
How can you make your data available?
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.