AWS Public Sector Blog
34 new or updated datasets available on the Registry of Open Data on AWS
The Amazon Web Services (AWS) Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on AWS. We work with data providers to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Through this program, customers make more than 100 petabytes (PB) of high-value, cloud-optimized data available for public use. The full list of publicly available datasets is on the Registry of Open Data on AWS and is also discoverable on AWS Data Exchange. This quarter, AWS released 34 new or updated datasets. What will you build with these datasets? Continue reading for inspiration.
More AI analysis-ready datasets on the Registry of Open Data
The Multi-Robot, Multi-Sensor, Multi-Environment Event Dataset (M3ED) is the first multi-sensor event camera dataset focused on high-speed dynamic motions in robotics applications. M3ED provides high-quality synchronized and labeled data from multiple platforms, including ground vehicles, legged robots, and aerial robots. These platforms typically operate in challenging conditions such as driving on off-road trails, navigating through dense forests, and performing aggressive flight maneuvers. The dataset also covers demanding operational scenarios for event cameras, such as scenes with high egomotion and multiple independently moving objects. The sensor suite used to collect M3ED includes high-resolution stereo event cameras (1280×720), grayscale imagers, an RGB imager, a high-quality inertial measurement unit (IMU), a 64-beam LiDAR, and real-time kinematic (RTK) localization. This dataset aims to accelerate the development of event-based algorithms and methods for edge cases encountered by autonomous systems in dynamic environments.
Full list of new or updated datasets
This dataset joins 33 other new or updated datasets on the Registry of Open Data in four categories: climate and weather, geospatial, life sciences, and machine learning (ML).
Climate and weather
- Computational Materials Data by Materials Project at Berkeley Lab
- Argo float data and metadata from Global Data Assembly Centre (Argo GDAC) at Euro-Argo
- CMIP6 GCMs downscaled in Southeast Asia using WRF from National University of Singapore
- Gulfwide Avian Colony Monitoring Survey Photos from The Water Institute (Louisiana and the Gulf Coast)
- UKESM1 ARISE-SAI impacts of geoengineering via the injection of sulphur dioxide from UK Met Office
- LHD Large Helical Device experiment data from National Institute for Fusion Science (NIFS)
- GEOGloWS Hydrologic Model Version 2 from GEOGloWS
- CRAAM VLF – Centro de Rádio Astronomia e Astrofísica Mackenzie Very Low Frequency signals from Universidade Presbiteriana Mackenzie
- Tropical Cyclone Precipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED) from CIRA (Cooperative Institute for Research in the Atmosphere)
- Wind and Loads on Parabolic Troughs from National Renewable Energy Laboratory
- Taiwan Central Weather Administration OpenData from Central Weather Administration
- NOAA Hurricane Analysis and Forecast System (HAFS) from NOAA
Geospatial
- New Zealand Imagery from Toitū Te Whenua Land Information New Zealand
- KyFromAbove aerial photography and elevation datasets plus derivatives from Commonwealth of Kentucky, Division of Geographic Information
- AllWISE Data Wide-field Infrared Survey Explorer from NASA/IPAC Infrared Science Archive
- Spitzer Enhanced Imaging Products (SEIP) Super Mosaics via infrared astronomy space telescope from NASA/IPAC Infrared Science Archive
- Wise-allsky Wide-field Infrared Survey Explorer full cryogenic data from NASA/IPAC Infrared Science Archive
- Wide-field Infrared Survey Explorer 3-Band Cryo Data from NASA/IPAC Infrared Science Archive
- Near-Earth Object Wide-field Infrared Survey Explorer (NEOWISE) from NASA/IPAC Infrared Science Archive
- NEOWISE Post-Cryo Data from NASA/IPAC Infrared Science Archive
- Japan Aerospace EXploration Agency (JAXA) SELenological and ENgineering Explorer (SELENE) from NASA
- NASA / USGS Lunar Orbiter Laser Altimeter Cloud Optimized Point Cloud from NASA
- NASA High Energy Astrophysics Mission Data from NASA
- NASA Legacy Archive for Microwave Background Data Analysis (LAMBDA) from NASA
Life sciences
- Kraken2 RefSeq Complete V205 database from Dalhousie University
- CZ CELLxGENE Discover Census from Chan Zuckerberg Initiative, Single Cell Biology
- Harvard-Emory ECG database (HEEDB) from Brain Data Science Platform
- Opioid Industry Documents Archive (OIDA) from Johns Hopkins University
Machine learning
- Public Utility Data Liberation Project from Catalyst Cooperative
- M3ED (Multi-Robot, Multi-Sensor, Multi-Environment Event Dataset) from University of Pennsylvania
- 2020 Census Demographic and Housing Characteristics Noisy Measurement File from United States Census Bureau
- 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File (NMF) from United States Census Bureau
- 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File from United States Census Bureau
- 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File from United States Census Bureau
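Many datasets on the Registry are hosted in public Amazon S3 buckets, and each dataset's Registry page lists its bucket name and AWS Region. As a minimal sketch of how you might start exploring one, the Python below builds an unsigned S3 ListObjectsV2 request URL and parses the object keys from the XML response. It assumes the bucket allows anonymous listing. The bucket name `example-open-data-bucket` and the helper names are hypothetical and used only for illustration; substitute the values from the dataset's Registry page.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by S3 REST API list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket: str, region: str, prefix: str = "", max_keys: int = 10) -> str:
    """Build a virtual-hosted-style ListObjectsV2 URL for a public bucket."""
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)})
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_keys(xml_text: str) -> List[str]:
    """Extract object keys from a ListObjectsV2 XML response body."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_public_keys(bucket: str, region: str, prefix: str = "") -> List[str]:
    """Fetch and parse one page of keys; assumes anonymous listing is allowed."""
    with urlopen(build_list_url(bucket, region, prefix)) as resp:
        return parse_keys(resp.read().decode("utf-8"))


# Hypothetical bucket and Region, for illustration only:
# keys = list_public_keys("example-open-data-bucket", "us-east-1", prefix="data/")
```

This sketch uses only the Python standard library and returns a single page of results. For larger listings or downloads, the AWS CLI or an AWS SDK is usually the more practical choice.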
What are people doing with open data?
- Amazon SageMaker uses datasets from the Registry of Open Data in the SageMaker Geospatial ML feature
- The COVID Moonshot Consortium crowdsourced the discovery of potent COVID-19 therapeutics
- Scion Research applies artificial intelligence to open aerial data to assess the impact of Cyclone Gabrielle on New Zealand’s forests
How can you make your data available?
The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. Learn how to propose your dataset to the AWS Open Data Sponsorship Program. Learn more about open data on AWS.