AWS Public Sector Blog
39 new or updated datasets available on the Registry of Open Data on AWS
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-based techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making more than 100 petabytes (PB) of high-value, cloud-optimized data available for public use.
The full list of publicly available datasets is on the Registry of Open Data on AWS and is now also discoverable on AWS Data Exchange. This quarter, AWS released 39 new or updated datasets.
What will you build with these datasets?
RSNA Cervical Spine Fracture Detection (RSNA-CSF) Dataset
More than 1.5 million spine fractures occur annually in the United States alone, resulting in more than 17,730 spinal cord injuries each year. The most common site of spine fracture is the cervical spine. The incidence of spinal fractures in the elderly has risen, and in this population fractures can be more difficult to detect on imaging because of degenerative disease and osteoporosis. Imaging diagnosis of adult spine fractures is now almost exclusively performed with computed tomography (CT). Quickly detecting and locating any vertebral fractures is essential to prevent neurologic deterioration and paralysis after trauma. The Radiological Society of North America (RSNA) has teamed with the American Society of Neuroradiology (ASNR) and the American Society of Spine Radiology (ASSR) to create this ground truth dataset, collecting imaging data from twelve sites on six continents, including approximately 2,000 CT studies. Spine radiology specialists from the ASNR and ASSR provided expert image-level annotations for these studies to indicate the presence, vertebral level, and location of any cervical spine fractures.
Full list of new or updated datasets
The RSNA-CSF Dataset joins 38 other new or updated datasets on the Registry of Open Data in the following categories.
Climate and weather:
- National Climate Database (NCDB) from National Renewable Energy Laboratory
- Global Carbon Budget Data from Global Carbon Budget Office
- NOAA National Air Quality Forecast Capability (NAQFC) Regional Model Guidance from National Oceanic and Atmospheric Administration (NOAA)
- Moorings – Hourly time-series product from Open Access to Ocean Data (AODN)
- National Mooring Network – CTD profiles from AODN
- Ocean Gliders – Delayed mode from AODN
- Ocean Radar – Turquoise coast site – Sea water velocity – Delayed mode from AODN
- Satellite – Sea surface temperature – Level 3 – Single sensor – 1 day – Day and night time from AODN
- Met Office Global Deterministic 10km on a 2-year rolling archive from Met Office
- Met Office UK Deterministic (UKV) 2km on a 2-year rolling archive from Met Office
- SPARTAN Data from Atmospheric Composition Analysis Group
- Ships of Opportunity – Biogeochemical sensors – Delayed mode from AODN
- Ships of Opportunity – Expendable bathythermographs – Real time from AODN
- Ships of Opportunity – Fisheries vessels – Real time from AODN
- Ships of Opportunity – Sea surface temperature – 1-minute average data products from AODN
- Ships of Opportunity – Tropical research vessels – Real time from AODN
- EPA Dynamically Downscaled Ensemble (EDDE) from US Environmental Protection Agency
Geospatial:
- SSL4EO S12 Landsat Multi Product Dataset from Sankranti Joshi
- NASA SOTERIA Simulation Testbed Data from NASA
- Unblurred Coadds of the Wide-field Infrared Survey Explorer (unWISE) from NASA/IPAC Infrared Science Archive (IRSA)
- Poseidon 3D Seismic, Australia from Tomlinson Geophysical Services Inc (TGS)
- Spatiam Corporation National Lab Research Announcement International Space Station Technology Demonstration from Spatiam Corporation
- OceanCurrent – Gridded sea level anomaly – Near real time from AODN
- Sentinel-1 Precise Orbit Determination (POD) Products from The Alaska Satellite Facility (ASF)
- High resolution, annual cropland and landcover maps for selected African countries from The Agricultural Impacts Research Group
- JAXA / USGS / NASA Kaguya/SELENE Terrain Camera Digital Terrain Models from NASA
- A region-wide, multi-year set of crop field boundary labels for Africa from The Agricultural Impacts Research Group
- GEOS-Chem Nested Input Data from GEOS-Chem
- NOAA / NGA Satellite Computed Bathymetry Assessment-SCuBA from NOAA
Life sciences:
- MONKEY from Radboud University Medical Center
- CryoET Data Portal from Chan Zuckerberg Initiative Foundation
- RSNA Abdominal Trauma Detection (RSNA-ABT) from Radiological Society of North America
- RSNA Cervical Spine Fracture Detection (RSNA-CSF) Dataset from Radiological Society of North America
- RSNA Intracranial Hemorrhage Detection from Radiological Society of North America
- RSNA Pulmonary Embolism Detection from Radiological Society of North America
- RSNA Screening Mammography Breast Cancer Detection (RSNA-SMBC) Dataset from Radiological Society of North America
Machine learning:
- Estimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2020 Census Production Run) from United States Census Bureau
- Animal Tracking – Acoustic Telemetry – Quality controlled detections from AODN
- ABEJA CC JA from ABEJA inc.
What are people doing with open data?
- Amazon SageMaker uses datasets from the Registry of Open Data in its SageMaker geospatial capabilities.
- The Radiological Society of North America is using the Registry of Open Data to develop machine learning (ML) models that evaluate and screen mammograms for early detection, in an effort to reduce cancer fatalities.
- Clark University's Agricultural Impacts Research Group developed high-resolution annual cropland and landcover maps for selected African countries, including field boundary and cultivated frequency maps as well as multiclass land cover, using various ML approaches applied to Planet imagery. The maps are available on the Registry of Open Data.
- A global community of researchers and innovators is using open data for sustainability-related uses as part of the Amazon Sustainability Data Initiative (ASDI).
- The Institut Pasteur is creating a searchable DNA database of all life on Earth using data from AWS Open Data.
How can you make your data available?
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:
- Democratize access to data by making it available for analysis on AWS
- Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
- Encourage the development of communities that benefit from access to shared datasets
Learn how to propose your dataset to the AWS Open Data Sponsorship Program.