AWS for Industries
Discovering mining data with Elsevier Geofacets on OSDU Data Platform
Overview
Knowing what data you have at your disposal, and being able to find and access it quickly, has become an essential first step for digital transformation in the energy industry. The Open Group OSDU Forum aims to reduce organizational data silos by facilitating collaboration on the OSDU Data Platform, which helps break down barriers to innovation and democratize data usage across all aspects of the energy value chain. One of the key guiding principles of the OSDU Data Platform is extensibility: new data types can be added in support of different business workflows.
Elsevier and Amazon Web Services (AWS) have collaborated on extending the OSDU Data Platform to support the mining industry. With the new developments, it is now possible to upload, search, and retrieve mining data. The newly developed modules convert Elsevier Geofacets metadata, publish it into the OSDU Data Platform, and retrieve the data for future consumption and analysis through the platform’s core application programming interfaces (APIs).
Elsevier Geofacets
Elsevier Geofacets provides actionable insights for energy, critical minerals, and other natural resources. The metadata associated with its maps, tables, and graphs is extracted and validated, which makes the data easy to find and to integrate with business-driving workflows. However, the backend store for this data and metadata has historically been proprietary. The existing APIs have made the data accessible, but going one step further and using an open-source data platform helps unlock the mining data and its applications more broadly.
As OSDU gains momentum across the energy value chain, AWS and Elsevier set out to show how new schemas can be added and the core OSDU Data Platform extended to support the most frequently used mining-industry data. New schemas needed to be created to accurately capture metadata from files, documents, tables, and graphs. The data ingestion pipeline also needed to be enhanced, and ingestion scripts needed to be developed to parse, extract, and store the mining metadata in the OSDU Data Platform using the OSDU Core services APIs.
New OSDU mining schemas
The overall conceptual architecture of the OSDU Data Platform with the new OSDU mining schema extensions is shown in figure 1. The GeoWorkspace schema was created for the ingestion of drawing format (*.DWG), GeoTIFF (*.TIF, *.TIFF), and ASCII Grid (*.GRD) data types. In addition, four schemas were created for an article and its extracted subcomponents such as maps, tables, and figures. All records were created with the type work-product-component using the OSDU template, https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/SchemaRegistrationResources/shared-schemas/osdu, and follow the standards for JSON schemas, https://json-schema.org/. Schemas build on top of each other and inherit their properties from other schemas. The custom properties of each schema were added in the Individual Properties object of the Data property. After the name and version are added in the header, the new schema can be uploaded to OSDU with a PUT request to the REST API of the OSDU Schema service.
Python scripts for the GeoWorkspace and article-ingestion workflows were also developed. These scripts are responsible for extracting the metadata from the source data and constructing the appropriate records in the OSDU Data Platform.
Figure 1. The OSDU Data Platform conceptual architecture with new mining-industry data schemas
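To make the schema registration step more concrete, the following is a minimal sketch of how a custom schema with an IndividualProperties block might be assembled and wrapped in a PUT request. It is not the schema Elsevier and AWS built. The authority, source, property names (FileFormat, CommodityNames), host name, and partition are hypothetical, and the endpoint path and payload layout are assumptions based on the publicly documented OSDU Schema service. The request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical identifiers used only for illustration.
AUTHORITY = "example"
SOURCE = "mining"
ENTITY_TYPE = "work-product-component--GeoWorkspace"


def build_geoworkspace_schema():
    """Return a simplified JSON schema whose custom fields live under IndividualProperties."""
    individual_properties = {
        "title": "IndividualProperties",
        "type": "object",
        "properties": {
            # Hypothetical custom properties for the mining use case.
            "FileFormat": {"type": "string", "enum": ["DWG", "TIF", "TIFF", "GRD"]},
            "CommodityNames": {"type": "array", "items": {"type": "string"}},
        },
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "GeoWorkspace",
        "type": "object",
        "properties": {
            "data": {
                "allOf": [
                    # Stand-in for the parent schema that properties are inherited from.
                    {"$ref": "#/definitions/ParentWorkProductComponent"},
                    individual_properties,
                ]
            }
        },
        "definitions": {"ParentWorkProductComponent": {"type": "object"}},
    }


def build_schema_request(base_url, token, partition, schema, version=(1, 0, 0)):
    """Wrap the schema with a name/version header and build (not send) a PUT request."""
    major, minor, patch = version
    identity = {
        "authority": AUTHORITY,
        "source": SOURCE,
        "entityType": ENTITY_TYPE,
        "schemaVersionMajor": major,
        "schemaVersionMinor": minor,
        "schemaVersionPatch": patch,
        "id": f"{AUTHORITY}:{SOURCE}:{ENTITY_TYPE}:{major}.{minor}.{patch}",
    }
    body = {
        "schemaInfo": {"schemaIdentity": identity, "status": "DEVELOPMENT"},
        "schema": schema,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/schema-service/v1/schema",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,
            "Content-Type": "application/json",
        },
        method="PUT",
    )


request = build_schema_request(
    "https://osdu.example.com", "TOKEN", "opendes", build_geoworkspace_schema()
)
```

Passing `request` to `urllib.request.urlopen` would submit it, given a reachable OSDU instance and a valid token. Keeping the custom fields in their own block mirrors the inheritance idea described above: the parent reference carries the shared properties, and only the mining-specific additions are spelled out.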
The functionality described above was set up on the secure and reliable implementation of the OSDU Data Platform from AWS. The newly developed extensions to support mining workflows take full advantage of Amazon Simple Storage Service (Amazon S3), an object storage service, and Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database. Schema definitions and their metadata are stored in Amazon DynamoDB, which automatically scales up and down depending on the usage of the OSDU Data Platform. This allows the OSDU Data Platform on AWS to adapt quickly to new customer requirements and to handle changes in metadata structure and volume without compromising performance. Amazon DynamoDB provides fast access to metadata items when a query specifies primary key values. Queries can be further optimized with one or more secondary indexes, which define alternate keys and allow much more flexible query patterns for the interrogation of the metadata. This becomes especially relevant when extending the existing OSDU schemas at scale to support new workflows, such as the ones for the Elsevier Geofacets application.
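The difference between a primary key lookup and a lookup through a secondary index can be sketched as two sets of parameters for the DynamoDB Query operation. The table name, index name, and attribute names below are hypothetical; the actual table layout used by the OSDU Data Platform on AWS is not described in this post.

```python
def primary_key_query(table_name, schema_id):
    """Build Query parameters that look up items by the table's partition key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "SchemaId"},  # hypothetical key attribute
        "ExpressionAttributeValues": {":pk": {"S": schema_id}},
    }


def secondary_index_query(table_name, index_name, entity_type):
    """Build Query parameters that use a secondary index keyed on a different attribute."""
    return {
        "TableName": table_name,
        "IndexName": index_name,  # routes the query through the alternate key
        "KeyConditionExpression": "#et = :et",
        "ExpressionAttributeNames": {"#et": "EntityType"},  # hypothetical index key
        "ExpressionAttributeValues": {":et": {"S": entity_type}},
    }


by_id = primary_key_query(
    "SchemaInfo", "example:mining:work-product-component--GeoWorkspace:1.0.0"
)
by_type = secondary_index_query(
    "SchemaInfo", "EntityTypeIndex", "work-product-component--GeoWorkspace"
)
```

With boto3 installed and AWS credentials configured, either dictionary could be passed as `boto3.client("dynamodb").query(**by_id)`. The only structural difference is the `IndexName` entry, which is what makes alternate access patterns possible without scanning the table.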
The underlying articles, figures, maps, and tables are stored as objects in Amazon S3, which provides the necessary scalability, availability, security, and performance to retrieve the data and deliver it to the application layer. Whether it is an article attachment, a high-resolution image of the area, a map related to the mining activities, or a SEG-Y seismic file, the flexibility of Amazon S3 allows you to store nearly any type and amount of data that you want.
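As an illustration of how an ingestion script might pair a stored file with its metadata record, the sketch below derives a few fields from an object key and wraps them in a record envelope. The kind string, ACL group names, legal tag, and data field names (Name, FileFormat, SourceObjectKey) are hypothetical. The envelope layout (kind, acl, legal, data) is an assumption based on the publicly documented OSDU Storage service, not the scripts developed for this project.

```python
from pathlib import PurePosixPath

# File extensions handled by the hypothetical GeoWorkspace workflow.
SUPPORTED_FORMATS = {
    ".dwg": "DWG",
    ".tif": "GeoTIFF",
    ".tiff": "GeoTIFF",
    ".grd": "ASCII Grid",
}


def extract_metadata(object_key):
    """Derive basic metadata from the object key of a stored file."""
    path = PurePosixPath(object_key)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return {
        "Name": path.stem,
        "FileFormat": SUPPORTED_FORMATS[suffix],
        "SourceObjectKey": object_key,  # hypothetical pointer back to the stored object
    }


def build_record(object_key, partition, legal_tag):
    """Wrap the extracted metadata in a record envelope (assumed layout)."""
    return {
        "kind": "example:mining:work-product-component--GeoWorkspace:1.0.0",
        "acl": {
            "owners": [f"data.default.owners@{partition}.example.com"],
            "viewers": [f"data.default.viewers@{partition}.example.com"],
        },
        "legal": {
            "legaltags": [legal_tag],
            "otherRelevantDataCountries": ["US"],
        },
        "data": extract_metadata(object_key),
    }


record = build_record("surveys/pit_north.TIF", "opendes", "opendes-public-usa")
```

A real script would read richer metadata from the file contents and then submit a list of such records to the platform; here the extension check simply rejects file types that the hypothetical workflow does not cover.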
Proof of concept
Using the newly created mining schemas and data ingestion mechanisms, Elsevier developed a proof of concept and a strategy to implement a geospatial query from the Elsevier Geofacets user interface against the OSDU Search service. The mock-up of Elsevier Geofacets connected to the OSDU Data Platform extended with mining schemas is shown in figure 2. This proof of concept illustrates how a user can search a section of a map by drawing a box and seeing the results for that location from both the Elsevier Geofacets rich-data repository and the connected OSDU instance.
Figure 2. Mock-up of Elsevier Geofacets connected to the OSDU Data Platform extended with mining schemas
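The box-drawing interaction described above could translate into a search request along the following lines. This is a sketch under assumptions: the request body layout (spatialFilter with byBoundingBox) and the spatial field name follow the publicly documented OSDU Search service, while the kind string and coordinates are hypothetical. The post does not show the query Elsevier actually used.

```python
import json


def bounding_box_query(kind, west, south, east, north, limit=100):
    """Build a search body that restricts results to a latitude/longitude box."""
    if not (-180 <= west < east <= 180):
        raise ValueError("Longitudes must satisfy -180 <= west < east <= 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("Latitudes must satisfy -90 <= south < north <= 90")
    return {
        "kind": kind,
        "limit": limit,
        "spatialFilter": {
            # Assumed name of the indexed spatial field.
            "field": "data.SpatialLocation.Wgs84Coordinates",
            "byBoundingBox": {
                "topLeft": {"latitude": north, "longitude": west},
                "bottomRight": {"latitude": south, "longitude": east},
            },
        },
    }


# Hypothetical box drawn by a user on the map.
query = bounding_box_query(
    "example:mining:work-product-component--GeoWorkspace:1.0.0",
    west=-115.5, south=40.0, east=-114.0, north=41.2,
)
payload = json.dumps(query)
```

The `payload` string would be sent as the body of a POST request to the Search service query endpoint. The simple range checks do not handle boxes that cross the antimeridian, which a production user interface would need to split into two queries or handle separately.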
Conclusion
The Open Group OSDU Forum is focused on building and advancing a standards-based, technology-agnostic data platform that helps the energy industry address the world’s ever-evolving energy needs through data management and data analytics. The extensibility of the OSDU Data Platform enables companies to build on the platform and advance it effectively. The Elsevier and AWS effort highlighted how OSDU can be extended into a completely new industry in an accelerated manner. In the absence of OSDU, a similar effort would have taken at least 6 months. With OSDU, the integration took less than 2 months to complete. The next steps are to work with the Open Group OSDU Forum and explore contributing the developed mining schemas to the forum. This effort also demonstrates how independent software vendors (ISVs) can work with the OSDU Data Platform and advance it further into new areas.