AWS for Industries
How NOV solved critical oilfield operations challenges using the Databricks Data Intelligence Platform
The global demand for oil remains high because oil powers transportation, industry, and electricity generation worldwide. One facet of the energy lifecycle is drilling, where operators face numerous challenges in maintaining a consistent oil supply to meet this demand. They must operate heavy machinery with precision in harsh environments while mitigating risks, controlling costs, and preventing disruptions to oil extraction and production. A key strategy to improve drilling efficiency and uptime is condition-based maintenance (CBM) of equipment. However, most operators lack the capabilities to implement CBM, which requires real-time monitoring and predictive data analytics.
What is CBM and why is it so challenging?
Condition-based maintenance (CBM) transforms drilling maintenance by using real-time sensor data and analytics to identify potential equipment failures before they occur. The goal is to prevent costly unplanned downtime and dangerous breakdowns, improving safety, reliability, efficiency, and sustainability in a high-stakes drilling environment where uptime is paramount. By shifting maintenance from a reactive to a proactive strategy, CBM extends asset lifespan, improves performance, and brings maintenance practices in line with an increasingly technology-driven industry.
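As a minimal sketch of the CBM idea, assuming a vibration sensor and a simple two-threshold policy, the Python below computes a condition indicator (root-mean-square vibration over a window of readings) and maps it to a maintenance recommendation. The threshold values and the names `rms` and `maintenance_action` are hypothetical illustrations, not NOV's or Databricks' logic; real CBM programs derive limits from equipment specifications and historical failure data.

```python
import math
from typing import Sequence

# Hypothetical thresholds, for illustration only (units: mm/s RMS velocity).
ALERT_RMS = 4.5
ALARM_RMS = 7.1


def rms(window: Sequence[float]) -> float:
    """Root-mean-square of a window of vibration readings (the condition indicator)."""
    if not window:
        raise ValueError("window must not be empty")
    return math.sqrt(sum(x * x for x in window) / len(window))


def maintenance_action(window: Sequence[float]) -> str:
    """Map the condition indicator to a maintenance recommendation."""
    level = rms(window)
    if level >= ALARM_RMS:
        return "schedule immediate inspection"
    if level >= ALERT_RMS:
        return "plan maintenance at next window"
    return "no action"


print(maintenance_action([5.0, -5.0]))  # plan maintenance at next window
```

The point of the threshold policy is that maintenance is triggered by the measured condition of the equipment rather than by a fixed calendar interval.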
Implementing CBM is challenging primarily from a data perspective. The difficulty lies in efficiently collecting, processing, and interpreting the large volumes of real-time data generated by sensors that monitor equipment condition. Integrating diverse data sources and maintaining compatibility among different systems and equipment adds further complexity, and standardized protocols for data analysis and maintenance practices are crucial. Additionally, remote locations often lack reliable telecommunications or the ability to prioritize bandwidth, which compounds these challenges.
Overcoming these challenges requires an open environment based on open standards that addresses compatibility issues, manages both streaming and batch data, provides advanced data analytics and machine learning (ML) tools, encourages collaboration across domain experts, and fosters a culture that values data-driven decisions. Successful CBM implementation hinges on the ability to manage the wealth of data generated by monitoring systems and derive meaningful insights from it.
The energy industry’s backbone is its extensive network of physical assets and infrastructure (rigs, ships, refineries, pipelines, and so on), some of which operate for decades. This machinery generates high-frequency sensor data (millions of data points) that can be used to reduce unplanned downtime, maximize output, and drive a high degree of process automation, ultimately leading to safer operations and higher performance.
Databricks Data Intelligence Platform on AWS
The Databricks Data Intelligence Platform from AWS Partner Databricks helps energy companies extract maximum value from a broad range of data sources, including assets, operations, the environment, and customer interactions. With this comprehensive data platform, companies can build energy solutions that prioritize safety, reliability, and intelligence. The platform improves operational efficiency, supports environmental sustainability, and is used to build energy solutions that are technologically advanced and attuned to the evolving needs of the industry, fostering a shift toward safer, more reliable, and smarter energy systems.
Figure 1: Scaled oilfield analytics: Databricks Data Intelligence Platform on AWS
The preceding figure shows a generic reference architecture of a Databricks Data Intelligence Platform leveraging AWS services to cover the swim lanes of Source, Ingest, Transform, Query and Process, Serve and Analysis/Output with storage shown at the bottom as Amazon S3. The architecture starts with various data sources, such as unstructured, semi-structured, and structured data from sensors and IoT devices, media files, logs, relational databases, and business applications.
In the Ingest layer, streaming events from Amazon Kinesis, Apache Kafka, and Azure Event Hubs are read directly into Databricks through Structured Streaming. The data is stored in Amazon Simple Storage Service (Amazon S3), where extract, transform, and load (ETL) pipelines use the medallion architecture to curate it progressively into Delta Lake tables (bronze for raw data, silver for cleaned data, and gold for business-level aggregates).
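The following plain-Python sketch illustrates what each medallion layer is responsible for. It assumes a hypothetical `RawEvent` record shape, treats rows whose values cannot be parsed as malformed, and uses one-minute averages as the gold aggregate. In Databricks, each layer would instead be a Delta table written by Structured Streaming or batch jobs; lists and dicts stand in for those tables here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable

SilverRow = tuple[str, str, int, float]  # (rig_id, tag, ts, value)


@dataclass(frozen=True)
class RawEvent:
    """Bronze: the event exactly as ingested (hypothetical shape)."""
    rig_id: str
    tag: str
    ts: int      # epoch seconds
    value: str   # raw payload, may be malformed


def to_silver(bronze: Iterable[RawEvent]) -> list[SilverRow]:
    """Silver: parse values, drop malformed rows, de-duplicate on (rig, tag, ts)."""
    seen: set[tuple[str, str, int]] = set()
    out: list[SilverRow] = []
    for e in bronze:
        try:
            v = float(e.value)
        except ValueError:
            continue  # a production pipeline would quarantine these rows instead
        key = (e.rig_id, e.tag, e.ts)
        if key in seen:
            continue  # duplicate delivery from the stream
        seen.add(key)
        out.append((e.rig_id, e.tag, e.ts, v))
    return out


def to_gold(silver: list[SilverRow], bucket_s: int = 60) -> dict[tuple[str, str, int], float]:
    """Gold: mean value per rig, tag, and time bucket, ready for dashboards or models."""
    buckets: dict[tuple[str, str, int], list[float]] = {}
    for rig, tag, ts, v in silver:
        buckets.setdefault((rig, tag, ts - ts % bucket_s), []).append(v)
    return {k: mean(vs) for k, vs in buckets.items()}


bronze = [
    RawEvent("rig-1", "hookload", 0, "10"),
    RawEvent("rig-1", "hookload", 0, "10"),    # duplicate delivery
    RawEvent("rig-1", "hookload", 30, "20"),
    RawEvent("rig-1", "hookload", 45, "n/a"),  # malformed payload
]
silver = to_silver(bronze)
print(to_gold(silver))  # {('rig-1', 'hookload', 0): 15.0}
```

Keeping the raw bronze records untouched means the silver and gold layers can be rebuilt if the cleaning rules change.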
Figure 2: Databricks Workspace
The Transform layer handles transformations using the Spark and Photon engines of the Databricks Data Intelligence Platform. The Query and Process layer supports SQL queries on SQL warehouses, as well as Python and Scala code in Databricks workspaces (see the preceding figure). It also supports data science workloads through the Databricks data science and ML environment, with specialized ML runtimes for both AutoML and custom ML code. MLflow and Feature Store support the end-to-end data science workflow.
In the Serve layer, the Databricks Data Intelligence Platform provides Databricks SQL, including serverless SQL warehouses, as its data warehouse offering for business intelligence (BI) use cases, along with capabilities for batch and serverless real-time model inference. External systems, such as operational databases, can also store final data products and serve them to users through the Analysis/Output layer in the form of apps and BI tools.
For data governance, Unity Catalog is the central governance solution of Databricks, providing metadata storage, data discovery, lineage tracking, and fine-grained access control. For orchestration, Databricks Workflows is a flexible workflow management solution that runs millions of orchestrated pipelines in production for Databricks customers. Additionally, continuous integration/continuous delivery (CI/CD) support and MLOps Stacks help automate the supporting processes.
NOV uses the Databricks Data Intelligence Platform to enable CBM strategies
NOV is a global oilfield services corporation. To scale and make data-driven decisions across its operations, NOV set out to create a data command center that ingests real-time streaming data and integrates it seamlessly with historical batch data to provide a holistic view of well site activities. NOV has implemented advanced data analytics and visualization tools so that operators can promptly identify anomalies, potential issues, and trends in the well data.
Data landscape overview
Rigs generate over 1 billion rows of data per day from 30,000+ sensors streaming at 1 Hz or higher. This sensor data streams into central repositories such as the Aveva PI System (formerly OSIsoft PI System), yet corporate data and equipment details remain siloed, and limited data access leads to duplicated effort. A robust model deployment pipeline significantly improves data-driven optimization, so modernizing to a unified, scalable data environment is critical for unlocking data-driven insights.
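A quick back-of-envelope check shows why the volume is this large. Assuming each sensor emits one row per reading and counting only the 1 Hz minimum rate (both stated figures are lower bounds), 30,000 sensors alone produce about 2.6 billion rows per day, which is consistent with the "over 1 billion" figure. The helper name `rows_per_day` is illustrative.

```python
def rows_per_day(sensors: int, sample_hz: float) -> int:
    """Rows generated per day, assuming one row per sensor reading."""
    seconds_per_day = 24 * 60 * 60  # 86,400
    return int(sensors * sample_hz * seconds_per_day)


print(f"{rows_per_day(30_000, 1):,} rows/day")  # 2,592,000,000 rows/day
```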
Model deployment raises critical questions about scheduling, infrastructure, and scalability: how to schedule the model efficiently, which runtime environment it should run in, and how to scale it across diverse assets. Formalizing and scheduling the data pipeline that feeds the model is equally important, because consistent and reliable predictions depend on consistent input data. Additional considerations for analytics include the following:
- An environment with high-quality, connected product data.
- A way to scale not only model development but also data analysis.
- An environment to scale model training, inference, and deployment across hundreds of assets.
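One common pattern for the last consideration is to fit and score one lightweight model per asset and fan that work out in parallel. The sketch below assumes a simple z-score baseline as a stand-in model and uses `ThreadPoolExecutor` as a stand-in for cluster-level parallelism; on Databricks the same fan-out would more typically run on Spark with MLflow tracking the models. The function names, asset IDs, and the 3-sigma limit are hypothetical, and this is not NOV's model.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def fit_baseline(history: list[float]) -> tuple[float, float]:
    """'Train' a per-asset model: mean and standard deviation of healthy readings."""
    mu = mean(history)
    sigma = pstdev(history)
    return mu, (sigma if sigma > 0 else 1.0)  # avoid dividing by zero for flat signals


def count_anomalies(model: tuple[float, float], readings: list[float], z_limit: float = 3.0) -> int:
    """Count readings whose z-score against the asset's baseline exceeds z_limit."""
    mu, sigma = model
    return sum(abs(x - mu) / sigma > z_limit for x in readings)


def run_fleet(histories: dict[str, list[float]], latest: dict[str, list[float]],
              workers: int = 8) -> dict[str, int]:
    """Fit and score one model per asset, running the assets in parallel."""
    def job(asset_id: str) -> tuple[str, int]:
        model = fit_baseline(histories[asset_id])
        return asset_id, count_anomalies(model, latest.get(asset_id, []))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(job, histories))


histories = {"pump-01": [10.0, 10.2, 9.8, 10.1], "pump-02": [5.0, 5.1, 4.9, 5.0]}
latest = {"pump-01": [10.0, 14.0], "pump-02": [5.0, 5.05]}
print(run_fleet(histories, latest))  # {'pump-01': 1, 'pump-02': 0}
```

Because each asset's job is independent, adding assets scales the work horizontally rather than making a single model larger.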
Using Databricks on AWS, NOV processes its data for timely analysis, proactive identification of equipment failures, and optimized maintenance. Databricks enables advanced analytics, ML model deployment, equipment health prediction, and efficient maintenance scheduling.
The environment met NOV's initial needs, but to scale, NOV also needed to bring in data from the Aveva PI System and other sources. The following setup helped NOV scale with simplified administration and governance.
Figure 3: The initial waterfall approach
Initially, NOV followed a traditional waterfall approach: stream all data into Delta Lake, create large datasets, build components upfront, and then handle backfills and monitoring. However, this approach led to resource-intensive processing, stream issues that affected downstream model runs, slow backfills, and difficulties managing metadata.
NOV then transitioned to an agile, project-based methodology: building Delta Lake incrementally, streaming only the required data, grouping streams by service level agreement (SLA), and iterating on component templates. This reduced processing overhead, kept stream issues localized, sped up backfills, and made metadata easier to track.
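The "stream only required data, grouped by SLA" idea can be sketched as a small planning step. Assuming each consumer (a model or report) declares which tags it needs and how fresh they must be, the hypothetical `plan_streams` function assigns every required tag to the tightest SLA any consumer asks for, then groups tags by SLA so each group can run as its own stream or job. Tag names, consumers, and SLA values are invented for illustration, and the "tightest SLA wins" rule is an assumption, not a documented NOV rule.

```python
from collections import defaultdict

# Hypothetical consumer requirements: which tags each consumer needs, and the SLA in seconds.
REQUIREMENTS = [
    {"consumer": "mud_pump_health", "tags": ["pump1.pressure", "pump1.spm"], "sla_s": 60},
    {"consumer": "drawworks_health", "tags": ["dw.motor_temp"], "sla_s": 60},
    {"consumer": "daily_report", "tags": ["pump1.pressure", "rig.depth"], "sla_s": 3600},
]


def plan_streams(requirements: list[dict]) -> dict[int, list[str]]:
    """Group only the required tags by the tightest SLA any consumer needs."""
    tightest: dict[str, int] = {}
    for req in requirements:
        for tag in req["tags"]:
            tightest[tag] = min(req["sla_s"], tightest.get(tag, req["sla_s"]))
    groups: dict[int, set[str]] = defaultdict(set)
    for tag, sla in tightest.items():
        groups[sla].add(tag)
    return {sla: sorted(tags) for sla, tags in sorted(groups.items())}


print(plan_streams(REQUIREMENTS))
# {60: ['dw.motor_temp', 'pump1.pressure', 'pump1.spm'], 3600: ['rig.depth']}
```

Tags that no consumer requests never appear in the plan, and a failure in the hourly group does not delay the one-minute group.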
Lessons learned included maintaining data quality through gap detection, robust stream monitoring with Datadog integration, dedicated job monitoring and backfilling services, a centralized metadata library, and a defined governance plan using Unity Catalog. Creating reusable utilities, centralizing cluster operations, and providing ongoing training further enabled scaling analytics across the organization.
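Gap detection can be sketched in a few lines. Assuming a tag should report at a fixed period (1 second here, matching the 1 Hz rate mentioned earlier), the hypothetical `find_gaps` function returns each interval where consecutive timestamps are further apart than expected, plus an optional tolerance. In practice, such intervals could be handed to a backfilling service; NOV's actual gap-detection logic is not described in this post.

```python
from typing import Sequence


def find_gaps(timestamps: Sequence[int], expected_period_s: int = 1,
              tolerance_s: int = 0) -> list[tuple[int, int]]:
    """Return (last_before_gap, first_after_gap) pairs where samples are missing."""
    ts = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ts, ts[1:]):
        # A gap exists when consecutive samples are further apart than allowed.
        if cur - prev > expected_period_s + tolerance_s:
            gaps.append((prev, cur))
    return gaps


print(find_gaps([0, 1, 2, 5, 6, 10]))  # [(2, 5), (6, 10)]
```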
Implementing Databricks on AWS has been instrumental in advancing CBM strategies for drilling equipment and has substantially improved supply chain management for these critical assets. By applying data analytics and ML on the Databricks Data Intelligence Platform, organizations can efficiently monitor the health and performance of drilling equipment in real time.
This proactive approach enables predictive maintenance, reducing downtime and mitigating the oversupply of inventory. The resulting supply chain efficiency not only optimizes resource allocation but also helps reduce the carbon footprint associated with excess production and transportation. Databricks on AWS integrates technology and sustainability to drive positive transformation in the drilling equipment industry. An open and flexible data environment allows customers to address their current data needs and build a scalable architecture that is purpose-built for future data challenges. Contact Databricks to learn more about the Databricks Data Intelligence Platform.