AWS for Industries

How NOV solved critical oilfield operations using Databricks Data Intelligence Platform

The global demand for oil remains high as it powers transportation, industries, and electricity worldwide. One facet of the energy lifecycle is drilling, where drilling operators face numerous challenges in maintaining consistent oil supply to meet this demand. They must navigate harsh environments using heavy machinery with precision, while controlling costs and risks. Drilling operators need to find ways to mitigate risks, control costs, and prevent disruptions of oil extraction and production. A key strategy to improve drilling efficiency and uptime is condition-based maintenance (CBM) of equipment. However, most operators lack the capabilities to implement CBM, which needs real-time monitoring and predictive data analytics.

What is CBM and why is it so challenging?

Condition-based maintenance (CBM) revolutionizes drilling maintenance by using real-time sensor data and analytics to proactively identify potential equipment failures before they occur. This is meant to prevent costly unplanned downtime and dangerous breakdowns, enhancing safety, reliability, efficiency, and sustainability in the high-stakes drilling environment where uptime is paramount. By shifting from reactive to proactive strategies, CBM optimizes asset lifespan and performance, aligning maintenance with the fast-paced, tech-driven drilling industry.
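As a minimal illustration of the idea, the following Python sketch flags a sensor reading that drifts well outside its recent baseline. The window size, the threshold of three standard deviations, and the vibration values are illustrative assumptions, not parameters taken from NOV or Databricks; production CBM models are typically far more sophisticated.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing window.

    A reading is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` readings.
    """
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = mean(recent)
            spread = stdev(recent)
            # Skip the check when the window is perfectly flat (spread == 0).
            if spread > 0 and abs(value - baseline) > threshold * spread:
                flagged.append(i)
        recent.append(value)
    return flagged


# Hypothetical vibration readings (mm/s) with one sudden spike at index 7.
vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 6.5, 2.1, 2.0]
anomalies = flag_anomalies(vibration)
```

Here `anomalies` holds the position of the spike, which a maintenance workflow could turn into an inspection request before the equipment fails.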

Implementing CBM is challenging primarily from a data perspective. The difficulty lies in efficiently collecting, processing, and interpreting large volumes of real-time data generated by sensors monitoring equipment conditions. Integrating diverse data sources and maintaining compatibility among different systems and equipment adds further complexity. Establishing standardized protocols for data analysis and maintenance practices is crucial. Additionally, remote locations often lack reliable telecommunications or bandwidth prioritization, which compounds these challenges.

Overcoming these challenges requires an open environment built on open standards that addresses compatibility issues, manages both streaming and batch data, provides advanced data analytics and machine learning (ML) tools, encourages collaboration between experts, and fosters a culture that values data-driven decisions. Successful CBM implementation hinges on the ability to manage the wealth of data generated by monitoring systems and to derive meaningful insights from it.

The energy industry’s backbone is its extensive network of physical assets and infrastructure (rigs, ships, refineries, pipelines, and so on), which is sometimes operated for decades. This complex machinery generates high-frequency sensor data (millions of data points) that can be used to reduce unplanned downtime, maximize output, and drive a high degree of process automation, ultimately leading to safer operations and higher performance.

Databricks Data Intelligence Platform on AWS

The Databricks Data Intelligence Platform from Databricks, an AWS partner, empowers energy companies to extract maximum value from a spectrum of data sources, encompassing assets, operations, environment, and customer interactions. By using this comprehensive data platform, companies can revolutionize their approach to create energy solutions that prioritize safety, reliability, and intelligence. The Databricks Data Intelligence Platform optimizes operational efficiency to deliver enhanced environmental sustainability. It is used for building energy solutions that are not only technologically advanced, but also attuned to the evolving needs of the industry, fostering a shift towards safer, more reliable, and smarter energy landscapes.

Figure 1: Scaled oilfield analytics: Databricks Data Intelligence Platform on AWS

The preceding figure shows a generic reference architecture of a Databricks Data Intelligence Platform leveraging AWS services to cover the swim lanes of Source, Ingest, Transform, Query and Process, Serve and Analysis/Output with storage shown at the bottom as Amazon S3. The architecture starts with various data sources, such as unstructured, semi-structured, and structured data from sensors and IoT devices, media files, logs, relational databases, and business applications.

In the Ingest layer and storage, streaming events from Amazon Kinesis, Kafka, and Event Hub are directly read into Databricks through Structured Streaming. The data is stored in Amazon Simple Storage Service (Amazon S3), where extract, transform, and load (ETL) pipelines use the medallion architecture to store data in a curated way as Delta files/tables.
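The layering logic of the medallion architecture can be sketched without any platform dependencies. On Databricks these layers would typically be Delta tables fed by Structured Streaming; the sketch below uses plain Python lists only to show how raw (bronze) data is refined into cleaned (silver) and aggregated (gold) data. The field names `sensor_id`, `ts`, and `value`, and the sample records, are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Bronze: raw events exactly as ingested, including bad and duplicate records.
bronze = [
    {"sensor_id": "pump-1", "ts": 0, "value": "101.5"},
    {"sensor_id": "pump-1", "ts": 0, "value": "101.5"},  # duplicate
    {"sensor_id": "pump-1", "ts": 1, "value": "bad"},    # unparseable
    {"sensor_id": "pump-1", "ts": 2, "value": "102.5"},
    {"sensor_id": "pump-2", "ts": 0, "value": "55.0"},
]


def to_silver(raw_events):
    """Silver: typed, validated, de-duplicated records."""
    seen = set()
    cleaned = []
    for event in raw_events:
        try:
            value = float(event["value"])
        except ValueError:
            continue  # drop records whose value cannot be parsed
        key = (event["sensor_id"], event["ts"])
        if key in seen:
            continue  # drop repeats of the same sensor and timestamp
        seen.add(key)
        cleaned.append(
            {"sensor_id": event["sensor_id"], "ts": event["ts"], "value": value}
        )
    return cleaned


def to_gold(silver_events):
    """Gold: business-level aggregate, here the mean value per sensor."""
    by_sensor = defaultdict(list)
    for event in silver_events:
        by_sensor[event["sensor_id"]].append(event["value"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the bronze layer untouched means the silver and gold layers can be rebuilt at any time if the cleaning or aggregation rules change.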

Figure 2: Databricks Workspace

The Transform layer handles the transformations using Spark and Photon engines of the Databricks Data Intelligence Platform. The Query and Process layer supports both SQL queries and SQL Warehouses, as well as Python and Scala queries through Databricks Workspaces (see the preceding figure). It also supports data science workloads through the Databricks DS/ML environment with specialized ML runtimes for AutoML and coding ML jobs. The overall data science workflows are best supported by MLflow and Feature Store.

In the Serve layer, the Databricks Data Intelligence Platform provides both “Databricks SQL” and “Serverless SQL” as Data Warehouse offerings for business intelligence (BI) use cases, with capabilities for batch and serverless real-time inference. External systems such as operational databases can also be used to store final data products and serve them to users through the Analysis/Output layer in the form of Apps and BI Tools.

For data governance and orchestration, Unity Catalog is the central data governance solution of Databricks, providing capabilities such as metadata storage, data discovery, lineage tracking, and fine-grained access control. Databricks Workflows is a flexible workflow management solution running millions of orchestrated pipelines in production for Databricks’ customers. Additionally, continuous integration/continuous delivery (CI/CD) support and the MLOps Stacks allow for orchestrating the supporting processes.

NOV uses the Databricks Data Intelligence Platform to enable CBM strategies

NOV is a global oilfield services corporation. To achieve scale and make data-driven decisions across its operations, NOV aspired to create a data command center that ingests real-time streaming data and integrates it seamlessly with historical batch data to provide a holistic overview of well site activities. The company has implemented advanced data analytics and visualization tools so that operators can promptly identify anomalies, potential issues, and trends in the well data.

Data landscape overview

Rigs generate over 1 billion rows of data per day from 30,000+ sensors streaming at 1 Hz or higher. This sensor data streams seamlessly into central repositories such as Aveva PI, formerly known as OSIsoft, yet corporate data and equipment details remain siloed, and limited data access forces teams to duplicate effort. A robust model deployment pipeline significantly improves data-driven optimization. Modernizing to a unified, scalable data environment is critical for unlocking data-driven insights.
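A quick back-of-the-envelope check shows how these figures relate. The calculation below assumes, purely for illustration, that each of 30,000 sensors emits exactly one reading per second around the clock and that every reading is stored as one row; the real layout of rows and the number of active sensors may differ.

```python
SENSORS = 30_000                 # lower bound quoted in the text
SAMPLE_RATE_HZ = 1               # lowest streaming rate quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

# Assumption: one stored row per sensor reading.
rows_per_day = SENSORS * SAMPLE_RATE_HZ * SECONDS_PER_DAY
print(f"{rows_per_day:,} rows per day")
```

Under these assumptions the total is roughly 2.6 billion rows per day, which is consistent with the "over 1 billion" figure quoted above.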

In contemplating model deployment, critical considerations emerge regarding scheduling, infrastructure, and scalability. Questions arise regarding how to efficiently schedule the model, determine its runtime environment, and scale it effectively across diverse assets. The need to formalize and schedule the data pipeline that feeds into the model is equally important, providing a seamless integration that supports consistent and reliable predictions. Additional considerations for analytics include the following:

  • An environment with quality connected product data.
  • A way to scale analysis, not only model development but also data analysis.
  • An environment to scale model, training, inference, and deployment on hundreds of assets.

Using Databricks on AWS, NOV handles its data in a way that supports timely analysis, proactive identification of equipment failures, and optimized maintenance. Databricks enables advanced analytics, ML model deployment, equipment health prediction, and efficient maintenance scheduling.

The environment met the initial needs of NOV, but NOV also needed to get data from Aveva PI System and other sources to scale. Furthermore, the following setup helped NOV to scale with simplified administration and governance.

Figure 3: The initial waterfall approach

Initially, a traditional waterfall approach streamed all data into Delta Lake, created large datasets, built components upfront, and handled backfills and monitoring. However, challenges arose with resource-intensive processing, stream issues impacting downstream model runs, slow backfills, and metadata management struggles.

Then, NOV transitioned to an agile, project-based methodology: building Delta Lake incrementally, streaming only the required data, grouping streams by service level agreements (SLAs), and iterating on component templates. This reduced processing overhead, localized stream issues, expedited backfills, and facilitated metadata tracking.
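One way to picture the grouping step is to assign each required signal to a stream according to its SLA, so that a problem in one stream does not delay the others. The sketch below is a hypothetical illustration of that idea; the tag names and SLA tiers are invented and do not reflect NOV's actual configuration.

```python
from collections import defaultdict

# Hypothetical catalogue of required signals and the SLA each must meet.
required_signals = [
    {"tag": "top_drive.torque", "sla": "1min"},
    {"tag": "mud_pump.pressure", "sla": "1min"},
    {"tag": "drawworks.oil_temp", "sla": "15min"},
    {"tag": "generator.fuel_rate", "sla": "daily"},
]


def group_by_sla(signals):
    """Map each SLA tier to the sorted list of tags that stream under it."""
    groups = defaultdict(list)
    for signal in signals:
        groups[signal["sla"]].append(signal["tag"])
    return {sla: sorted(tags) for sla, tags in groups.items()}


stream_groups = group_by_sla(required_signals)
```

Each entry in `stream_groups` could then back its own streaming job, sized and monitored according to how fresh its data must be.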

Lessons learned included maintaining data quality through gap detection, robust stream monitoring with Datadog integration, dedicated job monitoring and backfilling services, a centralized metadata library, and a defined governance plan using Unity Catalog. Creating reusable utilities, centralizing cluster operations, and providing ongoing training further enabled scaling analytics across the organization.
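Gap detection, the first lesson above, can be illustrated with a short sketch. It assumes timestamps in seconds, sorted in ascending order, from a sensor expected to report at 1 Hz; the function name, the tolerance, and the sample series are hypothetical and not taken from NOV's implementation.

```python
def find_gaps(timestamps, expected_interval=1.0, tolerance=0.5):
    """Return (start, end) pairs where consecutive samples are too far apart.

    `timestamps` are seconds, sorted ascending. A gap is reported when the
    spacing between two neighbours exceeds `expected_interval + tolerance`.
    """
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > expected_interval + tolerance:
            gaps.append((previous, current))
    return gaps


# Hypothetical 1 Hz series with samples missing between t=3 and t=7.
samples = [0, 1, 2, 3, 7, 8, 9]
gaps = find_gaps(samples)
```

The returned ranges identify exactly which intervals a backfilling service would need to request again from the source system.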

Implementing Databricks on AWS has proven instrumental in advancing CBM strategies for drilling equipment, leading to substantial improvements in the supply chain management of these crucial assets. By harnessing the power of data analytics and ML on the Databricks Data Intelligence Platform, organizations can efficiently monitor the health and performance of drilling equipment in real time.

This proactive approach enables predictive maintenance, reducing downtime and mitigating the over-supply of inventory. Consequently, the enhanced supply chain efficiency not only optimizes resource allocation, but also contributes significantly to the reduction of the carbon footprint associated with excess production and transportation. Databricks on AWS emerges as a pivotal tool, seamlessly integrating technology and sustainability to drive positive transformations in the drilling equipment industry. An open and flexible data environment allows customers to address their current data needs and create a scalable architecture that is purpose-built for their future data challenges. Contact Databricks to learn more about the Databricks Data Intelligence Platform.

Crystal Blankenship

As NOV’s Analytics Operations Manager, Crystal Blankenship is responsible for developing a platform that optimizes and scales NOV’s data and analytics developments. Her team contributes to the analytical and data science lifecycle by developing an environment and processes for rapid analysis and model development, streamlining the analytics lifecycle, and providing expertise for automating analytical developments across the model lifecycle. Before joining NOV, she worked extensively as a Data Engineer and Data Scientist on a variety of analytical tool and model developments, mostly within the energy and finance industries. She enjoys being part of new model and analytics developments and working through new challenges.

Andrew Kraemer

Andrew Kraemer is a Data Scientist who specializes in helping customers build and deploy custom machine learning solutions. He joined Databricks as a Solutions Architect in 2023 and focuses on enabling customers in the energy space.

Caitlin Gordon

Caitlin Gordon leads Industry Marketing for Manufacturing, Transportation, and Energy at Databricks, where she led the launch of the Data Intelligence Platform for Energy in 2024. In this role, Caitlin brings technical solutions to a range of data consumers, democratizing data for all in these highly important and complex industries.

Reishin Toolsi

Reishin Toolsi is a recognized leader in Data and AI for the energy industry. He worked for 14 years at SLB across multiple business units, all across the energy value chain. Currently, he is a Senior Solutions Architect and Energy Subject Matter Expert for Databricks. This position has allowed him to combine his passion for AI with his expertise in the connected energy ecosystem, leading the effort to simplify the way Databricks clients access and use data.

Dhruv Vashisth

Dhruv Vashisth, a principal solutions architect for Global Energy Partners at AWS, brings over 19 years of deep experience in architecting and implementing enterprise solutions, with a 15-year tenure specifically in the energy industry. Dhruv is dedicated to helping AWS energy partners in constructing upstream and decarbonization solutions on AWS. Since joining AWS in 2019, Dhruv has been driving the success of energy partners by leading solution architecture, solution launches, and joint go-to-market strategies on AWS.