AWS for Industries
Streamlining U.S. EPA Subpart W Greenhouse Gas Reporting with AWS and Sustainability Insights Framework
Introduction
Oil and Gas companies operate in a complex global environment that requires them to balance energy demand with a lower carbon footprint. The United States Environmental Protection Agency (EPA) requires Oil and Gas companies to report greenhouse gas (GHG) emissions annually so that it can track high GHG-emitting facilities and develop climate policies accordingly. GHG reporting for an Oil and Gas company involves collecting data from a wide range of systems, calculating emissions, and reviewing emission data before submitting it to the EPA. Preparing this data can be a complex undertaking and can consume a significant portion of the GHG reporting effort if the right solutions aren’t used.
In this post we show you how to build a cloud-focused GHG reporting solution using AWS services and the AWS Sustainability Insights Framework (SIF), which help streamline the GHG reporting process.
Business challenges
Scale and scope of data collection: Oil and Gas operations often span multiple facilities, equipment types, and processes, each of which must be monitored and documented to compile a comprehensive GHG emissions report. Based on the scale of operations, companies need to document and report on a wide range of emissions sources including combustion (fuel consumption for drilling, completions, and operations), fugitives (pneumatics, reciprocating compressors, population factors, and leaking equipment), and flaring/venting sources (flares from tanks, pilots, production, and compressor engine blowdowns). Consolidating this data from disparate sources and systems can be complex and resource-intensive for companies.
Time-intensive activities: The time and effort required to prepare these comprehensive GHG reports can be a significant drain on company resources, which diverts valuable personnel time away from core business activities. EPA reporting requirements and guidelines are also constantly evolving, which can necessitate continuous process improvements and system updates.
Calculation of emissions: Emissions calculations are based on complex methodologies and factors specified by the EPA, requiring specialized expertise and robust internal controls. Ensuring the accuracy and integrity of this emissions data is critical, as any errors or discrepancies could expose the company to regulatory scrutiny, financial penalties, and reputational damage.
Validation and audit: Review and validation of emissions data prior to submission adds another layer of complexity to the GHG reporting process. These companies must have governance structures in place, with cross-functional teams composed of subject matter experts, data analysts, and compliance professionals, to thoroughly vet the information before it is filed with the EPA.
Solution overview
The solution centers around the creation of a carbon data lake to bring the disparate data necessary for reporting together into one centralized location. In building this data lake, the team identified the necessary data sources and created a generic ingest mechanism to handle both existing and new data sources. A standardized data model was created that allows for the quick onboarding of new activity data into the data lake and enables uniform downstream processing of the data regardless of the source.
Calculations on the ingested data were conducted to generate the KPIs required for EPA reporting. These calculations were managed in SIF. This guidance accelerates the building of carbon accounting solutions by providing functionality such as emissions factor management, calculations, processing pipelines, and audit logging out of the box. Calculations in SIF are defined through a low-code language that allows business stakeholders to manage the calculations without having to involve the customer’s IT team.
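To make the calculation step concrete, the following is a minimal Python sketch of a simple combustion-emissions formula. In the solution itself this kind of logic would be expressed in SIF’s low-code calculation syntax rather than in Python, and SIF would supply the emission factors; the function, parameter names, and inputs below are illustrative only.

```python
# Illustrative only: a simplified combustion-emissions calculation. In the
# solution this logic lives in SIF's low-code calculation syntax, and the
# factor values come from SIF's managed emission factor data.

def combustion_co2_metric_tons(fuel_quantity: float,
                               hhv_mmbtu_per_unit: float,
                               ef_kg_co2_per_mmbtu: float) -> float:
    """Return CO2 emissions in metric tons for a fuel combustion source."""
    kg_co2 = fuel_quantity * hhv_mmbtu_per_unit * ef_kg_co2_per_mmbtu
    return kg_co2 / 1000.0  # convert kg to metric tons


# Example with representative natural gas inputs (illustration only)
print(combustion_co2_metric_tons(fuel_quantity=12_500,        # scf of fuel burned
                                 hhv_mmbtu_per_unit=1.026e-3,  # MMBtu per scf
                                 ef_kg_co2_per_mmbtu=53.06))   # kg CO2 per MMBtu
```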
Then, the resulting calculated KPIs were used to build the reports needed for annual EPA reporting requirements. Dashboards were also built using this data for more frequent assessments of emissions and mitigation projects.
A key requirement of a GHG reporting solution is audit records of calculated KPIs and data lineage. SIF automatically creates audit records for calculations performed by the framework, and data lineage is a built-in feature of the data lake. See the Audit service section for more detail.
Architecture
This architecture illustrates the data flow from ingestion to the reporting dashboards displaying the calculated results. In this section we dive deeper into each block called out in the architecture.
Block 1: This represents raw activity data from the data lake and outside systems. Activity data is data from daily operations that contribute to carbon emissions. The ingestion service is used to pull this data into the system.
Block 2 is the ingestion service. This service handles the ingestion of data from various data sources. The ingestion service runs on a schedule, ingests new data, conducts data quality checks, and writes the data to Amazon S3.
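As a rough illustration of Block 2, the following is a minimal sketch of a scheduled ingestion job, assuming an AWS Lambda handler. The bucket names, the fetch_new_records reader, and the data quality rule are hypothetical placeholders, not part of the actual solution.

```python
# Minimal sketch of a scheduled ingestion job (hypothetical names throughout).
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
RAW_BUCKET = "example-carbon-raw-data"    # placeholder bucket for ingested data (Block 4)
DQ_BUCKET = "example-carbon-dq-results"   # placeholder bucket for data quality results (Block 3)


def fetch_new_records(source: str) -> list[dict]:
    """Placeholder: in practice this reads new records from the source system."""
    return []


def check_required_fields(record: dict) -> dict:
    """Placeholder data quality rule: flag records that are missing key fields."""
    required = {"facility_id", "equipment_id", "activity_date", "quantity"}
    return {"record": record, "missing_fields": sorted(required - record.keys())}


def handler(event, context):
    records = fetch_new_records(event["source"])
    dq_results = [check_required_fields(r) for r in records]

    prefix = f"{event['source']}/{datetime.now(timezone.utc):%Y/%m/%d/%H%M%S}"
    s3.put_object(Bucket=RAW_BUCKET, Key=f"{prefix}/records.json",
                  Body=json.dumps(records).encode("utf-8"))
    s3.put_object(Bucket=DQ_BUCKET, Key=f"{prefix}/dq_results.json",
                  Body=json.dumps(dq_results).encode("utf-8"))
```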
Block 3 shows the S3 bucket that stores the data quality results.
Block 4 is the Amazon S3 location where the ingested data is written. This data is retained as a historical archive of the ingested data and serves as the source for further data processing.
Block 5 shows the Activity Service. This service reads the data changes resulting from the ingestion process and writes the data into the Amazon Aurora database.
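A sketch of how the Activity Service might write a standardized activity row into Aurora is shown below, assuming an Aurora cluster with the RDS Data API enabled. The ARNs, database, table, and column names are placeholders chosen for this post.

```python
# Sketch of writing one standardized activity record to Aurora via the RDS Data API.
import boto3

rds_data = boto3.client("rds-data")

CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:carbon-data-lake"    # placeholder
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:carbon-db"  # placeholder


def write_activity(facility_id: str, equipment_id: str, activity_date: str,
                   activity_type: str, quantity: float) -> None:
    rds_data.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database="carbon",  # placeholder database name
        sql=("INSERT INTO activity "
             "(facility_id, equipment_id, activity_date, activity_type, quantity) "
             "VALUES (:facility_id, :equipment_id, :activity_date, :activity_type, :quantity)"),
        parameters=[
            {"name": "facility_id", "value": {"stringValue": facility_id}},
            {"name": "equipment_id", "value": {"stringValue": equipment_id}},
            {"name": "activity_date", "value": {"stringValue": activity_date}},
            {"name": "activity_type", "value": {"stringValue": activity_type}},
            {"name": "quantity", "value": {"doubleValue": quantity}},
        ],
    )
```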
Block 6 is the Aurora database where data is stored in a standardized data model across ingestion sources. The data model is shown in the following image, along with a brief explanation of each table:
- Activity: Tracks daily/weekly/yearly activities for each source. For example, for a compressor engine, we track the hours it was running each day.
- Activity Config: Each activity type can have different information captured by the activity service, and this information is present in the Activity Config table.
- Equipment: Tracks the equipment, their counts, active status, and location.
- Equipment Metadata Config: Tracks the metadata config changes for equipment. Each equipment type in the Equipment table can have different columns to capture, and that information is present in the Equipment Metadata Config table.
- Facility: This table contains information about each facility, such as location, name, address, and status.
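For illustration, the core tables of this data model could be represented as the following Python dataclasses; the exact columns in the solution’s Aurora schema may differ.

```python
# Illustrative representation of the standardized data model's core tables.
from dataclasses import dataclass
from datetime import date


@dataclass
class Facility:
    facility_id: str
    name: str
    address: str
    status: str


@dataclass
class Equipment:
    equipment_id: str
    facility_id: str
    equipment_type: str    # e.g. "compressor_engine"
    count: int
    is_active: bool
    location: str


@dataclass
class Activity:
    activity_id: str
    equipment_id: str
    activity_type: str     # e.g. "run_hours"
    activity_date: date
    quantity: float        # e.g. hours the compressor engine ran that day
    unit: str
```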
Block 7 represents the calculation service. This service coordinates with SIF: it invokes the SIF pipeline for the corresponding activity, verifies that the SIF pipeline completed successfully, and, once complete, retrieves the results from SIF and stores them in an emissions table in Aurora.
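The coordination logic of the calculation service can be sketched as follows. The helper functions stand in for calls to the configuration service, the SIF pipeline APIs, and the Aurora write path; they are hypothetical placeholders rather than the actual SIF API, which is documented with the framework.

```python
# Sketch of the calculation service's coordination logic (Block 7).
import time


def lookup_pipeline_for_activity(activity_type: str) -> str:
    """Placeholder for a configuration-service lookup (see Block 13)."""
    return f"pipeline-{activity_type}"


def start_pipeline(pipeline_id: str, activity_batch: list) -> str:
    """Placeholder for starting a SIF pipeline execution."""
    return "execution-001"


def get_pipeline_status(pipeline_id: str, execution_id: str) -> str:
    """Placeholder for polling a SIF pipeline execution."""
    return "success"


def get_pipeline_results(pipeline_id: str, execution_id: str) -> list:
    """Placeholder for retrieving calculated emissions from SIF."""
    return []


def write_emissions_to_aurora(results: list) -> None:
    """Placeholder for inserting results into the emissions table in Aurora."""


def run_calculation(activity_type: str, activity_batch: list) -> list:
    pipeline_id = lookup_pipeline_for_activity(activity_type)
    execution_id = start_pipeline(pipeline_id, activity_batch)

    # Poll until the SIF pipeline completes, then persist the results.
    while (status := get_pipeline_status(pipeline_id, execution_id)) not in ("success", "failed"):
        time.sleep(30)
    if status == "failed":
        raise RuntimeError(f"SIF pipeline {pipeline_id} failed")

    results = get_pipeline_results(pipeline_id, execution_id)
    write_emissions_to_aurora(results)
    return results
```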
Block 8 is the SIF module. SIF keeps our calculations configurable: calculations are defined in a low-code syntax that is familiar to business stakeholders. SIF also manages emissions factor data, which is referenced in the emissions calculations. The calculation service invokes calculations in SIF and stores the results in the Aurora database in the data lake.
Block 9 shows the reporting service. At this stage, we ingest new activities and calculate daily emissions, which are ready for reporting and dashboarding. The reporting service is responsible for the following tasks:
- Metric retrieval: Fetches the calculated metrics (customer-defined KPIs) from SIF.
- Report generation: Generates EPA reports according to the configuration (see Block 13) and stores them in the Aurora tables. The report configuration includes metadata such as query definitions, which allows reports to be customized without changes to the reporting code; an example configuration is sketched after this list.
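A hypothetical report configuration of the kind the reporting service could read from the configuration service might look like the following; the key names and query are illustrative.

```python
# Hypothetical report configuration; keys, metric names, and query are illustrative.
ANNUAL_SUBPART_W_REPORT_CONFIG = {
    "report_id": "epa-subpart-w-annual",
    "description": "Annual EPA Subpart W GHG emissions report",
    "metrics": ["total_co2e_metric_tons", "ch4_metric_tons", "n2o_metric_tons"],
    "query": (
        "SELECT facility_id, source_category, SUM(co2e_metric_tons) AS co2e "
        "FROM emissions WHERE reporting_year = :year "
        "GROUP BY facility_id, source_category"
    ),
    "output_table": "epa_report_annual",
}
```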
Block 10 represents the business intelligence dashboards. Amazon QuickSight was used here, but many tools can be used depending on the customer’s experience and preference.
Block 11 shows Amazon Athena tables, which are used to expose the tables in the Aurora database to the end user for read-only purposes and for QuickSight reporting.
Block 12 represents an AWS Lambda connector, which is used to connect the Aurora database with Athena. This exposes the tables from Aurora to end users, because the Aurora database resides in a private subnet and isn’t accessible over the internet.
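A minimal sketch of a read-only query issued against the federated Aurora tables through Athena is shown below; the catalog name, database, table, and results location are placeholders that depend on how the connector is deployed.

```python
# Sketch of a read-only Athena query against the federated Aurora catalog.
import time

import boto3

athena = boto3.client("athena")

query_id = athena.start_query_execution(
    QueryString="SELECT facility_id, SUM(co2e_metric_tons) FROM emissions GROUP BY facility_id",
    QueryExecutionContext={"Catalog": "aurora_carbon", "Database": "carbon"},  # placeholders
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},    # placeholder
)["QueryExecutionId"]

# Poll until the query finishes, then read the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state not in ("QUEUED", "RUNNING"):
        break
    time.sleep(2)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```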
Block 13 is the configuration service. This is a central service used by the other services. It tracks the configs used by the different services. Configs serve as the blueprint for each service, allowing end users to configure them without code changes. The service provides an API that end users can use to edit existing configs or add new ones. Configs include details such as the following:
- Data sources and data lake tables for the ingestion service.
- Queries for specific ingestion jobs.
- SIF pipelines to be invoked by the calculation service for a given activity type.
- Reporting service configurations for report generation.
This flexibility empowers end users to tailor the system to their needs.
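As an illustration, a config item persisted by the configuration service might look like the following, assuming a DynamoDB table keyed by service and config ID; all names and values are placeholders.

```python
# Sketch of persisting an ingestion config item; table name, keys, and values are placeholders.
import boto3

table = boto3.resource("dynamodb").Table("example-carbon-configs")  # placeholder table name

table.put_item(Item={
    "service": "ingestion",                        # assumed partition key
    "config_id": "compressor-run-hours",           # assumed sort key
    "source_system": "scada-historian",            # placeholder source name
    "target_table": "activity",
    "schedule": "rate(1 day)",
    "query": "SELECT * FROM run_hours WHERE updated_at > :last_run",
})
```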
Block 14 shows the Amazon DynamoDB table that serves as the storage layer for the configuration service.
Block 15 represents the UI Application that provides the ability to change configs and to view the data quality results.
Audit service
The audit methodology involves capturing and storing audit events through a structured integration with multiple services, using synchronous and asynchronous processes. Retrieval is facilitated through an API that allows auditors to access audit trails and lineage, thus promoting comprehensive traceability and accountability. This structured approach enhances the ability to track and verify records across different systems effectively.
The audit process involves tracing records through specific queries and steps. For any calculation that needs to be audited, unique field values are extracted and used to query the audit_history table for audit details. These details are then used to trace the audit lineage and gather additional information.
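A hedged example of such an audit lookup is shown below; the column names are illustrative, as the actual audit_history schema may differ.

```python
# Illustrative audit lookup for a calculated record; column names are assumptions.
AUDIT_LINEAGE_SQL = """
    SELECT audit_id, pipeline_id, execution_id, input_values, emission_factor_ref, created_at
    FROM audit_history
    WHERE facility_id = :facility_id
      AND equipment_id = :equipment_id
      AND activity_date = :activity_date
    ORDER BY created_at
"""
# The unique field values extracted from the record are bound to the named
# parameters above; the returned rows are then used to walk the audit lineage.
```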
Security considerations
The solution presented in the preceding diagram uses multiple AWS services. Refer to the security best practices documentation for each of these services when implementing the solution.
Outcomes
Using a carbon data lake solution in conjunction with SIF provides several key benefits for the EPA GHG emissions reporting process:
Centralized data management and accessibility: By centralizing relevant data sources into a carbon data lake, the solution streamlines the data collection and consolidation process, thus reducing the significant time and effort previously required. The generic ingest mechanism also allows for the quick onboarding of new data sources as a company’s operations and reporting requirements evolve.
Standardized data model: Implementing a standardized data model provides a unified foundation for developing reporting, calculation, and data quality processes. It also empowers companies to derive deeper insights from their GHG data and build future reporting solutions.
SIF: The use of SIF is pivotal in accelerating the building of solutions by providing the necessary functionality to perform complex emissions calculations based on EPA methodologies, managing emissions factors, and creating robust audit trails. The use of low-code calculation syntax defined in SIF empowers business stakeholders to manage these critical calculations without relying on their IT team.
Enhanced analytics and insights: The standardized data model, the flexible architecture of the carbon data lake, and the capabilities of SIF enable companies to fulfill their mandatory EPA reporting obligations and develop customized EPA reports. Furthermore, this approach enables companies to derive deeper insights into their emissions profiles and build dashboards for metrics visualization, top 10 analysis, and historical trends for their sustainability teams and operations managers.
Data quality and audit: This solution enables the development of data quality rules management and an automated data quality monitoring process. These capabilities support comprehensive data cleansing and validation, maintaining accuracy, completeness, and reliability of the information used for EPA reporting, which mitigates the risk of non-compliance.
These benefits collectively can help companies significantly reduce the overall effort required for data collection and validation for EPA reporting. This solution not only helps companies comply with EPA regulatory obligations, but also frees up valuable resources and time that can be reallocated toward other strategic business priorities and sustainability initiatives.
Conclusion
This holistic, cloud-focused approach can transform an Oil and Gas company’s GHG reporting from a resource-intensive compliance exercise to a strategic business tool that supports its transition toward a lower-carbon future. Overall, this solution demonstrates how Oil and Gas companies can use AWS services and SIF to build a scalable, adaptable, and data-driven GHG reporting system that addresses their complex operational and regulatory requirements.