Customize large language models with oil and gas terminology using Amazon Bedrock

Numerous energy companies now use generative artificial intelligence (AI) to rapidly retrieve information from hundreds of documents, enable chatbot assistants to increase operational efficiency, and generate synthetic data for use in simulation and decision-making. Out of the various types of generative models, large language models (LLMs) are getting the most attention due to their impressive capabilities for text generation, sentiment analysis, and code generation.

Although LLMs can perform various tasks with natural language, they are generalists, not specialists. Working with generalist language models can be a challenge when processing text data from highly specialized domains that have a language of their own, such as the oil and gas industry. Drilling notes, for example, are full of jargon and abbreviations that make the content difficult for nonspecialists to decipher. As oil and gas companies explore how generative AI can help them unlock value from thousands of drilling documents, they will need language models that can understand those documents, including all the jargon and abbreviations.

Amazon Bedrock, a service from Amazon Web Services (AWS) that offers a choice of high-performing foundation models from leading AI companies, has recently launched capabilities to customize foundation models with the user’s own data to build applications that are specific to a domain, organization, and use case. Fine-tuning is a customization approach that increases an LLM’s accuracy by using task-specific labeled data to create a specialized LLM. This blog will illustrate how oil and gas customers can use drilling data to fine-tune an LLM to understand the abbreviations and the terminology of drilling notes and to automatically generate daily summaries. This specialization improves the efficiency of drilling engineers by providing a draft of the report that they must send to oil and gas companies.

Building the solution

The solution uses AWS Glue—a serverless data integration service—and Amazon Athena—a serverless, interactive analytics service—to query drilling reports. It stores the reports using Amazon Simple Storage Service (Amazon S3), which provides object storage built to retrieve virtually any amount of data from anywhere. For its foundation model, the solution uses Amazon Titan Text Express, a customizable foundation model offered through Amazon Bedrock. We will fine-tune Amazon Titan Text Express with hourly comments and daily summaries written by drilling engineers to help the model learn drilling jargon. For fine-tuning, deployment, and inference, we will use the Amazon Bedrock API.

Figure 1. Solution architecture to query drilling reports stored in WITSML format in Amazon S3. An AWS Lambda function converts the XML files into JSON, and an AWS Glue Crawler loads data into the AWS Glue data catalog. Amazon Athena retrieves data using SQL-like statements to fine-tune the model. Both the original model and the customized model are made available to the user for comparison.

Using the Volve dataset

The Norwegian multinational energy company Equinor has made the Volve dataset, a set of drilling reports, available for research, study, and development purposes. (When using external data, be sure to abide by the license the data is offered under.) The dataset contains 1,759 daily drilling reports from the Volve field in the North Sea, each containing both hourly comments and a daily summary. Drilling rig supervisors tend to use domain-specific terminology and grammar when describing operations in both the hourly comments and the daily summary. This terminology is standard in the industry, which is why fine-tuning a foundation model using these reports is likely to improve summarization accuracy by enhancing the LLM’s ability to understand jargon and speak like a drilling engineer.

Below is an example daily drilling report from the Volve field in PDF format. Rig operations personnel create daily drilling reports to record the state of the operation, record actions taken earlier that day, and communicate planned activities.

Figure 2. Daily drilling report PDF from the Volve field

In this blog, we will customize Amazon Titan Text Express to specialize in automatically generating a draft of the daily drilling report from the drilling engineers’ annotations. The labeled dataset will consist of the “Summary of activities (24 hours)” section and the “Remark” column in the “Operations” section.

The labeled dataset will be queried from drilling reports from the Volve dataset, which are available in the WITSML format. WITSML is an XML-based hierarchical data format used to store data related to oil and gas operations, based on Energistics standards. To facilitate the querying process, the WITSML files have been transformed into the JSON format and are queried using AWS Glue and Amazon Athena. The query and a few examples of the resulting hourly comments and daily summaries are shown below.

SELECT dr.witsml_drillReports.witsml_drillReport.witsml_nameWell AS well_name,
       dr.witsml_drillReports.witsml_drillReport.witsml_dTimStart AS start_time,
       dr.witsml_drillReports.witsml_drillReport.witsml_statusInfo.witsml_sum24Hr AS daily_summary_comment,
       array_join(transform(dr.witsml_drillReports.witsml_drillReport.witsml_activity, x -> x.witsml_comments), ' ') AS hourly_comments
FROM "{glue_db}"."{glue_tb}" dr
ORDER BY dr.witsml_drillReports.witsml_drillReport.witsml_nameWell, dr.witsml_drillReports.witsml_drillReport.witsml_dTimStart

well_name	start_time	daily_summary_comment	hourly_comments
NO 15/9-19 A	1997-10-25T00:00:00+02:00	COMPLETED PU/MU 3 1/2″ TUBING. LANDED STRING IN WELLHEAD AND CORRELATED WITH AWS/MWS WIRELINE EQUIPMENT.	COMPLETED MU DST BHA. TESTED TO 345 BAR FOR 10 MINUTES. CONTINUED RUNNING DST STRING – PU/MU 3 1/2″ PH-6 TUBING FOR TESTING. FILLING TUBING WITH BASE OIL EVERY 7 JOINTS. 38 JOINTS PU/MU AS OF REPORT TIME. CONTINUED RUNNING DST STRING – PU/MU 3 1/2″ PH-6 TUBING FOR TESTING. FILLING TUBING WITH BASE OIL EVERY 7 JOINTS.
NO 15/9-19 S	1993-03-19T00:00:00+01:00	LAID DOWN HANGER RUNNING TOOL. MADE UP NEW BHA AND RIH TO 2870M. WASHED/REAMED FROM 2870M TO TOP PBR AT 3206M. CIRC BTM’S UP. POOH AND LAID DOWN 49 JOINTS 5″ DP. TESTED LINER LAP TO 70 BAR/10 MIN. *MADE UP 6″ BHA, AND PICKED UP 3 1/2″ DP WHILE RIH.	POOH WITH LINER RUNNING TOOL FROM 1470M TO SURFACE. LAID DOWN 78 JOINTS OF 5″ DP WHILE POOH. LAID DOWN NODECO HANGER RUNNING TOOL AND CEMENT STAND. MADE UP 8 1/2″ BIT AND SCRAPER AND RIH TO 2870. WASHED AND REAMED FROM 2870M TO TOP OF PBR AT 3206M, NO HARD CEMENT. CIRCULATE BOTTOMS UP, NO TRACES OF CEMENT. SLUGGED PIPE AND POOH FROM 3206M TO 827M. LAID DOWN 14 JOINTS OF 5″ DP WHILE POOH.
NO 15/9-19 B	1997-12-27T00:00:00+01:00	COMPLETED ROUND TRIP DUE TO MWD POWER FAILURE. RE-LOGGED PREVIOUS DRILLED SECTION, CONTINUED DRILLING FROM 3690M TO 3708M.	CONTINUED POOH. RE-SLUGGED AT LINER SHOE. BROKE OUT BIT. DUMP MWD MEMORY, CHANGED PROBE IN MWD. LAID DOWN MPR AND SERVICED SAME. MADE UP MPR AND MWD ASSY. CHECKED MWD CONNECTOR. MADE UP BIT AND CHECKED SCRIBE LINE. RIH TO LINER SHOE. FILLED STRING EVERY 20 STANDS. S/C DRILLING LINE. CONTINUED RIH TO 3629M. BROKE CIRCULATION AND WASH AND ROTATED F/3629- 3645M. LOG SECTION FROM 3645M TO 3679M WITH MWD. OBSERVED PIT GAIN 0,5 M3.
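The WITSML-to-JSON conversion that makes this query possible is not shown in this post. The following is only a minimal sketch of how such a step might look, using the Python standard library. The `witsml_` key prefix is inferred from the field names in the query above, and the choice to always emit `witsml_activity` as a list (so that `transform` works even for reports with a single activity) is an assumption, as are the function names. XML attributes are ignored for brevity.

```python
import json
import xml.etree.ElementTree as ET

PREFIX = "witsml_"                  # assumed from the field names in the Athena query
ALWAYS_LIST = {"witsml_activity"}   # assumed: keep repeating elements as arrays


def _key(tag: str) -> str:
    """Turn '{namespace}nameWell' (or plain 'nameWell') into 'witsml_nameWell'."""
    return PREFIX + tag.split("}", 1)[-1]


def element_to_dict(elem: ET.Element):
    """Recursively convert an element into dicts, lists, and strings."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        key, value = _key(child.tag), element_to_dict(child)
        if key in ALWAYS_LIST:
            result.setdefault(key, []).append(value)
        elif key in result:
            # A tag seen more than once is promoted to a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


def witsml_to_json(xml_text: str) -> str:
    """Convert one WITSML document into a single-line JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({_key(root.tag): element_to_dict(root)})
```

Writing each report as one JSON object per line keeps the output easy for an AWS Glue crawler to classify, although the crawler configuration itself is outside the scope of this sketch.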

Fine-tuning Amazon Titan Text Express using Amazon Bedrock

Amazon Bedrock supports fine-tuning for Meta Llama 2, Cohere Command, and Amazon Titan models through the CreateModelCustomizationJob API method. With this API method, the user specifies the model and the fine-tuning dataset. Optionally, the user can also supply a validation dataset and hyperparameters to customize the training job. For this summarization task, the training data consists of labeled examples, such as the one shown below.

{
"prompt": "VERIFIED THAT LPR-N VALVE WASN'T LEAKING.CLOSED LOWER LUBRICATOR VALVE & PERFORMED 10 MINUTE IN-FLOW TEST. CLOSED UPPER LUBRICATOR VALVE. BLED PRESSURE OFF OF SURFACE LINES THROUGH CHOKE MANIFOLD & FLUSHED LINES TO STOCK TANK WITH SEAWATER. WHILE CONTINUING WITH SHUT-IN PRESSURE BUILD-UP TEST, RU ELECTRIC LINE IN PREPARATION FOR FUTURE WIRELINE SAMPLING. CONTINUED PERFORMING SHUT-IN BUILD-UP TEST. CONTINUED PERFORMING SHUT-IN BUILD-UP TEST. OPENED WELL & FLOWED IN PREPARATION OF BOTTOM-HOLE-SAMPLING BHS OPERATIONS. SHUT-IN WELL \u00d8 1645 HRS. RU 4 BHS TOOLSTRING ON ATLAS ELECTRIC LINE. TESTED LUBRICATOR/BOPS TO 200 BAR. RIH WITH BHS TO 3765 M - WELL WAS OPENED-UP AFTER TOOLSTRING WAS BELOW 500 M. SHUT-IN WELL. PERFORM BHS OPERATIONS. POOH WITH BHS / ELECTRIC LINE.\n Summarize the text above",
"completion": "COMPLETED SHUT-IN BUILD-UP TEST. PERFORMED BOTTOM-HOLE-SAMPLING\nOPERATIONS VIA ELECTRIC LINE. STARTED MINI-FRAC OPERATIONS."
}
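Records in this shape can be assembled from the Athena query results with a few lines of Python. The sketch below assumes each row is available as a pair of strings (hourly comments, daily summary) and reuses the instruction suffix from the example record. The function names are illustrative and not part of any AWS SDK.

```python
import json

# Instruction appended to every prompt, copied from the example record.
INSTRUCTION = "\n Summarize the text above"


def to_training_record(hourly_comments: str, daily_summary: str) -> dict:
    """Build one prompt/completion pair for fine-tuning."""
    return {
        "prompt": hourly_comments.strip() + INSTRUCTION,
        "completion": daily_summary.strip(),
    }


def to_jsonl(rows) -> str:
    """Serialize (hourly_comments, daily_summary) rows as JSON Lines.

    Rows with a missing or empty field are skipped, because they cannot
    serve as labeled examples.
    """
    lines = []
    for hourly, summary in rows:
        if hourly and summary and hourly.strip() and summary.strip():
            lines.append(json.dumps(to_training_record(hourly, summary)))
    return "\n".join(lines) + ("\n" if lines else "")
```

The resulting text can be written to a `.jsonl` file and uploaded to the Amazon S3 training location. Splitting the rows into separate training and validation files before serialization is left to the reader.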

The CreateModelCustomizationJob method is called with the Amazon S3 path to the training files, as shown below.

import boto3

# Control-plane client for Amazon Bedrock (used for model customization jobs)
bedrock_client = boto3.client('bedrock')

create_model_response = bedrock_client.create_model_customization_job(
    jobName='drilling_report_summarization_job',
    customModelName='drilling_report_summarization_model',
    # Replace with the ARN of an IAM role that can access Amazon Bedrock and the S3 locations below
    roleArn='<arn of IAM role with access to Amazon Bedrock and the S3 training locations>',
    baseModelIdentifier='amazon.titan-tg1-large',
    trainingDataConfig={
        's3Uri': 's3://<s3_bucket_name>/training/drilling_report_training_data.jsonl'
    },
    validationDataConfig={
        'validators': [{
            's3Uri': 's3://<s3_bucket_name>/validation/drilling_report_validation_data.jsonl'
        }]
    },
    outputDataConfig={'s3Uri': 's3://<s3_bucket_name>/finetuning/output'},
    jobTags=[
        {
            'key': 'Note',
            'value': 'This fine-tuned model summarizes drilling reports'
        }
    ]
)
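Customization jobs run asynchronously, so a caller typically polls for completion before looking at the results. The following sketch assumes a Bedrock control-plane client is passed in and relies on its `get_model_customization_job` method, which reports a `status` field. The helper name, polling interval, and timeout are illustrative choices only.

```python
import time

# Job states after which no further progress is expected.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_customization_job(bedrock_client, job_identifier,
                               poll_seconds=60, max_polls=240,
                               sleep=time.sleep):
    """Poll a model customization job until it reaches a terminal state.

    `sleep` is injectable so the helper can be exercised without waiting.
    """
    for _ in range(max_polls):
        response = bedrock_client.get_model_customization_job(
            jobIdentifier=job_identifier
        )
        status = response["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_identifier} did not finish in time")
```

The job identifier can be the job name or the ARN returned by the create call. A returned status of `Failed` or `Stopped` should be handled before any attempt to use the custom model.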

After the training completes, users can plot the loss and the perplexity curves to observe whether the training job converged. Here, we see that after 40 steps, training continues to improve both the training and the validation loss metrics, but perplexity has plateaued for both training and validation data. Often when fine-tuning an LLM, training needs to stop before the loss metrics plateau to preserve the generalizability of the custom model. Otherwise, the resulting model can become so specific to the fine-tuning dataset that it loses the ability to perform other basic tasks, a scenario called catastrophic forgetting.
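One simple way to locate a plateau in a metric curve without relying on a plot is to compare moving averages over consecutive windows. The sketch below is a heuristic under assumed settings (window size and improvement threshold are arbitrary), not the criterion used for the figure in this post. It expects a list of per-step metric values such as perplexity.

```python
def plateau_step(values, window=5, min_rel_improvement=0.01):
    """Return the index of the first step at which the metric has plateaued.

    A plateau is declared when the mean of the most recent `window` values
    improved by less than `min_rel_improvement` (relative) over the mean of
    the `window` values before it. Returns None if that never happens.
    """
    for end in range(2 * window, len(values) + 1):
        previous = sum(values[end - 2 * window:end - window]) / window
        recent = sum(values[end - window:end]) / window
        if previous > 0 and (previous - recent) / previous < min_rel_improvement:
            return end - 1
    return None
```

A step index found this way can serve as a starting point for choosing how long to train, to be confirmed against the validation curve.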

Figure 3. Loss and perplexity curves for the example data

Evaluating the custom model

The custom model’s ability to generate daily summaries that resemble those written by drilling engineers will be evaluated both quantitatively and qualitatively. For a quantitative evaluation, we will use the Bilingual Evaluation Understudy (BLEU) metric—a numerical metric describing text similarity that is commonly used to measure the quality of translated text—to measure the similarity between summaries prepared by a language model and those created by drilling engineers. The BLEU score ranges from 0 to 1, with higher values indicating a better match between the texts. In other words, the closer the value is to 1, the better the generated text. On the qualitative side, we will select a few examples to observe if the language models use proper industry terminology and acronyms and if they can select the most relevant activities to include in the daily summary.
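To make the metric concrete, here is a minimal, dependency-free sketch of a sentence-level BLEU score: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. The whitespace tokenization, lowercasing, and the small floor applied to zero n-gram matches are assumptions made for this sketch. The post does not state which implementation or smoothing produced the scores it reports, so values computed this way may differ from them.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, candidate: str, max_n: int = 4, epsilon: float = 0.1) -> float:
    """Sentence-level BLEU of `candidate` against a single `reference`."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = _ngrams(cand, n)
        ref_counts = _ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it
        # appears in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        # Assumed smoothing: floor zero matches at epsilon to avoid log(0).
        precision = max(overlap, epsilon) / max(total, 1)
        log_precision_sum += math.log(precision) / max_n
    # Penalize candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum)
```

An identical candidate and reference of at least four tokens score 1, while a candidate that shares no words with the reference scores close to 0.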

The table below shows several completions from both the foundation model and the fine-tuned model and how they compare to the summaries generated by a rig supervisor. The fine-tuned completion tends to have a better match to the human summary in terms of content (that is, the fine-tuned model has better judgment in selecting relevant activities to be included in the summary), length, and style. The BLEU score also indicates that the custom model has a much closer resemblance to the summaries prepared by drilling engineers than the original model, as shown by the score increasing by several orders of magnitude in some cases.

Prompt	Human Summary	Fine Tuned Model Completion	Fine Tuned BLEU Score	Foundation Model Completion	Foundation Model BLEU Score
CIRCULATED BTM UP TO CLEAN WELL FOR CORING GAS. MAX GAS 0,7 % FLOW CHECKED, OK. PUMP SLUG AND POOH. FLOW CHECKED INSIDE CSG WINDOW, OK. CURRENT DEPTH AT REPORT TIME 1195M. CONTINUED TO POOH. FLOW CHECKED BEFORE PULLING BHA INTO BOP, OK. RETRIEVED CORE NO 4. 96,3 % RECOVERY. NO H2S GAS PRESENT IN COREBARREL MADE UP NEW COREHEAD AND INNERBARREL. SERVICED COREBARREL. RIH WITH COREBARREL TO 3847M. BROKE CIRCULATION EACH 1000M. WASHED AND ROTATED FROM 3847M TO 3935,5M. TAGGED BTM AND SPACED OUT. SEATED BALL IN BALL SEAT WITH 26 BAR PRESSURE INCREASED. RECORDED SCR’S. CUT CORE NO 5 FROM 3935,5M TO 3962,5M. BROKE CORE WITH 11 TON OVER PULL. PUMPED AND ROTATED UP TO 3820M. CIRCULATED BTM UP TO CLEAN WELL FROM CORING GAS. MAX GAS 2,1 % DECLINED TO 0,8 %. FLOW CHECKED, OK. PUMPED SLUG AND POOH TO 3600M AT 2400 HRS. Summarize the text above	RECOVERED CORE NO 4. RIH AND CUTTED CORE NO 5. CIRCULATED BTM UP. POOH. STARTED TO RETRIEVE CORE NO 5.	CUT CORE NO 5 FROM 3935,5M TO 3962,5M. POOH WITH CORE. CUT CORE NO 4 FROM 3962,5M TO 4000M.	0.02774	Btm was pumped and rotated up to clean the well from coring gas. The maximum gas flow was 0.7%, and the flow checked inside the csg window, which was ok. Current depth was 1195m. Flow checked before pulling BHA into BOP, ok. Retrieved core no 4, with 96.3% recovery. No H2S gas present in core barrel. Serviced core barrel and rih with core barrel to 3847m. Broke circulation each 1000m. Washed and rotated from 3847m to 3935.5m. Tagged BTM and spaced out. Seated ball in ball seat with 26 bar pressure increased. Recorded scr’s. Cut core no 5 from 3935.5m to 3962.5m. Broke core with 11 ton over pull. Pumped and rotated up to 3820m. Circulated Btm up to clean well from coring gas. The maximum gas flow was 2.1%, and the flow checked inside the csg window, which was ok. Pumped slug and pooh to 3600m at 2400 hs.	1.42*10-3¹⁰
Circulated out spacer and excess cement at 2569m MD with 2500 lpm, 162 bar and 60 rpm. Pulled out of hole with 3 1/2″ cement stinger on 5 1/2″ drill pipe from 2569m MD to 291m MD. Performed flowcheck prior to pull 3 1/2″ cement stinger through BOP – well static. Changed out PS21 slips and installed 3 1/2″ auto slips. Broke out 3 1/2″ x 5 1/2″ crossover and laid down. Changed out die carriers and elevator inserts. Pulled out of hole with 3 1/2″ cement stinger from 291m MD and racked back same. Broke out and laid down mule shoe. Changed to 5 1/2″ handling equipment. Cleaned and tidied rig floor. Performed prejob meeting prior to pick up and make up 8 1/2″ bottom hole assembly. Picked up and made up 8 1/2″ bottom hole assembly to 40m MD. Uploaded MWD and installed radioactove sources. Picked up and made up 8 1/2″ bottom hole assembly from 40m MD to 196m MD. Ran in hole with 8 1/2″ bottom hole assembly from 196m MD to 993m MD. Engaged top drive and filled pipe. Established circulation with 2200 lpm and 123 bar. Shallow tested MWD. Displaced kill and choke lines to 1.28 sg OBM. Ran in hole with 8 1/2″ bottom hole assembly from 993m MD to 1354m MD. Emptied trip tank. Ran in hole with 8 1/2″ bottom hole assembly from 1354m MD to 1609m MD. Performed slip and cut of drill line. Summarize the text above	Circulated out spacer and excess cement at 2569m MD. Pulled out of hole with 3 1/2″ cement stinger on 5 1/2″ drill pipe from 2569m MD. Picked up and made up 8 1/2″ bottom hole assembly. Ran in hole with 8 1/2″ bottom hole assembly to 1557m MD. Performed slip and cut of drill line.	Ran in hole with 8 1/2″ bottom hole assembly from 993m MD to 2143m MD. Pulled out of hole with 3 1/2″ cement stinger. Performed flowcheck prior to picking up and making up 10 3/4″ bottom hole assembly. Made up 10 3/4″ bottom hole assembly from 40m MD to 259m MD.	0.19289	The work involved circulation out of spacer and excess cement, pulling out of the hole with a 3 1/2″ cement stinger, changing out PS21 slips, installing 3 1/2″ auto slips, breaking out and laying down a crossover, changing out die carriers and elevator inserts, pulling out of the hole with a 3 1/2″ cement stinger, breaking out and laying down a mule shoe, changing to 5 1/2″ handling equipment, cleaning and tidying the rig floor, performing a prejob meeting, picking up and making up an 8 1/2″ bottom hole assembly, uploading MWD and installing radioactive sources, picking up and making up an 8 1/2″ bottom hole assembly, running in the hole with the assembly, engaging the top drive, filling the pipe, establishing circulation, shallow testing MWD, displacing kill and choke lines, running in the hole with the assembly, emptying the trip tank, and running in the hole with the assembly.	0.0194
Drilled and orientated 8 1/2″ hole from 4162m to 4250m with 2200 lpm, 209-211 bar, 180 rpm, 20-23 kNm, WOB 4-8 MT, ECD 1.45-1.46 sg. Average ROP: 15 m/hr. Drilled and orientated 8 1/2″ hole from 4250m to 4277m with 2200 lpm, 212-213 bar, 180 rpm, 20-21 kNm, WOB 4-8 MT, ECD 1.46 sg. Average ROP: 13.5 m/hr. Performed slow circulation rates. Drilled and orientated 8 1/2″ hole from 4277m to 4372m with 2000-2200 lpm, 181-213 bar, 180 rpm, 19-23 kNm, WOB 2-5 MT, ECD 1.46 sg. Average ROP: 12.3 m/hr. Drilled and orientated 8 1/2″ hole from 4351m to 4380m with 2350 lpm, 220-230 bar, 110 rpm, 21-22 kNm, WOB 5-6 MT, ECD 1.44 – 1.46 sg. Average ROP: 16 m/hr. 8 Average ROP: 16.8 m/hr. Experienced occational problems with top drive causing decrease in RPM. Drilled and orientated 8 1/2″ hole from 4446m to 4464m with 2350 lpm, 230-232 bar, 140-180 rpm, 19-23 kNm, WOB 8-10 MT, ECD 1.44 – 1.45 sg. Average ROP: 7.2 m/hr. Occasionally drilled with reduced RPM due to hard stringers. Summarize the text above	Drilled and oriented 8 1/2″ hole from 4162m MD to 4464m MD.	Drilled and oriented 8 1/2″ hole from 4162m to 4464m.	0.57311	Drilled and orientated eight 1/2″ holes with varying conditions and average ROP. Slow circulation rates were used in some sections. Occasional top drive issues caused decreased RPM.	0.00427
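Completions like those in the table can be obtained by sending the same prompt to each model through the Amazon Bedrock runtime API. The sketch below assumes a runtime client is passed in and uses the Amazon Titan Text request and response shape (`inputText`, `textGenerationConfig`, `results[0].outputText`). The function name, token limit, and sampling settings are illustrative and are not the settings used to produce the table.

```python
import json

# Same instruction suffix that was used in the fine-tuning prompts.
INSTRUCTION = "\n Summarize the text above"


def summarize(runtime_client, model_id: str, hourly_comments: str,
              max_tokens: int = 256) -> str:
    """Request a daily summary from a Titan Text model (base or custom)."""
    body = {
        "inputText": hourly_comments + INSTRUCTION,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,   # assumed limit for a short summary
            "temperature": 0.0,            # assumed: favor repeatable output
            "topP": 1.0,
            "stopSequences": [],
        },
    }
    response = runtime_client.invoke_model(
        modelId=model_id,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"].strip()
```

With boto3, the runtime client would be created with `boto3.client('bedrock-runtime')`. For the base model, `model_id` is the foundation model identifier. For the fine-tuned model, it is typically the ARN of the provisioned throughput purchased for the custom model.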

Cleaning up

If you deploy a custom model using Amazon Bedrock, remember to delete it afterward from the Custom Models tab of the Amazon Bedrock console so that you do not incur costs for models that you are no longer using.
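The same cleanup can be scripted. The sketch below assumes a Bedrock control-plane client is passed in and calls its `delete_provisioned_model_throughput` and `delete_custom_model` methods. The helper name and the order of operations (release any provisioned throughput first, then delete the model) are choices made for this sketch.

```python
def clean_up(bedrock_client, custom_model_name, provisioned_model_id=None):
    """Delete a custom model and, if given, its provisioned throughput.

    Returns the list of resources that were requested for deletion.
    """
    deleted = []
    if provisioned_model_id:
        # Provisioned throughput is billed while it exists, so release it first.
        bedrock_client.delete_provisioned_model_throughput(
            provisionedModelId=provisioned_model_id
        )
        deleted.append(provisioned_model_id)
    bedrock_client.delete_custom_model(modelIdentifier=custom_model_name)
    deleted.append(custom_model_name)
    return deleted
```

Fine-tuning outputs and training files stored in Amazon S3 are not touched by this helper and need to be removed separately if they are no longer required.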

Conclusion

Orchestrating, parallelizing, and managing fine-tuning jobs to customize foundation models for specific tasks used to be difficult and expensive. Amazon Bedrock simplifies this process so that teams can build, deploy, and maintain sophisticated fine-tuned models that understand and respond in domain-specific language. AWS customers can quickly fine-tune a foundation model without dedicated data scientists or extensive expertise.

Generative AI has the potential to improve efficiency by automating time-consuming tasks even in domains that require deep knowledge of industry-specific nomenclature and acronyms. Having a custom model that provides drilling engineers with a draft of daily activities has the potential to save hours of work every week. Model customization can also help energy and utilities customers in other applications that involve the generation of highly technical content, as is the case of geological analyses, maintenance reports, and shift handover reports.

AWS for Industries