AWS Cloud Operations Blog

Using Amazon Q Business to streamline your operations

Amazon Q is a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. You can use Amazon Q to have conversations, solve problems, generate content, gain insights, and take action by connecting to your company’s information repositories, code, data, and enterprise systems. Amazon Q provides immediate, relevant information and advice to employees to streamline tasks, accelerate decision-making and problem-solving, and help spark creativity and innovation at work.

In this blog post, we will show you how Amazon Q can help your operations team during an issue or an outage. An organization typically runs both internal and external applications on different services with multiple dependencies. The application/DevOps team creates runbooks that outline details about the application, its dependencies, and information that helps the operations team understand, troubleshoot, and resolve issues. The primary goal of the operations team is to expedite application recovery. Recovery steps could include identifying the key infrastructure components of the application, performing troubleshooting, failing over to a disaster recovery environment, and escalating to application owners. All the required actions should be performed as quickly as possible to reduce the MTTI (Mean Time To Identify) and MTTR (Mean Time To Recovery).

Amazon Q offers user-based plans, so you get features, pricing, and options tailored to how you use the product. Amazon Q can adapt its interactions to each individual user based on the existing identities, roles, and permissions of your business. AWS never uses customer content from Amazon Q to train the underlying models. In other words, your company information remains secure and private.

Customers can use Amazon Q Business to build an application that can help reduce MTTI and MTTR. This application can be connected to a centralized Amazon S3 bucket containing the application runbooks. It can also be connected to AWS documentation for services used by the applications to assist further in understanding and resolving issues.

Sample application

For the blog post, we will be using a PetAdoption application that is available on GitHub. It is built using a microservice architecture, and different components of the application are deployed on various services, such as Amazon Elastic Kubernetes Service, Amazon Elastic Container Service, AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon Simple Queue Service, Amazon Simple Notification Service, and AWS Step Functions. The application architecture is shown in the following diagram.

Petadoption application

Figure 1 – PetAdoption application.

Building the Amazon Q Business Application

Architecture Diagram

Figure 2 – Architecture Diagram.

Prerequisites

  • AWS IAM Identity Center as the SAML 2.0-compliant identity provider (IdP). Please ensure that you have enabled an IAM Identity Center instance, provisioned at least one user, and provided each user with a valid email address. For more details, see Configure user access with the default IAM Identity Center directory.
  • An Amazon S3 bucket that will act as a central repository to store your application runbooks.
  • Let’s upload the sample runbook to the S3 bucket. The sample runbook captures known issues and other application information required to help the operations team with triage and escalation. You can do this using AWS CloudShell and the commands listed below, or by manually uploading the runbook to the S3 bucket using the Amazon S3 console.
cat << EOF > petadoption-runbook.doc
Application Name: PetAdoptions Production Application
 
Application Description: This application enables people to easily adopt pets. It is a digital marketplace of over 10,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, this application has helped millions of pets find their forever homes.

Account info: 111111111111 (Primary Account), 222222222222 (DR Account)
 
Application Owners:
            Development Lead: Puneeth Komaragiri 
            Development Manager: Vikram Venkataraman
            Program/Product Manager: Puneeth Komaragiri
            On-call Alias: petadoptionsoncall@example.com

Severity: Critical
 
Public Facing: Yes
 
DR/Backup Environment: Yes; (us-west-2 is the backup/DR for us-east-1 env)
 
Core AWS Services used:

* S3
* EKS
* DynamoDB
* ELB
* SQS
* SNS
* CloudWatch 
* CloudFront

Regions: us-east-1 (Northern Virginia) & us-west-2 (Oregon)

Core infrastructure components: 

DynamoDB Global Table: pet-adoption-table
EKS Clusters: PetSite-FrontEnd, PetSearch-API, PetListAdoptions-API, PayForAdoptions-API, PetAdoptionStatusUpdater, PetAdoptionsHistory-API
Lambda Functions: PriceLessThan55, PriceGreaterThan55, PetAdoptionStatusUpdater
S3 Bucket: petadoptionss3bucket

 
Previously Occurred / Known Issues & Fixes:
In very rare cases, the site might not show any pet images. To fix this, click Perform Housekeeping in the upper-right corner of the PetSite home page.
 
 
Failing over to DR Region:
            Description: The RTO (Recovery Time Objective) and RPO (Recovery Point Objective) requirement for this application is 45 minutes. The application needs to be failed over to the active DR region in us-west-2 if the outage lasts more than 30 minutes. The application will be failed back after 24 hours of observing the primary region.

            Procedure: To fail over to the DR region, the user needs to run the "DR-FAILOVER" workflow from the Central-DevOps-Account.
 
Troubleshooting:

    * Is there an AWS outage?
        * Check https://health.aws.amazon.com/health/status for AWS service health.

    * How to reach AWS?
        * If the application is down for customers, open an AWS Support case at https://console.aws.amazon.com/support
        * See https://docs.aws.amazon.com/awssupport/latest/user/case-management.html for support case severity levels
        * Always open a phone or chat case for High, Urgent, and Critical severity cases
        * Reach out to the account team alias (sampleaccountteamemail@amazon.com) for additional help.

    * How to escalate to the application team?
        * Reach out to the on-call engineer via phone or email at oncallpetadoptionapplication@example.com
---
EOF
aws s3 cp petadoption-runbook.doc s3://<Your S3 Bucket used as Datasource>/
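
Before uploading, it can help to confirm that the runbook contains the sections the operations team is likely to ask about. The following is a minimal sketch and not part of the original walkthrough: the function name check_runbook and the choice of required headings are assumptions made for illustration, based on the headings in the sample runbook above.

```shell
# Hypothetical helper: verify that a runbook file contains the headings
# used by the sample runbook. Adjust the list to match your own template.
check_runbook() {
  local file="$1"
  local missing=0
  local heading
  for heading in \
    "Application Name:" \
    "Application Owners:" \
    "Core infrastructure components:" \
    "Failing over to DR Region:" \
    "Troubleshooting:"; do
    # grep -F treats the heading as a fixed string; -q suppresses output.
    if ! grep -Fq -- "$heading" "$file"; then
      echo "missing section: $heading"
      missing=1
    fi
  done
  # Returns 0 when every heading was found, 1 otherwise.
  return "$missing"
}
```

For example, running check_runbook petadoption-runbook.doc before the aws s3 cp command would print any missing section and return a non-zero status, so the two commands can be chained with &&.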

Creating the Amazon Q Business application

First, let’s create a new application in the Amazon Q console and name it petadoption-ops-app. For the access management prompt, let’s choose the recommended option, which is IAM Identity Center.

Setting IAM Identity Center as identity provider

Figure 3 – Setting IAM Identity Center as identity provider.

Creating the Q business app

Figure 4 – Creating the Q Business app.

In the next step, we will choose the retriever for the Amazon Q application. We will use the native retriever, which creates an Amazon Q Business index that can connect to the Amazon Q Business-supported data sources that you choose.

Selecting Native Retriever

Figure 5 – Selecting Native Retriever
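
The console steps so far (create the application, then an index, then a native retriever) can also be scripted. The sketch below assumes a recent AWS CLI v2 that includes the qbusiness commands and credentials that are allowed to call them. The function name create_q_app and the "-index" and "-retriever" display-name suffixes are placeholders chosen for illustration, and the IAM Identity Center instance ARN is a value you supply. Treat it as a starting point under those assumptions, not as the procedure used in this post.

```shell
# Hypothetical wrapper around the AWS CLI qbusiness commands.
# Arguments: <display-name> <identity-center-instance-arn>
create_q_app() {
  local name="$1"
  local idc_arn="$2"
  local app_id index_id

  # Create the application and capture its ID.
  app_id="$(aws qbusiness create-application \
    --display-name "$name" \
    --identity-center-instance-arn "$idc_arn" \
    --query applicationId --output text)" || return 1

  # Create the Amazon Q Business index that the native retriever uses.
  # Index creation is asynchronous; "aws qbusiness get-index" reports its status.
  index_id="$(aws qbusiness create-index \
    --application-id "$app_id" \
    --display-name "${name}-index" \
    --query indexId --output text)" || return 1

  # Attach a native retriever that points at the index.
  aws qbusiness create-retriever \
    --application-id "$app_id" \
    --type NATIVE_INDEX \
    --display-name "${name}-retriever" \
    --configuration "nativeIndexConfiguration={indexId=${index_id}}" \
    --query retrieverId --output text >/dev/null || return 1

  # Print both IDs so later steps can reuse them.
  echo "$app_id $index_id"
}
```

A call such as create_q_app petadoption-ops-app "<your Identity Center instance ARN>" prints the application ID and index ID separated by a space.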

In the next step, we will add the data sources that contain the data required for this use case. For this blog post, we will use two data sources: an Amazon S3 bucket and a web crawler. Let’s first add the web crawler data source. We will name the data source aws-core-services-crawlers and add the URLs listed below.

Adding datasource

Figure 6 – Adding data source.

Sample AWS documentation links:

[EKS Best Practices]: https://aws.github.io/aws-eks-best-practices/
[EKS Knowledge Center Articles]: https://repost.aws/knowledge-center/all?view=all&search=EKS&sort=recent
[Load Balancer Troubleshooting]: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html
[EC2 Troubleshooting]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-troubleshoot.html
[Lambda Troubleshooting]: https://docs.aws.amazon.com/lambda/latest/dg/lambda-troubleshooting.html
[S3 Troubleshooting]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/troubleshooting.html
[DynamoDB Troubleshooting]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Troubleshooting.html
[AWS Support Case]: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html
Webcrawler datasource

Figure 7 – Webcrawler datasource.
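
A crawler can only index pages it can reach, so it may be worth checking the seed URLs before adding them. The following is a rough sketch that assumes curl is installed; the function name check_seed_urls is hypothetical. A non-200 answer to curl does not necessarily mean the Amazon Q web crawler will fail on that page (some sites respond differently to command-line clients), so treat the output as a hint to look closer.

```shell
# Hypothetical helper: read one URL per line on stdin and report any that
# do not answer with HTTP 200. Requires curl.
check_seed_urls() {
  local url code failed=0
  while IFS= read -r url; do
    # Skip blank lines.
    [ -z "$url" ] && continue
    # -L follows redirects; -w prints only the final HTTP status code.
    code="$(curl -s -L -o /dev/null --max-time 15 -w '%{http_code}' "$url")"
    if [ "$code" != "200" ]; then
      echo "unreachable ($code): $url"
      failed=1
    fi
  done
  # Returns 0 when every URL answered 200, 1 otherwise.
  return "$failed"
}
```

If the URLs are saved one per line in a file (for example a hypothetical seed-urls.txt), the check can be run as check_seed_urls < seed-urls.txt.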

Other options can be left as default. Please choose the Create a new service role option from the dropdown for the IAM role as shown in the screenshot below.

IAM role creation

Figure 8 – IAM role creation

For the sync run schedule, you can choose the frequency depending on the rate at which the data changes.

Configuring sync schedule

Figure 9 – Configuring sync schedule.

Next, let’s add the Amazon S3 bucket which was identified in the prerequisites as a data source. Please choose the Create a new service role option from the dropdown for the IAM role as shown in the screenshot below.

S3 as datasource

Figure 10 – S3 as datasource.

In the sync scope, specify the S3 bucket that was created as part of the prerequisites. For the sync run schedule, you can choose the frequency depending on the rate at which the data changes.

Sync scope for S3 datasource

Figure 11 – Sync scope for S3 datasource.

Once you have added both data sources, click on Next.

Datasource configuration

Figure 12 – Data source configuration.

In this step, we will add users and groups from your IAM Identity Center directory. Let’s click on Add Users and select Assign existing users and groups.

Adding Users

Figure 13 – Adding Users.

Adding Users

Figure 14 – Adding Users.

Now, let’s search for and select the user or group name that was created as part of the prerequisites.

Looking up users

Figure 15 – Looking up users.

Click on Create Application.

Creating application

Figure 16 – Creating application.

You should see the application status as Created successfully with the Web experience URL.

Webexperience URL

Figure 17 – Web experience URL

Now, open the newly created application, select each of the data sources, and click on the Sync now button to initiate the data sync. The data sync might take a few minutes. After syncing, the data sources should have a Completed sync status, as shown below:

Sync completion

Figure 18 – Sync completion
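
The sync can also be started and monitored from the command line. This is a minimal sketch assuming a recent AWS CLI v2 with the qbusiness commands; the function name sync_data_source and the POLL_SECONDS variable are hypothetical. It also assumes that the most recent sync job is listed first in the job history and that a SUCCEEDED job status corresponds to the Completed status shown in the console.

```shell
# Hypothetical helper: start a sync for one data source and wait for it.
# Arguments: <application-id> <index-id> <data-source-id>
sync_data_source() {
  local app_id="$1" index_id="$2" ds_id="$3"
  local status

  # Kick off the sync job.
  aws qbusiness start-data-source-sync-job \
    --application-id "$app_id" \
    --index-id "$index_id" \
    --data-source-id "$ds_id" >/dev/null || return 1

  while :; do
    # Read the status of the first job in the history
    # (assumed here to be the most recent one).
    status="$(aws qbusiness list-data-source-sync-jobs \
      --application-id "$app_id" \
      --index-id "$index_id" \
      --data-source-id "$ds_id" \
      --query 'history[0].status' --output text)" || return 1
    case "$status" in
      SUCCEEDED) echo "$status"; return 0 ;;
      FAILED|ABORTED|INCOMPLETE) echo "$status"; return 1 ;;
    esac
    # Any other status means the job is still in progress; wait and retry.
    sleep "${POLL_SECONDS:-30}"
  done
}
```

The application, index, and data source IDs are shown in the Amazon Q console; the function prints the final job status and returns non-zero if the sync did not succeed.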

Accessing the Amazon Q Application’s web experience endpoint

In the next steps, we will interact with the petadoption-ops-app application to get insights into the PetAdoption application.

Click the Web experience settings tab in the Amazon Q application console to copy the deployed URL of the application.

Accessing the application

Figure 19 – Accessing the application.
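
As an alternative to copying the deployed URL from the console, it can be read with the AWS CLI. This is a small sketch under the assumption that the application has a single web experience and that a recent AWS CLI v2 with the qbusiness commands is available; the function name web_experience_url is hypothetical.

```shell
# Hypothetical helper: print the web experience endpoint of an application.
# Argument: <application-id>
web_experience_url() {
  local app_id="$1"
  # Assumes the application has a single web experience, listed first.
  aws qbusiness list-web-experiences \
    --application-id "$app_id" \
    --query 'webExperiences[0].defaultEndpoint' --output text
}
```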

Use your browser to access the deployed URL. It should take you to IAM Identity Center for authentication. After you authenticate, you should see the user interface of the Amazon Q application, which looks like the screenshot below:

Interacting with Q application

Figure 20 – Interacting with Q application.

Let’s see the petadoption-ops-app in action. Let’s assume you are a new member of the SRE team and you are supporting the PetAdoption application. Let’s interact with the application to get an overview of the application.

Overview of application

Figure 21 – Overview of application.

The petadoption-ops-app was able to crawl through the data sources and provide a quick summary of the PetAdoption application.

Next, let’s say you are seeing errors specific to Amazon EKS and would like to know which services of the application run on EKS and who the respective contacts are.

Services using EKS

Figure 22 – Services using Amazon EKS.

Now that we have the necessary information, let’s share the error messages we see to get the root cause of the issue.

Identifying root cause

Figure 23 – Identifying root cause.

petadoption-ops-app was able to provide insight into the potential root cause of the issue and the metrics that need to be captured to monitor the throttling on the API Server.

Let’s say the PetAdoption application is down and you have to switch the application to your DR site, but you are not sure of the process. Let’s try this scenario with petadoption-ops-app.

Performing disaster recovery

Figure 24 – Performing disaster recovery.

Finally, let’s ask petadoption-ops-app for details about opening an AWS Support case.

Creating support case

Figure 25 – Creating support case.

Conclusion

The purpose of this blog post is to encourage you to think about different ways you can use Amazon Q to enable your teams to operate more effectively. In this instance, the Amazon Q Business application can be further enhanced by connecting it to more data sources, such as your content repositories, business applications, and collaboration tools. You can also use the same application for change management, where operational teams often rely on runbooks to execute specific procedures. You can learn more about Amazon Q using the links below.

Learn more

Amazon Q main product page
Amazon Q details for IT pros and developers
Get started with Amazon Q

Read more about Amazon Q

Introducing Amazon Q, a new generative AI-powered assistant (preview)
Improve developer productivity with generative-AI powered Amazon Q in Amazon CodeCatalyst (preview)
Upgrade your Java applications with Amazon Q Code Transformation (preview)
New generative AI features in Amazon Connect, including Amazon Q, facilitate improved contact center service
New Amazon Q in QuickSight uses generative AI assistance for quicker, easier data insights (preview)
Amazon Q brings generative AI-powered assistance to IT pros and developers (preview)

About the authors

Vikram Venkataraman

Vikram Venkataraman is a Principal Specialist Solutions Architect at Amazon Web Services. He helps customers modernize, scale, and adopt best practices for their containerized workloads. He is passionate about observability and focuses on open source AWS observability services such as Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry.

Puneeth Ranjan Komaragiri

Puneeth is a Principal Technical Account Manager at AWS. He started his journey as a Cloud Support Engineer in the Networking team, where he worked on various AWS networking and monitoring services. He is passionate about the Monitoring and Observability and Cloud Financial Management domains. He likes working with customers to help them design and architect their workloads for scale and resilience.