AWS Cloud Operations Blog
How Capgemini used AWS Systems Manager and AWS cloud native observability to provide self-service logging and analytics
This post was written in collaboration with David Wansell, an Enterprise Cloud Architect at Capgemini with over 20 years of experience across multiple enterprise domains. He designs and builds automation and solutions that enable customers to deliver on their desired outcomes in their cloud adoption journey.
Log analysis helps customers manage infrastructure and applications more effectively by diagnosing issues and identifying trends. Customers want to gather, explore, search, and analyze log data from various applications and system infrastructure. Many customers leverage managed solutions providers to manage their AWS accounts, and they’re looking for AWS native solutions to solve their business problems.
As a certified AWS Managed Services Provider (MSP), an AWS Premier Consulting Partner with seven AWS Competencies, and a member of the AWS Well-Architected Partner Program, Capgemini has a proven record of creating solutions that fit the unique and evolving needs of customers.
Cloud Operation Services (COS) from Capgemini is a managed service offering for AWS Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solutions, built on AWS best practices and tools. This post breaks down the components that Capgemini uses to offer modern cloud managed services with cloud native tooling. It illustrates the relationship between the components and describes each component’s use and process flow.
For self-service logging and log analysis, Capgemini leverages AWS Systems Manager. The SSM Agent allows Systems Manager to update, manage, and configure the Amazon CloudWatch agent installed on the resources, using Run Command and Parameter Store. The CloudWatch logs are filtered and transformed for analytics using AWS Lambda and placed in Amazon Kinesis Data Firehose, which streams them into Amazon Simple Storage Service (Amazon S3). The AWS Glue crawler uses Amazon S3 as the data source and creates a data store, which lets Amazon Athena query the real-time streaming data.
AWS Services and Feature components used in this solution
Systems Manager is an AWS service that you can use to view and control your infrastructure on AWS, on premises, or in a hybrid environment. Using the Systems Manager console, you can view operational data from multiple AWS services and automate operational tasks across your AWS resources. For more information on Systems Manager capabilities, refer to this link.
CloudWatch Agent is a software package that autonomously and continuously runs on your servers. Using CloudWatch Agent, you can collect metrics and logs from Amazon Elastic Compute Cloud (Amazon EC2), hybrid, and on-premises servers running both Linux and Windows. CloudWatch Agent provides access to more system-level and in-guest metrics, in addition to the host metrics already provided by Amazon EC2. Furthermore, the agent lets you collect, aggregate, and summarize metrics and logs from containerized applications and microservices.
Using Run Command, a capability of AWS Systems Manager, you can remotely and securely manage the configuration of your managed instances. Parameter Store, a capability of AWS Systems Manager, provides secure, hierarchical storage for configuration data management and secrets management.
Amazon EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale using events generated from your applications, integrated Software-as-a-Service (SaaS) applications, and AWS services. To learn more about EventBridge and how it works, refer to this link.
Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, and any custom HTTP endpoint or HTTP endpoints that are owned by third-party service providers.
AWS Glue is a serverless, fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries.
Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there’s no infrastructure to manage, and you pay only for the queries that you run.
Logging prerequisites
The Amazon EC2 instances that should be managed by the solution must follow these prerequisites:
Tags: Instances are required to be tagged with the appropriate management tag key and value that corresponds with what the “COS-Lambda-Create-EC2-Instance-CloudWatch-Alarms” Lambda function is scanning for.
Systems Manager: Make sure that the instances meet the Systems Manager prerequisites listed here. The instance profile role also needs the “CloudWatchAgentServerPolicy” policy attached to stream metrics to CloudWatch.
CloudWatch: Refer to this link for a list of supported operating systems for the CloudWatch Agent.
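The tag prerequisite amounts to an exact, case-sensitive match on a key and value. The sketch below illustrates that check on tags in the list-of-dicts shape that the Amazon EC2 API returns. The tag key and value shown are hypothetical placeholders, since the post doesn't state the real ones.

```python
from typing import Iterable, Mapping

# Hypothetical tag key and value; the real ones are not given in the post.
MANAGEMENT_TAG_KEY = "ManagedBy"
MANAGEMENT_TAG_VALUE = "COS"


def has_management_tag(
    tags: Iterable[Mapping[str, str]],
    key: str = MANAGEMENT_TAG_KEY,
    value: str = MANAGEMENT_TAG_VALUE,
) -> bool:
    """Return True only if some tag matches key and value exactly (case-sensitive)."""
    return any(t.get("Key") == key and t.get("Value") == value for t in tags)
```

With these placeholder values, an instance tagged “managedby” instead of “ManagedBy” would be skipped, because the comparison doesn't normalize case.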
How Capgemini made it work
When an Amazon EC2 instance is launched and enters the “running” state, the event is detected by the creation EventBridge rule “CreateLoggingEventRule”, which forwards the instance ID to the creation Lambda function. Once the Lambda function receives the instance ID of the newly provisioned Amazon EC2 instance, it does the following:
- Check that instances are correctly tagged. If the correct management tag (with the correct case) isn’t found, then the instance won’t be processed.
- Install and update the CloudWatch agent via the Systems Manager “AWS-ConfigureAWSPackage” document.
- Determine whether the Amazon EC2 instance is Windows or Linux.
  - Windows instances run the Systems Manager “AWS-RunPowerShellScript” commands, which:
    - Install the “AmazonCloudWatch-coswindowsloggingconfiguration” CloudWatch Agent configuration file for Windows held in the Systems Manager Parameter Store.
    - Start the CloudWatch Agent.
  - Linux instances run the Systems Manager “AWS-RunShellScript” commands, which:
    - Install collectd and EPEL.
    - Install the “AmazonCloudWatch-coslinuxloggingconfiguration” CloudWatch Agent configuration file for Linux held in the Systems Manager Parameter Store.
    - Start the CloudWatch Agent.
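The flow above can be sketched in Python. This isn't Capgemini's Lambda code: it only builds the EventBridge event pattern and the Run Command requests as plain dictionaries, in the shape that boto3's `send_command` accepts as keyword arguments. The function names and the agent control script paths (the default install locations) are assumptions. Package installation for collectd and EPEL is distribution-specific and left out.

```python
from typing import Any, Dict, List

# EventBridge pattern matching EC2 instances that enter the "running" state.
RUNNING_EVENT_PATTERN: Dict[str, Any] = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

# Parameter Store names taken from the post.
CONFIG_PARAMETERS = {
    "Windows": "AmazonCloudWatch-coswindowsloggingconfiguration",
    "Linux": "AmazonCloudWatch-coslinuxloggingconfiguration",
}

# Assumed: the agent is installed in its default location.
WINDOWS_CTL = r"C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1"
LINUX_CTL = "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl"


def instance_id_from_event(event: Dict[str, Any]) -> str:
    """Extract the instance ID from an EC2 state-change event."""
    return event["detail"]["instance-id"]


def platform_of(instance: Dict[str, Any]) -> str:
    """Classify an EC2 instance description as Windows or Linux."""
    is_windows = str(instance.get("Platform", "")).lower() == "windows"
    return "Windows" if is_windows else "Linux"


def build_run_commands(instance_id: str, platform: str) -> List[Dict[str, Any]]:
    """Return the Run Command requests, in order, for one instance."""
    # fetch-config loads the configuration from Parameter Store; -s starts the agent.
    fetch_args = f"-a fetch-config -m ec2 -s -c ssm:{CONFIG_PARAMETERS[platform]}"
    if platform == "Windows":
        document = "AWS-RunPowerShellScript"
        command = f'& "{WINDOWS_CTL}" {fetch_args}'
    else:
        document = "AWS-RunShellScript"
        command = f"{LINUX_CTL} {fetch_args}"
    return [
        {
            "InstanceIds": [instance_id],
            "DocumentName": "AWS-ConfigureAWSPackage",
            "Parameters": {"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
        },
        {
            "InstanceIds": [instance_id],
            "DocumentName": document,
            "Parameters": {"commands": [command]},
        },
    ]
```

A Lambda handler built on this sketch would pass each dictionary to the Systems Manager client in turn, for example `ssm.send_command(**request)`, after the tag check has passed.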
The streamed logs are temporarily stored in a CloudWatch log group. By default, the log groups for Windows and Linux are created according to the following naming convention: CCP-logs-${AccountName}-${AWS::AccountId}-${AWS::Region}. CloudWatch log group subscription filters define the pattern for logs that are moved to an Amazon S3 bucket and made available for Athena querying. The deployed CloudWatch log groups are configured to keep logs for two days; older logs are deleted.
The “Cos-Log-ETL” Lambda function is an ETL function that transforms the logs into JSON format to make them compatible with Athena, and then places them into Kinesis Data Firehose. Kinesis Data Firehose then delivers the logs to an Amazon S3 bucket. This bucket is accessible only to Kinesis Data Firehose and Athena, and it isn’t publicly accessible. It stores all of the client logs. No log data is streamed off account.
Log files in Amazon S3 are stored for three months. A lifecycle management policy is set up, and by default it automatically moves data older than one month to RRS and data older than two months to IA. Glacier isn’t used, because data stored there is inaccessible until a request is made to retrieve it from storage.
Athena lets you query the logs stored in the Amazon S3 bucket using standard SQL. Athena queries are saved into the Amazon S3 bucket. Athena relies on an AWS Glue crawler to automatically update the data store with the latest data. By default, the crawler updates the data store every five minutes, which allows Athena to query the real-time streaming data.
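As an illustration of the naming convention and the subscription filter, the following sketch builds the log group name and a request in the shape that boto3's `put_subscription_filter` accepts. The filter name, the empty filter pattern (which matches every event), and the destination ARN are hypothetical; the post doesn't give the real values.

```python
from typing import Dict


def log_group_name(account_name: str, account_id: str, region: str) -> str:
    """Apply the CCP-logs-${AccountName}-${AWS::AccountId}-${AWS::Region} convention."""
    return f"CCP-logs-{account_name}-{account_id}-{region}"


def subscription_filter_request(
    group: str,
    destination_arn: str,
    filter_name: str = "cos-log-etl",  # hypothetical filter name
    filter_pattern: str = "",          # empty pattern matches all log events
) -> Dict[str, str]:
    """Build keyword arguments for the CloudWatch Logs put_subscription_filter call."""
    return {
        "logGroupName": group,
        "filterName": filter_name,
        "filterPattern": filter_pattern,
        "destinationArn": destination_arn,
    }
```

For example, `log_group_name("acme", "123456789012", "eu-west-1")` yields `CCP-logs-acme-123456789012-eu-west-1`.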
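A transformation of this kind might look like the sketch below. It isn't the actual “Cos-Log-ETL” code. It assumes the function is invoked by a CloudWatch Logs subscription filter, which delivers a base64-encoded, gzip-compressed JSON payload under `awslogs.data`, and it emits one JSON object per line so that Athena can read the files. The output field names are illustrative.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_subscription_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the payload that CloudWatch Logs sends to a subscribed Lambda function."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_firehose_records(payload: Dict[str, Any]) -> List[Dict[str, bytes]]:
    """Turn log events into newline-delimited JSON records for Kinesis Data Firehose."""
    # Control messages carry no log data, so they are skipped.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    records = []
    for log_event in payload.get("logEvents", []):
        row = {
            "log_group": payload.get("logGroup"),
            "log_stream": payload.get("logStream"),
            "timestamp_ms": log_event["timestamp"],  # milliseconds since the epoch
            "message": log_event["message"],
        }
        records.append({"Data": (json.dumps(row) + "\n").encode("utf-8")})
    return records
```

The returned list has the shape that boto3's Firehose `put_record_batch` expects for its `Records` argument. A real handler would also need to split batches that exceed the service's per-call limits.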
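To make the querying step concrete, here is a minimal sketch that builds a search query and the keyword arguments for boto3's Athena `start_query_execution` call. The table name, database name, column names, and output location are all hypothetical, and the cron expression is just one way to express a five-minute AWS Glue crawler schedule.

```python
from typing import Any, Dict

# One way to express a five-minute schedule for an AWS Glue crawler.
CRAWLER_SCHEDULE = "cron(0/5 * * * ? *)"


def build_log_search_query(table: str, term: str, limit: int = 100) -> str:
    """Build a SQL query that finds log lines containing the given term."""
    # Double any single quotes so the term stays inside the string literal.
    # LIKE wildcards (% and _) in the term are not escaped in this sketch.
    escaped = term.replace("'", "''")
    return (
        f"SELECT timestamp_ms, log_stream, message FROM {table} "
        f"WHERE message LIKE '%{escaped}%' "
        f"ORDER BY timestamp_ms DESC LIMIT {int(limit)}"
    )


def athena_request(query: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

The table name is interpolated directly, so it should come from trusted configuration rather than user input.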
Logs
By default, the following logs are configured to be streamed to CloudWatch Logs. These settings are stored in the AmazonCloudWatch-coslinuxloggingconfiguration and AmazonCloudWatch-coswindowsloggingconfiguration Parameter Store items for Linux and Windows, respectively.
Linux:
Windows:
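Both parameters hold a CloudWatch agent configuration in JSON. As a rough sketch of the shape such a document takes, the following Python builds a minimal `logs` section for each platform. The file path, event log name, event levels, and log group name are assumptions for illustration, not the defaults that COS ships.

```python
import json
from typing import Any, Dict


def linux_logging_config(log_group: str) -> Dict[str, Any]:
    """Minimal agent configuration that streams one Linux log file."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": "/var/log/messages",  # assumed example path
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        }
    }


def windows_logging_config(log_group: str) -> Dict[str, Any]:
    """Minimal agent configuration that streams one Windows event log."""
    return {
        "logs": {
            "logs_collected": {
                "windows_events": {
                    "collect_list": [
                        {
                            "event_name": "System",  # assumed example event log
                            "event_levels": ["ERROR", "WARNING"],
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        }
    }


def as_parameter_value(config: Dict[str, Any]) -> str:
    """Serialize a configuration for storage as a Parameter Store value."""
    return json.dumps(config, indent=2)
```

The `{instance_id}` placeholder is resolved by the agent at run time, so each instance writes to its own log stream within the shared log group.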
Summary
Capgemini now offers a logging solution for your applications and system infrastructure, and it helps with transforming the logs for analytics. To learn more about how Capgemini can assist with your business challenges related to management and governance, and to learn more about Capgemini AWS Cloud Operation Services, visit Capgemini Cloud Platform. To learn more about how AWS Systems Manager could be leveraged to manage instances in a hybrid environment, visit AWS Cloud Operation Services.