Amazon CloudWatch | AWS Machine Learning Blog

Unlocking language barriers: Translate application logs with Amazon Translate for seamless support

This post addresses the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment.

Enable pod-based GPU metrics in Amazon CloudWatch

This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.

Build a centralized monitoring and reporting solution for Amazon SageMaker using Amazon CloudWatch

In this post, we present a cross-account observability dashboard that provides a centralized view for monitoring SageMaker user activities and resources across multiple accounts. It allows the end-users and cloud management team to efficiently monitor what ML workloads are running, view the status of these workloads, and trace back different account activities at certain points of time.

Getting started with the Amazon Kendra SharePoint Online connector

Amazon Kendra is a highly accurate and easy-to-use enterprise search service powered by machine learning (ML). To get started with Amazon Kendra, we offer data source connectors to get your documents easily ingested and indexed. This post describes how to use Amazon Kendra’s SharePoint Online connector. To allow the connector to access your SharePoint Online […]

Use Amazon CloudWatch custom metrics for real-time monitoring of Amazon Sagemaker model performance

The training and learning process of deep learning (DL) models can be expensive and time consuming. It’s important for data scientists to monitor the model metrics, such as the training accuracy, training loss, validation accuracy, and validation loss, and make informed decisions based on those metrics. In this blog post, I’ll show you how to […]

Monitoring GPU Utilization with Amazon CloudWatch

Deep learning requires a large amount of matrix multiplications and vector operations that can be parallelized by GPUs (graphics processing units) because GPUs have thousands of cores. Amazon Web Services allows you to spin up P2 or P3 instances that are great for running Deep Learning frameworks such as MXNet, which emphasizes speeding up the deployment […]

AWS Machine Learning Blog