AWS Machine Learning Blog

Category: Amazon CloudWatch

Build a centralized monitoring and reporting solution for Amazon SageMaker using Amazon CloudWatch

In this post, we present a cross-account observability dashboard that provides a centralized view for monitoring SageMaker user activities and resources across multiple accounts. It allows the end-users and cloud management team to efficiently monitor what ML workloads are running, view the status of these workloads, and trace back different account activities at certain points of time.

Getting started with the Amazon Kendra SharePoint Online connector

Amazon Kendra is a highly accurate and easy-to-use enterprise search service powered by machine learning (ML). To get started with Amazon Kendra, we offer data source connectors to get your documents easily ingested and indexed. This post describes how to use Amazon Kendra’s SharePoint Online connector. To allow the connector to access your SharePoint Online […]

Use Amazon CloudWatch custom metrics for real-time monitoring of Amazon Sagemaker model performance

The training and learning process of deep learning (DL) models can be expensive and time consuming. It’s important for data scientists to monitor the model metrics, such as the training accuracy, training loss, validation accuracy, and validation loss, and make informed decisions based on those metrics. In this blog post, I’ll show you how to […]

Monitoring GPU Utilization with Amazon CloudWatch

Deep learning requires a large amount of matrix multiplications and vector operations that can be parallelized by GPUs (graphics processing units) because GPUs have thousands of cores. Amazon Web Services allows you to spin up P2 or P3 instances that are great for running Deep Learning frameworks such as MXNet, which emphasizes speeding up the deployment […]