Analytics | AWS Big Data Blog

Enforce business glossary classification rules in Amazon SageMaker Catalog

Amazon SageMaker Catalog now supports metadata enforcement rules for glossary term classification (tagging) at the asset level. With this capability, administrators can require that assets include specific business terms or classifications. Data producers must apply the required glossary terms or classifications before an asset can be published. In this post, we show how to enforce business glossary classification rules in SageMaker Catalog.

Enhanced data discovery in Amazon SageMaker Catalog with custom metadata forms and rich text documentation

Amazon SageMaker Catalog now supports custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications. Column-level context is essential for understanding and trusting data. This release helps organizations improve data discoverability, collaboration, and governance by letting metadata stewards document columns using structured and formatted information that aligns with internal standards. In this post, we show how to enhance data discovery in SageMaker Catalog with custom metadata forms and rich text documentation at the schema level.

Getting started with Amazon S3 Tables in Amazon SageMaker Unified Studio

In this post, you learn how to integrate SageMaker Unified Studio with S3 Tables and query your data using Amazon Athena, Amazon Redshift, or Apache Spark on Amazon EMR and AWS Glue.

Cross-account lakehouse governance with Amazon S3 Tables and SageMaker Catalog

In this post, we walk you through a practical solution for secure, efficient cross-account data sharing and analysis. You’ll learn how to set up cross-account access to S3 Tables using federated catalogs in Amazon SageMaker, perform unified queries across accounts with Amazon Athena in Amazon SageMaker Unified Studio, and implement fine-grained access controls at the column level using AWS Lake Formation.

Your guide to AWS Analytics at AWS re:Invent 2025

It’s that time of year again — AWS re:Invent is here! At re:Invent, bold ideas come to life. Get a front-row seat to hear inspiring stories from AWS experts, customers, and leaders as they explore today’s most impactful topics, from data analytics to AI. For all the data enthusiasts and professionals, we’ve curated a comprehensive […]

How Yelp modernized its data infrastructure with a streaming lakehouse on AWS

This is a guest post by Umesh Dangat, Senior Principal Engineer for Distributed Services and Systems at Yelp, and Toby Cole, Principal Engineer for Data Processing at Yelp, in partnership with AWS. Yelp processes massive amounts of user data daily: over 300 million business reviews, 100,000 photo uploads, and countless check-ins. Maintaining sub-minute data freshness with […]

Introducing the Amazon OpenSearch Lens for the AWS Well-Architected Framework

In this post, we show you how to use the Amazon OpenSearch Service Lens to evaluate your OpenSearch Service workloads against architectural best practices.

Amazon MSK Express brokers now support Intelligent Rebalancing for 180 times faster operation performance

Effective today, all new Amazon Managed Streaming for Apache Kafka (Amazon MSK) Provisioned clusters with Express brokers will support Intelligent Rebalancing at no additional cost. In this post, we introduce the Intelligent Rebalancing feature and show an example of how it improves operation performance.

Analyzing Amazon EC2 Spot instance interruptions by using event-driven architecture

In this post, you’ll learn how to build a comprehensive monitoring solution for Spot Instance interruptions step by step. You’ll gain practical experience designing an event-driven pipeline, implementing data processing workflows, and creating insightful dashboards that help you track interruption trends, optimize Auto Scaling group (ASG) configurations, and improve the resilience of your Spot Instance workloads.

Enhanced search with match highlights and explanations in Amazon SageMaker

Amazon SageMaker now enhances search results in Amazon SageMaker Unified Studio with additional context that improves transparency and interpretability. The capability introduces inline highlighting for matched terms and an explanation panel that details where and how each match occurred across metadata fields such as name, description, glossary, and schema. In this post, we demonstrate how to use enhanced search in Amazon SageMaker.
