AWS Big Data Blog

Category: Announcements

Achieve 2x faster data lake query performance with Apache Iceberg on Amazon Redshift

In 2025, Amazon Redshift delivered several performance optimizations that improved query performance over twofold for Iceberg workloads on Amazon Redshift Serverless, delivering exceptional performance and cost-effectiveness for your data lake workloads. In this post, we describe some of the optimizations that led to these performance gains.

Introducing catalog federation for Apache Iceberg tables in the AWS Glue Data Catalog

AWS Glue now supports catalog federation for remote Iceberg tables in the Data Catalog. With catalog federation, you can query remote Iceberg tables, stored in Amazon S3 and cataloged in remote Iceberg catalogs, using AWS analytics engines and without moving or duplicating tables. In this post, we discuss how to get started with catalog federation for Iceberg tables in the Data Catalog.

Accelerate data lake operations with Apache Iceberg V3 deletion vectors and row lineage

In this post, we walk you through the new capabilities in Iceberg V3, explain how deletion vectors and row lineage address these challenges, explore real-world use cases across industries, and provide practical guidance on implementing Iceberg V3 features across AWS analytics, catalog, and storage services.

Introducing Cluster Insights: Unified monitoring dashboard for Amazon OpenSearch Service clusters

This blog will guide you through setting up and using Cluster Insights, including key features and metrics. By the conclusion, you’ll understand how to use Cluster insights to recognize and address performance and resiliency issues within your OpenSearch Service clusters.

Enforce business glossary classification rules in Amazon SageMaker Catalog

Amazon SageMaker Catalog now supports metadata enforcement rules for glossary terms classification (tagging) at the asset level. With this capability, administrators can require that assets include specific business terms or classifications. Data producers must apply required glossary terms or classifications before an asset can be published. In this post, we show how to enforce business glossary classification rules in SageMaker Catalog.

Enhanced data discovery in Amazon SageMaker Catalog with custom metadata forms and rich text documentation

Amazon SageMaker Catalog now supports custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications. Column-level context is essential for understanding and trusting data. This release helps organizations improve data discoverability, collaboration, and governance by letting metadata stewards document columns using structured and formatted information that aligns with internal standards. In this post, we show how to enhance data discovery in SageMaker Catalog with custom metadata forms and rich text documentation at the schema level.

Introducing Amazon MWAA Serverless

Today, AWS announced Amazon Managed Workflows for Apache Airflow (MWAA) Serverless. This is a new deployment option for MWAA that eliminates the operational overhead of managing Apache Airflow environments while optimizing costs through serverless scaling. In this post, we demonstrate how to use MWAA Serverless to build and deploy scalable workflow automation solutions.