AWS Big Data Blog
Category: Advanced (300)
Run Apache Spark and Iceberg 4.5x faster than open source Spark with Amazon EMR
This post shows how Amazon EMR 7.12 can make your Apache Spark and Iceberg workloads up to 4.5x faster performance.
Apache Spark encryption performance improvement with Amazon EMR 7.9
In this post, we analyze the results from our benchmark tests comparing the Amazon EMR 7.9 optimized Spark runtime against Spark 3.5.5 without encryption optimizations. We walk through a detailed cost analysis and provide step-by-step instructions to reproduce the benchmark.
Introducing catalog federation for Apache Iceberg tables in the AWS Glue Data Catalog
AWS Glue now supports catalog federation for remote Iceberg tables in the Data Catalog. With catalog federation, you can query remote Iceberg tables, stored in Amazon S3 and cataloged in remote Iceberg catalogs, using AWS analytics engines and without moving or duplicating tables. In this post, we discuss how to get started with catalog federation for Iceberg tables in the Data Catalog.
Getting started with Apache Iceberg write support in Amazon Redshift
In this post, we show how you can use Amazon Redshift to write data directly to Apache Iceberg tables stored in Amazon S3 and S3 Tables for seamless integration between your data warehouse and data lake while maintaining ACID compliance.
Orchestrating data processing tasks with a serverless visual workflow in Amazon SageMaker Unified Studio
In this post, we show how to use the new visual workflow experience in SageMaker Unified Studio IAM-based domains to orchestrate an end-to-end machine learning workflow. The workflow ingests weather data, applies transformations, and generates predictions—all through a single, intuitive interface, without writing any orchestration code.
Save up to 24% on Amazon Redshift Serverless compute costs with Reservations
In this post, you learn how Amazon Redshift Serverless Reservations can help you lower your data warehouse costs. We explore ways to determine the optimal number of RPUs to reserve, review example scenarios, and discuss important considerations when purchasing these reservations.
Enhanced data discovery in Amazon SageMaker Catalog with custom metadata forms and rich text documentation
Amazon SageMaker Catalog now supports custom metadata forms and rich text descriptions at the column level, extending existing curation capabilities for business names, descriptions, and glossary term classifications. Column-level context is essential for understanding and trusting data. This release helps organizations improve data discoverability, collaboration, and governance by letting metadata stewards document columns using structured and formatted information that aligns with internal standards. In this post, we show how to enhance data discovery in SageMaker Catalog with custom metadata forms and rich text documentation at the schema level.
Getting started with Amazon S3 Tables in Amazon SageMaker Unified Studio
In this post, you learn how to integrate SageMaker Unified Studio with S3 Tables and query your data using Amazon Athena, Amazon Redshift, or Apache Spark in EMR and AWS Glue.
Analyzing Amazon EC2 Spot instance interruptions by using event-driven architecture
In this post, you’ll learn how to build this comprehensive monitoring solution step-by-step. You’ll gain practical experience designing an event-driven pipeline, implementing data processing workflows, and creating insightful dashboards that help you track interruption trends, optimize ASG configurations, and improve the resilience of your Spot Instance workloads.
Enhanced search with match highlights and explanations in Amazon SageMaker
Amazon SageMaker now enhances search results in Amazon SageMaker Unified Studio with additional context that improves transparency and interpretability. The capability introduces inline highlighting for matched terms and an explanation panel that details where and how each match occurred across metadata fields such as name, description, glossary, and schema. In this post, we demonstrate how to use enhanced search in Amazon SageMaker.









