AWS Big Data Blog
Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data
In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools. This cloud service was a significant leap from the traditional data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Since then, customer demands for better scale, higher throughput, and agility in handling a wide variety of changing, but increasingly business critical analytics and machine learning use cases has exploded, and we have been keeping pace. Today, tens of thousands of customers use Amazon Redshift in AWS global infrastructure to collectively process exabytes of data daily and employs Amazon Redshift as a key component of their data architecture to drive use cases from typical dashboarding to self-service analytics, real-time analytics, machine learning, data sharing and monetization, and more
The advancements to Amazon Redshift announced at AWS re:Invent 2023 further accelerates modernization of your cloud analytics environments, keeping our core tenet to help you achieve the best price-performance at any scale. These announcements drive forward the AWS Zero-ETL vision to unify all your data, enabling you to better maximize the value of your data with comprehensive analytics and ML capabilities, and innovate faster with secure data collaboration within and across organizations. From price-performance improvements to zero-ETL, to generative AI capabilities, we have something for everyone. Let’s dive into the highlights.
Modernizing analytics for scale, performance, and reliability
“Our migration from legacy on-premises platform to Amazon Redshift allows us to ingest data 88% faster, query data 3x faster, and load daily data to the cloud 6x faster. Amazon Redshift enabled us to optimize performance, availability, and reliability—significantly easing operational complexity, while increasing the velocity of our end-users’ decision-making experience on the Fab floor.”
– Sunil Narayan, Sr Dir, Analytics at GlobalFoundries
Diligently driving the best price-performance at scale with new improvements
Since day 1, Amazon Redshift has been building innovative capabilities to help you get to optimum performance, while keeping costs lower. Amazon Redshift continues to lead on the price-performance front with up to 6x better price-performance than alternative cloud data warehouse and for dash boarding applications with high concurrency and low latency. We closely analyze query patterns in the fleet and look for opportunities to drive customer-focused innovation. For example, earlier in the year, we announced speed ups for string-based data processing up to 63x compared to alternative compression encodings such as LZO (Lempel-Ziv-Oberhumer) or ZStandard. At AWS re:Invent 2023, we introduced more performance enhancements in query planning and execution such as enhanced bloom filters , query rewrites, and support for write operations in auto scaling . For more information about performance improvement capabilities, refer to the list of announcements below.
Amazon Redshift Serverless is more intelligent than ever with new AI-driven scaling and optimizations
Speaking of price-performance, new next generation AI-driven scaling and optimizations capabilities in Amazon Redshift Serverless can deliver up to 10x better price-performance for variable workloads (based on internal testing), without manual intervention. Amazon Redshift Serverless, generally available since 2021, allows you to run and scale analytics without having to provision and manage the data warehouse. Since GA, Redshift Serverless executed over a billion queries to power data insights for thousands of customers. With these new AI optimizations, Amazon Redshift Serverless scales proactively and automatically with workload changes across all key dimensions —such as data volume, concurrent users, and query complexity. You just specify your desired price-performance targets to either optimize for cost or optimize for performance or balanced and serverless does the rest. Learn more about additional improvements in Redshift Serverless, under the list of announcements below.
Multi-data warehouse writes through data sharing
Data sharing is a widely adopted feature in Amazon Redshift with customers running tens of millions of queries on shared data daily. Customers share live transactionally consistent data within and across organizations and regions for read purposes without data copies or data movement. Customers are using data sharing to modernize their analytics architectures from monolithic architectures to multi-cluster, data mesh deployments that enable seamless and secure access across organizations to drive data collaboration and powerful insights. At AWS re:Invent 2023, we extended data sharing capabilities to launch multi-data warehouse writes in preview. You can now start writing to Redshift databases from other Redshift data warehouses in just a few clicks, further enabling data collaboration, flexible scaling of compute for ETL/data processing workloads by adding warehouses of different types and sizes based on price-performance needs. Experience greater transparency of compute usage as each warehouse is billed for its own compute and consequently keep your costs under control.
Multidimensional data layouts
Amazon Redshift offers industry leading predictive optimizations that continuously monitor your workloads and seamlessly accelerate performance and maximize concurrency by adjusting data layout and compute management as you use the data warehouse more. In addition to the powerful optimizations Redshift already offers, such as Automatic Table Sort, Automatic sort and distribution keys, we are introducing Multidimensional Data Layouts, a new powerful table sorting mechanism that improves performance of repetitive queries by automatically sorting data based on the incoming query filters (for example: Sales in a specific region). This method significantly accelerates the performance of table scans compared to traditional methods.
Unifying all your data with zero-ETL approaches
“Using the Aurora MySQL zero-ETL integration, we experience near real-time data synchronization between Aurora MySQL databases and Amazon Redshift, making it possible to build an analysis environment in just three hours instead of the month of developer time it used to take before”
– Money Forward i
JOYME uses Amazon Redshift’s streaming ingestion and other Amazon services for risk control over users’ financial activity such as recharge, refund, and rewards.
“With Redshift, we are able to view risk counterparts and data in near real time—
instead of on an hourly basis. Redshift significantly improved our business ROI efficiency.”– PengBo Yang, CTO, JOYME
Data pipelines can be challenging and costly to build and manage and can create hours-long delays to obtain transactional data for analytics. These delays can lead to missed business opportunities, especially when the insights derived from analytics on transactional data are relevant for only a limited amount of time. Amazon Redshift employs AWS’s zero-ETL approach that enables interoperability and integration between the data warehouse and operational databases and even your streaming data services, so that the data is easily and automatically ingested into the warehouse for you, or you can access the data in place, where it lives.
Zero-ETL integrations with operational databases
We delivered zero-ETL integration between Amazon Aurora MySQL Amazon Redshift (general availability) this year, to enable near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Amazon Aurora. Within seconds of transactional data being written into Aurora, the data is available in Amazon Redshift, so you don’t have to build and maintain complex data pipelines to perform extract, transform, and load (ETL) operations. At AWS re:Invent, we extended zero-ETL integration to additional sources specifically Aurora PostgreSQL, Dynamo DB, and Amazon RDS MySQL. Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications.
Data lake querying with support for Apache Iceberg tables
Amazon Redshift allows customers to run a wide range of workloads on data warehouse and data lakes using its support for various open file and table formats. At AWS re:Invent, we announced the general availability of support for Apache Iceberg tables, so you can easily access your Apache Iceberg tables on your data lake from Amazon Redshift and join it with the data in your data warehouse when needed. Use one click to access your data lake tables using auto-mounted AWS Glue data catalogs on Amazon Redshift for a simplified experience. We have improved data lake query performance by integrating with AWS Glue statistics and introduce preview of incremental refresh for materialized views on data lake data to accelerate repeated queries.
Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.
Maximize value with comprehensive analytics and ML capabilities
“Amazon Redshift is one of the most important tools we had in growing Jobcase as a company.”
– Ajay Joshi, Distinguished Engineer, Jobcase
With all your data integrated and available, you can easily build and run near real-time analytics to AI/ML/Generative AI applications. Here’s a couple of highlights from this week and for the full list, see below.
Amazon Q Generative SQL capability
Query Editor, an out-of-the-box web-based SQL experience in Amazon Redshift is a popular tool for data exploration, visual analysis, and data collaboration. At AWS re:Invent, we introduced Amazon Q Generative SQL capabilities in Amazon Redshift Query Editor (preview), to simplify query authoring and increase your productivity by allowing you to express queries in natural language and receive SQL code recommendations. Generative SQL uses AI to analyze user intent, query patterns, and schema metadata to identify common SQL query patterns directly allowing you to get insights faster in a conversational format without extensive knowledge of your organization’s complex database metadata.
Amazon Redshift ML large language model (LLM) integration
Amazon Redshift ML enables customers to create, train, and deploy machine learning models using familiar SQL commands. Customers use Redshift ML to run an average of over 10 billion predictions a day within their data warehouses. At AWS re:Invent, we announced support for LLMs as preview. Now, you can use pre-trained open source LLMs in Amazon SageMaker JumpStart as part of Redshift ML, allowing you to bring the power of LLMs to analytics. For example, you can make inferences on your product feedback data in Amazon Redshift, use LLMs to summarize feedback, perform entity extraction, sentiment analysis and product feedback classification.
Innovate faster with secure data collaboration within and across the organizations
“Millions of companies use Stripe’s software and APIs to accept payments, send payouts, and manage their businesses online. Access to their Stripe data via leading data warehouses like Amazon Redshift has been a top request from our customers. Our customers needed secure, fast, and integrated analytics at scale without building complex data pipelines or moving and copying data around. With Stripe Data Pipeline for Amazon Redshift, we’re helping our customers set up a direct and reliable data pipeline in a few clicks. Stripe Data Pipeline enables our customers to automatically share their complete, up-to-date Stripe data with their Amazon Redshift data warehouse, and take their business analytics and reporting to the next level.”
– Tony Petrossian, Head of Engineering, Revenue & Financial Management at Stripe
With Amazon Redshift, you can easily and securely share data and collaborate no matter where your teams or data is located. And have the confidence that your data is secure no matter where you operate or how highly regulated your industries are. We have enabled fine grained permissions, an easy authentication experience with single sign-on for your organizational identity—all provided at no additional cost to you.
Unified identity with IAM identity center integration
We announced Amazon Redshift integration with AWS IAM Identity Center to enable organizations to support trusted identity propagation between Amazon QuickSight,, Amazon Redshift Query Editor, and Amazon Redshift, . Customers can use their organization identities to access Amazon Redshift in a single sign-on experience using third party identity providers (IdP), such as Microsoft Entra ID, Okta, Ping, OneLogin, etc. from Amazon QuickSight and Amazon Redshift Query Editor. Administrators can use third-party identity provider users and groups to manage fine grained access to data across services and audit user level access in AWS CloudTrail. With trusted identity propagation, a user’s identity is passed seamlessly between Amazon QuickSight, Amazon Redshift reducing time to insights and enabling a friction free analytics experience.
For the full set of announcements, see the following:
-
Modernizing analytics for scale, performance, and reliability
- What’s new – New AI-driven scaling and optimizations in Amazon Redshift Serverless (Preview)
- What’s new – Multi-data warehouse writes through data sharing (Preview)
- What’s new – Multi-Dimensional Data Layouts (Preview)
- What’s new – Support for Multi-AZ deployments in GA
- What’s new – Concurrency scaling now supports Create Table As Select
- What’s new – Enhanced manageability and usability features for Amazon Redshift Serverless
- Redshift price-performance improvements
-
Unifying all your data with zero-ETL approaches
- What’s new – Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift (Preview)
- What’s new – Amazon RDS for MySQL zero-ETL integration with Amazon Redshift (Preview)
- What’s new – Amazon DynamoDB zero-ETL integration with Amazon Redshift (Preview)
- What’s new – Support for Apache Iceberg tables in GA
- What’s new – Incremental refreshes on materialized views (Preview)
- What’s new – Integration with AWS Glue column-level statistics
-
Maximize value with comprehensive analytics and ML capabilities
- What’s new – Amazon Q Generative SQL in Amazon Redshift (Preview)
- What’s new – Large Language Model support in Amazon Redshift (Preview)
- What’s new – AWS Glue support for multi engine views with AWS Analytics Engines
- What’s new – Integration with visual studio code
- What’s new – Autocomplete suggestions in Amazon Redshift Query Editor V2
-
Innovate faster with secure data collaboration within and across the organizations
- What’s new – Trusted Identity propagation with IAM Identity Center
- What’s new – New Fine grained access control capabilities (Preview)
- What’s new – Integration with secrets manager
- What’s new – Row level security enhancements
- What’s new – Metadata security for multi-tenant applications
Learn more: https://thinkwithwp.com/redshift
About the authors
Neeraja Rentachintala is a Principal Product Manager with Amazon Redshift. Neeraja is a seasoned Product Management and GTM leader, bringing over 20 years of experience in product vision, strategy and leadership roles in data products and platforms. Neeraja delivered products in analytics, databases, data Integration, application integration, AI/Machine Learning, large scale distributed systems across On-Premise and Cloud, serving Fortune 500 companies as part of ventures including MapR (acquired by HPE), Microsoft SQL Server, Oracle, Informatica and Expedia.com.
Sunaina AbdulSalah leads product marketing for Amazon Redshift. She focuses on educating customers about the impact of data warehousing and analytics and sharing AWS customer stories. She has a deep background in marketing and GTM functions in the B2B technology and cloud computing domains. Outside of work, she spends time with her family and friends and enjoys traveling.