AWS Big Data Blog
Invoke AWS Lambda functions from cross-account Amazon Kinesis Data Streams
A multi-account architecture on AWS is essential for enhancing security, compliance, and resource management by isolating workloads, enabling granular cost allocation, and facilitating collaboration across distinct environments. It also mitigates risks, improves scalability, and allows for advanced networking configurations. In a streaming architecture, you may have event producers, stream storage, and event consumers in a […]
Hybrid Search with Amazon OpenSearch Service
This post explains the internals of hybrid search and how to build a hybrid search solution using OpenSearch Service. We experiment with sample queries to explore and compare lexical, semantic, and hybrid search. All the code used in this post is publicly available in the GitHub repository.
Scale AWS Glue jobs by optimizing IP address consumption and expanding network capacity using a private NAT gateway
As businesses expand, the demand for IP addresses within the corporate network often exceeds the supply. An organization’s network is often designed with some anticipation of future requirements, but as enterprises evolve, their information technology (IT) needs surpass the previously designed network. Companies may find themselves challenged to manage the limited pool of IP addresses. […]
Amazon Managed Service for Apache Flink now supports Apache Flink version 1.18
Apache Flink is an open source distributed processing engine, offering powerful programming interfaces for both stream and batch processing, with first-class support for stateful processing and event time semantics. Apache Flink supports multiple programming languages, Java, Python, Scala, SQL, and multiple APIs with different level of abstraction, which can be used interchangeably in the same […]
Enrich your customer data with geospatial insights using Amazon Redshift, AWS Data Exchange, and Amazon QuickSight
It always pays to know more about your customers, and AWS Data Exchange makes it straightforward to use publicly available census data to enrich your customer dataset. The United States Census Bureau conducts the US census every 10 years and gathers household survey data. This data is anonymized, aggregated, and made available for public use. […]
Multicloud data lake analytics with Amazon Athena
Many organizations operate data lakes spanning multiple cloud data stores. This could be for various reasons, such as business expansions, mergers, or specific cloud provider preferences for different business units. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics […]
Amazon OpenSearch H2 2023 in review
2023 was been a busy year for Amazon OpenSearch Service! Learn more about the releases that OpenSearch Service launched in the first half of 2023. In the second half of 2023, OpenSearch Service added the support of two new OpenSearch versions: 2.9 and 2.11 These two versions introduce new features in the search space, machine […]
How VMware Tanzu CloudHealth migrated from self-managed Kafka to Amazon MSK
This is a post co-written with Rivlin Pereira & Vaibhav Pandey from Tanzu CloudHealth (VMware by Broadcom). VMware Tanzu CloudHealth is the cloud cost management platform of choice for more than 20,000 organizations worldwide, who rely on it to optimize and govern their largest and most complex multi-cloud environments. In this post, we discuss how […]
Gain insights from historical location data using Amazon Location Service and AWS analytics services
Many organizations around the world rely on the use of physical assets, such as vehicles, to deliver a service to their end-customers. By tracking these assets in real time and storing the results, asset owners can derive valuable insights on how their assets are being used to continuously deliver business improvements and plan for future […]
Build a RAG data ingestion pipeline for large-scale ML workloads
For building any generative AI application, enriching the large language models (LLMs) with new data is imperative. This is where the Retrieval Augmented Generation (RAG) technique comes in. RAG is a machine learning (ML) architecture that uses external documents (like Wikipedia) to augment its knowledge and achieve state-of-the-art results on knowledge-intensive tasks. For ingesting these […]