AWS Big Data Blog
Category: Compute
Power your Kafka Streams application with Amazon MSK and AWS Fargate
November 2024: This post was reviewed and updated for accuracy. Today, companies of all sizes across all verticals design and build event-driven architectures centered around real-time streaming and stream processing. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that […]
Query a Teradata database using Amazon Athena Federated Query and join with data in your Amazon S3 data lake
If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Teradata as your transactional data store, you may need to join the data in your data lake with Teradata in the cloud, Teradata running on Amazon Elastic Compute Cloud (Amazon EC2), or with an on-premises Teradata database, for example to build […]
Query Snowflake using Athena Federated Query and join with data in your Amazon S3 data lake
This post was last reviewed and updated in July 2022 with updates to the Athena federation connector. If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Snowflake as your data warehouse solution, you may need to join your data in your data lake with Snowflake. For example, you may want to build […]
Auto scaling Amazon Kinesis Data Streams using Amazon CloudWatch and AWS Lambda
This post is co-written with Noah Mundahl, Director of Public Cloud Engineering at United Health Group. Update (12/1/2021): Amazon Kinesis Data Streams On-Demand mode is now the recommended way to natively auto scale your Amazon Kinesis Data Streams. In this post, we cover a solution to add auto scaling to Amazon Kinesis Data Streams. Whether […]
Query your Oracle database using Athena Federated Query and join with data in your Amazon S3 data lake
This post was last reviewed and updated in July 2022 with updates to the Athena federation connector. If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Oracle as your transactional data store, you may need to join the data in your data lake with Oracle on Amazon Relational Database Service (Amazon RDS), Oracle running on Amazon […]
Create a secure data lake by masking, encrypting data, and enabling fine-grained access with AWS Lake Formation
You can build data lakes with millions of objects on Amazon Simple Storage Service (Amazon S3) and use AWS native analytics and machine learning (ML) services to process, analyze, and extract business insights. You can use a combination of our purpose-built databases and analytics services like Amazon EMR, Amazon OpenSearch Service, and Amazon Redshift as […]
Automate Amazon ES synonym file updates
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. Search engines provide the means to retrieve relevant content from a collection of content. However, this can be challenging if the user doesn't enter the exact words. You need to find the right item from a catalog of products, or the correct […]
Increase Amazon Elasticsearch Service performance by upgrading to Graviton2
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. Amazon OpenSearch Service supports multiple instance types based on your use case. In 2021, AWS announced general purpose (M6g), compute optimized (C6g), and memory optimized (R6g, R6gd) instance types for Amazon OpenSearch Service version 7.9 or later powered by AWS […]
Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS
Many customers are gathering large amounts of data generated from different sources, such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic on your data. These processes require complex workflow management to schedule jobs and manage dependencies […]
Estimate Amazon EC2 Spot Instance cost savings with AWS Glue DataBrew, AWS Glue, and Amazon QuickSight
AWS provides many ways to optimize your workloads and save on costs. For example, services like AWS Cost Explorer and AWS Trusted Advisor provide cost savings recommendations to help you optimize your AWS environments. However, you may also want to estimate cost savings when comparing Amazon Elastic Compute Cloud (Amazon EC2) Spot to On-Demand Instances. […]