AWS Big Data Blog
Category: Learning Levels
Enable remote reads from Azure ADLS with SAS tokens using Spark in Amazon EMR
Organizations use data from many sources to understand, analyze, and grow their business. These data sources are often spread across various public cloud providers. Enterprises may also expand their footprint by mergers and acquisitions, and during such events they often end up with data spread across different public cloud providers. These scenarios can create the […]
Improved resiliency with backpressure and admission control for Amazon OpenSearch Service
Amazon OpenSearch Service is a managed service that makes it simple to secure, deploy, and operate OpenSearch clusters at scale in the AWS Cloud. Last year, we introduced Shard Indexing Backpressure and admission control, which monitors cluster resources and incoming traffic to selectively reject requests that would otherwise pose stability risks like out of memory […]
Best practices for running production workloads using Amazon MSK tiered storage
In the second post of the series, we discussed some core concepts of the Amazon Managed Streaming for Apache Kafka (Amazon MSK) tiered storage feature and explained how read and write operations work in a tiered storage enabled cluster. This post focuses on how to properly size your MSK tiered storage cluster, which metrics to […]
Federate Amazon QuickSight access with open-source identity provider Keycloak
Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML) powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Organizations are working toward centralizing their identity and access strategy across all their applications, including on-premises and third-party. Many organizations use Keycloak as their identity provider […]
AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry
Organizations across the world are increasingly relying on streaming data, and there is a growing need for real-time data analytics, considering the growing velocity and volume of data being collected. This data can come from a diverse range of sources, including Internet of Things (IoT) devices, user applications, and logging and telemetry information from applications, […]
Cost monitoring for Amazon EMR on Amazon EKS
Amazon EMR is the industry-leading cloud big data solution, providing a collection of open-source frameworks such as Spark, Hive, Hudi, and Presto, fully managed and with per-second billing. Amazon EMR on Amazon EKS is a deployment option allowing you to deploy Amazon EMR on the same Amazon Elastic Kubernetes Service (Amazon EKS) clusters that is […]
Choosing an open table format for your transactional data lake on AWS
August 2023: This post was updated to include Apache Iceberg support in Amazon Redshift. Disclaimer: Due to rapid advancements in AWS service support for open table formats, recent developments might not yet be reflected in this post. For the latest information on AWS service support for open table formats, refer to the official AWS service […]
Implement alerts in Amazon OpenSearch Service with PagerDuty
In today’s fast-paced digital world, businesses rely heavily on their data to make informed decisions. This data is often stored and analyzed using various tools, such as Amazon OpenSearch Service, a powerful search and analytics service offered by AWS. OpenSearch Service provides real-time insights into your data to support use cases like interactive log analytics, […]
Automate and accelerate your Amazon QuickSight asset deployments using the new APIs
Business intelligence (BI) and IT operations (BIOps) teams often need to automate and accelerate the deployment of BI assets to ensure business continuity. We heard that you wanted an automated and scalable way to deploy, back up, and replicate Amazon QuickSight assets at scale so that BIOps teams within your organization can work in an […]
Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit
Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]