AWS Big Data Blog
Category: Amazon Athena
New features from Apache Hudi available in Amazon EMR
Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development by providing record-level insert, update and delete capabilities. This record-level capability is helpful if you’re building your data lakes on Amazon S3 or HDFS. You can use it to comply with data privacy regulations and simplify data […]
Create a custom data connector to Slack’s Member Analytics API in Amazon QuickSight with Amazon Athena Federated Query
Amazon QuickSight recently added support for Amazon Athena Federated Query, which allows you to query data in place from various data sources. With this capability, QuickSight can extend support to query additional data sources like Amazon CloudWatch Logs, Amazon DynamoDB, and Amazon DocumentDB (with Mongo DB compatibility) via their existing Amazon Athena data source. You […]
Integrating Datadog data with AWS using Amazon AppFlow for intelligent monitoring
Infrastructure and operation teams are often challenged with getting a full view into their IT environments to do monitoring and troubleshooting. New monitoring technologies are needed to provide an integrated view of all components of an IT infrastructure and application system. Datadog provides intelligent application and service monitoring by bringing together data from servers, databases, […]
Querying a Vertica data source in Amazon Athena using the Athena Federated Query SDK
The ability to query data and perform ad hoc analysis across multiple platforms and data stores with a single tool brings immense value to the big data analytical arena. As organizations build out data lakes with increasing volumes of data, there is a growing need to combine that data with large amounts of data in […]
Automating AWS service logs table creation and querying them with Amazon Athena
I was working with a customer who was just getting started using AWS, and they wanted to understand how to query their AWS service logs that were being delivered to Amazon Simple Storage Service (Amazon S3). I introduced them to Amazon Athena, a serverless, interactive query service that allows you to easily analyze data in […]
Building AWS Data Lake visualizations with Amazon Athena and Tableau
July 2023: This post was reviewed for accuracy. Amazon Athena is an interactive query service that makes it easy to analyze data in a data lake using standard SQL. One of the key elements of Athena is that you only pay for the queries you run. This is an attractive feature because there is no […]
How EMX reduced data pipeline costs by 85% with Amazon Athena
This is a guest blog post by Gary Bouton and Louis Ashner from EMX. In their own words, “ENGINE Media Exchange (EMX) is a leading marketing technology company, leveraging a patented, end-to-end tech stack purpose-built to meet the demands of today’s digital marketplace. The company creates both open- and closed-loop solutions designed to unify advertisers, […]
Detecting anomalous values by invoking the Amazon Athena machine learning inference function
Amazon Athena has released a new feature that allows you to easily invoke machine learning (ML) models for inference directly from your SQL queries. Inference is the stage in which a trained model is used to infer and predict the testing samples and comprises a similar forward pass as training to predict the values. Unlike […]
Orchestrating an AWS Glue DataBrew job and Amazon Athena query with AWS Step Functions
As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Also, as we start building complex data engineering or data analytics pipelines, we look for a simpler orchestration mechanism with graphical user interface-based ETL (extract, transform, load) tools. Recently, AWS […]
Accessing and visualizing data from multiple data sources with Amazon Athena and Amazon QuickSight
Amazon Athena now supports federated query, a feature that allows you to query data in sources other than Amazon Simple Storage Service (Amazon S3). You can use federated queries in Athena to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3. With Athena […]