AWS Big Data Blog
Category: Intermediate (200)
How a blockchain startup built a prototype solution to solve the need of analytics for decentralized applications with AWS Data Lab
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. This post is co-written with Dr. Quan Hoang Nguyen, CTO at Fantom Foundation. Here at Fantom Foundation (Fantom), we have developed a high performance, highly scalable, and secure smart contract platform. It’s […]
Build incremental crawls of data lakes with existing Glue catalog tables
AWS Glue includes crawlers, a capability that make discovering datasets simpler by scanning data in Amazon Simple Storage Service (Amazon S3) and relational databases, extracting their schema, and automatically populating the AWS Glue Data Catalog, which keeps the metadata current. This reduces the time to insight by making newly ingested data quickly available for analysis […]
Improve federated queries with predicate pushdown in Amazon Athena
In modern data architectures, it’s common to store data in multiple data sources. However, organizations embracing this approach still need insights from their data and require technologies that help them break down data silos. Amazon Athena is an interactive query service that makes it easy to analyze structured, unstructured, and semi-structured data stored in Amazon […]
Land data from databases to a data lake at scale using AWS Glue blueprints
To build a data lake on AWS, a common data ingestion pattern is to use AWS Glue jobs to perform extract, transform, and load (ETL) data from relational databases to Amazon Simple Storage Service (Amazon S3). A project often involves extracting hundreds of tables from source databases to the data lake raw layer. And for […]
Automate ETL jobs between Amazon RDS for SQL Server and Azure Managed SQL using AWS Glue Studio
Nowadays many customers are following a multi-cloud strategy. They might choose to use various cloud-managed services, such as Amazon Relational Database Service (Amazon RDS) for SQL Server and Azure SQL Managed Instances, to perform data analytics tasks, but still use traditional extract, transform, and load (ETL) tools to integrate and process the data. However, traditional ETL tools may […]
Enable self-service visual data integration and analysis for fund performance using AWS Glue Studio and Amazon QuickSight
June 2023: This post was reviewed and updated for accuracy. IMM (Institutional Money Market) is a mutual fund that invests in highly liquid instruments, cash, and cash equivalents. IMM funds are large financial intermediaries that are crucial to financial stability in the US. Due to its criticality, IMM funds are highly regulated under the security […]
New additions to line charts in Amazon QuickSight
Amazon QuickSight is a fully-managed, cloud-native business intelligence (BI) service that makes it easy to create and deliver insights to everyone in your organization or even with your customers and partners. You can make your data come to life with rich interactive charts and create beautiful dashboards to be shared with thousands of users, either […]
Crawl Delta Lake tables using AWS Glue crawlers
June 2023: This post was reviewed and updated for accuracy. In recent evolution in data lake technologies, it became popular to bring ACID (atomicity, consistency, isolation, and durability) transactions on Amazon Simple Storage Service (Amazon S3). You can achieve that by introducing open-source data lake formats such as Apache Hudi, Apache Iceberg, and Delta Lake. […]
New row and column interactivity options for tables and pivot tables in Amazon QuickSight – Part 1
Amazon QuickSight is a fully-managed, cloud-native business intelligence (BI) service that makes it easy to create and deliver insights to everyone in your organization. You can make your data come to life with rich interactive charts and create beautiful dashboards to share with thousands of users, either directly within a QuickSight application, or embedded in […]
Build a pseudonymization service on AWS to protect sensitive data: Part 1
According to an article in MIT Sloan Management Review, 9 out of 10 companies believe their industry will be digitally disrupted. In order to fuel the digital disruption, companies are eager to gather as much data as possible. Given the importance of this new asset, lawmakers are keen to protect the privacy of individuals and […]