AWS Big Data Blog
Build data integration jobs with AI companion on AWS Glue Studio notebook powered by Amazon CodeWhisperer
Data is essential for businesses to make informed decisions, improve operations, and innovate. Integrating data from different sources can be a complex and time-consuming process. AWS offers AWS Glue to help you integrate your data from multiple sources on serverless infrastructure for analysis, machine learning (ML), and application development. AWS Glue provides different authoring experiences […]
Introducing the vector engine for Amazon OpenSearch Serverless, now in preview
We are pleased to announce the preview release of the vector engine for Amazon OpenSearch Serverless. The vector engine provides a simple, scalable, and high-performing similarity search capability in Amazon OpenSearch Serverless that makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (AI) applications without having […]
Alcion supports their multi-tenant platform with Amazon OpenSearch Serverless
This is a guest blog post co-written with Zack Rossman from Alcion. Alcion, a security-first, AI-driven backup-as-a-service (BaaS) platform, helps Microsoft 365 administrators quickly and intuitively protect data from cyber threats and accidental data loss. In the event of data loss, Alcion customers need to search metadata for the backed-up items (files, emails, contacts, events, […]
Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable
Amazon Redshift Serverless makes it simple to run and scale analytics in seconds. It automatically provisions and intelligently scales data warehouse compute capacity to deliver fast performance, and you pay only for what you use. Just load your data and start querying right away in the Amazon Redshift Query Editor or in your favorite business […]
Enable data analytics with Talend and Amazon Redshift Serverless
This is a guest post co-written with Cameron Davie from Talend. Today, in order to accelerate and scale data analytics, companies are looking for an approach to minimize infrastructure management and predict computing needs for different types of workloads, including spikes and ad hoc analytics. The integration of Talend Cloud and Talend Stitch with Amazon […]
Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation
Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. Many organizations have a distributed tools and infrastructure across various business units. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture […]
Query your Apache Hive metastore with AWS Lake Formation permissions
Apache Hive is a SQL-based data warehouse system for processing highly distributed datasets on the Apache Hadoop platform. There are two key components to Apache Hive: the Hive SQL query engine and the Hive metastore (HMS). The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, […]
Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics
This post is co-written with Eliad Gat and Oded Lifshiz from Orca Security. With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. One key component that plays a central role in modern data architectures is the data lake, which allows organizations to […]
Dimensional modeling in Amazon Redshift
Amazon Redshift is a fully managed and petabyte-scale cloud data warehouse that is used by tens of thousands of customers to process exabytes of data every day to power their analytics workload. You can structure your data, measure business processes, and get valuable insights quickly can be done by using a dimensional model. Amazon Redshift […]
Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue
Today, we are pleased to announce a new AWS Glue connector for Google Cloud Storage that allows you to move data bi-directionally between Google Cloud Storage and Amazon Simple Storage Service (Amazon S3). In this post, we go over how the new connector works, introduce the connector’s functions, and provide you with key steps to set it up. We provide you with prerequisites, share how to subscribe to this connector in AWS Marketplace, and describe how to create and run AWS Glue for Apache Spark jobs with it.