AWS Database Blog

Category: Advanced (300)

Load vector embeddings up to 67x faster with pgvector and Amazon Aurora

pgvector is the open source PostgreSQL extension for vector similarity search that powers generative artificial intelligence (AI) applications using techniques such as semantic search and retrieval-augmented generation (RAG). Amazon Aurora PostgreSQL-Compatible Edition has supported pgvector 0.5.1 since 2023. Amazon Aurora now supports pgvector version 0.7.0, which adds parallelism to improve the performance of building Hierarchical Navigable Small Worlds […]

Build a streaming ETL pipeline on Amazon RDS using Amazon MSK

Customers who host their transactional database on Amazon Relational Database Service (Amazon RDS) often seek architecture guidance on building streaming extract, transform, load (ETL) pipelines to destination targets such as Amazon Redshift. This post outlines the architecture pattern for creating a streaming data pipeline using Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK offers a fully managed Apache Kafka service, enabling you to ingest and process streaming data in real time.

Modernize your legacy databases with AWS data lakes, Part 1: Migrate SQL Server using AWS DMS

This is the first post in a three-part series in which we discuss the end-to-end process of building a data lake from a legacy SQL Server database. In this post, we show you how to build data pipelines to replicate data from Microsoft SQL Server to a data lake in Amazon S3 using AWS DMS. You can extend the solution presented in this post to other database engines like PostgreSQL, MySQL, and Oracle.

Performance testing MySQL migration environments using query playback and traffic mirroring – Part 3

This is the third post in a series where we dive deep into performance testing of MySQL environments being migrated from on premises. In Part 1, we compared the query playback and traffic mirroring approaches at a high level. In Part 2, we showed how to set up and configure query playback. In this post, we show you how to set up and configure traffic mirroring.

Performance testing MySQL migration environments using query playback and traffic mirroring – Part 2

This is the second post in a series where we dive deep into performance testing MySQL environments being migrated from on premises. In Part 1, we compared the query playback and traffic mirroring approaches at a high level. In this post, we dive into the setup and configuration of query playback.

How Claroty Improved Database Performance and Scaled the Claroty xDome Platform using Amazon Aurora Optimized Reads

Claroty is a leading provider of industrial cybersecurity solutions, protecting cyber-physical systems (CPS), such as industrial control systems, operational technology networks, and healthcare networks, from cyber threats. Claroty’s business depends on efficiently managing large volumes of data and running complex queries to deliver a great user experience for customers who are reducing security risks to cyber-physical systems. One key workload involves an API that provides users with an interface to extract device, alert, and vulnerability data from the Claroty xDome dashboard, enabling seamless integration into their own data stores. In this post, we share how Claroty improved database performance and scaled Claroty xDome using the advanced features of Aurora.

Unlock cost savings using compression with Amazon DocumentDB

In the post Reduce cost and improve performance by migrating to Amazon DocumentDB 5.0, we discussed various ways to reduce costs by migrating your workload to Amazon DocumentDB. In this post, we demonstrate the document compression feature in Amazon DocumentDB to reduce storage usage and I/O cost.

Visualize vector embeddings stored in Amazon Aurora PostgreSQL and explore semantic similarities

In this post, we show how you can visualize vector embeddings and explore semantic similarities. We use principal component analysis (PCA), a well-known dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving as much of the original variance as possible. By projecting data onto orthogonal axes called principal components, PCA enables you to visualize the underlying structure of the data in a more manageable form.
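
The projection described above can be illustrated with a minimal, dependency-free sketch. This is not the code from the post: the `embeddings` values are made-up 4-dimensional toy vectors, the function names (`center`, `covariance`, `top_component`, `pca`) are hypothetical, and power iteration with deflation is one simple way to find the leading principal components. In practice you would more likely use a library implementation.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean so every dimension has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Power iteration: approximate the unit eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance captured along v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


def pca(rows, k=2):
    """Project rows onto their first k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for c in range(k):
        v, lam = top_component(cov, seed=c)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
                 for r in centered]
    return projected, components


# Made-up toy "embeddings" (assumption: real embeddings have far more dimensions).
embeddings = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.0],
    [0.2, 0.8, 0.9, 0.1],
    [0.5, 0.5, 0.4, 0.9],
]

points, components = pca(embeddings, k=2)
for x, y in points:
    print(f"{x:+.3f} {y:+.3f}")
```

Each row of `points` is a 2D coordinate that could be passed to a plotting library, so vectors that are close in the original space tend to land near each other in the plot.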