AWS Database Blog

Category: Amazon Machine Learning

Use a DAO to govern LLM training data, Part 4: MetaMask authentication

In Part 1 of this series, we introduced the concept of using a decentralized autonomous organization (DAO) to govern the lifecycle of an AI model, focusing on the ingestion of training data. In Part 2, we created and deployed a minimalistic smart contract on the Ethereum Sepolia testnet using Remix and MetaMask, establishing a mechanism to govern which training data can be uploaded to the knowledge base and by whom. In Part 3, we set up Amazon API Gateway and deployed AWS Lambda functions to copy data from InterPlanetary File System (IPFS) to Amazon Simple Storage Service (Amazon S3) and start a knowledge base ingestion job, creating a seamless data flow from IPFS to the knowledge base. In this post, we demonstrate how to configure MetaMask authentication, create a frontend interface, and test the solution.

Use a DAO to govern LLM training data, Part 3: From IPFS to the knowledge base

In Part 1 of this series, we introduced the concept of using a decentralized autonomous organization (DAO) to govern the lifecycle of an AI model, focusing on the ingestion of training data. In Part 2, we created and deployed a minimalistic smart contract on the Ethereum Sepolia testnet using Remix and MetaMask, establishing a mechanism to govern which training data can be uploaded to the knowledge base and by whom. In this post, we set up Amazon API Gateway and deploy AWS Lambda functions to copy data from InterPlanetary File System (IPFS) to Amazon Simple Storage Service (Amazon S3) and start a knowledge base ingestion job.

Use a DAO to govern LLM training data, Part 2: The smart contract

In Part 1 of this series, we introduced the concept of using a decentralized autonomous organization (DAO) to govern the lifecycle of an AI model, specifically focusing on the ingestion of training data. In this post, we focus on the writing and deployment of the Ethereum smart contract that contains the outcome of the DAO decisions.

Use a DAO to govern LLM training data, Part 1: Retrieval Augmented Generation

Blockchain and generative AI are two technical fields that have received a lot of attention in recent years. There is an emerging set of use cases that can benefit from combining these two technologies. In this four-part series, we build a solution that governs the training data ingestion process of an AI model, using a smart contract and serverless components, and guide you through each step of building it. In this post, we review the overall architecture of the solution and set up a large language model (LLM) knowledge base.

Embed textual data in Amazon RDS for SQL Server using Amazon Bedrock

In Part 1 of this post, we covered how Retrieval Augmented Generation (RAG) can be used to enhance responses in generative AI applications by combining domain-specific information with a foundation model (FM). However, we focused only on the semantic search aspect of the solution, assuming that our vector store was already built and fully populated. In this post, we explore how to generate vector embeddings for Wikipedia data stored in a SQL Server database hosted on Amazon RDS. We also use Amazon Bedrock to invoke the appropriate FM APIs and an Amazon SageMaker Jupyter notebook to orchestrate the overall process.

Visualize vector embeddings stored in Amazon Aurora PostgreSQL and explore semantic similarities

In this post, we show how you can visualize vector embeddings and explore semantic similarities. We use principal component analysis (PCA) for dimensionality reduction. PCA is a well-known technique that transforms high-dimensional data into a lower-dimensional space while preserving as much of the original variance as possible. By projecting data onto orthogonal axes called principal components, PCA lets you visualize the underlying structure of the data in a more manageable form.
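To make the PCA step concrete, here is a minimal sketch in Python, assuming the embeddings have already been loaded into a NumPy array with one embedding per row (for example, rows fetched from a vector column in Aurora PostgreSQL). The function name `pca_project` is hypothetical, and the post itself may rely on a library such as scikit-learn; this version computes PCA directly from the singular value decomposition (SVD) of the mean-centered data.

```python
from __future__ import annotations

import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Project each row of `embeddings` onto the top principal components.

    Returns the projected points and the fraction of total variance
    explained by each kept component (hypothetical helper for illustration).
    """
    X = np.asarray(embeddings, dtype=float)
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes: orthogonal unit vectors,
    # ordered by decreasing singular value (that is, decreasing variance).
    _, s, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of each point along the kept principal axes.
    projected = X_centered @ components.T
    # Variance captured by each axis, and its share of the total variance.
    variance = s**2 / (X.shape[0] - 1)
    explained_ratio = variance[:n_components] / variance.sum()
    return projected, explained_ratio


# Example: reduce 100 random 1,536-dimensional vectors to 2D for plotting.
rng = np.random.default_rng(42)
points_2d, ratio = pca_project(rng.normal(size=(100, 1536)), n_components=2)
print(points_2d.shape, ratio)
```

The two columns of `points_2d` can be passed straight to a scatter plot; `ratio` indicates how much of the original structure the 2D view retains, which is worth checking before reading too much into visual clusters.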

Vector search for Amazon DynamoDB with zero ETL for Amazon OpenSearch Service

As organizations increasingly rely on Amazon DynamoDB for their operational database needs, the demand for advanced data insights and enhanced search capabilities continues to grow. By using Amazon OpenSearch Service and Amazon Bedrock, you can now bring generative artificial intelligence (AI) capabilities to your DynamoDB data. In this post, we show how you […]

How Apollo Tyres built their tyre genealogy solution using Amazon Neptune and Amazon Bedrock

This is a joint post co-authored with Shailender Gupta, Global Head of Data Engineering, Reporting and Analytics at Apollo Tyres. Apollo Tyres, headquartered in Gurgaon, India, is a prominent global tyre manufacturer with production facilities in India and Europe. The company has a widespread presence, selling tyres to consumers and industrial customers across over 100 […]

Analyzing PL/SQL and T-SQL code using Amazon Bedrock

In this post, we use the Anthropic Claude 3 Sonnet large language model (LLM) on Amazon Bedrock to provide a detailed breakdown of complex PL/SQL and T-SQL code. This helps developers who are new to a code base, or working with unfamiliar code, understand its logic and flow more effectively.

Improve speed and reduce cost for generative AI workloads with a persistent semantic cache in Amazon MemoryDB

In this post, we present the concepts needed to use a persistent semantic cache in MemoryDB with Knowledge Bases for Amazon Bedrock, and the steps to create a chatbot application that uses the cache. We use MemoryDB as the caching layer for this use case because it delivers the fastest vector search performance at the highest recall rates among popular vector databases on AWS. We use Knowledge Bases for Amazon Bedrock as a vector database because it implements and maintains the RAG functionality for our application without the need to write additional code.
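The core idea of a semantic cache can be sketched in a few lines of Python: before querying the knowledge base, look for a previously answered question whose embedding is close enough to the new one, and reuse its answer. This is a minimal in-memory illustration, not the MemoryDB or Amazon Bedrock API. `SemanticCache`, `answer_with_cache`, the 0.9 default threshold, and the `generate` callback (standing in for a Knowledge Bases for Amazon Bedrock query) are all hypothetical, embeddings are assumed to be produced elsewhere by an embedding model, and the linear scan stands in for the vector index that MemoryDB would provide.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory semantic cache; a real deployment would use a vector index."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def lookup(self, query_embedding: Sequence[float]) -> Optional[str]:
        """Return the answer of the most similar cached question, if similar enough."""
        best_score, best_answer = -1.0, None
        for embedding, answer in self._entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query_embedding: Sequence[float], answer: str) -> None:
        self._entries.append((list(query_embedding), answer))


def answer_with_cache(
    cache: SemanticCache,
    query_embedding: Sequence[float],
    generate: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (answer, cache_hit); call `generate` only on a cache miss."""
    cached = cache.lookup(query_embedding)
    if cached is not None:
        return cached, True
    answer = generate()  # placeholder for the (slower, costlier) RAG call
    cache.store(query_embedding, answer)
    return answer, False
```

The threshold is the main design knob: a lower value returns cached answers more often, saving latency and cost, at the risk of serving an answer to a question that is only loosely related. The right value depends on the embedding model and the workload.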