AWS Machine Learning Blog

Integrate Amazon Bedrock Knowledge Bases with Microsoft SharePoint as a data source

Amazon Bedrock Knowledge Bases provides a fully managed solution for supplying foundation models (FMs) and agents in Amazon Bedrock contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG) workflows. Adding RAG to generative AI applications yields more relevant, accurate, and customized responses.

Amazon Bedrock Knowledge Bases supports adding one or more data sources, and the supported data source types are continuously expanding. This post walks through how to configure the Microsoft SharePoint data source connector with your Microsoft SharePoint Site. Microsoft SharePoint is an integrated content management and collaboration tool that many organizations use for storing, organizing, and sharing their internal data. See Create a data source connector for your knowledge base for the full list of supported data source connectors.

Solution overview

Connecting SharePoint as a data source with your Amazon Bedrock knowledge base offers the following benefits:

  • AI inference will have context from information stored in SharePoint. Amazon Bedrock Knowledge Bases accomplishes this task by:
    • Breaking down multiple documents into smaller chunks in a process called chunking.
    • Transforming each chunk of document content into a numerical representation (a vector) that captures its semantic meaning, in a process called embedding.
    • Indexing the embeddings of the document chunks for future search.
    • Comparing and searching against the index of embeddings based on the semantic meaning (embedding) of the user’s prompt to find relevant document chunks.
    • Retrieving the associated source document chunks from the SharePoint Site associated with the matching embeddings from the search.
    • Providing the source document text as context with the AI model invocation.
  • You can extract structured data, metadata, and other information from documents stored in SharePoint to provide relevant search results based on the user query.
  • You can incrementally sync or index your data sources with your knowledge base as content in your SharePoint Site changes.
  • You can attribute the source data and content for Amazon Bedrock RAG responses generated.

In the following sections, we walk through the steps to create a knowledge base, configure your data source, and test the solution.

Prerequisites

The following sections describe the prerequisites necessary to implement Amazon Bedrock Knowledge Bases with SharePoint as a connector:

Register a new application in the Microsoft Azure Portal

In this section, you register a new application in the Microsoft Azure Portal for the forthcoming knowledge base. You use the tenant ID from this step when configuring the data source for the knowledge base. Complete the following steps:

  1. Open the Azure Portal and log in with your Microsoft account.
  2. From the Microsoft Entra admin center, expand Applications in the navigation pane and choose App registrations.
  3. Choose New registration.

  4. Provide the following information:
    • For Name, provide the name for your application. Let’s refer to this application as KnowledgeBaseDataSourceApp. Amazon Bedrock Knowledge Bases uses KnowledgeBaseDataSourceApp to connect to the SharePoint Site to crawl and index the data.
    • For Who can use this application or access this API, choose Accounts in this organizational directory only (<Tenant name> only – Single tenant).
    • Choose Register.
    • Note the values for Application (client) ID and Directory (tenant) ID on the Overview page. You need them later when asked for clientId in Secrets Manager and TenantId in the knowledge base data source configuration.
  5. Choose API permissions in the navigation pane.
  6. Choose Add a permission and provide the following information:
    • Choose Microsoft Graph.
    • Choose Delegated permissions.
    • Search for Sites and select Sites.Read.All.
    • Choose Add permissions.

This permission allows the app to read documents and list items in all SharePoint site collections on behalf of the signed-in user.

  7. Remove the original User.Read – Delegated permission by selecting the options menu (three dots) and choosing Remove permission.
  8. Choose Grant admin consent for the default directory.
  9. Choose Certificates & secrets in the navigation pane.
  10. Choose New client secret.
  11. Provide the following information:
    • For Description, enter a description, such as Client Secret for Amazon Bedrock knowledge base authentication.
    • Choose a value for Expires. In production, you’ll need to manually rotate your secret before it expires.
    • Choose Add.
  12. Note down the value for your new client secret. You’ll need it later when saving your credentials in Secrets Manager (under the clientSecret key).
  13. Optionally, choose Owners to add any additional owners for the application.

Owners will be able to manage permissions of the Azure AD app (KnowledgeBaseDataSourceApp).

To sync your Amazon Bedrock knowledge base with your SharePoint Site, you set up authentication credentials in Secrets Manager. For Amazon Bedrock to use these credentials to sync with the SharePoint Site, you need to disable security defaults in Azure because Amazon Bedrock can’t perform multifactor authentication (MFA).

  1. Set up your authentication credentials:
    1. Navigate to Identity, Overview in Microsoft Entra.
    2. On the Properties tab, choose Manage security defaults.
    3. Choose Disabled for Security defaults.
    4. Select Other for Reason for disabling and provide a reason, such as Disabling security defaults to allow authentication for Amazon Bedrock Knowledge Base to sync with a SharePoint data source.
    5. Choose Save.

To enable seamless OAuth 2.0 integration between Amazon Bedrock Knowledge Bases and SharePoint Online, we recommend turning off Entra ID security defaults. These defaults enforce MFA, which causes connectivity issues when crawling SharePoint Online.

Security defaults are enabled by default; they enforce MFA, block legacy authentication protocols, and protect privileged activities such as access to the Azure Portal. Disabling these settings allows the OAuth 2.0 connection from AWS. We recommend that you review the deployment considerations and either enforce MFA at the per-user level or use Entra Conditional Access.

Create a Secrets Manager secret for credentials to the SharePoint data source

Complete the following steps to create a Secrets Manager secret to connect to the SharePoint online sites listed as site URLs within the data source:

  1. On the Secrets Manager console, choose Store a new secret.
  2. For Secret type, select Other type of secret.
  3. For Key/value pairs, enter the following keys:
    1. username – The user name for your account or a Service Account that is registered in Microsoft Entra. This will have the format <userId>@<domain>.onmicrosoft.com.
    2. password – The password for the account associated with the preceding user name.
    3. clientId – The application (client) ID you saved earlier in the Azure Portal.
    4. clientSecret – The client secret you saved earlier in the Azure Portal.
  4. For Encryption key, choose aws/secretsmanager.
  5. Choose Next.
  6. In the Secret name and description section, enter the name of the secret and an optional description. The secret name must start with AmazonBedrock- so that the knowledge base is allowed to read this secret.
  7. Add any associated tags in the Tags section.
  8. Leave Resource permissions and Replicate secret as default.
  9. Choose Next.
  10. In the Configure rotation section, leave as default or modify according to your organizational policies.
  11. Choose Next.
  12. Review the options you selected and choose Store.
  13. On the secrets detail page, save your secret Amazon Resource Name (ARN) value for use when configuring the knowledge base authentication.
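The console steps above can also be done programmatically. The sketch below builds the secret payload with the four key names the SharePoint connector expects and shows a hedged boto3 call; the secret name and all credential values are placeholders, and the AWS call only runs when you invoke the function with valid AWS credentials.

```python
# Sketch of storing the SharePoint credentials with the AWS SDK instead of
# the console. All values are placeholders; replace them with your own.
import json

secret_value = {
    "username": "<userId>@<domain>.onmicrosoft.com",  # service account user name
    "password": "<service-account-password>",
    "clientId": "<application-client-id>",        # from the app registration Overview
    "clientSecret": "<client-secret-value>",      # from Certificates & secrets
}
secret_string = json.dumps(secret_value)

def store_secret(secret_string: str) -> str:
    """Create the secret in Secrets Manager; requires AWS credentials to run."""
    import boto3  # imported here so the sketch runs without an AWS session
    client = boto3.client("secretsmanager")
    resp = client.create_secret(
        # Hypothetical name; it must start with AmazonBedrock- so the
        # knowledge base can read it.
        Name="AmazonBedrock-sharepoint-datasource",
        SecretString=secret_string,
    )
    return resp["ARN"]  # save this ARN for the knowledge base configuration

print(sorted(secret_value))  # the four keys the connector reads
```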

Create a knowledge base and connect to the data source

Complete the following steps to set up a knowledge base on Amazon Bedrock and connect to a SharePoint data source:

  1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  2. Choose Create knowledge base.
  3. In the Knowledge base details section, optionally change the default name and enter a description for your knowledge base.
  4. In the IAM permissions section, choose an IAM role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.
  5. In the Choose data source section, select SharePoint.
  6. Optionally, add tags to your knowledge base. For more information, see Tag resources.
  7. Choose Next.
  8. In the Name and Description section, optionally change the default data source name and enter a description of the data source.
  9. In the Source section, provide the following information:
    1. For Site URLs, enter the site URL to use for crawling and indexing the content for RAG. Your URL should be formatted like https://<domain>.sharepoint.com/sites/<site_name>.
    2. For Domain, enter the domain name associated with the data source. For example, if the site URL is https://company_name.sharepoint.com/sites/companysite, the domain value would be company_name.
    3. Expand Advanced settings and note the default selections.

When converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages by default. To use your own AWS Key Management Service (AWS KMS) key, you can choose Customize encryption settings (Advanced) and create or select a KMS key. For more information, see Encryption of transient data storage during data ingestion.

Delete is selected as the default data deletion policy. The following are descriptions of the data deletion policy options for your data source:

  • Delete – Deletes all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. The vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted.
  • Retain – Retains all underlying data in your vector store upon deletion of a knowledge base or data source resource.

For more information on managing your knowledge base, see Connect to your data repository for your knowledge base.

  10. In the Authentication section, provide the following information:
    1. For Tenant ID, enter your directory (tenant) ID you saved earlier.
    2. For AWS Secrets Manager secret, enter the secret ARN you saved earlier.

The SharePoint data source needs these credentials to connect to the SharePoint Online Site using the Microsoft Graph API, which is why you created the Secrets Manager secret earlier. These credentials will not appear in any access logs for the SharePoint Online Site.

  11. In the Content chunking and parsing section, leave the Default strategy selected. The chunking strategy determines how larger documents are broken into smaller chunks before they are embedded for the search step of a RAG workflow.
  12. In the Metadata and Filtering section, optionally select any content types that you want to include or exclude.
  13. Choose Next.
  14. For Embeddings model, choose Titan Text Embeddings v2 or another embeddings model as desired.
  15. For Vector database, select Quick create a new vector store to create a vector store for the embeddings.
  16. Choose Next.
  17. On the Review and create page, verify the selections you made and choose Create.

The knowledge base creation should be complete with SharePoint connected as a data source. Finally, we need to sync the data source to crawl the SharePoint Site and index the associated content.

  18. To initiate this process, select your data source and choose Sync on the knowledge base details page.
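For automation, the data source configuration and sync can also be driven through the boto3 bedrock-agent API. Treat the following as a sketch: the field names reflect our reading of the CreateDataSource and StartIngestionJob APIs, and the tenant ID, secret ARN, URLs, knowledge base ID, and data source name are all placeholders.

```python
# Hedged sketch of creating the SharePoint data source and starting a sync
# (ingestion) job with boto3. Replace the placeholder values before use.
sharepoint_data_source = {
    "type": "SHAREPOINT",
    "sharePointConfiguration": {
        "sourceConfiguration": {
            "hostType": "ONLINE",
            "authType": "OAUTH2_CLIENT_CREDENTIALS",       # assumed auth type
            "tenantId": "<directory-tenant-id>",           # from the app registration
            "domain": "company_name",                      # <domain>.sharepoint.com
            "siteUrls": ["https://company_name.sharepoint.com/sites/companysite"],
            "credentialsSecretArn": "<secrets-manager-secret-arn>",
        }
    },
}

def create_and_sync(knowledge_base_id: str) -> str:
    """Create the data source, then start an ingestion (sync) job.

    Requires AWS credentials and an existing knowledge base to run.
    """
    import boto3  # imported here so the sketch runs without an AWS session
    client = boto3.client("bedrock-agent")
    ds = client.create_data_source(
        knowledgeBaseId=knowledge_base_id,
        name="sharepoint-datasource",        # hypothetical data source name
        dataSourceConfiguration=sharepoint_data_source,
        dataDeletionPolicy="DELETE",         # or "RETAIN", as described above
    )
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=ds["dataSource"]["dataSourceId"],
    )
    return job["ingestionJob"]["status"]
```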

Test the solution

Complete the following steps to test the knowledge base you created:

  1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  2. Select the synced knowledge base you created and choose Test knowledge base.
  3. Choose a model of your choice for testing and choose Apply.
  4. Enter a question that would require context from content stored in your SharePoint Site.
  5. Choose Show source details in the response to see more information about the data used in the RAG workflow.
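The same test can be run programmatically through the bedrock-agent-runtime RetrieveAndGenerate API. This is a sketch under the assumption that the request shape below matches the API; the knowledge base ID is a placeholder, and the model ARN is one example of a foundation model ARN format.

```python
# Hedged sketch of querying the synced knowledge base via the
# RetrieveAndGenerate API. Replace the placeholders before use.
request = {
    "input": {"text": "What is our expense reporting policy?"},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<knowledge-base-id>",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
}

def ask(req: dict) -> tuple[str, list]:
    """Invoke the API; requires AWS credentials and a synced knowledge base."""
    import boto3  # imported here so the sketch runs without an AWS session
    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve_and_generate(**req)
    # The citations list carries the source chunks used in the RAG workflow,
    # mirroring what Show source details displays in the console.
    return resp["output"]["text"], resp.get("citations", [])
```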

Clean up

If you created a new knowledge base to experiment using this post and don’t plan to use it further, delete the knowledge base so that your AWS account doesn’t accumulate costs. For instructions, see Delete an Amazon Bedrock knowledge base.

Conclusion

In this post, you walked through configuring an Amazon Bedrock knowledge base connected to SharePoint Online as a data source. By connecting SharePoint Online as a data source, you can interact with your organization’s knowledge and data stored in SharePoint using natural language, making it straightforward to find relevant information, extract key points, and derive valuable insights. This can significantly improve productivity, decision-making, and knowledge sharing within your organization.

Try this feature on the Amazon Bedrock console today! See Retrieve data and generate AI responses with knowledge bases to learn more.


About the Authors

Surendar Gajavelli is a Sr. Solutions Architect based out of Nashville, Tennessee. He is a passionate technology enthusiast who enjoys working with customers and helping them build innovative solutions.

Abhi Patlolla is a Sr. Solutions Architect based out of the New York City region, helping customers in their cloud transformation, AI/ML, and data initiatives. He is a strategic and technical leader, advising executives and engineers on cloud strategies to foster innovation and positive impact.

Todd Moore is a Sr. Solutions Architect based in Denver, Colorado. With a strong background in creating cloud solutions, Todd collaborates with clients to design scalable architectures that drive business success. He’s passionate about cloud technologies, AI, and building efficient systems that meet diverse needs. Outside of work, Todd enjoys gaming, exploring the Colorado outdoors, traveling to new places, and spending time with his family.

Brian Smitches is a Solutions Architect based in Austin, Texas. With a focus on designing and implementing innovative cloud solutions, Brian collaborates with clients to create scalable architectures that drive business success. Passionate about cloud technologies, AI, and system optimization, he brings a strategic approach to every project. Outside of work, Brian enjoys exploring the vibrant Austin food scene, traveling to new destinations, and spending time outdoors.

Andrew Chen is a Solutions Architect based in Denver, Colorado. Specializing in designing cloud solutions, Andrew partners with clients to develop scalable architectures that enhance business performance. He is passionate about cloud innovation, AI integration, and creating efficient systems tailored to meet unique client needs. Outside of work, Andrew enjoys exploring the Colorado outdoors, hitting the slopes for skiing, and discovering new hiking trails.