AWS Machine Learning Blog
Index your Dropbox content using the Dropbox connector for Amazon Kendra
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.
One such data repository is Dropbox. Enterprise users use Dropbox to upload, transfer, and store documents in the cloud. Along with the ability to store documents, Dropbox offers Dropbox Paper, a coediting tool that lets users collaborate and create content in one place. Dropbox Paper can optionally use templates to add structure to documents. In addition to files and Paper documents, Dropbox also allows you to store shortcuts to webpages in your folders.
We’re excited to announce that you can now use the Amazon Kendra connector for Dropbox to search information stored in your Dropbox account. In this post, we show how to index information stored in Dropbox and use the Amazon Kendra intelligent search function. Amazon Kendra’s ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search is not very effective.
Solution overview
With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a Dropbox repository or folder using the Amazon Kendra connector for Dropbox. The solution consists of the following steps:
- Configure an app on Dropbox and get the connection details.
- Store the details in AWS Secrets Manager.
- Create a Dropbox data source via the Amazon Kendra console.
- Index the data in the Dropbox repository.
- Run a sample query to get the information.
Prerequisites
To try out the Amazon Kendra connector for Dropbox, you need the following:
- A Dropbox Enterprise (not personal) account.
- An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
- Basic knowledge of AWS.
Configure a Dropbox app and gather connection details
Before we set up the Dropbox data source, we need a few details about your Dropbox repository. Let’s gather those in advance.
- Go to www.dropbox.com/developers.
- Choose App console.
- Sign in with your credentials (make sure you’re signing in to an Enterprise account).
- Choose Create app.
- Select Scoped access.
- Select Full Dropbox (or the name of the specific folder you want to index).
- Enter a name for your app.
- Choose Create app.
You can see the configuration screen with a set of tabs.
- To set up permissions, choose the Permissions tab.
- Select a minimal set of permissions, as shown in the following screenshots.
- Choose Submit.
A message appears saying that the permission change was successful.
- On the Settings tab, copy the app key.
- Choose Show next to App secret and copy the secret.
- Under Generated access token, choose Generate and copy the token.
Store these values in a safe place; we need to refer to them later.
The generated access token is valid for up to 4 hours. You have to generate a new access token each time you index the content.
Store Dropbox credentials in Secrets Manager
To store your Dropbox credentials in Secrets Manager, complete the following steps:
- On the Secrets Manager console, choose Store a new secret.
- Choose Other type of secret.
- Create three key-value pairs for appKey, appSecret, and refreshToken and enter the values saved from Dropbox.
- Choose Save.
- For Secret name, enter a name (for example, AmazonKendra-dropbox-secret).
- Enter an optional description.
- Choose Next.
- In the Configure rotation section, keep all settings at their defaults and choose Next.
- On the Review page, choose Store.
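The console steps above can also be sketched in code. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured; the function names `build_dropbox_secret` and `store_dropbox_secret` are hypothetical helpers introduced for illustration, while the key names appKey, appSecret, and refreshToken come from the steps above.

```python
import json


def build_dropbox_secret(app_key: str, app_secret: str, refresh_token: str) -> str:
    """Return a JSON string with the three key-value pairs named in this post."""
    pairs = {
        "appKey": app_key,
        "appSecret": app_secret,
        "refreshToken": refresh_token,
    }
    # Reject empty values early so an incomplete secret is never stored.
    for key, value in pairs.items():
        if not value:
            raise ValueError(f"{key} must not be empty")
    return json.dumps(pairs)


def store_dropbox_secret(secret_name: str, secret_string: str, client=None) -> str:
    """Create the secret and return its ARN.

    Calls AWS Secrets Manager unless a client object is passed in
    (passing one is an assumption made here to keep the sketch testable).
    """
    if client is None:
        import boto3  # imported lazily so the helper above works without boto3

        client = boto3.client("secretsmanager")
    response = client.create_secret(Name=secret_name, SecretString=secret_string)
    return response["ARN"]
```

For example, `store_dropbox_secret("AmazonKendra-dropbox-secret", build_dropbox_secret(key, secret, token))` would create the secret used later in the data source configuration. Avoid hardcoding the credential values in source files.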
Configure the Amazon Kendra connector for Dropbox
To configure the Amazon Kendra connector, complete the following steps:
- On the Amazon Kendra console, choose Create an Index.
- For Index name, enter a name for the index (for example, my-dropbox-index).
- Enter an optional description.
- For Role name, enter an IAM role name.
- Configure optional encryption settings and tags.
- Choose Next.
- In the Configure user access control section, leave the settings at their defaults and choose Next.
- For Provisioning editions, select Developer edition.
- Choose Create.
This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.
- Choose Data sources in the navigation pane.
- Under Dropbox, choose Add connector.
- For Data source name, enter a name (for example, my-dropbox-connector).
- Enter an optional description.
- Choose Next.
- For Type of authentication token, select Access Token (temporary use).
- For AWS Secrets Manager secret, choose the secret you created earlier.
- For IAM role, choose Create a new role.
- For Role name, enter a name (for example, AmazonKendra-dropbox-role).
- Choose Next.
- For Select entities or content types, choose your content types.
- For Frequency, choose Run on demand.
- Choose Next.
- Set any optional field mappings and choose Next.
- Choose Review and Create and choose Add data source.
- Choose Sync now.
- Wait for the sync to complete.
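The last two steps (Sync now, then waiting for completion) can be scripted as well. The following is a minimal sketch, assuming a boto3 Amazon Kendra client and the index and data source IDs shown on the console; `wait_for_sync`, the polling interval, and the poll limit are hypothetical choices for illustration, not part of the connector.

```python
import time

# Statuses after which a sync job is no longer running.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"}


def wait_for_sync(client, index_id, data_source_id, poll_seconds=30.0, max_polls=120):
    """Start a sync job and poll its history until it reaches a terminal status.

    `client` is expected to behave like boto3.client("kendra").
    Returns the final status string, or raises TimeoutError.
    """
    started = client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    execution_id = started["ExecutionId"]

    for _ in range(max_polls):
        listing = client.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )
        # Look only at the job we started, ignoring earlier runs in the history.
        for job in listing.get("History", []):
            if job.get("ExecutionId") != execution_id:
                continue
            if job.get("Status") in TERMINAL_STATUSES:
                return job["Status"]
        time.sleep(poll_seconds)

    raise TimeoutError(f"sync job {execution_id} did not finish in time")
```

A call such as `wait_for_sync(boto3.client("kendra"), index_id, data_source_id)` returns "SUCCEEDED" when the crawl finishes cleanly. Any other returned status is worth checking in the sync run history on the console.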
Test the solution
Now that you have ingested the content from your Dropbox account into your Amazon Kendra index, you can test some queries.
Go to your index and choose Search indexed content. Enter a sample search query and test out your search results (your query will vary based on the contents of your account).
The Dropbox connector also crawls local identity information from Dropbox. For users, it sets the user email ID as the principal. For groups, it sets the group ID as the principal. To filter search results by users or groups, go to the search console.
Choose Test query with user name or groups to expand it, and then choose Apply user name or groups.
Enter the user and/or group names and choose Apply. Next, enter the search query and press Enter. This returns a set of results filtered by your criteria.
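The same filtered search can be run through the Amazon Kendra Query API. The following is a minimal sketch, assuming boto3 and an existing index ID; `build_query_request` and `summarize_results` are hypothetical helpers, and the user email and group ID you pass should match the principals the connector crawled.

```python
def build_query_request(index_id, query_text, user_id=None, groups=None):
    """Build keyword arguments for the Kendra Query API.

    The user and group filters are added only when provided, which mirrors
    leaving the "Test query with user name or groups" panel empty.
    """
    request = {"IndexId": index_id, "QueryText": query_text}
    user_context = {}
    if user_id:
        user_context["UserId"] = user_id  # the user's email ID
    if groups:
        user_context["Groups"] = list(groups)  # Dropbox group IDs
    if user_context:
        request["UserContext"] = user_context
    return request


def summarize_results(response):
    """Return (type, title, uri) tuples from a Query API response."""
    rows = []
    for item in response.get("ResultItems", []):
        title = item.get("DocumentTitle", {}).get("Text", "")
        rows.append((item.get("Type", ""), title, item.get("DocumentURI", "")))
    return rows
```

With a client created by `boto3.client("kendra")`, the call would look like `summarize_results(client.query(**build_query_request(index_id, "your question", user_id="user@example.com")))`, where the email address is a placeholder.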
Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Dropbox account.
Generate permanent tokens for offline access
The instructions in this post walk you through creating, configuring, and using a temporary access token. Apps can also get long-term access by requesting offline access, in which case the app receives a refresh token that can be used to retrieve new short-lived access tokens as needed, without further manual user intervention. You can find more information in the Dropbox OAuth Guide and Dropbox authorization documentation. Use the following steps to create a permanent refresh token (for example, to set the sync to trigger on a schedule):
- Get the app key and app secret as before.
- In a new browser, navigate to https://www.dropbox.com/oauth2/authorize?token_access_type=offline&response_type=code&client_id=<appkey>.
- Accept the defaults and choose Submit.
- Choose Continue.
- Choose Allow.
An access code is generated for you.
- Copy the access code.
Now you get the refresh token from the access code.
- In a terminal window, run the following curl command:
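The exchange sends the access code, together with the app key and app secret, to the Dropbox OAuth 2 token endpoint and reads the refresh token from the JSON response. As a hedged illustration of that exchange, here is a minimal Python sketch using only the standard library. It assumes the Dropbox token endpoint at https://api.dropboxapi.com/oauth2/token and a response field named refresh_token, as described in the Dropbox OAuth Guide; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

AUTHORIZE_URL = "https://www.dropbox.com/oauth2/authorize"
TOKEN_URL = "https://api.dropboxapi.com/oauth2/token"  # assumed per Dropbox OAuth Guide


def build_authorize_url(app_key):
    """Build the offline-access authorization URL used in the browser step."""
    query = urllib.parse.urlencode(
        {
            "token_access_type": "offline",
            "response_type": "code",
            "client_id": app_key,
        }
    )
    return f"{AUTHORIZE_URL}?{query}"


def build_token_request(access_code, app_key, app_secret):
    """Build (but do not send) the POST request that trades the code for tokens."""
    body = urllib.parse.urlencode(
        {
            "code": access_code,
            "grant_type": "authorization_code",
            "client_id": app_key,
            "client_secret": app_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def fetch_refresh_token(access_code, app_key, app_secret):
    """Send the request to Dropbox and return the refresh token from the response."""
    request = build_token_request(access_code, app_key, app_secret)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["refresh_token"]
```

The access code is single-use, so call `fetch_refresh_token` once and keep the returned value; requesting a new code means repeating the browser steps.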
You can store this refresh token along with the app key and app secret to configure a permanent token in the data source configuration for Amazon Kendra. Amazon Kendra generates the access token and uses it as needed for access.
Limitations
This solution has the following limitations:
- File comments are not imported into the index.
- You don’t have the option to add custom metadata for Dropbox.
- Google Docs, Sheets, and Slides require a Google Workspace or Google account and are not included.
Conclusion
With the Dropbox connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.
In this post, we introduced you to the basics, but there are many additional features that we didn’t cover. For example:
- You can enable user-based access control for your Amazon Kendra index and restrict access to users and groups that you configure
- You can specify allowedUsersColumn and allowedGroupsColumn so you can apply access controls based on users and groups, respectively
- You can map additional fields to Amazon Kendra index attributes and enable them for faceting, search, and display in the search results
- You can integrate the Dropbox data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide.
About the author
Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.