AWS Machine Learning Blog
Introducing Amazon Kendra tabular search for HTML Documents
Amazon Kendra is an intelligent search service powered by machine learning (ML). Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization.
Amazon Kendra users can now quickly find the information they need from tables on a webpage (HTML tables) using Amazon Kendra tabular search. Tables contain useful information in a structured format that is easy to interpret by making visual associations between row and column headers. With Amazon Kendra tabular search, you can now get specific information from a cell or from the rows and columns relevant to your query, as well as a preview of the table.
In this post, we provide an example of how to use Amazon Kendra tabular search.
Tabular search in Amazon Kendra
Let’s say you have a webpage in HTML format that contains a table with inflation rates and annual changes in the US from 2012–2021, as shown in the following screenshot.
When you search for “Inflation rate in US”, Amazon Kendra presents the top three rows in the preview and up to five columns, as shown in the following screenshot. You can then see if this article has the relevant details that you’re looking for and decide to either use this information or open the link to get additional details. Amazon Kendra tabular search can also handle merged rows.
Let’s do another search and get specific information from the table by asking “What was the annual change of inflation rate in 2017?”. As shown in the following screenshot, Amazon Kendra tabular search highlights the specific cell that contains the answer to your question.
Now let’s search for “Which year had top inflation rate?” Amazon Kendra searches the table, sorts the results, and returns the year that had the highest inflation rate.
Amazon Kendra can also find a range of values within a column. For example, let’s search for “Inflation rate from 2012 and 2014.” Amazon Kendra displays the rows and columns for 2012–2014 in the preview.
Get started with Amazon Kendra tabular search
Amazon Kendra tabular search is turned on by default, and no special configuration is required to enable it. For new documents, tabular search works automatically. For existing HTML pages that contain tables, you can either update the documents and sync (if you only have a few documents) or reach out to AWS Support.
To test tabular search on your internal or external webpage, complete the following steps:
- Create an index.
- Add a data source by using the web crawler, or by downloading the HTML page and uploading it to an Amazon Simple Storage Service (Amazon S3) bucket.
- Go to the Search Indexed Content tab and test it out.
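The last step can also be done programmatically rather than in the console. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and its Kendra `query` call; the helper names `search_index` and `run_example`, the index ID placeholder, and the Region are illustrative assumptions and not part of the original walkthrough.

```python
def search_index(kendra_client, index_id, query_text):
    """Run one query and return a simplified list of result dictionaries."""
    response = kendra_client.query(IndexId=index_id, QueryText=query_text)
    results = []
    for item in response.get("ResultItems", []):
        results.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
            # Confidence bucket reported by Amazon Kendra for this result.
            "confidence": item.get("ScoreAttributes", {}).get(
                "ScoreConfidence", "NOT_AVAILABLE"
            ),
        })
    return results


def run_example():
    """Not called automatically: needs boto3, AWS credentials, and a real index."""
    import boto3

    client = boto3.client("kendra", region_name="us-east-1")  # assumed Region
    index_id = "YOUR-INDEX-ID"  # hypothetical placeholder
    question = "What was the annual change of inflation rate in 2017?"
    for result in search_index(client, index_id, question):
        print(result["confidence"], result["title"], result["excerpt"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before pointing it at a real index.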
Limitations and considerations
Keep the following in mind when using this feature:
- In this release, Amazon Kendra supports only HTML tables, that is, tables defined within the table tag. Nested tables and other forms of tables are not supported.
- Amazon Kendra can search through tables with up to 30 columns, 60 rows, and 500 total table cells. If a table has a higher number of rows, columns, or table cells, Amazon Kendra will not search within that table.
- Amazon Kendra doesn’t display tabular search results if the confidence score of the query result for the column and row is very low. You can look at the confidence score in the ScoreAttributes field of each QueryResultItem returned by the Query API.
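To check ahead of time whether a page's tables fall within the size limits listed above, something like the following sketch may help. It uses only the Python standard library. The names `TableStats` and `check_tables` are illustrative, and the counting rules are assumptions: each td or th element counts as one cell, colspan and rowspan are ignored, and any table containing another table is flagged as nested. Amazon Kendra may count differently.

```python
from html.parser import HTMLParser

# Limits stated in this post.
MAX_COLUMNS = 30
MAX_ROWS = 60
MAX_CELLS = 500


class TableStats(HTMLParser):
    """Counts rows, columns, and cells for each top-level <table>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current <table> nesting depth
        self.row_cells = 0  # cells seen so far in the current row
        self.tables = []    # one stats dict per top-level table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 1:
                self.tables.append(
                    {"rows": 0, "columns": 0, "cells": 0, "nested": False}
                )
            else:
                # A table inside a table: mark the enclosing top-level table.
                self.tables[-1]["nested"] = True
        elif self.depth == 1 and tag == "tr":
            self.tables[-1]["rows"] += 1
            self.row_cells = 0
        elif self.depth == 1 and tag in ("td", "th"):
            stats = self.tables[-1]
            self.row_cells += 1
            stats["cells"] += 1
            stats["columns"] = max(stats["columns"], self.row_cells)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def check_tables(html):
    """Return stats for each top-level table, with a within_limits flag."""
    parser = TableStats()
    parser.feed(html)
    parser.close()
    for stats in parser.tables:
        stats["within_limits"] = (
            not stats["nested"]
            and stats["columns"] <= MAX_COLUMNS
            and stats["rows"] <= MAX_ROWS
            and stats["cells"] <= MAX_CELLS
        )
    return parser.tables
```

Calling `check_tables(page_html)` on a downloaded page returns one dictionary per top-level table. Note that a table can stay under the row and column limits and still exceed the cell limit: 17 rows of 30 columns is 510 cells.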
Conclusion
With Amazon Kendra tabular search for HTML, you can now search across both unstructured data from various data sources and structured data in the form of tables. This further enhances the user experience: you can get factual responses to your natural language queries from tables as well as from text. The table preview with Amazon Kendra’s suggested answers allows you to quickly assess whether the HTML document table contains the information you are looking for, thereby saving time.
Amazon Kendra tabular search is available in the following AWS Regions at launch: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Asia Pacific (Sydney), Asia Pacific (Singapore), Canada (Central), and AWS GovCloud (US-West).
To learn more about Amazon Kendra, visit the Amazon Kendra product page.
About the authors
Vikas Shah is an Enterprise Solutions Architect at Amazon Web Services. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His areas of interest are ML, IoT, robotics, and storage. In his spare time, Vikas enjoys building robots, hiking, and traveling.