AWS Machine Learning Blog

Category: Amazon Textract

Improved OCR and structured data extraction with Amazon Textract

Optical character recognition (OCR) technology, which enables extracting text from an image, has been around since the mid-20th century, and continues to be a research topic today. OCR and document understanding are still vibrant areas of research because they’re both valuable and hard problems to solve. AWS has been investing in improving OCR and document […]

How Kabbage improved the PPP lending experience with Amazon Textract

This is a guest post by Anthony Sabelli, Head of Data Science at Kabbage, a data and technology company providing small business cash flow solutions. One way in which we serve our customers is by providing them access to flexible lines of […]

Translating PDF documents using Amazon Translate and Amazon Textract

September 2024: This post was reviewed and updated for accuracy. In 1993, the Portable Document Format (PDF) was born and released to the world. Since then, companies across various industries have been creating, scanning, and storing large volumes of documents in this digital format. These documents and the content within them are vital […]

Using Amazon Textract with AWS PrivateLink

Amazon Textract now supports Amazon Virtual Private Cloud (Amazon VPC) endpoints via AWS PrivateLink so you can securely initiate API calls to Amazon Textract from within your VPC and avoid using the public internet. In this post, we show you how to access Amazon Textract APIs from within your VPC without traversing the public internet, […]

Amazon Textract now available in Asia Pacific (Mumbai) and EU (Frankfurt) Regions

You can now use Amazon Textract, a machine learning (ML) service that quickly and easily extracts text and data from forms and tables in scanned documents, for workloads in the AWS Asia Pacific (Mumbai) and EU (Frankfurt) Regions. Amazon Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms, […]

Processing PDF documents with a human loop using Amazon Textract and Amazon Augmented AI

Businesses across many industries, including financial, medical, legal, and real estate, process a large number of documents for different business operations. Healthcare and life science organizations, for example, need to access data within medical records and forms to fulfill medical claims and streamline administrative processes. Amazon Textract is a machine learning (ML) service that makes […]

Extracting custom entities from documents with Amazon Textract and Amazon Comprehend

July 2024: This post was reviewed and updated for accuracy. Amazon Textract is a machine learning (ML) service that makes it easy to extract text and data from scanned documents. Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms and information stored in tables. This allows you to […]

Deriving conversational insights from invoices with Amazon Textract, Amazon Comprehend, and Amazon Lex

Organizations across industries have a large number of physical documents such as invoices that they need to process. It is difficult to extract information from a scanned document when it contains tables, forms, paragraphs, and check boxes. Organizations have been addressing these problems with manual effort or custom code or by using Optical Character Recognition […]

Amazon Textract is now SOC and ISO compliant

You can now use Amazon Textract, a machine learning (ML) service that quickly and easily extracts text and data from forms and tables in scanned documents, for workloads that are subject to Service Organization Control (SOC) compliance and International Organization for Standardization (ISO) compliance. This launch builds upon the existing portfolio of AWS ML services […]

Analyzing and tagging assets stored in Veeva Vault PromoMats using Amazon AI services

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. Veeva Systems is a provider of cloud-based software for the global life sciences industry, which offers products that serve multiple domains, including clinical, regulatory, quality, and more. Veeva’s Vault Platform manages both content and data in a single platform […]