How to Apply AI to Extract Data from Documents

Documents are part of every organization's processes. Some rely on paper-driven workflows that result in typewritten or handwritten documents, and others need to process lengthy documents as part of their everyday work. AI can help streamline these workflows and make it easier to use the data in other processes.

Here are three common use cases, each with an example:

  • Read an entire document: all of the text is read, which is helpful for contracts, legal filings, academic papers, etc.

  • Extract form fields: specific pieces of a document are read and each is assigned a name (e.g. first name, phone number, city) so it can be used downstream to automate data entry

  • Extract tabular data: useful for reading repeated sets of data, like invoice line items, expenses claimed on tax returns, financial holdings, etc.

How does this work?
Essentially, a scanned document is a grid of pixels. A computer can detect patterns among pixels that are near each other, then compare those to a known set of patterns for typewritten and handwritten letters and numbers.
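To make the pattern-comparison idea concrete, here is a toy sketch in Python. It is not how any production tool works (real engines use trained models on far larger images); the 3x5 bitmaps, the three-character "alphabet", and the function names are all made up for illustration. It simply picks the known pattern that differs from the input in the fewest pixels.

```python
# Toy illustration only: real OCR engines use trained models, not a
# three-entry lookup table. All glyphs and names here are hypothetical.

# Known patterns: 3x5 bitmaps where "1" is a dark pixel and "0" is a light one.
KNOWN_PATTERNS = {
    "1": ["010",
          "110",
          "010",
          "010",
          "111"],
    "7": ["111",
          "001",
          "010",
          "010",
          "010"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
}


def count_mismatches(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the known character whose pattern differs in the fewest pixels."""
    return min(
        KNOWN_PATTERNS,
        key=lambda ch: count_mismatches(glyph, KNOWN_PATTERNS[ch]),
    )


# A slightly smudged "7": one extra dark pixel in the fourth row.
smudged_seven = ["111",
                 "001",
                 "010",
                 "011",
                 "010"]

print(recognize(smudged_seven))  # prints 7
```

The smudged input is still read as a 7 because it is only one pixel away from that pattern and much further from the others, which is the same intuition behind tolerating imperfect scans and varied handwriting.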

For extracting data from a form (in both fields and tables), part of the process involves overlaying rectangles on the document to tell the AI where to read.
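The rectangle overlay can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: it assumes the text has already been read into words with pixel coordinates, and the `Box`, `Word`, `TEMPLATE`, and `extract_fields` names, along with all coordinates, are hypothetical. Each named rectangle collects the words whose centers fall inside it.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rectangle in page pixel coordinates (origin at the top-left)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Word:
    """One recognized word and where it sits on the page."""
    text: str
    box: Box

    @property
    def center(self):
        return (
            (self.box.left + self.box.right) / 2,
            (self.box.top + self.box.bottom) / 2,
        )


# Hypothetical template: one named rectangle per form field.
TEMPLATE = {
    "first_name": Box(100, 50, 300, 80),
    "phone_number": Box(100, 100, 300, 130),
    "city": Box(100, 150, 300, 180),
}


def extract_fields(words, template):
    """Assign each word to the field whose rectangle contains its center."""
    fields = {}
    for name, zone in template.items():
        inside = [w for w in words if zone.contains(*w.center)]
        # Simplified reading order: top-to-bottom, then left-to-right.
        inside.sort(key=lambda w: (w.box.top, w.box.left))
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Made-up words, as a text-reading step might report them.
words = [
    Word("First", Box(10, 55, 50, 75)),    # printed label, outside every zone
    Word("name", Box(55, 55, 95, 75)),     # printed label, outside every zone
    Word("Ada", Box(110, 55, 150, 75)),
    Word("555-0100", Box(110, 105, 200, 125)),
    Word("New", Box(110, 155, 145, 175)),
    Word("York", Box(150, 155, 195, 175)),
]

print(extract_fields(words, TEMPLATE))
# prints {'first_name': 'Ada', 'phone_number': '555-0100', 'city': 'New York'}
```

The printed labels on the form are ignored because they sit outside every rectangle, which is why the placement of the rectangles matters so much for accuracy.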

Where do you go to get started?
A few SaaS offerings include Docparser and Docsumo, both of which connect to Zapier or Make for no-code / low-code integrations.

For more control and extensibility, all of the major PaaS vendors have offerings as well: Amazon (Textract), Microsoft Azure (AI Document Intelligence), and Google Cloud (Document AI).

All of these include prebuilt templates for reading common forms like a passport or W-2, and allow you to upload your own documents to create custom templates.

TLDR: Extracting data from documents with AI is a great first step for paper-heavy processes; you can improve accuracy and reduce processing times.

