Google Cloud Document AI Samples
This repository contains samples and community samples that demonstrate how to analyze, classify, and search documents using Google Cloud Document AI.
- Apps Script & Google Drive Integration: Code in Google Apps Script for integration with Document AI.
- Document AI Warehouse Processing (Python): This project demonstrates how to perform common actions on Document AI Warehouse through the API.
- Document AI Warehouse Batch Ingestion via script: This project is a helper utility for batch ingestion of documents into Document AI Warehouse.
- BQ Connector: This project uses the Document AI API to process a document, format the result and save it into a BigQuery table.
- Content Moderation with Dialogflow CX: This project uses the Content Moderation processor with Dialogflow CX for toxicity routing during a conversation.
- Filter HITL Language: This project uses the languages detected by Document AI (post-HITL) to sort the Document.json files into separate Cloud Storage buckets.
- Fraud Detection: This project uses the Document AI Invoice Parser with EKG and Google Maps to store document Entities in BigQuery.
- JSON Explorer: A React Tool to explore the Document JSON Response.
- Language Extraction: This project uses the Document AI API to detect the languages in a multi-page document.
- Paper Summarization: This project uses the Document AI API to summarize scientific articles.
- PDF Embedded Text: Demonstrates how to use the Native PDF parsing feature for the OCR Processor (v1beta3).
- SQL over Docs: This project shows how to run a BigQuery SQL query and extract information from documents.
- Tax Processing Pipeline: This project uses the Document AI API to classify, parse, and calculate a tax form using multiple document types.
- Web App Demo: This project is a full-stack application that uses Document AI to process different types of documents. This application currently supports Form, Invoice and OCR processors.
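Several of the samples above (Language Extraction, Filter HITL Language) work with the languages recorded in the Document JSON output. The following is a minimal sketch of that idea, not code taken from any of the listed projects. It assumes the JSON form of the Document object, where each entry in pages carries a detectedLanguages list with languageCode and confidence fields. The function name languages_by_page, the min_confidence parameter, and the sample data are hypothetical.

```python
from typing import Any


def languages_by_page(
    document: dict[str, Any], min_confidence: float = 0.0
) -> dict[int, list[str]]:
    """Map each page number to the language codes detected on that page."""
    result: dict[int, list[str]] = {}
    for index, page in enumerate(document.get("pages", []), start=1):
        # Assumption: fall back to the list position when pageNumber is absent.
        page_number = page.get("pageNumber", index)
        result[page_number] = [
            lang["languageCode"]
            for lang in page.get("detectedLanguages", [])
            if lang.get("confidence", 0.0) >= min_confidence
        ]
    return result


# Hand-written sample data for illustration only, not real processor output.
sample_document = {
    "pages": [
        {
            "pageNumber": 1,
            "detectedLanguages": [
                {"languageCode": "en", "confidence": 0.98},
                {"languageCode": "fr", "confidence": 0.2},
            ],
        },
        {
            "pageNumber": 2,
            "detectedLanguages": [{"languageCode": "de", "confidence": 0.9}],
        },
    ]
}

print(languages_by_page(sample_document, min_confidence=0.5))
```

A per-page mapping like this could then drive a routing decision, for example choosing a destination bucket by language, which is the kind of task the Filter HITL Language sample describes.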
Replaced by Document AI Toolbox
- PDF Splitter: This project uses the Document AI API to split PDF documents.
- Tabular Data Extraction: This project uses the Document AI API to extract tabular data from a document.
If you need document files to run the samples, you can access them from this publicly accessible Google Cloud Storage bucket:
gs://cloud-samples-data/documentai/
You can also view sample input/output files by processor on the Sample Output page of the documentation.
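Storage client code often needs the bucket name and the object prefix as separate values rather than a full gs:// URI. As a small sketch using only the Python standard library, the helper below splits the sample bucket path into those two parts. The function name split_gcs_uri is hypothetical and is not part of any Google library.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split a gs:// URI into (bucket name, object prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    # The path starts with "/", which is not part of the object name.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_gcs_uri("gs://cloud-samples-data/documentai/")
print(bucket)  # cloud-samples-data
print(prefix)  # documentai/
```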
- Optical Character Recognition (OCR) with Document AI (Python)
- Form Parsing with Document AI (Python)
- Specialized Processors with Document AI (Python)
- Managing Document AI processors (Python)
Disclaimer: Community samples are not officially maintained by Google.
- PDF Annotator Sample: This project uses the Document AI API to annotate PDF documents.
Contributions welcome! See the Contributing Guide.
Please use the issues page to provide feedback or submit a bug report.
This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.