Google Cloud Document AI Samples

Overview

This repository contains samples, including community samples, that demonstrate how to analyze, classify, and search documents using Google Cloud Document AI.

Samples

Apps Script & Google Drive Integration: Google Apps Script code for integrating with Document AI.
Document AI Warehouse Processing (Python): This project demonstrates how to perform common actions on Document AI Warehouse through the API.
Document AI Warehouse Batch Ingestion via Script: This project is a helper utility for batch ingestion of documents into Document AI Warehouse.
BQ Connector: This project uses the Document AI API to process a document, format the result and save it into a BigQuery table.
Content Moderation with Dialogflow CX: This project uses the Content Moderation processor with Dialogflow CX for toxicity routing during a conversation.
Filter HITL Language: This project uses the languages detected by Document AI (post-HITL) to sort the Document.json files into separate Cloud Storage buckets.
Fraud Detection: This project uses the Document AI Invoice Parser with EKG and Google Maps to store document Entities in BigQuery.
JSON Explorer: A React tool for exploring the Document JSON response.
Language Extraction: This project uses the Document AI API to detect the languages in a multi-page document.
Paper Summarization: This project uses the Document AI API to summarize scientific articles.
PDF Embedded Text: This project demonstrates how to use the native PDF parsing feature of the OCR Processor (v1beta3).
SQL over Docs: This project shows how to run BigQuery SQL queries to extract information from documents.
Tax Processing Pipeline: This project uses the Document AI API to classify, parse, and calculate a tax form using multiple document types.
Web App Demo: This project is a full-stack application that uses Document AI to process different types of documents. This application currently supports Form, Invoice and OCR processors.
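
Several of the samples above (Language Extraction, Filter HITL Language) work from the languages that Document AI reports for each page. As a rough illustration of that idea, and not the code used by those samples, the sketch below reads a Document JSON that has already been loaded into a Python dict and collects the language codes per page. The field names `pages`, `detectedLanguages` and `languageCode` are assumed to match the JSON form of the Document resource, and the `sample` document is made up for the example.

```python
from collections import Counter


def languages_by_page(document: dict) -> list:
    """Return the detected language codes for each page, in page order."""
    result = []
    for page in document.get("pages", []):
        # Assumed field names: "detectedLanguages" and "languageCode".
        codes = [
            lang["languageCode"]
            for lang in page.get("detectedLanguages", [])
            if "languageCode" in lang
        ]
        result.append(codes)
    return result


def language_counts(document: dict) -> Counter:
    """Count on how many pages each language code appears."""
    counts = Counter()
    for codes in languages_by_page(document):
        # A set avoids counting a language twice on the same page.
        counts.update(set(codes))
    return counts


# Hypothetical two-page document, shaped like a Document JSON response.
sample = {
    "pages": [
        {"pageNumber": 1, "detectedLanguages": [{"languageCode": "en", "confidence": 0.98}]},
        {
            "pageNumber": 2,
            "detectedLanguages": [
                {"languageCode": "fr", "confidence": 0.90},
                {"languageCode": "en", "confidence": 0.10},
            ],
        },
    ]
}

if __name__ == "__main__":
    print(languages_by_page(sample))
    print(language_counts(sample))
```

A real pipeline would likely also filter on the `confidence` value before routing a document; the threshold to use is a design choice this sketch leaves open.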

Samples not in this Repository

Deprecated Samples

Replaced by Document AI Toolbox

PDF Splitter: This project uses the Document AI API to split PDF documents.
Tabular Data Extraction: This project uses the Document AI API to extract tabular data from a document.

Test Document Files

If you need document files to run the samples, you can access them from this publicly accessible Google Cloud Storage bucket:

gs://cloud-samples-data/documentai/

You can also view sample input/output files by processor on the Sample Output page of the documentation.
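
If you prefer to download a test file over HTTPS rather than with a Cloud Storage client, a `gs://` URI can usually be rewritten as a `storage.googleapis.com` URL. The helper below is a minimal sketch of that rewrite; the function name `gs_to_https` and the object name `example.pdf` are hypothetical, and whether a given object can be fetched without credentials depends on the bucket's permissions.

```python
from urllib.parse import quote


def gs_to_https(gs_uri: str) -> str:
    """Convert a gs://bucket/object URI into a storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # Split "bucket/path/to/object" into the bucket and the object path.
    bucket, _, object_path = gs_uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {gs_uri!r}")
    # Percent-encode the object path but keep "/" separators readable.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_path)}"


if __name__ == "__main__":
    # "example.pdf" is a placeholder, not a file known to exist in the bucket.
    print(gs_to_https("gs://cloud-samples-data/documentai/example.pdf"))
```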

Codelabs

Community Samples

Disclaimer: Community samples are not officially maintained by Google.

PDF Annotator Sample: This project uses the Document AI API to annotate PDF documents.

Contributing

Contributions welcome! See the Contributing Guide.

Getting help

Please use the issues page to provide feedback or submit a bug report.

Disclaimer

This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.
