Skip to content

This end-to-end JavaScript sample showcases an event-based Blob storage triggered function that converts PDF documents to text at scale.

License

Notifications You must be signed in to change notification settings

Azure-Samples/functions-e2e-blob-pdf-to-text

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Flex Consumption plan - PDF to text processor | Azure Functions

Processing binary files from Azure Blob Storage is a key scenario for Azure Functions. This end-to-end JavaScript sample showcases an event-based Blob storage triggered function that converts PDF documents to text at scale. It also uses managed identity and a virtual network between the function app and storage account for security best practices.

This solution creates two containers in blob storage, unprocessed-pdf and processed-text. An Event Grid-based Blob storage triggered function written in JavaScript is executed when a PDF file is added to the unprocessed-pdf container, converts the PDF to text using the PDF.js library, and saves the text to the processed-text container.

Using an Event Grid-based Blob storage trigger reduces latency by triggering your function instantly as changes occur in the container. This type of Blob storage trigger is the only type of Blob storage trigger that can be used when running in a Flex Consumption plan.

The communication between the function and the storage account happens via a system assigned managed identity, and the storage account is restricted behind a virtual network. The Azure Function uses VNet integration to reach the storage account. You can opt out of a VNet being used in the sample by setting SKIP_VNET to true in the parameters.

Diagram showing customers uploading PDF files into the unprocessed-pdf Blob storage container and Azure Functions Flex Consumption processing them into PDF and sending the text to the processed-text container. The storage account and function app are inside the VNet.

Important

This sample creates several resources. Make sure to delete the resource group after testing to minimize charges!

Prerequisites

Before you can run this sample, you must have the following:

Provision the solution on Azure

To set up this sample, follow these steps:

  1. Clone this repository to your local machine.
  2. in the root folder use the Azure Developer CLI (azd) to provision a new resource group with all the resources for the sample using the the environment name you and in the location you provide.
azd up

Alternatively, you can opt-out of a VNet being used in the sample. To do so, use azd env to configure SKIP_VNET to true before running azd up:

azd env set SKIP_VNET true
azd up

Inspect the solution (optional)

  1. Once the deployment is done, inspect the new resource group. The Flex Consumption function app and plan, storage, and App Insights have been created and configured: List of resources created by the bicep template
  2. By default the storage account will be locked to the virtual network. To be able to access the containers using the Azure Portal, you can add your client IP Address to the storage account Firewall. In the Azure Portal open the storage account created by the sample, go to Networking in the Security + networking section, and add your client IP address to the Firewall. After a minute you will be able to browse to the data storage containers. This step is not required if you turned off the VNet creation. Blob storage network settings adding client IP address to Firewall exceptions
  3. The storage account has two extra containers in blob storage: Blob storage account with two extra containers
  4. Open the processed-text and unprocessed-pdf containers, which are empty.

Test the solution

  1. Using the Azure Portal or any other tool, upload PDF files to the unprocessed-pdf container. There are sample PDF files in the local data folder. For example, once all files in data folder are uploaded to the unprocessed-pdf container you should see: List of PDF files in the unprocessed-pdf folder
  2. Browse to the processed-text folder and notice that within seconds all the uploaded PDF files have now been processed into text files by the Flex Consumption hosted function: List of text files in the processed-text folder

Clean up resources

When you no longer need the resources created in this sample, run the following command to delete the Azure resources:

azd down

Resources

For more information on Azure Functions, Event Hubs, and VNet integration, see the following resources:

About

This end-to-end JavaScript sample showcases an event-based Blob storage triggered function that converts PDF documents to text at scale.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •