This repository contains an AWS Cloud Development Kit (CDK) application that deploys the resources required to set up human-in-the-loop review for Amazon Textract. You can clone the repository and deploy it like any other CDK app. However, we recommend deploying from an AWS Cloud9 environment, because Cloud9 comes with all the dependencies needed to deploy the application pre-installed.
Set up an AWS Cloud9 environment by choosing the Launch Stack button, or follow the step-by-step instructions to deploy an AWS Cloud9 environment (estimated deployment time: 10 minutes).
Before you start the installation steps, review the architecture to understand the solution. You must create a work team using Amazon SageMaker Ground Truth labeling workforces. Follow the Ground Truth instructions to create a private work team. Once your private workforce is ready, make a note of the work team's ARN.
In your AWS Cloud9 environment or on your local machine, start by cloning this repository:
git clone <repo_url>
First, update the .env file in the /app directory. The .env file looks like the example below. Replace the IDP_REGION, IDP_ACCOUNT, and TEXTRACT_CONFIDENCE_THRESHOLD values to suit your needs, and set WORKTEAM_ARN to the ARN of the private work team you just created.
Note: TEXTRACT_CONFIDENCE_THRESHOLD is the confidence threshold that this deployment and SageMaker Ground Truth use to evaluate Amazon Textract outputs. Any result with a confidence score below this value is sent to a human for review. After deployment, you can change the threshold in AWS Systems Manager Parameter Store; see the Updating Confidence Thresholds section below for details.
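The routing rule in the note can be sketched in a few lines of Python. This is a minimal illustration, not the code this application deploys: it assumes the Textract response is available as a list of Block dictionaries carrying a Confidence field on a 0 to 100 scale, and the helper name blocks_needing_review and the sample blocks are hypothetical. It uses the strict "less than" comparison the note describes, so a score exactly equal to the threshold is not sent for review.

```python
from typing import Any, Dict, List


def blocks_needing_review(blocks: List[Dict[str, Any]], threshold: float) -> List[Dict[str, Any]]:
    """Return the blocks whose confidence is strictly below the threshold.

    Hypothetical helper: blocks that carry no "Confidence" field are skipped
    rather than flagged.
    """
    return [
        block
        for block in blocks
        if "Confidence" in block and block["Confidence"] < threshold
    ]


# Hypothetical sample input shaped like Textract Block objects.
sample_blocks = [
    {"Id": "1", "BlockType": "LINE", "Text": "Invoice #123", "Confidence": 99.2},
    {"Id": "2", "BlockType": "LINE", "Text": "Tota1: $4O.00", "Confidence": 72.5},
    {"Id": "3", "BlockType": "LINE", "Text": "no score"},  # no Confidence field
]

flagged = blocks_needing_review(sample_blocks, threshold=90)
print([block["Id"] for block in flagged])  # ['2']
```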
Change into the /app directory:
cd app
Then update the .env file, replacing the placeholder values:
IDP_REGION=<your-region>
IDP_ACCOUNT=<your-aws-account-number>
TEXTRACT_CONFIDENCE_THRESHOLD=<your-confidence-threshold>
WORKTEAM_ARN=<your-smgt-workteam-arn>
Example:
IDP_REGION=us-east-1
IDP_ACCOUNT=123456789012
TEXTRACT_CONFIDENCE_THRESHOLD=90
WORKTEAM_ARN="arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/idp-workteam"
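Because a typo in .env only surfaces at deploy time, it can help to sanity-check the file first. The sketch below is a hypothetical helper, not part of this repository. It assumes simple KEY=VALUE lines (no inline comments), checks that IDP_ACCOUNT is a 12-digit AWS account ID, that the threshold is a number from 0 to 100, and that WORKTEAM_ARN looks like a SageMaker workteam ARN. It also assumes the work team lives in the same account and Region as the deployment, and flags a mismatch.

```python
import re
from typing import Dict, List

REQUIRED_KEYS = ("IDP_REGION", "IDP_ACCOUNT", "TEXTRACT_CONFIDENCE_THRESHOLD", "WORKTEAM_ARN")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments and stripping quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_env(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in values]
    if problems:
        return problems
    if not re.fullmatch(r"\d{12}", values["IDP_ACCOUNT"]):
        problems.append("IDP_ACCOUNT should be a 12-digit AWS account ID")
    try:
        threshold = float(values["TEXTRACT_CONFIDENCE_THRESHOLD"])
        if not 0 <= threshold <= 100:
            problems.append("TEXTRACT_CONFIDENCE_THRESHOLD must be between 0 and 100")
    except ValueError:
        problems.append("TEXTRACT_CONFIDENCE_THRESHOLD must be numeric")
    # ARN layout: arn:aws:sagemaker:<region>:<account>:workteam/<type>/<name>
    parts = values["WORKTEAM_ARN"].split(":")
    if len(parts) < 6 or parts[2] != "sagemaker" or not parts[5].startswith("workteam/"):
        problems.append("WORKTEAM_ARN does not look like a SageMaker workteam ARN")
    elif parts[3] != values["IDP_REGION"] or parts[4] != values["IDP_ACCOUNT"]:
        # Assumption: the work team is in the deployment account and Region.
        problems.append("WORKTEAM_ARN region/account differ from IDP_REGION/IDP_ACCOUNT")
    return problems


example = """IDP_REGION=us-east-1
IDP_ACCOUNT=123456789012
TEXTRACT_CONFIDENCE_THRESHOLD=90
WORKTEAM_ARN="arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/idp-workteam"
"""
print(validate_env(parse_env(example)))  # []
```

In practice you would pass the contents of app/.env to parse_env instead of the inline example string.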
If you are in a Cloud9 environment, the AWS CDK Toolkit (CLI) is already installed. From there, run the following commands:
- Change into the /app directory if you are not already there, and install the dependencies:
  cd app
  npm install --save
- Bootstrap CDK for your account. The account and region in the command below are the same values as IDP_ACCOUNT and IDP_REGION in the .env file:
  cdk bootstrap aws://<account>/<region>
- Run cdk synth to synthesize the CDK app into an AWS CloudFormation template:
  cdk synth
- Run cdk deploy to deploy the application:
  cdk deploy --outputs-file ./cdk-outputs.json
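The steps above run in a fixed order, and the bootstrap target is built from the .env values. The following sketch scripts that sequence; it is a hypothetical convenience wrapper, not part of this repository. It assumes you run it from the repository root (so the commands execute in app/), that npm and cdk are on your PATH, and it defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import Dict, List


def deployment_commands(env: Dict[str, str]) -> List[List[str]]:
    """Build the install/bootstrap/synth/deploy commands from the .env values."""
    target = f"aws://{env['IDP_ACCOUNT']}/{env['IDP_REGION']}"
    return [
        ["npm", "install", "--save"],
        ["cdk", "bootstrap", target],
        ["cdk", "synth"],
        ["cdk", "deploy", "--outputs-file", "./cdk-outputs.json"],
    ]


def run_all(commands: List[List[str]], cwd: str = "app", dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, run it and stop on the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the command exits non-zero.
            subprocess.run(cmd, cwd=cwd, check=True)


run_all(deployment_commands({"IDP_ACCOUNT": "123456789012", "IDP_REGION": "us-east-1"}))
```

Stopping at the first failing step keeps a failed bootstrap from being followed by a deploy that cannot succeed.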
Once the deployment is complete, cdk-outputs.json contains the outputs declared by the deployed stack.
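The file written by cdk deploy --outputs-file maps each stack name to its output keys and values. The following minimal sketch reads it with the standard library. The stack and output names in the sample are hypothetical, because the actual names depend on how this app defines its stack.

```python
import json
from pathlib import Path
from typing import Dict

Outputs = Dict[str, Dict[str, str]]


def parse_cdk_outputs(text: str) -> Outputs:
    """Parse outputs JSON shaped like {stack_name: {output_key: output_value}}."""
    return json.loads(text)


def load_cdk_outputs(path: str = "cdk-outputs.json") -> Outputs:
    """Read and parse the file produced by `cdk deploy --outputs-file`."""
    return parse_cdk_outputs(Path(path).read_text())


def format_outputs(outputs: Outputs) -> str:
    """Render one line per stack, then one indented line per output, sorted by key."""
    lines = []
    for stack, values in outputs.items():
        lines.append(stack)
        for key, value in sorted(values.items()):
            lines.append(f"  {key} = {value}")
    return "\n".join(lines)


# Hypothetical sample; run load_cdk_outputs() against the real file in /app instead.
sample = '{"ExampleStack": {"ExampleBucketName": "example-bucket"}}'
print(format_outputs(parse_cdk_outputs(sample)))
```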
Once deployed, the application creates a threshold parameter in AWS Systems Manager (SSM) Parameter Store. While testing, you can change the confidence threshold directly in Parameter Store without redeploying the application. To update the threshold value:
- Sign in to the AWS Management Console, search for "SSM" in the search field, and choose Systems Manager.
- In the Systems Manager console, choose Parameter Store under Application Management in the left navigation pane.
- The next screen displays a list of parameters. Search for the parameter named CFN-idptextractconfidencethreshold.
- Choose the name of the parameter to view its details. The parameter's Value is set to the value you specified in the .env file.
- To modify this value, choose Edit in the top right and change the value. Note: the value must be a number between 0 and 100.
- When you are done, choose Save changes.
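The same update can be scripted, which is convenient when testing several thresholds in a row. The following sketch uses the SSM PutParameter API through a boto3 client (the AWS CLI equivalent is aws ssm put-parameter --name CFN-idptextractconfidencethreshold --value 85 --overwrite). It rests on two assumptions. First, CFN-idptextractconfidencethreshold is the parameter's full name; confirm it in the console. Second, the helper update_threshold is hypothetical. It validates the 0 to 100 range before writing, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Union

# Assumed full parameter name; confirm it in the Parameter Store console.
PARAMETER_NAME = "CFN-idptextractconfidencethreshold"


def validate_threshold(value: Union[int, float, str]) -> str:
    """Return the value as a string if it is numeric and within 0..100, else raise ValueError."""
    number = float(value)  # raises ValueError for non-numeric strings
    if not 0 <= number <= 100:
        raise ValueError(f"threshold must be between 0 and 100, got {value!r}")
    return str(value)


def update_threshold(ssm_client: Any, value: Union[int, float, str], name: str = PARAMETER_NAME) -> str:
    """Overwrite the existing threshold parameter and return the value written."""
    new_value = validate_threshold(value)
    ssm_client.put_parameter(Name=name, Value=new_value, Overwrite=True)
    return new_value


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   ssm = boto3.client("ssm", region_name="us-east-1")
#   update_threshold(ssm, 85)
```

As with console edits, only new Amazon Textract results pick up the updated value.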
Important note about updating the threshold: the new value takes effect only for new Amazon Textract results. Tasks already in the SageMaker Ground Truth human review queue do not pick up the new value and continue to use the previous threshold.
⚠️ WARNING: The step below is destructive: all resources, including the Amazon S3 bucket and the Amazon DynamoDB table, will be deleted. We recommend backing up the Amazon S3 bucket before deleting the application.
To clean up and delete the application and all the corresponding AWS resources from your account, run the following command from the /app directory:
cdk destroy
This library is licensed under the MIT-0 License. See the LICENSE file.