new serverless pattern - s3-eventbridge-lambda-textract-node #2562
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# Amazon S3 to Amazon Textract through Amazon EventBridge | |
|
||
This pattern creates an Amazon S3 bucket. Uploading an object to the bucket invokes an AWS Lambda function through Amazon EventBridge, and the function detects the text in the document using Amazon Textract. The Lambda function code uses the Node.js runtime. | |
Review comment: Please rephrase this. |
||
|
||
Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns/s3-eventbridge-lambda-textract-node | ||
|
||
Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. | ||
|
||
## Requirements | ||
|
||
* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. | ||
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured | ||
* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) | ||
* [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed | ||
|
||
## Deployment Instructions | ||
|
||
1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: | ||
``` | ||
git clone https://github.com/aws-samples/serverless-patterns | ||
``` | ||
2. Change directory to the pattern directory: | |
Review comment: Increase the sequence 1, 2, 3... |
||
``` | |
cd serverless-patterns/s3-eventbridge-lambda-textract-node | |
``` | |
3. From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yaml file: | |
``` | |
sam deploy --guided | |
``` | |
4. During the prompts: | |
* Enter a stack name | |
* Enter the desired AWS Region | |
* Allow SAM CLI to create IAM roles with the required permissions. | |
|
||
Once you have run `sam deploy --guided` once and saved the arguments to a configuration file (samconfig.toml), you can use `sam deploy` in future to use these defaults. | |
|
||
5. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing. | |
|
||
## How it works | ||
|
||
The CloudFormation template creates two S3 buckets (source and destination) along with a Lambda function (Node.js) and an EventBridge rule. The rule matches Object Created events from the source bucket and invokes the Lambda function. The Lambda function makes a DetectDocumentText API call and stores the output in the destination S3 bucket. | |
|
||
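As a rough illustration of the flow above, the sketch below shows where the bucket name and object key sit in an S3 "Object Created" event delivered through EventBridge. The helper name `extractS3Location` and the sample bucket and key values are assumptions made for this example and are not part of the pattern's code.

```javascript
// Hypothetical helper (not part of the pattern): pull the bucket name and
// object key out of an S3 "Object Created" event delivered by EventBridge.
function extractS3Location(event) {
  const detail = event && event.detail;
  if (!detail || !detail.bucket || !detail.object) {
    throw new Error("Not an S3 EventBridge event: missing detail.bucket or detail.object");
  }
  return { bucket: detail.bucket.name, key: detail.object.key };
}

// Trimmed sample event; the bucket and key values are made up for illustration.
const sampleEvent = {
  source: "aws.s3",
  "detail-type": "Object Created",
  detail: {
    bucket: { name: "my-stack-input-bucket-123456789012" },
    object: { key: "invoice.png", size: 1024 },
  },
};

console.log(extractS3Location(sampleEvent));
```

Direct S3 notifications use a different shape (`event.Records[0].s3...`), so a handler written for one shape fails on the other.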
## Testing | ||
|
||
Upload a file (document or image) to the input S3 bucket, named <STACK_NAME>-input-bucket-<AWS_ACCOUNTID>, via the console or with the PutObject API call: | |
Review comment: Update the comment to replace |
||
|
||
``` | ||
aws s3api put-object --bucket your-bucket-name --key your-document.pdf --body /path/to/your/document.pdf | ||
Review comment: aws s3api put-object --bucket |
||
``` | ||
|
||
Review comment: Got the below error in the Lambda log while testing:
|
||
Replace the parameters in the above command appropriately. | ||
|
||
Review comment: Add complete testing steps. What will the user test after uploading the file? What is to be validated. |
||
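One thing to validate after the upload is that a JSON object appears in the output bucket. The sketch below derives the object key to look for, following the naming rule used in the handler and assuming the uploaded key has a file extension. The function name `expectedOutputKey` is hypothetical and exists only for this example.

```javascript
// Hypothetical helper: compute the key the handler is expected to write
// to the output bucket for a given uploaded key (assumes the key has an extension).
function expectedOutputKey(inputKey) {
  const prefixed = `textract-output-${inputKey}`;
  // Swap the file extension for .json
  return prefixed.substring(0, prefixed.lastIndexOf(".")) + ".json";
}

console.log(expectedOutputKey("your-document.pdf"));
// prints "textract-output-your-document.json"
```

If that object is present in the output bucket and contains a `Blocks` array, the end-to-end flow has worked.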
## Cleanup | ||
|
||
Review comment: You need to add a step to clean up the buckets, |
||
1. Delete the stack | ||
```bash | ||
Review comment: use |
||
aws cloudformation delete-stack --stack-name STACK_NAME | ||
``` | ||
2. Confirm the stack has been deleted | |
```bash | ||
aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,'STACK_NAME')].StackStatus" | ||
``` | ||
---- | ||
Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
|
||
SPDX-License-Identifier: MIT-0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
{ | |
"title": "Amazon S3 to Amazon Textract using Amazon EventBridge", | |
Review comment: Amazon S3 to Amazon Textract using Amazon EventBridge |
||
"description": "SAM template for an S3 object upload to invoke a Lambda function through EventBridge that detects the text in a document through Amazon Textract", | |
"language": "Node.js", | |
Review comment: Node.js |
||
"level": "200", | |
"framework": "SAM", | |
"introBox": { | |
"headline": "How it works", | |
"text": [ | |
"This pattern creates two S3 buckets (source and destination). Uploading an object to the source bucket invokes a Lambda function through EventBridge, and the function detects the text in the document through Amazon Textract.", | |
"When a file is uploaded to the source S3 bucket, EventBridge receives the Object Created event and invokes the Lambda function.", | |
"The Lambda function writes the output of the Textract text detection to the destination S3 bucket." | |
] | ||
}, | ||
"gitHub": { | ||
"template": { | ||
"repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/s3-eventbridge-lambda-textract-node", | ||
"templateURL": "serverless-patterns/s3-eventbridge-lambda-textract-node", | ||
"projectFolder": "s3-eventbridge-lambda-textract-node", | ||
"templateFile": "template.yaml" | ||
} | ||
}, | ||
"resources": { | ||
"bullets": [ | ||
{ | |
"text": "Detecting text with an AWS Lambda function", | |
"link": "https://docs.aws.amazon.com/textract/latest/dg/lambda.html" | ||
}, | ||
{ | ||
"text": "Amazon Textract", | ||
"link": "https://docs.aws.amazon.com/textract/latest/dg/what-is.html" | ||
} | ||
] | ||
}, | ||
"deploy": { | ||
"text": [ | ||
"sam deploy" | ||
] | ||
}, | ||
"testing": { | ||
"text": [ | ||
"See the GitHub repo for detailed testing instructions." | ||
] | ||
}, | |
"cleanup": { | |
"text": [ | |
"Delete the stack: <code>sam delete</code>." | |
] | ||
}, | ||
"authors": [ | ||
{ | ||
"name": "Abilashkumar P C", | ||
"image": "https://drive.google.com/file/d/1bxOh_WBw8J_xEqvT-qRezH8WXqSBPI24/view?usp=sharing", | ||
"bio": "Sr. Cloud Support Engineer @ AWS", | ||
"linkedin": "abilashkumar-p-c" | ||
} | ||
] | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3"; | ||
import { TextractClient, DetectDocumentTextCommand } from "@aws-sdk/client-textract"; | ||
|
||
const s3Client = new S3Client(); | ||
const textractClient = new TextractClient(); | ||
|
||
export const handler = async (event, context) => { | ||
// Extract bucket and key from the EventBridge event. | |
// S3 events delivered through EventBridge carry these under event.detail, not event.Records. | |
console.log(event); | |
|
||
const bucket = event.detail.bucket.name; | |
const key = event.detail.object.key; | |
console.log(bucket); | ||
console.log(key); | ||
try { | ||
// Call Textract to detect document text | ||
const detectParams = { | ||
Document: { | ||
S3Object: { | ||
Bucket: bucket, | ||
Name: key | ||
} | ||
} | ||
}; | ||
const detectCommand = new DetectDocumentTextCommand(detectParams); | ||
const response = await textractClient.send(detectCommand); | ||
console.log(response); | ||
// Prepare the output key: replace the file extension (if any) with .json | |
const dotIndex = key.lastIndexOf('.'); | |
const baseName = dotIndex > 0 ? key.substring(0, dotIndex) : key; | |
const outputKey = `textract-output-${baseName}.json`; | |
console.log(outputKey); | |
|
||
// Write the Textract output to the output bucket | ||
const putParams = { | ||
Bucket: process.env.OUTPUT_BUCKET, | ||
Key: outputKey, | ||
Body: JSON.stringify(response) | ||
}; | ||
const putCommand = new PutObjectCommand(putParams); | ||
await s3Client.send(putCommand); | ||
|
||
return { | ||
statusCode: 200, | ||
body: JSON.stringify('Document processed successfully') | ||
}; | ||
} catch (error) { | ||
console.error('Error:', error); | ||
return { | ||
statusCode: 500, | ||
body: JSON.stringify('Error processing document') | ||
}; | ||
} | ||
}; |
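The handler stores the raw DetectDocumentText response. As a minimal sketch of how that stored JSON could be read back, the example below collects the text of every `LINE` block. The helper name `extractLines` and the sample response are assumptions for illustration; a real response carries more fields per block (geometry, relationships, and so on).

```javascript
// Hypothetical helper: collect the text of each LINE block from a
// DetectDocumentText response (or from the JSON the handler stored in S3).
function extractLines(textractResponse) {
  const blocks = textractResponse.Blocks || [];
  return blocks
    .filter((block) => block.BlockType === "LINE")
    .map((block) => block.Text);
}

// Made-up, heavily trimmed response used only to exercise the helper.
const sampleResponse = {
  Blocks: [
    { BlockType: "PAGE" },
    { BlockType: "LINE", Text: "Hello world", Confidence: 99.1 },
    { BlockType: "WORD", Text: "Hello", Confidence: 99.3 },
    { BlockType: "WORD", Text: "world", Confidence: 98.9 },
    { BlockType: "LINE", Text: "Second line", Confidence: 97.5 },
  ],
};

console.log(extractLines(sampleResponse).join("\n"));
```

Filtering on `LINE` avoids duplicating text, since each line's words also appear as separate `WORD` blocks.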
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
AWSTemplateFormatVersion: '2010-09-09' | ||
Transform: AWS::Serverless-2016-10-31 | ||
Description: 'SAM template for S3 trigger to Lambda for Textract document text detection with EventBridge using Node.js' | |
|
||
Resources: | ||
# Input S3 bucket | ||
InputBucket: | ||
Type: AWS::S3::Bucket | ||
Properties: | ||
BucketName: !Sub '${AWS::StackName}-input-bucket-${AWS::AccountId}' | ||
NotificationConfiguration: | ||
EventBridgeConfiguration: | ||
EventBridgeEnabled: true | ||
|
||
# Output S3 bucket | ||
OutputBucket: | ||
Type: AWS::S3::Bucket | ||
Properties: | ||
BucketName: !Sub '${AWS::StackName}-output-bucket-${AWS::AccountId}' | ||
|
||
# Lambda function | ||
TextractFunction: | ||
Type: AWS::Serverless::Function | ||
Properties: | ||
FunctionName: !Sub '${AWS::StackName}-textract-function' | ||
Handler: index.handler | ||
Runtime: nodejs18.x | ||
Timeout: 60 | ||
Environment: | ||
Variables: | ||
OUTPUT_BUCKET: !Ref OutputBucket | ||
Policies: | ||
- S3ReadPolicy: | ||
BucketName: !Ref InputBucket | ||
- S3WritePolicy: | ||
BucketName: !Ref OutputBucket | ||
- Statement: | ||
- Effect: Allow | ||
Action: | ||
- textract:DetectDocumentText | ||
Resource: '*' | ||
CodeUri: src/ | ||
|
||
# EventBridge Rule | ||
S3ObjectCreatedRule: | ||
Type: AWS::Events::Rule | ||
Properties: | ||
Description: "Rule to capture S3 object created events" | ||
EventPattern: | ||
source: | ||
- aws.s3 | ||
detail-type: | ||
- Object Created | ||
detail: | ||
bucket: | ||
name: | ||
- !Ref InputBucket | ||
State: "ENABLED" | ||
Targets: | ||
- Arn: !GetAtt TextractFunction.Arn | ||
Id: "TextractFunctionTarget" | ||
|
||
# Permission for EventBridge to invoke Lambda | ||
TextractFunctionPermission: | ||
Type: AWS::Lambda::Permission | ||
Properties: | ||
FunctionName: !Ref TextractFunction | ||
Action: "lambda:InvokeFunction" | ||
Principal: "events.amazonaws.com" | ||
SourceArn: !GetAtt S3ObjectCreatedRule.Arn | ||
|
||
Outputs: | ||
InputBucketName: | ||
Description: 'Name of the input S3 bucket' | ||
Value: !Ref InputBucket | ||
OutputBucketName: | ||
Description: 'Name of the output S3 bucket' | ||
Value: !Ref OutputBucket | ||
TextractFunctionName: | ||
Description: 'Name of the Textract Lambda function' | ||
Value: !Ref TextractFunction | ||
TextractFunctionArn: | ||
Description: 'ARN of the Textract Lambda function' | ||
Value: !GetAtt TextractFunction.Arn |
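To make the rule's `EventPattern` more concrete, here is a deliberately simplified matcher that handles only exact-value arrays and nested objects, which is all the pattern above uses. It is an illustration of the idea, not EventBridge's actual matching logic (which supports many more operators), and the function name `matchesPattern` and the sample bucket names are assumptions.

```javascript
// Simplified illustration: each pattern field is either an array of allowed
// values or a nested pattern object. Real EventBridge matching supports more.
function matchesPattern(pattern, event) {
  return Object.entries(pattern).every(([field, expected]) => {
    const actual = event == null ? undefined : event[field];
    if (Array.isArray(expected)) {
      return expected.includes(actual);
    }
    // Nested pattern object: recurse into the corresponding event field.
    return typeof actual === "object" && actual !== null && matchesPattern(expected, actual);
  });
}

// Pattern equivalent to the rule in the template, with a made-up bucket name.
const rulePattern = {
  source: ["aws.s3"],
  "detail-type": ["Object Created"],
  detail: { bucket: { name: ["my-stack-input-bucket-123456789012"] } },
};

const uploadEvent = {
  source: "aws.s3",
  "detail-type": "Object Created",
  detail: {
    bucket: { name: "my-stack-input-bucket-123456789012" },
    object: { key: "invoice.png" },
  },
};

console.log(matchesPattern(rulePattern, uploadEvent));
```

Because the pattern pins the bucket name, objects written to the output bucket would not match the rule under this logic, so the function does not trigger itself.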
Review comment: Amazon EventBridge