new serverless pattern - s3-eventbridge-lambda-textract-node #2562

Open: wants to merge 4 commits into main
66 changes: 66 additions & 0 deletions s3-eventbridge-lambda-textract-node/README.md
@@ -0,0 +1,66 @@
# Amazon S3 to Amazon Textract through Amazon EventBridge

Review comment (Contributor): Amazon EventBridge


This pattern demonstrates how an object uploaded to an Amazon S3 bucket invokes an AWS Lambda function through Amazon EventBridge, and how that function detects the text in the document using Amazon Textract. The Lambda function uses the Node.js runtime.

Review comment (Contributor): Please rephrase this.
Ref: "This pattern demonstrates how to create an S3 bucket"
This pattern does not demonstrate how to create an S3 bucket.


Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns/s3-eventbridge-lambda-textract-node

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
* [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
```
git clone https://github.com/aws-samples/serverless-patterns
```
2. Change directory to the pattern directory:

Review comment (Contributor): Increase the sequence 1, 2, 3...

```
cd serverless-patterns/s3-eventbridge-lambda-textract-node
```
3. From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yaml file:
```
sam deploy --guided
```
4. During the prompts:
* Enter a stack name
* Enter the desired AWS Region
* Allow SAM CLI to create IAM roles with the required permissions.

Once you have run `sam deploy --guided` once and saved the arguments to a configuration file (samconfig.toml), you can use `sam deploy` in the future to use these defaults.

5. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing.

## How it works

The AWS SAM template creates two S3 buckets (input and output), a Node.js Lambda function, and an EventBridge rule. The rule matches `Object Created` events from the input bucket and invokes the Lambda function. The function calls the Textract DetectDocumentText API and stores the response as JSON in the output bucket.
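
To make the event flow concrete, here is a minimal sketch of how a handler might read the bucket and key from an S3 `Object Created` event delivered through EventBridge. The helper name `extractS3Object` and the sample values are hypothetical, and the sample event is trimmed to the fields used here.

```javascript
// Hypothetical helper: pull the bucket name and object key out of an
// S3 "Object Created" event delivered by EventBridge. These events carry
// the S3 details under `detail`, not under `Records`.
function extractS3Object(event) {
  const detail = event && event.detail;
  if (!detail || !detail.bucket || !detail.object) {
    throw new Error("Not an S3 EventBridge event: missing detail.bucket or detail.object");
  }
  return { bucket: detail.bucket.name, key: detail.object.key };
}

// Trimmed sample event; the bucket and key values are placeholders.
const sampleEvent = {
  source: "aws.s3",
  "detail-type": "Object Created",
  detail: {
    bucket: { name: "my-stack-input-bucket-123456789012" },
    object: { key: "your-document.pdf" }
  }
};

console.log(extractS3Object(sampleEvent));
```

Failing fast on an unexpected event shape tends to produce a clearer log message than a `TypeError` raised deep inside the handler.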

## Testing

Upload a file (document or image) to the input bucket, `<STACK_NAME>-input-bucket-<AWS_ACCOUNT_ID>` (the `InputBucketName` output of the `sam deploy` command), through the console or with the PutObject API call:

Review comment (Contributor): Update the comment to replace InputBucketName from the output from sam deploy command.


```
aws s3api put-object --bucket your-bucket-name --key your-document.pdf --body /path/to/your/document.pdf
# Review comment (Contributor), suggested command:
# aws s3api put-object --bucket InputBucketName --key your-document.pdf --body /path/to/your/document.pdf
```

Review comment (Contributor): Got the below error in the Lambda log while testing:

2025-02-14T07:05:16.220Z 426f36aa-3f5a-4189-9123-9d5e2f0784b7 ERROR Invoke Error { "errorType": "TypeError", "errorMessage": "Cannot read properties of undefined (reading '0')", "stack": [ "TypeError: Cannot read properties of undefined (reading '0')", " at Runtime.handler (file:///var/task/index.mjs:10:18)", " at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1173:29)" ] }

Replace the parameters in the above command appropriately.

Review comment (Contributor): Add complete testing steps. What will the user test after uploading the file? What is to be validated?
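
As a starting point for validation: based on the handler in `src/index.mjs`, the output object key is the input key prefixed with `textract-output-`, with its file extension replaced by `.json`. The sketch below mirrors that logic so you know which key to look for in the output bucket. `expectedOutputKey` is a hypothetical helper name, not part of the pattern, and it assumes the input key has a file extension.

```javascript
// Hypothetical helper mirroring the key logic in src/index.mjs.
// Assumes the input key has a file extension (for example ".pdf" or ".png").
function expectedOutputKey(inputKey) {
  const prefixed = `textract-output-${inputKey}`;
  return prefixed.substring(0, prefixed.lastIndexOf(".")) + ".json";
}

console.log(expectedOutputKey("your-document.pdf")); // textract-output-your-document.json
```

After uploading `your-document.pdf`, you would therefore expect an object named `textract-output-your-document.json` in the bucket given by the `OutputBucketName` stack output, containing the DetectDocumentText response.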

## Cleanup

Review comment (Contributor): You need to add a step to clean up the buckets.

1. Delete the stack
```bash
# Review comment (Contributor): use sam delete instead
aws cloudformation delete-stack --stack-name STACK_NAME
```
2. Confirm the stack has been deleted
```bash
aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,'STACK_NAME')].StackStatus"
```
----
Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
59 changes: 59 additions & 0 deletions s3-eventbridge-lambda-textract-node/example-pattern.json
@@ -0,0 +1,59 @@
{
"title": "Amazon S3 to Amazon Textract using Amazon EventBridge",
Review comment (Contributor): Amazon S3 to Amazon Textract using Amazon EventBridge

"description": "SAM template for an S3 object upload to invoke a Lambda function through EventBridge that detects the text in a document through Amazon Textract",
"language": "Node.js",
Review comment (Contributor): Node.js

"level": "200",
"framework": "SAM",
"introBox": {
"headline": "How it works",
"text": [
"This pattern uses two S3 buckets (input and output). An object uploaded to the input bucket invokes a Lambda function through EventBridge, and the function detects the text in the document using Amazon Textract.",
"When a file is uploaded to the input bucket, S3 sends an Object Created event to EventBridge, and an EventBridge rule invokes the Lambda function.",
"The Lambda function writes the output of the Textract text detection to the output S3 bucket."
]
},
"gitHub": {
"template": {
"repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/s3-eventbridge-lambda-textract-node",
"templateURL": "serverless-patterns/s3-eventbridge-lambda-textract-node",
"projectFolder": "s3-eventbridge-lambda-textract-node",
"templateFile": "template.yaml"
}
},
"resources": {
"bullets": [
{
"text": "Detecting text with an AWS Lambda function",
"link": "https://docs.aws.amazon.com/textract/latest/dg/lambda.html"
},
{
"text": "Amazon Textract",
"link": "https://docs.aws.amazon.com/textract/latest/dg/what-is.html"
}
]
},
"deploy": {
"text": [
"sam deploy"
]
},
"testing": {
"text": [
"See the GitHub repo for detailed testing instructions."
]
},
"cleanup": {
"text": [
"Delete the stack: <code>sam delete</code>."
]
},
"authors": [
{
"name": "Abilashkumar P C",
"image": "https://drive.google.com/file/d/1bxOh_WBw8J_xEqvT-qRezH8WXqSBPI24/view?usp=sharing",
"bio": "Sr. Cloud Support Engineer @ AWS",
"linkedin": "abilashkumar-p-c"
}
]
}

54 changes: 54 additions & 0 deletions s3-eventbridge-lambda-textract-node/src/index.mjs
@@ -0,0 +1,54 @@
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { TextractClient, DetectDocumentTextCommand } from "@aws-sdk/client-textract";

const s3Client = new S3Client();
const textractClient = new TextractClient();

export const handler = async (event, context) => {
  // Log the incoming event for debugging
  console.log(JSON.stringify(event));

  // Extract bucket and key from the EventBridge event. S3 events delivered
  // through EventBridge carry these under `detail`, not under `Records`.
  const bucket = event.detail.bucket.name;
  const key = event.detail.object.key;
  console.log(bucket);
  console.log(key);
  try {
    // Call Textract to detect document text
    const detectParams = {
      Document: {
        S3Object: {
          Bucket: bucket,
          Name: key
        }
      }
    };
    const detectCommand = new DetectDocumentTextCommand(detectParams);
    const response = await textractClient.send(detectCommand);
    console.log(response);

    // Prepare the output key: prefix the input key and swap its extension
    // for .json (keys without an extension just get .json appended)
    let outputKey = `textract-output-${key}`;
    const dotIndex = outputKey.lastIndexOf('.');
    outputKey = (dotIndex === -1 ? outputKey : outputKey.substring(0, dotIndex)) + '.json';
    console.log(outputKey);

    // Write the Textract output to the output bucket
    const putParams = {
      Bucket: process.env.OUTPUT_BUCKET,
      Key: outputKey,
      Body: JSON.stringify(response)
    };
    const putCommand = new PutObjectCommand(putParams);
    await s3Client.send(putCommand);

    return {
      statusCode: 200,
      body: JSON.stringify('Document processed successfully')
    };
  } catch (error) {
    console.error('Error:', error);
    return {
      statusCode: 500,
      body: JSON.stringify('Error processing document')
    };
  }
};
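
The JSON written to the output bucket is the raw DetectDocumentText response, in which detected text appears as `Blocks` entries of type `PAGE`, `LINE`, and `WORD`. As an illustration of how a consumer might read that file, the sketch below collects the text of the `LINE` blocks. The helper name `extractLines` and the sample response are hypothetical, and the sample is heavily trimmed.

```javascript
// Hypothetical helper: return the text of every LINE block in a
// DetectDocumentText response, in the order Textract returned them.
function extractLines(response) {
  const blocks = Array.isArray(response.Blocks) ? response.Blocks : [];
  return blocks
    .filter((block) => block.BlockType === "LINE")
    .map((block) => block.Text);
}

// Trimmed sample response; real blocks also carry geometry, ids, and confidence.
const sampleResponse = {
  Blocks: [
    { BlockType: "PAGE" },
    { BlockType: "LINE", Text: "Invoice 1234" },
    { BlockType: "WORD", Text: "Invoice" },
    { BlockType: "WORD", Text: "1234" },
    { BlockType: "LINE", Text: "Total: 42.00" }
  ]
};

console.log(extractLines(sampleResponse).join("\n"));
```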
84 changes: 84 additions & 0 deletions s3-eventbridge-lambda-textract-node/template.yaml
@@ -0,0 +1,84 @@
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: 'SAM template for S3 trigger to Lambda for Textract document detection with EventBridge using NodeJs'

Resources:
  # Input S3 bucket
  InputBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${AWS::StackName}-input-bucket-${AWS::AccountId}'
      NotificationConfiguration:
        EventBridgeConfiguration:
          EventBridgeEnabled: true

  # Output S3 bucket
  OutputBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${AWS::StackName}-output-bucket-${AWS::AccountId}'

  # Lambda function
  TextractFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub '${AWS::StackName}-textract-function'
      Handler: index.handler
      Runtime: nodejs18.x
      Timeout: 60
      Environment:
        Variables:
          OUTPUT_BUCKET: !Ref OutputBucket
      Policies:
        - S3ReadPolicy:
            BucketName: !Ref InputBucket
        - S3WritePolicy:
            BucketName: !Ref OutputBucket
        - Statement:
            - Effect: Allow
              Action:
                - textract:DetectDocumentText
              Resource: '*'
      CodeUri: src/

  # EventBridge Rule
  S3ObjectCreatedRule:
    Type: AWS::Events::Rule
    Properties:
      Description: "Rule to capture S3 object created events"
      EventPattern:
        source:
          - aws.s3
        detail-type:
          - Object Created
        detail:
          bucket:
            name:
              - !Ref InputBucket
      State: "ENABLED"
      Targets:
        - Arn: !GetAtt TextractFunction.Arn
          Id: "TextractFunctionTarget"

  # Permission for EventBridge to invoke Lambda
  TextractFunctionPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref TextractFunction
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt S3ObjectCreatedRule.Arn

Outputs:
  InputBucketName:
    Description: 'Name of the input S3 bucket'
    Value: !Ref InputBucket
  OutputBucketName:
    Description: 'Name of the output S3 bucket'
    Value: !Ref OutputBucket
  TextractFunctionName:
    Description: 'Name of the Textract Lambda function'
    Value: !Ref TextractFunction
  TextractFunctionArn:
    Description: 'ARN of the Textract Lambda function'
    Value: !GetAtt TextractFunction.Arn
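
To illustrate which events the `S3ObjectCreatedRule` pattern is meant to select, here is a deliberately simplified matcher: every field in the pattern must be present in the event, and each leaf array lists the accepted values. This is only a sketch of exact-value matching, not the EventBridge matching implementation. `matchesPattern` and the bucket names are hypothetical.

```javascript
// Simplified, hypothetical matcher: arrays in the pattern list accepted
// values, nested objects are matched recursively, and event fields that
// the pattern does not mention are ignored.
function matchesPattern(pattern, event) {
  return Object.entries(pattern).every(([field, expected]) => {
    const actual = event == null ? undefined : event[field];
    if (Array.isArray(expected)) {
      return expected.includes(actual);
    }
    return typeof actual === "object" && actual !== null && matchesPattern(expected, actual);
  });
}

// The rule's pattern with a placeholder value substituted for !Ref InputBucket.
const rulePattern = {
  source: ["aws.s3"],
  "detail-type": ["Object Created"],
  detail: { bucket: { name: ["my-stack-input-bucket-123456789012"] } }
};

const uploadToInput = {
  source: "aws.s3",
  "detail-type": "Object Created",
  detail: {
    bucket: { name: "my-stack-input-bucket-123456789012" },
    object: { key: "your-document.pdf" }
  }
};

const uploadToOutput = {
  source: "aws.s3",
  "detail-type": "Object Created",
  detail: {
    bucket: { name: "my-stack-output-bucket-123456789012" },
    object: { key: "textract-output-your-document.json" }
  }
};

console.log(matchesPattern(rulePattern, uploadToInput));  // true
console.log(matchesPattern(rulePattern, uploadToOutput)); // false
```

Scoping the pattern to the input bucket name means the JSON that the function writes to the output bucket would not match the rule, so the function is not invoked again by its own output.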