New serverless pattern - apigw-lambda-transcribe #2713

Open: wants to merge 7 commits into base: main
94 changes: 94 additions & 0 deletions apigw-lambda-transcribe/README.md
@@ -0,0 +1,94 @@
# Subtitle generation with AWS Lambda and Amazon Transcribe

Using this sample pattern, users can securely upload videos to an Amazon S3 bucket by requesting a pre-signed URL through Amazon API Gateway, which invokes a Lambda function to generate the URL. The pre-signed URL grants temporary access for uploading a file directly to S3.
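
The pre-signed URL step can be sketched as a small Lambda handler. This is a minimal sketch and not the code shipped in `generate_presigned_url.zip`: the `UPLOAD_BUCKET` environment variable, the 300-second expiry, and the `upload_url` response field are assumptions made for illustration. Only the request fields `object_name` and `content_type` come from this README.

```python
import json
import os


def create_upload_url(s3_client, bucket, object_name, content_type, expires_in=300):
    """Return a pre-signed PUT URL for one object (the expiry is an assumed default)."""
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": object_name, "ContentType": content_type},
        ExpiresIn=expires_in,
    )


def lambda_handler(event, context, s3_client=None):
    """Hypothetical API Gateway proxy handler, not the pattern's actual code."""
    try:
        body = json.loads(event.get("body") or "{}")
        object_name = body["object_name"]
        content_type = body["content_type"]
    except (json.JSONDecodeError, KeyError, TypeError):
        error = {"error": "object_name and content_type are required"}
        return {"statusCode": 400, "body": json.dumps(error)}

    if s3_client is None:
        # Imported lazily so the sketch can be exercised without AWS access.
        import boto3
        s3_client = boto3.client("s3")

    # UPLOAD_BUCKET is an assumed environment variable name.
    url = create_upload_url(s3_client, os.environ["UPLOAD_BUCKET"], object_name, content_type)
    return {"statusCode": 200, "body": json.dumps({"upload_url": url})}
```

Because `ContentType` is part of the signed parameters, the later upload has to send the same `Content-Type` header, which is why the Testing section asks for matching values.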

When a video file is uploaded, an S3 event invokes a second Lambda function, which starts a transcription job using the StartTranscriptionJob API. When the transcription completes, the generated subtitles are stored in the output S3 bucket.
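
The S3-triggered step can be sketched as follows. This is an illustration under stated assumptions, not the pattern's actual function: the `OUTPUT_BUCKET` environment variable, the `en-US` language code, the job naming scheme, and the choice of `srt` and `vtt` subtitle formats are all hypothetical. The keyword arguments are parameters of the `StartTranscriptionJob` API.

```python
import re
import urllib.parse
import uuid


def build_transcription_request(record, output_bucket, language_code="en-US"):
    """Build StartTranscriptionJob keyword arguments from one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    # Job names may contain only letters, digits, '.', '_' and '-' (max 200 chars).
    safe_stem = re.sub(r"[^0-9A-Za-z._-]", "_", key)[:150]
    return {
        "TranscriptionJobName": f"{safe_stem}-{uuid.uuid4().hex[:8]}",
        "LanguageCode": language_code,  # assumed; the pattern may configure this differently
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,
        "Subtitles": {"Formats": ["srt", "vtt"]},  # assumed subtitle formats
    }


def lambda_handler(event, context):
    """Hypothetical handler: start one transcription job per uploaded object."""
    # Imported lazily so the request builder can be tested without AWS access.
    import os
    import boto3

    transcribe = boto3.client("transcribe")
    for record in event["Records"]:
        request = build_transcription_request(record, os.environ["OUTPUT_BUCKET"])
        transcribe.start_transcription_job(**request)
```

Keeping the request construction in a pure function makes it possible to check the job name and media URI locally before deploying.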

Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns/apigw-lambda-transcribe

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
* [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli?in=terraform/aws-get-started) installed

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal, and clone the GitHub repository:
    ```
    git clone https://github.com/aws-samples/serverless-patterns
    ```
1. Change directory to the pattern directory:
    ```
    cd serverless-patterns/apigw-lambda-transcribe
    ```
1. From the command line, initialize Terraform to download and install the providers defined in the configuration:
    ```
    terraform init
    ```
1. From the command line, apply the configuration in the main.tf file:
    ```
    terraform apply
    ```
1. When prompted, enter a value for each variable:
    ```
    #var.prefix
    - Enter a value: {enter any prefix to associate with resources}

    #var.region
    - Enter a value: {enter the region for deployment}
    ```
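
After `terraform apply` completes, the output values used during testing (`api_endpoint` and `output_bucket_name`) can also be read programmatically. The sketch below assumes the Terraform CLI is on the `PATH` and that it is run from the pattern directory; the helper names are hypothetical. It relies on `terraform output -json`, which prints each output as an object with a `value` field.

```python
import json
import subprocess


def parse_terraform_outputs(raw_json):
    """Flatten `terraform output -json` into a {name: value} mapping."""
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}


def read_terraform_outputs(workdir="."):
    """Run `terraform output -json` in the given directory and parse the result."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_terraform_outputs(result.stdout)
```

For example, `read_terraform_outputs()["api_endpoint"]` would return the endpoint to use in place of `API_ENDPOINT`.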

## Testing

1. Make a POST request to the API using the following cURL command:

    ```
    curl --location 'API_ENDPOINT' --header 'Content-Type: application/json' --data '{"object_name": "video.mp4", "content_type": "video/mp4"}'
    ```

    Note: Replace `API_ENDPOINT` with the `api_endpoint` value from the Terraform outputs, `object_name` with the desired name for the S3 object, and `content_type` with the content type of the video, for example `video/mp4`.

1. Copy the pre-signed URL returned in the previous step and use the following cURL command to upload the object to S3:

    ```
    curl -v --location -T "video.mp4" 'PRESIGNED_URL' --header 'Content-Type: video/mp4'
    ```

    Note: Replace `PRESIGNED_URL` with the pre-signed URL generated in the previous step. `Content-Type` must match the content type used to generate the pre-signed URL.

    When the upload succeeds, the response is HTTP 200 OK. You can also check the S3 bucket to confirm that the object was uploaded.

1. After the object is uploaded, the `process_s3_event` Lambda function is invoked. The function calls the `StartTranscriptionJob` API, and Amazon Transcribe writes the transcription output to the output S3 bucket (see `output_bucket_name` in the Terraform outputs).
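
The two cURL calls above can also be expressed in Python using only the standard library. This is a sketch with hypothetical helper names. Because this README does not specify the shape of the API response, the sketch returns the raw response body and leaves extracting the pre-signed URL to the caller.

```python
import json
import urllib.request


def build_presign_request(api_endpoint, object_name, content_type):
    """POST request equivalent to the first cURL command."""
    payload = json.dumps({"object_name": object_name, "content_type": content_type})
    return urllib.request.Request(
        api_endpoint,
        data=payload.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_upload_request(presigned_url, file_bytes, content_type):
    """PUT request equivalent to `curl -T`; Content-Type must match the signed value."""
    return urllib.request.Request(
        presigned_url,
        data=file_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def send(request):
    """Send a request and return (status code, response body as text)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

A typical run would call `send(build_presign_request(...))`, read the pre-signed URL from the returned body, and then call `send(build_upload_request(...))` with the file contents.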

## Cleanup

1. Delete the transcription jobs: in the Amazon Transcribe console, go to Transcription jobs, select your transcription jobs, and choose Delete.

1. Change directory to the pattern directory:
    ```
    cd serverless-patterns/apigw-lambda-transcribe
    ```

1. Delete all created resources:
    ```
    terraform destroy
    ```

1. When prompted, enter the same values that you entered during deployment.

1. Confirm that all created resources have been deleted:
    ```
    terraform show
    ```
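
As an alternative to deleting the transcription jobs in the console, the jobs can be removed with a short script. This is a sketch with hypothetical helper names; it expects a Transcribe client such as `boto3.client("transcribe")` to be passed in. Without a `name_contains` filter it deletes every transcription job in the account and Region, so a filter is strongly advised.

```python
def list_job_names(transcribe_client, name_contains=None):
    """Collect transcription job names, following NextToken pagination."""
    names = []
    kwargs = {"JobNameContains": name_contains} if name_contains else {}
    while True:
        page = transcribe_client.list_transcription_jobs(**kwargs)
        names += [s["TranscriptionJobName"] for s in page.get("TranscriptionJobSummaries", [])]
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def delete_transcription_jobs(transcribe_client, name_contains=None):
    """Delete the matching jobs and return the names that were deleted."""
    # Names are collected first so that deletions do not interfere with pagination.
    names = list_job_names(transcribe_client, name_contains)
    for name in names:
        transcribe_client.delete_transcription_job(TranscriptionJobName=name)
    return names
```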
----
Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
Binary file added apigw-lambda-transcribe/Transcribe.png
91 changes: 91 additions & 0 deletions apigw-lambda-transcribe/apigw-lambda-transcribe.json
@@ -0,0 +1,91 @@
{
  "title": "Subtitle generation using Amazon API Gateway and AWS Lambda",
  "description": "This pattern creates an AWS Lambda function that invokes Amazon Transcribe for speech-to-text conversion and stores the results in Amazon S3",
  "language": "Python",
  "level": "200",
  "framework": "Terraform",
  "introBox": {
    "headline": "How it works",
    "text": [
      "This sample pattern is an automated serverless solution for subtitle generation using AWS services. This system securely handles video file uploads via pre-signed URLs, automatically triggers Amazon Transcribe for speech-to-text conversion, and stores results in S3."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/apigw-lambda-transcribe",
      "templateURL": "serverless-patterns/apigw-lambda-transcribe",
      "projectFolder": "apigw-lambda-transcribe",
      "templateFile": "main.tf"
    }
  },
  "resources": {
    "bullets": [
      {
        "text": "Uploading objects with presigned URLs",
        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html"
      },
      {
        "text": "StartTranscriptionJob",
        "link": "https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartTranscriptionJob.html"
      }
    ]
  },
  "deploy": {
    "text": ["terraform init", "terraform apply"]
  },
  "testing": {
    "text": ["See the GitHub repo for detailed testing instructions."]
  },
  "cleanup": {
    "text": ["terraform destroy", "terraform show"]
  },
  "authors": [
    {
      "name": "Archana V",
      "image": "https://media.licdn.com/dms/image/v2/D5603AQGhkVtEhllFEw/profile-displayphoto-shrink_400_400/B56ZZH3LL6H0Ag-/0/1744962369913?e=1750291200&v=beta&t=R0hX6jzWC03OyoWKvYJ0jDDTuPocobPSy0lAJY-3XfA",
      "bio": "Solutions Architect at AWS",
      "linkedin": "archana-venkat-9b80b7184"
    }
  ],
  "patternArch": {
    "icon1": {
      "x": 15,
      "y": 50,
      "service": "s3",
      "label": "Amazon S3"
    },
    "icon2": {
      "x": 40,
      "y": 50,
      "service": "lambda",
      "label": "AWS Lambda"
    },
    "icon3": {
      "x": 65,
      "y": 50,
      "service": "transcribe",
      "label": "Amazon Transcribe"
    },
    "icon4": {
      "x": 90,
      "y": 50,
      "service": "s3",
      "label": "Amazon S3"
    },
    "line1": {
      "from": "icon1",
      "to": "icon2",
      "label": ""
    },
    "line2": {
      "from": "icon2",
      "to": "icon3",
      "label": ""
    },
    "line3": {
      "from": "icon3",
      "to": "icon4",
      "label": ""
    }
  }
}
58 changes: 58 additions & 0 deletions apigw-lambda-transcribe/example-pattern.json
@@ -0,0 +1,58 @@
{
  "title": "Subtitle generation using Amazon API Gateway and AWS Lambda",
  "description": "This pattern creates an AWS Lambda function that invokes Amazon Transcribe for speech-to-text conversion and stores the results in Amazon S3",
  "language": "Python",
  "level": "200",
  "framework": "Terraform",
  "introBox": {
    "headline": "How it works",
    "text": [
      "This sample pattern is an automated serverless solution for subtitle generation using AWS services. This system securely handles video file uploads via pre-signed URLs, automatically triggers Amazon Transcribe for speech-to-text conversion, and stores results in S3."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/apigw-lambda-transcribe",
      "templateURL": "serverless-patterns/apigw-lambda-transcribe",
      "projectFolder": "apigw-lambda-transcribe",
      "templateFile": "main.tf"
    }
  },
  "resources": {
    "bullets": [
      {
        "text": "Uploading objects with presigned URLs",
        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html"
      },
      {
        "text": "StartTranscriptionJob",
        "link": "https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartTranscriptionJob.html"
      }
    ]
  },
  "deploy": {
    "text": [
      "terraform init",
      "terraform apply"
    ]
  },
  "testing": {
    "text": [
      "See the GitHub repo for detailed testing instructions."
    ]
  },
  "cleanup": {
    "text": [
      "terraform destroy",
      "terraform show"
    ]
  },
  "authors": [
    {
      "name": "Archana V",
      "image": "https://media.licdn.com/dms/image/v2/D5603AQGhkVtEhllFEw/profile-displayphoto-shrink_400_400/B56ZZH3LL6H0Ag-/0/1744962369913?e=1750291200&v=beta&t=R0hX6jzWC03OyoWKvYJ0jDDTuPocobPSy0lAJY-3XfA",
      "bio": "Solutions Architect at AWS",
      "linkedin": "archana-venkat-9b80b7184"
    }
  ]
}
Binary file added apigw-lambda-transcribe/generate_presigned_url.zip
Binary file not shown.