diff --git a/lambda-sqs-best-practices-cdk/README.md b/lambda-sqs-best-practices-cdk/README.md new file mode 100644 index 000000000..09eeae018 --- /dev/null +++ b/lambda-sqs-best-practices-cdk/README.md @@ -0,0 +1,183 @@ +# Lambda SQS Best Practices with AWS CDK + +This pattern demonstrates a production-ready implementation of AWS Lambda processing messages from Amazon SQS using AWS CDK. It serves as a reference architecture for building robust, observable, and maintainable serverless applications, featuring AWS Lambda Powertools integration for enhanced observability through structured logging, custom metrics, and distributed tracing with X-Ray. The pattern implements comprehensive error handling with automatic retries and Dead Letter Queue (DLQ) configuration, along with a detailed CloudWatch Dashboard for operational monitoring. Security is enforced through least privilege IAM roles, while operational excellence is maintained through proper resource configurations and cost optimizations. This enterprise-grade solution includes batch message processing, configurable timeouts, message validation, and a complete monitoring strategy, making it ideal for teams building production serverless applications that require high reliability, observability, and maintainability. + + +Architecture + +Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. + +## Requirements + +* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. +* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured +* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) +* [Node.js 20 or greater](https://nodejs.org/en/download/) installed +* [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) installed + +## Deployment Instructions + +1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: + ``` + git clone https://github.com/aws-samples/serverless-patterns + ``` +1. Change directory to the pattern directory: + ``` + cd serverless-patterns/lambda-sqs-best-practices-cdk + ``` + +1. Install cdk dependencies + ``` + npm install + ``` + +1. Install lambda dependencies + ``` + cd lambda + npm install + ``` + +1. Deploy cdk stack + ``` + cd .. + cdk deploy + + ``` + +Note: If you are using CDK for the first time then bootstrap CDK in your account by using below command: + +``` +cdk bootstrap aws://ACCOUNT-NUMBER-1/REGION-1 + +``` + +## How it works + +This pattern sets up: + +1. An SQS queue with a Dead Letter Queue (DLQ) for failed message handling +2. A Lambda function with: + - AWS Lambda Powertools integration + - Structured logging + - Custom metrics + - X-Ray tracing +3. A CloudWatch Dashboard for operational monitoring +4. Least priviledge permissions implemented on roles and policies +Architecture +[ ensured by implemeting individual inline policies with only required permissions added to role ] + + +The Lambda function: +- Processes messages in batches +- Validates message format +- Simulates downstream API calls with random failures (5% failure rate) +- Demonstrates handling of external service dependencies +- Handles errors gracefully +- Reports metrics and traces +- Uses structured logging + +Failed messages are: +- Logged with error details +- Sent to DLQ after 3 retries +- Monitored via CloudWatch metrics + +## Testing + +The pattern includes a load testing script to verify functionality: + +1. Set the Queue URL environment variable: +``` +export QUEUE_URL=$(aws cloudformation describe-stacks --stack-name LambdaSqsBestPracticesCdkStack --query 'Stacks[0].Outputs[?OutputKey==`QueueUrl`].OutputValue' --output text) + +export AWS_REGION=us-east-1 # or your AWS region +``` + +2. Rum test script +``` +npm run test # 100 messages +npm run test:small # 50 messages +npm run test:medium # 200 messages +npm run test:large # 500 messages + +``` + + +## Monitoring Guide + +Locating Resources + +``` +1. Navigate to AWS CloudFormation Console +2. Select the stack "LambdaSqsBestPracticesCdkStack" +3. Go to the "Resources" tab +4. Here you can find: + - All resources created by the stack + - Direct links to each resource's console + - Resource physical IDs and types + - Current status of each resource +``` + +CloudWatch Logs + +``` +1. Navigate to CloudWatch Console > Log Groups +2. Find /aws/lambda/BatchProcessingLambdaFunction +3. View structured logs with: + * Batch processing information + * Error details +``` + +Example walkthrough on structured logging for a batch : +1. Batch information before starting processing +Architecture + +2. Success information +Architecture + +3. Error information of failure +Architecture + +4. Batch processing info +Architecture + +5. Failed items returned back to queue for reprocessing +Architecture + +6. Failed Item retried [note messageID and time for retry] +Architecture + +7. Additionally, in case of failed retries/poison pill +Architecture + +Message in moved to DLQ +Architecture + +Custom tracing can be used as well to get quick information on batch processing +Architecture + + +Metrics Dashboard + +``` +1. Go to CloudWatch > Dashboards +2. Find the dashboard “SQS-Processing-Dashboard” +3. Monitor: + * Message processing success rate + * Batch size and processing time + * Error rates + * Monitor Queue metrics to understand Source queue depth, processing speed of messages in queue and DLQ message count + * Lambda performance including duration + +``` + + +Architecture + +## Cleanup + +To remove all deployed resources: + +``` +cdk destroy +``` + diff --git a/lambda-sqs-best-practices-cdk/bin/lambda-sqs-best-practices-cdk.js b/lambda-sqs-best-practices-cdk/bin/lambda-sqs-best-practices-cdk.js new file mode 100644 index 000000000..33e726209 --- /dev/null +++ b/lambda-sqs-best-practices-cdk/bin/lambda-sqs-best-practices-cdk.js @@ -0,0 +1,6 @@ +#!/usr/bin/env node +const cdk = require('aws-cdk-lib'); +const { LambdaSqsBestPracticesCdkStack } = require('../lib/lambda-sqs-best-practices-cdk-stack'); + +const app = new cdk.App(); +new LambdaSqsBestPracticesCdkStack(app, 'LambdaSqsBestPracticesCdkStack', {}); \ No newline at end of file diff --git a/lambda-sqs-best-practices-cdk/cdk.json b/lambda-sqs-best-practices-cdk/cdk.json new file mode 100644 index 000000000..b081d10b0 --- /dev/null +++ b/lambda-sqs-best-practices-cdk/cdk.json @@ -0,0 +1,88 @@ +{ + "app": "node bin/lambda-sqs-best-practices-cdk.js", + "watch": { + "include": [ + "**" + ], + "exclude": [ + "README.md", + "cdk*.json", + "jest.config.js", + "package*.json", + "yarn.lock", + "node_modules", + "test" + ] + }, + "context": { + "@aws-cdk/aws-lambda:recognizeLayerVersion": true, + "@aws-cdk/core:checkSecretUsage": true, + "@aws-cdk/core:target-partitions": [ + "aws", + "aws-cn" + ], + "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true, + "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true, + "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true, + "@aws-cdk/aws-iam:minimizePolicies": true, + "@aws-cdk/core:validateSnapshotRemovalPolicy": true, + "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true, + "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true, + "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true, + "@aws-cdk/aws-apigateway:disableCloudWatchRole": true, + "@aws-cdk/core:enablePartitionLiterals": true, + "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true, + "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true, + "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true, + "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true, + "@aws-cdk/aws-route53-patters:useCertificate": true, + "@aws-cdk/customresources:installLatestAwsSdkDefault": false, + "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true, + "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true, + "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true, + "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true, + "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true, + "@aws-cdk/aws-redshift:columnId": true, + "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true, + "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true, + "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true, + "@aws-cdk/aws-kms:aliasNameRef": true, + "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true, + "@aws-cdk/core:includePrefixInUniqueNameGeneration": true, + "@aws-cdk/aws-efs:denyAnonymousAccess": true, + "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true, + "@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true, + "@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true, + "@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true, + "@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true, + "@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true, + "@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true, + "@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true, + "@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true, + "@aws-cdk/aws-codepipeline:defaultPipelineTypeToV2": true, + "@aws-cdk/aws-kms:reduceCrossAccountRegionPolicyScope": true, + "@aws-cdk/aws-eks:nodegroupNameAttribute": true, + "@aws-cdk/aws-ec2:ebsDefaultGp3Volume": true, + "@aws-cdk/aws-ecs:removeDefaultDeploymentAlarm": true, + "@aws-cdk/custom-resources:logApiResponseDataPropertyTrueDefault": false, + "@aws-cdk/aws-s3:keepNotificationInImportedBucket": false, + "@aws-cdk/aws-ecs:enableImdsBlockingDeprecatedFeature": false, + "@aws-cdk/aws-ecs:disableEcsImdsBlocking": true, + "@aws-cdk/aws-ecs:reduceEc2FargateCloudWatchPermissions": true, + "@aws-cdk/aws-dynamodb:resourcePolicyPerReplica": true, + "@aws-cdk/aws-ec2:ec2SumTImeoutEnabled": true, + "@aws-cdk/aws-appsync:appSyncGraphQLAPIScopeLambdaPermission": true, + "@aws-cdk/aws-rds:setCorrectValueForDatabaseInstanceReadReplicaInstanceResourceId": true, + "@aws-cdk/core:cfnIncludeRejectComplexResourceUpdateCreatePolicyIntrinsics": true, + "@aws-cdk/aws-lambda-nodejs:sdkV3ExcludeSmithyPackages": true, + "@aws-cdk/aws-stepfunctions-tasks:fixRunEcsTaskPolicy": true, + "@aws-cdk/aws-ec2:bastionHostUseAmazonLinux2023ByDefault": true, + "@aws-cdk/aws-route53-targets:userPoolDomainNameMethodWithoutCustomResource": true, + "@aws-cdk/aws-elasticloadbalancingV2:albDualstackWithoutPublicIpv4SecurityGroupRulesDefault": true, + "@aws-cdk/aws-iam:oidcRejectUnauthorizedConnections": true, + "@aws-cdk/core:enableAdditionalMetadataCollection": true, + "@aws-cdk/aws-lambda:createNewPoliciesWithAddToRolePolicy": true, + "@aws-cdk/aws-s3:setUniqueReplicationRoleName": true, + "@aws-cdk/aws-events:requireEventBusPolicySid": true + } +} diff --git a/lambda-sqs-best-practices-cdk/example-pattern.json b/lambda-sqs-best-practices-cdk/example-pattern.json new file mode 100644 index 000000000..9b5cbb254 --- /dev/null +++ b/lambda-sqs-best-practices-cdk/example-pattern.json @@ -0,0 +1,98 @@ +{ + "title": "Lambda SQS best practices with Powertools", + "description": "Creates a Lambda function with SQS trigger implementing best practices using AWS Lambda Powertools, including structured logging, metrics, and tracing", + "language": "nodejs", + "level": "200", + "framework": "AWS CDK", + "introBox": { + "headline": "How it works", + "text": [ + "This pattern demonstrates how to implement a Lambda function with SQS trigger using AWS best practices. It includes AWS Lambda Powertools for structured logging, metrics, and tracing, along with proper error handling and dead-letter queue configuration.", + "The pattern implements message validation, batch processing with partial failures, and comprehensive operational monitoring through CloudWatch dashboards.", + "The implementation includes load testing capabilities to verify the system's behavior under different scenarios including invalid messages and error conditions.", + "This pattern deploys an SQS Queue, Dead Letter Queue, Lambda function, CloudWatch Dashboard, and necessary IAM roles and permissions." + ] + }, + "gitHub": { + "template": { + "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/lambda-sqs-best-practices-cdk", + "templateURL": "serverless-patterns/lambda-sqs-best-practices-cdk", + "projectFolder": "lambda-sqs-best-practices-cdk", + "templateFile": "lib/lambda-sqs-best-practices-cdk-stack.js" + } + }, + "resources": { + "bullets": [ + { + "text": "AWS Lambda Powertools for Node.js", + "link": "https://www.npmjs.com/org/aws-lambda-powertools" + }, + { + "text": "AWS Lambda with SQS", + "link": "https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html" + }, + { + "text": "SQS Dead Letter Queues", + "link": "https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html" + }, + { + "text": "Lambda Function Scaling with SQS", + "link": "https://docs.aws.amazon.com/lambda/latest/dg/services-sqs-scaling.html" + } + ] + }, + "deploy": { + "text": [ + "git clone https://github.com/aws-samples/serverless-patterns", + "cd serverless-patterns/lambda-sqs-best-practices-cdk", + "Install cdk dependencies", + "npm install", + "Install lambda dependencies", + "cd lambda", + "npm install", + "cd .." + "cdk deploy" + ] + }, + "testing": { + "text": [ + "After deployment, you can test the pattern using the included load test script:", + "export QUEUE_URL=$(aws cloudformation describe-stacks --stack-name LambdaSqsBestPracticesCdkStack --query 'Stacks[0].Outputs[?OutputKey==`QueueUrl`].OutputValue' --output text)", + "export AWS_REGION=us-east-1 # or your AWS region", + "npm run test # Sends 100 messages", + "npm run test:small # Sends 50 messages", + "npm run test:medium # Sends 200 messages", + "npm run test:large # Sends 500 messages", + "Monitor the results in the CloudWatch Dashboard created by the stack." + ] + }, + "cleanup": { + "text": [ + "Delete the stack: cdk destroy" + ] + }, + "authors": [ + { + "name": "Shubham More", + "image": "https://avatars.githubusercontent.com/u/150242047?s=400&u=aaa2db07529d3e1e82ec59daacb05b6abc7c3a5f&v=4", + "bio": "Cloud Support Engineer II - SVLS", + "linkedin": "shubham-more-1b6aa185" + }, + { + "name": "Umang Aggarwal", + "image": "https://avatars.githubusercontent.com/u/75968399?v=4&size=64", + "bio": "Cloud Support Engineer II - SVLS", + "linkedin": "umangaggarwal" + }, + { + "name": "Vaibhav Jain", + "bio": "AWS - Sr. Application Architect", + "linkedin": "https://www.linkedin.com/in/vaibhavjainv/" + }, + { + "name": "Adam Wagner", + "bio": "AWS - Principal Serverless Solutions Architect", + "linkedin": "https://www.linkedin.com/in/adam-wagner-4bb412/" + } + ] +} diff --git a/lambda-sqs-best-practices-cdk/lambda/index.js b/lambda-sqs-best-practices-cdk/lambda/index.js new file mode 100644 index 000000000..7e1cbe84d --- /dev/null +++ b/lambda-sqs-best-practices-cdk/lambda/index.js @@ -0,0 +1,145 @@ +const { Logger } = require('@aws-lambda-powertools/logger'); +const { Metrics } = require('@aws-lambda-powertools/metrics'); +const { Tracer } = require('@aws-lambda-powertools/tracer'); +const AWS = require('aws-sdk'); + +// Initialize Powertools +const logger = new Logger({ + serviceName: process.env.POWERTOOLS_SERVICE_NAME, + logLevel: 'INFO' +}); + +const metrics = new Metrics({ + namespace: process.env.POWERTOOLS_METRICS_NAMESPACE, + serviceName: process.env.POWERTOOLS_SERVICE_NAME +}); + +const tracer = new Tracer({ + serviceName: process.env.POWERTOOLS_SERVICE_NAME +}); + +// Simulate external API call +const callExternalAPI = async (data) => { + // Simulate API latency + await new Promise(resolve => setTimeout(resolve, 200)); + + // Randomly fail (5% of the time) + if (Math.random() < 0.05) { + throw new Error('External API failed'); + } + + return { success: true, data }; +}; + +// Capture AWS SDK +const awsSDK = tracer.captureAWS(AWS); + +const processMessage = async (message, messageId) => { + try { + logger.info('Processing message', { messageId, message }); + + + // Validate message + if (typeof message !== 'object' || !message.id || !message.data) { + // Simulating poison pill scenario + throw new Error('Invalid message format'); + } + + try { + // Call external API + await callExternalAPI(message.data); + + metrics.addMetric('SuccessfulMessages', 'Count', 1); + logger.info('Message processed successfully', { messageId }); + return true; + } catch (apiError) { + // Handle API failure + logger.error('External API call failed', { + messageId, + error: apiError.message + }); + metrics.addMetric('FailedMessages', 'Count', 1); + return false; + } + } catch (error) { + // Handle other errors + logger.error('Failed to process message', { + messageId, + error: error.message + }); + metrics.addMetric('FailedMessages', 'Count', 1); + return false; + } +}; + +const parseMessage = (body) => { + try { + return JSON.parse(body); + } catch (error) { + throw new Error('Invalid JSON format'); + } +}; + +const handler = async (event) => { + const batchStartTime = Date.now(); + const batchItemFailures = []; + + //Creating subsegment for Batchprocessing annotations + const subsegment = tracer.getSegment().addNewSubsegment('BatchProcessing'); + + try { + subsegment.addAnnotation('batchSize', event.Records.length); + logger.info('Starting batch processing', { + batchSize: event.Records.length + }); + + metrics.addMetric('BatchSize', 'Count', event.Records.length); + + for (const record of event.Records) { + try { + const message = parseMessage(record.body); + const success = await processMessage(message, record.messageId); + + if (!success) { + batchItemFailures.push({ itemIdentifier: record.messageId }); + } + } catch (error) { + logger.error('Record processing failed', { + messageId: record.messageId, + error: error.message + }); + batchItemFailures.push({ itemIdentifier: record.messageId }); + } + } + + const batchProcessingTime = Date.now() - batchStartTime; + + // Final batch processing result + const batch_results = { + total: event.Records.length, + failed: batchItemFailures.length, + succeeded: event.Records.length - batchItemFailures.length + }; + + // Adding batch processing annotations + subsegment.addAnnotation('processedMessages', batch_results.total); + subsegment.addAnnotation('failedMessages', batch_results.failed); + + logger.info('Batch processing completed', batch_results); + metrics.addMetric('BatchProcessingTime', 'Milliseconds', batchProcessingTime); + metrics.publishStoredMetrics(); + + if (batchItemFailures.length > 0) { + logger.info('Batch Items returned on failure :', batchItemFailures); + } + + return { batchItemFailures }; + } catch (error) { + logger.error('Batch processing error', { error: error.message }); + throw error; + } finally { + subsegment.close(); + } +}; + +exports.handler = handler; \ No newline at end of file diff --git a/lambda-sqs-best-practices-cdk/lambda/package.json b/lambda-sqs-best-practices-cdk/lambda/package.json new file mode 100644 index 000000000..559c7f13b --- /dev/null +++ b/lambda-sqs-best-practices-cdk/lambda/package.json @@ -0,0 +1,10 @@ +{ + "name": "lambda-function", + "version": "1.0.0", + "dependencies": { + "@aws-lambda-powertools/logger": "^1.17.0", + "@aws-lambda-powertools/metrics": "^1.17.0", + "@aws-lambda-powertools/tracer": "^1.17.0", + "aws-sdk": "^2.1531.0" + } +} \ No newline at end of file diff --git a/lambda-sqs-best-practices-cdk/lib/lambda-sqs-best-practices-cdk-stack.js b/lambda-sqs-best-practices-cdk/lib/lambda-sqs-best-practices-cdk-stack.js new file mode 100644 index 000000000..10493380f --- /dev/null +++ b/lambda-sqs-best-practices-cdk/lib/lambda-sqs-best-practices-cdk-stack.js @@ -0,0 +1,226 @@ +const { Stack, Duration, CfnOutput } = require('aws-cdk-lib'); +const lambda = require('aws-cdk-lib/aws-lambda'); +const sqs = require('aws-cdk-lib/aws-sqs'); +const lambdaEventSources = require('aws-cdk-lib/aws-lambda-event-sources'); +const cloudwatch = require('aws-cdk-lib/aws-cloudwatch'); +const iam = require('aws-cdk-lib/aws-iam'); +const path = require('path'); + +class LambdaSqsBestPracticesCdkStack extends Stack { + constructor(scope, id, props) { + super(scope, id, props); + + // Create DLQ + const dlq = new sqs.Queue(this, 'BatchProcessingDeadLetterQueue', { + queueName: 'batch-processing-dead-letter-queue', + retentionPeriod: Duration.days(14), + encryption: sqs.QueueEncryption.SQS_MANAGED, + enforceSSL: true + }); + + // Create main queue + const queue = new sqs.Queue(this, 'BatchProcessingSourceQueue', { + queueName: 'batch-processing-source-queue', + visibilityTimeout: Duration.seconds(30), + encryption: sqs.QueueEncryption.SQS_MANAGED, + enforceSSL: true, + deadLetterQueue: { + queue: dlq, + maxReceiveCount: 3 + } + }); + + // Create Lambda function + const mainLambdaFunction = new lambda.Function(this, 'BatchProcessingLambdaFunction', { + functionName: 'BatchProcessingLambdaFunction', + runtime: lambda.Runtime.NODEJS_20_X, + handler: 'index.handler', + code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')), + environment: { + POWERTOOLS_SERVICE_NAME: 'sqs-processor', + POWERTOOLS_METRICS_NAMESPACE: 'SQSProcessor', + LOG_LEVEL: 'INFO', + ENVIRONMENT: props?.environment || 'development', + POWERTOOLS_TRACER_CAPTURE_RESPONSE: 'true', + POWERTOOLS_TRACER_CAPTURE_ERROR: 'true', + AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1' + }, + timeout: Duration.seconds(5), + memorySize: 256, + tracing: lambda.Tracing.ACTIVE + }); + + // Add X-Ray permissions + mainLambdaFunction.addToRolePolicy(new iam.PolicyStatement({ + effect: iam.Effect.ALLOW, + actions: [ + 'xray:PutTraceSegments', + 'xray:PutTelemetryRecords' + ], + resources: ['*'] + })); + + // Add CloudWatch Metrics permissions + mainLambdaFunction.addToRolePolicy(new iam.PolicyStatement({ + effect: iam.Effect.ALLOW, + actions: [ + 'cloudwatch:PutMetricData' + ], + resources: ['*'] + })); + + // Add SQS trigger + mainLambdaFunction.addEventSource( + new lambdaEventSources.SqsEventSource(queue, { + batchSize: 10, + reportBatchItemFailures: true, + maxConcurrency: 1000 + }) + ); + + // Grant permissions + queue.grantConsumeMessages(mainLambdaFunction); + dlq.grantConsumeMessages(mainLambdaFunction); + + // Create CloudWatch Dashboard + const dashboard = new cloudwatch.Dashboard(this, 'ProcessingDashboard', { + dashboardName: 'SQS-Processing-Dashboard' + }); + + // Message Success/Failure Widget + const messageProcessingWidget = new cloudwatch.GraphWidget({ + title: 'Message Processing Success/Failure', + left: [ + new cloudwatch.Metric({ + namespace: 'SQSProcessor', + metricName: 'SuccessfulMessages', + dimensionsMap: { + service: 'sqs-processor' + }, + statistic: 'sum', + period: Duration.minutes(1) + }), + new cloudwatch.Metric({ + namespace: 'SQSProcessor', + metricName: 'FailedMessages', + dimensionsMap: { + service: 'sqs-processor' + }, + statistic: 'sum', + period: Duration.minutes(1) + }) + ], + width: 12 + }); + + // Batch Processing Widget + const batchProcessingWidget = new cloudwatch.GraphWidget({ + title: 'Batch Processing', + left: [ + new cloudwatch.Metric({ + namespace: 'SQSProcessor', + metricName: 'BatchSize', + dimensionsMap: { + service: 'sqs-processor' + }, + statistic: 'sum', + period: Duration.minutes(1) + }), + new cloudwatch.Metric({ + namespace: 'SQSProcessor', + metricName: 'BatchProcessingTime', + dimensionsMap: { + service: 'sqs-processor' + }, + statistic: 'average', + period: Duration.minutes(1) + }) + ], + width: 12 + }); + + // Queue Metrics Widget + const queueMetricsWidget = new cloudwatch.GraphWidget({ + title: 'Queue Metrics', + left: [ + queue.metricApproximateNumberOfMessagesVisible({ + period: Duration.minutes(1), + statistic: 'sum' + }), + queue.metricApproximateAgeOfOldestMessage({ + period: Duration.minutes(1), + statistic: 'maximum' + }), + dlq.metricApproximateNumberOfMessagesVisible({ + period: Duration.minutes(1), + statistic: 'sum' + }) + ], + width: 12 + }); + + // Lambda Performance Widget + const lambdaPerformanceWidget = new cloudwatch.GraphWidget({ + title: 'Lambda Performance', + left: [ + mainLambdaFunction.metricInvocations({ + period: Duration.minutes(1), + statistic: 'sum' + }), + mainLambdaFunction.metricDuration({ + period: Duration.minutes(1), + statistic: 'average' + }), + mainLambdaFunction.metricErrors({ + period: Duration.minutes(1), + statistic: 'sum' + }) + ], + width: 12 + }); + + // Add all widgets to dashboard + dashboard.addWidgets( + messageProcessingWidget, + batchProcessingWidget, + queueMetricsWidget, + lambdaPerformanceWidget + ); + + // Create CloudWatch Alarms + // DLQ Messages Alarm + new cloudwatch.Alarm(this, 'DLQMessagesPresent', { + metric: dlq.metricApproximateNumberOfMessagesVisible(), + threshold: 1, + evaluationPeriods: 1, + alarmDescription: 'Messages present in DLQ' + }); + + // Stack Outputs + new CfnOutput(this, 'QueueUrl', { + value: queue.queueUrl, + description: 'Main SQS Queue URL', + exportName: 'MainQueueUrl' + }); + + new CfnOutput(this, 'DlqUrl', { + value: dlq.queueUrl, + description: 'Dead Letter Queue URL', + exportName: 'DlqUrl' + }); + + new CfnOutput(this, 'LambdaFunction', { + value: mainLambdaFunction.functionName, + description: 'Lambda Function Name', + exportName: 'LambdaFunctionName' + }); + + new CfnOutput(this, 'DashboardUrl', { + value: `https://${this.region}.console.aws.amazon.com/cloudwatch/home?region=${this.region}#dashboards:name=${dashboard.dashboardName}`, + description: 'CloudWatch Dashboard URL', + exportName: 'DashboardUrl' + }); + } +} + +module.exports = { LambdaSqsBestPracticesCdkStack } \ No newline at end of file diff --git a/lambda-sqs-best-practices-cdk/package.json b/lambda-sqs-best-practices-cdk/package.json new file mode 100644 index 000000000..2fa3437d0 --- /dev/null +++ b/lambda-sqs-best-practices-cdk/package.json @@ -0,0 +1,30 @@ +{ + "name": "lambda-sqs-best-practices-cdk", + "version": "1.0.0", + "bin": { + "lambda-sqs-best-practices-cdk": "bin/lambda-sqs-best-practices-cdk.js" + }, + "scripts": { + "build": "echo \"The build step is not required when using JavaScript!\" && exit 0", + "cdk": "cdk", + "test": "node scripts/load-test.js", + "test:small": "MESSAGE_COUNT=50 node scripts/load-test.js", + "test:medium": "MESSAGE_COUNT=200 node scripts/load-test.js", + "test:large": "MESSAGE_COUNT=500 node scripts/load-test.js", + "deploy": "cdk deploy", + "destroy": "cdk destroy" + }, + "devDependencies": { + "aws-cdk": "^2.110.0", + "jest": "^29.7.0" + }, + "dependencies": { + "aws-cdk-lib": "^2.110.0", + "aws-sdk": "^2.1490.0", + "constructs": "^10.3.0", + "@aws-sdk/client-sqs": "^3.x" + }, + "engines": { + "node": ">=20.0.0" + } +} \ No newline at end of file diff --git a/lambda-sqs-best-practices-cdk/resources/Batch-processing-info.png b/lambda-sqs-best-practices-cdk/resources/Batch-processing-info.png new file mode 100644 index 000000000..911aa8359 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Batch-processing-info.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/Error-info.png b/lambda-sqs-best-practices-cdk/resources/Error-info.png new file mode 100644 index 000000000..f39a715c4 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Error-info.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/Failed-item-retry.png b/lambda-sqs-best-practices-cdk/resources/Failed-item-retry.png new file mode 100644 index 000000000..b713f2e16 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Failed-item-retry.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/Failed-items.png b/lambda-sqs-best-practices-cdk/resources/Failed-items.png new file mode 100644 index 000000000..06e5ae9cd Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Failed-items.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/Lambda-SQS-Best-Practice.png b/lambda-sqs-best-practices-cdk/resources/Lambda-SQS-Best-Practice.png new file mode 100644 index 000000000..c9b6bf22f Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Lambda-SQS-Best-Practice.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/Least-priviledge.png b/lambda-sqs-best-practices-cdk/resources/Least-priviledge.png new file mode 100644 index 000000000..e5b4c3705 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Least-priviledge.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/SQS_operational_dashboard.png b/lambda-sqs-best-practices-cdk/resources/SQS_operational_dashboard.png new file mode 100644 index 000000000..431911934 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/SQS_operational_dashboard.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/Success-info.png b/lambda-sqs-best-practices-cdk/resources/Success-info.png new file mode 100644 index 000000000..049dfe397 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/Success-info.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/batch-info.png b/lambda-sqs-best-practices-cdk/resources/batch-info.png new file mode 100644 index 000000000..a500a1b20 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/batch-info.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/message-in-DLQ.png b/lambda-sqs-best-practices-cdk/resources/message-in-DLQ.png new file mode 100644 index 000000000..24f5bbe30 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/message-in-DLQ.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/poison-pill.png b/lambda-sqs-best-practices-cdk/resources/poison-pill.png new file mode 100644 index 000000000..075df37be Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/poison-pill.png differ diff --git a/lambda-sqs-best-practices-cdk/resources/trace-info.png b/lambda-sqs-best-practices-cdk/resources/trace-info.png new file mode 100644 index 000000000..a233315f0 Binary files /dev/null and b/lambda-sqs-best-practices-cdk/resources/trace-info.png differ diff --git a/lambda-sqs-best-practices-cdk/scripts/load-test.js b/lambda-sqs-best-practices-cdk/scripts/load-test.js new file mode 100644 index 000000000..cca6ff59e --- /dev/null +++ b/lambda-sqs-best-practices-cdk/scripts/load-test.js @@ -0,0 +1,69 @@ +const { SQSClient, SendMessageBatchCommand } = require('@aws-sdk/client-sqs'); + +// Initialize SQS client +const sqs = new SQSClient({ + region: process.env.AWS_REGION || 'us-east-1' +}); + +const QUEUE_URL = process.env.QUEUE_URL; +const MESSAGE_COUNT = parseInt(process.env.MESSAGE_COUNT || '100'); +const BATCH_SIZE = 10; + +// Create message with 10% chance of failure +const createMessage = (index) => { + const rand = Math.random(); + + // 5% chance for invalid format + if (rand < 0.05) { + return JSON.stringify("not an object"); + } + + // 5% chance for missing required field + if (rand < 0.10) { + return JSON.stringify({ data: "missing id field" }); + } + + // 90% valid messages + return JSON.stringify({ + id: `msg-${index}`, + data: `test data ${index}` + }); +}; + +async function sendMessages() { + console.log(`Sending ${MESSAGE_COUNT} messages...`); + + for (let i = 0; i < MESSAGE_COUNT; i += BATCH_SIZE) { + const entries = Array.from({ length: Math.min(BATCH_SIZE, MESSAGE_COUNT - i) }, + (_, index) => ({ + Id: (i + index).toString(), + MessageBody: createMessage(i + index) + }) + ); + + try { + const command = new SendMessageBatchCommand({ + QueueUrl: QUEUE_URL, + Entries: entries + }); + + const response = await sqs.send(command); + + if (response.Failed && response.Failed.length > 0) { + console.warn('Some messages failed to send:', response.Failed); + } + } catch (error) { + console.error('Batch send error:', error); + } + } + + console.log('Messages sent'); +} + +if (require.main === module) { + if (!QUEUE_URL) { + console.error('Please set QUEUE_URL environment variable'); + process.exit(1); + } + sendMessages().catch(console.error); +} \ No newline at end of file