This pattern demonstrates how to invoke an Amazon ECS task from AWS Lambda Durable Functions using Python, showcasing resilient multi-step workflows with automatic checkpointing and state management.
Lambda Durable Functions enable you to build resilient applications that can execute for up to one year while maintaining reliable progress despite interruptions. This pattern shows two integration approaches: synchronous (polling with durable waits) and callback (async with durable steps).
Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns
Important: This application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details.
Key features of Lambda Durable Functions include:
- Automatic Checkpointing: Each step is automatically checkpointed, so your function can resume from the last completed step after interruptions
- Cost-Effective Waits: During wait operations, your function suspends without incurring compute charges
- Built-in Retries: Steps have automatic retry logic with progress tracking
- Deterministic Replay: When resuming, completed steps use stored results instead of re-executing
This pattern uses the AWS Durable Execution SDK for Python to implement these capabilities.
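To make checkpointing and deterministic replay concrete, the sketch below models them with a toy context class. It is a conceptual illustration only: `ToyDurableContext`, its `step` method, and the in-memory checkpoint dictionary are hypothetical names and do not reflect the actual API or storage of the AWS Durable Execution SDK.

```python
import json
from typing import Any, Callable, Dict


class ToyDurableContext:
    """Conceptual stand-in for a durable execution context (NOT the real SDK)."""

    def __init__(self, checkpoints: Dict[str, str]):
        # Maps step name -> JSON-serialized result of a completed step.
        self.checkpoints = checkpoints

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # Replay: a completed step returns its stored result without re-running.
        if name in self.checkpoints:
            return json.loads(self.checkpoints[name])
        result = fn()
        # Checkpoint: persist the result before moving on to the next step.
        self.checkpoints[name] = json.dumps(result)
        return result


calls = []


def start_task() -> Dict[str, str]:
    # Side effect we must not repeat on replay (placeholder ARN).
    calls.append("start_task")
    return {"taskArn": "arn:aws:ecs:us-east-1:123456789012:task/demo/abc"}


def workflow(ctx: ToyDurableContext) -> Dict[str, str]:
    task = ctx.step("start_task", start_task)
    return ctx.step("summarize", lambda: {"started": task["taskArn"]})


store: Dict[str, str] = {}
first = workflow(ToyDurableContext(store))
# Simulate an interruption followed by a replay against the same checkpoints.
second = workflow(ToyDurableContext(store))
print(first == second, calls)
```

The second run returns the same result, but `start_task` is called only once because its result comes from the checkpoint store.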
This pattern is designed for learning and demonstration purposes. The IAM roles and security group use permissive configurations to simplify deployment and focus on the integration patterns:
- Security Group: Allows all outbound traffic (required for pulling Docker images and calling AWS APIs)
- IAM Roles: Use wildcard (*) resources for ECS task management
For production use, you should:
- Restrict security group egress to specific AWS service endpoints using VPC endpoints
- Scope IAM policies to specific resources (task definitions, DynamoDB tables)
- Implement least privilege access based on your security requirements
- Consider using AWS PrivateLink for service-to-service communication
- Enable VPC Flow Logs for network traffic monitoring
- Package the AWS SDK in your Lambda deployment package (13-14MB) instead of relying on the Lambda-provided runtime SDK
- Include the Durable Execution SDK in your deployment package for production (included in requirements.txt)
Deploy this pattern in a non-production AWS account or isolated environment for testing.
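As an illustration of scoping IAM policies to specific resources, the following sketch builds a policy document limited to one cluster, one task definition family, and one table. The function name, account ID, cluster, task family, and role name are placeholders chosen for this example. The statements are an assumption about what a tightened policy could look like, not the policy shipped with this pattern.

```python
import json
from typing import Any, Dict, List


def scoped_ecs_policy(region: str, account_id: str, cluster: str,
                      task_family: str, table: str,
                      task_role_names: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document without wildcard-only resources."""
    ecs = f"arn:aws:ecs:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Start only this task definition family, on this cluster.
                "Effect": "Allow",
                "Action": "ecs:RunTask",
                "Resource": f"{ecs}:task-definition/{task_family}:*",
                "Condition": {"ArnEquals": {"ecs:cluster": f"{ecs}:cluster/{cluster}"}},
            },
            {   # Describe only tasks that belong to this cluster.
                "Effect": "Allow",
                "Action": "ecs:DescribeTasks",
                "Resource": f"{ecs}:task/{cluster}/*",
            },
            {   # RunTask needs to pass the task and execution roles to ECS.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": [f"arn:aws:iam::{account_id}:role/{name}"
                             for name in task_role_names],
                "Condition": {"StringEquals": {"iam:PassedToService": "ecs-tasks.amazonaws.com"}},
            },
            {   # Callback pattern: item-level access to the tracking table only.
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:GetItem"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}",
            },
        ],
    }


# Placeholder values; substitute the names from your own deployment.
policy = scoped_ecs_policy("us-east-1", "123456789012", "demo-cluster", "demo-task",
                           "lambda-ecs-durable-demo-callbacks", ["demo-task-role"])
print(json.dumps(policy, indent=2))
```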
- Create an AWS account if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
- AWS CLI installed and configured
- Git installed
- AWS Serverless Application Model (AWS SAM) installed
┌─────────────────────┐      ┌──────────────────┐      ┌─────────────┐
│  Lambda Durable     │      │   ECS Task       │      │ CloudWatch  │
│  Function (Sync)    │─────▶│   (Python)       │─────▶│    Logs     │
│                     │      │                  │      │             │
└─────────────────────┘      └──────────────────┘      └─────────────┘
         │                               │
         │   Durable Wait (no charges)   │
         └───────────────────────────────┘
             Polls with checkpointing
How it works:
- Lambda durable function invokes the ECS task using ecs:RunTask (checkpointed step)
- Function uses context.wait() to pause without compute charges
- After each wait, function checks task status using ecs:DescribeTasks (checkpointed step)
- If interrupted, function automatically resumes from last checkpoint
- Once complete, Lambda returns the result
- Can run for up to 1 year (vs 15 minutes for standard Lambda)
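The polling loop described above can be sketched as follows. This is a simplified model, not the pattern's source code: `poll_until_stopped`, `FakeEcs`, and `FakeContext` are hypothetical names, and `ctx.wait(seconds)` stands in for the durable wait, whose real signature in the Durable Execution SDK may differ. The `describe_tasks` call and response fields follow the shape of the boto3 ECS client, so a real client could be passed in place of the fake.

```python
from typing import Any, Dict


def poll_until_stopped(ecs, ctx, cluster: str, task_arn: str,
                       interval_seconds: int = 5,
                       max_checks: int = 120) -> Dict[str, Any]:
    """Check task status between waits until ECS reports STOPPED."""
    for attempt in range(max_checks):
        response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
        task = response["tasks"][0]
        if task["lastStatus"] == "STOPPED":
            exit_code = task["containers"][0].get("exitCode")
            return {"status": "STOPPED", "exitCode": exit_code, "checks": attempt + 1}
        # In the durable function, this is where execution would suspend.
        ctx.wait(interval_seconds)
    raise TimeoutError(f"Task {task_arn} did not stop after {max_checks} checks")


class FakeEcs:
    """Test double that walks through the ECS task lifecycle."""

    def __init__(self):
        self.statuses = iter(["PROVISIONING", "PENDING", "RUNNING", "STOPPED"])

    def describe_tasks(self, cluster, tasks):
        status = next(self.statuses)
        container = {"exitCode": 0} if status == "STOPPED" else {}
        return {"tasks": [{"taskArn": tasks[0], "lastStatus": status,
                           "containers": [container]}]}


class FakeContext:
    """Records waits instead of suspending (assumed interface)."""

    def __init__(self):
        self.waited = []

    def wait(self, seconds):
        self.waited.append(seconds)


ctx = FakeContext()
result = poll_until_stopped(
    FakeEcs(), ctx, "demo-cluster",
    "arn:aws:ecs:us-east-1:123456789012:task/demo-cluster/abc")
print(result, ctx.waited)
```

The `max_checks` bound is a design choice for this sketch so that a task that never stops raises an error instead of polling forever.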
Key Durable Features:
- @durable_execution decorator enables durable execution
- @durable_step decorator marks functions as checkpointed steps
- context.wait() suspends execution without charges
- Automatic replay and recovery from failures
Use cases:
- Long-running tasks (hours to days)
- Tasks requiring reliable progress tracking
- Workflows that need automatic recovery
- Cost-sensitive polling operations
Advantages over standard Lambda:
- No 15-minute timeout limitation
- Pay only for active execution time (not wait time)
- Automatic checkpointing and recovery
- Built-in retry logic
┌─────────────────────┐      ┌──────────────────┐      ┌─────────────┐
│  Lambda Durable     │      │   ECS Task       │      │ CloudWatch  │
│  Function (Callback)│─────▶│   (Python)       │─────▶│    Logs     │
│                     │      │                  │      │             │
└─────────────────────┘      └──────────────────┘      └─────────────┘
         │                            │                          │
         │  Checkpointed Steps        │                          │
         │                            ▼                          │
         │                      ┌─────────────────┐              │
         └──────────────────────│    DynamoDB     │◄─────────────┘
                                │      Table      │
                                └─────────────────┘
How it works:
- Lambda durable function creates DynamoDB record (checkpointed step)
- Lambda invokes the ECS task using ecs:RunTask (checkpointed step)
- Lambda updates DynamoDB with task ARN (checkpointed step)
- Lambda returns immediately (async pattern)
- The Python application in ECS processes the work
- When done, the ECS task updates DynamoDB with the result
- If any step fails, automatic retry with checkpoint recovery
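On the ECS side, the final update to DynamoDB could look like the sketch below. It only builds the arguments for a DynamoDB `UpdateItem` call. The `executionId` key and the `status` and `result` fields come from this README, while the function name, the JSON-string encoding of the result, and the condition expression are assumptions made for illustration.

```python
import json
from typing import Any, Dict


def build_completion_update(table_name: str, execution_id: str,
                            result: Dict[str, Any]) -> Dict[str, Any]:
    """Build UpdateItem arguments that mark one execution as COMPLETED."""
    return {
        "TableName": table_name,
        "Key": {"executionId": {"S": execution_id}},
        # Attribute names are aliased to avoid clashes with DynamoDB reserved words.
        "UpdateExpression": "SET #s = :status, #r = :result",
        "ExpressionAttributeNames": {"#s": "status", "#r": "result"},
        "ExpressionAttributeValues": {
            ":status": {"S": "COMPLETED"},
            # Assumption: the result is stored as a JSON string.
            ":result": {"S": json.dumps(result)},
        },
        # Only update a record that the Lambda function already created.
        "ConditionExpression": "attribute_exists(executionId)",
    }


update_kwargs = build_completion_update(
    "lambda-ecs-durable-demo-callbacks", "YOUR-EXECUTION-ID", {"exitCode": 0})
print(json.dumps(update_kwargs, indent=2))
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("dynamodb").update_item(**update_kwargs)`.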
Key Durable Features:
- Each step is automatically checkpointed
- If interrupted, function resumes from last completed step
- No re-execution of completed steps
- Reliable task initiation guaranteed
Use cases:
- Fire-and-forget workflows
- Asynchronous processing
- When you don't need immediate results
- Decoupling task execution from API responses
- Workflows requiring guaranteed task initiation
Advantages:
- Reliable task initiation with automatic recovery
- Minimal Lambda execution time
- Each step is independently retryable
- No risk of duplicate task creation (idempotent)
- Python 3.13 or 3.14 runtime support for Lambda Durable Functions
- AWS SAM CLI version that supports DurableConfig and container images
- Docker installed (for building Lambda container images)
git clone https://github.com/aws-samples/serverless-patterns
cd serverless-patterns/lambda-ecs-python-sam
This pattern uses Lambda container images with Python 3.13 to support durable functions. The build process will:
- Build Docker images with the Durable Execution SDK
- Create ECR repositories automatically
- Push images to ECR
- Deploy Lambda functions using the container images
sam build
sam deploy --guided
During the prompts:
- Stack Name: lambda-ecs-durable-demo (or your preferred name)
- AWS Region: Your preferred region (e.g., us-east-1)
- Parameter VpcCIDR: Press Enter to use default (10.0.0.0/16)
- Confirm changes before deploy: Y
- Allow SAM CLI IAM role creation: Y
- Disable rollback: N
- SyncLambdaFunction has no authorization defined: Y
- CallbackLambdaFunction has no authorization defined: Y
- Create managed ECR repositories for all functions: Y (required for container images)
- Save arguments to samconfig.toml: Y
The deployment takes 5-10 minutes because it creates a VPC, an ECS cluster, Lambda functions, and other resources.
After deployment, note the following outputs:
- SyncLambdaFunctionArn - ARN for the synchronous pattern Lambda
- CallbackLambdaFunctionArn - ARN for the callback pattern Lambda
- CallbackTableName - DynamoDB table for callback tracking
- ECSClusterName - Name of the ECS cluster
- LogGroupName - CloudWatch log group for ECS tasks
Important: When invoking durable functions, you must use a qualified ARN (append :$LATEST to the function name).
- Invoke the durable function asynchronously:
Lambda Durable Functions with execution timeout > 15 minutes must be invoked asynchronously. Use the --invocation-type Event flag and a qualified ARN (with :$LATEST):
aws lambda invoke \
--function-name lambda-ecs-durable-demo-sync-function:\$LATEST \
--invocation-type Event \
--cli-binary-format raw-in-base64-out \
--payload '{"message": "Hello from durable sync pattern", "processingTime": 10}' \
response.json
Note: The \$LATEST qualifier is required for durable functions. The backslash escapes the dollar sign in bash.
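The same asynchronous invocation can be expressed in Python. The sketch below only builds the arguments for the Lambda `Invoke` API so that it runs without AWS credentials. `build_invoke_kwargs` is a hypothetical helper name, and the function name assumes the default stack name used above.

```python
import json
from typing import Any, Dict


def build_invoke_kwargs(function_name: str, payload: Dict[str, Any],
                        qualifier: str = "$LATEST") -> Dict[str, Any]:
    """Build Invoke arguments for an async call to a qualified function name."""
    return {
        # No shell escaping is needed for the dollar sign in Python.
        "FunctionName": f"{function_name}:{qualifier}",
        # "Event" requests asynchronous invocation.
        "InvocationType": "Event",
        "Payload": json.dumps(payload).encode("utf-8"),
    }


kwargs = build_invoke_kwargs(
    "lambda-ecs-durable-demo-sync-function",
    {"message": "Hello from durable sync pattern", "processingTime": 10},
)
print(kwargs["FunctionName"])
```

With boto3 installed and credentials configured, the call would be `boto3.client("lambda").invoke(**kwargs)`.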
- Monitor the Lambda execution logs:
aws logs tail /aws/lambda/lambda-ecs-durable-demo-sync-function --follow
You'll see:
- Task starting with checkpointed step
- Durable waits (no compute charges during waits)
- Status checks every 5 seconds (PROVISIONING → PENDING → RUNNING → STOPPED)
- Each check is a separate checkpointed operation
- Final result when task completes
- View ECS task logs:
aws logs tail /ecs/lambda-ecs-durable-demo --follow
- View execution in Lambda console:
Navigate to the Lambda console → Your function → "Monitoring" tab → "Logs" to see the execution timeline and checkpoints.
- Invoke the durable function asynchronously:
aws lambda invoke \
--function-name lambda-ecs-durable-demo-callback-function:\$LATEST \
--invocation-type Event \
--cli-binary-format raw-in-base64-out \
--payload '{"message": "Hello from durable callback pattern", "processingTime": 30}' \
response.json
- Monitor the Lambda execution logs:
aws logs tail /aws/lambda/lambda-ecs-durable-demo-callback-function --follow
You'll see:
- DynamoDB record creation (checkpointed)
- ECS task initiation (checkpointed)
- Function returns immediately
- Check the status in DynamoDB:
# Scan the table to see all executions
aws dynamodb scan --table-name lambda-ecs-durable-demo-callbacks
# Or get a specific execution (replace with your execution ID from logs)
aws dynamodb get-item \
--table-name lambda-ecs-durable-demo-callbacks \
--key '{"executionId": {"S": "YOUR-EXECUTION-ID"}}'
- Monitor ECS task logs:
aws logs tail /ecs/lambda-ecs-durable-demo --follow
The ECS task will update DynamoDB when processing is complete. You'll see the result in the result field with status COMPLETED.
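The get-item command returns attributes in DynamoDB's typed JSON form (for example `{"S": "COMPLETED"}`). If you want to check completion from a script, a small converter like the one below may help. `unwrap` and `summarize` are hypothetical helpers, the sample item is made up for illustration, and only a subset of DynamoDB attribute types is handled.

```python
from typing import Any, Dict


def unwrap(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB attribute value to a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # Numbers arrive as strings; prefer int when the text allows it.
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: unwrap(item) for key, item in raw.items()}
    if type_tag == "L":
        return [unwrap(item) for item in raw]
    raise ValueError(f"Unsupported attribute type: {type_tag}")


def summarize(get_item_output: Dict[str, Any]) -> Dict[str, Any]:
    """Report whether the execution record exists and is COMPLETED."""
    item = get_item_output.get("Item")
    if item is None:
        return {"found": False}
    plain = {key: unwrap(value) for key, value in item.items()}
    return {"found": True, "done": plain.get("status") == "COMPLETED", "item": plain}


# Made-up sample in the shape returned by `aws dynamodb get-item`.
sample = {"Item": {"executionId": {"S": "YOUR-EXECUTION-ID"},
                   "status": {"S": "COMPLETED"},
                   "result": {"S": "{\"exitCode\": 0}"}}}
summary = summarize(sample)
print(summary["done"])
```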
| Feature | Synchronous (Durable Polling) | Callback (Durable Async) |
|---|---|---|
| Execution Duration | Up to 1 year | Up to 1 year |
| Checkpointing | Automatic for each step | Automatic for each step |
| Wait Charges | No charges during waits | N/A (returns immediately) |
| Polling | Durable waits between checks | No polling needed |
| Task Awareness | Task doesn't know about Lambda | Task updates DynamoDB |
| Complexity | Moderate (durable steps + waits) | Moderate (durable steps + DynamoDB) |
| Use Case | Long-running tasks needing results | Fire-and-forget workflows |
| Cost | Pay only for active execution | Minimal (quick execution) |
| Result Retrieval | Returned by function | Query DynamoDB |
| Reliability | Automatic recovery from failures | Guaranteed task initiation |
Compared to standard Lambda functions:
- ✅ Extended Duration: Execute for up to 1 year (vs 15 minutes)
- ✅ Cost Optimization: No charges during wait operations
- ✅ Automatic Recovery: Built-in checkpointing and replay
- ✅ Simplified Code: No manual state management needed
- ✅ Reliable Execution: Guaranteed progress despite interruptions
- ✅ Built-in Retries: Automatic retry logic for steps
To delete the resources:
sam delete
- AWS Lambda Durable Functions
- Durable Execution SDK
- AWS Lambda
- Amazon ECS
- Amazon DynamoDB
- ECS RunTask API
Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0