Enable Distributed mode for Step Function maps (v2) #1683

Closed
wants to merge 1 commit into from

@bh-perennial commented Jan 12, 2024

This PR introduces a way to utilize the "Distributed" type of Map (vs "Inline") to enable Step Functions to scale to larger sizes (issue: #1216).
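For context, here is a rough sketch of what a Map state in Distributed mode looks like in Amazon States Language, built as a Python dict. The helper name `distributed_map_state` and its parameters are hypothetical and are not taken from this PR's code; the field names (`ItemProcessor`, `ProcessorConfig`, `ResultWriter`) follow the public ASL documentation. An Inline map would use `"Mode": "INLINE"` and would not need a `ResultWriter`.

```python
def distributed_map_state(items_path, start_at, child_states, bucket, prefix):
    """Build an ASL Map state that runs its iterations in Distributed mode.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    return {
        "Type": "Map",
        "ItemsPath": items_path,
        "ItemProcessor": {
            # Each item is processed in its own child workflow execution.
            "ProcessorConfig": {
                "Mode": "DISTRIBUTED",
                "ExecutionType": "STANDARD",
            },
            "StartAt": start_at,
            "States": child_states,
        },
        # Child execution results are exported to S3 under this prefix.
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": bucket, "Prefix": prefix},
        },
        "End": True,
    }
```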

How it works

  • Adds a new flag, --use-distributed-map, for creating a step function (i.e. python helloworld.py step-functions create --use-distributed-map)
  • With the flag set, all map steps use Distributed mode instead of Inline mode
  • Distributed Map results are written to S3, so some additional steps are needed to read them back: fetch the S3 manifest, then run a second Distributed Map step to extract the values
  • The run ID of the initial step function is passed down to the maps through the DynamoDB table, so that child Step Functions executions can access it and find their data in the S3 bucket
  • Adds a retry (with jitter) for when too many batch requests are created at once, as only 50 requests/sec are permitted
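The retry in the last bullet can be pictured roughly as follows. This is a minimal sketch of exponential backoff with full jitter, not the code in this PR: `ThrottledError`, `retry_with_jitter`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit error from the batch service."""


def retry_with_jitter(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ThrottledError with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            # Give up once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # The backoff ceiling doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the ceiling so that
            # many parallel map iterations do not all retry at the same time.
            sleep(rng() * cap)
```

The jitter matters here because a Distributed Map can start many child executions at once; without it, throttled callers would all retry on the same schedule and hit the limit again together.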

Infra changes

To support the new mode, update the step_functions_role (from https://github.com/outerbounds/terraform-aws-metaflow) to allow states:StartExecution and kms:GenerateDataKey
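As an illustration, the two extra permissions could be expressed as an IAM policy document like the one below, written as a Python dict. Only the two action names come from this PR; the helper name `extra_role_policy` and the way resources are scoped (one state machine ARN, one KMS key ARN) are assumptions, and the real Terraform module may structure this differently.

```python
import json


def extra_role_policy(state_machine_arn, kms_key_arn):
    """Return an IAM policy document granting the two additional actions.

    Hypothetical helper; resource scoping is an assumption for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the Distributed Map start child executions.
                "Effect": "Allow",
                "Action": ["states:StartExecution"],
                "Resource": [state_machine_arn],
            },
            {
                # Needed when results are written to a KMS-encrypted bucket.
                "Effect": "Allow",
                "Action": ["kms:GenerateDataKey"],
                "Resource": [kms_key_arn],
            },
        ],
    }


def policy_json(state_machine_arn, kms_key_arn):
    """Serialize the policy for pasting into an IAM console or template."""
    return json.dumps(extra_role_policy(state_machine_arn, kms_key_arn), indent=2)
```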

Additional Notes

@mvarma-caris

Thank you for this change! Do you have an estimate of when it'll be merged and available in a new minor version?

@savingoyal
Collaborator

@bh-perennial thank you for your patience. I think we can clean up the implementation in this PR a bit. For example, we don't need to rely on DynamoDB to pass along run ids. They can be set at the start step and moved along, as in this PR - #1720.

Next up, I am reviewing the need to interface with S3 and whether right now is the opportune time to rip dynamodb away from the setup.

@savingoyal
Collaborator

#1720 supersedes this

@savingoyal closed this Feb 16, 2024