Update burst compute workflow to prepare it for node18 backend #1

Open · wants to merge 25 commits into `master`

Commits (25)
509cec2
a first take to use node18 but invoke async does not work
Nov 10, 2023
f49de7e
use map to distribute the work
Nov 11, 2023
82e7ed3
filter parameters to not pass batch results around
Nov 11, 2023
eda5a29
eliminated filter params state
Nov 11, 2023
eaa18b0
added persist results state in order to persist directly from the sta…
Nov 13, 2023
451189e
removed states that did not seem to be used and from map went directl…
Nov 13, 2023
64c6e42
added states to handle partitioning of the batch jobs in order to sat…
Nov 13, 2023
a1d0ad4
specify payload for combine function to exclude batches
Nov 13, 2023
9e35153
put monitor back to check for timeout
Nov 13, 2023
0367b23
monitor needs all parameters that must propagate such as firstBatchId…
Nov 14, 2023
9603af6
use dump batches on s3 and use itemreader
Nov 16, 2023
79c7665
introduce an intermediate state to set completed flag
Nov 17, 2023
8d8325d
specify runtime and architecture in the provider block - switch to arm64
Nov 28, 2023
f22c50a
Merge branch 'master' into node18
Feb 18, 2025
32538bd
update some packages
Feb 18, 2025
f51e981
set up an org and updated the bundle plugin
Feb 21, 2025
8dd61b0
update runtime to node20x
Feb 22, 2025
b61d61a
set toleratedPercentageFailure
Feb 24, 2025
70674df
added cleanup step - to clean batch inputs
Feb 25, 2025
932d865
changed logging message
Feb 25, 2025
038934e
dotenv; additional state to check for errors
Feb 25, 2025
1ed9091
renamed branching factor
Feb 25, 2025
b8498e4
updated the to describe the new workflow
Feb 26, 2025
d506298
update the default interface name
Feb 26, 2025
55578e3
updated diagram
krokicki Feb 26, 2025
2 changes: 2 additions & 0 deletions .env.dev
@@ -0,0 +1,2 @@
MAX_PARALLELISM=5000
MAX_BATCHED_JOBS_ITERATIONS=2
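As a sketch of how these settings might be consumed (the actual loading reportedly goes through dotenv, per commit 038934e; the helper below is hypothetical and only parses `process.env` directly, falling back to the defaults stated in the comment below):

```javascript
// Hypothetical helper: read an integer setting from the environment,
// falling back to a default when the variable is unset or malformed.
const intFromEnv = (name, fallback) => {
  const parsed = parseInt(process.env[name], 10);
  return Number.isNaN(parsed) ? fallback : parsed;
};

// Defaults here mirror the values in .env.dev above.
const maxParallelism = intFromEnv('MAX_PARALLELISM', 5000);
const maxBatchedJobsIterations = intFromEnv('MAX_BATCHED_JOBS_ITERATIONS', 2);
```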
Comment from the author (Collaborator):
At this point both values are the same as the defaults, so this could be removed. I used it to test multiple values to find the best settings.

2 changes: 2 additions & 0 deletions .eslintrc.yml
@@ -4,8 +4,10 @@ env:
es2021: true
extends:
- airbnb-base
parser: '@babel/eslint-parser'
parserOptions:
ecmaVersion: 12
requireConfigFile: false
rules:
no-console: off
radix: off
19 changes: 11 additions & 8 deletions README.md
@@ -16,14 +16,14 @@ In the diagram below, the code you write is indicated by the blue lambda icons.
Here's how it works, step-by-step:
1) You define a **worker** function and a **combiner** function
2) Launch your burst compute job by calling the **dispatch** function with a range of items to process
3) The dispatcher will start copies of itself recursively and efficiently start your worker lambdas
4) Each **worker** is given a range of inputs and must compute results for those inputs and write results to DynamoDB
5) The Step Function monitors all the results and calls the combiner function when all workers are done
6) The **combiner** function reads all output from DynamoDB and aggregates them into the final result
3) The dispatcher partitions the work into batches and starts an AWS Step Function that implements a map-reduce workflow
4) The Step Function maps the batches to workers: each **worker** is given a range of inputs, computes results for those inputs, and then sends them to another workflow task that persists the results to a DynamoDB table
5) When all workers complete and their results are persisted to the database, the Step Function invokes the **combiner**, which reads all outputs from the DynamoDB table and aggregates them into the final result
6) The Step Function also monitors for a timeout and terminates the job if the configured time limit is reached
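As a sketch of step 4 (the real input/output contract is defined in the [Interfaces](docs/Interfaces.md) document; the field names below are illustrative assumptions, not the framework's actual API), a worker might look like:

```javascript
// Hypothetical worker Lambda: receives a batch range plus job parameters
// and returns one result per item in [startIndex, endIndex).
// Field names are illustrative -- see docs/Interfaces.md for the actual
// contract used by the burst-compute framework.
const workerHandler = async (event) => {
  const { startIndex, endIndex, jobParameters } = event;
  const results = [];
  for (let i = startIndex; i < endIndex; i += 1) {
    // Replace with the real per-item computation
    results.push({ index: i, value: i * (jobParameters.scale || 1) });
  }
  return results;
};

module.exports = { workerHandler };
```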

## Build

You need Node.js 12.x or later in your path, then:
You need Node.js 20.x or later in your path, then:

```bash
npm install
@@ -40,13 +40,16 @@ To deploy to the *dev* stage:
npm run sls -- deploy
```

This will create a application stack named `burst-compute-dev`.
This will create an application stack named `janelia-burst-compute-dev`.

To deploy to a different stage (e.g. "prod"), add a stage argument:
To deploy to a different stage (e.g. "prod") and a different organization (the default organization is 'janelia'), add stage and org arguments:
```bash
npm run sls -- deploy -s prod
npm run sls -- deploy -s prod --org myorg
```

This will create an application stack named `myorg-burst-compute-prod`.

## Usage

1. Create **worker** and **combiner** functions which follow the input/output specification defined in the [Interfaces](docs/Interfaces.md) document.
9 changes: 4 additions & 5 deletions docs/Interfaces.md
@@ -88,7 +88,7 @@ This function requires the dynamodb:Query permission on the **TasksTable**:
"Action": [
"dynamodb:Query"
],
"Resource": "arn:aws:dynamodb:<REGION>:*:table/burst-compute-<STAGE>-tasks",
"Resource": "arn:aws:dynamodb:<REGION>:*:table/<ORG>-burst-compute-<STAGE>-tasks",
"Effect": "Allow"
},
```
@@ -102,12 +102,11 @@ The dispatch function expects the following input object:
{
workerFunctionName: "Name or ARN of the user-defined worker Lambda function",
combinerFunctionName: "Name or ARN of the user-defined combiner Lambda function",
startIndex: "Start index to process, inclusive, e.g. 0",
endIndex: "End index to process, exclusive, e.g. N if you have N items to process",
datasetStartIndex: "Start index to process, inclusive, e.g. 0",
datasetEndIndex: "End index to process, exclusive, e.g. N if you have N items to process",
batchSize: "How many items should each worker instance process",
numLevels: "Number of levels in the dispatcher tree, e.g. 1 or 2",
maxParallelism: "Maximum number of batches to run",
searchTimeoutSecs: "Number of seconds to wait for job to finish before ending with a timeout",
jobsTimeoutSecs: "Number of seconds to wait for job to finish before ending with a timeout",
jobParameters: {
// Any parameters that each worker should receive
},
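For illustration, a concrete dispatch payload might look like the following. The field names follow the input spec above; all values, function names, and job parameters are hypothetical, and the actual invocation (via the AWS SDK or `aws lambda invoke`) is omitted:

```javascript
// Illustrative dispatch payload; field names follow the spec above,
// but every value here is a made-up example.
const dispatchInput = {
  workerFunctionName: 'my-app-dev-worker',       // hypothetical
  combinerFunctionName: 'my-app-dev-combiner',   // hypothetical
  datasetStartIndex: 0,
  datasetEndIndex: 100000,
  batchSize: 40,
  maxParallelism: 5000,
  jobsTimeoutSecs: 3600,
  jobParameters: { threshold: 0.5 },             // hypothetical
};

// Basic sanity check: how many worker batches this range implies
const numBatches = Math.ceil(
  (dispatchInput.datasetEndIndex - dispatchInput.datasetStartIndex)
    / dispatchInput.batchSize,
);
```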
Binary file modified docs/burst-compute-diagram.png