Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Batch decorator: EFS volumes support (and custom mount point) #1650

Merged
merged 2 commits into from
Jan 11, 2024

Conversation

enricomarchesin
Copy link
Contributor

Add support to mount EFS volumes in batch containers by adding a new parameter to the decorator, efs_volumes, that works exactly like the recently added host_volumes one, and just needs the EFS volume ID.

For example one can use something like:

./job.py --with batch:cpu=16,memory=32768,efs_volumes=fs-000a90e5f7dc01234 run

By default the volume is mounted in /mnt/, in a folder named like the EFS volume id. In the example above that means in /mnt/fs-000a90e5f7dc01234.

This PR also adds a little improvement that allows to specify the mountpoint of the volume, which I backported also for the host_volumes parameter: just add a semicolon (:) after the EFS volume ID (or the host folder), followed by where the volume will be mounted inside the container.

For example:

./job.py --with batch:cpu=16,memory=32768,efs_volumes=fs-000a90e5f7dc01234:/mnt/efs run

As I said, I backported it also for host volumes, so one can use something like: host_volumes=/home:/mnt/host-homedir.

I've extensively tested it for my use case, and it worked like a charm, and given this was mentioned in a few issues at least (for example #441), plus considering it's really a ridiculous change, I though it might be worth trying to give something back to this awesome community that's maintaining such a great product.

I beliene someone else can find it useful, and I'd love for this little tweak to be accepted upstream.

Note: I wanted to add a couple simple tests, but I can't find where the other parts of the AWS Batch plugin test suite are... Anyway I've run locally the basic set of tests (run_tests.py --debug --contexts dev-local) against my branch, and they all succeeded.

@enricomarchesin
Copy link
Contributor Author

Actually, I just realised that two tests failed locally:

[pid 27920] ### test 'LargeArtifactTest' graph 'single-linear-step' in context dev-local (path /var/folders/tj/zsvlbvs96b53x_td5ywjtggm0000gn/T/tmpv4kj07xz_metaflow_test/test_flow.py) ###
[pid 27920] ### test 'CardExtensionsImportTest' graph 'simple-foreach' in context dev-local (path /var/folders/tj/zsvlbvs96b53x_td5ywjtggm0000gn/T/tmppnrkd6wr_metaflow_test/test_flow.py) ###

But I can't see any connection with my changes... 🤷

@enricomarchesin
Copy link
Contributor Author

Any luck reviewing this PR?

Copy link
Collaborator

@madhur-ob madhur-ob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks a lot for this.

Please allow me to check it manually once from my side and then I can approve...

@madhur-ob
Copy link
Collaborator

@enricomarchesin Thanks, I have manually tested it by creating an EFS volume and can confirm that it does get mounted...

Screenshot 2024-01-11 at 4 38 31 AM

My only question is (just out of curiosity) --

does the task switch from SUBMITTED to STARTING immediately for you?
In my case, it was in RUNNABLE (printed 3 times in the logs) before finally STARTING and then RUNNING...

@madhur-ob
Copy link
Collaborator

Checked custom mount paths and the step-functions scheduler as well...
Everything looks good!

@nflx-mf-bot
Copy link
Collaborator

Testing[594] @ 3c511cc

@romain-intel romain-intel merged commit 48450fe into Netflix:master Jan 11, 2024
2 checks passed
@enricomarchesin
Copy link
Contributor Author

My only question is (just out of curiosity) --
does the task switch from SUBMITTED to STARTING immediately for you? In my case, it was in RUNNABLE (printed 3 times in the logs) before finally STARTING and then RUNNING...

That happens to me all the time, using the standard Metaflow release, with when I run my flows using AWS Batch, both EC2 and Fargate clusters.

Depending on the actual availability of the provisioned machines, for EC2 clusters I can see RUNNABLE many times (5+) before it moves to the RUNNING state.

🤷

@enricomarchesin enricomarchesin deleted the efs-support branch January 11, 2024 18:22
@enricomarchesin
Copy link
Contributor Author

Thanks for accepting the PR! 🙏

madhur-ob pushed a commit to madhur-ob/metaflow that referenced this pull request Jan 18, 2024
…tflix#1650)

* AWS plugin: support for mounting EFS volumes

* Support custom mountpoint for volumes
emattia pushed a commit to emattia/metaflow that referenced this pull request Jan 23, 2024
…tflix#1650)

* AWS plugin: support for mounting EFS volumes

* Support custom mountpoint for volumes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants