Add JavaScript/TypeScript + TensorFlow.js kernel example #12
Issue #, if available: N/A
Description of changes:
To aim for working GPU support, the new example takes a similar approach to the AWS TensorFlow v2.4 GPU training DLC: it builds from an `nvidia/cuda:11.0-base-ubuntu` image, with similar library installs and base Python setup. However, many of the SageMaker training-specific or core TensorFlow-specific items are left out (e.g. the Python TensorFlow install, MPI/Horovod, SMDistributed, SSH, Boost), leaving just a few utilities such as the AWS CLI and the high-level SageMaker SDKs in case users want them.
On top of this base, Node.js and tslab are installed (providing Jupyter kernels for both JavaScript and TypeScript), and TensorFlow.js (CPU and GPU versions) and the AWS SDK for JS are pre-installed.
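As a rough illustration of how the pre-installed libraries could be checked from either kernel, here is a minimal sketch of an import smoke test. The helper `probeModules` and the package names in `EXPECTED_MODULES` are assumptions for illustration (they are the usual npm names for TensorFlow.js on Node and the v2 AWS SDK, not necessarily what this PR's image installs), and the loader is passed in so the helper itself has no dependency on the runtime.

```typescript
// Hypothetical smoke-test helper: tries to load each module and records the outcome.
// The loader is injected so the same code works with `require` in a notebook cell
// or with a stub elsewhere.
type Loader = (name: string) => unknown;

interface ProbeResult {
  name: string;
  ok: boolean;
  error?: string; // present only when loading threw
}

function probeModules(names: string[], load: Loader): ProbeResult[] {
  return names.map((name): ProbeResult => {
    try {
      load(name);
      return { name, ok: true };
    } catch (err) {
      // Keep the message so a missing native binding or driver is visible.
      const message = err instanceof Error ? err.message : String(err);
      return { name, ok: false, error: message };
    }
  });
}

// Assumed package names; adjust to whatever the image actually installs.
const EXPECTED_MODULES: string[] = [
  "@tensorflow/tfjs-node",
  "@tensorflow/tfjs-node-gpu",
  "aws-sdk",
];
```

In a `tslab` or `jslab` cell, assuming `require` is available in the kernel, `probeModules(EXPECTED_MODULES, require)` would return one entry per package. A failed GPU entry alongside a working CPU entry would point at the CUDA libraries rather than at the Node.js setup.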
Very open to feedback on the approach. It does yield pretty long build times (especially since Python is built from source) and big containers (~3400MB according to my ECR). However, I thought sticking close to the DLC's approach was likely to give good driver compatibility, and directly inheriting from the DLC would have brought in significant extras and user complexity (having to look up the correct account ID for your region and do another ECR login).
Testing:
I've built the container and successfully attached it to SMStudio using both the `jslab` and `tslab` kernels, which generally seem to work OK. I haven't quite got the TensorFlow.js MNIST example working via notebook yet, but I can use the kernels and import the libraries correctly.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.