Description
Hi,
I'm trying to run a Docker build on a SageMaker-managed EC2 instance in AWS (ml.g4dn.12xlarge, which has T4 cards).
docker build .
runs for a few minutes, produces a fair amount of output, and then fails with the following error:
csrc/layer_norm_cuda_kernel.cu:4:10: fatal error: ATen/cuda/DeviceUtils.cuh: No such file or directory
#include "ATen/cuda/DeviceUtils.cuh"
^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
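If it helps with triage, I can add a step like this right before the apex install to check whether that header even exists in the base image's torch installation (just a sketch; I'm assuming apex picks up its headers from torch.utils.cpp_extension's include paths):
# Sanity check (sketch): does the installed torch ship ATen/cuda/DeviceUtils.cuh?
RUN python -c "from torch.utils.cpp_extension import include_paths; import os; [print(p, os.path.exists(os.path.join(p, 'ATen/cuda/DeviceUtils.cuh'))) for p in include_paths()]"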
Interestingly, earlier in the build it says:
Step 17/18 : RUN cd /home/ && git clone https://github.com/NVIDIA/apex.git apex && cd apex && python setup.py install --cuda_ext --cpp_ext
---> Running in e8df4e2bf69e
Cloning into 'apex'...
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Warning: Torch did not find available GPUs on this system.
If your intention is to cross-compile, this is not an error.
By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
Volta (compute capability 7.0), Turing (compute capability 7.5),
and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
If you wish to cross-compile for a single specific architecture,
export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.
which surprises me, since the machine has 4 GPUs.
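For what it's worth, the way I read that hint, the target architecture could be pinned in the Dockerfile before the apex step, something like this (a sketch only, based on the message above; 7.5 because the T4s are Turing):
# Sketch: pin the CUDA arch so apex cross-compiles Turing (T4) kernels,
# since no GPU is visible during `docker build`.
ENV TORCH_CUDA_ARCH_LIST="7.5"
RUN cd /home/ && git clone https://github.com/NVIDIA/apex.git apex && cd apex && python setup.py install --cuda_ext --cpp_ext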
How can I build this Docker image on a SageMaker-managed AWS EC2 instance?