Docker build fails on Amazon SageMaker: fatal error: ATen/cuda/DeviceUtils.cuh: No such file or directory #include "ATen/cuda/DeviceUtils.cuh" #168

Open
@la-cruche

Hi,

I'm trying to run docker build . on a SageMaker-managed EC2 instance in AWS (ml.g4dn.12xlarge, with T4 GPUs).
docker build . runs for a few minutes, prints some output, and then fails with the following error:

csrc/layer_norm_cuda_kernel.cu:4:10: fatal error: ATen/cuda/DeviceUtils.cuh: No such file or directory
 #include "ATen/cuda/DeviceUtils.cuh"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Interestingly, earlier in the build it prints:

Step 17/18 : RUN cd /home/ && git clone https://github.com/NVIDIA/apex.git apex && cd apex && python setup.py install --cuda_ext --cpp_ext
 ---> Running in e8df4e2bf69e
Cloning into 'apex'...
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

Warning: Torch did not find available GPUs on this system.
 If your intention is to cross-compile, this is not an error.
By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
Volta (compute capability 7.0), Turing (compute capability 7.5),
and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
If you wish to cross-compile for a single specific architecture,
export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.

which surprises me, since I have 4 GPUs on my machine.
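
For reference, the cross-compile hint in that warning could be applied roughly as follows. This is only a sketch: it assumes the target is the T4 (a Turing card, so compute capability 7.5 per the list in the warning), and it addresses the architecture warning only. Whether it has any effect on the missing ATen/cuda/DeviceUtils.cuh header is not established here.

```shell
# Assumption: T4 GPUs are Turing, i.e. compute capability 7.5
# (the value listed for Turing in the Apex warning above).
export TORCH_CUDA_ARCH_LIST="7.5"

# Show what the build would see.
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"

# The Apex install command from Step 17 would then run in the same shell:
# python setup.py install --cuda_ext --cpp_ext
```

Inside a Dockerfile, the variable would have to be set in the same RUN step as setup.py (or through an ENV instruction before it), because each RUN step starts a fresh shell.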

How can I build this Docker image on a SageMaker-managed AWS EC2 instance?
