Building Docker for Cuda fails weirdly #783

Open
blackcatstudiosdevelopment opened this issue May 28, 2024 · 1 comment

Comments

blackcatstudiosdevelopment commented May 28, 2024

Dockerfile:

FROM nvidia/cuda:12.2.0-base-ubuntu22.04

COPY . /app

RUN apt-get update && \
    apt-get install -y --allow-unauthenticated --no-install-recommends \
    wget \
    git \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*

ENV HOME="/root"
ENV CONDA_DIR="${HOME}/miniconda"
ENV PATH="$CONDA_DIR/bin":$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false
ENV PIP_DOWNLOAD_CACHE="$HOME/.pip/cache"
ENV TORTOISE_MODELS_DIR="$HOME/tortoise-tts/build/lib/tortoise/models"

RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda3.sh \
    && bash /tmp/miniconda3.sh -b -p "${CONDA_DIR}" -f -u \
    && "${CONDA_DIR}/bin/conda" init bash \
    && rm -f /tmp/miniconda3.sh \
    && echo ". '${CONDA_DIR}/etc/profile.d/conda.sh'" >> "${HOME}/.profile"

# --login makes bash source ~/.profile at every RUN statement, so the conda shell functions (e.g. conda activate) are available
SHELL ["/bin/bash", "--login", "-c"]

RUN conda create --name tortoise python=3.9 numba inflect --yes \
    && conda activate tortoise \
    && conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia --yes \
    && conda install transformers=4.31.0 --yes \
    && cd /app \
    && python setup.py install

Build command: docker build . -t tts
Run command:

docker run --gpus all -e TORTOISE_MODELS_DIR=/models -v "J:\AI\voice\tortoise-tts\tortoise\models":/models \
-v "J:\AI\voice\tortoise-tts\tortoise\results":/results \
-v "%USERPROFILE%\.cache\huggingface":/root/.cache/huggingface \
-v "J:\AI\voice\tortoise-tts\tortoise\work":/work -it tts

(base) root@9454180e9c47:/# cd app
(base) root@9454180e9c47:/app# conda activate tortoise
(tortoise) root@9454180e9c47:/app# python -c "import torch; print(torch.cuda.is_available());torch.zeros(1).cuda()"

/root/miniconda/envs/tortoise/lib/python3.9/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at /opt/conda/conda-bld/pytorch_1711403380164/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/miniconda/envs/tortoise/lib/python3.9/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found

I'm not sure how to even troubleshoot this error.
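One possible way to narrow this down is to bypass PyTorch and call the CUDA driver library directly, to see whether the failure already happens at the driver level inside the container. The following is only a rough sketch: `probe_cuda_driver` is a hypothetical helper name, and it assumes the driver library is exposed to the container as `libcuda.so.1`. The `cuInit` and `cuDeviceGetCount` calls are part of the CUDA driver API.

```python
import ctypes


def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Call the CUDA driver API directly and report how far it gets.

    lib_name is an assumption: the usual name of the driver library
    on Linux. Adjust it if the container exposes it differently.
    """
    report = {"loaded": False, "cuInit": None, "device_count": None}
    try:
        # In a container this library is normally injected by the
        # NVIDIA container runtime, not installed in the image.
        libcuda = ctypes.CDLL(lib_name)
        cu_init = libcuda.cuInit
        cu_device_get_count = libcuda.cuDeviceGetCount
    except (OSError, AttributeError) as exc:
        report["error"] = str(exc)
        return report
    report["loaded"] = True

    # cuInit(0) returns 0 (CUDA_SUCCESS) when the driver initializes.
    report["cuInit"] = cu_init(0)
    if report["cuInit"] != 0:
        return report

    # Only query the device count once initialization succeeded.
    count = ctypes.c_int(0)
    if cu_device_get_count(ctypes.byref(count)) == 0:
        report["device_count"] = count.value
    return report


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If `loaded` is false, the driver library is probably not being mounted into the container at all. If `cuInit` is non-zero, the problem likely sits below PyTorch (driver or container runtime) rather than in the conda environment.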

(tortoise) root@9454180e9c47:/app# nvidia-smi

Tue May 28 20:50:37 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:2A:00.0  On |                  N/A |
|  0%   44C    P5             18W /  170W |    1644MiB /  12288MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        36      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        48      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
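As a small aid for reading the header line of this output, here is a sketch that extracts the three version numbers and compares them with the `pytorch-cuda=12.1` pin from the Dockerfile. The helper names (`parse_smi_header`, `summarize`) are hypothetical. The "CUDA Version" field is the highest CUDA version the driver supports, so it only needs to be at least as new as the PyTorch build. A differing NVIDIA-SMI and driver version may indicate that the container is talking to a Windows host driver through WSL2, though that is an assumption about this setup.

```python
import re

# Matches the header row printed by nvidia-smi.
HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def version_tuple(version):
    """Turn '12.5' into (12, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_smi_header(text):
    """Return the smi, driver, and cuda versions, or None if absent."""
    match = HEADER.search(text)
    return match.groupdict() if match else None


def summarize(text, torch_cuda="12.1"):
    """Compare the driver's CUDA version with the PyTorch CUDA build."""
    info = parse_smi_header(text)
    if info is None:
        return None
    info["smi_matches_driver"] = info["smi"] == info["driver"]
    info["driver_supports_torch_cuda"] = (
        version_tuple(info["cuda"]) >= version_tuple(torch_cuda)
    )
    return info


if __name__ == "__main__":
    line = ("| NVIDIA-SMI 555.42.03              Driver Version: 555.85"
            "         CUDA Version: 12.5     |")
    print(summarize(line))
```

For the output above this reports that the driver's CUDA version (12.5) covers the 12.1 build, and that the NVIDIA-SMI and driver versions differ.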

blackcatstudiosdevelopment (Author) commented

I thought #760 would have been the solution.
