Issues using OpenHands w/ PyTorch + CUDA #4230
Thanks @x66ccff, I opened a separate issue so we can track this. Just to clarify: are you following our custom sandbox guide? And is your use case that you would like to develop programs with CUDA + PyTorch?
Hmm, I didn't use the
Seems editing
Yeah, but I'm not very familiar with Docker. Any advice would be appreciated!
OK, I think there might be an easier way for you to do this. Specifically, you could probably start directly from one of the official PyTorch Docker images. Pick the one that best matches your expected version of PyTorch and do something like the following:
Note that I changed
That may just work for your purposes, but if you still have issues, I can help!
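The official PyTorch images encode the PyTorch, CUDA and cuDNN versions in the tag, as in the pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel and -runtime tags used later in this thread. The sketch below composes such a reference. The helper name pytorch_image is hypothetical, and it assumes the tag pattern seen here holds for the version you pick, so check that the tag actually exists on Docker Hub before relying on it.

```python
def pytorch_image(torch_version: str, cuda: str, cudnn: str, variant: str = "devel") -> str:
    """Compose a pytorch/pytorch image reference (hypothetical helper).

    Assumes the tag pattern seen in this thread, e.g.
    pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel.
    """
    if variant not in ("runtime", "devel"):
        raise ValueError("variant must be 'runtime' or 'devel'")
    return f"pytorch/pytorch:{torch_version}-cuda{cuda}-cudnn{cudnn}-{variant}"


# The base image used in the Dockerfile later in this thread
print(pytorch_image("2.4.1", "12.4", "9", "runtime"))
# prints pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime
```

As a rule of thumb, the devel variant should include the CUDA compiler toolchain, which matters if the agent has to build CUDA extensions; runtime is smaller but cannot compile them.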
Sigh, I've tried many commands and all of them failed when using PyTorch together with OpenHands. I also tried first copying a Docker image from a running OpenHands container; that fails too. I have tried these commands:
❌
docker run --gpus all -it --pull=always \
-e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel \
-e SANDBOX_USER_ID=$(id -u) \
-e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
-v $WORKSPACE_BASE:/opt/workspace_base \
-v /var/run/docker.sock:/var/run/docker.sock \
-p 7127:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
ghcr.io/all-hands-ai/openhands:0.9
❌
docker run --gpus all -it --pull=always \
-e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.9-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
-v $WORKSPACE_BASE:/opt/workspace_base \
-v /var/run/docker.sock:/var/run/docker.sock \
-p 7127:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
ghcr.io/all-hands-ai/openhands:0.9
❌
❌
❌
Here I encountered three types of errors:
# Stage 1: Configure PyTorch and CUDA environment based on PyTorch image
FROM pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime AS pytorch
# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive
# Install necessary dependencies, including sudo, g++, wget, and other common tools
RUN apt-get update && \
apt-get install -y sudo g++ wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 git && \
apt-get clean
# Stage 2: Use the all-hands-ai image and copy the PyTorch and CUDA environment
FROM ghcr.io/all-hands-ai/runtime:0.9-nikolaik
# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive
# Install necessary dependencies, including sudo, g++, build-essential, wget, bzip2, and other common tools
RUN apt-get update && \
apt-get install -y sudo g++ build-essential wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 git && \
apt-get clean
# Copy necessary PyTorch environment from the PyTorch image
COPY --from=pytorch /opt/conda /opt/conda
# Set Conda environment variables
ENV PATH="/opt/conda/bin:${PATH}"
# Initialize conda
RUN conda init bash
# Create a new conda environment and install PyTorch (if updates or modifications are needed)
RUN conda create -n pytorch_env python=3.9 -y && \
. /opt/conda/etc/profile.d/conda.sh && \
conda activate pytorch_env && \
conda install pytorch torchvision torchaudio -c pytorch -y
# Add all users to sudoers and allow passwordless sudo
RUN echo "ALL ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
# Set the default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc
# Clean up caches to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && conda clean -a -y
# Set the working directory
WORKDIR /workspace
# Set the default command
CMD ["/bin/bash"]
I have run
However, using this Docker image together with OpenHands, even with
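For reference, the docker run commands above can be expressed as an argument list, which makes each environment variable and mount explicit and lets a script fail early when WORKSPACE_BASE is unset (in the shell version, an empty $WORKSPACE_BASE turns into an invalid -v :/opt/workspace_base argument). The helper openhands_run_argv is hypothetical and only mirrors the flags shown above. Two caveats: as noted later in the thread, setting the sandbox base image through the docker command was not supported at the time, and --gpus all applies to the app container only. Sandbox containers that the app starts through the mounted Docker socket presumably do not inherit it, which would be consistent with the DeviceRequest fix discussed further down.

```python
import os
import time
from typing import Dict, List, Optional


def openhands_run_argv(
    workspace_base: str,
    sandbox_image: str,
    app_image: str = "ghcr.io/all-hands-ai/openhands:0.9",
    host_port: int = 7127,
    extra_env: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Mirror the docker run commands from this thread as an argv list (hypothetical)."""
    # An unset $WORKSPACE_BASE would otherwise yield "-v :/opt/workspace_base".
    if not workspace_base or not os.path.isabs(workspace_base):
        raise ValueError("WORKSPACE_BASE must be an absolute host path")
    env = {
        "SANDBOX_BASE_CONTAINER_IMAGE": sandbox_image,
        "SANDBOX_USER_ID": str(os.getuid()),  # same as $(id -u); POSIX only
        "WORKSPACE_MOUNT_PATH": workspace_base,
    }
    env.update(extra_env or {})  # e.g. SANDBOX_RUNTIME_CONTAINER_IMAGE
    argv = ["docker", "run", "--gpus", "all", "-it", "--pull=always"]
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    argv += [
        "-v", f"{workspace_base}:/opt/workspace_base",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-p", f"{host_port}:3000",  # host port 7127 -> app port 3000
        "--add-host", "host.docker.internal:host-gateway",
        "--name", "openhands-app-" + time.strftime("%Y%m%d%H%M%S"),
        app_image,
    ]
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True), or printed with shlex.join(argv) to compare against the shell version.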
OK, thanks for the detailed report. We'll try to figure this out ASAP!
For these errors, are you facing them when using the
The
Related conversation: https://openhands-ai.slack.com/archives/C078L0FUGUX/p1727972321900899?thread_ts=1727917602.417709&cid=C078L0FUGUX cc @mamoodi in case he has any ideas.
Setting the sandbox_base_container_runtime is not supported through the docker command; you must use the development workflow until that is implemented.
@x66ccff Merged in a change. Can you try running with:
The method of
I'll just update the documentation and, hopefully, close this.
@mamoodi Hi, I got this error:
Initial pull
What's the error? Did you try accessing localhost:3000 now?
@mamoodi Well, the error is just
Even when I try the originally recommended command:
✔
❌
Port 3000 is already used by another app, so I use 7127. But I don't think this is the reason, because version 0.9 works fine.
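Mapping a different host port (-p 7127:3000) is the usual workaround when 3000 is taken; the app still listens on 3000 inside the container. To confirm that the chosen host port really is free before starting the container, here is a minimal standard-library sketch. port_is_free is a hypothetical helper, and a successful check does not stop another process from grabbing the port afterwards.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if `port` can currently be bound on `host` (hypothetical helper).

    Docker publishes -p ports on all interfaces by default, hence 0.0.0.0.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # another process (e.g. the app on 3000) holds the port
    return True


for candidate in (3000, 7127):
    print(candidate, "free" if port_is_free(candidate) else "in use")
```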
Hmm... I'm unsure. Looking at the bug description, Graham says:
Yeah.
I just want the agent to be able to manage conda environments in the OpenHands image.
Something is going wrong here: OpenHands/containers/app/entrypoint.sh Lines 48 to 53 in 2d2d3cc
It seems like the
Are you able to exec into the container at all, or is it not starting in Docker Desktop? I just want to know what the following returns:
@mamoodi No, it just won't start.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
This issue was closed because it has been stalled for over 30 days with no activity. |
One thing I am trying to crack is getting OpenHands (within the Docker container) to access the GPU on the machine. It often does not show up even when the NVIDIA packages are working outside the container (and
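When the GPU does not show up inside the container, a quick first check is to run a small script in the sandbox that reports what it can see. The sketch below is hypothetical (gpu_report is not part of OpenHands). It only uses shutil.which and, if PyTorch is installed, torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count(). A torch_cuda_build of None indicates a CPU-only PyTorch build, which points at the image rather than at GPU passthrough.

```python
import shutil
from typing import Any, Dict


def gpu_report() -> Dict[str, Any]:
    """Summarize GPU support visible from inside the current container (hypothetical)."""
    # nvidia-smi is normally injected by the NVIDIA container runtime when a GPU is requested
    report: Dict[str, Any] = {"nvidia_smi_on_path": shutil.which("nvidia-smi") is not None}
    try:
        import torch  # only present in PyTorch-based images
    except ImportError:
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    print(gpu_report())
```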
@SmartManoj How would that command fit into the
P.S. Thanks @neubig for bringing this issue back up.
@BradKML That's a typo. It should be
import docker

device_requests = [docker.types.DeviceRequest(
    count=-1,  # Allocate all available GPUs
    capabilities=[['gpu']]  # Specify GPU capability
)]
@SmartManoj Sorry, but then I hit this when I tried to
Are you using a devcontainer, like in #5529? See SmartManoj#122 (comment).
@SmartManoj Directly.
Sorry, but I got another error, @SmartManoj: https://pastebin.com/bGtBgSV2
Could you comment out this line and check?
For 2, @SmartManoj, this happened: https://pastebin.com/5eQfMRMk
Oops, it's plural.
The value is also a list.
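Reading these two comments as referring to the device_requests keyword (plural) and its list value, both slips are easy to repeat. Below is a hypothetical pre-flight check, assuming the keyword arguments are collected in a dict before being passed to containers.run. It relies on docker.types.DeviceRequest being a dict subclass with a Capabilities key, and the example uses plain dicts so it runs without the Docker SDK installed.

```python
from collections.abc import Mapping
from typing import Any, Dict


def check_gpu_kwargs(run_kwargs: Dict[str, Any]) -> None:
    """Raise ValueError if GPU-related containers.run kwargs look malformed (hypothetical)."""
    if "device_request" in run_kwargs:
        raise ValueError("the keyword is 'device_requests' (plural)")
    requests = run_kwargs.get("device_requests")
    if requests is None:
        return  # no GPU requested; nothing to check
    if not isinstance(requests, list):
        raise ValueError("device_requests must be a list of DeviceRequest objects")
    for req in requests:
        # DeviceRequest subclasses dict, so a Mapping check also accepts it.
        if not isinstance(req, Mapping):
            raise ValueError("each entry must be a DeviceRequest")
        caps = req.get("Capabilities") or []
        if not all(isinstance(group, list) for group in caps):
            raise ValueError("Capabilities must be a list of lists, e.g. [['gpu']]")


# Plain dicts stand in for DeviceRequest objects here.
check_gpu_kwargs({"device_requests": [{"Count": -1, "Capabilities": [["gpu"]]}]})
```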
@SmartManoj Welp, I made the change and there are no errors about that, but something else cropped up: https://pastebin.com/1BGTNZkU
Is this happening in the initial version too?
Which version? That one did not comment out the
Also, the GPU is now detected in the container, so thanks! The change needs to be added to the next version of OpenHands. But I don't get it, @SmartManoj: why comment out that specific line?
@neubig Adding that one line is enough to fix the sandbox; I have to thank @SmartManoj for that.
Thanks @BradKML! @SmartManoj, if you want to send a PR for this, we'd welcome one: SmartManoj@a24a00d
Fixes All-Hands-AI#4230 (cherry picked from commit a24a00d)
The bug is: if you install Anaconda in the image (using Docker), the agent will need to exec conda init first. The conda init command requires the user to close the terminal; however, the agent cannot close it. So I tried to let the agent exec source ./bashrc. Then, it gets stuck.
Originally posted by @x66ccff in #2178 (comment)
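conda init works by editing shell startup files such as ~/.bashrc, so its effect only appears in newly started shells, which an agent driving a single long-lived terminal presumably never opens. One way around this, sketched below under the assumption that conda lives at /opt/conda as in the Dockerfile above, is to source conda's shell hook in the same non-interactive bash process that runs the command. in_conda_env is a hypothetical helper.

```python
import shlex
from typing import List


def in_conda_env(command: str, env: str = "pytorch_env", conda_root: str = "/opt/conda") -> List[str]:
    """Build an argv that runs `command` inside a conda env without `conda init` (hypothetical).

    Sourcing conda.sh defines the `conda` shell function for this one bash
    process only, so no shell restart is needed and ~/.bashrc is untouched.
    """
    script = (
        f"source {shlex.quote(conda_root + '/etc/profile.d/conda.sh')}"
        f" && conda activate {shlex.quote(env)}"
        f" && {command}"  # command is passed through as shell code, unquoted
    )
    return ["bash", "-c", script]


argv = in_conda_env("python -c 'import torch; print(torch.cuda.is_available())'")
# e.g. subprocess.run(argv, check=True) inside the sandbox
```

conda run -n pytorch_env followed by the command is a built-in alternative that avoids shell activation altogether.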