Issues using OpenHands w/ PyTorch + CUDA #4230

Closed
neubig opened this issue Oct 6, 2024 · 39 comments · Fixed by #6042

@neubig
Contributor

neubig commented Oct 6, 2024

Hi @neubig, you can refer to https://github.com/SmartManoj/Kevin/issues/65

The bug is this: if you install Anaconda in the image (using Docker), the agent needs to run conda init first. The conda init command requires the user to close and reopen the terminal, but the agent cannot do that. So I tried having the agent run source ~/.bashrc instead, and then it gets stuck.

# Stage 1: Prepare GPU environment based on NVIDIA CUDA image
FROM nvidia/cuda:12.6.1-devel-ubi8 AS cuda

# Stage 2: Use specified image
FROM ghcr.io/all-hands-ai/runtime:0.9-nikolaik

# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies, including sudo, g++, build-essential, and other common tools
RUN apt-get update && \
    apt-get install -y sudo g++ build-essential wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 git && \
    apt-get clean

# Copy CUDA toolchain and libraries from CUDA image
COPY --from=cuda /usr/local/cuda /usr/local/cuda

# Set CUDA environment variables
ENV PATH="/usr/local/cuda/bin:${PATH}"

# Explicitly initialize LD_LIBRARY_PATH to avoid undefined variable warnings
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64"

# Download and install specified version of Anaconda
RUN wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh -O anaconda.sh && \
    bash anaconda.sh -b -p /opt/conda && \
    rm anaconda.sh

# Set conda environment variables
ENV PATH="/opt/conda/bin:${PATH}"

# Initialize conda
RUN conda init bash

# Create a new conda environment and install PyTorch
RUN conda create -n pytorch_env python=3.9 -y && \
    . /opt/conda/etc/profile.d/conda.sh && \
    conda activate pytorch_env && \
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc

# Clean cache to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && conda clean -a -y

# Set working directory
WORKDIR /workspace

# Set default command
CMD ["/bin/bash"]

# Initialize conda
RUN conda init

# Set default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc

Originally posted by @x66ccff in #2178 (comment)

@neubig
Contributor Author

neubig commented Oct 6, 2024

Thanks @x66ccff, I opened a separate issue so we can track this.

Just to clarify, are you following our custom sandbox guide?
https://docs.all-hands.dev/modules/usage/how-to/custom-sandbox-guide

And also, is your use case that you would like to develop programs with CUDA+PyTorch?
I'm just trying to better understand the situation so we can recommend the easiest way to fix the issue.

@x66ccff

x66ccff commented Oct 6, 2024

Just to clarify, are you following our custom sandbox guide? https://docs.all-hands.dev/modules/usage/how-to/custom-sandbox-guide

Hmm, I didn't use config.toml at any point in the process, and I'm not sure whether that is correct. I just built a new Docker image from the Dockerfile above and then used it to start OpenHands.

It seems editing config.toml is only necessary when building from source, not when running with Docker?

docker run -it --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=kk-openhands-pytorch \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/all-hands-ai/openhands:0.9

And also, is your use case that you would like to develop programs with CUDA+PyTorch? I'm just trying to better understand the situation so we can recommend the easiest way to fix the issue.

Yeah, but I'm not very familiar with Docker. Any advice would be appreciated!

@neubig
Contributor Author

neubig commented Oct 6, 2024

OK, I think that there might be an easier way for you to do this. Specifically, you could probably start working directly from one of the official pytorch docker images. You can pick which one best matches your expected version of PyTorch and do something like the following:

docker run -it --pull=always \
    -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda11.8-cudnn9-runtime \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/all-hands-ai/openhands:0.9

Note that I changed SANDBOX_RUNTIME_CONTAINER_IMAGE to SANDBOX_BASE_CONTAINER_IMAGE, which will allow OpenHands to install its necessary software on top of the base image.

That may just work for your purposes, but if you still have issues I can help!
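
The difference between the two variables can be made concrete with a small Python sketch that assembles the same docker run invocation as an argument list. The helper name build_openhands_run_command and its parameters are illustrative assumptions, not part of OpenHands; only the flags and environment variable names are taken from the commands in this thread. Whether a given app image honors SANDBOX_BASE_CONTAINER_IMAGE depends on the OpenHands version.

```python
import shlex


def build_openhands_run_command(
    sandbox_image,
    workspace_base,
    *,
    use_base_image=True,   # True: OpenHands builds its runtime on top of the image
    user_id=1000,          # the thread passes $(id -u) here
    host_port=3000,
    app_image="ghcr.io/all-hands-ai/openhands:0.9",
):
    """Return the docker run argv used in this thread (hypothetical helper)."""
    # BASE: an arbitrary image that OpenHands extends with its own software.
    # RUNTIME: an image that already contains the OpenHands runtime.
    env_name = (
        "SANDBOX_BASE_CONTAINER_IMAGE"
        if use_base_image
        else "SANDBOX_RUNTIME_CONTAINER_IMAGE"
    )
    return [
        "docker", "run", "-it", "--pull=always",
        "-e", f"{env_name}={sandbox_image}",
        "-e", f"SANDBOX_USER_ID={user_id}",
        "-e", f"WORKSPACE_MOUNT_PATH={workspace_base}",
        "-v", f"{workspace_base}:/opt/workspace_base",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-p", f"{host_port}:3000",
        "--add-host", "host.docker.internal:host-gateway",
        app_image,
    ]


if __name__ == "__main__":
    argv = build_openhands_run_command(
        "pytorch/pytorch:2.4.1-cuda11.8-cudnn9-runtime", "/tmp/workspace"
    )
    # Print a copy-pasteable, correctly quoted single-line command.
    print(shlex.join(argv))
```

Building the command as a list avoids the missing line-continuation problems that multi-line shell snippets are prone to when pasted.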

@neubig neubig changed the title from "Issues using OpenHands w/ Conda" to "Issues using OpenHands w/ PyTorch + CUDA" Oct 6, 2024
@x66ccff

x66ccff commented Oct 7, 2024

Sigh, I've tried many commands, and every attempt to use PyTorch together with OpenHands failed. I also tried copying a Docker image from a running OpenHands container, and that failed too.

I have tried these commands:

 docker run --gpus all -it --pull=always   \
  -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel \
  -e SANDBOX_USER_ID=$(id -u)    \
 -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE  \
   -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock  \
   -p 7127:3000   \
  --add-host host.docker.internal:host-gateway  \
   --name openhands-app-$(date +%Y%m%d%H%M%S)  \
   ghcr.io/all-hands-ai/openhands:0.9

 docker run --gpus all -it --pull=always   \
  -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.9-nikolaik \
  -e SANDBOX_USER_ID=$(id -u)    \
 -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE  \
   -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock  \
   -p 7127:3000   \
  --add-host host.docker.internal:host-gateway  \
   --name openhands-app-$(date +%Y%m%d%H%M%S)  \
   ghcr.io/all-hands-ai/openhands:0.9

 docker run --gpus all -it --pull=always   \
  -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime \
  -e SANDBOX_USER_ID=$(id -u)    \
 -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE  \
   -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock  \
   -p 7127:3000   \
  --add-host host.docker.internal:host-gateway  \
   --name openhands-app-$(date +%Y%m%d%H%M%S)  \
   ghcr.io/all-hands-ai/openhands:0.9.7

 docker run --gpus all -it --pull=always   \
  -e SANDBOX_BASE_CONTAINER_IMAGE=kk_openhands_pytorch_nvidiactk \
  -e SANDBOX_USER_ID=$(id -u)    \
 -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE  \
   -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock  \
   -p 7127:3000   \
  --add-host host.docker.internal:host-gateway  \
   --name openhands-app-$(date +%Y%m%d%H%M%S)  \
   ghcr.io/all-hands-ai/openhands:0.9

docker run -it --gpus all --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=kk_openhands_pytorch_nvidiactk \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 7127:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/all-hands-ai/openhands:0.9

Here I encountered 3 types of errors:

  1. When using pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel, I run into:
03:50:19 - openhands:ERROR: docker.py:130 - Python executable not found: [Errno 2] No such file or directory: 'docker'
03:50:19 - openhands:ERROR: runtime_build.py:383 - Sandbox image build failed: [Errno 2] No such file or directory: 'docker'
03:50:19 - openhands:ERROR: agent_session.py:194 - Runtime initialization failed: [Errno 2] No such file or directory: 'docker'
03:50:19 - openhands:ERROR: agent_session.py:84 - Error starting session: [Errno 2] No such file or directory: 'docker'
  2. When using kk_openhands_pytorch_nvidiactk, a Docker image built from a Dockerfile like this:
# Stage 1: Configure PyTorch and CUDA environment based on PyTorch image
FROM pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime AS pytorch

# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies, including sudo, g++, wget, and other common tools
RUN apt-get update && \
    apt-get install -y sudo g++ wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 git && \
    apt-get clean

# Stage 2: Use the all-hands-ai image and copy the PyTorch and CUDA environment
FROM ghcr.io/all-hands-ai/runtime:0.9-nikolaik

# Set non-interactive frontend
ENV DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies, including sudo, g++, build-essential, wget, bzip2, and other common tools
RUN apt-get update && \
    apt-get install -y sudo g++ build-essential wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 git && \
    apt-get clean

# Copy necessary PyTorch environment from the PyTorch image
COPY --from=pytorch /opt/conda /opt/conda

# Set Conda environment variables
ENV PATH="/opt/conda/bin:${PATH}"

# Initialize conda
RUN conda init bash

# Create a new conda environment and install PyTorch (if updates or modifications are needed)
RUN conda create -n pytorch_env python=3.9 -y && \
    . /opt/conda/etc/profile.d/conda.sh && \
    conda activate pytorch_env && \
    conda install pytorch torchvision torchaudio -c pytorch -y

# Add all users to sudoers and allow passwordless sudo
RUN echo "ALL ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Set the default conda environment
RUN echo "conda activate pytorch_env" >> ~/.bashrc

# Clean up caches to reduce image size
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && conda clean -a -y

# Set the working directory
WORKDIR /workspace

# Set the default command
CMD ["/bin/bash"]

I ran docker run --rm --gpus all kk_openhands_pytorch nvidia-smi to test this image, and it prints the following successfully:

Mon Oct  7 04:01:29 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:17:00.0 Off |                  N/A |
| 75%   34C    P8             26W /  250W |   20960MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
....

However, using this Docker image together with OpenHands, even with the --gpus all option, always gives:

nvidia-smi command not found

  3. If I clone the OpenHands instance and manually install the NVIDIA tools inside the image, then use this image again with OpenHands, I get:
$ nvidia-smi
Failed to initialize NVML: Unknown Error
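
The three symptoms above (nvidia-smi missing, nvidia-smi present but failing, and PyTorch not seeing a GPU) can be told apart from inside the sandbox with a short diagnostic. This is a minimal sketch, not part of OpenHands: the function name gpu_diagnostics and the shape of its report are assumptions made for illustration, and torch is only queried if it happens to be installed.

```python
import importlib.util
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        "nvidia_smi_path": shutil.which("nvidia-smi"),  # None -> "command not found"
        "nvidia_smi_ok": None,         # False -> e.g. "Failed to initialize NVML"
        "nvidia_smi_output": None,
        "torch_cuda_available": None,  # None -> torch not installed or not importable
    }

    if report["nvidia_smi_path"] is not None:
        try:
            # "-L" lists the GPUs the driver exposes to this process.
            proc = subprocess.run(
                [report["nvidia_smi_path"], "-L"],
                capture_output=True, text=True, timeout=30,
            )
            report["nvidia_smi_ok"] = proc.returncode == 0
            report["nvidia_smi_output"] = (proc.stdout or proc.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report["nvidia_smi_ok"] = False
            report["nvidia_smi_output"] = str(exc)

    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            report["torch_cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch install should not hide the nvidia-smi result.
            report["torch_cuda_available"] = None

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Running this once in a container started directly with --gpus all and once inside the OpenHands sandbox shows whether the GPU is lost at the sandbox boundary rather than in the image itself.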

@neubig
Contributor Author

neubig commented Oct 7, 2024

OK, thanks for the detailed report. We'll try to figure this out ASAP!

@xingyaoww
Collaborator

Are you seeing these errors when using the docker run command?
If so, can you try launching it with make run from https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md instead?

03:50:19 - openhands:ERROR: docker.py:130 - Python executable not found: [Errno 2] No such file or directory: 'docker'
03:50:19 - openhands:ERROR: runtime_build.py:383 - Sandbox image build failed: [Errno 2] No such file or directory: 'docker'
03:50:19 - openhands:ERROR: agent_session.py:194 - Runtime initialization failed: [Errno 2] No such file or directory: 'docker'
03:50:19 - openhands:ERROR: agent_session.py:84 - Error starting session: [Errno 2] No such file or directory: 'docker'

The "docker not found" issue is likely related to Docker not being installed inside the app container (e.g., ghcr.io/all-hands-ai/openhands).

Related convo here: https://openhands-ai.slack.com/archives/C078L0FUGUX/p1727972321900899?thread_ts=1727917602.417709&cid=C078L0FUGUX

cc @mamoodi in case he has any ideas

@mamoodi
Collaborator

mamoodi commented Oct 7, 2024

Setting SANDBOX_BASE_CONTAINER_IMAGE is not supported through the docker run command:
#4220

You must use the development workflow until that is implemented.

@mamoodi
Collaborator

mamoodi commented Oct 10, 2024

@x66ccff A change has been merged in. Can you try running with:
ghcr.io/all-hands-ai/openhands:main
and see if it works now?

With the method from your item 1 ("when using pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel, i will run into"), you shouldn't get those errors anymore.

Hopefully I can then just update the documentation and close this.

@x66ccff

x66ccff commented Oct 10, 2024

@mamoodi Hi, I got this error:

(openhands) kent@kent-Super-Server:~/_Project/openhands$  docker run --gpus all -it --pull=always   \
  -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel \
  -e SANDBOX_USER_ID=$(id -u)    \
 -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE  \
   -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock  \
   -p 7127:3000   \
  --add-host host.docker.internal:host-gateway  \
   --name openhands-app-$(date +%Y%m%d%H%M%S)  \
   ghcr.io/all-hands-ai/openhands:main
main: Pulling from all-hands-ai/openhands
Digest: sha256:d3a4ed8b661b3a0e65830e97411ad18eb7f0e519e0f7442b4fb5c97042d917a6
Status: Image is up to date for ghcr.io/all-hands-ai/openhands:main
Starting OpenHands...
Setting up enduser with id 1000
Docker socket group id: 983
Creating group with id 983
groupadd: group 'docker' already exists

Initial pull

(openhands) kent@kent-Super-Server:~/_Project/openhands$  docker run --gpus all -it --pull=always   \
  -e SANDBOX_BASE_CONTAINER_IMAGE=pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel \
  -e SANDBOX_USER_ID=$(id -u)    \
 -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE  \
   -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock  \
   -p 7127:3000   \
  --add-host host.docker.internal:host-gateway  \
   --name openhands-app-$(date +%Y%m%d%H%M%S)  \
   ghcr.io/all-hands-ai/openhands:main
main: Pulling from all-hands-ai/openhands
09f376ebb190: Already exists
276709cbedc1: Already exists
2e133733af76: Already exists
ded8879d9a79: Already exists
3cf9507408dc: Already exists
0edbbd94250f: Already exists
e494d0176b10: Already exists
5fa558507e5e: Already exists
231ac2d3b56e: Pull complete
53f50c4d33b4: Pull complete
d4013f78c628: Pull complete
09a13691b725: Pull complete
136ca1f6741b: Pull complete
c0af04535154: Pull complete
41e03b1234d4: Pull complete
d963e1867e31: Pull complete
d1a32222dcaf: Pull complete
381ad02f0168: Pull complete
fe1889fba864: Pull complete
4f4fb700ef54: Pull complete
33c03875d330: Pull complete
071cab0c0483: Pull complete
07afac7ced9b: Pull complete
8782a62765f5: Pull complete
1340567b9141: Pull complete
ba310f50634e: Pull complete
a51970467f4d: Pull complete
d8fb29726c90: Pull complete
a0d1c6768975: Pull complete
Digest: sha256:d3a4ed8b661b3a0e65830e97411ad18eb7f0e519e0f7442b4fb5c97042d917a6
Status: Downloaded newer image for ghcr.io/all-hands-ai/openhands:main
Starting OpenHands...
Setting up enduser with id 1000
Docker socket group id: 983
Creating group with id 983
groupadd: group 'docker' already exists

@mamoodi
Collaborator

mamoodi commented Oct 10, 2024

What's the error? Did you try accessing localhost:3000 now?

@x66ccff

x66ccff commented Oct 10, 2024

@mamoodi Well, the error is just groupadd: group 'docker' already exists, and the container cannot start.

Even when I try the originally recommended command:

docker run -it --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.9-nikolaik \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 7127:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/all-hands-ai/openhands:0.9

0.9: Pulling from all-hands-ai/openhands
Digest: sha256:1488932730c3897bd3aa0b594fae0f19adabf6e12c01ec43325ad07f0ac3e179
Status: Image is up to date for ghcr.io/all-hands-ai/openhands:0.9


This works fine. However, when I use openhands:main I get:

groupadd: group 'docker' already exists

Did you try accessing localhost:3000 now?

Port 3000 is already used by another app, so I use 7127. But I don't think this is the reason, because version 0.9 works fine.

@mamoodi
Collaborator

mamoodi commented Oct 10, 2024

Hmmm.... I'm unsure. Looking at the bug description, Graham says:
"if you install anaconda in the image (using docker)". Where are you doing that? In the APP image?

@x66ccff

x66ccff commented Oct 10, 2024

@mamoodi

Hmmm.... I'm unsure. Looking at the bug description, Graham says: "if you install anaconda in the image (using docker)". Where are you doing that? In the APP image?

Yeah.

@x66ccff

x66ccff commented Oct 10, 2024

I just want the agent to be able to manage conda environments in the OpenHands image.

@mamoodi
Collaborator

mamoodi commented Oct 10, 2024

Something is going wrong here:

if getent group $DOCKER_SOCKET_GID; then
    echo "Group with id $DOCKER_SOCKET_GID already exists"
else
    echo "Creating group with id $DOCKER_SOCKET_GID"
    groupadd -g $DOCKER_SOCKET_GID docker
fi

It seems like the if getent group $DOCKER_SOCKET_GID check is returning false even though the group exists, so the script tries to create it.

Are you able to exec into the container at all, or is it not starting in Docker Desktop? I just want to know what the following returns:
DOCKER_SOCKET_GID=$(stat -c '%g' /var/run/docker.sock)
getent group $DOCKER_SOCKET_GID
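
One reading that is consistent with the log output ("Creating group with id 983" followed by "groupadd: group 'docker' already exists") is that no group with GID 983 exists in the container, while a group named docker already exists under a different GID. This was not confirmed in the thread. The sketch below models that distinction in Python; the function plan_group_setup and its return values are hypothetical and only illustrate why a lookup by GID and a creation by name can disagree.

```python
def plan_group_setup(socket_gid, groups, name="docker"):
    """Decide what to do for the docker socket group.

    groups maps group name -> GID, mirroring /etc/group inside the container.
    Returns a tuple (action, group_name).
    """
    by_gid = {gid: group_name for group_name, gid in groups.items()}

    if socket_gid in by_gid:
        # Equivalent of `getent group $GID` succeeding: reuse that group.
        return ("reuse", by_gid[socket_gid])

    if name in groups:
        # The GID is free but the name is taken, so
        # `groupadd -g $GID docker` would fail with "already exists".
        return ("name-conflict", name)

    # Neither the GID nor the name is in use: creating the group is safe.
    return ("create", name)


def load_system_groups():
    """Read the real group database (Unix only)."""
    import grp
    return {entry.gr_name: entry.gr_gid for entry in grp.getgrall()}


if __name__ == "__main__":
    # Example matching the log: socket GID 983, "docker" exists with another GID.
    print(plan_group_setup(983, {"root": 0, "docker": 999}))
```

Under this assumption, an entrypoint would need to handle the name-conflict case explicitly, for example by changing the existing group's GID or by creating the group under a different name.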

@x66ccff

x66ccff commented Oct 10, 2024

@mamoodi No, it just won't start.

Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale (Inactive for 30 days) label Nov 12, 2024
Contributor

This issue was closed because it has been stalled for over 30 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Nov 19, 2024
@BradKML

BradKML commented Jan 1, 2025

One thing I am trying to crack is getting OpenHands (within the Docker container) to access the GPU on the machine. The GPU often does not show up even when the NVIDIA packages are working outside of the container (and with --gpus=all), which makes the agent want to start a Docker instance within the current Docker instance (Docker-ception). Without GPU access, OpenHands would have a hard time doing ML work. @x66ccff thanks for all the testing; I would like @SmartManoj to maybe take a peek at this.
P.S. Somewhat related: #1020

@SmartManoj
Contributor

@BradKML Could you pass the device_requests arg here like this and run in development mode?

@neubig neubig removed the Stale (Inactive for 30 days) label Jan 2, 2025
@BradKML

BradKML commented Jan 2, 2025

@SmartManoj How would that command fit into containers.run? As in, the device_requests parameter with [docker.types.DeviceRequest(device_ids=["0,2"], capabilities=[['gpu']])]? Why would device_ids default to something like ["0,2"]? (The default for a laptop GPU would be 0.)
P.S. Why is Docker Compose not used like THIS or THIS?

    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

P.S. Thanks @neubig for bringing this issue back up.

@neubig neubig reopened this Jan 2, 2025
@SmartManoj
Contributor

SmartManoj commented Jan 2, 2025

@BradKML That's a typo. It should be ['0', '2'] which is not the default. The OP in that question used the 1st and 3rd GPUs only.


count (int): Number or devices to request. Optional.
    Set to -1 to request all available devices.
device_ids (list): List of strings for device IDs. Optional.
    Set either ``count`` or ``device_ids``.

Source


device_requests = [docker.types.DeviceRequest(
    count=-1,                   # Allocate all available GPUs
    capabilities=[['gpu']]      # Specify GPU capability
)]

@BradKML

BradKML commented Jan 2, 2025

@SmartManoj sorry but then I hit this when I tried to make build in WSL #5529 https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md

@SmartManoj
Contributor

Are you using Devcontainer like in #5529? SmartManoj#122 (comment)

@BradKML

BradKML commented Jan 2, 2025

@SmartManoj I ran make build directly instead of starting a devcontainer, but I still get the errors. After commenting out the code it is functional now, so I am proceeding with testing this.

@BradKML

BradKML commented Jan 2, 2025

Sorry, but I got another error @SmartManoj: https://pastebin.com/bGtBgSV2

@SmartManoj
Contributor

Could you comment out this line and check?

@BradKML

BradKML commented Jan 2, 2025

  1. Will check and see
  2. Take a look at this error, will need to see where I need to import the class https://pastebin.com/sM45mNzc

@SmartManoj
Contributor

  2. Could you replace it with docker.types.DeviceRequest?

@BradKML

BradKML commented Jan 2, 2025

For 2 @SmartManoj this happened https://pastebin.com/5eQfMRMk

@SmartManoj
Contributor

Oops, it's plural device_requests.

@BradKML

BradKML commented Jan 2, 2025

TypeError: Invalid type for device_requests param: expected list but found <class 'docker.types.containers.DeviceRequest'> when I added device_requests=docker.types.DeviceRequest(count=-1,capabilities=[['gpu']]), (note: haven't commented out the other line yet)

@SmartManoj
Contributor

SmartManoj commented Jan 2, 2025

The value must also be a list: device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])]
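
Putting the corrections from this exchange together (plural device_requests, a list value, and device IDs as separate strings), a minimal sketch with the Docker SDK for Python might look as follows. The helper names gpu_request_kwargs and run_with_gpus are assumptions made for illustration and are not OpenHands code; the SDK calls used are docker.from_env, docker.types.DeviceRequest, and containers.run.

```python
def gpu_request_kwargs(gpu_ids=None):
    """Build keyword arguments for docker.types.DeviceRequest.

    gpu_ids=None requests all GPUs (count=-1). Otherwise gpu_ids may be a
    string such as "0,2" or an iterable of IDs; each ID becomes its own
    string, i.e. ["0", "2"] rather than the mistaken ["0,2"].
    """
    kwargs = {"capabilities": [["gpu"]]}
    if gpu_ids is None:
        kwargs["count"] = -1  # set either count or device_ids, not both
    else:
        parts = gpu_ids.split(",") if isinstance(gpu_ids, str) else gpu_ids
        kwargs["device_ids"] = [str(part).strip() for part in parts]
    return kwargs


def run_with_gpus(image, command, gpu_ids=None):
    """Run a one-off container with GPU access and return its output."""
    import docker  # Docker SDK for Python (pip install docker)

    client = docker.from_env()
    request = docker.types.DeviceRequest(**gpu_request_kwargs(gpu_ids))
    return client.containers.run(
        image,
        command,
        device_requests=[request],  # must be a list, even for one request
        remove=True,
    )


if __name__ == "__main__":
    # Requires a running Docker daemon with NVIDIA GPU support.
    output = run_with_gpus(
        "pytorch/pytorch:2.4.1-cuda12.4-cudnn9-runtime", "nvidia-smi"
    )
    print(output.decode())
```

Keeping the argument construction separate from the Docker call makes the list-versus-string mistake easy to check without a GPU or a Docker daemon.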

@BradKML

BradKML commented Jan 2, 2025

@SmartManoj welp made the change and no errors about that, but something else cropped up https://pastebin.com/1BGTNZkU

@SmartManoj
Contributor

SmartManoj commented Jan 2, 2025

Is this happening in the initial version too?

@BradKML

BradKML commented Jan 2, 2025

Which version? The earlier run did not have the _container_port line commented out. After commenting that line out, then loading and restarting the session (the first session is broken), another launch shows that it is functional. Testing whether the container sees the GPU.

Also, the GPU is now detected in the container, so thanks. The change needs to be added to the next version of OpenHands. But I don't get it, @SmartManoj: why comment out that specific line?

@SmartManoj
Contributor

Known bug #5943 #5964

SmartManoj added a commit to SmartManoj/Kevin that referenced this issue Jan 2, 2025
@BradKML

BradKML commented Jan 2, 2025

@neubig adding that one line is enough to fix the sandbox, I have to thank @SmartManoj for that

@neubig
Contributor Author

neubig commented Jan 4, 2025

Thanks @BradKML !

@SmartManoj if you want to send a PR for this we'd welcome one: SmartManoj@a24a00d

@neubig neubig self-assigned this Jan 4, 2025
SmartManoj added a commit to SmartManoj/Kevin that referenced this issue Jan 5, 2025
Fixes All-Hands-AI#4230

(cherry picked from commit a24a00d)