Error with Setup #1

Open
jonahg0221 opened this issue Mar 2, 2019 · 4 comments


@jonahg0221

After a successful setup.sh, I haven't been able to run run-code.sh without getting an "OCI runtime create failed" error. I tried removing the existing images again, but that didn't help.

Error:
OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"process_linux.go:407: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=7256 /var/lib/docker/overlay2/fb0f79cb30332a756ca8891f40822fc0846adb25716ae5c99b776e772fb47705/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = tesla\\\\n\\\"\"": unknown
Unable to find image 'primebydesign/tensorflow-gpu-src:latest' locally
docker: Error response from daemon: pull access denied for primebydesign/tensorflow-gpu-src, repository does not exist or may require 'docker login'.

@TheDevelolper
Owner

Hello, looking at the error message, it seems that the problem is on my side. I think I must have been relying on a Docker image that I'd built and pushed to my Docker Hub account.

I think I deleted that image accidentally when pushing something for another project. I'll look into how I can restore the image and let you know when it's fixed.

@TheDevelolper
Owner

TheDevelolper commented Mar 6, 2019

Okay, so I believe this problem was caused by using a specific TensorFlow image. I've now updated the image to tensorflow/tensorflow:1.13.1-gpu-py3, as opposed to the latest GPU tag (which will constantly change).

It seems that they'd removed pip3 from the image, and pip3 is required for building our own local Docker image. Since the build fails, Docker then looks online for the image, which generates the error message you see. Click here to see the exact change.

I think this should fix your issue. Have a go and let me know how you get on.
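
The fallback described above (a failed local build followed by Docker trying to pull the image) could be caught earlier with a guard in the run script. This is only a minimal sketch: the function names and the `DOCKER` override variable are hypothetical, and I'm assuming the script refers to the image as primebydesign/tensorflow-gpu-src:latest, as the error message suggests.

```shell
#!/bin/sh
# Hypothetical guard for a run script. The DOCKER variable is an assumption:
# it exists only so the docker binary can be swapped out (e.g. for a stub).
DOCKER="${DOCKER:-docker}"

# Succeeds only if the named image is already present in the local image store.
image_exists_locally() {
    "$DOCKER" image inspect "$1" >/dev/null 2>&1
}

# Prints a hint and returns non-zero when the local image is missing, instead
# of letting `docker run` fall back to pulling it from a registry.
require_local_image() {
    if ! image_exists_locally "$1"; then
        echo "Image '$1' not found locally; the build step probably failed." >&2
        return 1
    fi
}
```

Calling `require_local_image "primebydesign/tensorflow-gpu-src:latest" || exit 1` just before the existing `docker run` line would surface the failed build directly rather than the "pull access denied" message.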

@jonahg0221
Author

After changing to a specific version of tensorflow gpu, I am getting a different error response:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"process_linux.go:407: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=7302 /var/lib/docker/overlay2/fdfc2a8198c70586e027e1d5c04e48dc8201ed5fe800a8c04f2b021635636a28/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = tesla\\\\n\\\"\"": unknown.

@TheDevelolper
Owner

This is because NVIDIA Docker has been updated and requires CUDA 10. I think we'll need to look at downgrading NVIDIA Docker until TensorFlow supports CUDA 10.
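
For anyone reading the log: the `--require=` argument passed to nvidia-container-cli carries the conditions that get checked against the host. As far as I understand the NVIDIA container runtime's convention (an assumption on my part, not something confirmed in this thread), space-separated groups are alternatives and commas join conditions that must all hold. A rough sketch for pulling that expression out of the error text, with hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers for reading an nvidia-container-cli error line.

# Reads log text on stdin and prints the value of the --require= flag,
# i.e. everything between "--require=" and the next " --" flag.
extract_requirements() {
    sed -n '/--require=/{s/.*--require=//;s/ --.*//;p;}'
}

# Reads a requirement expression on stdin and prints one alternative per line.
# Assumption: spaces separate alternatives (OR), commas join conditions (AND).
list_alternatives() {
    tr ' ' '\n' | sed 's/,/ AND /g'
}
```

Piping the error through both, e.g. `./run-code.sh 2>&1 | extract_requirements | list_alternatives`, should print `cuda>=10.0` on one line and `brand=tesla AND driver>=384 AND driver<385` on the next for the log above. Each line can then be compared by hand against the driver version that `nvidia-smi` reports on the host.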
