
is there a standard procedure to make tapir run inference on gpu/ubuntu22.04 #45

CHYjeremy opened this issue Aug 6, 2023 · 4 comments


@CHYjeremy

CHYjeremy commented Aug 6, 2023

Hi everyone,

I find it really hard to get TAPIR to run on GPU. Is there a standard procedure for this?

What I do/try is (after creating a new conda environment):

  1. First, as instructed by JAX (for the JAX/CUDA/cuDNN installation, I suppose):
    pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
  2. Then:
    pip install -r requirements_inference.txt

and the following error pops up:
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.func.launch' failed: Failed to load PTX text as a module: CUDA_ERROR_INVALID_IMAGE: device kernel image is invalid; current tracing scope: fusion; current profiling annotation: XlaModule:#hlo_module=jit__threefry_seed,program_id=0#.

Note that the CPU-only version works fine (simply pip install -r requirements_inference.txt).

Could someone share your standard procedure for making it work? Many thanks.
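
For anyone diagnosing the same thing, a quick sanity check is to compare the CUDA versions that JAX and the system report. This is only a generic sketch (nothing TAPIR-specific; adjust for your own setup):

    # Driver and toolkit versions visible on the machine
    nvidia-smi
    nvcc --version
    # Which jax/jaxlib builds pip actually installed
    pip list | grep jax
    # Whether JAX can enumerate the GPU at all
    python -c "import jax; print(jax.devices())"

A PTX-loading failure with CUDA_ERROR_INVALID_IMAGE, as in the message above, often indicates that the jaxlib CUDA build and the locally installed driver/toolkit disagree.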

@CHYjeremy changed the title from "make jax run on gpu/ubuntu22.04" to "is there a standard procedure to make tapir run inference on gpu/ubuntu22.04" on Aug 6, 2023
@cdoersch
Collaborator

cdoersch commented Aug 9, 2023

I've been running on Ubuntu 20.04. I wasn't able to make it work using the NVIDIA driver and CUDA that are distributed with Ubuntu; I needed to uninstall those and install NVIDIA's own versions of the driver, CUDA, and cuDNN matching the JAX version. However, after installing, CUDA wasn't on my PATH (IIRC I did get a similar error message about failing to load PTX as a result). I found that running export PATH=/usr/local/cuda/bin:$PATH before starting the live demo made it work.

I have no idea if this is your issue, however. You might be better off posting this question on the JAX GitHub.
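
A minimal sketch of that PATH workaround, assuming the default NVIDIA install location /usr/local/cuda (adjust the paths for your machine):

    # Put NVIDIA's CUDA toolkit ahead of anything the distro installed
    export PATH=/usr/local/cuda/bin:$PATH
    # The matching libraries may also need to be on the loader path
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    # ptxas should now resolve to the NVIDIA toolkit copy
    which ptxas
    # then launch the live demo as described in the README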

@xbowlove

Have you solved your problem? If so, could you please share the resolution?

@kumar-sanjeeev

@xbowlove
I did the following, and running the live demo on my local laptop GPU worked for me:

  • Created a local virtual env using venv
  • git clone https://github.com/deepmind/tapnet.git
  • pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html (for reference, as per the JAX install instructions)
  • Commented out jax and jaxline in the given tapnet/requirements_inference.txt
  • pip install -r requirements_inference.txt

After this, I followed the remaining steps in the README to run the live demo.
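
Put together as a single command sequence, this is roughly the following. A sketch only: the environment name is arbitrary, and the jax/jaxline lines in requirements_inference.txt still have to be commented out by hand before the last step.

    # Create and activate a local virtual environment
    python3 -m venv tapnet_env
    source tapnet_env/bin/activate

    # Get the code
    git clone https://github.com/deepmind/tapnet.git
    cd tapnet

    # Install the CUDA-enabled JAX build first
    pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

    # With jax and jaxline commented out in requirements_inference.txt,
    # install the remaining inference dependencies
    pip install -r requirements_inference.txt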

@nutsintheshell

Thanks for your answer, but I get the following error at 'from jaxline import platform' after finishing all the steps you described:
Traceback (most recent call last):
  File "/home/jishengyin/newpan/tapnet/./experiment.py", line 30, in <module>
    from jaxline import platform
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/jaxline/platform.py", line 34, in <module>
    import tensorflow as tf
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/__init__.py", line 48, in <module>
    from tensorflow._api.v2 import __internal__
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__ import autograph
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/_api/v2/__internal__/autograph/__init__.py", line 8, in <module>
    from tensorflow.python.autograph.core.ag_ctx import control_status_ctx  # line: 34
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/python/autograph/core/ag_ctx.py", line 21, in <module>
    from tensorflow.python.autograph.utils import ag_logging
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/python/autograph/utils/__init__.py", line 17, in <module>
    from tensorflow.python.autograph.utils.context_managers import control_dependency_on_returns
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/python/autograph/utils/context_managers.py", line 19, in <module>
    from tensorflow.python.framework import ops
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 29, in <module>
    from tensorflow.core.framework import attr_value_pb2
  File "/home/jishengyin/anaconda3/lib/python3.11/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 5, in <module>
    from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (unknown location)
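
That ImportError typically comes from a protobuf package that predates google/protobuf/internal/builder.py. A possible fix, offered only as a suggestion rather than anything from the tapnet docs, is to upgrade protobuf inside the same environment (check first whether the installed TensorFlow pins an older version):

    # See which protobuf version is currently installed
    pip show protobuf
    # Upgrade it; recent releases include the missing builder module
    pip install --upgrade protobuf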
