RuntimeError: Tensor on device meta is not on the expected device cuda:0! #4

Open
bryanlinnan opened this issue Jan 10, 2025 · 7 comments


@bryanlinnan

Hi,
During the model.generate call in the script run_llava_mini.py, I hit this error:

RuntimeError: Tensor on device meta is not on the expected device cuda:0!

For reference, CUDA is 11.6 and the GPU has 8 GB of memory.

My pip list is as follows:
accelerate 0.29.0
addict 2.4.0
aiofiles 23.2.1
annotated-types 0.7.0
anyio 4.8.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
attrs 24.2.0
babel 2.16.0
bitsandbytes 0.45.0
bleach 6.1.0
cachetools 5.5.0
certifi 2024.12.14
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.8
comm 0.2.2
debugpy 1.8.5
decord 0.6.0
deepspeed 0.12.6
defusedxml 0.7.1
descartes 1.1.0
docker-pycreds 0.4.0
einops 0.6.1
einops-exts 0.0.4
exceptiongroup 1.2.2
executing 2.1.0
fastapi 0.115.6
fastjsonschema 2.20.0
ffmpy 0.5.0
filelock 3.16.1
fire 0.6.0
fqdn 1.5.1
fsspec 2024.12.0
gitdb 4.0.12
GitPython 3.1.44
gnupg 2.3.1
gradio 5.9.1
gradio_client 1.5.2
h11 0.14.0
hjson 3.1.0
httpcore 1.0.7
httpx 0.28.1
huggingface-hub 0.27.1
idna 3.10
importlib_metadata 8.4.0
ipykernel 6.29.5
ipython 8.27.0
ipywidgets 8.1.5
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter 1.1.1
jupyter_client 8.6.2
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.2
jupyter_server_terminals 0.5.3
jupyterlab 4.2.5
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_widgets 3.0.13
latex2mathml 3.77.0
llava_mini 1.0.0 /workspace/LLaVA-Mini-main
markdown-it-py 3.0.0
markdown2 2.5.2
MarkupSafe 2.1.5
matplotlib-inline 0.1.7
mdurl 0.1.2
mistune 3.0.2
mmcv-full 1.7.1
mmdet 2.28.2
mpmath 1.3.0
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
netifaces 0.11.0
networkx 3.4.2
ninja 1.11.1.3
notebook 7.2.2
notebook_shim 0.2.4
numpy 1.23.5
nuscenes-devkit 1.1.10
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.6.85
nvidia-nvtx-cu12 12.1.105
opencv-python 4.10.0.84
orjson 3.10.14
overrides 7.7.0
packaging 24.2
pandas 2.2.3
pandocfilters 1.5.1
parso 0.8.4
peft 0.14.0
pillow 11.1.0
pip 24.2
platformdirs 4.2.2
prometheus_client 0.20.0
prompt_toolkit 3.0.47
protobuf 5.29.3
psutil 6.1.1
pure_eval 0.2.3
py-cpuinfo 9.0.0
pycocotools 2.0.8
pycparser 2.22
pycryptodomex 3.21.0
pydantic 2.10.4
pydantic_core 2.27.2
pydub 0.25.1
Pygments 2.19.1
pynvml 12.0.0
pypcd 0.1.1
pyquaternion 0.9.9
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
python-lzf 0.2.6
python-multipart 0.0.20
pytz 2024.2
PyYAML 6.0.2
pyzmq 26.2.0
referencing 0.35.1
regex 2024.11.6
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.9.4
rpds-py 0.20.0
ruff 0.8.6
safehttpx 0.1.6
safetensors 0.5.2
scikit-learn 1.2.2
scipy 1.15.0
semantic-version 2.10.0
Send2Trash 1.8.3
sentencepiece 0.1.99
sentry-sdk 2.19.2
setproctitle 1.3.4
setuptools 75.1.0
Shapely 1.8.5
shellingham 1.5.4
shortuuid 1.0.13
six 1.17.0
smmap 5.0.2
sniffio 1.3.1
stack-data 0.6.3
starlette 0.41.3
svgwrite 1.4.3
sympy 1.13.3
termcolor 2.4.0
terminado 0.18.1
terminaltables 3.1.10
threadpoolctl 3.5.0
timm 0.6.13
tinycss2 1.3.0
tokenizers 0.19.0
tomli 2.0.1
tomlkit 0.13.2
torch 2.1.2
torchvision 0.16.2
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
transformers 4.43.1
triton 2.1.0
typer 0.15.1
types-python-dateutil 2.9.0.20240821
typing_extensions 4.12.2
tzdata 2024.2
uri-template 1.3.0
urllib3 2.3.0
uvicorn 0.34.0
wandb 0.19.2
wavedrom 2.0.3.post3
wcwidth 0.2.13
webcolors 24.8.0
websocket-client 1.8.0
websockets 14.1
wheel 0.44.0
widgetsnbextension 4.0.13
yapf 0.40.2

Any idea what might be causing this?

@zhangshaolei1998
Collaborator

This problem is most likely caused by insufficient GPU memory: the full-precision model cannot be loaded entirely onto the GPU, which can leave some weights on the meta device instead of on cuda:0.
We have updated run_llava_mini.py; you can try --load-4bit or --load-8bit to see whether it runs within 8 GB of memory.
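
For a rough sense of why 8 GB is tight, here is a back-of-the-envelope sketch. It assumes an ~8B-parameter language backbone (the exact sizes depend on the checkpoint) and ignores the vision encoder, activations, KV cache, and CUDA context overhead:

```python
# Illustrative weight-memory estimate per precision (assumed ~8B-parameter
# backbone; ignores vision tower, activations, KV cache, CUDA overhead).
params = 8e9
for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
# fp16:  ~16 GB -> cannot fit on an 8 GB GPU
# 8-bit: ~8 GB  -> borderline; may still spill
# 4-bit: ~4 GB  -> leaves headroom for the rest of the pipeline
```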

@bryanlinnan
Author

This problem is most likely caused by insufficient GPU memory... you can try --load-4bit or --load-8bit to see whether it runs within 8 GB of memory.

Thanks for the reply. I replaced run_llava_mini.py with the updated file and then got this error:

ValueError: You can't pass load_in_4bit or load_in_8bit as a kwarg when passing the quantization_config argument at the same time.

@zhangshaolei1998
Collaborator

Thanks for pointing that out!
You should also update llavamini/model/builder.py and use --load-8bit.
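
For context, that ValueError is raised when recent transformers versions receive load_in_8bit or load_in_4bit as bare kwargs while a quantization_config is also passed. Below is a minimal sketch of the conflict-free pattern; it is not the repo's actual builder.py, and the model id is only illustrative:

```python
# Minimal sketch: route the 8-bit/4-bit choice through a single
# BitsAndBytesConfig instead of also passing the bare load_in_8bit /
# load_in_4bit kwargs. Not the repo's actual builder.py.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

model = AutoModelForCausalLM.from_pretrained(
    "ICTNLP/llava-mini-llama-3.1-8b",   # illustrative model id
    quantization_config=quant_config,   # do NOT also pass load_in_8bit=True here
    torch_dtype=torch.float16,
    device_map="auto",                  # requires accelerate
)
```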

@bryanlinnan
Author

Thanks for pointing that out! You should also update llavamini/model/builder.py and use --load-8bit.

Yes, that worked. Thanks again for the great open-source work!

@IamShubhamGupto

Can we also do 8-bit/4-bit inference with the web UI demo? @zhangshaolei1998

@zhangshaolei1998
Collaborator

Can we also do 8-bit/4-bit inference with the web UI demo?

@IamShubhamGupto --load-8bit is also available for the web UI demo; add it when starting the model worker.

@shubhamgupto

Would it be possible to add support for the Jetson Orin Nano runtime? I tried running the model with 4-bit quantization after modifying the interface slightly, and the OS seems to crash, probably because of OOM.

It would be really interesting to see this run on edge devices.
