Cannot reproduce results on vllava datasets #81

williamium3000 · 2024-08-27T08:19:13Z

Dear authors of VideoLLaMA2,
Thanks for the great work. We tried to reproduce your results on vllava datasets using the latest version of the code. However, we observe a large discrepancy in the three test datasets.

Model	MVBench	Egoschema	ActivityNet	Avg
reported	45.5	42.2	47.6	45.1
reproduced	44.475	38.5	43.55	42.175

We directly use your code, and follow your instructions to download the vllava datasets as well as three test sets, i.e. MVBench, Egoschema, and ActivityNet.

Can you hint at how you achieved the average 45.1 results?

Best
Yijiang

clownrat6 · 2024-08-27T13:40:42Z

Please adopt the new scripts to train our videollama2 under videollava settings. Previous scripts adopt another projector, which is not consistent with the projector of this experiment.

williamium3000 · 2024-08-27T13:46:22Z

I am using the latest, unless you updated it within the last two days.
My results were obtained with the code pulled last weekend. I used pretrain.sh and finetune.sh in scripts/vllava.

lixin4ever · 2024-08-27T15:00:44Z

Please adopt the new scripts to train our videollama2 under videollava settings. Previous scripts adopt another projector, which is not consistent with the projector of this experiment.

Hi Yijiang, my colleague may not have stated this clearly 😂 We updated the fine-tuning script this afternoon. Please check the latest commit and launch your training jobs (on the video-llava dataset) with the new script.

williamium3000 · 2024-08-27T18:48:51Z

Oh thanks!!! Sorry for the misunderstanding. I will try tonight.

williamium3000 · 2024-08-30T09:39:48Z

Hi @lixin4ever and @clownrat6,
We have switched to connector-v35 but still cannot reproduce. The results are even lower than the first version.

Model	MVBench	Egoschema	ActivityNet	Avg
reported	45.5	42.2	47.6	45.1
reproduced	44.475	38.5	43.55	42.175
reproduced connector-v35	43.5	35.38	41.48	40.12
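
For reference, the Avg column in the table above is consistent with an unweighted mean of the three benchmark scores. The sketch below recomputes it and prints the gap to the reported numbers; the dictionary layout and helper name are only illustrative.

```python
# Scores copied from the table above, in the order MVBench, Egoschema, ActivityNet.
scores = {
    "reported": [45.5, 42.2, 47.6],
    "reproduced": [44.475, 38.5, 43.55],
    "reproduced connector-v35": [43.5, 35.38, 41.48],
}


def average(values):
    """Unweighted mean over the benchmarks (assumed to be how Avg is computed)."""
    return sum(values) / len(values)


averages = {name: average(vals) for name, vals in scores.items()}
baseline = averages["reported"]

for name, avg in averages.items():
    # A negative gap means the run is below the reported average.
    print(f"{name}: avg={avg:.3f}, gap vs reported={avg - baseline:+.3f}")
```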

williamium3000 · 2024-08-30T09:44:20Z

We attached all the json files generated here:
config.json
generation_config.json
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
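
A quick way to compare two runs from these files is to look at the training loss recorded in trainer_state.json. The sketch below assumes the file follows the layout written by the Hugging Face Trainer, where `log_history` is a list of dicts and training entries carry a `loss` key; the helper names and the `tail` parameter are hypothetical.

```python
import json


def load_state(path):
    """Read a trainer_state.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def summarize_losses(state, tail=50):
    """Return (first loss, mean of the last `tail` losses, last logged step)."""
    # Keep only entries that log a training loss; the final summary entry has none.
    entries = [e for e in state.get("log_history", []) if "loss" in e]
    if not entries:
        raise ValueError("no training-loss entries found in log_history")
    recent = entries[-tail:]
    mean_recent = sum(e["loss"] for e in recent) / len(recent)
    return entries[0]["loss"], mean_recent, entries[-1].get("step")
```

Calling `summarize_losses(load_state("trainer_state.json"))` on each run gives a rough check of whether both runs converged to a similar loss before looking at benchmark scores.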

williamium3000 · 2024-08-30T09:44:41Z

env we use:
name: videollama
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- ca-certificates=2024.7.2=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_1
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- ncurses=6.4=h6a678d5_0
- openssl=3.0.14=h5eee18b_0
- pip=24.0=py39h06a4308_0
- python=3.9.19=h955ad1f_1
- readline=8.2=h5eee18b_0
- sqlite=3.45.3=h5eee18b_0
- tk=8.6.14=h39e8969_0
- wheel=0.43.0=py39h06a4308_0
- xz=5.4.6=h5eee18b_1
- zlib=1.2.13=h5eee18b_1
- pip:
- absl-py==2.1.0
- accelerate==0.33.0
- aiofiles==23.2.1
- aliyun-python-sdk-core==2.15.1
- aliyun-python-sdk-kms==2.16.3
- altair==5.3.0
- annotated-types==0.7.0
- anyio==4.4.0
- attrs==24.2.0
- beautifulsoup4==4.7.1
- beautifultable==0.7.0
- bitsandbytes==0.43.0
- boto3==1.34.158
- botocore==1.34.158
- bypy==1.8.5
- certifi==2024.7.4
- cffi==1.17.0
- chardet==4.0.0
- charset-normalizer==3.3.2
- click==8.1.7
- contourpy==1.2.1
- crcmod==1.7
- cryptography==43.0.0
- cycler==0.12.1
- decorator==4.4.2
- decord==0.6.0
- deepspeed==0.14.4
- dill==0.3.8
- distro==1.9.0
- docker==3.6.0
- docker-pycreds==0.4.0
- einops==0.6.1
- einops-exts==0.0.4
- evalai==1.3.18
- exceptiongroup==1.2.2
- fastapi==0.112.0
- ffmpy==0.4.0
- filelock==3.14.0
- flash-attn==2.5.8
- fonttools==4.53.1
- fsspec==2024.6.1
- gitdb==4.0.11
- gitpython==3.1.43
- gradio==3.50.0
- gradio-client==0.6.1
- grpcio==1.65.4
- h11==0.14.0
- hf-transfer==0.1.8
- hjson==3.1.0
- httpcore==0.17.3
- httpx==0.24.1
- huggingface-hub==0.23.4
- idna==2.10
- imageio==2.34.0
- imageio-ffmpeg==0.4.9
- importlib-metadata==8.2.0
- importlib-resources==6.4.0
- jinja2==3.1.4
- jiter==0.5.0
- jmespath==0.10.0
- joblib==1.4.2
- jsonschema==4.23.0
- jsonschema-specifications==2023.12.1
- kiwisolver==1.4.5
- latex2mathml==3.77.0
- lxml==4.6.2
- markdown==3.6
- markdown-it-py==3.0.0
- markdown2==2.5.0
- markupsafe==2.1.5
- matplotlib==3.9.1.post1
- mdurl==0.1.2
- moviepy==1.0.3
- mpmath==1.3.0
- multiprocess==0.70.16
- networkx==3.2.1
- ninja==1.11.1.1
- numpy==1.24.4
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==9.1.0.70
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-ml-py==12.560.30
- nvidia-nccl-cu12==2.20.5
- nvidia-nvjitlink-cu12==12.6.20
- nvidia-nvtx-cu12==12.1.105
- openai==1.40.1
- opencv-python==4.6.0.66
- openxlab==0.1.1
- orjson==3.10.6
- oss2==2.17.0
- packaging==24.1
- pandas==2.2.2
- peft==0.4.0
- pillow==10.4.0
- platformdirs==4.2.2
- proglog==0.1.10
- protobuf==4.25.4
- psutil==6.0.0
- py-cpuinfo==9.0.0
- pyarrow==17.0.0
- pycocotools==2.0.8
- pycparser==2.22
- pycryptodome==3.20.0
- pydantic==2.8.2
- pydantic-core==2.20.1
- pydub==0.25.1
- pygments==2.18.0
- pynvml==11.5.3
- pyparsing==3.1.2
- pysubs2==1.7.3
- python-dateutil==2.9.0.post0
- python-multipart==0.0.9
- pytz==2023.4
- pyyaml==6.0.2
- referencing==0.35.1
- regex==2024.7.24
- requests==2.28.2
- requests-toolbelt==1.0.0
- rich==13.4.2
- rpds-py==0.20.0
- s3transfer==0.10.2
- safetensors==0.4.4
- scenedetect==0.6.3
- scikit-learn==1.2.2
- scipy==1.13.1
- semantic-version==2.10.0
- sentencepiece==0.1.99
- sentry-sdk==2.12.0
- setproctitle==1.3.3
- setuptools==60.2.0
- shortuuid==1.0.13
- six==1.16.0
- smmap==5.0.1
- sniffio==1.3.1
- soupsieve==2.5
- starlette==0.37.2
- svgwrite==1.4.3
- sympy==1.13.1
- tabulate==0.9.0
- tensorboard==2.17.0
- tensorboard-data-server==0.7.2
- termcolor==1.1.0
- threadpoolctl==3.5.0
- timm==1.0.3
- tokenizers==0.19.1
- toolz==0.12.1
- torch==2.4.0
- torchaudio==2.4.0
- torchvision==0.19.0
- tqdm==4.65.2
- transformers==4.44.2
- triton==3.0.0
- typing-extensions==4.12.2
- tzdata==2024.1
- urllib3==1.26.19
- uvicorn==0.30.5
- validators==0.12.6
- videollama2==1.0
- wandb==0.17.5
- wavedrom==2.0.3.post3
- websocket-client==1.8.0
- websockets==11.0.3
- werkzeug==3.0.3
- xformers==0.0.27.post2
- zipp==3.19.2
  prefix: /share/yijiangli/docker_conda/envs/videollama
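
To check whether another machine matches this environment, the installed versions can be compared against the pins listed above. This is a minimal sketch using the standard-library `importlib.metadata`; the `PINNED` subset and the helper names are illustrative, and which packages actually matter for the score gap is not established in this thread.

```python
from importlib import metadata

# A few pins copied from the environment listing above; extend as needed.
PINNED = {
    "torch": "2.4.0",
    "transformers": "4.44.2",
    "deepspeed": "0.14.4",
    "flash-attn": "2.5.8",
    "peft": "0.4.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

The comparison is an exact string match, so a build with a local suffix (for example a CUDA tag appended to the torch version) would be reported as a mismatch and should be checked by hand.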

zhuqiangLu · 2024-09-18T10:02:22Z

Hi, I am going to reproduce this experiment. May I ask how many gpus did you use and how many days it took to run?

lixin4ever · 2024-09-18T13:17:04Z

Hi @lixin4ever and @clownrat6, We have switched to connector-v35 but still cannot reproduce. The results are even lower than the first version.

Model	MVBench	Egoschema	ActivityNet	Avg
reported	45.5	42.2	47.6	45.1
reproduced	44.475	38.5	43.55	42.175
reproduced connector-v35	43.5	35.38	41.48	40.12

Hi Yijiang, we found that the latest codebase, which was migrated from the older one (i.e., V1.0) for better compatibility with Qwen2 (and other recent LLMs), does suffer from performance degradation when the language decoder is switched to Mistral. However, we currently have no GPUs available to verify whether the code migration causes this issue. We will continue the verification in early October, please stay tuned.

lixin4ever · 2024-09-18T13:25:08Z

Hi, I am going to reproduce this experiment. May I ask how many gpus did you use and how many days it took to run?

Two A100/A800 nodes (i.e., 16 GPUs) for < 20 hours (pretraining + fine-tuning)

zhuqiangLu · 2024-09-18T13:27:03Z

Hi, I am going to reproduce this experiment. May I ask how many gpus did you use and how many days it took to run?

Two A100/A800 nodes (i.e., 16 GPUs) for < 20 hours (pretraining + fine-tuning)

Oh, that is much faster than I thought, thank you. Are you training the full model or using LoRA?

zhuqiangLu · 2024-09-18T16:17:41Z

Hi, I am going to reproduce this experiment. May I ask how many gpus did you use and how many days it took to run?

Two A100/A800 nodes (i.e., 16 GPUs) for < 20 hours (pretraining + fine-tuning)

Also, I just tried using 8xA100 for the pretraining stage, and the estimated time for that stage is 48 hours. Could you please clarify whether the pretraining stage should include both the Valley and LLaVA-Image datasets?

lixin4ever · 2024-09-19T01:39:36Z

Yes, both Valley and LLaVA-Image should be included.

Regarding the time cost, I just checked the pretraining log of one run and it took around 8 hours on 2 A800 nodes (i.e., 16 80G-A800s).
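
As a rough sanity check on the timings mentioned in this thread, the 8-hour pretraining run on 16 GPUs can be converted to GPU-hours and projected onto 8 GPUs. The sketch assumes throughput scales linearly with GPU count, which is an idealization; differences in hardware, interconnect, or data loading are not modeled, and the function names are illustrative.

```python
def gpu_hours(num_gpus, wall_hours):
    """Total GPU-hours consumed by a run."""
    return num_gpus * wall_hours


def ideal_wall_hours(total_gpu_hours, num_gpus):
    """Wall-clock hours if throughput scaled linearly with GPU count (an idealization)."""
    return total_gpu_hours / num_gpus


# Figures from this thread: about 8 hours of pretraining on 16 A800s,
# versus a 48-hour estimate on 8 A100s.
reference = gpu_hours(16, 8)
expected_on_8 = ideal_wall_hours(reference, 8)
slowdown = 48 / expected_on_8

print(f"reference run: {reference} GPU-hours")
print(f"ideal time on 8 GPUs: {expected_on_8:.1f} h")
print(f"48 h estimate is {slowdown:.1f}x the ideal projection")
```

Under that assumption the 48-hour estimate is about three times slower than expected, which suggests checking data-loading speed and batch-size settings rather than GPU count alone.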

zhuqiangLu · 2024-09-19T01:50:12Z

Yes, both Valley and LLaVA-Image should be included.

Regarding the time cost, I just checked the pretraining log of one run and it took around 8 hours on 2 A800 nodes (i.e., 16 80G-A800s).

Thank you for your response, may I ask for the checkpoint of the model trained on valley dataset? I am keen to see how it performs on my custom dataset.

williamium3000 · 2024-09-19T04:54:04Z

Hi, I am going to reproduce this experiment. May I ask how many gpus did you use and how many days it took to run?

I use 8 A800 80G GPUs. Local and global batch sizes follow the scripts.
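
When the GPU count differs from the reference setup (8 versus 16 here), the usual relation is global batch = per-device batch × number of GPUs × gradient accumulation steps. The sketch below shows how the accumulation steps would need to change to keep the global batch fixed; the batch sizes in the example are hypothetical placeholders, not the values in the repository's scripts, and whether those scripts adjust this automatically is not stated in this thread.

```python
def grad_accum_steps(global_batch, per_device_batch, num_gpus):
    """Gradient accumulation steps needed to reach `global_batch` samples per update."""
    samples_per_step = per_device_batch * num_gpus
    if global_batch % samples_per_step != 0:
        raise ValueError(
            f"global batch {global_batch} is not a multiple of "
            f"per-device batch x GPUs = {samples_per_step}"
        )
    return global_batch // samples_per_step


# Hypothetical numbers for illustration only.
GLOBAL_BATCH = 128
PER_DEVICE_BATCH = 8

steps_16 = grad_accum_steps(GLOBAL_BATCH, PER_DEVICE_BATCH, num_gpus=16)
steps_8 = grad_accum_steps(GLOBAL_BATCH, PER_DEVICE_BATCH, num_gpus=8)
print(f"16 GPUs: {steps_16} accumulation step(s); 8 GPUs: {steps_8}")
```

If the accumulation steps stay the same while the GPU count is halved, the effective global batch is halved as well, which changes the optimization setup relative to the reference run.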
