
model zero-shot retrieval capability of the videochat2 stage-1 model #212

wengzejia1 opened this issue Jul 26, 2024 · 7 comments

@wengzejia1

Hello. Could you provide the evaluation results (especially the zero-shot retrieval performance on the MSR-VTT dataset) for the VideoChat2 stage-1 model? Should it perform better than the UMT model or not? Thanks.

@wengzejia1
Author

wengzejia1 commented Jul 27, 2024

Also, the current code seems to have some problems with stage-1 evaluation. I modified the code to run the evaluation on your released stage-1 checkpoint, but the result is strange: the VTM scores are worse than the VTC scores. Can you help me verify this, or release the stage-1 evaluation code? Thank you.

[Screenshot: stage-1 retrieval evaluation output, with VTM scores lower than VTC scores]
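
For context, my expectation that VTM should beat VTC comes from how BLIP2/UMT-style retrieval evaluation is usually structured: VTC ranks all candidates by contrastive similarity, and VTM then re-ranks only the top-k candidates with the heavier cross-modal matching head, so VTM is normally the stronger score. Below is a minimal, self-contained sketch of that two-stage ranking; the toy data and the dummy `vtm_head` are illustrative stand-ins, not this repository's actual code.

```python
import torch
import torch.nn.functional as F

def rank_of_ground_truth(scores: torch.Tensor) -> torch.Tensor:
    """For each text (row), the rank of its paired video (the diagonal)."""
    order = scores.argsort(dim=1, descending=True)   # (N, N) permutation per row
    gt = torch.arange(scores.size(0)).unsqueeze(1)   # (N, 1) ground-truth index
    return (order == gt).float().argmax(dim=1)       # 0 means retrieved first

# Toy stand-ins for real model outputs: N paired text/video embeddings.
torch.manual_seed(0)
N, D, k = 8, 32, 4
text_feats = F.normalize(torch.randn(N, D), dim=1)
video_feats = F.normalize(text_feats + 0.5 * torch.randn(N, D), dim=1)

# VTC: score every text-video pair by contrastive cosine similarity.
vtc_scores = text_feats @ video_feats.T              # (N_texts, N_videos)

# Dummy matching head. The real VTM head cross-attends over full token
# sequences and outputs a match logit, which is far more expensive, so it
# is only applied to the top-k VTC candidates.
def vtm_head(t_idx: int, v_idx: int) -> float:
    return float(text_feats[t_idx] @ video_feats[v_idx])

# VTM: re-score only the top-k VTC candidates; everything else stays -inf.
vtm_scores = torch.full_like(vtc_scores, float("-inf"))
topk_idx = vtc_scores.topk(k, dim=1).indices
for t in range(N):
    for v in topk_idx[t].tolist():
        vtm_scores[t, v] = vtm_head(t, v)

print("VTC R@1:", (rank_of_ground_truth(vtc_scores) == 0).float().mean().item())
print("VTM R@1:", (rank_of_ground_truth(vtm_scores) == 0).float().mean().item())
```

With a trained matching head, the re-ranked VTM scores should match or improve on VTC, which is why the opposite result on the released checkpoint looks suspicious to me.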

@Andy1621
Collaborator

Hi! You may refer to BLIP2 for help. As I recall, the stage-1 model does not work better than UMT.

@wengzejia1
Author

When I resume from your released stage-1 checkpoint and continue the stage-1 training, the VTM results become better than the VTC results; however, my testing on the released stage-1 checkpoint itself shows VTM results worse than the VTC results.
I would appreciate it if you could update the stage-1 evaluation code and report the zero-shot retrieval results for your released stage-1 model.

@Andy1621
Collaborator

Hi! It may be difficult to release the stage-1 evaluation results, since the evaluation was done by another intern who has since left. 😭

@wengzejia1
Author

wengzejia1 commented Jul 29, 2024

Also, the code for loading the pretrained UMT model seems to have a problem caused by the "vision_encoder." name prefix. The parameter names of the UMT vision encoder in the umt-l16 checkpoint contain the "vision_encoder." prefix, while the parameters of the ViT model built in the code do not. This mismatch causes the pretrained weights to fail to load, which breaks the reimplementation of stage-1.

I would appreciate it if you could check whether this bug exists. Thank you so much.
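
For reference, here is a minimal sketch of the workaround I have in mind, assuming a standard PyTorch checkpoint; the path and variable names are illustrative, not the repository's exact ones:

```python
import torch

# Load the UMT-L/16 checkpoint (path is illustrative).
ckpt = torch.load("umt_l16.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# Keep only vision-encoder weights and strip the prefix so the keys match
# a bare ViT, e.g. "vision_encoder.blocks.0.attn.qkv.weight" becomes
# "blocks.0.attn.qkv.weight".
prefix = "vision_encoder."
vit_state_dict = {
    k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)
}

# With `vit` being the stage-1 vision transformer built elsewhere,
# strict=False reports (rather than raises on) any remaining mismatches:
# missing, unexpected = vit.load_state_dict(vit_state_dict, strict=False)
# print("missing:", missing, "unexpected:", unexpected)
```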

@wengzejia1
Author

> Hi! It may be difficult to release the stage-1 evaluation results, since the evaluation was done by another intern who has since left. 😭

Could you tell me the name of the author who did the stage-1 training? Maybe I can email him for consultation. 😬

@Andy1621
Collaborator

> > Hi! It may be difficult to release the stage-1 evaluation results, since the evaluation was done by another intern who has since left. 😭
>
> Could you tell me the name of the author who did the stage-1 training? Maybe I can email him for consultation. 😬

Yizhuo Li conducted the experiments~
