[feat] support video evaluation for qwen2-vl and add mix-evals-video2text #275

Luodian · 2024-09-22T18:03:01Z

Integrated vision processing for videos and images, improving context handling within the model.
Added error logging for missing utility dependencies to inform users about installation requirements.
Updated YAML configuration to standardize prompt handling for various video tasks.
Bumped version number to indicate ongoing development status.

These changes streamline how visuals are managed in the model, contributing to better assistant responses in tasks involving media.

…anage samples saving process

Eliminated the commented-out import statement for WandbLogger to tidy up the code and enhance readability. This helps maintain focus on active components and prevents confusion over unused code. A cleaner structure contributes to better maintainability in the long run. No functional changes were made, just a step towards a more streamlined codebase.

- Integrated vision processing for videos and images, improving context handling within the model. - Added error logging for missing utility dependencies to inform users about installation requirements. - Updated YAML configuration to standardize prompt handling for various video tasks. - Bumped version number to indicate ongoing development status. These changes streamline how visuals are managed in the model, contributing to better assistant responses in tasks involving media.

Luodian · 2024-09-23T08:14:39Z

Now Qwen2-VL is changed to use original qwen_vl_utils do process images and videos.

Here's the difference:

pure transformers operations

qwen_vl_utils operations

- Added automatic naming for W&B runs if not specified, improving organization. - Updated video frame rate from 1.0 to 0.5 for better performance and resource management during visual content processing. - Streamlined W&B logging by removing redundant code, ensuring cleaner execution flow. These changes optimize logging efficiency and enhance the overall user experience.

- Updated chat template logic for better formatting in responses, ensuring consistent handling of user and assistant roles. - Reduced maximum new tokens in multiple evaluation files to ensure more concise outputs and improve efficiency. - Enhanced clarity in few-shot tasks by explicitly labeling question and answer roles in generated text. - Simplified logging of contextual and target information during evaluation, ensuring better tracking of results. These adjustments improve the overall output quality and streamline the evaluation processes.

Luodian added 11 commits September 16, 2024 02:29

feat: add new ouput_path saving logic and add evaluation tracker to m…

9e4a2e1

…anage samples saving process

add: regression test

edf2c00

add: regression test

7464a05

clean: unuseful code

022f1bb

[task] add mix_evals for video evaluation

7452238

Merge branch 'origin/main'

b198dd7

✨ Improve model name sanitization for Hugging Face formats

21a906a

🧹 Refactor settings for Llava OneVision model

7646dbc

Merge branch 'main' into dev/fix_output_path

d96f60d

kcz358 approved these changes Sep 23, 2024

View reviewed changes

Luodian added 5 commits September 23, 2024 16:49

feat: change qwen2 vl video reading to 0.25 fps to avoid oom

638b2e0

🎥 Update video message structure in Qwen2_VL

34b35cd

Update qwen2_vl.py

f0f5e62

Luodian merged commit 259e494 into main Sep 24, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feat] support video evaluation for qwen2-vl and add mix-evals-video2text #275

[feat] support video evaluation for qwen2-vl and add mix-evals-video2text #275

Luodian commented Sep 22, 2024

Luodian commented Sep 23, 2024

[feat] support video evaluation for qwen2-vl and add mix-evals-video2text #275

[feat] support video evaluation for qwen2-vl and add mix-evals-video2text #275

Conversation

Luodian commented Sep 22, 2024

Luodian commented Sep 23, 2024