Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] support video evaluation for qwen2-vl and add mix-evals-video2text #275

Merged
merged 16 commits into from
Sep 24, 2024

Conversation

Luodian
Copy link
Contributor

@Luodian Luodian commented Sep 22, 2024

  • Integrated vision processing for videos and images, improving context handling within the model.
  • Added error logging for missing utility dependencies to inform users about installation requirements.
  • Updated YAML configuration to standardize prompt handling for various video tasks.
  • Bumped version number to indicate ongoing development status.

These changes streamline how visuals are managed in the model, contributing to better assistant responses in tasks involving media.

Eliminated the commented-out import statement for WandbLogger to tidy up the code and enhance readability. This helps maintain focus on active components and prevents confusion over unused code. A cleaner structure contributes to better maintainability in the long run.

No functional changes were made, just a step towards a more streamlined codebase.
- Integrated vision processing for videos and images, improving context handling within the model.
- Added error logging for missing utility dependencies to inform users about installation requirements.
- Updated YAML configuration to standardize prompt handling for various video tasks.
- Bumped version number to indicate ongoing development status.

These changes streamline how visuals are managed in the model, contributing to better assistant responses in tasks involving media.
@Luodian
Copy link
Contributor Author

Luodian commented Sep 23, 2024

Now Qwen2-VL is changed to use original qwen_vl_utils do process images and videos.

Here's the difference:

  1. pure transformers operations
image
  1. qwen_vl_utils operations
image

- Added automatic naming for W&B runs if not specified, improving organization.
- Updated video frame rate from 1.0 to 0.5 for better performance and resource management during visual content processing.
- Streamlined W&B logging by removing redundant code, ensuring cleaner execution flow.

These changes optimize logging efficiency and enhance the overall user experience.
- Updated chat template logic for better formatting in responses, ensuring consistent handling of user and assistant roles.
- Reduced maximum new tokens in multiple evaluation files to ensure more concise outputs and improve efficiency.
- Enhanced clarity in few-shot tasks by explicitly labeling question and answer roles in generated text.
- Simplified logging of contextual and target information during evaluation, ensuring better tracking of results.

These adjustments improve the overall output quality and streamline the evaluation processes.
@Luodian Luodian merged commit 259e494 into main Sep 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants