[Audio] Qwen Audio Example #1082

kylesayrs · 2025-01-19T01:13:18Z

Purpose

Demonstrate support for compressing audio models through examples

Prerequisites

[Audio] Support Audio Datasets #1085

Changes

Examples
- examples/multimodal_audio/whisper_example.py
- examples/multimodal_audio/qwen2_audio_example.py
Traceable definitions
- TraceableWhisperForConditionalGeneration
- TraceableQwen2AudioForConditionalGeneration
Add support for the special case where the processor accepts only `**kwargs`, as is the case for the Whisper processor -_-
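For illustration, one way to detect a `**kwargs`-only processor signature is to inspect the callable with the standard library. This is a minimal sketch, not the code from this PR: `accepts_only_var_kwargs`, `call_with_supported_kwargs`, and both stand-in processors are hypothetical names introduced here.

```python
import inspect

_VAR_KEYWORD = inspect.Parameter.VAR_KEYWORD
_NAMED_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def accepts_only_var_kwargs(fn) -> bool:
    """Return True if the only parameter of `fn` is a **kwargs catch-all."""
    params = list(inspect.signature(fn).parameters.values())
    return bool(params) and all(p.kind is _VAR_KEYWORD for p in params)


def call_with_supported_kwargs(fn, **data):
    """Call `fn` with the subset of `data` that its signature can accept.

    If `fn` declares **kwargs, nothing can be filtered by name, so all of
    `data` is forwarded. Otherwise only explicitly named parameters are kept.
    """
    params = list(inspect.signature(fn).parameters.values())
    if any(p.kind is _VAR_KEYWORD for p in params):
        return fn(**data)
    names = {p.name for p in params if p.kind in _NAMED_KINDS}
    return fn(**{key: value for key, value in data.items() if key in names})


# Hypothetical stand-ins for real processors (assumptions, not library APIs)
def text_processor(text, padding=False):
    return {"input_ids": [len(text)]}


def kwargs_only_processor(**kwargs):
    return {"received": sorted(kwargs)}


filtered = call_with_supported_kwargs(text_processor, text="hi", audio=[0.1])
forwarded = call_with_supported_kwargs(kwargs_only_processor, text="hi", audio=[0.1])
```

With a named signature the unknown `audio` column is dropped before the call, while the `**kwargs`-only stand-in receives every column unchanged.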

TODO

Qwen Audio

Testing

Signed-off-by: Kyle Sayers <[email protected]>

## Purpose ##
* Support oneshot with audio datasets

## Changes ##
* Extend `apply_pad_mask_to_batch` to handle cases where there are no `input_ids` and where there might be `decoder_input_ids`
* Extend `TextGenerationDataset` to detect if a dataset is already tokenized based on `processor.model_input_names` rather than only `input_ids`

## Testing ##
* Ran `test_processors.py` to completion, which verifies that the `model_input_names` attribute is defined for most processors
* Ran whisper to completion in #1082

<details><summary>test_processors.py</summary>

```python3
import pytest
from transformers import AutoProcessor


@pytest.mark.parametrize(
    "model_id,expected",
    [
        ("meta-llama/Meta-Llama-3-8B-Instruct", ["input_ids", "attention_mask"]),
        ("mistralai/Mixtral-8x7B-Instruct-v0.1", ["input_ids", "attention_mask"]),
        (
            "Qwen/Qwen2-VL-2B-Instruct",
            [
                "input_ids",
                "attention_mask",
                "pixel_values",
                "image_grid_thw",
                "pixel_values_videos",
                "video_grid_thw",
            ],
        ),
        ("mgoin/pixtral-12b", ["input_ids", "attention_mask", "pixel_values"]),
        ("openai/whisper-large-v2", ["input_features"]),
        (
            "Qwen/Qwen2-Audio-7B-Instruct",
            ["input_ids", "attention_mask", "input_features", "feature_attention_mask"],
        ),
    ],
)
def test_processor_model_input_names(model_id, expected):
    """
    Tests the model_input_names attribute of common model processors
    """
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    assert processor.model_input_names == expected
```

</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
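The tokenization check described in the commit message above can be sketched roughly as follows. This is an illustrative assumption about the logic, not the actual `TextGenerationDataset` code: `is_already_tokenized` is a hypothetical helper, the processors are `SimpleNamespace` stand-ins, and the fallback to `["input_ids"]` when `model_input_names` is missing is a guess.

```python
from types import SimpleNamespace


def is_already_tokenized(column_names, processor) -> bool:
    """Return True if any dataset column is one of the processor's model inputs.

    Assumption: processors without a `model_input_names` attribute are treated
    as if they only produce `input_ids`.
    """
    model_input_names = getattr(processor, "model_input_names", ["input_ids"])
    return any(name in model_input_names for name in column_names)


# Stand-in for an audio processor; "input_features" matches the Whisper entry
# in test_processors.py above
audio_processor = SimpleNamespace(model_input_names=["input_features"])
# Stand-in for a processor that does not define model_input_names
bare_processor = SimpleNamespace()

raw_audio = is_already_tokenized(["audio", "text"], audio_processor)
processed_audio = is_already_tokenized(["input_features", "labels"], audio_processor)
processed_text = is_already_tokenized(["input_ids", "attention_mask"], bare_processor)
```

Checking against `model_input_names` instead of a hard-coded `input_ids` lets a preprocessed audio dataset, which has `input_features` but no `input_ids`, be recognized as already tokenized.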

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs added 3 commits January 19, 2025 00:23

WIP

470b425

Signed-off-by: Kyle Sayers <[email protected]>

WIP: traceable, sample generation WIP

5276c9f

Signed-off-by: Kyle Sayers <[email protected]>

WIP: working, need to change ds split

898f86e

Signed-off-by: Kyle Sayers <[email protected]>

vllm-project deleted a comment from github-actions bot Jan 19, 2025

kylesayrs added 5 commits January 19, 2025 01:35

readme todo

8ca9b6d

Signed-off-by: Kyle Sayers <[email protected]>

split to peoples_speech dataset

98aca16

Signed-off-by: Kyle Sayers <[email protected]>

use cleanup example, add todo check

7067c3f

Signed-off-by: Kyle Sayers <[email protected]>

qwen2, need to add traceability

0848fb6

Signed-off-by: Kyle Sayers <[email protected]>

WIP

fbb6322

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the title ~~Audio Examples~~ [Audio] Support and Examples Jan 19, 2025

kylesayrs added 7 commits January 20, 2025 19:25

use model_input_names

f5daa3d

Signed-off-by: Kyle Sayers <[email protected]>

remove debug statements

fa69e8a

Signed-off-by: Kyle Sayers <[email protected]>

simplify example, fix tokenizer condition

504c7dc

Signed-off-by: Kyle Sayers <[email protected]>

shorten

e2f3735

Signed-off-by: Kyle Sayers <[email protected]>

restore example

9b34135

Signed-off-by: Kyle Sayers <[email protected]>

support audio datasets

2f3a416

Signed-off-by: Kyle Sayers <[email protected]>

mask decoder_input_ids

74283e8

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs mentioned this pull request Jan 20, 2025

[Audio] Support Audio Datasets #1085

Merged

kylesayrs added 3 commits January 20, 2025 20:56

Merge branch 'kylesayrs/audio-datasets' into kylesayrs/audio_examples

742dd6e

update example sample

36ec9f0

Signed-off-by: Kyle Sayers <[email protected]>

asdf

d11af96

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the title ~~[Audio] Support and Examples~~ [Audio] Whisper and Qwen Examples Jan 25, 2025

plug in readme

f1bd1d2

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs force-pushed the kylesayrs/audio_examples branch from 8897342 to f1bd1d2 Compare January 25, 2025 04:53

kylesayrs added 5 commits January 25, 2025 04:54

Merge remote-tracking branch 'origin' into kylesayrs/audio_examples

3c9af2e

add readme

0cbf97c

Signed-off-by: Kyle Sayers <[email protected]>

Merge remote-tracking branch 'origin' into kylesayrs/audio_examples

f8ebc5c

gibberish is produced, even when the model is exactly copied

8c40a65

Signed-off-by: Kyle Sayers <[email protected]>

update readme

6b775ef

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the base branch from main to kylesayrs/whisper_audio_example January 28, 2025 21:04

kylesayrs changed the base branch from kylesayrs/whisper_audio_example to main January 28, 2025 21:04

kylesayrs changed the title ~~[Audio] Whisper and Qwen Examples~~ [Audio] Qwen Audio Example Jan 28, 2025
