Feature/faster whisper #2

ruokolt · 2024-01-11T11:12:16Z

Replace OpenAI Whisper with Faster Whisper.
Upgrade pyannote diarization pipeline from version 3.0 to 3.1
Update readme and docs
Add .lua version 20240111

Create the faster-whisper branch from personal fork

hsnfirooz

Looks good to me! Thanks Teemu.

Please go ahead and merge it.

hsnfirooz · 2024-01-11T15:50:33Z

src/speech2text.py

+def convert_to_wav(input_file, tmp_dir):
+    """The pyannote diarization pipeline handles resampling to 16 kHz and
+    downmixing stereo to mono. However, the set of supported audio/video formats
+    appears to be limited and is not listed in the README. To be sure, we convert
+    all files to .wav beforehand.
+
+    https://huggingface.co/pyannote/speaker-diarization-3.1
+    """
+
+    # Check existence first so that a missing .wav path is never returned as valid.
+    if not Path(input_file).is_file():
+        logger.error(f".. .. File does not exist: {input_file}")
+        return None
+
+    # NOTE: this only inspects the suffix, not the actual file content.
+    if str(input_file).lower().endswith(".wav"):
+        logger.info(f".. .. File is already in wav format: {input_file}")
+        return input_file
+
+    # Reuse an earlier conversion result if one is already in tmp_dir.
+    converted_file = Path(tmp_dir) / Path(input_file).with_suffix(".wav").name
+    if converted_file.is_file():
+        logger.info(f".. .. Converted file {converted_file} already exists.")
+        return converted_file
+    try:
+        AudioSegment.from_file(input_file).export(converted_file, format="wav")
+        logger.info(f".. .. File converted to wav: {converted_file}")
+        return converted_file
+    except Exception as err:
+        logger.error(f".. .. Error while converting file: {err}")
+        return None


I am not sure whether we will use the same code for Kubernetes in the future, but checking only the suffix of a file is not enough.
WhisperX and OpenAI Whisper handle the format conversion by calling ffmpeg in a subprocess in the background. Look here.

I suggest we do the same, but it is not necessary at this point.
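For illustration, the ffmpeg-in-a-subprocess approach suggested above could look roughly like the sketch below. It is not taken from WhisperX, OpenAI Whisper, or this PR: the function names `build_ffmpeg_command` and `convert_with_ffmpeg`, the `_converted.wav` output naming, and the choice of mono 16 kHz 16-bit PCM are all assumptions made for the example. It also assumes an `ffmpeg` binary is available on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ffmpeg_command(input_file: Path, output_file: Path,
                         sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes mono 16-bit PCM WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-nostdin",               # never wait for interactive input
        "-y",                     # overwrite the output file if it exists
        "-i", str(input_file),    # ffmpeg detects the input format from content
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(output_file),
    ]


def convert_with_ffmpeg(input_file, tmp_dir) -> Optional[Path]:
    """Convert any ffmpeg-readable file to WAV; return None on failure."""
    input_file = Path(input_file)
    if not input_file.is_file() or shutil.which("ffmpeg") is None:
        return None
    # Distinct output name (an assumption) so the input is never overwritten,
    # even if it already lives in tmp_dir with a .wav suffix.
    output_file = Path(tmp_dir) / f"{input_file.stem}_converted.wav"
    try:
        subprocess.run(build_ffmpeg_command(input_file, output_file),
                       check=True, capture_output=True)
    except subprocess.CalledProcessError:
        return None
    return output_file
```

Because ffmpeg inspects the file content rather than the file name, this variant would not need a suffix check at all: every input is re-encoded, at the cost of an extra conversion for files that are already valid WAV.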

Yes, the suffix check is hacky and should be replaced with a proper check in the future. Running ffmpeg in a subprocess would also work for the conversion.
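One possible shape for such a "proper check" is to look at the file header instead of the name. The sketch below is only an assumption about what that could mean here: `looks_like_wav` is a hypothetical helper, and it only recognises the classic RIFF/WAVE container (the first 12 bytes), not variants such as RF64, and it does not validate the audio data itself.

```python
from pathlib import Path


def looks_like_wav(path) -> bool:
    """Return True if the file starts with a RIFF/WAVE header (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        header = fh.read(12)
    # Bytes 0-3 are the chunk ID "RIFF", bytes 4-7 the chunk size,
    # and bytes 8-11 the format tag "WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

A file that was merely renamed to `.wav` fails this check, while a real WAV file with an unusual suffix passes it, which is the case the suffix comparison gets wrong in both directions.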


ruokolt and others added 30 commits November 28, 2023 14:06

MOD: change whisper model location

5b236ac

MOD: change whisper model location

352b116

MOD: change whisper model location

572ab5f

MOD: change whisper model location

d36f059

trying out whisper cache path

aa6f28b

update

30a70f2

add faster-whisper dependency

7cbd79c

fix no path error for first run

e1cc211

fix module whisper cache folder

c42c259

cleaned the code for faster-whisper compatibility

74b25cd

remove original whisper dependency

ae212e7

changed default thread to six for faster inference

3fb2769

Merge pull request #1 from hsnfirooz/feature/faster-whisper

1367a80

Create the faster-whisper branch from personal fork

remove unused import

8f8a26b

add development .lua

a2b7b92

remove whisper_cache

67b01c5

fix: remove download_root

7e52dd7

del: remove _MODELS and available_models() in utils.py

07de85e

fix: remove hub from HF_HOME path

eb3ad17

fix: remove imports

237205f

remove .wav conversion mention from submit.py

45ed3de

updated .lua files

9837cf6

delete obsolete data folder

1b04e2f

update .gitignore

587bd1c

update README

bc75ea7

fix version

b390820

update docs

3cf38ed

add line spacing to print outs and lint

3f2cec9

add check_language and check_email functions

09b4135

force language 2-letter abbreviation convertions

df4099b

ruokolt added 11 commits January 10, 2024 11:46

lower case args.SPEECH2TEXT_LANGUAGE

a31aee9

fix: keys and values to list

385b655

rename load_model and load_pipeline

024da9b

reorder diarization and transcription

5bade96

convert input audio to .wav again (debug pyannote)

70fbac6

fix logging

f7f3902

fix: language.lower()

8a30e7d

FIX: PYANNOTE_CACHE, config, readme

6828598

DOC: add info about pyannote file format support

d94079d

rename: .lua date

a372657

update readme

fa00f4e

ruokolt requested a review from hsnfirooz January 11, 2024 11:13

ruokolt added 4 commits January 11, 2024 13:32

fix version in .lua

038d5f0

mod: only permit full language names (no abbreviations)

92471f1

fix logging

84ffdea

fix logging

9107a10

hsnfirooz approved these changes Jan 11, 2024


ruokolt merged commit 428a660 into main Jan 12, 2024
2 checks passed

ruokolt deleted the feature/faster-whisper branch January 12, 2024 07:55

ruokolt added a commit that referenced this pull request Jan 12, 2024

Feature/faster whisper (#2)

def9934


ruokolt mentioned this pull request Jan 30, 2024

Add type hints #4

Closed
