Feature/faster whisper #2
ruokolt commented Jan 11, 2024 (edited)
- Replace OpenAI Whisper with Faster Whisper.
- Upgrade pyannote diarization pipeline from version 3.0 to 3.1
- Update readme and docs
- Add .lua version 20240111
Create the faster-whisper branch from personal fork
Looks good to me! Thanks Teemu.
Please go ahead and merge it.
def convert_to_wav(input_file, tmp_dir):
    """Pyannote diarization pipeline does handle resampling to ensure 16 kHz and
    stereo/mono mixing. However, number of supported audio/video formats appears to be
    limited and not listed in README. To be sure, we convert all files to .wav beforehand.

    https://huggingface.co/pyannote/speaker-diarization-3.1
    """

    if str(input_file).lower().endswith(".wav"):
        logger.info(f".. .. File is already in wav format: {input_file}")
        return input_file

    if not Path(input_file).is_file():
        logger.info(f".. .. File does not exist: {input_file}")
        return None

    converted_file = Path(tmp_dir) / Path(Path(input_file).name).with_suffix(".wav")
    if Path(converted_file).is_file():
        logger.info(f".. .. Converted file {converted_file} already exists.")
        return converted_file
    try:
        AudioSegment.from_file(input_file).export(converted_file, format="wav")
        logger.info(f".. .. File converted to wav: {converted_file}")
        return converted_file
    except Exception as err:
        logger.info(f".. .. Error while converting file: {err}")
        return None
I'm not sure whether we will use the same code for Kubernetes in the future, but checking only the file suffix is not enough.
WhisperX and OpenAI Whisper handle the format change by calling ffmpeg in a subprocess in the background. Look here.
I suggest we do the same, but it's not necessary at this point.
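For reference, the ffmpeg-in-a-subprocess approach could look roughly like the sketch below. It is only an illustration, not the code used by WhisperX or OpenAI Whisper: the function names `build_ffmpeg_command` and `convert_with_ffmpeg` are hypothetical, the 16 kHz mono 16-bit PCM output is an assumption based on the docstring above, and it assumes an `ffmpeg` binary is available on `PATH`.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(input_file, output_file, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM wav (hypothetical helper)."""
    return [
        "ffmpeg",
        "-nostdin",               # never wait for interactive input
        "-y",                     # overwrite the output file if it exists
        "-i", str(input_file),    # let ffmpeg detect the input format itself
        "-ac", "1",               # mix down to mono
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption)
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        str(output_file),
    ]


def convert_with_ffmpeg(input_file, tmp_dir):
    """Return the converted wav path, or None if the conversion fails."""
    input_file = Path(input_file)
    if not input_file.is_file():
        return None
    output_file = Path(tmp_dir) / input_file.with_suffix(".wav").name
    try:
        subprocess.run(
            build_ffmpeg_command(input_file, output_file),
            check=True,
            capture_output=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # ffmpeg is not installed, or it could not decode the input
        return None
    return output_file
```

Because ffmpeg inspects the file contents rather than the name, this path would not need a suffix check at all; every input could simply be passed through it.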
Yes, the suffix check is hacky and should be replaced with a proper check in the future. Calling ffmpeg in a subprocess would also work for the conversion.
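One possible shape for such a "proper check", sketched here as an assumption rather than a decided design, is to try parsing the file header with Python's standard `wave` module instead of trusting the suffix. The helper name `is_readable_wav` is hypothetical. Note that `wave` mainly handles uncompressed PCM, so a compressed wav would be reported as not readable and would go through conversion.

```python
import wave


def is_readable_wav(path):
    """Return True if the file parses as a wav, regardless of its suffix."""
    try:
        # wave.open reads and validates the RIFF/WAVE header
        with wave.open(str(path), "rb"):
            return True
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated or empty, or not readable at all
        return False
```

In `convert_to_wav`, the `endswith(".wav")` test could then be swapped for `is_readable_wav(input_file)`, so that a mislabeled file is converted instead of being returned unchanged.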