Feature/faster whisper #2

Merged
merged 45 commits into main from feature/faster-whisper
Jan 12, 2024

Conversation

Contributor

@ruokolt ruokolt commented Jan 11, 2024

  • Replace OpenAI Whisper with Faster Whisper.
  • Upgrade pyannote diarization pipeline from version 3.0 to 3.1
  • Update readme and docs
  • Add .lua version 20240111

@ruokolt ruokolt requested a review from hsnfirooz January 11, 2024 11:13
Member

@hsnfirooz hsnfirooz left a comment

Looks good to me! Thanks Teemu.

Please go ahead and merge it.

Comment on lines +266 to +294
def convert_to_wav(input_file, tmp_dir):
    """Pyannote diarization pipeline does handle resampling to ensure 16 kHz and
    stereo/mono mixing. However, number of supported audio/video formats appears to be
    limited and not listed in README. To be sure, we convert all files to .wav beforehand.

    https://huggingface.co/pyannote/speaker-diarization-3.1
    """

    if str(input_file).lower().endswith(".wav"):
        logger.info(f".. .. File is already in wav format: {input_file}")
        return input_file

    if not Path(input_file).is_file():
        logger.info(f".. .. File does not exist: {input_file}")
        return None

    converted_file = Path(tmp_dir) / Path(Path(input_file).name).with_suffix(".wav")
    if Path(converted_file).is_file():
        logger.info(f".. .. Converted file {converted_file} already exists.")
        return converted_file
    try:
        AudioSegment.from_file(input_file).export(converted_file, format="wav")
        logger.info(f".. .. File converted to wav: {converted_file}")
        return converted_file
    except Exception as err:
        logger.info(f".. .. Error while converting file: {err}")
        return None


Member

I'm not sure whether we will use the same code for Kubernetes in the future, but checking only the file suffix is not enough.
WhisperX and OpenAI Whisper handle the format conversion by calling ffmpeg in a background subprocess. See here.

I suggest we do the same, but it's not necessary at this point.
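
For illustration, the ffmpeg-in-a-subprocess approach mentioned above could look roughly like the sketch below. It is not code from this PR, WhisperX, or OpenAI Whisper: the helper names `build_ffmpeg_command` and `convert_with_ffmpeg` are hypothetical, and forcing mono 16 kHz 16-bit PCM output is an assumption (the docstring above notes that pyannote already handles resampling and channel mixing). It also assumes an `ffmpeg` binary is available on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(input_file, output_file, sample_rate=16000):
    """Build an ffmpeg command line that writes a mono 16-bit PCM wav file.

    Hypothetical helper; the output format is an assumption for this sketch.
    """
    return [
        "ffmpeg",
        "-nostdin",               # never wait for interactive input
        "-y",                     # overwrite the output file if it exists
        "-i", str(input_file),    # input file, format detected by ffmpeg
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        str(output_file),
    ]


def convert_with_ffmpeg(input_file, tmp_dir):
    """Return the path of the converted wav file, or None on failure."""
    input_file = Path(input_file)
    if not input_file.is_file():
        return None
    if shutil.which("ffmpeg") is None:
        # ffmpeg is not installed or not on PATH
        return None

    output_file = Path(tmp_dir) / input_file.with_suffix(".wav").name
    if output_file.resolve() == input_file.resolve():
        # ffmpeg cannot read from and write to the same file
        return None

    try:
        subprocess.run(
            build_ffmpeg_command(input_file, output_file),
            check=True,           # raise CalledProcessError on non-zero exit
            capture_output=True,  # keep ffmpeg's console output out of the logs
        )
    except subprocess.CalledProcessError:
        return None
    return output_file
```

Because ffmpeg detects the input format from the file contents, a function like this would not need to trust the file suffix at all.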

Contributor Author

Yes, the suffix check is hacky and should be replaced with a proper check in the future. Running ffmpeg in a subprocess would also work for the conversion.
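
One possible shape for such a "proper check" is to inspect the file header instead of the file name. The sketch below is only an assumption about what that could mean, not something agreed in this PR: the function name `looks_like_wav` is hypothetical, and it only recognizes the classic RIFF/WAVE container (for example, it does not cover RF64 files or verify that the audio data inside is valid).

```python
def looks_like_wav(path):
    """Return True if the file starts with a RIFF/WAVE header.

    A RIFF file begins with the bytes b"RIFF", a 4-byte chunk size,
    and a 4-byte form type, which is b"WAVE" for wav audio.
    """
    try:
        with open(path, "rb") as f:
            header = f.read(12)
    except OSError:
        # Missing or unreadable file: treat as "not a wav file"
        return False
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

With a check like this, a file named `recording.wav` that actually contains another format would still go through conversion, while a real wav file with an unusual suffix could be passed through unchanged.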

@ruokolt ruokolt merged commit 428a660 into main Jan 12, 2024
2 checks passed
@ruokolt ruokolt deleted the feature/faster-whisper branch January 12, 2024 07:55
ruokolt added a commit that referenced this pull request Jan 12, 2024
- Replace OpenAI Whisper with Faster Whisper
- Upgrade pyannote diarization pipeline from version 3.0 to 3.1
- Update readme and docs
- Add .lua version 20240111
@ruokolt ruokolt mentioned this pull request Jan 30, 2024