Add scripts for speech-to-text using whisper and stt+forced alignment with whisperX #13

Merged 11 commits on Mar 28, 2024


900miles

Adds two functions for transcribing an audio file, one using Whisper and one using WhisperX. The WhisperX function can also perform speaker diarization and forced alignment of the text output.
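For readers unfamiliar with Whisper, here is a minimal sketch of what a transcription helper of this kind might look like. It is not the code from this PR: `transcribe_file` and its parameters are hypothetical names chosen for illustration. The only library calls assumed are `whisper.load_model` and the model's `transcribe` method from the `openai-whisper` package, which returns a dict whose `"text"` entry holds the transcript.

```python
from typing import Any, Optional


def transcribe_file(path: str, model: Optional[Any] = None, model_name: str = "base") -> str:
    """Return the transcript of the audio file at `path` (hypothetical helper)."""
    if model is None:
        # Imported lazily so this module loads even if openai-whisper is not installed.
        import whisper

        model = whisper.load_model(model_name)
    # transcribe() returns a dict; "text" holds the full transcript.
    result = model.transcribe(path)
    return result["text"].strip()
```

Passing a preloaded `model` avoids reloading the weights on every call, and also makes the helper easy to exercise with a stub object.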

@Rahul-Brito
Contributor

hey @900miles this looks great so far. Could you try having the functions take an Audio object as input? It's a nice way to keep the signal and sampling rate together throughout the functions.

see here for when it is output, and two lines down from there where it is an input

return Audio(resampler(self.signal.unsqueeze(0)).squeeze(0), 16000)
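To illustrate the suggestion, here is a minimal sketch of a container that keeps the signal and sampling rate together. It is a hypothetical stand-in, not the project's actual `Audio` class: it uses a plain list of floats in place of a tensor so it runs without extra dependencies, and `duration_seconds` and `check_ready_for_stt` are names invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Audio:
    """Hypothetical stand-in: bundles samples with their sampling rate."""

    signal: List[float]
    sampling_rate: int

    def duration_seconds(self) -> float:
        # Number of samples divided by samples per second.
        return len(self.signal) / self.sampling_rate


def check_ready_for_stt(audio: Audio, expected_rate: int = 16000) -> Audio:
    """Return `audio` unchanged, or raise if its rate is not the expected one."""
    if audio.sampling_rate != expected_rate:
        raise ValueError(
            f"expected {expected_rate} Hz audio, got {audio.sampling_rate} Hz"
        )
    return audio
```

Because the rate travels with the samples, a function that receives an `Audio` object can check or resample it without a separate sampling-rate argument.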

@fabiocat93
Contributor

fabiocat93 commented Mar 21, 2024

hey @900miles do you mind adding the packages you use in your process.py file to the dependencies of the package?

@900miles
Author

New commit should allow working directly with Audio objects. I've also added a requirements.txt, but I've never really made one before, so I'm not sure if I did it correctly.

@satra
Contributor

satra commented Mar 22, 2024

instead of a requirements.txt just add it to the pyproject.toml

@satra
Contributor

satra commented Mar 22, 2024

also perhaps change the filename to speech2text.

@900miles
Author

Done and done!

@satra satra closed this Mar 28, 2024
@satra satra reopened this Mar 28, 2024
@satra satra merged commit 1c1ed04 into main Mar 28, 2024
2 checks passed

4 participants