Add scripts for speech-to-text using whisper and stt+forced alignment with whisperX #13

900miles · 2024-03-19T00:33:52Z

Adds two functions that transcribe an audio file, one using Whisper and one using WhisperX. The WhisperX function can also perform speaker diarization and forced alignment of the text output.


Rahul-Brito · 2024-03-20T17:36:02Z

Hey @900miles, this looks great so far. Could you try having the functions take the Audio class as input? It is a nice way to keep the signal and sampling rate together as they pass through the functions.

See here for where it is returned as an output, and two lines down from there for where it is taken as an input:

b2aiprep/src/b2aiprep/process.py

Line 51 in b5b342f

return Audio(resampler(self.signal.unsqueeze(0)).squeeze(0), 16000)
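To illustrate the suggestion, here is a minimal sketch of the pattern, not the package's actual code. It assumes a simplified `Audio` that holds a plain list of floats (the referenced line indicates the real class wraps a tensor and uses a proper resampler). The names `to_16khz` and `duration_seconds` and the linear-interpolation resampling are hypothetical, chosen only so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Audio:
    """Hypothetical stand-in for the package's Audio class: signal plus sample rate."""

    signal: List[float]
    sample_rate: int

    def to_16khz(self) -> "Audio":
        # Naive linear-interpolation resampling, used only to keep this sketch
        # dependency-free; it is an assumption, not the package's resampler.
        target_rate = 16000
        if self.sample_rate == target_rate or not self.signal:
            return Audio(list(self.signal), target_rate)
        n_out = max(1, round(len(self.signal) * target_rate / self.sample_rate))
        step = self.sample_rate / target_rate
        last = len(self.signal) - 1
        out = []
        for i in range(n_out):
            pos = i * step
            lo = min(int(pos), last)
            hi = min(lo + 1, last)
            frac = pos - lo
            out.append(self.signal[lo] * (1 - frac) + self.signal[hi] * frac)
        return Audio(out, target_rate)


def duration_seconds(audio: Audio) -> float:
    # No separate sample-rate argument is needed: it travels with the signal.
    return len(audio.signal) / audio.sample_rate
```

A function written this way can accept the output of another without the caller tracking the sampling rate separately, for example `duration_seconds(audio.to_16khz())`.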

fabiocat93 · 2024-03-21T15:48:44Z

Hey @900miles, do you mind adding the packages you use in your process.py file to the package's dependencies?

900miles · 2024-03-22T01:19:10Z

The new commit should allow working directly with Audio objects. I've also added a requirements.txt, but I've never made one before, so I'm not sure I did it correctly.

satra · 2024-03-22T02:43:15Z

instead of a requirements.txt just add it to the pyproject.toml

satra · 2024-03-22T02:44:34Z

also perhaps change the filename to speech2text.

900miles · 2024-03-22T03:22:10Z

Done and done!


Miles B Silva added 3 commits March 18, 2024 20:19

Add scripts for speech-to-text using whisper and stt+forced alignment with whisperX

b5b342f

Add scripts for speech-to-text using whisper and stt+forced alignment with whisperX

d9c90c4

Add args to function docstring

d8db77f

Use Audio object instead of file paths; add requirements.txt

1717ade

Move requirements to pyproject.toml and rename file

cf84b79

satra added 5 commits March 27, 2024 13:23

ref: add some dependencies and format annotations

8db9868

fix: merge conflicts

0c77b01

fix: adjust to 16KHz sample rate and allow for char level transcription

56c36bf

tst: add whisperx test

0c4671a

Merge pull request #21 from sensein/enh/whisper

23bdf24

remove whisper and update whisperx

satra closed this Mar 28, 2024

satra reopened this Mar 28, 2024

ref: restrict python for TTS

4a31e77

satra added release minor labels Mar 28, 2024

satra merged commit 1c1ed04 into main Mar 28, 2024
2 checks passed
