I've done this using https://github.com/lhotse-speech/lhotse, which has a Whisper model with word-level alignment.
Is there any way to get the audio along with its word-level timestamps?