Word-level timestamps #276

junwang4 · 2023-01-29T16:17:21Z

Is there a way to get the generated audio together with its word-level timestamps?

carlosdp · 2023-01-29T20:12:58Z

I've done this using https://github.com/lhotse-speech/lhotse, which has a Whisper model with word-level alignment.

1 reply

junwang4 Feb 4, 2023
Author

I am currently using https://github.com/lowerquality/gentle to do word-level alignment on the tortoise-tts audio output. It is pretty fast on CPU, and the results are good enough. If tortoise could output the word-level alignment as a by-product, that would save me the extra step. BTW, what has your experience with Lhotse been? Thanks!
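
For anyone landing here, the post-processing half of the Gentle step could look roughly like the sketch below. It assumes the alignment result is JSON with a `words` list whose entries carry `word`, `case`, and (for words the aligner located) `start` and `end` in seconds. `WordTiming`, `extract_word_timings`, and `SAMPLE` are hypothetical names used for illustration, and the sample data is hand-written, not real aligner output.

```python
import json
from typing import NamedTuple


class WordTiming(NamedTuple):
    word: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def extract_word_timings(alignment: dict) -> list[WordTiming]:
    """Collect (word, start, end) for every word the aligner located.

    Assumption: `alignment` follows a Gentle-style layout, i.e. a "words"
    list whose entries have "word", "case", and, when aligned, "start"
    and "end" in seconds.
    """
    timings = []
    for entry in alignment.get("words", []):
        # Words the aligner could not place carry no start/end; skip them.
        if entry.get("case") != "success":
            continue
        timings.append(
            WordTiming(entry["word"], float(entry["start"]), float(entry["end"]))
        )
    return timings


# Hand-written sample in the assumed layout (not real aligner output).
SAMPLE = json.loads("""
{
  "transcript": "hello big world",
  "words": [
    {"word": "hello", "case": "success", "start": 0.12, "end": 0.48},
    {"word": "big", "case": "not-found-in-audio"},
    {"word": "world", "case": "success", "start": 0.55, "end": 1.02}
  ]
}
""")

if __name__ == "__main__":
    for timing in extract_word_timings(SAMPLE):
        print(f"{timing.start:6.2f} {timing.end:6.2f}  {timing.word}")
```

Skipping unaligned words rather than guessing their times keeps the output honest; whether to interpolate those gaps instead depends on what the timestamps are used for.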
