Assessing the performance of the energy-detection sampler in enhancing speech rates #227
lucasgautheron
started this conversation in
Show and tell
The energy-based sampler draws samples from the portions of the recordings whose energy exceeds a given threshold. It can be used to exclude the portions that are pure silence, and it offers a model-independent way of targeting speech segments.
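To make the selection principle concrete, here is a minimal sketch of energy-based window selection in plain Python. It is an illustration under stated assumptions, not the sampler's actual implementation: energy is taken as the mean squared amplitude of each non-overlapping window, no band-pass filtering is applied, and the function names `window_energies` and `select_windows` are hypothetical.

```python
import math


def window_energies(samples, sample_rate, window_seconds):
    """Mean squared amplitude of each complete, non-overlapping window.

    Assumption: energy = mean of squared samples; trailing samples that
    do not fill a whole window are ignored.
    """
    size = int(sample_rate * window_seconds)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    n_windows = len(samples) // size
    return [
        sum(x * x for x in samples[i * size:(i + 1) * size]) / size
        for i in range(n_windows)
    ]


def select_windows(energies, discard_fraction):
    """Indices of the windows kept after discarding the lowest-energy ones.

    discard_fraction=0.0 keeps every window (no selection);
    discard_fraction=0.9 discards the 90% of windows with the lowest energy.
    """
    if not 0 <= discard_fraction < 1:
        raise ValueError("discard_fraction must be in [0, 1)")
    n_discard = math.floor(discard_fraction * len(energies))
    # Rank window indices from lowest to highest energy, drop the lowest ones.
    ranked = sorted(range(len(energies)), key=lambda i: energies[i])
    return sorted(ranked[n_discard:])


# Toy signal: four windows of two samples each (sample_rate=1, 2-second windows).
toy_signal = [0.0, 0.0, 1.0, 1.0, 0.1, 0.1, 2.0, 2.0]
toy_energies = window_energies(toy_signal, sample_rate=1, window_seconds=2)
kept = select_windows(toy_energies, discard_fraction=0.5)
```

In this toy example the two quietest windows (indices 0 and 2) are discarded and `kept` holds the indices of the two loudest ones. Sampling would then draw from the kept windows only.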
To assess the performance of the energy-based sampler, we ran it on the VanDam daylong dataset with a window length of 30 seconds and energy thresholds ranging from 0% (no selection) to 90% (discard the 90% of windows with the lowest energy). For the windows passing the energy cut, we then plotted the average amount of OCH and CHI vocalization time per window; the fraction of those windows that contain at least some OCH/CHI speech; and the ratio of OCH/CHI vocalization time as estimated by the VTC, the LENA, and the CHAT transcripts.
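The first two quantities can be sketched as follows. This is only an illustration, not the code used to produce the plots: windows and annotation segments are assumed to be `(onset, offset)` pairs in seconds, the segments are assumed to belong to a single speaker class (for example CHI) and not to overlap one another, and `overlap` and `window_metrics` are hypothetical helper names.

```python
def overlap(seg_start, seg_end, win_start, win_end):
    """Duration (in seconds) of the intersection of a segment and a window."""
    return max(0.0, min(seg_end, win_end) - max(seg_start, win_start))


def window_metrics(windows, segments):
    """Return (mean vocalization time per window, fraction of windows with speech).

    windows:  list of (onset, offset) pairs for the windows passing the energy cut
    segments: list of (onset, offset) pairs for one speaker class, assumed
              non-overlapping so that durations are not counted twice
    """
    if not windows:
        raise ValueError("at least one window is required")
    speech = [
        sum(overlap(s, e, w_start, w_end) for s, e in segments)
        for w_start, w_end in windows
    ]
    mean_speech = sum(speech) / len(windows)
    fraction_with_speech = sum(1 for t in speech if t > 0) / len(windows)
    return mean_speech, fraction_with_speech


# Three 30-second windows and two made-up vocalization segments.
example_windows = [(0, 30), (30, 60), (60, 90)]
example_segments = [(10, 20), (25, 35)]
mean_speech, fraction_with_speech = window_metrics(example_windows, example_segments)
```

Here the first window contains 15 seconds of speech, the second 5 seconds and the third none, so two of the three windows count as containing speech. Running the same function once per annotation source (VTC, LENA, CHAT) and per energy threshold would give the curves to compare.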
Note: in the CHAT annotations, every portion of the audio is labelled with a speaker, even when the audio is silent. Speech time derived from CHAT is therefore completely unreliable; the presence of speech within a given window is more reliable, and increasingly so as the signal energy increases.