Recognizing emotion directly from speech.
- The main problem with traditional systems is that errors made while transcribing the audio propagate downstream and degrade the emotion recognition task.
- Transcription discards important acoustic features such as pitch and loudness.
- Text alone can be emotionally ambiguous. For example, the phrase ‘What a man!’ can indicate surprise or even disgust.
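To make the point about acoustic features concrete, here is a minimal sketch of how pitch and loudness can be read straight from a waveform, with no transcript involved. It is only an illustration, not the feature pipeline used in this work: the function names, the autocorrelation-based pitch estimate, the 80 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions chosen for the example.

```python
import math

def rms_loudness(frame):
    """Root-mean-square energy of a frame, a simple proxy for loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    The search is limited to lags corresponding to fmin..fmax, an assumed
    range that roughly covers speaking voices.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voiced speech frame: a 200 Hz tone, amplitude 0.5,
# 50 ms long at an 8 kHz sample rate (hypothetical values for illustration).
sample_rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]

pitch_hz = estimate_pitch(frame, sample_rate)
loudness = rms_loudness(frame)
print(f"pitch: {pitch_hz:.1f} Hz, RMS loudness: {loudness:.3f}")
```

Both values come from the signal itself, so they survive even when a transcript would be wrong or ambiguous. On real recordings a dedicated audio library would normally replace this hand-rolled estimator.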
Only the audio tracks have been used.
Dataset description:
○ Gender-balanced set of 24 professional actors.
○ 7 emotions in total: calm, happy, sad, angry, fearful, surprise, and disgust.
○ Each expression is produced at two levels of emotional intensity, with an additional neutral expression.
○ 1250 samples in total.
Dataset description:
○ Voices of two actresses, aged 26 and 64.
○ 7 emotions in total: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.
○ Audiometric testing indicated that both actresses have thresholds within the normal range.
○ 1370 samples in total.