Speech Emotion Recognition

Speech is the most natural way of expressing ourselves as humans, so it is only natural to extend this communication medium to computer applications. We define speech emotion recognition (SER) systems as a collection of methodologies that process and classify speech signals to detect the emotions embedded in them. SER is not a new field: it has been around for over two decades and has regained attention thanks to recent advancements.

This is my first attempt at audio classification on Colab. I am using the popular Crema dataset from Speech Emotion Recognition (en), which contains 7,442 original clips from 91 actors (48 male and 43 female) of a wide range of ages, races, and ethnicities. The actors spoke from a selection of 12 sentences, each presented using one of six emotions (anger, disgust, fear, happiness, neutral, and sadness).
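
To make the six-class labeling concrete, here is a minimal sketch of how emotion labels might be read from the clip file names. It assumes the files follow a `<actor>_<sentence>_<emotion>_<level>.wav` pattern (for example `1001_DFA_ANG_XX.wav`) with three-letter emotion codes. That naming convention, the code table, and the helper names `parse_clip_name` and `count_emotions` are assumptions for illustration and are not stated above, so check them against the files you actually download.

```python
from collections import Counter
from pathlib import Path

# Assumed three-letter codes for the six emotions listed in the text.
EMOTION_CODES = {
    "ANG": "anger",
    "DIS": "disgust",
    "FEA": "fear",
    "HAP": "happiness",
    "NEU": "neutral",
    "SAD": "sadness",
}


def parse_clip_name(filename):
    """Split an assumed '<actor>_<sentence>_<emotion>_<level>.wav' name into fields."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {filename!r}")
    actor_id, sentence, emotion_code, level = parts
    if emotion_code not in EMOTION_CODES:
        raise ValueError(f"unknown emotion code {emotion_code!r} in {filename!r}")
    return {
        "actor_id": actor_id,
        "sentence": sentence,
        "emotion": EMOTION_CODES[emotion_code],
        "level": level,
    }


def count_emotions(filenames):
    """Count how many clips fall into each emotion class."""
    return Counter(parse_clip_name(name)["emotion"] for name in filenames)


if __name__ == "__main__":
    # Hypothetical file names used only to exercise the parser.
    examples = ["1001_DFA_ANG_XX.wav", "1002_IEO_SAD_HI.wav", "1003_TIE_ANG_LO.wav"]
    for name in examples:
        print(parse_clip_name(name))
    print(count_emotions(examples))
```

Running `count_emotions` over the real file list is a quick way to check the class balance before training a classifier.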