We are trying to get mood predictions using Essentia models, but we are unsure how to use the outputs. We are getting a list of value pairs, but we don't know how to convert or interpret it (we assume they are arousal and valence pairs).
Hi @Aradhya-Tripathi,
All our arousal/valence models operate on small chunks of 1 to 3 seconds (depending on your chosen embedding model).
The expected output shape is (T, D), where T is the number of chunks (so it depends on the audio duration) and D holds the valence and arousal values, meaning each row is the pair predicted for one chunk. To get overall A/V values for a track, compute the average along the time axis (T).
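As a rough illustration of the averaging step, here is a minimal sketch in plain Python. The function name `average_av` and the sample numbers are made up for this example, and the column order (valence first, arousal second) is an assumption that should be checked against the metadata of the model you are using.

```python
def average_av(predictions):
    """Average a (T, 2) sequence of per-chunk pairs along the time axis.

    `predictions` is any iterable of 2-element rows, e.g. the array
    returned by the model or a plain list of pairs.
    """
    rows = [tuple(float(v) for v in row) for row in predictions]
    if not rows or any(len(row) != 2 for row in rows):
        raise ValueError("expected a non-empty sequence of 2-element rows")

    n_chunks = len(rows)
    # Column order is an assumption: index 0 = valence, index 1 = arousal.
    valence = sum(row[0] for row in rows) / n_chunks
    arousal = sum(row[1] for row in rows) / n_chunks
    return valence, arousal


# Hypothetical per-chunk predictions for a short clip (not real model output).
example = [[5.0, 4.0], [6.0, 5.0], [7.0, 3.0]]
valence, arousal = average_av(example)
print(f"valence={valence:.2f} arousal={arousal:.2f}")
```

If the predictions are already a NumPy array, `predictions.mean(axis=0)` gives the same two values in one call.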