Created SER tutorial #201

wilke0818 · 2024-11-18T21:58:37Z

Description

Creates a tutorial using Senselab for SER.

Related Issue(s)

Motivation and Context

We lacked a tutorial for the audio classification functionality, which currently has one specific implementation: speech emotion recognition. With this tutorial, novice users can better understand how Senselab might be useful for this task.

How Has This Been Tested?

Through Colab

fabiocat93 · 2024-11-18T22:30:50Z

Thanks @wilke0818 for the tutorial! It’s helpful and does a good job of addressing real challenges that users might face. I like it, and I’m curious to see how actual users respond.

A few things are still missing to make this more complete:

Senselab installation: We should install senselab only when running in Colab. If the user is running elsewhere, we can assume they’ve already set it up. You can use a function like this:

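def is_colab():
    """Return True when the notebook is running on Google Colab."""
    try:
        # The google.colab package is normally importable only inside Colab.
        import google.colab  # noqa: F401
        return True
    except ImportError:
        return False

# %pip is an IPython magic, so this must run in a notebook cell.
if is_colab():
    %pip install senselab
else:
    print("Not running on Colab. Skipping installation.")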

API for classification and SER: Add an API for audio classification and speech emotion recognition (SER). For now, this API should call the Hugging Face-based function only if the model uses Hugging Face. If other types of models are used, raise a NotImplementedError. This will make this task more in line with the others in senselab and more easily maintainable.
Documentation for classification and SER: Add documentation for these tasks. It should explain what the tasks are and link to the tutorial. You can use resources like this to get started: https://huggingface.co/tasks/audio-classification.
Out-of-date branch: This branch is out of date with the base branch. Please update it before requesting a new review.

Let me know if you need help with any of these points!
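
For illustration, the dispatch rule described in the API point could look roughly like the sketch below. Every name here (`ModelSource`, `ModelRef`, `classify_audios`, the `hf_backend` argument) is hypothetical and is not senselab's actual API; the Hugging Face-based function is passed in as a plain callable so the sketch stays self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Sequence


class ModelSource(Enum):
    """Hypothetical tag for where a model comes from."""

    HUGGING_FACE = "hugging_face"
    OTHER = "other"


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical stand-in for a model descriptor (not a senselab class)."""

    path: str
    source: ModelSource


Scores = Dict[str, float]
Backend = Callable[[Sequence[object], ModelRef], List[Scores]]


def classify_audios(
    audios: Sequence[object], model: ModelRef, hf_backend: Backend
) -> List[Scores]:
    """Route to the Hugging Face backend, or fail loudly for anything else."""
    if model.source is ModelSource.HUGGING_FACE:
        return hf_backend(audios, model)
    # Other model types are intentionally unsupported for now.
    raise NotImplementedError(
        f"Only Hugging Face models are supported; got {model.source.value!r}."
    )
```

Keeping the backend behind a single entry point means a new model type only needs a new branch here, while callers keep using the same function.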

fabiocat93

Thanks @wilke0818. I have left comments on some required changes.

fabiocat93 · 2024-12-23T15:26:31Z

Hi @wilke0818! Have you had any time to work on this?

wilke0818 · 2024-12-23T23:19:03Z

Nope. Added the tutorial change that you gave (might not have updated). Need to refactor for the API per the other issue on this topic. It is unclear to me what you would want for documentation. The functionalities themselves are documented already and this tutorial provides the information about the task a user might need (it is pretty akin to the link you sent).

This reverts commit 9749f39ec273cac1016fb3461e5cf1344f340073.

fabiocat93 · 2025-01-09T23:00:46Z

> It is unclear to me what you would want for documentation. The functionalities themselves are documented already and this tutorial provides the information about the task a user might need (it is pretty akin to the link you sent).

Every task has a documentation page that explains what the task is, how it is commonly evaluated, and which datasets and models are popular. See, for example, the text-to-speech page: https://sensein.group/senselab/senselab/audio/tasks/text_to_speech.html

wilke0818 · 2025-01-10T20:58:05Z

I mean, that makes sense, but do we want this to be an SER task or a generic audio classification task (which is what the HuggingFace pipeline is) that doesn't have a specific task/dataset but where SER is just an example usage of the task?

fabiocat93 · 2025-01-10T21:11:22Z

> I mean, that makes sense, but do we want this to be an SER task or a generic audio classification task (which is what the HuggingFace pipeline is) that doesn't have a specific task/dataset but where SER is just an example usage of the task?

Following #197, both. I would implement both a classification task (as HuggingFace has) and an SER task, with SER building on the classification interfaces and adding some checks before and after (e.g., outputs should be emotion-related).
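
A minimal sketch of that layering, under stated assumptions: `recognize_emotions`, `EMOTION_LABELS`, and the `classify` callable are hypothetical names, and the label vocabulary is only illustrative (real models often use other label names or abbreviations, so an actual check would need a mapping or a broader set).

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence

# Illustrative vocabulary only; this is an assumption, not a standard list.
EMOTION_LABELS: FrozenSet[str] = frozenset(
    {"angry", "happy", "sad", "neutral", "fear", "disgust", "surprise"}
)

Scores = Dict[str, float]
Classifier = Callable[[Sequence[object]], List[Scores]]


def _require_emotion_labels(
    labels: Iterable[str], allowed: FrozenSet[str], where: str
) -> None:
    """Raise ValueError if any label falls outside the allowed emotion set."""
    unexpected = sorted({label.lower() for label in labels} - allowed)
    if unexpected:
        raise ValueError(f"Non-emotion labels in {where}: {unexpected}")


def recognize_emotions(
    audios: Sequence[object],
    classify: Classifier,
    model_labels: Iterable[str],
    allowed: FrozenSet[str] = EMOTION_LABELS,
) -> List[Scores]:
    """Run a generic audio classifier, with emotion-specific checks around it."""
    # Check before: the model's declared labels should be emotion-related.
    _require_emotion_labels(model_labels, allowed, "model label set")
    results = classify(audios)
    # Check after: every returned label should be emotion-related too.
    for scores in results:
        _require_emotion_labels(scores, allowed, "classifier output")
    return results
```

With this split, the generic classification task stays free of emotion-specific logic, and SER adds only the validation around it.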

Created SER tutorial

ca7eb4d

wilke0818 requested a review from fabiocat93 November 18, 2024 21:58

fabiocat93 assigned wilke0818 Nov 18, 2024

fabiocat93 added the enhancement New feature or request label Nov 18, 2024

fabiocat93 linked an issue Nov 18, 2024 that may be closed by this pull request

Task: create an abstract interface for senselab Speech Emotion Recognition and Audio Classification #197

Open

fabiocat93 requested changes Nov 20, 2024

fabiocat93 marked this pull request as draft November 20, 2024 02:39

wilke0818 and others added 6 commits December 27, 2024 14:18

Merge branch 'main' into ser_tut

0be838f

Add line fabio requested

0aab1f1

Revert "Add line fabio requested"

c3cf523

This reverts commit 9749f39ec273cac1016fb3461e5cf1344f340073.

Merged in main and added Fabio requested lines

afdc0e9

Removed continuous emotion since it doesn't fall under classification

d6d5b29

Adding code for audio classification result class for cleaner outputs

1c47c93

Convert code to use common API formalism

bf7f43e
